Executive Intro to BigQuery
William M. Cohee
November 2017
Prepared using Apache OpenOffice 4.1.4
Presenter Bio
● 15+ years of Wall Street Technology
experience
● Expertise in front-office Fixed Income
Systems, Analytics, Pricing, Instrument,
& Entity Reference Data Management
● BA, Computer Science
● MS, Information Systems Engineering
● Certified Bloomberg Specialist
● Currently in the Chief Data Office
@ HSBC
● www.linkedin.com/in/billcohee
Topic
● Gaining insights from data is becoming a strategic imperative
● Amazon, Microsoft, IBM, and Google are competing aggressively for
your data and your $$
● Google's offering is the 'Google Cloud Platform' (GCP)
● Google considers BigQuery to be a game-changing differentiator
● BigQuery is disruptive
➢ super scalable & super fast - runs at 'Google Speed'
➢ inexpensive – loading data is free and storage is low-cost. Pay for the data your queries scan
➢ easy to use – No MapReduce programming. Works with SQL
● BigQuery lets you and your data teams focus on analyzing Big Data
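"Pay for what you query" means on-demand query cost tracks the bytes a query scans, not the rows it returns. The sketch below estimates a monthly bill from bytes scanned per query. PRICE_PER_TIB, FREE_TIB_PER_MONTH, and billing per TiB (2**40 bytes) are illustrative assumptions, not quoted prices; check current Google Cloud pricing before using numbers like these.

```python
# Minimal sketch of on-demand, pay-per-scan billing.
# ASSUMPTIONS: the price, free tier, and TiB billing unit below are
# illustrative placeholders, not official BigQuery prices.
PRICE_PER_TIB = 5.00        # assumed USD per TiB scanned (illustrative)
FREE_TIB_PER_MONTH = 1.0    # assumed monthly free tier in TiB (illustrative)
TIB = 2 ** 40               # bytes in one tebibyte


def monthly_query_cost(bytes_scanned_per_query: list[int]) -> float:
    """Estimate one month's query bill from the bytes each query scanned."""
    total_tib = sum(bytes_scanned_per_query) / TIB
    # Only usage beyond the free tier is billed.
    billable_tib = max(0.0, total_tib - FREE_TIB_PER_MONTH)
    return round(billable_tib * PRICE_PER_TIB, 2)


if __name__ == "__main__":
    # 300 queries that each scan 10 GiB: 3000 GiB in total, about 2.93 TiB.
    queries = [10 * 2 ** 30] * 300
    print(monthly_query_cost(queries))
```

Under these assumed rates the example bill is about $9.65. Scanning fewer bytes, for example by selecting only the columns you need, lowers the bill directly.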
Agenda
● What is BigQuery
● How does BigQuery work
● BigQuery in Action: query from an R script
● Recap & where to learn more
BigQuery – What is it?
● In BigQuery, Google is monetizing core technology assets used
& perfected internally for over a decade
● A Cloud-powered, massively parallel, interactive query service
● Provides extremely fast performance on extremely large datasets
● Can scan billion-row datasets in seconds using basic SQL
● BigQuery unleashes the processing power of Google on your Big Data
BigQuery – What is it?
● Serverless, Fully Managed
➢ no provisioning, no capacity planning. No need for a DBA or an SA
➢ takes the costly Big Data support burden off of your IT organization
● Unlimited Storage – Google makes it cheap & easy to put data into it
● REST API – Google makes it easy to integrate and securely access
your data
➢ interfaces for R, Python, C#/.NET, Java, Google Sheets, Excel, etc.
➢ integrates with popular visualization tools such as Tableau, Qlik, & Looker
BigQuery – What is it?
● Data is automatically encrypted, distributed, & replicated in Google
Data Centers
➢ state-of-the-art Biometric access controls
● No indexes – every query is a Full Scan of the columns it references!
● With BigQuery, Full Scans are intended by design & inexpensive
● Google is able to achieve ludicrous performance by leveraging two
core technical innovations it relies on every day to run itself...
BigQuery – How does it work?
● The primary technologies behind BigQuery
➢ Colossus
➢ Dremel
● Colossus is Google's next-generation Distributed File System, the successor to the Google File System (GFS)
➢ lets Google treat a cluster of computers as one big virtual disk
➢ horizontally scalable with inexpensive, commodity hardware
➢ sequential I/O bottlenecks are avoided using parallelism
➢ seeks/read operations are optimized – makes searching very fast
➢ trivia: Colossus was the name of the World War II codebreaking computer
at Bletchley Park that helped cryptanalysts break the German Lorenz cipher
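The idea of treating many machines as "one big virtual disk" and avoiding sequential I/O bottlenecks through parallelism can be illustrated with striping: a file is split into chunks spread across nodes, and all nodes are read at once. This is a hypothetical toy model; the function names, chunk size, and in-memory "nodes" are assumptions for illustration and do not reflect Colossus's actual design.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy model of striped storage; not Colossus's real design.
CHUNK_SIZE = 4  # bytes per chunk, kept tiny so the example is easy to follow


def stripe(data: bytes, num_nodes: int) -> list[dict[int, bytes]]:
    """Split data into fixed-size chunks and place them round-robin across nodes."""
    nodes: list[dict[int, bytes]] = [{} for _ in range(num_nodes)]
    for index, start in enumerate(range(0, len(data), CHUNK_SIZE)):
        nodes[index % num_nodes][index] = data[start:start + CHUNK_SIZE]
    return nodes


def read_node(node: dict[int, bytes]) -> dict[int, bytes]:
    """Simulate one machine returning every chunk it holds."""
    return dict(node)


def parallel_read(nodes: list[dict[int, bytes]]) -> bytes:
    """Read all nodes concurrently, then reassemble chunks in their original order."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = list(pool.map(read_node, nodes))
    chunks: dict[int, bytes] = {}
    for part in partials:
        chunks.update(part)
    return b"".join(chunks[i] for i in sorted(chunks))


if __name__ == "__main__":
    original = b"one big virtual disk spread over many machines"
    nodes = stripe(original, num_nodes=3)
    print([len(node) for node in nodes])   # chunks held by each node
    print(parallel_read(nodes) == original)
```

Because each node holds only a fraction of the chunks, adding nodes spreads the read work further, which is the horizontal scaling the slide describes.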
BigQuery – How does it work?
● Dremel is an interactive query engine designed for interrogating
'web-scale' datasets using SQL-like syntax
➢ Dremel has run in production inside Google since 2006, before
'Big Data' became a buzzword – the technology is mature
● Dremel uses Columnar Storage
➢ table data stored by columns instead of by rows
➢ well suited to analytic queries that read a few columns across many rows
➢ affords better compression and more efficient use of disk
➢ drastically reduces read operations and memory required for
queries – only the columns requested need to be scanned
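The difference between row and column layouts can be made concrete by counting how many bytes each must read for a query that references only `id` and `title`. This is a toy sketch: the sample rows, helper names, and the size measure (UTF-8 length of each value's text) are assumptions for illustration, not BigQuery's storage format.

```python
# Toy comparison of row-oriented vs column-oriented reads.
# ASSUMPTION: value size is approximated as the UTF-8 length of str(value).
Row = dict[str, object]

ROWS: list[Row] = [
    {"id": 1, "title": "HSBC", "comment": "fixed typo", "contributor": "alice"},
    {"id": 2, "title": "BigQuery", "comment": "added history section", "contributor": "bob"},
    {"id": 3, "title": "HSBC Bank USA", "comment": "updated infobox", "contributor": "carol"},
]


def to_columns(rows: list[Row]) -> dict[str, list[object]]:
    """Pivot row records into one list of values per column."""
    return {name: [row[name] for row in rows] for name in rows[0]}


def size(value: object) -> int:
    """Approximate on-disk size of a single value."""
    return len(str(value).encode("utf-8"))


def bytes_read_row_store(rows: list[Row]) -> int:
    """A row store reads whole rows, including columns the query never uses."""
    return sum(size(value) for row in rows for value in row.values())


def bytes_read_column_store(columns: dict[str, list[object]], needed: list[str]) -> int:
    """A column store reads only the columns the query references."""
    return sum(size(value) for name in needed for value in columns[name])


if __name__ == "__main__":
    print(bytes_read_row_store(ROWS))
    print(bytes_read_column_store(to_columns(ROWS), ["id", "title"]))
```

Here the row layout reads 87 bytes while the column layout reads 28. The gap widens as tables gain columns the query ignores, which is also why a "full scan" of only the referenced columns can stay cheap.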
BigQuery – How does it work?
● Dremel uses a multi-level serving tree (root, intermediate, and leaf
servers) to dispatch queries and fetch data
➢ exploits Google's Cloud-infrastructure to execute queries
➢ the work of data retrieval is distributed with massive parallelism
➢ partial results are aggregated quickly on the way back up the tree
➢ the tree architecture works very well on tabular data stored as
columns across a massively parallel distributed file system
● The unique combination of Colossus, Dremel, and Google's Cloud
is what makes awesome BigQuery performance possible
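The tree pattern can be sketched as fan-out and partial aggregation: leaf servers scan their own shards, intermediate servers combine leaf results, and the root combines intermediate results. The function names, the in-memory shards, and the count-of-matching-titles aggregate are hypothetical choices for illustration; this is a toy model of the pattern, not Dremel's implementation.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy model of a multi-level serving tree; not Dremel's code.


def leaf_count(tablet: list[str], needle: str) -> int:
    """Leaf server: scan one shard of a column and count matching values."""
    return sum(1 for title in tablet if needle in title)


def intermediate(tablets: list[list[str]], needle: str) -> int:
    """Intermediate server: fan out to its leaves in parallel and sum their counts."""
    with ThreadPoolExecutor() as pool:
        return sum(pool.map(lambda tablet: leaf_count(tablet, needle), tablets))


def root(tablet_groups: list[list[list[str]]], needle: str) -> int:
    """Root server: fan out to intermediates and combine their partial results."""
    with ThreadPoolExecutor() as pool:
        return sum(pool.map(lambda group: intermediate(group, needle), tablet_groups))


if __name__ == "__main__":
    # Two intermediate servers, each responsible for two leaf shards of titles.
    shards = [
        [["HSBC", "Google"], ["HSBC Holdings"]],
        [["Dremel"], ["HSBC Bank USA", "Colossus"]],
    ]
    print(root(shards, "HSBC"))
```

Because each level only passes small partial results upward, the expensive scanning happens in parallel at the leaves while the root does very little work.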
BigQuery in Action
● As a simple demonstration of BigQuery's power, we will run a small
R script to query the free, public Wikipedia dataset
● This sample dataset contains 313,797,035 rows
● Our query will look at all 2016 revisions to Wikipedia pages to find
the ID, title, and comment for all edits made to pages with 'HSBC' in
the title
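The R script itself is not reproduced in this text. As a hedged illustration only, a query of the kind described could be issued from Python with the `google-cloud-bigquery` client library. This is not the author's script: it assumes the package is installed and GCP credentials are configured, and it assumes the public table `bigquery-public-data.samples.wikipedia` stores its `timestamp` column as Unix seconds. The `needle` and `year` parameter names are hypothetical.

```python
# Hypothetical Python equivalent of the demo query; NOT the author's R script.
# ASSUMPTIONS: google-cloud-bigquery is installed, credentials are configured,
# and the table's `timestamp` column holds Unix seconds.
TABLE = "bigquery-public-data.samples.wikipedia"

SQL = f"""
SELECT id, title, comment
FROM `{TABLE}`
WHERE STRPOS(title, @needle) > 0
  AND EXTRACT(YEAR FROM TIMESTAMP_SECONDS(`timestamp`)) = @year
"""


def run_query(needle: str = "HSBC", year: int = 2016) -> list:
    """Run the parameterized query and return all result rows."""
    # Imported here so the SQL text can be inspected without the library installed.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("needle", "STRING", needle),
            bigquery.ScalarQueryParameter("year", "INT64", year),
        ]
    )
    job = client.query(SQL, job_config=job_config)
    rows = list(job.result())
    print(f"{len(rows)} rows returned, {job.total_bytes_processed} bytes processed")
    return rows
```

Calling `run_query()` sends the query to BigQuery. Query parameters are used instead of string formatting so that the search term is never spliced into the SQL text.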
BigQuery in Action [ lines 1-8 ]
BigQuery in Action [ lines 9-22 ]
BigQuery in Action [ script output ]
BigQuery scanned 313+ million rows and returned 5,936
results in less than 2.5 seconds
Recap & Resources
● BigQuery exposes the power of Colossus, Dremel, and Google's
massive computational infrastructure to everyone outside of Google
● BigQuery is super fast, cost-effective, & easy to use for analyzing
Big Data using standard SQL. An open REST API makes it
accessible from popular Data Science and Visualization platforms.
● True NoOps model lets teams stay focused on Big Data Analytics
● In the Amazon-Microsoft-IBM-Google cloud wars, Google may have
a big edge with BigQuery...
Recap & Resources
● Where to learn more...
● BigQuery homepage: https://cloud.google.com/bigquery/
● Getting Started with GCP: https://cloud.google.com/getting-started
● Jordan Tigani's BigQuery 101 presentation: https://youtu.be/kKBnFsNWwYM
● BigQuery Whitepaper: https://goo.gl/PBfXX6
● Hadley Wickham's bigrquery R package: https://goo.gl/EXQ3jd
● Coursera course: https://goo.gl/YDxjEv
● BigQuery Best Practices: https://goo.gl/iLYvkE
● HSBC Group CIO Darryl West presenting at Google Cloud Next:
https://youtu.be/esqArNcxaao
● Google Data Center Security: https://goo.gl/UVD2jW