Sponsors
Silver Sponsors
Strategic Sponsor
Media Partners
Google BigQuery
Wlodek Bielski
cloud.developerdays.pl @DeveloperDaysPL
About me
• In BI / Analytics space since 2005
• In Clouds since 2014
• Google Professional Data Architect
• Google Professional Data Engineer
• MCSD: Azure Solutions Architect
• MCSE: Cloud Platform and Infrastructure
• MCSE: Data Management and Analytics
• AWS Certified Solutions Architect - Associate
Agenda
• GCP overview
• BigQuery overview
• BigQuery internals
• BigQuery use cases
• Session on ML with GCP: 15:00-16:00
Google Cloud Platform
Brief overview
Gartner MQ for Cloud IaaS, 2018
"Google has been most differentiated on the forward edge of IT, with deep investments in analytics and ML, and many customers who choose Google for strategic adoption have applications that are anchored by BigQuery"
Google Cloud Platform
Open-source innovations
BigQuery
Brief overview
What is BigQuery?
• Fully managed, No-Ops analytics data warehouse
• Highly parallel / distributed processing model
• Only pay for actual storage and compute used
• Virtually unlimited storage and compute resources
• Runs on Google infrastructure (US, EU, Asia)
• Multi-tenant architecture
Nested and repeated schema
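A minimal Python sketch of what a nested and repeated schema means, assuming a hypothetical orders table: each row carries a nested `customer` record and a repeated `items` list. Flattening the repeated field yields one output row per item, similar in spirit to `UNNEST` in BigQuery standard SQL. All field names and values are illustrative, not taken from the talk.

```python
from typing import Any, Dict, Iterator, List

# Hypothetical rows: "customer" is a nested record, "items" is a repeated record.
orders: List[Dict[str, Any]] = [
    {
        "order_id": 1,
        "customer": {"name": "Alice", "country": "PL"},
        "items": [
            {"sku": "A-1", "qty": 2},
            {"sku": "B-7", "qty": 1},
        ],
    },
    {
        "order_id": 2,
        "customer": {"name": "Bob", "country": "DE"},
        "items": [{"sku": "A-1", "qty": 5}],
    },
]


def flatten_items(rows: List[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield one flat row per repeated item, copying the parent fields."""
    for row in rows:
        for item in row["items"]:
            yield {
                "order_id": row["order_id"],
                "customer_name": row["customer"]["name"],
                "sku": item["sku"],
                "qty": item["qty"],
            }


flat_rows = list(flatten_items(orders))
total_qty_a1 = sum(r["qty"] for r in flat_rows if r["sku"] == "A-1")
print(len(flat_rows), total_qty_a1)
```

Keeping the items inside the order row avoids a join between an orders table and an items table; the flattening step is only needed when a query has to address individual items.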
BigQuery timeline
Demo
GCP console and BigQuery
Query example
BigQuery internals
BigQuery under the hood
Storage subsystem: Colossus
• Successor to GFS (GFS: 2003, Colossus: 2010)
• GFS was built for batch workloads; Colossus also serves real-time workloads
• Powers most of Google's internal services (Gmail, YouTube, GCS)
• Reliable and fault-tolerant (Reed-Solomon)
• Supports fast table scans (no indexes in BigQuery!)
https://cloud.google.com/files/storage_architecture_and_challenges.pdf
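To illustrate the idea behind erasure-coded fault tolerance, here is a deliberately simplified sketch that uses a single XOR parity chunk, not Reed-Solomon itself. XOR parity can rebuild only one lost chunk, whereas Reed-Solomon codes tolerate several. The chunk contents and function names are hypothetical, and nothing here reflects how Colossus actually lays out data.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(chunks: List[bytes]) -> bytes:
    """Return the parity chunk: the XOR of all equal-length data chunks."""
    return reduce(xor_bytes, chunks)


def recover(chunks: List[Optional[bytes]], parity: bytes) -> List[bytes]:
    """Rebuild at most one missing chunk (marked None) from survivors and parity."""
    missing = [i for i, c in enumerate(chunks) if c is None]
    if len(missing) > 1:
        raise ValueError("single XOR parity can repair only one lost chunk")
    repaired = list(chunks)
    if missing:
        survivors = [c for c in chunks if c is not None]
        # XOR-ing the parity with every surviving chunk leaves the lost chunk.
        repaired[missing[0]] = reduce(xor_bytes, survivors, parity)
    return repaired


data = [b"colo", b"ssus", b"data"]
parity = encode(data)
damaged = [data[0], None, data[2]]  # simulate one lost chunk
restored = recover(damaged, parity)
print(restored == data)
```

The storage overhead here is one extra chunk for three data chunks, which is the general appeal of erasure coding over full replication.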
Storage format: Capacitor
• Replaced earlier ColumnIO (base for Parquet and ORC)
• Columnar storage format for Colossus
• Once imported, all input formats encoded into Capacitor
• Able to operate directly on compressed data
• Maintains rich metadata on datasets – used by Dremel
https://cloud.google.com/files/storage_architecture_and_challenges.pdf
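The two central ideas on this slide, columnar layout and operating directly on compressed data, can be sketched in a few lines of Python. This is an illustration only: it assumes run-length encoding as the compression scheme and uses hypothetical column names. Capacitor's real encodings are not described in the slides.

```python
from itertools import groupby
from typing import Dict, List, Tuple

Run = Tuple[str, int]  # (value, run length)


def rle_encode(column: List[str]) -> List[Run]:
    """Compress a column into (value, run length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(column)]


def count_equal(runs: List[Run], target: str) -> int:
    """Count matching rows by scanning runs, without decompressing the column."""
    return sum(length for value, length in runs if value == target)


# Row-oriented input (hypothetical), as it might arrive from CSV or JSON.
rows = [
    {"country": "PL", "amount": 10},
    {"country": "PL", "amount": 20},
    {"country": "PL", "amount": 5},
    {"country": "DE", "amount": 7},
    {"country": "DE", "amount": 3},
    {"country": "PL", "amount": 1},
]

# Columnar layout: one list per column instead of one dict per row.
columns: Dict[str, list] = {name: [row[name] for row in rows] for name in rows[0]}

country_runs = rle_encode(columns["country"])
pl_count = count_equal(country_runs, "PL")
print(country_runs, pl_count)
```

A filter on `country` touches three runs instead of six rows and never reads the `amount` column, which is why a columnar scan can stay fast without indexes.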
Networking: Jupiter
• Google petabit network
• Google internal design
• Separation of storage and compute
Execution engine: Dremel
• Scalable, interactive ad-hoc query system for analysis of read-only nested data
• Dynamic processing trees, in-memory shuffle component
• Dynamic query execution, aided by metadata
• No configs/knobs exposed to end user
• When idle, runs batch ingests (for free!) via Poseidon
https://ai.google/research/pubs/pub36632
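A rough sketch of the processing-tree idea, assuming a fixed three-level tree (leaves, mixers, root) computing an average per key. Leaves scan their own shard and return partial aggregates, and higher levels only merge those partials. The shard data and function names are hypothetical, and the sketch does not model Dremel's dynamic planning or its in-memory shuffle.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Partial = Dict[str, Tuple[float, int]]  # key -> (sum, count)


def leaf_scan(shard: List[Tuple[str, float]]) -> Partial:
    """Leaf: scan one shard and produce a partial (sum, count) per key."""
    acc: Dict[str, Tuple[float, int]] = defaultdict(lambda: (0.0, 0))
    for key, value in shard:
        total, count = acc[key]
        acc[key] = (total + value, count + 1)
    return dict(acc)


def mix(partials: List[Partial]) -> Partial:
    """Mixer: merge partial aggregates coming from the level below."""
    merged: Dict[str, Tuple[float, int]] = defaultdict(lambda: (0.0, 0))
    for partial in partials:
        for key, (total, count) in partial.items():
            m_total, m_count = merged[key]
            merged[key] = (m_total + total, m_count + count)
    return dict(merged)


def root(partials: List[Partial]) -> Dict[str, float]:
    """Root: merge the mixer outputs and finalize the averages."""
    return {key: total / count for key, (total, count) in mix(partials).items()}


shards = [
    [("PL", 10.0), ("DE", 4.0)],
    [("PL", 20.0)],
    [("DE", 6.0), ("PL", 30.0)],
    [("US", 8.0)],
]

leaf_results = [leaf_scan(shard) for shard in shards]
mixer_results = [mix(leaf_results[:2]), mix(leaf_results[2:])]
averages = root(mixer_results)
print(averages)
```

Carrying (sum, count) pairs up the tree instead of per-leaf averages keeps the merge step correct regardless of how rows are spread across shards.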
Compute: Borg
• Container-oriented cluster-management system
• Precursor to Kubernetes
• E.g. Borg allocs correspond to Kubernetes Pods
https://ai.google/research/pubs/pub43438
https://kubernetes.io/blog/2015/04/borg-predecessor-to-kubernetes/
Overall BigQuery architecture
Business cases
BigQuery use cases
• Analyzing online marketing
  • Google Analytics 360
  • Firebase
  • AdWords
  • DoubleClick
• Hadoop-like processing
  • BigQuery + Dataflow
• Data lake for ML
• Archive for cold data (long-term storage pricing)
Typical e-commerce integration
Demo
BigQuery with Cloud Datalab
Thank you!
cloud.developerdays.pl
@DeveloperDaysPL
Cloud Developer Days - BigQuery