IoT @ Google Scale
James Chittenden
Google Cloud Platform Solutions Engineer
jameschi@google.com
+James Chittenden
(Big Data Cloud Engineer)
jameschi@google.com
Big Data at Google
a.k.a. Data at Google
Manage the Entire Lifecycle of Big Data
[Diagram: stages Capture, Store, Process, Analyze; processing modes Batch and Stream]
Services shown: Cloud Logs, Google App Engine, Google Analytics Premium, Cloud Pub/Sub, BigQuery Storage (tables), Cloud Bigtable (NoSQL), Cloud Storage (files), Cloud Datastore, Cloud Dataflow, BigQuery Analytics (SQL), Cloud Monitoring, real-time analytics and alerts
End to End View of the GCP IoT Architecture
Device to Device Protocols
● Device Discovery
● Device to Device authentication
● Device Configuration
● Protocol Routing
Machine Learning: Pattern Detection and Prediction
● Subscribers scan real-time streams and feed data into the Machine Learning Recognition algorithm
● Dataflow orchestrates streaming algorithms which compare data streams against the Experience Database
● Correlators detect known patterns and publish alerts using Cloud Pub/Sub
Cloud Storage Archival and Retrieval
● Data is periodically unloaded from Bigtable and stored in Cloud Storage for archival
● Data in Cloud Storage can be quickly reloaded into Bigtable should it need to be reprocessed.
Cloud Pub/Sub
Real-time and reliable messaging with Pub/Sub
Messaging is a shock-absorber
Images by Connie Zhou
Availability
• Buffer new requests during outages
• Prevent overloads that cause outages
• Redirect requests to recover from outages
Throughput
• Smooth out spikes in new request rate
• Balance load across multiple workers
• Balance arrival rate with service rate
Latency
• Accept requests closer to the network edge
• Optimize message flow across regions
• Leverage shared efforts to improve protocols
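The "smooth out spikes" and "balance arrival rate with service rate" points can be illustrated with a toy queue simulation. This is a minimal sketch in plain Java, not Pub/Sub itself: the class name, the tick model, and the `serviceRate` parameter are assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

/** Toy model: publishers enqueue at a varying rate, one worker drains at a fixed rate. */
class ShockAbsorberSketch {

    /**
     * Simulates one tick per entry of arrivalsPerTick and returns the queue
     * backlog after each tick. serviceRate is the (assumed) number of
     * messages the worker can process per tick.
     */
    static int[] backlogOverTime(int[] arrivalsPerTick, int serviceRate) {
        Deque<Integer> queue = new ArrayDeque<>();
        int[] backlog = new int[arrivalsPerTick.length];
        int nextId = 0;
        for (int tick = 0; tick < arrivalsPerTick.length; tick++) {
            // Publishers enqueue without waiting for the worker.
            for (int i = 0; i < arrivalsPerTick[tick]; i++) {
                queue.addLast(nextId++);
            }
            // The worker drains at most serviceRate messages per tick.
            for (int i = 0; i < serviceRate && !queue.isEmpty(); i++) {
                queue.removeFirst();
            }
            backlog[tick] = queue.size();
        }
        return backlog;
    }

    public static void main(String[] args) {
        // A burst of 10 messages arrives at tick 1; the worker handles 3 per tick.
        int[] backlog = backlogOverTime(new int[] {2, 10, 0, 0, 1}, 3);
        System.out.println(Arrays.toString(backlog)); // prints [0, 7, 4, 1, 0]
    }
}
```

The worker never sees more than its service rate, and the burst is worked off over the following ticks instead of being dropped.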
Pub/Sub is a change-absorber
Images by Connie Zhou
Sources
• New data sources can plug into old data flows
• New data sources can use new schemas
• Common security policies for all sources
Sinks
• Data can be sent to new destinations
• Push and Pull delivery are both available
• Spans organizational boundaries
Transforms
• Select subsets of messages that matter
• Helps manage schema and version changes
• Can merge streams into new topics
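The decoupling described above (new destinations attach without touching publishers, and a consumer can select only the messages that matter to it) can be sketched with an in-memory topic. This is an illustrative model only: `TopicFanOutSketch`, `subscribe`, and the per-subscription filter predicate are hypothetical names, not the Cloud Pub/Sub client API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** In-memory topic: every subscription gets its own copy of each matching message. */
class TopicFanOutSketch {
    private final Map<String, Predicate<String>> filters = new LinkedHashMap<>();
    private final Map<String, List<String>> delivered = new LinkedHashMap<>();

    /** Attaches a new destination; the publisher side is unchanged. */
    void subscribe(String name, Predicate<String> filter) {
        filters.put(name, filter);
        delivered.put(name, new ArrayList<>());
    }

    /** Publishers only know the topic, never the subscribers. */
    void publish(String message) {
        for (Map.Entry<String, Predicate<String>> e : filters.entrySet()) {
            if (e.getValue().test(message)) {
                delivered.get(e.getKey()).add(message);
            }
        }
    }

    List<String> received(String name) {
        return delivered.get(name);
    }

    public static void main(String[] args) {
        TopicFanOutSketch topic = new TopicFanOutSketch();
        topic.subscribe("archive", m -> true);                 // takes everything
        topic.subscribe("alerts", m -> m.startsWith("ALERT")); // selects a subset
        topic.publish("temp=21");
        topic.publish("ALERT:overheat");
        System.out.println(topic.received("archive"));
        System.out.println(topic.received("alerts"));
    }
}
```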
Pub/Sub at Google
● Chat & Mobile: Every time a new message pops up in your Gmail inbox, it’s because of a push notification to your browser or mobile device.
● Ads & Budgets: One of the most important real-time information streams in the company is advertising revenue; we use Pub/Sub to broadcast budgets to our entire fleet of search engines.
● Push Notifications: Google Cloud Messaging for Android delivers billions of messages a day, reliably and securely, for Google’s own mobile apps and the entire developer community.
● Instant Search: Updating search results as you type is a feat of real-time indexing that depends on Pub/Sub to update caches with breaking news.
[Diagram: Pub/Sub delivery paths. A Publisher (on-prem or cloud) sends to a Topic in the Pub/Sub system; Subscriptions deliver to subscribers in any environment. Labels shown: HTTP Server subscriber (webhook delivery), Google App Engine (HTTP push delivery), pull subscriber (Google RPC delivery), Cloud Dataflow.]
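[Diagram: Push Subscription vs. Pull Subscription]
● Push Subscription: the Pub/Sub system sends the message (Msg) to the subscriber, and the subscriber returns an Ack
● Pull Subscription: the subscriber sends an RPC, receives the message (Msg) in the RPC return, and then sends an Ack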
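The two acknowledgement flows can be modelled in a few lines. This is a hedged sketch, not the Pub/Sub client library: `AckSketch`, the integer ack IDs, and the boolean-returning endpoint (standing in for a success response from a push endpoint) are all assumptions for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

/** One subscription: messages stay outstanding until they are acknowledged. */
class AckSketch {
    private final Map<Integer, String> unacked = new LinkedHashMap<>();
    private int nextAckId = 0; // hypothetical ack ID scheme

    void publish(String msg) {
        unacked.put(nextAckId++, msg);
    }

    /** Pull: the subscriber asks for messages; anything not yet acked is returned again. */
    Map<Integer, String> pull() {
        return new LinkedHashMap<>(unacked);
    }

    /** Pull: the subscriber acks explicitly after processing. */
    void ack(int ackId) {
        unacked.remove(ackId);
    }

    /** Push: the system calls the endpoint; a true result counts as the ack. */
    void pushTo(Predicate<String> endpoint) {
        // Iterate over a copy so that acked entries can be removed safely.
        for (Map.Entry<Integer, String> e : new LinkedHashMap<>(unacked).entrySet()) {
            if (endpoint.test(e.getValue())) {
                unacked.remove(e.getKey());
            }
        }
    }

    int outstanding() {
        return unacked.size();
    }

    public static void main(String[] args) {
        AckSketch sub = new AckSketch();
        sub.publish("a");
        sub.publish("b");
        sub.ack(0);                       // pull-style ack of "a"
        System.out.println(sub.pull());   // "b" is still outstanding
        sub.pushTo(m -> true);            // push-style delivery acks the rest
        System.out.println(sub.outstanding());
    }
}
```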
“We don’t really run MapReduce at Google anymore”
- Urs Hoelzle
Google Dataflow
Google Technologies
[Timeline, 2002 to 2014+. Technologies shown: GFS, MapReduce, Bigtable, Dremel, Colossus, FlumeJava, Spanner, MillWheel, and more]
Dataflow Goodies
● Autoscaling mid-job
● Fully managed - No-Ops
● Intuitive Data Processing Framework
● Batch and Stream Processing in one
● Liquid sharding mid-job
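Pipeline p = Pipeline.create();
p.begin()
    .apply(TextIO.Read.from("gs://…"))        // read input lines from Cloud Storage
    .apply(ParDo.of(new ExtractTags()))       // user-defined DoFn
    .apply(Count.create())
    .apply(ParDo.of(new ExpandPrefixes()))    // user-defined DoFn
    .apply(Top.largestPerKey(3))              // keep the 3 largest values per key
    .apply(TextIO.Write.to("gs://…"));        // write results to Cloud Storage
p.run();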
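To make the shape of this pipeline concrete without the Dataflow SDK, here is a plain-Java sketch of the same steps on an in-memory list. The slide does not show what `ExtractTags` and `ExpandPrefixes` do, so their behavior here (pulling `#tag` tokens out of text, then keying each tag by every one of its prefixes) is an assumption, as are all class and method names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TagPrefixSketch {

    /** Assumed meaning of ExtractTags: collect "#tag" tokens from each line. */
    static List<String> extractTags(List<String> lines) {
        List<String> tags = new ArrayList<>();
        for (String line : lines) {
            for (String token : line.split("\\s+")) {
                if (token.startsWith("#") && token.length() > 1) {
                    tags.add(token.substring(1));
                }
            }
        }
        return tags;
    }

    /** Count step: number of occurrences per tag. */
    static Map<String, Integer> count(List<String> tags) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String tag : tags) {
            counts.merge(tag, 1, Integer::sum);
        }
        return counts;
    }

    /** Assumed ExpandPrefixes plus top-n per key: for each prefix, the n most frequent tags. */
    static Map<String, List<String>> topPerPrefix(Map<String, Integer> counts, int n) {
        Map<String, List<String>> byPrefix = new TreeMap<>();
        for (String tag : counts.keySet()) {
            for (int len = 1; len <= tag.length(); len++) {
                byPrefix.computeIfAbsent(tag.substring(0, len), k -> new ArrayList<>()).add(tag);
            }
        }
        for (List<String> tags : byPrefix.values()) {
            // Highest count first; ties broken alphabetically so output is deterministic.
            tags.sort((a, b) -> {
                int byCount = Integer.compare(counts.get(b), counts.get(a));
                return byCount != 0 ? byCount : a.compareTo(b);
            });
            if (tags.size() > n) {
                tags.subList(n, tags.size()).clear();
            }
        }
        return byPrefix;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("#go #golang", "#go #gopher", "#golang #go #java");
        Map<String, List<String>> top = topPerPrefix(count(extractTags(lines)), 3);
        System.out.println(top.get("g")); // prints [go, golang, gopher]
    }
}
```

In the real pipeline each step is a distributed transform; the sketch only shows the data flowing through the same sequence of stages.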
Deploy
Schedule & Monitor
800 RPS 1200 RPS 5000 RPS 50 RPS
// Streaming: read from and write to Pub/Sub, with 5-minute fixed windows
.apply(PubsubIO.Read.from("input_topic"))
.apply(Window.<Integer>by(FixedWindows.of(5, MINUTES)))
.apply(PubsubIO.Write.to("output_topic"));
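Fixed windowing, as named in the streaming lines above, assigns each element to a non-overlapping time bucket based on its timestamp. The sketch below shows that assignment in plain Java; summing the values per window is an assumed aggregation chosen for illustration, and `FixedWindowSketch` is a hypothetical name, not part of the Dataflow SDK.

```java
import java.util.Map;
import java.util.TreeMap;

class FixedWindowSketch {
    static final long WINDOW_MILLIS = 5 * 60 * 1000L; // 5-minute windows

    /** Start of the window containing the timestamp (epoch millis, assumed non-negative). */
    static long windowStart(long timestampMillis) {
        return timestampMillis - (timestampMillis % WINDOW_MILLIS);
    }

    /** Sums values per window; each event is {timestampMillis, value}. */
    static Map<Long, Long> sumPerWindow(long[][] events) {
        Map<Long, Long> sums = new TreeMap<>();
        for (long[] event : events) {
            sums.merge(windowStart(event[0]), event[1], Long::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        long[][] events = {
            {0L, 1L}, {299_999L, 2L},  // both fall in the window starting at 0
            {300_000L, 5L},            // first instant of the next window
            {610_000L, 7L}             // window starting at 600000
        };
        System.out.println(sumPerWindow(events)); // prints {0=3, 300000=5, 600000=7}
    }
}
```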
Unified Model
Pub/Sub + Dataflow + BigQuery Demo
Life of a Pipeline
IoT at Google Scale
Enterprise Big Data Architecture on Google
[Diagram labels: Your Data, Pub/Sub, Cloud Storage, Bigtable, Dataflow, BigQuery, Fast ETL, Regex, JSON, UDFs, Spreadsheets, BI Tools, Coworkers, Applications + Reports]
Why Dataflow?
● All the benefits of Hadoop-on-Google
● Plus True Stream Processing
● Plus Autoscaling and per-minute billing
● Plus a Fully-Managed Service
● Plus New, Intuitive Framework
Questions?

