Big Data On Google Cloud
Tu Pham - IO extended 2017
Tu Pham
CTO @ Dyno, a Data-as-a-Service company
Technologies: Java, Python, all kinds of databases, and cloud platforms from Google, AWS, and Azure.
Interests: cloud computing and architecture, technology evolution, distributed systems.
Husband, father, GDE, open source contributor.
Photo: Lars Kruse, Aarhus Universitet
About Dyno:

- Tech marketing & digital agency
For the past 17 years, Google has been building out the world's fastest, most powerful, highest-quality cloud infrastructure.
Images by Connie Zhou
Google Cloud Platform is built on the same infrastructure that powers Google.
Google's Platform
“[Google's] ability to build, organize, and operate a huge network of servers and fiber-optic cables with an efficiency and speed that rocks physics on its heels. This is what makes Google Google: its physical network, its thousands of fiber miles, and those many thousands of servers that, in aggregate, add up to the mother of all clouds.”
- Wired
77 peering locations
Yes, We Can Power That
• Web
• Mobile
• Storage & Database
• Big Data
• Highly Scalable Systems
• Data Mining
Google Cloud Platform
Google's Mission
“Organize the world's information and make it universally accessible and useful.”
By 2020, there will be 8 billion connected smartphones, 2X more than today.
And 32 billion connected “IoT” devices, 6X more than today.
Source: Boston Consulting Group, The Mobile Revolution: How Mobile Technologies Drive a Trillion-Dollar Impact; IDC, 2015
Exploring the Cloud
• IaaS: Infrastructure-as-a-Service
• PaaS: Platform-as-a-Service
• SaaS: Software-as-a-Service
Google Cloud Platform
Google Compute Engine
• Flexible Infrastructure
  • Custom VM Sizes
  • Online Disk Resizing
• Network
  • Internal Network
  • Firewall
  • Load Balancing
  • External IP Address
• Billing
  • Sustained Usage Discounts
  • Preemptible VMs
App Engine
• Fully Managed Platform
• Popular Programming Language Support
• Flexible and Scalable Application Storage
• Auto-scaling
• Versioning and Traffic Splitting
• Local Developer Tools
• Third-party Frameworks and Extensions
Cloud Pub/Sub
• Global Presence
• Flexible Delivery Options
  • Pull
  • Push
• Data Reliability
• Flow Control
• Data Security and Protection
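The pull and push delivery options listed for Pub/Sub can be pictured with a toy in-memory model. The sketch below is plain Python, not the Cloud Pub/Sub client library; the class ToyTopic and its methods are hypothetical names invented for illustration only.

```python
from collections import deque
from typing import Callable, Deque, List


class ToyTopic:
    """In-memory stand-in for a topic (hypothetical, NOT the Cloud Pub/Sub API)."""

    def __init__(self) -> None:
        # Messages waiting to be fetched by pull subscribers.
        self._pull_queue: Deque[str] = deque()
        # Callables that play the role of push endpoints.
        self._push_endpoints: List[Callable[[str], None]] = []

    def subscribe_push(self, endpoint: Callable[[str], None]) -> None:
        # Push delivery: the service calls the subscriber as messages arrive.
        self._push_endpoints.append(endpoint)

    def publish(self, message: str) -> None:
        self._pull_queue.append(message)
        for endpoint in self._push_endpoints:
            endpoint(message)

    def pull(self, max_messages: int = 10) -> List[str]:
        # Pull delivery: the subscriber asks for a bounded batch when it is
        # ready, which doubles as a very simple form of flow control.
        batch: List[str] = []
        while self._pull_queue and len(batch) < max_messages:
            batch.append(self._pull_queue.popleft())
        return batch


topic = ToyTopic()
pushed: List[str] = []
topic.subscribe_push(pushed.append)

for event in ["signup", "click", "purchase"]:
    topic.publish(event)

first_batch = topic.pull(max_messages=2)
second_batch = topic.pull(max_messages=2)
print(pushed)
print(first_batch, second_batch)
```

In this toy model the push subscriber sees every message immediately, while the pull subscriber decides how many messages to take per request.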
Cloud Dataflow
• Reliable & Consistent Processing
• Unified Programming Model
• Intelligent Work Scheduling
• Auto Scaling
• Monitoring
• Open Source
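One way to picture a unified programming model is a single transform that works on both a bounded input (batch) and an unbounded one (stream). The sketch below is plain Python and does not use the Dataflow or Apache Beam SDK; the function names and sample events are assumptions made for illustration.

```python
import itertools
from typing import Dict, Iterable, Iterator, Tuple


def running_counts(events: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (event, count so far) for each event, one at a time."""
    counts: Dict[str, int] = {}
    for event in events:
        counts[event] = counts.get(event, 0) + 1
        yield event, counts[event]


def endless_clicks() -> Iterator[str]:
    """Hypothetical unbounded source that never stops producing events."""
    while True:
        yield "click"


# Batch: a bounded list is consumed to the end.
batch_result = list(running_counts(["click", "view", "click"]))

# Stream: the same transform runs over an unbounded source;
# here only the first three results are taken.
stream_result = list(itertools.islice(running_counts(endless_clicks()), 3))

print(batch_result)
print(stream_result)
```

The transform itself never needs to know whether its input ends, which is the property the "unified" label points at.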
Cloud Storage
• Versioning
• Static Sites
• Resumable Transfers
• Object Change Notifications
• TB Scale
Cloud SQL
• Fully Managed
• Ease of Use
• Highly Reliable
• Flexible Charging
• Security, Availability, Durability
• Easy Migration & Data Portability
• Optimized MySQL Versions
BigQuery
• Fully Managed Big Data Analytics Service
• SQL Support
• Fast
• Scalable
• Flexible and Familiar
• Security and Reliability
Cloud Dataproc
• Includes
  • Apache Hadoop
  • Apache Pig
  • Apache Hive
  • Apache Spark
• Fast and Scalable Data Processing
• Flexible Virtual Machines
• Resizable Cluster
Cloud Datalab
• Powerful Data Exploration
• Scalable
• Data Management
• Visualization
• Open Source (Jupyter)
Google's Data Services for Everyone
A serverless big data stack that scales automatically
[Diagram: a common configuration for drawing conclusions. Inputs: events, metrics, etc. (stream); raw logs, files, assets, Google Analytics data, etc. (batch). Outputs: Cloud Datalab, visualization and BI, co-workers, applications and reports.]
10+ Years of Tackling Big Data Problems
[Timeline figure, axis marks 2002, 2004, 2005, 2006, 2008, 2010, 2012, 2014, 2015.
Google papers: GFS, MapReduce, BigTable, Dremel, PubSub, FlumeJava, MillWheel.
Open source: Apache Beam, TensorFlow.
Google Cloud products: BigQuery, Pub/Sub, Dataflow, Bigtable.]
Transform Data into Actions
[Diagram: generic data pipeline.
Sources: mobile apps, sensors and devices, web apps.
Stages: Data Ingestion; Databases and Storage; Data Preparation & Processing; Analytics; Exploration & Collaboration; Advanced Analytics & Intelligence.
Capabilities shown: messaging, logs; relational, key-value, document, SQL, wide column, object; stream processing, batch processing, data preparation; federated query, data catalog; data exploration, data visualization; development environment for machine learning, pre-trained machine learning models.
Users: developers, data scientists, business analysts.]
Transform Data into Actions
[Diagram: the same pipeline mapped to Google products.
Sources: mobile apps, sensors and devices, web apps.
Data Ingestion: Cloud Pub/Sub, App Engine.
Databases/Storage: Cloud SQL, Cloud Bigtable, Cloud Datastore, Cloud Storage.
Data Preparation & Processing: Cloud Dataflow, Cloud Dataproc.
Analytics: Google BigQuery, Google Analytics 360, Cloud Dataproc, Google Drive.
Exploration & Collaboration: Google BigQuery, Cloud Datalab, Google Analytics 360, Cloud Dataproc.
Advanced Analytics & Intelligence: Cloud Machine Learning, Translate API, Vision API, Speech API.
Users: developers, data scientists, business analysts.]
Google Cloud Dataproc
Apache Spark and Apache Hadoop should be fast, easy, and cost-effective.
[Figure: traditional Spark and Hadoop clusters compared with Google Cloud Dataproc]
Google Cloud Dataproc - under the hood
[Diagram labels.
Dataproc jobs: Spark, PySpark, Spark SQL, MapReduce, Pig, Hive.
Applications on the cluster; GCP products.
Dataproc cluster: Spark & Hadoop OSS, Cloud Dataproc agent, Google Cloud services.
Also shown: Dataproc jobs, features, data outputs.]
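MapReduce is one of the job types listed above. As a rough illustration of its map, shuffle, and reduce phases, here is a single-process word count in plain Python. It does not use Hadoop, Spark, or any Dataproc API; the function names and sample lines are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: collapse each key's values into a single total."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    return reduce_phase(shuffle_phase(map_phase(lines)))


sample_lines = [
    "big data on google cloud",
    "google cloud dataproc",
    "big query",
]
counts = word_count(sample_lines)
print(counts)
```

On a real cluster the map and reduce phases run in parallel across many machines and the shuffle moves data between them; the sketch only shows the data flow.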
Easy, fast, cost-effective
• Fast: things take seconds to minutes, not hours or weeks
• Easy: be an expert with your data, not your data infrastructure
• Cost-effective: pay for exactly what you use
Running Hadoop on Google Cloud
[Figure: responsibility comparison across four options: on premise, vendor Hadoop, bdutil (free OSS toolkit), and Dataproc (managed Hadoop).
Each option lists the same layers: custom code, monitoring/health, dev integration, scaling (manual scaling in one column), job submission, GCP connectivity, deployment, creation.
Legend: Google managed, customer managed, Google Cloud Platform.]
Cloud Dataproc - integrated
Cloud Dataproc is natively integrated with several Google Cloud Platform products as part of an integrated data platform.
• Storage
• Operations
• Data
Where Cloud Dataproc fits into GCP
• Google Bigtable (HBase)
• Google BigQuery (analytics, data warehouse)
• Stackdriver Logging (logging ops)
• Google Cloud Dataflow (batch/stream processing)
• Google Cloud Storage (HCFS/HDFS)
• Stackdriver Monitoring (monitoring)
Google BigQuery makes complex data analysis simple
• Scales automatically
• No setup or administration
• Stream up to 100,000 rows per second
• Easily integrates with third-party software
Google BigQuery Performance Example
Running an inefficient regular expression over 100 billion rows in less than 60 seconds
Source: https://cloud.google.com/blog/big-data/2016/01/anatomy-of-a-bigquery-query
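To put the quoted BigQuery figures in perspective, the short calculation below turns them into rates. It uses only the numbers stated in the slides (100 billion rows in under 60 seconds, and streaming of up to 100,000 rows per second); the variable names are illustrative.

```python
# Figures quoted in the slides.
ROWS_SCANNED = 100_000_000_000      # 100 billion rows
MAX_QUERY_SECONDS = 60              # "less than 60 seconds"
STREAMING_ROWS_PER_SECOND = 100_000 # "up to 100,000 rows per second"
SECONDS_PER_DAY = 24 * 60 * 60

# Finishing in under 60 seconds means at least this scan rate.
min_rows_per_second = ROWS_SCANNED / MAX_QUERY_SECONDS

# Upper bound on rows streamed in one day at the quoted maximum rate.
streaming_rows_per_day = STREAMING_ROWS_PER_SECOND * SECONDS_PER_DAY

print(f"Scan rate: more than {min_rows_per_second:,.0f} rows per second")
print(f"Streaming: up to {streaming_rows_per_day:,} rows per day")
```

So the query example implies a scan rate above roughly 1.6 billion rows per second, and the streaming limit corresponds to at most 8.64 billion rows per day.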
Google BigQuery
The Power of Google Dremel for everyone
[Diagram: storage and compute connected by a terabit network, with fast ingest and query]
Before: 1000-core Hadoop cluster = 2.5 hours
After: making ad hoc queries with BigQuery < 5 min
● 500+ Games
● Hundreds of Analysts
● Terabytes of Data Daily
“Right at the start of the partnership we were able to reduce time to insight from 96 hours to 30 minutes by using BigQuery, allowing us to react in real time to customer needs and provide better service.”
Gary Sanders, head of the bank's digital analytics function
https://www.finextra.com/newsarticle/28566/lloyds-partners-google-on-data-analytics
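The two before/after comparisons quoted in the slides (2.5 hours on a 1000-core Hadoop cluster versus under 5 minutes with BigQuery, and 96 hours versus 30 minutes time to insight) reduce to simple ratios. The snippet below only restates those quoted numbers; the variable names are illustrative.

```python
MINUTES_PER_HOUR = 60

# Games analytics example: 2.5 hours on Hadoop vs. under 5 minutes on BigQuery.
hadoop_minutes = 2.5 * MINUTES_PER_HOUR
bigquery_max_minutes = 5
# BigQuery took LESS than 5 minutes, so this ratio is a lower bound.
games_min_speedup = hadoop_minutes / bigquery_max_minutes

# Bank example: time to insight went from 96 hours to 30 minutes.
bank_before_minutes = 96 * MINUTES_PER_HOUR
bank_after_minutes = 30
bank_speedup = bank_before_minutes / bank_after_minutes

print(f"Games example: more than {games_min_speedup:.0f}x faster")
print(f"Bank example: {bank_speedup:.0f}x faster")
```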
Big Data Challenges At Dyno
- Multi-TB data warehouse
- More than 100 GB of new raw data per day (structured and unstructured)
- 65 online data sources
- Unlimited offline data sources
- Facing data quality problems every day
- Moving from user information and behavior to user interest and intention
- Managing a high-performance, cost-effective system
JOIN THE FLIGHT - WE ARE HIRING
IO Extended 2017
Twitter: @phamptu
Email: tu@dyno.vn
Frontend Developer: goo.gl/EY8RvV
Backend Developer: goo.gl/BnmmK6
