(Big) Data Driven at Eway
Tu Pham - CTO @ Eway
Journey to Google Cloud
-- Ha Noi - 03/2019 --
About Me
- CTO at Eway JSC
- Google Developer Expert on Cloud Platform
- Open source contributor, blogger, father
- 8 years of experience in Big Data and Cloud Computing
Big Data Driven At Eway
44M dollars in transaction value in 2018
> 5M transactions
When You Have (Big) Data
How Do We Use This Data?
Use Cases
- Reporting
- Business Analytics
- Operational Analytics
- Product Features
- System Monitoring
Reporting
- Reporting to:
  - Partners
  - Advertisers
  - Publishers
Business Analytics
- Analyzing:
  - Growth
  - User behavior
  - Sign-up funnels
  - Sign-up referrals
  - ...
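A sign-up funnel can be derived from raw events by counting how many users reach each step. The sketch below is only an illustration: the step names (`visit`, `signup_form`, `signup_done`) and the `(user_id, step)` event shape are assumptions, not Eway's actual schema.

```python
from collections import defaultdict

# Hypothetical funnel steps, in order; these names are not from the slides.
FUNNEL_STEPS = ["visit", "signup_form", "signup_done"]


def funnel_counts(events, steps=FUNNEL_STEPS):
    """Count users who reached each step and every step before it.

    `events` is an iterable of (user_id, step_name) pairs.
    """
    seen = defaultdict(set)  # user_id -> set of steps that user triggered
    for user_id, step in events:
        seen[user_id].add(step)

    counts = {}
    for i, step in enumerate(steps):
        required = set(steps[: i + 1])
        counts[step] = sum(1 for user_steps in seen.values() if required <= user_steps)
    return counts


def conversion_rates(counts, steps=FUNNEL_STEPS):
    """Step-to-step conversion; 0.0 when the previous step had no users."""
    rates = {}
    for prev, cur in zip(steps, steps[1:]):
        rates[f"{prev}->{cur}"] = counts[cur] / counts[prev] if counts[prev] else 0.0
    return rates
```

A user who skips an earlier step is not counted at later steps, which keeps the funnel monotonically shrinking.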
Operational Analytics
- Analyzing:
  - Root cause analysis
  - Latency analysis
  - Error analysis
- Improving:
  - Threshold alerts
  - Security alerts
  - Capacity planning (server, bandwidth)
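Latency analysis and threshold alerts often come down to comparing a percentile against a limit. This is a minimal sketch using only the Python standard library; the 95th percentile and the 500 ms threshold are assumed values for illustration, not figures from the talk.

```python
import statistics


def latency_percentile(latencies_ms, pct):
    """Return the pct-th percentile (1..99) of a list of latencies in ms."""
    if len(latencies_ms) < 2:
        raise ValueError("need at least two samples")
    if not 1 <= pct <= 99:
        raise ValueError("pct must be between 1 and 99")
    # quantiles(n=100) returns 99 cut points; index pct - 1 is the pct-th percentile.
    return statistics.quantiles(latencies_ms, n=100, method="inclusive")[pct - 1]


def check_threshold(latencies_ms, pct=95, threshold_ms=500.0):
    """Report the percentile value and whether it exceeds the (assumed) threshold."""
    value = latency_percentile(latencies_ms, pct)
    return {"percentile": pct, "value_ms": value, "alert": value > threshold_ms}
```

Alerting on a high percentile rather than the mean keeps a few slow requests from being averaged away.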
Product Features
- Top Products
- Adflex publisher challenge
- Sign-up referrals
- A/B testing
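For A/B testing, each user should land in the same variant on every visit. One common way to do this, shown here as a sketch and not as the approach used at Eway, is to hash the user ID together with an experiment name. The experiment name and variant labels below are hypothetical.

```python
import hashlib


def assign_variant(user_id, experiment, variants=("A", "B")):
    """Deterministically map a user to a variant for one experiment."""
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the hash as an integer bucket.
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


if __name__ == "__main__":
    # "signup_button" is a made-up experiment name.
    print(assign_variant("user-42", "signup_button"))
```

Including the experiment name in the hash means a user's bucket in one experiment does not determine their bucket in another.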
Sample: End-To-End Flow For Mining User Behavior
How do we collect this data?
Step 1: GC Compute Engine Instances Collect Raw Data
- Technology: Cloud Load Balancing, Compute Engine
- Why Cloud Load Balancing:
  - TCP/UDP Load Balancing
  - Seamless Autoscaling
  - Scalable
- Why Compute Engine:
  - High-Performance
  - Scalable
  - Low Cost
  - Fast Networking
  - Custom Machine Types
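The slides do not say how each instance stores the raw data it receives. As one possible sketch, an instance behind the load balancer could append every event as a JSON line to a local file that rolls over each hour. The directory layout, file names, and the `received_at` field are all assumptions made for this example.

```python
import json
import os
from datetime import datetime, timezone


def raw_log_path(base_dir, ts):
    """One file per UTC hour, e.g. <base_dir>/2019-03-01/events-13.jsonl."""
    return os.path.join(base_dir, ts.strftime("%Y-%m-%d"), ts.strftime("events-%H.jsonl"))


def append_event(base_dir, event, ts=None):
    """Append one event as a JSON line and return the file it was written to."""
    ts = ts or datetime.now(timezone.utc)
    path = raw_log_path(base_dir, ts)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    # "received_at" is a hypothetical field added by the collector.
    record = {"received_at": ts.isoformat(), **event}
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, sort_keys=True) + "\n")
    return path
```

Hourly files give the later conversion step a natural unit of work: once an hour has passed, its file is complete and can be processed.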
How do we process this data?
Step 2: GC Compute Engine Instances Convert Raw Data To Apache Parquet Files
- Technology: Compute Engine, Parquet file format
- Why Parquet:
  - Self-describing, columnar storage format
  - Language-independent
  - High query performance
  - Spark SQL is much faster with Parquet (it reads only the columns a query needs)
  - High compression (up to 70%), so less disk I/O
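To see why a columnar layout tends to compress well, it helps to pivot a few records by hand. The sketch below does not write Parquet; it only illustrates the row-to-column idea with the standard library and compares compressed sizes using zlib. Real Parquet files would be produced with a Parquet library, and the helper names here are made up for this example.

```python
import json
import zlib


def rows_to_columns(rows):
    """Pivot a list of dict records into one list of values per field."""
    fields = sorted({key for row in rows for key in row})
    # Missing fields become None so every column has one value per row.
    return {field: [row.get(field) for row in rows] for field in fields}


def compressed_sizes(rows):
    """Return (row_oriented_bytes, column_oriented_bytes) after zlib compression."""
    row_blob = "\n".join(json.dumps(r, sort_keys=True) for r in rows).encode("utf-8")
    col_blob = json.dumps(rows_to_columns(rows), sort_keys=True).encode("utf-8")
    return len(zlib.compress(row_blob)), len(zlib.compress(col_blob))


if __name__ == "__main__":
    # Made-up sample records.
    sample = [{"user_id": f"u{i % 50}", "action": "view", "ms": i % 7} for i in range(1000)]
    print(compressed_sizes(sample))
```

Storing similar values next to each other is what lets column-level encodings and compression work well; the actual ratio depends on the data.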
Step 3: GC Compute Engine Uploads Parquet Files To GC Cloud Storage
- Technology: Compute Engine, Parquet file format, Cloud Storage
- Why Cloud Storage:
  - Four storage classes
  - Easy to integrate
  - Object Lifecycle Management
  - Fast Networking
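Two details usually need deciding before the upload: how objects are named and how old objects are aged out. The sketch below shows one possible convention, assuming date-partitioned object names and a lifecycle policy that moves objects to a colder storage class and later deletes them. The bucket name, dataset name, and day counts are hypothetical. The lifecycle dictionary mirrors the JSON rule layout used by Cloud Storage lifecycle configurations and should be checked against the current documentation before use.

```python
from datetime import date


def parquet_object_name(dataset, day, part):
    """Build a date-partitioned name, e.g. events/dt=2019-03-01/part-00000.parquet."""
    return f"{dataset}/dt={day.isoformat()}/part-{part:05d}.parquet"


def gcs_uri(bucket, object_name):
    """Full gs:// URI for an object in a bucket."""
    return f"gs://{bucket}/{object_name}"


def lifecycle_policy(nearline_after_days=30, delete_after_days=365):
    """Lifecycle rules as a dict; the day counts are assumed example values."""
    return {
        "lifecycle": {
            "rule": [
                {
                    "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
                    "condition": {"age": nearline_after_days},
                },
                {
                    "action": {"type": "Delete"},
                    "condition": {"age": delete_after_days},
                },
            ]
        }
    }


if __name__ == "__main__":
    # "example-bucket" and "events" are made-up names.
    name = parquet_object_name("events", date(2019, 3, 1), 0)
    print(gcs_uri("example-bucket", name))
```

Partitioning object names by date lets later queries read only the days they need.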
How do we visualize this data?
Step 4a: Explore Dataset Using GC Datalab
- Technology: Cloud Datalab
- Why Datalab:
  - Integrated with: Cloud BigQuery, Cloud Machine Learning Engine, Cloud Storage, and Stackdriver Monitoring
  - IPython Support & Notebook Format
  - Interactive Data Visualization
  - Multi-Language Support: Python, SQL, and JavaScript (for BigQuery user-defined functions)
Step 4b: Explore Dataset Using BI Tools
- Technology: Grafana, Power BI
- Why:
  - Support for more than 40 data sources (files, databases, log streams, Zabbix, Google Analytics, Google Calendar, AWS CloudWatch, Jira, ...)
  - Query, visualize, alert on, and understand your metrics
  - Create, explore, and share dashboards with your team
Become a Geek
Where Are The AI / ML?
Create Your Principles
Principles:
- KISS (Keep it simple, stupid)
- DRY (Don’t Repeat Yourself)
- Single Responsibility
- Low Cost
- Scalable
Tips: Be 1% better every day
- Create your system principles
- Design the system architecture, data flow, data model, and data structures first
- Separate realtime and batch flows
- Separate data storage strategies by data type
- Reduce network, instance, and storage costs with a metric monitoring and alerting system
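The tip about separating realtime and batch flows can be sketched as a small router. Every event is buffered for batch processing, and only selected event types are also pushed to a realtime queue. The event types, batch size, and class name are assumptions for this example, not details from the talk.

```python
from collections import deque

# Hypothetical event types that need immediate handling.
REALTIME_TYPES = {"error", "security"}


class EventRouter:
    """Send every event to the batch flow and urgent ones to the realtime flow too."""

    def __init__(self, realtime_types=REALTIME_TYPES, batch_size=3):
        self.realtime_types = set(realtime_types)
        self.batch_size = batch_size
        self.realtime = deque()  # consumed immediately, e.g. by an alerting worker
        self.batches = []        # completed batches, e.g. ready for file conversion
        self._buffer = []

    def route(self, event):
        if event.get("type") in self.realtime_types:
            self.realtime.append(event)
        self._buffer.append(event)
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        """Close the current batch, if it holds any events."""
        if self._buffer:
            self.batches.append(self._buffer)
            self._buffer = []
```

Keeping the two paths apart means a slow batch job cannot delay alerts, and the realtime path stays small enough to keep cheap.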
We Are Hiring
● Product Owner
● Backend Java Developer
● Full Stack PHP Developer
● Full Stack Python Developer
● DevOps Engineer
Thank You - Q&A
● Eway: https://eway.vn
● My Contact: tupp@eway.vn
