Zhang Yan 2014.02
Umeng Analytical Arch
Outline
● what are we doing
● how are we doing
● real-time system
● batch system
● lambda arch
● future & challenge
What are we doing
● founded in April 2010
● focus on mobile app analytics (and share)
● 100k apps, 600 million devices
● 1 PB of data, growing by 1 TB per day
● 1 billion messages per day
● real-time: 15 nodes, 60k QPS (data skew)
● batch: 100 nodes, 500+ jobs
How are we doing
[Architecture diagram; the boxes shown are listed below]
● SDK (×3)
● nginx (×3), finagle log (×2)
● Real-time Processing System (Thunder): Kafka, Storm, MongoDB
● Batch Processing System (Iceberg): torrent, HDFS, MapReduce Jobs, HBase, kvproxy
● Front End & Web
Real-time system
[Diagram: evolution of the Real-time Processing System (Thunder)]
● evolved from: nginx (×3), ruby server (×2), Resque, ruby worker, MongoDB
● to: nginx (×3), finagle server (×2), Kafka, Storm, MongoDB
Real-time system(cont.)
● technology we use
○ java, scala, ruby
○ nginx, resque, mongodb
○ finagle, kafka, storm
● challenge we face
○ data skew
○ mongodb
■ the global lock is bad.
■ the table lock is better, but still not good enough.
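The data skew challenge above can be illustrated with key salting, one common mitigation. This is a minimal sketch, not the deck's actual implementation: the function names, the number of salts, and the idea of counting events per app id are all assumptions made for illustration. A hot key is split across several sub-keys so that no single worker receives all of its events, and the partial counts are merged in a second step.

```python
import random
from collections import Counter


def partial_counts(app_ids, num_salts, seed=0):
    """Stage 1: count events under (app_id, salt) so a hot app is spread out.

    `num_salts` and `seed` are illustrative parameters, not values from the deck.
    """
    rng = random.Random(seed)
    partial = Counter()
    for app_id in app_ids:
        # Each event of the same app lands on one of `num_salts` sub-keys.
        partial[(app_id, rng.randrange(num_salts))] += 1
    return partial


def merge_partials(partial):
    """Stage 2: drop the salt and sum the partial counts per app."""
    total = Counter()
    for (app_id, _salt), n in partial.items():
        total[app_id] += n
    return total


if __name__ == "__main__":
    events = ["hot-app"] * 1000 + ["small-app"] * 10
    partial = partial_counts(events, num_salts=8)
    print(merge_partials(partial))
```

The trade-off is an extra merge step in exchange for a more even load on the first stage.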
Batch system
[Diagram: Batch Processing System (Iceberg), as it is and as we wish it could be]
● as it is: finagle log (×2), torrent, HDFS, MapReduce Jobs, HBase, kvproxy
● but we wish it could be: Kafka, Kafka Mirror, HDFS, MapReduce Jobs, HBase, kvproxy
Batch system(cont.)
● technology we use
○ java, scala, python, c/c++
○ hadoop, hdfs, hbase, pig, hive, opentsdb
● challenge we face
○ data processing in a uniform way
○ optimize hbase usage (hash-prefixed keys, bulk load)
○ large amount of precomputation (HyperLogLog)
○ debugging, deployment, measurement etc.
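The HyperLogLog bullet above refers to a probabilistic distinct-count sketch. Below is a minimal, simplified version, assuming that the precomputation is about counting distinct devices; the class name, the precision `p=12`, and the use of SHA-1 as the hash are choices made for this example, not details from the deck. Sketches can be merged by taking the register-wise maximum, which is what makes precomputed per-day sketches combinable into longer ranges.

```python
import hashlib
import math


class HyperLogLog:
    """Simplified HyperLogLog with 2**p registers (illustrative, p >= 7)."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        x = int.from_bytes(digest[:8], "big")      # 64-bit hash value
        idx = x >> (64 - self.p)                   # first p bits pick a register
        rest = x & ((1 << (64 - self.p)) - 1)      # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other):
        """Union of two sketches: register-wise maximum."""
        if self.p != other.p:
            raise ValueError("precision mismatch")
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def count(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)           # bias constant for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            return m * math.log(m / zeros)         # small-range (linear counting)
        return raw


if __name__ == "__main__":
    day1, day2 = HyperLogLog(), HyperLogLog()
    for i in range(20000):
        day1.add(f"device-{i}")
    for i in range(10000, 30000):
        day2.add(f"device-{i}")
    day1.merge(day2)
    print(round(day1.count()))  # approximate number of distinct devices
```

With 4096 registers the expected relative error is roughly 1.6%, at a cost of a few kilobytes per sketch.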
Lambda arch
● merge real-time and batch
● the two complement each other
● real-time
○ fast, imprecise, narrow scope
● batch
○ slow, accurate, wide scope
Lambda Arch(cont.)
[Diagram; the boxes shown are listed below]
● Real-time (Thunder) data: MongoDB
● Batch (Iceberg) data: HBase, kvproxy
● Index Selector (not implemented yet)
● Front End & Web
e.g.
1. DAU and installations over the last 5 minutes
2. DAU and installations over the last month
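The two example queries suggest how an index selector might route requests. The deck marks this component as not implemented yet, so the following is only a hypothetical sketch: the `Store` enum, the `select_store` function, and the one-hour cutoff are assumptions made for illustration. Short, recent windows go to the real-time store and longer windows go to the batch store.

```python
from datetime import timedelta
from enum import Enum


class Store(Enum):
    REALTIME = "MongoDB (Thunder)"        # fast, imprecise, narrow scope
    BATCH = "HBase via kvproxy (Iceberg)"  # slow, accurate, wide scope


# Assumed cutoff: windows up to this length are served from the real-time store.
REALTIME_HORIZON = timedelta(hours=1)


def select_store(window, realtime_horizon=REALTIME_HORIZON):
    """Pick the store for a query covering the most recent `window` of time."""
    if window <= timedelta(0):
        raise ValueError("window must be positive")
    return Store.REALTIME if window <= realtime_horizon else Store.BATCH


if __name__ == "__main__":
    print(select_store(timedelta(minutes=5)).value)
    print(select_store(timedelta(days=30)).value)
```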
Future & Challenge
● just-in-time computation
○ druid, impala, spark etc.
● unify real-time & batch processing in one piece of code.
● data availability(internal & external)
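One way to picture the "one piece of code" goal is to write the aggregation once as a function over a chunk of events plus an associative merge, and then reuse it in both modes. This is a toy sketch under assumed names and an assumed event shape (`app` and `type` fields); it does not describe any particular framework.

```python
from collections import Counter
from functools import reduce


def count_installs(events):
    """Shared logic: per-app install counts for one chunk of events."""
    return Counter(e["app"] for e in events if e["type"] == "install")


def merge(left, right):
    """Associative merge of two partial results."""
    return left + right


def batch_job(all_events):
    """Batch mode: run the shared logic over the full data set at once."""
    return count_installs(all_events)


def streaming_job(micro_batches):
    """Real-time mode: run the same logic per micro-batch and fold the results."""
    return reduce(merge, (count_installs(b) for b in micro_batches), Counter())


if __name__ == "__main__":
    events = [
        {"app": "a", "type": "install"},
        {"app": "a", "type": "launch"},
        {"app": "b", "type": "install"},
        {"app": "a", "type": "install"},
    ]
    print(batch_job(events) == streaming_job([events[:2], events[2:]]))
```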

