Introduction Old Architecture New Architecture Decoupling Streaming Conclusion
● Introduction
● Old Architecture
● New Architecture
● Decoupling
● Streaming
● Conclusion
● Legacy Java Process
○ “Crunches” data
○ Sends data downstream to our own datastores and to third-party analytics
○ Runs every hour
● Growth
○ Process can run for more than an hour, overrunning its hourly schedule
○ Heap grew from 12 GB to 24 GB in less than 1 year
○ Cron is a horrible job management system
○ A failure requires rerunning a job from the beginning
● 2.0
○ Horizontally scalable
○ Real-time ETL
○ Reusable
ETL @ Vungle
● ~1 Billion Events / Day
● Deduplication
● Calculating $$$
● Outputting data to various destinations
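The slide lists deduplication without showing how it was done. For a sense of scale, ~1 billion events/day averages out to about 11,600 events per second (10^9 / 86,400 s ≈ 11,574). Below is a minimal first-seen-wins filter, assuming each event carries a unique identifier under a hypothetical `event_id` key; it is an illustration, not Vungle's implementation.

```python
def deduplicate(events, key="event_id"):
    """Yield each event the first time its key is seen; silently drop repeats.

    Assumption: every event is a dict with a unique ID under `key`
    (the name "event_id" is hypothetical).
    """
    seen = set()  # IDs already emitted
    for event in events:
        event_id = event[key]
        if event_id in seen:
            continue  # duplicate delivery of an event we already passed on
        seen.add(event_id)
        yield event


events = [
    {"event_id": "a", "type": "view"},
    {"event_id": "b", "type": "install"},
    {"event_id": "a", "type": "view"},  # duplicate of the first event
]
unique = list(deduplicate(events))
```

At this volume an unbounded in-memory set would keep growing, so a production job would more likely bound it by time window or rely on a keyed store; the Event ID column in the ingestion table schema suggests keying by event ID.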
Old Architecture
New Architecture
Decoupling
Set up connections and Spark streams
Map each log line into Mongo objects and insert them into MongoDB
Set up connections and Spark streams
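Spark Streaming's DStream API processes a live stream as a series of small batches, one per fixed batch interval that is chosen when the `StreamingContext` is created. The slide does not show Vungle's source or interval, so this sketch only simulates that batching in plain Python: it buckets hypothetical `(arrival_seconds, line)` pairs into interval-sized windows. It is a conceptual model, not Spark code.

```python
from collections import defaultdict


def micro_batches(records, batch_interval_s):
    """Group (arrival_seconds, line) pairs into one batch per interval.

    Mimics how Spark Streaming cuts a stream into fixed-length batches
    by arrival time. Keys are the start time of each batch window.
    """
    if batch_interval_s <= 0:
        raise ValueError("batch interval must be positive")
    batches = defaultdict(list)
    for arrival_s, line in records:
        # Start of the window this record falls into.
        batch_start = int(arrival_s // batch_interval_s) * batch_interval_s
        batches[batch_start].append(line)
    # Return windows in time order, as a streaming engine would process them.
    return dict(sorted(batches.items()))


stream = [(0.5, "a"), (1.9, "b"), (2.0, "c"), (5.1, "d")]
windows = micro_batches(stream, batch_interval_s=2)
```

Each window here corresponds to one batch the streaming job would map and write out; empty windows are simply absent in this simulation, whereas Spark still schedules a (possibly empty) batch every interval.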
Mapping to Mongo objects and insertions
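The mapping step can be sketched as a pure function from one log line to one document. This assumes JSON log lines (the Next Steps slide mentions moving away from JSON) and hypothetical field names such as `event_id`; the actual log format is not shown. Using the event ID as MongoDB's `_id` means the unique `_id` index rejects a second insert of the same event.

```python
import json


def to_mongo_doc(line):
    """Parse one JSON log line into a Mongo-style document, or None if unusable.

    Assumption: each line is a JSON object with a hypothetical "event_id" field.
    """
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return None  # malformed line: skip rather than fail the whole batch
    if not isinstance(record, dict) or "event_id" not in record:
        return None
    doc = {k: v for k, v in record.items() if k != "event_id"}
    doc["_id"] = record["event_id"]  # event ID doubles as the primary key
    return doc


def to_mongo_docs(lines):
    """Map a batch of log lines to documents, dropping lines that fail to parse."""
    return [doc for doc in map(to_mongo_doc, lines) if doc is not None]


batch = [
    '{"event_id": "e1", "type": "view", "ts": 1}',
    "not json",
    '{"type": "install"}',
]
docs = to_mongo_docs(batch)
```

With PyMongo, such a list could be written with `Collection.insert_many(docs, ordered=False)`, which keeps inserting past a duplicate-key error and reports the failures afterwards.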
Questions
Streaming
Ingestion
Ingestion Table Schema
Event ID | Request | View | Install | ... | Request Added | View Added | Install Added | Value
Fact Table Schema
... | Date | Time | Deliveries | Views | Installs | Processed Deliveries | Processed Views | Processed Installs
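A fact table like this is typically filled by rolling raw events up into time buckets. The sketch below assumes hourly UTC buckets for Date/Time, treats delivery, view, and install as event kinds, and reads each "Processed ..." column as the count of those events already handled downstream. None of these semantics are stated on the slide; the names `COLUMNS` and `roll_up` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical mapping from event kind to its fact-table count column.
COLUMNS = {"delivery": "deliveries", "view": "views", "install": "installs"}


def _empty_row():
    """One fact row with every count column and its 'processed_' twin at zero."""
    row = {}
    for column in COLUMNS.values():
        row[column] = 0
        row["processed_" + column] = 0
    return row


def roll_up(events):
    """Aggregate (epoch_seconds, kind, processed) tuples into rows keyed by (UTC date, hour)."""
    facts = defaultdict(_empty_row)
    for epoch_s, kind, processed in events:
        if kind not in COLUMNS:
            raise ValueError(f"unknown event kind: {kind!r}")
        ts = datetime.fromtimestamp(epoch_s, tz=timezone.utc)
        row = facts[(ts.date().isoformat(), ts.hour)]
        column = COLUMNS[kind]
        row[column] += 1
        if processed:
            row["processed_" + column] += 1
    return dict(facts)


facts = roll_up([
    (0, "delivery", True),
    (10, "view", False),
    (3599, "delivery", False),
    (3600, "install", True),
])
```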
Ingestion
Process
Next Steps
● Switching from JSON to Protocol Buffers (protobuf)
● Using YARN to run multiple jobs on one cluster
● Data Science
● Who knows?
Questions
Thank you!
Using Spark at Vungle