FLINK IN ZALANDO’S WORLD OF MICROSERVICES
JAVIER LOPEZ
MIHAIL VIERU
12-09-2016
AGENDA
● Zalando’s Microservices Architecture
● Saiki - Data Integration and Distribution at Scale
● Flink in a Microservices World
● Stream Processing Use Cases:
o Business Process Monitoring
o Continuous ETL
● Future Work
ABOUT US
Mihail Vieru
Big Data Engineer,
Business Intelligence
Javier López
Big Data Engineer,
Business Intelligence
One of Europe's largest online fashion retailers
15 countries
~19 million active customers
~€3 billion revenue (2015)
1,500 brands
150,000+ products
11,000+ employees in Europe
ZALANDO TECHNOLOGY
1300+ TECHNOLOGISTS
Rapidly growing international team
http://tech.zalando.com
VINTAGE ARCHITECTURE
VINTAGE BUSINESS INTELLIGENCE
[Diagram: Classical ETL process from several application databases (each behind its own Business Logic) into a central Data Warehouse (DWH); roles shown: Dev, DBA, BI.]
DWH: Oracle, Exasol
RADICAL AGILITY
AUTONOMY
MASTERY
PURPOSE
RADICAL AGILITY - AUTONOMY
Technologies, Operations, Teams
SUPPORTING AUTONOMY: MICROSERVICES
[Diagram: five microservices, each consisting of its own Business Logic, Database and REST API.]
[Diagram: Team A and Team B each own a service (Business Logic + Database) exposed via a REST API; the services communicate over the public Internet.]
Applications communicate using REST APIs
Databases hidden behind the walls of AWS VPC
Classical ETL process is impossible!
[Diagram: App A, App B, App C and App D, each with its own Business Logic, Database and REST API, and Business Intelligence as a separate consumer.]
SAIKI
SAIKI DATA PLATFORM
[Diagram: Saiki connects the applications (App A, App B, App C, App D) with BI and the Data Warehouse.]
SAIKI — DATA INTEGRATION & DISTRIBUTION
[Diagram components: applications App A to App D, REST API, Saiki with stream processing via Apache Flink, Data Lake on AWS S3, Exporter; consumers: BI, Data Warehouse, other stores (e.g. Forecast DB).]
SAIKI — SUMMARY
[Graphic: BEFORE vs. AFTER comparison covering data sources, technologies (JDBC, REST), connections, extraction and data delivery.]
FLINK IN A
MICROSERVICES WORLD
OPPORTUNITIES FOR NEXT GEN BI
Cloud Computing
- Distributed ETL
- Scale
Access to Real-Time Data
- All teams publish data to a central event bus
Hub for Data Teams
- Data Lake provides distributed access and fine-grained security
- Data can be transformed (aggregated, joined, etc.) before being delivered to data teams
Semi-Structured Data
“General-purpose data processing engines like Flink or Spark let you define own data types and functions.”
- Fabian Hueske, data Artisans
THE RIGHT FIT
STREAM PROCESSING
THE RIGHT FIT — STREAM PROCESSING ENGINE
Candidates:
Storm and Samza were ruled out because we also needed batch processing.
SPARK VS. FLINK DIFFERENCES
Feature | Apache Spark 1.5.2 | Apache Flink 0.10.1
Processing mode | micro-batching | tuple at a time
Temporal processing support | processing time | event time, ingestion time, processing time
Latency | seconds | sub-second
Back pressure handling | manual configuration | implicit, through system architecture
State access | full state scan for each micro-batch | value lookup by key
Operator library | neutral | ++ (split, windowByCount, ...)
Support | neutral | ++ (mailing list, direct contact and support from data Artisans)
APACHE FLINK
• true stream processing framework
• process events at a consistently high rate with low latency
• scalable
• great community and on-site support from Berlin/Europe
• university graduates with Flink skills
https://tech.zalando.com/blog/apache-showdown-flink-vs.-spark/
FLINK ON AWS - OUR APPLIANCE
[Diagram: a MASTER ELB in front of two EC2 instances running Docker, one with the Flink Master and one with a Flink Shadow Master; a WORKERS ELB in front of several EC2 instances, each running a Flink Worker in Docker.]
USE CASES
BUSINESS PROCESS MONITORING
BUSINESS PROCESS
In its simplest form, a business process is a chain of correlated events, from a start event to a completion event:
ORDER_CREATED (start event) → ALL_PARCELS_SHIPPED (completion event)
Business events from the whole Zalando platform flow through Saiki, which gives us the opportunity to process those streams in near real time.
REAL-TIME BUSINESS PROCESS MONITORING
• Check whether business processes in the Zalando platform work as expected
• Analyze data on the fly:
o Order velocities
o Delivery velocities
o Monitor SLAs of correlated events, e.g. whether a parcel is sent out in time after an order
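To make the SLA idea concrete, here is a minimal plain-Python sketch (not Flink API code) that correlates ORDER_CREATED with the first PARCEL_SHIPPED event of the same order and flags SLA breaches. The event fields `order_id` and `ts` (event time in seconds) are assumptions for illustration; in a Flink job, the `created` dictionary would roughly correspond to keyed state on a stream keyed by order ID.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    event_type: str  # e.g. "ORDER_CREATED" or "PARCEL_SHIPPED"
    order_id: str    # hypothetical correlation key
    ts: int          # event time in seconds (assumed unit)

def sla_violations(events, sla_seconds):
    """Return IDs of orders whose first PARCEL_SHIPPED came more than sla_seconds
    after ORDER_CREATED, or that are still unshipped past the SLA at the
    latest event time seen."""
    created = {}        # order_id -> ORDER_CREATED time (per-key state)
    violations = set()
    latest = 0          # latest event time seen so far
    for e in sorted(events, key=lambda ev: ev.ts):
        latest = e.ts
        if e.event_type == "ORDER_CREATED":
            created[e.order_id] = e.ts
        elif e.event_type == "PARCEL_SHIPPED" and e.order_id in created:
            start = created.pop(e.order_id)  # later parcels of this order are ignored
            if e.ts - start > sla_seconds:
                violations.add(e.order_id)
    # Open orders that have already exceeded the SLA
    for order_id, start in created.items():
        if latest - start > sla_seconds:
            violations.add(order_id)
    return violations
```

Sorting the whole input only works for a finite batch; a streaming job would instead rely on event-time watermarks and timers to decide when an open order has breached its SLA.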
ARCHITECTURE BPM
[Diagram: Saiki BPM. Components: Operational Systems (App A, App B, App C), Nakadi Event Bus, Kafka2Kafka, Unified Log, Stream Processing, Cfg Service, Elasticsearch, Alert Svc and UI; OAuth at the public Internet boundary.]
HOW WE USE FLINK IN BPM
• 1000+ event types; one Kafka topic per event type
• Analyze processes with correlated event types (Join & Union)
• Enrich data based on business rules
• Sliding windows (1 min to 48 h) for platform snapshots
• State for alert metadata
• Generation and processing of complex events (CEP lib)
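The sliding windows can be illustrated with a small plain-Python sketch that counts events (e.g. orders, for order velocity) per window. Each timestamp is assigned to every overlapping window of length `size` whose start is a multiple of `slide`, similar in spirit to how Flink assigns elements to sliding event-time windows; the function name and the use of seconds as the unit are assumptions. A range of 1 min to 48 h would correspond to `size` values from 60 to 172800.

```python
from collections import Counter

def sliding_window_counts(timestamps, size, slide):
    """Count events per sliding window; keys are window start times.

    An event at time t belongs to every window [start, start + size)
    where start is a multiple of slide and start <= t < start + size.
    """
    counts = Counter()
    for t in timestamps:
        start = t - (t % slide)      # latest window start covering t
        while start > t - size:      # walk back through overlapping windows
            counts[start] += 1
            start -= slide
    return dict(counts)
```

With `size=60` and `slide=30`, every event falls into two overlapping windows, so each window count reflects the last minute of activity, refreshed every 30 seconds.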
STREAMING ETL
Extract Transform Load (ETL)
Traditional ETL process:
• Batch processing
• No real-time processing
• ETL tools
• Heavy processing on the storage side
WHAT CHANGED WITH RADICAL AGILITY?
• Data comes in a semi-structured format (JSON payload)
• Data is distributed across separate Kafka topics
• There are peak times during which the data flow increases severalfold
• The number of data sources increased severalfold
ARCHITECTURE STREAMING ETL
[Diagram: Saiki Streaming ETL. Components: Operational Systems (App A, App B, App C), Nakadi Event Bus, Kafka2Kafka, Unified Log, Stream Processing, Exporter, Importer, Oracle DWH.]
HOW WE (WOULD) USE FLINK IN STREAMING ETL
• Transformation of complex payloads into simple ones for easier consumption in the Oracle DWH
• Combine several topics based on business rules (Union, Join)
• Pre-aggregate data to speed up report generation (Windows, State)
• Data cleansing
• Data validation
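A minimal sketch of the transformation, cleansing and validation steps for a single event, written in plain Python rather than as a Flink job: a nested JSON payload is flattened into one row with dotted column names. The required columns (`order_number`, `metadata.occurred_at`) and the cleansing rule (trim strings, map empty strings to null) are hypothetical examples, not Zalando's actual business rules.

```python
import json

REQUIRED_COLUMNS = ("order_number", "metadata.occurred_at")  # hypothetical

def flatten(payload, prefix=""):
    """Turn nested objects into one level with dotted keys; lists are kept as values."""
    flat = {}
    for key, value in payload.items():
        name = prefix + key
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat

def clean(value):
    """Cleansing rule (assumed): trim strings and map empty strings to None."""
    if isinstance(value, str):
        return value.strip() or None
    return value

def to_dwh_row(raw_json):
    """Parse, flatten, cleanse and validate one event; return None if it is invalid."""
    try:
        payload = json.loads(raw_json)
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict):
        return None
    row = {column: clean(value) for column, value in flatten(payload).items()}
    # Validation: every required column must be present and non-null
    if any(row.get(column) is None for column in REQUIRED_COLUMNS):
        return None
    return row
```

In a Flink job, logic like `to_dwh_row` would typically sit in a map or flatMap step, with invalid records dropped or routed to a separate output for inspection.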
FUTURE USE CASES
COMPLEX EVENT PROCESSING FOR BPM
Continuing the example business process:
• An order can produce multiple PARCEL_SHIPPED events
• Generate the complex event ALL_PARCELS_SHIPPED once all PARCEL_SHIPPED events for the order have been received (CEP lib, State)
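The ALL_PARCELS_SHIPPED logic can be sketched with per-order state in plain Python (this is not the Flink CEP library API). It assumes, hypothetically, that every PARCEL_SHIPPED event carries a `parcel_id` and the total number of parcels for its order (`parcels_total`); storing parcel IDs in a set makes duplicate events harmless.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ParcelShipped:
    order_id: str
    parcel_id: str
    parcels_total: int  # hypothetical: number of parcels the order was split into
    ts: int             # event time

class AllParcelsShippedDetector:
    """Emits an ALL_PARCELS_SHIPPED tuple once every parcel of an order has shipped."""

    def __init__(self):
        self._shipped = {}  # per-order state: order_id -> set of shipped parcel_ids

    def process(self, event):
        parcels = self._shipped.setdefault(event.order_id, set())
        parcels.add(event.parcel_id)  # duplicates of the same parcel do not count twice
        if len(parcels) >= event.parcels_total:
            del self._shipped[event.order_id]  # order complete: clear its state
            return ("ALL_PARCELS_SHIPPED", event.order_id, event.ts)
        return None
```

A duplicate PARCEL_SHIPPED arriving after the state has been cleared would start a new entry (and, for single-parcel orders, re-emit the complex event); a real job would also need a timeout, e.g. a timer, to clean up orders whose last parcel never arrives.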
DEPLOYMENTS FROM OTHER BI TEAMS
Flink Jobs from other BI Teams
Requirements:
• manage and control deployments
• isolation of data flows
o prevent different jobs from writing to the same sink
• resource management in Flink
o share cluster resources among concurrently running jobs
StreamSQL would significantly lower the entry barrier
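One way to enforce the isolation requirement is a deployment-time check that refuses to start a job whose sinks are already owned by another job. The sketch below is a hypothetical helper (`SinkRegistry` and the sink naming scheme are made up for illustration), not a Flink feature.

```python
class SinkRegistry:
    """Hypothetical deployment guard: every sink may be written by at most one job."""

    def __init__(self):
        self._owners = {}  # sink identifier -> owning job id

    def claim(self, job_id, sinks):
        """Register all sinks for job_id, or raise without registering any of them."""
        conflicts = sorted(s for s in sinks
                           if self._owners.get(s, job_id) != job_id)
        if conflicts:
            raise ValueError(f"job {job_id!r} would write to sinks owned by other jobs: {conflicts}")
        for s in sinks:
            self._owners[s] = job_id

    def release(self, job_id):
        """Free all sinks owned by job_id, e.g. when the job is cancelled."""
        self._owners = {s: j for s, j in self._owners.items() if j != job_id}
```

Checking all sinks before registering any of them keeps a rejected deployment from leaving partial claims behind.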
REPLACE KAFKA2KAFKA COMPONENT
• Python app
• extracts events from the Nakadi Event Bus REST API
• writes them to our Kafka cluster
Idea: create a Nakadi consumer/producer for Flink to make stream processing with Flink available to other internal users (first POC done)
OTHER FUTURE TOPICS
• New use cases for Real-Time Analytics/BI
o Sales monitoring
o Price monitoring
• Fraud detection for payments (evaluation)
• Contacting customers based on variable event patterns (evaluation)
CONCLUSION
Flink proved to be the right fit for our current stream processing use cases. It enables us to build Zalando’s Next Gen BI platform.
https://tech.zalando.de/blog/?tags=Saiki
THANK YOU

Javier Lopez, Mihail Vieru - Flink in Zalando's World of Microservices - Flink Forward