Flink at Alibaba
Haitao Wang
About me
• 2017.3 – present: Alibaba Compute Platform
• 2012 – 2017.3: Microsoft China
  • Spark as a Service, with a focus on streaming
  • Skype for Business data
• 2000 – 2012: Microsoft US
  • SQL Server engine
  • Speech API & Speech Server
About Alibaba
About Alibaba Data
• EBs of data in total
• PBs of new data every day
• Hundreds of millions of events per second
• Trillions of events per day
Why Streaming?
[Figure: pipeline from the Web Tier and DB Tier through MQ and DataHub into a Data Pipeline that feeds HBase and a Dashboard]
• Lots of events
• Exactly-once
• Sub-second latency
• Highly available
Challenges of Data Infra at Alibaba
• Lots of jobs
• Tons of data
• Exactly-once
• Complex logic
• Thousands of machines
• Strict SLA
• Low latency
• High throughput
Introducing Alibaba Blink
Apache Flink + Alibaba’s improvements = Alibaba Blink
Blink numbers
Unprecedented scale on 2017-11-11:
• 472M events per second
• Tens of milliseconds of latency
• Accurate results
Improvements in Blink Runtime
• Async I/O
• Incremental checkpoints
• Process & deployment
• Metrics
Why SQL?
• Declarative
• Optimizable
• Understandable
• Stable
• Unifies batch and stream processing
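Dynamic Table
[Figure: in Blink, both stream data and batch data are viewed as a Dynamic Table; a Continuous Query turns an input Dynamic Table into an output Dynamic Table, which is written out as stream data or batch data and executed as a Stream Job or a Batch Job]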
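A continuous query over a dynamic table produces an updating result, which can be emitted as a changelog stream. The following is a minimal sketch in plain Python, not the Flink or Blink API: it evaluates a grouped count continuously and emits a retraction ("-") of the old result row before inserting ("+") the updated one. The function names and the `(kind, key, count)` changelog format are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Change = Tuple[str, str, int]  # (kind, key, count); kind "+" = insert, "-" = retract

def continuous_count(events: Iterable[str]) -> List[Change]:
    """Continuously evaluate SELECT key, COUNT(*) FROM events GROUP BY key.

    Every input row updates the result table; a changed row is emitted as a
    retraction of the old row followed by an insertion of the new one.
    """
    counts: Dict[str, int] = defaultdict(int)
    changelog: List[Change] = []
    for key in events:
        old = counts[key]
        if old > 0:
            changelog.append(("-", key, old))   # retract the stale result row
        counts[key] = old + 1
        changelog.append(("+", key, old + 1))   # insert the updated result row
    return changelog

def materialize(changelog: Iterable[Change]) -> Dict[str, int]:
    """Apply a changelog to rebuild the result table as a snapshot."""
    table: Dict[str, int] = {}
    for kind, key, count in changelog:
        if kind == "+":
            table[key] = count
        elif table.get(key) == count:
            del table[key]                      # a retraction removes the matching row
        else:
            raise ValueError(f"retraction does not match current row for {key!r}")
    return table
```

For the input `["a", "b", "a"]`, `continuous_count` emits `[("+", "a", 1), ("+", "b", 1), ("-", "a", 1), ("+", "a", 2)]`, and replaying that changelog with `materialize` gives the same table a one-shot batch count would: `a` maps to 2 and `b` to 1. That equivalence is what lets one query run as either a stream job or a batch job.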
SQL Improvements
• UDF / UDTF / UDAF
• Stream JOIN, etc.
• Retraction
• Window aggregation
• DML: INSERT, etc.
• DDL
Apache Flink @ Alibaba - Seattle Apache Flink Meetup
2016: Scalability & Reliability
2017: Productivity
Use Case: Real-time A/B Test (Analytics)
[Figure: Impression, Click, and Transaction streams from the Online Logs each pass through a Parser and a Filter, are joined, processed by a UDF, aggregated, and written to Druid]
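The A/B test pipeline (parse, filter, join, UDF, aggregate) can be sketched in plain Python to show the data flow. This is a hypothetical illustration that uses no Flink or Blink API. The log line format, the `bucket` field, and the click-through-rate metric are all assumptions, and the Transaction stream and the Druid sink are left out.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Union

@dataclass
class Impression:
    impression_id: str
    bucket: str  # hypothetical A/B variant label, e.g. "A" or "B"

@dataclass
class Click:
    impression_id: str

def parse(line: str) -> Union[Impression, Click, None]:
    """Parser + Filter for hypothetical lines 'impression,<id>,<bucket>' and 'click,<id>'."""
    parts = line.strip().split(",")
    if parts[0] == "impression" and len(parts) == 3:
        return Impression(parts[1], parts[2])
    if parts[0] == "click" and len(parts) == 2:
        return Click(parts[1])
    return None  # malformed or unrelated lines are filtered out

def ctr_by_bucket(lines: Iterable[str]) -> Dict[str, float]:
    """Join clicks to impressions by id, then aggregate click-through rate per bucket.

    Simplification: a click is matched only if its impression arrived first;
    a real streaming join would also keep clicks waiting for their impression.
    """
    bucket_of: Dict[str, str] = {}              # join state: impression id -> bucket
    shown: Dict[str, int] = defaultdict(int)
    clicked: Dict[str, int] = defaultdict(int)
    for line in lines:
        record = parse(line)
        if isinstance(record, Impression):
            bucket_of[record.impression_id] = record.bucket
            shown[record.bucket] += 1
        elif isinstance(record, Click) and record.impression_id in bucket_of:
            clicked[bucket_of[record.impression_id]] += 1
    return {b: clicked[b] / shown[b] for b in shown}
```

Keeping the impression lookup (`bucket_of`) as join state is what a streaming join has to do. In a production job that state would be bounded by a time window or TTL rather than growing forever.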
Use Case: Search Index Build & Update
[Figure: sharded MySQL clusters (IC1–IC3 and UIC1–UIC3) are synced into the IC and UIC tables of an HBase cluster; these are joined into an HBase Result table, which is exported to the search engine. Stages: Sync, Join, Export]
Use Case: Online Machine Learning
[Figure: Online Logs and HDFS feed Feature Compute; the resulting features drive Online Learning, and the learned Model is exported to the online engine]
Flink Advantages
Streaming semantics
• Event time / processing time
• Watermarks, windows, triggers
• Temporal joins, retraction, …
Fault-tolerant stateful processing at scale
• Managed state
• Exactly-once
• Distributed checkpoints
• Queryable state
Rich programming models and APIs
• Process Functions, DataStream / DataSet, Table API / SQL
• Streaming, batch, event-driven apps, CEP, ML, graph
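Event time, watermarks, windows, and triggers work together. Each event is assigned to a window by its own timestamp, and a watermark declares that no earlier events are expected, which triggers the window to emit. The sketch below illustrates this in plain Python rather than the Flink DataStream API. The bounded-delay watermark rule, the drop-late-events policy, and all names are simplifying assumptions, loosely modeled on bounded-out-of-orderness watermarks.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Event = Tuple[int, str]          # (event_time_ms, key)
Result = Tuple[int, str, int]    # (window_start_ms, key, count)

def tumbling_counts(events: Iterable[Event], window_ms: int,
                    max_delay_ms: int) -> List[Result]:
    """Per-key counts in event-time tumbling windows driven by a watermark.

    The watermark trails the largest event time seen so far by max_delay_ms
    (bounded out-of-orderness). A window [start, start + window_ms) fires once
    the watermark reaches its end; later events for that window are dropped.
    """
    windows: Dict[int, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    results: List[Result] = []
    watermark = float("-inf")

    def fire(ready: List[int]) -> None:
        for start in sorted(ready):
            for key, count in sorted(windows.pop(start).items()):
                results.append((start, key, count))

    for ts, key in events:
        start = ts - ts % window_ms               # assign event to its window
        if start + window_ms <= watermark:
            continue                              # late: window already fired
        windows[start][key] += 1
        watermark = max(watermark, ts - max_delay_ms)
        fire([s for s in windows if s + window_ms <= watermark])  # trigger
    fire(list(windows))                           # end of input flushes the rest
    return results
```

For example, `tumbling_counts([(1, "a"), (3, "b"), (12, "a"), (8, "a"), (25, "a"), (14, "b")], 10, 5)` returns `[(0, "a", 2), (0, "b", 1), (10, "a", 1), (20, "a", 1)]`. The out-of-order event at time 8 still counts because the watermark has only reached 7. The event at time 14 is dropped because the watermark has already passed 20 and window [10, 20) has fired.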
Blink Focuses
Scale
• Single job with 10K+ parallelism and tens of TB of state
Performance, costs and SLA
• Efficient resource utilization (FLIP-6)
• Runtime improvements
• Metrics & monitoring
Productivity
• SQL
• Platform: develop, debug, deploy, monitor, migrate
• Connectors
Blink Ecosystem in Alibaba
[Figure: layered stack. Alibaba apps (Search, Recommendation, Ads, BI, and lots more) sit on the StreamCompute Platform and the Machine Learning Platform. These use Blink's DataStream API, DataSet API, and SQL & Table API over the Blink Runtime Engine, which runs on Cluster Resource Management and Storage]
Blink & Flink
[Figure: Blink runtime on YARN (resource management) and HDFS (persistent storage). Components: Flink Client, YARN App Master (Resource Manager, Job Manager), Web Monitor, Task Managers running Tasks with a RocksDB state backend, Connectors to the Alibaba Data Lake, and a Metric Reporter feeding the Alibaba Monitor System. Labeled interactions: submit job, launch AM, request TM, launch TM, task scheduling, checkpoint coordination, checkpoint incrementally, read/write, metrics, debug. Each component is marked as coming from Apache Flink or Alibaba Blink]
Examples of recent Blink runtime optimizations
• Credit-based network stack
• Dynamic load balancing
• Improved checkpointing for large-scale jobs
• Some of this work has been contributed to Flink and will be released in Flink 1.5
Some New Flink Features
Flink 1.3
• Incremental checkpoints
• Fine-grained recovery
Flink 1.4
• Queryable state improvements
• Table API & SQL enhancements
Flink 1.5+
• FLIP-6
• Network stack improvements
• State replication
• Eager state declaration
• State evolution
Evolution of Large State Handling & Recovery
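Full Checkpoints
[Figure: the state is {A, B, C, D} at @t1, {A, F, C, D, E} at @t2, and {G, H, C, D, I, E} at @t3. Checkpoint 1, Checkpoint 2, and Checkpoint 3 each upload the entire state at that time, so unchanged entries such as C and D are uploaded every time]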
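Incremental Checkpoints
[Figure: for the same state, Checkpoint 1 uploads {A, B, C, D}, Checkpoint 2 uploads only the new entries {E, F}, and Checkpoint 3 uploads only {G, H, I}; each checkpoint reuses what earlier checkpoints already stored]
[Figure: Checkpoints 1 to 4 reference chunks C1 to C4 in shared storage; a chunk such as C1 is uploaded once and then referenced by several later checkpoints instead of being copied again]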
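The difference between full and incremental checkpoints comes down to two things: what gets uploaded, and when stored data can be deleted. Below is a minimal sketch in plain Python. It assumes state is stored as immutable, named chunks (in Flink's RocksDB state backend these are RocksDB SST files) and that shared chunks are reference-counted across checkpoints. The `CheckpointStore` class and its methods are hypothetical, not a Flink API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

@dataclass
class CheckpointStore:
    """Shared storage of immutable state chunks, reference-counted across checkpoints."""
    refcount: Dict[str, int] = field(default_factory=dict)        # chunk -> #checkpoints using it
    checkpoints: Dict[int, Set[str]] = field(default_factory=dict)

    def checkpoint(self, cp_id: int, live_chunks: Set[str]) -> List[str]:
        """Record a checkpoint; upload (and return) only chunks not already stored."""
        new = sorted(c for c in live_chunks if c not in self.refcount)
        for c in new:
            self.refcount[c] = 0                  # stands in for the actual upload
        for c in live_chunks:
            self.refcount[c] += 1
        self.checkpoints[cp_id] = set(live_chunks)
        return new

    def discard(self, cp_id: int) -> List[str]:
        """Drop a checkpoint; delete (and return) chunks no checkpoint references anymore."""
        deleted = []
        for c in sorted(self.checkpoints.pop(cp_id)):
            self.refcount[c] -= 1
            if self.refcount[c] == 0:
                del self.refcount[c]
                deleted.append(c)
        return deleted
```

Replaying the example from the slides: checkpoint 1 uploads A, B, C, and D; checkpoint 2 uploads only E and F; checkpoint 3 uploads only G, H, and I. Discarding checkpoint 1 afterwards deletes only B, because later checkpoints still reference A, C, and D. A full checkpoint would instead upload every live chunk each time.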
Network Stack Improvements
• Removal of redundant copy operations
• Event-driven network transfer
• Removal of artificial latency sources
• Introduction of flow control
• Better control of back pressure
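Flow control in a credit-based network stack can be illustrated with a single channel. The receiver grants one credit per free buffer, and the sender transmits only while it holds credit, so back pressure is applied per channel instead of stalling a shared TCP connection. The following plain-Python sketch is a simplified illustration, not Flink's network implementation; `Channel` and its methods are hypothetical.

```python
from collections import deque
from typing import Deque, List

class Channel:
    """One logical channel between a sending and a receiving subtask.

    The receiver's free buffers are granted to the sender as credits; the
    sender only transmits while it holds credit, and the receiver returns one
    credit each time it frees a buffer by consuming it.
    """

    def __init__(self, receiver_buffers: int) -> None:
        self.credit = receiver_buffers            # initial credit = free receiver buffers
        self.backlog: Deque[str] = deque()        # buffers queued at the sender
        self.in_flight: Deque[str] = deque()      # buffers occupying receiver memory
        self.consumed: List[str] = []

    def produce(self, buf: str) -> None:
        self.backlog.append(buf)

    def transmit(self) -> int:
        """Send as many queued buffers as the credit allows; return how many were sent."""
        sent = 0
        while self.backlog and self.credit > 0:
            self.in_flight.append(self.backlog.popleft())
            self.credit -= 1
            sent += 1
        return sent

    def consume(self, n: int) -> None:
        """Receiver processes up to n buffers and returns one credit per freed buffer."""
        for _ in range(min(n, len(self.in_flight))):
            self.consumed.append(self.in_flight.popleft())
            self.credit += 1
```

Suppose a channel has two receiver buffers and five queued buffers. The first `transmit()` sends two, and a second call sends nothing until the receiver calls `consume(1)`; after that, exactly one more buffer goes out. The sender never pushes data the receiver has no room for, which is the point of introducing flow control.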
State Replication
• Decouple state from Tasks
  • Faster recovery in case of Task failure
• Replicate state between Task Managers
  • Faster recovery in case of machine failure
• High-throughput queryable state
Thank You!
We are hiring…