Apache Flink
Deep Dive
Vasia Kalavri
Flink Committer & KTH PhD student
vasia@apache.org
1st Apache Flink Meetup Stockholm
May 11, 2015
Flink Internals
● Job Life-Cycle
○ what happens after you submit a Flink job?
● The Batch Optimizer
○ how are execution plans chosen?
● Delta Iterations
○ how are Flink iterations special for Graph and ML apps?
2
what happens after you
submit a Flink job?
The Flink Stack
(figure: layered stack, *current Flink master + few PRs)
● Libraries: Python, Gelly, Table, FlinkML, SAMOA, Hadoop M/R, Dataflow
● APIs and optimizers: DataSet (Java/Scala) + Batch Optimizer | DataStream (Java/Scala) + Streaming Optimizer
● Flink Runtime
● Deployment: Local, Remote, Yarn, Tez, Embedded
4
Program Life-Cycle
DataSet<String> text = env.readTextFile(input);
DataSet<Tuple2<String, Integer>> result = text
  .flatMap((String value, Collector<Tuple2<String, Integer>> out) -> {
    for (String token : value.split("\\W+")) {
      out.collect(new Tuple2<>(token, 1));
    }
  })
  .groupBy(0).aggregate(SUM, 1);
(figure: numbered callouts 1-5 mark the steps of the program life-cycle)
5
● Flink Client & Optimizer: creates and submits the job graph
● Job Manager: creates the execution graph and deploys tasks
● Task Managers: execute tasks and send status updates
(figure: the WordCount program from the previous slide running on a Flink Client, a Job Manager, and two Task Managers; "O Romeo, Romeo, wherefore art thou Romeo?" yields O, 1 / Romeo, 3 / wherefore, 1 / art, 1 / thou, 1 and "Nor arm, nor face, nor any other part" yields nor, 3 / arm, 1 / face, 1 / any, 1 / other, 1 / part, 1)
6
Series of Transformations
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
DataSet<String> input = env.readTextFile(inputPath); // inputPath: path to the input file
DataSet<String> first = input.filter(str -> str.contains("Apache Flink"));
DataSet<String> second = first.filter(str -> str.length() > 40);
second.print();
env.execute();
(figure: Input → Operator X → First → Operator Y → Second)
7
DataSet Abstraction
Think of it as a collection of data elements that can be
produced/recovered in several ways:
… like a Java collection
… like an RDD
… perhaps it is never fully materialized (because the program does not need it to be)
… implicitly updated in an iteration
→ this is transparent to the user
8
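To make the "several ways" concrete, here is a minimal sketch (paths and the job name are hypothetical, not from the slides) in the Java DataSet API: one DataSet built from a plain Java collection and one backed by a file that downstream operators can consume in a pipelined fashion, so it may never be fully materialized.

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import java.util.Arrays;

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

// ... like a Java collection
DataSet<Integer> numbers = env.fromCollection(Arrays.asList(1, 2, 3));

// ... backed by a file (hypothetical path); the filter can be pipelined,
// so the full "lines" DataSet need never exist in memory at once
DataSet<String> lines = env.readTextFile("hdfs:///path/to/input");
DataSet<String> longLines = lines.filter(line -> line.length() > 40);

longLines.writeAsText("hdfs:///path/to/output");
env.execute("DataSet abstraction sketch");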
Example: grep
(figure: the text "Romeo, Romeo, where art thou Romeo?" is loaded once (Load Log) and fed to three operators, Grep 1, Grep 2, and Grep 3, searching for str1, str2, and str3)
9
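As a rough sketch (paths are placeholders), the grep dataflow above expressed with the DataSet API; all three filters share the single Load Log source.

DataSet<String> log = env.readTextFile("hdfs:///path/to/log");   // Load Log

DataSet<String> grep1 = log.filter(line -> line.contains("str1"));
DataSet<String> grep2 = log.filter(line -> line.contains("str2"));
DataSet<String> grep3 = log.filter(line -> line.contains("str3"));

grep1.writeAsText("hdfs:///out/grep1");
grep2.writeAsText("hdfs:///out/grep2");
grep3.writeAsText("hdfs:///out/grep3");

env.execute("grep");

The next two slides show two different ways the runtime could execute this plan.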
Staged (batch) execution
(same grep dataflow: Load Log feeding Grep 1/2/3 for str1, str2, str3)
● Stage 1: create/cache the Log
● Subsequent stages: grep the Log for matches
● Caching in memory, spilling to disk if needed
10
Pipelined execution
(same grep dataflow, with records streaming through all operators at once)
● Stage 1: deploy and start all operators
● Data transfer in memory, spilling to disk if needed
● Note: the Log DataSet is never "created"!
11
12
how are execution plans
chosen?
Flink Batch Optimizer
Inspired by database optimizers, it creates and
selects the execution plan for a user program
14
DataSet<Tuple5<Integer, String, String, String, Integer>> orders = …
DataSet<Tuple2<Integer, Double>> lineitems = …
DataSet<Tuple2<Integer, Integer>> filteredOrders = orders
.filter(...)
.project(0,4).types(Integer.class, Integer.class);
DataSet<Tuple3<Integer, Integer, Double>> lineitemsOfOrders = filteredOrders
.join(lineitems)
.where(0).equalTo(0)
.projectFirst(0,1).projectSecond(1)
.types(Integer.class, Integer.class, Double.class);
DataSet<Tuple3<Integer, Integer, Double>> priceSums = lineitemsOfOrders
.groupBy(0,1).aggregate(Aggregations.SUM, 2);
priceSums.writeAsCsv(outputPath);
A Simple Program
15
Alternative Execution Plans
Best plan depends on relative sizes of input files.
(figure: two physical plans for the same program —
Plan A: DataSource orders.tbl → Filter → Map, broadcast into a Hybrid Hash Join (buildHT side) with DataSource lineitem.tbl forwarded as the probe side, then Combine → GroupRed (sort);
Plan B: both inputs hash-partitioned on [0] into the Hybrid Hash Join (buildHT/probe), then hash-partitioned on [0,1] into GroupRed (sort))
16
17
● Evaluates physical execution strategies
○ e.g. hash-join vs. sort-merge join
● Chooses data shipping strategies
○ e.g. broadcast vs. partition
● Reuses partitioning and sort orders
● Decides to cache loop-invariant data in
iterations
Optimization Examples
18
case class PageVisit(url: String, ip: String, userId: Long)
case class User(id: Long, name: String, email: String, country: String)
// get your data from somewhere
val visits: DataSet[PageVisit] = ...
val users: DataSet[User] = ...
// filter the users data set
val germanUsers = users.filter((u) => u.country.equals("de"))
// join data sets
val germanVisits: DataSet[(PageVisit, User)] =
// equi-join condition (PageVisit.userId = User.id)
visits.join(germanUsers).where("userId").equalTo("id")
Example: Distributed Joins
The join operator needs to create all pairs of elements from the two inputs for which the join condition evaluates to true
19
Example: Distributed Joins
● Ship Strategy: The input data is distributed across all
parallel instances that participate in the join
● Local Strategy: Each parallel instance performs a join
algorithm on its local partition
For both steps, there are multiple valid strategies which are
favorable in different situations.
20
Repartition-Repartition Strategy
Partitions both inputs
using the same
partitioning function.
All elements that share
the same join key are
shipped to the same
parallel instance and can
be locally joined.
21
Broadcast-Forward Strategy
Sends one complete data set to each parallel instance that holds a partition of the other data set. The other data set remains local and is not shipped at all.
22
The optimizer computes cost estimates for the candidate execution plans and picks the "cheapest" one, considering for example:
● the amount of data shipped over the network
● whether the data of one input is already partitioned
R-R cost: full shuffle of both data sets over the network
B-F cost: depends on the size of the broadcasted data set and the number of parallel instances
Read more: http://flink.apache.org/news/2015/03/13/peeking-into-Apache-Flinks-Engine-Room.html
How does the Optimizer choose?
23
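If you know your inputs better than the optimizer's estimates, you can also hint the strategy yourself. A sketch, assuming a Flink version whose DataSet API exposes join hints (older versions offer joinWithTiny()/joinWithHuge() for the same purpose), reusing filteredOrders and lineitems from the earlier program:

import org.apache.flink.api.common.operators.base.JoinOperatorBase.JoinHint;

// Broadcast-Forward: replicate filteredOrders (assumed small) to every parallel
// instance that holds a partition of lineitems
filteredOrders
    .join(lineitems, JoinHint.BROADCAST_HASH_FIRST)
    .where(0).equalTo(0);

// Repartition-Repartition: hash-partition both inputs on the join key
filteredOrders
    .join(lineitems, JoinHint.REPARTITION_HASH_FIRST)
    .where(0).equalTo(0);

Without a hint, the optimizer falls back to its cost estimates as described above.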
how are Flink iterations
special?
● for/while loop in client submits one job per
iteration step
● Data reuse by caching in memory and/or disk
(figure: the client drives a chain of Step jobs, one per iteration)
Iterate by unrolling
25
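A sketch of what unrolling looks like from the client side (pathFor and step are hypothetical helpers; assume pathFor(0) already holds the initial state): every pass through the loop builds a new plan, writes the result to the file system, and submits a separate job.

String statePath = pathFor(0);

for (int i = 1; i <= numSteps; i++) {
    DataSet<Tuple2<Long, Double>> previous =
        env.readCsvFile(statePath).types(Long.class, Double.class);   // re-read previous result
    DataSet<Tuple2<Long, Double>> next = step(previous);              // hypothetical step function
    statePath = pathFor(i);
    next.writeAsCsv(statePath);
    env.execute("step " + i);                                         // one job per iteration step
}

Each step pays job-submission and scheduling cost, and state travels through the file system between steps, which is exactly what the native iteration operators on the next slide avoid.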
Native Iterations
● the runtime is aware of the iterative execution
● no scheduling overhead between iterations
● caching and state maintenance are handled automatically
(figure annotations: caching loop-invariant data, pushing work "out of the loop", maintaining state as an index)
26
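For comparison, a minimal sketch of a native bulk iteration with DataSet.iterate()/closeWith(), in the style of the classic pi-estimation example (the 10000-step count is illustrative):

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.operators.IterativeDataSet;

IterativeDataSet<Integer> initial = env.fromElements(0).iterate(10000);

DataSet<Integer> iteration = initial.map(new MapFunction<Integer, Integer>() {
    @Override
    public Integer map(Integer count) {
        double x = Math.random(), y = Math.random();
        return count + ((x * x + y * y < 1) ? 1 : 0);   // sample one point per step
    }
});

// closeWith() feeds the result back as the next step's input; the operators are
// scheduled once and stay deployed for all steps
DataSet<Integer> samplesInCircle = initial.closeWith(iteration);
samplesInCircle.print();   // depending on the Flink version, print() may need a following env.execute()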
Flink Iteration Operators
(figure: the two operators —
Iterate: Input → Iterative Update Function → Result, with the result replacing the input of the next iteration;
IterateDelta: Workset + Solution Set (the state) → Iterative Update Function → Result, updating the Solution Set and producing the next Workset)
27
Delta Iteration
● Not all the elements of the state are updated
in each iteration.
● The elements that require an update are stored in the workset.
● The step function is applied only to the
workset elements.
28
Partition a graph into components by iteratively
propagating the min vertex ID among neighbors
Example: Connected Components
29
Delta-Connected Components
30
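A sketch of connected components as a delta iteration in the Java DataSet API. Assumptions (not from the slides): verticesWithInitialId is a DataSet<Tuple2<Long, Long>> of (vertexId, componentId) with componentId initialized to the vertex's own ID, and edges is a DataSet<Tuple2<Long, Long>> containing every undirected edge in both directions.

import org.apache.flink.api.common.functions.FlatJoinFunction;
import org.apache.flink.api.common.functions.JoinFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.operators.DeltaIteration;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

int maxIterations = 100;

DeltaIteration<Tuple2<Long, Long>, Tuple2<Long, Long>> iteration =
    verticesWithInitialId.iterateDelta(verticesWithInitialId, maxIterations, 0);

// Send each workset vertex's component ID to its neighbors and keep the minimum per vertex
DataSet<Tuple2<Long, Long>> candidates = iteration.getWorkset()
    .join(edges).where(0).equalTo(0)
    .with(new JoinFunction<Tuple2<Long, Long>, Tuple2<Long, Long>, Tuple2<Long, Long>>() {
        @Override
        public Tuple2<Long, Long> join(Tuple2<Long, Long> vertex, Tuple2<Long, Long> edge) {
            return new Tuple2<>(edge.f1, vertex.f1);   // (neighbor, candidate component ID)
        }
    })
    .groupBy(0).min(1);

// Compare against the solution set; only vertices whose component ID shrinks
// become the solution-set delta and the next workset
DataSet<Tuple2<Long, Long>> delta = candidates
    .join(iteration.getSolutionSet()).where(0).equalTo(0)
    .with(new FlatJoinFunction<Tuple2<Long, Long>, Tuple2<Long, Long>, Tuple2<Long, Long>>() {
        @Override
        public void join(Tuple2<Long, Long> candidate, Tuple2<Long, Long> current,
                         Collector<Tuple2<Long, Long>> out) {
            if (candidate.f1 < current.f1) {
                out.collect(candidate);
            }
        }
    });

DataSet<Tuple2<Long, Long>> components = iteration.closeWith(delta, delta);

Because only updated vertices re-enter the workset, the work per superstep shrinks as components converge.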
31
Performance
32
Read the documentation and our blog posts!
● Memory Management
● Serialization and Type Extraction
● Streaming Optimizations
● Fault-Tolerance
Want to learn more?
33
Apache Flink
Deep Dive
Vasia Kalavri
Flink Committer & KTH PhD student
vasia@apache.org
1st Apache Flink Meetup Stockholm
May 11, 2015