Ufuk Celebi
uce@apache.org
Flink Forward
October 13, 2015
Stream & Batch Processing in One System
Apache Flink’s Streaming Data Flow Engine
System Architecture
Libraries: Machine Learning · Graph Processing · SQL-like API
DataStream API (Unbounded Data) · DataSet API (Bounded Data)
Runtime: Distributed Streaming Data Flow
Deployment: Local (Single JVM) · Cluster (Standalone, YARN)
User-facing: Libraries, DataStream API, DataSet API
System: Runtime, Deployment
Today
Journey from APIs to Parallel Execution: a look behind the scenes.
You don’t have to worry about this.
Components
User side: Client
System side: JobManager (master) and TaskManagers (workers; four in the figure)
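import java.util.concurrent.TimeUnit;
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

public class WordCount {
    public static void main(String[] args) throws Exception {
        // Flink's entry point
        StreamExecutionEnvironment env = StreamExecutionEnvironment
            .getExecutionEnvironment();
        DataStream<String> data = env.fromElements(
            "O Romeo, Romeo! wherefore art thou Romeo?",
            "Deny thy father and refuse thy name",
            "Or, if thou wilt not, be but sworn my love,",
            "And I'll no longer be a Capulet.");
        // Split by whitespace to (word, 1) and sum up ones
        DataStream<Tuple2<String, Integer>> counts = data
            .flatMap(new SplitByWhitespace())
            .keyBy(0)                                  // key by the word (field 0)
            .timeWindow(Time.of(10, TimeUnit.SECONDS)) // 10-second windows per key
            .sum(1);                                   // sum the ones (field 1)
        counts.print();
        // Today: What happens now?
        env.execute();
    }

    // Not shown on the slide: a minimal splitter matching the later out.collect(...) snippet
    public static class SplitByWhitespace implements FlatMapFunction<String, Tuple2<String, Integer>> {
        @Override
        public void flatMap(String line, Collector<Tuple2<String, Integer>> out) {
            for (String w : line.split("\\s+")) {
                if (!w.isEmpty()) out.collect(new Tuple2<>(w, 1));
            }
        }
    }
}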
The Client submits the program to the JobManager, which schedules it on the TaskManagers, which execute it.
Client
Translates the API code to a data flow graph called JobGraph and submits it to the JobManager.
Figure: the WordCount program is translated into a Source → Transform → Sink data flow.
JobGraph
• JobVertex: computation
• Intermediate Result: data
A JobVertex produces an Intermediate Result, which is consumed by the next JobVertex (in the figure, three vertices are chained this way).
The JobGraph
Vertices and results are combined into a directed acyclic graph (DAG) representing the user program.
Figure: an example DAG with two Sources, a Join, a Map, and two Sinks.
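A minimal sketch of the DAG property, assuming a toy adjacency-list model rather than Flink's actual JobGraph class: each vertex lists the vertices that consume its intermediate result, and Kahn's algorithm yields a valid execution order or detects a cycle. The class name `ToyJobGraph` and the vertex names in the usage are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy JobGraph: each vertex maps to the vertices that consume its intermediate result.
// Hypothetical model for illustration; not Flink's JobGraph API.
class ToyJobGraph {
    private final Map<String, List<String>> consumers = new LinkedHashMap<>();

    void addVertex(String name) {
        consumers.putIfAbsent(name, new ArrayList<>());
    }

    // The producer's intermediate result is consumed by the consumer vertex.
    void connect(String producer, String consumer) {
        addVertex(producer);
        addVertex(consumer);
        consumers.get(producer).add(consumer);
    }

    // Kahn's algorithm: returns a topological order or throws if the graph has a cycle.
    List<String> topologicalOrder() {
        Map<String, Integer> inDegree = new HashMap<>();
        for (String v : consumers.keySet()) inDegree.putIfAbsent(v, 0);
        for (List<String> outs : consumers.values())
            for (String c : outs) inDegree.merge(c, 1, Integer::sum);

        Deque<String> ready = new ArrayDeque<>();
        for (String v : consumers.keySet()) if (inDegree.get(v) == 0) ready.add(v);

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String v = ready.poll();
            order.add(v);
            for (String c : consumers.get(v))
                if (inDegree.merge(c, -1, Integer::sum) == 0) ready.add(c);
        }
        if (order.size() != consumers.size())
            throw new IllegalStateException("cycle detected: not a DAG");
        return order;
    }
}
```

Connecting Source1 and Source2 to Join, then Join → Map → Sink, the sketch returns `[Source1, Source2, Join, Map, Sink]`; adding an edge back from Sink to Source1 would make it throw.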
JobGraph Translation
• Translation includes optimizations like chaining: two operators f and g are combined into a single vertex f · g
• DataSet API translation additionally uses cost-based optimization
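A toy illustration of chaining, assuming one plain `java.util.function.Function` per operator (not Flink's operator classes). Unchained, f's output is collected and then handed to g; that list stands in for the data exchange (buffers, serialization, possibly network) between two separate tasks. Chained, f and g are fused into one function applied record by record. `ChainingSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Toy model of operator chaining (not Flink's operator classes).
class ChainingSketch {
    // Unchained: f's output is collected and then handed to g,
    // standing in for a data exchange between two separate tasks.
    static <A, B, C> List<C> unchained(List<A> input, Function<A, B> f, Function<B, C> g) {
        List<B> exchanged = new ArrayList<>();
        for (A a : input) exchanged.add(f.apply(a));
        List<C> out = new ArrayList<>();
        for (B b : exchanged) out.add(g.apply(b));
        return out;
    }

    // Chained: f and g are fused into one function applied per record,
    // with no intermediate collection between them.
    static <A, B, C> List<C> chained(List<A> input, Function<A, B> f, Function<B, C> g) {
        Function<A, C> fused = f.andThen(g); // f followed by g
        List<C> out = new ArrayList<>();
        for (A a : input) out.add(fused.apply(a));
        return out;
    }
}
```

Both variants compute the same result; chaining only removes the handover between the two operators.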
JobGraph
JobVertex parameters
• Parallelism
• Code to run
• Consumed result(s)
• Connection pattern
Result parameters
• Producer
• Type
The JobGraph is a common abstraction for both the DataStream and the DataSet API. The runtime is agnostic to the respective API; it is only a question of JobGraph parameterization.
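To make the parameterization point concrete, here is a toy sketch with hypothetical types whose fields mirror the two parameter lists above: the same vertex description serves a streaming and a batch program, and only the type of the consumed result changes. Choosing `BLOCKING` for the batch variant is an illustrative assumption (batch programs may use pipelined results too), and the code name `"SumReducer"` is made up.

```java
import java.util.List;

// Hypothetical types whose fields mirror the slide's parameter lists.
enum ResultType { PIPELINED, BLOCKING }

enum ConnectionPattern { POINTWISE, ALL_TO_ALL }

record ResultParams(String producer, ResultType type) {}

record VertexParams(String name, int parallelism, String codeToRun,
                    List<ResultParams> consumedResults, ConnectionPattern pattern) {}

class Parameterization {
    // Same vertex for streaming and batch; only the consumed result's type differs.
    static VertexParams reduceVertex(boolean batch) {
        ResultType type = batch ? ResultType.BLOCKING : ResultType.PIPELINED;
        return new VertexParams("Reduce", 4, "SumReducer",
                List.of(new ResultParams("Map", type)), ConnectionPattern.ALL_TO_ALL);
    }
}
```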
Coordination
• Coordination between components via Akka actors
• Actors exchange asynchronous messages
• Each actor has its own isolated state
Figure: the Client, the JobManager (master), and the TaskManagers each run an actor system.
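The actor idea can be sketched in plain Java without Akka: a mailbox queue, a single thread that owns the actor's state, and a fire-and-forget `tell`. This illustrates the concept under those assumptions only; it is not how Flink or Akka implement actors, and `CounterActor` with its message types is hypothetical.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

// Minimal actor sketch (not Akka): a mailbox plus one thread that owns the state.
class CounterActor implements AutoCloseable {
    record Heartbeat(String from) {}
    record GetCount(CompletableFuture<Integer> reply) {}

    private final BlockingQueue<Object> mailbox = new LinkedBlockingQueue<>();
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private int heartbeats = 0; // isolated state: only touched by the actor's own thread

    CounterActor() {
        worker.execute(this::processMessages);
    }

    // Asynchronous send: enqueue and return immediately.
    void tell(Object message) {
        mailbox.add(message);
    }

    private void processMessages() {
        try {
            while (true) {
                Object msg = mailbox.take(); // one message at a time, in arrival order
                if (msg instanceof Heartbeat) {
                    heartbeats++;
                } else if (msg instanceof GetCount get) {
                    get.reply().complete(heartbeats); // answer via a future, not shared state
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // close() interrupts the loop
        }
    }

    @Override
    public void close() {
        worker.shutdownNow();
    }
}
```

Senders never read `heartbeats` directly; they ask with a `GetCount` message and receive the answer asynchronously.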
JobManager
• All coordination goes through the JobManager (master):
  • Scheduling programs for execution
  • Checkpoint coordination (Till’s talk later today)
  • Monitoring workers
Figure: the Client submits the program to the JobManager, whose actor system handles scheduling and checkpoint coordination.
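The "monitoring workers" duty can be illustrated with heartbeat bookkeeping. This is a hypothetical sketch, assuming the master stores the time of each worker's most recent heartbeat and declares a worker lost after a fixed timeout; timestamps are passed in explicitly instead of reading a clock, and neither the class nor the timeout reflects Flink's actual configuration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy worker monitoring on the master side (hypothetical, not Flink's implementation).
class HeartbeatMonitor {
    private final long timeoutMillis;
    private final Map<String, Long> lastSeen = new HashMap<>();

    HeartbeatMonitor(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Called whenever a heartbeat message from a TaskManager arrives.
    void onHeartbeat(String taskManager, long nowMillis) {
        lastSeen.put(taskManager, nowMillis);
    }

    // Workers whose last heartbeat is older than the timeout, sorted for stable output.
    List<String> lostWorkers(long nowMillis) {
        List<String> lost = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastSeen.entrySet())
            if (nowMillis - e.getValue() > timeoutMillis) lost.add(e.getKey());
        Collections.sort(lost);
        return lost;
    }
}
```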
ExecutionGraph
• The JobManager receives the JobGraph and spans it out into the ExecutionGraph: each JobVertex becomes one ExecutionVertex (EV) per parallel subtask, and each ExecutionVertex produces one ResultPartition (RP).
Figure: a JobVertex with four ExecutionVertices (EV1 to EV4) writing ResultPartitions RP1 to RP4, consumed by a JobVertex with two ExecutionVertices (EV1, EV2), connected either point to point or all to all.
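The two connection patterns in the figure can be sketched as a mapping from consumer subtasks to the partitions they read. This toy model assumes producer subtask i writes partition i and, for the pointwise case, that there are at least as many producers as consumers; it uses 0-based indices, whereas the figure counts from 1. It is not Flink's actual edge-assignment code.

```java
import java.util.ArrayList;
import java.util.List;

// Toy ExecutionGraph wiring: element c of the returned list holds the indices of the
// ResultPartitions that consumer subtask c reads. Hypothetical, simplified model.
class ExecutionEdges {
    // Pointwise: each producer partition goes to exactly one consumer, spread evenly.
    static List<List<Integer>> pointwise(int producers, int consumers) {
        if (producers < consumers)
            throw new IllegalArgumentException("sketch assumes producers >= consumers");
        List<List<Integer>> reads = emptyLists(consumers);
        for (int p = 0; p < producers; p++)
            reads.get(p * consumers / producers).add(p);
        return reads;
    }

    // All to all: every consumer reads from every producer partition.
    static List<List<Integer>> allToAll(int producers, int consumers) {
        List<List<Integer>> reads = emptyLists(consumers);
        for (List<Integer> r : reads)
            for (int p = 0; p < producers; p++) r.add(p);
        return reads;
    }

    private static List<List<Integer>> emptyLists(int n) {
        List<List<Integer>> lists = new ArrayList<>();
        for (int i = 0; i < n; i++) lists.add(new ArrayList<>());
        return lists;
    }
}
```

For the figure's four producers and two consumers, `pointwise(4, 2)` gives `[[0, 1], [2, 3]]`, while `allToAll(4, 2)` gives both consumers `[0, 1, 2, 3]`.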
TaskManager
• All data processing happens in the TaskManagers (workers), which:
  • Communicate with the JobManager via actor messages
  • Exchange data between themselves via dedicated data connections
  • Expose task slots for execution
Figure: a TaskManager with an actor system, four task slots, an I/O Manager, and a Memory Manager.
Scheduling
• Each ExecutionVertex will be executed one or more times (e.g. again after a failure)
• The JobManager maps each Execution to a task slot
• Pipelined execution in the same slot where applicable
Figure: three vertices with parallelism p=4, p=4, and p=3, connected all to all and pointwise, scheduled onto TaskManager 1 and TaskManager 2.
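A toy version of placing subtasks into slots, assuming two TaskManagers with two slots each and the rule that subtask i of every vertex goes into slot i, so a pipelined chain of subtasks ends up in the same slot. Vertex names, the slot naming scheme, and the placement rule are assumptions for illustration, not Flink's scheduler.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy slot placement: subtask i of every vertex is placed in slot i.
class SlotSketch {
    static Map<String, List<String>> assign(Map<String, Integer> parallelism,
                                            int taskManagers, int slotsPerTaskManager) {
        // Create slots in a fixed order: TM1/slot0, TM1/slot1, TM2/slot0, ...
        List<List<String>> slots = new ArrayList<>();
        Map<String, List<String>> bySlot = new LinkedHashMap<>();
        for (int tm = 1; tm <= taskManagers; tm++) {
            for (int s = 0; s < slotsPerTaskManager; s++) {
                List<String> slot = new ArrayList<>();
                slots.add(slot);
                bySlot.put("TM" + tm + "/slot" + s, slot);
            }
        }
        // Place each parallel subtask; a vertex may not need every slot (e.g. p=3 of 4).
        for (Map.Entry<String, Integer> vertex : parallelism.entrySet()) {
            if (vertex.getValue() > slots.size())
                throw new IllegalArgumentException("not enough slots for " + vertex.getKey());
            for (int i = 0; i < vertex.getValue(); i++)
                slots.get(i).add(vertex.getKey() + "#" + (i + 1));
        }
        return bySlot;
    }
}
```

With parallelism 4, 4, and 3 (as in the figure), the first three slots each hold one subtask of all three vertices, and the last slot holds only subtasks of the first two.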
Scheduling
• Scheduling starts from the sources
• Later tasks are scheduled during runtime, depending on the result type
Figure: the JobManager (master) submits tasks to the TaskManager (worker), which sends state updates back; both communicate through their actor systems.
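A toy rendering of "depending on the result type" for a linear chain of vertices. The rule is an assumption made for the sketch: the consumer of a pipelined result is deployed while its producer runs, whereas the consumer of a blocking result waits until the producer has finished. The real scheduler reacts to events at runtime rather than computing this up front; `LazyScheduling` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

enum ResultKind { PIPELINED, BLOCKING }

class LazyScheduling {
    // chain: v0 -> v1 -> ... ; edges.get(i) is the result type between v(i) and v(i+1).
    // Returns the vertices deployed while the source v0 is still running.
    static List<String> deployedWhileSourceRuns(List<String> chain, List<ResultKind> edges) {
        if (edges.size() != chain.size() - 1)
            throw new IllegalArgumentException("need exactly one edge between neighbors");
        List<String> deployed = new ArrayList<>();
        deployed.add(chain.get(0)); // scheduling starts from the source
        for (int i = 0; i < edges.size(); i++) {
            // A BLOCKING result is only complete once its producer has finished, so
            // nothing downstream of it can be deployed yet.
            if (edges.get(i) == ResultKind.BLOCKING) break;
            deployed.add(chain.get(i + 1)); // PIPELINED: consumer runs alongside producer
        }
        return deployed;
    }
}
```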
Execution
• The ExecutionGraph tracks the state of each parallel Execution
• State updates arrive as asynchronous messages from the TaskManager and the Client
• States: Created, Scheduled, Deploying, Running, Finished, Cancelling, Cancelled, Failed
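The state tracking can be sketched as a small state machine using the state names from the figure. The transition table is an assumption covering the normal path (Created → Scheduled → Deploying → Running → Finished) plus cancel and failure edges; it is simplified and not a copy of Flink's own rules. `ExecState` and `ExecutionTracker` are hypothetical names.

```java
import java.util.EnumSet;
import java.util.Set;

// Simplified lifecycle of one parallel Execution (state names from the slide).
enum ExecState {
    CREATED, SCHEDULED, DEPLOYING, RUNNING, FINISHED, CANCELLING, CANCELLED, FAILED;

    boolean isTerminal() {
        return this == FINISHED || this == CANCELLED || this == FAILED;
    }

    // Assumed transition table: forward along the normal path, or cancel/fail.
    Set<ExecState> successors() {
        switch (this) {
            case CREATED:    return EnumSet.of(SCHEDULED, CANCELLING, FAILED);
            case SCHEDULED:  return EnumSet.of(DEPLOYING, CANCELLING, FAILED);
            case DEPLOYING:  return EnumSet.of(RUNNING, CANCELLING, FAILED);
            case RUNNING:    return EnumSet.of(FINISHED, CANCELLING, FAILED);
            case CANCELLING: return EnumSet.of(CANCELLED, FAILED);
            default:         return EnumSet.noneOf(ExecState.class); // terminal states
        }
    }
}

// Tracks one Execution; state updates arrive as (asynchronous) messages.
class ExecutionTracker {
    private ExecState state = ExecState.CREATED;

    ExecState state() {
        return state;
    }

    // Applies an update if the transition is allowed; rejects it otherwise.
    boolean update(ExecState next) {
        if (!state.successors().contains(next)) return false;
        state = next;
        return true;
    }
}
```

Rejecting illegal transitions keeps the tracked state consistent even if messages arrive late, e.g. a failure report for an Execution that has already finished is ignored.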
Task Execution
• The TaskManager receives one Task per Execution
• The task descriptor is limited to:
  • Location of consumed results
  • Produced results
  • Operator & user code
Figure: the user code is wrapped in an operator, which is wrapped in a task.
Task Execution
DataStream<Tuple2<String, Integer>> counts =
    data.flatMap(new SplitByWhitespace());
• User code: the flatMap body, for (…) { out.collect(new Tuple2<>(w, 1)); }
• Operator: a StreamTask with the StreamFlatMap operator
• Task: a task with one consumed and one produced result
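The three layers can be mimicked in plain Java. This sketch is hypothetical and heavily simplified: the user function only sees a record and a collector (like the flatMap body above), a small operator wrapper adapts it, and the task loops over its one consumed result and appends to its one produced result. None of these names are Flink classes; `Map.Entry` stands in for `Tuple2`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

class TaskSketch {
    // User code: split a line into (word, 1) pairs and emit them via the collector.
    static void splitByWhitespace(String line, Consumer<Map.Entry<String, Integer>> out) {
        for (String w : line.split("\\s+"))
            if (!w.isEmpty()) out.accept(Map.entry(w, 1));
    }

    // Operator: adapts a flatMap-style user function to per-record processing.
    record FlatMapOperator<I, O>(BiConsumer<I, Consumer<O>> userFunction) {
        void processElement(I record, Consumer<O> output) {
            userFunction.accept(record, output);
        }
    }

    // Task: reads its one consumed result and writes its one produced result.
    static <I, O> List<O> runTask(Iterable<I> consumed, FlatMapOperator<I, O> operator) {
        List<O> produced = new ArrayList<>();
        for (I record : consumed) operator.processElement(record, produced::add);
        return produced;
    }
}
```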
Data Connections
• Input gates request input from local and remote channels on first read
Figure: two TaskManagers, each with a ResultManager and a NetworkManager; a Task's input gate on one TaskManager reads a result produced on the other. For the remote channel:
1. Initiate TCP connection
2. Request
3. Send via TCP
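A sketch of lazy partition requests, under stated assumptions: both channel kinds are simulated in memory (a real remote channel would go through the NetworkManager and TCP, as in the figure), and `ToyChannel`, `InMemoryChannel`, and `ToyInputGate` are hypothetical names. The point it shows is that no partition is requested until the first `read()`.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Optional;

// A channel delivers one producer's records to the consuming task.
interface ToyChannel {
    void requestPartition();   // for a remote channel: connect and send the request
    Optional<String> poll();   // next record, if any
}

// In-memory stand-in for both local and remote channels.
class InMemoryChannel implements ToyChannel {
    private final Deque<String> records;
    int requests = 0; // how often the partition was requested (for illustration)

    InMemoryChannel(List<String> records) {
        this.records = new ArrayDeque<>(records);
    }

    @Override public void requestPartition() { requests++; }

    @Override public Optional<String> poll() { return Optional.ofNullable(records.poll()); }
}

// The input gate requests all partitions lazily, on the first read.
class ToyInputGate {
    private final List<ToyChannel> channels;
    private boolean requested = false;

    ToyInputGate(List<ToyChannel> channels) {
        this.channels = channels;
    }

    Optional<String> read() {
        if (!requested) {
            channels.forEach(ToyChannel::requestPartition);
            requested = true;
        }
        for (ToyChannel channel : channels) {
            Optional<String> record = channel.poll();
            if (record.isPresent()) return record;
        }
        return Optional.empty(); // all channels drained
    }
}
```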
Result Characteristics
• Pipelined vs. Blocking: how and when to do data exchange?
• Ephemeral vs. Checkpointed: how long to keep results around?
Pipelined Results
Map → Pipelined Result → Reduce
Figure (animated): the records 1101, 0101, 0100 flow from Map through the pipelined result to Reduce while Map is still producing.
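A minimal sketch of a pipelined exchange in plain Java, assuming two threads and a bounded queue in place of Flink's network buffers. The records are the three values from the figure, read as binary numbers: Map parses them and Reduce sums them. Because the buffer holds only two records, Map cannot finish before Reduce starts consuming, which is the defining property of a pipelined result; the bounded buffer also gives a simple form of back pressure. The class name, buffer capacity, and end marker are assumptions.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Toy pipelined exchange: Map and Reduce run at the same time and hand records
// over through a small bounded buffer.
class PipelinedSketch {
    private static final int END = -1; // end-of-stream marker (inputs are non-negative)

    static int run(List<String> binaryRecords) {
        BlockingQueue<Integer> buffer = new ArrayBlockingQueue<>(2);
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // Map: parse each record and push it downstream right away.
            Future<?> map = pool.submit(() -> {
                for (String r : binaryRecords) buffer.put(Integer.parseInt(r, 2)); // blocks when full
                buffer.put(END);
                return null;
            });
            // Reduce: consume records as they arrive, before Map has finished.
            Future<Integer> reduce = pool.submit(() -> {
                int sum = 0;
                for (int v = buffer.take(); v != END; v = buffer.take()) sum += v;
                return sum;
            });
            map.get();
            return reduce.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

`PipelinedSketch.run(List.of("1101", "0101", "0100"))` returns 22 (13 + 5 + 4).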
Blocking Results
Map → Blocking Result → Reduce
Figure (animated): Map first writes all records (1101, 0101, 0100) to the blocking result; only then does Reduce start consuming them.
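A blocking result can be sketched sequentially as a toy: Map produces its entire result before Reduce runs at all, so the consumer always sees the complete result. The records are the figure's three values read as binary numbers; the in-memory list only stands in for wherever the finished result is kept, and `BlockingSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Toy blocking exchange: Map writes its complete result first; only then does Reduce start.
class BlockingSketch {
    // Returns {records available when Reduce starts, sum computed by Reduce}.
    static int[] run(List<String> binaryRecords) {
        List<Integer> result = new ArrayList<>();
        for (String r : binaryRecords) result.add(Integer.parseInt(r, 2)); // Map runs to completion
        int availableAtStart = result.size(); // Reduce is only started now
        int sum = 0;
        for (int v : result) sum += v;        // Reduce reads the finished result
        return new int[] {availableAtStart, sum};
    }
}
```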
Batch Pipelines
• Data exchange is mostly streamed
• Some operators block (e.g. sort, hash table)
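The hash table case can be sketched as a toy hash join, assuming records are `(key, value)` string pairs: the build side has to be read completely before the first probe record can be joined, while the probe side is streamed record by record. `HashJoinSketch` is a hypothetical name, not a Flink class.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Toy hash join on (key, value) string pairs.
class HashJoinSketch {
    static List<String> join(List<String[]> buildSide, Iterator<String[]> probeSide) {
        // Phase 1 (blocking): the whole build side is read into a hash table first.
        Map<String, List<String>> table = new HashMap<>();
        for (String[] rec : buildSide)
            table.computeIfAbsent(rec[0], k -> new ArrayList<>()).add(rec[1]);

        // Phase 2 (streamed): each probe record is joined as soon as it arrives.
        List<String> out = new ArrayList<>();
        while (probeSide.hasNext()) {
            String[] rec = probeSide.next();
            for (String buildValue : table.getOrDefault(rec[0], List.of()))
                out.add(rec[0] + ":" + buildValue + "," + rec[1]);
        }
        return out;
    }
}
```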
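Recap
• Client: communication actor-only (coordination); central abstraction: JobGraph; state tracking: none
• JobManager: communication actor-only (coordination); central abstraction: ExecutionGraph; state tracking: complete program
• TaskManager: communication via actors and data streams; central abstraction: Task; state tracking: single task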
Thank You!
Stream & Batch Processing
• JobGraph: chaining (DataStream); chaining and cost-based optimization (DataSet)
• Intermediate results: pipelined (DataStream); pipelined and blocking (DataSet)
• Operators: stream operators (DataStream); batch operators (DataSet)
• User function: common interface for map, reduce, … (both)
Stream & Batch Processing
• Stream and batch programs are different parameterizations of the JobGraph
• Everything goes down to the same runtime
• Streaming first, batch as a special case:
  • Cost-based optimizer on translation
  • Blocking results for less resource fragmentation
  • But still profit from streaming
• To the runtime, the DataSet and DataStream APIs are essentially just user code
Ephemeral Results
Map → Ephemeral Result → Reduce
Figure (animated): the records 1101, 0101, 0100 are handed to Reduce and are no longer kept in the result after consumption.
Checkpointed Results
Map → Checkpointed Result → Reduce
Figure (animated): Reduce receives copies of the records 1101, 0101, 0100, while the result keeps its own copy after consumption.
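The difference between ephemeral and checkpointed results can be sketched with a toy result object, assuming an in-memory list of records: an ephemeral result gives its data away on consumption, while a checkpointed one retains it, so a second consumption (such as by a re-executed consumer) still sees all records. `ToyResult` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Toy result partition: an ephemeral result is released once consumed,
// a checkpointed result keeps its data so a consumer can read it again.
class ToyResult {
    private final boolean checkpointed;
    private final List<String> data = new ArrayList<>();

    ToyResult(boolean checkpointed) {
        this.checkpointed = checkpointed;
    }

    void write(String record) {
        data.add(record);
    }

    // Hands all records to the consumer; an ephemeral result drops them afterwards.
    List<String> consume() {
        List<String> copy = new ArrayList<>(data);
        if (!checkpointed) data.clear();
        return copy;
    }
}
```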
