DBIS, Institut für Informatik
Humboldt-Universität zu Berlin
A Tale of Squirrels and Storms
Flink Forward 2015
Matthias J. Sax
mjsax@{informatik.hu-berlin.de|apache.org}
@MatthiasJSax
Humboldt-Universität zu Berlin
Department of Computer Science
October 13th, 2015
– Matthias J. Sax – Squirrels and Storms
1/22
About Me
Ph. D. student in CS, DBIS Group, HU Berlin
involved in Stratosphere research project
working on data stream processing and optimization
Aeolus: built on top of Apache Storm
(https://github.com/mjsax/aeolus)
Committer at Apache Flink
Flink and Storm
Flink vs. Storm
Similarities of Flink and Storm
true stream processing engines (no micro-batching)
low latencies ( 100ms)
executing data flow programs
parallel and distributed
fault-tolerant
cloud or cluster environment
Trident:
similar Java API
exactly-once processing
Flink vs. Storm
Advantages of Storm:
super low latency (< 10ms)
very robust:
stateless JVM for easy restart on failure
Zookeeper manages cluster state
isolation of topology
dynamic scaling (to some extent)
multi-language protocol (for experts only)
distributed RPC
Flink vs. Storm
Advantages of Flink [1]:
richer API
Java and Scala
type-safe programs
system is aware of multiple input streams
ordered stream processing
system and user timestamps
count/time and customized windows
stateful processing
lightweight fault-tolerance
Chandy-Lamport distributed snapshots
[1] http://data-artisans.com/real-time-stream-processing-the-next-step-for-apache-flink/
Flink vs. Storm
Advantages of Flink (cont.):
provides exactly-once sinks
native flow control (back pressure) [2]
higher throughput (> 100x) [3]
no lambda or kappa architecture necessary
native support for iterations (cyclic data flows)
managed memory
[2] http://data-artisans.com/how-flink-handles-backpressure/
[3] http://data-artisans.com/high-throughput-low-latency-and-exactly-once-stream-processing-with-apache-flink/
System Architecture: Storm
[Figure: Storm cluster with one Client, one Nimbus, three Zookeeper nodes, five Supervisors, and six Workers]
System Architecture: Flink
[Figure: Flink cluster with clients (Web Client, CLI, Shell), a JobManager, five TaskManagers, and a second JobManager]
Topology Deployment: Storm
by default: round-robin scheduling
high overhead due to intra-JVM and/or network communication
localOrShuffle connection pattern poorly exploited
isolation of topologies
custom scheduler possible (for experts only)
[Figure: placement of the operator instances Src, T1, T2, F1, F2, C1, C2, Sk]
Topology Deployment: Flink
deploys whole pipeline to each TaskManager
local-forward is default
operator chaining
[Figure: placement of the operator instances Src; T1, T2; F1, F2; C1, C2; Sk]
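The effect of the two deployment strategies on communication can be illustrated with a small simulation. This is only a sketch under simplifying assumptions and not the scheduler of either system: the class `PlacementSketch`, the wiring of the instances (Src to T1/T2, then F1/F2, then C1/C2, then Sk), and the rule that parallel instance k of every operator shares one worker are hypothetical. It counts how many dataflow edges cross a worker boundary under each placement.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class PlacementSketch {
    // Operator instances in deployment order (names taken from the slides).
    static final String[] TASKS = {"Src", "T1", "T2", "F1", "F2", "C1", "C2", "Sk"};

    // Assumed wiring: each edge is {upstream, downstream}.
    static final String[][] EDGES = {
        {"Src", "T1"}, {"Src", "T2"},
        {"T1", "F1"}, {"T2", "F2"},
        {"F1", "C1"}, {"F2", "C2"},
        {"C1", "Sk"}, {"C2", "Sk"}
    };

    // Round-robin: the i-th task goes to worker i mod workers.
    static Map<String, Integer> roundRobin(int workers) {
        Map<String, Integer> placement = new LinkedHashMap<String, Integer>();
        for (int i = 0; i < TASKS.length; i++) {
            placement.put(TASKS[i], i % workers);
        }
        return placement;
    }

    // Pipeline-local: instance k of every operator shares a worker,
    // so T1, F1, C1 sit together and T2, F2, C2 sit together.
    // Tasks without an index (Src, Sk) are placed on worker 0.
    static Map<String, Integer> pipelineLocal(int workers) {
        Map<String, Integer> placement = new LinkedHashMap<String, Integer>();
        for (String task : TASKS) {
            char last = task.charAt(task.length() - 1);
            int index = Character.isDigit(last) ? last - '1' : 0;
            placement.put(task, index % workers);
        }
        return placement;
    }

    // Number of edges whose endpoints live on different workers.
    static int remoteEdges(Map<String, Integer> placement) {
        int remote = 0;
        for (String[] edge : EDGES) {
            if (!placement.get(edge[0]).equals(placement.get(edge[1]))) {
                remote++;
            }
        }
        return remote;
    }

    public static void main(String[] args) {
        System.out.println("round-robin:    " + remoteEdges(roundRobin(4))
                + " of " + EDGES.length + " edges are remote");
        System.out.println("pipeline-local: " + remoteEdges(pipelineLocal(4))
                + " of " + EDGES.length + " edges are remote");
    }
}
```

With four workers, the round-robin placement in this toy model makes every edge remote, while the pipeline-local placement leaves only the two edges that connect to the second pipeline remote. Real clusters will differ, but the sketch shows why local forwarding and chaining reduce serialization and network traffic.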
Storm Compatibility
Allows you to [4]:
execute Storm topologies in Flink
embed Spouts/Bolts in Flink streaming programs
[Figure: Flink software stack. Deployment: Local (JVM, Embedded), Cluster (Standalone, YARN), Cloud (GCE, EC2). Runtime: Distributed Streaming Dataflow. APIs: DataSet API (Batch Processing), Streaming API (Stream Processing). Libraries and compatibility layers: FlinkML (Machine Learning), Gelly (Graph API & Library), Table API (Batch), Hadoop M/R Compatibility, Table API (Streaming), Storm Compatibility.]
[4] https://ci.apache.org/projects/flink/flink-docs-master/apis/storm_compatibility.html
Storm Compatibility: API
Execute whole topologies:
FlinkTopologyBuilder
FlinkSubmitter
FlinkClient
FlinkLocalCluster
Embedded mode:
SpoutWrapper
BoltWrapper
Additionally:
FiniteSpout interface
Storm Compatibility: Internals
Wrappers for Operators and Collectors
redirecting method calls
run() ⇒ nextTuple()
processElement() ⇒ execute()
emit() ⇒ collect()
translating data types
TupleX, POJO ⇔ Tuple/Values
primitive types for single attribute input/output
[Figure: a BoltWrapper encloses the Bolt; processElement() is redirected to execute(), and emit() is redirected to collect() on the Flink Collector]
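The redirection of method calls listed above can be sketched in plain Java. The types below (`MiniBolt`, `MiniStormCollector`, `MiniFlinkCollector`, `MiniBoltWrapper`) are hypothetical, heavily simplified stand-ins and not the real Storm or Flink classes; the sketch only shows the pattern: the wrapper forwards `processElement()` to the bolt's `execute()`, hands the bolt a collector whose `emit()` forwards to `collect()`, and unwraps a single-attribute output into a raw type.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-ins for the Storm side.
interface MiniStormCollector { void emit(List<Object> values); }

interface MiniBolt {
    void prepare(MiniStormCollector collector);
    void execute(List<Object> tuple);
}

// Hypothetical stand-in for the Flink side.
interface MiniFlinkCollector<OUT> { void collect(OUT record); }

// A Storm-style bolt that splits a line into lower-case words.
class MiniTokenizerBolt implements MiniBolt {
    private MiniStormCollector collector;

    public void prepare(MiniStormCollector collector) { this.collector = collector; }

    public void execute(List<Object> tuple) {
        String line = ((String) tuple.get(0)).toLowerCase();
        for (String word : line.split("\\W+")) {
            if (!word.isEmpty()) {
                collector.emit(Collections.<Object>singletonList(word));
            }
        }
    }
}

// The wrapper looks like a Flink operator from the outside and drives the bolt inside.
class MiniBoltWrapper<IN, OUT> {
    private final MiniBolt bolt;

    MiniBoltWrapper(MiniBolt bolt) { this.bolt = bolt; }

    // emit() => collect(): single-attribute output is unwrapped to a raw type.
    @SuppressWarnings("unchecked")
    void open(MiniFlinkCollector<OUT> out) {
        bolt.prepare(values -> out.collect((OUT) values.get(0)));
    }

    // processElement() => execute(): the raw input is wrapped into a one-field tuple.
    void processElement(IN element) {
        bolt.execute(Collections.<Object>singletonList(element));
    }
}

class WrapperSketch {
    static List<String> tokenize(String... lines) {
        List<String> result = new ArrayList<String>();
        MiniBoltWrapper<String, String> wrapper =
                new MiniBoltWrapper<String, String>(new MiniTokenizerBolt());
        wrapper.open(result::add);
        for (String line : lines) {
            wrapper.processElement(line);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("To be, or not to be"));
    }
}
```

The bolt itself is unaware that it runs inside a wrapper, which is what lets existing Spout and Bolt code be reused unchanged.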
WordCount on Storm
public static void main(String[] args) throws Exception {
  TopologyBuilder builder = new TopologyBuilder();
  // wire the topology: source -> tokenizer -> counter -> sink
  builder.setSpout("source", new FileSpout("/tmp/hamlet.txt"));
  builder.setBolt("tokenizer", new BoltTokenizer())
         .shuffleGrouping("source");
  // fieldsGrouping: tuples with the same "word" go to the same task
  builder.setBolt("counter", new BoltCounter())
         .fieldsGrouping("tokenizer", new Fields("word"));
  builder.setBolt("sink", new BoltFileSink("/tmp/count.txt"))
         .shuffleGrouping("counter");
  Config conf = new Config();
  StormSubmitter.submitTopology("WordCount", conf,
                                builder.createTopology());
}
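The `fieldsGrouping("tokenizer", new Fields("word"))` connection partitions the stream by the `word` field, so every occurrence of a word reaches the same counter task and each task can keep a local count. The following is a minimal sketch of that idea only; routing by hash code modulo the number of tasks, and the names `FieldsGroupingSketch`, `taskFor`, and `count`, are illustrative assumptions and not Storm's implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FieldsGroupingSketch {
    // Assumed routing rule: hash of the grouping field modulo the task count.
    static int taskFor(String word, int numTasks) {
        return Math.floorMod(word.hashCode(), numTasks);
    }

    // Each counter task keeps its own local map; no shared state is needed
    // because a given word is always routed to the same task.
    static List<Map<String, Integer>> count(String[] words, int numTasks) {
        List<Map<String, Integer>> tasks = new ArrayList<Map<String, Integer>>();
        for (int i = 0; i < numTasks; i++) {
            tasks.add(new HashMap<String, Integer>());
        }
        for (String word : words) {
            tasks.get(taskFor(word, numTasks)).merge(word, 1, Integer::sum);
        }
        return tasks;
    }

    public static void main(String[] args) {
        String[] words = {"to", "be", "or", "not", "to", "be"};
        List<Map<String, Integer>> tasks = count(words, 3);
        for (int i = 0; i < tasks.size(); i++) {
            System.out.println("counter task " + i + ": " + tasks.get(i));
        }
    }
}
```

A shuffle grouping, by contrast, would spread occurrences of the same word over several tasks, and the partial counts would have to be merged afterwards.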
WordCount on Flink
public static void main(String[] args) throws Exception {
  // changed: FlinkTopologyBuilder instead of TopologyBuilder
  FlinkTopologyBuilder builder = new FlinkTopologyBuilder();
  builder.setSpout("source", new FileSpout("/tmp/hamlet.txt"));
  builder.setBolt("tokenizer", new BoltTokenizer())
         .shuffleGrouping("source");
  builder.setBolt("counter", new BoltCounter())
         .fieldsGrouping("tokenizer", new Fields("word"));
  builder.setBolt("sink", new BoltFileSink("/tmp/count.txt"))
         .shuffleGrouping("counter");
  Config conf = new Config();
  // changed: FlinkSubmitter instead of StormSubmitter
  FlinkSubmitter.submitTopology("WordCount", conf,
                                builder.createTopology());
}
Storm on Flink
To run a Storm topology on Flink, changing two lines of code is sufficient:
FlinkTopologyBuilder replaces TopologyBuilder
FlinkSubmitter replaces StormSubmitter
WordCount: Embedded Spout
public static void main(String[] args) throws Exception {
  StreamExecutionEnvironment env =
      StreamExecutionEnvironment.getExecutionEnvironment();
  // wrap the Storm spout so it can be used as a Flink source
  DataStream<Tuple1<String>> source = env.addSource(
      new SpoutWrapper<Tuple1<String>>(
          new FileSpout("/tmp/hamlet.txt")),
      TypeExtractor.getForObject(new Tuple1<String>("")));
  // do further processing on source
  source.flatMap(new Tokenizer()) // out -> Tuple2<String,Integer>
        .keyBy(0).sum(1).writeAsText("/tmp/count.txt");
  env.execute("WordCount");
}
WordCount: Embedded Bolt
public static void main(String[] args) throws Exception {
  StreamExecutionEnvironment env =
      StreamExecutionEnvironment.getExecutionEnvironment();
  DataStream<String> text = env.readTextFile("/tmp/hamlet.txt");
  // wrap the Storm bolt so it can be used as a Flink operator
  DataStream<Tuple2<String,Integer>> tokens = text.transform(
      "tokenizer",
      TypeExtractor.getForObject(
          new Tuple2<String,Integer>("", 0)),
      new BoltWrapper<String, Tuple2<String,Integer>>(
          new BoltTokenizer()));
  // do further processing on tokens
  tokens.keyBy(0).sum(1).writeAsText("/tmp/count.txt");
  env.execute("WordCount");
}
Embedded Compatibility Mode
Re-use code within Flink streaming program:
Spouts as Flink sources
Bolts as Flink operators
Pros:
mix-and-match of Storm and Flink operators
configure Spouts/Bolts (Map/Config)
splitting Spout/Bolt output streams
type-safe embedding
also raw types, i.e., String instead of Tuple1<String>
convert infinite Spouts to finite sources
FiniteSpout interface
Cons: currently, quite a lot of boilerplate code is necessary :/
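The idea of turning an infinite Spout into a finite source can be sketched as follows. A Storm spout is normally polled via `nextTuple()` in an endless loop; a finite spout additionally reports when its input is exhausted, so the surrounding source can terminate. The interfaces below (`MiniSpout`, `MiniFiniteSpout`), the method name `reachedEnd()`, and the driver loop are simplified, hypothetical stand-ins for illustration and may differ from the actual FiniteSpout interface.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical, simplified spout: emits at most one record per call.
interface MiniSpout {
    void nextTuple(Consumer<String> collector);
}

// Hypothetical finite variant: can tell the caller when it is done.
interface MiniFiniteSpout extends MiniSpout {
    boolean reachedEnd();
}

// A finite spout backed by an in-memory list of lines.
class LineSpout implements MiniFiniteSpout {
    private final Iterator<String> lines;

    LineSpout(List<String> lines) { this.lines = lines.iterator(); }

    public void nextTuple(Consumer<String> collector) {
        if (lines.hasNext()) {
            collector.accept(lines.next());
        }
    }

    public boolean reachedEnd() { return !lines.hasNext(); }
}

class FiniteSpoutSketch {
    // Plays the role of the source's run() method: run() => nextTuple().
    static List<String> run(MiniSpout spout) {
        if (!(spout instanceof MiniFiniteSpout)) {
            // A plain spout never signals completion; this sketch refuses to loop forever.
            throw new IllegalArgumentException("spout is not finite");
        }
        MiniFiniteSpout finite = (MiniFiniteSpout) spout;
        List<String> out = new ArrayList<String>();
        while (!finite.reachedEnd()) {
            finite.nextTuple(out::add);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run(new LineSpout(Arrays.asList("line 1", "line 2", "line 3"))));
    }
}
```

A finite source of this kind lets a program that reads a bounded input, such as a file, finish on its own instead of having to be cancelled.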
Outlook: Storm Compatibility
Current status:
available in master branch
based on Storm 0.9.4
will be part of Flink 0.10.0
Work in progress:
Hooks
Metrics
Next steps:
enable fault-tolerance
introduce FlinkTridentTopology
improve embedded mode (StormEnvironment)
Thanks!
