Till Rohrmann
till@data-artisans.com
@stsffap
From Apache Flink® 1.3 to 1.4
Original creators of Apache Flink®
Providers of dA Platform 2, including open source Apache Flink + dA Application Manager
Overview
Apache Flink 1.3 – Previously on Apache Flink
Apache Flink 1.4 – What’s happening now?
Apache Flink 1.5+ – Next on Apache Flink
Previously on Apache Flink
Apache Flink 1.3
Apache Flink 1.3 in Numbers
141 contributors (no deduplication)
1400 commits
>= 680 resolved JIRA issues
+261813 / -65646 LOC
Evolution of Flink’s API
Flink 1.0.0
State API (ValueState, ReducingState, ListState)
Flink 1.1.0
Session Windows
Late arriving events
Flink 1.2.0
ProcessFunction (access to state, timers, events)
Flink 1.3.0
Side outputs
Access to per-window state
Side Outputs
• Additional outputs for a stream
• Late events
• Corrupted input data
• More expressive APIs
• FLINK-4460
[Figure: a ProcessFunction with a main output and a side output]
Side Outputs: Example
DataStream<Integer> input = ...;

final OutputTag<String> outputTag = new OutputTag<String>("side-output"){};

SingleOutputStreamOperator<Integer> mainDataStream = input
    .process(new ProcessFunction<Integer, Integer>() {
        @Override
        public void processElement(
                Integer value,
                Context ctx,
                Collector<Integer> out) throws Exception {
            // emit data to regular output
            out.collect(value);
            // emit data to side output
            ctx.output(outputTag, "sideout-" + String.valueOf(value));
        }
    });

DataStream<String> sideOutputStream = mainDataStream.getSideOutput(outputTag);
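The routing idea behind side outputs can also be tried without a cluster. The sketch below is plain Java and not Flink API: the class `SideOutputSketch` and its `route` method are hypothetical names, and treating unparseable records as "corrupted input data" is an assumption made only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of main/side output routing. This is NOT Flink API.
final class SideOutputSketch {
    final List<Integer> mainOutput = new ArrayList<>();
    final List<String> sideOutput = new ArrayList<>();

    // Records that parse as integers go to the main output,
    // everything else is tagged and sent to the side output.
    void route(String record) {
        try {
            mainOutput.add(Integer.parseInt(record.trim()));
        } catch (NumberFormatException e) {
            sideOutput.add("corrupted-" + record);
        }
    }

    public static void main(String[] args) {
        SideOutputSketch sketch = new SideOutputSketch();
        for (String record : Arrays.asList("1", "x2", " 3 ", "")) {
            sketch.route(record);
        }
        System.out.println("main: " + sketch.mainOutput);
        System.out.println("side: " + sketch.sideOutput);
    }
}
```

Keeping the two outputs typed differently (integers versus strings) mirrors the example above, where the side output carries a different type than the main stream.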
Evolution of Large State Handling
Flink 1.0.0
RocksDB for out-of-core state support
Flink 1.1.0
Fully async RocksDB snapshots
Flink 1.2.0
Rescalable keyed and non-partitioned state
Flink 1.3.0
Incremental checkpoints
Fine-grained recovery
Full Checkpoints
[Figure: the state is {A, B, C, D} at @t1, {A, F, C, D, E} at @t2 and {G, H, C, D, I, E} at @t3. Checkpoints 1, 2 and 3 each store the complete state at that time.]
Incremental Checkpoints
[Figure: the state is {A, B, C, D} at @t1, {A, F, C, D, E} at @t2 and {G, H, C, D, I, E} at @t3. Checkpoint 1 stores A, B, C, D; Checkpoint 2 stores only E, F; Checkpoint 3 stores only G, H, I.]
Incremental Checkpoints
[Figure: Checkpoints 1 to 4 reference chunks C1 to C4; the chunks (Chunk 1 to Chunk 4) are kept in storage.]
Incremental Checkpointing Contd.
Currently supported for RocksDB state backend
FLINK-5053
Faster and smaller checkpoints

        Full checkpoint    Incremental checkpoint
Size    60 GB              1 – 30 GB
Time    180 s              3 – 30 s
“A Look at Flink’s Internal Data Structures and Algorithms for Efficient Checkpointing” by Stefan Richter, Tomorrow @ 12:20 pm Maschinenhaus
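The difference between full and incremental checkpoints can be sketched with the A to I entries from the checkpoint figures. This is a simplified plain-Java model, not Flink's or RocksDB's implementation: `CheckpointSketch` and its methods are hypothetical names, and the model only tracks newly added entries (it ignores how entries that disappeared, such as B, are eventually cleaned up).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Plain-Java model of full vs. incremental checkpoints. NOT Flink's implementation.
final class CheckpointSketch {
    static Set<String> set(String... entries) {
        return new LinkedHashSet<>(Arrays.asList(entries));
    }

    // For each snapshot, keep only the entries that no earlier checkpoint stored.
    static List<Set<String>> incremental(List<Set<String>> snapshots) {
        Set<String> alreadyStored = new LinkedHashSet<>();
        List<Set<String>> deltas = new ArrayList<>();
        for (Set<String> snapshot : snapshots) {
            Set<String> delta = new LinkedHashSet<>(snapshot);
            delta.removeAll(alreadyStored);
            alreadyStored.addAll(delta);
            deltas.add(delta);
        }
        return deltas;
    }

    // Number of entries written across all checkpoints.
    static int totalEntries(List<Set<String>> checkpoints) {
        int total = 0;
        for (Set<String> checkpoint : checkpoints) {
            total += checkpoint.size();
        }
        return total;
    }

    public static void main(String[] args) {
        List<Set<String>> snapshots = Arrays.asList(
            set("A", "B", "C", "D"),            // state at @t1
            set("A", "F", "C", "D", "E"),       // state at @t2
            set("G", "H", "C", "D", "I", "E")); // state at @t3
        System.out.println("full writes:        " + totalEntries(snapshots));
        System.out.println("incremental writes: " + totalEntries(incremental(snapshots)));
        System.out.println("deltas:             " + incremental(snapshots));
    }
}
```

A full checkpoint writes every entry each time, while the incremental variant writes an entry once and later checkpoints only reference it, which is where the smaller size and shorter time in the table come from.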
Evolution of High Level APIs
Flink 1.0.0
CEP library added
Table API v1
Flink 1.1.0
Table API overhaul
Integration with Apache Calcite
Flink 1.2.0
Tumbling, sliding and session group-windows for Table API
Flink 1.3.0
Rescalable CEP operators
Retractions in Table API/SQL
Enriched CEP Language
Support for quantifiers (+, *, ?)
FLINK-3318
Iterative conditions
FLINK-6197
Not operator
FLINK-3320
“Complex Event Processing With Flink: The State of FlinkCEP” by Kostas Kloudas, Today @ 2:30 pm Maschinenhaus
CEP: Detect Dipping Stocks
DataStream<Stock> stocks = …;

Pattern<Stock, ?> pattern = Pattern
    .<Stock>begin("rising")
    .where(new IterativeCondition<Stock>() {
        @Override
        public boolean filter(Stock stock, Context<Stock> ctx) throws Exception {
            // calculate the average price of the events matched so far
            double sum = 0.0; int count = 0;
            for (Stock previousStock : ctx.getEventsForPattern("rising")) {
                sum += previousStock.getPrice(); count++;
            }
            // accept the first event; afterwards only accept prices greater
            // than or equal to the average (avoids comparing against 0.0 / 0)
            return count == 0 || stock.getPrice() >= sum / count;
        }
    }).oneOrMore()
    .next("falling");

PatternStream<Stock> dippingStocks = CEP.pattern(stocks.keyBy("name"), pattern);
DataStream<String> namesOfDippingStocks = dippingStocks.select(…);
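To see what the iterative condition does to a concrete price series, the following plain-Java sketch evaluates it greedily over a list. It is not FlinkCEP: `RisingConditionSketch`, `accepts` and `risingPrefix` are hypothetical names, the first event is accepted by assumption, and the sketch deliberately ignores that `oneOrMore()` in a CEP library can produce several alternative matches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of an iterative condition. NOT FlinkCEP.
final class RisingConditionSketch {
    // Accept a price if it is at least the average of the prices accepted so far.
    static boolean accepts(List<Double> accepted, double price) {
        if (accepted.isEmpty()) {
            return true; // assumption: the first event always starts the pattern
        }
        double sum = 0.0;
        for (double previous : accepted) {
            sum += previous;
        }
        return price >= sum / accepted.size();
    }

    // Greedily collect events from the start until the condition fails.
    static List<Double> risingPrefix(List<Double> prices) {
        List<Double> accepted = new ArrayList<>();
        for (double price : prices) {
            if (!accepts(accepted, price)) {
                break;
            }
            accepted.add(price);
        }
        return accepted;
    }

    public static void main(String[] args) {
        List<Double> prices = Arrays.asList(10.0, 12.0, 11.0, 9.0);
        System.out.println("rising part: " + risingPrefix(prices));
    }
}
```

The condition depends on previously accepted events rather than on the current event alone, which is what distinguishes an iterative condition from a simple filter.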
What’s Happening Now?
Apache Flink 1.4
Event Driven I/O
Rework of Flink’s network stack
Event driven network I/O
Use full available capacity
Near perfect latency behaviour
[Figure: TCP buffer with capacity left; flush]
Flow Control
• Flow control for TaskManager communication
• Single channel no longer stalls other multiplexed channels
• Fine-grained backpressure control
• Improves checkpoint alignments
“Building a Network Stack for Optimal Throughput / Low-Latency Trade-Offs” by Nico Kruber, Today @ 2:00 pm Palais Atelier
[Figure: a receiver gives credit to Sender #1 and Sender #2; the senders send credited data]
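The "give credit, send credited data" exchange can be modelled in a few lines. This is a minimal plain-Java sketch of credit-based flow control in general, not Flink's network stack: `CreditFlowSketch`, `Sender` and the method names are hypothetical, and one credit is assumed to stand for one buffer.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Plain-Java model of credit-based flow control. NOT Flink's network stack.
final class CreditFlowSketch {
    static final class Sender {
        private final Queue<String> pending = new ArrayDeque<>();
        private int credit = 0;

        void enqueue(String buffer) {
            pending.add(buffer);
        }

        // Called when the receiver announces free buffers for this sender.
        void grantCredit(int buffers) {
            credit += buffers;
        }

        // Send only as many pending buffers as the receiver has credited.
        List<String> sendCredited() {
            List<String> sent = new ArrayList<>();
            while (credit > 0 && !pending.isEmpty()) {
                sent.add(pending.poll());
                credit--;
            }
            return sent;
        }

        int backlog() {
            return pending.size();
        }
    }

    public static void main(String[] args) {
        Sender first = new Sender();
        Sender second = new Sender();
        first.enqueue("a1");
        first.enqueue("a2");
        second.enqueue("b1");

        // The receiver only has room for the first sender right now.
        first.grantCredit(2);
        System.out.println("sender #1 sends " + first.sendCredited());
        System.out.println("sender #2 sends " + second.sendCredited()
            + ", backlog " + second.backlog());
    }
}
```

Because each sender waits on its own credit, a sender without credit keeps its data queued locally instead of blocking a connection that other channels share.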
New Deployment Model
Rework of Flink’s distributed architecture
Ready for a multitude of deployment scenarios
Support for dynamic scaling
“Flink in Containerland” by Patrick Lucas, Tomorrow @ 3:20 pm Maschinenhaus
Producing Exactly Once with Kafka 0.11
Support for Kafka 0.11
First Kafka producer with exactly once processing guarantees
“Hit Me, Baby, Just One Time – Building End-to-End Exactly Once Applications With Flink” by Piotr Nowojski, Today @ 3:20 pm Palais Atelier
[Figure: consuming and producing, with end-to-end exactly once processing]
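One general way to get exactly once output is to tie sink transactions to checkpoints: records are written into an open transaction, the transaction is pre-committed when a checkpoint is taken, and it is committed only once that checkpoint completes. The plain-Java sketch below models this idea under stated assumptions. It is not the Flink Kafka producer and uses no Kafka API; `TransactionalSinkSketch` and all of its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Plain-Java model of checkpoint-aligned transactions. NOT the Flink Kafka producer.
final class TransactionalSinkSketch {
    private List<String> open = new ArrayList<>();
    private final TreeMap<Long, List<String>> pending = new TreeMap<>();
    final List<String> committed = new ArrayList<>();

    // Records are only buffered in the open transaction, not yet visible.
    void write(String record) {
        open.add(record);
    }

    // On checkpoint: seal the open transaction and start a new one.
    void preCommit(long checkpointId) {
        pending.put(checkpointId, open);
        open = new ArrayList<>();
    }

    // On checkpoint completion: make all sealed transactions up to this id visible.
    void checkpointCompleted(long checkpointId) {
        NavigableMap<Long, List<String>> done = pending.headMap(checkpointId, true);
        for (List<String> transaction : done.values()) {
            committed.addAll(transaction);
        }
        done.clear(); // headMap is a view, so this removes them from pending
    }

    // On failure: drop uncommitted writes; the source replays them after recovery.
    void abortOpen() {
        open.clear();
    }

    public static void main(String[] args) {
        TransactionalSinkSketch sink = new TransactionalSinkSketch();
        sink.write("a");
        sink.write("b");
        sink.preCommit(1L);
        sink.write("c");
        sink.checkpointCompleted(1L);
        sink.abortOpen();           // failure after checkpoint 1
        sink.write("c");            // "c" is replayed after recovery
        sink.preCommit(2L);
        sink.checkpointCompleted(2L);
        System.out.println(sink.committed);
    }
}
```

In this model a replayed record never shows up twice, because the first attempt at writing it was discarded together with its uncommitted transaction.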
StreamSQL and Table API
Support for retractions
Extended aggregation support
Support for external table catalogs
Window joins
“Unified Stream and Batch Processing With Apache Flink’s Relational APIs” by Fabian Hüske, Tomorrow @ 11:00 am Kesselhaus
“From Streams to Tables and Back Again: A Demo of Flink’s Table & SQL API” by Timo Walther, Tomorrow @ 11:50 am Kesselhaus
Operational Robustness
Drop Java 7
Support Scala 2.12
Avoid dependency hell
Child first class loading
Relocation of dependencies
De-Hadoopification
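Child-first class loading means that a class is looked up in the user code jars before the parent class loader is asked, so an application can bring its own version of a library. The sketch below shows the general technique with the standard `URLClassLoader`. It is not Flink's class loader: `ChildFirstClassLoaderSketch` and `loadOrNull` are hypothetical names, and always delegating `java.` classes to the parent is an assumption of this sketch.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Sketch of a child-first class loader. NOT Flink's implementation.
final class ChildFirstClassLoaderSketch extends URLClassLoader {
    ChildFirstClassLoaderSketch(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> loaded = findLoadedClass(name);
            if (loaded == null && !name.startsWith("java.")) {
                try {
                    // child first: look in our own URLs before asking the parent
                    loaded = findClass(name);
                } catch (ClassNotFoundException e) {
                    // not in the child, fall through to the parent
                }
            }
            if (loaded == null) {
                // regular parent-first delegation as a fallback
                loaded = super.loadClass(name, false);
            }
            if (resolve) {
                resolveClass(loaded);
            }
            return loaded;
        }
    }

    // Convenience helper for the example: returns null instead of throwing.
    static Class<?> loadOrNull(ClassLoader loader, String name) {
        try {
            return loader.loadClass(name);
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        ClassLoader parent = ChildFirstClassLoaderSketch.class.getClassLoader();
        // No URLs here, so every lookup ends up at the parent.
        ChildFirstClassLoaderSketch loader = new ChildFirstClassLoaderSketch(new URL[0], parent);
        System.out.println(loadOrNull(loader, "java.util.ArrayList"));
        System.out.println(loadOrNull(loader, "does.not.Exist"));
    }
}
```

With real jar URLs, a class present in both the jar and the parent would be taken from the jar, which is how conflicting dependency versions are kept apart.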
Next on Apache Flink
Apache Flink 1.5+
Side Inputs
• Additional input for operator
• Join with static data set
• Feeding of externally trained ML model
• Window joins
• FLIP-17 design document: https://goo.gl/W4yMEu
[Figure: a ProcessFunction with a main input and a side input]
State Management & Evolution
Eager state declaration
State type, serializer and name known at pre-flight time
FLIP-22 design document: https://goo.gl/trFiSi
Evolving existing state
Schema updates
Serializer upgrades
“Managing State in Apache Flink” by Tzu-Li Tai, Today @ 4:30 pm Kesselhaus
State Replication
Replicate state between TaskManagers
Faster recovery in case of failures
High throughput queryable state
[Figure: two TaskManagers, with labels Input, State and Change log stream]
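Replicating state through a change log can be sketched as follows: the primary applies each update and appends the resulting value to a log, and a replica replays the log to catch up. This is a plain-Java illustration of the general idea only. The slides do not describe the planned design in detail, so `StateReplicationSketch`, `Primary`, `Replica` and `Change` are hypothetical names and the per-key counter state is an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of change-log based state replication. NOT a Flink design.
final class StateReplicationSketch {
    static final class Change {
        final String key;
        final long value;

        Change(String key, long value) {
            this.key = key;
            this.value = value;
        }
    }

    static final class Primary {
        final Map<String, Long> state = new HashMap<>();
        final List<Change> changeLog = new ArrayList<>();

        // Apply the update locally and record the new value in the change log.
        void add(String key, long delta) {
            long updated = state.merge(key, delta, Long::sum);
            changeLog.add(new Change(key, updated));
        }
    }

    static final class Replica {
        final Map<String, Long> state = new HashMap<>();
        private int applied = 0;

        // Replay all log entries that have not been applied yet.
        void catchUp(List<Change> changeLog) {
            for (; applied < changeLog.size(); applied++) {
                Change change = changeLog.get(applied);
                state.put(change.key, change.value);
            }
        }
    }

    public static void main(String[] args) {
        Primary primary = new Primary();
        Replica replica = new Replica();
        primary.add("clicks", 1);
        primary.add("clicks", 2);
        primary.add("views", 5);
        replica.catchUp(primary.changeLog);
        System.out.println("replica state: " + replica.state);
    }
}
```

A replica that is already caught up can take over after a failure or answer queries without the state having to be restored from a checkpoint first.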
Programmatic Job Control
Improve client to give better job control
Run concurrent jobs from the same program
Trigger savepoints programmatically
Better testing facilities
JobClient & ClusterClient
StreamExecutionEnvironment env = ...;
// define program
JobClient jobClient = env.execute();
CompletableFuture<Acknowledge> savepointFuture = jobClient.takeSavepoint(savepointPath);
// wait for the savepoint completion
savepointFuture.get();
CompletableFuture<JobExecutionResult> resultFuture = jobClient.getResultFuture();
// cancel the job
jobClient.cancelJob();
// get the execution result --> should be canceled
JobExecutionResult result = resultFuture.get();
// get list of all still running jobs on the cluster
ClusterClient clusterClient = jobClient.getClusterClient();
CompletableFuture<List<JobInfo>> jobInfosFuture = clusterClient.getJobInfos();
List<JobInfo> jobInfos = jobInfosFuture.get();
TL;DL
Apache Flink is one of the most innovative open source stream processing platforms
Stay tuned for what’s happening next
Visit the in-depth talks to learn more about Flink’s internals
Thank you!
@stsffap
@ApacheFlink
@dataArtisans
We are hiring!
data-artisans.com/careers



Editor's Notes

  • #3: A little bit about myself: I am a committer for Apache Flink and a software engineer at data Artisans, the original creators of Apache Flink and the providers of the dA Platform.
  • #21: Flink as a library; containerized execution
  • #24: Relocated dependencies: asm, guava, jackson, netty