Upcoming Features:
Apache Flink™ 0.10
Aljoscha Krettek
aljoscha@apache.org
What to Expect
• High availability of the master node (JobManager)
• Live monitoring
• Event time, watermarks, and windowing improvements
• Demo: fault tolerance
These are only the highlights; more stuff is being worked on!
High Availability
Status Quo
[Figure: one JobManager and one TaskManager. If the JobManager fails: PANIC!]
With High Availability
[Figure: a JobManager, a TaskManager, a stand-by JobManager, and Apache ZooKeeper™. If the JobManager fails: KEEP GOING]
Some Details
• Flink uses ZooKeeper™ for two things:
  • Leader election (in case of multiple JobManagers)
  • Reliable storage of the dataflow graph and checkpoint metadata (more on that later)
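The two roles listed above can be pictured with a toy model. This is a minimal in-memory sketch, not a ZooKeeper client and not Flink's code: `ToyCoordinator`, the JobManager names, and the `latest_checkpoint` key are all hypothetical. It assumes the common recipe in which each candidate registers under an increasing sequence number and the lowest live number leads.

```python
import itertools


class ToyCoordinator:
    """In-memory stand-in for a coordination service (NOT a ZooKeeper client)."""

    def __init__(self):
        self._seq = itertools.count()
        self._candidates = {}   # sequence number -> candidate name
        self.metadata = {}      # outlives any single candidate

    def register(self, name):
        """Register a candidate under the next sequence number."""
        seq = next(self._seq)
        self._candidates[seq] = name
        return seq

    def unregister(self, seq):
        """The candidate died or lost its session, so its entry disappears."""
        self._candidates.pop(seq, None)

    def leader(self):
        """The live candidate with the lowest sequence number leads."""
        if not self._candidates:
            return None
        return self._candidates[min(self._candidates)]


coord = ToyCoordinator()
jm1 = coord.register("jobmanager-1")
jm2 = coord.register("jobmanager-2")   # stand-by

coord.metadata["latest_checkpoint"] = "checkpoint-7"   # hypothetical metadata
leader_before = coord.leader()

coord.unregister(jm1)                  # the active JobManager fails
leader_after = coord.leader()
recovered = coord.metadata["latest_checkpoint"]
print(leader_before, "->", leader_after, "resumes from", recovered)
```

The stand-by takes over and still finds the checkpoint metadata, because that metadata lives in the coordinator rather than in the failed process.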
Live Monitoring
• Before:
  • Accumulators only available after the job finishes
• Now:
  • Accumulators updated while the job is running
  • System accumulators (number of bytes/records processed, …)
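To make the before/now difference concrete, here is a small sketch in plain Python. It does not use Flink's accumulator API; `run_job` and the counter names are made up for illustration. The job yields after every record so a monitor can read the counters mid-run.

```python
def run_job(records, accumulators):
    """Process records one at a time, updating accumulators as it goes."""
    for record in records:
        accumulators["records_processed"] += 1
        accumulators["bytes_processed"] += len(record.encode("utf-8"))
        yield  # hand control back so a monitor can look at the counters


accumulators = {"records_processed": 0, "bytes_processed": 0}
observed = []

for _ in run_job(["a", "bb", "ccc"], accumulators):
    # "Now": the monitor sees intermediate values while the job is running.
    observed.append(dict(accumulators))

# "Before": only this final value would have been visible.
final = dict(accumulators)
print(observed)
print(final)
```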
Timestamps, Watermarks, and the Rest™
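Why all the Fuss?
[Figure: a stream of elements with timestamps 11, 2, 13, 1, 14, 3 flows into a Window Operator. Each element carries a payload (e.g., 0x45FD) and a timestamp (e.g., 13). Which window does each element belong to?]
Elements do not arrive ordered by timestamp.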
Processing Time Windows
[Figure: the same stream into the Window Operator. Windows are formed by arrival: {11, 2, 13} and {1, 14, 3}.]
Elements do not arrive ordered by timestamp.
Event Time Windows
[Figure: the same stream into the Window Operator. Windows are formed by timestamp: {11, 13, 14} and {3, 1, 2}.]
Elements do not arrive ordered by timestamp.
Problem: How do you know when to process windows?
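The contrast between the two window types can be sketched in a few lines. This reads the element timestamps in the figures as 11, 2, 13, 1, 14, 3, which is an interpretation of the diagram. The window size of 10 and the cut after three arrivals are assumptions chosen to reproduce the groupings shown.

```python
from collections import defaultdict

# Timestamps in arrival order, as read from the figure (an interpretation).
arrivals = [11, 2, 13, 1, 14, 3]


def processing_time_windows(timestamps, per_window):
    """Group by arrival position: whatever shows up together is windowed together."""
    return [timestamps[i:i + per_window]
            for i in range(0, len(timestamps), per_window)]


def event_time_windows(timestamps, size):
    """Group by the timestamp carried in each element (tumbling windows)."""
    windows = defaultdict(list)
    for ts in timestamps:
        windows[ts // size * size].append(ts)
    return dict(sorted(windows.items()))


by_arrival = processing_time_windows(arrivals, per_window=3)
by_event_time = event_time_windows(arrivals, size=10)
print(by_arrival)
print(by_event_time)
```

Processing-time windows mix old and new timestamps. Event-time windows separate them, but only once every element has been seen, which is exactly the problem the slide raises.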
Watermarks to the Rescue
[Figure: a Source emits a stream of timestamped elements with watermarks interleaved; one marker is labelled "This is a Watermark".]
Some Details
• The window operator waits for watermarks
• When a watermark arrives, we can process elements with timestamps lower than the watermark
• Operators forward watermarks once they know they cannot emit elements with a lower timestamp
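A minimal sketch of the first two points, in plain Python rather than Flink's API: `Watermark` and `TumblingWindowOperator` are hypothetical names, the window size of 10 is an assumption, and forwarding watermarks downstream is not modelled. The operator buffers elements and fires a window only when a watermark shows that all of that window's timestamps lie below it.

```python
from collections import defaultdict


class Watermark:
    """Marker saying: no element with a lower timestamp will follow."""

    def __init__(self, timestamp):
        self.timestamp = timestamp


class TumblingWindowOperator:
    def __init__(self, size):
        self.size = size
        self.buffers = defaultdict(list)  # window start -> buffered timestamps

    def process(self, item):
        """Buffer elements; on a watermark, emit every window it has passed."""
        if not isinstance(item, Watermark):
            self.buffers[item // self.size * self.size].append(item)
            return []
        fired = []
        for start in sorted(self.buffers):
            # Window [start, start + size) is complete once the watermark
            # has reached its end.
            if start + self.size <= item.timestamp:
                fired.append((start, sorted(self.buffers.pop(start))))
        return fired


op = TumblingWindowOperator(size=10)
stream = [11, 2, 13, 1, 3, Watermark(10), 14, Watermark(20)]
emitted = []
for item in stream:
    emitted.extend(op.process(item))
print(emitted)
```

Elements 11 and 13 arrive early but stay buffered until the second watermark, so out-of-order arrival does not affect the result.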
Fault Tolerance
Streaming Fault Tolerance
• Ensure that operators see all events
  • “At least once”
  • Solved by replaying a stream from a checkpoint, e.g., from a past Kafka offset
• Ensure that operators do not perform duplicate updates to their state
  • “Exactly once”
  • Several solutions
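The difference between the two guarantees shows up in a toy replay. This is a sketch under stated assumptions, not Flink code: `run` is a hypothetical operator that sums events from a replayable log, checkpoints after every second event, and can be told to crash at a given offset.

```python
events = [5, 1, 4, 2, 3]   # replayable log, addressed by offset


def run(events, start_offset, state, fail_at=None):
    """Sum events from start_offset; optionally crash before processing fail_at."""
    checkpoint = (start_offset, state)
    for offset in range(start_offset, len(events)):
        if offset == fail_at:
            return state, checkpoint, False
        state += events[offset]
        if offset % 2 == 1:   # checkpoint after every second event
            checkpoint = (offset + 1, state)
    return state, checkpoint, True


# First attempt crashes before offset 3; the last checkpoint covers offsets 0-1.
state_at_crash, (ckpt_offset, ckpt_state), done = run(events, 0, 0, fail_at=3)

# At least once: replay from the checkpointed offset, but onto state that
# already includes later updates (e.g., state kept in an external store).
at_least_once, _, _ = run(events, ckpt_offset, state_at_crash)

# Exactly once: restore the state together with the offset, then replay.
exactly_once, _, _ = run(events, ckpt_offset, ckpt_state)
print(at_least_once, exactly_once, sum(events))
```

With replay alone, the event at offset 2 is applied twice. Restoring state and offset from the same checkpoint gives the correct sum.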
Exactly-Once Approaches
• Discretized streams (Spark Streaming)
  • Treat streaming as a series of small atomic computations
  • “Fast track” to fault tolerance, but restricts the computational and programming model (e.g., cannot mutate state across “mini-batches”, window functions correlated with mini-batch size)
• MillWheel (Google Cloud Dataflow)
  • State update and derived events committed as an atomic transaction to a high-throughput transactional store
  • Requires a very high-throughput transactional store
• Chandy-Lamport distributed snapshots (Flink)
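The snapshot approach can be pictured with a toy model of marker-based snapshots: special barrier records flow with the data, and an operator records its state once a barrier has arrived on every input. This is a minimal sketch in the spirit of Chandy-Lamport markers, not Flink's implementation; `SummingOperator`, the two-channel setup, and the delivery order are all hypothetical.

```python
BARRIER = "barrier"


class SummingOperator:
    """Two-input operator that snapshots its state once both barriers arrived."""

    def __init__(self):
        self.total = 0
        self.blocked = set()    # inputs whose barrier has already arrived
        self.held_back = []     # records that arrived behind a barrier
        self.snapshots = []

    def receive(self, channel, item):
        if item == BARRIER:
            self.blocked.add(channel)
            if self.blocked == {0, 1}:
                # Barriers aligned: state reflects exactly the pre-barrier records.
                self.snapshots.append(self.total)
                self.blocked.clear()
                pending, self.held_back = self.held_back, []
                for value in pending:
                    self.total += value
        elif channel in self.blocked:
            self.held_back.append(item)   # belongs to the next checkpoint
        else:
            self.total += item


op = SummingOperator()
deliveries = [(0, 1), (1, 10), (0, BARRIER), (0, 2),
              (1, 20), (1, BARRIER), (1, 30), (0, 3)]
for channel, item in deliveries:
    op.receive(channel, item)
print(op.snapshots, op.total)
```

The snapshot holds 1 + 10 + 20, the records ahead of the barrier on each input. The record 2 arrived early on channel 0 and is held back until the barriers align. Processing continues throughout, so the snapshot never requires stopping the stream.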
Best of all Worlds for Streaming
• Low latency
  • Thanks to the pipelined engine
• Exactly-once guarantees
  • Variation of Chandy-Lamport
• High throughput
  • Controllable checkpointing overhead
• Separates app logic from recovery
  • Checkpointing interval is just a config parameter
Demo time
flink-forward.org
I ♥ Flink, do you?
If you find this exciting, get involved and start a discussion on Flink's mailing list, or stay tuned by subscribing to news@flink.apache.org and following flink.apache.org/blog and @ApacheFlink on Twitter.
