Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem and Apache Flink’s accelerated growth
Some practical information
Network name: Flink Forward 2016
Password: #flinkforward16
Twitter handle: @flinkforward
Hashtag: #ff16
Group photo today at 3.30 pm
All talks will be recorded and can be found on our YouTube channel
“Apache Flink Berlin” after the conference
FlinkFest today at Palais starting at 6.10 pm
Attention: there are some last-minute changes to the
program; please consult the online schedule
3
The Venue
4
A big thanks to our sponsors!
5
A big thanks to our program committee!
Tyler Akidau
Google
Stephan Ewen
data Artisans
Jamie Grier
data Artisans
Vasia Kalavri
KTH
Neha Narkhede
Confluent
6
A big thanks to our speakers!
7
A big thanks to our speakers!
8
Kostas Tzoumas
Stephan Ewen
Flink Forward
September 12, 2016
The data streaming ecosystem and
Apache Flink®: present and future
9
We were founded by the original creators of Apache Flink®. Our goal
is to make stream processing accessible to the enterprise:
• Contributing to the Flink community and helping it grow
• Providing enterprise support and services
Streaming is a rapidly growing and maturing
market category of its own
Streaming is the biggest change in data
infrastructure (Flink Forward 2015)
10
The Flink community has been at the center of
this journey, and there is innovation and
convergence in all parts of the stack:
message transport, compute engine, programming paradigm
11
Why? Streaming technology is enabling the
obvious: continuous processing on data that
is continuously produced
Hint: you already have streaming data
12
Data streaming adoption patterns
• Real-time products and business monitoring
• Robust continuous applications
• Decentralized architecture
• Unify real-time and historical data
13
Retail, e-commerce
• Better product recommendations
• Process monitoring
• Inventory management
Finance
• Differentiation via tech
• Push-based products
• Fraud detection
Telco, IoT, Infrastructure
• Infrastructure monitoring
• Anomaly detection
Internet & mobile
• Personalization
• User behavior monitoring
• Analytics
14
30 Flink applications in production for more than one
year. 10 billion events (2TB) processed daily
Complex jobs of > 30 operators running 24/7,
processing 30 billion events daily, maintaining state
of 100s of GB with exactly-once guarantees
Largest job has > 20 operators, runs on > 5000
vCores in 1000-node cluster, processes millions of
events per second
15
What is Flink's unique role in the streaming
data ecosystem?
16
Before Flink, users had to make hard
choices between:
• Volume
• Latency
• Accuracy
17
Flink eliminates these tradeoffs
• 10s of millions of events per second for stateful applications
• Sub-second latency, as low as single-digit milliseconds
• Accurate computation results
18
A broader definition of accuracy: the
results that I want when I want them
1. Accurate under failures and downtime
2. Accurate under out-of-order data
3. Results when you need them
4. Accurate modeling of the world
19
1. Failures and downtime
• Checkpoints & savepoints
• Exactly-once guarantees
2. Out-of-order and late data
• Event time support
• Watermarks
3. Results when you need them
• Low latency
• Triggers
4. Accurate modeling
• True streaming engine
• Sessions and flexible windows
20
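The event-time items above (watermarks, out-of-order data, sessions) fit together as follows. This is a minimal sketch in plain Python, not Flink's API: it assumes a watermark that trails the largest timestamp seen by a fixed `max_delay` (bounded out-of-orderness), groups each key's events into sessions separated by at least `gap` seconds of inactivity, and emits a session only once the watermark says no more of its events can arrive. Function and type names are hypothetical, and treating every event at or behind the watermark as late is a simplification.

```python
from typing import Dict, List, Tuple

Event = Tuple[str, int]              # (key, event time in seconds); arrival order may differ
Session = Tuple[str, int, int, int]  # (key, start, end, event count); covers [start, end)


def _fire(open_sessions: Dict[str, List[List[int]]], watermark: float,
          fired: List[Session]) -> None:
    """Emit every session whose last covered timestamp (end - 1) is at or behind the watermark."""
    for key in sorted(open_sessions):
        still_open = []
        for start, end, count in sorted(open_sessions[key]):
            if end - 1 <= watermark:
                fired.append((key, start, end, count))
            else:
                still_open.append([start, end, count])
        open_sessions[key] = still_open


def sessionize(events: List[Event], gap: int,
               max_delay: int) -> Tuple[List[Session], List[Event]]:
    """Per-key event-time session windows driven by a bounded-out-of-orderness watermark.

    Returns the sessions in the order they fire, plus the late events that were dropped.
    """
    open_sessions: Dict[str, List[List[int]]] = {}
    fired: List[Session] = []
    late: List[Event] = []
    max_ts = None
    watermark = None

    for key, ts in events:
        # Simplification: an event at or behind the current watermark is late and dropped.
        if watermark is not None and ts <= watermark:
            late.append((key, ts))
            continue

        # The event opens a proto-session [ts, ts + gap); merge it with overlapping sessions.
        start, end, count = ts, ts + gap, 1
        remaining = []
        for s, e, c in open_sessions.get(key, []):
            if s < end and start < e:
                start, end, count = min(s, start), max(e, end), count + c
            else:
                remaining.append([s, e, c])
        remaining.append([start, end, count])
        open_sessions[key] = remaining

        # Watermark: "no more events older than (max seen - max_delay) are expected".
        max_ts = ts if max_ts is None else max(max_ts, ts)
        watermark = max_ts - max_delay
        _fire(open_sessions, watermark, fired)

    # End of input behaves like a final watermark of +infinity: flush what is still open.
    _fire(open_sessions, float("inf"), fired)
    return fired, late
```

With `gap=5` and `max_delay=3`, the arrival sequence `alice@1, bob@2, alice@4, alice@3, bob@30, alice@40, alice@5` yields one session for alice covering [1, 9) with three events, even though the event at 3 arrives after the event at 4; the event at 5 arrives after the watermark has passed 37, so it is reported as late.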
5. Batch + streaming
• One engine
• Dedicated APIs
6. Reprocessing
• High throughput, event time support, and savepoints
7. Ecosystem
• Rich connector ecosystem and third-party packages
8. Community support
• One of the most active projects, with over 200 contributors
21
flink run -s <savepoint> <job>
What are the next steps for Flink?
22
• Provide state-of-the-art streaming capabilities (✔)
• Operate in the largest infrastructures of the world
• Open up to a wider set of enterprise users
• Broaden the scope of stream processing
23
Apache Flink today
24
The Apache Flink community has
pushed the boundaries of
open source stream processing.
Flink's unique combination of features
25
(Diagram labels: Performance, Consistency, Event Time, APIs, Libraries, Stateful Streaming)
• Low latency
• High throughput
• Well-behaved flow control (back pressure)
• Works on real-time and historic data
• Savepoints (replays, A/B testing, upgrades, versioning)
• Exactly-once semantics for fault tolerance
• Windows & user-defined state
• Flexible windows (time, count, session, roll-your-own)
• Complex Event Processing
• Fluent API
• Out-of-order events
• Fast and large out-of-core state
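"Well-behaved flow control (back pressure)" refers to the general idea that a slow downstream operator should throttle its upstream instead of letting buffers grow without bound. The toy below shows that idea with a bounded in-process queue between two threads; it is an assumption-laden illustration, not Flink's network stack, and all names are hypothetical.

```python
import queue
import threading
import time
from typing import List, Tuple


def run_pipeline(n_records: int, capacity: int,
                 consumer_delay_s: float) -> Tuple[List[int], int]:
    """Fast producer -> bounded buffer -> slow consumer.

    Returns the records the consumer received and the peak buffer occupancy the
    producer observed. The buffer never grows past `capacity`: when it is full,
    `put` blocks, so the slow consumer throttles the producer (back pressure).
    """
    buffer: "queue.Queue[object]" = queue.Queue(maxsize=capacity)
    done = object()  # end-of-stream marker
    received: List[int] = []
    peak = 0

    def producer() -> None:
        nonlocal peak
        for record in range(n_records):
            buffer.put(record)  # blocks while the buffer is full
            peak = max(peak, buffer.qsize())
        buffer.put(done)

    def consumer() -> None:
        while True:
            item = buffer.get()
            if item is done:
                return
            time.sleep(consumer_delay_s)  # simulate an expensive downstream operator
            received.append(item)

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received, peak
```

The design choice is that nothing is dropped and memory stays bounded: the producer simply runs at the consumer's pace once the buffer fills.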
Flink v1.1
26
(Diagram: Connectors, Metric System, (Stream) SQL, Session Windows, Library enhancements)
Flink v1.1 + current threads
(Diagram, built up over slides 27 to 30. Flink v1.1: Connectors, Session Windows, (Stream) SQL, Library enhancements, Metric System. Current threads, grouped under Operations, Ecosystem, Application Features, and Broader Audience: Metrics & Visualization, Dynamic Scaling, Savepoint compatibility, Checkpoints to savepoints, More connectors, Stream SQL, Windows, Large state, Maintenance, Fine-grained recovery, Side in-/outputs, Window DSL, Security, Mesos & others, Dynamic Resource Management, Authentication, Queryable State)
More details in the talk
"The Future of Apache Flink"
(Monday, 11:00)
Security / Authentication
31
No unauthorized data access
Secured clusters with Kerberos-based authentication
• Kafka, ZooKeeper, HDFS, YARN, HBase, …
No unencrypted traffic between Flink processes
• RPC, Data Exchange, Web UI
Largely contributed by
Prevent malicious users from hooking into Flink jobs
See talk
"Flink Security
Enhancements"
(Tuesday, 11.45)
Checkpoints / Savepoints
32
Recover a running job into a new job
Recover a running job onto a new cluster
Application state backwards compatibility
• Flink 1.0 made the APIs backwards compatible
• Now making the savepoints backwards compatible
• Applications can be moved to newer versions of
Flink even when state backends or internals change
(Diagram: application state carried across Flink versions v1.x, v1.y, and v2.0)
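"Recover a running job into a new job" works because a savepoint captures the source position and the operator state together. The toy below is a minimal sketch under that assumption: a job reading a replayable log, a savepoint serialized as JSON text (standing in for durable storage), and a new job instance resumed from it. Class and method names are hypothetical and say nothing about Flink's actual savepoint format.

```python
import json
from collections import Counter
from typing import Dict, List, Optional


class WordCountJob:
    """Toy job: reads a replayable log from an offset and keeps a count per word."""

    def __init__(self, log: List[str], savepoint: Optional[str] = None) -> None:
        self.log = log
        if savepoint is None:
            self.offset: int = 0
            self.counts: Dict[str, int] = {}
        else:
            # Restore the source position and the operator state from the same snapshot.
            data = json.loads(savepoint)
            self.offset = data["offset"]
            self.counts = data["counts"]

    def process(self, max_records: int) -> None:
        end = min(self.offset + max_records, len(self.log))
        while self.offset < end:
            word = self.log[self.offset]
            self.counts[word] = self.counts.get(word, 0) + 1
            self.offset += 1

    def take_savepoint(self) -> str:
        # Offset and counts are captured together, so they always agree with each other.
        return json.dumps({"offset": self.offset, "counts": self.counts}, sort_keys=True)


if __name__ == "__main__":
    log = ["a", "b", "a", "c", "b", "a"]
    old_job = WordCountJob(log)
    old_job.process(3)
    savepoint = old_job.take_savepoint()
    old_job.process(2)  # progress after the savepoint is discarded when the old job is cancelled

    new_job = WordCountJob(log, savepoint=savepoint)  # e.g. upgraded code, another cluster
    new_job.process(len(log))
    assert new_job.counts == dict(Counter(log))  # each record counted exactly once
```

Because the new job replays the log from the saved offset, records processed by the old job after the savepoint are counted again only in the new job, never twice.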
Dynamic scaling
33
Changing load brings changing resource requirements
• Need to adjust parallelism of running streaming jobs
Re-scaling stateless operators is trivial
Re-scaling stateful operators is hard (windows, user state)
• Efficiently re-shard state
(Chart: workload and resources over time)
Re-scaling Flink jobs preserves
exactly-once guarantees
See talk
"Dynamic scaling: How Apache
Flink adapts to changing
workloads"
(Tuesday, 14.45)
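Efficiently re-sharding keyed state is the hard part named above. One approach, similar in spirit to the key-group scheme used in Flink's re-scaling work, is to hash every key into a fixed number of groups and assign contiguous ranges of groups to parallel instances, so re-scaling moves whole groups instead of re-hashing individual keys. The sketch below assumes a hypothetical `MAX_PARALLELISM` of 128 and uses CRC32 as a stable hash; it is not Flink's implementation.

```python
import zlib
from typing import Dict, List

MAX_PARALLELISM = 128  # hypothetical fixed number of key groups; the upper bound for re-scaling

Shard = Dict[int, Dict[str, int]]  # key group -> {key: state value}


def key_group(key: str) -> int:
    # A stable hash (CRC32) so a key maps to the same group in every process and run.
    return zlib.crc32(key.encode("utf-8")) % MAX_PARALLELISM


def instance_for(group: int, parallelism: int) -> int:
    # Each parallel instance owns one contiguous range of key groups.
    return group * parallelism // MAX_PARALLELISM


def shard_state(state: Dict[str, int], parallelism: int) -> List[Shard]:
    shards: List[Shard] = [{} for _ in range(parallelism)]
    for key, value in state.items():
        group = key_group(key)
        shards[instance_for(group, parallelism)].setdefault(group, {})[key] = value
    return shards


def rescale(shards: List[Shard], new_parallelism: int) -> List[Shard]:
    # Whole key groups move between instances; individual keys are never re-hashed.
    new_shards: List[Shard] = [{} for _ in range(new_parallelism)]
    for shard in shards:
        for group, entries in shard.items():
            new_shards[instance_for(group, new_parallelism)][group] = entries
    return new_shards
```

The number of key groups caps the maximum parallelism, which is the price paid for cheap, range-based state movement.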
Cluster management
34
Series of improvements to seamlessly interoperate with
various cluster managers
• YARN, Mesos, Docker, Standalone, …
• Proper isolation of jobs, clean support for multi-job sessions
Dynamic acquisition/release of resources
Using mixed container sizes
Driven by
Mesos integration contributed by
and
See talk
"Introducing Flink on
Mesos"
(Tuesday, 11.30)
See talk
"Running Flink
Everywhere"
(Monday, 16.45)
Stream SQL
36
SQL is the standard high-level query language
A natural way to open up streaming to more people
Problem: There is no Streaming SQL standard
• At least beyond the basic operations
• Challenging: Incorporate windows and time semantics
Flink community working with users and with
Apache Calcite to draft a new model
See talk
"Streaming SQL"
(Monday, 11:00)
See talk
"Taking a look under the
hood of Apache Flink’s
relational APIs"
(Monday, 16.45)
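On a bounded table, the basic windowing operation is already expressible in standard SQL: a tumbling window is a GROUP BY on the timestamp truncated to the window size. The sketch below uses SQLite from Python's standard library (not Flink's SQL dialect) with a hypothetical `clicks` table; the open question for a streaming standard is when such a group is final on unbounded input, which is where watermarks and triggers come in.

```python
import sqlite3
from typing import List, Tuple


def hourly_clicks(rows: List[Tuple[str, str]]) -> List[Tuple[str, str, int]]:
    """Count clicks per user in one-hour tumbling windows over a bounded table."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE clicks (user_id TEXT, ts TEXT)")
        conn.executemany("INSERT INTO clicks VALUES (?, ?)", rows)
        # A tumbling window is a GROUP BY on the timestamp truncated to the window size.
        return conn.execute(
            """
            SELECT user_id,
                   strftime('%Y-%m-%d %H:00', ts) AS window_start,
                   COUNT(*) AS clicks
            FROM clicks
            GROUP BY user_id, window_start
            ORDER BY window_start, user_id
            """
        ).fetchall()
    finally:
        conn.close()
```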
Looking further
38
Streaming and batch
39
The separation of batch and streaming …
… is quite artificial
… has been largely technology driven (not by use cases)
In fact – several talks here are about batch processing…
People are approaching Flink for batch processing as well
Streaming and batch
40
(Diagram, built up over slides 40 to 42: two partitions of timestamped events, from 2016-3-1 12:00 am to 2016-3-12 3:00 am…, not strictly in time order within a partition. The same data is labeled as processable as a stream (low latency), as a stream (high latency), or as a batch (bounded stream).)
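The "batch is a bounded stream" view can be made concrete: one incremental aggregation can consume a finite list or an unbounded source, and the only difference is that end-of-input closes the last window. The sketch below assumes events arrive in time order (a real engine would use watermarks instead); function names and the endless source are hypothetical.

```python
import itertools
from typing import Iterable, Iterator, Optional, Tuple

Event = Tuple[int, int]  # (event time in seconds, value)


def tumbling_sums(events: Iterable[Event], size: int) -> Iterator[Tuple[int, int]]:
    """Yield (window_start, sum) per tumbling window of `size` seconds.

    Assumes events arrive in time order. A window is emitted as soon as an event of a
    later window arrives; on bounded input, end-of-input closes the final window.
    """
    current: Optional[int] = None
    total = 0
    for ts, value in events:
        start = ts - ts % size
        if current is not None and start != current:
            yield current, total
            total = 0
        current = start
        total += value
    if current is not None:
        yield current, total


def endless_ticks(step: int) -> Iterator[Event]:
    """Hypothetical unbounded source: one event with value 1 every `step` seconds, forever."""
    for ts in itertools.count(0, step):
        yield ts, 1
```

The same generator serves both cases: `list(tumbling_sums(batch, 60))` drains a finite input, while `itertools.islice(tumbling_sums(endless_ticks(20), 60), 3)` takes the first three windows of an infinite one.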
Why use batch at all now?
43
… dedicated batch processors
… or Flink's DataSet API
• Cost of fault tolerance and accuracy
• Resource elasticity / efficiency
• Missing primitives (example: BSP iterations)
Possible to add to DataStream API
Deeper integration between batch and streaming techniques
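BSP (bulk-synchronous parallel) iterations are named as a primitive that dedicated batch processors provide. As a rough illustration of what such an iteration computes, the sketch below finds connected components by label propagation in supersteps: every vertex reads only the previous superstep's labels, adopts the minimum label in its neighbourhood, and the loop stops when a superstep changes nothing. It is plain Python, not Flink's iteration API, and names are hypothetical.

```python
from typing import Dict, List, Tuple


def connected_components(edges: List[Tuple[int, int]]) -> Tuple[Dict[int, int], int]:
    """Return (vertex -> component label, number of supersteps run)."""
    neighbours: Dict[int, List[int]] = {}
    for a, b in edges:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)

    labels = {v: v for v in neighbours}  # every vertex starts in its own component
    supersteps = 0
    while True:
        supersteps += 1
        # All vertices compute from the old labels; the dict swap acts as the barrier.
        new_labels = {
            v: min([labels[v]] + [labels[n] for n in neighbours[v]])
            for v in neighbours
        }
        if new_labels == labels:  # converged: no label changed in this superstep
            return labels, supersteps
        labels = new_labels
```

Each component ends up labeled with its smallest vertex id; the number of supersteps grows with the graph's diameter, which is why engines treat iterations as a first-class primitive rather than as a chain of independent jobs.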
Some batch proof points…
44
(Charts: TeraSort, Relational Join, Classic Batch Jobs, Graph Processing, Linear Algebra)
State in stream processing
45
Stateless Streaming
(Apache Storm)
Stateful Streaming
(Apache Samza)
Accurate Stateful Streaming
(Apache Flink)
State sizes in Flink today (my assessment): 10s of gigabytes per operator
How to scale this to many terabytes?
• Queryable State
• Data driven triggers over large state
Large-state streaming
46
How to scale the stream processor state?
… and maintain fast checkpoint intervals?
… and have very fast recovery on machine failures?
More and more database techniques coming into Flink
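One database technique aimed at the first two questions above is incremental (delta) checkpointing: each checkpoint stores only the keys changed since the previous one, so checkpoint cost tracks the update rate rather than the total state size. The slide does not say which techniques Flink will adopt; the sketch below, with hypothetical names, shows only the general idea, using `None` as a tombstone for deleted keys.

```python
from typing import Dict, List, Optional

Delta = Dict[str, Optional[int]]  # key -> new value, or None if the key was deleted


class IncrementalStateStore:
    """Keyed state whose checkpoints contain only the keys changed since the last one."""

    def __init__(self) -> None:
        self.data: Dict[str, int] = {}
        self._dirty: Delta = {}

    def put(self, key: str, value: int) -> None:
        self.data[key] = value
        self._dirty[key] = value

    def delete(self, key: str) -> None:
        self.data.pop(key, None)
        self._dirty[key] = None  # tombstone so that restore also removes the key

    def checkpoint(self) -> Delta:
        delta, self._dirty = self._dirty, {}
        return delta

    @classmethod
    def restore(cls, deltas: List[Delta]) -> "IncrementalStateStore":
        store = cls()
        for delta in deltas:  # apply oldest first; later deltas win
            for key, value in delta.items():
                if value is None:
                    store.data.pop(key, None)
                else:
                    store.data[key] = value
        return store
```

The trade-off moves to recovery: the chain of deltas keeps growing, so a real system would periodically compact deltas into a full snapshot to keep restore times short.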
…in conclusion
1. Flink is running in some of the largest streaming setups
2. The community is working on adding many
state-of-the-art operational features
3. Becoming available to broader audiences via Stream SQL
4. Streaming has even more potential to subsume batch,
and it will hold more and more application state
47
48
Enjoy the conference!

Editor's Notes

  • #14: Uber, Netflix, Alibaba, Zalando, King