Kostas Tzoumas and Stephan Ewen - Keynote: The maturing data streaming ecosystem and Apache Flink’s accelerated growth

Some practical information
Network name: Flink Forward 2016
Password: #flinkforward16
Twitter handle: @flinkforward
Hashtag: #ff16
Group photo today at 3.30 pm
All talks will be recorded and can be found on our YouTube channel “Apache Flink Berlin” after the conference
FlinkFest today at Palais starting at 6.10 pm
Attention: some last-minute changes to the program, please consult the online schedule

A big thanks to our sponsors!

A big thanks to our program committee!
Tyler Akidau (Google)
Stephan Ewen (data Artisans)
Jamie Grier (data Artisans)
Vasia Kalavri (KTH)
Neha Narkhede (Confluent)

A big thanks to our speakers!

Kostas Tzoumas
Stephan Ewen
Flink Forward
September 12, 2016
The data streaming ecosystem and Apache Flink®: present and future

data Artisans
Founded by the original creators of Apache Flink®, our goal is to make stream processing accessible to the enterprise
• Contributing and helping the Flink community grow
• Providing enterprise support and services

Streaming is a rapidly growing and maturing market category of its own
Streaming is the biggest change in data infrastructure (Flink Forward 2015)

The Flink community has been at the center of this journey. And there is innovation and convergence in all parts of the stack:
• message transport
• compute engine
• programming paradigm

Why? Streaming technology is enabling the obvious: continuous processing on data that is continuously produced
Hint: you already have streaming data

Data streaming adoption patterns
• Real-time products and business monitoring
• Robust continuous applications
• Decentralized architecture
• Unify real-time and historical data

Retail, e-commerce
• Better product recommendations
• Process monitoring
• Inventory management
Finance
• Differentiation via tech
• Push-based products
• Fraud detection
Telco, IoT, Infrastructure
• Infrastructure monitoring
• Anomaly detection
Internet & mobile
• Personalization
• User behavior monitoring
• Analytics

30 Flink applications in production for more than one year. 10 billion events (2 TB) processed daily
Complex jobs of > 30 operators running 24/7, processing 30 billion events daily, maintaining state of 100s of GB with exactly-once guarantees
Largest job has > 20 operators, runs on > 5000 vCores in a 1000-node cluster, processes millions of events per second

What is Flink's unique role in the streaming data ecosystem?

Before Flink, users had to make hard choices between:
• Volume
• Latency
• Accuracy

Flink eliminates these tradeoffs
• 10s of millions of events per second for stateful applications
• Sub-second latency, as low as single-digit milliseconds
• Accurate computation results

A broader definition of accuracy: the results that I want when I want them
1. Accurate under failures and downtime
2. Accurate under out-of-order data
3. Results when you need them
4. Accurate modeling of the world

1. Failures and downtime
• Checkpoints & savepoints
• Exactly-once guarantees
2. Out-of-order and late data
• Event time support
• Watermarks
3. Results when you need them
• Low latency
• Triggers
4. Accurate modeling
• True streaming engine
• Sessions and flexible windows
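Items 2 and 3 above (event time, watermarks, results when you need them) can be illustrated without any Flink dependency. The following is a minimal plain-Python sketch, not Flink's API: the function name `tumbling_window_counts`, its parameters, and the simple watermark rule (largest timestamp seen minus a fixed out-of-orderness bound) are assumptions made for illustration.

```python
def tumbling_window_counts(timestamps, window_size, max_out_of_orderness):
    """Count events per tumbling event-time window (illustrative sketch).

    timestamps: event times in arrival order, possibly out of order.
    A window [start, start + window_size) fires once the watermark reaches
    its end. The watermark is assumed to be the largest event time seen so
    far minus max_out_of_orderness.
    Returns (fired, late): fired is a list of (window_start, count) in
    firing order; late counts events whose window had already fired.
    """
    open_windows = {}            # window_start -> running count
    fired = []
    late = 0
    watermark = float("-inf")
    for ts in timestamps:
        start = ts - ts % window_size
        if start + window_size <= watermark:
            late += 1            # window already closed: event is late
        else:
            open_windows[start] = open_windows.get(start, 0) + 1
        # Advance the watermark; it never moves backwards.
        watermark = max(watermark, ts - max_out_of_orderness)
        # Fire every window whose end the watermark has passed.
        for s in sorted(open_windows):
            if s + window_size <= watermark:
                fired.append((s, open_windows.pop(s)))
    # End of input: flush the windows that are still open.
    for s in sorted(open_windows):
        fired.append((s, open_windows.pop(s)))
    return fired, late


if __name__ == "__main__":
    print(tumbling_window_counts([1, 4, 12, 8, 15, 3, 22], 10, 5))
```

In this run the event with timestamp 8 arrives after 12 but is still counted in window [0, 10), because the watermark has not yet reached 10. The event with timestamp 3 arrives after that window has fired and is counted as late. A larger bound trades latency for fewer late events.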

5. Batch + streaming
• One engine
• Dedicated APIs
6. Reprocessing
• High throughput, event time support, and savepoints
7. Ecosystem
• Rich connector ecosystem and 3rd party packages
8. Community support
• One of the most active projects with over 200 contributors
flink run -s <savepoint> <job>

What are the next steps for Flink?

• Provide state-of-the-art streaming capabilities (✔)
• Operate in the largest infrastructures of the world
• Open up to a wider set of enterprise users
• Broaden the scope of stream processing

Apache Flink today
The Apache Flink community has pushed the boundaries of open source stream processing.

Flink's unique combination of features
[Figure: feature overview with the area labels Performance, Event Time, APIs, Libraries, Stateful Streaming]
• Low latency
• High throughput
• Well-behaved flow control (back pressure)
• Consistency
• Works on real-time and historic data
• Savepoints (replays, A/B testing, upgrades, versioning)
• Exactly-once semantics for fault tolerance
• Windows & user-defined state
• Flexible windows (time, count, session, roll-your-own)
• Complex Event Processing
• Fluent API
• Out-of-order events
• Fast and large out-of-core state

Flink v1.1
• Connectors
• Metric System
• (Stream) SQL
• Session Windows
• Library enhancements

Flink v1.1 + current threads
Flink v1.1:
• Connectors
• Session Windows
• (Stream) SQL
• Library enhancements
• Metric System
Current threads:
• Metrics & Visualization
• Dynamic Scaling
• Savepoint compatibility
• Checkpoints to savepoints
• More connectors
• Stream SQL Windows
• Large state Maintenance
• Fine grained recovery
• Side in-/outputs
• Window DSL
• Security
• Mesos & others
• Dynamic Resource Management
• Authentication
• Queryable State

[Figure: the Flink v1.1 features and current threads listed above, grouped into four areas: Operations, Ecosystem, Application Features, Broader Audience]
More details in the talk "The Future of Apache Flink" (Monday, 11:00)

Security / Authentication
No unauthorized data access
Secured clusters with Kerberos-based authentication
• Kafka, ZooKeeper, HDFS, YARN, HBase, …
No unencrypted traffic between Flink processes
• RPC, Data Exchange, Web UI
Largely contributed by
Prevent malicious users from hooking into Flink jobs
See talk "Flink Security Enhancements" (Tuesday, 11.45)

Checkpoints / Savepoints
Recover a running job into a new job
Recover a running job onto a new cluster
Application state backwards compatibility
• Flink 1.0 made the APIs backwards compatible
• Now making the savepoints backwards compatible
• Applications can be moved to newer versions of Flink even when state backends or internals change
[Figure: version timeline with the labels v1.x, v1.y, v2.0]
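The idea of recovering a running job into a new job can be sketched in a few lines of plain Python. This is not Flink's checkpointing mechanism or API. The class `CountingJob`, its methods, and the list used as a replayable log are all hypothetical. The sketch only shows why capturing state together with the input position lets a fresh job continue without losing or double-counting records.

```python
import copy


class CountingJob:
    """Hypothetical stateful job: counts records per key from a replayable log."""

    def __init__(self, offset=0, counts=None):
        self.offset = offset              # position of the next record to read
        self.counts = dict(counts or {})  # keyed state

    def process(self, log, max_records):
        """Consume up to max_records records, starting at the current offset."""
        for key in log[self.offset:self.offset + max_records]:
            self.counts[key] = self.counts.get(key, 0) + 1
            self.offset += 1

    def snapshot(self):
        """Capture state and input position together, as one consistent unit."""
        return copy.deepcopy({"offset": self.offset, "counts": self.counts})

    @classmethod
    def restore(cls, snap):
        """Start a new job (possibly on another cluster) from a snapshot."""
        return cls(offset=snap["offset"], counts=snap["counts"])


if __name__ == "__main__":
    log = list("abacabad")
    job = CountingJob()
    job.process(log, 3)
    snap = job.snapshot()        # taken after three records
    job.process(log, 2)          # this progress is lost when the job fails
    recovered = CountingJob.restore(snap)
    recovered.process(log, len(log))
    print(recovered.counts)
```

Because the snapshot holds the offset as well as the counts, the two records processed after the snapshot are simply replayed by the recovered job. Every record contributes to the final counts exactly once.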

Dynamic scaling
Changing load brings changing resource requirements
• Need to adjust parallelism of running streaming jobs
Re-scaling stateless operators is trivial
Re-scaling stateful operators is hard (windows, user state)
• Efficiently re-shard state
[Figure: workload and resources over time]
Re-scaling Flink jobs preserves exactly-once guarantees
See talk "Dynamic scaling: How Apache Flink adapts to changing workloads" (Tuesday, 14.45)
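One possible way to make re-sharding of keyed state cheap is to hash keys into a fixed number of key groups and assign contiguous ranges of groups to operator instances. The slides do not say how Flink does this, so the plain-Python sketch below is an assumption for illustration only. The names `key_group`, `owner`, `reshard`, and the use of CRC32 as the hash are all hypothetical.

```python
import zlib


def key_group(key, num_key_groups):
    """Map a key to a stable key group (CRC32 is deterministic across runs)."""
    return zlib.crc32(key.encode("utf-8")) % num_key_groups


def owner(group, parallelism, num_key_groups):
    """Assign contiguous ranges of key groups to operator instances."""
    return group * parallelism // num_key_groups


def reshard(state_by_instance, new_parallelism, num_key_groups):
    """Redistribute state when parallelism changes.

    state_by_instance: one dict per instance, {key_group: {key: value}}.
    Whole key groups are moved; individual keys are never re-hashed.
    """
    new_state = [dict() for _ in range(new_parallelism)]
    for instance_state in state_by_instance:
        for group, keyed_state in instance_state.items():
            target = owner(group, new_parallelism, num_key_groups)
            new_state[target][group] = keyed_state
    return new_state


if __name__ == "__main__":
    groups = 8
    old = [dict(), dict()]                      # parallelism 2
    for k in ["user-%d" % i for i in range(20)]:
        g = key_group(k, groups)
        old[owner(g, 2, groups)].setdefault(g, {})[k] = len(k)
    new = reshard(old, 4, groups)               # scale out to parallelism 4
    print([sorted(inst) for inst in new])       # key groups held per instance
```

In this scheme the number of key groups is fixed up front and caps the maximum parallelism. Re-scaling then only moves whole groups between instances, and a record for a given key is routed with the same two functions before and after the change.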

Cluster management
Series of improvements to seamlessly interoperate with various cluster managers
• YARN, Mesos, Docker, Standalone, …
• Proper isolation of jobs, clean support for multi-job sessions
Dynamic acquire/release of resources
Using mixed container sizes
Driven by
Mesos integration contributed by
and
See talk "Introducing Flink on Mesos" (Tuesday, 11.30)
See talk "Running Flink Everywhere" (Monday, 16.45)

Stream SQL
SQL is the standard high-level query language
A natural way to open up streaming to more people
Problem: There is no Streaming SQL standard
• At least beyond the basic operations
• Challenging: Incorporate windows and time semantics
Flink community working with users and with Apache Calcite to draft a new model
See talk "Streaming SQL" (Monday, 11:00)
See talk "Taking a look under the hood of Apache Flink’s relational APIs" (Monday, 16.45)

Streaming and batch
The separation of batch and streaming …
… is quite artificial
… has been largely technology driven (not by use cases)
In fact – several talks here are about batch processing…
People are approaching Flink for batch processing as well

Streaming and batch
[Figure: two partitions of hourly timestamped events, 2016-3-1 12:00 am through 2016-3-12 3:00 am…, with three labeled ranges: Stream (low latency), Stream (high latency), Batch (bounded stream)]
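The view of batch as a bounded stream can be made concrete with a small plain-Python sketch. It is not Flink code. The functions `running_totals` and `batch_totals` and the sample records are hypothetical. The point is only that the same per-record logic serves both cases, and that the batch result is what the stream has produced once a bounded input ends.

```python
import itertools


def running_totals(records):
    """Streaming view: yield the per-key running total after each record."""
    totals = {}
    for key, amount in records:
        totals[key] = totals.get(key, 0) + amount
        yield key, totals[key]


def batch_totals(bounded_records):
    """Batch view: run the same streaming logic to the end of a bounded input."""
    final = {}
    for key, total in running_totals(bounded_records):
        final[key] = total       # the last update per key is the batch result
    return final


if __name__ == "__main__":
    records = [("a", 1), ("b", 2), ("a", 3)]

    # Bounded input: a final answer exists.
    print(batch_totals(records))

    # Unbounded input: only incremental answers exist; take the first four.
    endless = itertools.cycle(records)
    print(list(itertools.islice(running_totals(endless), 4)))
```

The unbounded case never reaches a final result, which is why streaming needs windows and triggers to decide when to emit. The bounded case can wait for the end of the input.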

Why use batch at all now?
… or Flink's DataSet API
… dedicated batch processors
Cost of fault tolerance and accuracy
Resource elasticity / efficiency
Missing primitives (example: BSP iterations)
Possible to add to DataStream API
Deeper integration between batch and streaming techniques

Some batch proof points…
TeraSort
Relational Join
Classic Batch Jobs
Graph Processing
Linear Algebra

State in stream processing
Stateless Streaming (Apache Storm)
Stateful Streaming (Apache Samza)
Accurate Stateful Streaming (Apache Flink)
State sizes in Flink today (my assessment): 10s of gigabytes per operator
How to scale this to many terabytes?
• Queryable State
• Data driven triggers over large state

Large-state streaming
How to scale the stream processor state?
… and maintain fast checkpoint intervals?
… and have very fast recovery on machine failures?
More and more database techniques coming into Flink

…in conclusion
1. Flink is running in some of the largest streaming setups
2. Community is working on adding many state-of-the-art operational features
3. Available to broader audiences, via Stream SQL
4. Streaming has even more potential to subsume batch and will hold more and more application state
