Event-driven applications
with streams and snapshots
(Apache Flink)
@StephanEwen
J on the Beach, 2017
Why it is a great idea to build
applications on top of a stream processor
Like (micro) services?
Or complete web sites?
Streaming and Stream Processing
• The first wave of streaming was the lambda architecture
  ◦ It helped batch systems become more real-time
• The second wave was analytics (real-time and lag-time)
  ◦ Based on distributed collections, functions, and windows
• The next wave is much broader:
  a new architecture for event-driven applications
A streaming architecture for event-driven applications
offers unique building blocks to handle
• Large distributed state
• Time / order / completeness
Matters of State
Building stateful applications…
… typically starts with picking a data store
… and a data model
… then thinking through consistency models and guarantees
… then worrying about read/write performance
… which may lead to rethinking the data model
and consistency model
… often gets complicated at scale
Image by Pablo Castilla
https://pablocastilla.wordpress.com/
A core problem in many architectures
[Diagram: a compute layer and a database layer; the database layer holds application state + historic state]
Separation of compute and working state
• All state access / update is remote
• Often requires heavy caching layers
• Expensive and (at least partially) synchronous durability
All of this just to make the state, or working set, fault tolerant
Event Sourcing + Memory Image
[Diagram: event / command → event log → process]
• The event log persists events (temporarily)
• The process keeps its state in main memory and updates local variables/structures
• The memory is periodically snapshotted
Recovery: restore the snapshot and replay the events since the snapshot
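The event sourcing + memory image pattern from these slides can be sketched in plain Scala, without any Flink dependency. The `Deposit` event, the `Snapshot` type, and the sample data are hypothetical names chosen for illustration. The sketch only shows that restoring a snapshot and replaying the events after it yields the same state as processing the whole log.

```scala
object MemoryImageSketch {
  // Hypothetical event type, for illustration only.
  final case class Deposit(account: String, amount: Long)

  // A snapshot records the in-memory state plus the log offset it covers.
  final case class Snapshot(offset: Int, balances: Map[String, Long])

  // The "memory image": plain local state, updated once per event.
  def applyEvent(state: Map[String, Long], e: Deposit): Map[String, Long] =
    state.updated(e.account, state.getOrElse(e.account, 0L) + e.amount)

  // Apply all events from position `from` onward on top of a starting state.
  def replay(log: Vector[Deposit], from: Int, start: Map[String, Long]): Map[String, Long] =
    log.drop(from).foldLeft(start)((state, e) => applyEvent(state, e))

  // Recovery: restore the snapshot, then replay the events since the snapshot.
  def recover(log: Vector[Deposit], snap: Snapshot): Map[String, Long] =
    replay(log, snap.offset, snap.balances)

  // Sample data (made up): four events, snapshot taken after the first two.
  val log: Vector[Deposit] =
    Vector(Deposit("a", 10L), Deposit("b", 5L), Deposit("a", 7L), Deposit("b", 1L))
  val snapshot: Snapshot =
    Snapshot(2, replay(log.take(2), 0, Map.empty[String, Long]))

  def main(args: Array[String]): Unit =
    println(recover(log, snapshot))
}
```

Because the event log only has to hold the events since the last snapshot, it can persist events temporarily, as the slide notes.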
Distributed Memory Image
Distributed application, many memory images.
Snapshots are all consistent together.
That is almost Apache Flink in a nutshell
Events, State, Time, and Snapshots
• Events: f(a,b) is an event-driven function, executed distributedly
• State: maintain fault-tolerant local state, similar to any normal application; main memory + out of core (for maps)
• Time: access and react to notions of time and progress (wall clock, event time clock); handle out-of-order events
• Snapshots: snapshot a point-in-time view for recovery, rollback, cloning, versioning, etc.
Stateful Event & Stream Processing
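// Flink Scala DataStream API; env, parse, Event, Statistic,
// MyAggregationFunction, and path are the slide's placeholders.

// Source
val lines: DataStream[String] = env.addSource(new FlinkKafkaConsumer09(…))

// Transformation
val events: DataStream[Event] = lines.map((line) => parse(line))

// Transformation: keyed window (state read/write)
val stats: DataStream[Statistic] = events
  .keyBy("sensor")
  .timeWindow(Time.seconds(5))
  .aggregate(new MyAggregationFunction())  // assumes MyAggregationFunction is an AggregateFunction

// Sink
stats.addSink(new RollingSink(path))

Streaming dataflow: Source → Transform → Window (state read/write) → Sink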
[Diagram: Source → Filter / Transform → State read/write → Sink]
• Scalable embedded state: access at memory speed & scales with parallel operators
• Rolling back computation / re-processing: re-load state and reset positions in input streams
• Restore to different programs: bugfixes, upgrades, A/B testing, etc.
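As a rough illustration of embedded state that scales with parallel operators, the plain-Scala sketch below routes each key to one of several operator instances, each holding its own local map. The `Reading` type and the hash-modulo routing are assumptions made for this example; they are a simplification and not Flink's actual key-partitioning scheme.

```scala
object KeyedStateSketch {
  // Hypothetical event type, for illustration only.
  final case class Reading(sensor: String, value: Double)

  // Route a key to one of `parallelism` operator instances (simplified hash partitioning).
  def instanceFor(key: String, parallelism: Int): Int =
    java.lang.Math.floorMod(key.hashCode, parallelism)

  // Each parallel instance keeps its own local state: sensor -> running sum.
  def run(events: Seq[Reading], parallelism: Int): Vector[Map[String, Double]] = {
    val empty = Vector.fill(parallelism)(Map.empty[String, Double])
    events.foldLeft(empty) { (states, r) =>
      val i = instanceFor(r.sensor, parallelism)
      val local = states(i)
      // The update touches only the local map of instance i; no remote access.
      states.updated(i, local.updated(r.sensor, local.getOrElse(r.sensor, 0.0) + r.value))
    }
  }

  def main(args: Array[String]): Unit = {
    val events = Seq(Reading("s1", 1.0), Reading("s2", 2.0), Reading("s1", 3.0))
    run(events, parallelism = 4).zipWithIndex.foreach { case (state, i) =>
      println(s"instance $i: $state")
    }
  }
}
```

Since every key lives in exactly one instance, adding instances adds both compute and state capacity at the same time.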
Voilà: A lightweight CQRS architecture
[Diagram: writes go to the event log; the application holds the write model and builds the read views; the state is mirrored to serve reads]
"Classical" versus
Streaming Architecture
Compute, State, and Storage
Classic tiered architecture:
• compute layer
• database layer (application state + backup)
Streaming architecture:
• compute + application state
• stream storage and snapshot storage (backup)
Performance
Classic tiered architecture:
• synchronous reads/writes across tier boundary
Streaming architecture:
• all modifications are local
• asynchronous writes of large blobs
Consistency
Classic tiered architecture:
• distributed transactions
• at scale typically at-most / at-least once
Streaming architecture:
• exactly once per state
• snapshot consistency across states
Scaling a Service
Classic tiered architecture:
• provision compute
• separately provision additional database capacity
Streaming architecture:
• provision compute and state together
Rolling out a new Service
Classic tiered architecture:
• provision a new database (or add capacity to an existing one)
Streaming architecture:
• provision compute and state together
• simply occupies some additional backup space
Repair External State
Streaming architecture:
• The live application consumes events and writes to external state; a bug leads to wrong results
• The events are also kept as backed-up data (HDFS, S3, etc.)
• Run the application on the backup input events and overwrite the external state with correct results
Each application doubles as a batch job!
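The repair procedure can be sketched in plain Scala: the same application logic is run once more over the backed-up events, and its output overwrites the external state key by key. The `Event` type, the sample data, and the specific bug (keeping the latest value instead of summing) are all made up for this illustration.

```scala
object RepairSketch {
  // Hypothetical event type, for illustration only.
  final case class Event(key: String, value: Int)

  // Application logic written once; `combine` decides how a new value merges into the state.
  def run(events: Iterator[Event])(combine: (Int, Int) => Int): Map[String, Int] =
    events.foldLeft(Map.empty[String, Int]) { (acc, e) =>
      acc.updated(e.key, acc.get(e.key).map(old => combine(old, e.value)).getOrElse(e.value))
    }

  // Backed-up input events (made-up sample data).
  val backup: Vector[Event] = Vector(Event("x", 2), Event("y", 3), Event("x", 5))

  // Hypothetical bug in the live application: it kept the latest value instead of the sum.
  val wrongResults: Map[String, Int] = run(backup.iterator)((_, latest) => latest)

  // Fixed application, run over the backup input like a batch job.
  val corrected: Map[String, Int] = run(backup.iterator)((old, v) => old + v)

  // Overwrite the external state with the correct results, key by key.
  val repaired: Map[String, Int] = wrongResults ++ corrected

  def main(args: Array[String]): Unit =
    println(s"wrong=$wrongResults repaired=$repaired")
}
```

The overwrite works here because results are keyed; an external store that only supports appends would need a different repair strategy.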
Versioning the state of applications
[Diagram: a time axis with several savepoints and the applications App. A, App. B, and App. C]
(It's) About Time
Time, Completeness, Out-of-order
Classic tiered architecture: ?
Streaming architecture:
• event time clocks define data completeness
• event time timers handle actions for out-of-order data
Time: Different Notions of Time
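[Diagram: Event Producer → Message Queue (partition 1, partition 2) → Flink Data Source → Flink Window Operator]
• Event Time: at the event producer
• Broker Time: at the message queue
• Ingestion Time: at the Flink data source
• Window Processing Time: at the Flink window operator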
Time: Event Time Example
Processing Time    Event Time
1977               Episode IV
1980               Episode V
1983               Episode VI
1999               Episode I
2002               Episode II
2005               Episode III
2015               Episode VII
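The same example as a small plain-Scala sketch: the arrival order is the processing-time order, while sorting by episode number recovers the event-time order. The `Release` type is a name assumed for this illustration.

```scala
object EventTimeSketch {
  // year = processing time (when the event arrived), episode = event time.
  final case class Release(year: Int, episode: Int)

  // Arrival order, taken from the slide.
  val arrivals: Vector[Release] = Vector(
    Release(1977, 4), Release(1980, 5), Release(1983, 6),
    Release(1999, 1), Release(2002, 2), Release(2005, 3),
    Release(2015, 7)
  )

  // Order in which the events were processed.
  val processingTimeOrder: Vector[Int] = arrivals.map(_.episode)

  // Order in which the events happened in the story.
  val eventTimeOrder: Vector[Int] = arrivals.sortBy(_.episode).map(_.episode)

  def main(args: Array[String]): Unit = {
    println(s"processing time: $processingTimeOrder")
    println(s"event time:      $eventTimeOrder")
  }
}
```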
Time: Watermarks
[Figure: two streams of events labeled with their event timestamps, one in order and one out of order, each carrying watermarks such as W(11), W(17), and W(20)]
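A watermark W(t) travels with the stream and signals that event time has progressed to t, so events with timestamps at or below t are no longer expected. The plain-Scala sketch below shows one simple way to derive watermarks from an out-of-order stream: the highest timestamp seen so far minus a fixed bound. The bound and the function names are assumptions for this illustration, and the scheme is a simplification rather than Flink's exact watermark generator.

```scala
object WatermarkSketch {
  // Watermark after each event: highest timestamp seen so far minus the out-of-orderness bound.
  def watermarks(timestamps: Seq[Long], bound: Long): Seq[Long] =
    timestamps
      .scanLeft(Long.MinValue)((maxSeen, t) => math.max(maxSeen, t))
      .tail
      .map(maxSeen => maxSeen - bound)

  // Events whose timestamp is at or below the watermark that was current when they arrived.
  def lateEvents(timestamps: Seq[Long], bound: Long): Seq[Long] = {
    val watermarkBefore = watermarks(timestamps, bound).dropRight(1)
    timestamps.drop(1).zip(watermarkBefore).collect { case (t, wm) if t <= wm => t }
  }

  def main(args: Array[String]): Unit = {
    val timestamps = Seq(7L, 11L, 15L, 9L, 12L) // made-up out-of-order timestamps
    println(watermarks(timestamps, bound = 4L))
    println(lateEvents(timestamps, bound = 4L))
  }
}
```

A larger bound tolerates more disorder but delays every event-time timer by the same amount; that trade-off is the main design choice.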
Time: Watermarks in Parallel
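[Figure: two parallel sources, Source (1) and Source (2), feed map (1), map (2) and window (1), window (2). Events are written as id|timestamp (for example A|30, D|15). Watermarks such as W(17) and W(33) are generated at the sources. Each operator is annotated with the event time at its input streams and the resulting event time at the operator (for example inputs 29 and 14 at a window operator whose event time is 14).]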
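With several input streams, an operator's event time is the minimum of the latest watermarks received on its inputs. A minimal plain-Scala sketch of that rule follows; the `OperatorClock` name is hypothetical and the starting values 29 and 14 are taken from the figure.

```scala
object ParallelWatermarkSketch {
  // Latest watermark seen on each input channel of one operator instance.
  final case class OperatorClock(inputWatermarks: Vector[Long]) {

    // Event time at the operator: the slowest input determines progress.
    def eventTime: Long = inputWatermarks.min

    // A new watermark on one channel; watermarks never move backwards.
    def onWatermark(channel: Int, watermark: Long): OperatorClock =
      copy(inputWatermarks =
        inputWatermarks.updated(channel, math.max(inputWatermarks(channel), watermark)))
  }

  def main(args: Array[String]): Unit = {
    val window = OperatorClock(Vector(29L, 14L))
    println(window.eventTime)                      // slowest input wins
    println(window.onWatermark(1, 17L).eventTime)  // second input advances
    println(window.onWatermark(1, 33L).eventTime)  // now the first input is the slowest
  }
}
```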
Event-driven applications
• Event-driven Applications: stateful, event-driven, event-time-aware processing (event sourcing, CQRS, …)
• Stream Processing (streams, windows, …)
• Batch Processing (data sets)
Layers of abstraction
Ogres have
layers
So do
squirrels
Apache Flink's Layered APIs
• Process Function (events, state, time): stateful event-driven applications
• DataStream API (streams, windows): stream- & batch processing
• Table API (dynamic tables) and Stream SQL: analytics
Process Function
import org.apache.flink.api.common.state.ValueState
import org.apache.flink.streaming.api.functions.ProcessFunction
import org.apache.flink.util.Collector

// MyEvent, Result, and CountWithTimestamp are the slide's placeholder types.
class MyFunction extends ProcessFunction[MyEvent, Result] {

  // declare state to use in the program
  lazy val state: ValueState[CountWithTimestamp] = getRuntimeContext().getState(…)

  override def processElement(
      event: MyEvent,
      ctx: ProcessFunction[MyEvent, Result]#Context,
      out: Collector[Result]): Unit = {
    // work with event and state
    (event, state.value) match { … }
    out.collect(…)   // emit events
    state.update(…)  // modify state
    // schedule a timer callback
    ctx.timerService.registerEventTimeTimer(event.timestamp + 500)
  }

  override def onTimer(
      timestamp: Long,
      ctx: ProcessFunction[MyEvent, Result]#OnTimerContext,
      out: Collector[Result]): Unit = {
    // handle callback when event-/processing- time instant is reached
  }
}
Data Stream API
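import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.streaming.connectors.fs.RollingSink
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09

val lines: DataStream[String] = env.addSource(
  new FlinkKafkaConsumer09[String](…))
val events: DataStream[Event] = lines.map((line) => parse(line))
val stats: DataStream[Statistic] = events
  .keyBy("sensor")
  .timeWindow(Time.seconds(5))
  .aggregate(new MyAggregationFunction())  // assumes MyAggregationFunction is an AggregateFunction
stats.addSink(new RollingSink(path))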
Table API & Stream SQL
Streaming Analytics and
Event-driven applications move closer
to each other (and merge)
Stream Processing is a great way of
building applications.
Think of it as event-sourcing/CQRS
on steroids.
Thank you! 

Building Applications with Streams and Snapshots
