© 2017 MapR TechnologiesMapR Confidential 1
Introduction to Stream Processing with Apache Flink
Tugdual Grall
@tgrall
{"about": "me"}
Tugdual “Tug” Grall
• MapR: Technical Evangelist
• MongoDB, Couchbase, eXo, Oracle
• NantesJUG co-founder

• @tgrall
• http://tgrall.github.io
• tug@mapr.com / tugdual@gmail.com
MapR Converged Data Platform
[Figure: platform diagram with the labels Open Source Engines & Tools; Commercial Engines & Applications; Custom Apps; Data Processing; Web-Scale Storage (MapR-FS); Database (MapR-DB); Event Streaming (MapR Streams); Search and Others; Cloud and Managed Services; Unified Management and Monitoring; Utility-Grade Platform Services: Real Time, Unified Security, Multi-tenancy, Disaster Recovery, Global Namespace, High Availability.]
Streaming
Streaming technology is enabling the obvious: continuous processing on data that is continuously produced.
Hint: you already have streaming data.
Decoupling
[Figure: App A, App B, and App C working against centrally managed state, contrasted with App A, App B, and App C each building their own state.]
Event Stream = Data Pipelines
Streaming and Batch
[Figure: timestamped events from 2016-3-1 12:00 am to 2016-3-12 3:00am…, divided into partitions and labeled "Stream (low latency)", "Batch (bounded stream)", and "Stream (high latency)".]
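To make the "batch is a bounded stream" idea concrete, here is a minimal plain-Scala sketch (no Flink APIs). The `Reading` type, the `runningTotals` and `batchTotal` helpers, and the sample values are hypothetical; the point is that the same per-event logic can either emit results continuously or, once the input is known to be finite, report a final answer.

```scala
// Plain-Scala sketch (no Flink): one per-event computation, consumed
// incrementally (stream) or run to completion over a bounded input (batch).
object BoundedVsUnbounded {
  // Hypothetical event type: a timestamp label and a numeric value.
  final case class Reading(timestamp: String, value: Int)

  // Streaming view: emit an updated running total after every event.
  def runningTotals(events: Iterator[Reading]): Iterator[Int] =
    events.scanLeft(0)((total, r) => total + r.value).drop(1)

  // Batch view: the input is bounded, so the job can report the final total
  // (0 for an empty input).
  def batchTotal(events: Seq[Reading]): Int =
    runningTotals(events.iterator).foldLeft(0)((_, latest) => latest)

  def main(args: Array[String]): Unit = {
    val partition = Seq(
      Reading("2016-3-12 12:00am", 3),
      Reading("2016-3-12 1:00am", 5),
      Reading("2016-3-12 2:00am", 2))
    println(runningTotals(partition.iterator).toList) // List(3, 8, 10)
    println(batchTotal(partition))                    // 10
  }
}
```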
Processing
• Request / Response
• Batch
• Stream Processing
  • Real-time reaction to events
  • Continuous applications
  • Process both real-time and historical data
Flink Architecture
• Deployment: Local (single JVM); Cluster (Standalone, YARN, Mesos); Cloud (AWS, Google)
• Core: Runtime (distributed streaming dataflow)
• APIs: DataSet API (batch processing); DataStream API (stream processing)
• Libraries on the DataSet API: FlinkML (machine learning), Gelly (graph processing), Table (relational)
• Libraries on the DataStream API: CEP (event processing), Table (relational)
Demonstration
Flink Basics
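Batch & Stream
case class Word(word: String, frequency: Int)

// DataSet API - Batch (import org.apache.flink.api.scala._)
val env = ExecutionEnvironment.getExecutionEnvironment
val lines: DataSet[String] = env.readTextFile(…)
lines.flatMap { line => line.split(" ").map(word => Word(word, 1)) }
  .groupBy("word").sum("frequency")
  .print() // print() triggers execution of a DataSet program

// DataStream API - Streaming
// (import org.apache.flink.streaming.api.scala._ and
//  org.apache.flink.streaming.api.windowing.time.Time)
val senv = StreamExecutionEnvironment.getExecutionEnvironment
val stream: DataStream[String] = senv.socketTextStream(...)
stream.flatMap { line => line.split(" ").map(word => Word(word, 1)) }
  .keyBy("word")
  .timeWindow(Time.seconds(5), Time.seconds(1)) // 5 s window, sliding every 1 s
  .sum("frequency")
  .print()
senv.execute("Streaming WordCount") // streaming jobs start only on execute()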
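As a sanity check on what the batch pipeline computes, here is a minimal plain-Scala equivalent built on ordinary collections rather than Flink. The `wordCount` helper is hypothetical, and unlike the slide it drops the empty tokens produced by repeated spaces.

```scala
// Plain-Scala sketch (no Flink) of the batch word count:
// split lines into Word(word, 1), group by word, sum the frequencies.
object BatchWordCount {
  final case class Word(word: String, frequency: Int)

  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      // like flatMap; filter(_.nonEmpty) is an addition that skips empty tokens
      .flatMap(line => line.split(" ").filter(_.nonEmpty).map(w => Word(w, 1)))
      .groupBy(_.word)                                       // like groupBy("word")
      .map { case (w, words) => w -> words.map(_.frequency).sum } // like sum("frequency")

  def main(args: Array[String]): Unit =
    println(wordCount(Seq("to be or", "not to be")))
}
```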
Stream Processing
Source → Filter / Transform → Sink
Flink Ecosystem
• Sources: Apache Kafka, MapR Streams, AWS Kinesis, RabbitMQ, Twitter, Apache Bahir, …
• Sinks: Apache Kafka, MapR Streams, AWS Kinesis, RabbitMQ, Elasticsearch, HDFS/MapR-FS, …
Stateful Stream Processing
Source → Filter / Transform (reads/writes State) → Sink
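The stateful pipeline can be sketched in plain Scala as follows. This illustrates the idea only and is not Flink's keyed-state API: a mutable map stands in for operator state, and `run`, the lower-casing transform, and the buffer sink are assumptions made for the example.

```scala
import scala.collection.mutable

// Plain-Scala sketch (not Flink's state API):
// source -> filter/transform -> keyed state (read/write) -> sink.
object StatefulPipeline {
  def run(source: Iterator[String], sink: mutable.Buffer[(String, Int)]): Unit = {
    val countsByKey = mutable.Map.empty[String, Int] // stands in for operator state
    source
      .map(_.trim.toLowerCase) // transform
      .filter(_.nonEmpty)      // filter
      .foreach { key =>
        val updated = countsByKey.getOrElse(key, 0) + 1 // read state
        countsByKey(key) = updated                      // write state
        sink += (key -> updated)                        // emit to the sink
      }
  }

  def main(args: Array[String]): Unit = {
    val sink = mutable.ArrayBuffer.empty[(String, Int)]
    run(Iterator("Kafka", "flink", " ", "Flink"), sink)
    println(sink) // ArrayBuffer((kafka,1), (flink,1), (flink,2))
  }
}
```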
Is Flink used?
Powered by Flink
10 billion events/day
2 TB of data/day
30 applications
2 PB of storage and growing
Source: Bouygues Telecom, http://berlin.flink-forward.org/wp-content/uploads/2016/07/Thomas-Lamirault_Mohamed-Amine-Abdessemed-A-brief-history-of-time-with-Apache-Flink.pdf
Stream Processing
Windowing
Stream Windows
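The window diagrams did not survive extraction, but the assignment rule behind tumbling and sliding windows (such as the 5-second window sliding every second in the word count example) can be sketched in plain Scala. This is not Flink's `WindowAssigner` API; the helper names are hypothetical and timestamps are assumed to be epoch milliseconds.

```scala
// Plain-Scala sketch (not Flink's WindowAssigner API): map an event timestamp
// in milliseconds to the start times of the windows that contain it.
object WindowAssignment {
  // Tumbling windows [start, start + size): every event is in exactly one.
  def tumbling(ts: Long, size: Long): Long =
    ts - Math.floorMod(ts, size)

  // Sliding windows of length `size` starting every `slide` ms: an event
  // belongs to every window whose start satisfies ts - size < start <= ts.
  def sliding(ts: Long, size: Long, slide: Long): List[Long] = {
    val latestStart = ts - Math.floorMod(ts, slide)
    (latestStart until (ts - size) by -slide).toList
  }

  def main(args: Array[String]): Unit = {
    println(tumbling(7300L, 5000L))        // 5000
    println(sliding(7300L, 5000L, 1000L))  // List(7000, 6000, 5000, 4000, 3000)
  }
}
```

With a 5 s window and a 1 s slide, each event is counted in five overlapping windows, which is why sliding windows cost more than tumbling ones.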
Demonstration
Flink Windowing
Time: what about it?
Time in Flink
• Multiple notions of “Time” in Flink:
  • Event Time
  • Ingestion Time
  • Processing Time
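What Is Event-Time Processing
[Figure: the Star Wars episodes in processing time (release order: 1977 Episode IV, 1980 Episode V, 1983 Episode VI, 1999 Episode I, 2002 Episode II, 2005 Episode III, 2015 Episode VII) versus event time (story order: Episodes I through VII).]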
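The Star Wars example can be replayed in a few lines of plain Scala: the same films sorted by when they arrived (release year, standing in for processing time) and by when they happen in the story (episode number, standing in for event time). The `Film` type and helper names are illustrative only.

```scala
// Plain-Scala sketch: one set of events, two orderings.
object EventVsProcessingTime {
  final case class Film(episode: Int, releaseYear: Int)

  // Films in the order they were released (i.e. "arrived").
  val arrivals: Seq[Film] = Seq(
    Film(4, 1977), Film(5, 1980), Film(6, 1983),
    Film(1, 1999), Film(2, 2002), Film(3, 2005), Film(7, 2015))

  // Processing time: order by when each event was observed.
  def processingTimeOrder(films: Seq[Film]): Seq[Int] =
    films.sortBy(_.releaseYear).map(_.episode)

  // Event time: order by when each event actually happened (story order).
  def eventTimeOrder(films: Seq[Film]): Seq[Int] =
    films.sortBy(_.episode).map(_.episode)

  def main(args: Array[String]): Unit = {
    println(processingTimeOrder(arrivals)) // List(4, 5, 6, 1, 2, 3, 7)
    println(eventTimeOrder(arrivals))      // List(1, 2, 3, 4, 5, 6, 7)
  }
}
```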
Time in Flink
Complex Event Processing
Complex Event Processing
• Analyzing a stream of events and drawing conclusions
• “if A and then B → infer event C”
• Demanding requirements on the stream processor:
  • Low latency!
  • Exactly-once semantics & event-time support
Use Case
Order Events
The order process is reflected in a stream of order events:
Order(orderId, tStamp, “received”)
Shipment(orderId, tStamp, “shipped”)
Delivery(orderId, tStamp, “delivered”)
orderId: Identifies the order
tStamp: Time at which the event happened
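A minimal sketch of these events as Scala case classes. The shared `Event` trait with a `status` field mirrors how the CEP pattern in this deck reads `orderId`, `tStamp`, and `status`; the `Long` types, epoch-millisecond timestamps, and the `history` helper are assumptions.

```scala
// Sketch of the order-event model (types and helper are assumptions).
object OrderEvents {
  sealed trait Event {
    def orderId: Long  // identifies the order
    def tStamp: Long   // time at which the event happened (epoch ms, assumed)
    def status: String
  }

  final case class Order(orderId: Long, tStamp: Long, status: String = "received") extends Event
  final case class Shipment(orderId: Long, tStamp: Long, status: String = "shipped") extends Event
  final case class Delivery(orderId: Long, tStamp: Long, status: String = "delivered") extends Event

  // Status history of each order, sorted by event time rather than arrival order.
  def history(events: Seq[Event]): Map[Long, List[String]] =
    events.groupBy(_.orderId).map { case (id, es) =>
      id -> es.sortBy(_.tStamp).map(_.status).toList
    }

  def main(args: Array[String]): Unit = {
    val events = Seq(Shipment(42L, 2000L), Order(42L, 1000L), Delivery(42L, 5000L))
    println(history(events)) // Map(42 -> List(received, shipped, delivered))
  }
}
```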
Real-time Warnings
CEP to the Rescue
Define processing and delivery intervals (SLAs)
ProcessSucc(orderId, tStamp, duration)
ProcessWarn(orderId, tStamp)
DeliverySucc(orderId, tStamp, duration)
DeliveryWarn(orderId, tStamp)
orderId: Identifies the order
tStamp: Time when the event happened
duration: Duration of the processing/delivery
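The processing SLA can be sketched without Flink to show what the success and warning events mean. This is a plain-Scala batch approximation, not Flink CEP. It assumes millisecond timestamps, a 1-hour SLA (matching `.within(Time.hours(1))` in the CEP pattern of this deck), and an inclusive deadline. It also hypothetically stamps each `ProcessWarn` with the SLA deadline.

```scala
// Plain-Scala sketch (not Flink CEP) of the processing SLA check.
object ProcessingSla {
  final case class Order(orderId: Long, tStamp: Long)
  final case class Shipment(orderId: Long, tStamp: Long)
  final case class ProcessSucc(orderId: Long, tStamp: Long, duration: Long)
  final case class ProcessWarn(orderId: Long, tStamp: Long)

  val OneHourMs: Long = 60L * 60L * 1000L

  def classify(orders: Seq[Order], shipments: Seq[Shipment],
               slaMs: Long = OneHourMs): Seq[Either[ProcessWarn, ProcessSucc]] =
    orders.map { o =>
      // Shipments of this order at or after the order event.
      val shippedAt = shipments
        .filter(s => s.orderId == o.orderId && s.tStamp >= o.tStamp)
        .map(_.tStamp)
      val result: Either[ProcessWarn, ProcessSucc] =
        if (shippedAt.nonEmpty && shippedAt.min - o.tStamp <= slaMs)
          Right(ProcessSucc(o.orderId, shippedAt.min, shippedAt.min - o.tStamp))
        else
          Left(ProcessWarn(o.orderId, o.tStamp + slaMs)) // hypothetical: warn at the deadline
      result
    }

  def main(args: Array[String]): Unit = {
    val orders = Seq(Order(1L, 0L), Order(2L, 0L))
    val shipments = Seq(Shipment(1L, 30L * 60 * 1000), Shipment(2L, 90L * 60 * 1000))
    classify(orders, shipments).foreach(println)
  }
}
```

Order 1 ships after 30 minutes and yields a `ProcessSucc`; order 2 ships after 90 minutes and yields a `ProcessWarn`.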
CEP Example
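Processing: Order → Shipment
import org.apache.flink.cep.scala.CEP
import org.apache.flink.cep.scala.pattern.Pattern
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

// An order counts as processed when a "shipped" event follows it within 1 hour
val processingPattern = Pattern
  .begin[Event]("received").subtype(classOf[Order])
  .followedBy("shipped").where(_.status == "shipped")
  .within(Time.hours(1))

// Match the pattern separately for each order
val processingPatternStream = CEP.pattern(
  input.keyBy("orderId"),
  processingPattern)

// Left = timed-out partial match (SLA missed), Right = complete match.
// pP / fP map each pattern name to its matched event (Flink 1.2-era Scala
// CEP API; later versions map each name to an Iterable[Event] instead).
val procResult: DataStream[Either[ProcessWarn, ProcessSucc]] =
  processingPatternStream.select {
    (pP, timestamp) => // Timeout handler
      ProcessWarn(pP("received").orderId, timestamp)
  } {
    fP => // Select function
      ProcessSucc(
        fP("received").orderId, fP("shipped").tStamp,
        fP("shipped").tStamp - fP("received").tStamp)
  }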
Count Delayed Shipments
Compute Avg Processing Time
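These two analytics slides come down to simple aggregations over the success and warning results. Here is a minimal plain-Scala sketch that works over a finite batch of results rather than a Flink window. The names `countDelayed` and `avgProcessingTime` are hypothetical.

```scala
// Plain-Scala sketch: aggregate SLA results (Left = warning, Right = success).
object ShipmentStats {
  final case class ProcessSucc(orderId: Long, tStamp: Long, duration: Long)
  final case class ProcessWarn(orderId: Long, tStamp: Long)

  // Number of delayed shipments, i.e. SLA warnings.
  def countDelayed(results: Seq[Either[ProcessWarn, ProcessSucc]]): Int =
    results.count(_.isLeft)

  // Mean processing duration of on-time orders; None if there are none.
  def avgProcessingTime(results: Seq[Either[ProcessWarn, ProcessSucc]]): Option[Double] = {
    val durations = results.collect { case Right(s) => s.duration }
    if (durations.isEmpty) None else Some(durations.sum.toDouble / durations.size)
  }

  def main(args: Array[String]): Unit = {
    val results: Seq[Either[ProcessWarn, ProcessSucc]] = Seq(
      Right(ProcessSucc(1L, 10L, 600L)),
      Left(ProcessWarn(2L, 20L)),
      Right(ProcessSucc(3L, 30L, 1200L)))
    println(countDelayed(results))      // 1
    println(avgProcessingTime(results)) // Some(900.0)
  }
}
```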
Demonstration
Streaming Analytics
Demonstration
• https://github.com/mapr-demos/mapr-streams-flink-demo
• https://github.com/mapr-demos/wifi-sensor-demo
• http://tgrall.github.io/blog/2016/10/12/getting-started-with-apache-flink-and-kafka/
• http://tgrall.github.io/blog/2016/10/17/getting-started-with-apache-flink-and-mapr-streams/
• more soon…
Thanks to
Kostas Tzoumas
Stephan Ewen
Fabian Hueske
Till Rohrmann
Jamie Grier
Streaming Architecture
http://mapr.com/ebooks/
Free ebooks & Online training
http://mapr.com/training/
Stream Processing with Apache Flink
Tugdual Grall
@tgrall
