A Data Streaming Architecture with Apache Flink
Robert Metzger
@rmetzger_
rmetzger@apache.org
Berlin Buzzwords, June 7, 2016
Talk overview
• My take on the stream processing space, and how it changes the way we think about data
• Transforming an existing data analysis pattern into the streaming world (“Streaming ETL”)
• Demo
Apache Flink
• Apache Flink is an open source stream processing framework
  • Low latency
  • High throughput
  • Stateful
  • Distributed
• Developed at the Apache Software Foundation, 1.0.0 released in March 2016, used in production
Entering the streaming era
Streaming is the biggest change in data infrastructure since Hadoop
1. Radically simplified infrastructure
2. Do more with your data, faster
3. Can completely subsume batch
Real-world data is produced in a continuous fashion.
New systems like Flink and Kafka embrace the streaming nature of data.
Diagram: Web server → Kafka topic → Stream processor
Apache Flink stack
• Libraries and integrations: Gelly, Table/SQL, ML, SAMOA, Hadoop M/R, Apache Beam, Table/StreamSQL, Cascading, Storm API, Zeppelin, CEP
• Core APIs: DataSet (Java/Scala), DataStream (Java/Scala)
• Runtime: Streaming dataflow runtime
• Deployment: Local, Cluster, YARN
What makes Flink flink?
True streaming
• Low latency
• High throughput
• Well-behaved flow control (back pressure)
Event time
• Make more sense of data
• Works on real-time and historic data
Stateful streaming
• Globally consistent savepoints
• Exactly-once semantics for fault tolerance
• Windows & user-defined state
APIs and libraries
• Flexible windows (time, count, session, roll-your-own)
• Complex Event Processing
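To make the window types named on this slide (time, count, session) concrete, here is a minimal plain-Python sketch. It is not Flink API code: the function names, the `(timestamp, value)` event shape, and the choice to drop a trailing partial count window are all assumptions made for illustration.

```python
from collections import defaultdict

def tumbling_time_windows(events, size):
    """Assign (timestamp, value) events to fixed, non-overlapping time windows."""
    windows = defaultdict(list)
    for ts, value in events:
        window_start = ts - (ts % size)  # start of the window this event falls into
        windows[window_start].append(value)
    return dict(windows)

def count_windows(values, size):
    """Emit one window per `size` elements; a trailing partial window is dropped."""
    values = list(values)
    return [values[i:i + size] for i in range(0, len(values) - size + 1, size)]

def session_windows(events, gap):
    """Group events into sessions; a new session starts after more than `gap` of inactivity."""
    sessions = []
    for ts, value in sorted(events):
        if sessions and ts - sessions[-1][-1][0] <= gap:
            sessions[-1].append((ts, value))  # still within the current session
        else:
            sessions.append([(ts, value)])    # inactivity gap exceeded: open a new session
    return sessions

# Hypothetical events: (timestamp in seconds, payload)
events = [(1, "a"), (2, "b"), (7, "c"), (8, "d"), (20, "e")]
time_result = tumbling_time_windows(events, size=5)
count_result = count_windows([value for _, value in events], size=2)
session_result = session_windows(events, gap=3)
```

The three functions differ only in what closes a window: the clock, the number of elements, or a period of inactivity. A "roll-your-own" window would swap in a custom rule at the same place.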
Moving existing (batch) data analysis into streaming
Extract, Transform, Load (ETL)
• ETL: Move data from A to B and transform it on the way
• Old approach:
  • Sources: server logs, mobile, IoT
  • Tier 0: Raw data in HDFS / S3 (“Data Lake”)
  • Periodic jobs produce Tier 1: Normalized, cleansed data (Parquet / ORC in HDFS), accessed by users
  • Periodic jobs produce Tier 2: Aggregated data (“Data Warehouse”), accessed by users
Extract, Transform, Load (Streaming ETL)
• ETL: Move data from A to B and transform it on the way
• Streaming approach:
  • Sources: server logs, mobile, IoT
  • Tier 0: Raw data (“Data Lake”)
  • Stream processor: cleansing, transformation, time windows, alerts
  • Connectors and sinks: Kafka connector, ES connector, rolling file sink, JDBC sink, Cassandra sink
  • Tier 1: Normalized, cleansed data (Parquet / ORC in HDFS), available to users and to batch processing
  • Tier 2: Aggregated data, available to users
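The stream processor stages named on this slide (cleansing, transformation, time window, alerts) can be sketched as a chain of plain-Python generators. This is an illustration under stated assumptions, not the talk's demo code and not Flink API: the JSON log format, the `ts` and `status` fields, the 60-second window, and the alert threshold are all hypothetical. The window step here collects its whole input before returning, whereas a real stream processor would emit each window as it closes.

```python
import json
from collections import Counter

def cleanse(raw_lines):
    """Drop records that are not valid JSON objects or lack the required fields."""
    for line in raw_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(record, dict) and "ts" in record and "status" in record:
            yield record

def transform(records):
    """Normalize each record into a (timestamp, status class) pair, e.g. 503 becomes "5xx"."""
    for record in records:
        yield int(record["ts"]), f"{int(record['status']) // 100}xx"

def window_counts(pairs, size):
    """Count status classes per tumbling window of `size` seconds."""
    windows = {}
    for ts, status_class in pairs:
        start = ts - ts % size
        windows.setdefault(start, Counter())[status_class] += 1
    return windows

def alerts(windows, threshold):
    """Return the start times of windows with at least `threshold` server errors."""
    return [start for start, counts in sorted(windows.items())
            if counts["5xx"] >= threshold]

# Hypothetical raw input, including two records that cleansing should reject.
raw = [
    '{"ts": 0, "status": 200}',
    '{"ts": 10, "status": 500}',
    'not json',
    '{"ts": 61, "status": 503}',
    '{"ts": 65, "status": 500}',
    '{"status": 200}',
]
windows = window_counts(transform(cleanse(raw)), size=60)
alert_windows = alerts(windows, threshold=2)
```

Each stage consumes the previous one lazily, which mirrors how records flow through operators one at a time instead of waiting for a periodic "load" job.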
Streaming ETL: Low Latency
• Events are processed immediately
• No need to wait until the next “load” batch job is running
Latency scale*: hours, minutes, seconds, milliseconds
Approaches along that scale, slowest first: periodic batch job, batch processor with micro-batches, stream processor
* Your mileage may vary. These are rule of thumb estimates.
Streaming ETL: Event-time aware
• Events derived from the same real-world activity might arrive out of order in the system
• Flink is event-time aware
Diagram: three timelines (11:28 to 11:29) showing the same real-world activity
Causes: out-of-sync clocks, network delays, machine failures
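As a rough illustration of what "event-time aware" means, the sketch below assigns events to windows by the timestamp they carry, not by arrival order, and uses a watermark that trails the highest timestamp seen so far to decide when a window can be emitted. It is a simplified plain-Python model, not Flink's implementation: the class name, the fixed `max_out_of_orderness` bound, and the policy of dropping late events are assumptions.

```python
from collections import defaultdict

class EventTimeWindower:
    """Tumbling event-time windows, closed by a watermark that trails the max seen timestamp."""

    def __init__(self, size, max_out_of_orderness):
        self.size = size
        self.max_out_of_orderness = max_out_of_orderness
        self.max_ts = None
        self.open_windows = defaultdict(list)

    def _watermark(self):
        """Timestamp up to which no more events are expected (None before the first event)."""
        return None if self.max_ts is None else self.max_ts - self.max_out_of_orderness

    def process(self, event_ts, value):
        """Add one event; return (window_start, values) for every window that closes."""
        watermark_before = self._watermark()
        start = event_ts - event_ts % self.size
        if watermark_before is not None and start + self.size <= watermark_before:
            return []  # too late: the window was already emitted, so this sketch drops it
        self.open_windows[start].append(value)
        self.max_ts = event_ts if self.max_ts is None else max(self.max_ts, event_ts)
        watermark = self._watermark()
        closed = sorted(s for s in self.open_windows if s + self.size <= watermark)
        return [(s, self.open_windows.pop(s)) for s in closed]

# Hypothetical events as (seconds after 11:28:00, payload), listed in arrival order.
# "c" happened at second 50 but arrives after "b", which happened at second 62.
windower = EventTimeWindower(size=60, max_out_of_orderness=10)
outputs = [windower.process(ts, value)
           for ts, value in [(5, "a"), (62, "b"), (50, "c"), (75, "d")]]
late_output = windower.process(40, "late")
```

The out-of-order event "c" still lands in the 11:28 window because that window stays open until the watermark passes its end. The trade-off is latency: a larger out-of-orderness bound tolerates more delay but emits results later.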
Demo
Job Overview
• Flink
• Twitter source
• Data Ingestion Job
• “Streaming ETL” Job
Job Overview
• Filter operation
• Filter operation
• (Rolling) file sink
• Streaming WordCount
• TopN operator
• Aggregation to ElasticSearch
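The operators listed for the demo job (filter, streaming WordCount, TopN) can be approximated in plain Python as follows. This is only a sketch of the idea and not the code in the demo repository: the tweet shape, the `lang` field used by the filter, and the tokenization rule are hypothetical, and the input is a small in-memory list standing in for a stream.

```python
import re
from collections import Counter

def filter_english(tweets):
    """Keep only tweets whose (hypothetical) `lang` field is "en"."""
    return (tweet for tweet in tweets if tweet.get("lang") == "en")

def word_count(tweets):
    """Count lower-cased words across all tweets."""
    counts = Counter()
    for tweet in tweets:
        counts.update(re.findall(r"[a-z']+", tweet["text"].lower()))
    return counts

def top_n(counts, n):
    """Return the n most frequent words; ties are broken alphabetically."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]

# Hypothetical input standing in for the Twitter source.
tweets = [
    {"lang": "en", "text": "Flink streams data"},
    {"lang": "de", "text": "Flink ist schnell"},
    {"lang": "en", "text": "Streams of data, streams of events"},
]
counts = word_count(filter_english(tweets))
top_words = top_n(counts, 2)
```

In a streaming job the counts would be kept as operator state and the top-N list recomputed per window, with the results written to a sink such as ElasticSearch.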
Demo code @ GitHub
https://github.com/rmetzger/flink-streaming-etl
Closing
https://www.eventbrite.com/e/apache-flink-hackathon-by-berlin-buzzwords-tickets-25580481910
Flink Forward 2016, Berlin
Submission deadline: June 30, 2016
Early bird deadline: July 15, 2016
www.flink-forward.org
We are hiring!
data-artisans.com/careers
Questions?
• Ask now!
• Email: rmetzger@apache.org
• Twitter: @rmetzger_
• Follow: @ApacheFlink
• Read: flink.apache.org/blog, data-artisans.com/blog/
• Mailing lists: (news | user | dev)@flink.apache.org
Appendix
Sources
• “Large scale ETL with Hadoop”, http://www.slideshare.net/OReillyStrata/large-scale-etl-with-hadoop



Editor's Notes

  • #6: Because it enables the obvious: processing continuous data in a continuous fashion.
  • #7: But what is the importance of streaming, and what can you do with it? First, streaming radically simplifies the data infrastructure by serving many use cases out of the stream processor in real time. This is connected to broader trends like the move to more microservice-based organizations. Second, streaming is the style of processing that new applications need, including the Internet of Things and demand-driven services like Uber. Third, streaming is simply a better way to handle many traditional use cases, because it subsumes batch.