Getting started with Apache Flink streaming API
Preetdeep Kumar
18th Jan, 2020
https://www.linkedin.com/in/preetdeep-kumar/
https://github.com/preetdeepkumar/flink-tutorials
https://www.meetup.com/Hyderabad-Apache-Flink-Meetup-Group/
Agenda
• Streaming
  • Introduction
  • Architecture
• Flink
  • Design
  • Typical DataStream API workflow
• Demo
Streaming – high-level summary
• Streaming refers to data that is continuously generated, usually at high velocity and in small sizes (KBs).
• Common examples of streaming data include:
  • IoT sensor events
  • Server logs
  • Click-stream data from apps and websites
  • GPS co-ordinates from a ride
  • Social media
| | Batch data processing | Stream data processing |
|---|---|---|
| Data scope | Queries or processing over all or most of the data in the dataset | Queries or processing over data within a rolling time window, or on just the most recent data record |
| Data size | Large batches of data | Individual records or micro batches consisting of a few records |
| Latency | Minutes to hours | Seconds (near real time) or milliseconds (real time) |
| Analysis | Complex analytics | Simple response functions, aggregates, and rolling metrics |

Source: https://aws.amazon.com/streaming-data/
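
The "data scope" difference in the comparison above can be illustrated with a minimal plain-Python sketch. It is not Flink code: the function names, the sensor values, and the window size of 3 records are all assumptions made for illustration. The batch function needs the whole dataset before it answers once, while the streaming function emits an updated result as each record arrives, looking only at the most recent records.

```python
from collections import deque


def batch_average(readings):
    """Batch style: the full dataset is available, compute one answer."""
    return sum(readings) / len(readings)


def rolling_average(readings, window_size):
    """Stream style: emit an updated average after every incoming record,
    computed over at most the last `window_size` records."""
    window = deque(maxlen=window_size)  # old records fall out automatically
    for value in readings:
        window.append(value)
        yield sum(window) / len(window)


# Hypothetical sensor values, used only for this illustration.
readings = [10, 20, 30, 40, 50]

batch_result = batch_average(readings)
stream_results = list(rolling_average(readings, window_size=3))

print(batch_result)    # 30.0
print(stream_results)  # [10.0, 15.0, 20.0, 30.0, 40.0]
```

The batch result arrives once, after all five records; the streaming results arrive one per record, which is what makes second-level or millisecond-level latency possible.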
Typical streaming data architecture
Data sources → Collection → Ingestion → Process → Storage → Visualize / Analyze
• Data sources: IoT devices (sensors), apps (GPS, tweets, clickstreams), server logs
• Collection: Logstash, Kinesis Agent, YourOwnAgent
• Ingestion: Kafka, Kinesis Stream
• Process: Kstream, Kinesis Analytics, Flink, Spark Streaming
• Storage: S3, Elasticsearch, DB
• Visualize / Analyze: Kibana, Grafana, Athena
Flink – high-level design
https://ci.apache.org/projects/flink/flink-docs-release-1.9/internals/components.html
Flink’s DataStream typical workflow
1. Create a StreamExecutionEnvironment
2. Add a source which will produce data into Flink
3. Create a DataStream
4. Partition the stream using a key
5. Define a window
6. Provide business logic on the data within a Window
7. Send the result of a window to a sink
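
The workflow steps above can be sketched in plain Python to show how the pieces fit together. This is a conceptual simulation under stated assumptions, not the Flink API: the event layout `(sensor_id, timestamp_s, temperature)`, the 10-second tumbling window, the helper names, and the sample events are all hypothetical. It also processes a small bounded list and emits results at the end, whereas Flink runs continuously and fires each window as time advances. The final step writes to a sink, meaning an output such as a file, a database, or a message queue.

```python
from collections import defaultdict

WINDOW_SECONDS = 10  # assumed tumbling-window size, for illustration only


def source():
    """Steps 2-3: a source that produces the stream of events.
    Each event is (sensor_id, timestamp_s, temperature); values are made up."""
    yield from [
        ("sensor-a", 1, 20.0),
        ("sensor-b", 2, 30.0),
        ("sensor-a", 7, 22.0),
        ("sensor-a", 12, 26.0),
        ("sensor-b", 15, 34.0),
        ("sensor-b", 18, 36.0),
    ]


def window_start(timestamp_s):
    """Step 5: map a timestamp to the start of its tumbling window."""
    return timestamp_s - (timestamp_s % WINDOW_SECONDS)


def average(values):
    """Step 6: the business logic applied to the records in one window."""
    return sum(values) / len(values)


def run(events, sink):
    # Steps 4-5: partition by key (sensor_id) and assign each event to a window.
    windows = defaultdict(list)
    for sensor_id, timestamp_s, temperature in events:
        windows[(sensor_id, window_start(timestamp_s))].append(temperature)

    # Steps 6-7: apply the logic per (key, window) and send each result to the sink.
    for (sensor_id, start), temperatures in sorted(windows.items()):
        sink.append((sensor_id, start, average(temperatures)))


results = []  # stands in for a real sink
run(source(), results)
for row in results:
    print(row)
```

In Flink's DataStream API these steps roughly correspond to `addSource`, `keyBy`, `window`, a window function, and `addSink`, all called on streams obtained from the `StreamExecutionEnvironment`.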
