Let’s Get to Know Streaming: A Developer’s Point of View
Presented by: Anuj & Jashan
Our Agenda
01 Streaming: What, Why, Benefits
02 Different Architectures
03 Challenges in Stream Processing
04 Types of Stream Processing: Stateless & Stateful
05 Stateful Stream Processing: Elaborated
What is Data Streaming?
● Data streaming is the continuous flow of data.
● Example: the surge in IoT devices has led to far more data being gathered continuously.
● Data gathered in real time can be processed to get real-time results.
● Stream processing is the practice of acting on a series of data at the time the data is created.
Why Data Streaming?
● Provides insights faster.
● Handles a never-ending stream of events.
● Makes it easy to inspect data from multiple streams simultaneously.
● Stream processing can often work with a lot less hardware than batch processing, since data is processed incrementally as it arrives instead of being stored and processed in bulk.
● Lets us design the data processing engine with infinite (unbounded) data sets in mind.
Streaming: Benefits
● Often requires a lot less hardware.
● Real-time fraud and anomaly detection.
● Internet of Things (IoT) real-time analytics.
● Real-time personalization, marketing, and advertising.
Evolution of Stream Processing
● Beginning: Fortran / C; simple processing started
● 1970: SQL and RDBMSs; databases invented
● Early 2000s: batch processing; bulk processing and Big Data, like MapReduce
● 2015: streaming or micro-batching; stream processing started showing promise
● Current: streaming SQL; unified processing
Streaming or Micro-Batching?
Different Architectures
Different Architectures at a Glance: Lambda vs. Kappa
Lambda Architecture: batch processing + streaming power
Kappa Architecture: pure streaming
Challenges
Stream Processing Challenges
1. Stream Joins: joining data from two streams
2. Aggregations: SQL-style aggregation operations
3. Late Data: data received later than the actual event time
4. Deduplication: removing duplicate data in a stream
5. Fast Incoming Data: the stream processor is not up to speed
6. Fault Tolerance
Solutions in Streaming
1. Stream Joins: managing state in streaming, watermarking
2. Aggregations: apply grouping (window) and watermarking
3. Late Data: managing state in streaming, window, watermarking
4. Deduplication: managing state in streaming with watermarking
5. Fast Incoming Data: backpressure
6. Fault Tolerance: checkpointing
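Backpressure, the remedy listed for fast incoming data, can be illustrated with a minimal Python sketch. It assumes a single in-process producer and consumer connected by a bounded `queue.Queue`; `run_pipeline`, `capacity`, and `work_seconds` are hypothetical names, and real stream engines implement backpressure in their own ways. Because `put()` blocks while the buffer is full, the fast producer is throttled to the slow consumer's pace instead of piling up unbounded data.

```python
import queue
import threading
import time


def run_pipeline(events, capacity=2, work_seconds=0.001):
    """Send events from a fast producer to a slow consumer through a
    bounded buffer. put() blocks while the buffer is full, which throttles
    the producer (backpressure). Returns (results, peak_buffer_size)."""
    buffer = queue.Queue(maxsize=capacity)
    end_of_stream = object()   # sentinel telling the consumer to stop
    results = []
    peak = 0                   # largest buffer size the producer observed

    def producer():
        nonlocal peak
        for event in events:
            buffer.put(event)                 # waits while the buffer is full
            peak = max(peak, buffer.qsize())
        buffer.put(end_of_stream)

    def consumer():
        while True:
            event = buffer.get()
            if event is end_of_stream:
                break
            time.sleep(work_seconds)          # simulate slow processing
            results.append(event * 2)

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, peak


results, peak = run_pipeline(range(10))
print(results)     # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
print(peak <= 2)   # True: the buffer never grows past its capacity
```

The design choice here is blocking rather than dropping: no event is lost, and memory use is capped by `capacity`.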
Late Data in Streaming
Types of Stream Processing: Stateless & Stateful
Stateless Stream Processing
What: Straightforward streaming in which we don’t need to maintain any state.
Where: Where we need to perform an operation on each individual message/event, such as filter, select, etc.
When: When the result does not depend on previous events.
Stateful Stream Processing
What: Maintains state in order to perform aggregations, deduplication, and joins.
Where: Where we need to perform operations like groupBy, count, etc.
When: When the result depends on previous events.
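The difference between the two can be sketched in plain Python, assuming events are dictionaries; `stateless_filter` and `RunningCount` are hypothetical names used only for illustration. The filter judges each event in isolation, while the running count has to remember earlier events per key.

```python
from collections import defaultdict


def stateless_filter(events, min_amount):
    """Stateless: each event is judged on its own, like filter/select."""
    for event in events:
        if event["amount"] >= min_amount:
            yield event


class RunningCount:
    """Stateful: a per-user count (like groupBy + count) must be kept
    between events, because each result depends on earlier events."""

    def __init__(self):
        self.counts = defaultdict(int)   # state that survives across events

    def process(self, event):
        self.counts[event["user"]] += 1
        return event["user"], self.counts[event["user"]]


events = [
    {"user": "a", "amount": 5},
    {"user": "b", "amount": 50},
    {"user": "a", "amount": 70},
]
print([e["user"] for e in stateless_filter(events, min_amount=10)])  # ['b', 'a']
counter = RunningCount()
print([counter.process(e) for e in events])  # [('a', 1), ('b', 1), ('a', 2)]
```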
Stateful Stream Processing: Elaborated
Stateful Streaming: Aggregation
● Aggregation by key only
● Aggregation by event time windows
● Aggregation by both key and event time window
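A minimal Python sketch of the three groupings, assuming events are `(key, event_time_in_seconds)` tuples and a hypothetical 10-second tumbling window. In Spark Structured Streaming the same idea is written as a `groupBy` over a key, over the `window(...)` function, or over both; here plain `Counter`s over a finite list stand in for a continuously updated result.

```python
from collections import Counter

WINDOW_SECONDS = 10  # hypothetical tumbling-window length


def window_start(event_time, size=WINDOW_SECONDS):
    """Start of the tumbling window that contains event_time."""
    return event_time - event_time % size


# (key, event time in seconds)
events = [("sensor-1", 3), ("sensor-2", 7), ("sensor-1", 12), ("sensor-1", 14)]

by_key = Counter(key for key, _ in events)                     # key only
by_window = Counter(window_start(t) for _, t in events)        # window only
by_both = Counter((key, window_start(t)) for key, t in events) # key + window

print(dict(by_key))     # {'sensor-1': 3, 'sensor-2': 1}
print(dict(by_window))  # {0: 2, 10: 2}
print(dict(by_both))    # {('sensor-1', 0): 1, ('sensor-2', 0): 1, ('sensor-1', 10): 2}
```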
Windowing
01 Windowing by processing time vs event time: windows can be built with respect to either the processing time or the event time.
02 Fixed window (tumbling windows): the simplest and most straightforward window.
03 Sliding window (hopping windows): a bit more complex than the fixed window; it is defined by two parameters, the window length and the slide (step).
Event Time & Processing Time
Windowing by Processing Time vs Event Time
Processing Time Window
● Based on the clock time at which the event is processed.
● All late events are placed into the current window.
● Out-of-order events are not reordered.
Event Time Window
● Based on the time at which the event was produced.
● Each event is placed in the window it belongs to.
● Out-of-order events are reordered.
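The contrast can be shown with a short Python sketch, assuming each event carries both an event time and a processing (arrival) time in seconds, and a hypothetical 10-second tumbling window; `window_start` and `assign_windows` are illustrative helpers, not a library API.

```python
WINDOW_SECONDS = 10  # hypothetical tumbling-window length


def window_start(ts, size=WINDOW_SECONDS):
    """Start of the tumbling window that contains timestamp ts."""
    return ts - ts % size


def assign_windows(events, use_event_time):
    """Group event ids by window, keyed on event time or on processing time."""
    windows = {}
    for event_id, event_time, processing_time in events:
        ts = event_time if use_event_time else processing_time
        windows.setdefault(window_start(ts), []).append(event_id)
    return windows


# (event id, event time, processing time) in seconds, listed in arrival order.
# Event "c" was produced at t=8 but only reached the processor at t=13.
events = [("a", 1, 2), ("b", 6, 7), ("d", 11, 12), ("c", 8, 13)]

print(assign_windows(events, use_event_time=False))  # {0: ['a', 'b'], 10: ['d', 'c']}
print(assign_windows(events, use_event_time=True))   # {0: ['a', 'b', 'c'], 10: ['d']}
```

With processing time, the late event `c` is counted in the current window; with event time it is placed back into the window it belongs to.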
Fixed Window (Tumbling Window)
Fixed/tumbling: time is partitioned into same-length, non-overlapping chunks. Each event belongs to exactly one window.
Sliding Window (Hopping Window)
Sliding: windows have a fixed length but are separated by a time interval (step), which can be smaller than the window length. Typically the window length is a multiple of the step. Each event then belongs to [window length] / [window step] windows.
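The membership rule can be checked with a short Python sketch, assuming integer timestamps, windows that start at multiples of the step, and a window length that is a multiple of the step (`sliding_windows` is a hypothetical helper). A window `[s, s + length)` contains `ts` exactly when `ts - length < s <= ts`, which yields `length / step` windows per event; setting `step == length` turns it into a tumbling window with exactly one window per event.

```python
def sliding_windows(ts, length, step):
    """Return every [start, end) sliding window that contains timestamp ts.
    Assumes integer times, window starts at multiples of `step`, and
    `length` being a multiple of `step`."""
    last_start = ts - ts % step                         # latest start <= ts
    starts = range(last_start, ts - length, -step)      # all starts > ts - length
    return [(s, s + length) for s in reversed(starts)]


print(sliding_windows(12, length=10, step=5))   # [(5, 15), (10, 20)]
print(sliding_windows(12, length=10, step=10))  # [(10, 20)], i.e. a tumbling window
```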
Late Data in Streaming
Watermarking
● Data newer than the watermark may be late, but is still allowed to be aggregated.
● Windows older than the watermark are automatically deleted to limit the amount of intermediate state.
Handle more late data -> keep more state
Reduce the state -> handle less lateness
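The watermark rules can be sketched in Python, assuming Spark-like semantics in which the watermark is the maximum event time seen so far minus a fixed delay threshold. As a simplification, the watermark here advances after every event (Spark advances it between micro-batches), and `WatermarkedCounter` is a hypothetical class. A window is closed, and its state deleted, once its end is at or before the watermark; events for an already-closed window are dropped.

```python
from collections import defaultdict


class WatermarkedCounter:
    """Counts events per tumbling window while tolerating up to `delay`
    seconds of lateness. Simplification: the watermark advances after
    every event rather than between micro-batches."""

    def __init__(self, window, delay):
        self.window = window
        self.delay = delay
        self.max_event_time = float("-inf")
        self.state = defaultdict(int)   # open windows: start -> count
        self.finalized = {}             # windows closed by the watermark
        self.dropped = []               # events that arrived too late

    @property
    def watermark(self):
        return self.max_event_time - self.delay

    def process(self, event_time):
        start = event_time - event_time % self.window
        if start + self.window <= self.watermark:
            self.dropped.append(event_time)     # its window is already closed
            return
        self.state[start] += 1                  # late but allowed: aggregate it
        self.max_event_time = max(self.max_event_time, event_time)
        # Close (and delete state for) windows that end at or before the watermark.
        for s in [s for s in self.state if s + self.window <= self.watermark]:
            self.finalized[s] = self.state.pop(s)


for delay in (5, 15):
    c = WatermarkedCounter(window=10, delay=delay)
    for t in [3, 12, 7, 21, 8]:      # 7 and 8 arrive out of order
        c.process(t)
    print(delay, dict(c.state), c.finalized, c.dropped)
# 5 {10: 1, 20: 1} {0: 2} [8]
# 15 {0: 3, 10: 1, 20: 1} {} []
```

The two runs show the trade-off: the 15-second delay accepts the late event 8 but keeps three windows in state, while the 5-second delay frees window 0 early and drops event 8.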
Internal working of Watermarking
Stateful Streaming: Deduplication
● Drop duplicate records in a stream.
● Specify the columns that uniquely identify a record.
● The state stores the unique keys seen in the stream, and any record whose key matches the state is dropped.
Stateful Streaming: Deduplication
● A key set that grows too large in the deduplication state will make the streaming job unstable.
● Solution: drop the state after a specified period.
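Deduplication with a bounded state can be sketched in Python, assuming each record has a key and an event time, and that a key may be forgotten once it is more than `ttl` older than the newest event time seen; `Deduplicator` and `ttl` are hypothetical names. (Spark Structured Streaming offers this idea as `dropDuplicates` combined with `withWatermark`; this sketch is not its implementation.) The last record shows the price of bounding state: a duplicate that arrives after its key was evicted is no longer caught.

```python
class Deduplicator:
    """Drops records whose key has already been seen. Keys older than
    `ttl` (measured against the newest event time seen) are evicted, so the
    state stays bounded; a duplicate arriving after eviction is not caught."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.seen = {}                  # key -> event time of first occurrence
        self.max_event_time = float("-inf")

    def process(self, key, event_time):
        self.max_event_time = max(self.max_event_time, event_time)
        horizon = self.max_event_time - self.ttl
        for k in [k for k, t in self.seen.items() if t < horizon]:
            del self.seen[k]            # drop state after the retention period
        if key in self.seen:
            return None                 # duplicate: drop the record
        self.seen[key] = event_time
        return key                      # first occurrence: emit it


dedup = Deduplicator(ttl=10)
records = [("a", 1), ("a", 4), ("b", 5), ("a", 20)]
print([dedup.process(k, t) for k, t in records])  # ['a', None, 'b', 'a']
print(dedup.seen)                                 # {'a': 20}
```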
Stateful Streaming: Joins
● Each stream must buffer its events in state so they can be matched against future events from the other stream.
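A minimal Python sketch of this buffering, assuming an inner equi-join on a key; `SymmetricJoin` is a hypothetical class. Every event is stored on its own side and matched against whatever the other side has already buffered. Without any bound these buffers grow forever, which is why joins are combined with watermarking.

```python
from collections import defaultdict


class SymmetricJoin:
    """Inner join of two streams on a key. Each side buffers its events in
    state so that later events from the other side can still be matched."""

    def __init__(self):
        self.buffers = {"left": defaultdict(list), "right": defaultdict(list)}

    def process(self, side, key, value):
        other = "right" if side == "left" else "left"
        self.buffers[side][key].append(value)          # remember for future matches
        matches = self.buffers[other].get(key, [])     # match what already arrived
        if side == "left":
            return [(key, value, m) for m in matches]
        return [(key, m, value) for m in matches]


join = SymmetricJoin()
print(join.process("left", "ad-1", "impression@10:00"))  # []
print(join.process("right", "ad-1", "click@10:20"))      # [('ad-1', 'impression@10:00', 'click@10:20')]
print(join.process("right", "ad-1", "click@10:40"))      # [('ad-1', 'impression@10:00', 'click@10:40')]
```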
Stateful Streaming: Joins
● Impressions can be up to 2 hours late.
● Clicks can be up to 3 hours late.
● A click can occur within 1 hour after the corresponding impression.
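These three constraints determine how long each side must stay buffered. Spark Structured Streaming's documentation uses this same example, expressed with `withWatermark` on each stream plus a time-range join condition; the plain-Python sketch below only mirrors the reasoning, assuming the lateness bounds always hold, and its function names are hypothetical. An impression can be dropped once the click watermark (latest click time minus 3 hours) passes the impression time plus 1 hour, so it stays buffered until click event time has advanced more than 4 hours past it.

```python
from datetime import datetime, timedelta

IMPRESSION_DELAY = timedelta(hours=2)  # impressions can be up to 2 hours late
CLICK_DELAY = timedelta(hours=3)       # clicks can be up to 3 hours late
MATCH_WINDOW = timedelta(hours=1)      # a click comes within 1 hour after its impression


def is_match(impression_time, click_time):
    """Time-range join condition."""
    return impression_time <= click_time <= impression_time + MATCH_WINDOW


def can_drop_impression(impression_time, max_click_time_seen):
    """An impression leaves the state once the click watermark has passed the
    latest time at which a matching click could have happened."""
    click_watermark = max_click_time_seen - CLICK_DELAY
    return click_watermark > impression_time + MATCH_WINDOW


def can_drop_click(click_time, max_impression_time_seen):
    """A click leaves the state once no impression at or before it can still arrive."""
    impression_watermark = max_impression_time_seen - IMPRESSION_DELAY
    return impression_watermark > click_time


t0 = datetime(2024, 1, 1, 12, 0)
print(is_match(t0, t0 + timedelta(minutes=30)))                     # True
print(is_match(t0, t0 + timedelta(hours=2)))                        # False
print(can_drop_impression(t0, t0 + timedelta(hours=4)))             # False
print(can_drop_impression(t0, t0 + timedelta(hours=4, minutes=1)))  # True
```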
Some Use Cases of Streaming
● Algorithmic Trading, Stock Market Surveillance
● Smart Patient Care
● Monitoring a production line
● Supply chain optimizations
● Intrusion, Surveillance and Fraud Detection (e.g. Uber)
● Most Smart Device Applications: Smart Car, Smart Home, etc.
● Smart Grid (e.g. load prediction and outlier plug detection; see "Smart grids": 4 billion events, throughput in the range of 100Ks)
● Traffic Monitoring, Geofencing, and Vehicle and Wildlife tracking (e.g. TfL London Transport Management System)
● Sports analytics: augmenting sports with real-time analytics (e.g. work done with a real football game, overlaying real-time analytics on football broadcasts)
● Context-aware promotions and advertising
● Computer system and network monitoring
● Predictive Maintenance (e.g. Machine Learning Techniques for Predictive Maintenance)
● Geospatial data processing
Thank You!
Get in touch with us:
anuj.saxena@knoldus.com || anuj1207 || anuj-saxena
jashan.goyal@knoldus.com || jashangoyal09 || jashan-goyal
