Introduction to Data Streams
Concepts
By: Dr. Sarita Tripathy
Assistant Professor
School of Computer Engineering
KIIT Deemed to be University
What is a data stream?
• Golab & Özsu (2003): “A data stream is a real-time, continuous, ordered
(implicitly by arrival time or explicitly by timestamp) sequence of items.
It is impossible to control the order in which items arrive, nor is it
feasible to locally store a stream in its entirety.”
• Massive volumes of data, items arrive at a high rate.
Data Streams
• A data stream is a (potentially unbounded) sequence of tuples. Each
tuple consists of a set of attributes, similar to a row in a database table.
• Transactional data streams: log interactions between entities
• Credit card: purchases by consumers from merchants
• Telecommunications: phone calls by callers to dialed parties
• Web: accesses by clients of resources at servers
• Measurement data streams: monitor evolution of entity states
• Sensor networks: physical phenomena, road traffic
• IP network: traffic at router interfaces
• Earth climate: temperature, moisture at weather stations
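The tuple view of a stream can be sketched in a few lines of Python. This is only an illustration: the `Reading` type, its field names, and the `weather_stream` generator are hypothetical and do not come from the slides.

```python
import itertools
from typing import Iterator, NamedTuple


class Reading(NamedTuple):
    """One stream tuple: a fixed set of attributes, like a row in a table."""
    timestamp: int       # explicit ordering attribute (hypothetical unit: ticks)
    station_id: str
    temperature: float


def weather_stream() -> Iterator[Reading]:
    """Hypothetical unbounded measurement stream: one reading per tick, forever."""
    for t in itertools.count():
        yield Reading(timestamp=t, station_id="ws-1", temperature=20.0 + (t % 5))


# A consumer can only take items in arrival order; here we look at the first three.
first_three = list(itertools.islice(weather_stream(), 3))
print(first_three)
```

Because the generator never terminates, a consumer has to decide how much of the stream to look at; it cannot materialize the whole sequence.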
Examples of Stream Sources
Before proceeding, let us consider some of the ways in which stream data arises naturally.
Sensor Data: Imagine a temperature sensor bobbing about in the ocean, sending back to a base
station a reading of the surface temperature each hour. The data produced by this sensor is a stream
of real numbers. One such sensor produces little data, but a large fleet of sensors reporting far more
often could deliver 3.5 terabytes every day, and then we definitely need to think about
what can be kept in working storage and what can only be archived.
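The slide does not say how the 3.5 terabyte figure arises. The sketch below shows one set of assumptions that reproduces it; the reading size, reporting rate, and sensor count are all assumed values chosen for illustration, not numbers stated in the deck.

```python
# All four inputs are assumptions for illustration; none is stated on the slide.
BYTES_PER_READING = 4           # one 32-bit real number per reading
READINGS_PER_SECOND = 10        # a reading every tenth of a second
SECONDS_PER_DAY = 24 * 60 * 60
SENSOR_COUNT = 1_000_000        # a large fleet of sensors


def bytes_per_day(sensors: int) -> int:
    """Daily data volume in bytes for the given number of sensors."""
    return sensors * BYTES_PER_READING * READINGS_PER_SECOND * SECONDS_PER_DAY


one_sensor = bytes_per_day(1)
fleet = bytes_per_day(SENSOR_COUNT)
print(f"one sensor: {one_sensor / 1e6:.1f} MB/day, fleet: {fleet / 1e12:.1f} TB/day")
```

Under these assumptions a single sensor stays in the megabyte range per day, while the fleet reaches terabytes, which is where the split between working storage and archive starts to matter.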
Image Data: Satellites often send down to earth streams consisting of many terabytes of images per
day. Surveillance cameras produce images with lower resolution than satellites, but there can be many
of them, each producing a stream of images at intervals like one second.
Internet and Web Traffic: A switching node in the middle of the Internet receives streams of IP
packets from many inputs and routes them to its outputs. Web sites receive streams of various types.
For example, Google receives several hundred million search queries per day. Yahoo! accepts billions
of “clicks” per day on its various sites.
Characteristics of Data Streams
• Characteristics
• Huge volumes of continuous data, possibly infinite
• Fast-changing data that requires fast, real-time response
• The data stream model captures today's data processing needs well
• Random access is expensive, so algorithms make a single scan (each item is
seen only once)
• Store only a summary of the data seen thus far
• Most stream data are low-level or multi-dimensional in
nature and need multi-level and multi-dimensional processing
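The single-scan and summary-only points can be made concrete with a running mean and variance. The sketch below uses Welford's method, which is a common choice but is not named in the slides; the class and function names are illustrative.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class RunningStats:
    """Constant-size summary of every value seen so far (Welford's method)."""
    count: int = 0
    mean: float = 0.0
    _m2: float = 0.0    # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        """Population variance; 0.0 before any value has arrived."""
        return self._m2 / self.count if self.count else 0.0


def summarize(stream: Iterable[float]) -> RunningStats:
    stats = RunningStats()
    for x in stream:        # each item is seen exactly once, then discarded
        stats.update(x)
    return stats


stats = summarize(iter([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]))
print(stats.count, stats.mean, stats.variance)
```

The summary occupies three numbers no matter how long the stream runs, which is the property the bullet list asks for.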
Applications of data stream processing
• Data stream processing
• Process queries (compute statistics, activate alarms)
• Apply data mining algorithms
• Requirements
• Real-time processing
• One-pass processing
• Bounded storage (no complete storage of streams)
• Possibly consider several streams
• Let’s go deeper into some examples
• Network management
• Stock monitoring
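As a minimal sketch of "process queries (activate alarms)" under the one-pass requirement, the generator below raises an alarm whenever a reading exceeds a limit. The link names, the utilization values, and the 0.9 limit are made up for illustration.

```python
from typing import Iterable, Iterator, Tuple


def threshold_alarms(stream: Iterable[Tuple[str, float]], limit: float) -> Iterator[str]:
    """One-pass monitor: emit an alarm for each reading above `limit`."""
    for source, value in stream:
        if value > limit:
            yield f"ALARM {source}: {value} > {limit}"


# Hypothetical link-utilization readings, as a network monitor might receive them.
readings = [("link-a", 0.42), ("link-b", 0.97), ("link-a", 0.55), ("link-b", 0.99)]
alarms = list(threshold_alarms(iter(readings), limit=0.9))
print(alarms)
```

The monitor keeps no state at all, so it trivially satisfies the bounded-storage requirement; real monitors usually keep a small summary per source.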
Network management
Network management (cont.)
Stock monitoring
A data-stream-management system (DSMS)
• Streams may be archived in a large archival
store, but we assume it is not possible to answer
queries from the archival store.
• It could be examined only under special
circumstances using time-consuming retrieval
processes.
• There is also a working store, into which
summaries or parts of streams may be placed,
and which can be used for answering queries.
• The working store might be disk, or it might be
main memory, depending on how fast we need
to process queries.
• But either way, it is of sufficiently limited
capacity that it cannot store all the data from all
the streams.
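The split between the two stores can be sketched as a toy model. The class below is an assumption-laden illustration: an in-memory list stands in for the slow archival store, and a fixed-size `collections.deque` plays the working store; the names are hypothetical.

```python
from collections import deque
from typing import Deque, List


class StreamStores:
    """Toy model: an append-only archive plus a small, queryable working store."""

    def __init__(self, working_capacity: int) -> None:
        self.archive: List[float] = []    # stand-in for slow archival storage
        self.working: Deque[float] = deque(maxlen=working_capacity)

    def ingest(self, item: float) -> None:
        self.archive.append(item)         # everything may be archived
        self.working.append(item)         # the oldest item is evicted when full

    def query_recent_max(self) -> float:
        """Queries are answered from the working store only."""
        return max(self.working)


stores = StreamStores(working_capacity=3)
for value in [5.0, 9.0, 1.0, 2.0, 3.0]:
    stores.ingest(value)
print(stores.query_recent_max(), len(stores.archive))
```

The query misses the earlier value 9.0 because it has already left the working store, which shows the trade-off the slide describes: answers come from limited working storage, not from the archive.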
Generic DSMS Architecture
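[Figure: generic DSMS architecture, after Golab & Özsu 2003. Streaming inputs pass through an input monitor into working storage, summary storage, and static storage; a query repository holds user queries; the query processor writes to an output buffer that produces the streaming outputs. External inputs are user queries and updates to static data.]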
Architecture: Stream Query Processing
SDMS (Stream Data
Management System)
Data Stream Management Systems
DBMS versus DSMS (Data Stream
Management System)
DBMS:
• Persistent relations
• One-time queries
• Random access
• “Unbounded” disk store
• Only current state matters
• No real-time services
• Relatively low update rate
• Data at any granularity
• Assume precise data
• Access plan determined by query
processor, physical DB design
DSMS:
• Transient streams
• Continuous queries
• Sequential access
• Bounded main memory
• Historical data is important
• Real-time requirements
• Possibly multi-GB arrival rate
• Data at fine granularity
• Data stale/imprecise
• Unpredictable/variable data arrival
and characteristics
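The contrast between one-time and continuous queries can be illustrated with the same counting query written both ways. This is a sketch only; the function names and sample data are hypothetical.

```python
from typing import Iterable, Iterator, List


def one_time_count(table: List[int], threshold: int) -> int:
    """DBMS style: run once over a stored relation and return one answer."""
    return sum(1 for row in table if row > threshold)


def continuous_count(stream: Iterable[int], threshold: int) -> Iterator[int]:
    """DSMS style: stay registered and emit an updated answer per arrival."""
    count = 0
    for item in stream:     # sequential access only, bounded state (one integer)
        if item > threshold:
            count += 1
        yield count


data = [3, 12, 7, 15, 20]
print(one_time_count(data, 10))
print(list(continuous_count(iter(data), 10)))
```

The one-time query needs the whole relation to be stored and accessible; the continuous query never stores an item and produces a sequence of answers that changes as data arrives.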
Existing DSMS
Challenges of Stream Data Processing
• Multiple, continuous, rapid, time-varying, ordered streams
• Main memory computations
• Queries are often continuous
• Evaluated continuously as stream data arrives
• Answer updated over time
• Queries are often complex
• Beyond element-at-a-time processing
• Beyond stream-at-a-time processing
• Beyond relational queries (scientific, data mining, OLAP)
• Multi-level/multi-dimensional processing and data mining
• Most stream data are at low-level or multi-dimensional in nature
How to deal with Big Data Streams?
Approximate answers to queries
• When?
• Queries needing unbounded memory
• Too many queries, streams that are too rapid, or response-time
requirements that are too strict
• CPU limit
• Memory limit
• Solution: approximate answers to queries
• Sliding windows
• Sampling and load shedding
• Definition of synopsis
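Two of the techniques listed above, sliding windows and load shedding, can be sketched as small generators. This is an illustration under simple assumptions: the window is count-based, and load shedding drops items uniformly at random; the function names are hypothetical.

```python
import random
from collections import deque
from typing import Deque, Iterable, Iterator


def shed_load(stream: Iterable[float], keep_probability: float,
              rng: random.Random) -> Iterator[float]:
    """Load shedding: randomly drop items when the system cannot keep up."""
    for item in stream:
        if rng.random() < keep_probability:
            yield item


def sliding_mean(stream: Iterable[float], window_size: int) -> Iterator[float]:
    """Sliding window: mean over only the most recent `window_size` items."""
    window: Deque[float] = deque(maxlen=window_size)
    total = 0.0
    for item in stream:
        if len(window) == window_size:
            total -= window[0]      # value about to be evicted by append()
        window.append(item)
        total += item
        yield total / len(window)


means = list(sliding_mean([1.0, 2.0, 3.0, 4.0, 5.0], window_size=3))
print(means)
```

Both generators can be chained, for example `sliding_mean(shed_load(source, 0.5, random.Random(0)), 100)`, which trades accuracy for lower CPU and memory use.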
Streaming Computing
Approaches
• Two approaches for handling such streams
• Use a time window, and query the window as a static table
• When you can’t store the collected data, or need to keep track of historical data, use a summary technique:
• Sampling
• Filtering
• Counting
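For the sampling approach, one widely used technique is reservoir sampling, which keeps a uniform random sample of fixed size from a stream of unknown length. The slides do not name a specific sampling method, so treat this as one possible sketch rather than the method the deck has in mind.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, rng: random.Random) -> List[T]:
    """Keep a uniform random sample of up to k items using O(k) memory."""
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)      # fill the reservoir first
        else:
            j = rng.randint(0, i)       # inclusive on both ends
            if j < k:
                reservoir[j] = item     # replace with probability k / (i + 1)
    return reservoir


sample = reservoir_sample(range(1000), k=10, rng=random.Random(42))
print(sample)
```

The sample can then be queried as if it were a small static table, at the cost of an approximate answer.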
