A First Step Towards Stream Reasoning
Presenter: Emanuele Della Valle [email_address] http://emanueledellavalle.org
Authors: Emanuele Della Valle, Stefano Ceri, Davide F. Barbieri, Daniele Braga and Alessandro Campi

Agenda
- Introduction
- Motivation
- Urban Computing
- Problem definition
- Knowledge and data can change over time
- Data Stream Management Systems
- Is Stream Reasoning possible?
- A Conceptual Architecture for Stream Reasoning
- The LarKC Project
- A conceptual architecture
- RDF streams
- C-SPARQL
- Two approaches to stream reasoning
- Evolutionary
- Revolutionary
- Conclusions
- Research agenda - medium term
- Research agenda - long term
Vienna, 28.9.2008 - FIS 2008

Introduction
- While reasoners scale up year after year in the classical, time-invariant domain of ontological knowledge, reasoning upon rapidly changing information has been neglected or forgotten.
- Data streams are unbounded sequences of time-varying data elements. They occur in a variety of modern applications, such as network monitoring, traffic engineering, sensor networks, RFID tag applications, telecom call records, financial applications, Web logs, and click-streams.
- The processing of data streams has been extensively investigated, and specialized Stream Database Management Systems exist.
- The combination of reasoning techniques with data streams gives rise to Stream Reasoning, an unexplored yet high-impact research area.

Motivation
- To understand the potential impact of Stream Reasoning, we can consider the emblematic case of Urban Computing: the integration of computing, sensing, and actuation technologies into everyday urban settings and lifestyles.
- Pervasive computing has been applied either in relatively homogeneous rural areas, where researchers have added sensors in places such as forests, vineyards, and glaciers, or in small-scale, well-defined patches of the built environment such as smart houses or rooms.
- Urban settings include, for example, streets, squares, pubs, shops, buses, and cafés - any space in the semipublic realms of our towns and cities. They are challenging for experimentation and deployment, and they remain little explored.
[Source: IEEE Pervasive Computing, July-September 2007 (Vol. 6, No. 3)]

Availability of Data
- Some years ago, due to the lack of data, Urban Computing would have looked like a sci-fi idea.
- Nowadays, a large amount of the required information can be made available on the Internet at almost no cost:
  - maps with the commercial activities and meeting places,
  - events scheduled in the city and their locations,
  - average speed on highways, but also on normal streets,
  - positions and speed of public transportation vehicles,
  - parking availability in specific parking areas,
  - and so on.
- We are running a survey (please contribute), see:
  http://wiki.larkc.eu/UrbanComputing/ShowUsABetterWay
  http://wiki.larkc.eu/UrbanComputing/OtherDataSources

A challenge for Stream Reasoning
- Looking for parking lots in large cities may cost up to 40% of the daily fuel consumption.
- Problems dramatically increase when big events, involving lots of people, take place.
- A typical Urban Computing problem is to help citizens who want to attend such events find a parking lot and reach the event location in time, while globally limiting the occurrence of traffic congestion.
- Current technologies are not up to this challenge, because it requires combining a huge amount of static knowledge about the city with an even larger set of data streams, and reasoning in real time over the resulting time-varying knowledge.
[Source: http://gizmodo.com/photogallery/trafficsky/1003143552]

Problem definition
- Knowledge and data can change over time. For instance, in Urban Computing the names of streets, landmarks, kinds of events, etc. change very slowly, whereas the number of cars that go through a traffic detector in five minutes changes very fast.
- This means that the system must have the notion of an "observation period", defined as the period when the system is subject to querying.
- Moreover, within a given observation period the system must consider the following four types of knowledge and data:
  - Invariable knowledge (a design constraint that we'd like to relax in the future)
  - Invariable data
  - Periodically changing data, which change according to a temporal law
  - Event-driven changing data, which are updated as a consequence of some external event

Invariable knowledge and data
- Invariable knowledge (invariable in the observation period!) includes:
  - obvious terminological knowledge, e.g. an address is made up of a street name, a civic number, a city name and a ZIP code;
  - less obvious nomological knowledge, which describes how the world is expected
    - to be, e.g. traffic lights are switched off during the night;
    - to evolve, e.g. traffic jams appear when important sport events take place.
- Invariable data do not change in the observation period, e.g. the names and lengths of the roads.

Changing data
- Periodically changing data change according to a temporal law that can be:
  - a pure periodic law, e.g. every night at 10pm all Milano overpasses close;
  - a probabilistic law, e.g. traffic jams appear in the west side of Milano due to bad weather or when San Siro stadium hosts a soccer match.
- Event-driven changing data are updated as a consequence of some external event. They can be further characterized by the mean time between changes:
  - Slow, e.g. roads closed for scheduled works
  - Medium, e.g. roads closed for accidents or congestion due to traffic
  - Fast, e.g. the intensity of traffic for each street in a city

Data Stream Management Systems
Background: stream database key concepts
Streams
- Continuous instead of one-time semantics
- Selecting by sliding windows on streams
- Selecting by sampling on streams
Query execution
- When a continuous query is registered, a query execution plan is generated
- The new plan is merged with existing plans
- Plans are composed of three main components:
  - Operators
  - Queues (input and inter-operator)
  - State (windows, operators requiring history)
- A global scheduler for plan execution makes the most of the experience gathered with previous queries

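The two selection mechanisms named above, sliding windows and sampling, can be illustrated with a minimal Python sketch. It is not taken from any of the DSMSs discussed in this talk: the function names `continuous_average` and `sample_every`, the `(timestamp, value)` element layout, and the choice of an average as the aggregate are all assumptions made for illustration.

```python
from collections import deque
from itertools import islice
from typing import Deque, Iterable, Iterator, Tuple

Element = Tuple[float, float]  # (timestamp, value); a hypothetical layout


def continuous_average(stream: Iterable[Element],
                       range_s: float) -> Iterator[Tuple[float, float]]:
    """Continuous query: after every arrival, emit the average of the
    values whose timestamp lies in the sliding window (ts - range_s, ts]."""
    window: Deque[Element] = deque()  # the operator's state
    total = 0.0
    for ts, value in stream:
        window.append((ts, value))
        total += value
        # Evict elements that have slid out of the window.
        while window[0][0] <= ts - range_s:
            _, old = window.popleft()
            total -= old
        yield ts, total / len(window)


def sample_every(stream: Iterable, k: int) -> Iterator:
    """Selection by sampling: keep one element out of every k."""
    return islice(stream, 0, None, k)


readings = [(0, 10.0), (1, 20.0), (2, 30.0), (5, 40.0)]
averages = list(continuous_average(readings, range_s=3))
sampled = list(sample_every(range(10), 3))
```

The query never terminates on its own: it produces a new answer for as long as the stream delivers elements, which is the "continuous instead of one-time" semantics. The deque plays the role of the per-operator state listed among the plan components.
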
DSMS Implementations
Research prototypes
- Amazon/Cougar (Cornell) – sensors
- Aurora (Brown/MIT) – sensor monitoring, dataflow
- Gigascope (AT&T Labs) – network monitoring
- Hancock (AT&T) – telecom streams
- Niagara (OGI/Wisconsin) – Internet DBs & XML
- OpenCQ (Georgia) – triggers, view maintenance
- Stream (Stanford) – general-purpose DSMS
- Stream Mill (UCLA) – power & extensibility
- Tapestry (Xerox) – publish/subscribe filtering
- Telegraph (Berkeley) – adaptive engine for sensors
- Tribeca (Bellcore) – network monitoring
High-tech startups
- Streambase, Coral8, Apama, Truviso
Major DBMS vendors are all adding stream extensions as well.

Is Stream Reasoning possible?
- Is combining data streams and reasoning possible?
- Can the innovation so far confined within the DB community be leveraged in realizing a new generation of reasoners able to cope with continuous reasoning tasks?
[Figure labels: C-SPARQL 1, C-SPARQL 2; RDF Stream 1, RDF Stream 2, RDF Stream 3; State 1, State 2, State 3; join operators (⋈); Global Scheduler]

A Conceptual Architecture for Stream Reasoning 1/2
We are developing the Stream Reasoning vision with the LarKC European Research Project.
[Source: Fensel, D., van Harmelen, F.: Unifying reasoning and search to web scale. IEEE Internet Computing 11(2) (2007)]
Visit http://www.larkc.eu

A Conceptual Architecture for Stream Reasoning 2/2
[Figure: conceptual architecture; recoverable labels only]
- Steps: Retrieval, Select, Abstract, Reason, Decide
- Flows: Data Streams, Streamed Input, Sampled Streams, RDF Streams, Answers
- Problem Modeling Framework: stream data schema; sampling and filtering policy; knowledge; invariable and changing data; reasoning goal; abstraction queries; RDF streams schema; answer quality metrics; decision criteria; […]
- From problem to solution: Traffic, Events, …, Geo-Tags >> From Data to Model >> Traffic Control Actions
- Legend: data stream element, RDF stream element, configuration action, tuning action

Conceptual Architecture: concepts
- On top, the problem space is grounded in the Urban Computing scenario; below, input is provided to the "LarKC steps"
- Retrieval is intentionally left aside (nothing specific to streams)
- Selection mostly applies load-shedding techniques
  - Explicit, implicit, or learned/inferred sampling and filtering policies
- Abstraction shifts fine-grained raw data streams into "aggregate events"
  - By means of data compression techniques (histograms, wavelets)
  - By means of aggregation operators, Bloom filters, …
  - Abstraction is responsible for "lifting" raw data into RDF, typically in the form of streams of RDF data (details next)
- Reasoning depends on the notion of RDF stream
- Decisions are propagated back to the pipeline so as to tune its behavior on-line
  - Adaptive sampling, grouping/aggregation, graceful degradation, … in order to let the system keep up with the real-time requirements
- Quality metrics for answers and decision criteria are hopefully provided by application designers

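As a rough illustration of the abstraction step, the following Python sketch aggregates a batch of raw vehicle detections from one sensor and "lifts" the result into RDF-style triples. The `uc:` property names are borrowed from the RDF stream example in this talk; the function name `lift_to_rdf`, the detection labels (`"car"`, `"truck"`, `"other"`), and the use of a simple count as the aggregation operator are hypothetical choices, not the LarKC implementation.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def lift_to_rdf(sensor: str, detections: Iterable[str],
                node: str = "_:x") -> List[Triple]:
    """Aggregate raw detections into counts (the "aggregate event") and
    express them as triples hanging off one blank node."""
    counts = Counter(detections)  # missing labels count as 0
    return [
        (sensor, "uc:measure", node),
        (node, "uc:numberOfCars", str(counts["car"])),
        (node, "uc:numberOfTrucks", str(counts["truck"])),
        (node, "uc:numberOfOtherVehicles", str(counts["other"])),
    ]


raw = ["car", "truck", "car", "other", "car"]  # hypothetical raw stream batch
triples = lift_to_rdf("uc:sensor42", raw)
```

Five raw elements become four triples here; with realistic traffic volumes the reduction is far larger, which is the point of abstracting before reasoning.
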
The two key ingredients
- RDF streams: a new data format set at the confluence of conventional data streams and of the conventional atoms usually injected into reasoners
- Continuous SPARQL (C-SPARQL): its distinguishing feature is the support for continuous queries, i.e. SPARQL-like queries registered over RDF data streams in the context of a C-SPARQL execution environment and then continuously executed

What is an RDF stream?
Option 1: only a complete RDF molecule is a stream element
- Molecules are the smallest components obtained from lossless RDF graph decomposition [Ding&Al2005]
    uc:sensor42 uc:measure _:x .
    _:x uc:numberOfCars "120" .
    _:x uc:numberOfTrucks "70" .
    _:x uc:numberOfOtherVehicles "37" .
    27.9.2008-11.30.00
Option 2: every new RDF statement (triple) is a stream element
    uc:sensor42 uc:measure _:x .               27.9.2008-11.29.58
    _:x uc:numberOfCars "120" .                27.9.2008-11.29.59
    _:x uc:numberOfTrucks "70" .               27.9.2008-11.30.00
    _:x uc:numberOfOtherVehicles "37" .        27.9.2008-11.30.01
Classification of RDF statement streams: from fully bound to fully free
[Figure: the eight combinations of bound and free subject (s), predicate (p), and object (o), from all three bound to all three free]

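The difference between the two stream granularities can be made concrete with a small Python sketch. The class names, the tuple representation of triples, and the upper-case/lower-case notation for bound and free positions are assumptions for illustration; attaching the timestamp 27.9.2008-11.30.00 to the molecule is likewise an assumption about how the example is meant to be read.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass(frozen=True)
class StatementElement:
    """Statement stream: each triple carries its own timestamp."""
    triple: Triple
    timestamp: str


@dataclass(frozen=True)
class MoleculeElement:
    """Molecule stream: a whole molecule shares a single timestamp."""
    triples: Tuple[Triple, ...]
    timestamp: str


measure: Tuple[Triple, ...] = (
    ("uc:sensor42", "uc:measure", "_:x"),
    ("_:x", "uc:numberOfCars", "120"),
    ("_:x", "uc:numberOfTrucks", "70"),
    ("_:x", "uc:numberOfOtherVehicles", "37"),
)
stamps = ["27.9.2008-11.29.58", "27.9.2008-11.29.59",
          "27.9.2008-11.30.00", "27.9.2008-11.30.01"]

statement_stream = [StatementElement(t, ts) for t, ts in zip(measure, stamps)]
molecule_stream = [MoleculeElement(measure, "27.9.2008-11.30.00")]


def bound_free_kinds() -> List[str]:
    """Enumerate the 2^3 combinations of bound (upper case) and free
    (lower case) subject, predicate, object, from 'SPO' to 'spo'."""
    kinds = []
    for flags in product([True, False], repeat=3):
        kinds.append("".join(c.upper() if bound else c
                             for c, bound in zip("spo", flags)))
    return kinds


kinds = bound_free_kinds()
```

The same four triples yield four elements in the statement stream but a single element in the molecule stream, so a consumer of the molecule stream never sees a half-delivered measurement.
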
Two approaches to stream reasoning
Evolutionary approach
- Reuse of existing technology
- Based upon adapters
- Streams of RDF molecules
- Fast prototyping possible
Revolutionary approach
- Development of radically new concepts and design of revolutionary technologies
- A specific query language and time-aware reasoners
- Streams of RDF statements
- Research framework

Evolutionary approach
[Figure: evolutionary architecture; recoverable labels only]
- LarKC steps: Abstract, Reason
- Components: DSMS, Transcoder, Pre-reasoner, Reasoner
- Inputs: stream data schema, abstraction queries, aggregation operators, abstracted stream data schema, RDF molecules streams schema, knowledge, reasoning goals
- Data flow: timed streaming raw data, RDF streams, pure RDF
- Annotations: snapshot (unaware of time); a shift w.r.t. language / data model (~ lifting from data to model)

The evolutionary approach requires…
- Use of existing technologies for the DSMS and the reasoner
- Massive use of CQL for selection purposes (but it works on raw data)
- Careful, semantic stream registration
- Transcoding and pre-reasoning for extracting subsequent snapshots of information
- Incremental maintenance of snapshots
These are embedded in the abstraction and reasoning steps of LarKC.

Revolutionary approach: C-SPARQL
- Reasoning on streams with a paradigm that includes streams throughout the reasoning process
- Current focus on inventing C-SPARQL:
  - requirements
  - applicability
  - expressive power/efficiency trade-offs
  - comparison with existing work
- CQL + SPARQL: C-SPARQL
A query in C-SPARQL:
    REGISTER STREAM new_pirates COMPUTED EVERY 1 SEC AS
    CONSTRUCT { ?vehicle a uc:Pirate .
                ?vehicle uc:RecentViolations ?countViolations }
    FROM STREAM <http://uc.larkc.eu/highspeed.trdf>
         [RANGE 1 HOURS STEP 1 MIN]
    WHERE { ?vehicle uc:hasHighSpeedOn ?street .
            ?v a uc:Pirate .
            FILTER ( ?v != ?vehicle ) }
    AGGREGATES { ( ?countViolations, COUNT, ?vehicle ) .
                 FILTER ( ?countViolations > 5 ) }

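To make the windowed aggregate in the query above more tangible, here is a Python sketch of one evaluation of its core: count the high-speed observations per vehicle inside a one-hour window and keep the vehicles with more than five. This is an interpretation, not a C-SPARQL engine: the function name `new_pirates`, the `(timestamp, vehicle, street)` event layout with timestamps in seconds, and the omission of the "not already a pirate" pattern and of the STEP scheduling are all simplifying assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Event = Tuple[float, str, str]  # (timestamp in seconds, vehicle, street)


def new_pirates(events: Iterable[Event], now: float,
                range_s: float = 3600.0,
                min_count: int = 5) -> Dict[str, int]:
    """One evaluation of the continuous query at time `now`: vehicles with
    more than `min_count` high-speed observations in (now - range_s, now]."""
    window_start = now - range_s
    counts = Counter(vehicle for ts, vehicle, _street in events
                     if window_start < ts <= now)
    return {vehicle: n for vehicle, n in counts.items() if n > min_count}


# Hypothetical observations: car1 speeds seven times, car2 three times.
events = [(100, "car1", "a")]
events += [(ts, "car1", "a") for ts in (500, 1000, 1500, 2000, 2500, 3000)]
events += [(ts, "car2", "b") for ts in (600, 1200, 1800)]

at_4000 = new_pirates(events, now=4000)  # window (400, 4000]
at_4200 = new_pirates(events, now=4200)  # window (600, 4200]
```

Between the two evaluations one of car1's observations slides out of the window, so its count drops from six to five and it no longer passes the filter. A registered continuous query would repeat this evaluation as the window advances.
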
Research agenda (medium term)
- Syntax (full specification of C-SPARQL)
  - Incrementally, from existing specifications
  - Including windows, grouping, aggregates, timestamping
  - Distinguishing streams of triples vs molecules
- Semantics (formal semantics of C-SPARQL)
  - Query registration, handling overloads
  - Order of evaluation, pattern matching over time, …
- Technology (efficiency of evaluation)
  - Defining a suitable algebra
  - Efficient materialization of inferred data from streams
  - Time window-dependent KB maintenance
  - Smart indexing techniques that benefit from the FIFO semantics (timestamp-based obsolescence of inference chains)

… exploiting parallelism (long term)
- Streams are parallel in nature
  - Keep them separate as long as possible, and push sampling and filtering (selection/abstraction) near to the source
- 1st level of parallelism (inter-stream)
  - Each stream can be handled by dedicated processors
  - Each operator can be handled by a different CPU
- 2nd level of parallelism (intra-stream)
  - Windowing may occur in parallel
  - Different CPUs address different (possibly overlapping) windows over the same stream
  - Overlapping windows can be divided into disjoint panes, to efficiently compute sub-aggregates over each pane

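The pane idea in the last bullet can be sketched in Python as follows. The sketch assumes count-based windows over an in-memory list and a sum as the aggregate, and it computes the panes sequentially; in a parallel setting each pane's sub-aggregate could be produced by a different CPU. The function names and the choice of the greatest common divisor of window size and slide as the pane size are illustrative assumptions.

```python
from math import gcd
from typing import List, Sequence


def pane_sums(values: Sequence[int], pane: int) -> List[int]:
    """Sub-aggregate (sum) over each complete, disjoint pane."""
    return [sum(values[i:i + pane])
            for i in range(0, len(values) - pane + 1, pane)]


def window_sums_via_panes(values: Sequence[int],
                          window: int, slide: int) -> List[int]:
    """Sum over each overlapping window, assembled from pane sums."""
    pane = gcd(window, slide)      # largest pane that tiles both sizes
    panes = pane_sums(values, pane)
    per_window = window // pane    # panes making up one window
    step = slide // pane           # panes skipped between windows
    return [sum(panes[i:i + per_window])
            for i in range(0, len(panes) - per_window + 1, step)]


def window_sums_direct(values: Sequence[int],
                       window: int, slide: int) -> List[int]:
    """Reference: recompute every window from the raw values."""
    return [sum(values[i:i + window])
            for i in range(0, len(values) - window + 1, slide)]


values = list(range(1, 13))        # 1..12
via_panes = window_sums_via_panes(values, window=6, slide=2)
direct = window_sums_direct(values, window=6, slide=2)
```

Each raw value is summed once into its pane, and every window then combines three pane sums instead of six raw values, so the work shared by overlapping windows is not repeated.
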
Thank you for paying attention. Any questions?
A First Step Towards Stream Reasoning  Lecturer:   Emanuele Della Valle [email_address] http://emanueledellavalle.org   Authors: Emanuele Della Valle, Stefano Ceri, Davide F. Barbieri,  Daniele Braga and Alessandro Campi

More Related Content

PPTX
SEMANCO - Integrating multiple data sources, domains and tools in urban ener...
PDF
Sustainable Places 2015 - The OPTIMUS project
PPTX
IR tutorial
PDF
Digital Business Engineering: Findings from the Install4Schenker case
PDF
Francesca Froy "What is the role of spatial configuration and urban morpholog...
PDF
Streaming Day - an overview of Stream Reasoning
PDF
Software Project Fundamentals and Classic Mistakes - P&MSP2010 (1/11)
PPT
IC2009 Introduzione all'ingegneria della conoscenza
SEMANCO - Integrating multiple data sources, domains and tools in urban ener...
Sustainable Places 2015 - The OPTIMUS project
IR tutorial
Digital Business Engineering: Findings from the Install4Schenker case
Francesca Froy "What is the role of spatial configuration and urban morpholog...
Streaming Day - an overview of Stream Reasoning
Software Project Fundamentals and Classic Mistakes - P&MSP2010 (1/11)
IC2009 Introduzione all'ingegneria della conoscenza

Similar to A First Step Towards Stream Reasoning at FIS 2008 (20)

PPT
Challenges, Approaches, and Solutions in Stream Reasoning
PPTX
It's a Streaming World! Reasoning upon Rapidly Changing Information (Milano, ...
PPT
Incremental Reasoning on Streams and Rich Background Knowledge
PDF
Stream Reasoning - where we got so far 2011.1.18 Oxford Key Note
PPT
Challenging LarKC with Urban Computing
PDF
Stream reasoning: an approach to tame the velocity and variety dimensions of ...
PPTX
Listening to the pulse of our cities with Stream Reasoning (and few more tech...
PPT
Stream Reasoning: State of the Art and Beyond
PPT
Stream Reasoning : Where We Got So Far
PDF
Formal Models for Context Aware Computing
PPT
Smart Cities: How are they different?
PPT
Stream Reasoning: Where we got so far. Oxford 2010.1.18
PDF
Building Social Life Networks 130818
PDF
Stream Processing Environmental Applications in Jordan Valley
PDF
Myths and challenges in knowledge extraction and analysis from human-generate...
PDF
A survey on context aware system & intelligent Middleware’s
PDF
Mining Stream Data using k-Means clustering Algorithm
PPT
What makes smart cities “Smart”?
PPTX
On Stream Reasoning
PPTX
Listening to the pulse of our cities fusing Social Media Streams and Call Dat...
Challenges, Approaches, and Solutions in Stream Reasoning
It's a Streaming World! Reasoning upon Rapidly Changing Information (Milano, ...
Incremental Reasoning on Streams and Rich Background Knowledge
Stream Reasoning - where we got so far 2011.1.18 Oxford Key Note
Challenging LarKC with Urban Computing
Stream reasoning: an approach to tame the velocity and variety dimensions of ...
Listening to the pulse of our cities with Stream Reasoning (and few more tech...
Stream Reasoning: State of the Art and Beyond
Stream Reasoning : Where We Got So Far
Formal Models for Context Aware Computing
Smart Cities: How are they different?
Stream Reasoning: Where we got so far. Oxford 2010.1.18
Building Social Life Networks 130818
Stream Processing Environmental Applications in Jordan Valley
Myths and challenges in knowledge extraction and analysis from human-generate...
A survey on context aware system & intelligent Middleware’s
Mining Stream Data using k-Means clustering Algorithm
What makes smart cities “Smart”?
On Stream Reasoning
Listening to the pulse of our cities fusing Social Media Streams and Call Dat...
Ad

More from Emanuele Della Valle (20)

PDF
Taming velocity - a tale of four streams
PDF
Stream reasoning
PPTX
Work in progress on Inductive Stream Reasoning
PPTX
Big Data and Data Science W's
PPT
Knowledge graphs in search engines
PPTX
La città dei balocchi 2017 in numeri - Fluxedo
PPTX
Stream Reasoning: a summary of ten years of research and a vision for the nex...
PPTX
ACQUA: Approximate Continuous Query Answering over Streams and Dynamic Linked...
PDF
Big Data: how to use it to create value
PPT
Ist16-04 An introduction to RDF
PPT
Ist16-03 An Introduction to the Semantic Web
PPT
Ist16-02 HL7 from v2 (syntax) to v3 (semantics)
PPT
IST16-01 - Introduction to Interoperability and Semantic Technologies
PDF
Stream reasoning: mastering the velocity and the variety dimensions of Big Da...
PPTX
Listening to the pulse of our cities fusing Social Media Streams and Call Dat...
PPTX
Social listener-brera-design-district-2015-03
PDF
City Data Fusion for Event Management (in Italiano)
PDF
Semantic technologies and Interoperability
PDF
Big data: why, what, paradigm shifts enabled , tools and market landscape
PPTX
City Data Fusion and City Sensing presented at EIT ICT Labs for EXPO 2015
Taming velocity - a tale of four streams
Stream reasoning
Work in progress on Inductive Stream Reasoning
Big Data and Data Science W's
Knowledge graphs in search engines
La città dei balocchi 2017 in numeri - Fluxedo
Stream Reasoning: a summary of ten years of research and a vision for the nex...
ACQUA: Approximate Continuous Query Answering over Streams and Dynamic Linked...
Big Data: how to use it to create value
Ist16-04 An introduction to RDF
Ist16-03 An Introduction to the Semantic Web
Ist16-02 HL7 from v2 (syntax) to v3 (semantics)
IST16-01 - Introduction to Interoperability and Semantic Technologies
Stream reasoning: mastering the velocity and the variety dimensions of Big Da...
Listening to the pulse of our cities fusing Social Media Streams and Call Dat...
Social listener-brera-design-district-2015-03
City Data Fusion for Event Management (in Italiano)
Semantic technologies and Interoperability
Big data: why, what, paradigm shifts enabled , tools and market landscape
City Data Fusion and City Sensing presented at EIT ICT Labs for EXPO 2015
Ad

Recently uploaded (20)

PDF
Electronic commerce courselecture one. Pdf
PPTX
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
PDF
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
PDF
Encapsulation theory and applications.pdf
PDF
Empathic Computing: Creating Shared Understanding
PDF
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows
PDF
Advanced methodologies resolving dimensionality complications for autism neur...
PDF
Chapter 3 Spatial Domain Image Processing.pdf
PDF
Reach Out and Touch Someone: Haptics and Empathic Computing
PDF
Profit Center Accounting in SAP S/4HANA, S4F28 Col11
PDF
Agricultural_Statistics_at_a_Glance_2022_0.pdf
PPT
Teaching material agriculture food technology
PPTX
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
PDF
NewMind AI Weekly Chronicles - August'25-Week II
PDF
gpt5_lecture_notes_comprehensive_20250812015547.pdf
PDF
MIND Revenue Release Quarter 2 2025 Press Release
PDF
Network Security Unit 5.pdf for BCA BBA.
PDF
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
PDF
Dropbox Q2 2025 Financial Results & Investor Presentation
PDF
Approach and Philosophy of On baking technology
Electronic commerce courselecture one. Pdf
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
Encapsulation theory and applications.pdf
Empathic Computing: Creating Shared Understanding
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows
Advanced methodologies resolving dimensionality complications for autism neur...
Chapter 3 Spatial Domain Image Processing.pdf
Reach Out and Touch Someone: Haptics and Empathic Computing
Profit Center Accounting in SAP S/4HANA, S4F28 Col11
Agricultural_Statistics_at_a_Glance_2022_0.pdf
Teaching material agriculture food technology
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
NewMind AI Weekly Chronicles - August'25-Week II
gpt5_lecture_notes_comprehensive_20250812015547.pdf
MIND Revenue Release Quarter 2 2025 Press Release
Network Security Unit 5.pdf for BCA BBA.
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
Dropbox Q2 2025 Financial Results & Investor Presentation
Approach and Philosophy of On baking technology

A First Step Towards Stream Reasoning at FIS 2008

  • 1. A First Step Towards Stream Reasoning Presenter: Emanuele Della Valle [email_address] http://emanueledellavalle.org Authors: Emanuele Della Valle, Stefano Ceri, Davide F. Barbieri, Daniele Braga and Alessandro Campi
  • 2. Agenda Introduction Motivation Urban Computing Problem definition Knowledge and data can change over time Data Stream Management Systems Is Stream Reasoning possible? A Conceptual Architecture for Stream Reasoning The LarKC Project A conceptual architecture RDF streams C-SPARQL Two approaches to stream reasoning Evolutionary Revolutionary Conclusions Research agenda - medium term Research agenda - long term Vienna,28.9.2008 - FIS 2008
  • 3. Introduction While reasoners are year after year scaling up in the classical, time invariant domain of ontological knowledge, reasoning upon rapidly changing information has been neglected or forgotten . Data streams are unbounded sequences of time-varying data elements ; They occur in a variety of modern applications , such as network monitoring, traffic engineering, sensor networks, RFID tags applications, telecom call records, financial applications, Web logs, click-streams, etc. processing of data streams has been largely investigated and specialized Stream Database Management Systems exist. The combination of reasoning techniques with data streams gives rise to Stream Reasoning , an unexplored, yet high impact, research area . Vienna,28.9.2008 - FIS 2008
  • 4. Motivation To understand the potential impact of Stream Reasoning, we can consider the emblematic case of Urban Computing The integration of computing, sensing, and actuation technologies into everyday urban settings and lifestyles. Pervasive computing has been has been applied either in relatively homogeneous rural areas , where researchers have added sensors in places such as forests, vineyards, and glaciers or, on the other hand, in small-scale , well-defined patches of the built environment such as smart houses or rooms . Urban settings include, for example, streets, squares, pubs, shops, buses, and cafés - any space in the semipublic realms of our towns and cities. are challenging for experimentation and deployment, and they remain little explore d [source IEEE Pervasive Computing,July-September 2007 (Vol. 6, No. 3) ] Vienna,28.9.2008 - FIS 2008
  • 5. Availability of Data Some years ago , due to the lack of data, Urban Computing would have looked like a Sci-Fi idea . Nowadays , a large amount of the required information can be made available on the Internet at almost no cost: maps with the commercial activities and meeting places, events scheduled in the city and their locations, average speed in highways, but also normal streets positions and speed of public transportation vehicles parking availabilities in specific parking areas, and so on. We are running a survey (please contribute), see http://wiki.larkc.eu/UrbanComputing/ShowUsABetterWay http://wiki.larkc.eu/UrbanComputing/OtherDataSources Vienna,28.9.2008 - FIS 2008
  • 6. A challenge for Stream Reasoning Looking for parking lots in large cities may cost up to 40% of the daily fuel consumption. Problems dramatically increase when big events, involving lots of people, take place A typical Urban Computing problem is to help citizens willing to participate to such events in finding a parking lot and reaching the event locations in time, while globally limiting the occurrences of traffic congestions . Current technologies are not up to this challenge, because it requires combining a huge amount of static knowledge about the city with an even larger set of data streams reasoning in realtime above the resulting time-varying knowledge. Vienna,28.9.2008 - FIS 2008 [source: http://gizmodo.com/photogallery/trafficsky/1003143552 ]
  • 7. Problem definition Knowledge and data can change over the time . For instance, in Urban Computing names of streets, landmarks, kind of events, etc. change very slowly, whereas the number of cars that go through a traffic detector in five minutes changes very fast. This means that the system must have the notion of ' 'observation period '', defined as the period when the system is subject to querying . Moreover the system, within a given observation period , must consider the following four different types of knowledge and data : Invariable knowledge (a design constrain that we’d like to relax in future) Invariable data Periodically changing data that change according to a temporal law that can be Event driven changing data that are updated as a consequence of some external event. Vienna,28.9.2008 - FIS 2008
  • 8. Invariable knowledge and data Invariable knowledge it includes obvious terminological knowledge such as an address is made up by a street name, a civic number, a city name and a ZIP code less obvious nomological knowledge that describes how the world is expected to be traffic lights are switched off during the night to evolve traffic jams appears when important sport events take place Invariable data do not change in the observation period, e.g. the names and lengths of the roads. Vienna,28.9.2008 - FIS 2008 In the observation period !
  • 9. Changing data Periodically changing data change according to a temporal law that can be Pure periodic law , e.g. every night at 10pm all Milano overpasses close. Probabilistic law , e.g. traffic jam appear in the west side of Milano due to bad weather or when San Siro stadium hosts a soccer match. Event driven changing data are updated as a consequence of some external event. They can be further characterized by the mean time between changes : Slow , e.g. roads closed for scheduled works Medium , e.g. roads closed for accidents or congestion due to traffic Fast , e.g. the intensity of traffic for each street in a city Vienna,28.9.2008 - FIS 2008
  • 10. Data Stream Management Systems When a continuous query is registered , generate a query execution plan New plan merged with existing plans Plans composed of three main components: Operators Queues (input and inter-operator) State (windows, operators requiring history) Global scheduler for plan execution maximizing experience gathered with previous queries. Vienna,28.9.2008 - FIS 2008 Streams : continuous instead of one-time semantics Selecting by sliding Windows on streams Selecting by sampling on streams Background: stream database key concepts TIME Query Execution STREAMS
  • 11. DSMS Implementations Research Prototypes Amazon/Cougar (Cornell) – sensors Aurora (Brown/MIT) – sensor monitoring, dataflow Gigascope: AT&T Labs – Network Monitoring Hancock (AT&T) – Telecom streams Niagara (OGI/Wisconsin) – Internet DBs & XML OpenCQ (Georgia) – triggers, view maintenance Stream (Stanford) – general-purpose DSMS Stream Mill (UCLA) - power & extensibility Tapestry (Xerox) – pubish/subscribe filtering Telegraph (Berkeley) – adaptive engine for sensors Tribeca (Bellcore) – network monitoring High-tech startups Streambase, Coral8, Apama, Truviso Major DBMS vendors are all adding stream extensions as well Vienna,28.9.2008 - FIS 2008
  • 12. Is Stream Reasoning possible ? Is combining data stream and reasoning possible? Can the innovation so far conned within the DB community be leveraged in realizing a new generation of reasoners able to cope with continuous reasoning tasks ? C-SPARQL 1 C-SPARQL 2 State 3 ⋈  RDF Stream 1 RDF Stream 2 RDF Stream 3 State 1 State 2 ⋈ Global Scheduler Vienna,28.9.2008 - FIS 2008
  • 13. A Conceptual Architecture for Stream Reasoning 1/2 We are developing the Stream Reasoning vision with the LarKC European Re search Project [ Source: Fensel, D., van Harmelen, F.: Unifying reasoning and search to web scale. IEEE Internet Computing 11(2) (2007)] Visit http://www.larkc .eu ! Vienna,28.9.2008 - FIS 2008
  • 14. A Conceptual Architecture for Stream Reasoning 2/2 Decide Select Abstract Reason Streamed Input Sampled Streams RDF Streams Answers […] Problem Modelining Framework Stream data schema Sampling and filtering policy Knowledge Invariable and changing data Reasoning goal Stream data schema Abstraction queries RDF streams schema […] Traffic, Events, …, Geo-Tags >> From Data to Model >> Traffic Control Actions Answer quality metrics Decision Criteria Data Streams Retrieval Streams PROBLEM SOLUTION Vienna,28.9.2008 - FIS 2008 data stream element RDF stream element configuration action tuning action
  • 15. Conceptual Architecture: concepts On top, the problem space is grounded in the Urban Computing scenario, below, input is provided to the &quot;LarKC steps&quot; Retrieval is intentionally left aside (nothing specific to streams) Selection mostly applies load-shedding techniques Explicit, implicit, or learned/inferred sampling and filtering policies Abstraction shifts fine grain raw data streams into &quot;aggregate events&quot; By means of data compression techniques (histograms, wavelets) By means of aggregation operators, Bloom filters, … Abstraction is responsible for &quot;lifting&quot; raw data into RDF typically in the form of streams of RDF data (details next) Reasoning depends on the notion of RDF stream Decisions are propagated back to the pipeline so as to on-line tune the behavior Adaptive sampling, grouping/aggregation, graceful degradation, … in order to let the system continue to catch up with the real-time requirements Quality metrics for answers and decision criteria are hopefully provided by application designers Vienna,28.9.2008 - FIS 2008
  • 16. The two key ingredients. RDF streams: new data formats set at the confluence of conventional data streams and of the conventional atoms usually injected into reasoners. Continuous SPARQL (C-SPARQL): the distinguishing feature of C-SPARQL is the support for continuous queries, i.e. SPARQL-like queries that are registered over RDF data streams in the context of a C-SPARQL execution environment and then continuously executed.
  • 17. What is an RDF stream? Only a complete RDF molecule is a stream element. Molecules are the smallest components obtained from lossless RDF graph decomposition [Ding&Al2005]:
      uc:sensor42 uc:measure _:x .
      _:x uc:numberOfCars "120" .
      _:x uc:numberOfTrucks "70" .
      _:x uc:numberOfOtherVehicles "37" .
    Every new RDF statement (triple) is a stream element:
      uc:sensor42 uc:measure _:x .              27.9.2008-11.29.58
      _:x uc:numberOfCars "120" .               27.9.2008-11.29.59
      _:x uc:numberOfTrucks "70" .              27.9.2008-11.30.00
      _:x uc:numberOfOtherVehicles "37" .       27.9.2008-11.30.01
    Classification of RDF statement streams: from fully bound to fully free, according to which of subject, predicate and object are bound and which are free. [Figure: the eight bound/free combinations of s, p and o, from "spo bound" to "spo free"; timestamp shown: 27.9.2008-11.30.00]
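The two kinds of stream element on this slide can be sketched as plain Python data structures. The class names, the integer timestamps and the rule that a molecule carries the latest timestamp of its statements are assumptions for illustration; the slides do not define them.

```python
from dataclasses import dataclass
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass(frozen=True)
class StatementElement:
    """One timestamped triple: the element of a stream of RDF statements."""
    triple: Triple
    timestamp: int  # seconds; a hypothetical encoding


@dataclass(frozen=True)
class MoleculeElement:
    """A whole molecule with a single timestamp: the element of a molecule stream."""
    triples: Tuple[Triple, ...]
    timestamp: int


def as_molecule(statements: List[StatementElement]) -> MoleculeElement:
    """Bundle statements into one molecule, stamped with the latest timestamp (assumption)."""
    return MoleculeElement(
        triples=tuple(s.triple for s in statements),
        timestamp=max(s.timestamp for s in statements),
    )


if __name__ == "__main__":
    statements = [
        StatementElement(("uc:sensor42", "uc:measure", "_:x"), 0),
        StatementElement(("_:x", "uc:numberOfCars", "120"), 1),
        StatementElement(("_:x", "uc:numberOfTrucks", "70"), 2),
        StatementElement(("_:x", "uc:numberOfOtherVehicles", "37"), 3),
    ]
    print(as_molecule(statements))
```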
  • 18. Two approaches to stream reasoning. Evolutionary approach: reuse of existing technology; based upon adapters; streams of RDF molecules; fast prototyping possible. Revolutionary approach: development of radically new concepts and design of revolutionary technologies; a specific query language and time-aware reasoners; streams of RDF statements; research framework.
  • 19. Two approaches to stream reasoning Evolutionary approach Reuse of existing technology Based upon adapters Streams of RDF molecules Fast prototyping possible Revolutionary approach Development of radically new concepts and design of revolutionary technologies A specific query language and time-aware reasoners Streams of RDF statements Research framework Vienna,28.9.2008 - FIS 2008
  • 20. Evolutionary approach [Figure: architecture of the evolutionary approach. Steps: Abstract, Reason. Components: DSMS, Transcoder, Pre-reasoner, Reasoner, Snapshot (unaware of time). Data flow: timed streaming raw data, RDF streams, pure RDF. Inputs: stream data schema, abstraction queries, aggregation operators, abstracted stream data schema, RDF molecules streams schema, knowledge, reasoning goals. Annotation: a shift w.r.t. language / data model (~ lifting from data to model).]
  • 21. The evolutionary approach requires… Use of existing technologies for the DSMS and the reasoner. Massive use of CQL for selection purposes (but it works on raw data). Careful, semantic stream registration. Transcoding and pre-reasoning for extracting subsequent snapshots of information. Incremental maintenance of snapshots. These are embedded in the abstraction & reasoning steps of LarKC.
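Incremental maintenance of snapshots can be pictured with a small Python sketch: new triples are appended, expired ones are dropped from the front, and a time-unaware reasoner only ever sees the current set of triples. The `Snapshot` class, its method names and the assumption that timestamps arrive in non-decreasing order are hypothetical and not taken from the slides.

```python
from collections import deque


class Snapshot:
    """Keeps only the triples seen during the last range_seconds (hypothetical helper)."""

    def __init__(self, range_seconds):
        self.range_seconds = range_seconds
        self._queue = deque()  # (timestamp, triple), oldest first

    def add(self, triple, timestamp):
        """Add a new triple, then drop everything that has fallen out of the range."""
        self._queue.append((timestamp, triple))
        self._expire(timestamp)

    def _expire(self, now):
        # FIFO expiry: valid only if timestamps arrive in non-decreasing order.
        while self._queue and self._queue[0][0] <= now - self.range_seconds:
            self._queue.popleft()

    def triples(self):
        """Time-free view of the snapshot, as a reasoner unaware of time would see it."""
        return {triple for _, triple in self._queue}


if __name__ == "__main__":
    snapshot = Snapshot(range_seconds=10)
    snapshot.add(("uc:car1", "uc:hasHighSpeedOn", "uc:street1"), 0)
    snapshot.add(("uc:car2", "uc:hasHighSpeedOn", "uc:street1"), 5)
    snapshot.add(("uc:car3", "uc:hasHighSpeedOn", "uc:street2"), 12)
    print(snapshot.triples())
```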
  • 22. Two approaches to stream reasoning Evolutionary approach Reuse of existing technology Based upon adapters Streams of RDF molecules Fast prototyping possible Revolutionary approach Development of radically new concepts and design of revolutionary technologies A specific query language and time-aware reasoners Streams of RDF statements Research framework Vienna,28.9.2008 - FIS 2008
  • 23. Revolutionary approach: C-SPARQL. Reasoning on streams with a paradigm that includes streams throughout the reasoning process. Current focus on inventing C-SPARQL: requirements, applicability, expressive power/efficiency trade-offs, comparison with existing work. A query in C-SPARQL:
      REGISTER STREAM new_pirates COMPUTED EVERY 1 SEC AS
      CONSTRUCT { ?vehicle a uc:Pirate . ?vehicle uc:RecentViolations ?countViolations }
      FROM STREAM <http://uc.larkc.eu/highspeed.trdf> [ RANGE 1 HOURS STEP 1 MIN ]
      WHERE { ?vehicle uc:hasHighSpeedOn ?street . ?v a uc:Pirate . FILTER ( ?v != ?vehicle ) }
      AGGREGATES { ( ?countViolations , COUNT , ?vehicle ) . FILTER ( ?countViolations > 5 ) }
    [Figure: CQL + SPARQL, combined into C-SPARQL]
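To make the intent of the windowed aggregate concrete, here is a rough Python sketch of one evaluation of such a query: count the high-speed events per vehicle over the last hour and keep the vehicles with more than five. It is only an approximation written for illustration; it does not implement C-SPARQL semantics, it ignores the `?v a uc:Pirate` pattern, and the function name and the `(timestamp, vehicle)` event format are assumptions.

```python
from collections import Counter


def pirates_in_window(events, now, range_seconds=3600, threshold=5):
    """Return {vehicle: violations} for vehicles exceeding threshold within the window.

    events: iterable of (timestamp_in_seconds, vehicle) pairs (hypothetical format).
    The window covers the interval (now - range_seconds, now].
    """
    counts = Counter(
        vehicle
        for timestamp, vehicle in events
        if now - range_seconds < timestamp <= now
    )
    return {vehicle: n for vehicle, n in counts.items() if n > threshold}


if __name__ == "__main__":
    events = [(t, "uc:car1") for t in range(100, 700, 100)]   # 6 events in range
    events += [(t, "uc:car2") for t in range(100, 600, 100)]  # 5 events in range
    events.append((-5000, "uc:car2"))                         # too old, ignored
    print(pirates_in_window(events, now=1000))
```

A continuous execution environment would re-run this evaluation as the window advances, rather than calling it once.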
  • 24. Research agenda (medium term). Syntax (full specification of C-SPARQL): incrementally, from existing specifications; including windows, grouping, aggregates, timestamping; distinguishing streams of triples vs. molecules. Semantics (formal semantics of C-SPARQL): query registration, handling overloads; order of evaluation, pattern matching over time, … Technology (efficiency of evaluation): defining a suitable algebra; efficient materialization of inferred data from streams; time window-dependent KB maintenance; smart indexing techniques that benefit from the FIFO semantics (timestamp-based obsolescence of inference chains).
  • 25. … exploiting parallelism (long term). Streams are parallel in nature: keep them separate as long as possible, and push sampling and filtering (selection/abstraction) near to the source. 1st level of parallelism (inter-stream): each stream can be handled by dedicated processors; each operator can be handled by a different CPU. 2nd level of parallelism (intra-stream): windowing may occur in parallel; different CPUs address different (possibly overlapping) windows over the same stream; overlapping windows can be divided into disjoint panes, to efficiently compute sub-aggregates over each pane.
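The pane idea in the last point can be sketched in Python for a simple COUNT aggregate. Overlapping windows of length `window` that advance by `slide` are split into disjoint panes of length `gcd(window, slide)`; each pane is counted once and every window result is the sum of its panes. The function names and the integer timestamps are assumptions, and the sketch is sequential: in a parallel setting each pane count could be computed on a different CPU.

```python
from collections import defaultdict
from math import gcd


def pane_counts(timestamps, pane_size):
    """Count the events falling into each disjoint pane [k * pane_size, (k + 1) * pane_size)."""
    counts = defaultdict(int)
    for ts in timestamps:
        counts[ts // pane_size] += 1
    return counts


def window_counts(timestamps, window, slide, horizon):
    """Return {window_start: count} for every window [start, start + window) within the horizon."""
    pane = gcd(window, slide)          # largest pane that tiles every window exactly
    panes = pane_counts(timestamps, pane)
    panes_per_window = window // pane
    results = {}
    for start in range(0, horizon - window + 1, slide):
        first = start // pane
        # Each window result reuses the shared per-pane sub-aggregates.
        results[start] = sum(panes.get(first + k, 0) for k in range(panes_per_window))
    return results


if __name__ == "__main__":
    print(window_counts([0, 1, 2, 5, 7, 9], window=4, slide=2, horizon=10))
```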
  • 26. Thank you for paying attention. Any questions?
  • 27. A First Step Towards Stream Reasoning Lecturer: Emanuele Della Valle [email_address] http://emanueledellavalle.org Authors: Emanuele Della Valle, Stefano Ceri, Davide F. Barbieri, Daniele Braga and Alessandro Campi
