Distributed Stream Consistency Checking
Shen Gao, Daniele Dell’Aglio, Jeff Z. Pan and Abraham Bernstein
Cáceres, Spain, 08.06.2018
Carlo Bernaschina (presenter)

Problem setting
ICWE, 08.06.2018
• Real-time processing of huge volumes of dynamic data
  - Smart cities
  - News
  - Knowledge graph

The problem of noise
• Streaming data are often noisy
  - Broken sensors
  - Malicious data injection
  - Measurement errors
• How to cope with noise?
  - Machine learning and numerical analyses cope with noise in time series
  - When streams are complex (as Web streams are), we want to ensure that they comply with a (non-trivial) conceptual model

Research question
How to assess the consistency of streams w.r.t. a fixed conceptual model that is known a priori?

Towards a solution
• How to model the stream consistency check problem?

How to model the conceptual model?
• DL-Lite_core
• The set of PIs and NIs composes a TBox T
[Figure: example TBox. Positive inclusions (PI) arrange Person, Student, Employee, Faculty, Admin and PhD student into a class hierarchy; a negative inclusion (NI) declares Person and Organization disjoint (DJ).]
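A minimal sketch of how such a TBox could be held in memory. It is restricted to atomic classes (DL-Lite_core also allows role expressions, which are left out here), and the exact placement of PhD student under Student is an assumption made for illustration; the `TBox` class and its method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TBox:
    """A tiny TBox over atomic classes only (an assumption of this sketch)."""
    positive: frozenset  # pairs (sub, sup): sub is included in sup (PI)
    negative: frozenset  # pairs (a, b): a and b are disjoint (NI)

    def superclasses(self, cls):
        """Return cls plus every class reachable through positive inclusions."""
        seen = {cls}
        stack = [cls]
        while stack:
            current = stack.pop()
            for sub, sup in self.positive:
                if sub == current and sup not in seen:
                    seen.add(sup)
                    stack.append(sup)
        return seen


# Hypothetical hierarchy: PhD student under Student is assumed, not confirmed.
tbox = TBox(
    positive=frozenset({
        ("Student", "Person"),
        ("Employee", "Person"),
        ("PhD student", "Student"),
    }),
    negative=frozenset({("Person", "Organization")}),
)

print(sorted(tbox.superclasses("PhD student")))
# ['Person', 'PhD student', 'Student']
```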

How to model the data?
• ABox axioms associate:
  - Individuals to classes
    - Shen is a PhD student
    - University of Zurich is a University
  - Individuals to other individuals
    - Shen attends the University of Zurich
• Inconsistencies arise when the ontology (TBox + ABox) contains contradictions
  - Daniele is a PhD student
  - Daniele is a University
  - PhD student and University are disjoint
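The contradiction in this example can be detected with a few lines of code. The sketch below assumes that only class assertions matter and that the disjointness between PhD student and University is given directly; `find_clashes` is a hypothetical helper, not part of any library.

```python
def find_clashes(class_assertions, disjoint_pairs):
    """Return (individual, class_a, class_b) for every violated disjointness."""
    classes_of = {}
    for individual, cls in class_assertions:
        classes_of.setdefault(individual, set()).add(cls)

    clashes = []
    for individual in sorted(classes_of):
        for a, b in sorted(disjoint_pairs):
            if a in classes_of[individual] and b in classes_of[individual]:
                clashes.append((individual, a, b))
    return clashes


abox = [
    ("Shen", "PhD student"),
    ("University of Zurich", "University"),
    ("Daniele", "PhD student"),
    ("Daniele", "University"),
]
disjoint = {("PhD student", "University")}

print(find_clashes(abox, disjoint))
# [('Daniele', 'PhD student', 'University')]
```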

How to model the data stream?
• Ontology stream
  - One static TBox
  - A sequence of time-annotated ABoxes with the updates
• Sliding window over the ontology stream
  - Captures a recent set of events
[Figure: timeline t with ABoxes A1, A3 and A5 at time instants 1, 3 and 5, holding class assertions about Shen, Jeff, Daniele and Avi (PhD student, Employee, Student), next to the static TBox (Person, Student, Employee, Faculty, Admin, PhD student; Organization, University, High school; DJ).]

The stream consistency check problem
• Given an ontology stream, we want to check whether it is consistent w.r.t. a sliding window of a fixed size
• At each time instant, we want to check whether the events captured by the sliding window are consistent
• The TBox and the current window content compose an ontology
[Figure: timeline t with ABoxes A1, A3 and A5 at time instants 1, 3 and 5 and the static TBox. The ABoxes assert, among others, that Jeff is a University and that Jeff is a PhD student.]
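A minimal sketch of the check over a sliding window. Several things are assumptions here: the assignment of assertions to A1, A3 and A5, the time-based window semantics (an ABox stays in the window for `window_size` time units), and the direct disjointness between PhD student and University. `check_stream` is a hypothetical name.

```python
from collections import deque


def check_stream(stream, disjoint_pairs, window_size):
    """Yield (time, consistent) for each time-annotated ABox in the stream.

    stream: iterable of (time, [(individual, class), ...]) in time order.
    The window keeps ABoxes whose time is greater than time - window_size.
    """
    window = deque()
    for time, abox in stream:
        window.append((time, abox))
        # Evict ABoxes that fell out of the window.
        while window and window[0][0] <= time - window_size:
            window.popleft()

        # The TBox plus the current window content form the ontology to check.
        classes_of = {}
        for _, assertions in window:
            for individual, cls in assertions:
                classes_of.setdefault(individual, set()).add(cls)

        consistent = not any(
            a in classes and b in classes
            for classes in classes_of.values()
            for a, b in disjoint_pairs
        )
        yield time, consistent


stream = [
    (1, [("Shen", "PhD student")]),
    (3, [("Jeff", "University"), ("Daniele", "Student")]),
    (5, [("Jeff", "PhD student")]),
]
disjoint = {("PhD student", "University")}

print(list(check_stream(stream, disjoint, window_size=3)))
# [(1, True), (3, True), (5, False)]
```

With `window_size=2` the two assertions about Jeff never share a window, so every instant is reported as consistent.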

Towards a solution
• How to model the stream consistency check problem?
  - Description logics, ontology streams
• How to cope with a huge amount of streaming data?

Scalability
How to cope with the problem when the data volume is large?
• Sliding windows
  - The content of the window may still be too large to be processed online
• Distribution of the stream consistency checking process
  - We build our solution on top of a Distributed Stream Processing Engine (DSPE)
  - We adopt the Storm terminology to introduce the main concepts, but they are common to other DSPEs

DSPE concepts
[Figure: a logical topology in which a spout S feeds bolts B1 and B2 with tuples, and the corresponding physical topology, in which spout and bolt instances are deployed on Node 1, Node 2 and Node 3.]

Towards a solution
• How to model the stream consistency check problem?
  - Description logics, ontology streams
• How to cope with a huge amount of streaming data?
  - Distributed stream processing engines
• How to perform stream consistency checking over DSPE?

The NI closure
• Given the TBox T, it is possible to compute all the Negative Inclusion axioms it entails
• The set of all these NI axioms is named the NI closure
[Figure: TBox with Person, Student, Employee, Faculty, Admin and PhD student on one side, Organization, University and Company on the other, and a disjointness (DJ) between Person and Organization.]
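For atomic classes, the closure can be sketched by pushing every stated NI down to all subclasses of its two sides. This is a simplification of the full DL-Lite closure rules (roles are ignored), and the small hierarchy below, with Student and Employee under Person and University and Company under Organization, is chosen for illustration; `subclasses` and `ni_closure` are hypothetical helpers.

```python
def subclasses(cls, positive):
    """Return cls plus every class that reaches cls through positive inclusions."""
    seen = {cls}
    stack = [cls]
    while stack:
        current = stack.pop()
        for sub, sup in positive:
            if sup == current and sub not in seen:
                seen.add(sub)
                stack.append(sub)
    return seen


def ni_closure(positive, negative):
    """Return all disjointness pairs entailed by the stated NIs and PIs."""
    closure = set()
    for a, b in negative:
        for sub_a in subclasses(a, positive):
            for sub_b in subclasses(b, positive):
                # Disjointness is symmetric, so store unordered pairs.
                closure.add(frozenset((sub_a, sub_b)))
    return closure


positive = {
    ("Student", "Person"),
    ("Employee", "Person"),
    ("University", "Organization"),
    ("Company", "Organization"),
}
negative = {("Person", "Organization")}

closure = ni_closure(positive, negative)
print(len(closure))  # 9
```

One stated NI already expands to nine entailed ones here: three classes on the Person side times three on the Organization side.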

The NIs Topology Method (NTM)
• The resulting topology is the following
• A bolt checks whether the disjointness axioms in the NI closure hold
• Each axiom is encoded as a conjunction operation
[Figure: topology S -> B1. Inside B1, conjunction o1 joins "Daniele is a Person" with "Daniele is a University" and reports an inconsistency; conjunction o2 joins "Shen is a Company" with "Shen is a Student" and reports an inconsistency.]
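The conjunction operations can be sketched as a stateful check inside one bolt. The class below is a plain-Python stand-in, not Storm or Heron API, and it ignores window eviction; `NIBolt` and `process` are hypothetical names.

```python
class NIBolt:
    """Hypothetical bolt: one conjunction per NI in the closure."""

    def __init__(self, ni_closure):
        self.ni_closure = [tuple(pair) for pair in ni_closure]
        self.classes_of = {}  # individual -> classes seen so far

    def process(self, individual, cls):
        """Consume one tuple and return the NIs that it violates."""
        known = self.classes_of.setdefault(individual, set())
        known.add(cls)
        # A conjunction fires when both sides of an NI hold for the individual.
        return [
            (a, b)
            for a, b in self.ni_closure
            if cls in (a, b) and a in known and b in known
        ]


bolt = NIBolt([("Person", "University"), ("Student", "Company")])
print(bolt.process("Daniele", "Person"))      # []
print(bolt.process("Daniele", "University"))  # [('Person', 'University')]
print(bolt.process("Shen", "Company"))        # []
print(bolt.process("Shen", "Student"))        # [('Student', 'Company')]
```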

Improving NTM
• Drawback of NTM
  - The size of the NI closure can be exponential in the size of the TBox
  - The bolt B1 becomes the bottleneck of the topology
• Introduction of inference operations to reduce the number of conjunction operations
[Figure: topology S -> B1 with an inference operation o that derives "Daniele is a Person" from "Daniele is a Student".]
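
Improving NTM - intuition
[Figure: TBox with Student and Employee under Person, University and Company under Organization, and a disjointness (DJ) involving Organization. Checking the NI closure directly takes 9 NIs; once Student -> Person, Employee -> Person, Company -> Organization and University -> Organization are inferred, 1 NI is enough. Topologies shown: S -> B1 and S -> B1 -> B2.]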
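The intuition can be sketched as two stages: an inference step that lifts each asserted class to its superclasses, followed by a check against a single NI. The split into `infer` and `make_checker`, the single-parent hierarchy and the choice of DJ(Person, Organization) as the remaining NI are assumptions made for illustration.

```python
# Positive inclusions as a sub -> sup map (single parent per class assumed).
POSITIVE = {
    "Student": "Person",
    "Employee": "Person",
    "Company": "Organization",
    "University": "Organization",
}


def infer(cls):
    """Inference stage: return cls followed by its superclasses."""
    chain = [cls]
    while chain[-1] in POSITIVE:
        chain.append(POSITIVE[chain[-1]])
    return chain


def make_checker(negative):
    """Checking stage: build a stateful check over the given NIs."""
    seen = {}  # individual -> classes asserted or inferred so far

    def check(individual, classes):
        known = seen.setdefault(individual, set())
        known.update(classes)
        return [(a, b) for a, b in negative if a in known and b in known]

    return check


check = make_checker([("Person", "Organization")])
for individual, cls in [("Daniele", "Student"), ("Daniele", "University")]:
    print(individual, cls, check(individual, infer(cls)))
# Daniele Student []
# Daniele University [('Person', 'Organization')]
```

The checker never sees DJ(Student, University); the inference stage makes the single NI sufficient.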

The Pipeline Topology Method (LN)
• Computes the NI closure
  - DJ(Person,Publication), DJ(Student,Publication), DJ(Student,Employee), DJ(Article,Student), DJ(Person,Organization), ...
• Identifies the essential NIs
  - DJ(Person,Publication), DJ(Student,Employee), DJ(Person,Organization), ...
• Groups and orders the essential NIs
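One plausible reading of "essential", assumed here rather than taken from the slides, is that an NI is essential when no other NI in the closure implies it through positive inclusions. The sketch below applies that reading to the listed NIs; the positive inclusions (Student and Employee under Person, Article under Publication) and the helper names are hypothetical.

```python
def superclasses(cls, positive):
    """Return cls plus every class reachable through positive inclusions."""
    seen = {cls}
    stack = [cls]
    while stack:
        current = stack.pop()
        for sub, sup in positive:
            if sub == current and sup not in seen:
                seen.add(sup)
                stack.append(sup)
    return seen


def covers(general, specific, positive):
    """True if NI `general` implies NI `specific` via positive inclusions."""
    g1, g2 = general
    s1, s2 = specific
    up1 = superclasses(s1, positive)
    up2 = superclasses(s2, positive)
    return (g1 in up1 and g2 in up2) or (g1 in up2 and g2 in up1)


def essential_nis(closure, positive):
    """Keep the NIs not implied by another one (no duplicate pairs assumed)."""
    return [
        ni for ni in closure
        if not any(other != ni and covers(other, ni, positive) for other in closure)
    ]


positive = {("Student", "Person"), ("Employee", "Person"), ("Article", "Publication")}
closure = [
    ("Person", "Publication"),
    ("Student", "Publication"),
    ("Student", "Employee"),
    ("Article", "Student"),
    ("Person", "Organization"),
]

print(essential_nis(closure, positive))
# [('Person', 'Publication'), ('Student', 'Employee'), ('Person', 'Organization')]
```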

The Pipeline Topology Method (LN) cont’d
• Groups are assigned to bolts
• This step has a major impact on performance!
• Fewer NIs than NTM

Towards a solution
• How to model the stream consistency check problem?
  - Description logics, ontology streams
• How to cope with a huge amount of streaming data?
  - Distributed stream processing engines
• How to perform stream consistency checking over DSPE?
  - NTM, LN
• How do they perform?

Setup
• Ontologies
  - LUBM: 56 PIs, 70 NIs
  - NPD: 332 PIs, 51 NIs
• Six machines
  - 128 GB RAM
  - 2 E5-2680 v2 processors (10 cores per processor)
• Twitter Heron 0.14.3

Comparing NTM and LN
[Figure: comparison of NTM and LN. Topology shown: S -> B1 -> B2. LN-x denotes LN with x NI groups; half of the nodes are assigned to check consistency.]
• Similar results
• LN-2 outperforms NTM by up to 139%
• The load on the first node increases

Investigating the results
[Figure: chart comparing LN configurations with NTM.]

Conclusions
• It is possible to perform consistency checking over high-volume data streams
• We developed two methods (NTM and LN) and studied their performance
  - More than 14 million tuples/minute
  - LN can outperform NTM by up to 300%
• What’s next
  - Towards more expressive ontological languages
  - Repairing inconsistencies
  - Implementation and testing over other DSPEs

Thank you! Questions?
