Managing Completeness of Web Data
Fariz Darari
PhD Supervisor: Werner Nutt
Supported by the project MAGIC, funded by the province of Bolzano
Fariz Darari (unibz) Managing Completeness of Web Data Oct 20, 2015 1 / 38
About Us
Research Group
Sorted by distance to Werner’s office :)
Bozen-Bolzano
Motivation
Completeness statements are already there
However . . .
Completeness statements are available but only in natural language
Unclear what data completeness & query completeness mean
No techniques to check whether data completeness entails query completeness
Solution Ideas
Completeness statements are available but only in natural language
Solution: RDF-ize completeness statements
Unclear what data completeness & query completeness mean
Solution: Formalize data completeness & query completeness
No techniques to check whether data completeness entails query completeness
Solution: Develop techniques to check whether data completeness entails query completeness
Solutions
Background: RDF
Grd = { (resDogs, dir, tarantino),
(resDogs, act, tarantino) }
Background: SPARQL
SELECT
Qsdir = ({ ?m }, { (?m, dir, tarantino) })
ASK
Qadir = ({ }, { (?m, dir, tarantino) })
CONSTRUCT
Qcdir = ({ (?m, dir, tarantino) }, { (?m, dir, tarantino) })
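The three query forms differ only in what they return for each match of the pattern. A minimal Python sketch of this, assuming triples are tuples of strings, variables are strings starting with `?`, and set semantics for SELECT (a simplification); the helper names `match_bgp`, `select`, `ask`, and `construct` are illustrative and do not come from the slides:

```python
def match_bgp(pattern, graph, mu=None):
    """Return all mappings (dicts) from the variables of `pattern` (a list of
    triple patterns) to terms such that every instantiated pattern is in `graph`."""
    mu = {} if mu is None else mu
    if not pattern:
        return [mu]
    first, rest = pattern[0], pattern[1:]
    results = []
    for triple in graph:
        nu = dict(mu)  # work on a copy so failed matches leave mu untouched
        if all(nu.setdefault(p, g) == g if p.startswith("?") else p == g
               for p, g in zip(first, triple)):
            results += match_bgp(rest, graph, nu)
    return results


def select(variables, pattern, graph):
    """SELECT: project each match onto the listed variables (set semantics)."""
    return {tuple(mu[v] for v in variables) for mu in match_bgp(pattern, graph)}


def ask(pattern, graph):
    """ASK: is there at least one match?"""
    return bool(match_bgp(pattern, graph))


def construct(template, pattern, graph):
    """CONSTRUCT: instantiate the template with every match of the pattern."""
    return {tuple(mu.get(t, t) for t in tp)
            for mu in match_bgp(pattern, graph) for tp in template}


G_rd = {("resDogs", "dir", "tarantino"), ("resDogs", "act", "tarantino")}
P_dir = [("?m", "dir", "tarantino")]

print(select(["?m"], P_dir, G_rd))    # {('resDogs',)}
print(ask(P_dir, G_rd))               # True
print(construct(P_dir, P_dir, G_rd))  # {('resDogs', 'dir', 'tarantino')}
```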
Story: Incomplete Data Source
An incomplete data source of Reservoir Dogs, $G_{dbp} = (G^a_{dbp}, G^i_{dbp})$:
$G^a_{dbp}$ = {(resDogs, dir, tarantino)}
$G^i_{dbp}$ = {(resDogs, dir, tarantino), (resDogs, act, tarantino)}
Story: Completeness Statement
$G^a_{dbp}$ = {(resDogs, dir, tarantino)}
$G^i_{dbp}$ = {(resDogs, dir, tarantino), (resDogs, act, tarantino)}
From ($G^a_{dbp}$, $G^i_{dbp}$), we can say that DBpedia is complete for movies directed by Tarantino:
Cdir = Compl((?m, dir, tarantino) | ∅)
However, it is not complete for actors in movies directed by Tarantino, so the following statement is not satisfied:
Cact = Compl((?m, act, ?a) | (?m, dir, tarantino))
Story: Query Completeness
$G^a_{dbp}$ = {(resDogs, dir, tarantino)}
$G^i_{dbp}$ = {(resDogs, dir, tarantino), (resDogs, act, tarantino)}
Consequently, when we ask for all movies directed by Tarantino over DBpedia:
Qdir = ({?m}, {(?m, dir, tarantino)})
the query is complete, that is, Compl(Qdir) holds.
However, if we ask for all movies directed by and starring Tarantino:
Qdir+act = ({?m}, {(?m, dir, tarantino), (?m, act, tarantino)})
the query is incomplete, that is, Compl(Qdir+act) does not hold.
Incomplete Data Source
Definition (Incomplete Data Source)
An incomplete data source is a pair of graphs $G = (G^a, G^i)$, where $G^a \subseteq G^i$.
We call $G^a$ the available graph and $G^i$ the ideal graph.
Completeness Statement
Definition (Completeness Statement)
Let $P_1$ be a non-empty basic graph pattern (BGP) and $P_2$ a BGP.
A completeness statement is defined as
$\mathit{Compl}(P_1 \mid P_2)$
where we call $P_1$ the pattern and $P_2$ the condition of the statement.
Satisfaction of Completeness Statements
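To a statement
$C = \mathit{Compl}(P_1 \mid P_2)$,
we associate the CONSTRUCT query
$Q_C = (P_1, P_1 \cup P_2)$.
Then, we say:
C is satisfied by an incomplete data source $G = (G^a, G^i)$,
written $G \models C$, if
$[\![Q_C]\!]_{G^i} \subseteq G^a$,
where $[\![Q]\!]_G$ denotes the result of evaluating Q over G.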
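The satisfaction check can be sketched in Python. This is an illustration under assumptions, not the authors' implementation: triples are tuples of strings, variables start with `?`, a statement Compl(P1 | P2) is a pair of lists `(p1, p2)`, and the helper names are hypothetical.

```python
def match_bgp(pattern, graph, mu=None):
    """All variable mappings under which every triple pattern is in `graph`."""
    mu = {} if mu is None else mu
    if not pattern:
        return [mu]
    first, rest = pattern[0], pattern[1:]
    results = []
    for triple in graph:
        nu = dict(mu)
        if all(nu.setdefault(p, g) == g if p.startswith("?") else p == g
               for p, g in zip(first, triple)):
            results += match_bgp(rest, graph, nu)
    return results


def construct(template, pattern, graph):
    """Evaluate the CONSTRUCT query (template, pattern) over `graph`."""
    return {tuple(mu.get(t, t) for t in tp)
            for mu in match_bgp(pattern, graph) for tp in template}


def satisfies(available, ideal, statement):
    """(available, ideal) satisfies Compl(P1 | P2) if evaluating the query
    (P1, P1 union P2) over the ideal graph yields only available triples."""
    if not available <= ideal:
        raise ValueError("the available graph must be a subset of the ideal graph")
    p1, p2 = statement
    return construct(p1, p1 + p2, ideal) <= available


G_a = {("resDogs", "dir", "tarantino")}
G_i = G_a | {("resDogs", "act", "tarantino")}
C_dir = ([("?m", "dir", "tarantino")], [])
C_act = ([("?m", "act", "?a")], [("?m", "dir", "tarantino")])

print(satisfies(G_a, G_i, C_dir))  # True
print(satisfies(G_a, G_i, C_act))  # False
```

The second result is False because the ideal graph contains (resDogs, act, tarantino), which the available graph lacks.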
Completeness Statements in RDF
Cact = Compl((?m, act, ?a) | (?m, dir, tarantino))
lv:dataset a void:Dataset;
c:hasComplStmt lv:csAct.
lv:csAct c:hasPattern [c:subject [c:varName "m"];
c:predicate s:actor;
c:object [c:varName "a"]];
c:hasCondition [c:subject [c:varName "m"];
c:predicate s:director;
c:object lmdb:Quentin_Tarantino].
Query Completeness
Definition (Query Completeness)
Let Q be a query. We write
$\mathit{Compl}(Q)$
to say that Q is complete.
An incomplete data source $G = (G^a, G^i)$ satisfies $\mathit{Compl}(Q)$,
written $G \models \mathit{Compl}(Q)$, if
$[\![Q]\!]_{G^i} = [\![Q]\!]_{G^a}$.
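Query completeness over a concrete incomplete data source amounts to comparing two query results. A minimal Python sketch, assuming SELECT queries are pairs `(variables, pattern)` evaluated under set semantics (a simplification) and using hypothetical helper names:

```python
def match_bgp(pattern, graph, mu=None):
    """All variable mappings under which every triple pattern is in `graph`."""
    mu = {} if mu is None else mu
    if not pattern:
        return [mu]
    first, rest = pattern[0], pattern[1:]
    results = []
    for triple in graph:
        nu = dict(mu)
        if all(nu.setdefault(p, g) == g if p.startswith("?") else p == g
               for p, g in zip(first, triple)):
            results += match_bgp(rest, graph, nu)
    return results


def select(variables, pattern, graph):
    """Evaluate a SELECT query over `graph` (set semantics)."""
    return {tuple(mu[v] for v in variables) for mu in match_bgp(pattern, graph)}


def query_complete(available, ideal, query):
    """The query is complete if it returns the same answers over both graphs."""
    variables, pattern = query
    return select(variables, pattern, ideal) == select(variables, pattern, available)


G_a = {("resDogs", "dir", "tarantino")}
G_i = G_a | {("resDogs", "act", "tarantino")}
Q_dir = (["?m"], [("?m", "dir", "tarantino")])
Q_dir_act = (["?m"], [("?m", "dir", "tarantino"), ("?m", "act", "tarantino")])

print(query_complete(G_a, G_i, Q_dir))      # True
print(query_complete(G_a, G_i, Q_dir_act))  # False
```

Note that this check needs the ideal graph, which is not available in practice; that is why the slides turn to entailment from completeness statements.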
Completeness Entailment
Problem Definition (Completeness Entailment)
Let C be a set of completeness statements and Q a query.
We say that C entails the completeness of Q, written
C |= Compl(Q),
if any incomplete data source satisfying C also satisfies Compl(Q).
Intuition: Completeness Entailment
Consider the set Cdir,act = { Cdir , Cact } of completeness statements
and the query Qdir+act = ({ ?m }, Pdir+act ) where
Pdir+act = { (?m, dir, tarantino), (?m, act, tarantino) }.
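$\tilde{P}_{dir+act} = \{ (\tilde{m}, \mathit{dir}, \mathit{tarantino}), (\tilde{m}, \mathit{act}, \mathit{tarantino}) \}$
Therefore,
$[\![Q_{C_{dir}}]\!]_{\tilde{P}_{dir+act}} \cup [\![Q_{C_{act}}]\!]_{\tilde{P}_{dir+act}} = \{ (\tilde{m}, \mathit{dir}, \mathit{tarantino}), (\tilde{m}, \mathit{act}, \mathit{tarantino}) \} = \tilde{P}_{dir+act}$.
Thus,
$\mathcal{C}_{dir,act} \models \mathit{Compl}(Q_{dir+act})$.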
Prototypical Graph
$\tilde{P}_{dir+act} = \{ (\tilde{m}, \mathit{dir}, \mathit{tarantino}), (\tilde{m}, \mathit{act}, \mathit{tarantino}) \}$
Definition (Prototypical Graph)
Let Q = (W, P) be a query.
The freeze mapping $\widetilde{\mathit{id}}$ is defined as a mapping from each variable ?v in P to a new IRI $\tilde{v}$.
Instantiating the graph pattern P with $\widetilde{\mathit{id}}$ yields the graph
$\tilde{P} := \widetilde{\mathit{id}}\,P$,
which we call the prototypical graph of Q.
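Freezing a pattern is a plain term substitution. A small Python sketch, assuming variables are strings starting with `?` and that prefixing the name with `~` yields a term that occurs nowhere else (this naming scheme and the function name `freeze` are assumptions made for illustration):

```python
def freeze(pattern):
    """Replace each variable ?v in the pattern by the fresh constant ~v and
    return the resulting set of triples (the prototypical graph)."""
    def frozen(term):
        return "~" + term[1:] if term.startswith("?") else term
    return {tuple(frozen(t) for t in tp) for tp in pattern}


P_dir_act = [("?m", "dir", "tarantino"), ("?m", "act", "tarantino")]
print(sorted(freeze(P_dir_act)))
# [('~m', 'act', 'tarantino'), ('~m', 'dir', 'tarantino')]
```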
Transfer Operator
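$[\![Q_{C_{dir}}]\!]_{\tilde{P}_{dir+act}} \cup [\![Q_{C_{act}}]\!]_{\tilde{P}_{dir+act}}$
Definition (Transfer Operator)
For any set $\mathcal{C}$ of completeness statements and a graph G, we define the transfer operator $T_{\mathcal{C}}$ that computes the union of the evaluation over G of all CONSTRUCT queries of the statements in $\mathcal{C}$:
$T_{\mathcal{C}}(G) = \bigcup_{C \in \mathcal{C}} [\![Q_C]\!]_G$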
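The transfer operator can be sketched in Python as a union over CONSTRUCT results. The representation is an assumption for illustration: a statement Compl(P1 | P2) is a pair of lists `(p1, p2)`, variables start with `?`, and frozen variables are written with a `~` prefix.

```python
def match_bgp(pattern, graph, mu=None):
    """All variable mappings under which every triple pattern is in `graph`."""
    mu = {} if mu is None else mu
    if not pattern:
        return [mu]
    first, rest = pattern[0], pattern[1:]
    results = []
    for triple in graph:
        nu = dict(mu)
        if all(nu.setdefault(p, g) == g if p.startswith("?") else p == g
               for p, g in zip(first, triple)):
            results += match_bgp(rest, graph, nu)
    return results


def construct(template, pattern, graph):
    """Evaluate the CONSTRUCT query (template, pattern) over `graph`."""
    return {tuple(mu.get(t, t) for t in tp)
            for mu in match_bgp(pattern, graph) for tp in template}


def transfer(statements, graph):
    """Union of the CONSTRUCT results (P1, P1 union P2) of all statements."""
    result = set()
    for p1, p2 in statements:
        result |= construct(p1, p1 + p2, graph)
    return result


C_dir = ([("?m", "dir", "tarantino")], [])
C_act = ([("?m", "act", "?a")], [("?m", "dir", "tarantino")])
P_frozen = {("~m", "dir", "tarantino"), ("~m", "act", "tarantino")}

print(transfer([C_dir, C_act], P_frozen) == P_frozen)  # True
print(transfer([C_dir], P_frozen))  # {('~m', 'dir', 'tarantino')}
```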
Completeness Entailment Theorem
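$\tilde{P}_{dir+act} = T_{\mathcal{C}_{dir,act}}(\tilde{P}_{dir+act})$
Theorem (Completeness of Basic Queries)
Let $\mathcal{C}$ be a set of completeness statements and Q = (W, P) a basic query. Then,
$\mathcal{C} \models \mathit{Compl}(Q)$ if and only if $\tilde{P} = T_{\mathcal{C}}(\tilde{P})$.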
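The theorem turns entailment checking for basic queries into a fixed-point test on the prototypical graph. A self-contained Python sketch of that test follows; it is an illustration under assumptions (tuple-based triples, `?`-prefixed variables, `~`-prefixed frozen terms, hypothetical function names), not the implementation used in CORNER.

```python
def match_bgp(pattern, graph, mu=None):
    """All variable mappings under which every triple pattern is in `graph`."""
    mu = {} if mu is None else mu
    if not pattern:
        return [mu]
    first, rest = pattern[0], pattern[1:]
    results = []
    for triple in graph:
        nu = dict(mu)
        if all(nu.setdefault(p, g) == g if p.startswith("?") else p == g
               for p, g in zip(first, triple)):
            results += match_bgp(rest, graph, nu)
    return results


def construct(template, pattern, graph):
    """Evaluate the CONSTRUCT query (template, pattern) over `graph`."""
    return {tuple(mu.get(t, t) for t in tp)
            for mu in match_bgp(pattern, graph) for tp in template}


def freeze(pattern):
    """Prototypical graph: every variable ?v becomes the fresh constant ~v."""
    return {tuple("~" + t[1:] if t.startswith("?") else t for t in tp)
            for tp in pattern}


def transfer(statements, graph):
    """Union of the CONSTRUCT results (P1, P1 union P2) of all statements."""
    result = set()
    for p1, p2 in statements:
        result |= construct(p1, p1 + p2, graph)
    return result


def entails(statements, query):
    """Statements entail Compl(Q) iff the prototypical graph of Q is
    reproduced exactly by the transfer operator."""
    _, pattern = query
    frozen = freeze(pattern)
    return transfer(statements, frozen) == frozen


C_dir = ([("?m", "dir", "tarantino")], [])
C_act = ([("?m", "act", "?a")], [("?m", "dir", "tarantino")])
Q_dir_act = (["?m"], [("?m", "dir", "tarantino"), ("?m", "act", "tarantino")])

print(entails([C_dir, C_act], Q_dir_act))  # True
print(entails([C_dir], Q_dir_act))         # False
```

With only Cdir, the transfer operator cannot reproduce the frozen act triple, so completeness of the query is not entailed.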
Query Class: DISTINCT Queries
Give us all Oscar-winning things:
$Q_{awd} = (W_{awd}, P_{awd})^d$ = ({?m}, { (?m, award, oscar), (?m, award, ?aw) })$^d$
Complete for all Oscar-winning things:
$C_{os}$ = Compl((?m, award, oscar) | ∅)
Does { $C_{os}$ } |= Compl($Q_{awd}$) hold?
Query Class: OPT Queries
Give us all movies, and their awards, if any:
Qmaw = ({ ?m, ?aw }, ((?m, a, Movie) OPT (?m, award, ?aw)))
Complete for all movies and their awards:
Caw = Compl((?m, a, Movie), (?m, award, ?aw) | ∅)
Does { Caw } |= Compl(Qmaw) hold?
Query Class: Queries under RDFS Semantics
Give us all films:
Qfilm = ({ ?m }, { (?m, a, Film) })
Complete for all movies:
Cmovie = Compl((?m, a, Movie) | ∅)
Films are the same as movies:
Sfm = {(Film, subclass, Movie), (Movie, subclass, Film)}
Does { Cmovie } |= Compl(Qfilm) hold with respect to Sfm?
Federated Completeness Statements
Timestamped Completeness Statements
Conclusions
Completeness statements can now be represented in RDF
We know how completeness statements can entail query completeness in different query classes and different settings of completeness statements
Future Work
Completeness statements for queries with negation
Completeness statements as session annotations for RDF streams
Statistical completeness reasoning
Publications
Fariz Darari, Werner Nutt, Giuseppe Pirrò, Simon Razniewski: Completeness Statements about RDF Data Sources and Their Use for Query Answering. ISWC 2013.
Fariz Darari, Radityo Eko Prasojo, Werner Nutt: CORNER: A Completeness Reasoner for SPARQL Queries Over RDF Data Sources. ESWC Posters and Demos 2014.
Fariz Darari, Simon Razniewski, Werner Nutt: Bridging the Semantic Gap between RDF and SPARQL using Completeness Statements. ISWC Posters and Demos 2014.
Fariz Darari, Radityo Eko Prasojo, Werner Nutt: Expressing No-Value Information in RDF. ISWC Posters and Demos 2015.
The latest results (timestamped statements and efficient completeness reasoning with 1 million statements) have been submitted to a journal.
Compl((myDaSePresentation, slide, ?s) | ∅)
Thank You!
