ontology-mediated query answering with data-tractable description logics
Meghyn Bienvenu (CNRS & Université de Montpellier)
Magdalena Ortiz (Vienna University of Technology)
ontology-mediated query answering (omqa)
[Diagram: data (incomplete database, ground facts) + ontology (logical theory) + user query ⇝ ???]
ontology-mediated query answering (omqa)
patient data: “Melanie has listeriosis”, “Paul has Lyme disease”
medical knowledge: “Listeriosis & Lyme disease are bacterial infections”
user query: “Find all patients with bacterial infections”
expected answers: Melanie, Paul
ontology-mediated query answering (omqa)
employee data: “Marie is a professor”, “Mark teaches CS200”
org. knowledge: “Professors are teaching staff”, “Someone who teaches is part of the teaching staff”
user query: “Find all teaching staff”
expected answers: Marie, Mark
what are ontologies good for?
To standardize the terminology of an application domain
∙ meaning of terms is constrained, so less misunderstandings
∙ by adopting a common vocabulary, easy to share information
To present an intuitive and unified view of data sources
∙ ontology can be used to enrich the data vocabulary, making it
easier for users to formulate their queries
∙ especially useful when integrating multiple data sources
To support automated reasoning
∙ uncover implicit connections between terms, errors in modelling
∙ exploit knowledge in the ontology during query answering, to get
back a more complete set of answers to queries
applications of omqa: medicine
General medical ontologies: SNOMED CT (∼ 400,000 terms!), GALEN
Specialized ontologies: FMA (anatomy), NCI (cancer), ...
Querying & exchanging medical records (find patients for medical trials)
∙ myocardial infarction vs. MI vs. heart attack vs. 410.0
Supports tools for annotating and visualizing patient data (scans, x-rays)
applications of omqa: life sciences
Hundreds of ontologies at BioPortal (http://bioportal.bioontology.org/):
Gene Ontology (GO), Cell Ontology, Pathway Ontology, Plant Anatomy, ...
Help scientists share, query, & visualize experimental data
applications of omqa: enterprise information systems
Companies and organizations have lots of data
∙ need easy and flexible access to support decision-making
Example industrial projects:
∙ Public debt data: Sapienza Univ. & Italian Department of Treasury
∙ Energy sector: Optique EU project (several univ, StatOil, & Siemens)
our focus: horn description logics
Ontologies formulated using description logics (DLs):
∙ family of decidable fragments of first-order logic
∙ basis for OWL web ontology language (W3C)
∙ range from fairly simple to highly expressive
∙ complexity of query answering well understood
In this tutorial, focus on Horn description logics:
∙ DL-LiteR, EL, ELHI, Horn-SHIQ, ...
∙ good computational properties, well suited for OMQA
∙ still expressive enough for interesting applications
∙ basis for OWL 2 QL and OWL 2 EL profiles
Consider various types of queries
plan for today
∙ Horn Description Logics
∙ Basics of OMQA
∙ Instance Queries
∙ Conjunctive Queries
∙ Navigational Queries
∙ Queries with Negation and Recursion
∙ Research Trends in OMQA
horn description logics
dl basics
Building blocks of DLs:
∙ concept names (unary predicates, classes)
IceCream, Pizza, Meat, SpicyDish, Dish, Menu, Restaurant, ...
∙ role names (binary predicates, properties)
hasIngred, hasCourse, hasDessert, serves, ...
∙ individual names (constants)
menu32, pastadish17, d3, rest156, r12, ...
(specific menus, dishes, restaurants ...)
NC / NR / NI: set of all concept / role / individual names
dl knowledge bases
Knowledge base (KB) = ABox (data) + TBox (ontology)
ABox contains facts about specific individuals
∙ finite set of concept assertions A(a) and role assertions r(a, b)
∙ IceCream(d2): dish d2 is of type IceCream
∙ hasDessert(m, d2): menu m is connected via hasDessert to dish d2
TBox contains general knowledge about the domain of interest
∙ finite set of axioms (details on syntax to follow)
∙ IceCream is a subclass of Dessert
∙ hasCourse connects Menus to Dishes
∙ every Menu is connected to at least one dish via hasCourse
concept and role constructors
Can build complex concepts and roles using constructors:
∙ conjunction (⊓), disjunction (⊔), negation (¬)
Dessert ⊓ ¬IceCream,  Pizza ⊔ PastaDish
∙ restricted forms of existential and universal quantification (∃, ∀)
∃hasCourse.⊤,  ∃contains.Meat,  Dish ⊓ ∀contains.¬Meat
(⊤ acts as a “wildcard”, denotes set of all things)
∙ inverse (⁻) and composition (·) of roles
hasCourse⁻,  contains · contains
(use N_R^± for set of role names and inverse roles)
(use inv(r) to toggle ⁻: inv(r) = r⁻, inv(r⁻) = r)
Note: set of available constructors depends on the particular DL!
tbox axioms
Concept inclusions C ⊑ D (C, D possibly complex concepts)
IceCream ⊑ Dessert,  Menu ⊑ ∃hasCourse.⊤,  Spicy ⊓ Dish ⊑ SpicyDish
Role inclusions R ⊑ S (R, S possibly complex roles)
hasIngred ⊑ contains,  ingredOf⁻ ⊑ hasIngred,  hasDessert ⊑ hasCourse
Note: type and syntax of axioms depend on the particular DL!
example: interpretation
[Figure: an interpretation I with domain ∆^I, containing the elements italFeast^I and chCake^I, the sets Dish^I, Dessert^I, Appetizer^I and Menu^I, and four hasCourse edges]
4 concept names: Dish, Dessert, Appetizer, Menu
1 role name: hasCourse; 2 individual names: italFeast, chCake
dl semantics
Interpretation I (“possible world”)
∙ domain of objects ∆^I (possibly infinite set)
∙ interpretation function ·^I that maps
∙ concept name A ⇝ set of objects A^I ⊆ ∆^I
∙ role name r ⇝ set of pairs of objects r^I ⊆ ∆^I × ∆^I
∙ individual name a ⇝ object a^I ∈ ∆^I
Interpretation function ·^I extends to complex concepts and roles:
⊤^I = ∆^I
⊥^I = ∅
(¬C)^I = ∆^I \ C^I
(C1 ⊓ C2)^I = C1^I ∩ C2^I
(∃R.C)^I = {d1 | there exists (d1, d2) ∈ R^I with d2 ∈ C^I}
(∀R.C)^I = {d1 | d2 ∈ C^I for all (d1, d2) ∈ R^I}
(r⁻)^I = {(d2, d1) | (d1, d2) ∈ r^I}
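The semantic clauses above can be tried out on a small finite interpretation. The following is a minimal sketch, assuming a hypothetical interpretation (the elements `m`, `d1`, `d2` and the extensions below are made up for illustration and are not the interpretation drawn on the slide); each function mirrors one clause of the definition.

```python
# Hypothetical finite interpretation I (illustrative only).
domain = {"m", "d1", "d2"}
concept_ext = {"Menu": {"m"}, "Dish": {"d1", "d2"}, "Dessert": {"d2"}}
role_ext = {"hasCourse": {("m", "d1"), ("m", "d2")}}

def top():
    # ⊤^I = ∆^I
    return set(domain)

def bottom():
    # ⊥^I = ∅
    return set()

def neg(c):
    # (¬C)^I = ∆^I \ C^I
    return domain - c

def conj(c1, c2):
    # (C1 ⊓ C2)^I = C1^I ∩ C2^I
    return c1 & c2

def inverse(r):
    # (r⁻)^I swaps the components of every pair in r^I
    return {(d2, d1) for (d1, d2) in r}

def exists(r, c):
    # (∃R.C)^I: elements with at least one R-successor in C
    return {d1 for (d1, d2) in r if d2 in c}

def forall(r, c):
    # (∀R.C)^I: elements all of whose R-successors are in C
    return {d for d in domain
            if all(d2 in c for (d1, d2) in r if d1 == d)}

menus_with_course = exists(role_ext["hasCourse"], top())
non_dessert_dishes = conj(concept_ext["Dish"], neg(concept_ext["Dessert"]))
```

Note that `forall` holds vacuously for elements without any successor, exactly as in the set-builder definition of ∀R.C.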
back to the example
[Figure: the interpretation I from the earlier example, with elements italFeast^I and chCake^I, the sets Dish^I, Dessert^I, Appetizer^I and Menu^I, and four hasCourse edges]
Dish ⊓ Menu,  Dessert ⊓ Appetizer,  ∃hasCourse.⊤,  ∃hasCourse⁻.Dessert
semantics of dl kbs
Satisfaction in an interpretation
∙ I satisfies C ⊑ D ⇔ C^I ⊆ D^I
∙ I satisfies R ⊑ S ⇔ R^I ⊆ S^I
∙ I satisfies A(a) ⇔ a^I ∈ A^I
∙ I satisfies r(a, b) ⇔ (a^I, b^I) ∈ r^I
Model of a KB K = interpretation that satisfies all statements in K
K is satisfiable = K has at least one model
K entails α (written K |= α) = every model I of K satisfies α
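The satisfaction conditions above translate directly into subset and membership tests. Below is a small sketch under stated assumptions: the interpretation, the tuple encoding of statements (`"concept_incl"`, `"role_incl"`, `"concept_assert"`, `"role_assert"`) and the function names are all hypothetical, and only atomic concepts and role names are handled.

```python
# Hypothetical finite interpretation I (illustrative only).
concept_ext = {"Menu": {"m"}, "Dish": {"c"}, "Dessert": {"c"}}
role_ext = {"hasCourse": {("m", "c")}, "hasDessert": {("m", "c")}}
individual = {"italFeast": "m", "chCake": "c"}  # a ⇝ a^I

def satisfies(statement):
    """Check one TBox axiom or ABox assertion in the interpretation."""
    kind = statement[0]
    if kind == "concept_incl":      # C ⊑ D  ⇔  C^I ⊆ D^I
        _, c, d = statement
        return concept_ext[c] <= concept_ext[d]
    if kind == "role_incl":         # R ⊑ S  ⇔  R^I ⊆ S^I
        _, r, s = statement
        return role_ext[r] <= role_ext[s]
    if kind == "concept_assert":    # A(a)  ⇔  a^I ∈ A^I
        _, a, ind = statement
        return individual[ind] in concept_ext[a]
    if kind == "role_assert":       # r(a, b)  ⇔  (a^I, b^I) ∈ r^I
        _, r, a, b = statement
        return (individual[a], individual[b]) in role_ext[r]
    raise ValueError(f"unknown statement kind: {kind}")

def is_model(kb):
    """I is a model of K iff it satisfies every statement in K."""
    return all(satisfies(s) for s in kb)

kb = [
    ("concept_incl", "Dessert", "Dish"),
    ("role_incl", "hasDessert", "hasCourse"),
    ("concept_assert", "Menu", "italFeast"),
    ("role_assert", "hasCourse", "italFeast", "chCake"),
]
kb_is_satisfied = is_model(kb)
```

Checking a single interpretation like this can show that a KB is satisfiable or refute an entailment, but it cannot establish K |= α, which quantifies over all models of K.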
back to the example
[Figure: the same interpretation I as in the earlier example]
Which of the following assertions / axioms are satisfied in I?
Dessert ⊑ Dish,  Dish ⊓ Menu ⊑ ⊥,  Menu ⊑ ∃hasCourse.⊤,
∃hasCourse⁻.⊤ ⊑ Dish,  Menu(italFeast),  hasCourse(italFeast, chCake)
some important horn dls
Idea: Horn DLs cannot express disjunction (explicitly or implicitly)
∙ better computational properties than non-Horn DLs (more on this later)
DL-LiteR
∙ concept inclusions B1 ⊑ (¬)B2, where B1, B2 are either A ∈ NC or ∃R (R ∈ N_R^±)
∙ role inclusions R1 ⊑ (¬)R2, where R1, R2 ∈ N_R^±
EL
∙ allows only ⊤, ⊓, and ∃r.C as constructors
∙ only concept inclusions in TBox
ELHI⊥
∙ additionally allows for ⊥ and inverse roles (r⁻)
∙ can also have role inclusions
Horn-SHIQ
∙ limited use of ¬, ∀r.C, and number restrictions (≥ nR.C, ≤ nR.C)
∙ also have transitivity axioms (e.g. assert contains is transitive)
basics of omqa
aboxes vs. databases
ABoxes and databases (DBs) are syntactically similar:
∙ ABox = finite set of assertions (unary and binary facts)
∙ Database = finite set of facts of arbitrary arity
ABoxes interpreted under open world assumption:
∙ every assertion in the ABox is assumed to hold (true)
∙ assertions not present in the ABox may hold or not (unknown)
Each ABox gives rise to many interpretations (its models)
∙ models can be infinite, can have infinitely many models
Databases interpreted under closed world assumption:
∙ every fact in the DB is assumed to hold (true)
∙ every fact not in the DB is assumed not to hold (false)
In other words, each DB corresponds to single finite interpretation
∙ domain of the interpretation = set of constants in DB
querying databases
Database query q of arity n maps (Boolean query = arity 0)
Database D ⇝ ans(q, D) = set of n-tuples of constants from D
Interpretation I ⇝ ans(q, I) = set of n-tuples of elements from I
First-order (FO) query = first-order formula
∙ arity of FO query = number of free variables
∙ answers = substitutions for free vars that make formula hold
∙ example: Dish(x) ∧ ∀y.(contains(x, y) → ¬Spicy(y))
Datalog queries = finite set of Datalog rules + ‘goal’ relation
∙ arity of Datalog query = arity of goal relation
∙ answers = exhaustively apply rules to DB / interpretation, collect
tuples in goal relation
∙ example: rules contains(x, z) ← contains(x, y), contains(y, z) and
SpicyDish(x) ← Dish(x), contains(x, y), Spicy(y)
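To make "exhaustively apply rules, collect tuples in goal relation" concrete, here is a minimal sketch that hard-codes the two example rules and evaluates them bottom-up over a small database. The database contents and the function name `spicy_dishes` are assumptions for illustration; a real Datalog engine would handle arbitrary rules.

```python
def spicy_dishes(db):
    """Naive bottom-up evaluation of the two example rules, goal = SpicyDish."""
    contains = set(db["contains"])
    # Rule 1: contains(x, z) ← contains(x, y), contains(y, z)
    # Apply until no new tuple is derived (fixpoint).
    while True:
        derived = {(x, z)
                   for (x, y) in contains
                   for (y2, z) in contains
                   if y == y2}
        if derived <= contains:
            break
        contains |= derived
    # Rule 2: SpicyDish(x) ← Dish(x), contains(x, y), Spicy(y)
    # SpicyDish does not feed back into rule 1, so one pass suffices.
    return {x
            for x in db["Dish"]
            for (x2, y) in contains
            if x == x2 and y in db["Spicy"]}

# Hypothetical database (illustrative constants only).
db = {
    "Dish": {"penne", "salad"},
    "Spicy": {"chili"},
    "contains": {("penne", "sauce"), ("sauce", "chili"), ("salad", "lettuce")},
}
answers = spicy_dishes(db)
```

Here `penne` is returned only because the recursive rule first derives `contains(penne, chili)`, which a single first-order query over the raw data could not do for chains of arbitrary length.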
querying dl knowledge bases
Problem: each KB gives rise to multiple interpretations (its models),
but DB query semantics defines answers w.r.t. a single interpretation
Solution: adopt certain answer semantics
∙ require tuple to be an answer w.r.t. all models of KB
Formally: Call a tuple (a1, . . . , an) of individuals from A a certain
answer to n-ary query q over DL KB K = (T , A) iff
(a1^I, . . . , an^I) ∈ ans(q, I) for every model I of K
Ontology-mediated query answering (OMQA)
= computing certain answers to queries
example: certain answers
Consider the query q(x) = Dessert(x) and the DL-LiteR KB K = (T , A):
T = {Cake ⊑ Dessert, IceCream ⊑ Dessert, hasDessert ⊑ hasCourse,
∃hasCourse ⊑ Menu, ∃hasDessert⁻ ⊑ Dessert}
A = {Cake(d1), IceCream(d2), Dessert(d3), hasDessert(m, d4)}
There are four certain answers to q w.r.t. K:
∙ d1 ∈ cert(q, K): Cake(d1) ∈ A, Cake ⊑ Dessert ∈ T
∙ d2 ∈ cert(q, K): IceCream(d2) ∈ A, IceCream ⊑ Dessert ∈ T
∙ d3 ∈ cert(q, K): Dessert(d3) ∈ A
∙ d4 ∈ cert(q, K): hasDessert(m, d4) ∈ A, ∃hasDessert⁻ ⊑ Dessert ∈ T
The fifth individual m is not a certain answer: can construct model
J of K in which m^J ∉ Dessert^J (see lecture notes).
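The four certain answers can be reproduced mechanically. The sketch below is one possible way to do it, not the method of the tutorial: it forward-chains over basic concepts (concept names and ∃R) using only positive inclusions and assumes the KB is consistent. The encoding (`("exists", role)` tuples, a trailing `-` for inverse roles) and the function names are assumptions made for this example.

```python
def inv(role):
    """Toggle the inverse marker: inv('r') = 'r-', inv('r-') = 'r'."""
    return role[:-1] if role.endswith("-") else role + "-"

def certain_instances(concept_incl, role_incl, concept_facts, role_facts, goal):
    """Individuals a with K |= goal(a), for positive DL-LiteR inclusions only."""
    member = set(concept_facts)               # pairs (basic concept, individual)
    for (r, a, b) in role_facts:              # r(a, b) puts a in ∃r and b in ∃r⁻
        member.add((("exists", r), a))
        member.add((("exists", inv(r)), b))
    while True:
        new = set()
        for (basic, a) in member:
            for (lhs, rhs) in concept_incl:   # B1 ⊑ B2
                if lhs == basic:
                    new.add((rhs, a))
            if isinstance(basic, tuple):      # R ⊑ S gives ∃R ⊑ ∃S and ∃R⁻ ⊑ ∃S⁻
                for (r, s) in role_incl:
                    if basic[1] == r:
                        new.add((("exists", s), a))
                    if basic[1] == inv(r):
                        new.add((("exists", inv(s)), a))
        if new <= member:
            break
        member |= new
    return {a for (c, a) in member if c == goal}

concept_incl = [
    ("Cake", "Dessert"),
    ("IceCream", "Dessert"),
    (("exists", "hasCourse"), "Menu"),
    (("exists", "hasDessert-"), "Dessert"),
]
role_incl = [("hasDessert", "hasCourse")]
concept_facts = {("Cake", "d1"), ("IceCream", "d2"), ("Dessert", "d3")}
role_facts = {("hasDessert", "m", "d4")}

desserts = certain_instances(concept_incl, role_incl,
                             concept_facts, role_facts, "Dessert")
menus = certain_instances(concept_incl, role_incl,
                          concept_facts, role_facts, "Menu")
```

The individual `m` ends up in `menus` (via hasDessert ⊑ hasCourse and ∃hasCourse ⊑ Menu) but not in `desserts`, matching the slide.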
key techniques for omqa: query rewriting
Query rewriting: Reduces problem of finding certain answers to
standard DB query evaluation (⇝ exploit existing DB systems)
[Diagram: TBox T + query q ⇝ (query rewriting) ⇝ database query q′; q′ + ABox A ⇝ (query evaluation) ⇝ query answers]
Call q′(x⃗) a rewriting of q(x⃗) and T iff for every ABox A and tuple a⃗
T , A |= q(a⃗) ⇔ a⃗ ∈ ans(q′(x⃗), I_A) (I_A = treat A as DB)
Types of rewritings: FO-rewritings (SQL), Datalog rewritings, ...
key techniques for omqa: saturation
Saturation: Render explicit (some of) the implicit information
contained in the KB, making it available for query evaluation
Simple use of saturation: (works e.g. for RDFS ontologies)
∙ use saturation to ‘complete’ the ABox by adding those assertions
that are logically entailed from the KB
∙ then evaluate the query over the saturated ABox
More complex uses:
∙ enrich the ABox in other ways (e.g. add new ABox individuals to
witness the existential restrictions ∃R.C)
∙ combine saturation with query rewriting
complexity of omqa
View OMQA as a decision problem (yes-or-no question):
Problem: Q answering in L (Q a query language, L a DL)
Input: An n-ary query q ∈ Q, an ABox A, an L-TBox T ,
and a tuple a⃗ ∈ Ind(A)^n
Question: Does ⃗a belong to cert(q, (T , A))?
Combined complexity: in terms of size of whole input
Data complexity: in terms of size of A only
∙ view rest of input as fixed (of constant size)
∙ motivation: ABox typically much larger than rest of input
Note: use |A| to denote size of A (similarly for |T |, |q|, etc.)
complexity classes
We will mention the following standard classes:
P problems solvable in deterministic polynomial time
NP problems solvable in non-det. polynomial time
coNP problems whose complement is solvable in
non-deterministic polynomial time
LogSpace problems solvable in deterministic logarithmic space
NLogSpace problems solvable in non-det. logarithmic space
PSpace problems solvable in polynomial space (note: =NPSpace)
Exp problems solvable in deterministic exponential time
Another less known but important class:
AC0 problems solvable by uniform family of
polynomial-size constant-depth circuits
Relationships between classes:
AC0 ⊊ LogSpace ⊆ NLogSpace ⊆ P ⊆ NP ⊆ PSpace ⊆ Exp
instance queries
instance queries
Instance queries (IQs): find instances of a given concept or role
A(x) where A ∈ NC concept instance query
r(x, y) where r ∈ NR role instance query
To query for a complex concept C, take AC(x) for fresh AC ∈ NC and
add C ⊑ AC to the TBox
Remarks:
∙ Instance query answering is often called instance checking
∙ Focus of OMQA until mid-2000s
instance checking in dl-lite via query rewriting
Input = instance query q + DL-LiteR TBox T
We construct an FO-rewriting of q w.r.t. T
More specifically, we construct:
∙ an FO-rewriting of q relative to consistent ABoxes, and
∙ an FO-rewriting of unsatisfiability
(these can be easily combined into FO-rewriting of q for all ABoxes)
rewriting relative to consistent aboxes
We first define two procedures:
ComputeSubsumees: all reasons for an individual to be in B
∙ input: concept B, TBox T
∙ output: set of C such that T |= C ⊑ B (the subsumees of B w.r.t. T )
ComputeSubroles: all reasons for a pair to be in R
∙ input: role R, TBox T
∙ output: set of S such that T |= S ⊑ R (the subroles of R w.r.t. T )
computing subsumees
Algorithm ComputeSubsumees
Input: DL-LiteR TBox T , concept B ∈ NC ∪ {∃R | R ∈ N_R^±}
1. Initialize Subsumees = {B} and Examined = ∅.
2. While Subsumees \ Examined ≠ ∅
2.1 Select D ∈ Subsumees \ Examined and add D to Examined.
2.2 For every concept inclusion C ⊑ D ∈ T
∙ If C ∉ Subsumees, add C to Subsumees
2.3 For every role inclusion R ⊑ S ∈ T such that D = ∃S
∙ If ∃R ∉ Subsumees, add ∃R to Subsumees
2.4 For every role inclusion R ⊑ S ∈ T such that D = ∃inv(S)
∙ If ∃inv(R) ∉ Subsumees, add ∃inv(R) to Subsumees
3. Return Subsumees.
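The steps of ComputeSubsumees can be sketched in a few lines of Python. The encoding is an assumption made for illustration: concept names are strings, ∃R is the tuple `("exists", R)`, an inverse role is written with a trailing `-`, and the TBox is split into two lists of pairs (positive inclusions only).

```python
def inv(role):
    """Toggle the inverse marker: inv('r') = 'r-', inv('r-') = 'r'."""
    return role[:-1] if role.endswith("-") else role + "-"

def compute_subsumees(concept_incl, role_incl, b):
    subsumees = {b}                               # step 1
    examined = set()
    while subsumees - examined:                   # step 2
        d = next(iter(subsumees - examined))      # step 2.1
        examined.add(d)
        for (c, rhs) in concept_incl:             # step 2.2: C ⊑ D ∈ T
            if rhs == d:
                subsumees.add(c)
        if isinstance(d, tuple):                  # d has the form ∃S'
            for (r, s) in role_incl:
                if d[1] == s:                     # step 2.3: D = ∃S
                    subsumees.add(("exists", r))
                if d[1] == inv(s):                # step 2.4: D = ∃inv(S)
                    subsumees.add(("exists", inv(r)))
    return subsumees                              # step 3

# TBox of the running example.
concept_incl = [
    ("ItalDish", "Dish"),
    ("VegDish", "Dish"),
    ("Dish", ("exists", "hasIngred")),
    (("exists", "hasCourse-"), "Dish"),
]
role_incl = [("hasMain", "hasCourse"), ("hasDessert", "hasCourse")]

dish_subsumees = compute_subsumees(concept_incl, role_incl, "Dish")
```

Adding to a Python set is idempotent, so the explicit "if not already in Subsumees" checks of the pseudocode are not needed. The order in which elements are selected may differ between runs, but the returned set does not depend on it.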
computing subsumees: an example
ComputeSubsumees on (T , Dish), where T :
ItalDish ⊑ Dish
VegDish ⊑ Dish
Dish ⊑ ∃hasIngred
∃hasCourse⁻ ⊑ Dish
hasMain ⊑ hasCourse
hasDessert ⊑ hasCourse
Initially: Examined = ∅
Subsumees = {Dish}
Choose: Dish
Examined = {Dish}
Subsumees = {Dish, ItalDish, VegDish, ∃hasCourse⁻}
Choose: ItalDish
Examined = {Dish, ItalDish}
Subsumees = {Dish, ItalDish, VegDish, ∃hasCourse⁻}
Choose: VegDish
Examined = {Dish, ItalDish, VegDish}
Subsumees = {Dish, ItalDish, VegDish, ∃hasCourse⁻}
Choose: ∃hasCourse⁻
Examined = {Dish, ItalDish, VegDish, ∃hasCourse⁻}
Subsumees = {Dish, ItalDish, VegDish, ∃hasCourse⁻, ∃hasMain⁻, ∃hasDessert⁻}
Choose: ∃hasMain⁻
Examined = {Dish, ItalDish, VegDish, ∃hasCourse⁻, ∃hasMain⁻}
Subsumees = {Dish, ItalDish, VegDish, ∃hasCourse⁻, ∃hasMain⁻, ∃hasDessert⁻}
Choose: ∃hasDessert⁻
Examined = {Dish, ItalDish, VegDish, ∃hasCourse⁻, ∃hasMain⁻, ∃hasDessert⁻}
Subsumees = {Dish, ItalDish, VegDish, ∃hasCourse⁻, ∃hasMain⁻, ∃hasDessert⁻}
computing subroles
Algorithm ComputeSubroles
Input: DL-LiteR TBox T , role R ∈ N_R^±
1. Initialize Subroles = {R} and Examined = ∅.
2. While Subroles \ Examined ≠ ∅
2.1 Select S ∈ Subroles \ Examined and add S to Examined.
2.2 For every role inclusion U ⊑ S or inv(U) ⊑ inv(S) in T
∙ If U ∉ Subroles, add U to Subroles
3. Return Subroles.
Example TBox T :
ItalDish ⊑ Dish
VegDish ⊑ Dish
Dish ⊑ ∃hasIngred
∃hasCourse⁻ ⊑ Dish
hasMain ⊑ hasCourse
hasDessert ⊑ hasCourse
Run on hasCourse: Subroles = {hasCourse, hasMain, hasDessert}
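ComputeSubroles follows the same worklist pattern. A minimal sketch, assuming roles are strings with a trailing `-` marking an inverse and the role inclusions of the TBox are given as a list of pairs (this encoding is hypothetical):

```python
def inv(role):
    """Toggle the inverse marker: inv('r') = 'r-', inv('r-') = 'r'."""
    return role[:-1] if role.endswith("-") else role + "-"

def compute_subroles(role_incl, r):
    subroles = {r}                                # step 1
    examined = set()
    while subroles - examined:                    # step 2
        s = next(iter(subroles - examined))       # step 2.1
        examined.add(s)
        for (u, v) in role_incl:                  # step 2.2
            if v == s:                            # U ⊑ S is in T
                subroles.add(u)
            if inv(v) == s:                       # inv(U) ⊑ inv(S) is in T
                subroles.add(inv(u))
    return subroles                               # step 3

role_incl = [("hasMain", "hasCourse"), ("hasDessert", "hasCourse")]
course_subroles = compute_subroles(role_incl, "hasCourse")
inverse_course_subroles = compute_subroles(role_incl, "hasCourse-")
```

The second call shows why the inv case is needed: hasMain ⊑ hasCourse also makes hasMain⁻ a subrole of hasCourse⁻.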
from concepts and roles to queries
Let SC = ComputeSubsumees(A, T ), SR = ComputeSubroles(r, T ).
Rewriting of A(x) w.r.t. T (and consistent ABoxes):
RewriteIQ(A, T ) = ⋁_{C ∈ SC ∩ NC} C(x) ∨ ⋁_{∃r ∈ SC} ∃y.r(x, y) ∨ ⋁_{∃r⁻ ∈ SC} ∃y.r(y, x)
Rewriting of r(x, y) w.r.t. T (and consistent ABoxes):
RewriteIQ(r, T ) = ⋁_{s ∈ SR} s(x, y) ∨ ⋁_{s⁻ ∈ SR} s(y, x)
The rewriting is ABox-independent and polysize in |T | and |q|.
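As an illustration of the two formulas, the following sketch turns a set of subsumees or subroles into the text of the corresponding disjunction. The input encoding (strings for names, `("exists", R)` for ∃R, trailing `-` for inverses) and the function names are assumptions; the output is just a formula string, not an executable query.

```python
def rewrite_concept_iq(subsumees):
    """Build the disjunction RewriteIQ(A, T) from the subsumees of A."""
    disjuncts = []
    for c in sorted(subsumees, key=str):          # fixed order for readability
        if isinstance(c, str):                    # C ∈ SC ∩ NC
            disjuncts.append(f"{c}(x)")
        elif c[1].endswith("-"):                  # ∃r⁻ ∈ SC
            disjuncts.append(f"∃y.{c[1][:-1]}(y, x)")
        else:                                     # ∃r ∈ SC
            disjuncts.append(f"∃y.{c[1]}(x, y)")
    return " ∨ ".join(disjuncts)

def rewrite_role_iq(subroles):
    """Build the disjunction RewriteIQ(r, T) from the subroles of r."""
    return " ∨ ".join(
        f"{s[:-1]}(y, x)" if s.endswith("-") else f"{s}(x, y)"
        for s in sorted(subroles)
    )

dish_rewriting = rewrite_concept_iq({
    "Dish", "ItalDish", "VegDish",
    ("exists", "hasCourse-"), ("exists", "hasMain-"), ("exists", "hasDessert-"),
})
course_rewriting = rewrite_role_iq({"hasCourse", "hasMain", "hasDessert"})
```

Each element of the input set contributes exactly one disjunct, which is why the rewriting stays polynomial in the size of the TBox.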
example of query rewriting (1/2)
We have already computed:
ComputeSubsumees(Dish, T ) = {Dish, ItalDish, VegDish, ∃hasCourse⁻, ∃hasMain⁻, ∃hasDessert⁻}
Get following rewriting of Dish(x) w.r.t. T :
RewriteIQ(Dish, T ) = Dish(x) ∨ ItalDish(x) ∨ VegDish(x)
∨ ∃y.hasCourse(y, x) ∨ ∃y.hasMain(y, x)
∨ ∃y.hasDessert(y, x)
example of query rewriting (2/2)
TBox T :
ItalDish ⊑ Dish
VegDish ⊑ Dish
Dish ⊑ ∃hasIngred
∃hasCourse⁻ ⊑ Dish
hasMain ⊑ hasCourse
hasDessert ⊑ hasCourse
ABox A:
hasMain(m, d1)
hasDessert(m, d2)
VegDish(d3)
RewriteIQ(Dish, T ) = Dish(x) ∨ ItalDish(x) ∨ VegDish(x) ∨ ∃y.hasCourse(y, x)
∨ ∃y.hasMain(y, x) ∨ ∃y.hasDessert(y, x)
Certain answers:
∙ d1, because of the disjunct ∃y.hasMain(y, x)
∙ d2, because of the disjunct ∃y.hasDessert(y, x)
∙ d3, because of the disjunct VegDish(x)
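Evaluating the rewriting amounts to treating the ABox as a plain database and taking the union of the answers to each disjunct. A minimal sketch, assuming the subsumee set computed earlier is given in a hypothetical encoding (`("exists", R)` for ∃R, trailing `-` for inverse roles):

```python
def evaluate_rewriting(subsumees, concept_facts, role_facts):
    """Answers to the disjunction built from `subsumees`, over the ABox as a DB."""
    answers = set()
    for c in subsumees:
        if isinstance(c, str):                    # disjunct C(x)
            answers |= {a for (name, a) in concept_facts if name == c}
        elif c[1].endswith("-"):                  # disjunct ∃y.r(y, x)
            answers |= {b for (r, a, b) in role_facts if r == c[1][:-1]}
        else:                                     # disjunct ∃y.r(x, y)
            answers |= {a for (r, a, b) in role_facts if r == c[1]}
    return answers

dish_subsumees = {
    "Dish", "ItalDish", "VegDish",
    ("exists", "hasCourse-"), ("exists", "hasMain-"), ("exists", "hasDessert-"),
}
concept_facts = {("VegDish", "d3")}
role_facts = {("hasMain", "m", "d1"), ("hasDessert", "m", "d2")}

dish_answers = evaluate_rewriting(dish_subsumees, concept_facts, role_facts)
```

No reasoning happens at evaluation time: all TBox knowledge has already been compiled into the set of disjuncts, so this step could equally be delegated to a relational database.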
checking unsatisfiability
We have an FO-rewriting of q w.r.t. T relative to consistent ABoxes
We need a rewriting of unsatisfiability to obtain a rewriting of q
∙ only negative inclusions relevant
∙ one subquery for each such inclusion G ⊑ ¬H
∙ consider all possible ways of violating G ⊑ ¬H: combinations of a
subsumee (subrole) of G and a subsumee (subrole) of H
Details in the lecture notes.
complexity of instance checking in dl-lite
In data complexity
∙ rewriting takes constant time, yields FO query
∙ upper bound from FO query evaluation: AC0
In combined complexity:
∙ P membership: rewriting and evaluation both in polynomial time
∙ NLogSpace upper bound: ‘guess’ relevant part of rewriting
Theorem In DL-LiteR, satisfiability and instance checking are
1. in AC0 for data complexity
2. NLogSpace-complete for combined complexity.
Note: Same bounds hold for several other DL-Lite dialects
instance checking in el
Next consider instance checking in EL.
Assume EL TBoxes given in normal form: axioms of the forms
A1 ⊓ . . . ⊓ An ⊑ B A ⊑ ∃r.B ∃r.A ⊑ B
Cannot use FO query rewriting approach for EL:
no FO-rewriting of A(x) w.r.t. T = {∃r.A ⊑ A}
We present a saturation-based approach.
saturation rules for el
TBox rules
A ⊑ Bi (1 ≤ i ≤ n) B1 ⊓ . . . ⊓ Bn ⊑ D
A ⊑ D
T1
A ⊑ B B ⊑ ∃r.D
A ⊑ ∃r.D
T2
A ⊑ ∃r.B B ⊑ D ∃r.D ⊑ E
A ⊑ E
T3
ABox rules
A1 ⊓ . . . ⊓ An ⊑ B Ai(a) (1 ≤ i ≤ n)
B(a)
A1
∃r.B ⊑ A r(a, b) B(b)
A(a)
A2
Algorithm: apply rules exhaustively, check if A(a) (r(a, b)) is present
example: saturation in el
(10) ArrabSauce ⊑ Spicy                   T3 : (5), (6), (7)
(11) PenneArrab ⊑ Spicy                   T3 : (1), (10), (7)
(12) PenneArrab ⊑ Dish                    T1 : (2), (3)
(13) PenneArrab ⊑ ∃hasIngred.Pasta        T2 : (2), (4)
(14) PenneArrab ⊑ SpicyDish               T1 : (11), (12), (8)
(15) Spicy(p)                             A1 : (11), (9)
(16) Dish(p)                              A1 : (12), (9)
(17) SpicyDish(p)                         A1 : (16), (15)
complexity of instance checking in el
Saturation approach is sound: everything derived is entailed
Also complete for instance checking:
Theorem Let K be an EL knowledge base, and let K′ be the result of
saturating K. For every ABox assertion α, we have:
K |= α iff α ∈ K′
Note: does not make all consequences explicit
∙ can have infinitely many implied axioms ⇝ would not terminate!
∙ so: only complete for some reasoning tasks
Runs in polynomial time in |K|. This is optimal:
Theorem Instance checking in EL is P-complete for both data and
combined complexity.
extending the saturation approach
Saturation approach can be extended to ELHI⊥
Additional rules required
Key difference: new conjunctions of concepts can occur
A ⊑ ∃R.D,  ∃R⁻.B ⊑ E  ⟹  A ⊓ B ⊑ ∃R.(D ⊓ E)
extended set of saturation rules
TBox rules
T1: A ⊑ Bi (1 ≤ i ≤ n),  B1 ⊓ . . . ⊓ Bn ⊑ D  ⟹  A ⊑ D
T4: R ⊑ S,  S ⊑ T  ⟹  R ⊑ T
T5: M ⊑ ∃R.(N ⊓ ⊥)  ⟹  M ⊑ ⊥
T6: M ⊑ ∃R.(N ⊓ N′),  N ⊑ A  ⟹  M ⊑ ∃R.(N ⊓ N′ ⊓ A)
T7: M ⊑ ∃R.(N ⊓ A),  ∃S.A ⊑ B,  R ⊑ S  ⟹  M ⊑ B
T8: M ⊑ ∃R.N,  ∃inv(S).A ⊑ B,  R ⊑ S  ⟹  M ⊓ A ⊑ ∃R.(N ⊓ B)
ABox rules
A1: A1 ⊓ . . . ⊓ An ⊑ B,  Ai(a) (1 ≤ i ≤ n)  ⟹  B(a)
A2: ∃r.B ⊑ A,  r(a, b),  B(b)  ⟹  A(a)
A3: ∃r⁻.B ⊑ A,  r(b, a),  B(b)  ⟹  A(a)
A4: r ⊑ s,  r(a, b)  ⟹  s(a, b)
A5: r ⊑ s⁻,  r(a, b)  ⟹  s(b, a)
extending the saturation approach (continued)
C ⊑ ∃R.D    D ⊑ ∀R⁻.B
C ⊑ ∃R.(N ⊓ N′ ⊓ A)
New set of rules ⇝ exponentially many different new axioms
Theorem Instance checking in ELHI⊥ is P-complete for data and
Exp-complete for combined complexity.
saturation as a datalog program
Let sat(T ) be result of applying TBox saturation rules to T .
For each ELHI⊥ TBox T and ABox signature Σ define following
Datalog program Π(T , Σ):
Π(T , Σ) = {B(x) ← A1(x), . . . , An(x) | A1 ⊓ . . . ⊓ An ⊑ B ∈ sat(T )} ∪
           {B(x) ← A(y), r(x, y) | ∃r.A ⊑ B ∈ T } ∪
           {B(y) ← A(x), r(x, y) | ∃r⁻.A ⊑ B ∈ T } ∪
           {s(x, y) ← r(x, y) | r ⊑ s ∈ sat(T ), s ∈ NR} ∪
           {s(y, x) ← r(x, y) | r ⊑ s⁻ ∈ sat(T ), s ∈ NR} ∪
           {⊤(x) ← A(x) | A ∈ NC ∩ Σ} ∪
           {⊤(x) ← r(x, y) | r ∈ NR ∩ Σ} ∪
           {⊤(x) ← r(y, x) | r ∈ NR ∩ Σ}
saturation as a datalog program (continued)
Theorem For every finite signature Σ and ELHI⊥ KB K = (T , A)
with sig(A) ⊆ Σ:
1. K is unsatisfiable iff ans((Π(T , Σ), ⊥), IA) ̸= ∅;
2. If K is satisfiable, then for all A ∈ NC, r ∈ NR, and a, b ∈ Ind(A):
∙ K |= A(a) iff a ∈ ans((Π(T , Σ), A), IA);
∙ K |= r(a, b) iff (a, b) ∈ ans((Π(T , Σ), r), IA).
This means:
∙ get Datalog rewriting of instance queries in ELHI⊥
∙ can use Datalog program to create saturated ABox
saturation as a datalog program: an example
The Datalog program associated with our example:
PastaDish(x) ← PenneArrab(x)                   PenneArrab ⊑ PastaDish
Dish(x) ← PastaDish(x)                         PastaDish ⊑ Dish
Spicy(x) ← Peperonc(x)                         Peperonc ⊑ Spicy
Spicy(x) ← hasIngred(x, y), Spicy(y)           ∃hasIngred.Spicy ⊑ Spicy
SpicyDish(x) ← Spicy(x), Dish(x)               Dish ⊓ Spicy ⊑ SpicyDish
Spicy(x) ← ArrabSauce(x)                       ArrabSauce ⊑ Spicy
Spicy(x) ← PenneArrab(x)                       PenneArrab ⊑ Spicy
Dish(x) ← PenneArrab(x)                        PenneArrab ⊑ Dish
SpicyDish(x) ← PenneArrab(x)                   PenneArrab ⊑ SpicyDish
(technically, also have T -independent rules for ⊤...)
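To see how such a program produces the saturated ABox, here is a small sketch of naive bottom-up Datalog evaluation in Python. The evaluator (`match_body`, `evaluate`), the rule encoding, and the extra ABox facts about `d` and `c` are assumptions made for illustration; a real system would use a Datalog engine with semi-naive evaluation.

```python
def match_body(body, facts, binding=None):
    """Yield every variable binding that maps all body atoms to facts."""
    binding = binding or {}
    if not body:
        yield binding
        return
    (pred, args), rest = body[0], body[1:]
    for fact_pred, fact_args in facts:
        if fact_pred != pred or len(fact_args) != len(args):
            continue
        extended = dict(binding)
        # setdefault binds a fresh variable or returns its existing value
        if all(extended.setdefault(v, c) == c for v, c in zip(args, fact_args)):
            yield from match_body(rest, facts, extended)


def evaluate(rules, facts):
    """Naive evaluation: fire all rules until no new fact is derived."""
    facts = set(facts)
    while True:
        derived = {(head[0], tuple(b[v] for v in head[1]))
                   for head, body in rules
                   for b in match_body(body, facts)}
        if derived <= facts:
            return facts
        facts |= derived


def unary(head, *body):
    """Helper for rules of the form Head(x) ← B1(x), ..., Bn(x)."""
    return ((head, ("x",)), [(b, ("x",)) for b in body])


rules = [
    unary("PastaDish", "PenneArrab"),
    unary("Dish", "PastaDish"),
    unary("Spicy", "Peperonc"),
    (("Spicy", ("x",)), [("hasIngred", ("x", "y")), ("Spicy", ("y",))]),
    unary("SpicyDish", "Spicy", "Dish"),
    unary("Spicy", "ArrabSauce"),
    unary("Spicy", "PenneArrab"),
    unary("Dish", "PenneArrab"),
    unary("SpicyDish", "PenneArrab"),
]

# PenneArrab(p) plus hypothetical facts that exercise the recursive rule.
abox = {("PenneArrab", ("p",)), ("Dish", ("d",)),
        ("hasIngred", ("d", "c")), ("Peperonc", ("c",))}
saturated = evaluate(rules, abox)
print(sorted(f for f in saturated if f[0] == "SpicyDish"))
```

The recursive rule for ∃hasIngred.Spicy ⊑ Spicy is what makes this a genuine Datalog program rather than a first-order rewriting.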
conjunctive queries
(unions of) conjunctive queries
IQs quite restricted: No selections and joins as in DB queries
Most work on OMQA adopts (unions of) conjunctive queries (CQs)
A conjunctive query (CQ) is a first-order query q(⃗x) of the form
∃⃗y.P1(⃗t1) ∧ · · · ∧ Pn(⃗tn)
where every variable in some ⃗ti appears in either ⃗x or ⃗y
A union of CQs (UCQ) is a first-order query q(⃗x) of the form
q1(⃗x) ∨ · · · ∨ qn(⃗x)
where the qi(⃗x) are CQs with same tuple ⃗x of free vars
cqs and ucqs in datalog
Alternatively, CQs and UCQs can be seen as Datalog rules
CQs:
q(⃗x) = ∃⃗y.P1(⃗t1) ∧ · · · ∧ Pn(⃗tn) ⇝ q(⃗x) ← P1(⃗t1), . . . , Pn(⃗tn)
UCQs (superscripts index the disjunct, subscripts the atom):
q(⃗x) = ∃⃗y1.P^1_1(⃗t^1_1) ∧ · · · ∧ P^1_n1(⃗t^1_n1)      ⇝   q(⃗x) ← P^1_1(⃗t^1_1), . . . , P^1_n1(⃗t^1_n1)
      ∨ ∃⃗y2.P^2_1(⃗t^2_1) ∧ · · · ∧ P^2_n2(⃗t^2_n2)    ⇝   q(⃗x) ← P^2_1(⃗t^2_1), . . . , P^2_n2(⃗t^2_n2)
      ...                                                    ...
      ∨ ∃⃗yℓ.P^ℓ_1(⃗t^ℓ_1) ∧ · · · ∧ P^ℓ_nℓ(⃗t^ℓ_nℓ)    ⇝   q(⃗x) ← P^ℓ_1(⃗t^ℓ_1), . . . , P^ℓ_nℓ(⃗t^ℓ_nℓ)
what can we express as (u)cqs?
q1(y, x) = ∃z. serves(x, y) ∧ hasIngred(y, z) ∧ Spicy(z)
q2(y, x) = q1(y, x) ∨
           (∃z, z′. serves(x, y) ∧ hasIngred(y, z) ∧ hasIngred(z, z′) ∧ Spicy(z′))
Capture select-project-join queries of relational algebra
Capture basic graph patterns of SPARQL
query matches
A match for q(⃗x) = ∃⃗y.φ(⃗x, ⃗y) in an interpretation I is a mapping π
from the variables in ⃗x ∪ ⃗y to objects in ∆^I such that:
∙ π(⃗t) ∈ P^I for every atom P(⃗t) ∈ q
We write I |=π q(⃗a) if π is a match for q(⃗x) in I and π(⃗x) = ⃗a
By definition: ⃗a ∈ ans(q, I) iff there exists a match π such that I |=π q(⃗a)
Answering CQs amounts to searching for matches
Recall that ⃗a ∈ cert(q, K) iff ⃗a ∈ ans(q, I) for every model I of K
Challenge: how do we check that there is a match in every model?
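For a single finite interpretation, the definition of a match can be executed directly. The sketch below enumerates all mappings π by brute force. The function name `answers`, the dictionary encoding of I, and the small interpretation itself are hypothetical, and the sketch does not address the challenge of quantifying over all models.

```python
from itertools import product


def answers(answer_vars, atoms, interp):
    """ans(q, I) for a CQ given as a list of atoms (pred, args) over variables.

    interp maps each predicate name to its extension, a set of tuples.
    """
    domain = sorted({d for ext in interp.values() for tup in ext for d in tup})
    variables = sorted({v for _, args in atoms for v in args})
    result = set()
    for image in product(domain, repeat=len(variables)):
        pi = dict(zip(variables, image))
        # π is a match iff π(t) ∈ P^I for every atom P(t) of the query
        if all(tuple(pi[v] for v in args) in interp.get(pred, set())
               for pred, args in atoms):
            result.add(tuple(pi[v] for v in answer_vars))
    return result


# A small hypothetical interpretation in the restaurant vocabulary.
interp = {
    "serves": {("r", "b"), ("r", "p")},
    "hasIngred": {("p", "e1"), ("b", "e2"), ("b", "e3"), ("e3", "e4")},
    "Spicy": {("e1",), ("e4",)},
}

# q1(y, x) = ∃z. serves(x, y) ∧ hasIngred(y, z) ∧ Spicy(z)
q1 = [("serves", ("x", "y")), ("hasIngred", ("y", "z")), ("Spicy", ("z",))]
# second disjunct of q2, with the hypothetical variable name w for z′
q_deep = [("serves", ("x", "y")), ("hasIngred", ("y", "z")),
          ("hasIngred", ("z", "w")), ("Spicy", ("w",))]

print(answers(("y", "x"), q1, interp))
print(answers(("y", "x"), q1, interp) | answers(("y", "x"), q_deep, interp))
```

The answers to a UCQ are simply the union of the answers to its disjuncts, as in the last line.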
the universal model property
For Horn DLs, each satisfiable K has a universal model I_K
I_K is ‘contained’ in every model I of K
⇝ formally, there is a homomorphism from I_K to I
An answer to a (U)CQ q in I_K is an answer to q in every model of K
⇝ matches of (U)CQs are preserved under homomorphisms
⃗a ∈ cert(q, K) iff ⃗a ∈ ans(q, I_K)
So: I_K gives us the certain answers to q over K
constructing a universal model
Use the saturation of (T , A) for building a universal model I_{T,A}
Intuition:
- I_{T,A} contains the saturated ABox A′
- if an object satisfies M and M ⊑ ∃R.M′ ∈ sat(T ), a fresh object witnessing this is created
Formally, ∆^{I_{T,A}} contains words aR1M1 . . . RnMn with a ∈ Ind(A) and:
∙ Ri are roles and Mi are conjunctions of concepts
∙ there exists M ⊑ ∃R1.M1 ∈ sat(T ) such that T , A |= M(a)
∙ Mi ⊑ ∃Ri+1.Mi+1 ∈ sat(T ) for every 1 ≤ i < n
(note: use only strongest axioms M ⊑ ∃R.M′ in sat(T ))
The interpretation function is defined as follows:
∙ a^{I_{T,A}} = a
∙ a ∈ A^{I_{T,A}} iff A(a) ∈ sat(T , A), and eRM ∈ A^{I_{T,A}} iff M ⊑ A ∈ sat(T )
∙ (a, b) ∈ r^{I_{T,A}} iff r(a, b) ∈ sat(T , A), (e, eRM) ∈ r^{I_{T,A}} iff R ⊑ r ∈ sat(T ),
  and (eRM, e) ∈ r^{I_{T,A}} if R ⊑ r⁻ ∈ sat(T )
example of the canonical model construction (1/3)
TBox: PenneArrab ⊑ ∃hasIngred.Penne
Penne ⊑ Pasta
PenneArrab ⊑ ∃hasIngred.ArrabSauce
ArrabSauce ⊑ ∃hasIngred.Peperonc
Peperonc ⊑ Spicy
PizzaCalab ⊑ ∃hasIngred.Nduja
Nduja ⊑ Spicy
ABox: serves(r, b) serves(r, p) PenneArrab(b) PizzaCalab(p)
The saturated TBox additionally contains:
PenneArrab ⊑ ∃hasIngred.(Penne ⊓ Pasta)
ArrabSauce ⊑ ∃hasIngred.(Peperonc ⊓ Spicy)
PizzaCalab ⊑ ∃hasIngred.(Nduja ⊓ Spicy)
example of the canonical model construction (2/3)
I_{T,A} contains the ABox and is closed under inclusions
serves(r, b) serves(r, p) PenneArrab(b) PizzaCalab(p)
[Figure: ABox part of the model: r has serves edges to p (PizzaCalab) and b (PenneArrab)]
example of the canonical model construction (3/3)
The anonymous objects witnessing existential concepts form trees
[Figure: canonical model. r has serves edges to p (PizzaCalab) and b (PenneArrab); p has a hasIngred edge to e1 (Nduja, Spicy); b has hasIngred edges to e2 (Penne, Pasta) and e3 (ArrabSauce); e3 has a hasIngred edge to e4 (Peperonc, Spicy)]
PenneArrab ⊑ ∃hasIngred.ArrabSauce
PenneArrab ⊑ ∃hasIngred.(Penne ⊓ Pasta)
ArrabSauce ⊑ ∃hasIngred.(Peperonc ⊓ Spicy)
PizzaCalab ⊑ ∃hasIngred.(Nduja ⊓ Spicy)
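The tree-shaped anonymous part can be generated mechanically from the existential axioms. The following Python sketch is a simplification under stated assumptions: it handles only role names (no inverses or role inclusions), expects the caller to pass the strongest existential axioms with all implied concept names already on the right-hand side, applies an axiom whenever its left-hand side is contained in an element's types, and cuts the construction off at `max_depth` because the model may be infinite. The function name and encodings are hypothetical.

```python
def canonical_model(abox_types, abox_roles, exist_axioms, max_depth):
    """Build the canonical model up to a given depth of anonymous objects.

    abox_types:   {individual: set of concept names} from the saturated ABox
    abox_roles:   {(role, a, b)} from the saturated ABox
    exist_axioms: [(lhs, role, rhs)] for lhs ⊑ ∃role.rhs, with lhs and rhs
                  given as frozensets of concept names
    """
    types = {a: set(cs) for a, cs in abox_types.items()}
    roles = set(abox_roles)
    frontier = list(types)
    for _ in range(max_depth):
        next_frontier = []
        for e in frontier:
            for lhs, role, rhs in exist_axioms:
                if lhs <= types[e]:
                    # fresh witness, named by the word e·R·M
                    child = f"{e}·{role}·{'⊓'.join(sorted(rhs))}"
                    types[child] = set(rhs)
                    roles.add((role, e, child))
                    next_frontier.append(child)
        frontier = next_frontier
    return types, roles


fs = frozenset
axioms = [
    (fs({"PenneArrab"}), "hasIngred", fs({"Penne", "Pasta"})),
    (fs({"PenneArrab"}), "hasIngred", fs({"ArrabSauce"})),
    (fs({"ArrabSauce"}), "hasIngred", fs({"Peperonc", "Spicy"})),
    (fs({"PizzaCalab"}), "hasIngred", fs({"Nduja", "Spicy"})),
]
types, roles = canonical_model(
    {"r": set(), "b": {"PenneArrab"}, "p": {"PizzaCalab"}},
    {("serves", "r", "b"), ("serves", "r", "p")},
    axioms,
    max_depth=3,
)
for element in sorted(types):
    print(element, sorted(types[element]))
```

On this input the four anonymous elements correspond to e1 to e4 of the example; here the construction stops by itself after depth 2, but with cyclic axioms only the depth bound guarantees termination.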
finding answers in the canonical model
To answer CQ q, it suffices to test whether it has a match in I_{T,A}
But this is still challenging!
- I_{T,A} contains assertions and objects not present in A
- we cannot build I_{T,A} explicitly: can be infinite!
Our approach: use query rewriting!
Formally: given a CQ q, we construct a UCQ rewT (q) such that
⃗a ∈ ans(q, I_{T,A}) iff there is a match π for a disjunct q′ of rewT (q) such that
I_{T,A} |=π q′(⃗a) and π sends all vars to individuals from A
rewriting the query
Rewriting step (idea):
1. Choose leaf variable x so that no vars are mapped below it in I_{T,A}
2. Find axiom M ⊑ ∃R.N in sat(T ) that ensures atoms containing x are satisfied
3. Drop from q all atoms with x and add M to parent of x to get q′
Properties:
∙ Every match for q′ can be extended to a match for q
∙ Every match for q contains a match for q′
∙ The answers of q and q′ coincide, but the relevant matches for q′ are closer to the ABox than those of q
We repeatedly apply this rewriting step to obtain a set of queries
whose relevant matches range over ABox individuals.
example of a rewriting step (1/2)
q(y, x) = ∃z, z′. serves(x, y) ∧ hasIngred(y, z) ∧ hasIngred(z, z′) ∧ Spicy(z′)
We have I_K |=π q(b, r) with π(x) = r, π(y) = b, π(z) = e3, π(z′) = e4
[Figure: canonical model with the match π marked: π(x) = r, π(y) = b, π(z) = e3 (ArrabSauce), π(z′) = e4 (Peperonc, Spicy)]
• Choose z′ as ‘leaf’
• Choose ArrabSauce ⊑ ∃hasIngred.Spicy ∈ sat(T )
• RHS ensures hasIngred(z, z′), Spicy(z′)
• We replace these atoms by ArrabSauce(z)
q′(y, x) = ∃z. serves(x, y) ∧ hasIngred(y, z) ∧ ArrabSauce(z)
example of a rewriting step (2/2)
q(y, x) = ∃z, z′. serves(x, y) ∧ hasIngred(y, z) ∧ hasIngred(z, z′) ∧ Spicy(z′)
q′(y, x) = ∃z. serves(x, y) ∧ hasIngred(y, z) ∧ ArrabSauce(z)
I_K |=π q(b, r)    I_K |=π′ q′(b, r)
[Figure: canonical model with π and π′ marked: both map x to r, y to b, and z to e3; only π uses e4, as π(z′)]
depth(π) > depth(π′)
another rewriting step (1/2)
q′(y, x) = ∃z. serves(x, y) ∧ hasIngred(y, z) ∧ ArrabSauce(z)
I_K |=π′ q′(b, r)
[Figure: canonical model with the match π′ marked: π′(x) = r, π′(y) = b, π′(z) = e3 (ArrabSauce)]
• Choose z as leaf
• Choose PenneArrab ⊑ ∃hasIngred.ArrabSauce
• RHS yields hasIngred(y, z) and ArrabSauce(z)
• We replace these atoms by PenneArrab(y)
q′′(y, x) = serves(x, y) ∧ PenneArrab(y)
another rewriting step (2/2)
q(y, x) = ∃z, z′. serves(x, y) ∧ hasIngred(y, z) ∧ hasIngred(z, z′) ∧ Spicy(z′)
q′(y, x) = ∃z. serves(x, y) ∧ hasIngred(y, z) ∧ ArrabSauce(z)
q′′(y, x) = serves(x, y) ∧ PenneArrab(y)
I_K |=π q(b, r)    I_K |=π′ q′(b, r)    I_K |=π′′ q′′(b, r)
[Figure: canonical model with π, π′, π′′ marked: all three map x to r and y to b; π and π′ map z to e3; only π uses e4, as π(z′)]
depth(π) > depth(π′) > depth(π′′)
In π′′ all variables are mapped to individuals
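The two steps of the example can be replayed with a short Python sketch of the rewriting step. It is a deliberately simplified version: it only eliminates a variable that is reached by role atoms from a single parent, it ignores inverse roles, role inclusions and the merging of variables, and it leaves the choice of existential (non-answer) variables to the caller. The names `rewrite_step` and `rewrite`, and the variable name `z2` for z′, are hypothetical.

```python
def rewrite_step(atoms, leaf, axioms):
    """Eliminate the variable `leaf` from a CQ, if some axiom allows it.

    atoms:  frozenset of (pred, args); unary atoms are concepts, binary are roles
    axioms: [(lhs, role, rhs)] for lhs ⊑ ∃role.rhs in sat(T), as frozensets
    Returns the list of rewritten queries (possibly empty).
    """
    touching = {a for a in atoms if leaf in a[1]}
    role_atoms = {a for a in touching if len(a[1]) == 2}
    concepts = {a[0] for a in touching if len(a[1]) == 1}
    parents = {a[1][0] for a in role_atoms}
    # leaf needs exactly one parent and must not have outgoing edges
    if len(parents) != 1 or any(a[1][1] != leaf or a[1][0] == leaf for a in role_atoms):
        return []
    (parent,) = parents
    used_roles = {a[0] for a in role_atoms}
    results = []
    for lhs, role, rhs in axioms:
        # the axiom's right-hand side must ensure every atom containing leaf
        if used_roles == {role} and concepts <= rhs:
            results.append((atoms - touching) | {(c, (parent,)) for c in lhs})
    return results


def rewrite(atoms, existential_vars, axioms):
    """Close the query under rewriting steps; the result is read as a UCQ."""
    seen, todo = set(), [frozenset(atoms)]
    while todo:
        q = todo.pop()
        if q not in seen:
            seen.add(q)
            for v in existential_vars:
                todo.extend(frozenset(r) for r in rewrite_step(q, v, axioms))
    return seen


fs = frozenset
axioms = [
    (fs({"PenneArrab"}), "hasIngred", fs({"Penne", "Pasta"})),
    (fs({"PenneArrab"}), "hasIngred", fs({"ArrabSauce"})),
    (fs({"ArrabSauce"}), "hasIngred", fs({"Peperonc", "Spicy"})),
    (fs({"PizzaCalab"}), "hasIngred", fs({"Nduja", "Spicy"})),
]
q = fs({("serves", ("x", "y")), ("hasIngred", ("y", "z")),
        ("hasIngred", ("z", "z2")), ("Spicy", ("z2",))})
ucq = rewrite(q, ["z", "z2"], axioms)
q1 = fs({("serves", ("x", "y")), ("hasIngred", ("y", "z")), ("ArrabSauce", ("z",))})
q2 = fs({("serves", ("x", "y")), ("PenneArrab", ("y",))})
print(q1 in ucq, q2 in ucq, len(ucq))
```

Besides q, q′ and q′′, the closure contains one further disjunct obtained from PizzaCalab ⊑ ∃hasIngred.(Nduja ⊓ Spicy); it has no match over the example ABox but is still part of the rewriting.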
decision procedure
Theorem For every satisfiable ELHI⊥ KB K = (T , A) and CQ q(⃗x):
⃗a ∈ cert(q, K) iff I_K |=π q′(⃗a) for some q′ ∈ rewT (q) and some π that
maps all variables to individuals in A.
There is a bounded number of such restricted matches π
Checking if π is a match reduces to linearly many instance checks
Yields terminating, sound, and complete CQ answering procedure
complexity of cq answering
Combined complexity:
∙ sat(T ) and rewT (q) can be constructed in single exponential time
∙ single exponential bound on candidate matches π
∙ instance checking in single exponential time
Data complexity:
∙ sat(T ) and rewT (q) are ABox independent
∙ polynomial bound on candidate matches π
∙ instance checking in polynomial time
Theorem CQ answering in ELHI⊥ and Horn-SHIQ is Exp-complete
in combined complexity and P-complete in data complexity.
optimal bounds for lightweight dls
Adapting our technique gives optimal bounds for lightweight DLs:
For ELH and DL-LiteR we get NP in combined complexity:
∙ compute sat(T ) in polynomial time
∙ non-deterministically build the right q′ ∈ rewT (q)
∙ guess a candidate π
∙ check if it is a match ⇝ instance checking in polynomial time
CQ answering is NP-hard over ABox alone seen as DB (no TBox)
For EL in data complexity, yields P membership
⇝ optimal since instance queries already P-hard
In DL-LiteR, can get an FO rewriting ⇝ in AC0 for data complexity.
Theorem CQ answering in ELH and DL-LiteR is NP-complete in
combined complexity. For ELH the data complexity is P-complete,
and for DL-LiteR the data complexity is in AC0.
datalog rewriting
Our procedure yields a Datalog rewriting:
∙ rewT (q) is a UCQ ⇝ translate into set of Datalog rules Πrew(q)
∙ use Q in head of rules
∙ the program Π(T ) (from earlier) computes all entailed ABox
assertions
(Πrew(q) ∪ Π(T ), Q)
is a Datalog rewriting of q w.r.t. T relative to consistent ABoxes
combined approach for cqs in elhi
Alternative: combined approach (saturation and rewriting)
Know that it suffices to evaluate the UCQ rewT (q) over the set of
ABox assertions entailed from the KB K
Also know: assertions entailed from K = assertions in sat(K)
Materialize assertions in sat(K) and view result as database
+ only need to evaluate a UCQ
can use standard relational database systems
– materializing not always convenient
saturation needs to be updated if data changes
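As an illustration of the "view result as database" step, the sketch below loads a materialized ABox into an in-memory SQLite database and evaluates three disjuncts of a rewriting as one SQL UNION query. The table layout (one table per concept and role, with columns `i`, `s`, `o`) and the hand-translated SQL are assumptions; the slides do not prescribe a schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- one table per role (subject, object) and per concept (individual)
    CREATE TABLE serves(s TEXT, o TEXT);
    CREATE TABLE hasIngred(s TEXT, o TEXT);
    CREATE TABLE Spicy(i TEXT);
    CREATE TABLE ArrabSauce(i TEXT);
    CREATE TABLE PenneArrab(i TEXT);
    -- materialized assertions of the running example
    INSERT INTO serves VALUES ('r', 'b'), ('r', 'p');
    INSERT INTO PenneArrab VALUES ('b');
""")

# The disjuncts q, q′ and q′′ of the example, each translated to a join.
ucq_sql = """
    SELECT sv.o AS y, sv.s AS x
    FROM serves sv
    JOIN hasIngred h1 ON h1.s = sv.o
    JOIN hasIngred h2 ON h2.s = h1.o
    JOIN Spicy sp ON sp.i = h2.o
  UNION
    SELECT sv.o, sv.s
    FROM serves sv
    JOIN hasIngred h ON h.s = sv.o
    JOIN ArrabSauce a ON a.i = h.o
  UNION
    SELECT sv.o, sv.s
    FROM serves sv
    JOIN PenneArrab pa ON pa.i = sv.o
"""
rows = conn.execute(ucq_sql).fetchall()
print(rows)
```

Only the last disjunct finds a match over the stored individuals, which reflects why the rewriting is needed: the ingredients of b exist only in the anonymous part of the canonical model and never reach the database.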
an fo rewriting approach for cqs in dl-lite
For DL-LiteR we can generate an FO-rewriting as follows.
In every q′ ∈ rewT (q), replace each atom by its FO-rewriting for
instance checking:
∙ replace each A(t) by RewriteIQ(A, T )
∙ replace each r(t, t′) by RewriteIQ(r, T )
Resulting FO formula:
∙ positive, can be transformed into a UCQ
∙ rewriting relative to consistent ABoxes
∙ can be combined with a rewriting of unsatisfiability
∙ yields AC0 upper bound in data complexity
a glimpse beyond
Other Horn DLs
∙ Similar results hold for other dialects of DL-Lite and EL
∙ Sometimes complexity increases, e.g., EL with complex role
inclusions
∙ The P data and Exp combined upper bounds extend to even more
expressive Horn DLs, like Horn-SHOIQ
Beyond Horn DLs
∙ no universal model property
∙ query answering usually exponentially harder
∙ different techniques: automata, rolling-up, resolution, etc.
∙ for some well-known DLs (e.g., SHOIQ) decidability open
complexity of answering (u)cqs
                            IQs                            CQs
                            data          combined         data            combined
                            complexity    complexity       complexity      complexity
DL-Lite, DL-LiteR           in AC0        NLogSpace        in AC0          NP
EL, ELH                     P             P                P               NP
ELI, ELHI⊥, Horn-SHOIQ      P             Exp              P               Exp
ALC, ALCHQ                  coNP          Exp              coNP            Exp
ALCI, SH, SHIQ              coNP          Exp              coNP            2Exp
SHOIQ                       coNP-hard     coNExp           coNP-hard (1)   coN2Exp-hard (1)
(1) decidability open
navigational queries
limitations of (u)cqs
Some very natural queries are not expressible as CQs:
- find dishes that contain something spicy
- is a a relative of b?
- is there a bus connection from X to Y?
We need navigational queries: queries that can explore the topology
of the graph, that is, the information stored in the connections
Especially important for highly connected data with no fixed schema
∙ social, biological, and chemical networks
navigational queries
Prominent navigational query languages:
Regular Path Queries (RPQs): find pairs of objects that are connected
by a chain of roles that complies with a given regular language
(hasCourse ∪ courseOf⁻) · (hasIngred ∪ ingredOf⁻)∗ · Spicy?(x, y)
Conjunctive RPQs: allow RPQs to be joined conjunctively
∙ similar to CQs, but each atom is an RPQ
∙ extend CQs with the navigational power of RPQs
q(x, x′) = ∃y, z. serves · Menu? · (hasMain ∪ hasStarter)(x, y) ∧
                 serves · Menu? · (hasCourse ∪ courseOf⁻)(x′, y) ∧
                 (hasIngred ∪ ingredOf⁻)∗ · Spicy?(y, z)
Both languages have 1-way and 2-way variants
our most expressive navigational queries: c2rpqs
Recall: N±R contains all role names and their inverses.
A conjunctive two-way regular path query (C2RPQ) has the form
q(⃗x) = ∃⃗y. ⋀ L(t, t′) ∧ ⋀ A(t)
where A is a concept name,
t, t′ are variables or individuals (in NI ∪ ⃗x ∪ ⃗y), and
L is a regular language over N±R ∪ {A? | A ∈ NC}
Regular languages can be given as:
∙ regular expressions E → r ∈ N±R | A? | E · E | E ∪ E | E∗
∙ non-deterministic finite automata (NFAs)
Recall: RegExps and NFAs are equivalent, but NFAs are more succinct
other navigational query languages
Conjunctive (one-way) regular path queries (CRPQs) disallow inverses ⇝
regular expressions use only (direct) role names
q(x, x′) = ∃y, z. serves · Menu? · hasCourse(x, y) ∧
                 serves · Menu? · hasCourse(x′, y) ∧ hasIngred∗ · Spicy?(y, z)
q(x) = ∃y. hasIngred∗ · Spicy?(x, y)
Two-way regular path queries (2RPQs) have only one atom and no
existential variables ⇝ both variables are answer variables
q(x, y) = (hasIngred ∪ ingredOf⁻)∗ · Spicy?(x, y)
q(x, y) = (hasIngred ∪ ingredOf⁻)∗ · Spicy? · Σ∗(x, y)
(One-way) regular path queries (RPQs) are 2RPQs with no inverses
⇝ all of the restrictions above
q(x, y) = hasIngred∗ · Spicy?(x, y)
q(x, y) = hasCourse · hasIngred∗ · Spicy?(x, y)
semantics of c2rpqs
Satisfaction of atoms L(t, t′):
(d, d′) ∈ L^I if there is an L-path from d to d′, i.e.,
∙ a sequence e0e1 . . . en of objects from ∆^I with e0 = d and en = d′
∙ a word u1u2 . . . un ∈ L over N±R ∪ {A? | A ∈ NC}
such that, for every 1 ≤ i ≤ n:
∙ if ui = A?, then ei−1 = ei ∈ A^I
∙ if ui = R ∈ N±R, then (ei−1, ei) ∈ R^I
Match: mapping π from terms to elements that satisfies all atoms
As before: I |=π q(⃗a) if match π maps answer variables to ⃗a
Certain answers defined as for CQs
Again suffices to find match in canonical model
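Over a finite interpretation, the L-path semantics can be evaluated by searching the product of the interpretation and an NFA for L. The Python sketch below does this with a breadth-first search from every object. The label conventions (`"A?"` for tests, a trailing `-` for inverse roles), the function names, and the small interpretation are assumptions; the sketch says nothing about infinite canonical models.

```python
from collections import deque


def successors(obj, label, concepts, roles):
    """Objects reachable from obj by reading one NFA label."""
    if label.endswith("?"):      # test A?: stay in place if obj ∈ A^I
        return {obj} if obj in concepts.get(label[:-1], set()) else set()
    if label.endswith("-"):      # inverse role r⁻: follow an r-edge backwards
        return {s for s, o in roles.get(label[:-1], set()) if o == obj}
    return {o for s, o in roles.get(label, set()) if s == obj}


def eval_2rpq(domain, delta, start, finals, concepts, roles):
    """All pairs (d, d′) connected by a path whose word the NFA accepts.

    delta is a set of transitions (state, label, state).
    """
    result = set()
    for d in domain:
        seen, queue = {(d, start)}, deque([(d, start)])
        while queue:
            obj, state = queue.popleft()
            if state in finals:
                result.add((d, obj))
            for s, label, t in delta:
                if s == state:
                    for nxt in successors(obj, label, concepts, roles):
                        if (nxt, t) not in seen:
                            seen.add((nxt, t))
                            queue.append((nxt, t))
    return result


# Hypothetical finite interpretation in the vocabulary of the examples.
domain = {"p", "b", "e1", "e2", "e3", "e4"}
roles = {"hasIngred": {("p", "e1"), ("b", "e2"), ("b", "e3"), ("e3", "e4")}}
concepts = {"Spicy": {"e1", "e4"}}

# NFA for hasIngred∗ · Spicy?
delta = {("s0", "hasIngred", "s0"), ("s0", "Spicy?", "sf")}
pairs = eval_2rpq(domain, delta, "s0", {"sf"}, concepts, roles)
print(sorted(pairs))

# NFA for the single inverse step hasIngred⁻
inverse_pairs = eval_2rpq(domain, {("q0", "hasIngred-", "q1")}, "q0", {"q1"},
                          concepts, roles)
print(sorted(inverse_pairs))
```

Each search visits every (object, state) pair at most once, so evaluation over a finite structure stays polynomial in the size of the structure and of the NFA.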
answering 2rpqs
We focus on answering 2RPQs: one atom, no existential variables
Bound on matches ranging over individuals only
Challenge: paths may need to go deep into the canonical model
q(x, y) = serves · (hasIngred ∪ ingredOf⁻)∗ · Spicy? · Σ∗(x, y)
[Figure: canonical model of the running example with π(x) = r and π(y) = b marked]
84/109
loops through the anonymous part
Goal: compact representation of all ways in which paths through the anonymous part can participate in matches
We use NFA representation
[Figure: NFA α with states s0, s1, sf: a serves transition from s0 to s1, loops labelled hasIngred and ingredOf− on s1, a Spicy? transition from s1 to sf, and a loop labelled Σ∗ on sf]
We write M ∈ Loopα[s, s′] iff a ∈ M^{I_K} implies the existence of a path p below a that takes the NFA α from s to s′, e.g.,
PenneArrab ∈ Loopα[s1, sf]
because of
PenneArrab ⊑ ∃hasIngred.ArrabSauce
ArrabSauce ⊑ ∃hasIngred.(Peperonc ⊓ Spicy)
85/109
computing the loop table
We can explicitly compute the full table Loopα inductively:
∙ if s is a state, then Loopα[s, s] = N_C
∙ if M1 ∈ Loopα[s1, s2] and M2 ∈ Loopα[s2, s3], then M1 ⊓ M2 ∈ Loopα[s1, s3]
∙ if T |= C1 ⊓ · · · ⊓ Cn ⊑ A and (s1, A?, s2) ∈ δ, then C1 ⊓ · · · ⊓ Cn ∈ Loopα[s1, s2]
∙ if T |= C1 ⊓ · · · ⊓ Cn ⊑ ∃R.D, T |= R ⊑ R′, T |= R ⊑ R′′, (s1, R′, s2) ∈ δ, D ∈ Loopα[s2, s3], and (s3, R′′−, s4) ∈ δ, then C1 ⊓ · · · ⊓ Cn ∈ Loopα[s1, s4]
86/109
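The inductive rules can be read as a fixpoint computation. The sketch below is a deliberately restricted illustration, not the construction from the slides: it assumes that table entries are single concept names rather than conjunctions, that there are no role inclusions, and that entailment T |= A ⊑ B is approximated by the reflexive-transitive closure of told inclusions. All function names, the label encoding ("r", "r-", "A?") and the cut-down TBox are hypothetical.

```python
from itertools import product

def inv(role):
    """Toggle the inverse marker: inv('r') == 'r-' and inv('r-') == 'r'."""
    return role[:-1] if role.endswith("-") else role + "-"

def told_subsumers(concept, atomic_incl):
    """All B reachable from concept via told inclusions A ⊑ B."""
    seen, todo = {concept}, [concept]
    while todo:
        current = todo.pop()
        for lhs, rhs in atomic_incl:
            if lhs == current and rhs not in seen:
                seen.add(rhs)
                todo.append(rhs)
    return seen

def loop_table(states, delta, concepts, atomic_incl, exist_incl):
    """exist_incl holds triples (A, R, D) standing for A ⊑ ∃R.D."""
    sup = {c: told_subsumers(c, atomic_incl) for c in concepts}
    # Rule 1: every concept name belongs to Loop[s, s].
    loop = {(s, t): (set(concepts) if s == t else set())
            for s, t in product(states, repeat=2)}
    while True:
        derived = set()
        for c in concepts:
            # Rule 3: a test transition A? with c ⊑ A.
            for s1, label, s2 in delta:
                if label.endswith("?") and label[:-1] in sup[c]:
                    derived.add((s1, s2, c))
            # Rule 4: go down via R, loop at the filler, come back via R−.
            for lhs, role, filler in exist_incl:
                if lhs not in sup[c]:
                    continue
                for s1, down, s2 in delta:
                    for s3, up, s4 in delta:
                        if (down == role and up == inv(role)
                                and filler in loop[(s2, s3)]):
                            derived.add((s1, s4, c))
            # Rule 2: composition, restricted to the same concept name.
            for s1, s2, s3 in product(states, repeat=3):
                if c in loop[(s1, s2)] and c in loop[(s2, s3)]:
                    derived.add((s1, s3, c))
        fresh = {(s, t, c) for s, t, c in derived if c not in loop[(s, t)]}
        if not fresh:
            return loop
        for s, t, c in fresh:
            loop[(s, t)].add(c)

# NFA for serves · (hasIngred ∪ ingredOf−)∗ · Spicy? · Σ∗; the Σ∗ loop on sf
# is approximated here by one self-loop per role label (an assumption).
states = ["s0", "s1", "sf"]
sigma_roles = ["serves", "serves-", "hasIngred", "hasIngred-", "ingredOf", "ingredOf-"]
delta = {("s0", "serves", "s1"), ("s1", "hasIngred", "s1"),
         ("s1", "ingredOf-", "s1"), ("s1", "Spicy?", "sf")}
delta |= {("sf", r, "sf") for r in sigma_roles}
concepts = {"PenneArrab", "ArrabSauce", "Peperonc", "Spicy", "PizzaCalab"}
atomic_incl = {("Peperonc", "Spicy")}
exist_incl = {("PenneArrab", "hasIngred", "ArrabSauce"),
              ("ArrabSauce", "hasIngred", "Peperonc")}
table = loop_table(states, delta, concepts, atomic_incl, exist_incl)
```

With this cut-down TBox, Peperonc, ArrabSauce and PenneArrab all end up in table[("s1", "sf")]. PizzaCalab does not, but only because the sketch gives it no axioms.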
computing loops: an example
[Figure: NFA α (states s0, s1, sf; transitions labelled serves, hasIngred, ingredOf−, Spicy?, Σ∗) shown next to the canonical model with individuals r, p (PizzaCalab), b (PenneArrab) and anonymous elements e1 (Nduja, Spicy), e2 (Penne, Pasta), e3 (ArrabSauce), e4 (Peperonc, Spicy); edges labelled serves and hasIngred]
∙ Peperonc ∈ Loopα[s1, sf] because (s1, Spicy?, sf) ∈ δ and Peperonc ⊑ Spicy
∙ ArrabSauce ∈ Loopα[s1, sf] because (s1, hasIngred, s1), (sf, hasIngred−, sf) ∈ δ, ArrabSauce ⊑ ∃hasIngred.Peperonc, and Peperonc ∈ Loopα[s1, sf]
∙ PenneArrab ∈ Loopα[s1, sf] because (s1, hasIngred, s1), (sf, hasIngred−, sf) ∈ δ, PenneArrab ⊑ ∃hasIngred.ArrabSauce, and ArrabSauce ∈ Loopα[s1, sf]
87/109
evaluating 2rpqs using the loop table
Non-deterministic algorithm to decide (a, b) ∈ cert(α(x, y), K)
Input: NFA α = (S, Σ, δ, s0, F), KB K = (T , A), (a, b) from A
∙ After checking consistency, we start from (a, s0)
∙ At pair (c, s), guess new pair (d, s′) together with one of:
  ∙ transition (s, σ, s′): a σ-step from c to d in the ABox
    ⇝ check if (c, d) ∈ σ^I
  ∙ concepts M in Loopα[s, s′]: stay at the same individual, and jump to s′
    ⇝ check if c = d ∈ M^I
∙ Exit when we get pair (b, sf)
∙ Use counter to ensure termination (only need to consider each pair once)
88/109
evaluation algorithm
Algorithm EvalAtom
Input: NFA α = (S, Σ, δ, s0, F) with Σ ⊆ N_R^± ∪ {A? | A ∈ N_C}, ELHI⊥ KB (T , A), (a, b) ∈ Ind(A) × Ind(A)
1. Test whether (T , A) is satisfiable, output yes if not.
2. Initialize current = (a, s0) and count = 0. Set max = |A| · |S| + 1.
3. While count < max and current ∉ {(b, sf) | sf ∈ F}
  3.1 Let current = (c, s).
  3.2 Guess a pair (d, s′) ∈ Ind(A) × S and either (s, σ, s′) ∈ δ or M ∈ Loopα[s, s′].
  3.3 If (s, σ, s′) was guessed
    ∙ If σ ∈ N_R^±, then verify that T , A |= σ(c, d), and return no if not.
    ∙ If σ = A?, then verify that c = d and T , A |= A(c), and return no if not.
  3.4 If M was guessed, then verify that c = d and that T , A |= B(c) for every concept name B ∈ M, and return no if not.
  3.5 Set current = (d, s′) and increment count.
4. If current = (b, sf) for some sf ∈ F, return yes. Else return no.
89/109
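A deterministic counterpart of EvalAtom replaces the guesses by a breadth-first search over pairs (individual, state). The sketch below illustrates that idea under stated assumptions and is not the authors' code: the name eval_atom, the label encoding and the two oracle parameters holds_role and holds_concept are hypothetical. In the example the oracles are plain ABox lookups standing in for real entailment checks, the satisfiability test of step 1 is skipped, and the loop table is filled in by hand from the entries listed on the slides.

```python
from collections import deque

def eval_atom(delta, s0, finals, loop, individuals,
              holds_role, holds_concept, a, b):
    """Search for a run from (a, s0) to (b, sf) with sf in finals.

    delta: set of transitions (s, label, s'); label is 'r', 'r-' or 'A?'
    loop:  dict (s, s') -> list of concept-name sets M in Loop[s, s']
    holds_role(sigma, c, d), holds_concept(B, c): entailment oracles
    (assumed to be supplied by a reasoner; KB assumed satisfiable)
    """
    start = (a, s0)
    seen, queue = {start}, deque([start])
    while queue:
        c, s = queue.popleft()
        if c == b and s in finals:
            return True
        successors = set()
        for s1, label, s2 in delta:
            if s1 != s:
                continue
            if label.endswith("?"):      # test: stay at c
                if holds_concept(label[:-1], c):
                    successors.add((c, s2))
            else:                        # role step inside the ABox
                successors.update((d, s2) for d in individuals
                                  if holds_role(label, c, d))
        for (s1, s2), concept_sets in loop.items():
            # Loop entry: stay at c, jump from s1 to s2.
            if s1 == s and any(all(holds_concept(B, c) for B in M)
                               for M in concept_sets):
                successors.add((c, s2))
        for pair in successors - seen:   # each pair is expanded once
            seen.add(pair)
            queue.append(pair)
    return False

# ABox from the running example.
abox_roles = {("serves", "r", "b"), ("serves", "r", "p")}
abox_concepts = {("PenneArrab", "b"), ("PizzaCalab", "p")}

def holds_role(sigma, c, d):
    if sigma.endswith("-"):
        return (sigma[:-1], d, c) in abox_roles
    return (sigma, c, d) in abox_roles

def holds_concept(B, c):
    return (B, c) in abox_concepts

delta = {("s0", "serves", "s1"), ("s1", "hasIngred", "s1"),
         ("s1", "ingredOf-", "s1"), ("s1", "Spicy?", "sf")}
loop = {("s1", "sf"): [{"Peperonc"}, {"ArrabSauce"}, {"PenneArrab"}]}
found = eval_atom(delta, "s0", {"sf"}, loop, {"r", "b", "p"},
                  holds_role, holds_concept, "r", "b")
```

The search visits each pair at most once, which mirrors the role of the counter in the non-deterministic version.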
evaluation algorithm: example (1/2)
q(x, y) = serves · (hasIngred ∪ ingredOf−)∗ · Spicy? · Σ∗ (x, y)
serves(r, b) serves(r, p) PenneArrab(b) PizzaCalab(p)
PenneArrab ⊑ PastaDish ⊓ ∃hasIngred.ArrabSauce
PastaDish ⊑ Dish ⊓ ∃hasIngred.Pasta
ArrabSauce ⊑ ∃hasIngred.Peperonc
Peperonc ⊔ ∃hasIngred.Spicy ⊑ Spicy
Spicy ⊓ Dish ⊑ SpicyDish
[Figure: canonical model with individuals r, p (PizzaCalab), b (PenneArrab) and anonymous elements e1 (Nduja, Spicy), e2 (Penne, Pasta), e3 (ArrabSauce), e4 (Peperonc, Spicy); edges labelled serves and hasIngred; the match maps π(x) = r and π(y) = b]
90/109
evaluation algorithm: example (2/2)
serves(r, b) serves(r, p) PenneArrab(b) PizzaCalab(p)
q(x, y) = serves · (hasIngred ∪ ingredOf−)∗ · Spicy? · Σ∗ (x, y)
[Figure: NFA α with states s0, s1, sf and transitions labelled serves, hasIngred, ingredOf−, Spicy?, Σ∗]
Peperonc ∈ Loopα[s1, sf]
ArrabSauce ∈ Loopα[s1, sf]
PenneArrab ∈ Loopα[s1, sf]
count | Guess | Test
0 | (r, s0) |
1 | (b, s1), (s0, serves, s1) ∈ δ | (r, b) ∈ serves^I
2 | (b, sf), PenneArrab ∈ Loopα[s1, sf] | b ∈ PenneArrab^I
return yes
91/109
complexity of the algorithm
Theorem (a, b) ∈ cert(q, K) iff there is some execution of EvalAtom(α, K, (a, b)) that returns yes.
∙ Iterations bounded by counter (poly. counter ⇝ log space)
∙ We need calls to procedures for: satisfiability, instance checking, membership in Loopα table
∙ These calls are in Exp for ELHI⊥
  Loopα computation: polynomially many iterations, each one tests entailment
Exp upper bound for ELHI⊥ (combined complexity)
For ELH and DL-Lite, we can obtain P upper bound (combined)
Data complexity:
∙ Loopα computation in constant time (ABox independent)
∙ called procedures in P for ELH and NLogSpace for DL-LiteR
92/109
complexity bounds
Theorem
∙ For ELHI⊥, 2RPQ answering is Exp-complete in combined
complexity and P-complete in data complexity
∙ For DL-LiteR and ELH, the combined complexity drops to
P-complete
∙ In data complexity, the problem is NLogSpace-complete for
DL-LiteR, and P-complete for ELH.
Most matching lower bounds from simpler problems:
∙ instance checking
∙ graph reachability = RPQ over plain ABox
P-hardness for DL-LiteR non-trivial
93/109
answering c2rpqs
For answering C2RPQs, we combine the ideas above:
∙ rewrite the query so that matches ranging over individuals suffice
∙ in each step, consider possibly deeper paths with Loopα table
After rewriting, guess matches using individuals only and check
them using EvalAtom on each atom
This algorithm works for all DLs discussed and gives optimal
complexity bounds
Answering C2RPQs is not much harder:
∙ Combined complexity increases to PSpace for DL-LiteR and ELH
∙ but most other bounds are the same as for RPQs and CQs
∙ even for very expressive DLs that are not Horn
94/109
final remarks on navigational queries
Navigational queries provide more querying power at moderate
computational cost
Expressible in Datalog, but computationally better behaved
Good alternative to CQs, gaining increasing attention
Property paths in SPARQL
∙ included in the SPARQL 1.1 standard
∙ add regular paths as in C2RPQs
Ongoing quest for more flexible navigational languages
95/109
complexity of answering (c)(2)rpqs
DL | 2RPQs: data complexity | 2RPQs: combined complexity | C2RPQs: data complexity | C2RPQs: combined complexity
DL-Lite, DL-LiteR | NLogSpace | P | NLogSpace | PSpace
EL, ELH | P | P | P | PSpace
ELI, ELHI⊥, Horn-SHOIQ | P | Exp | P | Exp
ALC, ALCHQ | coNP | Exp | coNP-hard | 2Exp
ALCI, SH, SHIQ | coNP | Exp | coNP-hard | 2Exp
SHOIQ | coNP-hard | coNExp | coNP-hard¹ | coN2Exp-hard¹
¹ decidability open
96/109
comparison: complexity of answering (u)cqs
DL | IQs: data complexity | IQs: combined complexity | CQs: data complexity | CQs: combined complexity
DL-Lite, DL-LiteR | in AC0 | NLogSpace | in AC0 | NP
EL, ELH | P | P | P | NP
ELI, ELHI⊥, Horn-SHOIQ | P | Exp | P | Exp
ALC, ALCHQ | coNP | Exp | coNP | Exp
ALCI, SH, SHIQ | coNP | Exp | coNP | 2Exp
SHOIQ | coNP-hard | coNExp | coNP-hard¹ | coN2Exp-hard¹
¹ decidability open
97/109
queries with negation or recursion
queries with negation
Conjunctive query with safe negation (CQ^¬s):
∙ like a CQ, but can also have negated atoms
∙ safety condition: every variable occurs in some positive atom
∙ example: find menus whose main course is not spicy
∃y Menu(x) ∧ hasMain(x, y) ∧ ¬Spicy(y)
Conjunctive query with inequalities (CQ^≠)
∙ like a CQ, but can also have atoms t1 ≠ t2 (t1, t2 vars or individuals)
∙ example: find restaurant offering two menus having different dessert courses
∃y1y2z1z2 offers(x, y1) ∧ Menu(y1) ∧ hasDessert(y1, z1) ∧ offers(x, y2) ∧ Menu(y2) ∧ hasDessert(y2, z2) ∧ z1 ≠ z2
∙ example: find menus with at least three courses (see notes)
Note: can define UCQ^¬s and UCQ^≠ queries in the obvious way
99/109
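To make the two query classes concrete, the sketch below evaluates one query of each kind in a single, fixed finite interpretation. This illustrates satisfaction in one interpretation only. It does not compute certain answers, which quantify over all models of the KB and are where the difficulties arise. The data and function names are hypothetical, and the "at least three courses" query is written here as three hasCourse atoms with pairwise inequalities, which is an assumed formulation rather than the one from the notes.

```python
from itertools import product

# One finite interpretation, given as sets of tuples (hypothetical data).
menu = {"m1", "m2"}
has_main = {("m1", "d1"), ("m2", "d2")}
spicy = {"d2"}
has_course = {("m1", "d1"), ("m1", "d3"), ("m1", "d4"), ("m2", "d2")}

def menus_with_non_spicy_main():
    """∃y Menu(x) ∧ hasMain(x, y) ∧ ¬Spicy(y), in this interpretation."""
    return {x for (x, y) in has_main if x in menu and y not in spicy}

def menus_with_three_courses():
    """∃y1 y2 y3 Menu(x) ∧ hasCourse(x, yi) for i = 1..3, yi pairwise distinct."""
    answers = set()
    for x in menu:
        courses = {y for (m, y) in has_course if m == x}
        # Try every assignment of y1, y2, y3 and check the inequalities.
        if any(len({y1, y2, y3}) == 3
               for y1, y2, y3 in product(courses, repeat=3)):
            answers.add(x)
    return answers

non_spicy = menus_with_non_spicy_main()
three_courses = menus_with_three_courses()
```

The safety condition is what makes the first function well defined: y is bound by the positive atom hasMain(x, y) before the negated atom is tested.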
undecidability results for queries with negation
Adding negation leads to undecidability even in very restricted settings.
Theorem The following problems are undecidable:
∙ CQ^¬s answering in DL-LiteR
∙ UCQ^¬s answering in EL⊥
∙ CQ^≠ answering in DL-LiteR
∙ CQ^≠ answering in EL⊥
Possible solution: adopt alternative semantics (see lecture notes)
100/109
undecidability results for queries with recursion
Significant interest in combining DLs with Datalog rules
Unfortunately, this almost always leads to undecidability:
Theorem Datalog query answering is undecidable in every DL that
can express (directly or indirectly) A ⊑ ∃r.A
In particular: undecidable in both DL-Lite and EL
Possible solutions:
∙ use restricted classes of Datalog queries (e.g. path queries)
∙ DL-safe rules: can only apply rules to (named) individuals
101/109
research trends in omqa
efficient omqa
Lots of work on developing and implementing efficient OMQA
algorithms
Focus mostly on DL-Lite (and related dialects):
∙ First algorithm PerfectRef proposed in mid-2000’s
∙ Rewrites into UCQs, implemented in Quonto
∙ Improved versions proposed in Requiem, Presto, Rapid, …
∙ Some algorithms rewrite into positive existential queries or
Datalog programs instead of UCQs
∙ Resulting queries are smaller, can be easier to evaluate
103/109
optimizations and omqa beyond dl-lite
Tractable classes, fragments of lower complexity
Rewriting engines for other Horn DLs also developed, e.g.,
∙ Requiem and the related Kyrie cover several EL dialects
∙ Clipper, and recently Rapid cover Horn-SHIQ
They usually rewrite into Datalog programs
104/109
understanding rewritability
Much attention devoted to understanding the limits of rewritability
and size of rewritings
When are polynomial rewritings possible?
Can we give bounds on the size of rewritings?
Which non-DL-Lite ontologies can be rewritten into FO-queries?
⇝ related to non-uniform complexity:
∙ study specific pairs (q, T ), called ontology-mediated queries
105/109
combined approaches
Saturate the ABox using the TBox axioms
⇝ a finite version of the canonical model
and then evaluate the query over the saturated ABox
Two approaches:
∙ modify the query before evaluation to ensure soundness, or
∙ evaluate and then filter unsound answers
First proposed for EL, then also for DL-Lite
Extended to other dialects, richer DLs
106/109
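The saturation step can be sketched for a very small axiom language. The code below is an illustrative assumption, not one of the published combined approaches: it handles only told inclusions A ⊑ B and A ⊑ ∃r.B, and it keeps the result finite by reusing one witness element per pair (r, B), with the hypothetical naming scheme w_r_B. Sharing witnesses can merge elements that are distinct in the canonical model, which is consistent with the slide's remark that unsound answers have to be prevented or filtered.

```python
def saturate(concept_facts, role_facts, atomic_incl, exist_incl):
    """Close an ABox under A ⊑ B and A ⊑ ∃r.B, reusing witnesses.

    concept_facts: set of (A, x); role_facts: set of (r, x, y)
    atomic_incl:   set of (A, B); exist_incl: set of (A, r, B)
    """
    concepts, roles = set(concept_facts), set(role_facts)
    changed = True
    while changed:
        changed = False
        for name, x in list(concepts):
            new_concepts, new_roles = set(), set()
            for lhs, rhs in atomic_incl:
                if lhs == name:
                    new_concepts.add((rhs, x))
            for lhs, role, filler in exist_incl:
                if lhs == name:
                    witness = f"w_{role}_{filler}"  # shared witness (assumption)
                    new_roles.add((role, x, witness))
                    new_concepts.add((filler, witness))
            if not new_concepts <= concepts or not new_roles <= roles:
                concepts |= new_concepts
                roles |= new_roles
                changed = True
    return concepts, roles

sat_concepts, sat_roles = saturate(
    {("PenneArrab", "b")},
    {("serves", "r", "b")},
    {("Peperonc", "Spicy")},
    {("PenneArrab", "hasIngred", "ArrabSauce"),
     ("ArrabSauce", "hasIngred", "Peperonc")},
)
```

The loop terminates because the number of possible witnesses is bounded by the number of existential axioms.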
querying existing relational data using mappings
Today: assumed data given as ABox assertions (unary + binary facts)
Problem: how to query existing relational data (arbitrary arity)?
Solution: use mapping that specifies relationship between the
database relations and the concepts / roles in DL vocabulary
Formally: mapping assertions of the form φ → ψ where:
∙ φ is a query formulated using DB relations
∙ ψ is a query in the DL vocabulary
Global-as-view (GAV) mappings: φ CQ, ψ atom (no quantifiers)
Handling mappings:
∙ apply mappings to generate ABox, proceed as usual
∙ virtual ABox: unfolding step to get rewriting over DB relations
107/109
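The first way of handling mappings, materialising an ABox, can be sketched with an in-memory SQLite database. The table dish, its columns and the two mapping assertions are hypothetical examples chosen to match the restaurant vocabulary. Each mapping pairs an SQL query over the database (φ) with a concept or role name (ψ), in the spirit of GAV mappings.

```python
import sqlite3

# Hypothetical relational source with one table of arity four.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dish(id TEXT, name TEXT, course TEXT, menu_id TEXT)")
conn.executemany(
    "INSERT INTO dish VALUES (?, ?, ?, ?)",
    [("d1", "tiramisu", "dessert", "m1"), ("d2", "penne", "main", "m1")],
)

# GAV-style mappings: (query over DB relations, target concept or role name).
mappings = [
    ("SELECT id FROM dish WHERE course = 'dessert'", "Dessert"),
    ("SELECT menu_id, id FROM dish", "hasCourse"),
]

# Apply every mapping to generate unary and binary ABox facts.
abox = set()
for sql, predicate in mappings:
    for row in conn.execute(sql):
        abox.add((predicate, *row))
```

The virtual-ABox alternative would skip this materialisation and instead substitute the SQL queries for the DL atoms in the rewritten query.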
other research topics (non-exhaustive)
Beyond classical OMQA  
∙ inconsistency-tolerant query answering
∙ probabilistic query answering
∙ privacy-aware query answering
∙ temporal query answering
Support for building and maintaining OMQA systems
∙ module extraction
∙ ontology evolution
∙ query inseparability and emptiness
Improving the usability of OMQA systems
∙ interfaces and support for query formulation
∙ explaining query (non-)answers
108/109
Questions ?
109/109

Ontology-mediated query answering with data-tractable description logics

  • 1. ontology-mediated query answering with data-tractable description logics Meghyn Bienvenu (CNRS & Université de Montpellier) Magdalena Ortiz (Vienna University of Technology)
  • 2. ontology-mediated query answering (omqa) data incomplete database (ground facts) ontology (logical theory) ??? user query 2/109
  • 3. ontology-mediated query answering (omqa) data ??? patient data “Melanie has listeriosis” “Paul has Lyme disease” medical knowledge “Listeriosis & Lyme disease are bacterial infections” user query “Find all patients with bacterial infections” 2/109
  • 4. ontology-mediated query answering (omqa) data ??? patient data “Melanie has listeriosis” “Paul has Lyme disease” medical knowledge “Listeriosis & Lyme disease are bacterial infections” user query “Find all patients with bacterial infections” expected answers: Melanie, Paul 2/109
  • 5. ontology-mediated query answering (omqa) data ??? employee data “Marie is a professor” “Mark teaches CS200” org. knowledge “Professors are teaching staff” “Someone who teaches is part of the teaching staff” user query “Find all teaching staff” 2/109
  • 6. ontology-mediated query answering (omqa) data ??? employee data “Marie is a professor” “Mark teaches CS200” org. knowledge “Professors are teaching staff” “Someone who teaches is part of the teaching staff” user query “Find all teaching staff” expected answers: Marie, Mark 2/109
  • 7. what are ontologies good for? To standardize the terminology of an application domain ∙ meaning of terms is constrained, so less misunderstandings ∙ by adopting a common vocabulary, easy to share information 3/109
  • 8. what are ontologies good for? To standardize the terminology of an application domain ∙ meaning of terms is constrained, so less misunderstandings ∙ by adopting a common vocabulary, easy to share information To present an intuitive and unified view of data sources ∙ ontology can be used to enrich the data vocabulary, making it easier for users to formulate their queries ∙ especially useful when integrating multiple data sources 3/109
  • 9. what are ontologies good for? To standardize the terminology of an application domain ∙ meaning of terms is constrained, so less misunderstandings ∙ by adopting a common vocabulary, easy to share information To present an intuitive and unified view of data sources ∙ ontology can be used to enrich the data vocabulary, making it easier for users to formulate their queries ∙ especially useful when integrating multiple data sources To support automated reasoning ∙ uncover implicit connections between terms, errors in modelling ∙ exploit knowledge in the ontology during query answering, to get back a more complete set of answers to queries 3/109
  • 10. applications of omqa: medicine General medical ontologies: SNOMED CT (∼ 400,000 terms!), GALEN Specialized ontologies: FMA (anatomy), NCI (cancer), ... Querying & exchanging medical records (find patients for medical trials) ∙ myocardial infarction vs. MI vs. heart attack vs. 410.0 Supports tools for annotating and visualizing patient data (scans, x-rays) 4/109
  • 11. applications of omqa: life sciences Hundreds of ontologies at BioPortal (http://bioportal.bioontology.org/): Gene Ontology (GO), Cell Ontology, Pathway Ontology, Plant Anatomy, ... Help scientists share, query, & visualize experimental data 5/109
  • 12. applications of omqa: entreprise information systems Companies and organizations have lots of data ∙ need easy and flexible access to support decision-making Example industrial projects: ∙ Public debt data: Sapienza Univ. & Italian Department of Treasury ∙ Energy sector: Optique EU project (several univ, StatOil, & Siemens) 6/109
  • 13. our focus: horn description logics Ontologies formulated using description logics (DLs): ∙ family of decidable fragments of first-order logic ∙ basis for OWL web ontology language (W3C) ∙ range from fairly simple to highly expressive ∙ complexity of query answering well understood 7/109
  • 14. our focus: horn description logics Ontologies formulated using description logics (DLs): ∙ family of decidable fragments of first-order logic ∙ basis for OWL web ontology language (W3C) ∙ range from fairly simple to highly expressive ∙ complexity of query answering well understood In this tutorial, focus on Horn description logics: ∙ DL-LiteR, EL, ELHI, Horn-SHIQ, ... ∙ good computational properties, well suited for OMQA ∙ still expressive enough for interesting applications ∙ basis for OWL 2 QL and OWL 2 EL profiles Consider various types of queries 7/109
  • 15. plan for today ∙ Horn Description Logics ∙ Basics of OMQA ∙ Instance Queries ∙ Conjunctive Queries ∙ Navigational Queries ∙ Queries with Negation and Recursion ∙ Research Trends in OMQA 8/109
  • 17. dl basics Building blocks of DLs: ∙ concept names (unary predicates, classes) IceCream, Pizza, Meat, SpicyDish, Dish, Menu, Restaurant, ... ∙ role names (binary predicates, properties) hasIngred, hasCourse, hasDessert, serves, ... ∙ individual names (constants) menu32, pastadish17, d3, rest156, r12, ... (specific menus, dishes, restaurants ...) NC / NR / NI: set of all concept / role / individual names 10/109
  • 18. dl knowledge bases Knowledge base (KB) = ABox (data) + TBox (ontology) ABox contains facts about specific individuals ∙ finite set of concept assertions A(a) and role assertions r(a, b) ∙ IceCream(d2): dish d2 is of type IceCream ∙ hasDessert(m, d2): menu m is connected via hasDessert to dish d2 11/109
  • 19. dl knowledge bases Knowledge base (KB) = ABox (data) + TBox (ontology) ABox contains facts about specific individuals ∙ finite set of concept assertions A(a) and role assertions r(a, b) ∙ IceCream(d2): dish d2 is of type IceCream ∙ hasDessert(m, d2): menu m is connected via hasDessert to dish d2 TBox contains general knowledge about the domain of interest ∙ finite set of axioms (details on syntax to follow) ∙ IceCream is a subclass of Dessert ∙ hasCourse connects Menus to Dishes ∙ every Menu is connected to at least one dish via hasCourse 11/109
  • 20. concept and role constructors Can build complex concepts and roles using constructors: ∙ conjunction (⊓), disjunction (⊔), negation (¬) Dessert ⊓ ¬IceCream Pizza ⊔ PastaDish 12/109
  • 21. concept and role constructors Can build complex concepts and roles using constructors: ∙ conjunction (⊓), disjunction (⊔), negation (¬) Dessert ⊓ ¬IceCream Pizza ⊔ PastaDish ∙ restricted forms of existential and universal quantification (∃, ∀) ∃hasCourse.⊤ ∃contains.Meat Dish ⊓ ∀contains.¬Meat ( ⊤ acts as a “wildcard”, denotes set of all things) 12/109
  • 22. concept and role constructors Can build complex concepts and roles using constructors: ∙ conjunction (⊓), disjunction (⊔), negation (¬) Dessert ⊓ ¬IceCream Pizza ⊔ PastaDish ∙ restricted forms of existential and universal quantification (∃, ∀) ∃hasCourse.⊤ ∃contains.Meat Dish ⊓ ∀contains.¬Meat ( ⊤ acts as a “wildcard”, denotes set of all things) ∙ inverse (− ) and composition (·) of roles hasCourse− contains · contains (use N± R for set of role names and inverse roles) (use inv(r) to toggle −: inv(r) = r−, inv(r−) = r ) 12/109
  • 23. concept and role constructors Can build complex concepts and roles using constructors: ∙ conjunction (⊓), disjunction (⊔), negation (¬) Dessert ⊓ ¬IceCream Pizza ⊔ PastaDish ∙ restricted forms of existential and universal quantification (∃, ∀) ∃hasCourse.⊤ ∃contains.Meat Dish ⊓ ∀contains.¬Meat ( ⊤ acts as a “wildcard”, denotes set of all things) ∙ inverse (− ) and composition (·) of roles hasCourse− contains · contains (use N± R for set of role names and inverse roles) (use inv(r) to toggle −: inv(r) = r−, inv(r−) = r ) Note: set of available constructors depends on the particular DL! 12/109
  • 24. tbox axioms Concept inclusions C ⊑ D (C, D possibly complex concepts) IceCream ⊑ Dessert Menu ⊑ ∃hasCourse.⊤ Spicy ⊓ Dish ⊑ SpicyDish Role inclusions R ⊑ S (R, S possibly complex roles) hasIngred ⊑ contains ingredOf− ⊑ hasIngred hasDessert ⊑ hasCourse Note: type and syntax of axioms depends on the particular DL! 13/109
  • 25. dl semantics Interpretation I (“possible world”) ∙ domain of objects ∆I (possibly infinite set) ∙ interpretation function ·I that maps ∙ concept name A ⇝ set of objects AI ⊆ ∆I ∙ role name r ⇝ set of pairs of objects rI ⊆ ∆I × ∆I ∙ individual name a ⇝ object aI ∈ ∆I 14/109
  • 26. example: interpretation [Figure: an interpretation I drawn over the domain ∆I, showing the objects italFeastI and chCakeI, the sets DishI, DessertI, AppetizerI and MenuI, and four hasCourse edges.] Signature: 4 concept names (Dish, Dessert, Appetizer, Menu), 1 role name (hasCourse), 2 individual names (italFeast, chCake). 15/109
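  • 27. dl semantics Interpretation I (“possible world”)
    ∙ domain of objects $\Delta^{\mathcal{I}}$ (possibly infinite set)
    ∙ interpretation function $\cdot^{\mathcal{I}}$ that maps
      ∙ concept name $A$ ⇝ set of objects $A^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}}$
      ∙ role name $r$ ⇝ set of pairs of objects $r^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}} \times \Delta^{\mathcal{I}}$
      ∙ individual name $a$ ⇝ object $a^{\mathcal{I}} \in \Delta^{\mathcal{I}}$
    Interpretation function $\cdot^{\mathcal{I}}$ extends to complex concepts and roles:
      $\top^{\mathcal{I}} = \Delta^{\mathcal{I}}$
      $\bot^{\mathcal{I}} = \emptyset$
      $(\neg C)^{\mathcal{I}} = \Delta^{\mathcal{I}} \setminus C^{\mathcal{I}}$
      $(C_1 \sqcap C_2)^{\mathcal{I}} = C_1^{\mathcal{I}} \cap C_2^{\mathcal{I}}$
      $(\exists R.C)^{\mathcal{I}} = \{d_1 \mid \text{there exists } (d_1, d_2) \in R^{\mathcal{I}} \text{ with } d_2 \in C^{\mathcal{I}}\}$
      $(\forall R.C)^{\mathcal{I}} = \{d_1 \mid d_2 \in C^{\mathcal{I}} \text{ for all } (d_1, d_2) \in R^{\mathcal{I}}\}$
      $(r^{-})^{\mathcal{I}} = \{(d_2, d_1) \mid (d_1, d_2) \in r^{\mathcal{I}}\}$
    16/109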
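The semantics of the constructors can be tried out on a small finite interpretation. The following is a minimal sketch, not part of the slides: the tuple encoding of concepts, the function names `role_ext` and `concept_ext`, and the three-element interpretation are all assumptions made for illustration (the slides' own figure is not reproduced).

```python
# Hypothetical encoding (an assumption, not from the slides):
#   concept name: "A"; ("top",); ("bot",); ("not", C); ("and", C1, C2);
#   ("exists", R, C); ("forall", R, C); role: "r" or ("inv", "r")

def role_ext(role, interp):
    """Extension of a role name or of an inverse role."""
    if isinstance(role, tuple):  # ("inv", "r"): swap the pairs of r
        return {(d2, d1) for (d1, d2) in interp["roles"].get(role[1], set())}
    return set(interp["roles"].get(role, set()))


def concept_ext(concept, interp):
    """Extension of a (possibly complex) concept in a finite interpretation."""
    domain = set(interp["domain"])
    if isinstance(concept, str):
        return set(interp["concepts"].get(concept, set()))
    op = concept[0]
    if op == "top":
        return domain
    if op == "bot":
        return set()
    if op == "not":
        return domain - concept_ext(concept[1], interp)
    if op == "and":
        return concept_ext(concept[1], interp) & concept_ext(concept[2], interp)
    if op in ("exists", "forall"):
        pairs = role_ext(concept[1], interp)
        filler = concept_ext(concept[2], interp)
        if op == "exists":
            return {d1 for (d1, d2) in pairs if d2 in filler}
        # forall: every R-successor of d must be in the filler (vacuously true if none)
        return {d for d in domain
                if all(d2 in filler for (d1, d2) in pairs if d1 == d)}
    raise ValueError(f"unknown constructor: {op!r}")


# Made-up finite interpretation: one menu m with two courses c and s.
I = {
    "domain": {"m", "c", "s"},
    "concepts": {"Menu": {"m"}, "Dish": {"c", "s"},
                 "Dessert": {"c"}, "Appetizer": {"s"}},
    "roles": {"hasCourse": {("m", "c"), ("m", "s")}},
}

has_course = concept_ext(("exists", "hasCourse", ("top",)), I)
course_of_menu = concept_ext(("exists", ("inv", "hasCourse"), "Menu"), I)
only_dish_courses = concept_ext(("forall", "hasCourse", "Dish"), I)
```

Note that `only_dish_courses` also contains `c` and `s`: objects without any hasCourse-successor satisfy the universal restriction vacuously.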
  • 28. back to the example [Figure: the interpretation I from the previous example.] Complex concepts considered in I: Dish ⊓ Menu, Dessert ⊓ Appetizer, ∃hasCourse.⊤, ∃hasCourse−.Dessert 17/109
  • 31. semantics of dl kbs Satisfaction in an interpretation ∙ I satisfies C ⊑ D ⇔ CI ⊆ DI ∙ I satisfies R ⊑ S ⇔ RI ⊆ SI ∙ I satisfies A(a) ⇔ aI ∈ AI ∙ I satisfies r(a, b) ⇔ (aI , bI ) ∈ rI Model of a KB K = interpretation that satisfies all statements in K K is satisfiable = K has at least one model K entails α (written K |= α) = every model I of K satisfies α 18/109
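As an illustration of the satisfaction conditions, the sketch below checks whether a given finite interpretation is a model of a small KB. It is deliberately restricted to inclusions between names, and the statement encoding, the function names `satisfies` and `is_model`, and both interpretations are assumptions introduced here. Entailment itself quantifies over all models, so it cannot be decided by inspecting a single interpretation this way.

```python
def satisfies(stmt, interp):
    """Check one statement against a finite interpretation.

    Statements (hypothetical encoding):
      ("concept_incl", A, B)    A ⊑ B
      ("role_incl", r, s)       r ⊑ s
      ("concept_assert", A, a)  A(a)
      ("role_assert", r, a, b)  r(a, b)
    """
    kind = stmt[0]
    concepts, roles, names = interp["concepts"], interp["roles"], interp["names"]
    if kind == "concept_incl":
        return concepts.get(stmt[1], set()) <= concepts.get(stmt[2], set())
    if kind == "role_incl":
        return roles.get(stmt[1], set()) <= roles.get(stmt[2], set())
    if kind == "concept_assert":
        return names[stmt[2]] in concepts.get(stmt[1], set())
    if kind == "role_assert":
        return (names[stmt[2]], names[stmt[3]]) in roles.get(stmt[1], set())
    raise ValueError(f"unknown statement kind: {kind!r}")


def is_model(kb, interp):
    """An interpretation is a model of a KB iff it satisfies every statement."""
    return all(satisfies(stmt, interp) for stmt in kb)


kb = [
    ("concept_incl", "IceCream", "Dessert"),
    ("role_incl", "hasDessert", "hasCourse"),
    ("concept_assert", "IceCream", "d2"),
    ("role_assert", "hasDessert", "m", "d2"),
]

# A model: every statement holds.
good = {
    "names": {"m": "e1", "d2": "e2"},
    "concepts": {"IceCream": {"e2"}, "Dessert": {"e2"}},
    "roles": {"hasDessert": {("e1", "e2")}, "hasCourse": {("e1", "e2")}},
}
# Not a model: e2 is an IceCream but not a Dessert.
bad = {
    "names": {"m": "e1", "d2": "e2"},
    "concepts": {"IceCream": {"e2"}, "Dessert": set()},
    "roles": {"hasDessert": {("e1", "e2")}, "hasCourse": {("e1", "e2")}},
}
```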
  • 32. back to the example [Figure: the interpretation I from the previous example.] Which of the following assertions / axioms are satisfied in I?
    ∙ Dessert ⊑ Dish
    ∙ Dish ⊓ Menu ⊑ ⊥
    ∙ Menu ⊑ ∃hasCourse.⊤
    ∙ ∃hasCourse−.⊤ ⊑ Dish
    ∙ Menu(italFeast)
    ∙ hasCourse(italFeast, chCake)
    19/109
  • 37. some important horn dls Idea: Horn DLs cannot express disjunction (explicitly or implicitly)
    ∙ better computational properties than non-Horn DLs (more on this later)
    DL-LiteR
    ∙ concept inclusions B1 ⊑ (¬)B2, where B1, B2 are each either A ∈ NC or ∃R with $R \in N_R^{\pm}$
    ∙ role inclusions R1 ⊑ (¬)R2, where $R_1, R_2 \in N_R^{\pm}$
    EL
    ∙ allows only ⊤, ⊓, and ∃r.C as constructors
    ∙ only concept inclusions in TBox
    ELHI⊥
    ∙ additionally allows for ⊥ and inverse roles (r−)
    ∙ can also have role inclusions
    Horn-SHIQ
    ∙ limited use of ¬, ∀r.C, and number restrictions (≥ n R.C, ≤ n R.C)
    ∙ also has transitivity axioms (e.g. assert that contains is transitive)
    20/109
  • 41. aboxes vs. databases ABoxes and databases (DBs) are syntactically similar:
    ∙ ABox = finite set of assertions (unary and binary facts)
    ∙ Database = finite set of facts of arbitrary arity
    ABoxes are interpreted under the open world assumption:
    ∙ every assertion in the ABox is assumed to hold (true)
    ∙ assertions not present in the ABox may hold or not (unknown)
    Each ABox gives rise to many interpretations (its models)
    ∙ a model can be infinite, and an ABox can have infinitely many models
    Databases are interpreted under the closed world assumption:
    ∙ every fact in the DB is assumed to hold (true)
    ∙ every fact not in the DB is assumed not to hold (false)
    In other words, each DB corresponds to a single finite interpretation
    ∙ domain of the interpretation = set of constants in the DB
    22/109
  • 45. querying databases Database query q of arity n maps (Boolean query = arity 0) Database D ⇝ ans(q, D) = set of n-tuples of constants from D Interpretation I ⇝ ans(q, I) = set of n-tuples of elements from I First-order (FO) query = first-order formula ∙ arity of FO query = number of free variables ∙ answers = substitutions for free vars that make formula hold ∙ example: Dish(x) ∧ ∀y.(contains(x, y) → ¬Spicy(y)) Datalog queries = finite set of Datalog rules + ‘goal’ relation ∙ arity of Datalog query = arity of goal relation ∙ answers = exhaustively apply rules to DB / interpretation, collect tuples in goal relation ∙ example: rules contains(x, z) ← contains(x, y), contains(y, z) and SpicyDish(x) ← Dish(x), contains(x, y), Spicy(y) 23/109
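The phrase "exhaustively apply rules" can be made concrete with a naive bottom-up evaluator. This is only a sketch under assumptions introduced here: facts are pairs `(predicate, tuple_of_constants)`, variables are strings starting with `?`, and the names `match_atom`, `eval_body`, `naive_eval`, as well as the constants `penne`, `sauce` and `chili`, are hypothetical. Real Datalog engines use far more efficient strategies such as semi-naive evaluation.

```python
def match_atom(atom, fact, subst):
    """Try to extend substitution `subst` so that `atom` equals `fact`."""
    pred, args = atom
    fact_pred, fact_args = fact
    if pred != fact_pred or len(args) != len(fact_args):
        return None
    extended = dict(subst)
    for term, value in zip(args, fact_args):
        if term.startswith("?"):                 # variable
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:                      # constant mismatch
            return None
    return extended


def eval_body(body, facts):
    """All substitutions that map every body atom to some fact."""
    substs = [{}]
    for atom in body:
        next_substs = []
        for subst in substs:
            for fact in facts:
                extended = match_atom(atom, fact, subst)
                if extended is not None:
                    next_substs.append(extended)
        substs = next_substs
    return substs


def naive_eval(rules, facts):
    """Apply all rules until no new fact is derived (least fixpoint)."""
    facts = set(facts)
    while True:
        derived = set()
        for (head_pred, head_args), body in rules:
            for subst in eval_body(body, facts):
                derived.add((head_pred, tuple(subst.get(t, t) for t in head_args)))
        if derived <= facts:
            return facts
        facts |= derived


# The two rules from the slide; the data below is made up.
rules = [
    (("contains", ("?x", "?z")), [("contains", ("?x", "?y")), ("contains", ("?y", "?z"))]),
    (("SpicyDish", ("?x",)), [("Dish", ("?x",)), ("contains", ("?x", "?y")), ("Spicy", ("?y",))]),
]
db = {
    ("Dish", ("penne",)),
    ("contains", ("penne", "sauce")),
    ("contains", ("sauce", "chili")),
    ("Spicy", ("chili",)),
}
result = naive_eval(rules, db)
spicy_dishes = {args for pred, args in result if pred == "SpicyDish"}  # goal relation
```

Here `penne` becomes a SpicyDish only after the recursive rule has derived `contains(penne, chili)`, which is exactly the kind of reachability that a single first-order query cannot express in general.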
  • 49. querying dl knowledge bases Problem: each KB gives rise to multiple interpretations (its models), but DB query semantics defines answers w.r.t. a single interpretation
    Solution: adopt certain answer semantics
    ∙ require a tuple to be an answer w.r.t. all models of the KB
    Formally: call a tuple $(a_1, \dots, a_n)$ of individuals from $\mathcal{A}$ a certain answer to an n-ary query $q$ over the DL KB $\mathcal{K} = (\mathcal{T}, \mathcal{A})$ iff $(a_1^{\mathcal{I}}, \dots, a_n^{\mathcal{I}}) \in \mathrm{ans}(q, \mathcal{I})$ for every model $\mathcal{I}$ of $\mathcal{K}$
    Ontology-mediated query answering (OMQA) = computing certain answers to queries 24/109
  • 52. example: certain answers Consider the query q(x) = Dessert(x) and the DL-LiteR KB K = (T, A):
    T = {Cake ⊑ Dessert, IceCream ⊑ Dessert, hasDessert ⊑ hasCourse, ∃hasCourse ⊑ Menu, ∃hasDessert− ⊑ Dessert}
    A = {Cake(d1), IceCream(d2), Dessert(d3), hasDessert(m, d4)}
    There are four certain answers to q w.r.t. K:
    ∙ d1 ∈ cert(q, K), since Cake(d1) ∈ A and Cake ⊑ Dessert ∈ T
    ∙ d2 ∈ cert(q, K), since IceCream(d2) ∈ A and IceCream ⊑ Dessert ∈ T
    ∙ d3 ∈ cert(q, K), since Dessert(d3) ∈ A
    ∙ d4 ∈ cert(q, K), since hasDessert(m, d4) ∈ A and ∃hasDessert− ⊑ Dessert ∈ T
    The fifth individual m is not a certain answer: one can construct a model J of K in which mJ ∉ DessertJ (see lecture notes). 25/109
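  • 55. key techniques for omqa: query rewriting Query rewriting: reduces the problem of finding certain answers to standard DB query evaluation (⇝ exploit existing DB systems)
    [Diagram: the TBox T and the query q are fed into query rewriting, which yields a database query q′; q′ and the ABox A are then fed into query evaluation, which yields the query answers.]
    Call $q'(\vec{x})$ a rewriting of $q(\vec{x})$ w.r.t. $\mathcal{T}$ iff for every ABox $\mathcal{A}$ and tuple $\vec{a}$: $\mathcal{T}, \mathcal{A} \models q(\vec{a}) \Leftrightarrow \vec{a} \in \mathrm{ans}(q'(\vec{x}), \mathcal{I}_{\mathcal{A}})$ (where $\mathcal{I}_{\mathcal{A}}$ = treat $\mathcal{A}$ as a DB)
    Types of rewritings: FO-rewritings (SQL), Datalog rewritings, ... 26/109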
  • 58. key techniques for omqa: saturation Saturation: Render explicit (some of) the implicit information contained in the KB, making it available for query evaluation Simple use of saturation: (works e.g. for RDFS ontologies) ∙ use saturation to ‘complete’ the ABox by adding those assertions that are logically entailed from the KB ∙ then evaluate the query over the saturated ABox More complex uses: ∙ enrich the ABox in other ways (e.g. add new ABox individuals to witness the existential restrictions ∃R.C) ∙ combine saturation with query rewriting 27/109
  • 60. complexity of omqa View OMQA as a decision problem (yes-or-no question): Problem: Q answering in L (Q a query language, L a DL) Input: An n-ary query q ∈ Q, an ABox A, a L-TBox T , and a tuple ⃗a ∈ Ind(A)n Question: Does ⃗a belong to cert(q, (T , A))? Combined complexity: in terms of size of whole input Data complexity: in terms of size of A only ∙ view rest of input as fixed (of constant size) ∙ motivation: ABox typically much larger than rest of input Note: use |A| to denote size of A (similarly for |T |, |q|, etc.) 28/109
  • 62. complexity classes We will mention the following standard classes: P problems solvable in deterministic polynomial time NP problems solvable in non-det. polynomial time coNP problems whose complement is solvable in non-deterministic polynomial time LogSpace problems solvable in deterministic logarithmic space NLogSpace problems solvable in non-det. logarithmic space PSpace problems solvable in polynomial space (note: =NPSpace) Exp problems solvable in deterministic exponential time Another less known but important class: AC0 problems solvable by uniform family of polynomial-size constant-depth circuits Relationships between classes: AC0 ⊊ LogSpace ⊆ NLogSpace ⊆ P ⊆ NP ⊆ PSpace ⊆ Exp 29/109
  • 66. instance queries Instance queries (IQs): find instances of a given concept or role A(x) where A ∈ NC concept instance query r(x, y) where r ∈ NR role instance query To query for a complex concept C, take AC(x) for fresh AC ∈ NC and add C ⊑ AC to the TBox Remarks: ∙ Instance query answering is often called instance checking ∙ Focus of OMQA until mid-2000s 31/109
  • 67. instance checking in dl-lite via query rewriting Input = instance query q + DL-LiteR TBox T We construct an FO-rewriting of q w.r.t. T More specifically, we construct: ∙ an FO-rewriting of q relative to consistent ABoxes, and ∙ an FO-rewriting of unsatisfiability (these can be easily combined into FO-rewriting of q for all ABoxes) 32/109
  • 68. rewriting relative to consistent aboxes We first define two procedures: ComputeSubsumees all reasons for an individual to be in B input concept B, TBox T output set of C such that T |= C ⊑ B ⇝ subsumees of B w.r.t. T ComputeSubroles all reasons for a pair to be in R input role R, TBox T output set of S such that T |= S ⊑ R ⇝ subroles of R w.r.t. T 33/109
  • 69. computing subsumees Algorithm ComputeSubsumees
    Input: DL-LiteR TBox T, concept $B \in N_C \cup \{\exists R \mid R \in N_R^{\pm}\}$
    1. Initialize Subsumees = {B} and Examined = ∅.
    2. While Subsumees \ Examined ≠ ∅:
      2.1 Select D ∈ Subsumees \ Examined and add D to Examined.
      2.2 For every concept inclusion C ⊑ D ∈ T:
        ∙ If C ∉ Subsumees, add C to Subsumees.
      2.3 For every role inclusion R ⊑ S ∈ T such that D = ∃S:
        ∙ If ∃R ∉ Subsumees, add ∃R to Subsumees.
      2.4 For every role inclusion R ⊑ S ∈ T such that D = ∃inv(S):
        ∙ If ∃inv(R) ∉ Subsumees, add ∃inv(R) to Subsumees.
    3. Return Subsumees.
    34/109
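The worklist procedure translates almost line by line into code. The sketch below makes several assumptions that are not in the slides: basic concepts are encoded as a string `"A"` or a pair `("exists", role)`, an inverse role is written with a trailing `-`, and only positive inclusions are passed in. The TBox used is the one from the worked example on the following slides.

```python
def inv(role):
    """Toggle the inverse marker: inv("r") = "r-", inv("r-") = "r"."""
    return role[:-1] if role.endswith("-") else role + "-"


def compute_subsumees(concept_incls, role_incls, b):
    """Collect the basic concepts reachable backwards from b via the steps 2.2-2.4.

    concept_incls: list of (C, D) for positive inclusions C ⊑ D
    role_incls:    list of (R, S) for positive inclusions R ⊑ S
    """
    subsumees = {b}
    examined = set()
    while subsumees - examined:
        d = next(iter(subsumees - examined))          # step 2.1
        examined.add(d)
        for c, rhs in concept_incls:                  # step 2.2
            if rhs == d:
                subsumees.add(c)
        for r, s in role_incls:
            if d == ("exists", s):                    # step 2.3
                subsumees.add(("exists", r))
            if d == ("exists", inv(s)):               # step 2.4
                subsumees.add(("exists", inv(r)))
    return subsumees


concept_incls = [
    ("ItalDish", "Dish"),
    ("VegDish", "Dish"),
    ("Dish", ("exists", "hasIngred")),
    (("exists", "hasCourse-"), "Dish"),
]
role_incls = [("hasMain", "hasCourse"), ("hasDessert", "hasCourse")]

dish_subsumees = compute_subsumees(concept_incls, role_incls, "Dish")
```

Adding to a set is idempotent, so the explicit "if not already in Subsumees" test of the pseudocode is not needed; termination follows because only finitely many basic concepts occur in T.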
  • 72. computing subsumees: an example (1/3) ComputeSubsumees on (T , Dish), where T : ItalDish ⊑ Dish VegDish ⊑ Dish Dish ⊑ ∃hasIngred ∃hasCourse− ⊑ Dish hasMain ⊑ hasCourse hasDessert ⊑ hasCourse Examined = ∅ Subsumees = {Dish} Choose: Dish Examined = {Dish} Subsumees = {Dish, ItalDish, VegDish, ∃hasCourse− } Choose: ItalDish Examined = {Dish, ItalDish} Subsumees = {Dish, ItalDish, VegDish, ∃hasCourse− } 35/109
  • 74. computing subsumees: an example (2/3) ComputeSubsumees on (T , Dish), where T : ItalDish ⊑ Dish VegDish ⊑ Dish Dish ⊑ ∃hasIngred ∃hasCourse− ⊑ Dish hasMain ⊑ hasCourse hasDessert ⊑ hasCourse Choose: VegDish Examined = {Dish, ItalDish, VegDish} Subsumees = {Dish, ItalDish, VegDish, ∃hasCourse− } Choose: ∃hasCourse− Examined = {Dish, ItalDish, VegDish, ∃hasCourse− } Subsumees = {Dish, ItalDish, VegDish, ∃hasCourse− , ∃hasMain− , ∃hasDessert− } 36/109
  • 76. computing subsumees: an example (3/3) ComputeSubsumees on (T, Dish), where T: ItalDish ⊑ Dish, VegDish ⊑ Dish, Dish ⊑ ∃hasIngred, ∃hasCourse− ⊑ Dish, hasMain ⊑ hasCourse, hasDessert ⊑ hasCourse
    Choose: ∃hasMain−
      Examined = {Dish, ItalDish, VegDish, ∃hasCourse−, ∃hasMain−}
      Subsumees = {Dish, ItalDish, VegDish, ∃hasCourse−, ∃hasMain−, ∃hasDessert−}
    Choose: ∃hasDessert−
      Examined = {Dish, ItalDish, VegDish, ∃hasCourse−, ∃hasMain−, ∃hasDessert−}
      Subsumees = {Dish, ItalDish, VegDish, ∃hasCourse−, ∃hasMain−, ∃hasDessert−}
    37/109
  • 78. computing subroles Algorithm ComputeSubroles
    Input: DL-LiteR TBox T, role $R \in N_R^{\pm}$
    1. Initialize Subroles = {R} and Examined = ∅.
    2. While Subroles \ Examined ≠ ∅:
      2.1 Select S ∈ Subroles \ Examined and add S to Examined.
      2.2 For every role inclusion U ⊑ S or inv(U) ⊑ inv(S) in T:
        ∙ If U ∉ Subroles, add U to Subroles.
    3. Return Subroles.
    Example TBox: ItalDish ⊑ Dish, VegDish ⊑ Dish, Dish ⊑ ∃hasIngred, ∃hasCourse− ⊑ Dish, hasMain ⊑ hasCourse, hasDessert ⊑ hasCourse
    Run on hasCourse: Subroles = {hasCourse, hasMain, hasDessert} 38/109
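A corresponding sketch for the subrole computation, under the same assumed encoding as before (inverse roles carry a trailing `-`; the function names are hypothetical). The second branch covers the case where the TBox states the inclusion on the inverses, i.e. inv(U) ⊑ inv(S).

```python
def inv(role):
    """Toggle the inverse marker: inv("r") = "r-", inv("r-") = "r"."""
    return role[:-1] if role.endswith("-") else role + "-"


def compute_subroles(role_incls, r):
    """Collect the roles U reachable backwards from r via step 2.2.

    role_incls: list of (U, S) for positive role inclusions U ⊑ S in the TBox
    """
    subroles = {r}
    examined = set()
    while subroles - examined:
        s = next(iter(subroles - examined))      # step 2.1
        examined.add(s)
        for lhs, rhs in role_incls:              # step 2.2
            if rhs == s:                         # axiom U ⊑ S with U = lhs
                subroles.add(lhs)
            if inv(rhs) == s:                    # axiom inv(U) ⊑ inv(S) with U = inv(lhs)
                subroles.add(inv(lhs))
    return subroles


role_incls = [("hasMain", "hasCourse"), ("hasDessert", "hasCourse")]
course_subroles = compute_subroles(role_incls, "hasCourse")
inverse_course_subroles = compute_subroles(role_incls, "hasCourse-")
```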
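  • 81. from concepts and roles to queries Let $S_C = \mathrm{ComputeSubsumees}(A, \mathcal{T})$ and $S_R = \mathrm{ComputeSubroles}(r, \mathcal{T})$.
    Rewriting of $A(x)$ w.r.t. $\mathcal{T}$ (and consistent ABoxes):
    $$\mathrm{RewriteIQ}(A, \mathcal{T}) = \bigvee_{C \in S_C \cap N_C} C(x) \;\vee \bigvee_{\exists r \in S_C} \exists y.\, r(x, y) \;\vee \bigvee_{\exists r^{-} \in S_C} \exists y.\, r(y, x)$$
    Rewriting of $r(x, y)$ w.r.t. $\mathcal{T}$ (and consistent ABoxes):
    $$\mathrm{RewriteIQ}(r, \mathcal{T}) = \bigvee_{s \in S_R} s(x, y) \;\vee \bigvee_{s^{-} \in S_R} s(y, x)$$
    The rewriting is ABox-independent and of polynomial size in $|\mathcal{T}|$ and $|q|$. 39/109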
  • 82. example of query rewriting (1/2) We have already computed: ComputeSubsumees(Dish, T ) ={Dish, ItalDish, VegDish, ∃hasCourse− , ∃hasMain − , ∃hasDessert− } Get following rewriting of Dish(x) w.r.t. T : RewriteIQ(Dish, T ) = Dish(x) ∨ ItalDish(x) ∨ VegDish(x) ∨ ∃y.hasCourse(y, x) ∨ ∃y.hasMain(y, x) ∨ ∃y.hasDessert(y, x) 40/109
  • 83. example of query rewriting (2/2) ItalDish ⊑ Dish VegDish ⊑ Dish Dish ⊑ ∃hasIngred ∃hasCourse− ⊑ Dish hasMain ⊑ hasCourse hasDessert ⊑ hasCourse ABox A: hasMain(m, d1) hasDessert(m, d2) VegDish(d3) RewriteIQ(Dish, T ) = Dish(x) ∨ ItalDish(x) ∨ VegDish(x) ∨ ∃y.hasCourse(y, x) ∨ ∃y.hasMain(y, x) ∨ ∃y.hasDessert(y, x) Certain answers: d1, because of the disjunct ∃y.hasMain(y, x) d2, because of the disjunct ∃y.hasDessert(y, x) d3, because of the disjunct VegDish(x) 41/109
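Evaluating the rewriting over the ABox viewed as a database amounts to taking the union of the answers of its disjuncts. The sketch below hard-codes the subsumee set from the example and uses an assumed encoding (`("exists", "r-")` for ∃r−, ABox assertions as tuples); the function name `answers_concept_rewriting` is hypothetical.

```python
def answers_concept_rewriting(subsumees, concept_assertions, role_assertions):
    """Evaluate RewriteIQ(A, T) over an ABox treated as a database.

    subsumees:          set of "C" or ("exists", role), role possibly ending in "-"
    concept_assertions: set of (C, a) for C(a)
    role_assertions:    set of (r, a, b) for r(a, b)
    """
    answers = set()
    for c in subsumees:
        if isinstance(c, str):                       # disjunct C(x)
            answers |= {a for (name, a) in concept_assertions if name == c}
        else:
            role = c[1]
            if role.endswith("-"):                   # disjunct ∃y. r(y, x)
                answers |= {b for (r, a, b) in role_assertions if r == role[:-1]}
            else:                                    # disjunct ∃y. r(x, y)
                answers |= {a for (r, a, b) in role_assertions if r == role}
    return answers


dish_subsumees = {
    "Dish", "ItalDish", "VegDish",
    ("exists", "hasCourse-"), ("exists", "hasMain-"), ("exists", "hasDessert-"),
}
concept_assertions = {("VegDish", "d3")}
role_assertions = {("hasMain", "m", "d1"), ("hasDessert", "m", "d2")}

dishes = answers_concept_rewriting(dish_subsumees, concept_assertions, role_assertions)
```

The individual `m` is not returned: none of the disjuncts matches it, in line with the three certain answers listed on the slide.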
  • 85. checking unsatisfiability We have an FO-rewriting of q w.r.t. T relative to consistent ABoxes. We need a rewriting of unsatisfiability to obtain a rewriting of q for all ABoxes:
    ∙ only negative inclusions are relevant
    ∙ one subquery for each such inclusion G ⊑ ¬H
    ∙ consider all possible ways of violating G ⊑ ¬H: combinations of a subsumee (subrole) of G and a subsumee (subrole) of H
    Details in the lecture notes. 42/109
  • 88. complexity of instance checking in dl-lite In data complexity ∙ rewriting takes constant time, yields FO query ∙ upper bound from FO query evaluation: AC0 In combined complexity: ∙ P membership: rewriting and evaluation both in polynomial time ∙ NLogSpace upper bound: ‘guess’ relevant part of rewriting Theorem In DL-LiteR, satisfiability and instance checking are 1. in AC0 for data complexity 2. NLogSpace-complete for combined complexity. Note: Same bounds hold for several other DL-Lite dialects 43/109
  • 91. instance checking in el Next consider instance checking in EL. Assume EL TBoxes given in normal form: axioms of the forms A1 ⊓ . . . ⊓ An ⊑ B A ⊑ ∃r.B ∃r.A ⊑ B Cannot use FO query rewriting approach for EL: no FO-rewriting of A(x) w.r.t. T = {∃r.A ⊑ A} We present a saturation-based approach. 44/109
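  • 92. saturation rules for el
    TBox rules (premises above the line, conclusion below):
    $$\frac{A \sqsubseteq B_i \ (1 \le i \le n) \qquad B_1 \sqcap \dots \sqcap B_n \sqsubseteq D}{A \sqsubseteq D}\ \mathsf{T1} \qquad \frac{A \sqsubseteq B \qquad B \sqsubseteq \exists r.D}{A \sqsubseteq \exists r.D}\ \mathsf{T2} \qquad \frac{A \sqsubseteq \exists r.B \qquad B \sqsubseteq D \qquad \exists r.D \sqsubseteq E}{A \sqsubseteq E}\ \mathsf{T3}$$
    ABox rules:
    $$\frac{A_1 \sqcap \dots \sqcap A_n \sqsubseteq B \qquad A_i(a) \ (1 \le i \le n)}{B(a)}\ \mathsf{A1} \qquad \frac{\exists r.B \sqsubseteq A \qquad r(a, b) \qquad B(b)}{A(a)}\ \mathsf{A2}$$
    Algorithm: apply the rules exhaustively, then check whether A(a) (resp. r(a, b)) is present. 45/109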
  • 94. example: saturation in el
    (10) ArrabSauce ⊑ Spicy, by T3 from (5), (6), (7)
    (11) PenneArrab ⊑ Spicy, by T3 from (1), (10), (7)
    (12) PenneArrab ⊑ Dish, by T1 from (2), (3)
    (13) PenneArrab ⊑ ∃hasIngred.Pasta, by T2 from (2), (4)
    (14) PenneArrab ⊑ SpicyDish, by T1 from (11), (12), (8)
    (15) Spicy(p), by A1 from (11), (9)
    (16) Dish(p), by A1 from (12), (9)
    (17) SpicyDish(p), by A1 from (16), (15)
    46/109
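The rules T1 to T3 and A1, A2 can be run to a fixpoint with a few nested loops. The following is an unoptimized sketch, not the authors' implementation. The input KB is an assumption: the slide listing axioms (1) to (9) is missing from this transcript, so the axioms below are reconstructed from the derivation steps and from the Datalog program shown later, and may differ from the original. The function names and the tuple encoding are hypothetical as well.

```python
def saturate_tbox(names, conj, ex_rhs, ex_lhs):
    """Apply T1-T3 to a fixpoint on a normalized EL TBox.

    conj:   list of ((A1, ..., An), B) for A1 ⊓ ... ⊓ An ⊑ B
    ex_rhs: list of (A, r, B)          for A ⊑ ∃r.B
    ex_lhs: list of (r, A, B)          for ∃r.A ⊑ B
    Returns sup[A] (atomic subsumers of A) and ex[A] (pairs (r, B) with A ⊑ ∃r.B).
    """
    sup = {a: {a} for a in names}    # A ⊑ A treated as implicit (implementation choice)
    ex = {a: set() for a in names}
    for a, r, b in ex_rhs:
        ex[a].add((r, b))
    changed = True
    while changed:
        changed = False
        for a in names:
            for lhs, d in conj:                          # rule T1
                if set(lhs) <= sup[a] and d not in sup[a]:
                    sup[a].add(d)
                    changed = True
            for b in list(sup[a]):                       # rule T2
                if not ex[b] <= ex[a]:
                    ex[a] |= ex[b]
                    changed = True
            for r, b in list(ex[a]):                     # rule T3
                for s, d, e in ex_lhs:
                    if s == r and d in sup[b] and e not in sup[a]:
                        sup[a].add(e)
                        changed = True
    return sup, ex


def saturate_abox(sup, conj, ex_lhs, concept_facts, role_facts):
    """Apply A1 and A2 to a fixpoint, using the saturated TBox."""
    incl = list(conj) + [((a,), b) for a in sup for b in sup[a]]
    facts = set(concept_facts)
    individuals = {a for _, a in facts}
    individuals |= {x for _, x, _ in role_facts} | {y for _, _, y in role_facts}
    changed = True
    while changed:
        changed = False
        for lhs, b in incl:                              # rule A1
            for a in individuals:
                if (b, a) not in facts and all((c, a) in facts for c in lhs):
                    facts.add((b, a))
                    changed = True
        for r, b, a_sup in ex_lhs:                       # rule A2
            for r2, x, y in role_facts:
                if r2 == r and (b, y) in facts and (a_sup, x) not in facts:
                    facts.add((a_sup, x))
                    changed = True
    return facts


# Assumed reconstruction of the example KB (see the note above).
names = ["PenneArrab", "PastaDish", "Dish", "Pasta",
         "ArrabSauce", "Peperonc", "Spicy", "SpicyDish"]
conj = [(("PenneArrab",), "PastaDish"), (("PastaDish",), "Dish"),
        (("Peperonc",), "Spicy"), (("Dish", "Spicy"), "SpicyDish")]
ex_rhs = [("PenneArrab", "hasIngred", "ArrabSauce"),
          ("PastaDish", "hasIngred", "Pasta"),
          ("ArrabSauce", "hasIngred", "Peperonc")]
ex_lhs = [("hasIngred", "Spicy", "Spicy")]

sup, ex = saturate_tbox(names, conj, ex_rhs, ex_lhs)
saturated = saturate_abox(sup, conj, ex_lhs, {("PenneArrab", "p")}, set())
```

Each pass only adds inclusions between the finitely many concept names, so the loops terminate after polynomially many additions; production reasoners reach the same fixpoint with indexed worklists instead of repeated full scans.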
  • 98. complexity of instance checking in el Saturation approach is sound: everything derived is entailed Also complete for instance checking: Theorem Let K be an EL knowledge base, and let K′ be the result of saturating K. For every ABox assertion α, we have: K |= α iff α ∈ K′ Note: does not make all consequences explicit ∙ can have infinitely many implied axioms ⇝ would not terminate! ∙ so: only complete for some reasoning tasks Runs in polynomial time in |K|. This is optimal: Theorem Instance checking in EL is P-complete for both data and combined complexity. 47/109
  • 100. extending the saturation approach Saturation approach can be extended to ELHI⊥
    Additional rules required
    Key difference: new conjunctions of concepts can occur:
    $$\frac{A \sqsubseteq \exists R.D \qquad \exists R^{-}.B \sqsubseteq E}{A \sqcap B \sqsubseteq \exists R.(D \sqcap E)}$$
    48/109
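  • 101. extended set of saturation rules
    TBox rules:
    $$\frac{\{A \sqsubseteq B_i\}_{i=1}^{n} \qquad B_1 \sqcap \dots \sqcap B_n \sqsubseteq D}{A \sqsubseteq D}\ \mathsf{T1} \qquad \frac{R \sqsubseteq S \qquad S \sqsubseteq T}{R \sqsubseteq T}\ \mathsf{T4} \qquad \frac{M \sqsubseteq \exists R.(N \sqcap \bot)}{M \sqsubseteq \bot}\ \mathsf{T5}$$
    $$\frac{M \sqsubseteq \exists R.(N \sqcap N') \qquad N \sqsubseteq A}{M \sqsubseteq \exists R.(N \sqcap N' \sqcap A)}\ \mathsf{T6} \qquad \frac{M \sqsubseteq \exists R.(N \sqcap A) \qquad \exists S.A \sqsubseteq B \qquad R \sqsubseteq S}{M \sqsubseteq B}\ \mathsf{T7}$$
    $$\frac{M \sqsubseteq \exists R.N \qquad \exists \mathrm{inv}(S).A \sqsubseteq B \qquad R \sqsubseteq S}{M \sqcap A \sqsubseteq \exists R.(N \sqcap B)}\ \mathsf{T8}$$
    ABox rules:
    $$\frac{A_1 \sqcap \dots \sqcap A_n \sqsubseteq B \qquad A_i(a) \ (1 \le i \le n)}{B(a)}\ \mathsf{A1} \qquad \frac{\exists r.B \sqsubseteq A \qquad r(a, b) \qquad B(b)}{A(a)}\ \mathsf{A2} \qquad \frac{\exists r^{-}.B \sqsubseteq A \qquad r(b, a) \qquad B(b)}{A(a)}\ \mathsf{A3}$$
    $$\frac{r \sqsubseteq s \qquad r(a, b)}{s(a, b)}\ \mathsf{A4} \qquad \frac{r \sqsubseteq s^{-} \qquad r(a, b)}{s(b, a)}\ \mathsf{A5}$$
    49/109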
  • 102. extending the saturation approach Saturation approach can be extended to ELHI⊥ Additional rules required Key difference: new conjunctions of concepts can occur C ⊑ ∃R.D D ⊑ ∀R− .B C ⊑ ∃R.(N ⊓ N′ ⊓ A) New set of rules ⇝ exponentially many different new axioms Theorem Instance checking in ELHI⊥ is P-complete for data and Exp-complete for combined complexity. 50/109
  • 103. saturation as a datalog program Let sat(T ) be result of applying TBox saturation rules to T . For each ELHI⊥ TBox T and ABox signature Σ define following Datalog program Π(T , Σ): Π(T , Σ) ={B(x) ← A1(x), . . . , An(x) | A1 ⊓ . . . ⊓ An ⊑ B ∈ sat(T )} ∪ {B(x) ← A(y), r(x, y) | ∃r.A ⊑ B ∈ T } ∪ {B(y) ← A(x), r(x, y) | ∃r− .A ⊑ B ∈ T } ∪ {s(x, y) ← r(x, y) | r ⊑ s ∈ sat(T ), s ∈ NR} ∪ {s(y, x) ← r(x, y) | r ⊑ s− ∈ sat(T ), s ∈ NR} ∪ {⊤(x) ← A(x) | A ∈ NC ∩ Σ} ∪ {⊤(x) ← r(x, y) | r ∈ NR ∩ Σ} ∪ {⊤(x) ← r(y, x) | r ∈ NR ∩ Σ} 51/109
  • 104. saturation as a datalog program (continued) Theorem For every finite signature Σ and ELHI⊥ KB K = (T , A) with sig(A) ⊆ Σ: 1. K is unsatisfiable iff ans((Π(T , Σ), ⊥), IA) ̸= ∅; 2. If K is satisfiable, then for all A ∈ NC, r ∈ NR, and a, b ∈ Ind(A): ∙ K |= A(a) iff a ∈ ans((Π(T , Σ), A), IA); ∙ K |= r(a, b) iff (a, b) ∈ ans((Π(T , Σ), r), IA). This means: ∙ get Datalog rewriting of instance queries in ELHI⊥ ∙ can use Datalog program to create saturated ABox 52/109
  • 105. saturation as a datalog program: an example The Datalog program associated with our example (each rule is followed by the axiom it comes from):
    ∙ PastaDish(x) ← PenneArrab(x) (from PenneArrab ⊑ PastaDish)
    ∙ Dish(x) ← PastaDish(x) (from PastaDish ⊑ Dish)
    ∙ Spicy(x) ← Peperonc(x) (from Peperonc ⊑ Spicy)
    ∙ Spicy(x) ← hasIngred(x, y), Spicy(y) (from ∃hasIngred.Spicy ⊑ Spicy)
    ∙ SpicyDish(x) ← Spicy(x), Dish(x) (from Dish ⊓ Spicy ⊑ SpicyDish)
    ∙ Spicy(x) ← ArrabSauce(x) (from ArrabSauce ⊑ Spicy)
    ∙ Spicy(x) ← PenneArrab(x) (from PenneArrab ⊑ Spicy)
    ∙ Dish(x) ← PenneArrab(x) (from PenneArrab ⊑ Dish)
    ∙ SpicyDish(x) ← PenneArrab(x) (from PenneArrab ⊑ SpicyDish)
    (technically, we also have T-independent rules for ⊤...) 53/109
  • 110. (unions of) conjunctive queries IQs quite restricted: No selections and joins as in DB queries Most work on OMQA adopts (unions of) conjunctive queries (CQs) A conjunctive query (CQ) is a first-order query q(⃗x) of the form ∃⃗y.P1(⃗t1) ∧ · · · ∧ Pn(⃗tn) where every variable in some ⃗ti appears in either ⃗x or ⃗y A union of CQs (UCQ) is a first-order query q(⃗x) of the form q1(⃗x) ∨ · · · ∨ qn(⃗x) where the qi(⃗x) are CQs with same tuple ⃗x of free vars 55/109
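  • 112. cqs and ucqs in datalog Alternatively, CQs and UCQs can be seen as Datalog rules
    CQs: $q(\vec{x}) = \exists \vec{y}.\, P_1(\vec{t}_1) \wedge \dots \wedge P_n(\vec{t}_n)$ ⇝ $q(\vec{x}) \leftarrow P_1(\vec{t}_1), \dots, P_n(\vec{t}_n)$
    UCQs (one rule per disjunct):
    $$\begin{array}{rlcl}
    q(\vec{x}) = & \exists \vec{y}_1.\, P^1_1(\vec{t}^{\,1}_1) \wedge \dots \wedge P^1_{n_1}(\vec{t}^{\,1}_{n_1}) & & q(\vec{x}) \leftarrow P^1_1(\vec{t}^{\,1}_1), \dots, P^1_{n_1}(\vec{t}^{\,1}_{n_1}) \\
    \vee & \exists \vec{y}_2.\, P^2_1(\vec{t}^{\,2}_1) \wedge \dots \wedge P^2_{n_2}(\vec{t}^{\,2}_{n_2}) & \rightsquigarrow & q(\vec{x}) \leftarrow P^2_1(\vec{t}^{\,2}_1), \dots, P^2_{n_2}(\vec{t}^{\,2}_{n_2}) \\
    & \vdots & & \vdots \\
    \vee & \exists \vec{y}_\ell.\, P^\ell_1(\vec{t}^{\,\ell}_1) \wedge \dots \wedge P^\ell_{n_\ell}(\vec{t}^{\,\ell}_{n_\ell}) & & q(\vec{x}) \leftarrow P^\ell_1(\vec{t}^{\,\ell}_1), \dots, P^\ell_{n_\ell}(\vec{t}^{\,\ell}_{n_\ell})
    \end{array}$$
    56/109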
  • 114. what can we express as (u)cqs?
    q1(y, x) = ∃z. serves(x, y) ∧ hasIngred(y, z) ∧ Spicy(z)
    q2(y, x) = q1(y, x) ∨ (∃z, z′. serves(x, y) ∧ hasIngred(y, z) ∧ hasIngred(z, z′) ∧ Spicy(z′))
    ∙ capture select-project-join queries of relational algebra
    ∙ capture basic graph patterns of SPARQL
    57/109
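  • 118. query matches
    A match for $q(\vec{x}) = \exists \vec{y}.\, \varphi(\vec{x}, \vec{y})$ in an interpretation $\mathcal{I}$ is a mapping $\pi$ from the variables in $\vec{x} \cup \vec{y}$ to objects in $\Delta^{\mathcal{I}}$ such that:
    ∙ $\pi(\vec{t}) \in P^{\mathcal{I}}$ for every atom $P(\vec{t}) \in q$
    We write $\mathcal{I} \models_\pi q(\vec{a})$ if $\pi$ is a match for $q(\vec{x})$ in $\mathcal{I}$ and $\pi(\vec{x}) = \vec{a}$.
    By definition: $\vec{a} \in \mathrm{ans}(q, \mathcal{I})$ iff there exists a match $\pi$ such that $\mathcal{I} \models_\pi q(\vec{a})$.
    Answering CQs amounts to searching for matches.
    Recall that $\vec{a} \in \mathrm{cert}(q, \mathcal{K})$ iff $\vec{a} \in \mathrm{ans}(q, \mathcal{I})$ for every model $\mathcal{I}$ of $\mathcal{K}$.
    Challenge: how do we check that there is a match in every model? 58/109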
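Searching for matches in a single finite interpretation can be sketched by brute force. The code below is an illustration only: it assumes all query terms are variables, encodes an interpretation as a dictionary from predicate names to sets of tuples, and uses the pasta/pizza structure from the canonical-model example as data. The names `answers`, `interp`, and `z2` (for z′) are hypothetical.

```python
from itertools import product


def answers(atoms, answer_vars, interp, domain):
    """Compute ans(q, I) by trying every mapping pi from variables to objects."""
    variables = sorted({t for _, args in atoms for t in args} | set(answer_vars))
    result = set()
    for values in product(sorted(domain), repeat=len(variables)):
        pi = dict(zip(variables, values))
        # pi is a match iff pi(t) is in P^I for every atom P(t) of q
        if all(tuple(pi[t] for t in args) in interp.get(pred, set())
               for pred, args in atoms):
            result.add(tuple(pi[v] for v in answer_vars))
    return result


domain = {"r", "p", "b", "e1", "e2", "e3", "e4"}
interp = {
    "serves": {("r", "p"), ("r", "b")},
    "hasIngred": {("p", "e1"), ("b", "e2"), ("b", "e3"), ("e3", "e4")},
    "Spicy": {("e1",), ("e4",)},
}

q_one_step = [("serves", ("x", "y")), ("hasIngred", ("y", "z")), ("Spicy", ("z",))]
q_two_steps = [("serves", ("x", "y")), ("hasIngred", ("y", "z")),
               ("hasIngred", ("z", "z2")), ("Spicy", ("z2",))]

ans_one = answers(q_one_step, ("y", "x"), interp, domain)
ans_two = answers(q_two_steps, ("y", "x"), interp, domain)
```

This enumerates |Δ|^k mappings for k variables, so it only shows the definition at work; it does not address the challenge of checking all models.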
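  • 119. the universal model property
    For Horn DLs, each satisfiable $\mathcal{K}$ has a universal model $\mathcal{I}_{\mathcal{K}}$.
    $\mathcal{I}_{\mathcal{K}}$ is ‘contained’ in every model $\mathcal{I}$ of $\mathcal{K}$ ⇝ formally, there is a homomorphism from $\mathcal{I}_{\mathcal{K}}$ to $\mathcal{I}$.
    An answer to a (U)CQ q in $\mathcal{I}_{\mathcal{K}}$ is an answer to q in every model of $\mathcal{K}$ ⇝ matches of (U)CQs are preserved under homomorphisms.
    Hence $\vec{a} \in \mathrm{cert}(q, \mathcal{K})$ iff $\vec{a} \in \mathrm{ans}(q, \mathcal{I}_{\mathcal{K}})$.
    So: $\mathcal{I}_{\mathcal{K}}$ gives us the certain answers to q over $\mathcal{K}$. 59/109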
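  • 122. constructing a universal model
    Use the saturation of $(\mathcal{T}, \mathcal{A})$ for building a universal model $\mathcal{I}_{\mathcal{T},\mathcal{A}}$.
    Intuition:
    - $\mathcal{I}_{\mathcal{T},\mathcal{A}}$ contains the saturated ABox $\mathcal{A}'$
    - if an object satisfies $M$ and $M \sqsubseteq \exists R.M' \in \mathrm{sat}(\mathcal{T})$, a fresh object witnessing this is created
    Formally, $\Delta^{\mathcal{I}_{\mathcal{T},\mathcal{A}}}$ contains words $a R_1 M_1 \dots R_n M_n$ with $a \in \mathrm{Ind}(\mathcal{A})$ and:
    ∙ the $R_i$ are roles and the $M_i$ are conjunctions of concepts
    ∙ there exists $M \sqsubseteq \exists R_1.M_1 \in \mathrm{sat}(\mathcal{T})$ such that $\mathcal{T}, \mathcal{A} \models M(a)$
    ∙ $M_i \sqsubseteq \exists R_{i+1}.M_{i+1} \in \mathrm{sat}(\mathcal{T})$ for every $1 \le i < n$
    (note: use only the strongest axioms $M \sqsubseteq \exists R.M'$ in $\mathrm{sat}(\mathcal{T})$)
    The interpretation function is defined as follows:
    ∙ $a^{\mathcal{I}_{\mathcal{T},\mathcal{A}}} = a$
    ∙ $a \in A^{\mathcal{I}_{\mathcal{T},\mathcal{A}}}$ iff $A(a) \in \mathrm{sat}(\mathcal{T}, \mathcal{A})$; $eRM \in A^{\mathcal{I}_{\mathcal{T},\mathcal{A}}}$ iff $M \sqsubseteq A \in \mathrm{sat}(\mathcal{T})$
    ∙ $(a, b) \in r^{\mathcal{I}_{\mathcal{T},\mathcal{A}}}$ iff $r(a, b) \in \mathrm{sat}(\mathcal{T}, \mathcal{A})$; $(e, eRM) \in r^{\mathcal{I}_{\mathcal{T},\mathcal{A}}}$ iff $R \sqsubseteq r \in \mathrm{sat}(\mathcal{T})$; and $(eRM, e) \in r^{\mathcal{I}_{\mathcal{T},\mathcal{A}}}$ if $R \sqsubseteq r^{-} \in \mathrm{sat}(\mathcal{T})$ 60/109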
  • 123. example of the canonical model construction (1/3)
    TBox:
    ∙ PenneArrab ⊑ ∃hasIngred.Penne
    ∙ Penne ⊑ Pasta
    ∙ PenneArrab ⊑ ∃hasIngred.ArrabSauce
    ∙ ArrabSauce ⊑ ∃hasIngred.Peperonc
    ∙ Peperonc ⊑ Spicy
    ∙ PizzaCalab ⊑ ∃hasIngred.Nduja
    ∙ Nduja ⊑ Spicy
    ABox: serves(r, b), serves(r, p), PenneArrab(b), PizzaCalab(p)
    The saturated TBox additionally contains:
    ∙ PenneArrab ⊑ ∃hasIngred.(Penne ⊓ Pasta)
    ∙ ArrabSauce ⊑ ∃hasIngred.(Peperonc ⊓ Spicy)
    ∙ PizzaCalab ⊑ ∃hasIngred.(Nduja ⊓ Spicy) 61/109
  • 124. example of the canonical model construction (2/3)
    $\mathcal{I}_{\mathcal{T},\mathcal{A}}$ contains the ABox and is closed under inclusions.
    ABox: serves(r, b), serves(r, p), PenneArrab(b), PizzaCalab(p)
    [Figure: individual r with serves edges to p (labelled PizzaCalab) and to b (labelled PenneArrab)] 62/109
  • 125. example of the canonical model construction (3/3)
    The anonymous objects witnessing existential concepts form trees.
    [Figure: r has serves edges to p (PizzaCalab) and b (PenneArrab); p has a hasIngred edge to e1 (Nduja, Spicy); b has hasIngred edges to e2 (Penne, Pasta) and e3 (ArrabSauce); e3 has a hasIngred edge to e4 (Peperonc, Spicy)]
    Axioms used:
    ∙ PenneArrab ⊑ ∃hasIngred.ArrabSauce
    ∙ PenneArrab ⊑ ∃hasIngred.(Penne ⊓ Pasta)
    ∙ ArrabSauce ⊑ ∃hasIngred.(Peperonc ⊓ Spicy)
    ∙ PizzaCalab ⊑ ∃hasIngred.(Nduja ⊓ Spicy) 63/109
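The tree-shaped anonymous part of this example can be generated mechanically. The sketch below is a simplification under stated assumptions: left-hand sides are single concept names, there are no role inclusions or inverse roles, the right-hand side of each (strongest) saturated axiom already lists every concept its witness satisfies, and construction stops at a hypothetical `max_depth` because the model may be infinite in general. Elements are encoded as tuples mirroring the words aR1M1…RnMn.

```python
def canonical_model(abox_concepts, abox_roles, exist_axioms, max_depth):
    """Build the canonical model up to max_depth levels of anonymous objects.

    abox_concepts: dict individual -> set of concept names (saturated ABox).
    abox_roles:    set of (role, source, target) assertions.
    exist_axioms:  list of (A, R, M) standing for A ⊑ ∃R.M, M a frozenset.
    """
    concepts = {(a,): set(cs) for a, cs in abox_concepts.items()}
    edges = {(r, (a,), (b,)) for r, a, b in abox_roles}
    frontier = list(concepts)
    for _ in range(max_depth):
        next_frontier = []
        for e in frontier:
            for lhs, role, rhs in exist_axioms:
                if lhs in concepts[e]:
                    child = e + (role, rhs)      # the word e·R·M
                    concepts[child] = set(rhs)
                    edges.add((role, e, child))
                    next_frontier.append(child)
        frontier = next_frontier
    return concepts, edges


axioms = [
    ("PenneArrab", "hasIngred", frozenset({"Penne", "Pasta"})),
    ("PenneArrab", "hasIngred", frozenset({"ArrabSauce"})),
    ("ArrabSauce", "hasIngred", frozenset({"Peperonc", "Spicy"})),
    ("PizzaCalab", "hasIngred", frozenset({"Nduja", "Spicy"})),
]
abox_c = {"r": set(), "b": {"PenneArrab"}, "p": {"PizzaCalab"}}
abox_r = {("serves", "r", "b"), ("serves", "r", "p")}

concepts, edges = canonical_model(abox_c, abox_r, axioms, max_depth=3)
e4 = ("b", "hasIngred", frozenset({"ArrabSauce"}),
      "hasIngred", frozenset({"Peperonc", "Spicy"}))
```

On this input the construction stabilises after two levels, yielding the three individuals plus the four anonymous objects e1 to e4 of the figure.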
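  • 126. finding answers in the canonical model
    To answer a CQ q, it suffices to test whether it has a match in $\mathcal{I}_{\mathcal{T},\mathcal{A}}$.
    But this is still challenging!
    - $\mathcal{I}_{\mathcal{T},\mathcal{A}}$ contains assertions and objects not present in $\mathcal{A}$
    - we cannot build $\mathcal{I}_{\mathcal{T},\mathcal{A}}$ explicitly: it can be infinite!
    Our approach: use query rewriting!
    Formally: given a CQ q, we construct a UCQ $\mathrm{rew}_{\mathcal{T}}(q)$ such that $\vec{a} \in \mathrm{ans}(q, \mathcal{I}_{\mathcal{T},\mathcal{A}})$ iff there is a match $\pi$ for a disjunct $q'$ of $\mathrm{rew}_{\mathcal{T}}(q)$ such that $\mathcal{I}_{\mathcal{T},\mathcal{A}} \models_\pi q'(\vec{a})$ and $\pi$ sends all variables to individuals from $\mathcal{A}$. 64/109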
  • 129. rewriting the query
    Rewriting step (idea):
    1. Choose a leaf variable x so that no variables are mapped below it in I_T,A
    2. Find an axiom M ⊑ ∃R.N in sat(T) that ensures the atoms containing x are satisfied
    3. Drop from q all atoms with x and add M to the parent of x to get q′
    Properties:
    ∙ Every match for q′ can be extended to a match for q
    ∙ Every match for q contains a match for q′
    ∙ The answers of q and q′ coincide, but the relevant matches for q′ are closer to the ABox than those of q
    We repeatedly apply this rewriting step to obtain a set of queries whose relevant matches range over ABox individuals. 65/109
  • 132. example of a rewriting step (1/2)
    q(y, x) = ∃z, z′. serves(x, y) ∧ hasIngred(y, z) ∧ hasIngred(z, z′) ∧ Spicy(z′)
    We have I_K ⊨π q(b, r) with π(x) = r, π(y) = b, π(z) = e3, π(z′) = e4
    [Figure: canonical model from slide 125, annotated with π(x) at r, π(y) at b, π(z) at e3, π(z′) at e4]
    • Choose z′ as ‘leaf’
    • Choose ArrabSauce ⊑ ∃hasIngred.Spicy ∈ sat(T)
    • RHS ensures hasIngred(z, z′), Spicy(z′)
    • We replace these atoms by ArrabSauce(z)
    q′(y, x) = ∃z. serves(x, y) ∧ hasIngred(y, z) ∧ ArrabSauce(z) 66/109
  • 135. example of a rewriting step (2/2)
    q(y, x) = ∃z, z′. serves(x, y) ∧ hasIngred(y, z) ∧ hasIngred(z, z′) ∧ Spicy(z′)
    q′(y, x) = ∃z. serves(x, y) ∧ hasIngred(y, z) ∧ ArrabSauce(z)
    I_K ⊨π q(b, r) and I_K ⊨π′ q′(b, r)
    [Figure: canonical model from slide 125; π and π′ agree on x (r), y (b), and z (e3); only π maps z′, to e4]
    depth(π) > depth(π′) 67/109
  • 138. another rewriting step (1/2)
    q′(y, x) = ∃z. serves(x, y) ∧ hasIngred(y, z) ∧ ArrabSauce(z)
    I_K ⊨π′ q′(b, r)
    [Figure: canonical model from slide 125, annotated with π′(x) at r, π′(y) at b, π′(z) at e3]
    • Choose z as leaf
    • Choose PenneArrab ⊑ ∃hasIngred.ArrabSauce
    • RHS yields hasIngred(y, z) and ArrabSauce(z)
    • We replace these atoms by PenneArrab(y)
    q′′(y, x) = serves(x, y) ∧ PenneArrab(y) 68/109
  • 143. another rewriting step (2/2)
    q(y, x) = ∃z, z′. serves(x, y) ∧ hasIngred(y, z) ∧ hasIngred(z, z′) ∧ Spicy(z′)
    q′(y, x) = ∃z. serves(x, y) ∧ hasIngred(y, z) ∧ ArrabSauce(z)
    q′′(y, x) = serves(x, y) ∧ PenneArrab(y)
    I_K ⊨π q(b, r), I_K ⊨π′ q′(b, r), and I_K ⊨π′′ q′′(b, r)
    [Figure: canonical model from slide 125; π, π′, π′′ map x to r and y to b; π and π′ map z to e3; π maps z′ to e4]
    depth(π) > depth(π′) > depth(π′′)
    In π′′ all variables are mapped to individuals. 69/109
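The two rewriting steps of this example can be replayed with a small sketch. It is deliberately restricted and is not the full rewriting procedure: it assumes the leaf variable is existential and occurs as the second argument of exactly one role atom, ignores inverse roles and role inclusions, and takes the axiom as given rather than searching sat(T). The function name `rewrite_step` and the variable name `z2` (for z′) are hypothetical.

```python
def rewrite_step(atoms, leaf, axiom):
    """One simplified rewriting step for an axiom M ⊑ ∃R.N.

    atoms: list of (predicate, args); axiom: (M, R, N) where M is a tuple of
    concept names and N a set of concept names. Returns the rewritten atom
    list, or None if the step does not apply.
    """
    lhs, role, rhs = axiom
    role_atoms = [a for a in atoms if len(a[1]) == 2 and leaf in a[1]]
    concept_atoms = [a for a in atoms if a[1] == (leaf,)]
    if len(role_atoms) != 1:
        return None                      # leaf must hang off a single edge
    pred, (parent, child) = role_atoms[0]
    if child != leaf or parent == leaf or pred != role:
        return None
    if not all(c in rhs for c, _ in concept_atoms):
        return None                      # the witness must satisfy all atoms on leaf
    kept = [a for a in atoms if leaf not in a[1]]
    return kept + [(m, (parent,)) for m in lhs]


q = [("serves", ("x", "y")), ("hasIngred", ("y", "z")),
     ("hasIngred", ("z", "z2")), ("Spicy", ("z2",))]
ax_sauce = (("ArrabSauce",), "hasIngred", {"Peperonc", "Spicy"})
ax_penne = (("PenneArrab",), "hasIngred", {"ArrabSauce"})

q_prime = rewrite_step(q, "z2", ax_sauce)
q_second = rewrite_step(q_prime, "z", ax_penne)
not_a_leaf = rewrite_step(q, "z", ax_penne)   # z still has z2 below it
```

Applying the step to z before z′ is rejected, which reflects the requirement that no variable be mapped below the chosen leaf.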
  • 146. decision procedure
    Theorem: For every satisfiable ELHI⊥ KB K = (T, A) and CQ q(x⃗): a⃗ ∈ cert(q, K) iff I_K ⊨π q′(a⃗) for some q′ ∈ rew_T(q) and some π that maps all variables to individuals in A.
    ∙ There is a bounded number of such restricted matches π
    ∙ Checking whether π is a match reduces to linearly many instance checks
    This yields a terminating, sound, and complete CQ answering procedure. 70/109
  • 149. complexity of cq answering
    Combined complexity:
    ∙ sat(T) and rew_T(q) can be constructed in single exponential time
    ∙ single exponential bound on candidate matches π
    ∙ instance checking in single exponential time
    Data complexity:
    ∙ sat(T) and rew_T(q) are ABox independent
    ∙ polynomial bound on candidate matches π
    ∙ instance checking in polynomial time
    Theorem: CQ answering in ELHI⊥ and Horn-SHIQ is Exp-complete in combined complexity and P-complete in data complexity. 71/109
  • 153. optimal bounds for lightweight dls
    Adapting our technique gives optimal bounds for lightweight DLs.
    For ELH and DL-LiteR we get NP in combined complexity:
    ∙ compute sat(T) in polynomial time
    ∙ non-deterministically build the right q′ ∈ rew_T(q)
    ∙ guess a candidate π
    ∙ check if it is a match ⇝ instance checking in polynomial time
    CQ answering is already NP-hard over the ABox alone, seen as a DB (no TBox).
    For EL in data complexity, the technique yields P membership ⇝ optimal, since instance queries are already P-hard.
    In DL-LiteR, we can get an FO rewriting ⇝ in AC0 for data complexity.
    Theorem: CQ answering in ELH and DL-LiteR is NP-complete in combined complexity. For ELH the data complexity is P-complete, and for DL-LiteR the data complexity is in AC0. 72/109
  • 155. datalog rewriting
    Our procedure yields a Datalog rewriting:
    ∙ rew_T(q) is a UCQ ⇝ translate it into a set of Datalog rules Π_rew(q)
    ∙ use Q in the head of the rules
    ∙ the program Π(T) (from earlier) computes all entailed ABox assertions
    (Π_rew(q) ∪ Π(T), Q) is a Datalog rewriting of q w.r.t. T relative to consistent ABoxes. 73/109
  • 159. combined approach for cqs in elhi
    Alternative: combined approach (saturation and rewriting).
    We know that it suffices to evaluate the UCQ rew_T(q) over the set of ABox assertions entailed by the KB K.
    We also know: assertions entailed by K = assertions in sat(K).
    Materialize the assertions in sat(K) and view the result as a database:
    + we only need to evaluate a UCQ, so standard relational database systems can be used
    – materializing is not always convenient: the saturation needs to be updated if the data changes 74/109
  • 161. an fo rewriting approach for cqs in dl-lite
    For DL-LiteR we can generate an FO-rewriting as follows.
    In every q′ ∈ rew_T(q), replace each atom by its FO-rewriting for instance checking:
    ∙ replace each A(t) by RewriteIQ(A, T)
    ∙ replace each r(t, t′) by RewriteIQ(r, T)
    Resulting FO formula:
    ∙ positive, can be transformed into a UCQ
    ∙ rewriting relative to consistent ABoxes
    ∙ can be combined with a rewriting of unsatisfiability
    ∙ yields AC0 upper bound in data complexity 75/109
  • 163. a glimpse beyond
    Other Horn DLs
    ∙ Similar results hold for other dialects of DL-Lite and EL
    ∙ Sometimes complexity increases, e.g., EL with complex role inclusions
    ∙ The P data and Exp combined upper bounds extend to even more expressive Horn DLs, like Horn-SHOIQ
    Beyond Horn DLs
    ∙ no universal model property
    ∙ query answering usually exponentially harder
    ∙ different techniques: automata, rolling-up, resolution, etc.
    ∙ for some well-known DLs (e.g., SHOIQ) decidability open 76/109
  • 164. complexity of answering (u)cqs
    DL | IQs, data complexity | IQs, combined complexity | CQs, data complexity | CQs, combined complexity
    DL-Lite, DL-LiteR | in AC0 | NLogSpace | in AC0 | NP
    EL, ELH | P | P | P | NP
    ELI, ELHI⊥, Horn-SHOIQ | P | Exp | P | Exp
    ALC, ALCHQ | coNP | Exp | coNP | Exp
    ALCI, SH, SHIQ | coNP | Exp | coNP | 2Exp
    SHOIQ | coNP-hard | coNExp | coNP-hard (1) | coN2Exp-hard (1)
    (1) decidability open 77/109
  • 168. limitations of (u)cqs
    Some very natural queries are not expressible as CQs:
    - find dishes that contain something spicy
    - is a a relative of b?
    - is there a bus connection from X to Y?
    We need navigational queries: queries that can inspect the topology of the graph, that is, the information stored in the connections.
    Especially important for highly connected data with no fixed schema
    ∙ social, biological, and chemical networks 79/109
  • 171. navigational queries
    Prominent navigational query languages:
    Regular Path Queries (RPQs): find pairs of objects that are connected by a chain of roles that complies with a given regular language
    (hasCourse ∪ courseOf⁻) · (hasIngred ∪ ingredOf⁻)* · Spicy?(x, y)
    Conjunctive RPQs: allow RPQs to be joined conjunctively
    ∙ similar to CQs, but each atom is an RPQ
    ∙ extend CQs with the navigational power of RPQs
    q(x, x′) = ∃y, z. serves · Menu? · (hasMain ∪ hasStarter)(x, y) ∧ serves · Menu? · (hasCourse ∪ courseOf⁻)(x′, y) ∧ (hasIngred ∪ ingredOf⁻)* · Spicy?(y, z)
    Both languages have 1-way and 2-way variants. 80/109
  • 173. our most expressive navigational queries: c2rpqs
    Recall: $N^{\pm}_R$ contains all role names and their inverses.
    A conjunctive two-way regular path query (C2RPQ) has the form $q(\vec{x}) = \exists \vec{y}.\, \bigwedge L(t, t') \wedge \bigwedge A(t)$ where
    ∙ $A$ is a concept name
    ∙ $t, t'$ are variables or individuals (in $N_I \cup \vec{x} \cup \vec{y}$)
    ∙ $L$ is a regular language over $N^{\pm}_R \cup \{A? \mid A \in N_C\}$
    Regular languages can be given as:
    ∙ regular expressions $E \to R \mid A? \mid E \cdot E \mid E \cup E \mid E^{*}$ with $R \in N^{\pm}_R$ and $A \in N_C$
    ∙ non-deterministic finite automata (NFAs)
    Recall: regular expressions and NFAs are equivalent, but NFAs are more succinct. 81/109
  • 176. other navigational query languages
    Conjunctive (one-way) regular path queries (CRPQs) disallow inverses ⇝ regular expressions use only (direct) role names
    q(x, x′) = ∃y, z. serves · Menu? · hasCourse(x, y) ∧ serves · Menu? · hasCourse(x′, y) ∧ hasIngred* · Spicy?(y, z)
    q(x) = ∃y. hasIngred* · Spicy?(x, y)
    Two-way regular path queries (2RPQs) have only one atom and no existential variables ⇝ both variables are answer variables
    q(x, y) = (hasIngred ∪ ingredOf⁻)* · Spicy?(x, y)
    q(x, y) = (hasIngred ∪ ingredOf⁻)* · Spicy? · Σ*(x, y)
    (One-way) regular path queries (RPQs) are 2RPQs with no inverses ⇝ all of the restrictions above
    q(x, y) = hasIngred* · Spicy?(x, y)
    q(x, y) = hasCourse · hasIngred* · Spicy?(x, y) 82/109
  • 179. semantics of c2rpqs
    Satisfaction of atoms $L(t, t')$: $(d, d') \in L^{\mathcal{I}}$ if there is an L-path from $d$ to $d'$, i.e.,
    ∙ a sequence $e_0 e_1 \dots e_n$ of objects from $\Delta^{\mathcal{I}}$ with $e_0 = d$ and $e_n = d'$
    ∙ a word $u_1 u_2 \dots u_n \in L$ over $N^{\pm}_R \cup \{A? \mid A \in N_C\}$
    such that, for every $1 \le i \le n$:
    ∙ if $u_i = A?$, then $e_{i-1} = e_i \in A^{\mathcal{I}}$
    ∙ if $u_i = R \in N^{\pm}_R$, then $(e_{i-1}, e_i) \in R^{\mathcal{I}}$
    Match: mapping $\pi$ from terms to elements that satisfies all atoms.
    As before: $\mathcal{I} \models_\pi q(\vec{a})$ if the match $\pi$ maps the answer variables to $\vec{a}$.
    Certain answers are defined as for CQs.
    Again it suffices to find a match in the canonical model. 83/109
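The L-path semantics can be checked on a finite interpretation by exploring pairs (object, NFA state). The following sketch makes several assumptions that are not part of the slides: labels are plain strings, with `A?` for a test, `r-` for an inverse role, and `r` for a role; the interpretation is the finite pasta/pizza structure from the canonical-model example; and the function name `l_path_targets` is hypothetical.

```python
from collections import deque


def l_path_targets(delta, start, finals, concepts, edges, source):
    """Return every d' such that an L-path leads from source to d'.

    delta: set of NFA transitions (state, label, state);
    concepts: dict object -> set of concept names;
    edges: set of (role, from, to) triples.
    """
    seen = {(source, start)}
    queue = deque(seen)
    while queue:
        e, s = queue.popleft()
        for s1, label, s2 in delta:
            if s1 != s:
                continue
            if label.endswith("?"):          # test A?: stay at e if e is in A
                targets = [e] if label[:-1] in concepts.get(e, set()) else []
            elif label.endswith("-"):        # inverse role: follow edges backwards
                targets = [a for r, a, b in edges if r == label[:-1] and b == e]
            else:                            # role name: follow edges forwards
                targets = [b for r, a, b in edges if r == label and a == e]
            for t in targets:
                if (t, s2) not in seen:
                    seen.add((t, s2))
                    queue.append((t, s2))
    return {e for e, s in seen if s in finals}


# NFA for hasIngred* · Spicy?
delta = {("s0", "hasIngred", "s0"), ("s0", "Spicy?", "sf")}
concepts = {"e1": {"Nduja", "Spicy"}, "e4": {"Peperonc", "Spicy"}}
edges = {("serves", "r", "p"), ("serves", "r", "b"), ("hasIngred", "p", "e1"),
         ("hasIngred", "b", "e2"), ("hasIngred", "b", "e3"), ("hasIngred", "e3", "e4")}

from_b = l_path_targets(delta, "s0", {"sf"}, concepts, edges, "b")
from_p = l_path_targets(delta, "s0", {"sf"}, concepts, edges, "p")
from_r = l_path_targets(delta, "s0", {"sf"}, concepts, edges, "r")
```

The search visits each (object, state) pair at most once, so it terminates on any finite structure.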
  • 182. answering 2rpqs
    We focus on answering 2RPQs: one atom, no existential variables.
    Candidate matches range over individuals only, so their number is bounded.
    Challenge: paths may need to go deep into the canonical model.
    q(x, y) = serves · (hasIngred ∪ ingredOf⁻)* · Spicy? · Σ*(x, y)
    [Figure: canonical model from slide 125, annotated with π(x) at r and π(y) at b] 84/109
  • 185. loops through the anonymous part
    Goal: compact representation of all ways in which paths through the anonymous part can participate in matches.
    We use the NFA representation.
    [Figure: NFA α with states s0, s1, sf; a serves transition from s0 to s1; self-loops on s1 labelled hasIngred and ingredOf⁻; a Spicy? transition from s1 to sf; a Σ* self-loop on sf]
    We write M ∈ Loopα[s, s′] iff a ∈ M^{I_K} implies the existence of a path p below a that takes the NFA α from s to s′, e.g.,
    PenneArrab ∈ Loopα[s1, sf] because of
    ∙ PenneArrab ⊑ ∃hasIngred.ArrabSauce
    ∙ ArrabSauce ⊑ ∃hasIngred.(Peperonc ⊓ Spicy) 85/109
  • 187. computing the loop table
    We can explicitly compute the full table Loopα inductively:
    ∙ if s is a state, then Loopα[s, s] = N_C
    ∙ if M1 ∈ Loopα[s1, s2] and M2 ∈ Loopα[s2, s3], then M1 ⊓ M2 ∈ Loopα[s1, s3]
    ∙ if T ⊨ C1 ⊓ ··· ⊓ Cn ⊑ A and (s1, A?, s2) ∈ δ, then C1 ⊓ ··· ⊓ Cn ∈ Loopα[s1, s2]
    ∙ if T ⊨ C1 ⊓ ··· ⊓ Cn ⊑ ∃R.D, T ⊨ R ⊑ R′, T ⊨ R ⊑ R′′, (s1, R′, s2) ∈ δ, D ∈ Loopα[s2, s3], and (s3, R′′⁻, s4) ∈ δ, then C1 ⊓ ··· ⊓ Cn ∈ Loopα[s1, s4] 86/109
  • 191. computing loops: an example
    [Figure: NFA α with states s0, s1, sf (serves from s0 to s1; self-loops hasIngred and ingredOf⁻ on s1; Spicy? from s1 to sf; Σ* self-loop on sf), next to the canonical model from slide 125]
    ∙ Peperonc ∈ Loopα[s1, sf] because (s1, Spicy?, sf) ∈ δ and Peperonc ⊑ Spicy
    ∙ ArrabSauce ∈ Loopα[s1, sf] because (s1, hasIngred, s1), (sf, hasIngred⁻, sf) ∈ δ, ArrabSauce ⊑ ∃hasIngred.Peperonc, and Peperonc ∈ Loopα[s1, sf]
    ∙ PenneArrab ∈ Loopα[s1, sf] because (s1, hasIngred, s1), (sf, hasIngred⁻, sf) ∈ δ, PenneArrab ⊑ ∃hasIngred.ArrabSauce, and ArrabSauce ∈ Loopα[s1, sf] 87/109
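The derivations in this example can be reproduced by a fixpoint computation. The sketch below is a restricted illustration, not the full table: it tracks single concept names only (so it is sound but incomplete for conjunctions), ignores role inclusions, assumes a precomputed set `subsumed` of entailed atomic subsumptions, and approximates the Σ* self-loop on sf by explicit transitions. All identifiers are hypothetical.

```python
def loop_table(concept_names, states, delta, subsumed, exist_axioms):
    """Fixpoint of the loop rules, restricted to single concept names.

    Entries are triples (C, s, s') meaning C ∈ Loop[s, s'].
    subsumed: set of pairs (C, A) with T ⊨ C ⊑ A;
    exist_axioms: list of (C, R, D) for C ⊑ ∃R.D, with D a set of names.
    """
    loop = {(c, s, s) for c in concept_names for s in states}
    while True:
        new = set()
        for s1, label, s2 in delta:                    # test rule
            if label.endswith("?"):
                new |= {(c, s1, s2) for c, a in subsumed if a == label[:-1]}
        for c, role, rhs in exist_axioms:              # go down, loop, come back
            for s1, down, s2 in delta:
                for s3, up, s4 in delta:
                    if down == role and up == role + "-" and any(
                            (d, s2, s3) in loop for d in rhs):
                        new.add((c, s1, s4))
        for c, s1, s2 in loop:                         # composition rule
            for c2, t, s3 in loop:
                if c == c2 and s2 == t:
                    new.add((c, s1, s3))
        if new <= loop:
            return loop
        loop |= new


names = {"PenneArrab", "PizzaCalab", "Penne", "Pasta", "ArrabSauce",
         "Peperonc", "Nduja", "Spicy"}
states = {"s0", "s1", "sf"}
delta = {("s0", "serves", "s1"), ("s1", "hasIngred", "s1"),
         ("s1", "ingredOf-", "s1"), ("s1", "Spicy?", "sf"),
         ("sf", "hasIngred", "sf"), ("sf", "hasIngred-", "sf"),
         ("sf", "serves", "sf"), ("sf", "serves-", "sf")}
subsumed = {("Peperonc", "Spicy"), ("Nduja", "Spicy"), ("Spicy", "Spicy")}
exist_axioms = [("PenneArrab", "hasIngred", {"Penne", "Pasta"}),
                ("PenneArrab", "hasIngred", {"ArrabSauce"}),
                ("ArrabSauce", "hasIngred", {"Peperonc", "Spicy"}),
                ("PizzaCalab", "hasIngred", {"Nduja", "Spicy"})]

table = loop_table(names, states, delta, subsumed, exist_axioms)
```

The three memberships listed above appear in successive iterations: Peperonc first, then ArrabSauce, then PenneArrab.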
  • 193. evaluating 2rpqs using the loop table
    Non-deterministic algorithm to decide (a, b) ∈ cert(α(x, y), K)
    Input: NFA α = (S, Σ, δ, s0, F), KB K = (T, A), pair (a, b) of individuals from A
    ∙ After checking consistency, we start from (a, s0)
    ∙ At pair (c, s), guess a new pair (d, s′) together with one of:
      ∙ a transition (s, σ, s′): a σ-step from c to d in the ABox ⇝ check if (c, d) ∈ σ^I
      ∙ concepts M in Loopα[s, s′]: stay at the same individual, and jump to s′ ⇝ check if c = d ∈ M^I
    ∙ Exit when we reach a pair (b, sf)
    ∙ Use a counter to ensure termination (we only need to consider each pair once) 88/109
  • 194. evaluation algorithm
    Algorithm EvalAtom
    Input: NFA α = (S, Σ, δ, s0, F) with Σ ⊆ N±_R ∪ {A? | A ∈ N_C}, ELHI⊥ KB (T, A), (a, b) ∈ Ind(A) × Ind(A)
    1. Test whether (T, A) is satisfiable, output yes if not.
    2. Initialize current = (a, s0) and count = 0. Set max = |A| · |S| + 1.
    3. While count < max and current ∉ {(b, sf) | sf ∈ F}:
       3.1 Let current = (c, s).
       3.2 Guess a pair (d, s′) ∈ Ind(A) × S and either (s, σ, s′) ∈ δ or M ∈ Loopα[s, s′].
       3.3 If (s, σ, s′) was guessed:
           ∙ If σ ∈ N±_R, then verify that T, A ⊨ σ(c, d), and return no if not.
           ∙ If σ = A?, then verify that c = d and T, A ⊨ A(c), and return no if not.
       3.4 If M was guessed, then verify that c = d and that T, A ⊨ B(c) for every concept name B ∈ M, and return no if not.
       3.5 Set current = (d, s′) and increment count.
    4. If current = (b, sf) for some sf ∈ F, return yes. Else return no. 89/109
  • 195. evaluation algorithm: example (1/2)
    q(x, y) = serves · (hasIngred ∪ ingredOf⁻)* · Spicy? · Σ*(x, y)
    ABox: serves(r, b), serves(r, p), PenneArrab(b), PizzaCalab(p)
    TBox:
    ∙ PenneArrab ⊑ PastaDish ⊓ ∃hasIngred.ArrabSauce
    ∙ PastaDish ⊑ Dish ⊓ ∃hasIngred.Pasta
    ∙ ArrabSauce ⊑ ∃hasIngred.Peperonc
    ∙ Peperonc ⊔ ∃hasIngred.Spicy ⊑ Spicy
    ∙ Spicy ⊓ Dish ⊑ SpicyDish
    [Figure: canonical model from slide 125, annotated with π(x) at r and π(y) at b] 90/109
  • 197. evaluation algorithm: example (2/2)
    ABox: serves(r, b), serves(r, p), PenneArrab(b), PizzaCalab(p)
    q(x, y) = serves · (hasIngred ∪ ingredOf⁻)* · Spicy? · Σ*(x, y)
    [Figure: NFA α with states s0, s1, sf; serves from s0 to s1; self-loops hasIngred and ingredOf⁻ on s1; Spicy? from s1 to sf; Σ* self-loop on sf]
    Loop table entries: Peperonc ∈ Loopα[s1, sf], ArrabSauce ∈ Loopα[s1, sf], PenneArrab ∈ Loopα[s1, sf]
    Run of the algorithm:
    ∙ count 0: current pair (r, s0)
    ∙ count 1: guess (b, s1) with (s0, serves, s1) ∈ δ; test (r, b) ∈ serves^I
    ∙ count 2: guess (b, sf) with PenneArrab ∈ Loopα[s1, sf]; test b ∈ PenneArrab^I
    ∙ return yes 91/109
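The run above can be replayed deterministically by replacing the guesses with a breadth-first search over (individual, state) pairs. This is a sketch under assumptions, not the EvalAtom procedure itself: the KB is taken to be satisfiable, entailed concept and role assertions over individuals are supplied as precomputed inputs, loop table entries are single concept names, and the Σ* self-loop on sf is approximated by explicit transitions. The name `eval_atom` and the string encoding of labels are hypothetical.

```python
from collections import deque


def eval_atom(delta, start, finals, loop, concepts, edges, a, b):
    """Decide whether (b, sf) with sf final is reachable from (a, start).

    loop: set of (concept, s, s') loop table entries;
    concepts: dict individual -> entailed concept names;
    edges: set of entailed role assertions (role, from, to).
    Assumes the KB is satisfiable (the original returns yes otherwise).
    """
    seen = {(a, start)}
    queue = deque(seen)
    while queue:
        c, s = queue.popleft()
        if c == b and s in finals:
            return True
        succ = set()
        for s1, label, s2 in delta:                    # ABox steps and tests
            if s1 != s:
                continue
            if label.endswith("?"):
                if label[:-1] in concepts.get(c, set()):
                    succ.add((c, s2))
            elif label.endswith("-"):
                succ |= {(x, s2) for r, x, y in edges if r == label[:-1] and y == c}
            else:
                succ |= {(y, s2) for r, x, y in edges if r == label and x == c}
        for m, s1, s2 in loop:                         # jump through anonymous part
            if s1 == s and m in concepts.get(c, set()):
                succ.add((c, s2))
        for pair in succ - seen:
            seen.add(pair)
            queue.append(pair)
    return False


delta = {("s0", "serves", "s1"), ("s1", "hasIngred", "s1"),
         ("s1", "ingredOf-", "s1"), ("s1", "Spicy?", "sf"),
         ("sf", "hasIngred", "sf"), ("sf", "hasIngred-", "sf"),
         ("sf", "serves", "sf"), ("sf", "serves-", "sf")}
loop = {("PenneArrab", "s1", "sf"), ("PizzaCalab", "s1", "sf")}
concepts = {"r": set(), "b": {"PenneArrab"}, "p": {"PizzaCalab"}}
edges = {("serves", "r", "b"), ("serves", "r", "p")}

r_to_b = eval_atom(delta, "s0", {"sf"}, loop, concepts, edges, "r", "b")
b_to_r = eval_atom(delta, "s0", {"sf"}, loop, concepts, edges, "b", "r")
```

Since each pair is enqueued at most once, the search needs no explicit counter; the bound |A| · |S| on visited pairs plays the same role.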
  • 202. complexity of the algorithm
    Theorem: (a, b) ∈ cert(q, K) iff there is some execution of EvalAtom(α, K, (a, b)) that returns yes.
    ∙ Iterations are bounded by the counter (poly. counter ⇝ log space)
    ∙ We need calls to procedures for: satisfiability, instance checking, membership in the Loopα table
    ∙ These calls are in Exp for ELHI⊥ (Loopα computation: polynomially many iterations, each one tests entailment)
    Exp upper bound for ELHI⊥ (combined complexity)
    For ELH and DL-Lite, we can obtain a P upper bound (combined)
    Data complexity:
    ∙ Loopα computation in constant time (ABox independent)
    ∙ called procedures in P for ELH and NLogSpace for DL-LiteR
    92/109
  • 206. complexity bounds
    Theorem:
    ∙ For ELHI⊥, 2RPQ answering is Exp-complete in combined complexity and P-complete in data complexity
    ∙ For DL-LiteR and ELH, the combined complexity drops to P-complete
    ∙ In data complexity, the problem is NLogSpace-complete for DL-LiteR and P-complete for ELH
    Most matching lower bounds come from simpler problems:
    ∙ instance checking
    ∙ graph reachability = RPQ over a plain ABox
    P-hardness for DL-LiteR is non-trivial
    93/109
  • 209. answering c2rpqs
    For answering C2RPQs, we combine the ideas above:
    ∙ rewrite the query so that matches ranging over individuals suffice
    ∙ in each step, consider possibly deeper paths with the Loopα table
    After rewriting, guess matches using individuals only and check them using EvalAtom on each atom
    This algorithm works for all DLs discussed and gives optimal complexity bounds
    Answering C2RPQs is not much harder:
    ∙ Combined complexity increases to PSpace for DL-LiteR and ELH
    ∙ but most other bounds are the same as for RPQs and CQs
    ∙ even for very expressive DLs that are not Horn
    94/109
  • 210. final remarks on navigational queries
    Navigational queries provide more querying power at moderate computational cost
    Expressible in Datalog, but computationally better behaved
    Good alternative to CQs, gaining increasing attention
    Property paths in SPARQL:
    ∙ included in the SPARQL 1.1 standard
    ∙ add regular paths as in C2RPQs
    Ongoing quest for more flexible navigational languages
    95/109
  • 211. complexity of answering (c)(2)rpqs
    DL                      | 2RPQs, data | 2RPQs, combined | C2RPQs, data | C2RPQs, combined
    DL-Lite, DL-LiteR       | NLogSpace   | P               | NLogSpace    | PSpace
    EL, ELH                 | P           | P               | P            | PSpace
    ELI, ELHI⊥, Horn-SHOIQ  | P           | Exp             | P            | Exp
    ALC, ALCHQ              | coNP        | Exp             | coNP-hard    | 2Exp
    ALCI, SH, SHIQ          | coNP        | Exp             | coNP-hard    | 2Exp
    SHOIQ                   | coNP-hard   | coNExp          | coNP-hard¹   | coN2Exp-hard¹
    ¹ decidability open
    96/109
  • 212. comparison: complexity of answering (u)cqs
    DL                      | IQs, data | IQs, combined | CQs, data  | CQs, combined
    DL-Lite, DL-LiteR       | in AC0    | NLogSpace     | in AC0     | NP
    EL, ELH                 | P         | P             | P          | NP
    ELI, ELHI⊥, Horn-SHOIQ  | P         | Exp           | P          | Exp
    ALC, ALCHQ              | coNP      | Exp           | coNP       | Exp
    ALCI, SH, SHIQ          | coNP      | Exp           | coNP       | 2Exp
    SHOIQ                   | coNP-hard | coNExp        | coNP-hard¹ | coN2Exp-hard¹
    ¹ decidability open
    97/109
  • 213. queries with negation or recursion
  • 216. queries with negation
    Conjunctive query with safe negation (CQ¬s):
    ∙ like a CQ, but can also have negated atoms
    ∙ safety condition: every variable occurs in some positive atom
    ∙ example: find menus whose main course is not spicy
      ∃y Menu(x) ∧ hasMain(x, y) ∧ ¬Spicy(y)
    Conjunctive query with inequalities (CQ≠):
    ∙ like a CQ, but can also have atoms t1 ≠ t2 (t1, t2 variables or individuals)
    ∙ example: find restaurants offering two menus with different dessert courses
      ∃y1 y2 z1 z2 offers(x, y1) ∧ Menu(y1) ∧ hasDessert(y1, z1) ∧ offers(x, y2) ∧ Menu(y2) ∧ hasDessert(y2, z2) ∧ z1 ≠ z2
    ∙ example: find menus with at least three courses (see notes)
    Note: unions of such queries (UCQ¬s and UCQ≠) can be defined in the obvious way
    99/109
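    For the last example, one possible way to write such a query is sketched below. The role name hasCourse is an assumption made for illustration; it does not appear on the slides, and the lecture notes may phrase the query differently.

```latex
q(x) = \exists y_1\, y_2\, y_3\;
  \mathit{Menu}(x)
  \land \mathit{hasCourse}(x, y_1)
  \land \mathit{hasCourse}(x, y_2)
  \land \mathit{hasCourse}(x, y_3)
  \land y_1 \neq y_2
  \land y_1 \neq y_3
  \land y_2 \neq y_3
```

    The three pairwise inequalities are what force the three courses to be distinct; without them a single course could match all three atoms.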
  • 218. undecidability results for queries with negation
    Adding negation leads to undecidability even in very restricted settings.
    Theorem: the following problems are undecidable:
    ∙ CQ¬s answering in DL-LiteR
    ∙ UCQ¬s answering in EL⊥
    ∙ CQ≠ answering in DL-LiteR
    ∙ CQ≠ answering in EL⊥
    Possible solution: adopt alternative semantics (see lecture notes)
    100/109
  • 220. undecidability results for queries with recursion
    Significant interest in combining DLs with Datalog rules
    Unfortunately, this almost always leads to undecidability:
    Theorem: Datalog query answering is undecidable in every DL that can express (directly or indirectly) A ⊑ ∃r.A
    In particular: undecidable in both DL-Lite and EL
    Possible solutions:
    ∙ use restricted classes of Datalog queries (e.g. path queries)
    ∙ DL-safe rules: rules can only be applied to (named) individuals
    101/109
  • 222. efficient omqa
    Lots of work on developing and implementing efficient OMQA algorithms
    Focus mostly on DL-Lite (and related dialects):
    ∙ First algorithm, PerfectRef, proposed in the mid-2000s
    ∙ Rewrites into UCQs, implemented in Quonto
    ∙ Improved versions proposed in Requiem, Presto, Rapid, …
    ∙ Some algorithms rewrite into positive existential queries or Datalog programs instead of UCQs
    ∙ The resulting queries are smaller and can be easier to evaluate
    103/109
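    To give a feel for rewriting into a UCQ, here is a minimal sketch for the simplest case only: an atomic query A(x) and a TBox whose inclusions have a basic concept (an atomic concept, ∃r or ∃r−) on each side. It is not PerfectRef and ignores role inclusions, negative inclusions and queries with several atoms. The data reuses the teaching-staff example from the introduction; the encoding and all function names are illustrative assumptions.

```python
# Basic concepts: ("A", name) is an atomic concept,
# ("E", role, inverse) stands for ∃role (inverse=False) or ∃role− (inverse=True).
TBOX = [
    (("A", "Professor"), ("A", "TeachingStaff")),       # Professor ⊑ TeachingStaff
    (("E", "teaches", False), ("A", "TeachingStaff")),  # ∃teaches ⊑ TeachingStaff
]

CONCEPT_FACTS = {("Professor", "marie")}
ROLE_FACTS = {("teaches", "mark", "cs200")}


def rewrite(goal):
    """Collect every basic concept B with B ⊑ ... ⊑ goal by chaining inclusions."""
    result = {goal}
    frontier = [goal]
    while frontier:
        current = frontier.pop()
        for lhs, rhs in TBOX:
            if rhs == current and lhs not in result:
                result.add(lhs)
                frontier.append(lhs)
    return result


def evaluate(disjuncts):
    """Evaluate the union of one-atom queries over the plain ABox."""
    answers = set()
    for d in disjuncts:
        if d[0] == "A":
            answers |= {i for (c, i) in CONCEPT_FACTS if c == d[1]}
        else:
            _, role, inverse = d
            answers |= {(y if inverse else x) for (r, x, y) in ROLE_FACTS if r == role}
    return answers


ucq = rewrite(("A", "TeachingStaff"))
staff = evaluate(ucq)
```

    Here the rewriting of TeachingStaff(x) has three disjuncts, TeachingStaff(x), Professor(x) and ∃y teaches(x, y), and evaluating them over the data alone yields both Marie and Mark without touching the TBox again.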
  • 223. optimizations and omqa beyond dl-lite
    Tractable classes, fragments of lower complexity
    Rewriting engines have also been developed for other Horn DLs, e.g.:
    ∙ Requiem and the related Kyrie cover several EL dialects
    ∙ Clipper and, more recently, Rapid cover Horn-SHIQ
    They usually rewrite into Datalog programs
    104/109
  • 224. understanding rewritability
    Much attention devoted to understanding the limits of rewritability and the size of rewritings
    ∙ When are polynomial rewritings possible?
    ∙ Can we give bounds on the size of rewritings?
    ∙ Which non-DL-Lite ontologies can be rewritten into FO-queries?
    ⇝ related to non-uniform complexity:
    ∙ study specific pairs (q, T), called ontology-mediated queries
    105/109
  • 225. combined approaches
    Saturate the ABox using the TBox axioms (⇝ a finite version of the canonical model) and then evaluate the query over the saturated ABox
    Two approaches:
    ∙ modify the query before evaluation to ensure soundness, or
    ∙ evaluate and then filter unsound answers
    First proposed for EL, then also for DL-Lite
    Extended to other dialects and richer DLs
    106/109
  • 226. querying existing relational data using mappings
    Today: assumed data given as ABox assertions (unary + binary facts)
    Problem: how to query existing relational data (arbitrary arity)?
    Solution: use a mapping that specifies the relationship between the database relations and the concepts / roles in the DL vocabulary
    Formally: mapping assertions of the form φ → ψ where:
    ∙ φ is a query formulated using DB relations
    ∙ ψ is a query in the DL vocabulary
    Global-as-view (GAV) mappings: φ is a CQ, ψ is an atom (no quantifiers)
    Handling mappings:
    ∙ apply mappings to generate the ABox, then proceed as usual
    ∙ virtual ABox: unfolding step to get a rewriting over DB relations
    107/109
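    The first option, applying mappings to generate an ABox, can be sketched as follows. This is an illustration under stated assumptions rather than the behaviour of any particular system: the relation employee and its columns are hypothetical, and each GAV mapping is modelled as a source relation, a row filter standing in for the body φ, and a function producing the head atom ψ.

```python
# Hypothetical relational source: one relation of arbitrary arity.
DATABASE = {
    "employee": [
        {"id": "marie", "position": "professor", "course": None},
        {"id": "mark", "position": "lecturer", "course": "CS200"},
    ]
}

# GAV-style mappings: (source relation, condition on a row, head atom built from the row).
MAPPINGS = [
    ("employee",
     lambda row: row["position"] == "professor",
     lambda row: ("Professor", row["id"])),
    ("employee",
     lambda row: row["course"] is not None,
     lambda row: ("teaches", row["id"], row["course"])),
]


def materialize_abox(database, mappings):
    """Apply every mapping to every matching row and collect the ABox assertions."""
    abox = set()
    for relation, condition, head in mappings:
        for row in database.get(relation, []):
            if condition(row):
                abox.add(head(row))
    return abox


abox = materialize_abox(DATABASE, MAPPINGS)
```

    The resulting set contains the unary fact Professor(marie) and the binary fact teaches(mark, CS200), which can then be queried like any other ABox. The virtual-ABox option avoids this materialization by unfolding the mappings into the rewritten query instead.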
  • 227. other research topics (non-exhaustive)
    Beyond classical OMQA:
    ∙ inconsistency-tolerant query answering
    ∙ probabilistic query answering
    ∙ privacy-aware query answering
    ∙ temporal query answering
    Support for building and maintaining OMQA systems:
    ∙ module extraction
    ∙ ontology evolution
    ∙ query inseparability and emptiness
    Improving the usability of OMQA systems:
    ∙ interfaces and support for query formulation
    ∙ explaining query (non-)answers
    108/109