Computing Full Disjunctions. Yaron Kanza and Yehoshua Sagiv, The Selim and Rachel Benin School of Engineering and Computer Science, The Hebrew University of Jerusalem
Overview of the Talk: OR-semantics and weak semantics for querying incomplete data. Complexity of query evaluation. Full disjunctions as a special case of weak semantics. Generalizing full disjunctions (the join constraints are not restricted to equality constraints). Lower bounds for some related problems.
Querying Incomplete Data Requires a Special Semantics: Usually, answers to a query are complete assignments of database objects (or values) to the query variables. Consequently, partial information is lost. For example, dangling tuples are lost when joining several relations. The purpose of outerjoins and full disjunctions is to solve this problem, i.e., answers may be partial assignments (to some of the variables).
Querying Incomplete Semistructured Data: In semistructured data, incompleteness is prevalent. OR-semantics and weak semantics were introduced so that queries over semistructured data return maximal answers rather than complete answers [Kanza, Nutt & Sagiv 1999].
In the Semistructured Data Model: Both data and queries are labeled rooted directed graphs. Query nodes are variables. Database nodes are objects. Matchings are assignments of database objects to query variables such that the database root is assigned to the query root and labels are preserved.
A Semistructured Database About Movies [Figure: a rooted graph with objects 1 to 11 and edges labeled movie, actor, title, year, language, director, acted in, name, date of birth; the values shown are Zelig, 1983, Antz, 1998, English, Woody Allen, 1/12/1935.]
A Query [Figure: a query graph over the variables v1, v2, v3, w1, w2, w3, w4 with edges labeled movie, actor, title, language, director, acted in, name, date of birth.] Under complete semantics, the query returns actor-movie pairs such that the actor played in the movie and was also the director of the movie.
A complete matching of the query variables to database objects [Figure: the movie database and the query side by side; the matching assigns the objects 1, 2, 5, 6, 4, 10, 11 to the query variables.]
Constraints on Complete Matchings: The root constraint is satisfied if the query root is mapped to the database root. A query edge is an edge constraint: a query edge with label l is satisfied if it is mapped to a database edge with the same label l. [Figure: the query root r mapped to the database root 1; a query edge from x to y labeled l mapped to a database edge from 9 to 11 labeled l.]
Suppose that Node 6 is missing [Figure: the movie database with node 6 (English) and its language edge removed.]
An incomplete matching [Figure: the movie database without node 6, and the query; the matching assigns the objects 1, 2, 5, 4, 10, 11 and maps w2 to null.] This matching is maximal.
The Reachability Constraint on Partial Matchings: A query node v that is mapped to a database object o satisfies the reachability constraint if there is a path from the query root to v such that all edge constraints along this path are satisfied. [Figure: a query and a database illustrating a node v that is reached along a path of satisfied edge constraints.]
Weak Satisfaction of Edge Constraints: An edge constraint is weakly satisfied if it is either satisfied (as defined earlier), or one (or more) of its nodes is mapped to a null value. [Figure: examples of a query edge from x to y labeled l, with x and y mapped to the objects 9 and 11 or to null, and a database edge labeled l or m.]
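The cases of weak satisfaction can be sketched in Python. This is only an illustration: encoding database edges as (source, label, target) triples, using None for null, and the function names are assumptions made here, not part of the talk.

```python
def satisfied(edge, matching, db_edges):
    """A query edge (u, label, v) is satisfied if the images of u and v
    are joined by a database edge carrying the same label."""
    u, label, v = edge
    return (matching[u], label, matching[v]) in db_edges


def weakly_satisfied(edge, matching, db_edges):
    """Weakly satisfied: satisfied, or at least one endpoint is null (None)."""
    u, _, v = edge
    if matching[u] is None or matching[v] is None:
        return True
    return satisfied(edge, matching, db_edges)


edge = ("x", "l", "y")
# Labels agree: satisfied, hence weakly satisfied.
print(weakly_satisfied(edge, {"x": 9, "y": 11}, {(9, "l", 11)}))    # True
# Labels differ and both endpoints are non-null: not weakly satisfied.
print(weakly_satisfied(edge, {"x": 9, "y": 11}, {(9, "m", 11)}))    # False
# One endpoint is null: weakly satisfied regardless of the database edge.
print(weakly_satisfied(edge, {"x": 9, "y": None}, {(9, "m", 11)}))  # True
# Both endpoints are null: weakly satisfied.
print(weakly_satisfied(edge, {"x": None, "y": None}, set()))        # True
```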
Weak Matchings: A partial matching is a weak matching if (1) the root constraint is satisfied, (2) the reachability constraint is satisfied by every query node that is mapped to a database node, and (3) every edge constraint is weakly satisfied.
A weak matching [Figure: the movie database without node 6, and the query; the matching assigns the objects 1, 2, 5, 4, 10, 11 and maps w2 to null.]
A Movie Database: Consider the case where the director edge is missing. [Figure: the movie database without node 6 and with the director edge removed.]
An incomplete matching that is not a weak matching [Figure: the movie database without the director edge, and the query; the matching assigns the objects 1, 2, 5, 4, 10, 11 and maps w2 to null.] There is an edge that is not weakly satisfied.
OR Matchings: A partial matching is an OR matching if (1) the root constraint is satisfied, and (2) the reachability constraint is satisfied by every query node that is mapped to a database node. Unlike in a weak matching, an edge constraint in an OR matching does not have to be weakly satisfied.
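The difference between the two definitions can be made concrete with a small Python sketch. The graph encoding, the function names, and the four-edge toy database (loosely modelled on the movie example, with the director edge removed in one variant) are assumptions made for illustration only.

```python
def edge_satisfied(edge, matching, db_edges):
    """Both endpoints are non-null and the database has the same labeled edge."""
    u, label, v = edge
    a, b = matching[u], matching[v]
    return a is not None and b is not None and (a, label, b) in db_edges


def reachable(query_root, query_edges, matching, db_edges):
    """Query variables reachable from the root along satisfied edges only."""
    seen, stack = {query_root}, [query_root]
    while stack:
        u = stack.pop()
        for edge in query_edges:
            v = edge[2]
            if edge[0] == u and v not in seen and edge_satisfied(edge, matching, db_edges):
                seen.add(v)
                stack.append(v)
    return seen


def is_or_matching(query_root, db_root, query_edges, matching, db_edges):
    """Root constraint plus reachability of every non-null variable."""
    if matching[query_root] != db_root:
        return False
    seen = reachable(query_root, query_edges, matching, db_edges)
    return all(v in seen for v, obj in matching.items() if obj is not None)


def is_weak_matching(query_root, db_root, query_edges, matching, db_edges):
    """An OR matching in which every edge constraint is also weakly satisfied."""
    if not is_or_matching(query_root, db_root, query_edges, matching, db_edges):
        return False
    return all(
        matching[u] is None or matching[v] is None
        or edge_satisfied((u, label, v), matching, db_edges)
        for u, label, v in query_edges
    )


# Hypothetical toy data: root 1, movie 2, actor 4.
query_edges = [("r", "movie", "m"), ("r", "actor", "a"),
               ("a", "acted in", "m"), ("m", "director", "a")]
db_full = {(1, "movie", 2), (1, "actor", 4), (4, "acted in", 2), (2, "director", 4)}
db_no_director = db_full - {(2, "director", 4)}
matching = {"r": 1, "m": 2, "a": 4}

print(is_weak_matching("r", 1, query_edges, matching, db_full))         # True
print(is_weak_matching("r", 1, query_edges, matching, db_no_director))  # False
print(is_or_matching("r", 1, query_edges, matching, db_no_director))    # True
```

With the director edge removed, the movie and the actor are still reachable from the root, so the assignment remains an OR matching, but the unsatisfied director edge between two non-null nodes rules it out as a weak matching.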
Maximal Matchings: Matchings can be represented as tuples (where numbers are object ids). A matching t1 subsumes a matching t2 if t1 can be obtained from t2 by replacing some nulls in t2 with non-null values. A matching is maximal if no other matching subsumes it. A query result consists only of maximal matchings. Example: t1 = (1, 5, 2, null) subsumes t2 = (1, null, 2, null).
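Subsumption and the removal of subsumed matchings can be sketched as follows. Representing matchings as Python tuples with None for null is an assumption of this sketch, and the quadratic filter is only the naive approach, not the algorithm analyzed in the talk.

```python
def subsumes(t1, t2):
    """t1 subsumes t2 if t1 is obtained from t2 by replacing at least one
    null (None) with a non-null value and changing nothing else."""
    if len(t1) != len(t2) or t1 == t2:
        return False
    return all(b is None or a == b for a, b in zip(t1, t2))


def maximal(matchings):
    """Keep only the matchings that no other matching subsumes."""
    unique = list(dict.fromkeys(matchings))  # drop duplicates, keep order
    return [t for t in unique if not any(subsumes(o, t) for o in unique)]


t1 = (1, 5, 2, None)
t2 = (1, None, 2, None)
print(subsumes(t1, t2))   # True
print(maximal([t1, t2]))  # [(1, 5, 2, None)]
```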
More Examples
The Movie Database Before the Removals [Figure: the full movie database, including node 6 (English) and the director edge.]
A complete matching [Figure: the full movie database and the query; the matching assigns the objects 1, 2, 5, 6, 4, 10, 11.] It is also a maximal weak matching and a maximal OR-matching. In the result, the actor must be both an actor in the movie and the director of the movie.
A second maximal weak matching [Figure: the full movie database and the query; the matching assigns the objects 1, 8, 3 and maps the four remaining variables to null.] In the result, if the actor and the movie are assigned non-null values, then the actor must be both an actor in the movie and the director of the movie.
A maximal OR-matching [Figure: the full movie database and the query; the matching assigns the objects 1, 8, 3, 4, 10, 11 and maps one variable to null.] In the result, the actor either played in the movie, directed the movie, or is not related to the movie at all. It is not a weak matching.
Complexity of Evaluating Maximal Weak Matchings and Maximal OR Matchings
Data Complexity: Under data complexity, the time complexity is a function of the size of the database only (the query is considered fixed).
Two Alternatives for Query Evaluation: A naïve algorithm computes all matchings and then removes subsumed matchings. A better algorithm avoids computing all matchings; ideally, it computes only maximal matchings. Under data complexity, both algorithms run in polynomial time.
Input-Output Complexity: Under input-output complexity, the time complexity is a function of the size of the query, the size of the database, and the size of the result.
A Naïve Algorithm vs. A Better Algorithm: Under I-O complexity, a naïve algorithm is exponential. Is there a better algorithm with polynomial-time I-O complexity? The answer is positive for DAG queries [Kanza, Nutt & Sagiv 1999].
Cyclic Queries: Theorem: For a query Q and a database D, the set of all maximal weak matchings can be computed in $O(q^3 d m^2)$ time, where q is the size of the query, d is the size of the database, and m is the size of the result. Computing all maximal OR matchings has the same complexity.
Full Disjunctions: What is the full disjunction of a set of relations? How are full disjunctions related to queries with incomplete answers?
The Full Disjunction of the Given Relations
Movies (m-id, title, year, language): (1, Zelig, 1983, English), (2, Antz, 1998, English), (3, Armageddon, 1998, English), (4, Fantasia, 1940, English)
Actors (a-id, name, date-of-birth): (1, Woody Allen, 1/12/1935), (2, Bruce Willis, 19/3/1955), (3, Julia Roberts, 28/10/1967)
Acted-in (a-id, m-id, role): (1, 1, Zelig), (2, 3, Harry), (1, 2, Z)
Actors-that-Directed (a-id, m-id): (1, 1)
Full disjunction (m-id, title, year, language, a-id, name, date-of-birth, role): (1, Zelig, 1983, English, 1, Woody Allen, 1/12/1935, Zelig), (2, Antz, 1998, English, 1, Woody Allen, 1/12/1935, Z), (3, Armageddon, 1998, English, 2, Bruce Willis, 19/3/1955, Harry), (4, Fantasia, 1940, English, null, null, null, null), (null, null, null, null, 3, Julia Roberts, 28/10/1967, null)
The Full Disjunction of the Given Relations: The full disjunction does not include subsumed tuples. For example, the tuple (1, Zelig, 1983, English, null, null, null, null), which consists of a Movies tuple padded with nulls, will not be in the full disjunction.
The Full Disjunction of the Given Relations: The full disjunction does not include tuples that are based on a Cartesian product rather than a join. For example, the tuple (4, Fantasia, 1940, English, 3, Julia Roberts, 28/10/1967, null), which combines an unrelated movie and actor, is not in the full disjunction.
In the Full Disjunction of a Given Set of Relations: Every tuple of the input is a part of at least one tuple of the output. Tuples are joined as in a natural join, padded with null values. The result includes only "maximal connected portions".
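These three properties can be illustrated with a brute-force Python sketch. It is not the polynomial-time algorithm of the talk: it enumerates every choice of at most one tuple per relation, so it is exponential in the number of relations. The assumptions made here are that tuples are dicts from attribute names to non-null values, that a chosen set counts as connected when its schemas are linked through shared attributes, and that it must be pairwise join consistent. The data is the three-relation example with the year, language, and date-of-birth columns omitted.

```python
from itertools import product


def join_consistent(tuples):
    """Every pair of tuples agrees on all attributes they share."""
    return all(s[a] == t[a]
               for i, s in enumerate(tuples)
               for t in tuples[i + 1:]
               for a in s.keys() & t.keys())


def connected(tuples):
    """The chosen tuples are linked through shared attribute names."""
    seen, stack = {0}, [0]
    while stack:
        i = stack.pop()
        for j, t in enumerate(tuples):
            if j not in seen and tuples[i].keys() & t.keys():
                seen.add(j)
                stack.append(j)
    return len(seen) == len(tuples)


def subsumes(t1, t2):
    """t1 fills in at least one null of t2 and changes nothing else."""
    return t1 != t2 and all(b is None or a == b for a, b in zip(t1, t2))


def full_disjunction(relations):
    """Naive full disjunction; returns (attribute order, list of padded tuples)."""
    attrs = sorted({a for rel in relations for t in rel for a in t})
    candidates = set()
    # None in a position means "no tuple taken from this relation".
    for choice in product(*[[None] + rel for rel in relations]):
        chosen = [t for t in choice if t is not None]
        if chosen and connected(chosen) and join_consistent(chosen):
            merged = {}
            for t in chosen:
                merged.update(t)
            candidates.add(tuple(merged.get(a) for a in attrs))  # pad with None
    result = [t for t in candidates if not any(subsumes(o, t) for o in candidates)]
    return attrs, result


movies = [{"m-id": 1, "title": "Zelig"}, {"m-id": 2, "title": "Antz"},
          {"m-id": 3, "title": "Armageddon"}, {"m-id": 4, "title": "Fantasia"}]
actors = [{"a-id": 1, "name": "Woody Allen"}, {"a-id": 2, "name": "Bruce Willis"},
          {"a-id": 3, "name": "Julia Roberts"}]
acted_in = [{"a-id": 1, "m-id": 1, "role": "Zelig"},
            {"a-id": 2, "m-id": 3, "role": "Harry"},
            {"a-id": 1, "m-id": 2, "role": "Z"}]

attrs, fd = full_disjunction([movies, actors, acted_in])
print(attrs)  # ['a-id', 'm-id', 'name', 'role', 'title']
for row in fd:
    print(row)
```

The result has five tuples: three complete ones, Fantasia padded with nulls, and Julia Roberts padded with nulls. A Fantasia and Julia Roberts combination never appears, because Movies and Actors share no attribute and no Acted-in tuple links them.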
Motivation for Full Disjunctions: Full disjunctions were proposed by Galindo-Legaria as an alternative to outerjoins [SIGMOD'94]. Rajaraman and Ullman suggested using full disjunctions for information integration [PODS'96].
Computing Full Disjunctions for γ-acyclic Relation Schemas: Rajaraman and Ullman have shown how to evaluate the full disjunction by a sequence of natural outerjoins when the relation schemas are γ-acyclic. Hence, the full disjunction can be computed in polynomial time, under input-output complexity, when the relation schemas are γ-acyclic.
Weak Semantics Generalizes Full Disjunctions: Relations can be converted into a semistructured database. The full disjunction can be expressed as the union of several queries that are evaluated under weak semantics.
Example: Creating the Database. A node is created for each tuple. Edges are added between connected tuples, in both directions. A root r is added, and edges are added from the root to every node. We use colors instead of labels.
Movies (m-id, title): (1, Zelig), (2, Antz), (3, Armageddon), (4, Fantasia)
Actors (a-id, name): (1, Woody Allen), (2, Bruce Willis), (3, Julia Roberts)
Acted-in (a-id, m-id, role): (1, 1, Zelig), (2, 3, Harry), (1, 2, Z)
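The construction can be sketched in Python as follows. Several details are assumptions of this sketch rather than statements from the slides: two tuples are treated as connected when they come from different relations, share at least one attribute, and agree on all shared attributes; each edge is labeled with the name of its target relation (standing in for the colors); and nodes are identified by (relation name, row index) pairs.

```python
def build_database(relations):
    """relations: dict mapping a relation name to a list of dict tuples.
    Returns (nodes, edges); edges are (source, label, target) triples."""
    nodes = [(name, i) for name, rel in relations.items() for i in range(len(rel))]
    # The root points to every tuple node.
    edges = {("root", name, (name, i)) for name, i in nodes}
    # Ordered pairs are enumerated, so each connection yields both directions.
    for n1, i1 in nodes:
        for n2, i2 in nodes:
            if n1 == n2:
                continue
            s, t = relations[n1][i1], relations[n2][i2]
            shared = s.keys() & t.keys()
            if shared and all(s[a] == t[a] for a in shared):
                edges.add(((n1, i1), n2, (n2, i2)))
    return nodes, edges


relations = {
    "Movies": [{"m-id": 1, "title": "Zelig"}, {"m-id": 2, "title": "Antz"},
               {"m-id": 3, "title": "Armageddon"}, {"m-id": 4, "title": "Fantasia"}],
    "Actors": [{"a-id": 1, "name": "Woody Allen"}, {"a-id": 2, "name": "Bruce Willis"},
               {"a-id": 3, "name": "Julia Roberts"}],
    "Acted-in": [{"a-id": 1, "m-id": 1, "role": "Zelig"},
                 {"a-id": 2, "m-id": 3, "role": "Harry"},
                 {"a-id": 1, "m-id": 2, "role": "Z"}],
}

nodes, edges = build_database(relations)
print(len(nodes))  # 10 tuple nodes
print(len(edges))  # 10 root edges + 12 directed edges between connected tuples = 22
```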
Example: Creating the Queries. A node is created for each relation schema. Edges are added between connected schemas, in both directions. The number of queries is equal to the number of schemas. In each query, the root r is connected to a different schema. [Figure: query graphs over the schemas Movies, Actors, and Acted-in, each with the root r attached to a different schema.]
Example: Queries are Evaluated under Weak Semantics. [Figure sequence: the queries over Movies, Actors, and Acted-in are evaluated one after another against the database built from the relations above, and the result table grows step by step.]
Final result (m-id, title, a-id, name, role): (1, Zelig, 1, Woody Allen, Zelig), (2, Antz, 1, Woody Allen, Z), (3, Armageddon, 2, Bruce Willis, Harry), (4, Fantasia, null, null, null), (null, null, 3, Julia Roberts, null)
The Algorithm Computes Full Disjunctions in Polynomial Time Under Input-Output Complexity: Theorem: The full disjunction of relations r1, …, rn can be computed in $O(n^5 s^2 f^2)$ time, where n is the number of relations, s is the total size of all the relations, and f is the size of the result.
Generalizing Full Disjunctions: In a full disjunction, tuples are joined according to equality constraints, as in a natural join (or equi-join). We can generalize full disjunctions to support constraints that are not merely equality among attributes.
Example: Movies (m-id, title, year, language, location); Actors (a-id, name, date-of-birth); Acted-in (a-id, m-id, role); Actors-that-Directed (a-id, m-id); Historical-Events (name, date, description); Historical-Sites (Country, State, City, Site). Example constraints: the date of the historical event is a date in the year when the movie was released; the filming location is near the historical site.
The General Idea: A set of constraints specifies how tuples should be joined. The queries and the database are constructed according to the given constraints. A pair of nodes is connected by an edge when it satisfies the corresponding constraint. Queries are evaluated w.r.t. the database under weak semantics.
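A minimal Python sketch of this idea, assuming that each constraint is given as a predicate over a pair of tuples from two named relations. The `constraints` dictionary, the `connected_pairs` helper, and the two historical events are hypothetical; only the movie titles and years come from the running example.

```python
from datetime import date

# Hypothetical constraint: the event's date falls in the movie's release year.
constraints = {
    ("Movies", "Historical-Events"): lambda m, e: e["date"].year == m["year"],
}


def connected_pairs(relations, constraints):
    """Pairs of tuple nodes that satisfy the constraint between their relations.
    Each node is identified by (relation name, row index)."""
    pairs = []
    for (r1, r2), holds in constraints.items():
        for i, s in enumerate(relations[r1]):
            for j, t in enumerate(relations[r2]):
                if holds(s, t):
                    pairs.append(((r1, i), (r2, j)))
    return pairs


relations = {
    "Movies": [{"m-id": 1, "title": "Zelig", "year": 1983},
               {"m-id": 2, "title": "Antz", "year": 1998}],
    # Placeholder events, invented for this sketch.
    "Historical-Events": [{"name": "Event A", "date": date(1983, 5, 1)},
                          {"name": "Event B", "date": date(1990, 1, 1)}],
}

print(connected_pairs(relations, constraints))
# [(('Movies', 0), ('Historical-Events', 0))]
```

Each returned pair would become a pair of edges (one in each direction) in the constructed database, in place of the equality-based edges of an ordinary full disjunction.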
Another Way of Generalizing Full Disjunctions: Use OR-Semantics. Generate the queries and the database as before, but evaluate the queries under OR-semantics (rather than weak semantics). This relaxes the requirement that every pair of tuples should be join consistent. Instead, a tuple of the full disjunction is only required to be generated by database tuples that form a connected subgraph; they need not be pairwise join consistent.
Example: The Full Disjunction. Schemas: Employees (e-id, ename, city, dept-no); Departments (dept-no, dname, building); Located-in (building, city, street). Tuples: Employee: (007, James Bond, London, 6); Department: (6, MI-6, 10); Located-in: (10, Liverpool, King). The full disjunction contains two tuples: one joins the Employee and Department tuples and is padded with nulls for the Located-in attributes; the other joins the Department and Located-in tuples and is padded with nulls for the Employees attributes.
Example: The Full Disjunction under OR-Semantics. For the same schemas and tuples (Employee: (007, James Bond, London, 6); Department: (6, MI-6, 10); Located-in: (10, Liverpool, King)), the result is a single tuple that combines all three tuples and keeps both city values (London from the Employee tuple and Liverpool from the Located-in tuple).
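The difference between the two examples comes down to two checks on the same three tuples, sketched here in Python. The helper names and the dict encoding are assumptions of this sketch.

```python
employee = {"e-id": "007", "ename": "James Bond", "city": "London", "dept-no": 6}
department = {"dept-no": 6, "dname": "MI-6", "building": 10}
located_in = {"building": 10, "city": "Liverpool", "street": "King"}


def agree(s, t):
    """s and t share at least one attribute and agree on every shared one."""
    shared = s.keys() & t.keys()
    return bool(shared) and all(s[a] == t[a] for a in shared)


def pairwise_join_consistent(tuples):
    """Requirement under weak semantics: every pair agrees on shared attributes."""
    return all(s[a] == t[a]
               for i, s in enumerate(tuples)
               for t in tuples[i + 1:]
               for a in s.keys() & t.keys())


def connected_by_joins(tuples):
    """Requirement under OR-semantics: the tuples form a connected subgraph
    whose edges link pairs of tuples that agree with each other."""
    seen, stack = {0}, [0]
    while stack:
        i = stack.pop()
        for j, t in enumerate(tuples):
            if j not in seen and agree(tuples[i], t):
                seen.add(j)
                stack.append(j)
    return len(seen) == len(tuples)


trio = [employee, department, located_in]
print(pairwise_join_consistent(trio))                    # False: London vs. Liverpool
print(connected_by_joins(trio))                          # True: linked via Department
print(pairwise_join_consistent([employee, department]))  # True
print(pairwise_join_consistent([department, located_in]))  # True
```

The Employee and Located-in tuples disagree on city, so the three tuples cannot appear together in the ordinary full disjunction, but they are connected through the Department tuple, which is enough under OR-semantics.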
Two Related Problems: The projection problem: computing the projection of the full disjunction on a given set of attributes. The restriction problem: computing only those tuples of the full disjunction that are non-null on a given set of attributes. Neither problem can be solved in polynomial time (under input-output complexity) unless P=NP.
Conclusion: Cyclic queries can be computed in polynomial time (in the size of the query, the database, and the result) under either OR-semantics or weak semantics. A reduction of full-disjunction evaluation to query evaluation under weak semantics was described. Using the reduction, full disjunctions can be computed in polynomial time (in the size of the relation schemas, the relations, and the result).
Conclusion (continued): Full disjunctions can be generalized in two ways: by using OR-semantics instead of weak semantics, and by joining tuples according to general constraints. Generalized full disjunctions can be useful in the context of data integration from heterogeneous sources. The projection problem and the restriction problem have polynomial-time algorithms (under input-output complexity) when the relations have γ-acyclic schemas, but not in the general case (unless P=NP).
Thank You Questions?

Pods2003

  • 1. Computing Full Disjunctions Yaron Kanza Yehoshua Sagiv The Selim and Rachel Benin School of Engineering and Computer Science The Hebrew University of Jerusalem
  • 2. Overview of the Talk OR-semantics and weak semantics for querying incomplete data Complexity of query evaluation Full disjunctions as a special case of weak semantics Generalizing full disjunctions – the join constraints are not restricted to be equality constraints Lower bounds for some related problems
  • 3. Querying Incomplete Data Requires a Special Semantics Usually, answers to a query are complete assignments of database objects (or values) to the query variables Consequently, partial information is lost For example, dangling tuples are lost when joining several relations The purpose of outerjoins and full disjunctions is to solve this problem, i.e., answers could be partial assignments (to some of the variables)
  • 4. Querying Incomplete Semistructured Data In semistructured data, incompleteness of data is prevalent OR-semantics and weak semantics were introduced so that queries over semistructured data would return maximal answers rather than complete answers [Kanza, Nutt & Sagiv 1999]
  • 5. In the Semistructured Data Model Both data and queries are labeled rooted directed graphs Query nodes are variables Database nodes are objects Matchings are assignments of database objects to query variables, such that The database root is assigned to the query root, and Labels are preserved
  • 6. 1 2 4 5 6 title language 7 3 year 8 director 9 name 10 movie date of birth 11 1983 movie actor Zelig Antz 1998 English 1/12/1935 Woody Allen title year acted in acted in A Semistructured Database About Movies
  • 7. v 1 v 2 w 1 v 3 title actor movie director acted in w 2 w 3 w 4 date of birth name language A Query Under complete semantics, the query returns actor-movie pairs, such that the actor played in the movie and was also the director of the movie
  • 8. 1 2 4 5 6 title language 7 3 year 8 9 name 10 movie date of birth 11 1983 movie actor Zelig Antz 1998 English 1/12/1935 Woody Allen title year acted in acted in v 1 v 2 w 1 v 3 title actor movie director acted in w 2 w 3 w 4 date of birth name language A complete matching of the query variables to database objects director 1 2 5 6 4 10 11
  • 9. Constraints on Complete Matchings Query Root Database Root The root constraint is satisfied if the query root is mapped to the database root A query edge is an edge constraint : A query edge with a label l is satisfied if it is mapped to a database edge with the same label l r 1 x y 9 11 l l
  • 10. language 1 2 4 5 title 7 3 year 8 director 9 name 10 movie date of birth 11 1983 movie actor Zelig Antz 1998 1/12/1935 Woody Allen title year acted in acted in Suppose that Node 6 is missing 6 English language 6 English
  • 11. 1 2 4 5 title 7 3 year 8 director 9 name 10 movie date of birth 11 1983 movie actor Zelig Antz 1998 1/12/1935 Woody Allen title year acted in acted in v 1 v 2 w 1 v 3 title actor movie director acted in w 2 w 3 w 4 date of birth name language An incomplete matching This matching is maximal 1 2 5 4 10 11 w 2 null
  • 12. The Reachability Constraint on Partial Matchings A query node v that is mapped to a database object o satisfies the reachability constraint if there is a path from the query root to v , such that all edge constraints along this path are satisfied Database 1 x z w y l 1 r v l 3 l 2 l 5 l 4 l 6 v Query x z r l 2 l 4 l 6 7 9 1 l 2 l 4 l 6 w y l 1 r v l 3 l 5 v 1 55 5 8 l 1 1 l 3 l 5 55
  • 13. Weak Satisfaction of Edge Constraints An edge constraint is weakly satisfied if it is either Satisfied (as defined earlier), or One (or more) of its nodes is mapped to a null value x y 9 11 l l x y 9 11 l m x y 9 11 l m null null x y l null null
  • 14. Weak Matchings A partial matching is a weak matching if The root constraint is satisfied The reachability constraint is satisfied by every query node that is mapped to a database node Every edge constraint is weakly satisfied
  • 15. 1 2 4 5 title 7 3 year 8 director 9 name 10 movie date of birth 11 1983 movie actor Zelig Antz 1998 1/12/1935 Woody Allen title year acted in acted in v 1 v 2 w 1 v 3 title actor movie director acted in w 2 w 3 w 4 date of birth name language A weak matching w 2 1 2 5 4 10 11 null
  • 16. 1 2 4 5 title 7 3 year 8 9 name 10 movie date of birth 11 1983 movie actor Zelig Antz 1998 1/12/1935 Woody Allen title year acted in acted in A Movie Database Consider the case where the director edge is missing director director
  • 17. 1 2 4 5 title 7 3 year 8 9 name 10 movie date of birth 11 1983 movie actor Zelig Antz 1998 1/12/1935 Woody Allen title year acted in acted in v 1 v 2 w 1 v 3 title actor movie director acted in w 2 w 3 w 4 date of birth name language An incomplete matching that is not a weak matching w 2 1 2 5 4 10 11 null There is an edge that is not weakly satisfied
  • 18. OR Matchings A partial matching is an OR matching if The root constraint is satisfied The reachability constraint is satisfied by every query node that is mapped to a database node Differently from a weak matching, in an OR Matching, an edge constraint does not have to be weakly satisfied
  • 19. Maximal Matchings Matchings can be represented as tuples (where numbers are object id’s) A matching t 1 subsumes a matching t 2 if t 1 can be obtained from t 2 by replacing some nulls in t 2 with non-null values A matching is maximal if no other matching subsumes it A query result consists only of maximal matchings t 1 =(1, 5, 2, null) t 2 =(1, null, 2, null)
  • 21. 1 2 4 5 6 title language 7 3 year 8 director 9 name 10 movie date of birth 11 1983 movie actor Zelig Antz 1998 English 1/12/1935 Woody Allen title year acted in acted in The Movie Database Before the Removals
  • 22. 1 2 4 5 6 title language 7 3 year 8 director 9 name 10 movie date of birth 11 1983 movie actor Zelig Antz 1998 English 1/12/1935 Woody Allen title year acted in acted in v 1 v 2 w 1 v 3 title actor movie director acted in w 2 w 3 w 4 date of birth name language A complete matching It is also a maximal weak matching It is also a maximal OR-matching In the result, the actor must be both an actor in the movie and the director of the movie 1 2 5 6 4 10 11
  • 23. 1 2 4 5 6 title language 7 3 year 8 director 9 name 10 movie date of birth 11 1983 movie actor Zelig Antz 1998 English 1/12/1935 Woody Allen title year acted in acted in v 1 v 2 w 1 v 3 title actor movie director acted in w 2 w 3 w 4 date of birth name language A second maximal weak matching In the result, if the actor and the movie are assigned non-null values, then the actor must be both an actor in the movie and the director of the movie 1 8 3 null null null null
  • 24. 1 2 4 5 6 title language 7 3 year 8 director 9 name 10 movie date of birth 11 1983 movie actor Zelig Antz 1998 English 1/12/1935 Woody Allen title year acted in acted in v 1 v 2 w 1 v 3 title actor movie director acted in w 2 w 3 w 4 date of birth name language A maximal OR-matching In the result, the actor either played in the movie, directed the movie, or is not related at all to the movie 1 8 3 4 10 11 null It is not a weak matching
  • 25. Complexity of Evaluating Maximal Weak Matchings and Maximal OR Matchings
  • 26. Data Complexity Under data complexity, the time complexity is a function of the size of the database
  • 27. Two Alternatives for Query Evaluation A naïve algorithm computes all matchings and then removes subsumed matchings A better algorithm avoids computing all matchings – ideally it only computes maximal matchings Under data complexity, both algorithms are polynomial time
  • 28. Input-Output Complexity Under input-output complexity, the time complexity is a function of the size of the query , the size of the database, and the size of the result
  • 29. A Naïve Algorithm vs. A Better Algorithm Under I-O complexity, a naïve algorithm is exponential Is there a better algorithm with a polynomial time I-O complexity? The answer is positive for DAG queries [Kanza, Nutt & Sagiv 1999]
  • 30. Cyclic Queries Theorem: For a query Q and a database D, the set of all maximal weak matchings can be computed in O ( q 3 dm 2 ) time, where q is the size of the query, d is the size of the database and m is the size of the result (computing all maximal OR matchings has the same complexity)
  • 31. Full Disjunctions What is the full disjunction of a set of relations? How are full disjunctions related to queries with incomplete answers?
  • 32. The Full Disjunction of the Given Relations. Movies (m-id, title, year, language): (1, Zelig, 1983, English), (2, Antz, 1998, English), (3, Armageddon, 1998, English), (4, Fantasia, 1940, English). Actors (a-id, name, date-of-birth): (1, Woody Allen, 1/12/1935), (2, Bruce Willis, 19/3/1955), (3, Julia Roberts, 28/10/1967). Acted-in (a-id, m-id, role): (1, 1, Zelig), (2, 3, Harry), (1, 2, Z). Actors-that-Directed (a-id, m-id): (1, 1). Full disjunction (m-id, title, year, language, a-id, name, date-of-birth, role): (1, Zelig, 1983, English, 1, Woody Allen, 1/12/1935, Zelig), (2, Antz, 1998, English, 1, Woody Allen, 1/12/1935, Z), (3, Armageddon, 1998, English, 2, Bruce Willis, 19/3/1955, Harry), (4, Fantasia, 1940, English, null, null, null, null), (null, null, null, null, 3, Julia Roberts, 28/10/1967, null)
  • 33. The Full Disjunction of the Given Relations. The full disjunction does not include subsumed tuples. For example, over the attributes (m-id, title, year, language, a-id, name, date-of-birth, role), the tuple (1, Zelig, 1983, English, null, null, null, null) will not be in the full disjunction, since it is subsumed by (1, Zelig, 1983, English, 1, Woody Allen, 1/12/1935, Zelig)
  • 34. The Full Disjunction of the Given Relations. The full disjunction does not include tuples that are based on a Cartesian product rather than a join. For example, over the attributes (m-id, title, year, language, a-id, name, date-of-birth, role), the tuple (4, Fantasia, 1940, English, 3, Julia Roberts, 28/10/1967, null) will not be in the full disjunction, since no Acted-in tuple connects Julia Roberts to Fantasia
  • 35. In the Full Disjunction of a Given Set of Relations: Every tuple of the input is a part of at least one tuple of the output. Tuples are joined as in a natural join, padded with null values. The result includes only “maximal connected portions”
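The three properties above can be turned into a brute-force sketch that follows the definition directly. It is exponential in the number of relations and is not the polynomial algorithm of the talk. It assumes relations are lists of dicts without nulls, that "connected" means linked through shared attribute names, and it uses a reduced version of the movie data; all function names are illustrative.

```python
from itertools import combinations, product


def consistent(t, u):
    """Two tuples agree on every attribute they share."""
    return all(t[a] == u[a] for a in t.keys() & u.keys())


def connected(tuples):
    """The tuples form one component when linked by shared attributes."""
    seen, stack = {0}, [0]
    while stack:
        i = stack.pop()
        for j in range(len(tuples)):
            if j not in seen and tuples[i].keys() & tuples[j].keys():
                seen.add(j)
                stack.append(j)
    return len(seen) == len(tuples)


def subsumes(a, b):
    return a != b and all(y is None or x == y for x, y in zip(a, b))


def full_disjunction(relations):
    attrs = sorted({a for rel in relations for t in rel for a in t})
    candidates = set()
    # Try every choice of at most one tuple per relation.
    for k in range(1, len(relations) + 1):
        for chosen in combinations(relations, k):
            for combo in product(*chosen):
                if connected(combo) and all(
                    consistent(t, u) for t, u in combinations(combo, 2)
                ):
                    merged = {a: v for t in combo for a, v in t.items()}
                    # Missing attributes are padded with None (null).
                    candidates.add(tuple(merged.get(a) for a in attrs))
    maximal = {c for c in candidates
               if not any(subsumes(o, c) for o in candidates)}
    return attrs, maximal


movies = [{"m-id": 1, "title": "Zelig"}, {"m-id": 2, "title": "Antz"},
          {"m-id": 3, "title": "Armageddon"}, {"m-id": 4, "title": "Fantasia"}]
actors = [{"a-id": 1, "name": "Woody Allen"}, {"a-id": 2, "name": "Bruce Willis"},
          {"a-id": 3, "name": "Julia Roberts"}]
acted_in = [{"a-id": 1, "m-id": 1, "role": "Zelig"},
            {"a-id": 2, "m-id": 3, "role": "Harry"},
            {"a-id": 1, "m-id": 2, "role": "Z"}]

attrs, fd = full_disjunction([movies, actors, acted_in])
```

Because Movies and Actors share no attribute, a Fantasia tuple and a Julia Roberts tuple are never combined, so no output tuple is based on a Cartesian product.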
  • 36. Motivation for Full Disjunctions Full disjunctions were proposed by Galindo-Legaria as an alternative to outerjoins [SIGMOD’94] Rajaraman and Ullman suggested using full disjunctions for information integration [PODS’96]
  • 37. Computing Full Disjunctions for γ-acyclic Relation Schemas Rajaraman and Ullman have shown how to evaluate the full disjunction by a sequence of natural outerjoins when the relation schemas are γ-acyclic. Hence, the full disjunction can be computed in polynomial time, under input-output complexity, when the relation schemas are γ-acyclic
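To make the outerjoin approach concrete, here is a minimal sketch of a natural full outerjoin over lists of dicts, applied in sequence to a reduced version of the movie data. The function name and data layout are assumptions made for illustration, and the join order used (Movies, then Acted-in, then Actors) is simply one that follows the chain-shaped schema, which contains no cycle; the sketch does not check γ-acyclicity.

```python
def natural_full_outerjoin(r, s):
    """Natural full outerjoin of two relations given as lists of dicts.

    None marks null. The sketch assumes r and s share at least one
    attribute; with no shared attribute it degenerates to a product.
    """
    r_attrs = {a for t in r for a in t}
    s_attrs = {a for t in s for a in t}
    shared = r_attrs & s_attrs
    all_attrs = r_attrs | s_attrs
    result, matched = [], set()
    for t in r:
        found = False
        for j, u in enumerate(s):
            # A null never equals anything, so padded tuples stay dangling.
            if all(t[a] is not None and t[a] == u[a] for a in shared):
                result.append({**t, **u})
                matched.add(j)
                found = True
        if not found:
            result.append({a: t.get(a) for a in all_attrs})
    for j, u in enumerate(s):
        if j not in matched:
            result.append({a: u.get(a) for a in all_attrs})
    return result


movies = [{"m-id": 1, "title": "Zelig"}, {"m-id": 2, "title": "Antz"},
          {"m-id": 3, "title": "Armageddon"}, {"m-id": 4, "title": "Fantasia"}]
actors = [{"a-id": 1, "name": "Woody Allen"}, {"a-id": 2, "name": "Bruce Willis"},
          {"a-id": 3, "name": "Julia Roberts"}]
acted_in = [{"a-id": 1, "m-id": 1, "role": "Zelig"},
            {"a-id": 2, "m-id": 3, "role": "Harry"},
            {"a-id": 1, "m-id": 2, "role": "Z"}]

step = natural_full_outerjoin(movies, acted_in)
result = natural_full_outerjoin(step, actors)
rows = {(t["title"], t["name"], t["role"]) for t in result}
```

On this data the two outerjoins keep both dangling tuples (Fantasia and Julia Roberts), each padded with nulls.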
  • 38. Weak Semantics Generalizes Full Disjunctions Relations can be converted into a semistructured database The full disjunction can be expressed as the union of several queries that are evaluated under weak semantics
  • 39. Creating the Database (Example). A node is created for each tuple. Edges are added between connected tuples, in both directions. A root r is added, and edges are added from the root to every node. We use colors instead of labels. Relations in the example: Movies (m-id, title): (1, Zelig), (2, Antz), (3, Armageddon), (4, Fantasia). Actors (a-id, name): (1, Woody Allen), (2, Bruce Willis), (3, Julia Roberts). Acted-in (a-id, m-id, role): (1, 1, Zelig), (2, 3, Harry), (1, 2, Z)
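The construction of the database graph can be sketched as below. The slide does not spell out when two tuples count as "connected"; this sketch assumes it means tuples from different relations that share at least one attribute and agree on all shared attributes. Node identifiers of the form `(relation name, row index)` and the root name `"r"` are illustrative choices.

```python
def build_database_graph(relations):
    """relations: dict mapping a relation name to a list of dict tuples.

    Returns (nodes, edges); each node is (relation name, row index) and
    the relation name plays the role of the node label (color).
    """
    nodes = [(name, i) for name, rows in relations.items()
             for i in range(len(rows))]
    edges = {("r", n) for n in nodes}  # root -> every tuple node
    for n1 in nodes:
        for n2 in nodes:
            if n1[0] == n2[0]:
                continue  # only tuples of different relations are linked
            t = relations[n1[0]][n1[1]]
            u = relations[n2[0]][n2[1]]
            shared = t.keys() & u.keys()
            if shared and all(t[a] == u[a] for a in shared):
                # The reverse edge is added when the loop visits (n2, n1).
                edges.add((n1, n2))
    return nodes, edges


relations = {
    "Movies": [{"m-id": 1, "title": "Zelig"}, {"m-id": 2, "title": "Antz"},
               {"m-id": 3, "title": "Armageddon"},
               {"m-id": 4, "title": "Fantasia"}],
    "Actors": [{"a-id": 1, "name": "Woody Allen"},
               {"a-id": 2, "name": "Bruce Willis"},
               {"a-id": 3, "name": "Julia Roberts"}],
    "Acted-in": [{"a-id": 1, "m-id": 1, "role": "Zelig"},
                 {"a-id": 2, "m-id": 3, "role": "Harry"},
                 {"a-id": 1, "m-id": 2, "role": "Z"}],
}
nodes, edges = build_database_graph(relations)
```

In the resulting graph, Fantasia and Julia Roberts are reachable only from the root, which is why they end up in null-padded answers.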
  • 40. Creating the Queries (Example). A node is created for each relation schema (Movies, Actors, Acted-in). Edges are added between connected schemas, in both directions. The number of queries is equal to the number of schemas. In each query, the root r is connected to a different schema
  • 41. Queries are Evaluated under Weak Semantics (Example). [Figure: the query graphs over Movies, Actors and Acted-in, and the example relations] Result so far, over (m-id, title, a-id, name, role): (1, Zelig, 1, Woody Allen, Zelig)
  • 42. Queries are Evaluated under Weak Semantics (Example). Result so far: (1, Zelig, 1, Woody Allen, Zelig), (2, Antz, 1, Woody Allen, Z)
  • 43. Queries are Evaluated under Weak Semantics (Example). Result so far: (1, Zelig, 1, Woody Allen, Zelig), (2, Antz, 1, Woody Allen, Z), (3, Armageddon, 2, Bruce Willis, Harry)
  • 44. Queries are Evaluated under Weak Semantics (Example). Result so far: (1, Zelig, 1, Woody Allen, Zelig), (2, Antz, 1, Woody Allen, Z), (3, Armageddon, 2, Bruce Willis, Harry), (null, null, 3, Julia Roberts, null)
  • 45. Queries are Evaluated under Weak Semantics (Example). Result so far: the same four tuples as on the previous slide
  • 46. Example. Final result: (1, Zelig, 1, Woody Allen, Zelig), (2, Antz, 1, Woody Allen, Z), (3, Armageddon, 2, Bruce Willis, Harry), (null, null, 3, Julia Roberts, null), (4, Fantasia, null, null, null)
  • 47. The Algorithm Computes Full Disjunctions in Polynomial Time Under Input-Output Complexity Theorem: The full disjunction of relations r_1, …, r_n can be computed in O(n^5 s^2 f^2) time, where n is the number of relations, s is the total size of all the relations and f is the size of the result
  • 48. Generalizing Full Disjunctions In a full disjunction, tuples are joined according to equality constraints as in a natural join (or equi-join) We can generalize full disjunctions to support constraints that are not merely equality among attributes
  • 49. Example Movies (m-id, title, year, language, location) Actors (a-id, name, date-of-birth) Acted-in (a-id, m-id, role) Actors-that-Directed (a-id, m-id) Historical-Events (name, date, description) Historical-Sites (Country, State, City, Site) Constraints: the date of the historical event is a date in the year when the movie was released; the filming location is near the historical site
  • 50. The General Idea A set of constraints specifies how tuples should be joined The queries and the database are constructed according to the given constraints A pair of nodes is connected by an edge when it satisfies the corresponding constraint Queries are evaluated w.r.t. the database under weak semantics
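A minimal sketch of the generalized construction follows: edges between tuple nodes are created by arbitrary predicates instead of attribute equality. The `same_year` predicate mirrors the first constraint of the example slide. The ISO date format, the event tuples and all function names are hypothetical and serve only as illustration.

```python
def same_year(movie, event):
    """Constraint: the event's date falls in the movie's release year.

    Assumes (hypothetically) that dates are ISO strings 'YYYY-MM-DD'.
    """
    return movie["year"] == int(event["date"][:4])


def constraint_edges(relations, constraints):
    """Add an edge, in both directions, for every pair of tuples that
    satisfies the constraint given for its pair of relations."""
    edges = set()
    for (a, b), holds in constraints.items():
        for i, t in enumerate(relations[a]):
            for j, u in enumerate(relations[b]):
                if holds(t, u):
                    edges.add(((a, i), (b, j)))
                    edges.add(((b, j), (a, i)))
    return edges


relations = {
    "Movies": [{"m-id": 1, "title": "Zelig", "year": 1983},
               {"m-id": 2, "title": "Antz", "year": 1998}],
    # Hypothetical event tuples, invented for this sketch.
    "Historical-Events": [{"name": "event-A", "date": "1983-05-01"},
                          {"name": "event-B", "date": "1990-07-15"}],
}
constraints = {("Movies", "Historical-Events"): same_year}
edges = constraint_edges(relations, constraints)
```

A "near" constraint for filming locations and historical sites would be plugged in the same way, as another predicate keyed by its pair of relations.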
  • 51. Another Way of Generalizing Full Disjunctions: Use OR-Semantics Generate the queries and the database as before, but the queries are evaluated under OR-semantics (rather than weak semantics) This relaxes the requirement that every pair of tuples should be join consistent Instead, a tuple of the full disjunction is only required to be generated by database tuples that form a connected subgraph, but need not be pairwise join consistent
  • 52. Example: The Full Disjunction. Schemas: Employees (e-id, ename, city, dept-no), Departments (dept-no, dname, building), Located-in (building, city, street). Tuples: Employee: (007, James Bond, London, 6), Department: (6, MI-6, 10), Located-in: (10, Liverpool, King). The full disjunction has two tuples: one combines Employee (007, James Bond, London, 6) with Department (6, MI-6, 10) and is null on the Located-in attributes; the other combines Department (6, MI-6, 10) with Located-in (10, Liverpool, King) and is null on the Employees attributes
  • 53. Example: The Full Disjunction under OR-Semantics. Schemas: Employees (e-id, ename, city, dept-no), Departments (dept-no, dname, building), Located-in (building, city, street). Tuples: Employee: (007, James Bond, London, 6), Department: (6, MI-6, 10), Located-in: (10, Liverpool, King). The result has a single tuple that combines all three: Employee (007, James Bond, London, 6), Department (6, MI-6, 10) and Located-in (10, Liverpool, King)
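The difference between the two examples can be checked with a small sketch that tests the two requirements on the James Bond tuples: pairwise join consistency (as required for a tuple of the ordinary full disjunction) and connectedness through joinable pairs (as required under OR-semantics). The function names are illustrative, not taken from the talk.

```python
from itertools import combinations

employee = {"e-id": "007", "ename": "James Bond", "city": "London", "dept-no": 6}
department = {"dept-no": 6, "dname": "MI-6", "building": 10}
located_in = {"building": 10, "city": "Liverpool", "street": "King"}


def agree(t, u):
    """No shared attribute has conflicting values."""
    return all(t[a] == u[a] for a in t.keys() & u.keys())


def pairwise_consistent(tuples):
    """Every pair of tuples is join consistent."""
    return all(agree(t, u) for t, u in combinations(tuples, 2))


def connected_by_joins(tuples):
    """The tuples form a connected graph whose edges are joinable pairs,
    i.e. pairs that share an attribute and agree on all shared ones."""
    seen, stack = {0}, [0]
    while stack:
        i = stack.pop()
        for j, u in enumerate(tuples):
            t = tuples[i]
            if j not in seen and t.keys() & u.keys() and agree(t, u):
                seen.add(j)
                stack.append(j)
    return len(seen) == len(tuples)


all_three = [employee, department, located_in]
weak_ok = pairwise_consistent(all_three)   # Employee and Located-in clash on city
or_ok = connected_by_joins(all_three)      # linked through Department
```

Employee and Located-in disagree on `city` (London vs. Liverpool), so the three tuples cannot form one tuple of the ordinary full disjunction, yet they are connected through the Department tuple.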
  • 54. Two Related Problems The Projection Problem: computing the projection of the full disjunction on a given set of attributes. The Restriction Problem: computing only those tuples of the full disjunction that are non-null on a given set of attributes. Neither the projection problem nor the restriction problem can be solved in polynomial time (under input-output complexity) unless P=NP
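For reference, the two problems can be stated as operations on an already computed full disjunction, as in the sketch below. This is the naïve route: the whole full disjunction is computed first and then filtered or projected, so the running time depends on the size of the full disjunction rather than on the size of the requested output. The sketch assumes set semantics for the projection (duplicates removed, nothing else) and uses the five-tuple result of the reduced movie example.

```python
ATTRS = ["m-id", "title", "a-id", "name", "role"]

# Full disjunction of the reduced movie example; None stands for null.
full_disjunction = [
    (1, "Zelig", 1, "Woody Allen", "Zelig"),
    (2, "Antz", 1, "Woody Allen", "Z"),
    (3, "Armageddon", 2, "Bruce Willis", "Harry"),
    (None, None, 3, "Julia Roberts", None),
    (4, "Fantasia", None, None, None),
]


def restrict(tuples, attrs, required):
    """Keep the tuples that are non-null on every required attribute."""
    idx = [attrs.index(a) for a in required]
    return [t for t in tuples if all(t[i] is not None for i in idx)]


def project(tuples, attrs, kept):
    """Project onto the kept attributes, removing duplicates."""
    idx = [attrs.index(a) for a in kept]
    return {tuple(t[i] for i in idx) for t in tuples}


with_title_and_name = restrict(full_disjunction, ATTRS, ["title", "name"])
actor_columns = project(full_disjunction, ATTRS, ["a-id", "name"])
```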
  • 55. Conclusion Cyclic queries can be computed in polynomial time (in the size of the query, the database and the result) under either OR-semantics or weak semantics A reduction of full-disjunction evaluation to query evaluation under weak semantics is described Using the reduction, full disjunctions can be computed in polynomial time (in the size of the relation schemas, the relations and the result)
  • 56. Conclusion (continued) Full disjunctions can be generalized in two ways: by using OR-semantics instead of weak semantics, and by joining tuples according to general constraints. Generalized full disjunctions can be useful in the context of data integration from heterogeneous sources. The projection problem and the restriction problem have polynomial-time algorithms (under input-output complexity) when the relations have γ-acyclic schemas, but not in the general case unless P=NP