Full Disjunctions: Polynomial-Delay Iterators in Action
VLDB 2006, Seoul, Korea
Sara Cohen (Technion, Israel), Yaron Kanza (University of Toronto, Canada), Benny Kimelfeld (Hebrew University, Israel), Yehoshua Sagiv (Hebrew University, Israel), Itzhak Fadida (Technion, Israel)
Computing Full Disjunctions
- The full disjunction is a relational operator that maximally combines data from several relations.
- It extends the natural join by allowing incompleteness (a result tuple need not include a tuple from every relation).
- It extends the binary outerjoin to many relations.
- This paper presents algorithms and optimizations for computing full disjunctions:
  - Theoretically, full disjunctions are more tractable than previously known.
  - Practically, the algorithms are a significant improvement over the state of the art and support an iterator-like evaluation.
Contents
- Full Disjunctions
- Complexity
- Contributions
- Algorithms
  - Algorithm NLOJ for tree-structured schemes
  - Algorithm PDelayFD for general schemes
  - Algorithm BiComNLOJ (main algorithm)
- Experimental Results
- Conclusion
The Natural Join Operator
Climates(Country, Climate): (UK, temperate), (Bahamas, tropical), (Canada, diverse)
Accommodations(Hotel, Stars, City, Country): (Ramada, 3, London, Canada), (Hilton, null, Nassau, Bahamas), (Plaza, 4, Toronto, Canada)
Sites(Site, City, Country): (Buckingham, London, UK), (Mouth Logan, null, Canada), (Hyde Park, London, UK), (Air Show, London, Canada)
$\text{Climates} \bowtie \text{Accommodations} \bowtie \text{Sites}$ (Stars, Hotel, Climate, City, Site, Country): (3, Ramada, diverse, London, Air Show, Canada)

The Natural Join Misses Information
- Bahamas is not in Sites, so the natural join misses it.
- Mouth Logan is not in a city (its City value is null), so it is missed as well.
- A looser notion of join is needed: one that enables joining tuples from only some of the tables.
The Natural Join Operator (continued)
A tuple of the join corresponds to a set of tuples from the source relations that is:
- join consistent;
- connected (no Cartesian product);
- complete (one tuple from each relation).
In the example, the single result tuple (3, Ramada, diverse, London, Air Show, Canada) corresponds to the set {(Canada, diverse), (Ramada, 3, London, Canada), (Air Show, London, Canada)}.
Join-Consistent Sets of Tuples
- Two tuples $t_1$ and $t_2$ are join-consistent if, for every common attribute $A$:
  1. $t_1[A]$ and $t_2[A]$ are non-null, and
  2. $t_1[A] = t_2[A]$.
- A set $T$ of tuples is join-consistent if every two tuples of $T$ are join-consistent.
Example: the Accommodations tuple for Ramada (City London, Country Canada) and the Sites tuple for Air Show (City London, Country Canada) are join-consistent.
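Connected Sets of Tuples
- The join graph of a set $T$ of tuples: the nodes are the tuples of $T$, and there is an edge between every two tuples that have a common attribute.
- A set of tuples is connected if its join graph is connected.
Example: in the set {(Climate: diverse, Country: Canada), (Site: Buckingham, City: London, Country: UK), (Stars: 4, Hotel: Plaza, City: Toronto)}, the first two tuples share Country and the last two share City, so the set is connected.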
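The two definitions above translate directly into code. A minimal Python sketch, assuming tuples are dicts from attribute names to values with `None` standing for null; the function names are hypothetical, chosen for illustration only.

```python
from itertools import combinations

def join_consistent(t1: dict, t2: dict) -> bool:
    """On every common attribute, both values are non-null and equal."""
    return all(
        t1[a] is not None and t2[a] is not None and t1[a] == t2[a]
        for a in t1.keys() & t2.keys()
    )

def is_join_consistent(ts: list) -> bool:
    """A set of tuples is join-consistent if every pair is."""
    return all(join_consistent(a, b) for a, b in combinations(ts, 2))

def is_connected(ts: list) -> bool:
    """Join graph: nodes are tuples, edges link tuples sharing an attribute name."""
    if not ts:
        return True
    seen, stack = {0}, [0]
    while stack:  # depth-first search from the first tuple
        i = stack.pop()
        for j in range(len(ts)):
            if j not in seen and ts[i].keys() & ts[j].keys():
                seen.add(j)
                stack.append(j)
    return len(seen) == len(ts)

ramada = {"Hotel": "Ramada", "Stars": 3, "City": "London", "Country": "Canada"}
air_show = {"Site": "Air Show", "City": "London", "Country": "Canada"}
mouth_logan = {"Site": "Mouth Logan", "City": None, "Country": "Canada"}

print(join_consistent(ramada, air_show))     # True
print(join_consistent(ramada, mouth_logan))  # False: City is null
```

Note that connectedness looks only at attribute names, while join consistency looks at values; a set is JCC when both checks pass.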
Natural Join (without Cartesian Product)
Each tuple of the result corresponds to a set $T$ of tuples from the source relations such that:
1. $T$ is join consistent;
2. $T$ is connected (no Cartesian product);
3. $T$ is complete (one tuple from each relation).
Conditions 1 and 2 together are abbreviated JCC (join-consistent and connected).
Full Disjunction (Galindo-Legaria 1994)
Each tuple of the result corresponds to a set $T$ of tuples from the source relations such that:
1. $T$ is join consistent;
2. $T$ is connected (no Cartesian product);
3. $T$ is maximal: it is not properly contained in any JCC set.
Compared with the natural join, completeness (one tuple from each relation) is replaced by maximality.
An Example of a Full Disjunction
The set $R$ consists of:
Climates(Country, Climate): (UK, temperate), (Canada, diverse)
Accommodations(Hotel, Stars, City, Country): (Ramada, 3, London, Canada), (Plaza, 4, Toronto, Canada)
Sites(Site, City, Country): (Buckingham, London, UK), (Mouth Logan, null, Canada), (Air Show, London, Canada)
$\mathrm{FD}(R)$ (Stars, Hotel, Climate, City, Site, Country):
(4, Plaza, diverse, Toronto, null, Canada)
(3, Ramada, diverse, London, Air Show, Canada)
(null, null, diverse, null, Mouth Logan, Canada)
(null, null, temperate, London, Buckingham, UK)
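Padding Joined Tuple Sets with Nulls
The maximal JCC set {(Mouth Logan, null, Canada) from Sites, (Canada, diverse) from Climates} is joined and padded with nulls over all attributes (Stars, Hotel, Climate, City, Site, Country), giving the result tuple (null, null, diverse, null, Mouth Logan, Canada).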
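To make the definition concrete, here is a brute-force Python sketch that enumerates candidate tuple sets, keeps the JCC ones, discards those properly contained in another JCC set, and pads each remaining join with nulls. It is exponential and only meant to reproduce the small example above; it is not one of the paper's algorithms. Tuples are dicts with `None` as null, and all function names are hypothetical.

```python
from itertools import combinations, product

def join_consistent(t1, t2):
    # Non-null and equal on every common attribute.
    return all(t1[a] is not None and t1[a] == t2[a] for a in t1.keys() & t2.keys())

def is_jcc(ts):
    """Join-consistent and connected (edges = shared attribute names)."""
    if not all(join_consistent(a, b) for a, b in combinations(ts, 2)):
        return False
    seen, stack = {0}, [0]
    while stack:
        i = stack.pop()
        for j, t in enumerate(ts):
            if j not in seen and ts[i].keys() & t.keys():
                seen.add(j)
                stack.append(j)
    return len(seen) == len(ts)

def full_disjunction(relations):
    """Brute force: pad the join of every maximal JCC set with nulls."""
    attrs = list(dict.fromkeys(a for rel in relations for t in rel for a in t))
    # Two distinct tuples of one relation are never join-consistent, so a JCC
    # set holds at most one tuple per relation (index -1 = relation skipped).
    jcc_sets = []
    for choice in product(*[range(-1, len(rel)) for rel in relations]):
        members = frozenset((r, i) for r, i in enumerate(choice) if i >= 0)
        if members and is_jcc([relations[r][i] for r, i in members]):
            jcc_sets.append(members)
    maximal = [s for s in jcc_sets if not any(s < other for other in jcc_sets)]
    rows = []
    for s in maximal:
        row = dict.fromkeys(attrs)  # every attribute starts as null (None)
        for r, i in s:
            row.update(relations[r][i])
        rows.append(row)
    return rows

climates = [{"Country": "UK", "Climate": "temperate"},
            {"Country": "Canada", "Climate": "diverse"}]
accommodations = [{"Hotel": "Ramada", "Stars": 3, "City": "London", "Country": "Canada"},
                  {"Hotel": "Plaza", "Stars": 4, "City": "Toronto", "Country": "Canada"}]
sites = [{"Site": "Buckingham", "City": "London", "Country": "UK"},
         {"Site": "Mouth Logan", "City": None, "Country": "Canada"},
         {"Site": "Air Show", "City": "London", "Country": "Canada"}]

for row in full_disjunction([climates, accommodations, sites]):
    print(row)
```

On this data the loop prints the four tuples of $\mathrm{FD}(R)$ listed in the example (in a different order).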
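The Outerjoin Operator
The outerjoin $R_1 ⟗ R_2$ of two relations $R_1$ and $R_2$ consists of:
- the natural join $R_1 \bowtie R_2$, and, in addition,
- all dangling tuples (tuples of $R_1$ or $R_2$ that join with no tuple of the other relation), padded with nulls.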
Example of an Outerjoin
Climates(Country, Climate): (UK, temperate), (Bahamas, tropical), (Canada, diverse)
Accommodations(Hotel, Stars, City, Country): (Atala, 4, Paris, France), (Hilton, null, Nassau, Bahamas), (Plaza, 4, Toronto, Canada)
$\text{Climates} ⟗ \text{Accommodations}$ (Climate, City, Hotel, Stars, Country):
(temperate, null, null, null, UK)
(tropical, Nassau, Hilton, null, Bahamas)
(diverse, Toronto, Plaza, 4, Canada)
(null, Paris, Atala, 4, France)
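The two-relation outerjoin can be sketched in Python as a nested loop: tuples that agree (non-null and equal) on all common attributes are merged, and every dangling tuple is padded with nulls. Tuples are dicts with `None` as null; the function name `outerjoin` is hypothetical.

```python
def outerjoin(r1, r2):
    """Full outerjoin of two relations given as lists of dicts (None = null)."""
    attrs = list(dict.fromkeys(a for t in r1 + r2 for a in t))

    def matches(t1, t2):
        return all(t1[a] is not None and t1[a] == t2[a] for a in t1.keys() & t2.keys())

    def pad(t):
        row = dict.fromkeys(attrs)
        row.update(t)
        return row

    result, matched2 = [], set()
    for t1 in r1:
        found = False
        for j, t2 in enumerate(r2):
            if matches(t1, t2):
                result.append(pad({**t1, **t2}))
                matched2.add(j)
                found = True
        if not found:
            result.append(pad(t1))      # dangling tuple of r1
    for j, t2 in enumerate(r2):
        if j not in matched2:
            result.append(pad(t2))      # dangling tuple of r2
    return result

climates = [{"Country": "UK", "Climate": "temperate"},
            {"Country": "Bahamas", "Climate": "tropical"},
            {"Country": "Canada", "Climate": "diverse"}]
accommodations = [{"Hotel": "Atala", "Stars": 4, "City": "Paris", "Country": "France"},
                  {"Hotel": "Hilton", "Stars": None, "City": "Nassau", "Country": "Bahamas"},
                  {"Hotel": "Plaza", "Stars": 4, "City": "Toronto", "Country": "Canada"}]

for row in outerjoin(climates, accommodations):
    print(row)
```

The four printed rows match the example: UK and France survive only because they are padded with nulls.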
Combining Relations using Outerjoins
- The outerjoin operator is not associative: for more than two relations, the result depends on the order in which the outerjoins are applied.
- In general, outerjoins cannot maximally combine relations, no matter what order is used.
- Hence the outerjoin is not suitable for combining more than two relations!
Efficiency of Evaluation
- The full-disjunction operator (like other operators such as the Cartesian product or the natural join) can generate a number of tuples that is exponential in the input size.
- Hence polynomial running time in the input alone is not a suitable yardstick.
- The usual notion: polynomial time in the combined size of the input and the output.
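A small Python illustration of why the output size must be taken into account, using a hypothetical chain of $n$ relations $R_i(A_{i-1}, A_i)$, each holding all four pairs over {0, 1}. The input has $4n$ tuples, yet the natural join assigns every $A_0, \ldots, A_n$ freely and so has $2^{n+1}$ tuples; since every partial combination here extends to a complete one, the full disjunction has the same tuples. The helper names are ours.

```python
from itertools import product

def chain_relations(n):
    # R_i has attributes (A{i-1}, A{i}) and all four 0/1 pairs.
    return [[{f"A{i-1}": x, f"A{i}": y} for x, y in product((0, 1), repeat=2)]
            for i in range(1, n + 1)]

def natural_join(relations):
    """Left-deep nested-loop natural join (no nulls in this example)."""
    result = [{}]
    for rel in relations:
        result = [
            {**row, **t}
            for row in result
            for t in rel
            if all(row[a] == t[a] for a in row.keys() & t.keys())
        ]
    return result

for n in range(1, 6):
    rels = chain_relations(n)
    # input size grows as 4n, output size as 2^(n+1)
    print(n, sum(len(r) for r in rels), len(natural_join(rels)))
```

The loop prints `1 4 4`, `2 8 8`, `3 12 16`, `4 16 32`, `5 20 64`: linear input, exponential output.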
History of Algorithms for Full Disjunctions
Notation: $n$ = number of relations; $N$ = number of tuples in the database; $F$ = number of tuples in the FD.
- RU96: $O(n + F^2)$, γ-acyclic databases
- KS03: $O(n^5 N^2 F^2)$, general databases
- CS05: $O(n^3 N F^2)$ ("incremental polynomial"), general databases
This paper: linear dependence on $F$. This matters because $F$ is typically very large; it can be exponential in the size of the database.
Polynomial Delay
- One way to obtain an evaluation whose running time is linear in the output is to devise an algorithm that acts as an iterator with an efficient next() operator, that is, an enumeration algorithm that runs with polynomial delay.
- An enumeration algorithm runs with polynomial delay if the time between every two successive answers (including the time before the first answer and after the last one) is polynomial in the size of the input.
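Polynomial delay is a property of the gaps between successive answers, not of the total running time. A small Python helper (hypothetical, for illustration) makes the distinction measurable by recording the time spent in each `next()` call of any iterator, including the final call that detects exhaustion.

```python
import time
from typing import Iterable, List, Tuple, TypeVar

T = TypeVar("T")

def timed_enumeration(it: Iterable[T]) -> Tuple[List[T], List[float]]:
    """Consume an iterator and record the delay (seconds) of every next() call,
    including the last call, which raises StopIteration."""
    answers: List[T] = []
    delays: List[float] = []
    iterator = iter(it)
    while True:
        start = time.perf_counter()
        try:
            item = next(iterator)
        except StopIteration:
            delays.append(time.perf_counter() - start)
            return answers, delays
        delays.append(time.perf_counter() - start)
        answers.append(item)

answers, delays = timed_enumeration(x * x for x in range(5))
print(answers)      # [0, 1, 4, 9, 16]
print(len(delays))  # 6: one per answer plus the final call
print(f"max delay: {max(delays):.6f} s")
```

For a polynomial-delay algorithm, `max(delays)` should stay bounded by a polynomial in the input size even when the number of answers is exponential.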
Other Benefits of Polynomial Delay
- Incremental evaluation: the first tuples are generated quickly.
  - Full disjunctions are large, yet the user need not wait for the whole result to be generated.
  - This suits Web applications, where users expect to get the first few pages quickly.
  - In addition, the user can decide at any time that enough information has been shown.
- Parallel query processing: while one processor generates the FD tuples, other processors apply further processing.
Main Contributions
1. The first algorithm for computing full disjunctions with polynomial delay.
2. The first algorithm for computing full disjunctions in time linear in the output.
3. A general optimization technique for computing full disjunctions: division into biconnected components.
A substantial improvement over the state of the art is shown both theoretically and experimentally.
Our Algorithms
- Algorithm NLOJ: tree schemes.
- Algorithm PDelayFD: general schemes.
- Division into biconnected components: an optimization.
- Algorithm BiComNLOJ: the main algorithm, for general schemes; it combines the three above.
Tree Schemes
- In the scheme graph, the relation schemes are the nodes, and there is an edge between every two schemes that share one or more attributes.
- Tree schemes are those whose scheme graphs have no cycles.
Figure: a tree-shaped scheme graph over $R_1, \ldots, R_7$.
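A Python sketch of the scheme graph, assuming each relation scheme is given as a set of attribute names (the scheme names and attributes in the usage lines are hypothetical apart from the running example). Edges link schemes with a common attribute, and a union-find pass detects whether any edge closes a cycle.

```python
from itertools import combinations

def scheme_graph(schemes: dict) -> list:
    """Edges between every two relation schemes with a common attribute."""
    return [(r, s) for r, s in combinations(schemes, 2) if schemes[r] & schemes[s]]

def has_no_cycles(schemes: dict) -> bool:
    """Union-find: an edge whose endpoints are already connected closes a cycle."""
    parent = {r: r for r in schemes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for r, s in scheme_graph(schemes):
        root_r, root_s = find(r), find(s)
        if root_r == root_s:
            return False
        parent[root_r] = root_s
    return True

chain = {"R1": {"A", "B"}, "R2": {"B", "C"}, "R3": {"C", "D"}}
triangle = {"Climates": {"Country", "Climate"},
            "Accommodations": {"Hotel", "Stars", "City", "Country"},
            "Sites": {"Site", "City", "Country"}}
print(has_no_cycles(chain))     # True
print(has_no_cycles(triangle))  # False: every two schemes share Country
```

The second call shows that the running Climates/Accommodations/Sites example is not a tree scheme: its scheme graph is a triangle.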
Left-Deep Sequence of Outerjoins
Proposition: Let $R$ be a set of relations with a tree scheme, and let $R_1, \ldots, R_n$ be a connected-prefix order of $R$. Then
$\mathrm{FD}(R) = (\cdots((R_1 ⟗ R_2) ⟗ R_3) ⟗ \cdots) ⟗ R_n$.
Algorithm NLOJ (Nested-Loop OuterJoin):
1. Compute a connected-prefix order of $R$.
2. Apply outerjoins in a left-deep order.
Connected-Prefix Order of Relations
A connected-prefix order of relations is one in which each prefix forms a (connected) subtree.
Figure: a tree scheme over $R_1, \ldots, R_7$ with the connected-prefix order $R_1, R_3, R_2, R_7, R_4, R_5, R_6$.
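For a tree scheme, one simple way to obtain a connected-prefix order (an assumption of this sketch, not necessarily the paper's procedure) is a breadth-first traversal from any relation: each relation is emitted only after a neighbor that is already in the prefix, so every prefix is a connected subtree. The adjacency list below is a hypothetical tree over seven relations.

```python
from collections import deque

def connected_prefix_order(adj: dict, root: str) -> list:
    """BFS order: every node after the root is adjacent to an earlier node."""
    order, seen, queue = [], {root}, deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nbr in adj[node]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return order

def is_connected_prefix(order: list, adj: dict) -> bool:
    """Every relation after the first has a neighbor earlier in the order."""
    return all(any(p in adj[r] for p in order[:i])
               for i, r in enumerate(order) if i > 0)

tree = {
    "R1": ["R2", "R3"], "R2": ["R1", "R4", "R5"], "R3": ["R1", "R6", "R7"],
    "R4": ["R2"], "R5": ["R2"], "R6": ["R3"], "R7": ["R3"],
}
order = connected_prefix_order(tree, "R1")
print(order)  # ['R1', 'R2', 'R3', 'R4', 'R5', 'R6', 'R7']
```

Any traversal that adds a relation only after one of its neighbors (DFS works too) yields a valid order; the figure's order $R_1, R_3, R_2, R_7, \ldots$ is another one.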
Achieving Polynomial Delay
Algorithm NLOJ (Nested-Loop OuterJoin):
1. Compute a connected-prefix order of $R$.
2. Apply outerjoins in a left-deep order: $(\cdots((R_1 ⟗ R_2) ⟗ R_3) ⟗ \cdots ⟗ R_{n-1}) ⟗ R_n$.
Problem: exponential delay, because an intermediate result may already have exponential size before the last outerjoin produces any output!
Solution: use iterators.
Iterators
- To obtain polynomial delay, we use iterators.
- An iterator operates on top of an enumeration algorithm and implements next() by controlling its execution.
Using Iterators for Outerjoins
Figure: a stack of iterators (Iterator 1, Iterator 2, …, Iterator n) over the relations $R_1, R_2, \ldots, R_{n-1}, R_n$.
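A minimal Python sketch of such an iterator pipeline, assuming a tree scheme whose relations are already listed in a connected-prefix order, with tuples as dicts and `None` as null. Each stage is a generator implementing one outerjoin as a nested loop: for every row pulled from the previous stage it yields the matching joins, or the row padded with nulls if there is none; once its input is exhausted, it yields the dangling tuples of its own relation, padded with nulls. Because every input row produces at least one output, no stage scans its input for long without yielding, which is the intuition behind the polynomial delay. This illustrates the idea and is not the paper's NLOJ code; all names are hypothetical.

```python
from typing import Dict, Iterator, List, Optional

Row = Dict[str, Optional[object]]

def pad(row: Row, attrs: List[str]) -> Row:
    """Project/extend a row onto attrs, using None for missing values."""
    return {a: row.get(a) for a in attrs}

def outerjoin_stage(left: Iterator[Row], left_attrs: List[str],
                    rel: List[Row], rel_attrs: List[str]) -> Iterator[Row]:
    """One nested-loop full outerjoin step, written as a generator."""
    out_attrs = list(dict.fromkeys(left_attrs + rel_attrs))
    common = [a for a in rel_attrs if a in left_attrs]
    matched = set()  # positions of rel tuples that joined some left row
    for row in left:
        found = False
        for j, t in enumerate(rel):
            if all(row[a] is not None and row[a] == t[a] for a in common):
                matched.add(j)
                found = True
                yield pad({**row, **t}, out_attrs)
        if not found:
            yield pad(row, out_attrs)   # dangling left row, padded with nulls
    for j, t in enumerate(rel):
        if j not in matched:
            yield pad(t, out_attrs)     # dangling tuple of rel, padded with nulls

def nloj(relations: List[List[Row]], schemes: List[List[str]]) -> Iterator[Row]:
    """Left-deep chain of outerjoin iterators over a connected-prefix order."""
    attrs = list(schemes[0])
    stream: Iterator[Row] = iter([pad(t, attrs) for t in relations[0]])
    for rel, scheme in zip(relations[1:], schemes[1:]):
        stream = outerjoin_stage(stream, attrs, rel, list(scheme))
        attrs = list(dict.fromkeys(attrs + list(scheme)))
    return stream

R1 = [{"A": 1, "B": 1}, {"A": 2, "B": 2}]
R2 = [{"B": 1, "C": 1}, {"B": 3, "C": 3}]
R3 = [{"C": 1, "D": 1}, {"C": 4, "D": 4}]
it = nloj([R1, R2, R3], [["A", "B"], ["B", "C"], ["C", "D"]])
print(next(it))  # {'A': 1, 'B': 1, 'C': 1, 'D': 1}
for row in it:
    print(row)
```

The first `next()` call returns a complete FD tuple without materializing any intermediate outerjoin. Rows are padded only to the attributes of the prefix so far; padding to all attributes up front would make later stages treat not-yet-joined attributes as nulls and refuse every match.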
Outerjoins are not Always Applicable
- It is not always possible to formulate a full disjunction as a left-deep sequence of outerjoins.
- Rajaraman and Ullman [PODS 96]: some full disjunctions cannot be formulated as expressions of outerjoins at all (i.e., with any placement of parentheses).
About the Algorithm
- Unlike NLOJ, the next algorithm, PDelayFD, is applicable to all schemes (not just trees).
- PDelayFD has polynomial delay, but its delay is larger than that of NLOJ.
- Nevertheless, PDelayFD by itself is a significant improvement over the state of the art.
Shifting a Maximal JCC Tuple Set
The $t$-shift of a maximal JCC set $T$ with respect to a tuple $t$:
1. Add $t$ to $T$.
2. Extract the maximal JCC subset containing $t$.
3. Extend it to a maximal JCC set.
Algorithm PDelayFD
Q holds maximal JCC sets waiting to be printed; C holds those already printed.
1. Generate a maximal JCC set $T_0$.
2. Insert $T_0$ into Q.
Repeat until Q is empty:
  1. Move some $T$ from Q to C.
  2. Print the join of $T$, padded with nulls.
  3. For every tuple $t$ in the database, insert the $t$-shift of $T$ into Q, after validating that it is not already in Q or C.
Theorem: $\mathrm{PDelayFD}(R)$ computes $\mathrm{FD}(R)$ with polynomial delay.
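A compact Python sketch of the PDelayFD loop, under several assumptions of ours: tuples are dicts with `None` as null and are identified by their position in a flat list `db`; the initial maximal JCC set is grown greedily from the first tuple; step 2 of the $t$-shift keeps the tuples of $T$ that are join-consistent with $t$ and takes the connected component of $t$ among them (this component is the unique maximal JCC subset of $T \cup \{t\}$ containing $t$); and step 3 extends greedily. Q and C are merged into one `seen` set for the duplicate check. It illustrates the control flow only, omits the data structures behind the paper's delay bound, and has been checked only against the small example above.

```python
from collections import deque

def jc(t1, t2):
    """Join-consistent: non-null and equal on every common attribute."""
    return all(t1[a] is not None and t1[a] == t2[a] for a in t1.keys() & t2.keys())

def extend(members, db):
    """Greedily add tuples that keep the set join-consistent and connected,
    until none can be added (the result is a maximal JCC set)."""
    members = set(members)
    changed = True
    while changed:
        changed = False
        for i, t in enumerate(db):
            if (i not in members
                    and all(jc(t, db[j]) for j in members)
                    and any(t.keys() & db[j].keys() for j in members)):
                members.add(i)
                changed = True
    return frozenset(members)

def t_shift(T, i, db):
    """Add t = db[i], keep the component of t among the tuples of T that are
    join-consistent with t, then extend to a maximal JCC set."""
    candidates = {j for j in T if jc(db[j], db[i])} | {i}
    component, stack = {i}, [i]
    while stack:
        k = stack.pop()
        for j in candidates - component:
            if db[k].keys() & db[j].keys():
                component.add(j)
                stack.append(j)
    return extend(component, db)

def pdelay_fd(db):
    """Yield maximal JCC sets as frozensets of indices into db."""
    if not db:
        return
    start = extend({0}, db)
    queue, seen = deque([start]), {start}   # seen plays the role of Q ∪ C
    while queue:
        T = queue.popleft()                  # move T from Q to C
        yield T                              # "print" T
        for i in range(len(db)):
            shifted = t_shift(T, i, db)
            if shifted not in seen:          # not already in Q or C
                seen.add(shifted)
                queue.append(shifted)

def padded_join(T, db, attrs):
    row = dict.fromkeys(attrs)               # nulls everywhere, then fill in
    for j in T:
        row.update(db[j])
    return row

db = [
    {"Country": "UK", "Climate": "temperate"},
    {"Country": "Canada", "Climate": "diverse"},
    {"Hotel": "Ramada", "Stars": 3, "City": "London", "Country": "Canada"},
    {"Hotel": "Plaza", "Stars": 4, "City": "Toronto", "Country": "Canada"},
    {"Site": "Buckingham", "City": "London", "Country": "UK"},
    {"Site": "Mouth Logan", "City": None, "Country": "Canada"},
    {"Site": "Air Show", "City": "London", "Country": "Canada"},
]
attrs = list(dict.fromkeys(a for t in db for a in t))
for T in pdelay_fd(db):
    print(padded_join(T, db, attrs))
```

Because `pdelay_fd` is a generator, each printed tuple becomes available as soon as its set leaves the queue, rather than after the whole FD is computed.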
NLOJ vs. PDelayFD
- NLOJ: shorter delays, less space, simpler to implement; but it applies only to tree schemes.
- PDelayFD: applies to general schemes.
- Can the advantages of both be combined? Our approach: divide and conquer.
Figure: the same scheme graph over $R_1, \ldots, R_{10}$, shown three times.
Biconnected Components
A biconnected component is a maximal subset $B$ of relations such that the scheme graph has two (or more) vertex-disjoint paths between every two relations of $B$.
Figure: a scheme graph over $R_1, \ldots, R_9$ divided into its biconnected components.
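The biconnected components of a scheme graph can be found in linear time with the classic Hopcroft-Tarjan depth-first search. Below is a standard recursive Python version over an adjacency dict; the graph and function name are hypothetical. Each component is returned as a set of relations, and an edge that lies on no cycle forms a two-relation component on its own.

```python
def biconnected_components(adj: dict) -> list:
    """Hopcroft-Tarjan: DFS with discovery times and low-links. Tree and back
    edges are stacked; whenever low[v] >= disc[u] for a child v of u, the
    edges down to (u, v) are popped off as one component."""
    disc, low = {}, {}
    edge_stack, components = [], []

    def dfs(u, parent):
        disc[u] = low[u] = len(disc)  # discovery time = vertices seen so far
        for v in adj[u]:
            if v not in disc:                        # tree edge
                edge_stack.append((u, v))
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] >= disc[u]:                # u separates v's subtree
                    comp = set()
                    while True:
                        edge = edge_stack.pop()
                        comp.update(edge)
                        if edge == (u, v):
                            break
                    components.append(comp)
            elif v != parent and disc[v] < disc[u]:  # back edge to an ancestor
                edge_stack.append((u, v))
                low[u] = min(low[u], disc[v])

    for node in adj:
        if node not in disc:
            if not adj[node]:
                components.append({node})            # isolated relation
            dfs(node, None)
    return components

graph = {
    "R1": ["R2", "R3"], "R2": ["R1", "R3"], "R3": ["R1", "R2", "R4", "R5"],
    "R4": ["R3", "R5"], "R5": ["R3", "R4", "R6"], "R6": ["R5"],
}
for comp in biconnected_components(graph):
    print(sorted(comp))
```

Here R3 and R5 are articulation points: R3 belongs to two components and R5 to two, which is exactly where the outerjoins between component-level FDs connect.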
Left-Deep Sequence of Outerjoins
Let $R$ be a set of relations.
Theorem: There exists an (efficiently computable) order $B_1, \ldots, B_k$ of the biconnected components of $R$ such that
$\mathrm{FD}(R) = (\cdots((\mathrm{FD}(B_1) ⟗ \mathrm{FD}(B_2)) ⟗ \cdots) ⟗ \mathrm{FD}(B_k)$.
Optimized algorithm:
1. Compute the biconnected components of $R$.
2. Compute the full disjunction of each component.
3. Apply outerjoins in a suitable order.
BiComNLOJ: a Naïve Attempt
1. Divide $R$ into biconnected components $B_1, \ldots, B_k$, in a suitable order.
2. Compute $\mathrm{FD}(B_1), \ldots, \mathrm{FD}(B_k)$ using PDelayFD.
3. Using NLOJ, compute $(\cdots((\mathrm{FD}(B_1) ⟗ \mathrm{FD}(B_2)) ⟗ \cdots) ⟗ \mathrm{FD}(B_k)$.
Problem: each $\mathrm{FD}(B_i)$ can be exponential in the input, so the delay is not polynomial!
Solution: run each step as an iterator.
Retaining Polynomial Delay: 1st Problem
For simplicity, assume only two components, $B_1$ and $B_2$ (figure: a scheme graph over $R_1, \ldots, R_8$ split into $B_1$ and $B_2$).
- After generating a tuple $t$ of $\mathrm{FD}(B_1)$, we need to generate all tuples of $\mathrm{FD}(B_2)$ that can join $t$.
- The delay is not polynomial if all of $\mathrm{FD}(B_2)$ is computed to find these tuples!
- Solution: PDelayFD can be modified so that it generates only those tuples of $\mathrm{FD}(B_2)$ that can join $t$. Details are in the proceedings.
Retaining Polynomial Delay: 2nd Problem
For simplicity, assume only two components, $B_1$ and $B_2$.
- The last step is to generate all tuples of $\mathrm{FD}(B_2)$ that cannot be joined with any tuple of $\mathrm{FD}(B_1)$.
- However, this task is by itself NP-hard!
- Solution: while generating all tuples of $\mathrm{FD}(B_2)$ that can be joined with some tuple of $\mathrm{FD}(B_1)$, we collect enough information for generating the remaining tuples of $\mathrm{FD}(B_2)$. Details are in the proceedings.
Experimental Setting
- Algorithms: PDelayFD, BiComNLOJ (main), and IncrementalFD (CS05, state of the art).
- Implementation: PostgreSQL (open source).
- Hardware: Pentium 4, 1.6 GHz, 512 MB RAM.
- Synthetic data (randomly generated) over fixed schemes.
Figure: the three schemes $S_1$, $S_2$, $S_3$, each over relations $R_1, \ldots, R_{10}$.
State of the Art vs. Main Algorithm
Figure (one panel per scheme: Scheme 1, Scheme 2, Scheme 3): average delay (msec) vs. number of tuples in each relation, for IncrementalFD (state of the art, CS05) and BiComNLOJ (our main algorithm).
BiComNLOJ is a substantial improvement over the state of the art.
Division into Biconnected Components
Figure (Schemes 1, 2, 3): average delay (msec) vs. number of tuples in each relation, for PDelayFD (no division into biconnected components) and BiComNLOJ (our main algorithm).
Division reduces delays; the amount depends on the scheme.
Behavior of Delay
Figure: delay (msec) before each generated tuple vs. tuple number, for IncrementalFD (state of the art, CS05) and BiComNLOJ (our main algorithm).
While IncrementalFD slows down as more tuples are generated, the delay of BiComNLOJ remains almost constant.
Summary
- Full disjunction: an associative extension of the outerjoin operator to an arbitrary number of relations.
- Three algorithms for computing FDs:
  - NLOJ (Nested-Loop OuterJoin): tree-structured schemes.
  - PDelayFD (Polynomial-Delay Full Disjunction): general schemes.
  - BiComNLOJ: combines the first two and deploys division into biconnected components (BiCom); general schemes.
Contributions
- A substantial improvement of evaluation time over the state of the art, shown theoretically and experimentally.
- Full disjunctions can be computed with polynomial delay and in time linear in the output size.
- Optimization techniques for computing FDs.
- Implementation within PostgreSQL (ongoing…).
- Incorporating our algorithms into an SQL optimizer; e.g., some operators can be pushed through the FD. This is not discussed here; it appears in the proceedings.
Thank you. Questions?

More Related Content

PPT
Full Disjunction
PPT
Computing FDs
PPT
J2 Ee Overview
PPT
Python Intro For Managers
PPT
Pods2003
PPT
Semantic Search Engines
PDF
2024 Trend Updates: What Really Works In SEO & Content Marketing
PDF
Storytelling For The Web: Integrate Storytelling in your Design Process
Full Disjunction
Computing FDs
J2 Ee Overview
Python Intro For Managers
Pods2003
Semantic Search Engines
2024 Trend Updates: What Really Works In SEO & Content Marketing
Storytelling For The Web: Integrate Storytelling in your Design Process

Recently uploaded (20)

PDF
How to join illuminati agent in Uganda Kampala call 0782561496/0756664682
PPTX
PROFITS AND GAINS OF BUSINESS OR PROFESSION 2024.pptx
PDF
GVCParticipation_Automation_Climate_India
PPT
KPMG FA Benefits Report_FINAL_Jan 27_2010.ppt
PDF
THE EFFECT OF FOREIGN AID ON ECONOMIC GROWTH IN ETHIOPIA
PDF
Pitch Deck.pdf .pdf all about finance in
PPTX
Lesson Environment and Economic Growth.pptx
PDF
Financial discipline for educational purpose
PDF
Statistics for Management and Economics Keller 10th Edition by Gerald Keller ...
PDF
Principal of magaement is good fundamentals in economics
PDF
USS pension Report and Accounts 2025.pdf
PDF
HCWM AND HAI FOR BHCM STUDENTS(1).Pdf and ptts
PDF
Truxton Capital: Middle Market Quarterly Review - August 2025
PDF
International Financial Management, 9th Edition, Cheol Eun, Bruce Resnick Tuu...
PDF
2012_The dark side of valuation a jedi guide to valuing difficult to value co...
PPTX
lesson in englishhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
PDF
Pension Trustee Training (1).pdf From Salih Shah
PDF
5-principles-of-PD-design.pdfvvvhvjvvcjcxhhcjb ggfvjhvjjhbvbbbvccxhgcxzzghjbv...
PPTX
Machine Learning (ML) is a branch of Artificial Intelligence (AI)
How to join illuminati agent in Uganda Kampala call 0782561496/0756664682
PROFITS AND GAINS OF BUSINESS OR PROFESSION 2024.pptx
GVCParticipation_Automation_Climate_India
KPMG FA Benefits Report_FINAL_Jan 27_2010.ppt
THE EFFECT OF FOREIGN AID ON ECONOMIC GROWTH IN ETHIOPIA
Pitch Deck.pdf .pdf all about finance in
Lesson Environment and Economic Growth.pptx
Financial discipline for educational purpose
Statistics for Management and Economics Keller 10th Edition by Gerald Keller ...
Principal of magaement is good fundamentals in economics
USS pension Report and Accounts 2025.pdf
HCWM AND HAI FOR BHCM STUDENTS(1).Pdf and ptts
Truxton Capital: Middle Market Quarterly Review - August 2025
International Financial Management, 9th Edition, Cheol Eun, Bruce Resnick Tuu...
2012_The dark side of valuation a jedi guide to valuing difficult to value co...
lesson in englishhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
Pension Trustee Training (1).pdf From Salih Shah
5-principles-of-PD-design.pdfvvvhvjvvcjcxhhcjb ggfvjhvjjhbvbbbvccxhgcxzzghjbv...
Machine Learning (ML) is a branch of Artificial Intelligence (AI)
Ad
Ad

Full Disjunction

  • 1. Full Disjunctions : Polynomial-Delay Iterators in Action VLDB 2006 Seoul, Korea Sara Cohen Technion Israel Yaron Kanza University of Toronto Canada Benny Kimelfeld Hebrew University Israel Yehoshua Sagiv Hebrew University Israel Itzhak Fadida Technion Israel
  • 2. Computing Full Disjunctions The full disjunction is a relational operator that maximally combines data from several relations It extends the natural join by allowing incompleteness It extends the binary outerjoin to many relations This paper presents algorithms and optimizations for computing full disjunctions Theoretically, full disjunctions are more tractable than previously known Practically, a significant improvement over the state-of-art, an iterator -like evaluation
  • 3. Contents Full Disjunctions Complexity Contributions Algorithms Algorithm NLOJ for Tree-Structured Schemes Algorithm PDelayFD for General Schemes Algorithm BiComNLOJ − Main Algorithm Experimental Results Conclusion
  • 4. Contents Full Disjunctions Complexity Contributions Algorithms Algorithm NLOJ for Tree-Structured Schemes Algorithm PDelayFD for General Schemes Algorithm BiComNLOJ − Main Algorithm Experimental Results Conclusion
  • 5. The Natural Join Operator Climates Accommodations Sites Climates Accommodations Sites Stars Hotel Climate City Site Country temperate UK tropical Bahamas diverse Canada Climate Country 3 Ramada London Canada Nassau Toronto City Hilton Plaza Hotel Bahamas 4 Canada Stars Country Buckingham London UK Mouth Logan Canada London London City Hyde Park Air Show Site UK Canada Country Air Show 3 Ramada London diverse Canada
  • 6. The Natural Join Misses Information Climates Accommodations Sites Climates Accommodations Sites Bahamas is not in Sites , so the natural join misses it temperate UK tropical Bahamas diverse Canada Climate Country 3 Ramada London Canada Nassau Toronto City Hilton Plaza Hotel Bahamas 4 Canada Stars Country Buckingham London UK Mouth Logan Canada London London City Hyde Park Air Show Site UK Canada Country Air Show 3 Ramada London diverse Canada Stars Hotel Climate City Site Country
  • 7. The Natural Join Misses Information Climates Accommodations Climates Accommodations Sites Bahamas is not in Sites , so the natural join misses it Mouth Logan is not in a city, hence missed temperate UK tropical Bahamas diverse Canada Climate Country 3 Ramada London Canada Nassau Toronto City Hilton Plaza Hotel Bahamas 4 Canada Stars Country Buckingham London UK Mouth Logan Canada London London City Hyde Park Air Show Site UK Canada Country Stars Hotel Climate City Site Country Air Show 3 Ramada London diverse Canada Empty space means null value
  • 8. The Natural Join Misses Information Climates Accommodations A looser notion of join is needed — one that enables joining tuples from some of the tables Climates Accommodations Sites Bahamas is not in Sites , so the natural join misses it Mouth Logan is not in a city, hence missed temperate UK tropical Bahamas diverse Canada Climate Country 3 Ramada London Canada Nassau Toronto City Hilton Plaza Hotel Bahamas 4 Canada Stars Country Buckingham London UK Mouth Logan Canada London London City Hyde Park Air Show Site UK Canada Country Stars Hotel Climate City Site Country Air Show 3 Ramada London diverse Canada
  • 9. The Natural Join Operator Climates Accommodations Sites Climates Accommodations Sites A tuple of the join corresponds to a set of tuples from the source relations Join consistent Connected No Cartesian product Complete One tuple from each relation Stars Hotel Climate City Site Country temperate UK tropical Bahamas diverse Canada Climate Country 3 Ramada London Canada Nassau Toronto City Hilton Plaza Hotel Bahamas 4 Canada Stars Country Buckingham London UK Mouth Logan Canada London London City Hyde Park Air Show Site UK Canada Country Air Show 3 Ramada London diverse Canada
  • 10. Join-Consistent Sets of Tuples A set T of tuples is join-consistent if every two tuples of T are join-consistent Two tuples t 1 and t 2 are join-consistent if for every common attribute A : 1. t 1 [ A ] and t 2 [ A ] are non-null 2. t 1 [ A ] = t 2 [ A ] Ramada London Canada Stars Hotel City Country Air Show London Canada Site City Country
  • 11. Connected Sets of Tuples The nodes are the tuples of T An edge between every two tuples with a common attribute The join graph of a set T of tuples: A set of tuples is connected if its join graph is connected diverse Canada Climate Country Buckingham London UK Site City Country 4 Plaza Toronto Stars Hotel City
  • 12. Natural Join (w/o Cartesian Product) Each tuple of the result corresponds to a set T of tuples from the source relations T is join consistent 1. T is connected No Cartesian product 2. T is complete One tuple from each relation 3. JCC
  • 13. Full Disjunction (Galindo-Legaria 1994) T is join consistent 1. Each tuple of the result corresponds to a set T of tuples from the source relations T is connected No Cartesian product 2. T is complete One tuple from each relation 3. T is maximal Not properly contained in any JCC set 3. JCC
  • 14. An Example of a Full Disjunction Climates Accommodations Sites FD ( R ) R temperate UK diverse Canada Climate Country 3 Ramada London Canada Toronto City Plaza Hotel 4 Canada Stars Country Buckingham London UK Mouth Logan Canada London City Air Show Site Canada Country Stars Hotel Climate City Site Country
  • 15. An Example of a Full Disjunction Climates Accommodations Sites FD ( R ) R temperate UK diverse Canada Climate Country 3 Ramada London Canada Toronto City Plaza Hotel 4 Canada Stars Country Buckingham London UK Mouth Logan Canada London City Air Show Site Canada Country Stars Hotel Climate City Site Country 4 Plaza Toronto diverse Canada
  • 16. An Example of a Full Disjunction Climates Accommodations Sites FD ( R ) R temperate UK diverse Canada Climate Country 3 Ramada London Canada Toronto City Plaza Hotel 4 Canada Stars Country Buckingham London UK Mouth Logan Canada London City Air Show Site Canada Country Stars Hotel Climate City Site Country 4 Plaza Toronto diverse Canada Air Show 3 Ramada London diverse Canada
  • 17. An Example of a Full Disjunction Climates Accommodations Sites FD ( R ) R temperate UK diverse Canada Climate Country 3 Ramada London Canada Toronto City Plaza Hotel 4 Canada Stars Country Buckingham London UK Mouth Logan Canada London City Air Show Site Canada Country Stars Hotel Climate City Site Country 4 Plaza Toronto diverse Canada Air Show 3 Ramada London diverse Canada Mouth Logan diverse Canada
  • 18. An Example of a Full Disjunction Climates Accommodations Sites FD ( R ) R temperate UK diverse Canada Climate Country 3 Ramada London Canada Toronto City Plaza Hotel 4 Canada Stars Country Buckingham London UK Mouth Logan Canada London City Air Show Site Canada Country Stars Hotel Climate City Site Country 4 Plaza Toronto diverse Canada Air Show 3 Ramada London diverse Canada Mouth Logan diverse Canada Buckingham London temperate UK
  • 19. An Example of a Full Disjunction Climates Accommodations Sites FD ( R ) R temperate UK diverse Canada Climate Country 3 Ramada London Canada Toronto City Plaza Hotel 4 Canada Stars Country Buckingham London UK Mouth Logan Canada London City Air Show Site Canada Country Stars Hotel Climate City Site Country 4 Plaza Toronto diverse Canada Air Show 3 Ramada London diverse Canada Mouth Logan diverse Canada Buckingham London temperate UK
  • 20. Padding Joined Tuple Sets with Nulls Mouth Logan Canada Site City Country diverse Canada Climate Country Mouth Logan diverse Canada Stars Hotel Climate City Site Country
  • 21. The Outerjoin Operator The outerjoin of two relations R 1 and R 2 R 1 R 2 The natural join R 1 R 2 and, in addition, all dangling tuples padded with nulls
  • 22. Example of an Outerjoin Climates Accommodations temperate UK tropical Bahamas diverse Canada Climate Country 4 Atala Paris France Nassau Toronto City Hilton Plaza Hotel Bahamas 4 Canada Stars Country temperate UK Hilton Nassau tropical Bahamas diverse Climate Paris Toronto City Atala Plaza Hotel 4 France 4 Canada Stars Country Climates Accommodations
  • 23. Combining Relations using Outerjoins The outerjoin operator is not associative For more than two relations, the result depends on the order in which the outerjoin is applied In general, outerjoins cannot maximally combine relations (no matter what order is used) Outerjoin is not suitable for combining more than two relations !
  • 24. Contents Full Disjunctions Complexity Contributions Algorithms Algorithm NLOJ for Tree-Structured Schemes Algorithm PDelayFD for General Schemes Algorithm BiComNLOJ − Main Algorithm Experimental Results Conclusion
  • 25. Efficiency of Evaluation The full-disjunction operator (as well as other operators like the Cartesian product or the natural join ) can generate an exponential (in the input size) number of tuples Polynomial running time is not a suitable yardstick The usual notion: Polynomial time in the combined size of the input and the output
  • 26. History of Algorithms for Full Disjunctions n : N : F : number of relations number of tuples in the DB number of tuples in the FD This paper: linear dependence on F F is typically very large Can be exponential in the size of the database Source Time Databases RU96 O ( n + F 2 )  -acyclic KS03 O ( n 5  N 2  F 2 ) general CS05 O ( n 3  N  F 2 ) “ incremental polynomial” general
  • 27. Polynomial Delay One way to obtain an evaluation with a running time linear in the output is to devise an algorithm that acts as an iterator with an efficient next () operator, that is, An enumeration algorithm that runs with polynomial delay An enumeration algorithm runs with polynomial delay if the time between every two successive answers is polynomial in the size of the input time
  • 28. Other Benefits of Polynomial Delay Incremental evaluation First tuples are generated quickly Full disjunctions are large, yet the user need not wait for the whole result to be generated Suitable for Web applications, where users expect to get the first few pages quickly In addition, the user can decide anytime that enough information has been shown Enable parallel query processing While one processor generates the FD tuples, other processors apply further processing
  • 30. Main Contributions: 1. The first algorithm for computing full disjunctions with polynomial delay. 2. The first algorithm for computing full disjunctions in time linear in the output. 3. A general optimization technique for computing full disjunctions: division into biconnected components. A substantial improvement over the state of the art is proved theoretically and experimentally.
  • 32. Our Algorithms: Algorithm NLOJ, for tree schemes; Algorithm PDelayFD, for general schemes; division into biconnected components, an optimization. Algorithm BiComNLOJ, the main algorithm for general schemes, combines all three.
  • 34. Tree Schemes: Schemes whose scheme graph has no cycles. In the scheme graph, the relation schemes are the nodes, and there is an edge between every two schemes with one or more common attributes. [Figure: a tree-shaped scheme graph over R1, …, R7.]
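A minimal Python sketch of the scheme graph and a tree test, assuming each relation scheme is given as a set of attribute names; `scheme_graph` and `is_tree_scheme` are hypothetical names. The tree test also requires the graph to be connected, which the connected-prefix order used by NLOJ relies on.

```python
from itertools import combinations
from typing import Dict, Set

def scheme_graph(schemes: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Scheme graph: one node per relation, an edge between every two schemes
    that have one or more common attributes."""
    adj: Dict[str, Set[str]] = {r: set() for r in schemes}
    for r, s in combinations(schemes, 2):
        if schemes[r] & schemes[s]:
            adj[r].add(s)
            adj[s].add(r)
    return adj

def is_tree_scheme(schemes: Dict[str, Set[str]]) -> bool:
    """True iff the scheme graph is connected and acyclic, i.e. connected
    with exactly (number of nodes - 1) edges."""
    adj = scheme_graph(schemes)
    if not adj:
        return True
    edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    start = next(iter(adj))
    seen, stack = {start}, [start]
    while stack:                          # depth-first search for connectivity
        for v in adj[stack.pop()]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == len(adj) and edges == len(adj) - 1

tree = {"R1": {"A", "B"}, "R2": {"B", "C"}, "R3": {"C", "D"}}
cyclic = {**tree, "R4": {"D", "A"}}       # R1-R2-R3-R4-R1 is a cycle
print(is_tree_scheme(tree), is_tree_scheme(cyclic))   # True False
```

One consequence worth noting: an attribute shared by three relations creates a triangle, so in a tree scheme every attribute occurs in at most two relations.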
  • 35. Left-Deep Sequence of Outerjoins: Let R be a set of relations with a tree scheme, and let R1, …, Rn be a connected-prefix order of R. Proposition: FD(R) = (…((R1 ⟗ R2) ⟗ R3) ⟗ …) ⟗ Rn. Algorithm NLOJ (Nested-Loop OuterJoin): 1. Compute a connected-prefix order of R. 2. Apply outerjoins in a left-deep order.
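The proposition suggests a direct, fully materialized evaluation. The sketch below assumes tuples are dicts, nulls are represented by `None`, base relations contain no nulls, and a null never matches a value (as in SQL); `full_outerjoin` and `nloj_materialized` are hypothetical names. It computes each intermediate result in full, which is exactly what causes the exponential delay discussed on the following slide.

```python
from typing import Dict, List, Optional, Set, Tuple

Row = Dict[str, Optional[object]]   # attribute -> value (None = null)

def full_outerjoin(left: List[Row], lattrs: Set[str],
                   right: List[Row], rattrs: Set[str]) -> Tuple[List[Row], Set[str]]:
    """Full outerjoin (left ⟗ right); a null never matches anything."""
    common, attrs = lattrs & rattrs, lattrs | rattrs
    out: List[Row] = []
    matched_right: Set[int] = set()
    for t in left:
        matched = False
        for j, u in enumerate(right):
            if all(t[a] is not None and t[a] == u[a] for a in common):
                out.append({**t, **u})
                matched = True
                matched_right.add(j)
        if not matched:                       # dangling left tuple: pad with nulls
            out.append({a: t.get(a) for a in attrs})
    for j, u in enumerate(right):
        if j not in matched_right:            # dangling right tuple: pad with nulls
            out.append({a: u.get(a) for a in attrs})
    return out, attrs

def nloj_materialized(relations: List[Tuple[List[Row], Set[str]]]) -> List[Row]:
    """(...((R1 ⟗ R2) ⟗ R3) ...) ⟗ Rn for relations in connected-prefix order."""
    result, attrs = relations[0]
    for rel, rattrs in relations[1:]:
        result, attrs = full_outerjoin(result, attrs, rel, rattrs)
    return result

R1 = ([{"A": 1, "B": 1}, {"A": 2, "B": 2}], {"A", "B"})
R2 = ([{"B": 1, "C": 1}, {"B": 3, "C": 3}], {"B", "C"})
R3 = ([{"C": 1, "D": 1}], {"C", "D"})
result = nloj_materialized([R1, R2, R3])
```

Here `result` holds three tuples: (1, 1, 1, 1), (2, 2, null, null) and (null, 3, 3, null) over A, B, C, D.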
  • 36. Connected-Prefix Order of Relations: In a connected-prefix order of relations, each prefix forms a (connected) subtree. For example, for the tree scheme over R1, …, R7 shown in the figure, R1, R3, R2, R7, R4, R5, R6 is a connected-prefix order.
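One simple way to obtain a connected-prefix order is breadth-first search over the scheme graph: every relation after the first is reached through an already-listed neighbor, so every prefix is connected. This is a sketch assuming a connected scheme graph and schemes given as attribute sets; `connected_prefix_order` is a hypothetical name.

```python
from collections import deque
from typing import Dict, List, Set

def connected_prefix_order(schemes: Dict[str, Set[str]], root: str) -> List[str]:
    """Breadth-first order of the scheme graph starting at `root`.

    Every relation after the first is adjacent to an earlier one, so each
    prefix of the order induces a connected subgraph (a subtree, for trees).
    """
    order: List[str] = []
    seen, queue = {root}, deque([root])
    while queue:
        r = queue.popleft()
        order.append(r)
        for s in sorted(schemes):            # sorted for a deterministic order
            if s not in seen and schemes[r] & schemes[s]:
                seen.add(s)
                queue.append(s)
    return order

# A tree scheme with edges R1-R2, R1-R3, R2-R4, R3-R5.
schemes = {"R1": {"A", "B", "C"}, "R2": {"A", "D"}, "R3": {"B", "E"},
           "R4": {"D", "F"}, "R5": {"E", "G"}}
order = connected_prefix_order(schemes, "R4")   # ['R4', 'R2', 'R1', 'R3', 'R5']
```

Any root works; starting from R1 instead gives R1, R2, R3, R4, R5. If the scheme graph were disconnected, relations outside the root's component would be missing from the result.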
  • 37. Achieving Polynomial Delay: Algorithm NLOJ (Nested-Loop OuterJoin) 1. computes a connected-prefix order R1, …, Rn of R and 2. applies outerjoins in a left-deep order. Problem: evaluated this way, the delay is exponential, because the intermediate result that is outerjoined with Rn may already be of exponential size. Solution: use iterators.
  • 38. Iterators: An iterator operates on top of an enumeration algorithm and implements next() by controlling its execution. To obtain polynomial delay, we use iterators.
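  • 39. Using Iterators for Outerjoins: The left-deep sequence of outerjoins over R1, …, Rn is evaluated by a chain of iterators, Iterator 1 through Iterator n, each of which calls next() on the iterator below it.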
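The chain of iterators can be sketched with Python generators. This is an illustration, not the paper's implementation: tuples are assumed to be dicts, `None` stands for null, base relations contain no nulls, a null never matches a value, and `outerjoin_iter` and `nloj` are hypothetical names. Each stage pulls one tuple at a time from the stage below, and every pulled tuple produces at least one output, so in this sketch the work between two consecutive outputs is bounded by roughly the total size of the base relations rather than by the size of any intermediate result.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple

Row = Dict[str, Optional[object]]   # attribute -> value (None = null)

def outerjoin_iter(left: Iterator[Row], lattrs: Set[str],
                   right: List[Row], rattrs: Set[str]) -> Iterator[Row]:
    """Streaming full outerjoin: pulls tuples from `left` one at a time."""
    common, attrs = lattrs & rattrs, lattrs | rattrs
    matched_right: Set[int] = set()          # the only state kept: O(|right|)
    for t in left:
        matched = False
        for j, u in enumerate(right):
            if all(t[a] is not None and t[a] == u[a] for a in common):
                matched = True
                matched_right.add(j)
                yield {**t, **u}
        if not matched:                      # dangling left tuple, padded
            yield {a: t.get(a) for a in attrs}
    # Only once `left` is exhausted do we know which right tuples dangle.
    for j, u in enumerate(right):
        if j not in matched_right:
            yield {a: u.get(a) for a in attrs}

def nloj(relations: List[Tuple[List[Row], Set[str]]]) -> Iterator[Row]:
    """Chain one iterator per relation; relations are in connected-prefix order."""
    first, attrs = relations[0]
    stream: Iterator[Row] = iter(first)
    for rel, rattrs in relations[1:]:
        stream = outerjoin_iter(stream, attrs, rel, rattrs)
        attrs = attrs | rattrs               # new set; earlier stages keep theirs
    return stream

R1 = ([{"A": 1, "B": 1}, {"A": 2, "B": 2}], {"A", "B"})
R2 = ([{"B": 1, "C": 1}, {"B": 3, "C": 3}], {"B", "C"})
R3 = ([{"C": 1, "D": 1}], {"C", "D"})
first_tuple = next(nloj([R1, R2, R3]))       # produced without materializing R1 ⟗ R2
```

Nothing is computed until next() is called, so the first tuple, {"A": 1, "B": 1, "C": 1, "D": 1}, is available after a single pass through each stage.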
  • 40. Outerjoins are not Always Applicable: It is not always possible to formulate a full disjunction as a left-deep sequence of outerjoins. Rajaraman and Ullman [PODS 96]: some full disjunctions cannot be formulated as any expression of outerjoins, even with arbitrary placement of parentheses.
  • 42. About the Algorithm: Unlike NLOJ, the next algorithm, PDelayFD, is applicable to all schemes, not just trees. PDelayFD has polynomial delay, but its delay is larger than that of NLOJ. Nevertheless, PDelayFD by itself is a significant improvement over the state of the art.
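  • 43. Shifting a Maximal JCC Tuple Set: Let T be a maximal JCC (join-consistent and connected) set of tuples, and let t be a tuple. The t-shift of T is obtained as follows: 1. Add t to T. 2. Extract the maximal JCC subset containing t. 3. Extend it to a maximal JCC set.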
  • 44. Algorithm PDelayFD: Q holds maximal JCC sets still to be printed; C holds those already printed. 1. Generate a maximal JCC set T0. 2. Insert T0 into Q. Repeat until Q is empty: 1. Move some T from Q to C. 2. Print the join of T, padded with nulls. 3. For every tuple t in the database, insert the t-shift of T into Q, after validating that it is not already in Q or C. Theorem: PDelayFD(R) computes FD(R) with polynomial delay.
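The t-shift and the main loop of PDelayFD can be sketched together in Python. This is our own reading of the two slides above, under stated assumptions: a tuple is a hypothetical `Tup` record holding its relation name and sorted (attribute, value) pairs, base tuples contain no nulls, two tuples are adjacent when they share an attribute, and the extension step greedily scans the database in a fixed order. JCC sets are stored as frozensets so that membership in Q and C is a hash lookup. The names `mk`, `extend`, `t_shift`, and `pdelay_fd` are illustrative.

```python
from collections import deque
from typing import Dict, FrozenSet, Iterator, List, NamedTuple, Optional, Tuple

class Tup(NamedTuple):
    rel: str                               # relation the tuple comes from
    vals: Tuple[Tuple[str, object], ...]   # sorted (attribute, value) pairs

def mk(rel: str, **vals: object) -> Tup:
    return Tup(rel, tuple(sorted(vals.items())))

def consistent(t: Tup, u: Tup) -> bool:
    """Join consistent: t and u agree on all their common attributes."""
    dt, du = dict(t.vals), dict(u.vals)
    return all(dt[a] == du[a] for a in dt.keys() & du.keys())

def adjacent(t: Tup, u: Tup) -> bool:
    """Adjacent in the scheme graph: the tuples share at least one attribute."""
    return bool({a for a, _ in t.vals} & {a for a, _ in u.vals})

def extend(T: FrozenSet[Tup], db: List[Tup]) -> FrozenSet[Tup]:
    """Greedily grow T, scanning db in a fixed order, into a maximal JCC set."""
    S, grown = set(T), True
    while grown:
        grown = False
        for u in db:
            if (u not in S and all(consistent(u, x) for x in S)
                    and any(adjacent(u, x) for x in S)):
                S.add(u)
                grown = True
    return frozenset(S)

def t_shift(T: FrozenSet[Tup], t: Tup, db: List[Tup]) -> FrozenSet[Tup]:
    # 1. Add t, dropping members of T that are inconsistent with t.
    keep = {x for x in T if consistent(x, t)} | {t}
    # 2. The maximal JCC subset containing t is t's connected component in keep.
    comp, stack = {t}, [t]
    while stack:
        x = stack.pop()
        for y in keep:
            if y not in comp and adjacent(x, y):
                comp.add(y)
                stack.append(y)
    # 3. Extend it to a maximal JCC set.
    return extend(frozenset(comp), db)

def pdelay_fd(db: List[Tup]) -> Iterator[Dict[str, Optional[object]]]:
    if not db:
        return
    attrs = sorted({a for t in db for a, _ in t.vals})
    Q = deque([extend(frozenset([db[0]]), db)])   # sets still to be printed
    in_q, C = set(Q), set()                       # C: sets already printed
    while Q:
        T = Q.popleft()
        in_q.discard(T)
        C.add(T)
        joined = {a: v for t in T for a, v in t.vals}
        yield {a: joined.get(a) for a in attrs}   # join of T, padded with nulls
        for t in db:
            S = t_shift(T, t, db)
            if S not in in_q and S not in C:      # validate: not in Q or C
                Q.append(S)
                in_q.add(S)
```

Unlike the NLOJ sketches, nothing here assumes a tree scheme: for the cyclic scheme R1(A, B), R2(B, C), R3(C, A) with tuples (A=1, B=1), (B=1, C=1), (C=1, A=2), the loop prints the two tuples (1, 1, 1) and (2, 1, 1) over A, B, C.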
  • 46. NLOJ vs. PDelayFD: NLOJ has shorter delays, uses less space, and is simpler to implement, but it applies only to tree schemes, whereas PDelayFD handles general schemes. Our approach: divide and conquer. [Figure: an example scheme graph over R1, …, R10.]
  • 47. Biconnected Components: A biconnected component is a maximal subset B of relations such that the scheme graph has two (or more) internally vertex-disjoint paths between every two relations of B. [Figure: example scheme graphs over R1, …, R8 and R1, …, R9.]
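Biconnected components can be found in linear time with the classic Hopcroft-Tarjan depth-first search, which keeps a stack of edges and pops a component whenever a child's subtree cannot reach above the current node. The sketch below assumes the scheme graph is given as an adjacency dict; `biconnected_components` is a hypothetical name. By the standard graph-theoretic convention it also reports a bridge (two relations joined by a single edge whose removal disconnects the graph) as a component of its own, and it reports nothing for a relation with no neighbors.

```python
from typing import Dict, Hashable, List, Set, Tuple

def biconnected_components(adj: Dict[Hashable, List[Hashable]]) -> List[Set[Hashable]]:
    """Hopcroft-Tarjan edge-stack DFS; returns the relation set of each component."""
    disc: Dict[Hashable, int] = {}      # discovery time of each node
    low: Dict[Hashable, int] = {}       # earliest discovery time reachable from subtree
    stack: List[Tuple[Hashable, Hashable]] = []
    comps: List[Set[Hashable]] = []

    def dfs(u: Hashable, parent: Hashable) -> None:
        d = len(disc)
        disc[u] = d
        low[u] = d
        for v in adj[u]:
            if v not in disc:                       # tree edge
                stack.append((u, v))
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] >= disc[u]:               # u separates v's subtree
                    comp: Set[Hashable] = set()
                    while True:
                        e = stack.pop()
                        comp.update(e)
                        if e == (u, v):
                            break
                    comps.append(comp)
            elif v != parent and disc[v] < disc[u]:  # back edge to an ancestor
                stack.append((u, v))
                low[u] = min(low[u], disc[v])

    for u in adj:
        if u not in disc:
            dfs(u, None)
    return comps

# Two triangles joined by the bridge R3-R4: R3 and R4 are articulation relations.
adj = {"R1": ["R2", "R3"], "R2": ["R1", "R3"], "R3": ["R1", "R2", "R4"],
       "R4": ["R3", "R5", "R6"], "R5": ["R4", "R6"], "R6": ["R4", "R5"]}
comps = biconnected_components(adj)
```

The result contains {R1, R2, R3}, {R3, R4} and {R4, R5, R6}; the articulation relations R3 and R4 each belong to two components. The recursion depth equals the DFS depth, which is fine for scheme graphs of a few dozen relations.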
  • 48. Left-Deep Sequence of Outerjoins: Let R be a set of relations. Theorem: There exists an (efficiently computable) order B1, …, Bk of the biconnected components of R such that FD(R) = (…((FD(B1) ⟗ FD(B2)) ⟗ …) ⟗ FD(Bk). Optimized algorithm: 1. Compute the biconnected components of R. 2. Compute the full disjunction of each component. 3. Apply outerjoins in a suitable order.
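The slide does not say which order of components is suitable. As an assumption, by analogy with the connected-prefix order used by NLOJ, the sketch below orders components by breadth-first search so that each component shares a relation (an articulation relation) with an earlier one. Treat it as an illustration of that analogy, not as the paper's construction; `component_order` is a hypothetical name.

```python
from collections import deque
from typing import List, Set

def component_order(components: List[Set[str]]) -> List[int]:
    """Indices of `components` in an order where each component after the
    first shares at least one relation with some earlier component."""
    if not components:
        return []
    order: List[int] = []
    seen, queue = {0}, deque([0])
    while queue:
        i = queue.popleft()
        order.append(i)
        for j, comp in enumerate(components):
            if j not in seen and components[i] & comp:
                seen.add(j)
                queue.append(j)
    return order

components = [{"R1", "R2", "R3"}, {"R4", "R5", "R6"}, {"R3", "R4"}]
order = component_order(components)   # [0, 2, 1]
```

The bridge component {R3, R4} must come before {R4, R5, R6} because the latter shares no relation with {R1, R2, R3}.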
  • 49. BiComNLOJ: a Naïve Attempt: 1. Divide R into biconnected components B1, …, Bk, in a suitable order. 2. Compute FD(B1), …, FD(Bk) using PDelayFD. 3. Using NLOJ, compute (…((FD(B1) ⟗ FD(B2)) ⟗ …) ⟗ FD(Bk). Problem: each FD(Bi) can be exponential in the input, so the delay is not polynomial! Solution: iterators.
  • 50. Retaining Polynomial Delay: 1st Problem: For simplicity, assume only two components, B1 and B2. After generating a tuple t of FD(B1), we need to generate all tuples of FD(B2) that can join t. If all of FD(B2) is computed to find these tuples, the delay is not polynomial! Solution: PDelayFD can be modified so that it generates only those tuples of FD(B2) that can join t. Details in the proceedings… [Figure: a scheme graph over R1, …, R8 divided into two components, B1 and B2.]
  • 51. Retaining Polynomial Delay: 2nd Problem: For simplicity, assume only two components, B1 and B2. The last step is to generate all tuples of FD(B2) that cannot be joined with tuples of FD(B1). However, this task is by itself NP-hard! Solution: when generating all tuples of FD(B2) that can be joined with some tuple of FD(B1), we collect enough information for generating the remaining tuples of FD(B2). Details in the proceedings… [Figure: a scheme graph over R1, …, R8 divided into two components, B1 and B2.]
  • 53. Experimental Setting: Algorithms: PDelayFD; BiComNLOJ (main); IncrementalFD (CS05, state of the art). Implementation: PostgreSQL (open source). Hardware: Pentium 4, 1.6 GHz, 512 MB RAM. Data: synthetic (randomly generated). Fixed schemes S1, S2, and S3, each a scheme graph over R1, …, R10 (shown in the figure).
  • 54. State of the Art vs. Main Algorithm: [Figure: average delay (msec) vs. number of tuples in each relation, for Schemes 1, 2, and 3; IncrementalFD (state of the art, CS05) vs. BiComNLOJ (our main algorithm).] BiComNLOJ is a substantial improvement over the state of the art.
  • 55. Division into Biconnected Components: [Figure: average delay (msec) vs. number of tuples in each relation, for Schemes 1, 2, and 3; PDelayFD (no division into biconnected components) vs. BiComNLOJ (our main algorithm).] Division reduces delays; the amount of reduction depends on the scheme.
  • 56. Behavior of Delay: [Figure: delay (msec) vs. tuple number; IncrementalFD (state of the art, CS05) vs. BiComNLOJ (our main algorithm).] We measure the delay before each generated tuple. While IncrementalFD slows down as more tuples are generated, the delay of BiComNLOJ remains almost constant.
  • 58. Summary: Full disjunction: an associative extension of the outerjoin operator to an arbitrary number of relations. Three algorithms for computing FD: NLOJ (Nested-Loop OuterJoin), for tree-structured schemes; PDelayFD (Polynomial-Delay Full Disjunction), for general schemes; BiComNLOJ, which combines the first two and deploys division into biconnected components, for general schemes.
  • 59. Contributions: A substantial improvement of evaluation time over the state of the art, proved theoretically and experimentally: full disjunctions can be computed with polynomial delay and in time linear in the output size. Optimization techniques for computing FDs. Implementation within PostgreSQL (ongoing…). Incorporating our algorithms into an SQL optimizer; e.g., some operators can be pushed through the FD (not discussed here; appears in the proceedings…).