Distributed Query Processing
Agenda
• Recap of query optimization
• Transformation rules for P&D systems
• Memoization
• Query evaluation strategies
• Eddies
Introduction
• Alternative ways of evaluating a given query
– Equivalent expressions
– Different algorithms for each operation (Chapter 13)
• Cost difference between a good and a bad way of evaluating a query can
be enormous
– Example: performing r × s followed by a selection on r.A = s.B is much
slower than performing a join on the same condition
• Need to estimate the cost of operations
– Depends critically on statistical information about relations which the
database must maintain
– Need to estimate statistics for intermediate results to compute cost of
complex expressions
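As a minimal illustration (the formulas follow the standard textbook estimates; all numbers are invented for the example), size estimates for intermediate results can be derived from catalog statistics such as tuple counts and distinct-value counts:

```python
# Hypothetical illustration of classic size estimates used by a cost-based
# optimizer; the statistics below are invented for the example.

def estimate_selection_eq(n_r: int, distinct_a: int) -> float:
    """Estimated size of a selection A = v on r: n_r / V(A, r)."""
    return n_r / distinct_a

def estimate_join(n_r: int, n_s: int, v_a_r: int, v_a_s: int) -> float:
    """Estimated size of r joined with s on attribute A:
    n_r * n_s / max(V(A, r), V(A, s))."""
    return n_r * n_s / max(v_a_r, v_a_s)

# account: 10,000 tuples with 400 distinct branch names; branch: 400 tuples.
print(estimate_selection_eq(10_000, 400))    # ~25 tuples for one branch
print(estimate_join(10_000, 400, 400, 400))  # ~10,000 tuples for account ⋈ branch
```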
Introduction (Cont.)
Relations generated by two equivalent expressions have the same
set of attributes and contain the same set of tuples, although their
attributes may be ordered differently.
Introduction (Cont.)
• Generation of query-evaluation plans for an
expression involves several steps:
1. Generating logically equivalent expressions
• Use equivalence rules to transform an expression into an
equivalent one.
2. Annotating resultant expressions to get alternative
query plans
3. Choosing the cheapest plan based on estimated cost
• The overall process is called cost-based optimization.
Equivalence Rules
1. Conjunctive selection operations can be
deconstructed into a sequence of individual
selections:
σθ1∧θ2(E) = σθ1(σθ2(E))
2. Selection operations are commutative:
σθ1(σθ2(E)) = σθ2(σθ1(E))
3. Only the last in a sequence of projection
operations is needed; the others can be omitted:
ΠL1(ΠL2(…(ΠLn(E))…)) = ΠL1(E)
4. Selections can be combined with Cartesian
products and theta joins:
a. σθ(E1 × E2) = E1 ⋈θ E2
b. σθ1(E1 ⋈θ2 E2) = E1 ⋈θ1∧θ2 E2
Equivalence Rules (Cont.)
5. Theta-join operations (and natural joins) are
commutative.
E1 ⋈θ E2 = E2 ⋈θ E1
6. (a) Natural join operations are associative:
(E1 ⋈ E2) ⋈ E3 = E1 ⋈ (E2 ⋈ E3)
(b) Theta joins are associative in the following manner:
(E1 ⋈θ1 E2) ⋈θ2∧θ3 E3 = E1 ⋈θ1∧θ3 (E2 ⋈θ2 E3)
where θ2 involves attributes from only E2 and E3.
Pictorial Depiction of Equivalence
Rules
Equivalence Rules (Cont.)
7. The selection operation distributes over the theta join
operation under the following two conditions:
(a) When all the attributes in θ0 involve only the
attributes of one of the expressions (E1) being joined:
σθ0(E1 ⋈θ E2) = (σθ0(E1)) ⋈θ E2
(b) When θ1 involves only the attributes of E1 and θ2
involves only the attributes of E2:
σθ1∧θ2(E1 ⋈θ E2) = (σθ1(E1)) ⋈θ (σθ2(E2))
Equivalence Rules (Cont.)
8. The projection operation distributes over the
theta-join operation as follows.
Consider a join E1 ⋈θ E2, and let L1 and L2 be sets of attributes
from E1 and E2, respectively.
(a) If θ involves only attributes from L1 ∪ L2:
ΠL1∪L2(E1 ⋈θ E2) = (ΠL1(E1)) ⋈θ (ΠL2(E2))
(b) In general, let L3 be attributes of E1 that are involved in the join
condition θ but are not in L1 ∪ L2, and let L4 be attributes of E2 that are
involved in θ but are not in L1 ∪ L2. Then:
ΠL1∪L2(E1 ⋈θ E2) = ΠL1∪L2((ΠL1∪L3(E1)) ⋈θ (ΠL2∪L4(E2)))
Equivalence Rules (Cont.)
9. The set operations union and intersection are commutative:
E1 ∪ E2 = E2 ∪ E1
E1 ∩ E2 = E2 ∩ E1
(Set difference is not commutative.)
10. Set union and intersection are associative:
(E1 ∪ E2) ∪ E3 = E1 ∪ (E2 ∪ E3)
(E1 ∩ E2) ∩ E3 = E1 ∩ (E2 ∩ E3)
11. The selection operation distributes over ∪, ∩ and –:
σθ(E1 – E2) = σθ(E1) – σθ(E2)
and similarly for ∪ and ∩ in place of –
Also: σθ(E1 – E2) = σθ(E1) – E2
and similarly for ∩ in place of –, but not for ∪
12. The projection operation distributes over union:
ΠL(E1 ∪ E2) = (ΠL(E1)) ∪ (ΠL(E2))
Multiple Transformations (Cont.)
Optimizer strategies
• Heuristic
– Apply the transformation rules in a specific order
such that the cost converges to a minimum
• Cost based
– Simulated annealing
– Randomized generation of candidate QEP
– Problem: how to guarantee randomness
Memoization Techniques
• How to generate alternative Query Evaluation Plans?
– Early generation systems centred around a tree representation of the
plan
– Hardwired tree rewriting rules are deployed to enumerate part of the
space of possible QEP
– For each alternative the total cost is determined
– The best (alternatives) are retained for execution
– Problems: very large space to explore, duplicate plans, local maxima,
expensive query cost evaluation.
– SQL Server optimizer contains about 300 rules to be deployed.
Memoization Techniques
• How to generate alternative Query Evaluation Plans?
– Keep a memo of partial QEPs and their cost.
– Use the heuristic rules to generate alternatives to
build more complex QEPs
[Figure: memo lattice of partial plans for r1 ⋈ r2 ⋈ r3 ⋈ r4 — Level 1 plans
(r1 ⋈ r2, r2 ⋈ r3, r3 ⋈ r4, r1 ⋈ r4) are combined into Level 2 plans, and so
on up to Level n plans]
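A minimal sketch of the memo idea (a toy dynamic-programming enumerator over join orders; the cardinalities and the cost model are invented, and the result-size estimate is deliberately crude):

```python
# A toy memo: dynamic programming over join orders for r1 ⋈ r2 ⋈ r3 ⋈ r4.
# Real optimizers keep far richer per-group information than this.
from functools import lru_cache
from itertools import combinations

SIZES = {"r1": 1000, "r2": 200, "r3": 50, "r4": 5000}   # invented cardinalities

@lru_cache(maxsize=None)                      # the memo of best partial QEPs
def best_plan(rels: frozenset):
    """Return (cost, estimated size, plan string) for joining `rels`."""
    if len(rels) == 1:
        (r,) = rels
        return 0, SIZES[r], r
    best = None
    for k in range(1, len(rels)):
        for left in map(frozenset, combinations(sorted(rels), k)):
            right = rels - left
            lcost, lsize, lplan = best_plan(left)
            rcost, rsize, rplan = best_plan(right)
            cost = lcost + rcost + lsize * rsize      # toy cost of this join
            if best is None or cost < best[0]:
                best = (cost, lsize * rsize, f"({lplan} ⋈ {rplan})")
    return best

print(best_plan(frozenset(SIZES)))
```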
Distributed Query Processing
• For centralized systems, the primary criterion
for measuring the cost of a particular strategy
is the number of disk accesses.
• In a distributed system, other issues must be
taken into account:
– The cost of data transmission over the network.
– The potential gain in performance from having
several sites process parts of the query in parallel.
Transformation rules for
distributed systems
• Primary horizontally fragmented table:
– Rule 9: The union is commutative
E1 ∪ E2 = E2 ∪ E1
– Rule 10: Set union is associative.
(E1 ∪ E2) ∪ E3 = E1 ∪ (E2 ∪ E3)
– Rule 12: The projection operation distributes over union
ΠL(E1 ∪ E2) = (ΠL(E1)) ∪ (ΠL(E2))
• Derived horizontally fragmented table:
– The join through foreign-key dependency is already reflected in the
fragmentation criteria
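A minimal sketch (fragments and their contents are invented) of why rule 12 matters for primary horizontal fragmentation: the projection can be pushed to each site and evaluated locally before the fragments are unioned:

```python
# Invented fragments of an account table, horizontally fragmented by branch.
fragment_s1 = [{"acc": 1, "branch": "A", "balance": 500},
               {"acc": 2, "branch": "A", "balance": 900}]
fragment_s2 = [{"acc": 3, "branch": "B", "balance": 700}]

def project(rows, attrs):
    return [{a: row[a] for a in attrs} for row in rows]

# Rule 12: Π_L(E1 ∪ E2) = Π_L(E1) ∪ Π_L(E2), so each site can project its
# own fragment locally and ship far less data before the union is taken.
union_then_project = project(fragment_s1 + fragment_s2, ["acc"])
project_then_union = project(fragment_s1, ["acc"]) + project(fragment_s2, ["acc"])
assert union_then_project == project_then_union
print(project_then_union)   # [{'acc': 1}, {'acc': 2}, {'acc': 3}]
```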
Transformation rules for
distributed systems
Vertically fragmented tables:
– Rules: hint, look at the projection rules
Optimization in Par & Distr
• Cost model is changed!!!
– Network transport is a dominant cost factor
• The facilities for query processing are not
homogeneously distributed
– Light-resource systems form a bottleneck
– Need for dynamic load scheduling
Simple Distributed Join Processing
• Consider the following relational algebra expression
in which the three relations are neither replicated
nor fragmented
account ⋈ depositor ⋈ branch
• account is stored at site S1
• depositor at S2
• branch at S3
• For a query issued at site SI, the system needs to
produce the result at site SI
Possible Query Processing
Strategies
• Ship copies of all three relations to site SI and choose a strategy
for processing the entire query locally at site SI.
• Ship a copy of the account relation to site S2 and compute temp1
= account ⋈ depositor at S2. Ship temp1 from S2 to S3, and compute
temp2 = temp1 ⋈ branch at S3. Ship the result temp2 to SI.
• Devise similar strategies, exchanging the roles of S1, S2, S3
• Must consider following factors:
– amount of data being shipped
– cost of transmitting a data block between sites
– relative processing speed at each site
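A hedged back-of-the-envelope comparison of two such strategies (block counts, intermediate-result sizes, and the cost model itself are all invented; the point is only that the ranking depends on the shipped volumes):

```python
# Invented figures: blocks per relation and a flat cost per shipped block.
BLOCKS = {"account": 10_000, "depositor": 3_000, "branch": 50}
COST_PER_BLOCK = 1.0

def ship_all_to_si() -> float:
    """Strategy 1: ship all three relations to the query site SI."""
    return sum(BLOCKS.values()) * COST_PER_BLOCK

def ship_via_s2_s3(temp1_blocks: int = 2_000, temp2_blocks: int = 1_500) -> float:
    """Strategy 2: ship account to S2, temp1 to S3, temp2 to SI.
    The intermediate sizes are guesses a real optimizer would estimate."""
    return (BLOCKS["account"] + temp1_blocks + temp2_blocks) * COST_PER_BLOCK

print(ship_all_to_si())    # 13050.0
print(ship_via_s2_s3())    # 13500.0  -> here strategy 1 happens to win
```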
Semijoin Strategy
• Let r1 be a relation with schema R1 stored at site S1
Let r2 be a relation with schema R2 stored at site S2
• Evaluate the expression r1 ⋈ r2 and obtain the result at S1.
1. Compute temp1 ← ΠR1∩R2(r1) at S1.
2. Ship temp1 from S1 to S2.
3. Compute temp2 ← r2 ⋈ temp1 at S2.
4. Ship temp2 from S2 to S1.
5. Compute r1 ⋈ temp2 at S1. This is the same as r1 ⋈ r2.
Formal Definition
• The semijoin of r1 with r2 is denoted by:
r1 ⋉ r2
• It is defined by:
ΠR1(r1 ⋈ r2)
• Thus, r1 ⋉ r2 selects those tuples of r1 that contribute to r1 ⋈ r2.
• In step 3 above, temp2 = r2 ⋉ r1.
• For joins of several relations, the above strategy can be
extended to a series of semijoin steps.
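A minimal sketch of the five-step semijoin strategy above (tuples, attribute names, and sizes are invented; lists of dicts stand in for the relations at the two sites):

```python
# r1 at S1 with schema R1 = (a, b); r2 at S2 with schema R2 = (b, c).
r1 = [{"a": 1, "b": 10}, {"a": 2, "b": 20}, {"a": 3, "b": 30}]
r2 = [{"b": 10, "c": "x"}, {"b": 30, "c": "y"}, {"b": 99, "c": "z"}]

# Step 1 at S1: temp1 = Π_{R1 ∩ R2}(r1), i.e. project r1 on the join column b.
temp1 = [{"b": t["b"]} for t in r1]
# Step 2: ship temp1 to S2 (only the join column crosses the network).
# Step 3 at S2: temp2 = r2 ⋈ temp1 -- the r2 tuples that can possibly match.
temp2 = [t for t in r2 if any(t["b"] == p["b"] for p in temp1)]
# Step 4: ship temp2 back to S1 (the dangling r2 tuple with b = 99 never travels).
# Step 5 at S1: r1 ⋈ temp2, which equals r1 ⋈ r2.
result = [{**x, **y} for x in r1 for y in temp2 if x["b"] == y["b"]]
print(result)   # [{'a': 1, 'b': 10, 'c': 'x'}, {'a': 3, 'b': 30, 'c': 'y'}]
```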
Join Strategies that Exploit Parallelism
• Consider r1 ⋈ r2 ⋈ r3 ⋈ r4 where relation ri is stored at site Si. The
result must be presented at site S1.
• r1 is shipped to S2 and r1 ⋈ r2 is computed at S2; simultaneously r3 is
shipped to S4 and r3 ⋈ r4 is computed at S4
• S2 ships tuples of (r1 ⋈ r2) to S1 as they are produced;
S4 ships tuples of (r3 ⋈ r4) to S1
• Once tuples of (r1 ⋈ r2) and (r3 ⋈ r4) arrive at S1, (r1 ⋈ r2) ⋈ (r3 ⋈ r4) is
computed in parallel with the computation of (r1 ⋈ r2) at S2 and the
computation of (r3 ⋈ r4) at S4.
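A minimal sketch of that two-way parallelism (threads stand in for sites S2 and S4; the relations, the join key, and the hash-join helper are invented):

```python
from concurrent.futures import ThreadPoolExecutor

def hash_join(left, right, key):
    """Tiny in-memory hash join used as a stand-in for the per-site join."""
    index = {}
    for t in right:
        index.setdefault(t[key], []).append(t)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]

# Invented relations standing in for r1..r4 stored at sites S1..S4.
r1 = [{"k": 1, "x": "a"}, {"k": 2, "x": "b"}]
r2 = [{"k": 1, "y": "c"}]
r3 = [{"k": 1, "z": "d"}]
r4 = [{"k": 1, "w": "e"}]

with ThreadPoolExecutor() as pool:
    f12 = pool.submit(hash_join, r1, r2, "k")    # "computed at S2"
    f34 = pool.submit(hash_join, r3, r4, "k")    # "computed at S4", simultaneously
    final = hash_join(f12.result(), f34.result(), "k")   # final join "at S1"
print(final)   # [{'k': 1, 'x': 'a', 'y': 'c', 'z': 'd', 'w': 'e'}]
```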
Query plan generation
• Apers-Aho-Hopcroft
– Hill-climber: repeatedly split the multi-join query
into fragments and optimize its subqueries
independently
• Apply centralized algorithms and rely on cost-
model to avoid expensive query execution
plans.
Query evaluators
Query evaluation strategy
• Pipe-line query evaluation strategy
– Called Volcano query processing model
– Standard in commercial systems and MySQL
• Basic algorithm:
– Demand-driven evaluation of query tree.
– Operators exchange data in units such as records
– Each operator supports the following interface: open, next, close
• open() at the top of the tree results in a cascade of opens down the tree.
• An operator receiving a next() call may recursively make next() calls
from within to produce its next answer.
• close() at the top of the tree results in a cascade of closes down the tree
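A minimal sketch of the demand-driven open/next/close interface (the operators and rows are invented; None signals end-of-stream):

```python
class Scan:
    """Leaf operator: yields rows of an in-memory table one at a time."""
    def __init__(self, rows):
        self.rows = rows
    def open(self):
        self.pos = 0
    def next(self):
        if self.pos >= len(self.rows):
            return None
        row, self.pos = self.rows[self.pos], self.pos + 1
        return row
    def close(self):
        pass

class Select:
    """Filter operator: pulls from its child until a row qualifies."""
    def __init__(self, predicate, child):
        self.predicate, self.child = predicate, child
    def open(self):
        self.child.open()                 # open() cascades down the tree
    def next(self):
        while (row := self.child.next()) is not None:
            if self.predicate(row):
                return row
        return None
    def close(self):
        self.child.close()                # close() cascades down the tree

plan = Select(lambda r: r["balance"] > 600,
              Scan([{"balance": 500}, {"balance": 700}, {"balance": 900}]))
plan.open()
while (row := plan.next()) is not None:   # demand-driven: each next() pulls one row
    print(row)
plan.close()
```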
Query evaluation strategy
• Pipe-line query evaluation strategy
– Evaluation:
• Oriented towards OLTP applications
– Granule size of data interchange
• Items produced one at a time
• No temporary files
– Choice of intermediate buffer size allocations
• Query executed as one process
• Generic interface: it suffices to add the iterator primitives for
new containers.
• CPU intensive
• Amenable to parallelization
Query evaluation strategy
• Materialized evaluation strategy
– Used in MonetDB
– Basic algorithm:
• for each relational operator produce the complete intermediate
result using materialized operands
– Evaluation:
• Oriented towards decision support queries
• Limited internal administration and dependencies
• Basis for multi-query optimization strategy
• Memory intensive
• Amenable to distributed/parallel processing
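For contrast, a minimal sketch of materialized evaluation (operators and data are invented): each operator consumes fully materialized operands and produces its complete intermediate result before the next operator runs:

```python
def scan(rows):
    return list(rows)                                      # materialize the input

def select(predicate, rows):
    return [r for r in rows if predicate(r)]               # full intermediate result

def project(attrs, rows):
    return [{a: r[a] for a in attrs} for r in rows]        # another full intermediate

accounts = scan([{"acc": 1, "balance": 500}, {"acc": 2, "balance": 900}])
rich     = select(lambda r: r["balance"] > 600, accounts)  # built completely...
answer   = project(["acc"], rich)                          # ...then consumed as a whole
print(answer)   # [{'acc': 2}]
```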
Eddies: Continuously Adaptive
Query processing
R. Avnur, J. M. Hellerstein (UC Berkeley)
ACM SIGMOD 2000
Problem Statement
• Context: large federated and shared-nothing databases
• Problem: assumptions made at query optimization rarely
hold during execution
• Hypothesis: do away with traditional optimizers, solve it
thru adaptation
• Focus: scheduling in a tuple-based pipeline query
execution model
Problem Statement Refinement
• Large scale systems are unpredictable, because
– Hardware and workload complexity,
• bursty servers & networks, heterogeneity, hardware
characteristics
– Data complexity,
• Federated databases often come without proper statistical
summaries
– User Interface Complexity
• Online aggregation may involve user ‘control’
The Idea
• Relational algebra operators consume a stream from
multiple sources to produce a new stream
• A priori you don't know how selective operators are and how fast tuples are
consumed/produced
• You have to adapt continuously and learn this information
on the fly
• Adapt the order of processing based on these lessons
The Idea
[Figure: a conventional pipelined query plan — a tree of three JOIN operators
exchanging tuples through next() calls]
The Idea
• Standard method: derive a spanning tree over the query graph
• Pre-optimize a query plan to determine operator pairs and their algorithm,
e.g. to exploit access paths
• Re-optimizing a query pipeline on the fly requires careful state
management, coupled with
– Synchronization barriers
• Operators have widely differing arrival rates for their operands
– This limits concurrency, e.g. merge-join algorithm
– Moments of symmetry
• Algorithm provides option to exchange the role of the operands
without too much complications
– E.g., switching the roles of R and S in a nested-loop join
Nested-loop join
[Figure: nested-loop join over outer relation R and inner relation S]
Join and sorting
• Index joins are asymmetric; you cannot easily exchange the roles of their operands
– Combine the index join and its operands as a unit in the process
• Sorting requires look-ahead
– Merge joins are combined into a unit
• Ripple joins
– Break the space into smaller pieces and solve the join operation for
each piece individually
– The piece crossings are moments of symmetry
The Idea
[Figure: an eddy — a tuple buffer that routes tuples among three JOIN
operators via next() calls]
Rivers and Eddies
Eddies are tuple routers that distribute arriving tuples to interested operators
– What are efficient scheduling policies?
• Fixed strategy? Random ? Learning?
Static Eddies
• Delivery of tuples to operators can be hardwired in the Eddie to reflect a
traditional query execution plan
Naïve Eddie
• Operators are delivered tuples based on a priority queue
• Intermediate results get highest priority to avoid buffer congestion
Observations for selections
• Extended priority queue for the operators
– Receiving a tuple leads to a credit increment
– Returning a tuple leads to a credit decrement
– Priority is determined by “weighted lottery”
• Naïve Eddies exhibit back pressure in the tuple flow; production is limited by
the rate of consumption at the output
• Lottery Eddies approach the cost of an optimal ordering, without a need to
determine the order a priori
• Lottery Eddies outperform heuristics
– Hash-use first, or Index-use first, Naive
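A minimal sketch in the spirit of the lottery scheme above (the operators, their selectivities, and the ticket bookkeeping are invented; the "done" set is a stand-in for tracking which operators a tuple still has to visit):

```python
import random

class Eddy:
    """Routes each tuple to one pending operator, chosen by weighted lottery."""
    def __init__(self, operators):
        self.ops = list(operators)
        self.tickets = {op: 1 for op in self.ops}    # start with equal credit

    def route(self, tup):
        pending = [op for op in self.ops if op not in tup["done"]]
        weights = [self.tickets[op] for op in pending]
        op = random.choices(pending, weights=weights)[0]   # weighted lottery
        self.tickets[op] += 1                        # credit on receiving a tuple
        out = op(tup)
        if out is not None:                          # operator returned the tuple
            self.tickets[op] -= 1                    # debit on returning it
            out["done"].add(op)
        return out                                   # None means the tuple was dropped

def sel_cheap(t):    return t if t["price"] < 100 else None
def sel_in_stock(t): return t if t["stock"] > 0 else None

eddy = Eddy([sel_cheap, sel_in_stock])
t = {"price": 50, "stock": 3, "done": set()}
while t is not None and len(t["done"]) < 2:          # route until all operators are done
    t = eddy.route(t)
print(t)                                              # the tuple survived both selections
```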
Observations
• The dynamics during a run can be controlled by a learning scheme
– Split the processing in steps (‘windows’) to re-adjust the weight during
tuple delivery
• Initial delays cannot be handled efficiently
• Research challenges:
– Better learning algorithms to adjust flow
– Aggressive adjustments
– Remove pre-optimization
– Balance ‘hostile’ parallel environment
– Deploy eddies to control degree of partitioning (and replication)
  • 41. Observations • The dynamics during a run can be controlled by a learning scheme – Split the processing in steps (‘windows’) to re-adjust the weight during tuple delivery • Initial delays can not be handled efficiently • Research challenges: – Better learning algorithms to adjust flow – Aggressive adjustments – Remove pre-optimization – Balance ‘hostile’ parallel environment – Deploy eddies to control degree of partitioning (and replication)