Distributed Database Systems
Distributed Query Processing

Katja Hose, Ralf Schenkel
Max-Planck-Institut für Informatik, Cluster of Excellence MMCI
November 10, 2011
November 17, 2011

Contents I
1 Motivation
2 Detour on centralized query processing
  Translating SQL into relational algebra
  Phases of centralized query processing
  Query parsing
  Query transformation
  Query optimization
3 Basics of distributed query processing
  Phases of distributed query processing
  Introduction
  Meta data management
  Data localization
4 Global query optimization
  Main questions

Contents II
  Global query optimizer
  Distributed cost model
  Join order optimization
  Total time models
  Response time models
5 Summary

Motivation
The task of query processing is to answer user queries
Example
  How many students are at Saarland University?
  Answer: 18.000
Additional constraints
  Low response times
  High query throughput
  Efficient hardware usage
  ...


Motivation
Differences to centralized query processing
  Considering the physical data distribution during query optimization
  Considering communication costs
Assumptions
  Data is distributed among multiple nodes
  Existence of a global conceptual schema, which is used by all nodes
  Queries are formulated on the global schema

Detour on centralized query processing

Translating SQL into relational algebra
SQL query structure:

select distinct a1, ..., an
from R1, ..., Rk
where F

Algorithm:
1 Translating the from clause
  Let R1, ..., Rk be the relations in the from clause of the query
  Construct expression:

  R = R1                                if k = 1
  R = ((...(R1 × R2) × ...) × Rk)       otherwise


Translating SQL into relational algebra
Algorithm:
2 Translating the where clause
  Let F be the predicate in the where clause of the query (if a where clause exists)
  Construct expression:

  W = R           if there is no where clause
  W = σ_F(R)      otherwise

3 Translating the select clause
  Let a1, ..., an (or "*") be the projection in the select clause of the query
  Construct expression:

  S = W                     if the projection is "*"
  S = π_{a1,...,an}(W)      otherwise

Output: S
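The three translation steps (from, where, select) can be sketched in a few lines of Python. This is only an illustration: the function name `sql_to_algebra`, its parameters, and the bracket notation `σ[...]` and `π[...]` in the output string are assumptions made for this sketch and are not part of the slides.

```python
def sql_to_algebra(select_list, relations, predicate=None):
    """Build a relational algebra string for a select-from-where block."""
    if not relations:
        raise ValueError("the from clause needs at least one relation")

    # Step 1: from clause -> left-deep chain of Cartesian products.
    expr = relations[0]
    for i, rel in enumerate(relations[1:]):
        left = expr if i == 0 else f"({expr})"
        expr = f"{left} × {rel}"

    # Step 2: where clause -> selection (skipped if there is no predicate).
    if predicate is not None:
        expr = f"σ[{predicate}]({expr})"

    # Step 3: select clause -> projection (skipped for "*").
    if select_list != ["*"]:
        expr = f"π[{', '.join(select_list)}]({expr})"
    return expr
```

Calling `sql_to_algebra(["a1"], ["R1", "R2", "R3"], "F")` yields `π[a1](σ[F]((R1 × R2) × R3))`, which mirrors the nesting S = π(σ(R)) produced by the algorithm.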


Translating SQL into relational algebra
Example query
  select distinct e.EName, s.Salary
  from Employees e, Salary s
  where e.Title = s.Title and s.Salary ≥ 60.000

Step 1 (from clause, k = 2):
  R = Employees × Salary

Step 2 (where clause):
  W = σ_{e.Title = s.Title ∧ s.Salary ≥ 60.000}(R)

Phases of centralized query processing
[Figure: workflow for centralized query processing]

Query parsing
Transform a declarative query into an internal representation
  Query formulated using a declarative query language, e.g., SQL
  The parser translates the query into an internal representation
    Called naive query plan
    Plan described by an operator tree of relational algebra operators

Example
Database managing information about employees and projects
  Employees(EID, EName, Title)
  Assignment(ENo, PNo, Duration)
Query: return the names of all employees working for project 'P1'
  SELECT EName
  FROM Employees e, Assignment a
  WHERE e.EID = ENo AND PNo = 'P1'


Example
Query
  SELECT EName
  FROM Employees e, Assignment a
  WHERE e.EID = ENo AND PNo = 'P1'
Translation into relational algebra
  π_{EName}(σ_{PNo='P1' ∧ Employees.EID=Assignment.ENo}(Employees × Assignment))
In contrast to the SQL statement, the algebra statement already contains the required basic evaluation operators

Operator tree
  π_{EName}
    σ_{PNo='P1' ∧ Employees.EID=Assignment.ENo}
      ×
        Employees
        Assignment


Query transformation
[Figure: workflow for centralized query processing]

Query transformation

Steps
1 Name resolution
Transforming object names into internal names
2 Semantic analysis
Checking for global relations and attributes, view expansion, global
access control
3 Normalization
Transforming predicates into a canonical format
4 Simple algebraic rewriting
Application of heuristics to eliminate bad plans



Semantic analysis
Check if the global schema defines all attributes and relations referenced in the query
If the query is formulated on a view, replace references to relations/attributes with references to global relations/attributes
Perform simple integrity checks, e.g., are the attributes used in comparison predicates of the same type?
Initial check if the query has the rights to access referenced relations/attributes

Normalization
Objective
  Simplify the subsequent optimization by transforming the query into a canonical format
Selection and join predicates
  Conjunctive normal form vs. disjunctive normal form
  Conjunctive normal form:
    (p11 ∨ p12 ∨ ··· ∨ p1n) ∧ ··· ∧ (pm1 ∨ pm2 ∨ ··· ∨ pmn)
  Disjunctive normal form:
    (p11 ∧ p12 ∧ ··· ∧ p1n) ∨ ··· ∨ (pm1 ∧ pm2 ∧ ··· ∧ pmn)
  Transformation based on equivalence rules for logical operators



Normalization
Equivalence rules
  p1 ∧ p2 ⇐⇒ p2 ∧ p1 and p1 ∨ p2 ⇐⇒ p2 ∨ p1
  p1 ∧ (p2 ∧ p3) ⇐⇒ (p1 ∧ p2) ∧ p3 and p1 ∨ (p2 ∨ p3) ⇐⇒ (p1 ∨ p2) ∨ p3
  p1 ∧ (p2 ∨ p3) ⇐⇒ (p1 ∧ p2) ∨ (p1 ∧ p3) and p1 ∨ (p2 ∧ p3) ⇐⇒ (p1 ∨ p2) ∧ (p1 ∨ p3)
  ¬(p1 ∧ p2) ⇐⇒ ¬p1 ∨ ¬p2 and ¬(p1 ∨ p2) ⇐⇒ ¬p1 ∧ ¬p2
  ¬(¬p1) ⇐⇒ p1

Normalization
Example
  SELECT EName
  FROM Employees e, Assignment a
  WHERE e.EID = a.ENo AND Duration ≥ 3 AND (PNo = 'P1' OR PNo = 'P2')
Selection condition in disjunctive normal form
  (EID = ENo ∧ Duration ≥ 3 ∧ PNo = 'P1') ∨
  (EID = ENo ∧ Duration ≥ 3 ∧ PNo = 'P2')
Selection condition in conjunctive normal form
  EID = ENo ∧ Duration ≥ 3 ∧ (PNo = 'P1' ∨ PNo = 'P2')
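That the two normal forms of the example condition are equivalent can be checked with a brute-force truth table. In the sketch below, the four boolean parameters stand for the atomic predicates EID = ENo, Duration ≥ 3, PNo = 'P1', and PNo = 'P2'; the function names are assumptions made for this illustration.

```python
from itertools import product


def cnf(join, duration, p1, p2):
    # EID = ENo ∧ Duration ≥ 3 ∧ (PNo = 'P1' ∨ PNo = 'P2')
    return join and duration and (p1 or p2)


def dnf(join, duration, p1, p2):
    # (EID = ENo ∧ Duration ≥ 3 ∧ PNo = 'P1') ∨ (EID = ENo ∧ Duration ≥ 3 ∧ PNo = 'P2')
    return (join and duration and p1) or (join and duration and p2)


def equivalent(f, g, arity):
    """Compare two predicates on every assignment of truth values."""
    return all(f(*values) == g(*values)
               for values in product([False, True], repeat=arity))
```

`equivalent(cnf, dnf, 4)` checks all 16 assignments, which is exactly what the distributive law p1 ∧ (p2 ∨ p3) ⇐⇒ (p1 ∧ p2) ∨ (p1 ∧ p3) guarantees in general.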


Query transformation
Simple algebraic rewriting
Simple optimizations that are always beneficial regardless of system state
  Elimination of redundant predicates
  Simplification of expressions
  Unnesting of subqueries and views
Tasks
  Recognize and simplify all expressions/operations/subqueries that are "obviously" unnecessary, redundant, or contradictory.
  Do not consider system state information, e.g., size of tables, existence of indexes, etc.

Query optimization
[Figure: workflow for centralized query processing]


Query optimization
Steps
1 Algebraic optimization
  Find a good relational algebra operator tree
    Heuristic query optimization
    Cost-based query optimization
    Statistical query optimization
2 Physical optimization
  Find suitable algorithms for implementing the operations

Heuristics
Use simple heuristics which usually lead to better performance
The optimal plan is not required, but the really bad plans should be avoided
Heuristics
  Break selections
    Complex selection criteria should be broken into multiple parts
  Push projection and push selection
    Cheap selections and projections should be performed as early as possible to reduce the sizes of intermediate results
  Force joins
    In most cases, using a join is much cheaper than using a Cartesian product and a selection



Algebraic optimization rules
Operator ⋈ is commutative:
  r1 ⋈ r2 ⇐⇒ r2 ⋈ r1
Operator ⋈ is associative:
  (r1 ⋈ r2) ⋈ r3 ⇐⇒ r1 ⋈ (r2 ⋈ r3)
For operator π in combination with another operator π, the "outer" parameter dominates the "inner" one:
  π_X(π_Y(r1)) ⇐⇒ π_X(r1) if X ⊆ Y
Selections σ can be combined using logical and (∧). The order of the selections is arbitrary (exploiting commutativity of ∧):
  σ_F1(σ_F2(r1)) ⇐⇒ σ_{F1 ∧ F2}(r1) ⇐⇒ σ_F2(σ_F1(r1))

Algebraic optimization rules
Operators π and σ commute if predicate F is defined based on the projection attributes:
  σ_F(π_X(r1)) ⇐⇒ π_X(σ_F(r1)) if attr(F) ⊆ X
Alternatively, a change in ordering is possible if the projection is extended by all necessary attributes:
  π_X1(σ_F(r1)) ⇐⇒ π_X1(σ_F(π_{X1,X2}(r1))) if attr(F) ⊇ X2
Operators σ and ⋈ commute if all selection attributes are contained in the same relation:
  σ_F(r1 ⋈ r2) ⇐⇒ σ_F(r1) ⋈ r2 if attr(F) ⊆ R1
A selection predicate can be split up in conjunction with a join (F = F1 ∧ F2) if the attributes referred to by F1 and F2 are contained in different relations:
  σ_F(r1 ⋈ r2) ⇐⇒ σ_F1(r1) ⋈ σ_F2(r2) if attr(F1) ⊆ R1 and attr(F2) ⊆ R2
In any case, part of a selection can be split up by separating predicates: F1 references attributes of R1 only, F2 contains the remaining predicates referencing attributes of both relations:
  σ_F(r1 ⋈ r2) ⇐⇒ σ_F2(σ_F1(r1) ⋈ r2) if attr(F1) ⊆ R1




Commutativity of σ and ∪:
  σ_F(r1 ∪ r2) ⇐⇒ σ_F(r1) ∪ σ_F(r2)
Commutativity of σ and −:
  σ_F(r1 − r2) ⇐⇒ σ_F(r1) − σ_F(r2)
or in case F only references tuples in r1:
  σ_F(r1 − r2) ⇐⇒ σ_F(r1) − r2

Commutativity of π and ⋈:
  π_X(r1 ⋈ r2) ⇐⇒ π_X(π_Y1(r1) ⋈ π_Y2(r2))
with
  Y1 = (X ∩ R1) ∪ (R1 ∩ R2)
and
  Y2 = (X ∩ R2) ∪ (R1 ∩ R2)
Pushing a projection is possible if all Yi are defined in such a way that they preserve all attributes necessary to perform the join.



Algebraic optimization rules
Further rules
  Commutativity of π and ∪:
    π_X(r1 ∪ r2) ⇐⇒ π_X(r1) ∪ π_X(r2)
  Distributive law for ⋈ and ∪, distributive law for ⋈ and −
  Commutativity of renaming β with other operators, ...
  Idempotence, e.g., A ∨ A ⇐⇒ A
  Operations involving empty relations
  Commutative and associative laws for ⋈, ∪, and ∩

Heuristic algebraic optimization – Example
Use algebraic optimization heuristics
  Force join
  Push selection and projection



Cost-based algebraic query optimization
Most non-distributed RDBMS strongly rely on cost-based optimizations
  Aim for a better optimized plan with respect to system and data characteristics
  Join order optimization
Basic approach
  Establish a cost model for various operations
  Enumerate all query plans and compute costs
  Pick the best query plan
  Usually, dynamic programming techniques are used to keep the computational effort manageable

Physical query optimization
Physical optimization
Input:
  Optimized query plan consisting of algebra operators
  Choose an algorithm to compute a particular algebra operator
    Join:
      Block-Nested-Loop join, hash join, merge join, ...
    Select:
      Full table scan, index lookup, ad-hoc index generation & lookup, ...
Tasks
  Translating a query plan into an execution plan
  Physical and algebraic optimization are often interleaved


Query optimization
Query optimization example
[Figure: query optimization example]
Output: query execution plan

Basics of distributed query processing
[Figure: workflow for distributed query processing]


Introduction
Basic considerations
Distributed query processing
  Shares the same properties as centralized query processing
  Similar problem but with different objectives and constraints
Objectives for centralized query processing
  Minimize the number of disk accesses
  Minimize computational time
Objectives for distributed query processing
  Minimize resource consumption
  Minimize response time
  Maximize throughput

Basic considerations
Costs are more difficult to predict
  Join selectivity: is it worthwhile to push down a selection?
  Data is distributed: difficult to get meaningful statistics
  Network latency is very hard to predict
  Current workload at nodes, load shedding
Additional cost factors and constraints
  Extension of relational algebra (sending/receiving data)
  Data localization (which node holds relevant data)
  Replication and caching (where to compute an operation)
  Network models
  Response-time models
  Data and structural heterogeneity (federated databases ...)


Consequences
Optimization is much more difficult than in the central case
  Statistics and costs change over time, e.g., workload at a node, network load
  More conflicting optimization goals
    Increase throughput → reduce replication and parallelization,
    improve query response time → increase parallelization
  More cost factors and constraints
Consequences
  Adaptive query plans (create an initial plan and optimize it on-the-fly)
  Do not aim for the best plan, but for a good plan

Example
Query
  Return the names of all employees working for project 'P1'
  π_{EName}(π_{EID,EName}(Employees) ⋈_{Employees.EID=Assignment.ENo} π_{ENo}(σ_{PNo='P1'}(Assignment)))
Problems
  Relations are fragmented and distributed among five nodes
  The Employees relation uses primary horizontal fragmentation
    One fragment located at node 1, the other at node 2, no replication
  The Assignment relation uses derived horizontal fragmentation
    One fragment located at node 3, the other at node 4, no replication
  The query originates from node 5



Example
Cost model and statistics
Accessing a tuple costs 1 unit (acc)
Transferring a tuple costs 10 units (trans)
There are 400 employees and 1000 assignments
20 assignments for project 'P1'
All tuples are uniformly distributed, i.e., nodes 3 and 4 provide 10 assignments for project 'P1' each
There are local indexes on attribute PNo at nodes 3 and 4 (as well as indexes on primary keys at all nodes)
Direct tuple access is possible on local sites, no scanning
All nodes can directly communicate with each other
Simplification: no costs for unions and projections



Example
Simple execution plan - Version A
  Transfer all data to Node 5
  [Figure: execution plan A]
  Costs plan A: 23.000 units

Example
Simple execution plan - Version B
  Ship intermediate results
  [Figure: execution plan B]
  Costs plan B: 440 units
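The two totals can be reproduced from the cost model (1 unit per tuple access, 10 units per tuple transfer). The slides give only the totals, so the per-step breakdown in the following sketch is an assumption: it is one decomposition that is consistent with 23.000 and 440 units, not necessarily the one shown in the original plan figures.

```python
ACC = 1               # cost units per tuple access
TRANS = 10            # cost units per tuple transfer
EMPLOYEES = 400
ASSIGNMENTS = 1000
P1_ASSIGNMENTS = 20   # assignments for project 'P1'


def cost_plan_a():
    """Plan A: transfer all data to node 5 and evaluate the query there."""
    transfer = (EMPLOYEES + ASSIGNMENTS) * TRANS   # ship both relations
    select = ASSIGNMENTS * ACC                     # assumed: scan at node 5, no index there
    join = EMPLOYEES * P1_ASSIGNMENTS * ACC        # assumed: nested-loop join
    return transfer + select + join


def cost_plan_b():
    """Plan B: select locally, ship only intermediate results."""
    select = P1_ASSIGNMENTS * ACC          # index lookups on PNo at nodes 3 and 4
    ship_selected = P1_ASSIGNMENTS * TRANS # ship matching assignments to nodes 1 and 2
    join = P1_ASSIGNMENTS * ACC            # assumed: one primary-key lookup per shipped tuple
    ship_result = P1_ASSIGNMENTS * TRANS   # ship join results to node 5
    return select + ship_selected + join + ship_result
```

Under these assumptions `cost_plan_a()` returns 23000 and `cost_plan_b()` returns 440, a difference of roughly a factor of 50 that comes almost entirely from avoiding the transfer and processing of tuples that do not contribute to the result.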



Important aspects of distributed query processing
Data localization
Global query optimization
Post-processing

Meta data management
[Figure: workflow for distributed query processing]

Meta data management
Prerequisites to perform query optimization
  Meta data must be available
Meta data is stored in the catalog
  Catalog provides information about the data distribution
  Use this information to decide, for instance, if it is worthwhile to execute a selection very early.



Typical contents of a catalog for distributed database management systems
Database schema
  Definitions of tables, views, constraints, keys, ...
Partitioning schema
  Information about how the schema is partitioned and how tables can be reconstructed
Allocation schema
  Information about which fragment can be found at which node (including information about replication)
Network information
  Information about node connections, network model
Additional physical information
  Information about indexes, data statistics (histograms, etc.), hardware resources (processing & storage), ...

Where to store the catalog in a distributed system?
Central node
  Simple solution, bottleneck
Replicated at all nodes
  Updates are expensive
Fragmented
  In rare cases, the catalog may become very large
  Catalog has to be fragmented and allocated
Caching
  Replicate only needed parts of a central catalog, anticipate potential inconsistencies



Centralized catalog
One instance of the global catalog at a central node
Advantages
  No need to update copies
  Little memory consumption
Disadvantages
  Communication with central node for each query
  Central node potentially represents a bottleneck

Replicated catalog
Full copy of the global catalog at each node
Advantages
  Little communication overhead for queries
  Good availability
Disadvantages
  High update costs




Fragmented catalog
Partitioning the global catalog and assigning partitions to nodes
Advantages
  Sharing load among nodes
  Reducing update overhead
Disadvantages
  Localizing necessary partitions of the global catalog

Caching catalog data
Caching non-local catalog data
Advantages
  Avoiding remote access to frequently needed catalog data
  Reducing communication overhead
Disadvantages
  Coherency control
  Invalidating cached copies in the presence of updates


Meta data management
Caching catalog data
  Explicit invalidation
    Owner of catalog data remembers nodes with local copies
    In case of updates: sending an invalidation message to nodes with local copies
  Implicit invalidation
    Identifying old catalog data during runtime (adding version numbers and time stamps to query messages)
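Implicit invalidation with version numbers can be sketched as follows. The classes `CatalogOwner` and `CachingNode` and all their methods are hypothetical names introduced for this illustration; a real system would carry the version in the query message over the network rather than in a direct method call.

```python
class CatalogOwner:
    """Node that owns catalog entries and versions every update."""

    def __init__(self):
        self.entries = {}  # name -> (version, value)

    def update(self, name, value):
        version = self.entries.get(name, (0, None))[0] + 1
        self.entries[name] = (version, value)

    def read(self, name):
        return self.entries[name]

    def is_current(self, name, version):
        return self.entries[name][0] == version


class CachingNode:
    """Node that caches remote catalog entries without being notified of updates."""

    def __init__(self, owner):
        self.owner = owner
        self.cache = {}

    def run_query(self, name):
        if name not in self.cache:
            self.cache[name] = self.owner.read(name)
        version, value = self.cache[name]
        # The query message carries the cached version; the owner detects stale data.
        if not self.owner.is_current(name, version):
            self.cache[name] = self.owner.read(name)
            version, value = self.cache[name]
        return value
```

Compared with explicit invalidation, the owner does not need to remember which nodes hold copies; the price is that stale entries are only discovered when a query actually uses them.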


Data localization
Objective
  Creating subqueries in consideration of the data distribution
Assumptions
  Fragmentation is defined by fragmentation expressions
  Each fragment is allocated only at one node (no replication)
  Fragmentation expressions and locations of the fragments are stored in the catalog
Main tasks
  Replace access to global relations with accesses to the fragments
  Insert reconstruction expression into algebra query
  Basic algebraic simplifications of the query

Example – horizontal reduction
Schema
  Projects1 = σ_{Budget≤150.000}(Projects)
  Projects2 = σ_{150.000<Budget≤200.000}(Projects)
  Projects3 = σ_{Budget>200.000}(Projects)
Reconstruction expression (horizontal fragmentation)
  Projects = Projects1 ∪ Projects2 ∪ Projects3
Example query
  σ_{Location='Saarbr.' ∧ Budget≤100.000}(Projects)
After replacing references to global relations
  σ_{Location='Saarbr.' ∧ Budget≤100.000}(Projects1 ∪ Projects2 ∪ Projects3)
Further optimization is possible!


Query simplification – horizontal reduction
Objective
  Eliminate non-necessary subqueries
Horizontal reduction rule
  Given fragments of R as F_R = {R1, ..., Rn} with Ri = σ_pi(R)
  All fragments Ri for which σ_ps(Ri) = ∅ can be removed, with ps denoting the query's selection predicate
    σ_ps(Ri) = ∅ ⇐ ∀x ∈ R : ¬(ps(x) ∧ pi(x))
  The selection with the query predicate ps on fragment Ri is empty if ps contradicts the fragmentation predicate pi of Ri, i.e., ps and pi are never true at the same time for any tuple in Ri

Example – horizontal reduction
Query with fragmentation expression
  σ_{Location='Saarbr.' ∧ Budget≤100.000}(Projects1 ∪ Projects2 ∪ Projects3)
Fragment definitions
  Projects1 = σ_{Budget≤150.000}(Projects)
  Projects2 = σ_{150.000<Budget≤200.000}(Projects)
  Projects3 = σ_{Budget>200.000}(Projects)
Because of
  σ_{Budget≤100.000}(Projects2) = ∅, σ_{Budget≤100.000}(Projects3) = ∅
We obtain the reduced query
  σ_{Location='Saarbr.'}(σ_{Budget≤100.000}(Projects1))
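For range predicates on a single attribute, the horizontal reduction rule boils down to an interval overlap test. The sketch below assumes integer budgets so that each fragmentation predicate can be written as a closed interval (for instance, 150.000 < Budget ≤ 200.000 becomes [150001, 200000]); the names `FRAGMENTS`, `overlaps`, and `reduce_horizontal` are made up for this illustration.

```python
import math

# Fragmentation predicates on Budget as closed intervals (integer budgets assumed).
FRAGMENTS = {
    "Projects1": (-math.inf, 150_000),
    "Projects2": (150_001, 200_000),
    "Projects3": (200_001, math.inf),
}


def overlaps(a, b):
    """True if the closed intervals a and b share at least one value."""
    return max(a[0], b[0]) <= min(a[1], b[1])


def reduce_horizontal(fragments, query_range):
    """Keep only fragments whose predicate does not contradict the query predicate."""
    return [name for name, rng in fragments.items() if overlaps(rng, query_range)]
```

For the example query (Budget ≤ 100.000), `reduce_horizontal(FRAGMENTS, (-math.inf, 100_000))` keeps only `Projects1`. The predicate on Location plays no role because the fragmentation does not depend on it.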



Query simplification – join reduction
Join reductions
  Larger joins are replaced by multiple partial joins on fragments
    Distributive law: (R1 ∪ R2) ⋈ S = (R1 ⋈ S) ∪ (R2 ⋈ S)
  Eliminate all union fragments that will return an empty result
Expectations
  Elimination of partial joins producing empty results
    Depends on fragmentation optimality
  Many joins on small relations have lower resource costs than one large join
    Depends on fragmentation and applied join algorithms
  Smaller joins can be executed in parallel
    Might decrease response time but might also increase communication costs

Example – join reduction
Schema
  Projects(PNo, PName, Budget, Location)
    Projects1 = σ_{PNo='P1' ∨ PNo='P2'}(Projects)
    Projects2 = σ_{PNo='P3'}(Projects)
    Projects3 = σ_{PNo='P4'}(Projects)
  Assignment(ENo, PNo, Duration)
    Assignment1 = σ_{PNo='P1' ∨ PNo='P2'}(Assignment)
    Assignment2 = σ_{PNo='P3' ∨ PNo='P4'}(Assignment)
Example query
  select * from Projects p, Assignment a where p.PNo = a.PNo
In relational algebra
  Projects ⋈ Assignment


Example – join reduction
Query
  Projects ⋈ Assignment
After replacing global relations with reconstruction expressions
  (Projects1 ∪ Projects2 ∪ Projects3) ⋈ (Assignment1 ∪ Assignment2)
After applying the distributive law
  (Projects1 ⋈ Assignment1) ∪ (Projects1 ⋈ Assignment2) ∪
  (Projects2 ⋈ Assignment1) ∪ (Projects2 ⋈ Assignment2) ∪
  (Projects3 ⋈ Assignment1) ∪ (Projects3 ⋈ Assignment2)
Further optimization is possible!

Query simplification – join reduction
Join reduction rule
  Given fragments of R as F_R = {R1, ..., Rn} and fragments of S as F_S = {S1, ..., Sn}
  Apply distributive law, e.g.:
    (R1 ∪ R2) ⋈ (S1 ∪ S2) = (R1 ⋈ S1) ∪ (R1 ⋈ S2) ∪ (R2 ⋈ S1) ∪ (R2 ⋈ S2)
  All partial joins between fragments Ri and Sj for which Ri ⋈ Sj = ∅ can be removed
    Ri ⋈ Sj = ∅ ⇐ ∀x ∈ Ri, y ∈ Sj : ¬(pi(x) ∧ pj(y))
  The join between fragments Ri and Sj is empty if their respective fragmentation predicates (on the join attribute) contradict, i.e., there is no tuple combination x and y such that both partitioning predicates are fulfilled at the same time.



Example – join reduction
Query with fragmentation expression
  (Projects1 ⋈ Assignment1) ∪ (Projects1 ⋈ Assignment2) ∪
  (Projects2 ⋈ Assignment1) ∪ (Projects2 ⋈ Assignment2) ∪
  (Projects3 ⋈ Assignment1) ∪ (Projects3 ⋈ Assignment2)
Some of these partial joins are empty, e.g.:
  Projects1 ⋈ Assignment2 = ∅
Because their fragmentation expressions contradict:
  Projects1 = σ_{PNo='P1' ∨ PNo='P2'}(Projects) and
  Assignment2 = σ_{PNo='P3' ∨ PNo='P4'}(Assignment)
Reduced query
  (Projects1 ⋈ Assignment1) ∪ (Projects2 ⋈ Assignment2) ∪
  (Projects3 ⋈ Assignment2)

Query simplification – join reduction for horizontal fragmentation
The easiest join reduction case follows from derived horizontal fragmentation
  For each fragment of the first relation, there is exactly one matching fragment of the second relation
  Simply use the information contained in the reconstruction expression instead of comparing the reconstruction predicates to each other
Join reduction for arbitrary horizontal partitioning might not be beneficial
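When every fragmentation predicate is a disjunction of equality conditions on the join attribute, the join reduction rule becomes a set intersection test. The following sketch makes exactly that assumption and represents each fragment by the set of PNo values its predicate allows; the names `PROJECTS`, `ASSIGNMENTS`, and `reduce_join` are hypothetical.

```python
from itertools import product

# Each fragment is described by the PNo values its fragmentation predicate allows.
PROJECTS = {
    "Projects1": {"P1", "P2"},
    "Projects2": {"P3"},
    "Projects3": {"P4"},
}
ASSIGNMENTS = {
    "Assignment1": {"P1", "P2"},
    "Assignment2": {"P3", "P4"},
}


def reduce_join(left, right):
    """Return the fragment pairs whose partial join can be non-empty."""
    return [
        (l_name, r_name)
        for (l_name, l_values), (r_name, r_values) in product(left.items(), right.items())
        if l_values & r_values  # predicates do not contradict on the join attribute
    ]
```

`reduce_join(PROJECTS, ASSIGNMENTS)` keeps three of the six partial joins: Projects1 with Assignment1, Projects2 with Assignment2, and Projects3 with Assignment2.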


Query simplification – join reduction for derived horizontal fragmentation
Example
  Projects(PNo, PName, Budget, Location)
    Projects1 = σ_{PNo='P1' ∨ PNo='P2'}(Projects)
    Projects2 = σ_{PNo='P3' ∨ PNo='P4'}(Projects)
  Assignment(ENo, PNo, Duration)
    Assignment1 = Assignment ⋉ Projects1
    Assignment2 = Assignment ⋉ Projects2
Query in relational algebra
  Projects ⋈ Assignment

Query simplification – join reduction for derived horizontal fragmentation
After replacing global relations with reconstruction expressions
  (Projects1 ∪ Projects2) ⋈ (Assignment1 ∪ Assignment2)
After applying the distributive law
  (Projects1 ⋈ Assignment1) ∪ (Projects1 ⋈ Assignment2) ∪
  (Projects2 ⋈ Assignment1) ∪ (Projects2 ⋈ Assignment2)
Reduced query (using information about fragmentation of relation Assignment directly)
  (Projects1 ⋈ Assignment1) ∪ (Projects2 ⋈ Assignment2)



Query simplification – vertical reduction
Vertical fragmentation rule
  Given fragments of R as F_R = {R1, ..., Rn} with Ri = π_βi(R), with βi representing the enumeration of a subset of R's attributes
  Avoid joining fragments containing "useless" attributes, i.e., fragments containing only attributes that are not referenced in the query and not output in the result

Example – vertical reduction
Schema
  Projects(PNo, PName, Budget, Location)
    Projects1 = π_{PNo,PName,Location}(Projects)
    Projects2 = π_{PNo,Budget}(Projects)
Reconstruction expression
  Projects = Projects1 ⋈ Projects2
Example query
  π_{PName}(Projects)
After replacing references to global relations
  π_{PName}(Projects1 ⋈ Projects2)
After removing unnecessary fragments
  π_{PName}(Projects1)
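The vertical reduction rule can be sketched as a filter over the attribute lists of the fragments. The function `reduce_vertical` and its parameters are assumptions made for this illustration; it further assumes that every fragment contains the key, so that a query referencing only key attributes can be answered from any single fragment.

```python
def reduce_vertical(fragments, key, needed):
    """Keep fragments that contribute at least one needed non-key attribute."""
    key, needed = set(key), set(needed)
    kept = [name for name, attrs in fragments.items()
            if (set(attrs) - key) & needed]
    if not kept:
        # Only key attributes are needed; every fragment contains the key,
        # so the first fragment suffices.
        kept = [next(iter(fragments))]
    return kept


PROJECT_FRAGMENTS = {
    "Projects1": ["PNo", "PName", "Location"],
    "Projects2": ["PNo", "Budget"],
}
```

For the example query π_{PName}(Projects), `reduce_vertical(PROJECT_FRAGMENTS, ["PNo"], ["PName"])` keeps only `Projects1`, so the join with `Projects2` disappears.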



Query simplification – hybrid fragmentation
The reconstruction expression introduces combinations of joins and unions
General guidelines
  Remove empty relations generated by contradicting selections on horizontal fragments
  Remove useless relations generated by vertical fragments
  Break and distribute joins, eliminate empty fragment joins

Qualified relations
Supporting algebraic optimization of queries involving fragments
  Annotating fragments and intermediate relations with predicates
  Estimating the size of a relation
Extension of relational algebra
Definition: qualified relation
  A qualified relation is a pair [R : qR] where R is a relation and qR is a predicate.
Example
  Representing horizontal fragments as qualified relations where the qualification predicate corresponds to the fragmentation expression
  [Projects : σ_{PNo='P1' ∨ PNo='P2'}]


Qualified relations
Extended relational algebra
  (1) E := σ_F [R : qR] → [E : F ∧ qR]
  (2) E := π_A [R : qR] → [E : qR]
  (3) E := [R : qR] × [S : qS] → [E : qR ∧ qS]
  (4) E := [R : qR] − [S : qS] → [E : qR]
  (5) E := [R : qR] ∪ [S : qS] → [E : qR ∨ qS]
  (6) E := [R : qR] ⋈_F [S : qS] → [E : qR ∧ qS ∧ F]

Qualified relations
Example query
  σ_{100.000≤Budget≤200.000}(Projects)
Qualified relations
  E1 = σ_{100.000≤Budget≤200.000} [Projects1 : Budget ≤ 150.000]
    [E1 : (100.000 ≤ Budget ≤ 200.000) ∧ (Budget ≤ 150.000)]
    [E1 : 100.000 ≤ Budget ≤ 150.000]
  E2 = σ_{100.000≤Budget≤200.000} [Projects2 : 150.000 < Budget ≤ 200.000]
    [E2 : (100.000 ≤ Budget ≤ 200.000) ∧ (150.000 < Budget ≤ 200.000)]
    [E2 : 150.000 < Budget ≤ 200.000]
  E3 = σ_{100.000≤Budget≤200.000} [Projects3 : Budget > 200.000]
    [E3 : (100.000 ≤ Budget ≤ 200.000) ∧ (Budget > 200.000)]
    E3 = ∅


Global query optimization

Main questions
[Figure: workflow for distributed query processing]

Introduction to global query optimization
Main questions
  When to optimize?
  What criteria to optimize?
  Where to execute the query?



When to optimize?
Full compile time optimization
  The full query execution plan is computed at compile time
  Assumption
    Applications use canned queries
    Prepared and parameterized SQL statements
  Pros
    Queries can be executed directly
  Cons
    Complex to model
    Much information unknown or too expensive to gather
      Collecting statistics on all nodes?
    Statistics outdated
      Especially machine load and network properties are very volatile

When to optimize?
Fully dynamic optimization
  Each query is optimized individually at runtime
  This technique heavily relies on heuristics, learning algorithms, and luck
  Pros
    Might produce very good plans
    Uses current network state
    Also usable for ad-hoc queries
  Cons
    Result quality might be very unpredictable
    Complex algorithms and heuristics
    Difficult to keep statistics up-to-date



When to optimize?
Semi-dynamic optimization
  Pre-optimize the query
  During query execution, check whether execution proceeds as predicted during optimization
    e.g., are tuples/fragments delivered in time?, does the network adhere to the predicted properties?, are there any bad network latencies?, etc.
  If execution shows severe deviations, compute a new query plan for all parts that have not yet been executed
  Only makes sense for queries that run for a longer time

When to optimize?
Hierarchical optimization
  Plans are created in multiple stages
  Global-Local-Plans
    Global query optimizer creates a global query plan
      Focus on data transfer: which intermediate results are to be computed by which node? How should intermediate results be shipped?
    Local query optimizers create local query plans
      Decide on query plan layout, algorithms, indexes, etc. to deliver the requested intermediate result
  Two-Step-Plans



When to optimize?
Hierarchical optimization
  Plans are created in multiple stages
  Global-Local-Plans
  Two-Step-Plans
    During compile time, only stable parts of the plan are computed
      Join order, join methods, access paths, etc.
    During query execution, all missing plan elements are added
      Node selection, transfer policies, etc.
    Both steps can be performed using traditional query optimization techniques
      Plan enumeration with dynamic programming
      Complexity is manageable as each optimization problem is much easier than a full optimization
      During runtime optimization, fresh statistics are available
Most distributed database management systems use semi-dynamic or hierarchical optimization techniques (or both)

What criteria to optimize?
Important aspects for global optimization
  Communication operators
  Fragment cardinalities
  Order of operations
  Join ordering
    Because permutations of the joins within the query may lead to improvements of orders of magnitude
Most important alternative optimization criteria
  Query response time
  Resource consumption
  Total query execution costs
  ...


Where to execute the query?
Query optimizer has to decide which parts of the query have to be shipped to which node (cost model)
In heavily replicated scenarios, clever hybrid shipping can effectively be used for load balancing
  Move expensive computations to lightly loaded nodes, avoid expensive communication

Global query optimization
Global query optimization deals with finding the "best" ordering of operations in the query (extended by fragmentation expressions and including communication operations) that minimizes a cost function.
Input
  an algebraic query extended by fragmentation expressions
Output
  an algebraic query or query execution plan with communication operations


Global query optimizer
Basics of global query optimization
Objective
  Choose a cost efficient execution plan based on the algebraic query plan given as input
  Decide which parts of the query have to be transferred to which node
Prerequisites
  Knowledge about fragmentation
  Knowledge about fragment/relation sizes
  Knowledge about data distribution
  Knowledge about costs of operations

Optimizer components
The global optimizer has three main components
  The search space
    Set of alternative equivalent execution plans to represent the input query
  The cost model
    Predicts the costs of a given query execution plan
  The search strategy
    Explores the search space and selects the best plan



Phases of optimization
Phases
1 Spanning the search space using transformation rules
  → equivalent search plans
2 Applying a search strategy and a cost model
  → choose an efficient plan
Main focus: join trees and join ordering

Search space
Query
  SELECT EName, Title
  FROM Employees e, Assignment a, Project p
  WHERE e.EID = ENo AND a.PNo = p.PNo
Equivalent join trees
  [Figure: equivalent join trees]
O(N!) different join trees by applying commutativity and associativity rules for N relations
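The factorial growth is easy to see by enumerating the join orders of the three relations in the query. The sketch below builds one left-deep join tree per permutation; `left_deep_orders` is a hypothetical helper, and the enumeration deliberately ignores whether an order implies a Cartesian product (for example, joining Employees with Project first).

```python
from itertools import permutations
from math import factorial


def left_deep_orders(relations):
    """Return one left-deep join tree (as a string) per permutation."""
    plans = []
    for order in permutations(relations):
        plan = order[0]
        for rel in order[1:]:
            plan = f"({plan} ⋈ {rel})"
        plans.append(plan)
    return plans
```

For `["Employees", "Assignment", "Project"]` this yields 3! = 6 trees; ten relations would already give `factorial(10)` = 3628800 left-deep orders, before bushy trees are even considered.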



Search space
Tree variants for join order optimization
  Linear join trees
    All inner nodes have at least one leaf node (base relation) as child
    Reduces search space
  Bushy trees
    May have inner nodes with no base relation as child
    High potential for parallelization
  [Figure: linear join tree ((R3 ⋈ R4) ⋈ R2) ⋈ R1 and bushy join tree (R1 ⋈ R2) ⋈ (R3 ⋈ R4)]

Search strategies
A search strategy needs to reduce search space
  Applying heuristics (similar to centralized algebraic optimization)
    Perform projections and selections when accessing base relations
    Avoid Cartesian products – enforce joins
  Applying further heuristics influencing the shape of the join tree
    Reducing the size of the search space vs. exhibiting parallelism
    Linear vs. bushy trees



Search strategies
Deterministic search strategy
  Systematic generation of query plans
    Starting with plans accessing the base relations
    Constructing complex plans by combining easier plans, e.g., joining one more relation at each step
  Exhaustive search guarantees finding the best plan

Search strategies
Example deterministic search strategies
  Dynamic programming
    (Almost) exhaustive search by building all possible plans (breadth first)
    "Very bad" partial plans are pruned at an early stage
    Guarantee to find the best plan
    Only possible for a small number (5-6) of relations
  Greedy algorithm
    Only one plan is built (depth-first)
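A minimal dynamic programming sketch for left-deep join ordering is shown below. Everything about the cost model is an assumption made for illustration: the cost of a plan is the sum of its intermediate result sizes, result sizes are estimated from cardinalities and pairwise join selectivities, and a missing selectivity means a Cartesian product. The cardinality of Project and both selectivities are invented numbers; only 400 employees and 1000 assignments come from the earlier example.

```python
from fractions import Fraction
from itertools import combinations


def best_left_deep_plan(card, sel):
    """Return (cost, join order) of the cheapest left-deep plan."""
    rels = sorted(card)

    def size(subset):
        # Estimated result size: product of cardinalities and selectivities.
        result = Fraction(1)
        for rel in subset:
            result *= card[rel]
        for a, b in combinations(sorted(subset), 2):
            result *= sel.get(frozenset((a, b)), 1)
        return result

    # best[subset] holds the cheapest plan found for joining exactly that subset.
    best = {frozenset([rel]): (Fraction(0), (rel,)) for rel in rels}
    for k in range(2, len(rels) + 1):
        for combo in combinations(rels, k):
            subset = frozenset(combo)
            out = size(subset)
            best[subset] = min(
                (best[subset - {rel}][0] + out, best[subset - {rel}][1] + (rel,))
                for rel in combo  # rel is the relation joined last
            )
    return best[frozenset(rels)]


CARD = {"Employees": 400, "Assignment": 1000, "Project": 50}
SEL = {
    frozenset(("Employees", "Assignment")): Fraction(1, 400),
    frozenset(("Assignment", "Project")): Fraction(1, 50),
}
```

`best_left_deep_plan(CARD, SEL)` keeps only the cheapest plan per subset of relations, so the expensive order that starts with Employees × Project is discarded as soon as cheaper plans for the same subsets exist. The table still has one entry per subset, which is why this approach is only practical for a small number of relations.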


Search strategies
Randomized search strategy
  One or more start plans using a greedy strategy (depth-first search)
  Improving start plans by examining "neighbor plans"
    Neighbor plan: applying transformation rules, e.g., exchanging two arbitrarily chosen operations
  Better performance with a higher number of relations
  No guarantee to find the best plan

Distributed cost model
Components
  Cost functions
    Estimating costs to execute operations
  Statistics
    Data about relation sizes, attribute domains, value distribution, etc.
  Formulas
    Determine cardinalities, sizes of intermediate results, etc.


Cost functions

Total execution time
  Sum of all costs, i.e., the sum of all processing times at all nodes
  involved in answering the query

  $T_{total} = T_{CPU} \cdot \#insts + T_{I/O} \cdot \#ops_{I/O} + T_{MSG} \cdot \#msgs + T_{TR} \cdot \#bytes$

  $T_{CPU}$: time to process a CPU instruction
  $T_{I/O}$: time for a disk access
  $T_{MSG}$: time to send and receive a message
  $T_{TR}$: time to transmit a data unit from one node to another
  $\#bytes$ is the sum of the sizes of all messages
  Typical assumption: $T_{TR}$ is constant – although it might not be true
  for remote nodes

Cost functions

Components of total execution time
  Local processing costs/time

  $T_{local} = T_{CPU} \cdot \#insts + T_{I/O} \cdot \#ops_{I/O}$

  Communication costs/time

  $T_{comm} = T_{MSG} \cdot \#msgs + T_{TR} \cdot \#bytes$

Coefficients ($T_{CPU}$, $T_{I/O}$, $T_{MSG}$, $T_{TR}$) characterize a specific
distributed database system
  WAN (Wide Area Network): communication time is dominant
  LAN (Local Area Network): also local costs play an important role
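The total-time cost function above can be sketched in a few lines of Python. This is only an illustration: the class name, the function names, and all coefficient values are hypothetical and were chosen so that the WAN/LAN contrast from the slide becomes visible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostCoefficients:
    """Coefficients that characterize one (hypothetical) distributed DBMS."""
    t_cpu: float  # T_CPU: time to process one CPU instruction
    t_io: float   # T_I/O: time for one disk access
    t_msg: float  # T_MSG: time to send and receive one message
    t_tr: float   # T_TR: time to transmit one data unit (here: one byte)


def local_time(c, insts, io_ops):
    # T_local = T_CPU * #insts + T_I/O * #ops_I/O
    return c.t_cpu * insts + c.t_io * io_ops


def comm_time(c, msgs, nbytes):
    # T_comm = T_MSG * #msgs + T_TR * #bytes
    return c.t_msg * msgs + c.t_tr * nbytes


def total_time(c, insts, io_ops, msgs, nbytes):
    # T_total = T_local + T_comm
    return local_time(c, insts, io_ops) + comm_time(c, msgs, nbytes)


# Made-up coefficient sets (seconds), for illustration only.
wan = CostCoefficients(t_cpu=1e-8, t_io=1e-3, t_msg=0.1, t_tr=1e-5)
lan = CostCoefficients(t_cpu=1e-8, t_io=1e-3, t_msg=1e-4, t_tr=1e-8)

workload = dict(insts=1_000_000, io_ops=100, msgs=10, nbytes=1_000_000)
for name, coeffs in (("WAN", wan), ("LAN", lan)):
    print(name,
          "local:", local_time(coeffs, workload["insts"], workload["io_ops"]),
          "comm:", comm_time(coeffs, workload["msgs"], workload["nbytes"]),
          "total:", total_time(coeffs, **workload))
```

With these made-up numbers the communication term dominates in the WAN setting, while in the LAN setting the local processing term is the larger one.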



Cost functions

Response time
  Time that elapses between query initiation and completion
  Considering parallel local processing and parallel communication

  $T_{response} = T_{CPU} \cdot seq\_\#insts + T_{I/O} \cdot seq\_\#ops_{I/O} + T_{MSG} \cdot seq\_\#msgs + T_{TR} \cdot seq\_\#bytes$

  where $seq\_\#x$ represents the maximum number of instructions
  (insts), I/O operations (ops_I/O), messages (msgs), or bytes (bytes)
  that have to be processed sequentially

Total time vs. response time

Communication costs for two transfers of x and y data units

  $T_{comm_{total}} = 2 \cdot T_{MSG} + T_{TR} \cdot (x + y)$
  $T_{comm_{response}} = \max\{T_{MSG} + T_{TR} \cdot x,\; T_{MSG} + T_{TR} \cdot y\}$

Minimizing response time does not imply that the total time is also
minimized!
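A small numeric sketch of the two communication cost formulas may help. The function names and the sample values for T_MSG, T_TR, x, and y are assumptions made for illustration.

```python
def comm_total(t_msg, t_tr, x, y):
    # Total time: both transfers are summed up.
    return 2 * t_msg + t_tr * (x + y)


def comm_response(t_msg, t_tr, x, y):
    # Response time: the transfers run in parallel, the slower one decides.
    return max(t_msg + t_tr * x, t_msg + t_tr * y)


# Hypothetical values: 5 time units per message, 1 time unit per data unit.
total = comm_total(t_msg=5, t_tr=1, x=100, y=40)
response = comm_response(t_msg=5, t_tr=1, x=100, y=40)
print(total, response)
```

For these values the total communication time is 150 time units, whereas the response time is only 105, because the smaller transfer is hidden behind the larger one.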



Statistics

Good statistics are crucial
Most important cost factor:
  Size of intermediate results produced during execution
Estimating sizes using statistics and formulas
Tradeoff between precision and costs of managing statistics

Typical statistics

Typical statistics for relation R fragmented as R1, R2, ..., Rr with
attributes A1, ..., An
  Length of each attribute Ai in terms of bytes: $length(A_i)$
  Number of distinct values for each attribute Ai and for each fragment
  Rj: $values_{A_i,R_j} := card(\pi_{A_i}(R_j))$
  Minimum and maximum attribute values: $\min(A_i)$ and $\max(A_i)$
  Number of distinct values (cardinality) of the attribute domains:
  $card(dom[A_i])$
  Number of tuples in each fragment Rj: $card(R_j)$



Additional statistics

Additional statistics
  Histogram for each attribute Ai to approximate the frequency
  distribution
  Join selectivity factor for some pairs of relations

  $SF_J(R, S) = \frac{card(R \bowtie S)}{card(R) \cdot card(S)}$

  good (high) selectivity: $SF_J = 0.001$
  bad (low) selectivity: $SF_J = 0.5$

Cardinality estimation

Assumptions
  Independence between attributes
  Uniform distribution of attribute values
Selectivity
  Ratio between expected number of result tuples and tuples of the
  input relation

  $SF = \frac{\text{Expected result size}}{\text{Cardinality of the input relation}}$

  Example: $\sigma_F(R)$ returns 10% of R’s tuples ⇒ $SF_S(F, R) = 0.1$
  (SF: selectivity factor)



Cardinality estimation

Assumptions
  Independence between attributes
  Uniform distribution of attribute values
Cardinality
  Estimate result size (cardinality of the output relation)
  Example: $SF_S(F, R) = 0.1$

  $card(\sigma_F(R)) = SF_S(F, R) \cdot card(R)$

Selection

Cardinality

  $card(\sigma_F(R)) = SF_S(F, R) \cdot card(R)$

Selectivity
  Selectivity depends on selection predicates p(A) and constants v

  $SF_S(A = v, R) = \frac{1}{values_{A,R}} = \frac{1}{card(\pi_A(R))}$
  $SF_S(A < v, R) = \frac{v - \min(A)}{\max(A) - \min(A)}$
  $SF_S(A > v, R) = \frac{\max(A) - v}{\max(A) - \min(A)}$
  $SF_S(v_1 < A < v_2, R) = \frac{v_2 - v_1}{\max(A) - \min(A)}$
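The selectivity formulas for simple selection predicates translate directly into code. The following sketch assumes uniformly distributed attribute values, as the slides do. The function names and the sample statistics (10,000 tuples, 50 distinct values, value range 0 to 100) are hypothetical.

```python
def sf_equals(distinct_values):
    # SF_S(A = v, R) = 1 / values_{A,R}
    return 1.0 / distinct_values


def sf_less_than(v, a_min, a_max):
    # SF_S(A < v, R) = (v - min(A)) / (max(A) - min(A))
    return (v - a_min) / (a_max - a_min)


def sf_greater_than(v, a_min, a_max):
    # SF_S(A > v, R) = (max(A) - v) / (max(A) - min(A))
    return (a_max - v) / (a_max - a_min)


def sf_between(v1, v2, a_min, a_max):
    # SF_S(v1 < A < v2, R) = (v2 - v1) / (max(A) - min(A))
    return (v2 - v1) / (a_max - a_min)


def card_selection(sf, card_r):
    # card(sigma_F(R)) = SF_S(F, R) * card(R)
    return sf * card_r


# Hypothetical statistics for relation R and attribute A.
card_r, a_min, a_max, distinct = 10_000, 0, 100, 50

print(card_selection(sf_equals(distinct), card_r))               # A = v
print(card_selection(sf_less_than(25, a_min, a_max), card_r))    # A < 25
print(card_selection(sf_between(40, 60, a_min, a_max), card_r))  # 40 < A < 60
```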



Selection

Cardinality

  $card(\sigma_F(R)) = SF_S(F, R) \cdot card(R)$

Selectivity
  Selectivity depends on selection predicates p(A) and constants v

  $SF_S(p(A_i) \wedge p(A_j), R) = SF_S(p(A_i), R) \cdot SF_S(p(A_j), R)$
  $SF_S(p(A_i) \vee p(A_j), R) = SF_S(p(A_i), R) + SF_S(p(A_j), R) - (SF_S(p(A_i), R) \cdot SF_S(p(A_j), R))$

Projection

Cardinality
  Without duplicate elimination

  $card(\pi_A(R)) = card(R)$

  With duplicate elimination (if defined on an arbitrary attribute A):

  $card(\pi_A(R)) = values_{A,R}$

  With duplicate elimination (if one of the attributes is the primary key):

  $card(\pi_{A_i,\ldots}(R)) = card(R)$

Cardinalities for projections on arbitrary combinations of attributes are
hard to predict because attribute correlations are unknown
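The rules for combining selection predicates with AND and OR follow from the independence assumption. A minimal sketch, with hypothetical function names and sample selectivities:

```python
def sf_and(sf_i, sf_j):
    # SF_S(p(Ai) AND p(Aj), R) = SF_S(p(Ai), R) * SF_S(p(Aj), R)
    return sf_i * sf_j


def sf_or(sf_i, sf_j):
    # SF_S(p(Ai) OR p(Aj), R) = SF_i + SF_j - SF_i * SF_j
    return sf_i + sf_j - sf_i * sf_j


# Hypothetical selectivities of two predicates on different attributes.
sf_1, sf_2 = 0.1, 0.5
card_r = 10_000

print(sf_and(sf_1, sf_2) * card_r)  # expected tuples matching both predicates
print(sf_or(sf_1, sf_2) * card_r)   # expected tuples matching at least one
```

If the two attributes are correlated, the independence assumption no longer holds and both estimates can be far off.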


Cartesian product

Cardinality

  $card(R \times S) = card(R) \cdot card(S)$

Joins

Cardinality
  Given: R ⋈ S with R(A, B) and S(B, C)
  Upper bound: size of the Cartesian product
  Natural join on attribute B
    No B values shared between R and S:
    $card(R \bowtie S) = 0$
    Foreign key relationship R.B → S.B:
    $card(R \bowtie S) = card(R)$
    All tuples in R.B and S.B have the same value:
    $card(R \bowtie S) = card(R) \cdot card(S)$



Joins

Cardinality
  Given: R ⋈ S with R(A, B) and S(B, C)
  Upper bound: size of the Cartesian product
  Natural join on attribute B
  Estimate

  $card(R \bowtie S) = \frac{card(R) \cdot card(S)}{\max\{values_{B,R}, values_{B,S}\}}$

  Store statistics (join selectivity factor $SF_J$) for important joins

  $card(R \bowtie S) = SF_J \cdot card(R) \cdot card(S)$

Union and Difference

Cardinality
  Difficult to estimate because duplicates are removed
  Union
    Upper bound
    $card(R \cup S) = card(R) + card(S)$
    Lower bound
    $card(R \cup S) = \max\{card(R), card(S)\}$
  Difference
    Upper bound
    $card(R - S) = card(R)$
    Lower bound
    $card(R - S) = 0$
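The two join size estimates and the bounds for union and difference can be written down as follows. This is a sketch with hypothetical function names and sample statistics.

```python
def card_join_estimate(card_r, card_s, values_b_r, values_b_s):
    # card(R join S) = card(R) * card(S) / max{values_{B,R}, values_{B,S}}
    return card_r * card_s / max(values_b_r, values_b_s)


def card_join_from_sf(sf_j, card_r, card_s):
    # card(R join S) = SF_J * card(R) * card(S), using a stored selectivity factor
    return sf_j * card_r * card_s


def union_bounds(card_r, card_s):
    # (lower bound, upper bound) for card(R union S)
    return max(card_r, card_s), card_r + card_s


def difference_bounds(card_r):
    # (lower bound, upper bound) for card(R - S)
    return 0, card_r


# Hypothetical statistics: 1000 and 500 tuples, 100 and 50 distinct B values.
print(card_join_estimate(1000, 500, values_b_r=100, values_b_s=50))
print(card_join_from_sf(0.001, 1000, 500))
print(union_bounds(1000, 500), difference_bounds(1000))
```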



Selectivity estimation using histograms

Histograms
  In reality distribution of attribute values in a relation is often not
  uniform
  Histograms consist of a set of buckets b_i
  Example histogram on attribute A of relation R
  Each bucket b_i defined by
    Range: range_i
      Range of values in attribute domain dom[A]
    Frequency: f_i
      Number of tuples of R where R.A ∈ range_i
    Distinct values: d_i
      Number of distinct values of A where R.A ∈ range_i

Selectivity estimation using histograms

Equality predicate
  Given predicate A = v
  Identify bucket b_i with v ∈ range_i

  $SF_S(A = v, R) = \frac{1}{d_i}$
  $card(\sigma_{A=v}(R)) = SF_S(A = v, R) \cdot f_i = \frac{f_i}{d_i}$


Selectivity estimation using histograms

Range predicates
  Given predicate A ≤ v
  Identify buckets that overlap the queried range
  Sum up frequencies

  $card(\sigma_{A \le v}(R)) = \sum_{j=1}^{i-1} f_j + \frac{v - \min(range_i)}{\max(range_i) - \min(range_i)} \cdot f_i$

  Bucket i only partially overlaps the queried range

Join order optimization

Phases of optimization

Phases
  1 Spanning the search space using transformation rules
    → equivalent search plans
  2 Applying a search strategy and a cost model
    → choose an efficient plan
Main focus: join trees and join ordering
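The histogram-based estimates for equality predicates (A = v) and range predicates (A ≤ v) described on the histogram slides can be sketched as follows. The `Bucket` class, the half-open bucket ranges, and the sample histogram are assumptions made for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bucket:
    lo: float      # min(range_i), inclusive
    hi: float      # max(range_i), exclusive (assumption: half-open ranges)
    freq: int      # f_i: number of tuples with R.A in range_i
    distinct: int  # d_i: number of distinct A values in range_i


def card_equals(hist, v):
    """Estimate card(sigma_{A = v}(R)) = f_i / d_i for the bucket containing v."""
    for b in hist:
        if b.lo <= v < b.hi:
            return b.freq / b.distinct
    return 0.0  # v lies outside all buckets


def card_less_equal(hist, v):
    """Estimate card(sigma_{A <= v}(R)) by summing up bucket frequencies."""
    total = 0.0
    for b in hist:
        if b.hi <= v:
            total += b.freq                                   # fully covered
        elif b.lo <= v:
            total += (v - b.lo) / (b.hi - b.lo) * b.freq      # partial overlap
    return total


# Hypothetical histogram with three buckets on attribute A.
hist = [Bucket(0, 10, 100, 10), Bucket(10, 20, 300, 5), Bucket(20, 30, 50, 10)]

print(card_equals(hist, 15))      # 300 tuples spread over 5 distinct values
print(card_less_equal(hist, 15))  # first bucket plus half of the second
```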


Join order optimization

Simplifying assumptions
  No distinction between fragments and relations
  Ignoring local processing time
  Ignoring other operations (selection, projection)
  No pipelining
  Ignoring data transfer to the result site

Join order optimization for two relations

Determine the join order for two relations R ⋈ S
Transfer the smaller relation to minimize the network load


Join order optimization for three relations

Determine the join order for three relations R ⋈_A S ⋈_B T
  1 R → nodeS, nodeS: R' = R ⋈ S, R' → nodeT, nodeT: R' ⋈ T
  2 S → nodeR, nodeR: R' = R ⋈ S, R' → nodeT, nodeT: R' ⋈ T
  3 S → nodeT, nodeT: S' = S ⋈ T, S' → nodeR, nodeR: S' ⋈ R
  4 T → nodeS, nodeS: S' = S ⋈ T, S' → nodeR, nodeR: S' ⋈ R
  5 T → nodeS, R → nodeS, nodeS: R ⋈ S ⋈ T

Join order optimization

Possible orders
  1 nodeR: send R to nodeS
    nodeS: compute join R' = R ⋈ S, send R' to nodeT
    nodeT: compute join R' ⋈ T
  2 nodeS: send S to nodeR
    nodeR: compute join R' = R ⋈ S, send R' to nodeT
    nodeT: compute join R' ⋈ T
  3 nodeS: send S to nodeT
    nodeT: compute join S' = S ⋈ T, send S' to nodeR
    nodeR: compute join S' ⋈ R
  4 nodeT: send T to nodeS
    nodeS: compute join S' = S ⋈ T, send S' to nodeR
    nodeR: compute join S' ⋈ R
  5 nodeT: send T to nodeS
    nodeR: send R to nodeS
    nodeS: compute join R ⋈ S ⋈ T

Conclusion
  Decision
    Based on the sizes of the base relations and intermediate results
    Perhaps exploiting parallelism of alternative 5

Join order optimization with semijoins

Considering semijoins for joining two relations R (at nodeR) and S (at
nodeS) results in three alternatives – assuming A is the join attribute
  1 R ⋈_A S = (R ⋉_A S) ⋈_A S = (R ⋉_A π_A(S)) ⋈_A S
  2 R ⋈_A S = R ⋈_A (S ⋉_A R)
  3 R ⋈_A S = (R ⋉_A S) ⋈_A (S ⋉_A R)

Workflow for alternative 1
  nodeS: compute S' = π_A(S), send S' to nodeR
  nodeR: compute R' = R ⋉_A S', send R' to nodeS
  nodeS: compute R' ⋈_A S
Transfer costs (neglecting $T_{MSG}$)
  $T_{TR} \cdot card(\pi_A(S)) + T_{TR} \cdot card(R \ltimes_A S')$
Considering full joins (R ⋈_A S) only and assuming that
card(R) < card(S), the complete relation R would have been sent to
nodeS, costs: $T_{TR} \cdot card(R)$

Semijoins vs. joins

Transfer costs based on the semijoin: $T_{TR} \cdot card(\pi_A(S)) + T_{TR} \cdot card(R \ltimes_A S)$
Transfer costs standard join: $T_{TR} \cdot card(R)$
The semijoin is preferable if

  $card(\pi_A(S)) + card(R \ltimes_A S) < card(R)$

Total time models

Basic strategy
  Coordinator (master) site
  Exhaustive search
  Optimization objective: total time
Input
  Relational algebra tree
  Cost model
  Statistics
  Location of relations
Output
  Optimized query execution plan
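To illustrate an exhaustive search with total time as the objective, the following sketch enumerates the five alternatives for R ⋈ S ⋈ T from the join order optimization slides and picks the one with the lowest transfer cost. It uses the simplifying assumptions stated there (only transfer costs count, shipping to the result site is ignored). A second helper applies the semijoin-versus-join criterion. All function names and cardinalities are hypothetical; costs are counted in tuples, as in the slides.

```python
def transfer_costs(card_r, card_s, card_t, card_rs, card_st, t_tr=1.0):
    """Transfer cost of the five alternatives; card_rs = card(R join S), etc."""
    return {
        1: t_tr * (card_r + card_rs),  # R -> nodeS, R' -> nodeT
        2: t_tr * (card_s + card_rs),  # S -> nodeR, R' -> nodeT
        3: t_tr * (card_s + card_st),  # S -> nodeT, S' -> nodeR
        4: t_tr * (card_t + card_st),  # T -> nodeS, S' -> nodeR
        5: t_tr * (card_t + card_r),   # T -> nodeS and R -> nodeS
    }


def cheapest(costs):
    # Exhaustive search: inspect every alternative, keep the minimum.
    return min(costs, key=costs.get)


def semijoin_preferable(card_proj_a_s, card_r_semijoin_s, card_r):
    # card(pi_A(S)) + card(R semijoin S) < card(R)
    return card_proj_a_s + card_r_semijoin_s < card_r


# Hypothetical statistics.
costs = transfer_costs(card_r=1000, card_s=500, card_t=2000,
                       card_rs=200, card_st=3000)
print(costs, "-> cheapest alternative:", cheapest(costs))
print(semijoin_preferable(card_proj_a_s=100, card_r_semijoin_s=300, card_r=1000))
```

Alternative 5 sends two relations in parallel, so a response-time objective could rank it differently from this total-time comparison.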


Total time models

Aspects
  Cost model
  Site selection and data transfer
  Join order optimization
  Join implementation

Site selection and data transfer

Query shipping
  Query initiator (node at which the query is issued/optimized)
  sends the query to other nodes
  Receiver nodes compute the query result and ship the result
  back to the initiator



Site selection and data transfer

Data shipping
  Query remains at the initiator
  Initiator sends data request messages to other nodes
  Receiver nodes ship all required data to the initiator
  Initiator computes result

Site selection and data transfer

Hybrid shipping
  Initiator sends partial queries to other nodes
  Other nodes execute some parts of the query and send
  intermediate results to the initiator
  Initiator executes remaining query operations
  (post-processing)



Site selection and data transfer for joins

Problem
  Queries make extensive use of joins
  Computing joins is very expensive
  Especially in distributed systems: special attention because of
  fragments and replication
Basic strategies
  Ship whole
    Transferring the complete relation
  Fetch as needed
    Transferring the relation piecewise

Site selection and data transfer for joins

Scenario
  2 nodes; one (nodeR) storing relation R, the other (nodeS) storing
  relation S
  The query asks for R ⋈ S

R
A  B
3  7
1  1
4  6
7  7
4  5
6  2
5  7

S
B  C  D
9  8  8
1  5  1
9  4  2
4  3  3
4  2  6
5  7  8

R ⋈ S
A  B  C  D
1  1  5  1
4  5  7  8



Ship whole

Execution at nodeR
  nodeR: send data request message (relation S) to nodeS
  nodeS: send requested data (relation S) to nodeR
Total costs: 2 messages, 18 attribute values

Ship whole

Execution at nodeS
  nodeS: send data request message (relation R) to nodeR
  nodeR: send requested data (relation R) to nodeS
Total costs: 2 messages, 14 attribute values



Ship whole

Execution at a third node nodeX
  nodeX: send data request message (relation R) to nodeR
  nodeX: send data request message (relation S) to nodeS
  nodeR: send requested data (relation R) to nodeX
  nodeS: send requested data (relation S) to nodeX
Total costs: 4 messages, 18 + 14 = 32 attribute values

Fetch as needed

Execution at nodeR
  nodeR: send data request message (tuples of relation S with B = '7') to nodeS
  nodeS: send requested data (0 tuples of relation S with B = '7') to nodeR
  nodeR: send data request message (tuples of relation S with B = '1') to nodeS
  nodeS: send requested data (1 tuple of relation S with B = '1') to nodeR
  ...
Total costs: 7 · 2 = 14 messages, 7 + 2 · 3 = 13 attribute values


Fetch as needed

Execution at nodeS
  nodeS: send data request message (tuples of relation R with B = '9') to nodeR
  nodeR: send requested data (0 tuples of relation R with B = '9') to nodeS
  nodeS: send data request message (tuples of relation R with B = '1') to nodeR
  nodeR: send requested data (1 tuple of relation R with B = '1') to nodeS
  ...
Total costs: 6 · 2 = 12 messages, 6 + 2 · 2 = 10 attribute values

Ship whole vs. fetch as needed

Conclusion
  Fetch as needed results in a high number of messages
  Ship whole results in high amounts of transferred data
  More advanced strategies based on these two basic strategies
    Semijoin
    Bitvector join
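The message and data volumes of ship whole and fetch as needed can be recomputed with a short simulation over the example relations R(A, B) and S(B, C, D). The function names are hypothetical. As in the slides, every tuple of the local relation triggers its own request, and each request is counted as one transferred attribute value.

```python
# Example relations from the scenario: R(A, B) at nodeR, S(B, C, D) at nodeS.
R = [(3, 7), (1, 1), (4, 6), (7, 7), (4, 5), (6, 2), (5, 7)]
S = [(9, 8, 8), (1, 5, 1), (9, 4, 2), (4, 3, 3), (4, 2, 6), (5, 7, 8)]


def ship_whole(remote):
    """One request message plus one reply carrying the complete relation."""
    return 2, sum(len(t) for t in remote)


def fetch_as_needed(local_keys, remote, key_of):
    """One request/reply pair per local tuple; returns (messages, values)."""
    messages = values = 0
    for k in local_keys:
        matches = [t for t in remote if key_of(t) == k]
        messages += 2                                    # request + reply
        values += 1 + sum(len(t) for t in matches)       # key + matching tuples
    return messages, values


print("ship whole, execution at nodeR:", ship_whole(S))
print("ship whole, execution at nodeS:", ship_whole(R))
print("fetch as needed at nodeR:",
      fetch_as_needed([b for (_, b) in R], S, key_of=lambda t: t[0]))
print("fetch as needed at nodeS:",
      fetch_as_needed([t[0] for t in S], R, key_of=lambda t: t[1]))
```

The results reproduce the totals given on the slides: 2 messages with 18 or 14 attribute values for ship whole, and 14 messages with 13 values or 12 messages with 10 values for fetch as needed.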



Semijoin

Requesting all join partners in just one step

Basic consideration:
  R ⋈ S = R ⋈ (S ⋉ R) = R ⋈ (S ⋉ π_B(R))
  with B being the join attribute

Algorithm
  nodeR: determine π_B(R) and send the result to nodeS
  nodeS: determine S' = S ⋉ π_B(R) = S ⋉ R and send result to
  nodeR
  nodeR: determine R ⋈ S' = R ⋈ S
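The three steps of the semijoin algorithm can be traced on the example relations R(A, B) and S(B, C, D) from the ship whole scenario. This is a single-process sketch with a hypothetical function name. The "transfer" is only counted, not performed.

```python
R = [(3, 7), (1, 1), (4, 6), (7, 7), (4, 5), (6, 2), (5, 7)]            # R(A, B)
S = [(9, 8, 8), (1, 5, 1), (9, 4, 2), (4, 3, 3), (4, 2, 6), (5, 7, 8)]  # S(B, C, D)


def semijoin_strategy(r, s):
    # nodeR: determine pi_B(R) (duplicates removed) and send it to nodeS
    proj_b = {b for (_, b) in r}
    # nodeS: determine S' = S semijoin pi_B(R) and send it to nodeR
    s_reduced = [t for t in s if t[0] in proj_b]
    # nodeR: determine R join S', which equals R join S
    result = [(a, b, c, d) for (a, b) in r for (b2, c, d) in s_reduced if b == b2]
    transferred_values = len(proj_b) + sum(len(t) for t in s_reduced)
    return result, transferred_values


result, transferred_values = semijoin_strategy(R, S)
print(result)
print("2 messages,", transferred_values, "attribute values")
```

On this example the strategy needs 2 messages and 11 attribute values (5 distinct B values plus 2 tuples of S with 3 attributes each), which sits between the two basic strategies in both dimensions.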



Bitvector join

Also known as hash filter join
Avoiding the transfer of all join attribute values to the other node
Transfer bitvector instead: BV[1...n]
Transformation
  Choose an appropriate hash function h
  Apply h to transform attribute values to the range [1...n]
  Set the corresponding bits in the bitvector BV[1...n] to 1

Bitvector join

Algorithm
  nodeR: determine π_B(R), apply hash function h to the result, set the
  corresponding bits in BV to 1, and send the result to nodeS
  nodeS: apply hash function h to the join attribute of relation S,
  determine S' = {t ∈ S | BV[h(t.B)] = 1}, send S' to nodeR
  nodeR: determine R ⋈ S' = R ⋈ S



Bitvector join

Conclusions
Transferring the bitvector reduces network load
Bitvector only indicates potential join partners because multiple
attribute values might map to the same hash value
Might result in transferring unnecessary tuples
Requirements: an appropriate hash function h and n needs to be
large enough to avoid a high number of collisions
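The bitvector join and the effect of hash collisions can be demonstrated on the example relations R(A, B) and S(B, C, D) from the ship whole scenario. The hash function `h(v) = v mod n` and the size n = 8 are assumptions chosen to provoke a collision. Unlike BV[1...n] in the slides, the Python list is indexed from 0 to n - 1.

```python
R = [(3, 7), (1, 1), (4, 6), (7, 7), (4, 5), (6, 2), (5, 7)]            # R(A, B)
S = [(9, 8, 8), (1, 5, 1), (9, 4, 2), (4, 3, 3), (4, 2, 6), (5, 7, 8)]  # S(B, C, D)

N_BITS = 8


def h(value):
    # Hypothetical hash function mapping B values to bit positions 0..N_BITS-1.
    return value % N_BITS


def build_bitvector(r):
    # nodeR: hash every B value of R and set the corresponding bit.
    bv = [0] * N_BITS
    for (_, b) in r:
        bv[h(b)] = 1
    return bv


def filter_by_bitvector(s, bv):
    # nodeS: S' = {t in S | BV[h(t.B)] = 1}, i.e. only *potential* join partners.
    return [t for t in s if bv[h(t[0])] == 1]


bv = build_bitvector(R)
s_reduced = filter_by_bitvector(S, bv)
# nodeR: the final join removes the false positives again.
result = [(a, b, c, d) for (a, b) in R for (b2, c, d) in s_reduced if b == b2]

print(bv)
print(s_reduced)
print(result)
```

Here B = 9 collides with B = 1 (both map to bit 1), so two tuples of S are shipped unnecessarily. A larger n or a better hash function would reduce such false positives at the price of a larger bitvector.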


Response time models

“Classic” cost models consider total resource consumption of a query
  Good results for heavy computational load and slow network
  connections
  By saving resources, many queries can be executed in parallel
  (minimum load, maximum throughput)
Optimization for short response times
  “Waste” some resources to get query results earlier
  Take advantage of lightly loaded machines and fast connections
  Utilize intraquery parallelism

Response time models

Two different response times
  When does the first result tuple arrive?
  When have all result tuples arrived?
Example situation
  Given relations/fragments A, B, C, and D
  Full replication, i.e., all relations/fragments are available on all nodes
  Compute (A ⋈ B) ⋈ (C ⋈ D)
Assumptions
  Each join costs 20 time units ($T_{CPU} + T_{I/O}$)
  Transferring an intermediate result costs 10 time units ($T_{MSG} + T_{TR}$)
  Accessing a relation is for free
  Each node has one computational thread



Example

Two plans
  Plan 1: Execute all operations on one node
    Total costs: 60
  Plan 2: Join on different nodes, ship results
    Total costs: 80
Plan 1 is obviously better with respect to total costs

Example

Response time costs: 60 for plan 1, 50 for plan 2
⇒ Plan 2 is better with respect to response time
  Because operations can be executed in parallel (exploiting intra-query
  parallelism)
Response time can be improved even more by applying pipelining
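The numbers for the two plans can be re-derived from the stated assumptions (20 time units per join, 10 per transfer). The sketch assumes that plan 2 computes A ⋈ B and C ⋈ D on two different nodes and ships both intermediate results to a third node for the final join. This layout is an assumption; it is consistent with the stated total of 80.

```python
JOIN = 20      # cost of one join (T_CPU + T_I/O)
TRANSFER = 10  # cost of shipping one intermediate result (T_MSG + T_TR)

# Plan 1: all three joins on one node with a single thread, no transfers.
plan1_total = 3 * JOIN
plan1_response = 3 * JOIN

# Plan 2 (assumed layout): two joins in parallel on different nodes,
# both results shipped in parallel to a third node, final join there.
plan2_total = 3 * JOIN + 2 * TRANSFER
plan2_response = max(JOIN + TRANSFER, JOIN + TRANSFER) + JOIN

print("plan 1:", plan1_total, plan1_response)
print("plan 2:", plan2_total, plan2_response)
```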


Pipelining

Goal of applying pipelining
  Good first tuple response times by executing queries in a pipelined fashion
Not pipelined
  Each operation is fully completed and an intermediate result is created
  Next operation reads intermediate result and is then fully completed
  Reading and writing of intermediate results costs resources
Pipelined
  Operations do not create intermediate results
  Each processed tuple is fed directly into the next operation
  Tuples “flow” through the operations

Pipelining

Problems
  Operations have different execution times
    If execution speed of operations in the pipeline differs, tuples are
    either cached or the pipeline blocks
  Some operations more suitable than others
    Good: scan, select, project, union, ...
    Tricky: join, intersection, ...
    Very hard: sort



Pipelining example

Simple query
  Tablescan, selection, projection
  1000 tuples are scanned, selectivity is 0.1
Costs
  Accessing one tuple during tablescan: 2 time units
  Selecting (testing) one tuple: 1 time unit
  Projecting one tuple: 1 time unit

Non-pipelined
time   event
2      first tuple in IR1
2000   all tuples in IR1
2001   first tuple in IR2
3000   all tuples in IR2
3001   first tuple in Result
3100   all tuples in Result

Pipelined
time   event
2      first tuple finished tablescan
3      first tuple finished selection (if selected...)
4      first tuple in Result
3098   last tuple finished tablescan
3099   last tuple finished selection
3100   all tuples in Result
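The key times in both schedules follow from the per-tuple costs. The sketch below recomputes them. The variable names are hypothetical, and the pipelined first-result time assumes that the very first scanned tuple passes the selection.

```python
N_TUPLES = 1000
SELECTIVITY = 0.1
SCAN, SELECT, PROJECT = 2, 1, 1  # time units per tuple

selected = round(N_TUPLES * SELECTIVITY)  # tuples that survive the selection

# Non-pipelined: each operation finishes before the next one starts.
all_in_ir1 = N_TUPLES * SCAN                      # tablescan done
all_in_ir2 = all_in_ir1 + N_TUPLES * SELECT       # selection done
first_result_non_pipelined = all_in_ir2 + PROJECT
all_results_non_pipelined = all_in_ir2 + selected * PROJECT

# Pipelined: a tuple flows through all three operations right away.
first_result_pipelined = SCAN + SELECT + PROJECT
all_results_pipelined = N_TUPLES * (SCAN + SELECT) + selected * PROJECT

print(first_result_non_pipelined, all_results_non_pipelined)
print(first_result_pipelined, all_results_pipelined)
```

Both variants finish after 3100 time units, but the pipelined plan delivers its first result after 4 time units instead of 3001.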


Pipelining example

Join query
  Joining two table subsets using a non-pipelined
  BNL (Block-Nested-Loop) join
  Both pipelines run in parallel

Pipelining example

Costs
  1000 tuples are scanned in each pipeline,
  selectivity 0.1
  Joining 100 ⋈ 100 tuples: 10,000 time units
  (one time unit per combination)
Response time
  The first tuple arrives at the end of any pipeline after 4 time units
  All tuples have arrived at the end of the pipelines after 3,100 time
  units
  Final result will be available after 13,100 time units
  No benefit from pipelining with respect to response time
  First tuple arrives long after step 3,100


Joins and pipelining

Suboptimal result because of the unpipelined join
Most traditional join algorithms are unsuitable for pipelining
  Single/semi-pipelined: only one pipeline, the other intermediate result
  has to be available
  Fully pipelined: both inputs are processed in a pipelined fashion

Single-pipelined hash join

“Classic” join algorithm
Basic idea: A ⋈ B
  One input relation is read from an intermediate result (A), the other is
  pipelined through the join operation (B)
  All tuples of A are stored in a hash table
    Hash function is used on the join attribute
    All tuples with the same hash value for the join attribute are in the
    same bucket
  Every incoming tuple (via pipeline) of B is hashed by join attributes
  Compare tuple to each tuple in the respective A bucket
    Return those tuples showing matching join attributes
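A single-pipelined hash join maps naturally onto a Python generator: the build side A is materialized in a hash table, the probe side B is consumed tuple by tuple. This is a minimal sketch with hypothetical names. Python's `dict` takes care of hashing and of comparing join keys within a bucket.

```python
from collections import defaultdict


def single_pipelined_hash_join(a_tuples, b_stream, key_a, key_b):
    """Join a materialized input A with a pipelined input B on equal keys."""
    # Build phase: store all tuples of A in a hash table on the join attribute.
    table = defaultdict(list)
    for a in a_tuples:
        table[key_a(a)].append(a)
    # Probe phase: each incoming B tuple is hashed and matched immediately.
    for b in b_stream:
        for a in table.get(key_b(b), []):
            yield a + b  # result tuples leave the operator one by one


# Hypothetical inputs; the first component is the join attribute.
A = [(1, "a1"), (2, "a2"), (2, "a3")]
B = iter([(2, "b1"), (3, "b2"), (1, "b3")])  # arrives via a pipeline

joined = list(single_pipelined_hash_join(A, B,
                                         key_a=lambda t: t[0],
                                         key_b=lambda t: t[0]))
print(joined)
```

Because the function is a generator, a consumer can process the first result tuple before the B input has been read completely.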



Double-pipelined hash join

Dynamically build hashtables for A and B tuples – memory intensive!
Process tuples upon arrival
  Cache tuples if necessary
  Balance between A and B tuples for better performance
  Rely on statistics for a good A:B ratio
If a new tuple arrives of relation A
  Insert it into the A hashtable
  Check in the B hashtable if there are join partners
  If yes, return all combined AB tuples
If a new B tuple arrives, process it analogously

Double-pipelined hash join – example

B(31, B2) arrives
  Insert into B Hash
  Find matching A tuples
  Found A3
    Assume that A3 matches B3...
  Add AB(A3, B2) to the result
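The double-pipelined hash join can be sketched as a generator over an interleaved stream of arrivals. The function name, the `("A", tuple)` / `("B", tuple)` encoding of the input, and the sample data are assumptions for illustration. Memory management and the A:B balancing mentioned above are left out.

```python
from collections import defaultdict


def double_pipelined_hash_join(arrivals, key_a, key_b):
    """arrivals yields ("A", tuple) or ("B", tuple) in order of arrival."""
    hash_a = defaultdict(list)
    hash_b = defaultdict(list)
    for side, t in arrivals:
        if side == "A":
            k = key_a(t)
            hash_a[k].append(t)              # insert into the A hashtable
            for b in hash_b.get(k, []):      # probe the B hashtable
                yield t + b
        else:
            k = key_b(t)
            hash_b[k].append(t)              # insert into the B hashtable
            for a in hash_a.get(k, []):      # probe the A hashtable
                yield a + t


# Hypothetical arrival order; the first component is the join attribute.
arrivals = [
    ("A", (31, "A3")),
    ("B", (17, "B1")),
    ("B", (31, "B2")),  # finds A3 in the A hashtable
    ("A", (17, "A1")),  # finds B1 in the B hashtable
]
first = lambda t: t[0]
joined = list(double_pipelined_hash_join(arrivals, key_a=first, key_b=first))
print(joined)
```

Each pair is produced exactly once, namely when the later of its two tuples arrives.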



Pipelining in distributed setups

In pipelines, tuples “flow” through the operations
  Works well with one processing unit! (one node)
  Problem: sending each tuple separately from one node to another
  might be inefficient
Communication costs
  Setting up transfer and opening communication channel
  Composing a message
  Transmitting message: header information and payload (minimum
  packet size is bigger than tuple)
  Receiving and decoding a message
  Closing the channel

Pipelining in distributed setups – tuple blocking

Minimize communication overhead by tuple blocking
  Do not send single tuples, but blocks containing multiple tuples
  Burst transmission
  Packets have to be cached
  Block size should be at least the packet size of the underlying network
  protocol
Results in even more cost factors for the cost model
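Tuple blocking amounts to grouping a tuple stream into fixed-size blocks before sending. A minimal sketch, with hypothetical function names and block sizes; the actual network transfer is omitted.

```python
from itertools import islice


def blocks(tuples, block_size):
    """Group a (possibly pipelined) tuple stream into blocks of block_size."""
    it = iter(tuples)
    while True:
        block = list(islice(it, block_size))
        if not block:
            return
        yield block  # one block corresponds to one message


def messages_needed(n_tuples, block_size):
    # Ceiling division: the last block may be only partially filled.
    return -(-n_tuples // block_size)


print(list(blocks(range(10), 4)))
print(messages_needed(1000, 1), "messages without blocking")
print(messages_needed(1000, 50), "messages with blocks of 50 tuples")
```

The price of fewer messages is latency: the receiver sees the first tuple only after a whole block has been filled, which works against good first-tuple response times.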


Summary on global query optimization

Global query optimization has to deal with additional constraints and cost
factors compared to “classic” query optimization
  Network costs, network model, shipping policies
  Fragmentation and allocation schemes
  Different optimization goals (response time vs. total time)

Summary

Summary I

Detour on centralized query processing
  Query parsing
  Query optimization
Basics of distributed query optimization
  Many steps can be reused from centralized query processing
  Optimization in distributed systems is much more complex (network
  latency, selectivities, communication costs, response time, etc.)
  Meta data management – where to store the global catalog?
  Data localization – consider fragmentation
Distributed query optimization
  Very important question: where to execute which parts of the query?
  When to optimize: compile time vs. dynamic optimization, most
  common: semi-dynamic and hierarchical optimization
  Cost model (cost functions, statistics, cardinality estimation, etc.)

Summary II

Join order optimization
Join implementations (ship whole, fetch as needed, semijoin, bitvector
join, pipelined hash join, etc.)
Total time and response time

References I

M. Tamer Özsu, P. Valduriez.
Principles of Distributed Database Systems.
Third Edition, Springer, 2011.

E. Rahm.
Mehrrechner-Datenbanksysteme.
Addison-Wesley, Bonn, 1994.

P. Dadam.
Verteilte Datenbanken und Client/Server-Systeme.
Springer-Verlag, Berlin, Heidelberg, 1996.


References II

Toby J. Teorey.
Database Modeling and Design.
Third Edition, Morgan Kaufmann Publishers, San Francisco, CA, 1999.

D. Kossmann.
The State of the Art in Distributed Query Processing.
ACM Computing Surveys, Vol. 32, No. 4, 2000, pp. 422-469.
