Query processing System

QUERY
 Overview
 Measures of Query Cost
 Selection Operation
 Sorting
 Join Operation
 Other Operations
 Evaluation of Expressions
 Catalog Information for Cost Estimation
 Estimation of Statistics
 Transformation of Relational Expressions
 Dynamic Programming for Choosing Evaluation Plans

Basic Steps in Query Processing
1. Parsing and translation
2. Optimization
3. Evaluation

Cont…
• Parsing and translation
– Translate the query into its internal form. This is then translated
into relational algebra.
– Parser checks syntax, verifies relations
• Evaluation
– The query-execution engine takes a query-evaluation plan,
executes that plan, and returns the answers to the query.

Query Optimization
 Amongst all equivalent evaluation plans choose the one with
lowest cost.
 Cost is estimated using statistical information from the
database catalog e.g. number of tuples in each relation, size of tuples, etc.
 Cost is generally measured as total elapsed time for
answering the query. Disk access is typically the predominant
component, and is estimated from:
 Number of seeks * average-seek-cost
 Number of blocks read * average-block-read-cost
 Number of blocks written * average-block-write-cost
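
As a rough illustration, the three disk-access terms above can be summed into a single estimate. This is only a sketch: the function name `disk_cost` and the timing values are hypothetical, chosen to make the arithmetic easy to follow, and are not taken from any real system.

```python
def disk_cost(seeks, blocks_read, blocks_written,
              avg_seek_cost, avg_block_read_cost, avg_block_write_cost):
    """Sum the three disk-access components of the cost estimate."""
    return (seeks * avg_seek_cost
            + blocks_read * avg_block_read_cost
            + blocks_written * avg_block_write_cost)


# Hypothetical per-operation timings in milliseconds (illustration only).
estimate = disk_cost(seeks=10, blocks_read=400, blocks_written=0,
                     avg_seek_cost=4.0,
                     avg_block_read_cost=0.1,
                     avg_block_write_cost=0.1)
print(f"estimated disk cost: {estimate:.1f} ms")
```

With these assumed numbers the seek term and the block-read term each contribute about 40 ms, which shows why a plan with many random accesses can cost as much as one that reads far more blocks sequentially.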

Measures of Query Cost
 Cost depends on the size of the buffer in main memory
 Having more memory reduces the need for disk access
 The amount of real memory available to the buffer depends on other
concurrent OS processes, and is hard to determine ahead of actual
execution
 We often use worst case estimates, assuming only the minimum
amount of memory needed for the operation is available
 Real systems take CPU cost into account, differentiate
between sequential and random I/O, and take buffer size
into account

Selection Operation
 File scan – search algorithms that locate and retrieve records
that fulfill a selection condition.
 Algorithm A1 (linear search). Scan each file block and test all
records to see whether they satisfy the selection condition.
 A2 (binary search). Applicable if selection is an equality
comparison on the attribute on which file is ordered.
 Index scan – search algorithms that use an index
 selection condition must be on search-key of index.
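
The two file-scan algorithms, A1 and A2, can be sketched in a few lines. This is a minimal in-memory illustration, not a disk-based implementation: the relation is a hypothetical list of `(account_id, balance)` tuples ordered by `account_id`, and the helper names are assumptions made for the example.

```python
from bisect import bisect_left, bisect_right


def linear_search(records, predicate):
    """A1: scan every record and keep those satisfying the condition."""
    return [rec for rec in records if predicate(rec)]


def binary_search_eq(sorted_records, keys, value):
    """A2: equality selection on the attribute the file is ordered by.

    `keys` holds the ordering-attribute values, parallel to sorted_records.
    """
    lo = bisect_left(keys, value)    # first position with key >= value
    hi = bisect_right(keys, value)   # first position with key > value
    return sorted_records[lo:hi]


# Hypothetical (account_id, balance) records, ordered by account_id.
accounts = [(101, 500), (102, 2500), (105, 900), (105, 1200), (110, 3000)]
ids = [acct_id for acct_id, _ in accounts]

low_balance = linear_search(accounts, lambda rec: rec[1] < 1000)
acct_105 = binary_search_eq(accounts, ids, 105)
print(low_balance)
print(acct_105)
```

Linear search works for any condition, while the binary search only applies because the condition is an equality on the ordering attribute.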

Cont…
• A3 (primary index on candidate key, equality). Retrieve a
single record that satisfies the corresponding equality
condition
• A4 (primary index on nonkey, equality). Retrieve multiple
records.
• A5 (equality on search-key of secondary index).
• A6 (primary index, comparison). (Relation is sorted on A)
• A7 (secondary index, comparison).

Cont…
• Conjunction: $\sigma_{\theta_1 \wedge \theta_2 \wedge \dots \wedge \theta_n}(r)$
• A8 (conjunctive selection using one index).
• A9 (conjunctive selection using multiple-key index).
• A10 (conjunctive selection by intersection of identifiers).
• Disjunction: $\sigma_{\theta_1 \vee \theta_2 \vee \dots \vee \theta_n}(r)$
• A11 (disjunctive selection by union of identifiers).
• Negation: $\sigma_{\neg\theta}(r)$
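
A10 and A11 can be illustrated with sets of record identifiers. The sketch below assumes a small hypothetical relation and uses a dictionary of sets as a stand-in for a real index; `build_index` is an illustrative helper, not a library call.

```python
from collections import defaultdict


def build_index(records, attr):
    """Map each attribute value to the set of record identifiers having it."""
    index = defaultdict(set)
    for rid, rec in enumerate(records):
        index[rec[attr]].add(rid)
    return index


# Hypothetical relation; the list position serves as the record identifier.
records = [
    {"branch": "Downtown", "type": "savings"},
    {"branch": "Downtown", "type": "checking"},
    {"branch": "Uptown", "type": "savings"},
]
by_branch = build_index(records, "branch")
by_type = build_index(records, "type")

# A10: conjunction via intersection of identifier sets.
conj_ids = by_branch["Downtown"] & by_type["savings"]
# A11: disjunction via union of identifier sets.
disj_ids = by_branch["Downtown"] | by_type["savings"]

conj = [records[rid] for rid in sorted(conj_ids)]
disj = [records[rid] for rid in sorted(disj_ids)]
print(conj)
print(len(disj))
```

Only after the identifier sets are combined are the actual records fetched, which is the point of working with identifiers first.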

Sorting
 We may build an index on the relation, and then use the index
to read the relation in sorted order. This may lead to one disk
block access for each tuple.
 For relations that fit in memory, techniques like quicksort can
be used. For relations that don’t fit in memory, external
sort-merge is a good choice.

External Sorting Using Sort-Merge
 Create sorted runs.
 Merge the runs (N-way merge).
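
The two phases can be sketched as follows. This is a simplified in-memory illustration: Python lists stand in for the run files that a real system would write to disk, `memory_capacity` is a hypothetical parameter giving the number of records that fit in memory, and all runs are merged in a single pass.

```python
import heapq


def external_sort(records, memory_capacity):
    """Sort `records` using runs of at most `memory_capacity` records.

    Returns the sorted list and the number of runs created.
    """
    # Phase 1: create sorted runs, each small enough to sort in memory.
    runs = [sorted(records[i:i + memory_capacity])
            for i in range(0, len(records), memory_capacity)]
    # Phase 2: N-way merge; heapq.merge repeatedly takes the smallest
    # head element among all runs.
    merged = list(heapq.merge(*runs))
    return merged, len(runs)


result, n_runs = external_sort([5, 2, 9, 1, 7, 3, 8], memory_capacity=3)
print(result, n_runs)
```

If there were more runs than available buffer blocks, the merge would need several passes; this sketch does not model that case.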

Join Operation
Several different algorithms to implement joins:
 Nested-loop join
 Block nested-loop join
 Indexed nested-loop join
 Merge-join
 Hash-join

Nested-Loop Join
• To compute the theta join $r \bowtie_\theta s$
for each tuple tr in r do begin
    for each tuple ts in s do begin
        test pair (tr,ts) to see if they satisfy the join condition θ
        if they do, add tr • ts to the result.
    end
end
• r is called the outer relation and s the inner relation of the join.
• Requires no indices and can be used with any kind of join condition.
• Expensive since it examines every pair of tuples in the two relations.
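
A direct Python rendering of the nested-loop join is shown below as a sketch. Tuples are plain Python tuples, `theta` is any two-argument predicate, and the `depositor` and `account` data are hypothetical.

```python
def nested_loop_join(r, s, theta):
    """Join r (outer) with s (inner) under an arbitrary condition theta."""
    result = []
    for tr in r:                  # outer relation
        for ts in s:              # inner relation, rescanned per outer tuple
            if theta(tr, ts):
                result.append(tr + ts)   # concatenate the two tuples
    return result


# Hypothetical data: depositor(name, account_id), account(account_id, balance).
depositor = [("Ann", 101), ("Bob", 102), ("Cy", 101)]
account = [(101, 500), (102, 2500)]

joined = nested_loop_join(depositor, account,
                          lambda tr, ts: tr[1] == ts[0])
print(joined)
```

Because `theta` is an arbitrary predicate, the same function handles inequality or range conditions, at the price of testing all `len(r) * len(s)` pairs.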

Block Nested-Loop Join
• Variant of nested-loop join in which every block of inner relation is paired
with every block of outer relation.
for each block Br of r do begin
    for each block Bs of s do begin
        for each tuple tr in Br do begin
            for each tuple ts in Bs do begin
                check if (tr,ts) satisfy the join condition
                if they do, add tr • ts to the result.
            end
        end
    end
end
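
The block-at-a-time structure can be sketched in Python by slicing each relation into fixed-size blocks. The `block_size` parameter and the counter of inner-block reads are assumptions added for illustration; a real system would read disk blocks rather than list slices.

```python
def blocks(relation, block_size):
    """Yield consecutive slices of the relation, standing in for disk blocks."""
    for i in range(0, len(relation), block_size):
        yield relation[i:i + block_size]


def block_nested_loop_join(r, s, theta, block_size):
    """Return the join result and the number of inner-block reads."""
    result = []
    inner_block_reads = 0
    for br in blocks(r, block_size):
        for bs in blocks(s, block_size):
            inner_block_reads += 1
            for tr in br:
                for ts in bs:
                    if theta(tr, ts):
                        result.append(tr + ts)
    return result, inner_block_reads


# Hypothetical relations with two tuples per block.
r = [(1,), (2,), (3,), (4,)]
s = [(2, "x"), (4, "y"), (5, "z")]
result, reads = block_nested_loop_join(r, s,
                                       lambda tr, ts: tr[0] == ts[0],
                                       block_size=2)
print(result, reads)
```

Here the inner relation is scanned once per outer block (two scans of two blocks each) instead of once per outer tuple, which is where the saving over the plain nested-loop join comes from.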

Indexed Nested-Loop Join
 Index lookups can replace file scans if
 Join is an equi-join or natural join and
 An index is available on the inner relation’s join attribute
▪ Can construct an index just to compute a join
 For each tuple tr in the outer relation r, use the index to
look up tuples in s that satisfy the join condition with
tuple tr.
 Worst case: buffer has space for only one page of r, and,
for each tuple in r, we perform an index lookup on s.
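
The idea can be sketched with a dictionary standing in for the index on the inner relation's join attribute. This is an assumption made for brevity: a real system would typically use a disk-based index, and the helper names and data here are hypothetical.

```python
from collections import defaultdict


def build_index(s, key_pos):
    """Build an in-memory index on attribute position key_pos of s."""
    index = defaultdict(list)
    for ts in s:
        index[ts[key_pos]].append(ts)
    return index


def indexed_nested_loop_join(r, r_key_pos, s_index):
    """For each outer tuple, look up matching inner tuples in the index."""
    result = []
    for tr in r:
        for ts in s_index.get(tr[r_key_pos], []):
            result.append(tr + ts)
    return result


# Hypothetical data: depositor(name, account_id), account(account_id, balance).
depositor = [("Ann", 101), ("Bob", 102), ("Cy", 101), ("Di", 999)]
account = [(101, 500), (102, 2500), (103, 900)]

account_index = build_index(account, key_pos=0)
joined = indexed_nested_loop_join(depositor, 1, account_index)
print(joined)
```

Each outer tuple costs one index lookup rather than a scan of the whole inner relation, which only works because the join condition is an equality.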

Merge-Join
1. Sort both relations on their join attribute (if not already sorted
on the join attributes).
2. Merge the sorted relations to join them
– Join step is similar to the merge stage of the sort-merge algorithm.
– Main difference is handling of duplicate values in join attribute:
every pair with same value on join attribute must be matched
– Detailed algorithm in book
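
The following is a minimal in-memory sketch of the two steps, not the detailed algorithm from the book. It assumes tuples are Python tuples and that `r_key` and `s_key` are caller-supplied functions extracting the join attribute; the duplicate handling pairs every tuple in a group of equal keys on one side with every tuple in the matching group on the other.

```python
def merge_join(r, s, r_key, s_key):
    """Equi-join r and s by sorting both and merging."""
    r = sorted(r, key=r_key)      # step 1: sort both inputs
    s = sorted(s, key=s_key)
    result = []
    i = j = 0
    while i < len(r) and j < len(s):      # step 2: merge
        kr, ks = r_key(r[i]), s_key(s[j])
        if kr < ks:
            i += 1
        elif kr > ks:
            j += 1
        else:
            # Find the end of the group of equal keys on each side.
            i_end = i
            while i_end < len(r) and r_key(r[i_end]) == kr:
                i_end += 1
            j_end = j
            while j_end < len(s) and s_key(s[j_end]) == kr:
                j_end += 1
            # Match every pair within the two groups.
            for tr in r[i:i_end]:
                for ts in s[j:j_end]:
                    result.append(tr + ts)
            i, j = i_end, j_end
    return result


# Hypothetical relations with duplicate join values.
r = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
s = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
joined = merge_join(r, s, r_key=lambda t: t[0], s_key=lambda t: t[0])
print(joined)
```

The key value 2 appears twice on each side, so it contributes four result tuples, which is the duplicate case the merge step has to handle.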

Hash-Join
 Applicable for equi-joins and natural joins.
 A hash function h is used to partition tuples of both relations.
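
A sketch of the partitioning idea follows. The slide only states that h partitions both relations; the per-partition build-and-probe step shown here is the usual way such partitions are then joined and is included as an assumption. The number of partitions and the use of Python's built-in `hash` are likewise illustrative choices.

```python
def hash_join(r, s, r_key, s_key, n_partitions=4):
    """Equi-join r and s by hash-partitioning both on the join attribute."""
    def h(value):
        return hash(value) % n_partitions

    # Partition both relations with the same hash function, so matching
    # tuples always land in partitions with the same number.
    r_parts = [[] for _ in range(n_partitions)]
    s_parts = [[] for _ in range(n_partitions)]
    for tr in r:
        r_parts[h(r_key(tr))].append(tr)
    for ts in s:
        s_parts[h(s_key(ts))].append(ts)

    result = []
    for rp, sp in zip(r_parts, s_parts):
        # Build an in-memory table on one partition, probe with the other.
        build = {}
        for ts in sp:
            build.setdefault(s_key(ts), []).append(ts)
        for tr in rp:
            for ts in build.get(r_key(tr), []):
                result.append(tr + ts)
    return result


# Hypothetical data: depositor(name, account_id), account(account_id, balance).
depositor = [("Ann", 101), ("Bob", 102), ("Cy", 101)]
account = [(101, 500), (102, 2500), (103, 900)]
joined = hash_join(depositor, account,
                   r_key=lambda t: t[1], s_key=lambda t: t[0])
print(sorted(joined))
```

Tuples in different partitions can never match, so each pair of partitions is joined independently and the output order follows the partition order rather than the input order.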

Evaluation of Expressions
• Alternatives for evaluating an entire expression
tree
– Materialization: generate results of an expression
whose inputs are relations or are already computed,
materialize (store) it on disk. Repeat.
– Pipelining: pass on tuples to parent operations even
as an operation is being executed

Materialization
 Materialized evaluation: evaluate one operation at a time, starting at
the lowest level. Use intermediate results materialized into temporary
relations to evaluate next-level operations.
 E.g., in figure below, compute and store $\sigma_{balance < 2500}(account)$,
then compute and store its join with customer, and finally compute the
projection on customer-name.

Pipelining
 Pipelined evaluation : evaluate several operations
simultaneously, passing the results of one operation on to the
next.
 Much cheaper than materialization: no need to store a
temporary relation to disk.
 Pipelining may not always be possible – e.g., sort, hash-join.
 Pipelines can be executed in two ways: demand driven and
producer driven

Demand driven or lazy evaluation
• System repeatedly requests next tuple from top level
operation
• Each operation requests next tuple from children
operations as required, in order to output its next tuple
• In between calls, operation has to maintain “state” so it
knows what to return next
• Each operation is implemented as an iterator
implementing the following operations

Cont…
 open()
▪ E.g. file scan: initialize file scan, store pointer to beginning of file
as state
▪ E.g. merge join: sort relations and store pointers to beginning of
sorted relations as state
 next()
▪ E.g. for file scan: Output next tuple, and advance and store file
pointer
▪ E.g. for merge join: continue with merge from earlier state till
next output tuple is found. Save pointers as iterator state.
 close()
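
The open/next/close interface can be sketched with two small iterator classes. This is an illustrative design, not a specific system's API: the class names are hypothetical, and returning `None` from `next()` to signal the end of input is an assumption made for the example.

```python
class FileScan:
    """Leaf iterator: returns the records of a relation one at a time."""

    def __init__(self, records):
        self.records = records
        self.pos = None

    def open(self):
        self.pos = 0                      # state: pointer to beginning of file

    def next(self):
        if self.pos >= len(self.records):
            return None                   # end of input
        rec = self.records[self.pos]
        self.pos += 1                     # advance and store the pointer
        return rec

    def close(self):
        self.pos = None


class Select:
    """Selection iterator: pulls from its child until a record qualifies."""

    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate

    def open(self):
        self.child.open()

    def next(self):
        while True:
            rec = self.child.next()
            if rec is None or self.predicate(rec):
                return rec

    def close(self):
        self.child.close()


def run(plan):
    """Demand-driven evaluation: repeatedly request the next tuple."""
    plan.open()
    output = []
    rec = plan.next()
    while rec is not None:
        output.append(rec)
        rec = plan.next()
    plan.close()
    return output


# Hypothetical account(account_id, balance) relation.
accounts = [(101, 500), (102, 2500), (103, 900)]
results = run(Select(FileScan(accounts), lambda rec: rec[1] < 2500))
print(results)
```

No intermediate relation is stored: each call to `next()` on the selection pulls just enough tuples from the scan to produce one output tuple.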

Evaluation Plan
• An evaluation plan defines exactly what algorithm is used for each
operation, and how the execution of the operations is coordinated.

Transformation of Relational Expressions

Pictorial Depiction of Equivalence Rules

Heuristic Optimization
• Cost-based optimization is expensive, even with
dynamic programming.
• Systems may use heuristics to reduce the number of
choices that must be made in a cost-based fashion.
• Heuristic optimization transforms the query-tree by
using a set of rules that typically (but not in all cases)
improve execution performance

Cont…
 Perform selection early (reduces the number of tuples)
 Perform projection early (reduces the number of
attributes)
 Perform most restrictive selection and join operations
before other similar operations.
 Some systems use only heuristics, others combine
heuristics with partial cost-based optimization.
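
The effect of performing selection early can be seen in a small experiment. The sketch below uses hypothetical `account` and `depositor` data and counts the tuple pairs a nested-loop join would examine; the numbers illustrate the heuristic and are not a measurement of any real system.

```python
def nested_loop_join(r, s, theta):
    """Return the join result and the number of tuple pairs examined."""
    result = [tr + ts for tr in r for ts in s if theta(tr, ts)]
    return result, len(r) * len(s)


# Hypothetical data: account(account_id, balance), depositor(name, account_id).
account = [(101, 500), (102, 2500), (103, 900), (104, 7000)]
depositor = [("Ann", 101), ("Bob", 102), ("Cy", 103), ("Di", 104)]


def same_account(a, d):
    return a[0] == d[1]


# Selection applied late: join everything, then filter on balance.
joined, pairs_late = nested_loop_join(account, depositor, same_account)
late = [t for t in joined if t[1] < 2500]

# Selection applied early: filter account first, then join.
small = [a for a in account if a[1] < 2500]
early, pairs_early = nested_loop_join(small, depositor, same_account)

print(late == early, pairs_late, pairs_early)
```

Both orders produce the same result, but the early selection halves the number of pairs the join has to examine in this example.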

Steps in Typical Heuristic Optimization
1. Deconstruct conjunctive selections into a sequence of single
selection operations (Equiv. rule 1.).
2. Move selection operations down the query tree for the
earliest possible execution (Equiv. rules 2, 7a, 7b, 11).
3. Execute first those selection and join operations that will
produce the smallest relations (Equiv. rule 6).

Cont…
4. Replace Cartesian product operations that are followed by a
selection condition by join operations (Equiv. rule 4a).
5. Deconstruct and move as far down the tree as possible lists of
projection attributes, creating new projections where needed (Equiv.
rules 3, 8a, 8b, 12).
6. Identify those subtrees whose operations can be pipelined, and
execute them using pipelining.
