Query Processing System
QUERY
• Overview
• Measures of Query Cost
• Selection Operation
• Sorting
• Join Operation
• Other Operations
• Evaluation of Expressions
• Catalog Information for Cost Estimation
• Estimation of Statistics
• Transformation of Relational Expressions
• Dynamic Programming for Choosing Evaluation Plans
Basic Steps in Query Processing
1. Parsing and translation
2. Optimization
3. Evaluation
Cont…
• Parsing and translation
– Translate the query into its internal form. This is then translated
into relational algebra.
– Parser checks syntax, verifies relations
• Evaluation
– The query-execution engine takes a query-evaluation plan,
executes that plan, and returns the answers to the query.
Query Optimization
• Amongst all equivalent evaluation plans, choose the one with the
lowest cost.
• Cost is estimated using statistical information from the
database catalog, e.g. number of tuples in each relation, size of tuples, etc.
• Cost is generally measured as the total elapsed time for
answering the query. Disk access is typically the predominant cost,
and is estimated from:
– Number of seeks * average-seek-cost
– Number of blocks read * average-block-read-cost
– Number of blocks written * average-block-write-cost
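The three disk-access terms can be added up directly. The sketch below is only an illustration: the function name, parameter names, and the sample timings are assumptions, not figures from the slides.

```python
def estimated_disk_cost(seeks, blocks_read, blocks_written,
                        avg_seek_cost, avg_read_cost, avg_write_cost):
    """Sum the three disk-access terms; all costs use the same time unit."""
    return (seeks * avg_seek_cost
            + blocks_read * avg_read_cost
            + blocks_written * avg_write_cost)


# Hypothetical timings, chosen only for illustration:
# 4 ms per seek, 0.1 ms per block read, 0.2 ms per block written.
cost_ms = estimated_disk_cost(seeks=10, blocks_read=1000, blocks_written=100,
                              avg_seek_cost=4.0, avg_read_cost=0.1,
                              avg_write_cost=0.2)
print(f"estimated cost: {cost_ms:.1f} ms")
```

An optimizer would evaluate an estimate of this kind for each candidate plan and keep the cheapest one.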
Measures of Query Cost
• Cost depends on the size of the buffer in main memory
– Having more memory reduces the need for disk access
– The amount of real memory available to the buffer depends on other
concurrent OS processes, and is hard to determine ahead of actual
execution
– We often use worst-case estimates, assuming only the minimum
amount of memory needed for the operation is available
• Real systems take CPU cost into account, differentiate
between sequential and random I/O, and take buffer size
into account
Selection Operation
• File scan – search algorithms that locate and retrieve records
that fulfill a selection condition.
– Algorithm A1 (linear search). Scan each file block and test all
records to see whether they satisfy the selection condition.
– A2 (binary search). Applicable if selection is an equality
comparison on the attribute on which the file is ordered.
• Index scan – search algorithms that use an index
– Selection condition must be on the search-key of the index.
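A minimal in-memory sketch of A1 and A2 follows. Lists of tuples stand in for file blocks and for a sorted file, and the function names are assumptions made for illustration; a real system would probe disk blocks rather than list positions.

```python
def linear_search(blocks, predicate):
    """A1: scan every block and test every record against the condition."""
    return [rec for block in blocks for rec in block if predicate(rec)]


def binary_search_eq(sorted_records, key, value):
    """A2: equality selection on the attribute the file is ordered on."""
    lo, hi = 0, len(sorted_records)
    # Find the first position whose key is >= value.
    while lo < hi:
        mid = (lo + hi) // 2
        if key(sorted_records[mid]) < value:
            lo = mid + 1
        else:
            hi = mid
    # Collect the run of records whose key equals value.
    matches = []
    while lo < len(sorted_records) and key(sorted_records[lo]) == value:
        matches.append(sorted_records[lo])
        lo += 1
    return matches


records = [(1, "a"), (3, "b"), (3, "c"), (7, "d")]   # sorted on field 0
blocks = [records[:2], records[2:]]                  # two records per block

by_scan = linear_search(blocks, lambda rec: rec[0] == 3)
by_bisect = binary_search_eq(records, key=lambda rec: rec[0], value=3)
```

Both calls return the same records; the difference is that A1 touches every block, while A2 only needs a logarithmic number of probes plus the matching run.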
Cont…
• A3 (primary index on candidate key, equality). Retrieve a
single record that satisfies the corresponding equality
condition
• A4 (primary index on nonkey, equality). Retrieve multiple
records.
• A5 (equality on search-key of secondary index).
• A6 (primary index, comparison). (Relation is sorted on A, the
attribute being compared.)
• A7 (secondary index, comparison).
Cont…
• Conjunction: $\sigma_{\theta_1 \wedge \theta_2 \wedge \cdots \wedge \theta_n}(r)$
• A8 (conjunctive selection using one index).
• A9 (conjunctive selection using multiple-key index).
• A10 (conjunctive selection by intersection of identifiers).
• Disjunction: $\sigma_{\theta_1 \vee \theta_2 \vee \cdots \vee \theta_n}(r)$
• A11 (disjunctive selection by union of identifiers).
• Negation: $\sigma_{\neg\theta}(r)$
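A10 and A11 can be sketched with sets of record identifiers. In this illustration each index is a plain dictionary from attribute value to a list of record ids; the data layout, names, and sample values are all assumptions.

```python
def conjunctive_by_intersection(indexes, conditions):
    """A10: intersect the identifier sets returned by each index lookup."""
    id_sets = [set(indexes[attr].get(value, ())) for attr, value in conditions]
    return set.intersection(*id_sets) if id_sets else set()


def disjunctive_by_union(indexes, conditions):
    """A11: union the identifier sets returned by each index lookup."""
    id_sets = [set(indexes[attr].get(value, ())) for attr, value in conditions]
    return set().union(*id_sets)


# Hypothetical indexes: attribute -> value -> record identifiers.
indexes = {
    "city": {"Pune": [1, 2, 5], "Delhi": [3, 4]},
    "dept": {"CS": [2, 3, 5], "EE": [1, 4]},
}
conditions = [("city", "Pune"), ("dept", "CS")]

both = conjunctive_by_intersection(indexes, conditions)
either = disjunctive_by_union(indexes, conditions)
```

Only the surviving identifiers need to be fetched from the file afterwards, which is where the saving comes from.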
Sorting
• We may build an index on the relation, and then use the index
to read the relation in sorted order. May lead to one disk
block access for each tuple.
• For relations that fit in memory, techniques like quicksort can
be used. For relations that don’t fit in memory, external
sort-merge is a good choice.
External Sorting Using Sort-Merge
• Create sorted runs.
• Merge the runs (N-way merge).
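The two phases can be sketched in a few lines. This is a simplified illustration that keeps the runs in memory; a real external sort would write each run to disk and merge by streaming blocks. The function name and `run_size` parameter are assumptions.

```python
import heapq


def external_sort_merge(records, run_size):
    """Phase 1: cut the input into sorted runs. Phase 2: N-way merge them."""
    runs = [sorted(records[i:i + run_size])
            for i in range(0, len(records), run_size)]
    # heapq.merge lazily merges any number of already-sorted inputs.
    return list(heapq.merge(*runs))


sorted_records = external_sort_merge([5, 2, 9, 1, 7, 3, 8], run_size=3)
```

Here `run_size` plays the role of the number of records that fit in the memory buffer.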
Join Operation
Several different algorithms to implement joins
• Nested-loop join
• Block nested-loop join
• Indexed nested-loop join
• Merge-join
• Hash-join
Nested-Loop Join
• To compute the theta join $r \bowtie_\theta s$
    for each tuple tr in r do begin
        for each tuple ts in s do begin
            test pair (tr,ts) to see if they satisfy the join condition θ
            if they do, add tr • ts to the result.
        end
    end
• r is called the outer relation and s the inner relation of the join.
• Requires no indices and can be used with any kind of join condition.
• Expensive since it examines every pair of tuples in the two relations.
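The pseudocode above translates almost line for line into Python. In this sketch relations are lists of tuples and `theta` is any two-argument predicate; those representation choices and the sample data are assumptions for illustration.

```python
def nested_loop_join(r, s, theta):
    """Pair every tuple of the outer relation r with every tuple of s."""
    result = []
    for tr in r:                     # outer relation
        for ts in s:                 # inner relation
            if theta(tr, ts):
                result.append(tr + ts)   # concatenate the two tuples
    return result


depositor = [("Ann", 101), ("Bob", 102)]          # (name, account_no)
account = [(101, 500), (102, 3000), (103, 70)]    # (account_no, balance)

joined = nested_loop_join(depositor, account, lambda d, a: d[1] == a[0])
```

Because `theta` is an arbitrary predicate, the same function handles non-equality conditions, at the price of testing every pair.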
Block Nested-Loop Join
• Variant of nested-loop join in which every block of inner relation is paired
with every block of outer relation.
    for each block Br of r do begin
        for each block Bs of s do begin
            for each tuple tr in Br do begin
                for each tuple ts in Bs do begin
                    Check if (tr,ts) satisfy the join condition
                    if they do, add tr • ts to the result.
                end
            end
        end
    end
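A sketch of the block variant, with a counter added to make the benefit visible. Blocks are modelled as lists of tuples, and the counter and sample data are assumptions for illustration rather than part of the algorithm.

```python
def block_nested_loop_join(r_blocks, s_blocks, theta):
    """Join block by block; also count how often an inner block is read."""
    result = []
    inner_block_reads = 0
    for br in r_blocks:
        for bs in s_blocks:
            inner_block_reads += 1       # one read per (Br, Bs) pair
            for tr in br:
                for ts in bs:
                    if theta(tr, ts):
                        result.append(tr + ts)
    return result, inner_block_reads


r_blocks = [[(1,), (2,)], [(3,), (4,)]]          # 4 tuples in 2 blocks
s_blocks = [[(2, "x"), (3, "y")], [(5, "z")]]    # 3 tuples in 2 blocks

joined, reads = block_nested_loop_join(r_blocks, s_blocks,
                                       lambda tr, ts: tr[0] == ts[0])
```

With these inputs the inner relation's blocks are read once per outer block (2 x 2 reads), whereas the tuple-at-a-time version would read them once per outer tuple (4 x 2 reads).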
Indexed Nested-Loop Join
• Index lookups can replace file scans if
– Join is an equi-join or natural join and
– An index is available on the inner relation’s join attribute
▪ Can construct an index just to compute a join
• For each tuple tr in the outer relation r, use the index to
look up tuples in s that satisfy the join condition with
tuple tr.
• Worst case: buffer has space for only one page of r, and,
for each tuple in r, we perform an index lookup on s.
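As a rough illustration, a dictionary can stand in for the index on the inner relation's join attribute. Building it on the fly mirrors the "construct an index just to compute a join" case; the function and parameter names are assumptions.

```python
from collections import defaultdict


def indexed_nested_loop_join(r, s, r_key, s_key):
    """Equi-join: probe an index on s once for each tuple of r."""
    # Build a temporary index on the inner relation's join attribute.
    index = defaultdict(list)
    for ts in s:
        index[s_key(ts)].append(ts)
    # One index lookup per outer tuple replaces a full scan of s.
    return [tr + ts for tr in r for ts in index.get(r_key(tr), [])]


depositor = [("Ann", 101), ("Bob", 102), ("Cy", 999)]
account = [(101, 500), (102, 3000), (102, 40)]

joined = indexed_nested_loop_join(depositor, account,
                                  r_key=lambda d: d[1],
                                  s_key=lambda a: a[0])
```

A dictionary only supports equality lookups, which matches the restriction to equi-joins and natural joins.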
Merge-Join
1. Sort both relations on their join attribute (if not already sorted
on the join attributes).
2. Merge the sorted relations to join them
   a. Join step is similar to the merge stage of the sort-merge
   algorithm.
   b. Main difference is handling of duplicate values in join
   attribute: every pair with same value on join attribute must be
   matched
   c. Detailed algorithm in book
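The sketch below is one possible in-memory rendering of the two steps, not the book's algorithm. The point to notice is the duplicate handling: when the keys match, the whole group of equal-key tuples on each side is paired up. Function and parameter names are assumptions.

```python
def merge_join(r, s, r_key, s_key):
    """Sort both inputs on the join attribute, then merge them."""
    r = sorted(r, key=r_key)
    s = sorted(s, key=s_key)
    result = []
    i = j = 0
    while i < len(r) and j < len(s):
        kr, ks = r_key(r[i]), s_key(s[j])
        if kr < ks:
            i += 1
        elif kr > ks:
            j += 1
        else:
            # Find the end of the equal-key group on each side.
            i_end = i
            while i_end < len(r) and r_key(r[i_end]) == kr:
                i_end += 1
            j_end = j
            while j_end < len(s) and s_key(s[j_end]) == kr:
                j_end += 1
            # Every pair with the same join value must be matched.
            for tr in r[i:i_end]:
                for ts in s[j:j_end]:
                    result.append(tr + ts)
            i, j = i_end, j_end
    return result


left = [(2, "b"), (1, "a"), (2, "c")]
right = [(3, "z"), (2, "x"), (2, "y")]
joined = merge_join(left, right, r_key=lambda t: t[0], s_key=lambda t: t[0])
```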
Hash-Join
• Applicable for equi-joins and natural joins.
• A hash function h is used to partition tuples of both relations
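A minimal sketch of the partitioning idea, assuming in-memory lists and Python's built-in `hash` as the function h. The build-and-probe step inside each partition, the partition count, and all names are illustrative assumptions.

```python
from collections import defaultdict


def hash_join(r, s, r_key, s_key, n_partitions=4):
    """Partition both relations with h, then join matching partitions."""
    def h(value):
        return hash(value) % n_partitions

    r_parts = [[] for _ in range(n_partitions)]
    s_parts = [[] for _ in range(n_partitions)]
    for tr in r:
        r_parts[h(r_key(tr))].append(tr)
    for ts in s:
        s_parts[h(s_key(ts))].append(ts)

    result = []
    # Tuples can only match within the same partition number.
    for r_part, s_part in zip(r_parts, s_parts):
        build = defaultdict(list)
        for ts in s_part:
            build[s_key(ts)].append(ts)
        for tr in r_part:
            for ts in build.get(r_key(tr), []):
                result.append(tr + ts)
    return result


depositor = [("Ann", 101), ("Bob", 102), ("Cy", 105)]
account = [(101, 500), (102, 3000), (104, 70), (105, 10)]
joined = hash_join(depositor, account,
                   r_key=lambda d: d[1], s_key=lambda a: a[0])
```

The output order follows the partition order rather than the input order, so callers that need a particular order have to sort the result.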
Evaluation of Expressions
• Alternatives for evaluating an entire expression
tree
– Materialization: generate results of an expression
whose inputs are relations or are already computed,
materialize (store) it on disk. Repeat.
– Pipelining: pass on tuples to parent operations even
as an operation is being executed
Materialization
• Materialized evaluation: evaluate one operation at a time, starting
at the lowest level. Use intermediate results materialized into
temporary relations to evaluate next-level operations.
• E.g., for $\Pi_{customer\text{-}name}(\sigma_{balance < 2500}(account) \bowtie customer)$:
compute and store $\sigma_{balance < 2500}(account)$, then compute
and store its join with customer, and finally compute the
projection on customer-name.
Pipelining
• Pipelined evaluation: evaluate several operations
simultaneously, passing the results of one operation on to the
next.
• Much cheaper than materialization: no need to store a
temporary relation to disk.
• Pipelining may not always be possible – e.g., sort, hash-join.
• Pipelines can be executed in two ways: demand driven and
producer driven.
Demand driven or lazy evaluation
• System repeatedly requests next tuple from top level
operation
• Each operation requests next tuple from children
operations as required, in order to output its next tuple
• In between calls, operation has to maintain “state” so it
knows what to return next
• Each operation is implemented as an iterator
implementing the following operations
Cont…
• open()
▪ E.g. file scan: initialize file scan, store pointer to beginning of file
as state
▪ E.g. merge join: sort relations and store pointers to beginning of
sorted relations as state
• next()
▪ E.g. for file scan: Output next tuple, and advance and store file
pointer
▪ E.g. for merge join: continue with merge from earlier state till
next output tuple is found. Save pointers as iterator state.
• close()
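The iterator interface can be sketched with two small operators. The class names, the use of `None` as the end-of-input marker, and the `run` driver are assumptions made for this illustration.

```python
class FileScan:
    """Leaf operator: returns the records of a stored relation one by one."""

    def __init__(self, records):
        self.records = records
        self.pos = None

    def open(self):
        self.pos = 0                    # state: pointer to the next record

    def next(self):
        if self.pos >= len(self.records):
            return None                 # end of input
        rec = self.records[self.pos]
        self.pos += 1
        return rec

    def close(self):
        self.pos = None


class Select:
    """Pulls tuples from its child until one satisfies the predicate."""

    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate

    def open(self):
        self.child.open()

    def next(self):
        while True:
            rec = self.child.next()
            if rec is None or self.predicate(rec):
                return rec

    def close(self):
        self.child.close()


def run(plan):
    """The system repeatedly requests the next tuple from the top operator."""
    plan.open()
    out = []
    rec = plan.next()
    while rec is not None:
        out.append(rec)
        rec = plan.next()
    plan.close()
    return out


account = [(101, 500), (102, 3000), (103, 70)]
low = run(Select(FileScan(account), lambda a: a[1] < 2500))
```

No intermediate relation is stored: each call to `next()` on the selection pulls just enough tuples from the scan to produce one output tuple.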
Evaluation Plan
• An evaluation plan defines exactly what algorithm is used for each
operation, and how the execution of the operations is coordinated.
Transformation of Relational Expressions
Pictorial Depiction of Equivalence Rules
[Figure slides: equivalence-rule diagrams, not recoverable from the extracted text]
Heuristic Optimization
• Cost-based optimization is expensive, even with
dynamic programming.
• Systems may use heuristics to reduce the number of
choices that must be made in a cost-based fashion.
• Heuristic optimization transforms the query-tree by
using a set of rules that typically (but not in all cases)
improve execution performance
Cont…
• Perform selection early (reduces the number of tuples)
• Perform projection early (reduces the number of
attributes)
• Perform most restrictive selection and join operations
before other similar operations.
• Some systems use only heuristics, others combine
heuristics with partial cost-based optimization.
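The effect of performing selection early can be shown with a small count of join-condition tests. The relations, predicates, and helper names below are hypothetical and exist only for this illustration.

```python
def join(r, s, theta):
    """Nested-loop join that also counts how many pairs were tested."""
    tests = 0
    out = []
    for tr in r:
        for ts in s:
            tests += 1
            if theta(tr, ts):
                out.append(tr + ts)
    return out, tests


def select(relation, predicate):
    return [t for t in relation if predicate(t)]


account = [(1, 500), (2, 3000), (3, 1200), (4, 9000)]   # (customer_id, balance)
customer = [(1, "Ann"), (2, "Bob"), (3, "Cy"), (4, "Di")]


def same_customer(a, c):
    return a[0] == c[0]


def low_balance(t):
    return t[1] < 2500      # balance stays at position 1 after the join too


# Selection applied after the join.
late, late_tests = join(account, customer, same_customer)
late = select(late, low_balance)

# Selection pushed below the join.
early, early_tests = join(select(account, low_balance), customer, same_customer)
```

Both plans return the same tuples, but the early-selection plan tests half as many pairs on this data because the join sees only the two low-balance accounts.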
Steps in Typical Heuristic Optimization
1. Deconstruct conjunctive selections into a sequence of single
selection operations (Equiv. rule 1).
2. Move selection operations down the query tree for the
earliest possible execution (Equiv. rules 2, 7a, 7b, 11).
3. Execute first those selection and join operations that will
produce the smallest relations (Equiv. rule 6).
Cont…
4. Replace Cartesian product operations that are followed by a
selection condition by join operations (Equiv. rule 4a).
5. Deconstruct and move as far down the tree as possible lists of
projection attributes, creating new projections where needed (Equiv.
rules 3, 8a, 8b, 12).
6. Identify those subtrees whose operations can be pipelined, and
execute them using pipelining.



