Advanced Database System
Chapter Two
Query Processing and Optimization
Scanner: identifies the query tokens, such as SQL keywords, attribute names, and relation names, that appear in the text of the query.
Parser: checks the query syntax to determine whether the query is formulated according to the syntax rules (rules of grammar) of the query language.
Validator: checks that all attribute and relation names are valid and semantically meaningful names in the schema of the particular database being queried.
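The scanning and validation stages described above can be illustrated with a minimal Python sketch. The schema, the keyword list, and the function names `scan` and `validate` are assumptions made for illustration only; a real DBMS scanner and validator handle the full SQL grammar, aliases, and data types.

```python
import re

# Hypothetical catalog: relation name -> set of attribute names.
SCHEMA = {"catalog": {"authorid", "price"}, "author": {"authorid", "country"}}
KEYWORDS = {"select", "from", "where", "and"}
IDENTIFIER = r"[A-Za-z_][A-Za-z_0-9]*"

def scan(query):
    """Scanner: split the query text into identifiers, numbers, string literals and symbols."""
    return re.findall(IDENTIFIER + r"|\d+|'[^']*'|[<>=*,.()]", query)

def validate(tokens):
    """Validator: check relation and attribute names against SCHEMA (unqualified names only)."""
    lowered = [t.lower() for t in tokens]
    start = lowered.index("from") + 1
    end = lowered.index("where") if "where" in lowered else len(lowered)
    relations = [t for t in lowered[start:end] if t != ","]
    unknown = [r for r in relations if r not in SCHEMA]
    if unknown:
        return False, f"unknown relation(s): {unknown}"
    visible = set().union(*(SCHEMA[r] for r in relations))
    for t in lowered[:start - 1] + lowered[end:]:
        if re.fullmatch(IDENTIFIER, t) and t not in KEYWORDS and t not in visible:
            return False, f"unknown attribute: {t}"
    return True, "ok"

tokens = scan("SELECT price FROM catalog WHERE price > 200")
print(tokens)
print(validate(tokens))
print(validate(scan("SELECT salary FROM catalog")))
```

The sketch deliberately skips the parser: it assumes the query already has a FROM clause and only checks names, which is the validator's job.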
Query processing
• What is Query Processing?
– The steps required to transform a high-level SQL query into a correct and “efficient” strategy for execution and retrieval.
– Processing can be divided into four phases: Decomposition, Optimization, Code generation, and Execution.
1. Query Decomposition
• The process of transforming a high-level query into a relational algebra query and checking that the query is syntactically and semantically correct. It consists of parsing and validation.
Typical stages in query decomposition are:
i. Analysis: lexical and syntactic analysis of the query, checking its correctness against the attributes and data types in the schema. A query tree is built for the query, with a leaf node for each base relation, one or more non-leaf nodes for the relations produced by relational algebra operations, and a root node for the result of the query. The operations are executed from the leaves to the root. Example query:
SELECT * FROM Catalog c, Author a WHERE a.authorid = c.authorid AND c.price > 200 AND a.country = 'USA'
ii. Normalization: convert the query into a normalized form. The WHERE predicate is converted to conjunctive normal form (a conjunction, ∧, of disjunctions) or disjunctive normal form (a disjunction, ∨, of conjunctions).
iii. Semantic Analysis: reject normalized queries that are incorrectly formulated or contradictory. A query is incorrectly formulated if some of its components do not contribute to generating the result. It is contradictory if its predicate cannot be satisfied by any tuple, for example (Catalog = “BS” ∧ Catalog = “CS”), since a given book can be classified in only one category at a time.
iv. Simplification: detect redundant qualifications, eliminate common sub-expressions, and transform the query into a semantically equivalent form that is easier and cheaper to compute. Access restrictions are also considered here: for example, if a user does not have the necessary access to all of the objects of the query, the query should be rejected.
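As a rough illustration of the contradiction check in semantic analysis, the following Python sketch tests whether a conjunction of equality predicates can ever be satisfied. The representation of a predicate as a list of (attribute, value) pairs and the name `is_contradictory` are assumptions for this example; real systems reason over richer predicates (ranges, inequalities, joins).

```python
def is_contradictory(conjuncts):
    """Return True if a conjunction of (attribute, value) equality tests can never hold."""
    required = {}
    for attribute, value in conjuncts:
        # setdefault records the first value demanded for this attribute
        # and returns the recorded value on later occurrences.
        if required.setdefault(attribute, value) != value:
            return True   # the same attribute would need two different values
    return False

print(is_contradictory([("Catalog", "BS"), ("Catalog", "CS")]))
print(is_contradictory([("Catalog", "BS"), ("Price", 200)]))
```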
2. Query Optimization
What is Query Optimization?
– The activity of choosing a single “efficient” execution strategy (from possibly hundreds of candidates), guided by the statistics in the database catalog.
– Which relational algebra expression, equivalent to the given query, will lead to the most efficient execution plan?
– For each algebraic operator, which algorithm (of the several available) do we use to compute that operator?
– How do operations pass data (main memory buffer, disk buffer, …)?
• Everyone wants the performance of their database to be optimal. In particular, there is often a requirement for a specific query, or an object based on a query, to run faster.
• The problem of query optimization is to find the sequence of steps that produces the answer to a user request in the most efficient manner, given the database structure.
• The performance of a query is affected by the tables or queries that underlie it and by the complexity of the query itself.
• Given a request for data manipulation or retrieval, an optimizer chooses a good plan for evaluating the request from among the many alternative strategies, i.e. there are many ways (access paths) of accessing the desired file or record.
• Hence, the DBMS is responsible for picking the best execution strategy based on various considerations (for example, the least amount of I/O and CPU resources).
• A query typically has many possible execution strategies, and the process of choosing a suitable one for processing a query is known as query optimization.
• The chosen plan is not necessarily the optimal (or absolute best) strategy; it is just a reasonably efficient strategy for executing the query.
• There are two main techniques that are employed during query optimization.
• The first technique is based on heuristic rules for ordering the operations in a query execution strategy. A heuristic is a rule that works well in most cases but is not guaranteed to work well in every case. The rules typically reorder the operations in a query tree.
• The second technique involves systematically estimating the cost of different execution strategies and choosing the execution plan with the lowest cost estimate. These techniques are usually combined in a query optimizer.
• Example: Consider relations r(A, B) and s(C, D). We require r × s.
• Method 1:
a. Load the next record of r into RAM.
b. Load all records of s, one at a time, and concatenate each with the current record of r.
c. Have all records of r been processed?
– NO: go to a.
– YES: exit (the result is in RAM or on disk).
• Performance: too many disk accesses, because s is read in full once for every record of r.
• Method 2: Improvement
a. Load as many blocks of r as possible, leaving room for one block of s.
b. Run through the s file completely, one block at a time.
• Performance: reduces the number of times the s blocks are loaded by a factor equal to the number of r records that can fit in main memory.
• Considerations during query optimization:
– Narrow down intermediate result sets quickly: apply SELECT and PROJECT before JOIN.
– Use access structures (indexes).
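Methods 1 and 2 for computing r × s can be compared with a small Python sketch that counts how often a block of s is loaded. The in-memory lists standing in for disk blocks, the sample data, and the function names are assumptions for illustration; the sketch assumes `buffer_blocks` is at least 2.

```python
def product_record_at_a_time(r_blocks, s_blocks):
    """Method 1: for every record of r, scan all of s. Returns (result, s_block_loads)."""
    result, s_loads = [], 0
    for r_block in r_blocks:
        for r_rec in r_block:
            for s_block in s_blocks:          # s is re-read for each r record
                s_loads += 1
                result.extend(r_rec + s_rec for s_rec in s_block)
    return result, s_loads

def product_block_nested(r_blocks, s_blocks, buffer_blocks):
    """Method 2: hold buffer_blocks - 1 blocks of r in memory, scan s once per chunk."""
    result, s_loads = [], 0
    chunk_size = buffer_blocks - 1            # one buffer block is reserved for s
    for i in range(0, len(r_blocks), chunk_size):
        chunk = [rec for blk in r_blocks[i:i + chunk_size] for rec in blk]
        for s_block in s_blocks:
            s_loads += 1
            result.extend(r_rec + s_rec for r_rec in chunk for s_rec in s_block)
    return result, s_loads

r = [[(1, "a"), (2, "b")], [(3, "c"), (4, "d")]]   # 2 blocks, 4 records
s = [[(10, "x")], [(20, "y")], [(30, "z")]]        # 3 blocks, 3 records

slow, slow_loads = product_record_at_a_time(r, s)
fast, fast_loads = product_block_nested(r, s, buffer_blocks=3)
print(slow_loads, fast_loads, sorted(slow) == sorted(fast))
```

With this sample data all four records of r fit in memory at once, so s is scanned once instead of four times while the result stays the same.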
Using Heuristics in Query Optimization
• In practice, SQL is the query language used in most commercial RDBMSs. An SQL query is first translated into an equivalent extended relational algebra expression, represented as a query tree data structure, that is then optimized.
• Typically, SQL queries are decomposed into query blocks, which form the basic units that can be translated into the algebraic operators and optimized.
Transformation rules for relational algebra, with examples
1. Cascade of SELECTION
Rule: A cascade of SELECTION operations can be combined into a single SELECTION with a conjunctive condition (and a conjunctive SELECTION can be split back into a cascade).
Example:
• Initial Query: σ age>30 (σ salary>50000 (EMPLOYEE))
• Optimized Query: σ salary>50000 ∧ age>30 (EMPLOYEE)
Explanation: Instead of first selecting employees with a salary greater than 50,000 and then selecting those older than 30, you can combine these conditions into one SELECTION operation.
2. Commutativity of SELECTION
Rule: The order of SELECTION operations can be interchanged without affecting the result.
Example:
• Initial Query: σ department='HR' (σ age>30 (EMPLOYEE))
• Equivalent Query: σ age>30 (σ department='HR' (EMPLOYEE))
Explanation: Whether you first select employees older than 30 or those in the HR department, the final result will be the same.
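Rules 1 and 2 can be checked on a toy relation with a short Python sketch. The `select` helper and the sample EMPLOYEE tuples are assumptions made up for this illustration.

```python
# Hypothetical sample relation: one dict per tuple.
EMPLOYEE = [
    {"name": "Ann", "age": 45, "salary": 60000, "department": "HR"},
    {"name": "Bob", "age": 28, "salary": 70000, "department": "IT"},
    {"name": "Cal", "age": 39, "salary": 40000, "department": "HR"},
    {"name": "Dee", "age": 52, "salary": 90000, "department": "IT"},
]

def select(predicate, relation):
    """SELECTION: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]

def over_30(t):
    return t["age"] > 30

def well_paid(t):
    return t["salary"] > 50000

# Rule 1: a cascade of selections equals one selection with a conjunctive condition.
cascade = select(over_30, select(well_paid, EMPLOYEE))
combined = select(lambda t: well_paid(t) and over_30(t), EMPLOYEE)

# Rule 2: the two selections can be applied in either order.
swapped = select(well_paid, select(over_30, EMPLOYEE))

print(cascade == combined == swapped)
print([t["name"] for t in combined])
```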
3. Cascade of PROJECTION
Rule: In a sequence of PROJECTION operations, only the last one is necessary.
Example:
• Initial Query: π name, age (π name, age, salary (EMPLOYEE))
• Optimized Query: π name, age (EMPLOYEE)
Explanation: If you first project the attributes name, age, and salary, and then project only name and age, you can directly project name and age from the start.
4. Commutativity of SELECTION with PROJECTION
Rule: SELECTION and PROJECTION operations can be interchanged if the SELECTION predicate involves only the attributes in the PROJECTION list.
Example:
• Initial Query: σ age>30 (π name, age (EMPLOYEE))
• Equivalent Query: π name, age (σ age>30 (EMPLOYEE))
Explanation: If you first project the attributes name and age and then select employees older than 30, or if you first select employees older than 30 and then project name and age, the result is the same.
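Rules 3 and 4 can be verified in the same spirit. In this Python sketch the `project` helper eliminates duplicates, as relational PROJECTION does; the helpers and the sample tuples are assumptions for illustration.

```python
# Hypothetical sample relation: one dict per tuple.
EMPLOYEE = [
    {"name": "Ann", "age": 45, "salary": 60000},
    {"name": "Bob", "age": 28, "salary": 70000},
    {"name": "Cal", "age": 39, "salary": 40000},
]

def select(predicate, relation):
    """SELECTION: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]

def project(attributes, relation):
    """PROJECTION: keep only the listed attributes and drop duplicate rows."""
    seen, result = set(), []
    for t in relation:
        row = tuple(t[a] for a in attributes)
        if row not in seen:
            seen.add(row)
            result.append(dict(zip(attributes, row)))
    return result

def over_30(t):
    return t["age"] > 30

# Rule 3: only the last projection in a cascade matters.
cascaded = project(["name", "age"], project(["name", "age", "salary"], EMPLOYEE))
direct = project(["name", "age"], EMPLOYEE)

# Rule 4: the predicate uses only 'age', which is in the projection list.
project_then_select = select(over_30, project(["name", "age"], EMPLOYEE))
select_then_project = project(["name", "age"], select(over_30, EMPLOYEE))

print(cascaded == direct, project_then_select == select_then_project)
```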
5. Commutativity of THETA JOIN/Cartesian Product
Rule: The THETA JOIN (⋈) and Cartesian Product (×) operations are commutative, meaning the order of the relations can be swapped without affecting the result.
Example:
• Initial Query: R × S
• Equivalent Query: S × R
Explanation: Whether you join R with S or S with R, the result will be the same set of tuples (the attributes may appear in a different order, but the meaning is the same).
6. Commutativity of SELECTION with THETA JOIN
Rule: If the SELECTION predicate involves only attributes of one of the relations being joined, the SELECTION and JOIN operations can be interchanged.
Case a: SELECTION Predicate Involves Only Attributes of One Relation
Example:
• Initial Query: σ c1 (R ⋈ S)
• Equivalent Query: (σ c1 (R)) ⋈ S
Explanation: If the predicate c1 involves only attributes of R, you can apply the selection to R first and then join the result with S.
Case b: SELECTION Predicate Involves Attributes of Both Relations
Example:
• Initial Query: σ c1 ∧ c2 (R ⋈ S)
• Equivalent Query: (σ c1 (R)) ⋈ (σ c2 (S))
Explanation: If c1 involves only attributes of R and c2 involves only attributes of S, you can first select the tuples from R that satisfy c1 and the tuples from S that satisfy c2, and then join the results.
7. Commutativity of PROJECTION and THETA JOIN
Rule: If the projection list is of the form L1, L2, where L1 involves only attributes of R and L2 involves only attributes of S being joined, and the predicate θ involves only attributes in the projection list, then the PROJECTION can be applied to each relation before the join.
Example:
• Initial Query: π L1, L2 (R ⋈θ S)
• Optimized Query: (π L1 (R)) ⋈θ (π L2 (S))
Explanation: Instead of projecting the attributes after the join, you can project the relevant attributes from each relation before performing the join.
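Rules 6 (case a) and 7 can be checked with a toy join in Python. The relations R and S, their attribute names, and the helper functions are assumptions made up for this sketch.

```python
def select(predicate, relation):
    """SELECTION: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]

def project(attributes, relation):
    """PROJECTION with duplicate elimination."""
    result = []
    for t in relation:
        row = {a: t[a] for a in attributes}
        if row not in result:
            result.append(row)
    return result

def theta_join(r, s, theta):
    """THETA JOIN: concatenate every pair of tuples that satisfies theta."""
    return [{**a, **b} for a in r for b in s if theta(a, b)]

# Hypothetical sample data.
R = [{"ssn": 1, "lname": "Smith", "dno": 5},
     {"ssn": 2, "lname": "Wong", "dno": 4},
     {"ssn": 3, "lname": "Zelaya", "dno": 4}]
S = [{"dnumber": 4, "dname": "Administration"},
     {"dnumber": 5, "dname": "Research"}]

def theta(a, b):
    return a["dno"] == b["dnumber"]

def c1(t):                      # involves only attributes of R
    return t["lname"] != "Wong"

# Rule 6, case a: select after the join vs. select on R before the join.
late = select(c1, theta_join(R, S, theta))
early = theta_join(select(c1, R), S, theta)

# Rule 7: theta uses only dno and dnumber, which are in the projection lists.
L1, L2 = ["lname", "dno"], ["dnumber", "dname"]
after_join = project(L1 + L2, theta_join(R, S, theta))
before_join = theta_join(project(L1, R), project(L2, S), theta)

print(late == early, after_join == before_join)
```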
8. Commutativity of the Set Operations: UNION and INTERSECTION, but not SET DIFFERENCE
Rule: UNION and INTERSECTION operations are commutative, but SET DIFFERENCE is not.
Example:
• Initial Query: R ∪ S
• Equivalent Query: S ∪ R
Explanation: The order of the operands of UNION (and likewise of INTERSECTION) does not affect the result, whereas R − S and S − R are in general different.
9. Associativity of the THETA JOIN, CARTESIAN PRODUCT, UNION, and INTERSECTION
Rule: These operations are individually associative.
Example: (R ⋈ S) ⋈ T ≡ R ⋈ (S ⋈ T)
Explanation: The order in which you group the operands of a sequence of JOIN, CARTESIAN PRODUCT, UNION, or INTERSECTION operations does not affect the final result.
10. Commuting SELECTION with SET OPERATIONS
Rule: SELECTION operations can commute with UNION and INTERSECTION.
Example: σ c (R ∪ S) ≡ (σ c (R)) ∪ (σ c (S))
Explanation: Instead of applying the SELECTION after the UNION, you can apply the SELECTION to each relation before performing the UNION.
11. Commuting PROJECTION with UNION
Rule: PROJECTION operations can commute with UNION.
Example: π L (R ∪ S) ≡ (π L (R)) ∪ (π L (S))
Explanation: Instead of projecting the attributes after the UNION, you can project the relevant attributes from each relation before performing the UNION.
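The set-operation rules (8, 10 and 11) can be checked directly with Python sets, which already have set semantics. The two small relations and the helper names are assumptions for this sketch; each tuple is (employee number, department).

```python
# Hypothetical union-compatible relations.
R = {(1, "HR"), (2, "IT"), (3, "HR")}
S = {(3, "HR"), (4, "IT")}

def select(predicate, relation):
    """SELECTION over a set of tuples."""
    return {t for t in relation if predicate(t)}

def project(positions, relation):
    """PROJECTION by attribute position; the set removes duplicates."""
    return {tuple(t[i] for i in positions) for t in relation}

def is_hr(t):
    return t[1] == "HR"

# Rule 8: union and intersection commute, set difference does not.
union_commutes = (R | S) == (S | R)
intersection_commutes = (R & S) == (S & R)
difference_commutes = (R - S) == (S - R)

# Rule 10: selection commutes with union and intersection.
select_over_union = select(is_hr, R | S) == select(is_hr, R) | select(is_hr, S)
select_over_intersection = select(is_hr, R & S) == select(is_hr, R) & select(is_hr, S)

# Rule 11: projection commutes with union.
project_over_union = project([1], R | S) == project([1], R) | project([1], S)

print(union_commutes, intersection_commutes, difference_commutes)
print(select_over_union, select_over_intersection, project_over_union)
```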
Using Heuristics
Heuristic optimization in query processing uses rule-based techniques to transform a query into a more efficient form. The process is as follows.
Process for heuristic optimization
1. Initial Internal Representation:
• When a high-level query (such as SQL) is submitted, the parser translates it into an initial internal representation, often in the form of a relational algebra tree. This tree represents the logical steps needed to execute the query.
2. Applying Heuristic Rules:
• Heuristic rules are applied to this internal representation to optimize it. These rules are based on general principles that typically lead to more efficient query execution. Some common heuristic rules include:
– Selection Pushdown: moving selection operations as close to the base relations as possible to reduce the size of intermediate results.
– Projection Pushdown: moving projection operations down the query tree to eliminate unnecessary columns early.
– Join Reordering: reordering join operations to minimize the size of intermediate results.
3. Generating a Query Execution Plan:
• After applying heuristic rules, the optimized internal representation is used to generate a query execution plan. This plan outlines the specific steps and methods the DBMS will use to execute the query.
• The execution plan considers the access paths available, such as indexes and sequential scans, to determine the most efficient way to retrieve and process the data.
• The plan may include operations like index scans, nested loop joins, hash joins, and sort-merge joins, depending on the available access paths.
• The main heuristic is to apply first the operations that reduce the size of intermediate results.
– E.g. apply SELECT and PROJECT operations before applying the JOIN or other binary operations.
• Intermediate results, in the context of database query processing, are the temporary data sets produced during the execution of a query before arriving at the final result. They are not stored permanently in the database: they exist only for the duration of the query execution and are discarded once the final result is produced.
• The heuristic approach uses knowledge of the characteristics of the relational algebra operations and of the relationships between the operators to optimize the query.
• Thus the heuristic approach to optimization makes use of:
– Properties of individual operators
– Associations between operators
– Query tree: a graphical representation of the operators, relations, attributes, and predicates, and of the processing sequence during query processing.
• A query tree is composed of three main parts: the leaf nodes, the internal nodes, and the root.
– The sequence of execution of operations in a query tree runs from the leaves to the root.
• Query block: the basic unit that can be translated into the algebraic operators and optimized.
• A query block contains a single SELECT-FROM-WHERE expression, as well as GROUP BY and HAVING clauses if these are part of the block.
• Nested queries within a query are identified as separate query blocks.
• There are two types of nested queries: uncorrelated and correlated.
Uncorrelated Nested Queries
• Uncorrelated nested queries can be evaluated separately, and their results are then used in the outer query.
SELECT name
FROM employees
WHERE department_id IN (SELECT department_id
FROM departments WHERE location = 'New York');
In this example, the inner query (SELECT department_id FROM departments WHERE location = 'New York') is executed first, and its result is used by the outer query to filter employees.
Correlated Nested Queries
• Correlated nested queries need information (a tuple variable) from the outer query in order to execute.
SELECT name
FROM employees e
WHERE salary > (SELECT AVG(salary) FROM
employees WHERE department_id =
e.department_id);
In this example, the inner query (SELECT AVG(salary) FROM employees WHERE department_id = e.department_id) depends on the department_id of each row in the outer query. Therefore, the inner query is conceptually executed for each employee to compare their salary with the average salary of their department.
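Both nested queries above can be tried out with Python's built-in `sqlite3` module. The table definitions and the four sample employees are assumptions made up for this sketch; only the two SELECT statements come from the slides (with an ORDER BY added to make the output deterministic).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE departments (department_id INTEGER PRIMARY KEY, location TEXT);
CREATE TABLE employees (name TEXT, salary REAL, department_id INTEGER);
INSERT INTO departments VALUES (1, 'New York'), (2, 'Boston');
INSERT INTO employees VALUES
  ('Ann', 90, 1), ('Bob', 50, 1), ('Cal', 70, 2), ('Dee', 70, 2);
""")

# Uncorrelated: the inner query does not refer to the outer query.
uncorrelated = conn.execute("""
    SELECT name FROM employees
    WHERE department_id IN (SELECT department_id
                            FROM departments WHERE location = 'New York')
    ORDER BY name
""").fetchall()

# Correlated: the inner query uses e.department_id from the outer row.
correlated = conn.execute("""
    SELECT name FROM employees e
    WHERE salary > (SELECT AVG(salary) FROM employees
                    WHERE department_id = e.department_id)
    ORDER BY name
""").fetchall()

print(uncorrelated)
print(correlated)
conn.close()
```

In the sample data only Ann earns more than her department's average (70), while Cal and Dee both sit exactly on the average of theirs.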
• Query tree:
– A tree data structure that corresponds to a relational algebra expression. It represents the input relations of the query as leaf nodes of the tree, and represents the relational algebra operations as internal nodes.
– Leaves: the base relations used for processing the query and extracting the required information.
– Root: the final result (relation) produced as output by the operations on the relations used for query processing.
– Nodes: intermediate results or relations produced before reaching the final result.
• An execution of the query tree consists of executing an internal node operation whenever its operands are available and then replacing that internal node by the relation that results from executing the operation.
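A query tree and its leaf-to-root execution can be sketched in a few lines of Python. The `Node` class, the `execute` function, and the two tiny sample relations are assumptions for illustration; the projection here does not eliminate duplicates, to keep the sketch short.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    op: str                       # "relation", "select", "project" or "join"
    arg: object = None            # relation name, predicate, or attribute list
    children: list = field(default_factory=list)

def execute(node, db):
    """Evaluate the tree bottom-up: operands first, then the node's own operation."""
    if node.op == "relation":                 # leaf: a base relation
        return db[node.arg]
    inputs = [execute(child, db) for child in node.children]
    if node.op == "select":
        return [t for t in inputs[0] if node.arg(t)]
    if node.op == "project":
        return [{a: t[a] for a in node.arg} for t in inputs[0]]
    if node.op == "join":
        left, right = inputs
        return [{**l, **r} for l in left for r in right if node.arg(l, r)]
    raise ValueError(f"unknown operation: {node.op}")

# Hypothetical sample data.
db = {
    "PROJECT": [{"PNUMBER": 10, "PLOCATION": "Stafford", "DNUM": 4},
                {"PNUMBER": 1, "PLOCATION": "Bellaire", "DNUM": 5}],
    "DEPARTMENT": [{"DNUMBER": 4, "MGRSSN": 987}, {"DNUMBER": 5, "MGRSSN": 333}],
}

# Root = projection, internal nodes = join and selection, leaves = base relations.
tree = Node("project", ["PNUMBER", "MGRSSN"], [
    Node("join", lambda p, d: p["DNUM"] == d["DNUMBER"], [
        Node("select", lambda t: t["PLOCATION"] == "Stafford",
             [Node("relation", "PROJECT")]),
        Node("relation", "DEPARTMENT"),
    ]),
])

result = execute(tree, db)
print(result)
```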
Query graph
• A query graph is a visual representation used in database theory to illustrate a relational calculus expression. Its key points are:
– Graph data structure: the query graph is a type of graph that visually represents the relationships and constraints of a query.
– Relational calculus expression: it corresponds to a relational calculus expression, which is a non-procedural query language used to specify what data to retrieve rather than how to retrieve it.
– No operation order: the graph does not specify the order in which operations should be performed. It simply shows the relationships and constraints.
– Uniqueness: each query has a unique corresponding graph, meaning there is only one graph corresponding to each query.
• Example:
– For every project located in 'Stafford', retrieve the project number, the controlling department number, and the department manager's last name, address, and birthdate.
• Relational algebra:
π PNUMBER, DNUM, LNAME, ADDRESS, BDATE (((σ PLOCATION='STAFFORD' (PROJECT)) ⋈ DNUM=DNUMBER (DEPARTMENT)) ⋈ MGRSSN=SSN (EMPLOYEE))
• SQL query:
SELECT P.PNUMBER, P.DNUM, E.LNAME, E.ADDRESS, E.BDATE
FROM PROJECT AS P, DEPARTMENT AS D, EMPLOYEE AS E
WHERE P.DNUM = D.DNUMBER AND D.MGRSSN = E.SSN AND P.PLOCATION = 'STAFFORD';
Step 1. Perform selection operations as early as possible: Applying selections at an early stage reduces the number of unwanted records that have to be transferred from the database to primary memory. The optimizer uses transformation rule 1 to divide selection operations with conjunctive conditions into a cascade of selection operations.
Step 2. Commute selection operations with other operations as early as possible: The optimizer uses transformation rules 2, 4, 6, and 9 to move each selection operation as far down the tree as possible and to keep selection predicates on the same relation together. Keeping selections low in the tree reduces unwanted data transfer, and keeping the selection predicates on the same relation together reduces the number of times the same table has to be accessed to retrieve records.
Step 3. Combine the Cartesian product with a subsequent selection operation whose predicate represents a join condition into a JOIN operation: The optimizer uses transformation rule 13 to convert a selection and Cartesian product sequence into a join. This reduces data transfer: it is always better to transfer only the required data from the database than to transfer all the data and then refine it. (A Cartesian product combines all the data of all the tables mentioned in the query, while a join retrieves only those records that satisfy the join condition.)
Step 4. Use commutativity and associativity of binary operations: The optimizer uses transformation rules 5, 11, and 12 to execute the most restrictive selection operations first.
Step 5. Perform projection operations as early as possible: After performing the selection operations, the optimizer uses transformation rules 3, 4, 7, and 10 to reduce the number of columns of a relation by moving projection operations as far down the tree as possible and keeping projection attributes on the same relation together.
Step 6. Compute common expressions only once: Identify sub-trees that represent groups of operations that can be executed by a single algorithm.
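Steps 1 to 3 can be sketched as a small tree rewrite in Python, applied to the Catalog/Author query from the decomposition example. The tuple-based tree encoding, the qualified attribute names, and the `push` function are assumptions for this illustration, not how any particular optimizer is implemented; the sketch handles only selections over products and joins.

```python
# Tree encoding (hypothetical):
#   ("rel", name, attributes)          base relation (leaf)
#   ("select", condition, child)       selection
#   ("product", left, right)           Cartesian product
#   ("join", left, right, condition)   theta join

def attrs(tree):
    """Attributes available in the output of a subtree."""
    kind = tree[0]
    if kind == "rel":
        return set(tree[2])
    if kind == "select":
        return attrs(tree[2])
    return attrs(tree[1]) | attrs(tree[2])        # product or join

def push(cond, tree):
    """Push one selection condition (label, needed attributes) as far down as possible."""
    label, needed = cond
    kind = tree[0]
    if kind in ("product", "join"):
        left, right = tree[1], tree[2]
        if needed <= attrs(left):                  # step 2: move below the left input
            return (kind, push(cond, left), right) + tree[3:]
        if needed <= attrs(right):                 # step 2: move below the right input
            return (kind, left, push(cond, right)) + tree[3:]
        if kind == "product":                      # step 3: selection + product = join
            return ("join", left, right, label)
    return ("select", label, tree)                 # cannot go lower: apply it here

CATALOG = ("rel", "Catalog", ("c.authorid", "c.price"))
AUTHOR = ("rel", "Author", ("a.authorid", "a.country"))
canonical = ("product", CATALOG, AUTHOR)

# Step 1: the conjunctive WHERE clause, already split into single conditions.
conditions = [
    ("a.authorid = c.authorid", {"a.authorid", "c.authorid"}),
    ("c.price > 200", {"c.price"}),
    ("a.country = 'USA'", {"a.country"}),
]

optimized = canonical
for cond in conditions:
    optimized = push(cond, optimized)
print(optimized)
```

The resulting tree joins the two filtered relations on the author id, instead of filtering the full Cartesian product.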
• Heuristic Optimization of Query Trees:
– The same query could correspond to many different relational algebra expressions, and hence many different query trees.
– The task of heuristic optimization of query trees is to find a final query tree that is efficient to execute.
• Example:
Q2: SELECT LNAME
FROM EMPLOYEE, WORKS_ON, PROJECT
WHERE PNAME = 'AQUARIUS' AND PNUMBER = PNO AND ESSN = SSN AND BDATE …
(a) Initial (canonical) query tree for SQL query Q2.
Executing this tree directly first creates a very large file containing the CARTESIAN PRODUCT of the entire EMPLOYEE, WORKS_ON, and PROJECT files.
(b) Moving SELECT operations down the query tree.
An improved query tree first applies the SELECT operations to reduce the number of tuples that appear in the CARTESIAN PRODUCT.
(c) Applying the more restrictive SELECT operation first.
A further improvement is achieved by switching the positions of the EMPLOYEE and PROJECT relations in the tree, as shown in (c). This uses the information that Pnumber is a key attribute of the PROJECT relation, and hence the SELECT operation on the PROJECT relation will retrieve a single record only.
(d) Replacing CARTESIAN PRODUCT and SELECT with JOIN operations.
We can further improve the query tree by replacing any CARTESIAN PRODUCT operation that is followed by a join condition with a JOIN operation.
(e) Moving PROJECT operations down the query tree.
Another improvement is to keep only the attributes needed by subsequent operations in the intermediate relations, by including PROJECT (π) operations as early as possible in the query tree, as shown in (e). This reduces the attributes (columns) of the intermediate relations.
Summary of Heuristics for Algebraic Optimization:
1. The main heuristic is to apply first the operations that reduce the size
of intermediate results.
2. Perform select operations as early as possible to reduce the number of
tuples and perform project operations as early as possible to reduce the
number of attributes. (This is done by moving select and project
operations as far down the tree as possible.)
3. The select and join operations that are most restrictive should be
executed before other similar operations. (This is done by reordering
the leaf nodes of the tree among themselves and adjusting the rest of
the tree appropriately.)
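The effect of the main heuristic on intermediate result size can be seen with a small Python experiment. The two generated relations (100 employees, 5 departments) are assumptions for this sketch; both plans compute the same answer, but they build intermediate results of very different sizes.

```python
# Hypothetical sample data: 100 employees spread evenly over 5 departments.
employees = [{"ssn": i, "dno": i % 5} for i in range(100)]
departments = [{"dnumber": d, "location": "Stafford" if d == 0 else "Houston"}
               for d in range(5)]

# Plan A: Cartesian product first, then the selections.
product = [{**e, **d} for e in employees for d in departments]
plan_a = [t for t in product
          if t["dno"] == t["dnumber"] and t["location"] == "Stafford"]

# Plan B: the most restrictive selection first, then the join.
stafford = [d for d in departments if d["location"] == "Stafford"]
plan_b = [{**e, **d} for e in employees for d in stafford
          if e["dno"] == d["dnumber"]]

print("intermediate tuples, plan A:", len(product))
print("intermediate tuples, plan B:", len(stafford))
print("same answer:", plan_a == plan_b, "with", len(plan_b), "tuples")
```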
B. Cost Estimation Approach to Query Optimization
• The main idea is to minimize the cost of processing a query. The cost function comprises:
– I/O cost + CPU processing cost + communication cost + storage cost
• These components may have different weights in different processing environments.
• The DBMS uses information stored in the system catalog to estimate cost.
• A main target of query optimization is to minimize the size of the intermediate relations. Their size affects the cost of:
– Disk access
– Data transportation
– Storage space in primary memory
– Writing to disk
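The weighted cost function can be sketched in Python as follows. The plan names, the cost figures, and the weights are invented for illustration; a real optimizer derives its estimates from catalog statistics rather than from hard-coded numbers.

```python
from dataclasses import dataclass

@dataclass
class PlanCost:
    io: float
    cpu: float
    communication: float
    storage: float

def total_cost(cost, weights):
    """Weighted sum of the four cost components."""
    return (weights["io"] * cost.io
            + weights["cpu"] * cost.cpu
            + weights["communication"] * cost.communication
            + weights["storage"] * cost.storage)

def cheapest(plans, weights):
    """Name of the plan with the lowest estimated cost."""
    return min(plans, key=lambda name: total_cost(plans[name], weights))

# Hypothetical estimates for two candidate plans.
plans = {
    "index_scan": PlanCost(io=10, cpu=5, communication=0, storage=1),
    "full_scan": PlanCost(io=100, cpu=2, communication=0, storage=0),
}

# Disk-based system: I/O dominates.
disk_weights = {"io": 1.0, "cpu": 0.1, "communication": 0.0, "storage": 0.1}
# Memory-resident data: I/O is ignored here and CPU time dominates.
memory_weights = {"io": 0.0, "cpu": 1.0, "communication": 0.0, "storage": 0.0}

print(cheapest(plans, disk_weights))
print(cheapest(plans, memory_weights))
```

The same two plans rank differently under the two weightings, which is why the components are weighted per processing environment.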
• Cost-based query optimization:
– Estimate and compare the costs of executing a query using different execution strategies and choose the strategy with the lowest cost estimate. (Compare with heuristic query optimization.)
• Issues
– Cost function
– Number of execution strategies to be considered
• Cost Components for Query Execution
1. Access cost to secondary storage
2. Storage cost
3. Computation cost
4. Memory usage cost
5. Communication cost
1. Access Cost of Secondary Storage
• Data has to be accessed from secondary storage, because a query needs some part of the data stored in the database. The disk access cost can be analyzed in terms of:
– Searching,
– Reading, and
– Writing the data blocks used to store some portion of a relation.
• Remark: The disk access cost will vary depending on:
– The file organization used and the access method implemented for that file organization.
– Whether the data is stored contiguously or in a scattered manner.
2. Storage Cost
• While processing a query, which is usually composed of many database operations, there may be one or more intermediate results before the final output is reached. These intermediate results must be stored in primary memory for further processing. The bigger the intermediate relation, the larger the memory requirement, which puts pressure on the limited space available. This is considered a cost factor.
3. Query Execution Plans
– An execution plan for a relational algebra query consists of a combination of the relational algebra query tree and information about the access methods to be used for each relation, as well as the methods to be used in computing the relational operators stored in the tree.
4. Computation Cost
• A query is composed of many operations. These may be database operations, such as reading from and writing to disk, or mathematical and other operations such as:
– Searching
– Sorting
– Merging
– Computation on field values
5. Communication Cost
• In most database systems the database resides at one station and queries originate from different terminals. This affects the performance of the system and adds to the cost of query processing. Thus, the cost of transporting data between the database site and the terminal from which the query originates should be analyzed.
More Related Content

PPTX
Query processing and Optimization in Database
PPTX
DB LECTURE 5 QUERY PROCESSING.pptx
PPT
Query processing-and-optimization
PPTX
Ch-2-Query-Process.pptx advanced database
PPTX
700442110-advanced database Ch-2-Query-Process.pptx
PPTX
Chapter 4 - Query Processing and Optimization.pptx
PPTX
Concepts of Query Processing in ADBMS.pptx
PDF
Chapter 2.pdf WND FWKJFW KSD;KFLWHFB ASNK
Query processing and Optimization in Database
DB LECTURE 5 QUERY PROCESSING.pptx
Query processing-and-optimization
Ch-2-Query-Process.pptx advanced database
700442110-advanced database Ch-2-Query-Process.pptx
Chapter 4 - Query Processing and Optimization.pptx
Concepts of Query Processing in ADBMS.pptx
Chapter 2.pdf WND FWKJFW KSD;KFLWHFB ASNK

Similar to Advanced Database System Chapter Two Query processing and Optimization.pptx (20)

PPT
Query optimisation
PDF
CH5_Query Processing and Optimization.pdf
PPTX
LECTURE_06_DATABASE PROCESSING & OPTIMAZATION.pptx
PPTX
Query processing and optimization (updated)
PPT
Query optimization and processing for advanced database systems
PPTX
Query processing
PDF
Query Optimization - Brandon Latronica
PPTX
Adbms 40 heuristics in query optimization
PPT
ch02-240507064009-ac337bf1 .ppt
PPT
QPOfutyfurfugfuyttruft7rfu65rfuyt PPT - Copy.ppt
PPTX
Lecture 5.pptx
PPTX
Transaction Management, Recovery and Query Processing.pptx
PDF
itm661-lecture0VBBBBBBBBBBBBBBM3-part2-2015.pdf
PPTX
Query-porcessing-& Query optimization
PPTX
Query optimization
PPTX
Query Execution Time and Query Optimization.
PPTX
Query processing
PPTX
Computer Science DBMS_Presentations_Unit-5.pptx
 
PPTX
Heuristic approch monika sanghani
PPTX
Query and optimizing operating system.pptx
Query optimisation
CH5_Query Processing and Optimization.pdf
LECTURE_06_DATABASE PROCESSING & OPTIMAZATION.pptx
Query processing and optimization (updated)
Query optimization and processing for advanced database systems
Query processing
Query Optimization - Brandon Latronica
Adbms 40 heuristics in query optimization
ch02-240507064009-ac337bf1 .ppt
QPOfutyfurfugfuyttruft7rfu65rfuyt PPT - Copy.ppt
Lecture 5.pptx
Transaction Management, Recovery and Query Processing.pptx
itm661-lecture0VBBBBBBBBBBBBBBM3-part2-2015.pdf
Query-porcessing-& Query optimization
Query optimization
Query Execution Time and Query Optimization.
Query processing
Computer Science DBMS_Presentations_Unit-5.pptx
 
Heuristic approch monika sanghani
Query and optimizing operating system.pptx
Ad

Recently uploaded (20)

PDF
Design an Analysis of Algorithms II-SECS-1021-03
PDF
Design an Analysis of Algorithms I-SECS-1021-03
PDF
Wondershare Filmora 15 Crack With Activation Key [2025
PPTX
Transform Your Business with a Software ERP System
PDF
Flood Susceptibility Mapping Using Image-Based 2D-CNN Deep Learnin. Overview ...
PPTX
Introduction to Artificial Intelligence
PPTX
Lecture 3: Operating Systems Introduction to Computer Hardware Systems
 
PPTX
Agentic AI : A Practical Guide. Undersating, Implementing and Scaling Autono...
PDF
Claude Code: Everyone is a 10x Developer - A Comprehensive AI-Powered CLI Tool
PDF
System and Network Administration Chapter 2
PDF
How to Choose the Right IT Partner for Your Business in Malaysia
PDF
Which alternative to Crystal Reports is best for small or large businesses.pdf
PDF
Addressing The Cult of Project Management Tools-Why Disconnected Work is Hold...
PDF
Internet Downloader Manager (IDM) Crack 6.42 Build 42 Updates Latest 2025
PDF
2025 Textile ERP Trends: SAP, Odoo & Oracle
PDF
Odoo Companies in India – Driving Business Transformation.pdf
PDF
Digital Strategies for Manufacturing Companies
PDF
wealthsignaloriginal-com-DS-text-... (1).pdf
PPTX
CHAPTER 2 - PM Management and IT Context
PDF
PTS Company Brochure 2025 (1).pdf.......
Design an Analysis of Algorithms II-SECS-1021-03
Design an Analysis of Algorithms I-SECS-1021-03
Wondershare Filmora 15 Crack With Activation Key [2025
Transform Your Business with a Software ERP System
Flood Susceptibility Mapping Using Image-Based 2D-CNN Deep Learnin. Overview ...
Introduction to Artificial Intelligence
Lecture 3: Operating Systems Introduction to Computer Hardware Systems
 
Agentic AI : A Practical Guide. Undersating, Implementing and Scaling Autono...
Claude Code: Everyone is a 10x Developer - A Comprehensive AI-Powered CLI Tool
System and Network Administration Chapter 2
How to Choose the Right IT Partner for Your Business in Malaysia
Which alternative to Crystal Reports is best for small or large businesses.pdf
Addressing The Cult of Project Management Tools-Why Disconnected Work is Hold...
Internet Downloader Manager (IDM) Crack 6.42 Build 42 Updates Latest 2025
2025 Textile ERP Trends: SAP, Odoo & Oracle
Odoo Companies in India – Driving Business Transformation.pdf
Digital Strategies for Manufacturing Companies
wealthsignaloriginal-com-DS-text-... (1).pdf
CHAPTER 2 - PM Management and IT Context
PTS Company Brochure 2025 (1).pdf.......
Ad

Advanced Database System Chapter Two Query processing and Optimization.pptx

  • 2. Chapter Two Query processing and Optimization 2
  • 3. Query processing and Optimization 3 Parsing checks the query syntax to determine whether it is formulated according to the syntax rules (rules of grammar) of the query language. scanner identifies the query tokens— such as SQL keywords, attribute names, and relation names—that appear in the text of the query validate checking that all attribute and relation names are valid and semantically meaningful names in the schema of the particular database being queried.
  • 4. Query processing  What is Query Processing? ‱ Steps required to transform high level SQL query into a correct and “efficient” strategy for execution and retrieval. ‱ Processing can be divided into : Decomposition, Optimization, Execution, and Code generation 1. Query Decomposition ‱ It is the process of transforming a high level query into a relational algebra query, and to check that the query is syntactically and semantically correct. It Consists of parsing and validation 5
  • 5. Typical stages in query decomposition are: i. Analysis: lexical and syntactical analysis of the query(correctness) based on attributes, data type.. ,. Query tree will be built for the query containing leaf node for base relations, one or many non-leaf nodes for relations produced by relational algebra operations and root node for the result of the query. Sequence of operation is from the leaves to the root. (SELECT * FROM Catalog c ,Author a Where a.authorid = c.authorid AND c.price>200 AND a.country= ‘ USA’ ) ii. Normalization: convert the query into a normalized form. The predicate WHERE will be converted to Conjunctive ( ) √ or Disjunctive ( ) Normal form. ∧ 6
  • 6. iii. Semantic Analysis: to reject normalized queries that are not correctly formulated or contradictory. Incorrect if components do not contribute to generate result. Contradictory if the predicate can not be satisfied by any tuple. Say for example,(Catalog =“BS”  Catalog= “CS”) since a given book can only be classified in either of the category at a time iv. Simplification: to detect redundant qualifications, eliminate common sub-expressions, and transform the query to a semantically equivalent but more easily and effectively computed form. For example, If a user don’t have the necessary access to all of the objects of the query , it should be rejected. 7
  • 7. 2. Query Optimization What is Query Optimization? – The activity of choosing a single “efficient” execution strategy (from hundreds) as determined by database catalog statistics. – Which relational algebra expression, equivalent to the given query, will lead to the most efficient solution plan? – For each algebraic operator, what algorithm (of several available) do we use to compute that operator? – How do operations pass data (main memory buffer, disk buffer,
)? 8
  • 8.  Everyone wants the performance of their database to be optimal. In particular, there is often a requirement for a specific query or object that is query based, to run faster.  Problem of query optimization is to find the sequence of steps that produces the answer to user request in the most efficient manner, given the database structure.  The performance of a query is affected by the tables or queries that underlies the query and by the complexity of the query.  Given a request for data manipulation or retrieval, an optimizer will choose an optimal plan for evaluating the request from among the manifold alternative strategies. i.e. there are many ways (access paths) for accessing desired file/record.  hence ,DBMS is responsible to pick the best execution strategy based on various considerations( Least amount of I/O and CPU resources. ) 9
  • 9. continued • A query typically has many possible execution strategies, and the process of choosing a suitable one for processing a query is known as query optimization. • The chosen plan is not necessarily the optimal (or absolute best) strategy; it is just a reasonably efficient strategy for executing the query.
  • 10. continued • There are two main techniques that are employed during query optimization. • The first technique is based on heuristic rules for ordering the operations in a query execution strategy. A heuristic is a rule that works well in most cases but is not guaranteed to work well in every case. The rules typically reorder the operations in a query tree. • The second technique involves systematically estimating the cost of different execution strategies and choosing the execution plan with the lowest cost estimate. These techniques are usually combined in a query optimizer.
  • 11. continued • Example: Consider relations r(A, B) and s(C, D). We require r × s. • Method 1: a. Load the next record of r into RAM. b. Load all records of s, one at a time, and concatenate each with the current record of r. c. Have all records of r been concatenated? NO: go to a. YES: exit (the result is in RAM or on disk). • Performance: too many accesses, because s is read in full once for every record of r.
  • 12. continued • Method 2 (improvement): a. Load as many blocks of r as possible, leaving room for one block of s. b. Run through the s file completely, one block at a time. • Performance: reduces the number of times the blocks of s are loaded by a factor equal to the number of r records that can fit in main memory. • Considerations during query optimization: – Narrow down intermediate result sets quickly: apply SELECT and PROJECT before JOIN. – Use access structures (indexes).
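The difference between Method 1 and Method 2 can be made concrete by counting reads. The sketch below is an illustration only: the function names and the sample sizes are assumptions, Method 1 is counted in records, and Method 2 is counted in blocks with a hypothetical buffer of `buffer_blocks` blocks.

```python
import math


def method1_reads(n_r: int, n_s: int) -> int:
    """Record at a time: every record of r triggers one full pass over s."""
    return n_r + n_r * n_s


def method2_block_reads(b_r: int, b_s: int, buffer_blocks: int) -> int:
    """Hold (buffer_blocks - 1) blocks of r in memory and stream s one block at a time."""
    if buffer_blocks < 2:
        raise ValueError("need at least one block for r and one for s")
    chunk = buffer_blocks - 1                 # blocks of r held per pass
    passes_over_s = math.ceil(b_r / chunk)    # s is scanned once per chunk of r
    return b_r + passes_over_s * b_s


# Assumed sizes: 1000 records of r, 500 records of s.
print(method1_reads(1000, 500))               # 501000 record reads
# Assumed sizes: r occupies 100 blocks, s occupies 50 blocks, 11 buffer blocks.
print(method2_block_reads(100, 50, 11))       # 600 block reads
```

The larger the share of memory given to r, the fewer passes over s are needed, which is the improvement the slide describes.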
  • 13. Using Heuristics in Query Optimization • In practice, SQL is the query language used in most commercial RDBMSs. An SQL query is first translated into an equivalent extended relational algebra expression, represented as a query tree data structure, which is then optimized. • Typically, SQL queries are decomposed into query blocks, which form the basic units that can be translated into the algebraic operators and optimized.
  • 15. Transformation rule for relational algebra with example 1. Cascade of SELECTION Rule: Multiple SELECTION operations can be combined into a single SELECTION operation with a conjunctive condition. Example: • Initial Query: σ age>30 (σ salary>50000 (Employee)) • Optimized Query: σ salary>50000 ∧ age>30 (Employee) Explanation: Instead of first selecting employees with a salary greater than 50,000 and then selecting those older than 30, you can combine these conditions into one SELECTION. 2. Commutativity of SELECTION Rule: The order of SELECTION operations can be interchanged without affecting the result. Example: • Initial Query: σ age>30 (σ department=‘HR’ (Employee)) • Equivalent Query: σ department=‘HR’ (σ age>30 (Employee)) Explanation: Whether you first select employees older than 30 or those in the HR department, the final result will be the same.
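Rules 1 and 2 can be checked on a small in-memory relation. This is a minimal sketch: the `employees` rows, the attribute names, and the `select` helper are assumptions made up for the illustration, not data from the slides.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]

# Hypothetical Employee relation.
employees: List[Row] = [
    {"name": "Abel", "age": 45, "salary": 70000, "department": "HR"},
    {"name": "Bina", "age": 28, "salary": 65000, "department": "IT"},
    {"name": "Chen", "age": 39, "salary": 42000, "department": "HR"},
    {"name": "Dawit", "age": 33, "salary": 58000, "department": "IT"},
]


def select(rows: List[Row], predicate: Callable[[Row], bool]) -> List[Row]:
    """SELECTION: keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


# Rule 1: a cascade of selections equals one selection with a conjunction.
cascade = select(select(employees, lambda r: r["salary"] > 50000),
                 lambda r: r["age"] > 30)
combined = select(employees, lambda r: r["salary"] > 50000 and r["age"] > 30)

# Rule 2: the order of two selections does not matter.
hr_then_age = select(select(employees, lambda r: r["department"] == "HR"),
                     lambda r: r["age"] > 30)
age_then_hr = select(select(employees, lambda r: r["age"] > 30),
                     lambda r: r["department"] == "HR")

print(cascade == combined)         # True
print(hr_then_age == age_then_hr)  # True
```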
  • 16. Transformation rule for relational algebra with example 3. Cascade of PROJECTION Rule: In a sequence of PROJECTION operations, only the last one is necessary. Example: • Initial Query: π name, age (π name, age, salary (Employee)) • Optimized Query: π name, age (Employee) Explanation: If you first project the attributes name, age, and salary, and then project only name and age, you can directly project name and age from the start. 4. Commutativity of SELECTION with PROJECTION Rule: SELECTION and PROJECTION operations can be interchanged if the SELECTION predicate involves only the attributes in the PROJECTION list. Example: • Initial Query: σ age>30 (π name, age (Employee)) • Equivalent Query: π name, age (σ age>30 (Employee)) Explanation: Whether you first project the attributes name and age and then select employees older than 30, or first select employees older than 30 and then project name and age, the result is the same.
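Rules 3 and 4 can be verified the same way. The sketch below assumes a small hypothetical Employee relation and hand-written `select` and `project` helpers; `project` removes duplicates, as relational projection does.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]

# Hypothetical Employee relation (two rows share the same name and age).
employees: List[Row] = [
    {"name": "Abel", "age": 45, "salary": 70000},
    {"name": "Abel", "age": 45, "salary": 52000},
    {"name": "Bina", "age": 28, "salary": 65000},
    {"name": "Chen", "age": 39, "salary": 42000},
]


def select(rows: List[Row], predicate: Callable[[Row], bool]) -> List[Row]:
    return [row for row in rows if predicate(row)]


def project(rows: List[Row], attributes: List[str]) -> List[Row]:
    """PROJECTION: keep only the listed attributes and drop duplicate rows."""
    unique: Dict[tuple, Row] = {}
    for row in rows:
        key = tuple(row[a] for a in attributes)
        unique.setdefault(key, {a: row[a] for a in attributes})
    return list(unique.values())


# Rule 3: only the last projection in a cascade is needed.
cascade = project(project(employees, ["name", "age", "salary"]), ["name", "age"])
direct = project(employees, ["name", "age"])

# Rule 4: selection on projected attributes commutes with the projection.
project_first = select(project(employees, ["name", "age"]), lambda r: r["age"] > 30)
select_first = project(select(employees, lambda r: r["age"] > 30), ["name", "age"])

print(cascade == direct)              # True
print(project_first == select_first)  # True
```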
  • 17. Transformation rule for relational algebra with example 5. Commutativity of THETA JOIN/Cartesian Product Rule: The THETA JOIN (⋈) and Cartesian Product (×) operations are commutative, meaning the order of the relations can be swapped without affecting the result. Example: • Initial Query: R × S • Equivalent Query: S × R Explanation: Whether you join R with S or S with R, the result will be the same set of tuples (only the order of the attributes differs).
  • 18. Transformation rule for relational algebra with example 6. Commutativity of SELECTION with THETA JOIN Rule: If the SELECTION predicate involves only attributes of one of the relations being joined, the SELECTION and JOIN operations can be interchanged. Case a: SELECTION Predicate Involves Only Attributes of One Relation Example: • Initial Query: σ c1 (R ⋈ S) • Equivalent Query: (σ c1 (R)) ⋈ S Explanation: If the predicate c1 involves only attributes of R, you can apply the SELECTION to R before performing the join. Case b: SELECTION Predicate Involves Attributes of Both Relations Example: • Initial Query: σ c1 ∧ c2 (R ⋈ S) • Equivalent Query: (σ c1 (R)) ⋈ (σ c2 (S)) Explanation: If c1 involves only attributes of R and c2 involves only attributes of S, you can first select the tuples from R that satisfy c1 and the tuples from S that satisfy c2, and then join the results.
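Both cases of rule 6 can be checked on toy relations. The relations `R` and `S`, their attribute names, and the helpers `select` and `theta_join` are assumptions for this sketch; the join is a plain nested loop, not how a DBMS would necessarily execute it.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]

# Hypothetical relations R(rid, x) and S(sid, y).
R: List[Row] = [{"rid": 1, "x": 5}, {"rid": 2, "x": 15}, {"rid": 3, "x": 25}]
S: List[Row] = [{"sid": 1, "y": "p"}, {"sid": 2, "y": "q"}, {"sid": 3, "y": "p"}]


def select(rows: List[Row], predicate: Callable[[Row], bool]) -> List[Row]:
    return [row for row in rows if predicate(row)]


def theta_join(left: List[Row], right: List[Row],
               condition: Callable[[Row, Row], bool]) -> List[Row]:
    """Nested-loop theta join: concatenate every pair that satisfies the condition."""
    return [{**a, **b} for a in left for b in right if condition(a, b)]


def same_id(a: Row, b: Row) -> bool:   # join condition
    return a["rid"] == b["sid"]


def c1(row: Row) -> bool:              # involves only attributes of R
    return row["x"] > 10


def c2(row: Row) -> bool:              # involves only attributes of S
    return row["y"] == "p"


# Case a: select after the join vs. select on R before the join.
case_a_late = select(theta_join(R, S, same_id), c1)
case_a_early = theta_join(select(R, c1), S, same_id)

# Case b: conjunctive predicate split between R and S.
case_b_late = select(theta_join(R, S, same_id), lambda r: c1(r) and c2(r))
case_b_early = theta_join(select(R, c1), select(S, c2), same_id)

print(case_a_late == case_a_early)  # True
print(case_b_late == case_b_early)  # True
```

Pushing the selections down means the join loop sees fewer rows, which is where the saving comes from.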
  • 19. Transformation rule for relational algebra with example 7. Commutativity of PROJECTION and THETA JOIN Rule: If the projection list is of the form L1, L2, where L1 involves only attributes of R and L2 involves only attributes of S, and the join predicate θ involves only attributes in the projection list, then the PROJECTION can be applied to each relation before the join. Example: • Initial Query: π L1, L2 (R ⋈θ S) • Optimized Query: (π L1 (R)) ⋈θ (π L2 (S)) Explanation: Instead of projecting the attributes after the join, you can project the relevant attributes from each relation before performing the join.
  • 20. Transformation rule for relational algebra with example 8. Commutativity of the Set Operations: UNION and INTERSECTION but not SET DIFFERENCE Rule: UNION and INTERSECTION operations are commutative, but SET DIFFERENCE is not. Example: • Initial Query: R ∪ S • Equivalent Query: S ∪ R Explanation: The order of the operands of a UNION (or an INTERSECTION) does not affect the result. 9. Associativity of the THETA JOIN, CARTESIAN PRODUCT, UNION, and INTERSECTION Rule: These operations are associative. Explanation: The order in which you perform the JOIN, CARTESIAN PRODUCT, UNION, and INTERSECTION does not affect the final result.
  • 21. Transformation rule for relational algebra with example 10. Commuting SELECTION with SET OPERATIONS Rule: SELECTION operations can commute with UNION and INTERSECTION. Example: σ c (R ∪ S) ≡ (σ c (R)) ∪ (σ c (S)) Explanation: Instead of applying the SELECTION after the UNION, you can apply the SELECTION to each relation before performing the UNION.
  • 22. Transformation rule for relational algebra with example 11. Commuting PROJECTION with UNION Rule: PROJECTION operations can commute with UNION. Example: π L (R ∪ S) ≡ (π L (R)) ∪ (π L (S)) Explanation: Instead of projecting the attributes after the UNION, you can project the relevant attributes from each relation before performing the UNION.
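The set-operation rules (8, 10, and 11) can be checked directly with Python sets of tuples. The relations `R` and `S`, the predicate `at_least_three`, and the helpers below are assumptions made up for this sketch.

```python
from typing import Callable, List, Set, Tuple

Relation = Set[Tuple]

# Hypothetical union-compatible relations with schema (id, label).
R: Relation = {(1, "a"), (2, "b"), (3, "c")}
S: Relation = {(3, "c"), (4, "d")}


def select(rel: Relation, predicate: Callable[[Tuple], bool]) -> Relation:
    return {t for t in rel if predicate(t)}


def project(rel: Relation, positions: List[int]) -> Relation:
    return {tuple(t[i] for i in positions) for t in rel}


def at_least_three(t: Tuple) -> bool:
    return t[0] >= 3


# Rule 8: UNION and INTERSECTION commute, SET DIFFERENCE does not.
union_commutes = (R | S) == (S | R)
intersection_commutes = (R & S) == (S & R)
difference_commutes = (R - S) == (S - R)

# Rule 10: SELECTION commutes with UNION and INTERSECTION.
select_over_union = select(R | S, at_least_three) == (
    select(R, at_least_three) | select(S, at_least_three))
select_over_intersection = select(R & S, at_least_three) == (
    select(R, at_least_three) & select(S, at_least_three))

# Rule 11: PROJECTION commutes with UNION.
project_over_union = project(R | S, [1]) == (project(R, [1]) | project(S, [1]))

print(union_commutes, intersection_commutes, difference_commutes)
print(select_over_union, select_over_intersection, project_over_union)
```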
  • 24. Using Heuristics Heuristic optimization in query processing involves using rule-based techniques to transform a query into a more efficient form. Here’s a detailed explanation of the process: Process for heuristics optimization 1. Initial Internal Representation:  When a high-level query (like SQL) is submitted, the parser translates it into an initial internal representation, often in the form of a relational algebra tree. This tree represents the logical steps needed to execute the query.
  • 25. Using Heuristics 2. Applying Heuristic Rules: • Heuristic rules are applied to this internal representation to optimize it. These rules are based on general principles that typically lead to more efficient query execution. Some common heuristic rules include: – Selection Pushdown: moving selection operations as close to the base relations as possible to reduce the size of intermediate results. – Projection Pushdown: moving projection operations down the query tree to eliminate unnecessary columns early. – Join Reordering: reordering join operations to minimize the size of intermediate results.
  • 26. Using Heuristics 3. Generating a Query Execution Plan: • After applying heuristic rules, the optimized internal representation is used to generate a query execution plan. This plan outlines the specific steps and methods the DBMS will use to execute the query. • The execution plan considers the access paths available, such as indexes and sequential scans, to determine the most efficient way to retrieve and process the data. • The plan may include operations like index scans, nested loop joins, hash joins, and sort-merge joins, depending on the available access paths and the structure of ...
  • 27. Using Heuristics • The main heuristic is to apply first the operations that reduce the size of intermediate results. – E.g. apply SELECT and PROJECT operations before applying JOIN or other binary operations. • Intermediate results, in the context of database query processing, are the temporary data sets produced during the execution of a query before the final result is reached. They are not stored permanently in the database: they exist only for the duration of the query execution and are discarded once the final result is produced.
  • 28. continued • The heuristic approach uses knowledge of the characteristics of the relational algebra operations, and of the relationships between the operators, to optimize the query. • Thus the heuristic approach to optimization makes use of: – Properties of individual operators – Associations between operators – Query tree: a graphical representation of the operators, relations, attributes, and predicates, and of the processing sequence during query processing. • A query tree is composed of three main parts: leaf nodes, internal nodes, and the root. – The sequence of execution of operations in a query tree runs from the leaves to the root.
  • 29. continued • Query block: the basic unit that can be translated into the algebraic operators and optimized. • A query block contains a single SELECT-FROM-WHERE expression, as well as GROUP BY and HAVING clauses if these are part of the block. • Nested queries within a query are identified as separate query blocks. • There are two types of nested queries: uncorrelated and correlated.
  • 30. Uncorrelated Nested Queries Uncorrelated nested queries can be executed separately, and their results are then used in the outer query. SELECT name FROM employees WHERE department_id IN (SELECT department_id FROM departments WHERE location = 'New York'); In this example, the inner query (SELECT department_id FROM departments WHERE location = 'New York') is executed first, and its result is used by the outer query to filter employees.
  • 31. Correlated Nested Queries • Correlated nested queries need information (a tuple variable) from the outer query in order to execute. SELECT name FROM employees e WHERE salary > (SELECT AVG(salary) FROM employees WHERE department_id = e.department_id); In this example, the inner query (SELECT AVG(salary) FROM employees WHERE department_id = e.department_id) depends on the department_id of each row in the outer query. Therefore, the inner query is conceptually executed for each employee to compare their salary with the average salary of their department.
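Both kinds of nested query can be tried out with Python's built-in `sqlite3` module. The table contents below are made-up sample data, and an `ORDER BY name` clause has been added to each query only to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE departments (department_id INTEGER PRIMARY KEY, location TEXT);
CREATE TABLE employees (name TEXT, salary REAL, department_id INTEGER);
INSERT INTO departments VALUES (1, 'New York'), (2, 'Boston');
INSERT INTO employees VALUES
    ('Ann', 5000, 1), ('Bob', 3000, 1), ('Cy', 4000, 2), ('Di', 6000, 2);
""")

# Uncorrelated: the inner query does not reference the outer query at all.
uncorrelated = [row[0] for row in conn.execute("""
    SELECT name FROM employees
    WHERE department_id IN (SELECT department_id FROM departments
                            WHERE location = 'New York')
    ORDER BY name""")]

# Correlated: the inner query refers to e.department_id from the outer query.
correlated = [row[0] for row in conn.execute("""
    SELECT name FROM employees e
    WHERE salary > (SELECT AVG(salary) FROM employees
                    WHERE department_id = e.department_id)
    ORDER BY name""")]

print(uncorrelated)  # ['Ann', 'Bob']
print(correlated)    # ['Ann', 'Di']
conn.close()
```

With this data the department averages are 4000 and 5000, so only Ann and Di earn more than their own department's average.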
  • 32. • Query tree: – A tree data structure that corresponds to a relational algebra expression. It represents the input relations of the query as leaf nodes of the tree, and represents the relational algebra operations as internal nodes. – Leaves: the base relations used for processing the query and extracting the required information. – Root: the final result (relation) produced as output by the operations on the relations used for query processing. – Nodes: intermediate results or relations produced before reaching the final result. • An execution of the query tree consists of executing an internal node operation whenever its operands are available and then replacing that internal node by the relation that results from executing the operation.
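The bottom-up execution of a query tree described above can be sketched with a small recursive evaluator. The `Node` class, the `leaf` and `evaluate` helpers, and the sample PROJECT rows are all assumptions for this illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Rows = List[Dict[str, object]]


@dataclass
class Node:
    label: str
    op: Callable[..., Rows]                       # computes this node from its inputs
    children: List["Node"] = field(default_factory=list)


def leaf(label: str, rows: Rows) -> Node:
    """Leaf node: a base relation with no inputs."""
    return Node(label, lambda: rows)


def evaluate(node: Node) -> Rows:
    """Evaluate the operands first, then replace the node by its result."""
    inputs = [evaluate(child) for child in node.children]
    return node.op(*inputs)


# Hypothetical base relation.
project_rows: Rows = [
    {"PNUMBER": 10, "PLOCATION": "Stafford"},
    {"PNUMBER": 20, "PLOCATION": "Houston"},
    {"PNUMBER": 30, "PLOCATION": "Stafford"},
]

# Tree for: project PNUMBER from (select PLOCATION = 'Stafford' from PROJECT)
tree = Node(
    "project PNUMBER",
    lambda rows: [{"PNUMBER": r["PNUMBER"]} for r in rows],
    [Node(
        "select PLOCATION = 'Stafford'",
        lambda rows: [r for r in rows if r["PLOCATION"] == "Stafford"],
        [leaf("PROJECT", project_rows)],
    )],
)

print(evaluate(tree))  # [{'PNUMBER': 10}, {'PNUMBER': 30}]
```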
  • 33. Query graph • A query graph is a visual representation used in database theory to illustrate a relational calculus expression. Its key points are: • Graph Data Structure: The query graph is a type of graph that visually represents the relationships and constraints of a query. • Relational Calculus Expression: It corresponds to a relational calculus expression, which is a non-procedural query language used to specify what data to retrieve rather than how to retrieve it. • No Operation Order: The graph does not specify the order in which operations should be performed. It simply shows the relationships and constraints. • Uniqueness: Each query has a unique corresponding graph, meaning there is only one graph for each query.
  • 34. continued • Example: For every project located in ‘Stafford’, retrieve the project number, the controlling department number, and the department manager’s last name, address, and birthdate. • Relational algebra: π PNUMBER, DNUM, LNAME, ADDRESS, BDATE (((σ PLOCATION=‘STAFFORD’ (PROJECT)) ⋈ DNUM=DNUMBER (DEPARTMENT)) ⋈ MGRSSN=SSN (EMPLOYEE)) • SQL query: SELECT P.PNUMBER, P.DNUM, E.LNAME, E.ADDRESS, E.BDATE FROM PROJECT AS P, DEPARTMENT AS D, EMPLOYEE AS E WHERE P.DNUM = D.DNUMBER AND D.MGRSSN = E.SSN AND P.PLOCATION = 'STAFFORD';
  • 37. cont Step 1. Perform selection operations as early as possible: Applying selection at an early stage reduces the number of unwanted records that must be transferred from the database to primary memory. The optimizer uses transformation rule 1 to divide selection operations with conjunctive conditions into a cascade of selection operations.
  • 38. cont Step 2. Commute selection operations with other operations as early as possible: The optimizer uses transformation rules 2, 4, 6, and 9 to move selection operations as far down the tree as possible and to keep selection predicates on the same relation together. Keeping selection operations low in the tree reduces unwanted data transfer, and keeping the selection predicates on the same relation together reduces the number of times the same database table must be accessed to retrieve records.
  • 39. cont Step 3. Combine each Cartesian product with a subsequent selection operation whose predicate represents a join condition into a JOIN operation: The optimizer uses transformation rule 13 to convert a selection and Cartesian product sequence into a join. This reduces data transfer: it is always better to transfer only the required data from the database than to transfer all the data and then refine it. (A Cartesian product combines all the data of all the tables mentioned in the query, while a join operation retrieves only those records that satisfy the join condition.) Step 4. Use commutativity and associativity of binary operations: The optimizer uses transformation rules 5, 11, and 12 to execute the most restrictive selection operations first.
  • 40. Step 5. Perform projection operations as early as possible: After performing selection operations, the optimizer uses transformation rules 3, 4, 7, and 10 to reduce the number of columns of a relation by moving projection operations as far down the tree as possible and keeping projection attributes on the same relation together. Step 6. Compute common expressions only once: This step identifies sub-trees that represent groups of operations that can be executed by a single algorithm.
  • 41. • Heuristic Optimization of Query Trees: – The same query could correspond to many different relational algebra expressions, and hence to many different query trees. – The task of heuristic optimization of query trees is to find a final query tree that is efficient to execute. • Example: Q2: SELECT LNAME FROM EMPLOYEE, WORKS_ON, PROJECT WHERE PNAME = 'AQUARIUS' AND PNUMBER = PNO AND ESSN = SSN AND BDATE ...
  • 42. (a) Initial (canonical) query tree for SQL query Q2. Executing this tree directly first creates a very large file containing the CARTESIAN PRODUCT of the entire EMPLOYEE, WORKS_ON, and PROJECT files. (b) Moving SELECT operations down the query tree. This gives an improved query tree that first applies the SELECT operations to reduce the number of tuples that appear in the CARTESIAN PRODUCT. (c) Applying the more restrictive SELECT operation first. A further improvement is achieved by switching the positions of the EMPLOYEE and PROJECT relations in the tree, as shown in (c). This uses the information that Pnumber is a key attribute of the PROJECT relation, and hence the SELECT operation on the PROJECT relation will retrieve a single record only.
  • 43. (d) Replacing CARTESIAN PRODUCT and SELECT with JOIN operations. We can further improve the query tree by replacing any CARTESIAN PRODUCT operation that is followed by a join condition with a JOIN operation. (e) Moving PROJECT operations down the query tree. Another improvement is to keep only the attributes needed by subsequent operations in the intermediate relations, by including PROJECT (π) operations as early as possible in the query tree, as shown in (e). This reduces the attributes (columns) of the intermediate relations.
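Step (d) can be illustrated by comparing a Cartesian product followed by a selection with a direct join on the same condition. The WORKS_ON and PROJECT rows below are made-up sample data, and the hash-based join is just one possible join algorithm chosen for the sketch.

```python
from itertools import product
from typing import Dict, List

Row = Dict[str, object]

# Hypothetical sample data.
works_on: List[Row] = [
    {"ESSN": 1, "PNO": 10}, {"ESSN": 2, "PNO": 10},
    {"ESSN": 2, "PNO": 20}, {"ESSN": 3, "PNO": 30},
]
project: List[Row] = [
    {"PNUMBER": 10, "PNAME": "Aquarius"},
    {"PNUMBER": 20, "PNAME": "Orion"},
    {"PNUMBER": 30, "PNAME": "Vega"},
]

# Cartesian product first, then a selection on the join condition PNO = PNUMBER.
cartesian = [{**w, **p} for w, p in product(works_on, project)]
via_select = [t for t in cartesian if t["PNO"] == t["PNUMBER"]]

# Direct join: index PROJECT by PNUMBER and look up each WORKS_ON row.
by_pnumber: Dict[object, List[Row]] = {}
for p in project:
    by_pnumber.setdefault(p["PNUMBER"], []).append(p)
via_join = [{**w, **p} for w in works_on for p in by_pnumber.get(w["PNO"], [])]

print(len(cartesian))          # 12 intermediate tuples
print(len(via_join))           # 4 result tuples
print(via_select == via_join)  # True
```

The results are identical, but the join never materializes the 12-tuple intermediate relation.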
  • 44. Summary of Heuristics for Algebraic Optimization: 1. The main heuristic is to apply first the operations that reduce the size of intermediate results. 2. Perform select operations as early as possible to reduce the number of tuples and perform project operations as early as possible to reduce the number of attributes. (This is done by moving select and project operations as far down the tree as possible.) 3. The select and join operations that are most restrictive should be executed before other similar operations. (This is done by reordering the leaf nodes of the tree among themselves and adjusting the rest of the tree appropriately.)
  • 45. B. Cost Estimation Approach to Query Optimization • The main idea is to minimize the cost of processing a query. The cost function comprises: I/O cost + CPU processing cost + communication cost + storage cost. • These components might have different weights in different processing environments. • The DBMS uses information stored in the system catalogue for the purpose of estimating cost. • The main target of query optimization is to minimize the size of the intermediate relations. Their size affects the cost of: – Disk access – Data transportation – Storage space in primary memory – Writing to disk
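The weighted cost function can be written out as a short sketch. The class name, the weight values, and the sample estimates are assumptions chosen for illustration; real systems derive such figures from catalogue statistics and their own cost models.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class CostEstimate:
    io: int             # estimated block transfers
    cpu: int            # estimated in-memory operations
    communication: int  # estimated data shipped between sites
    storage: int        # estimated size of intermediate results


def total_cost(estimate: CostEstimate, weights: Dict[str, int]) -> int:
    """Weighted sum: I/O + CPU + communication + storage."""
    return (weights["io"] * estimate.io
            + weights["cpu"] * estimate.cpu
            + weights["communication"] * estimate.communication
            + weights["storage"] * estimate.storage)


# Hypothetical weights: a centralized system ignores communication cost,
# while a distributed one weights it heavily.
centralized = {"io": 4, "cpu": 1, "communication": 0, "storage": 1}
distributed = {"io": 4, "cpu": 1, "communication": 10, "storage": 1}

plan = CostEstimate(io=100, cpu=200, communication=50, storage=40)
print(total_cost(plan, centralized))  # 640
print(total_cost(plan, distributed))  # 1140
```

The same plan is ranked differently under different weights, which is what the remark about processing environments means.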
  • 46. • Cost-based query optimization: – Estimate and compare the costs of executing a query using different execution strategies, and choose the strategy with the lowest cost estimate. (Compare to heuristic query optimization.) • Issues: – Cost function – Number of execution strategies to be considered • Cost Components for Query Execution: 1. Access cost to secondary storage 2. Storage cost 3. Computation cost 4. Memory usage cost 5. Communication cost
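A toy version of cost-based optimization is to enumerate join orders, estimate the size of each intermediate result, and keep the cheapest order. Everything numeric below is an assumption for illustration: the cardinalities, the join selectivities, and the choice of "sum of intermediate result sizes" as the cost measure.

```python
from fractions import Fraction
from itertools import combinations, permutations
from typing import Dict, Tuple

# Hypothetical cardinalities after selections have been pushed down
# (PROJECT is assumed to be reduced to a single tuple).
cardinality: Dict[str, int] = {"EMPLOYEE": 10000, "WORKS_ON": 50000, "PROJECT": 1}

# Hypothetical join selectivities; pairs without a join condition default to 1,
# i.e. they form a Cartesian product.
selectivity = {
    frozenset({"EMPLOYEE", "WORKS_ON"}): Fraction(1, 10000),
    frozenset({"WORKS_ON", "PROJECT"}): Fraction(1, 100),
}


def estimated_size(relations: Tuple[str, ...]) -> Fraction:
    """Estimated number of tuples after joining the given relations."""
    size = Fraction(1)
    for name in relations:
        size *= cardinality[name]
    for pair in combinations(relations, 2):
        size *= selectivity.get(frozenset(pair), Fraction(1))
    return size


def plan_cost(order: Tuple[str, ...]) -> Fraction:
    """Cost of a left-deep join order: the sum of its intermediate result sizes."""
    return sum(estimated_size(order[:k]) for k in range(2, len(order) + 1))


best = min(permutations(cardinality), key=plan_cost)
print(best, plan_cost(best))
print(plan_cost(("EMPLOYEE", "WORKS_ON", "PROJECT")))  # 50500
```

Under these assumptions the cheapest orders join WORKS_ON with the single PROJECT tuple first, which matches the heuristic of executing the most restrictive operations early. The number of orders grows factorially with the number of relations, which is why the number of strategies to consider is listed as an issue.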
  • 47. 1. Access Cost of Secondary Storage • Data is accessed from secondary storage, because a query needs some part of the data stored in the database. The disk access cost can be analyzed in terms of: – Searching, – Reading, and – Writing the data blocks used to store some portion of a relation. • Remark: The disk access cost will vary depending on: – The file organization used and the access method implemented for that file organization. – Whether the data is stored contiguously or scattered across the disk.
  • 48. continued 2. Storage Cost • Because a query is composed of many database operations, processing it may produce one or more intermediate results before the final output is reached. These intermediate results should be stored in primary memory for further processing. The bigger the intermediate relation, the larger the memory requirement, which puts pressure on the limited space available. This will be considered as a ...
  • 49. 3. Query Execution Plans – An execution plan for a relational algebra query consists of a combination of the relational algebra query tree and information about the access methods to be used for each relation, as well as the methods to be used in computing the relational operators stored in the tree.
  • 50. 4. Computation Cost • A query is composed of many operations. These could be database operations such as reading from and writing to a disk, or mathematical and other operations such as: – Searching – Sorting – Merging – Computation on field values 5. Communication Cost • In most database systems the database resides at one station and queries originate from different terminals. This affects the performance of the system by adding cost to query processing. Thus, the cost of transporting data between the database site and the terminal where the query originates should be analyzed.