Query optimization
Prof. Neeraj Bhargava
Pooja Dixit
Department of Computer Science
School of Engineering & System Science
MDS, University Ajmer, Rajasthan, India
• Query optimization is a function of many relational database management systems. The
query optimizer attempts to determine the most efficient way to execute a given query
by considering the possible query plans.
• Generally, users cannot access the query optimizer directly: once a query is submitted to the
database server and parsed by the parser, it is passed to the query optimizer, where
optimization occurs.
• Query optimization brings together the following concepts:
◦ Query: A query is a request for information from a database.
◦ Query Plans: A query plan (or query execution plan) is an ordered set of steps used to
access data in a SQL relational database management system.
◦ Query Optimization: A single query can be executed through different algorithms or
rewritten in different forms and structures. This raises the question of query
optimization: which of these forms or pathways is the most efficient? The optimizer
answers it by comparing the possible query plans.
• Importance: The goal of query optimization is to reduce the system resources required to
fulfill a query, and ultimately to provide the user with the correct result set faster.
◦ First, it provides the user with faster results, which makes the application seem faster to the user.
◦ Second, it allows the system to service more queries in the same amount of time, because each
optimized request takes less time than its unoptimized equivalent.
◦ Third, it reduces wear on the hardware (e.g. disk drives) and allows the server to run more
efficiently (e.g. lower power consumption, less memory usage).
• There are broadly two ways a query can be optimized:
◦ Analyze and transform equivalent relational expressions: try to minimize the number of tuples
and columns in the intermediate and final results of the query (discussed here).
◦ Use different algorithms for each operation: these algorithms determine how tuples are accessed
from the data structures in which they are stored (e.g. through indexing or hashing) and hence
influence the number of disk and block accesses (discussed in query processing).
• Query processing refers to the range of activities involved in extracting data from a database.
The activities include translating queries written in high-level database languages into
expressions that can be used at the physical level of the file system, a variety of
query-optimizing transformations, and the actual evaluation of queries.
• Overview
• The steps involved in processing a query are shown in the Figure. The basic steps are:
◦ Parsing and translation.
◦ Optimization.
◦ Evaluation.
• Before query processing can begin, the system must translate the query into a usable form. A
language such as SQL is suitable for human use, but is ill suited to be the system’s internal
representation of a query. A more useful internal representation is one based on the
extended relational algebra.
• Given a query, there are generally a variety of methods for computing the answer. For
example, we have seen that, in SQL, a query could be expressed in several different ways.
Each SQL query can itself be translated into a relational-algebra expression in one of several
ways. Furthermore, the relational-algebra representation of a query specifies only partially
how to evaluate a query; there are usually several ways to evaluate relational-algebra
expressions. As an example, consider the query:
select salary from instructor where salary < 75000;
• This query can be translated into either of the following relational-algebra expressions:
◦ $\sigma_{salary < 75000}(\Pi_{salary}(instructor))$
◦ $\Pi_{salary}(\sigma_{salary < 75000}(instructor))$
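As a quick check that selecting before or after projecting gives the same answer, here is a minimal Python sketch. The `instructor` rows and the helper names `select` and `project` are hypothetical; projection here keeps duplicates (bag semantics, as SQL does) rather than eliminating them as set-based relational algebra would.

```python
# Hypothetical "instructor" relation: each tuple is a dict.
instructor = [
    {"name": "Ada", "dept": "CS", "salary": 90000},
    {"name": "Bo", "dept": "Math", "salary": 60000},
    {"name": "Cy", "dept": "CS", "salary": 72000},
    {"name": "Di", "dept": "Physics", "salary": 95000},
]

def select(relation, predicate):
    """Relational selection (sigma): keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]

def project(relation, attrs):
    """Relational projection (pi): keep only the listed attributes.
    Duplicates are kept, mirroring SQL's bag semantics."""
    return [{a: t[a] for a in attrs} for t in relation]

# Plan A: select first, then project.
plan_a = project(select(instructor, lambda t: t["salary"] < 75000), ["salary"])
# Plan B: project first, then select on the projected attribute.
plan_b = select(project(instructor, ["salary"]), lambda t: t["salary"] < 75000)

print(plan_a)  # [{'salary': 60000}, {'salary': 72000}]
print(plan_b)  # [{'salary': 60000}, {'salary': 72000}]
```

The rewrite is only valid because the predicate uses an attribute (`salary`) that survives the projection; a predicate on `dept` could not be moved above a projection onto `salary`.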
• Further, we can execute each relational-algebra operation by one of several different
algorithms. For example, to implement the preceding selection, we can search every tuple in
instructor to find tuples with salary less than 75000. If a B+-tree index is available on the
attribute salary, we can use the index instead to locate the tuples.
[Figure: A query-evaluation plan]
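The two access paths can be contrasted in a short Python sketch. A sorted list searched with the standard `bisect` module stands in for the B+-tree index; this is a simplifying assumption, since a real B+-tree is a balanced multiway tree stored in disk blocks that maps keys to tuple locations, but it supports the same ordered range search. The salary values and the `SortedIndex` class are hypothetical.

```python
import bisect

salaries = [90000, 60000, 72000, 95000, 55000, 81000]  # hypothetical data

def linear_scan(values, limit):
    """Full scan: examine every tuple and keep those below the limit."""
    return [v for v in values if v < limit]

class SortedIndex:
    """Hypothetical stand-in for a B+-tree on salary: a sorted list of keys.
    It only mimics the ordered search a B+-tree provides."""

    def __init__(self, values):
        self.keys = sorted(values)

    def range_less_than(self, limit):
        # bisect_left returns the first position whose key is >= limit,
        # so every key before that position satisfies key < limit.
        end = bisect.bisect_left(self.keys, limit)
        return self.keys[:end]

index = SortedIndex(salaries)
print(sorted(linear_scan(salaries, 75000)))  # [55000, 60000, 72000]
print(index.range_less_than(75000))          # [55000, 60000, 72000]
```

The scan touches every value, while the index search costs a logarithmic lookup plus the size of the answer, which is why the optimizer prefers the index when the condition is selective.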
• A sequence of primitive operations that can be used to evaluate a query is a
query-execution plan or query-evaluation plan.
• The query-execution engine takes a query-evaluation plan, executes that plan, and
returns the answers to the query.
• The query optimizer combines these two techniques (choosing among equivalent expressions and
choosing an algorithm for each operation) to determine which plan to use for evaluating the query.
• There are two methods of query optimization:
1. Cost-based optimization (physical)
2. Heuristic optimization (logical)
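One common way for an execution engine to run a plan is the iterator (pull-based) model, in which each operator asks its child for the next tuple. The slides do not say which model is used, so the following Python sketch is only an illustration; the operator classes `Scan`, `Select`, `Project` and the `execute` function are hypothetical names.

```python
from typing import Callable, Iterable, Iterator

class Scan:
    """Leaf operator: yields every tuple of a stored relation."""
    def __init__(self, relation: Iterable[dict]):
        self.relation = relation

    def __iter__(self) -> Iterator[dict]:
        yield from self.relation

class Select:
    """Passes on only the tuples from its child that satisfy the predicate."""
    def __init__(self, child: Iterable[dict], predicate: Callable[[dict], bool]):
        self.child = child
        self.predicate = predicate

    def __iter__(self) -> Iterator[dict]:
        for row in self.child:
            if self.predicate(row):
                yield row

class Project:
    """Keeps only the requested attributes of each tuple from its child."""
    def __init__(self, child: Iterable[dict], attrs: list):
        self.child = child
        self.attrs = attrs

    def __iter__(self) -> Iterator[dict]:
        for row in self.child:
            yield {a: row[a] for a in self.attrs}

def execute(plan: Iterable[dict]) -> list:
    """Toy query-execution engine: pull tuples from the plan root until exhausted."""
    return list(plan)

instructor = [
    {"name": "Ada", "salary": 90000},
    {"name": "Bo", "salary": 60000},
    {"name": "Cy", "salary": 72000},
]

# Plan tree: project(name, salary) <- select(salary < 75000) <- scan(instructor)
plan = Project(Select(Scan(instructor), lambda r: r["salary"] < 75000), ["name", "salary"])
print(execute(plan))  # [{'name': 'Bo', 'salary': 60000}, {'name': 'Cy', 'salary': 72000}]
```

Because each operator is a generator, tuples flow up the tree one at a time, so no intermediate relation has to be materialized in full.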
• Cost-based optimization (also known as cost-based query optimization, or CBO) uses
statistics about the data to estimate the cost of alternative execution plans and picks the
cheapest one. In Spark SQL, for example, the CBO uses table statistics to determine the most
efficient query execution plan of a structured query (given the logical query plan).
• In Spark SQL, cost-based optimization is disabled by default. The spark.sql.cbo.enabled
configuration property controls whether the CBO is enabled and used for query optimization.
• Cost-based optimization uses logical optimization rules (e.g. CostBasedJoinReorder) to
optimize the logical plan of a structured query based on statistics.
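To illustrate what using statistics to pick a plan means, the Python sketch below enumerates every left-deep join order for three tables and estimates the cost of each. The cardinalities, selectivities and cost model (sum of estimated intermediate result sizes, with predicates assumed independent) are hypothetical simplifications; this is not Spark's actual cost model or the `CostBasedJoinReorder` implementation.

```python
from itertools import permutations

# Hypothetical statistics: table cardinalities and join-predicate selectivities.
card = {"A": 1000, "B": 100, "C": 10}
selectivity = {frozenset("AB"): 0.01, frozenset("BC"): 0.1}  # no A-C predicate

def join_size(left_tables, left_rows, right):
    """Estimate the size of joining an intermediate result with one more table.
    Multiplies the selectivities of all predicates linking `right` to the left side;
    with no linking predicate the join is a cross product (selectivity 1)."""
    sel = 1.0
    for t in left_tables:
        sel *= selectivity.get(frozenset((t, right)), 1.0)
    return left_rows * card[right] * sel

def plan_cost(order):
    """Cost of a left-deep join order = sum of estimated intermediate result sizes."""
    tables = [order[0]]
    rows = card[order[0]]
    cost = 0.0
    for right in order[1:]:
        rows = join_size(tables, rows, right)
        cost += rows
        tables.append(right)
    return cost

# Exhaustive enumeration: try every order and keep the cheapest estimate.
costs = {order: plan_cost(order) for order in permutations(card)}
best = min(costs, key=costs.get)
print(best, costs[best])
```

With these numbers, orders that join B and C first come out cheapest, while orders that begin with the A-C cross product are the most expensive. Enumerating every order grows exponentially with the number of tables, which is the cost that the heuristic approach avoids.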
Heuristic-Based Optimization
◦ Heuristic-based optimization uses rule-based approaches for query optimization. These
algorithms have polynomial time and space complexity, which is lower than the exponential
complexity of exhaustive search-based algorithms. However, they do not necessarily produce
the best query plan.
◦ Some of the common heuristic rules are:
• Perform select and project operations before join operations. This is done by moving the
select and project operations down the query tree, which reduces the number of tuples
available for the join.
• Perform the most restrictive select/project operations first, before the other operations.
• Avoid cross-product operations, since they result in very large intermediate tables.
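The effect of the first rule can be seen on toy data. In this Python sketch the `employee` and `department` relations and the `nested_loop_join` helper are hypothetical. Pushing the selection on `city` below the join gives the same final answer, but the join produces 100 tuples instead of 1,000 and the nested loop performs 1,000 comparisons instead of 10,000.

```python
# Hypothetical relations: 1000 employees spread over 10 departments.
employee = [{"eid": i, "dept_id": i % 10} for i in range(1000)]
department = [{"dept_id": d, "city": "Ajmer" if d == 3 else "Other"} for d in range(10)]

def nested_loop_join(r, s, attr):
    """Equi-join two relations on `attr` by comparing every pair of tuples."""
    return [{**x, **y} for x in r for y in s if x[attr] == y[attr]]

# Unoptimized order: join first, then apply the selection.
joined = nested_loop_join(employee, department, "dept_id")
late = [t for t in joined if t["city"] == "Ajmer"]

# Heuristic order: apply the selection first, then join the smaller input.
ajmer_depts = [d for d in department if d["city"] == "Ajmer"]
early = nested_loop_join(employee, ajmer_depts, "dept_id")

print(len(joined), len(early))  # 1000 100
```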
• External sorting is a technique for sorting data that is stored in secondary memory and is
too large to fit in main memory. The data is loaded into main memory part by part, each part
is sorted there, and the sorted data is written to intermediate files. Finally, these files are
merged to produce the sorted data. In this way, a very large amount of data can be sorted.
Because not all of the data can be held in main memory at once, part of it must be kept on
secondary storage such as a hard disk or compact disc.
• External sorting is required when the data to be sorted does not fit in main memory. It
consists of two phases:
• Sorting phase: the data is sorted in memory-sized pieces, and each sorted piece is written to
an intermediate file.
• Merge phase: the sorted files are combined into a single larger sorted file.
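The two phases can be sketched in Python with the standard `tempfile` and `heapq` modules. The function name `external_sort` and its `memory_limit` parameter are hypothetical. The sketch sorts integers, writes each sorted run to a temporary file, and then merges all runs at once with `heapq.merge`, which is a k-way merge rather than the pairwise merge used by two-way merge sort. For brevity the merged output is returned as a list, whereas a real external sort would stream it to disk.

```python
import heapq
import tempfile

def external_sort(values, memory_limit):
    """Sort integers while holding at most `memory_limit` of them in memory
    during the sorting phase; each sorted run is spilled to a temporary file."""
    run_files = []
    buffer = []

    def spill():
        # Sorting phase: sort the in-memory chunk and write it out as one run.
        buffer.sort()
        run = tempfile.TemporaryFile(mode="w+")
        run.writelines(f"{v}\n" for v in buffer)
        run.seek(0)
        run_files.append(run)
        buffer.clear()

    for v in values:
        buffer.append(v)
        if len(buffer) == memory_limit:
            spill()
    if buffer:
        spill()

    # Merge phase: k-way merge that reads each run file one line at a time.
    readers = [(int(line) for line in run) for run in run_files]
    merged = list(heapq.merge(*readers))
    for run in run_files:
        run.close()
    return merged

print(external_sort([5, 3, 9, 1, 7, 2], memory_limit=2))  # [1, 2, 3, 5, 7, 9]
```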
• One of the best-known examples of external sorting is external merge sort.
• External merge sort
• External merge sort is a technique in which the data is divided into parts, each part is
sorted independently and stored in an intermediate file, and the intermediate files are then
merged to produce the sorted data.
• For example, suppose 10,000 records have to be sorted using the external merge sort method,
and main memory can hold 500 records at a time, organized as 5 blocks of 100 records each.
• In this example, each time 5 blocks (500 records) are read into main memory, sorted, and
written to an intermediate file. This process is repeated 20 times to cover all 10,000
records. Then, pairs of intermediate files are merged in main memory to produce the sorted output.
• Two-Way Merge Sort
• Two-way merge sort works in two stages:
◦ Stage 1: Break the records into blocks, sort each block in memory, and write the sorted
blocks (runs) alternately onto two tapes.
◦ Stage 2: Merge pairs of sorted runs read from the two input tapes, writing the longer merged
runs alternately onto two output tapes; repeat until a single sorted file remains.
• Thus, two-way merge sort uses two input tapes and two output tapes to sort the data.
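The arithmetic of this example can be checked in a few lines of Python. It assumes, as in the text, that intermediate files are merged in pairs, so each merge pass halves the number of runs (rounding up).

```python
import math

N = 10_000      # records to sort
M = 500         # records that fit in main memory at once
BLOCK = 100     # records per block

blocks_per_run = M // BLOCK   # blocks sorted together in memory
runs = math.ceil(N / M)       # sorted intermediate files after the sorting phase

# Pairwise merging halves the number of runs per pass, rounding up.
remaining, merge_passes = runs, 0
while remaining > 1:
    remaining = math.ceil(remaining / 2)
    merge_passes += 1

print(blocks_per_run, runs, merge_passes)  # 5 20 5
```

The 20 runs shrink to 10, 5, 3, 2 and finally 1, so five merge passes follow the sorting phase.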
• Algorithm for Two-Way Merge Sort:
• Step 1) Divide the elements into blocks of size M. Sort each block and write it to disk
as a run.
• Step 2) Merge two runs:
◦ Read the first record of each of the two runs.
◦ Compare them and select the smaller one.
◦ Write that record to the output tape and read the next record from the same run; continue
until both runs are exhausted.
• Step 3) Repeat step 2 to get longer and longer runs on alternate tapes. Finally, a single
sorted list remains.
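Steps 1 to 3 can be simulated in Python, with lists standing in for the tapes. The names `merge_two_runs` and `two_way_merge_sort` are hypothetical, and the sketch keeps everything in memory, so it shows the order of operations rather than real disk or tape I/O.

```python
def merge_two_runs(a, b):
    """Step 2: repeatedly compare the front records of two runs and
    write the smaller one to the output."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    # One run is exhausted; copy the rest of the other one.
    out.extend(a[i:])
    out.extend(b[j:])
    return out

def two_way_merge_sort(records, m):
    """Balanced two-way merge sort; returns (sorted records, number of merge passes)."""
    # Step 1: sort blocks of size m and distribute the runs alternately on two tapes.
    runs = [sorted(records[k:k + m]) for k in range(0, len(records), m)]
    tapes = [runs[0::2], runs[1::2]]
    passes = 0
    # Step 3: keep merging pairs of runs until a single run remains.
    while len(tapes[0]) + len(tapes[1]) > 1:
        out_tapes = [[], []]
        for k in range(max(len(tapes[0]), len(tapes[1]))):
            a = tapes[0][k] if k < len(tapes[0]) else []
            b = tapes[1][k] if k < len(tapes[1]) else []
            # Merged runs go alternately onto the two output tapes.
            out_tapes[k % 2].append(merge_two_runs(a, b))
        tapes = out_tapes
        passes += 1
    result = tapes[0][0] if tapes[0] else []
    return result, passes

print(two_way_merge_sort([8, 3, 5, 1, 9, 2, 7, 4], 2))  # ([1, 2, 3, 4, 5, 7, 8, 9], 2)
```

Here four runs of two records become two runs of four and then one run of eight, matching $\lceil \log_2 4 \rceil = 2$ merge passes.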
• Analysis
• After the initial run-creation pass, this algorithm requires $\lceil \log_2(N/M) \rceil$ merge passes.
Since all N records are processed in each pass, the time complexity is $O(N \log(N/M))$.
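The pass count and complexity can be written out as a short derivation, assuming $N \ge 2M$ so that at least one merge pass is needed and the initial run-creation pass is absorbed into the $O(\cdot)$ bound:

```latex
\begin{aligned}
\text{initial runs} &= \lceil N/M \rceil \\
\text{merge passes} &= \lceil \log_2(N/M) \rceil \\
\text{records processed} &= N\,\bigl(1 + \lceil \log_2(N/M) \rceil\bigr) = O\bigl(N \log(N/M)\bigr)
\end{aligned}
```

For the earlier example ($N = 10{,}000$, $M = 500$) this gives $10{,}000 \times (1 + 5) = 60{,}000$ records processed in total.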