John Sospeter
Assistant Lecturer
Ruaha Catholic University
Department of Computer Science
Faculty of Information and Communication
Technology
RUCU
DATABASE SYSTEMS
[ RCS 321 ]
LECTURE _06
DATABASE QUERY PROCESSING &
OPTIMIZATION
QUERY
• We query data every day, from Google
searches to asking Siri for a funny
joke. Queries are simply questions against a set
of data.
• Queries can become very complex, involving
multiple tables and millions of records;
however, the basic concept is straightforward.
• In a database system, a "query" refers to the
action of retrieving data from the database.
• Queries are one of the things that
make databases so powerful.
EXAMPLE OF QUERIES
SELECT * FROM TableName;
SELECT * FROM Albums;
SELECT * FROM TableName WHERE Condition;
SELECT * FROM Albums WHERE ArtistId = 1;
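The example queries above can be tried out with Python's built-in sqlite3 module. This is a minimal sketch: the Albums table layout (AlbumId, Title) and the sample rows are assumptions made for illustration; only the table name Albums and the column ArtistId come from the slide.

```python
import sqlite3

# Hypothetical Albums table; every column except ArtistId is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Albums (AlbumId INTEGER PRIMARY KEY, Title TEXT, ArtistId INTEGER)"
)
conn.executemany(
    "INSERT INTO Albums (Title, ArtistId) VALUES (?, ?)",
    [("First Light", 1), ("Second Wind", 1), ("Other Side", 2)],
)

# Retrieve every row, then only the rows that satisfy the WHERE condition.
all_albums = conn.execute("SELECT * FROM Albums").fetchall()
artist_one = conn.execute("SELECT * FROM Albums WHERE ArtistId = 1").fetchall()

print(len(all_albums), "albums in total")
print(len(artist_one), "albums by artist 1")
```

The first query returns the whole table; the second returns only the rows for which the condition holds.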
Query processing
• Query processing is the set of activities
involved in getting the result of a query
expressed in a high-level language [usually SQL].
• With higher level database query languages such as
SQL and QUEL, a special component of the DBMS
called the Query Processor takes care of arranging the
underlying access routines to satisfy a given query.
• Because the query processor does this work,
queries [in SQL] can be specified in terms of the
required results rather than in terms of how to
achieve those results.
• A query is processed in four general steps.
FOUR GENERAL STEPS
1. Scanning and Parsing.
2. Query Optimization or planning the execution
strategy.
3. Query Code Generator [interpreted or
compiled].
4. Execution in the runtime database processor
[Figure: the main steps of query processing]
SCANNING & PARSING
• When a query is first submitted [via an
application program], it must be scanned and
parsed to determine whether its syntax is
valid. Scanning is the process of
breaking the query text into tokens.
• The tokenized representation is more compact
and is suitable for processing by the parser. This
representation may be in a tree form.
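A scanner of the kind described above can be sketched with a regular expression. The token classes and the keyword list below are assumptions chosen for illustration; a real SQL scanner recognises far more.

```python
import re

# Illustrative token classes only; these are not taken from any particular DBMS.
TOKEN_PATTERN = re.compile(
    r"\s*(?:(?P<number>\d+)"
    r"|(?P<word>[A-Za-z_][A-Za-z_0-9]*)"
    r"|(?P<string>'[^']*')"
    r"|(?P<symbol>>=|<=|<>|[*=<>,;().]))"
)
KEYWORDS = {"SELECT", "FROM", "WHERE", "AND", "OR"}


def scan(query):
    """Break the query text into (kind, text) tokens, left to right."""
    tokens = []
    pos = 0
    query = query.rstrip()
    while pos < len(query):
        match = TOKEN_PATTERN.match(query, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}: {query[pos]!r}")
        kind = match.lastgroup
        text = match.group(kind)
        if kind == "word":
            # Words are either reserved keywords or user-defined names.
            kind = "keyword" if text.upper() in KEYWORDS else "identifier"
        tokens.append((kind, text))
        pos = match.end()
    return tokens


tokens = scan("SELECT * FROM Albums WHERE ArtistId = 1;")
for kind, text in tokens:
    print(kind, text)
```

The resulting token list is what a parser would then check against the grammar of the query language.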
SCANNING & PARSING
• The Parser checks the tokenized representation
for correct syntax.
• In this stage, checks are made to determine if
columns and tables identified in the query exist
in the database and if the query has been
formed correctly with the appropriate keywords
and structure.
• If the query passes the parsing checks, then it is
passed on to the Query Optimizer.
QUERY OPTIMIZATION
Query Optimization or Planning the Execution Strategy
• For any given query, there may be a number of
different ways to execute it.
• Each operation in the query [SELECT, JOIN,
etc.] can be implemented using one or more
different Access Routines.
For example, an access routine that employs an
index to retrieve some rows would be more
efficient than an access routine that performs a
full table scan.
• The goal of the query optimizer is to find a
reasonably efficient strategy for executing the
query using the access routines.
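The difference between an index-based access routine and a full table scan can be observed in SQLite, which reports its chosen strategy through EXPLAIN QUERY PLAN. This is a sketch under assumptions: the Albums table layout and the index name idx_albums_artist are made up for illustration, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Albums (AlbumId INTEGER PRIMARY KEY, Title TEXT, ArtistId INTEGER)"
)


def plan_for(sql):
    """Return the plan descriptions SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[-1] for row in rows]  # the last column holds the description


query = "SELECT * FROM Albums WHERE ArtistId = 1"

# Without an index on ArtistId the only available routine is a table scan.
plan_without_index = plan_for(query)

# Hypothetical index name; once it exists the optimizer can search it instead.
conn.execute("CREATE INDEX idx_albums_artist ON Albums (ArtistId)")
plan_with_index = plan_for(query)

print("without index:", plan_without_index)
print("with index:   ", plan_with_index)
```

The same SQL text yields two different execution strategies, which is the decision the query optimizer makes on the user's behalf.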
Optimization typically takes one of two forms:
1. Heuristic Optimization.
2. Cost Based Optimization.
• Heuristic Optimization: The query execution is
refined based on heuristic rules for reordering
the individual operations.
• Cost Based Optimization: The overall cost of
executing the query is systematically reduced
by estimating the costs of executing several
different execution plans.
Query Code Generator [interpreted or
compiled]
• Once the query optimizer has determined the
execution plan [the specific ordering of access
routines], the code generator writes out the
actual access routines to be executed.
• With an interactive session, the query code is
interpreted and passed directly to the runtime
database processor for execution.
Execution in the runtime database
processor
• At this point, the query has been scanned,
parsed, planned and [possibly] compiled.
• The runtime database processor then executes
the access routines against the database.
• The results are returned to the application that
made the query in the first place.
• Any runtime errors are also returned.
COST OF QUERY
• The cost of processing a query is dominated by
disk access, that is, transferring data between memory
and the disk on which the information is stored.
• In short, it refers to the operations of reading and
writing stored information.
• It is difficult to include all the cost components in a
cost function. That is why most cost functions consider
only disk access cost as the reasonable measure of the
cost of a query-evaluation plan.
• For a given query, there are several possible
strategies for processing especially when the
query is complex. The difference between a good
strategy and a bad one may be several orders of
magnitude.
• Therefore, it is worthwhile for the system to spend
some time selecting a good strategy for
processing a query.
Here we give a brief introduction to the
different phases of query processing.
Different phases of query processing
• The function of the Query Parser is to parse and translate a
given high-level language query into an intermediate form
such as a relational algebra expression.
• The parser checks the syntax of the query and also its
semantics [that is, it verifies that the relation names and
attribute names used in the query exist in the database].
• A parse tree of the query is constructed and then
translated into a relational algebra expression.
• A relational algebra expression of a query specifies
only partially how to evaluate a query because in
general there are several ways to evaluate a
relational algebra expression.
EXAMPLE_01
Consider the SQL query:
  SELECT Salary FROM EMPLOYEE WHERE Salary >= 50000;
The possible relational algebra expressions for this query are:
  Π Salary (σ Salary >= 50000 (EMPLOYEE))
  σ Salary >= 50000 (Π Salary (EMPLOYEE))
• The query tree is a data structure that represents the
relational algebra expression in the query optimization
process.
– The leaf nodes in the query tree correspond to the input
relations of the query.
– The internal nodes represent the operators in the query.
• When executing the query, the system executes an
internal node operation as soon as its operands are
available, and then replaces that internal node with the
relation produced by the operation.
QUERY TREE
EXAMPLE_01
• Π Salary (σ Salary >= 50000 (EMPLOYEE))
• σ Salary >= 50000 (Π Salary (EMPLOYEE))
EXAMPLE_02
• Π NAME, ADDRESS (σ DNAME = 'HUMAN RESOURCE' (DEPARTMENT * EMPLOYEE))
• Π customer.name (σ customer.name = account.name ∧ account.balance > 2000 (customer × account))
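A query tree, with input relations at the leaves and operators at the internal nodes, can be sketched as a small data structure. The class names, the evaluate function and the sample EMPLOYEE rows below are hypothetical; the two trees correspond to the two relational algebra expressions of EXAMPLE_01.

```python
from dataclasses import dataclass
from typing import Callable, Union

Row = dict  # one tuple of a relation, keyed by attribute name


@dataclass
class Relation:  # leaf node: an input relation
    name: str
    rows: list


@dataclass
class Select:  # internal node: selection with a predicate
    predicate: Callable[[Row], bool]
    child: "Node"


@dataclass
class Project:  # internal node: projection onto a list of attributes
    attributes: list
    child: "Node"


Node = Union[Relation, Select, Project]


def evaluate(node):
    """Evaluate bottom-up: a node runs once its operand has been computed."""
    if isinstance(node, Relation):
        return node.rows
    operand = evaluate(node.child)
    if isinstance(node, Select):
        return [row for row in operand if node.predicate(row)]
    if isinstance(node, Project):
        return [{a: row[a] for a in node.attributes} for row in operand]
    raise TypeError(f"unknown node type: {type(node).__name__}")


# Made-up sample data for the EMPLOYEE relation.
employee = Relation("EMPLOYEE", [
    {"Name": "A", "Salary": 40000},
    {"Name": "B", "Salary": 50000},
    {"Name": "C", "Salary": 72000},
])
high_paid = lambda row: row["Salary"] >= 50000

tree_a = Project(["Salary"], Select(high_paid, employee))  # project over select
tree_b = Select(high_paid, Project(["Salary"], employee))  # select over project

result_a = evaluate(tree_a)
result_b = evaluate(tree_b)
print(result_a)
print(result_b)
```

Both trees produce the same relation, which is what makes them candidate plans for the same query.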
2: QUERY OPTIMIZATION
• Thus, in order to specify fully how to evaluate a query,
the system is responsible for constructing a query
execution plan which is made up of the relational
algebra expression and the detailed algorithms to
evaluate each operation in that expression.
• Moreover, the selected plan should minimize the cost
of query evaluation.
• The process of choosing a suitable query execution
plan is known as query optimization.
• This process is performed by the Query Optimizer. One
aspect of optimization occurs at the relational algebra
level.
• The system attempts to find an expression that is
equivalent to the given expression but that is more
efficient to execute.
• The other aspect involves the selection of a detailed
strategy for processing the query. This relates to
choosing the processing algorithm, choosing the
indices to use and so on.
QUERY OPTIMIZATION TECHNIQUES
1. Heuristic Query Optimisation.
[Heuristic rules for ordering operations in the query
execution plan.]
2. Cost-Based Query Optimisation.
[Estimating the cost of different query execution
plans based on statistical information.]
Most commercial DBMS query optimizers use
a combination of these two methods.
HEURISTIC QUERY OPTIMISATION
• Heuristic optimization applies rules to the initial
query expression and produces heuristically
transformed query expressions. A heuristic is a
rule that works well in most cases but is not
guaranteed to work in all of them.
• Example: a rule for transforming relational-algebra
expressions is to perform selection
operations as early as possible.
• This rule is based on the intuitive idea that
selection returns a subset of the input relation,
so applying selection early is likely to reduce
the size of the intermediate result.
• However, there are cases where performing
selection before join is not a good idea.
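The effect of performing selection early can be illustrated by counting intermediate rows. This is a sketch on made-up data (1000 employees, 10 departments); the helper names join and select are hypothetical, and the nested-loop join is only one of several possible join algorithms.

```python
# Made-up data: 1000 employees spread evenly over 10 departments.
employees = [{"emp_id": i, "dept_id": i % 10} for i in range(1000)]
departments = [{"dept_id": d, "dept_name": f"Dept {d}"} for d in range(10)]


def join(left, right, key):
    """Nested-loop equi-join on a shared attribute name."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


def select(rows, predicate):
    """Selection: keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


wanted = lambda row: row["dept_name"] == "Dept 3"

# Plan 1: join first, then select. The join result is the intermediate relation.
joined = join(employees, departments, "dept_id")
late = select(joined, wanted)

# Plan 2: select first, then join. The filtered DEPARTMENT is the intermediate relation.
one_dept = select(departments, wanted)
early = join(employees, one_dept, "dept_id")

print("intermediate rows, join first:  ", len(joined))
print("intermediate rows, select first:", len(one_dept))
print("same final answer:", late == early)
```

Both plans return the same rows, but the second one carries a far smaller intermediate result into the join.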
Transformation of Relational Expressions
• In this part we introduce how the heuristic rules
work.
• This involves transforming an initial expression
[tree] into an equivalent expression [tree] which is
more efficient to execute.
• Two relational algebra expressions are said to be
equivalent if they generate relations with the same
set of attributes and the same set of tuples,
although the attributes may be ordered differently.
Equivalence Rules for transforming relational expressions
Rule, Name, In Relational Algebra
1 Commutativity of Join and Cartesian Product operations
  E1 * E2 ≡ E2 * E1
  E1 ⋈ E2 ≡ E2 ⋈ E1
2 Associativity of Join and Cartesian Product operations
  (E1 * E2) * E3 ≡ E1 * (E2 * E3)
  The Join operation is associative in the following manner, where F1
  involves attributes from only E1 and E2, and F2 involves
  attributes from only E2 and E3:
  (E1 ⋈F1 E2) ⋈F2 E3 ≡ E1 ⋈F1 (E2 ⋈F2 E3)
3 Cascade of Projection
  ΠX1(ΠX2(…ΠXN(E)…)) ≡ ΠX1(E)
4 Cascade of Selection
  σ F1∧F2∧…∧FN (E) ≡ σ F1(σ F2(…σ FN(E)…))
5 Commutativity of Selection
  σ F1(σ F2(E)) ≡ σ F2(σ F1(E))
7 Selection with Cartesian Product and Join
  If the selection condition F involves only the attributes of
  one of the expressions, say E1, then the selection can be
  applied before the Join:
  σ F (E1 ⋈ E2) ≡ (σ F(E1)) ⋈ E2
  If the selection condition F = F1 AND F2, where F1
  involves only attributes of E1 and F2 involves
  only attributes of E2, then:
  σ F1∧F2 (E1 ⋈ E2) ≡ (σ F1(E1)) ⋈ (σ F2(E2))
  If the selection condition F = F1 AND F2, where F1
  involves only attributes of E1 and F2 involves
  attributes from both E1 and E2, then:
  σ F1∧F2 (E1 ⋈ E2) ≡ σ F2((σ F1(E1)) ⋈ E2)
  The same rules apply if the Join operation is replaced by a
  Cartesian Product operation.
8 Commuting Projection with Join and Cartesian Product
  Let X and Y be sets of attributes of E1 and E2
  respectively. If the join condition involves
  only attributes in XY (the union of the two sets), then:
  ΠXY(E1 ⋈ E2) ≡ ΠX(E1) ⋈ ΠY(E2)
  The same rule applies when the Join is replaced
  by a Cartesian Product.
  If the join condition involves additional
  attributes, say Z of E1 and W of E2, and Z, W
  are not in XY, then:
  ΠXY(E1 ⋈ E2) ≡ ΠXY(ΠXZ(E1) ⋈ ΠYW(E2))
9 Commuting Selection with set operations
  Selection commutes with all three set
  operations (union, intersection, set difference), for example:
  σ F (E1 ∩ E2) ≡ σ F(E1) ∩ σ F(E2)
10 Commuting Projection with Union
  Projection distributes over Union but, in general,
  not over Intersection or Set Difference:
  ΠX (E1 ∪ E2) ≡ ΠX(E1) ∪ ΠX(E2)
11 Commutativity of set operations
  Union and Intersection are commutative but Set
  Difference is not:
  E1 ∪ E2 ≡ E2 ∪ E1
  E1 ∩ E2 ≡ E2 ∩ E1
12 Associativity of set operations
  Union and Intersection are associative but Set
  Difference is not:
  (E1 ∪ E2) ∪ E3 ≡ E1 ∪ (E2 ∪ E3)
  (E1 ∩ E2) ∩ E3 ≡ E1 ∩ (E2 ∩ E3)
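A few of the equivalence rules can be spot-checked on small relations represented as Python sets of tuples. This is only a sanity check on made-up data, not a proof; the function name sigma and the conditions F1 and F2 are chosen here for illustration.

```python
# Relations as sets of (a, b) tuples; the data is made up for illustration.
E1 = {(1, 10), (2, 20), (3, 30), (4, 40)}
E2 = {(3, 30), (4, 40), (5, 50)}


def sigma(predicate, relation):
    """Selection: keep the tuples that satisfy the predicate."""
    return {t for t in relation if predicate(t)}


F1 = lambda t: t[0] >= 2       # condition on the first attribute
F2 = lambda t: t[1] <= 30      # condition on the second attribute
F1_and_F2 = lambda t: F1(t) and F2(t)

# Rule 4: a conjunctive selection equals a cascade of selections.
cascade_holds = sigma(F1_and_F2, E1) == sigma(F1, sigma(F2, E1))

# Rule 5: the order of two selections does not matter.
commute_holds = sigma(F1, sigma(F2, E1)) == sigma(F2, sigma(F1, E1))

# Rule 9: selection commutes with union, intersection and set difference.
set_ops_hold = all(
    sigma(F1, op(E1, E2)) == op(sigma(F1, E1), sigma(F1, E2))
    for op in (set.union, set.intersection, set.difference)
)

print(cascade_holds, commute_holds, set_ops_hold)
```

An optimizer relies on such equivalences being true for every database instance, which is why they are stated as algebraic rules rather than tested on data.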
Heuristic Algebraic Optimization algorithm
1. Break up any Selection operation with conjunctive conditions
into a cascade of Selection operations. This step is based on
equivalence rule number 4.
2. Move Selection operations as far down the query tree as
possible. This step uses the commutativity of Selection with
other operations, as stated in equivalence rules number 5, 6, 7 and 9.
3. Rearrange the leaf nodes of the tree so that the most restrictive
selections are done first. The most restrictive selection is the one
that produces the fewest tuples. In addition, make
sure that the ordering of leaf nodes does not cause a
Cartesian Product operation. This step relies on the
associativity of binary operations, as in rules 2 and 12.
4. Combine a Cartesian Product with a subsequent
Selection operation into a Join operation if the
selection condition represents a join condition [rule 13].
5. Break down lists of projection attributes and move them
down the tree as far as possible, creating new Projection
operations as needed [rules 3, 6, 8, 10].
6. Identify sub-trees that represent groups of operations
that can be pipelined, and execute them using
pipelining.
CONSIDER THE GIVEN DATABASE
TABLES WITH RELATIONSHIP
EXAMPLE_03
SQL Command
SELECT PNUMBER, DNUM, LNAME
FROM PROJECT, DEPARTMENT, EMPLOYEE
WHERE DNUM=DNUMBER and MGRSSN=SSN and
PLOCATION = 'Stafford';
In relational algebra this can be written as follows, where E, D and P
stand for EMPLOYEE, DEPARTMENT and PROJECT.
ΠPNUMBER,DNUM,LNAME(σPLOCATION='Stafford'(σMGRSSN=SSN(σDNUM=DNUMBER(E*(D*P)))))
Two query trees can be drawn from this
relational algebra expression.
Of these two query trees, which is more efficient?
Note: the expression contains two cross product operations, which
require a lot of space and time to build. An overall rule for
heuristic query optimization is to perform as many select and
project operations as possible before doing any joins.
PERFORM A HEURISTIC QUERY OPTIMIZATION ON THE
QUERY TREE.
Using rule 4 we break the cascade selections to get this
equivalent query tree.
Using rule 8, we commute Selection with Cross product
to get this query tree.
Finally using rule 7 we combine Cross Product and
Selection to form Joins.
CONVERTING A QUERY TREE TO A QUERY
EVALUATION PLAN
• Query optimizers use the above equivalence rules to
enumerate expressions that are logically equivalent
to the given query expression.
• However, generating expressions is just one part of the
optimization process.
• As mentioned earlier, the evaluation plan includes the
detailed algorithm for each operation in the expression
and how the execution of the operations is
coordinated.
• Thus the evaluation of the expression can be
costly in terms of both time and memory space.
• The obvious way to evaluate the expression is
simply to evaluate one operation at a time in an
appropriate order.
• The result of an individual evaluation will be stored
in a temporary relation, which must be written to
disk and might be used as the input for the next
evaluation.
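The contrast between evaluating one operation at a time with a stored temporary relation and pipelining the operations can be sketched with Python generators. This is an illustration only: the in-memory list named temporary stands in for a temporary relation that a DBMS would write to disk, and the EMPLOYEE rows are made up.

```python
def select(rows, predicate):
    """Pipelined selection: yields matching rows one at a time."""
    for row in rows:
        if predicate(row):
            yield row


def project(rows, attributes):
    """Pipelined projection: yields a trimmed copy of each incoming row."""
    for row in rows:
        yield {a: row[a] for a in attributes}


# Made-up sample data for the EMPLOYEE relation.
employee = [
    {"Name": "A", "Salary": 40000},
    {"Name": "B", "Salary": 50000},
    {"Name": "C", "Salary": 72000},
]
high_paid = lambda row: row["Salary"] >= 50000

# Materialized evaluation: the selection finishes and stores its full result
# before the projection starts reading it.
temporary = list(select(employee, high_paid))
materialized = list(project(temporary, ["Salary"]))

# Pipelined evaluation: each row flows through both operators in turn,
# so no intermediate relation is stored.
pipelined = list(project(select(employee, high_paid), ["Salary"]))

print(materialized)
print(pipelined)
```

Both approaches give the same answer; pipelining avoids the cost of writing and re-reading the temporary relation.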
COST-BASED QUERY OPTIMISATION
• The method of optimizing a query by
choosing the strategy that results in the minimum
cost is called cost-based query optimization.
• Cost-based query optimization uses
formulae that estimate the costs of a number
of options and selects the one with the lowest
estimated cost.
The cost estimation of a query evaluation plan is
calculated in terms of various resources that
include:
– Number of disk accesses
– Execution time taken by the CPU to execute a query
– Communication costs in distributed or parallel
database systems.
ESTIMATING COST QUERY OPTIMISATION
EXAMPLE: Cost Functions for SELECTION
Consider a selection operation on a relation whose tuples
are all stored in one file. The simplest algorithms to
implement selection are:
i. Linear search.
ii. Binary search.
LINEAR SEARCH
Linear search - scan all file blocks; all records in each
block are checked to see whether they satisfy the
search condition.
Cost for this method is
$C = b_r$
where $b_r$ is the number of blocks in the file.
For a selection on a key attribute, half of the blocks are scanned on average:
$C = \lceil b_r / 2 \rceil$
BINARY SEARCH
• Binary search - if the file is ordered on an attribute A
and the selection condition is an equality comparison on A,
we can use binary search.
The estimated number of blocks to be scanned is
$C = \lceil \log_2(b_r) \rceil + \lceil SC(A, r) / f_r \rceil - 1$
where $SC(A, r)$ is the number of records that satisfy the condition
and $f_r$ is the number of records per block.
The first term is the cost of locating the first tuple that satisfies
the condition by a binary search.
The second term is the number of blocks containing
records that satisfy the selection condition. One of these blocks has
already been retrieved, which is why the third term subtracts 1.
EXAMPLE:LINEAR SEARCH
Now, consider a selection on the EMPLOYEE file:
σDEPTID=1(EMPLOYEE).
The file EMPLOYEE has the following statistical information:
• f = 20 [there are 20 tuples per block]
• V(DeptID, EMPLOYEE) = 10 [there are 10 different
departments]
• n = 1000 [there are 1000 tuples in the file]
• Cost for doing linear search is b = 1000/20 = 50 block
accesses
EXAMPLE:BINARY SEARCH
Cost for doing binary search on ordering attribute
DEPTID
• Average number of records that satisfy the condition is
1000/10 = 100 records.
• Number of blocks containing these tuples is 100/20 = 5.
• A binary search for the first tuple takes
$\lceil \log_2 50 \rceil = 6$ block accesses, since $\log_2 50 \approx 5.64$.
Thus the total cost is 6 + 5 – 1 = 10 block accesses
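The two cost estimates for the EMPLOYEE example can be reproduced with a few lines of Python. The function and variable names are chosen here for illustration; the formulas follow the linear search and binary search cost functions given above, with square brackets read as rounding up.

```python
import math


def linear_search_cost(num_blocks, key_attribute=False):
    """Block accesses for a linear scan; half the file on average for a key lookup."""
    return math.ceil(num_blocks / 2) if key_attribute else num_blocks


def binary_search_cost(num_blocks, matching_records, records_per_block):
    """Blocks read to find the first match, plus the blocks holding all
    matches, minus the one block that has already been read."""
    locate_first = math.ceil(math.log2(num_blocks))
    blocks_with_matches = math.ceil(matching_records / records_per_block)
    return locate_first + blocks_with_matches - 1


n = 1000        # tuples in the EMPLOYEE file
f = 20          # tuples per block
distinct = 10   # V(DeptID, EMPLOYEE): number of different departments

b = math.ceil(n / f)     # blocks in the file
sc = n // distinct       # average number of records matching DEPTID = 1

linear = linear_search_cost(b)
binary = binary_search_cost(b, sc, f)

# A cost-based optimizer would keep the cheaper of the two estimates.
cheapest = min({"linear": linear, "binary": binary}.items(), key=lambda kv: kv[1])

print("linear search:", linear, "block accesses")
print("binary search:", binary, "block accesses")
print("chosen:", cheapest[0])
```

With these statistics the binary search is estimated at a fifth of the cost of the linear scan, so it is the access routine a cost-based optimizer would pick, provided the file is ordered on DEPTID.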
3: QUERY EXECUTION ENGINE
• Once the query plan is chosen, the Query Execution
Engine takes the plan, executes it and
returns the answer to the query.
TUTORIAL PROBLEMS
1. Review your knowledge of relational algebra and SQL.
2. Discuss and identify the differences and similarities
between SQL queries and relational algebra queries.
3. In your own language [including even Kiswahili], formally
describe ALL the steps of query processing shown above.
4. The above presentation has focused on the execution time
of a query. Discuss the cost of a query in terms of
memory and complexity.
END OF CHAPTER SIX
NEXT ON CHAPTER SEVEN
DATABASE SECURITY
THANK YOU!

More Related Content

PPTX
Query processing and optimization (updated)
PPTX
Query processing
PPTX
Ch-2-Query-Process.pptx advanced database
PPTX
700442110-advanced database Ch-2-Query-Process.pptx
PPTX
Query Processing in Database mgmt system
PPTX
Concepts of Query Processing in ADBMS.pptx
PPTX
Lecture 5.pptx
PPTX
DB LECTURE 5 QUERY PROCESSING.pptx
Query processing and optimization (updated)
Query processing
Ch-2-Query-Process.pptx advanced database
700442110-advanced database Ch-2-Query-Process.pptx
Query Processing in Database mgmt system
Concepts of Query Processing in ADBMS.pptx
Lecture 5.pptx
DB LECTURE 5 QUERY PROCESSING.pptx

Similar to LECTURE_06_DATABASE PROCESSING & OPTIMAZATION.pptx (20)

PDF
Chapter 2.pdf WND FWKJFW KSD;KFLWHFB ASNK
PPTX
Query processing
PPTX
Query processing and optimization on dbms
PPT
ch02-240507064009-ac337bf1 .ppt
PPT
QPOfutyfurfugfuyttruft7rfu65rfuyt PPT - Copy.ppt
PPTX
Query processing and Optimization in Database
PPT
Query optimization and processing for advanced database systems
PPT
Oracle query optimizer
PPT
Overview of query evaluation
PPTX
Presentación Oracle Database Migración consideraciones 10g/11g/12c
PPTX
Mc seminar
PPTX
Query optimization
PPTX
Advanced Database System Chapter Two Query processing and Optimization.pptx
PDF
Implementation of query optimization for reducing run time
PDF
Measures of query cost
PPT
Chapter15
PPTX
Query optimization
PPT
Query Decomposition and data localization
PPTX
Processes in Query Optimization in (ABMS) Advanced Database Management Systems
PPTX
Database Terminology and DBLC.pptx
Chapter 2.pdf WND FWKJFW KSD;KFLWHFB ASNK
Query processing
Query processing and optimization on dbms
ch02-240507064009-ac337bf1 .ppt
QPOfutyfurfugfuyttruft7rfu65rfuyt PPT - Copy.ppt
Query processing and Optimization in Database
Query optimization and processing for advanced database systems
Oracle query optimizer
Overview of query evaluation
Presentación Oracle Database Migración consideraciones 10g/11g/12c
Mc seminar
Query optimization
Advanced Database System Chapter Two Query processing and Optimization.pptx
Implementation of query optimization for reducing run time
Measures of query cost
Chapter15
Query optimization
Query Decomposition and data localization
Processes in Query Optimization in (ABMS) Advanced Database Management Systems
Database Terminology and DBLC.pptx
Ad

Recently uploaded (20)

PPTX
oil_refinery_comprehensive_20250804084928 (1).pptx
PDF
Recruitment and Placement PPT.pdfbjfibjdfbjfobj
PPT
ISS -ESG Data flows What is ESG and HowHow
PPTX
Business Acumen Training GuidePresentation.pptx
PPTX
Introduction to Knowledge Engineering Part 1
PPTX
Data_Analytics_and_PowerBI_Presentation.pptx
PPTX
Introduction to Basics of Ethical Hacking and Penetration Testing -Unit No. 1...
PDF
.pdf is not working space design for the following data for the following dat...
PPTX
Introduction to machine learning and Linear Models
PPTX
MODULE 8 - DISASTER risk PREPAREDNESS.pptx
PPTX
The THESIS FINAL-DEFENSE-PRESENTATION.pptx
PDF
168300704-gasification-ppt.pdfhghhhsjsjhsuxush
PDF
Lecture1 pattern recognition............
PDF
BF and FI - Blockchain, fintech and Financial Innovation Lesson 2.pdf
PPT
Reliability_Chapter_ presentation 1221.5784
PDF
Galatica Smart Energy Infrastructure Startup Pitch Deck
PDF
Mega Projects Data Mega Projects Data
PDF
Clinical guidelines as a resource for EBP(1).pdf
PPTX
AI Strategy room jwfjksfksfjsjsjsjsjfsjfsj
PPTX
IBA_Chapter_11_Slides_Final_Accessible.pptx
oil_refinery_comprehensive_20250804084928 (1).pptx
Recruitment and Placement PPT.pdfbjfibjdfbjfobj
ISS -ESG Data flows What is ESG and HowHow
Business Acumen Training GuidePresentation.pptx
Introduction to Knowledge Engineering Part 1
Data_Analytics_and_PowerBI_Presentation.pptx
Introduction to Basics of Ethical Hacking and Penetration Testing -Unit No. 1...
.pdf is not working space design for the following data for the following dat...
Introduction to machine learning and Linear Models
MODULE 8 - DISASTER risk PREPAREDNESS.pptx
The THESIS FINAL-DEFENSE-PRESENTATION.pptx
168300704-gasification-ppt.pdfhghhhsjsjhsuxush
Lecture1 pattern recognition............
BF and FI - Blockchain, fintech and Financial Innovation Lesson 2.pdf
Reliability_Chapter_ presentation 1221.5784
Galatica Smart Energy Infrastructure Startup Pitch Deck
Mega Projects Data Mega Projects Data
Clinical guidelines as a resource for EBP(1).pdf
AI Strategy room jwfjksfksfjsjsjsjsjfsjfsj
IBA_Chapter_11_Slides_Final_Accessible.pptx
Ad

LECTURE_06_DATABASE PROCESSING & OPTIMAZATION.pptx

  • 1. John Sospeter Assistant Lecturer Ruaha Catholic University Department of Computer Science Faculty of Information and Communication Technology RUCU DATABASE SYSTEMS [ RCS 321 ]
  • 2. LECTURE _06 DATABASE QUERY PROCESSING & OPTIMAZATION
  • 3. QUERY • We query data every day, from Google searches to asking Siri for a funny joke. Queries are simply questions against a set of data. • They can become very complex, involving multiple tables and millions of records; however, the basic concept is straightforward and not very complex.
  • 4. QUERY • In database system a "query" refers to the action of retrieving data from the database. • Queries are one of the things that make databases so powerful.
  • 5. EXAMPLE OF QUERIES SELECT * FROM TableName; SELECT * FROM Albums; SELECT * FROM TableName WHERE Condition; SELECT * FROM Albums WHERE ArtistId = 1;
  • 6. Query processing • Query processing: Is a set of activities involved in getting the result of a query expressed in a high-level language [usually SQL].
  • 7. • With higher level database query languages such as SQL and QUEL, a special component of the DBMS called the Query Processor takes care of arranging the underlying access routines to satisfy a given query. • Due to query processor [using SQL] the queries can be specified in terms of the required results rather than in terms of how to achieve those results. • A query is processed in four general steps Query processing
  • 8. FOUR GENERAL STEPS 1. Scanning and Parsing. 2. Query Optimization or planning the execution strategy. 3. Query Code Generator [interpreted or compiled]. 4. Execution in the runtime database processor
  • 12. SCANNING & PARSING • When a query is first submitted [via an applications program], it must be scanned and parsed to determine if the query consists of appropriate syntax. Scanning is the process of breaking the query text into tokens. • The tokenized representation is more compact and is suitable for processing by the parser. This representation may be in a tree form.
  • 13. SCANNING & PARSING • The Parser checks the tokenized representation for correct syntax. • In this stage, checks are made to determine if columns and tables identified in the query exist in the database and if the query has been formed correctly with the appropriate keywords and structure. • If the query passes the parsing checks, then it is passed on to the Query Optimizer.
  • 15. Query Optimization or Planning the Execution Strategy • For any given query, there may be a number of different ways to execute it. • Each operation in the query [SELECT, JOIN, etc.] can be implemented using one or more different Access Routines.
  • 16. Query Optimization or Planning the Execution Strategy For example, an access routine that employs an index to retrieve some rows would be more efficient than an access routine that performs a full table scan. • The goal of the query optimizer is to find a reasonably efficient strategy for executing the query using the access routines.
  • 17. Query Optimization or Planning the Execution Strategy Optimization typically takes one of two forms: 1. Heuristic Optimization. 2. Cost Based Optimization.
  • 18. Query Optimization or Planning the Execution Strategy • Heuristic Optimization: The query execution is refined based on heuristic rules for reordering the individual operations. • Cost Based Optimization: The overall cost of executing the query is systematically reduced by estimating the costs of executing several different execution plans.
  • 19. Query Code Generator [interpreted or compiled] • Once the query optimizer has determined the execution plan [the specific ordering of access routines], the code generator writes out the actual access routines to be executed. • With an interactive session, the query code is interpreted and passed directly to the runtime database processor for execution.
  • 20. Execution in the runtime database processor • At this point, the query has been scanned, parsed, planned and [possibly] compiled. • The runtime database processor then executes the access routines against the database.
  • 21. Execution in the runtime database processor • The results are returned to the application that made the query in the first place. • Any runtime errors are also returned.
  • 22. COST OF QUERY • The cost of processing of query is dominated by the disk access, meaning memory access to the computer disk on which information is stored. • In short it refers to operation of reading or writing stored information. • It is difficult to include all the cost components in a cost function. That is why most cost functions consider only disk access cost as the reasonable measure of the cost of a query-evaluation plan.
  • 23. COST OF QUERY • For a given query, there are several possible strategies for processing especially when the query is complex. The difference between a good strategy and a bad one may be several orders of magnitude. • Therefore, it is worthwhile for the system to spend some time on selecting good strategies for processing query. Here we give a brief introduction about the difference phases in query processing process.
  • 24. Difference phases in query processing process • The functions of Query Parser is parsing and translating a given high-level language query into its immediate form such as relational algebra expressions. • The parser needs to check the syntax of the query and to also check for the semantic of the query [it means verifying the relation names, the attribute names in the query are in the database]. SCANNING & PARSING
  • 25. • A parse-tree of the query is constructed and then translated into relational algebra expression. • A relational algebra expression of a query specifies only partially how to evaluate a query because in general there are several ways to evaluate a relational algebra expression.
  • 26. EXAMPLE_01 consider the SQL query.  SELECT Salary FROM EMPLOYEE WHERE Salary >= 50000; The possible relational algebra expressions for this query are  Π Salary ( σ salary >= 50000 EMPLOYEE)  σ salary >= 50000 (Π Salary EMPLOYEE)
  • 27. • The query tree is a data structure that represents the relational algebra expression in the query optimization process. – The leaf nodes in the query tree correspond to the input relations of the query. – The internal nodes represent the operators in the query. • When executing the query, the system will execute an internal node operation whenever its operands are available and then the internal node is replaced by the relation which is obtained from the preceding execution QUERY TREE
  • 28. EXAMPLE_01 • Π Salary ( σ salary >= 50000 EMPLOYEE) • σ salary >= 50000 (Π Salary EMPLOYEE) QUERY TREE
  • 31. 2: QUERY OPTIMIZATION • Thus, in order to specify fully how to evaluate a query, the system is responsible for constructing a query execution plan which is made up of the relational algebra expression and the detailed algorithms to evaluate each operation in that expression. • Moreover, the selected plan should minimize the cost of query evaluation. • The process of choosing a suitable query execution plan is known as query optimization.
  • 33. 2: QUERY OPTIMIZATION • This process is performed by Query Optimizer. One aspect of optimization occurs at relational algebra level. • The system attempts to find an expression that is equivalent to the given expression but that is more efficient to execute. • The other aspect involves the selection of a detailed strategy for processing the query, this relates to choosing the processing algorithm, choosing the indices to use and so on.
  • 34. QUERY OPTIMIZATION TECHNIQUES 1. Heuristic Query Optimisation. [Heuristic rules for ordering operations in the query execution plan.] 2. Estimating Cost Query Optimisation. [Estimating the cost of different query execution plans based on the systematic information.] Most of the commercial DBMS query optimizers use the combination of these two methods.
  • 35. HEURISTIC QUERY OPTIMISATION • Heuristic optimization applies rules to the initial query expression and produces the heuristically transformed query expressions. A heuristic is a rule that works well in most cases but not always guaranteed. • Example: A rule for transforming relational- algebra expression is perform selection operations as early as possible.
  • 36. HEURISTIC QUERY OPTIMISATION • This rule is based on the intuitive idea that selection is the operation that gives a subset of the input relation such that applying selection early might reduce the immediate result size. • However, there are cases where performing selection before join is not a good idea.
  • 37. Transformation of Relational Expressions • In this part we introduce how the heuristic rules work. • This involves transforming an initial expression [tree] into an equivalent expression [tree] which is more efficient to execute. • Two relational algebra expressions are said to be equivalent if the two expressions generate two relations of the same set of attributes and contain the same set of tuples although their attributes may be ordered differently.
  • 38. Equivalence Rules for transforming relational expressions Rule Name In Relational Algebra 1 Commutatively of Join, Cartesian Product operations E1*E2≡E2*E1 E1 E2≡ E2 E1 2 Associatively of Join , Cartesian Product operations Join operation is associative in the following manner. F1 involves attributes from only E1 and E2 and F2 involves only attributes from E2 and E3 (E1*E2)*E3≡E1*(E2*E3) (E1 E2) E3≡ E1 (E2 E3) 3 Cascade of Projection ΠX1(ΠX2…ΠXN(E)…)≡ΠXI(E) 4 Cascade of selection σ F1∩F2∩…FN(E) ≡ σ F1(σ F2(…σ FN(E) …)) 5 Commutatively of Selection σ F1(σ F2(E))≡ σ F2(σ F1(E)) 7 Selection with Cartesian Product and Join If all the attributes in the selection condition F involve only the attributes of one of the expression say E1, then the selection and Join can be combined as follows σ F (E1 E2) ≡ (σ F(E1)) E2 If the selection condition F = F1 AND F2 where F1 involves only attributes of expression E1 and F2 involves only attribute of expression E2 then we have this σ F1∩F2( E1 E2) ≡ (σ F1(E1)) (σ F2(E2)) If the selection condition F = F1 AND F2 where F1 involves only attributes of expression E1 and F2 involves attributes from both E1 and E2 then we have. The same rule apply if the Join operation replaced by a Cartersian Product operation. σ F1∩F2( E1 E2) ≡ σ F2((σ F1 (E1)) E2)
  • 39. Equivalence Rules for transforming relational expressions (continued)
      Rule 8. Commuting Projection with Join and Cartesian Product: let X and Y be sets of attributes of E1 and E2 respectively.
        - If the join condition involves only attributes in XY (the union of the two sets): ΠXY(E1 ⋈ E2) ≡ ΠX(E1) ⋈ ΠY(E2). The same rule applies when the Join is replaced by a Cartesian Product.
        - If the join condition involves additional attributes, say Z of E1 and W of E2, where Z and W are not in XY: ΠXY(E1 ⋈ E2) ≡ ΠXY(ΠXZ(E1) ⋈ ΠYW(E2))
      Rule 9. Commuting Selection with set operations: Selection commutes with all three set operations (Union, Intersection, Set Difference), for example σF(E1 ∩ E2) ≡ σF(E1) ∩ σF(E2)
      Rule 10. Commuting Projection with Union: ΠX(E1 ∪ E2) ≡ ΠX(E1) ∪ ΠX(E2). The rule does not hold in general for Intersection or Set Difference, because two different tuples can agree on the attributes in X.
      Rule 11. Commutativity of set operations: Union and Intersection are commutative but Set Difference is not, for example E1 ∩ E2 ≡ E2 ∩ E1
      Rule 12. Associativity of set operations: Union and Intersection are associative but Set Difference is not, for example (E1 ∩ E2) ∩ E3 ≡ E1 ∩ (E2 ∩ E3)
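The equivalence rules can be checked on small relations. The sketch below is only an illustration, not part of the lecture: the relations `emp` and `dept`, their contents, and the helper names `select`, `join` and `same_relation` are all assumptions made up for the example. It tests rules 1, 4, 5 and the first case of rule 7.

```python
# Relations are modelled as lists of dicts (attribute name -> value).
# All table contents below are invented for illustration.

def select(pred, rel):
    """sigma_F(E): keep the tuples that satisfy pred."""
    return [t for t in rel if pred(t)]

def join(cond, r1, r2):
    """E1 join_F E2: merged pairs of tuples that satisfy the join condition."""
    return [{**a, **b} for a in r1 for b in r2 if cond(a, b)]

def same_relation(r1, r2):
    """Equivalence test: same set of tuples, attribute order ignored."""
    def as_set(rel):
        return {frozenset(t.items()) for t in rel}
    return as_set(r1) == as_set(r2)

emp = [
    {"ssn": 1, "lname": "Smith", "dno": 5},
    {"ssn": 2, "lname": "Wong", "dno": 5},
    {"ssn": 3, "lname": "Zelaya", "dno": 4},
]
dept = [
    {"dnumber": 4, "dname": "Administration"},
    {"dnumber": 5, "dname": "Research"},
]

def on_dno(e, d):
    return e["dno"] == d["dnumber"]      # join condition

def f1(t):
    return t["dno"] == 5                 # involves only attributes of emp

def f2(t):
    return t["ssn"] > 1                  # involves only attributes of emp

# Rule 1: commutativity of join
rule1 = same_relation(join(on_dno, emp, dept),
                      join(lambda d, e: on_dno(e, d), dept, emp))
# Rule 4: cascade of selection
rule4 = same_relation(select(lambda t: f1(t) and f2(t), emp),
                      select(f1, select(f2, emp)))
# Rule 5: commutativity of selection
rule5 = same_relation(select(f1, select(f2, emp)),
                      select(f2, select(f1, emp)))
# Rule 7, first case: selection applied after or before the join
late = select(f1, join(on_dno, emp, dept))
early = join(on_dno, select(f1, emp), dept)
rule7 = same_relation(late, early)

print(rule1, rule4, rule5, rule7)
```

A check on one small instance does not prove a rule, but it is a quick way to catch a wrongly stated one.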
  • 40. Heuristic Algebraic Optimization algorithm
      1. Break up any Selection operation with conjunctive conditions into a cascade of Selection operations. This step is based on equivalence rule 4.
      2. Move Selection operations as far down the query tree as possible. This step uses the commutativity of Selection with itself and with other operations, as stated in equivalence rules 5, 6, 7 and 9.
      3. Rearrange the leaf nodes of the tree so that the most restrictive selections are done first. The most restrictive selection is the one that produces the fewest tuples. In addition, make sure that the ordering of the leaf nodes does not cause a Cartesian Product operation. This step relies on the associativity of binary operations (rules 2 and 12).
  • 41. Heuristic Algebraic Optimization algorithm (continued)
      4. Combine a Cartesian Product with a subsequent Selection operation into a Join operation if the selection condition represents a join condition [rule 13].
      5. Break down and move lists of projections down the tree as far as possible, creating new Projection operations as needed [rules 3, 6, 8, 10].
      6. Identify sub-trees that represent groups of operations that can be pipelined, and execute them using pipelining.
  • 42. CONSIDER THE GIVEN DATABASE
  • 44. EXAMPLE_03 SQL command: SELECT PNUMBER, DNUM, LNAME FROM PROJECT, DEPARTMENT, EMPLOYEE WHERE DNUM = DNUMBER AND MGRSSN = SSN AND PLOCATION = 'Stafford'; In relational algebra, writing E, D and P for EMPLOYEE, DEPARTMENT and PROJECT and × for Cartesian Product, this can be written as follows. ΠPNUMBER,DNUM,LNAME(σPLOCATION='Stafford'(σMGRSSN=SSN(σDNUM=DNUMBER(E × (D × P)))))
  • 45. These two query trees are possible from the relational algebra expression. Of these two query trees, which is more efficient? Note the two cross product operations: these require a lot of space and time to build. An overall rule for heuristic query optimization is to perform as many select and project operations as possible before doing any joins.
  • 46. PERFORM A HEURISTIC QUERY OPTIMIZATION ON THE QUERY TREE. Using rule 4, we break the selection up into a cascade of selections to get this equivalent query tree.
  • 47. Using rule 7, we commute Selection with Cross Product to get this query tree.
  • 48. Finally, using rule 13 (step 4 of the algorithm), we combine Cross Product and Selection to form Joins.
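To see why the optimized tree is cheaper, the example query can be run both ways on a toy database. This is a rough sketch under stated assumptions: the three table instances are invented, and the functions `naive_plan` and `optimized_plan` are hypothetical names that simply count how many intermediate tuples each strategy builds.

```python
from itertools import product

# Toy instances; every value here is invented for illustration.
EMPLOYEE = [{"SSN": s, "LNAME": n} for s, n in
            [(1, "Smith"), (2, "Wong"), (3, "Wallace"), (4, "Borg")]]
DEPARTMENT = [{"DNUMBER": 1, "MGRSSN": 4},
              {"DNUMBER": 4, "MGRSSN": 3},
              {"DNUMBER": 5, "MGRSSN": 2}]
PROJECT = [{"PNUMBER": 10, "PLOCATION": "Stafford", "DNUM": 4},
           {"PNUMBER": 20, "PLOCATION": "Houston", "DNUM": 1},
           {"PNUMBER": 30, "PLOCATION": "Stafford", "DNUM": 4},
           {"PNUMBER": 1, "PLOCATION": "Bellaire", "DNUM": 5}]

def answer(rows):
    """Final projection on PNUMBER, DNUM, LNAME, sorted for comparison."""
    return sorted((r["PNUMBER"], r["DNUM"], r["LNAME"]) for r in rows)

def naive_plan():
    """All selections applied on top of E x (D x P)."""
    dp = [{**d, **p} for d, p in product(DEPARTMENT, PROJECT)]
    edp = [{**e, **x} for e, x in product(EMPLOYEE, dp)]
    rows = [r for r in edp
            if r["DNUM"] == r["DNUMBER"]
            and r["MGRSSN"] == r["SSN"]
            and r["PLOCATION"] == "Stafford"]
    # Intermediate tuples built before any selection is applied.
    return answer(rows), len(dp) + len(edp)

def optimized_plan():
    """Selection on PROJECT first, then two joins instead of cross products."""
    p = [r for r in PROJECT if r["PLOCATION"] == "Stafford"]
    pd = [{**x, **d} for x in p for d in DEPARTMENT
          if x["DNUM"] == d["DNUMBER"]]
    pde = [{**x, **e} for x in pd for e in EMPLOYEE
           if x["MGRSSN"] == e["SSN"]]
    return answer(pde), len(p) + len(pd) + len(pde)

naive_result, naive_size = naive_plan()
opt_result, opt_size = optimized_plan()
print(naive_result == opt_result, naive_size, opt_size)
```

Both plans return the same answer, but on this toy data the naive plan builds 60 intermediate tuples while the optimized plan builds 6. The nested loops used for the joins are only a stand-in; a real DBMS would choose a join algorithm for each operation.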
  • 49. CONVERTING A QUERY TREE TO A QUERY EVALUATION PLAN • Query optimizers use the above equivalence rules to enumerate expressions that are logically equivalent to the given query expression. • However, generating expressions is just one part of the optimization process. • As mentioned earlier, the evaluation plan includes the detailed algorithm for each operation in the expression and how the execution of the operations is coordinated.
  • 50. • Thus the evaluation of the expression can be costly in terms of both time and memory space. • The obvious way to evaluate the expression is simply to evaluate one operation at a time in an appropriate order. • The result of each individual evaluation is stored in a temporary relation, which must be written to disk and may be used as the input for the next evaluation.
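The difference between evaluating one operation at a time and pipelining can be sketched with a selection followed by a projection. This is an assumption-laden illustration: the `EMPLOYEE` tuples are invented, and in-memory Python lists stand in for the temporary relations that a DBMS would write to disk.

```python
# Hypothetical relation of (ssn, dept_id, salary) tuples, invented for illustration.
EMPLOYEE = [(i, i % 10, 1000 + 10 * i) for i in range(1000)]

def materialized(rel):
    """Evaluate one operation at a time; each result is a full temporary relation."""
    temp1 = [t for t in rel if t[1] == 1]        # selection on dept_id = 1
    temp2 = [(t[0], t[2]) for t in temp1]        # projection on (ssn, salary)
    return temp2

def pipelined(rel):
    """Pass each tuple through selection and projection without temporaries."""
    selected = (t for t in rel if t[1] == 1)     # generator: nothing stored yet
    projected = ((t[0], t[2]) for t in selected)
    return projected                             # tuples are produced on demand

result_a = materialized(EMPLOYEE)
result_b = list(pipelined(EMPLOYEE))
print(result_a == result_b, len(result_a))
```

Both versions give the same 100 tuples; the pipelined one never holds the intermediate selection result as a whole, which is the saving that step 6 of the heuristic algorithm aims for.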
  • 51. COST-BASED QUERY OPTIMISATION • The method of optimizing a query by choosing the strategy that results in minimum cost is called cost-based query optimization. • Cost-based query optimization uses formulae that estimate the costs of a number of options and selects the one with the lowest estimated cost.
  • 52. COST-BASED QUERY OPTIMISATION The cost estimation of a query evaluation plan is calculated in terms of various resources that include: – Number of disk accesses – Execution time taken by the CPU to execute a query – Communication costs in distributed or parallel database systems.
  • 53. ESTIMATING COST QUERY OPTIMISATION EXAMPLE: Cost Functions for SELECTION Consider a selection operation on a relation whose tuples are all stored in one file. The simplest algorithms to implement selection are: i. Linear search. ii. Binary search.
  • 54. LINEAR SEARCH • Linear search scans the file blocks and checks every record in each block to see whether it satisfies the search condition. • In general all blocks are scanned, so the cost is C = b_r, where b_r is the number of blocks of relation r. • For a selection on a key attribute the scan can stop at the first match, so half of the blocks are scanned on average: C = ⌈b_r/2⌉.
  • 55. BINARY SEARCH • If the file is ordered on an attribute A and the selection condition is an equality comparison on A, we can use binary search. • The estimated number of blocks to be scanned is C = ⌈log2(b_r)⌉ + ⌈SC(A, r)/f_r⌉ − 1, where SC(A, r) is the average number of records that satisfy the condition and f_r is the number of records per block. • The first term is the cost of locating the first satisfying tuple by binary search. • The second term is the number of blocks containing records that satisfy the selection condition. One of these blocks has already been retrieved, which is why 1 is subtracted.
  • 56. EXAMPLE: LINEAR SEARCH • Now consider a selection on the EMPLOYEE file: σDEPTID=1(EMPLOYEE). The file EMPLOYEE has the following statistical information: • f = 20 [there are 20 tuples per block] • V(DeptID, EMPLOYEE) = 10 [there are 10 different departments] • n = 1000 [there are 1000 tuples in the file] • Cost for doing linear search is b = 1000/20 = 50 block accesses
  • 57. EXAMPLE: BINARY SEARCH • Cost for doing binary search on the ordering attribute DEPTID: • Average number of records that satisfy the condition is 1000/10 = 100 records. • Number of blocks containing these tuples is 100/20 = 5. • A binary search for the first tuple takes ⌈log2(50)⌉ = 6 block accesses (log2(50) ≈ 5.64, rounded up). • Thus the total cost is 6 + 5 − 1 = 10 block accesses
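The two cost formulas can be written as small functions and checked against the EMPLOYEE example. This is a minimal sketch: the function and parameter names are made up for illustration, and the selection cardinality is estimated as n / V, which assumes records are spread evenly over the distinct values, as the example does.

```python
import math

def linear_search_cost(n_tuples, blocking_factor, on_key=False):
    """Block accesses for a linear scan: b_r, or ceil(b_r / 2) on a key attribute."""
    b_r = math.ceil(n_tuples / blocking_factor)
    return math.ceil(b_r / 2) if on_key else b_r

def binary_search_cost(n_tuples, blocking_factor, distinct_values):
    """ceil(log2(b_r)) + ceil(SC / f_r) - 1 for an equality on the ordering attribute."""
    b_r = math.ceil(n_tuples / blocking_factor)
    # Assumption: uniform distribution, so SC(A, r) = n / V(A, r).
    selection_cardinality = n_tuples / distinct_values
    return (math.ceil(math.log2(b_r))
            + math.ceil(selection_cardinality / blocking_factor)
            - 1)

# Statistics from the example: n = 1000, f = 20, V(DeptID, EMPLOYEE) = 10.
print(linear_search_cost(1000, 20))        # 50 block accesses
print(binary_search_cost(1000, 20, 10))    # 10 block accesses
```

With these statistics binary search costs one fifth of the linear scan, but only because the file is ordered on DEPTID; on an unordered file the linear scan is the only one of the two that applies.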
  • 58. 3: QUERY EXECUTION ENGINE • Once the query plan is chosen, the Query Execution Engine takes the plan, executes it and returns the answer to the query.
  • 59. TUTORIAL PROBLEMS 1. Review your knowledge of Relational algebra and SQL. 2. Discuss and identify the differences and similarities between SQL queries and Relational algebra queries. 3. In your own language [including even Kiswahili], formally describe ALL the steps of Query processing shown above. 4. In the above presentation, the cost of a query has been discussed in terms of its execution time. Discuss the costs in terms of memory and complexity.
  • 61. NEXT ON CHAPTER SEVEN DATABASE SECURITY