LECTURE_06_DATABASE PROCESSING & OPTIMAZATION.pptx

John Sospeter
Assistant Lecturer
Ruaha Catholic University
Department of Computer Science
Faculty of Information and Communication Technology
RUCU
DATABASE SYSTEMS
[ RCS 321 ]

LECTURE_06
DATABASE QUERY PROCESSING & OPTIMIZATION

QUERY
• We query data every day, from Google searches to asking Siri for a funny joke. Queries are simply questions asked against a set of data.
• Queries can become very complex, involving multiple tables and millions of records; however, the basic concept is straightforward.

• In a database system, a "query" is a request to retrieve data from the database.
• Queries are a large part of what makes databases powerful: a single statement can retrieve exactly the rows and columns a user needs.

EXAMPLE OF QUERIES
SELECT * FROM TableName;
SELECT * FROM Albums;
SELECT * FROM TableName WHERE Condition;
SELECT * FROM Albums WHERE ArtistId = 1;
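The queries above can be tried out with Python's built-in sqlite3 module. This is a minimal sketch: the Albums columns (AlbumId, Title, ArtistId) and the three sample rows are hypothetical, chosen only so that the WHERE clause has something to filter.

```python
import sqlite3

# In-memory database with a small, hypothetical Albums table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Albums (AlbumId INTEGER PRIMARY KEY, Title TEXT, ArtistId INTEGER)")
conn.executemany(
    "INSERT INTO Albums (AlbumId, Title, ArtistId) VALUES (?, ?, ?)",
    [(1, "First Album", 1), (2, "Second Album", 2), (3, "Third Album", 1)],
)

# SELECT * FROM Albums;  returns every row of the table.
all_rows = conn.execute("SELECT * FROM Albums").fetchall()

# SELECT * FROM Albums WHERE ArtistId = 1;  returns only rows satisfying the condition.
artist_1 = conn.execute("SELECT * FROM Albums WHERE ArtistId = 1").fetchall()

print(len(all_rows), sorted(artist_1))
conn.close()
```

Each query states only which rows are wanted; how SQLite finds them is left to its query processor, which is the subject of the rest of this lecture.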

Query processing
• Query processing is the set of activities involved in getting the result of a query expressed in a high-level language [usually SQL].
• In high-level database query languages such as SQL and QUEL, a special component of the DBMS called the Query Processor takes care of arranging the underlying access routines to satisfy a given query.
• Because the query processor handles this, queries can be specified in terms of the required results rather than in terms of how to achieve those results.
• A query is processed in four general steps.

FOUR GENERAL STEPS
1. Scanning and Parsing.
2. Query Optimization or planning the execution
strategy.
3. Query Code Generator [interpreted or
compiled].
4. Execution in the runtime database processor


SCANNING & PARSING
• When a query is first submitted [for example, by an application program], it must be scanned and parsed to determine whether its syntax is valid. Scanning is the process of breaking the query text into tokens.
• The tokenized representation is more compact and is suitable for processing by the parser. This representation may be in a tree form.
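Scanning can be sketched with a regular-expression tokenizer. The token classes and the small keyword set below are assumptions made for illustration; a real DBMS lexer handles far more of SQL (quoted identifiers, comments, many more keywords).

```python
import re

# Hypothetical token classes for a tiny SQL subset (not a full SQL lexer).
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("STRING", r"'[^']*'"),
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("OP", r">=|<=|<>|=|<|>"),
    ("PUNCT", r"[,;*().]"),
    ("SKIP", r"\s+"),
    ("MISMATCH", r"."),
]
KEYWORDS = {"SELECT", "FROM", "WHERE", "AND", "OR"}
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def scan(query):
    """Break query text into a list of (kind, text) tokens."""
    tokens = []
    for m in MASTER.finditer(query):
        kind, text = m.lastgroup, m.group()
        if kind == "SKIP":
            continue  # whitespace separates tokens but is not itself a token
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {text!r} at position {m.start()}")
        if kind == "IDENT" and text.upper() in KEYWORDS:
            kind = "KEYWORD"
        tokens.append((kind, text))
    return tokens


print(scan("SELECT Salary FROM EMPLOYEE WHERE Salary >= 50000;"))
```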

• The parser checks the tokenized representation for correct syntax.
• At this stage, checks are also made to determine whether the columns and tables identified in the query exist in the database and whether the query has been formed correctly with the appropriate keywords and structure.
• If the query passes the parsing checks, it is passed on to the Query Optimizer.
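The semantic part of these checks amounts to looking names up in the system catalog. A minimal sketch, assuming a hypothetical in-memory catalog (a dict from table name to its column names) and case-insensitive identifiers:

```python
# Hypothetical system catalog: table name -> set of column names.
CATALOG = {
    "EMPLOYEE": {"SSN", "LNAME", "SALARY", "DNO"},
    "DEPARTMENT": {"DNUMBER", "DNAME", "MGRSSN"},
}


def check_semantics(table, columns, catalog=CATALOG):
    """Return a list of error messages; an empty list means every name exists."""
    table_key = table.upper()
    if table_key not in catalog:
        return [f"unknown table {table}"]
    known = catalog[table_key]
    return [f"unknown column {c} in {table}" for c in columns if c.upper() not in known]


print(check_semantics("EMPLOYEE", ["Salary"]))
print(check_semantics("EMPLOYEE", ["Salary", "Bonus"]))
print(check_semantics("PAYROLL", ["Salary"]))
```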

Query Optimization or Planning the Execution Strategy
• For any given query, there may be a number of
different ways to execute it.
• Each operation in the query [SELECT, JOIN,
etc.] can be implemented using one or more
different Access Routines.

For example, an access routine that employs an index to retrieve a few rows is usually more efficient than an access routine that performs a full table scan.
• The goal of the query optimizer is to find a
reasonably efficient strategy for executing the
query using the access routines.

Optimization typically takes one of two forms:
1. Heuristic Optimization.
2. Cost Based Optimization.

• Heuristic Optimization: The query execution plan is refined by applying heuristic rules that reorder the individual operations.
• Cost Based Optimization: The overall cost of executing the query is systematically reduced by estimating the costs of several different execution plans and choosing the cheapest.

Query Code Generator [interpreted or compiled]
• Once the query optimizer has determined the execution plan [the specific ordering of access routines], the code generator writes out the actual access routines to be executed.
• In an interactive session, the query code is interpreted and passed directly to the runtime database processor for execution.
• Alternatively, the code can be compiled and stored, to be executed later whenever it is needed [for example, a query embedded in an application program].

Execution in the runtime database processor
• At this point, the query has been scanned, parsed, planned and [possibly] compiled.
• The runtime database processor then executes the access routines against the database.
• The results are returned to the application that made the query in the first place.
• Any runtime errors are also returned.

COST OF QUERY
• The cost of processing a query is usually dominated by disk access, i.e. the time taken to read blocks from, and write blocks to, the disk on which the data is stored.
• It is difficult to include all the cost components in a cost function. That is why most cost functions consider only disk access cost as a reasonable measure of the cost of a query-evaluation plan.

• For a given query, there are several possible processing strategies, especially when the query is complex. The difference in cost between a good strategy and a bad one may be several orders of magnitude.
• Therefore, it is worthwhile for the system to spend some time selecting a good strategy for processing the query.
Here we give a brief introduction to the different phases of query processing.

Different phases in query processing
1: SCANNING & PARSING
• The Query Parser parses and translates a given high-level language query into an intermediate form, such as a relational algebra expression.
• The parser checks the syntax of the query and also its semantics [i.e. it verifies that the relation names and attribute names used in the query exist in the database].

• A parse-tree of the query is constructed and then
translated into relational algebra expression.
• A relational algebra expression of a query specifies
only partially how to evaluate a query because in
general there are several ways to evaluate a
relational algebra expression.

EXAMPLE_01
Consider the SQL query:
 SELECT Salary FROM EMPLOYEE WHERE Salary >= 50000;
Two possible relational algebra expressions for this query are:
 $\pi_{Salary}(\sigma_{Salary \geq 50000}(EMPLOYEE))$
 $\sigma_{Salary \geq 50000}(\pi_{Salary}(EMPLOYEE))$
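Both expressions give the same result, which can be checked on a toy relation. The sketch below is an illustration under assumptions: relations are Python lists of dicts, and the EMPLOYEE tuples are hypothetical. The second expression is valid only because Salary is still present after the projection.

```python
# Tiny hypothetical EMPLOYEE relation: each tuple is a dict of attribute -> value.
EMPLOYEE = [
    {"Name": "Asha", "Salary": 40000},
    {"Name": "Baraka", "Salary": 55000},
    {"Name": "Neema", "Salary": 70000},
]


def select(pred, relation):
    """sigma: keep the tuples that satisfy pred."""
    return [t for t in relation if pred(t)]


def project(attrs, relation):
    """pi: keep only attrs and drop duplicate tuples (set semantics)."""
    seen, out = set(), []
    for t in relation:
        row = tuple((a, t[a]) for a in attrs)
        if row not in seen:
            seen.add(row)
            out.append(dict(row))
    return out


def high(t):
    return t["Salary"] >= 50000


plan_a = project(["Salary"], select(high, EMPLOYEE))  # pi_Salary(sigma(EMPLOYEE))
plan_b = select(high, project(["Salary"], EMPLOYEE))  # sigma(pi_Salary(EMPLOYEE))
print(plan_a == plan_b, plan_a)
```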

QUERY TREE
• The query tree is a data structure that represents the relational algebra expression in the query optimization process.
– The leaf nodes in the query tree correspond to the input relations of the query.
– The internal nodes represent the operators in the query.
• When executing the query, the system executes an internal node operation whenever its operands are available; the internal node is then replaced by the relation obtained from that execution.
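A query tree and its bottom-up evaluation can be sketched as follows. The Node class, the lambda operators and the sample data are hypothetical; the point is the evaluation order described above: an internal node runs once its operands are available, and its result then stands in for the node.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

Relation = List[dict]


@dataclass
class Node:
    """Query-tree node: a leaf holds an input relation, an internal node an operator."""
    label: str
    op: Optional[Callable[..., Relation]] = None  # None for leaf nodes
    children: List["Node"] = field(default_factory=list)
    relation: Optional[Relation] = None  # input for leaves, result for internal nodes


def evaluate(node: Node) -> Relation:
    """Post-order evaluation: children (operands) first, then the node's operator."""
    if node.op is None:
        return node.relation
    operands = [evaluate(child) for child in node.children]
    node.relation = node.op(*operands)  # the node is "replaced" by its result relation
    return node.relation


# Tree for pi_Salary(sigma_{Salary >= 50000}(EMPLOYEE)) over hypothetical data.
EMPLOYEE = [{"Name": "Asha", "Salary": 40000}, {"Name": "Neema", "Salary": 70000}]
leaf = Node("EMPLOYEE", relation=EMPLOYEE)
sel = Node("sigma Salary>=50000",
           op=lambda r: [t for t in r if t["Salary"] >= 50000], children=[leaf])
root = Node("pi Salary",
            op=lambda r: [{"Salary": t["Salary"]} for t in r],  # duplicates kept for brevity
            children=[sel])

print(evaluate(root))
```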

EXAMPLE_01
[Figure: query trees for $\pi_{Salary}(\sigma_{Salary \geq 50000}(EMPLOYEE))$ and $\sigma_{Salary \geq 50000}(\pi_{Salary}(EMPLOYEE))$]

EXAMPLE_02
$\pi_{NAME,\,ADDRESS}(\sigma_{DNAME='HUMAN\ RESOURCE'}(DEPARTMENT * EMPLOYEE))$
[Figure: query tree for this expression]

EXAMPLE_02
$\pi_{customer.name}(\sigma_{customer.name = account.name \,\wedge\, account.balance > 2000}(customer \times account))$
[Figure: query tree for this expression]

2: QUERY OPTIMIZATION
• Thus, in order to specify fully how to evaluate a query,
the system is responsible for constructing a query
execution plan which is made up of the relational
algebra expression and the detailed algorithms to
evaluate each operation in that expression.
• Moreover, the selected plan should minimize the cost
of query evaluation.
• The process of choosing a suitable query execution
plan is known as query optimization.

• This process is performed by the Query Optimizer. One aspect of optimization occurs at the relational algebra level.
• The system attempts to find an expression that is equivalent to the given expression but more efficient to execute.
• The other aspect involves selecting a detailed strategy for processing the query: choosing the processing algorithm for each operation, choosing the indices to use, and so on.

QUERY OPTIMIZATION TECHNIQUES
1. Heuristic Query Optimisation.
[Heuristic rules for ordering operations in the query
execution plan.]
2. Estimating Cost Query Optimisation.
[Estimating the cost of different query execution plans based on statistical information about the database.]
Most commercial DBMS query optimizers use a combination of these two methods.

HEURISTIC QUERY OPTIMISATION
• Heuristic optimization applies rules to the initial query expression and produces heuristically transformed query expressions. A heuristic is a rule that works well in most cases but is not guaranteed to work in every case.
• Example: a rule for transforming relational-algebra expressions is to perform selection operations as early as possible.

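• This rule is based on the intuitive idea that selection returns a subset of its input relation, so applying selection early is likely to reduce the size of intermediate results.
• However, there are cases where performing selection before join is not a good idea: for example, if the other join input is very small and the join can use an index on the large relation while the selection attribute has no index, applying the selection first forces a scan of the whole large relation.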

Transformation of Relational Expressions
• In this part we introduce how the heuristic rules
work.
• This involves transforming an initial expression
[tree] into an equivalent expression [tree] which is
more efficient to execute.
• Two relational algebra expressions are said to be equivalent if, on every legal database instance, they generate relations with the same set of attributes containing the same set of tuples, although the attributes may be ordered differently.

Equivalence Rules for transforming relational expressions
[$\times$ denotes Cartesian Product, $\bowtie$ denotes Join, $\wedge$ denotes AND]

1. Commutativity of Join and Cartesian Product operations:
   $E_1 \bowtie E_2 \equiv E_2 \bowtie E_1$
   $E_1 \times E_2 \equiv E_2 \times E_1$
2. Associativity of Join and Cartesian Product operations: if $F_1$ involves attributes from only $E_1$ and $E_2$, and $F_2$ involves attributes from only $E_2$ and $E_3$, then
   $(E_1 \bowtie_{F_1} E_2) \bowtie_{F_2} E_3 \equiv E_1 \bowtie_{F_1} (E_2 \bowtie_{F_2} E_3)$
   $(E_1 \times E_2) \times E_3 \equiv E_1 \times (E_2 \times E_3)$
3. Cascade of Projection: if $X_1 \subseteq X_2 \subseteq \dots \subseteq X_N$, then
   $\pi_{X_1}(\pi_{X_2}(\dots \pi_{X_N}(E) \dots)) \equiv \pi_{X_1}(E)$
4. Cascade of Selection:
   $\sigma_{F_1 \wedge F_2 \wedge \dots \wedge F_N}(E) \equiv \sigma_{F_1}(\sigma_{F_2}(\dots \sigma_{F_N}(E) \dots))$
5. Commutativity of Selection:
   $\sigma_{F_1}(\sigma_{F_2}(E)) \equiv \sigma_{F_2}(\sigma_{F_1}(E))$
6. Commuting Selection with Projection: if $F$ involves only attributes in $X$, then
   $\pi_X(\sigma_F(E)) \equiv \sigma_F(\pi_X(E))$
7. Selection with Cartesian Product and Join:
   If all the attributes in the selection condition $F$ belong to only one of the expressions, say $E_1$, then
   $\sigma_F(E_1 \bowtie E_2) \equiv \sigma_F(E_1) \bowtie E_2$
   If $F = F_1 \wedge F_2$, where $F_1$ involves only attributes of $E_1$ and $F_2$ involves only attributes of $E_2$, then
   $\sigma_{F_1 \wedge F_2}(E_1 \bowtie E_2) \equiv \sigma_{F_1}(E_1) \bowtie \sigma_{F_2}(E_2)$
   If $F = F_1 \wedge F_2$, where $F_1$ involves only attributes of $E_1$ and $F_2$ involves attributes of both $E_1$ and $E_2$, then
   $\sigma_{F_1 \wedge F_2}(E_1 \bowtie E_2) \equiv \sigma_{F_2}(\sigma_{F_1}(E_1) \bowtie E_2)$
   The same rules apply if the Join operation is replaced by a Cartesian Product operation.
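8. Commuting Projection with Join and Cartesian Product:
   Let $X$ and $Y$ be sets of attributes of $E_1$ and $E_2$ respectively. If the join condition $F$ involves only attributes in $X \cup Y$, then
   $\pi_{X \cup Y}(E_1 \bowtie_F E_2) \equiv \pi_X(E_1) \bowtie_F \pi_Y(E_2)$
   For a Cartesian Product, which has no join condition, this becomes
   $\pi_{X \cup Y}(E_1 \times E_2) \equiv \pi_X(E_1) \times \pi_Y(E_2)$
   If the join condition also involves attributes $Z$ of $E_1$ and $W$ of $E_2$ that are not in $X \cup Y$, then
   $\pi_{X \cup Y}(E_1 \bowtie_F E_2) \equiv \pi_{X \cup Y}(\pi_{X \cup Z}(E_1) \bowtie_F \pi_{Y \cup W}(E_2))$
9. Commuting Selection with set operations: Selection commutes with all three set operations (union, intersection, set difference):
   $\sigma_F(E_1 \cup E_2) \equiv \sigma_F(E_1) \cup \sigma_F(E_2)$
   $\sigma_F(E_1 \cap E_2) \equiv \sigma_F(E_1) \cap \sigma_F(E_2)$
   $\sigma_F(E_1 - E_2) \equiv \sigma_F(E_1) - \sigma_F(E_2)$
10. Commuting Projection with Union: Projection distributes over Union, but not, in general, over Intersection or Set Difference:
   $\pi_X(E_1 \cup E_2) \equiv \pi_X(E_1) \cup \pi_X(E_2)$
11. Commutativity of set operations: Union and Intersection are commutative, but Set Difference is not:
   $E_1 \cup E_2 \equiv E_2 \cup E_1$
   $E_1 \cap E_2 \equiv E_2 \cap E_1$
12. Associativity of set operations: Union and Intersection are associative, but Set Difference is not:
   $(E_1 \cup E_2) \cup E_3 \equiv E_1 \cup (E_2 \cup E_3)$
   $(E_1 \cap E_2) \cap E_3 \equiv E_1 \cap (E_2 \cap E_3)$
13. Converting a Selection over a Cartesian Product into a Join: if $F$ is a join condition relating attributes of $E_1$ and $E_2$, then
   $\sigma_F(E_1 \times E_2) \equiv E_1 \bowtie_F E_2$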
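Two of the rules above can be spot-checked on small relations. This sketch uses hypothetical data, with relations as Python sets of tuples; a check on one instance illustrates a rule but does not prove it, since equivalence must hold on every legal instance. It checks rule 7 (pushing a selection below a Cartesian Product) and rule 10 (projection over union), and shows that projection does not distribute over set difference.

```python
from itertools import product

# Hypothetical relations, each a set of tuples.
E1 = {(a,) for a in range(1, 6)}  # attribute A, 5 tuples
E2 = {(b,) for b in range(1, 4)}  # attribute B, 3 tuples


def cross(r, s):
    """Cartesian product: concatenate every tuple of r with every tuple of s."""
    return {t + u for t, u in product(r, s)}


def select(pred, r):
    """Selection: keep the tuples that satisfy pred."""
    return {t for t in r if pred(t)}


def f(t):
    """Condition F = (A <= 2); it uses only E1's attribute (position 0)."""
    return t[0] <= 2


# Rule 7: sigma_F(E1 x E2) == sigma_F(E1) x E2 when F uses only E1's attributes.
late = select(f, cross(E1, E2))   # builds all 15 combinations, then filters
early = cross(select(f, E1), E2)  # filters E1 first, builds only 2 * 3 = 6
print(late == early, len(cross(E1, E2)), len(early))

# Rule 10: projection onto Name commutes with union but not with set difference.
R = {("Asha", 1), ("Juma", 2)}  # hypothetical (Name, Dept) tuples
S = {("Asha", 3)}


def pi_name(r):
    return {t[0] for t in r}


print(pi_name(R | S) == pi_name(R) | pi_name(S))  # union: both sides agree
print(pi_name(R - S), pi_name(R) - pi_name(S))    # difference: the sides differ
```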

Heuristic Algebraic Optimization algorithm
1. Break up any Selection operation with conjunctive conditions into a cascade of Selection operations. This step is based on equivalence rule 4.
2. Move Selection operations as far down the query tree as possible. This step uses the commutativity of Selection with other operations, as described in equivalence rules 5, 6, 7 and 9.
3. Rearrange the leaf nodes of the tree so that the most restrictive selections are done first. The most restrictive selection is the one that produces the fewest tuples. In addition, make sure that the ordering of leaf nodes does not introduce unnecessary Cartesian Product operations. This step relies on the associativity of binary operations, such as rules 2 and 12.
4. Combine a Cartesian Product with a subsequent Selection operation into a Join operation if the selection condition represents a join condition [rule 13].
5. Break down lists of projection attributes and move them as far down the tree as possible, creating new Projection operations as needed [rules 3, 6, 8, 10].
6. Identify sub-trees that represent groups of operations that can be pipelined, and execute them using pipelining.
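Steps 1 and 2 can be sketched for the WHERE condition of EXAMPLE_03 below: split the conjunctive condition into separate conjuncts, then push each conjunct down to the single relation whose attributes it uses, leaving conditions that span two relations above them as join conditions (which step 4 turns into joins). The schemas list only the attributes this query uses, and the splitting is naive (an AND inside a string literal would break it); both are assumptions of the sketch.

```python
import re

# Attributes used by EXAMPLE_03, grouped by relation (partial, hypothetical schemas).
SCHEMAS = {
    "PROJECT": {"PNUMBER", "DNUM", "PLOCATION"},
    "DEPARTMENT": {"DNUMBER", "MGRSSN"},
    "EMPLOYEE": {"SSN", "LNAME"},
}


def split_conjuncts(condition):
    """Step 1 (rule 4): break a conjunctive condition into a list of conjuncts."""
    return [c.strip() for c in re.split(r"\s+AND\s+", condition, flags=re.IGNORECASE)]


def attributes_of(conjunct):
    """Attribute names used by a conjunct; quoted string literals are ignored."""
    without_literals = re.sub(r"'[^']*'", "", conjunct)
    return set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", without_literals))


def push_down(condition, schemas):
    """Step 2: attach each single-relation conjunct to that relation's leaf;
    conjuncts spanning several relations stay above them as join conditions."""
    pushed = {name: [] for name in schemas}
    join_conditions = []
    for conjunct in split_conjuncts(condition):
        attrs = attributes_of(conjunct)
        owners = [name for name, cols in schemas.items() if attrs <= cols]
        if len(owners) == 1:
            pushed[owners[0]].append(conjunct)
        else:
            join_conditions.append(conjunct)
    return pushed, join_conditions


pushed, joins = push_down("DNUM=DNUMBER and MGRSSN=SSN and PLOCATION = 'Stafford'", SCHEMAS)
print(pushed)
print(joins)
```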

EXAMPLE_03
SQL Command
SELECT PNUMBER, DNUM, LNAME
FROM PROJECT, DEPARTMENT, EMPLOYEE
WHERE DNUM=DNUMBER and MGRSSN=SSN and
PLOCATION = 'Stafford';
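In relational algebra [with E, D and P standing for EMPLOYEE, DEPARTMENT and PROJECT] this can be written as follows:
$\pi_{PNUMBER,\,DNUM,\,LNAME}(\sigma_{PLOCATION='Stafford'}(\sigma_{MGRSSN=SSN}(\sigma_{DNUM=DNUMBER}(E \times (D \times P)))))$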

Two possible query trees can be drawn for this relational algebra expression.
[Figure: two possible query trees for the expression above]
Of these two query trees, which is more efficient?
Note the two Cartesian Product (cross product) operations: these require a lot of space and time to build. An overall rule for heuristic query optimization is to perform as many Selection and Projection operations as possible before doing any joins.

PERFORM A HEURISTIC QUERY OPTIMIZATION ON THE QUERY TREE.
Using rule 4, we break the selection into a cascade of selections to get an equivalent query tree.

Using rule 7, we commute each Selection with the Cartesian Products, moving it down the tree.

Finally, using rule 13, we combine each Cartesian Product with the Selection above it to form a Join.
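The effect of these transformations can be illustrated by running both plans over made-up data. In this sketch the attribute names come from EXAMPLE_03, but the table sizes and values (100 projects, 10 departments, 1000 employees) are hypothetical, and the joins are plain nested loops rather than any particular DBMS algorithm.

```python
from itertools import product

# Hypothetical data shaped like EXAMPLE_03 (names from the SQL; values made up).
P = [{"PNUMBER": i, "DNUM": i % 10 + 1,
      "PLOCATION": "Stafford" if i % 20 == 0 else "Houston"} for i in range(100)]
D = [{"DNUMBER": d, "MGRSSN": 100 + d} for d in range(1, 11)]
E = [{"SSN": s, "LNAME": f"L{s}"} for s in range(1000)]


def naive():
    """Initial tree: selections on top of E x D x P, so every combination is generated."""
    for e, d, p in product(E, D, P):
        if (p["DNUM"] == d["DNUMBER"] and d["MGRSSN"] == e["SSN"]
                and p["PLOCATION"] == "Stafford"):
            yield (p["PNUMBER"], p["DNUM"], e["LNAME"])


def optimized():
    """Final tree: selection pushed down to P, then joins on the join conditions."""
    stafford = [p for p in P if p["PLOCATION"] == "Stafford"]
    for p in stafford:
        for d in D:
            if p["DNUM"] == d["DNUMBER"]:
                for e in E:
                    if d["MGRSSN"] == e["SSN"]:
                        yield (p["PNUMBER"], p["DNUM"], e["LNAME"])


combinations = len(E) * len(D) * len(P)  # tuples the naive plan has to consider
same = sorted(naive()) == sorted(optimized())
print(combinations, same)
```

Both plans return the same tuples, but the naive one walks through a million combinations, while the optimized one only looks at the handful of Stafford projects and their matching rows.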

CONVERTING A QUERY TREE TO A QUERY
EVALUATION PLAN
• Query optimizers use the above equivalence rules to enumerate expressions that are logically equivalent to the given query expression.
• However, generating expressions is just one part of the optimization process.
• As mentioned earlier, the evaluation plan also specifies the detailed algorithm for each operation in the expression and how the execution of the operations is coordinated.

• The evaluation of an expression can therefore be costly in terms of both time and memory space.
• The obvious way to evaluate the expression is simply to evaluate one operation at a time, in an appropriate order.
• The result of each individual evaluation is stored in a temporary relation, which usually has to be written to disk and is then used as the input for the next evaluation.
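This one-operation-at-a-time strategy (often called materialization) can be contrasted with the pipelining mentioned in step 6 of the heuristic algorithm. A minimal Python sketch using lists for temporary relations and generators for pipelining; the EMPLOYEE tuples are hypothetical and the projection does not remove duplicates.

```python
# Hypothetical relation of (name, salary) tuples.
EMPLOYEE = [("Asha", 40000), ("Baraka", 55000), ("Neema", 70000)]


def materialized(rel):
    """Evaluate one operation at a time; each step builds a full temporary relation."""
    temp1 = [t for t in rel if t[1] >= 50000]  # sigma: stored temporary relation
    temp2 = [t[0] for t in temp1]              # pi: another temporary relation
    return temp2


def pipelined(rel):
    """Chain generators: each tuple flows through both operators, no temporary relation."""
    selected = (t for t in rel if t[1] >= 50000)
    return (t[0] for t in selected)


print(materialized(EMPLOYEE), list(pipelined(EMPLOYEE)))
```

Both produce the same names; the pipelined version never holds the intermediate result in full, which is what saves the writes and reads of temporary relations.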

COST-BASED QUERY OPTIMISATION
• The method of optimizing a query by choosing the strategy that results in the minimum cost is called cost-based query optimization.
• Cost-based query optimization uses formulae that estimate the costs of a number of options and selects the one with the lowest estimated cost, i.e. the one expected to be most efficient to execute.

COST-BASED QUERY OPTIMISATION
The cost estimation of a query evaluation plan is
calculated in terms of various resources that
include:
– Number of disk accesses
– Execution time taken by the CPU to execute a query
– Communication costs in distributed or parallel
database systems.
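Counting only disk costs, as most cost functions do, the final step of a cost-based optimizer can be sketched as picking the candidate plan with the lowest estimate. Here a plan's disk cost is modelled as block transfers times a per-transfer time plus seeks times a per-seek time; the two timings and the candidates' counts are hypothetical numbers, chosen to show that an index is not automatically the cheaper option.

```python
# Hypothetical disk timings in milliseconds; real systems measure or configure these.
T_TRANSFER = 0.1  # transfer one block
T_SEEK = 4.0      # one disk seek


def disk_cost(block_transfers, seeks):
    """Estimated disk time of a plan: transfers * T_TRANSFER + seeks * T_SEEK."""
    return block_transfers * T_TRANSFER + seeks * T_SEEK


# Candidate access paths for one selection (counts are made-up estimates).
candidates = {
    "full table scan": disk_cost(block_transfers=50, seeks=1),  # one seek, sequential read
    "index lookup": disk_cost(block_transfers=8, seeks=8),      # scattered blocks, one seek each
}
best = min(candidates, key=candidates.get)
print(best, round(candidates[best], 2))
```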

ESTIMATING COST QUERY OPTIMISATION
EXAMPLE: Cost Functions for SELECTION
Consider a selection operation on a relation whose tuples
are all stored in one file. The simplest algorithms to
implement selection are:
i. Linear search.
ii. Binary search.

LINEAR SEARCH
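Linear search: scan all file blocks; all records in each block are checked to see whether they satisfy the search condition.
The cost of this method is
$C = b_r$
where $b_r$ is the number of blocks in the file of relation $r$.
For an equality selection on a key attribute, the scan can stop as soon as the matching record is found, so on average half of the blocks are scanned:
$C = b_r / 2$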
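The two cases can be checked by simulating a file. In this sketch (an assumption, not a DBMS storage layer) a file is a list of blocks, each record is just an integer key, and the function counts how many blocks are read.

```python
def linear_search(blocks, pred, key_attribute=False):
    """Scan blocks in order, counting block reads.

    On a key attribute at most one record can match, so the scan stops at the
    first hit; otherwise every block has to be read.
    """
    matches, reads = [], 0
    for block in blocks:
        reads += 1
        for record in block:
            if pred(record):
                matches.append(record)
                if key_attribute:
                    return matches, reads
    return matches, reads


# Hypothetical file: keys 0..999 stored 20 per block, so b_r = 50.
records = list(range(1000))
blocks = [records[i:i + 20] for i in range(0, len(records), 20)]

# Non-key condition: all b_r blocks are read.
_, full_reads = linear_search(blocks, lambda r: r % 10 == 1)

# Key equality, averaged over every possible search key.
total = 0
for k in records:
    _, reads = linear_search(blocks, lambda r, k=k: r == k, key_attribute=True)
    total += reads
average_reads = total / len(records)

print(full_reads, average_reads)
```

The exact average here is $(b_r + 1)/2 = 25.5$ blocks, which the formula approximates as $b_r/2$.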

BINARY SEARCH
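• Binary search: if the file is ordered on an attribute $A$ and the selection condition is an equality comparison on $A$, we can use binary search.
The estimated number of blocks to be scanned is
$C = \lceil \log_2(b_r) \rceil + \lceil SC(A,r) / f_r \rceil - 1$
where $SC(A,r)$ is the selection cardinality [the average number of records that satisfy the condition] and $f_r$ is the blocking factor [the number of records per block].
The first term is the cost of locating the first tuple that satisfies the condition by binary search.
The second term is the number of blocks containing records that satisfy the selection condition; one of these blocks has already been retrieved, which is why 1 is subtracted.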

EXAMPLE:LINEAR SEARCH
Now, consider a selection on the EMPLOYEE file: $\sigma_{DEPTID=1}(EMPLOYEE)$.
The file EMPLOYEE has the following statistical information:
– f = 20 [there are 20 tuples per block]
– V(DeptID, EMPLOYEE) = 10 [there are 10 different departments]
– n = 1000 [there are 1000 tuples in the file]
• The cost of a linear search is b = 1000/20 = 50 block accesses.

EXAMPLE:BINARY SEARCH
Cost of a binary search on the ordering attribute DEPTID:
– Average number of records that satisfy the condition: 1000/10 = 100 records.
– Number of blocks containing these tuples: 100/20 = 5.
– A binary search for the first tuple takes $\lceil \log_2 50 \rceil = 6$ block accesses [$\log_2 50 \approx 5.64$].
Thus the total cost is 6 + 5 - 1 = 10 block accesses.
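The same arithmetic can be written as functions of the statistics used in the example. This is a sketch: the function names are hypothetical, and estimating SC(A, r) as n / V(A, r) assumes the values of A are evenly distributed, as the example does.

```python
import math


def linear_cost(n, f, key=False):
    """Linear search: b_r = ceil(n / f) blocks; about b_r / 2 for an equality on a key."""
    b = math.ceil(n / f)
    return b / 2 if key else b


def binary_cost(n, f, v):
    """Binary search on the ordering attribute A for an equality condition:
    ceil(log2 b_r) + ceil(SC(A, r) / f_r) - 1, with SC(A, r) estimated as n / V(A, r)."""
    b = math.ceil(n / f)
    sc = n / v
    return math.ceil(math.log2(b)) + math.ceil(sc / f) - 1


print(linear_cost(1000, 20), binary_cost(1000, 20, 10))
```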

3: QUERY EXECUTION ENGINE
• Once the query plan is chosen, the Query Execution Engine takes the plan, executes it and returns the answer to the query.

TUTORIAL PROBLEMS
1. Review your knowledge of Relational algebra and SQL.
2. Discuss and identify the differences and similarities between SQL queries and Relational algebra queries.
3. In your own words [you may even use Kiswahili], formally describe ALL the steps of query processing shown above.
4. In the presentation above, the cost of a query has been discussed in terms of execution time. Discuss the costs in terms of memory and complexity.

NEXT ON CHAPTER SEVEN
DATABASE SECURITY
