Query Processing Strategies in
Distributed Database
(Journal of Engineering, Computers & Applied Sciences (JEC&AS) ISSN No: 2319‐5606
Volume 2, No.7, July 2013)
Presented By:-
Shree Raj Khatiwada
Introduction
 Query: an instruction to the DBMS to update or retrieve specific data in
physical storage.
 Query Processor: a query is processed in three steps:
1. Parsing and Translation: the human-readable
form of the query is translated
into forms usable by the DBMS, such as a
relational algebra expression, query
tree, or query graph
SELECT Ename
FROM Employee
WHERE Salary > 5000;
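This can be translated into the following equivalent relational algebra
expressions:
π Ename (σ Salary > 5000 (Employee))
OR
π Ename (σ Salary > 5000 (π Ename, Salary (Employee)))
(The selection must be applied while Salary is still available, so it cannot follow a projection onto Ename alone.)
Fig: Steps in Query Processing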
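As a rough illustration of how the translated expression is evaluated, the sketch below applies a selection and then a projection to a small in-memory table. The sample rows and the helper names `select` and `project` are invented for this example and do not come from the paper.

```python
# Hypothetical sample data; the names and salaries are invented for illustration.
employee = [
    {"Ename": "Asha", "Salary": 7000},
    {"Ename": "Bikram", "Salary": 4000},
    {"Ename": "Chandra", "Salary": 5500},
]

def select(rows, predicate):
    """Relational selection (sigma): keep rows satisfying the predicate."""
    return [row for row in rows if predicate(row)]

def project(rows, attributes):
    """Relational projection (pi): keep only the named attributes, dropping duplicates."""
    seen = []
    for row in rows:
        reduced = {a: row[a] for a in attributes}
        if reduced not in seen:
            seen.append(reduced)
    return seen

# pi_Ename(sigma_{Salary > 5000}(Employee))
result = project(select(employee, lambda r: r["Salary"] > 5000), ["Ename"])
print(result)
```

The selection runs first because it needs the Salary attribute, which the final projection discards.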
Introduction contd…
2. Optimizing the Query:
 determines the most efficient way to execute a query from among the possible
query plans.
 The main aim is to minimize the cost function:
I/O Cost + CPU Cost + Communication Cost
 defines how an RDBMS can improve the performance of the query by
reordering the operations.
3. Evaluating the Query:
 The query-execution engine takes an (optimal) evaluation plan, executes
that plan, and returns the answers to the query.
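The cost function from the optimization step can be sketched as a simple sum over estimated components. The `PlanCost` class, the plan names, and all numbers below are hypothetical; a real optimizer would derive its estimates from statistics rather than constants.

```python
from dataclasses import dataclass

@dataclass
class PlanCost:
    """Estimated cost components of one candidate plan, all in the same unit."""
    name: str
    io: float
    cpu: float
    communication: float

    @property
    def total(self) -> float:
        # Cost function from the slide: I/O + CPU + communication.
        return self.io + self.cpu + self.communication

def cheapest(plans):
    """Pick the plan with the smallest total estimated cost."""
    return min(plans, key=lambda p: p.total)

# Hypothetical estimates for two candidate plans.
plans = [
    PlanCost("ship whole table", io=40.0, cpu=5.0, communication=300.0),
    PlanCost("filter then ship", io=45.0, cpu=8.0, communication=60.0),
]
best = cheapest(plans)
print(best.name, best.total)
```

With these made-up numbers the second plan wins because its lower communication cost outweighs its slightly higher I/O and CPU cost.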
Distributed Query Processing
 In a distributed database environment, data
is stored at different sites connected
through a network.
 Distributed query processing consists of four
stages:
1. Query decomposition
 Takes a calculus query as input
 Using the global schema, the calculus query is
decomposed into an algebraic query
2. Data Localization
 Takes the algebraic query as input
 Uses the fragment schema to generate a localized
fragment query
 Determines which fragments are involved
Distributed Query Processing contd…
3. Global Optimization
 Takes the fragment query as input
 Uses fragment statistics to produce an optimized fragment query as output.
 Finds the best global schedule
4. Local Optimization
 Uses the local schema to produce an optimized local query, which is then executed
 The output is returned to the site where the query originated.
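To make the data localization stage more concrete, here is a minimal sketch that assumes a relation horizontally fragmented by salary range. The fragment names, sites, and ranges are hypothetical. Given a predicate of the form Salary > x, it determines which fragments are involved by discarding those that cannot contain qualifying rows.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fragment:
    """A horizontal fragment holding rows with low <= Salary < high."""
    name: str
    site: str
    low: float
    high: float

# Hypothetical fragment schema for an Employee relation.
FRAGMENTS = [
    Fragment("Employee_1", "site1", 0, 3000),
    Fragment("Employee_2", "site2", 3000, 6000),
    Fragment("Employee_3", "site3", 6000, float("inf")),
]

def localize(fragments, min_salary):
    """Keep only fragments that may contain rows with Salary > min_salary."""
    return [f for f in fragments if f.high > min_salary]

involved = localize(FRAGMENTS, 5000)
print([f.name for f in involved])
```

Only the surviving fragments need to be considered by the later optimization stages, which is what keeps irrelevant sites out of the global schedule.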
Distributed Query Optimization
Distributed query optimization is defined as finding an efficient execution
strategy for a query in a distributed network.
There are three components of distributed query optimization:
 Access Method: the methods used to access data in the distributed
environment
 Join Criteria: in a distributed database, join criteria determine how data from
different sites is joined to get an optimized result.
 Transmission Costs: the cost of transmitting intermediate results
between sites needs to be considered.
Distributed query optimization raises many issues, such as the type of
optimizer, optimization granularity, network topology, and optimization
timing.
Example
Site 1: COURSE (C), ENROLLMENT (E)
Site 2: STUDENT (S)
(Selection conditions: Course = Physics and Student = Senior)
There are many ways to evaluate this three-table join, some of which are:
Option 1: Start with site 1, join C & E retrieving only the Physics course, and move the entire result set to
site 2 to be joined with S.
Option 2: Start with site 2, retrieve only senior students from S, and move the entire result set to site
1 to be joined with C and E.
Option 3: Move C & E to site 2 and proceed with a local three-table join.
Option 4: Move S to site 1 and proceed with a local three-table join.
Example Contd…
Which of these options will perform best?
 The only correct answer is “It depends”.
The optimal choice depends on:
 the size of the tables,
 the size of the result sets (the number of qualifying rows and their
length in bytes) and
 the efficiency of the network.
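The "it depends" answer can be illustrated by counting the bytes each option ships between the two sites. All row counts and row lengths below are hypothetical, and the sketch deliberately ignores local processing cost, network latency, and the cost of returning the final result.

```python
# All sizes below are hypothetical, chosen only to show how the options compare.
def transfer_bytes(rows, row_bytes):
    """Bytes shipped across the network for a result set."""
    return rows * row_bytes

def rank_options(sizes):
    """Return options sorted by bytes shipped between sites (smallest first)."""
    costs = {
        "Option 1": transfer_bytes(*sizes["physics_C_join_E"]),
        "Option 2": transfer_bytes(*sizes["senior_S"]),
        "Option 3": transfer_bytes(*sizes["C"]) + transfer_bytes(*sizes["E"]),
        "Option 4": transfer_bytes(*sizes["S"]),
    }
    return sorted(costs.items(), key=lambda item: item[1])

# (row count, row length in bytes) for each table or intermediate result
sizes = {
    "C": (200, 100),
    "E": (50_000, 20),
    "S": (10_000, 150),
    "physics_C_join_E": (400, 120),
    "senior_S": (2_500, 150),
}
ranking = rank_options(sizes)
for name, cost in ranking:
    print(f"{name}: {cost} bytes")
```

Changing the assumed sizes, for example making the Physics enrollment result much larger than the set of senior students, changes which option ships the least data.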
Optimal Distribution Strategies for
Simple Queries
A query optimization algorithm is an algorithm that derives a distribution
strategy for a given query.
The query optimization algorithms below derive optimal distribution strategies for a class of distributed
queries called simple queries.
Algorithms used for query optimization include:
Algorithm PARALLEL:
 Algorithm PARALLEL derives a minimal response time distribution strategy for any
given simple query.
 Algorithm PARALLEL searches for cost-beneficial data transmissions by trying to join small
relations to large relations.
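The difference between response time and total time can be sketched with a linear transmission cost of the form c0 + c1 * size. The cost model, its constants, and the relation sizes are assumptions made for illustration. The sketch only evaluates the simplest strategy, in which every relation is sent directly to the result node in parallel; it is not an implementation of Algorithm PARALLEL.

```python
def transmission_cost(size, c0=10.0, c1=1.0):
    """Assumed linear cost of sending `size` units of data: c0 + c1 * size."""
    return c0 + c1 * size

def parallel_strategy_times(sizes):
    """Times when every relation is sent straight to the result node at once."""
    costs = [transmission_cost(s) for s in sizes]
    response_time = max(costs)  # transmissions overlap, so the slowest one dominates
    total_time = sum(costs)     # every transmission still counts toward total time
    return response_time, total_time

# Hypothetical relation sizes.
response, total = parallel_strategy_times([100, 400, 900])
print(response, total)
```

Response time rewards overlapping transmissions, while total time charges for every transmission regardless of overlap.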
Algorithm SERIAL:
 Finds the strategy with minimum total time
 Consists of transmitting each relation, starting with R1, to the next relation in serial order.
 The strategy is represented by R1 -> R2 -> … -> Rm -> Rr, where Rr is the relation at the result
node.
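A serial strategy can be sketched as follows, assuming each relation has a size and a selectivity (the fraction of the next relation that survives the join) and that sending data costs c0 + c1 * size. These parameters and all values are assumptions for illustration and are not taken from the slides. The special handling of a relation already located at the result node is omitted.

```python
def transmission_cost(size, c0=10.0, c1=1.0):
    """Assumed linear cost of sending `size` units of data: c0 + c1 * size."""
    return c0 + c1 * size

def serial_total_time(relations):
    """Total time of R1 -> R2 -> ... -> Rm -> result node.

    `relations` is a list of (size, selectivity) pairs in transmission order.
    Each relation is reduced by the selectivities of all relations that
    reached it earlier in the chain before being sent on.
    """
    total = 0.0
    reduction = 1.0
    for size, selectivity in relations:
        total += transmission_cost(size * reduction)
        reduction *= selectivity
    return total

# Hypothetical (size, selectivity) pairs.
relations = [(100, 0.2), (400, 0.5), (900, 0.9)]
smallest_first = serial_total_time(sorted(relations))
largest_first = serial_total_time(sorted(relations, reverse=True))
print(smallest_first, largest_first)
```

With these made-up values, sending the smallest relation first gives a lower total time, since the large relations are reduced before they are transmitted.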
Algorithm GENERAL:
• Algorithm GENERAL derives a query processing strategy for either
response time or total time minimization by using the procedures
RESPONSE, TOTAL, and COLLECTIVE.
• Unlike in simple queries, a relation can contain more than one joining attribute.
• It has three versions:
I. Response Time Version
II. Total Time Version
III. Handling Redundant Data Transmission
Conclusion
 Algorithm GENERAL is an efficient algorithm of polynomial complexity that derives close to
optimal query processing strategies on distributed systems.
 Algorithm GENERAL is an extension of the processing tactics found optimal for simple queries in
Algorithm PARALLEL and Algorithm SERIAL.
 There are two primary versions of Algorithm GENERAL:
1. To minimize the response time of a processing strategy, parallel data transmissions are emphasized
by the use of Algorithm PARALLEL and Procedure RESPONSE.
2. To minimize the total time of a processing strategy, serial data transmissions are emphasized by
the use of Algorithm SERIAL and Procedure TOTAL.
Thank you.
