Distributed Databases
Objectives
 Define key terms in the distributed database area
 Distributed vs. Decentralized Database
 Homogeneous vs. Heterogeneous Distributed Database
 Location transparency vs. local autonomy
 Asynchronous vs. Synchronous distributed databases
 Horizontal vs. Vertical partitioning
 Full refresh vs. differential refresh
 Push replication vs. Pull replication
 Local transaction vs. Global Transaction
Objectives
 Describe salient characteristics of distributed database
environments
 Explain advantages and risks of distributed databases
 Explain strategies and options for distributed database
design
 Discuss synchronous and asynchronous data replication
and partitioning
 Discuss optimized query processing in distributed
databases
Distributed vs. Decentralized Database
Both are stored on computers in multiple locations
 Distributed Database
 Geographical distribution of a SINGLE
database
 Decentralized Database
 A collection of independent databases on
non-networked computers
 Users at various sites cannot share data
Distributed Database
 Requires multiple DBMSs running at
remote sites
 Distributed database environments
differ in
 The degree to which these DBMSs cooperate
 Whether a master site coordinates requests
involving data from multiple sites
Reasons for Distributed Database
 Distribution and Autonomy of Business Units
 Departments/Facilities are geographically distributed
 Each has the authority to create and control own data
 Business mergers create this environment
 Data sharing
 Consolidate data across local databases on demand.
 Data communication costs and reliability
 Economical and reliable to locate data where needed.
 High cost for remote transactions / large data volumes
 Dependence on data communications can be risky
Reasons for Distributed Database
 Multiple application vendor environment
 Each unit may have different vendor applications
 A distributed DBMS can provide functionality that
cuts across separate applications
 Database recovery
 Replicating data on separate computers may ensure
that a damaged database can be quickly recovered
Homogeneous vs. Heterogeneous
Distributed Database
 Homogeneous Distributed Database
 The same DBMS is used at each node
 Difficult for most organizations to force a
homogeneous environment
 Heterogeneous Distributed Database
 Potentially different DBMS are used at each
node
 Much more difficult to manage
Typical Homogeneous Environment
 Data distributed across all the nodes.
 Same DBMS at each node.
 A central DBMS coordinates database access
and update across the nodes
 No exclusively local data
 All access is through one, global schema.
 The global schema is the union of all the local
schemas.
Figure 13-2 – Homogeneous Database (identical DBMSs; everyone is a GLOBAL user)
Typical Heterogeneous Environment
 Data distributed across all the nodes.
 Different DBMSs may be used at each node.
 Local access is done using the local DBMS
and schema.
 Remote access is done using the global
schema.
Figure 13-3 – Typical Heterogeneous Environment (non-identical DBMSs; a local user accesses his own data)
Major Objectives of Distributed Database
Allow users to share data yet be able to operate
independently when a network link fails.
 Location Transparency
 User does not have to know the location of the data
 Data requests automatically forwarded to appropriate
sites
 Local Autonomy
 Local site can operate with its database when network
connections fail
 Each site controls its own data, security, logging,
recovery
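The two objectives above can be illustrated with a small sketch. It is only a toy model under stated assumptions: the `Network` and `Site` classes, the shared `catalog`, and the `up` flag are hypothetical names invented for this example, not part of any DBMS. A request for remote data is forwarded automatically (location transparency), while local data stays available when the link is down (local autonomy).

```python
class Network:
    """Hypothetical network: tracks link status, sites, and a data catalog."""

    def __init__(self):
        self.up = True
        self.sites = {}     # site name -> Site
        self.catalog = {}   # data key -> name of the site that stores it


class Site:
    def __init__(self, name, network):
        self.name = name
        self.network = network
        self.local = {}     # data stored and controlled at this site
        network.sites[name] = self

    def put(self, key, value):
        self.local[key] = value
        self.network.catalog[key] = self.name

    def get(self, key):
        # Local autonomy: local data is served without using the network.
        if key in self.local:
            return self.local[key]
        # Location transparency: the caller never names the remote site;
        # the request is forwarded to whichever site the catalog lists.
        if not self.network.up:
            raise ConnectionError(f"{key!r} is remote and the link is down")
        owner = self.network.catalog[key]
        return self.network.sites[owner].local[key]
```

In this sketch the caller writes `site.get(key)` in both cases; only the failure behavior differs when the link is down.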
Trade-Offs in Distributed Database
When do you update data across the database?
 Synchronous Distributed Database
 All copies of the same data are always identical
 Updates apply immediately to all copies throughout network
 Good for data integrity
 High overhead → slow response times
 Asynchronous Distributed Database
 Some data inconsistency is tolerated
 Data update propagation is delayed
 Lower data integrity
 Less overhead → faster response time
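The trade-off can be sketched in a few lines. This is a minimal illustration, assuming a single data item copied to several sites; the `ReplicatedValue` class and its method names are hypothetical. A synchronous write touches every copy before returning, while an asynchronous write updates one copy and leaves the rest stale until propagation runs.

```python
from collections import deque


class ReplicatedValue:
    """One logical data item copied to several sites (hypothetical model)."""

    def __init__(self, sites, initial):
        self.copies = {site: initial for site in sites}
        self.pending = deque()  # updates not yet applied to every copy

    def write_sync(self, value):
        # Synchronous: every copy is updated before the write returns.
        for site in self.copies:
            self.copies[site] = value

    def write_async(self, origin, value):
        # Asynchronous: only the originating copy changes now; the other
        # sites see the change later, when propagate() runs.
        self.copies[origin] = value
        self.pending.append(value)

    def propagate(self):
        # Apply queued updates to all copies, oldest first.
        while self.pending:
            value = self.pending.popleft()
            for site in self.copies:
                self.copies[site] = value

    def consistent(self):
        return len(set(self.copies.values())) == 1
```

Between `write_async` and `propagate`, `consistent()` returns False, which is the tolerated inconsistency the slide describes.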
Advantages of Distributed Database
1. Increased reliability and availability
 Even when a component fails the database may continue to
function albeit at a reduced level
2. Allows local control over data.
 Local control promotes data integrity and administration
3. Modular growth
 Easy to add a connection to a new location
 Less chance of disrupting existing users during expansion
4. Lower communication costs.
5. Faster response for certain queries.
 Query local data
 Parallel queries
Disadvantages of Distributed Database
 Software cost and complexity.
 Processing overhead.
 Data integrity exposure.
 Slower response for certain queries.
 If data are not distributed properly, according to
their usage, or if queries are not formulated
correctly, queries can be extremely slow
Options for Distributing a Database
 Data Replication
 Horizontal Partitioning
 Vertical Partitioning
 Combinations of the above
Data Replication
 Advantages
 Reliability – if one node fails, you can find data at
another node
 Fast response at sites that have a full copy
 May avoid complicated distributed transaction
integrity routines (if replicated data is refreshed at
scheduled intervals.)
 Decouples nodes: transactions proceed even if
some nodes are down.
 Reduced network traffic at prime time, if updates
can be delayed to non-primetime hours
Data Replication
 Disadvantages
 Storage requirements
 Complexity and cost of updating.
 Integrity exposure of getting incorrect data if
replicated data is not updated simultaneously.
Data Replication
 Best for non-volatile/static, non-collaborative
data
 Catalogs
 Telephone directories
 Train Schedules
 Not good for on-line applications
 Airline reservations
 ATM transactions
Types of Data Replication
 Push Replication
 Updating site sends changes to other sites
 Pull Replication
 Receiving sites control when update
messages will be processed
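The difference is who decides when an update moves. The following is a rough sketch, assuming each site keeps an ordered change log; the `Site` class, its `log` and `cursor` fields, and the method names are hypothetical.

```python
class Site:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.log = []     # ordered (key, value) changes made at this site
        self.cursor = {}  # per-source count of changes already pulled

    def local_update(self, key, value):
        self.data[key] = value
        self.log.append((key, value))

    def push_to(self, targets):
        # Push: the updating site sends its latest change to other sites.
        key, value = self.log[-1]
        for target in targets:
            target.data[key] = value

    def pull_from(self, source):
        # Pull: the receiving site chooses when to fetch pending changes.
        start = self.cursor.get(source.name, 0)
        for key, value in source.log[start:]:
            self.data[key] = value
        self.cursor[source.name] = len(source.log)
```

A site that has not yet pulled simply keeps its older values until it decides to call `pull_from`.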
Types of Push Replication
 Snapshot Replication
 Changes periodically sent to master site
 Master collects updates in log
 Near Real-Time Replication
 Broadcast update orders without requiring
confirmation
 Update messages stored in message queue until
processed by receiving site
Issues in Data Replication Use
 Data timeliness – high tolerance for out-of-date
data may be required
 DBMS capabilities – if DBMS cannot support
multi-node queries, replication may be necessary
 Performance implications – refreshing may cause
performance problems for busy nodes
 Network heterogeneity – complicates replication
 Network communication capabilities – complete
refreshes place heavy demand on
telecommunications
Horizontal Partitioning
 Different rows of a table at different sites
 Advantages
 Data stored close to where it is used → efficiency
 Local access optimization → better performance
 Only relevant data is available → security
 Unions across partitions → ease of query
 Disadvantages
 Accessing data across partitions → inconsistent
access speed
 If no data replication → backup vulnerability
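As a small illustration of horizontal partitioning, the sketch below splits the rows of one table by a column value and rebuilds the full table with a union. The function names and the sample `customers` rows are made up for the example.

```python
def partition_rows(rows, column):
    """Split rows into per-site partitions keyed by the value of `column`."""
    partitions = {}
    for row in rows:
        partitions.setdefault(row[column], []).append(row)
    return partitions


def union_partitions(partitions, key):
    """Rebuild the whole table as the union of all partitions."""
    merged = [row for rows in partitions.values() for row in rows]
    return sorted(merged, key=lambda row: row[key])


# Hypothetical sample data: each region's rows would live at that region's site.
customers = [
    {"id": 1, "name": "Ann", "region": "East"},
    {"id": 2, "name": "Bob", "region": "West"},
    {"id": 3, "name": "Cy", "region": "East"},
]
by_region = partition_rows(customers, "region")
```

Every partition has the same columns, which is why a plain union is enough to combine them.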
Vertical Partitioning
 Different columns of a table at different sites
 Advantages and disadvantages are the same as
for horizontal partitioning except that
combining data across partitions is more
difficult because it requires joins (instead of
unions)
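A short sketch of why vertical partitions need a join: each fragment holds different columns, so rows must be matched on a shared key to rebuild the table. It assumes every fragment repeats the primary key; the function names and sample `employees` rows are hypothetical.

```python
def partition_columns(rows, key, column_groups):
    """Split a table into fragments; each keeps the key plus one column group."""
    fragments = []
    for columns in column_groups:
        fragment = [{key: row[key], **{c: row[c] for c in columns}} for row in rows]
        fragments.append(fragment)
    return fragments


def join_fragments(fragments, key):
    """Rebuild full rows by joining the fragments on the shared key."""
    joined = {}
    for fragment in fragments:
        for row in fragment:
            joined.setdefault(row[key], {}).update(row)
    return [joined[k] for k in sorted(joined)]


# Hypothetical sample data: salary columns might be kept at a payroll site.
employees = [
    {"id": 1, "name": "Ann", "salary": 50},
    {"id": 2, "name": "Bob", "salary": 60},
]
public_part, payroll_part = partition_columns(employees, "id", [["name"], ["salary"]])
```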
Factors in Choice of Distributed Strategy
No approach to data distribution is ALWAYS best
 Choice depends on
 Funding, autonomy, security.
 Site data referencing patterns.
 Growth and expansion needs.
 Technological capabilities.
 Costs of managing complex technologies.
 Need for reliable service.
Distributed DBMS
 Distributed database requires distributed DBMS
 Functions of a distributed DBMS:
 Locate data with a distributed data dictionary
 Determine location from which to retrieve data and process
query components
 DBMS translation between nodes with different local DBMSs
(handle heterogeneous DBMS translation using middleware)
 Data consistency (via multiphase commit protocols)
 Global primary key control
 Scalability
 Security, concurrency, query optimization, failure recovery
Distributed DBMS Data Reference
 Local Transaction - references local data.
 Global Transaction - references non-local data.
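One way to picture the distinction is a lookup against the distributed data dictionary. The sketch below is illustrative only: the `DATA_LOCATION` mapping, the site names, and `classify_transaction` are assumptions made for this example.

```python
# Hypothetical data dictionary: which site stores each table.
DATA_LOCATION = {
    "orders": "chicago",
    "customers": "chicago",
    "inventory": "dallas",
}


def classify_transaction(origin_site, tables):
    """Return 'local' if every referenced table is stored at the origin site."""
    sites = {DATA_LOCATION[table] for table in tables}
    return "local" if sites <= {origin_site} else "global"
```

For example, a Chicago transaction touching only `orders` and `customers` is local, while one that also reads `inventory` is global.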
Distributed DBMS Architecture
Distributed DBMS Transparency Objectives
 Location Transparency
 User/application does not need to know where data resides
 Replication Transparency
 User/application does not need to know about duplication
 Failure Transparency
 Either all of the actions of a transaction are committed or else
none of them is committed.
 If a transaction fails at one site, it does not commit at other sites
 A system should detect a failure (broken communication link,
erroneous data, disk head crash), reconfigure the system and recover
 Each site has a transaction manager
 Logs transactions and before and after images
 Requires special commit protocol
Failure Transparency: Two-Phase Commit
 Commit Protocol: Ensures that a global
transaction is either successfully completed at
each site or else aborted.
 Two-Phase Commit
 Prepare Phase: Check if operation ok at all
participating sites
 Commit Phase: Only if all participating sites
agree is the commit issued
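The two phases can be sketched as follows. This is a simplified illustration, not a production protocol: it leaves out logging, timeouts, and coordinator failure, and the `Participant` class and `two_phase_commit` function are hypothetical names.

```python
class Participant:
    """One site taking part in a global transaction (hypothetical model)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self):
        # Vote yes only if the operation is OK at this site.
        self.state = "prepared" if self.can_commit else "refused"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    # Phase 1 (prepare): collect a vote from every participating site.
    votes = [p.prepare() for p in participants]
    # Phase 2 (commit): commit everywhere only if all sites agreed;
    # otherwise abort everywhere so no site commits alone.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False
```

A single refusing site is enough to abort the transaction at every site.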
Distributed DBMS Transparency Objectives
 Concurrency Transparency
 Allow multiple users to run transactions
concurrently, with each transaction appearing as if
it were the only activity in the system
 Timestamping
 Ensure that even if two events occur simultaneously
at different sites, each will have a unique timestamp.
 Alternative to locks in distributed databases
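One common way to get unique timestamps across sites, shown here as an assumption rather than as the method any particular DBMS uses, is to pair a local counter with the site's identifier. The `SiteClock` class below is a hypothetical sketch of that idea.

```python
class SiteClock:
    """Issues timestamps of the form (counter, site_id)."""

    def __init__(self, site_id):
        self.site_id = site_id
        self.counter = 0

    def next_timestamp(self):
        # The site_id breaks ties, so two sites with the same counter
        # value still produce different timestamps.
        self.counter += 1
        return (self.counter, self.site_id)

    def observe(self, timestamp):
        # On seeing a remote timestamp, catch up so later local
        # timestamps sort after it.
        self.counter = max(self.counter, timestamp[0])
```

Because tuples compare element by element, these timestamps give one total order over events from all sites.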
Distributed DBMS Vendors
 Oracle
 Microsoft
 Informix
 Sybase
 IBM
 Computer Associates
 Ingres
 Others
