UNIT-1
Distributed Data Processing
• Distributed data processing allows multiple computers to be used
anywhere.
• Distributed data processing allows work to be spread among multiple
geographically separate sites, where local computers handle local
processing needs.
• In distributed processing, a database's logical processing is shared
among two or more physically independent sites that are connected
through a network.
• It dramatically reduces workstation costs, improves user interfaces and
desktop power, and increases the ability to share data across
multiple servers.
• We define a distributed system as one in which hardware or software
components located at networked computers communicate and
coordinate their actions only by passing messages.
Advantages :
1. Availability
2. Resource Sharing
3. Incremental Growth
4. Increased User Involvement and Control
5. End-user Productivity
6. Distance and location independence
7. Privacy and security
Disadvantages :
1. More difficult testing and failure diagnosis
2. More components and dependence on communication means more points of failure
3. Incompatibility of components
4. Incompatibility of data
5. More complex management and control
6. Difficulty in control of corporate information resources
7. Suboptimal procurement
8. Duplication of effort
Distributed System Architecture
• Database System Architecture
• Database architecture focuses on the design, development, implementation and
maintenance of computer programs that store and organize information for businesses,
agencies and institutions.
• Centralized Architecture
• The centralized database system consists of a single processor together with its associated
data storage devices and other peripherals.
• Data can be accessed from multiple sites over a computer network, while
the database is maintained at the central site.
• General-purpose computer system: One to a few CPUs and a number of device controllers
that are connected through a common bus that provides access to shared memory.
• Single-user system (e.g., personal computer or workstation) : Desk-top unit, single user,
usually has only one CPU and one or two hard disks; the OS may support only one user.
• Multi-user system: More disks, more memory, multiple CPUs and a multi-user OS. Serve a
large number of users who are connected to the system via terminals. Often called server
systems.
Advantages
1. The data integrity is maximized as the whole database is stored
at a single physical location. This means that it is easier to
coordinate the data and it is as accurate and consistent as
possible.
2. The data redundancy is minimal in the centralized database.
Disadvantages:
1. If the server fails, the whole system fails.
2. Searching and accessing data takes more time.
3. Bottlenecks can appear when traffic spikes.
• Client-Server Architecture
• The functionality is divided into two classes: Server and Client.
• A client is defined as a requester of services and a server is defined as the provider
of services. Client/server computing is a very efficient and safe means of sharing
database information in a multi-user environment.
• This provides two-level architecture which makes it easier to manage the
complexity of modern DBMSs and the complexity of distribution.
• Client: User machine that provides user interface capabilities and local processing.
• Server: System containing both hardware and software. It provides services to the
client machines such as file access, printing, archiving, or database access.
• Server functions: Mainly data management, including query processing,
optimization, transaction management, etc.
• Client functions: Might also include some data management functions, not just
the user interface.
• Client-server architecture may be either two-tier or three-tier.
• In a two-tier architecture, the server performs database functions
and the clients perform the user interface functions. Either the
server or the clients may perform business functions.
• In a three-tier architecture, the clients perform the user interface
functions, a database server performs the database functions, and
separate computers, called application servers, perform the
business functions and act as interface between clients and
database server.
• Client-server systems can be classified as follows:
1. Multiple client - single server 2. Multiple client - multiple server
1. Multiple Clients - Single Server
Problems with this type of architecture are as follows:
1. Single point of failure, i.e., server failure
2. Bottleneck at the server
3. Difficult to scale the database
2. Multiple Client - Multiple Server :
• Two alternative management strategies are possible: either each client
manages its own connection to the appropriate server, or each client
knows only its "home server", which then communicates with other
servers as required.
• It consists of clients running client software, a set of servers that provide
all database functionality, and a reliable communication infrastructure.
• When a server receives a query that requires access to data at other
servers, it generates the appropriate sub-queries to be executed by the other
servers and puts the results together to compute the answer to the original query.
Advantages of client-server architectures
1. Horizontal and vertical scaling of resources
2. Better price/performance on client machines
3. Ability to use familiar tools on client machines
4. Client access to remote data
Disadvantages of client-server architectures
1. Maintenance costs are higher.
2. Security problems grow as the number of users and processing
sites increases.
3. Complexity increases.
Server System Architecture
• Server systems can be broadly categorized into two kinds:
1. Transaction servers which are widely used in relational
database systems.
2. Data servers, used in object-oriented database systems.
Transaction server
A transaction server is a specialized type of server that manages
the operations of software-based transactions or transaction
processing. It manages application and database transactions on a
network or the Internet, within a distributed computing environment.
• A transaction server allows you to break down transactions into
components that perform discrete functions.
• A component notifies the transaction server whether it has completed
successfully or not.
• If all the components in a transaction complete successfully, the
transaction is committed. If not, the transaction is rolled back.
• Transaction servers are also called query server systems or SQL server
systems.
• Requests are specified in SQL and communicated to the server through a
Remote Procedure Call (RPC) mechanism.
• Open Database Connectivity (ODBC) is an application program
interface standard from Microsoft for connecting to a server, sending
SQL requests, and receiving results.
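As a hedged illustration (not part of the original slides), the following Python snippet shows a client sending an SQL request to a server over ODBC using the pyodbc library; the DSN, credentials, table, and column names are placeholder assumptions.

```python
# Illustrative only: the DSN "SalesDB", the credentials, and the "orders"
# table are hypothetical; any ODBC data source would work the same way.
import pyodbc

conn = pyodbc.connect("DSN=SalesDB;UID=app_user;PWD=secret")
cur = conn.cursor()

# The SQL request travels to the server; only the results come back.
cur.execute("SELECT id, amount FROM orders WHERE amount > ?", 100)
for row in cur.fetchall():
    print(row.id, row.amount)

conn.commit()   # the server commits (or rolls back) the transaction
conn.close()
```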
• Data server:
A data server is used in LANs, where there is a very high-speed connection
between the clients and the server.
Issues are as follows:
• Page-shipping versus Item-shipping
• Locking
• Data caching
• Lock caching
Parallel System
• A system is a parallel system if multiple processors have direct access
to shared memory that forms a common address space.
• Two main performance measures are:
a. Throughput: The number of tasks that can be completed in a
given time interval.
b. Response time: The amount of time it takes to complete a
single task from the time it is submitted.
• Speed-up and Scale-up:
Speedup: A fixed-size problem executing on a small system is given to a
system that is N times larger; ideally, it then runs N times faster.
Scaleup: Increase the size of both the problem and the system; the
N-times larger system is used to perform an N-times larger job in roughly
the same time (see the sketch below).
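As a minimal sketch (not from the slides), the Python snippet below computes speed-up and scale-up from elapsed times; the timing values are illustrative assumptions.

```python
def speedup(t_small: float, t_large: float) -> float:
    """Speed-up = time for a fixed-size problem on the small system
    divided by its time on the N-times larger system."""
    return t_small / t_large

def scaleup(t_small: float, t_large: float) -> float:
    """Scale-up = time for the small problem on the small system divided by
    the time for the N-times larger problem on the N-times larger system.
    A value close to 1 means linear scale-up."""
    return t_small / t_large

# Hypothetical measurements:
print(speedup(100.0, 12.5))   # 8.0  -> ideal speed-up on an 8x system
print(scaleup(100.0, 110.0))  # ~0.91 -> slightly sub-linear scale-up
```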
• Factors Limiting Speedup and Scaleup
1. Startup costs: Cost of starting up multiple processes may
dominate computation time, if the degree of parallelism is
high.
2. Interference: Processes accessing shared resources (e.g., system
bus, disks, or locks) compete with each other, thus
spending time waiting on other processes, rather
than performing useful work.
3. Skew: Increasing the degree of parallelism increases the variance
in service times of parallelly executing tasks. Overall
execution time determined by slowest of
parallelly executing tasks.
Distributed System
• A distributed system is one in which components located at
networked computers communicate and co-ordinate their actions
only by passing messages.
• Usually, distributed systems are asynchronous, i.e., they do not
use a common clock and do not impose any bounds on relative
processor speeds or message transfer times.
Local and global transactions are differentiated as follows:
a) A local transaction accesses data in the single site at which the
transaction was initiated.
b) A global transaction either accesses data in a site different from the
one at which the transaction was initiated or accesses data in
several different sites.
Trade-offs in distributed systems :
a. Sharing Data: Users at one site are able to access data residing at
other sites.
b. Autonomy: Each site is able to retain a degree of control over data
stored locally.
c. Higher System Availability through Redundancy : Data can be
replicated at remote sites, and system can function even if
a site fails.
Parallel Database
• Parallel database systems consist of multiple processors and multiple disks
connected by a fast interconnection network.
• A parallel database improves data-processing performance by using
multiple resources, such as CPUs and disks, simultaneously.
Goals of Parallel Databases
• Improve performance: The performance of the system can be
improved by connecting multiple CPUs and disks in parallel.
• Improve availability of data: Data can be copied to multiple locations
to improve the availability of data.
• Improve reliability: Reliability of system is improved with
completeness, accuracy and availability of data.
• Provide distributed access to data: Companies having many branches
in multiple cities can access data with the help of a parallel database
system.
• Advantages of Parallel Databases:
a) Performance improvement
b) High availability: The same data is stored at multiple locations.
c) Increased reliability: Even if a data site fails, execution can continue because
other copies of the data are available.
I/O Parallelism
• I/O parallelism refers to reducing the time required to retrieve
relations from disk by partitioning the relations on multiple disks.
• The most common form of data partitioning in a parallel database
environment is horizontal partitioning.
• In horizontal partitioning, the tuples of a relation are divided among many
disks so that each tuple resides on one disk. Several
partitioning strategies have been proposed.
• Three basic data-partitioning strategies are
• Round Robin,
• Hash Partitioning
• Range Partitioning.
Partitioning techniques are as follows :
• Round-robin: This method scans the relation in any order and sends the i-th
tuple to disk D(i mod n). It ensures an even distribution of tuples
across disks; each disk has approximately the same number of tuples as
the others.
• Hash partitioning: Choose one or more attributes as the partitioning
attributes. A hash function is chosen whose range is {0, 1, ..., n - 1}. Each
tuple of the original relation is hashed on the partitioning attributes. If the
hash function returns i, then the tuple is placed on disk Di. Hash partitioning
is also useful for sequential scans of the entire relation.
• Range partitioning: Choose an attribute as the partitioning attribute. A
partitioning vector [v0, v1, ..., vn-2] is chosen. Let v be the partitioning-attribute
value of a tuple. Tuples such that vi <= v < vi+1 go to disk i + 1; tuples
with v < v0 go to disk 0, and tuples with v >= vn-2 go to disk n - 1.
(The three strategies are sketched in code below.)
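As a minimal sketch (not from the slides), the following Python functions illustrate the three partitioning strategies for n disks; the tuples, the key function, and the partitioning vector are illustrative assumptions.

```python
def round_robin(tuples, n):
    """Send the i-th tuple to disk (i mod n)."""
    disks = [[] for _ in range(n)]
    for i, t in enumerate(tuples):
        disks[i % n].append(t)
    return disks

def hash_partition(tuples, n, key):
    """Hash the partitioning attribute into the range {0, ..., n-1}."""
    disks = [[] for _ in range(n)]
    for t in tuples:
        disks[hash(key(t)) % n].append(t)
    return disks

def range_partition(tuples, vector, key):
    """vector = [v0, v1, ..., v(n-2)]: v < v0 -> disk 0,
    vi <= v < vi+1 -> disk i+1, v >= v(n-2) -> disk n-1."""
    disks = [[] for _ in range(len(vector) + 1)]
    for t in tuples:
        v, i = key(t), 0
        while i < len(vector) and v >= vector[i]:
            i += 1
        disks[i].append(t)
    return disks

rows = [(i, i * 10) for i in range(8)]          # hypothetical relation
print(round_robin(rows, 3))
print(range_partition(rows, [20, 50], key=lambda t: t[1]))
```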
Handling of Skew
• Data skew primarily refers to a non-uniform distribution of values in a dataset.
• The direct impact of data skew on the parallel execution of complex
database queries is poor load balancing, which leads to high response
times.
Skew is of two types: attribute-value skew and partition skew.
1. Attribute-value skew : Some values appear in the partitioning
attributes of many tuples. All the tuples with the same value for
the partitioning attribute end up in the same partition, resulting
in skew.
2. Partition skew: There may be load imbalance in the partitioning
even when there is no attribute skew.
Interquery Parallelism
• In interquery parallelism, different queries or transactions execute in
parallel with one another.
• The response times of individual transactions are not faster than they
would be if the transactions were run in isolation.
• Thus, the primary use of interquery parallelism is to scale up a
transaction processing system to support a larger number of
transactions per second.
Intraquery Parallelism
• Intraquery parallelism refers to the execution of a single query in
parallel on multiple processors and disks. Using intraquery
parallelism is essential for speeding up long-running queries.
Two complementary forms of intraquery parallelism:
1. Intraoperation parallelism : Parallelize the execution of each
individual operation in the query.
2. Interoperation parallelism : Execute the different operations in a
query expression in parallel.
Distributed Database Concepts
• A logically interrelated collection of shared data physically distributed over a
computer network.
A DDBMS must include the following components:
1. Computer workstations
2. Network hardware and software
3. Communications media
4. Transaction processor: Software component found in each computer that
receives and processes the application's data requests.
5. Data processor or data manager: Software component residing on each
computer that stores and retrieves data located at the site. It may be a
centralized DBMS
Data is stored using following methods.
1. Replication
2. Fragmentation
3. Hybrid
4. Allocation
• Allocation of fragments depends on how the database will be
used. The first step is to design the database schema and then design the
application programs.
• The information required from applications is as follows:
1. Transaction execution frequency
2. Site in which transaction is executed
3. Condition for transaction performed
A distributed database system maintains local schemas at the individual sites and a
global schema for all sites. Data storage is the responsibility of the data sites.
• There are five big reasons for using a distributed database system:
1. Many organizations are distributed in nature.
2. Multiple databases can be accessed transparently.
3. Database can be expanded incrementally - As needs arise, additional
computers can be connected to the distributed database system.
4. Reliability and availability are increased - Distributed database can
replicate data among several sites. So even if one site fails,
redundancy in data will lead to increased availability and reliability
of the data as a whole.
5. Performance will increase - Query processing can be performed at
multiple sites and as such distributed database systems can mimic
parallel database systems in a high-performance network.
Homogeneous and Heterogeneous Databases
Homogeneous DDBMS
a) All sites use the same DBMS product.
b) Homogeneous systems are much easier to design and manage.
c) The approach provides incremental growth and allows increased
performance.
d) Homogeneous DBs can communicate directly with each other.
Heterogeneous DDBMS
a) Sites may run different DBMS products, with possibly different
underlying data models.
b) This occurs when sites have implemented their own databases first,
and integration is considered later.
c) Translations are required to allow for different hardware and/or
different DBMS products
d) Typical solution is to use gateways
e) Heterogeneous DBs communicate through gateway interfaces
f) No DDBMS currently provides full support for fully
heterogeneous DDBMSs.
Distributed Data Storage
A relation r is to be stored in the database. Two methods are used
for storing relations in a distributed database.
a) Replication: The system maintains several identical replicas of the
relation and stores each replica at a different site. The alternative to
replication is to store only one copy of relation r.
b) Fragmentation: The system partitions the relation into several
fragments, and stores each fragment at a different site.
Data Replication
• If relation r is replicated, a copy of relation r is stored in two or more
sites.
• Using data replication, each logical data item of a database has
several physical copies, each of them located on a different machine,
also referred to as site or node.
• In full replication the entire database is replicated and in partial
replication some selected part is replicated to some of the sites.
Data Fragmentation
• Fragmentation is a design technique to divide a single relation or class
of a database into two or more partitions such that the
combination of the partitions provides the original database without
any loss of information.
The main reasons for fragmenting relations are to:
1. Increase locality of reference of the queries submitted to database,
2. Improve reliability and availability of data and performance of the
system,
3. Balance storage capacities and minimize communication costs
among sites.
Horizontal Fragmentation
• The horizontal fragmentation of a relation R is the subdivision of its
tuples into subsets called fragments
Each fragment Ti of table T contains a subset of the rows. Each tuple of
T is assigned to one or more fragments. Horizontal fragmentation is
lossless. It is defined using the selection operation.
There are two versions of horizontal partitioning:
a. Primary horizontal fragmentation of a relation is achieved through
the use of predicates defined on that relation which restricts the
tuples of the relation.
b. Derived horizontal fragmentation is realized by using predicates that
are defined on other relations.
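As a minimal sketch (not from the slides), the snippet below illustrates primary horizontal fragmentation using selection predicates; the relation and the predicates are illustrative assumptions.

```python
# Hypothetical employee relation, fragmented by branch (the predicate).
employees = [
    {"id": 1, "name": "Asha",  "branch": "Chennai"},
    {"id": 2, "name": "Ravi",  "branch": "Mumbai"},
    {"id": 3, "name": "Meena", "branch": "Chennai"},
]

# Each fragment is a selection sigma_p(R) for one predicate p.
frag_chennai = [t for t in employees if t["branch"] == "Chennai"]
frag_mumbai  = [t for t in employees if t["branch"] == "Mumbai"]

# Lossless: the union of the fragments reconstructs the original relation.
rebuilt = sorted(frag_chennai + frag_mumbai, key=lambda t: t["id"])
assert rebuilt == employees
```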
Vertical Fragmentation
• Some of the columns of a relation are projected into a base relation at
one of the sites, and other columns are projected into a base relation at
another site.
Two types of heuristics for vertical fragmentation exist :
1. Grouping: Assign each attribute to one fragment and, at each
step, join some of the fragments until some criterion is satisfied. It
uses a bottom-up approach.
2. Splitting : starts with a relation and decides on beneficial
partitioning based on the access behavior of applications to the
attributes. It uses top-down approach.
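As a minimal sketch (not from the slides), the snippet below illustrates vertical fragmentation: column groups are projected to different sites, and the key is kept in every fragment so that a join rebuilds the relation; the relation and column groups are illustrative assumptions.

```python
employees = [
    {"id": 1, "name": "Asha", "salary": 50000},
    {"id": 2, "name": "Ravi", "salary": 60000},
]

# pi_{id,name} at site 1 and pi_{id,salary} at site 2 (key kept in both).
frag_site1 = [{"id": t["id"], "name": t["name"]}     for t in employees]
frag_site2 = [{"id": t["id"], "salary": t["salary"]} for t in employees]

# Reconstruction: natural join on the shared key "id".
rebuilt = [{**a, **b} for a in frag_site1 for b in frag_site2
           if a["id"] == b["id"]]
assert rebuilt == employees
```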
Distributed Transparency
• Distribution transparency allows a
distributed database to be treated as a single logical database.
The following levels of distribution transparency are
recognized:
1. Fragmentation transparency
2. Location transparency
3. Replication transparency
• Fragmentation transparency is the highest level of transparency. The
end user or programmer does not need to know that a database is
partitioned.
Location Transparency
Location transparency exists when the end user or programmer must
specify the database fragment name but does not need to specify
where those fragments are located.
Replication Transparency
• The fact that there might be more than one copy of a table stored in the system
should be hidden from the user. This is replication
transparency, which enables the user to query any table as if there
were only one copy of it.
Distributed Transactions
• A distributed transaction is composed of several sub-transactions,
each running on a different site. Each database manager can independently decide to
abort its sub-transaction.
• When a transaction is submitted, the transaction manager at that site
breaks it up into one or more sub-transactions that execute at different
sites, submits them to the transaction managers at those sites, and
coordinates their activity.
Why are distributed transactions hard?
1. Atomic : Different parts of a transaction may be at different sites. How
do we ensure all or none committed?
2. Consistent : Failure may affect only part of transaction
3. Isolated : Commitment must occur "simultaneously" at all sites
4. Durable : Not much different when other problems solved. It also makes
"delayed commit" difficult.
For each such transaction, the coordinator is responsible for :
1. Starting the execution of the transaction.
2. Breaking the transaction into a number of sub-transactions and
distributing these sub-transactions to the appropriate sites for
execution.
3. Coordinating the termination of the transaction, which may result in
the transaction being committed at all sites or aborted at all
sites.
System Failure Modes
The basic failure types are:
1. Failure of a site
2. Loss of messages
3. Failure of a communication link
4. Network partition
Scenario (the classic two-army problem over an unreliable channel):
• Blue (1) sends to Blue (2) "let's attack tomorrow at dawn"; later,
• Blue (2) sends confirmation to Blue (1) "splendid idea, see you at dawn"; but,
• Blue (1) realizes that Blue (2) does not know if the message arrived,
• So, Blue (1) sends to Blue (2) "message arrived, battle set"; then, Blue (2) realizes that Blue (1) does
not know if the message arrived, and so on.
• The two blue armies can never be sure because of the unreliable communication.
• No certain agreement can be reached using this method.
• Transaction : Sequence of actions treated as an atomic action to preserve consistency (e.g. access to a database).
• Commit a transaction: Unconditional guarantee that the transaction will complete successfully (even in the
presence of failures).
• Abort a transaction: Unconditional guarantee to back out of a transaction, i.e., that all the effects of the
transaction have been removed. Events that may cause aborting a transaction are deadlocks, timeouts,
protection violations, etc.
• Mechanisms that facilitate backing out of an aborting transaction are Write-ahead-log protocol and
shadow pages.
• Commit protocols ensure that all the sites either commit or abort a transaction unanimously, even in the
presence of multiple and repetitive failures.
Two-Phase Commit Protocol
• The two-phase commit protocol is a distributed algorithm which lets all sites in a
distributed system agree on whether to commit a transaction. The protocol results in either all
nodes committing the transaction or all aborting it, even in the case of site failures and
message losses.
The two phases of the algorithm are broken into:
1. The COMMIT-REQUEST phase, where the COORDINATOR
attempts to prepare all the COHORTS, and
2. The COMMIT phase, where the COORDINATOR completes the
transactions at all COHORTS.
• Basic algorithm
• During phase 1, initially the coordinator sends a query to commit message to all
cohorts. Then it waits for all cohorts to report back with the agreement message.
• The cohorts, if the transaction was successful, write an entry to the undo
log and an entry to the redo log.
• Then the cohorts reply with an agree message, or an abort if the
transaction failed at a cohort node.
• During phase 2, if the coordinator receives an agree message from all cohorts, then it
writes a commit record into its log and sends a commit message to all the cohorts.
• If all agreement messages do not come back the coordinator sends an abort message.
• Next the coordinator waits for the acknowledgement from the cohorts.
• When acks are received from all cohorts the coordinator writes a complete record to
its log.
• Note that, the coordinator will wait forever for all the acknowledgements to come
back.
• If the cohort receives a commit message, it releases all the locks and resources held
during the transaction and sends an acknowledgement to the coordinator.
• If the message is abort, then the cohort undoes the transaction with the undo log and
releases the resources and locks held during the transaction. Then it sends an
acknowledgement.
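As a simplified, single-process sketch (not the slides' algorithm verbatim), the Python code below mimics the two-phase commit message flow; the Cohort class and its vote logic are illustrative assumptions, and a real implementation would add persistent logging, message passing, and timeouts.

```python
class Cohort:
    """Stand-in for a participant site; `ok` decides its vote."""
    def __init__(self, ok=True):
        self.ok = ok
    def prepare(self):          # write undo/redo log entries, then vote
        return self.ok
    def commit(self):           # release locks and resources, then ACK
        pass
    def abort(self):            # undo the work using the undo log
        pass

def two_phase_commit(cohorts, log):
    # Phase 1 (COMMIT-REQUEST): ask every cohort to prepare, collect votes.
    votes = [c.prepare() for c in cohorts]
    # Phase 2 (COMMIT): commit only if every cohort agreed, else abort.
    if all(votes):
        log.append("commit")
        for c in cohorts:
            c.commit()
    else:
        log.append("abort")
        for c in cohorts:
            c.abort()
    log.append("complete")      # written once all acknowledgements are in

log = []
two_phase_commit([Cohort(), Cohort(ok=True)], log)
print(log)                      # ['commit', 'complete']
```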
Three-Phase Commit Protocol
• The three-phase commit (3PC) protocol is non-blocking for site failures, except in the event of
failure of all sites.
• Communication failures can result in different sites reaching different decisions,
thereby violating the atomicity of global transactions.
• It introduces a third phase, called pre-commit, between voting and the global decision.
• On receiving all votes from the participants, the coordinator sends a global pre-commit message.
A participant that receives the global pre-commit knows that all other participants have voted
commit and that, in time, the participant itself will definitely commit.
Basic 3PC protocol :
Phase 1 :
• The coordinator sends VOTE_REQ to all participants.
• When a participant receives VOTE_REQ, it responds with YES or NO, depending on
its vote. If a participant votes NO, it decides abort and stops.
Phase 2 :
• The coordinator collects all votes. If any vote was NO, then the coordinator decides abort,
sends ABORT to all participants that voted YES, and stops. Otherwise, the coordinator
sends PRE_COMMIT messages to all participants.
• A participant that votes YES waits for a PRE_COMMIT or ABORT message from the
coordinator. If it receives a PRE_COMMIT, then it responds with an ACK message.
Phase 3 :
• The coordinator collects the ACKs. When they have all been received, it decides commit,
sends COMMITs to all participants, and stops.
• A participant waits for a COMMIT from the coordinator. When it receives that message, it
decides commit and stops.
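As a simplified sketch (not from the slides), the Python code below mimics the basic 3PC exchange with the extra PRE_COMMIT round between voting and the global decision; the Participant class is an illustrative assumption, and real 3PC also needs timeout actions at both coordinator and participants.

```python
class Participant:
    """Stand-in for a participant site; `yes` is its vote."""
    def __init__(self, yes=True):
        self.yes = yes
    def vote(self):        # Phase 1: reply YES or NO to VOTE_REQ
        return self.yes
    def pre_commit(self):  # Phase 2: learn everyone voted YES, reply ACK
        pass
    def commit(self):      # Phase 3: decide commit
        pass
    def abort(self):
        pass

def three_phase_commit(participants):
    # Phase 1: collect votes; any NO leads to a global abort.
    if not all(p.vote() for p in participants):
        for p in participants:
            p.abort()
        return "abort"
    # Phase 2: send PRE_COMMIT and collect ACKs.
    for p in participants:
        p.pre_commit()
    # Phase 3: send COMMIT once all ACKs are in.
    for p in participants:
        p.commit()
    return "commit"

print(three_phase_commit([Participant(), Participant()]))  # commit
```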
Concurrency Control and its Problem
In distributed database systems, the database is typically used by
many users. These systems usually allow multiple transactions to run
concurrently, i.e., at the same time.
Concurrency control is the activity of coordinating concurrent
accesses to a database in a multiuser database management system
(DBMS).
Main Objectives of Distributed Concurrency Control
1. It must allow recovery from site and communication failures.
2. It must support parallel execution of transactions.
3. The storage mechanism and computational method should be modest, to
minimize overhead.
4. It should keep communication delays low.
5. It should place few constraints on the structure of the atomic actions of transactions.
Concurrency Control Anomalies
• Concurrency control is the co-ordination of simultaneous transaction execution in a multiprocessing
database system.
• Lack of concurrency control can create data integrity and consistency
problems, such as:
1. Lost updates
2. Uncommitted data
3. Inconsistent retrievals
• Lost Update Problem (W-W Conflict)
• The problem occurs when two different database transactions perform
read/write operations on the same database items in an interleaved
manner (i.e., concurrent execution), which makes the values of the items
incorrect and hence leaves the database inconsistent.
• Consider the following schedule, where two transactions TX and TY are
performed on the same account A, whose balance is $300.
• At time t1, transaction TX reads the value of account A, i.e., $300 (only
read).
• At time t2, transaction TX deducts $50 from account A that becomes $250
(only deducted and not updated/write).
• Alternately, at time t3, transaction TY reads the value of account A that
will be $300 only because TX didn't update the value yet.
• At time t4, transaction TY adds $100 to account A that becomes $400
(only added but not updated/write).
• At time t6, transaction TX writes the value of account A that will be
updated as $250 only, as TY didn't update the value yet.
• Similarly, at time t7, transaction TY writes the values of account A, so it
will write as done at time t4 that will be $400. It means the value written
by TX is lost, i.e., $250 is lost.
• Hence the data becomes incorrect, and the database is left inconsistent.
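As a minimal sketch (not from the slides), the Python snippet below reproduces this lost-update schedule with plain variables standing in for the two transactions' local copies; the variable names are illustrative assumptions.

```python
account_A = 300              # shared data item

tx_local = account_A - 50    # t1/t2: TX reads 300 and computes 250 locally
ty_local = account_A + 100   # t3/t4: TY still reads 300 and computes 400

account_A = tx_local         # t6: TX writes 250
account_A = ty_local         # t7: TY overwrites with 400 -> TX's update is lost

print(account_A)             # 400, instead of the correct 350
```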
Unrepeatable Read Problem (W-R Conflict)
• It is also known as the inconsistent retrievals problem; it occurs when, within
a single transaction, two different values are read for the same database
item.
• For example:
Consider two transactions, TX and TY, performing read/write
operations on account A, which has an available balance of $300. The
schedule is as follows:
• At time t1, transaction TX reads the value from account A, i.e., $300.
• At time t2, transaction TY reads the value from account A, i.e., $300.
• At time t3, transaction TY updates the value of account A by adding
$100 to the available balance, and then it becomes $400.
• At time t4, transaction TY writes the updated value, i.e., $400.
• After that, at time t5, transaction TX reads the available value of
account A, and that will be read as $400.
• It means that within the same transaction TX, two different
values of account A are read: $300 initially and, after the update made by
transaction TY, $400. This is an unrepeatable read, and the situation is
therefore known as the unrepeatable read problem.
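As a minimal sketch (not from the slides), the Python snippet below reproduces the unrepeatable-read schedule with plain variables; the variable names are illustrative assumptions.

```python
account_A = 300

read1 = account_A            # t1: TX reads 300
account_A = account_A + 100  # t2-t4: TY reads, adds $100, and writes 400
read2 = account_A            # t5: TX reads again and now sees 400

print(read1, read2)          # 300 400 -> same item, two different values
```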

More Related Content

PPTX
Database System Architectures
PPTX
Chapter 20
PPTX
Module 3 - DBMS System Architecture Principles
PPTX
Sayed database system_architecture
PPTX
Sayed database system_architecture
PPTX
Light sayed database_system_architecture
PPTX
Light sayed database_system_architecture
Database System Architectures
Chapter 20
Module 3 - DBMS System Architecture Principles
Sayed database system_architecture
Sayed database system_architecture
Light sayed database_system_architecture
Light sayed database_system_architecture

Similar to Advanced Topics on Database - Unit-1 AU17 (20)

PDF
Lecture Notes Unit3 chapter20 - Database System Architectures
PPTX
Adbms 26 architectures for a distributed system
PDF
Database system architecture
PPT
VNSISPL_DBMS_Concepts_ch20
PPT
Client Server Architecture1
PPT
Lecture 5 Database management system.ppt
PPTX
Design Issues of Distributed System (1).pptx
PPTX
Disadvantages Distributed System.pptx
PPTX
Database architecture
PPTX
Distributed database
PPTX
PPT
Elements of Systems Design.ppt
PPTX
unit 4-1.pptx
DOCX
Client server computing_keypoint_and_questions
DOCX
Distributed computing
PPTX
Distributed Data Base.pptx
PPTX
Client server architecture
PPT
Lecture Notes Unit3 chapter20 - Database System Architectures
Adbms 26 architectures for a distributed system
Database system architecture
VNSISPL_DBMS_Concepts_ch20
Client Server Architecture1
Lecture 5 Database management system.ppt
Design Issues of Distributed System (1).pptx
Disadvantages Distributed System.pptx
Database architecture
Distributed database
Elements of Systems Design.ppt
unit 4-1.pptx
Client server computing_keypoint_and_questions
Distributed computing
Distributed Data Base.pptx
Client server architecture
Ad

More from LOGANATHANK24 (6)

PPTX
CS3591- Computer Networks Unit-1 Introduction and Application Layer
PPTX
CS3591- Computer Networks Unit-2 Transport layer
PPTX
Advanced Topics on Database - Unit-4 AU17
PPTX
Advanced Topics on Database - Unit-2 AU17
PPTX
Advanced Topics on Database - Unit-3 AU17
PPSX
CS3391 - Object Oriented Programming AU.Reg-2021
CS3591- Computer Networks Unit-1 Introduction and Application Layer
CS3591- Computer Networks Unit-2 Transport layer
Advanced Topics on Database - Unit-4 AU17
Advanced Topics on Database - Unit-2 AU17
Advanced Topics on Database - Unit-3 AU17
CS3391 - Object Oriented Programming AU.Reg-2021
Ad

Recently uploaded (20)

PPTX
Introduction_to_Human_Anatomy_and_Physiology_for_B.Pharm.pptx
PDF
The Lost Whites of Pakistan by Jahanzaib Mughal.pdf
PDF
Abdominal Access Techniques with Prof. Dr. R K Mishra
PDF
ANTIBIOTICS.pptx.pdf………………… xxxxxxxxxxxxx
PDF
Microbial disease of the cardiovascular and lymphatic systems
PPTX
Introduction to Child Health Nursing – Unit I | Child Health Nursing I | B.Sc...
PPTX
master seminar digital applications in india
PDF
Saundersa Comprehensive Review for the NCLEX-RN Examination.pdf
PDF
2.FourierTransform-ShortQuestionswithAnswers.pdf
PPTX
Pharma ospi slides which help in ospi learning
PPTX
PPH.pptx obstetrics and gynecology in nursing
PPTX
Renaissance Architecture: A Journey from Faith to Humanism
PDF
Pre independence Education in Inndia.pdf
PDF
O5-L3 Freight Transport Ops (International) V1.pdf
PDF
FourierSeries-QuestionsWithAnswers(Part-A).pdf
PDF
01-Introduction-to-Information-Management.pdf
PDF
Anesthesia in Laparoscopic Surgery in India
PDF
O7-L3 Supply Chain Operations - ICLT Program
PPTX
GDM (1) (1).pptx small presentation for students
PPTX
school management -TNTEU- B.Ed., Semester II Unit 1.pptx
Introduction_to_Human_Anatomy_and_Physiology_for_B.Pharm.pptx
The Lost Whites of Pakistan by Jahanzaib Mughal.pdf
Abdominal Access Techniques with Prof. Dr. R K Mishra
ANTIBIOTICS.pptx.pdf………………… xxxxxxxxxxxxx
Microbial disease of the cardiovascular and lymphatic systems
Introduction to Child Health Nursing – Unit I | Child Health Nursing I | B.Sc...
master seminar digital applications in india
Saundersa Comprehensive Review for the NCLEX-RN Examination.pdf
2.FourierTransform-ShortQuestionswithAnswers.pdf
Pharma ospi slides which help in ospi learning
PPH.pptx obstetrics and gynecology in nursing
Renaissance Architecture: A Journey from Faith to Humanism
Pre independence Education in Inndia.pdf
O5-L3 Freight Transport Ops (International) V1.pdf
FourierSeries-QuestionsWithAnswers(Part-A).pdf
01-Introduction-to-Information-Management.pdf
Anesthesia in Laparoscopic Surgery in India
O7-L3 Supply Chain Operations - ICLT Program
GDM (1) (1).pptx small presentation for students
school management -TNTEU- B.Ed., Semester II Unit 1.pptx

Advanced Topics on Database - Unit-1 AU17

  • 2. Distributed Data Processing • Distributed data processing allows multiple computers to be used anywhere. • Distributed data processing allows multiple computers to be working among multiple geographically separate sites where local computers handle local processing needs. • Distributed processing is a database's logical processing is shared among two or more physically independent sites that are connected through a network.
  • 4. • It dramatically reduced workstation costs and improved user interfaces and desktop power. It increases ability to share data across multiple servers. • We define a distributed system as one in which hardware or software components located at networked computers communicate and coordinate their actions only by-passing messages.
  • 5. Advantages : 1. Availability 2. Resource Sharing 3. Incremental Growth 4. Increased User Involvement and Control 5. End-user Productivity 6. Distance and location independence 7. Privacy and security Disadvantages : 1. More difficulty test and failure diagnosis 2. More components and dependence on communication means more points of failure 3. Incompatibility of components 4. Incompatibility of data 5. More complex management and control 6. Difficulty in control of corporate information resources 7. Suboptimal procurement 8. Duplication of effort
  • 6. Distributed System Architecture • Database System Architecture • Database architecture focuses on the design, development, implementation and maintenance of computer programs that store and organize information for businesses, agencies and institutions. • Centralized Architecture • The centralized database system consists of a single processor together with its associated data storage devices and other peripherals. • Data can be accessed from the multiple sites with the use of a computer network while the database is maintained at the central site. • General-purpose computer system: One to a few CPUs and a number of device controllers that are connected through a common bus that provides access to shared memory. • Single-user system (e.g., personal computer or workstation) : Desk-top unit, single user, usually has only one CPU and one or two hard disks; the OS may support only one user. • Multi-user system: More disks, more memory, multiple CPUs and a multi-user OS. Serve a large number of users who are connected to the system via terminals. Often called server systems.
  • 7. Advantages 1. The data integrity is maximized as the whole database is stored at a single physical location. This means that it is easier to coordinate the data and it is as accurate and consistent as possible. 2. The data redundancy is minimal in the centralized database. Disadvantages: 1. If server fails, whole system fails. 2. It takes more time to search and access 3. Bottlenecks can appear when the traffic spikes.
  • 8. • Client-Server Architecture • The functionality is divided into two classes: Server and Client. • A client is defined as a requester of services and a server is defined as the provider of services. Client/server computing is a very efficient and safe means of sharing database information in a multi-user environment • This provides two-level architecture which makes it easier to manage the complexity of modern DBMSs and the complexity of distribution. • Client: User machine that provides user interface capabilities and local processing. • Server: System containing both hardware and software. It provides services to the client machines such as file access, printing, archiving, or database access. • Server functions: Mainly data management, including query processing, optimization, transaction management, etc. • Client functions: Might also include some data management functions not just user interface.
  • 10. • Client-server architecture may be either two-tier or three-tier. • In a two-tier architecture, the server performs database functions and the clients perform the user interface functions. Either the server or the clients may perform business functions. • In a three-tier architecture, the clients perform the user interface functions, a database server performs the database functions, and separate computers, called application servers, perform the business functions and act as interface between clients and database server. • There is various classification of client server system. 1. Multiple client - single server 2. Multiple client - multiple server
  • 11. • Multiple clients - single server
  • 13. Problems with this type of architecture is as follows : 1. Single point of failure i.e. server failure 2. Bottleneck at server 3. Difficult to scale a database
  • 14. 2. Multiple Client - Multiple Server : • Two alternative management strategies are possible: Either each client manages its own connection to the appropriate server or each client knows of only its” home server” which then communicates with other servers as required • It consists of clients running client software, a set of servers which provide all database functionalities and a reliable communication infrastructure. • When a server receives a, query that requires access to data at other servers, it generates appropriate sub-query to be executed by other server and put the results together to compute answer to the original query
  • 16. Advantages of client-server architectures 1. Horizontal and vertical scaling of resources 2. Better price/performance on client machines 3. Ability to use familiar tools on client machines 4. Client access to remote data Disadvantages of client-server architectures 1. Maintenance cost is more. 2. It suffers from security problem as number of user and processing sites increases. 3. Complexity increases
  • 17. Server System Architecture • Server systems can be broadly categorized into two kinds: 1. Transaction servers which are widely used in relational database systems. 2. Data servers, used in object-oriented database systems. Transaction server A transaction server is a specialized type of server that manages the operations of software based transactions or transaction processing. It manages application and database transactions on a network or Internet, within a distributed computing environment.
  • 18. • Transaction server allows you to break down transactions into components that perform discrete functions. • A component notifies transaction server whether it has completed successfully or not. • If all the components in a transaction complete successfully, the transaction is committed. If not, the transaction is rolled back. • Transaction server is also called query server systems or SQL server systems. • Requests specified in SQL, and communicated to the server through a Remote Procedure Call (RPC) mechanism. • Open Database Connectivity (ODBC) is an application program interface standard from Microsoft for connecting to a server, sending SQL requests, and receiving results.
  • 19. • Data server: Data server is used in LANs, where there is a very high-speed connection between the clients and the server. Issues are as follows • Page-shipping versus Item-shipping • Locking • Data caching • Lock caching
  • 20. Parallel System • A system is said to be a parallel system in which multiple processor have direct access to shared memory which forms a common address space. • Two main performance measures are: a. Throughput: The number of tasks that can be completed in a given time interval. b. Response time: The amount of time it takes to complete a single task from the time it is submitted.
  • 21. • Speed-up and Scale-up: Speedup: A fixed-sized problem executing on a small system is given to a system which is N -times larger. Scaleup: Increase the size of both the problem and the system. The N -times larger system used to perform N -times larger job.
  • 22. • Factors Limiting Speedup and Scaleup 1. Startup costs: Cost of starting up multiple processes may dominate computation time, if the degree of parallelism is high. 2. Interference: Processes accessing shared resources (e.g., system bus, disks, or locks) compete with each other, thus spending time waiting on other processes, rather than performing useful work. 3. Skew: Increasing the degree of parallelism increases the variance in service times of parallelly executing tasks. Overall execution time determined by slowest of parallelly executing tasks.
  • 23. Distributed System • A distributed system is one in which components located at networked computers communicate and co-ordinate their actions only by-passing messages. • Usually, distributed systems are asynchronous, i.e., they do not use a common clock and do not impose any bounds on relative processor speeds or message transfer times. Differentiate between local and global transactions are as follows a) A local transaction accesses data in the single site at which the transaction was initiated. b) A global transaction either accesses data in a site different from the one at which the transaction was initiated or accesses data in several different sites.
  • 24. Trade-offs in distributed systems : a. Sharing Data : Users at one site able to access the data residing at some other sites. b. Autonomy: Each site is able to retain a degree of control over data stored locally. c. Higher System Availability through Redundancy : Data can be replicated at remote sites, and system can function even if a site fails.
  • 25. Parallel Database • Parallel database systems consist of multiple processors and multiple disks connected by a fast interconnection network. • Parallel database improves the performance of processing of data using multiple resources simultaneously. Multiple resources like multiple CPU, Disks can be used simultaneously.
  • 26. Goals of Parallel Databases • Improve performance: The performance of the system can be improved by connecting multiple CPU and disks in parallel. • Improve availability of data: Data can be copied to multiple locations to improve the availability of data. • Improve reliability: Reliability of system is improved with completeness, accuracy and availability of data. • Provide distributed access of data: Companies having many branches in multiple cities can access data with the help of parallel database system
  • 27. • Advantages of Parallel Databases: a) Performance improvement b) High availability : Same data is stored at multiple locations c) Increases reliability : Even if data site fails execution can continue as other copy of data are available.
  • 28. I/O Parallelism • I/O parallelism refers to reducing the time required to retrieve relations from disk by partitioning the relations on multiple disks. • The most common form of data partitioning in a parallel database environment is horizontal partitioning. • In horizontal partitioning, so that each tuple residthe tuples of a relation are divided among many disks,es on one disk. Several partitioning strategies have been proposed. • Three basic data-partitioning strategies are • Round Robin, • Hash Partitioning • Range Partitioning.
  • 29. Partitioning techniques are as follows : • Round-robin: In this method, scans the relation in any order and sends the ith tuple to disk number Di mod n. It ensures an even distribution of tuples across disks and each disk has approximately the same number of tuples as the others. • Hash partitioning: It choose one or more attributes as the partitioning attributes. A hash function is chosen whose range is |0, 1,..., n — 1|. Each tuple of the original relation is hashed on the partitioning attributes. If the hash function returns i, then the tuple is placed on disk Di . Hash partitioning is also useful for sequential scans of the entire relation. • Range partitioning: Choose an attribute as the partitioning attribute. A partitioning vector [v 0 , v 1-,-- , v n-2 ] is chosen. Let v be the partitioning attribute value of a tuple. Tuples such that v i<+vi+1 go to disk I + 1. Tuples with v < v 0 go to disk 0 and tuples with v >vn-2 go to disk n — 1.
  • 30. Handling of Skew • Data skew primarily refers to a non uniform distribution in a dataset. • The direct impact of data skew on parallel execution of complex database queries is a poor load balancing leading to high response time. Skew are of two types : Attribute-value skew and Partition skew. 1. Attribute-value skew : Some values appear in the partitioning attributes of many tuples. All the tuples with the same value for the partitioning attribute end up in the same partition, resulting in skew. 2. Partition skew: refers to the fact that there may be load imbalance in the partitioning, even when there is no attribute skew.
  • 31. Interquery Parallelism • In interquery parallelism, different queries or transaction execute in parallel with one another. • The response times of individual transactions are not faster than they would be if the transactions were run in isolation. • Thus, the primary use of interquery parallelism is to scale up a transaction processing system to support a more significant number of transactions per second.
  • 32. Intraquery Parallelism • Intraquery parallelism defines the execution of a single query in parallel on multiple processors and disks. Using intraquery parallelism is essential for speeding up long-running queries. Two complementary forms of intraquery parallelism: 1. Intraoperation parallelism : Parallelize the execution of each individual operation in the query. 2. Interoperation parallelism : Execute the different operations in a query expression in parallel.
  • 33. Distributed Database Concepts • A logically interrelated collection of shared data physically distributed over a computer network. DDBMS components must include the following components 1. Computer workstations 2. Network hardware and software 3. Communications media 4. Transaction Processor: Software component found in each computer that receives and processes the application's requests data 5. Data processor or data manager: Software component residing on each computer that stores and retrieves data located at the site. It may be a centralized DBMS
  • 34. Data is stored using following methods. 1. Replication 2. Fragmentation 3. Hybrid 4. Allocation • Allocation of fragments is depends on the how the database will be used. First thing is to design database schema and then design application program. • Information required from application is as follows : 1. Transaction execution frequency 2. Site in which transaction is executed 3. Condition for transaction performed
  • 36. The distributed database systems contain local and global schema for all sites. Data storage is responsibility of data sites. Data is stored using following methods. 1. Replication 2. Fragmentation 3. Hybrid 4. Allocation Allocation of fragments is depends on the how the database will be used.
  • 37. Information required from application is as follows : 1. Transaction execution frequency 2. Site in which transaction is executed 3. Condition for transaction performed
  • 38. • There are five big reasons for using a distributed database system: 1. Many organizations are distributed in nature. 2. Multiple databases can be accessed transparently. 3. Database can be expanded incrementally - As needs arise, additional computers can be connected to the distributed database system. 4. Reliability and availability are increased - Distributed database can replicate data among several sites. So even if one site fails, redundancy in data will lead to increased availability and reliability of the data as a whole. 5. Performance will increase - Query processing can be performed at multiple sites and as such distributed database systems can mimic parallel database systems in a high-performance network.
  • 39. Homogenous and Heterogeneous Databases. a)Homogeneous DDBMS b) All sites use same DBMS product c) It is much easier to design and manage d) The approach provides incremental growth and allows increased performance. e) Homogeneous systems are much easier to design and manage f) Homogeneous DBs can communicate directly with each other
  • 40. Heterogeneous DDBMS a) Sites may run different DBMS products, with possibly different underlying data models. b) This occurs when sites have implemented their own databases first, and integration is considered later. c) Translations are required to allow for different hardware and/or different DBMS products d) Typical solution is to use gateways e) Heterogeneous DBs communicate through gateway interfaces f) No DDBMS currently provides full support for heterogeneous or fully heterogeneous DDBMSs
  • 41. Distributed Data Storage A Relation (r) is stored in the database. Two methods are used for storing relations on the distributed database. a) Replication: The system maintains several identical replicas of the relation and stores each replica at a different site. The alternative to replication is to store only one copy of relation r. b) Fragmentation: The system partitions the relation into several fragments, and stores each fragment at a different site.
  • 42. Data Replication • If relation r is replicated, a copy of relation r is stored in two or more sites. • Using data replication, each logical data item of a database has several physical copies, each of them located on a different machine, also referred to as site or node. • In full replication the entire database is replicated and in partial replication some selected part is replicated to some of the sites.
  • 44. Data Fragmentation • Fragmentation is a design technique to divide a single relation or class of a database into two or more partitions such that the combination of the partitions provides the original database without any loss of information. The main reasons of fragmentation of the relations are to 1. Increase locality of reference of the queries submitted to database, 2. Improve reliability and availability of data and performance of the system, 3. Balance storage capacities and minimize communication costs among sites.
  • 45. Horizontal Fragmentation • The horizontal fragmentation of a relation R is the subdivision of its tuples into subsets called fragments Each fragment, Ti of table T contains a subset of the rows. Each tuple of T is assigned to one or more fragments. Horizontal fragmentation is lossless. It is defined as selection operation.
  • 46. There are two versions of horizontal partitioning: a. Primary horizontal fragmentation of a relation is achieved through the use of predicates defined on that relation which restricts the tuples of the relation. b. Derived horizontal fragmentation is realized by using predicates that are defined on other relations.
  • 47. Vertical Fragmentation • Some of the columns of a relation are projected into a base relation at one of the sites, and other columns are projected into a base relation at another site.
  • 48. Two types of heuristics for vertical fragmentation exist : 1. Grouping : assign each attribute to one fragment, and at each step, join some of the fragments until some criteria is satisfied. It uses bottom-up approach. 2. Splitting : starts with a relation and decides on beneficial partitioning based on the access behavior of applications to the attributes. It uses top-down approach.
  • 49. Distributed Transparency • Distributed Transparency Distribution transparency, which allows a distributed database to be treated as a single logical database. Following are different levels of distribution transparency are recognized: 1. Fragmentation transparency 2. Location transparency 3. Replication transparency
  • 50. • Fragmentation transparency is the highest level of transparency. The end user or programmer does not need to know that a database is partitioned. Location Transparency Location transparency exists when the end user or programmer must specify the database fragment name but does not need to specify where those fragments are located.
  • 51. Replication Transparency • There might be more than one copy of a table stored in the system should be hidden from the user. This provides for replication transparency, which enables the user to query any table as if there were only one copy of it.
  • 52. Distributed Transactions • A distributed transaction is composed of several sub-transactions, each running on a different site. Each database manager can decide to abort.
  • 53. • When a transaction is submitted the transaction manager at that site breaks it up into one or more sub transactions that execute at different sites, submits them to the transaction manager at those sites, and coordinates their activity. Why are distributed transactions hard? 1. Atomic : Different parts of a transaction may be at different sites. How do we ensure all or none committed? 2. Consistent : Failure may affect only part of transaction 3. Isolated : Commitment must occur "simultaneously" at all sites 4. Durable : Not much different when other problems solved. It also makes "delayed commit" difficult.
  • 54. For each such transaction, the coordinator is responsible for : 1. Starting the execution of the transaction. 2. Breaking the transaction into a number of sub-transactions and distributing these sub-transactions to the appropriate sites for execution. 3. Coordinating the termination of the transaction, which may result in the transaction being committed at all sites or aborted at all sites.
  • 55. System Failure Modes The basic failure types are: 1. Failure of a site 2. Loss of messages 3. Failure of a communication link 4. Network partition
  • 56. Scenario: • Blue (1) sends to Blue (2) “lets attack tomorrow at dawn” later, • Blue (2) sends confirmation to Blue (1) “splendid idea, see you at dawn” but, • Blue (1) realizes that Blue (2) does not know if the message arrived, • So, Blue (1) sends to Blue (2) “message arrived, battle set” then, Blue (2) realizes that Blue(1) does not know if the message arrived etc. • The two blue armies can never be sure because of the unreliable communication. • No certain agreement can be reached using this method. • Transaction : Sequence of actions treated as an atomic action to preserve consistency (e.g. access to a database). • Commit a transaction: Unconditional guarantee that the transaction will complete successfully (even in the presence of failures). • Abort a transaction : Unconditional guarantee to back out of a transaction, i e., that all the effects of the transaction have been removed. Events that may cause aborting a transaction are deadlocks, timeouts, protection violation etc. • Mechanisms that facilitate backing out of an aborting transaction are Write-ahead-log protocol and shadow pages. • Commit protocol ensure that all the sites either commit or abort transaction unanimously, even in the presence of multiple and repetitive failures.
  • 57. Two-Phase Commit Protocol • The two-phase commit protocol is a distributed algorithm that lets all sites in a distributed system agree on whether to commit a transaction. The protocol results in either all nodes committing the transaction or all nodes aborting it, even in the case of site failures and message losses.
  • 58. The two phases of the algorithm are: 1. The COMMIT-REQUEST phase, where the COORDINATOR attempts to prepare all the COHORTS, and 2. The COMMIT phase, where the COORDINATOR completes the transaction at all COHORTS.
  • 59. • Basic algorithm • During phase 1, the coordinator initially sends a query-to-commit message to all cohorts and then waits for every cohort to report back with an agreement message. • The cohorts, if the transaction was successful, write an entry to the undo log and an entry to the redo log. • Then the cohorts reply with an agree message, or with an abort message if the transaction failed at a cohort node. • During phase 2, if the coordinator receives an agree message from all cohorts, it writes a commit record into its log and sends a commit message to all the cohorts. • If not all agreement messages come back, the coordinator sends an abort message. • Next, the coordinator waits for the acknowledgements from the cohorts. • When acks are received from all cohorts, the coordinator writes a complete record to its log. • Note that the coordinator will wait indefinitely for all the acknowledgements to come back. • If a cohort receives a commit message, it releases all the locks and resources held during the transaction and sends an acknowledgement to the coordinator. • If the message is abort, the cohort undoes the transaction with the undo log, releases the resources and locks held during the transaction, and then sends an acknowledgement.
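A minimal, single-process sketch of this message flow is given below. The Coordinator and Cohort classes and their method names (vote, do_commit, do_abort) are illustrative only, not a real DBMS API; timeouts, message loss, and crash recovery are deliberately omitted, so it only traces the happy-path order of the two phases.

```python
# Minimal sketch of two-phase commit; names are illustrative, not a real API.

class Cohort:
    def __init__(self, name, will_succeed=True):
        self.name = name
        self.will_succeed = will_succeed
        self.log = []                       # stands in for the undo/redo log

    def vote(self):
        # Phase 1: on the query-to-commit, write undo/redo entries and vote.
        if self.will_succeed:
            self.log.append("undo/redo written")
            return "AGREE"
        return "ABORT"

    def do_commit(self):
        # Phase 2: release locks/resources and acknowledge.
        self.log.append("commit")
        return "ACK"

    def do_abort(self):
        # Phase 2: undo using the undo log, release resources, acknowledge.
        self.log.append("abort (undone)")
        return "ACK"


class Coordinator:
    def __init__(self, cohorts):
        self.cohorts = cohorts
        self.log = []

    def run(self):
        # Phase 1 (COMMIT-REQUEST): gather a vote from every cohort.
        votes = [c.vote() for c in self.cohorts]
        decision = "COMMIT" if all(v == "AGREE" for v in votes) else "ABORT"
        self.log.append(decision)

        # Phase 2 (COMMIT): broadcast the decision and wait for all ACKs.
        acks = [c.do_commit() if decision == "COMMIT" else c.do_abort()
                for c in self.cohorts]
        if all(a == "ACK" for a in acks):
            self.log.append("complete")
        return decision


if __name__ == "__main__":
    # All cohorts succeed -> global COMMIT.
    print(Coordinator([Cohort("site1"), Cohort("site2")]).run())
    # One cohort fails -> global ABORT at every site.
    print(Coordinator([Cohort("site1"), Cohort("site2", will_succeed=False)]).run())
```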
  • 60. Three-Phase Commit Protocol • The Three-Phase Commit (3PC) protocol is non-blocking for site failures, except in the event of failure of all sites. • Communication failures can still result in different sites reaching different decisions, thereby violating the atomicity of global transactions. • It introduces a third phase, called pre-commit, between voting and the global decision. • On receiving all votes from the participants, the coordinator sends a global pre-commit message. A participant that receives the global pre-commit knows that all other participants have voted commit and that, in time, the participant itself will definitely commit.
  • 61. Basic 3PC protocol: Phase 1: • The coordinator sends VOTE_REQ to all participants. • When a participant receives VOTE_REQ, it responds with YES or NO, depending on its vote. If a participant votes NO, it decides abort and stops. Phase 2: • The coordinator collects all votes. If any vote was NO, the coordinator decides abort, sends ABORT to all participants that voted YES, and stops. Otherwise, the coordinator sends PRE_COMMIT messages to all participants. • A participant that voted YES waits for a PRE_COMMIT or ABORT message from the coordinator. If it receives a PRE_COMMIT, it responds with an ACK message. Phase 3: • The coordinator collects the ACKs. When they have all been received, it decides commit, sends COMMIT to all participants, and stops. • A participant waits for a COMMIT from the coordinator. When it receives that message, it decides commit and stops.
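The following is a minimal sketch of these three coordinator-side phases. The participant objects are stubs built from plain callables (vote, pre_commit, abort, commit are hypothetical names), and all timeouts and failure handling are omitted, so this only traces the normal message order, not the full non-blocking behaviour.

```python
# Minimal sketch of the 3PC coordinator's three phases; participants are stubs.

def three_phase_commit(participants):
    # Phase 1: voting - send VOTE_REQ and collect YES/NO from every participant.
    votes = {name: p["vote"]() for name, p in participants.items()}
    if any(v == "NO" for v in votes.values()):
        for name, v in votes.items():
            if v == "YES":
                participants[name]["abort"]()   # ABORT only to the YES voters
        return "ABORT"

    # Phase 2: pre-commit - every participant learns that all votes were YES.
    acks = {name: p["pre_commit"]() for name, p in participants.items()}

    # Phase 3: global commit once every ACK has been received.
    if all(a == "ACK" for a in acks.values()):
        for p in participants.values():
            p["commit"]()
        return "COMMIT"


if __name__ == "__main__":
    def make_participant(vote):
        return {
            "vote": lambda: vote,
            "pre_commit": lambda: "ACK",
            "abort": lambda: None,
            "commit": lambda: None,
        }

    print(three_phase_commit({"site1": make_participant("YES"),
                              "site2": make_participant("YES")}))   # COMMIT
    print(three_phase_commit({"site1": make_participant("YES"),
                              "site2": make_participant("NO")}))    # ABORT
```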
  • 62. Concurrency Control and its Problems • In distributed database systems, the database is typically used by many users. These systems usually allow multiple transactions to run concurrently, i.e. at the same time. Concurrency control is the activity of coordinating concurrent accesses to a database in a multiuser database management system (DBMS).
  • 63. Main Objectives of Distributed Concurrency Control 1. It must be resilient to site and communication failures. 2. It must permit parallel execution of transactions. 3. Its storage mechanism and computational method should be modest, to minimize overhead. 4. It should introduce little communication delay. 5. It should place few constraints on the structure of the atomic actions of transactions.
  • 64. Concurrency Control Anomalies • Concurrency control is the coordination of simultaneous transaction execution in a multiprocessing database system. • Lack of concurrency control can create data integrity and consistency problems, such as:
  • 65. 1. Lost updates 2. Uncommitted data 3. Inconsistent retrievals
  • 66. • Lost Update Problem (W-W Conflict) • The problem occurs when two different database transactions perform read/write operations on the same database items in an interleaved manner (i.e., concurrent execution), making the values of those items incorrect and hence the database inconsistent. • Consider the following schedule, where two transactions TX and TY operate on the same account A, whose balance is $300.
  • 67. • At time t1, transaction TX reads the value of account A, i.e., $300 (read only). • At time t2, transaction TX deducts $50 from account A, giving $250 (deducted locally, not yet written). • Alternately, at time t3, transaction TY reads the value of account A, which is still $300 because TX has not written its update yet. • At time t4, transaction TY adds $100 to account A, giving $400 (added locally, not yet written). • At time t6, transaction TX writes its value of account A, so A is updated to $250, as TY has not written its value yet. • Similarly, at time t7, transaction TY writes its value of account A, i.e., the $400 computed at time t4. The value written by TX is lost, i.e., the $250 update is overwritten. • Hence the data becomes incorrect, and the database is left inconsistent.
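The schedule above can be replayed in a few lines of code. The snippet below is a single-threaded simulation of that interleaving, assuming the same account A with an initial balance of $300; each transaction keeps its result in a local variable between its read and its write, exactly as in the timeline.

```python
# Replay of the lost-update schedule (t1..t7) for account A starting at $300.

account_a = 300

tx_local = account_a        # t1: TX reads A -> 300
ty_local = account_a        # t3: TY reads A -> 300 (TX has not written yet)

tx_local -= 50              # t2: TX deducts $50 locally -> 250
ty_local += 100             # t4: TY adds $100 locally -> 400

account_a = tx_local        # t6: TX writes A -> 250
account_a = ty_local        # t7: TY writes A -> 400, overwriting TX's update

print(account_a)            # 400, though 350 was expected after both transactions
```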
  • 68. Unrepeatable Read Problem (W-R Conflict) • It is also known as the Inconsistent Retrievals Problem, which occurs when, within a single transaction, two different values are read for the same database item. • For example, consider two transactions, TX and TY, performing read/write operations on account A, which has an available balance of $300. The schedule is shown below:
  • 70. • At time t1, transaction TX reads the value from account A, i.e., $300. • At time t2, transaction TY reads the value from account A, i.e., $300. • At time t3, transaction TY updates the value of account A by adding $100 to the available balance, which then becomes $400. • At time t4, transaction TY writes the updated value, i.e., $400. • After that, at time t5, transaction TX reads the available value of account A, which is now read as $400. • This means that within the same transaction TX, two different values of account A are read: $300 initially, and $400 after the update made by transaction TY. This is an unrepeatable read and is therefore known as the unrepeatable read problem.
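As with the lost update, this interleaving can be replayed directly. The snippet below is a single-threaded simulation assuming account A starts at $300; TX reads the same item twice and observes two different values because TY writes an update in between.

```python
# Replay of the unrepeatable-read schedule (t1..t5) for account A starting at $300.

account_a = 300

first_read = account_a          # t1: TX reads A -> 300

ty_local = account_a + 100      # t2-t3: TY reads A and adds $100 locally
account_a = ty_local            # t4: TY writes A -> 400

second_read = account_a         # t5: TX re-reads A -> 400

print(first_read, second_read)  # 300 400 -- the read is not repeatable
```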