Storing the real-world DATA
In the world of relational, document and graph databases..
Whoami
Athira Mukundan
Solution Consultant @Sahaj
athira_tech@yahoo.in
Why distribute at all?
● Scalability
○ shared-memory architecture (vertical scaling or scaling up)
○ shared-disk architecture
○ shared-nothing architectures (horizontal scaling or scaling out)
● Fault tolerance/high availability
● Latency
What is distribution all about?
● Replication
○ Keeping a copy of the same data on several different nodes, potentially in different locations.
○ Provides redundancy: if some nodes are unavailable, the data can still be served from the
remaining nodes
○ Helps improve performance by scaling out the machines that can serve read requests
○ Keeps data geographically close to your users
● Partitioning:
○ Splitting a big database into smaller subsets called partitions so that different partitions can be assigned to different nodes (also known as sharding)
● Transactions
Replication: Leaders and Followers
Leader (also known as master or primary) :
● All writes must go to this node.
Followers (read replicas, slaves, secondaries, or hot standbys):
● Whenever the leader writes new data to its local storage, it also sends the data change to all of its
followers as part of a replication log or change stream.
● Each follower takes the log from the leader and updates its local copy of the database accordingly,
by applying all writes in the same order as they were processed on the leader.
● Reads can be served from followers (see the sketch below)
Implemented in:
● PostgreSQL (since version 9.0), MySQL, Oracle Data Guard, and SQL Server’s AlwaysOn Availability Groups
● Nonrelational databases, including MongoDB, RethinkDB, and Espresso
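A minimal sketch of this flow, using hypothetical Leader and Follower classes rather than any real database's API: the leader appends each write to a replication log and ships it to the followers, which apply the entries in the same order as the leader processed them.

# Illustrative leader/follower replication sketch (not a real database API).

class Follower:
    def __init__(self):
        self.data = {}            # local copy of the database
        self.applied_offset = 0   # position in the replication log applied so far

    def apply(self, offset, key, value):
        # Apply log entries strictly in order, as they were processed on the leader.
        assert offset == self.applied_offset + 1
        self.data[key] = value
        self.applied_offset = offset

    def read(self, key):
        return self.data.get(key)   # reads can be served here (possibly stale)


class Leader:
    def __init__(self, followers):
        self.data = {}
        self.log = []               # replication log / change stream
        self.followers = followers

    def write(self, key, value):
        # All writes go through the leader.
        self.data[key] = value
        self.log.append((key, value))
        offset = len(self.log)
        for f in self.followers:    # ship the change to every follower (asynchronous in practice)
            f.apply(offset, key, value)


followers = [Follower(), Follower()]
leader = Leader(followers)
leader.write("user:1", "Athira")
print(followers[0].read("user:1"))  # -> 'Athira'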
Synchronous Versus Asynchronous Replication
Replication: Handling Node Outages
Follower failure: Catch-up recovery
● On its local disk, each follower keeps a log of the data changes it has received from the
leader
● When it recovers, the follower can connect to the leader and request all the data changes that occurred during the time when it was disconnected.
Leader failure: Failover
● Determining that the leader has failed
● Choosing a new leader (replica with the most up-to-date data changes)
● Reconfiguring the system to use the new leader (clients now send write requests to the new leader), as sketched below
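A minimal sketch of the failover choice, assuming each replica reports the replication-log offset it has applied (the replica names and offsets below are made up):

# Illustrative failover sketch: promote the follower with the most up-to-date data.

replicas = [
    {"name": "follower-1", "applied_offset": 1042, "alive": True},
    {"name": "follower-2", "applied_offset": 1051, "alive": True},
    {"name": "follower-3", "applied_offset": 998,  "alive": False},  # unreachable
]

def elect_new_leader(replicas):
    candidates = [r for r in replicas if r["alive"]]
    # Choose the replica with the most up-to-date data changes.
    return max(candidates, key=lambda r: r["applied_offset"])

new_leader = elect_new_leader(replicas)
print("Promoting", new_leader["name"])  # reconfigure clients to send writes here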
Problems with Replication Lag
● Reading Your Own Writes
○ A user makes a write, followed by a read from a stale replica. To prevent this anomaly, we need
read-after-write consistency
● Monotonic Reads
○ A user first reads from a fresh replica, then from a stale replica. Time appears to go backward. To
prevent this anomaly, we need monotonic reads
○ One way of achieving monotonic reads is to make sure that each user always makes their reads from the same replica (different users can read from different replicas), as sketched below.
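A minimal sketch of that "same replica per user" rule, hashing a user ID onto a fixed replica (the replica names are illustrative; a real router must also cope with replica failures):

# Illustrative sketch: route each user's reads to the same replica (monotonic reads).
import hashlib

replicas = ["replica-a", "replica-b", "replica-c"]

def replica_for_user(user_id: str) -> str:
    # Deterministic hash so the same user always reads from the same replica,
    # while different users spread across the replicas.
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    return replicas[int(digest, 16) % len(replicas)]

print(replica_for_user("alice"))  # always the same replica for 'alice'
print(replica_for_user("bob"))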
Other Replication Topologies
Multi-Leader Replication Topologies
Leaderless Replication
Partitioning
● To spread the data and query load evenly across multiple machines, avoiding hot spots
(nodes with disproportionately high load).
● Combining replication and partitioning: each node acts as leader for some partitions and
follower for other partitions.
Partitioning of Key-Value Data
Partitioning by Key Range
● Assign a continuous range of keys (from some minimum to some maximum) to each partition
Partitioning by Hash of Key
● Once you have a suitable hash function for keys, you can assign each partition a range of
hashes (rather than a range of keys), and every key whose hash falls within a partition’s
range will be stored in that partition.
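A minimal sketch contrasting the two schemes; the range boundaries and partition count below are invented for illustration:

# Illustrative partitioning sketch: key-range vs. hash-of-key assignment.
import bisect
import hashlib

# Partitioning by key range: each partition owns a contiguous range of keys.
range_boundaries = ["g", "n", "t"]   # partition 0: < "g", 1: "g".."m", 2: "n".."s", 3: >= "t"

def range_partition(key: str) -> int:
    return bisect.bisect_right(range_boundaries, key)

# Partitioning by hash of key: each partition owns a range of hash values.
NUM_PARTITIONS = 4

def hash_partition(key: str) -> int:
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % NUM_PARTITIONS   # simple modulo over the hash space

for key in ["apple", "melon", "zebra"]:
    print(key, "-> range partition", range_partition(key), ", hash partition", hash_partition(key))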
Partitioning of Relational Data
Request Routing
At a high level, there are a few different approaches to this problem:
● Allow clients to contact any node (e.g., via a round-robin load balancer).
○ If that node coincidentally owns the partition to which the request applies, it can handle the
request directly; otherwise, it forwards the request to the appropriate node, receives the reply,
and passes the reply along to the client.
● Send all requests from clients to a routing tier first
○ This determines the node that should handle each request and forwards it accordingly.
○ This routing tier does not itself handle any requests; it only acts as a partition-aware load
balancer.
● Require that clients be aware of the partitioning and the assignment of partitions to
nodes.
○ In this case, a client can connect directly to the appropriate node, without any intermediary.
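A minimal sketch of the routing-tier approach, with a made-up partition map (in practice this mapping usually lives in a coordination service such as ZooKeeper):

# Illustrative routing-tier sketch: forward each request to the node that owns its partition.
import hashlib

NUM_PARTITIONS = 8
# Partition -> node assignment (hypothetical names).
partition_map = {p: f"node-{p % 3}" for p in range(NUM_PARTITIONS)}

def partition_for_key(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % NUM_PARTITIONS

def route(key: str) -> str:
    # The routing tier does not handle the request itself; it only forwards it
    # to the node that owns the key's partition.
    return partition_map[partition_for_key(key)]

print(route("user:42"))  # the node that owns this key's partition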
Transactions
● A transaction is a way for an application to group several reads and writes together into a
logical unit.
● All the reads and writes are executed as one operation: either the entire transaction succeeds (commit) or it fails (abort, rollback).
● If it fails, the application can safely retry.
● No partial failures.
● If a transaction involves activities at different sites, we call the activity at a given site a
subtransaction
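A minimal single-node sketch of the commit/abort/retry behaviour, using Python's built-in sqlite3 module (distributed subtransactions are handled by the commit protocols later in this deck):

# Illustrative transaction sketch: group writes into one unit that commits or rolls back.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()

def transfer(conn, src, dst, amount, retries=3):
    for attempt in range(retries):
        try:
            with conn:  # opens a transaction; commits on success, rolls back on exception
                conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
                conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            return True                    # committed: both writes applied, or...
        except sqlite3.OperationalError:   # ...aborted: neither write is visible
            continue                       # safe to retry, no partial failure
    return False

transfer(conn, "alice", "bob", 30)
print(conn.execute("SELECT * FROM accounts ORDER BY name").fetchall())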
Transactions: Lock Management
● Centralized:
○ A single site is in charge of handling lock and unlock requests for all objects.
● Primary copy:
○ One copy of each object is designated as the primary copy.
○ All requests to lock or unlock a copy of this object are handled by the lock manager at the site where the primary copy is stored, regardless of where the copy itself is stored.
● Fully distributed:
○ Requests to lock or unlock a copy of an object stored at a site are handled by the lock manager at the site where the copy is stored.
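A minimal sketch of the primary-copy scheme; the site and object names are made up, and the lock manager is simplified to exclusive locks only:

# Illustrative primary-copy locking sketch: all lock requests for an object go to
# the lock manager at the site holding that object's primary copy.

class LockManager:
    def __init__(self, site):
        self.site = site
        self.locks = {}   # object -> transaction currently holding the lock

    def lock(self, obj, txn):
        if self.locks.get(obj) in (None, txn):
            self.locks[obj] = txn
            return True
        return False      # already locked by another transaction

    def unlock(self, obj, txn):
        if self.locks.get(obj) == txn:
            del self.locks[obj]

# Primary-copy assignment: which site owns the primary copy of each object.
primary_site = {"account:alice": "site-A", "account:bob": "site-B"}
lock_managers = {"site-A": LockManager("site-A"), "site-B": LockManager("site-B")}

def request_lock(obj, txn):
    # Regardless of where a replica of obj is stored, the lock request goes to the primary site.
    return lock_managers[primary_site[obj]].lock(obj, txn)

print(request_lock("account:alice", "T1"))  # True
print(request_lock("account:alice", "T2"))  # False -- T1 holds the lock at site-A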
Transactions: Distributed Recovery
A commit protocol is followed to ensure that all subtransactions of a given transaction either
commit or abort uniformly.
Two of the well-known protocols are:
1. Two-Phase Commit (2PC)
2. Three-Phase Commit (3PC)
Two-Phase Commit (2PC)
The transaction manager at the site where the transaction originated is called the coordinator
for the transaction; transaction managers at sites where its subtransactions execute are called
subordinates.
1. The coordinator sends a prepare message to each subordinate.
2. When a subordinate receives a prepare message, it decides whether to abort or commit its subtransaction and replies with a yes or no vote.
3. If the coordinator receives yes votes from all subordinates, it sends a commit message to all subordinates; otherwise it sends an abort message to all subordinates.
4. Each subordinate acts on the message it receives (commits or aborts its subtransaction) and sends an acknowledgment to the coordinator.
5. After the coordinator has received ack messages from all subordinates, it writes an end log record for the transaction.
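A minimal in-process sketch of the 2PC message flow, with logging reduced to prints and no real networking (the class and site names are illustrative):

# Illustrative two-phase commit sketch: one coordinator, several subordinates.

class Subordinate:
    def __init__(self, name, will_commit=True):
        self.name = name
        self.will_commit = will_commit

    def prepare(self):
        # Phase 1: decide whether this subtransaction can commit, reply yes/no.
        return self.will_commit

    def commit(self):
        print(self.name, "commit")
        return "ack"

    def abort(self):
        print(self.name, "abort")
        return "ack"


def two_phase_commit(subordinates):
    # Phase 1: the coordinator sends prepare to every subordinate and collects votes.
    votes = [s.prepare() for s in subordinates]
    decision = "commit" if all(votes) else "abort"
    # Phase 2: the coordinator sends the decision; subordinates act and acknowledge.
    acks = [(s.commit() if decision == "commit" else s.abort()) for s in subordinates]
    if len(acks) == len(subordinates):
        print("coordinator: end log record written, decision =", decision)
    return decision


two_phase_commit([Subordinate("site-A"), Subordinate("site-B")])           # -> commit
two_phase_commit([Subordinate("site-A"), Subordinate("site-B", False)])    # -> abort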
Three-Phase Commit (3PC)
● When the coordinator sends out prepare messages and receives yes votes from all subordinates, it sends all sites a precommit message rather than a commit message.
● In 3PC the coordinator effectively postpones the decision to commit until it is sure that enough sites know about the decision to commit.
● If the coordinator subsequently fails, these sites can communicate with each other and detect that the transaction must be committed (or, conversely, aborted if none of them has received a precommit message) without waiting for the coordinator to recover.
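A minimal sketch of where 3PC differs: after unanimous yes votes the coordinator first records a precommit at every site, and only then commits (purely illustrative, with no failure handling shown):

# Illustrative three-phase commit sketch: a precommit round before the final commit.

class Site:
    def __init__(self, name, vote=True):
        self.name, self.vote, self.state = name, vote, "init"

def three_phase_commit(sites):
    # Phase 1: prepare / vote, exactly as in 2PC.
    if not all(s.vote for s in sites):
        for s in sites:
            s.state = "aborted"
        return "abort"
    # Phase 2: precommit -- the decision is recorded at the sites before anyone commits,
    # so if the coordinator now fails, the surviving sites can finish the commit themselves.
    for s in sites:
        s.state = "precommitted"
    # Phase 3: commit for real.
    for s in sites:
        s.state = "committed"
    return "commit"

print(three_phase_commit([Site("site-A"), Site("site-B")]))  # -> commit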
Troubles with Distributed Systems
1. Whenever you try to send a packet over the network, it may be lost or arbitrarily delayed.
Likewise, the reply may be lost or delayed, so if you don’t get a reply, you have no idea
whether the message got through.
2. A node’s clock may be significantly out of sync with other nodes (despite your best efforts
to set up NTP), it may suddenly jump forward or back in time, and relying on it is
dangerous because you most likely don’t have a good measure of your clock’s error
interval.
3. A process may pause for a substantial amount of time at any point in its execution (perhaps due to a stop-the-world garbage collector), be declared dead by other nodes, and then come back to life again without realizing that it was paused.
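A minimal sketch of the consequence of point 1: a timeout only tells you that the reply is missing, not whether the request was processed, so callers typically retry idempotent requests. The send_request function below is a stand-in for a real network call, not a real RPC library:

# Illustrative sketch: a timeout means "no reply", not "not processed" --
# so retries should be idempotent (safe to apply more than once).
import random

def send_request(payload):
    # Stand-in for a real network call: sometimes the reply is simply lost or delayed.
    if random.random() < 0.3:
        raise TimeoutError("no reply within the deadline")
    return {"status": "ok", "payload": payload}

def call_with_retries(payload, attempts=5):
    for attempt in range(1, attempts + 1):
        try:
            return send_request(payload)
        except TimeoutError:
            # The request may or may not have been processed; we only know the reply is missing.
            print(f"attempt {attempt}: timed out, retrying")
    raise RuntimeError("giving up: node unreachable or network down")

print(call_with_retries({"op": "set", "key": "user:1", "value": "Athira", "request_id": "abc-123"}))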