Storing the real-world DATA
In the world of relational, document and graph databases..
Whoami
Athira Mukundan
Solution Consultant @Sahaj
athira_tech@yahoo.in
Why distribute at all?
● Scalability
○ shared-memory architecture (vertical scaling or scaling up)
○ shared-disk architecture
○ shared-nothing architectures (horizontal scaling or scaling out)
● Fault tolerance/high availability
● Latency
What is distribution all about?
● Replication
○ Keeping a copy of the same data on several different nodes, potentially in different locations.
○ Provides redundancy: if some nodes are unavailable, the data can still be served from the
remaining nodes
○ Helps improve performance by scaling out the machines that can serve read requests
○ Keeps data geographically close to your users
● Partitioning:
○ Splitting a big database into smaller subsets called partitions so that different partitions can be assigned to different nodes (also known as sharding)
● Transactions
Replication: Leaders and Followers
Leader (also known as master or primary) :
● All writes must go to this node.
Followers (read replicas, slaves, secondaries, or hot standbys):
● Whenever the leader writes new data to its local storage, it also sends the data change to all of its
followers as part of a replication log or change stream.
● Each follower takes the log from the leader and updates its local copy of the database accordingly,
by applying all writes in the same order as they were processed on the leader.
● Reads can be served from followers (see the sketch below)
Implemented in:
● PostgreSQL (since version 9.0), MySQL, Oracle Data Guard, and SQL Server’s AlwaysOn Availability Groups
● Nonrelational databases, including MongoDB, RethinkDB, and Espresso
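A minimal sketch of this flow, using hypothetical Leader and Follower classes rather than any real database's API: the leader appends each write to a replication log and ships it to the followers, which apply the entries in the same order as the leader processed them.

# Illustrative leader/follower replication sketch (not a real database API).

class Follower:
    def __init__(self):
        self.data = {}            # local copy of the database
        self.applied_offset = 0   # position in the replication log applied so far

    def apply(self, offset, key, value):
        # Apply log entries strictly in order, as they were processed on the leader.
        assert offset == self.applied_offset + 1
        self.data[key] = value
        self.applied_offset = offset

    def read(self, key):
        return self.data.get(key)   # reads can be served here (possibly stale)


class Leader:
    def __init__(self, followers):
        self.data = {}
        self.log = []               # replication log / change stream
        self.followers = followers

    def write(self, key, value):
        # All writes go through the leader.
        self.data[key] = value
        self.log.append((key, value))
        offset = len(self.log)
        for f in self.followers:    # ship the change to every follower (asynchronous in practice)
            f.apply(offset, key, value)


followers = [Follower(), Follower()]
leader = Leader(followers)
leader.write("user:1", "Athira")
print(followers[0].read("user:1"))  # -> 'Athira'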
Synchronous Versus Asynchronous Replication
Replication: Handling Node Outages
Follower failure: Catch-up recovery
● On its local disk, each follower keeps a log of the data changes it has received from the
leader
● When it recovers, the follower can connect to the leader and request all the data changes that occurred during the time when it was disconnected.
Leader failure: Failover
● Determining that the leader has failed
● Choosing a new leader (replica with the most up-to-date data changes)
● Reconfiguring the system to use the new leader (clients now send write requests to the new leader), as sketched below
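A minimal sketch of the failover choice, assuming each replica reports the replication-log offset it has applied (the replica names and offsets below are made up):

# Illustrative failover sketch: promote the follower with the most up-to-date data.

replicas = [
    {"name": "follower-1", "applied_offset": 1042, "alive": True},
    {"name": "follower-2", "applied_offset": 1051, "alive": True},
    {"name": "follower-3", "applied_offset": 998,  "alive": False},  # unreachable
]

def elect_new_leader(replicas):
    candidates = [r for r in replicas if r["alive"]]
    # Choose the replica with the most up-to-date data changes.
    return max(candidates, key=lambda r: r["applied_offset"])

new_leader = elect_new_leader(replicas)
print("Promoting", new_leader["name"])  # reconfigure clients to send writes here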
Problems with Replication Lag
● Reading Your Own Writes
○ A user makes a write, followed by a read from a stale replica. To prevent this anomaly, we need
read-after-write consistency
● Monotonic Reads
○ A user first reads from a fresh replica, then from a stale replica. Time appears to go backward. To
prevent this anomaly, we need monotonic reads
○ One way of achieving monotonic reads is to make sure that each user always makes their reads from the same replica (different users can read from different replicas), as sketched below.
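A minimal sketch of that "same replica per user" rule, hashing a user ID onto a fixed replica (the replica names are illustrative; a real router must also cope with replica failures):

# Illustrative sketch: route each user's reads to the same replica (monotonic reads).
import hashlib

replicas = ["replica-a", "replica-b", "replica-c"]

def replica_for_user(user_id: str) -> str:
    # Deterministic hash so the same user always reads from the same replica,
    # while different users spread across the replicas.
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    return replicas[int(digest, 16) % len(replicas)]

print(replica_for_user("alice"))  # always the same replica for 'alice'
print(replica_for_user("bob"))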
Other Replication Topologies
Multi-Leader Replication Topologies
Leaderless Replication
Partitioning
● To spread the data and query load evenly across multiple machines, avoiding hot spots
(nodes with disproportionately high load).
● Combining replication and partitioning: each node acts as leader for some partitions and
follower for other partitions.
Partitioning of Key-Value Data
Partitioning by Key Range
● Assign a continuous range of keys (from some minimum to some maximum) to each partition
Partitioning by Hash of Key
● Once you have a suitable hash function for keys, you can assign each partition a range of
hashes (rather than a range of keys), and every key whose hash falls within a partition’s
range will be stored in that partition.
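A minimal sketch contrasting the two schemes; the range boundaries and partition count below are invented for illustration:

# Illustrative partitioning sketch: key-range vs. hash-of-key assignment.
import bisect
import hashlib

# Partitioning by key range: each partition owns a contiguous range of keys.
range_boundaries = ["g", "n", "t"]   # partition 0: < "g", 1: "g".."m", 2: "n".."s", 3: >= "t"

def range_partition(key: str) -> int:
    return bisect.bisect_right(range_boundaries, key)

# Partitioning by hash of key: each partition owns a range of hash values.
NUM_PARTITIONS = 4

def hash_partition(key: str) -> int:
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % NUM_PARTITIONS   # simple modulo over the hash space

for key in ["apple", "melon", "zebra"]:
    print(key, "-> range partition", range_partition(key), ", hash partition", hash_partition(key))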
Partitioning of Relational Data
Request Routing
At a high level, there are a few different approaches to this problem:
● Allow clients to contact any node (e.g., via a round-robin load balancer).
○ If that node coincidentally owns the partition to which the request applies, it can handle the
request directly; otherwise, it forwards the request to the appropriate node, receives the reply,
and passes the reply along to the client.
● Send all requests from clients to a routing tier first
○ This determines the node that should handle each request and forwards it accordingly.
○ This routing tier does not itself handle any requests; it only acts as a partition-aware load
balancer.
● Require that clients be aware of the partitioning and the assignment of partitions to
nodes.
○ In this case, a client can connect directly to the appropriate node, without any intermediary.
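A minimal sketch of the routing-tier approach, with a made-up partition map (in practice this mapping usually lives in a coordination service such as ZooKeeper):

# Illustrative routing-tier sketch: forward each request to the node that owns its partition.
import hashlib

NUM_PARTITIONS = 8
# Partition -> node assignment (hypothetical names).
partition_map = {p: f"node-{p % 3}" for p in range(NUM_PARTITIONS)}

def partition_for_key(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % NUM_PARTITIONS

def route(key: str) -> str:
    # The routing tier does not handle the request itself; it only forwards it
    # to the node that owns the key's partition.
    return partition_map[partition_for_key(key)]

print(route("user:42"))  # the node that owns this key's partition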
Transactions
● A transaction is a way for an application to group several reads and writes together into a
logical unit.
● All the reads and writes are executed as one operation: either the entire transaction succeeds (commit) or it fails (abort, rollback).
● If it fails, the application can safely retry.
● No partial failures.
● If a transaction involves activities at different sites, we call the activity at a given site a
subtransaction
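A minimal single-node sketch of the commit/abort/retry behaviour, using Python's built-in sqlite3 module (distributed subtransactions are handled by the commit protocols later in this deck):

# Illustrative transaction sketch: group writes into one unit that commits or rolls back.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()

def transfer(conn, src, dst, amount, retries=3):
    for attempt in range(retries):
        try:
            with conn:  # opens a transaction; commits on success, rolls back on exception
                conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
                conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            return True                    # committed: both writes applied, or...
        except sqlite3.OperationalError:   # ...aborted: neither write is visible
            continue                       # safe to retry, no partial failure
    return False

transfer(conn, "alice", "bob", 30)
print(conn.execute("SELECT * FROM accounts ORDER BY name").fetchall())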
Transactions: Lock Management
● Centralized:
○ A single site is in charge of handling lock and unlock requests for all objects.
● Primary copy:
○ One copy of each object is designated as the primary copy.
○ All requests to lock or unlock a copy of this object are handled by the lock manager at the site where the primary copy is stored, regardless of where the copy itself is stored.
● Fully distributed:
○ Requests to lock or unlock a copy of an object stored at a site are handled by the lock manager at the site where the copy is stored.
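A minimal sketch of the primary-copy scheme; the site and object names are made up, and the lock manager is simplified to exclusive locks only:

# Illustrative primary-copy locking sketch: all lock requests for an object go to
# the lock manager at the site holding that object's primary copy.

class LockManager:
    def __init__(self, site):
        self.site = site
        self.locks = {}   # object -> transaction currently holding the lock

    def lock(self, obj, txn):
        if self.locks.get(obj) in (None, txn):
            self.locks[obj] = txn
            return True
        return False      # already locked by another transaction

    def unlock(self, obj, txn):
        if self.locks.get(obj) == txn:
            del self.locks[obj]

# Primary-copy assignment: which site owns the primary copy of each object.
primary_site = {"account:alice": "site-A", "account:bob": "site-B"}
lock_managers = {"site-A": LockManager("site-A"), "site-B": LockManager("site-B")}

def request_lock(obj, txn):
    # Regardless of where a replica of obj is stored, the lock request goes to the primary site.
    return lock_managers[primary_site[obj]].lock(obj, txn)

print(request_lock("account:alice", "T1"))  # True
print(request_lock("account:alice", "T2"))  # False -- T1 holds the lock at site-A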
Transactions: Distributed Recovery
A commit protocol is followed to ensure that all subtransactions of a given transaction either
commit or abort uniformly.
Two of the well-known protocols are:
1. Two-Phase Commit (2PC)
2. Three-Phase Commit (3PC)
Two-Phase Commit (2PC)
The transaction manager at the site where the transaction originated is called the coordinator
for the transaction; transaction managers at sites where its subtransactions execute are called
subordinates.
1. The coordinator sends a prepare message to each subordinate.
2. When a subordinate receives a prepare message, it decides whether to abort or commit its subtransaction and replies with a yes or no vote.
3. If the coordinator receives yes votes from all subordinates, it sends a commit message to all subordinates; otherwise it sends an abort message to all subordinates.
4. Each subordinate acts on the message it receives (commits or aborts its subtransaction) and sends an acknowledgment to the coordinator.
5. After the coordinator has received ack messages from all subordinates, it writes an end log record for the transaction.
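A minimal in-process sketch of the 2PC message flow, with logging reduced to prints and no real networking (the class and site names are illustrative):

# Illustrative two-phase commit sketch: one coordinator, several subordinates.

class Subordinate:
    def __init__(self, name, will_commit=True):
        self.name = name
        self.will_commit = will_commit

    def prepare(self):
        # Phase 1: decide whether this subtransaction can commit, reply yes/no.
        return self.will_commit

    def commit(self):
        print(self.name, "commit")
        return "ack"

    def abort(self):
        print(self.name, "abort")
        return "ack"


def two_phase_commit(subordinates):
    # Phase 1: the coordinator sends prepare to every subordinate and collects votes.
    votes = [s.prepare() for s in subordinates]
    decision = "commit" if all(votes) else "abort"
    # Phase 2: the coordinator sends the decision; subordinates act and acknowledge.
    acks = [(s.commit() if decision == "commit" else s.abort()) for s in subordinates]
    if len(acks) == len(subordinates):
        print("coordinator: end log record written, decision =", decision)
    return decision


two_phase_commit([Subordinate("site-A"), Subordinate("site-B")])           # -> commit
two_phase_commit([Subordinate("site-A"), Subordinate("site-B", False)])    # -> abort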
Three-Phase Commit (3PC)
● When the coordinator sends out prepare messages and receives yes votes from all subordinates, it sends all sites a precommit message rather than a commit message.
● In 3PC the coordinator effectively postpones the decision to commit until it is sure that enough sites know about the decision to commit.
● If the coordinator subsequently fails, these sites can communicate with each other and detect that the transaction must be committed (or, conversely, aborted if none of them has received a precommit message) without waiting for the coordinator to recover.
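A minimal sketch of where 3PC differs: after unanimous yes votes the coordinator first records a precommit at every site, and only then commits (purely illustrative, with no failure handling shown):

# Illustrative three-phase commit sketch: a precommit round before the final commit.

class Site:
    def __init__(self, name, vote=True):
        self.name, self.vote, self.state = name, vote, "init"

def three_phase_commit(sites):
    # Phase 1: prepare / vote, exactly as in 2PC.
    if not all(s.vote for s in sites):
        for s in sites:
            s.state = "aborted"
        return "abort"
    # Phase 2: precommit -- the decision is recorded at the sites before anyone commits,
    # so if the coordinator now fails, the surviving sites can finish the commit themselves.
    for s in sites:
        s.state = "precommitted"
    # Phase 3: commit for real.
    for s in sites:
        s.state = "committed"
    return "commit"

print(three_phase_commit([Site("site-A"), Site("site-B")]))  # -> commit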
Troubles with Distributed Systems
1. Whenever you try to send a packet over the network, it may be lost or arbitrarily delayed.
Likewise, the reply may be lost or delayed, so if you don’t get a reply, you have no idea
whether the message got through.
2. A node’s clock may be significantly out of sync with other nodes (despite your best efforts
to set up NTP), it may suddenly jump forward or back in time, and relying on it is
dangerous because you most likely don’t have a good measure of your clock’s error
interval.
3. A process may pause for a substantial amount of time at any point in its execution (perhaps due to a stop-the-world garbage collector), be declared dead by other nodes, and then come back to life again without realizing that it was paused.
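A minimal sketch of the consequence of point 1: a timeout only tells you that the reply is missing, not whether the request was processed, so callers typically retry idempotent requests. The send_request function below is a stand-in for a real network call, not a real RPC library:

# Illustrative sketch: a timeout means "no reply", not "not processed" --
# so retries should be idempotent (safe to apply more than once).
import random

def send_request(payload):
    # Stand-in for a real network call: sometimes the reply is simply lost or delayed.
    if random.random() < 0.3:
        raise TimeoutError("no reply within the deadline")
    return {"status": "ok", "payload": payload}

def call_with_retries(payload, attempts=5):
    for attempt in range(1, attempts + 1):
        try:
            return send_request(payload)
        except TimeoutError:
            # The request may or may not have been processed; we only know the reply is missing.
            print(f"attempt {attempt}: timed out, retrying")
    raise RuntimeError("giving up: node unreachable or network down")

print(call_with_retries({"op": "set", "key": "user:1", "value": "Athira", "request_id": "abc-123"}))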