The CAP Theorem : Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services

Aleksandar Bradic, Vast.com

nosqlsummer | Belgrade
28 August 2010
CAP Theorem




  Conjecture by Eric Brewer at PODC 2000:
     It is impossible for a web service to provide the following three
      guarantees:
            Consistency
            Availability
            Partition-tolerance
CAP Theorem




      Consistency - all nodes should see the same data at the same
       time
      Availability - node failures do not prevent survivors from
       continuing to operate
      Partition-tolerance - the system continues to operate despite
       arbitrary message loss
CAP Theorem



  CAP Theorem:
    It is impossible for a web service to provide the following three
     guarantees:
             Consistency
             Availability
             Partition-tolerance
       A distributed system can satisfy any two of these
        guarantees at the same time but not all three
CAP Theorem




  CAP Theorem
      Conjecture since 2000
      Established as a theorem in 2002: Gilbert, Seth, and Nancy
       Lynch. Brewer’s conjecture and the feasibility of consistent,
       available, partition-tolerant web services. ACM SIGACT News,
       33(2), 2002, pp. 51-59.
CAP Theorem




  A distributed system can satisfy any two of CAP guarantees at the
  same time but not all three:
       Consistency + Availability
       Consistency + Partition Tolerance
       Availability + Partition Tolerance
Consistency + Availability



   Examples:
        Single-site databases
        Cluster databases
        LDAP
        xFS file system
   Traits:
        2-phase commit
        cache validation protocols
Consistency + Partition Tolerance



   Examples:
        Distributed databases
        Distributed Locking
        Majority protocols
   Traits:
        Pessimistic locking
        Make minority partitions unavailable
Availability + Partition Tolerance



   Examples:
        Coda
        Web caching
        DNS
   Traits:
        expiration/leases
        conflict resolution
        optimistic
Enterprise System CAP Classification



       RDBMS : CA (Master/Slave replication, Sharding)
       Amazon Dynamo : AP (Read-repair, application hooks)
       Terracotta : CA (Quorum vote, majority partition survival)
       Apache Cassandra : AP (Partitioning, Read-repair)
       Apache ZooKeeper : CP (Consensus protocol)
       Google BigTable : CA
       Apache CouchDB : AP
Some Techniques:




       Consistent Hashing
       Vector Clocks
       Sloppy Quorum
       Merkle trees
       Gossip-based protocols
       ...
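
As an illustration, here is a minimal sketch of the first technique in the list above, consistent hashing. The node names, key format, and choice of hash are illustrative assumptions, not from the slides:

```python
import hashlib
from bisect import bisect_right

def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Keys map to the first node clockwise from their hash, so
    adding or removing a node only remaps the keys in one arc."""

    def __init__(self, nodes):
        self._ring = sorted((_hash(n), n) for n in nodes)

    def node_for(self, key: str) -> str:
        hashes = [h for h, _ in self._ring]
        i = bisect_right(hashes, _hash(key)) % len(self._ring)
        return self._ring[i][1]

ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
print(ring.node_for("user:42"))  # deterministic owner for this key
```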
Proof of the CAP Theorem




  Gilbert, Seth, and Nancy Lynch. Brewer’s conjecture and the
  feasibility of consistent, available, partition-tolerant web services.
  ACM SIGACT News, 33(2), 2002, pp. 51-59.
Formal Model




  Formalization of the notions of Consistency, Availability and
  Partition Tolerance:
       Atomic Data Object
       Available Data Object
       Partition Tolerance
Atomic Data Objects




   Atomic/Linearizable Consistency:
        There must exist a total order on all operations such that each
         operation looks as if it were completed at a single instant
        This is equivalent to requiring requests on the distributed
         shared memory to act as if they were executing on a single node,
         responding to operations one at a time
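
To make the condition concrete, here is a toy sketch of the behaviour being demanded: a single-node register whose lock forces exactly such a total order on operations. The class and its names are illustrative assumptions:

```python
import threading

class AtomicRegister:
    """A lock forces a total order on operations, so each read/write
    appears to take effect at a single instant -- the behaviour the
    atomicity condition asks a distributed object to emulate."""

    def __init__(self, v0=None):
        self._value = v0
        self._lock = threading.Lock()

    def write(self, value):
        with self._lock:
            self._value = value

    def read(self):
        with self._lock:
            return self._value
```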
Available Data Objects


       For a distributed system to be continuously available, every
        request received by a non-failing node in the system must
        result in a response
       That is, any algorithm used by the service must eventually
        terminate
       (In some ways, this is a weak definition of availability: it puts
        no bounds on how long the algorithm may run before
        terminating, and therefore allows unbounded computation)
       (On the other hand, when qualified by the need for partition
        tolerance, this can be seen as a strong definition of availability:
        even when severe network failures occur, every request must
        terminate)
Partition Tolerance

        In order to model partition tolerance, the network is allowed to
         lose arbitrarily many messages sent from one node to another
        When a network is partitioned, all messages sent from nodes
         in one component of the partition to nodes in another component
         are lost.
        The atomicity requirement implies that every response will be
         atomic, even though arbitrary messages sent as part of the
         algorithm might not be delivered
        The availability requirement therefore implies that every node
         receiving a request from a client must respond, even though
         arbitrary messages that are sent may be lost
        Partition Tolerance: no set of failures less than total network
         failure is allowed to cause the system to respond incorrectly
Formal Framework




       Asynchronous Network Model
       Partially Synchronous Network Model
Asynchronous Networks




       There is no clock
       Nodes must make decisions based only on messages received
        and local computation
Asynchronous Networks : Impossibility Result




   Theorem 1: It is impossible in the asynchronous network model to
   implement a read/write data object that guarantees the following
   properties:
        Availability
        Atomic consistency
   in all fair executions (including those in which messages are lost)
Asynchronous Networks : Impossibility Result


   Proof (by contradiction):
        Assume an algorithm A exists that meets the three criteria:
         atomicity, availability and partition tolerance
        We construct an execution of A in which there exists a
         request that returns an inconsistent response
        Assume that the network consists of at least two nodes. Thus
         it can be divided into two disjoint, non-empty sets G1, G2
        Assume all messages between G1 and G2 are lost.
        If a write occurs in G1 and a read occurs in G2, then the read
         operation cannot return the results of the earlier write
         operation (simulated in the sketch below).
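
A toy simulation of this scenario, assuming the simplest possible replication (all names are illustrative): the write succeeds in G1, the replication message is dropped, and the read in G2 returns the stale initial value:

```python
V0 = "v0"

class Replica:
    """One component's copy of the shared object."""
    def __init__(self):
        self.value = V0

g1, g2 = Replica(), Replica()
partitioned = True               # all messages between G1 and G2 are lost

def write(replica, value, peer):
    replica.value = value        # completes locally (availability)
    if not partitioned:
        peer.value = value       # replication message, dropped here

write(g1, "v1", peer=g2)
print(g2.value)                  # "v0": the read in G2 misses the write,
                                 # contradicting atomic consistency
```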
Asynchronous Networks : Impossibility Result


   Formal proof:
        Let v0 be the initial value of the atomic object
        Let α1 be the prefix of an execution of A in which a single
         write of a value not equal to v0 occurs in G1, ending with the
         termination of the write operation.
        Assume that no other client requests occur in either G1 or G2.
        Assume that no messages from G1 are received in G2 and no
         messages from G2 are received in G1
        We know that the write operation will complete (by the
         availability requirement)
Asynchronous Networks : Impossibility Result



       Let α2 be the prefix of an execution in which a single read
        occurs in G2 and no other client requests occur, ending with
        the termination of the read operation
       During α2 no messages from G2 are received in G1 and no
        messages from G1 are received in G2
       We know that the read must return a value (by the
        availability requirement)
       The value returned by this execution must be v0 as no write
        operation has occurred in α2
Asynchronous Networks : Impossibility Result


       Let α be an execution beginning with α1 and continuing with
        α2. To the nodes in G2, α is indistinguishable from α2, as all
        the messages from G1 to G2 are lost (in both α1 and α2 that
        together make up α), and α1 does not include any client
        requests to nodes in G2.
       Therefore, in the execution α, the read request (from α2)
        must still return v0.
       However, the read request does not begin until after the write
        request (from α1) has completed.
       This therefore contradicts the atomicity property,
        proving that no such algorithm exists
Asynchronous Networks : Impossibility Result




   Corollary 1.1:
   It is impossible in the asynchronous network model to implement a
   read/write data object that guarantees the following properties:
        Availability - in all fair executions
        Atomic consistency - in fair executions in which no messages
         are lost
Asynchronous Networks : Impossibility Result


   Proof:
        The main idea is that in the asynchronous model, an
         algorithm has no way of determining whether a message has
         been lost or has been arbitrarily delayed in the transmission
         channel
        Therefore, if there existed an algorithm that guaranteed
         atomic consistency in executions in which no messages were
         lost, there would exist an algorithm that guaranteed atomic
         consistency in all executions.
        This would violate Theorem 1
Asynchronous Networks : Impossibility Result




       Assume that there exists an algorithm A that always
        terminates, and guarantees atomic consistency in fair
        executions in which all messages are delivered
       Theorem 1 implies that A does not guarantee atomic
        consistency in all fair executions, so there exists some fair
        execution α in which some response is not atomic
Asynchronous Networks : Impossibility Result



       At some finite point in execution α, the algorithm A returns
        a response that is not atomic.
       Let α′ be the prefix of α ending with the invalid response.
       Next, extend α′ to a fair execution α′′ in which all
        messages are delivered
       The execution α′′ is then a fair execution in which all
        messages are delivered
       However, this execution is not atomic.
       Therefore no such algorithm A exists
Solutions in the Asynchronous Model




   While it is impossible to provide all three properties (atomicity,
   availability and partition tolerance), any two of these properties can
   be achieved:
        Atomic, Partition Tolerant
        Atomic, Available
        Available, Partition Tolerant
Atomic, Partition Tolerant
        If availability is not required, it is easy to achieve atomic data
         and partition tolerance
        The trivial system that ignores all requests meets these
         requirements
        Stronger liveness criterion: if all the messages in an execution
         are delivered, the system is available and all operations terminate
        A simple centralized algorithm meets these requirements: a
         single designated node maintains the value of an object
        A node receiving a request forwards the request to the
         designated node, which sends a response. When the
         acknowledgement is received, the node sends a response to the
         client (see the sketch below)
        Many distributed databases provide this guarantee, especially
         algorithms based on distributed locking or quorums: if certain
         failure patterns occur, the liveness condition is weakened and
         the service no longer returns a response. If there are no
         failures, then liveness is guaranteed
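
A rough sketch of that centralized algorithm, with in-process queues standing in for the network (an assumption of this sketch, as are all names): because the forwarding node waits without a timeout, atomicity and partition tolerance are preserved at the cost of availability.

```python
import queue

to_center = queue.Queue()    # messages to the designated node
from_center = queue.Queue()  # responses back

def designated_node(store: dict):
    """Handles one forwarded request; the single copy of the value
    lives here, so responses are trivially atomic. A real node would
    loop over incoming messages."""
    op, key, val = to_center.get()
    if op == "write":
        store[key] = val
        from_center.put("ack")
    else:
        from_center.put(store.get(key))

def forwarding_node(op, key, val=None):
    """Forwards to the designated node and waits indefinitely:
    under a partition this call never returns (no availability)."""
    to_center.put((op, key, val))
    return from_center.get()   # blocks forever if messages are lost
```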
Atomic, Available




       If there are no partitions, it is possible to provide atomic,
        available data
       A centralized algorithm with a single designated node for
        maintaining the value of an object meets these requirements
Available, Partition Tolerant



        It is possible to provide high availability and partition
         tolerance if atomic consistency is not required
        If there are no consistency requirements, the service can
         trivially return v0, the initial value, in response to every
         request (see the sketch below)
        It is possible to provide weakened consistency in an available,
         partition-tolerant setting
        Web caches are one example of a weakly consistent network
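
The degenerate extreme from the second bullet, as a three-line sketch (names assumed): always answering the initial value needs no coordination, so the service is trivially available and partition tolerant, and also useless.

```python
V0 = None  # the initial value

def handle(request):
    # Responds to every request without contacting any other node,
    # so no partition can ever make this service unavailable.
    return V0
```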
Partially Synchronous Model
       In the real world, most networks are not purely asynchronous
       If we allow each node in the network to have a clock, it is
        possible to build a more powerful service
       In the partially synchronous model, every node has a clock and
        all clocks increase at the same rate
       However, the clocks themselves are not synchronized, in that
        they might display different values at the same real time
       In effect, the clocks act as timers: local state variables that
        the process can observe to measure how much time has passed
       A local timer can be used to schedule an action to occur a
        certain interval of time after some other event
       Furthermore, assume that every message is either delivered
        within a given, known time tmsg or it is lost
       Also, every node processes a received message within a given,
        known local processing time tlocal
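
A sketch of how these timers get used, assuming the bounds tmsg and tlocal above (the constants and the queue-based transport are stand-ins): after 2 ∗ tmsg + tlocal of silence, the sender can safely conclude the message was lost rather than merely slow.

```python
import queue

T_MSG, T_LOCAL = 0.05, 0.01  # assumed delivery and processing bounds (s)

def rpc(request, channel: queue.Queue, reply: queue.Queue):
    """Send a request and wait at most 2*T_MSG + T_LOCAL: one bound
    for the request, one for the reply, plus processing time. In the
    partially synchronous model, silence past this bound proves loss."""
    channel.put(request)
    try:
        return reply.get(timeout=2 * T_MSG + T_LOCAL)
    except queue.Empty:
        return None  # message provably lost
```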
Partially Synchronous Networks : Impossibility Result




   Theorem 2: It is impossible in the partially synchronous network
   model to implement a read/write data object that guarantees the
   following properties:
        Availability
        Atomic consistency
   in all executions (even those in which messages are lost)
Partially Synchronous Networks : Impossibility Result



   Proof:
        The same methodology as in the case of Theorem 1 is used
        We divide the network into two components {G1, G2} and
         construct an admissible execution in which a write happens in
         one component, followed by a read operation in the other
         component.
        This read operation can be shown to return inconsistent data
Partially Synchronous Networks : Impossibility Result
        We construct execution α1: a single write request and
         acknowledgement occurs in G1, and all messages between the
         two components {G1, G2} are lost
        Let α2′ be an execution that begins with a long interval of
         time during which no client requests occur
        This interval must be at least as long as the entire duration
         of α1
        Then append to α2′ the events of α2 in the following manner: a
         single read request and response in G2, assuming all messages
         between the two components are lost
        Finally, we construct α by superimposing the two executions
         α1 and α2′
        The long interval of time in α2′ ensures that the write request
         completes before the read request begins
        However, the read request returns the initial value, rather
         than the new value written by the write request, violating
         atomic consistency
Solutions in the Partially Synchronous Model
   Corollary 1.1 (asynchronous network model):
   It is impossible in the asynchronous network model to implement a
   read/write data object that guarantees the following properties:
        Availability - in all fair executions
        Atomic consistency - in fair executions in which no messages
         are lost
   In the partially synchronous model, the analogue of Corollary 1.1
   does not hold:
        The proof of this corollary depends on nodes being unaware of
         when a message is lost
        There are partially synchronous algorithms that will return
         atomic data when all messages in an execution are delivered
         (i.e. there are no partitions) - and will only return inconsistent
         data when messages are lost
Solutions in the Partially Synchronous Model

        An example of such an algorithm is the centralized protocol,
         with a single node storing the object state, modified to
         time out lost messages:
        On a read (or write) request, a message is sent to the central
         node
        If a response from the central node is received, then the node
         delivers the requested data (or an acknowledgement)
        If no response is received within 2 ∗ tmsg + tlocal, then the
         node concludes that the message was lost
        The client is then sent a response: either the best known
         value of the local node (for a read operation) or an
         acknowledgement (for a write operation). In this case, atomic
         consistency may be violated (see the sketch below).
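
The node side of this modified protocol, sketched on top of the rpc() helper above (the class, its fields, and the message format are illustrative assumptions): a timeout makes the node answer from its best known state instead of blocking.

```python
class TimeoutNode:
    """Answers from local state when the central node is unreachable;
    this keeps availability but may violate atomic consistency."""

    def __init__(self, rpc_to_center):
        self._rpc = rpc_to_center    # e.g. wraps rpc() from above
        self._best_known = None      # last value heard from the center

    def read(self):
        resp = self._rpc(("read",))
        if resp is not None:
            self._best_known = resp
            return resp
        return self._best_known      # timed out: possibly stale

    def write(self, value):
        self._rpc(("write", value))  # may be lost; ack the client anyway
        return "ack"
```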
Weaker Consistency Conditions
       While it is useful to guarantee that atomic data will be
        returned in executions in which all messages are delivered, it is
        equally important to specify what happens in executions in
        which some of the messages are lost
       We discuss a possible weaker consistency condition that allows
        stale data to be returned when there are partitions, yet places
        formal requirements on the quality of the stale data returned
       This consistency guarantee will require availability and atomic
        consistency in executions in which no messages are lost, and is
        therefore impossible to guarantee in the asynchronous model
        as a result of Corollary 1.1
       In the partially synchronous model it often makes sense to
        base guarantees on how long an algorithm has had to rectify a
        situation
       This consistency model ensures that if messages are delivered,
        then eventually some notion of atomicity is restored
Weaker Consistency Conditions



       In an atomic execution, we define a partial order on the read
        and write operations and then require that if one operation
        begins after another one ends, the former does not precede
        the latter in the partial order.
       We define a weaker guarantee, t-Connected Consistency,
        which defines a partial order in a similar manner, but only
        requires that one operation not precede another if there is an
        interval between the operations in which all messages are
        delivered
Weaker Consistency Conditions
   A timed execution α of a read-write object is t-Connected
   Consistent if two criteria hold. First, in executions in which no
   messages are lost, the execution is atomic. Second, in executions
   in which messages are lost, there exists a partial order P on the
   operations in α such that:
       1. P orders all write operations, and orders all read operations
        with respect to the write operations
       2. The value returned by every read operation is exactly the
        one written by the previous write operation in P, or the initial
        value if there is no such previous write in P
       3. The order in P is consistent with the order of read and
        write requests submitted at each node
       4. Assume that there exists an interval of time longer than t
        in which no messages are lost. Further, assume an operation
        θ completes before the interval begins, and another operation
        φ begins after the interval ends. Then φ does not precede θ
        in the partial order P
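
Criterion 4 can be stated symbolically as follows; the interval notation is ours, not the paper's:

```latex
% If some loss-free interval I longer than t separates \theta and \varphi,
% then \varphi must not precede \theta in P:
\exists\, I \;\bigl(|I| > t \ \wedge\ \text{no messages lost in } I \ \wedge\
  \mathrm{end}(\theta) < \mathrm{start}(I) \ \wedge\
  \mathrm{start}(\varphi) > \mathrm{end}(I)\bigr)
  \;\Longrightarrow\; \varphi \nprec_P \theta
```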
Weaker Consistency Conditions



   t-Connected Consistency
        This guarantee allows for some stale data when messages are
         lost, but provides a time limit on how long it takes for
         consistency to return, once the partition heals
        This definition can be generalized to provide consistency
         guarantees when only some of the nodes are connected and
         when connections are available only some of the time
Weaker Consistency Conditions


   A variant of the "centralized algorithm" is t-Connected Consistent.
   Assume node C is the centralized node. The algorithm behaves as
   follows:
        read at node A: A sends a request to C for the most
         recent value. If A receives a response from C within time
         2 ∗ tmsg + tlocal, it saves the value and returns it to the client.
         Otherwise, A concludes that a message was lost and it returns
         the value with the highest sequence number that has ever
         been received from C, or the initial value if no value has yet
         been received from C. (When a client read request occurs at
         C, it acts like any other node, sending messages to itself)
Weaker Consistency Conditions

       write at A: A sends a message to C with the new value. A
        waits 2 ∗ tmsg + tlocal, or until it receives an acknowledgement
        from C, and then sends an acknowledgement to the client. At
        this point, either C has learned of the new value, or a
        message was lost, or both events occurred. If A concludes that
        a message was lost, it periodically retransmits the value to C
        (along with all values lost during earlier write operations) until
        it receives an acknowledgement from C. (As in the case of
        read operations, when a client write request occurs at C it
        acts like any other node, sending messages to itself)
       new value is received at C: C serializes the write requests
        that it hears about by assigning them consecutive integer
        tags. Periodically, C broadcasts the latest value and sequence
        number to all other nodes (see the sketch below).
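
A sketch of C's side of this protocol (class names and transport are assumed): consecutive integer tags serialize the writes, and the periodic broadcast is what lets disconnected nodes converge once messages get through again.

```python
import itertools

class CentralNode:
    """Serializes writes with consecutive integer tags and periodically
    rebroadcasts the latest (tag, value) pair to all other nodes."""

    def __init__(self):
        self._seq = itertools.count(1)
        self.latest = (0, None)            # (sequence number, value)

    def on_write(self, value):
        self.latest = (next(self._seq), value)
        return "ack"                       # the ack itself may be lost

    def periodic_broadcast(self, nodes):
        for node in nodes:
            node.deliver(self.latest)      # dropped under a partition

class PeerNode:
    def __init__(self):
        self.best = (0, None)              # highest-tagged value seen

    def deliver(self, tagged):
        if tagged[0] > self.best[0]:
            self.best = tagged
```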
Weaker Consistency Conditions
   Theorem 4: The modified centralized algorithm is t-Connected
   Consistent.
        Proof: In executions in which no messages are lost, the
         operations are atomic. An execution is atomic if every
         operation acts as if it is executed at a single instant. In this
         case, that single instant occurs when C processes the
         operation. C serializes the operations, ensuring atomic
         consistency in executions in which all messages are delivered.
        We then examine executions in which messages are lost. The
         partial order P is constructed as follows. Write operations are
         ordered by the sequence number assigned by the central
         node. Each read operation is sequenced after the write
         operation whose value it returns. It is clear by construction
         that the partial order P satisfies criteria 1 and 2 of the
         definition of t-Connected Consistency. As the algorithm
         handles requests in the order received, criterion 3 is also
         clearly true.
Weaker Consistency Conditions




       In showing that the partial order respects criterion 4, there are
        four cases: write followed by read, write followed by write,
        read followed by read, and read followed by write. Let time t
        be long enough for a write operation to complete (and for C
        to assign a sequence number to the new value), and for one
        of the periodic broadcasts from C to occur.
Weaker Consistency Conditions


   1. Write followed by read:
        Assume a write occurs at Aw, after which an interval of time
         longer than t passes in which all messages are delivered. After
         this, a read is requested at some node. By the end of the
         interval, two things have happened. First, Aw has notified the
         central node of the new value, and the write operation has
         been assigned a sequence number. Second, the central node
         has rebroadcast that value (or a later value in the partial
         order) to all other nodes during one of the periodic broadcasts.
         As a result, the read operation does not return an earlier
         value, and therefore it must come after the write in the
         partial order P.
Weaker Consistency Conditions


   2. Write followed by write:
        Assume a write occurs at Aw, after which an interval of time
         longer than t passes in which all messages are delivered. After
         this, a write is requested at some node. As in the previous
         case, by the end of the interval in which messages are
         delivered, the central node has assigned a sequence number
         to the write operation at Aw. As a result, the later write
         operation is sequenced by the central node after the first write
         operation. Therefore the second write comes after the first
         write in the partial order P.
Weaker Consistency Conditions


   3. Read followed by read:
        Assume a read operation occurs at Br, after which an interval
         of time longer than t passes in which all messages are
         delivered. After this, a read is requested at some node. Let φ
         be the write operation whose value the first read
         operation at Br returns. By the end of the interval in which
         messages are delivered, the central node has assigned a
         sequence number to φ and has broadcast the value of φ (or a
         later value in the partial order) to all other nodes. As a result,
         the second read operation does not return a value earlier in
         the partial order than φ. Therefore the second read operation
         does not precede the first in the partial order P.
Weaker Consistency Conditions


   4. Read followed by write:
        Assume a read operation occurs at Br, after which an interval
         of time longer than t passes in which all messages are
         delivered. After this, a write is requested at some node. Let φ
         be the write operation whose value the first read operation at
         Br returns. By the end of the interval in which messages are
         delivered, the central node has assigned a sequence number to
         φ, and as a result all write operations beginning after the
         interval are serialized after φ. Therefore the write operation
         does not precede the read operation in the partial order P.
Weaker Consistency Conditions




       Therefore P satisfies criterion 4 of the definition, and this
        algorithm is t-Connected Consistent.
Conclusion


       We have shown that it is impossible to reliably provide atomic,
        consistent data when there are partitions in the network
       It is feasible, however, to achieve any two of the three
        properties: consistency, availability and partition tolerance
       In an asynchronous model, when no clocks are available, the
        impossibility result is fairly strong: it is impossible to provide
        consistent data, even allowing stale data to be returned when
        messages are lost
       However, in partially synchronous models it is possible to
        achieve a practical compromise between consistency and
        availability
