The CAP Theorem : Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services

Aleksandar Bradic, Vast.com

nosqlsummer | Belgrade
28 August 2010
CAP Theorem




  Conjecture by Eric Brewer at PODC 2000:
     It is impossible for a web service to provide the following three
      guarantees:
            Consistency
            Availability
            Partition-tolerance
CAP Theorem




      Consistency - all nodes should see the same data at the same
       time
      Availability - node failures do not prevent survivors from
       continuing to operate
      Partition-tolerance - the system continues to operate despite
       arbitrary message loss
CAP Theorem



  CAP Theorem:
    It is impossible for a web service to provide the following three
     guarantees:
             Consistency
             Availability
             Partition-tolerance
       A distributed system can satisfy any two of these
        guarantees at the same time but not all three
CAP Theorem




  CAP Theorem
      Conjecture since 2000
      Established as a theorem in 2002: Gilbert, Seth, and Nancy
       Lynch. Brewer’s conjecture and the feasibility of consistent,
       available, partition-tolerant web services. ACM SIGACT News,
       33(2), 2002, pp. 51-59.
CAP Theorem




  A distributed system can satisfy any two of CAP guarantees at the
  same time but not all three:
       Consistency + Availability
       Consistency + Partition Tolerance
       Availability + Partition Tolerance
Consistency + Availability



   Examples:
        Single-site databases
        Cluster databases
        LDAP
        xFS file system
   Traits:
        2-phase commit
        cache validation protocols
Consistency + Partition Tolerance



   Examples:
        Distributed databases
        Distributed Locking
        Majority protocols
   Traits:
        Pessimistic locking
        Make minority partitions unavailable
Availability + Partition Tolerance



   Examples:
        Coda
        Web caching
        DNS
   Traits:
        expiration/leases
        conflict resolution
        optimistic
Enterprise System CAP Classification



       RDBMS : CA (Master/Slave replication, Sharding)
       Amazon Dynamo : AP (Read-repair, application hooks)
       Terracotta : CA (Quorum vote, majority partition survival)
       Apache Cassandra : AP (Partitioning, Read-repair)
       Apache ZooKeeper : CP (Consensus protocol)
       Google BigTable : CA
       Apache CouchDB : AP
Some Techniques:




       Consistent Hashing
       Vector Clocks
       Sloppy Quorum
       Merkle trees
       Gossip-based protocols
       ...
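
As an illustration, here is a minimal sketch of the first technique in the list above, consistent hashing. The node names, key format, and choice of hash are illustrative assumptions, not from the slides:

```python
import hashlib
from bisect import bisect_right

def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Keys map to the first node clockwise from their hash, so
    adding or removing a node only remaps the keys in one arc."""

    def __init__(self, nodes):
        self._ring = sorted((_hash(n), n) for n in nodes)

    def node_for(self, key: str) -> str:
        hashes = [h for h, _ in self._ring]
        i = bisect_right(hashes, _hash(key)) % len(self._ring)
        return self._ring[i][1]

ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
print(ring.node_for("user:42"))  # deterministic owner for this key
```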
Proof of the CAP Theorem




  Gilbert, Seth, and Nancy Lynch. Brewer’s conjecture and the
  feasibility of consistent, available, partition-tolerant web services.
  ACM SIGACT News, 33(2), 2002, pp. 51-59.
Formal Model




  Formalization of the notions of Consistency, Availability and
  Partition Tolerance:
       Atomic Data Object
       Available Data Object
       Partition Tolerance
Atomic Data Objects




   Atomic/Linearizable Consistency:
        There must exist a total order on all operations such that each
         operation looks as if it were completed at a single instant
        This is equivalent to requiring requests on the distributed
         shared memory to act as if they were executing on a single node,
         responding to operations one at a time
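
To make the condition concrete, here is a toy sketch of the behaviour being demanded: a single-node register whose lock forces exactly such a total order on operations. The class and its names are illustrative assumptions:

```python
import threading

class AtomicRegister:
    """A lock forces a total order on operations, so each read/write
    appears to take effect at a single instant -- the behaviour the
    atomicity condition asks a distributed object to emulate."""

    def __init__(self, v0=None):
        self._value = v0
        self._lock = threading.Lock()

    def write(self, value):
        with self._lock:
            self._value = value

    def read(self):
        with self._lock:
            return self._value
```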
Available Data Objects


       For a distributed system to be continuously available, every
        request received by a non-failing node in the system must
        result in a response
       That is, any algorithm used by the service must eventually
        terminate
       (In some ways, this is a weak definition of availability: it puts
        no bounds on how long the algorithm may run before
        terminating, and therefore allows unbounded computation)
       (On the other hand, when qualified by the need for partition
        tolerance, this can be seen as a strong definition of availability:
        even when severe network failures occur, every request must
        terminate)
Partition Tolerance

        In order to model partition tolerance, the network is allowed to
         lose arbitrarily many messages sent from one node to another
        When a network is partitioned, all messages sent from nodes
         in one component of the partition to nodes in another component
         are lost.
        The atomicity requirement implies that every response will be
         atomic, even though arbitrary messages sent as part of the
         algorithm might not be delivered
        The availability requirement therefore implies that every node
         receiving a request from a client must respond, even though
         arbitrary messages that are sent may be lost
        Partition Tolerance: no set of failures less than total network
         failure is allowed to cause the system to respond incorrectly
Formal Framework




       Asynchronous Network Model
       Partially Synchronous Network Model
Asynchronous Networks




       There is no clock
       Nodes must make decisions based only on messages received
        and local computation
Asynchronous Networks : Impossibility Result




   Theorem 1: It is impossible in the asynchronous network model to
   implement a read/write data object that guarantees the following
   properties:
        Availability
        Atomic consistency
   in all fair executions (including those in which messages are lost)
Asynchronous Networks : Impossibility Result


   Proof (by contradiction):
        Assume an algorithm A exists that meets the three criteria:
         atomicity, availability and partition tolerance
        We construct an execution of A in which there exists a
         request that returns an inconsistent response
        Assume that the network consists of at least two nodes. Thus
         it can be divided into two disjoint, non-empty sets G1, G2
        Assume all messages between G1 and G2 are lost.
        If a write occurs in G1 and a read occurs in G2, then the read
         operation cannot return the results of the earlier write
         operation (simulated in the sketch below).
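
A toy simulation of this scenario, assuming the simplest possible replication (all names are illustrative): the write succeeds in G1, the replication message is dropped, and the read in G2 returns the stale initial value:

```python
V0 = "v0"

class Replica:
    """One component's copy of the shared object."""
    def __init__(self):
        self.value = V0

g1, g2 = Replica(), Replica()
partitioned = True               # all messages between G1 and G2 are lost

def write(replica, value, peer):
    replica.value = value        # completes locally (availability)
    if not partitioned:
        peer.value = value       # replication message, dropped here

write(g1, "v1", peer=g2)
print(g2.value)                  # "v0": the read in G2 misses the write,
                                 # contradicting atomic consistency
```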
Asynchronous Networks : Impossibility Result


   Formal proof:
        Let v0 be the initial value of the atomic object
        Let α1 be the prefix of an execution of A in which a single
         write of a value not equal to v0 occurs in G1, ending with the
         termination of the write operation.
        Assume that no other client requests occur in either G1 or G2.
        Assume that no messages from G1 are received in G2 and no
         messages from G2 are received in G1
        We know that the write operation will complete (by the
         availability requirement)
Asynchronous Networks : Impossibility Result



       Let α2 be the prefix of an execution in which a single read
        occurs in G2 and no other client requests occur, ending with
        the termination of the read operation
       During α2 no messages from G2 are received in G1 and no
        messages from G1 are received in G2
       We know that the read must return a value (by the
        availability requirement)
       The value returned by this execution must be v0 as no write
        operation has occurred in α2
Asynchronous Networks : Impossibility Result


       Let α be an execution beginning with α1 and continuing with
        α2. To the nodes in G2, α is indistinguishable from α2, as all
        the messages from G1 to G2 are lost (in both α1 and α2 that
        together make up α), and α1 does not include any client
        requests to nodes in G2.
       Therefore, in the execution α, the read request (from α2)
        must still return v0.
       However, the read request does not begin until after the write
        request (from α1) has completed.
       This therefore contradicts the atomicity property,
        proving that no such algorithm exists
Asynchronous Networks : Impossibility Result




   Corollary 1.1:
   It is impossible in the asynchronous network model to implement a
   read/write data object that guarantees the following properties:
        Availability - in all fair executions
        Atomic consistency - in fair executions in which no messages
         are lost
Asynchronous Networks : Impossibility Result


   Proof:
        The main idea is that in the asynchronous model, an
         algorithm has no way of determining whether a message has
         been lost or has been arbitrarily delayed in the transmission
         channel
        Therefore, if there existed an algorithm that guaranteed
         atomic consistency in executions in which no messages were
         lost, there would exist an algorithm that guaranteed atomic
         consistency in all executions.
        This would violate Theorem 1
Asynchronous Networks : Impossibility Result




       Assume that there exists an algorithm A that always
        terminates, and guarantees atomic consistency in fair
        executions in which all messages are delivered
       Theorem 1 implies that A does not guarantee atomic
        consistency in all fair executions, so there exists some fair
        execution α in which some response is not atomic
Asynchronous Networks : Impossibility Result



       At some finite point in execution α, the algorithm A returns
        a response that is not atomic.
       Let α′ be the prefix of α ending with the invalid response.
       Next, extend α′ to a fair execution α′′ in which all
        messages are delivered
       The execution α′′ is then a fair execution in which all
        messages are delivered
       However, this execution is not atomic.
       Therefore no such algorithm A exists
Solutions in the Asynchronous Model




   While it is impossible to provide all three properties (atomicity,
   availability and partition tolerance), any two of these properties can
   be achieved:
        Atomic, Partition Tolerant
        Atomic, Available
        Available, Partition Tolerant
Atomic, Partition Tolerant
        If availability is not required, it is easy to achieve atomic data
         and partition tolerance
        The trivial system that ignores all requests meets these
         requirements
        Stronger liveness criterion: if all the messages in an execution
         are delivered, the system is available and all operations terminate
        A simple centralized algorithm meets these requirements: a
         single designated node maintains the value of an object
        A node receiving a request forwards the request to the
         designated node, which sends a response. When the
         acknowledgement is received, the node sends a response to the
         client (see the sketch below)
        Many distributed databases provide this guarantee, especially
         algorithms based on distributed locking or quorums: if certain
         failure patterns occur, the liveness condition is weakened and
         the service no longer returns a response. If there are no
         failures, then liveness is guaranteed
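
A rough sketch of that centralized algorithm, with in-process queues standing in for the network (an assumption of this sketch, as are all names): because the forwarding node waits without a timeout, atomicity and partition tolerance are preserved at the cost of availability.

```python
import queue

to_center = queue.Queue()    # messages to the designated node
from_center = queue.Queue()  # responses back

def designated_node(store: dict):
    """Handles one forwarded request; the single copy of the value
    lives here, so responses are trivially atomic. A real node would
    loop over incoming messages."""
    op, key, val = to_center.get()
    if op == "write":
        store[key] = val
        from_center.put("ack")
    else:
        from_center.put(store.get(key))

def forwarding_node(op, key, val=None):
    """Forwards to the designated node and waits indefinitely:
    under a partition this call never returns (no availability)."""
    to_center.put((op, key, val))
    return from_center.get()   # blocks forever if messages are lost
```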
Atomic, Available




       If there are no partitions, it is possible to provide atomic,
        available data
       A centralized algorithm with a single designated node for
        maintaining the value of an object meets these requirements
Available, Partition Tolerant



        It is possible to provide high availability and partition
         tolerance if atomic consistency is not required
        If there are no consistency requirements, the service can
         trivially return v0, the initial value, in response to every
         request (see the sketch below)
        It is possible to provide weakened consistency in an available,
         partition-tolerant setting
        Web caches are one example of a weakly consistent network
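
The degenerate extreme from the second bullet, as a three-line sketch (names assumed): always answering the initial value needs no coordination, so the service is trivially available and partition tolerant, and also useless.

```python
V0 = None  # the initial value

def handle(request):
    # Responds to every request without contacting any other node,
    # so no partition can ever make this service unavailable.
    return V0
```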
Partially Synchronous Model
       In the real world, most networks are not purely asynchronous
       If we allow each node in the network to have a clock, it is
        possible to build a more powerful service
       In the partially synchronous model, every node has a clock and
        all clocks increase at the same rate
       However, the clocks themselves are not synchronized, in that
        they might display different values at the same real time
       In effect, the clocks act as timers: local state variables that
        the process can observe to measure how much time has passed
       A local timer can be used to schedule an action to occur a
        certain interval of time after some other event
       Furthermore, assume that every message is either delivered
        within a given, known time tmsg or it is lost
       Also, every node processes a received message within a given,
        known local processing time tlocal
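
A sketch of how these timers get used, assuming the bounds tmsg and tlocal above (the constants and the queue-based transport are stand-ins): after 2 ∗ tmsg + tlocal of silence, the sender can safely conclude the message was lost rather than merely slow.

```python
import queue

T_MSG, T_LOCAL = 0.05, 0.01  # assumed delivery and processing bounds (s)

def rpc(request, channel: queue.Queue, reply: queue.Queue):
    """Send a request and wait at most 2*T_MSG + T_LOCAL: one bound
    for the request, one for the reply, plus processing time. In the
    partially synchronous model, silence past this bound proves loss."""
    channel.put(request)
    try:
        return reply.get(timeout=2 * T_MSG + T_LOCAL)
    except queue.Empty:
        return None  # message provably lost
```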
Partially Synchronous Networks : Impossibility Result




   Theorem 2: It is impossible in the partially synchronous network
   model to implement a read/write data object that guarantees the
   following properties:
        Availability
        Atomic consistency
   in all executions (even those in which messages are lost)
Partially Synchronous Networks : Impossibility Result



   Proof:
        The same methodology as in the case of Theorem 1 is used
        We divide the network into two components {G1, G2} and
         construct an admissible execution in which a write happens in
         one component, followed by a read operation in the other
         component.
        This read operation can be shown to return inconsistent data
Partially Synchronous Networks : Impossibility Result
        We construct execution α1: a single write request and
         acknowledgement occurs in G1, and all messages between the
         two components {G1, G2} are lost
        Let α2′ be an execution that begins with a long interval of
         time during which no client requests occur
        This interval must be at least as long as the entire duration
         of α1
        Then append to α2′ the events of α2 in the following manner: a
         single read request and response in G2, assuming all messages
         between the two components are lost
        Finally, we construct α by superimposing the two executions
         α1 and α2′
        The long interval of time in α2′ ensures that the write request
         completes before the read request begins
        However, the read request returns the initial value, rather
         than the new value written by the write request, violating
         atomic consistency
Solutions in the Partially Synchronous Model
   Corollary 1.1 (asynchronous network model):
   It is impossible in the asynchronous network model to implement a
   read/write data object that guarantees the following properties:
        Availability - in all fair executions
        Atomic consistency - in fair executions in which no messages
         are lost
   In the partially synchronous model, the analogue of Corollary 1.1
   does not hold:
        The proof of this corollary depends on nodes being unaware of
         when a message is lost
        There are partially synchronous algorithms that will return
         atomic data when all messages in an execution are delivered
         (i.e. there are no partitions) - and will only return inconsistent
         data when messages are lost
Solutions in the Partially Synchronous Model

        An example of such an algorithm is the centralized protocol,
         with a single node storing the object state, modified to
         time out lost messages:
        On a read (or write) request, a message is sent to the central
         node
        If a response from the central node is received, then the node
         delivers the requested data (or an acknowledgement)
        If no response is received within 2 ∗ tmsg + tlocal, then the
         node concludes that the message was lost
        The client is then sent a response: either the best known
         value of the local node (for a read operation) or an
         acknowledgement (for a write operation). In this case, atomic
         consistency may be violated (see the sketch below).
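
The node side of this modified protocol, sketched on top of the rpc() helper above (the class, its fields, and the message format are illustrative assumptions): a timeout makes the node answer from its best known state instead of blocking.

```python
class TimeoutNode:
    """Answers from local state when the central node is unreachable;
    this keeps availability but may violate atomic consistency."""

    def __init__(self, rpc_to_center):
        self._rpc = rpc_to_center    # e.g. wraps rpc() from above
        self._best_known = None      # last value heard from the center

    def read(self):
        resp = self._rpc(("read",))
        if resp is not None:
            self._best_known = resp
            return resp
        return self._best_known      # timed out: possibly stale

    def write(self, value):
        self._rpc(("write", value))  # may be lost; ack the client anyway
        return "ack"
```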
Weaker Consistency Conditions
       While it is useful to guarantee that atomic data will be
        returned in executions in which all messages are delivered, it is
        equally important to specify what happens in executions in
        which some of the messages are lost
       We discuss a possible weaker consistency condition that allows
        stale data to be returned when there are partitions, yet places
        formal requirements on the quality of the stale data returned
       This consistency guarantee will require availability and atomic
        consistency in executions in which no messages are lost, and is
        therefore impossible to guarantee in the asynchronous model
        as a result of Corollary 1.1
       In the partially synchronous model it often makes sense to
        base guarantees on how long an algorithm has had to rectify a
        situation
       This consistency model ensures that if messages are delivered,
        then eventually some notion of atomicity is restored
Weaker Consistency Conditions



       In an atomic execution, we define a partial order on the read
        and write operations and then require that if one operation
        begins after another one ends, the former does not precede
        the latter in the partial order.
       We define a weaker guarantee, t-Connected Consistency,
        which defines a partial order in a similar manner, but only
        requires that one operation not precede another if there is an
        interval between the operations in which all messages are
        delivered
Weaker Consistency Conditions
   A timed execution α of a read-write object is t-Connected
   Consistent if two criteria hold. First, in executions in which no
   messages are lost, the execution is atomic. Second, in executions
   in which messages are lost, there exists a partial order P on the
   operations in α such that:
       1. P orders all write operations, and orders all read operations
        with respect to the write operations
       2. The value returned by every read operation is exactly the
        one written by the previous write operation in P, or the initial
        value if there is no such previous write in P
       3. The order in P is consistent with the order of read and
        write requests submitted at each node
       4. Assume that there exists an interval of time longer than t
        in which no messages are lost. Further, assume an operation
        θ completes before the interval begins, and another operation
        φ begins after the interval ends. Then φ does not precede θ
        in the partial order P
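
Criterion 4 can be stated symbolically as follows; the interval notation is ours, not the paper's:

```latex
% If some loss-free interval I longer than t separates \theta and \varphi,
% then \varphi must not precede \theta in P:
\exists\, I \;\bigl(|I| > t \ \wedge\ \text{no messages lost in } I \ \wedge\
  \mathrm{end}(\theta) < \mathrm{start}(I) \ \wedge\
  \mathrm{start}(\varphi) > \mathrm{end}(I)\bigr)
  \;\Longrightarrow\; \varphi \nprec_P \theta
```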
Weaker Consistency Conditions



   t-Connected Consistency
        This guarantee allows for some stale data when messages are
         lost, but provides a time limit on how long it takes for
         consistency to return, once the partition heals
        This definition can be generalized to provide consistency
         guarantees when only some of the nodes are connected and
         when connections are available only some of the time
Weaker Consistency Conditions


   A variant of the "centralized algorithm" is t-Connected Consistent.
   Assume node C is the centralized node. The algorithm behaves as
   follows:
        read at node A: A sends a request to C for the most
         recent value. If A receives a response from C within time
         2 ∗ tmsg + tlocal, it saves the value and returns it to the client.
         Otherwise, A concludes that a message was lost and it returns
         the value with the highest sequence number that has ever
         been received from C, or the initial value if no value has yet
         been received from C. (When a client read request occurs at
         C, it acts like any other node, sending messages to itself)
Weaker Consistency Conditions

       write at A: A sends a message to C with the new value. A
        waits 2 ∗ tmsg + tlocal, or until it receives an acknowledgement
        from C, and then sends an acknowledgement to the client. At
        this point, either C has learned of the new value, or a
        message was lost, or both events occurred. If A concludes that
        a message was lost, it periodically retransmits the value to C
        (along with all values lost during earlier write operations) until
        it receives an acknowledgement from C. (As in the case of
        read operations, when a client write request occurs at C it
        acts like any other node, sending messages to itself)
       new value is received at C: C serializes the write requests
        that it hears about by assigning them consecutive integer
        tags. Periodically, C broadcasts the latest value and sequence
        number to all other nodes (see the sketch below).
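
A sketch of C's side of this protocol (class names and transport are assumed): consecutive integer tags serialize the writes, and the periodic broadcast is what lets disconnected nodes converge once messages get through again.

```python
import itertools

class CentralNode:
    """Serializes writes with consecutive integer tags and periodically
    rebroadcasts the latest (tag, value) pair to all other nodes."""

    def __init__(self):
        self._seq = itertools.count(1)
        self.latest = (0, None)            # (sequence number, value)

    def on_write(self, value):
        self.latest = (next(self._seq), value)
        return "ack"                       # the ack itself may be lost

    def periodic_broadcast(self, nodes):
        for node in nodes:
            node.deliver(self.latest)      # dropped under a partition

class PeerNode:
    def __init__(self):
        self.best = (0, None)              # highest-tagged value seen

    def deliver(self, tagged):
        if tagged[0] > self.best[0]:
            self.best = tagged
```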
Weaker Consistency Conditions
   Theorem 4: The modified centralized algorithm is t-Connected
   Consistent.
        Proof: In executions in which no messages are lost, the
         operations are atomic. An execution is atomic if every
         operation acts as if it is executed at a single instant. In this
         case, that single instant occurs when C processes the
         operation. C serializes the operations, ensuring atomic
         consistency in executions in which all messages are delivered.
        We then examine executions in which messages are lost. The
         partial order P is constructed as follows. Write operations are
         ordered by the sequence number assigned by the central
         node. Each read operation is sequenced after the write
         operation whose value it returns. It is clear by construction
         that the partial order P satisfies criteria 1 and 2 of the
         definition of t-Connected Consistency. As the algorithm
         handles requests in the order received, criterion 3 is also
         clearly true.
Weaker Consistency Conditions




       In showing that the partial order respects criterion 4, there are
        four cases: write followed by read, write followed by write,
        read followed by read, and read followed by write. Let time t
        be long enough for a write operation to complete (and for C
        to assign a sequence number to the new value), and for one
        of the periodic broadcasts from C to occur.
Weaker Consistency Conditions


   1. Write followed by read:
        Assume a write occurs at Aw, after which an interval of time
         longer than t passes in which all messages are delivered. After
         this, a read is requested at some node. By the end of the
         interval, two things have happened. First, Aw has notified the
         central node of the new value, and the write operation has
         been assigned a sequence number. Second, the central node
         has rebroadcast that value (or a later value in the partial
         order) to all other nodes during one of the periodic broadcasts.
         As a result, the read operation does not return an earlier
         value, and therefore it must come after the write in the
         partial order P.
Weaker Consistency Conditions


   2. Write followed by write:
        Assume a write occurs at Aw, after which an interval of time
         longer than t passes in which all messages are delivered. After
         this, a write is requested at some node. As in the previous
         case, by the end of the interval in which messages are
         delivered, the central node has assigned a sequence number
         to the write operation at Aw. As a result, the later write
         operation is sequenced by the central node after the first write
         operation. Therefore the second write comes after the first
         write in the partial order P.
Weaker Consistency Conditions


   3. Read followed by read:
        Assume a read operation occurs at Br, after which an interval
         of time longer than t passes in which all messages are
         delivered. After this, a read is requested at some node. Let φ
         be the write operation whose value the first read
         operation at Br returns. By the end of the interval in which
         messages are delivered, the central node has assigned a
         sequence number to φ and has broadcast the value of φ (or a
         later value in the partial order) to all other nodes. As a result,
         the second read operation does not return a value earlier in
         the partial order than φ. Therefore the second read operation
         does not precede the first in the partial order P.
Weaker Consistency Conditions


   4. Read followed by write:
        Assume a read operation occurs at Br, after which an interval
         of time longer than t passes in which all messages are
         delivered. After this, a write is requested at some node. Let φ
         be the write operation whose value the first read operation at
         Br returns. By the end of the interval in which messages are
         delivered, the central node has assigned a sequence number to
         φ, and as a result all write operations beginning after the
         interval are serialized after φ. Therefore the write operation
         does not precede the read operation in the partial order P.
Weaker Consistency Conditions




       Therefore P satisfies criterion 4 of the definition, and this
        algorithm is t-Connected Consistent.
Conclusion


       We have shown that it is impossible to reliably provide atomic,
        consistent data when there are partitions in the network
       It is feasible, however, to achieve any two of the three
        properties: consistency, availability and partition tolerance
       In an asynchronous model, when no clocks are available, the
        impossibility result is fairly strong: it is impossible to provide
        consistent data, even allowing stale data to be returned when
        messages are lost
       However, in partially synchronous models it is possible to
        achieve a practical compromise between consistency and
        availability
