Cassandra
Replication & Consistency

  Benjamin Black, b@b3k.us
        2010-04-28
Dynamo
  Cluster management, replication, fault tolerance
BigTable
  Sparse, columnar data model, storage architecture
Cassandra
Dynamo-like Features
Symmetric, P2P architecture
  No special nodes/SPOFs
Gossip-based cluster management
Distributed hash table for data placement
  Pluggable partitioning
  Pluggable topology discovery
  Pluggable placement strategies
Tunable, eventual consistency
BigTable-like Features
Sparse, “columnar” data model
  Optional, 2-level maps called Super Column Families
SSTable disk storage
  Append-only commit log
  Memtable (buffer and sort)
  Immutable SSTable files
Hadoop integration
Topic(s) for Today
Replication & Consistency
[1]
Replication
How many copies of each piece of data do we want in the system?

N=3
Consistency Level
How many replicas must respond to declare success?

W=2, R=2
CL.Options
Class    WRITE level  Description                   READ level  Description
WEAK     ZERO         Cross fingers                 -           -
WEAK     ANY          1st Response (including HH)   -           -
WEAK     ONE          1st Response                  ONE         1st Response
STRONG   QUORUM       N/2 + 1 replicas              QUORUM      N/2 + 1 replicas
STRONG   ALL          All replicas                  ALL         All replicas
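
As a rough illustration of the table, the sketch below maps each level to the number of replica responses it waits for. The function name replicas_required is hypothetical, not Cassandra's API. Treating ZERO as zero responses and ANY as one response (which may be a hint) is this sketch's reading of the slide.

```python
def replicas_required(level: str, n: int) -> int:
    """Responses to wait for at a consistency level, given replication factor n.

    Hypothetical helper for illustration; n is the replication factor (N),
    not the number of nodes in the cluster.
    """
    if level == "ZERO":
        return 0           # fire and forget ("cross fingers"), writes only
    if level in ("ANY", "ONE"):
        return 1           # first response; for ANY a stored hint also counts
    if level == "QUORUM":
        return n // 2 + 1  # integer division, so N=3 gives 2 and N=4 gives 3
    if level == "ALL":
        return n
    raise ValueError(f"unknown consistency level: {level}")
```

With N=3, QUORUM waits for 2 replicas no matter how many nodes the cluster has.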
A Side Note on CL
Consistency Level is based on Replication Factor (N), not on the number of nodes in the system.
A Question of Time
[Figure: a row holding five columns; each column has a value and a timestamp]

All columns have a value and a timestamp
Timestamps provided by clients
   usec resolution by convention
Latest timestamp wins
Vector clocks may be introduced in 0.7
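
A minimal sketch of "latest timestamp wins", assuming each version of a column carries a client-supplied timestamp in microseconds. The Column type and the resolve function are hypothetical names for illustration. Tie-breaking between equal timestamps is left out.

```python
from typing import Iterable, NamedTuple


class Column(NamedTuple):
    name: str
    value: bytes
    timestamp: int  # microseconds since the epoch, supplied by the client


def resolve(versions: Iterable[Column]) -> Column:
    """Pick the version with the largest timestamp (last write wins)."""
    return max(versions, key=lambda column: column.timestamp)


older = Column("email", b"old@example.com", 1272412800000000)
newer = Column("email", b"new@example.com", 1272412800000001)
winner = resolve([older, newer])  # the version with the later timestamp
```

Because clients supply the timestamps, a client with a skewed clock can win or lose a conflict regardless of real-world ordering.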
Read Repair
Query all replicas on every read
  Data from one replica
  Checksum/timestamps from all
  others
If there is a mismatch:
  Pull all data and merge
  Write back to out of sync replicas
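
The steps above can be sketched as follows. This is a toy model, not Cassandra's code: each replica is a plain dict mapping a key to a (value, timestamp) pair, and the names digest and read_with_repair are hypothetical. The merge rule assumed here is "latest timestamp wins".

```python
import hashlib


def digest(version) -> str:
    """Checksum of one replica's version; stands in for a digest read."""
    return hashlib.md5(repr(version).encode()).hexdigest()


def read_with_repair(replicas, key):
    """Read key from every replica and repair any that are out of sync."""
    data = replicas[0].get(key)                           # full data from one replica
    digests = [digest(r.get(key)) for r in replicas[1:]]  # checksums from the others
    if all(d == digest(data) for d in digests):
        return data                                       # replicas agree, nothing to do
    # Mismatch: pull all versions and merge by latest timestamp.
    versions = [r.get(key) for r in replicas]
    newest = max((v for v in versions if v is not None), key=lambda v: v[1])
    # Write the merged result back to the out-of-sync replicas.
    for replica in replicas:
        if replica.get(key) != newest:
            replica[key] = newest
    return newest


a, b, c = {"k": ("old", 1)}, {"k": ("new", 2)}, {}
result = read_with_repair([a, b, c], "k")
```

This sketch repairs before it returns. Repairing after returning would need the write-back to run in the background.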
Weak vs. Strong
Weak Consistency (reads)
  Perform repair after returning results
Strong Consistency (reads)
  Perform repair before returning results
R+W>N

  Please imagine this inequality has huge fangs, dripping with the
blood of innocent, enterprise developers so you can best appreciate
                        the terror it inspires.
Our Guarantee
R+W>N guarantees overlap of read and write quorums

W=2, R=2, N=3
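
A brute-force check of the guarantee, offered as a sketch rather than a proof: it enumerates every possible write set of size W and read set of size R over N replicas and tests whether each pair shares a replica. The function name is hypothetical.

```python
from itertools import combinations


def quorums_always_overlap(n: int, w: int, r: int) -> bool:
    """True if every W-replica write set intersects every R-replica read set."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )
```

For N=3, W=2 and R=2 overlap always holds. W=1 with R=2 gives R+W=N, which is not enough: the write can land on one replica while the read asks the other two.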
A Matter of Perspective
View consistency
Replica consistency
[2]
The Ring
[Figure: the ring, with positions labeled 0, 113, 125, 250, 312 and 375, and one arc marked as a range]
Tokens
A TOKEN is a partitioner-dependent element on the ring

Each NODE has a single, unique TOKEN

Each NODE claims a RANGE of the ring from its TOKEN to the token of the previous node on the ring
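
A small sketch of how a key's token could be matched to the node that claims it. The ring contents and the owner function are hypothetical. The sketch assumes a node's range runs from just after the previous node's token up to and including its own token.

```python
from bisect import bisect_left

# Hypothetical four-node ring: token -> node name.
RING = {0: "A", 125: "B", 250: "C", 375: "D"}
TOKENS = sorted(RING)


def owner(key_token: int) -> str:
    """Return the node whose range contains key_token."""
    i = bisect_left(TOKENS, key_token)  # index of the first node token >= key_token
    if i == len(TOKENS):                # past the last token: wrap around the ring
        i = 0
    return RING[TOKENS[i]]
```

Here a key with token 113 belongs to node B, and a key with token 400 wraps around to node A.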
Partitioning
Map from Key Space to Token

RandomPartitioner
  Tokens are integers in the range 0 to 2^127
  MD5(Key) -> Token
  Good: Even key distribution, Bad: Inefficient range queries
OrderPreservingPartitioner
  Tokens are UTF8 strings in the range ‘’ to ∞
  Key -> Token
  Good: Efficient range queries, Bad: Uneven key distribution
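
To make the two mappings concrete, here is a sketch with hypothetical function names. Reducing the 128-bit MD5 digest into the token range with a modulo is an assumption of this sketch; Cassandra's own reduction may differ.

```python
import hashlib

TOKEN_SPACE = 2**127  # RandomPartitioner tokens are integers from 0 to 2^127


def random_partitioner_token(key: bytes) -> int:
    """MD5(key) -> integer token (modulo reduction is an assumption)."""
    md5_digest = hashlib.md5(key).digest()
    return int.from_bytes(md5_digest, "big") % TOKEN_SPACE


def order_preserving_token(key: str) -> str:
    """Key -> token: the key itself, so key order is preserved on the ring."""
    return key


keys = ["user:1", "user:2", "user:3"]
hashed = [random_partitioner_token(k.encode("utf-8")) for k in keys]
ordered = [order_preserving_token(k) for k in keys]
```

Adjacent keys such as user:1 and user:2 hash to unrelated tokens, which spreads load but makes range scans inefficient. The order-preserving tokens stay sorted, which helps range scans but lets hot key ranges pile onto one node.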
Snitching
Map from Nodes to Physical Location
EndpointSnitch
  Guess at rack and datacenter based on IP address octets.


DatacenterEndpointSnitch
  Specify IP subnets for racks, grouped per datacenter.


PropertySnitch
  Specify arbitrary mappings from individual IP addresses to
  racks and datacenters.


            Or write your own!
Placement
Map from Token Space to Nodes

The first replica is always placed on the node that claims the range in which the token falls.

Strategies determine where the rest of the replicas are placed.
RackUnaware
Place replicas on the N-1 subsequent nodes around the ring, ignoring topology.
[Figure: datacenter A and datacenter B, each with rack 1 and rack 2]
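
A sketch of rack-unaware placement, assuming the ring is given as a list of node names in token order and the first replica's position is already known. The function name is hypothetical.

```python
def rack_unaware_replicas(ring, primary_index, n):
    """First replica plus the N-1 subsequent nodes around the ring."""
    count = min(n, len(ring))  # cannot place more replicas than there are nodes
    return [ring[(primary_index + i) % len(ring)] for i in range(count)]


placement = rack_unaware_replicas(["A", "B", "C", "D"], primary_index=2, n=3)
```

With the first replica on C and N=3, the other two replicas go to D and then wrap around to A.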
RackAware
Place the second replica in another datacenter, and the remaining N-2 replicas on nodes in other racks in the same datacenter.
[Figure: datacenter A and datacenter B, each with rack 1 and rack 2]
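
One possible reading of the rack-aware rule, sketched under assumptions: the ring is a list of node names in token order, location maps each node to a (datacenter, rack) pair as a snitch might report it, and candidates are taken in ring order after the first replica. All names are hypothetical. Fallback behavior when too few suitable nodes exist is left out.

```python
def rack_aware_replicas(ring, location, primary_index, n):
    """Pick N replicas: first on the primary, second in another datacenter,
    the rest on other racks in the primary's datacenter."""
    primary = ring[primary_index]
    dc, rack = location[primary]
    # Nodes that follow the primary around the ring, in order.
    successors = [ring[(primary_index + i) % len(ring)] for i in range(1, len(ring))]
    replicas = [primary]
    # Second replica: first successor located in a different datacenter.
    if len(replicas) < n:
        for node in successors:
            if location[node][0] != dc:
                replicas.append(node)
                break
    # Remaining replicas: successors in the same datacenter on a different rack.
    for node in successors:
        if len(replicas) >= n:
            break
        node_dc, node_rack = location[node]
        if node not in replicas and node_dc == dc and node_rack != rack:
            replicas.append(node)
    return replicas


ring = ["a1", "b1", "a2", "b2", "a3"]
location = {
    "a1": ("A", "rack1"), "a2": ("A", "rack2"), "a3": ("A", "rack1"),
    "b1": ("B", "rack1"), "b2": ("B", "rack2"),
}
placement = rack_aware_replicas(ring, location, primary_index=0, n=3)
```

Starting from a1 in datacenter A, the second replica lands on b1 in datacenter B and the third on a2, which shares a1's datacenter but sits on another rack.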
DatacenterShard
Place M of the N replicas in another datacenter, and the remaining N - (M + 1) replicas on nodes in other racks in the same datacenter.
[Figure: datacenter A and datacenter B, each with rack 1 and rack 2]
Or write your own!
[fin]
Cassandra
  http://cassandra.apache.org
Amazon Dynamo
  http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
Google BigTable
  http://labs.google.com/papers/bigtable.html
Facebook Cassandra
  http://www.cs.cornell.edu/projects/ladis2009/papers/lakshman-ladis2009.pdf
Thank you!
 Questions?
