Sergey Titov
Software Architect
@sergtitov
Agenda
• Cassandra Architecture
• CAP theorem and Consistency
• Scalability
• Astyanax client
• Data Modeling
• Queries
• DataStax OpsCenter
• Resources
Cassandra architecture
• Ring
• P2P
• Gossip
• Key hash-based sharding
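The ring and key hash-based sharding can be sketched roughly as follows. This is a toy model, not Cassandra's implementation: the class name TokenRing, its methods, and the truncated-MD5 token function are assumptions made for illustration (a real cluster uses a configured partitioner such as Murmur3Partitioner).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

/** Toy token ring (hypothetical): each node owns the range ending at its token. */
final class TokenRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    /** Registers a node under an explicitly assigned token. */
    void addNode(long token, String node) {
        ring.put(token, node);
    }

    /** Hashes a row key to a token: first 8 bytes of MD5 (illustrative only). */
    static long tokenFor(String rowKey) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(rowKey.getBytes(StandardCharsets.UTF_8));
            long t = 0;
            for (int i = 0; i < 8; i++) {
                t = (t << 8) | (d[i] & 0xFF);
            }
            return t;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Owner of a token: the first node clockwise, wrapping to the start of the ring. */
    String ownerOf(long token) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no nodes in the ring");
        }
        Map.Entry<Long, String> e = ring.ceilingEntry(token);
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    /** Owner of a row key: hash the key, then look up the token. */
    String ownerOfKey(String rowKey) {
        return ownerOf(tokenFor(rowKey));
    }
}
```

Because the owner depends only on the key's hash, any node (or a token-aware client) can work out where a row lives without a central directory.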
CAP Theorem
Consistency in Cassandra
• ACID - Atomicity Consistency Isolation Durability
• BASE - Basically Available Soft-state Eventual consistency
• Isolation on the row level
• Atomic batches starting Cassandra 1.2
• Consistency level for READs and WRITEs set for every request
• Tunable consistency
• Log: CL_WRITE = ANY or ONE
• Strong: CL_READ + CL_WRITE > REPLICATION_FACTOR
• Recommended default: LOCAL_QUORUM
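The "Strong" rule above is plain arithmetic and can be checked with a few lines. A minimal sketch; the class and method names are hypothetical and not part of any Cassandra client API.

```java
/** Arithmetic behind tunable consistency (hypothetical helper). */
final class ConsistencyMath {
    /** Quorum size for a replication factor: floor(rf / 2) + 1. */
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    /** True when every read set must overlap every write set: R + W > RF. */
    static boolean isStrong(int readReplicas, int writeReplicas, int replicationFactor) {
        return readReplicas + writeReplicas > replicationFactor;
    }
}
```

With a replication factor of 3, quorum reads plus quorum writes give 2 + 2 > 3, so each read overlaps the latest write on at least one replica, while ONE plus ONE gives 1 + 1 = 2 and does not.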
Consistency in Cassandra - continued
Level: Description
• ANY: A write must be written to at least one node. If all replica nodes for the given row key are down, the write can still succeed once a hinted handoff has been written. Note that if all replica nodes are down at write time, an ANY write will not be readable until the replica nodes for that row key have recovered.
• ONE: A write must be written to the commit log and memory table of at least one replica node.
• QUORUM: A write must be written to the commit log and memory table on a quorum of replica nodes.
• LOCAL_QUORUM: A write must be written to the commit log and memory table on a quorum of replica nodes in the same data center as the coordinator node. Avoids latency of inter-data center communication.
• EACH_QUORUM: A write must be written to the commit log and memory table on a quorum of replica nodes in all data centers.
• ALL: A write must be written to the commit log and memory table on all replica nodes in the cluster for that row key.
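To make the levels concrete, the sketch below computes how many acknowledgements a coordinator would wait for at each write level in a multi-data-center setup. The enum mirrors the level names in the table; the requiredAcks method, its parameters, and the data center names are assumptions for illustration, not a driver API.

```java
import java.util.Map;

/** Write consistency levels with a hypothetical acknowledgement calculator. */
enum WriteLevel {
    ANY, ONE, QUORUM, LOCAL_QUORUM, EACH_QUORUM, ALL;

    /**
     * Acknowledgements to wait for, given the replication factor per data center.
     * localDc must be a key of rfByDc when the level is LOCAL_QUORUM.
     */
    int requiredAcks(Map<String, Integer> rfByDc, String localDc) {
        int total = 0;
        for (int rf : rfByDc.values()) {
            total += rf;
        }
        switch (this) {
            case ANY: // a stored hint also counts as the single acknowledgement
            case ONE:
                return 1;
            case QUORUM:
                return total / 2 + 1;
            case LOCAL_QUORUM:
                return rfByDc.get(localDc) / 2 + 1;
            case EACH_QUORUM: {
                int sum = 0;
                for (int rf : rfByDc.values()) {
                    sum += rf / 2 + 1; // a quorum in every data center
                }
                return sum;
            }
            case ALL:
                return total;
            default:
                throw new AssertionError(this);
        }
    }
}
```

For two data centers with three replicas each, LOCAL_QUORUM waits for 2 local replicas, while QUORUM and EACH_QUORUM both wait for 4; EACH_QUORUM additionally requires that 2 of them come from each data center.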
Write Data Flow
Multiple Data Centers
Scalability
Astyanax client
• Based on Hector
• High-level, simple object-oriented interface to Cassandra.
• Fail-over behavior on the client side.
• Connection pool abstraction (round robin connection pool)
• Monitoring to get event notification from the connection pool.
• Complete encapsulation of the underlying Thrift API.
• Automatic retry of downed hosts.
• Automatic discovery of additional hosts in the cluster.
• Suspension of hosts for a short period of time after timeouts.
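Two of the behaviours listed above, round-robin host selection and short suspension of hosts after timeouts, can be illustrated without any client library. The class below is a simplified stand-in written for this purpose; it is not Astyanax code, and every name in it is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Round-robin host picker that skips hosts suspended after a timeout (sketch). */
final class RoundRobinHosts {
    private final List<String> hosts;
    private final Map<String, Long> suspendedUntil = new HashMap<>();
    private final long suspendMillis;
    private int next = 0;

    RoundRobinHosts(List<String> hosts, long suspendMillis) {
        this.hosts = new ArrayList<>(hosts);
        this.suspendMillis = suspendMillis;
    }

    /** Marks a host as unusable until nowMillis + suspendMillis. */
    synchronized void reportTimeout(String host, long nowMillis) {
        suspendedUntil.put(host, nowMillis + suspendMillis);
    }

    /** Returns the next non-suspended host in rotation, or null if all are suspended. */
    synchronized String pick(long nowMillis) {
        for (int i = 0; i < hosts.size(); i++) {
            String h = hosts.get(next);
            next = (next + 1) % hosts.size();
            Long until = suspendedUntil.get(h);
            if (until == null || until <= nowMillis) {
                return h;
            }
        }
        return null;
    }
}
```

Passing the clock in as a parameter keeps the sketch deterministic; a real pool would read the system clock and retry the request on the next host.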
Astyanax – token-aware client
Data Modeling in Cassandra
• Column Families are NOT tables!
• Map<RowKey, SortedMap<ColumnKey, ColumnValue>>
• Values can be, and often are, stored in column names
• The number of columns can differ from row to row
• A single row can hold up to 2 billion columns!
• Use UUIDs
• Separate read-heavy from write-heavy data
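The Map<RowKey, SortedMap<ColumnKey, ColumnValue>> view above translates almost literally into Java collections. The sketch below is an in-memory analogy only; the class name, the String key types, and the sample column names used with it are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** In-memory analogy of a column family: row key -> sorted columns. */
final class ColumnFamilySketch {
    private final Map<String, SortedMap<String, String>> rows = new HashMap<>();

    /** Adds or overwrites one column; a row exists only through its columns. */
    void put(String rowKey, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
    }

    /** Columns of a row in column-name order (empty if the row is unknown). */
    SortedMap<String, String> row(String rowKey) {
        return rows.getOrDefault(rowKey, new TreeMap<>());
    }
}
```

A call such as put("user1", "follower:bob", "") stores the interesting data (the follower's name) in the column name and leaves the value empty, and nothing forces another row to have the same columns.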
Data Modeling in Cassandra - continued
• Client joins
• Denormalize data
• Wide rows
• Materialized views
• Model around queries
• Row key is “shard” key
Modeling nested entities and documents
Motivation
• Parent-child decomposition performs poorly in Cassandra.
• No JOIN operator in CQL!
• The only solution is to store a tree-like structure with nested “children”
• Cassandra doesn’t have built-in support for a document object
Solution
• Column Families are NOT tables
• Domain object fields are traversed along with the nested entities
• Collection and Map fields (at any nesting depth) are unwrapped
into plain key-value pairs (mapped to Cassandra column name – value)
Modeling nested entities and documents. Example
class Parent {
    @Id
    private UUID id;

    @Column
    private String stringField1;

    @NestedCollection
    private Map<String, byte[]> imageMap;

    @NestedCollection
    private List<Child> children;
}

class Child {
    @Column
    private Integer kidsNumber;
}
Modeling nested entities and documents. Example
Let’s use JSON notation:
If Parent is
{
  "id" : "edc39a6c-355f-4ad0-a4de-b2103dbd610d",
  "stringField1" : "value1",
  "imageMap" : {
    "name1" : "SW1hZ2VEYXRhMQ==",
    "name2" : "SW1hZ2VEYXRhMg=="
  },
  "children" : [
    { "kidsNumber" : 1 },
    { "kidsNumber" : 2 }
  ]
}
the corresponding Cassandra columns will be:
• "id" -> "edc39a6c-355f-4ad0-a4de-b2103dbd610d"
• "stringField1" -> "value1"
• "imageMap:name1" -> "SW1…MQ=="
• "imageMap:name2" -> "SW1…Mg=="
• "children:0:kidsNumber" -> 1
• "children:1:kidsNumber" -> 2
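The unwrapping shown in this example can be approximated with a short recursive walk. The sketch works on plain Map and List values rather than on annotated classes, so it says nothing about how the annotations are actually processed; Flattener and its methods are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Flattens nested maps and lists into "a:b:0:c" style column names (sketch). */
final class Flattener {
    /** Returns column name -> leaf value, in traversal order. */
    static Map<String, Object> flatten(Map<String, ?> entity) {
        Map<String, Object> columns = new LinkedHashMap<>();
        walk("", entity, columns);
        return columns;
    }

    private static void walk(String prefix, Object node, Map<String, Object> out) {
        if (node instanceof Map) {
            // Map entries contribute their key as the next name component.
            for (Map.Entry<?, ?> e : ((Map<?, ?>) node).entrySet()) {
                walk(join(prefix, String.valueOf(e.getKey())), e.getValue(), out);
            }
        } else if (node instanceof List) {
            // List elements contribute their index as the next name component.
            List<?> list = (List<?>) node;
            for (int i = 0; i < list.size(); i++) {
                walk(join(prefix, Integer.toString(i)), list.get(i), out);
            }
        } else {
            out.put(prefix, node); // leaf: becomes one column
        }
    }

    private static String join(String prefix, String part) {
        return prefix.isEmpty() ? part : prefix + ":" + part;
    }
}
```

Note that ":" acts as a separator here, so map keys containing ":" would need escaping in a real mapper.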
Range queries in Cassandra
Motivation
• CQL has no equivalent of the SQL clause:
WHERE "field_name" >= value1 AND "field_name" <= value2
• For indexed fields the only possible query is
WHERE "field_name" [<, >, <=, >=, =] "value", and "field_name" can be
specified only once in a CQL query
Solution
• Every Cassandra column name is a byte buffer: byte[] columnName
• Column names (unlike column values) can be filtered by a range:
given two boundary values
• byte[] lowMargin,
• byte[] highMargin
it is possible to select the columns whose columnName satisfies
WHERE columnName >= lowMargin AND columnName <= highMargin
• Since about 2 billion columns can be stored under one row key,
this gives a fast search over lists of up to 2 * 10^9 entries
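The column-name range described above behaves like a slice of a sorted map keyed by raw bytes. The sketch below models one row in memory; ColumnRange, BYTES and slice are hypothetical names, and the unsigned lexicographic comparator is an assumption that stands in for a bytes-ordered column comparator.

```java
import java.nio.charset.StandardCharsets;
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

/** One row whose columns are sorted by raw column-name bytes (sketch). */
final class ColumnRange {
    /** Lexicographic comparison of byte arrays, treating bytes as unsigned. */
    static final Comparator<byte[]> BYTES = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.length, b.length);
    };

    private final NavigableMap<byte[], String> columns = new TreeMap<>(BYTES);

    /** Stores a column under the UTF-8 bytes of its name. */
    void put(String name, String value) {
        columns.put(name.getBytes(StandardCharsets.UTF_8), value);
    }

    /** Columns with lowMargin <= columnName <= highMargin (lowMargin must not exceed highMargin). */
    NavigableMap<byte[], String> slice(byte[] lowMargin, byte[] highMargin) {
        return columns.subMap(lowMargin, true, highMargin, true);
    }
}
```

Because the columns are already stored in sorted order, the slice is a seek plus a sequential scan, which is why putting the searchable value into the column name makes range lookups cheap.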
Composite Column Families
Motivation
• Raw untyped column names are inconvenient to process.
• If two or more components of a column name are serialized into the
same byte buffer, it is hard to search quickly on a single component.
For instance, consider a column name consisting of two components:
• person_name: String
• time_stamp: Date
How can we build a column range that returns all previously persisted
combinations with person_name = "Tom" and time_stamp >= "1999-01-01" and
time_stamp <= "2012-01-01"?
Solution
Cassandra has a built-in CompositeType comparator that can be defined for
any number of components and sorts columns by component 0 first, then 1, …
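The effect of component-wise ordering on the "Tom" question can be imitated in memory. This is an analogy, not the CompositeType implementation: CompositeName, CompositeRow and their members are hypothetical, and LocalDate stands in for the Date component.

```java
import java.time.LocalDate;
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Two-component column name: (person_name, time_stamp). */
final class CompositeName {
    final String personName;   // component 0
    final LocalDate timeStamp; // component 1

    CompositeName(String personName, LocalDate timeStamp) {
        this.personName = personName;
        this.timeStamp = timeStamp;
    }

    /** Orders by component 0 first, then by component 1. */
    static final Comparator<CompositeName> ORDER = (a, b) -> {
        int c = a.personName.compareTo(b.personName);
        return c != 0 ? c : a.timeStamp.compareTo(b.timeStamp);
    };
}

/** One row whose columns are sorted by composite name (sketch). */
final class CompositeRow {
    private final NavigableMap<CompositeName, String> columns =
            new TreeMap<>(CompositeName.ORDER);

    void put(String person, LocalDate ts, String value) {
        columns.put(new CompositeName(person, ts), value);
    }

    /** All columns for one person with from <= time_stamp <= to. */
    NavigableMap<CompositeName, String> slice(String person, LocalDate from, LocalDate to) {
        return columns.subMap(new CompositeName(person, from), true,
                              new CompositeName(person, to), true);
    }
}
```

Fixing component 0 and ranging over component 1 yields one contiguous slice, which is the reason the equality component has to come first in the composite.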
Composite Column Families - mapping
public class ReferenceCategoryValue {
    @Id
    private String category;     // maps to the row key

    @Component(ordinal = 0)      // the following three fields are serialized
    private UUID id;             // into the column name

    @Component(ordinal = 1)
    private String description;

    @Component(ordinal = 2)
    private String code;

    @Value
    private String value;        // the value saved for the column
}
DataStax OpsCenter
Resources
• DataStax Documentation
• Free Cassandra Academy
• Tutorials
• Apache Cassandra Home Page
• Cassandra Summit Presentations
• 2014 Summit Videos
• Netflix blog
• Astyanax
• eBay Cassandra Data Modeling best practices part 1 and part 2

Cassandra Overview
