Introduction to Cassandra
Asis Mohanty
Source: DataStax, Cassandra
01 ARCHITECTURE: Cassandra Architecture
02 C* CLUSTER: Cassandra Cluster
03 CQL COLUMN FAMILY: CQL Column Family
04 WRITE PATH: Cassandra Write Path
05 READ PATH: Cassandra Read Path
06 DATA MODEL: C* Data Model
What is Cassandra?
• Open-source, distributed database management system (DBMS)
• Several key features differentiate Cassandra from other similar systems
Cassandra basics 2.0
Architected with Scale in Mind
What is a C* Cluster?
Peer-to-Peer
Data Centers
Tunable Consistency
Continuous Availability
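Tunable consistency means the client chooses, per request, how many replicas must acknowledge a write (W) or answer a read (R). The sketch below assumes a single data center whose keyspace has replication factor RF, and shows the usual rule of thumb: when R + W > RF, every read replica set overlaps every write replica set, so a read sees the latest acknowledged write. QUORUM is computed as floor(RF/2) + 1; the helper names are hypothetical.

```python
# Hypothetical helpers illustrating tunable consistency arithmetic.
# Assumes a single data center whose keyspace has replication factor rf.


def replicas_required(level: str, rf: int) -> int:
    """Number of replicas that must respond at a given consistency level."""
    required = {
        "ONE": 1,
        "TWO": 2,
        "THREE": 3,
        "QUORUM": rf // 2 + 1,  # a strict majority of the replicas
        "ALL": rf,
    }[level]
    if required > rf:
        raise ValueError(f"{level} needs {required} replicas but rf is {rf}")
    return required


def reads_see_latest_write(read_level: str, write_level: str, rf: int) -> bool:
    """True when every read replica set overlaps every write replica set (R + W > RF)."""
    r = replicas_required(read_level, rf)
    w = replicas_required(write_level, rf)
    return r + w > rf


if __name__ == "__main__":
    for read_cl, write_cl in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
        print(read_cl, write_cl, reads_see_latest_write(read_cl, write_cl, rf=3))
```

With RF = 3 this prints False for ONE/ONE and True for QUORUM/QUORUM and ONE/ALL: the trade-off is latency and availability (fewer replicas to wait for) against the guarantee of reading the newest write.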
Consistent Hashing
Consistent hashing distributes data across a cluster in a way that minimizes reorganization when nodes are added or removed. It partitions data based on the partition key.
For example, take the following table, with name as the partition key:

name    age  car     gender
jim     36   camaro  M
carol   37   bmw     F
johnny  12   -       M
suzy    10   -       F

Cassandra assigns a hash value to each partition key:

Partition key  Murmur3 hash value
jim            -2245462676723223822
carol          7723358927203680754
johnny         -6723372854036780875
suzy           1168604627387940318
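A minimal sketch of consistent hashing, assuming one token per node. To stay dependency-free it hashes with MD5 from Python's hashlib (the approach of Cassandra's older RandomPartitioner) instead of Murmur3, so its tokens will not match the Murmur3 values above. The class name TokenRing and the node names are hypothetical. Each node owns the range from the previous node's token (exclusive) up to its own token, so a key belongs to the first node whose token is greater than or equal to the key's token, wrapping around to the lowest token past the end of the ring.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a key to a signed 64-bit token (MD5-based stand-in for Murmur3)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


class TokenRing:
    """Toy ring: each node owns the range (previous token, its own token]."""

    def __init__(self) -> None:
        self._tokens: list[int] = []       # node tokens, kept sorted
        self._owners: dict[int, str] = {}  # token -> node name

    def add_node(self, name: str) -> None:
        t = token(name)
        bisect.insort(self._tokens, t)
        self._owners[t] = name

    def remove_node(self, name: str) -> None:
        t = token(name)
        self._tokens.remove(t)
        del self._owners[t]

    def owner(self, key: str) -> str:
        t = token(key)
        # First node token >= key token; past the last token, wrap to the first.
        i = bisect.bisect_left(self._tokens, t) % len(self._tokens)
        return self._owners[self._tokens[i]]


if __name__ == "__main__":
    ring = TokenRing()
    for node in ["node1", "node2", "node3"]:
        ring.add_node(node)
    keys = ["jim", "carol", "johnny", "suzy"]
    before = {k: ring.owner(k) for k in keys}
    ring.add_node("node4")
    after = {k: ring.owner(k) for k in keys}
    # Only keys that fall into node4's new range change owner.
    print([k for k in keys if before[k] != after[k]])
```

Adding node4 splits exactly one existing range, so every key that moves moves to node4 and every other key keeps its owner; that is the "minimal reorganization" property.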
What is a CQL table and how is it related to a column family?
What are row, row key, column key and column value?
A row is the smallest unit that stores related data in Cassandra.
• Rows – individual rows constitute a column family
• Row key – uniquely identifies a row in a column family
• Row – stores pairs of column keys and column values
• Column key – uniquely identifies a column value in a row
• Column value – stores one value or a collection of values
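To make these terms concrete, here is a toy in-memory model of a column family as a map from row keys to rows, where each row maps column keys to column values. It uses sample data from the table above plus a hypothetical collection column; the class name ColumnFamily is made up, and this illustrates the data model only, not how Cassandra lays data out on disk.

```python
from typing import Any


class ColumnFamily:
    """Toy column family: row key -> row, where a row maps column keys to values."""

    def __init__(self) -> None:
        self._rows: dict[str, dict[str, Any]] = {}

    def insert(self, row_key: str, column_key: str, value: Any) -> None:
        # A row comes into existence with its first column; rows need not
        # share the same set of columns.
        self._rows.setdefault(row_key, {})[column_key] = value

    def get(self, row_key: str, column_key: str) -> Any:
        # The row key finds the row; the column key finds the value within it.
        return self._rows[row_key][column_key]

    def row(self, row_key: str) -> list[tuple[str, Any]]:
        """Return the row's (column key, column value) pairs sorted by column key."""
        return sorted(self._rows.get(row_key, {}).items())


if __name__ == "__main__":
    users = ColumnFamily()
    users.insert("jim", "age", 36)
    users.insert("jim", "car", "camaro")
    users.insert("johnny", "age", 12)  # johnny has no car column at all
    users.insert("johnny", "gender", "M")
    users.insert("carol", "cars", {"bmw"})  # hypothetical collection-valued column
    print(users.row("johnny"))  # [('age', 12), ('gender', 'M')]
```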
What are partition, partition key, row, column and cell?
How does Cassandra write so fast?
Cassandra uses a log-structured storage engine.
• Data is appended sequentially rather than written to pre-set locations.
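A minimal illustration of log-structured writes, assuming an in-memory list stands in for a file on disk: every write, including an update to an existing key, goes to the end of the log, and nothing is overwritten in place. A read returns the most recently appended value for the key. The class name AppendOnlyLog is hypothetical.

```python
from typing import Optional


class AppendOnlyLog:
    """Toy log-structured store: writes only ever append."""

    def __init__(self) -> None:
        self._entries: list[tuple[str, str]] = []  # stands in for a file on disk

    def write(self, key: str, value: str) -> None:
        # Sequential append; earlier entries for the same key stay untouched.
        self._entries.append((key, value))

    def read(self, key: str) -> Optional[str]:
        # Scan from the end so the newest entry for the key wins.
        for k, v in reversed(self._entries):
            if k == key:
                return v
        return None

    def __len__(self) -> int:
        return len(self._entries)
```

Writing the same key twice leaves two entries and reads return the second. Appending is cheap because the disk never has to seek to an existing record; a real engine avoids scanning the whole log on reads, which in Cassandra is the job of the Memtables and SSTables.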
What are the key components of the write path?
Each node implements four key components to handle its writes:
○ Memtables – in-memory tables corresponding to CQL tables, with indexes
○ CommitLog – append-only log, replayed to restore a downed node's Memtables
○ SSTables – immutable Memtable snapshots periodically flushed to disk, clearing the heap
○ Compaction – periodic process to merge and streamline SSTables
When a node receives a write request:
○ The record is appended to the CommitLog, and
○ The record is written to the Memtable for the record's target CQL table
○ Periodically, Memtables are flushed to SSTables, clearing the JVM heap and the corresponding CommitLog segments
○ Periodically, compaction runs to merge and streamline SSTables
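The write path above can be sketched as a toy node with a commit log, a Memtable, a flush step, and immutable SSTables. This is a simplification under stated assumptions: a single table, a Python list as the commit log, a dict as the Memtable, SSTables as key-sorted tuples, and a flush triggered by a hypothetical entry-count threshold (real Cassandra decides when to flush based on memory use and configuration). The names ToyNode and flush_threshold are hypothetical.

```python
class ToyNode:
    """Toy single-table node following the write path on the slide."""

    def __init__(self, flush_threshold: int = 3) -> None:
        self.commit_log: list[tuple[str, str]] = []  # append-only, used for replay
        self.memtable: dict[str, str] = {}           # in memory, latest value per key
        self.sstables: list[tuple[tuple[str, str], ...]] = []  # immutable, sorted
        self.flush_threshold = flush_threshold

    def write(self, key: str, value: str) -> None:
        self.commit_log.append((key, value))  # 1. append to the CommitLog
        self.memtable[key] = value            # 2. apply to the Memtable
        if len(self.memtable) >= self.flush_threshold:
            self.flush()                      # 3. periodic flush

    def flush(self) -> None:
        if not self.memtable:
            return
        # Write the Memtable out as an immutable, key-sorted SSTable.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        # In this single-table toy every logged write is now in an SSTable,
        # so both the heap copy and the log can be cleared.
        self.memtable.clear()
        self.commit_log.clear()

    def replay(self) -> None:
        """Rebuild the Memtable from the CommitLog, e.g. after a crash."""
        self.memtable = {}
        for key, value in self.commit_log:
            self.memtable[key] = value
```

Writing four keys with flush_threshold=3 produces one SSTable holding the first three (sorted by key) and leaves the fourth in both the Memtable and the CommitLog, where replay can recover it if the in-memory copy is lost.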
How does the write path flow on a node?
What are Memtables and how are they flushed to disk?
What is an SSTable and what are its characteristics?
What is compaction?
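Compaction, described on the write-path slide as a periodic process that merges and streamlines SSTables, can be sketched as merging several SSTables into one. This sketch assumes each cell carries a write timestamp and that the newest timestamp wins, which is how Cassandra reconciles multiple versions of the same data; tombstones, TTLs, and the different compaction strategies are left out, and on a timestamp tie it simply keeps the first version it saw. The function name compact is hypothetical.

```python
from typing import Iterable

# In this sketch an SSTable is a mapping: key -> (write_timestamp, value).
SSTable = dict[str, tuple[int, str]]


def compact(sstables: Iterable[SSTable]) -> SSTable:
    """Merge SSTables into one, keeping only the newest version of each key."""
    merged: SSTable = {}
    for table in sstables:
        for key, (ts, value) in table.items():
            current = merged.get(key)
            if current is None or ts > current[0]:
                merged[key] = (ts, value)
    # Real SSTables are sorted by key, so emit the result in key order too.
    return dict(sorted(merged.items()))
```

Because the newest timestamp wins, the result does not depend on the order in which SSTables are merged, and superseded versions are dropped, which is what lets compaction reclaim disk space and reduce the number of files a read has to consult.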
How does the read path flow on each node?
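The read path can be summarized in a simplified sketch that follows from the write path: a read may find versions of the same data in the Memtable and in several SSTables, and the node returns the newest one. This assumes timestamped values; the exact per-SSTable key set is a hypothetical stand-in for the Bloom filter Cassandra keeps for each SSTable to skip files that cannot contain a partition (a real Bloom filter can give false positives but never false negatives). Caches, partition indexes, and summaries are omitted, and the names ToySSTable and read are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

Cell = tuple[int, str]  # (write timestamp, value)


@dataclass
class ToySSTable:
    data: dict[str, Cell]
    # Exact key set standing in for a Bloom filter (no false positives here).
    keys: set[str] = field(init=False)

    def __post_init__(self) -> None:
        self.keys = set(self.data)

    def might_contain(self, key: str) -> bool:
        return key in self.keys


def read(key: str, memtable: dict[str, Cell], sstables: list[ToySSTable]) -> Optional[str]:
    """Return the newest value for key across the Memtable and SSTables."""
    candidates: list[Cell] = []
    if key in memtable:
        candidates.append(memtable[key])
    for table in sstables:
        if not table.might_contain(key):
            continue  # skip this SSTable without touching its data
        candidates.append(table.data[key])
    if not candidates:
        return None
    # The newest write timestamp wins.
    return max(candidates, key=lambda cell: cell[0])[1]
```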
What is a data modeling framework?
A Sample Data Model
What is a conceptual data model?
Partitioning
• Nodes are logically arranged in a ring topology.
• The hash value (token) of a partition's key determines which node in the ring the partition is assigned to.
• The hash space wraps around: after the largest value it returns to the smallest, which closes the ring.
• Lightly loaded nodes move position on the ring to relieve heavily loaded nodes.
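The wrap-around and load-balancing bullets can be illustrated with a toy ring. This sketch assumes a hypothetical token space of 0 to 99 (Cassandra's Murmur3Partitioner uses -2^63 to 2^63 - 1) and uses key % 100 as a stand-in for a real hash so the numbers are easy to follow. Node names A, B, C and their tokens are made up; moving node B by hand shows the effect of a lightly loaded node changing position, not Cassandra's actual rebalancing procedure.

```python
RING_SIZE = 100  # hypothetical tiny token space 0..99


def token_for(key: int) -> int:
    """Toy 'hash': reduce a key into the ring; values past the end wrap to 0."""
    return key % RING_SIZE


def owner(token: int, node_tokens: dict[str, int]) -> str:
    """First node at or after `token` clockwise; past the last node, wrap to the first."""
    ordered = sorted(node_tokens.items(), key=lambda item: item[1])
    for name, node_token in ordered:
        if node_token >= token:
            return name
    return ordered[0][0]


def load(keys: list[int], node_tokens: dict[str, int]) -> dict[str, int]:
    """Count how many keys each node owns."""
    counts = {name: 0 for name in node_tokens}
    for key in keys:
        counts[owner(token_for(key), node_tokens)] += 1
    return counts


if __name__ == "__main__":
    keys = [52, 55, 58, 60, 61, 66, 70, 74, 90, 10]
    nodes = {"A": 25, "B": 50, "C": 75}
    print(load(keys, nodes))  # {'A': 2, 'B': 0, 'C': 8}
    nodes["B"] = 63  # lightly loaded B moves to split C's crowded range
    print(load(keys, nodes))  # {'A': 2, 'B': 5, 'C': 3}
```

Key 90 has no node token at or above it, so it wraps around to A. Before the move C owns eight of the ten keys; after B moves to token 63 it takes over keys 52 to 61 and the load evens out.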
Appendix
CAP Theorem
Consistency – All servers in the system have the same data, so anyone using the system gets the same copy regardless of which server answers the request.
Availability – The system always responds to a request (even if the response is not the latest data, is not consistent across the system, or is just a message saying the system isn't working).
Partition Tolerance – The system continues to operate as a whole even if messages between servers are lost or individual servers can't be reached.
Cassandra Architecture Overview
○ Cassandra was designed with the understanding that system/hardware failures can and do occur
○ Peer-to-peer, distributed system
○ All nodes are the same (no master node)
○ Data partitioned among all nodes in the cluster
○ Configurable data replication to ensure fault tolerance
○ Read/write-anywhere design
○ Google Bigtable: data model
  ○ Column families
  ○ Memtables
  ○ SSTables
○ Amazon Dynamo: distributed systems technologies
  ○ Consistent hashing
  ○ Partitioning
  ○ Replication
  ○ One-hop routing
Transparent Elasticity
Nodes can be added to or removed from a Cassandra cluster online, without downtime.
[Figure: a six-node ring (nodes 1 to 6) expanded online to a twelve-node ring (nodes 1 to 12)]
Transparent Scalability
Adding Cassandra nodes increases throughput near-linearly and lets a cluster manage terabytes to petabytes of data.
[Figure: a six-node ring with throughput N; the same cluster grown to twelve nodes with throughput N x 2]
High Availability
With its peer-to-peer architecture, Cassandra has no single point of failure.
Multi-Geography/Zone Aware
A single logical Cassandra database can span one or more geographically dispersed data centers. Cassandra also supports hybrid on-premises/cloud deployments.
Data Redundancy
Cassandra allows customizable data redundancy (a configurable number of replicas per data center) so that data is protected against node failures. It also supports rack awareness: replicas can be placed on different racks to guard against machine or rack failures.
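The original Facebook design of Cassandra used ZooKeeper to elect a leader that told nodes which ranges they are replicas for; Apache Cassandra has no ZooKeeper dependency, and each node derives replica placement from the token ring and the keyspace's replication strategy.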
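Rack awareness can be sketched as replica placement that walks the ring clockwise from the partition's token and prefers nodes on racks it has not used yet, falling back to the skipped nodes (in ring order) once every rack holds a replica. This is a simplified illustration loosely in the spirit of Cassandra's NetworkTopologyStrategy within a single data center; the function name, node names, tokens, and rack labels are hypothetical.

```python
def place_replicas(ring: list[tuple[int, str, str]], key_token: int, rf: int) -> list[str]:
    """Pick rf nodes for a partition, spreading replicas across racks first.

    ring: (token, node, rack) tuples for one data center.
    """
    ordered = sorted(ring)
    # Start at the first node whose token is >= key_token, wrapping to index 0.
    start = next((i for i, (t, _, _) in enumerate(ordered) if t >= key_token), 0)
    walk = ordered[start:] + ordered[:start]

    replicas: list[str] = []
    racks_used: set[str] = set()
    skipped: list[str] = []  # nodes passed over because their rack was already used
    for _, node, rack in walk:
        if len(replicas) == rf:
            break
        if rack not in racks_used:
            replicas.append(node)
            racks_used.add(rack)
        else:
            skipped.append(node)
    # Fewer racks than rf: fill the remaining slots with skipped nodes in ring order.
    for node in skipped:
        if len(replicas) == rf:
            break
        replicas.append(node)
    return replicas


if __name__ == "__main__":
    ring = [(0, "n1", "r1"), (25, "n2", "r1"), (50, "n3", "r2"), (75, "n4", "r2")]
    print(place_replicas(ring, key_token=30, rf=2))  # ['n3', 'n1']
```

Simply taking the next two nodes after token 30 would pick n3 and n4, both on rack r2, so losing that rack would lose both copies; skipping n4 places the second replica on r1 instead.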
Security in Cassandra
• Internal Authentication
  ○ Manages login IDs and passwords inside the database.
• Object Permission Management
  ○ Controls who has access to which objects and what they can do in the database.
  ○ Uses the familiar GRANT/REVOKE statements from relational systems.
• Client-to-Node Encryption
  ○ Protects data in flight between clients and the database.

