Apache Cassandra
What is Apache Cassandra?
Apache Cassandra is an open-source, non-relational (NoSQL) distributed
database that manages large amounts of data across commodity
servers.
It is a wide-column store, often described as a column-oriented database.
It was initially released as open source in July 2008.
In terms of the CAP theorem, it favors Availability and Partition Tolerance (AP).
Why was Apache Cassandra implemented?
Avinash Lakshman and Prashant Malik initially
developed Apache Cassandra at Facebook to power the
Facebook inbox search feature.
Components of Apache Cassandra
• Node: A Cassandra node is a single server instance where data is stored.
• Data center: A data center is a collection of related nodes.
• Cluster: A cluster contains one or more data centers.
• Commit log: In Cassandra, the commit log is a crash-recovery mechanism. Every write operation is
appended to the commit log.
• Memtable: A memtable is a memory-resident data structure. After the commit log, the data is written
to the memtable. Sometimes a single column family (table) has multiple memtables.
• SSTable: An SSTable is a disk file to which data is flushed from the memtable when its contents reach a
threshold value.
• Bloom filter: A Bloom filter is a fast, probabilistic data structure for testing whether an element
is a member of a set. It can return false positives but never false negatives. It acts like a special
kind of cache: on reads, Cassandra checks the Bloom filter of an SSTable before looking inside the file.
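To make the Bloom filter idea concrete, here is a minimal sketch in Python. It is not Cassandra's implementation: the class name, the bit-array size, and the use of salted SHA-256 hashes are all assumptions chosen for readability.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive num_hashes positions by salting a single hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True only means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


bf = BloomFilter()
for partition_key in ("user:1", "user:2", "user:3"):
    bf.add(partition_key)

print(bf.might_contain("user:1"))  # an added key is always reported as possibly present
```

A key that was added is always found, while a key that was never added is usually, but not always, rejected. That asymmetry is why a filter can safely be used to skip files that cannot hold the data.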
Apache Cassandra Architecture:
Write Operations:
i. When the client issues a write request, Cassandra stores the data
in the memtable (RAM) and also appends the write to the commit log
(disk). Because the commit log is on disk, the write survives even
if the node loses power.
ii. The data in the memtable (RAM) is later flushed to an SSTable (disk),
and a partition index is created that points to the location of the
data on disk. A flush happens when a configurable memtable threshold
is reached or when the commit log threshold
commitlog_total_space_in_mb is exceeded.
iii. SSTables are immutable: each memtable flush creates a new SSTable
file, and existing SSTables are never overwritten. As a result, the
data for one partition can be spread across multiple SSTables, which
a read has to combine.
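The write steps above can be sketched as a toy store in Python. This is an illustration under simplifying assumptions, not Cassandra's code: `TinyStore`, its JSON file formats, and the row-count flush threshold are hypothetical names and choices made for the example.

```python
import json
import os
import tempfile


class TinyStore:
    """Toy write path: commit log on disk, memtable in RAM, immutable SSTable files."""

    def __init__(self, data_dir, flush_threshold=3):
        self.data_dir = data_dir
        self.flush_threshold = flush_threshold  # assumed: flush after N keys
        self.commit_log_path = os.path.join(data_dir, "commitlog.jsonl")
        self.memtable = {}
        self.sstables = []  # paths of flushed files, oldest first

    def write(self, key, value):
        # Append to the commit log and force it to disk for crash recovery.
        with open(self.commit_log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        # Apply the write to the in-memory memtable.
        self.memtable[key] = value
        # Flush once the memtable reaches the configured threshold.
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        if not self.memtable:
            return
        # Every flush creates a NEW file; older SSTables are never rewritten.
        path = os.path.join(self.data_dir, f"sstable-{len(self.sstables)}.json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump(dict(sorted(self.memtable.items())), f)
        self.sstables.append(path)
        self.memtable = {}
        # The flushed data is now durable, so the log can be truncated.
        open(self.commit_log_path, "w", encoding="utf-8").close()


with tempfile.TemporaryDirectory() as tmp:
    store = TinyStore(tmp, flush_threshold=2)
    store.write("k1", "v1")
    store.write("k2", "v2")   # second key reaches the threshold and triggers a flush
    store.write("k1", "v1b")  # newer value for k1 stays in the memtable for now
    sstable_count = len(store.sstables)
    memtable_snapshot = dict(store.memtable)
```

After these three writes there is one SSTable holding the old value of `k1`, while the newer value lives only in the memtable, which is why a read has to consult both.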
Read Operations:
i. The client sends a read request.
ii. The memtable (RAM) is checked for the requested data. If the data is
present, it is read from the memtable and merged with data from the
SSTables (disk) to produce the final result for the client.
iii. If the row cache is enabled, it is checked for the data.
iv. Bloom filters, which are held in memory, are checked to find the
SSTables that may contain the requested partition. Bloom filters are
probabilistic: they can return false positives but never false negatives,
so an SSTable that its filter rules out is skipped. For each remaining
SSTable, Cassandra next checks the partition key cache.
v. The partition key cache keeps partition index entries in memory. If the
partition key is present in the partition key cache, Cassandra goes
straight to the compression offset map to find the block on disk that
holds the data. If the partition key is not present in the cache, the
partition summary is searched to narrow down where the key sits in the
partition index.
vi. The partition index stores the partition keys together with the position
of their data; that position is then looked up in the compression offset
map to find the exact location on disk.
vii. The compression offset map holds the exact on-disk location of the
data. Once it has identified where the data is stored, the data is
fetched from disk and returned to the client.
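A simplified sketch of the read steps in Python follows. It leaves out the row cache, key cache, partition summary, and compression offset map, and it uses an exact set as a stand-in for the Bloom filter. The `SSTable` class, the `read` function, and the rule that the value with the newest write timestamp wins are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SSTable:
    rows: dict  # partition key -> (write_timestamp, value)
    key_filter: set = field(init=False)

    def __post_init__(self):
        # Stand-in for a Bloom filter (exact, so no false positives here).
        self.key_filter = set(self.rows)

    def might_contain(self, key):
        return key in self.key_filter


def read(key, memtable, sstables):
    """Collect every version of the key and return the most recent value."""
    candidates = []
    if key in memtable:
        candidates.append(memtable[key])
    for sstable in sstables:
        if not sstable.might_contain(key):
            continue  # the filter rules this file out, so it is never opened
        row = sstable.rows.get(key)
        if row is not None:
            candidates.append(row)
    if not candidates:
        return None
    # Merge step: the version with the highest timestamp wins.
    return max(candidates, key=lambda row: row[0])[1]


memtable = {"user:1": (30, "carol")}
sstables = [
    SSTable({"user:1": (10, "alice"), "user:2": (12, "bob")}),
    SSTable({"user:1": (20, "alicia")}),
]

print(read("user:1", memtable, sstables))  # carol (memtable holds the newest version)
print(read("user:2", memtable, sstables))  # bob (found only in the first SSTable)
print(read("user:9", memtable, sstables))  # None (no source has the key)
```

The more SSTables a partition is spread over, the more candidates a read has to merge, which is one reason reads tend to cost more than writes in this design.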
Features of Apache Cassandra:
Distributed
Scalability
Fault Tolerance
Query Language (CQL)
Virtual Nodes:
Each node is assigned a range of tokens, and virtual nodes (vnodes)
split that assignment into many smaller token sub-ranges within a
server. Older releases default to 256 virtual nodes per server
(num_tokens); Cassandra 4.0 and later use a smaller default. Virtual
nodes give the system greater flexibility. It is easier for Cassandra
to add new nodes to the cluster when we need them, because a new
node takes over many small sub-ranges from the existing nodes. When
tokens are unevenly distributed between nodes, a node with more
capacity can be given more virtual nodes so that it owns a larger
share of the data.
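The token-range idea can be sketched with a small hash ring in Python. The helper names (`token_for`, `build_ring`, `owner`), the SHA-256 based tokens, and the 8 vnodes per node are assumptions for illustration; Cassandra's own partitioner and token allocation differ.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 64


def token_for(value):
    # Assumed hash for the sketch; not Cassandra's partitioner.
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % RING_SIZE


def build_ring(nodes, vnodes_per_node):
    """Return a sorted list of (token, node); each node owns several tokens."""
    return sorted(
        (token_for(f"{node}#vnode{i}"), node)
        for node in nodes
        for i in range(vnodes_per_node)
    )


def owner(ring, partition_key):
    """A key belongs to the first vnode token at or after its own token."""
    tokens = [token for token, _ in ring]
    idx = bisect.bisect_left(tokens, token_for(partition_key))
    return ring[idx % len(ring)][1]  # wrap around at the end of the ring


nodes = ["node-a", "node-b", "node-c"]
ring = build_ring(nodes, vnodes_per_node=8)

keys = [f"user:{i}" for i in range(1000)]
before = {key: owner(ring, key) for key in keys}

# Adding a node only inserts new tokens; existing tokens stay where they are.
bigger_ring = build_ring(nodes + ["node-d"], vnodes_per_node=8)
after = {key: owner(bigger_ring, key) for key in keys}
moved = [key for key in keys if before[key] != after[key]]

print(len(ring), len(moved))
```

Every key that changes owner moves to the new node, and the keys it takes come from sub-ranges scattered across the ring rather than from one neighbour. That is the flexibility the slide refers to.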
Advantages of Apache Cassandra:
Open source
Peer to Peer Architecture
Scalable
High Efficiency
Tunable consistency
Flexible schema
Easy to Learn and Use
Distributed and Decentralized
Ability to Analyse
Disadvantages of Apache Cassandra:
It does not support full ACID transactions or relational features.
Under very large data volumes and request rates, operations can
slow down, meaning you get latency issues.
Data is modelled around queries rather than structure, so the
same information is often stored multiple times.
Since Cassandra runs on the JVM and stores vast amounts of data, users
may experience JVM memory management issues.
It offers no join or subquery support.
Cassandra offers only limited support for aggregates.
Cassandra was optimized from the start for fast writes; reads got
the short end of the stick, so they tend to be slower.
Finally, official documentation from Apache has been limited, so you
may need to look for it among third-party companies.
