Getting started with Apache Cassandra and Python
By
Adnan Siddiqi
(http://adnansiddiqi.me)
What is Apache Cassandra?
• According to Wikipedia:
Apache Cassandra is a free and open-source, distributed, wide column store, NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Cassandra offers robust support for clusters spanning multiple datacenters,[1] with asynchronous masterless replication allowing low latency operations for all clients.
History
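• Developed at Facebook by two engineers, Avinash Lakshman and Prashant Malik, to power Inbox Search.
• Released as an open-source project in 2008.
• Donated to the Apache Software Foundation: it entered the Apache Incubator in 2009 and became a top-level Apache project in 2010.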
Companies using Cassandra
• Apple
• Netflix
• eBay
• The Weather Channel
Architecture
Architecture(Contd…)
• Node:- The basic infrastructure component: a machine where data is stored.
• Datacenter:- A collection of related nodes. It can be a physical datacenter or a virtual one.
• Cluster:- Contains one or more datacenters and can span multiple locations.
• Commit Log:- Every write operation is first appended to the commit log, which is used for crash recovery.
Architecture(Contd…)
• Mem-Table:- After data is written to the commit log, it is also written to the memtable (memory table), an in-memory structure where it stays until the memtable reaches a size threshold.
• SSTable:- A Sorted String Table (SSTable) is an immutable disk file to which the memtable's data is flushed once the memtable reaches its threshold. SSTables are written to disk sequentially, and each database table has its own set of SSTables.
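To make the commit log, memtable and SSTable flow concrete, here is a minimal, purely illustrative Python sketch. It is not how Cassandra is implemented: the class name ToyNode, the flush_threshold parameter (an entry count rather than a memory size) and the in-memory lists standing in for disk files are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Toy model of one node's write path (not Cassandra's real implementation)."""
    flush_threshold: int = 3                        # hypothetical: flush after this many memtable entries
    commit_log: list = field(default_factory=list)  # append-only log, used only for crash recovery
    memtable: dict = field(default_factory=dict)    # in-memory and mutable
    sstables: list = field(default_factory=list)    # immutable sorted "files", oldest first

    def write(self, key, value):
        # 1. Append to the commit log first so the write survives a crash.
        self.commit_log.append((key, value))
        # 2. Apply the write to the memtable.
        self.memtable[key] = value
        # 3. Flush once the memtable reaches its threshold.
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # The SSTable is the memtable's contents written out sorted by key.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable.clear()
        # Everything logged since the last flush is now in an SSTable, so the log can be discarded.
        self.commit_log.clear()

    def read(self, key):
        # Newest data wins: check the memtable, then SSTables from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            for k, v in table:
                if k == key:
                    return v
        return None


node = ToyNode()
for k, v in [("b", 1), ("a", 2), ("c", 3), ("a", 4)]:
    node.write(k, v)
print(node.sstables)   # [[('a', 2), ('b', 1), ('c', 3)]]
print(node.memtable)   # {'a': 4}
print(node.read("a"))  # 4
```

The third write triggers a flush, so the newer value for "a" lives only in the memtable and shadows the older one in the SSTable.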
Write Operations
Write Operations(Contd…)
• Each write request is first appended to the commit log so that the data survives a crash.
• The data is then written to the memtable, which holds it until the memtable reaches its threshold.
• The data is flushed to an SSTable once the memtable reaches its threshold.
• The node that receives a client's request is called the coordinator; it forwards the write to the replicas responsible for that data.
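The coordinator's role can be sketched as: forward the write to every replica and report success once enough of them acknowledge, where "enough" is set by the consistency level. This is a simplified toy under stated assumptions: Replica and coordinate_write are hypothetical names, replicas are contacted one after another rather than in parallel, and the consistency level is modelled as a plain number of required acknowledgements.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """A toy replica node holding key/value data."""
    name: str
    up: bool = True
    data: dict = field(default_factory=dict)

    def apply(self, key, value):
        # A node that is down cannot acknowledge the write.
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        self.data[key] = value
        return True


def coordinate_write(replicas, key, value, required_acks):
    """Toy coordinator: send the write to every replica, succeed if enough acknowledge."""
    acks = 0
    for replica in replicas:  # a real coordinator contacts replicas in parallel
        try:
            replica.apply(key, value)
            acks += 1
        except ConnectionError:
            pass  # real Cassandra may keep a hint and replay it when the node returns
    if acks < required_acks:
        raise RuntimeError(f"{acks} acks received, {required_acks} required")
    return acks


replicas = [Replica("n1"), Replica("n2"), Replica("n3", up=False)]
quorum = len(replicas) // 2 + 1  # replication factor 3 -> QUORUM needs 2 acks
print(coordinate_write(replicas, "user:1", "adnan", quorum))  # 2
```

With one of three replicas down, a QUORUM-style write still succeeds, while requiring all three acknowledgements would fail.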
Read Operations
• Direct Request:- The coordinator node sends a full read request to one of the replicas.
• Digest Request:- The coordinator also contacts as many additional replicas as the consistency level requires. These replicas respond with a digest (a hash) of the requested data instead of the data itself. The coordinator compares the digests with the data from the direct request; if they disagree, it fetches the full data and returns the most recent version to the client.
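The digest comparison can be sketched in a few lines of Python. This is a toy, not Cassandra's actual code: each replica's answer is assumed to be a (value, write_timestamp) tuple, SHA-256 stands in for whatever hash the database uses, and the function names are hypothetical. A real coordinator would also write the newest value back to stale replicas (read repair), which is omitted here.

```python
import hashlib


def digest(answer):
    """Hash of a replica's (value, timestamp) answer; sent instead of the full data."""
    return hashlib.sha256(repr(answer).encode()).hexdigest()


def coordinate_read(answers):
    """Toy digest read over the replicas the consistency level requires.

    answers: one (value, write_timestamp) tuple per contacted replica. The first
    replica received the direct request; the others only return digests.
    """
    direct_answer = answers[0]
    expected = digest(direct_answer)
    if all(digest(answer) == expected for answer in answers[1:]):
        return direct_answer[0]  # digests match: return the directly read data
    # Digest mismatch: compare the full data and keep the most recent write.
    newest = max(answers, key=lambda answer: answer[1])
    return newest[0]


print(coordinate_read([("Karachi", 10), ("Karachi", 10)]))  # Karachi
print(coordinate_read([("Karachi", 10), ("Lahore", 20)]))   # Lahore
```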
Replication Strategies
• Simple Strategy
• Network Topology Strategy
Simple Strategy
• It is used when you have only one datacenter. It places the first replica on the node selected by the partitioner. A partitioner determines how data is distributed across the nodes in the cluster (including replicas). The remaining replicas are then placed on the next nodes clockwise around the ring.
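A minimal Python sketch of this placement rule, assuming each node owns a single token on a toy token range of 0 to 99 and that the partitioner has already turned the partition key into a token (real clusters use the Murmur3 partitioner, a much larger token range and usually many tokens per node). replicas_for_token is a hypothetical name.

```python
import bisect


def replicas_for_token(ring, token, replication_factor):
    """Toy SimpleStrategy placement on a single-datacenter token ring.

    ring: list of (token, node_name) pairs, one token per node.
    token: the token the partitioner produced for a row's partition key.
    """
    ring = sorted(ring)
    tokens = [node_token for node_token, _ in ring]
    # First replica: first node whose token is >= the row's token, wrapping past the end.
    start = bisect.bisect_left(tokens, token) % len(ring)
    # Remaining replicas: the next nodes clockwise, never more than there are nodes.
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


# Four nodes on a toy token range of 0-99.
ring = [(0, "A"), (25, "B"), (50, "C"), (75, "D")]
print(replicas_for_token(ring, 30, 2))  # ['C', 'D']
print(replicas_for_token(ring, 80, 3))  # ['A', 'B', 'C']
```

The second call shows the wrap-around: token 80 is past the last node, so placement continues from the start of the ring.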
Simple Strategy(Contd…)
Network Topology Strategy
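• Used for deployments across multiple datacenters; the number of replicas is set separately for each datacenter.
• This strategy places replicas in the same datacenter by traversing the ring clockwise until reaching the first node in another rack, so that replicas land on distinct racks where possible.
• This strategy is highly recommended for most deployments, because it makes future expansion to multiple datacenters much easier.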
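The rack-aware walk can be illustrated with a toy Python sketch, assuming one token per node and a ring given as (token, node, datacenter, rack) tuples; nts_replicas and all node, rack and datacenter names are hypothetical. For each datacenter it walks clockwise, prefers nodes on racks not yet used, and only falls back to same-rack nodes when there are fewer racks than replicas.

```python
import bisect


def nts_replicas(ring, token, replicas_per_dc):
    """Toy NetworkTopologyStrategy placement.

    ring: list of (token, node, datacenter, rack) tuples.
    replicas_per_dc: replica count per datacenter, e.g. {"dc1": 2, "dc2": 2}.
    """
    ring = sorted(ring)
    tokens = [entry[0] for entry in ring]
    start = bisect.bisect_left(tokens, token) % len(ring)
    clockwise = [ring[(start + i) % len(ring)] for i in range(len(ring))]

    placement = {}
    for dc, wanted in replicas_per_dc.items():
        chosen, used_racks, skipped = [], set(), []
        for _, node, node_dc, rack in clockwise:
            if node_dc != dc or len(chosen) == wanted:
                continue
            if rack in used_racks:
                skipped.append(node)  # same rack as an earlier replica: keep as a fallback
            else:
                chosen.append(node)
                used_racks.add(rack)
        # Not enough distinct racks: fill up with the skipped nodes in ring order.
        chosen += skipped[: wanted - len(chosen)]
        placement[dc] = chosen
    return placement


ring = [
    (0, "a1", "dc1", "r2"),
    (10, "b1", "dc2", "r1"),
    (20, "a2", "dc1", "r1"),
    (30, "b2", "dc2", "r1"),
    (40, "a3", "dc1", "r1"),
    (50, "b3", "dc2", "r2"),
]
print(nts_replicas(ring, 5, {"dc1": 2, "dc2": 3}))
# {'dc1': ['a2', 'a1'], 'dc2': ['b1', 'b3', 'b2']}
```

In dc1, a3 is skipped because it shares rack r1 with a2; in dc2 only two racks exist, so the third replica falls back to b2.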
Network Topology Strategy(Contd…)
Installation and Setup
• Use the official Docker image:
• docker pull cassandra
• Set Docker's memory limit to at least 4 GB; otherwise the container may be killed and exit with code 137 (out of memory).
Installation and Setup(Contd…)
• Check the node's status inside the running container (named cas1 here):
• docker exec -it cas1 nodetool status
CQL Shell
GUI Client
Cassandra Data Modeling
• Keyspace:- The outermost container for column families; it also defines how their data is replicated. You can think of it as a database in the RDBMS world.
• Column Family:- A container for an ordered collection of rows. Each row, in turn, is an ordered collection of columns. In CQL, column families are called tables; think of one as a table in the RDBMS world.
Cassandra Data Modeling(Contd…)
Creating a Keyspace
• Create a keyspace named CityInfo (unquoted names are case-insensitive, so it is stored as cityinfo):
• CREATE KEYSPACE CityInfo WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2};
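The replication map is the only part that changes between the two strategies. The hypothetical helper below (not part of any driver) builds the statement for either one, which shows that SimpleStrategy takes a single replication_factor while NetworkTopologyStrategy takes one replica count per datacenter. The datacenter names dc1 and dc2 are assumptions; real names must match those reported by nodetool status. The resulting string could be passed to session.execute() from the Python driver.

```python
def create_keyspace_cql(name, replication_factor=None, datacenters=None):
    """Hypothetical helper that builds a CREATE KEYSPACE statement.

    Use replication_factor for SimpleStrategy (single datacenter), or
    datacenters={"dc1": 3, ...} for NetworkTopologyStrategy.
    name must be a trusted identifier; it is inserted into the CQL unescaped.
    """
    if datacenters:
        options = {"class": "NetworkTopologyStrategy", **datacenters}
    elif replication_factor:
        options = {"class": "SimpleStrategy", "replication_factor": replication_factor}
    else:
        raise ValueError("pass replication_factor or datacenters")
    pairs = ", ".join(f"'{key}': {value!r}" for key, value in options.items())
    return f"CREATE KEYSPACE IF NOT EXISTS {name} WITH replication = {{{pairs}}};"


print(create_keyspace_cql("CityInfo", replication_factor=2))
print(create_keyspace_cql("CityInfo", datacenters={"dc1": 3, "dc2": 2}))
```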
Data Modeling Goals
• Spread data evenly across the cluster.
• Minimize the number of partitions read per query.
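The two goals can pull in opposite directions, which a small Python sketch makes visible. Under the assumption of a hypothetical users table with the sample rows below, it measures, for a candidate partition key, how uneven the partitions are and how many partitions the query "all users in Karachi" would have to touch. The function names and data are illustrative only.

```python
from collections import Counter


def partition_sizes(rows, partition_key):
    """Rows per partition if the table were partitioned by these columns."""
    return Counter(tuple(row[col] for col in partition_key) for row in rows)


def skew(rows, partition_key):
    """Largest partition divided by the average partition size (1.0 means perfectly even)."""
    sizes = partition_sizes(rows, partition_key)
    return max(sizes.values()) / (len(rows) / len(sizes))


def partitions_read(rows, partition_key, predicate):
    """How many partitions a query must touch to find every row matching predicate."""
    return len({tuple(row[col] for col in partition_key) for row in rows if predicate(row)})


# Hypothetical sample data for a users table.
users = [
    {"username": "u1", "city": "Karachi"},
    {"username": "u2", "city": "Karachi"},
    {"username": "u3", "city": "Karachi"},
    {"username": "u4", "city": "Lahore"},
]


def in_karachi(row):
    return row["city"] == "Karachi"


for key in (["city"], ["username"]):
    print(key, skew(users, key), partitions_read(users, key, in_karachi))
# ['city'] 1.5 1
# ['username'] 1.0 3
```

Partitioning by city answers the query from one partition but spreads data unevenly; partitioning by username is perfectly even but forces a three-partition read. A common way out is a separate table for each query pattern.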
Demo
Data Clustering
Cassandra and Python
• pip install cassandra-driver
Reading Data
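from cassandra.cluster import Cluster

if __name__ == "__main__":
    # Contact point of a local node on the default CQL port
    # (assumes the Docker container publishes port 9042 to the host)
    cluster = Cluster(['127.0.0.1'], port=9042)
    try:
        # Connecting with a keyspace name makes it the default, so no separate USE is needed.
        # Unquoted keyspace names are case-insensitive: CityInfo is stored as cityinfo.
        session = cluster.connect('cityinfo', wait_for_all_pools=True)
        rows = session.execute('SELECT * FROM users')
        for row in rows:
            print(row.age, row.name, row.username)
    finally:
        # Close connections and background threads cleanly
        cluster.shutdown()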
The End
