Cassandra
Data Modeling
Presented By: Charmy Garg
Software Consultant
Knoldus Inc.
Our Agenda
01 Keys in Cassandra
02 Basic Goals
03 Model your own Queries
04 Applying Rules: Examples
05 Glance at Use cases
What is Apache Cassandra?
Cassandra vs Relational

Cassandra Data Model      Relational Data Model
Keyspace                  Database
Column family             Table
Partition key             Primary key
Column name/key           Column name
Column value              Column value
“Keys to Recall for Cassandra Data Modeling”
1. Primary Key: equivalent to the partition key in a table with a single-field key (i.e. a simple key).
2. Composite Key: any key made up of multiple columns.
3. Partition Key: responsible for data distribution across your nodes.
4. Clustering Key: responsible for data sorting within the partition.
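To make the four terms concrete, here is a minimal Python sketch. The `PrimaryKey` class is a hypothetical helper written for this illustration, not part of any Cassandra driver; it only shows how a primary key splits into a partition part and a clustering part, and how that split is written in a CQL `PRIMARY KEY` clause. The column names `user_id` and `post_id` are borrowed from Use Case 1.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PrimaryKey:
    """Hypothetical helper: a primary key split into its two roles."""
    partition_key: Tuple[str, ...]        # decides which node stores the row
    clustering_key: Tuple[str, ...] = ()  # decides row order inside the partition

    @property
    def is_composite(self) -> bool:
        # A composite key is any key built from more than one column.
        return len(self.partition_key) + len(self.clustering_key) > 1

    def to_cql(self) -> str:
        # Render the PRIMARY KEY clause: the partition key goes in its own
        # parentheses, followed by the clustering columns.
        partition = "(" + ", ".join(self.partition_key) + ")"
        columns = [partition, *self.clustering_key]
        return "PRIMARY KEY (" + ", ".join(columns) + ")"


# Simple key: the primary key is just the partition key.
simple = PrimaryKey(partition_key=("user_id",))
# Composite key: one partition column plus one clustering column.
compound = PrimaryKey(partition_key=("user_id",), clustering_key=("post_id",))

print(simple.to_cql())
print(compound.to_cql())
```

With a simple key the whole primary key is the partition key. As soon as a clustering column is added the key becomes composite, but the partition part alone still decides where the row is stored.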
Primary Key
Composite Key
Clustering Key & Partition Key
How Cassandra organizes data
Partitioning and Hashing
Non-Goals
Minimize Data Duplication
Cassandra is a distributed database, so data duplication provides instant data availability and no single point of failure.
Minimize the Number of Writes
Cassandra is optimized for high write throughput, and almost all writes are equally efficient.
Basic Goals
1. Spread data evenly around the cluster
Rows are spread around the cluster based on a hash of the partition key, which is the first element of the PRIMARY KEY. So, the key to spreading data evenly is this: pick a good primary key, and in particular a good partition key.
2. Minimize the number of partitions read
Partitions are groups of rows that share the same partition key. When you issue a read query, you want to read rows from as few partitions as possible.
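Both goals can be illustrated with a small Python simulation. This is a rough sketch under stated assumptions: the three node names are made up, MD5 stands in for Cassandra's default Murmur3 partitioner, and a plain modulo replaces the real token ring. It is meant only to show why a high-cardinality partition key spreads rows evenly and why a read is cheapest when it touches a single partition.

```python
import hashlib
from collections import Counter

NODES = ["node-a", "node-b", "node-c"]  # hypothetical three-node cluster


def node_for(partition_key: str) -> str:
    # Assumption: MD5 and modulo are stand-ins for Murmur3 and the token ring.
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    token = int.from_bytes(digest[:8], "big")
    return NODES[token % len(NODES)]


def partitions_read(partition_keys) -> int:
    # Number of distinct partitions a read has to touch.
    return len(set(partition_keys))


# Goal 1: many distinct partition keys spread rows across all nodes ...
spread = Counter(node_for(f"user-{i}") for i in range(3000))
# ... while a single repeated key value puts every row on one node.
skewed = Counter(node_for("IN") for _ in range(3000))

# Goal 2: five rows from one user's partition cost one partition read;
# one row from each of three users costs three.
one_partition = partitions_read(["user-1"] * 5)
three_partitions = partitions_read(["user-1", "user-2", "user-3"])

print(spread, skewed, one_partition, three_partitions)
```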
Model Your Data
The way to minimize partition reads is to model your data to fit your queries. Don't model around relations. Don't model around objects. Model around your queries. Here's how you do that:
Step 1: Determine what queries you want to support
Step 2: Create tables according to your queries
Step 1: Determine What Queries to Support
Try to determine exactly what queries you need to support. This can include a lot of considerations that you may not think of at first. For example, you may need to think about:
● Grouping by an attribute
● Ordering by an attribute
● Filtering based on some set of conditions
● Enforcing uniqueness in the result set
Changes to just one of these query requirements will frequently warrant a data model change for maximum efficiency.
Step 2: Create Tables for Queries
Use one table per query pattern. If you need to support multiple query patterns, you usually need more than one table.
If you need different types of answers, you usually need different tables. This is how you optimize for reads.
Remember, in Cassandra data duplication is okay. Many of your tables may repeat the same data.
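The "one table per query pattern" rule can be sketched in plain Python. The two dictionaries below are hypothetical stand-ins for two Cassandra tables, and the user fields (`name`, `email`, `city`) and sample values are invented for illustration. Every write goes to both structures, so each query is answered from a single lookup at the cost of duplicated data.

```python
from collections import defaultdict

# Two hypothetical "tables", each keyed for exactly one query pattern.
users_by_email = {}                # query: find a user by email
users_by_city = defaultdict(list)  # query: list the users in a city


def add_user(name: str, email: str, city: str) -> None:
    # The same row is written twice: duplication is accepted so that
    # each read needs only one partition.
    row = {"name": name, "email": email, "city": city}
    users_by_email[email] = row
    users_by_city[city].append(row)


add_user("Asha", "asha@example.com", "Delhi")
add_user("Ravi", "ravi@example.com", "Delhi")
add_user("Mina", "mina@example.com", "Pune")

print(users_by_email["ravi@example.com"]["name"])
print([user["name"] for user in users_by_city["Delhi"]])
```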
Applying the Rules: Examples
Example 1: Table Music Playlist
In the example table Music Playlist,
● SongId is the partition key, and
● SongName is the clustering column.
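The table definition itself did not survive in the slide text, so the following is a sketch of what it might look like. The CQL in `CREATE_MUSIC_PLAYLIST` is an assumption: the table name `MusicPlaylist`, the column types, and the `Singer` column are illustrative, and only the key structure (SongId as partition key, SongName as clustering column) comes from the slide. The Python below it models how rows with the same SongId share a partition and come back sorted by SongName.

```python
from collections import defaultdict

# Assumed DDL: types and the Singer column are illustrative only.
CREATE_MUSIC_PLAYLIST = """
CREATE TABLE MusicPlaylist (
    SongId int,
    SongName text,
    Singer text,
    PRIMARY KEY (SongId, SongName)
);
"""

# In-memory stand-in: partition key -> {clustering value -> other columns}.
partitions = defaultdict(dict)


def insert(song_id: int, song_name: str, singer: str) -> None:
    # Writing the same (SongId, SongName) again overwrites the row,
    # which mirrors Cassandra's upsert behaviour.
    partitions[song_id][song_name] = singer


def read_partition(song_id: int):
    # Rows of one partition, ordered by the clustering column SongName.
    rows = partitions.get(song_id, {})
    return [(name, rows[name]) for name in sorted(rows)]


insert(1, "Zebra Song", "Singer B")
insert(1, "Apple Song", "Singer A")
insert(2, "Middle Song", "Singer C")

print(read_partition(1))
```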
Example 2: Table Music Playlist
In this version of the Music Playlist table,
● SongId and Year together form the partition key, and
● SongName is the clustering column.
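A composite partition key changes which rows live together. The sketch below uses an assumed table definition: the name `MusicPlaylistByYear`, the column types, and the `Singer` column are illustrative, and only the key structure is taken from the slide. It shows that rows with the same SongId but a different Year land in different partitions, so a read has to supply both values.

```python
from collections import defaultdict

# Assumed DDL: note the inner parentheses around the two partition columns.
CREATE_MUSIC_PLAYLIST_BY_YEAR = """
CREATE TABLE MusicPlaylistByYear (
    SongId int,
    Year int,
    SongName text,
    Singer text,
    PRIMARY KEY ((SongId, Year), SongName)
);
"""

# In-memory stand-in: (SongId, Year) -> {SongName -> Singer}.
partitions = defaultdict(dict)


def insert(song_id: int, year: int, song_name: str, singer: str) -> None:
    partitions[(song_id, year)][song_name] = singer


def read_partition(song_id: int, year: int):
    # Both partition columns are needed to locate the partition.
    rows = partitions.get((song_id, year), {})
    return [(name, rows[name]) for name in sorted(rows)]


insert(1, 2020, "B Song", "Singer B")
insert(1, 2020, "A Song", "Singer A")
insert(1, 2021, "C Song", "Singer C")

print(read_partition(1, 2020))
print(read_partition(1, 2021))
```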
Glance at Use Cases
Use Case 1
Suppose that we are storing Facebook posts of different users in Cassandra.
Query: Fetch the top ‘N’ posts made by a given user.
We require user_id, post_id and content as fields. The Cassandra table schema for this use case (shown as an image on the original slide):
● Stores all data for a particular user on a single partition, as per the above guidelines.
● Uses the post timestamp as the clustering key, which helps retrieve the top ‘N’ posts more efficiently.
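One plausible shape for this schema is sketched below. The table name `posts_by_user`, the column types, and the use of a `timeuuid` post_id to carry the post timestamp are assumptions, since the slide's schema image is not available. The Python model mirrors the idea: one partition per user, newest posts first, so the top N is just the first N rows.

```python
from collections import defaultdict

# Assumed DDL and query; names and types are illustrative.
CREATE_POSTS_BY_USER = """
CREATE TABLE posts_by_user (
    user_id uuid,
    post_id timeuuid,
    content text,
    PRIMARY KEY (user_id, post_id)
) WITH CLUSTERING ORDER BY (post_id DESC);
"""
SELECT_TOP_N = "SELECT post_id, content FROM posts_by_user WHERE user_id = ? LIMIT ?;"

# In-memory stand-in: user_id -> list of (posted_at, content).
posts = defaultdict(list)


def add_post(user_id: str, posted_at: int, content: str) -> None:
    posts[user_id].append((posted_at, content))


def top_n_posts(user_id: str, n: int):
    # Cassandra keeps rows in clustering order on disk; this sketch
    # sorts on read for brevity and returns the newest n posts.
    newest_first = sorted(posts.get(user_id, []), reverse=True)
    return [content for _, content in newest_first[:n]]


add_post("alice", 100, "first post")
add_post("alice", 300, "third post")
add_post("alice", 200, "second post")
add_post("bob", 250, "bob's post")

print(top_n_posts("alice", 2))
```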
Use Case 2
Suppose that we are storing the details of different partner gyms across the different cities and states of many countries.
Query: Fetch the gyms for a given city, sorted by their opening date.
We require country_code, state, city, gym_name and opening_date as fields.
The Cassandra table schema for this use case (shown as an image on the original slide) stores the gyms located in a given city of a specific state and country on a single partition and uses the opening date and gym name as the clustering key.
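A sketch of how this could look follows. The table name `gyms_by_city`, the column types, and the sample gyms are assumptions. The key structure follows the text: country, state, and city form the partition key, while opening date and gym name form the clustering key.

```python
from collections import defaultdict
from datetime import date

# Assumed DDL; names and types are illustrative.
CREATE_GYMS_BY_CITY = """
CREATE TABLE gyms_by_city (
    country_code text,
    state text,
    city text,
    gym_name text,
    opening_date timestamp,
    PRIMARY KEY ((country_code, state, city), opening_date, gym_name)
) WITH CLUSTERING ORDER BY (opening_date ASC, gym_name ASC);
"""

# In-memory stand-in: (country_code, state, city) -> list of (opening_date, gym_name).
gyms = defaultdict(list)


def add_gym(country_code, state, city, gym_name, opening_date) -> None:
    gyms[(country_code, state, city)].append((opening_date, gym_name))


def gyms_in_city(country_code, state, city):
    # One partition read; rows ordered by opening date, then gym name.
    rows = sorted(gyms.get((country_code, state, city), []))
    return [name for _, name in rows]


add_gym("IN", "Delhi", "New Delhi", "Iron House", date(2019, 5, 1))
add_gym("IN", "Delhi", "New Delhi", "Alpha Fit", date(2019, 5, 1))
add_gym("IN", "Delhi", "New Delhi", "Core Club", date(2017, 1, 15))
add_gym("IN", "Maharashtra", "Pune", "Peak Gym", date(2018, 3, 9))

print(gyms_in_city("IN", "Delhi", "New Delhi"))
```

Including gym_name in the clustering key keeps rows unique when two gyms in the same city open on the same day.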
Q&A
Please email your queries at
charmy.garg@knoldus.in
Thank You!
@charmygarg
@charmygarg
/facebook.com/charmiigarg

