Cassandra Data Modelling

Presented By: Charmy Garg
Software Consultant
Knoldus Inc.

Our Agenda
01 Keys in Cassandra
02 Basic Goals
03 Model Your Own Queries
04 Applying Rules: Examples
05 Glance at Use Cases

Cassandra vs Relational

Cassandra Data Model | Relational Data Model
Keyspace             | Database
Column family        | Table
Partition key        | Primary key
Column name/key      | Column name
Column value         | Column value

Keys to Recall for Cassandra Data Modeling
1. Primary Key: equivalent to the partition key in a table with a single-field key (i.e. a simple key).
2. Composite Key: any key made up of multiple columns.
3. Partition Key: responsible for distributing data across your nodes.
4. Clustering Key: responsible for sorting data within a partition.

Clustering Key &
Partition Key

Non-Goals
1. Minimize Data Duplication
Cassandra is a distributed database, so data duplication provides instant data availability and avoids a single point of failure.
2. Minimize the Number of Writes
Cassandra is optimized for high write throughput, and almost all writes are equally efficient.

Basic Goals
1. Spread data evenly around the cluster
Rows are spread around the cluster based on a hash of the partition key, which is the first element of the PRIMARY KEY. So the key to spreading data evenly is this: pick a good primary key, and in particular a good partition key.
2. Minimize the number of partitions read
Partitions are groups of rows that share the same partition key. When you issue a read query, you want to read rows from as few partitions as possible.
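To make the first goal concrete, here is a minimal Python sketch of hash-based placement. It is a simplification under stated assumptions: the node names are hypothetical, MD5 stands in for Cassandra's default Murmur3 partitioner, and a plain modulo replaces the token ranges a real cluster uses.

```python
import hashlib
from collections import Counter

NODES = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical 4-node cluster

def node_for(partition_key: str) -> str:
    """Pick a node from a hash of the partition key (MD5 stand-in, not Murmur3)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    token = int.from_bytes(digest[:8], "big")
    return NODES[token % len(NODES)]

# High-cardinality partition key: 1000 distinct user ids.
spread = Counter(node_for(f"user-{i}") for i in range(1000))

# Low-cardinality partition key: every row shares the same value.
skewed = Counter(node_for("IN") for _ in range(1000))

print(dict(spread))  # rows land on all four nodes
print(dict(skewed))  # all rows land on a single node
```

A key with many distinct values spreads rows across the nodes, while a key with one value sends every row to the same node. That is the hot-spot risk behind the advice to choose the partition key carefully.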

Model Your Data
The way to minimize partition reads is to model your data to fit your queries. Don't model around
relations. Don't model around objects. Model around your queries. Here's how you do that:
Step 1: Determine what queries you want to support
Step 2: Create tables according to your queries

Step 1: Determine What Queries to Support
Try to determine exactly what queries you need to support. This can include a lot of considerations that you may not think of at first. For example, you may need to think about:
● Grouping by an attribute
● Ordering by an attribute
● Filtering based on some set of conditions
● Enforcing uniqueness in the result set
Changes to just one of these query requirements will frequently warrant a data model change for maximum efficiency.

Step 2: Create Tables for Queries
Use one table per query pattern. If you need to support multiple query patterns, you usually need more than one table.
If you need different types of answers, you usually need different tables. This is how you optimize for reads.
Remember, in Cassandra data duplication is okay. Many of your tables may repeat the same data.
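The "one table per query pattern" idea can be sketched in plain Python, with dictionaries standing in for tables. The user rows and the two lookups (by username and by email) are hypothetical and are not taken from the slides.

```python
# Hypothetical rows; the fields are illustrative only.
users = [
    {"username": "alice", "email": "alice@example.com", "age": 31},
    {"username": "bob", "email": "bob@example.com", "age": 27},
]

# One "table" per query pattern, each keyed by the column that query filters on.
users_by_username = {}
users_by_email = {}

def insert_user(user: dict) -> None:
    """Write the same row to both tables; the duplication is intentional."""
    users_by_username[user["username"]] = user
    users_by_email[user["email"]] = user

for u in users:
    insert_user(u)

print(users_by_username["bob"]["age"])                  # 27
print(users_by_email["alice@example.com"]["username"])  # alice
```

Each read touches exactly one key in one table. The cost is an extra write per row, which fits the earlier point that minimizing writes is a non-goal.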

Example 1: Table Music Playlist
In this example, for the table Music Playlist:
● SongId is the partition key, and
● SongName is the clustering column.

Example 2: Table Music Playlist
In this example, for the table Music Playlist:
● SongId and Year together form the partition key, and
● SongName is the clustering column.
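The difference between the two Music Playlist key layouts can be sketched in Python. The rows below are made up for illustration, and the grouping function only models the effect of the partition key and clustering column. It is not how Cassandra stores data internally.

```python
from collections import defaultdict

# Hypothetical rows: (SongId, Year, SongName). Values are made up.
rows = [
    (1, 2019, "Blue Sky"),
    (1, 2020, "Amber"),
    (2, 2019, "Coral"),
    (1, 2019, "Atlas"),
]

def build_table(rows, partition_key):
    """Group rows by partition key and keep each partition sorted by SongName."""
    partitions = defaultdict(list)
    for song_id, year, song_name in rows:
        row = {"SongId": song_id, "Year": year, "SongName": song_name}
        partitions[partition_key(row)].append(row)
    for part in partitions.values():
        part.sort(key=lambda r: r["SongName"])  # clustering column order
    return dict(partitions)

# First layout: SongId alone is the partition key.
by_song = build_table(rows, lambda r: r["SongId"])
# Second layout: (SongId, Year) together are the partition key.
by_song_year = build_table(rows, lambda r: (r["SongId"], r["Year"]))

print(len(by_song), len(by_song_year))                    # 2 3
print([r["SongName"] for r in by_song[1]])                # ['Amber', 'Atlas', 'Blue Sky']
print([r["SongName"] for r in by_song_year[(1, 2019)]])   # ['Atlas', 'Blue Sky']
```

Adding Year to the partition key produces more, smaller partitions. A query then has to supply both SongId and Year to land on a single partition.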

Use Case 1
Suppose that we are storing the Facebook posts of different users in Cassandra.
Query: Fetch the top 'N' posts made by a given user.
We require user_id, post_id and content as fields. The table schema for this use case should:
● Store all data for a particular user on a single partition, as per the above guidelines.
● Use the post timestamp as the clustering key, so that the top 'N' posts can be retrieved efficiently.
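The following Python sketch models this design under stated assumptions: user_id is the partition key, and a hypothetical integer post_time column is the clustering key in descending order. The CQL shape in the comment is an assumed reading of the use case, not the schema from the original slide.

```python
import bisect
from collections import defaultdict

# Assumed CQL shape (not shown in the slides):
#   PRIMARY KEY ((user_id), post_time) WITH CLUSTERING ORDER BY (post_time DESC)
posts_by_user = defaultdict(list)  # user_id -> rows, newest first

def add_post(user_id: str, post_time: int, post_id: str, content: str) -> None:
    """Insert into the user's partition, keeping rows sorted newest first."""
    # Negating the timestamp makes ascending tuple order equal newest-first order.
    bisect.insort(posts_by_user[user_id], (-post_time, post_id, content))

def top_posts(user_id: str, n: int) -> list:
    """Read the first n rows of a single partition."""
    return [(pid, content) for _, pid, content in posts_by_user[user_id][:n]]

add_post("u1", 100, "p1", "first")
add_post("u1", 300, "p3", "third")
add_post("u2", 250, "p9", "other user")
add_post("u1", 200, "p2", "second")

print(top_posts("u1", 2))  # [('p3', 'third'), ('p2', 'second')]
```

Because the rows are already ordered at write time, reading the top 'N' posts is a prefix read of one partition rather than a sort at query time.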

Use Case 2
Suppose that we are storing the details of partner gyms across the cities and states of many countries.
Query: Fetch the gyms for a given city, sorted by their opening date.
We require country_code, state, city, gym_name and opening_date as fields. The table schema for this use case should:
● Store the gyms located in a given city of a specific state and country on a single partition.
● Use the opening date and gym name as the clustering key.
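A small Python sketch of this layout, assuming a compound partition key of (country_code, state, city) and a compound clustering key of (opening_date, gym_name). The gym names, dates and the CQL shape in the comment are hypothetical.

```python
from datetime import date

# Assumed CQL shape (not shown in the slides):
#   PRIMARY KEY ((country_code, state, city), opening_date, gym_name)
gyms_by_city = {}  # (country_code, state, city) -> sorted [(opening_date, gym_name)]

def add_gym(country_code, state, city, gym_name, opening_date):
    """Place the gym in its city partition, ordered by (opening_date, gym_name)."""
    partition = gyms_by_city.setdefault((country_code, state, city), [])
    partition.append((opening_date, gym_name))
    partition.sort()

def gyms_in_city(country_code, state, city):
    """Single-partition read that returns gym names in clustering order."""
    return [name for _, name in gyms_by_city.get((country_code, state, city), [])]

add_gym("US", "NY", "Buffalo", "Steel House", date(2019, 5, 1))
add_gym("US", "NY", "Buffalo", "Core Gym", date(2017, 3, 9))
add_gym("US", "NY", "Buffalo", "Apex Fit", date(2019, 5, 1))
add_gym("US", "NY", "Albany", "Iron Works", date(2015, 1, 1))

print(gyms_in_city("US", "NY", "Buffalo"))  # ['Core Gym', 'Apex Fit', 'Steel House']
```

Including gym_name after opening_date in the clustering key keeps two gyms that opened on the same day as distinct rows, ordered by name.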


Q&A
Please email your queries at
charmy.garg@knoldus.in

Thank You!
@charmygarg
/facebook.com/charmiigarg
