©2013 DataStax Confidential. Do not distribute without consent.
@PatrickMcFadin
Patrick McFadin

Chief Evangelist for Apache Cassandra
Introduction to Data Modeling
with Apache Cassandra
My Background
…ran into this problem
Gave it my best shot
[Diagram: client, router, shard 1, shard 2, shard 3, shard 4]
Patrick, All your wildest dreams will come true.
Just add complexity!
A new plan
ACID vs CAP
ACID
Atomicity - All or none
Consistency - Only valid data is written
Isolation - One operation at a time
Durability - Once committed, it stays that way
CAP - Pick two
Consistency - Every node sees the same data
Availability - Cluster always accepts writes
Partition tolerance - Cluster keeps working when nodes can't talk to each other
Cassandra lets you tune this
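One common way to reason about "tune this" is replica overlap: with N replicas, a write acknowledged by W of them and a read answered by R of them must share at least one replica when R + W > N. The Python sketch below only models that arithmetic. The function name is hypothetical and it is not a driver API.

```python
def read_sees_latest_write(replicas: int, write_acks: int, read_acks: int) -> bool:
    """Return True when every read set must overlap every write set.

    Illustrative arithmetic only: with `replicas` copies of the data, a write
    acknowledged by `write_acks` nodes and a read answered by `read_acks`
    nodes share at least one node whenever read_acks + write_acks > replicas.
    """
    if not (1 <= write_acks <= replicas and 1 <= read_acks <= replicas):
        raise ValueError("acks must be between 1 and the replica count")
    return read_acks + write_acks > replicas


# Three replicas, two acks on each side: the read and write sets must overlap.
quorum_both = read_sees_latest_write(3, 2, 2)
# Three replicas, one ack on each side: faster, but a read may miss a write.
one_both = read_sees_latest_write(3, 1, 1)
```

Lower ack counts favor availability and latency, higher ones favor consistency.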
Relational Data Models
• 5 normal forms
• Foreign Keys
• Joins
Employees
deptId First Last
1 Edgar Codd
2 Raymond Boyce
Department
id Dept
1 Engineering
2 Math
Relational Modeling
Data
Models
Application
Cassandra Modeling
Data
Models
Application
CQL vs SQL
• No joins
• No aggregations
Employees
deptId First Last
1 Edgar Codd
2 Raymond Boyce
Department
id Dept
1 Engineering
2 Math
SELECT e.First, e.Last, d.Dept
FROM Department d, Employees e
WHERE 'Codd' = e.Last
AND e.deptId = d.id;
Denormalization
• Combine table columns into a single view
• No joins
SELECT First, Last, Dept
FROM employees
WHERE id = '1';
Employees
id First Last Dept
1 Edgar Codd Engineering
2 Raymond Boyce Math
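The denormalized Employees table can be pictured as the join done once, at write time, instead of on every read. A minimal Python sketch follows, assuming plain dictionaries stand in for tables. The employee ids 1 and 2 and the function name are illustrative assumptions, not part of the slides' schema.

```python
# Normalized inputs, as on the relational slide.
departments = {1: "Engineering", 2: "Math"}
employees = {
    1: {"deptId": 1, "First": "Edgar", "Last": "Codd"},     # hypothetical employee id 1
    2: {"deptId": 2, "First": "Raymond", "Last": "Boyce"},  # hypothetical employee id 2
}


def denormalize(employees, departments):
    """Copy the department name onto every employee row (join at write time)."""
    table = {}
    for emp_id, emp in employees.items():
        table[emp_id] = {
            "First": emp["First"],
            "Last": emp["Last"],
            "Dept": departments[emp["deptId"]],
        }
    return table


employees_denormalized = denormalize(employees, departments)
# A read is now a single key lookup with no join.
row = employees_denormalized[1]
```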
No more sequences
• Great for auto-creation of Ids
• Guaranteed unique
• Needs ACID to work. (Sorry. No sharding)
INSERT INTO user (id, firstName, LastName)
VALUES (seq.nextVal(), 'Ted', 'Codd');
No sequences???
• Almost impossible in a distributed system
• Couple of great choices
• Natural Key - Unique values like email
• Surrogate Key - UUID
• Universally Unique Identifier
• 128-bit number represented in character form
• Easily generated on the client
• Same as GUID for the MS folks
99051fe9-6a9c-46c2-b949-38ef78858dd0
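"Easily generated on the client" can be shown with Python's standard `uuid` module. This is a small sketch, and the helper name `new_user_id` is an assumption made for illustration.

```python
import uuid


def new_user_id() -> uuid.UUID:
    """Generate a random (version 4) UUID locally; no database round trip is needed."""
    return uuid.uuid4()


user_id = new_user_id()
# Character form: 32 hex digits plus 4 hyphens, like the sample value above.
text_form = str(user_id)
```

Because the id is created before the write is sent, the application needs no sequence shared across nodes.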
KillrVideo.com
• Hosted on Azure
• Code on GitHub
• Also on your USB
• Data Model for examples
Entity Table
• Simple view of a single user
• UUID used for ID
• Simple primary key
// Users keyed by id
CREATE TABLE users (
userid uuid,
firstname text,
lastname text,
email text,
created_date timestamp,
PRIMARY KEY (userid)
);
SELECT firstname, lastname
FROM users
WHERE userid = 99051fe9-6a9c-46c2-b949-38ef78858dd0;
CQL Collections
• Meant to be a dynamic part of a table
• Update syntax is very different from insert
• Reading a collection returns the whole collection
CQL Set
• Set is sorted by CQL type comparator
INSERT INTO collections_example (id, set_example)
VALUES(1, {'1-one', '2-two'});
set_example set<text> (collection name, collection type, CQL type of the elements)
CQL Set Operations
• Adding an element to the set
• After adding this element, it will sort to the beginning.
• Removing an element from the set
UPDATE collections_example
SET set_example = set_example + {'3-three'} WHERE id = 1;
UPDATE collections_example
SET set_example = set_example + {'0-zero'} WHERE id = 1;
UPDATE collections_example
SET set_example = set_example - {'3-three'} WHERE id = 1;
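The three updates above can be traced with a small Python model of a sorted set. It is only a sketch of the behavior the slide describes (unique elements, returned in sorted order). The helper names are hypothetical, and Python's string ordering is assumed to match the text comparator for these ASCII values.

```python
def set_add(current, new_elements):
    """Union, then return elements in sorted order, as a CQL set is read back."""
    return sorted(set(current) | set(new_elements))


def set_remove(current, elements):
    """Difference, then return the remaining elements in sorted order."""
    return sorted(set(current) - set(elements))


set_example = ["1-one", "2-two"]                   # state after the INSERT
set_example = set_add(set_example, {"3-three"})    # sorts to the end
set_example = set_add(set_example, {"0-zero"})     # sorts to the beginning
after_adds = list(set_example)
set_example = set_remove(set_example, {"3-three"})
```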
CQL List
• Ordered by insertion
• Use with caution
INSERT INTO collections_example (id, list_example)
VALUES(1, ['1-one', '2-two']);
list_example list<text> (collection name, collection type, CQL type of the elements)
CQL List Operations
• Adding an element to the end of a list
• Adding an element to the beginning of a list
• Deleting an element from a list
UPDATE collections_example
SET list_example = list_example + ['3-three']
WHERE id = 1;
UPDATE collections_example
SET list_example = ['0-zero'] + list_example
WHERE id = 1;
UPDATE collections_example
SET list_example = list_example - ['3-three'] WHERE id = 1;
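A Python sketch of the same three list updates follows, with hypothetical helper names. One detail worth modeling is that subtracting a value from a CQL list removes every occurrence of it, which is part of why lists deserve caution.

```python
def list_append(current, items):
    """list_example + [...]: new items go to the end."""
    return current + items


def list_prepend(current, items):
    """[...] + list_example: new items go to the beginning."""
    return items + current


def list_remove(current, items):
    """list_example - [...]: drop every occurrence of each listed value."""
    return [value for value in current if value not in items]


list_example = ["1-one", "2-two"]                      # state after the INSERT
list_example = list_append(list_example, ["3-three"])
list_example = list_prepend(list_example, ["0-zero"])
after_adds = list(list_example)
list_example = list_remove(list_example, ["3-three"])
```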
CQL Map
• Key and value
• Key is sorted by CQL type comparator
INSERT INTO collections_example (id, map_example)
VALUES(1, { 1 : 'one', 2 : 'two' });
map_example map<int,text> (collection name, collection type, key CQL type, value CQL type)
CQL Map Operations
• Add an element to the map
• Update an existing element in the map
• Delete an element in the map
UPDATE collections_example
SET map_example[3] = 'three'
WHERE id = 1;
UPDATE collections_example
SET map_example[3] = 'tres'
WHERE id = 1;
DELETE map_example[3]
FROM collections_example
WHERE id = 1;
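The map operations above behave like keyed upserts: adding and updating use the same statement, and only the key decides which one happens. A minimal Python sketch with hypothetical helper names:

```python
def map_put(current, key, value):
    """Add or overwrite one entry, returning the map ordered by key."""
    updated = dict(current)
    updated[key] = value
    return dict(sorted(updated.items()))


def map_delete(current, key):
    """Remove one entry by key; a missing key is simply ignored."""
    return {k: v for k, v in current.items() if k != key}


map_example = {1: "one", 2: "two"}            # state after the INSERT
added = map_put(map_example, 3, "three")      # key 3 is new: add
updated = map_put(added, 3, "tres")           # key 3 exists: overwrite
deleted = map_delete(updated, 3)
```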
Entity with collections
• Same type of entity
• SET type for dynamic data
• tags for each video
// Videos by id
CREATE TABLE videos (
videoid uuid,
userid uuid,
name text,
description text,
location text,
location_type int,
preview_image_location text,
tags set<text>,
added_date timestamp,
PRIMARY KEY (videoid)
);
Index (or lookup) tables
• Table arranged to find data
• Denormalized for speed
• Find videos for a user
// One-to-many from user point of view (lookup table)
CREATE TABLE user_videos (
userid uuid,
added_date timestamp,
videoid uuid,
name text,
preview_image_location text,
PRIMARY KEY (userid, added_date, videoid)
) WITH CLUSTERING ORDER BY (added_date DESC, videoid ASC);
Primary Key
• First column name is the Partition Key
• Subsequent are the Clustering Columns
• Videos will be ordered by added_date and videoId per user
// One-to-many from user point of view (lookup table)
CREATE TABLE user_videos (
userid uuid,
added_date timestamp,
videoid uuid,
name text,
preview_image_location text,
PRIMARY KEY (userid, added_date, videoid)
) WITH CLUSTERING ORDER BY (added_date DESC, videoid ASC);
Primary key relationship
PRIMARY KEY (userId,added_date,videoId)
Partition Key: userId
Clustering Columns: added_date, videoId
[Figure: partition A12378E55F5A32 holding clustering values 2005:12:1:10, 2005:12:1:9, 2005:12:1:8, 2005:12:1:7 and video ids 5F22A0BC, F2B3652C, FFB3652D, 7AB3652C]
SELECT videoId FROM user_videos
WHERE userId = A12378E55F5A32
AND added_date = '2005-12-01'
AND videoId = 5F22A0BC;
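The relationship between partition key and clustering columns can be sketched as a dictionary of sorted lists: the partition key picks the list, and the clustering columns fix each row's position inside it. The Python below is a toy in-memory model only. The class name, the integer timestamps, and the sample ids are assumptions for illustration.

```python
import bisect
from collections import defaultdict


class UserVideos:
    """Toy model of PRIMARY KEY (userid, added_date, videoid)."""

    def __init__(self):
        # userid (partition key) -> list of (sort_key, row), kept sorted
        self._partitions = defaultdict(list)

    def insert(self, userid, added_date, videoid, name):
        # Clustering order is added_date DESC, videoid ASC, so negate the date.
        sort_key = (-added_date, videoid)
        rows = self._partitions[userid]
        keys = [key for key, _ in rows]
        pos = bisect.bisect_left(keys, sort_key)
        row = {"added_date": added_date, "videoid": videoid, "name": name}
        if pos < len(rows) and rows[pos][0] == sort_key:
            rows[pos] = (sort_key, row)      # same primary key: overwrite
        else:
            rows.insert(pos, (sort_key, row))

    def videos_for(self, userid):
        """Read one partition; rows come back already in clustering order."""
        return [row for _, row in self._partitions.get(userid, [])]


uv = UserVideos()
uv.insert("user-a", 1000, "vid-1", "Oldest")
uv.insert("user-a", 3000, "vid-3", "Newest")
uv.insert("user-a", 2000, "vid-2", "Middle")
uv.insert("user-b", 1500, "vid-9", "Other user")
names = [row["name"] for row in uv.videos_for("user-a")]
```

Rows are placed in order at write time, so a read of one user's videos needs no sorting step.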
Clustering Order
• Clustering Columns have default order
• Use to specify order
• Bonus: Sorts on disk for speed
// One-to-many from user point of view (lookup table)
CREATE TABLE user_videos (
userid uuid,
added_date timestamp,
videoid uuid,
name text,
preview_image_location text,
PRIMARY KEY (userid, added_date, videoid)
) WITH CLUSTERING ORDER BY (added_date DESC, videoid ASC);
Multiple Lookups
• Same data
• Different lookup pattern
// Index for tag keywords
CREATE TABLE videos_by_tag (
tag text,
videoid uuid,
added_date timestamp,
name text,
preview_image_location text,
tagged_date timestamp,
PRIMARY KEY (tag, videoid)
);
// Index for tags by first letter in the tag
CREATE TABLE tags_by_letter (
first_letter text,
tag text,
PRIMARY KEY (first_letter, tag)
);
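"Same data, different lookup pattern" means one application write fans out into a row per lookup table. The Python sketch below derives the rows for both tables from a single video. The function name, the sample video, and the reduced column set are assumptions for illustration.

```python
def lookup_rows(video):
    """Build the rows one video contributes to each lookup table."""
    tags = sorted(video["tags"])
    videos_by_tag = [
        {"tag": tag, "videoid": video["videoid"], "name": video["name"]}
        for tag in tags
    ]
    tags_by_letter = [{"first_letter": tag[0], "tag": tag} for tag in tags]
    return videos_by_tag, tags_by_letter


# Hypothetical sample video; a real videoid would be a UUID.
video = {"videoid": "video-1", "name": "Intro to data modeling",
         "tags": {"cassandra", "cql"}}
by_tag, by_letter = lookup_rows(video)
```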
Many to Many Relationships
• Two views
• Different directions
• Insert data in a batch
// Comments for a given video
CREATE TABLE comments_by_video (
videoid uuid,
commentid timeuuid,
userid uuid,
comment text,
PRIMARY KEY (videoid, commentid)
) WITH CLUSTERING ORDER BY (commentid DESC);
// Comments for a given user
CREATE TABLE comments_by_user (
userid uuid,
commentid timeuuid,
videoid uuid,
comment text,
PRIMARY KEY (userid, commentid)
) WITH CLUSTERING ORDER BY (commentid DESC);
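The two comment tables hold the same comment seen from two directions, so the application writes both together. A minimal Python sketch of that dual write follows. The dictionaries stand in for the tables, the function name is hypothetical, and prepending stands in for `commentid DESC` on the assumption that comments arrive in time order.

```python
import uuid
from collections import defaultdict

comments_by_video = defaultdict(list)   # videoid -> comments, newest first
comments_by_user = defaultdict(list)    # userid  -> comments, newest first


def add_comment(videoid, userid, comment):
    """Write one comment into both views; in CQL the two INSERTs would share a batch."""
    commentid = uuid.uuid1()   # version 1 UUIDs carry a timestamp, like timeuuid
    comments_by_video[videoid].insert(
        0, {"commentid": commentid, "userid": userid, "comment": comment})
    comments_by_user[userid].insert(
        0, {"commentid": commentid, "videoid": videoid, "comment": comment})
    return commentid


first = add_comment("video-1", "user-a", "Nice talk")
second = add_comment("video-1", "user-b", "Thanks for the slides")
```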
Use Case Example
Example 1: Weather Station
• Weather station collects data
• Cassandra stores in sequence
• Application reads in sequence
Use case
• Store data per weather station
• Store time series in order: first to last
Needed Queries
• Get all data for one weather station
• Get data for a single date and time
• Get data for a range of dates and times
Data Model to support queries
Data Model
• Weather Station Id and Time are unique
• Store as many as needed
CREATE TABLE temperature (
weather_station text,
year int,
month int,
day int,
hour int,
temperature double,
PRIMARY KEY (weather_station,year,month,day,hour)
);
INSERT INTO temperature(weather_station,year,month,day,hour,temperature)
VALUES ('10010:99999',2005,12,1,7,-5.6);
INSERT INTO temperature(weather_station,year,month,day,hour,temperature)
VALUES ('10010:99999',2005,12,1,8,-5.1);
INSERT INTO temperature(weather_station,year,month,day,hour,temperature)
VALUES ('10010:99999',2005,12,1,9,-4.9);
INSERT INTO temperature(weather_station,year,month,day,hour,temperature)
VALUES ('10010:99999',2005,12,1,10,-5.3);
Storage Model - Logical View
SELECT weather_station,hour,temperature
FROM temperature
WHERE weather_station='10010:99999';
weather_station hour temperature
10010:99999 2005:12:1:7 -5.6
10010:99999 2005:12:1:8 -5.1
10010:99999 2005:12:1:9 -4.9
10010:99999 2005:12:1:10 -5.3
Storage Model - Disk Layout
Merged, Sorted and Stored Sequentially
SELECT weather_station,hour,temperature
FROM temperature
WHERE weather_station='10010:99999';
[Figure: partition 10010:99999 stored as one sequential run of cells: 2005:12:1:7 = -5.6, 2005:12:1:8 = -5.1, 2005:12:1:9 = -4.9, 2005:12:1:10 = -5.3, 2005:12:1:11 = -4.9, 2005:12:1:12 = -5.4]
Query patterns
• Range queries
• “Slice” operation on disk
SELECT weather_station,hour,temperature
FROM temperature
WHERE weather_station='10010:99999'
AND year = 2005 AND month = 12 AND day = 1
AND hour >= 7 AND hour <= 10;
[Figure: partition 10010:99999 laid out sequentially on disk, cells 2005:12:1:7 through 2005:12:1:12. Callouts: "Single seek on disk" and "Partition key for locality"]
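The "slice" idea can be sketched in Python: because rows within a partition are kept sorted by the clustering columns, a range query is two binary searches and one contiguous read. The list below holds only the four rows from the INSERT statements. The function name is hypothetical and the list stands in for one on-disk partition.

```python
import bisect

# Partition '10010:99999': (year, month, day, hour) -> temperature, kept sorted.
partition = [
    ((2005, 12, 1, 7), -5.6),
    ((2005, 12, 1, 8), -5.1),
    ((2005, 12, 1, 9), -4.9),
    ((2005, 12, 1, 10), -5.3),
]


def slice_rows(rows, start_key, end_key):
    """Return the contiguous run of rows with start_key <= key <= end_key."""
    keys = [key for key, _ in rows]
    lo = bisect.bisect_left(keys, start_key)
    hi = bisect.bisect_right(keys, end_key)
    return rows[lo:hi]


# Same range as the query above: hour >= 7 AND hour <= 10.
slide_query = slice_rows(partition, (2005, 12, 1, 7), (2005, 12, 1, 10))
# A narrower range still reads one contiguous run.
narrower = slice_rows(partition, (2005, 12, 1, 8), (2005, 12, 1, 9))
```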
Programmers like this
Sorted by event_time
weather_station hour temperature
10010:99999 2005:12:1:7 -5.6
10010:99999 2005:12:1:8 -5.1
10010:99999 2005:12:1:9 -4.9
10010:99999 2005:12:1:10 -5.3
Thank you!
Bring the questions
Follow me on Twitter
@PatrickMcFadin
