SlideShare a Scribd company logo
CASSANDRA
DATA MODELING
INTRODUCTION
A little bit about me, a little bit more about AddThis
WHY CASSANDRA?
▸ Scalability
▸ Fault Tolerant
▸ Optimized for Big Data
▸ Queryable
QUERYABLE, NOW
WE'RE TALKIN!
STORING DATA IN CASSANDRA
▸ Before we can talk about building a house,
you have to know what kind of tools you have.
▸ Before you can cook a meal,
you need to know what ingredients you have.
▸ Before you can data model with Cassandra,
you need to know what types of data you can store.
BASIC TYPES
CQL Type | Constants | Description
-------- | --------- | -----------
ascii | strings | US-ASCII character string
bigint | integers | 64-bit signed long
blob | blob | Arbitrary bytes
boolean | booleans | true or false
counter | integers | Distributed counter value (64-bit long)
decimal | ints/floats| Variable-precision decimal
float | ints/floats| 32-bit floating point
inet | strings | IP address string
int | integers | 32-bit signed integer
text | strings | UTF-8 encoded string
timestamp| ints/strings | Date plus time
timeuuid | uuids | Type 1 UUID only
tuple | n/a | A group of 2-3 fields (Cassandra 2.1+)
uuid | uuids | A UUID in standard UUID format
varchar | strings | UTF-8 encoded string
varint | integers | Arbitrary-precision integer
HOW CASSANDRA STORES DATA
TABLE CONSTRUCT
CREATE TABLE users (
user_id uuid,
first_name text,
last_name text,
company text,
PRIMARY KEY (user_id)
);
COLLECTIONS
SETS
emails set<text>
--- insert set into table
INSERT INTO users (user_id, first_name, last_name, emails)
VALUES(uuid(), 'Ben', 'Knear', {'ben@addthis.com', 'ben.kn@gmail.com'});
--- add to set, even if set was never instantiated
--- note, it will re-sort the collection after adding
UPDATE users
SET emails = emails + {'ben.kn@yahoo.com'}
WHERE user_id = X;
--- remove from set
UPDATE users
SET emails = emails - {'ben.kn@yahoo.com'}
WHERE user_id = X;
LISTS
priority_emails list<text>
--- insert list into table
INSERT INTO users (user_id, first_name, last_name, priority_emails)
VALUES(uuid(), 'Ben', 'Knear', ['ben@addthis.com', 'ben.kn@gmail.com']);
--- add to list, even if list was never instantiated
UPDATE users
SET priority_emails = priority_emails + ['ben.kn@yahoo.com']
WHERE user_id = X;
--- remove from list
UPDATE users
SET priority_emails = priority_emails - ['ben.kn@yahoo.com']
WHERE user_id = X;
MAPS
contact_info map<text, text>
--- insert map into table
INSERT INTO users (user_id, first_name, last_name, contact_info)
VALUES(uuid(), 'Ben', 'Knear',
{ 'work_email' : 'ben@addthis.com',
'home_email' : 'ben.kn@gmail.com' });
--- delete from a map
DELETE contact_info['work_email']
FROM users WHERE user_id = X;
MAPS
contact_info map<text, text>
--- add to map, even if map was never instantiated
UPDATE users
SET contact_info['other_email'] = 'ben.kn@yahoo.com'
WHERE user_id = X;
--- remove from map
UPDATE users
SET contact_info = contact_info - ['ben.kn@yahoo.com']
WHERE user_id = X;
JSON OR MAP
▸ Remember all values in a Map must be the same type
▸ What will you do with the value?
▸ Remember: Values of items in collections are limited to 64K.
HOW CASSANDRA STORES DATA
Row Keys, Column Families
| row key | columns |
|---------|----------------------------------------|
| | "first_name" | "last_name" | "company" |
| UUID | 'Ben' | 'Knear' | 'AddThis' |
|---------|----------------------------------------|
CASSANDRA
LIMITS
▸ Maximum number of columns per row is 2 billion.
▸ Maximum size for the name of a column is 64 KB.
▸ Maximum size for a value in a column is 2 GB.
▸ Collection values may not be larger than 64 KB.
APPROACHING
DATA MODEL
You must ask:
▸ What do we want to store?
▸ What are the relationships within the data?
▸ How do we plan to access it?
In a relational model, you would focus most on the questions one and
two. But for Cassandra, you must focus most on the third.
You must ask:
▸ What do we want to store?
▸ What are the relationships within the data?
▸ How do we plan to access it?
In a relational model, you would focus most on the questions one and
two. But for Cassandra, you must focus most on the third.
Cassandra data modeling starts with how will you use the data.
Denormalization
Optimizing the read performance of a database by adding redundant
data or by grouping data. In Cassandra, this process is accomplished by
duplicating data in multiple tables, grouping data for queries.
So how will we get data from
Cassandra?
CASSANDRA
QUERYING
PRIMARY KEY
Defines a unique value to identify the row, and also drives partitioning.
When multiple columns are defined in the primary key, the first column
defines the partition key, the rest are clustered columns.
Partitioning is important because rows and columns are grouped on
nodes, and grouped data is read and written faster.
EXAMPLE
CREATE TABLE movies (
id uuid,
name text,
genre text,
cast set<uuid>,
company text,
PRIMARY KEY (id)
);
Good if I only know the ID when I retrieve
COMPOSITE PRIMARY KEY ALTERNATIVE
CREATE TABLE movies (
id uuid,
name text,
genre text,
cast set<uuid>,
company text,
PRIMARY KEY (genre, id)
);
Good if I know the genre and ID when I retrieve, or to get all of a genre
QUERYING EXAMPLES
CREATE TABLE user_addresses (
state text,
city text,
username text,
address text,
PRIMARY KEY (state, city, username)
);
-- insert a value
INSERT INTO user_addresses (state, city, username, address)
VALUES ('VA', 'Vienna', 'AddThis', 'Spring Hill Rd');
-- FAILURES
SELECT * FROM user_addresses WHERE city = 'Vienna';
SELECT * FROM user_addresses
WHERE city = 'Vienna' AND username = 'AddThis';
-- SUCCESSES
SELECT * FROM user_addresses WHERE state = 'VA';
SELECT * FROM user_addresses
WHERE state = 'VA' AND city = 'Vienna';
SELECT * FROM user_addresses
WHERE state = 'VA' AND city = 'Vienna'
AND username = 'AddThis';
ERROR
Filtering by just clustering columns will give you this response:
Bad Request: Cannot execute this query as it might involve data
filtering and thus may have unpredictable performance. If you want to
execute this query despite the performance unpredictability, use ALLOW
FILTERING
WORK AROUND
Expensive query, so use LIMIT if you must:
SELECT * FROM user_addresses WHERE username = 'AddThis' LIMIT 10 ALLOW FILTERING;
At absolute worst (there are no addresses with username = 'AddThis'),
it will have to look through the entire table.
COMPOSITE PARTITION KEYS
Composite partition keys will group the first value on the same node,
though the second value may be on a different node.
CREATE TABLE cars (
lot_id int,
make text,
model text,
color text,
PRIMARY KEY ((lot_id, make), model)
);
INSERT INTO cars (lot_id, make, model, color) VALUES (1, 'Ford', 'Explorer', 'Black');
INSERT INTO cars (lot_id, make, model, color) VALUES (1, 'Cadillac', 'CT6', 'Black');
INSERT INTO cars (lot_id, make, model, color) VALUES (2, 'BMW', 'M8', 'Red');
COMPOSITE PARTITION KEYS
-- FAILS
SELECT * FROM cars WHERE lot_id = 1;
SELECT * FROM cars WHERE make = 'Ford';
-- SUCCESS
SELECT * FROM cars WHERE lot_id = 1 AND make = 'Ford';
Generally, Cassandra will store columns having the same lot id but a
different make on different nodes, and columns having the same lot id
and make on the same node.
INDEXES
You can also add indices at any time:
CREATE INDEX genre_idx ON movie (genre);
Or even on a collection field (Cassandra 2.1+)
CREATE INDEX cast_idx ON movie (cast);
But for maps, you can index either the keys or the values.
CREATE INDEX ON users (general);
CREATE INDEX ON users (KEYS(general));
RULES FOR INDEXES
Similar to relational databases, the more unique values in the index, the
larger it'll be, and the longer it'll take to read it.
RULES FOR INDEXES
▸ Do not index counter columns.
▸ Do not index high cardinality columns.
▸ Do not index on a frequently updated or deleted column (tombstone
issues)
▸ Do not index on a largely partitioned field (which requires
communicating with more servers to retrieve the information)
SIDEBAR ON TOMBSTONES
Tombstones are relics from deleted values, used in data replication.
▸ Grace period for garbage collection
▸ Avoid nulls
FILTERING WITH INDEX
Indexes on a basic data column filters the same as a primary key.
For collections you will use CONTAINS:
SELECT * FROM users WHERE email_addresses CONTAINS 'ben@addthis.com';
SELECT * FROM users WHERE user_attr_map CONTAINS 'Software Engineer';
SELECT * FROM users WHERE user_attr_map CONTAINS KEY 'Job Title';
RANGE QUERIES
Especially useful for timeseries tables, you can select a range of values
SELECT * FROM weather_reports WHERE report_time >= '2014-05-17 00:00:00-0000' AND report_time < '2014-05-18 00:00:00-0000';
DATA MODEL
WORKSHOP
REFERENCING Netflix
USER QUEUE
RELATIONAL MODEL
▸ User table with auto-inc ID
▸ Movie table with auto-inc ID
▸ User_Movie table with auto-inc ID, foreign keys user_id and
movie_id, plus a timestamp
RELATIONAL MODEL
▸ User table with auto-inc ID
▸ Movie table with auto-inc ID
▸ User_Movie table with auto-inc ID, foreign keys user_id and
movie_id, plus a timestamp
Completely inefficient for Cassandra
BAD PARTITIONING
CREATE TABLE user_queue (
id uuid,
user_id uuid,
video_id uuid,
added timestamp,
PRIMARY KEY (id)
);
CREATE INDEX user_id_idx ON user_queue (user_id);
Terrible.
BETTER PARTITIONING
Including the video info as a JSON blob
CREATE TABLE user_queue (
user_id uuid,
video_id uuid,
video_info text,
added timestamp,
PRIMARY KEY (user_id)
);
BETTER MODEL
Utilize the columns
CREATE TABLE user_queue (
user_id uuid,
queue map<text, text>,
PRIMARY KEY (user_id)
);
▸ queue map will contain 'movieId' -> json blob of movie
▸ Read will pull all movies for a user
ADDING TO QUEUE
UPDATE TABLE user
SET queue['newMovieId'] = 'json about movie'
WHERE user_id = X;
VIEW HISTORY
USER REVIEWS
NEW RELEASES
METRICS
THANK YOU

More Related Content

DOCX
database-querry-student-note
PDF
Practical Object Oriented Models In Sql
PDF
MySQL partitions tutorial
KEY
Fields in Core: How to create a custom field
PDF
Varraysandnestedtables
PDF
Extensible Data Modeling
PPSX
SenchaCon 2016: Developing COSMOS Using Sencha Ext JS 5 - Shenglin Xu
database-querry-student-note
Practical Object Oriented Models In Sql
MySQL partitions tutorial
Fields in Core: How to create a custom field
Varraysandnestedtables
Extensible Data Modeling
SenchaCon 2016: Developing COSMOS Using Sencha Ext JS 5 - Shenglin Xu

Similar to Cassandra Data Modeling (20)

PDF
Introduction to Data Modeling with Apache Cassandra
PDF
Cassandra Community Webinar | Become a Super Modeler
PDF
Introduction to data modeling with apache cassandra
PDF
Cassandra Day Atlanta 2015: Data Modeling 101
PDF
Cassandra Day London 2015: Data Modeling 101
PDF
Cassandra Day Chicago 2015: Apache Cassandra Data Modeling 101
PDF
Apache Cassandra & Data Modeling
PDF
Become a super modeler
PPTX
Apache Cassandra Developer Training Slide Deck
PPTX
Cassandra20141009
PDF
Introduction to Data Modeling with Apache Cassandra
ODP
Cassandra Data Modelling
PDF
Cassandra for impatients
PPTX
«Cassandra data modeling – моделирование данных для NoSQL СУБД Cassandra»
PDF
Cassandra Data Modelling with CQL (OSCON 2015)
PPTX
Cassandra20141113
PPTX
Apache Cassandra Data Modeling with Travis Price
PDF
Big Data Grows Up - A (re)introduction to Cassandra
PDF
Use Your MySQL Knowledge to Become an Instant Cassandra Guru
PDF
Cassandra 3.0 advanced preview
Introduction to Data Modeling with Apache Cassandra
Cassandra Community Webinar | Become a Super Modeler
Introduction to data modeling with apache cassandra
Cassandra Day Atlanta 2015: Data Modeling 101
Cassandra Day London 2015: Data Modeling 101
Cassandra Day Chicago 2015: Apache Cassandra Data Modeling 101
Apache Cassandra & Data Modeling
Become a super modeler
Apache Cassandra Developer Training Slide Deck
Cassandra20141009
Introduction to Data Modeling with Apache Cassandra
Cassandra Data Modelling
Cassandra for impatients
«Cassandra data modeling – моделирование данных для NoSQL СУБД Cassandra»
Cassandra Data Modelling with CQL (OSCON 2015)
Cassandra20141113
Apache Cassandra Data Modeling with Travis Price
Big Data Grows Up - A (re)introduction to Cassandra
Use Your MySQL Knowledge to Become an Instant Cassandra Guru
Cassandra 3.0 advanced preview
Ad

Recently uploaded (20)

PDF
Network Security Unit 5.pdf for BCA BBA.
PPTX
A Presentation on Artificial Intelligence
PDF
NewMind AI Weekly Chronicles - August'25 Week I
PDF
Unlocking AI with Model Context Protocol (MCP)
PPTX
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
PDF
Chapter 3 Spatial Domain Image Processing.pdf
PDF
Shreyas Phanse Resume: Experienced Backend Engineer | Java • Spring Boot • Ka...
PDF
KodekX | Application Modernization Development
PDF
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows
PPTX
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
PDF
Architecting across the Boundaries of two Complex Domains - Healthcare & Tech...
PPTX
Big Data Technologies - Introduction.pptx
PPTX
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
PDF
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
PPTX
Cloud computing and distributed systems.
PPTX
PA Analog/Digital System: The Backbone of Modern Surveillance and Communication
PDF
The Rise and Fall of 3GPP – Time for a Sabbatical?
PDF
CIFDAQ's Market Insight: SEC Turns Pro Crypto
PDF
Blue Purple Modern Animated Computer Science Presentation.pdf.pdf
PDF
Approach and Philosophy of On baking technology
Network Security Unit 5.pdf for BCA BBA.
A Presentation on Artificial Intelligence
NewMind AI Weekly Chronicles - August'25 Week I
Unlocking AI with Model Context Protocol (MCP)
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
Chapter 3 Spatial Domain Image Processing.pdf
Shreyas Phanse Resume: Experienced Backend Engineer | Java • Spring Boot • Ka...
KodekX | Application Modernization Development
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
Architecting across the Boundaries of two Complex Domains - Healthcare & Tech...
Big Data Technologies - Introduction.pptx
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
Cloud computing and distributed systems.
PA Analog/Digital System: The Backbone of Modern Surveillance and Communication
The Rise and Fall of 3GPP – Time for a Sabbatical?
CIFDAQ's Market Insight: SEC Turns Pro Crypto
Blue Purple Modern Animated Computer Science Presentation.pdf.pdf
Approach and Philosophy of On baking technology
Ad

Cassandra Data Modeling

  • 2. INTRODUCTION A little bit about me, a little bit more about AddThis
  • 3. WHY CASSANDRA? ▸ Scalability ▸ Fault Tolerant ▸ Optimized for Big Data ▸ Queryable
  • 5. STORING DATA IN CASSANDRA ▸ Before we can talk about building a house, you have to know what kind of tools you have. ▸ Before you can cook a meal, you need to know what ingredients you have. ▸ Before you can data model with Cassandra, you need to know what types of data you can store.
  • 6. BASIC TYPES CQL Type | Constants | Description -------- | --------- | ----------- ascii | strings | US-ASCII character string bigint | integers | 64-bit signed long blob | blob | Arbitrary bytes boolean | booleans | true or false counter | integers | Distributed counter value (64-bit long) decimal | ints/floats| Variable-precision decimal float | ints/floats| 32-bit floating point inet | strings | IP address string int | integers | 32-bit signed integer text | strings | UTF-8 encoded string timestamp| ints/strings | Date plus time timeuuid | uuids | Type 1 UUID only tuple | n/a | A group of 2-3 fields (Cassandra 2.1+) uuid | uuids | A UUID in standard UUID format varchar | strings | UTF-8 encoded string varint | integers | Arbitrary-precision integer
  • 7. HOW CASSANDRA STORES DATA TABLE CONSTRUCT CREATE TABLE users ( user_id uuid, first_name text, last_name text, company text, PRIMARY KEY (user_id) );
  • 9. SETS emails set<text> --- insert set into table INSERT INTO users (user_id, first_name, last_name, emails) VALUES(uuid(), 'Ben', 'Knear', {'ben@addthis.com', 'ben.kn@gmail.com'}); --- add to set, even if set was never instantiated --- note, it will re-sort the collection after adding UPDATE users SET emails = emails + {'ben.kn@yahoo.com'} WHERE user_id = X; --- remove from set UPDATE users SET emails = emails - {'ben.kn@yahoo.com'} WHERE user_id = X;
  • 10. LISTS priority_emails list<text> --- insert list into table INSERT INTO users (user_id, first_name, last_name, priority_emails) VALUES(uuid(), 'Ben', 'Knear', ['ben@addthis.com', 'ben.kn@gmail.com']); --- add to list, even if list was never instantiated UPDATE users SET priority_emails = priority_emails + ['ben.kn@yahoo.com'] WHERE user_id = X; --- remove from list UPDATE users SET priority_emails = priority_emails - ['ben.kn@yahoo.com'] WHERE user_id = X;
  • 11. MAPS contact_info map<text, text> --- insert map into table INSERT INTO users (user_id, first_name, last_name, contact_info) VALUES(uuid(), 'Ben', 'Knear', { 'work_email' : 'ben@addthis.com', 'home_email' : 'ben.kn@gmail.com' }); --- delete from a map DELETE contact_info['work_email'] FROM users WHERE user_id = X;
  • 12. MAPS contact_info map<text, text> --- add to map, even if map was never instantiated UPDATE users SET contact_info['other_email'] = 'ben.kn@yahoo.com' WHERE user_id = X; --- remove from map UPDATE users SET contact_info = contact_info - ['ben.kn@yahoo.com'] WHERE user_id = X;
  • 13. JSON OR MAP ▸ Remember all values in a Map must be the same type ▸ What will you do with the value? ▸ Remember: Values of items in collections are limited to 64K.
  • 14. HOW CASSANDRA STORES DATA Row Keys, Column Families | row key | columns | |---------|----------------------------------------| | | "first_name" | "last_name" | "company" | | UUID | 'Ben' | 'Knear' | 'AddThis' | |---------|----------------------------------------|
  • 16. ▸ Maximum number of columns per row is 2 billion. ▸ Maximum size for the name of a column is 64 KB. ▸ Maximum size for a value in a column is 2 GB. ▸ Collection values may not be larger than 64 KB.
  • 18. You must ask: ▸ What do we want to store? ▸ What are the relationships within the data? ▸ How do we plan to access it? In a relational model, you would focus most on the questions one and two. But for Cassandra, you must focus most on the third.
  • 19. You must ask: ▸ What do we want to store? ▸ What are the relationships within the data? ▸ How do we plan to access it? In a relational model, you would focus most on the questions one and two. But for Cassandra, you must focus most on the third. Cassandra data modeling starts with how will you use the data.
  • 20. Denormalization Optimizing the read performance of a database by adding redundant data or by grouping data. In Cassandra, this process is accomplished by duplicating data in multiple tables, grouping data for queries.
  • 21. So how will we get data from Cassandra?
  • 23. PRIMARY KEY Defines a unique value to identify the row, and also drives partitioning. When multiple columns are defined in the primary key, the first column defines the partition key, the rest are clustered columns. Partitioning is important because rows and columns are grouped on nodes, and grouped data is read and written faster.
  • 24. EXAMPLE CREATE TABLE movies ( id uuid, name text, genre text, cast set<uuid>, company text, PRIMARY KEY (id) ); Good if I only know the ID when I retrieve
  • 25. COMPOSITE PRIMARY KEY ALTERNATIVE CREATE TABLE movies ( id uuid, name text, genre text, cast set<uuid>, company text, PRIMARY KEY (genre, id) ); Good if I know the genre and ID when I retrieve, or to get all of a genre
  • 26. QUERYING EXAMPLES CREATE TABLE user_addresses ( state text, city text, username text, address text, PRIMARY KEY (state, city, username) ); -- insert a value INSERT INTO user_addresses (state, city, username, address) VALUES ('VA', 'Vienna', 'AddThis', 'Spring Hill Rd');
  • 27. -- FAILURES SELECT * FROM user_addresses WHERE city = 'Vienna'; SELECT * FROM user_addresses WHERE city = 'Vienna' AND username = 'AddThis'; -- SUCCESSES SELECT * FROM user_addresses WHERE state = 'VA'; SELECT * FROM user_addresses WHERE state = 'VA' AND city = 'Vienna'; SELECT * FROM user_addresses WHERE state = 'VA' AND city = 'Vienna' AND username = 'AddThis';
  • 28. ERROR Filtering by just clustering columns will give you this response: Bad Request: Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING
  • 29. WORK AROUND Expensive query, so use LIMIT if you must: SELECT * FROM user_addresses WHERE username = 'AddThis' LIMIT 10 ALLOW FILTERING; At absolute worst (there are no addresses with username = 'AddThis'), it will have to look through the entire table.
  • 30. COMPOSITE PARTITION KEYS Composite partition keys will group the first value on the same node, though the second value may be on a different node. CREATE TABLE cars ( lot_id int, make text, model text, color text, PRIMARY KEY ((lot_id, make), model) ); INSERT INTO cars (lot_id, make, model, color) VALUES (1, 'Ford', 'Explorer', 'Black'); INSERT INTO cars (lot_id, make, model, color) VALUES (1, 'Cadillac', 'CT6', 'Black'); INSERT INTO cars (lot_id, make, model, color) VALUES (2, 'BMW', 'M8', 'Red');
  • 31. COMPOSITE PARTITION KEYS -- FAILS SELECT * FROM cars WHERE lot_id = 1; SELECT * FROM cars WHERE make = 'Ford'; -- SUCCESS SELECT * FROM cars WHERE lot_id = 1 AND make = 'Ford'; Generally, Cassandra will store columns having the same lot id but a different make on different nodes, and columns having the same lot id and make on the same node.
  • 32. INDEXES You can also add indices at any time: CREATE INDEX genre_idx ON movie (genre); Or even on a collection field (Cassandra 2.1+) CREATE INDEX cast_idx ON movie (cast); But for maps, you can index either the keys or the values. CREATE INDEX ON users (general); CREATE INDEX ON users (KEYS(general));
  • 33. RULES FOR INDEXES Similar to relational databases, the more unique values in the index, the larger it'll be, and the longer it'll take to read it.
  • 34. RULES FOR INDEXES ▸ Do not index counter columns. ▸ Do not index high cardinality columns. ▸ Do not index on a frequently updated or deleted column (tombstone issues) ▸ Do not index on a largely partitioned field (which requires communicating with more servers to retrieve the information)
  • 35. SIDEBAR ON TOMBSTONES Tombstones are relics from deleted values, used in data replication. ▸ Grace period for garbage collection ▸ Avoid nulls
  • 36. FILTERING WITH INDEX Indexes on a basic data column filters the same as a primary key. For collections you will use CONTAINS: SELECT * FROM users WHERE email_addresses CONTAINS 'ben@addthis.com'; SELECT * FROM users WHERE user_attr_map CONTAINS 'Software Engineer'; SELECT * FROM users WHERE user_attr_map CONTAINS KEY 'Job Title';
  • 37. RANGE QUERIES Especially useful for timeseries tables, you can select a range of values SELECT * FROM weather_reports WHERE report_time >= '2014-05-17 00:00:00-0000' AND report_time < '2014-05-18 00:00:00-0000';
  • 40. RELATIONAL MODEL ▸ User table with auto-inc ID ▸ Movie table with auto-inc ID ▸ User_Movie table with auto-inc ID, foreign keys user_id and movie_id, plus a timestamp
  • 41. RELATIONAL MODEL ▸ User table with auto-inc ID ▸ Movie table with auto-inc ID ▸ User_Movie table with auto-inc ID, foreign keys user_id and movie_id, plus a timestamp Completely inefficient for Cassandra
  • 42. BAD PARTITIONING CREATE TABLE user_queue ( id uuid, user_id uuid, video_id uuid, added timestamp, PRIMARY KEY (id) ); CREATE INDEX user_id_idx ON user_queue (user_id); Terrible.
  • 43. BETTER PARTITIONING Including the video info as a JSON blob CREATE TABLE user_queue ( user_id uuid, video_id uuid, video_info text, added timestamp, PRIMARY KEY (user_id) );
  • 44. BETTER MODEL Utilize the columns CREATE TABLE user_queue ( user_id uuid, queue map<text, text>, PRIMARY KEY (user_id) ); ▸ queue map will contain 'movieId' -> json blob of movie ▸ Read will pull all movies for a user
  • 45. ADDING TO QUEUE UPDATE TABLE user SET queue['newMovieId'] = 'json about movie' WHERE user_id = X;