@chbatey
Christopher Batey

Technical Evangelist for Apache Cassandra
Cassandra deep dive
Who am I?
• Technical Evangelist for Apache Cassandra
• Founder of Stubbed Cassandra
• Help out Apache Cassandra users
• DataStax
• Builds enterprise ready version of Apache Cassandra
• Previous: Cassandra backed apps at BSkyB
Overview
• Cassandra use cases
• Replication
• Fault tolerance
• Read and write path
• Data modelling
• Java Driver
Distributed databases
It is a big world
• Relational
- Oracle, PostgreSQL
• Graph databases
- Neo4J, InfoGrid, Titan
• Key value
- DynamoDB
• Document stores
- MongoDB, Couchbase
• Columnar aka wide row
- Cassandra, HBase
Building a web app
Running multiple copies of your app
Still in one DC?
Handling hardware failure
Master/slave
•Master serves all writes
•Read from master and optionally slaves
Peer-to-Peer
• No master
• Read/write to any
• Consistency?
Decisions decisions… CAP theorem
Are these really that different?
• Mongo, Redis, Couchbase
• Highly Available Databases: Voldemort, Cassandra
Cassandra use cases
Cassandra for Applications
Common use cases
• Ordered data such as time series
- Event stores
- Financial transactions
- Sensor data e.g. IoT
• Non functional requirements:
- Linear scalability
- High throughput durable writes
- Multi datacenter including active-active
- Analytics without ETL
Cassandra deep dive
Cassandra
• Distributed masterless database (Dynamo)
• Column family data model (Google BigTable)
• Datacenter and rack aware
• Multi data centre replication built in from the start
[Figure: two data centres, Europe and USA]
• Analytics with Apache Spark
[Figure: cluster diagram labelled Online and Analytics]
Dynamo 101
• The parts Cassandra took
- Consistent hashing
- Replication
- Gossip
- Hinted handoff
- Anti-entropy repair
• And the parts it left behind
- Key/Value
- Vector clocks
Picking the right nodes
• You don’t want a full table scan on a 1000 node cluster!
• Dynamo to the rescue: Consistent Hashing
• Then the replication strategy takes over:
- Network topology
- Simple
Murmur3 Example
• Data:
Primary Key  Columns
jim          age: 36  car: ford  gender: M
carol        age: 37  car: bmw   gender: F
johnny       age: 12  gender: M
suzy         age: 10  gender: F
• Murmur3 Hash Values:
Primary Key  Murmur3 hash value
jim          350
carol        998
johnny       50
suzy         600
Real hash range: -9223372036854775808 to 9223372036854775807
Murmur3 Example
Four node cluster:
Node  Murmur3 start range  Murmur3 end range
A     0                    249
B     250                  499
C     500                  749
D     750                  999
Pictures are better
[Figure: token ring with four nodes. A owns 0 to 249, B owns 250 to 499, C owns 500 to 749, D owns 750 to 999]
Murmur3 Example
Data is distributed as:
Node  Start range  End range  Primary key  Hash value
A     0            249        johnny       50
B     250          499        jim          350
C     500          749        suzy         600
D     750          999        carol        998
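The lookup behind this table can be sketched in a few lines. This is a toy model only: the ring uses the 0 to 999 range from the slides, the hash values are copied from the example rather than computed with Murmur3, and the names RING, TOY_HASHES and owner are made up for illustration.

```python
from bisect import bisect_left

# Toy ring from the slides: each node owns the range ending at its token.
# (Real Murmur3 tokens span -2**63 .. 2**63 - 1.)
RING = [(249, "A"), (499, "B"), (749, "C"), (999, "D")]
END_TOKENS = [end for end, _ in RING]

# Hash values are hard-coded from the example table, not computed.
TOY_HASHES = {"jim": 350, "carol": 998, "johnny": 50, "suzy": 600}


def owner(token: int) -> str:
    """Return the node whose range contains the token."""
    # First range end that is >= token; wrap around past the last node.
    index = bisect_left(END_TOKENS, token) % len(RING)
    return RING[index][1]


placement = {key: owner(token) for key, token in TOY_HASHES.items()}
print(placement)  # {'jim': 'B', 'carol': 'D', 'johnny': 'A', 'suzy': 'C'}
```

Because the partition key alone decides the node, a read for one key can go straight to its owner instead of scanning the cluster.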
Replication
Replication strategy
• Simple
- Give it to the next node in the ring
- Don’t use this in production
• NetworkTopology
- Every Cassandra node knows its DC and Rack
- Replicas won’t be put on the same rack unless Replication Factor > # of racks
- Unfortunately Cassandra can’t create servers and racks on the fly to fix this :(
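A rough sketch of the two placement ideas, assuming a four node ring and a rack layout invented for this example (RING_ORDER and RACK_OF are hypothetical). It illustrates the rules on the slide and is not Cassandra's actual placement code.

```python
# Toy ring order and an assumed rack layout (hypothetical, for illustration).
RING_ORDER = ["A", "B", "C", "D"]
RACK_OF = {"A": "rack1", "B": "rack1", "C": "rack2", "D": "rack2"}


def walk_ring(primary):
    """Yield nodes clockwise, starting at the node that owns the token."""
    start = RING_ORDER.index(primary)
    for step in range(len(RING_ORDER)):
        yield RING_ORDER[(start + step) % len(RING_ORDER)]


def simple_replicas(primary, rf):
    """Simple strategy: the owner plus the next rf - 1 nodes on the ring."""
    return list(walk_ring(primary))[:rf]


def rack_aware_replicas(primary, rf):
    """Rack-aware sketch: prefer nodes on racks that hold no replica yet."""
    chosen, used_racks, skipped = [], set(), []
    for node in walk_ring(primary):
        if len(chosen) == rf:
            break
        if RACK_OF[node] in used_racks:
            skipped.append(node)  # same rack as an earlier replica
        else:
            chosen.append(node)
            used_racks.add(RACK_OF[node])
    # Replication factor > number of racks: racks have to be reused.
    for node in skipped:
        if len(chosen) == rf:
            break
        chosen.append(node)
    return chosen


print(simple_replicas("A", 2))      # ['A', 'B']  both on rack1
print(rack_aware_replicas("A", 2))  # ['A', 'C']  one per rack
print(rack_aware_replicas("A", 3))  # ['A', 'C', 'B']  rack1 reused
```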
Replication
[Figure: a client WRITE at CL = 1 to a cluster with two data centres, DC1 and DC2, each at RF 3, with nodes labelled C and RC. We have replication!]
Tunable Consistency
• Data is replicated N times
• You give every query you execute a consistency level
- ALL
- QUORUM
- LOCAL_QUORUM
- ONE
• Christos Kalantzis, Eventual Consistency != Hopeful Consistency: http://youtu.be/A6qzx_HE3EU?list=PLqcm6qE9lgKJzVvwHprow9h7KMpb5hcUU
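The arithmetic behind these levels is small enough to write down. In this sketch a quorum is a majority of the replicas, and a read is guaranteed to see the latest write when the replicas that acknowledged the write and the replicas that answered the read must overlap. LOCAL_QUORUM applies the same majority rule to the replication factor of the local data centre only. The function names are made up for illustration.

```python
def quorum(replication_factor: int) -> int:
    """Majority of the replicas."""
    return replication_factor // 2 + 1


def overlaps(replication_factor: int, write_acks: int, read_acks: int) -> bool:
    """True when every set of read replicas must share a node with the write."""
    return write_acks + read_acks > replication_factor


RF = 3
LEVELS = {"ONE": 1, "QUORUM": quorum(RF), "ALL": RF}

for write_level, write_acks in LEVELS.items():
    for read_level, read_acks in LEVELS.items():
        verdict = "overlap" if overlaps(RF, write_acks, read_acks) else "no overlap"
        print(f"write {write_level:6} read {read_level:6} {verdict}")
```

With RF 3, QUORUM writes plus QUORUM reads overlap (2 + 2 > 3), while ONE plus ONE does not.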
Load balancing
• Data centre aware policy
• Token aware policy
• Latency aware policy
• Whitelist policy
[Figure: one APP per data centre, DC1 and DC2, with async replication between the two]
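How the token aware and data centre aware policies combine can be sketched as an ordering of candidate coordinators. This is plain Python and not the driver API: query_plan, HOSTS and the host names are hypothetical, and the sketch simply ignores remote hosts.

```python
def query_plan(hosts, replicas, local_dc):
    """Order candidate coordinators: local replicas first, then other local hosts.

    hosts maps host name -> data centre; replicas is the set of hosts that
    own the partition being queried.
    """
    local = [host for host, dc in hosts.items() if dc == local_dc]
    local_replicas = [host for host in local if host in replicas]
    local_others = [host for host in local if host not in replicas]
    return local_replicas + local_others


HOSTS = {"A": "DC1", "B": "DC1", "C": "DC1", "X": "DC2", "Y": "DC2"}
plan = query_plan(HOSTS, replicas={"B", "X"}, local_dc="DC1")
print(plan)  # ['B', 'A', 'C']
```

Sending the request to a replica in the local data centre avoids an extra hop through a coordinator that does not hold the data.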
Scaling shouldn’t be hard
• Throw more nodes at a cluster
• Bootstrapping + joining the ring
• For large data sets this can take some time
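What joining the ring means for existing data can be sketched with a toy ring covering tokens 0 to 999 and four hard-coded example hashes. The new node E and its token 624 are hypothetical, and a real cluster using virtual nodes spreads the split over many small ranges rather than one.

```python
from bisect import bisect_left

# Example hash values, hard-coded rather than computed with Murmur3.
TOY_HASHES = {"jim": 350, "carol": 998, "johnny": 50, "suzy": 600}


def owners(ring):
    """Map each key to the node owning its token; ring is {end_token: node}."""
    ends = sorted(ring)
    return {
        key: ring[ends[bisect_left(ends, token) % len(ends)]]
        for key, token in TOY_HASHES.items()
    }


before = owners({249: "A", 499: "B", 749: "C", 999: "D"})
# Hypothetical new node E takes end token 624, splitting C's range.
after = owners({249: "A", 499: "B", 624: "E", 749: "C", 999: "D"})
moved = {key for key in before if before[key] != after[key]}
print(moved)  # {'suzy'}
```

Only the keys in the range the new node takes over have to be streamed to it, which is why bootstrapping time grows with the amount of data in that range.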
Data modelling
You must denormalise
Cassandra cannot join or aggregate
[Figure: a client facing the cluster asks "Where do I go for the max?"]
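One way to read this slide: if the answer has to come from a single node, the data for that answer should live in one partition, already in a useful order. The sketch below models a partition as a sorted list so the maximum is a lookup and not an aggregation. The names are invented, and this is an illustration of the modelling idea, not of Cassandra internals.

```python
from bisect import insort
from collections import defaultdict

# name -> scores kept in sorted order, standing in for one partition per name
partitions = defaultdict(list)


def insert_score(name, score):
    """Write path: keep the partition sorted as rows arrive."""
    insort(partitions[name], score)


def max_score(name):
    """Read path: the maximum is simply the last row of one partition."""
    return partitions[name][-1]


insert_score("bob", 42)
insert_score("bob", 47)
insert_score("alice", 12)
print(max_score("bob"))  # 47
```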
CQL
• Cassandra Query Language
- SQL-like query language
• Keyspace – analogous to a schema
- The keyspace determines the RF (replication factor)
• Table – looks like a SQL table
CREATE TABLE scores (
  name text,
  score int,
  date timestamp,
  PRIMARY KEY (name, score)
);
INSERT INTO scores (name, score, date)
VALUES ('bob', 42, '2012-06-24');
INSERT INTO scores (name, score, date)
VALUES ('bob', 47, '2012-06-25');
SELECT date, score FROM scores WHERE name='bob' AND score >= 40;
Lots of types
UUID
• Universally Unique Identifier
- 128 bit number represented in character form e.g. 99051fe9-6a9c-46c2-b949-38ef78858dd0
• Easily generated on the client
- Version 1 has a timestamp component (TIMEUUID)
- Version 4 has no timestamp component
TIMEUUID
• TIMEUUID data type supports Version 1 UUIDs
• Generated using time (60 bits), a clock sequence number (14 bits), and MAC address (48 bits)
– CQL function now() generates a new TIMEUUID
• Time can be extracted from TIMEUUID
– CQL function dateOf() extracts the timestamp as a date
• TIMEUUID values in clustering columns or in column names are ordered based on time
– DESC order on TIMEUUID lists most recent data first
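The same idea can be tried on the client with Python's standard uuid module: a version 1 UUID carries a 60 bit timestamp counted in 100 nanosecond steps from 15 October 1582, so the time can be recovered from the value. The helper name timestamp_of is made up for this sketch and plays roughly the role of dateOf().

```python
import uuid
from datetime import datetime, timezone

# 100-nanosecond intervals between 1582-10-15 (UUID epoch) and 1970-01-01.
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def timestamp_of(time_uuid: uuid.UUID) -> datetime:
    """Extract the embedded timestamp from a version 1 UUID."""
    if time_uuid.version != 1:
        raise ValueError("only version 1 UUIDs carry a timestamp")
    unix_seconds = (time_uuid.time - UUID_EPOCH_OFFSET) / 1e7
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc)


time_based = uuid.uuid1()    # version 1: time, clock sequence, node
random_based = uuid.uuid4()  # version 4: random, no timestamp

print(time_based.version, timestamp_of(time_based))
print(random_based.version)
```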
© 2014 DataStax, All Rights Reserved.
Collections
CREATE TABLE videos (
videoid uuid,
userid uuid,
name varchar,
description varchar,
location text,
location_type int,
preview_thumbnails map<text,text>,
tags set<varchar>,
added_date timestamp,
PRIMARY KEY (videoid)
);
Data Model - User Defined Types
• Complex data in one place
• No multi-gets (multi-partitions)
• Nesting!
CREATE TYPE address (
street text,
city text,
zip_code int,
country text,
cross_streets set<text>
);
Data Model - Updated
• We can embed video_metadata in videos
CREATE TYPE video_metadata (
height int,
width int,
video_bit_rate set<text>,
encoding text
);
CREATE TABLE videos (
videoid uuid,
userid uuid,
name varchar,
description varchar,
location text,
location_type int,
preview_thumbnails map<text,text>,
tags set<varchar>,
metadata set <frozen<video_metadata>>,
added_date timestamp,
PRIMARY KEY (videoid)
);
Data Model - Storing JSON
{
  "productId": 2,
  "name": "Kitchen Table",
  "price": 249.99,
  "description": "Rectangular table with oak finish",
  "dimensions": {
    "units": "inches",
    "length": 50.0,
    "width": 66.0,
    "height": 32
  },
  "categories": {
    "Home Furnishings": {
      "catalogPage": 45,
      "url": "/home/furnishings"
    },
    "Kitchen Furnishings": {
      "catalogPage": 108,
      "url": "/kitchen/furnishings"
    }
  }
}
CREATE TYPE dimensions (
units text,
length float,
width float,
height float
);
CREATE TYPE category (
catalogPage int,
url text
);
CREATE TABLE product (
productId int,
name text,
price float,
description text,
dimensions frozen <dimensions>,
categories map <text, frozen <category>>,
PRIMARY KEY (productId)
);
Tuple type
• Type to represent a group
• Up to 256 different elements
CREATE TABLE tuple_table (
id int PRIMARY KEY,
three_tuple frozen <tuple<int, text, float>>,
four_tuple frozen <tuple<int, text, float, inet>>,
five_tuple frozen <tuple<int, text, float, inet, ascii>>
);
Counters
• The old implementation has been around since 0.8
• Commit log replay changes counters
• Repair can change a counter
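Why replaying a log can change a counter comes down to idempotency: setting a value twice leaves it unchanged, while incrementing twice does not. The sketch below uses an invented operation log to show the difference. It illustrates the general problem and is not how Cassandra stores counters.

```python
def replay(log, state=None):
    """Apply a log of (operation, key, value) entries and return the new state."""
    state = dict(state or {})
    for operation, key, value in log:
        if operation == "set":
            state[key] = value  # idempotent: same result however often applied
        elif operation == "incr":
            state[key] = state.get(key, 0) + value  # not idempotent
    return state


log = [("set", "colour", "red"), ("incr", "views", 1), ("incr", "views", 1)]
once = replay(log)
twice = replay(log, once)  # the same log applied a second time

print(once)   # {'colour': 'red', 'views': 2}
print(twice)  # {'colour': 'red', 'views': 4}
```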
Time-to-Live (TTL)
TTL a row:
INSERT INTO users (id, first, last) VALUES ('abc123', 'catherine', 'cachart')
USING TTL 3600; // Expires data in one hour

TTL a column:
UPDATE users USING TTL 30 SET last = 'miller' WHERE id = 'abc123';
– TTL in seconds
– Can also set default TTL at a table level
– Expired columns/values automatically deleted
– With no TTL specified, columns/values never expire
– TTL is useful for automatic deletion
– Re-inserting the same row before it expires will overwrite the TTL
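The rules above can be modelled with an expiry time stored next to each value. TtlStore is a hypothetical class written for this sketch, with an injected clock so the behaviour can be followed step by step. It mirrors the semantics listed on the slide and not the storage engine.

```python
class TtlStore:
    """Toy key-value store where each value may carry an expiry time."""

    def __init__(self, clock):
        self._clock = clock  # callable returning the current time in seconds
        self._cells = {}     # key -> (value, expires_at or None)

    def put(self, key, value, ttl=None):
        """Write a value; a new write replaces any earlier TTL."""
        expires_at = None if ttl is None else self._clock() + ttl
        self._cells[key] = (value, expires_at)

    def get(self, key):
        """Return the value, or None when it is missing or has expired."""
        value, expires_at = self._cells.get(key, (None, None))
        if expires_at is not None and self._clock() >= expires_at:
            return None
        return value


now = [0]  # fake clock, advanced by hand
store = TtlStore(lambda: now[0])

store.put("last", "miller", ttl=30)
now[0] = 29
store.put("last", "miller", ttl=30)  # re-insert: expiry moves to t = 59
now[0] = 45
print(store.get("last"))  # miller
now[0] = 59
print(store.get("last"))  # None
```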
DevCenter
Example Time: Customer event store
An example: Customer event store
• Customer event
- customer_id e.g. ChrisBatey
- event_type e.g. login, logout, add_to_basket, remove_from_basket, buy_item
• Staff
- name e.g. Charlie
- favourite_colour e.g. red
• Store
- name
- type e.g. Website, PhoneApp, Phone, Retail
Requirements
• Get all events
• Get all events for a particular customer
• As above for a time slice
Modelling in a relational database
CREATE TABLE customer_events(
customer_id text,
staff_name text,
time timeuuid,
event_type text,
store_name text,
PRIMARY KEY (customer_id));
CREATE TABLE store(
name text,
location text,
store_type text,
PRIMARY KEY (name));
CREATE TABLE staff(
name text,
favourite_colour text,
job_title text,
PRIMARY KEY (name));
Your model should look like your queries
Modelling in Cassandra
CREATE TABLE customer_events(
customer_id text,
staff_id text,
time timeuuid,
store_type text,
event_type text,
tags map<text, text>,
PRIMARY KEY ((customer_id), time));
Partition Key: customer_id
Clustering Column(s): time
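A sketch of why this key layout serves the per-customer requirements: the partition key picks one partition, and the clustering column keeps its rows sorted by time, so a time slice is a contiguous range. The model below uses a dict of sorted lists and ISO formatted time strings in place of timeuuid values. All function names are invented for illustration.

```python
from bisect import bisect_left, insort
from collections import defaultdict

# customer_id -> list of (time, event_type), kept sorted by time
partitions = defaultdict(list)


def insert_event(customer_id, time, event_type):
    """Write path: place the row in time order inside its partition."""
    insort(partitions[customer_id], (time, event_type))


def events_for(customer_id):
    """All events for one customer: read a single partition."""
    return list(partitions[customer_id])


def events_between(customer_id, start, end):
    """Events with start <= time < end, found by binary search."""
    rows = partitions[customer_id]
    low = bisect_left(rows, (start,))
    high = bisect_left(rows, (end,))
    return rows[low:high]


insert_event("chbatey", "2014-11-18 16:54:00", "basket_add")
insert_event("chbatey", "2014-11-18 16:52:21", "login")
insert_event("chbatey", "2014-11-18 16:53:21", "basket_add")
insert_event("charles", "2014-11-18 16:52:04", "basket_add")

print(events_between("chbatey", "2014-11-18 16:53:00", "2014-11-18 16:54:00"))
# [('2014-11-18 16:53:21', 'basket_add')]
```

Getting all events across every customer has no such shortcut in this model, since it means visiting every partition.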
How it is stored on disk
customer_id  time                 event_type  store_type  tags
charles      2014-11-18 16:52:04  basket_add  online      {'item': 'coffee'}
charles      2014-11-18 16:53:00  basket_add  online      {'item': 'wine'}
charles      2014-11-18 16:53:09  logout      online      {}
chbatey      2014-11-18 16:52:21  login       online      {}
chbatey      2014-11-18 16:53:21  basket_add  online      {'item': 'coffee'}
chbatey      2014-11-18 16:54:00  basket_add  online      {'item': 'cheese'}

Partition charles (one group of cells per event, in time order):
- event_type: basket_add, staff_id: n/a, store_type: online, tags:item: coffee
- event_type: basket_add, staff_id: n/a, store_type: online, tags:item: wine
- event_type: logout, staff_id: n/a, store_type: online
Partition chbatey (one group of cells per event, in time order):
- event_type: login, staff_id: n/a, store_type: online
- event_type: basket_add, staff_id: n/a, store_type: online, tags:item: coffee
- event_type: basket_add, staff_id: n/a, store_type: online, tags:item: cheese
Spark Time

Manchester Hadoop User Group: Cassandra Intro
