Introduction to Cassandra 
DuyHai DOAN, Technical Advocate 
@doanduyhai
Shameless self-promotion! 
@doanduyhai 
2 
Duy Hai DOAN 
Cassandra technical advocate 
• talks, meetups, confs 
• open-source devs (Achilles, …) 
• Europe technical point of contact 
☞ duy_hai.doan@datastax.com 
• production troubleshooting
Datastax! 
@doanduyhai 
3 
• Founded in April 2010 
• We drive Apache Cassandra™ 
• 400+ customers (25 of the Fortune 100), 200+ employees 
• Home to Cassandra chair & most committers (≈80%) 
• Headquartered in San Francisco Bay area 
• EU headquarters in London, offices in France and Germany
Agenda! 
@doanduyhai 
4 
Architecture 
• Cluster, Replication, Consistency 
Data model 
• Last Write Win (LWW), CQL basics, From SQL to CQL 
Dev Center Demo 
DSE overview 
CQL In Depth (time permitting)
Cassandra history! 
@doanduyhai 
5 
NoSQL database 
• created at Facebook 
• open-sourced since 2008 
• current version = 2.1 
• column-oriented ☞ distributed table
Cassandra 5 key facts! 
@doanduyhai 
6 
Linear scalability 
Small & « huge » scale 
• 2 à1k+ nodes cluster 
• 3Gb à Pb+
Cassandra 5 key facts! 
@doanduyhai 
7 
Continuous availability (≈100% up-time) 
• resilient architecture (Dynamo) 
• rolling upgrades 
• data backward compatible n/n+1 versions
Cassandra 5 key facts! 
@doanduyhai 
8 
Multi-data centers 
• out-of-the-box (config only) 
• AWS conf for multi-region DCs 
• GCE/CloudStack support 
• resilience, work-load segregation 
• virtual data-centers
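Multi-DC replication really is config only: it is declared per keyspace. A minimal sketch, assuming two data centers named 'eu-west' and 'us-east' (illustrative names): 

CREATE KEYSPACE my_ks 
 WITH replication = {'class': 'NetworkTopologyStrategy', 
 'eu-west': 3, 'us-east': 2};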
Cassandra 5 key facts! 
@doanduyhai 
9 
Operational simplicity 
• 1 node = 1 process + 1 config file 
• deployment automation 
• OpsCenter for monitoring
Cassandra 5 key facts! 
@doanduyhai 
10 
Analytics combo 
• Cassandra + Spark = awesome ! 
• realtime streaming
Cassandra architecture! 
Cluster 
Replication 
Consistency
Cassandra architecture! 
@doanduyhai 
12 
Cluster layer 
• Amazon Dynamo paper 
• masterless architecture 
Data-store layer 
• Google BigTable paper 
• columns / column families
Cassandra architecture! 
@doanduyhai 
13 
API (CQL & RPC) 
CLUSTER (DYNAMO) 
DATA STORE (BIG TABLES) 
DISKS 
Node1 
Client request 
API (CQL & RPC) 
CLUSTER (DYNAMO) 
DATA STORE (BIG TABLES) 
DISKS 
Node2
Data distribution! 
@doanduyhai 
14 
Random: the partition key is hashed → token = hash(#p) 
Hash range: ]-X, X] 
X = huge number (2⁶⁴/2 = 2⁶³) 
n1 
n2 
n3 
n4 
n5 
n6 
n7 
n8
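You can observe the hashing directly from CQL; a small sketch, assuming the users table defined later in this deck: 

SELECT login, token(login) FROM users; 
-- each login maps to a token in ]-2⁶³, 2⁶³], which determines the owning node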
Token Ranges! 
@doanduyhai 
15 
A: ]0, X/8] 
B: ] X/8, 2X/8] 
C: ] 2X/8, 3X/8] 
D: ] 3X/8, 4X/8] 
E: ] 4X/8, 5X/8] 
F: ] 5X/8, 6X/8] 
G: ] 6X/8, 7X/8] 
H: ] 7X/8, X] 
n1 
n2 
n3 
n4 
n5 
n6 
n7 
n8 
A 
B 
C 
D 
E 
F 
G 
H
Linear scalability! 
@doanduyhai 
16 
n1 
n2 
8 nodes 10 nodes 
n3 
n4 
n5 
n6 
n7 
n8 
n1 
n2 
n3 n4 
n5 
n6 
n7 
n9 n8 
n10
Failure tolerance! 
@doanduyhai 
17 
Replication Factor (RF) = 3 
n1 
n2 
n3 
n4 
n5 
n6 
n7 
n8 
1 
2 
3 
{B, A, H} 
{C, B, A} 
{D, C, B} 
A 
B 
C 
D 
E 
F 
G 
H
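The Replication Factor is set at keyspace creation; a minimal single-DC sketch (keyspace name is illustrative): 

CREATE KEYSPACE demo 
 WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};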
Coordinator node! 
Incoming requests (read/write) 
Coordinator node handles the request 
Every node can be coordinator → masterless 
@doanduyhai 
n1 
n2 
n3 
n4 
n5 
n6 
n7 
n8 
1 
2 
3 
coordinator 
request
Consistency! 
@doanduyhai 
19 
Tunable at runtime 
• ONE 
• QUORUM (strict majority w.r.t. RF) 
• ALL 
Applies to both reads & writes
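For instance, in cqlsh the session-level consistency can be switched on the fly (drivers expose the same setting per statement): 

CONSISTENCY QUORUM; -- subsequent reads & writes use QUORUM 
CONSISTENCY ONE; -- back to ONE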
Write consistency! 
Write ONE 
• write request to all replicas in parallel 
@doanduyhai 
n1 
n2 
n3 
n4 
n5 
n6 
n7 
n8 
1 
2 
3 
coordinator
Write consistency! 
Write ONE 
• write request to all replicas in parallel 
• wait for ONE ack before returning to client 
@doanduyhai 
n1 
n2 
n3 
n4 
n5 
n6 
n7 
n8 
1 
2 
3 
coordinator 
5 μs
Write consistency! 
Write ONE 
• write request to all replicas in parallel 
• wait for ONE ack before returning to client 
• other acks later, asynchronously 
@doanduyhai 
n1 
n2 
n3 
n4 
n5 
n6 
n7 
n8 
1 
2 
3 
coordinator 
5 μs 
10 μs 
120 μs
Write consistency! 
Write QUORUM 
• write request to all replicas in parallel 
• wait for QUORUM acks before returning to client 
• other acks later, asynchronously 
@doanduyhai 
n1 
n2 
n3 
n4 
n5 
n6 
n7 
n8 
1 
2 
3 
coordinator 
5 μs 
10 μs 
120 μs
Read consistency! 
Read ONE 
• read from one node among all replicas 
@doanduyhai 
n1 
n2 
n3 
n4 
n5 
n6 
n7 
n8 
1 
2 
3 
coordinator
Read consistency! 
Read ONE 
• read from one node among all replicas 
• contact the fastest node (latency stats) 
@doanduyhai 
n1 
n2 
n3 
n4 
n5 
n6 
n7 
n8 
1 
2 
3 
coordinator
Read consistency! 
Read QUORUM 
• read from the fastest replica 
@doanduyhai 
n1 
n2 
n3 
n4 
n5 
n6 
n7 
n8 
1 
2 
3 
coordinator
Read consistency! 
Read QUORUM 
• read from the fastest replica 
• AND request a digest from the other replicas to reach QUORUM 
@doanduyhai 
n1 
n2 
n3 
n4 
n5 
n6 
n7 
n8 
1 
2 
3 
coordinator
Read consistency! 
Read QUORUM 
• read from the fastest replica 
• AND request a digest from the other replicas to reach QUORUM 
• return most up-to-date data to client 
@doanduyhai 
n1 
n2 
n3 
n4 
n5 
n6 
n7 
n8 
1 
2 
3 
coordinator
Read consistency! 
Read QUORUM 
• read from the fastest replica 
• AND request a digest from the other replicas to reach QUORUM 
• return most up-to-date data to client 
• repair if digest mismatch 
n1 
@doanduyhai 
n2 
n3 
n4 
n5 
n6 
n7 
n8 
1 
2 
3 
coordinator
Consistency trade-off! 
@doanduyhai 
30
Consistency in action! 
@doanduyhai 
31 
RF = 3, Write ONE, Read ONE 
B A A 
B A A 
Read ONE: A 
data replication in progress … 
Write ONE: B
Consistency in action! 
@doanduyhai 
32 
RF = 3, Write ONE, Read QUORUM 
B A A 
Write ONE: B 
Read QUORUM: A 
data replication in progress … 
B A A
Consistency in action! 
@doanduyhai 
33 
RF = 3, Write ONE, Read ALL 
B A A 
Read ALL: B 
data replication in progress … 
B A A 
Write ONE: B
Consistency in action! 
@doanduyhai 
34 
RF = 3, Write QUORUM, Read ONE 
B B A 
Write QUORUM: B 
Read ONE: A 
data replication in progress … 
B B A
Consistency in action! 
@doanduyhai 
35 
RF = 3, Write QUORUM, Read QUORUM 
B B A 
Read QUORUM: B 
data replication in progress … 
B B A 
Write QUORUM: B
Consistency level! 
@doanduyhai 
36 
ONE 
Fast, may not read latest written value
Consistency level! 
@doanduyhai 
37 
QUORUM 
Strict majority w.r.t. Replication Factor 
Good balance
Consistency level! 
@doanduyhai 
38 
ALL 
Paranoid 
Slow, no high availability
Consistency summary! 
ONE read + ONE write 
☞ available for reads/writes even with (N-1) replicas down 
QUORUM read + QUORUM write 
☞ available for reads/writes even with 1 replica down 
@doanduyhai 39
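The rule behind this summary: a read is guaranteed to see the latest write whenever (replicas read) + (replicas acked on write) > RF. A worked check with RF = 3: 

QUORUM + QUORUM: 2 + 2 = 4 > 3 ☞ strong consistency 
ONE + ONE: 1 + 1 = 2 ≤ 3 ☞ may read stale data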
! " 
! 
Q & R
Data model! 
Cassandra Write Path! 
Last Write Win! 
CQL basics! 
From SQL to CQL!
Cassandra Write Path! 
@doanduyhai 
42 
Commit log1 
. . . 
1 
Commit log2 
Commit logn 
Memory
Cassandra Write Path! 
@doanduyhai 
43 
Memory 
MemTable 
Table1 
Commit log1 
. . . 
1 
Commit log2 
Commit logn 
MemTable 
Table2 
MemTable 
TableN 
2 
. . .
Cassandra Write Path! 
@doanduyhai 
44 
Commit log1 
Commit log2 
Commit logn 
Table1 
Table2 Table3 
SStable2 SStable3 3 
SStable1 
Memory 
. . .
Cassandra Write Path! 
@doanduyhai 
45 
MemTable . . . Memory 
Table1 
Commit log1 
Commit log2 
Commit logn 
Table1 
SStable1 
Table2 Table3 
SStable2 SStable3 
MemTable 
Table2 
MemTable 
TableN 
. . .
Cassandra Write Path! 
@doanduyhai 
46 
Commit log1 
Commit log2 
SStable3 . . . 
Commit logn 
Table1 
SStable1 
Memory 
Table2 Table3 
SStable2 SStable3 
SStable1 
SStable2
Last Write Win (LWW)! 
INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33); 
@doanduyhai 
47 
jdoe 
age 
name 
33 John DOE 
#partition
Last Write Win (LWW)! 
@doanduyhai 
jdoe 
age (t1) name (t1) 
33 John DOE 
48 
INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33); 
auto-generated timestamp (μs) 
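The generated timestamp can be read back, or even forced, from CQL; a small sketch reusing the users table: 

SELECT name, writetime(name) FROM users WHERE login = 'jdoe'; 
INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33) 
 USING TIMESTAMP 1412419763515000; -- explicit timestamp in μs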
Last Write Win (LWW)! 
@doanduyhai 
49 
UPDATE users SET age = 34 WHERE login = 'jdoe'; 
jdoe 
SSTable1 SSTable2 
age (t1) name (t1) 
33 John DOE 
jdoe 
age (t2) 
34
Last Write Win (LWW)! 
@doanduyhai 
50 
DELETE age FROM users WHERE login = 'jdoe'; 
tombstone 
SSTable1 SSTable2 SSTable3 
jdoe 
age (t3) 
✕ 
jdoe 
age (t1) name (t1) 
33 John DOE 
jdoe 
age (t2) 
34
Last Write Win (LWW)! 
@doanduyhai 
51 
SELECT age FROM users WHERE login = 'jdoe'; 
? ? ? 
SSTable1 SSTable2 SSTable3 
jdoe 
age (t3) 
✕ 
jdoe 
age (t1) name (t1) 
33 John DOE 
jdoe 
age (t2) 
34
Last Write Win (LWW)! 
@doanduyhai 
52 
SELECT age FROM users WHERE login = 'jdoe'; 
✕ ✕ ✓ 
SSTable1 SSTable2 SSTable3 
jdoe 
age (t3) 
✕ 
jdoe 
age (t1) name (t1) 
33 John DOE 
jdoe 
age (t2) 
34
Compaction! 
@doanduyhai 
53 
SSTable1 SSTable2 SSTable3 
jdoe 
age (t3) 
✕ 
jdoe 
age (t1) name (t1) 
33 John DOE 
jdoe 
age (t2) 
34 
New SSTable 
jdoe 
age (t3) name (t1) 
ý John DOE
CRUD operations! 
@doanduyhai 
54 
INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33); 
UPDATE users SET age = 34 WHERE login = 'jdoe'; 
DELETE age FROM users WHERE login = 'jdoe'; 
SELECT age FROM users WHERE login = 'jdoe';
Simple Table! 
@doanduyhai 
55 
CREATE TABLE users ( 
login text, 
name text, 
age int, 
… 
PRIMARY KEY(login)); 
partition key (#partition)
Clustered table! 
@doanduyhai 
56 
CREATE TABLE mailbox ( 
login text, 
message_id timeuuid, 
interlocutor text, 
message text, 
PRIMARY KEY((login), message_id)); 
partition key · clustering column ☞ unicity
Queries! 
@doanduyhai 
57 
Get message by user and message_id (date) 
SELECT * FROM mailbox WHERE login = 'jdoe' 
and message_id = '2014-09-25 16:00:00'; 
Get message by user and date interval 
SELECT * FROM mailbox WHERE login = 'jdoe' 
and message_id <= '2014-09-25 16:00:00' 
and message_id >= '2014-09-20 16:00:00';
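Strictly speaking, message_id is a timeuuid, so date literals go through the minTimeuuid()/maxTimeuuid() functions; the interval query above would be spelled: 

SELECT * FROM mailbox WHERE login = 'jdoe' 
 and message_id <= maxTimeuuid('2014-09-25 16:00:00') 
 and message_id >= minTimeuuid('2014-09-20 16:00:00');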
Queries! 
@doanduyhai 
58 
Get message by message_id only (#partition not provided) 
SELECT * FROM mailbox WHERE message_id = '2014-09-25 16:00:00'; 
Get message by date interval (#partition not provided) 
SELECT * FROM mailbox WHERE 
message_id <= '2014-09-25 16:00:00' 
and message_id >= '2014-09-20 16:00:00';
Queries! 
Get message by user range (range query on #partition) 
Get message by user pattern (non exact match on #partition) 
@doanduyhai 
59 
SELECT * FROM mailbox WHERE login >= 'hsue' and login <= 'jdoe'; 
SELECT * FROM mailbox WHERE login like '%doe%';
WHERE clause restrictions! 
@doanduyhai 
60 
All queries (INSERT/UPDATE/DELETE/SELECT) must provide the #partition 
Only exact match (=) on #partition; range queries (<, ≤, >, ≥) are not allowed 
• ☞ they would require a full cluster scan 
On clustering columns, only exact match (=) and range queries (<, ≤, >, ≥) 
WHERE clause only possible on columns defined in the PRIMARY KEY
WHERE clause restrictions! 
@doanduyhai 
61 
What if I want to perform an « arbitrary » WHERE clause ? 
• search form scenario, dynamic search fields
WHERE clause restrictions! 
@doanduyhai 
62 
What if I want to perform an « arbitrary » WHERE clause ? 
• search form scenario, dynamic search fields 
☞ Apache Solr (Lucene) integration (DSE) 
SELECT * FROM users WHERE solr_query = 'age:[33 TO *] AND sex:male'; 
SELECT * FROM users WHERE solr_query = 'lastname:*schwei?er';
Collections & maps! 
@doanduyhai 
63 
CREATE TABLE users ( 
login text, 
name text, 
age int, 
friends set<text>, 
hobbies list<text>, 
languages map<int, text>, 
… 
PRIMARY KEY(login));
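Collections come with dedicated literals and in-place mutations; a few illustrative statements (values made up): 

INSERT INTO users(login, name, age, friends, hobbies, languages) 
 VALUES('jdoe', 'John DOE', 33, {'hsue'}, ['chess'], {1: 'English'}); 

UPDATE users SET friends = friends + {'alice'}, -- add to set 
 hobbies = hobbies + ['running'], -- append to list 
 languages[2] = 'French' -- put into map 
 WHERE login = 'jdoe';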
User Defined Type (UDT)! 
Instead of 
@doanduyhai 
64 
CREATE TABLE users ( 
login text, 
… 
street_number int, 
street_name text, 
postcode int, 
country text, 
… 
PRIMARY KEY(login));
User Defined Type (UDT)! 
@doanduyhai 
65 
CREATE TYPE address ( 
street_number int, 
street_name text, 
postcode int, 
country text); 
CREATE TABLE users ( 
login text, 
… 
location frozen <address>, 
… 
PRIMARY KEY(login));
UDT insert! 
@doanduyhai 
66 
INSERT INTO users(login, name, location) VALUES ( 
'jdoe', 
'John DOE', 
{ 
street_number: 124, 
street_name: 'Congress Avenue', 
postcode: 95054, 
country: 'USA' 
});
UDT update! 
@doanduyhai 
67 
UPDATE users SET location = 
{ 
street_number: 125, 
street_name: 'Congress Avenue', 
postcode: 95054, 
country: 'USA' 
} 
WHERE login = 'jdoe';
From SQL to CQL! 
@doanduyhai 
68 
Remember…
From SQL to CQL! 
@doanduyhai 
69 
Remember… 
CQL is not SQL
From SQL to CQL! 
@doanduyhai 
70 
Remember… 
there is no join 
(do you want to scale ?)
From SQL to CQL! 
@doanduyhai 
71 
Remember… 
there is no integrity constraint 
(do you want to read-before-write ?)
From SQL to CQL! 
@doanduyhai 
72 
Paradigm change 
• space is cheap (somehow …), latency is precious 
• embrace immutability 
• think query first 
• denormalize !!!
From SQL to CQL! 
@doanduyhai 
73 
Normalized 
User 
1 
n 
Comment 
CREATE TABLE comments ( 
article_id uuid, 
comment_id timeuuid, 
author_id text, // typical join id 
content text, 
PRIMARY KEY((article_id), comment_id));
From SQL to CQL! 
@doanduyhai 
74 
De-normalized 
User 
1 
n 
Comment 
CREATE TABLE comments ( 
article_id uuid, 
comment_id timeuuid, 
author frozen <person>, // person is a UDT 
content text, 
PRIMARY KEY((article_id), comment_id));
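The person UDT is not spelled out here; a plausible definition, mirroring the Person UDT fields listed on the next slides (field names are assumptions): 

CREATE TYPE person ( 
 firstname text, 
 lastname text, 
 birthdate timestamp, 
 gender text, 
 mood text, 
 location text);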
Data modeling best practices! 
@doanduyhai 
75 
Start by queries 
• identify core functional read paths 
• 1 read scenario ≈ 1 SELECT
Data modeling best practices! 
@doanduyhai 
76 
Start by queries 
• identify core functional read paths 
• 1 read scenario ≈ 1 SELECT 
Denormalize 
• wisely, only duplicate necessary & immutable data 
• functional/technical trade-off
Data modeling best practices! 
@doanduyhai 
77 
Person UDT 
- firstname/lastname 
- date of birth 
- gender 
- mood 
- location
Data modeling best practices! 
@doanduyhai 
78 
John DOE, male 
birthdate: 21/02/1981 
subscribed since 03/06/2011 
☉ San Mateo, CA 
"Impossible is not John DOE" 
Full detail read from 
User table on click
! " 
! 
Q & R
Dev Center 
Demo
DSE (Datastax Enterprise)! 
@doanduyhai 
81 
Security 
Analytics (Spark & Hadoop) 
Search (Solr)
OpsCenter Enterprise! 
@doanduyhai 
82
Training Day | December 3rd 
Beginner Track 
• Introduction to Cassandra 
• Introduction to Spark, Shark, Scala and 
Cassandra 
Advanced Track 
• Data Modeling 
• Performance Tuning 
Conference Day | December 4th 
Cassandra Summit Europe 2014 will be the single 
largest gathering of Cassandra users in Europe. 
Learn how the world's most successful companies are 
transforming their businesses and growing faster than 
ever using Apache Cassandra. 
http://bit.ly/cassandrasummit2014 
@doanduyhai Company Confidential 83
CQL In Depth! 
Simple Table! 
Clustered Table! 
Bucketing!
Storage Engine! 
@doanduyhai 
85 
#partition1 
#col1 #col2 #col3 #col4 
cell1 cell2 cell3 cell4 
#partition2 
#col1 #col2 #col3 
cell1 cell2 cell3 
#partition3 
#col1 #col2 
cell1 cell2 
#partition4 
#col1 #col2 #col3 #col4 … 
cell1 cell2 cell3 cell4 … 
Partition Key 
Column Name 
Cell
Data Model Abstraction! 
@doanduyhai 
86 
Table ≈ Map<#p,SortedMap<#col,cell>> 
(the outer Map is physically a SortedMap<token, …>)
Data Model Abstraction! 
@doanduyhai 
87 
Table ≈ Map<#p, SortedMap<#col, cell>> 
• Map<#p, …> ☞ unicity 
• SortedMap<#col, cell> ☞ sort 
• the outer map is physically a SortedMap<token, …>
Static Data Type! 
Partition Key Type Column Name Type Cell Type 
@doanduyhai 
88 
Table ≈ Map<#p,SortedMap<#col,cell>> 
Native types: bigint, blob, counter, decimal, double, float, inet, int, 
timestamp, timeuuid, uuid.
Simple Table Mapping! 
@doanduyhai 
89 
CREATE TABLE users ( 
login text, 
name text, 
age int, 
… 
PRIMARY KEY(login)); 
Map<login,SortedMap<column_label,value>>! 
text text blob
Simple Table Mapping! 
@doanduyhai 
90 
INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33); 
INSERT INTO users(login, name, age) VALUES('hsue', 'Helen SUE', 26); 
RowKey: jdoe 
=> (name=, value=, timestamp=1412419763515000) 
=> (name=age, value=00000021, timestamp=1412419763515000) 
=> (name=name, value=4a6f686e20444f45, timestamp=1412419763515000) 
RowKey: hsue 
=> (name=, value=, timestamp=1412419776578000) 
=> (name=age, value=0000001c, timestamp=1412419776578000) 
=> (name=name, value=48656c656e20535545, timestamp=1412419776578000)!
Simple Table Mapping! 
@doanduyhai 
91 
INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33); 
INSERT INTO users(login, name, age) VALUES('hsue', 'Helen SUE', 26); 
RowKey: jdoe 
=> (name=, value=, timestamp=1412419763515000) 
=> (name=age, value=00000021, timestamp=1412419763515000) 
=> (name=name, value=4a6f686e20444f45, timestamp=1412419763515000) 
RowKey: hsue 
=> (name=, value=, timestamp=1412419776578000) 
=> (name=age, value=0000001c, timestamp=1412419776578000) 
=> (name=name, value=48656c656e20535545, timestamp=1412419776578000)! 
Marker column
Simple Table Mapping! 
@doanduyhai 
92 
INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33); 
INSERT INTO users(login, name, age) VALUES('hsue', 'Helen SUE', 26); 
RowKey: jdoe 
=> (name=, value=, timestamp=1412419763515000) 
=> (name=age, value=00000021, timestamp=1412419763515000) 
=> (name=name, value=4a6f686e20444f45, timestamp=1412419763515000) 
RowKey: hsue 
=> (name=, value=, timestamp=1412419776578000) 
=> (name=age, value=0000001c, timestamp=1412419776578000) 
=> (name=name, value=48656c656e20535545, timestamp=1412419776578000)! 
Sorted 
column_label
Simple Table Mapping! 
@doanduyhai 
93 
INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33); 
INSERT INTO users(login, name, age) VALUES('hsue', 'Helen SUE', 26); 
RowKey: jdoe 
=> (name=, value=, timestamp=1412419763515000) 
=> (name=age, value=00000021, timestamp=1412419763515000) 
=> (name=name, value=4a6f686e20444f45, timestamp=1412419763515000) 
RowKey: hsue 
=> (name=, value=, timestamp=1412419776578000) 
=> (name=age, value=0000001c, timestamp=1412419776578000) 
=> (name=name, value=48656c656e20535545, timestamp=1412419776578000)! 
Values 
as bytes
Clustered Table Mapping! 
@doanduyhai 
94 
CREATE TABLE daily_3g_quality_per_city ( 
operator text, 
city text, 
date int, //date as YYYYMMdd 
latency_ms int, 
power_watt double, 
PRIMARY KEY((operator), city, date)); 
Map<operator, 
SortedMap<city, 
SortedMap<date, 
SortedMap<column_label,value>>>>!
Clustered Table Mapping! 
@doanduyhai 
95 
RowKey: verizon 
=> (name=Austin:20140910:, value=, timestamp=…) 
=> (name=Austin:20140910:latency_ms, value=000000e6, timestamp=…) 
=> (name=Austin:20140910:power_watt, value=3ff3333333333333, timestamp=…) 
=> (name=Austin:20140911:, value=, timestamp=…) 
=> (name=Austin:20140911:latency_ms, value=000000d4, timestamp=…) 
=> (name=Austin:20140911:power_watt, value=3ff6666666666666, timestamp=…) 
=> (name=New York:20140913:, value=, timestamp=1412422893832000) 
=> (name=New York:20140913:latency_ms, value=0000007b, timestamp=…) 
=> (name=New York:20140913:power_watt, value=3ffb333333333333, timestamp=…) 
=> (name=New York:20140917:, value=, timestamp=…) 
=> (name=New York:20140917:latency_ms, value=00000067, timestamp=…) 
=> (name=New York:20140917:power_watt, value=3ffe666666666666, timestamp=…)
Clustered Table Mapping! 
@doanduyhai 
96 
RowKey: verizon 
=> (name=Austin:20140910:, value=, timestamp=…) 
=> (name=Austin:20140910:latency_ms, value=000000e6, timestamp=…) 
=> (name=Austin:20140910:power_watt, value=3ff3333333333333, timestamp=…) 
=> (name=Austin:20140911:, value=, timestamp=…) 
=> (name=Austin:20140911:latency_ms, value=000000d4, timestamp=…) 
=> (name=Austin:20140911:power_watt, value=3ff6666666666666, timestamp=…) 
=> (name=New York:20140913:, value=, timestamp=1412422893832000) 
=> (name=New York:20140913:latency_ms, value=0000007b, timestamp=…) 
=> (name=New York:20140913:power_watt, value=3ffb333333333333, timestamp=…) 
=> (name=New York:20140917:, value=, timestamp=…) 
=> (name=New York:20140917:latency_ms, value=00000067, timestamp=…) 
=> (name=New York:20140917:power_watt, value=3ffe666666666666, timestamp=…) 
Sort first by 
city
Clustered Table Mapping! 
@doanduyhai 
97 
RowKey: verizon 
=> (name=Austin:20140910:, value=, timestamp=…) 
=> (name=Austin:20140910:latency_ms, value=000000e6, timestamp=…) 
=> (name=Austin:20140910:power_watt, value=3ff3333333333333, timestamp=…) 
=> (name=Austin:20140911:, value=, timestamp=…) 
=> (name=Austin:20140911:latency_ms, value=000000d4, timestamp=…) 
=> (name=Austin:20140911:power_watt, value=3ff6666666666666, timestamp=…) 
=> (name=New York:20140913:, value=, timestamp=1412422893832000) 
=> (name=New York:20140913:latency_ms, value=0000007b, timestamp=…) 
=> (name=New York:20140913:power_watt, value=3ffb333333333333, timestamp=…) 
=> (name=New York:20140917:, value=, timestamp=…) 
=> (name=New York:20140917:latency_ms, value=00000067, timestamp=…) 
=> (name=New York:20140917:power_watt, value=3ffe666666666666, timestamp=…) 
… then by 
date
Clustered Table Mapping! 
@doanduyhai 
98 
RowKey: verizon 
=> (name=Austin:20140910:, value=, timestamp=…) 
=> (name=Austin:20140910:latency_ms, value=000000e6, timestamp=…) 
=> (name=Austin:20140910:power_watt, value=3ff3333333333333, timestamp=…) 
=> (name=Austin:20140911:, value=, timestamp=…) 
=> (name=Austin:20140911:latency_ms, value=000000d4, timestamp=…) 
=> (name=Austin:20140911:power_watt, value=3ff6666666666666, timestamp=…) 
=> (name=New York:20140913:, value=, timestamp=1412422893832000) 
=> (name=New York:20140913:latency_ms, value=0000007b, timestamp=…) 
=> (name=New York:20140913:power_watt, value=3ffb333333333333, timestamp=…) 
=> (name=New York:20140917:, value=, timestamp=…) 
=> (name=New York:20140917:latency_ms, value=00000067, timestamp=…) 
=> (name=New York:20140917:power_watt, value=3ffe666666666666, timestamp=…) 
… then by 
column_label
Query With Clustered Table! 
Select by operator and city for all dates 
Select by operator and city range for all dates 
@doanduyhai 
99 
SELECT * FROM daily_3g_quality_per_city 
WHERE operator = 'verizon' AND city = 'Austin'; 
SELECT * FROM daily_3g_quality_per_city 
WHERE operator = 'verizon' AND city >= 'Austin' AND city <= 'New York';
Query With Clustered Table! 
Select by operator and city and date 
Select by operator and city and range of date 
@doanduyhai 
100 
SELECT * FROM daily_3g_quality_per_city 
WHERE operator = 'verizon' AND city = 'Austin' AND date = 20140910; 
SELECT * FROM daily_3g_quality_per_city 
WHERE operator = 'verizon' AND city = 'Austin' 
AND date >= 20140910 AND date <= 20140913;
Query With Clustered Table! 
@doanduyhai 
101 
Select by operator and city and date tuples 
SELECT * FROM daily_3g_quality_per_city 
WHERE operator = 'verizon' AND city = 'Austin' 
AND date IN (20140910, 20140913);
Query With Clustered Table! 
@doanduyhai 
102 
Select by operator and date without city 
SELECT * FROM daily_3g_quality_per_city 
WHERE operator = 'verizon' AND date = 20140910; 
Map<operator, 
SortedMap<city, 
SortedMap<date, 
SortedMap<column_label,value>>>>!
Bucketing! 
@doanduyhai 
103 
CREATE TABLE sensor_data ( 
sensor_id text, 
date timestamp, 
raw_data blob, 
PRIMARY KEY(sensor_id, date)); 
sensor_id 
date1 date2 date3 date4 … 
blob1 blob2 blob3 blob4 …
Bucketing! 
@doanduyhai 
104 
Problems: 
• limit of 2×10⁹ physical columns per partition 
• bad load balancing (1 sensor = 1 node) 
• wide row spans over many files 
sensor_id 
date1 date2 date3 date4 … 
blob1 blob2 blob3 blob4 …
Bucketing! 
@doanduyhai 
105 
Idea: 
• composite partition key: sensor_id:date_bucket 
• tunable date granularity: per hour/per day/per month … 
CREATE TABLE sensor_data ( 
sensor_id text, 
date_bucket int, //format YYYYMMddHH here (hour granularity) 
date timestamp, 
raw_data blob, 
PRIMARY KEY((sensor_id, date_bucket), date));
Bucketing! 
Idea: 
• composite partition key: sensor_id:date_bucket 
• tunable date granularity: per hour/per day/per month … 
@doanduyhai 
106 
sensor_id:2014091014 
date1 date2 date3 date4 … 
blob1 blob2 blob3 blob4 … 
sensor_id:2014091015 
date11 date12 date13 date14 … 
blob11 blob12 blob13 blob14 … 
Buckets
Bucketing! 
@doanduyhai 
107 
Advantage: 
• distribute load: 1 bucket = 1 node 
• limit partition width (max x columns per bucket) 
Buckets 
sensor_id:2014091014 
date1 date2 date3 date4 … 
blob1 blob2 blob3 blob4 … 
sensor_id:2014091015 
date11 date12 date13 date14 … 
blob11 blob12 blob13 blob14 …
Bucketing! 
@doanduyhai 
108 
But how can I select raw data between 14:45 and 15:10 ? 
14:45 à ? 
15:00 à 15:10 
sensor_id:2014091014 
date1 date2 date3 date4 … 
blob1 blob2 blob3 blob4 … 
sensor_id:2014091015 
date11 date12 date13 date14 … 
blob11 blob12 blob13 blob14 …
Bucketing! 
Solution 
• use IN clause on partition key component 
• with range condition on date column 
☞ date column should be monotonic function (increasing/decreasing) 
@doanduyhai 
109 
SELECT * FROM sensor_data WHERE sensor_id = 'xxx' 
AND date_bucket IN (2014091014, 2014091015) 
AND date >= '2014-09-10 14:45:00.000' 
AND date <= '2014-09-10 15:10:00.000';
Bucketing Caveats! 
@doanduyhai 
110 
The IN clause on #partition is not a silver bullet ! 
• use sparingly 
• keep cardinality low (≤ 5) 
n1 
n2 
n3 
n4 
n5 
n6 
n7 
coordinator 
n8 
sensor_id:2014091014 
sensor_id:2014091015
Bucketing Caveats! 
@doanduyhai 
111 
The IN clause on #partition is not a silver bullet ! 
• use sparingly 
• keep cardinality low (≤ 5) 
• prefer parallel async queries 
• ease of query vs performance trade-off 
n1 
n2 
n3 
n4 
n5 
n6 
n7 
n8 
Async client 
sensor_id:2014091014 
sensor_id:2014091015
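Concretely, « parallel async queries » means the client issues one single-partition query per bucket and merges the results; the two statements below (hypothetical sensor id) can be fired concurrently with any async driver: 

SELECT * FROM sensor_data WHERE sensor_id = 'sensor-42' 
 AND date_bucket = 2014091014 AND date >= '2014-09-10 14:45:00.000'; 
SELECT * FROM sensor_data WHERE sensor_id = 'sensor-42' 
 AND date_bucket = 2014091015 AND date <= '2014-09-10 15:10:00.000';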
! " 
! 
Q & R
Thank You 
@doanduyhai 
duy_hai.doan@datastax.com 
https://academy.datastax.com/