Introduction to Cassandra 
DuyHai DOAN, Technical Advocate 
@doanduyhai
Shameless self-promotion! 
@doanduyhai 
2 
Duy Hai DOAN 
Cassandra technical advocate 
• talks, meetups, confs 
• open-source devs (Achilles, …) 
• Europe technical point of contact 
☞ duy_hai.doan@datastax.com 
• production troubleshooting
Datastax! 
@doanduyhai 
3 
• Founded in April 2010 
• We drive Apache Cassandra™ 
• 400+ customers (25 of the Fortune 100), 200+ employees 
• Home to Cassandra chair & most committers (≈80%) 
• Headquartered in San Francisco Bay area 
• EU headquarters in London, offices in France and Germany
Agenda! 
@doanduyhai 
4 
Architecture 
• Cluster, Replication, Consistency 
Data model 
• Last Write Win (LWW), CQL basics, From SQL to CQL 
Dev Center Demo 
DSE overview 
CQL In Depth (time permitting)
Cassandra history! 
@doanduyhai 
5 
NoSQL database 
• created at Facebook 
• open-sourced since 2008 
• current version = 2.1 
• column-oriented ☞ distributed table
Cassandra 5 key facts! 
@doanduyhai 
6 
Linear scalability 
Small & « huge » scale 
• 2 à1k+ nodes cluster 
• 3Gb à Pb+
Cassandra 5 key facts! 
@doanduyhai 
7 
Continuous availability (≈100% up-time) 
• resilient architecture (Dynamo) 
• rolling upgrades 
• data backward compatible n/n+1 versions
Cassandra 5 key facts! 
@doanduyhai 
8 
Multi-data centers 
• out-of-the-box (config only) 
• AWS conf for multi-region DCs 
• GCE/CloudStack support 
• resilience, work-load segregation 
• virtual data-centers
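Multi-DC replication really is config only: it is declared per keyspace. A minimal sketch, assuming two data centers named 'eu-west' and 'us-east' (illustrative names): 

CREATE KEYSPACE my_ks 
 WITH replication = {'class': 'NetworkTopologyStrategy', 
 'eu-west': 3, 'us-east': 2};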
Cassandra 5 key facts! 
@doanduyhai 
9 
Operational simplicity 
• 1 node = 1 process + 1 config file 
• deployment automation 
• OpsCenter for monitoring
Cassandra 5 key facts! 
@doanduyhai 
10 
Analytics combo 
• Cassandra + Spark = awesome ! 
• realtime streaming
Cassandra architecture! 
Cluster 
Replication 
Consistency
Cassandra architecture! 
@doanduyhai 
12 
Cluster layer 
• Amazon Dynamo paper 
• masterless architecture 
Data-store layer 
• Google BigTable paper 
• columns / column families
Cassandra architecture! 
@doanduyhai 
13 
API (CQL & RPC) 
CLUSTER (DYNAMO) 
DATA STORE (BIG TABLES) 
DISKS 
Node1 
Client request 
API (CQL & RPC) 
CLUSTER (DYNAMO) 
DATA STORE (BIG TABLES) 
DISKS 
Node2
Data distribution! 
@doanduyhai 
14 
Random: the partition key is hashed → token = hash(#p) 
Hash range: ]-X, X] 
X = huge number (2⁶⁴/2 = 2⁶³) 
n1 
n2 
n3 
n4 
n5 
n6 
n7 
n8
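You can observe the hashing directly from CQL; a small sketch, assuming the users table defined later in this deck: 

SELECT login, token(login) FROM users; 
-- each login maps to a token in ]-2⁶³, 2⁶³], which determines the owning node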
Token Ranges! 
@doanduyhai 
15 
A: ]0, X/8] 
B: ] X/8, 2X/8] 
C: ] 2X/8, 3X/8] 
D: ] 3X/8, 4X/8] 
E: ] 4X/8, 5X/8] 
F: ] 5X/8, 6X/8] 
G: ] 6X/8, 7X/8] 
H: ] 7X/8, X] 
n1 
n2 
n3 
n4 
n5 
n6 
n7 
n8 
A 
B 
C 
D 
E 
F 
G 
H
Linear scalability! 
@doanduyhai 
16 
n1 
n2 
8 nodes 10 nodes 
n3 
n4 
n5 
n6 
n7 
n8 
n1 
n2 
n3 n4 
n5 
n6 
n7 
n9 n8 
n10
Failure tolerance! 
@doanduyhai 
17 
Replication Factor (RF) = 3 
n1 
n2 
n3 
n4 
n5 
n6 
n7 
n8 
1 
2 
3 
{B, A, H} 
{C, B, A} 
{D, C, B} 
A 
B 
C 
D 
E 
F 
G 
H
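The Replication Factor is set at keyspace creation; a minimal single-DC sketch (keyspace name is illustrative): 

CREATE KEYSPACE demo 
 WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};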
Coordinator node! 
Incoming requests (read/write) 
Coordinator node handles the request 
Every node can be coordinator → masterless 
@doanduyhai 
n1 
n2 
n3 
n4 
n5 
n6 
n7 
n8 
1 
2 
3 
coordinator 
request
Consistency! 
@doanduyhai 
19 
Tunable at runtime 
• ONE 
• QUORUM (strict majority w.r.t. RF) 
• ALL 
Applies to both reads & writes
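For instance, in cqlsh the session-level consistency can be switched on the fly (drivers expose the same setting per statement): 

CONSISTENCY QUORUM; -- subsequent reads & writes use QUORUM 
CONSISTENCY ONE; -- back to ONE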
Write consistency! 
Write ONE 
• write request to all replicas in parallel 
@doanduyhai 
n1 
n2 
n3 
n4 
n5 
n6 
n7 
n8 
1 
2 
3 
coordinator
Write consistency! 
Write ONE 
• write request to all replicas in parallel 
• wait for ONE ack before returning to client 
@doanduyhai 
n1 
n2 
n3 
n4 
n5 
n6 
n7 
n8 
1 
2 
3 
coordinator 
5 μs
Write consistency! 
Write ONE 
• write request to all replicas in parallel 
• wait for ONE ack before returning to client 
• other acks later, asynchronously 
@doanduyhai 
n1 
n2 
n3 
n4 
n5 
n6 
n7 
n8 
1 
2 
3 
coordinator 
5 μs 
10 μs 
120 μs
Write consistency! 
Write QUORUM 
• write request to all replicas in parallel 
• wait for QUORUM acks before returning to client 
• other acks later, asynchronously 
@doanduyhai 
n1 
n2 
n3 
n4 
n5 
n6 
n7 
n8 
1 
2 
3 
coordinator 
5 μs 
10 μs 
120 μs
Read consistency! 
Read ONE 
• read from one node among all replicas 
@doanduyhai 
n1 
n2 
n3 
n4 
n5 
n6 
n7 
n8 
1 
2 
3 
coordinator
Read consistency! 
Read ONE 
• read from one node among all replicas 
• contact the fastest node (latency stats) 
@doanduyhai 
n1 
n2 
n3 
n4 
n5 
n6 
n7 
n8 
1 
2 
3 
coordinator
Read consistency! 
Read QUORUM 
• read from the fastest replica 
@doanduyhai 
n1 
n2 
n3 
n4 
n5 
n6 
n7 
n8 
1 
2 
3 
coordinator
Read consistency! 
Read QUORUM 
• read from the fastest replica 
• AND request a digest from the other replicas to reach QUORUM 
@doanduyhai 
n1 
n2 
n3 
n4 
n5 
n6 
n7 
n8 
1 
2 
3 
coordinator
Read consistency! 
Read QUORUM 
• read from the fastest replica 
• AND request a digest from the other replicas to reach QUORUM 
• return most up-to-date data to client 
@doanduyhai 
n1 
n2 
n3 
n4 
n5 
n6 
n7 
n8 
1 
2 
3 
coordinator
Read consistency! 
Read QUORUM 
• read from the fastest replica 
• AND request a digest from the other replicas to reach QUORUM 
• return most up-to-date data to client 
• repair if digest mismatch 
n1 
@doanduyhai 
n2 
n3 
n4 
n5 
n6 
n7 
n8 
1 
2 
3 
coordinator
Consistency trade-off! 
@doanduyhai 
30
Consistency in action! 
@doanduyhai 
31 
RF = 3, Write ONE, Read ONE 
B A A 
B A A 
Read ONE: A 
data replication in progress … 
Write ONE: B
Consistency in action! 
@doanduyhai 
32 
RF = 3, Write ONE, Read QUORUM 
B A A 
Write ONE: B 
Read QUORUM: A 
data replication in progress … 
B A A
Consistency in action! 
@doanduyhai 
33 
RF = 3, Write ONE, Read ALL 
B A A 
Read ALL: B 
data replication in progress … 
B A A 
Write ONE: B
Consistency in action! 
@doanduyhai 
34 
RF = 3, Write QUORUM, Read ONE 
B B A 
Write QUORUM: B 
Read ONE: A 
data replication in progress … 
B B A
Consistency in action! 
@doanduyhai 
35 
RF = 3, Write QUORUM, Read QUORUM 
B B A 
Read QUORUM: B 
data replication in progress … 
B B A 
Write QUORUM: B
Consistency level! 
@doanduyhai 
36 
ONE 
Fast, may not read latest written value
Consistency level! 
@doanduyhai 
37 
QUORUM 
Strict majority w.r.t. Replication Factor 
Good balance
Consistency level! 
@doanduyhai 
38 
ALL 
Paranoid 
Slow, no high availability
Consistency summary! 
ONE read + ONE write 
☞ available for reads/writes even with (N-1) replicas down 
QUORUM read + QUORUM write 
☞ available for reads/writes even with 1 replica down 
@doanduyhai 39
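The rule behind this summary: a read is guaranteed to see the latest write whenever (replicas read) + (replicas acked on write) > RF. A worked check with RF = 3: 

QUORUM + QUORUM: 2 + 2 = 4 > 3 ☞ strong consistency 
ONE + ONE: 1 + 1 = 2 ≤ 3 ☞ may read stale data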
! " 
! 
Q & R
Data model! 
Cassandra Write Path! 
Last Write Win! 
CQL basics! 
From SQL to CQL!
Cassandra Write Path! 
@doanduyhai 
42 
Commit log1 
. . . 
1 
Commit log2 
Commit logn 
Memory
Cassandra Write Path! 
@doanduyhai 
43 
Memory 
MemTable 
Table1 
Commit log1 
. . . 
1 
Commit log2 
Commit logn 
MemTable 
Table2 
MemTable 
TableN 
2 
. . .
Cassandra Write Path! 
@doanduyhai 
44 
Commit log1 
Commit log2 
Commit logn 
Table1 
Table2 Table3 
SStable2 SStable3 3 
SStable1 
Memory 
. . .
Cassandra Write Path! 
@doanduyhai 
45 
MemTable . . . Memory 
Table1 
Commit log1 
Commit log2 
Commit logn 
Table1 
SStable1 
Table2 Table3 
SStable2 SStable3 
MemTable 
Table2 
MemTable 
TableN 
. . .
Cassandra Write Path! 
@doanduyhai 
46 
Commit log1 
Commit log2 
SStable3 . . . 
Commit logn 
Table1 
SStable1 
Memory 
Table2 Table3 
SStable2 SStable3 
SStable1 
SStable2
Last Write Win (LWW)! 
INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33); 
@doanduyhai 
47 
jdoe 
age 
name 
33 John DOE 
#partition
Last Write Win (LWW)! 
@doanduyhai 
jdoe 
age (t1) name (t1) 
33 John DOE 
48 
INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33); 
auto-generated timestamp (μs) 
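The generated timestamp can be read back, or even forced, from CQL; a small sketch reusing the users table: 

SELECT name, writetime(name) FROM users WHERE login = 'jdoe'; 
INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33) 
 USING TIMESTAMP 1412419763515000; -- explicit timestamp in μs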
Last Write Win (LWW)! 
@doanduyhai 
49 
UPDATE users SET age = 34 WHERE login = 'jdoe'; 
jdoe 
SSTable1 SSTable2 
age (t1) name (t1) 
33 John DOE 
jdoe 
age (t2) 
34
Last Write Win (LWW)! 
@doanduyhai 
50 
DELETE age FROM users WHERE login = 'jdoe'; 
tombstone 
SSTable1 SSTable2 SSTable3 
jdoe 
age (t3) 
✕ 
jdoe 
age (t1) name (t1) 
33 John DOE 
jdoe 
age (t2) 
34
Last Write Win (LWW)! 
@doanduyhai 
51 
SELECT age FROM users WHERE login = 'jdoe'; 
? ? ? 
SSTable1 SSTable2 SSTable3 
jdoe 
age (t3) 
✕ 
jdoe 
age (t1) name (t1) 
33 John DOE 
jdoe 
age (t2) 
34
Last Write Win (LWW)! 
@doanduyhai 
52 
SELECT age FROM users WHERE login = 'jdoe'; 
✕ ✕ ✓ 
SSTable1 SSTable2 SSTable3 
jdoe 
age (t3) 
✕ 
jdoe 
age (t1) name (t1) 
33 John DOE 
jdoe 
age (t2) 
34
Compaction! 
@doanduyhai 
53 
SSTable1 SSTable2 SSTable3 
jdoe 
age (t3) 
✕ 
jdoe 
age (t1) name (t1) 
33 John DOE 
jdoe 
age (t2) 
34 
New SSTable 
jdoe 
age (t3) name (t1) 
ý John DOE
CRUD operations! 
@doanduyhai 
54 
INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33); 
UPDATE users SET age = 34 WHERE login = 'jdoe'; 
DELETE age FROM users WHERE login = 'jdoe'; 
SELECT age FROM users WHERE login = 'jdoe';
Simple Table! 
@doanduyhai 
55 
CREATE TABLE users ( 
login text, 
name text, 
age int, 
… 
PRIMARY KEY(login)); 
partition key (#partition)
Clustered table! 
@doanduyhai 
56 
CREATE TABLE mailbox ( 
login text, 
message_id timeuuid, 
interlocutor text, 
message text, 
PRIMARY KEY((login), message_id)); 
partition key · clustering column ☞ unicity
Queries! 
@doanduyhai 
57 
Get message by user and message_id (date) 
SELECT * FROM mailbox WHERE login = 'jdoe' 
and message_id = '2014-09-25 16:00:00'; 
Get message by user and date interval 
SELECT * FROM mailbox WHERE login = 'jdoe' 
and message_id <= '2014-09-25 16:00:00' 
and message_id >= '2014-09-20 16:00:00';
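Strictly speaking, message_id is a timeuuid, so date literals go through the minTimeuuid()/maxTimeuuid() functions; the interval query above would be spelled: 

SELECT * FROM mailbox WHERE login = 'jdoe' 
 and message_id <= maxTimeuuid('2014-09-25 16:00:00') 
 and message_id >= minTimeuuid('2014-09-20 16:00:00');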
Queries! 
@doanduyhai 
58 
Get message by message_id only (#partition not provided) 
SELECT * FROM mailbox WHERE message_id = '2014-09-25 16:00:00'; 
Get message by date interval (#partition not provided) 
SELECT * FROM mailbox WHERE 
message_id <= '2014-09-25 16:00:00' 
and message_id >= '2014-09-20 16:00:00';
Queries! 
Get message by user range (range query on #partition) 
Get message by user pattern (non exact match on #partition) 
@doanduyhai 
59 
SELECT * FROM mailbox WHERE login >= 'hsue' and login <= 'jdoe'; 
SELECT * FROM mailbox WHERE login like '%doe%';
WHERE clause restrictions! 
@doanduyhai 
60 
All queries (INSERT/UPDATE/DELETE/SELECT) must provide the #partition 
Only exact match (=) on #partition; range queries (<, ≤, >, ≥) are not allowed 
• ☞ they would require a full cluster scan 
On clustering columns, only exact match (=) and range queries (<, ≤, >, ≥) 
WHERE clause only possible on columns defined in the PRIMARY KEY
WHERE clause restrictions! 
@doanduyhai 
61 
What if I want to perform an « arbitrary » WHERE clause ? 
• search form scenario, dynamic search fields
WHERE clause restrictions! 
@doanduyhai 
62 
What if I want to perform an « arbitrary » WHERE clause ? 
• search form scenario, dynamic search fields 
☞ Apache Solr (Lucene) integration (DSE) 
SELECT * FROM users WHERE solr_query = 'age:[33 TO *] AND sex:male'; 
SELECT * FROM users WHERE solr_query = 'lastname:*schwei?er';
Collections & maps! 
@doanduyhai 
63 
CREATE TABLE users ( 
login text, 
name text, 
age int, 
friends set<text>, 
hobbies list<text>, 
languages map<int, text>, 
… 
PRIMARY KEY(login));
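Collections come with dedicated literals and in-place mutations; a few illustrative statements (values made up): 

INSERT INTO users(login, name, age, friends, hobbies, languages) 
 VALUES('jdoe', 'John DOE', 33, {'hsue'}, ['chess'], {1: 'English'}); 

UPDATE users SET friends = friends + {'alice'}, -- add to set 
 hobbies = hobbies + ['running'], -- append to list 
 languages[2] = 'French' -- put into map 
 WHERE login = 'jdoe';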
User Defined Type (UDT)! 
Instead of 
@doanduyhai 
64 
CREATE TABLE users ( 
login text, 
… 
street_number int, 
street_name text, 
postcode int, 
country text, 
… 
PRIMARY KEY(login));
User Defined Type (UDT)! 
@doanduyhai 
65 
CREATE TYPE address ( 
street_number int, 
street_name text, 
postcode int, 
country text); 
CREATE TABLE users ( 
login text, 
… 
location frozen <address>, 
… 
PRIMARY KEY(login));
UDT insert! 
@doanduyhai 
66 
INSERT INTO users(login, name, location) VALUES ( 
'jdoe', 
'John DOE', 
{ 
street_number: 124, 
street_name: 'Congress Avenue', 
postcode: 95054, 
country: 'USA' 
});
UDT update! 
@doanduyhai 
67 
UPDATE users SET location = 
{ 
street_number: 125, 
street_name: 'Congress Avenue', 
postcode: 95054, 
country: 'USA' 
} 
WHERE login = 'jdoe';
From SQL to CQL! 
@doanduyhai 
68 
Remember…
From SQL to CQL! 
@doanduyhai 
69 
Remember… 
CQL is not SQL
From SQL to CQL! 
@doanduyhai 
70 
Remember… 
there is no join 
(do you want to scale ?)
From SQL to CQL! 
@doanduyhai 
71 
Remember… 
there is no integrity constraint 
(do you want to read-before-write ?)
From SQL to CQL! 
@doanduyhai 
72 
Paradigm change 
• space is cheap (somehow …), latency is precious 
• embrace immutability 
• think query first 
• denormalize !!!
From SQL to CQL! 
@doanduyhai 
73 
Normalized 
User 
1 
n 
Comment 
CREATE TABLE comments ( 
article_id uuid, 
comment_id timeuuid, 
author_id text, // typical join id 
content text, 
PRIMARY KEY((article_id), comment_id));
From SQL to CQL! 
@doanduyhai 
74 
De-normalized 
User 
1 
n 
Comment 
CREATE TABLE comments ( 
article_id uuid, 
comment_id timeuuid, 
author frozen <person>, // person is a UDT 
content text, 
PRIMARY KEY((article_id), comment_id));
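The person UDT is not spelled out here; a plausible definition, mirroring the Person UDT fields listed on the next slides (field names are assumptions): 

CREATE TYPE person ( 
 firstname text, 
 lastname text, 
 birthdate timestamp, 
 gender text, 
 mood text, 
 location text);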
Data modeling best practices! 
@doanduyhai 
75 
Start by queries 
• identify core functional read paths 
• 1 read scenario ≈ 1 SELECT
Data modeling best practices! 
@doanduyhai 
76 
Start by queries 
• identify core functional read paths 
• 1 read scenario ≈ 1 SELECT 
Denormalize 
• wisely, only duplicate necessary & immutable data 
• functional/technical trade-off
Data modeling best practices! 
@doanduyhai 
77 
Person UDT 
- firstname/lastname 
- date of birth 
- gender 
- mood 
- location
Data modeling best practices! 
@doanduyhai 
78 
John DOE, male 
birthdate: 21/02/1981 
subscribed since 03/06/2011 
☉ San Mateo, CA 
"Impossible is not John DOE" 
Full detail read from 
User table on click
! " 
! 
Q & R
Dev Center 
Demo
DSE (Datastax Enterprise)! 
@doanduyhai 
81 
Security 
Analytics (Spark & Hadoop) 
Search (Solr)
OpsCenter Enterprise! 
@doanduyhai 
82
Training Day | December 3rd 
Beginner Track 
• Introduction to Cassandra 
• Introduction to Spark, Shark, Scala and 
Cassandra 
Advanced Track 
• Data Modeling 
• Performance Tuning 
Conference Day | December 4th 
Cassandra Summit Europe 2014 will be the single 
largest gathering of Cassandra users in Europe. 
Learn how the world's most successful companies are 
transforming their businesses and growing faster than 
ever using Apache Cassandra. 
http://bit.ly/cassandrasummit2014 
@doanduyhai Company Confidential 83
CQL In Depth! 
Simple Table! 
Clustered Table! 
Bucketing!
Storage Engine! 
@doanduyhai 
85 
#partition1 
#col1 #col2 #col3 #col4 
cell1 cell2 cell3 cell4 
#partition2 
#col1 #col2 #col3 
cell1 cell2 cell3 
#partition3 
#col1 #col2 
cell1 cell2 
#partition4 
#col1 #col2 #col3 #col4 … 
cell1 cell2 cell3 cell4 … 
Partition Key 
Column Name 
Cell
Data Model Abstraction! 
@doanduyhai 
86 
Table ≈ Map<#p,SortedMap<#col,cell>> 
(the outer Map is physically a SortedMap<token, …>)
Data Model Abstraction! 
@doanduyhai 
87 
Table ≈ Map<#p, SortedMap<#col, cell>> 
• Map<#p, …> ☞ unicity 
• SortedMap<#col, cell> ☞ sort 
• the outer map is physically a SortedMap<token, …>
Static Data Type! 
Partition Key Type Column Name Type Cell Type 
@doanduyhai 
88 
Table ≈ Map<#p,SortedMap<#col,cell>> 
Native types: bigint, blob, counter, decimal, double, float, inet, int, 
timestamp, timeuuid, uuid.
Simple Table Mapping! 
@doanduyhai 
89 
CREATE TABLE users ( 
login text, 
name text, 
age int, 
… 
PRIMARY KEY(login)); 
Map<login,SortedMap<column_label,value>>! 
text text blob
Simple Table Mapping! 
@doanduyhai 
90 
INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33); 
INSERT INTO users(login, name, age) VALUES('hsue', 'Helen SUE', 26); 
RowKey: jdoe 
=> (name=, value=, timestamp=1412419763515000) 
=> (name=age, value=00000021, timestamp=1412419763515000) 
=> (name=name, value=4a6f686e20444f45, timestamp=1412419763515000) 
RowKey: hsue 
=> (name=, value=, timestamp=1412419776578000) 
=> (name=age, value=0000001c, timestamp=1412419776578000) 
=> (name=name, value=48656c656e20535545, timestamp=1412419776578000)!
Simple Table Mapping! 
@doanduyhai 
91 
INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33); 
INSERT INTO users(login, name, age) VALUES('hsue', 'Helen SUE', 26); 
RowKey: jdoe 
=> (name=, value=, timestamp=1412419763515000) 
=> (name=age, value=00000021, timestamp=1412419763515000) 
=> (name=name, value=4a6f686e20444f45, timestamp=1412419763515000) 
RowKey: hsue 
=> (name=, value=, timestamp=1412419776578000) 
=> (name=age, value=0000001c, timestamp=1412419776578000) 
=> (name=name, value=48656c656e20535545, timestamp=1412419776578000)! 
Marker column
Simple Table Mapping! 
@doanduyhai 
92 
INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33); 
INSERT INTO users(login, name, age) VALUES('hsue', 'Helen SUE', 26); 
RowKey: jdoe 
=> (name=, value=, timestamp=1412419763515000) 
=> (name=age, value=00000021, timestamp=1412419763515000) 
=> (name=name, value=4a6f686e20444f45, timestamp=1412419763515000) 
RowKey: hsue 
=> (name=, value=, timestamp=1412419776578000) 
=> (name=age, value=0000001c, timestamp=1412419776578000) 
=> (name=name, value=48656c656e20535545, timestamp=1412419776578000)! 
Sorted 
column_label
Simple Table Mapping! 
@doanduyhai 
93 
INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33); 
INSERT INTO users(login, name, age) VALUES('hsue', 'Helen SUE', 26); 
RowKey: jdoe 
=> (name=, value=, timestamp=1412419763515000) 
=> (name=age, value=00000021, timestamp=1412419763515000) 
=> (name=name, value=4a6f686e20444f45, timestamp=1412419763515000) 
RowKey: hsue 
=> (name=, value=, timestamp=1412419776578000) 
=> (name=age, value=0000001c, timestamp=1412419776578000) 
=> (name=name, value=48656c656e20535545, timestamp=1412419776578000)! 
Values 
as bytes
Clustered Table Mapping! 
@doanduyhai 
94 
CREATE TABLE daily_3g_quality_per_city ( 
operator text, 
city text, 
date int, //date as YYYYMMdd 
latency_ms int, 
power_watt double, 
PRIMARY KEY((operator), city, date)); 
Map<operator, 
SortedMap<city, 
SortedMap<date, 
SortedMap<column_label,value>>>>!
Clustered Table Mapping! 
@doanduyhai 
95 
RowKey: verizon 
=> (name=Austin:20140910:, value=, timestamp=…) 
=> (name=Austin:20140910:latency_ms, value=000000e6, timestamp=…) 
=> (name=Austin:20140910:power_watt, value=3ff3333333333333, timestamp=…) 
=> (name=Austin:20140911:, value=, timestamp=…) 
=> (name=Austin:20140911:latency_ms, value=000000d4, timestamp=…) 
=> (name=Austin:20140911:power_watt, value=3ff6666666666666, timestamp=…) 
=> (name=New York:20140913:, value=, timestamp=1412422893832000) 
=> (name=New York:20140913:latency_ms, value=0000007b, timestamp=…) 
=> (name=New York:20140913:power_watt, value=3ffb333333333333, timestamp=…) 
=> (name=New York:20140917:, value=, timestamp=…) 
=> (name=New York:20140917:latency_ms, value=00000067, timestamp=…) 
=> (name=New York:20140917:power_watt, value=3ffe666666666666, timestamp=…)
Clustered Table Mapping! 
@doanduyhai 
96 
RowKey: verizon 
=> (name=Austin:20140910:, value=, timestamp=…) 
=> (name=Austin:20140910:latency_ms, value=000000e6, timestamp=…) 
=> (name=Austin:20140910:power_watt, value=3ff3333333333333, timestamp=…) 
=> (name=Austin:20140911:, value=, timestamp=…) 
=> (name=Austin:20140911:latency_ms, value=000000d4, timestamp=…) 
=> (name=Austin:20140911:power_watt, value=3ff6666666666666, timestamp=…) 
=> (name=New York:20140913:, value=, timestamp=1412422893832000) 
=> (name=New York:20140913:latency_ms, value=0000007b, timestamp=…) 
=> (name=New York:20140913:power_watt, value=3ffb333333333333, timestamp=…) 
=> (name=New York:20140917:, value=, timestamp=…) 
=> (name=New York:20140917:latency_ms, value=00000067, timestamp=…) 
=> (name=New York:20140917:power_watt, value=3ffe666666666666, timestamp=…) 
Sort first by 
city
Clustered Table Mapping! 
@doanduyhai 
97 
RowKey: verizon 
=> (name=Austin:20140910:, value=, timestamp=…) 
=> (name=Austin:20140910:latency_ms, value=000000e6, timestamp=…) 
=> (name=Austin:20140910:power_watt, value=3ff3333333333333, timestamp=…) 
=> (name=Austin:20140911:, value=, timestamp=…) 
=> (name=Austin:20140911:latency_ms, value=000000d4, timestamp=…) 
=> (name=Austin:20140911:power_watt, value=3ff6666666666666, timestamp=…) 
=> (name=New York:20140913:, value=, timestamp=1412422893832000) 
=> (name=New York:20140913:latency_ms, value=0000007b, timestamp=…) 
=> (name=New York:20140913:power_watt, value=3ffb333333333333, timestamp=…) 
=> (name=New York:20140917:, value=, timestamp=…) 
=> (name=New York:20140917:latency_ms, value=00000067, timestamp=…) 
=> (name=New York:20140917:power_watt, value=3ffe666666666666, timestamp=…) 
… then by 
date
Clustered Table Mapping! 
@doanduyhai 
98 
RowKey: verizon 
=> (name=Austin:20140910:, value=, timestamp=…) 
=> (name=Austin:20140910:latency_ms, value=000000e6, timestamp=…) 
=> (name=Austin:20140910:power_watt, value=3ff3333333333333, timestamp=…) 
=> (name=Austin:20140911:, value=, timestamp=…) 
=> (name=Austin:20140911:latency_ms, value=000000d4, timestamp=…) 
=> (name=Austin:20140911:power_watt, value=3ff6666666666666, timestamp=…) 
=> (name=New York:20140913:, value=, timestamp=1412422893832000) 
=> (name=New York:20140913:latency_ms, value=0000007b, timestamp=…) 
=> (name=New York:20140913:power_watt, value=3ffb333333333333, timestamp=…) 
=> (name=New York:20140917:, value=, timestamp=…) 
=> (name=New York:20140917:latency_ms, value=00000067, timestamp=…) 
=> (name=New York:20140917:power_watt, value=3ffe666666666666, timestamp=…) 
… then by 
column_label
Query With Clustered Table! 
Select by operator and city for all dates 
Select by operator and city range for all dates 
@doanduyhai 
99 
SELECT * FROM daily_3g_quality_per_city 
WHERE operator = 'verizon' AND city = 'Austin'; 
SELECT * FROM daily_3g_quality_per_city 
WHERE operator = 'verizon' AND city >= 'Austin' AND city <= 'New York';
Query With Clustered Table! 
Select by operator and city and date 
Select by operator and city and range of date 
@doanduyhai 
100 
SELECT * FROM daily_3g_quality_per_city 
WHERE operator = 'verizon' AND city = 'Austin' AND date = 20140910; 
SELECT * FROM daily_3g_quality_per_city 
WHERE operator = 'verizon' AND city = 'Austin' 
AND date >= 20140910 AND date <= 20140913;
Query With Clustered Table! 
@doanduyhai 
101 
Select by operator and city and date tuples 
SELECT * FROM daily_3g_quality_per_city 
WHERE operator = 'verizon' AND city = 'Austin' 
AND date IN (20140910, 20140913);
Query With Clustered Table! 
@doanduyhai 
102 
Select by operator and date without city 
SELECT * FROM daily_3g_quality_per_city 
WHERE operator = 'verizon' AND date = 20140910; 
Map<operator, 
SortedMap<city, 
SortedMap<date, 
SortedMap<column_label,value>>>>!
Bucketing! 
@doanduyhai 
103 
CREATE TABLE sensor_data ( 
sensor_id text, 
date timestamp, 
raw_data blob, 
PRIMARY KEY(sensor_id, date)); 
sensor_id 
date1 date2 date3 date4 … 
blob1 blob2 blob3 blob4 …
Bucketing! 
@doanduyhai 
104 
Problems: 
• limit of 2×10⁹ physical columns per partition 
• bad load balancing (1 sensor = 1 node) 
• wide row spans over many files 
sensor_id 
date1 date2 date3 date4 … 
blob1 blob2 blob3 blob4 …
Bucketing! 
@doanduyhai 
105 
Idea: 
• composite partition key: sensor_id:date_bucket 
• tunable date granularity: per hour/per day/per month … 
CREATE TABLE sensor_data ( 
sensor_id text, 
date_bucket int, //format YYYYMMddHH here (hour granularity) 
date timestamp, 
raw_data blob, 
PRIMARY KEY((sensor_id, date_bucket), date));
Bucketing! 
Idea: 
• composite partition key: sensor_id:date_bucket 
• tunable date granularity: per hour/per day/per month … 
@doanduyhai 
106 
sensor_id:2014091014 
date1 date2 date3 date4 … 
blob1 blob2 blob3 blob4 … 
sensor_id:2014091015 
date11 date12 date13 date14 … 
blob11 blob12 blob13 blob14 … 
Buckets
Bucketing! 
@doanduyhai 
107 
Advantage: 
• distribute load: 1 bucket = 1 node 
• limit partition width (max x columns per bucket) 
Buckets 
sensor_id:2014091014 
date1 date2 date3 date4 … 
blob1 blob2 blob3 blob4 … 
sensor_id:2014091015 
date11 date12 date13 date14 … 
blob11 blob12 blob13 blob14 …
Bucketing! 
@doanduyhai 
108 
But how can I select raw data between 14:45 and 15:10 ? 
14:45 à ? 
15:00 à 15:10 
sensor_id:2014091014 
date1 date2 date3 date4 … 
blob1 blob2 blob3 blob4 … 
sensor_id:2014091015 
date11 date12 date13 date14 … 
blob11 blob12 blob13 blob14 …
Bucketing! 
Solution 
• use IN clause on partition key component 
• with range condition on date column 
☞ date column should be monotonic function (increasing/decreasing) 
@doanduyhai 
109 
SELECT * FROM sensor_data WHERE sensor_id = 'xxx' 
AND date_bucket IN (2014091014, 2014091015) 
AND date >= '2014-09-10 14:45:00.000' 
AND date <= '2014-09-10 15:10:00.000';
Bucketing Caveats! 
@doanduyhai 
110 
The IN clause on #partition is not a silver bullet ! 
• use sparingly 
• keep cardinality low (≤ 5) 
n1 
n2 
n3 
n4 
n5 
n6 
n7 
coordinator 
n8 
sensor_id:2014091014 
sensor_id:2014091015
Bucketing Caveats! 
@doanduyhai 
111 
The IN clause on #partition is not a silver bullet ! 
• use sparingly 
• keep cardinality low (≤ 5) 
• prefer parallel async queries 
• ease of query vs performance trade-off 
n1 
n2 
n3 
n4 
n5 
n6 
n7 
n8 
Async client 
sensor_id:2014091014 
sensor_id:2014091015
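Concretely, « parallel async queries » means the client issues one single-partition query per bucket and merges the results; the two statements below (hypothetical sensor id) can be fired concurrently with any async driver: 

SELECT * FROM sensor_data WHERE sensor_id = 'sensor-42' 
 AND date_bucket = 2014091014 AND date >= '2014-09-10 14:45:00.000'; 
SELECT * FROM sensor_data WHERE sensor_id = 'sensor-42' 
 AND date_bucket = 2014091015 AND date <= '2014-09-10 15:10:00.000';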
! " 
! 
Q & R
Thank You 
@doanduyhai 
duy_hai.doan@datastax.com 
https://academy.datastax.com/