Cassandra lesson learned - extended

Cassandra - lesson learned
Andrzej Ludwikowski

About me?
- www.ludwikowski.info
- github.com/aludwiko
- @aludwikowski
-

Why Cassandra?
- BigData!!!
- Volume (petabytes of data, trillions of entities)
- Velocity (real-time, streams, millions of transactions per second)
- Variety (un-, semi-, structured)
- writes are cheap, reads are ???
- near-linear horizontal scaling (in a proper use cases)
- fully distributed, with no single point of failure
- data replication by default

Cassandra vs CAP?
- CAP Theorem - pick two

Write path
Node 1
Node 2
Node 3
Node 4
Client
(driver)

Write path
Node 1
Node 2
Node 3
Node 4
Client
(driver)
- Any node can coordinate any request (NSPOF)

- Replication Factor
Write path
Node 1
Node 2
Node 3
Node 4
Client
RF=3

- Replication Factor
- Consistency Level
Write path
Node 1
Node 2
Node 3
Node 4
Client
RF=3
CL=2

- Token ring from -2^63 to 2^64
Write path - consistent hashing
Node 1
Node 2
Node 3
Node 4
0100

- Partitioner: partition key -> token
Node 1
Node 2
Node 3
Node 4
Client
Partitioner
0-25
25-50
51-75
76-100
77

- Partitioner: primary key -> token
Node 1
Node 2
Node 3
Node 4
Client
Partitioner
0-25
25-50
51-75
76-100
77

- Partitioner: primary key -> token
Node 1
Node 2
Node 3
Node 4
Client
Partitioner
0-25
25-50
51-75
76-100
77
77
77

Write path - problems?
Node 1
Node 2
Node 3
Node 4
Client
0-25
77
25-50
51-75
76-100
77
77

- Hinted handoff
Node 1
Node 2
Node 3
Node 4
Client
0-25
77
25-50
51-75
76-100
77
77

- Hinted handoff
- Retry idempotent inserts
- build-in policies
Node 1
Node 2
Node 3
Node 4
Client
0-25
77
25-50
51-75
76-100
77
77

- Hinted handoff
- build-in policies
- Lightweight transactions (Paxos)
Node 1
Node 2
Node 3
Node 4
Client
0-25
77
25-50
51-75
76-100
77
77

- Hinted handoff
- build-in policies
- Lightweight transactions (Paxos)
- Batches
Node 1
Node 2
Node 3
Node 4
Client
0-25
77
25-50
51-75
76-100
77
77

Write path - why so fast?
- Commit log - append only

50,000 t/s
50 t/ms
5 t/100us
1 t/20us

- Periodic (10s) or batch sync to disk
Node 1
Node 2
Node 3
Node 4
Client
RF=2
CL=2

D
asdd
R
ack
2
R
ack
1
- Periodic or batch sync to disk
- Network topology aware
Node 1
Node 2
Node 3
Node 4
Client
RF=2
CL=2

Client
- Periodic or batch sync to disk
- Network topology aware
Asia DC
Europe DC

- Most recent win
- Eager retries
- In-memory
- MemTable
- Row Cache
- Bloom Filters
- Key Caches
- Partition Summaries
- On disk
- Partition Indexes
- SSTables
Node 1
Node 2
Node 3
Node 4
Client
RF=3
CL=3
Read path
timestamp 67
timestamp 99
timestamp 88

Immediate vs. Eventual Consistency
- if (writeCL + readCL) > replication_factor then immediate consistency
- writeCL=ALL, readCL=1
- writeCL=1, readCL=ALL
- writeCL,readCL=QUORUM
- https://www.ecyrd.com/cassandracalculator/
Node 1
Node 2
Node 3
Node 4
Client
RF=3

Modeling - new mindset
- QDD, Query Driven Development
- Nesting is ok
- Duplication is ok
- Writes are cheap
no joins

QDD - Conceptual model
- Technology independent
- Chen notation

QDD - Logical model
- Chebotko diagram

QDD - Physical model
- Technology dependent
- Analysis and validation (finding problems)
- Physical optimization (fixing problems)
- Data types

Physical storage
- Primary key
- Partition key
CREATE TABLE videos (
id int,
title text,
runtime int,
year int,
PRIMARY KEY (id)
);
id | title | runtime | year
----+---------------------+---------+------
1 | dzien swira | 93 | 2002
2 | chlopaki nie placza | 96 | 2000
3 | psy | 104 | 1992
4 | psy 2 | 96 | 1994
1
title runtime year
dzien swira 93 2002
2
title runtime year
chlopaki... 96 2000
3
title runtime year
psy 104 1992
4
title runtime year
psy 2 96 1994
SELECT FROM videos
WHERE title = ‘dzien swira’

Physical storage
CREATE TABLE videos_with_clustering (
title text,
runtime int,
year int,
PRIMARY KEY ((title), year)
);
- Primary key (could be compound)
- Partition key
- Clustering column (order, uniqueness)
title | year | runtime
-------------+------+---------
godzilla | 1954 | 98
godzilla | 1998 | 140
godzilla | 2014 | 123
psy | 1992 | 104
godzilla
1954 runtime
98
1998 runtime
140
2014 runtime
123
1992 runtime
104
psy
SELECT FROM videos_with_clustering
WHERE title = ‘godzilla’;
SELECT FROM videos_with_clustering
WHERE title = ‘godzilla’ AND year > 1998;

Physical storage
CREATE TABLE videos_with_composite_pk(
title text,
runtime int,
year int,
PRIMARY KEY ((title, year))
);
- Primary key (could be compound)
- Partition key (could be composite)
- Clustering column (order, uniqueness)
title | year | runtime
-------------+------+---------
godzilla | 1954 | 98
godzilla | 1998 | 140
godzilla | 2014 | 123
psy | 1992 | 104
godzilla:1954
runtime
93
godzilla:1998
runtime
140
godzilla:2014
runtime
123
psy:1992
runtime
104
SELECT FROM videos_with_composite_pk
WHERE title = ‘godzilla’
AND year = 1954

Modeling - clustering column(s)
Q: Retrieve videos an actor has appeared in (newest first).

CREATE TABLE videos_by_actor (
actor text,
added_date timestamp,
video_id timeuuid,
character_name text,
description text,
encoding frozen<video_encoding>,
tags set<text>,
title text,
user_id uuid,
PRIMARY KEY ( )
) WITH CLUSTERING ORDER BY ( );

actor text,
video_id timeuuid,
description text,
tags set<text>,
title text,
user_id uuid,
PRIMARY KEY ((actor), added_date)
) WITH CLUSTERING ORDER BY (added_date desc);

actor text,
video_id timeuuid,
description text,
tags set<text>,
title text,
user_id uuid,
PRIMARY KEY ((actor), added_date, video_id)

actor text,
video_id timeuuid,
description text,
tags set<text>,
title text,
user_id uuid,
PRIMARY KEY ((actor), added_date, video_id, character_name)

Modeling - compound partition key
CREATE TABLE temperature_by_day (
weather_station_id text,
date text,
event_time timestamp,
temperature text
PRIMARY KEY ( )
) WITH CLUSTERING ORDER BY ( );
Q: Retrieve last 1000 measurement from given day.

date text,
temperature text
PRIMARY KEY ((weather_station_id), date, event_time)
) WITH CLUSTERING ORDER BY (event_time desc);

date text,
temperature text
PRIMARY KEY ((weather_station_id), date, event_time)
1 day = 86 400 rows
1 week = 604 800 rows
1 month = 2 592 000 rows
1 year = 31 536 000 rows

date text,
temperature text
PRIMARY KEY ((weather_station_id, date), event_time)

Modeling - TTL
date text,
temperature text
PRIMARY KEY ((weather_station_id, date), event_time)
Retention policy - keep data only from last week.
INSERT INTO temperature_by_day … USING TTL 604800;

Modeling - bit map index
CREATE TABLE car (
year timestamp,
model text,
color timestamp,
vehicle_id int,
//other columns
PRIMARY KEY ((year, model, color), vehicle_id)
);
Q: Find car by year and/or model and/or color.
INSERT INTO car (year, model, color, vehicle_id, ...) VALUES (2000, 'Multipla', 'blue', 13, ...);
INSERT INTO car (year, model, color, vehicle_id, ...) VALUES (2000, 'Multipla', '', 13, ...);
INSERT INTO car (year, model, color, vehicle_id, ...) VALUES (2000, '', 'blue', 13, ...);
INSERT INTO car (year, model, color, vehicle_id, ...) VALUES (2000, '', '', 13, ...);
INSERT INTO car (year, model, color, vehicle_id, ...) VALUES ('', 'Multipla', 'blue', 13, ...);
INSERT INTO car (year, model, color, vehicle_id, ...) VALUES ('', 'Multipla', '', 13, ...);
INSERT INTO car (year, model, color, vehicle_id, ...) VALUES ('', '', 'blue', 13, ...);
SELECT * FROM car WHERE year=2000 and model=’’ and color=’blue’;

Modeling - wide rows
CREATE TABLE user (
email text,
name text,
age int,
PRIMARY KEY (email)
);
Q: Find user by email.

Modeling - wide rows
CREATE TABLE user (
domain text,
user text,
name text,
age int,
PRIMARY KEY ((domain), user)
);
Q: Find user by email.

Modeling - versioning with lightweight transactions
CREATE TABLE document (
id text,
content text,
version int,
locked_by text,
PRIMARY KEY ((id))
);
INSERT INTO document (id, content , version ) VALUES ( 'my doc', 'some content', 1)
IF NOT EXISTS;
UPDATE document SET locked_by = 'andrzej' WHERE id = 'my doc' IF locked_by = null;
UPDATE document SET content = 'better content', version = 2, locked_by = null
WHERE id = 'my doc' IF locked_by = 'andrzej';

Modeling - JSON with UDT and tuples
{
"title": "Example Schema",
"type": "object",
"properties": {
"firstName": “andrzej”,
"lastName": “ludwikowski”,
"age": {
"description": "Age in years",
"type": "integer",
"minimum": 0
}
},
“x_dimension”: “1”,
“y_dimension”: “2”,
}
CREATE TYPE age (
description text,
type int,
minimum int
);
CREATE TYPE prop (
firstName text,
lastName text,
age frozen <age>
);
CREATE TABLE json (
title text,
type text,
properties list<frozen <prop>>,
dimensions tuple<int, int>
PRIMARY KEY (title)
);

Common use cases
- Sensor data (Zonar)
- Fraud detection (Barracuda)
- Playlist and collections (Spotify)
- Personalization and recommendation engines (Ebay)
- Messaging (Instagram)
- Event Sourcing!

Common anti use cases
- Queue
- Search engine

Tombstones
- Understanding Cassandra tombstones

Datastax Academy
- Introduction to Apache Cassandra
- Data Modeling
- DataStax Enterprise Foundations of Apache Cassandra
- DataStax Enterprise Operations with Apache Cassandra
- DataStax Enterprise Search
- DataStax Enterprise Analytics with Apache Spark
- DataStax Enterprise Graph

Competition?
ScyllaDB
- Cassandra without JVM
- same protocol, SSTable compatibility
- C++ and Seastar lib
- 1,000,000 IOPS

Not covered
- schema migrations
- backups
- DSE

Cassandra lesson learned - extended

Cassandra lesson learned - extended

More Related Content

What's hot (20)

Similar to Cassandra lesson learned - extended (20)

More from Andrzej Ludwikowski (10)

Recently uploaded (20)

Cassandra lesson learned - extended