Cassandra - lesson learned
Andrzej Ludwikowski
About me?
- www.ludwikowski.info
- github.com/aludwiko
- @aludwikowski
Why Cassandra?
- BigData!!!
- Volume (petabytes of data, trillions of entities)
- Velocity (real-time, streams, millions of transactions per second)
- Variety (un-, semi-, structured)
- writes are cheap, reads are ???
- near-linear horizontal scaling (in proper use cases)
- fully distributed, with no single point of failure
- data replication by default
Cassandra vs CAP?
- CAP Theorem - pick two
Origins?
2010
Name?
Write path
- Any node can coordinate any request (NSPOF)
- Replication Factor
- Consistency Level
[Diagram: Client (driver) sends a write to a ring of four nodes (Node 1 to Node 4), RF=3, CL=2]
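The interplay of Replication Factor and Consistency Level can be sketched in a few lines. This is a toy model, not driver code: the function name and the list of booleans are assumptions made for illustration.

```python
def write_succeeds(replica_up, consistency_level):
    """Return True when enough replicas acknowledge the write.

    replica_up: one boolean per replica (its length is the replication factor).
    consistency_level: number of acknowledgements the coordinator waits for.
    """
    acks = sum(1 for up in replica_up if up)
    return acks >= consistency_level


# RF=3, CL=2 as in the diagram: one replica may be down.
print(write_succeeds([True, True, False], 2))   # True
print(write_succeeds([True, False, False], 2))  # False
```

With RF=3 and CL=2 the write still succeeds when one replica is unavailable, which is the usual reason for picking a consistency level below the replication factor.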
Write path - consistent hashing
- Token ring from -2^63 to 2^63 - 1
- Partitioner: partition key -> token
[Diagram: Client -> Partitioner -> ring of four nodes (Node 1 to Node 4); simplified token ranges 0-25, 26-50, 51-75, 76-100; token 77 is stored on three nodes]
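A minimal sketch of the ring lookup, using the simplified 0-100 ring from the diagram. The mapping of nodes to ranges and the rule "take the next nodes clockwise" are assumptions for illustration; a real partitioner hashes the partition key to a token first, while here the token is passed in directly.

```python
from bisect import bisect_left

# Simplified ring: each node owns the tokens up to and including its upper bound.
RING = [(25, "Node 1"), (50, "Node 2"), (75, "Node 3"), (100, "Node 4")]


def replicas_for(token, rf):
    """Return the rf nodes holding a token, walking the ring clockwise."""
    bounds = [upper for upper, _ in RING]
    start = bisect_left(bounds, token) % len(RING)
    return [RING[(start + i) % len(RING)][1] for i in range(rf)]


print(replicas_for(77, 3))  # ['Node 4', 'Node 1', 'Node 2']
```

Token 77 falls into the 76-100 range, so the owning node stores the first copy and the next two nodes on the ring store the remaining replicas.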
DEMO
Write path - problems?
- Hinted handoff
- Retry idempotent inserts
- built-in policies
- Lightweight transactions (Paxos)
- Batches
[Diagram: Client writes token 77 to a ring of four nodes (token ranges 0-25, 26-50, 51-75, 76-100); three replicas of token 77 are shown]
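Hinted handoff can be pictured with a toy coordinator that remembers writes for replicas that are down and replays them later. The class and its fields are hypothetical and only illustrate the idea; they do not mirror Cassandra internals.

```python
class Coordinator:
    """Toy coordinator that keeps hints for replicas that are down."""

    def __init__(self, replicas):
        self.data = {name: {} for name in replicas}  # replica -> stored rows
        self.up = {name: True for name in replicas}  # replica -> availability
        self.hints = []                              # (replica, key, value)

    def write(self, key, value):
        for name in self.data:
            if self.up[name]:
                self.data[name][key] = value
            else:
                # Replica is down: remember the write instead of losing it.
                self.hints.append((name, key, value))

    def replay_hints(self):
        remaining = []
        for name, key, value in self.hints:
            if self.up[name]:
                self.data[name][key] = value
            else:
                remaining.append((name, key, value))
        self.hints = remaining


c = Coordinator(["Node 4", "Node 1", "Node 2"])
c.up["Node 1"] = False
c.write(77, "row")      # Node 1 misses the write, a hint is stored
c.up["Node 1"] = True
c.replay_hints()        # the hint is delivered once Node 1 is back
print(c.data["Node 1"])  # {77: 'row'}
```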
Write path - node level
Write path - why so fast?
- Commit log - append only
- Periodic (10s) or batch sync to disk
- Network topology aware
50,000 t/s = 50 t/ms = 5 t/100us = 1 t/20us
[Diagram: Client writing to four nodes (Node 1 to Node 4) with RF=2, CL=2; replicas placed on Rack 1 and Rack 2, and across Asia DC and Europe DC]
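The throughput figures translate into a time budget per transaction. A short sketch of that arithmetic (the function name is made up for this example):

```python
def time_budget_us(transactions_per_second):
    """Microseconds available per transaction at a given throughput."""
    return 1_000_000 / transactions_per_second


print(time_budget_us(50_000))  # 20.0
```

At 50,000 transactions per second each transaction gets 20 microseconds on average, which is why a write that only appends to a log and updates memory matters.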
Read path
- Most recent write wins
- Eager retries
- In-memory
  - MemTable
  - Row Cache
  - Bloom Filters
  - Key Caches
  - Partition Summaries
- On disk
  - Partition Indexes
  - SSTables
[Diagram: Client reads from a ring of four nodes (Node 1 to Node 4) with RF=3, CL=3; the replicas answer with timestamp 67, timestamp 99 and timestamp 88]
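The "most recent wins" rule can be sketched as picking the response with the highest write timestamp. The node names and values below are made up; only the three timestamps come from the diagram.

```python
def resolve(responses):
    """Pick the response with the highest write timestamp (last write wins)."""
    return max(responses, key=lambda r: r["timestamp"])


responses = [
    {"node": "Node 1", "timestamp": 67, "value": "old"},
    {"node": "Node 2", "timestamp": 99, "value": "newest"},
    {"node": "Node 3", "timestamp": 88, "value": "newer"},
]
print(resolve(responses)["value"])  # newest
```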
Immediate vs. Eventual Consistency
- if (writeCL + readCL) > replication_factor then immediate consistency
  - writeCL=ALL, readCL=1
  - writeCL=1, readCL=ALL
  - writeCL=QUORUM, readCL=QUORUM
- https://www.ecyrd.com/cassandracalculator/
[Diagram: Client and a ring of four nodes (Node 1 to Node 4), RF=3]
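The rule above is simple enough to check in code. A small sketch with consistency levels expressed as replica counts (the helper names are assumptions):

```python
def quorum(rf):
    """Number of replicas that form a majority for a replication factor."""
    return rf // 2 + 1


def is_immediately_consistent(write_cl, read_cl, rf):
    """True when every read overlaps with the replicas of the latest write."""
    return write_cl + read_cl > rf


rf = 3
print(is_immediately_consistent(rf, 1, rf))                   # True  (ALL, 1)
print(is_immediately_consistent(1, rf, rf))                   # True  (1, ALL)
print(is_immediately_consistent(quorum(rf), quorum(rf), rf))  # True  (2 + 2 > 3)
print(is_immediately_consistent(1, 1, rf))                    # False (eventual)
```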
Modeling - new mindset
- QDD, Query Driven Development
- Nesting is ok
- Duplication is ok
- Writes are cheap
- No joins
QDD - Conceptual model
- Technology independent
- Chen notation
QDD - Application workflow
QDD - Logical model
- Chebotko diagram
QDD - Physical model
- Technology dependent
- Analysis and validation (finding problems)
- Physical optimization (fixing problems)
- Data types
Physical storage
- Primary key
- Partition key
CREATE TABLE videos (
    id int,
    title text,
    runtime int,
    year int,
    PRIMARY KEY (id)
);
 id | title               | runtime | year
----+---------------------+---------+------
  1 | dzien swira         |      93 | 2002
  2 | chlopaki nie placza |      96 | 2000
  3 | psy                 |     104 | 1992
  4 | psy 2               |      96 | 1994
Stored as one partition per id:
1 -> title: dzien swira, runtime: 93, year: 2002
2 -> title: chlopaki nie placza, runtime: 96, year: 2000
3 -> title: psy, runtime: 104, year: 1992
4 -> title: psy 2, runtime: 96, year: 1994
-- title is not part of the primary key, so this query is rejected
-- unless a secondary index or ALLOW FILTERING is used:
SELECT * FROM videos
WHERE title = 'dzien swira';
Physical storage
- Primary key (could be compound)
- Partition key
- Clustering column (order, uniqueness)
CREATE TABLE videos_with_clustering (
    title text,
    runtime int,
    year int,
    PRIMARY KEY ((title), year)
);
 title    | year | runtime
----------+------+---------
 godzilla | 1954 |      98
 godzilla | 1998 |     140
 godzilla | 2014 |     123
 psy      | 1992 |     104
Stored as one partition per title, rows ordered by year:
godzilla -> 1954: runtime 98 | 1998: runtime 140 | 2014: runtime 123
psy -> 1992: runtime 104
SELECT * FROM videos_with_clustering
WHERE title = 'godzilla';
SELECT * FROM videos_with_clustering
WHERE title = 'godzilla' AND year > 1998;
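One way to picture this layout is a map from partition key to rows kept sorted by the clustering column. The sketch below is a mental model only, with a hypothetical helper name; it shows why a range query on year inside one partition is cheap.

```python
from bisect import bisect_right

# partition key (title) -> rows sorted by clustering column (year)
partitions = {
    "godzilla": [
        (1954, {"runtime": 98}),
        (1998, {"runtime": 140}),
        (2014, {"runtime": 123}),
    ],
    "psy": [(1992, {"runtime": 104})],
}


def rows_after(title, year):
    """Rows of one partition whose clustering value is greater than year."""
    rows = partitions.get(title, [])
    years = [y for y, _ in rows]
    # Rows are sorted, so the matching slice starts at a single position.
    return rows[bisect_right(years, year):]


print(rows_after("godzilla", 1998))  # [(2014, {'runtime': 123})]
```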
Physical storage
- Primary key (could be compound)
- Partition key (could be composite)
- Clustering column (order, uniqueness)
CREATE TABLE videos_with_composite_pk (
    title text,
    runtime int,
    year int,
    PRIMARY KEY ((title, year))
);
 title    | year | runtime
----------+------+---------
 godzilla | 1954 |      98
 godzilla | 1998 |     140
 godzilla | 2014 |     123
 psy      | 1992 |     104
Stored as one partition per (title, year) pair:
godzilla:1954 -> runtime 98
godzilla:1998 -> runtime 140
godzilla:2014 -> runtime 123
psy:1992 -> runtime 104
SELECT * FROM videos_with_composite_pk
WHERE title = 'godzilla'
AND year = 1954;
Modeling - clustering column(s)
Q: Retrieve videos an actor has appeared in (newest first).
CREATE TABLE videos_by_actor (
    actor text,
    added_date timestamp,
    video_id timeuuid,
    character_name text,
    description text,
    encoding frozen<video_encoding>,
    tags set<text>,
    title text,
    user_id uuid,
    PRIMARY KEY ( )
) WITH CLUSTERING ORDER BY ( );
Filling in the key step by step (CLUSTERING ORDER BY (added_date desc) in every step):
- PRIMARY KEY ((actor), added_date)
- PRIMARY KEY ((actor), added_date, video_id), so that two videos added at the same moment do not overwrite each other
- PRIMARY KEY ((actor), added_date, video_id, character_name), so that several characters played in one video are kept as separate rows
CREATE TABLE videos_by_actor (
    actor text,
    added_date timestamp,
    video_id timeuuid,
    character_name text,
    description text,
    encoding frozen<video_encoding>,
    tags set<text>,
    title text,
    user_id uuid,
    PRIMARY KEY ((actor), added_date, video_id, character_name)
) WITH CLUSTERING ORDER BY (added_date desc);
Modeling - composite partition key
Q: Retrieve the last 1000 measurements from a given day.
CREATE TABLE temperature_by_day (
    weather_station_id text,
    date text,
    event_time timestamp,
    temperature text,
    PRIMARY KEY ( )
) WITH CLUSTERING ORDER BY ( );
First attempt, one partition per weather station:
CREATE TABLE temperature_by_day (
    weather_station_id text,
    date text,
    event_time timestamp,
    temperature text,
    PRIMARY KEY ((weather_station_id), date, event_time)
) WITH CLUSTERING ORDER BY (date desc, event_time desc);
The partition keeps growing (assuming one measurement per second):
1 day = 86 400 rows
1 week = 604 800 rows
1 month = 2 592 000 rows
1 year = 31 536 000 rows
Fix, one partition per weather station and day:
CREATE TABLE temperature_by_day (
    weather_station_id text,
    date text,
    event_time timestamp,
    temperature text,
    PRIMARY KEY ((weather_station_id, date), event_time)
) WITH CLUSTERING ORDER BY (event_time desc);
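The row counts for a partition keyed only by weather station follow from simple arithmetic. A sketch, assuming one measurement per second, a 30-day month and a 365-day year (these rates are inferred from the numbers, not stated on the slides):

```python
SECONDS_PER_DAY = 24 * 60 * 60


def rows_per_partition(days, measurements_per_second=1):
    """Rows accumulated in one partition after the given number of days."""
    return days * SECONDS_PER_DAY * measurements_per_second


for label, days in [("day", 1), ("week", 7), ("month", 30), ("year", 365)]:
    print(label, rows_per_partition(days))
```

Adding the date to the partition key caps every partition at the one-day figure, regardless of how long the station keeps reporting.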
Modeling - TTL
CREATE TABLE temperature_by_day (
    weather_station_id text,
    date text,
    event_time timestamp,
    temperature text,
    PRIMARY KEY ((weather_station_id, date), event_time)
) WITH CLUSTERING ORDER BY (event_time desc);
Retention policy - keep data only from the last week.
INSERT INTO temperature_by_day … USING TTL 604800;
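The TTL value is a number of seconds. The sketch below derives it from a retention period and models the expiry check; both helper names are hypothetical and the expiry check is a simplification of what the database does.

```python
from datetime import datetime, timedelta


def ttl_seconds(retention):
    """Convert a retention period to the number of seconds used by USING TTL."""
    return int(retention.total_seconds())


def is_expired(written_at, ttl, now):
    """True when a value written at written_at with the given TTL has expired."""
    return now >= written_at + timedelta(seconds=ttl)


week = ttl_seconds(timedelta(weeks=1))
print(week)  # 604800
```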
Modeling - bitmap index
Q: Find car by year and/or model and/or color.
CREATE TABLE car (
    year text,  -- text, so that '' can stand for "any year"
    model text,
    color text,
    vehicle_id int,
    // other columns
    PRIMARY KEY ((year, model, color), vehicle_id)
);
INSERT INTO car (year, model, color, vehicle_id, ...) VALUES ('2000', 'Multipla', 'blue', 13, ...);
INSERT INTO car (year, model, color, vehicle_id, ...) VALUES ('2000', 'Multipla', '', 13, ...);
INSERT INTO car (year, model, color, vehicle_id, ...) VALUES ('2000', '', 'blue', 13, ...);
INSERT INTO car (year, model, color, vehicle_id, ...) VALUES ('2000', '', '', 13, ...);
INSERT INTO car (year, model, color, vehicle_id, ...) VALUES ('', 'Multipla', 'blue', 13, ...);
INSERT INTO car (year, model, color, vehicle_id, ...) VALUES ('', 'Multipla', '', 13, ...);
INSERT INTO car (year, model, color, vehicle_id, ...) VALUES ('', '', 'blue', 13, ...);
SELECT * FROM car WHERE year = '2000' AND model = '' AND color = 'blue';
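The seven inserts are every combination of "known value" and "empty" for the three attributes, minus the all-empty one. A sketch that generates them; the function name is hypothetical and the attribute values are passed as strings for simplicity.

```python
from itertools import product


def index_rows(year, model, color, any_value=""):
    """All partition keys under which one car is stored (all-empty key excluded)."""
    options = [(year, any_value), (model, any_value), (color, any_value)]
    return [
        combo
        for combo in product(*options)
        if any(value != any_value for value in combo)
    ]


rows = index_rows("2000", "Multipla", "blue")
print(len(rows))  # 7
for row in rows:
    print(row)
```

With n searchable attributes this writes 2^n - 1 rows per entity, so the pattern only pays off for a small number of attributes.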
Modeling - wide rows
CREATE TABLE user (
email text,
name text,
age int,
PRIMARY KEY (email)
);
Q: Find user by email.
Modeling - wide rows
CREATE TABLE user (
domain text,
user text,
name text,
age int,
PRIMARY KEY ((domain), user)
);
Q: Find user by email.
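With the second table the application has to split the address before querying. A minimal sketch of that step (the helper name and the example address are made up):

```python
def split_email(email):
    """Split an address into (domain, user) for the table keyed by domain."""
    user, _, domain = email.rpartition("@")
    if not user or not domain:
        raise ValueError(f"not a valid email address: {email!r}")
    return domain, user


print(split_email("andrzej@example.com"))  # ('example.com', 'andrzej')
```

All users of one domain land in the same partition, which makes "list users of a domain" cheap but can produce very large partitions for popular domains.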
Modeling - versioning with lightweight transactions
CREATE TABLE document (
id text,
content text,
version int,
locked_by text,
PRIMARY KEY ((id))
);
INSERT INTO document (id, content , version ) VALUES ( 'my doc', 'some content', 1)
IF NOT EXISTS;
UPDATE document SET locked_by = 'andrzej' WHERE id = 'my doc' IF locked_by = null;
UPDATE document SET content = 'better content', version = 2, locked_by = null
WHERE id = 'my doc' IF locked_by = 'andrzej';
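The conditional updates above behave like compare-and-set operations. The sketch below mimics that flow in a single process; it is an illustration of the locking logic only and does not model Paxos or any distribution. All names are hypothetical.

```python
class Document:
    def __init__(self, content, version):
        self.content = content
        self.version = version
        self.locked_by = None


def lock(doc, user):
    """Like: UPDATE ... SET locked_by = user IF locked_by = null."""
    if doc.locked_by is None:
        doc.locked_by = user
        return True
    return False


def save(doc, user, content):
    """Like: UPDATE ... IF locked_by = user. Stores content and releases the lock."""
    if doc.locked_by != user:
        return False
    doc.content = content
    doc.version += 1
    doc.locked_by = None
    return True


doc = Document("some content", 1)
print(lock(doc, "andrzej"))                    # True
print(lock(doc, "someone else"))               # False, already locked
print(save(doc, "andrzej", "better content"))  # True
print(doc.version)                             # 2
```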
Modeling - JSON with UDT and tuples
{
    "title": "Example Schema",
    "type": "object",
    "properties": {
        "firstName": "andrzej",
        "lastName": "ludwikowski",
        "age": {
            "description": "Age in years",
            "type": "integer",
            "minimum": 0
        }
    },
    "x_dimension": "1",
    "y_dimension": "2"
}
CREATE TYPE age (
    description text,
    type text,
    minimum int
);
CREATE TYPE prop (
    firstName text,
    lastName text,
    age frozen<age>
);
CREATE TABLE json (
    title text,
    type text,
    properties list<frozen<prop>>,
    dimensions tuple<int, int>,
    PRIMARY KEY (title)
);
Common use cases
- Sensor data (Zonar)
- Fraud detection (Barracuda)
- Playlists and collections (Spotify)
- Personalization and recommendation engines (eBay)
- Messaging (Instagram)
- Event Sourcing!
Common anti use cases
- Queue
- Search engine
Tombstones
- Understanding Cassandra tombstones
DataStax Academy
- Introduction to Apache Cassandra
- Data Modeling
- DataStax Enterprise Foundations of Apache Cassandra
- DataStax Enterprise Operations with Apache Cassandra
- DataStax Enterprise Search
- DataStax Enterprise Analytics with Apache Spark
- DataStax Enterprise Graph
Competition?
ScyllaDB
- Cassandra without JVM
- same protocol, SSTable compatibility
- C++ and Seastar lib
- 1,000,000 IOPS
Not covered
- schema migrations
- backups
- DSE
Cassandra   lesson learned  - extended
