Mikhail Dubkov
mdubkov@griddynamics.com
Apache Cassandra. Inception
 Why Cassandra?
 Architecture & Design
 Use cases
 Summary
 Q & A
Agenda
Why Cassandra?
Apache Cassandra. Inception - all you need to know by Mikhail Dubkov
http://db-engines.com/en/ranking
Ranking
NoSQL Benchmark
www.endpoint.com
 DataStax Java Driver
 Astyanax (Netflix)
 ODBC Driver with SQL Connector
 Hector (no longer active)
Driver
[Diagram: KEYSPACE > Table (Column Family) > Row]
Row key: user123
column1 (user name) = value1 (JohnDoe)
column2 (age) = value2 (27)
column3 (email) = value3 (abc@d.com)
Data Model
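One way to picture the hierarchy on this slide is as nested maps: a keyspace holds tables, a table holds rows by row key, and a row holds named columns. The sketch below is only an illustration; the table name `users` is an assumption, while the row itself is taken from the slide.

```python
# Toy picture of the hierarchy: keyspace -> table -> row key -> columns.
# The table name "users" is made up; the row comes from the slide.
keyspace = {
    "users": {                      # table (column family)
        "user123": {                # row key
            "user name": "JohnDoe",
            "age": 27,
            "email": "abc@d.com",
        },
    },
}

def read_column(table, row_key, column):
    """Look up one column value by table, row key and column name."""
    return keyspace[table][row_key][column]

print(read_column("users", "user123", "email"))  # abc@d.com
```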
 Usual types :
boolean, text, int, bigint ( long ), float, double, decimal , blob
 Additional types :
counter, varint, inet(IPv4 or IPv6), timestamp, timeuuid, uuid
 Collection types :
list, set, map
Data Types
 Basic component of Cassandra
 Main configuration file: cassandra.yaml
 Restart the node after changing cassandra.yaml
Node
[Diagram: a write (key, value, ts) goes to the Commit log and the Memtable]
Write flow
[Diagram: in the background, the Flusher Thread flushes the Memtable to SSTable1, SSTable2, SSTable3, SSTable4]
Write flow
[Diagram: in the background, the Flusher Thread flushes the Memtable to SSTable1, SSTable2, SSTable3, SSTable4, and the Compaction Thread merges them into SSTable5]
Write flow
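The three write-flow slides can be sketched as a toy node. This is a simplified model, not Cassandra's implementation: the class `ToyNode`, the `memtable_limit` threshold and the in-memory lists standing in for disk files are all assumptions made for illustration.

```python
class ToyNode:
    """Very simplified write path: commit log -> memtable -> SSTables."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []      # append-only log, kept for durability
        self.memtable = {}        # in memory: key -> (value, ts)
        self.sstables = []        # immutable "on-disk" tables, oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, value, ts):
        self.commit_log.append((key, value, ts))   # 1. append to commit log
        self.memtable[key] = (value, ts)            # 2. update memtable
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Flusher: turn the memtable into a new immutable SSTable."""
        if self.memtable:
            self.sstables.append(dict(self.memtable))
            self.memtable.clear()

    def compact(self):
        """Compaction: merge all SSTables, keeping the newest value per key."""
        merged = {}
        for table in self.sstables:
            for key, (value, ts) in table.items():
                if key not in merged or ts > merged[key][1]:
                    merged[key] = (value, ts)
        self.sstables = [merged] if merged else []

node = ToyNode()
node.write("k1", "a", ts=1)
node.write("k2", "b", ts=2)   # memtable is full -> flushed to an SSTable
node.write("k1", "c", ts=3)
node.write("k3", "d", ts=4)   # second flush
node.compact()                # two SSTables merged into one
print(node.sstables)
```

After compaction the single remaining SSTable holds the newest version of `k1` only, which is why compaction reclaims space.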
 A write creates a new record if none exists
 If the record exists, the version with the more recent timestamp wins
 UPDATE creates a new record if none exists
Upsert
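The upsert rule can be sketched as a last-write-wins comparison on timestamps. The function `upsert` and the dictionary standing in for a table are hypothetical; timestamp ties are ignored here for simplicity.

```python
def upsert(table, key, value, ts):
    """Apply a write; return True if it became the visible value."""
    current = table.get(key)
    if current is None or ts > current[1]:
        table[key] = (value, ts)
        return True
    return False

users = {}
upsert(users, "user123", "JohnDoe", ts=10)   # no record yet -> created
upsert(users, "user123", "JaneDoe", ts=20)   # newer timestamp -> replaces
upsert(users, "user123", "Stale", ts=5)      # older timestamp -> ignored
print(users["user123"])  # ('JaneDoe', 20)
```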
Delete / TTL
 DELETE -> tombstone
 Expired TTL -> tombstone
 Compaction removes tombstoned data
Tombstone
 Tombstone lifetime is defined by gc_grace_seconds
 Tombstone limit (100K)
 Many tombstones -> slow compaction
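A minimal sketch of how a tombstone behaves, assuming a dictionary as the table and integer seconds as time. The helper names `delete` and `compact` are hypothetical; the value 864000 (10 days) is Cassandra's default for gc_grace_seconds.

```python
GC_GRACE_SECONDS = 864000  # 10 days, the default gc_grace_seconds

TOMBSTONE = object()  # marker stored instead of a value

def delete(table, key, now):
    """A delete writes a tombstone; nothing is removed yet."""
    table[key] = (TOMBSTONE, now)

def compact(table, now, gc_grace=GC_GRACE_SECONDS):
    """Drop tombstones older than gc_grace; keep everything else."""
    return {
        key: (value, ts)
        for key, (value, ts) in table.items()
        if value is not TOMBSTONE or now - ts < gc_grace
    }

table = {"a": ("1", 0), "b": ("2", 0)}
delete(table, "a", now=100)
early = compact(table, now=200)                     # tombstone still kept
late = compact(table, now=100 + GC_GRACE_SECONDS)   # tombstone purged
print(sorted(early), sorted(late))  # ['a', 'b'] ['b']
```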
[Diagram: a read merges the Memtable with SSTable5, SSTable6 and SSTable7 into the result]
Read flow
Bloom filter
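A Bloom filter lets a read skip SSTables that certainly do not contain the key. The sketch below is a generic Bloom filter, not Cassandra's implementation; the sizes and the salted SHA-256 hashing are assumptions chosen for brevity.

```python
import hashlib

class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = [False] * size_bits

    def _positions(self, key):
        # Derive num_hashes bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))

bloom = BloomFilter()
for key in ["user123", "user456"]:   # keys stored in one SSTable
    bloom.add(key)

print(bloom.might_contain("user123"))  # True: added keys always match
```

A `False` answer is always correct, so the SSTable can be skipped; a `True` answer may be a false positive and still requires reading the SSTable.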
[Diagram: ring of nodes A, B, C, D]
Peer to Peer
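 PRIMARY KEY ( country ): partition key is country
 PRIMARY KEY ( country, state ): partition key is country, clustering column is state
 PRIMARY KEY ( (country, state), city ): partition key is (country, state), clustering column is city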
Row key
[Diagram: ring of nodes A, B, C, D; each node owns one token range]
-9223372036854775808 to -4611686018427387903
-4611686018427387904 to -1
-1 to 4611686018427387903
4611686018427387904 to 9223372036854775807
Distribution
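Finding the node for a token can be sketched as a sorted lookup over range end tokens. Two things below are assumptions: which node owns which range (the diagram does not survive in text form), and the hash in `token_for`, a stand-in that only mimics a signed 64-bit token and is not Cassandra's partitioner.

```python
import bisect
import hashlib

# Inclusive end token of each node's range, in ring order.
# The node-to-range assignment is an assumption for illustration.
RING = [
    (-4611686018427387904, "A"),
    (-1, "B"),
    (4611686018427387903, "C"),
    (9223372036854775807, "D"),
]
TOKENS = [token for token, _ in RING]

def token_for(partition_key):
    """Stand-in hash producing a signed 64-bit token."""
    digest = hashlib.md5(partition_key.encode()).digest()
    return int.from_bytes(digest[:8], "big", signed=True)

def owner(token):
    """Return the node whose range contains the token."""
    index = bisect.bisect_left(TOKENS, token)
    return RING[index][1]

print(owner(-9223372036854775808), owner(0))  # A C
print(owner(token_for("US")))
```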
[Diagram: ring of nodes A, B, C, D]
Replication
RF: 3
[Diagram: ring of nodes A, B, C, D]
Replication
[Diagram: each of the nodes A, B, C, D also stores replicas of data owned by other nodes]
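Replica placement on a ring can be sketched as "the owner plus the next nodes clockwise", which is how Cassandra's SimpleStrategy places replicas. The function name `replicas_for` and the list standing in for the ring are hypothetical.

```python
def replicas_for(ring, owner, rf):
    """Owner plus the next rf - 1 nodes clockwise on the ring."""
    start = ring.index(owner)
    count = min(rf, len(ring))
    return [ring[(start + i) % len(ring)] for i in range(count)]

ring = ["A", "B", "C", "D"]
print(replicas_for(ring, "C", rf=3))  # ['C', 'D', 'A']
```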
Consistency level
 Tunable ( ONE, TWO, THREE, QUORUM, ALL, etc. )
 Strong consistency ( R + W > N )
 Weak consistency ( R + W <= N )
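The R + W > N rule can be checked with a few lines, where N is the replication factor. The helper names are hypothetical, and only the levels named on the slide are covered.

```python
def replicas_required(level, rf):
    """Number of replicas that must answer at a given consistency level."""
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    if level in fixed:
        return fixed[level]
    if level == "QUORUM":
        return rf // 2 + 1
    if level == "ALL":
        return rf
    raise ValueError(f"unknown consistency level: {level}")

def is_strong(read_level, write_level, rf):
    """Strong consistency holds when R + W > N."""
    r = replicas_required(read_level, rf)
    w = replicas_required(write_level, rf)
    return r + w > rf

print(is_strong("TWO", "TWO", rf=3))  # True:  2 + 2 > 3
print(is_strong("ONE", "TWO", rf=3))  # False: 1 + 2 <= 3
```

With R + W > N the set of replicas read always overlaps the set written, so a read sees at least one copy of the latest write.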
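[Diagram: cluster of nodes A, B, C, D, E; RF: 3; WRITE at CL: TWO]
Strong consistency
[Diagram: cluster of nodes A, B, C, D, E; RF: 3; READ at CL: TWO]
Strong consistency
[Diagram: cluster of nodes A, B, C, D, E; RF: 3; WRITE at CL: TWO]
Weak consistency
[Diagram: cluster of nodes A, B, C, D, E; RF: 3; READ at CL: ONE]
Weak consistency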
[Diagram: cluster of nodes A, C, D, E; node B is down; RF: 3; WRITE at CL: ONE]
Node E stores a hint for node B
Hinted Handoff
[Diagram: cluster of nodes A, C, D, E; RF: 3; node B goes from down to alive]
Hinted Handoff
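Hinted handoff can be sketched as "remember the write for a replica that is down, replay it when the replica returns". This toy `Cluster` class is an assumption made for illustration; its `hints` dictionary stands in for the hints that the coordinating node (node E on the slide) would store.

```python
class Cluster:
    def __init__(self, nodes):
        self.data = {node: {} for node in nodes}
        self.alive = set(nodes)
        self.hints = {}  # down node -> list of (key, value) to replay

    def write(self, replicas, key, value):
        """Write to live replicas; keep hints for the ones that are down."""
        acks = 0
        for node in replicas:
            if node in self.alive:
                self.data[node][key] = value
                acks += 1
            else:
                self.hints.setdefault(node, []).append((key, value))
        return acks

    def recover(self, node):
        """The node is back: replay the hints stored for it."""
        self.alive.add(node)
        for key, value in self.hints.pop(node, []):
            self.data[node][key] = value

cluster = Cluster(["A", "B", "C", "D", "E"])
cluster.alive.discard("B")                        # node B goes down
acks = cluster.write(["B", "C", "D"], "k", "v")   # 2 acks, 1 hint for B
cluster.recover("B")                              # hint replayed to B
print(acks, cluster.data["B"])  # 2 {'k': 'v'}
```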
[Diagram: cluster of nodes A, B, C, D, E; RF: 3; READ at CL: TWO]
Read repair
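Read repair can be sketched as: compare the answers of the contacted replicas, return the newest one, and write it back to any replica that is behind. The function and data layout below are hypothetical simplifications.

```python
def read_with_repair(replica_data, key):
    """Return the newest value and push it to replicas that are behind."""
    responses = {node: data.get(key) for node, data in replica_data.items()}
    found = [r for r in responses.values() if r is not None]
    if not found:
        return None
    newest = max(found, key=lambda r: r[1])   # compare by timestamp
    for node, response in responses.items():
        if response != newest:
            replica_data[node][key] = newest  # repair the stale replica
    return newest[0]

contacted = {
    "B": {"k": ("new", 2)},   # value, timestamp
    "C": {"k": ("old", 1)},
}
print(read_with_repair(contacted, "k"))  # new
print(contacted["C"]["k"])               # ('new', 2)
```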
Cache
 Row cache
 Partition key cache
Row cache
 Not write-through
 Off-heap
 Misuse exhausts the JVM heap
Partition key cache
 Stores where the partition is located on disk
 Decreases seek times
 Enabled by default
CQL
 ALTER ( keyspace, table, type, user )
 CREATE ( keyspace, table, trigger, user )
 SELECT
 INSERT
 UPDATE
 DELETE
 TRUNCATE
 DROP ( keyspace, table, trigger, user )
 GRANT
 REVOKE
CREATE TABLE user (
    user_id int,
    . . .
    group text,
    PRIMARY KEY (user_id)
);
CREATE INDEX ON user (group);
SELECT * FROM user WHERE group = 'specified_group';
Secondary index
[Diagram: cluster of nodes A, B, C, D, E; the query goes to ALL nodes]
Secondary index
CREATE TABLE car_owner (
    group text,
    owner text,
    model text,
    . . .
    PRIMARY KEY (group, owner)
);
group | owner           | model
major | Ben Affleck     | Ferrari
major | Vin Diesel      | Bugatti
minor | Sebastian Druid | Opel
CREATE INDEX ON car_owner (model);
SELECT * FROM car_owner WHERE group = 'major' AND model = 'Ferrari';
Secondary index
[Diagram: cluster of nodes A, B, C, D, E; the query goes to ONE node]
Secondary index
Association Table Mapping
CREATE TABLE email_to_user_id (
    email text,
    user_id int,
    PRIMARY KEY (email)
);
CREATE TABLE email_to_user (
    email text,
    user_name text,
    . . .
    PRIMARY KEY (email)
);
Error handling
 NoHostAvailableException
 UnavailableException
 ( Write | Read ) TimeoutException
[Diagram: cluster of nodes A, B, C, D, E]
Error handling
NoHostAvailableException
[Diagram: nodes A, B, D, E; node C is missing; RF: 3; CL: ALL; replicas R1, R2, R3]
Error handling
UnavailableException
[Diagram: nodes A, B, C, D, E; RF: 3; CL: ALL; replicas R1, R2, R3]
Error handling
( Write | Read ) TimeoutException
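The difference between the last two errors can be sketched from the coordinator's point of view: too few live replicas fails immediately, while live but slow replicas fail after waiting. The exception classes below are local stand-ins named after the slide, not the driver's classes, and the `'up'`, `'slow'`, `'down'` states are an assumption for illustration.

```python
class UnavailableException(Exception):
    """Not enough live replicas to even attempt the request."""

class TimeoutException(Exception):
    """Enough live replicas, but too few answered in time."""

def coordinate(replica_state, required):
    """replica_state maps replica -> 'up', 'slow' or 'down'."""
    alive = [r for r, state in replica_state.items() if state != "down"]
    if len(alive) < required:
        raise UnavailableException(f"{len(alive)} alive, {required} required")
    answered = [r for r in alive if replica_state[r] == "up"]
    if len(answered) < required:
        raise TimeoutException(f"{len(answered)} answered, {required} required")
    return answered

# RF: 3 with CL: ALL means all three replicas are required.
for state in ({"R1": "up", "R2": "up", "R3": "down"},
              {"R1": "up", "R2": "up", "R3": "slow"}):
    try:
        coordinate(state, required=3)
    except (UnavailableException, TimeoutException) as error:
        print(type(error).__name__, error)
```

NoHostAvailableException is a different case: it is raised on the client side when the driver cannot reach any node at all, so no coordinator is involved.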
Use cases
Product catalogs & Playlists
Finance
Event log
Distributed cache
Social network
Use power correctly
 Apple: 75,000+ nodes, 10s of PBs, millions of ops/s
 Netflix: 90+ clusters, 2,700+ nodes, >1 trillion ops/day
 eBay: 100+ clusters, 20+ billion reads/writes per day
Motivation
 www.macys.com
 4 nodes
 99.9 % availability
De facto motivation
Macy's
Q & A
Thank you!
http://www.planetcassandra.org/
http://docs.datastax.com/en/index.html
http://cassandrasummit-datastax.com/
