Cassandra
Nick Bailey
@nickmbailey
nick@datastax.com
Thursday, May 30, 13
©2012 DataStax
Introduction
Why does Cassandra Exist?
Big Data
Analytics + Real Time
Architecture
Dynamo (distribution, replication) + BigTable (column family data model)
Why do people like Cassandra?
Availability
Scalability
Performance
Multi Datacenter Support
Hadoop Support
• InputFormat for reading Cassandra data into MapReduce jobs
• Run TaskTrackers/DataNodes locally, on the Cassandra nodes
• Run the NameNode/JobTracker anywhere
Data Locality
Workload Partitioning
Data Modeling
Keyspace, Column Families
Roughly analogous to: Database, Tables
Column Family = Row Key + Columns (name, value), ...
Static Column Families
Dynamic Column Families
Static - Users Column Family
Row Key    | Columns
g_m_bluth  | password: banana stand | name: George Michael
tobias_f   | password: c_weathers   | name: Tobias | phone: 512-7777
Dynamic - Friend Column Family
Row Key    | Columns
g_m_bluth  | <date>:ann_v | <date>:maeby
tobias_f   | <date>:barry_z | <date>:carl_w | <date>:lindsay | ...
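A rough way to picture both kinds of column family: a map from row key to a map of columns that Cassandra keeps sorted by column name. The Python below is a toy in-memory model, not Cassandra's storage engine or a driver API; the `ColumnFamily` class and the concrete dates standing in for `<date>` are hypothetical.

```python
from bisect import bisect_left


class ColumnFamily:
    """Toy model (hypothetical): row key -> list of (column_name, value),
    kept sorted by column name, as columns are sorted within a row."""

    def __init__(self):
        self.rows = {}

    def insert(self, row_key, name, value):
        row = self.rows.setdefault(row_key, [])
        names = [n for n, _ in row]
        i = bisect_left(names, name)
        if i < len(row) and row[i][0] == name:
            row[i] = (name, value)  # same column name: last write wins
        else:
            row.insert(i, (name, value))

    def get_row(self, row_key):
        return list(self.rows.get(row_key, []))


# Static: every row uses the same small, known set of column names.
users = ColumnFamily()
users.insert("g_m_bluth", "password", "banana stand")
users.insert("g_m_bluth", "name", "George Michael")
users.insert("tobias_f", "phone", "512-7777")

# Dynamic: the column name itself is data (a date plus a friend) and a row
# can grow to many columns; the value can be left empty.
friends = ColumnFamily()
friends.insert("tobias_f", ("2013-05-02", "carl_w"), "")
friends.insert("tobias_f", ("2013-05-01", "barry_z"), "")

print(users.get_row("g_m_bluth"))  # [('name', 'George Michael'), ('password', 'banana stand')]
print([name for name, _ in friends.get_row("tobias_f")])  # [('2013-05-01', 'barry_z'), ('2013-05-02', 'carl_w')]
```

In both cases the row comes back in column-name order regardless of write order; the dynamic case relies on that ordering to keep friends sorted by date.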
Time Series Data
• Event logs
• Metrics
• Sensor Data
• Etc
Time Series - Login CF
Row Key    | Columns
g_m_bluth  | 1369633061: United States | 1369625839: Mexico | ...
tobias_f   | 1369932413: Canada | 1369681738: United States | ...
What Else?
Counter Columns
• Inc/Dec operations
• Not idempotent
• Possibility for over counting
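The "not idempotent" point is easiest to see with a client retry. This toy sketch assumes a write that timed out on the client but was in fact applied; the `CounterColumn` class and `apply_with_retry` helper are hypothetical stand-ins, not a real driver API.

```python
class CounterColumn:
    """Toy counter (hypothetical): the server receives only deltas,
    never the intended final value."""

    def __init__(self):
        self.value = 0

    def add(self, delta):
        self.value += delta


def apply_with_retry(write, applied_but_timed_out):
    """Send a write; after a timeout the client cannot tell whether the
    write landed, so it sends the same write again."""
    write()
    if applied_but_timed_out:
        write()


likes = CounterColumn()
apply_with_retry(lambda: likes.add(1), applied_but_timed_out=False)
apply_with_retry(lambda: likes.add(1), applied_but_timed_out=True)
print(likes.value)  # 3: two likes, one of them counted twice

# A regular column overwrite is idempotent: replaying it changes nothing.
row = {}
apply_with_retry(lambda: row.update(like_count=2), applied_but_timed_out=True)
print(row["like_count"])  # 2
```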
Expiring Columns
• TTL - Time to live
• Set per column
• Possibly an anti-pattern (we’ll get to that later)
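Conceptually each column carries its own expiry time, and a read after that point behaves as if the column were gone. The sketch below is a simplified in-memory model with an injectable clock; the `ExpiringRow` class and the column names are hypothetical. In Cassandra an expired column becomes a tombstone rather than simply vanishing, which is presumably why the slide calls heavy TTL use a possible anti-pattern (compare the Queues slide).

```python
import time


class ExpiringRow:
    """Toy row (hypothetical) where each column may have its own TTL."""

    def __init__(self, clock=time.time):
        self.clock = clock
        self.columns = {}  # name -> (value, expires_at or None)

    def insert(self, name, value, ttl=None):
        # TTL is set per column, in seconds, at write time.
        expires_at = None if ttl is None else self.clock() + ttl
        self.columns[name] = (value, expires_at)

    def get(self, name):
        value, expires_at = self.columns.get(name, (None, None))
        if expires_at is not None and self.clock() >= expires_at:
            return None  # expired: reads behave as if the column is gone
        return value


now = [1369633061.0]  # fake clock so the example is deterministic
row = ExpiringRow(clock=lambda: now[0])
row.insert("session_token", "abc123", ttl=3600)  # expires in one hour
row.insert("name", "Nick")                       # no TTL: never expires
now[0] += 3601
print(row.get("session_token"), row.get("name"))  # None Nick
```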
Secondary Indexes
• SELECT * FROM users WHERE name = 'Nick';
• Only '=' clauses are supported (for the first condition)
• Often misused
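One way to see why equality is the natural fit: picture the index as a hidden map from each indexed value to the row keys holding it. Then `name = 'Nick'` is a single lookup, and any further condition is applied by filtering the rows that lookup returned. This is a toy model under that assumption, not Cassandra's actual index implementation; the `state` column, its values, and the helper names are hypothetical.

```python
from collections import defaultdict

users = {
    "nickmbailey": {"name": "Nick", "state": "TX"},
    "g_m_bluth": {"name": "George Michael", "state": "CA"},
    "tobias_f": {"name": "Tobias", "state": "CA"},
}

# Toy secondary index (hypothetical): indexed value -> set of row keys.
name_index = defaultdict(set)
for row_key, columns in users.items():
    name_index[columns["name"]].add(row_key)


def select_where_name(value, extra_filter=None):
    """'=' on the indexed column is one index lookup; any further
    condition only filters the rows that lookup returned."""
    keys = sorted(name_index.get(value, set()))
    rows = [(k, users[k]) for k in keys]
    if extra_filter is not None:
        rows = [(k, cols) for k, cols in rows if extra_filter(cols)]
    return rows


print(select_where_name("Nick"))  # [('nickmbailey', {'name': 'Nick', 'state': 'TX'})]
```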
CQL
Cassandra Query Language
CREATE COLUMNFAMILY songs (
id uuid PRIMARY KEY,
title text,
album text,
artist text,
data blob);
INSERT INTO songs (id, title, artist, album)
VALUES ('a3e64f8f...', 'La Grange', 'ZZ Top', 'Tres Hombres');
SELECT * FROM songs;
id          | album        | artist         | title
-------------+--------------+----------------+----------------
2b09185b... |    Roll Away | Back Door Slam | Outside Woman...
8a172618... | We Must Obey |      Fu Manchu | Moving in Ste...
a3e64f8f... | Tres Hombres |         ZZ Top | La Grange
How do I start?
Define your questions
SELECT time, location FROM logins
WHERE user = 'nickmbailey'
ORDER BY time DESC LIMIT 10;
WHERE user = 'nickmbailey'  →  Row Key
ORDER BY time DESC LIMIT 10;  →  Store columns in chronological order
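-- Column types are illustrative; the original slide omits them.
CREATE COLUMNFAMILY logins (
  user text,        -- row key (partition key)
  time timestamp,   -- clustering column: columns are stored sorted by time
  location text,
  PRIMARY KEY (user, time));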
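Here is how the two halves of the query map onto that layout, sketched as a toy Python model: `WHERE user = ...` picks one row, and because the columns are kept sorted by `time` on write, `ORDER BY time DESC LIMIT 10` is just the last 10 columns read backwards. The `record_login` and `last_logins` helpers are hypothetical; the timestamps are the ones from the Login CF slide.

```python
from bisect import bisect_left

rows = {}  # user -> list of (time, location), kept sorted by time


def record_login(user, ts, location):
    row = rows.setdefault(user, [])
    i = bisect_left(row, (ts,))  # first column with time >= ts
    if i < len(row) and row[i][0] == ts:
        row[i] = (ts, location)  # same clustering key: overwrite
    else:
        row.insert(i, (ts, location))  # sorted on write, not on read


def last_logins(user, limit=10):
    # WHERE user = ...: a single row lookup by row key.
    row = rows.get(user, [])
    if limit <= 0:
        return []
    # ORDER BY time DESC LIMIT n: the newest n columns, newest first.
    return row[-limit:][::-1]


record_login("g_m_bluth", 1369625839, "Mexico")
record_login("g_m_bluth", 1369633061, "United States")
print(last_logins("g_m_bluth", limit=1))  # [(1369633061, 'United States')]
```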
What about?
SELECT time FROM logins
WHERE user = 'nickmbailey'
AND location = 'United States';
Row Key    | Columns
g_m_bluth  | 1369633061: United States | 1369625839: Mexico | ...
           | 1369622839: Canada | 1369422839: Canada | 1368422839: Canada | ...
           | 1368421839: Canada | 1367421839: United States | 1367411839: Mexico | ...
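-- Column types are illustrative; the original slide omits them.
CREATE COLUMNFAMILY logins (
  user text,        -- row key (partition key)
  time timestamp,
  location text,    -- clustering column: one column per location
  PRIMARY KEY (user, location));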
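With `location` as the clustering column, each location can appear only once per user, so a newer login simply overwrites the stored time. Replaying the g_m_bluth columns visible on the earlier slide through a toy model reproduces the row on the next slide: the latest login per location, with the rest of the history gone. The helper is hypothetical, and the sketch assumes writes arrive oldest first, since Cassandra resolves overwrites by write timestamp rather than by the `time` value.

```python
by_location = {}  # user -> {location: time of most recent login}


def record_login(user, ts, location):
    # location is the column name, so a second login from the same
    # location overwrites the first (last write wins).
    by_location.setdefault(user, {})[location] = ts


history = [
    (1369633061, "United States"), (1369625839, "Mexico"),
    (1369622839, "Canada"), (1369422839, "Canada"),
    (1368422839, "Canada"), (1368421839, "Canada"),
    (1367421839, "United States"), (1367411839, "Mexico"),
]
for ts, location in sorted(history):  # replay oldest first
    record_login("g_m_bluth", ts, location)

# WHERE user = ... AND location = ...: a direct column lookup.
print(by_location["g_m_bluth"]["Canada"])  # 1369622839
print(len(by_location["g_m_bluth"]))       # 3
```

The query from the previous slide becomes cheap, but "last 10 logins" can no longer be answered from this table; each layout serves one question.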
Row Key    | Columns
g_m_bluth  | United States: 1369633061 | Canada: 1369622839 | ...
To Normalize or Not
SELECT time, location FROM ...
+
SELECT city, state, zip ... FROM locations ...
Row Key    | Columns
g_m_bluth  | 1369633061: <United States, Austin, Texas, 78701>
           | 1369625839: <Mexico, Tijuana, 88191>
           | 1358633061: <United States, Austin, Texas, 78701>
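The trade-off can be sketched in Python with hypothetical location ids (`loc_austin`, `loc_tijuana`). The normalized layout needs an extra lookup per login to answer "where did this user log in from?" The denormalized layout copies the location tuple into each login column at write time and answers with one row read, at the cost of duplicated data that must be rewritten if a location ever changes.

```python
# Location details from the slide; the ids are hypothetical.
locations = {
    "loc_austin": ("United States", "Austin", "Texas", "78701"),
    "loc_tijuana": ("Mexico", "Tijuana", "88191"),
}

# Normalized: each login column stores only a location id.
logins_normalized = {
    "g_m_bluth": [
        (1369633061, "loc_austin"),
        (1369625839, "loc_tijuana"),
        (1358633061, "loc_austin"),
    ],
}

# Denormalized: copy the full location tuple into every login column
# when it is written.
logins_denormalized = {
    user: [(ts, locations[loc_id]) for ts, loc_id in columns]
    for user, columns in logins_normalized.items()
}


def read_normalized(user):
    reads = 1  # the logins row
    result = []
    for ts, loc_id in logins_normalized[user]:
        result.append((ts, locations[loc_id]))
        reads += 1  # one locations lookup per login (no caching here)
    return result, reads


def read_denormalized(user):
    return list(logins_denormalized[user]), 1  # a single row read


print(read_normalized("g_m_bluth")[1], read_denormalized("g_m_bluth")[1])  # 4 1
```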
Anti Patterns
Batched Writes
• Failure case is suboptimal
• Increased chance of failure
• Tune to your workload
BOP/OPP (ByteOrderedPartitioner / OrderPreservingPartitioner)
• You don’t really need it
• Your Ops Team will hate you
• Really, you don’t need it.
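Why ordered partitioning tends to hurt: with an order-preserving partitioner each node owns a contiguous key range, so sequential keys (dates, timestamps, incrementing ids) all land on one node, and rebalancing the ranges becomes manual operations work. RandomPartitioner instead places keys by an MD5 hash. The sketch below is simplified (4 nodes, an evenly split token space, hypothetical range boundaries and keys) and illustrates only the distribution, not Cassandra's actual token assignment.

```python
import hashlib
from collections import Counter

NODES = 4
# Hypothetical sequential keys, e.g. one row per day.
keys = [f"2013-05-{day:02d}" for day in range(1, 31)]
# Hypothetical upper bounds of the key ranges owned by nodes 0-2;
# node 3 owns everything after the last bound.
boundaries = ["2013-03-31", "2013-06-30", "2013-09-30"]


def hashed_node(key):
    """RandomPartitioner-style (simplified): MD5 the key and map the
    token onto NODES equal slices of the 128-bit hash space."""
    token = int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")
    return token * NODES // 2**128


def ordered_node(key):
    """Order-preserving: the first node whose range covers the key."""
    for node, upper in enumerate(boundaries):
        if key <= upper:
            return node
    return len(boundaries)


ordered_load = Counter(ordered_node(k) for k in keys)
hashed_load = Counter(hashed_node(k) for k in keys)
print(ordered_load)  # Counter({1: 30}): every key hits the same node
print(hashed_load)   # compare: the same keys spread over several nodes
```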
Super Columns
• Performance penalty
  • Speed
  • Memory
• Replaced by CQL3
Read Before Write
• Race conditions
• Hurts performance
  • Cache
  • IO
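The race condition in miniature: two clients each read a value, modify it, and write it back. Written out as one deterministic interleaving (the `friend_count` column is hypothetical), one update is silently lost. Every such write also costs a read, which is where the cache and IO pressure comes from. For plain increments, the counter columns described earlier avoid the read entirely, with their own over-counting caveat.

```python
row = {"friend_count": 10}  # hypothetical column


def read(r):
    return r["friend_count"]


def write(r, value):
    r["friend_count"] = value


# Client A and client B each add one friend, interleaved like this:
a = read(row)      # A reads 10
b = read(row)      # B reads 10 before A has written
write(row, a + 1)  # A writes 11
write(row, b + 1)  # B also writes 11, overwriting A's update
print(row["friend_count"])  # 11, although two friends were added
```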
Queues
• More generally, many deletes within a row
• A delete in Cassandra actually writes a tombstone
• You may read 1000 tombstones in order to find 10 live columns
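A toy model of a queue kept in one row: consumers "remove" items by deleting columns, but each delete leaves a tombstone in place (until compaction eventually purges it), so a read for the next items has to step over all of them. The `QueueRow` class is hypothetical and ignores compaction; the numbers match the slide.

```python
TOMBSTONE = object()  # marker a delete leaves behind


class QueueRow:
    """Toy wide row (hypothetical) used as a queue, columns in insert order."""

    def __init__(self):
        self.columns = []   # [name, value] pairs in insertion order
        self.position = {}  # name -> index into self.columns

    def enqueue(self, name, value):
        self.position[name] = len(self.columns)
        self.columns.append([name, value])

    def delete(self, name):
        i = self.position.get(name)
        if i is not None:
            self.columns[i][1] = TOMBSTONE  # logical delete only

    def first_live(self, n):
        """Return up to n live columns and the tombstones read on the way."""
        live, tombstones_read = [], 0
        for name, value in self.columns:
            if len(live) == n:
                break
            if value is TOMBSTONE:
                tombstones_read += 1
            else:
                live.append((name, value))
        return live, tombstones_read


queue = QueueRow()
for i in range(1010):
    queue.enqueue(i, f"job-{i}")
for i in range(1000):  # consumers have already processed 1000 items
    queue.delete(i)

items, skipped = queue.first_live(10)
print(len(items), skipped)  # 10 1000
```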
Use Cases
eBay
http://www.youtube.com/watch?v=F-fYqPu2ciQ
eBay
• dozens of nodes
• 200 TB+ of storage
eBay
• Social Signals
• Hunch Taste Graph
• Various Time Series
Social Signals
• Like, Own, Want
• Need:
  • scalable counters
  • high performance writes
  • a way to find the most popular items in a given category
Social Signals

ItemCount
Row Key    | Columns
item_id_1  | like: 300 | own: 104 | want: 105
item_id_2  | ...

UserCount
Row Key    | Columns
user_id_1  | like: 50 | own: 10 | want: 75
user_id_2  | ...
Social Signals

ItemLike
Row Key    | Columns
item_id_1  | user_id_1: <time> | user_id_2: <time> | ...
item_id_2  | ...

UserLike
Row Key    | Columns
user_id_1  | <time>: <item_id> | <time>: <item_id> | ...
user_id_2  | ...
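Putting the four column families together: a single "like" turns into four cheap writes (two counter increments and two index columns) and no reads, which fits the high-performance-writes requirement. The sketch uses plain dicts as hypothetical stand-ins for the column families; `record_like` and the sample ids and timestamps are illustrative only.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for the four column families.
item_count = defaultdict(lambda: defaultdict(int))  # ItemCount: item -> {signal: n}
user_count = defaultdict(lambda: defaultdict(int))  # UserCount: user -> {signal: n}
item_like = defaultdict(dict)  # ItemLike: item -> {user_id: time}
user_like = defaultdict(dict)  # UserLike: user -> {time: item_id}


def record_like(user_id, item_id, ts):
    """One 'like' fans out to four writes and needs no reads."""
    item_count[item_id]["like"] += 1  # counter column increment
    user_count[user_id]["like"] += 1  # counter column increment
    item_like[item_id][user_id] = ts  # who liked this item, and when
    user_like[user_id][ts] = item_id  # what this user liked, by time


record_like("user_id_1", "item_id_1", 1369633061)
record_like("user_id_2", "item_id_1", 1369633100)
print(item_count["item_id_1"]["like"])  # 2
print(sorted(item_like["item_id_1"]))   # ['user_id_1', 'user_id_2']
```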
Social Signals - Possibilities
• Store aggregated counts per category
• Column names are counts
• Get top N items in a category
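One reading of "column names are counts": keep one row per category whose column names are (count, item) pairs. Because columns are sorted by name, the most popular items sit at the end of the row and the top N is a single slice. This is a sketch under that assumption, with a hypothetical `books` category and `set_count`/`top_n` helpers. Moving an item to a new count means deleting its old column, which needs the previous count (a read before write) and leaves a tombstone, both flagged under Anti Patterns.

```python
from bisect import bisect_left, insort

category_rows = {}  # category -> sorted list of (count, item_id)


def set_count(category, item_id, old_count, new_count):
    row = category_rows.setdefault(category, [])
    if old_count is not None:
        # Remove the old (count, item) column; in Cassandra this delete
        # would leave a tombstone.
        i = bisect_left(row, (old_count, item_id))
        if i < len(row) and row[i] == (old_count, item_id):
            del row[i]
    insort(row, (new_count, item_id))


def top_n(category, n):
    """Columns are sorted by count, so the top n is the tail of the row."""
    row = category_rows.get(category, [])
    if n <= 0:
        return []
    return [item_id for _, item_id in reversed(row[-n:])]


set_count("books", "item_id_1", None, 300)
set_count("books", "item_id_2", None, 120)
set_count("books", "item_id_3", None, 45)
set_count("books", "item_id_3", 45, 450)  # item_id_3 gets popular
print(top_n("books", 2))  # ['item_id_3', 'item_id_1']
```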
Questions?
Come to the Summit!
Ask me for a discount code
June 11-12, 2013
San Francisco, CA
http://www.datastax.com/company/news-and-events/events/cassandrasummit2013