©2013 DataStax Confidential. Do not distribute without consent.
Jon Haddad, Luke Tillman
Technical Evangelists, DataStax
@rustyrazorblade, @LukeTillman
Introduction to Apache Cassandra
What is Apache Cassandra?
• Fast Distributed Database
• High Availability
• Linear Scalability
• Predictable Performance
• No SPOF
• Multi-DC
• Commodity Hardware
• Easy to manage operationally
Hash Ring
• No master / slave / replica sets
• No config servers, zookeeper
• Data is partitioned around the ring
• Data is replicated to RF=N servers
• All nodes hold data and can answer
queries (both reads & writes)
• Location of data on ring is
determined by partition key
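The ring bullets above can be sketched in a few lines of Python. This is a simplified illustration, not Cassandra's implementation: the `Ring` class, the `token` helper, and the node names are hypothetical, each node owns a single token, and MD5 stands in for the real partitioner's hash function.

```python
import bisect
import hashlib


def token(partition_key: str) -> int:
    """Map a key to a position on the ring (MD5 is only a stand-in hash)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class Ring:
    """Toy hash ring: every node owns one token, data goes to RF nodes."""

    def __init__(self, nodes, rf):
        if rf > len(nodes):
            raise ValueError("rf cannot exceed the number of nodes")
        self.rf = rf
        # Sort by token so the ring can be walked clockwise.
        self.entries = sorted((token(name), name) for name in nodes)
        self.tokens = [t for t, _ in self.entries]

    def replicas(self, partition_key):
        """Return the rf nodes holding this partition, walking clockwise."""
        start = bisect.bisect_left(self.tokens, token(partition_key))
        size = len(self.entries)
        return [self.entries[(start + i) % size][1] for i in range(self.rf)]


ring = Ring(["node-a", "node-b", "node-c", "node-d"], rf=3)
print(ring.replicas("JCVD"))  # three distinct nodes, the same on every call
```

Because the placement depends only on the partition key and the token list, any node can compute it locally, which is why no master or config server is needed.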
CAP Tradeoffs
• Cassandra chooses Availability &
Partition Tolerance over Consistency
• Queries have tunable consistency level
• ALL, QUORUM, ONE
• Hinted Handoff to deal with failed nodes
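The arithmetic behind tunable consistency can be sketched as follows. The function names are hypothetical; the rule illustrated is the usual one that a read is guaranteed to see the latest acknowledged write when the replicas written plus the replicas read exceed the replication factor.

```python
def quorum(rf: int) -> int:
    """Number of replicas that make up a majority of rf."""
    return rf // 2 + 1


# How many replicas must respond at each consistency level from the slide.
REQUIRED = {
    "ONE": lambda rf: 1,
    "QUORUM": quorum,
    "ALL": lambda rf: rf,
}


def read_sees_latest_write(rf: int, write_level: str, read_level: str) -> bool:
    """True when the read set and the write set must share at least one replica."""
    return REQUIRED[write_level](rf) + REQUIRED[read_level](rf) > rf


print(read_sees_latest_write(3, "QUORUM", "QUORUM"))  # True
print(read_sees_latest_write(3, "ONE", "ONE"))        # False
```

With RF=3, writing and reading at QUORUM touches 2 + 2 replicas, so at least one replica is in both sets; ONE plus ONE gives no such guarantee but tolerates more failed nodes.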
Data Modeling
Data Structures
• Like an RDBMS, Cassandra uses a Table to
store data
• But that’s where the similarities end
• Partitions within tables
• Rows within partitions (or a single row)
• CQL to create tables & query data
• Partition keys determine where a partition
is found
• Clustering keys determine ordering of rows
within a partition
[Diagram: Keyspace > Table > Partition > Row]
Example: Single Row Partition
• Simple User system
• Identified by name (pk)
• 1 Row per partition
• This is familiar territory
name       age  job
jon        33   evangelist
luke       33   evangelist
old pete   108  retired
s. seagal  62   actor
JCVD       53   actor
cqlsh:demo> create table user
            (name text primary key,
             age int,
             job text);
cqlsh:demo> select * from user where name = 'JCVD';
Example: Multiple Rows
• Comments on photos
• Comments are always selected by
the photo_id
• There are only 4 rows in 2 partitions
• In the real world, use UUIDs instead
of int for PK
photo_id  comment_id  user  comment
5         1           jon   hi
5         2           luke  oh hey
5         3           JCVD  AHHHHH!!!
6         4           jon   great pic
create table comment
  (photo_id int,
   comment_id int,
   user text,
   comment text,
   primary key (photo_id, comment_id));
select * from comment where photo_id=5;
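For the advice to use UUIDs instead of int keys, a client application could generate the keys itself. The sketch below uses Python's standard `uuid` module; `new_comment_key` is a hypothetical helper, and pairing a random UUID for the photo with a time-based UUID for the comment is one possible choice, not something the slides prescribe.

```python
import uuid


def new_comment_key(photo_id: uuid.UUID):
    """Build a (partition key, clustering key) pair for a new comment."""
    # uuid1() is a version 1, time-based UUID, the kind a CQL timeuuid holds.
    return photo_id, uuid.uuid1()


photo_id = uuid.uuid4()  # random version 4 UUID, no coordination between clients
partition_key, clustering_key = new_comment_key(photo_id)
print(partition_key.version, clustering_key.version)  # 4 1
```

Unlike an auto-incrementing int, neither value requires a central counter, which fits a database with no master node.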
Partition with Clustering
photo_id | comment_id user comment | comment_id user comment | comment_id user comment
5        | 1 jon hi                | 2 luke oh hey           | 3 JCVD AHHHHH!!!
6        | 4 jon great pic         |                         |
• Multiple rows are transposed into a single partition
• Partitions vary in size
• Old terminology - "wide row"
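One way to picture the layout above is a dictionary of partitions, each holding its rows keyed by the clustering column. This is only a mental model in Python, with the hypothetical names `partitions` and `select_by_photo`; it says nothing about the on-disk format.

```python
from collections import defaultdict

# Rows from the comment table, deliberately inserted out of order.
rows = [
    (5, 2, "luke", "oh hey"),
    (6, 4, "jon", "great pic"),
    (5, 3, "JCVD", "AHHHHH!!!"),
    (5, 1, "jon", "hi"),
]

# photo_id (partition key) -> {comment_id (clustering key) -> other columns}
partitions = defaultdict(dict)
for photo_id, comment_id, user, comment in rows:
    partitions[photo_id][comment_id] = (user, comment)


def select_by_photo(photo_id):
    """Mimic `select * from comment where photo_id = ?`: one partition, ordered rows."""
    partition = partitions.get(photo_id, {})
    return [(cid, *partition[cid]) for cid in sorted(partition)]


print(select_by_photo(5))
# [(1, 'jon', 'hi'), (2, 'luke', 'oh hey'), (3, 'JCVD', 'AHHHHH!!!')]
```

The query touches a single partition and gets its rows back in clustering order, regardless of the order in which they were written.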
Model Tables to Answer Queries
• This is not 3NF!!
• We always query by partition key
• Create many tables aka
materialized views
• Manage in your app code
• Denormalize!!
user  age
jon   33
luke  33
JCVD  53

age  user  user
33   jon   luke
53   JCVD
CREATE TABLE age_to_user (
age int,
user text,
primary key (age, user)
);
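"Manage in your app code" means the application writes to every table that answers one of its queries. A minimal sketch, using in-memory dictionaries in place of the two tables; `save_user` and `users_with_age` are hypothetical helpers, and a real application would issue the corresponding CQL statements through a driver instead.

```python
user_table = {}    # stands in for the user table: name -> age
age_to_user = {}   # stands in for age_to_user: age -> set of names


def save_user(name, age):
    """Write to both tables so each query can be served by partition key."""
    old_age = user_table.get(name)
    if old_age is not None and old_age != age:
        # Keep the lookup table in step when a user's age changes.
        age_to_user[old_age].discard(name)
    user_table[name] = age
    age_to_user.setdefault(age, set()).add(name)


def users_with_age(age):
    """Answer 'who is this old?' from the denormalized table."""
    return sorted(age_to_user.get(age, set()))


save_user("jon", 33)
save_user("luke", 33)
save_user("JCVD", 53)
print(users_with_age(33))  # ['jon', 'luke']
```

The cost is one extra write per user; the benefit is that the age lookup never has to scan the user table.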
CQL Data Types
Basic Types:  text, int, decimal, blob, uuid, timeuuid, counter
Collections:  map, list, set
Read the CQL documentation for the full list of types
Reads & Writes
The Write Path
• A write can be sent to any node in the cluster;
that node acts as the coordinator
• Writes are written to commit log, then to
memtable
• Every write includes a timestamp
• Memtable flushed to disk periodically
(sstable)
• New memtable is created in memory
• Deletes are actually a special write case,
called a “tombstone”
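The steps on a single node can be sketched as a toy class. Several things here are assumptions made for illustration: the `Node` class itself, a counter standing in for real timestamps, flushing when the memtable reaches a size threshold (the slide only says "periodically"), and clearing the whole commit log after a flush.

```python
import itertools

TOMBSTONE = object()         # marker value written in place of deleted data
_clock = itertools.count(1)  # stand-in for a real write timestamp


class Node:
    def __init__(self, flush_threshold=3):
        self.commit_log = []   # append-only record, written first for durability
        self.memtable = {}     # in-memory: key -> (value, timestamp)
        self.sstables = []     # flushed tables, never modified afterwards
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        ts = next(_clock)
        self.commit_log.append((key, value, ts))
        self.memtable[key] = (value, ts)
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def delete(self, key):
        # A delete is just another write, carrying a tombstone.
        self.write(key, TOMBSTONE)

    def flush(self):
        self.sstables.append(dict(self.memtable))  # snapshot becomes an sstable
        self.memtable = {}                         # a new memtable takes over
        self.commit_log.clear()                    # flushed data no longer needs the log


node = Node(flush_threshold=2)
node.write("jon", 33)
node.delete("jon")
node.write("luke", 33)
print(len(node.sstables), node.memtable)  # 1 {}
```

Nothing is ever updated in place: the delete of "jon" ends up in the sstable as a tombstone with a newer timestamp than the original write.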
What is an SSTable?
• Immutable data file for row storage
• Deletes are written as tombstones
• Every write includes a timestamp of when it
was written
• A partition can be spread across multiple SSTables
• Same column can be in multiple SSTables
• Merged through compaction, only latest
timestamp is kept
• Easy backups!
[Diagram: three sstables merged into a single sstable by compaction]
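The "only latest timestamp is kept" rule can be sketched as a merge over dictionaries. The `compact` function and the use of `None` as a tombstone marker are hypothetical; this sketch also keeps tombstones in the output rather than modelling when they can be purged.

```python
TOMBSTONE = None  # stand-in marker for a deleted value


def compact(sstables):
    """Merge sstables into one, keeping the newest (value, timestamp) per key."""
    merged = {}
    for table in sstables:
        for key, (value, ts) in table.items():
            if key not in merged or ts > merged[key][1]:
                merged[key] = (value, ts)
    return merged


older = {"jon": ("evangelist", 1)}
newer = {"jon": ("speaker", 5), "old pete": ("retired", 2)}
newest = {"old pete": (TOMBSTONE, 7)}

print(compact([older, newer, newest]))
# {'jon': ('speaker', 5), 'old pete': (None, 7)}
```

The input tables are never modified; compaction writes a new table and the old ones can then be discarded, which is what makes the files immutable and easy to back up.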
The Read Path
• Any server may be queried; it acts as the
coordinator
• Contacts nodes with the requested key
• On each node, data is pulled from
SSTables and merged
• Consistency < ALL performs read repair
in the background (read_repair_chance)
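A coordinator read with read repair can be sketched as below. This simplifies heavily: `coordinator_read` is a hypothetical function, replicas are plain dictionaries, every replica is contacted, and the repair happens synchronously on every read rather than in the background with some probability.

```python
def coordinator_read(replicas, key):
    """Return the newest value for key and repair replicas that are stale."""
    responses = [(replica, replica.get(key)) for replica in replicas]
    found = [cell for _, cell in responses if cell is not None]
    if not found:
        return None
    # Each cell is (value, timestamp); the latest timestamp wins.
    newest = max(found, key=lambda cell: cell[1])
    for replica, cell in responses:
        if cell != newest:
            replica[key] = newest  # read repair: push the newest cell back
    return newest[0]


a = {"jon": ("evangelist", 1)}  # stale replica
b = {"jon": ("speaker", 2)}     # up-to-date replica
c = {}                          # replica that missed the write entirely

print(coordinator_read([a, b, c], "jon"))  # speaker
print(a["jon"], c["jon"])                  # ('speaker', 2) ('speaker', 2)
```

After the read, all three replicas agree, which is how reads gradually heal the inconsistencies that an availability-first design allows.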
Analytics with Spark
Spark at a Glance
• Scala, Python, Java
• Hadoop alternative - batch analytics
• Distributed SQL
• Real time analytics via streaming
• Machine learning
• GraphX (in progress)
• Open source connector available
• Built into DSE
Summary
• How do I query my data if I can only
query by key?
• Denormalize!
• Create multiple views into your data
(multiple tables)
• Cassandra is built for fast writes
• Use fast writes to do as few reads as
possible
• Use Spark for advanced analytics and
real time analysis
Cassandra Day Chicago 2015: Introduction to Apache Cassandra & DataStax Enterprise