©2013 DataStax Confidential. Do not distribute without consent.
@PatrickMcFadin
Patrick McFadin

Chief Evangelist, DataStax
Advanced Cassandra
Does Apache Cassandra Work?
Motivations
Cassandra is not…
A Data Ocean, Lake, or Pond
An In-Memory Database
A Key-Value Store
A magical database unicorn that farts rainbows
When to use…
Loose data model (joins, sub-selects)
Absolute consistency (aka gotta have ACID)
No need to use anything else
You’ll miss the long, candlelit dinners with your Oracle rep that always end with “what’s your budget look like this year?”
Oracle, MySQL, Postgres or <RDBMS>
Uptime is a top priority
Unpredictable or high scaling requirements
Workload is transactional
Willing to put the time and effort into understanding how Cassandra works and how to use it.
When to use…
Use Oracle when you want to count your money.
Use Cassandra when you want to make money.
Cassandra
Copy n Paste your relational model
APACHE
CASSANDRA
1000 Node Cluster
Scaling up
Stick the landing
Going to deploy in production!
Not sure about this!
Done!
Topology considerations
Replication Strategy
CREATE KEYSPACE killrvideo WITH
REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 };
Strategy
Copies
Topology considerations
SimpleStrategy
• Default
• One data center
NetworkTopologyStrategy
• Use for multi-data center
• Just use this always
NetworkTopologyStrategy
CREATE KEYSPACE Product_Catalog WITH
REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'replication_factor' : 3 };
CREATE KEYSPACE EU_Customer_Data WITH
REPLICATION = { 'class' : 'NetworkTopologyStrategy',
'eu1' : 3,
'eu2' : 3,
'us1' : 0 };
Symmetric
Asymmetric
No copies in the US
Application
• Closer to customers
• No downtime
Product_Catalog RF=3
Product_Catalog RF=3 EU_Customer_Data RF=3
EU_Customer_Data RF=0
Product_Catalog RF=3
EU_Customer_Data RF=3
Snitches
Snitches
DC1
DC1: RF=3
Node Primary Replica Replica
10.0.0.1 00-25 76-100 51-75
10.0.0.2 26-50 00-25 76-100
10.0.0.3 51-75 26-50 00-25
10.0.0.4 76-100 51-75 26-50
(Ring diagram: four nodes, each owning its primary range and holding replicas of the two preceding ranges)
Client: Where do I place this data?
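The placement table above can be sketched in a few lines. A minimal, hypothetical model of how a token maps to its replicas, assuming a SimpleStrategy-style clockwise walk around the ring (the ring values are the ranges from the table; this is an illustration, not driver code):

```python
from bisect import bisect_right

def replicas_for(token, ring, rf=3):
    """ring: list of (range_start, node) sorted by token.
    Walk clockwise from the owning node to collect rf replicas."""
    starts = [start for start, _ in ring]
    nodes = [node for _, node in ring]
    i = bisect_right(starts, token) - 1  # node whose primary range contains the token
    return [nodes[(i + k) % len(nodes)] for k in range(rf)]

# Token ranges from the table above (DC1, RF=3)
ring = [(0, "10.0.0.1"), (26, "10.0.0.2"), (51, "10.0.0.3"), (76, "10.0.0.4")]
print(replicas_for(15, ring))  # → ['10.0.0.1', '10.0.0.2', '10.0.0.3']
```

For token 15 this matches the table: 10.0.0.1 owns 00-25, and 10.0.0.2 and 10.0.0.3 hold its replicas.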
Dynamic Snitching
Route based on node performance
Snitches
SimpleSnitch
GossipingPropertyFileSnitch
RackInferringSnitch
PropertyFileSnitch
EC2Snitch
GoogleCloudSnitch
CloudStackSnitch
EC2MultiRegionSnitch
Snitches
• Most typically used in production
• Absolute placement
GossipingPropertyFileSnitch
cassandra-rackdc.properties
dc=DC1
rack=RAC1
Booting a datacenter
DC1
DC1: RF=3
Node Primary Replica Replica
10.0.0.1 00-25 76-100 51-75
10.0.0.2 26-50 00-25 76-100
10.0.0.3 51-75 26-50 00-25
10.0.0.4 76-100 51-75 26-50
(DC1 ring diagram as above)
DC2 (empty, about to be added)
Pre-check
• Use NetworkTopologyStrategy
• In cassandra.yaml
• auto_bootstrap: false
• add seeds from other DC
• Set node location for Snitch
• GossipingPropertyFileSnitch:
cassandra-rackdc.properties
• PropertyFileSnitch: cassandra-topology.properties
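The pre-check above boils down to a few lines on each new DC2 node. A sketch of the relevant fragments (seed IPs are illustrative; list seeds from the existing DC so the new nodes can gossip with it):

```yaml
# cassandra.yaml on each new DC2 node
auto_bootstrap: false
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "10.0.0.1,10.0.0.2"

# cassandra-rackdc.properties (GossipingPropertyFileSnitch)
dc=DC2
rack=RAC1
```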
Booting a datacenter
DC1
DC1: RF=3
Node Primary Replica Replica
10.0.0.1 00-25 76-100 51-75
10.0.0.2 26-50 00-25 76-100
10.0.0.3 51-75 26-50 00-25
10.0.0.4 76-100 51-75 26-50
(DC1 ring diagram as above)
DC2
(DC2 ring diagram: same token layout as DC1)
Node Primary Replica Replica
10.1.0.1 00-25 76-100 51-75
10.1.0.2 26-50 00-25 76-100
10.1.0.3 51-75 26-50 00-25
10.1.0.4 76-100 51-75 26-50
DC2: RF=3
ALTER KEYSPACE
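The ALTER KEYSPACE step widens replication to cover the new datacenter. A sketch using the killrvideo keyspace from earlier and the DC names from the diagrams (RF values illustrative):

```sql
ALTER KEYSPACE killrvideo WITH
REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'DC1' : 3, 'DC2' : 3 };
```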
Booting a datacenter
DC1
DC1: RF=3
Node Primary Replica Replica
10.0.0.1 00-25 76-100 51-75
10.0.0.2 26-50 00-25 76-100
10.0.0.3 51-75 26-50 00-25
10.0.0.4 76-100 51-75 26-50
(DC1 ring diagram as above)
DC2
(DC2 ring diagram: same token layout as DC1)
Node Primary Replica Replica
10.1.0.1 00-25 76-100 51-75
10.1.0.2 26-50 00-25 76-100
10.1.0.3 51-75 26-50 00-25
10.1.0.4 76-100 51-75 26-50
DC2: RF=3
nodetool rebuild
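nodetool rebuild then streams the existing data into the new datacenter. Run it on each DC2 node, naming the source DC to stream from (assuming the DC names above):

```shell
# On each node in DC2, stream existing data from DC1
nodetool rebuild -- DC1
```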
Security
NoSQL == No Security
User Auth
Step 1 Turn it on
cassandra.yaml
authorizer: CassandraAuthorizer (default: AllowAllAuthorizer)
authenticator: PasswordAuthenticator (default: AllowAllAuthenticator)
User Auth
cqlsh -u cassandra -p cassandra
Step 2 Create users
cqlsh> create user dude with password 'manager' superuser;
cqlsh> create user worker with password 'newhire';
cqlsh> list users;
name | super
----------+-------
cassandra | True
worker | False
dude | True
User Auth
cqlsh -u cassandra -p cassandra
Step 3 Grant permissions
cqlsh> create user ro_user with password '1234567';
cqlsh> grant all on killrvideo.user to dude;
cqlsh> grant select on killrvideo.user to ro_user;
SSL
http://thelastpickle.com/blog/2015/09/30/hardening-cassandra-step-by-step-part-1-server-to-server.html
10.0.0.1
10.0.0.4 10.0.0.2
10.0.0.3
• Create SSL certificates
• Copy to each server
• Start each node
Prepared Statements
• Built for speed and efficiency
How they work: Prepare
SELECT * FROM user WHERE id = ?
(Diagram: the client sends Prepare to a node; the query is parsed, hashed, and cached as a prepared statement)
How they work: Bind
id = 1 + PreparedStatement Hash
(Diagram: the client sends Bind & Execute with the statement hash and the variable; the node combines the pre-parsed query with the value, executes it, and returns the result)
How to Prepare(Statements)
PreparedStatement userSelect = session.prepare("SELECT * FROM user WHERE id = ?");
BoundStatement userSelectStatement = new BoundStatement(userSelect);
session.execute(userSelectStatement.bind(1));
prepared_stmt = session.prepare("SELECT * FROM user WHERE id = ?")
bound_stmt = prepared_stmt.bind([1])
session.execute(bound_stmt)
Java
Python
Don’t do this
for (int i = 1; i < 100; i++) {
PreparedStatement userSelect = session.prepare("SELECT * FROM user WHERE id = ?");
BoundStatement userSelectStatement = new BoundStatement(userSelect);
session.execute(userSelectStatement.bind(1));
}
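The fix is to prepare once, outside the loop, and only bind inside it. A runnable sketch of the pattern in the deck's Python style (FakeSession is a hypothetical stand-in so no cluster is needed; a real driver's prepare() round-trips to the cluster, which is exactly what you don't want per iteration):

```python
class FakeSession:
    """Hypothetical stand-in for a driver session, so the pattern runs without a cluster."""
    def __init__(self):
        self.prepare_calls = 0

    def prepare(self, cql):
        self.prepare_calls += 1  # a real driver does a network round-trip here
        return cql

    def execute(self, stmt, params):
        return (stmt, params)

session = FakeSession()
user_select = session.prepare("SELECT * FROM user WHERE id = ?")  # once, outside the loop
for i in range(1, 100):
    session.execute(user_select, [i])  # only bind and execute per iteration
print(session.prepare_calls)  # → 1, not 99
```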
Execute vs Execute Async
• Very subtle difference
• Blocking vs non-blocking call
VS
Async
• Request pipelining
• One connection for requests
• Responses return whenever
Async
for (…) {
future = executeAsync(statement)
}
(Diagram: the client fires requests at the cluster without waiting for responses)
Do something
for (…) {
result = future.get
}
Block
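The two loops above have the same shape as Java futures: fire everything, then block once. A minimal stdlib sketch of that pattern (fetch is a hypothetical stand-in for the driver's executeAsync; a real driver pipelines these requests over one connection rather than using threads):

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(statement):
    # stand-in for a network call to the cluster
    return f"result for {statement}"

with ThreadPoolExecutor(max_workers=4) as pool:
    # fire all requests without waiting (the first loop)
    futures = [pool.submit(fetch, f"stmt-{i}") for i in range(5)]
    # ...do something else...
    results = [f.result() for f in futures]  # block only here (the second loop)

print(results[0])  # → result for stmt-0
```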
Batch vs Execute Async
VS
(Potentially)
Load Balancing Policies
cluster = Cluster
.builder()
.addContactPoint("192.168.0.30")
.withQueryOptions(new QueryOptions().setConsistencyLevel(ConsistencyLevel.ONE))
.withRetryPolicy(DefaultRetryPolicy.INSTANCE)
.withLoadBalancingPolicy(new TokenAwarePolicy(new DCAwareRoundRobinPolicy()))
.build();
session = cluster.connect("demo");
Data Locality
DC1
DC1: RF=3
Node Primary Replica Replica
10.0.0.1 00-25 76-100 51-75
10.0.0.2 26-50 00-25 76-100
10.0.0.3 51-75 26-50 00-25
10.0.0.4 76-100 51-75 26-50
(DC1 ring diagram as above)
Client: read partition 15 (served by DC1 replicas)
DC2
(DC2 ring diagram: same token layout as DC1)
Node Primary Replica Replica
10.1.0.1 00-25 76-100 51-75
10.1.0.2 26-50 00-25 76-100
10.1.0.3 51-75 26-50 00-25
10.1.0.4 76-100 51-75 26-50
DC2: RF=3
Client: read partition 15 (served locally by DC2 replicas)
Batch (Logged)
• All statements collected on client
• Sent in one shot
• All done on 1 node
Batch is accepted
All actions are logged on two replicas
Statements executed in sequence
Results are collected and returned
Batches: The good
• Great for denormalized inserts/updates
// Looking from the video side to many users
CREATE TABLE comments_by_video (
videoid uuid,
commentid timeuuid,
userid uuid,
comment text,
PRIMARY KEY (videoid, commentid)
) WITH CLUSTERING ORDER BY (commentid DESC);
// looking from the user side to many videos
CREATE TABLE comments_by_user (
userid uuid,
commentid timeuuid,
videoid uuid,
comment text,
PRIMARY KEY (userid, commentid)
) WITH CLUSTERING ORDER BY (commentid DESC);
Batches: The good
• Both inserts are run
• On failure, the batch log will replay
BEGIN BATCH
INSERT INTO comments_by_video (videoid, userid, commentid, comment)
VALUES (99051fe9-6a9c-46c2-b949-38ef78858dd0, d0f60aa8-54a9-4840-b70c-fe562b68842b, now(), 'Worst. Video. Ever.');
INSERT INTO comments_by_user (userid, videoid, commentid, comment)
VALUES (d0f60aa8-54a9-4840-b70c-fe562b68842b, 99051fe9-6a9c-46c2-b949-38ef78858dd0, now(), 'Worst. Video. Ever.');
APPLY BATCH;
Batches: The bad
“I was doing a load test and nodes started blinking offline”
“Were you using a batch by any chance?”
“Why yes I was! How did you know?”
“How big was each batch?”
“1000 inserts each”
Batches: The bad
BEGIN BATCH
1000 inserts
APPLY BATCH;
(Diagram: the client sends the entire 1000-insert batch to a single coordinator node)
Batches: The rules
• Keep them small and for atomicity
CASSANDRA-6487 - Warn on large batches (5 KB default)
CASSANDRA-8011 - Fail on large batches (50 KB default)
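Both thresholds are tunable in cassandra.yaml (sizes in kilobytes):

```yaml
# cassandra.yaml
batch_size_warn_threshold_in_kb: 5    # log a warning for batches above this size
batch_size_fail_threshold_in_kb: 50   # reject batches above this size
```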
The alternative
Instead of:
BEGIN BATCH
1000 inserts
APPLY BATCH;
Do this:
while() {
future = session.executeAsync(statement)
}
Old Row cache: The problem
• Reads an entire storage row of data
ID = 1 (Partition Key / Storage Row Key)
(Diagram: the partition holds name/temp cells at 12:00, 12:01, and 12:02; the query needs one CQL row, but the old cache caches the entire storage row)
New Row Cache: The solution
• Stores just a few CQL rows
ID = 1 (Partition Key / Storage Row Key)
(Same partition: the query needs one CQL row, and the new cache stores only the requested CQL rows, up to rows_per_partition)
Using row cache
CREATE TABLE user_search_history_with_cache (
id int,
search_time timestamp,
search_text text,
search_results int,
PRIMARY KEY (id, search_time)
) WITH CLUSTERING ORDER BY (search_time DESC)
AND caching = { 'keys' : 'ALL', 'rows_per_partition' : '20' };
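The table-level caching option only takes effect when the row cache is enabled globally; it is off by default and sized in cassandra.yaml (value illustrative):

```yaml
# cassandra.yaml -- the row cache is disabled when this is 0 (the default)
row_cache_size_in_mb: 200
```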
(Chart: 95th-percentile latency in ms vs. requests, showing the performance increase)
Go make something awesome
Thank you!
Bring the questions
Follow me on twitter
@PatrickMcFadin
