Cassandra and Linux
An Introduction
Nick Bailey
@nickmbailey
nick@datastax.com
Saturday, June 1, 13
©2012 DataStax
Background
Analytics
+
Real Time
Big Data
Dynamo
+
BigTable
Who is using it?
Why do people like Cassandra?
Availability
http://techblog.netflix.com/2012/07/lessons-netflix-learned-from-aws-storm.html
Scalability
http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
Performance
http://vldb.org/pvldb/vol5/p1724_tilmannrabl_vldb2012.pdf
Multi Datacenter Support
Hadoop Support
• Data Locality
• Workload Partitioning
Architecture - Cluster
Architecture - Node
Writes
Reads
Compaction
• Periodically merge sstables
• Multiple strategies
• SizeTieredCompactionStrategy
• LeveledCompactionStrategy
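
The idea of periodically merging sstables can be sketched with a toy model. This is not Cassandra's implementation: the dict-based "sstable" layout and the `merge_sstables` name are assumptions made for illustration. It only shows the core rule that, when the same key appears in several sstables, the write with the newest timestamp survives.

```python
from typing import Dict, List, Tuple

# Toy model (assumption): an "sstable" maps key -> (write timestamp, value).
SSTable = Dict[str, Tuple[int, str]]


def merge_sstables(tables: List[SSTable]) -> SSTable:
    """Merge several sstables into one, keeping the newest write per key."""
    merged: SSTable = {}
    for table in tables:
        for key, (ts, value) in table.items():
            # Last write wins: keep the cell with the highest timestamp.
            if key not in merged or ts > merged[key][0]:
                merged[key] = (ts, value)
    # sstables are stored in sorted order; mimic that by sorting the keys.
    return dict(sorted(merged.items()))


older = {"a": (1, "x"), "b": (2, "y")}
newer = {"b": (5, "z"), "c": (4, "w")}
compacted = merge_sstables([older, newer])
print(compacted)
```

The compaction strategies differ in which sstables they pick to merge and when; the merge rule itself is the part sketched here.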
Hardware
Remember:
Cassandra scales horizontally
Memory
• More is better
• Sweet spot: 16-64GB
• Don’t give it all to the JVM
• Generally no more than 8GB
• Rest for page cache
• Can run with less for quick testing
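
A rough way to reason about the heap versus page cache split is sketched below. The 8GB cap comes from the slide; the one-quarter `heap_fraction` and the `plan_memory` helper are assumptions for illustration, not a recommendation from the talk. The "page cache" figure is simply what remains after the heap, and in practice the OS and other processes take a share of it.

```python
from typing import Tuple


def plan_memory(total_gb: float,
                heap_cap_gb: float = 8.0,
                heap_fraction: float = 0.25) -> Tuple[float, float]:
    """Split RAM between the JVM heap and everything else.

    heap_cap_gb reflects the "no more than 8GB" guidance on the slide;
    heap_fraction is an illustrative assumption.
    """
    heap_gb = min(total_gb * heap_fraction, heap_cap_gb)
    remaining_gb = total_gb - heap_gb
    return heap_gb, remaining_gb


for ram in (16, 32, 64):
    heap, rest = plan_memory(ram)
    print(f"{ram}GB RAM: {heap:g}GB heap, {rest:g}GB left for page cache and OS")
```

Past 32GB of RAM in this sketch, every additional gigabyte goes to the page cache rather than the heap, which is the point of the "don't give it all to the JVM" advice.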
CPU
• Cassandra is almost always IO bound
• Sweet spot: 8 cores
• Additional CPU required for:
• compression
• leveled compaction
Disks
• SSDs are awesome, not required
• Without SSDs:
• At least 2 disks (commitlog, data) (more on that later)
• Faster is better
• Before Cassandra 1.2: keep data to ~500GB per node
A Note on SSDs
• Write Amplification
• http://en.wikipedia.org/wiki/Write_amplification
• Consumer grade SSDs are fine
• See talk by Rick Branson for more
• http://www.youtube.com/watch?v=zQdDi9pdf3I
• http://www.slideshare.net/rbranson/cassandra-and-solid-state-drives
Homogeneous Nodes
• Usually, keep nodes the same
• Vnodes
• Make heterogeneous clusters easier
• Added in version 1.2
Configuration
Disks
Commit Log
• Keep separate from data drives
• Caveats
• SSDs
• Virtualized Environments
Data Drives
• Before Cassandra 1.2
• RAID0/RAID10
• Cassandra 1.2
• JBOD
• disk_failure_policy options: stop / best_effort
• XFS
Note on SAN/NAS
• Don’t use them
• Cassandra is already distributed
• SPOF
• Cassandra is already IO bound
Firewall
• Ports:
• 7000 - cluster communication
• 9160 - client communication (Thrift)
• JMX:
• Unfortunately, the JMX protocol sucks
• Port 7199 plus a dynamically chosen port in the 1024+ range for remote access
• Solution: only access JMX locally
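
A quick way to confirm that a firewall lets the listed ports through is a plain TCP connection test. The sketch below uses only the ports named on the slide; the helper names and the `127.0.0.1` address are placeholders, and a successful connection only shows that the port is reachable, not that Cassandra is healthy.

```python
import socket

# Ports from the slide: cluster, client, and JMX.
CASSANDRA_PORTS = {
    7000: "cluster communication",
    9160: "client communication",
    7199: "JMX",
}


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_node(host: str) -> dict:
    """Map each known port to whether it accepted a TCP connection."""
    return {port: port_is_open(host, port) for port in CASSANDRA_PORTS}


if __name__ == "__main__":
    # "127.0.0.1" is a placeholder; substitute a node address.
    for port, is_open in check_node("127.0.0.1").items():
        state = "open" if is_open else "closed or filtered"
        print(f"{port} ({CASSANDRA_PORTS[port]}): {state}")
```

Run from a client machine, 9160 should be reachable; run from another node, 7000 should be as well. If JMX is kept local only, 7199 is expected to show as closed from remote hosts.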
Virtualized Environments (EC2)
EC2
• Large/XLarge instances
• Don’t use EBS
• phi_convict_threshold
• Don’t fix nodes, replace them
• DataStax provides an AMI
Miscellaneous
Swap
• Disable it
• sudo swapoff --all
• JVM swaps to disk, Cassandra explodes
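
To verify that swap is really off on a Linux node, one option is to read `SwapTotal` from `/proc/meminfo`, which reports 0 kB when no swap is active. The helper names below are made up for this sketch.

```python
def swap_total_kb(meminfo_text: str) -> int:
    """Parse the SwapTotal value (in kB) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            # Line looks like: "SwapTotal:       2097148 kB"
            return int(line.split()[1])
    raise ValueError("SwapTotal not found")


def swap_enabled(meminfo_path: str = "/proc/meminfo") -> bool:
    """Return True if the system currently has any swap configured."""
    with open(meminfo_path) as f:
        return swap_total_kb(f.read()) > 0


sample = "MemTotal:       16384000 kB\nSwapTotal:       2097148 kB\n"
print(swap_total_kb(sample))
```

Note that `swapoff` only affects the running system; swap entries left in `/etc/fstab` are typically re-enabled at the next boot, so a check like this is worth repeating after a restart.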
Limits
• /etc/security/limits.conf
• nofile
• memlock
• as
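
The three limits above correspond to the `RLIMIT_NOFILE`, `RLIMIT_MEMLOCK`, and `RLIMIT_AS` resource limits, which can be inspected from Python's standard `resource` module. This sketch prints the limits of the process that runs it, so it needs to run as the same user (and through the same login path) as Cassandra to be meaningful. The helper names are illustrative.

```python
import resource

# The limits named on the slide, mapped to their rlimit constants.
LIMITS = {
    "nofile": resource.RLIMIT_NOFILE,    # max open file descriptors
    "memlock": resource.RLIMIT_MEMLOCK,  # max locked-in-memory bytes
    "as": resource.RLIMIT_AS,            # max address space in bytes
}


def fmt(value: int) -> str:
    """Render an rlimit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits() -> dict:
    """Return {name: (soft, hard)} for the current process."""
    return {name: resource.getrlimit(const) for name, const in LIMITS.items()}


for name, (soft, hard) in current_limits().items():
    print(f"{name}: soft={fmt(soft)} hard={fmt(hard)}")
```

`getrlimit` reports memory limits in bytes, while `limits.conf` expresses `memlock` and `as` in KB, so the numbers will not match one-to-one.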
NTP
• Install it on
• Cassandra Servers
• Clients
Monitor your cluster!
• Cassandra exposes tons of metrics
• Via JMX
• Recently, more options available
• DataStax OpsCenter
• http://www.datastax.com/what-we-offer/products-services/datastax-opscenter
• Or integrate with your own system
Don’t use Windows
• I’m not presenting at Texas Windows Fest
• Technically supported
• Not widely deployed
• Reduced performance
Resources
• http://www.datastax.com/docs
• #cassandra on freenode
• http://www.planetcassandra.org
• Mailing Lists
• http://cassandra.apache.org to subscribe
Or...
Come to the Summit!
Ask me for a discount code (nick@datastax.com)
June 11-12, 2013
San Francisco, CA
http://www.datastax.com/company/news-and-events/events/cassandrasummit2013
Want a job?
http://www.datastax.com/company/careers
Questions?