SOUTH BAY CASSANDRA USERS NOVEMBER 2014 
COMPACTION, COMPACTION, 
EVERYWHERE. 
Aaron Morton 
@aaronmorton 
Co-Founder & Principal Consultant 
Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License
About The Last Pickle. 
Work with clients to deliver and improve Apache Cassandra 
based solutions. 
Apache Cassandra Committer, DataStax MVP, Apache 
Usergrid Committer. 
Based in New Zealand & USA.
Compaction? 
STCS 
LCS 
DTCS
Compaction? 
Because reasons.
No compaction? 
Row fragmentation would 
result in dramatically increased 
read latency.
No compaction? 
Increased file count would 
increase memory usage.
No compaction? 
Overwrites and deletions 
would result in wasted disk 
space.
Compaction? 
Yes.
Compaction? 
Log Structured Merge Tree.
Compaction? 
Creating new files when 
flushing to disk improves 
performance and reduces 
complexity.
Compaction? 
SSTable 1 
foo: 
dishwasher (ts 10): 
tomato 
purple (ts 10): 
cromulent 
SSTable 2 
foo: 
frink (ts 20): 
flayven 
monkey (ts 10): 
embiggins 
SSTable 3 SSTable 4 
foo: 
dishwasher (ts 15): 
tomacco 
SSTable 5
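The fragments of row foo above can be merged the way a read or a compaction reconciles them: for each column, the cell with the highest timestamp wins. This is a minimal sketch, not Cassandra's implementation. The names `Fragment` and `merge_fragments` are hypothetical, tie-breaking on equal timestamps is ignored, and the slide layout does not make clear which SSTable holds the ts 15 cell, so it is called `later_sstable` here.

```python
from typing import Dict, Iterable, Tuple

# One SSTable's fragment of a row: column name -> (write timestamp, value).
Fragment = Dict[str, Tuple[int, str]]


def merge_fragments(fragments: Iterable[Fragment]) -> Fragment:
    """Merge row fragments, keeping the cell with the highest timestamp per column."""
    merged: Fragment = {}
    for fragment in fragments:
        for column, (ts, value) in fragment.items():
            # Keep the incoming cell only if it is newer than what we already hold.
            if column not in merged or ts > merged[column][0]:
                merged[column] = (ts, value)
    return merged


# Fragments of row "foo" as shown on the slide.
sstable_1 = {"dishwasher": (10, "tomato"), "purple": (10, "cromulent")}
sstable_2 = {"frink": (20, "flayven"), "monkey": (10, "embiggins")}
later_sstable = {"dishwasher": (15, "tomacco")}

row_foo = merge_fragments([sstable_1, sstable_2, later_sstable])
print(row_foo["dishwasher"])  # the ts 15 overwrite replaces the ts 10 value
```

Without compaction, every read of foo has to repeat this merge across all the files that hold a fragment. After compaction the merged row lives in one SSTable.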
Demo.
nodetool cfhistograms foo bar 
SSTables per Read 
1 sstables: 149 
2 sstables: 62 
3 sstables: 65 
4 sstables: 50 
5 sstables: 45 
6 sstables: 44 
7 sstables: 76 
8 sstables: 72 
10 sstables: 305 
12 sstables: 390
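A rough way to read the histogram above is to reduce it to a mean and a tail share. The sketch below copies the counts from the output by hand and treats each bucket label as an exact value, which is an approximation because the histogram reports bucket boundaries. The variable names are illustrative only.

```python
# (SSTables touched per read, number of reads) pairs copied from the output above.
sstables_per_read = [
    (1, 149), (2, 62), (3, 65), (4, 50), (5, 45),
    (6, 44), (7, 76), (8, 72), (10, 305), (12, 390),
]

total_reads = sum(count for _, count in sstables_per_read)

# Weighted mean of SSTables touched per read.
mean_sstables = sum(n * count for n, count in sstables_per_read) / total_reads

# Share of reads that touched 10 or more SSTables.
share_10_plus = sum(count for n, count in sstables_per_read if n >= 10) / total_reads

print(f"reads: {total_reads}")
print(f"mean SSTables per read: {mean_sstables:.1f}")
print(f"reads touching 10+ SSTables: {share_10_plus:.0%}")
```

For these numbers the mean is close to 8 SSTables per read, and a little over half of the reads touch 10 or more, which is the row fragmentation problem from the earlier slides.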
Compaction? 
STCS 
LCS 
DTCS
SizeTieredCompactionStrategy 
The first compaction strategy. 
Group files of a similar size for 
compaction.
SizeTieredCompactionStrategy 
Works well when data is 
written to initially and then 
only read from.
STCS - After flush. 
Tier 0 ( < 50 MB) Tier 1 ~125MB
STCS - Compaction Starts 
Tier 0 ( < 50 MB) Tier 1 ~125MB
STCS - New SSTable 
Tier 0 ( < 50 MB) Tier 1 ~125MB
STCS - Purge old SSTables 
Tier 0 ( < 50 MB) Tier 1 ~125MB
STCS - Compaction Starts (again) 
Tier 0 ( < 50 MB) Tier 1 ~125MB
STCS - Final State 
Tier 0 ( < 50 MB) Tier 1 ~200MB Tier 2 ~800MB
STCS - min_sstable_size 
SSTables smaller than this all 
go in the “small” bucket. 
Default 50 MB
STCS - bucket_low 
Lower bound of the bucket 
size compared to the average 
size in the bucket. 
Default 0.5
STCS - bucket_high 
Upper bound of the bucket 
size compared to the average 
size in the bucket. 
Default 1.5
STCS - cold_reads_to_omit 
Maximum fraction of total reads 
that the SSTables ignored by STCS 
may be responsible for. 
Default 0
min_compaction_threshold 
Compact buckets with at least 
this many SSTables. 
Default 4
max_compaction_threshold 
Compact no more than this 
many SSTables in a bucket. 
Default 32
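The STCS options above can be put together in a simplified bucketing sketch. It assumes sizes are in bytes and that the min_sstable_size default of 50 is in MB. The function names `stcs_buckets` and `compaction_candidates` are hypothetical, and the real strategy does more (for example the read hotness check behind cold_reads_to_omit and choosing which bucket to compact first).

```python
MB = 1024 * 1024


def stcs_buckets(sizes, bucket_low=0.5, bucket_high=1.5, min_sstable_size=50 * MB):
    """Group SSTable sizes into buckets of similar size."""
    buckets = []  # each entry is [average_size, [member sizes]]
    for size in sorted(sizes):
        for bucket in buckets:
            avg = bucket[0]
            # Size is within the low/high bounds around the bucket average.
            similar = avg * bucket_low < size < avg * bucket_high
            # Small SSTables all share one bucket regardless of ratio.
            both_small = size < min_sstable_size and avg < min_sstable_size
            if similar or both_small:
                members = bucket[1]
                members.append(size)
                bucket[0] = sum(members) / len(members)
                break
        else:
            # No existing bucket fits, so start a new one.
            buckets.append([float(size), [size]])
    return [members for _, members in buckets]


def compaction_candidates(sizes, min_threshold=4, max_threshold=32, **options):
    """Keep buckets with at least min_threshold SSTables, capped at max_threshold."""
    return [
        bucket[:max_threshold]
        for bucket in stcs_buckets(sizes, **options)
        if len(bucket) >= min_threshold
    ]


sizes = [s * MB for s in (10, 20, 30, 40, 125, 130, 800)]
for bucket in compaction_candidates(sizes):
    print([size // MB for size in bucket])
```

With these sizes the four small files form one bucket and are eligible for compaction. The two ~125 MB files share a bucket but fall short of min_compaction_threshold, and the 800 MB file sits alone.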
Compaction? 
STCS 
LCS 
DTCS
LeveledCompactionStrategy 
Based on LevelDB from the 
Chromium team. 
http://leveldb.org/
LeveledCompactionStrategy 
Works well with overwrites 
and tombstones. 
Provides low read latency.
LeveledCompactionStrategy 
“Uses twice the disk IO”
DataStax Blogs 
“Leveled Compaction in Apache 
Cassandra” 
“When to Use Leveled 
Compaction”
LCS - “It’s going to be all levels Jerry” 
Level    Number of Files 
0        Unlimited* 
1        10 
2        100 
3        1000
LCS in nodetool cfstats 
Column Family: HappyPandaCF 
SSTable count: 21 
SSTables in each level: [1, 7, 13, 0, 0, 0, 0, 0, 0] 
Column Family: SadPandaCF 
SSTable count: 710 
SSTables in each level: [1, 10, 117/100, 582, 0, 0, 
0, 0, 0]
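The "SSTables in each level" line can be checked mechanically. The sketch below parses that line and reports levels shown in the count/limit form (such as 117/100 above), which it reads as a level holding more SSTables than its target. The helper names `parse_levels` and `levels_over_limit` are hypothetical, and the parsing assumes the exact layout shown on the slide.

```python
from typing import List, Optional, Tuple


def parse_levels(line: str) -> List[Tuple[int, int, Optional[int]]]:
    """Return (level, sstable_count, limit) per level; limit is None when not shown."""
    inside = line[line.index("[") + 1 : line.rindex("]")]
    levels = []
    for level, entry in enumerate(inside.split(",")):
        entry = entry.strip()
        if "/" in entry:
            # "117/100" means 117 SSTables against a target of 100.
            count, limit = entry.split("/")
            levels.append((level, int(count), int(limit)))
        else:
            levels.append((level, int(entry), None))
    return levels


def levels_over_limit(line: str) -> List[Tuple[int, int, int]]:
    """Levels whose SSTable count exceeds the limit printed next to it."""
    return [
        (level, count, limit)
        for level, count, limit in parse_levels(line)
        if limit is not None and count > limit
    ]


happy = "SSTables in each level: [1, 7, 13, 0, 0, 0, 0, 0, 0]"
sad = "SSTables in each level: [1, 10, 117/100, 582, 0, 0, 0, 0, 0]"

print(levels_over_limit(happy))  # no level is over its target
print(levels_over_limit(sad))    # level 2 is behind on compaction
```

A level that stays over its limit suggests compaction is not keeping up with the write load for that table.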
LCS - Starting out 
level 0
LCS - New File in Level 1 
level 0 level 1
LCS - Later, Compact L0 With Overlapping L1 
level 0 level 1
LCS - Another File in L1 
level 0 level 1
LCS - Level 1 Full, compact overlapping 
level 0 level 1
LCS - New Files in Level 2 
level 0 level 1 level 2
LCS - sstable_size_in_mb 
Maximum* size of each 
SSTable at all levels. 
Default 160
Compaction? 
STCS 
LCS 
DTCS
DateTieredCompactionStrategy 
CASSANDRA-6602 
In 2.0.11 and 2.1.1 
“Experimental”
DTCS - Compact Newest Time Bucket 
4 hours 20 hours 80 hours > 365 days
DTCS - New File in First Bucket 
4 hours 20 hours 80 hours > 365 days
DTCS - Promoted to Later Bucket 
4 hours 20 hours 80 hours > 365 days
DTCS - base_time_seconds 
Size of the first time window 
(the initial Target size). 
Each later window is 
min_compaction_threshold times 
larger than the one before.
DTCS - timestamp_resolution 
What TimeUnit you are 
using for your WriteTime.
DTCS - max_sstable_age_days 
Do not compact SSTables 
where the youngest 
WRITETIME is older than this.
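The interaction between timestamp_resolution and max_sstable_age_days can be sketched as a simple age check. This is an illustration under stated assumptions rather than the DTCS code: the function name `too_old_to_compact` is hypothetical, the resolution names follow java.util.concurrent.TimeUnit, and MICROSECONDS is assumed as the default.

```python
from datetime import timedelta

# Ticks per second for each timestamp_resolution value.
TICKS_PER_SECOND = {"SECONDS": 1, "MILLISECONDS": 1_000, "MICROSECONDS": 1_000_000}


def too_old_to_compact(newest_write_time, now, max_sstable_age_days,
                       timestamp_resolution="MICROSECONDS"):
    """True when the newest write in an SSTable is older than max_sstable_age_days.

    newest_write_time and now are integers in the unit named by timestamp_resolution.
    """
    ticks = TICKS_PER_SECOND[timestamp_resolution]
    max_age_ticks = int(timedelta(days=max_sstable_age_days).total_seconds()) * ticks
    return now - newest_write_time > max_age_ticks


DAY_US = 86_400 * 1_000_000  # one day in microseconds
DAY_MS = 86_400 * 1_000      # one day in milliseconds

# Newest write is 390 days old: left alone by the strategy.
print(too_old_to_compact(10 * DAY_US, 400 * DAY_US, 365))

# Newest write is 300 days old: still a compaction candidate.
print(too_old_to_compact(100 * DAY_US, 400 * DAY_US, 365))

# Millisecond write times with the wrong resolution: the SSTable never looks old.
print(too_old_to_compact(0, 400 * DAY_MS, 365))
```

The last call shows why timestamp_resolution has to match the unit clients use for write times: with a mismatch the computed age is off by a factor of 1000.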
Thanks.
Aaron Morton 
@aaronmorton 
Co-Founder & Principal Consultant 
www.thelastpickle.com 
Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License
