© 2017 by Intellectual Reserve, Inc. All rights reserved. 1
Performance at Scale
Cassandra for FamilySearch Family Tree
John Sumsion
2
Cassandra Database
Metaphor:
• Set of Pipes
• Client => Disk
• Low Friction
• High Throughput
• Redundant
3
Cassandra Database
Presentation Scope:
• Read-Heavy Cluster
• Tuning Strategy
• Smooth Flow
• Peak Performance
4
Family Tree
• Free access at https://familysearch.org
• Supported by growing record collection
• World-wide user base
• Backed by Apache Cassandra (DSE)
5
Family Tree
• 63 TB database
• One RF=3 datacenter
• Another offsite RF=2 datacenter
• About 5700M DB reads / peak day
• About 70M DB writes / peak day
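For a sense of scale, the peak-day counts above can be turned into average per-second rates. This is only a back-of-the-envelope sketch: it assumes the load is spread evenly over 24 hours, which the slides do not claim, so true peak rates would be higher. It works out to roughly 66,000 reads/s and 810 writes/s, a read:write ratio of about 81:1.

```python
# Peak-day figures from the slide. Spreading them evenly over 24 hours
# is an assumption made only for this estimate.
READS_PER_PEAK_DAY = 5_700_000_000
WRITES_PER_PEAK_DAY = 70_000_000
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate_per_second(events_per_day: int) -> float:
    """Average events per second if the daily total were evenly spread."""
    return events_per_day / SECONDS_PER_DAY


avg_reads = average_rate_per_second(READS_PER_PEAK_DAY)
avg_writes = average_rate_per_second(WRITES_PER_PEAK_DAY)
read_write_ratio = READS_PER_PEAK_DAY / WRITES_PER_PEAK_DAY

print(f"avg reads/s:  {avg_reads:,.0f}")
print(f"avg writes/s: {avg_writes:,.0f}")
print(f"read:write:   {read_write_ratio:.0f}:1")
```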
6
Family Tree
• Multiple views of person
• Full change history
• Flexible schema
• 4th major iteration over 10 years
• Constant schema change
11
Cassandra Database
Metaphor:
• Set of Pipes
• Client => Disk (or RAM)
• Low Friction
• High Throughput
• Redundant
12
Cassandra Database
Topics:
• Read-Heavy Cluster
• Tuning Strategy
• Smooth Flow
• Peak Performance
13
Cassandra Database
Ideal Read Circumstances:
• Lots of RAM
• Lots of CPU capacity
• Lots of Disk bandwidth
• Lots of Network bandwidth
• Enough nodes to reduce impact of GC outliers
14
Cassandra Database
Ideal Read Circumstances:
• Small records
• All records are same size
• Balanced data distribution
• Balanced access patterns
• No hotspots
15
Cassandra Database
Ideal Read Circumstances:
1. No Friction
2. Full Throughput
vs
16
Cassandra Database
Realistic Read Circumstances:
• Ideal is RARELY going to happen!
• Cassandra can stay up under abuse
• But throughput suffers
17
Cassandra Database
Realistic Read Circumstances:
1. Minimize Friction
2. Maximize Throughput
vs
18
Friction in Cassandra
Sources of Friction:
• CPU spikes to 100%
• Disk Saturation spikes to 100%
• GC Pauses spike above 200ms
• Total GC time exceeds 1-2% of a 5-minute window
• Network Saturation spikes to 100% (rare)
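The thresholds above can be written down as a simple check. The sketch below is illustrative only: the `NodeSample` fields and the `friction_sources` helper are hypothetical names, not part of Cassandra or of any monitoring tool, and the GC-time limit defaults to the lower end (1%) of the range on the slide.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeSample:
    """Metrics for one node over one window (hypothetical field names)."""
    cpu_pct: float           # peak CPU utilization in the window
    disk_util_pct: float     # peak disk saturation in the window
    net_util_pct: float      # peak network saturation in the window
    max_gc_pause_ms: float   # longest single GC pause in the window
    total_gc_ms: float       # summed GC time in the window
    window_s: float = 300.0  # 5-minute window, as on the slide


def friction_sources(s: NodeSample, gc_time_limit_pct: float = 1.0) -> List[str]:
    """Return the names of the slide's friction thresholds this sample crosses."""
    found = []
    if s.cpu_pct >= 100.0:
        found.append("cpu")
    if s.disk_util_pct >= 100.0:
        found.append("disk")
    if s.max_gc_pause_ms > 200.0:
        found.append("gc_pause")
    # Share of the window spent in GC, as a percentage.
    gc_time_pct = 100.0 * (s.total_gc_ms / 1000.0) / s.window_s
    if gc_time_pct > gc_time_limit_pct:
        found.append("gc_time")
    if s.net_util_pct >= 100.0:
        found.append("network")
    return found
```

With a 5-minute window, the 1-2% GC budget corresponds to 3-6 seconds of total GC time.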
19
Friction in Cassandra
Needed Visibility
• Gathered metrics (JMX, GC, dstat)
• Composed CPU/Disk/GC in a dashboard
• Example Dashboard
20-23
Friction in Cassandra
[Image-only slides following "Example Dashboard"; the images are not part of this transcript]
24
Turbulence in Cassandra
• Friction somewhere causes requests to queue
• Queued requests cause upstream delays
• Affected node tries to shed load to avoid dying
• Clients / Other nodes become affected
25
Turbulence in Cassandra
Situation: Compactions not throttled enough
Symptoms:
• Periods of heavy CPU utilization (plateau)
• Periods of full disk saturation (plateau)
• Periods of more-frequent GC
• Periods of higher request latency (p99+ plateau)
26
Turbulence in Cassandra
Situation: Compactions not throttled enough
Solutions:
• Throttle compaction dynamically using nodetool setcompactionthroughput
• Keep compaction backlog under 10-30min
• Bake the setting into cassandra.yaml
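One way to reason about the backlog target is to convert an estimate of pending compaction bytes into minutes at a given throughput. The sketch below assumes you already have such an estimate; `pending_bytes` and the helper names are hypothetical, and only the `nodetool setcompactionthroughput <MB/s>` command itself comes from the slide.

```python
import math
from typing import List

MIB = 1024 * 1024


def backlog_minutes(pending_bytes: int, throughput_mb_per_s: float) -> float:
    """Minutes needed to work off the backlog at the given throttle."""
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    return pending_bytes / (throughput_mb_per_s * MIB) / 60


def throughput_for_target(pending_bytes: int, target_minutes: float) -> float:
    """Throttle (MB/s) that would clear the backlog in target_minutes."""
    if target_minutes <= 0:
        raise ValueError("target must be positive")
    return pending_bytes / MIB / (target_minutes * 60)


def setcompactionthroughput_command(mb_per_s: float) -> List[str]:
    """Argument list for the nodetool call; the value is rounded up to whole MB/s."""
    return ["nodetool", "setcompactionthroughput", str(math.ceil(mb_per_s))]


# Example: a hypothetical 28,800 MiB backlog.
pending = 28_800 * MIB
print(backlog_minutes(pending, 16))           # minutes at 16 MB/s
print(throughput_for_target(pending, 20))     # MB/s needed for a 20-minute backlog
print(setcompactionthroughput_command(throughput_for_target(pending, 20)))
```

Once a value holds up in production, the matching cassandra.yaml setting in Cassandra 2.x/3.x is `compaction_throughput_mb_per_sec`.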
27
Turbulence in Cassandra
Situation: Too frequent Memtable flushing
Symptoms:
• Very frequent compaction on tables with most writes
• "Forcing flush" in debug.log
• Constant compactions, Constant disk saturation
• Using OpsCenter Repair Service
• High number of cells read per query
28
Turbulence in Cassandra
Situation: Too frequent Memtable flushing
Solutions:
• Turn off OpsCenter Repair for short TTL tables
• Turn off OpsCenter Repair for other tables that don't need full consistency
• Google "dse excluding tables ignore_tables"
• Switch small tables from STCS to LCS
29
Turbulence in Cassandra
Situation: Not enough JVM Heap
Symptoms:
• Overly frequent GC, occasional OOM
• Lower query throughput
• No obvious bottleneck
• More CPU spent on GC than necessary
30
Turbulence in Cassandra
Situation: Not enough JVM Heap
Solutions:
• Increase JVM heap, but not more than 32G
• Ratchet up until occasional OOM stops
• Don't go too high, 32G max
• Stop if max GC pause increases
31
Turbulence in Cassandra
Situation: Too large JVM Heap
Symptoms:
• Much longer GC cycles once in a while
• Old Gen able to build up too much cruft
• Large variation in response time (p99+ spikes)
• Other nodes experience request queueing
32
Turbulence in Cassandra
Situation: Too large JVM Heap
Solutions:
• Reduce heap size
• But don't cause OOM
• Ratchet down while max GC pause times drop
• Remember extra RAM means extra buffer cache
33
Turbulence in Cassandra
Situation: GC not tuned for low-latency
Symptoms:
• Longer pause times
• Large variation in response time (p99+ spikes)
• GC gets behind and has to do long Full GCs
• Other nodes experience request queueing
34
Turbulence in Cassandra
Situation: GC not tuned for low-latency
Solutions:
• CMS wizard? Do that
• Easier? Use G1 with 40-50% new space
• Turn on GC logging, Plot GC over time
35
Turbulence in Cassandra
Situation: GC not tuned for low-latency
Solutions:
• -XX:G1RSetUpdatingPauseTimePercent=5
• -XX:InitiatingHeapOccupancyPercent=60
• -XX:+ParallelRefProcEnabled
• -XX:G1ReservePercent=20
• -XX:ParallelGCThreads=13 (on r4.4xlarge 16 CPU box)
• -XX:ConcGCThreads=13
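The flags above can be assembled for a different machine size with a small helper. This is a sketch under assumptions: the value 13 on a 16-CPU box matches HotSpot's default heuristic for `ParallelGCThreads` (all CPUs up to 8, then 5/8 of the remainder), but whether the author derived it that way is a guess. `-XX:+UseG1GC` is added here because the previous slide recommends G1; it is not listed on this slide. The function names are hypothetical.

```python
from typing import List


def default_parallel_gc_threads(ncpus: int) -> int:
    """HotSpot's default heuristic: all CPUs up to 8, then 5/8 of the rest."""
    if ncpus <= 8:
        return ncpus
    return 8 + (ncpus - 8) * 5 // 8


def g1_flags(ncpus: int) -> List[str]:
    """JVM options from the slide, with thread counts scaled to ncpus."""
    threads = default_parallel_gc_threads(ncpus)
    return [
        "-XX:+UseG1GC",  # not on this slide; implied by "Use G1"
        "-XX:G1RSetUpdatingPauseTimePercent=5",
        "-XX:InitiatingHeapOccupancyPercent=60",
        "-XX:+ParallelRefProcEnabled",
        "-XX:G1ReservePercent=20",
        f"-XX:ParallelGCThreads={threads}",
        # The slide sets concurrent threads equal to parallel threads.
        f"-XX:ConcGCThreads={threads}",
    ]


for flag in g1_flags(16):
    print(flag)
```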
36
Turbulence in Cassandra
Situation: Disk spikes even when compaction throttled
Symptoms:
• No CPU spikes or plateaus
• No Disk activity during compaction
• But short periods of 100% disk saturation right after
• Also GC spike right after compaction complete
• Large response time variation around compaction
37
Turbulence in Cassandra
Situation: Disk spikes even when compaction throttled
Solutions:
• Use sysctl.conf to spread out writes during compaction
• See Amy's tuning guide
• https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html
38
Turbulence in Cassandra
Situation: Disk readahead too large
Symptoms:
• Lower throughput than you expect
• No obvious bottleneck
• More bytes read from disk than sent over the network
39
Turbulence in Cassandra
Situation: Disk readahead too large
Solutions:
• blockdev --setra 128 <device> (for 64k chunks)
• See Amy's tuning guide
• https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html
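The `--setra` value is counted in 512-byte sectors, which is why 128 corresponds to 64k. The conversion is sketched below, assuming "chunks" refers to SSTable compression chunks; the helper names are hypothetical.

```python
SECTOR_BYTES = 512  # the readahead value passed to --setra is in 512-byte sectors


def readahead_sectors(chunk_kib: int) -> int:
    """Sectors of readahead that cover exactly one chunk of chunk_kib KiB."""
    return chunk_kib * 1024 // SECTOR_BYTES


def readahead_bytes(sectors: int) -> int:
    """Bytes read ahead for a given --setra value."""
    return sectors * SECTOR_BYTES


print(readahead_sectors(64))   # value to pass to --setra for 64 KiB chunks
print(readahead_bytes(128))    # bytes covered by --setra 128
```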
40
Turbulence in Cassandra
Situation: Timeouts set too long
Symptoms:
• Much longer GC cycles once in a while
• Large variation in response time (p99+ spikes)
• Large GC delays on good nodes when one goes bad
• One bad node cascades to more
41
Turbulence in Cassandra
Situation: Timeouts set too long
Solutions:
• Reduce read timeout until it hurts (nodetool)
• Reduce write timeout until it hurts (nodetool)
• Leave general request timeout higher to avoid cqlsh timeout
• Bake timeouts into cassandra.yaml
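"Reduce until it hurts" can be done in measured steps. The sketch below generates a step-down schedule and the matching `nodetool settimeout <type> <ms>` argument lists, for Cassandra versions that provide that subcommand. The 20% step, the 100 ms floor, and the 5000 ms starting point are illustrative assumptions, not values from the talk, and the helper names are hypothetical.

```python
from typing import List


def settimeout_command(timeout_type: str, timeout_ms: int) -> List[str]:
    """Argument list for nodetool; timeout_type is e.g. "read" or "write"."""
    return ["nodetool", "settimeout", timeout_type, str(timeout_ms)]


def step_down(current_ms: int, keep_percent: int = 80, floor_ms: int = 100) -> int:
    """Next, lower timeout: keep keep_percent of the current value, never below floor_ms."""
    return max(floor_ms, current_ms * keep_percent // 100)


def ratchet_plan(start_ms: int, steps: int) -> List[int]:
    """Successive timeouts to try, observing the cluster between each step."""
    plan = []
    value = start_ms
    for _ in range(steps):
        value = step_down(value)
        plan.append(value)
    return plan


for ms in ratchet_plan(5000, 3):
    print(" ".join(settimeout_command("read", ms)))
```

Stop at the last value that did not cause client-visible timeouts, then record it in cassandra.yaml (`read_request_timeout_in_ms`, `write_request_timeout_in_ms`).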
42
Turbulence in Cassandra
Situation: Not enough free memory
Symptoms:
• More disk activity than working set size
• High query latency even for hot records
• More bytes read from disk than active set size
43
Turbulence in Cassandra
Situation: Not enough free memory
Solutions:
• Shrink heap if possible
• Maybe shrink row/key/chunk caches
• This makes more room for OS buffer cache
• Stop unnecessary processes
44
Turbulence in Cassandra
Situation: Disproportionately large records
Symptoms:
• Queries for certain keys always take longer
• Three nodes spike IO/CPU at the same time
• Slow query logging in C* 3.x
45
Turbulence in Cassandra
Situation: Disproportionately large records
Solutions:
• Alert or monitor slow query logs to find problems (C* 3.x)
• Median:p99 of 1:100 is ok, but 1:10000 is bad
• Optimize size of the largest records
46
Turbulence in Cassandra
Situation: Disproportionate edit rate
Symptoms:
• Queries for certain keys take longer
• Large number of cells per read in tablestats
• Three nodes spike IO/CPU at the same time
• Slow query logging in C* 3.x
47
Turbulence in Cassandra
Situation: Disproportionate edit rate
Solutions:
• Postpone large edits until user is done
• Minimize number of redundant bytes rewritten
48
Turbulence in Cassandra
Situation: Too few nodes
Symptoms:
• Majority of nodes hitting same bottleneck
• Hour-long periods of poor p99+ response time
• One bad node cascades to more
49
Turbulence in Cassandra
Situation: Too few nodes
Solutions:
• Try all of the above first (the easy ones)
• Add nodes at a practical point
• Tweak & tune more, maybe you can shrink
50
Turbulence in Cassandra
Situation: ALL OF THE ABOVE
Symptoms:
• Cluster IO overwhelmed
• Bad p99+ response times
• Multiple sick nodes after large user edits
51
Turbulence in Cassandra
Before
52
Turbulence in Cassandra
After
53
Cassandra Database
Metaphor:
• Set of Pipes
• Client => Disk (or RAM)
• Low Friction
• High Throughput
• Redundant
54
Cassandra Database
Presentation Scope:
• Read-Heavy Cluster
• Tuning Strategy
• Smooth Flow
• Peak Performance
55
Cassandra Database
Ideal Read Circumstances:
1. Minimal Friction
2. Maximal Throughput
vs
56
Cassandra Database
Reprise:
• Read-Heavy Cluster
• Clean the Pipes
• Smooth the Flow
• Peak Performance
57
Wrap-up
• Q&A
• Thanks for a great conference!
