Cassandra: From tarball to production
Why talk about this?
You are about to deploy Cassandra
You are looking for “best practices”
You don’t want:
... to scour the documentation
... to do something known not to work well
... to forget an important step
What we won’t cover
● Cassandra: how does it work?
● How do I design my schema?
● What’s new in Cassandra X.Y?
So many things to do
Monitoring, Snitch, DC/Rack Settings, Time Sync, Seeds/Autoscaling, Full/Incremental Backups, AWS Instance Selection, AWS AMI (Image) Selection, Disk - SSD?, Disk Space - 2x?, Periodic Repairs, Replication Strategy, Compaction Strategy, SSL/VPC/VPN, Authorization + Authentication, OS Conf - Users, OS Conf - Limits, OS Conf - Perms, OS Conf - FSType, OS Conf - Logs, OS Conf - Path, C* Start/Stop, Use case evaluation
Chef to the rescue?
Chef community cookbook available:
https://github.com/michaelklishin/cassandra-chef-cookbook
● Installs Java
● Creates a “cassandra” user/group
● Downloads/extracts the tarball
● Fixes up ownership
● Builds the C* configuration files
● Sets the ulimits for file handles, processes, and memory locking
● Sets up an init script
● Sets up data directories
Chef Cookbook Coverage
Monitoring, Snitch, DC/Rack Settings, Time Sync, Seeds/Autoscaling, Full/Incremental Backups, Disk - SSD?, Disk - How much?, AWS Instance Type, AWS AMI (Image) Selection, Periodic Repairs, Replication Strategy, Compaction Strategy, SSL/VPC/VPN, Authorization + Authentication, OS Conf - Users, OS Conf - Limits, OS Conf - Perms, OS Conf - FSType, OS Conf - Logs, OS Conf - Path, C* Start/Stop, Use case evaluation
Monitoring
Is every node answering queries?
Are nodes talking to each other?
Are any nodes running slowly?
Push UDP! (statsd)
http://hackers.lookout.com/2015/01/cassandra-monitoring/
https://github.com/lookout/cassandra-statsd-agent
Monitoring - Synthetic
Health checks, bad and good
● ‘nodetool status’ exit code
○ Might return 0 if the node is not accepting requests
○ Slow, cross-node reads
● cqlsh -u sysmon -p password < /dev/null
● Verifies this node can read the auth table
● https://github.com/lookout/cassandra-health-check
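Wired into a monitor, the good check is just an exit-code test; a minimal sketch (the sysmon user and password are placeholders):
$ cqlsh -u sysmon -p password < /dev/null && echo OK || echo CRITICAL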
What about OpsCenter?
We chose not to use it
Want consistent interface for all monitoring
The GUI vs. command-line argument
Didn’t see good auditing capabilities
Didn’t interface well with our Chef setup
Snitch
Use the right snitch!
● AWS? Ec2MultiRegionSnitch
● Google? GoogleCloudSnitch
● GossipingPropertyFileSnitch
NOT
● SimpleSnitch (default)
Community cookbook: set it!
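The corresponding cassandra.yaml line is a one-liner (shown here for AWS; the cookbook exposes it as an attribute):
endpoint_snitch: Ec2MultiRegionSnitch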
What is RF?
Replication Factor (RF) is the number of copies of each piece of data
The partition key is hashed to determine the primary host
Additional copies always go to the next nodes on the ring
What is CL?
Consistency Level -- it’s not RF!
Describes how many replicas must respond before the operation is considered COMPLETE
CL_ONE - only one replica responds
CL_QUORUM - (RF/2)+1 replicas (round down)
CL_ALL - all RF replicas respond
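For example, cqlsh lets you set the consistency level per session, which makes the trade-off easy to experiment with (keyspace and query are placeholders):
cqlsh> CONSISTENCY QUORUM;
Consistency level set to QUORUM.
cqlsh> SELECT * FROM my_ks.users WHERE id = 42;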
DC/Rack Settings
You might need to set these
Maybe you’re not in Amazon
Rack == Availability Zone?
Hard: renaming a DC or adding racks
Renaming DCs
Clients “remember” which DC they talk to
Renaming single DC causes all clients to fail
Better to spin up a new DC than rename the old one
Adding a rack
Start with a 6-node cluster, all in rack R1
Replication factor 3
Add 1 node in R2, and rebalance
Rack-aware placement puts one replica per rack, so the lone R2 node gets a copy of ALL the data!
Good idea to keep racks balanced
I don’t have time for this
Clusters must have synchronized time
You will get lots of drift with: [0-3].amazon.pool.ntp.org
The community cookbook doesn’t cover anything here
Better make time for this
C* serializes write operations by timestamps
Clocks on virtual machines drift!
It’s the relative difference among clocks that matters
C* nodes should synchronize with each other
Solution: use a pair of peered NTP servers (stratum 2 or 3) and a small set of known upstream providers
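A minimal /etc/ntp.conf sketch for one of the two peered servers (hostnames are hypothetical):
server 0.pool.ntp.org iburst # small, fixed set of upstreams
server 1.pool.ntp.org iburst
peer ntp2.internal.example.com # the other in-house NTP server
Point every C* node at ntp1/ntp2 only, so the cluster drifts together rather than apart.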
From a small seed…
Seeds are used by new nodes to find the cluster
Every new node should use the same seeds
Seed nodes learn of topology changes faster
Each seed node must be in the config file
Multiple seeds per datacenter recommended
Tricky to configure on AWS (autoscaling changes IPs)
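The cassandra.yaml shape, with placeholder IPs (pick two or three stable nodes per DC):
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "10.0.1.10,10.0.2.10,10.1.1.10"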
Backups - Full+Incremental
Nothing in the cookbooks for this
C* makes it “easy”: snapshot, then copy
Snapshots might require a lot more space
Remove the snapshot after copying it
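A minimal sketch of the cycle (keyspace, paths, and bucket are placeholders; flag spellings vary slightly across C* versions):
$ nodetool snapshot -t nightly my_keyspace # hard-links the current SSTables
$ aws s3 cp --recursive /var/lib/cassandra/data/my_keyspace/ s3://my-backups/$(hostname)/ --exclude "*" --include "*/snapshots/nightly/*"
$ nodetool clearsnapshot -t nightly my_keyspace # free the space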
Disk selection
● SSD (ephemeral): low latency, great random r/w performance, no network use for disk. Recommended.
● Rotational (ephemeral): any size instance, good write performance, no network use for disk. Not cheap.
● EBS: any size instance, less expensive, no node rebuilds after an instance is replaced.
AWS Instance Selection
We moved to EC2
c3.2xlarge (15 GiB mem, 160 GB disk)?
i2.xlarge (30 GiB mem, 800 GB disk)
Max recommended storage per node is 1 TB
Use instance types that support HVM:
“Some previous generation instance types, such as T1, C1, M1, and M2 do not support Linux HVM AMIs. Some current generation instance types, such as T2, I2, R3, G2, and C4 do not support PV AMIs.”
How much can I use??
Snapshots take space (kind of: they’re hard links, so they only consume space as compaction replaces the original SSTables)
Best practice: keep disks half full!
An 800 GB disk becomes 400 GB usable
Snapshots during repairs?
Lots of uses for snapshots!
Periodic Repairs
Buried in the docs: “As a best practice, you should schedule repairs weekly”
http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_repair_nodes_c.html
● “-pr” (yes)
● “-par” (maybe)
● “--in-local-dc” (no)
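Putting the flags together, the weekly per-node run is simply:
$ nodetool repair -pr # primary ranges only; run on every node, staggered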
Repair Tips
Raise gc_grace_seconds (more tombstone headroom)
Run on one node at a time
Schedule for low-usage hours
Use “-par” if you have dead time (faster)
Tune with: nodetool setcompactionthroughput
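Compaction throughput is a live knob, no restart needed (16 MB/s is the usual default):
$ nodetool setcompactionthroughput 64 # open it up during a quiet repair window
$ nodetool setcompactionthroughput 16 # back to the default afterwards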
I thought I deleted that
Compaction removes “old” tombstones
10-day default grace period (gc_grace_seconds)
After that, deletes will not be propagated!
Run ‘nodetool repair’ at least every 10 days
Once a week is perfect (3 days of margin)
Node down >7 days? ‘nodetool removenode’ it!
Changing RF within DC?
Easy to decrease RF
Increasing RF (usually) requires running repair afterwards
Reads at CL_ONE might fail until that repair completes!
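The mechanics, with placeholder keyspace and DC names — raise RF, then repair before trusting CL_ONE reads again:
cqlsh> ALTER KEYSPACE my_ks WITH replication = {'class': 'NetworkTopologyStrategy', 'us-east': 3};
$ nodetool repair my_ks # on each node, to stream in the new replicas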
Replication Strategy
How many replicas should we have?
What happens if some data is lost?
Are you write-heavy or read-heavy?
Quorum considerations: odd is better!
RF=1? RF=3? RF=5?
Magic JMX setting: reduce traffic to a node
Great when a node is “behind” the 4-hour hint window
Used by the gossiper to divert traffic during repairs
Writes: ok; read repair: ok; nodetool repair: ok
$ java -jar jmxterm.jar -l localhost:7199
$> set -b org.apache.cassandra.db:type=DynamicEndpointSnitch Severity 10000
Don’t be too severe!
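And undo it once the node has caught up — setting Severity back to 0 restores normal routing:
$> set -b org.apache.cassandra.db:type=DynamicEndpointSnitch Severity 0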
Compaction Strategy
Largely driven by your use case and schema design
SizeTiered or Leveled?
Leveled has better guarantees for read times
SizeTiered may require 10 (or more) reads!
Leveled uses less disk space
Leveled tombstone collection is slower
Auth*
Cookbooks default to OFF
Turn authenticator and authorizer on
The default ‘cassandra’ user is super special
Its sign-on requires QUORUM (cross-DC)
All other users sign on at LOCAL_ONE!
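The cassandra.yaml lines, plus a sketch of replacing the default superuser (names and passwords are placeholders):
authenticator: PasswordAuthenticator
authorizer: CassandraAuthorizer
cqlsh> CREATE USER dba WITH PASSWORD 'changeme' SUPERUSER;
cqlsh> ALTER USER cassandra WITH PASSWORD 'something-long-and-random';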
Users
OS users vs Cassandra users: 1 to 1?
Shared credentials for apps?
Nothing logs the user taking the action!
‘cassandra’ user is created by cookbook
All processes run as ‘cassandra’
Limits
Chef helps here! Startup script:
ulimit -l unlimited # memory locking
ulimit -n 48000 # file descriptors
/etc/security/limits.d:
cassandra - nofile 48000
cassandra - nproc unlimited
cassandra - memlock unlimited
Filesystem Type
Officially supported: ext4 or XFS
XFS is slightly faster
Interesting options:
● ext4 without journal
● ext2
● zfs
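A minimal sketch for an XFS data volume (device name is hypothetical; noatime is a common recommendation):
$ mkfs.xfs /dev/xvdb
$ mount -o noatime /dev/xvdb /var/lib/cassandra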
Logs
To consolidate or not to consolidate?
Push or pull? Usually push!
FOSS: syslogd, syslog-ng, logstash/kibana, heka, banana
Others: Splunk, SumoLogic, Loggly, Stackify
Shutdown
Nice init script with cookbook; the steps are:
● nodetool disablethrift (no more clients)
● nodetool disablegossip (stop talking to the cluster)
● nodetool drain (flush all memtables)
● kill the JVM
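As a shell sketch (the pid file location depends on your init script):
$ nodetool disablethrift && nodetool disablegossip && nodetool drain
$ kill $(cat /var/run/cassandra/cassandra.pid)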
Quick performance wins
● Disable assertions - cookbook property
● No swap space (or vm.swappiness=1)
● concurrent_reads (cassandra.yaml)
● concurrent_writes (cassandra.yaml)
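Where these live, as a sketch (values are rough starting points, not gospel):
$ sysctl -w vm.swappiness=1 # or remove swap entirely
# cassandra.yaml -- rule of thumb: reads ~ 16 x data drives, writes ~ 8 x cores
concurrent_reads: 32
concurrent_writes: 64
Assertions are the -ea flag in cassandra-env.sh; the cookbook property simply drops it.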
Thank You!
@rkuris
ron.kuris@gmail.com