Real world capacity planning: Cassandra on blades and big iron (July 2011)
About me
- Hadoop System Admin @ media6degrees; watch the Cassandra servers as well
- Write code (hadoop filecrusher)
- Hive Committer: variable substitution, UDFs like atan, rough draft of a c* handler
- Epic Cassandra Contributor (not!): CLI should allow users to choose consistency level; NodeCmd should be able to view Compaction Statistics
- Self-proclaimed president of the Cassandra fan club: Cassandra NYC User Group
- Author of the High Performance Cassandra Cookbook
Media6Degrees
- Social targeting in online advertising
- Real Time Bidding: a dynamic auction process where each impression is bid for in (near) real time
- Cassandra at work storing: visit data, ad history, id mapping
- Multiple data centers (home-brew replication)
- Back end: Hadoop tools (data mining, bulk loads)
- Front end: Tomcat, MySQL + Cassandra (lookup data)
What is this talk about?
- Real World Capacity Planning
- Been running c* in production for more than a year
- Started with a handful of nodes also running Tomcat, and Replication Factor 2!
- Grew data from 0 to 10 TB
- Grew from 0 to 751,398,530 reads/day
- All types of fun along the way
Using puppet, chef... from day 1
- "I am going to choose Cassandra 0.6.0-beta-1 over 0.5.x so I am future proof" -- famous quote by me
- Cassandra is active; new versions are coming
  - Rolling restarts between minors
  - But much better to get everyone to the same rev quickly
- New nodes are coming; do not let them:
  - start with the wrong settings
  - fail because you forgot open file limits, etc.
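As a hedged illustration of the "forgot open file limits" failure mode: this is the kind of setting the config management tool should enforce on every node before Cassandra starts. The user name and values below are illustrative, not from the talk.

```
# /etc/security/limits.conf, managed by puppet/chef
# <domain>   <type>  <item>   <value>
cassandra    soft    nofile   32768
cassandra    hard    nofile   32768
```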
Calculating data size on disk
- SSTable format is currently not compressed
- Repairs, joins, and moves need "wiggle room"
- Smaller keys and column names save space
- Need enough free space to compact your largest column family
- Snapshots keep SSTables around after compaction
- Most *nix file systems need free space to avoid performance loss from fragmentation!
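The bullets above can be folded into a back-of-envelope sizing formula. This is a sketch: the snapshot overhead and filesystem free fraction are made-up assumptions, not numbers from the talk.

```python
def disk_gb_per_node(raw_data_gb, replication_factor, nodes,
                     largest_cf_gb, snapshot_overhead=0.2,
                     fs_free_fraction=0.1):
    """Rough per-node disk requirement for uncompressed SSTables.

    - Replicated data is assumed to spread evenly over all nodes.
    - Compacting the largest column family needs free "wiggle room"
      roughly equal to that CF's size on the node.
    - Snapshots pin old SSTables (modeled as a flat overhead fraction).
    - Keep fs_free_fraction of the filesystem empty to avoid
      fragmentation-related slowdowns.
    """
    per_node = raw_data_gb * replication_factor / nodes
    headroom = largest_cf_gb * replication_factor / nodes
    used = per_node * (1 + snapshot_overhead) + headroom
    return used / (1 - fs_free_fraction)
```

For example, 1 TB of raw data at RF=3 on 10 nodes with a 400 GB largest column family works out to roughly 533 GB per node under these assumptions, far more than the naive 300 GB.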
Speed of disk
- The faster the better! But faster + bigger gets expensive and challenging
- RAID0: faster for streaming, not necessarily for seeking; fragile, and the larger the stripe, the higher the chance of failure
- RAID5: not as fast, but survives a disk failure; battery-backed cache helps but is $$$
- The dedicated commit log decision
Disk formatting
- ext4 everywhere
  - Deletes are much better than ext3
  - Noticeable performance loss as disks get full
  - A full async mode for risk takers
- Obligatory noatime fstab setting
- Using multiple file systems can result in multiple caches (check slabtop)
- XFS is also worth a mention
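The "obligatory noatime" bullet as an fstab line; the device and mount point are assumptions.

```
# /etc/fstab: ext4 data volume, noatime skips per-read access-time writes
/dev/sdb1   /var/lib/cassandra   ext4   defaults,noatime   0  2
```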
Memory
- Garbage collection is on separate thread(s)
- Each request creates temporary objects
- Cassandra's fast writes go to Memtables (you will never guess what they use :) )
- Bloom filter data is in memory
- Key cache and Row cache
- For low latency, RAM must be some % of data size
- RAM not used by the process is OS cache
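Bloom filter memory, for one, can be estimated with the standard sizing formula. Cassandra's actual per-key overhead differs by version, so treat this as a textbook approximation rather than the implementation's numbers.

```python
import math

def bloom_filter_mb(num_row_keys, false_positive_rate=0.01):
    """Textbook Bloom filter sizing: m = -n * ln(p) / (ln 2)^2 bits."""
    bits = -num_row_keys * math.log(false_positive_rate) / (math.log(2) ** 2)
    return bits / 8 / 2 ** 20
```

100 million row keys at a 1% false-positive rate need on the order of 114 MB just for Bloom filters, before any key or row cache is configured.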
CPU
- Workload could be more disk-bound than CPU-bound
- High load needs CPU to clean up Java garbage
- Other than serving requests, compaction uses resources
Different workloads
- The structured log format of C* has deep implications
- Is data written once, or does it change over time?
- How high is data churn?
- How random is the read/write pattern?
- What is the write/read percentage?
- What are your latency requirements?
Large disk / big iron key points
- RAID0 mean time to failure drops with bigger stripes
- Java cannot address large heaps well
- Compactions/joins/repairs take a long time
- Lowers agility when joining a node can take hours
- Maintaining a high RAM-to-data percentage is costly (e.g. 2 machines with 32 GB vs 1 machine with 64 GB)
- Capacity is heavily diminished by the loss of one node
Blade server key points
- Management software gives a cloud computing vibe
- Cassandra internode traffic rides the blade backplane
- Usually support 1-2 on-board disks (SCSI/SSD)
- Usually support RAM configurations up to 128 GB
- Single- and dual-socket CPUs
- No exotic RAID options
Schema lessons
- "You only need one column family" is not always true
- Infrequently read data in the same CF as frequently read data competes for "cache"
- Separating them allows employing multiple cache options
- Rows that are written or updated get fragmented
Capacity Planning rule #1: Know your hard drive limits
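One way to know your hard drive limits is simple IOPS arithmetic. The per-disk figure and peak factor below are assumptions (a 7200 rpm SATA drive does on the order of 100 random IOPS), not measurements from the talk.

```python
import math

def spindles_needed(reads_per_day, cache_hit_rate,
                    disk_iops=100, peak_factor=2.0):
    """Back-of-envelope spindle count for a random-read workload.

    Only cache misses hit disk; peak traffic is assumed to run
    peak_factor times the daily average.
    """
    avg_rps = reads_per_day / 86_400
    disk_rps = avg_rps * (1 - cache_hit_rate) * peak_factor
    return math.ceil(disk_rps / disk_iops)
```

At this deck's 751,398,530 reads/day, even a 90% cache hit rate leaves roughly 1,700 disk reads/sec at peak, about 18 commodity spindles under these assumptions.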
Capacity Planning rule #2: Writes are fast, until c* flushes and compacts so much that they are not
Capacity Planning rule #3: Row cache is fool's gold
- Faster than a read from the disk cache
- Memory use (row key + columns and values)
- Causes memory pressure (data in and out of memory)
- Fails with large rows
- Cold on startup
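A rough model of why the row cache eats memory: each entry holds the full row, the key, and some per-entry JVM overhead. The overhead constant here is a guess, not a measured figure.

```python
def row_cache_gb(cached_rows, avg_key_bytes, avg_row_bytes,
                 overhead_per_entry=64):
    """Row cache stores the whole row: key plus all column names and
    values, plus per-entry object overhead (assumed, not measured)."""
    total = cached_rows * (avg_key_bytes + avg_row_bytes + overhead_per_entry)
    return total / 2 ** 30
```

A million cached rows averaging 1 KB each is already about 1 GB of heap; a handful of multi-MB wide rows can blow the budget on their own.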
Capacity Planning rule #4: Do not upgrade tomorrow what you can upgrade today
- Joining nodes is intensive on the cluster
- Do not wait till c* disks are 99% utilized
- You do not get 100% benefit of new nodes until neighbors are cleaned
- Doubling the node count results in fewer move steps
- Adding RAM is fast and takes heat off the hard disks
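Why doubling means fewer move steps can be sketched with RandomPartitioner token math: when the node count doubles, each new node simply bisects an existing range and no existing token has to move, while any other growth factor forces existing nodes to move. The cluster sizes here are illustrative.

```python
RING = 2 ** 127  # RandomPartitioner token space

def balanced_tokens(n):
    """Evenly spaced initial tokens for an n-node cluster."""
    return {i * RING // n for i in range(n)}

# Doubling 4 -> 8: every old token survives; new nodes bisect old ranges.
assert balanced_tokens(4) <= balanced_tokens(8)

# Growing 4 -> 6: tokens do not line up, so existing nodes must move too.
assert not balanced_tokens(4) <= balanced_tokens(6)
```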
Capacity Planning rule #5: Know your traffic patterns better than you know yourself
The use case: Dr. Real Time and Mr. Batch
Dr. Real Time
- Real time bidding needs low latency
- Peak traffic during the day
- Need to keep a high cache hit rate
- Avoid compact, repair, cleanup, joins
Dr. Real Time's Lab
- Experiments with Xmx vs VFS caching
- Experiments with cache sizing
- Studying graphs as new releases and features are added
- Monitoring dropped messages and garbage collection
- Dr. Real Time enjoys lots of memory per GB of data on disk
- Enjoys reading (data), and writing as well
- Nicely sized memtables help to not pollute the VFS cache
Mr. Batch
- Night falls and users sleep
- Batch/back loading data (bulk inserts)
- Finding and removing old data (range scanning)
- Maintenance work (nodetool)
Mr. Batch rampaging through the data
- Bulk loading: write at quorum, and c* works harder on the front end
- Turning off compaction: fine for a short burst, but we are pushing for hours; forget to turn it back on and the SSTable count gets bad fast
- Range scanning to locate and remove old data
- Scheduling repairs and compaction
- Mr. Batch enjoys tearing through data: writes, tombstones, range scanning, repairs
- Enjoys fast disks for compacting
Questions ???
