Feng Qu, Sr MTS
Bass Chorng, Principal Capacity Engineer
DB Capacity Planning at eBay
#CassandraSummit2015
Who Am I?
Bass Chorng – Principal Capacity Engineer @ eBay
Specializes in database performance, availability & scalability in a large website.
Established DB capacity team at eBay in 2003.
Loves mountain biking.
eBay Site DB Traffic At A Glance
NoSQL Total – 52 B/Day
Cassandra – 15 B
Mongo – 15 B
CouchBase – 12 B
PushVM – 10 B
RDBMS Total – 350 B
MySQL – 10 B
Oracle – 340 B
Peak Traffic – 8M/sec
Site Total DB Calls – 400B/Day across 2,000 NoSQL Nodes + 450 Oracle Nodes
Hosting 800M Active Items & 120M Active Users
Y-o-Y Growth – 30% ~ 35%
[Chart: Billion SQL Calls per Day. Cassandra 15, Mongo 15, CouchBase 12, PushVM 10, MySQL 10, Oracle 340]
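The figures on this slide can be cross-checked with a little arithmetic. The sketch below only restates the slide's numbers; the helper names are illustrative and not from the talk. It compares the average call rate implied by 400B calls/day with the quoted 8M/sec peak.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

# Daily call volumes from the slide, in calls per day.
NOSQL_CALLS = {"Cassandra": 15e9, "Mongo": 15e9, "CouchBase": 12e9, "PushVM": 10e9}
RDBMS_CALLS = {"MySQL": 10e9, "Oracle": 340e9}


def average_rate_per_sec(calls_per_day: float) -> float:
    """Average calls per second implied by a daily total."""
    return calls_per_day / SECONDS_PER_DAY


def peak_to_average_ratio(peak_per_sec: float, calls_per_day: float) -> float:
    """How many times higher the peak rate is than the daily average rate."""
    return peak_per_sec / average_rate_per_sec(calls_per_day)


nosql_total = sum(NOSQL_CALLS.values())  # 52 B/day, matching the slide
rdbms_total = sum(RDBMS_CALLS.values())  # 350 B/day, matching the slide
site_total = nosql_total + rdbms_total   # 402 B/day, quoted as 400B on the slide

avg_rate = average_rate_per_sec(400e9)     # roughly 4.6M calls/sec
ratio = peak_to_average_ratio(8e6, 400e9)  # roughly 1.7x
print(f"average {avg_rate:,.0f}/sec, peak/average {ratio:.2f}")
```

A peak of about 1.7 times the daily average is one reason the later slides insist on sizing for peak plus headroom rather than for the average.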
Capacity Planning - Simply Put
Ø  Analyze Traffic
o  Data
Ø  Analyze Utilization
o  Data
Ø  Analyze The Relationship Of The Above Two
o  Same Data
Ø  Forecast Growth
o  Simple Models, Then Impress Your Boss.
Ø  Convert Resource Need into $
o  A Calculator, Then Impress Your CIO’s
BTW, You Also Need To Know …
•  Platform Domain Knowledge – Server, DB Engine, IO Subsystem, Networks …
•  Relationship Between System Overhead & Utilization
•  Seasonality & Workload Characteristics
•  Bottlenecks – Components, Systems, Platforms, Architecture, Site & Apps
•  New Technologies
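A minimal sketch of the steps above, assuming the traffic-to-utilization relationship is roughly linear. All sample values, the 40% utilization target, the fleet size and the per-node price are hypothetical and only there to make the arithmetic concrete. The intercept of the fitted line plays the role of fixed system overhead.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def nodes_needed(current_nodes, projected_util_pct, target_util_pct):
    """Nodes required to bring projected utilization back to the target.

    Simplification: assumes load spreads evenly and utilization scales
    inversely with node count.
    """
    return math.ceil(current_nodes * projected_util_pct / target_util_pct)


# Hypothetical daily samples: traffic (billion calls/day) vs. average CPU %.
traffic = [10, 12, 14, 16]
cpu_pct = [25, 29, 33, 37]

# slope = CPU % per billion calls, overhead = CPU % at zero traffic.
slope, overhead = fit_line(traffic, cpu_pct)
projected_cpu = slope * 20 + overhead  # utilization if traffic reaches 20 B/day

# Hypothetical fleet: 100 nodes, 40% target utilization, $10,000 per node.
total_nodes = nodes_needed(100, projected_cpu, 40)
extra_cost = (total_nodes - 100) * 10_000
print(f"slope={slope:.1f} overhead={overhead:.1f} "
      f"nodes={total_nodes} cost=${extra_cost:,}")
```

With these made-up numbers the fit gives 2 CPU points per billion calls plus 5 points of overhead, so 20 B/day projects to 45% CPU and the fleet grows from 100 to 113 nodes.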
Domain Knowledge Stack
[Diagram: layers APPS, DB, UNIX, STORAGE, with "CAPACITY" written vertically alongside]
aka Whom To Blame Stack
Bottom of food chain =>
Data
Ø What To Collect?
Apps, Database, Sessions, CPU, Memory, Connections, IOPS,
IO Time, NIC, HBA, Array
Ø How To Collect?
Time Resolution, Aggregation Level, Retention
Ø How To Use It?
Average, Max, 95th percentile, Dashboard, Reporting, Trending
[Charts: two daily time series, one from 5/1/2015 to 5/27/2015 (y-axis 0.0 to 4.0) and one from 1/26/2015 to 3/1/2015 (y-axis 0 to 40,000,000); titles and units not recoverable]
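The three summary statistics named under "How To Use It?" can tell very different stories about the same samples. The sketch below uses a nearest-rank percentile, which is one common definition and not necessarily the one used at eBay. The IO time samples are hypothetical.

```python
def percentile(values, pct):
    """Nearest-rank percentile; pct is an integer from 1 to 100."""
    ordered = sorted(values)
    rank = (pct * len(ordered) + 99) // 100  # integer ceiling of pct/100 * n
    return ordered[rank - 1]


def summarize(values):
    """Average, max and 95th percentile of a list of samples."""
    return {
        "average": sum(values) / len(values),
        "max": max(values),
        "p95": percentile(values, 95),
    }


# Hypothetical IO time samples in ms: mostly steady, with one large spike.
io_time_ms = [2.0] * 18 + [3.0, 9.0]
stats = summarize(io_time_ms)
print(stats)
```

Here the average (2.4 ms) barely registers the spike, the max (9.0 ms) is entirely determined by it, and the 95th percentile (3.0 ms) sits in between.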
Forecast
Ø Model Traffic, Not Resources
Ø Need One Year Trend
Ø Forecast At Daily Level
Ø Eliminate Outliers
Ø No Data Is Better Than Wrong Data
Ø Convert Traffic To Resource Usage
Ø Linear Extrapolation Only (CPU Utilization, not IO Time)
Ø Simple Excel Formula Works Well
Ø For Long Term Resource Planning Only
Ø Use Average, Not Max
Ø Not All Workloads Are Predictable
[Chart: CATY Traffic Forecast. Billion calls (0 to 70) from 01/01/2012 to 01/01/2015; series: Forecast, Actual, Capacity]
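The slide suggests a simple Excel formula; the same idea can be sketched in Python. The outlier rule here (drop points whose residual from a first-pass trend line is far from the median residual) is an assumption chosen for illustration, since the talk does not say how outliers were removed. The daily samples are made up.

```python
from statistics import median


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def drop_outliers(days, calls, k=5.0):
    """Keep points whose residual from a first-pass trend line lies within
    k median absolute deviations (MAD) of the median residual."""
    slope, intercept = fit_line(days, calls)
    residuals = [y - (slope * x + intercept) for x, y in zip(days, calls)]
    center = median(residuals)
    mad = median(abs(r - center) for r in residuals)
    if mad == 0:  # data already lies on a line; nothing to drop
        return list(zip(days, calls))
    return [(x, y) for x, y, r in zip(days, calls, residuals)
            if abs(r - center) <= k * mad]


# Hypothetical daily traffic (million calls) growing by 2 per day,
# with one bad sample on day 5.
days = list(range(10))
daily_calls = [100 + 2 * d for d in days]
daily_calls[5] = 500

clean = drop_outliers(days, daily_calls)
clean_days, clean_calls = zip(*clean)
slope, intercept = fit_line(clean_days, clean_calls)
forecast_day_365 = slope * 365 + intercept  # linear extrapolation one year out
print(f"kept {len(clean)} points, forecast {forecast_day_365:.0f}")
```

Fitting the trend to traffic and only afterwards converting traffic to resource usage follows the first bullet, "Model Traffic, Not Resources".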
Things To Watch For
Myths
•  More CPU Makes Apps Run Faster
•  More Data Makes Apps Run Slower
•  Apps Run Twice As Fast On A CPU With Twice The Speed
•  High Session = High Load
Pitfalls
•  Cause vs. Symptom
•  Time Resolution Masks Issues
•  Look At The Whole Picture
•  Slow Down In Order To Go Faster (Throttle)
Challenges
•  Data Quality: Data Missing, Data Source Changes, F/O Data Residency, Data Errors …
•  Varieties of Data Formats & Resolutions
•  Data Collection In Secured Zones
Me: Everything NoSQL
Ø Prior to 2011: Worked on Oracle at DoubleClick/Yahoo/Intuit
Ø Worked on NoSQL at eBay Database Infrastructure team:
Ø Cassandra since 2011
Ø MongoDB since 2012
Ø Couchbase since 2014
Ø Cassandra Summit speaker for 2013, 2014, 2015
Ø DataStax Cassandra MVP for 2014, 2015
For Cassandra
Ø Capacity Measurements
Ø Throughput
Ø Latency
Ø E.g. 30,000 reads/sec with SLA of P99 at 5ms
Ø Hardware SKU Example
Ø CPU: 20 cores
Ø Memory: 128GB RAM
Ø Storage: 1.5TB local SSD
Ø Network: 10g NIC
Benchmarking
Ø Benchmarking for different hardware
Ø High I/O SKU
Ø High memory SKU
Ø High storage SKU
Ø Bare metal or cloud
Ø Benchmarking for different software releases
Ø Benchmarking for different workloads
Ø  100% Writes
Ø  50% Writes, 50% Reads
Ø  5% Writes, 95% Reads
Ø  100% Reads
Ø Benchmarking Tools
Ø YCSB
Ø Cassandra-stress
Ø Proactive and repeated process using near real-time traffic in prod like environment
Capacity Planning
Ø Key to avoid surprise in production
Ø The concept behind capacity planning is simple, but the mechanics are harder.
Ø Business requirements may increase, need to forecast how much resource must be
added to the system to ensure that user experience continues uninterrupted
Ø  Input: clearly defined capacity goal coming from business requirement and performance baseline
from benchmark test
Ø  Output: Identify resources to be added, such as memory, CPU, storage, I/O, network
Ø Always prepare for peak + headroom
Capacity Planning Process
Ø Initial Sizing
Ø Storage size vs. data size
Ø Compaction overhead, compression ratio, RF, indexes
Ø Cost-effective configuration to meet capacpity/latency SLA
Ø Routine Review
Ø System utilization on I/O, storage, network, CPU, memory etc
Ø Cassandra metrics on GC, compaction, latency, throughput etc
Ø Compactionstats, cfhistoralgrams, tpstats etc
Ø Forecasting
Ø Historical comparison
Ø Traffic projection
Ø Flex up or Flex down
Scale Up vs. Scale Out
Ø Scale Up(vertical)
Ø  Pros
Ø Smaller data center footprint, such as space, power, cooling
Ø Less license cost
Ø  Cons
Ø Likely cost more using proprietary hardware
Ø Less fault tolerant
Ø Limited upgradability in future
Ø Scale Out(horizontal)
Ø  Pros
Ø Cheaper using commodity hardware
Ø More fault tolerant
Ø (unlimited) upgradability
Ø  Cons
Ø Bigger data center footprint
Ø More license cost
Ø Likely need more network equipment
Questions?
eBay is hiring experienced NoSQL professionals. Please send your resume to fengqu@ebay.com
