Cassandra Compute Cloud 
An Elastic Cassandra Infrastructure 
Gurashish Singh Brar 
Member of Technical Staff @ BloomReach
Abstract 
Dynamically scaling Cassandra to serve hundreds of map-reduce jobs that arrive at an unpredictable rate, while at the same time giving front-end applications real-time access to the data under strict TP95 latency guarantees, is a hard problem. 
We present a system for managing Cassandra clusters that provides the following functionality: 
1) Dynamic scaling of capacity to serve high-throughput map-reduce jobs 
2) Real-time access to data generated by map-reduce jobs for front-end applications, with TP95 latency SLAs 
3) Low cost, by leveraging Amazon Spot Instances and demand-based scaling 
At the heart of this infrastructure lies a custom data replication service that makes it possible to stream data to new nodes as needed.
What is it about? 
• Dynamically scaling the infrastructure to support large EMR jobs 
• Throughput SLA to backend applications 
• TP95 latency SLA to frontend applications 
• Cassandra 2.0 using vnodes
Agenda 
• Application requirements 
• Major issues we encountered 
• Solutions to the issues
Application Requirements 
• Backend EMR jobs performing scans, lookups, and writes 
  - Heterogeneous applications with varying degrees of throughput SLAs 
  - Very high peak loads 
  - Always available (no maintenance periods or planned downtimes) 
• Frontend applications performing lookups 
  - Data from backend applications expected in real time 
  - Low latencies 
• Developer support
How we started 
[Diagram: Frontend Applications hit the Frontend DC and EMR Jobs hit the Backend DC of a single Cassandra Cluster]
Frontend isolation using multiple DCs 
[Diagram: one Cassandra Cluster split into a Frontend DC and a Backend DC]
Frontend Issue: Spillover Reads 
[Diagram: reads issued against the Backend DC spilling over into the Frontend DC]
Frontend Issue: Latencies vs Replication load 
[Diagram: EMR Jobs write into the Backend DC; cross-DC replication to the Frontend DC competes with Frontend Applications' reads]
Backend Issue: Fixed resource 
[Diagram: two concurrent EMR Jobs sharing the fixed-capacity Backend DC]
Backend Issue: Fixed Resource 
[Diagram: many concurrent EMR Jobs overwhelming the same fixed-capacity Backend DC]
Backend Issue: Starvation 
[Diagram: large EMR jobs with a relaxed SLA crowd out a small EMR job with a tighter SLA on the Backend DC]
Summary of Issues 
• Frontend isolation is not perfect 
• Frontend latencies are impacted by backend write load 
• EMR jobs can overwhelm the Cassandra cluster 
• Large EMR jobs can starve smaller ones
Rate Limiter 
[Diagram: same architecture as before, with each EMR job acquiring permits from a Token Server (Redis) before hitting the Backend DC]
Rate Limiter 
• QPS allocated per operation and per application 
• Operations can be scans, reads, writes, prepare, alter, create, etc. 
• Each mapper/reducer obtains permits for 1 minute (configurable); see the sketch after this list 
• The token bucket is periodically refreshed with the allocated capacity 
• Quotas are dynamically adjusted to take advantage of other applications' unused quotas 
(we do want to maximize cluster usage)
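
A minimal sketch of the permit leasing, assuming the Jedis client; the key scheme, quota values, and class are illustrative, not the actual BloomReach implementation. The atomic DECRBY is what keeps concurrent mappers from over-subscribing a bucket:

    import redis.clients.jedis.Jedis;

    public class RateLimiterClient {
        private final Jedis jedis;

        public RateLimiterClient(String redisHost) {
            this.jedis = new Jedis(redisHost);
        }

        // Lease `permits` operations from the per-app, per-operation bucket.
        // DECRBY is atomic, so concurrent mappers cannot over-subscribe.
        public long tryAcquire(String app, String operation, long permits) {
            String key = "quota:" + app + ":" + operation; // illustrative key scheme
            long remaining = jedis.decrBy(key, permits);
            if (remaining < 0) {
                // Bucket exhausted: return the permits; caller backs off
                // until the next refresh.
                jedis.incrBy(key, permits);
                return 0;
            }
            return permits;
        }

        // Periodic refresh (run centrally, not by mappers): reset the bucket
        // to the capacity allocated for the next one-minute window.
        public void refresh(String app, String operation, long allocatedQps) {
            jedis.set("quota:" + app + ":" + operation,
                      Long.toString(allocatedQps * 60));
        }
    }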
Why Redis? 
• High load from all EMR nodes 
• Low latency 
• Supports a high number of concurrent connections 
• Supports atomic fetch-and-add
Cost of Rate Limiter 
• We converted EMR from an elastic resource to a fixed resource 
• To scale EMR we have to scale Cassandra 
• Adding capacity to Cassandra cluster is not trivial 
• Adding capacity under heavy load is harder 
• Auto-scaling up and down under heavy load is even harder
Managing capacity - Requirements 
• Time to increase capacity should be in minutes 
• Programmatic management and not manual 
• Minimum load on the production cluster during the operation
C* increasing capacity 
[Diagram: C* Cluster; adding nodes directly to it is expensive]
C* increasing capacity 
[Diagram: solution is to replicate the existing C* Cluster to a new C* cluster]
Custom Replication Service 
[Diagram: SSTable file copy from the Source Cluster to the Destination Cluster]
Custom Replication Service 
• The replication service on a source node takes a snapshot of the column family 
• SSTables in the snapshot are streamed evenly across the destination cluster 
• The replication service on each destination node splits a single source SSTable into N SSTables 
• Splits are computed using the SSTableReader & SSTableWriter classes; a single SSTable can 
be split in parallel by multiple threads (see the sketch below)
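
A conceptual sketch of the split step. The interfaces below are hypothetical stand-ins for Cassandra's internal SSTableReader/SSTableWriter machinery, whose real signatures are more involved; only the routing logic is the point:

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical stand-ins for the internal reader/writer classes.
    interface Row { long token(); }
    interface RowScanner { boolean hasNext(); Row next(); }
    interface RowWriter { void append(Row row); void close(); }
    interface TokenRing { String ownerOf(long token); }
    interface WriterFactory { RowWriter forNode(String node); }

    public class SSTableSplitter {
        // Walk one source SSTable in token order and route each partition to
        // a writer for the destination node that owns its token. A single
        // source SSTable can also be split in parallel by assigning disjoint
        // token sub-ranges to different threads.
        public static void split(RowScanner scanner, TokenRing ring,
                                 WriterFactory factory) {
            Map<String, RowWriter> writers = new HashMap<>();
            while (scanner.hasNext()) {
                Row row = scanner.next();
                String owner = ring.ownerOf(row.token());
                writers.computeIfAbsent(owner, factory::forNode).append(row);
            }
            // Finished SSTables are then streamed to their destination nodes.
            for (RowWriter w : writers.values()) w.close();
        }
    }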
Custom Replication Service 
• Once split, the new SSTables are streamed to the correct destination nodes 
• A rolling restart is initiated on the destination cluster (we could have used nodetool refresh, 
but it was unreliable) 
• The cluster is ready for use 
• In parallel, compaction is triggered on the destination cluster to optimize reads
Cluster Provisioning 
• Estimate the required cluster size from the column family's on-disk size on the source cluster 
• Provision machines on AWS (Cassandra is pre-installed on the AMI, so no setup is required) 
• Generate the yaml and topology files for the new cluster and create a backend datacenter 
(application agnostic) 
• Copy the schema from the source cluster to the destination cluster 
• Call the replication service on the source cluster to replicate the data (a rough sketch of the flow follows)
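
A rough sketch of this flow, assuming the AWS SDK for Java; the AMI id, the per-node capacity heuristic, and the config/schema/replication helpers are illustrative placeholders:

    import com.amazonaws.services.ec2.AmazonEC2Client;
    import com.amazonaws.services.ec2.model.RunInstancesRequest;

    public class ClusterProvisioner {
        // Assumed usable data per node; the real heuristic is not in the slides.
        private static final long PER_NODE_BYTES = 500L * 1024 * 1024 * 1024;

        public void provision(long columnFamilyBytes) {
            // 1. Estimate cluster size from the column family's on-disk size.
            int nodes = (int) Math.ceil((double) columnFamilyBytes / PER_NODE_BYTES);

            // 2. Provision machines from an AMI with Cassandra pre-installed
            //    ("ami-cassandra" is a hypothetical pre-baked image id).
            AmazonEC2Client ec2 = new AmazonEC2Client();
            ec2.runInstances(new RunInstancesRequest()
                    .withImageId("ami-cassandra")
                    .withInstanceType("r3.2xlarge")
                    .withMinCount(nodes)
                    .withMaxCount(nodes));

            // 3. Generate cassandra.yaml and the topology file for the new DC,
            // 4. copy the schema from the source cluster, and
            // 5. call the replication service (helpers not shown).
        }
    }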
C* Compute Cloud 
[Diagram: a Cluster Management service spins up multiple on-demand clusters from the Source Cluster to serve EMR Jobs]
C* Compute Cloud 
• Very high throughput when moving raw data from the source to the destination cluster (10x 
increase in network usage compared to normal) 
• Little CPU/memory load on the source cluster 
• Leverages the size of the destination cluster to compute new SSTables for the new ring 
• Time to provision varies from 10 to 40 minutes 
• API driven, so it automatically scales up and down with demand 
• Application agnostic
C* Compute Cloud - Limitations 
• Snapshot model: take a snapshot of production and operate on it 
This works really well for some use cases, is good for most, but not all 
• Provisioning time is on the order of minutes 
This works for EMR jobs, which themselves take a few minutes to provision, but not for 
dedicated backend applications 
• Writes still need to happen on the reserved production cluster
Where we are now 
[Diagram: Frontend Applications on the Frontend DC; EMR Jobs rate-limited by the Token Server (Redis) against the Backend DC; a Cluster Management service replicates the production cluster into multiple on-demand clusters]
Exploiting the C* compute cloud 
• Key feature: easy, automated, and fast cluster provisioning with 
production data 
• Use Spot Instances instead of On-Demand 
• Failures of a few nodes are survivable thanks to C* redundancy 
• In case of too many failures, just rebuild on retry (it's fast and automatic)
Spot Instances 
• The service supports all AWS instance types and all AZs 
• Pick the Spot Instance type & AZ that is cheapest and 
satisfies the constraints (sketched below) 
• Further reduces cost and improves reliability of the service 
• If the r3.2xlarge spot price spikes, on retry the service might pick c3.8xlarge 
• Auto-expire clusters to adjust automatically to cheaper instances
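
A sketch of the cheapest-instance selection using the AWS SDK's spot price history call; the candidate instance types and the omitted capacity-constraint check are assumptions:

    import com.amazonaws.services.ec2.AmazonEC2Client;
    import com.amazonaws.services.ec2.model.DescribeSpotPriceHistoryRequest;
    import com.amazonaws.services.ec2.model.DescribeSpotPriceHistoryResult;
    import com.amazonaws.services.ec2.model.SpotPrice;

    public class SpotPicker {
        public SpotPrice cheapest(AmazonEC2Client ec2) {
            DescribeSpotPriceHistoryResult result = ec2.describeSpotPriceHistory(
                    new DescribeSpotPriceHistoryRequest()
                            .withInstanceTypes("r3.2xlarge", "c3.8xlarge", "i2.2xlarge")
                            .withProductDescriptions("Linux/UNIX")
                            .withMaxResults(100));

            SpotPrice best = null;
            for (SpotPrice p : result.getSpotPriceHistory()) {
                // A real picker would also check that the type satisfies the
                // cluster's CPU/RAM/disk constraints before comparing price.
                if (best == null
                        || Double.parseDouble(p.getSpotPrice())
                           < Double.parseDouble(best.getSpotPrice())) {
                    best = p;
                }
            }
            return best; // cheapest (type, AZ) pair seen in the window
        }
    }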
Cost or Capacity (take your pick) 
Capacity of the C* compute cloud on spot instances 
≈ 
(5 to 10) x a C* cluster using on-demand instances, 
for the same $ value
Issues Addressed 
• Backend read capacity can scale linearly with the C* compute cloud 
• Frontend latencies are protected from write load by the rate limiter
Remaining issues 
• Read load on the backend DC can spill over to the frontend DC, causing latency spikes 
• Write capacity is still bounded by frontend latencies
Issue: Spillover Reads 
[Diagram: backend reads spilling over from the Backend DC into the Frontend DC]
Spillover Reads Fix: Fail the read 
[Diagram: the spillover read from the Backend DC into the Frontend DC is now failed (X) instead of served]
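
The slides don't say where this fix lives; one client-side way to approximate "fail the read" with the DataStax Java driver of that era is to pin each application to its own DC and use a DC-local consistency level, so a backend read errors out rather than spilling into the frontend DC. A sketch (the datacenter name and contact point are illustrative):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ConsistencyLevel;
    import com.datastax.driver.core.QueryOptions;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;

    public class DcPinnedClient {
        public static Session connectToBackend(String contactPoint) {
            Cluster cluster = Cluster.builder()
                    .addContactPoint(contactPoint)
                    // usedHostsPerRemoteDc = 0: never fall back to remote DC nodes
                    .withLoadBalancingPolicy(new DCAwareRoundRobinPolicy("backend", 0))
                    .withQueryOptions(new QueryOptions()
                            .setConsistencyLevel(ConsistencyLevel.LOCAL_ONE))
                    .build();
            return cluster.connect();
        }
    }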
Addressing the Write Capacity 
• The obvious: only push updates that actually changed (see the sketch below) 
Big improvement: 80-90% of the data did not change 
• Add more nodes: with the backend read load off production, it is a lot 
easier to expand capacity 
• But we are still operating at roughly a third to a fifth of the write capacity to keep read 
latencies low
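
A sketch of the "only push what changed" idea, assuming a digest side-store keyed by row (the slides don't describe the real mechanism; in practice the digests could live in the EMR job's own output or a side column family): hash the new value and skip the Cassandra write when the hash is unchanged.

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import java.util.HashMap;
    import java.util.Map;

    public class DeltaWriter {
        // Illustrative in-memory digest store; a real job would persist this.
        private final Map<String, byte[]> lastDigest = new HashMap<>();

        public boolean shouldWrite(String rowKey, String newValue) throws Exception {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(newValue.getBytes(StandardCharsets.UTF_8));
            byte[] previous = lastDigest.get(rowKey);
            if (previous != null && MessageDigest.isEqual(previous, digest)) {
                // Unchanged: skip the write, saving write capacity and
                // cross-DC replication load to the frontend.
                return false;
            }
            lastDigest.put(rowKey, digest);
            return true;
        }
    }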
Addressing the Write Capacity 
• Experimental changes under evaluation: 
• Prioritize reads over writes on the frontend 
Pause the write stage during a read 
• Reduce replication load from the backend DC to the frontend DC 
Column-level replication strategy 
Most frontend applications operate on a subset view of the backend data
Key Takeaways 
• Scale Cassandra dynamically for backend load by creating snapshot 
clusters 
• Use a rate limiter to protect the production cluster from spiky, 
unexpected backend traffic 
• Build better isolation between the frontend DC and the backend DC 
• Write throughput from backend to frontend remains a challenge
Questions? 
Thank you
