Cassandra serving netflix @ scale
CDE
• Cloud Database Engineering
• Responsible for providing data stores as services @ Netflix
CDE Services
Agenda
• Cassandra @ Netflix
• Challenges
• Certification and benchmarking
• CDE Architecture
Cassandra @ Netflix
• 98% of streaming data is stored in Cassandra
• Data ranges from customer details to viewing history / streaming bookmarks to billing and payment
Cassandra Footprint
• Hundreds of clusters
• Tens of thousands of nodes
• PBs of data
• Millions of transactions / sec
Challenges
• Monitoring
• Maintenance
• Open source product
• Production readiness
Monitoring
• What do we monitor?
– Latencies
• Coordinator read latency: 99th and 95th percentile, based on cluster configuration
• Coordinator write latency: 99th and 95th percentile, based on cluster configuration
– Health
• Health check (Powered by Mantis)
• Gossip issues
• Thrift / binary service status
• Heap
• Dmesg - Hardware and network issues
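The 99th and 95th percentile latency checks listed above can be illustrated with a minimal sketch. The sample latencies and the per-cluster thresholds below are hypothetical, and the nearest-rank method is just one common percentile definition; the deck does not say how the percentiles are actually computed.

```python
def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct in 1..100.

    Returns the smallest sample such that at least pct% of all
    samples are less than or equal to it.
    """
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100 gives the 1-based rank.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


def breached(samples, thresholds_ms):
    """Return the percentiles whose latency exceeds the configured limit."""
    return [
        pct
        for pct, limit in sorted(thresholds_ms.items())
        if percentile(samples, pct) > limit
    ]


# Hypothetical coordinator read latencies, in milliseconds.
read_latencies_ms = [2, 3, 3, 4, 4, 5, 5, 6, 7, 8,
                     9, 10, 12, 15, 18, 22, 30, 45, 80, 250]

# Hypothetical per-cluster limits: percentile -> maximum latency in ms.
cluster_thresholds_ms = {95: 100, 99: 200}

p95 = percentile(read_latencies_ms, 95)
p99 = percentile(read_latencies_ms, 99)
alerts = breached(read_latencies_ms, cluster_thresholds_ms)
```

Keeping the thresholds in a per-cluster mapping mirrors the "based on cluster configuration" wording: the same check runs everywhere, and only the limits differ.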
Monitoring
• Recent maintenances
– Jenkins
– User initiated maintenances
• Wide row metrics
• Log file warnings / errors / exceptions
Common Approach
[Diagram: a CRON System driving four Job Runners]
Common Architecture
Problems inherent in polling
● Point-in-time snapshot, no state
● Establishing a connection to a cluster when it’s
under heavy load is problematic
● Not resilient to network hiccups, especially for
large clusters
A different approach
What if we had a continuous stream of fine-grained snapshots?
Mantis Streaming System
Stream processing system built on Apache Mesos
– Provides a flexible programming model
– Models computation as a distributed DAG
– Designed for high throughput, low latency
Health Check using Mantis
[Diagram labels, in apparent flow order: Source Jobs in eu-west-1, us-east-1, and us-west-2; a Local Ring Agg per region; a Global Ring Agg; Scores (S); a Health Evaluator that consumes Scores, with an FSM; Health Status. Boxes labeled MM also appear.]
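To illustrate why a Health Evaluator with an FSM helps where point-in-time polling does not, here is a minimal sketch of a state machine that consumes a stream of scores. The deck does not describe the real states, score range, or thresholds; the three states, the 0.0 to 1.0 score range, the `bad_below` cutoff, and the `confirm_after` count are all assumptions made for illustration.

```python
from enum import Enum


class Health(Enum):
    HEALTHY = "healthy"
    SUSPECT = "suspect"
    UNHEALTHY = "unhealthy"


class HealthEvaluator:
    """Hypothetical FSM: a cluster is only declared UNHEALTHY after
    several consecutive bad scores, so one bad snapshot is not enough."""

    def __init__(self, bad_below=0.5, confirm_after=3):
        self.bad_below = bad_below          # scores under this are "bad"
        self.confirm_after = confirm_after  # bad scores needed to confirm
        self.bad_streak = 0
        self.state = Health.HEALTHY

    def consume(self, score):
        """Feed one score from the stream and return the new state."""
        if score < self.bad_below:
            self.bad_streak += 1
            if self.bad_streak >= self.confirm_after:
                self.state = Health.UNHEALTHY
            else:
                self.state = Health.SUSPECT
        else:
            # Any good score resets the streak and the state.
            self.bad_streak = 0
            self.state = Health.HEALTHY
        return self.state


# Hypothetical score stream: one isolated dip, then a sustained problem.
scores = [0.9, 0.2, 0.95, 0.3, 0.2, 0.1, 0.8]
evaluator = HealthEvaluator()
states = [evaluator.consume(s) for s in scores]
```

In this sketch the isolated dip at the second score only reaches SUSPECT, while three bad scores in a row reach UNHEALTHY. A stateless poller would have to treat both cases the same way.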
That’s great, but...
Now the health of the fleet is encapsulated in a single data stream, so how do we make sense of that?
Real Time Dash (Macro View)
Macro View of the fleet
Real Time Dash (Cluster View)
Real Time Dash (Perspective)
Benefits
● Faster detection of issues
● Greater accuracy
● Massive reduction in false positives
● Separation of concerns (decouples detection
from remediation)
Known problems
• Distributed persistent stores (Not stateless)
• Unresponsive nodes
• Cloud
• Configurations setup and tuning
• Hot nodes / token distribution
• Resiliency
Building C* in cloud with Priam
• Bootstrapping and automated token assignment
• Backup and recovery/restore
• Centralized configuration management
• REST API for most nodetool commands
• C* JMX metrics collection
• Monitor C* health
Putting it all together
Constructing a cluster in AWS
AMI contains os, base netflix packages and Cassandra and Priam
(1) Alternate availability zones (a, b, c) around the ring to ensure data is written to multiple data centers.
(2) Survive the loss of a data center by ensuring that we only lose one node from each replication set.
[Ring diagram: nodes labeled A, B, C repeating around the ring]
Priam runs on each node and will:
* Assign tokens to each node, alternating (1) the AZs around the ring (2).
* Perform nightly snapshot backup to S3
* Perform incremental SSTable backups to S3
* Bootstrap replacement nodes to use vacated tokens
* Collect JMX metrics for our monitoring systems
* REST API calls to most nodetool functions
[Diagram labels: Cassandra, Priam, Tomcat, S3]
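The idea of assigning tokens while alternating AZs around the ring can be sketched as follows. This is not Priam's actual code: the function names are hypothetical, the tokens are simply spaced evenly over the RandomPartitioner range (0 to 2**127), and replica placement is simplified to "the next RF nodes clockwise".

```python
RING_SIZE = 2 ** 127  # token range of Cassandra's RandomPartitioner


def assign_tokens(node_count, zones):
    """Return (token, zone) pairs: evenly spaced tokens, zones alternating."""
    if node_count % len(zones) != 0:
        # Otherwise the alternation breaks where the ring wraps around.
        raise ValueError("node count must be a multiple of the zone count")
    return [
        (slot * RING_SIZE // node_count, zones[slot % len(zones)])
        for slot in range(node_count)
    ]


def replica_zones(ring, start, rf):
    """Zones of the rf consecutive nodes clockwise from position start.

    Simplification: real placement depends on the replication strategy.
    """
    return [ring[(start + i) % len(ring)][1] for i in range(rf)]


# Twelve nodes across three zones, as in the A/B/C ring on the slide.
ring = assign_tokens(12, ["a", "b", "c"])

# Every replica set of size 3 contains exactly one node per zone.
one_node_per_zone = all(
    sorted(replica_zones(ring, start, 3)) == ["a", "b", "c"]
    for start in range(len(ring))
)
```

Because every group of three consecutive nodes spans all three zones, losing one zone removes exactly one node from each replica set, which is point (2) above.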
AWS Terminology
Constructing a cluster in AWS

Address         DC       Rack  Status  State   Load       Owns    Token
                                                                  …
###.##.##.###   eu-west  1a    Up      Normal  108.97 GB  16.67%  …
###.##.#.##     us-east  1e    Up      Normal  103.72 GB  0.00%   …
##.###.###.###  eu-west  1b    Up      Normal  104.82 GB  16.67%  …
##.##.##.###    us-east  1c    Up      Normal  111.87 GB  0.00%   …
###.##.##.###   us-east  1e    Up      Normal  102.71 GB  0.00%   …
##.###.###.###  eu-west  1b    Up      Normal  101.87 GB  16.67%  …
##.##.###.##    us-east  1c    Up      Normal  102.83 GB  0.00%   …
###.##.###.##   eu-west  1c    Up      Normal  96.66 GB   16.67%  …
##.##.##.###    us-east  1d    Up      Normal  99.68 GB   0.00%   …
##.###.##.###   eu-west  1c    Up      Normal  95.51 GB   16.67%  …
##.##.##.##     us-east  1d    Up      Normal  105.85 GB  0.00%   …
##.###.##.###   eu-west  1a    Up      Normal  91.25 GB   16.67%  …

Callouts on the table:
• Instance
• Region
• Availability Zone (AZ)
• Autoscaling Groups: ASGs do not map directly to nodetool ring output, but are used to define the cluster (# of instances, AZs, etc).
• Amazon Machine Image: Image loaded onto an AWS instance; all packages needed to run an application.
• Security Group: Defines access control between ASGs
Resiliency
• Instance
• AZ
• Multiple AZ
• Region
Resiliency - Instance
• RF=AZ=3
• Cassandra bootstrapping works really well
• Replace nodes immediately
• Repair on regular interval
Resiliency - One AZ
• RF=AZ=3
• Alternating AZs ensures that each AZ has a full replica of
data
• Provision cluster to run at 2/3 capacity
• Ride out a zone outage; do not move to another zone
• Bootstrap one node at a time
• Repair after recovery
Resiliency - Multiple AZ
• Outage; can no longer satisfy quorum
• Restore from backup and repair
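The arithmetic behind the one-AZ and multiple-AZ cases can be written out as a short sketch. It assumes RF = 3 with exactly one replica per AZ (the RF=AZ=3 setup above) and QUORUM as the consistency level. The reading of "2/3 capacity" as headroom for a zone loss is an interpretation, not something the deck states, and the function names are hypothetical.

```python
def quorum(rf):
    """Replicas needed for a QUORUM operation: a strict majority of rf."""
    return rf // 2 + 1


def quorum_survives(rf, zones_lost):
    """With one replica per AZ, each lost AZ removes one replica."""
    return rf - zones_lost >= quorum(rf)


def utilization_after_loss(utilization, zones, zones_lost):
    """Utilization of the surviving zones if load is spread evenly."""
    return utilization * zones / (zones - zones_lost)


rf = 3
one_az_ok = quorum_survives(rf, zones_lost=1)   # 2 replicas left, need 2
two_az_ok = quorum_survives(rf, zones_lost=2)   # 1 replica left, need 2

# Running at 2/3 of capacity leaves room for one zone's share of traffic.
after_one_az = utilization_after_loss(2 / 3, zones=3, zones_lost=1)
```

Under these assumptions a single zone outage still leaves two of three replicas, so quorum holds and the surviving zones run at roughly full capacity. Losing two zones leaves one replica, quorum fails, and the path forward is restore and repair.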
Resiliency - Region
• Connectivity loss between regions – operate as island
clusters until service restored
• Repair data between regions
NDBench - Netflix Data Benchmark
Stitching it together
C* as a Service - Architecture
[Architecture diagram. Component labels: Jenkins, Winston, Eunomia, Alert, Atlas, Mantis, C* nodes with Priam, Bolt, Cluster Metadata, Cluster Metadata/Advisor, Maintenance, Remediation, Page CDE, Alert if needed, Capacity Prediction, Outlier detection, C* Forklifter, NDBench, C* Explorer, Client Drivers, Log analysis]