Apache Cassandra at Talkbits
Max Alexejev
Moscow Cassandra Users Group
25 April 2013
What is Talkbits?
Talkbits backend
Recursive call
Talkbits backend deployment diagram
Cassandra in EC2 at Talkbits

NetworkTopologyStrategy + Ec2MultiRegionSnitch

1 DC, 3 racks (availability zones in one EC2 region), N nodes per rack.
3N nodes total.

Data stored in 3 local copies, 1 per zone.

Write with LOCAL_QUORUM, read with ONE or TWO.

m1.large nodes (2 cores, 4 ECU, 7.5 GB RAM).

Transaction (commit) log and data files are both on one RAID0 array of
2 ephemeral drives. Sharing a device like this is acceptable only on SSDs
or EC2 ephemeral disks!
Other typical setup options for EC2:

m1.xlarge (15 GB) / m2.4xlarge (68.4 GB) / hi1.4xlarge (SSD) nodes

EBS-backed data volumes (not recommended; use for development only).
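As a rough illustration of the replication setup on this slide, the sketch below builds a CQL statement that creates a keyspace with NetworkTopologyStrategy and 3 replicas in a single data center. The keyspace name "talkbits" and the data center name "us-east" are assumptions made for the example, not values from the talk; with the EC2 snitches the data center name is derived from the region, so check the name your cluster actually reports.

```python
def create_keyspace_cql(keyspace: str, replicas_per_dc: dict[str, int]) -> str:
    """Build a CREATE KEYSPACE statement that uses NetworkTopologyStrategy."""
    if not replicas_per_dc:
        raise ValueError("at least one data center is required")
    # One "'dc_name': replica_count" entry per data center, in a stable order.
    options = ", ".join(
        f"'{dc}': {rf}" for dc, rf in sorted(replicas_per_dc.items())
    )
    return (
        f"CREATE KEYSPACE {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {options}}};"
    )


# Hypothetical names: keyspace "talkbits", data center "us-east".
# 3 replicas in one data center spread over 3 racks gives 1 copy per zone.
statement = create_keyspace_cql("talkbits", {"us-east": 3})
print(statement)
```

The helper only produces the statement text; running it against a cluster (for example through cqlsh) is left out of the sketch.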
Cassandra consistency options
Definitions
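N, R, W settings from Amazon Dynamo.
N: replication factor. Set per keyspace on keyspace creation.
R, W: number of replicas that must respond to a read or acknowledge a write.
Quorum: floor(N / 2) + 1, that is, N / 2 rounded down, plus 1.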
Read/write consistency levels:
ANY (writes only), ONE, TWO, THREE, QUORUM, LOCAL_QUORUM &
EACH_QUORUM (multi-DC), ALL.
Set per query.
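The quorum formula is easy to misread, so here is a minimal sketch of it in Python. The function name is chosen for the example; integer division does the rounding down before 1 is added.

```python
def quorum(replication_factor: int) -> int:
    """Replicas needed for a quorum: N / 2 rounded down, plus 1."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


# Quorum size and the number of replicas that may be down
# while a quorum can still be reached.
for n in (1, 2, 3, 5, 6):
    print(f"N={n}: quorum={quorum(n)}, tolerated failures={n - quorum(n)}")
```

For N = 3, as in the Talkbits cluster, a quorum is 2 replicas, so one replica can be unavailable.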
Cassandra consistency semantics
W + R > N
Ensures strong consistency: the replicas read always overlap the replicas
written, so a read reflects the most recent acknowledged write.
R = W = [LOCAL_]QUORUM
Strong consistency. See quorum definition and formula above.
W + R <= N
Eventual consistency.
W = 1
Good for fire-and-forget writes: logs, traces, metrics, page views, etc.
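The rules above can be checked mechanically. The sketch below maps a consistency level to a replica count and tests W + R > N. It assumes a single data center, so LOCAL_QUORUM and QUORUM resolve to the same number; the function names are made up for the example.

```python
def replicas_required(level: str, n: int) -> int:
    """Replicas that must respond for a consistency level (single-DC assumption)."""
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    if level in fixed:
        return fixed[level]
    if level in ("QUORUM", "LOCAL_QUORUM"):
        return n // 2 + 1
    if level == "ALL":
        return n
    raise ValueError(f"unsupported consistency level: {level}")


def is_strongly_consistent(write_level: str, read_level: str, n: int) -> bool:
    """True when W + R > N, so read and write replica sets must overlap."""
    w = replicas_required(write_level, n)
    r = replicas_required(read_level, n)
    return w + r > n


# N = 3 with LOCAL_QUORUM writes, as in the EC2 setup described earlier.
for read_level in ("ONE", "TWO"):
    strong = is_strongly_consistent("LOCAL_QUORUM", read_level, 3)
    print(f"write LOCAL_QUORUM, read {read_level}: "
          f"{'strong' if strong else 'eventual'} consistency")
```

With N = 3, a LOCAL_QUORUM write touches 2 replicas, so reading with ONE gives 2 + 1 = 3, which is not greater than N (eventual consistency), while reading with TWO gives 2 + 2 = 4 (strong consistency).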
Cassandra backups to S3
Full backups
• Periodic snapshots (daily, weekly)
• Remove snapshots from local disk after upload to S3 so the disk does not
fill up
Incremental backups
• SSTables are compressed and copied to S3
• Triggered by inotify IN_MOVED_TO and IN_CLOSE_WRITE events
• Don’t turn on with leveled compaction (huge network traffic
to S3)
Continuous backups
• Compress and copy the transaction (commit) log to S3 at short
intervals (for example, every 5, 30, or 60 minutes)
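The compress-then-upload-then-remove cycle described above might look roughly like the sketch below. It uses only the Python standard library; the `upload` argument is a hypothetical caller-supplied function (for instance a wrapper around an S3 client), and the file names are invented for the demo. Watching for inotify events is not shown.

```python
import gzip
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def backup_sstable(sstable: Path, staging_dir: Path,
                   upload: Callable[[Path], None]) -> None:
    """Gzip one file into staging_dir, hand it to upload, then delete the copy."""
    staging_dir.mkdir(parents=True, exist_ok=True)
    staged = staging_dir / (sstable.name + ".gz")
    with sstable.open("rb") as src, gzip.open(staged, "wb") as dst:
        shutil.copyfileobj(src, dst)
    try:
        upload(staged)   # hypothetical hook: the real upload to S3 goes here
    finally:
        staged.unlink()  # never leave compressed copies on the local disk


# Demo with a stand-in uploader that records what it was given.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    sstable = root / "example-Data.db"  # invented file name
    sstable.write_bytes(b"row data " * 1000)

    uploaded = []

    def fake_upload(path: Path) -> None:
        uploaded.append((path.name, gzip.decompress(path.read_bytes())))

    backup_sstable(sstable, root / "staging", fake_upload)
    staged_left = list((root / "staging").iterdir())

print(uploaded[0][0], len(uploaded[0][1]), staged_left)
```

Deleting the staged copy in a `finally` block keeps the disk clean even when the upload fails; whether a failed upload should be retried is a policy decision the sketch leaves open.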
Cassandra backups to S3 - tools
TableSnap from SimpleGeo
https://github.com/Instagram/tablesnap (most up-to-date fork)
The whole tool is 3 simple Python scripts (tablesnap, tableslurp,
tablechop). It uploads SSTables in real time, restores them, and removes
old backups from S3.
Priam from Netflix
https://github.com/Netflix/Priam
Full-blown web application. Requires a servlet container to run and
depends on the Amazon SimpleDB service for distributed token
management.
Contacts
Max Alexejev
http://ru.linkedin.com/pub/max-alexejev/51/820/ab9
http://www.slideshare.net/MaxAlexejev/
malexejev@gmail.com
