Apache Cassandra
via Docker
Chris Ballance
Diligent Corporation
@ballance
What we’ll cover
Docker Fundamentals
CAP Theorem
Cassandra Fundamentals
Demo of Cassandra deployed with Docker
Docker
Hosting & Coordination
Docker Hub
“Git for container images”
https://hub.docker.com/
Available Docker Images
• Ubuntu, CentOS, Debian, Fedora
• Cassandra, MySQL, MongoDB
• Node, Java, Erlang, Ruby, Rails
• WordPress
• Redis
• Hipache, NGINX
• Create your own!
CAP THEOREM
Project Delivery Triangle
(Triangle diagram: Good, Fast, Cheap)
CAP Theorem
(Triangle diagram: Consistent, Available, Partition Tolerant; choose two)
Partition Tolerance is a myth!
We can guarantee Consistency, but not Availability.
We may time out or fail to return anything
We can guarantee Availability, but our data
may not be Consistent with other nodes
Our business needs will drive whether
we choose Consistency or Availability
Implementations
AP - Cassandra, CouchDB
CP - MongoDB, Bigtable (GFS)
CA - RDBMS (SQL Server)
A contrived example for Consistency
(Diagram: Nodes A and B, both starting with state S0)
User writes S1 to Node B
User queries Node A
Nodes B & A have not sync’d
Query is blocked until B syncs with A
Once B syncs with A, the query on A is unblocked and returns S1 as expected
* The query could potentially time out
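The consistency-first behavior above can be sketched in Python with a condition variable: a read on Node A blocks until A has caught up to the write the user made on Node B, or raises a timeout instead of returning stale data. This is a toy, single-process model, not how any particular database implements it; the `Node` class, the integer `version`, and the assumption that the client knows the version of its own last write are all hypothetical.

```python
import threading


class Node:
    """Toy replica for the consistency example (names are hypothetical)."""

    def __init__(self, name, value="S0"):
        self.name = name
        self.value = value
        self.version = 0  # assumed monotonically increasing write version
        self._changed = threading.Condition()

    def apply(self, value, version):
        # A local write or a replicated write installs a newer version
        # and wakes up any reader blocked on this node.
        with self._changed:
            if version > self.version:
                self.value, self.version = value, version
                self._changed.notify_all()

    def consistent_read(self, min_version, timeout):
        # Consistency-first read: block until this node has caught up to
        # min_version; time out instead of returning stale data.
        with self._changed:
            caught_up = self._changed.wait_for(
                lambda: self.version >= min_version, timeout
            )
            if not caught_up:
                raise TimeoutError(f"node {self.name} still behind after {timeout}s")
            return self.value


if __name__ == "__main__":
    a, b = Node("A"), Node("B")
    b.apply("S1", 1)  # user writes S1 to Node B
    # B syncs with A a little later, on a background timer thread.
    threading.Timer(0.1, a.apply, args=("S1", 1)).start()
    print(a.consistent_read(min_version=1, timeout=2.0))  # blocks ~0.1 s, prints S1
```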
A contrived example for Availability
(Diagram: Nodes A and B, both starting with state S0)
User writes S1 to Node B
User queries Node A for S1
Query returns the current state of A (S0), but it is not consistent with B
A later query of A yields S1 previously written to B.
Eventual consistency has been achieved.
• Idempotence
• Event Sourcing
• Local Caching
• Queueing
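For contrast, an availability-first read never waits. In this hypothetical sketch (the `Replica` class, the `write` helper, and the version numbers are illustrative assumptions), Node A answers from local state at once, replication is queued, and a later read sees S1. The version check makes replaying a write harmless, which is one simple form of idempotence.

```python
class Replica:
    """Toy replica for the availability example (names are hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.value, self.version = "S0", 0
        self.inbox = []  # replicated writes received but not yet applied

    def read(self):
        # Availability-first read: answer immediately from local state,
        # even if that state is stale.
        return self.value

    def apply(self, value, version):
        # Idempotent: applying the same (or an older) write again is a no-op.
        if version > self.version:
            self.value, self.version = value, version

    def sync(self):
        # Drain queued replication messages, in arrival order.
        while self.inbox:
            self.apply(*self.inbox.pop(0))


def write(origin, peers, value, version):
    # The write lands on the origin immediately; peers get it asynchronously.
    origin.apply(value, version)
    for peer in peers:
        peer.inbox.append((value, version))


if __name__ == "__main__":
    a, b = Replica("A"), Replica("B")
    write(b, [a], "S1", 1)  # user writes S1 to Node B
    print(a.read())         # S0: answered at once, but stale
    a.sync()                # B's write eventually reaches A
    print(a.read())         # S1: eventually consistent
```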
Cassandra
History
Developed for Facebook Inbox Search
Created by one of the authors of Amazon Dynamo
Released as Open Source in 2008
Became an Apache Incubator project in 2009
Graduated to an Apache Top Level Project in 2010
Who is using Cassandra in production today?
Twitter
Netflix
Credit Suisse
Cisco
Many more…
http://PlanetCassandra.org/companies/
Benefits
Linear Scalability
Data Sets can be larger than available memory
Multi-master
Built-in support for handling multiple data centers
Decentralized & Distributed - No single point of failure
Integrated caching
Tunable consistency, chosen per request (consistency level) and per keyspace (replication factor)
Supports MapReduce
Familiar query syntax - CQL (Cassandra Query Language)
Designed for sparse loading of loosely typed data
Challenges
JOINs are not supported! (not unique to Cassandra)
Not ideal for financial data (eventual consistency by default, no multi-row ACID transactions)
Tooling is not yet as mature as Oracle's or SQL Server's
Why NoSQL?
Multiple persistence strategies
Use the right tool for the job. Sometimes more than one tool is appropriate. NoSQL can work in conjunction with SQL solutions. For example, you might continue to store transactions in a relational database and build your new user social graph data in a NoSQL solution. Can’t decide on a local database for a mobile app? Sometimes the best route is to persist the whole thing as JSON on disk until you need something faster.
Total Cost of Ownership (TCO)
Let’s face it, SQL Server is expensive. If you need it and can fully justify the cost, then it might be right for you. But as we’ve discussed, an RDBMS can be a crutch and a default choice for persistence layers. Defaults are just that: defaults. They’re a catch-all that is rarely the best choice unless you’re solving generic, default problems.
Write (consistency two)
Client → Coordinator → Replicants
One Replicant: Confirm
Another Replicant: Time-out
A third Replicant: Confirm
Two confirmations received: Success
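The write walkthrough can be simulated in a few lines. This sketch assumes made-up replica names and latencies, with `None` standing for the replica that times out. In Cassandra the coordinator sends the write to all replicas; the consistency level only sets how many confirmations it waits for before answering the client.

```python
def coordinate_write(replica_latencies, required, timeout):
    """Simulated coordinator for a write at a fixed consistency level.

    replica_latencies maps a (hypothetical) replica name to the seconds it
    takes to confirm, or None if it never answers. The write goes to every
    replica; the coordinator only waits for `required` confirmations.
    Returns ("Success", seconds until the client is answered).
    """
    confirmations = sorted(
        latency
        for latency in replica_latencies.values()
        if latency is not None and latency <= timeout
    )
    if len(confirmations) < required:
        raise TimeoutError(
            f"{len(confirmations)} of {required} required replicas confirmed"
        )
    # The client hears back as soon as the required-th confirmation arrives.
    return "Success", confirmations[required - 1]


if __name__ == "__main__":
    # Mirrors the slides: one Confirm, one Time-out, then a second Confirm.
    replicas = {"replica-1": 0.01, "replica-2": None, "replica-3": 0.03}
    print(coordinate_write(replicas, required=2, timeout=2.0))  # ('Success', 0.03)
    try:
        coordinate_write(replicas, required=3, timeout=2.0)  # like ALL
    except TimeoutError as err:
        print("write failed:", err)
```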
Read (consistency two)
(Diagram: Client → Coordinator → two Chosen Nodes, with an Inconsistent Node among the replicas; the read ends in Success)
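A read at consistency TWO can be sketched the same way. Cassandra reconciles differing replica answers by write timestamp (the newest write wins), and read repair can then update replicas that were behind. This simplified model assumes hypothetical node names and integer timestamps, and treats the first responders as the chosen nodes.

```python
def coordinate_read(responses, required):
    """Simulated coordinator for a read at a fixed consistency level.

    responses maps a (hypothetical) replica name to the (value, write_timestamp)
    it returned in time, in arrival order. The first `required` responders are
    the chosen nodes; the newest timestamp wins. Also returns the chosen nodes
    that were behind, which a real cluster could bring up to date.
    """
    if len(responses) < required:
        raise TimeoutError(f"{len(responses)} of {required} replicas responded")
    chosen = list(responses.items())[:required]
    value, newest = max((vt for _, vt in chosen), key=lambda vt: vt[1])
    stale = [name for name, (_, ts) in chosen if ts < newest]
    return value, stale


if __name__ == "__main__":
    responses = {"chosen-node": ("S1", 20), "inconsistent-node": ("S0", 10)}
    print(coordinate_read(responses, required=2))  # ('S1', ['inconsistent-node'])
```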
On-premise + Cloud
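Consistency Modes
ALL - Every replica must have the data
QUORUM - A quorum (majority) of replicas must have the data
ONE - At least one replica must have the data
TWO - At least two replicas must have the data
THREE - At least three replicas must have the data
ANY - (Writes only) Any node has the data, even if only as a hint stored for a down replica
EACH_QUORUM - A quorum of replicas in each datacenter must have the data
LOCAL_QUORUM - A quorum of replicas in the datacenter handling the request must have the data
*Quorum = (replication_factor / 2) + 1, with the division rounded down; for QUORUM, replication_factor is the sum across all datacenters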
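To make the quorum formula concrete, this sketch counts how many replicas must respond at each level for a hypothetical two-datacenter keyspace. The function names and datacenter names are illustrative, not a Cassandra API; QUORUM is taken over the sum of all datacenters' replication factors.

```python
def quorum(replication_factor):
    # Integer division rounds down: RF 3 -> 2, RF 4 -> 3, RF 5 -> 3.
    return replication_factor // 2 + 1


def replicas_required(level, rf_by_dc, local_dc):
    """Replicas that must respond at a consistency level.

    rf_by_dc maps a (hypothetical) datacenter name to its replication factor.
    ANY is left out because it also counts hints, not just replicas.
    """
    total_rf = sum(rf_by_dc.values())
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3, "ALL": total_rf}
    if level in fixed:
        return fixed[level]
    if level == "QUORUM":
        return quorum(total_rf)  # quorum over the sum of all datacenters' RFs
    if level == "LOCAL_QUORUM":
        return quorum(rf_by_dc[local_dc])
    if level == "EACH_QUORUM":
        # A quorum inside every datacenter; summed here as a total count.
        return sum(quorum(rf) for rf in rf_by_dc.values())
    raise ValueError(f"unsupported consistency level: {level}")


if __name__ == "__main__":
    rf = {"dc1": 3, "dc2": 3}
    for level in ("ONE", "LOCAL_QUORUM", "QUORUM", "EACH_QUORUM", "ALL"):
        print(level, replicas_required(level, rf, local_dc="dc1"))
```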
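Calculating Consistency
R + W > N guarantees that every read sees the most recent write
R → number of replicas that must respond to a read (read consistency level)
W → number of replicas that must acknowledge a write (write consistency level)
N → number of replicas of the data (replication factor)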
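Why R + W > N works: if the replicas read and the replicas written together number more than N, the two sets must share at least one replica, so every read touches a replica holding the latest write. The brute-force check below (a hypothetical helper, not part of any Cassandra API) confirms the rule for small N by trying every read set against every write set.

```python
from itertools import combinations


def always_overlap(n, r, w):
    """True if every possible set of r replicas read shares at least one
    replica with every possible set of w replicas written."""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )


if __name__ == "__main__":
    n = 3
    for r in range(1, n + 1):
        for w in range(1, n + 1):
            # The brute-force check agrees with the R + W > N rule.
            assert always_overlap(n, r, w) == (r + w > n)
    print(always_overlap(3, 2, 2))  # QUORUM reads + QUORUM writes, RF 3: True
    print(always_overlap(3, 1, 1))  # ONE + ONE, RF 3: False
```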
Data Replication
SimpleStrategy - Single data center
NetworkTopologyStrategy - Recommended strategy for multiple data centers. Sets a replication factor per datacenter and uses each node's rack and datacenter location (provided by the snitch) to spread replicas across racks
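SimpleStrategy places the first replica on the node that owns the key's token (the first node at or after that token on the ring) and the remaining replicas on the next nodes clockwise, without regard to racks or datacenters. A minimal sketch, assuming a tiny ring with one small integer token per node and hypothetical node names; real clusters use the Murmur3Partitioner's 64-bit tokens and usually virtual nodes, which this ignores.

```python
from bisect import bisect_left


def simple_strategy_replicas(ring, token, rf):
    """Replica placement in the style of SimpleStrategy.

    ring is a list of (token, node) pairs, one token per node (no vnodes);
    the node names and small integer tokens are hypothetical.
    """
    ring = sorted(ring)
    tokens = [t for t, _ in ring]
    # First replica: the first node whose token is >= the key's token,
    # wrapping around to the start of the ring past the largest token.
    start = bisect_left(tokens, token) % len(ring)
    # Remaining replicas: the next nodes clockwise, ignoring racks and datacenters.
    return [ring[(start + i) % len(ring)][1] for i in range(min(rf, len(ring)))]


if __name__ == "__main__":
    ring = [(0, "n1"), (25, "n2"), (50, "n3"), (75, "n4")]
    print(simple_strategy_replicas(ring, token=30, rf=3))  # ['n3', 'n4', 'n1']
    print(simple_strategy_replicas(ring, token=80, rf=3))  # ['n1', 'n2', 'n3']
```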
Deploying Cassandra with Docker
Demo
Additional Resources
Apache Cassandra
http://cassandra.apache.org/
Docker
https://www.docker.com/what-docker
Docker Hub
https://hub.docker.com/explore/
Cassandra for Developers - Paul O’Fallon
https://www.pluralsight.com/courses/cassandra-developers
DBA’s Guide to NoSQL: Apache Cassandra (Free eBook)
http://is.gd/CassandraFreeEbook
Cassandra 3.0 - DataStax PDF
http://docs.datastax.com/en/cassandra/3.0/pdf/cassandra30.pdf
Questions?

Editor's Notes

  • #7: Before I get into CAP theorem, I’d like to start off with another very similar triangle you’re already familiar with…
  • #10: CAP Theorem is similar. We have three competing interests.
  • #11: Choose two
  • #12: Consistency: every read returns the most recent write. One version of the truth. But if nodes are partitioned, we cannot guarantee consistency without waiting. We can wait until we’re consistent, but in the meantime we may time out and cannot continue.
  • #13: Every (unfailed) node always returns query results. Eventually consistent. No errors or timeouts, but data may be stale. If partitioned, a node can return its current state, but it may be an old version of the truth.
  • #14: Are all nodes always consistent? What is the lag time between node updates? One version of the truth. Transactions are atomic.
  • #15: Be dramatic… PARTITION TOLERANCE IS A MYTH!!1! This is just the nature of networks: nothing is guaranteed, so partitions will happen. So really, we have just two competing interests, Consistency and Availability. Choosing which to favor cannot be an entirely technical decision; discuss it with your stakeholders, because it depends on the type of app you’re building.
  • #16: One choice would be to favor Consistency. While this isn’t as popular for most workflows, it has its place.
  • #17: Another choice would be to favor Availability. You’ll find this is very common in social media, and in media in general.
  • #20: Here’s an example where we’re favoring consistency. We have a stick guy with data to write, and two nodes to write to and read from.
  • #21: S1 is going to be our payload, and we send it to node B.
  • #22: After we’ve sent our write (S1) to node B, we query Node A for the value of S. We might not even know that we’re being routed to node A for the read when we just wrote to node B. This can confuse even savvy users when the redirection is fully transparent behind a load balancer (and it typically is).
  • #23: Stick man is not a patient fellow and he’s tapping his foot, waiting for his query to return. He’s probably just going to assume the system is broken after a few seconds…
  • #24: Finally, Node A is able to synchronize with Node B to get S1 and return it to Stick guy. There’s a real risk of starving Stick guy, and he’s already looking mighty thin.
  • #25: Now let’s look at a different approach. This time, we will favor Availability over Consistency.
  • #26: Our buddy, Stick guy, again sends S1 to Node B.
  • #27: Now he queries Node A.
  • #28: He immediately gets a reply, but it is stale data, since Nodes A & B have not yet synchronized. We don’t have to wait for the query to return, but we do get stale data that is older than what we have already sent out.
  • #29: On the later query, stick guy gets back the value S1 that he is expecting, consistent with what he previously sent to Node B.