UDA SE Tech Talk
Sep 20 2012




                   CASSANDRA
Agenda
•What is Cassandra?
•Architecture
•Why use Cassandra?
•Current Consumers/use cases?
•Limitations
•Demo

What is Cassandra?

-Free, Open Source, Distributed database
-Originally written by 2 Facebook engineers
-Hybrid of
  -BigTable from Google (data model)
  -Dynamo from Amazon (distribution model)
-For Structured, Semi-structured, Unstructured Data
-Designed to scale across commodity servers
-Favors availability and partition tolerance (AP) out of CAP
  -(Consistency, Availability, Partition Tolerance)



Architectural Overview
           •Independent nodes form cluster

           •All nodes are peers

           •Gossip protocol to discover/connect nodes

           •Gossip process runs every second

           •Each node exchanges state messages with at most 3 other nodes

           •Nodes exchange info about themselves/Others

           •Seed nodes are listed in the cassandra.yaml file and give joining nodes their cluster info

           •All nodes have same seed nodes in their config file

           •Nodes remember all gossip info since last restart
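
The gossip exchange described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `Node`, `gossip_round`, and `run_until_converged` are hypothetical names, the state is a plain dictionary of version numbers, and real Cassandra gossip is considerably more involved (seeds, failure detection, digests).

```python
import random


class Node:
    """Hypothetical in-memory node; not Cassandra's actual gossip code."""

    def __init__(self, name):
        self.name = name
        # Maps node name -> (version, status). A node starts out knowing only itself.
        self.state = {name: (1, "UP")}

    def merge(self, remote_state):
        # Keep whichever entry carries the higher version number.
        for name, entry in remote_state.items():
            if name not in self.state or entry[0] > self.state[name][0]:
                self.state[name] = entry


def gossip_round(nodes, rng, fanout=3):
    """One tick: every node exchanges state with at most `fanout` random peers."""
    for node in nodes:
        peers = [n for n in nodes if n is not node]
        for peer in rng.sample(peers, min(fanout, len(peers))):
            peer.merge(node.state)  # tell the peer what we know (self and others)
            node.merge(peer.state)  # learn what the peer knows


def run_until_converged(nodes, rng, max_rounds=50):
    """Return the number of rounds until every node knows every other node."""
    for round_no in range(1, max_rounds + 1):
        gossip_round(nodes, rng)
        if all(len(n.state) == len(nodes) for n in nodes):
            return round_no
    return None
```

Calling `run_until_converged([Node(f"n{i}") for i in range(10)], random.Random(0))` shows how quickly knowledge of the whole cluster spreads even though each node only talks to a few peers per round.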




Data Partitioning
     •Partitioner should be chosen when setting up the cluster
     •All data managed by Cassandra is laid out as a ring
     •Ring is divided into ranges
     •Each node is responsible for one or more ranges
     •Before a node joins, it is given a token
     •Token determines
           •Node’s position on the ring
           •Range of data it is responsible for
     •Column Family is partitioned based on row key
     •For a given row key, the ring is walked clockwise
     until a node whose token range covers the key is found
     •2 High Level Partitioning Schemes:
                - Random Partitioner
                - Ordered Partitioner
     •Random Partitioner uses consistent hashing
     •Ordered Partitioner ensures sorted order.
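
The clockwise walk around the ring can be sketched as below. It is a minimal illustration, assuming an MD5-based token in the spirit of the Random Partitioner; the `Ring` class, its method names, and the small integer tokens in the usage note are hypothetical and chosen only for readability.

```python
import bisect
import hashlib


def token_for(row_key: str) -> int:
    """Hash a row key onto the ring (assumption: the full 128-bit MD5 value)."""
    return int(hashlib.md5(row_key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Hypothetical token ring: each node owns the range ending at its token."""

    def __init__(self, node_tokens):
        # node_tokens: dict of node name -> token given to the node before it joins.
        self._entries = sorted((tok, name) for name, tok in node_tokens.items())
        self._tokens = [tok for tok, _ in self._entries]

    def owner_of_token(self, token: int) -> str:
        # Walk clockwise: the first node whose token is >= the key's token,
        # wrapping around to the lowest token past the end of the ring.
        idx = bisect.bisect_left(self._tokens, token)
        return self._entries[idx % len(self._entries)][1]

    def owner(self, row_key: str) -> str:
        return self.owner_of_token(token_for(row_key))
```

With `Ring({"a": 0, "b": 100, "c": 200})`, a key hashing to token 50 lands on `b`, and a key hashing past 200 wraps around to `a`. An ordered partitioner would use the key itself as the token instead of a hash, which keeps rows in sorted order.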




Data Partitioning (Contd)

   [Figure: Partitioning in Multi-Data Center Clusters]
Replication
         •Replication – process of storing copies of data

         •Replication Strategy determines

              •Number of replicas

              •Distribution of replicas over the nodes

         •Relies on the cluster's configured snitch

         •SimpleStrategy - default

         •NetworkTopologyStrategy

              •Takes rack and data center into consideration
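
The difference between the two strategies can be sketched as follows. This is a simplified illustration, not Cassandra's placement code: `simple_strategy` and `topology_aware` are hypothetical functions, the ring is a plain list of node names in token order, and `location` stands in for the rack and data center information a snitch would supply.

```python
def simple_strategy(ring, start, rf):
    """Place `rf` replicas on consecutive nodes clockwise, ignoring topology.

    ring:  list of node names in token order
    start: index of the node that owns the row's token
    """
    n = len(ring)
    return [ring[(start + i) % n] for i in range(min(rf, n))]


def topology_aware(ring, start, rf_per_dc, location):
    """Place replicas per data center, preferring distinct racks.

    rf_per_dc: dict of data center -> number of replicas wanted there
    location:  dict of node -> (data_center, rack)
    """
    n = len(ring)
    clockwise = [ring[(start + i) % n] for i in range(n)]
    chosen = []
    for dc, rf in rf_per_dc.items():
        in_dc = [node for node in clockwise if location[node][0] == dc]
        picked, racks = [], set()
        # First pass: take nodes on racks that hold no replica yet.
        for node in in_dc:
            if len(picked) < rf and location[node][1] not in racks:
                picked.append(node)
                racks.add(location[node][1])
        # Second pass: if racks ran out, fill up with the remaining nodes.
        for node in in_dc:
            if len(picked) < rf and node not in picked:
                picked.append(node)
        chosen.extend(picked)
    return chosen
```

In a four-node ring where `n1` and `n2` share a rack, the topology-aware variant skips `n2` in favor of a node on another rack, while the simple variant just takes the next node clockwise.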




Snitches…
    •The snitch is a configurable component of the cluster

    •Defines how the nodes are grouped together

    •Types of snitches:


            •SimpleSnitch

            •BriskSimpleSnitch

            •RackInferringSnitch

            •PropertyFileSnitch

            •Ec2Snitch

            •Dynamic Snitching
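
As an illustration of how a snitch groups nodes, the sketch below mimics the idea behind RackInferringSnitch, which assumes that the second octet of a node's IP address identifies its data center and the third octet its rack. The function names are hypothetical and the code is not taken from Cassandra.

```python
def infer_location(ip: str):
    """Return (data_center, rack) inferred from an IPv4 address.

    Assumption: 2nd octet = data center, 3rd octet = rack.
    """
    octets = ip.split(".")
    if len(octets) != 4:
        raise ValueError(f"not an IPv4 address: {ip!r}")
    return octets[1], octets[2]


def group_by_location(ips):
    """Group node addresses by their inferred (data_center, rack)."""
    groups = {}
    for ip in ips:
        groups.setdefault(infer_location(ip), []).append(ip)
    return groups
```

For example, `10.20.30.1` and `10.20.30.2` fall into the same group (data center `20`, rack `30`), while `10.20.31.1` is in the same data center but a different rack. A PropertyFileSnitch would read the same mapping from a configuration file instead of inferring it.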



Why Use Cassandra?
•Very High Volume writes/reads
•All writes HAVE to succeed
•Horizontal scalability
•Commodity HW
•Integration with Hadoop/HBase/Hive
•SQL-like usage (CQL)
•No Single point of failure
•Powerful dynamic Schema data model
  •Maximum flexibility
  •Performance at scale


Some Well-Known Current Customers
WebEx              Ooyala
Clearspring        Openwave
Cloudkick          OpenX
Cloudtalk          Plaxo
connex.io          Rackspace
Constant Contact   Reddit
Digg               SimpleGeo
Facebook           SoundCloud
IBM                Twitter
Netflix            Walmart Labs
Formspring         Yakaz
Mahalo.com

Limitations
Be aware of these differences when you move
  from a relational database to Cassandra.
• No transactions
• No JOINs
• No foreign keys; keys are immutable
• Keys have to be unique
• Failed operations may leave changes
• Searching is complicated
• Super columns and order preserving partitioners are
  discouraged
• Healing from failure is manual
• It remembers deletes as tombstones (until v0.8, at least)
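
Why a distributed store has to remember deletes can be sketched as below. This is a toy model under stated assumptions: `Replica`, `TOMBSTONE`, and `read` are hypothetical names, timestamps are plain integers, and conflicts are resolved by taking the newest timestamp.

```python
TOMBSTONE = object()  # marker stored in place of a deleted value


class Replica:
    """Hypothetical replica holding key -> (timestamp, value or TOMBSTONE)."""

    def __init__(self):
        self.cells = {}

    def write(self, key, value, ts):
        current = self.cells.get(key)
        if current is None or ts > current[0]:
            self.cells[key] = (ts, value)

    def delete(self, key, ts):
        # A delete is just a write of the tombstone marker.
        self.write(key, TOMBSTONE, ts)


def read(key, replicas):
    """Return the newest value across replicas, or None if deleted or absent."""
    newest = max(
        (r.cells[key] for r in replicas if key in r.cells),
        key=lambda cell: cell[0],
        default=None,
    )
    if newest is None or newest[1] is TOMBSTONE:
        return None
    return newest[1]
```

If one replica misses the delete, the tombstone on the other replica still wins on timestamp, so the stale value is not resurrected. Simply dropping the row on delete would leave the stale copy as the only surviving answer.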
DEMO
Questions?
