Needle Meet Haystack: Adapting your data models for Cassandra
Gary Dusbabek • Rackspace • ICOODB 2010
Outline: First Things First, Column Families, Trade-Offs, Procedures & Best Practices, Internals
It’s all about scalability
We can all be friends
Column Families
Data Modeling with Cassandra Column Families
2. Trade-Offs
No Transactions
No Ad Hoc Queries
No Joins
No Flexible Indexes
Don’t Panic!
Scalability, Availability, Replication & Backup
3. Procedures & Practices
Relational Way: Define entities, Normalize, Identify many-to-many, Query any way you want
How Come? Scarcity, Efficiency
Cassandra Way: Know your app, Queries first, Denormalize
Know Your App
Queries First
Nobody is Normal
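To make the denormalization idea concrete, here is a minimal sketch in Python that uses plain dictionaries as stand-ins for two column families. The names `users`, `users_by_city`, `add_user`, and `rename_user` are hypothetical and chosen for illustration; nothing here is a Cassandra API. It assumes the application needs two queries: look up a user by id, and list the users in a city.

```python
# Two toy "column families", each shaped for exactly one query.
users = {}          # row key: user id -> {column name: value}
users_by_city = {}  # row key: city    -> {user id: display name}


def add_user(user_id, name, city):
    # Write the same fact twice, once per query we intend to serve.
    users[user_id] = {"name": name, "city": city}
    users_by_city.setdefault(city, {})[user_id] = name


def rename_user(user_id, new_name):
    # The cost of denormalizing: every copy has to be updated.
    row = users[user_id]
    row["name"] = new_name
    users_by_city[row["city"]][user_id] = new_name


add_user("u1", "Ann", "Austin")
add_user("u2", "Bob", "Austin")
rename_user("u1", "Anna")
```

Each read touches a single row of a single structure, while the rename has to touch both copies. That is the trade the deck describes: fast reads and insertions, more work when duplicated data changes.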
Relational Example
Column Family Example
Column Family Example
Column Family Example
Column Family Example
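The original slides for this example were images and are not recoverable here, so the following is only a rough sketch of the idea in Python. It models a column family as a map from row key to a sparse set of columns read back in sorted order, with composite column names represented as tuples. The class `ColumnFamily`, the method `get_slice`, and all sample data are assumptions made for illustration, not Cassandra client calls.

```python
from bisect import bisect_left, bisect_right


class ColumnFamily:
    """Toy stand-in for a column family: row key -> {column name: value}."""

    def __init__(self):
        self.rows = {}

    def insert(self, row_key, column_name, value):
        # Rows are sparse: a row holds only the columns written to it.
        self.rows.setdefault(row_key, {})[column_name] = value

    def get_slice(self, row_key, start, finish):
        # Columns come back in sorted name order; both bounds are inclusive.
        row = self.rows.get(row_key, {})
        names = sorted(row)
        lo = bisect_left(names, start)
        hi = bisect_right(names, finish)
        return [(name, row[name]) for name in names[lo:hi]]


users = ColumnFamily()
# Normal attributes and related data share one row under composite names.
users.insert("user42", ("attr", "email"), "ann@example.com")
users.insert("user42", ("attr", "name"), "Ann")
users.insert("user42", ("order", "2010-09-01"), "book")
users.insert("user42", ("order", "2010-09-14"), "lamp")
users.insert("user43", ("attr", "name"), "Bob")  # no orders: still valid

# One contiguous slice returns every "order" column for the row.
orders = users.get_slice("user42", ("order",), ("order", "\uffff"))
```

Because the composite names sort by their first component, all of a user's orders sit next to each other in the row and can be read with a single slice, which is the data-locality point the deck makes.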
Does it feel strange?
4. Internals
Sequential Writes: Always
Consistency Level
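The slide gives only the title, so this is a small sketch of the general arithmetic behind tunable consistency rather than anything taken from the deck. It assumes N replicas per key, a write acknowledged by W of them, and a read that consults R of them. The function names `quorum` and `read_sees_latest_write` are hypothetical.

```python
def quorum(replicas: int) -> int:
    # A majority of the replicas.
    return replicas // 2 + 1


def read_sees_latest_write(r: int, w: int, n: int) -> bool:
    # If R + W > N, every read set overlaps every write set in
    # at least one replica, so a read can observe the latest write.
    return r + w > n
```

With three replicas, quorum reads paired with quorum writes overlap (2 + 2 > 3), while reading and writing a single replica each does not (1 + 1 <= 3).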
Partitioning
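As a rough illustration of hash-based partitioning, the sketch below places nodes on a token ring and assigns each row key to the first node whose token is greater than or equal to the key's token, wrapping around at the end. It is a simplification: the token function (MD5 read as an integer), the `Ring` class, and the node names are assumptions for this example and do not reproduce Cassandra's exact token computation.

```python
import hashlib
from bisect import bisect_left


def token_for(key: str) -> int:
    # Hash the row key to a position on the ring (illustrative choice).
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class Ring:
    def __init__(self, node_tokens):
        # node_tokens: {node name: token}; kept sorted by token.
        self.tokens = sorted((t, name) for name, t in node_tokens.items())

    def owner_for_token(self, token: int) -> str:
        # First node whose token is >= the given token, wrapping
        # around to the first node past the end of the ring.
        idx = bisect_left(self.tokens, (token, ""))
        return self.tokens[idx % len(self.tokens)][1]

    def owner(self, key: str) -> str:
        return self.owner_for_token(token_for(key))


ring = Ring({"node-a": 100, "node-b": 200, "node-c": 300})
```

Hashing spreads row keys evenly across nodes but discards key order, so locality has to come from what is stored inside a row rather than from neighbouring row keys.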
Slices: Data Locality
Summary: The goal is to scale
ColumnFamilies != Relational tables
Trade-offs: you win some, you lose some

Editor's Notes

  • #4: It’s not about one data model vs. another. It’s not about one storage engine vs. another. Cassandra excels at replicating data and achieving high sustained write throughput.
  • #5: The right tool for the right job
  • #6: Shaped by distribution model
  • #7: Shaped by distribution model
  • #11: Shaped by distribution model
  • #12: Sparse: columns do not have to exist in every row.
  • #13: Flexible column naming. You define the sort order. A row is not required to have a specific column just because another row does.
  • #14: Look familiar?
  • #15: Arise because of distribution model, not CF model.
  • #16: Writes are atomic at the level of a CF row, but not isolated. Large transactional apps push transactions down to a single node (shared nothing). Guaranteeing ACID constraints across nodes is a hard problem.
  • #19: OTOH, you do get a lot of things: data redundancy, very fast writes, fast reads.
  • #20: Relational > formally defined > correct. Query first > not formally defined > somehow incorrect. You get some things in exchange: scalability, availability, replication.
  • #21: Relational > formally defined > correct. Query first > not formally defined > somehow incorrect. You get some things in exchange: scalability, availability, replication.
  • #24: Focus on query & analysis. B+trees. Update once. *Cassandra typically becomes IO bound before becoming CPU bound.
  • #25: Not set in stone. Your application may require a different approach.
  • #26: Recognize non-starters: Is my dataset going to become Very Large? Will I need to sustain high write throughput? Also, what are the common operations? Optimize CFs for those operations.
  • #27: Columns are sorted. Choose keys and columns accordingly: you need to think about how you plan to slice your data. Related data is kept close together to reduce I/O.
  • #28: Denormalize. Use the disk. Don’t be afraid to create another CF that duplicates some data.
  • #30: Composite column names. Painful updates of denormalized parts. Fast reads & insertions.
  • #31: Key
  • #32: Normal attributes
  • #33: Composite column names. Pulling in relationships. Painful updates. Denormalization is best when data doesn’t change.
  • #36: Commit log on a separate disk. Memtable. SSTable.
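
The three pieces named in the last note (commit log, memtable, SSTable) fit together roughly as in the toy sketch below. It assumes a write is first appended to a log, then applied to an in-memory table, and that a full memtable is flushed to an immutable sorted table. The class `ToyStore`, its `flush_threshold` parameter, and the choice to clear the log on flush are all simplifications for illustration, not Cassandra internals.

```python
class ToyStore:
    def __init__(self, flush_threshold=3):
        self.commit_log = []  # append-only, sequential writes
        self.memtable = {}    # recent writes, held in memory
        self.sstables = []    # immutable, sorted, oldest first
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        # 1. Append to the log for durability; 2. update the memtable.
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the memtable out as one sorted, immutable table.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # flushed data no longer needs the log

    def read(self, key):
        # Newest data wins: memtable first, then newest SSTable backwards.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            for k, v in table:
                if k == key:
                    return v
        return None


store = ToyStore(flush_threshold=2)
store.write("a", 1)
store.write("b", 2)  # reaches the threshold and triggers a flush
store.write("a", 3)  # newer value lives only in the memtable
```

Nothing on the write path modifies existing data in place: the log is appended to and SSTables are written once, which is why writes stay sequential.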