http://www.flickr.com/photos/fpat/3328595063/




Gary Dusbabek
Full Disclosure:
I work on Apache
   Cassandra.

             http://www.flickr.com/photos/vmanso/4040094281/
My Goals For You
     Should I?
    How Then?
     Achtung!

                 http://www.flickr.com/photos/29707865@N05/2780508266/
http://www.flickr.com/photos/marc_smith/6246957472/




A Brief History of
   Databases
http://www.flickr.com/photos/watchsmart/1422274819/
1960s
New
    Stuff
Direct access
    storage
Replaced Tape

New Possibilities
                    http://www.flickr.com/photos/byrion/5264950510/
Navigational
 Databases
   (two kinds)
Hierarchical
   Parent-child
Network
Relationships
   Graph
1970s
Codd
Relational Model

                                                           Search by content

                                                               Good for query

                                                       Demands on processor

                                                        Rigid, fixed structures

                                                            Bad for modeling
http://www.flickr.com/photos/35536700@N07/3292544674
Today
Data needs
   have
 changed
     http://www.flickr.com/photos/franzhaas/6761917637/
Technology
                                                   has
                                                 changed
http://www.flickr.com/photos/neosnaps/2574417351/
http://www.flickr.com/photos/katclay/3935629242/




Choosing Technology
  is Hard Work™.
Can you do it with a
relational database?

Is your DB falling apart?
   What do you need?
Where RDBMS Fall Apart
Scaling
SPoF
Sharding
Denormalizing

Availability
 Slave Systems
http://www.flickr.com/photos/horiavarlan/4681206711/
What do you need?
   Reduced cost
   Throughput
   Availability
   Recoverability
   Correctness
   Transactions
   Flexible Schema
http://www.flickr.com/photos/ell-r-brown/5866777592/
What is NoSQL?

                                                        Flight



http://www.flickr.com/photos/24277960@N08/2609390563/
http://www.flickr.com/photos/taylar/4996955547/
http://www.flickr.com/photos/gromgull/611019520/
http://www.flickr.com/photos/igboo/2583174998/
What isn’t NoSQL?
                                               NoFlight




http://www.flickr.com/photos/alanvernon/3121751152/
http://www.flickr.com/photos/tomsaint/3209482579/
http://www.flickr.com/photos/pointnshoot/408384715/
http://www.flickr.com/photos/zigazou76/5846255426/
Considerations
  Fault Tolerance
    Recovery
    Replication
     Access
      Hooks
    Distributed
Considerations
      Data Model
  Query/Search model
Transactional Semantics
Read vs Write Throughput
Deployment/Management
Focus on a few systems
Master-Slave
   MongoDB
   Redis
Fully Distributed
   Riak
   HBase
   Cassandra
http://www.flickr.com/photos/seier/2455551478/sizes/l/in/photostream/
MongoDB
Document Oriented

Naturally denormalized

Flexible schema
MongoDB
Programmer friendly
Many language drivers
Atomic on a single document
MongoDB
Real-time data warehousing/analytics

Blocking/offline compaction

Complicated queries
MongoDB

db.foo.find({j: {$ne: 3}, k: {$gt: 10} });

db.foo.find( { name : "bob" , $or : [ { a : 1 } , { b : 2 } ] } )
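
Read as plain predicates, the two queries above say: `j` is not 3 and `k` is greater than 10; and `name` is "bob" and either `a` is 1 or `b` is 2. Here is a minimal sketch in plain JavaScript, assuming a small in-memory array stands in for the `foo` collection. The sample documents and the helper names `matchesFirst` and `matchesSecond` are hypothetical, only scalar fields are considered, and nothing here talks to MongoDB.

```javascript
// Hypothetical sample documents standing in for the `foo` collection.
const foo = [
  { name: "bob", j: 1, k: 20, a: 1 },
  { name: "bob", j: 3, k: 50, b: 2 },
  { name: "amy", j: 2, k: 5, a: 1 },
  { name: "bob", k: 11 }, // no `j` field at all
];

// {j: {$ne: 3}, k: {$gt: 10}}: top-level conditions are ANDed together.
// A document with no `j` field still satisfies $ne, as it does here.
const matchesFirst = (doc) => doc.j !== 3 && doc.k > 10;

// {name: "bob", $or: [{a: 1}, {b: 2}]}: name must match AND
// at least one branch of $or must match.
const matchesSecond = (doc) =>
  doc.name === "bob" && (doc.a === 1 || doc.b === 2);

const firstResult = foo.filter(matchesFirst);
const secondResult = foo.filter(matchesSecond);

console.log(firstResult.length, secondResult.length);
```

The point of the JSON query form is that these predicates are data, so drivers in any language can build them without string concatenation.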
MongoDB
Master-slave replication
          Asynchronous
Gives failover & data redundancy
       But not consistency
 Only master can receive writes

Makes atomic writes easy
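
Why asynchronous replication gives redundancy but not consistency can be seen in a toy model. This is a sketch only: the `Master` and `Slave` classes and the `pullFrom` method are hypothetical, and this is not MongoDB's actual replication protocol.

```javascript
// Toy model of asynchronous master-slave replication.
class Master {
  constructor() {
    this.data = new Map();
    this.pending = []; // writes not yet shipped to the slave
  }
  write(key, value) {
    this.data.set(key, value); // acknowledged immediately
    this.pending.push([key, value]); // replicated some time later
  }
}

class Slave {
  constructor() {
    this.data = new Map();
  }
  // Apply whatever the master has queued; in a real system this lags behind.
  pullFrom(master) {
    for (const [key, value] of master.pending) this.data.set(key, value);
    master.pending = [];
  }
  read(key) {
    return this.data.get(key);
  }
}

const master = new Master();
const slave = new Slave();

master.write("balance", 100);
slave.pullFrom(master);
master.write("balance", 250); // accepted by the master, not yet replicated

const staleRead = slave.read("balance"); // the slave still has the old value
slave.pullFrom(master);
const freshRead = slave.read("balance");

console.log(staleRead, freshRead);
```

Because every write goes through the single master, ordering and atomicity are simple. The cost is that a read from the slave, or a failover to it, can miss the most recent writes.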
Redis
  Real-time stats tracking

 Wicked fast

 Collections built in
Redis
In-memory

Snapshots

Master-slave

RAM limited
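
The trade-off between in-memory speed and snapshot persistence can be sketched as follows. This assumes a toy store whose "disk" is just a second map. `SnapshotStore` and its methods are hypothetical and are not Redis commands or its on-disk format.

```javascript
// Toy in-memory counter store with point-in-time snapshots.
class SnapshotStore {
  constructor() {
    this.memory = new Map(); // live data, lost on crash
    this.disk = new Map(); // stands in for the last snapshot file
  }
  incr(key) {
    const next = (this.memory.get(key) || 0) + 1;
    this.memory.set(key, next);
    return next;
  }
  snapshot() {
    this.disk = new Map(this.memory); // copy the current state
  }
  crashAndRestart() {
    this.memory = new Map(this.disk); // only the snapshot survives
  }
  get(key) {
    return this.memory.get(key) || 0;
  }
}

const store = new SnapshotStore();
store.incr("page:home");
store.incr("page:home");
store.snapshot();
store.incr("page:home"); // written after the snapshot

const beforeCrash = store.get("page:home");
store.crashAndRestart();
const afterCrash = store.get("page:home");

console.log(beforeCrash, afterCrash);
```

Whatever was written between the last snapshot and the crash is the window of possible data loss.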
Riak
Relationships, aka “Links”
Built-in MapReduce
Completely schemaless
No SPoF
Scales linearly
Tunable consistency
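
In Dynamo-style stores, tunable consistency usually means choosing how many of the N replicas must answer a read (R) and acknowledge a write (W). A read is guaranteed to see the latest acknowledged write when the two sets must overlap, that is, when R + W > N. A minimal sketch of that rule follows; the function name `quorumsOverlap` and the sample settings are hypothetical.

```javascript
// n = replicas per key, r = replicas that must answer a read,
// w = replicas that must acknowledge a write.
function quorumsOverlap(n, r, w) {
  return r + w > n;
}

const settings = [
  { n: 3, r: 2, w: 2 }, // quorum reads and writes
  { n: 3, r: 1, w: 1 }, // fastest, but reads may be stale
  { n: 3, r: 1, w: 3 }, // cheap reads, expensive writes
];

const results = settings.map(({ n, r, w }) => quorumsOverlap(n, r, w));
console.log(results);
```

Lower R and W trade consistency for latency and availability. Higher values do the opposite.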
Riak
Pre- and post-commit hooks
Configurable storage engines
REST access
Easy cluster balancing
Riak
Doesn’t keep data sorted
Erlang
Cassandra
Query language
Range queries
Datacenter/Rack aware
Hadoop integration
Configurable caching
Live schema changes
Cassandra
Some schema
Growing cluster isn’t fair
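
One way to picture "growing cluster isn't fair" is a sketch under two assumptions: each node owns a single token on the ring, and a new node joins at the midpoint of one existing range. The ring size of 1024 and the `ownership` helper are hypothetical and far simpler than a real cluster.

```javascript
const RING_SIZE = 1024; // hypothetical token space

// Each node owns the range from the previous token (exclusive) up to its own token.
function ownership(tokens) {
  const sorted = [...tokens].sort((a, b) => a - b);
  return sorted.map((token, i) => {
    const prev = sorted[(i + sorted.length - 1) % sorted.length];
    // A single-node ring yields a span of 0, which means the whole ring.
    const span = (token - prev + RING_SIZE) % RING_SIZE || RING_SIZE;
    return { token, share: span / RING_SIZE };
  });
}

const fourNodes = [0, 256, 512, 768];
const fiveNodes = [...fourNodes, 128]; // the new node bisects one existing range

const before = ownership(fourNodes).map((node) => node.share);
const after = ownership(fiveNodes).map((node) => node.share);

console.log(before, after);
```

Under these assumptions, only one existing node is relieved. Two nodes end up with 12.5% each while the other three still carry 25%, so restoring balance means moving tokens or adding nodes in larger batches.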
HBase
Coprocessors
Versioned cells (BigTable)
Hadoop integration
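
Versioned cells, in the BigTable sense, keep several timestamped values per cell instead of overwriting in place. A toy sketch of that idea follows. The `VersionedCell` class and its `maxVersions` parameter are hypothetical and are not the HBase client API.

```javascript
// Toy versioned cell: each write is kept under its timestamp.
class VersionedCell {
  constructor(maxVersions = 3) {
    this.maxVersions = maxVersions;
    this.versions = []; // kept sorted newest first: [{ timestamp, value }]
  }
  put(timestamp, value) {
    this.versions.push({ timestamp, value });
    this.versions.sort((a, b) => b.timestamp - a.timestamp);
    // Drop the oldest versions beyond the retention limit.
    this.versions.length = Math.min(this.versions.length, this.maxVersions);
  }
  // Latest value written at or before `asOf` (defaults to the newest version).
  get(asOf = Infinity) {
    const hit = this.versions.find((v) => v.timestamp <= asOf);
    return hit ? hit.value : undefined;
  }
}

const cell = new VersionedCell(2);
cell.put(100, "v1");
cell.put(200, "v2");
cell.put(300, "v3"); // the version at timestamp 100 is dropped

const latest = cell.get();
const asOf250 = cell.get(250);
const asOf150 = cell.get(150);

console.log(latest, asOf250, asOf150);
```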
HBase
Hadoop NameNode is SPoF

Schema maintenance downtime

Schema required up front

Complicated balancing
http://www.flickr.com/photos/annguyenphotography/3267723713/




No Silver Bullet
http://www.flickr.com/photos/nateone/3768979925/




   HBase
@gdusbabek


Breaking the Relational Headlock: A Survey of NoSQL Datastores

Editor's Notes

  • #2: Technical? Ask questions. Discussion.
  • #3: Open source fan
  • #5: Context
  • #8: Seeking didn’t kill you.
  • #9: Traverse links. Follow pointers. No notion of keys. Just data.
  • #10: Up and down
  • #11: Up down left right
  • #13: Edgar F. Codd. Dominant by the 90s.
  • #14: Emphasize search, not navigation. Foreign keys are a bad model. Relationships not explicit. E-R diagrams not until mid to late 70s.
  • #17: Rackspace example
  • #18: Google File System 2003. BigTable 2004.
  • #19: Answer: Should I? Temptation – new startup makes a blog post saying “we like it.” Hype. This is Hawt! I should be using it. New shiny.
  • #20: Fads aside… Mistakes not evident at first.
  • #21: Answer: Should I? Two questions. Hype. New shiny.
  • #22: Fixed table space. Not linear – 2x space != 2x money.
  • #29: Relational impedance
  • #33: Data model is complex. Queries are represented as JSON.
  • #34: Data model is complex. Queries are represented as JSON.
  • #35: Does do sharding
  • #36: Like a memcache for lists and sets. Live data. Fast changing.
  • #37: Snapshots - leave delta for data loss. Master/Slave - asynchronous replication.
  • #38: Faithful Dynamo clone. MapReduce != Hadoop integration. Hooks == BigTable Coprocessors.
  • #39: Bitcask -> small data set (keys must fit in memory). InnoDB -> big data set. Memory -> duh. REST – easy for programmers. Balancing – always 64 pieces.
  • #40: Sorted – poor scanning performance
  • #41: Dynamo + BigTable
  • #44: Balancing – RegionServers + HDFS
  • #45: Will be more choices and better solutions: 205 Million Dollars of Funding For Big Data Startups (http://datascience101.wordpress.com/2012/02/28/funding-for-big-data-startups/). Accel Partners. IA Ventures.