noSQL choices
What is mySQL?
What is noSQL?
Types of noSQL databases
Why noSQL?
Differences between noSQL and MySQL
Aggregated data vs tuples
ACID vs BASE transactions
• A – Atomicity
• C – Consistency
• I – Isolation
• D – Durability
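Atomicity is the easiest of the four properties to see in code. The following is a minimal sketch, assuming an in-memory SQLite database and a hypothetical `accounts` table (neither comes from the slides): a transfer that fails half-way is rolled back as a whole, so no partial update is ever visible.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` between two accounts as one atomic transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                # Abort: both UPDATE statements above are undone together.
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False

def balances(conn):
    """Return {name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))

# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()
```

Calling `transfer(conn, "alice", "bob", 500)` returns `False` and leaves both balances exactly as they were.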
Schema vs Schema-less
The 5 main data stores
• Relational Databases
• Key-value stores
• Document Databases
• Graph Stores
• Column Stores
Relational Databases
AKA RDBMS
Why is it good?
• Super flexible
• Proven to work, dominant in the market for more
than 30 years
• Robust, Stable
• Very consistent
• Follows ACID transactions, making it the industry
standard
Why is it bad?
• Strongly typed columns
• Inefficient with high volumes of data
• Not designed for clusters
• ONLY EFFICIENT WITH STRUCTURED DATA
• Vertical scaling: you need to buy a bigger computer
to process more data
mySQL
NOSQL databases
Key-value stores
Why is it good?
• Very fast data storage and retrieval
• Good for storing sessions from users
– User profiles on forums
– Shopping carts on websites
Why is it bad?
• Can’t query by the contents of a value
• Need to know the key to query properly
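Both limitations follow from the data model. A minimal sketch, assuming a plain in-memory Python dict standing in for a key-value store (the `KeyValueStore` class and the session data are hypothetical): a lookup by key is a single hash lookup, but finding a record by something inside the value means inspecting every entry.

```python
class KeyValueStore:
    """Toy key-value store: values are opaque to the store."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        """Fast path: direct lookup by a known key."""
        return self._data.get(key, default)

    def scan(self, predicate):
        """Slow path when the key is unknown: inspect every value."""
        return [k for k, v in self._data.items() if predicate(v)]

store = KeyValueStore()
store.put("session:42", {"user": "ann", "cart": ["book", "pen"]})
store.put("session:43", {"user": "bob", "cart": []})

# Known key: one lookup.
cart = store.get("session:42")["cart"]
# Unknown key: the store cannot help, so the client scans everything.
with_pen = store.scan(lambda v: "pen" in v["cart"])
```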
Examples of key-value stores
• Couchbase
• Aerospike
• Hyperdex
• Flare
• Dynamo
• Redis
Most popular key-value store: Redis
• Able to write 114,293.71 requests per second
• Able to read 81,234.77 requests per second
• https://redis-docs.readthedocs.org/en/latest/Benchmarks.html
Companies that use Redis
• Twitter
• GitHub
• Pinterest
• Snapchat
• Flickr
• Hulu
• Vine
• Imgur
• Craigslist
Document Databases
Why is it good?
• Very easy to write
• Objects turn directly into JSON documents, and JSON
documents turn easily back into objects
• Easy to store data: documents contain
whatever keys and values you want
• No schema
• Documents are independent units, easy to
distribute
• No need for data to be related at all
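The object-to-JSON round trip and the lack of a schema can be sketched with the Python standard library alone. This is an illustration under assumptions, not any particular document database's API: the `Product` class, the helper names, and the sample documents are all hypothetical.

```python
import json
from dataclasses import dataclass, asdict, field

@dataclass
class Product:
    name: str
    price: float
    tags: list = field(default_factory=list)

def to_document(obj):
    """Serialize an object into a JSON document."""
    return json.dumps(asdict(obj))

def from_document(text):
    """Rebuild the object from its JSON document."""
    return Product(**json.loads(text))

doc = to_document(Product("kettle", 24.5, ["kitchen"]))
restored = from_document(doc)

# Schema-less: documents in one collection need not share the same fields.
collection = [
    json.loads(doc),
    {"name": "gift card", "amount": 50, "expires": "2026-01-01"},
]
```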
Why is it good? (cont)
• Very, very programmer friendly
• Good for:
– Event logging
– Content management systems
– E-commerce applications
– Real-time analytics
Why is it bad?
• Tends to struggle when the database grows very large
• Not good at handling highly interrelated data
• Not designed to handle cross-document operations
• Can’t slice data
Examples of document stores
• MongoDB
• Lotus Notes
• Apache CouchDB
Most popular document store: MongoDB
Companies that use MongoDB
• Expedia
• The Weather Channel
• Forbes
• Otto
Graph Stores
“If you can whiteboard it, you can graph it”
Why is it good?
• Well suited for analyzing interconnections
• Very good for data that involve complex
relationships
• High interest in mining social media data
• Used for creating “recommended products”
on sales websites
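A "recommended products" feature is essentially a short walk over a purchase graph. The sketch below is a toy version under assumptions: the graph is a plain dict of made-up users and products, and the scoring rule (count how many similar shoppers bought an item) is just one simple choice, not how any particular graph database does it.

```python
from collections import Counter

# Edges of a tiny purchase graph: user -> products bought (made-up data).
BOUGHT = {
    "ann": {"tent", "stove"},
    "bob": {"tent", "lamp"},
    "cat": {"tent", "lamp", "boots"},
    "dan": {"novel"},
}

def recommend(user, bought=BOUGHT):
    """Suggest products bought by users who share a purchase with `user`."""
    mine = bought[user]
    scores = Counter()
    for other, theirs in bought.items():
        if other != user and mine & theirs:   # shares at least one product
            scores.update(theirs - mine)      # count products the user lacks
    # Most frequently co-purchased first; ties broken alphabetically.
    return [p for p, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]
```

For example, `recommend("ann")` walks from ann to the tent, then to the other tent buyers, and ranks what they bought.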
Why is it bad?
• Not good at updating all entities, or a subset of
them, at once
• Changing a property on all nodes is not
straightforward
• Some graph databases may not be able to handle
large amounts of data
Most popular graph database: Neo4j
Companies that use Neo4j
• eBay
• TomTom
• HP
• Walmart
• eHarmony
Column Stores
Row vs Column store
Why is it good?
• Designed for gigantic amounts of data
• Much faster than a row store for single-column
queries: it reads only the columns it needs
• Example: with 10,000 rows, looking up a value in a
single column does not require reading every full row
• Good for blogs, forums
• Event logging
• When you want to count and categorize certain
values
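The row versus column difference can be sketched with plain Python lists. This is only an illustration of the two layouts, assuming made-up blog-post records in which every row has the same fields; it says nothing about how a real column store lays data out on disk.

```python
from collections import Counter

# Row-oriented layout: one complete record per entry (made-up data).
rows = [
    {"id": 1, "author": "ann", "category": "news", "body": "Election results"},
    {"id": 2, "author": "bob", "category": "sport", "body": "Cup final"},
    {"id": 3, "author": "ann", "category": "news", "body": "Budget vote"},
]

def to_columns(rows):
    """Pivot row-oriented records into one list per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns

columns = to_columns(rows)

# Row store: every full record is touched just to read one field.
row_counts = Counter(row["category"] for row in rows)
# Column store: only the "category" column is read.
col_counts = Counter(columns["category"])
```

Both counts are identical; the column layout simply gets there without touching `id`, `author`, or `body`.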
Why is it bad?
• Not good at working with systems that require
ACID transactions for writes and reads
• If the data set is small, it is better to use a
relational database
– If you just need to look at whole rows, or at a
bunch of columns together, a relational database is
much better
Most popular Column-family store:
Cassandra
Companies that use Cassandra
• Walmart
• VMware
• Unity
• Ubisoft
• Sony
• Reddit
• PayPal
• Netflix
• NASA
• Instagram
• IBM
• FedEx
• eBay
• Call of Duty
Scaling in Cassandra
• Horizontal scaling
• A matter of adding more nodes
• More nodes = the cluster supports more writes
and reads
• Nodes can be added while the cluster is
running
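Adding nodes to a running cluster is cheap when keys are placed with consistent hashing on a ring, which is the general idea behind Cassandra's token ring. The sketch below is heavily simplified and rests on assumptions: one token per node, MD5 as the hash, and hypothetical node names. Real clusters also use virtual nodes and replication, neither of which is modeled here.

```python
import bisect
import hashlib

def token(value):
    """Map a string to a position on the ring."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes=()):
        self._tokens = []   # sorted node tokens
        self._owner = {}    # token -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        t = token(node)
        bisect.insort(self._tokens, t)
        self._owner[t] = node

    def node_for(self, key):
        """First node clockwise from the key's token, wrapping around."""
        i = bisect.bisect_right(self._tokens, token(key)) % len(self._tokens)
        return self._owner[self._tokens[i]]

keys = [f"user:{i}" for i in range(1000)]
ring = Ring(["node-a", "node-b", "node-c"])
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-d")  # join the ring without rebuilding it
after = {k: ring.node_for(k) for k in keys}

# Only keys now owned by the new node changed owner.
moved = [k for k in keys if before[k] != after[k]]
```

Every key in `moved` ends up on `node-d`; keys owned by the other nodes stay where they were, which is why the cluster can keep serving requests while it grows.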
Benchmark reports
Throughput
• The higher, the better
• Measures how much work the database engine
completes per unit of time
Latency guidelines
• Excellent: < 1ms
• Very good: 1 – 5ms
• Good: 5 – 10ms
• Poor: 10 – 20ms
• Bad: 20 – 100ms
• Really bad: 100 – 500ms
• OMG!: > 500ms
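Both metrics are easy to compute for any operation you can call in a loop. A minimal sketch follows, with two assumptions to note: the helper names are hypothetical, and a latency that lands exactly on a boundary is placed in the slower band, since the slide leaves ranges such as 5 – 10ms ambiguous.

```python
import time

# (exclusive upper bound in ms, rating), taken from the guideline slide.
BANDS = [
    (1, "Excellent"),
    (5, "Very good"),
    (10, "Good"),
    (20, "Poor"),
    (100, "Bad"),
    (500, "Really bad"),
]

def rate_latency(ms):
    """Map a latency in milliseconds to the slide's rating."""
    for upper, label in BANDS:
        if ms < upper:
            return label
    return "OMG!"

def measure(operation, count):
    """Run `operation` `count` times.

    Returns (throughput in operations per second, mean latency in ms).
    """
    start = time.perf_counter()
    for _ in range(count):
        operation()
    elapsed = time.perf_counter() - start
    return count / elapsed, elapsed / count * 1000

# Example with a stand-in operation; a real benchmark would call the database.
ops_per_second, mean_ms = measure(lambda: sum(range(100)), 1000)
rating = rate_latency(mean_ms)
```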
The University of Toronto test (2012)
• Cassandra 1.0.0 rc2
• Redis 2.4.2
• HBase v0.90.4
• Voldemort 0.90.1
• MySQL – 5.5.17
The tests
• Workload R (95% reads)
• Workload RW (50% writes, 50% reads)
• Workload W (99% writes)
Conclusion
• Cassandra – Highest Scalability, suffered in
latency
• Redis – Highest initial throughput in
read-intensive workloads. Very low latency
Conclusion (cont.)
• MySQL – Almost the same as Cassandra,
latency is better
• HBase – Lowest throughput. Highest latency
for reading. Lower latency for writing
EndPoint: Benchmarking Top NoSQL Databases
• Published: April 13, 2015
• Updated: May 27, 2015
• Cassandra (2.1.0)
• Couchbase (3.0.1)
• MongoDB (3.0)
• HBase (0.98.6-1) with Hadoop (2.6.0)
What was updated?
• Cassandra’s and HBase’s results improved
substantially in the updated report
Workload selection
• Workloads selected to be similar to today’s
applications
• Database nodes: (30.5 GB RAM, 4 CPU cores,
and a single volume of 800 GB of SSD local
storage)
• All tests ran with no data loss
• Used data volumes that exceeded RAM
capacity on each node
Workloads
• Read-mostly: 95% read, 5% update ratio
• Read/write: 50% read, 50% update
• Read-modify-write: 50% read, 50% read-modify-write
• Insert-mostly: 90% insert, 10% read
• 9 million operations per workload
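A workload of this kind is just a weighted random stream of operation names. The sketch below shows how such a mix could be generated. It is not the benchmark's actual tooling: the `WORKLOADS` table restates the ratios above, while the `generate` helper, the seed, and the small operation count are assumptions for illustration.

```python
import random
from collections import Counter

# Operation ratios per workload, as listed on the slide.
WORKLOADS = {
    "read-mostly": {"read": 0.95, "update": 0.05},
    "read/write": {"read": 0.50, "update": 0.50},
    "read-modify-write": {"read": 0.50, "read-modify-write": 0.50},
    "insert-mostly": {"insert": 0.90, "read": 0.10},
}

def generate(workload, count, seed=0):
    """Draw `count` operation names according to the workload's ratios."""
    mix = WORKLOADS[workload]
    rng = random.Random(seed)  # seeded so a run can be repeated
    return rng.choices(list(mix), weights=list(mix.values()), k=count)

# 10,000 operations here instead of the benchmark's 9 million.
ops = Counter(generate("read-mostly", 10_000))
```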
Problems
• Couchbase
• HBase
• MongoDB
Conclusion
• Cassandra clearly outperformed the others in
both latency and throughput
• HBase or Couchbase came second
• MongoDB came last in most test cases
Altoros: The NoSQL Technical Comparison Report
• Published September 2014
• Pretty unbiased
• Couchbase: 2.5.1
• MongoDB: 2.6.1
• Cassandra: 2.0.8
Workload B
• 50% read operations
• 40% update operations
• 5% insert operations
• 5% delete operations
• 50 million 1 KB records
Workload B
• 3 million 10 KB records
Workload C
• 90% read operations
• 8% update operations
• 1% insert operations
• 1% delete operations
• 3 million 10 KB records (results with 50 million
records are similar to the Workload B results)
Scalability
Conclusions
• Cassandra has amazing scalability again
• Cassandra is weaker at reading in terms of
latency
• MongoDB has the worst latency results in
almost all fields
Overall conclusion
• No single noSQL data model beats all the others
• How about combining?
• POLYGLOT PERSISTENCE
Example: Shopping Site
E-Commerce platform
• Key/value
• RDBMS
• Document
• Graph
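In code, polyglot persistence comes down to routing each kind of data to the store that suits it. The sketch below uses in-memory stand-ins rather than real database clients. Sessions and carts going to key/value and recommendations going to graph follow the earlier slides; sending orders to the RDBMS and product data to the document store is an assumption made for illustration, as are all the names.

```python
class InMemoryStore:
    """Stand-in for a real database client; it only records what was saved."""

    def __init__(self, kind):
        self.kind = kind
        self.records = []

    def save(self, record):
        self.records.append(record)

STORES = {
    kind: InMemoryStore(kind)
    for kind in ("key/value", "rdbms", "document", "graph")
}

# Which store handles which kind of data (an illustrative assignment).
ROUTES = {
    "session": "key/value",
    "cart": "key/value",
    "order": "rdbms",
    "product": "document",
    "recommendation": "graph",
}

def save(data_type, record):
    """Send a record to the store responsible for its data type."""
    store = STORES[ROUTES[data_type]]
    store.save(record)
    return store.kind
```

The application calls one `save` function, and the routing table decides which database sees the write.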
