A comparison between several NoSQL databases
                with comments and notes

                                             Bogdan George Tudorica, Cristian Bucur
                                Department for Economical Mathematics and Economical Informatics
                                               Petroleum-Gas University of Ploiesti
                                                        Ploiesti, Romania
                                                  tudorica_bogdan@yahoo.com


Abstract-This paper comments on the various NoSQL (Not only Structured Query Language) systems and compares them using multiple criteria. The NoSQL databases were created as a means to offer high performance (both in terms of speed and size) and high availability at the price of losing the ACID (Atomic, Consistent, Isolated, Durable) trait of the traditional databases, in exchange for keeping a weaker BASE (Basically Available, Soft state, Eventual consistency) feature. It remains to be seen which of the multiple solutions created since the official appearance of the NoSQL concept (which was defined in 1998 and reintroduced in 2009, around which moment several NoSQL solutions emerged; at present over 120 such solutions are known) really deliver on these promises of higher performance (although several of them are already used with very good results).

   Keywords-database; NoSQL; performance; comparison

                       I.    INTRODUCTION

    The concept described by the term NoSQL (meaning a database system which is distributed, may not require fixed table schemas, usually avoids join operations, typically scales horizontally, does not expose a SQL interface and may be open source [1]; some even use the term to mean a completely non-relational system) is also referred to by the more academic sources as a form of structured storage [4][10][11][12] (although the terms may not be equivalent; the relational databases also comply with the official definition of the structured storage term, and they are somewhat opposite to the NoSQL term).

    One cannot simply label the terms RDBMS and NoSQL as exact opposites. There even exist some middleware appliances (such as CloudTPS for Google's BigTable and Amazon's SimpleDB [17]) or various solutions (such as Percolator for Google's BigTable [14] and an unnamed prototype system for HBase [7]) which add full ACID features to some NoSQL systems.

    It is certain that the NoSQL databases are one of the byproducts of the Web 2.0 era. They came into real use only when the designers of web services with very large numbers of users discovered that the traditional relational database management systems (RDBMS) are fit either for small but frequent read/write transactions or for large batch transactions with rare write accesses, and not for heavy read/write workloads (which is often the case for large scale web services such as Google, Amazon, Facebook, Yahoo and the like).

    It seems that at least some of the major RDBMS producers are learning something from this evolution (e.g. Microsoft introduced some NoSQL type features such as snapshot isolation, although used at a single table level, into its newer RDBMS product labeled Azure; Oracle 11g also contains a similar facility called Oracle Streams, but this one is limited in the same way as the MS product, this time to a single instance [7]).

                       II.   WHAT DO WE COMPARE

    In order to compare a set of NoSQL solutions, the first step should be to select / classify some products which fulfill similar purposes or have similar qualities / features.

    For the moment there is no official taxonomy for this kind of software, although several attempts do exist.

    The first one is provided by Stefan Edlich on his page [8] and gives the following categories:

    A. Core NoSQL Systems, most of them created as component systems for Web 2.0 services, with the following subtypes:

   •    Wide Column Store / Column Families (Hadoop / HBase, Cassandra, Hypertable, Cloudata, Amazon SimpleDB, SciDB),

   •    Document Store (CouchDB, MongoDB, Terrastore, ThruDB, OrientDB, RavenDB, Citrusleaf, SisoDB, CloudKit, Persevere, Jackrabbit),

   •    Key Value / Tuple Store (Azure Table Storage, MEMBASE, Riak, Redis, Chordless, GenieDB, Scalaris, Tokyo Cabinet / Tyrant, GT.M, Keyspace, Berkeley DB, MemcacheDB, HamsterDB, Faircom C-Tree, Mnesia, LightCloud, Pincaster, Hibari, Scality),

   •    Eventually Consistent Key Value Store (Amazon Dynamo, Voldemort, Dynomite, KAI, SubRecord, Mo8onDb, Dovetaildb),
   •    Graph Databases (Neo4J, Infinite Graph, Sones, InfoGrid, HyperGraphDB, Trinity, AllegroGraph, Bigdata, DEX, OpenLink Virtuoso, VertexDB, FlockDB, Java Universal Network / Graph Framework, Sesame, Filament, OWLim, NetworkX, iGraph),

    B. Soft NoSQL Systems, most of them being older or newer systems which are not related to any Web 2.0 service but share the traits described as NoSQL characteristics (A/N: some of them have strong ACID / relational capabilities and, for this reason, they may be misplaced in a list of NoSQL systems; further analysis may be needed on this subject), with the following subtypes:

   •    Object Databases (db4o, Versant, Objectivity, Gemstone, Progress, Starcounter, Perst, ZODB, NEO, PicoLisp, Sterling, StupidDB, KiokuDB, Durus),

   •    Grid & Cloud Database Solutions (GigaSpaces, Queplix, Hazelcast, Joafip, GridGain, Infinispan, Coherence, eXtremeScale),

   •    XML Databases (Mark Logic Server, EMC Documentum xDB, Tamino, eXist, Sedna, BaseX, Xindice, Qizx, Berkeley DB XML),

   •    Multivalue Databases (U2, OpenInsight, OpenQM, Globals),

   •    other NoSQL related databases (IBM Lotus/Domino, Intersystems Cache, eXtremeDB, ISIS Family, Prevayler, Yserial).

    Another taxonomy is provided by an unknown author on a wiki page [23] and gives the following categories of NoSQL databases:

   •    Document store (Apache Jackrabbit, Apache CouchDB, Lotus Notes, MongoDB, MarkLogic Server, eXist, SimpleDB, Terrastore),

   •    Graph (AllegroGraph, Neo4j, DEX, FlockDB),

   •    Key-value store, with the following subtypes: Eventually-consistent key-value store (Cassandra, Dynamo, Hibari, Project Voldemort, Riak); Hierarchical key-value store (GT.M); Hosted services (Freebase); Key-value cache in RAM (Citrusleaf database, memcached, Oracle Coherence, Redis, Tuple space, Velocity); Key-value stores implementing the Paxos algorithm (Keyspace); Key-value stores on disk (BigTable, CDB, Citrusleaf database, Dynomite, Keyspace, membase, MemcacheDB, Redis, Tokyo Cabinet, TreapDB, Tuple space, MongoDB); Multivalue databases (Extensible Storage Engine - ESE/NT, OpenQM, Revelation Software's OpenInsight, Rocket U2); Object database (db4o, GemStone/S, InterSystems Caché, JADE, Objectivity/DB, ObjectStore, Versant Object Database, ZODB); Ordered key-value store (Berkeley DB, IBM Informix C-ISAM, MemcacheDB, NMDB); Tabular (BigTable, HBase, Hypertable, Mnesia); Tuple store (Apache River).

    As it is not the authors' intention to provide a NoSQL taxonomy in this paper, we will not dwell further on the reasoning the two sources used to reach their results.

    It is easy to see that the two taxonomies, although seemingly using the same criterion (the manner of implementation), provide different results (products which are in the same category in one taxonomy are listed in separate categories in the other one, and the category labels and divisions are different).

    For this reason we decided to use as grouping criteria, instead of a single property, an ad-hoc set composed of: main intended usage, manner of implementation, ease of obtaining and testing. We only searched for open-source solutions having roughly the same number of “users” (we mean implementations in use), with more or less the same size for the average and the largest installation and, if possible, with the same intended use.

    As such, from the multitude of NoSQL solutions available we restricted our research to a single type of NoSQL databases (meaning the “Wide Column Store / Column Families” subtype from the first taxonomy, which is roughly equivalent to the “Key-value store” type from the second taxonomy) and from this set we took two of the products which are most widely used at the present moment. The result was that we took into consideration for this study only HBase and Cassandra (which, besides the qualities given earlier, are also products from the same family and based on the same framework, Hadoop).

    As some description of the selected solutions may be in order, here it is:

    “The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.”[20]

    “HBase is an open-source, distributed, versioned, column-oriented store modeled after Google's Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop.”[21]

    “The Apache Cassandra Project develops a highly scalable second-generation distributed database, bringing together Dynamo's fully distributed design and Bigtable's ColumnFamily-based data model.”[19]

    As a reference element we also took MySQL (also open-source, but fully relational and SQL capable) to see what is lost and what is gained by using a NoSQL solution instead of a “classic” one.
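The disagreement between the two taxonomies can be made concrete with a small sketch. The Python snippet below (an illustration added here, not part of either source) encodes a handful of the products listed in section II under the category labels given by [8] and the subtype labels given by [23], then reports the pairs of products that share a category in the first taxonomy but are placed under different labels in the second. The names EDLICH, WIKI, category_of and split_pairs are hypothetical, and only a small subset of the products is included.

```python
from itertools import combinations

# Subset of the taxonomy from [8]: category -> products (illustrative selection).
EDLICH = {
    "Wide Column Store / Column Families": {"HBase", "Cassandra", "Hypertable"},
    "Key Value / Tuple Store": {"Riak", "Redis", "Mnesia", "Hibari"},
}

# Subset of the taxonomy from [23]: key-value subtype -> products.
WIKI = {
    "Eventually-consistent key-value store": {"Cassandra", "Hibari", "Riak"},
    "Key-value cache in RAM": {"Redis"},
    "Tabular": {"HBase", "Hypertable", "Mnesia"},
}


def category_of(taxonomy):
    """Invert {category: products} into {product: category}."""
    return {p: c for c, products in taxonomy.items() for p in products}


def split_pairs(tax_a, tax_b):
    """Return pairs that share a category in tax_a but not in tax_b."""
    where_b = category_of(tax_b)
    pairs = set()
    for products in tax_a.values():
        # Only compare products that both taxonomies classify.
        known = sorted(p for p in products if p in where_b)
        for x, y in combinations(known, 2):
            if where_b[x] != where_b[y]:
                pairs.add((x, y))
    return sorted(pairs)


if __name__ == "__main__":
    for x, y in split_pairs(EDLICH, WIKI):
        print(f"{x} / {y}: same category in [8], different labels in [23]")
```

On this subset, Cassandra and HBase share the Wide Column Store category in [8], while [23] lists them under the eventually-consistent and tabular subtypes respectively.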
                III.    A QUALITATIVE POINT OF VIEW

    One can compare items based on qualitative or quantitative criteria. We start by comparing which features are available in the databases taken into account. The features we searched for are:

   •    Persistence (1)

   •    Replication (2)

   •    High Availability (3)

   •    Transactions (4)

   •    Rack-locality awareness (5)

   •    Implementation Language (6)

   •    Influences / sponsors (7)

   •    License type (8)

    The results are given in the following table. One can see that the three products offer the same features, the only differences being the ones related to transactions, implementation language and license type (although the other features are not implemented in the same way and do not work in the same way). The dual licensing solution now available for MySQL is a result of the series of acquisitions from the last few years (Sun bought MySQL, Oracle bought Sun).

TABLE I.         A COMPARATIVE TABLE WITH THE FEATURES OF THE THREE SELECTED PRODUCTS

Feat.                         Cassandra                                      HBase                            MySQL
1 (Persistence)               yes                                            yes                              yes (using a different type of connection than the typical one)
2 (Replication)               yes                                            yes                              yes
3 (High availability)         distributed                                    distributed                      distributed, available with MySQL Cluster
4 (Transactions)              eventually consistent                          locally (row-level) consistent   consistent (full ACID actually)
5 (Rack-locality awareness)   yes (inherited from Hadoop)                    yes (inherited from Hadoop)      yes (with MySQL Cluster)
6 (Implementation language)   Java                                           Java                             ANSI C / ANSI C++
7 (Influences / sponsors)     Dynamo and BigTable, Facebook/Digg/Rackspace   BigTable                         Oracle
8 (License type)              Apache 2.0                                     Apache 2.0                       GPL+FLOSS / proprietary

                IV.     A QUANTITATIVE POINT OF VIEW

    For quantitative evaluation criteria we used two different sets, one related to size and one related to performance.

A. Common installation size measurements

    The information used for the size related criteria is mainly taken from [19] and [22], but also from various other sources. No values are given for MySQL, as the NoSQL products are specially designed for large databases, so there is no point in comparing them with MySQL (it is common knowledge that the largest MySQL installations cannot be larger than, let's say, 1 million records of average size without memory caching and extended sharding; over that limit information retrieval becomes too slow to be useful in any situation [15]).

    There is no official measurement unit for the size of a DB installation but we can take several factors into account:

   •    Number of records / rows / documents stored: [22] gives values of 6 to 450 million records for different installations of HBase, most of them being in the range of 6 to 25 million records; various sources give sizes of 2 to 150 million records for diverse installations of Cassandra;

   •    Number of nodes in an installation: [22] gives values of 5 to 110 nodes for HBase, most of them being in the range of 6 to 20 nodes; 4 to 150 nodes for Cassandra, with most installations in the span of 5 to 25 nodes;

   •    Total size of the installations: less documented; some instances show maximal sizes for current installations of 140 TB for HBase and 150 TB for Cassandra.

B. Performance measurements

    Most of the data in the following paragraphs and figures is obtained from [2], which describes a laboratory based benchmark that uses YCSB (Yahoo! Cloud Serving Benchmark) as a measurement tool (more on YCSB can be found at [25]). The benchmark was run on equivalent installations of the products, each holding 120 million small records (1 kB each, 0.12 TB in total) on 6 nodes.

  1) Performance in a write intensive environment (the number of writes is equal to the number of reads)

    The performance achieved can be seen in Figures 1 and 2.

   Figure 1. Read latency in a write intensive environment (source: [2])
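The benchmark dimensions quoted in section IV.B (120 million records of 1 kB on 6 nodes, 0.12 TB in total) can be checked with a few lines of arithmetic. The sketch below assumes decimal units (1 kB = 1000 bytes, 1 TB = 10^12 bytes) and, for the per-node figures, an even distribution of records with no replication; neither assumption is stated in [2], and the function names are illustrative.

```python
# Figures quoted from the benchmark description in section IV.B.
RECORDS = 120_000_000
RECORD_SIZE_BYTES = 1_000  # "1kB", read as a decimal kilobyte (assumption)
NODES = 6


def dataset_size_tb(records, record_size_bytes):
    """Total raw data size in decimal terabytes."""
    return records * record_size_bytes / 1e12


def per_node_share(records, record_size_bytes, nodes):
    """Records and gigabytes per node, assuming even spread and no replication."""
    records_per_node = records // nodes
    gb_per_node = records * record_size_bytes / nodes / 1e9
    return records_per_node, gb_per_node


if __name__ == "__main__":
    total_tb = dataset_size_tb(RECORDS, RECORD_SIZE_BYTES)
    recs, gb = per_node_share(RECORDS, RECORD_SIZE_BYTES, NODES)
    print(f"total: {total_tb:.2f} TB")
    print(f"per node: {recs} records, {gb:.1f} GB")
```

Under these assumptions the total matches the 0.12 TB stated for the benchmark, and each node would hold about 20 million records.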
   Figure 2. Write latency in a write intensive environment (source: [2])

    The latency for both reading and writing in Figures 1 and 2 is given as a function of the number of operations per second.

    The two figures indicate that:

   •    Above approximately 7000 read or write operations per second both MySQL and its variation called Sherpa become unresponsive: the latency becomes too great for a real life application;

   •    The write performance of HBase is greatly improved by the fact that it commits to memory (and not directly to disk as the other products do). [2] indicates that the write performance of Cassandra, Sherpa and MySQL can also be improved by using a log disk.

  2) Performance in a read intensive environment (the read operations account for 95% of the total number of operations)

    Studying Figures 3 and 4, one can see that:

   •    In a read intensive environment, MySQL and its Sherpa variation offer better results, keeping pace with the NoSQL products (although, taking into account that the benchmark database was not really large, we do not think that this trend will hold for larger installations);

   •    HBase again stands out, obtaining a very good write performance by committing to memory.

   Figure 3. Read latency in a read intensive environment (source: [2])

   Figure 4. Write latency in a read intensive environment (source: [2])

                       V.     CONCLUSIONS

    Although the SQL and the NoSQL databases share some features, their behaviors are not similar in given instances. This suggests that they cannot be used interchangeably for solving any type of problem; one should rather choose between the two types of databases for a given instance.

                       REFERENCES

[1]  Agrawal, Rakesh et al., “The Claremont report on database research”, http://doi.acm.org/10.1145/1462571.1462573, SIGMOD Record (ACM) 37 (3): 9–19. ISSN 0163-5808.
[2]  Cooper, Brian F., “Yahoo! Cloud Serving Benchmark”, http://research.yahoo.com/files/ycsb-v4.pdf, (unpublished).
[3]  Bucur, Cristian; Tudorica, Bogdan George, “Solutions for working with large data volumes in web applications”, The Proceedings of the IE 2011 “Education, Research & Business Technologies” International Conference, 5-7 May 2011, (in press).
[4]  Chang, Fay, et al., “Bigtable: A Distributed Storage System for Structured Data”, http://labs.google.com/papers/bigtable-osdi06.pdf, Google, (unpublished).
[5]  Cook, John D., “ACID versus BASE for database transactions”, http://www.johndcook.com/blog/2009/07/06/brewer-cap-theorem-base/.
[6]  Cooper, Brian F.; Silberstein, Adam; Tam, Erwin; Ramakrishnan, Raghu; Sears, Russell, “Yahoo! cloud serving benchmark”, http://research.yahoo.com/files/ycsb.pdf, ACM Symposium on Cloud Computing, ACM, Indianapolis, IN, USA (2010).
[7]  De Sterck, Hans; Zhang, Chen, “Supporting multi-row distributed transactions with global snapshot isolation using bare-bones Hbase”, http://www.cs.uwaterloo.ca/~c15zhang/ZhangDeSterckGrid2010.pdf, The 11th ACM/IEEE International Conference on Grid Computing (Grid 2010), Oct 25-29, 2010, Brussels, Belgium.
[8]  Edlich, Stefan, “NoSQL, your ultimate guide to the non - relational universe!”, http://nosql-database.org/, (unpublished).
[9]  Eure, Ian, “Looking to the future with Cassandra | Digg about”, http://about.digg.com/blog/looking-future-cassandra, About.digg.com, 2009-09-09, (unpublished).
[10] Hamilton, James, “One size does not fit all”, http://perspectives.mvdirona.com/CommentView,guid,afe46691-a293-4f9a-8900-5688a597726a.aspx, (unpublished).
[11] Kellerman, Jim, “HBase: structured storage of sparse data for Hadoop”, http://blog.rapleaf.com/wp-content/uploads/2007/12/hbase.pdf, (unpublished).
[12] Lakshman, Avinash; Malik, Prashant, “Cassandra, a decentralized structured storage system”, http://www.cs.cornell.edu/projects/ladis2009/papers/lakshman-ladis2009.pdf, Cornell University, (unpublished).
[13] Lakshman, Avinash; Malik, Prashant, “Cassandra, Structured storage system over a P2P network”, http://static.last.fm/johan/nosql-20090611/cassandra_nosql.pdf, (unpublished).
[14] Peng, Daniel; Dabek, Frank, “Large-scale incremental processing using distributed transactions and notifications”, http://www.usenix.org/events/osdi10/tech/full_papers/Peng.pdf, The 9th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2010), Oct 4–6, 2010, Vancouver, BC, Canada.
[15] Peters, Mike, “How to install Cassandra + Thrift (and why you should care)”, http://www.softwareprojects.com/resources/programming/t-how-to-install-cassandra-+-thrift-and-why-you-shou-1956.html, (unpublished).
[16] Stack, Michael, “HBasics: an introduction to Hadoop Hbase”, http://static.last.fm/johan/huguk-20090414/michael_stack-hbase.pdf, HUGUK, April 14th, 2009.
[17] Wei, Zhou; Pierre, Guillaume; Chi, Chi-Hung, “CloudTPS: scalable transactions for web applications in the cloud”, http://www.globule.org/publi/CSTWAC_ircs53.html, Technical report IR-CS-53, Vrije Universiteit, February 2010, to be published at IEEE Transactions on Services Computing, 2011 (in press).
[18] Wei, Zhou; Pierre, Guillaume; Chi, Chi-Hung, “Consistent join queries in cloud data stores”, http://www.globule.org/publi/CJQCDS_ircs68.html, Technical report IR-CS-68, Vrije Universiteit, January 2011 (unpublished).
[19] ***, “Cassandra”, http://cassandra.apache.org, (unpublished).
[20] ***, “Hadoop”, http://hadoop.apache.org, (unpublished).
[21] ***, “Hbase”, http://hbase.apache.org, (unpublished).
[22] ***, “Hbase / Powered by”, http://wiki.apache.org/hadoop/Hbase/PoweredBy, (unpublished).
[23] ***, “NoSQL”, http://en.wikipedia.org/wiki/NoSQL, (unpublished).
[24] ***, “The next generation cloud database”, http://www.microsoft.com/windowsazure/sqlazure/database/, (unpublished).
[25] ***, “Yahoo! Cloud Serving Benchmark (YCSB)”, https://github.com/brianfrankcooper/YCSB/wiki, (unpublished).

EVALUATION CRITERIA FOR SELECTING NOSQL DATABASES IN A SINGLE-BOX ENVIRONMENT
UNIT-4 NOTES.pptx for engagement ring start kr dena
NO SQL: What, Why, How
NoSQL BIg Data Analytics Mongo DB and Cassandra .pdf
Know what is NOSQL
NoSQL in Big Data Analytics Tools .pptx
Datastores
Big data technology unit 3
Big Data technology Landscape
Functional Dependencies and Normalization for Relational Databases
Introduction of Redis as NoSQL Database

More from João Gabriel Lima (20)

PDF
Cooking with data
PDF
Deep marketing - Indoor Customer Segmentation
PDF
Aplicações de Alto Desempenho com JHipster Full Stack
PDF
Realidade aumentada com react native e ARKit
PDF
PDF
Big data e Inteligência Artificial
PDF
Mineração de Dados no Weka - Regressão Linear
PDF
Segurança na Internet - Estudos de caso
PDF
Segurança na Internet - Google Hacking
PDF
Segurança na Internet - Conceitos fundamentais
PDF
Web Machine Learning
PDF
Mineração de Dados com RapidMiner - Um Estudo de caso sobre o Churn Rate em...
PDF
Mineração de dados com RapidMiner + WEKA - Clusterização
PDF
Mineração de dados na prática com RapidMiner e Weka
PDF
Visualizacao de dados - Come to the dark side
PDF
REST x SOAP : Qual abordagem escolher?
PDF
Game of data - Predição e Análise da série Game Of Thrones a partir do uso de...
PDF
E-trânsito cidadão - IPVA em suas mãos
PPTX
[Estácio - IESAM] Automatizando Tarefas com Gulp.js
PDF
Hackeando a Internet das Coisas com Javascript
Cooking with data
Deep marketing - Indoor Customer Segmentation
Aplicações de Alto Desempenho com JHipster Full Stack
Realidade aumentada com react native e ARKit
Big data e Inteligência Artificial
Mineração de Dados no Weka - Regressão Linear
Segurança na Internet - Estudos de caso
Segurança na Internet - Google Hacking
Segurança na Internet - Conceitos fundamentais
Web Machine Learning
Mineração de Dados com RapidMiner - Um Estudo de caso sobre o Churn Rate em...
Mineração de dados com RapidMiner + WEKA - Clusterização
Mineração de dados na prática com RapidMiner e Weka
Visualizacao de dados - Come to the dark side
REST x SOAP : Qual abordagem escolher?
Game of data - Predição e Análise da série Game Of Thrones a partir do uso de...
E-trânsito cidadão - IPVA em suas mãos
[Estácio - IESAM] Automatizando Tarefas com Gulp.js
Hackeando a Internet das Coisas com Javascript

Recently uploaded (20)

PDF
gpt5_lecture_notes_comprehensive_20250812015547.pdf
PDF
MIND Revenue Release Quarter 2 2025 Press Release
PPTX
A Presentation on Artificial Intelligence
PDF
Reach Out and Touch Someone: Haptics and Empathic Computing
PPTX
Machine Learning_overview_presentation.pptx
PPTX
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
PDF
Spectral efficient network and resource selection model in 5G networks
PDF
Electronic commerce courselecture one. Pdf
PDF
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
PPTX
20250228 LYD VKU AI Blended-Learning.pptx
DOCX
The AUB Centre for AI in Media Proposal.docx
PDF
The Rise and Fall of 3GPP – Time for a Sabbatical?
PDF
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
PDF
Empathic Computing: Creating Shared Understanding
PPTX
Big Data Technologies - Introduction.pptx
PPTX
Spectroscopy.pptx food analysis technology
PPTX
Cloud computing and distributed systems.
PDF
Profit Center Accounting in SAP S/4HANA, S4F28 Col11
PDF
Agricultural_Statistics_at_a_Glance_2022_0.pdf
PDF
Network Security Unit 5.pdf for BCA BBA.
gpt5_lecture_notes_comprehensive_20250812015547.pdf
MIND Revenue Release Quarter 2 2025 Press Release
A Presentation on Artificial Intelligence
Reach Out and Touch Someone: Haptics and Empathic Computing
Machine Learning_overview_presentation.pptx
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
Spectral efficient network and resource selection model in 5G networks
Electronic commerce courselecture one. Pdf
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
20250228 LYD VKU AI Blended-Learning.pptx
The AUB Centre for AI in Media Proposal.docx
The Rise and Fall of 3GPP – Time for a Sabbatical?
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
Empathic Computing: Creating Shared Understanding
Big Data Technologies - Introduction.pptx
Spectroscopy.pptx food analysis technology
Cloud computing and distributed systems.
Profit Center Accounting in SAP S/4HANA, S4F28 Col11
Agricultural_Statistics_at_a_Glance_2022_0.pdf
Network Security Unit 5.pdf for BCA BBA.

A comparison between several NoSQL databases with comments and notes

Bogdan George Tudorica, Cristian Bucur
Department for Economical Mathematics and Economical Informatics
Petroleum-Gas University of Ploiesti
Ploiesti, Romania
tudorica_bogdan@yahoo.com

Abstract: This paper comments on the various NoSQL (Not only Structured Query Language) systems and compares them using multiple criteria. NoSQL databases were created as a means to offer high performance (both in terms of speed and size) and high availability, at the price of losing the ACID (Atomic, Consistent, Isolated, Durable) properties of traditional databases and keeping only the weaker BASE (Basic Availability, Soft state, Eventual consistency) properties. It remains to be seen which of the many solutions created since the official appearance of the NoSQL concept (the term was defined in 1998 and reintroduced in 2009, around which time several NoSQL solutions emerged; at present more than 120 such solutions are known) really deliver on these promises of higher performance, although several of them are already used with very good results.

Keywords: database; NoSQL; performance; comparison

I. INTRODUCTION

The concept described by the term NoSQL (meaning a database system which is distributed, may not require fixed table schemas, usually avoids join operations, typically scales horizontally, does not expose a SQL interface and may be open source [1]; some even use the term to mean a completely non-relational system) is also referred to by the more academic sources as a form of structured storage [4][10][11][12]. The terms may not be equivalent, however: relational databases also satisfy the official definition of structured storage, and they are in some sense the opposite of NoSQL.

One cannot simply label the terms RDBMS and NoSQL as exact opposites. There even exist some middleware appliances (such as CloudTPS for Google’s BigTable and Amazon’s SimpleDB [17]) and various other solutions (such as Percolator for Google’s BigTable [14] and an unnamed prototype system for HBase [7]) which add full ACID features to some NoSQL systems.

It is certain that NoSQL databases are one of the byproducts of the Web 2.0 era. They came into real use only when the designers of web services with very large numbers of users discovered that traditional relational database management systems (RDBMS) are fit either for small but frequent read/write transactions or for large batch transactions with rare write accesses, and not for heavy read/write workloads (which is often the case for these large scale web services, such as Google, Amazon, Facebook and Yahoo).

It seems that at least some of the major RDBMS producers are learning something from this evolution. For example, Microsoft introduced some NoSQL-type features such as snapshot isolation, although limited to a single table, into its newer RDBMS product labeled Azure; Oracle 11g also contains a similar facility called Oracle Streams, which is limited in a similar way, this time to a single instance [7].

II. WHAT DO WE COMPARE

In order to compare a set of NoSQL solutions, the first step should be to select and classify some products which fulfill similar purposes or have similar qualities and features.

For the moment there is no official taxonomy for this kind of software, although several attempts do exist.

The first one is provided by Stefan Edlich on his page [8] and gives the following categories:

A. Core NoSQL Systems, most of them created as component systems for Web 2.0 services, with the following subtypes:

• Wide Column Store / Column Families (Hadoop / HBase, Cassandra, Hypertable, Cloudata, Amazon SimpleDB, SciDB),
• Document Store (CouchDB, MongoDB, Terrastore, ThruDB, OrientDB, RavenDB, Citrusleaf, SisoDB, CloudKit, Persevere, Jackrabbit),
• Key Value / Tuple Store (Azure Table Storage, MEMBASE, Riak, Redis, Chordless, GenieDB, Scalaris, Tokyo Cabinet / Tyrant, GT.M, Keyspace, Berkeley DB, MemcacheDB, HamsterDB, Faircom C-Tree, Mnesia, LightCloud, Pincaster, Hibari, Scality),
• Eventually Consistent Key Value Store (Amazon Dynamo, Voldemort, Dynomite, KAI, SubRecord, Mo8onDb, Dovetaildb),
• Graph Databases (Neo4J, Infinite Graph, Sones, InfoGrid, HyperGraphDB, Trinity, AllegroGraph, Bigdata, DEX, OpenLink Virtuoso, VertexDB, FlockDB, Java Universal Network / Graph Framework, Sesame, Filament, OWLim, NetworkX, iGraph),

B. Soft NoSQL Systems, most of them older or newer systems which are not related to any Web 2.0 service but share the traits described as NoSQL characteristics (authors' note: some of them have strong ACID / relational capabilities and, for this reason, may be misplaced in a list of NoSQL systems; further analysis may be needed on this subject), with the following subtypes:

• Object Databases (db4o, Versant, Objectivity, Gemstone, Progress, Starcounter, Perst, ZODB, NEO, PicoLisp, Sterling, StupidDB, KiokuDB, Durus),
• Grid & Cloud Database Solutions (GigaSpaces, Queplix, Hazelcast, Joafip, GridGain, Infinispan, Coherence, eXtremeScale),
• XML Databases (Mark Logic Server, EMC Documentum xDB, Tamino, eXist, Sedna, BaseX, Xindice, Qizx, Berkeley DB XML),
• Multivalue Databases (U2, OpenInsight, OpenQM, Globals),
• Other NoSQL related databases (IBM Lotus/Domino, Intersystems Cache, eXtremeDB, ISIS Family, Prevayler, Yserial).

Another taxonomy is provided by an unknown author on a wiki page [23] and gives the following categories of NoSQL databases:

• Document store (Apache Jackrabbit, Apache CouchDB, Lotus Notes, MongoDB, MarkLogic Server, eXist, SimpleDB, Terrastore),
• Graph (AllegroGraph, Neo4j, DEX, FlockDB),
• Key-value store, with the following subtypes: Eventually-consistent key-value store (Cassandra, Dynamo, Hibari, Project Voldemort, Riak), Hierarchical key-value store (GT.M), Hosted services (Freebase), Key-value cache in RAM (Citrusleaf database, memcached, Oracle Coherence, Redis, Tuple space, Velocity), Key-value stores implementing the Paxos algorithm (Keyspace), Key-value stores on disk (BigTable, CDB, Citrusleaf database, Dynomite, Keyspace, membase, MemcacheDB, Redis, Tokyo Cabinet, TreapDB, Tuple space, MongoDB), Multivalue databases (Extensible Storage Engine - ESE/NT, OpenQM, Revelation Software's OpenInsight, Rocket U2), Object database (db4o, GemStone/S, InterSystems Caché, JADE, Objectivity/DB, ObjectStore, Versant Object Database, ZODB), Ordered key-value store (Berkeley DB, IBM Informix C-ISAM, MemcacheDB, NMDB), Tabular (BigTable, HBase, Hypertable, Mnesia), Tuple store (Apache River).

As it is not the authors’ intention to provide a NoSQL taxonomy in this paper, we will not dwell further on the reasoning the two sources used to obtain their results.

It is easy to see that the two taxonomies, although seemingly using the same criterion (the manner of implementation), provide different results: products which are in the same category in one taxonomy are listed in separate categories in the other, and the category labels and divisions are different.

For this reason we decided to use as grouping criteria, instead of a single property, an ad-hoc set composed of: main intended usage, manner of implementation, and ease of obtaining and testing. We only searched for open-source solutions having roughly the same number of “users” (meaning implementations in use), more or less the same size for the average and the largest installation and, if possible, the same intended use.

As such, from the multitude of NoSQL solutions available we restricted our research to a single type of NoSQL database (the “Wide Column Store / Column Families” subtype from the first taxonomy, which is roughly equivalent to the “Key-value store” type from the second taxonomy), and from this set we took two of the products which are most widely used at present. The result was that we took into consideration for this study only HBase and Cassandra (which, besides the qualities given earlier, are also products from the same family and based on the same framework, Hadoop).

As some description of the selected solutions may be in order, here it is:

“The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.” [20]

“HBase is an open-source, distributed, versioned, column-oriented store modeled after Google's Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop.” [21]

“The Apache Cassandra Project develops a highly scalable second-generation distributed database, bringing together Dynamo's fully distributed design and Bigtable's ColumnFamily-based data model.” [19]

As a reference element we also took MySQL (also open-source, but fully relational and SQL-capable) to see what is lost and what is gained by using a NoSQL solution instead of a “classic” one.
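The mismatch between the two taxonomies described in this section can be made concrete with a small sketch. It uses only a handful of products copied from the two lists; the dictionary names, the function name `split_pairs`, and the shortened category labels are illustrative assumptions, and "Voldemort" stands for both "Voldemort" in [8] and "Project Voldemort" in [23].

```python
from itertools import combinations

# Excerpt of the taxonomy from [8]: product -> category (illustrative subset).
EDLICH = {
    "Cassandra": "Wide Column Store",
    "Hypertable": "Wide Column Store",
    "SimpleDB": "Wide Column Store",
    "Riak": "Key Value / Tuple Store",
    "Hibari": "Key Value / Tuple Store",
    "GT.M": "Key Value / Tuple Store",
    "Dynomite": "Eventually Consistent Key Value Store",
    "Voldemort": "Eventually Consistent Key Value Store",
}

# Excerpt of the taxonomy from [23] for the same products.
WIKI = {
    "Cassandra": "Eventually-consistent key-value store",
    "Hypertable": "Tabular",
    "SimpleDB": "Document store",
    "Riak": "Eventually-consistent key-value store",
    "Hibari": "Eventually-consistent key-value store",
    "GT.M": "Hierarchical key-value store",
    "Dynomite": "Key-value store on disk",
    "Voldemort": "Eventually-consistent key-value store",
}


def split_pairs(a, b):
    """Return pairs of products that taxonomy `a` puts in one category
    but taxonomy `b` puts in different categories."""
    common = sorted(set(a) & set(b))
    return [
        (p, q)
        for p, q in combinations(common, 2)
        if a[p] == a[q] and b[p] != b[q]
    ]


if __name__ == "__main__":
    for p, q in split_pairs(EDLICH, WIKI):
        print(f"{p} and {q}: together in [8], apart in [23]")
```

Even on this small subset, most pairs that share a category in one taxonomy are separated in the other, which is the reason a single classification property was not used as the selection criterion.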
III. A QUALITATIVE POINT OF VIEW

Items can be compared on qualitative or quantitative criteria. We start by comparing which features are available in the NoSQL databases taken into account. The features we searched for are:

• Persistence (1)
• Replication (2)
• High Availability (3)
• Transactions (4)
• Rack-locality awareness (5)
• Implementation Language (6)
• Influences / sponsors (7)
• License type (8)

The results are given in the following table. One can see that the three products offer the same features, the only differences being the ones related to transactions, implementation language and license type (although the shared features are not implemented, and do not work, in the same way). The dual licensing solution now available for MySQL is a result of the series of acquisitions of the last few years (Sun bought MySQL, Oracle bought Sun).

TABLE I. A COMPARATIVE TABLE WITH THE FEATURES OF THE THREE SELECTED PRODUCTS

Feat. | Cassandra                                    | HBase                          | MySQL
1     | yes                                          | yes                            | yes (using a different type of connection than the typical one)
2     | yes                                          | yes                            | yes
3     | distributed                                  | distributed                    | distributed, available with MySQL Cluster
4     | eventually consistent                        | locally consistent (row-level) | consistent (full ACID actually)
5     | yes (inherited from Hadoop)                  | yes (inherited from Hadoop)    | yes (with MySQL Cluster)
6     | Java                                         | Java                           | ANSI C / ANSI C++
7     | Dynamo and BigTable, Facebook/Digg/Rackspace | BigTable                       | Oracle
8     | Apache 2.0                                   | Apache 2.0                     | GPL+FLOSS / proprietary

IV. A QUANTITATIVE POINT OF VIEW

For the quantitative evaluation we used two different sets of criteria, one related to size and one related to performance.

A. Common installation size measurements

The information used for the size-related criteria is mainly taken from [19] and [22], but also from various other sources. No values are given for MySQL: the NoSQL products are specially designed for large databases, so there is no point in comparing them with MySQL on size (according to [15], the largest MySQL installations cannot grow beyond roughly 1 million records of average size without memory caching and extended sharding; over that limit information retrieval becomes too slow to be useful).

There is no official measurement unit for the size of a database installation, but we can take several factors into account:

• Number of records / rows / documents stored: [22] gives values of 6 to 450 million records for different installations of HBase, most of them in the range of 6 to 25 million records; various sources give sizes of 2 to 150 million records for diverse installations of Cassandra;
• Number of nodes in an installation: [22] gives values of 5 to 110 nodes for HBase, most of them in the range of 6 to 20 nodes; Cassandra installations have 4 to 150 nodes, with most in the span of 5 to 25 nodes;
• Total size of the installations: less documented; some instances show maximal sizes for current installations of 140 TB for HBase and 150 TB for Cassandra.

B. Performance measurements

Most of the data in the following paragraphs and figures is obtained from [2], which describes a laboratory benchmark that uses YCSB (Yahoo! Cloud Serving Benchmark) as a measurement tool (more on YCSB can be found at [25]). The benchmark was run on installations of the three products holding 120 million small records (1 kB each) on 6 nodes, equivalent to 0.12 TB.

1) Performance in a write intensive environment (the number of writes is equal to the number of reads)

The performance achieved can be seen in Figures 1 and 2.

Figure 1. Read latency in a write intensive environment (source: [2])
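As a quick sanity check on the benchmark configuration quoted from [2] (120 million records of 1 kB on 6 nodes), the sketch below recomputes the total data volume. It assumes decimal units (1 kB = 1000 bytes); the per-node value additionally assumes an even spread of records with no replication, which the source does not state.

```python
RECORDS = 120_000_000       # records loaded in the benchmark
RECORD_SIZE_BYTES = 1_000   # "1 kB" read as 1000 bytes (assumption: decimal units)
NODES = 6                   # nodes per installation

total_bytes = RECORDS * RECORD_SIZE_BYTES
total_tb = total_bytes / 1e12

# Assumption: records spread evenly, replication ignored.
per_node_gb = total_bytes / NODES / 1e9

print(f"total: {total_tb:.2f} TB, per node: {per_node_gb:.0f} GB")
# prints "total: 0.12 TB, per node: 20 GB"
```

The total matches the 0.12 TB figure given in the text. Under the stated assumptions each node holds about 20 GB, which is small compared with the 140 to 150 TB installations mentioned in the size measurements.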
Figure 2. Write latency in a write intensive environment (source: [2])

The latency for both reading and writing in Figures 1 and 2 is given as a function of the number of operations per second. The two figures indicate that:

• Above approximately 7000 read or write operations per second, both MySQL and its variation called Sherpa become unresponsive: the latency becomes too great for real-life use.
• The write performance of HBase is greatly improved by the fact that it commits to memory (and not directly to disk, as the other products do). [2] indicates that the write performance of Cassandra, Sherpa and MySQL can also be improved by using a log disk.

2) Performance in a read intensive environment (read operations account for 95% of the total number of operations)

Studying Figures 3 and 4, one can see that:

• In a read intensive environment, MySQL and its Sherpa variation offer better results, keeping pace with the NoSQL products (although, taking into account that the benchmark database was not really large, we do not think that this trend will hold for larger installations);
• HBase again stands out, obtaining very good write performance by committing to memory.

Figure 3. Read latency in a read intensive environment (source: [2])

Figure 4. Write latency in a read intensive environment (source: [2])

V. CONCLUSIONS

Although SQL and NoSQL databases have some shared features, their behaviors are not similar in the given instances. This suggests that they cannot be used interchangeably for solving any type of problem; rather, one should choose between the two types of databases for a given application.

REFERENCES

[1] Agrawal, Rakesh et al., "The Claremont report on database research", http://doi.acm.org/10.1145/1462571.1462573, SIGMOD Record (ACM) 37 (3): 9–19. ISSN 0163-5808,
[2] Cooper, Brian F., “Yahoo! Cloud Serving Benchmark”, http://research.yahoo.com/files/ycsb-v4.pdf, (unpublished)
[3] Bucur, Cristian; Tudorica, Bogdan George, “Solutions for working with large data volumes in web applications”, The Proceedings of the IE 2011 “Education, Research & Business Technologies” International Conference, 5-7 May 2011, (in press),
[4] Chang, Fay, et al., “Bigtable: A Distributed Storage System for Structured Data”, http://labs.google.com/papers/bigtable-osdi06.pdf, Google, (unpublished),
[5] Cook, John D., “ACID versus BASE for database transactions”, http://www.johndcook.com/blog/2009/07/06/brewer-cap-theorem-base/.
[6] Cooper, Brian F.; Silberstein, Adam; Tam, Erwin; Ramakrishnan, Raghu; Sears, Russell, “Yahoo! cloud serving benchmark”, http://research.yahoo.com/files/ycsb.pdf, ACM Symposium on Cloud Computing, ACM, Indianapolis, IN, USA (2010),
[7] De Sterck, Hans, Zhang, Chen, “Supporting multi-row distributed transactions with global snapshot isolation using bare-bones Hbase”, http://www.cs.uwaterloo.ca/~c15zhang/ZhangDeSterckGrid2010.pdf, The 11th ACM/IEEE International Conference on Grid Computing (Grid 2010), Oct 25-29, 2010, Brussels, Belgium
[8] Edlich, Stefan, “NoSQL, your ultimate guide to the non - relational universe!”, http://nosql-database.org/, (unpublished)
[9] Eure, Ian, "Looking to the future with Cassandra | Digg about", http://about.digg.com/blog/looking-future-cassandra, About.digg.com. 2009-09-09, (unpublished),
[10] Hamilton, James, “One size does not fit all”, http://perspectives.mvdirona.com/CommentView,guid,afe46691-a293-4f9a-8900-5688a597726a.aspx, (unpublished),
[11] Kellerman, Jim, "HBase: structured storage of sparse data for Hadoop", http://blog.rapleaf.com/wp-content/uploads/2007/12/hbase.pdf, (unpublished),
[12] Lakshman, Avinash; Malik, Prashant, “Cassandra, a decentralized structured storage system”, http://www.cs.cornell.edu/projects/ladis2009/papers/lakshman-ladis2009.pdf, Cornell University, (unpublished),
[13] Lakshman, Avinash; Malik, Prashant, “Cassandra, Structured storage system over a P2P network”, http://static.last.fm/johan/nosql-20090611/cassandra_nosql.pdf, (unpublished),
[14] Peng, Daniel; Dabek, Frank, “Large-scale incremental processing using distributed transactions and notifications”, http://www.google.ca/url?sa=t&source=web&cd=3&ved=0CCQQFjAC&url=http%3A%2F%2Fwww.usenix.org%2Fevents%2Fosdi10%2Ftech%2Ffull_papers%2FPeng.pdf&rct=j&q=Large-scale%20Incremental%20Processing%20Using%20Distributed%20Transactions%20and%20Notifications&ei=eM24TOYnjqedB_mHmLUN&usg=AFQjCNGGm1Xfaml5lq6Aj1R2BlX7WilIuQ&sig2=ZZcPWxhiMVSnY-DmewIFIg&cad=rja, The 9th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2010), Oct 4–6, 2010, Vancouver, BC, Canada,
[15] Peters, Mike, “How to install Cassandra + Thrift (and why you should care)”, http://www.softwareprojects.com/resources/programming/t-how-to-install-cassandra-+-thrift-and-why-you-shou-1956.html, (unpublished)
[16] Stack, Michael, “HBasics: an introduction to Hadoop Hbase”, http://static.last.fm/johan/huguk-20090414/michael_stack-hbase.pdf, HUGUK, April 14th, 2009,
[17] Wei, Zhou; Pierre, Guillaume; Chi, Chi-Hung, “CloudTPS: scalable transactions for web applications in the cloud”, http://www.globule.org/publi/CSTWAC_ircs53.html, Technical report IR-CS-53, Vrije Universiteit, February 2010, to be published at IEEE Transactions on Services Computing, 2011 (in press),
[18] Wei, Zhou; Pierre, Guillaume; Chi, Chi-Hung, “Consistent join queries in cloud data stores”, http://www.globule.org/publi/CJQCDS_ircs68.html, Technical report IR-CS-68, Vrije Universiteit, January 2011 (unpublished),
[19] ***, “Cassandra”, http://cassandra.apache.org, (unpublished)
[20] ***, “Hadoop”, http://hadoop.apache.org, (unpublished)
[21] ***, “Hbase”, http://hbase.apache.org, (unpublished)
[22] ***, “Hbase / Powered by”, http://wiki.apache.org/hadoop/Hbase/PoweredBy, (unpublished)
[23] ***, “NoSQL”, http://en.wikipedia.org/wiki/NoSQL, (unpublished)
[24] ***, “The next generation cloud database”, http://www.microsoft.com/windowsazure/sqlazure/database/, (unpublished),
[25] ***, “Yahoo! Cloud Serving Benchmark (YCSB)”, https://github.com/brianfrankcooper/YCSB/wiki, (unpublished)