Using NoSQL Databases
for Interactive Applications
By Alexey Diomin and Kirill Grigorchuk
© Altoros Systems
Contents
Introduction
Cassandra, MongoDB, and Couchbase
Key Considerations for Interactive Applications
Performance Benchmarking
Results
Analysis
Conclusion
About the authors
Additional Links
Introduction
Interactive web applications need high performance and scalability, calling for a different kind of
database. If your website is not fast enough, users may quickly abandon it and look for alternatives. For
example, in paid online social games, players are extremely demanding and will drop out even if there is
a slight delay. To deliver the best user experience, you must pick the right database.
Traditional RDBMS are the wrong tool for the job because they do not provide the necessary scalability
and performance for working with large amounts of data and application requests. In contrast, NoSQL
databases have become a viable alternative to RDBMS, particularly for applications that need to change
rapidly. They provide high throughput, low latency, and horizontal scaling. But with so many different
options around, choosing the right NoSQL database for your specific application needs can be tricky.
Recently we took the time to review and benchmark several NoSQL databases. This whitepaper provides
an overview of three popular NoSQL solutions: Cassandra, MongoDB, and Couchbase. In addition, it
presents a vendor-independent performance comparison of these products and can be used as a guide
when choosing a NoSQL database for an interactive application.
Cassandra, MongoDB, and Couchbase
Since we had to pick some NoSQL databases to start with, we looked around for commonly used
open-source NoSQL solutions. Cassandra, Couchbase, and MongoDB seemed to be the most mature
open-source products in their class. If you are already familiar with these NoSQL databases, you might want to
skip the rest of this section and go directly to the performance evaluation.
Cassandra is a distributed columnar key-value database with eventual consistency. It is optimized for
write operations and has no central master: data can be written to or read from any of the nodes in
the cluster. Cassandra provides seamless horizontal scaling and has no single point of failure—if a node
in the cluster fails, another node steps up to replace it. At the moment, Cassandra is an Apache 2.0
licensed project supported by the Apache Community.
MongoDB is a schema-free, document-oriented NoSQL database. In MongoDB, data is stored in the
BSON format. A BSON document is essentially a JSON document represented in a binary format, which
allows for easier and faster integration of data in certain types of applications. This database also
provides horizontal scalability and has no single point of failure. However, a MongoDB cluster is different
from a Cassandra or Couchbase Server cluster: it includes an arbiter, a master, and multiple slaves. As
of 2009, MongoDB is an open-source project with an AGPL license supported by 10gen.
Couchbase is a NoSQL document database. Documents in Couchbase Server are stored as JSON. With
built-in caching, Couchbase provides low-latency read and write operations with linearly scalable
throughput. The architecture has no single point of failure. It is easy to scale out the cluster and support
live cluster topology changes. This means there is no application downtime when you are upgrading your
database, software, or hardware using rolling upgrades. Couchbase, Inc. develops and provides
commercial support for the Couchbase Apache 2.0 licensed project.
Key Considerations for Interactive Applications
Your database is the workhorse for your Web application. When choosing a database, the following factors
are important to keep in mind:
1 Scalability: It’s hard to predict when your application needs to scale, but when your website
traffic suddenly spikes and your database does not have enough capacity, you need to scale your
database quickly, on demand, and without any application changes. Similarly, when your system
is idle, you should be able to reduce the amount of resources used. Scaling your
database must be a simple operation—you should not need to deal with complicated procedures
or make any changes to your application.
In this paper, we will only speak about horizontal scalability, which involves dividing a system into
small structural components hosted on different physical machines (or groups of machines)
and/or increasing the number of servers that perform the same function in parallel.
a Cassandra meets the requirements of an ideal horizontally scalable system. Nodes can
be added seamlessly as you need more capacity. The cluster automatically utilizes the
new resources. A node can be decommissioned in automatic or semi-automatic mode.
b Couchbase scales horizontally. All nodes are identical and easy to set up. Nodes can be
added or removed from the cluster with a single button click and no changes to the
application. Auto-sharding evenly distributes data across all nodes in the cluster without
any hotspots. Cross datacenter replication makes it possible to scale a cluster across
datacenters for better data locality and faster data access.
c MongoDB has a number of features related to scalability. These include
automatic sharding (auto-partitioning of data across servers), reads and writes distributed
over shards, and eventually consistent reads that can be distributed over replicated
servers. When the system is idle, cluster size can only be decreased manually. The
administrator uses the management console to change the system’s configuration. After
that, the MongoDB server process can be safely stopped on the vacant machines.
2 Performance: Interactive applications require very low read and write latencies. The database
must deliver consistently low latencies for read and write operations independent of load or the
size of data being accessed. In general, the read and write latency of NoSQL databases is very
low because data is shared across all the nodes in a cluster while the application’s working set is
in memory.
Interactive applications need to support millions of users and have different workloads—read,
write, or mixed. In the next section, we share some performance test results on different NoSQL
databases measuring latency versus varying levels of throughput.
3 Availability: Interactive Web applications need a database that is highly available. If your
application is down, you simply are not making any money. To ensure high availability, your
solution should be able to do online upgrades to the latest version, easily remove a node for
maintenance without affecting the availability of the cluster, handle online operations, such as
backups, and provide disaster recovery, if an entire datacenter goes down.
Below are examples of how availability is achieved in different NoSQL databases:
a Cassandra: Every node in a Cassandra cluster, or “ring”, is given a range of data for
which it is responsible. When Cassandra receives a write operation designated to be
stored in a node that has failed, it will automatically route the write request to a node that
is alive. The node that receives the write request saves the write operation with a hint.
The hint is a message that contains information about the failed node that should have
handled the write request. The node that holds the hint monitors the node ring for the
recovery of the failed node that missed the write request. If the failed node comes back
online, the node that holds the hint will hand off the hint message to the recovered node,
so that the write requests can be persisted in their proper location. When a new node is
added to the cluster, the workload is distributed to this new node as well.
b Couchbase: Couchbase Server maintains multiple copies (up to 3 replicas) of each
document in a cluster. Each server is identical and serves active and replica documents.
Data is uniformly distributed across all the nodes and the clients are aware of the
topology. If a node in the cluster fails, Couchbase Server detects the failure and
promotes replica documents on other live nodes to active. The client cluster map is
updated to reflect the new topology, so the application continues to work without
downtime. When capacity is added, data is rebalanced automatically, also without any
downtime.
c MongoDB: Data in MongoDB is spread across several shards. Typically, each shard
(replica set) consists of multiple mongo daemon instances, including an arbiter node, a
master node, and multiple slaves. If a slave node fails, the workload is automatically
redistributed among the rest of the slave nodes. If the master node
crashes, the arbiter node takes part in electing a new master from the remaining nodes. If the arbiter node fails and there are no
instances left in the shard, the shard is dead. In MongoDB, a replica set can span
multiple datacenters, but writes can only go to one primary instance in one datacenter
(master-slave replication).
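The hinted handoff described for Cassandra above can be illustrated with a toy model. This is only a sketch under simplifying assumptions: `Node`, `ToyRing`, and the key placement rule are hypothetical names and logic invented for the example, not Cassandra's actual classes or partitioning scheme.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node in a toy cluster (hypothetical model, not Cassandra's API)."""
    name: str
    alive: bool = True
    data: dict = field(default_factory=dict)
    # Hints held on behalf of other nodes: (target_name, key, value).
    hints: list = field(default_factory=list)


class ToyRing:
    def __init__(self, names):
        self.nodes = {n: Node(n) for n in names}
        self.order = list(names)

    def owner_of(self, key):
        # Simplified placement (an assumption): sum of key bytes modulo node count.
        return self.order[sum(key.encode()) % len(self.order)]

    def write(self, key, value):
        owner = self.nodes[self.owner_of(key)]
        if owner.alive:
            owner.data[key] = value
            return owner.name
        # The owner is down: a live node accepts the write and keeps a hint.
        stand_in = next(n for n in self.nodes.values() if n.alive)
        stand_in.hints.append((owner.name, key, value))
        return stand_in.name

    def recover(self, name):
        # When the failed node returns, hint holders hand their hints off to it.
        node = self.nodes[name]
        node.alive = True
        for holder in self.nodes.values():
            remaining = []
            for target, key, value in holder.hints:
                if target == name:
                    node.data[key] = value
                else:
                    remaining.append((target, key, value))
            holder.hints = remaining
```

In this model, a write whose owner is down lands on a stand-in node together with a hint, and `recover` replays the hint so that the write ends up in its proper location.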
4 Ease of development: Relational databases require a rigid schema to model an application. If
your application changes, your database schema needs to change as well. In this regard, NoSQL
databases have the following advantages:
a Flexible schema: You do not have to modify the existing structural elements when new
fields are added to a document. New documents can co-exist with existing documents
without any additional changes.
b Simple query language: Because data in a NoSQL document is stored in a
denormalized state, you can get and update a document with the help of put and get
operations.
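The two points about ease of development can be sketched with an in-memory stand-in for a document database. `ToyDocumentStore` and the sample documents are hypothetical and exist only for illustration; real client libraries for the three databases have their own APIs.

```python
import json


class ToyDocumentStore:
    """In-memory stand-in for a document database (hypothetical, for illustration)."""

    def __init__(self):
        self._docs = {}

    def put(self, key, document):
        # Store the whole denormalized document as JSON text under one key.
        self._docs[key] = json.dumps(document)

    def get(self, key):
        return json.loads(self._docs[key])


store = ToyDocumentStore()
# An "old" document written before the application added new fields.
store.put("player:1", {"name": "Ann", "level": 3})
# A "new" document with an extra field; no schema migration is involved.
store.put("player:2", {"name": "Bob", "level": 5, "guild": "Red"})

# An update is a get followed by a put of the modified document.
doc = store.get("player:1")
doc["level"] += 1
store.put("player:1", doc)
```

Documents with and without the `guild` field sit side by side, which is the flexible-schema property described in item a.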
Performance Benchmarking
Our test infrastructure consisted of 4 extra-large instances on Amazon EC2 for the NoSQL databases and
1 instance for the client. Each instance had 4 virtual CPU cores with 2 Amazon compute units per core,
15GB of RAM, and 4 EBS 50GB volumes with RAID 0 striping. We used 64-bit Amazon Linux as the OS.
Networking was all 10GigE.
The client used the Yahoo! Cloud Serving Benchmark (YCSB), which was modified to suit our needs—we
added a warm-up phase and adjusted working-set load generation that simulates different users
accessing different data objects with meaningful data amounts and runtime. As shown in Figure 1, the
YCSB client consists of two main parts—the workload generator and workload scenarios.
The benchmark had 30 parallel client threads to drive the test, generating a mixed read-write workload
with 5% creates, 33% updates, 2% deletes, and 60% reads. For all the tests, we used 1.5 KB
documents (15 fields of 100 bytes each), a typical document size across several NoSQL database
use cases. The total number of documents in the cluster was 30 million: 15 million active and 15
million replica documents for each database.
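To make the workload parameters concrete, the sketch below generates operations with the stated mix and builds a document of the stated size. It is not the modified YCSB code; the names `OPERATION_MIX`, `make_document`, and `next_operations` are hypothetical, and the random field contents are an assumption.

```python
import random

# Operation mix from the text: 5% creates, 33% updates, 2% deletes, 60% reads.
OPERATION_MIX = {"create": 0.05, "update": 0.33, "delete": 0.02, "read": 0.60}

FIELD_COUNT = 15
FIELD_BYTES = 100


def make_document(rng):
    # 15 fields of 100 bytes each give 1,500 bytes of payload (about 1.5 KB).
    return {
        f"field{i}": bytes(rng.getrandbits(8) for _ in range(FIELD_BYTES))
        for i in range(FIELD_COUNT)
    }


def next_operations(rng, count):
    # Draw operation names with probabilities proportional to the mix.
    names = list(OPERATION_MIX)
    weights = list(OPERATION_MIX.values())
    return rng.choices(names, weights=weights, k=count)


rng = random.Random(42)  # fixed seed so a run is repeatable
document = make_document(rng)
payload_bytes = sum(len(value) for value in document.values())
operations = next_operations(rng, 10_000)
read_share = operations.count("read") / len(operations)
```

Over a long run, `read_share` settles near 0.60, and `payload_bytes` is exactly 15 × 100 = 1,500.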
Figure 1: YCSB client—NoSQL database server architecture
We ran each test five times for every NoSQL database and compared the average data access latency
against different throughput levels. The NoSQL databases were set up using the following configuration:
Cassandra 1.1.2
Cassandra JVM settings:
1 MAX_HEAP_SIZE, which is the total amount of memory dedicated to the Java heap—6GB
2 HEAP_NEWSIZE, which is the total amount of memory for a new generation of objects—400MB
Cassandra settings:
1 RandomPartitioner that uses MD5 Hashing to evenly distribute rows across the cluster
2 Memtable of 4GB in size
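The idea behind hash-based partitioning with MD5 can be sketched as follows. This is a simplified illustration, not Cassandra's RandomPartitioner: the token range, the evenly spaced node tokens, and the names `md5_token` and `ToyTokenRing` are assumptions made for the example.

```python
import bisect
import hashlib


def md5_token(row_key: str) -> int:
    # Hash the row key with MD5 and read the digest as a 128-bit integer.
    return int.from_bytes(hashlib.md5(row_key.encode("utf-8")).digest(), "big")


class ToyTokenRing:
    def __init__(self, node_names):
        # Give each node an evenly spaced token on the 0 .. 2**128 ring.
        step = 2**128 // len(node_names)
        self.tokens = [i * step for i in range(len(node_names))]
        self.names = list(node_names)

    def node_for(self, row_key: str) -> str:
        # The owner is the first node whose token is >= the key's token,
        # wrapping around to the first node at the end of the ring.
        index = bisect.bisect_left(self.tokens, md5_token(row_key))
        return self.names[index % len(self.names)]


ring = ToyTokenRing(["node1", "node2", "node3", "node4"])
counts = {}
for i in range(10_000):
    owner = ring.node_for(f"user{i}")
    counts[owner] = counts.get(owner, 0) + 1
```

Because MD5 output is close to uniform, each of the four nodes in `counts` ends up with roughly a quarter of the keys, even though the keys themselves are sequential.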
Couchbase 2.0 - Beta build 1723
1 1 replica setting
2 12 GB per-node RAM quota, using the Couchbase bucket type
MongoDB
1 4 shards, each with 1 replica; each shard is a set of 2 nodes—primary and secondary
2 Journaling disabled
3 Each node was running 2 mongo daemon processes and 4 mongo router processes.
Results
Figure 2 shows the average latency of read, insert, and update operations, measured from the client
to the server and back, at varying levels of throughput for each NoSQL
database. The lower the latency values, the better.
Figure 2: Average latency vs. throughput
We also calculated the 95th percentile latency of read, insert, and update
operations, measured from the client to the server and back, at varying levels of throughput for each
NoSQL database.
Figure 3: 95th Percentile latency vs. throughput
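The difference between the average and the 95th percentile can be shown with a short calculation. The paper does not state which percentile method was used; the nearest-rank definition below is an assumption, and the latency samples are made up for illustration.

```python
import math


def mean_latency(samples_ms):
    return sum(samples_ms) / len(samples_ms)


def percentile(samples_ms, pct):
    # Nearest-rank percentile: the smallest sample such that at least
    # pct percent of all samples are less than or equal to it.
    ordered = sorted(samples_ms)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up data: 95 fast requests at 1 ms and 5 slow requests at 40 ms.
latencies_ms = [1.0] * 95 + [40.0] * 5
average = mean_latency(latencies_ms)
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)
```

With this data the average is pulled up by the few slow requests, while the 95th percentile still reflects the fast majority and the 99th percentile exposes the slow tail. Reporting both views therefore gives a fuller picture than the average alone.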
Typically, you want to see flat latency curves irrespective of the throughput to ensure a consistent user
experience. Couchbase had faster read and write times than MongoDB and Cassandra.
Analysis
While not an exhaustive list, these are the most relevant pros and cons identified after reviewing these
databases:
MongoDB demonstrated the lowest throughput among all the databases compared in our test. We saw
high latencies for write operations at average throughput because the coarser locking in MongoDB limits
the write throughput of the server. Read requests were faster than in Cassandra but slower than in
Couchbase.
Increasing the size of the cluster in MongoDB was rather complicated. Many MongoDB operations need
to be done manually through the command line, and a highly skilled system
administrator is required. The advantages include built-in support for MapReduce and CAS transactions.
Cassandra showed better results than MongoDB because it uses an eventually consistent architecture
in which a write can be confirmed by a reply from just one node. In addition, unlike MongoDB,
Cassandra is rather flexible when the cluster needs to be resized. Unfortunately, its extreme
flexibility, designed to sustain performance in highly distributed environments, resulted in
additional limitations. The database supports no transactions and cannot lock individual
records.
Couchbase showed the lowest latencies and highest throughput among all the databases compared.
The built-in object-managed cache is responsible for the low latency. With fine-grained locking at the
document level, Couchbase Server was capable of providing high throughput for both reads and writes.
The admin console in Couchbase has flexible settings for changing cluster size. Each document in the
cluster has an active copy and multiple replicas. Access requests for a particular document are processed
by the server holding the active document, which makes it possible to add extended transaction
processing systems, locking, and CAS. This also eliminates the problem with eventual consistency, when
read replicas have obsolete values. As a bonus for database administrators, Couchbase also comes with
advanced tools for monitoring the status of the whole cluster and its separate nodes.
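The CAS (compare-and-swap) idea mentioned above can be sketched as optimistic concurrency on a single active copy of each document. This is a toy model, not the Couchbase SDK: `ToyCasStore`, `CasMismatch`, and the integer CAS counter are hypothetical, and a single store-wide lock is used for brevity where a real server would lock per document.

```python
import threading


class CasMismatch(Exception):
    """Raised when a document changed since the caller last read it."""


class ToyCasStore:
    def __init__(self):
        self._docs = {}                  # key -> (cas_value, document)
        self._lock = threading.Lock()    # one lock for brevity (an assumption)

    def get(self, key):
        with self._lock:
            return self._docs[key]       # returns (cas_value, document)

    def put(self, key, document, expected_cas=None):
        with self._lock:
            current_cas = self._docs.get(key, (0, None))[0]
            # Reject the write if someone else updated the document first.
            if expected_cas is not None and expected_cas != current_cas:
                raise CasMismatch(key)
            self._docs[key] = (current_cas + 1, document)
            return current_cas + 1


store = ToyCasStore()
store.put("cart:7", {"items": 1})
cas, doc = store.get("cart:7")
store.put("cart:7", {"items": 2}, expected_cas=cas)       # succeeds
try:
    store.put("cart:7", {"items": 99}, expected_cas=cas)  # stale CAS value
    conflict = False
except CasMismatch:
    conflict = True
```

Because every request for a document goes through the one place that holds its active copy, the check and the write happen atomically, and the second writer with a stale CAS value is turned away instead of silently overwriting the first.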
Conclusion
Choosing the right NoSQL database for your application is a very complicated process because every
NoSQL solution is optimized for a particular type of load. This is why you should properly evaluate all
available options before picking a suitable data store for your application.
About the authors
Kirill Grigorchuk is the head of the R&D department at Altoros Systems Inc. Mr. Grigorchuk has 15+ years
of experience in IT and profound skills in R&D process engineering, product and project management,
Web development, and big data. At the moment, he leads and coordinates research into a wide range of
cutting edge technologies, including distributed computing and NoSQL solutions.
Alexey Diomin is a senior Java developer at Altoros Systems Inc. with vast experience in distributed
computing, NoSQL databases, and Linux. Having excellent skills in building, administering, and
supporting large-scale distributed computing systems, Mr. Diomin has done extensive research into the field
of big data.
Additional Links
Cassandra website: http://cassandra.apache.org
Couchbase Server website: http://www.couchbase.com
MongoDB website: http://www.mongodb.org
YCSB GitHub: https://github.com/Altoros/YCSB

More Related Content

PDF
Benchmarking Couchbase Server for Interactive Applications
PDF
Comparison between mongo db and cassandra using ycsb
PDF
NoSQL Databases: An Introduction and Comparison between Dynamo, MongoDB and C...
PDF
Data Storage Management
PPTX
Allyourbase
PPT
NOSQL Database: Apache Cassandra
DOCX
Big Data - Hadoop Ecosystem
PPSX
A Seminar on NoSQL Databases.
Benchmarking Couchbase Server for Interactive Applications
Comparison between mongo db and cassandra using ycsb
NoSQL Databases: An Introduction and Comparison between Dynamo, MongoDB and C...
Data Storage Management
Allyourbase
NOSQL Database: Apache Cassandra
Big Data - Hadoop Ecosystem
A Seminar on NoSQL Databases.

What's hot (20)

PPTX
Presentation of Apache Cassandra
PPTX
Cassandra
PDF
Design Patterns for Distributed Non-Relational Databases
PPTX
Cassandra an overview
PPTX
Cassandra vs. MongoDB
PPTX
1. beyond mission critical virtualizing big data and hadoop
PPTX
NoSQL databases - An introduction
PDF
CASSANDRA A DISTRIBUTED NOSQL DATABASE FOR HOTEL MANAGEMENT SYSTEM
PPTX
Cassandra tutorial
PPTX
Evaluating Apache Cassandra as a Cloud Database
PPT
Apache Cassandra training. Overview and Basics
PDF
Cassandra at eBay - Cassandra Summit 2012
PDF
Cassandra basics 2.0
PPTX
Cassandra Architecture FTW
PPT
Cassandra architecture
PPTX
Introduction to couchbase
PPTX
Why Cassandra?
PPTX
Couchbase presentation
PDF
cassandra
PPT
Introduction to cassandra
Presentation of Apache Cassandra
Cassandra
Design Patterns for Distributed Non-Relational Databases
Cassandra an overview
Cassandra vs. MongoDB
1. beyond mission critical virtualizing big data and hadoop
NoSQL databases - An introduction
CASSANDRA A DISTRIBUTED NOSQL DATABASE FOR HOTEL MANAGEMENT SYSTEM
Cassandra tutorial
Evaluating Apache Cassandra as a Cloud Database
Apache Cassandra training. Overview and Basics
Cassandra at eBay - Cassandra Summit 2012
Cassandra basics 2.0
Cassandra Architecture FTW
Cassandra architecture
Introduction to couchbase
Why Cassandra?
Couchbase presentation
cassandra
Introduction to cassandra
Ad

Viewers also liked (8)

PDF
Couchbase overview033113long
PDF
Couchbase Overview Nov 2013
PPTX
Partes de la computadora hernandez
PPTX
Partes de la computadora hernandez
PPTX
Sistema puesta a tierra Saia UFT
PDF
Couchbase overview033113long
PPTX
French English Relations
PDF
Build Application With MongoDB
Couchbase overview033113long
Couchbase Overview Nov 2013
Partes de la computadora hernandez
Partes de la computadora hernandez
Sistema puesta a tierra Saia UFT
Couchbase overview033113long
French English Relations
Build Application With MongoDB
Ad

Similar to Altoros using no sql databases for interactive_applications (20)

PPTX
Introduction to NoSQL
PDF
NOSQL- Presentation on NoSQL
PDF
NOSQL in big data is the not only structure langua.pdf
PDF
Dsm project-h base-cassandra
PPTX
No sql database
PPT
No sql databases explained
PDF
NoSQL BIg Data Analytics Mongo DB and Cassandra .pdf
PDF
Data management in cloud study of existing systems and future opportunities
PPTX
MongoDB
PPTX
Why no sql ? Why Couchbase ?
DOCX
PPTX
No sqlpresentation
PDF
A NOVEL APPROACH FOR HOTEL MANAGEMENT SYSTEM USING CASSANDRA
PDF
A NOVEL APPROACH FOR HOTEL MANAGEMENT SYSTEM USING CASSANDRA
PPTX
Selecting best NoSQL
PDF
SQL vs NoSQL deep dive
PPT
NoSql Databases
PDF
Comparative study of no sql document, column store databases and evaluation o...
PDF
EVALUATING CASSANDRA, MONGO DB LIKE NOSQL DATASETS USING HADOOP STREAMING
PDF
Nosql Presentation.pdf for DBMS understanding
Introduction to NoSQL
NOSQL- Presentation on NoSQL
NOSQL in big data is the not only structure langua.pdf
Dsm project-h base-cassandra
No sql database
No sql databases explained
NoSQL BIg Data Analytics Mongo DB and Cassandra .pdf
Data management in cloud study of existing systems and future opportunities
MongoDB
Why no sql ? Why Couchbase ?
No sqlpresentation
A NOVEL APPROACH FOR HOTEL MANAGEMENT SYSTEM USING CASSANDRA
A NOVEL APPROACH FOR HOTEL MANAGEMENT SYSTEM USING CASSANDRA
Selecting best NoSQL
SQL vs NoSQL deep dive
NoSql Databases
Comparative study of no sql document, column store databases and evaluation o...
EVALUATING CASSANDRA, MONGO DB LIKE NOSQL DATASETS USING HADOOP STREAMING
Nosql Presentation.pdf for DBMS understanding

Recently uploaded (20)

PPTX
Big Data Technologies - Introduction.pptx
PDF
Mobile App Security Testing_ A Comprehensive Guide.pdf
PDF
Electronic commerce courselecture one. Pdf
PDF
Spectral efficient network and resource selection model in 5G networks
PDF
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
PDF
Advanced methodologies resolving dimensionality complications for autism neur...
PDF
Empathic Computing: Creating Shared Understanding
PPTX
20250228 LYD VKU AI Blended-Learning.pptx
PDF
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
PDF
Chapter 3 Spatial Domain Image Processing.pdf
PPT
Teaching material agriculture food technology
PPTX
Programs and apps: productivity, graphics, security and other tools
PPTX
Understanding_Digital_Forensics_Presentation.pptx
PDF
Machine learning based COVID-19 study performance prediction
PDF
Approach and Philosophy of On baking technology
PDF
Profit Center Accounting in SAP S/4HANA, S4F28 Col11
PDF
Dropbox Q2 2025 Financial Results & Investor Presentation
PDF
cuic standard and advanced reporting.pdf
PDF
How UI/UX Design Impacts User Retention in Mobile Apps.pdf
PPTX
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
Big Data Technologies - Introduction.pptx
Mobile App Security Testing_ A Comprehensive Guide.pdf
Electronic commerce courselecture one. Pdf
Spectral efficient network and resource selection model in 5G networks
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
Advanced methodologies resolving dimensionality complications for autism neur...
Empathic Computing: Creating Shared Understanding
20250228 LYD VKU AI Blended-Learning.pptx
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
Chapter 3 Spatial Domain Image Processing.pdf
Teaching material agriculture food technology
Programs and apps: productivity, graphics, security and other tools
Understanding_Digital_Forensics_Presentation.pptx
Machine learning based COVID-19 study performance prediction
Approach and Philosophy of On baking technology
Profit Center Accounting in SAP S/4HANA, S4F28 Col11
Dropbox Q2 2025 Financial Results & Investor Presentation
cuic standard and advanced reporting.pdf
How UI/UX Design Impacts User Retention in Mobile Apps.pdf
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx

Altoros using no sql databases for interactive_applications

  • 1. Using NoSQL Databases for Interactive Applications By Alexey Diomin and Kirill Grigorchuk
  • 2. 2 Using NoSQL Databases for Interactive Applications ©  Altoros  Systems Contents Introduction 3 Cassandra, MongoDB, and Couchbase 3 Key Considerations for Interactive Applications 3 Performance Benchmarking 5 Results 7 Analysis 10 Conclusion 10 About the authors 11 Additional Links 11
  • 3. 3 Using NoSQL Databases for Interactive Applications ©  Altoros  Systems Introduction Interactive web applications need high-performance and scalability, calling for a different kind of database. If your website is not fast enough, users may quickly abandon it and look for alternatives. For example, in paid online social games, players are extremely demanding and will drop out, even if there is a slight delay. To deliver the best user experience, you must pick the right database. Traditional RDBMS are the wrong tool for the job because they do not provide the necessary scalability and performance for working with large amounts of data and application requests. In contrast, NoSQL databases have become a viable alternative to RDBMS, particularly for applications that need to change rapidly. They provide high throughput, low latency, and horizontal scaling. But with so many different options around, choosing the right NoSQL database for your specific application needs can be tricky. Recently we took the time to review and benchmark several NoSQL databases. This whitepaper provides an overview of three popular NoSQL solutions: Cassandra, MongoDB, and Couchbase. In addition, it presents a vendor-independent performance comparison of these products and can be used as a guide when choosing a NoSQL database for an interactive application. Cassandra, MongoDB, and Couchbase Since we had to pick some NoSQL databases to start with, we looked around for commonly used open- source NoSQL solutions. Cassandra, Couchbase, and MongoDB seemed to be the most mature open source products in their class. If you are already familiar with these NoSQL databases, you might want to skip the rest of this section and go directly to the performance evaluation. Cassandra is a distributed columnar key-value database with eventual consistency. It is optimized for write operations and has no central master—data can be written or read to and from any of the nodes in the cluster. 
Cassandra provides seamless horizontal scaling and has no single point of failure—if a node in the cluster fails, another node steps up to replace it. At the moment, Cassandra is an Apache 2.0 licensed project supported by the Apache Community. MongoDB is a schema-free, document-oriented, NoSQL database. In MongoDB, data is stored in the BSON format—BSON document is essentially a JSON document represented in a binary format, which allows for easier and faster integration of data in certain types of applications. This database also provides horizontal scalability and has no single point of failure. However, a MongoDB cluster is different from a Cassandra or Couchbase Server cluster—it includes an arbiter, a master, and multiple slaves. As of 2009, MongoDB is an open source project with an AGPL license supported by 10Gen. Couchbase is a NoSQL document database. Documents in Couchbase Server are stored as JSON. With built-in caching, Couchbase provides low-latency read and write operations with linearly scalable throughput. The architecture has no single point of failure. It is easy to scale-out the cluster and support live cluster topology changes. This means, there is no application downtime when you are upgrading your database, software, or hardware using rolling upgrades. Couchbase, Inc. develops and provides commercial support for the Couchbase Apache 2.0 licensed project. Key Considerations for Interactive Applications our database is the workhorse for your Web application. When choosing a database, the following factors are important to keep in mind: 1 Scalability: It’s  hard  to   predict  when  your  application  needs  to  scale,  but  when  your   Web site traffic suddenly spikes and your database does not have enough capacity, you need to scale your database quickly, on demand, and without any application changes. Similarly, when your system is idle, you should have a possibility to decrease the amount of resources used. Scaling your
  • 4. 4 Using NoSQL Databases for Interactive Applications ©  Altoros  Systems database must be a simple operation—you should not need to deal with complicated procedures or make any changes to your application. In this paper, we will only speak about horizontal scalability, which involves dividing a system into small structural components hosted on different physical machines (or groups of machines) and/or increasing the number of servers that perform the same function in parallel. a Cassandra meets the requirements of an ideal horizontally scalable system. Nodes can be added seamlessly as you need more capacity. The cluster automatically utilizes the new resources. A node can be decommissioned in automatic or semi-automatic mode. b Couchbase scales horizontally. All nodes are identical and easy to setup. Nodes can be added or removed from the cluster with a single button click and no changes to the application. Auto-sharding evenly distributes data across all nodes in the cluster without any hotspots. Cross datacenter replication makes it possible to scale a cluster across datacenters for better data locality and faster data access. c MongoDB—this database has a number of functions related to scalability. These include: automatic sharding (auto-partitioning of data across servers), reads and writes distributed over shards, and eventually-consistent reads that can be distributed over replicated servers. When the system is idle, cluster size can only be decreased manually. The administrator  uses  the  management  console  to  change  the  system’s configuration. After that, the server process of MongoDB can be safely stopped on the vacant machines. 2 Performance: Interactive applications require very low read and write latencies. The database must deliver consistently low latencies for read and write operations independent of load or the size of data being accessed. 
In general, the read and write latency of NoSQL databases is very low because data is shared across all the nodes in a cluster while the application’s working set is in memory. Interactive applications need to support millions of users and have different workloads—read, write, or mixed. In the next section, we share some performance test results on different NoSQL databases measuring latency versus varying levels of throughput. 3 Availability: Interactive Web applications need a database that is highly available. If your application is down, you simply are not making any money. To ensure high availability, your solution should be able to do online upgrades to the latest version, easily remove a node for maintenance without affecting the availability of the cluster, handle online operations, such as backups, and provide disaster recovery, if an entire datacenter goes down. Below are examples of how availability is achieved in different NoSQL databases: a Cassandra: Every node in a Cassandra  cluster,  or  “ring”,  is  given  a  range  of  data  for   which it is responsible. When Cassandra receives a write operation designated to be stored in a node that has failed, it will automatically route the write request to a node that is alive. The node that receives the write request saves the write operation with a hint. The hint is a message that contains information about the failed node that should have handled the write request. The node that holds the hint monitors the node ring for the recovery of the failed node that missed the write request. If the failed node comes back online, the node that holds the hint will handoff the hint message to the recovered node, so that the write requests can be persisted in their proper location. When a new node is added to the cluster, the workload is distributed to this new node as well. b Couchbase: Couchbase Server maintains multiple copies (up to 3 replicas) of each document in a cluster. 
Each server is identical and serves active and replica documents. Data is uniformly distributed across all the nodes and the clients are aware of the topology. If a node in the cluster fails, Couchbase Server detects the failure and promotes replica documents on other live nodes to active. The client cluster map is updated to reflect the new topology, so the application continues to work without
downtime. When capacity is added, data is rebalanced automatically, also without any downtime.
c MongoDB: Data in MongoDB is spread across several shards. Typically, each shard (replica set) consists of multiple mongo daemon instances, including an arbiter node, a master node, and multiple slaves. If a slave node fails, the master node automatically redistributes the workload to the rest of the slave nodes. If the master node crashes, the remaining members of the replica set elect a new master; the arbiter holds no data and only takes part in this vote. If the arbiter node fails and there are no instances left in the shard, the shard is dead. In MongoDB, a replica set can span multiple datacenters, but writes can only go to one primary instance in one datacenter (master-slave replication).
4 Ease of development: Relational databases require a rigid schema to model an application. If your application changes, your database schema needs to change as well. In this regard, NoSQL databases have the following advantages:
a Flexible schema: You do not have to modify the existing structural elements when new fields are added to a document. New documents can co-exist with existing documents without any additional changes.
b Simple query language: Because data in a NoSQL document is stored in a denormalized state, you can read and update a document with simple get and put operations.
Performance Benchmarking
Our test infrastructure consisted of 4 extra-large instances on Amazon EC2 for the NoSQL databases and 1 instance for the client. Each instance had 4 virtual CPU cores with 2 Amazon compute units per core, 15GB of RAM, and 4 EBS volumes of 50GB each with RAID 0 striping. We used 64-bit Amazon Linux as the OS. All networking was 10GigE. The client used the Yahoo!
Cloud Serving Benchmark (YCSB), which we modified to suit our needs: we added a warm-up phase and adjusted the working-set load generation so that it simulates different users accessing different data objects with meaningful data amounts and run times. As shown in Figure 1, the YCSB client consists of two main parts: the workload generator and the workload scenarios. The benchmark used 30 parallel client threads to drive the test, generating a mixed read-write workload of 5% creates, 33% updates, 2% deletes, and 60% reads. For all the tests, we used 1.5 KB documents (15 fields of 100 bytes each), a typical document size across several NoSQL database use cases. The total number of documents in the cluster was 30 million for each database: 15 million active and 15 million replica documents.
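To make the workload description concrete, the following is a minimal Python sketch of an operation mix and document shape matching the figures above. It is not the modified YCSB code used in the tests (YCSB itself is written in Java); the names WORKLOAD_MIX, make_document, and next_operations, as well as the field names, are assumptions made for illustration only.

```python
import random
from collections import Counter

# Operation mix reported in the text, as fractions of all requests.
WORKLOAD_MIX = {"read": 0.60, "update": 0.33, "create": 0.05, "delete": 0.02}

FIELD_COUNT = 15   # fields per document, as stated in the text
FIELD_BYTES = 100  # bytes per field, as stated in the text


def document_size_bytes(field_count=FIELD_COUNT, field_bytes=FIELD_BYTES):
    """Payload size of one document, ignoring keys and storage overhead."""
    return field_count * field_bytes


def make_document(rng):
    """Build one synthetic document; the field names are hypothetical."""
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    return {
        f"field{i}": "".join(rng.choices(alphabet, k=FIELD_BYTES))
        for i in range(FIELD_COUNT)
    }


def next_operations(rng, count):
    """Draw `count` operation names at random, weighted by WORKLOAD_MIX."""
    names = list(WORKLOAD_MIX)
    weights = list(WORKLOAD_MIX.values())
    return rng.choices(names, weights=weights, k=count)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so that runs are repeatable
    print(document_size_bytes(), "bytes per document")
    print(Counter(next_operations(rng, 100_000)))
```

Over a large number of draws, the observed share of each operation approaches the configured fraction, and 15 fields of 100 bytes give the 1.5 KB payload used in the tests.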
Figure 1: YCSB client and NoSQL database server architecture
We ran each test five times for every NoSQL database and compared the average data access latency at different throughput levels. The NoSQL databases were set up using the following configuration:
Cassandra 1.1.2
Cassandra JVM settings:
1 MAX_HEAP_SIZE, the total amount of memory dedicated to the Java heap: 6GB
2 HEAP_NEWSIZE, the total amount of memory for the new generation of objects: 400MB
Cassandra settings:
1 RandomPartitioner, which uses MD5 hashing to distribute rows evenly across the cluster
2 Memtable of 4GB in size
Couchbase 2.0 - Beta build 1723
1 Replica count set to 1
2 12GB per-node RAM quota, using the Couchbase bucket type
MongoDB
1 Four shards, each with one replica; each shard is a set of 2 nodes, a primary and a secondary
2 Journaling disabled
3 Each node ran 2 mongo daemon processes and 4 mongo router processes.
Results
Figure 2 shows the average latency of read, insert, and update operations for each NoSQL database, measured from the client to the server and back, at varying levels of throughput. The lower the latency values, the better.
Figure 2: Average latency vs. throughput
We also calculated the 95th percentile latency of read, insert, and update operations for each NoSQL database, again measured from the client to the server and back at varying levels of throughput.
Figure 3: 95th percentile latency vs. throughput
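The two statistics plotted in Figures 2 and 3 can be computed from raw per-request latencies roughly as sketched below. The text does not say which percentile method was used, so the nearest-rank definition here is an assumption, and the function names are hypothetical.

```python
import math


def mean_latency(samples_ms):
    """Arithmetic mean of the latency samples, in milliseconds."""
    if not samples_ms:
        raise ValueError("no latency samples")
    return sum(samples_ms) / len(samples_ms)


def percentile_nearest_rank(samples_ms, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples_ms:
        raise ValueError("no latency samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples_ms)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


if __name__ == "__main__":
    # Made-up samples for illustration: 18 fast requests and 2 slow ones.
    samples = [1.0] * 18 + [50.0, 60.0]
    print("mean:", mean_latency(samples))
    print("p95: ", percentile_nearest_rank(samples, 95))
```

The two measures answer different questions: the mean summarizes all requests, while the 95th percentile reports the latency that 95% of requests stay at or below, which is why a few slow requests can move one without moving the other.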
Typically, you want to see flat latency curves irrespective of the throughput to ensure a consistent user experience. Couchbase had faster read and write times than MongoDB and Cassandra.
Analysis
While not an exhaustive list, these are the most relevant pros and cons identified after reviewing these databases:
MongoDB demonstrated the lowest throughput among all the databases compared in our test. We saw high latencies for write operations at average throughput because the coarser locking in MongoDB limits the write throughput of the server. Read requests were faster than in Cassandra but slower than in Couchbase. Increasing the size of the cluster in MongoDB was rather complicated. Many MongoDB operations need to be done manually through the command line, so a highly skilled system administrator is a must. The advantages include built-in support for MapReduce and CAS transactions.
Cassandra showed better results than MongoDB because it uses an eventually consistent architecture in which a reply from a single node is enough to confirm a record. In addition, unlike MongoDB, Cassandra is rather flexible when the cluster needs to be resized. Unfortunately, its extreme flexibility, designed to sustain performance in highly distributed environments, resulted in additional limitations. The database supports no transactions and cannot lock individual records.
Couchbase showed the lowest latencies and highest throughput among all the databases compared. The built-in object-managed cache is responsible for the low latency. With fine-grained locking at the document level, Couchbase Server was capable of providing high throughput for both reads and writes. The admin console in Couchbase has flexible settings for changing cluster size. Each document in the cluster has an active copy and multiple replicas.
Access requests for a particular document are processed by the server holding the active document, which makes it possible to add extended transaction processing systems, locking, and CAS. This also eliminates the problem with eventual consistency, when read replicas have obsolete values. As a bonus for database administrators, Couchbase also comes with advanced tools for monitoring the status of the whole cluster and its separate nodes. Conclusion Choosing the right NoSQL database for your application is a very complicated process because every NoSQL solution is optimized for a particular type of load. This is why you should properly evaluate all available options before picking a suitable data store for your application.
About the authors
Kirill Grigorchuk is the head of the R&D department at Altoros Systems Inc. Mr. Grigorchuk has 15+ years of experience in IT and profound skills in R&D process engineering, product and project management, Web development, and big data. At the moment, he leads and coordinates research into a wide range of cutting-edge technologies, including distributed computing and NoSQL solutions.
Alexey Diomin is a senior Java developer at Altoros Systems Inc. with vast experience in distributed computing, NoSQL databases, and Linux. Having excellent skills in building, administering, and supporting large-scale distributed computing systems, Mr. Diomin has done extensive research in the field of big data.
Additional Links
Cassandra website: http://cassandra.apache.org
Couchbase server website: http://www.couchbase.com
MongoDB website: http://www.mongodb.org
YCSB GitHub: https://github.com/Altoros/YCSB