History and Introduction to NoSQL over Traditional RDBMS

Introduction
• NoSQL was born out of a need to handle larger data
volumes, which forced a fundamental shift to
building large hardware platforms from
clusters of commodity servers.
• Advocates of NoSQL databases claim that they
can build systems that are more performant,
scale much better, and are easier to program
with.

Why Are NoSQL Databases Interesting?
• Application development productivity. A lot
of application development effort is spent on
mapping data between in-memory data
structures and a relational database.
• A NoSQL database may provide a data model
that better fits the application’s needs, thus
simplifying that interaction and resulting in
less code to write, debug, and evolve.

Cont’d
• Large-scale data. Organizations are finding it valuable
to capture more data and process it more quickly.
• They are finding it expensive, if even possible, to do so
with relational databases.
• The primary reason is that a relational database is
designed to run on a single machine, but it is usually
more economical to run large data and computing loads
on clusters of many smaller and cheaper machines.
• Many NoSQL databases are designed explicitly to run
on clusters, so they make a better fit for big data
scenarios.

The Value of Relational Databases
• Getting at persistent data: a database provides a “backing”
store for volatile memory
– Two areas of memory:
• Fast, small, volatile main memory
• Larger, slower, non-volatile backing store
• Since main memory is volatile, to keep data around
we write it to a backing store, commonly a
disk, which is persistent.
• The backing store can be:
– File system
– Database
• A database allows more flexibility than a file
system in storing large amounts of data in a way
that allows an application program to get
information quickly and easily.

Concurrency
• Multiple applications accessing shared data
– Transactions
• Enterprise applications tend to have many people using
the same data at once, possibly modifying that data.
• We have to worry about coordinating interactions
between them to avoid things like double booking of
hotel rooms
• Since enterprise applications can have lots of users and
other systems all working concurrently, there’s a lot of
room for bad things to happen.
• Relational databases help to handle this by controlling
all access to their data through transactions.
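The double-booking case above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a full booking system: the `bookings` table, the `book_room` helper, and the rule "one booking per room" (enforced here by a primary key) are assumptions made for the example.

```python
import sqlite3


def book_room(conn, room, guest):
    """Try to book a room; return True on success, False if it is taken."""
    try:
        # "with conn" wraps the statement in a transaction:
        # it commits on success and rolls back if an exception is raised.
        with conn:
            conn.execute(
                "INSERT INTO bookings (room, guest) VALUES (?, ?)",
                (room, guest),
            )
        return True
    except sqlite3.IntegrityError:
        # The room already has a booking, so the database rejected the write.
        return False


conn = sqlite3.connect(":memory:")
# Hypothetical schema: the primary key allows at most one booking per room.
conn.execute(
    "CREATE TABLE bookings (room INTEGER PRIMARY KEY, guest TEXT NOT NULL)"
)

first = book_room(conn, 101, "Alice")
second = book_room(conn, 101, "Bob")
booked = conn.execute("SELECT room, guest FROM bookings").fetchall()
```

Here the database, not the application, decides which of the two competing requests wins, which is the kind of coordination the slide attributes to transactions.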

Integration
• Enterprise requires multiple applications, written by
different teams, to collaborate in order to get things
done.
• Applications often need to use the same data and
updates made through one application have to be
visible to others.
• A common way to do this is shared database
integration where multiple applications store their
data in a single database.
• Using a single database allows all the applications to
use each other’s data easily, while the database’s
concurrency control handles multiple applications in
the same way as it handles multiple users in a single
application.

Impedance Mismatch
• Impedance mismatch is a term used in computer science
to describe the problem that arises when two systems or
components that are supposed to work together have
different data models, structures, or interfaces that
make communication difficult or inefficient.
• In the context of databases, impedance mismatch refers
to the discrepancy between the object-oriented
programming (OOP) model used in application code and
the relational model used in database management
systems (DBMS).
• While OOP models are designed to represent data as
objects with properties and methods, relational models
represent data as tables with columns and rows.
• This impedance mismatch can create challenges when it
comes to mapping objects in code to tables in a database
or vice versa.

Impedance Mismatch
• The difference between the relational model
and the in-memory data structures.
• The relational data model organizes data into
a structure of tables.
– Where a tuple is a set of name-value pairs and a
relation is a set of tuples.
• Structure and relationships have to be
mapped
– Rich, in-memory structures have to be translated
to relational representation to be stored on disk
– Translation: impedance mismatch
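The translation step can be sketched as follows. The `Order` and `LineItem` classes and the two-table layout (one orders row plus one row per line item) are hypothetical, chosen only to show a nested in-memory structure being flattened into tuples and rebuilt again.

```python
from dataclasses import dataclass, field


@dataclass
class LineItem:
    product: str
    quantity: int


@dataclass
class Order:
    order_id: int
    customer: str
    items: list = field(default_factory=list)  # nested structure in memory


def to_rows(order):
    """Flatten one nested Order into rows for two hypothetical tables."""
    order_row = (order.order_id, order.customer)
    item_rows = [(order.order_id, i.product, i.quantity) for i in order.items]
    return order_row, item_rows


def from_rows(order_row, item_rows):
    """Rebuild the nested Order from its flat relational rows."""
    order_id, customer = order_row
    items = [LineItem(p, q) for (oid, p, q) in item_rows if oid == order_id]
    return Order(order_id, customer, items)


order = Order(1, "Ann", [LineItem("book", 2), LineItem("pen", 5)])
order_row, item_rows = to_rows(order)
restored = from_rows(order_row, item_rows)
```

Both `to_rows` and `from_rows` exist only because the two representations differ; that extra mapping code is the cost the slides call impedance mismatch.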

Cont’d
• Impedance mismatch has been made much
easier to deal with by the wide availability of
object-relational mapping frameworks, such as
Hibernate and iBATIS, that implement well-known
mapping patterns, but the mapping
problem is still an issue.

Application and Integration Databases
• Data integration is the process of taking data
from different sources and formats and
combining it into a single data set.
• An integration database is a database that acts
as the data store for multiple applications,
usually developed by separate teams, and
thus integrates data across these applications.
• This improves communication because all the
applications operate on a consistent set of
persistent data.

Cont’d
Integrating many applications makes the database (dramatically)
more complex than any single application needs
− Changes to the data model must be
coordinated
− Different applications have different structural and
performance needs
− Database integrity becomes an issue
Instead, treat the database as an application
database
− Single application, single development team
− Provide alternate integration mechanisms

Cont’d
• Data integration platforms are an efficient
approach to data utilization and storage.
• Rather than replicating data across locations
or environments, the integration database
serves as a single source of truth.

Alternate Integration Mechanism: Services
∙ During the 2000s we saw a distinct shift to web services, where
applications integrate by communicating over HTTP
− XML-RPC, SOAP, REST
∙ Results in more flexibility in the data structures exchanged
− XML, JSON, etc.
− Text-based protocols
∙ Lets application developers choose their own database
− Application databases
− Relational databases are often still an appropriate
choice

Application Database
• An application database is a database that is
controlled and accessed by a single application.
• With an application database, only the team
using the application needs to know about the
database structure, which makes it much easier
to maintain and evolve the schema.
• Since the application team controls both the
database and the application code, the
responsibility for database integrity can be put in
the application code.

The Attack of the Clusters
∙ The 2000s saw the web grow enormously
− Web use tracking data, social networks, activity logs,
mapping data, etc.
− Huge websites serving huge numbers of visitors
∙ Handling the increase in data and traffic required more
computing resources
∙ Instead of building bigger machines with more
processors, storage, and memory, use clusters of small,
commodity machines
− Cheaper, more resilient
∙ But relational databases are not designed to be run on
clusters

Cont’d
• Coping with the increase in data and traffic
required more computing resources.
• To handle this kind of increase, you have two
choices:
• 1. Scaling up implies:
– bigger machines
– more processors
– more disk storage
– more memory
• Scaling up disadvantages:
– But bigger machines get more and more expensive.
– There are real limits as size increases.

Cont’d
• 2. Scaling out: use lots of small machines in a cluster:
– A cluster of small machines can use commodity
hardware and ends up being cheaper at these
kinds of scales.
– It can also be more resilient—while individual
machine failures are common, the overall cluster
can be built to keep going despite such failures,
providing high reliability.

Clustered Relational Databases
• Relational databases are not designed to be run on
clusters.
• Clustered relational databases, such as Oracle RAC
or Microsoft SQL Server, work on the concept of a
shared disk subsystem, so the cluster still has the disk
subsystem as a single point of failure.
• Relational databases could also be run as separate
servers for different sets of data, effectively sharding
the database.
• Even though this separates the load, all the sharding
has to be controlled by the application which has to
keep track of which database server to talk to for each
bit of data.
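A minimal sketch of application-controlled sharding, assuming three hypothetical server names and a simple hash-modulo placement rule (real deployments may use other schemes, such as range-based or directory-based placement):

```python
import zlib

# Hypothetical shard servers; in practice these would be connection settings.
SHARDS = ["db-server-0", "db-server-1", "db-server-2"]


def shard_for(key):
    """Return the server that holds the data for the given key."""
    # crc32 is stable across runs, unlike Python's built-in hash() for str.
    return SHARDS[zlib.crc32(key.encode("utf-8")) % len(SHARDS)]


# The application must call shard_for() before every read or write.
server_a = shard_for("customer:42")
server_b = shard_for("customer:42")
```

Because this routing lives in application code, every query has to go through it, and nothing in this scheme lets a single query or transaction span two servers.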

Cont’d
• We lose any querying, referential integrity,
transactions, or consistency controls that cross shards.
• Commercial (licensed) relational databases are usually
priced on a single-server assumption, so running on a
cluster raises prices.
• This mismatch between relational databases and
clusters led some organizations to consider an
alternative route to data storage. Two companies in
particular:
– 1. Google
– 2. Amazon
• Both were running large clusters
• They were capturing huge amounts of data

The Emergence of NoSQL
• Historical note: “NoSQL” was first used to name an
open-source relational database whose development was led by
Carlo Strozzi.
• Strozzi’s database got its name from the fact that it
does not use SQL as a query language.
• Instead, that database is manipulated through shell
scripts that can be combined into the usual UNIX
pipelines.
• The current use of the phrase came from a conference
meetup discussing “open-source, distributed,
nonrelational databases.”

Cont’d
• Most NoSQL databases are driven by the need to run
on clusters.
• Relational databases use ACID transactions to handle
consistency across the whole database.
• This inherently clashes with a cluster environment, so
NoSQL databases offer a range of options for
consistency and distribution.
• Not all NoSQL databases are strongly oriented
towards running on clusters.
• Graph databases are one style of NoSQL database
that uses a distribution model similar to relational
databases but offers a different data model that makes
it better at handling data with complex relationships.

Cont’d
• NoSQL databases operate without a schema,
allowing you to freely add fields to database
records without having to define any changes
in structure first.
• Two primary reasons for considering NoSQL:
– 1) To handle data access with sizes and
performance that demand a cluster
– 2) To improve the productivity of application
development by using a more convenient data
interaction style.
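Schema-free storage can be sketched with plain Python dictionaries standing in for a document or key-value store. The `store`, `save`, and `get_field` names and the sample customer records are hypothetical; no particular NoSQL product's API is implied.

```python
# An in-memory stand-in for a schemaless store: key -> document (a dict).
store = {}


def save(key, document):
    """Store a document as-is; no table definition or migration is needed."""
    store[key] = dict(document)


def get_field(key, field, default=None):
    """Read a field that may or may not exist on a given record."""
    return store[key].get(field, default)


save("customer:1", {"name": "Ada"})
# A later record adds a new field without any change to existing records.
save("customer:2", {"name": "Grace", "loyalty_tier": "gold"})

tier_1 = get_field("customer:1", "loyalty_tier")
tier_2 = get_field("customer:2", "loyalty_tier")
```

The flexibility has a cost: since records are non-uniform, the reading code has to cope with missing fields, as `get_field` does with its default.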

Cont’d
• A NoSQL database provides a
mechanism for storage and retrieval of data.
NoSQL databases are used in real-time web applications
and big data, and their use is increasing over
time.
• Many NoSQL stores compromise consistency
in favor of availability, speed and partition
tolerance.

Advantages of NoSQL
• 1. High Scalability
– NoSQL databases use sharding for horizontal
scaling.
– They can handle huge amounts of data: as the data
grows, a NoSQL database can scale out to
handle it efficiently.
• 2. High Availability
– The automatic replication feature in NoSQL databases
makes them highly available.

Disadvantages of NoSQL
1. Narrow Focus: NoSQL databases are mainly designed for storage
and provide very little other functionality.
2. Open Source: most NoSQL databases are open source and do not
follow a common standard; that is, two database systems are likely to differ.
3. Management Challenge: Big data management in
NoSQL is much more complex than in a relational
database.
4. GUI is not available: GUI tools to access the
database are not widely available in the market.
5. Backup: it is a great weak point for some NoSQL
databases like MongoDB.
6. Large Document Size: Storing data in JSON format increases
the document size.

When should NoSQL be used
• When a huge amount of data needs to be stored
and retrieved.
• The relationships between the data you store are not
that important.
• The data changes over time and is not
structured.
• Support for constraints and joins is not required at
the database level.
• The data is growing continuously and you need to
scale the database regularly to handle it.

Characteristics of NoSQL Databases
∙ They do not use SQL and the relational model
− Some do have query languages that are similar to SQL, to
be easy to learn and use
∙ Mostly open-source projects
∙ Designed to be distributed (clustered)
− No expectation of ACID properties
− Range of options for consistency and distribution
∙ Schema-free
− Freely add fields to records without having to define any
changes in structure first
− Non-uniform data and custom fields
∙ A non-definition of NoSQL: an ill-defined set of mostly
open-source databases, mostly developed in the early 21st century, and
mostly not using SQL

Polyglot Persistence
• Polyglot persistence is a conceptual term that refers to the use of
different data storage approaches and technologies to support the
unique storage requirements of various data types that live within
enterprise applications.
• Polyglot persistence refers to using different data storage
technologies to handle varying data storage needs.
• Polyglot Persistence is a fancy term to mean that when storing data,
it is best to use multiple data storage technologies, chosen based
upon the way data is being used by individual applications or
components of a single application.
• Different kinds of data are best dealt with by different data stores. In
short, it means picking the right tool for the right use case.

Example
• As a polyglot persistence example, an
e-commerce platform will deal with many
types of data (e.g. shopping cart, inventory,
completed orders). Instead of trying to
store all this data in one database, which
would require a lot of data conversion to make
the format of the data all the same, store each
kind of data in the database best suited for it.
So the e-commerce platform might
look like this:
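One possible assignment is sketched below in Python. The pairing of each data type with a store type is an illustrative assumption, not something the slides specify; other pairings are equally plausible depending on access patterns.

```python
# Hypothetical routing table: kind of data -> kind of store (assumed pairing).
STORE_FOR = {
    "shopping_cart": "key-value store",       # assumed: fast lookups by session key
    "completed_orders": "document store",     # assumed: each order kept as one document
    "inventory": "relational database",       # assumed: transactional stock updates
}


def store_for(kind):
    """Return the store type chosen for a given kind of data."""
    return STORE_FOR[kind]


cart_store = store_for("shopping_cart")
distinct_stores = len(set(STORE_FOR.values()))
```

The point of the table is only that the choice is made per kind of data rather than once for the whole platform.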
