NoSQL
Introduction to NoSQL
UNIT -1
Why NoSQL?
• Relational databases have been the default
choice for serious data storage, especially in the
world of enterprise applications; often your only
choice is which relational database to use.
• After such a long period of dominance, the
current excitement about NoSQL databases
comes as a surprise.
• Now we’ll explore why
relational databases became so dominant.
The Value of Relational Databases
1. Getting at Persistent Data
Two areas of memory:
• Fast, small, volatile main memory
• Larger, slower, non volatile backing store
• Since main memory is volatile, to keep data around we
write it to a backing store, commonly a disk, which
provides persistent storage.
The backing store can be:
• File system
• Database
• The database allows more flexibility than a file system
in storing large amounts of data in a way that allows
an application program to get information quickly and
easily.
2. Concurrency
• Enterprise applications tend to have many people using
the same data at once, possibly modifying that data. We
have to worry about coordinating interactions between
them to avoid things like double booking of hotel
rooms.
• Since enterprise applications can have lots of users and
other systems all working concurrently, there’s a lot of
room for bad things to happen. Relational databases
help to handle this by controlling all access to their
data through transactions.
3. Integration
• Enterprises require multiple applications, written by
different teams, to collaborate in order to get things
done. Applications often need to use the same data, and
updates made through one application have to be
visible to others.
• A common way to do this is shared database
integration where multiple applications store their data
in a single database.
• Using a single database allows all the applications to use
each others’ data easily, while the database’s
concurrency control handles multiple applications in
the same way as it handles multiple users in a single
application.
4. A (Mostly) Standard
Model
• Relational databases have succeeded because they
provide the core benefits in a (mostly) standard way.
• As a result, developers can learn the basic relational
model and apply it in many projects.
• Although there are differences between different
relational databases, the core mechanisms remain the
same.
Impedance Mismatch
• For Application developers using relational databases, the
biggest frustration has been what’s commonly called the
impedance mismatch: the difference between the relational
model and the in-memory data structures.
• The relational data model organizes data into a structure of
tables, where a tuple is a set of name-value pairs and a relation
is a set of tuples.
• The values in a relational tuple have to be simple—they
cannot contain any structure, such as a nested record or a
list. This limitation isn’t true for in-memory data structures,
which can take on much richer structures than relations.
• So if you want to use a richer in-memory data structure, you
have to translate it to a relational representation to store it on
disk. Hence the impedance mismatch—two different
representations that require translation.
Figure: An order, which looks like a single
aggregate structure in the UI, is split into many
rows from many tables in a relational database
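To make the mismatch concrete, here is a small, hypothetical Python sketch (the field and table layouts are illustrative, not taken from the figure) of a rich in-memory order structure and the flat rows it must be split into for relational storage:

# A rich in-memory structure: nested records and lists.
order = {
    "id": 99,
    "customer": {"id": 1, "name": "Martin"},
    "line_items": [
        {"product_id": 27, "price": 32.45},
        {"product_id": 35, "price": 19.99},
    ],
}

# To store it relationally, the nesting must be flattened into
# simple tuples spread across several tables.
orders_rows    = [(99, 1)]                             # orders(id, customer_id)
line_item_rows = [(99, 27, 32.45), (99, 35, 19.99)]    # line_items(order_id, product_id, price)
customers_rows = [(1, "Martin")]                       # customers(id, name)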
• In the 1990s, the impedance mismatch led to a belief that
relational databases would be replaced with databases that
replicate the in-memory data structures to disk. That decade
was marked by the growth of object-oriented programming
languages, and with them came object-oriented
databases—both looking to be the dominant
environment for software development in the new
millennium. However, while object-oriented languages
succeeded in becoming the major force in programming,
object-oriented databases faded into obscurity.
• The impedance mismatch has been made much easier to deal
with by the wide availability of object-relational
mapping frameworks, such as Hibernate and iBATIS,
that implement well-known mapping patterns, but the
mapping problem is still an issue.
• Relational databases continued to dominate the
enterprise computing world in the 2000s, but during that
decade cracks began to open in their dominance.
Application and Integration Databases
• In relational databases, the database acts as an integration
database—where multiple applications developed by
separate teams store their data in a common
database. This improves communication because all the
applications are operating on a consistent set of persistent
data.
There are downsides to shared database integration.
• A structure that’s designed to integrate many applications
is more complex than any single application needs.
• If an application wants to make changes to its data
storage, it needs to coordinate with all the other
applications using the database.
• Different applications have different structural and
performance needs, so an index required by one
application may cause a problematic hit on inserts for
another.
• A different approach is to treat your database as an
application database—which is only accessed by a
single application codebase that’s looked after by a
single team.
Advantages:
• With an application database, only the team using the
application needs to know about the database
structure, which makes it much easier to maintain and
evolve the schema.
• Since the application team controls both the database and
the application code, the responsibility for database
integrity can be put in the application code.
Web
Services
• During the 2000s we saw a distinct shift to web services
where applications would communicate over HTTP.
• If you communicate with SQL, the data must be
structured as relations. However, with a service, you are
able to use richer data structures with nested records
and lists. These are usually represented as documents in
XML or, more recently, JSON.
• In general, with remote communication you want to
reduce the number of round trips involved in the
interaction, so it’s useful to be able to put a rich structure
of information into a single request or response.
• If you are going to use services for integration, most
of the time web services —using text over HTTP—
is the way to go. However, if you are dealing with
highly performance-sensitive interactions, you may
need a binary protocol. Only do this if you are sure
you have the need, as text protocols are easier to
work with—consider the example of the Internet.
• Once you have made the decision to use an
application database, you get more freedom of
choosing a database. Since there is a decoupling
between your internal database and the services with
which you talk to the outside world, the outside
world doesn’t have to care how you store your
data, allowing you to consider non-relational
options.
Attack of the Clusters
• In the 2000s, several large web properties dramatically
increased in scale. This increase in scale was
happening along many dimensions.
Websites
• Started tracking activity and structure in a very
detailed way.
• Large sets of data appeared: links, social networks,
activity in logs, mapping data.
• With growth in data came a growth in users.
Coping with the increase in data and traffic required
more computing resources. To handle this kind of
increase, you have two choices:
1. Scaling up implies:
• bigger machines
• more processors
• more disk storage
• more memory
Scaling up disadvantages:
• But bigger machines get more and more expensive.
• There are real limits as size increases.
2. Use lots of small machines in a cluster:
• A cluster of small machines can use commodity
hardware and ends up being cheaper at these kinds of
scales.
• more resilient—while individual machine failures are
common, the overall cluster can be built to keep
going despite such failures, providing high
reliability.
Cluster disadvantages
• Relational databases are not designed to be run on
clusters.
• Clustered relational databases, such as Oracle RAC or
Microsoft SQL Server, work on the concept of a
shared disk subsystem, where the cluster still has the
disk subsystem as a single point of failure.
• Relational databases could also be run as separate
servers for different sets of data, effectively sharding
the database. Even though this separates the load, all
the sharding has to be controlled by the
application, which has to keep track of which
database server to talk to for each bit of data (see
the sketch after this list).
• We lose any querying, referential integrity,
transactions, or consistency controls that cross shards.
• Commercial relational databases (licensed) are
usually priced on a single-server assumption, so
running on a cluster raised prices.
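The following is a minimal, hypothetical sketch of the application-controlled sharding described above; the shard list and hashing scheme are illustrative assumptions, not part of any particular product:

# Application-level sharding: the application, not the database,
# decides which server holds each customer's data.
import hashlib

SHARDS = ["db-server-0", "db-server-1", "db-server-2"]   # hypothetical servers

def shard_for(customer_id: str) -> str:
    # Hash the key so customers spread evenly across servers.
    digest = hashlib.md5(customer_id.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# The application must route every read and write itself;
# queries, joins, and transactions across shards are lost.
print(shard_for("customer-1234"))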
This mismatch between relational databases and
clusters led some organizations to consider an
alternative route to data storage. Two companies in
particular stood out:
1. Google
2. Amazon
• Both were running large clusters
• These things gave them the motive. Both were successful and
growing companies with strong technical components, which
gave them the means and opportunity. It was no wonder they had
murder in mind for their relational databases. As the 2000s
drew on, both companies produced brief but highly influential
papers about their efforts:
– BigTable from Google
– Dynamo from Amazon
• It’s often said that Amazon and Google operate at scales far
removed from most organizations, so the solutions they needed
may not be relevant to an average organization. But more and
more organizations are beginning to explore what they can do
by capturing and processing more data—and to run into the same
problems. So people began to explore making databases along
similar lines—explicitly designed to live in a world of clusters.
The Emergence of NoSQL
For NoSQL there is no generally accepted definition, nor an
authority to provide one, so all we can do is discuss some
common characteristics of the databases that tend to be called
“NoSQL.”
• The term “NoSQL” first appeared in the late 1990s as the
name of an open-source relational database that did not use
SQL; instead, it was manipulated through shell scripts that
could be combined into the usual UNIX pipelines. Today’s
NoSQL databases generally do not use SQL as their query
language; each offers its own query interface or API.
• They are generally open-source projects.
• Most NoSQL databases are driven by the need to run on
clusters. Relational databases use ACID transactions to
handle consistency across the whole database. This
inherently clashes with a cluster environment, so NoSQL
databases offer a range of options for consistency and
distribution.
• Not all NoSQL databases are strongly oriented
towards running on clusters. Graph databases are
one style of NoSQL databases that uses a distribution
model similar to relational databases but offers a
different data model that makes it better at handling
data with complex relationships.
• NoSQL databases operate without a schema,
allowing you to freely add fields to database records
without having to define any changes in structure
first. This is particularly useful when dealing with
nonuniform data and custom fields, which force
relational databases to use names like customField6
or custom-field tables that are awkward to process
and understand.
• When you first hear “NoSQL,” an immediate
question is what does it stand for—a “no” to SQL?
Most people who talk about NoSQL say that it really
means “Not Only SQL,” but this interpretation has
a couple of problems. Most people write “NoSQL”
whereas “Not Only SQL” would be written
“NOSQL.”
• To resolve these problems, don’t worry about what
the term stands for, but rather about what it means.
Thus, when “NoSQL” is applied to a database, it
refers to an ill-defined set of mostly open-source
databases, mostly developed in the early 21st
century, and mostly not using SQL.
• It’s better to think of NoSQL as a movement rather than a
technology. We don’t think that relational databases are going
away—they are still going to be the most common form of
database in use. Their familiarity, stability, feature set, and
available support are compelling arguments for most projects.
• The change is that now we see relational databases as one
option for data storage. This point of view is often referred to as
polyglot persistence—using different data stores in different
circumstances.
• We need to understand the nature of the data we’re storing and
how we want to manipulate it. The result is that most
organizations will have a mix of data storage technologies for
different circumstances. In order to make this polyglot world
work, our view is that organizations also need to shift from
integration databases to application databases.
• In our account of the history of NoSQL development,
we’ve concentrated on big data running on clusters.
The big data concerns have created an opportunity for
people to think freshly about their data storage needs,
and some development teams see that using a
NoSQL database can help their productivity by
simplifying their database access even if they have
no need to scale beyond a single machine.
Two primary reasons for considering NoSQL:
1) To handle data access with sizes and performance
that demand a cluster
2) To improve the productivity of application
development by using a more convenient data
interaction style.
A NoSQL database provides a mechanism for storage and
retrieval of data. NoSQL databases are used in real-time
web applications and big data, and their use is
increasing over time.
Many NoSQL stores compromise consistency in favor of
availability, speed, and partition tolerance.
Advantages of NoSQL:
1. High Scalability
NoSQL databases use sharding for horizontal scaling. They
can handle huge amounts of data; as the data grows, a
NoSQL database scales out to handle that data in an
efficient manner.
2. High Availability
The auto-replication feature in NoSQL databases makes them
highly available.
Disadvantages of NoSQL:
1. Narrow Focus: NoSQL databases are designed mainly for
storage and provide comparatively little functionality beyond it.
2. Open Source: NoSQL covers many open-source projects with
no common standard, so two NoSQL database systems are
likely to be quite different from each other.
3. Management Challenge: Big data management in
NoSQL is much more complex than in a relational
database.
4. GUI is not available: Flexible GUI tools for accessing
NoSQL databases are not widely available in the market.
5. Backup: Backup is a weak point for some NoSQL
databases, such as MongoDB.
6. Large document size: Data is stored as JSON-like
documents, which can make individual records large.
When should NoSQL be
used
• When huge amount of data need to be stored and
retrieved.
• The relationship between data you store is not that
important.
• The data is changing over time and is not structured.
• Support for constraints and joins is not required at the
database level.
• The data is growing continuously and you need to
scale the database regularly to handle it.
Key Points
• Relational databases have been a successful
technology for twenty years, providing persistence,
concurrency control, and an integration mechanism.
• Application developers have been frustrated with
the impedance mismatch between the relational
model and the in-memory data structures.
• There is a movement away from using integration
databases towards encapsulating databases within
applications and integrating through services.
• The vital factor for a change in data storage was the
need to support large volumes of data by running
on clusters. Relational databases are not designed to
run efficiently on clusters.
The common characteristics of NoSQL databases
1. Not using the relational model
2. Running well on clusters
3. Open-source
4. Built for the 21st century web estates
5. Schemaless
6. The most important result of the rise of NoSQL is
Polyglot Persistence.
Aggregate Data Models
Data Model: Model through which we identify and manipulate
our data. It describes how we interact with the data in the
database.
Storage model: Model which describes how the database
stores and manipulates the data internally.
In NoSQL, “data model” refers to the model by which the
database organizes data, more formally called a metamodel.
The dominant data model is the relational data model, which
uses a set of tables:
• Each table has rows
• Each row represents an entity
• Columns describe the entity
NoSQL solutions move away from the relational model.
Each NoSQL solution has a different model that it
uses:
1. Key-value
2. Document
3. Column-family
4. Graph
The first three of these share a common characteristic
in their data models, which is called aggregate
orientation.
Aggregates
The relational model takes the information to store and
divides it into tuples.
A tuple is a limited data structure:
• You cannot nest one tuple within another to get nested
records.
• You cannot put a list of values or tuples within another.
The aggregate model recognizes that we often need to operate
on data that has a more complex structure than a set of
tuples.
• It has a complex record that allows lists and other record
structures to be nested inside it.
• Key-value, document, and column-family databases all use
this aggregate orientation.
Definition:
• In Domain-Driven Design, an aggregate is a collection
of related objects that we wish to treat as a unit. It is a
unit for data manipulation and management of
consistency. Typically, we like to update aggregates
with atomic operations and communicate with our data
storage in terms of aggregates.
Advantages of Aggregate:
• Dealing in aggregates makes it easier to operate
on a cluster, since the aggregate makes a natural unit
for replication and sharding.
• Aggregates are also often easier for application
programmers to work with, since they often
manipulate data through aggregate structures.
Example of Relations and Aggregates
• Let’s assume we have to build an e-commerce website; we are
going to be selling items directly to customers over the web, and
we will have to store information about users, our product catalog,
orders, shipping addresses, billing addresses, and payment data.
• Data model for a relational database:
Sample data for Relational Data Model
Everything is properly normalized, no data is repeated in multiple
tables. We also have referential integrity.
An aggregate data model
Sample Data for aggregate data model
// in customers
{
  "id": 1,
  "name": "Martin",
  "billingAddress": [{"city": "Chicago"}]
}
// in orders
{
  "id": 99,
  "customerId": 1,
  "orderItems": [
    {
      "productId": 27,
      "price": 32.45,
      "productName": "NoSQL Distilled"
    }
  ],
  "shippingAddress": [{"city": "Chicago"}],
  "orderPayment": [
    {
      "ccinfo": "1000-1000-1000-1000",
      "txnId": "abelif879rft",
      "billingAddress": {"city": "Chicago"}
    }
  ]
}
• We’ve used the black-diamond composition
marker in UML to show how data fits into the
aggregation structure.
• The customer aggregate contains a list of
billing addresses.
• The order aggregate contains a list of order
items, a shipping address, and payments.
• The payment itself contains a billing address
for that payment.
• Here a single logical address record appears three times,
but instead of using IDs it’s treated as a value and
copied each time. This fits the domain, where we
would not want the shipping address or the
payment’s billing address to change.
• The link between the customer and the order isn’t
within either aggregate—it’s a relationship between
aggregates. We’ve shown the product name as part of
the order item here—this kind of denormalization is
similar to the tradeoffs with relational databases, but is
more common with aggregates because we want to
minimize the number of aggregates we access
during a data interaction.
• To draw an aggregate boundary, you have to think about how you will
access that data—and make that part of your thinking when
developing the application data model.
• Indeed, we could draw our aggregate boundaries differently,
putting all the orders for a customer into the customer aggregate.
Embed all the objects for customer and the customer’s orders
Sample Data for above aggregate data model
// in customers
{
  "customer": {
    "id": 1,
    "name": "Martin",
    "billingAddress": [{"city": "Chicago"}],
    "orders": [
      {
        "id": 99,
        "customerId": 1,
        "orderItems": [
          {
            "productId": 27,
            "price": 32.45,
            "productName": "NoSQL Distilled"
          }
        ],
        "shippingAddress": [{"city": "Chicago"}],
        "orderPayment": [
          {
            "ccinfo": "1000-1000-1000-1000",
            "txnId": "abelif879rft",
            "billingAddress": {"city": "Chicago"}
          }
        ]
      }
    ]
  }
}
• There’s no universal answer for how to draw your
aggregate boundaries. It depends entirely on how you
tend to manipulate your data.
• If you tend to access a customer together with all of
that customer’s orders at once, then you would prefer
a single aggregate.
• However, if you tend to focus on accessing a single
order at a time, then you should prefer having separate
aggregates for each order.
Consequences of Aggregate
Orientation
• Relational databases have no concept of aggregate within their data
model, so we call them aggregate-ignorant. In the NoSQL world,
graph databases are also aggregate-ignorant. Being aggregate-
ignorant is not a bad thing. It’s often difficult to draw aggregate
boundaries well, particularly if the same data is used in many
different contexts.
• An order makes a good aggregate when a customer is making and
reviewing orders, and when the retailer is processing orders.
• However, if a retailer wants to analyze its product sales over the
last few months, then an order aggregate becomes a trouble. To
get to product sales history, you’ll have to dig into every aggregate
in the database. So an aggregate structure may help with some
data interactions but be an obstacle for others.
• An aggregate-ignorant model allows you to easily look at
the data in different ways, so it is a better choice when
you don’t have a primary structure for manipulating
your data.
• The aggregate orientation helps greatly with running on
a cluster.
• If we’re running on a cluster, we need to minimize how
many nodes we need to query when we are gathering
data.
• By explicitly including aggregates, we give the database
important information about which bits of data will be
manipulated together, and thus should live on the same
node.
Aggregates have an important consequence for transactions:
• Relational databases allow you to manipulate any combination of
rows from any tables in a single transaction. Such transactions are
called ACID transactions.
• Many rows spanning many tables are updated as a single operation.
This operation either succeeds or fails in its entirety, and concurrent
operations are isolated from each other so they cannot see a partial
update.
• It’s often said that NoSQL databases don’t support ACID
transactions and thus sacrifice consistency, but they support
atomic manipulation of a single aggregate at a time.
• This means that if we need to manipulate multiple aggregates in an
atomic way, we have to manage that ourselves in the application
code. Graph and other aggregate-ignorant databases usually do
support ACID transactions similar to relational databases.
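As a hedged illustration of atomic manipulation of a single aggregate, here is a minimal sketch using MongoDB’s Python driver (pymongo); the database, collection, and field names follow the earlier order example and are assumptions:

from pymongo import MongoClient

client = MongoClient()                 # assumes a local MongoDB instance
orders = client["shop"]["orders"]      # hypothetical database/collection names

# Adding an item to one order aggregate is atomic: the whole document
# is updated in a single operation, with no partial update visible.
orders.update_one(
    {"id": 99},
    {"$push": {"orderItems": {"productId": 31, "price": 21.50}}},
)

# Updating two different aggregates (say, an order and a customer) is
# NOT atomic here; the application code has to handle a failure in between.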
Key-Value and Document Data
Models
• Key-value and document databases are strongly
aggregate-oriented, meaning we think of these databases
as primarily constructed through aggregates.
• Both of these types of databases consist of lots of
aggregates, with each aggregate having a key or ID
that’s used to get at the data.
• Riak and Redis are examples of key-value
databases.
• MongoDB and CouchDB are the most popular document
databases.
Key-Value Data
Model
• Key-value databases are the simplest of the NoSQL
databases: The basic data structure is a dictionary or map.
You can store a value, such as an integer, string, a JSON
structure, or an array, along with a key used to reference that
value.
• For example, a simple key-value database might have a value
such as "Douglas Adams". This value is then assigned an ID,
such as cust1237.
• Using a JSON structure adds complexity to the database. For
example, the database could store a full mailing address in
addition to a person's name. In the previous example, key
cust1237 could point to the following information:
{
  "name": "Douglas Adams",
  "street": "782 Southwest St.",
  "city": "Austin",
  "state": "TX"
}
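A minimal sketch of this key-value usage with the redis-py client (the key name and a locally running Redis server are assumptions):

import json
import redis

r = redis.Redis()   # assumes Redis running on localhost:6379

# Store the whole aggregate as an opaque value under one key.
customer = {
    "name": "Douglas Adams",
    "street": "782 Southwest St.",
    "city": "Austin",
    "state": "TX",
}
r.set("cust1237", json.dumps(customer))

# The only way back in is a lookup by key; the database cannot
# query inside the value.
stored = json.loads(r.get("cust1237"))
print(stored["city"])   # -> Austin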
Weakness of key-value database
• This model does not provide traditional database
capabilities such as atomicity across transactions, or
consistency when multiple transactions are executed
simultaneously. Such capabilities must be provided by
the application itself.
• As the volume of data increases, maintaining
unique values as keys may become more difficult;
addressing this issue requires introducing
some complexity in generating character
strings that will remain unique among an
extremely large set of keys.
Document Data Model
• It is a type of non-relational database designed to store and query data
as JSON-like documents, which makes it easier for developers to store and query
data in a database.
• It works well with use cases such as catalogs and user profiles.
• In a document database, data that is a collection of key-value pairs is
stored as a document.
• The flexible, semi-structured, hierarchical nature of documents and
document databases allows them to evolve with an application’s needs.
• Example: Book document
{
  "id": "98765432",
  "type": "book",
  "ISBN": "987-6-543-21012-3",
  "Author": {
    "Lname": "Roe",
    "MI": "T",
    "Fname": "Richard"
  },
  "Title": "Understanding document databases"
}
Difference between key-value and document database
1. Opacity
• In key-value database, the aggregate is opaque to
the database—just some big blob of mostly
meaningless bits. The advantage of opacity is that
we can store whatever we like in the aggregate.
The database may impose some general size limit,
but other than that we have complete freedom.
• In contrast, a document database is able to see a
structure in the aggregate. A document database
imposes limits on what we can place in it, defining
allowable structures and types. In return, however,
we get more flexibility in access.
2. Access
• With a key-value store, we can only access an
aggregate by lookup based on its key.
• With a document database, we can submit
queries to the database based on the fields in
the aggregate.
• In a document database we can retrieve part of the
aggregate rather than the whole thing, and the
database can create indexes based on the
contents of the aggregate.
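A minimal sketch of these access patterns with pymongo (the collection and field names follow the earlier order example and are assumptions):

from pymongo import MongoClient

orders = MongoClient()["shop"]["orders"]   # hypothetical database/collection

# Query by a field inside the aggregate...
matching = orders.find({"orderItems.productId": 27})

# ...retrieve only part of the aggregate (a projection)...
shipping_only = orders.find({"id": 99}, {"shippingAddress": 1, "_id": 0})

# ...and let the database index the aggregate's contents.
orders.create_index("orderItems.productId")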
Column-Family Stores
• One of the early and influential NoSQL databases was
Google’s BigTable; it is a two-level map. It has been a
model that influenced later databases such as HBase and
Cassandra.
• These databases with a BigTable-style data model are
often referred to as column stores. The thing that made
them different was the way in which they physically
stored data.
• Most databases have a row as a unit of storage which,
in particular, helps write performance. However, there
are many scenarios where writes are rare, but you
often need to read a few columns of many rows at
once.
• In this situation, it’s better to store groups of columns
for all rows as the basic storage unit—which is why
these databases are called column stores.
• BigTable and its next generation follow this notion of
storing groups of columns (column families)
together; we refer to these as column-family databases.
• Column-family model is a two-level aggregate
structure. As with key-value stores, the first key is
often described as a row identifier, picking up the
aggregate of interest. The difference with column-
family structures is that this row aggregate is itself
formed of a map of more detailed values. These
second-level values are referred to as columns. As
well as accessing the row as a whole, operations also
allow picking out a particular column, so to get a
particular customer’s name you could do
something like get('1234', 'name').
Fig. Representing customer info in a column-family structure
Column-family databases organize their columns into column
families. Each column has to be part of a single column family, and
the column acts as the unit for access, with the assumption that data
for a particular column family will usually be accessed together.
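A minimal, hypothetical sketch of this two-level aggregate structure using plain Python dictionaries (the row key, column families, and columns are illustrative):

# First level: row key -> row aggregate.
# Second level: column family -> columns (name -> value).
customers = {
    "1234": {
        "profile": {"name": "Martin", "billingAddress": "Chicago", "payment": "debit"},
        "orders":  {"order99": "Nov-20-2011", "order102": "Dec-02-2011"},
    }
}

def get(row_key, column, family="profile"):
    # Pick out one column from one row, much like get('1234', 'name').
    return customers[row_key][family][column]

print(get("1234", "name"))   # -> Martin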
• This also gives you a couple of ways to think about how
the data is structured.
• Row-oriented: Each row is an aggregate (for example,
customer with the ID of 1234) with column families
representing useful chunks of data (profile, order history)
within that aggregate.
• Column-oriented: Each column family defines a record
type (e.g., customer profiles) with rows for each of the
records. You then think of a row as the join of records in
all column families.
• This latter aspect reflects the columnar nature of
column-family databases. Since the database knows
about these common groupings of data, it can use this
information for its storage and access behavior.
• Cassandra uses the terms “wide” and “skinny.”
• Skinny rows have few columns, with the same
columns used across many different rows.
• In this case, the column family defines a
record type, each row is a record, and each
column is a field.
• A wide row has many columns (perhaps
thousands), with rows having very different
columns.
• A wide column family models a list, with each
column being one element in that list.
Summarizing Aggregate-Oriented
Databases
• These are the three different styles of aggregate-
oriented data models. What they all share is the
notion of an aggregate indexed by a key that you
can use for lookup. This aggregate is central to
running on a cluster, as the database will ensure that
all the data for an aggregate is stored together on
one node. The aggregate also acts as the atomic
unit for updates, providing a useful, if limited,
amount of transactional control.
• Within that notion of aggregate, we have some
differences. The key-value data model treats the
aggregate as an opaque whole, which means you
can only do key lookup for the whole aggregate—
you cannot run a query nor retrieve a part of the
aggregate.
• The document model makes the aggregate
transparent to the database allowing you to do
queries and partial retrievals. However, since the
document has no schema, the database cannot
act much on the structure of the document to
optimize the storage and retrieval of parts of the
aggregate.
• Column-family models divide the aggregate into
column families, allowing the database to treat
them as units of data within the row aggregate.
This imposes some structure on the aggregate
but allows the database to take advantage of that
structure to improve its accessibility.
Key Points
• An aggregate is a collection of data that we interact
with as a unit. Aggregates form the boundaries for
ACID operations with the database.
• Key-value, document, and column-family databases
can all be seen as forms of aggregate oriented
database.
• Aggregates make it easier for the database to
manage data storage over clusters.
• Aggregate-oriented databases work best when most
data interaction is done with the same aggregate;
aggregate-ignorant databases are better when
interactions use data organized in many different ways.
More Details on Data Models
Relationships
• Aggregates are useful because they put together data
that is commonly accessed together. But there are still
lots of cases where data that’s related is accessed
differently.
• Consider the relationship between a customer and all of
his orders. Some applications will want to access the
order history whenever they access the customer; this
fits in well with combining the customer with his
order history into a single aggregate.
• Other applications, however, want to process orders
individually and thus model orders as independent
aggregates.
• In this case, you’ll want separate order and customer
aggregates but with some kind of relationship between
them so that any work on an order can look up customer
data. The simplest way to provide such a link is to embed
the ID of the customer within the order’s aggregate
data.
• That way, if you need data from the customer record, you
read the order, look up the customer ID, and make
another call to the database to read the customer data (see
the sketch below). This will work, and will be just fine in
many scenarios—but the database will be ignorant of the
relationship in the data. This can be important because
there are times when it’s useful for the database to know
about these links.
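A minimal sketch of that two-step lookup with pymongo (the database, collection, and field names follow the earlier examples and are assumptions):

from pymongo import MongoClient

db = MongoClient()["shop"]     # hypothetical database name

# Read the order aggregate, then follow the embedded customerId
# with a second call; the database itself knows nothing about the link.
order = db.orders.find_one({"orderId": 99})
customer = db.customers.find_one({"customerId": order["customerId"]})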
• As a result, many databases—even key-value stores—
provide ways to make these relationships visible to the
database. Document stores make the content of the
aggregate available to the database to form indexes and
queries.
• An important aspect of relationships between aggregates
is how they handle updates. Aggregate oriented
databases treat the aggregate as the unit of data-
retrieval. Consequently, atomicity is only supported
within the contents of a single aggregate.
• If you update multiple aggregates at once, you have
to deal yourself with a failure partway through.
• Relational databases help you with this by allowing
you to modify multiple records in a single
transaction, providing ACID guarantees while altering
many rows.
• All of this means that aggregate-oriented databases
become more awkward as you need to operate across
multiple aggregates.
• This may imply that if you have data based on lots
of relationships, you should prefer a relational
database over a NoSQL store.
• While that’s true for aggregate-oriented databases,
it’s worth remembering that relational databases
aren’t all that stellar with complex relationships
either.
• This makes it a good moment to introduce another
category of databases that’s often lumped into the
NoSQL pile.
Graph Databases
• Graph databases are an odd fish in the NoSQL
pond.
• Most NoSQL databases were inspired by the
need to run on clusters, which led to
aggregate-oriented data models of large
records with simple connections.
• Graph databases are motivated by a different
frustration with relational databases and thus
have an opposite model—small records with
complex interconnections, something like the structure in the figure below.
Fig: An example graph structure
In this context, a graph isn’t a bar chart or histogram;
instead, we refer to a graph data structure of nodes
connected by edges.
• In Fig: we have a web of information whose nodes are
very small (nothing more than a name) but there is a
rich structure of interconnections between them. With
this structure, we can ask questions such as “find the
books in the Databases category that are written by
someone whom a friend of mine likes.”
• Graph databases are ideal for capturing any data
consisting of complex relationships such as social
networks, product preferences, or eligibility rules.
• The fundamental data model of a graph database is
very simple: nodes connected by edges (also called
arcs).
Difference between Graph & Relational databases
• Although relational databases can implement
relationships using foreign keys, the joins required to
navigate around can get quite expensive—which
means performance is often poor for highly connected
data models.
• Graph databases make traversal along the
relationships very cheap. A large part of this is
because graph databases shift most of the work of
navigating relationships from query time to insert
time. This naturally pays off for situations where
querying performance is more important than insert
speed.
• The emphasis on relationships makes graph
databases very different from aggregate-
oriented databases.
• Graph databases are more likely to run on a
single server rather than distributed across
clusters.
• ACID transactions need to cover multiple
nodes and edges to maintain consistency.
• The only thing graph databases have in common
with aggregate-oriented databases is their
rejection of the relational model.
Schemaless Databases
• A common theme across all the forms of NoSQL
databases is that they are schemaless.
• When you want to store data in a relational
database, you first have to define a schema—a
defined structure for the database which says what
tables exist, which columns exist, and what data
types each column can hold.
• Before you store some data, you have to have the
schema defined for it in a relational database.
With NoSQL databases, the ways of storing
data are as follows:
• A key-value store allows you to store any data you
like under a key.
• A document database effectively does the same
thing, since it makes no restrictions on the
structure of the documents you store.
• Column-family databases allow you to store any
data under any column you like.
• Graph databases allow you to freely add new edges
and freely add properties to nodes and edges as you
wish.
With a schema:
• You have to figure out in advance what you need to
store, but that can be hard to do.
Without a schema:
• You can easily store whatever you need.
• This allows you to easily change your data storage as
you learn more about your project.
• You can easily add new things as you discover them.
• If you find you don’t need some things anymore, you
can just stop storing them, without worrying about
losing old data as you would if you delete columns in a
relational schema.
• A schema puts all rows of a table into a
straightjacket, which becomes awkward if you
have different kinds of data in different rows.
You either end up with lots of columns that are
usually null (a sparse table), or you end up
with meaningless columns like custom
column 4.
• A schemaless store also makes it easier to deal
with nonuniform data: data where each record
has a different set of fields. It allows each
record to contain just what it needs—no more,
no less.
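A minimal sketch of storing nonuniform records in a schemaless store, again using pymongo (the collection and fields are illustrative assumptions):

from pymongo import MongoClient

customers = MongoClient()["shop"]["customers"]   # hypothetical names

# Each record carries just the fields it needs; no schema change required.
customers.insert_one({"name": "Martin", "billingAddress": [{"city": "Chicago"}]})
customers.insert_one({"name": "Pramod", "loyaltyTier": "gold"})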
Problems in Schemaless:
• If you are storing some data and displaying it in
a report as a simple list of fieldName: value
lines then a schema is only going to get in the
way.
• But usually we do more with our data than this,
and we do it with programs that need to know
that the billing address is called
billingAddress and not addressForBilling,
and that the quantity field is going to be the
integer 5 and not the string "five".
Fact is that whenever we write a program that accesses data,
that program almost always relies on some form of implicit
schema. Unless it just says something like
//pseudo code
foreach (Record r in records) {
    foreach (Field f in r.fields) {
        print(f.name, f.value)
    }
}
Unless the program is that generic, it will assume that certain
field names are present and carry data with a certain meaning,
and it will assume something about the type of data stored
within each field.
• Programs are not humans; they cannot read “qty” and conclude
that it must be the same as “quantity”. So, however schemaless
our database is, there is usually an implicit schema present.
Having the implicit schema in the application code results in some
problems.
• In order to understand what data is present you have to dig into
the application code.
• The database remains ignorant of the schema—it can’t use the
schema to help it decide how to store and retrieve data efficiently. It
can’t apply its own validations upon that data to ensure that
different applications don’t manipulate data in an inconsistent way.
These are the reasons why relational databases have a fixed schema.
• Schemaless database shifts the schema into the application code
that accesses it. This becomes problematic if multiple
applications, developed by different people, access the same
database.
These problems can be reduced with a couple of approaches:
• Encapsulate all database interaction within a single
application and integrate it with other applications using
web services.
• Another approach is to clearly define different areas of an
aggregate for access by different applications. These
could be different sections in a document database or
different column families in column-family database.
Relational schemas can also be changed at any time with
standard SQL commands. If necessary, you can create new
columns in an ad-hoc way to store nonuniform data. We have
only rarely seen this done.
Materialized Views
• When we talked about aggregate-oriented data models,
we stressed their advantages. If you want to access
orders, it’s useful to have all the data for an order
contained in a single aggregate that can be stored and
accessed as a unit.
• But aggregate-orientation has a corresponding
disadvantage: What happens if a product manager
wants to know how much a particular item has sold
over the last couple of weeks?
• Now the aggregate-orientation works against you,
forcing you to potentially read every order in the
database to answer the question. You can reduce this
burden by building an index on the product, but you’re
still working against the aggregate structure.
• Relational databases support accessing data in
different ways. Furthermore, they provide a
convenient mechanism that allows you to look at
data differently from the way it’s stored—views.
View:
• A view is like a relational table (it is a relation) but
it’s defined by computation over the base tables.
When you access a view, the database computes the
data in the view—a handy form of encapsulation.
• Views provide a mechanism to hide from the client
whether data is derived data or base data.
• But some views are expensive to compute.
Materialized Views:
• To cope with this, materialized views were
invented, which are views that are computed in
advance and cached on disk. Materialized views
are effective for data that is read heavily but can
stand being somewhat stale.
• Although NoSQL databases don’t have views,
they may have precomputed and cached queries,
and they reuse the term “materialized view” to
describe them. Often, NoSQL databases create
materialized views using a map-reduce
computation.
There are two strategies for building a materialized view:
• The first is the eager approach where you update
the materialized view at the same time you update
the base data for it. In this case, adding an order
would also update the purchase history aggregates for
each product.
• This approach is good when you have more frequent
reads of the materialized view than you have writes
and you want the materialized views to be as fresh as
possible. The application database approach is
valuable here as it makes it easier to ensure that
any updates to base data also update materialized
views.
• If you don’t want to pay that overhead on each
update, you can run batch jobs to update the
materialized views at regular intervals as per
requirements.
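As a hedged illustration of the eager approach above, here is a minimal pymongo sketch in which adding an order also updates a per-product purchase-history aggregate (the database, collection, and field names are assumptions):

from pymongo import MongoClient

db = MongoClient()["shop"]          # hypothetical database name

def add_order(order):
    # Write the base data...
    db.orders.insert_one(order)
    # ...and eagerly update the materialized view at the same time.
    for item in order["orderItems"]:
        db.product_sales.update_one(
            {"productId": item["productId"]},
            {"$inc": {"unitsSold": 1, "revenue": item["price"]}},
            upsert=True,
        )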
• You can build materialized views outside of the
database by reading the data, computing the view,
and saving it back to the database.
• More often databases will support building
materialized views themselves.
• In this case, you provide the computation that needs
to be done, and the database executes the
computation when needed according to some
parameters that you configure. This is particularly
handy for eager updates of views with incremental
map-reduce.
Modeling for Data
Access
As mentioned earlier, when modeling data aggregates we
need to consider how the data is going to be read as well as
the side effects on data related to those aggregates.
1. Let’s start with the model where all the data for the customer
is embedded using a key-value store.
Fig: Embed all the objects for customer and their orders.
• In this scenario, the application can read the
customer’s information and all the related data by
using the key.
• If the requirements are to read the orders or the
products sold in each order, the whole object has to
be read and then parsed on the client side to build
the results.
• When references are needed, we could switch to
document stores and then query inside the
documents, or even change the data for the key-value
store to split the value object into Customer and
Order objects and then maintain these objects’
references to each other.
With the references (see Figure), we can now find the orders independently from the
Customer, and with the orderId reference in the Customer we can find all Orders for the
Customer.
# Customer object
{ "customerId": 1,
"customer": {
"name": "Martin",
"billingAddress": [{"city": "Chicago"}],
"payment": [{"type": "debit","ccinfo": "1000-1000-1000-1000"}],
"orders":[{"orderId":99}]
}
}
# Order object
{ "customerId": 1,
"orderId": 99,
"order":{
"orderDate":"Nov-20-2011",
"orderItems":[{"productId":27, "price": 32.45}],
"orderPayment":[{"ccinfo":"1000-1000-1000-1000",
"txnId":"abelif879rft"}],
"shippingAddress":{"city":"Chicago"} } }
Fig: Customer is stored separately from Order
2. In document stores, since we can query inside documents, removing references
to Orders from the Customer object is possible. This change allows us to not
update the Customer object when new orders are placed by the Customer.
# Customer object
{ "customerId": 1,
"name": "Martin",
"billingAddress": [{"city": "Chicago"}],
"payment": [
{"type": "debit",
"ccinfo": "1000-1000-1000-1000"}
]
}
#Order object
{ "orderId": 99,
"customerId": 1,
"orderDate":"Nov-20-2011",
"orderItems":[{"productId":27, "price": 32.45}],
"orderPayment":[{"ccinfo":"1000-1000-1000-1000",
"txnId":"abelif879rft"}],
"shippingAddress":{"city":"Chicago"}
}
• Since document data stores allow you to query by
attributes inside the document, searches such as
“find all orders that include the Refactoring
Databases product” are possible, but the decision to
create an aggregate of items and orders they belong
to is not based on the database’s query capability
but on the read optimization desired by the
application.
3. When using column families to model the
data, it is important to model it according to
your query requirements, not for the convenience
of writing; the general rule is to make it easy to
query and to denormalize the data at write time.
• There are multiple ways to model the data; one
way is to store the Customer and Order in
different column families (see Figure).
Here, it is important to note that references to all the
orders placed by the customer are kept in the Customer
column family.
Fig: Conceptual view into a column data store
4. When using graph databases to model the same data, we
model all objects as nodes and relations within them as
relationships; these relationships have types and
directional significance.
• Each node has independent relationships with other
nodes. These relationships have names like
PURCHASED, PAID_WITH, or BELONGS_TO (see
Figure); these relationship names let you traverse
the graph.
• Let’s say you want to find all the Customers who
PURCHASED a product with the name
Refactoring Databases. All we need to do is query
for the product node Refactoring Databases and
look for all the Customers with the incoming
PURCHASED relationship.
Fig: Graph model of e-commerce data
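A minimal sketch of that traversal, assuming a Neo4j graph store and its official Python driver (the node labels, relationship name, and connection details are illustrative):

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

query = """
MATCH (c:Customer)-[:PURCHASED]->(p:Product {name: $name})
RETURN c.name AS name
"""

# Follow the incoming PURCHASED relationships of the product node.
with driver.session() as session:
    for record in session.run(query, name="Refactoring Databases"):
        print(record["name"])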
Key Points
• Aggregate-oriented databases make inter-aggregate
relationships more difficult to handle than intra-
aggregate relationships.
• Graph databases organize data into node and edge
graphs; they work best for data that has complex
relationship structures.
• Schemaless databases allow you to freely add fields
to records, but there is usually an implicit schema
expected by users of the data.
• Aggregate-oriented databases often compute
materialized views to provide data organized
differently from their primary aggregates. This is
often done with map-reduce computations.
More Related Content

PDF
Database-Technology_introduction and feature.pdf
PPTX
Relational databases store data in tables
PPTX
History and Introduction to NoSQL over Traditional Rdbms
PPTX
Nosql-Module 1 PPT.pptx
PPTX
Introduction to NoSQL Databases and Types of NOSQL Databases.pptx
PPTX
Module-1.pptx63.pptx
PPTX
What Is a Database Powerpoint Presentation.pptx
PPTX
UNIT-2.pptx
Database-Technology_introduction and feature.pdf
Relational databases store data in tables
History and Introduction to NoSQL over Traditional Rdbms
Nosql-Module 1 PPT.pptx
Introduction to NoSQL Databases and Types of NOSQL Databases.pptx
Module-1.pptx63.pptx
What Is a Database Powerpoint Presentation.pptx
UNIT-2.pptx

Similar to NOSQL DATAbASES INTRDUCTION powerpoint presentaion (20)

PPTX
dbms introduction.pptx
PPTX
Current trends in dbms
PDF
Big data rmoug
DOCX
Report 1.0.docx
PPTX
Database management system
DOCX
Report 2.0.docx
PDF
PDF
MongoDB vs Firebase
PPTX
DBMS basics and normalizations unit.pptx
PPTX
Database management system
PPTX
Computer applications.pptx
PDF
ITI015En-The evolution of databases (I)
PDF
IBM Data Analytics Module 2 Overview of data Repositories.
PPT
CouchBase The Complete NoSql Solution for Big Data
PPT
Big Data Analytics Materials, Chapter: 1
PPT
Intro Duction of Database and its fundamentals .ppt
PPT
0001 introduction to database management system
PDF
Types of data bases
PPTX
Distributed RDBMS: Data Distribution Policy: Part 2 - Creating a Data Distrib...
PDF
Complete dbms notes
dbms introduction.pptx
Current trends in dbms
Big data rmoug
Report 1.0.docx
Database management system
Report 2.0.docx
MongoDB vs Firebase
DBMS basics and normalizations unit.pptx
Database management system
Computer applications.pptx
ITI015En-The evolution of databases (I)
IBM Data Analytics Module 2 Overview of data Repositories.
CouchBase The Complete NoSql Solution for Big Data
Big Data Analytics Materials, Chapter: 1
Intro Duction of Database and its fundamentals .ppt
0001 introduction to database management system
Types of data bases
Distributed RDBMS: Data Distribution Policy: Part 2 - Creating a Data Distrib...
Complete dbms notes
Ad

Recently uploaded (20)

PDF
Automation-in-Manufacturing-Chapter-Introduction.pdf
PDF
Evaluating the Democratization of the Turkish Armed Forces from a Normative P...
PPTX
Foundation to blockchain - A guide to Blockchain Tech
PDF
Operating System & Kernel Study Guide-1 - converted.pdf
PDF
Mohammad Mahdi Farshadian CV - Prospective PhD Student 2026
PDF
R24 SURVEYING LAB MANUAL for civil enggi
PPTX
Sustainable Sites - Green Building Construction
PPTX
KTU 2019 -S7-MCN 401 MODULE 2-VINAY.pptx
PDF
Digital Logic Computer Design lecture notes
PDF
Model Code of Practice - Construction Work - 21102022 .pdf
PDF
BMEC211 - INTRODUCTION TO MECHATRONICS-1.pdf
PDF
composite construction of structures.pdf
PPTX
bas. eng. economics group 4 presentation 1.pptx
PPTX
OOP with Java - Java Introduction (Basics)
PPTX
additive manufacturing of ss316l using mig welding
PDF
SM_6th-Sem__Cse_Internet-of-Things.pdf IOT
PDF
PRIZ Academy - 9 Windows Thinking Where to Invest Today to Win Tomorrow.pdf
PPTX
CYBER-CRIMES AND SECURITY A guide to understanding
PPTX
Lecture Notes Electrical Wiring System Components
PDF
The CXO Playbook 2025 – Future-Ready Strategies for C-Suite Leaders Cerebrai...
Automation-in-Manufacturing-Chapter-Introduction.pdf
Evaluating the Democratization of the Turkish Armed Forces from a Normative P...
Foundation to blockchain - A guide to Blockchain Tech
Operating System & Kernel Study Guide-1 - converted.pdf
Mohammad Mahdi Farshadian CV - Prospective PhD Student 2026
R24 SURVEYING LAB MANUAL for civil enggi
Sustainable Sites - Green Building Construction
KTU 2019 -S7-MCN 401 MODULE 2-VINAY.pptx
Digital Logic Computer Design lecture notes
Model Code of Practice - Construction Work - 21102022 .pdf
BMEC211 - INTRODUCTION TO MECHATRONICS-1.pdf
composite construction of structures.pdf
bas. eng. economics group 4 presentation 1.pptx
OOP with Java - Java Introduction (Basics)
additive manufacturing of ss316l using mig welding
SM_6th-Sem__Cse_Internet-of-Things.pdf IOT
PRIZ Academy - 9 Windows Thinking Where to Invest Today to Win Tomorrow.pdf
CYBER-CRIMES AND SECURITY A guide to understanding
Lecture Notes Electrical Wiring System Components
The CXO Playbook 2025 – Future-Ready Strategies for C-Suite Leaders Cerebrai...
Ad

NOSQL DATAbASES INTRDUCTION powerpoint presentaion

  • 3. Why NoSQL? • Relational databases have been the default choice for serious data storage, especially in the world of enterprise applications your only choice can be which relational database to use. • After such a long period of dominance, the current excitement about NoSQL databases comes as a surprise. • Now we’ll explore why relational databases
  • 4. The Value of Relational Databases 1. Getting at Persistent Data Two areas of memory: • Fast, small, volatile main memory • Larger, slower, non volatile backing store • Since main memory is volatile to keep data around, we write it to a backing store, commonly seen a disk which can be persistent memory. The backing store can be: • File system • Database
  • 5. • The database allows more flexibility than a file system in storing large amounts of data in a way that allows an application program to get information quickly and easily. 2. Concurrency • Enterprise applications tend to have many people using same data at once, possibly modifying that data. We have to worry about coordinating interactions between them to avoid things like double booking of hotel rooms. • Since enterprise applications can have lots of users and other systems all working concurrently, there’s a lot of room for bad things to happen. Relational databases help to handle this by controlling all access to their data through transactions.
  • 6. 3. Integration • Enterprise requires multiple applications, written by different teams, to collaborate in order to get things done. Applications often need to use the same data and updates made through one application have to be visible to others. • A common way to do this is shared database integration where multiple applications store their data in a single database. • Using a single database allows all the applications to use each others’ data easily, while the database’s concurrency control handles multiple applications in the same way as it handles multiple users in a single application.
  • 7. 4. A (Mostly) Standard Model • Relational databases have succeeded because they provide the core benefits in a (mostly) standard way. • As a result, developers can learn the basic relational model and apply it in many projects. • Although there are differences between different relational databases, the core mechanisms remain the same.
  • 8. Impedance Mismatch • For Application developers using relational databases, the biggest frustration has been what’s commonly called the impedance mismatch: the difference between the relational model and the in-memory data structures. • The relational data model organizes data into a structure of tables. Where a tuple is a set of name-value pairs and a relation is a set of tuples. • The values in a relational tuple have to be simple—they cannot contain any structure, such as a nested record or a list. This limitation isn’t true for in-memory data structures, which can take on much richer structures than relations. • So if you want to use a richer in-memory data structure, you have to translate it to a relational representation to store it on disk. Hence the impedance mismatch—two different representations that require translation.
  • 9. Figure: An order, which looks like a single aggregate structure in the UI, is split into many rows from many tables in a relational database
  • 10. • The impedance mismatch lead to relational databases being replaced with databases that replicate the in- memory data structures to disk. That decade was marked with the growth of object-oriented programming languages, and with them came object-oriented databases—both looking to be the dominant environment for software development in the new millennium. However, while object-oriented languages succeeded in becoming the major force in programming, object-oriented databases faded into obscurity. • Impedance mismatch has been made much easier to deal with by the wide availability of object relational mapping frameworks, such as Hibernate and iBATIS that implement well-known mapping patterns, but the mapping problem is still an issue. • Relational databases continued to dominate the enterprise computing world in the 2000s, but during that decade cracks began to open in their dominance.
  • 11. Application and Integration Databases • In relational databases, the database acts as an integration database—where multiple applications developed by separate teams storing their data in a common database. This improves communication because all the applications are operating on a consistent set of persistent data. There are downsides to shared database integration. • A structure that’s designed to integrate many applications is more complex than any single application needs. • If an application wants to make changes to its data storage, it needs to coordinate with all the other applications using the database. • Different applications have different structural and performance needs, so an index required by one application may cause a problematic hit on inserts for another.
  • 12. • A different approach is to treat your database as an application database—which is only accessed by a single application codebase that’s looked after by a single team. Advantages: • With an application database, only the team using the application needs to know about the database structure, which makes it much easier to maintain and evolve the schema. • Since the application team controls both the database and the application code, the responsibility for database integrity can be put in the application code.
  • 13. Web Services • During the 2000s we saw a distinct shift to web services where applications would communicate over HTTP. • If you communicate with SQL, the data must be structured as relations. However, with a service, you are able to use richer data structures with nested records and lists. These are usually represented as documents in XML or, more recently, JSON. • In general, with remote communication you want to reduce the number of round trips involved in the interaction, so it’s useful to be able to put a rich structure of information into a single request or response.
  • 14. • If you are going to use services for integration, most of the time web services —using text over HTTP— is the way to go. However, if you are dealing with highly performance-sensitive interactions, you may need a binary protocol. Only do this if you are sure you have the need, as text protocols are easier to work with—consider the example of the Internet. • Once you have made the decision to use an application database, you get more freedom of choosing a database. Since there is a decoupling between your internal database and the services with which you talk to the outside world, the outside world doesn’t have to care how you store your data, allowing you to consider non-relational options.
  • 15. Attack of the Clusters • In the 2000s, several large web properties dramatically increased in scale. This increase in scale was happening along many dimensions. Websites • Started tracking activity and structure in a very detailed way. • Large sets of data appeared: links, social networks, activity in logs, mapping data. • With the growth in data came a growth in users.
  • 16. Coping with the increase in data and traffic required more computing resources. To handle this kind of increase, you have two choices: 1. Scaling up implies: • bigger machines • more processors • more disk storage • more memory Scaling up disadvantages: • But bigger machines get more and more expensive. • There are real limits as size increases.
  • 17. 2. Use lots of small machines in a cluster: • A cluster of small machines can use commodity hardware and ends up being cheaper at these kinds of scales. • more resilient—while individual machine failures are common, the overall cluster can be built to keep going despite such failures, providing high reliability.
  • 18. Cluster disadvantages • Relational databases are not designed to be run on clusters. • Clustered relational databases, such as Oracle RAC or Microsoft SQL Server, work on the concept of a shared disk subsystem, where the cluster still has the disk subsystem as a single point of failure. • Relational databases could also be run as separate servers for different sets of data, effectively sharding the database. Even though this separates the load, all the sharding has to be controlled by the application, which has to keep track of which database server to talk to for each bit of data.
  • 19. • We lose any querying, referential integrity, transactions, or consistency controls that cross shards. • Commercial relational databases (licensed) are usually priced on a single-server assumption, so running on a cluster raised prices. This mismatch between relational databases and clusters led some organizations to consider an alternative route to data storage. Two companies in particular 1. Google 2. Amazon • Both were running large clusters
  • 20. • These things gave them the motive. Both were successful and growing companies with strong technical components, which gave them the means and opportunity. It was no wonder they had murder in mind for their relational databases. As the 2000s drew on, both companies produced brief but highly influential papers about their efforts: – BigTable from Google – Dynamo from Amazon • It’s often said that Amazon and Google operate at scales far removed from most organizations, so the solutions they needed may not be relevant to an average organization. But more and more organizations are beginning to explore what they can do by capturing and processing more data—and to run into the same problems. So people began to explore making databases along similar lines—explicitly designed to live in a world of clusters.
  • 21. The Emergence of NoSQL For NoSQL there is no generally accepted definition, nor an authority to provide one, so all we can do is discuss some common characteristics of the databases that tend to be called “NoSQL.” • The name NoSQL was originally applied to a database that didn't use SQL as a query language; instead, that database was manipulated through shell scripts that could be combined into the usual UNIX pipelines. • They are generally open-source projects. • Most NoSQL databases are driven by the need to run on clusters. Relational databases use ACID transactions to handle consistency across the whole database. This inherently clashes with a cluster environment, so NoSQL databases offer a range of options for consistency and distribution.
  • 22. • Not all NoSQL databases are strongly oriented towards running on clusters. Graph databases are one style of NoSQL database that uses a distribution model similar to relational databases but offers a different data model that makes it better at handling data with complex relationships. • NoSQL databases operate without a schema, allowing you to freely add fields to database records without having to define any changes in structure first. This is particularly useful when dealing with nonuniform data and custom fields, which forced relational databases to use names like customField6 or custom field tables that are awkward to process and understand.
  • 23. • When you first hear “NoSQL,” an immediate question is what does it stand for—a “no” to SQL? Most people who talk about NoSQL say that it really means “Not Only SQL,” but this interpretation has a couple of problems. Most people write “NoSQL” whereas “Not Only SQL” would be written “NOSQL.” • To resolve these problems, don’t worry about what the term stands for, but rather about what it means. Thus, when “NoSQL” is applied to a database, it refers to an ill-defined set of mostly open-source databases, mostly developed in the early 21st century, and mostly not using SQL.
  • 24. • It’s better to think of NoSQL as a movement rather than a technology. We don’t think that relational databases are going away—they are still going to be the most common form of database in use. Their familiarity, stability, feature set, and available support are compelling arguments for most projects. • The change is that now we see relational databases as one option for data storage. This point of view is often referred to as polyglot persistence—using different data stores in different circumstances. • We need to understand the nature of the data we’re storing and how we want to manipulate it. The result is that most organizations will have a mix of data storage technologies for different circumstances. In order to make this polyglot world work, our view is that organizations also need to shift from integration databases to application databases.
  • 25. • In our account of the history of NoSQL development, we’ve concentrated on big data running on clusters. The big data concerns have created an opportunity for people to think freshly about their data storage needs, and some development teams see that using a NoSQL database can help their productivity by simplifying their database access even if they have no need to scale beyond a single machine. Two primary reasons for considering NoSQL: 1) To handle data access with sizes and performance that demand a cluster 2) To improve the productivity of application development by using a more convenient data interaction style.
  • 26. A NoSQL database provides a mechanism for storage and retrieval of data; such databases are used in real-time web applications and big data, and their use is increasing over time. Many NoSQL stores compromise consistency in favor of availability, speed, and partition tolerance. Advantages of NoSQL: 1. High Scalability NoSQL databases use sharding for horizontal scaling. They can handle huge amounts of data; as the data grows, a NoSQL database scales to handle it in an efficient manner. 2. High Availability The auto-replication feature in NoSQL databases makes them highly available.
  • 27. Disadvantages of NoSQL: 1. Narrow Focus: mainly designed for storage, with comparatively little other functionality. 2. Open Source: NoSQL databases are open source with no single standard, so two database systems are likely to differ considerably. 3. Management Challenge: big data management in NoSQL is much more complex than in a relational database. 4. GUI not available: flexible GUI tools for accessing these databases are not widely available. 5. Backup: backup is a weak point for some NoSQL databases such as MongoDB. 6. Large document size: data is stored as JSON-like documents, which increases document size.
  • 28. When should NoSQL be used • When a huge amount of data needs to be stored and retrieved. • The relationships between the data you store are not that important. • The data changes over time and is not structured. • Support for constraints and joins is not required at the database level. • The data is growing continuously and you need to scale the database regularly to handle it.
  • 29. Key Points • Relational databases have been a successful technology for twenty years, providing persistence, concurrency control, and an integration mechanism. • Application developers have been frustrated with the impedance mismatch between the relational model and the in-memory data structures. • There is a movement away from using integration databases towards encapsulating databases within applications and integrating through services. • The vital factor for a change in data storage was the need to support large volumes of data by running on clusters. Relational databases are not designed to run efficiently on clusters.
  • 30. The common characteristics of NoSQL databases 1. Not using the relational model 2. Running well on clusters 3. Open-source 4. Built for the 21st century web estates 5. Schemaless 6. The most important result of the rise of NoSQL is Polyglot Persistence.
  • 31. Aggregate Data Models Data model: the model through which we identify and manipulate our data. It describes how we interact with the data in the database. Storage model: the model which describes how the database stores and manipulates the data internally. In NoSQL, “data model” refers to the model by which the database organizes data—more formally called a metamodel. The dominant data model is the relational data model, which uses a set of tables: • Each table has rows • Each row represents an entity • Columns describe the entity
  • 32. NoSQL databases move away from the relational model. Each NoSQL solution has a different model that it uses: 1. Key-value 2. Document 3. Column-family 4. Graph Of these, the first three share a common characteristic of their data models, which is called aggregate orientation.
  • 33. Aggregates The relational model takes the information to store and divides it into tuples. A tuple is a limited data structure: • You cannot nest one tuple within another to get nested records. • You cannot put a list of values or tuples within another. The aggregate model recognizes that often we need to operate on data that has a more complex structure than a set of tuples. • It has a complex record that allows lists and other record structures to be nested inside it. • Key-value, document, and column-family databases all share this aggregate orientation.
  • 34. Definition: • In Domain-Driven Design, an aggregate is a collection of related objects that we wish to treat as a unit. It is a unit for data manipulation and management of consistency. Typically, we like to update aggregates with atomic operations and communicate with our data storage in terms of aggregates. Advantages of aggregates: • Dealing in aggregates makes it easier to operate on a cluster, since the aggregate makes a natural unit for replication and sharding. • Aggregates are also often easier for application programmers to work with, since they often manipulate data through aggregate structures.
  • 35. Example of Relations and Aggregates • Let’s assume we have to build an e-commerce website; we are going to be selling items directly to customers over the web, and we will have to store information about users, our product catalog, orders, shipping addresses, billing addresses, and payment data. • Data model for a relational database:
  • 36. Sample data for Relational Data Model Everything is properly normalized, no data is repeated in multiple tables. We also have referential integrity.
  • 38. Sample Data for aggregate data model // in customers { "id":1, "name":"Martin", "billingAddress":[{"city":"Chicago"}] } // in orders { "id":99, "customerId":1, "orderItems":[ { "productId":27, "price": 32.45, "productName": "NoSQL Distilled" } ], "shippingAddress":[{"city":"Chicago"}], "orderPayment":[ { "ccinfo":"1000-1000-1000-1000", "txnId":"abelif879rft", "billingAddress": {"city": "Chicago"} } ] }
  • 39. • We’ve used the black-diamond composition marker in UML to show how data fits into the aggregation structure. • The customer aggregate contains a list of billing addresses. • The order aggregate contains a list of order items, a shipping address, and payments. • The payment itself contains a billing address for that payment.
  • 40. • Here a single logical address record appears three times, but instead of using IDs it's treated as a value and copied each time. This fits the domain, where we would not want the shipping address, nor the payment's billing address, to change. • The link between the customer and the order isn't within either aggregate—it's a relationship between aggregates. We've shown the product name as part of the order item here—this kind of denormalization is similar to the tradeoffs with relational databases, but is more common with aggregates because we want to minimize the number of aggregates we access during a data interaction.
  • 41. • To draw an aggregate boundary, you have to think about accessing that data—and make that part of your thinking when developing the application data model. • Indeed, we could draw our aggregate boundaries differently, putting all the orders for a customer into the customer aggregate. Fig: Embed all the objects for customer and the customer's orders
  • 42. Sample Data for above aggregate data model // in customers { "customer": { "id": 1, "name": "Martin", "billingAddress": [{"city": "Chicago"}], "orders": [ { "id":99, "customerId":1, "orderItems":[ { "productId":27, "price": 32.45, "productName": "NoSQL Distilled" } ], "shippingAddress":[{"city":"Chicago"}], "orderPayment":[ { "ccinfo":"1000-1000-1000-1000", "txnId":"abelif879rft", "billingAddress": {"city": "Chicago"} } ] } ] } }
  • 43. • There’s no universal answer for how to draw your aggregate boundaries. It depends entirely on how you tend to manipulate your data. • If you tend to access a customer together with all of that customer’s orders at once, then you would prefer a single aggregate. • However, if you tend to focus on accessing a single order at a time, then you should prefer having separate aggregates for each order.
  • 44. Consequences of Aggregate Orientation • Relational databases have no concept of aggregate within their data model, so we call them aggregate-ignorant. In the NoSQL world, graph databases are also aggregate-ignorant. Being aggregate-ignorant is not a bad thing. It's often difficult to draw aggregate boundaries well, particularly if the same data is used in many different contexts. • An order makes a good aggregate when a customer is making and reviewing orders, and when the retailer is processing orders. • However, if a retailer wants to analyze its product sales over the last few months, then an order aggregate becomes a problem. To get to product sales history, you'll have to dig into every aggregate in the database. So an aggregate structure may help with some data interactions but be an obstacle for others.
  • 45. • An aggregate-ignorant model allows you to easily look at the data in different ways, so it is a better choice when you don’t have a primary structure for manipulating your data. • The aggregate orientation helps greatly with running on a cluster. • If we’re running on a cluster, we need to minimize how many nodes we need to query when we are gathering data. • By explicitly including aggregates, we give the database important information about which bits of data will be manipulated together, and thus should live on the same node.
  • 46. Aggregates have an important consequence for transactions: • Relational databases allow you to manipulate any combination of rows from any tables in a single transaction. Such transactions are called ACID transactions. • Many rows spanning many tables are updated as a single operation. This operation either succeeds or fails in its entirety, and concurrent operations are isolated from each other so they cannot see a partial update. • It’s often said that NoSQL databases don’t support ACID transactions and thus sacrifice consistency, but they support atomic manipulation of a single aggregate at a time. • This means that if we need to manipulate multiple aggregates in an atomic way, we have to manage that ourselves in the application code. Graph and other aggregate-ignorant databases usually do support ACID transactions similar to relational databases.
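A minimal sketch of this difference, assuming MongoDB accessed via pymongo and the order/customer aggregates used in the examples in this chapter (the database, collection, and field names are illustrative):

from pymongo import MongoClient

db = MongoClient()["shop"]   # assumes a local MongoDB instance

# Updating a single aggregate (one document) is atomic on its own:
db.orders.update_one({"id": 99}, {"$push": {"orderItems": {"productId": 31, "price": 21.50}}})

# Touching two aggregates means two separate operations; if the second one fails,
# the application code has to detect and repair the inconsistency itself.
db.orders.update_one({"id": 99}, {"$set": {"status": "paid"}})
db.customers.update_one({"id": 1}, {"$inc": {"openOrders": -1}})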
  • 47. Key-Value and Document Data Models • Key-value and document databases are strongly aggregate-oriented, meaning we think of these databases as primarily constructed through aggregates. • Both of these types of databases consist of lots of aggregates, with each aggregate having a key or ID that's used to get at the data. • Riak and Redis are examples of key-value databases. • MongoDB and CouchDB are the most popular document databases.
  • 48. Key-Value Data Model • Key-value databases are the simplest of the NoSQL databases: The basic data structure is a dictionary or map. You can store a value, such as an integer, a string, a JSON structure, or an array, along with a key used to reference that value. • For example, a simple key-value database might have a value such as "Douglas Adams". This value is then assigned an ID, such as cust1237. • Using a JSON structure adds complexity to the database. For example, the database could store a full mailing address in addition to a person's name. In the previous example, key cust1237 could point to the following information: { name: "Douglas Adams", street: "782 Southwest St.", city: "Austin", state: "TX" }
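A minimal sketch of this interaction, assuming a local Redis server and the redis-py client; the key cust1237 comes from the example above, and serializing the record as a JSON string is one common convention rather than the only option:

import json
import redis

r = redis.Redis()   # assumes Redis running on localhost:6379

customer = {"name": "Douglas Adams", "street": "782 Southwest St.", "city": "Austin", "state": "TX"}
r.set("cust1237", json.dumps(customer))   # the value is just an opaque blob to Redis

raw = r.get("cust1237")                   # the only way back in is by key
print(json.loads(raw)["name"])            # parsing happens in the application, not the database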
  • 49. Weakness of key-value databases • This model does not provide traditional database capabilities such as atomicity of transactions, or consistency when multiple transactions are executed simultaneously. Such capabilities must be provided by the application itself. • As the volume of data increases, maintaining unique keys may become more difficult; addressing this issue requires introducing some complexity in generating character strings that will remain unique among an extremely large set of keys.
  • 50. Document Data Model • A document database is a type of non-relational database designed to store and query data as JSON-like documents, which makes it easier for developers to store and query data. • It works well for use cases such as catalogs and user profiles. • In a document store, each record is a collection of key-value pairs stored together as a document. • The flexible, semi-structured, hierarchical nature of documents and document databases allows them to evolve with an application's needs. • Example: Book document { "id": "98765432", "type": "book", "ISBN": "987-6-543-21012-3", "Author": { "Lname": "Roe", "MI": "T", "Fname": "Richard" }, "Title": "Understanding document databases" }
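A minimal sketch of storing and querying such a document, assuming MongoDB via pymongo; the database and collection names (catalog, books) are illustrative:

from pymongo import MongoClient

db = MongoClient()["catalog"]   # assumes a local MongoDB instance

book = {
    "id": "98765432",
    "type": "book",
    "ISBN": "987-6-543-21012-3",
    "Author": {"Lname": "Roe", "MI": "T", "Fname": "Richard"},
    "Title": "Understanding document databases",
}
db.books.insert_one(book)                                   # the whole aggregate goes in as one document
print(db.books.find_one({"Author.Lname": "Roe"})["Title"])  # query on a field nested inside the aggregate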
  • 51. Difference between key-value and document databases 1. Opacity • In a key-value database, the aggregate is opaque to the database—just some big blob of mostly meaningless bits. The advantage of opacity is that we can store whatever we like in the aggregate. The database may impose some general size limit, but other than that we have complete freedom. • In contrast, a document database is able to see a structure in the aggregate. A document database imposes limits on what we can place in it, defining allowable structures and types. In return, however, we get more flexibility in access.
  • 52. 2. Access • With a key-value store, we can only access an aggregate by lookup based on its key. • With a document database, we can submit queries to the database based on the fields in the aggregate. • In a document database we can retrieve part of the aggregate rather than the whole thing, and the database can create indexes based on the contents of the aggregate.
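A small sketch of the contrast, assuming the customer data from earlier, Redis via redis-py for the key-value side and MongoDB via pymongo for the document side (the key and collection names are illustrative):

import json
import redis
from pymongo import MongoClient

# Key-value store: the only operation is fetching the whole opaque blob by key.
r = redis.Redis()
raw = r.get("cust1237")
customer = json.loads(raw) if raw else None   # the application does all the interpretation

# Document store: query by a field inside the aggregate and retrieve only part of it.
db = MongoClient()["shop"]
billing = db.customers.find_one({"name": "Martin"}, {"billingAddress": 1})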
  • 53. Column-Family Stores • One of the early and influential NoSQL databases was Google's BigTable; its data model is a two-level map. It influenced later databases such as HBase and Cassandra. • Databases with a BigTable-style data model are often referred to as column stores. The thing that made them different was the way in which they physically stored data. • Most databases have a row as a unit of storage, which, in particular, helps write performance. However, there are many scenarios where writes are rare, but you often need to read a few columns of many rows at once. • In this situation, it's better to store groups of columns for all rows as the basic storage unit—which is why these databases are called column stores.
  • 54. • BigTable and its successors follow this notion of storing groups of columns (column families) together; we refer to these as column-family databases. • The column-family model is a two-level aggregate structure. As with key-value stores, the first key is often described as a row identifier, picking up the aggregate of interest. The difference with column-family structures is that this row aggregate is itself formed of a map of more detailed values. These second-level values are referred to as columns. As well as accessing the row as a whole, operations also allow picking out a particular column, so to get a particular customer's name you could do something like get('1234', 'name').
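A minimal sketch of that two-level lookup, assuming Apache Cassandra accessed through the DataStax Python driver; the keyspace and table names are illustrative, and the CQL SELECT stands in for the get('1234', 'name') style of access:

from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("shop")   # assumes a local Cassandra node and a 'shop' keyspace

# The row key picks out the aggregate; the selected column picks one value inside it.
row = session.execute("SELECT name FROM customers WHERE id = %s", ("1234",)).one()
print(row.name)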
  • 55. Fig. Representing customer info in a column-family structure Column-family databases organize their columns into column families. Each column has to be part of a single column family, and the column acts as a unit for access, with the assumption that data for a particular column family will usually be accessed together.
  • 56. • This also gives you a couple of ways to think about how the data is structured. • Row-oriented: Each row is an aggregate (for example, customer with the ID of 1234) with column families representing useful chunks of data (profile, order history) within that aggregate. • Column-oriented: Each column family defines a record type (e.g., customer profiles) with rows for each of the records. You then think of a row as the join of records in all column families. • This latter aspect reflects the columnar nature of column-family databases. Since the database knows about these common groupings of data, it can use this information for its storage and access behavior.
  • 57. • Cassandra uses the terms “wide” and “skinny.” • Skinny rows have few columns, with the same columns used across many different rows. • In this case, the column family defines a record type, each row is a record, and each column is a field. • A wide row has many columns (perhaps thousands), with rows having very different columns. • A wide column family models a list, with each column being one element in that list. A sketch of both shapes follows below.
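A hedged sketch of the two shapes in CQL, run through the DataStax Python driver; the keyspace and table definitions are illustrative assumptions (in CQL 3 a “wide row” corresponds to a partition with clustering columns):

from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("shop")   # assumes a local node and a 'shop' keyspace

# Skinny rows: the same few columns on every row -- each row is one record.
session.execute("""
    CREATE TABLE IF NOT EXISTS customer_profiles (
        customer_id text PRIMARY KEY,
        name text,
        city text)""")

# Wide rows: one partition per customer, one entry per order -- the row models a list.
session.execute("""
    CREATE TABLE IF NOT EXISTS customer_orders (
        customer_id text,
        order_id text,
        total decimal,
        PRIMARY KEY (customer_id, order_id))""")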
  • 58. Summarizing Aggregate-Oriented Databases • These are the three different styles of aggregate- oriented data models. What they all share is the notion of an aggregate indexed by a key that you can use for lookup. This aggregate is central to running on a cluster, as the database will ensure that all the data for an aggregate is stored together on one node. The aggregate also acts as the atomic unit for updates, providing a useful, if limited, amount of transactional control. • Within that notion of aggregate, we have some differences. The key-value data model treats the aggregate as an opaque whole, which means you can only do key lookup for the whole aggregate— you cannot run a query nor retrieve a part of the aggregate.
  • 59. • The document model makes the aggregate transparent to the database allowing you to do queries and partial retrievals. However, since the document has no schema, the database cannot act much on the structure of the document to optimize the storage and retrieval of parts of the aggregate. • Column-family models divide the aggregate into column families, allowing the database to treat them as units of data within the row aggregate. This imposes some structure on the aggregate but allows the database to take advantage of that structure to improve its accessibility.
  • 60. Key Points • An aggregate is a collection of data that we interact with as a unit. Aggregates form the boundaries for ACID operations with the database. • Key-value, document, and column-family databases can all be seen as forms of aggregate-oriented database. • Aggregates make it easier for the database to manage data storage over clusters. • Aggregate-oriented databases work best when most data interaction is done with the same aggregate; aggregate-ignorant databases are better when interactions use data organized in many different formations.
  • 61. More Details on Data Models Relationships • Aggregates are useful because they put together data that is commonly accessed together. But there are still lots of cases where data that’s related is accessed differently. • Consider the relationship between a customer and all of his orders. Some applications will want to access the order history whenever they access the customer; this fits in well with combining the customer with his order history into a single aggregate. • Other applications, however, want to process orders individually and thus model orders as independent aggregates.
  • 62. • In this case, you’ll want separate order and customer aggregates but with some kind of relationship between them so that any work on an order can look up customer data. The simplest way to provide such a link is to embed the ID of the customer within the order’s aggregate data. • That way, if you need data from the customer record, you read the order, search out the customer ID, and make another call to the database to read the customer data. This will work, and will be just fine in many scenarios—but the database will be ignorant of the relationship in the data. This can be important because there are times when it’s useful for the database to know about these links. • As a result, many databases—even key-value stores— provide ways to make these relationships visible to the database. Document stores make the content of the aggregate available to the database to form indexes and queries.
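A minimal sketch of that pattern, assuming MongoDB via pymongo and the separate customer and order aggregates shown later in this chapter (the collection names are illustrative); the relationship lives only in the application code:

from pymongo import MongoClient

db = MongoClient()["shop"]   # assumes a local MongoDB instance

# Read the order aggregate, pull out the embedded customer ID,
# then make a second call to fetch the customer aggregate.
order = db.orders.find_one({"orderId": 99})
customer = db.customers.find_one({"customerId": order["customerId"]})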
  • 63. • An important aspect of relationships between aggregates is how they handle updates. Aggregate-oriented databases treat the aggregate as the unit of data retrieval. Consequently, atomicity is only supported within the contents of a single aggregate. • If you update multiple aggregates at once, you have to deal yourself with a failure partway through. • Relational databases help you with this by allowing you to modify multiple records in a single transaction, providing ACID guarantees while altering many rows. • All of this means that aggregate-oriented databases become more awkward as you need to operate across multiple aggregates.
  • 64. • This may imply that if you have data based on lots of relationships, you should prefer a relational database over a NoSQL store. • While that’s true for aggregate-oriented databases, it’s worth remembering that relational databases aren’t all that stellar with complex relationships either. • This makes it a good moment to introduce another category of databases that’s often lumped into the NoSQL pile.
  • 65. Graph Databases • Graph databases are an odd fish in the NoSQL pond. • Most NoSQL databases were inspired by the need to run on clusters, which led to aggregate-oriented data models of large records with simple connections. • Graph databases are motivated by a different frustration with relational databases and thus have an opposite model—small records with complex interconnections, something like the structure shown in the next figure.
  • 66. Fig: An example graph structure In this context, a graph isn’t a bar chart or histogram; instead, we refer to a graph data structure of nodes connected by edges.
  • 67. • In the figure we have a web of information whose nodes are very small (nothing more than a name) but there is a rich structure of interconnections between them. With this structure, we can ask questions such as “find the books in the Databases category that are written by someone whom a friend of mine likes.” • Graph databases are ideal for capturing any data consisting of complex relationships, such as social networks, product preferences, or eligibility rules. • The fundamental data model of a graph database is very simple: nodes connected by edges (also called arcs).
  • 68. Difference between Graph & Relational databases • Although relational databases can implement relationships using foreign keys, the joins required to navigate around can get quite expensive—which means performance is often poor for highly connected data models. • Graph databases make traversal along the relationships very cheap. A large part of this is because graph databases shift most of the work of navigating relationships from query time to insert time. This naturally pays off for situations where querying performance is more important than insert speed.
  • 69. • The emphasis on relationships makes graph databases very different from aggregate-oriented databases. • Graph databases are more likely to run on a single server rather than distributed across clusters. • ACID transactions need to cover multiple nodes and edges to maintain consistency. • The only thing graph databases have in common with aggregate-oriented databases is their rejection of the relational model.
  • 70. Schemaless Databases • A common theme across all the forms of NoSQL databases is that they are schemaless. • When you want to store data in a relational database, you first have to define a schema—a defined structure for the database which says what tables exist, which columns exist, and what data types each column can hold. • In a relational database, you have to have the schema defined before you can store any data.
  • 71. With NoSQL databases, way of storing data • A key-value store allows you to store any data you like under a key. • A document database effectively does the same thing, since it makes no restrictions on the structure of the documents you store. • Column-family databases allow you to store any data under any column you like. • Graph databases allow you to freely add new edges and freely add properties to nodes and edges as you wish.
  • 72. With a schema: • You have to figure out in advance what you need to store, but that can be hard to do. Without a schema: • You can easily store whatever you need. • This allows you to easily change your data storage as you learn more about your project. • You can easily add new things as you discover them. • If you find you don’t need some things anymore, you can just stop storing them, without worrying about losing old data as you would if you delete columns in a relational schema.
  • 73. • A schema puts all rows of a table into a straitjacket, which becomes awkward if you have different kinds of data in different rows. You either end up with lots of columns that are usually null (a sparse table), or you end up with meaningless columns like custom column 4. • A schemaless store also makes it easier to deal with nonuniform data: data where each record has a different set of fields. It allows each record to contain just what it needs—no more, no less.
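A small sketch of how nonuniform records look in a schemaless store, assuming MongoDB via pymongo (the collection and field names are illustrative):

from pymongo import MongoClient

db = MongoClient()["shop"]   # assumes a local MongoDB instance

# Records in the same collection can carry different fields --
# no ALTER TABLE, no sparse columns, no 'custom column 4'.
db.customers.insert_one({"name": "Martin", "billingAddress": [{"city": "Chicago"}]})
db.customers.insert_one({"name": "Ann", "loyaltyTier": "gold", "preferredLanguage": "de"})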
  • 74. Problems with schemalessness: • If you are storing some data and displaying it in a report as a simple list of fieldName: value lines, then a schema is only going to get in the way. • But usually we do more with our data than this, and we do it with programs that need to know that the billing address is called billingAddress and not addressForBilling, and that the quantity field is going to be the integer 5 and not the string "five".
  • 75. Fact is that whenever we write a program that accesses data, that program almost always relies on some form of implicit schema. Unless it just says something like //pseudo code foreach (Record r in records) { foreach (Field f in r.fields) { print (f.name, f.value) } } Here it will assume that certain field names are present and carry data with a certain meaning, and assume something about the type of data stored within that field.
  • 76. • Programs are not humans; they cannot read “qty” and conclude that it must be the same as “quantity”. So, however schemaless our database is, there is usually an implicit schema present. Having the implicit schema in the application code results in some problems. • In order to understand what data is present you have to dig into the application code. • The database remains ignorant of the schema—it can't use the schema to help it decide how to store and retrieve data efficiently. It can't apply its own validations upon that data to ensure that different applications don't manipulate data in an inconsistent way. These are the reasons why relational databases have a fixed schema. • A schemaless database shifts the schema into the application code that accesses it. This becomes problematic if multiple applications, developed by different people, access the same database.
  • 77. These problems can be reduced with a couple of approaches: • Encapsulate all database interaction within a single application and integrate it with other applications using web services. • Another approach is to clearly define different areas of an aggregate for access by different applications. These could be different sections in a document database or different column families in column-family database. Relational schemas can also be changed at any time with standard SQL commands. If necessary, you can create new columns in an ad-hoc way to store nonuniform data. We have only rarely seen this done.
  • 78. Materialized Views • When we talked about aggregate-oriented data models, we stressed their advantages. If you want to access orders, it’s useful to have all the data for an order contained in a single aggregate that can be stored and accessed as a unit. • But aggregate-orientation has a corresponding disadvantage: What happens if a product manager wants to know how much a particular item has sold over the last couple of weeks? • Now the aggregate-orientation works against you, forcing you to potentially read every order in the database to answer the question. You can reduce this burden by building an index on the product, but you’re still working against the aggregate structure.
  • 79. • Relational databases support accessing data in different ways. Furthermore, they provide a convenient mechanism that allows you to look at data differently from the way it’s stored—views. View: • A view is like a relational table (it is a relation) but it’s defined by computation over the base tables. When you access a view, the database computes the data in the view—a handy form of encapsulation. • Views provide a mechanism to hide from the client whether data is derived data or base data. • But some views are expensive to compute.
  • 80. Materialized Views: • To cope with this, materialized views were invented, which are views that are computed in advance and cached on disk. Materialized views are effective for data that is read heavily but can stand being somewhat stale. • Although NoSQL databases don’t have views, they may have precomputed and cached queries, and they reuse the term “materialized view” to describe them. Often, NoSQL databases create materialized views using a map-reduce computation.
  • 81. There are two strategies to building a materialized view • The first is the eager approach where you update the materialized view at the same time you update the base data for it. In this case, adding an order would also update the purchase history aggregates for each product. • This approach is good when you have more frequent reads of the materialized view than you have writes and you want the materialized views to be as fresh as possible. The application database approach is valuable here as it makes it easier to ensure that any updates to base data also update materialized views. • If you don’t want to pay that overhead on each update, you can run batch jobs to update the materialized views at regular intervals as per requirements.
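A minimal sketch of the eager approach, assuming MongoDB via pymongo; the product_sales collection acts as the precomputed view and is updated in the same code path that writes the base order data (all names are illustrative):

from pymongo import MongoClient

db = MongoClient()["shop"]   # assumes a local MongoDB instance

def place_order(order):
    db.orders.insert_one(order)               # base data
    # Eager update of the materialized view: per-product sales totals.
    for item in order["orderItems"]:
        db.product_sales.update_one(
            {"productId": item["productId"]},
            {"$inc": {"unitsSold": 1, "revenue": item["price"]}},
            upsert=True)

The batch alternative would drop this per-write work and instead rebuild product_sales from the orders collection at regular intervals.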
  • 82. • You can build materialized views outside of the database by reading the data, computing the view, and saving it back to the database. • More often databases will support building materialized views themselves. • In this case, you provide the computation that needs to be done, and the database executes the computation when needed according to some parameters that you configure. This is particularly handy for eager updates of views with incremental map-reduce.
  • 83. Modeling for Data Access As mentioned earlier, when modeling data aggregates we need to consider how the data is going to be read as well as the side effects on data related to those aggregates. 1. Let's start with the model where all the data for the customer is embedded, using a key-value store. Fig: Embed all the objects for customer and their orders.
  • 84. • In this scenario, the application can read the customer’s information and all the related data by using the key. • If the requirements are to read the orders or the products sold in each order, the whole object has to be read and then parsed on the client side to build the results. • When references are needed, we could switch to document stores and then query inside the documents, or even change the data for the key-value store to split the value object into Customer and Order objects and then maintain these objects’ references to each other.
  • 85. With the references (see Figure), we can now find the orders independently from the Customer, and with the orderId reference in the Customer we can find all Orders for the Customer. # Customer object { "customerId": 1, "customer": { "name": "Martin", "billingAddress": [{"city": "Chicago"}], "payment": [{"type": "debit","ccinfo": "1000-1000-1000-1000"}], "orders":[{"orderId":99}] } } # Order object { "customerId": 1, "orderId": 99, "order":{ "orderDate":"Nov-20-2011", "orderItems":[{"productId":27, "price": 32.45}], "orderPayment":[{"ccinfo":"1000-1000-1000-1000", "txnId":"abelif879rft"}], "shippingAddress":{"city":"Chicago"} } }
  • 86. Fig: Customer is stored separately from Order
  • 87. 2. In document stores, since we can query inside documents, removing references to Orders from the Customer object is possible. This change allows us to not update the Customer object when new orders are placed by the Customer. # Customer object { "customerId": 1, "name": "Martin", "billingAddress": [{"city": "Chicago"}], "payment": [ {"type": "debit", "ccinfo": "1000-1000-1000-1000"} ] } #Order object { "orderId": 99, "customerId": 1, "orderDate":"Nov-20-2011", "orderItems":[{"productId":27, "price": 32.45}], "orderPayment":[{"ccinfo":"1000-1000-1000-1000", "txnId":"abelif879rft"}], "shippingAddress":{"city":"Chicago"} }
  • 88. • Since document data stores allow you to query by attributes inside the document, searches such as “find all orders that include the Refactoring Databases product” are possible, but the decision to create an aggregate of items and orders they belong to is not based on the database’s query capability but on the read optimization desired by the application.
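A minimal sketch of such a query, assuming MongoDB via pymongo and the Order documents shown above; since those order items carry only a productId, the sketch assumes 27 stands for the “Refactoring Databases” product:

from pymongo import MongoClient

db = MongoClient()["shop"]   # assumes a local MongoDB instance

# Find all orders whose nested orderItems include the given product.
for order in db.orders.find({"orderItems.productId": 27}):
    print(order["orderId"], order["customerId"])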
  • 89. 3. When using column families to model the data, it is important to remember to do it as per your query requirements and not for the purpose of writing; the general rule is to make it easy to query and to denormalize the data during write. • There are multiple ways to model the data; one way is to store the Customer and Order in different column families (see Figure). Here, it is important to note that the references to all the orders placed by the customer are in the Customer column family.
  • 90. Fig: Conceptual view into a column data store 4. When using graph databases to model the same data, we model all objects as nodes and the relations between them as relationships; these relationships have types and directional significance.
  • 91. • Each node has independent relationships with other nodes. These relationships have names like PURCHASED, PAID_WITH, or BELONGS_TO (see Figure); these relationship names let you traverse the graph. • Let’s say you want to find all the Customers who PURCHASED a product with the name Refactoring Database. All we need to do is query for the product node Refactoring Databases and look for all the Customers with the incoming PURCHASED relationship.
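A minimal sketch of that traversal, assuming Neo4j accessed through the official Python driver; the connection details, node labels, and property names are illustrative assumptions layered on the relationship names from the figure:

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))  # assumed local instance

query = """
MATCH (c:Customer)-[:PURCHASED]->(p:Product {name: $title})
RETURN c.name AS customer
"""

with driver.session() as session:
    # Start from the product node and walk the incoming PURCHASED relationships.
    for record in session.run(query, title="Refactoring Databases"):
        print(record["customer"])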
  • 92. Fig: Graph model of e-commerce data
  • 93. Key Points • Aggregate-oriented databases make inter-aggregate relationships more difficult to handle than intra- aggregate relationships. • Graph databases organize data into node and edge graphs; they work best for data that has complex relationship structures. • Schemaless databases allow you to freely add fields to records, but there is usually an implicit schema expected by users of the data. • Aggregate-oriented databases often compute materialized views to provide data organized differently from their primary aggregates. This is often done with map-reduce computations.