Massively Parallel Cloud Data Storage Systems & NoSQL
Why Cloud Data Stores
• Explosion of social media sites (Facebook, Twitter) with large data needs
• Explosion of storage needs at large web sites such as Google and Yahoo
  - Much of the data is not files
• Rise of cloud-based solutions such as Amazon S3 (Simple Storage Service)
• Shift to dynamically typed data with frequent schema changes
Parallel Databases and Data Stores
• Web-based applications place huge demands on data storage volume and transaction rate
• Scaling application servers is easy, but what about the database?
• Approach 1: caching mechanisms to reduce database access
  - Limited in scalability
• Approach 2: use existing parallel databases
  - Expensive, and most parallel databases were designed for decision support, not OLTP
• Approach 3: build parallel stores with databases underneath
Scaling RDBMS - Partitioning
• "Sharding"
  - Divide data among many cheap databases (MySQL/PostgreSQL)
  - Manage parallel access in the application
  - Scales well for both reads and writes
  - Not transparent: the application needs to be partition-aware
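
What "partition-aware" means for the application can be sketched roughly as follows. This is a minimal illustration, not a method from the slides: the shard names, the choice of a SHA-256 hash, and the helper names `shard_for` and `group_by_shard` are all assumptions made for the example.

```python
import hashlib

# Hypothetical shard identifiers; a real deployment would hold connection
# settings for separate MySQL/PostgreSQL instances here.
SHARDS = ["db0", "db1", "db2", "db3"]


def shard_for(key: str, shards=SHARDS) -> str:
    """Map a key to a shard using a stable hash.

    Python's built-in hash() is salted per process, so a cryptographic
    digest is used to get the same answer on every application server.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


def group_by_shard(keys, shards=SHARDS):
    """Group keys by shard so the application can query each shard once."""
    groups = {}
    for key in keys:
        groups.setdefault(shard_for(key, shards), []).append(key)
    return groups
```

Every read and write has to go through a routing step like `shard_for`, which is why sharding is not transparent. Simple modulo placement also means that changing the number of shards moves most keys, one reason this bookkeeping is often pushed into a dedicated data store instead.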
Parallel Key-Value Data Stores
• Distributed key-value data storage systems allow key-value pairs to be stored (and retrieved by key) in a massively parallel system
  - E.g. Google BigTable, Yahoo! Sherpa/PNUTS, Amazon Dynamo, ...
  - Partitioning, high availability, etc. are completely transparent to the application
• Sharding systems and key-value stores don't support many relational features
  - No join operations (except within a partition)
  - No referential integrity constraints across partitions
  - etc.
What is NoSQL?
• Stands for "No SQL" or "Not Only SQL"?
• Class of non-relational data storage systems
  - E.g. BigTable, Dynamo, PNUTS/Sherpa, ...
• Usually do not require a fixed table schema, nor do they use the concept of joins
• Distributed data storage systems
• All NoSQL offerings relax one or more of the ACID properties (see the CAP Theorem slide)
Typical NoSQL API
• Basic API access:
  - get(key) -- Extract the value given a key
  - put(key, value) -- Create or update the value given its key
  - delete(key) -- Remove the key and its associated value
  - execute(key, operation, parameters) -- Invoke an operation on the value (given its key), which is a special data structure (e.g. List, Set, Map, etc.)
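
The four operations above can be illustrated with a small in-memory stand-in. This is only a sketch: the class name `KeyValueStore` is hypothetical, and `execute` is interpreted here as "call a named method on the stored Python object", which is one plausible reading of the slide rather than any particular product's API.

```python
class KeyValueStore:
    """Toy single-node store exposing get / put / delete / execute."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        # Return the value for key, or None if the key is absent.
        return self._data.get(key)

    def put(self, key, value):
        # Create the entry or overwrite the existing value.
        self._data[key] = value

    def delete(self, key):
        # Remove the key and its value; deleting a missing key is a no-op.
        self._data.pop(key, None)

    def execute(self, key, operation, *parameters):
        # Invoke a named operation on a structured value (list, set, dict).
        value = self._data[key]
        return getattr(value, operation)(*parameters)


store = KeyValueStore()
store.put("cart:1", [])
store.execute("cart:1", "append", "roller skates")
```

After these calls `store.get("cart:1")` holds a one-element list. The point of `execute` is that the update happens next to the data, so the client does not need a separate get, modify, and put round trip.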
Flexible Data Model
ColumnFamily: Rockets

Key  Name          Value
1    name          Rocket-Powered Roller Skates
     toon          Ready, Set, Zoom
     inventoryQty  5
     brakes        false
2    name          Little Giant Do-It-Yourself Rocket-Sled Kit
     toon          Beep Prepared
     inventoryQty  4
     brakes        false
3    name          Acme Jet Propelled Unicycle
     toon          Hot Rod and Reel
     inventoryQty  1
     wheels        1
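
One way to picture the Rockets column family is as a map from row key to a per-row map of column names to values. The sketch below uses plain Python dictionaries as a stand-in; it is not the storage format of any real system, and the helper `columns_used` is a hypothetical name introduced for the example.

```python
# Each row carries its own set of columns: rows 1 and 2 have "brakes",
# row 3 has "wheels" instead. No table-wide schema is declared anywhere.
rockets = {
    "1": {
        "name": "Rocket-Powered Roller Skates",
        "toon": "Ready, Set, Zoom",
        "inventoryQty": 5,
        "brakes": False,
    },
    "2": {
        "name": "Little Giant Do-It-Yourself Rocket-Sled Kit",
        "toon": "Beep Prepared",
        "inventoryQty": 4,
        "brakes": False,
    },
    "3": {
        "name": "Acme Jet Propelled Unicycle",
        "toon": "Hot Rod and Reel",
        "inventoryQty": 1,
        "wheels": 1,
    },
}


def columns_used(column_family):
    """Collect every column name that appears in at least one row."""
    names = set()
    for row in column_family.values():
        names.update(row)
    return sorted(names)
```

Because a missing column is simply absent rather than NULL, code that reads a row should use something like `rockets["3"].get("brakes")` and handle the case where the column does not exist.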
NoSQL Data Storage: Classification
• Uninterpreted key/value, or "the big hash table"
  - Amazon S3 (Dynamo)
• Flexible schema
  - BigTable, Cassandra, HBase (ordered keys, semi-structured data)
  - Sherpa/PNUTS (unordered keys, JSON)
  - MongoDB (based on JSON)
  - CouchDB (name/value in text)
PNUTS Data Storage Architecture
CAP Theorem
• Three properties of a system
  - Consistency (all copies have the same value)
  - Availability (the system can run even if parts have failed)
    - Via replication
  - Partitions (the network can break into two or more parts, each with active systems that can't talk to the other parts)
• Brewer's CAP "Theorem": you can have at most two of these three properties for any system
• Very large systems will partition at some point
  - Choose one of consistency or availability
  - Traditional databases choose consistency
  - Most web applications choose availability
    - Except for specific parts such as order processing
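
The choice between consistency and availability during a partition can be made concrete with a toy two-replica model. Everything here is an assumption for illustration: the class and function names, the "CP"/"AP" mode strings, and the simplification that a write either reaches the one peer or does not.

```python
class Unavailable(Exception):
    """Raised when a replica refuses a request to stay consistent."""


class Replica:
    def __init__(self, mode):
        self.mode = mode          # "CP": prefer consistency, "AP": prefer availability
        self.value = None
        self.peer = None
        self.partitioned = False  # True when the peer cannot be reached

    def write(self, value):
        if self.partitioned:
            if self.mode == "CP":
                # Refuse the write rather than let the two copies diverge.
                raise Unavailable("peer unreachable; write rejected")
            # AP: accept the write locally; the copies now differ.
            self.value = value
            return
        # Normal operation: update both copies.
        self.value = value
        self.peer.value = value


def make_pair(mode):
    """Create two replicas that treat each other as the peer."""
    a, b = Replica(mode), Replica(mode)
    a.peer, b.peer = b, a
    return a, b
```

With a "CP" pair, a write during a partition raises `Unavailable` and both copies keep the old value. With an "AP" pair the write succeeds, but the two replicas disagree until something reconciles them later.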
Availability
• Traditionally thought of as the server/process being available "five 9s" of the time (99.999%)
• However, in a system with a large number of nodes, at almost any point in time there is a good chance that a node is down or that there is a network disruption among the nodes
  - Want a system that is resilient in the face of network disruption
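
The claim about large systems follows from simple arithmetic. The sketch below assumes that nodes fail independently and share the same availability, which is a simplification introduced for the example, not a statement from the slides; the function name is also hypothetical.

```python
def prob_any_node_down(node_availability: float, num_nodes: int) -> float:
    """Probability that at least one of num_nodes nodes is down right now.

    Assumes independent failures: all nodes are up with probability
    node_availability ** num_nodes, so the complement is returned.
    """
    return 1.0 - node_availability ** num_nodes
```

For example, with 1,000 nodes that are each up 99.9% of the time, `prob_any_node_down(0.999, 1000)` is roughly 0.63, so some node is down more often than not even though each individual node looks reliable.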
Eventual Consistency
• When no updates occur for a long period of time, eventually all updates will propagate through the system and all the nodes will be consistent
• For a given accepted update and a given node, eventually either the update reaches the node or the node is removed from service
• Known as BASE (Basically Available, Soft state, Eventual consistency), as opposed to ACID
  - Soft state: copies of a data item may be inconsistent
  - Eventually consistent: copies become consistent at some later time if there are no more updates to that data item
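
A minimal sketch of how copies can converge once updates stop is shown below. It assumes a last-writer-wins rule with caller-supplied timestamps and a pairwise "anti-entropy" exchange; the slides do not prescribe this mechanism, real systems use a range of techniques, and the names `LwwReplica` and `anti_entropy` are hypothetical.

```python
class LwwReplica:
    """Replica that keeps, per key, the value with the newest timestamp."""

    def __init__(self):
        self.store = {}  # key -> (timestamp, value)

    def put(self, key, value, timestamp):
        current = self.store.get(key)
        # Apply the update only if it is newer than what is already held.
        if current is None or timestamp > current[0]:
            self.store[key] = (timestamp, value)

    def get(self, key):
        entry = self.store.get(key)
        return entry[1] if entry else None


def anti_entropy(a, b):
    """Exchange all entries in both directions; newer timestamps win."""
    for key, (ts, value) in list(a.store.items()):
        b.put(key, value, ts)
    for key, (ts, value) in list(b.store.items()):
        a.put(key, value, ts)


r1, r2 = LwwReplica(), LwwReplica()
r1.put("x", "old", timestamp=1)
r2.put("x", "new", timestamp=2)   # soft state: the copies disagree here
anti_entropy(r1, r2)              # after the exchange both hold "new"
```

Between the two `put` calls and the exchange, a reader may see either value depending on which replica it contacts. That window is the "soft state" in BASE.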
Common Advantages of NoSQL Systems
• Cheap, easy to implement (open source)
• Data are replicated to multiple nodes (therefore identical and fault-tolerant) and can be partitioned
  - When data is written, the latest version is on at least one node and is then replicated to other nodes
  - No single point of failure
• Easy to distribute
• Don't require a schema
What does NoSQL Not Provide?
• Joins
• Group by
  - But PNUTS provides an interesting materialized-view approach to joins/aggregation
• ACID transactions
• SQL
• Integration with applications that are based on SQL
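
Without joins or cross-partition referential integrity, the application typically does this work itself. The following is a rough sketch using plain dictionaries as a stand-in for two key-value collections; the field names (`customer_id`, `name`, `item`) and the function name are made up for the example.

```python
def join_orders_with_customers(orders, customers):
    """Application-side join: look up each order's customer by key."""
    joined = []
    for order_id, order in orders.items():
        customer = customers.get(order["customer_id"])
        if customer is None:
            # Dangling reference: no foreign-key constraint prevented it,
            # so the application decides what to do (here: skip the order).
            continue
        joined.append(
            {"order_id": order_id, "item": order["item"], "customer": customer["name"]}
        )
    return joined


customers = {"c1": {"name": "Wile E."}}
orders = {
    "o1": {"customer_id": "c1", "item": "Rocket-Sled Kit"},
    "o2": {"customer_id": "c9", "item": "Unicycle"},  # refers to a missing customer
}
result = join_orders_with_customers(orders, customers)
```

Here `result` contains only order `o1`. In a distributed store each lookup may also be a separate network request, which is part of why joins are usually avoided by denormalizing the data.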
Should I be using NoSQL Databases?
• NoSQL data storage systems make sense for applications that need to deal with very large semi-structured data
  - Log analysis
  - Social networking feeds
• Most of us work on organizational databases, which are not that large and have low update/query rates
  - Regular relational databases are the correct solution for such applications

Editor's Notes

  • #8: -> http://horicky.blogspot.com/2009/11/nosql-patterns.html
  • #16: -> No JDBC -> Data integrity at the application layer