Chapter 4
NoSQL
Dr G Sudha Sadasivam
Mrs R Thirumahal
Agenda
• SQL vs NoSQL
• Limitations and advantages of NoSQL
• Types of NoSQL Stores with example
– KV store
– Column family
– Document
– Graph
• Comparison of NoSQL stores
• Principles of NoSQL models
• CAP
• BASE
• Polyglot persistence in ecommerce application
Introduction
• The term NoSQL was coined by Carlo Strozzi in 1998
• Relational systems
– have ACID properties and are transactional, which can degrade performance
– rely on centralised control
– have a rigid schema, which limits flexibility and scalability
• NoSQL – Not only SQL
• NoSQL systems are schema-less and hence
» have simple and fast data access
» can store voluminous data
» can store unstructured data from multiple sources
• Work with large volumes of distributed data
• Have high operational speed, great flexibility and horizontal scalability
• Follow BASE properties with eventual consistency
• Possess a shared-nothing architecture
• Support auto-sharding and replication
• Support parallelism and distributed querying
NoSQL systems are complementary to SQL systems
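The sharding and shared-nothing points above can be illustrated with a minimal Python sketch. It assumes hash-based sharding, which is only one possible placement scheme; the names shard_for and ShardedStore are hypothetical and do not belong to any particular NoSQL product.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash (hypothetical helper)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Toy shared-nothing store: every shard is an independent dict."""

    def __init__(self, num_shards: int = 4):
        self.shards = [{} for _ in range(num_shards)]

    def put(self, key, value):
        # The key alone decides which shard owns the data.
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key, default=None):
        return self.shards[shard_for(key, len(self.shards))].get(key, default)


store = ShardedStore(num_shards=4)
for i in range(100):
    store.put(f"user:{i}", {"visits": i})
```

Because each key maps to exactly one shard, reads and writes for different keys touch different shards and could run in parallel. A real system would also need rebalancing when shards are added or removed, which this sketch leaves out.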
Limitations
• Poorly suited to transactional applications that have constraints and strict consistency requirements
• Being schemaless means that constraints must be enforced by the application developer
• Multiple data stores make interoperability difficult
• Eventual consistency: changes in data reach all copies only after a time lag
• Vendor lock-in: each NoSQL data store exists as a silo, resulting in high coupling between the data store and the application
• Lack of expertise in the usage of NoSQL stores
• NoSQL databases can suffer from security issues in authentication, authorization and storage security
TYPES OF NoSQL STORES
• Key-value (KV) stores
– Associative arrays (dictionaries)
– Key-value pairs with a unique key for every value
– Good performance, so used for session management and caching
– Data is held in RAM, as in Memcached, or in secondary memory, as in MemcacheDB
• Document stores
– Organise data as a collection of documents with unique keys
– Information can be retrieved based on the contents of the document
– Collections are analogous to tables, and documents to records in a table
– Every document can have different fields
– Suitable for managing content and mobile data
– Examples: MongoDB and CouchDB
• Column family stores
– Data is stored in columns instead of rows
– Columns with different types of data can be aggregated as a column family for querying
– HBase and Bigtable are column family data stores
• Graph data stores
– Entities, such as those in social networks, are connected by relationships represented as graphs
– Example: Neo4j
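The difference between key-based access in a KV store and content-based retrieval in a document store can be sketched with plain Python dictionaries. This is an illustration only: the session blob, the customer documents and the find helper are made up, and no real database API is used.

```python
# Key-value store: an associative array; the value is opaque to the store.
sessions = {}
sessions["session:abc123"] = b"user=42;cart=3"  # hypothetical session blob
cached = sessions.get("session:abc123")

# Document store: a collection of documents with unique keys,
# where every document may have different fields.
customers = {
    "c1": {"name": "Asha", "city": "Coimbatore"},
    "c2": {"name": "Ravi", "city": "Chennai", "phone": "000-0000"},
}


def find(collection, **criteria):
    """Return the keys of documents whose fields match all criteria."""
    return [
        key
        for key, doc in collection.items()
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


matches = find(customers, city="Chennai")
```

The KV lookup needs the exact key, while find inspects the contents of each document. A real document store would normally answer such queries from an index rather than scanning the whole collection.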
EXAMPLE
KV Store:
Each record is stored in a row and read using RecordReader in HDFS.
Each attribute is separated by a comma and extracted using a comma separator.
Column Family Store:
The Customer table has 2 column families, Name and Address, along with orders with timestamps (TS).
The Order table has Price and Item column families.
Document Store:
Two collections, namely Customer and Order.
Customer has 2 documents (rows) while Order has 3 documents.
Graph Store:
Entities are CustID with Name and Address, and OrderID with Price and Items.
Figures: relational tables; logical and physical organization in KV store; column family store; document store (document collection 1 with documents 1 and 2, document collection 2 with documents 1 to 3); graph datastore.
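Since the original figures are not reproduced here, the sketch below shows one way the Customer and Order example could look in each of the four models, using only Python built-ins. All names, cities, prices and items are made up for illustration, and the layouts are assumptions rather than the exact contents of the slides.

```python
# All names, cities, prices and items below are made up for illustration.

# KV store: key -> one comma-separated record, split on read.
kv = {"C1": "Asha,Coimbatore", "C2": "Ravi,Chennai"}
name, address = kv["C1"].split(",")

# Column family store: row key -> column family -> column -> {timestamp: value}.
customer_cf = {
    "C1": {
        "Name": {"first": {1: "Asha"}},
        "Address": {"city": {1: "Salem", 2: "Coimbatore"}},
    },
}


def latest(cell):
    """Return the value with the highest timestamp in a cell."""
    return cell[max(cell)]


current_city = latest(customer_cf["C1"]["Address"]["city"])

# Document store: two collections, with 2 customer and 3 order documents.
customers = [
    {"_id": "C1", "name": "Asha", "address": "Coimbatore", "orders": ["O1", "O2"]},
    {"_id": "C2", "name": "Ravi", "address": "Chennai", "orders": ["O3"]},
]
orders = [
    {"_id": "O1", "price": 250, "items": ["pen", "ink"]},
    {"_id": "O2", "price": 100, "items": ["notebook"]},
    {"_id": "O3", "price": 75, "items": ["eraser"]},
]
price_of = {doc["_id"]: doc["price"] for doc in orders}
spend_c1 = sum(price_of[order_id] for order_id in customers[0]["orders"])

# Graph store: nodes carry properties, edges carry relationships.
nodes = {
    "C1": {"name": "Asha"}, "C2": {"name": "Ravi"},
    "O1": {"price": 250}, "O2": {"price": 100}, "O3": {"price": 75},
}
edges = [("C1", "PLACED", "O1"), ("C1", "PLACED", "O2"), ("C2", "PLACED", "O3")]


def placed_by(cust_id):
    """Follow PLACED edges from a customer node to its order nodes."""
    return [dst for src, rel, dst in edges if src == cust_id and rel == "PLACED"]
```

The same facts appear four times, but the access paths differ: split a string, pick the latest timestamped cell, look up a document, or traverse an edge.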
Comparison
• KV stores are simple and powerful but cannot process a range of keys.
• Ordered KV stores support key ranges but still cannot model the structure of values.
• Column family stores model values as a map-of-maps-of-maps: column families aggregate columns, and columns aggregate timestamped values.
• Document stores can model values not only as aggregates but also as schemas of arbitrary complexity. They also provide indexing based on field names/keys.
• Graph data stores extend ordered KV systems by linking keys as a graph rather than as a hierarchical model.
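The first two points of the comparison can be made concrete with a small sketch of an ordered KV store. It assumes keys are kept in a sorted list and located with Python's bisect module; the OrderedKV class and the key naming scheme are hypothetical.

```python
import bisect


class OrderedKV:
    """Toy ordered KV store: keys are kept sorted so ranges can be scanned."""

    def __init__(self):
        self._keys = []   # sorted list of keys
        self._data = {}   # key -> opaque value

    def put(self, key, value):
        if key not in self._data:
            bisect.insort(self._keys, key)
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def scan(self, start, end):
        """Yield (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        for key in self._keys[lo:hi]:
            yield key, self._data[key]


store = OrderedKV()
for key in ["order:003", "order:001", "user:007", "order:002"]:
    store.put(key, key.upper())

# ";" sorts immediately after ":" in ASCII, so this range covers every "order:" key.
order_rows = list(store.scan("order:", "order;"))
```

An unordered KV store would have to examine every key to answer the same range query. In both cases the value remains an opaque blob, which is the gap that column family and document stores address.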
CAP
• Eric Brewer proposed the Consistency, Availability, Partition tolerance (CAP) theorem in 2000
• Consistency is the ability to obtain the same data from multiple replicas. It ensures that all cluster nodes see the same data.
• Availability is the ability of a system to continue operating even when some hardware/software components fail.
• Partition tolerance is the ability of the system to continue operating when network failures partition the network. It guarantees independence of the various data partitions.
Replication facilitates the availability of data. Eventual consistency ensures that replicas do not stay stale indefinitely. Partitioning ensures load distribution and scalability.
• Only 2 of the 3 can be satisfied at a time
– AP: follows BASE properties with eventual consistency, e.g. Amazon's DynamoDB without strict consistency
– CP: ACID properties with strict consistency. Pessimistic locking ensures consistency, e.g. MongoDB and Memcached
– CA: a CA system cannot operate under network partitions and hence is neither ACID nor BASE. The 2-phase commit protocol is used, e.g. relational databases and Bigtable
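The AP and CP choices can be illustrated with a toy two-replica model. This is a sketch under simplifying assumptions (two replicas, writes entering through replica A, a manually toggled partition flag) and does not describe how MongoDB, DynamoDB or any other product actually behaves; TwoNodeStore is a hypothetical name.

```python
class TwoNodeStore:
    """Toy two-replica store that behaves as 'CP' or 'AP' during a partition."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = {}                 # replica A
        self.b = {}                 # replica B
        self.partitioned = False    # True while A and B cannot communicate

    def write(self, key, value):
        """Write through replica A; replicate to B if the network allows."""
        if self.partitioned:
            if self.mode == "CP":
                # Give up availability to keep both replicas identical.
                raise RuntimeError("unavailable: cannot reach all replicas")
            self.a[key] = value     # AP: accept the write, B becomes stale
            return
        self.a[key] = value
        self.b[key] = value

    def read_from_b(self, key):
        return self.b.get(key)

    def heal(self):
        """End the partition and bring B up to date with A."""
        self.partitioned = False
        self.b.update(self.a)


ap = TwoNodeStore("AP")
ap.write("x", 1)
ap.partitioned = True
ap.write("x", 2)                    # accepted despite the partition
stale = ap.read_from_b("x")         # B still holds the old value
ap.heal()
converged = ap.read_from_b("x")

cp = TwoNodeStore("CP")
cp.write("x", 1)
cp.partitioned = True
try:
    cp.write("x", 2)
    cp_rejected = False
except RuntimeError:
    cp_rejected = True
```

In AP mode the system stays writable but a reader can see stale data until the partition heals. In CP mode the write is refused, so both replicas keep agreeing at the cost of availability.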
BASE
• Used by Web 2.0 applications
• Basically Available, Soft state and Eventually consistent
• Basically available: the system works basically all the time
• Soft state: because consistency is eventual, the state of the system can change over time while replicas converge
ACID | BASE
Atomicity, Consistency, Isolation, Durability | Basically Available, Soft state, Eventually consistent
Strong consistency | Weak consistency
Consistency and Isolation first | Availability first
Nested transactions | Approximate answers
Conservative | Simple
Schema | Schema-less
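Soft state and eventual consistency can be sketched as a primary that applies writes immediately and a replica that applies them later from a queue. The class below is a hypothetical illustration with a single replica and an explicit replicate step standing in for the time lag.

```python
from collections import deque


class LazyReplicatedStore:
    """Primary applies writes at once; the replica applies them later."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self._pending = deque()     # writes not yet applied to the replica

    def write(self, key, value):
        self.primary[key] = value
        self._pending.append((key, value))

    def replicate(self, max_ops=None):
        """Apply up to max_ops queued writes to the replica (all if None)."""
        applied = 0
        while self._pending and (max_ops is None or applied < max_ops):
            key, value = self._pending.popleft()
            self.replica[key] = value
            applied += 1
        return applied

    def is_consistent(self):
        return not self._pending and self.primary == self.replica


store = LazyReplicatedStore()
store.write("stock", 10)
store.write("stock", 9)
stale_read = store.replica.get("stock")   # replica has not caught up yet
store.replicate()
fresh_read = store.replica.get("stock")
```

Between write and replicate the replica's state is "soft": a reader may get an old or missing value, which is the approximate answer that BASE accepts in exchange for availability.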
Case Study
• Polyglot persistence applies multiple data storage technologies to meet the needs of an application.
• Consider an e-commerce application with shopping cart, inventory, orders, catalogue and customer details.
1. User sessions and activity logs require efficient read/write operations: KV stores
2. Point of sale has a high ingestion rate with a high volume of write operations: KV stores (storage); column family stores (analytics)
3. The shopping cart requires high availability and aggregates information: document store
4. The product catalogue has frequent reads and infrequent writes, and must also support aggregation: document stores
5. Product recommendations are made based on similar products or users: graph store
6. Financial data is relational and requires transactional updates: RDBMS
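The mapping in this case study can be written down as a small lookup table. The sketch below only records which kind of store each workload is assigned to; the workload names, the Store enum and the stores_for helper are hypothetical, and no real database client is involved.

```python
from enum import Enum


class Store(Enum):
    KEY_VALUE = "key-value store"
    COLUMN_FAMILY = "column family store"
    DOCUMENT = "document store"
    GRAPH = "graph store"
    RDBMS = "relational database"


# Workload -> store types, following the six points of the case study.
STORE_FOR = {
    "user_sessions": [Store.KEY_VALUE],
    "activity_logs": [Store.KEY_VALUE],
    "point_of_sale": [Store.KEY_VALUE, Store.COLUMN_FAMILY],  # storage, analytics
    "shopping_cart": [Store.DOCUMENT],
    "product_catalogue": [Store.DOCUMENT],
    "recommendations": [Store.GRAPH],
    "financial_data": [Store.RDBMS],
}


def stores_for(workload):
    """Return the store types chosen for a workload."""
    try:
        return STORE_FOR[workload]
    except KeyError:
        raise ValueError(f"no store chosen for workload {workload!r}") from None


pos_stores = [store.value for store in stores_for("point_of_sale")]
```

In an application, each entry would typically sit behind its own data-access layer so that the choice of store for one workload can change without affecting the others.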
Use a data parallel approach to process
Exercises to be completed
• Consider the case study of the AAA coffee shop in test 1. Identify the type of NoSQL stores that can be used for each, and justify the choice.
• Consider a table with student details (Roll No, First Name, Last Name, Department, Programme, Year, Semester) and faculty details (FacultyId, First Name, Last Name, Department, Course Handled 1, Course Handled 2, Course Handled 3). Design key-value, column family, document and graph databases for the same.
• Exercises in MongoDB. Create a database in MongoDB for storing patient and doctor details. Insert patient details and doctor details. Establish a connection between doctor and patient. Modify the doctor details for a patient. Add 2 or more doctors for a patient named XXX. Identify the count of patients under a doctor. If the patient count > 4, allot a new doctor to the patient. Allot a doctor to a patient based on specialisation. If the number of patients under a doctor becomes 0, generate an alert message. If a doctor leaves the hospital, delete the doctor from the database and allot a new doctor, based on speciality, to his/her patients.
Neo4j
• Create a Neo4j database with 5 people, giving their attributes and friendship relations. Create new persons with attributes. Create relationships and modify relationships. Identify how many friends a person has. Identify friend-of-friend relationships.
Conclusion
• SQL vs NoSQL
• Limitations and advantages of NoSQL
• Types of NoSQL Stores with example
– KV store
– Column family
– Document
– Graph
• Comparison of NoSQL stores
• CAP
• BASE
• Polyglot persistence in ecommerce application
• Exercises in MongoDB & Neo4j
