CCS334 BIG DATA ANALYTICS
(R-21 III (I Sem))
Department of Artificial Intelligence and Data Science
Session 2
by
Asst. Prof. M. Gokilavani
NIET
9/19/2023 Department of AI & DS 1
TEXT BOOKS
• Michael Minelli, Michelle Chambers, and Ambiga Dhiraj, "Big Data,
Big Analytics: Emerging Business Intelligence and Analytic Trends for
Today's Businesses", Wiley, 2013.
• Eric Sammer, "Hadoop Operations", O'Reilly, 2012.
• Pramod J. Sadalage, "NoSQL Distilled", 2013.
REFERENCES
• E. Capriolo, D. Wampler, and J. Rutherglen, "Programming Hive",
O'Reilly, 2012.
• Lars George, "HBase: The Definitive Guide", O'Reilly, 2011.
• Eben Hewitt, "Cassandra: The Definitive Guide", O'Reilly, 2010.
Topics covered in Unit 2 session
UNIT II NOSQL DATA MANAGEMENT
Introduction to NoSQL – aggregate data models – key-value and
document data models – relationships – graph databases –
schema-less databases – materialized views – distribution models –
master-slave replication – consistency – Cassandra – Cassandra data
model – Cassandra examples – Cassandra clients.
Summary of Session 1
• Database: an organized collection of data (stored as tables in a relational database).
• DBMS: Database Management System
• RDBMS characteristics
• ACID properties
• Abstraction from the physical layer
• Structured Query Language (SQL)
• NoSQL: why, what, and when?
• What is NoSQL?
• Characteristics of NoSQL databases
• Differences between SQL and NoSQL
CAP Theorem
• CAP stands for:
• Consistency
• Availability
• Partition tolerance
Definition: The CAP theorem states that a distributed database can
guarantee at most two of the three properties (consistency,
availability, and partition tolerance) at the same time. Because
network partitions cannot be ruled out in a distributed system, the
practical choice during a partition is between consistency and
availability.
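The trade-off in the definition can be illustrated with a toy model. The sketch below is not how any real database is implemented; the class names (Replica, TinyCluster) and the "CP"/"AP" mode flag are hypothetical and exist only for this example. It shows two replicas, a simulated network partition, and the two possible reactions to a write during the partition.

```python
class Replica:
    """One copy of the data on one node (hypothetical toy model)."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class TinyCluster:
    """Two replicas with a switchable network partition between them."""

    def __init__(self, mode):
        assert mode in ("CP", "AP")
        self.mode = mode
        self.a = Replica("A")
        self.b = Replica("B")
        self.partitioned = False

    def write(self, key, value):
        """A client writes through replica A."""
        if not self.partitioned:
            # Healthy network: both replicas receive the write.
            self.a.data[key] = value
            self.b.data[key] = value
            return "ok"
        if self.mode == "CP":
            # Consistency first: refuse the write so the replicas cannot diverge.
            return "rejected"
        # Availability first: accept the write locally; replica B becomes stale.
        self.a.data[key] = value
        return "ok"

    def read_from_b(self, key):
        """A client reads through replica B."""
        return self.b.data.get(key)


cp = TinyCluster("CP")
ap = TinyCluster("AP")
for cluster in (cp, ap):
    cluster.write("x", 1)       # replicated to both nodes
    cluster.partitioned = True  # the network link between A and B fails

cp_result = cp.write("x", 2)    # CP: write is refused, replicas still agree
ap_result = ap.write("x", 2)    # AP: write succeeds, replica B still holds 1

print(cp_result, cp.a.data["x"], cp.read_from_b("x"))
print(ap_result, ap.a.data["x"], ap.read_from_b("x"))
```

In the CP cluster the client loses availability (the write fails) but never sees conflicting values; in the AP cluster the write succeeds but a read through replica B returns the old value until the partition heals.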
CAP Theorem related to SQL and NoSQL
NoSQL Database Types
NoSQL databases come in several types, which makes them hard to discuss as a single category:
• Sorted, ordered column stores:
• Are optimized for queries over large datasets and store the values of each column together instead of storing rows.
• Document databases:
• Pair each key with a complex data structure known as a document.
• Key-value stores:
• Are the simplest NoSQL databases. Every item in the database is stored as an
attribute name (or "key") together with its value.
• Graph databases:
• Are used to store information about networks of data, such as social connections.
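To make the four types concrete, the following sketch models the same small dataset in each style using plain Python structures. It is only an illustration of the data shapes, not of any product's storage engine; the user names, cities, and the FOLLOWS relationship are invented for the example.

```python
import json

# Key-value store: the value is an opaque blob; only the key is queryable.
key_value = {"user:1": '{"name": "Asha", "city": "Chennai"}'}

# Document store: the value is a structured document the store can look inside.
documents = {
    "user:1": {"name": "Asha", "city": "Chennai", "follows": ["user:2"]},
    "user:2": {"name": "Ravi", "city": "Madurai"},
}

# Column store: the values of each column are kept together, not each row.
column_store = {
    "name": {"user:1": "Asha", "user:2": "Ravi"},
    "city": {"user:1": "Chennai", "user:2": "Madurai"},
}

# Graph store: nodes plus explicit, typed edges between them.
graph_edges = [("user:1", "FOLLOWS", "user:2")]

# Key-value: the application must decode the blob itself.
city_from_blob = json.loads(key_value["user:1"])["city"]

# Document: query by a field inside the document.
in_madurai = [key for key, doc in documents.items() if doc["city"] == "Madurai"]

# Column: scanning one column touches only that column's values.
cities = list(column_store["city"].values())

# Graph: follow edges to answer a relationship question.
followers_of_user2 = [
    src for src, rel, dst in graph_edges if rel == "FOLLOWS" and dst == "user:2"
]

print(city_from_blob, in_madurai, cities, followers_of_user2)
```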
Document Databases (Document Store)
• Documents
• Are loosely structured sets of key/value pairs, e.g., XML, JSON,
BSON
• Encapsulate and encode data in a standard format or encoding
• Are addressed in the database via a unique key
• Are treated as a whole, rather than being split into their
constituent name/value pairs
• Can be retrieved by key or by content
• Notable examples:
• MongoDB (used at Foursquare, GitHub, and more)
• CouchDB (used at Apple, BBC, Canonical, CERN, and more)
Document Databases, JSON
{
_id: ObjectId("51156a1e056d6f966f268f81"),
type: "Article",
author: "Derick Rethans",
title: "Introduction to Document Databases with MongoDB",
date: ISODate("2013-04-24T16:26:31.911Z"),
body: "This arti…"
},
{
_id: ObjectId("51156a1e056d6f966f268f82"),
type: "Book",
author: "Derick Rethans",
title: "php|architect's Guide to Date and Time Programming with PHP",
isbn: "978-0-9738621-5-7"
}
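The two documents above can be used to sketch what "retrieval by key or by content" means. The example below keeps them in ordinary Python dictionaries rather than in MongoDB; the helper name find and the by_id index are assumptions made for this illustration, and the date and body fields are left out for brevity.

```python
# The two example documents, held in memory as plain dictionaries.
documents = [
    {
        "_id": "51156a1e056d6f966f268f81",
        "type": "Article",
        "author": "Derick Rethans",
        "title": "Introduction to Document Databases with MongoDB",
    },
    {
        "_id": "51156a1e056d6f966f268f82",
        "type": "Book",
        "author": "Derick Rethans",
        "title": "php|architect's Guide to Date and Time Programming with PHP",
        "isbn": "978-0-9738621-5-7",
    },
]

# Retrieval by key: an index from the unique _id to the whole document.
by_id = {doc["_id"]: doc for doc in documents}


def find(collection, **criteria):
    """Retrieval by content: return documents whose fields match all criteria.

    Hypothetical helper; a document that lacks a field simply does not match.
    """
    return [
        doc
        for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


book = by_id["51156a1e056d6f966f268f82"]
by_author = find(documents, author="Derick Rethans")
books = find(documents, type="Book")

print(book["title"])
print(len(by_author), [doc["type"] for doc in by_author])
print(books[0]["isbn"])
```

Note that the two documents have different fields (only the book has an isbn), yet they live in the same collection and are returned whole.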
Key/Value stores
• Store data in a schema-less way
• Store data as maps
• Hash maps or associative arrays
• Provide very efficient average-case lookup time for accessing data by key
• Notable examples:
• Couchbase (Zynga, Vimeo, NAVTEQ, ...)
• Redis (Craigslist, Instagram, Stack Overflow, Flickr, ...)
• Amazon Dynamo (Amazon, Elsevier, IMDb, ...)
• Apache Cassandra (Facebook, Digg, Reddit, Twitter, ...)
• Voldemort (LinkedIn, eBay, …)
• Riak (GitHub, Comcast, Mochi, ...)
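The map-based model described above can be sketched in a few lines. This is a minimal in-memory illustration built on a Python dict (a hash map); the class name KeyValueStore and its put/get/delete methods are assumptions for this example and do not mirror the API of any product listed.

```python
class KeyValueStore:
    """Minimal in-memory key-value store backed by a hash map (dict)."""

    def __init__(self):
        self._items = {}

    def put(self, key, value):
        """Store or overwrite the value for a key."""
        self._items[key] = value

    def get(self, key, default=None):
        """Look up a key; average-case constant time for a hash map."""
        return self._items.get(key, default)

    def delete(self, key):
        """Remove a key; return True if it existed."""
        if key in self._items:
            del self._items[key]
            return True
        return False


store = KeyValueStore()
# Values need no common shape: the store never inspects them.
store.put("session:42", {"user": "asha", "cart": ["book", "pen"]})
store.put("counter:visits", 1017)

session = store.get("session:42")
missing = store.get("session:99", default="not found")
deleted = store.delete("counter:visits")

print(session["cart"], missing, deleted)
```

The store can only answer questions phrased in terms of the key; finding "all sessions whose cart contains a pen" would require scanning every value, which is the main difference from a document database.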
Schema-less Database
What are schema-less databases?
• Schema-less databases are NoSQL databases that do not
require a predefined schema to store data.
• Instead, they allow data to be stored in flexible and dynamic
formats, such as JSON documents, key-value pairs, graphs, or
columns.
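The contrast with a fixed schema can be shown with a short sketch. It uses Python's built-in sqlite3 module as the schema-bound side and a plain list of dictionaries as a stand-in for a schema-less collection; the table name, field names, and sample values are invented for this illustration.

```python
import sqlite3

# Fixed schema: the table defines its columns up front.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (id, name) VALUES (1, 'Asha')")

try:
    # The table has no 'email' column, so this insert is rejected.
    conn.execute(
        "INSERT INTO users (id, name, email) VALUES (2, 'Ravi', 'ravi@example.com')"
    )
    schema_rejected = False
except sqlite3.OperationalError:
    schema_rejected = True

# Schema-less stand-in: each record carries its own set of fields.
users = []
users.append({"id": 1, "name": "Asha"})
users.append({"id": 2, "name": "Ravi", "email": "ravi@example.com"})

# Readers must cope with fields that are present in some records only.
emails = [user["email"] for user in users if "email" in user]

print(schema_rejected, emails)
conn.close()
```

The flexibility moves responsibility from the database to the application: nothing stops a record from missing a field, so the reading code has to check for it.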
SQL QUERIES
Schema-less Database
Materialized View
• A materialized view takes a regular view (a stored query over one or
more relations) and materializes it by computing the results ahead of
time and storing them in a table.
• Materialized view definition: a view is "materialized" by storing
the tuples of the view in the database.
• Index structures can be built on the materialized view.
• A database system uses one of three ways to keep the materialized
view up to date:
• Update the materialized view as soon as a relation on which it is
defined is updated.
• Update the materialized view each time the view is accessed.
• Update the materialized view periodically.
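The three update strategies can be compared with a small sketch. It is a toy model, not a database feature: the class MaterializedView, the policy names ("immediate", "on_access", "periodic"), and the sample orders are all assumptions made for this example. The periodic case is simulated by calling refresh by hand where a scheduler would normally do it.

```python
class MaterializedView:
    """Stores the result of a query over base rows and refreshes it by policy."""

    def __init__(self, base_rows, query, policy):
        assert policy in ("immediate", "on_access", "periodic")
        self.base_rows = base_rows
        self.query = query
        self.policy = policy
        self.stale = False
        self.rows = query(base_rows)  # compute and store the result once

    def refresh(self):
        """Recompute the stored result from the base rows."""
        self.rows = self.query(self.base_rows)
        self.stale = False

    def insert(self, row):
        """Update the base relation; refresh at once only under 'immediate'."""
        self.base_rows.append(row)
        self.stale = True
        if self.policy == "immediate":
            self.refresh()

    def read(self):
        """Return the stored result; refresh first only under 'on_access'."""
        if self.policy == "on_access" and self.stale:
            self.refresh()
        return self.rows


def total_by_city(rows):
    """The view's query: total order amount per city."""
    totals = {}
    for row in rows:
        totals[row["city"]] = totals.get(row["city"], 0) + row["amount"]
    return totals


def make_view(policy):
    orders = [{"city": "Chennai", "amount": 100}]
    return MaterializedView(orders, total_by_city, policy)


immediate = make_view("immediate")
lazy = make_view("on_access")
periodic = make_view("periodic")
for view in (immediate, lazy, periodic):
    view.insert({"city": "Chennai", "amount": 50})

immediate_rows = immediate.rows        # already refreshed by the insert
lazy_before = dict(lazy.rows)          # still the old stored result
lazy_after = lazy.read()               # refreshed because it was accessed
periodic_before = periodic.read()      # stale until the next scheduled refresh
periodic.refresh()                     # stands in for the periodic job
periodic_after = periodic.read()

print(immediate_rows, lazy_before, lazy_after, periodic_before, periodic_after)
```

Immediate refresh makes writes more expensive, refresh on access makes the first read after a write more expensive, and periodic refresh keeps both cheap at the cost of serving stale results between refreshes.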
Summary
Topics to be covered in the next session (Session 3)
• Distribution models
Thank you!!!





