CCS334 BIG DATA ANALYTICS
(R-21 III (I Sem))
Department of Artificial Intelligence and Data Science
Session 3
by
Asst.Prof.M.Gokilavani
NIET
9/19/2023 Department of AI & DS 1
TEXT BOOKS
• Michael Minelli, Michelle Chambers, and Ambiga Dhiraj, "Big Data,
Big Analytics: Emerging Business Intelligence and Analytic Trends for
Today's Businesses", Wiley, 2013.
• Eric Sammer, "Hadoop Operations", O'Reilly, 2012.
• Pramod J. Sadalage, "NoSQL Distilled", 2013.
REFERENCES
• E. Capriolo, D. Wampler, and J. Rutherglen, "Programming Hive",
O'Reilly, 2012.
• Lars George, "HBase: The Definitive Guide", O'Reilly, 2011.
• Eben Hewitt, "Cassandra: The Definitive Guide", O'Reilly, 2010.
Topics covered in Unit 2 session
UNIT II NOSQL DATA MANAGEMENT
Introduction to NoSQL – aggregate data models – key-value and
document data models – relationships – graph databases – schemaless
databases – materialized views – distribution models – master-slave
replication – consistency – Cassandra – Cassandra data model –
Cassandra examples – Cassandra clients.
Distribution Models
• Depending on the distribution model, the data store can give us the ability:
• To handle larger quantities of data
• To process greater read or write traffic
• To stay available in the face of network slowdowns or
breakages.
• There are two paths for distribution:
• Sharding: Sharding distributes different data across multiple
servers, so each server acts as the single source for a subset of
data.
• Replication: Replication copies data across multiple servers,
so each bit of data can be found in multiple places.
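The difference between the two paths can be sketched in a few lines of Python. This is only an illustration: the server names and the round-robin assignment are assumptions, not something the slides prescribe.

```python
# Hypothetical three-server cluster; the names are made up for illustration.
SERVERS = ["server-a", "server-b", "server-c"]


def shard_placement(keys):
    """Sharding: each key lives on exactly one server."""
    placement = {server: [] for server in SERVERS}
    for index, key in enumerate(sorted(keys)):
        # Round-robin assignment, chosen here only for simplicity.
        placement[SERVERS[index % len(SERVERS)]].append(key)
    return placement


def replica_placement(keys):
    """Replication: every server holds a copy of every key."""
    return {server: sorted(keys) for server in SERVERS}


keys = ["alice", "bob", "carol", "dave"]
print(shard_placement(keys))    # each key appears once in total
print(replica_placement(keys))  # each key appears on every server
```

With sharding the cluster stores four keys in total; with replication it stores four keys per server.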
Distribution Models
• Single server
• Sharding
• Master-Slave Replication
• Peer-to-Peer Replication
• Combining Sharding & Replication
Single Server
• It is the first and simplest distribution option.
• Although many NoSQL databases are designed to run on a cluster, they can
also be used in a single-server application.
• This makes sense if the NoSQL database's data model is better suited to
the application.
• Graph databases are the most obvious case.
• If data usage is mostly about processing aggregates, then a key-value or
document store may be useful.
Sharding
• Often, a data store is busy because different people are accessing
different parts of the dataset.
• In these cases we can support horizontal scalability by putting different
parts of the data onto different servers (sharding).
• Sharding is not a new concept: it has long been done as part of application logic.
• For example, all customers with surnames starting A-D go on one shard and
E-G on another.
• This complicates the programming model, because the application code
needs to distribute the load across the shards.
• In the ideal setting each user talks to one server and the load is
balanced.
• Of course, the ideal case is rare.
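Application-level sharding by surname might look like the following sketch. The A-D and E-G ranges come from the slide; the third range and the shard names are assumptions added so that the example covers the whole alphabet.

```python
# Hypothetical shard map: A-D and E-G follow the slide,
# the H-Z range and the shard names are assumptions.
SHARD_RANGES = [
    ("A", "D", "shard-1"),
    ("E", "G", "shard-2"),
    ("H", "Z", "shard-3"),
]


def shard_for(surname):
    """Return the shard responsible for a customer's surname."""
    first = surname.strip()[:1].upper()
    for low, high, shard in SHARD_RANGES:
        if low <= first <= high:
            return shard
    raise ValueError(f"no shard covers surname {surname!r}")


for name in ["Baker", "Fowler", "Sadalage"]:
    print(name, "->", shard_for(name))
```

Every query has to pass through a function like `shard_for`, which is the extra burden on application code that the slide mentions. Fixed ranges also balance load only if surnames happen to be spread evenly across them.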
Sharding and NoSQL
• Many NoSQL databases offer auto-sharding.
• This can make sharding much easier to use in an application.
• Sharding is especially valuable because it can improve both
read and write performance.
• It scales reads and writes across the different nodes of the same cluster.
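One way to picture auto-sharding is a store that picks the node from a hash of the key, so the application never chooses a shard itself. The sketch below is a toy under that assumption; the node names and the `ShardedStore` class are hypothetical, and real databases use their own partitioning schemes.

```python
import hashlib

NODES = ["node-0", "node-1", "node-2"]  # hypothetical cluster nodes


def node_for(key):
    """Pick a node from a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]


class ShardedStore:
    """Toy store that routes every read and write to the key's node."""

    def __init__(self):
        self.data = {node: {} for node in NODES}

    def put(self, key, value):
        self.data[node_for(key)][key] = value

    def get(self, key):
        return self.data[node_for(key)].get(key)


store = ShardedStore()
for customer_id in range(6):
    store.put(f"customer:{customer_id}", {"id": customer_id})
print({node: len(items) for node, items in store.data.items()})
print(store.get("customer:3"))
```

Because both reads and writes for a key land on a single node, adding nodes spreads both kinds of traffic.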
Master-Slave Replication
• In this setting one node is designated as the master, or primary, and the
others as slaves.
• The master is the authoritative source for the data and is responsible for
processing updates and sending them to the slaves.
• The slaves are used for read operations.
• This allows us to scale when the dataset is read-intensive.
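The division of labour between master and slaves can be sketched as follows. The `MasterSlaveStore` class is hypothetical, and it copies each update to the slaves immediately, which is a simplification.

```python
import itertools


class MasterSlaveStore:
    """Toy master-slave store: one master dict, several slave dicts."""

    def __init__(self, slave_count=2):
        self.master = {}
        self.slaves = [{} for _ in range(slave_count)]
        # Round-robin over the slaves so reads are spread across them.
        self._slave_cycle = itertools.cycle(self.slaves)

    def write(self, key, value):
        # All updates go through the master first.
        self.master[key] = value
        # Simplification: propagate to every slave immediately.
        for slave in self.slaves:
            slave[key] = value

    def read(self, key):
        # Reads are served by the slaves, never by the master.
        return next(self._slave_cycle).get(key)


store = MasterSlaveStore(slave_count=3)
store.write("user:42", "Asha")
print([store.read("user:42") for _ in range(3)])
```

Adding slaves increases read capacity, while every write still passes through the single master.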
Master-Slave Replication
• We can scale reads horizontally by adding more slaves.
• But we are limited by the master's ability to process incoming
data.
Advantage:
• Read resilience: even if the master fails, the slaves can still handle read requests.
Disadvantage:
• Writes are not allowed until the master is restored or a new master is appointed.
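Read resilience and the blocked write path can be shown with a small simulation. The `Cluster` class, the `master_up` flag, and the `MasterDownError` exception are hypothetical names used only for this sketch.

```python
class MasterDownError(Exception):
    """Raised when a write arrives while no master is available."""


class Cluster:
    """Toy cluster with one master and one slave."""

    def __init__(self):
        self.master_up = True
        self.master = {}
        self.slave = {}

    def write(self, key, value):
        if not self.master_up:
            raise MasterDownError("writes are blocked until a master is available")
        self.master[key] = value
        self.slave[key] = value  # simplification: immediate propagation

    def read(self, key):
        # Reads only need a slave, so they survive a master failure.
        return self.slave.get(key)


cluster = Cluster()
cluster.write("order:1", "paid")
cluster.master_up = False           # simulate a master failure
print(cluster.read("order:1"))      # the slave still answers
try:
    cluster.write("order:2", "new")
except MasterDownError as error:
    print("write rejected:", error)
```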
Master-Slave Replication
• Another characteristic is that a slave can be appointed as the new master.
• Masters can be appointed manually or automatically.
• To achieve read resilience, the read and write paths must be
different.
• This is normally done using separate database connections.
• Disadvantage: Master-slave replication has the advantages discussed
above, but it comes with the problem of inconsistency.
• Clients reading from the slaves may see data that has not yet been updated.
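The inconsistency problem appears as soon as propagation to the slaves is delayed. The sketch below assumes a hypothetical `LaggingReplica` class in which updates wait in a queue until `replicate()` is called; it is a toy model of replication lag, not any particular database's mechanism.

```python
class LaggingReplica:
    """Toy master and slave where replication happens in a later step."""

    def __init__(self):
        self.master = {}
        self.slave = {}
        self.pending = []  # updates accepted by the master but not yet on the slave

    def write(self, key, value):
        self.master[key] = value
        self.pending.append((key, value))

    def replicate(self):
        # Apply queued updates to the slave in the order they were written.
        for key, value in self.pending:
            self.slave[key] = value
        self.pending.clear()

    def read_from_slave(self, key):
        return self.slave.get(key)


db = LaggingReplica()
db.write("price", 10)
db.replicate()
db.write("price", 12)                # accepted by the master only
stale = db.read_from_slave("price")  # the slave has not caught up yet
db.replicate()
fresh = db.read_from_slave("price")
print(stale, fresh)
```

Between the second write and the second `replicate()` call, a client reading from the slave sees the old price while the master already holds the new one.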
Figure: Master-Slave Replication in MongoDB
Peer-to-Peer Replication
• Master-slave replication helps with read scalability but does not help
with the scalability of writes.
• Moreover, it provides resilience for reads but not for writes.
• The master is still a single point of failure.
• Peer-to-peer replication attacks these problems by not having a master.
• All the replicas are equal (each accepts both writes and reads).
• With peer-to-peer replication we can survive node failures without losing write
capability or access to data.
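A minimal sketch of the peer-to-peer idea follows, assuming hypothetical `Peer` and `PeerCluster` classes. It deliberately ignores two things a real system has to handle: conflicting concurrent writes on different peers, and bringing a recovered peer back up to date.

```python
class Peer:
    """One replica; every peer can serve both reads and writes."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}


class PeerCluster:
    def __init__(self, names):
        self.peers = [Peer(name) for name in names]

    def _live_peers(self):
        live = [peer for peer in self.peers if peer.up]
        if not live:
            raise RuntimeError("no live peers")
        return live

    def write(self, key, value):
        # No master: the write is applied to every peer that is currently up.
        for peer in self._live_peers():
            peer.data[key] = value

    def read(self, key):
        # Any live peer can answer the read.
        return self._live_peers()[0].data.get(key)


cluster = PeerCluster(["p1", "p2", "p3"])
cluster.write("cart:7", ["book"])
cluster.peers[0].up = False               # p1 fails
cluster.write("cart:7", ["book", "pen"])  # the write is still accepted
print(cluster.read("cart:7"))
```

Losing `p1` blocks neither reads nor writes, in contrast to losing the master in a master-slave setup.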
Combining Sharding and Replication
• With master-slave replication and sharding, we have multiple masters, but each data item has a single master.
• Depending on the configuration, we can choose the master for each
group of data.
• Peer-to-peer replication combined with sharding is a common strategy for column-family
databases.
• This is commonly done by replicating each shard across several nodes.
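Replicating shards can be sketched by hashing each key to a starting node and storing copies on the next few nodes. The node names, the replication factor of 3, and the wrap-around placement are assumptions made for illustration; real databases implement their own placement strategies.

```python
import hashlib

NODES = ["n0", "n1", "n2", "n3"]  # hypothetical cluster nodes
REPLICATION_FACTOR = 3            # assumption: each shard is kept on three nodes


def replicas_for(key):
    """Return the nodes that hold copies of the shard containing key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(NODES)
    # Place copies on consecutive nodes, wrapping around the node list.
    return [NODES[(start + offset) % len(NODES)]
            for offset in range(REPLICATION_FACTOR)]


for key in ["customer:1", "customer:2", "customer:3"]:
    print(key, "->", replicas_for(key))
```

Each key belongs to one shard (sharding), and each shard lives on three of the four nodes (replication), so a single node failure leaves every key readable.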
Topics to be covered in the next session (Session 4)
• Cassandra data model
Thank you!!!
