NOSQL KEY-VALUE DATABASE
Lecture 2
Dr. Shaimaa Galal
Review Question
• What is the main challenge for traditional databases?
Managing semi-structured and unstructured data.
Managing large amounts of structured data.
Question
Key-value database
• Example: DynamoDB
• Items have one or more attributes, each a (name, value) pair.
• An attribute can be single-valued or multi-valued (for example, a set).
• Items are grouped into a table.
• A key-value database is a system that stores values indexed by keys. It can store both structured and unstructured data.
• The focus is on scaling out: the design targets huge data volumes and massive loads.
• Data model: a (global) collection of key-value pairs.
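The data model above can be pictured with ordinary Python dictionaries. This is only an in-memory illustration, not DynamoDB's API; the table name `users_table`, the key format `user#42`, and the attribute names are made up for the sketch.

```python
# Hypothetical table: maps each item's key to that item's attributes.
users_table = {}

# One item: attribute names map to values.
# A Python set stands in for a multi-valued attribute.
users_table["user#42"] = {
    "name": "Amina",                                      # single-valued attribute
    "emails": {"amina@example.com", "a.k@example.org"},   # multi-valued attribute
}

# Lookup is always by key; the store does not interpret the value.
item = users_table["user#42"]
print(item["name"])
print(len(item["emails"]))
```

The only access path is the key, which is what makes the model simple to distribute.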
Key-value
Pros:
• very fast
• very scalable (horizontally distributed to nodes based on key)
• simple data model
• eventual consistency
• fault-tolerance
Cons:
• Cannot directly model more complex data structures, such as objects
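Horizontal distribution "based on key" can be sketched as hashing the key and mapping the hash to a node. The function name `node_for_key` and the node names are assumptions for illustration. Plain modulo placement is the simplest possible scheme; production systems typically use a hash ring so that adding a node moves fewer keys.

```python
import hashlib

def node_for_key(key: str, nodes: list[str]) -> str:
    """Pick the node responsible for a key from a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]

nodes = ["node-a", "node-b", "node-c"]  # hypothetical cluster members
placement = {key: node_for_key(key, nodes) for key in ["user#1", "user#2", "order#7"]}
print(placement)
```

Because the hash is deterministic, every client computes the same owner for a key without consulting a central directory.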
Big Data: Google
1. Google Stack Software
• Google developed several major software layers as the foundation of the
Google platform:
1. Google File System (GFS): a distributed cluster file
system that allows all of the disks within the Google
data center to be accessed as one massive, distributed,
redundant file system.
2. MapReduce: a distributed processing framework for
parallelizing algorithms across large numbers of
potentially unreliable servers, capable of
handling massive datasets.
3. BigTable: a nonrelational database system that uses
the Google File System for storage.
Google Software Architecture
Simple MapReduce Example: WordCount
Map Function
Reduce Function
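The WordCount example can be sketched in plain Python. This is a single-process illustration of the map, shuffle, and reduce phases, not Hadoop's or Google's implementation; the function names `map_words`, `shuffle`, and `reduce_counts` and the sample documents are assumptions made for the sketch.

```python
from collections import defaultdict

def map_words(document: str):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group emitted values by key (done by the framework in a real cluster)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_counts(word: str, counts: list[int]):
    """Reduce: sum the counts collected for one word."""
    return word, sum(counts)

documents = ["the quick brown fox", "the lazy dog", "the fox"]
pairs = [pair for doc in documents for pair in map_words(doc)]
word_counts = dict(reduce_counts(w, c) for w, c in shuffle(pairs).items())
print(word_counts)
```

Each call to `map_words` and `reduce_counts` depends only on its own input, which is what lets a framework run them in parallel on different servers.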
MultiStage MapReduce Example
2. Hadoop and Hive
Key-value Database API Functions:
Key-value
• Basic API access:
• Get(key): retrieve the value stored under a given key
• Put(key, value): create or update the value given its key
• Delete(key): remove the key and its associated value
• Update(key, value): update the value stored under an existing key
• Execute(key, operation, parameters): invoke an operation on the
value stored under the key, where the value is a special data structure
(e.g. list, set, map)
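A minimal in-memory sketch of this API follows. The class name `KeyValueStore` and its method signatures are assumptions for illustration and do not correspond to any particular product. Update is covered by `put` in this sketch, since both overwrite the stored value.

```python
from typing import Any, Callable

class KeyValueStore:
    """Toy key-value store backed by a Python dict."""

    def __init__(self) -> None:
        self._data: dict[str, Any] = {}

    def get(self, key: str) -> Any:
        """Return the value for key, or None if the key is absent."""
        return self._data.get(key)

    def put(self, key: str, value: Any) -> None:
        """Create the key or overwrite its current value."""
        self._data[key] = value

    def delete(self, key: str) -> None:
        """Remove the key and its value; a missing key is ignored."""
        self._data.pop(key, None)

    def execute(self, key: str, operation: Callable[..., Any], *parameters: Any) -> Any:
        """Apply an operation to the structured value stored under key."""
        return operation(self._data[key], *parameters)

store = KeyValueStore()
store.put("cart:7", ["apple"])
store.execute("cart:7", list.append, "pear")   # operate on a list value in place
store.put("tags:7", {"fruit"})
store.execute("tags:7", set.add, "fresh")      # operate on a set value in place
store.delete("tags:7")
print(store.get("cart:7"))
```

`execute` is what distinguishes stores with typed values (lists, sets, maps) from stores that treat every value as an opaque blob.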
Key-value Platforms
| Name | Producer | Data model | Querying |
|---|---|---|---|
| SimpleDB | Amazon | set of pairs (key, {attribute}), where attribute is a pair (name, value) | restricted SQL; select, delete, GetAttributes, and PutAttributes operations |
| Redis | Salvatore Sanfilippo | set of pairs (key, value), where value is a simple typed value, a list, an ordered (by ranking) or unordered set, or a hash | primitive operations for each value type |
| Dynamo | Amazon | like SimpleDB | simple get operation, and put in a context |
| Voldemort | LinkedIn | like SimpleDB | similar to Dynamo |
Apache Cassandra
• Is a free and open-source distributed NoSQL database
management system.
• Handles large amounts of data across many commodity
servers, providing high availability with no single point
of failure.
• It was started at Facebook and is now an open-source
Apache project written in Java.
DataStax Astra
Apache Cassandra - Advantages
1. Cassandra is designed to run as a distributed server, but it
can also run as a single node.
2. Horizontal scalability (distributed storage).
3. Fast responses even as demand grows.
4. High write speeds for handling growing data volumes.
5. Ability to change the data structure.
6. A simple API for your favorite programming language.
7. Automatic fault detection and fault tolerance.
8. No single point of failure: every node knows about the others.
9. Decentralized.
10. Supports running MapReduce jobs through Hadoop.
Apache Cassandra - Disadvantages
1. Limited ad-hoc queries: you must model your data
around the queries rather than around the
structure of the data.
2. No aggregations: because Cassandra is a key-value
store, functions such as Sum, Min, Max, and Average
are extremely resource-intensive, if they are
possible at all.
3. Unpredictable performance: Cassandra runs many
asynchronous jobs in the background.
Comparing Alternatives
Cassandra Gossip Protocol
• What is the gossip protocol?
Gossip is the peer-to-peer messaging system that Cassandra nodes
(including virtual nodes) use to exchange state information and
keep a consistent view of each other.
Each node holds a replica of some of the data, so if a node fails,
another replica can respond. The replication_factor parameter,
set when a keyspace (database) is created, specifies how many
machines in the cluster receive a copy of the same data.
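The idea behind gossip can be sketched as nodes repeatedly pairing up at random and merging what each knows about the cluster. This is a heavily simplified illustration, not Cassandra's actual protocol or message format; the `Node` class, the heartbeat counter, and the round count are assumptions made for the sketch.

```python
import random

class Node:
    def __init__(self, name: str) -> None:
        self.name = name
        # What this node believes about the cluster: node name -> heartbeat version.
        self.view: dict[str, int] = {name: 0}

    def beat(self) -> None:
        """Bump this node's own heartbeat version."""
        self.view[self.name] += 1

    def gossip_with(self, peer: "Node") -> None:
        """Exchange views; both sides keep the newest version seen for each node."""
        merged = {
            name: max(self.view.get(name, -1), peer.view.get(name, -1))
            for name in self.view.keys() | peer.view.keys()
        }
        self.view = dict(merged)
        peer.view = dict(merged)

rng = random.Random(0)
nodes = [Node(f"node-{i}") for i in range(5)]
for _ in range(20):  # gossip rounds
    for node in nodes:
        node.beat()
        peer = rng.choice([n for n in nodes if n is not node])
        node.gossip_with(peer)

for node in nodes:
    print(node.name, sorted(node.view))
```

No node needs a full list of peers up front or a central coordinator; knowledge spreads through repeated pairwise exchanges.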
Key-Value Concepts
• Cassandra manages columns and column families.
• A column family is a container of rows, and each row contains columns.
• A keyspace is analogous to a database in the relational
model, but without relationships between its tables; it is the container that stores the data.
• A keyspace requires some attributes to be defined,
such as a user-defined name, a replication strategy, and
others.
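The nesting of these concepts can be pictured with plain Python dictionaries. This mirrors only the containment relationships; it is not how Cassandra stores data, and the names `lecture_demo`, `users`, and the row keys are hypothetical.

```python
# keyspace -> column family -> row key -> {column name: value}
keyspace = {
    "name": "lecture_demo",  # user-defined name
    "replication": {"class": "SimpleStrategy", "replication_factor": 3},
    "column_families": {
        "users": {
            "user#1": {"name": "Amina", "city": "Cairo"},
            "user#2": {"name": "Omar"},  # rows need not share the same columns
        }
    },
}

users = keyspace["column_families"]["users"]
print(users["user#1"]["city"])
print(users["user#2"].get("city"))
```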
Key-Value Concepts
• Keyspaces require the following consistency-related settings:
1. The replication factor, which indicates how much performance
you are willing to trade for consistency.
2. The replica placement strategy, which determines how
the replicas are placed in the ring, for example
SimpleStrategy, OldNetworkTopologyStrategy, and
NetworkTopologyStrategy.
• Read more: https://docs.datastax.com/en/cassandra-oss/2.1/cassandra/architecture/architectureDataDistributeReplication_c.html#architectureDataDistributeReplication_c__networkToplogyStrategy-ph
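How a replication factor and a ring-based placement interact can be sketched as follows: hash the key to a position on the ring, then walk clockwise and take the next distinct nodes. This approximates the idea behind SimpleStrategy under simplifying assumptions. Node positions are derived here by hashing hypothetical node names with MD5, whereas a real cluster assigns tokens to nodes and uses its configured partitioner.

```python
import bisect
import hashlib

def token(value: str) -> int:
    """Map a string to a position on the ring."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")

def replicas_for(key: str, nodes: list[str], replication_factor: int) -> list[str]:
    """Return the nodes holding copies of key, walking the ring clockwise."""
    ring = sorted((token(node), node) for node in nodes)
    tokens = [t for t, _ in ring]
    # First node whose token is at or after the key's token, wrapping around.
    start = bisect.bisect_left(tokens, token(key)) % len(ring)
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]

nodes = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical cluster
owners = replicas_for("user#42", nodes, replication_factor=3)
print(owners)
```

A higher replication factor means more nodes must receive each write, which is the cost side of the trade-off described above.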
CQL (Cassandra Query Language)
• CQL offers a syntax very close to SQL for creating schemas
and manipulating data.
Some of the features CQL has are:
• Data types
• Data definition
• Data manipulation
• Secondary indexes
• Materialized views
• Security
• Functions
• Arithmetic operations
• JSON support
• Triggers
CQL Example
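A few representative CQL statements are shown below, held as Python strings so they can be inspected without a running cluster. The keyspace `lecture_demo`, the table `users`, and the sample row are hypothetical. Executing them would require a live Cassandra cluster and a client driver, which this sketch deliberately leaves out.

```python
cql_statements = [
    # Data definition: a keyspace with its replication settings.
    "CREATE KEYSPACE IF NOT EXISTS lecture_demo "
    "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};",
    # Data definition: a table whose primary key is the lookup key.
    "CREATE TABLE IF NOT EXISTS lecture_demo.users ("
    "user_id int PRIMARY KEY, name text, emails set<text>);",
    # Data manipulation: insert one row, including a set-valued column.
    "INSERT INTO lecture_demo.users (user_id, name, emails) "
    "VALUES (42, 'Amina', {'amina@example.com'});",
    # Query by primary key, the access path the table was modeled for.
    "SELECT name, emails FROM lecture_demo.users WHERE user_id = 42;",
]

for statement in cql_statements:
    print(statement)
```

The statements read like SQL, but the query filters on the primary key, reflecting the rule that tables are modeled around the queries.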
Use Case
