1
That is the question
{
"_id": "555ae00a475a9b259281b21a",
"name": "Nicola Galgano",
"alias": "alikon",
"gender": "male",
"work": "DB consultant on banking systems",
"company": "looking for a new one",
"email": "info@alikonweb.it",
"twitter": "@alikon",
"address": "Roma, Italy, EU",
"current_hobby": "run away from dentist"
}
2
Henri Poincaré
3
Ipse dixit
4
What is Big Data?
Big data is an all-encompassing term for any collection of data sets so large or complex that it becomes difficult to process them using traditional data processing applications.
From Wikipedia
5
How much is Big Data?
DVD 4.7 GB
Human brain 2.5 PB
LHC 1 PB/s
Net traffic 1 ZB/year
6
Internet of Everything
• IPv6 = 2^128 ≈ 3.4e+38 addresses
7
IPv6 can address
every quark
in the world
8
Structured / Unstructured
Volume
9
• Volume
• Velocity
• Variety
• Veracity
10
Availability          Downtime/year    Downtime/month   Downtime/week
90 % (1 nine)         36.5 days        72 hours         16.8 hours
99 % (2 nines)        3.65 days        7.20 hours       1.68 hours
99.9 % (3 nines)      8.76 hours       43.8 minutes     10.1 minutes
99.99 % (4 nines)     52.56 minutes    4.38 minutes     1.01 minutes
99.999 % (5 nines)    5.26 minutes     25.9 seconds     6.05 seconds
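The downtime figures for each availability level follow from simple arithmetic: the unavailable fraction multiplied by the length of the period. A minimal sketch, assuming a 365-day year and a 30-day month (the function and constant names are illustrative, not from the slides):

```python
HOURS_PER_YEAR = 365 * 24
HOURS_PER_MONTH = 30 * 24   # assumption: a 30-day month
HOURS_PER_WEEK = 7 * 24


def downtime_seconds(availability_percent: float, period_hours: float) -> float:
    """Return the allowed downtime, in seconds, for one period."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * period_hours * 3600.0


if __name__ == "__main__":
    for availability in (90.0, 99.0, 99.9, 99.99, 99.999):
        per_year = downtime_seconds(availability, HOURS_PER_YEAR)
        per_month = downtime_seconds(availability, HOURS_PER_MONTH)
        per_week = downtime_seconds(availability, HOURS_PER_WEEK)
        # Values are printed in seconds; divide by 60 or 3600 for minutes or hours.
        print(f"{availability}%: {per_year:.1f}s/year, "
              f"{per_month:.1f}s/month, {per_week:.2f}s/week")
```

Each extra nine divides the allowed downtime by ten, which is why five nines leaves only a few minutes per year.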
11
12
Next-generation databases, mostly addressing some of these points:
• non-relational
• distributed
• horizontally scalable
• open-source
From www.nosql-database.org
13
• Key / value
• Column
• Document
• Graph
14
A data model is a representation that we use to perceive and manipulate data
15
•Logic model
•Normalization
• 1NF,2NF,3NF,..
• E-R
• Schema (rigid)
• Algebra of sets
•Impedance mismatch
16
Schemaless (dynamic/implicit)
Denormalization
Aggregate
Aggregates are the basic element of data storage
17
Simple data model
Blob/Opaque
Only 3 API functions
• Get(key)
• Set(key, value)
• Delete(key)
Key and value can be complex
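The three-function API can be sketched as a thin wrapper around an in-memory dictionary. This is only an illustration of the model, assuming a single process and no persistence; the class name `KeyValueStore` is hypothetical and does not refer to any particular product:

```python
from typing import Any, Dict, Hashable, Optional


class KeyValueStore:
    """Toy key/value store: values are opaque to the store itself."""

    def __init__(self) -> None:
        self._data: Dict[Hashable, Any] = {}

    def get(self, key: Hashable) -> Optional[Any]:
        """Return the value stored under key, or None if it is absent."""
        return self._data.get(key)

    def set(self, key: Hashable, value: Any) -> None:
        """Store value under key, replacing any previous value."""
        self._data[key] = value

    def delete(self, key: Hashable) -> bool:
        """Remove key; return True if it existed."""
        if key in self._data:
            del self._data[key]
            return True
        return False


if __name__ == "__main__":
    store = KeyValueStore()
    # Both key and value can be complex: a tuple key and a nested value.
    store.set(("user", 42), {"alias": "alikon", "tags": ["db", "nosql"]})
    print(store.get(("user", 42)))
    print(store.delete(("user", 42)))
```

The store never inspects the value, so any query on its contents has to be done by the application after a `get`.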
More transparent
18
JSON (JavaScript Object Notation)
A lightweight data interchange format
Easy for humans and machines to read and write
Column
Sparse, semi-structured sorted map.
Flexible number of columns
Column keys can be grouped into families
19
How it is stored
• Graph theory model G = (V, E)
• Store, map and query relationships
20
• Nodes connected by edges
• Complex relationships
• Recommend products
• ACID
Queries = graph traversal
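To make "queries = graph traversal" concrete, here is a minimal sketch of a breadth-first traversal over an adjacency list. The graph data and the function name `neighbors_within` are made up for illustration; a real graph database exposes its own query language rather than this function:

```python
from collections import deque
from typing import Dict, List


def neighbors_within(graph: Dict[str, List[str]], start: str, max_hops: int) -> List[str]:
    """Return nodes reachable from start in at most max_hops edges, in visit order."""
    seen = {start}
    frontier = deque([(start, 0)])
    reached: List[str] = []
    while frontier:
        node, hops = frontier.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                reached.append(neighbor)
                frontier.append((neighbor, hops + 1))
    return reached


if __name__ == "__main__":
    # Hypothetical "knows" relationships, stored as G = (V, E).
    knows = {
        "nick": ["maria"],
        "maria": ["nick", "anna"],
        "anna": ["maria"],
    }
    print(neighbors_within(knows, "nick", 2))
```

A "friends of friends" recommendation is the same traversal with `max_hops=2`, minus the direct neighbors.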
The map job takes a set of data and converts it into another set of data, where individual elements are broken down into tuples (key/value pairs).
The reduce job takes the output from a map as input and combines those data tuples into a smaller set of tuples.
21
MapReduce refers to 2 separate and distinct tasks
Tasks run in parallel
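The two tasks can be sketched with the classic word-count example. This version runs sequentially in one process to keep it short; in a real framework the map and reduce calls would be distributed across nodes. All function names here are illustrative assumptions:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_job(line: str) -> List[Tuple[str, int]]:
    """Break one input line into (word, 1) tuples."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all values by key so each key reaches exactly one reduce call."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_job(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all tuples for one key into a single, smaller tuple."""
    return key, sum(values)


def map_reduce(lines: Iterable[str]) -> Dict[str, int]:
    mapped = [pair for line in lines for pair in map_job(line)]
    grouped = shuffle(mapped)
    return dict(reduce_job(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(map_reduce(["sql or nosql", "nosql or sql or both"]))
```

Because each `map_job` call depends only on its own line, and each `reduce_job` call only on its own key, both phases can be parallelized without coordination.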
22
• There are multiple ways to model data
• How the data is going to be accessed
• Read intensive or Write intensive
• Complex queries
23
Schemaless Normalized
Model
Vertical (up)
Add more power (ram/cpu/disk)
Horizontal (out)
Add more commodity systems
24
1. The network is reliable.
2. Latency is zero.
3. Bandwidth is infinite.
4. The network is secure.
5. Topology doesn't change.
6. There is one administrator.
7. Transport cost is zero.
8. The network is homogeneous.
25
• Split up data into multiple chunks
• Store each chunk in a separate data node
• Partitioning strategy: "the shard key"
• Multi-shard operations (join/aggregate)
• Load balancing
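One common partitioning strategy is to hash the shard key and take the result modulo the number of shards. The sketch below assumes a fixed shard count and string keys; `shard_for` and `ShardedStore` are hypothetical names, and real systems usually add rebalancing (for example consistent hashing) on top:

```python
import hashlib
from typing import Any, Dict, List


def shard_for(shard_key: str, shard_count: int) -> int:
    """Map a shard key to a shard index in a stable, evenly spread way."""
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ShardedStore:
    """Toy store that keeps each chunk of data in a separate dictionary ("node")."""

    def __init__(self, shard_count: int) -> None:
        self.shards: List[Dict[str, Any]] = [{} for _ in range(shard_count)]

    def put(self, shard_key: str, document: Any) -> None:
        self.shards[shard_for(shard_key, len(self.shards))][shard_key] = document

    def get(self, shard_key: str) -> Any:
        # A single-key read touches exactly one shard.
        return self.shards[shard_for(shard_key, len(self.shards))].get(shard_key)

    def count_all(self) -> int:
        # A multi-shard operation has to visit every shard and combine results.
        return sum(len(shard) for shard in self.shards)


if __name__ == "__main__":
    store = ShardedStore(shard_count=4)
    for alias in ("alikon", "nick", "maria"):
        store.put(alias, {"alias": alias})
    print(store.get("maria"), store.count_all())
```

The contrast between `get` and `count_all` is the reason joins and aggregates become expensive once data is sharded.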
26
• Master / Slave
• Multi-master
• Synchronous
• Asynchronous
• Provide redundancy
• Increase availability
• Failover (automatic)
27
28
[Diagram: Maria, Nick and Data on a timeline T0 to T3, showing two Get(X) calls followed by two Put(X) calls on the same data]
Transaction
• A sequence of operations that form a single unit of work
• Transactions have 4 properties:
  • Atomic
  • Consistent
  • Isolated
  • Durable
29
ACID - Atomicity
Transfer 100€ from A to B
1. Read(A)
2. If A > 100
3. A=A-100
4. Write(A)
5. Read(B)
6. B=B+100
7. Write(B)
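The transfer steps above can be sketched with Python's built-in `sqlite3` module, where a connection used as a context manager commits on success and rolls back if an exception is raised. The table layout, account names, and helper functions are assumptions for illustration; the balance check also uses "at least the amount" rather than the slide's strict comparison:

```python
import sqlite3


def open_demo_bank() -> sqlite3.Connection:
    """Create an in-memory database with two hypothetical accounts, A and B."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 150), ("B", 0)])
    conn.commit()
    return conn


def balance_of(conn: sqlite3.Connection, name: str) -> int:
    row = conn.execute(
        "SELECT balance FROM accounts WHERE name = ?", (name,)
    ).fetchone()
    return row[0]


def transfer(conn: sqlite3.Connection, source: str, target: str, amount: int) -> bool:
    """Move amount from source to target; either both writes happen or neither."""
    try:
        with conn:  # commit if the block succeeds, roll back if it raises
            if balance_of(conn, source) < amount:
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
            credited = conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
            if credited.rowcount != 1:
                # The debit above has already run; raising undoes it.
                raise LookupError("unknown target account")
        return True
    except (ValueError, LookupError):
        return False


if __name__ == "__main__":
    bank = open_demo_bank()
    print(transfer(bank, "A", "B", 100), balance_of(bank, "A"), balance_of(bank, "B"))
    print(transfer(bank, "A", "missing", 10), balance_of(bank, "A"))
```

The second call fails after the debit has been applied inside the transaction, and the rollback restores the source balance, which is the all-or-nothing behavior atomicity promises.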
30
ACID - Consistency
Transfer 100€ from A to B
1. Read(A)
2. If A > 100
3. A=A-100
4. Write(A)
5. Read(B)
6. B=B+100
7. Write(B)
31
ACID - Isolation
Transfer 100€ from A to B
1. Read(A)
2. If A > 100
3. A=A-100
4. Write(A)
5. Read(B)
6. B=B+100
7. Write(B)
32
ACID - Durability
Transfer 100€ from A to B
1. Read(A)
2. If A > 100
3. A=A-100
4. Write(A)
5. Read(B)
6. B=B+100
7. Write(B)
33
Basically Available:
• There will be a response to any request.
• Fast response even if some replicas are slow or crashed
Soft State:
• The state of the system could change over time
• It is the application's task to guarantee consistency
Eventually consistent:
• The system will eventually become consistent once it stops receiving input.
• The data will propagate everywhere
34
• Nick finds a cool photo and shares it with Maria by posting it on her Facebook wall
• Nick asks Maria to check it out
• Maria logs in to her account and checks her Facebook wall, but:
- Nothing is there! (x apart)
• Nick tells Maria to wait a bit and check again later
• Maria waits for a minute or so and checks back:
- She finds the photo Nick shared with her!
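The wall-post story can be reproduced with a toy model of asynchronous replication: a write lands on one replica immediately and reaches the others only when pending updates are delivered. Everything here (the class, the `via` parameter, the explicit `propagate` step) is a simplifying assumption, not how any particular system is implemented:

```python
from typing import Any, Dict, List, Optional, Tuple


class EventuallyConsistentStore:
    """Toy replicated store where writes spread to other replicas later."""

    def __init__(self, replica_count: int = 2) -> None:
        self.replicas: List[Dict[str, Any]] = [{} for _ in range(replica_count)]
        self.pending: List[Tuple[int, str, Any]] = []  # (replica index, key, value)

    def put(self, key: str, value: Any, via: int = 0) -> None:
        """Write to one replica now and queue the update for the others."""
        self.replicas[via][key] = value
        for index in range(len(self.replicas)):
            if index != via:
                self.pending.append((index, key, value))

    def get(self, key: str, via: int = 0) -> Optional[Any]:
        """Read from one replica; the answer may be stale."""
        return self.replicas[via].get(key)

    def propagate(self) -> None:
        """Deliver all queued updates, making the replicas consistent."""
        for index, key, value in self.pending:
            self.replicas[index][key] = value
        self.pending.clear()


if __name__ == "__main__":
    store = EventuallyConsistentStore()
    store.put("maria_wall", "cool photo", via=0)   # Nick posts via replica 0
    print(store.get("maria_wall", via=1))          # Maria reads replica 1 too early
    store.propagate()                              # time passes, updates arrive
    print(store.get("maria_wall", via=1))          # now the photo is visible
```

Both reads were answered right away, which is the "basically available" part; only the first one paid for it with stale data.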
35
• It is impossible for a distributed computer system to simultaneously provide all three of these guarantees:
• Consistency – all nodes see the same data at the same time
• Availability – all clients can always read and write
• Partition tolerance – the system keeps working when failures occur*
• A distributed system can satisfy only 2 at the same time
36
37
Nick Maria
Who will take the next flight ?
EU US
38
• An ATM will allow you to withdraw money even if the machine is partitioned from the network
• Higher availability means higher revenue
• However, it puts a limit on the amount you can withdraw
• The bank might also charge you a fee when an overdraft happens
In the absence of partitions, how does the system trade off latency (L) and consistency (C)?
39
40
ACID / RDBMS
• Strong consistency
• Isolation
• Transaction
• Mature technology
• SQL
• Available & consistent
• Scale up (limited)
• Shared something (disk/ram/proc)
BASE / NoSQL
• Weak consistency (stale data)
• Last write wins
• Program managed
• New technology
• No standard
• Available & partition tolerant
• Scale out (unlimited*)
• Shared nothing (parallelizable)
41

Sql or NoSql: that is the question...
