Introduction to asdfghjkln b vfgh n    v
What is covered in this presentation?
A brief history of databases
NoSQL WHY, WHAT & WHEN?
Characteristics of NoSQL databases
Aggregate data models
CAP theorem
Introduction
• Database - Organized collection of data
• DBMS - a software package with computer programs
that controls the creation, maintenance and use of a
database
• Databases are created to handle large quantities of
information by inputting, storing, retrieving, and
managing that information
A brief history
• Benefits of Relational databases:
General-purpose design
ACID transactions
Strong consistency, concurrency control, recovery
Mathematical foundation (relational algebra)
Standard query language (SQL)
Many supporting tools, e.g. reporting services, entity
frameworks, ...
Relational databases
SQL databases
But...
• Relational databases were not
built for distributed applications.
Because...
• Joins are expensive
• Hard to scale horizontally
• Object-relational impedance mismatch (application objects do not map cleanly onto tables)
• Expensive (product cost,
hardware, maintenance)
NoSQL why, what and when?
And...
It’s weak in:
• Speed (performance)
• High availability
• Partition tolerance
NoSQL why, what and when?
Why NoSQL now? Answer: driving trends
RDBMS performance
Data
Data is a new class of economic asset, like currency and
gold
Source: World Economic Forum 2012
Data is the new raw material
Data size growth
• 150 exabytes in 2005
(exabyte is a billion
gigabytes)
• 1200 exabytes in 2010
• 35000 exabytes in 2020
(expected by IBM)
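The growth figures above imply a compound annual growth rate (CAGR). A small sketch using only the slide's numbers (the 2020 value being IBM's projection); the `cagr` helper is hypothetical:

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end / start) ** (1 / years) - 1


# Data volumes from the slide, in exabytes (2020 is a projection)
volumes = {2005: 150, 2010: 1200, 2020: 35000}

years = sorted(volumes)
for y0, y1 in zip(years, years[1:]):
    rate = cagr(volumes[y0], volumes[y1], y1 - y0)
    print(f"{y0}-{y1}: {rate:.1%} per year")
```

This works out to roughly 51.6% per year for 2005-2010 and about 40.1% per year for 2010-2020.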
Volume of data/information created, captured,
copied, and consumed worldwide from 2010 to 2025
Data size growth
Examples:
• ISRO launches the advanced earth observation
and mapping satellite CARTOSAT-3 along with
13 other commercial nano-satellites
– Information and images coming from the satellite
• Maharashtra election: 20,000 tweets/second
• Around 30 billion RFID tags produced/year
– Automatic toll collection using RFID
• Oil drilling platforms have 20k to 40k sensors
95% of data produced is unstructured
Challenge
Big Data’s characteristics are challenging conventional information
management architectures
• Massive and growing amounts of information residing inside
and outside the organization
• Unconventional, semi-structured or unstructured (diverse) data,
including web pages, log files, social media, click-streams,
instant messages, text messages, emails, sensor data from
active and passive systems, etc.
• Changing information
Figure: Big data analytics use cases (multi-channel, sentiment, transaction, call detail record, warranty claim, surveillance, and claim fraud analytics)
What is big data?
“A massive volume of both structured and unstructured data
that is so large that it's difficult to store, analyse, process,
share, visualise and manage with traditional database and
software techniques.” - Roger Magoulas of O’Reilly in 2005
• Big data technologies describe a new generation of
technologies and architectures, designed to economically
extract value from very large volumes of a wide variety of
data, by enabling high velocity capture, discovery, and/or
analysis
• IBM / MS
– Volume (Terabytes -> Zettabytes)
– Variety (Structured -> Semi-structured -> Unstructured)
– Velocity (Batch -> Streaming Data)
What Makes It Big Data? (The 3 Vs)
Figure: Volume, Velocity, Variety, Value (example sources: social media, blogs, smart meters)
• Volume: gigabyte (10^9 bytes), terabyte (10^12), petabyte (10^15),
exabyte (10^18), zettabyte (10^21)
• Variety: structured, semi-structured, unstructured; text, image,
audio, video, record
• Velocity (Dynamic, sometimes time-varying)
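The volume prefixes above are decimal (SI) units: each step is a factor of 10^3. A small conversion helper (the `UNITS` table and `convert` function are hypothetical) confirms the earlier statement that an exabyte is a billion gigabytes. Binary units such as the gibibyte (2^30 bytes) differ and are not covered here.

```python
# Decimal (SI) byte units: each step up is a factor of 10**3
UNITS = {"GB": 10**9, "TB": 10**12, "PB": 10**15, "EB": 10**18, "ZB": 10**21}


def convert(amount: float, from_unit: str, to_unit: str) -> float:
    """Convert a data size between decimal byte units."""
    return amount * UNITS[from_unit] / UNITS[to_unit]


print(convert(1, "EB", "GB"))       # 1000000000.0 (a billion gigabytes)
print(convert(35_000, "EB", "ZB"))  # 35.0 (the projected 2020 volume in zettabytes)
```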
Variability:
Variability is not the same as variety.
A coffee shop offering 6 different
blends is variety; if the same blend tastes
different every day, that is
variability.
The same is true of data: if the
meaning is constantly
changing, it can have a huge
impact on your data
homogenization.
Visualization:
Using charts and graphs to
visualize large amounts of
complex data
A NoSQL database provides a
mechanism for storage and retrieval
of data that employs less constrained
consistency models than traditional
relational databases.
NoSQL systems are also referred to
as "Not only SQL" to emphasize that
they may in fact support SQL-like query
languages.
But What is NoSQL?
NoSQL avoids:
• Overhead of ACID transactions
• Complexity of SQL queries
• Burden of up-front schema design
• DBA presence
• Transactions (these should be handled
at the application layer)
Provides:
• Easy and frequent changes to the DB
• Fast development
• Large data volumes (e.g. Google)
• Schema-less data
Characteristics of NoSQL databases
NoSQL is getting more and more popular
In relational databases:
• You can’t add a record that does
not fit the schema
• You need to add NULLs for
unused items in a row
• You must consider data types,
e.g. you can’t add a string to an
integer field
• You can’t add multiple items in a
field (you must create another
table: primary key, foreign key,
joins, normalization, ...)
What is a schema-less data model?
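The last point (no multiple items in one field) is easiest to see in a small example using Python's built-in `sqlite3` module. The `users` and `phones` tables are hypothetical: because a relational field holds a single value, a user's several phone numbers go into a second table linked by a foreign key, and reading them back requires a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE users (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    -- A user may have several phone numbers, so they need their own table
    CREATE TABLE phones (
        user_id INTEGER NOT NULL REFERENCES users(id),
        number  TEXT NOT NULL
    );
""")
conn.execute("INSERT INTO users (id, name) VALUES (1, 'Alice')")
conn.executemany("INSERT INTO phones (user_id, number) VALUES (?, ?)",
                 [(1, "555-0100"), (1, "555-0199")])

# Reassembling a user with all of their phone numbers requires a join
rows = conn.execute("""
    SELECT u.name, p.number
    FROM users u JOIN phones p ON p.user_id = u.id
    ORDER BY p.number
""").fetchall()
print(rows)  # [('Alice', '555-0100'), ('Alice', '555-0199')]

# A phone row pointing at a non-existent user violates the foreign key
fk_rejected = False
try:
    conn.execute("INSERT INTO phones (user_id, number) VALUES (42, '555-0123')")
except sqlite3.IntegrityError:
    fk_rejected = True
print("foreign key violation rejected:", fk_rejected)
```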
In NoSQL databases:
• There is no schema to consider
• There are no unused cells
• There are no declared data types (types are implicit)
• Most of these considerations are handled
in the application layer
• All items are gathered in one aggregate
(document)
What is a schema-less data model?
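For contrast, the same kind of data can be kept as schema-less aggregates. This sketch uses plain Python dictionaries as a stand-in for documents (the `users` collection and its fields are hypothetical, not any particular database's API): records carry different fields, multiple phone numbers sit in one field, and the application decides how to treat missing fields.

```python
import json

# A hypothetical "users" collection. Each record is a self-contained aggregate
# (document); no schema is declared, so records may carry different fields.
users = {
    "u1": {"name": "Alice",
           "phones": ["555-0100", "555-0199"],  # multiple items in one field
           "address": {"city": "Pune"}},        # nested object, no join needed
    "u2": {"name": "Bob", "email": "bob@example.com"},  # no phones, extra field
}

# The application layer decides how to treat missing fields
for key, doc in users.items():
    print(key, doc["name"], len(doc.get("phones", [])), "phone(s)")

# A whole aggregate can be stored or transmitted as one JSON document
print(json.dumps(users["u1"]))
```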
NoSQL databases are classified into four
major data models:
• Key-value
• Document
• Column family
• Graph
Each DB has its own query language
Categories of NoSQL databases
• Simplest NoSQL databases
• The main idea is the use
of a hash table
• Access data (values) by
strings called keys
• Data has no required format;
values may have any format
• Data model: (key, value) pairs
• Basic operations:
Insert(key, value),
Fetch(key), Update(key, value),
Delete(key)
Key-value data model
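The four basic operations map directly onto a hash table. A minimal in-memory sketch, with Python's `dict` playing the hash table; the class and method names are hypothetical, and the error-handling choices (insert rejects an existing key, delete of a missing key is a no-op) are assumptions. Real key-value stores differ; many merge insert and update into a single "put".

```python
from typing import Any, Dict, Optional


class KeyValueStore:
    """Minimal in-memory key-value store backed by a hash table (a Python dict)."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def insert(self, key: str, value: Any) -> None:
        if key in self._data:
            raise KeyError(f"key already exists: {key}")
        self._data[key] = value

    def fetch(self, key: str) -> Optional[Any]:
        return self._data.get(key)  # None if the key is absent

    def update(self, key: str, value: Any) -> None:
        if key not in self._data:
            raise KeyError(f"no such key: {key}")
        self._data[key] = value

    def delete(self, key: str) -> None:
        self._data.pop(key, None)  # deleting a missing key is a no-op


store = KeyValueStore()
store.insert("session:42", {"user": "alice", "cart": [101, 205]})  # value of any format
store.insert("counter", 7)
store.update("counter", 8)
print(store.fetch("counter"))     # 8
store.delete("session:42")
print(store.fetch("session:42"))  # None
```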
Row-oriented DB – stores data row by row, suitable for
OLTP
Column-oriented DB – stores data column by column, suitable for
OLAP
Companies such as Facebook, Twitter, Yahoo, and
Adobe use HBase internally (large data and
random read/write)
The column is the smallest unit of data.
It is a tuple that contains a name, a value and a
timestamp
Column family data model
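A column as a (name, value, timestamp) tuple, grouped by row key and column family, can be sketched with nested dictionaries. This is a simplified in-memory illustration with hypothetical names (`table`, `put`, `get`), not how HBase or Cassandra actually persist data. Reads return the version with the newest timestamp. The last lines contrast row-oriented and column-oriented layouts of the same records.

```python
import time
from collections import defaultdict
from typing import Any, NamedTuple, Optional


class Column(NamedTuple):
    """The smallest unit of data: a name, a value and a timestamp."""
    name: str
    value: Any
    timestamp: int


# row key -> column family -> column name -> list of versions (oldest first)
table = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))


def put(row: str, family: str, name: str, value: Any,
        ts: Optional[int] = None) -> None:
    """Write a new version of a column; older versions are kept."""
    ts = ts if ts is not None else time.time_ns()
    versions = table[row][family][name]
    versions.append(Column(name, value, ts))
    versions.sort(key=lambda c: c.timestamp)


def get(row: str, family: str, name: str) -> Optional[Any]:
    """Return the value with the newest timestamp, or None if absent."""
    versions = table.get(row, {}).get(family, {}).get(name)
    return versions[-1].value if versions else None


put("user1", "profile", "name", "Alice", ts=1)
put("user1", "profile", "city", "Pune", ts=1)
put("user1", "profile", "city", "Mumbai", ts=2)  # newer version of the same column
put("user1", "activity", "last_login", "2024-01-01", ts=3)

print(get("user1", "profile", "city"))         # Mumbai
print(len(table["user1"]["profile"]["city"]))  # 2 (both versions kept)

# Row- vs column-oriented layout of the same two records
records = [("u1", "Alice", 30), ("u2", "Bob", 25)]
row_layout = [v for rec in records for v in rec]        # u1, Alice, 30, u2, Bob, 25
col_layout = [v for col in zip(*records) for v in col]  # u1, u2, Alice, Bob, 30, 25
print(row_layout)
print(col_layout)
```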
Example
Some statistics about Facebook Search (using Cassandra)
• MySQL, > 50 GB data
– Writes average: ~300 ms
– Reads average: ~350 ms
• Rewritten with Cassandra, > 50 GB data
– Writes average: 0.12 ms
– Reads average: 15 ms
Column family data model
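Taking the averages above at face value (the slide gives no details on workload or hardware), the speed-up ratios are simple to compute:

```python
# Average latencies reported on the slide, in milliseconds
mysql_ms = {"write": 300, "read": 350}
cassandra_ms = {"write": 0.12, "read": 15}

speedup = {op: mysql_ms[op] / cassandra_ms[op] for op in mysql_ms}
for op, factor in speedup.items():
    print(f"{op}: about {factor:,.0f}x faster")
```

That is roughly 2,500x for writes and about 23x for reads.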
• Based on graph theory
• Scales vertically, no clustering
• You can use graph algorithms
easily
• Transactions
• ACID
Graph data model
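"You can use graph algorithms easily" can be illustrated with a breadth-first search over an adjacency list. This is plain Python on a hypothetical social graph, not any graph database's API; it finds a path with the fewest hops between two people.

```python
from collections import deque
from typing import Dict, List, Optional

# A hypothetical social graph stored as an adjacency list (node -> neighbours)
graph: Dict[str, List[str]] = {
    "Alice": ["Bob", "Carol"],
    "Bob": ["Alice", "Dave"],
    "Carol": ["Alice", "Dave"],
    "Dave": ["Bob", "Carol", "Eve"],
    "Eve": ["Dave"],
}


def shortest_path(start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search: return a path with the fewest hops, or None."""
    parents: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path: List[str] = []
            cur: Optional[str] = node
            while cur is not None:  # walk back through the recorded parents
                path.append(cur)
                cur = parents[cur]
            return path[::-1]
        for neighbour in graph.get(node, []):
            if neighbour not in parents:  # visit each node at most once
                parents[neighbour] = node
                queue.append(neighbour)
    return None


print(shortest_path("Alice", "Eve"))  # ['Alice', 'Bob', 'Dave', 'Eve']
```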
• Pairs each key with a complex data
structure known as a
document.
• Indexes are implemented with B-trees.
• Documents can contain many
different key-value pairs, key-array
pairs, or even nested
documents.
Document-based data model
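A sketch of documents with nested fields and arrays, plus a secondary index for range queries. The slide says indexes are B-trees; here a sorted list searched with Python's `bisect` module stands in for the B-tree (it supports the same ordered lookups, but not efficient inserts at scale). The `orders` collection, its fields, and the query helper are hypothetical.

```python
import bisect
from typing import Any, Dict, List, Tuple

# Hypothetical "orders" collection: key -> document (nested fields and arrays allowed)
orders: Dict[str, Dict[str, Any]] = {
    "o1": {"customer": {"name": "Alice", "city": "Pune"}, "items": ["pen", "ink"], "total": 120},
    "o2": {"customer": {"name": "Bob", "city": "Delhi"}, "items": ["book"], "total": 450},
    "o3": {"customer": {"name": "Carol", "city": "Pune"}, "items": [], "total": 80},
}

# Secondary index on "total": (value, key) pairs kept in sorted order,
# standing in for the B-tree a document store would use.
total_index: List[Tuple[int, str]] = sorted(
    (doc["total"], key) for key, doc in orders.items())


def keys_with_total_between(low: int, high: int) -> List[str]:
    """Range query on the index: keys of documents with low <= total <= high."""
    start = bisect.bisect_left(total_index, (low, ""))
    result = []
    for total, key in total_index[start:]:
        if total > high:
            break  # index is sorted, so no later entry can match
        result.append(key)
    return result


print(keys_with_total_between(100, 500))  # ['o1', 'o2']

# Querying a nested field without an index: scan every document
print([k for k, d in orders.items() if d["customer"]["city"] == "Pune"])  # ['o1', 'o3']
```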
SQL vs NoSQL
• NoSQL may complement RDBMS
– RDBMS may hold smaller amounts of high-value structured data
– NoSQL may hold vast amounts of less valuable and less structured data
• Relational implementations provide ACID guarantees
– Atomicity: a transaction is treated as an all-or-nothing operation
– Consistency: database values are valid before and after each transaction
– Isolation: each transaction runs as if it were the only one
– Durability: once a transaction completes, its effects are not reversed
• NoSQL often provides BASE
– Basically available: parts of the system are allowed to fail (sharding/partitioning)
– Soft state: replicas may temporarily hold different values for the same object,
so the state can change over time even without new writes
– Eventually consistent: consistency is achieved over time (not on every
commit)
• CAP theorem
– It is impossible for a distributed system to guarantee consistency, availability, and partition
tolerance at the same time
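Atomicity and consistency can be seen with Python's built-in `sqlite3` module. In this sketch (hypothetical `accounts` table and `transfer` helper), a transfer credits one account and debits another inside a single transaction, and a CHECK constraint stands in for the "values are valid before and after" rule. If the debit would overdraw the source account, the whole transaction, including the credit already applied, is rolled back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE accounts (
    id      TEXT PRIMARY KEY,
    balance INTEGER NOT NULL CHECK (balance >= 0)  -- consistency rule
)""")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def transfer(src: str, dst: str, amount: int) -> bool:
    """Move money atomically: both updates commit together or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                         (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # CHECK failed; the credit to dst was rolled back as well


def balances() -> dict:
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


print(transfer("A", "B", 30), balances())   # True {'A': 70, 'B': 80}
print(transfer("A", "B", 500), balances())  # False {'A': 70, 'B': 80}
```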
What do we need?
• We need a distributed database system with the following features:
– Fault tolerance
– High availability
– Consistency
– Scalability
Which is impossible!
According to the CAP theorem,
we cannot achieve all three properties
in a distributed database system (the center of the diagram)
The CAP theorem
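The trade-off can be illustrated with a toy simulation of two replicas. Everything here is hypothetical and heavily simplified: a shared logical clock (which a real distributed system cannot assume) orders writes, and reconciliation is last-writer-wins. During a network partition, the "CP" mode refuses writes to keep replicas identical, while the "AP" mode accepts them and lets replicas diverge until the partition heals, which is eventual consistency in the BASE sense.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Replica:
    name: str
    # key -> (logical timestamp, value)
    data: Dict[str, Tuple[int, str]] = field(default_factory=dict)


class TwoReplicaStore:
    """Toy replicated store: mode='CP' favours consistency, mode='AP' availability."""

    def __init__(self, mode: str) -> None:
        self.mode = mode
        self.a, self.b = Replica("a"), Replica("b")
        self.partitioned = False
        self.clock = 0  # single logical clock: a simplification

    def write(self, replica: Replica, key: str, value: str) -> bool:
        self.clock += 1
        entry = (self.clock, value)
        if not self.partitioned:
            self.a.data[key] = entry  # replicate synchronously to both nodes
            self.b.data[key] = entry
            return True
        if self.mode == "CP":
            return False  # refuse the write rather than let replicas diverge
        replica.data[key] = entry  # AP: accept locally; replicas now disagree
        return True

    def read(self, replica: Replica, key: str) -> Optional[str]:
        entry = replica.data.get(key)
        return entry[1] if entry else None

    def heal(self) -> None:
        """End the partition and reconcile replicas with last-writer-wins."""
        self.partitioned = False
        for key in set(self.a.data) | set(self.b.data):
            entries = [e for e in (self.a.data.get(key), self.b.data.get(key)) if e]
            newest = max(entries)  # highest logical timestamp wins
            self.a.data[key] = newest
            self.b.data[key] = newest


results = {}
for mode in ("CP", "AP"):
    s = TwoReplicaStore(mode)
    s.write(s.a, "x", "v1")  # both replicas get v1
    s.partitioned = True     # network split between a and b
    accepted = s.write(s.a, "x", "v2")
    during = (s.read(s.a, "x"), s.read(s.b, "x"))
    s.heal()
    after = (s.read(s.a, "x"), s.read(s.b, "x"))
    results[mode] = (accepted, during, after)
    print(mode, "accepted:", accepted, "during:", during, "after heal:", after)
```

A real CP system might also refuse reads on the minority side of a partition, and real AP systems use richer reconciliation (for example vector clocks or quorums); the toy model only shows the basic choice.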
CAP theorem
Conclusion...
