Manu Cohen-Yashar
The Cloud, Big Data and
NoSQL
Agenda
Data boom
Problems with RDBMS
No SQL
Big Data
What’s next
NO SQL Databases, Big Data and the cloud
Understand NO SQL
Types of databases
Primary usage
Data model
Pros and Cons
Lots of Data
Data doubles every 18 months
Pictures
Websites
Emails
Sensors
Geo Information
Financial Information
Science
Art
. . . (Infinite list)
No Limits
With the cloud it is now possible to provision a
cluster of any size and run any computation at
any scale.
Whoever makes sense of all the available
data will rule the world.
The conclusion:
Use the cloud to analyze data at large scale.
Let's Talk About Data
When we think of data we think of …
Data has many forms
Yet data comes in many forms and shapes
Graphs
Documents
Time series
Blobs
Geo
Sensors
Unstructured
Structured
Web
Problems with RDBMS
Does not scale very well
Sharding
Replication
Models data according to the relational model
Is this the best model for all data types?
Complex and Expensive
Requires a DBA
Expensive to buy
Oracle
SQL
Non-Relational
Not all types of data fit well into the relational
world.
Not all data use cases fit well into the ACID
model
The relational model does not scale very well
Difficult to distribute
Difficult to replicate
The CAP Theorem
(Diagram labels: RDBMS, Replicated NoSQL, Sharded NoSQL)
During a network partition, a distributed system must choose
either Consistency or Availability.
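The trade-off above can be made concrete with a minimal sketch, assuming a toy value copied on two replicas, a flag that simulates the partition, and a writer whose side keeps accepting writes. `ReplicatedRegister`, `Unavailable` and the "CP"/"AP" modes are hypothetical names for illustration, not any database's API. During the partition a CP register refuses the read it cannot guarantee, while an AP register answers with whatever its replica holds.

```python
class Unavailable(Exception):
    """Raised when a CP-style system refuses a request during a partition."""


class ReplicatedRegister:
    """Toy value copied on two replicas, 'left' and 'right' (hypothetical)."""

    def __init__(self, mode: str) -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.left = "v0"          # replica on the writer's side
        self.right = "v0"         # replica on the far side of a partition
        self.partitioned = False  # simulates a network partition

    def write(self, value: str) -> None:
        # The write lands on the left replica; it reaches the right
        # replica only while the network is healthy.
        self.left = value
        if not self.partitioned:
            self.right = value

    def read_from_right(self) -> str:
        # A reader on the far side cannot know about writes it missed.
        if self.partitioned and self.mode == "CP":
            # Consistency first: refuse instead of risking a stale answer.
            raise Unavailable("cannot confirm the latest write during a partition")
        # Availability first: always answer, possibly with stale data.
        return self.right


if __name__ == "__main__":
    for mode in ("AP", "CP"):
        register = ReplicatedRegister(mode)
        register.partitioned = True
        register.write("v1")
        try:
            print(mode, "read:", register.read_from_right())
        except Unavailable as err:
            print(mode, "error:", err)
```

Run as a script, the AP register returns the stale `v0` and the CP register raises `Unavailable`; with no partition, both return the latest write.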
NO SQL
Large family of databases
No Schema
No relations enforced
Designed for high scale and distribution
Types of NO SQL DB
Key Value
Wide Columns
Documents
Graph
Motivation for NO SQL
Large Scale and Distribution
Simplicity
Low cost
Good fit with the data model
Volume, Velocity and Variety
What Is "No Schema"?
Some data is structured and some is not.
NoSQL databases do not ENFORCE a
schema the way RDBMS systems do.
You can still exploit the structure the data has by creating
indexes and writing smart queries.
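A minimal sketch of "no enforced schema, but indexes still help", assuming a collection is simply a list of Python dicts. `products` and `build_index` are hypothetical names for illustration: records carry different fields, and a secondary index on `color` covers only the records that actually have that field.

```python
from collections import defaultdict
from typing import Any, Dict, List

# A "collection" is just a list of records; nothing forces them to share fields.
products: List[Dict[str, Any]] = [
    {"id": 1, "name": "Laptop", "ram_gb": 16},
    {"id": 2, "name": "T-shirt", "size": "M", "color": "blue"},
    {"id": 3, "name": "Phone", "ram_gb": 8, "color": "black"},
]


def build_index(records: List[Dict[str, Any]], field: str) -> Dict[Any, List[int]]:
    """Secondary index on one field: value -> ids of the records that have it.

    Records without the field are skipped, so the index exploits structure
    that only part of the data has.
    """
    index: Dict[Any, List[int]] = defaultdict(list)
    for record in records:
        if field in record:
            index[record[field]].append(record["id"])
    return dict(index)


color_index = build_index(products, "color")
print(color_index)                  # {'blue': [2], 'black': [3]}
print(color_index.get("blue", []))  # [2]
```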
Types of NO SQL Databases
Key values
Wide column
Document
Graph
Key values
Data is organized as key-value pairs
Query by key and values
Simple indexes (by partition key)
Examples
Azure Table Storage
Amazon DynamoDB
Key1 Key2 Value1 Value2 Value3 Value4 Value5
Israel 1234 1 2 3
France 2345 4 5 8
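The example rows above can be sketched as a toy key-value table, assuming the two keys form a composite key (similar in spirit to the partition key and row key of Azure Table Storage). `KeyValueTable` and its methods are hypothetical, not a real SDK. Lookups by key and by partition are direct dictionary accesses, while filtering on values requires scanning every row.

```python
from typing import Callable, Dict, List, Tuple

Values = Tuple[int, ...]


class KeyValueTable:
    """Toy key-value table addressed by (partition_key, row_key). Hypothetical."""

    def __init__(self) -> None:
        # partition key -> {row key -> values}
        self._partitions: Dict[str, Dict[str, Values]] = {}

    def put(self, partition_key: str, row_key: str, values: Values) -> None:
        self._partitions.setdefault(partition_key, {})[row_key] = values

    def get(self, partition_key: str, row_key: str) -> Values:
        # Point lookup by the full key: the cheap, indexed access path.
        return self._partitions[partition_key][row_key]

    def query_partition(self, partition_key: str) -> List[Tuple[str, Values]]:
        # Every row sharing a partition key, ordered by row key.
        return sorted(self._partitions.get(partition_key, {}).items())

    def scan(self, predicate: Callable[[Values], bool]) -> List[Tuple[str, str, Values]]:
        # Filtering on values has no index behind it: every row is examined.
        return [
            (pk, rk, values)
            for pk, rows in self._partitions.items()
            for rk, values in rows.items()
            if predicate(values)
        ]


table = KeyValueTable()
table.put("Israel", "1234", (1, 2, 3))
table.put("France", "2345", (4, 5, 8))
print(table.get("France", "2345"))   # (4, 5, 8)
print(table.scan(lambda v: 8 in v))  # [('France', '2345', (4, 5, 8))]
```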
Demo
DynamoDB and Azure Tables
Wide column / Column Families
Data is organized as groups of key-value pairs
Data is stored by column
A column family determines how the
data is laid out on disk
Query by key or key range only
No indexes (in some databases)
Examples
Google Bigtable
Cassandra
HBase
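A minimal sketch of the wide-column access pattern, assuming rows are kept sorted by row key and each row groups its columns into column families. `WideColumnTable` is a hypothetical class, not the API of Bigtable, Cassandra or HBase. It supports exactly the two queries listed above: lookup by key and scan over a key range.

```python
import bisect
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, Dict[str, str]]  # column family -> {column name -> value}


class WideColumnTable:
    """Toy wide-column table (hypothetical): rows sorted by key, grouped by family."""

    def __init__(self, families: Iterable[str]) -> None:
        self.families = set(families)
        self._keys: List[str] = []  # row keys, kept sorted for range scans
        self._rows: Dict[str, Row] = {}

    def put(self, row_key: str, family: str, column: str, value: str) -> None:
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        # Rows are sparse: each row stores only the columns it actually has.
        self._rows[row_key].setdefault(family, {})[column] = value

    def get(self, row_key: str, family: str) -> Dict[str, str]:
        """Lookup by key: one row, one column family."""
        return self._rows.get(row_key, {}).get(family, {})

    def scan(self, start: str, stop: str) -> List[Tuple[str, Row]]:
        """Lookup by key range: rows with start <= key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(key, self._rows[key]) for key in self._keys[lo:hi]]


table = WideColumnTable(["profile", "activity"])
table.put("user:002", "profile", "name", "Bob")
table.put("user:001", "profile", "name", "Alice")
table.put("user:001", "activity", "last_login", "2024-01-05")
table.put("user:010", "profile", "name", "Carol")

print(table.get("user:001", "profile"))                        # {'name': 'Alice'}
print([key for key, _ in table.scan("user:001", "user:003")])  # ['user:001', 'user:002']
```

Keeping row keys sorted is what makes range scans cheap here; a query on a column value would need a full scan, which matches the "no indexes" note above.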
Example – Cassandra Data Model
Column: a key-value pair
Super Column: a collection of columns
Column Family: a dictionary of columns
Super Column Family: a dictionary of column families
Demo
Cassandra
Document Database
Data is organized as key-document pairs
Query by key and by document content
Uses indexes
Examples
MongoDB
RavenDB
CouchDB
Couchbase
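A minimal sketch of querying by key and by document content, assuming documents are nested Python dicts and nested fields are addressed with a dotted path such as `address.city`. `DocumentStore`, `get_path` and `create_index` are hypothetical names, not the API of MongoDB, RavenDB or Couchbase. A query scans every document until an index exists on the queried path, after which the index answers it.

```python
from typing import Any, Dict, List, Set

Document = Dict[str, Any]
_MISSING = object()  # sentinel: the path does not exist in a document


def get_path(doc: Document, path: str) -> Any:
    """Follow a dotted path such as 'address.city' into nested dicts."""
    current: Any = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current


class DocumentStore:
    """Toy document store (hypothetical API, not a real driver)."""

    def __init__(self) -> None:
        self._docs: Dict[str, Document] = {}
        self._indexes: Dict[str, Dict[Any, Set[str]]] = {}  # path -> value -> keys

    def insert(self, key: str, doc: Document) -> None:
        if key in self._docs:
            # Replacing a document: drop its old index entries first.
            for index in self._indexes.values():
                for keys in index.values():
                    keys.discard(key)
        self._docs[key] = doc
        for path, index in self._indexes.items():
            self._index_one(index, path, key, doc)

    def create_index(self, path: str) -> None:
        index: Dict[Any, Set[str]] = {}
        for key, doc in self._docs.items():
            self._index_one(index, path, key, doc)
        self._indexes[path] = index

    @staticmethod
    def _index_one(index: Dict[Any, Set[str]], path: str, key: str, doc: Document) -> None:
        value = get_path(doc, path)
        if value is not _MISSING:  # indexed values must be hashable
            index.setdefault(value, set()).add(key)

    def get(self, key: str) -> Document:
        return self._docs[key]

    def find(self, path: str, value: Any) -> List[str]:
        """Query by document content: use an index if one exists, else scan."""
        if path in self._indexes:
            return sorted(self._indexes[path].get(value, set()))
        return sorted(k for k, d in self._docs.items() if get_path(d, path) == value)


store = DocumentStore()
store.insert("order:1", {"customer": "Dana", "address": {"city": "Tel Aviv"}, "total": 120})
store.insert("order:2", {"customer": "Yossi", "address": {"city": "Haifa"}})
store.insert("order:3", {"customer": "Noa", "address": {"city": "Tel Aviv"}, "gift": True})

print(store.find("address.city", "Tel Aviv"))  # ['order:1', 'order:3'] (full scan)
store.create_index("address.city")
print(store.find("address.city", "Tel Aviv"))  # same result, served by the index
```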
Demo
Graph databases
Data is organized as elements (nodes) and the relations (edges) between them.
Query by relations
Supports complex mathematical graph
computations
Examples
Neo4j
Stardog (used for the semantic web)
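A minimal sketch of "query by relations", assuming an in-memory adjacency list with named, directed relations. `Graph`, `relate`, `neighbors` and `shortest_path` are hypothetical names, not the Neo4j or Stardog API. The shortest path is found with a plain breadth-first search that follows only one relation type.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple


class Graph:
    """Toy graph store (hypothetical): nodes joined by named, directed relations."""

    def __init__(self) -> None:
        self._edges: Dict[str, List[Tuple[str, str]]] = {}  # node -> [(relation, target)]

    def relate(self, source: str, relation: str, target: str) -> None:
        self._edges.setdefault(source, []).append((relation, target))
        self._edges.setdefault(target, [])

    def neighbors(self, node: str, relation: str) -> List[str]:
        """Query by relation: nodes reachable from `node` via one `relation` edge."""
        return [target for rel, target in self._edges.get(node, []) if rel == relation]

    def shortest_path(self, start: str, goal: str, relation: str) -> Optional[List[str]]:
        """Breadth-first search following only `relation` edges; None if unreachable."""
        previous: Dict[str, Optional[str]] = {start: None}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if node == goal:
                path: List[str] = []
                step: Optional[str] = node
                while step is not None:  # walk back to the start
                    path.append(step)
                    step = previous[step]
                return path[::-1]
            for nxt in self.neighbors(node, relation):
                if nxt not in previous:
                    previous[nxt] = node
                    queue.append(nxt)
        return None


g = Graph()
for a, b in [("alice", "bob"), ("bob", "carol"), ("carol", "dave"),
             ("alice", "eve"), ("eve", "dave")]:
    g.relate(a, "knows", b)
g.relate("alice", "works_at", "acme")

print(g.neighbors("alice", "knows"))              # ['bob', 'eve']
print(g.shortest_path("alice", "dave", "knows"))  # ['alice', 'eve', 'dave']
```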
RDF and OWL
Triple
Subject - Predicate - Object
Triples define facts
RDF (Resource Description Framework)
Adds some extra structure to triples.
Example: "rdf:type" is used to state that a thing is of a certain type.
Schema:
Defines classes that represent the concepts of subjects,
objects, predicates, etc.
Enables statements about classes of things and types of
relationships.
OWL
Adds semantics to the schema.
Expressed in triples.
Example: if "A isMarriedTo B", this implies "B isMarriedTo A".
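The isMarriedTo example can be sketched as a hand-rolled triple set plus one inference rule, assuming the property is declared with the OWL term `owl:SymmetricProperty`. The `ex:` prefix and the `match` and `infer_symmetric` helpers are hypothetical; this is not an RDF library, and real OWL reasoners implement far more rules than this one.

```python
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

RDF_TYPE = "rdf:type"
OWL_SYMMETRIC = "owl:SymmetricProperty"


def match(triples: Set[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> Set[Triple]:
    """Pattern query over triples; None acts as a wildcard."""
    return {
        (ts, tp, to) for ts, tp, to in triples
        if (s is None or ts == s) and (p is None or tp == p) and (o is None or to == o)
    }


def infer_symmetric(triples: Set[Triple]) -> Set[Triple]:
    """One OWL-style rule: if P is a symmetric property, (A P B) implies (B P A)."""
    symmetric = {subj for subj, _, _ in match(triples, p=RDF_TYPE, o=OWL_SYMMETRIC)}
    inferred = set(triples)
    for subj, pred, obj in triples:
        if pred in symmetric:
            inferred.add((obj, pred, subj))
    return inferred


facts: Set[Triple] = {
    ("ex:isMarriedTo", RDF_TYPE, OWL_SYMMETRIC),  # schema-level statement
    ("ex:Alice", RDF_TYPE, "ex:Person"),
    ("ex:Bob", RDF_TYPE, "ex:Person"),
    ("ex:Alice", "ex:isMarriedTo", "ex:Bob"),     # the only marriage fact stated
}

closure = infer_symmetric(facts)
print(("ex:Bob", "ex:isMarriedTo", "ex:Alice") in closure)  # True
print(sorted(s for s, _, _ in match(closure, p=RDF_TYPE, o="ex:Person")))  # ['ex:Alice', 'ex:Bob']
```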
Demo
There is no single NoSQL solution for all
use cases
Important
There are more than 150 possible offerings…
Replication and Sharding
NoSQL databases can span a large
cluster
Replication
Copy the data to multiple servers
Usually each data element is stored 3 times
One master, two slaves
Result: High Availability
Sharding
Split the data between servers
Horizontal partitioning of the data
Result: Horizontal scale
Replication and Sharding can be done together
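Replication and sharding combined can be sketched as follows, assuming simple hash-modulo sharding and a replication factor of 3 where the key's shard node acts as the master and the next two nodes in a ring hold the slave copies. `shard_for` and `placement` are hypothetical helpers. Production systems typically use consistent hashing or range partitioning instead, because plain modulo hashing moves most keys whenever the node count changes.

```python
import hashlib
from collections import Counter
from typing import List


def shard_for(key: str, num_nodes: int) -> int:
    """Hash-based sharding: a stable hash of the key picks the primary node."""
    # hashlib is used instead of hash(), which is randomized per process for str.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def placement(key: str, nodes: List[str], replication_factor: int = 3) -> List[str]:
    """Master first, then replicas on the following nodes around the ring."""
    if replication_factor > len(nodes):
        raise ValueError("not enough nodes for the requested replication factor")
    first = shard_for(key, len(nodes))
    return [nodes[(first + i) % len(nodes)] for i in range(replication_factor)]


nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
print(placement("user:42", nodes))  # three distinct nodes: master, then two replicas

# Sharding spreads keys across the cluster; replication puts each key on 3 nodes.
primaries = Counter(placement(f"user:{i}", nodes)[0] for i in range(1000))
print(primaries)
```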
The Cloud and NO SQL
All major cloud providers offer NoSQL solutions
Azure Tables
Google Bigtable
Amazon DynamoDB
NoSQL databases are deployed on a cluster
There are a large number of cloud hosting offerings for
NoSQL clusters
MongoHQ (MongoDB)
Cassandra on Google Compute Engine
Many more
Example – Mongo in Azure
Check your schema
Be open to using NoSQL data stores
Identify your use case and find the right
database for it
Create a simple POC
Questions


Editor's Notes

  • #12: Consistency: A read sees all previously completed writes.
    Availability: Reads and writes always succeed.
    Partition tolerance: Guaranteed properties are maintained even when network failures prevent some machines from communicating with others.
    https://foundationdb.com/white-papers/the-cap-theorem/
    The basic idea is that if a client writes to one side of a partition, any reads that go to the other side of that partition can't possibly know about the most recent write. Now you're faced with a choice: do you respond to the reads with potentially stale information, or do you wait (potentially forever) to hear from the other side of the partition and compromise availability?