NOSQL
Agenda
 Introduction to NOSQL
 Objective
 Examples of NOSQL databases
 NOSQL vs SQL
 Conclusion
Basic Concepts

• Database: an organized collection of data.
• Database Management System (DBMS): a software package that
  controls the creation, maintenance, and use of a database.
    • A DBMS is driven through a structured language.
    • Ex. Oracle, IBM DB2, MS Access, MySQL, FoxPro, etc.
• Relational DBMS: a relational database is a
  collection of data items organized as a set of formally
  described tables from which data can be accessed easily.
  A relational database is created using the relational
  model. The software that manages a relational database is
  called a relational database management
  system (RDBMS).
SQL

• Structured Query Language
• Special-purpose programming language designed for
  managing data in an RDBMS.
• Originally based on relational algebra and tuple relational
  calculus.
• SQL's scope includes data insert, update, and delete; schema
  creation and modification; and data access control.
• It is statically and strongly typed.
• The most widely used database language.
• The query is the most important operation in SQL.
• Ex. SELECT *
      FROM Book
      WHERE price > 100.00
      ORDER BY title;
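The query on this slide can be tried without installing a database server. The sketch below runs it against an in-memory SQLite database from Python; the Book columns and the three sample rows are assumptions made up for illustration, since the slide only shows the query itself.

```python
import sqlite3

# In-memory database; the Book schema and rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Book (title TEXT, price REAL)")
conn.executemany(
    "INSERT INTO Book (title, price) VALUES (?, ?)",
    [
        ("SQL Basics", 80.00),
        ("Data at Scale", 150.00),
        ("Algebra of Relations", 120.00),
    ],
)

# The query from the slide: books priced above 100.00, sorted by title.
rows = conn.execute(
    "SELECT * FROM Book WHERE price > 100.00 ORDER BY title"
).fetchall()

for title, price in rows:
    print(title, price)
```

The declarative style is the point: the query states which rows are wanted and in what order, and the engine decides how to find them.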
NOSQL

• Stands for Not Only SQL
• Class of non-relational data storage systems
• Usually do not require a fixed table schema, nor do
  they use the concept of joins
• Most NOSQL offerings relax one or more of the ACID
  properties:
    • Atomicity, Consistency, Isolation, Durability (ACID)
• “NOSQL” = “Not Only SQL” =
       Not Only using traditional relational DBMS
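To make the "A" in ACID concrete, here is a minimal sketch of an atomic transfer using Python's built-in sqlite3 module. The account table, the names, and the `transfer` helper are hypothetical; the point is only that a failed transaction leaves no partial update behind, which is one of the guarantees a NoSQL store may relax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between accounts; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit was rolled back too.
        return False


ok = transfer(conn, "alice", "bob", 500)  # would overdraw alice
balances = dict(conn.execute("SELECT name, balance FROM account"))
```

After the failed call both balances are unchanged, even though the credit to bob had already been executed inside the transaction.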
NOSQL

• Alternative to traditional relational DBMS
    • Flexible schema
    • Quicker/cheaper to set up
    • Massive scalability
    • Relaxed consistency → higher performance &
      availability

    * No declarative query language → more programming
    * Relaxed consistency → fewer guarantees
Why NOSQL?

• Not every problem can be solved by a traditional
  relational database system alone.
• Handles huge databases.
• Redundancy: data stays reasonably safe on commodity
  hardware.
• Very flexible queries using map/reduce.
• Rapid development (no fixed schema).
• Very fast for common use cases.
Why NOSQL? contd..

• Inspired by distributed data storage problems
• Scales easily by adding servers
• Not suited to all problem types, but very well suited to
  certain large problem types
• High-write situations (e.g. activity tracking or timeline
  rendering for millions of users)
• A lot of relational usage is really very simple (e.g.
  fetch by PK with update)
Architecture
How does it work?

• Clients know:
    • How to send items to servers (consistent hashing)
    • What to do when a server fails
    • How to fetch keys from servers
    • How to “weight” servers by capacity
• Servers know how to:
    • Store items they receive
    • Expire them from the cache
• No inter-server comms: each server is unaware of the others
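The client-side logic described on this slide can be sketched as a consistent-hash ring. This is an illustrative sketch, not the code of any particular client library: the `HashRing` class, the use of MD5, the 100 virtual points per unit of weight, and the server names are all assumptions.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a point on the ring (MD5 is used only to spread keys)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Client-side consistent hashing; the servers never talk to each other."""

    def __init__(self, replicas: int = 100):
        self.replicas = replicas  # virtual points per unit of weight
        self._points = []         # sorted hash points on the ring
        self._owner = {}          # hash point -> server name

    def add_server(self, name: str, weight: int = 1) -> None:
        # A heavier server gets more points, so it receives more keys.
        for i in range(self.replicas * weight):
            point = _hash(f"{name}#{i}")
            self._owner[point] = name
            bisect.insort(self._points, point)

    def remove_server(self, name: str) -> None:
        # Called by the client when it decides a server has failed.
        self._points = [p for p in self._points if self._owner[p] != name]
        self._owner = {p: s for p, s in self._owner.items() if s != name}

    def server_for(self, key: str) -> str:
        if not self._points:
            raise LookupError("no servers in the ring")
        # First point clockwise from the key's hash, wrapping around the ring.
        idx = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


ring = HashRing()
for server in ("cache-a", "cache-b", "cache-c"):
    ring.add_server(server)
print(ring.server_for("user:42"))
```

When a server is removed, only the keys that lived on it move to other servers; every other key keeps its owner, which is why the servers themselves can stay unaware of one another.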
Performance

• An RDBMS uses buffers to ensure ACID properties
• NoSQL does not guarantee ACID and can therefore be
  much faster
• We don’t need ACID everywhere!
• Ex. Data processing (every minute) is 4x faster with
  MongoDB, despite being a lot more detailed (thanks to
  much simpler development)
Why is NOSQL faster than SQL? - Scaling

• Simple web application with not much traffic
    • Application server and database server all on one machine
Scaling contd..

• More traffic comes in
    • Application server
    • Database server
• Even more traffic comes in
    • Load balancer
    • Application server x2
    • Database server
Scaling contd..

• Even more traffic comes in
    • Load balancer x N: easy
    • Application server x N: easy
    • Database server x N: hard for SQL databases
SQL Slowdown

• Not linear!
Scaling contd..

• NoSQL scaling
    • Need more storage? Add more servers!
    • Need higher performance? Add more servers!
    • Need better reliability? Add more servers!
Scaling Summary

• You can scale SQL databases (Oracle, MySQL, SQL
  Server…)
    • This will cost you dearly
    • If you don’t have a lot of money, you will reach limits quickly
• You can scale NoSQL databases
    • Very easy horizontal scaling
    • Lots of open-source solutions
    • Scaling is one of the basic design goals, so it is well
      handled
    • Scaling is also the source of the trade-offs that push you toward
      map/reduce
Characteristics

 Almost infinite horizontal scaling
 Very fast
 Performance doesn’t deteriorate with growth (much)
 No fixed table schemas
 No join operations
 Ad-hoc queries difficult or impossible
 Structured storage
 Almost everything happens in RAM
NOSQL Types


 Wide Column Store / Column Families
 Document Store
 Key Value / Tuple Store
 Graph Databases
 Object Databases
 XML Databases
 Multivalue Databases
Main types -

 Key-Value Stores
 Map Reduce Framework
 Document Databases
 Graph Databases
Key Value Stores

 Lineage: Amazon's Dynamo paper and Distributed
  HashTables.
 Data model: A global collection of key-value pairs
 Example systems
   Google BigTable , Amazon Dynamo, Cassandra,
     Voldemort , HBase , …
 Implementation: efficiency, scalability, fault-tolerance
   Records distributed to nodes based on key
   Replication

   Single-record transactions, “eventual consistency”
Document Databases

 Lineage: Inspired by Lotus Notes.
 Data model: Collections of documents, which
  contain key-value collections (called "documents").
 Example: CouchDB, MongoDB, Riak
Graph Database

• Lineage: Draws from Euler and graph theory.
• Data model: Nodes & relationships, both of which can
  hold key-value pairs
• Example: AllegroGraph, InfoGrid, Neo4j
Map Reduce Framework

• Google’s framework for processing highly
  distributable problems across huge datasets
  using a large number of computers
• What “a large number of computers” means here:
    • Cluster: all machines have the same hardware
    • Grid: anything that is not a cluster
• The process is split into two phases
    • Map
        • Take the input, partition it, and delegate the parts to other machines
        • Other machines can repeat the process, leading to a tree structure
        • Each machine returns its results to the machine that gave it the task
Map Reduce Framework contd..

    • Reduce
        • Collect results from the machines you gave tasks to
        • Combine the results and return them to the requester
• Slower than sequential data processing, but massively parallel
• Can sort a petabyte of data in a few hours
• Stages: Input, Map, Shuffle, Reduce, Output
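The Input, Map, Shuffle, Reduce, Output stages can be sketched on a single machine with the classic word-count job. This is a toy illustration only: the function names are made up, and a real framework would run the map and reduce calls on many machines in parallel.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    """Map: turn one input chunk into (key, value) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle: group all values that share a key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(chunks):
    # Each chunk stands in for the part of the input handed to one machine.
    mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["to be or not", "to be"])
print(counts)
```

Because each `map_phase` call sees only its own chunk and each `reduce_phase` call sees only one key, both phases can be spread across machines without coordination; only the shuffle moves data between them.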
Popular NoSQL

• Hadoop / HBase
• Cassandra
• Amazon SimpleDB
• MongoDB
• CouchDB
• Redis
• MemcacheDB
• Voldemort
• Hypertable
• Cloudata
• IBM Lotus/Domino
Real World Use

 Cassandra
   Facebook (original developer, used it till late 2010)
   Twitter
   Digg
   Reddit
   Rackspace
   Cisco

 BigTable
   Google (open-source version is HBase)

 MongoDB
   Foursquare
   Craigslist
   Bit.ly
   SourceForge
   GitHub
MONGODB

• Document store
• Basic support for dynamic (ad hoc) queries
• Query by example (nice!)
• Conditional operators
    • <, <=, >, >= (written $lt, $lte, $gt, $gte)
    • $all, $exists, $mod, $ne, $in, $nin, $nor, $or, $and,
      $size, $type
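Query by example means the query is itself a document: a field either has to equal a given value or satisfy an operator such as $gt. The sketch below imitates that idea over plain Python dictionaries. It is a toy matcher written for illustration, not MongoDB's implementation or driver API; it covers only a few of the operators listed above, and it assumes any dictionary-valued condition is a set of operators.

```python
# Toy subset of the comparison operators; not MongoDB's own implementation.
OPERATORS = {
    "$gt": lambda value, arg: value is not None and value > arg,
    "$gte": lambda value, arg: value is not None and value >= arg,
    "$lt": lambda value, arg: value is not None and value < arg,
    "$lte": lambda value, arg: value is not None and value <= arg,
    "$ne": lambda value, arg: value != arg,
    "$in": lambda value, arg: value in arg,
    "$nin": lambda value, arg: value not in arg,
}


def matches(document, query):
    """Return True when the document satisfies every condition in the query."""
    for field, condition in query.items():
        if isinstance(condition, dict):
            for op, arg in condition.items():
                if op == "$exists":
                    if (field in document) != arg:
                        return False
                elif not OPERATORS[op](document.get(field), arg):
                    return False
        elif document.get(field) != condition:
            return False
    return True


def find(collection, query):
    return [doc for doc in collection if matches(doc, query)]


# Made-up sample documents; note that they do not share a fixed schema.
books = [
    {"title": "Dune", "price": 120.0, "tags": ["sci-fi"]},
    {"title": "Emma", "price": 90.0},
    {"title": "Ulysses", "price": 150.0, "lang": "en"},
]
expensive = find(books, {"price": {"$gt": 100.0}})
print([doc["title"] for doc in expensive])
```

The filter `{"price": {"$gt": 100.0}}` plays the role of `WHERE price > 100.00` in the earlier SQL example, expressed as data rather than as a query string.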
MONGODB contd..

• Data is stored as BSON (binary JSON)
    • Makes it very well suited for languages with native JSON support
• Map/reduce written in JavaScript
    • Slow! JavaScript runs on a single thread of execution
• Master/slave replication (auto failover with replica sets)
• Sharding built in
• Uses memory-mapped files for data storage
• Performance over features
• On 32-bit systems, limited to ~2.5 GB
• An empty database takes up 192 MB
• GridFS to store big data + metadata (not actually an FS)
CASSANDRA

• Written in: Java
• Protocol: Custom, binary (Thrift)
• Tunable trade-offs for distribution and replication
  (N, R, W)
• Querying by column, range of keys
• BigTable-like features: columns, column families
• Writes are much faster than reads (!)
    • Constant write time regardless of database size
• Map/reduce possible with Apache Hadoop
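The (N, R, W) triple is usually read in the Dynamo style: N replicas hold each item, a read waits for R of them, and a write waits for W acknowledgements. That reading is an assumption here, since the slide only lists the letters. Under it, a read is guaranteed to see the latest acknowledged write when the read and write sets must overlap, that is, when R + W > N. A minimal sketch with a hypothetical helper:

```python
def quorums_overlap(n: int, r: int, w: int) -> bool:
    """True when every read quorum shares at least one replica with every write quorum."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("R and W must be between 1 and N")
    return r + w > n


# With N = 3 replicas:
settings = {
    "fast reads and writes (R=1, W=1)": quorums_overlap(3, 1, 1),
    "quorum reads and writes (R=2, W=2)": quorums_overlap(3, 2, 2),
    "write to all, read from one (R=1, W=3)": quorums_overlap(3, 1, 3),
}
for name, overlaps in settings.items():
    print(name, "->", "overlapping" if overlaps else "eventually consistent")
```

Lower R and W give faster operations with weaker guarantees; higher values trade latency and availability for consistency.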
Some more info about Cassandra at Facebook

• Cassandra is an open-source DBMS from the Apache
  Software Foundation.
• Cassandra provides a structured key-value
  store with tunable consistency
• Cassandra is a distributed storage system for
  managing structured data that is designed to scale to
  a very large size across many commodity
  servers, with no single point of failure
• It is a NoSQL solution that was initially developed
  by Facebook and powered their Inbox Search feature
  until late 2010
HBASE

• Written in: Java
• Main point: Billions of rows x millions of columns
• Modeled after BigTable
• Map/reduce with Hadoop
• Query predicate push-down via server-side scan and get filters
• Optimizations for real-time queries
• A high-performance Thrift gateway
• HTTP supports XML, Protobuf, and binary
• Cascading, Hive, and Pig source and sink modules
• No single point of failure
• While Hadoop streams data efficiently, it has overhead for
  starting map/reduce jobs. HBase is a column-oriented
  key/value store and allows for low-latency reads and writes.
• Random access performance is like MySQL
COUCHDB

 Written in: Erlang
 Main point: DB consistency, ease of use
 Bi-directional (!) replication, continuous or ad-hoc, with conflict
    detection, thus, master-master replication. (!)
   MVCC - write operations do not block reads
   Previous versions of documents are available
   Crash-only (reliable) design
   Needs compacting from time to time
   Views: embedded map/reduce
   Formatting views: lists & shows
   Server-side document validation possible
   Authentication possible
   Real-time updates via _changes (!)
   Attachment handling
   CouchApps (standalone JS apps)
HADOOP

• Apache project
• A framework that allows for the distributed processing of
  large data sets across clusters of computers
• Designed to scale up from single servers to thousands of
  machines
• Designed to detect and handle failures at the application
  layer, instead of relying on hardware for it
• Created by Doug Cutting, who named it after his son's toy
  elephant
• Hadoop-related Apache projects
    • Cassandra
    • HBase
    • Pig
• Hive was a Hadoop subproject, but is now a top-level Apache project
HADOOP contd..

 Scales to hundreds or thousands of computers, each with several
    processor cores
   Designed to efficiently distribute large amounts of work across a
    set of machines
   Hundreds of gigabytes of data constitute the low end of Hadoop-
    scale
   Built to process "web-scale" data on the order of hundreds of
    gigabytes to terabytes or petabytes
   Uses Java, but allows streaming so other languages can easily
    send and accept data items to/from Hadoop
HADOOP contd..

• Uses a distributed file system (HDFS)
    • Designed to hold very large amounts of data (terabytes or even
      petabytes)
    • Files are stored redundantly across multiple
      machines to ensure durability under failure and high
      availability to highly parallel applications
    • Data is organized into directories and files
    • Files are divided into blocks (64 MB by default in early
      releases; later releases default to 128 MB) and distributed
      across nodes
• Design of HDFS is based on the design of the Google
  File System
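The block arithmetic is easy to work through. The sketch below splits a file size into 64 MB blocks and spreads copies of each block over a list of nodes. The helper names are hypothetical, the replication factor of 3 is an assumption the slide does not state, and the round-robin placement is a deliberate simplification rather than HDFS's actual placement policy.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the default quoted on the slide


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE):
    """Return (offset, length) pairs covering a file of file_size bytes."""
    return [
        (offset, min(block_size, file_size - offset))
        for offset in range(0, file_size, block_size)
    ]


def place_replicas(num_blocks: int, nodes, replication: int = 3):
    """Assign each block to `replication` distinct nodes, round-robin (toy policy)."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + i) % len(nodes)] for i in range(replication)]
        for block in range(num_blocks)
    }


blocks = split_into_blocks(200 * 1024 * 1024)  # a 200 MB file
placement = place_replicas(len(blocks), ["node1", "node2", "node3", "node4"])
for block, holders in placement.items():
    print(block, blocks[block][1] // (1024 * 1024), "MB on", holders)
```

A 200 MB file becomes three full 64 MB blocks plus one 8 MB block, and losing any single node still leaves copies of every block on other nodes.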
HIVE

 A petabyte-scale data warehouse system for Hadoop
 Easy data summarization, ad-hoc queries
 Query the data using a SQL-like language called
  HiveQL
 Hive compiler generates map-reduce jobs for most
  queries
Conclusion

 NoSQL is a great problem solver if you need it
 Choose your NoSQL platform carefully as each is
  designed for specific purpose
 Get used to Map/Reduce
 It’s not a sin to use NoSQL alongside (yes)SQL
  database
THANK
YOU..!!
