MongoDB and NoSQL Databases
MongoDB Overview
•   Name derives from “humongous”
•   Document-oriented database, not relational
•   Schema-free
•   Manages hierarchical collections of BSON (“bee-son”) documents
•   Written in C++
•   Has an official C# driver, supported by 10gen
•   Scalable, with high performance (scales horizontally)
•   Designed to address today’s workloads
•   BASE rather than ACID compliant
•   Built-in replication
•   Part of the “NoSQL” class of DBMS
•   Website with a list of all features: http://www.mongodb.org/
What is NoSQL?
• A class of DBMS that differs from the relational model
• Do not expose the standard SQL interface
• May not require fixed table schemas, usually
  avoid join operations, and typically scale horizontally.
• Term coined by Carlo Strozzi in 1998 to name his lightweight,
  open-source relational database that did not expose the
  standard SQL interface.
• However, as Strozzi said, the current NoSQL
  movement is departing “from the relational model
  altogether; it should therefore have been called more
  appropriately 'NoREL', or something to that effect.”
• See http://nosql-database.org/ for examples.
Why are these interesting?
• New requirements are arising in environments where we have
  higher volumes of data with high operation rates, agile
  development and cloud computing. This reflects the growing
  interactivity of applications which are becoming more
  networked and social, driving more requests to the database
  where high-performance DBMS such as MongoDB become
  favorable.
• Not requiring a schema or migration scripts before you add
  data makes MongoDB fit well with agile development approaches.
  Each time you complete new features, the schema of your
  database often needs to change. In a large relational database,
  this can be a slow process.
ACID
• Relational databases make the ACID promise:
     –      Atomicity - a transaction is all or nothing
     –      Consistency - only valid data is written to the database
     –      Isolation - concurrent transactions behave as if they ran one after another
     –      Durability - once a write is committed, it stays written, even after a failure
• The problem is that ACID can give you too much: it trips you up when you are
  trying to scale a system across multiple nodes.
• Downtime is unacceptable, so your system needs to be reliable. Reliability
  requires multiple nodes to handle machine failures.
• To make scalable systems that can handle lots and lots of reads and writes
  you need many more nodes.
• Once you try to scale ACID across many machines you hit problems with
  network failures and delays. The algorithms that enforce it don't run at an
  acceptable speed in a distributed environment.

   http://highscalability.com/drop-acid-and-think-about-data
CAP
•   If you can't have all of the ACID guarantees, it turns out you can have two of the following
    three characteristics:
     –    Consistency - your data is correct all the time. What you write is what you read.
     –    Availability - you can read and write your data all the time
     –    Partition Tolerance - if one or more nodes fail or are cut off, the system still works, and it
          becomes consistent again when they come back on-line.
•   In distributed systems, network partitioning is inevitable and must be tolerated, so essentially
    CAP means that we cannot have both consistency and 100% availability.
    “If the network is broken, your database won’t work.”
•   However, we do get to pick the definition of “won’t work”. It can either mean down
    (unavailable) or inconsistent (stale data).

    http://www.julianbrowne.com/article/viewer/brewers-cap-theorem
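The choice between "down" and "stale" can be pictured with a toy model. This is a minimal sketch, not MongoDB code: the replica object, the partitioned flag and the mode argument are all hypothetical names made up for illustration.

```javascript
// Toy model: one replica that may have lost contact with the rest of the system.
// "mode" is a hypothetical switch standing in for the trade-off described above.
function readDuringPartition(replica, mode) {
  if (!replica.partitioned) {
    return { ok: true, value: replica.value, stale: false };
  }
  if (mode === "consistent") {
    // Prefer consistency: refuse to answer rather than risk returning stale data.
    return { ok: false, error: "unavailable" };
  }
  // Prefer availability: answer with the last value this replica saw.
  return { ok: true, value: replica.value, stale: true };
}

const replica = { value: "v1", partitioned: true };
console.log(readDuringPartition(replica, "consistent")); // the read is refused
console.log(readDuringPartition(replica, "available")); // the read succeeds but may be stale
```

Either answer is a legitimate reading of "won't work"; which one is acceptable depends on the application.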
BASE
•    The types of large systems based on CAP aren't ACID; they are BASE (ha ha)
       – Basically Available - system seems to work all the time
       – Soft State - it doesn't have to be consistent all the time
       – Eventually Consistent - becomes consistent at some later time
•    Many companies building big applications build them on CAP and BASE: Google, Yahoo,
     Facebook, Amazon, eBay, etc.

•    Amazon popularized the concept of “Eventual Consistency”. Their definition is:
       the storage system guarantees that if no new updates are made to the object, eventually all
       accesses will return the last updated value.
•    A few examples of eventually consistent systems:
       – Asynchronous master/slave replication on an RDBMS or MongoDB
       – DNS
       – memcached in front of MySQL, caching reads

For more depth and different configuration examples:
http://blog.mongodb.org/post/498145601/on-distributed-consistency-part-2-some-eventual
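Asynchronous replication, the first example in the list above, can be imitated in a few lines. This is a toy simulation under simplifying assumptions (one secondary, a manual sync step); the class and method names are invented here and are not MongoDB APIs.

```javascript
// Toy simulation of asynchronous primary/secondary replication.
class Primary {
  constructor() {
    this.data = {};
    this.log = []; // ordered list of writes waiting to be shipped to secondaries
  }
  write(key, value) {
    this.data[key] = value;
    this.log.push({ key, value });
  }
}

class Secondary {
  constructor() {
    this.data = {};
    this.applied = 0; // number of log entries already applied
  }
  // Pull and apply whatever the primary has logged since the last sync.
  syncFrom(primary) {
    while (this.applied < primary.log.length) {
      const { key, value } = primary.log[this.applied];
      this.data[key] = value;
      this.applied += 1;
    }
  }
  read(key) {
    return this.data[key];
  }
}

const primary = new Primary();
const secondary = new Secondary();
primary.write("x", 1);
secondary.syncFrom(primary);
primary.write("x", 2);
console.log(secondary.read("x")); // 1 (stale: the second write has not arrived yet)
secondary.syncFrom(primary);
console.log(secondary.read("x")); // 2 (no new updates, so the read has converged)
```

Between the second write and the second sync the secondary serves an old value; once updates stop and replication catches up, every read returns the last written value, which is the guarantee quoted above.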
BSON
• Stands for Binary JSON
• Is a binary encoded serialisation of JSON-like documents.
• Like JSON, BSON supports the embedding of documents and
  arrays within other documents and arrays. BSON also contains
  extensions that allow representation of data types that are
  not part of the JSON spec. For example, BSON has a Date type
  and a BinData type.
• The driver performs translation from the language’s “object”
  (ordered associative array) data representation to BSON, and
  back:
    C#:          new BsonDocument("x", 1)
    JavaScript:  {x: 1}
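To make "binary encoded" concrete, here is a hand-rolled encoder for a very small subset of BSON: flat documents whose values are 32-bit integers or strings. It is only a sketch of the wire layout for illustration; real drivers handle this for you, and the function name encodeBson is made up. It assumes Node.js, whose Buffer type is used for the bytes.

```javascript
// Encode a flat document as BSON bytes (int32 and string values only).
function encodeBson(doc) {
  const parts = [];
  for (const [name, value] of Object.entries(doc)) {
    const key = Buffer.from(name + "\0", "utf8"); // element name, NUL-terminated
    if (Number.isInteger(value) && value >= -(2 ** 31) && value < 2 ** 31) {
      const body = Buffer.alloc(4);
      body.writeInt32LE(value, 0);
      parts.push(Buffer.from([0x10]), key, body); // type 0x10 = 32-bit integer
    } else if (typeof value === "string") {
      const text = Buffer.from(value + "\0", "utf8");
      const len = Buffer.alloc(4);
      len.writeInt32LE(text.length, 0); // byte length includes the trailing NUL
      parts.push(Buffer.from([0x02]), key, len, text); // type 0x02 = UTF-8 string
    } else {
      throw new TypeError("unsupported value type in this sketch: " + name);
    }
  }
  const elements = Buffer.concat(parts);
  const header = Buffer.alloc(4);
  // Total document size counts the 4-byte header and the final terminator byte.
  header.writeInt32LE(4 + elements.length + 1, 0);
  return Buffer.concat([header, elements, Buffer.from([0x00])]);
}

console.log(encodeBson({ x: 1 }).toString("hex"));
// 0c0000001078000100000000
```

Reading the output left to right: a 4-byte length (12), the type byte 0x10, the name "x" with its terminator, the value 1 as a little-endian int32, and the closing zero byte.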
Querying in MongoDB
• Query expressions in MongoDB (and other things, such as index key
  patterns) are represented as JSON-like (BSON) objects.
  However, the actual verb (e.g. "find") is a method call in your regular
  programming language.
• Usually we think of the query object as the equivalent of a SQL "WHERE"
  clause:
   C#:          db["users"].Find(Query.EQ("x", 3)).SetSortOrder(SortBy.Ascending("y"));
                // select * from users where x=3 order by y asc;
   JavaScript:  db.users.find( {x : 3} ).sort( {y : 1} );

   More on ways of creating queries:
   http://www.mongodb.org/display/DOCS/CSharp+Driver+Tutorial#CSharpDriverTutorial-FindandFindAsmethods

• Note: In MongoDB, just like in an RDBMS, creating appropriate indexes for
  queries is quite important for performance.

• For a quick tutorial using the shell visit http://try.mongodb.org/
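The idea of a query being a plain object can be tried without a server. The sketch below is a toy in-memory stand-in for find(query).sort(spec), not how MongoDB evaluates queries; it supports only top-level equality and a single sort field, and the function names and sample data are hypothetical.

```javascript
// True if every field in the query equals the document's field.
// An array field matches when any of its elements equals the wanted value.
function matches(doc, query) {
  return Object.entries(query).every(([field, wanted]) => {
    const actual = doc[field];
    return Array.isArray(actual) ? actual.includes(wanted) : actual === wanted;
  });
}

// Filter by the query object, then sort by the first field in sortSpec
// (1 = ascending, -1 = descending). The input array is left untouched.
function findDocs(collection, query, sortSpec = {}) {
  const result = collection.filter((doc) => matches(doc, query));
  const entries = Object.entries(sortSpec);
  if (entries.length > 0) {
    const [field, direction] = entries[0];
    result.sort((a, b) => {
      if (a[field] === b[field]) return 0;
      return (a[field] < b[field] ? -1 : 1) * direction;
    });
  }
  return result;
}

const users = [
  { name: "ann", x: 3, y: 2, roles: ["Admin"] },
  { name: "bob", x: 1, y: 0, roles: ["User", "Engineer"] },
  { name: "cat", x: 3, y: 1, roles: ["Engineer"] },
];

// Same shape as db.users.find( {x : 3} ).sort( {y : 1} )
console.log(findDocs(users, { x: 3 }, { y: 1 }).map((u) => u.name)); // cat, then ann
// Equality against an array field matches any element
console.log(findDocs(users, { roles: "Engineer" }).map((u) => u.name)); // bob, then cat
```

Because the query is ordinary data, it can be built, stored and passed around like any other object, which is the point the slide makes about the "WHERE" clause.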
•   To insert a document in the collection create an object representing the document
    and call Insert. The object can be an instance of BsonDocument or of any class that
    can be successfully serialized as a BSON document. For example:
    [C# example not preserved in this transcript]
•   If you have a class called Book the code might look like:
    [C# example not preserved in this transcript]
•   You can insert more than one document at a time using the InsertBatch method. For
    example:
    [C# example not preserved in this transcript]
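As a rough stand-in for the calls described on this slide, the sketch below models a collection in memory. It is not a MongoDB driver: ToyCollection, its numeric _id counter and the sample book data are assumptions made for illustration, and only the method names echo Insert and InsertBatch.

```javascript
// Toy in-memory collection with single and batch inserts.
class ToyCollection {
  constructor() {
    this.docs = [];
    this.nextId = 1; // simplification: a counter instead of generated ObjectIds
  }
  // Store a copy of the document, assigning an _id unless the caller supplied one.
  insert(doc) {
    const stored = { _id: this.nextId++, ...doc };
    this.docs.push(stored);
    return stored._id;
  }
  // Insert several documents in order and return their ids.
  insertBatch(batch) {
    return batch.map((doc) => this.insert(doc));
  }
}

const books = new ToyCollection();
books.insert({ author: "Author A", title: "First Title" });
books.insertBatch([
  { author: "Author B", title: "Second Title" },
  { author: "Author C", title: "Third Title" },
]);
console.log(books.docs.length); // 3
```

The batch form is just a convenience over repeated inserts here; a real driver would typically also send the batch to the server in fewer round trips.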
A Many to Many Association
• In a relational DBMS use an intersection table and joins
• In MongoDB use either embedding or linking

  BsonDocument user = new BsonDocument {
       { "name", "John" },
       { "roles", new BsonArray { "Admin", "User", "Engineer" } }
  };
  users.Insert(user);

  // To get all Engineers
  users.Find(Query.EQ("roles", "Engineer"));
• Embedding is the nesting of objects and arrays inside
  a BSON document. Links are references between
  documents.
• There are no joins in MongoDB – distributed joins
  would be difficult on a 1,000 server cluster.
  Embedding is a bit like "prejoined" data. Operations
  within a document are easy for the server to handle;
  these operations can be fairly rich. Links in contrast
  must be processed client-side by the application; the
  application does this by issuing a follow-up query.
• Generally, for "contains" relationships between
  entities, embedding should be chosen. Use linking
  when embedding would result in duplication of
  data.
More detail on referencing:
http://www.mongodb.org/display/DOCS/Database+References
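The difference between the two approaches shows up in how many lookups the application performs. The following is a minimal sketch with plain arrays standing in for collections; the collection names, field names and the findOne helper are hypothetical.

```javascript
// Embedding: the order's line items live inside the order document.
const orders = [
  { _id: 1, customerId: 10, items: [{ sku: "A1", qty: 2 }, { sku: "B7", qty: 1 }] },
];
// Linking: the order stores only the customer's _id, not the customer itself.
const customers = [{ _id: 10, name: "John" }];

// Look up a single document by _id, or return null if none matches.
function findOne(collection, id) {
  return collection.find((doc) => doc._id === id) || null;
}

// One lookup returns the order together with its embedded items ("prejoined").
const order = findOne(orders, 1);
// Following the link takes a second lookup issued by the application.
const customer = findOne(customers, order.customerId);

console.log(order.items.length, customer.name); // 2 John
```

The items belong to exactly one order, so embedding them costs nothing; the customer is shared by many orders, so copying it into each order would duplicate data, which is where a link fits better.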
Replication through Replica Sets
• Replica sets are a form of asynchronous master/slave replication, adding
  automatic failover and automatic recovery of member nodes.
• A replica set consists of two or more nodes that are copies of each other
  (i.e., replicas).
• The replica set automatically elects a primary (master). No one member is
  intrinsically primary; that is, this is a shared-nothing design.
• Drivers (and mongos) can automatically detect when a replica set primary
  changes and will begin sending writes to the new primary. (Also works
  with sharding)
• Replica sets have several common uses (detail in next slide):
    –   Data Redundancy
    –   Automated Failover / High Availability
    –   Distributing read load
    –   Simplify maintenance (compared to "normal" master-slave)
    –   Disaster recovery
• http://www.mongodb.org/display/DOCS/Replica+Sets
Why Replica Sets
•   Data Redundancy
     –   Replica sets provide an automated method for storing multiple copies of your data.
     –   Supported drivers allow for the control of "write concerns". This allows for writes to be confirmed by multiple nodes
         before returning a success message to the client.
•   Automated Failover
     –   Replica sets will coordinate to have a single primary in a given set.
     –   Supported drivers will recognize the change of a primary within a replica set.
           •   In most cases, this means that the failure of a primary can be handled by the client without any configuration changes.
     –   A correctly configured replica set basically provides a “hot backup”. Recovering from backups is typically very time
         consuming and can result in data loss. Having an active replica set is generally much faster than working with backups.
•   Read Scaling
     –   By default, the primary node of a replica set is accessed for all reads and writes.
     –   Most drivers provide a slaveOkay method for identifying that a specific operation can be run on a secondary node.
         When using slaveOkay, a system can share the read load amongst several nodes.
•   Maintenance
     –   When performing tasks such as upgrades, backups and compaction, it is typically required to remove a node from
         service.
     –   Replica sets allow for these maintenance tasks to be performed while operating a production system. As long as the
         production system can withstand the removal of a single node, it’s possible to perform them as a “rolling”
         operation, one node at a time.
•   Disaster Recovery
     –   Replica sets allow for a “delayed secondary” node.
     –   This node can provide a window for recovering from disastrous events such as:
           •   bad deployments
           •   dropped tables and collections
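Why write concerns matter for failover can be seen in a toy model. This is a sketch under heavy simplification, not MongoDB's replication or election protocol: each node is just an array of applied writes, lagging nodes never catch up, and failover simply promotes the surviving node with the most writes. ToyReplicaSet and its methods are invented names.

```javascript
class ToyReplicaSet {
  constructor(size) {
    this.nodes = Array.from({ length: size }, () => []);
    this.primary = 0; // index of the current primary
  }
  // Acknowledge the write only after w nodes (primary included) have applied it.
  write(value, w = 1) {
    if (w < 1 || w > this.nodes.length) throw new RangeError("w out of range");
    this.nodes[this.primary].push(value);
    let acks = 1;
    for (let i = 0; i < this.nodes.length && acks < w; i++) {
      if (i !== this.primary && this.nodes[i] !== null) {
        this.nodes[i].push(value);
        acks += 1;
      }
    }
    return acks;
  }
  // Simulate losing the primary and promoting the most up-to-date survivor.
  failover() {
    this.nodes[this.primary] = null;
    let best = -1;
    this.nodes.forEach((log, i) => {
      if (log !== null && (best === -1 || log.length > this.nodes[best].length)) best = i;
    });
    this.primary = best;
    return this.nodes[best];
  }
}

const weak = new ToyReplicaSet(3);
weak.write("order-1", 1);
console.log(weak.failover().length); // 0: the write lived only on the failed primary

const safe = new ToyReplicaSet(3);
safe.write("order-1", 2);
console.log(safe.failover().length); // 1: a secondary had confirmed the write
```

Waiting for more confirmations makes each write slower but means a confirmed write survives the loss of the primary, which is the trade-off the "write concerns" bullet describes.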
Horizontal Scalability

• Rather than buying bigger servers, MongoDB scales by
  adding servers: improvements come from more machines,
  and so more processors and cores, rather than from
  packing more CPUs and RAM into a single server
  (vertical scaling).
• MongoDB supports applications with high transaction rates
  because, as more servers are added, operations are
  distributed across the larger cluster of nodes, which
  increases database capacity roughly in proportion. With
  this model, capacity can be added without running into
  the limits of a single machine.
• MongoDB achieves this through auto-sharding.
Sharding
• For applications that outgrow the resources of a single database server,
  MongoDB can convert to a sharded cluster, automatically managing
  failover and balancing of nodes, with few or no changes to the original
  application code.
• Each shard consists of one or more servers and stores data
  using mongod processes (mongod being the core MongoDB database
  process). In production, each shard consists of multiple
  replicated servers to ensure availability and automated failover.
  The servers (mongod processes) within a shard form a replica
  set.
• Sharding offers:
    –   Automatic balancing for changes in load and data distribution
    –   Easy addition of new machines
    –   Scaling out to one thousand nodes
    –   No single points of failure
    –   Automatic failover
• http://www.mongodb.org/display/DOCS/Sharding+Introduction
Large MongoDB Deployment example
1. One or more shards, each shard holds a portion of the total data (managed
    automatically). Reads and writes are automatically routed to the appropriate
    shard(s). Each shard is backed by a replica set – which just holds the data for that
    shard.
    A replica set is one or more servers, each holding copies of the same data. At any
    given time one is primary and the rest are secondaries. If the primary goes down
    one of the secondaries takes over automatically as primary. All writes and
    consistent reads go to the primary, and all eventually consistent reads are
    distributed amongst all the secondaries.
2. Multiple config servers, each one holds a copy of the metadata indicating which
    data lives on which shard.
3. One or more routers, each one acts as a server for one or more clients. Clients issue
    queries/updates to a router and the router routes them to the appropriate shard
    while consulting the config servers.
4. One or more clients, each one is (part of) the user's application and issues
    commands to a router via the mongo client library (driver) for its language.

mongod is the server program (data or config). mongos is the router program.
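The routing role described in items 1 to 3 can be sketched as a lookup table plus a dispatch function. This is a toy in the spirit of mongos, not its implementation: the chunk table stands in for the config servers' metadata, and the shard names, key ranges, shard key (username) and function names are all hypothetical.

```javascript
// Metadata: which range of shard-key values lives on which shard.
const chunks = [
  { min: "a", max: "h", shard: "shard0" }, // covers keys in [min, max)
  { min: "h", max: "q", shard: "shard1" },
  { min: "q", max: "{", shard: "shard2" }, // "{" sorts just after "z"
];

// Each shard is modelled as a plain array of documents.
const shards = { shard0: [], shard1: [], shard2: [] };

// Consult the metadata to find the shard responsible for a key.
function shardFor(key) {
  const chunk = chunks.find((c) => key >= c.min && key < c.max);
  if (!chunk) throw new RangeError("no chunk covers key: " + key);
  return chunk.shard;
}

function routedInsert(doc) {
  shards[shardFor(doc.username)].push(doc);
}

// A query on the shard key goes to one shard; any other query must ask them all.
function routedFind(query) {
  const targets = "username" in query ? [shardFor(query.username)] : Object.keys(shards);
  const docs = targets.flatMap((name) =>
    shards[name].filter((doc) => Object.entries(query).every(([k, v]) => doc[k] === v))
  );
  return { targets, docs };
}

["alice", "mallory", "zed"].forEach((username) => routedInsert({ username, active: true }));
console.log(routedFind({ username: "mallory" }).targets); // only shard1 is consulted
console.log(routedFind({ active: true }).targets.length); // 3: every shard is consulted
```

The client never needs to know where a document lives; it talks to the router, and only the router reads the metadata.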
Companies using MongoDB
• Just a few examples:
• Foursquare – moved over from PostgreSQL
• Craigslist – moved over from a large MySQL cluster. Schema
  changes were taking forever and it wasn’t really relational
  information. They wanted to be able to add new machines without
  downtime (which sharding provides) and route around dead
  machines without clients failing (which replica sets provide).
• SourceForge – moved over from MySQL. MongoDB is used for
  back-end storage on the SourceForge front pages, project pages,
  and download pages for all projects.
• The New York Times - using it in a form-building application for
  photo submissions. Mongo's dynamic schema gives producers the
  ability to define any combination of custom form fields.
• Full list at
  http://www.mongodb.org/display/DOCS/Production+Deployments
Further Resources
• Article: Going NoSQL with MongoDB
• C# Driver Tutorial
• GUI tool similar to Management Studio -
  http://www.mongovue.com/
• 10gen White Paper
• http://www.mongodb.com/
• Wikipedia pages for MongoDB, NoSQL etc.
• Google Groups mongodb-user and mongodb-csharp
