Technical Overview
Learnizo Global LLP
Agenda
• Welcome and Introductions
• The World We Live In
• MongoDB Technical Overview
• Use Case Discussion
• Demo
MongoDB – It’s About the Data
MongoDB, It’s for the Developers
MongoDB Brings It All Together
Volume of Data
• Trillions of records
• Hundreds of millions of queries per second
Agile Development
• Iterative
• Continuous
Hardware Architectures
• Cloud computing
• Commodity servers
MongoDB Use Cases
• User Data Management
• High Volume Data Feeds
• Content Management
• Operational Intelligence
• Product Data Management
10gen: The Creators of MongoDB
• Set the direction & contribute code to MongoDB
• Foster community & ecosystem
• Provide MongoDB management services
• Provide commercial services
• Founded in 2007
– Dwight Merriman, Eliot Horowitz
– DoubleClick, Oracle, MarkLogic, HP
• $31M+ in funding
– Flybridge, Sequoia, Union Square
• Worldwide expanding team
– 150+ employees
– NY, CA and UK
Why was MongoDB built?
NoSQL
• Key-value
• Graph database
• Document-oriented
• Column family
Traditional Architecture
• Relational
– Hard to map to the way we code
• Complex ORM frameworks
– Hard to evolve quickly
• Rigid schema is hard to change, necessitates migrations
– Hard to scale horizontally
• Joins, transactions make scaling by adding servers hard
RDBMS Limitations
Productivity
Cost
MongoDB
• Built from the start to solve the scaling problem
• Consistency, Availability, Partition tolerance (can’t have all three)
• Configurable to fit requirements
Theory of NoSQL: CAP
CAP Theorem: satisfying all three (C, A, P) at the same time is impossible
• Many nodes
• Nodes contain replicas of partitions of data
• Consistency
– all replicas contain the same version of data
• Availability
– system remains operational when nodes fail
• Partition tolerance
– multiple entry points
– system remains operational when the system is split
Supported languages
ACID - BASE
• Atomicity
• Consistency
• Isolation
• Durability
• Basically Available (CP)
• Soft-state
• Eventually consistent (AP)
MongoDB is easy to use
MySQL
START TRANSACTION;
INSERT INTO contacts VALUES
  (NULL, 'joeblow');
INSERT INTO contact_emails VALUES
  (NULL, 'joe@blow.com', LAST_INSERT_ID()),
  (NULL, 'joseph@blow.com', LAST_INSERT_ID());
COMMIT;
MongoDB
db.contacts.save( {
  userName: "joeblow",
  emailAddresses: [
    "joe@blow.com",
    "joseph@blow.com" ] } );
As simple as possible, but no simpler
(Chart: scalability & performance vs. depth of functionality, positioning Memcached, key/value stores, and RDBMS)
Representing & Querying Data
Schema design
• RDBMS: join
• MongoDB: embed and link
• Embedding is the nesting of objects and arrays inside a BSON document (prejoined). Links are references between documents (client-side follow-up query).
• Embed for "contains" and one-to-many relationships; link to avoid duplication of data and for many-to-many relationships
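The difference between embedding and linking can be sketched in plain JavaScript, with arrays standing in for collections. This is an in-memory illustration only, not MongoDB driver code; the names embeddedPosts, linkedPosts, comments and the two helper functions are hypothetical.

```javascript
// Embedded: the comments live inside the post document ("prejoined").
const embeddedPosts = [
  { _id: 1, title: "Spirited Away",
    comments: [{ author: "Fred", text: "Best Movie Ever" }] }
];

// Linked: the post stores only comment ids; comments live in their own collection.
const linkedPosts = [{ _id: 1, title: "Spirited Away", comments: [274] }];
const comments = [
  { _id: 274, author: "Fred", text: "Best Movie Ever" },
  { _id: 275, author: "Fred", text: "Comment on another post" }
];

// Embedded read: one lookup returns the post together with its comments.
function commentsForEmbeddedPost(postId) {
  const post = embeddedPosts.find(p => p._id === postId);
  return post ? post.comments.map(c => c.text) : [];
}

// Linked read: a second, client-side follow-up lookup resolves the references.
function commentsForLinkedPost(postId) {
  const post = linkedPosts.find(p => p._id === postId);
  if (!post) return [];
  return comments
    .filter(c => post.comments.includes(c._id))
    .map(c => c.text);
}

console.log(commentsForEmbeddedPost(1));
console.log(commentsForLinkedPost(1));
```

Both reads return the same comment text; the linked version simply needs one more lookup, which against a real database would be a second query.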
Tables to Collections of JSON Documents
{
  title: 'MongoDB',
  contributors: [
    { name: 'Eliot Horowitz', email: 'eliot@10gen.com' },
    { name: 'Dwight Merriman', email: 'dwight@10gen.com' }
  ],
  model: {
    relational: false,
    awesome: true
  }
}
Terminology
RDBMS     MongoDB
Table     Collection
Row(s)    JSON Document
Index     Index
Join      Embedding & Linking
Documents
Collections contain documents
Documents can contain other documents and/or reference other documents
Flexible/powerful ability to relate data
Schemaless
Flexible Schema
CRUD
• Create
– db.collection.insert( <document> )
– db.collection.save( <document> )
– db.collection.update( <query>, <update>, { upsert: true } )
• Read
– db.collection.find( <query>, <projection> )
– db.collection.findOne( <query>, <projection> )
• Update
– db.collection.update( <query>, <update>, <options> )
• Delete
– db.collection.remove( <query>, <justOne> )
Documents
> var p = { author: "roger",
            date: new Date(),
            title: "Spirited Away",
            avgRating: 9.834,
            tags: ["Tezuka", "Manga"] }
> db.posts.save(p)
Linked vs Embedded Documents
Embedded:
{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
  author : "roger",
  date : "Sat Jul 24 2010 …",
  text : "Spirited Away",
  tags : [ "Tezuka", "Manga" ],
  comments : [
    {
      author : "Fred",
      date : "Sat Jul 26 2010…",
      text : "Best Movie Ever"
    }
  ],
  avgRating: 9.834 }
Linked:
{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
  author : "roger",
  date : "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
  text : "Spirited Away",
  tags : [ "Tezuka", "Manga" ],
  comments : [ 6, 274, 1135, 1298, 2245, 5623 ],
  avg_rating: 9.834 }
comments (values cut off on the slide)
{ _id : 274,
  movie_id : ObjectId("4c4ba5c0672c6…
  author : "Fred",
  date : "Sat Jul 24 2010 20:51:0…
  text : "Best Movie Ever" }
{ _id : 275,
  movie_id : ObjectId("3d5ffc88…
  author : "Fred",
  …
Querying
>db.posts.find()
{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
author : "roger",
date : "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
text : "Spirited Away",
tags : [ "Tezuka", "Manga" ] }
Note: _id is unique, but can be anything you’d like
Query Operators
• Conditional Operators
– $all, $exists, $mod, $ne, $in, $nin, $nor, $or, $size, $type
– $lt, $lte, $gt, $gte
// find posts with any tags
> db.posts.find( { tags: { $exists: true } } )
// find posts whose author matches a regular expression
> db.posts.find( { author: /^rog/i } )
// count posts by author
> db.posts.find( { author: 'roger' } ).count()
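To make the operator semantics concrete, here is a minimal in-memory matcher in plain JavaScript. It is a sketch under simplifying assumptions (top-level scalar fields only, a handful of operators, no array or dotted-path matching) and is not how the server evaluates queries; matches, matchesCondition and isOperatorObject are hypothetical names.

```javascript
// Evaluate one operator expression such as { $gt: 5 } against a field value.
function matchesCondition(value, condition) {
  return Object.entries(condition).every(([op, arg]) => {
    switch (op) {
      case "$exists": return (value !== undefined) === arg;
      case "$ne":     return value !== arg;
      case "$in":     return arg.includes(value);
      case "$nin":    return !arg.includes(value);
      case "$lt":     return value < arg;
      case "$lte":    return value <= arg;
      case "$gt":     return value > arg;
      case "$gte":    return value >= arg;
      default: throw new Error("unsupported operator: " + op);
    }
  });
}

// True for objects like { $gt: 5 } whose keys are all operators.
function isOperatorObject(x) {
  if (x === null || typeof x !== "object") return false;
  if (Array.isArray(x) || x instanceof RegExp) return false;
  const keys = Object.keys(x);
  return keys.length > 0 && keys.every(k => k.startsWith("$"));
}

// A document matches when every field in the query matches (implicit AND).
function matches(doc, query) {
  return Object.entries(query).every(([field, expected]) => {
    const value = doc[field];
    if (expected instanceof RegExp) {
      return typeof value === "string" && expected.test(value);
    }
    if (isOperatorObject(expected)) return matchesCondition(value, expected);
    return value === expected; // plain equality on scalar values
  });
}

// Illustrative sample data, not taken from the slides.
const posts = [
  { author: "roger", tags: ["Tezuka", "Manga"], avgRating: 9.834 },
  { author: "Rogelio", avgRating: 7.5 },
  { author: "fred", tags: [], avgRating: 4 }
];

const withTags = posts.filter(d => matches(d, { tags: { $exists: true } }));
const rogLike = posts.filter(d => matches(d, { author: /^rog/i }));
const midRated = posts.filter(d => matches(d, { avgRating: { $gt: 5, $lte: 9.834 } }));
console.log(withTags.length, rogLike.length, midRated.length);
```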
Atomic Operations
• $set, $unset, $inc, $push, $pushAll, $pull, $pullAll, $bit
> comment = { author: "fred",
              date: new Date(),
              text: "Best Movie Ever" }
> db.posts.update( { _id: "..." },
                   { $push: { comments: comment } } );
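What a few of these operators do to a document can be sketched in plain JavaScript. The sketch only models the resulting document; on a real server the change is applied atomically per document, which this code does not attempt to reproduce. applyUpdate is a hypothetical helper covering $set, $unset, $inc and $push on top-level fields.

```javascript
// Return a modified copy of doc according to a small subset of update operators.
function applyUpdate(doc, update) {
  const result = { ...doc }; // shallow copy; the input document is left untouched
  for (const [op, fields] of Object.entries(update)) {
    for (const [field, arg] of Object.entries(fields)) {
      switch (op) {
        case "$set":   result[field] = arg; break;
        case "$unset": delete result[field]; break;
        case "$inc":   result[field] = (result[field] || 0) + arg; break;
        case "$push":  result[field] = [...(result[field] || []), arg]; break;
        default: throw new Error("unsupported operator: " + op);
      }
    }
  }
  return result;
}

// Illustrative data; the views and featured fields are made up for the example.
const post = { _id: 1, text: "Spirited Away", views: 10 };
const comment = { author: "fred", text: "Best Movie Ever" };

const updated = applyUpdate(post, {
  $push: { comments: comment },
  $inc: { views: 1 },
  $set: { featured: true }
});
console.log(updated);
```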
Arrays
• $push – append
• $pushAll – append array
• $addToSet and $each – add if not contained, add list
• $pop – remove last
• $pull – remove all occurrences/criteria
– { $pull : { field : { $gt: 3 } } }
• $pullAll – removes all occurrences of each value
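The array operators above behave roughly like the following plain JavaScript functions. This is a sketch of the semantics on an ordinary array, with hypothetical function names; it assumes scalar elements and returns new arrays instead of modifying a stored document.

```javascript
// $addToSet with $each: append only the values that are not already present.
function addToSetEach(array, values) {
  const result = [...array];
  for (const v of values) {
    if (!result.includes(v)) result.push(v);
  }
  return result;
}

// $pop: 1 removes the last element, -1 removes the first.
function pop(array, direction) {
  return direction === -1 ? array.slice(1) : array.slice(0, -1);
}

// $pull: remove every element that equals the value or satisfies the criteria.
function pull(array, valueOrPredicate) {
  const shouldRemove = typeof valueOrPredicate === "function"
    ? valueOrPredicate
    : (x => x === valueOrPredicate);
  return array.filter(x => !shouldRemove(x));
}

// $pullAll: remove all occurrences of each listed value.
function pullAll(array, values) {
  return array.filter(x => !values.includes(x));
}

const tags = addToSetEach(["Tezuka", "Manga"], ["Manga", "Anime"]);
const numbers = [1, 5, 3, 7, 3];
console.log(tags);
console.log(pull(numbers, x => x > 3)); // same idea as { $pull: { field: { $gt: 3 } } }
console.log(pullAll(numbers, [3, 7]));
```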
Indexes
// Index nested documents
> db.posts.ensureIndex( { "comments.author": 1 } )
> db.posts.find( { 'comments.author': 'Fred' } )
// Index on tags (array values)
> db.posts.ensureIndex( { tags: 1 } )
> db.posts.find( { tags: 'Manga' } )
// Geospatial index
> db.posts.ensureIndex( { "author.location": "2d" } )
> db.posts.find( { "author.location": { $near: [22, 42] } } )
// Create an index on any field in a document
> db.posts.ensureIndex( { author: 1 } )
Aggregation/Batch Data Processing
• Map/Reduce can be used for batch data processing
– Currently being used for totaling, averaging, etc.
– Map/Reduce is a big hammer
• Simple aggregate functions available
• (2.2) Aggregation Framework: simple, fast
– No JavaScript needed, runs natively on server
– Filter or select only matching sub-documents or arrays via new operators
• MongoDB Hadoop Connector
– Useful for Hadoop integration
– Massive batch processing jobs
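The map/reduce pattern used for totaling and averaging can be sketched in plain JavaScript. This is an in-memory illustration, not the server's mapReduce command: the mapReduce helper is hypothetical, and emit is passed to the map function as an argument here as a simplification.

```javascript
// Run map over every document, group emitted values by key, then reduce each group.
function mapReduce(docs, map, reduce) {
  const groups = new Map();
  for (const doc of docs) {
    // map may emit zero or more (key, value) pairs per document
    map(doc, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  }
  const results = {};
  for (const [key, values] of groups) results[key] = reduce(key, values);
  return results;
}

// Illustrative sample data.
const ratedPosts = [
  { author: "roger", avgRating: 9 },
  { author: "roger", avgRating: 7 },
  { author: "fred", avgRating: 4 }
];

// Emit partial sums and counts so that reduce stays associative.
const totals = mapReduce(
  ratedPosts,
  (doc, emit) => emit(doc.author, { sum: doc.avgRating, count: 1 }),
  (key, values) => values.reduce(
    (a, b) => ({ sum: a.sum + b.sum, count: a.count + b.count })
  )
);

// Finalize: turn each (sum, count) pair into an average.
const averages = Object.fromEntries(
  Object.entries(totals).map(([author, t]) => [author, t.sum / t.count])
);
console.log(averages);
```

Carrying sum and count through the reduce step, and dividing only at the end, keeps the result correct even if reduce is applied to partial groups.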
Deployment & Scaling
Why Replicate?
• Data Redundancy
• Automatic Failover / High Availability
• Distribution of read load
• Disaster recovery
Replica Sets (diagram: asynchronous replication)
Replica Sets
• One primary, many secondaries
– Automatic replication to all secondaries
• Different delays may be configured
– Automatic election of new primary on failure
– Writes go to the primary; reads can go to secondaries
• Priority of secondary can be set
– Hidden for administration/back-ups
– Lower priority for less powerful machines
• Election of new primary is automatic
– Majority of replica set must be available
– Arbiters can be used
• Many configurations possible (based on use case)
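The majority rule for elections can be expressed as a one-line check. This is a simplified sketch that ignores priorities, hidden members and timing; canElectPrimary is a hypothetical helper, and an arbiter is counted simply as one more voting member.

```javascript
// A primary can be elected only if a strict majority of voting members is reachable.
function canElectPrimary(votingMembers, reachableMembers) {
  return reachableMembers > Math.floor(votingMembers / 2);
}

// Three data-bearing members, one down: 2 of 3 is a majority.
console.log(canElectPrimary(3, 2));

// Two data-bearing members, one down: 1 of 2 is not a majority.
console.log(canElectPrimary(2, 1));

// Two data-bearing members plus an arbiter, one data member down: 2 of 3 votes remain.
console.log(canElectPrimary(3, 2));
```

The last two cases show why an arbiter helps a two-member set: it adds a vote without holding data.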
Replica Sets (diagram: automatic leader election)
Sharding
(Diagrams: write scalability through range splits. One mongod holding key range 0..100 is split across two mongods holding 0..30 and 31..100; the range 31..100 is then split again so that four mongods hold 0..30, 31..60, 61..90 and 91..100.)
Sharding Administration
• Splitting data into chunks
– Automatic
– Existing data can be manually “pre-split”
• Migration of chunks/balancing between servers
– Automatic
– Can be turned off/chunks can be manually moved
• Shard key
– Must be selected by you
– Very important for performance!
• Each shard is really a replica set
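Range-based routing on a shard key can be sketched as a lookup in a table of key ranges. The sketch below reuses the integer ranges from the sharding diagrams and treats them as inclusive on both ends, which is a simplification; the shard names and the helpers shardForKey and shardsForRange are hypothetical.

```javascript
// Key ranges as drawn in the sharding diagrams, one chunk per shard.
const chunks = [
  { min: 0, max: 30, shard: "shard0" },
  { min: 31, max: 60, shard: "shard1" },
  { min: 61, max: 90, shard: "shard2" },
  { min: 91, max: 100, shard: "shard3" }
];

// Route a single write or point query to the shard whose range covers the key.
function shardForKey(key) {
  const chunk = chunks.find(c => key >= c.min && key <= c.max);
  if (!chunk) throw new RangeError("no chunk covers key " + key);
  return chunk.shard;
}

// A range query only needs the shards whose ranges overlap [lo, hi].
function shardsForRange(lo, hi) {
  return chunks.filter(c => c.max >= lo && c.min <= hi).map(c => c.shard);
}

console.log(shardForKey(42));
console.log(shardsForRange(25, 65));
```

A shard key whose values spread writes across ranges lets every shard take part of the load, which is why the choice of key matters so much for performance.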
Full Deployment
(Diagram: queries arrive at mongos routers, which use three config servers to route them to four shards. Each shard is a replica set of one primary and two secondaries and holds one key range: 0..30, 31..60, 61..90 or 91..100.)
MMS: MongoDB Monitoring Service
• SaaS solution providing instrumentation and visibility into your MongoDB systems
• 3,500+ customers signed up and using service
Agenda
• Welcome and Introductions
• MongoDB and the New Frontier
• MongoDB Technical Overview
• Use Case Discussion
• Demo
Queries
• Importing data into MongoDB
– mongoimport --db test --collection restaurants --file dataset.json
• Exporting data from MongoDB
– mongoexport --db test --collection newcolln --out myexport.json
Queries
(Diagrams: a MongoDB query operation and the equivalent query in SQL)
Create/Insert Queries
• db.collection.insert()
db.inventory.insert(
  { item: "ABC1",
    details: { model: "14Q3", manufacturer: "XYZ Company" },
    stock: [ { size: "S", qty: 25 }, { size: "M", qty: 50 } ],
    category: "clothing" } )
Find Queries
• db.collection.find()
• db.inventory.find( {} )
• db.inventory.find( { type: "snacks" } )
• db.inventory.find( { type: { $in: [ 'food', 'snacks' ] } } )
• db.inventory.find( { type: 'food', price: { $lt: 9.95 } } )
• db.inventory.find( { $or: [ { qty: { $gt: 100 } }, { price: { $lt: 9.95 } } ] } )
Update Queries
• Use update operators such as $set and $currentDate to modify fields, including subdocuments
db.inventory.update( { item: "MNO2" },
  { $set: { category: "apparel",
            details: { model: "14Q3", manufacturer: "XYZ Company" } },
    $currentDate: { lastModified: true } },
  false, true )
• false (third argument, upsert): do not insert a new document when nothing matches
• true (fourth argument, multi): update all matching documents
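The effect of the upsert and multi flags can be sketched with an in-memory collection. This is a simplified model, assuming equality-only queries and a plain field overwrite in place of $set; the update function and the sample inventory are hypothetical.

```javascript
// Apply changes to matching documents, honoring the upsert and multi flags.
function update(collection, query, changes, upsert, multi) {
  const isMatch = doc => Object.entries(query).every(([k, v]) => doc[k] === v);
  let modified = 0;
  for (const doc of collection) {
    if (!isMatch(doc)) continue;
    Object.assign(doc, changes); // models a simple $set
    modified++;
    if (!multi) break;           // without multi, only the first match is changed
  }
  if (modified === 0 && upsert) {
    // upsert: nothing matched, so insert a document built from query and changes
    collection.push({ ...query, ...changes });
    return { modified: 0, upserted: 1 };
  }
  return { modified, upserted: 0 };
}

// Illustrative sample data.
const inventory = [
  { item: "MNO2", category: "clothing" },
  { item: "MNO2", category: "shoes" },
  { item: "ABC1", category: "clothing" }
];

// multi = true: both MNO2 documents are changed.
const multiResult = update(inventory, { item: "MNO2" }, { category: "apparel" }, false, true);

// upsert = false: no match, so nothing happens.
const missResult = update(inventory, { item: "ZZZ9" }, { category: "apparel" }, false, true);

// upsert = true: no match, so a new document is inserted.
const upsertResult = update(inventory, { item: "ZZZ9" }, { category: "apparel" }, true, false);

console.log(multiResult, missResult, upsertResult, inventory.length);
```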
Queries
• Aggregation
– SQL query
SELECT state, SUM(pop) AS totalPop FROM zipcodes GROUP BY state HAVING totalPop >= (10*1000*1000)
– MongoDB
db.zipcodes.aggregate( [
  { $group: { _id: "$state", totalPop: { $sum: "$pop" } } },
  { $match: { totalPop: { $gte: 10*1000*1000 } } }
] )
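What the two pipeline stages compute can be traced on a handful of documents in plain JavaScript. The sample zipcodes below are made-up values for illustration, and the code models only the result of $group followed by $match, not how the server executes the pipeline.

```javascript
// Made-up sample documents; real zipcode data would have one document per zip code.
const zipcodes = [
  { state: "NY", pop: 6000000 },
  { state: "NY", pop: 5000000 },
  { state: "VT", pop: 600000 },
  { state: "CA", pop: 12000000 }
];

// $group stage: sum pop per state (GROUP BY state, SUM(pop)).
const totalsByState = new Map();
for (const z of zipcodes) {
  totalsByState.set(z.state, (totalsByState.get(z.state) || 0) + z.pop);
}

// $match stage after grouping: keep only large states (HAVING totalPop >= 10M).
const largeStates = [...totalsByState]
  .map(([state, totalPop]) => ({ _id: state, totalPop }))
  .filter(r => r.totalPop >= 10 * 1000 * 1000);

console.log(largeStates);
```

Because $match comes after $group, it filters on the computed totalPop, which is the role HAVING plays in the SQL version.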
Indexes
• db.collection.find( { field: 'value' } ).explain()
• db.collection.ensureIndex( { title: 1 } );
• db.collection.dropIndex("index_name");
• db.mycoll.ensureIndex( { 'address.coord': '2d' } )
• db.mycoll.find( { "address.coord": { $near: [70, 40], $minDistance: 0.05 } } )
Indexed and Non-Indexed Search
> db.mycoll.find({"name" : "Tov Kosher Kitchen"}).pretty().explain()
{
  "cursor" : "BtreeCursor name_1",
  "isMultiKey" : false,
  "n" : 1,
  "nscannedObjects" : 1,
  "nscanned" : 1,
  "nscannedObjectsAllPlans" : 1,…
}
> db.mycoll.find({"cuisine" : "Jewish/Kosher"}).pretty().explain()
{
  "cursor" : "BasicCursor",
  "isMultiKey" : false,
  "n" : 316,
  "nscannedObjects" : 25359,
  "nscanned" : 25359,…
}
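The gap between the two explain() outputs (1 document scanned versus 25359) can be reproduced in miniature. The sketch below uses a JavaScript Map as a stand-in for an index, which is an assumption for illustration; MongoDB's own indexes are B-trees, as the BtreeCursor name suggests. The data and helper names are made up.

```javascript
// Generated sample data: 1000 restaurants, every fourth one kosher.
const restaurants = [];
for (let i = 0; i < 1000; i++) {
  restaurants.push({
    name: "Restaurant " + i,
    cuisine: i % 4 === 0 ? "Jewish/Kosher" : "Other"
  });
}

// No index: every document has to be examined (like BasicCursor).
function scanFind(docs, field, value) {
  let nscanned = 0;
  const found = [];
  for (const doc of docs) {
    nscanned++;
    if (doc[field] === value) found.push(doc);
  }
  return { n: found.length, nscanned };
}

// Build a lookup structure from field value to the documents holding it.
function buildIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    if (!index.has(doc[field])) index.set(doc[field], []);
    index.get(doc[field]).push(doc);
  }
  return index;
}

// With an index: only the matching documents are touched.
function indexFind(index, value) {
  const found = index.get(value) || [];
  return { n: found.length, nscanned: found.length };
}

const nameIndex = buildIndex(restaurants, "name");
const indexed = indexFind(nameIndex, "Restaurant 42");
const unindexed = scanFind(restaurants, "cuisine", "Jewish/Kosher");
console.log(indexed, unindexed);
```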
Thank You
