Perl Engineer & Evangelist, 10gen
Mike Friedman
#MongoDBdays
Schema Design
Four Real-World Use Cases
Agenda
• Why is schema design important?
• 4 Real World Schemas
– Inbox
– History
– Indexed Attributes
– Multiple Identities
• Conclusions
Why is Schema Design important?
• Largest factor for a performant system
• Schema design with MongoDB is different
• RDBMS – "What answers do I have?"
• MongoDB – "What question will I have?"
#1 - Message Inbox
Let’s get
Social
Sending Messages
?
Design Goals
• Efficiently send new messages to recipients
• Efficiently read inbox
Reading my Inbox
?
3 Approaches (there are more)
• Fan out on Read
• Fan out on Write
• Fan out on Write with Bucketing
Fan out on read
// Shard on "from"
sh.shardCollection( "mongodbdays.inbox", { from: 1 } )
// Make sure we have an index to handle inbox reads
db.inbox.ensureIndex( { to: 1, sent: 1 } )
msg = {
    from: "Joe",
    to: [ "Bob", "Jane" ],
    sent: new Date(),
    message: "Hi!"
}
// Send a message
db.inbox.save( msg )
// Read my inbox
db.inbox.find( { to: "Joe" } ).sort( { sent: -1 } )
Fan out on read – I/O
[Diagram: Shard 1, Shard 2, Shard 3; Send Message]
Fan out on read – I/O
[Diagram: Shard 1, Shard 2, Shard 3; Send Message, Read Inbox]
Considerations
• Write: One document per message sent
• Read: Find all messages with my own name in the recipient field
• Read: Requires scatter-gather on a sharded cluster
• A lot of random I/O on a shard to find everything
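The read cost listed above can be illustrated without a database. What follows is a minimal plain-JavaScript sketch, not MongoDB code: `shardFor`, `send` and `readInbox` are hypothetical helpers, and the character-sum hash only stands in for however a real cluster maps shard key values to shards. It assumes three shards and routing on `from`, as on the slide.

```javascript
// In-memory stand-in for a 3-shard cluster; each shard is an array of documents.
const shards = [[], [], []];

// Hypothetical routing function: a toy hash of the shard key value.
function shardFor(key) {
  let sum = 0;
  for (const ch of key) sum += ch.charCodeAt(0);
  return sum % shards.length;
}

// Fan out on read: one document per message, stored on the sender's shard.
function send(msg) {
  shards[shardFor(msg.from)].push(msg);
}

// An inbox read cannot be routed by "from", so every shard is scanned.
function readInbox(user) {
  let shardsTouched = 0;
  const found = [];
  for (const shard of shards) {
    shardsTouched += 1;
    for (const doc of shard) {
      if (doc.to.includes(user)) found.push(doc);
    }
  }
  found.sort((a, b) => b.sent - a.sent); // newest first
  return { messages: found, shardsTouched };
}

send({ from: "Joe", to: ["Bob", "Jane"], sent: new Date(2013, 2, 1), message: "Hi!" });
send({ from: "Jane", to: ["Bob"], sent: new Date(2013, 2, 2), message: "Lunch?" });

const inbox = readInbox("Bob");
console.log(inbox.messages.length, inbox.shardsTouched);
```

In this sketch each send writes a single document, while every inbox read visits all three shards.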
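Fan out on write
// Shard on "recipient" and "sent"
sh.shardCollection( "mongodbdays.inbox", { recipient: 1, sent: 1 } )
msg = {
    from: "Joe",
    to: [ "Bob", "Jane" ],
    sent: new Date(),
    message: "Hi!"
}
// Send a message: insert a separate copy for each recipient.
// A new object is built per recipient because the shell adds _id to a saved
// object, so re-saving the same msg would overwrite the first copy.
msg.to.forEach( function( recipient ) {
    db.inbox.insert( { from: msg.from, to: msg.to, sent: msg.sent,
                       message: msg.message, recipient: recipient } )
} )
// Read my inbox
db.inbox.find( { recipient: "Joe" } ).sort( { sent: -1 } )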
Fan out on write – I/O
[Diagram: Shard 1, Shard 2, Shard 3; Send Message]
Fan out on write – I/O
[Diagram: Shard 1, Shard 2, Shard 3; Send Message, Read Inbox]
Considerations
• Write: One document per recipient
• Read: Find all of the messages with me as the recipient
• Can shard on recipient, so inbox reads hit one shard
• But still lots of random I/O on the shard
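The trade-off in this list can be sketched in plain JavaScript as well. This is an in-memory illustration under assumptions, not MongoDB code: the three arrays stand in for shards, and `shardFor` is a hypothetical toy hash used in place of real shard key routing on `recipient`.

```javascript
// In-memory stand-in for a 3-shard cluster.
const shards = [[], [], []];

// Hypothetical routing function: a toy hash of the shard key value.
function shardFor(key) {
  let sum = 0;
  for (const ch of key) sum += ch.charCodeAt(0);
  return sum % shards.length;
}

// Fan out on write: one copy per recipient, stored on the recipient's shard.
function send(msg) {
  for (const recipient of msg.to) {
    // A fresh top-level object per recipient, so setting `recipient`
    // on one copy does not affect another.
    const copy = { ...msg, recipient };
    shards[shardFor(recipient)].push(copy);
  }
}

// The inbox read is routed to exactly one shard.
function readInbox(user) {
  return shards[shardFor(user)]
    .filter((doc) => doc.recipient === user)
    .sort((a, b) => b.sent - a.sent); // newest first
}

send({ from: "Joe", to: ["Bob", "Jane"], sent: new Date(2013, 2, 1), message: "Hi!" });

const bobInbox = readInbox("Bob");
console.log(bobInbox.length, bobInbox[0].recipient);
```

Here the write cost grows with the number of recipients, and the read looks at a single shard.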
Fan out on write with buckets
// Shard on "owner / sequence"
sh.shardCollection( "mongodbdays.inbox", { owner: 1, sequence: 1 } )
sh.shardCollection( "mongodbdays.users", { user_name: 1 } )
msg = {
    from: "Joe",
    to: [ "Bob", "Jane" ],
    sent: new Date(),
    message: "Hi!"
}
// Send a message
for ( recipient in msg.to ) {
    // Atomically increment the recipient's message counter and read the new value
    count = db.users.findAndModify({
        query: { user_name: msg.to[recipient] },
        update: { "$inc": { "msg_count": 1 } },
        upsert: true,
        new: true }).msg_count;
    // The counter selects the bucket: a new bucket starts every 50 messages
    sequence = Math.floor(count / 50);
    // Append to the recipient's current bucket, creating it if needed
    db.inbox.update(
        { owner: msg.to[recipient], sequence: sequence },
        { $push: { "messages": msg } },
        { upsert: true } );
}
// Read my inbox: the two most recent buckets
db.inbox.find( { owner: "Joe" } ).sort( { sequence: -1 } ).limit( 2 )
Fan out on write with buckets
• Each “inbox” document holds an array of messages
• Append a message onto the “inbox” of each recipient
• Bucket inboxes so there are not too many messages per document
• Can shard on recipient, so inbox reads hit one shard
• 1 or 2 documents to read the whole inbox
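The bucket arithmetic is easier to follow with concrete numbers. The sketch below is plain JavaScript using in-memory maps, not MongoDB calls: `deliver` and `readInbox` are hypothetical helpers that mimic an increment-then-read counter, an upsert that appends to a bucket, and a read of the two newest buckets. It assumes 50 messages per bucket, as on the slide.

```javascript
const BUCKET_SIZE = 50; // same divisor as on the slide

// In-memory stand-ins: per-user counters and bucket documents keyed by "owner/sequence".
const counters = new Map();
const buckets = new Map();

function deliver(owner, msg) {
  // Increment first, then read, like $inc combined with new: true.
  const count = (counters.get(owner) || 0) + 1;
  counters.set(owner, count);
  const sequence = Math.floor(count / BUCKET_SIZE);
  const key = owner + "/" + sequence;
  // Like an upsert with $push: create the bucket if missing, then append.
  if (!buckets.has(key)) buckets.set(key, { owner, sequence, messages: [] });
  buckets.get(key).messages.push(msg);
  return sequence;
}

// Newest two buckets for one owner, highest sequence first.
function readInbox(owner) {
  return [...buckets.values()]
    .filter((bucket) => bucket.owner === owner)
    .sort((a, b) => b.sequence - a.sequence)
    .slice(0, 2);
}

for (let i = 1; i <= 120; i++) deliver("Bob", { from: "Joe", message: "msg " + i });

const latest = readInbox("Bob");
console.log(latest.map((bucket) => [bucket.sequence, bucket.messages.length]));
```

After 120 deliveries the read returns bucket 2 with 21 messages and bucket 1 with 50. Because the counter is incremented before the division, bucket 0 holds 49 messages and every later full bucket holds 50.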
Fan out on write with buckets – I/O
[Diagram: Shard 1, Shard 2, Shard 3; Send Message]
Fan out on write with buckets – I/O
[Diagram: Shard 1, Shard 2, Shard 3; Send Message, Read Inbox]
#2 – History
Data Modeling Examples from the Real World
Design Goals
• Need to retain a limited amount of history e.g.
– Hours, Days, Weeks
– May be a legislative requirement (e.g. HIPAA, SOX, DPA)
• Need to query efficiently by
– match
– ranges
3 Approaches (there are more)
• Bucket by Number of messages
• Fixed size array
• Bucket by date + TTL collections
Bucket by number of messages
db.inbox.find()
{ owner: "Joe", sequence: 25,
  messages: [
    { from: "Joe",
      to: [ "Bob", "Jane" ],
      sent: ISODate("2013-03-01T09:59:42.689Z"),
      message: "Hi!"
    },
    …
  ] }
// Query with a date range
db.inbox.find( { owner: "friend1",
                 messages: {
                   $elemMatch: { sent: { $gte: ISODate("…") } } } } )
// Remove elements based on a date
db.inbox.update( { owner: "friend1" },
                 { $pull: { messages: {
                     sent: { $gte: ISODate("…") } } } } )
Considerations
• Shrinking documents: space can be reclaimed with
– db.runCommand( { compact: '<collection>' } )
• Remove the document after the last element in the array has been removed
– { "_id" : …, "messages" : [ ], "owner" : "friend1", "sequence" : 0 }
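The clean-up step in the second bullet can be sketched as follows. This is plain JavaScript over in-memory objects, offered only as an illustration: `pullSentOnOrAfter` and `dropEmpty` are hypothetical helpers that mimic pulling array elements by date (using the same "on or after" comparison as the slide) and then discarding buckets left with an empty array.

```javascript
// Remove messages sent on or after the cutoff from one bucket (mutates the bucket).
function pullSentOnOrAfter(bucket, cutoff) {
  bucket.messages = bucket.messages.filter((m) => m.sent < cutoff);
}

// Keep only buckets that still contain messages; the empty ones are the
// documents that would be deleted.
function dropEmpty(bucketList) {
  return bucketList.filter((bucket) => bucket.messages.length > 0);
}

let history = [
  { owner: "friend1", sequence: 0, messages: [
    { sent: new Date("2013-03-01T00:00:00Z"), message: "a" },
    { sent: new Date("2013-03-05T00:00:00Z"), message: "b" } ] },
  { owner: "friend1", sequence: 1, messages: [
    { sent: new Date("2013-03-10T00:00:00Z"), message: "c" } ] },
];

const cutoff = new Date("2013-03-03T00:00:00Z");
history.forEach((bucket) => pullSentOnOrAfter(bucket, cutoff));
history = dropEmpty(history);
console.log(history.length, history[0].messages.length);
```

With this sample data, bucket 1 ends up empty and is dropped, and bucket 0 keeps only the message sent before the cutoff.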
Fixed Size Array
msg = {
    from: "Your Boss",
    to: [ "Bob" ],
    sent: new Date(),
    message: "CALL ME NOW!"
}
// 2.4 introduces $each, $sort and $slice for $push
db.messages.update(
    { _id: 1 },
    { $push: { messages: { $each: [ msg ],
                           $sort: { sent: 1 },
                           $slice: -50 } } }
)
Considerations
• Need to compute the size of the array based on retention period
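One way to reason about the array size is sketched below in plain JavaScript. Both helpers are hypothetical: `arraySizeFor` assumes the size is simply the expected message rate multiplied by the retention period, and `pushCapped` mimics on an in-memory array what a push with sort-ascending and a negative slice does (append, order by `sent`, keep the newest elements).

```javascript
// Hypothetical sizing rule: expected messages per day times days to retain.
function arraySizeFor(messagesPerDay, retentionDays) {
  return Math.ceil(messagesPerDay * retentionDays);
}

// Append, sort ascending by sent, then keep only the newest `size` elements.
// Assumes size > 0.
function pushCapped(messages, incoming, size) {
  return messages
    .concat(incoming)
    .sort((a, b) => a.sent - b.sent)
    .slice(-size);
}

const size = arraySizeFor(5, 10); // 5 messages a day, kept for 10 days

let messages = [];
for (let i = 1; i <= 60; i++) {
  messages = pushCapped(messages, [{ sent: new Date(i * 1000), message: "msg " + i }], size);
}
console.log(size, messages.length, messages[0].message);
```

With these inputs the array is capped at 50 elements, so after 60 pushes the oldest surviving message is "msg 11".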
TTL Collections
// messages: one doc per user per day
db.inbox.findOne()
{
    _id: 1,
    to: "Joe",
    sequence: ISODate("2013-02-04T00:00:00.392Z"),
    messages: [ ]
}
// Auto expires data after 31536000 seconds = 1 year
db.messages.ensureIndex( { sequence: 1 },
                         { expireAfterSeconds: 31536000 } )
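The date arithmetic behind this approach can be checked with a short plain-JavaScript sketch. The helpers are hypothetical and nothing here calls MongoDB: `dayBucket` assumes the per-day key is the timestamp truncated to midnight UTC, and `isExpired` assumes a document becomes eligible for removal once the indexed date plus the TTL has passed (a real server removes expired documents in the background, so deletion is not instantaneous).

```javascript
const SECONDS_PER_YEAR = 365 * 24 * 60 * 60; // 31536000, the value on the slide

// Hypothetical helper: truncate a timestamp to midnight UTC to get the day bucket key.
function dayBucket(date) {
  return new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()));
}

// True once bucketDate + ttlSeconds is at or before `now`.
function isExpired(bucketDate, now, ttlSeconds) {
  return now.getTime() >= bucketDate.getTime() + ttlSeconds * 1000;
}

const bucket = dayBucket(new Date("2013-02-04T15:30:00Z"));
console.log(bucket.toISOString());
console.log(isExpired(bucket, new Date("2014-02-03T23:59:59Z"), SECONDS_PER_YEAR));
console.log(isExpired(bucket, new Date("2014-02-04T00:00:00Z"), SECONDS_PER_YEAR));
```

Because the TTL applies to the bucket's date rather than to each message, a whole day of messages expires together.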
#3 – Indexed Attributes
Design Goal
• Application needs to store a variable number of attributes, e.g.
– User defined Form
– Meta Data tags
• Queries needed
– Equality
– Range based
• Need to be efficient, regardless of the number of attributes
2 Approaches (there are more)
• Attributes as Embedded Document
• Attributes as Objects in an Array
Attributes as a Sub-Document
db.files.insert( { _id: "local.0",
                   attr: { type: "text", size: 64,
                           created: ISODate("...") } } )
db.files.insert( { _id: "local.1",
                   attr: { type: "text", size: 128 } } )
db.files.insert( { _id: "mongod",
                   attr: { type: "binary", size: 256,
                           created: ISODate("...") } } )
// Need to create an index for each item in the sub-document
db.files.ensureIndex( { "attr.type": 1 } )
db.files.find( { "attr.type": "text" } )
// Can perform range queries
db.files.ensureIndex( { "attr.size": 1 } )
db.files.find( { "attr.size": { $gt: 64, $lte: 16384 } } )
Considerations
• Each attribute needs an Index
• Each time you extend, you add an index
• Lots and lots of indexes
Attributes as Objects in Array
db.files.insert( { _id: "local.0",
                   attr: [ { type: "text" },
                           { size: 64 },
                           { created: ISODate("...") } ] } )
db.files.insert( { _id: "local.1",
                   attr: [ { type: "text" },
                           { size: 128 } ] } )
db.files.insert( { _id: "mongod",
                   attr: [ { type: "binary" },
                           { size: 256 },
                           { created: ISODate("...") } ] } )
db.files.ensureIndex( { attr: 1 } )
Considerations
• Only one index needed on attr
• Can support range queries, etc.
• Index can be used only once per query
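The difference between the two shapes can be shown with a small plain-JavaScript sketch. It only reshapes objects in memory and does not query MongoDB: `toAttrArray` and `hasAttr` are hypothetical helpers for turning an embedded attribute document into an array of single-key objects and for checking one attribute by equality.

```javascript
// Convert { type: "text", size: 64 } into [ { type: "text" }, { size: 64 } ].
function toAttrArray(attr) {
  return Object.entries(attr).map(([key, value]) => ({ [key]: value }));
}

// In-memory equality check against one element of the attribute array.
function hasAttr(doc, key, value) {
  return doc.attr.some((entry) => entry[key] === value);
}

const embedded = { _id: "local.0", attr: { type: "text", size: 64 } };
const asArray = { _id: embedded._id, attr: toAttrArray(embedded.attr) };

console.log(JSON.stringify(asArray.attr));
console.log(hasAttr(asArray, "type", "text"), hasAttr(asArray, "size", 128));
```

Adding a new attribute in the array shape only appends another element, so the single index on `attr` does not need to change.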
#4 – Multiple Identities
Design Goal
• Ability to look up by a number of different identities, e.g.
– Username
– Email address
– FB Handle
– LinkedIn URL
2 Approaches (there are more)
• Identifiers in a single document
• Separate Identifiers from Content
Single Document by User
db.users.findOne()
{ _id: "joe",
  email: "joe@example.com",
  fb: "joe.smith",   // facebook
  li: "joe.e.smith", // linkedin
  other: {…}
}
// Shard collection by _id
sh.shardCollection( "mongodbdays.users", { _id: 1 } )
// Create indexes on each key
db.users.ensureIndex( { email: 1 } )
db.users.ensureIndex( { fb: 1 } )
db.users.ensureIndex( { li: 1 } )
Read by _id (shard key)
[Diagram: Shard 1, Shard 2, Shard 3]
find( { _id: "joe" } )
Read by email (non-shard key)
[Diagram: Shard 1, Shard 2, Shard 3]
find( { email: "joe@example.com" } )
Considerations
• Lookup by shard key is routed to 1 shard
• Lookup by any other identifier is scatter-gathered across all shards
• Secondary keys cannot have a unique index, because a unique index on a sharded collection must be prefixed by the shard key
Document per Identity
// Create unique index
db.identities.ensureIndex( { identifier: 1 }, { unique: true } )
// Create one identity document per identifier, each pointing at the user document
db.identities.save(
    { identifier: { hndl: "joe" }, user: "1200-42" } )
db.identities.save(
    { identifier: { email: "joe@abc.com" }, user: "1200-42" } )
db.identities.save(
    { identifier: { li: "joe.e.smith" }, user: "1200-42" } )
// Shard collection by identifier
sh.shardCollection( "mydb.identities", { identifier: 1 } )
// Create unique index (the default _id index is already unique)
db.users.ensureIndex( { _id: 1 }, { unique: true } )
// Shard collection by _id
sh.shardCollection( "mydb.users", { _id: 1 } )
Read requires 2 reads
[Diagram: Shard 1, Shard 2, Shard 3]
db.identities.find( { "identifier": { "hndl": "joe" } } )
db.users.find( { _id: "1200-42" } )
Considerations
• Lookup to Identities is a routed query
• Lookup to Users is a routed query
• Unique indexes available
• Must do two queries per lookup
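The two-step lookup can be sketched in plain JavaScript with two in-memory maps. This is an illustration under assumptions rather than MongoDB code: `addIdentity` and `findUser` are hypothetical helpers, the `"kind:value"` string key stands in for the `identifier` sub-document, and the duplicate check stands in for the unique index.

```javascript
const identities = new Map(); // "kind:value" -> user id
const users = new Map();      // user id -> user document

function addIdentity(kind, value, userId) {
  const key = kind + ":" + value;
  // Stand-in for the unique index on identifier.
  if (identities.has(key)) throw new Error("duplicate identifier: " + key);
  identities.set(key, userId);
}

function findUser(kind, value) {
  const userId = identities.get(kind + ":" + value); // read 1: identities
  if (userId === undefined) return null;
  return users.get(userId) || null;                  // read 2: users
}

users.set("1200-42", { _id: "1200-42", name: "Joe" });
addIdentity("hndl", "joe", "1200-42");
addIdentity("email", "joe@abc.com", "1200-42");
addIdentity("li", "joe.e.smith", "1200-42");

const byEmail = findUser("email", "joe@abc.com");
console.log(byEmail._id, findUser("hndl", "nobody"));
```

Each read is a direct key lookup, which is the in-memory analogue of a routed query, at the price of always performing two of them.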
Conclusion
Summary
• Multiple ways to model a domain problem
• Understand the key use cases of your app
• Balance between ease of query vs. ease of write
• Random I/O should be avoided
Thank You
Next Sessions at 3:40
5th Floor:
West Side Ballroom 3&4: Advanced Replication Internals
West Side Ballroom 1&2: Building a High-Performance Distributed Task Queue on MongoDB
Juilliard Complex: WhiteBoard Q&A
Lyceum Complex: Ask the Experts
7th Floor:
Empire Complex: Managing a Maturing MongoDB Ecosystem
SoHo Complex: MongoDB Indexing Constraints and Creative Schemas
