Schema Design — MongoBerlin

                              Richard M Kreuter
                                   10gen Inc.
                              richard@10gen.com


                               March 25, 2011




Observations about Relational Database Schemas


         Relational schema design is often presented and thought of as
         an exercise in normalization. While academics debate how
         many normal forms can fit on the head of a pin, practitioners
         tend to employ just one or two.
         However, all nontrivial real-world applications employ a variety
         of strategic denormalizations: materialized views in the
         RDBMS, caching layers outside the RDBMS. These
         denormalizations tend to be vital to real-world performance.
         Finally, application programmers seldom code in relations, but
         rather in object graphs; the RDBMS’s model, the set of
         tuples, isn’t a great fit for modern programming languages or
         developers’ minds.


MongoDB Documents, Queries, Features



        MongoDB documents are deeply nestable sequences of key-value
        pairs, thus permitting “rich” structure.
        The MongoDB query language is relatively SQL-like in its
        capacity to find documents satisfying complicated, dynamic
        criteria.
        MongoDB documents can be updated atomically, with
        special efficiency at updates that don’t alter a document’s size
        or shape.




MongoDB Schema Design Generalities



  When designing for MongoDB, do...
        ... let the application direct the schema.
        ... denormalize judiciously.
        ... design your schema for indexing.
        ... resort to application-level JOINs when needed.
  And don’t ...
        ... treat collections as heaps.
        ... frequently resize documents.




Letting the application direct the schema




   Most applications view their data mostly in a small number of
   distinguished “shapes”, generally congruent to graphs of
   inter-object has-a relationships among instances of the classes in
   the applications’ models. MongoDB lets you store your data more or
   less directly according to the shape of your model.




Letting the application direct the schema, continued




   db.blog_posts.findOne()
   { _id : ObjectId(...),
     text : "A blazingly clever blog post.",
     by : "A. U. Thor",
     date : "Mon Mar 21 2011 03:54:51 GMT-0400 (EDT)",
     tags : [ "funny", "ironic" ]
   }
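
A minimal sketch of how an application object might map onto a document of this shape, written in plain JavaScript so it runs without a server. The `BlogPost` class and the `toDocument` helper are hypothetical names chosen for illustration, not part of MongoDB.

```javascript
// Hypothetical application model: a post "has" an author name and a list of tags.
class BlogPost {
  constructor(text, by, tags) {
    this.text = text;
    this.by = by;
    this.date = new Date();
    this.tags = tags;
  }
}

// Hypothetical helper: the document has the same shape as the model object,
// so the mapping is a field-by-field copy with no join tables involved.
function toDocument(post) {
  return { text: post.text, by: post.by, date: post.date, tags: [...post.tags] };
}

const doc = toDocument(
  new BlogPost("A blazingly clever blog post.", "A. U. Thor", ["funny", "ironic"])
);

// In the shell, a document like this could be stored with
// db.blog_posts.insertOne(doc); an _id is generated if none is supplied.
console.log(doc.tags.length);
```

The tags live inside the post as an array, where a relational design would typically need a separate table and a join to reassemble them.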




Denormalizing Judiciously




   Most application entities turn out to have some fields that are very
   frequently altered, and other fields that are exceedingly seldom
   altered. Embedding copies of the infrequently altered attributes in
   the documents that read them is a reasonable strategy to improve
   performance.




Denormalizing Judiciously, continued

   db.product_reviews.findOne()
   { _id : ObjectId(...),
     comment : "The best thing ever!",
     date : "Mon Mar 21 2011 03:54:51 GMT-0400 (EDT)",
     reviewer : { uid : ObjectId("987654abcxyz"),
                  name : "Khan Sumer",
                  thumbnail : "thumb-123456.jpg",
                  url : "http://blahblah.com/" } }
   db.users.find({ _id : ObjectId("987654abcxyz")})
   { _id : ObjectId("987654abcxyz"),
     name : "Khan Sumer",
     thumbnail : ..., url : ...,
     last_post : "Mon Mar 21 2011 03:54:51 GMT-0400 (EDT)",
     favorites : [ ... ], friends : [ ... ] }
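
The trade-off in the review example above can be sketched in plain JavaScript, with arrays standing in for the two collections. The helper names (`reviewerSnapshot`, `addReview`, `renameUser`), the string `_id`, and the second review comment are assumptions made for illustration only.

```javascript
// In-memory stand-ins for the users and product_reviews collections.
const users = [
  { _id: "u1", name: "Khan Sumer", thumbnail: "thumb-123456.jpg",
    url: "http://blahblah.com/", favorites: [], friends: [] },
];
const productReviews = [];

// Hypothetical helper: copy only the seldom-changing display fields.
function reviewerSnapshot(user) {
  return { uid: user._id, name: user.name, thumbnail: user.thumbnail, url: user.url };
}

// Writing a review embeds the snapshot, so rendering it needs no second lookup.
function addReview(user, comment) {
  const review = { comment, date: new Date(), reviewer: reviewerSnapshot(user) };
  productReviews.push(review);
  return review;
}

// The cost of denormalizing: a (rare) profile change must be fanned out to
// every embedded copy. Against a server this would be a multi-document
// update matching { "reviewer.uid": user._id }.
function renameUser(user, newName) {
  user.name = newName;
  let touched = 0;
  for (const review of productReviews) {
    if (review.reviewer.uid === user._id) {
      review.reviewer.name = newName;
      touched += 1;
    }
  }
  return touched;
}

addReview(users[0], "The best thing ever!");
addReview(users[0], "Still great.");
console.log(renameUser(users[0], "K. Sumer")); // number of reviews rewritten
```

Frequently changing fields such as `last_post` are deliberately left out of the snapshot, since copying them would turn every post into a fan-out write.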
Design your schema for indexing



   There’s a subtle relationship between schemas and indexes.
   Consider this query:

   db.boxes.find({$where : "this.height > this.width"})

   This query doesn’t take advantage of MongoDB indexes, both
   because of the JavaScript and also because this predicate isn’t
   something MongoDB knows how to index. If this sort of query is
   important, maintain a separate boolean attribute in the document
   that records the result of the comparison; that attribute can be
   indexed and queried directly.
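
A minimal sketch of the boolean-attribute approach, using an in-memory array in place of the `boxes` collection. The field name `tall` and the helpers `withTallFlag` and `saveBox` are hypothetical.

```javascript
// In-memory stand-in for the boxes collection.
const boxes = [];

// Hypothetical helper: evaluate the predicate once, at write time,
// and store the result alongside the data it was derived from.
function withTallFlag(box) {
  return { ...box, tall: box.height > box.width };
}

function saveBox(box) {
  boxes.push(withTallFlag(box));
}

saveBox({ height: 10, width: 4 });
saveBox({ height: 3, width: 8 });
saveBox({ height: 5, width: 5 });

// Equivalent of db.boxes.find({ tall: true }): a plain equality match on a
// stored field, which an index on { tall: 1 } could serve.
const tallBoxes = boxes.filter((b) => b.tall === true);
console.log(tallBoxes.length);
```

The flag is derived data, so any code path that changes `height` or `width` also has to recompute it; routing all writes through one helper is one way to keep the two in step.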




Application-level JOINs




   Because most MongoDB documents are “richer” than RDBMS
   rows, they tend to represent “pre-JOINed” data, and so
   application-level JOIN operations should be few. However,
   sometimes you do need relational-style normalization and
   application-level JOINs. This comes up in some many-to-many
   relationships, and may not cost much in practice.
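
An application-level JOIN over a many-to-many relationship can be sketched as two lookups followed by stitching in application code. The data, the `userDocs` array, and the helper names below are hypothetical; the array stands in for a collection so the example runs without a server.

```javascript
// In-memory stand-in for a users collection; friends holds _id references
// rather than embedded copies (a many-to-many relationship).
const userDocs = [
  { _id: 1, name: "ann", friends: [2, 3] },
  { _id: 2, name: "bob", friends: [1] },
  { _id: 3, name: "cy", friends: [1] },
];

// Stand-in for db.users.findOne({ _id: id }).
function findById(id) {
  return userDocs.find((u) => u._id === id) || null;
}

// Stand-in for db.users.find({ _id: { $in: ids } }): one query for all
// referenced documents instead of one query per reference.
function findByIds(ids) {
  const wanted = new Set(ids);
  return userDocs.filter((u) => wanted.has(u._id));
}

// The "JOIN" happens in application code: two lookups, then stitching.
function userWithFriends(id) {
  const user = findById(id);
  if (user === null) return null;
  return { ...user, friends: findByIds(user.friends) };
}

console.log(userWithFriends(1).friends.map((f) => f.name));
```

Fetching all referenced documents in a single `$in` query keeps the cost at two round trips regardless of how many references the first document holds.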




Don’t treat collections as heaps




   Although MongoDB permits quite a bit of freedom in document
   structure, documents in a collection ought to share a common
   subset of attributes, for programmatic processing, effective
   indexing, and developer comprehension. If you have documents
   with very different sets of attributes, consider storing them in
   separate collections.




Don’t frequently resize documents




   Resizing a document (e.g. by adding/removing attributes or
   adding/removing elements of lists) is generally costly. (In-place
   updates are quite efficient, however.) In general, a schema whose
   documents’ sizes are highly volatile should be considered suspect;
   such data might best be stored as separate documents.




Don’t frequently resize documents, continued
   So, instead of this:
   db.urlhits.findOne()
   { _id : ..., url : "http://10gen.com",
     // this is counting with granularity of 1 day
     counts : { "2011-03-01" :
                 { firefox : 12345, chrome : 23456 },
                "2011-03-02" :
                 { firefox : 15678, chrome : 24567 },
                ... } }
   consider this:
   db.urlhits2.findOne()
   { _id : ..., url : "http://10gen.com",
     date : "2011-03-01",
     counts : { firefox : 12345, chrome : 23456 } }
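
A minimal sketch of how the one-document-per-day layout might be written to, with a `Map` standing in for the `urlhits2` collection. The `recordHit` helper is hypothetical; it mimics what an upsert with `$inc` would do on the server.

```javascript
// In-memory stand-in for the urlhits2 collection, keyed by url and date.
const urlhits2 = new Map();

// Hypothetical helper mirroring an upsert with $inc, e.g.
// db.urlhits2.updateOne({ url, date },
//                       { $inc: { "counts.firefox": 1 } },
//                       { upsert: true })
function recordHit(url, date, browser) {
  const key = `${url}|${date}`;
  if (!urlhits2.has(key)) {
    // First hit of the day: create that day's small document.
    urlhits2.set(key, { url, date, counts: {} });
  }
  const doc = urlhits2.get(key);
  // Later hits only bump a counter inside an existing document.
  doc.counts[browser] = (doc.counts[browser] || 0) + 1;
  return doc;
}

recordHit("http://10gen.com", "2011-03-01", "firefox");
recordHit("http://10gen.com", "2011-03-01", "firefox");
recordHit("http://10gen.com", "2011-03-02", "chrome");
console.log(urlhits2.size);
```

Each day's document holds at most one counter per browser, so its size stays bounded no matter how long the URL has been tracked; history grows as new documents instead of as new keys in an old one.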
Don’t frequently resize documents, continued

   So, instead of this:

   db.user_events.findOne()
   { _id : ..., user : "kreuter",
     clicks : [ { url : <url1>, time : <time1> },
                { url : <url2>, time : <time2> },
                ... ] }

   consider this:

   db.user_events.findOne()
   { _id : ..., user : "kreuter", url: <url1>, time: <time1> }
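
The one-document-per-event layout can be sketched as follows, with an array standing in for the `user_events` collection. The helper names, URLs, and numeric timestamps are hypothetical placeholders.

```javascript
// In-memory stand-in for the user_events collection: one document per click.
const userEvents = [];

// Recording a click appends a new small document; no existing document grows.
function recordClick(user, url, time) {
  const doc = { user, url, time };
  userEvents.push(doc);
  return doc;
}

// Equivalent of db.user_events.find({ user }).sort({ time: 1 }); an index
// on { user: 1, time: 1 } would be the natural companion.
function clicksFor(user) {
  return userEvents
    .filter((e) => e.user === user)
    .sort((a, b) => a.time - b.time);
}

recordClick("kreuter", "/docs", 200);
recordClick("kreuter", "/home", 100);
recordClick("someone", "/home", 150);
console.log(clicksFor("kreuter").map((e) => e.url));
```

A user's history is then reassembled by query instead of being read from a single ever-growing array, which trades one large read for an indexed range scan.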




Going forward



         www.mongodb.org — downloads, docs, community
         mongodb-user@googlegroups.com — mailing list
         #mongodb on irc.freenode.net
         try.mongodb.org — web-based shell
         10gen is hiring. Email jobs@10gen.com.
         10gen offers support, training, and advising services for
         MongoDB.




