Schema Design
Marc Schwering
Solutions Architect, MongoDB
marc@mongodb.com
@m4rcsch
All application
deployment is Schema
Design
Success comes from a
Proper Data Structure
Terminology
RDBMS      MongoDB
Database   Database
Table      Collection
Row        Document
Index      Index
Join       Embedding & Linking
Working with
Documents
What is a Document?
{
  _id: "123",
  title: "MongoDB: The Definitive Guide",
  authors: [
    { _id: "kchodorow", name: "Kristina Chodorow" },
    { _id: "mdirolf", name: "Mike Dirolf" }
  ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English",
  publisher: {
    name: "O’Reilly Media",
    founded: "1980",
    location: "CA"
  }
}
Traditional Schema Design
Focus on Data Storage
Document Schema Design
Focus on Data Usage
Traditional Schema Design
What answers do I have?
Document Schema Design
What questions do I have?
Schema Design By
Example
Library Management Application
• Patrons/Users
• Books
• Authors
• Publishers
Question:
What is a Patron’s Address?
A Patron and their Address
> patron = db.patrons.find({ _id: "joe" })
{
  _id: "joe",
  name: "Joe Bookreader"
}
> address = db.addresses.find({ _id: "joe" })
{
  _id: "joe",
  street: "123 Fake St.",
  city: "Faketon",
  state: "MA",
  zip: 12345
}
A Patron and their Address
> patron = db.patrons.find({ _id: "joe" })
{
  _id: "joe",
  name: "Joe Bookreader",
  address: {
    street: "123 Fake St.",
    city: "Faketon",
    state: "MA",
    zip: 12345
  }
}
One-to-One Relationships
• “Belongs to” relationships are often embedded.
• Holistic representation of entities with their embedded attributes and relationships.
• Optimized for read performance
Question:
What are a Patron’s Addresses?
A Patron and their Addresses
> patron = db.patrons.find({ _id: "bob" })
{
  _id: "bob",
  name: "Bob Knowitall",
  addresses: [
    { street: "1 Vernon St.", city: "Newton", … },
    { street: "52 Main St.", city: "Boston", … }
  ]
}
A Patron and their Addresses
> patron = db.patrons.find({ _id: "bob" })
{
  _id: "bob",
  name: "Bob Knowitall",
  addresses: [
    { street: "1 Vernon St.", city: "Newton", … },
    { street: "52 Main St.", city: "Boston", … }
  ]
}
> patron = db.patrons.find({ _id: "joe" })
{
  _id: "joe",
  name: "Joe Bookreader",
  address: { street: "123 Fake St.", city: "Faketon", … }
}
Migration Possibilities
• Migrate all documents when the schema changes.
• Migrate On-Demand
– As we pull up a patron’s document, we make the change.
– Any patrons that never come into the library never get updated.
• Leave it alone
– As long as the application knows about both types…
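The on-demand option can be sketched in plain JavaScript, the language of the shell examples. This is a minimal sketch, not code from the deck: `upgradePatron` is a hypothetical helper that converts the older single `address` shape into an `addresses` array when a patron document is read. The sample documents are modeled on the slides, and writing the upgraded document back to the collection is left out.

```javascript
// Hypothetical helper (not from the slides): upgrade a patron document
// from the old shape ({ address: {...} }) to the new one ({ addresses: [...] }).
function upgradePatron(patron) {
  // Already in the new shape: return the document untouched.
  if (Array.isArray(patron.addresses)) {
    return { patron, changed: false };
  }
  // Copy every field except the old `address` field.
  const { address, ...rest } = patron;
  const upgraded = { ...rest, addresses: address ? [address] : [] };
  return { patron: upgraded, changed: true };
}

// Sample documents in both shapes (field values are illustrative).
const joe = {
  _id: "joe",
  name: "Joe Bookreader",
  address: { street: "123 Fake St.", city: "Faketon", state: "MA", zip: 12345 }
};
const bob = {
  _id: "bob",
  name: "Bob Knowitall",
  addresses: [{ street: "1 Vernon St.", city: "Newton" }]
};

const joeResult = upgradePatron(joe);
const bobResult = upgradePatron(bob);
console.log(joeResult.changed, joeResult.patron.addresses.length);
console.log(bobResult.changed, bobResult.patron.addresses.length);
```

The `changed` flag lets the caller decide whether a write is needed, so patrons already in the new shape cost nothing extra.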
Question:
Who is the publisher of this
book?
Book
• MongoDB: The Definitive Guide,
• By Kristina Chodorow and Mike Dirolf
• Published: 9/24/2010
• Pages: 216
• Language: English
• Publisher: O’Reilly Media, CA
Book with embedded Publisher
> book = db.books.find({ _id: "123" })
{
  _id: "123",
  title: "MongoDB: The Definitive Guide",
  authors: [ "Kristina Chodorow", "Mike Dirolf" ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English",
  publisher: {
    name: "O’Reilly Media",
    founded: "1980",
    location: "CA"
  }
}
Book with embedded Publisher
• Optimized for read performance of Books
• Other queries become difficult
Question:
Who are all the publishers in the
system?
All Publishers
> publishers = db.publishers.find()
{
  _id: "oreilly",
  name: "O’Reilly Media",
  founded: "1980",
  location: "CA"
}
{
  _id: "penguin",
  name: "Penguin",
  founded: "1983",
  location: "CA"
}
Book with linked Publisher
> book = db.books.findOne({ _id: "123" })
{
  _id: "123",
  publisher_id: "oreilly",
  title: "MongoDB: The Definitive Guide",
  authors: [ "Kristina Chodorow", "Mike Dirolf" ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English"
}
> db.publishers.find({ _id: book.publisher_id })
{
  _id: "oreilly",
  name: "O’Reilly Media",
  founded: "1980",
  location: "CA"
}
Question:
What are all the books a
publisher has published?
Publisher with linked Books
> publisher = db.publishers.findOne({ _id: "oreilly" })
{
  _id: "oreilly",
  name: "O’Reilly Media",
  founded: "1980",
  location: "CA",
  books: [ "123", … ]
}
> books = db.books.find({ _id: { $in: publisher.books } })
Question:
Who are the authors of a given
book?
Books with linked Authors
> book = db.books.findOne({ _id: "123" })
{
  _id: "123",
  title: "MongoDB: The Definitive Guide",
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English",
  authors: [ "kchodorow", "mdirolf" ]
}
> authors = db.authors.find({ _id: { $in: book.authors } })
{ _id: "kchodorow", name: "Kristina Chodorow", hometown: … }
{ _id: "mdirolf", name: "Mike Dirolf", hometown: … }
Books with linked Authors
> book = db.books.find({ _id: "123" })
{
  _id: "123",
  title: "MongoDB: The Definitive Guide",
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English",
  authors: [
    { id: "kchodorow", name: "Kristina Chodorow" },
    { id: "mdirolf", name: "Mike Dirolf" }
  ]
}
Question:
What are all the books an author
has written?
Authors with linked Books
> authors = db.authors.find({ _id: "kchodorow" })
{
  _id: "kchodorow",
  name: "Kristina Chodorow",
  hometown: "Cincinnati",
  books: [ { id: "123", title: "MongoDB: The Definitive Guide" } ]
}
Links on both Authors and Books
> authors = db.authors.find({ _id: "kchodorow" })
{
  _id: "kchodorow",
  name: "Kristina Chodorow",
  hometown: "Cincinnati",
  books: [ { id: "123", title: "MongoDB: The Definitive Guide" } ]
}
> book = db.books.find({ _id: "123" })
{
  _id: "123",
  title: "MongoDB: The Definitive Guide",
  authors: [
    { id: "kchodorow", name: "Kristina Chodorow" },
    { id: "mdirolf", name: "Mike Dirolf" }
  ]
}
Linking vs. Embedding
• Embedding
– Great for read performance
– Writes can be slow
– Data integrity needs to be managed
• Linking
– Flexible
– Data integrity is built-in
– Work is done during reads
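The point that embedded data needs its integrity managed by the application can be illustrated with a small in-memory sketch. It assumes author names are duplicated into book documents as on the earlier slides; `renameAuthor`, the array names, and the new name value are hypothetical and exist only for illustration.

```javascript
// In-memory stand-ins for the authors and books collections.
const authorDocs = [
  { _id: "kchodorow", name: "Kristina Chodorow" },
  { _id: "mdirolf", name: "Mike Dirolf" }
];
const bookDocs = [
  {
    _id: "123",
    title: "MongoDB: The Definitive Guide",
    authors: [
      { id: "kchodorow", name: "Kristina Chodorow" },
      { id: "mdirolf", name: "Mike Dirolf" }
    ]
  }
];

// Hypothetical helper: change an author's name and propagate the change
// to every embedded copy. Returns the number of embedded copies updated.
function renameAuthor(authorId, newName) {
  const author = authorDocs.find((a) => a._id === authorId);
  if (!author) return 0;
  author.name = newName;
  let updatedCopies = 0;
  for (const book of bookDocs) {
    for (const embedded of book.authors) {
      if (embedded.id === authorId) {
        embedded.name = newName;
        updatedCopies += 1;
      }
    }
  }
  return updatedCopies;
}

const copies = renameAuthor("mdirolf", "M. Dirolf");
console.log(copies, bookDocs[0].authors[1].name);
```

With linking, only the author document would change; with embedding, every copy has to be found and rewritten, which is why duplication works best for data that rarely or never changes.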
Question:
What are all the books about
databases?
Categories as Documents
> book = db.books.find({ _id: "123" })
{
  _id: "123",
  title: "MongoDB: The Definitive Guide",
  category: "MongoDB"
}
> categories = db.categories.find({ _id: "MongoDB" })
{
  _id: "MongoDB",
  parent: "Databases"
}
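With categories stored as separate documents that each point to a parent, finding every book about databases means walking the hierarchy first. The following is a minimal in-memory sketch, assuming the category and book documents have already been loaded into arrays. `descendantCategories` and `booksAbout` are hypothetical helper names, and all sample data beyond book "123" and the "MongoDB" category is invented for illustration.

```javascript
// Category documents: each one links to its parent, as on the slide.
// The "Programming" and "Databases" entries and book "456" are assumptions.
const categories = [
  { _id: "Programming", parent: null },
  { _id: "Databases", parent: "Programming" },
  { _id: "MongoDB", parent: "Databases" }
];
const books = [
  { _id: "123", title: "MongoDB: The Definitive Guide", category: "MongoDB" },
  { _id: "456", title: "A Hypothetical Cookbook", category: "Cooking" }
];

// Collect a category plus everything below it by repeatedly looking up
// the children of each category found so far.
function descendantCategories(root, allCategories) {
  const found = new Set([root]);
  const queue = [root];
  while (queue.length > 0) {
    const current = queue.shift();
    for (const c of allCategories) {
      if (c.parent === current && !found.has(c._id)) {
        found.add(c._id);
        queue.push(c._id);
      }
    }
  }
  return found;
}

// Keep only the books whose category is in the collected set.
function booksAbout(root, allCategories, allBooks) {
  const wanted = descendantCategories(root, allCategories);
  return allBooks.filter((b) => wanted.has(b.category));
}

const databaseBooks = booksAbout("Databases", categories, books);
console.log(databaseBooks.map((b) => b._id));
```

Against a real database, each level of the walk would be a separate query on the categories collection, which is the cost this design carries.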
Categories as an Array
> book = db.books.find({ _id: "123" })
{
  _id: "123",
  title: "MongoDB: The Definitive Guide",
  categories: [ "MongoDB", "Databases", "Programming" ]
}
> db.books.find({ categories: "Databases" })
Categories as a Path
> book = db.books.find({ _id: "123" })
{
  _id: "123",
  title: "MongoDB: The Definitive Guide",
  category: "Programming/Databases/MongoDB"
}
> db.books.find({ category: /^Programming\/Databases/ })
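The anchored prefix match can be tried out in plain JavaScript. This is a sketch under assumptions: `categoryPrefix` is a hypothetical helper that escapes a category path and builds a regular expression anchored at the start of the string, and the sample paths other than the one on the slide are invented.

```javascript
// Hypothetical helper: build a regex that matches any category path
// starting with the given prefix (anchored with ^).
function categoryPrefix(path) {
  // Escape characters that have special meaning in a regular expression.
  const escaped = path.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped);
}

const prefix = categoryPrefix("Programming/Databases");
const paths = [
  "Programming/Databases/MongoDB",
  "Programming/Languages/JavaScript",
  "Cooking/Programming/Databases/Recipes"
];

// Only paths that begin with the prefix match; the same text appearing
// later in a path does not, because the expression is anchored.
const matches = paths.filter((p) => prefix.test(p));
console.log(matches);
```

Anchoring at the start of the string is what makes the pattern a prefix match, which is the property the slide's query relies on.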
Conclusion
• Schema design is different in MongoDB
• Basic data design principles stay the same
• Focus on how an application accesses/manipulates data
• Evolve the schema to meet requirements as they change
Jumpstart: Schema Design


Editor's Notes

  • #3: Schema Design is very important; its impact on your application is pervasive. We call the “dynamic” nature of a schema in MongoDB an “Application Defined Schema”.
  • #4: Wrong data structure will hurt you. Proper data structure can make all the pieces fall into place.
  • #7: A document is JSON. A value can be an integer, string, document, array, array of documents, etc…
  • #8: Focus on the way we store our data, neglecting the way we use it.
  • #9: Focus on how we use our data, neglecting (sort-of) how we store it.
  • #10: Has all the answers, but none can be given in an optimal way. Has zero knowledge of your application’s known queries, use cases, or client-side data structures.
  • #11: Has all the answers, but also knows what questions are going to be asked. Takes advantage of known queries, use cases, and client-side data structures.
  • #14: Imagine a patron walks up to the counter and presents his/her library card to check out some books. The first thing a librarian might want to do is confirm the patron’s address so as to have a place to send the library police when the book isn’t returned in a timely manner.
  • #15: This is entirely doable, and might be advantageous in a number of other use cases. But since we want to look up the patron and their address at the same time, this is inefficient as it requires 2 queries.
  • #16: Embedded directly into the patron document. Only 1 query is necessary. Holistic view of a patron.
  • #17: Read performance is optimized because we only need a single query and a single disk/memory hit. Write performance change is negligible.
  • #18: Business Requirements Change! A librarian wants to know all the places his/her book might be hiding out, and more addresses for a patron means more places to look.
  • #19: Now, just store addresses as an array. Embedded directly into the patron document. Only 1 query is necessary. Holistic view of a patron.
  • #20: Schema isn’t rigid, but dynamic. An application defines the schema, and having two ways to represent addresses is entirely possible.
  • #24: Duplicate publisher in every book that the publisher has published. Data duplication is OK because the publisher is immutable.
  • #25: Best way to figure out how something is going to perform is to measure.
  • #28: Still have the previous question, who is the publisher of this book? Takes 2 queries. Same problems that exist in traditional systems. Foreign keys, while keeping data integrity, tend to erase history.
  • #30: Unbounded arrays are BAD!
  • #33: Take advantage of data that’s immutable. Duplicate data is OK.
  • #39: Recursive search to find all books about databases.
  • #40: When a category hierarchy gets changed, all documents will need to be re-categorized. If one category name exists in multiple hierarchies, then further refinement would need to happen. Uses a multi-key index.
  • #41: When a category hierarchy gets changed, all documents will need to be re-categorized. Uses an index because of the anchored regular expression.