Modeling with Document
Databases:
5 Key Patterns
Dan Sullivan, Principal
DS Applied Technologies
Enterprise Data World 2015
Washington, D.C.
April 1, 2015
My Background
 Data Architect / Engineer
 NoSQL and relational data
modeler
 Big data
 Analytics, machine learning
and text mining
 Cloud computing

 Computational Biologist
 Author
 NoSQL for Mere Mortals
 Contributor to TechTarget
Overview
 Patterns and Data Modeling
 5 Key Patterns
 Anti-Patterns to Avoid
 Last suggestions
[Figure: four schematic documents, each containing fields, an embedded document, and an array]
Patterns and Data Modeling
Schema-less <> Model-less
 Schema-less Document
Databases
 No fixed schema
 Polymorphic documents

 ...however, not a Design Free-for-All
 Queries drive organization
 Performance Considerations
 Long-term Maintenance

 Middle Ground: Data
Model Patterns
 Reusable methods for
organizing data
 Model is implicit in
document structures
Patterns
 Commonly used structure and
organization
 Abstraction, not implementation
specific
 Popularized by “Gang of Four”
 Applied to Relational Databases
by David C. Hay
 Applies to NoSQL
Patterns
 Pattern 1: One-to-Many
 Pattern 2: Many-to-Many
 Pattern 3: Trees with References
 Pattern 4: Trees with Materialized Paths
 Pattern 5: Entity Aggregation
Pattern 1: One-to-Many
 Embed Documents
 Multiple documents
embedded
 “Many” attributes stored
with “One” document
 Pros
 Single fetch returns
primary and related data
 Might improve
performance
 Simplifies application
code
 Cons
 Increases document size
 Might degrade
performance
{
  OrderID: 1837373,
  customer: { Name: 'Jane Lox',
              Addr: '123 Main St',
              City: 'Boston',
              State: 'MA' },
  // "Many" side: order items embedded as an array in the "One" document
  orderItems: [
    { Sku: 38383838, Descr: 'Black chair' },
    { Sku: 2872636, Descr: 'Glass desk' },
    { Sku: 4747433, Descr: 'USB Drive 32GB' }
  ]
}
One-to-Many Considerations
 Query attributes in
embedded documents?
 Support for indexing
embedded documents?
 Potential for arbitrary
growth after record
created?
 Need for atomic writes?
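The embedding trade-offs above can be illustrated with a minimal Python sketch that uses plain dictionaries in place of a real document database. The `orders` store, the `orderItems` array key, and the `find_orders_with_sku` helper are assumptions introduced for illustration, not part of any particular database API.

```python
# Hypothetical in-memory "collection": order id -> order document.
orders = {
    1837373: {
        "OrderID": 1837373,
        "customer": {"Name": "Jane Lox", "Addr": "123 Main St",
                     "City": "Boston", "State": "MA"},
        # Assumed layout: the "many" side is an array of embedded documents.
        "orderItems": [
            {"Sku": 38383838, "Descr": "Black chair"},
            {"Sku": 2872636, "Descr": "Glass desk"},
            {"Sku": 4747433, "Descr": "USB Drive 32GB"},
        ],
    }
}


def find_orders_with_sku(collection, sku):
    """Return ids of orders whose embedded items include the given SKU.

    Without an index on the embedded attribute this scans every document.
    """
    return [
        order_id
        for order_id, doc in collection.items()
        if any(item["Sku"] == sku for item in doc["orderItems"])
    ]


# Single fetch: the "one" document arrives together with its "many" items.
order = orders[1837373]
item_descriptions = [item["Descr"] for item in order["orderItems"]]

# Querying an attribute that lives inside the embedded documents.
matching_ids = find_orders_with_sku(orders, 2872636)
```

The scan in `find_orders_with_sku` is the reason the indexing question matters: whether a given database can index attributes of embedded documents determines whether such a query stays cheap as the collection grows.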
Patterns
 Pattern 1: One-to-Many
 Pattern 2: Many-to-Many
 Pattern 3: Trees with References
 Pattern 4: Trees with Materialized Paths
 Pattern 5: Entity Aggregation
Pattern 2: Many-to-Many
Employees
({empID: 1783,
  fname: 'Michelle',
  lname: 'Jones',
  projects: [487, 973, 287]},
 {empID: 9872,
  fname: 'Bob',
  lname: 'Williams',
  projects: [487, 973, 121]})
Projects
({projID: 121,
  projName: 'NoSQL Pilot',
  team: [9872, 2431, ...]},
 {projID: 487,
  projName: 'Customer Churn Analysis',
  team: [1783, 9872]})
References
 Minimizes redundancy
 Preserves integrity
 Reduces document growth
 Requires multiple reads
Pattern 2: Many-to-Many
Employee
{empID: 1783,
 fname: 'Michelle',
 lname: 'Jones',
 projects: [
   {projID: 121,
    projName: 'NoSQL Pilot'},
   {projID: 487,
    projName: 'Customer Churn Analysis'}
 ]}
Project
{projID: 121,
 projName: 'NoSQL Pilot',
 team: [
   {empID: 1783,
    fname: 'Michelle',
    lname: 'Jones'},
   {empID: 9872,
    fname: 'Bob',
    lname: 'Williams'}
 ]}
Embedded Documents
 Captures point in time data
 One document read retrieves
data
 Increases document growth
Many-to-Many Considerations
 References
 Minimizes redundancy
 Preserves integrity
 Reduces document growth
 Requires multiple reads
 Embedded Documents
 Captures point in time data
 One document read retrieves
data
 Increases document growth
Patterns
 Pattern 1: One-to-Many
 Pattern 2: Many-to-Many
 Pattern 3: Trees with References
 Pattern 4: Trees with Materialized Paths
 Pattern 5: Entity Aggregation
Pattern 3: Trees with Parent & Child
References
 Trees
 Single root
document
 At most one parent
 No cycles
 Multiple Types
 Is-A
 Part-of
Pattern 3: Trees with References
Children Refs.
({orgUnitID: 178,
  orgUnitType: 'Primary',
  orgUnitName: 'P1',
  children: [179, 180]},
 {orgUnitID: 179,
  orgUnitType: 'Branch',
  orgUnitName: 'B1',
  children: [181, 182]},
 {orgUnitID: 180,
  orgUnitType: 'Branch',
  orgUnitName: 'B2',
  children: [183, 184]})
Parent Refs.
({orgUnitID: 178,
  orgUnitType: 'Primary',
  orgUnitName: 'P1',
  parent: 177},
 {orgUnitID: 179,
  orgUnitType: 'Branch',
  orgUnitName: 'B1',
  parent: 178},
 {orgUnitID: 180,
  orgUnitType: 'Branch',
  orgUnitName: 'B2',
  parent: 178})
Tree Considerations
 Child references allow for top-down navigation
 Parent references allow for bottom-up navigation
 Combining both allows for bottom-up and top-down navigation
 Avoid large arrays
 Consider need for point in
time data
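The two navigation directions can be sketched in Python over an in-memory collection. As a simplifying assumption, unit 178 is treated as the root and the two branches as leaves, and each document carries both kinds of reference; the `ancestors` and `descendants` helpers are hypothetical names.

```python
# Hypothetical org-unit collection holding both parent and child references.
org_units = {
    178: {"orgUnitID": 178, "orgUnitName": "P1",
          "parent": None, "children": [179, 180]},
    179: {"orgUnitID": 179, "orgUnitName": "B1",
          "parent": 178, "children": []},
    180: {"orgUnitID": 180, "orgUnitName": "B2",
          "parent": 178, "children": []},
}


def ancestors(collection, unit_id):
    """Bottom-up navigation: follow parent references up to the root."""
    path = []
    parent_id = collection[unit_id]["parent"]
    while parent_id is not None:
        path.append(parent_id)
        parent_id = collection[parent_id]["parent"]
    return path


def descendants(collection, unit_id):
    """Top-down navigation: follow child references depth-first."""
    found = []
    stack = list(reversed(collection[unit_id]["children"]))
    while stack:
        current = stack.pop()
        found.append(current)
        stack.extend(reversed(collection[current]["children"]))
    return found


branch_ancestors = ancestors(org_units, 179)
root_descendants = descendants(org_units, 178)
```

Each step in either direction is a separate document lookup, so deep trees cost one read per level. Storing both references speeds navigation at the price of keeping the two sides consistent on every update.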
Patterns
 Pattern 1: One-to-Many
 Pattern 2: Many-to-Many
 Pattern 3: Trees with References
 Pattern 4: Trees with Materialized Paths
 Pattern 5: Entity Aggregation
Pattern 4: Trees with Materialized
Paths
 Full path from document
to root is represented in
document
 Implement with arrays or strings
 Especially useful in hierarchical queries, e.g., a type and all its subtypes
Materialized Paths
{orderItemId: 1873,
 prodType: 'Pens',
 prodHierarchy: ['Product_Categories',
                 'Office Supplies',
                 'Writing Instruments',
                 'Pens']
}
Materialized Paths Considerations
 Support for multi-key
indexing of arrays
 Use of regular expressions
for pattern matching when
string is used
 Ability to use indexes when a string representation is used
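The array and string representations can be compared with a short Python sketch over in-memory documents. The second item, the `/` separator, and the helper names are assumptions made for illustration.

```python
import re

items = [
    {"orderItemId": 1873, "prodType": "Pens",
     "prodHierarchy": ["Product_Categories", "Office Supplies",
                       "Writing Instruments", "Pens"]},
    # Hypothetical second item, added so the queries have something to exclude.
    {"orderItemId": 1874, "prodType": "Desks",
     "prodHierarchy": ["Product_Categories", "Furniture", "Desks"]},
]


def in_category_array(doc, category):
    """Array form: a membership test on the path array."""
    return category in doc["prodHierarchy"]


def path_string(doc, sep="/"):
    """String form of the same path, root first (separator is an assumption)."""
    return sep.join(doc["prodHierarchy"])


def in_subtree_string(doc, prefix_path):
    """String form: left-anchored prefix match on the path string."""
    pattern = re.compile("^" + re.escape(prefix_path) + "(/|$)")
    return pattern.search(path_string(doc)) is not None


# A type and all its subtypes, queried both ways.
writing = [d["orderItemId"] for d in items
           if in_category_array(d, "Writing Instruments")]
office = [d["orderItemId"] for d in items
          if in_subtree_string(d, "Product_Categories/Office Supplies")]
```

The pattern is anchored at the start of the string on purpose. In some databases, MongoDB among them, a left-anchored, case-sensitive prefix pattern can be served from an index on the string field, whereas an unanchored pattern generally cannot.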
Patterns
 Pattern 1: One-to-Many
 Pattern 2: Many-to-Many
 Pattern 3: Trees with References
 Pattern 4: Trees with Materialized Paths
 Pattern 5: Entity Aggregation
Pattern 5: Entity Aggregation
 Entities with sub-types
 Relational models use
multiple tables
 Document models use
varying embedded
documents
 Source of
polymorphism
Entity Aggregation Polymorphic
Documents
{concertID: 132,
 locDescr: 'Small Bar',
 price: '$30.00',
 performerName: 'Rolling Stones'}
{concertID: 133,
 locDescr: 'PDX Jazz Festival',
 price: '$75.00',
 festivalStart: '15-Feb',
 festivalEnd: '25-Feb'}
Entity Aggregation Considerations
 Aggregation vs Separate
Collections
 High level branching
 Low level branching
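One way to picture the branching that polymorphic documents push into application code is the Python sketch below. It keeps both concert sub-types in a single collection and branches on which attributes are present; the `describe` helper and its output format are hypothetical, and this is only one reading of the branching consideration above.

```python
# Both sub-types live in one collection (aggregation rather than
# separate collections).
concerts = [
    {"concertID": 132, "locDescr": "Small Bar", "price": "$30.00",
     "performerName": "Rolling Stones"},
    {"concertID": 133, "locDescr": "PDX Jazz Festival", "price": "$75.00",
     "festivalStart": "15-Feb", "festivalEnd": "25-Feb"},
]


def describe(doc):
    """Format a concert, branching on the sub-type attributes it carries."""
    base = f"{doc['locDescr']} ({doc['price']})"  # attributes shared by all
    if "performerName" in doc:
        return f"{base}: {doc['performerName']}"
    if "festivalStart" in doc and "festivalEnd" in doc:
        return f"{base}: {doc['festivalStart']} to {doc['festivalEnd']}"
    return base


descriptions = [describe(d) for d in concerts]
```

Here the shared attributes are handled once and the branch happens late, close to the attributes that differ. With separate collections the branch would instead happen earlier, when the application chooses which collection to query.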
Anti-Patterns
 Large arrays
 Significant growth in
document size
 Fetching more data than
needed
 Fear of data duplication
 Thinking SQL, using
NoSQL
 Normalizing without need
Closing Remarks
 Consider: is it worth it to lose:
 SQL
 Multi-statement transactions
 Triggers
 Let queries drive model
 Consider full life-cycle
 Exploit polymorphism
Questions?
