Advanced Document
Modeling Techniques
Tales from a High Scale Commerce Platform
Who am I?
• Jonathan Roeder @superduperjon
• Architect and Manager for Volusion Inc., working on Mozu
• 14 years building high-scale commerce with document
databases
• 4 years developing Mozu with MongoDB
• Motto : Why show a picture when you can write 1000 words.
What is Mozu?
• The Cloud Commerce Platform
• Powerful merchant capabilities
• Rich 3rd party developer extensibility
• Multi-Tenant SaaS
• Launched Jan 2014, after three years of ground-up
development
MongoDB is the backing store for Mozu’s:
• Shopping Carts
• Order Management System
• Distributed Cache
• Content Management System
• Logs
• Login Sessions
• 3rd Party Developer File System
• Document Database for 3rd
Party developers
What is Mozu Content Management System?
• Empowers developers and merchants to:
• Create schemas (content types)
• Author data conforming to schema (content items)
• Use data to construct commerce experiences
Mozu CMS @ 30,000 ft, in pencil!
Example Content Item (Abridged):
> content = db.content.find( { _id : 1 } )
{
_id: 1,
name: "About Us",
contentType: "webpage",
properties : {
body : "<div>…</div>",
tags : ["tag1" , "tag2"],
moreData : {…}
}
}
Fantastic background! Now what?
• Implement a feature
• Consider concerns
• Explore patterns to conquer the considered concerns
• Cool?
• Cool
Feature: CMS Publishing
• Merchants can draft, preview and publish content changes
• A content item may have the following versions:
• 0 or 1 Drafted
• 0 or 1 Published
• 0 or more Archived
• CMS can return draft version or published version
• Seems easy!?
Content schema with Publishing. Take 1:
> drafted = db.content.find( { _id : 2 } )
{
_id: 2,
contentId: 1,
publishState: "drafted",
name: "About Us",
contentType: "webpage",
properties : {…}
}
> published = db.content.find( { _id : 1 } )
{
_id: 1,
contentId: 1,
publishState: "published",
name: "About Us",
contentType: "webpage",
properties : {…}
}
NUANCE!
• When only a published version exists, it must pretend it’s also
the draft.
• It makes sense! While administering content, a merchant wants
to see all drafts AND all published that have no draft.
Content schema with Publishing. Take 1.5:
> content = db.content.find( { _id : 2 } )
{
_id: 2,
contentId: 1,
publishState: "publishedAndDraft",
name: "About Us",
contentType: "webpage",
}
Publish State Cheat Sheet:
• drafted
• publishedAndDraft (there is a published version but no draft)
• published (there is a published AND a draft)
Operations
Publishing aware queries. Take 1:
• Query published content:
>published = db.content.find(
{publishState : {$in: ["publishedAndDraft", "published"]}}
)
• Query draft content:
>drafted = db.content.find(
{publishState : {$in: ["drafted", "publishedAndDraft"]}}
)
Creating a draft. Take 1:
• Insert the draft:
>db.content.insert( { _id : 2, contentId:1, …,
publishState : "drafted"})
• Remove the published content from double duty:
>db.content.update(
{contentId: 1, publishState: "publishedAndDraft"},
{$set: {publishState: "published" }}
)
Publishing. Take 1:
• Archive the previous published version (if present):
>db.content.update(
{contentId:1, publishState:"published"},
{ $set: { publishState: "archived" } }
)
• Publish the draft:
>db.content.update(
{contentId: 1, publishState:"drafted"},
{$set: {publishState: "publishedAndDraft" }}
)
Did you spot the problems? There were lots!
• Hints!
• Creating drafts and publishing drafts each require multiple
database operations.
• Truth is being transitioned across these operations.
Concurrency defined:
When multiple things try to do stuff at the same time and it’s all
okay.
Problem 1 – Isolation:
• When someone walks in on you in the changing room
• Publish:
• Interim reads from other clients would see that NO published
content exists.
Archive → Interim → Publish
Problem 2 – Atomicity:
• All or nothing, baby
• What if we have a Database or App failure after the first
operation or during second operation?
• We end up stuck in interim state, with either two published
versions or none
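To make the interim state concrete, here is a minimal sketch that replays the two-step publish from Take 1 against a plain in-memory array. The array and the helpers `findPublished` and `setState` are hypothetical stand-ins for the collection, the published query and the `$set` update; they are not MongoDB APIs.

```javascript
// In-memory stand-in for the content collection (not a MongoDB API).
const content = [
  { _id: 1, contentId: 1, publishState: "published" },
  { _id: 2, contentId: 1, publishState: "drafted" },
];

// Mirrors the "query published content" filter from Take 1.
function findPublished(docs) {
  return docs.filter(d => ["publishedAndDraft", "published"].includes(d.publishState));
}

// Mirrors an update that filters on publishState and $sets a new one.
function setState(docs, contentId, from, to) {
  let modified = 0;
  for (const d of docs) {
    if (d.contentId === contentId && d.publishState === from) {
      d.publishState = to;
      modified++;
    }
  }
  return modified;
}

// Step 1: archive the previous published version.
setState(content, 1, "published", "archived");

// Interim state: a reader arriving now finds no published content,
// and a crash here would leave the data like this.
const interimPublished = findPublished(content).length;

// Step 2: publish the draft.
setState(content, 1, "drafted", "publishedAndDraft");
const finalPublished = findPublished(content).length;

console.log(interimPublished, finalPublished); // prints "0 1"
```

The gap between the two `setState` calls is where both the isolation problem and the atomicity problem live.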
Can we do better?
Time Out! Public Service Announcement:
• Know MongoDB
• Specifically, know Update Operators
• http://docs.mongodb.org/manual/reference/operator/update/
Can we do better?
• Let’s use MongoDB’s update operators!
Publish State Cheat Sheet. Take 2:
• 1 = drafted
• 2 = publishedAndDraft
• 3 = published
Content schema, with Publishing. Take 2:
> drafted = db.content.find( { _id : 2 } )
{
_id: 2,
contentId: 1,
publishState: 1,
name: "About Us",
contentType: "webpage",
}
> published = db.content.find( { _id : 1 } )
{
_id: 1,
contentId: 1,
publishState: 3,
name: "About Us",
contentType: "webpage",
}
Publishing aware queries. Take 2:
• Query published content:
>published = db.content.find( { publishState : {$in:[2,3]} } )
• Query draft content:
>drafted = db.content.find( { publishState : {$in:[1,2]} } )
Publishing. Take 2:
>db.content.update(
{contentId: 1},
{ $inc: { publishState: 1 } },
{ multi : true}
)
• $inc is the increment operator.
• multi : true updates multiple documents: both the drafted and the published.
Better!
• Now only one client operation to publish!
• App tier can’t cause initial partial failure.
• Arguably less observable interim state.
• Notice that I bolded and underlined and italicized client above.
• Allows us to use the $isolated operator.
Worse!!
• Concurrent attempts to publish would cause too many $inc
operations.
The Same:
• Still not Atomic.
• Still not truly Isolated, without using $isolated.
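A small sketch of the "Worse!!" case, assuming an in-memory array in place of the collection and a hypothetical `incAll` helper in place of the multi-document `$inc` update. The deck does not name any state above 3, so reading 4 as "archived" is an assumption made only for this illustration.

```javascript
// In-memory stand-in for:
//   update({contentId}, {$inc: {publishState: 1}}, {multi: true})
function incAll(docs, contentId) {
  let modified = 0;
  for (const d of docs) {
    if (d.contentId === contentId) {
      d.publishState += 1;
      modified++;
    }
  }
  return modified;
}

const docs = [
  { _id: 1, contentId: 1, publishState: 3 }, // published
  { _id: 2, contentId: 1, publishState: 1 }, // drafted
];

incAll(docs, 1); // first publish: draft 1 -> 2, old published 3 -> 4
const afterOne = docs.map(d => d.publishState);

incAll(docs, 1); // a duplicate publish increments everything again
const afterTwo = docs.map(d => d.publishState);

console.log(afterOne, afterTwo);
```

After the duplicate publish, the new version sits at 3 ("published, and a draft exists") even though no draft exists, which is why the unguarded increment needs the query criteria introduced later.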
What about creating a draft?
Pattern #1: Benign Write
• “Stage” a change in MongoDB
• Craft queries to exclude it until appropriate
• Use update operators to “commit” the staged change
Publish State Cheat Sheet. Take 3:
• 0 = uncommittedDraft
• 1 = drafted
• 2 = publishedAndDraft
• 3 = published
Content schema, with Publishing. Take 2.5:
> content = db.content.find( { _id : 2 })
{
_id: 2,
contentId: 1,
publishState: 0,
name: "About Us",
contentType: "webpage",
properties : {}
}
> content = db.content.find( { _id : 1 })
{
_id: 1,
contentId: 1,
publishState: 2,
name: "About Us",
contentType: "webpage",
properties : {}
}
Publishing aware queries. (No Change)
• Query published content:
>content = db.content.find( {publishState : {$in:[2,3]}})
• Query drafted content:
>content = db.content.find( {publishState : {$in:[1,2]}})
Note that we never look for publishState 0.
Creating a draft. Take 2:
• Insert uncommitted draft:
>db.content.insert( { _id : 2, contentId:1, …, publishState : 0})
• Commit draft:
>db.content.update(
{contentId: 1},
{ $inc: { publishState: 1 } },
{ multi : true}
)
Same Operation as Publish
Draft AND Published updated
Benign Write: A closer look
• Reduce latency when coordinating changes across documents
• If application fails after inserting uncommitted draft, nothing bad
happens!
• Look into advanced things you can do with arrays.
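The stage-then-commit flow can be sketched without a database. The array `store` and the helpers below are hypothetical in-memory stand-ins for the collection and for the two publishing-aware queries; only the state numbers come from the cheat sheet.

```javascript
// One existing version that is both published and the draft (state 2).
const store = [{ _id: 1, contentId: 1, publishState: 2 }];

const findPublished = docs => docs.filter(d => [2, 3].includes(d.publishState));
const findDrafted = docs => docs.filter(d => [1, 2].includes(d.publishState));
const ids = docs => docs.map(d => d._id);

// Stage: insert the uncommitted draft (state 0). No query selects state 0,
// so a crash right after this insert leaves readers unaffected.
store.push({ _id: 2, contentId: 1, publishState: 0 });
const draftedWhileStaged = ids(findDrafted(store));
const publishedWhileStaged = ids(findPublished(store));

// Commit: one increment over every version moves 0 -> 1 and 2 -> 3.
for (const d of store) {
  if (d.contentId === 1) d.publishState += 1;
}
const draftedAfterCommit = ids(findDrafted(store));
const publishedAfterCommit = ids(findPublished(store));

console.log(draftedWhileStaged, draftedAfterCommit); // prints "[ 1 ] [ 2 ]"
```

While the draft is only staged, both queries still return the original version; after the commit, the draft query returns the new version and the published query is unchanged.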
Otherwise, all problems solved!
I lied, there are lots more problems:
• What if two clients concurrently attempt to create a draft of
contentId 1? (Race condition/Dirty read)
• What if two clients concurrently or even serially attempt to
publish contentId 1? (Optimistic/pessimistic concurrency)
Pattern #2: Concurrency Control with unique
indexes.
• Let MongoDB enforce truth
• http://docs.mongodb.org/manual/core/write-operations-atomicity/#concurrency-control
Unique Index:
• Each document (version) follows from a known predecessor
• We can ensure that a predecessor can be used only once
>db.content.createIndex( { contentId : 1 , predId : 1 }, { unique : true } )
Content schema, now with Publishing. Take 3:
> content = db.content.find( { _id : 2 } )
{
_id: 2,
contentId: 1,
predId: 1,
publishState: 0,
name: "About Us",
contentType: "webpage",
properties : {}
}
> content = db.content.find( { _id : 1 } )
{
_id: 1,
contentId: 1,
predId: null,
publishState: 2,
name: "About Us",
contentType: "webpage",
properties : {}
}
Publishing aware queries. (No Change)
• Query published content:
>content = db.content.find( {publishState : {$in:[2,3]}})
• Query drafted content:
>content = db.content.find( {publishState : {$in:[1,2]}})
Creating a draft. Take 3:
>myPredId = db.content.findOne( {contentId:1, publishState:2} )._id
>db.content.insert( { _id : 2, contentId:1, predId: myPredId, …,
publishState: 0})
>db.content.update(
{contentId: 1},
{ $inc: { publishState: 1 } },
{ multi : true}
)
Unique index, a closer look:
• If insert fails due to duplicate key violation, a concurrent client
beat us to creating a draft.
• Depending on concurrency approach, application can retry or
fail
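A rough sketch of how the unique (contentId, predId) constraint settles a race between two clients. `VersionStore` and `tryCreateDraft` are hypothetical names, and the uniqueness check is simulated with a `Set`; a real deployment would rely on the index shown above and on the driver's duplicate key error instead.

```javascript
// Hypothetical in-memory collection enforcing unique (contentId, predId).
class VersionStore {
  constructor() {
    this.docs = [];
    this.uniqueKeys = new Set();
  }

  insert(doc) {
    const key = JSON.stringify([doc.contentId, doc.predId]);
    if (this.uniqueKeys.has(key)) {
      throw new Error("duplicate key: " + key);
    }
    this.uniqueKeys.add(key);
    this.docs.push(doc);
  }
}

const versions = new VersionStore();
versions.insert({ _id: 1, contentId: 1, predId: null, publishState: 2 });

// Both clients read _id 1 as the predecessor, then try to stage a draft.
function tryCreateDraft(store, id, predId) {
  try {
    store.insert({ _id: id, contentId: 1, predId, publishState: 0 });
    return "created";
  } catch (err) {
    return "lost race"; // caller may retry against fresh state, or fail
  }
}

const clientA = tryCreateDraft(versions, 2, 1);
const clientB = tryCreateDraft(versions, 3, 1);

console.log(clientA, clientB); // prints "created lost race"
```

Whichever client inserts second gets the error, so the predecessor is used only once no matter how the two requests interleave.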
I lied, there are lots more problems:
• What if two clients concurrently attempt to create a draft of
contentId 1?
• What if two clients concurrently or even serially attempt to
publish contentId 1?
Pattern #3: Update-if-Current, a.k.a Benign Wrong:
• Only update documents that match known good state.
• Discriminate via the update operator’s query argument.
• React to the number of results updated.
• http://docs.mongodb.org/manual/tutorial/update-if-current/
Creating a draft. Take 4:
>myPredId = db.content.findOne( {contentId:1, publishState:2} )._id
>db.content.insert( { _id : 2, contentId:1, predId: myPredId, …,
publishState : 0})
>db.content.update(
{contentId: 1, publishState: { $in: [0,2]}},
{ $inc: { publishState: 1 } },
{ multi : true}
)
Query Criteria
Creating a draft. Take 4.5:
>myPredId = db.content.findOne( {contentId:1, publishState:2} )._id
>db.content.insert( { _id : 2, contentId:1, predId: myPredId, …,
publishState : 0})
>db.content.update(
{contentId: 1, publishState: { $in: [0,2]},
$or: [ { _id: myPredId}, { predId: myPredId} ]},
{ $inc: { publishState: 1 } },
{ multi : true}
)
Even More Query Criteria!
Publishing. Take 3:
>db.content.update(
{contentId: 1, publishState: { $in: [1,3]}},
{ $inc: { publishState: 1 } },
{ multi : true}
)
Query Criteria
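"React to the number of results updated" can be sketched as follows. `incIfCurrent` is a hypothetical in-memory stand-in for the guarded multi-document `$inc`; it returns the count of modified documents, which is the value an application would inspect on the real update result.

```javascript
// In-memory stand-in for a multi-document update whose query only matches
// documents in an expected state. Returns how many documents were modified.
function incIfCurrent(docs, contentId, expectedStates) {
  let modified = 0;
  for (const d of docs) {
    if (d.contentId === contentId && expectedStates.includes(d.publishState)) {
      d.publishState += 1;
      modified++;
    }
  }
  return modified;
}

const items = [
  { _id: 1, contentId: 1, publishState: 3 }, // published
  { _id: 2, contentId: 1, publishState: 1 }, // drafted
];

// First publish matches both documents: 3 -> 4 and 1 -> 2.
const firstPublish = incIfCurrent(items, 1, [1, 3]);

// A repeated or concurrent publish no longer matches anything, so it is
// a no-op instead of an extra increment.
const secondPublish = incIfCurrent(items, 1, [1, 3]);

console.log(firstPublish, secondPublish); // prints "2 0"
```

A count of 0 tells the caller that someone else already published, so it can report that or re-read the current state rather than pushing the documents further.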
I lied, there are lots more problems:
• What if two clients concurrently attempt to create a draft of
contentId 1?
• What if two clients concurrently or even serially attempt to
publish contentId 1?
Now all problems solved??
• No atomicity, but how about eventual consistency?
• MongoDB.org describes a pattern for pseudo two phase
commits
• We’d be forced back into orchestrating multiple updates from
the App tier
• But good inspiration.
Short Breather
Bonus Round!
Alternate Approach: Embedded Documents
• MongoDB supports Atomicity and Isolation @ document level.
• Therefore, model schema with embedded documents when
possible.
• Store Draft and Published versions on the same document.
Content schema, now with Publishing:
> content = db.content.find( { _id : 1 } )
{
_id: 1,
name: "About Us",
contentType: "webpage",
publishState: [ ],
draftedContent: { },
publishedContent: { }
}
Publishing aware queries:
• Query published content:
>content = db.content.find( {publishState : "published"})
• Query drafted content:
>content = db.content.find(
{publishState : {$in: ["published","drafted"]}}
)
Draft Content Schema:
> content = db.content.find( { _id : 1 } )
{
_id: 1,
name: "About Us",
contentType: "webpage",
publishState: ["drafted"],
draftedContent: { },
draftedToPublishContent: { },
}
Remember publishedAndDraft
Publishing:
>db.content.update(
{_id: 1},
{
$addToSet: { publishState: "published" },
$rename : { "draftedToPublishContent" : "publishedContent" }
}
)
Published Content Schema:
> content = db.content.find( { _id : 1 } )
{
_id: 1,
name: "About Us",
contentType: "webpage",
publishState: ["drafted", "published"],
draftedContent: { },
publishedContent: { },
}
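The before and after shapes of the embedded approach can be reproduced with a small sketch. `publishEmbedded` is a hypothetical function that imitates the effect of `$addToSet` plus `$rename` on a plain object; the `body: "v2"` values are made up for illustration.

```javascript
// Builds the published form of one document in a single step, imitating
// $addToSet on publishState and $rename of the staged content field.
function publishEmbedded(doc) {
  const next = { ...doc, publishState: [...doc.publishState] };
  if (!next.publishState.includes("published")) {
    next.publishState.push("published");
  }
  if ("draftedToPublishContent" in next) {
    next.publishedContent = next.draftedToPublishContent;
    delete next.draftedToPublishContent;
  }
  return next; // the caller swaps in the whole document at once
}

const before = {
  _id: 1,
  name: "About Us",
  contentType: "webpage",
  publishState: ["drafted"],
  draftedContent: { body: "v2" },           // made-up content
  draftedToPublishContent: { body: "v2" },  // made-up content
};

const after = publishEmbedded(before);
const again = publishEmbedded(after); // repeating the publish changes nothing

console.log(after.publishState); // prints "[ 'drafted', 'published' ]"
```

Because everything lives in one document, there is no second document to fall out of step, and repeating the publish leaves the result unchanged.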
The Good / Bad / Ugly:
• Good
• Isolation and Atomicity
• Fewer moving parts
• Bad
• Larger Document Sizes
• Ugly
• Indexing Properties would require double the indexes
Strong finish!
• Know what you’re getting into with Isolation and Atomicity
• Consider embedding documents where feasible
• Employ patterns to preempt or remediate concurrency issues
• Include pictures and jokes in your slide decks
Even more slides
Eventual Consistency Approach:
• Our publishing scenarios just move documents (versions)
between states
• We can easily detect and fix inconsistencies if we know where
to look
Pseudo Transaction:
>db.transactions.insert(
{ _id : 1, type : "publish" , scope : { contentId : 1 } }
)
Tag a draft or publish operation with transaction id:
>publishResult = db.content.update(
{contentId: 1, publishState: { $in: [1,3]}},
{ $inc: { publishState: 1 }, $set: {transactionId: 1} },
{ multi : true}
)
Pseudo Transaction, Part II:
• If publishResult communicates no errors, we’re done! Clean up.
>db.content.update(
{transactionId : 1 },
{$unset : {transactionId : ""}},
{ multi : true}
)
>db.transactions.remove( { _id : 1 } )
Remove transaction reference
Pseudo Transaction, Part III:
• If publishResult returns errors, remediate
• For unobserved errors, background thread polls transaction
collection
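The pseudo transaction life cycle, sketched with in-memory arrays. `beginPublish`, `applyPublish`, `cleanUp` and `pendingTransactions` are hypothetical helpers standing in for the insert, the tagged update, the clean-up and the background poll; none of them is a MongoDB API.

```javascript
const transactions = [];
const pages = [
  { _id: 1, contentId: 1, publishState: 3 }, // published
  { _id: 2, contentId: 1, publishState: 1 }, // drafted
];

// Record the intent before touching any content.
function beginPublish(txId, contentId) {
  transactions.push({ _id: txId, type: "publish", scope: { contentId } });
}

// Guarded increment that also tags each touched document.
function applyPublish(txId, contentId) {
  let modified = 0;
  for (const d of pages) {
    if (d.contentId === contentId && [1, 3].includes(d.publishState)) {
      d.publishState += 1;
      d.transactionId = txId;
      modified++;
    }
  }
  return modified;
}

// On success: untag the documents and drop the transaction record.
function cleanUp(txId) {
  for (const d of pages) {
    if (d.transactionId === txId) delete d.transactionId;
  }
  const i = transactions.findIndex(t => t._id === txId);
  if (i !== -1) transactions.splice(i, 1);
}

// What a background poller would look at: records that were never cleaned up.
function pendingTransactions() {
  return transactions.map(t => t._id);
}

beginPublish(1, 1);
const modified = applyPublish(1, 1);
const pendingBeforeCleanup = pendingTransactions();
cleanUp(1);
const pendingAfterCleanup = pendingTransactions();

console.log(modified, pendingBeforeCleanup, pendingAfterCleanup); // prints "2 [ 1 ] []"
```

If the application dies between `applyPublish` and `cleanUp`, the record stays in `transactions`, which is exactly what the poller needs in order to find the affected content and remediate it.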
Pseudo Transaction, Part IV
• Each document (version) is linked via predId
• Each linked document should be one publishState increment
apart
• Detect inconsistencies and attempt roll forward or rollback
• Rinse, repeat

More Related Content

PDF
Learn Learn how to build your mobile back-end with MongoDB
PPTX
Webinar: Back to Basics: Thinking in Documents
KEY
MongoDB, PHP and the cloud - php cloud summit 2011
PDF
The Fine Art of Schema Design in MongoDB: Dos and Don'ts
PDF
Building your first app with mongo db
PPTX
User Data Management with MongoDB
PPT
MongoDB Schema Design
PDF
Webinar: Building Your First App with MongoDB and Java
Learn Learn how to build your mobile back-end with MongoDB
Webinar: Back to Basics: Thinking in Documents
MongoDB, PHP and the cloud - php cloud summit 2011
The Fine Art of Schema Design in MongoDB: Dos and Don'ts
Building your first app with mongo db
User Data Management with MongoDB
MongoDB Schema Design
Webinar: Building Your First App with MongoDB and Java

What's hot (20)

PPTX
Indexing Strategies to Help You Scale
PPTX
Socialite, the Open Source Status Feed
PDF
Building your first app with MongoDB
PPTX
Android and firebase database
POTX
Mobile 1: Mobile Apps with MongoDB
PDF
Storing tree structures with MongoDB
PPTX
Socialite, the Open Source Status Feed Part 3: Scaling the Data Feed
KEY
Practical Ruby Projects With Mongo Db
PPTX
Socialite, the Open Source Status Feed Part 2: Managing the Social Graph
KEY
MongoDB and PHP ZendCon 2011
PDF
Mongo and Harmony
PDF
Entity Relationships in a Document Database at CouchConf Boston
PPTX
Conceptos básicos. Seminario web 4: Indexación avanzada, índices de texto y g...
PPTX
Back to Basics Webinar 2: Your First MongoDB Application
PPTX
Mongo db queries
PPTX
Socialite, the Open Source Status Feed Part 1: Design Overview and Scaling fo...
PPTX
Back to Basics: My First MongoDB Application
PDF
Starting with MongoDB
PDF
MongoDB
KEY
Modeling Data in MongoDB
Indexing Strategies to Help You Scale
Socialite, the Open Source Status Feed
Building your first app with MongoDB
Android and firebase database
Mobile 1: Mobile Apps with MongoDB
Storing tree structures with MongoDB
Socialite, the Open Source Status Feed Part 3: Scaling the Data Feed
Practical Ruby Projects With Mongo Db
Socialite, the Open Source Status Feed Part 2: Managing the Social Graph
MongoDB and PHP ZendCon 2011
Mongo and Harmony
Entity Relationships in a Document Database at CouchConf Boston
Conceptos básicos. Seminario web 4: Indexación avanzada, índices de texto y g...
Back to Basics Webinar 2: Your First MongoDB Application
Mongo db queries
Socialite, the Open Source Status Feed Part 1: Design Overview and Scaling fo...
Back to Basics: My First MongoDB Application
Starting with MongoDB
MongoDB
Modeling Data in MongoDB
Ad

Similar to Advanced Document Modeling Techniques from a High-Scale Commerce Platform (20)

PPTX
Concurrency Patterns with MongoDB
PDF
Building Apps with MongoDB
PDF
MongoDB for Coder Training (Coding Serbia 2013)
PDF
MongoDB.pdf
PPTX
Introduction to MongoDB – A NoSQL Database
KEY
Schema Design (Mongo Austin)
PPTX
MongoDB_ppt.pptx
PPT
9. Document Oriented Databases
PDF
Mongodb in-anger-boston-rb-2011
PPTX
Mongo db tips and advance features
PDF
MongoDB Meetup
PDF
MongoDB Tokyo - Monitoring and Queueing
KEY
Schema Design
PDF
MongoDB - How to model and extract your data
PPT
Building Your First MongoDB App ~ Metadata Catalog
PDF
The emerging world of mongo db csp
PPTX
Webinar: Scaling MongoDB
PPTX
Webinar: General Technical Overview of MongoDB for Dev Teams
PPTX
Document validation in MongoDB 3.2
PPTX
Webinar: Building Your First Application with MongoDB
Concurrency Patterns with MongoDB
Building Apps with MongoDB
MongoDB for Coder Training (Coding Serbia 2013)
MongoDB.pdf
Introduction to MongoDB – A NoSQL Database
Schema Design (Mongo Austin)
MongoDB_ppt.pptx
9. Document Oriented Databases
Mongodb in-anger-boston-rb-2011
Mongo db tips and advance features
MongoDB Meetup
MongoDB Tokyo - Monitoring and Queueing
Schema Design
MongoDB - How to model and extract your data
Building Your First MongoDB App ~ Metadata Catalog
The emerging world of mongo db csp
Webinar: Scaling MongoDB
Webinar: General Technical Overview of MongoDB for Dev Teams
Document validation in MongoDB 3.2
Webinar: Building Your First Application with MongoDB
Ad

More from MongoDB (20)

PDF
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
PDF
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
PDF
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
PDF
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
PDF
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
PDF
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
PDF
MongoDB SoCal 2020: MongoDB Atlas Jump Start
PDF
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
PDF
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
PDF
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
PDF
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
PDF
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
PDF
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
PDF
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
PDF
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
PDF
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
PDF
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
PDF
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
PDF
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
PDF
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...

Recently uploaded (20)

PPTX
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
PDF
Chapter 3 Spatial Domain Image Processing.pdf
PDF
Reach Out and Touch Someone: Haptics and Empathic Computing
PPT
“AI and Expert System Decision Support & Business Intelligence Systems”
PDF
Agricultural_Statistics_at_a_Glance_2022_0.pdf
PPTX
ACSFv1EN-58255 AWS Academy Cloud Security Foundations.pptx
PDF
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
PPTX
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
PDF
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
PDF
Approach and Philosophy of On baking technology
PDF
Empathic Computing: Creating Shared Understanding
PDF
Advanced methodologies resolving dimensionality complications for autism neur...
PDF
Review of recent advances in non-invasive hemoglobin estimation
PPTX
Big Data Technologies - Introduction.pptx
PDF
Building Integrated photovoltaic BIPV_UPV.pdf
PDF
The Rise and Fall of 3GPP – Time for a Sabbatical?
PDF
Electronic commerce courselecture one. Pdf
PPTX
Cloud computing and distributed systems.
PDF
Dropbox Q2 2025 Financial Results & Investor Presentation
PDF
Machine learning based COVID-19 study performance prediction
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
Chapter 3 Spatial Domain Image Processing.pdf
Reach Out and Touch Someone: Haptics and Empathic Computing
“AI and Expert System Decision Support & Business Intelligence Systems”
Agricultural_Statistics_at_a_Glance_2022_0.pdf
ACSFv1EN-58255 AWS Academy Cloud Security Foundations.pptx
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
Approach and Philosophy of On baking technology
Empathic Computing: Creating Shared Understanding
Advanced methodologies resolving dimensionality complications for autism neur...
Review of recent advances in non-invasive hemoglobin estimation
Big Data Technologies - Introduction.pptx
Building Integrated photovoltaic BIPV_UPV.pdf
The Rise and Fall of 3GPP – Time for a Sabbatical?
Electronic commerce courselecture one. Pdf
Cloud computing and distributed systems.
Dropbox Q2 2025 Financial Results & Investor Presentation
Machine learning based COVID-19 study performance prediction

Advanced Document Modeling Techniques from a High-Scale Commerce Platform

  • 1. Advanced Document Modeling Techniques Tales from a High Scale Commerce Platform
  • 2. Who am I? • Jonathan Roeder @superduperjon • Architect and Manager for Volusion Inc., working on Mozu • 14 years building high-scale commerce with document databases • 4 years developing Mozu with MongoDB • Motto : Why show a picture when you can write 1000 words.
  • 3. What is Mozu? • The Cloud Commerce Platform • Powerful merchant capabilities • Rich 3rd party developer extensibility • Multi-Tenant SaaS • Launched Jan 2014, after three years of ground-up development
  • 4. MongoDB is the backing store for Mozu’s: • Shopping Carts • Order Management System • Distributed Cache • Content Management System • Logs • Login Sessions • 3rd Party Developer File System • Document Database for 3rd Party developers
  • 5. What is Mozu Content Management System? • Empowers developers and merchants to: • Create schemas (content types) • Author data conforming to schema (content items) • Use data to construct commerce experiences
  • 6. Mozu CMS @ 30,000 ft, in pencil!
  • 8. Example Content Item (Abridged): > content = db.content.find( { _id : 1 } ) { _id: 1, name: “About Us”, contentType: “webpage”, properties : { body : “<div>…</div>”, tags : [“tag1” , “tag2”], moreData : {…} } }
  • 9. Fantastic background! Now what? • Implement a feature • Consider concerns • Explore patterns to conquer the considered concerns • Cool? • Cool
  • 10. Feature: CMS Publishing • Merchants can draft, preview and publish content changes • A content item may have the following versions: • 0 or 1 Drafted • 0 or 1 Published • 0 or more Archived • CMS can return draft version or published version • Seems easy!?
  • 11. Content schema with Publishing. Take 1: > drafted = db.content.find( { _id : 2 } ) { _id: 2, contentId: 1, publishState: “drafted”, name: “About Us”, contentType: “webpage”, properties : {…} } > published = db.content.find( { _id : 1 } ) { _id: 1, contentId: 1, publishState: “published”, name: “About Us”, contentType: “webpage”, properties : {…} }
  • 12. NUANCE! • When only a published version exists, it must pretend it’s also the draft. • It makes sense! While administering content, a merchant wants to see all drafts AND all published that have no draft.
  • 14. Content schema with Publishing. Take 1.5: > content = db.content.find( { _id : 2 } ) { _id: 2, contentId: 1, publishState: “publishedAndDraft”, name: “About Us”, contentType: “webpage”, }
  • 15. Publish State Cheat Sheet: • drafted • publishedAndDraft (there is a published version but no draft) • published (there is a published AND a draft)
  • 17. Publishing aware queries. Take 1: • Query published content: >published = db.content.find( {publishState : {$in: [“publishedAndDraft”, “published”]}} ) • Query draft content: >drafted = db.content.find( {publishState : {$in: [“drafted”, “publishedAndDraft”]}} )
  • 18. Creating a draft. Take 1: • Insert the draft: >db.content.insert( { _id : 2, contentId:1, …, publishState : “drafted”}) • Remove the published content from double duty: >db.content.update( {contentId: 1, publishState: ”publishedAndDraft”}, {$set: {publishState: “published” }} ) Query Update
  • 19. Publishing. Take 1: • Archive the previous published version (if present): >db.content.update( {contentId:1, publishState:”published”}, { $set: { publishState: “archived” } } ) • Publish the draft: >db.content.update( {contentId: 1, publishState:”drafted”}, {$set: {publishState: “publishedAndDraft” }} ) Query Update Query Update
  • 20. Did you spot the problems? There were lots! • Hints! • Creating drafts and publishing drafts each require multiple database operations. • Truth is being transitioned across these operations.
  • 21. Concurrency defined: When multiple things try to do stuff at the same time and it’s all okay.
  • 22. Problem 1 – Isolation: • When someone walks in on you in the changing room • Publish: • Interim reads from other clients would see that NO published content exists. Archive PublishInterim
  • 23. Problem 2 – Atomicity: • All or nothing, baby • What if we have a Database or App failure after the first operation or during second operation? • We end up stuck in interim state, with either two published versions or none
  • 24. Can we do better?
  • 25. Time Out! Public Service Announcement: • Know MongoDB • Specifically, know Update Operators • http://docs.mongodb.org/manual/reference/operator/update/
  • 26. Can we do better? • Let’s use MongoDB’s update operators!
  • 27. Publish State Cheat Sheet. Take 2: • 1 = drafted • 2 = publishedAndDraft • 3 = published
  • 28. Content schema, with Publishing. Take 2: > drafted = db.content.find( { _id : 2 }) { _id: 2, contentId: 1, publishState: 1, name: “About Us”, contentType: “webpage”, } > published = db.content.find( { _id : 1 }) { _id: 1, contentId: 1, publishState: 3, name: “About Us”, contentType: “webpage”, }
  • 29. Publishing aware queries. Take 2: • Query published content: >published = db.content.find( publishState : {$in:[2,3]}) • Query draft content: >drafted = db.content.find( publishState : {$in:[1,2]})
  • 30. Publishing. Take 2: >db.content.update( {contentId: 1}, { $inc: { publishState: 1 } }, { multi : true} ) Increment Operator! Update Multiple Docs..both the drafted and published
  • 31. Better! • Now only one client operation to publish! • App tier can’t cause initial partial failure. • Arguably less observable interim state. • Notice that I bolded and underlined and italicized client above. • Allows us to use the $isolated operator.
  • 32. Worse!! • Concurrent attempts to publish would cause too many $inc operations.
  • 33. The Same: • Still not Atomic. • Still not truly Isolated, without using $isolated.
  • 35. Pattern #1: Benign Write • “Stage” a change in MongoDB • Craft queries to exclude it until appropriate • Use update operators to “commit” the staged change
  • 36. Publish State Cheat Sheet. Take 3: • 0 = uncommittedDraft • 1 = drafted • 2 = publishedAndDraft • 3 = published
  • 37. Content schema, with Publishing. Take 2.5: > content = db.content.find( { _id : 2 }) { _id: 2, contentId: 1, publishState: 0, name: “About Us”, contentType: “webpage”, properties : {} } > content = db.content.find( { _id : 1 }) { _id: 1, contentId: 1, publishState: 2, name: “About Us”, contentType: “webpage”, properties : {} }
  • 38. Publishing aware queries. (No Change) • Query published content: >content = db.content.find( {publishState : {$in:[2,3]}}) • Query drafted content: >content = db.content.find( {publishState : {$in:[1,2]}}) Note that we never look for publishState 0.
  • 39. Creating a draft. Take 2: • Insert uncommitted draft: >db.content.insert( { _id : 2, contentId:1, …, publishState : 0}) • Commit draft: >db.content.update( {contentId: 1}, { $inc: { publishState: 1 } }, { multi : true} ) Same Operation as Publish Draft AND Published updated
  • 40. Benign Write: A closer look • Reduce latency when coordinating changes across documents • If application fails after inserting uncommitted draft, nothing bad happens! • Look into advanced things you can do with arrays.
  • 42. I lied, there are lots more problems: • What if two clients concurrently attempt to create a draft of contentId 1? (Race condition/Dirty read) • What if two clients concurrently or even serially attempt to publish contentId 1? (Optimistic/pessimistic concurrency)
  • 43. Pattern #2: Concurrency Control with unique indexes. • Let MongoDB enforce truth • http://docs.mongodb.org/manual/core/write-operations- atomicity/#concurrency-control
  • 44. Unique Index: • Each document (version) follows from a known predecessor • We can ensure that a predecessor can be used only once >db.content.createIndex( { contentId : 1 , predId : 1 }, {unique: true})
  • 45. Content schema, now with Publishing. Take 3: > content = db.content.find( { _id : 2 } ) { _id: 2, contentId: 1, predId: 1, publishState: 0, name: “About Us”, contentType: “webpage”, properties : {} } > content = db.content.find( { _id : 1 } ) { _id: 1, contentId: 1, predId: null, publishState: 2, name: “About Us”, contentType: “webpage”, properties : {} }
  • 46. Publishing aware queries. (No Change) • Query published content: >content = db.content.find( {publishState : {$in:[2,3]}}) • Query drafted content: >content = db.content.find( {publishState : {$in:[1,2]}})
  • 47. Creating a draft. Take 3: >myPredId = db.content.find( {contentId:1, publishState:2} ).get(“_id”) >db.content.insert( { _id : 2, contentId:1, predId: myPredId, …, publishState: 0}) >db.content.update( {contentId: 1}, { $inc: { publishState: 1 } }, { multi : true} )
  • 48. Unique index, a closer look: • If insert fails due to duplicate key violation, a concurrent client beat us to creating a draft. • Depending on concurrency approach, application can retry or fail
  • 49. I lied, there are lots more problems: • What if two clients concurrently attempt to create a draft of contentId 1? • What if two clients concurrently or even serially attempt to publish contentId 1?
  • 50. Pattern #3: Update-if-Current, a.k.a. Benign Wrong: • Only update documents that match known good state. • Discriminate via the update operator’s query argument. • React to the number of results updated. • http://docs.mongodb.org/manual/tutorial/update-if-current/
  • 51. Creating a draft. Take 4: >myPredId = db.content.findOne( {contentId:1, publishState:2} )._id >db.content.insert( { _id : 2, contentId:1, predId: myPredId, …, publishState : 0}) >db.content.update( {contentId: 1, publishState: { $in: [0,2]}}, { $inc: { publishState: 1 } }, { multi : true} ) Query Criteria
  • 52. Creating a draft. Take 4.5: >myPredId = db.content.findOne( {contentId:1, publishState:2} )._id >db.content.insert( { _id : 2, contentId:1, predId: myPredId, …, publishState : 0}) >db.content.update( {contentId: 1, publishState: { $in: [0,2]}, $or: [ { _id: myPredId}, { predId: myPredId} ]}, { $inc: { publishState: 1 } }, { multi : true} ) Even More Query Criteria!
  • 53. Publishing. Take 3: >db.content.update( {contentId: 1, publishState: { $in: [1,3]}}, { $inc: { publishState: 1 } }, { multi : true} ) Query Criteria
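To make "react to the number of results updated" concrete, here is a small sketch that applies the publish criteria to a plain array and returns a modified count, in the spirit of the `nModified` field the shell reports in a WriteResult. `incWhere` and `publish` are hypothetical helpers, and the state meanings (1 draft, 3 published, 4 archived) are inferred from the deck rather than stated in it.

```javascript
// Plain-array stand-in for
//   update({contentId, publishState: {$in: allowedStates}}, {$inc: {publishState: 1}}, {multi: true})
// Returns how many documents were modified.
function incWhere(docs, contentId, allowedStates) {
  let modified = 0;
  for (const d of docs) {
    if (d.contentId === contentId && allowedStates.includes(d.publishState)) {
      d.publishState += 1;
      modified += 1;
    }
  }
  return modified;
}

// Publish only touches versions in a known good state: a draft (1) or the
// currently published version (3). Zero modified documents means there was
// nothing to publish, so a repeated request does no harm.
function publish(docs, contentId) {
  const modified = incWhere(docs, contentId, [1, 3]);
  return modified > 0 ? "published" : "nothing to publish";
}

const docs = [
  { _id: 1, contentId: 1, predId: null, publishState: 3 }, // published
  { _id: 2, contentId: 1, predId: 1, publishState: 1 },    // draft
];

const firstAttempt = publish(docs, 1);  // draft becomes 2, old published becomes 4
const secondAttempt = publish(docs, 1); // no document matches [1, 3] any more
console.log(firstAttempt, secondAttempt, docs.map((d) => d.publishState));
```

Without the state filter, the second call would push both versions one step further and corrupt the lifecycle; with it, the repeat matches nothing.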
  • 54. I lied, there are lots more problems: • What if two clients concurrently attempt to create a draft of contentId 1? • What if two clients concurrently or even serially attempt to publish contentId 1?
  • 55. Now all problems solved?? • No atomicity, but how about eventual consistency? • MongoDB.org describes a pattern for pseudo two phase commits • We’d be forced back into orchestrating multiple updates from the App tier • But good inspiration.
  • 58. Alternate Approach: Embedded Documents • MongoDB supports Atomicity and Isolation @ document level. • Therefore, model schema with embedded documents when possible. • Store Draft and Published versions on the same document.
  • 59. Content schema, now with Publishing: > content = db.content.find( { _id : 1 } ) { _id: 1, name: "About Us", contentType: "webpage", publishState: [ ], draftedContent: { }, publishedContent: { } }
  • 60. Publishing aware queries: • Query published content: >content = db.content.find( {publishState : "published"}) • Query drafted content: >content = db.content.find( {publishState : {$in: ["published","drafted"]}} )
  • 61. Draft Content Schema: > content = db.content.find( { _id : 1 } ) { _id: 1, name: "About Us", contentType: "webpage", publishState: ["drafted"], draftedContent: { }, draftedToPublishContent: { } } Remember publishedAndDraft
  • 62. Publishing: >db.content.update( {_id: 1}, { $addToSet: { publishState: "published" }, $rename : { "draftedToPublishContent": "publishedContent" } } ) addToSet rename
  • 63. Published Content Schema: > content = db.content.find( { _id : 1 } ) { _id: 1, name: "About Us", contentType: "webpage", publishState: ["drafted", "published"], draftedContent: { }, publishedContent: { } }
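The embedded-document publish step can be sketched on a plain object. `publishEmbedded` is a hypothetical function that imitates what `$addToSet` and `$rename` do to a single document; the field contents (`body: "v2"`) are made-up sample data. In MongoDB both changes happen in one single-document update, which is what provides the atomicity the deck relies on.

```javascript
// Imitates, on a plain object, the single-document update from the Publishing slide:
//   $addToSet: { publishState: "published" }
//   $rename:   { draftedToPublishContent: "publishedContent" }
function publishEmbedded(doc) {
  const next = { ...doc, publishState: [...doc.publishState] };

  // $addToSet adds the value only if the array does not already contain it.
  if (!next.publishState.includes("published")) {
    next.publishState.push("published");
  }

  // $rename moves the field; if the source field is absent it does nothing.
  if ("draftedToPublishContent" in next) {
    next.publishedContent = next.draftedToPublishContent;
    delete next.draftedToPublishContent;
  }
  return next;
}

const drafted = {
  _id: 1,
  name: "About Us",
  contentType: "webpage",
  publishState: ["drafted"],
  draftedContent: { body: "v2" },
  draftedToPublishContent: { body: "v2" },
};

const published = publishEmbedded(drafted);
const publishedAgain = publishEmbedded(published); // repeating the publish changes nothing
console.log(published, publishedAgain);
```

Because both operators are no-ops when their work is already done, a repeated publish leaves the document unchanged, which sidesteps the serial double-publish problem from the multi-document takes.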
  • 64. The Good / Bad / Ugly: • Good • Isolation and Atomicity • Fewer moving parts • Bad • Larger Document Sizes • Ugly • Indexing Properties would require double the indexes
  • 65. Strong finish! • Know what you’re getting into with Isolation and Atomicity • Consider embedding documents where feasible • Employ patterns to preempt or remediate concurrency issues • Include pictures and jokes in your slide decks
  • 67. Eventual Consistency Approach: • Our publishing scenarios just move documents (versions) between states • We can easily detect and fix inconsistencies if we know where to look
  • 68. Pseudo Transaction: >db.transactions.insert( { _id : 1, type : "publish" , scope : { contentId : 1 } } ) Tag a draft or publish operation with transaction id: >publishResult = db.content.update( {contentId: 1, publishState: { $in: [1,3]}}, { $inc: { publishState: 1 }, $set: {transactionId: 1} }, { multi : true} )
  • 69. Pseudo Transaction, Part II: • If publishResult communicates no errors, we’re done! Clean up. >db.content.update( {transactionId : 1 } , {$unset : {transactionId : ""} }, { multi : true } ) >db.transactions.remove( { _id: 1 } ) Remove transaction reference
  • 70. Pseudo Transaction, Part III: • If publishResult returns errors, remediate • For unobserved errors, background thread polls transaction collection
  • 71. Pseudo Transaction, Part IV • Each document (version) is linked via predId • Each linked document should be one publishState increment apart • Detect inconsistencies and attempt roll forward or rollback • Rinse, repeat
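The pseudo-transaction bookkeeping can be sketched with in-memory stand-ins for the transactions and content collections. `beginPublish`, `cleanUp`, and `pendingWork` are hypothetical names, and the sketch covers only the tagging, cleanup, and polling steps; the roll-forward or rollback decision is left out because the deck does not specify it.

```javascript
// In-memory stand-ins for the transactions and content collections.
const transactions = new Map(); // transaction _id -> transaction record
const content = [
  { _id: 1, contentId: 1, predId: null, publishState: 3 }, // published
  { _id: 2, contentId: 1, predId: 1, publishState: 1 },    // draft
];

// Record the intent first, then tag every document the publish touches.
function beginPublish(txId, contentId) {
  transactions.set(txId, { _id: txId, type: "publish", scope: { contentId } });
  let modified = 0;
  for (const d of content) {
    if (d.contentId === contentId && [1, 3].includes(d.publishState)) {
      d.publishState += 1;
      d.transactionId = txId;
      modified += 1;
    }
  }
  return modified;
}

// On success, remove the tags and then the transaction record.
function cleanUp(txId) {
  for (const d of content) {
    if (d.transactionId === txId) delete d.transactionId;
  }
  transactions.delete(txId);
}

// What a background poller would inspect: every transaction record still
// present, together with the documents that still carry its id.
function pendingWork() {
  return [...transactions.values()].map((tx) => ({
    tx,
    taggedIds: content.filter((d) => d.transactionId === tx._id).map((d) => d._id),
  }));
}

const modified = beginPublish(1, 1);
const pendingBefore = pendingWork(); // one open transaction tagging versions 1 and 2
cleanUp(1);
const pendingAfter = pendingWork();  // nothing left to remediate
console.log(modified, pendingBefore[0].taggedIds, pendingAfter.length);
```

A transaction record that outlives its operation is the signal that the application failed partway, and its scope tells the poller which contentId to examine.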

Editor's Notes

  • #40: Don’t take my word for it. The red arrow says so.