Advanced Document Modeling Techniques from a High-Scale Commerce Platform

Advanced Document
Modeling Techniques
Tales from a High Scale Commerce Platform

Who am I?
• Jonathan Roeder @superduperjon
• Architect and Manager for Volusion Inc., working on Mozu
• 14 years building high-scale commerce with document
databases
• 4 years developing Mozu with MongoDB
• Motto: Why show a picture when you can write 1000 words?

What is Mozu?
• The Cloud Commerce Platform
• Powerful merchant capabilities
• Rich 3rd party developer extensibility
• Multi-Tenant SaaS
• Launched Jan 2014, after three years of ground-up
development

MongoDB is the backing store for Mozu’s:
• Shopping Carts
• Order Management System
• Distributed Cache
• Content Management System
• Logs
• Login Sessions
• 3rd Party Developer File System
• Document Database for 3rd
Party developers

What is Mozu Content Management System?
• Empowers developers and merchants to:
• Create schemas (content types)
• Author data conforming to schema (content items)
• Use data to construct commerce experiences

Mozu CMS @ 30,000 ft, in pencil!

Example Content Item (Abridged):
> content = db.content.find( { _id : 1 } )
{
  _id: 1,
  name: "About Us",
  contentType: "webpage",
  properties : {
    body : "<div>…</div>",
    tags : ["tag1" , "tag2"],
    moreData : {…}
  }
}

Fantastic background! Now what?
• Implement a feature
• Consider concerns
• Explore patterns to conquer the considered concerns
• Cool?
• Cool

Feature: CMS Publishing
• Merchants can draft, preview and publish content changes
• A content item may have the following versions:
• 0 or 1 Drafted
• 0 or 1 Published
• 0 or more Archived
• CMS can return draft version or published version
• Seems easy!?

Content schema with Publishing. Take 1:
> drafted = db.content.find( { _id : 2 } )
{
  _id: 2,
  contentId: 1,
  publishState: "drafted",
  properties : {…}
}
> published = db.content.find( { _id : 1 } )
{
  _id: 1,
  contentId: 1,
  publishState: "published",
  properties : {…}
}

NUANCE!
• When only a published version exists, it must pretend it’s also
the draft.
• It makes sense! While administering content, a merchant wants
to see all drafts AND all published that have no draft.

Content schema with Publishing. Take 1.5:
{
_id: 2,
contentId: 1,
publishState: "publishedAndDraft",
}

Publish State Cheat Sheet:
• drafted
• publishedAndDraft (there is a published version but no draft)
• published (there is a published AND a draft)

Publishing aware queries. Take 1:
• Query published content:
>published = db.content.find(
  {publishState : {$in: ["publishedAndDraft", "published"]}}
)
• Query draft content:
>drafted = db.content.find(
  {publishState : {$in: ["drafted", "publishedAndDraft"]}}
)
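
To make the double duty of publishedAndDraft concrete, here is a minimal sketch that runs the two queries above against a plain in-memory array. It is a stand-in, not MongoDB: findByStates is a hypothetical helper that imitates { publishState : { $in: [...] } }, and the sample versions are made up for illustration.

```javascript
// In-memory stand-in for the content collection (assumption: not MongoDB).
const versions = [
  { _id: 1, contentId: 1, publishState: "published" },
  { _id: 2, contentId: 1, publishState: "drafted" },
  { _id: 3, contentId: 2, publishState: "publishedAndDraft" },
  { _id: 4, contentId: 3, publishState: "archived" },
];

// Hypothetical helper that imitates { publishState: { $in: states } }.
const findByStates = (docs, states) =>
  docs.filter((d) => states.includes(d.publishState));

const publishedView = findByStates(versions, ["publishedAndDraft", "published"]);
const draftView = findByStates(versions, ["drafted", "publishedAndDraft"]);

console.log(publishedView.map((d) => d._id)); // [ 1, 3 ]
console.log(draftView.map((d) => d._id));     // [ 2, 3 ]
```

Version 3 shows up in both views, which is exactly the "pretend it's also the draft" behavior; the archived version shows up in neither.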

Creating a draft. Take 1:
• Insert the draft:
>db.content.insert( { _id : 2, contentId: 1, …, publishState : "drafted" } )
• Remove the published content from double duty:
>db.content.update(
  {contentId: 1, publishState: "publishedAndDraft"},  // query
  {$set: {publishState: "published"}}                  // update
)

Publishing. Take 1:
• Archive the previous published version (if present):
>db.content.update(
  {contentId: 1, publishState: "published"},   // query
  {$set: {publishState: "archived"}}           // update
)
• Publish the draft:
>db.content.update(
  {contentId: 1, publishState: "drafted"},        // query
  {$set: {publishState: "publishedAndDraft"}}     // update
)

Did you spot the problems? There were lots!
• Hints!
• Creating drafts and publishing drafts each require multiple
database operations.
• Truth is being transitioned across these operations.

Concurrency defined:
When multiple things try to do stuff at the same time and it’s all
okay.

Problem 1 – Isolation:
• When someone walks in on you in the changing room
• Publish:
• Interim reads from other clients would see that NO published
content exists.
[Diagram: Archive → Interim → Publish]

Problem 2 – Atomicity:
• All or nothing, baby
• What if we have a Database or App failure after the first
operation or during second operation?
• We end up stuck in interim state, with either two published
versions or none
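
Both problems come from the same gap between the two operations. A minimal sketch, assuming an in-memory array in place of the collection (archivePublished, publishDraft and countPublished are hypothetical helpers that imitate the Take 1 updates and the published query):

```javascript
// In-memory stand-in for the content collection (assumption: not MongoDB).
const docs = [
  { _id: 1, contentId: 1, publishState: "published" },
  { _id: 2, contentId: 1, publishState: "drafted" },
];

// Imitates the published-content query from Take 1.
const countPublished = () =>
  docs.filter((d) => ["publishedAndDraft", "published"].includes(d.publishState)).length;

// Step 1 of publishing: archive the previous published version.
function archivePublished(contentId) {
  for (const d of docs) {
    if (d.contentId === contentId && d.publishState === "published") {
      d.publishState = "archived";
    }
  }
}

// Step 2 of publishing: promote the draft.
function publishDraft(contentId) {
  for (const d of docs) {
    if (d.contentId === contentId && d.publishState === "drafted") {
      d.publishState = "publishedAndDraft";
    }
  }
}

const before = countPublished();  // 1
archivePublished(1);
const interim = countPublished(); // 0: what a concurrent reader sees, or what a crash here leaves behind
publishDraft(1);
const after = countPublished();   // 1

console.log({ before, interim, after });
```

Swapping the order of the two steps does not help: the interim state then has two published versions instead of none.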

Time Out! Public Service Announcement:
• Know MongoDB
• Specifically, know Update Operators
• http://docs.mongodb.org/manual/reference/operator/update/

Can we do better?
• Let’s use MongoDB’s update operators!

Publish State Cheat Sheet. Take 2:
• 1 = drafted
• 2 = publishedAndDraft
• 3 = published

Content schema, with Publishing. Take 2:
> drafted = db.content.find( { _id : 2 })
{
_id: 2,
contentId: 1,
publishState: 1,
}
> published = db.content.find( { _id : 1
})
{
_id: 1,
contentId: 1,
publishState: 3,
}

Publishing aware queries. Take 2:
• Query published content:
>published = db.content.find( { publishState : {$in: [2,3]} } )
• Query draft content:
>drafted = db.content.find( { publishState : {$in: [1,2]} } )

Publishing. Take 2:
>db.content.update(
  {contentId: 1},
  { $inc: { publishState: 1 } },  // increment operator!
  { multi : true }                // update multiple docs: both the drafted and the published
)

Better!
• Now only one client operation to publish!
• App tier can’t cause initial partial failure.
• Arguably less observable interim state.
• Notice that I bolded and underlined and italicized client above.
• Allows us to use the $isolated operator.

Worse!!
• Concurrent attempts to publish would cause too many $inc
operations.
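
A minimal sketch of that failure, assuming an in-memory array in place of the collection. The publish function is a hypothetical helper that imitates the unconditional multi-document $inc; states above 3 are not on the cheat sheet and are left uninterpreted here.

```javascript
// In-memory stand-in for the content collection (assumption: not MongoDB).
const docs = [
  { _id: 1, contentId: 1, publishState: 3 }, // published
  { _id: 2, contentId: 1, publishState: 1 }, // drafted
];

// Imitates update({contentId}, {$inc: {publishState: 1}}, {multi: true}).
function publish(contentId) {
  let modified = 0;
  for (const d of docs) {
    if (d.contentId === contentId) {
      d.publishState += 1;
      modified += 1;
    }
  }
  return modified;
}

publish(1); // intended publish
const statesAfterFirst = docs.map((d) => d.publishState);  // [4, 2]

publish(1); // duplicate or concurrent publish of the same content
const statesAfterSecond = docs.map((d) => d.publishState); // [5, 3]

// Imitates the draft query { publishState: { $in: [1, 2] } }.
const draftView = docs.filter((d) => [1, 2].includes(d.publishState));

console.log(statesAfterFirst, statesAfterSecond, draftView.length);
```

After the second $inc the new version claims state 3 (published, with a separate draft), but no draft exists, so the content disappears from the draft view entirely.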

The Same:
• Still not Atomic.
• Still not truly Isolated, without using $isolated.

Pattern #1: Benign Write
• “Stage” a change in MongoDB
• Craft queries to exclude it until appropriate
• Use update operators to “commit” the staged change

Publish State Cheat Sheet. Take 3:
• 0 = uncommittedDraft
• 1 = drafted
• 2 = publishedAndDraft
• 3 = published

Content schema, with Publishing. Take 2.5:
> content = db.content.find( { _id : 2 })
{
_id: 2,
contentId: 1,
publishState: 0,
properties : {}
}
> content = db.content.find( { _id : 1 })
{
_id: 1,
contentId: 1,
publishState: 2,
properties : {}
}

Publishing aware queries. (No Change)
• Query published content:
>content = db.content.find( {publishState : {$in: [2,3]}} )
• Query drafted content:
>content = db.content.find( {publishState : {$in: [1,2]}} )
• Note that we never look for publishState 0.

Creating a draft. Take 2:
• Insert uncommitted draft:
>db.content.insert( { _id : 2, contentId: 1, …, publishState : 0 } )
• Commit draft:
>db.content.update(
  {contentId: 1},
  { $inc: { publishState: 1 } },  // same operation as publish
  { multi : true }                // draft AND published updated
)

Benign Write: A closer look
• Reduce latency when coordinating changes across documents
• If application fails after inserting uncommitted draft, nothing bad
happens!
• Look into advanced things you can do with arrays.
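
Why the staged insert is benign can be sketched with an in-memory array standing in for the collection (an assumption; publishedView and draftView are hypothetical helpers that imitate the two publishing-aware queries):

```javascript
// In-memory stand-in for the content collection (assumption: not MongoDB).
const docs = [{ _id: 1, contentId: 1, publishState: 2 }]; // publishedAndDraft

// Imitate { publishState: { $in: [2, 3] } } and { publishState: { $in: [1, 2] } }.
const publishedView = () =>
  docs.filter((d) => [2, 3].includes(d.publishState)).map((d) => d._id);
const draftView = () =>
  docs.filter((d) => [1, 2].includes(d.publishState)).map((d) => d._id);

// Stage: insert the uncommitted draft (publishState 0).
docs.push({ _id: 2, contentId: 1, publishState: 0 });
const whileStaged = { published: publishedView(), draft: draftView() };
// If the application died here, both views would still be correct.

// Commit: one multi-document $inc moves 0 -> 1 and 2 -> 3.
for (const d of docs) {
  if (d.contentId === 1) d.publishState += 1;
}
const afterCommit = { published: publishedView(), draft: draftView() };

console.log(whileStaged); // { published: [ 1 ], draft: [ 1 ] }
console.log(afterCommit); // { published: [ 1 ], draft: [ 2 ] }
```

No query ever asks for state 0, so the staged document stays invisible until the commit.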

Otherwise, all problems solved!

I lied, there are lots more problems:
• What if two clients concurrently attempt to create a draft of
contentId 1? (Race condition/Dirty read)
• What if two clients concurrently or even serially attempt to
publish contentId 1? (Optimistic/pessimistic concurrency)

Pattern #2: Concurrency Control with unique
indexes.
• Let MongoDB enforce truth
• http://docs.mongodb.org/manual/core/write-operations-atomicity/#concurrency-control

Unique Index:
• Each document (version) follows from a known predecessor
• We can ensure that a predecessor can be used only once
>db.content.createIndex( { contentId : 1 , predId : 1 }, { unique: true } )

Content schema, now with Publishing. Take 3:
> content = db.content.find( { _id : 2 } )
{
_id: 2,
contentId: 1,
predId: 1,
publishState: 0,
properties : {}
}
> content = db.content.find( { _id : 1 } )
{
_id: 1,
contentId: 1,
predId: null,
publishState: 2,
properties : {}
}

Publishing aware queries. (No Change)

Creating a draft. Take 3:
>myPredId = db.content.findOne( {contentId: 1, publishState: 2} )._id
>db.content.insert( { _id : 2, contentId: 1, predId: myPredId, …, publishState: 0 } )
>db.content.update(
  {contentId: 1},
  { $inc: { publishState: 1 } },
  { multi : true }
)

Unique index, a closer look:
• If insert fails due to duplicate key violation, a concurrent client
beat us to creating a draft.
• Depending on concurrency approach, application can retry or
fail
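
A minimal sketch of the race, assuming an in-memory class in place of the collection. ContentCollection is hypothetical and only imitates a unique index on { contentId, predId }; error code 11000 is the code MongoDB reports for duplicate key violations.

```javascript
// In-memory stand-in for a collection with a unique index on
// { contentId, predId } (assumption: not MongoDB).
class ContentCollection {
  constructor() {
    this.docs = [];
    this.uniqueKeys = new Set();
  }

  insert(doc) {
    const key = `${doc.contentId}:${doc.predId}`;
    if (this.uniqueKeys.has(key)) {
      const err = new Error(`duplicate key: contentId ${doc.contentId}, predId ${doc.predId}`);
      err.code = 11000; // duplicate key error code
      throw err;
    }
    this.uniqueKeys.add(key);
    this.docs.push(doc);
  }
}

const content = new ContentCollection();
content.insert({ _id: 1, contentId: 1, predId: null, publishState: 2 });

// Two clients read the same predecessor, then both try to stage a draft.
const predId = 1;
content.insert({ _id: 2, contentId: 1, predId, publishState: 0 }); // client A wins

let loserOutcome = "inserted";
try {
  content.insert({ _id: 3, contentId: 1, predId, publishState: 0 }); // client B
} catch (err) {
  if (err.code !== 11000) throw err;
  loserOutcome = "retry-or-fail"; // the application decides which
}

console.log(loserOutcome, content.docs.length); // retry-or-fail 2
```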

I lied, there are lots more problems:
• What if two clients concurrently attempt to create a draft of
contentId 1?
• What if two clients concurrently or even serially attempt to
publish contentId 1?

Pattern #3: Update-if-Current, a.k.a Benign Wrong:
• Only update documents that match known good state.
• Discriminate via the update operator’s query argument.
• React to the number of results updated.
• http://docs.mongodb.org/manual/tutorial/update-if-current/

Creating a draft. Take 4:
>myPredId = db.content.findOne( {contentId: 1, publishState: 2} )._id
>db.content.insert( { _id : 2, contentId: 1, predId: myPredId, …, publishState : 0 } )
>db.content.update(
  {contentId: 1, publishState: { $in: [0,2] }},  // query criteria
  { $inc: { publishState: 1 } },
  { multi : true }
)

Creating a draft. Take 4.5:
>myPredId = db.content.findOne( {contentId: 1, publishState: 2} )._id
>db.content.insert( { _id : 2, contentId: 1, predId: myPredId, …, publishState : 0 } )
>db.content.update(
  {contentId: 1, publishState: { $in: [0,2] },
   $or: [ { _id: myPredId }, { predId: myPredId } ]},  // even more query criteria!
  { $inc: { publishState: 1 } },
  { multi : true }
)

Publishing. Take 3:
>db.content.update(
  {contentId: 1, publishState: { $in: [1,3] }},  // query criteria
  { $inc: { publishState: 1 } },
  { multi : true }
)
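
Update-if-current for publishing can be sketched with an in-memory array standing in for the collection. incrementIfCurrent is a hypothetical helper that imitates a multi-document $inc guarded by query criteria; the expected states [1, 3] (drafted and published) are an assumption derived from the cheat sheet.

```javascript
// In-memory stand-in for the content collection (assumption: not MongoDB).
const docs = [
  { _id: 1, contentId: 1, predId: null, publishState: 3 }, // published
  { _id: 2, contentId: 1, predId: 1, publishState: 1 },    // drafted
];

// Imitates update({contentId, publishState: {$in: expected}},
//                 {$inc: {publishState: 1}}, {multi: true})
// and returns the number of documents modified.
function incrementIfCurrent(contentId, expectedStates) {
  let modified = 0;
  for (const d of docs) {
    if (d.contentId === contentId && expectedStates.includes(d.publishState)) {
      d.publishState += 1;
      modified += 1;
    }
  }
  return modified;
}

const firstPublish = incrementIfCurrent(1, [1, 3]);  // matches both versions
const secondPublish = incrementIfCurrent(1, [1, 3]); // matches nothing: states are now 4 and 2

console.log(firstPublish, secondPublish);   // 2 0
console.log(docs.map((d) => d.publishState)); // [ 4, 2 ]
```

The repeated publish becomes a no-op, and the zero count tells the application that someone else got there first.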

Now all problems solved??
• No atomicity, but how about eventual consistency?
• MongoDB.org describes a pattern for pseudo two phase
commits
• We’d be forced back into orchestrating multiple updates from
the App tier
• But good inspiration.

Alternate Approach: Embedded Documents
• MongoDB supports Atomicity and Isolation @ document level.
• Therefore, model schema with embedded documents when
possible.
• Store Draft and Published versions on the same document.

Content schema, now with Publishing:
{
_id: 1,
publishState: [ ],
draftedContent: { },
publishedContent: { }
}

Publishing aware queries:
• Query published content:
>content = db.content.find( {publishState : "published"} )
• Query draft content:
>content = db.content.find(
  {publishState : {$in: ["published", "drafted"]}}
)

Draft Content Schema:
{
_id: 1,
publishState: ["drafted"],
draftedToPublishContent: { },
}
Remember publishedAndDraft

Publishing:
>db.content.update(
  {_id: 1},
  {
    $addToSet: { publishState: "published" },                    // addToSet
    $rename: { "draftedToPublishContent": "publishedContent" }   // rename
  }
)

Published Content Schema:
{
_id: 1,
publishState: ["drafted", "published"],
publishedContent: { },
}
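
The effect of that single update can be sketched on a plain object. The publish function below is a hypothetical stand-in that imitates $addToSet plus $rename on one document; it is not MongoDB, and the sample body is made up.

```javascript
// Imitates, on a plain object, the single-document update
// { $addToSet: { publishState: "published" },
//   $rename: { draftedToPublishContent: "publishedContent" } }.
function publish(doc) {
  const { draftedToPublishContent, ...rest } = doc;
  if (draftedToPublishContent === undefined) return doc; // nothing staged to publish
  return {
    ...rest,
    publishState: [...new Set([...doc.publishState, "published"])],
    publishedContent: draftedToPublishContent,
  };
}

const draft = {
  _id: 1,
  publishState: ["drafted"],
  draftedToPublishContent: { body: "hello" },
};

const published = publish(draft);
const publishedAgain = publish(published); // repeating the publish changes nothing

console.log(published);
// { _id: 1, publishState: [ 'drafted', 'published' ], publishedContent: { body: 'hello' } }
```

Because everything lives in one document, there is no interim state for another client to observe, and a repeated publish is harmless.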

The Good / Bad / Ugly:
• Good
• Isolation and Atomicity
• Fewer moving parts
• Bad
• Larger Document Sizes
• Ugly
• Indexing Properties would require double the indexes

Strong finish!
• Know what you’re getting into with Isolation and Atomicity
• Consider embedding documents where feasible
• Employ patterns to preempt or remediate concurrency issues
• Include pictures and jokes in your slide decks

Eventual Consistency Approach:
• Our publishing scenarios just move documents (versions)
between states
• We can easily detect and fix inconsistencies if we know where
to look

Pseudo Transaction:
>db.transactions.insert(
  { _id : 1, type : "publish" , scope : { contentId : 1 } }
)
• Tag a draft or publish operation with transaction id:
>publishResult = db.content.update(
  {contentId: 1},
  { $inc: { publishState: 1 }, $set: { transactionId: 1 } },
  { multi : true }
)

Pseudo Transaction, Part II:
• If publishResult communicates no errors, we’re done! Clean up.
• Remove transaction reference:
>db.content.update(
  {transactionId : 1},
  {$unset : {transactionId : ""}},
  { multi : true }
)
>db.transactions.remove( { _id : 1 } )

Pseudo Transaction, Part III:
• If publishResult returns errors, remediate
• For unobserved errors, background thread polls transaction
collection
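
The bookkeeping across Parts I to III can be sketched in memory. Everything here is a stand-in (not MongoDB): begin, applyPublish, cleanUp and unfinished are hypothetical helpers, and the "crash" is simulated by simply not calling cleanUp.

```javascript
// In-memory stand-ins for the transactions and content collections
// (assumption: not MongoDB).
const transactions = new Map();
const docs = [
  { _id: 1, contentId: 1, publishState: 3 },
  { _id: 2, contentId: 1, publishState: 1 },
  { _id: 3, contentId: 2, publishState: 3 },
  { _id: 4, contentId: 2, publishState: 1 },
];

// Record intent first, so a later failure leaves a trace to poll for.
function begin(txId, type, contentId) {
  transactions.set(txId, { type, scope: { contentId } });
}

// Imitates the tagged multi-update: $inc publishState, $set transactionId.
function applyPublish(txId) {
  const { scope } = transactions.get(txId);
  for (const d of docs) {
    if (d.contentId === scope.contentId) {
      d.publishState += 1;
      d.transactionId = txId;
    }
  }
}

// Imitates the cleanup: $unset transactionId, then remove the transaction.
function cleanUp(txId) {
  for (const d of docs) {
    if (d.transactionId === txId) delete d.transactionId;
  }
  transactions.delete(txId);
}

// What a background poller would inspect: transactions never cleaned up.
const unfinished = () => [...transactions.keys()];

begin(1, "publish", 1);
applyPublish(1);
cleanUp(1);
const afterCleanRun = unfinished(); // []

begin(2, "publish", 2);
applyPublish(2); // simulated crash: cleanUp(2) never runs
const afterCrash = unfinished();    // [2]
const tagged = docs.filter((d) => d.transactionId === 2).map((d) => d._id); // [3, 4]

console.log(afterCleanRun, afterCrash, tagged);
```

The leftover transaction record and the tagged documents tell the poller exactly where to look.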

Pseudo Transaction, Part IV
• Each document (version) is linked via predId
• Each linked document should be one publishState increment
apart
• Detect inconsistencies and attempt roll forward or rollback
• Rinse, repeat
