The Rough Guide to MongoDB
Simeon Simeonov
@simeons
Founding. Funding.
Growing. Startups.
Why MongoDB?
I am @simeons
recruit amazing people
solve hard problems
ship
make users happy
repeat
Why MongoDB?
Again,
please
SQL is slow
(for our business)
SQL is slow
(for our developer workflow)
SQL is slow
(for our analytics system)
So what’s Swoop?
Display Advertising
Makes the Web Suck
User-focused optimization
Tens of millions of users
1000+% better than average
200+% better than Google
Swoop Fixes That
Mobile SDKs
iOS & Android
Web SDK
RequireJS & jQuery
Components
AngularJS
NLP, etc.
Python
Targeting
High-Perf Java
Analytics
Ruby 2.0
Internal Apps
Ruby 2.0 / Rails 3
Pub Portal
Ruby 2.0 / Rails 3
Ad Portal
Ruby 2.0 / Rails 4
MongoDB: the Good
Fast
Flexible
JavaScript
MongoDB: the Bad
Not Quite Enterprise-Grade
Not Cheap to Run Well
I will write more robust code
I will design a better map-reduce
RAM + locks == $$$
Five Steps to Happiness
Sharding
Native Relationships
Atomic Update Buffering
Content-Addressed Storage
Shell Tricks
// Google AdWords object model
Account
Campaign
AdGroup // this joins ads & keywords
Ad
Keyword
// For example
AdGroup has an Account
AdGroup has a Campaign
AdGroup has many Ads
AdGroup has many Keywords
Slam dunk
for SQL
// Let’s play a bit
Account
Campaign
AdGroup
Ad
Keyword
// Let’s play some more
Account
Campaign
AdGroup
Ad
Keyword
// There is just one bit left
Account
Campaign
AdGroup
1 Ad
0 Keyword
// build a hierarchical ID
accountID campaignID adGroupID ((0 keywordID) | (1 adID))
// a binary ID!
10100100001100000000101001100110101010010100
< accountID >< campaignID >< …
// Encode it in base 16, 32 or 64
{"_id" : "a4300a66a94d20f1", … }
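A minimal sketch of how such an ID might be packed, assuming four 16-bit fields and an ad/keyword flag at 0x1000 in the last field. Both assumptions are read off the example _ids in this deck, not a documented layout, and `packId` and `toHex` are hypothetical names. BigInt is used because plain JavaScript numbers cannot hold 64-bit integers exactly.

```javascript
// Sketch: pack the hierarchy into one 64-bit value, 16 bits per level.
// Field widths and the flag position are assumptions, not a known layout.
const AD_FLAG = 0x1000n; // set in the last 16-bit field for ads, clear for keywords

function packId(account, campaign, adGroup, leaf, isAd) {
  const last = (isAd ? AD_FLAG : 0n) | BigInt(leaf);
  return (BigInt(account) << 48n) |
         (BigInt(campaign) << 32n) |
         (BigInt(adGroup) << 16n) |
         last;
}

// Fixed-width hex keeps string order equal to numeric order,
// which matters if the encoded string is used as the _id.
function toHex(id) {
  return id.toString(16).padStart(16, "0");
}

const adId = toHex(packId(255, 7, 3, 5, true));        // "00ff000700031005"
const keywordId = toHex(packId(255, 7, 3, 10, false)); // "00ff00070003000a"
```

Zero-padding to a fixed width is a design choice of this sketch: without it, range comparisons on hex strings would not agree with comparisons on the numbers they encode.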
// Example
The 5th
ad
Of the 3rd
ad group
Of the 7th
campaign
Of the 255th
account
could have the _id 0x00ff000700031005
The _id for the 10th
keyword of the same ad group
would be 0x00ff00070003000a
// Neat: the ad’s and keyword’s _ids contain the
// IDs of all of their ancestors in the hierarchy.
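// Note: these are 64-bit masks. JavaScript's & operator works on
// 32-bit integers, so read this as pseudocode or use a 64-bit type.
keywordId = 0x00ff00070003000a
adGroupId = keywordId & 0xffffffffffff0000
campaignId = keywordId & 0xffffffff00000000
accountId = keywordId & 0xffff000000000000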
// has-a relationship is a simple lookup
account = db.accounts.findOne({_id: accountId})
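The masking above can be tried outside the database. This is a sketch that assumes BigInt is available (Node.js or a current shell); `kw`, `ancestors` and `hex` are illustrative names, not part of the deck.

```javascript
// BigInt literals (the trailing n) keep all 64 bits through the & operator.
const kw = 0x00ff00070003000an;

// Each ancestor is the keyword's _id with the lower fields zeroed out.
const ancestors = {
  adGroup:  kw & 0xffffffffffff0000n,
  campaign: kw & 0xffffffff00000000n,
  account:  kw & 0xffff000000000000n,
};

// Render as fixed-width hex for comparison with the example _ids.
const hex = (id) => id.toString(16).padStart(16, "0");
// hex(ancestors.adGroup)  -> "00ff000700030000"
// hex(ancestors.campaign) -> "00ff000700000000"
// hex(ancestors.account)  -> "00ff000000000000"
```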
// Neater: has-many relationships are just
// range queries on the _id index.
adGroupId = keywordId & 0xffffffffffff0000
startOfAds = adGroupId + 0x1000
endOfAds = adGroupId + 0x1fff
adsForKeyword = db.ads.find({
  _id: {$gte: startOfAds, $lte: endOfAds}
})
// Technically, that was a join via the ad group.
// Who said Mongo can’t do joins???
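To see the range query at work without a database, here is a sketch that filters an in-memory array in place of `db.ads`. The `adRangeFor` helper and the sample documents are hypothetical; the bounds 0x1000 and 0x1fff are the ones used above.

```javascript
// Hypothetical helper: the _id range covering every ad in a keyword's ad group.
function adRangeFor(keywordId) {
  const adGroupId = keywordId & 0xffffffffffff0000n;
  return { $gte: adGroupId + 0x1000n, $lte: adGroupId + 0x1fffn };
}

// In-memory stand-in for db.ads. A real collection would answer the same
// question with a range scan on the _id index.
const ads = [
  { _id: 0x00ff000700031005n }, // ad 5 of ad group 3
  { _id: 0x00ff000700031006n }, // ad 6 of ad group 3
  { _id: 0x00ff000700041001n }, // ad 1 of ad group 4
];

const range = adRangeFor(0x00ff00070003000an);
const adsForKeyword = ads.filter(
  (ad) => ad._id >= range.$gte && ad._id <= range.$lte
);
// adsForKeyword holds the two ads of ad group 3
```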
> db.reports.findOne()
{
  "_id" : …,
  "period" : "hour",
  "shard" : 0, // 16 MB doc limit protection
  "topic" : "ce-1",
  "ts" : ISODate("2012-06-12T05:00:00Z"),
  "variations" : {
    "2" : { // variationID (dimension set)
      "hint" : {
        "present" : 311, // hint.present is a metric
        "clicks" : 1
      }
    },
    "4" : {
      "hint" : {
        "present" : 331
      }
    }
  }
}
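The deck names Atomic Update Buffering as a step but the text here does not show how it was done. One way counters in a document of this shape might be maintained is to accumulate counts in memory and flush them as a single `$inc` update. The sketch below is an assumption along those lines, not Swoop's implementation; `createBuffer`, `add` and `flush` are hypothetical names.

```javascript
// Hypothetical in-process buffer: counts accumulate in memory and are
// flushed as a single $inc document per report.
function createBuffer() {
  const counts = {};
  return {
    // metric is a dotted path below the variation, e.g. "hint.present"
    add(variationId, metric, n = 1) {
      const path = `variations.${variationId}.${metric}`;
      counts[path] = (counts[path] || 0) + n;
    },
    // Returns the update document and empties the buffer.
    flush() {
      const update = { $inc: { ...counts } };
      for (const key of Object.keys(counts)) delete counts[key];
      return update;
    },
  };
}

const buffer = createBuffer();
buffer.add("2", "hint.present");
buffer.add("2", "hint.present");
buffer.add("2", "hint.clicks");
buffer.add("4", "hint.present");
const update = buffer.flush();
// update.$inc now maps "variations.2.hint.present" to 2,
// "variations.2.hint.clicks" to 1 and "variations.4.hint.present" to 1
```

The flushed document could then be sent as one upsert against the report matching the period, topic, ts and shard. `$inc` creates missing nested fields, so new variations need no setup.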
Content Addressed Storage
Lazy join abstraction
Very space efficient
Extremely (pre-)cacheable
Join only happens during reporting
// Step 1: take a set of dimensions worth tracking
data = {
  "domain_id" : "SW-28077508-16444",
  "hint" : "Find an organic alternative",
  "theme" : "red"
}
// Step 2: compute a digital signature, e.g., MD5
sig = "000069569F4835D16E69DF704187AC2F"
// Step 3: if new sig, increment a counter
counter = 264034
// Step 4: create a document in the content-addressed
// store collection for these dimensions
> db.cas.findOne()
{
  "_id" : "000069569F4835D16E69DF704187AC2F", // MD5 hash
  "data" : { // data that was digested to the hash above
    "domain_id" : "SW-28077508-16444",
    "hint" : "Find an organic alternative",
    "theme" : "red"
  },
  "meta_data" : {
    "id" : 264034 // variationID
  },
  "created_at" : ISODate("2013-02-04T12:05:34.752Z")
}
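Steps 1 to 4 can be sketched in Node.js as follows. The deck does not say how the dimensions were serialized before hashing, so the sorted-key JSON used here is an assumption, and the resulting hash will not necessarily match the one shown above. `signature` and `createStore` are hypothetical names, and a Map stands in for the `db.cas` collection.

```javascript
const crypto = require("crypto");

// Assumption: dimensions are serialized with sorted keys so that the same
// set always produces the same signature, whatever the insertion order.
function signature(data) {
  const canonical = JSON.stringify(
    Object.keys(data).sort().map((key) => [key, data[key]])
  );
  return crypto.createHash("md5").update(canonical).digest("hex").toUpperCase();
}

// Hypothetical in-memory store standing in for the db.cas collection.
function createStore(firstId = 1) {
  const docs = new Map();
  let counter = firstId - 1;
  return {
    // Returns the variationID, allocating one only for a new signature.
    put(data) {
      const sig = signature(data);
      if (!docs.has(sig)) {
        counter += 1;
        docs.set(sig, {
          _id: sig,
          data,
          meta_data: { id: counter },
          created_at: new Date(),
        });
      }
      return docs.get(sig).meta_data.id;
    },
    size: () => docs.size,
  };
}

const store = createStore();
const a = store.put({ domain_id: "SW-28077508-16444", hint: "Find an organic alternative", theme: "red" });
const b = store.put({ theme: "red", hint: "Find an organic alternative", domain_id: "SW-28077508-16444" });
const c = store.put({ domain_id: "SW-28077508-16444", hint: "Find an organic alternative", theme: "blue" });
// a and b are the same variationID; c gets the next one
```

Because the signature depends only on content, the same dimension set is stored once no matter how often it is seen, which is where the space efficiency comes from.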
// Elsewhere, in the reports collection…
"variations" : {
"264034" : {
// metrics here
},
…
lazy join
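The lazy join can be illustrated with toy data: reports carry only variationIDs, and the dimensions are attached when a report is rendered. The metric values and the `joinReport` and `byVariationId` names below are made up for illustration.

```javascript
// Toy data shaped like the documents in this deck.
const cas = [
  {
    _id: "000069569F4835D16E69DF704187AC2F",
    data: {
      domain_id: "SW-28077508-16444",
      hint: "Find an organic alternative",
      theme: "red",
    },
    meta_data: { id: 264034 },
  },
];
const report = {
  variations: { "264034": { hint: { present: 311, clicks: 1 } } },
};

// Index the store by variationID once. Entries never change after creation,
// so this lookup table can be cached for as long as needed.
const byVariationId = new Map(
  cas.map((doc) => [String(doc.meta_data.id), doc.data])
);

// Join only when rendering: attach dimensions to each variation's metrics.
function joinReport(rep) {
  return Object.entries(rep.variations).map(([id, metrics]) => ({
    variationId: id,
    dimensions: byVariationId.get(id) || null,
    metrics,
  }));
}

const rows = joinReport(report);
```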
// Use underscore.js in the shell
// See http://underscorejs.org/
function underscore() {
  load("/mongo_hacks/underscore.js");
}
// Loads underscore.js on the MongoDB server
// Note: db.eval() was deprecated in MongoDB 3.0 and removed in 4.2.
function server_underscore(force) {
  force = force || false;
  if (force || typeof(underscoreLoaded) === 'undefined') {
    db.eval(cat("/mongo_hacks/underscore.js"));
    underscoreLoaded = true;
  }
}
// Callstack printing on exception -- wraps a function
function dbg(f) {
  try {
    f();
  } catch (e) {
    print("\n**** Exception: " + e.toString());
    print("\n");
    print(e.stack);
    print("\n");
    if (arguments.length > 1) {
      printjson(arguments);
      print("\n");
    }
    throw e;
  }
}
function minutesAgo(minutes, d) {
  d = d || new Date();
  return new Date(d.valueOf() - minutes * 60 * 1000);
}
function hoursAgo(hours, d) {
  d = d || new Date();
  return minutesAgo(60 * hours, d);
}
function daysAgo(days, d) {
  d = d || new Date();
  return hoursAgo(24 * days, d);
}
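A possible use of these helpers is building time-window filters for the reports collection. The sketch repeats the helpers so that it runs on its own; `lastDayFilter` is a hypothetical name, and the field names come from the report document shown earlier.

```javascript
// Copies of the deck's helpers so that the sketch is self-contained.
function minutesAgo(minutes, d) {
  d = d || new Date();
  return new Date(d.valueOf() - minutes * 60 * 1000);
}
function hoursAgo(hours, d) {
  d = d || new Date();
  return minutesAgo(60 * hours, d);
}
function daysAgo(days, d) {
  d = d || new Date();
  return hoursAgo(24 * days, d);
}

// Hypothetical filter for the hourly reports of the 24 hours before `now`.
function lastDayFilter(topic, now) {
  return {
    period: "hour",
    topic: topic,
    ts: { $gte: daysAgo(1, now), $lt: now },
  };
}

const now = new Date("2012-06-13T05:00:00Z");
const filter = lastDayFilter("ce-1", now);
// filter.ts.$gte is 2012-06-12T05:00:00.000Z
```

In the shell the filter would be passed straight to `db.reports.find(filter)`.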
// Don’t write in the shell.
// Use your fav editor, save & type t() in mongo
function t() {
  load("/mongo_hacks/bag_of_tricks.js");
}
@simeons
sim@swoop.com
