Letters from the Trenches:

Lessons Learned Taking MongoDB to Production
October 17, 2013

Rick Warren

rick.warren@eharmony.com
Traditional Internet Dating Service
Unidirectional User-Defined Criteria
eHarmony Matching
Bidirectional User-Defined Criteria
eHarmony Matching: 3 Parts

1. Bidirectional User-Defined Criteria
2. Research-Based Compatibility Models
3. Machine-Learned Affinity Models

Photo Credits

Magnifying glass: andercismo @ http://www.flickr.com/photos/andercismo/
Machine learning: University of Maryland Press Releases @ http://www.flickr.com/photos/umdnews/
Application: Find Potential Matches
As fast as possible:
1. Find people who meet each other’s preferences
2. Discard combos that violate Compatibility Models

[Diagram: 1. Bidirectional User-Defined Criteria]
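The two steps on this slide can be sketched in a few lines. This is a minimal illustration, not eHarmony's implementation: the `User` fields, the age-range preference, and the `compatible` hook are all hypothetical stand-ins for the real criteria and compatibility models.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class User:
    # Hypothetical attributes; real user-defined criteria would be much richer.
    name: str
    age: int
    min_age: int  # youngest partner this user will accept
    max_age: int  # oldest partner this user will accept


def accepts(chooser: User, candidate: User) -> bool:
    """Unidirectional check: does the candidate meet the chooser's criteria?"""
    return chooser.min_age <= candidate.age <= chooser.max_age


def is_mutual_match(a: User, b: User) -> bool:
    """Bidirectional check: each user must meet the other's criteria."""
    return accepts(a, b) and accepts(b, a)


def potential_matches(
    user: User,
    pool: Iterable[User],
    compatible: Callable[[User, User], bool] = lambda a, b: True,
) -> List[User]:
    """Step 1: keep mutual matches. Step 2: discard pairs the model rejects."""
    return [
        c for c in pool
        if c != user and is_mutual_match(user, c) and compatible(user, c)
    ]


alice = User("alice", 30, 28, 35)
bob = User("bob", 33, 25, 31)      # alice accepts bob, bob accepts alice
carol = User("carol", 29, 35, 45)  # alice accepts carol, carol rejects alice
matches = potential_matches(alice, [alice, bob, carol])
```

A traditional, unidirectional service would stop at `accepts(alice, candidate)`; the second call with the arguments swapped is what makes the criteria bidirectional.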
Application: Find Potential Matches
• User attributes in MongoDB
– Replicated
– Sharded
• Data access pattern:
– Read-heavy
– Complex queries
• Java application

[Diagram: 1. Bidirectional User-Defined Criteria]
Application: Find Potential Matches
• In full production > 6 months
– Following several months of limited production
– Following several months of intensive dev + testing
• No production outages
• MongoDB no longer the thing we worry about most

• User attributes in MongoDB
– Replicated
– Sharded
• Data access pattern:
– Read-heavy
– Complex queries
• Java application
Lesson: Provision for Success
➢ Fit all data & indexes in memory
– MongoDB storage implemented using mem-mapped files
– Beware under-provisioned VMs
➢ Minimize field names to keep data as small as possible
– “Schema-less records” == “schema repeated millions of times”
– Morphia Java library can help with mapping
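To make the "schema repeated millions of times" point concrete, here is a rough sketch of the per-record cost of long field names. It measures compact JSON rather than BSON, as an assumption to stay dependency-free; both formats store every field name inside every record, so the bytes saved by shortening names are the same. The field names and the `FIELD_MAP` mapping are hypothetical.

```python
import json


def encoded_size(doc: dict) -> int:
    """Size in bytes of one record as compact UTF-8 JSON (a stand-in for BSON)."""
    return len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))


# Hypothetical mapping from descriptive names (used in code) to short stored names.
FIELD_MAP = {"minimumAge": "a", "maximumAge": "b", "maximumDistanceMiles": "d"}


def shorten(doc: dict, field_map: dict = FIELD_MAP) -> dict:
    """Rename fields for storage; unknown fields keep their original name."""
    return {field_map.get(key, key): value for key, value in doc.items()}


long_doc = {"minimumAge": 28, "maximumAge": 40, "maximumDistanceMiles": 60}
short_doc = shorten(long_doc)

# 40 bytes of names shrink to 3 bytes, so each record is 37 bytes smaller.
saved_per_doc = encoded_size(long_doc) - encoded_size(short_doc)
saved_for_10m_docs_mb = saved_per_doc * 10_000_000 / 1_000_000
```

Keeping the mapping in one place (which is the role an object mapper plays in the Java application) lets the code keep readable names while the stored records stay small enough to fit in memory.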
Lesson: Provision for Success
Scale write ops & data volume by adding shards
Scale read ops by adding secondaries

[Diagram: two shards, each a replica set (Shard / RS) with one Primary and two Secondaries; ellipses indicate more of each]
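The scaling rule on this slide can be written down as a tiny, idealized model. This is a sketch under simplifying assumptions: it only counts which nodes can serve each kind of operation, ignores hot spots and replication lag, and assumes the application is configured to read from secondaries (such reads may be slightly stale). The `Cluster` class is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    shards: int
    secondaries_per_shard: int

    def write_nodes(self) -> int:
        """Only the primary of each shard accepts writes."""
        return self.shards

    def read_nodes(self, allow_secondary_reads: bool = True) -> int:
        """Primaries always serve reads; secondaries only if the app allows it."""
        per_shard = 1 + (self.secondaries_per_shard if allow_secondary_reads else 0)
        return self.shards * per_shard


base = Cluster(shards=2, secondaries_per_shard=2)
more_shards = Cluster(shards=3, secondaries_per_shard=2)
more_secondaries = Cluster(shards=2, secondaries_per_shard=3)
```

Adding a shard raises both counts; adding a secondary raises only the read count, which is why a read-heavy workload can often be scaled without resharding.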
Lesson: Be Ready to Tinker
• Many processes:
– mongod on each node, primary or secondary
– 2 MMS agents
– Plus, if sharding:
  • mongos for each app instance
  • 3 config servers
• …Each configured separately & differently
– Configuration file
– Manual commands to set up
• Less likely to have DBA support
– …and relational Best Practices may not transfer

➢ Use Puppet, Chef, or similar
– Helps with config files, command-line arguments
– Insufficient for adding secondaries, configuring indexes, etc.
➢ If scripting, use real client driver, not mongo shell
– Doesn’t handle output or errors consistently
– Can’t wait in JavaScript
➢ Train your DB/Ops team(s)
– And expect to do more yourself
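One thing a setup script written against a real client driver can do is wait for a condition and handle errors uniformly. The helper below is a generic sketch using only the standard library; `wait_until` and its parameters are hypothetical names, and in a real script the `condition` callable would query the cluster through the driver (for example, checking replica set member state), which is assumed here rather than shown.

```python
import time
from typing import Callable


def wait_until(
    condition: Callable[[], bool],
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `condition` until it returns True or `timeout_s` elapses.

    Exceptions raised by the check are reported and treated as "not ready yet",
    so one flaky call does not abort the whole script.
    """
    deadline = clock() + timeout_s
    while True:
        try:
            if condition():
                return True
        except Exception as exc:
            print(f"check failed, will retry: {exc!r}")
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Usage with a stand-in check that succeeds on the third attempt.
attempts = []


def new_secondary_is_ready() -> bool:  # hypothetical check
    attempts.append(1)
    if len(attempts) == 1:
        raise ConnectionError("node not reachable yet")
    return len(attempts) >= 3


ready = wait_until(new_secondary_is_ready, timeout_s=5, interval_s=0)
```

The function returns a plain boolean, so the calling script can decide whether a timeout is fatal (adding a secondary) or merely worth logging (a background index build).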
Lesson: Shadow Mode Is Your Friend
➢ Test with real production data, conditions, and queries
➢ Measure everything (MMS is a good start, but insufficient)
➢ Kill mongod instances to verify resiliency

[Diagram: Real Events & Requests go to both the Real Application and the “Shadow” Application; an X appears on the shadow side]
Primary school enrollment, Armenia:

http://data.worldbank.org/country/armenia
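A shadow-mode wrapper can be sketched as follows. This is an illustration under assumptions, not the deployment described in the talk: `make_shadowed` and the metric field names are hypothetical, and the shadow call runs inline for simplicity, whereas a production setup would more likely mirror requests asynchronously so the shadow path cannot slow down real users.

```python
import time
from typing import Any, Callable, Dict, List


def make_shadowed(
    real: Callable[[Any], Any],
    shadow: Callable[[Any], Any],
    metrics: List[Dict[str, Any]],
) -> Callable[[Any], Any]:
    """Wrap `real` so every request is also sent to `shadow`.

    Only the real result is returned. The shadow result is compared and then
    discarded, and shadow failures are recorded instead of propagated.
    """

    def handle(request: Any) -> Any:
        start = time.perf_counter()
        result = real(request)
        record: Dict[str, Any] = {
            "real_s": time.perf_counter() - start,
            "shadow_s": None,
            "shadow_error": None,
            "matched": None,
        }
        try:
            start = time.perf_counter()
            shadow_result = shadow(request)
            record["shadow_s"] = time.perf_counter() - start
            record["matched"] = shadow_result == result
        except Exception as exc:
            record["shadow_error"] = repr(exc)
        metrics.append(record)
        return result

    return handle


def failing_shadow(request: Any) -> Any:  # stand-in for a killed mongod
    raise ConnectionError("shadow cluster unavailable")


metrics: List[Dict[str, Any]] = []
handler = make_shadowed(lambda r: r * 2, failing_shadow, metrics)
answer = handler(21)
```

Because shadow errors never reach the caller, instances in the shadow cluster can be killed on purpose while the recorded timings and mismatches show how the new system behaves under real traffic.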
Lesson: Be Ready to Restore Your Data
• Schemas will change
• Shard key(s) will change
– More on this later…
• You’ll experience MongoDB bugs

➢ Maintain 2nd copy in another format
– Backing source of truth?
– Backup in standard format?
– Second cluster with different version of MongoDB?
➢ Increment DB name with each reload
➢ Automate reload process, and use it
Image credit:

http://tutorialphotoshopcs-putradom.blogspot.com/2012/11/create-dramatic-meteor-and-burning-city.html
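"Increment DB name with each reload" is easy to automate. The sketch below picks the next database name from the names that already exist; the `_vN` suffix convention and the function name are assumptions made for illustration, not something stated in the slides.

```python
import re
from typing import Iterable

# Hypothetical naming convention: <base>_v<number>, for example "users_v7".
_VERSIONED = re.compile(r"^(?P<base>.+)_v(?P<n>\d+)$")


def next_db_name(existing_names: Iterable[str], base: str) -> str:
    """Return the name to load the next full reload into."""
    versions = [
        int(m.group("n"))
        for name in existing_names
        if (m := _VERSIONED.match(name)) and m.group("base") == base
    ]
    return f"{base}_v{max(versions, default=0) + 1}"


target = next_db_name(["users_v1", "users_v2", "other_v9"], "users")
first = next_db_name([], "users")
```

Loading into a fresh name means the previous copy stays intact and queryable until the reload has been verified, at which point the application can be pointed at the new database and the old one dropped.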
Lesson: Pick a Good Shard Key

1. Distribute Data Volume Evenly
– This is what auto-balancing does for you.

2. Multiply Query Performance
– Isolate queries to 1 shard to multiply read capacity by # of shards.

3. Distribute Workload Evenly
– Conflicts with above!
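The difference between an isolated query and one that fans out can be sketched with a toy router. This is a simplified model, assuming range-based chunks with fixed boundaries and equality matches only; `shards_touched` and the `region` field are hypothetical.

```python
from typing import Any, Dict, List


def shards_touched(query: Dict[str, Any], shard_key: str, boundaries: List[int]) -> List[int]:
    """Return the shard indexes a router must contact for an equality query.

    Shard i holds keys below boundaries[i]; the last shard holds the rest.
    """
    num_shards = len(boundaries) + 1
    if shard_key not in query:
        # No shard key in the query: every shard has to be asked.
        return list(range(num_shards))
    value = query[shard_key]
    for shard, upper in enumerate(boundaries):
        if value < upper:
            return [shard]
    return [num_shards - 1]


boundaries = [10, 20]  # three shards
targeted = shards_touched({"region": 5, "age": 30}, "region", boundaries)
broadcast = shards_touched({"age": 30}, "region", boundaries)
```

A targeted query occupies one shard and leaves the others free for different queries, which is where the multiplied read capacity comes from. It also shows the conflict with goal 3: if most queries carry the same key value, they all land on the same shard.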
Lesson: Pick a Good Shard Key
[Diagram: mongos in front of Shard 1 and Shard 2]

Jessica Rabbit: http://disney.wikia.com/wiki/Jessica_Rabbit
Steve Urkel:
http://celebratingtvandfilmgeeks.wordpress.com/2010/04/25/steve-urkel-the-
Lesson: Pick a Good Shard Key
DO These Things
➢ Use fields appearing in every query
➢ Choose combo that finely partitions data
➢ Measure relative load across shards
– Consider adding secondaries to loaded shard(s) ONLY

BEWARE These Things
• Include serial numbers (or similar)
• Hash fields when reads might be a problem
• Mutable fields in shard key: remove and add
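"Measure relative load across shards" can be tried out on paper before touching a cluster. The sketch below compares where newly issued serial numbers land under range-based and hash-based placement. It is a simulation under assumptions: the chunk boundaries are made up, and MD5 stands in for whatever hash the database applies.

```python
import hashlib
from collections import Counter
from typing import Iterable, List


def shard_by_range(key: int, boundaries: List[int]) -> int:
    """Range placement: shard i holds keys below boundaries[i]; the last shard holds the rest."""
    for shard, upper in enumerate(boundaries):
        if key < upper:
            return shard
    return len(boundaries)


def shard_by_hash(key: int, num_shards: int) -> int:
    """Hash placement: spread keys pseudo-randomly (MD5 is only a stand-in)."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def relative_load(shard_ids: Iterable[int], num_shards: int) -> List[float]:
    """Fraction of operations that landed on each shard."""
    counts = Counter(shard_ids)
    total = sum(counts.values())
    return [counts.get(shard, 0) / total for shard in range(num_shards)]


NUM_SHARDS = 4
boundaries = [250_000, 500_000, 750_000]   # hypothetical chunk boundaries
new_serials = range(1_000_000, 1_001_000)  # 1,000 freshly issued serial numbers

range_load = relative_load((shard_by_range(k, boundaries) for k in new_serials), NUM_SHARDS)
hash_load = relative_load((shard_by_hash(k, NUM_SHARDS) for k in new_serials), NUM_SHARDS)
```

With range placement every new serial number falls past the last boundary, so one shard takes all of the new writes. Hashing spreads them roughly evenly, at the cost that a range query over serial numbers can no longer be isolated to a single shard.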
Summary

1. Provision for Success
2. Be Ready to Tinker

3. Shadow Mode Is Your Friend
4. Be Ready to Restore Your Data

5. Pick a Good Shard Key
We’re Hiring

http://www.eharmony.com/about/careers

rick.warren@eharmony.com

Editor's Notes

  • #2: Specifically, we’ll be talking about 5 lessons. It should take about 30 minutes.
  • #13: At some point, you’ll realize the data in your cluster isn’t what you need, or isn’t organized how you need it. You’ll need to reconstruct it. In the first two cases, you could dump and reload a single cluster. What about production changes in the meantime?
  • #16: The idea is for the breakdown of data across shards to reflect the same natural divisions of data you’re likely to query against.