Workload Isolation...
@asya999 #askAsya - ask me what you're doing wrong
(you might be doing it wrong)
MongoDB.local Austin 2018: Workload Isolation: Are You Doing it Wrong?
7 Deadly Sins
(of bad MongoDB deployments)
5 Stages of Grief
(of bad MongoDB deployments)
10 Commandments
(of good MongoDB deployments)
Human Error
Power Outages
Fire
Server Room Issues
Unscheduled Updates & Patches
Workload Isolation?
HA
Global Write
Clusters
Scale out
Workload
Isolation
Why?
Scalability
Availability
Recoverability
Security
These Stories Are True
or they are based on stories that are true
Based* on real cases filed with MongoDB Support
All names changed
Some details* may have been omitted
* some cases may have been combined and/or embellished to make a point
* just boring ones, not the really embarrassing ones
January 9
The Case of Mistaken Delete
simple typo
replica set
had 45 days in the oplog
but oldest backup was 90+ days
but ... saved the DB files (all of them)
We accidentally remove()ed an entire collection.
Is there a way to undo?
recovery not for the faint of heart ... all data was recovered
Conclusion:
Replication ≠ Backups
Do regular backups.
Don't do production operations "ad hoc"
noSQL ≠ noDBA
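One way to make a production delete less "ad hoc" is to wrap it in a guard that checks the filter before anything is removed. The following is a minimal sketch, not the approach used in the case above: `guarded_delete` and `expected_max` are hypothetical names, and the `collection` argument is assumed to expose `count_documents` and `delete_many` the way a PyMongo collection does.

```python
def guarded_delete(collection, query, expected_max):
    """Delete documents matching `query` only if the filter looks sane.

    `collection` is assumed to behave like a PyMongo Collection:
    count_documents(filter) returns an int, and delete_many(filter)
    returns an object with a `deleted_count` attribute.
    """
    # An empty filter matches every document; that is how a typo
    # turns into "we removed an entire collection".
    if not query:
        raise ValueError("refusing to delete with an empty filter")

    # Count first, so the operator sees the blast radius before deleting.
    matched = collection.count_documents(query)
    if matched > expected_max:
        raise RuntimeError(
            f"filter matches {matched} documents, expected at most {expected_max}"
        )

    return collection.delete_many(query).deleted_count
```

A guard like this does not replace backups; it only reduces the chance that a single typo needs one.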
Scalability
[Diagram sequence: Application → Driver → a replica set of three mongod nodes; the replica set becomes Shard 1; a mongos is added in front of Shard 1 and Shard 2, then Shard 3, up to Shard N; multiple applications, drivers, and mongos routers share the same shards and their DATA; the final diagram labels app 0 through app 4 above shards that are each labeled Shard 0.]
Horizontal Scaling
SHARDING
[Figure sequence: items numbered 2 through 6 appear one at a time as shards are added; the last two slides are labeled SCATTER-GATHER and TARGETED.]
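The difference between a targeted and a scatter-gather query can be illustrated with a toy router. This is a sketch under stated assumptions, not how mongos is implemented: the `Router` class, the numeric shard key `user_id`, and the split points are all hypothetical, and only equality matches on the shard key are handled.

```python
import bisect


class Router:
    """Toy query router over range-partitioned shards.

    With split points [100, 200] and three shards, shard 0 owns keys
    below 100, shard 1 owns [100, 200), and shard 2 owns 200 and above.
    """

    def __init__(self, split_points, shard_names, shard_key="user_id"):
        if len(shard_names) != len(split_points) + 1:
            raise ValueError("need exactly one more shard than split points")
        self.split_points = list(split_points)
        self.shard_names = list(shard_names)
        self.shard_key = shard_key

    def shards_for(self, query):
        # Targeted: the query names the shard key, so exactly one shard
        # can hold matching documents.
        if self.shard_key in query:
            idx = bisect.bisect_right(self.split_points, query[self.shard_key])
            return [self.shard_names[idx]]
        # Scatter-gather: without the shard key, every shard must be asked.
        return list(self.shard_names)


router = Router([100, 200], ["shard1", "shard2", "shard3"])
targeted = router.shards_for({"user_id": 150})
scattered = router.shards_for({"name": "asya"})
```

Here `targeted` contains a single shard, while `scattered` contains all three.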
SHARDING
COMPARE THROUGHPUT
Query Scaling Rate Comparison: Number of shards vs. Number of queries
If your application sends 10,000 queries:

Number of   Each query targets one shard   Each query targets all shards
shards      (per shard / system total)     (per shard / system total)
1           10,000 / 10K                   10K / 10K
2           5,000 / 10K                    10K / 20K
5           2,000 / 10K                    10K / 50K
10          1,000 / 10K                    10K / 100K
Query Scaling Rate Comparison: Number of shards vs. Total query capacity
If each shard can process 10,000 queries:

Number of   Each query targets one shard   Each query targets all shards
shards      (per shard / system total)     (per shard / system total)
1           10K / 10K                      10K / 10K
2           10K / 20K                      10K / 10K
5           10K / 50K                      10K / 10K
10          10K / 100K                     10K / 10K
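The arithmetic behind both comparisons can be reproduced in a few lines. This is a sketch with hypothetical function names; it assumes targeted queries are spread evenly across shards, which is the idealized case the slides use.

```python
def work_for_fixed_demand(app_queries, shards, targeted):
    """Return (queries per shard, total queries executed system-wide)
    when the application sends `app_queries` queries."""
    if targeted:
        # Each query lands on one shard; assume an even spread.
        per_shard = app_queries // shards
    else:
        # Scatter-gather: every shard executes every query.
        per_shard = app_queries
    return per_shard, per_shard * shards


def capacity_for_fixed_shard_limit(per_shard_limit, shards, targeted):
    """Return (queries per shard, application queries the system can absorb)
    when each shard can process `per_shard_limit` queries."""
    if targeted:
        # Capacity grows linearly with the number of shards.
        app_capacity = per_shard_limit * shards
    else:
        # Every query costs every shard one unit, so capacity stays flat.
        app_capacity = per_shard_limit
    return per_shard_limit, app_capacity


for n in (1, 2, 5, 10):
    print(n,
          work_for_fixed_demand(10_000, n, targeted=True),
          work_for_fixed_demand(10_000, n, targeted=False),
          capacity_for_fixed_shard_limit(10_000, n, targeted=True),
          capacity_for_fixed_shard_limit(10_000, n, targeted=False))
```

With targeted queries, adding shards divides the work; with scatter-gather queries, adding shards multiplies it.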
SHARDING
COMPARE LATENCY
Can I prove it?
Can I demonstrate it?
PROOF
Open Source
Reference Implementation
• Various Fanout Feed Models
• User Graph Implementation
• Content storage
Configurable models and options
Built-in benchmarking
https://github.com/mongodb-labs/socialite
Socialite: Architecture
GraphServiceProxy
ContentProxy
Optimized for Performance
User
timeline
cache
Schema
Indexing Horizontal Scaling
Operational Testing Built-in
User facing latency
Linear scaling of resources
Most important criteria?
• Generates realistic real-life-scale workload
• compared to Twitter, etc.
• Confirms architecture scales linearly
• without loss of responsiveness
Perfect Candidate to Benchmark
Comparison
Shard Everything Separate Clusters
Scalability
Availability
Reliability
Debug-ability
January 18
The case of deleted database
Accidentally deleted DB directory
Restored recent backup... but mongod won't start
Backups were not good: taken incorrectly
Unable to start mongod process
Conclusion:
Most Important Part of Backups:
RESTORES
Test your backups
noSQL ≠ noDBA
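A restore test can be as simple as restoring into a scratch instance, confirming that the server starts, and comparing what came back against what was expected. The sketch below covers only the comparison step. It is an illustration under assumptions: `restore_mismatches` is a hypothetical helper, and how the per-collection counts are gathered (at backup time and after the test restore) is left to the reader.

```python
def restore_mismatches(source_counts, restored_counts):
    """Compare per-collection document counts recorded at backup time
    with counts read from a test restore.

    Both arguments map collection name -> document count.
    Returns a list of (collection, expected, restored) tuples for every
    collection that is missing, unexpected, or has a different count.
    """
    problems = []
    for name in sorted(set(source_counts) | set(restored_counts)):
        expected = source_counts.get(name)   # None if not in the backup manifest
        restored = restored_counts.get(name)  # None if missing after restore
        if expected != restored:
            problems.append((name, expected, restored))
    return problems


report = restore_mismatches(
    {"users": 1200, "posts": 5400},
    {"users": 1200, "posts": 5399},
)
```

An empty list means the counts agree; it does not by itself prove the restored data is correct.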
Comparison
Shard Everything Separate Clusters
Scalability
Availability
Reliability
Debug-ability
January 13
The case of missing metadata
January 13:
"DBA" adds a new shard
"DBA" observes that data does not seem to be migrating to new shard
"DBA" sets out to "fix" the "problem"
By "re-sharding" the database/collection in question
Which doesn't work (because it's already sharded)
"Simple" solution: remove the config DB metadata for chunks!
Try resharding again!
Force it!
Sequence of Events
How did we help them fix it?
My colleague
The Operation
Result
Conclusion
noSQL ≠ noDBA
Everybody has a test
environment...
Some people are lucky enough
to have a totally separate
production environment.
May 22
The case of the single data center
At 4:30 PM on a Friday, an alpha page comes in
Two senior support engineers work the ticket till 10 PM
The details:
33-node,
17-terabyte sharded cluster (11 shards, each a 3-node replica set),
single data center,
no journaling
no backups
Add to that:
power failure in the data center
no UPS
Result:
unreadable data on every node
noSQL ≠ noDBA
Sequence of Events
Bad things happen to good data centers
December 30
The case of disappearing data files
Sequence of Events
Dec 29 2013 10:35:00 AM: db.stats() is showing dataSize > fileSize
Dec 29 2013 04:44:00 PM: "there are data files viewed as missing by `mongod`"
Dec 30 2013 02:12:00 AM: Seeing incorrect fileSize on numerous servers;
you can see a drop in fileSize on 12/28 in MMS with
no corresponding drop in the other size metrics.
Dec 30 2013 02:19:00 AM: "[do] these databases have anything in common,
especially with the xxxx DB from yesterday?"
Dec 30 2013 02:22:00 AM: Nothing comes to mind ... that the DBs have in common
Dec 30 2013 02:28:00 AM: We notice this all happening at the same time.
We think something might be deleting data files.
Dec 30 2013 02:31:00 AM: "Something is deleting data files outside mongod?"
Dec 30 2013 02:57:00 AM: Yes. We deleted actual db files on both the primaries
and secondaries on the 28th.
Dec 28 2013: Someone notices that they are low on disk space and as a solution
writes a shell script that finds every file on every disk on every
server that's bigger than 1GB in size and which hasn't been
accessed in >3 days.
And it then deletes it.
This script ran on every server deleting every database file bigger than 1GB
which hasn't been accessed in the previous few days...
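The selection rule that script applied (larger than 1 GB, not accessed in more than 3 days) can be sketched as a dry run that only lists candidates and skips protected directories such as a database path. The original script is not shown in the talk, so everything below is hypothetical: the function name, its parameters, and the idea of a `protected` list are assumptions made for illustration.

```python
import os
import time


def find_stale_large_files(root, min_bytes, min_idle_seconds, protected=(), now=None):
    """List (never delete) files under `root` that are at least `min_bytes`
    in size and were last accessed at least `min_idle_seconds` ago.

    Directories in `protected` (for example a database data path) are
    skipped entirely, including everything beneath them.
    """
    now = time.time() if now is None else now
    protected = tuple(os.path.abspath(p) for p in protected)
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        abs_dir = os.path.abspath(dirpath)
        if any(abs_dir == p or abs_dir.startswith(p + os.sep) for p in protected):
            dirnames[:] = []  # do not descend into protected trees
            continue
        for name in filenames:
            path = os.path.join(abs_dir, name)
            st = os.stat(path)
            if st.st_size >= min_bytes and now - st.st_atime >= min_idle_seconds:
                hits.append(path)
    return sorted(hits)
```

Access time is a weak signal for "unused": a file can be in active use by a long-running process without its access time being updated, depending on how the filesystem is mounted.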
Ultimately, NO data was lost.
A running `mongod` process keeps the "deleted" files from actually being removed
A running `mongod` can recreate all data files via db.repairDatabase()
BUT... there is no disk space for db.repairDatabase()
Luckily, an extra server or two are "found", allowing a rotating re-sync of a new secondary in each
replica set.
Again: no data was lost. All data was fully and successfully recovered.
Guess the Outcome!
THANK YOU
Enjoy The Rest of the Day
