Scaling to 30,000 Requests Per Second and Beyond with MongoDB
Mike Chesnut
Director of Operations Engineering
Crittercism
How a Startup Gets Started
● Pick something and go with it
● Make mistakes along the way
● Correct the mistakes you can
● Work around the ones you can’t
What I’ll Talk About
● Crittercism - Background and Architecture
● Router (mongos) Architecture
● Sharding Considerations
● The Balancing Act
● Q&A
Critter-What?
A Brief History...
Our Founders (Rob, Andrew, Jeeyun)
Let’s make a mobile app! It’ll be awesome!
(Unnamed Dating App)
Our Founders (Rob, Andrew, Jeeyun)
Our app isn’t so awesome after all...
Architecture
[Diagram, built up over several slides: an API in front of MongoDB collecting Feedback (later discontinued), Crashes, App Loads (written in batches), and Handled Exceptions; Metadata stored in DynamoDB; a second API for Performance Data and Geo Data; 40,000 req/s overall.]
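The speaker notes (#41) describe the app-load path: because app loads are the highest-volume stream, they are counted in a memory-based store (Redis) and written to MongoDB in batches. The sketch below shows only that batching idea. It is a hypothetical illustration: an in-process `Counter` stands in for Redis, and the `AppLoadBatcher` name, the `(app_id, hour)` key, and the `loads` field are assumptions, not Crittercism's schema.

```python
from collections import Counter
from typing import Dict, List


class AppLoadBatcher:
    """Hypothetical sketch: aggregate app-load events in memory, flush them as batched increments.

    An in-process Counter stands in for Redis; the real keys and schema are not shown in the talk.
    """

    def __init__(self) -> None:
        self._pending: Counter = Counter()

    def record(self, app_id: str, hour: str) -> None:
        # One cheap in-memory increment per incoming request; nothing touches the database yet.
        self._pending[(app_id, hour)] += 1

    def flush(self) -> List[Dict]:
        # Collapse N individual events into one increment per (app_id, hour) key,
        # shaped like the upsert/$inc operations a bulk write could send to MongoDB.
        ops = [
            {
                "filter": {"app_id": app_id, "hour": hour},
                "update": {"$inc": {"loads": count}},
                "upsert": True,
            }
            for (app_id, hour), count in sorted(self._pending.items())
        ]
        self._pending.clear()
        return ops
```

Each flushed dict maps onto a PyMongo `UpdateOne(filter, update, upsert=True)` that could be sent in a single `collection.bulk_write(...)` call, so a thousand app loads for one app in one hour become one increment instead of a thousand writes.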
Growth
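The growth figures in the speaker notes (#3 and #47) are easy to sanity-check, since a day has 86,400 seconds. These are daily averages; peak-hour rates will be higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_day(per_second: float) -> float:
    """Convert a sustained request rate into requests per day."""
    return per_second * SECONDS_PER_DAY


def per_second(per_day_total: float) -> float:
    """Convert a daily request total into an average rate per second."""
    return per_day_total / SECONDS_PER_DAY
```

`per_day(700)` is 60,480,000 (the notes' 60M/day); `per_day(40_000)` is 3,456,000,000 and `per_day(45_000)` is 3,888,000,000, which brackets both the 3.5B/day and 3.8B/day figures quoted in the notes.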
Router Architecture
[Diagram, built up over several slides: a MongoDB cluster of three replica sets, each with three mongod servers; client applications, with each application server running a client process and a local mongos; and the config servers.]
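Notes #52 and #53 explain the router's job: the mongos knows where data resides because the config servers track which shard owns each chunk, a contiguous range of shard-key values. A toy model of that lookup follows; `ChunkTable`, the integer keys, and the shard names are hypothetical, and a real mongos caches this metadata from the config servers (with MinKey/MaxKey bounds) rather than holding a hand-built list.

```python
import bisect
from typing import List, Tuple


class ChunkTable:
    """Toy routing table: sorted, non-overlapping [min, max) shard-key ranges mapped to shards."""

    def __init__(self, chunks: List[Tuple[int, int, str]]) -> None:
        self._chunks = sorted(chunks)
        self._mins = [lo for lo, _, _ in self._chunks]

    def shard_for(self, key: int) -> str:
        # Find the last chunk whose lower bound is <= key, then check its upper bound.
        i = bisect.bisect_right(self._mins, key) - 1
        if i < 0 or key >= self._chunks[i][1]:
            raise KeyError(f"no chunk covers shard key {key}")
        return self._chunks[i][2]

    def shards_for_range(self, lo: int, hi: int) -> List[str]:
        # A range query on the shard key only goes to shards owning overlapping chunks;
        # a query without the shard key would have to be sent to every shard.
        return sorted({shard for c_lo, c_hi, shard in self._chunks if c_lo < hi and lo < c_hi})
```

A stale copy of this table is what note #62 warns about: if a router has not yet seen a chunk move, it can look for data on the wrong shard and may have to retry.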
[Diagram: as the application tier grows, every application server (app) gets its own mongos (ms), and every mongos connects to the config servers (conf) and to each replica set (RS).]
Router Architecture
Problems we encountered running one mongos per client host:
● thousands of connections to config servers
● config server CPU load
● configdb propagation delays
Router Architecture
[Diagram: every application server (app) with its own mongos (ms)]
We went from this...
[Diagram: application servers sharing a separate tier of mongos routers (ms)]
To this.
[Closer view: client processes on the application servers connect to a separate Router Tier of mongos processes, which in turn connect to the replica sets of the MongoDB cluster.]
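Note #66 says application servers are routed to mongos routers in the same availability zone, which keeps the extra network hop short; the notes say this is managed with Chef, and the talk does not show how. The function below is a hypothetical sketch of just the selection rule: prefer routers in the caller's AZ, then spread application servers across them with a stable, rendezvous-style hash so each host always gets the same routers. `pick_mongos` and the host/AZ names are illustrative.

```python
import hashlib
from typing import Dict, List


def pick_mongos(app_host: str, app_az: str, routers: Dict[str, str], count: int = 2) -> List[str]:
    """Hypothetical: choose `count` mongos routers for one application server.

    `routers` maps router hostname -> availability zone. Routers in the same AZ are
    preferred; if the AZ has none, fall back to all routers.
    """
    local = sorted(host for host, az in routers.items() if az == app_az)
    candidates = local or sorted(routers)

    def score(router: str) -> str:
        # sha1 instead of Python's hash(), which is randomized per process for strings,
        # so every run (and every config-management pass) picks the same routers.
        return hashlib.sha1(f"{app_host}|{router}".encode()).hexdigest()

    return sorted(candidates, key=score)[:count]
```

Returning two routers rather than one reflects note #81: how much a router failure hurts also depends on what your driver allows. Many MongoDB drivers accept several mongos hosts in one connection string and can fail over between them; check your driver's documentation before relying on that.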
Router Architecture
Separate mongos tier advantages:
● greatly reduced number of connections to each mongod
● far fewer hosts talking to the config servers
● much faster configdb propagation
Disadvantages:
● additional network hop
● host failure has a larger effect
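The first two advantages can be put in rough numbers. This model assumes each mongos ends up holding about `pool_size` connections to every mongod and polls every config server; real pool sizes depend on the MongoDB version, driver, and settings, and the host counts are hypothetical, so read the output as a ratio rather than a prediction. Application connections don't disappear in the tier design; they terminate at the routers, which reuse their pooled mongod connections (note #68).

```python
from typing import Dict


def cluster_fanout(num_mongos: int, num_mongod: int, num_config: int = 3, pool_size: int = 10) -> Dict[str, int]:
    """Rough connection counts for a sharded cluster (pool_size is an assumed per-pair pool)."""
    return {
        # Every router talks to every mongod, so each mongod's load scales with router count.
        "conns_per_mongod": num_mongos * pool_size,
        "total_router_to_mongod": num_mongos * num_mongod * pool_size,
        # Every router also reads cluster metadata from the config servers.
        "hosts_polling_config": num_mongos,
        "router_to_config_links": num_mongos * num_config,
    }
```

With 200 application servers and 9 mongods, `cluster_fanout(200, 9)` gives 2,000 connections per mongod and 200 hosts polling the config servers; an 8-router tier, `cluster_fanout(8, 9)`, gives 80 and 8.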
Router Architecture
mongos-per-host failure:
[Diagram: one application server's local mongos fails; only that application server is affected.]
Separate mongos tier failure:
[Diagram: one mongos in the shared router tier fails; every application server connected to it is affected.]
Router Architecture
So increase the number of mongos routers:
[Diagram: additional mongos routers (ms) added to the router tier, still far fewer than one per application server]
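The disadvantage ("host failure has a larger effect") and the fix ("increase the number of mongos routers") trade off against each other, and note #81 suggests starting from the share of application servers you can tolerate losing. A minimal sketch, assuming application servers are spread evenly and each depends on a single router (a driver that fails over between several mongos hosts would shrink the impact):

```python
import math


def worst_case_affected(app_servers: int, routers: int) -> int:
    """Application servers behind the busiest router, i.e. the loss if that router fails."""
    return math.ceil(app_servers / routers)


def routers_needed(app_servers: int, max_affected_fraction: float) -> int:
    """Smallest router count that keeps one router failure within the tolerated fraction.

    If even one router per application server cannot meet the target, the answer
    degenerates to the mongos-per-host layout (one router per application server).
    """
    for routers in range(1, app_servers + 1):
        if worst_case_affected(app_servers, routers) <= max_affected_fraction * app_servers:
            return routers
    return app_servers
```

`worst_case_affected(200, 200)` is 1 (the mongos-per-host case), `worst_case_affected(200, 8)` is 25, and `routers_needed(200, 0.05)` is 20: keeping a single router failure to 5% of 200 application servers takes 20 routers, still far fewer than 200.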
Router Architecture - Evolve!
[Diagram: a mongos on each application server]
Maybe at first, doing the mongos-per-host architecture is fine. And it will probably remain fine for quite a while.
[Diagram: application servers connecting through a separate Router Tier]
This is an area where you can and should be willing to adapt as you go (and as needed).
Sharding Considerations
Pick something you want to live with.
[Diagrams: four shards, sharded by app_id, with each World Cup team's app distributed evenly across them. When the US and Germany play each other, both apps get heavy use, but they happen to live on the same shard, which now sees higher load, more lock contention, and slower queries. So let's add another shard (scale horizontally)...]
What could we have done differently?
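The World Cup example in the speaker notes (#88 to #109) can be reproduced with a small model. The captured text does not record the talk's answer to "What could we have done differently?", so the second placement below is only one option to weigh, not the speaker's recommendation. Assumptions: 32 hypothetical apps placed by `app_id` range across 4 shards, app ids 12 and 13 standing in for the US and Germany apps, and md5 standing in for MongoDB's own hashed-sharding function. The underlying MongoDB behavior is real: a chunk cannot be split between documents that share one shard-key value, so with `app_id` alone as the key, all of one app's data sits on one shard.

```python
import hashlib
from collections import Counter
from typing import Callable, List, Tuple

NUM_SHARDS = 4
APPS = 32              # hypothetical: one app per World Cup team
HOT_APPS = {12, 13}    # hypothetical ids standing in for the US and Germany apps


def by_app_id(app_id: int, device_id: int) -> int:
    # Range placement on app_id alone: apps 0-7 on shard 0, 8-15 on shard 1, and so on.
    # device_id is ignored, so every request for one app lands on the same shard.
    return app_id * NUM_SHARDS // APPS


def by_hashed_compound(app_id: int, device_id: int) -> int:
    # Hypothetical alternative: hash (app_id, device_id) so one app's traffic spreads out.
    # md5 is a stand-in; MongoDB's hashed sharding uses its own hash function.
    digest = hashlib.md5(f"{app_id}:{device_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def shard_load(requests: List[Tuple[int, int]], place: Callable[[int, int], int]) -> List[int]:
    counts = Counter(place(app, dev) for app, dev in requests)
    return [counts[s] for s in range(NUM_SHARDS)]


# Match day: the two hot apps see 1,000 devices each; every other app sees 10.
requests = [(app, dev) for app in range(APPS) for dev in range(1000 if app in HOT_APPS else 10)]
```

`shard_load(requests, by_app_id)` returns `[80, 2060, 80, 80]`: one shard takes about 90% of match-day traffic, while the hashed compound spreads the same 2,300 requests roughly evenly. The trade-off is that a query for one app's data then has to visit every shard (scatter-gather) unless it also supplies the device component of the key.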
The Balancing Act
Why wouldn’t you run the balancer in the first place?
● great question
● for us, it’s because we deleted some old data at one point, and left a bunch of holes
○ we turned it off while deleting this data
○ and then were unable to turn it back on
● but maybe you start without it
● or maybe you need to turn it off for maintenance and forget to turn it back on
Obviously, don’t do this. But if you do, here’s what happens...
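One cheap guard against forgetting to turn the balancer back on is a scheduled check. A sketch, assuming MongoDB 3.4 or later, where running the `balancerStatus` admin command on a mongos (in PyMongo, `client.admin.command("balancerStatus")`) returns a document with a `mode` field; the mongo shell of the talk's era exposed similar information through `sh.getBalancerState()`. The `balancer_alert` helper is hypothetical and only interprets the reply, so it can be tested without a cluster.

```python
from typing import Mapping, Optional


def balancer_alert(status: Mapping[str, object]) -> Optional[str]:
    """Return a warning if the balancer is not fully enabled, else None.

    `status` is assumed to be a `balancerStatus` reply, e.g.
    {"mode": "full", "inBalancerRound": False, "ok": 1}.
    """
    if status.get("ok") != 1:
        return "could not read balancer status"
    mode = status.get("mode")
    if mode != "full":
        # Any mode other than "full" means chunks are not being migrated between shards.
        return f"balancer mode is {mode!r}; chunks will not migrate until it is re-enabled"
    return None
```

Wire the returned message into whatever alerting you already run, so a maintenance window that stopped the balancer cannot silently become permanent.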
The Balancing Act
Fresh, new, empty cluster… But no balancer running.
[Diagrams: data is inserted and the existing shards fill up]
Now we’re pretty full, so let’s add another shard...
And keep inserting...
Suddenly we find ourselves with a very unbalanced cluster.
But if we enable the balancer, it will DoS the 5th shard!
The approximate effect looks something like this:
[Diagram of the effect]
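Why enabling the balancer at this point hurts can be shown with simple arithmetic. The sketch assumes four full shards with a hypothetical 100 chunks each and a new, empty fifth shard, and it ignores chunk sizes and the balancer's migration thresholds.

```python
from typing import Dict, List


def chunks_to_receive(chunks_per_shard: List[int]) -> Dict[int, int]:
    """How many chunks each below-average shard must receive to reach an even split.

    Simplified model: ignores chunk sizes and the balancer's migration thresholds.
    """
    target = sum(chunks_per_shard) // len(chunks_per_shard)
    return {i: target - c for i, c in enumerate(chunks_per_shard) if c < target}
```

`chunks_to_receive([100, 100, 100, 100, 0])` is `{4: 80}`: every one of the 80 migrations needed to even things out writes into the same new shard. If those chunks were near the 64 MB default maximum chunk size of MongoDB at the time, that is up to about 5 GB landing on one set of disks, on top of its share of live traffic.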
The Balancing Act
So what can we do?
1. add IOPS
2. make sure your config servers have plenty of CPU (and IOPS)
3. slowly move chunks manually
4. approach a balanced state
5. hold your breath
6. try re-enabling the balancer
The Balancing Act
How to manually balance:
1. determine a chunk on a hot shard
2. monitor effects on both the source and target shards
3. move the chunk
4. allow the system to settle
5. repeat
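The five manual-balancing steps map onto a throttled loop. This is a hypothetical sketch: `manual_balance`, `move_one_chunk`, and `shards_healthy` are illustrative names, the callbacks are caller-supplied, and the dict is a local model of chunks per shard. In a real cluster, `move_one_chunk` would choose a chunk from the `config.chunks` collection and issue the `moveChunk` admin command through a mongos (in PyMongo, `client.admin.command("moveChunk", "db.collection", find={...}, to="<shard>")`), and `shards_healthy` would read your own metrics, such as IOPS, lock percentage, or replication lag, for both source and target.

```python
import time
from typing import Callable, Dict


def manual_balance(
    chunks: Dict[str, int],
    move_one_chunk: Callable[[str, str], None],
    shards_healthy: Callable[[str, str], bool],
    settle_seconds: float = 0.0,
    tolerance: int = 2,
    max_moves: int = 10_000,
) -> int:
    """Move one chunk at a time from the fullest shard to the emptiest, checking health between moves.

    Stops when the spread is within `tolerance`, when a health check fails (so an
    operator can investigate), or after `max_moves`. Returns the number of moves made.
    """
    moves = 0
    while moves < max_moves:
        source = max(chunks, key=chunks.get)   # 1. pick a chunk on the hottest shard
        target = min(chunks, key=chunks.get)
        if chunks[source] - chunks[target] <= tolerance:
            break                               # close to balanced: time to try the balancer
        if not shards_healthy(source, target):  # 2. monitor both source and target
            break
        move_one_chunk(source, target)          # 3. move the chunk
        chunks[source] -= 1
        chunks[target] += 1
        moves += 1
        time.sleep(settle_seconds)              # 4. let the system settle, then 5. repeat
    return moves
```

Stopping on the first failed health check, rather than retrying, keeps a human in the loop between steps 2 and 4. Once the spread is small, the earlier list's last step applies: try re-enabling the balancer.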
The Balancing Act
Conclusion here:
Run the balancer!
Summary
● Design ahead of time
○ “NoSQL” lets you play it by ear
○ but some of these decisions will bite you later
● Be willing to correct past mistakes
○ dedicate time and resources to adapting
○ learn how to live with the mistakes you can’t correct
References
● MongoDB Blog post (details on shard migration): http://blog.mongodb.org/post/77278906988/crittercism-scaling-to-billions-of-requests-per-day-on
● MongoDB Webinar (details on manual chunk migrations): http://www.mongodb.com/presentations/webinar-back-basics-3-scaling-30000-requests-second-mongodb
● Documentation on mongos routers: http://docs.mongodb.org/master/core/sharded-cluster-query-routing/
● Documentation on the balancer: http://docs.mongodb.org/manual/tutorial/manage-sharded-cluster-balancer/
● Documentation on shard keys: http://docs.mongodb.org/manual/core/sharding-shard-key/
Crittercism: http://www.crittercism.com/ to learn more, and http://www.crittercism.com/careers/ if you want to help us!
Q&A
Thank You!

Editor's Notes

  • #2: I’m Mike, and I run Ops at Crittercism. I’m going to tell you the story of how we’ve scaled to handle over 30k req/s using a storage strategy based on MongoDB.
  • #3: Between proposing this talk and now, we’ve actually grown some more, and we now top 40-45k req/s on a daily basis. This is about 3.5B requests per day.
  • #4: This is really the story of learning as you go
  • #5: I’ll tell you how Crittercism got started, some of the lessons we’ve learned along the way, and some advice we can share based on those experiences
  • #9: some advice from our experience about things to do and things not to do
  • #10: I’ll give you a brief overview of what we’re doing
  • #11: some advice based on what we’ve learned related to router architecture
  • #12: I’ll talk about some sharding considerations and the issues that can arise
  • #13: I’ll tell you a story about the Mongo Balancer
  • #14: I’ll be sure to leave time for Q&A
  • #15: First let me tell you a bit about who we are and the problem we’re trying to solve
  • #19: so they made a dating app, which shall remain unnamed
  • #20: and it went over about as well as the dating scene in The Social Network
  • #21: poor star rating, and they didn’t know why
  • #23: So they made a “feedback widget” and pivoted in September 2010 (date from the Wayback Machine). It enabled mobile app developers to let their users provide “criticism” of their apps outside of the app store, not just a star rating.
  • #25: October 2011: added crash reports to help improve ratings. Now we’re the ones helping you self-criticize.
  • #27: added live stats to see app performance in real-time
  • #28: now they’re happy the dating app didn’t pan out, but in the process of making it better, we’ve come to provide something that helps everybody improve their apps
  • #29: today (2014): what it’s evolved into. We collect tons of detailed analytics data: crash reports, groupings. Geo data launched in 2013 (just kidding, this is stored in postgres). API & iPad app launched in 2014: more aggregations of performance data (more ways to view it).
  • #30: this guy feels overwhelmed at times. So how do we deal with all of this?
  • #31: so what do we do with all of this data?
  • #32: we started by setting up a db (mongo, of course); we’ve used mongo from the start. Why mongo? It has RDBMS characteristics, has both OLTP and warehouse-like properties, offers lots of flexibility, and it scales.
  • #33: put an ingest API in front of it
  • #34: collect user feedback from our feedback widget SDK
  • #35: then we start storing crash data in mongodb, too
  • #36: but what makes crash data more useful is when you have app load data as well -> crash rate (which is a differentiating feature for us)
  • #37: you start catching more errors, but you still want to know about them, so let’s add handled exceptions as well
  • #38: we realized crash reporting was really the product, so we discontinued the feedback
  • #40: and our volume kept going up, especially app loads
  • #41: app loads are the highest-volume component here, so let’s count them in a memory-based data store (redis), and batch up the writes before persisting the data to mongo
  • #42: add user metadata as well, to help support desks
  • #43: but that’s a different kind of data and a different volume and access pattern, so let’s add dynamodb into the mix
  • #44: our volume keeps going up, so let’s cache this app data to make our responses faster
  • #45: added APM, which introduced a lot of different data types and structures, so we added another ingest API and postgres into the mix (but obviously that’s not going to be part of this talk…)
  • #46: so we’ve scaled to 40k/s by being willing to adapt incrementally, and willing to use whatever works / whatever it takes
  • #47: 2-year period went from 700/s (60M/day) to 40-45k/s (3.8B/day)
  • #48: one of the biggest things we did to help ourselves scale was to consolidate the mongos routers
  • #49: start with a sharded mongodb cluster
  • #50: add your application servers
  • #51: each application server has a local mongos process; each client process connects to its local mongos router
  • #52: mongos routers talk to mongods to read and write data; mongos routes queries and returns results
  • #53: the mongos knows where data resides thanks to the config servers, which keep track of the shard topology (location of data throughout the cluster)
  • #54: mongos routers talk to config servers as well, to maintain an updated version of the configdb
  • #55: and the config servers also talk to the mongods. Now let’s zoom out a bit...
  • #56: and you’re going to grow, so you’re going to add more and more application servers
  • #57: and they’re all maintaining these connections between their local mongos, the config servers, and the shard servers
  • #58: (not showing all the lines here, but you get the idea)
  • #62: could mean your application is reading stale data, or can’t find the data it needs when it needs it (and maybe it has to retry, which means it’s now slower)
  • #63: so we went from this...
  • #64: to this
  • #65: closer view
  • #66: move the mongos routers to their own tier, and be smart about how you route to them (we use chef to keep it within the same AZ)
  • #68: due to connection re-use from mongos to mongod
  • #69: due to far fewer mongos processes
  • #70: far fewer nodes for it to propagate to
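A rough model of why #68-70 hold: every mongos keeps its own connection pool to every shard and also talks to every config server, so those totals scale with the number of mongos processes, not with the amount of work. All of the numbers below (200 app servers, 12 dedicated routers, 8 shards, 3 config servers, a pool of 10 per shard) are hypothetical, and real pools grow and shrink with load; the point is only the ratio.

```python
from dataclasses import dataclass


@dataclass
class Topology:
    mongos_routers: int
    shards: int
    config_servers: int = 3
    pool_per_shard: int = 10  # hypothetical per-mongos pool size to each shard

    def shard_connections(self) -> int:
        # every mongos keeps its own pool to every shard
        return self.mongos_routers * self.shards * self.pool_per_shard

    def config_connections(self) -> int:
        # every mongos also talks to every config server
        return self.mongos_routers * self.config_servers

    def routers_to_refresh(self) -> int:
        # each mongos must pick up configdb changes (e.g. after a chunk move)
        return self.mongos_routers


colocated = Topology(mongos_routers=200, shards=8)  # one mongos per app server
dedicated = Topology(mongos_routers=12, shards=8)   # separate mongos tier

for name, t in [("colocated", colocated), ("dedicated", dedicated)]:
    print(name, t.shard_connections(), t.config_connections(), t.routers_to_refresh())
# colocated 16000 600 200
# dedicated 960 36 12
```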
  • #71: be aware that this does introduce some disadvantages, too
  • #72: we reduce this by keeping it in the same availability zone / data center
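The talk does the same-AZ routing with chef; the sketch below expresses a similar policy in plain Python instead, as an illustration only. `routers_for` and `mongos_uri` are hypothetical helpers: they prefer mongos routers in the app server’s own AZ and fall back to all routers only if the AZ has none. The comma-separated host list is standard MongoDB connection-string syntax, but how a given driver spreads load or fails over across those hosts varies by driver.

```python
from typing import List, NamedTuple


class Router(NamedTuple):
    host: str
    az: str


def routers_for(app_az: str, routers: List[Router]) -> List[str]:
    """Hosts an app server in `app_az` should connect to.

    Prefer mongos routers in the same AZ; fall back to every router only
    if none exist there (hypothetical policy, not the talk's chef recipe).
    """
    local = [r.host for r in routers if r.az == app_az]
    return local if local else [r.host for r in routers]


def mongos_uri(hosts: List[str]) -> str:
    # a MongoDB URI accepts a comma-separated list of host:port pairs
    return "mongodb://" + ",".join(hosts) + "/"


routers = [
    Router("mongos-1a:27017", "us-east-1a"),
    Router("mongos-1b:27017", "us-east-1b"),
    Router("mongos-2a:27017", "us-east-1a"),
]
print(mongos_uri(routers_for("us-east-1a", routers)))
# mongodb://mongos-1a:27017,mongos-2a:27017/
```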
  • #73: let’s look at what that implies
  • #74: in the mongos-per-app-server setup, if one fails...
  • #76: only that one application server is affected
  • #77: but with a separate mongos tier, if one mongos router fails...
  • #79: all app servers connected to it will be affected, so be aware of this and take it into account
  • #80: so maybe increase the number of mongos routers (but still far fewer than you had before)
  • #81: account for what percentage of your app servers you can tolerate going down (this also depends on what your driver allows you to do and how it behaves)
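The blast-radius reasoning in #79-81 as arithmetic, assuming app servers are spread as evenly as possible across the mongos tier and that an app server whose router dies is simply out (a driver that can fail over to another router would soften this). The 200 app servers and 5% tolerance are hypothetical inputs.

```python
import math


def app_servers_affected(app_servers: int, routers: int) -> int:
    # worst case when app servers are spread as evenly as possible over routers
    return math.ceil(app_servers / routers)


def min_routers(app_servers: int, tolerable_pct: int) -> int:
    """Fewest routers such that losing any one takes out at most tolerable_pct% of app servers."""
    max_affected = app_servers * tolerable_pct // 100
    if max_affected < 1:
        raise ValueError("tolerance is below one app server")
    return math.ceil(app_servers / max_affected)


print(min_routers(200, 5))            # 20
print(app_servers_affected(200, 20))  # 10
```

So tolerating 5% loss still means only 20 routers for 200 app servers, far fewer than one mongos per app server.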
  • #85: So it’s great to have aspects of your architecture that you can change over time. But some things you can’t...
  • #86: This is a fundamental design decision that will have huge implications for a long time, so think about it carefully.
  • #88: Say you have 4 shards. Let’s say each of the World Cup teams has an app, and we shard by app_id.
  • #89: Let’s distribute them evenly, as is likely to be the case.
  • #106: Now, tomorrow the US and Germany are going to play each other
  • #107: So those 2 apps are going to get heavy use, but they happen to be on the same shard, so uh-oh...
  • #109: Now this shard isn’t happy: higher load, more lock contention, and slower response times for queries to this shard (which are your most common queries, due to these apps’ popularity at this time)
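A toy version of the World Cup example: 32 hypothetical team apps, sharded by app_id so each app’s data lives on exactly one shard, with contiguous blocks of 8 apps per shard. Request counts are made up; the point is that when two apps on the same shard get hot, that one shard absorbs nearly all of the extra load.

```python
from collections import defaultdict
from typing import Dict, List

SHARDS = 4
# hypothetical app_ids: 32 teams, with tomorrow's two opponents adjacent in key order
teams = ["germany", "usa"] + [f"team{i:02d}" for i in range(30)]
APPS_PER_SHARD = len(teams) // SHARDS  # 8


def shard_for_app(app_id: str) -> int:
    # sharding on app_id keeps every document for an app on one shard;
    # here contiguous blocks of 8 apps share a shard
    return teams.index(app_id) // APPS_PER_SHARD


def load_by_shard(traffic: Dict[str, int]) -> List[int]:
    load: Dict[int, int] = defaultdict(int)
    for app_id, requests in traffic.items():
        load[shard_for_app(app_id)] += requests
    return [load[s] for s in range(SHARDS)]


traffic = {t: 100 for t in teams}           # a quiet day: every app equally busy
traffic["germany"] = traffic["usa"] = 5000  # game day
print(load_by_shard(traffic))  # [10600, 800, 800, 800]
```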
  • #110: So let’s add another shard (scale horizontally)...
  • #112: That might help if we had more teams’ apps to add
  • #116: Those new apps had somewhere to go, which is nice. But this hasn’t helped our uneven access pattern at all. So what else can we do? We can try scaling that shard vertically by performing a migration procedure (see my blog post for details).
  • #117: And hopefully it now cools off
  • #118: But the next day there will be a different game... will those two teams’ apps be on different shards? Even if so, maybe now we have 2 pretty-hot shards instead of 1 super-hot one, so maybe you decide to just live with heterogeneous shard servers to manage (probably a much lesser evil than trying to re-shard)
  • #120: We could shard on something other than app_id (for us, maybe that’d be crash_id, which is a randomly-generated hash) and spread the data for each app across all shards
  • #137: So now when the US vs Germany game happens tomorrow...
  • #139: Now we’re reading a bit from many shards, rather than a lot from a few shards, but our queries will be a bit slower (due to having to read from many more shards), so understand the trade-off. All of this assumes that your cluster is balanced...
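The alternative from #120 and #139, sketched: shard on a hash-like crash_id instead of app_id, so one hot app’s crashes spread across every shard. This is a simplification: the hash is taken modulo the shard count, whereas MongoDB’s hashed sharding maps hash values onto chunk ranges, and `make_crash_id` is a hypothetical deterministic stand-in for a randomly generated id. The second half shows the cost: a query for one app’s crashes now has to visit all shards.

```python
import hashlib
from collections import Counter

SHARDS = 4


def make_crash_id(app_id: str, n: int) -> str:
    # stand-in for a randomly generated hash; deterministic so the sketch is repeatable
    return hashlib.sha1(f"{app_id}:{n}".encode()).hexdigest()


def shard_for_crash(crash_id: str) -> int:
    # simplified hashed placement: hash the shard key value, then take it modulo the shard count
    digest = hashlib.sha256(crash_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % SHARDS


usa_crashes = [make_crash_id("usa", n) for n in range(10_000)]
per_shard = Counter(shard_for_crash(c) for c in usa_crashes)
print(sorted(per_shard.items()))  # roughly 2,500 per shard

# the trade-off: "all of usa's crashes" now means asking every shard
shards_to_query = {shard_for_crash(c) for c in usa_crashes}
print(len(shards_to_query))  # 4
```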
  • #140: The balancer is a super-important part of a sharded mongo cluster… You should love it.
  • #149: Start with an empty cluster, and start filling it with data (we’ll denote “fullness” by going from green to red). This is an example of what can happen when the balancer is not running.
  • #176: Okay, so now we have a very unbalanced cluster. 3 of our replica sets are very full, one is pretty full, and the newest one is hardly in use.
  • #177: The balancer will see the full shards and one near-empty one, and will want to move a ton of chunks all at once, causing severe I/O strain on the system.
  • #184: (another version)
  • #186: you’re going to be adding a lot of I/O to the system when you move chunks, and it still has to be able to perform its normal functions, so over-provision
  • #187: updating the configdb (when you move chunks) puts load on your config servers, so make sure they’re ready to handle it
  • #188: this is tedious and will take a LONG time (more detail in a minute)
  • #189: gradually you’ll get to a happier place
  • #190: take a deep breath before you...
  • #191: be ready to turn it off and return to step 3 if needed, then try again
  • #192: See MongoDB webinar I gave (in references) for details on this procedure
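The "move chunks gradually" idea from #177 and #186-191, as a planning sketch rather than the actual procedure from the webinar. It assumes you only have per-shard chunk counts, and it plans small rounds of one-chunk moves from the fullest shard to the emptiest one; each move would correspond to a chunk migration in a real cluster. The shard names, counts, round size, and threshold are all hypothetical.

```python
from typing import Dict, List, Tuple


def plan_gradual_moves(
    chunks: Dict[str, int], per_round: int, threshold: int = 2
) -> Tuple[List[List[Tuple[str, str]]], Dict[str, int]]:
    """Plan chunk migrations in small rounds instead of one big burst.

    Each round moves at most `per_round` chunks, one at a time, from the
    currently fullest shard to the currently emptiest one, and planning
    stops once they differ by `threshold` chunks or fewer.
    """
    if threshold < 1:
        raise ValueError("threshold must be at least 1 or the plan never settles")
    counts = dict(chunks)
    rounds: List[List[Tuple[str, str]]] = []
    while True:
        moves: List[Tuple[str, str]] = []
        for _ in range(per_round):
            src = max(counts, key=counts.get)
            dst = min(counts, key=counts.get)
            if counts[src] - counts[dst] <= threshold:
                break
            counts[src] -= 1
            counts[dst] += 1
            moves.append((src, dst))
        if not moves:
            return rounds, counts
        # in practice: run this round, then check disk I/O and config server
        # load before starting the next one (and stop if things look bad)
        rounds.append(moves)


rounds, final = plan_gradual_moves(
    {"rs0": 40, "rs1": 40, "rs2": 40, "rs3": 30, "rs4": 2}, per_round=5
)
print(len(rounds), final)
```

Keeping `per_round` small is the whole point: it trades total elapsed time for never hitting the cluster with the all-at-once burst the balancer would attempt on its own.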
  • #193: seems obvious, but not always the case
  • #194: best-case scenario is to make all of the right choices up front… but you’re probably not going to do that (though hopefully you can learn a bit from our experience and minimize the wrong choices you make). The good news is MongoDB is still working for us, despite the headaches we’ve had to deal with.