#MongoDBDays Chicago

Introduction To Sharding
J. Randall Hunt

Hackathoner, MongoDB

@jrhunt, randall@mongodb.com
In Today's Talk

• What? Why? When?
• How?
• What's happening behind the scenes?
What Is Sharding?
This is a picture of my cat.
This is a picture of ~100 cats.

http://a1.s6img.com/cdn/0011/p/3123272_8220815_lz.jpg
This is a cat trying to find a home

webserver

mongod
100 cats trying to find a home.

webserver

(not to scale)

mongod
Scale Up?
Sharding in MongoDB Days 2013
Data Store Scalability

• Custom Hardware
• Custom Software

In the past you've had two options for achieving data store scalability:
1) custom hardware (Oracle?)
2) custom software (Google, Facebook)

These solutions were custom because the problems were not yet common enough. The number of people on the internet 10 years ago was incredibly small compared to the number of people who will be using web services 10 years from now.
Scale Out?
The MongoDB Sharding Solution
• Automatically partition your data
• Worry about failover at the partition layer
• Application independent
• Free and open source
Why Do I Shard?
Input/Output

Your input/output exceeds the capacity of a single node or replica set.

This is not easy to do!
Working Set Exceeds Physical Memory

RAM

• Data
• Indexes
• Sorts
• Aggregations
How Does Sharding Work?
MongoDB's Sharding Infrastructure
app server

mongod
MongoDB's Sharding Infrastructure
app server

mongod
mongod
mongod
MongoDB's Sharding Infrastructure
app server

shard
MongoDB's Sharding Infrastructure
app server

mongos

shard
MongoDB's Sharding Infrastructure
app server

mongos

mongod --configsvr

shard
Terminology
• Shards
• Chunks
• Config Servers
• mongos

A shard is a server, or a group of servers, that holds a subset of a collection's data. That data is divided into chunks according to a shard key.
A chunk is a group of documents whose shard key values fall in a particular range. Chunks can be moved logically from server to server.
Config servers hold information about where chunks live.
mongos is the router and balancer: it communicates with the config servers and figures out how to direct your query intelligently.
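
To make the terminology concrete, here is a minimal sketch of how a router could use chunk metadata to find the shard for a key. The `chunks` table, the numeric keys, and the `findShardForKey` helper are all hypothetical illustrations, not MongoDB's internal data structures.

```javascript
// Hypothetical chunk metadata, similar in spirit to what config servers track.
// Each chunk covers the half-open range [min, max) of shard key values.
const chunks = [
  { min: -Infinity, max: 100, shard: "shard0000" },
  { min: 100, max: 200, shard: "shard0001" },
  { min: 200, max: Infinity, shard: "shard0000" },
];

// Return the name of the shard whose chunk contains the given key.
function findShardForKey(chunkTable, key) {
  const chunk = chunkTable.find((c) => key >= c.min && key < c.max);
  if (!chunk) {
    throw new Error(`no chunk covers key ${key}`);
  }
  return chunk.shard;
}

console.log(findShardForKey(chunks, 42));  // shard0000
console.log(findShardForKey(chunks, 150)); // shard0001
```

Note that one shard can hold several chunks; here `shard0000` owns both the lowest and the highest range.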
What exactly is a shard?
• A shard is a node of the cluster
• Can be a single mongod or an entire replica set

Shard: Mongod
or
Shard: Primary, Secondary, Secondary

Now what do shards hold? Chunks, which are partitions of your data that live in certain ranges.
Partitioning
• User defines a shard key or uses hash-based sharding
• Shard key defines a range of data
• The key space is like points on a line
• A range is a segment of that line

-∞ <--- Key Space ---> +∞

Remember interval notation?
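
The "segments of a line" picture can be sketched in a few lines of code. This assumes numeric keys and half-open intervals `[min, max)`; the function names are made up for illustration.

```javascript
// Build half-open ranges [min, max) from a sorted list of split points.
function rangesFromSplitPoints(splitPoints) {
  const bounds = [-Infinity, ...splitPoints, Infinity];
  const ranges = [];
  for (let i = 0; i < bounds.length - 1; i++) {
    ranges.push({ min: bounds[i], max: bounds[i + 1] });
  }
  return ranges;
}

// True when the ranges cover the whole key space with no gaps or overlaps.
function coversKeySpace(ranges) {
  if (ranges.length === 0) return false;
  if (ranges[0].min !== -Infinity) return false;
  if (ranges[ranges.length - 1].max !== Infinity) return false;
  return ranges.every(
    (r, i) => r.min < r.max && (i === 0 || ranges[i - 1].max === r.min)
  );
}

const keyRanges = rangesFromSplitPoints([10, 20]);
console.log(keyRanges.length);          // 3
console.log(coversKeySpace(keyRanges)); // true
```

Because each range includes its lower bound and excludes its upper bound, every key belongs to exactly one range.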
Data Distribution
• Initially a single chunk
• Default max chunk size: 64 MB
• MongoDB will automatically split and migrate chunks as they reach the max size

[Diagram: three mongos routers, a config server, Shard 1 (mongod), and Shard 2]
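
As a rough illustration of splitting, the sketch below divides an oversized chunk at its median key. It is a simplified model under stated assumptions (numeric keys, a made-up `{ min, max, docs }` chunk shape); the server's real split-point selection is more involved.

```javascript
const MAX_CHUNK_BYTES = 64 * 1024 * 1024; // the 64 MB default from the slide

// Split a chunk at the median key if its size exceeds the maximum.
// A chunk is { min, max, docs }, where docs is [{ key, bytes }, ...].
function splitIfTooLarge(chunk, maxBytes = MAX_CHUNK_BYTES) {
  const size = chunk.docs.reduce((sum, d) => sum + d.bytes, 0);
  if (size <= maxBytes || chunk.docs.length < 2) return [chunk];

  const sorted = [...chunk.docs].sort((a, b) => a.key - b.key);
  const splitKey = sorted[Math.floor(sorted.length / 2)].key;
  // If the median equals the smallest key, the left half would be empty,
  // so this simple model leaves the chunk unsplit.
  if (splitKey === sorted[0].key) return [chunk];

  return [
    { min: chunk.min, max: splitKey, docs: sorted.filter((d) => d.key < splitKey) },
    { min: splitKey, max: chunk.max, docs: sorted.filter((d) => d.key >= splitKey) },
  ];
}

// Four hypothetical 20 MB documents: 80 MB total, over the limit.
const big = {
  min: -Infinity,
  max: Infinity,
  docs: [1, 2, 3, 4].map((key) => ({ key, bytes: 20 * 1024 * 1024 })),
};
const parts = splitIfTooLarge(big);
console.log(parts.map((p) => [p.min, p.max, p.docs.length]));
```

A chunk whose documents all share one key value cannot be divided this way, which is one reason key cardinality matters later in the talk.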
Shards and Shard Keys
Chunks!

Shard Keys!
What is a config server?
• A config server stores shard metadata
• It stores chunk ranges and locations
• Run with 3 in production!

Config Server
or
Config Server, Config Server, Config Server

This is not a replica set; the three servers are purely for failover purposes.

Pro tip: use CNAMEs to identify these.
What is a mongos?
• Acts as a router / balancer for queries and ops
• No local data (persists all info to the config servers)
• Can run with just one or many

[Diagram: app servers connecting through one mongos, or through several]
MongoDB's Sharding Infrastructure
[Diagram: three app servers, three mongos routers, three shards, and three config servers]
Get Started With Sharding?
1. Choose a shard key (we'll talk about this later)
2. Start config servers
3. Turn on sharding
4. Profit.
Mechanics of Sharding
Oh hey there devops!
Start the Configuration Server

Config Server

mongod --configsvr
Starts a configuration server on the default port (27019)
Start the mongos router

Mongos

Config Server

mongos --configdb catconf.mongodb.com:27019
Start the mongod
Mongos

Config Server

Shard (mongod)

mongod --shardsvr
Starts a mongod with the default shard port (27018)
Shard is not yet connected to the rest of the cluster
Could have already been a part of the cluster
Add the Shard
Mongos

Config Server

Shard (mongod)

On mongos:
sh.addShard('cat1.mongodb.com:27018')
For a replica set:
sh.addShard('<rsname>/<seedlist>')
Check that everything is working!
Mongos

Config Server

Shard (mongod)

[mongos] admin> db.runCommand({ listShards: 1 })
{
  "shards": [
    {
      "_id": "shard0000",
      "host": "cat1.mongodb.com:27018"
    }
  ],
  "ok": 1
}
Now enable sharding
• Enable sharding on a database:

sh.enableSharding("<dbname>")

• Shard a collection (with a key):

sh.shardCollection("<dbname>.cats", {"name": 1})

• Use a compound shard key to prevent duplicates:

sh.shardCollection("<dbname>.cats", {"name": 1, "uniqueid": 1})
Tag Aware Sharding
• Total control over the distribution of your data!
• Tag a range of shard keys:

sh.addTagRange(<collection>,<min>,<max>,<tag>)

• Tag a shard:

sh.addShardTag("shard0000","NYC")
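
The effect of tagging can be modeled as a lookup: a tagged key range may only live on shards that carry the same tag. The sketch below is a plain JavaScript model, not shell code; the second shard, the "SFO" tag, and the zip-code-like numeric key are assumptions made for the example.

```javascript
// Hypothetical tag assignments, mirroring sh.addShardTag and sh.addTagRange.
const shardTags = {
  shard0000: ["NYC"],
  shard0001: ["SFO"],
};
const tagRanges = [
  { min: 10000, max: 20000, tag: "NYC" }, // half-open [min, max)
  { min: 90000, max: 100000, tag: "SFO" },
];

// Return the shards allowed to hold a chunk containing the given key.
// In this model, keys outside every tagged range may live on any shard.
function eligibleShards(key) {
  const range = tagRanges.find((r) => key >= r.min && key < r.max);
  const all = Object.keys(shardTags);
  if (!range) return all;
  return all.filter((s) => shardTags[s].includes(range.tag));
}

console.log(eligibleShards(10001)); // [ 'shard0000' ]
console.log(eligibleShards(94110)); // [ 'shard0001' ]
console.log(eligibleShards(50000)); // [ 'shard0000', 'shard0001' ]
```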

The Balancer

• Ensures even distribution of chunks across the cluster
• Transparent to driver and application
• Very tunable, but defaults are often sensible

Try to minimize clock skew with ntpd.
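
A toy version of the balancing idea: keep moving one chunk from the fullest shard to the emptiest until the difference in chunk counts drops below a threshold. The `balance` function and its default threshold of 2 are assumptions for illustration; the real balancer's thresholds and scheduling differ.

```javascript
// chunkCounts maps shard name -> number of chunks.
// Moves one chunk at a time from the fullest shard to the emptiest one
// until the spread falls below `threshold`. Returns the final counts
// and the list of migrations performed.
function balance(chunkCounts, threshold = 2) {
  if (threshold < 2) {
    // A threshold of 1 could bounce a single chunk back and forth forever.
    throw new RangeError("threshold must be at least 2");
  }
  const counts = { ...chunkCounts };
  const migrations = [];
  const names = Object.keys(counts);
  if (names.length < 2) return { counts, migrations };

  for (;;) {
    const sorted = [...names].sort((a, b) => counts[a] - counts[b]);
    const to = sorted[0];
    const from = sorted[sorted.length - 1];
    if (counts[from] - counts[to] < threshold) break;
    counts[from]--;
    counts[to]++;
    migrations.push({ from, to });
  }
  return { counts, migrations };
}

const result = balance({ shard0000: 8, shard0001: 0 });
console.log(result.counts);            // { shard0000: 4, shard0001: 4 }
console.log(result.migrations.length); // 4
```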
Routing Requests
(Oh hi there application developers!)
Cluster Request Routing

Scatter Gather

Targeted

Choose your own adventure!
Targeted Query

1. Routable request received
2. Request routed to appropriate shard
3. Shard returns results
4. mongos returns results to client

[Diagram: mongos in front of three shards, with the numbered steps touching a single shard]
Non-targeted queries

1. Request received
2. Request farmed out to all shards
3. Shards return results to mongos
4. mongos returns results to client

[Diagram: mongos in front of three shards, with steps 2 and 3 touching every shard]
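
The difference between the two routing paths comes down to whether the query contains the shard key. Here is a minimal sketch, assuming a string shard key on `name` and a made-up three-chunk routing table; only equality matches are treated as targetable in this model.

```javascript
const shardKeyField = "name"; // shard key from the sh.shardCollection example

// Hypothetical chunk ranges over string keys. max: null means "unbounded".
const routingTable = [
  { min: "", max: "h", shard: "shard0000" },
  { min: "h", max: "p", shard: "shard0001" },
  { min: "p", max: null, shard: "shard0002" },
];

// Decide which shards a query must be sent to.
function shardsForQuery(query) {
  const value = query[shardKeyField];
  if (typeof value !== "string") {
    // No usable shard key in the query: scatter-gather to every shard.
    return [...new Set(routingTable.map((c) => c.shard))];
  }
  // Shard key present: target the one shard whose chunk holds the value.
  const chunk = routingTable.find(
    (c) => value >= c.min && (c.max === null || value < c.max)
  );
  return [chunk.shard];
}

console.log(shardsForQuery({ name: "felix" }));  // [ 'shard0000' ]
console.log(shardsForQuery({ color: "black" })); // [ 'shard0000', 'shard0001', 'shard0002' ]
```

This is why the talk later recommends choosing a shard key that appears in your common queries.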
Choosing A Shard Key
Things to remember!
• Shard key is immutable
• Shard key values are immutable
• Shard key must be indexed
• It is limited to 512 bytes in size
• Try to choose a field used in queries
• Only the shard key can be guaranteed unique across shards
• Should not be monotonically increasing!
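
To see why a monotonically increasing key is a problem, the sketch below counts where inserts land under range-based partitioning. The three chunk ranges and the key values are invented for the example.

```javascript
// Three hypothetical chunks over a numeric shard key, one per shard.
const ranges = [
  { min: -Infinity, max: 1000, shard: "shard0000" },
  { min: 1000, max: 2000, shard: "shard0001" },
  { min: 2000, max: Infinity, shard: "shard0002" },
];

// Count how many inserts land on each shard for a list of key values.
function writesPerShard(keys) {
  const counts = {};
  for (const r of ranges) counts[r.shard] = 0;
  for (const key of keys) {
    const r = ranges.find((c) => key >= c.min && key < c.max);
    counts[r.shard]++;
  }
  return counts;
}

// Monotonically increasing keys, such as counters or timestamps.
const increasing = Array.from({ length: 300 }, (_, i) => 2000 + i);
// Keys spread evenly over the key space.
const spread = Array.from({ length: 300 }, (_, i) => (i * 10) % 3000);

console.log(writesPerShard(increasing)); // { shard0000: 0, shard0001: 0, shard0002: 300 }
console.log(writesPerShard(spread));     // { shard0000: 100, shard0001: 100, shard0002: 100 }
```

With increasing keys every new write falls into the last range, so one shard takes all the insert load no matter how many shards exist.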
How to choose your key?
• Cardinality
• Write Distribution
• Query Isolation
• Reliability
• Index Locality

Cardinality: can your data be broken down enough?
Query isolation: query targeting to a specific shard
Reliability: shard outages

A good shard key can:

• Optimize routing
• Minimize (unnecessary) traffic
• Allow best scaling

Consider pre-splitting.
No unique index keys unless they are part of the shard key.

Geo keys cannot be part of a shard key.
$near won't work, but the $geo commands work fine.
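
The cardinality question can be checked with a quick count. Since a chunk cannot be split within a single shard key value, the number of distinct values is an upper bound on the number of chunks. The `cats` documents and the `maxChunks` helper below are hypothetical.

```javascript
// Count distinct values of a candidate shard key field.
// This is an upper bound on how many chunks the data can be split into.
function maxChunks(docs, shardKeyField) {
  return new Set(docs.map((d) => d[shardKeyField])).size;
}

const cats = [
  { name: "Felix", color: "black" },
  { name: "Milo", color: "black" },
  { name: "Tiger", color: "orange" },
  { name: "Luna", color: "black" },
];

console.log(maxChunks(cats, "color")); // 2
console.log(maxChunks(cats, "name"));  // 4
```

Here `color` could never spread the data over more than two chunks, while `name` leaves room to keep splitting as the collection grows.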
Thanks!
• What's Next?
• Resources:

https://education.mongodb.com/

https://www.mongodb.com/presentations

• Me: @jrhunt, randall@mongodb.com

In summary, and this is not a sales pitch: lots of other databases out there have sharding and replication, but not many of them provide the granularity of control that you need for your applications while maintaining sensible defaults.
