Back to Basics Webinar, Part 6 (rev.)
Solution Architect, MongoDB
Sam Weaver
#MongoDBBasics
‘Build an Application’ Webinar Series
Deploying your application
in production
Agenda
• Replica Sets Lifecycle
• Developing with Replica Sets
• Scaling your database
Q&A
• Virtual Genius Bar
– Use chat to post questions
– EMEA Solution Architecture / Support team are on hand
– Make use of them during the sessions!
Recap
• Introduction to MongoDB
• Schema design
• Interacting with the database
• Indexing
• Analytics
– Map Reduce
– Aggregation Framework
Deployment Considerations
Working Set Exceeds Physical
Memory
Why Replication?
• How many have faced node failures?
• How many have been woken up from sleep to do a
fail-over(s)?
• How many have experienced issues due to network
latency?
• Different uses for data
– Normal processing
– Simple analytics
Replica Set Lifecycle
Replica Set – Creation
Replica Set – Initialize
Replica Set – Failure
Replica Set – Failover
Replica Set – Recovery
Replica Set – Recovered
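A rough sketch of the failover rule behind the slides above: a new primary can be elected only while a majority of the set's voting members can still reach each other. The name `canElectPrimary` and its parameters are illustrative assumptions, not MongoDB APIs.

```javascript
// Hypothetical helper: can the reachable members elect a primary?
// An election needs a strict majority of all voting members in the set.
function canElectPrimary(votingMembers, reachableMembers) {
  const majority = Math.floor(votingMembers / 2) + 1;
  return reachableMembers >= majority;
}

console.log(canElectPrimary(3, 2)); // one of three nodes down: true
console.log(canElectPrimary(3, 1)); // two of three nodes down: false
```

This is why an even number of members adds no extra fault tolerance: a 4-node set still needs 3 reachable members.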
Developing with
Replica Sets
Strong Consistency
Delayed Consistency
Write Concern
• Network acknowledgement
• Wait for error
• Wait for journal sync
• Wait for replication
Unacknowledged
MongoDB Acknowledged (wait for
error)
Wait for Journal Sync
Wait for Replication
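The four levels above map onto write concern documents roughly as follows. This is a minimal sketch: the labels (`unacknowledged`, `journaled`, ...) and the helper `requiredAcks` are illustrative assumptions, and the helper treats every member as a voting, data-bearing node.

```javascript
// Write concern documents matching the four levels on the slides.
// The keys of this object are illustrative labels, not MongoDB names.
const writeConcerns = {
  unacknowledged: { w: 0 },      // network acknowledgement only
  acknowledged: { w: 1 },        // wait for error (primary confirms)
  journaled: { w: 1, j: true },  // wait for journal sync on the primary
  replicated: { w: "majority" }, // wait for replication to a majority
};

// Hypothetical helper: number of members that must confirm the write.
function requiredAcks(writeConcern, memberCount) {
  if (writeConcern.w === "majority") {
    return Math.floor(memberCount / 2) + 1;
  }
  if (typeof writeConcern.w !== "number") {
    throw new Error("tag-based modes are not handled in this sketch");
  }
  return writeConcern.w;
}

console.log(requiredAcks(writeConcerns.replicated, 3)); // 2
```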
Tagging
• Control where data is written to, and read from
• Each member can have one or more tags
– tags: {dc: "ny"}
– tags: {dc: "ny", subnet: "192.168", rack: "row3rk7"}
• Replica set defines rules for write concerns
• Rules can change without changing app code
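Tagging Example
// Replica set configuration: five members, each tagged with its data center
{
  _id : "mySet",
  members : [
    {_id : 0, host : "A", tags : {"dc": "ny"}},
    {_id : 1, host : "B", tags : {"dc": "ny"}},
    {_id : 2, host : "C", tags : {"dc": "sf"}},
    {_id : 3, host : "D", tags : {"dc": "sf"}},
    {_id : 4, host : "E", tags : {"dc": "cloud"}}
  ],
  settings : {
    // Custom write concern modes: the number is how many distinct
    // "dc" tag values must have acknowledged the write
    getLastErrorModes : {
      allDCs : {"dc" : 3},
      someDCs : {"dc" : 2}
    }
  }
}
// Insert, then wait until the write satisfies the "someDCs" mode
> db.blogs.insert({...})
> db.runCommand({getLastError : 1, w : "someDCs"})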
Wait for Replication (Tagging)
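To make the tag rules concrete, here is a small simulation of how a mode such as `someDCs : {"dc" : 2}` is evaluated: the write must reach members covering at least that many distinct values of the tag. The helper `modeSatisfied` and the `ackedHosts` parameter are assumptions for illustration; the real check happens inside the server.

```javascript
// Member tags copied from the replica set configuration above.
const members = [
  { host: "A", tags: { dc: "ny" } },
  { host: "B", tags: { dc: "ny" } },
  { host: "C", tags: { dc: "sf" } },
  { host: "D", tags: { dc: "sf" } },
  { host: "E", tags: { dc: "cloud" } },
];

const getLastErrorModes = {
  allDCs: { dc: 3 },
  someDCs: { dc: 2 },
};

// Hypothetical helper: is the mode satisfied once `ackedHosts` have the write?
// For each tag in the mode, the acknowledging members must cover at least
// that many distinct values of the tag.
function modeSatisfied(mode, memberList, ackedHosts) {
  return Object.entries(mode).every(([tag, needed]) => {
    const values = new Set(
      memberList
        .filter((m) => ackedHosts.includes(m.host) && tag in m.tags)
        .map((m) => m.tags[tag])
    );
    return values.size >= needed;
  });
}

// A and B are both in "ny", so only one distinct dc has the write.
console.log(modeSatisfied(getLastErrorModes.someDCs, members, ["A", "B"])); // false
// A ("ny") and C ("sf") cover two distinct data centers.
console.log(modeSatisfied(getLastErrorModes.someDCs, members, ["A", "C"])); // true
```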
Read Preference Modes
• 5 modes
– primary (only) - Default
– primaryPreferred
– secondary
– secondaryPreferred
– nearest
When more than one node is possible, closest node is used
for reads (all modes but primary)
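The five modes can be sketched as a selection function over the set's members. The member list, the `pingMs` field and the helper names are hypothetical. Drivers typically choose among members within a latency window rather than strictly the single closest node, so this follows the slide's simplification.

```javascript
// Hypothetical member list; `pingMs` stands in for measured network distance.
const members = [
  { host: "A", role: "primary", pingMs: 40 },
  { host: "B", role: "secondary", pingMs: 5 },
  { host: "C", role: "secondary", pingMs: 20 },
];

// Pick the member with the lowest ping, or null if the list is empty.
function closest(candidates) {
  return candidates.reduce(
    (best, m) => (best === null || m.pingMs < best.pingMs ? m : best),
    null
  );
}

// Illustrative selection logic for the five read preference modes.
function selectMember(mode, memberList) {
  const primaries = memberList.filter((m) => m.role === "primary");
  const secondaries = memberList.filter((m) => m.role === "secondary");
  switch (mode) {
    case "primary":
      return closest(primaries);
    case "primaryPreferred":
      return closest(primaries) || closest(secondaries);
    case "secondary":
      return closest(secondaries);
    case "secondaryPreferred":
      return closest(secondaries) || closest(primaries);
    case "nearest":
      return closest(memberList);
    default:
      throw new Error("unknown read preference mode: " + mode);
  }
}

console.log(selectMember("secondaryPreferred", members).host); // "B"
console.log(selectMember("primaryPreferred", members).host);   // "A"
```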
Tagged Read Preference
• Custom read preferences
• Control where you read from by (node) tags
– E.g. { "disk": "ssd", "use": "reporting" }
• Use in conjunction with standard read
preferences
– Except primary
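Tagged reads narrow the eligible members before the read preference mode picks one. A minimal sketch, assuming hypothetical members and the tag names from the slide: tag sets are tried in order, and a member matches a tag set when it carries every tag in it.

```javascript
// Hypothetical secondaries with tags, using the tag names from the slide.
const secondaries = [
  { host: "B", tags: { disk: "ssd", use: "reporting" } },
  { host: "C", tags: { disk: "hdd", use: "reporting" } },
  { host: "D", tags: { disk: "ssd", use: "production" } },
];

// A member matches a tag set when it carries every tag in the set
// with the same value. An empty tag set matches any member.
function matchesTagSet(member, tagSet) {
  return Object.entries(tagSet).every(
    ([key, value]) => member.tags[key] === value
  );
}

// Try each tag set in order and return the members matching the first
// tag set that matches anything.
function filterByTagSets(memberList, tagSets) {
  for (const tagSet of tagSets) {
    const matched = memberList.filter((m) => matchesTagSet(m, tagSet));
    if (matched.length > 0) return matched;
  }
  return [];
}

const eligible = filterByTagSets(secondaries, [
  { disk: "ssd", use: "reporting" },
  {}, // fallback: any member
]);
console.log(eligible.map((m) => m.host)); // [ 'B' ]
```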
Our application
• SAFE writes acceptable for our use case
• Potential to use secondary reads for comments, but probably not needed
• Use tagged reads for analytics
Scaling
Working Set Exceeds Physical
Memory
When to consider Sharding?
• When a specific resource becomes a bottleneck on a machine or replica set
• RAM
• Disk IO
• Storage
• Concurrency
Vertical Scalability (Scale Up)
Horizontal Scalability (Scale Out)
Partitioning
• User defines shard key
• Shard key defines range of data
• Key space is like points on a line
• Range is a segment of that line
Initially 1 chunk
Default max chunk size: 64 MB
MongoDB automatically splits & migrates chunks
when max reached
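The partitioning idea above can be sketched in memory: start with one chunk covering the whole key space and split it at a median key once it grows past the maximum size. Numeric keys, the `{ key, bytes }` document layout and all function names are assumptions for illustration; migration between shards is not modelled.

```javascript
const MAX_CHUNK_BYTES = 64 * 1024 * 1024; // default from the slide

// Start with a single chunk covering the whole key space.
// Ranges are half-open: min <= key < max.
function initialChunk() {
  return { min: -Infinity, max: Infinity, docs: [] };
}

function chunkBytes(chunk) {
  return chunk.docs.reduce((sum, d) => sum + d.bytes, 0);
}

// Split an oversized chunk into two ranges at the median key.
// Returns the chunk unchanged when it is small enough.
function splitIfNeeded(chunk, maxBytes = MAX_CHUNK_BYTES) {
  if (chunkBytes(chunk) <= maxBytes) return [chunk];
  const sorted = [...chunk.docs].sort((a, b) => a.key - b.key);
  const splitKey = sorted[Math.floor(sorted.length / 2)].key;
  const left = sorted.filter((d) => d.key < splitKey);
  const right = sorted.filter((d) => d.key >= splitKey);
  // The median equals the smallest key; this sketch gives up here
  // rather than searching for another split point.
  if (left.length === 0) return [chunk];
  return [
    { min: chunk.min, max: splitKey, docs: left },
    { min: splitKey, max: chunk.max, docs: right },
  ];
}

// Four hypothetical 30 MB documents push the chunk past 64 MB.
const chunk = initialChunk();
for (let key = 0; key < 4; key++) {
  chunk.docs.push({ key, bytes: 30 * 1024 * 1024 });
}
const parts = splitIfNeeded(chunk);
console.log(parts.map((c) => [c.min, c.max]));
```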
Data Distribution
Architecture
What is a Shard?
• Shard is a node of the cluster
• Shard can be a single mongod or a replica set
Meta Data Storage
• Config Server
– Stores cluster chunk ranges and locations
– Can have only 1 or 3 (production must have 3)
– Not a replica set
Routing and Managing Data
• Mongos
– Acts as a router / balancer
– No local data (persists to config database)
– Can have 1 or many
Sharding infrastructure
Cluster Request Routing
• Targeted Queries
• Scatter Gather Queries
• Scatter Gather Queries with Sort
Cluster Request Routing: Targeted
Query
Routable request received
Request routed to appropriate
shard
Shard returns results
Mongos returns results to client
Cluster Request Routing: Non-Targeted
Query
Non-Targeted Request Received
Request sent to all shards
Shards return results to mongos
Mongos returns results to client
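The difference between targeted and non-targeted requests comes down to whether the query contains the shard key. A minimal sketch, assuming a hypothetical chunk map keyed on `author` and handling only exact string matches on the shard key:

```javascript
// Hypothetical chunk map as a mongos might cache it from the config servers.
// Ranges are half-open: min <= key < max. The shard key here is `author`.
const chunks = [
  { min: "", max: "h", shard: "shard0" },
  { min: "h", max: "p", shard: "shard1" },
  { min: "p", max: "\uffff", shard: "shard2" },
];

// Decide which shards must receive a query.
// With an exact shard key value the request is targeted; otherwise it is
// sent to every shard (scatter gather).
function routeQuery(query, shardKey, chunkMap) {
  const value = query[shardKey];
  if (typeof value === "string") {
    const chunk = chunkMap.find((c) => c.min <= value && value < c.max);
    if (chunk) return [chunk.shard];
  }
  return [...new Set(chunkMap.map((c) => c.shard))];
}

console.log(routeQuery({ author: "sam" }, "author", chunks));     // [ 'shard2' ]
console.log(routeQuery({ title: "Sharding" }, "author", chunks)); // all three shards
```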
Cluster Request Routing: Non-Targeted
Query with Sort
Non-Targeted request with sort
received
Request sent to all shards
Query and sort performed locally
Shards return results to mongos
Mongos merges sorted results
Mongos returns results to client
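The merge step can be sketched as a k-way merge over per-shard results that are already sorted. The data and the function name `mergeSorted` are hypothetical, and for brevity this version holds complete arrays in memory, whereas a router can pull batches from each shard incrementally.

```javascript
// Merge already-sorted result arrays from several shards into one sorted
// list, always taking the smallest head element next.
function mergeSorted(shardResults, compare) {
  const positions = shardResults.map(() => 0);
  const merged = [];
  for (;;) {
    let pick = -1;
    for (let i = 0; i < shardResults.length; i++) {
      if (positions[i] >= shardResults[i].length) continue; // shard exhausted
      if (
        pick === -1 ||
        compare(shardResults[i][positions[i]], shardResults[pick][positions[pick]]) < 0
      ) {
        pick = i;
      }
    }
    if (pick === -1) return merged; // every shard is exhausted
    merged.push(shardResults[pick][positions[pick]]);
    positions[pick] += 1;
  }
}

// Hypothetical per-shard results, each sorted by `date` on the shard itself.
const fromShards = [
  [{ date: 1 }, { date: 4 }, { date: 9 }],
  [{ date: 2 }, { date: 3 }],
  [{ date: 5 }],
];
const sortedDocs = mergeSorted(fromShards, (a, b) => a.date - b.date);
console.log(sortedDocs.map((d) => d.date)); // [ 1, 2, 3, 4, 5, 9 ]
```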
Shard Key
Shard Key
• Shard key is immutable
• Shard key values are immutable
• Shard key must be indexed
• Shard key limited to 512 bytes in size
• Shard key used to route queries
– Choose a field commonly used in queries
• Only shard key can be unique across shards
– `_id` field is only unique within individual shard
A suitable shard key for our app…
• Occurs in most queries
• Routes to each shard
• Is granular enough to not exceed 64MB chunks
• Any candidates?
– Author?
– Date?
– _id?
– Title?
– Author & Date?
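One way to compare the candidates for granularity is to group documents by each candidate key and look at the largest group: all documents sharing one shard key value live in the same chunk, so a group bigger than the chunk size cannot be split further. The sample posts, the `bytes` field and the helper name are assumptions for illustration.

```javascript
// Group documents by a candidate shard key and report the largest group.
function largestKeyGroupBytes(docs, keyFields) {
  const totals = new Map();
  for (const doc of docs) {
    const key = JSON.stringify(keyFields.map((f) => doc[f]));
    totals.set(key, (totals.get(key) || 0) + doc.bytes);
  }
  return Math.max(0, ...totals.values());
}

// Hypothetical blog posts; `bytes` is an assumed per-document size.
const posts = [
  { author: "sam", date: "2014-03-01", bytes: 40 },
  { author: "sam", date: "2014-03-02", bytes: 40 },
  { author: "ann", date: "2014-03-01", bytes: 30 },
];

console.log(largestKeyGroupBytes(posts, ["author"]));         // 80
console.log(largestKeyGroupBytes(posts, ["author", "date"])); // 40
```

In this toy data the compound key (author and date) yields smaller groups than author alone, which is the property the "granular enough" bullet asks for.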
Summary
Things to remember
• Size appropriately for your working set
• Shard when you need to, not before
• Pick a shard key wisely
Next Session – 17th April
• Backup and Disaster Recovery
• Backup and restore options
Thank you

Editor's Notes

  • #9: Indexes should be contained in working set.
  • #12: Basic explanation: 2 or more nodes form the set; quorum.
  • #13: Initialize -> election. Primary + data replication from primary to secondary.
  • #14: Primary down / network failure. Automatic election of new primary if majority exists.
  • #15: New primary elected. Replication established from new primary.
  • #16: Down node comes up. Rejoins set. Recovery and then secondary.
  • #18: Consistency. Write preferences. Read preferences.
  • #22: Not really fire and forget. This return arrow is to confirm that the network successfully transferred the packet(s) of data. This confirms that the TCP ACK response was received.
  • #25: Presenter should mention: default is w:1. w:majority is what most people should use for durability. Majority is a special token here signifying that more than half of the nodes in the set have acknowledged the write.
  • #27: Using 'someDCs' so that in the event of an outage, at least a majority of the DCs would receive the change. This favors availability over durability.
  • #28: Using 'allDCs' because we want to make certain all DCs have this piece of data. If any of the DCs are down, this would timeout. This favors durability over availability.
  • #33: Indexes should be contained in working set.
  • #35: From mainframes, to RAC Oracle servers... People solved problems by adding more resources to a single machine.
  • #36: Large scale operation can be combined with high performance on commodity hardware through horizontal scaling. Build: document-oriented database maps perfectly to object-oriented languages. Scale: MongoDB presents a clear path to scalability that isn't ops intensive, and provides the same interface for a sharded cluster as for a single instance.
  • #60: The mongos does not have to load the whole set into memory since each shard sorts locally. The mongos can just getMore from the shards as needed and incrementally return the results to the client.
  • #63: _id could be unique across shards if used as the shard key. We can only guarantee uniqueness of (any) attributes if they are used as the shard key with the unique option set to true.