#MongoBoston




Strategies for Backing Up MongoDB
Jeff Yemin
Engineering Manager, 10gen
File and Directory Layout
• A set of files per database
Insert with write concern of {fsync: true}
Archive the data directory
Restore the data directory
Start mongod on restored data directory
Everything is fine, right?
• No, it's not
• But you can't tell until you look
Try validating the collection
• In the shell, run the validate command
How can we get a clean backup?
• kill mongod
• fsyncLock / fsyncUnlock
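
A minimal Python sketch of the fsyncLock / fsyncUnlock pattern, assuming a caller-supplied `run_admin_command` function (a hypothetical name) that sends one command document to the admin database, for example PyMongo's `client.admin.command`. The command documents `{"fsync": 1, "lock": True}` and `{"fsyncUnlock": 1}` follow the current MongoDB manual; older servers may need a different unlock call. The point of the sketch is only that the unlock runs even if the archive step fails.

```python
from contextlib import contextmanager


@contextmanager
def fsync_locked(run_admin_command):
    """Flush and block writes for the duration of the with-block.

    run_admin_command is a hypothetical callable that sends one command
    document to the admin database (e.g. PyMongo's client.admin.command).
    """
    run_admin_command({"fsync": 1, "lock": True})   # flush to disk, block writes
    try:
        yield
    finally:
        # Always release the lock, even if the archive step raised.
        run_admin_command({"fsyncUnlock": 1})


if __name__ == "__main__":
    sent = []   # stand-in for a real connection: just record what is sent
    with fsync_locked(sent.append):
        sent.append("archive the data directory here")
    print(sent)
```

With a real connection, the body of the `with` block is where the data directory would be archived.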
How can we get a clean backup?
• mongodump
mongodump
• Snapshot of each collection
   – Does NOT represent a point in time, even for a single
     collection
• Can NOT be combined with fsyncLock
   – Remember, you can't read…

• You CAN dump directly from data files to get a
 point in time backup
   – mongodump --dbpath

• Can be costlier than archiving at the FS level
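
To see why a collection dump taken while writes continue is not a point-in-time copy, here is a toy Python simulation. The in-memory list standing in for a collection and the `dump_while_writing` helper are assumptions made for illustration; this is not how mongodump is implemented.

```python
def dump_while_writing(collection, writes):
    """Copy documents one at a time while writes land in between reads.

    collection: list of dicts standing in for a collection (toy model).
    writes: maps a read position to a function that mutates the
            collection just before that position is read.
    """
    dumped = []
    for position in range(len(collection)):
        if position in writes:
            writes[position](collection)       # a concurrent write lands here
        dumped.append(dict(collection[position]))
    return dumped


if __name__ == "__main__":
    accounts = [{"_id": 1, "balance": 100}, {"_id": 2, "balance": 100}]
    before = [dict(doc) for doc in accounts]

    def transfer(coll):
        # Move 50 from account 1 to account 2 after account 1 was already read.
        coll[0]["balance"] -= 50
        coll[1]["balance"] += 50

    dump = dump_while_writing(accounts, {1: transfer})
    print(dump)                # account 1 from before the transfer, account 2 from after
    print(dump == before)      # False: not the state before the transfer
    print(dump == accounts)    # False: not the state after it either
```

The dump holds a total balance of 250, a state the collection was never in.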
Snapshot Query
[Figure: B-tree with root 5, internal nodes 2 and 7, and leaves 1, 3, 4, 6, 8, 9]
How can we get a clean backup?
• journaling
Journaling
• Write-ahead log
• Guarantees a consistent view even after a hard
 crash
• Default behavior as of 2.0
• Journal stored in the journal folder under --dbpath
• --journalCommitInterval* (2ms - 300ms)
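
The write-ahead idea can be sketched with a toy Python model. The class and its fields are hypothetical and leave out the real mechanics (memory-mapped views, group commits, compression); the sketch only shows that replaying the journal over the last synced data files restores a consistent state after a crash.

```python
class ToyJournaledStore:
    """Toy write-ahead log: journal first, data files later."""

    def __init__(self):
        self.journal = []      # durable, append-only (stands in for the journal file)
        self.data_files = {}   # durable, but updated lazily
        self.memory = {}       # in-memory view, lost on a crash

    def write(self, key, value):
        self.journal.append((key, value))   # 1. record the change in the journal
        self.memory[key] = value            # 2. then update the in-memory view

    def flush(self):
        """Sync memory to the data files; journal entries are no longer needed."""
        self.data_files = dict(self.memory)
        self.journal.clear()

    def crash_and_recover(self):
        """Drop memory, then replay the journal in order over the data files."""
        self.memory = dict(self.data_files)
        for key, value in self.journal:
            self.memory[key] = value


if __name__ == "__main__":
    store = ToyJournaledStore()
    store.write("a", 1)
    store.flush()
    store.write("b", 2)          # journaled, but not yet in the data files
    store.crash_and_recover()
    print(store.memory)          # {'a': 1, 'b': 2}
```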
Journaling implications for backup
• Logical Volume Manager (LVM)
• LVM snapshots to the rescue
   –   lvcreate --size 100M --snapshot --name mdb-snap01 /dev/vg0/mongodb


• No shutdown or fsyncLock needed
• True point in time backup for a single instance
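
For scripted backups, the lvcreate call might be wrapped as in the Python sketch below. The helper names and the `dry_run` switch are assumptions for illustration; the arguments mirror the command on the slide. The snapshot size caps how much the origin volume may change while the snapshot exists, so size it generously.

```python
import subprocess


def lvm_snapshot_command(volume, name, size="100M"):
    """Build the lvcreate argument list for a snapshot of `volume`."""
    return ["lvcreate", "--size", size, "--snapshot", "--name", name, volume]


def create_snapshot(volume, name, size="100M", dry_run=True):
    """Print the command by default; pass dry_run=False to run it (needs root and LVM)."""
    command = lvm_snapshot_command(volume, name, size)
    if dry_run:
        print(" ".join(command))
        return command
    subprocess.run(command, check=True)   # raises CalledProcessError on failure
    return command


if __name__ == "__main__":
    create_snapshot("/dev/vg0/mongodb", "mdb-snap01")
```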
Replica Sets
Backing up a replica set
• Back up a (hidden) secondary
  –   kill mongod
  –   fsyncLock
  –   mongodump
  –   LVM snapshot
Mongodump for replica sets
• True point in time
   – mongodump --oplog
   – mongorestore --oplogReplay

• Snapshot query of each collection, then replay
 the oplog at the end
   – Similar to how a new secondary does an initial sync
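
A toy Python simulation of "dump each collection, then replay the oplog". The data model (a dict of `_id` to document, and oplog entries as `(kind, _id, document)` tuples) is an assumption made for illustration and is much simpler than the real oplog. Because each entry is idempotent, replaying operations whose effects are already in the dump does no harm.

```python
def apply_op(docs, op):
    """Apply one idempotent oplog entry to a dict of _id -> document."""
    kind, doc_id, doc = op
    if kind == "set":
        docs[doc_id] = dict(doc)
    elif kind == "delete":
        docs.pop(doc_id, None)


def dump_with_oplog(live, writes_during_dump):
    """Copy `live` one document at a time, then replay the captured oplog.

    writes_during_dump: list of (read_position, op) pairs; each op is
    applied to `live` just before that position is read and is also
    recorded in the oplog, as a replica set member would do.
    """
    oplog, dumped = [], {}
    ids = sorted(live)                      # ids present when the dump starts
    for position, doc_id in enumerate(ids):
        for at, op in writes_during_dump:
            if at == position:
                apply_op(live, op)
                oplog.append(op)
        if doc_id in live:
            dumped[doc_id] = dict(live[doc_id])
    for op in oplog:                        # the replay step at the end
        apply_op(dumped, op)
    return dumped


if __name__ == "__main__":
    live = {1: {"balance": 100}, 2: {"balance": 100}}
    transfer = [(1, ("set", 1, {"balance": 50})),
                (1, ("set", 2, {"balance": 150}))]
    restored = dump_with_oplog(live, transfer)
    print(restored == live)   # True: matches the state when the dump finished
```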
Sharded clusters
[Figure: a sharded cluster. A mongos router, three config servers, and the balancer sit above four shards. Chunks 1 to 12 are on Shard 1, 13 to 24 on Shard 2, 25 to 36 on Shard 3, and 37 to 48 on Shard 4.]
Backing up a sharded cluster

• mongodump through mongos
  – (but no --oplog)

• mongorestore through mongos
Backup a Sharded Cluster
1. Stop the balancer, and wait until it is inactive (state: 0)
      db.settings.update( { _id: "balancer" },
                          { $set : { stopped: true } } , true )
2. Stop a config server
3. Back up the data
  –     Each shard
  –     Config server (mongodump --db config)
4. Restart the config server
5. Resume the balancer
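
The ordering of the procedure above can be sketched in Python. The `cluster` object and every method on it are hypothetical stand-ins for whatever scripts or driver calls perform each step; none of these names come from a MongoDB driver. The nested `try`/`finally` blocks restart the config server and resume the balancer even if a backup step fails.

```python
def backup_sharded_cluster(cluster, shards):
    """Run the backup steps in order, undoing the pauses even on failure.

    `cluster` is a hypothetical object exposing one method per step.
    """
    cluster.stop_balancer()                 # stop the balancer and wait until inactive
    try:
        cluster.stop_config_server()        # freeze cluster metadata
        try:
            for shard in shards:            # back up the data: each shard...
                cluster.backup_shard(shard)
            cluster.backup_config()         # ...and the config database
        finally:
            cluster.start_config_server()   # restart the config server
    finally:
        cluster.resume_balancer()           # resume the balancer


if __name__ == "__main__":
    class PrintingCluster:
        """Stand-in that only prints the name of each step it is asked to run."""
        def __getattr__(self, name):
            return lambda *args: print(name, *args)

    backup_sharded_cluster(PrintingCluster(), ["shard1", "shard2"])
```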
Thank You
Jeff Yemin
Engineering Manager, 10gen


Editor's Notes

  • #10: Do the fsyncLock/fsyncUnlock demo
  • #12: I need a picture for the first bullet
  • #15: Make the point that while you can turn journaling off, you shouldn't. Without journaling, the approach is quite straightforward: there is a one-to-one mapping of data files to memory, and when either the OS or an explicit fsync flushes, your data is safe on disk. With journaling we do some tricks. It is a write-ahead log, that is, we write the data to the journal before we update the data itself. Each file is mapped twice: once to a private view, which is marked copy-on-write, and once to the shared view (shared in the sense that the disk has access to this memory). Every time we do a write, we keep a list of the regions of memory that were written to. These writes are batched into group commits, compressed, and appended to a special journal file on disk. Once that data has been written to disk, a remapping phase copies the changes into the shared view, at which point those changes can be synced to disk. Once that data is synced to disk, it's safe (barring hardware failure). If there is a failure before the shared/storage view is written to disk, we simply apply all the changes made since the last sync to the data files, in order, and we get back to a consistent view of the data.
  • #16: LVM is the Logical Volume Manager, a program that abstracts disk images from physical devices and provides a number of raw disk manipulation and snapshot capabilities useful for system management. The lvcreate command creates an LVM snapshot (with the --snapshot option) named mdb-snap01 of the mongodb volume in the vg0 volume group. The snapshot is located at /dev/vg0/mdb-snap01. The location and paths to your system's volume groups and devices may vary slightly depending on your operating system's LVM configuration. The snapshot has a cap of 100 megabytes because of the parameter --size 100M. This size does not reflect the total amount of data on the disk, but rather the quantity of differences between the current state of /dev/vg0/mongodb and its state when the snapshot (i.e. /dev/vg0/mdb-snap01) was created. Make sure you size this big enough. EBS: If your deployment depends on Amazon's Elastic Block Storage (EBS) with RAID configured within your instance, it is impossible to get a consistent state across all disks using the platform's snapshot tool. As a result you may: 1. Flush all writes to disk and create a write lock to ensure consistent state during the backup process. If you choose this option, see Backup Without Journaling. 2. Configure LVM to run and hold your MongoDB data files on top of the RAID within your system. If you choose this option, perform the LVM backup operation described in Create Snapshot.
  • #18: If the secondary is hidden, then options are more varied. Killing and locking are valid options, so long as there is enough spare capacity in the system to catch up after the backup is complete
  • #21: Ok if you have enough space to store all the data on all the shards