Approaching 1 Billion
       Documents in MongoDB




                    David Mytton
1/30      david@boxedice.com / @davidmytton
Server Density Monitoring


       Processing       Database            UI




2/30
                    www.serverdensity.com
Cache / Data Store
                      Postback




       checksLatest              checksHistorical



3/30
db.stats()
       Documents                  937,393,315

       Collections                      27,566

       Indexes                          45,277

       Stored data                      638GB

       Inserts                    5000-8000/s


4/30
                 As of 17th Jun 2010.
13 months ago




5/30
       Why we moved: http://bit.ly/mysqltomongo
Initial Setup

                     Replication




       Master                       Slave
         DC1                         DC2
       8GB RAM                     8GB RAM

6/30
Vertical Scaling

                  Replication




        Master                   Slave
         DC1                      DC2
       72GB RAM                 8GB RAM

7/30
Tip #1

       Keep your indexes in
       memory at all times.

           db.stats()
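
A rough sketch of how the db.stats() output could be checked against available memory. It assumes the result exposes an `indexSize` field in bytes; the function name, the `ramBytes` argument and the 80% `headroom` figure are illustrative assumptions, not values from the talk.

```javascript
// Sketch: compare the index size reported by db.stats() with available RAM.
// `stats` mimics the shape of a db.stats() result; only `indexSize` (bytes) is read.
// `ramBytes` and `headroom` are hypothetical parameters, not figures from the slides.
function indexesFitInMemory(stats, ramBytes, headroom = 0.8) {
  const budgetBytes = ramBytes * headroom; // leave room for the OS and hot data
  return {
    indexBytes: stats.indexSize,
    budgetBytes: budgetBytes,
    fits: stats.indexSize <= budgetBytes,
  };
}

const GiB = 1024 ** 3;

// Made-up figures for illustration: 10 GiB of indexes on a 16 GiB machine.
const check = indexesFitInMemory({ indexSize: 10 * GiB }, 16 * GiB);
console.log(check.fits); // true
```

In a shell session the first argument would be the object returned by db.stats() itself.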


8/30
i/o not an issue




9/30
Tip #2
         Data is flushed to disk every 60s.


        db.runCommand({fsync:1});


             --syncdelay [60]
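
To put the 60s flush interval in perspective, here is a small worked calculation using the insert rate quoted earlier in the deck (5000-8000/s). It is an upper bound on the inserts accepted since the last flush, assuming nothing else (such as the replication slave) has already persisted them; the function name is hypothetical.

```javascript
// Sketch: upper bound on inserts accepted since the last flush to disk.
// syncDelaySeconds defaults to 60, the --syncdelay default shown on the slide.
function insertsSinceLastFlush(insertsPerSecond, syncDelaySeconds = 60) {
  return insertsPerSecond * syncDelaySeconds;
}

console.log(insertsSinceLastFlush(5000)); // 300000
console.log(insertsSinceLastFlush(8000)); // 480000
```

Lowering --syncdelay shrinks this window at the cost of more frequent flushes.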

10/30
Sharding solves
          everything




11/30
Manual Partitioning
                      Replication




           Master A                   Slave A
            DC1                        DC2
          16GB RAM                  16GB RAM


                      Replication




           Master B                   Slave B
            DC1                        DC2
          16GB RAM                  16GB RAM
12/30
Sustained Traffic



                  Master                      Slave
        Avg out:       2.4Mbit/s   Avg out:       4.0Mbit/s
        Avg in:        3.8Mbit/s   Avg in:      111.2Kbit/s

13/30
Database vs collections


        • Many databases = many data files (small but
          quickly get large).
        • Many collections = watch namespace limit.

14/30
Namespaces = Number of collections +
                number of indexes




15/30
Tip #3


        Monitor the 24,000
         namespace limit.


16/30
Using Server Density




17/30
Console

        db.system.namespaces.count()
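
A minimal sketch of the arithmetic behind the namespace tips: namespaces are counted as collections plus indexes and compared with the 24,000 limit. The function name and the 80% warning threshold are assumptions for illustration. The count returned by db.system.namespaces.count() may also include system namespaces, so treat the sum as an estimate.

```javascript
const NAMESPACE_LIMIT = 24000; // limit quoted in the slides

// Sketch: estimate namespace usage for one database.
// `warnAt` is a hypothetical alert threshold (fraction of the limit).
function namespaceUsage(collections, indexes, limit = NAMESPACE_LIMIT, warnAt = 0.8) {
  const used = collections + indexes;
  return {
    used: used,
    remaining: limit - used,
    percentUsed: (used / limit) * 100,
    warn: used / limit >= warnAt,
  };
}

// Made-up figures for a single database: 6000 collections, 9000 indexes.
const usage = namespaceUsage(6000, 9000);
console.log(usage.used);        // 15000
console.log(usage.remaining);   // 9000
console.log(usage.percentUsed); // 62.5
console.log(usage.warn);        // false
```

The totals from the db.stats() slide (27,566 collections + 45,277 indexes = 72,843) exceed 24,000, so they are presumably spread across several databases, since the limit applies per database.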




18/30
Replica Pairs = Failover
                        Replica Pair




             Master A                    Slave A
              DC1                         DC2
            16GB RAM                   16GB RAM


                        Replica Pair




             Master B                    Slave B
              DC1                         DC2
            16GB RAM                   16GB RAM
19/30
Tip #4


        Pre-provision your oplog files.



20/30
A shell script to generate 75GB oplog files




        for i in {0..40}
        do
            echo $i
            head -c 2146435072 /dev/zero > local.$i
        done
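
The sizes in the script can be checked with a little arithmetic. The sketch below only uses the numbers from the script itself; the helper names are hypothetical. Note that {0..40} runs 41 times, so the loop as written produces roughly 82 GiB, and a 75 GiB target would need 38 files of this size.

```javascript
const FILE_BYTES = 2146435072; // byte count passed to `head -c` in the script
const MiB = 1024 ** 2;
const GiB = 1024 ** 3;

// Total bytes written for a given number of files.
function totalBytes(fileCount) {
  return fileCount * FILE_BYTES;
}

// Smallest number of files of FILE_BYTES each that covers targetGiB.
function filesForTarget(targetGiB) {
  return Math.ceil((targetGiB * GiB) / FILE_BYTES);
}

console.log(FILE_BYTES / MiB);                   // 2047
console.log(totalBytes(41));                     // 88003837952
console.log((totalBytes(41) / GiB).toFixed(1));  // 82.0
console.log(filesForTarget(75));                 // 38
```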




21/30
Tip #5


        Expect slower performance
         during initial replica sync.


22/30
Tip #6


        You can rotate your log files
             from the console.


23/30
Rotating your log files

         db.runCommand("logRotate")




24/30
Tip #7

            Index creation blocks by
            default. Use background
              indexing if necessary.



25/30
        MongoDB Manual: http://bit.ly/mongobgindex
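
A hedged sketch of what requesting a background build looks like. It assumes the shell's ensureIndex(keys, options) helper of that era with the `background` option (newer shells call the helper createIndex). The wrapper name, the stand-in collection and the `serverId` field are hypothetical, there only so the example runs without a server.

```javascript
// Sketch: ask for a background index build instead of the blocking default.
// `collection` is any object exposing ensureIndex(keys, options).
function ensureBackgroundIndex(collection, keys) {
  return collection.ensureIndex(keys, { background: true });
}

// Stand-in collection that records calls (hypothetical, for illustration only).
function makeFakeCollection() {
  const calls = [];
  return {
    calls: calls,
    ensureIndex: function (keys, options) {
      calls.push({ keys: keys, options: options });
      return null;
    },
  };
}

const fake = makeFakeCollection();
ensureBackgroundIndex(fake, { serverId: 1 });
console.log(JSON.stringify(fake.calls[0]));
// {"keys":{"serverId":1},"options":{"background":true}}
```

Against a real database the first argument would be a collection object from the shell rather than the stand-in.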
Tip #8

         Increase your OS file
         descriptor limit + use
        persistent connections.


26/30
Too many open files!
                 /etc/security/limits.conf
         mongo hard nofile 10000
         mongo soft nofile 10000
          user                 type          limit




                  /etc/ssh/sshd_config

                     UsePAM yes
27/30
Space is not reused
        Data + indexes                  551GB

        Actual disk usage               638GB


                         Fixed in
        1.1.4 1.3.x 1.5.0 1.5.1 1.5.2 1.5.3 1.5.4?

28/30
                     JIRA: SERVER-366
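
The gap between the two figures on this slide works out as follows. This is a simplification that treats the whole difference as disk space not holding live data or indexes; the function name is hypothetical.

```javascript
// Sketch: how much of the disk footprint is not live data or indexes.
function unusedSpace(dataPlusIndexesGB, diskUsageGB) {
  const overheadGB = diskUsageGB - dataPlusIndexesGB;
  return {
    overheadGB: overheadGB,
    percentOfDisk: (overheadGB / diskUsageGB) * 100,
  };
}

const gap = unusedSpace(551, 638); // figures from the slide
console.log(gap.overheadGB);               // 87
console.log(gap.percentOfDisk.toFixed(1)); // 13.6
```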
Summary
        1. Keep indexes in memory.
        2. Data is flushed to disk every 60s.
        3. Monitor the 24k namespace limit.
        4. Pre-provision oplog files.
        5. Expect slower performance on replica sync.
        6. Rotate logs from the console.
        7. Index creation blocks by default.
        8. OS file descriptor limit + persistent connections.
29/30
Slides
          blog.boxedice.com/mongodb




                  David Mytton
30/30   david@boxedice.com / @davidmytton

MongoUK - Approaching 1 billion documents with MongoDB