June-21-2019
MongoDB HA, what can go wrong?
{"name": "Igor Donchovski",
"live_in": "Skopje",
"email": "donchovski@pythian.com",
"current_role": "Lead database consultant",
"education": [{"type": "College", "name": "FEIT", "graduated": "2008", "university": "UKIM"},
{"type": "Master", "name": "FINKI", "graduated": "2013", "university": "UKIM"}],
"work": [{"role": "Web developer", "start": "2007", "end": "2012", "company": "Gord Systems"},
{"role": "DBA", "start": "2012", "end": "2014", "company": "NOVP"},
{"role": "Database consultant", "start": "2014", "end": "2016", "company": "Pythian"},
{"role": "Lead database consultant", "start": "2016", "company": "Pythian"}],
"certificates": [{"name": "C100DBA", "year": "2016", "description": "MongoDB certified DBA"}],
"social": [{"network": "LinkedIn", "link": "www.linkedin.com/in/igorle"},
{"network": "Twitter", "link": "https://twitter.com/igorle", "handle": "@igorle"}],
"interests": ["Hiking", "Biking", "Traveling"],
"hobbies": ["Painting", "Photography", "Cooking"],
"proud_of": ["Volunteering", "Helping the Community"]}
About Me
© 2019 Pythian. Confidential
Overview
• What a replica set is and how replication works
• Replication concept
• Replica set features and deployment architectures
• Hidden nodes, Arbiter nodes, Priority 0 nodes
• Production failures
• Monitoring a replica set
• Q&A
Replication
Replica Set
• Group of mongod processes that maintain the same data set
• Redundancy and high availability
• Increased read capacity (scaling reads)
• Automatic failover
Members   Nodes required to elect new Primary   Fault tolerance
3         2                                     1
4         3                                     1
5         3                                     2
6         4                                     2
7         4                                     3
[Diagram: three-member replica set, each member with priority:1 votes:1]
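The table above follows from simple majority arithmetic. A minimal sketch, assuming every member is a voting member; the function names are illustrative and not part of MongoDB:

```javascript
// Votes needed to elect a Primary: a strict majority of the voting members.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// Voting members that can be lost while a Primary can still be elected.
function faultTolerance(votingMembers) {
  return votingMembers - majority(votingMembers);
}

// Print one row per replica set size, in the same order as the table.
for (const n of [3, 4, 5, 6, 7]) {
  console.log(n, majority(n), faultTolerance(n));
}
```

Going from 3 to 4 members (or from 5 to 6) raises the majority but not the fault tolerance, which is why an even member count adds cost without adding availability.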
Replication Concept
1. Write operations go to the Primary node
2. All changes are recorded in the operations log (oplog)
3. Asynchronous replication to the Secondaries
4. Secondaries copy the Primary's oplog
5. A Secondary can use another Secondary as its sync source*
*settings.chainingAllowed (true by default)
Replica Set Oplog
• Special capped collection that keeps a rolling record of all operations that modify the data stored in the databases
• Oplog operations are idempotent: applying an entry once or several times gives the same result
• Default oplog size (Unix and Windows systems):
Storage engine   Default oplog size      Lower bound   Upper bound
In-memory        5% of physical memory   50MB          50GB
WiredTiger       5% of free disk space   990MB         50GB
MMAPv1           5% of free disk space   990MB         50GB
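The default sizing rule for the disk-based storage engines can be written as a clamp. This is only a sketch of the rule in the table, not MongoDB's own code; the function name and the choice of 1GB = 1024MB are assumptions:

```javascript
const MB_PER_GB = 1024; // assumption: binary units

// Default oplog size in MB for WiredTiger/MMAPv1 per the table:
// 5% of free disk space, clamped to the range [990MB, 50GB].
function defaultOplogSizeMB(freeDiskMB) {
  const fivePercent = freeDiskMB / 20;
  return Math.min(Math.max(fivePercent, 990), 50 * MB_PER_GB);
}

console.log(defaultOplogSizeMB(10 * MB_PER_GB));   // small disk: lower bound applies
console.log(defaultOplogSizeMB(100 * MB_PER_GB));  // 5% of 100GB
console.log(defaultOplogSizeMB(2048 * MB_PER_GB)); // large disk: upper bound applies
```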
Configuration
Configuration Options
• Up to 50 members per replica set (at most 7 voting members)
• Arbiter node
• Priority 0 node
• Hidden node
• Delayed node
Arbiter Node
• Does not hold a copy of the data
• Votes in elections
[Diagram: Arbiter member, labeled hidden : true]
Priority 0 Node
Priority: floating point (i.e. decimal) number between 0 and 1000
• Cannot become Primary, cannot trigger an election
• Visible to the application (can serve reads; writes always go to the Primary)
• Votes in elections
[Diagram: Secondary with priority : 0]
Hidden Node
• Not visible to application
• Never becomes primary, but can vote in elections
• Use cases
○ Reporting
○ Backups
[Diagram: Secondary with hidden : true, priority : 0]
Delayed Node
• Must be a priority 0 member
• Should be a hidden member (not mandatory)
• Mainly used for backups (historical snapshot of data)
• Recovery in case of human error
[Diagram: Secondary with slaveDelay : 3600 (seconds, i.e. one hour), priority : 0, hidden : true]
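The member rules from the configuration slides can be checked before a config is applied. A minimal sketch: validateConfig and the example.net hosts are hypothetical, and the config object only mirrors the shape of a replica set configuration document with the fields shown on the slides:

```javascript
// Returns a list of problems found in a replica set config document.
function validateConfig(config) {
  const problems = [];
  const members = config.members;
  if (members.length > 50) problems.push("more than 50 members");
  // votes defaults to 1 when the field is omitted
  const voting = members.filter((m) => (m.votes ?? 1) > 0).length;
  if (voting > 7) problems.push("more than 7 voting members");
  for (const m of members) {
    if (m.hidden && m.priority !== 0) {
      problems.push(`${m.host}: hidden member must have priority 0`);
    }
    if (m.slaveDelay > 0 && m.priority !== 0) {
      problems.push(`${m.host}: delayed member must have priority 0`);
    }
  }
  return problems;
}

// Hypothetical three-member set with one hidden, delayed member.
const config = {
  _id: "rs0",
  members: [
    { _id: 0, host: "db1.example.net:27017", priority: 1 },
    { _id: 1, host: "db2.example.net:27017", priority: 1 },
    { _id: 2, host: "db3.example.net:27017", priority: 0, hidden: true, slaveDelay: 3600 },
  ],
};
console.log(validateConfig(config)); // empty list: no problems found
```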
Everyone on the same page?
Failures
Small Oplog Size
1. Primary/Secondary node down
○ Node failure
○ Planned maintenance
2. Automatic failover
…… (several hours later)
3. New Primary has already overwritten the oplog entries the failed node still needs
4. Failed node needs a full resync
MongoDB >= 3.6: db.adminCommand({replSetResizeOplog: 1, size: 32000}) (size in megabytes; WiredTiger only)
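Whether a node can survive an outage without a resync depends on the oplog window, that is, how many hours of writes the oplog holds. A rough sketch, assuming a steady write rate; both function names and the 500MB/hour figure are made up for illustration:

```javascript
// Hours of history the oplog can hold at a steady write rate.
function oplogWindowHours(oplogSizeMB, oplogMBPerHour) {
  return oplogSizeMB / oplogMBPerHour;
}

// A node that was down longer than the oplog window cannot catch up
// from the oplog and needs a full resync.
function needsResync(downtimeHours, oplogSizeMB, oplogMBPerHour) {
  return downtimeHours > oplogWindowHours(oplogSizeMB, oplogMBPerHour);
}

console.log(needsResync(4, 990, 500));   // 990MB at 500MB/hour holds under 2 hours
console.log(needsResync(4, 32000, 500)); // 32000MB at 500MB/hour holds 64 hours
```

On a live member, rs.printReplicationInfo() reports the configured oplog size and the time span it currently covers, which gives the real numbers to plug in.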
Arbiter Nodes
● Votes in elections
● Does not hold a copy of the data
● If 2 nodes are down, there is no majority to elect a new Primary
● Fault tolerance is still 1 node
● 4 data nodes + 1 Arbiter makes more sense
[Diagram label: Heartbeat]
Priority 0 Nodes
● Application driver sends writes to Primary
● Reads go to Primary by default
● Secondaries can serve reads
● Read preference
○ primary (default)
○ primaryPreferred
○ secondary
○ secondaryPreferred
○ nearest
Priority 0 Nodes
• Primary node fails
• Replica set starts an election for a new Primary
• Zero nodes eligible to become Primary
• Application cannot send writes
• Database is read-only*
*depends on read preference setting
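The failure above can be spotted in advance by counting the members that could still be elected after the Primary is lost. A minimal sketch: electableMembers and the example.net hosts are hypothetical, and the member objects only mirror the priority and arbiterOnly fields of a replica set config:

```javascript
// Members that could be elected Primary: up, data-bearing, and priority > 0.
function electableMembers(members, downHosts) {
  return members.filter(
    (m) => !downHosts.includes(m.host) && !m.arbiterOnly && (m.priority ?? 1) > 0
  );
}

// Hypothetical set: one regular member and two priority 0 Secondaries.
const members = [
  { host: "db1.example.net:27017", priority: 1 },
  { host: "db2.example.net:27017", priority: 0 },
  { host: "db3.example.net:27017", priority: 0 },
];

// With db1 down, nobody is left to take over as Primary.
console.log(electableMembers(members, ["db1.example.net:27017"]).length);
```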
Hidden Nodes
● Application driver sends writes to Primary
● Reads go to Primary by default
● Hidden Secondaries cannot serve application reads
● Read preference
○ primary
Hidden Nodes
• Primary node fails
• Replica set starts an election for a new Primary
• Zero nodes eligible to become Primary (priority:0)
• Application cannot send writes or reads
• Downtime
Hardware
• Primary node fails
• Secondary elected as new Primary
• Working set does not fit in memory
• Performance degradation
• Application stalls
[Diagram: one member with 64GB RAM, 16 CPU; two members with 32GB RAM, 8 CPU]
Hardware
• Dataset grows
• No disk space left on a Secondary
• mongod process fails
• Replica set is down to 2 nodes
• Zero tolerance for further failures
[Diagram: two members with Disk: 300GB; one member with Disk: 200GB]
Network
● Heartbeat lost
● Primary steps down
● New Primary election
● Application timeout*
● Rollback
Best practice: test Primary step-down for your application
*Retryable writes since MongoDB 3.6
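Retryable writes are switched on from the client side. A small sketch that builds a replica set connection string with the retryWrites option; buildUri, the example.net hosts and the set name rs0 are placeholders, not values from the talk:

```javascript
// Builds a replica set connection string with retryable writes enabled.
function buildUri(hosts, replicaSet) {
  const params = new URLSearchParams({ replicaSet, retryWrites: "true" });
  return `mongodb://${hosts.join(",")}/?${params}`;
}

// Listing several members lets the driver find the new Primary after a step-down.
console.log(
  buildUri(
    ["db1.example.net:27017", "db2.example.net:27017", "db3.example.net:27017"],
    "rs0"
  )
);
```

With this in place the step-down test from the slide can be run with rs.stepDown() on the Primary while the application is writing.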
Cloud Deployment
• All replica set members deployed in a single Availability Zone
• Availability Zone #1 goes down
• Downtime
[Diagram: Cloud, Region #1, Availability Zone #1]
Cloud Deployment
● Availability Zone #1 goes down
○ New Primary elected from AZ #2
● Availability Zone #2 goes down
○ Database is read-only
[Diagram: Cloud, Region #1, AZ#1 and AZ#2]
Cloud Deployment
• Region #1 goes down
• Downtime
[Diagram: Cloud, Region #1, AZ#1, AZ#2, AZ#3]
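The Availability Zone scenarios can be checked with the same majority arithmetic. A sketch that assumes every member votes and that a whole zone fails at once; survivesZoneFailure and the member counts per zone are illustrative:

```javascript
// True if the members outside the failed zone still hold a majority of votes.
function survivesZoneFailure(membersPerZone, failedZone) {
  const total = Object.values(membersPerZone).reduce((a, b) => a + b, 0);
  const remaining = total - (membersPerZone[failedZone] ?? 0);
  return remaining >= Math.floor(total / 2) + 1;
}

// Three members over two zones: losing the larger zone loses the majority.
const twoZones = { "AZ#1": 1, "AZ#2": 2 };
// One member per zone: any single zone can fail.
const threeZones = { "AZ#1": 1, "AZ#2": 1, "AZ#3": 1 };

console.log(survivesZoneFailure(twoZones, "AZ#1"));
console.log(survivesZoneFailure(twoZones, "AZ#2"));
console.log(survivesZoneFailure(threeZones, "AZ#3"));
```

The same check applies to the virtualization case by treating each VM or physical server as a zone.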
Virtualization
● VM2 goes down
○ Primary node has majority on VM1
● VM1 goes down
○ Database is read-only
[Diagram: VMware, VM1 and VM2 on one physical server]
Version Upgrades
● Replica set major version upgrade (3.6 to 4.0)
● Driver v3.6 not compatible with DB v4.0
● Compatibility changes
● Application cannot send requests
● Downtime
● Rollback to previous DB version
[Diagram: members running MongoDB 3.6.4]
Version Upgrades
● Replica set major version upgrade
● Promote the upgraded node to Primary
● Confirm application works
● Forget to upgrade the Secondaries
● Start using new features
● New Primary elected (still on the old version)
● Application errors
[Diagram: two members on MongoDB 3.6, one on MongoDB 4.0]
Version Upgrades
● Minor version upgrade
● Promote the upgraded node to Primary
● Confirm application works
● Forget to upgrade the Secondaries
● Bug fixes in minor release
● New Primary elected (still on the old version)
● Application errors
[Diagram: two members on MongoDB 3.6.4, one on MongoDB 3.6.8]
Version Upgrades
[Diagram: fifteen processes, twelve labeled MongoDB: 3.6.8 and three labeled MongoDB: 3.6.3]
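A forgotten member is easy to catch by collecting the version from every process and grouping the results. A minimal sketch; the helper names and hosts are hypothetical, and the version strings stand in for what db.version() would return on each member:

```javascript
// Groups member hosts by the MongoDB version they report.
function hostsByVersion(members) {
  const groups = {};
  for (const { host, version } of members) {
    groups[version] = groups[version] || [];
    groups[version].push(host);
  }
  return groups;
}

// True when the deployment runs more than one version.
function isMixedVersion(members) {
  return Object.keys(hostsByVersion(members)).length > 1;
}

// Hypothetical result of an upgrade that skipped one Secondary.
const reported = [
  { host: "db1.example.net:27017", version: "3.6.8" },
  { host: "db2.example.net:27017", version: "3.6.8" },
  { host: "db3.example.net:27017", version: "3.6.3" },
];
console.log(isMixedVersion(reported), hostsByVersion(reported));
```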
DDL Operation
● Adding an index on a collection
● Connect to the Primary node
○ db.people.createIndex( { zipcode: 1 }, { background: true } )
DDL Operation
● Stop one Secondary
● Restart it on a different port
[Diagram: Secondary restarted with --port=27777]
DDL Operation
● Add the index
● Rejoin the replica set
● Promote the Secondary to Primary
● Forget the other nodes
[Diagram: Secondary on --port=27777 running db.people.createIndex({zipcode:1})]
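After a rolling index build, the index list of every member can be compared to catch the nodes that were forgotten. A sketch under the assumption that the index names were already collected per host, for example from db.people.getIndexes() on each member; hostsMissingIndex and the hosts are hypothetical, and zipcode_1 is the default name MongoDB gives to an index on { zipcode: 1 }:

```javascript
// Returns the hosts whose index list lacks the expected index name.
function hostsMissingIndex(indexNamesByHost, indexName) {
  return Object.keys(indexNamesByHost).filter(
    (host) => !indexNamesByHost[host].includes(indexName)
  );
}

// Hypothetical state after the index was built on one member only.
const indexNamesByHost = {
  "db1.example.net:27017": ["_id_", "zipcode_1"],
  "db2.example.net:27017": ["_id_"],
  "db3.example.net:27017": ["_id_"],
};
console.log(hostsMissingIndex(indexNamesByHost, "zipcode_1"));
```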
Backups
● Pick one Secondary
● db.fsyncLock()
● Take snapshot
● db.fsyncUnlock()
● Unlock fails
● Secondary starts lagging
● Primary overwrites oplog
● Secondary needs initial sync
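One way to lower the risk of a Secondary staying locked is to make the unlock unconditional and to check its result. A minimal sketch, not a complete backup script: snapshotWithLock and takeSnapshot are hypothetical, and db is expected to be a shell database handle offering fsyncLock() and fsyncUnlock():

```javascript
// Runs takeSnapshot() while writes are blocked and always attempts the unlock.
function snapshotWithLock(db, takeSnapshot) {
  db.fsyncLock();
  try {
    takeSnapshot();
  } finally {
    // Runs even when the snapshot throws, so the member is not left locked.
    const res = db.fsyncUnlock();
    if (!res || res.ok !== 1) {
      throw new Error("fsyncUnlock failed: member stays locked and will lag");
    }
  }
}
```

An error raised here is worth alerting on, since every minute the member stays locked eats into its oplog window.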
Sharded Clusters
Monitoring Replica Set
• Replica set has no Primary
• Number of unhealthy members is above threshold
• Replication lag is above threshold
• Replica set elected new Primary
• Host of any type has restarted
• Host of type Secondary is recovering
• Host of any type is down
• Host of any type has experienced Rollback
• Network issues between members of the replica set or cluster
• Monitoring backup status
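Two of the alerts above, no Primary and replication lag, can be derived from the output of rs.status(). A sketch that assumes a document shaped like that output, using only the name, stateStr and optimeDate fields of each member; the function name and any alert threshold are up to the monitoring setup:

```javascript
// Lag in seconds of each Secondary behind the Primary,
// computed from an rs.status()-shaped document.
function replicationLagSeconds(status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  if (!primary) return null; // no Primary: alert on that instead
  const lag = {};
  for (const m of status.members) {
    if (m.stateStr === "SECONDARY") {
      // Subtracting two Date values gives milliseconds.
      lag[m.name] = (primary.optimeDate - m.optimeDate) / 1000;
    }
  }
  return lag;
}
```

In the shell this could be called as replicationLagSeconds(rs.status()) and the values compared against the lag threshold.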
Summary
• Use an odd number of voting members in the replica set
• Use a Hidden or Delayed member for dedicated functions (reporting, backups …)
• Have more than one eligible Primary in the replica set
• Use multi-AZ for Cloud deployments
• Don’t deploy more than one mongod process per node/host
• Run all replica set members on the same hardware
• Run all replica set members on the same MongoDB version
• Monitor your replica set status and nodes
• Monitor replication lag and oplog size
Questions?
We’re Hiring!
https://www.pythian.com/careers/
