Scaling MongoDB with Horizontal and Vertical Sharding
Manosh Malai
CTO, Mydbops LLP
01st April 2023
MongoDB User Group Bangalore
About Me
Manosh Malai
CTO, Mydbops LLP
Interested in Open Source technologies
Interested in MongoDB, DevOps & DevOpSec Practices
Tech Speaker/Blogger
Mydbops Services
Consulting Services
Managed Services
Focuses on MySQL, MongoDB and PostgreSQL
Agenda
Introduction
Vertical Sharding
Horizontal Sharding
INTRODUCTION
Database Sharding
Database sharding is the process of storing a large database across
multiple machines
WHEN TO SHARD ?
When To Shard - I
Size of Data: If your database is becoming too large to fit on a single server,
sharding may be necessary to distribute the data across multiple servers.
Performance: Sharding can improve query performance by reducing the amount
of data that needs to be processed on a single server.
When To Shard - II
Scalability: Sharding enables you to horizontally scale out your MongoDB
database by distributing data across multiple nodes.
Availability and Redundancy: Because each shard can run as a replica set,
the data on a shard stays available when a single node fails.
When To Shard - III
Availability: Sharding can improve the overall availability of your database
by providing redundancy across multiple nodes.
Flexibility: Sharding enables you to distribute data across multiple nodes
based on your specific requirements.
When To Shard - IV
Cost-effectiveness: Sharding can be a cost-effective way to scale out
your database, rather than purchasing expensive hardware to support a
single, monolithic database.
Types of Sharding
Vertical Sharding
Horizontal Sharding
Will MongoDB Support Vertical Sharding?
Vertical Sharding
[Diagram: Session, Product Catalog, Carts, and Checkouts tables placed on separate deployments]
Distributing tables across multiple standalone instances, replica sets, or sharded clusters
Vertical Sharding Strategy - Pros
Different data access patterns:
▪ Vertical sharding may be useful when different tables are accessed at different frequencies or
have different access patterns.
▪ By splitting these tables into different shards, the performance of queries that only need to
access a subset of columns can be improved.
Better data management:
▪ Vertical sharding can provide better control over data access, as sensitive or confidential data
can be stored separately from other data. This can help with compliance with regulations such
as GDPR or HIPAA.
Vertical Sharding Strategy - Cons
Data Interconnectedness:
▪ Vertical sharding may not be the best solution for databases with heavily interconnected data. If
there is a need for complex joins or queries across multiple columns, horizontal sharding or
other scaling strategies may be more appropriate.
Limited Scalability:
▪ Only suitable for small or medium data sizes.
How Can We Achieve Vertical Sharding?
▪ Service Discovery
  ▪ Consul
  ▪ Etcd
  ▪ ZooKeeper
▪ Data Sync
  ▪ Mongopush
  ▪ mongosync
  ▪ mongodump & mongorestore
Vertical Sharding Strategy
Vertical Sharding: Service Discovery and Data Migration
▪ Use Consul to dynamically discover the nodes in your MongoDB cluster and route traffic to them accordingly.
▪ Mongopush syncs the data from the X1 cluster to the X2 cluster.
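The routing half of this setup can be pictured as a lookup from collection name to cluster address. The sketch below is only an illustration: the registry contents, the host names, and the placement of collections on X1 and X2 are all hypothetical, and in a real deployment the addresses would come from a service-discovery lookup (for example Consul) rather than a hard-coded object.

```javascript
// Hypothetical registry: in practice these entries would be returned by a
// service-discovery lookup (for example Consul), not hard-coded.
const clusterRegistry = {
  x1: "mongodb://x1.mongo.service.example:27017",
  x2: "mongodb://x2.mongo.service.example:27017",
};

// Hypothetical placement of collections after vertical sharding.
const collectionPlacement = {
  session: "x1",
  product_catalog: "x1",
  carts: "x2",
  checkouts: "x2",
};

// Resolve the connection URI for a collection, failing loudly if unmapped.
function resolveClusterUri(collectionName) {
  const cluster = collectionPlacement[collectionName];
  if (cluster === undefined) {
    throw new Error(`no cluster mapped for collection "${collectionName}"`);
  }
  return clusterRegistry[cluster];
}

console.log(resolveClusterUri("carts"));
```

Keeping the placement map separate from the address registry means a collection can be moved to another cluster by changing one entry once the data sync has caught up.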
Types of Sharding
Vertical Sharding
Horizontal Sharding
Will MongoDB Support Horizontal Sharding?
MongoDB Horizontal Sharding and Its Components
Each shard contains a subset of the sharded data
▪ Mongos
▪ Config Server
▪ Shards
Shard Key
Collection Shard Key
Divide and distribute the collection evenly using the shard key
The shard key consists of a field or fields that exist in every document in a collection
MongoDB Shard Key
Hash Sharding
Pros
▪ Even data distribution
▪ Even read and write workload distribution
Cons
▪ Range queries are likely to trigger expensive broadcast operations
Range Sharding
Pros
▪ Even data distribution
▪ Targeted operations for both single and ranged queries
▪ Even read and write workload distribution
Cons
▪ Depends on selecting a good shard key that is used in both read and write queries
Zone Sharding
Pros
▪ Isolate a specific subset of data on a specific set of shards
▪ Keep data geographically closest to the application servers
▪ Data tiering and SLAs based on shard hardware
Cons
▪ Depends on selecting a good shard key that is used in both read and write queries
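To make the range versus hash trade-off concrete, the following sketch places 100 monotonically increasing keys under both schemes. It is a simplification under stated assumptions: the chunk boundaries and shard names are made up, the hash is 32-bit FNV-1a used as a stand-in (MongoDB's hashed index uses a different function), and real hashed sharding assigns ranges of hash values to chunks rather than taking a modulo.

```javascript
// Range placement: each shard owns a contiguous key range [min, max).
// Boundaries and shard names are hypothetical.
const rangeChunks = [
  { shard: "A", min: 0, max: 1000 },
  { shard: "B", min: 1000, max: 2000 },
  { shard: "C", min: 2000, max: Infinity },
];

function shardForRange(key) {
  return rangeChunks.find((c) => key >= c.min && key < c.max).shard;
}

// Stand-in hash (32-bit FNV-1a), NOT the function MongoDB uses.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

const shardNames = ["A", "B", "C"];

// Simplified hash placement: modulo over the shard list.
function shardForHash(key) {
  return shardNames[fnv1a(String(key)) % shardNames.length];
}

// Count how many keys land on each shard under a placement function.
function countByShard(keys, placement) {
  const counts = {};
  for (const key of keys) {
    const shard = placement(key);
    counts[shard] = (counts[shard] || 0) + 1;
  }
  return counts;
}

// 100 monotonically increasing keys, e.g. an incrementing order number.
const keys = Array.from({ length: 100 }, (_, i) => 2000 + i);
const rangeCounts = countByShard(keys, shardForRange);
const hashCounts = countByShard(keys, shardForHash);
console.log("range:", rangeCounts);
console.log("hash:", hashCounts);
```

With range placement every new key falls into the last chunk, so one shard takes all the inserts; with hash placement neighbouring keys scatter, which is also why a range query over them has to visit several shards.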
Target and Broadcast Operations
Target Query
db.collection.find({ })
Broadcast Query
db.collection.find({ })
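Whether a query is targeted or broadcast depends on whether its filter lets mongos narrow down the shards. The sketch below is a rough classifier, not the router's actual logic: it assumes a query is targeted when the filter constrains the leading shard key field, and treats a range predicate on a hashed leading field as a broadcast. The function names and the sample field names in the usage lines are illustrative.

```javascript
const RANGE_OPERATORS = ["$gt", "$gte", "$lt", "$lte"];

// True when the filter value is an operator document such as { $gt: 10 }.
function isRangePredicate(value) {
  return (
    value !== null &&
    typeof value === "object" &&
    !Array.isArray(value) &&
    Object.keys(value).some((op) => RANGE_OPERATORS.includes(op))
  );
}

// Simplified classification of a find() filter against a shard key pattern.
function classifyQuery(shardKey, filter) {
  const [leadingField, leadingType] = Object.entries(shardKey)[0];
  if (!Object.prototype.hasOwnProperty.call(filter, leadingField)) {
    return "broadcast"; // the shard key is not part of the filter
  }
  if (leadingType === "hashed" && isRangePredicate(filter[leadingField])) {
    return "broadcast"; // hashed values do not preserve key order
  }
  return "targeted";
}

console.log(classifyQuery({ vehical_no: 1 }, { vehical_no: "KA01AB1234" }));
console.log(classifyQuery({ vehical_no: 1 }, { status: "open" }));
console.log(classifyQuery({ vehical_no: "hashed" }, { vehical_no: { $gt: "KA" } }));
```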
Shard Key Indexes
Single-field Ascending Index
Single-field Hashed Index
Compound Ascending Index
Compound Hashed Index
Declare Shard Key
sh.shardCollection(namespace, key, unique, options)
sh.shardCollection("db.test", {"fieldA": 1, "fieldB": "hashed"}, false /* unique */, {numInitialChunks: 5, collation: {locale: "simple"}})
▪ When the collection is empty, sh.shardCollection() creates an index on the shard key if an index for that
key does not already exist.
▪ If the collection is not empty, you must create the index before using sh.shardCollection().
▪ The shard key index cannot be a multikey index, text index, or geospatial index on
the fields of the shard key.
▪ MongoDB can enforce a uniqueness constraint on a ranged shard key index only.
▪ For a unique compound index where the shard key is a prefix, MongoDB enforces uniqueness across
the entire key combination, rather than on individual components of the shard key.
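The index rules above come down to a prefix check: an index can back a shard key when the shard key fields are its leading fields, in the same order and with the same type. The helper below is a hypothetical pre-flight check written for illustration, not part of the MongoDB shell; the field names follow the example above.

```javascript
// Returns true when the shard key pattern is a prefix of the index key
// pattern (same fields, same order, same 1/"hashed" type).
function indexSupportsShardKey(indexKey, shardKey) {
  const indexFields = Object.entries(indexKey);
  const shardFields = Object.entries(shardKey);
  if (shardFields.length === 0 || shardFields.length > indexFields.length) {
    return false;
  }
  return shardFields.every(
    ([field, type], i) =>
      indexFields[i][0] === field && indexFields[i][1] === type
  );
}

const shardKey = { fieldA: 1, fieldB: "hashed" };
console.log(indexSupportsShardKey({ fieldA: 1, fieldB: "hashed", fieldC: 1 }, shardKey));
console.log(indexSupportsShardKey({ fieldB: "hashed", fieldA: 1 }, shardKey));
```

Running such a check against the output of db.collection.getIndexes() before calling sh.shardCollection() on a non-empty collection shows whether an index still has to be created first.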
Shard Key Improvements Since MongoDB v4.2
Mutable Shard Key Value (v4.2)
Refinable Shard Key (v4.4)
Compound Hashed Shard Key (v4.4)
Live Resharding (v5.0)
What and Why: Refinable Shard Key (v4.4)
Shard Key: customer_id
[Chart: data distribution across Shard A, Shard B, and Shard C: 21%, 15%, 64%]
Refining Shard Key
db.adminCommand({refineCollectionShardKey: "<database>.<collection>", key: {<existing key>, <new suffix1>: <1|"hashed">, ...}})
▪ Refine at any time
▪ No database downtime
Refining a collection's shard key improves data distribution and resolves
issues caused by insufficient cardinality leading to jumbo chunks.
Refinable Shard Key (v4.4)
Shard Key: vehical_no
Refining Shard Key
db.adminCommand({refineCollectionShardKey: "mydb.test", key:
{vehical_no: 1, user_mnumber: "hashed"}})
Avoid changing the range or hashed type for any existing shard key fields, as it can lead to
inconsistencies in data. For instance, refrain from changing a shard key such as { vehical_no: 1 }
to { vehical_no: "hashed", order_id: 1 }.
Guidelines for Refining Shard Keys
▪ For refining shard keys, your cluster must have a version of at least 4.4 and a feature compatibility version of 4.4.
▪ Retain the same prefix when defining the new shard key, i.e., it must begin with the same field(s) as the existing
shard key.
▪ When refining shard keys, additional fields can only be added as suffixes to the existing shard key.
▪ An index that supports the refined shard key must exist before you run the command.
▪ Prior to executing the refineCollectionShardKey command, it is essential to stop the balancer.
▪ Use sh.status() to see the status.
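The prefix and suffix guidelines can be checked mechanically before the command is sent. The sketch below is a hypothetical helper, not a MongoDB API: it validates that the refined key keeps every existing field, in order and with the same type, adds at least one suffix field, and then returns the command document that would be passed to db.adminCommand().

```javascript
// Build a refineCollectionShardKey command document after validating that
// the refined key only appends suffix fields to the current key.
function buildRefineCommand(namespace, currentKey, refinedKey) {
  const current = Object.entries(currentKey);
  const refined = Object.entries(refinedKey);
  if (refined.length <= current.length) {
    throw new Error("refined key must add at least one suffix field");
  }
  current.forEach(([field, type], i) => {
    const [newField, newType] = refined[i];
    if (newField !== field) {
      throw new Error(`prefix changed: expected "${field}", got "${newField}"`);
    }
    if (newType !== type) {
      throw new Error(`type of existing field "${field}" must not change`);
    }
  });
  return { refineCollectionShardKey: namespace, key: refinedKey };
}

const refineCommand = buildRefineCommand(
  "mydb.test",
  { vehical_no: 1 },
  { vehical_no: 1, user_mnumber: "hashed" }
);
console.log(JSON.stringify(refineCommand));
```

The returned object has the same shape as the command shown on the slide, so in mongosh it could be handed to db.adminCommand() unchanged.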
Compound Hashed Shard Key (v4.4)
[Chart: data distribution across Shard A, Shard B, and Shard C: 21%, 15%, 64%]
Existing Shard Key: vehical_no
New Shard Key: vehical_no, user_mnumber
sh.shardCollection("test.order", {"vehical_no": 1, "user_mnumber": "hashed"})
sh.shardCollection("test.order", {"vehical_no": "hashed", "user_mnumber": 1})
▪ Overcomes monotonically increasing keys
Live Resharding (v5.0)
Resharding without downtime
Any combination of shard key changes
[Diagram: example changes between compound, hashed, and ranged shard keys]
Resharding Process Flow
▪ Before starting a resharding operation on a collection of 1 TB size, it is recommended to have a minimum of
1.2 TB of free storage.
▪ I/O: Ensure that your I/O usage is below 50% of capacity.
▪ CPU load: Ensure that your CPU load is below 80%.
▪ Rewrite your application's queries to use both the current shard key and the new shard key.
▪ Rewrite your application's queries to use the new shard key without reload.
▪ Monitor the resharding process using a $currentOp pipeline stage.
▪ Deploy your rewritten application.
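The three resource prerequisites lend themselves to a simple pre-flight check. The sketch below is an assumption-laden illustration: it generalizes the slide's 1 TB to 1.2 TB example into a 1.2x ratio, expects utilization as fractions between 0 and 1, and uses a hypothetical function name and input shape. Gathering the actual numbers from monitoring is left out.

```javascript
// Return a list of unmet prerequisites; an empty list means "ready".
// The 1.2x storage ratio is inferred from the 1 TB -> 1.2 TB example.
function checkReshardingReadiness({ collectionBytes, freeBytes, ioUtilization, cpuUtilization }) {
  const problems = [];
  // Integer arithmetic (x10 vs x12) avoids floating-point rounding on 1.2.
  if (freeBytes * 10 < collectionBytes * 12) {
    problems.push("free storage is below 1.2x the collection size");
  }
  if (ioUtilization >= 0.5) {
    problems.push("I/O usage is not below 50%");
  }
  if (cpuUtilization >= 0.8) {
    problems.push("CPU load is not below 80%");
  }
  return problems;
}

const TB = 1000000000000; // decimal terabyte, in bytes
const readiness = checkReshardingReadiness({
  collectionBytes: 1 * TB,
  freeBytes: 1200000000000,
  ioUtilization: 0.35,
  cpuUtilization: 0.6,
});
console.log(readiness.length === 0 ? "ready to reshard" : readiness);
```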
Resharding: Donors and Recipients
• Donors are shards that currently own chunks of the sharded collection.
• Recipients are shards that will own chunks of the sharded collection according to the new
shard key and zones.
Resharding Internal Process Flow
Initialization Phase
• The balancer determines the new data distribution for the sharded collection.
Index Phase
• A new empty sharded collection, with the same collection options as the original one, is
created by each recipient shard.
• This new collection serves as the target for the new data written by the recipient shards.
• Each recipient shard builds the necessary new indexes.
Clone, Apply, and Catch-up Phase
• Each recipient shard makes a copy of the initial documents that it would be
responsible for under the new shard key.
• Each recipient shard begins applying oplog entries from operations that happened after the
recipient cloned the data.
Commit Phase
• When all shards have reached strict consistency, the resharding coordinator commits
the resharding operation and installs the new routing table.
• The resharding coordinator instructs each donor and recipient shard primary,
independently, to rename the temporary sharded collection. The temporary collection
becomes the new resharded collection.
• Each donor shard drops the old sharded collection.
Resharding Process Command
Start the resharding operation
db.adminCommand({
  reshardCollection: "mydb.test",
  key: {"vehical_no": 1, "user_mnumber": "hashed"}
})
Monitor the resharding operation
db.getSiblingDB("admin").aggregate([
  { $currentOp: { allUsers: true, localOps: false } },
  {
    $match: {
      type: "op",
      "originatingCommand.reshardCollection": "mydb.test"
    }
  }
])
Abort the resharding operation
db.adminCommand({
  abortReshardCollection: "mydb.test"
})
To summarize, what issue does this feature resolve?
• Jumbo Chunks
• Uneven Load Distribution
• Query performance that degrades over time because of scatter-gather queries
Reach Us : Info@mydbops.com
Thank You
Database End of Life
MySQL 5.7: 31 Oct 2023
MongoDB 4.2: 30 April 2023
MongoDB 4.4: 29 Feb 2024
PostgreSQL 11: 9 Nov 2023

