How Sitecore depends on MongoDB for scalability and performance, and what it can teach you
Antonios Giannopoulos
Database Administrator – ObjectRocket
Grant Killian
Sitecore Architect - Rackspace
Percona Live 2017
Agenda
We are going to discuss:
Key terms
- Introduction to Sitecore
- Introduction to MongoDB
Best Practices for MongoDB with Sitecore
Scaling Sitecore
Benchmarks
Who We Are
Antonios Giannopoulos
Database Administrator w/ ObjectRocket
Grant Killian
Sitecore Architect w/ Rackspace
Sitecore MVP
Sitecore Architecture
Minimum necessary to understand this talk
Gartner Magic Quadrant for WCM (Web Content Management), Sept 2016
Sitecore is a framework for building websites...
Sitecore ♥ MongoDB because . . .
● Unstructured document model is a better fit for Sitecore analytics than traditional database rows
● ∞ scalability
● Introduces key flexibility to the system
○ HTTP Session state
○ Optional repository for other Sitecore modules
○ 100% replacement for SQL Server (experimental)
■ $$$
MongoDB replica-set
A group of mongod processes that maintain the same dataset
Replica sets provide:
- Redundancy
- High availability
- Scaling
MongoDB replica-set
Consists of at least 3 nodes
- Up to 50 nodes in 3.0 and higher
- 12 on previous versions
A replica-set node may be either:
- Primary
- Secondary
- Arbiter
MongoDB replica-set
Asynchronous replication
- Delay between PRI and SECs
- SECs pull and apply operations
Automatic failover
- If the PRI fails, a SEC is elected to take its place
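The pull-and-apply behaviour above can be pictured with a toy model. This is only a sketch: the `Primary` and `Secondary` classes and the `batchSize` parameter are hypothetical names for illustration and do not reflect how mongod is implemented.

```javascript
// Toy model of asynchronous replication: the primary appends operations to
// an oplog, and a secondary pulls and applies the entries it has not seen yet.
class Primary {
  constructor() {
    this.oplog = []; // ordered list of operations
    this.data = {};
  }
  write(key, value) {
    this.data[key] = value;
    this.oplog.push({ key, value });
  }
}

class Secondary {
  constructor() {
    this.applied = 0; // number of oplog entries already applied
    this.data = {};
  }
  // Pull up to `batchSize` new entries from the primary and apply them in order.
  pullFrom(primary, batchSize) {
    const batch = primary.oplog.slice(this.applied, this.applied + batchSize);
    for (const op of batch) {
      this.data[op.key] = op.value;
    }
    this.applied += batch.length;
  }
  // Replication lag, measured in operations not yet applied.
  lag(primary) {
    return primary.oplog.length - this.applied;
  }
}

const primary = new Primary();
const secondary = new Secondary();
primary.write("a", 1);
primary.write("b", 2);
primary.write("a", 3);

secondary.pullFrom(primary, 2);
console.log(secondary.lag(primary)); // 1: the secondary is one operation behind
secondary.pullFrom(primary, 2);
console.log(secondary.data);         // { a: 3, b: 2 }
```

Until the second pull completes, a read from the secondary would still see `a: 1`, which is the delay between PRI and SECs that the slide refers to.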
MongoDB replica-set
Best Practices
- Odd number of members
- Use same server specs
- Reliable network connections
- Adjust the oplog accordingly
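One way to see why an odd number of members is recommended is to count votes. The sketch below assumes every member votes and that electing a primary needs a strict majority; the function names are hypothetical.

```javascript
// Votes needed to elect a primary: a strict majority of the voting members.
function majority(members) {
  return Math.floor(members / 2) + 1;
}

// Number of members that can be lost while a primary can still be elected.
function faultTolerance(members) {
  return members - majority(members);
}

for (const n of [3, 4, 5, 6, 7]) {
  console.log(`${n} members: majority ${majority(n)}, tolerates ${faultTolerance(n)} failure(s)`);
}
// 3 members: majority 2, tolerates 1 failure(s)
// 4 members: majority 3, tolerates 1 failure(s)
// 5 members: majority 3, tolerates 2 failure(s)
// 6 members: majority 4, tolerates 2 failure(s)
// 7 members: majority 4, tolerates 3 failure(s)
```

Under these assumptions, going from 3 to 4 members (or from 5 to 6) adds a server without raising the number of failures the set can survive.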
MongoDB Sharded Clusters
Consists of:
Mongos
- It’s a statement (query) router
- Connection interface for the driver - makes sharding transparent
Config Servers: Hold the cluster metadata (the location of the data)
Shards: Each contains a subset of the sharded data
MongoDB Sharded Clusters
Best Practices
- Deploy shards as replica-sets
- Reliable network connections
- Most important: pick a good shard key
Undoing a shard key choice might require downtime
MongoDB Sharded Clusters
What makes a good shard key:
- High Cardinality
- No null values
- Immutable field(s)
- No monotonically increasing fields
- Even read/write distribution
- Even data distribution
- Read targeting/locality
Most important: choose a shard key according to your application requirements
MongoDB Storage Engines
MongoDB version 3.0 and higher supports:
- MMAPv1
- WiredTiger
- RocksDB (Percona Server)
- In Memory (Percona Server)
- Fractal Tree (Percona Server)
Sitecore MongoDB Databases
1. Analytics - customer visit metrics (IP address, browser, pages…)
2. Tracking_contact - contact processing
3. Tracking_history - history worker queue for full rebuilds
4. Tracking_live - task queue for real-time processing
5. Private_session - “classic” http session state
6. Shared_session - meta http session state for contacts (engagement state for the lifetime of interactions…)
For example . . .
Graphic courtesy of http://www.techphoria414.com
Scaling Sitecore – Separate Workloads
Move each Sitecore database to a separate instance
Sitecore uses a different connection string per database:
connectionString="mongodb://_mongo_server_01_:_port_number_/_session_database_name_" />
connectionString="mongodb://_mongo_server_02_:_port_number_/_analytics_database_name_" />
Instances can be optimized according to their workload
Scaling Sitecore – Polyglot
Use a different storage engine per database:
- Different instances
- Sharded clusters, different storage engines per shard
Percona In-memory storage engine is a good fit for _sessions
- Based on the in-memory storage engine used in MongoDB Enterprise Edition
- _sessions data are not persistent
Scaling Sitecore - Sharding
What to shard:
- Large collections for capacity
- Busy collections for load distribution
How to pick a shard key:
- Collect a representative statement sample and identify statement patterns
- Pick a shard key that scales the workload/statements
- Meet sharding constraints
Scaling Sitecore - Sharding
From Sitecore documentation: “Sitecore calculates
diskspace sizing projections using 5KB per
interaction and 2.5KB per identified contact and
these two items make up 80% of the diskspace”
Shard the interaction and contact collections for capacity.
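The quoted sizing rule can be turned into a quick estimate. This is a rough sketch: the per-item sizes and the 80% share come from the quote above, while the function name and the traffic volumes are hypothetical.

```javascript
const KB_PER_INTERACTION = 5;   // from the quoted Sitecore sizing guidance
const KB_PER_CONTACT = 2.5;     // from the quoted Sitecore sizing guidance
const CORE_SHARE_PERCENT = 80;  // interactions + contacts as a share of disk space

// Estimate total disk space in KB from interaction and identified-contact counts.
function estimateDiskKB(interactions, identifiedContacts) {
  const coreKB =
    interactions * KB_PER_INTERACTION + identifiedContacts * KB_PER_CONTACT;
  // Scale up so that the two core collections account for 80% of the total.
  return (coreKB * 100) / CORE_SHARE_PERCENT;
}

// Hypothetical volumes: 1,000,000 interactions and 200,000 identified contacts.
console.log(estimateDiskKB(1000000, 200000)); // 6875000 (KB, roughly 6.9 GB)
```

Since the two collections dominate the estimate, they are the natural first candidates when sharding for capacity.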
Scaling Sitecore - Sharding
Collection Interaction
Receives: Inserts, Queries and Updates
Read/Write Ratio: 60-40
Updates are using the _id
Queries are using:
"_id, ContactId": 80%
"ContactId, _t": 5%
"ContactId, ContactVisitIndex": 15%
Scaling Sitecore - Sharding
Collection Interaction
Recommended shard key is _id:1 or _id:hashed
- Scales the vast majority of statements
- But some queries (around 20%) become scatter-gather
{ContactId:1} is also a decent choice, but:
- Updates on sharded collections MUST use the shard key (or {multi:true}); _id is an exception to that rule
- _id is generated by the application, not the driver
- Potential for jumbo chunks
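The "around 20%" figure can be reproduced from the query mix reported for the Interaction collection. The sketch below is a simplification: it assumes a single-field shard key and treats a query as targeted whenever it filters on that field; `targetedShare` is a hypothetical helper.

```javascript
// Query patterns on the Interaction collection and their share of all queries,
// as reported in the slides.
const queryPatterns = [
  { fields: ["_id", "ContactId"], share: 80 },
  { fields: ["ContactId", "_t"], share: 5 },
  { fields: ["ContactId", "ContactVisitIndex"], share: 15 },
];

// A query can be routed to a single shard only if it filters on the shard key.
function targetedShare(patterns, shardKeyField) {
  return patterns
    .filter((p) => p.fields.includes(shardKeyField))
    .reduce((sum, p) => sum + p.share, 0);
}

for (const key of ["_id", "ContactId"]) {
  const targeted = targetedShare(queryPatterns, key);
  console.log(`${key}: ${targeted}% targeted, ${100 - targeted}% scatter-gather`);
}
// _id: 80% targeted, 20% scatter-gather
// ContactId: 100% targeted, 0% scatter-gather
```

This covers queries only. Updates use _id, which is why ContactId carries the caveats listed above even though it targets every query pattern.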
Scaling Sitecore - Sharding
Collection Interaction
Choose your shard key according to your engine
- MMAP: _id:1 or _id:hashed
- WiredTiger: _id:1 or _id:hashed or ContactId:1
Sitecore may optimize sharding by including ContactId on the updates
Scaling Sitecore - Sharding
Collection Contacts
Receives: Inserts, Queries and Updates
Read/Write Ratio: 80-20
Updates are using the _id
Queries are using the _id (with additional fields)
Recommended shard key is _id:1 or _id:hashed
Scaling Sitecore - Sharding
Collection Devices
Recommended shard key is _id:1 or _id:hashed
Collection ClassificationsMap
Recommended shard key is _id:1 or _id:hashed
Collection KeyBehaviorCache
Recommended shard key is _id:1 or _id:hashed
Scaling Sitecore - Sharding
Collection GeoIps
Recommended shard key is _id:1 or _id:hashed
Collection OperationStatuses
Recommended shard key is _id:1 or _id:hashed
Collection ReferringSites
Recommended shard key is _id:1 or _id:hashed
Scaling Sitecore - Sharding
{_id:1} vs {_id:hashed}
Client-generated _id values are monotonically increasing, so "hashed" is added for randomness
The Sitecore _id is a .NET UUID (Universally Unique Identifier) stored as the BinData datatype
Example: "_id" : BinData(3,"1eDJ1NXU8EeiD5a6WJtxbA==")
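The effect of hashing a monotonically increasing key can be illustrated with a small simulation. Everything here is an assumption for illustration: the split points are invented, and FNV-1a stands in for the hash function, since MongoDB's hashed indexes use their own.

```javascript
// Ranged sharding: chunk boundaries come from values seen so far, so the last
// chunk covers everything above the highest boundary.
const boundaries = [250, 500, 750]; // hypothetical split points, giving 4 chunks

function rangedChunk(id) {
  let chunk = 0;
  while (chunk < boundaries.length && id >= boundaries[chunk]) chunk++;
  return chunk;
}

// Stand-in hash (32-bit FNV-1a), used here only to scatter the keys.
function fnv1a(text) {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

function hashedChunk(id, chunks) {
  return fnv1a(String(id)) % chunks;
}

const ranged = new Set();
const hashed = new Set();
for (let id = 1000; id < 2000; id++) { // new, ever-increasing ids
  ranged.add(rangedChunk(id));
  hashed.add(hashedChunk(id, 4));
}
console.log(ranged.size); // 1: every insert lands in the last chunk
console.log(hashed.size); // 4: inserts are spread over all chunks
```

A random UUID behaves more like the hashed case even under {_id:1}, because consecutive values are not ordered.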
Scaling Sitecore - Sharding
{_id:1} vs {_id:hashed}
You may use the uuidhelpers.js utility to convert _id to UUID
Download from: https://github.com/mongodb/mongo-csharp-driver/blob/master/uuidhelpers.js
>doc = db.test.findOne()
{ "_id" : BinData(3,"1eDJ1NXU8EeiD5a6WJtxbA==") }
>doc._id.toCSUUID()
CSUUID("d4c9e0d5-d4d5-47f0-a20f-96ba589b716c")
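The same conversion can be sketched outside the shell. This is not the uuidhelpers.js code; it is a minimal Node reimplementation, assuming the legacy C# layout in which the first three UUID groups are stored little-endian. The function name is hypothetical.

```javascript
// Convert the base64 payload of a BinData(3, ...) value into the UUID string
// that .NET displays. In the legacy C# layout the first three groups
// (4, 2 and 2 bytes) are stored in reversed byte order.
function legacyCSharpUuid(base64) {
  const bytes = [...Buffer.from(base64, "base64")];
  if (bytes.length !== 16) {
    throw new Error("expected 16 bytes, got " + bytes.length);
  }
  const hex = (part) =>
    part.map((b) => b.toString(16).padStart(2, "0")).join("");
  return [
    hex(bytes.slice(0, 4).reverse()),
    hex(bytes.slice(4, 6).reverse()),
    hex(bytes.slice(6, 8).reverse()),
    hex(bytes.slice(8, 10)),
    hex(bytes.slice(10, 16)),
  ].join("-");
}

console.log(legacyCSharpUuid("1eDJ1NXU8EeiD5a6WJtxbA=="));
// d4c9e0d5-d4d5-47f0-a20f-96ba589b716c
```

The output matches the CSUUID value printed by the shell helper for the same document.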
Scaling Sitecore - Sharding
Use {_id:"hashed"} when you have an empty collection
numInitialChunks lets you pre-split and distribute empty chunks:
- Avoids chunk splits
- Avoids chunk moves
db.adminCommand( { shardCollection: "<database>.<collection>", key: { _id: "hashed" }, numInitialChunks: <number> } )
<number> must be less than 8192 per shard.
Scaling Sitecore - Sharding
Use {_id:"hashed"} when you have an empty collection
Define numInitialChunks:
Size = Collection size (in MB) / 32
Count = Number of documents / 125000
Limit = Number of shards * 8192
numInitialChunks = Min(Max(Size, Count), Limit)
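The formula can be written as a small helper. Rounding up to whole chunks is an assumption (the slide does not say how fractions are handled), and the example volumes are hypothetical.

```javascript
const MAX_CHUNKS_PER_SHARD = 8192; // per-shard limit quoted in the slides

// Size, Count and Limit follow the definitions given above.
function numInitialChunks(collectionSizeMB, documentCount, shardCount) {
  const size = Math.ceil(collectionSizeMB / 32);      // assumption: round up
  const count = Math.ceil(documentCount / 125000);    // assumption: round up
  const limit = shardCount * MAX_CHUNKS_PER_SHARD;
  return Math.min(Math.max(size, count), limit);
}

// Hypothetical expected volume: 64,000 MB and 100 million documents on 3 shards.
console.log(numInitialChunks(64000, 100000000, 3)); // 2000

// A very large collection on a single shard is capped by the limit.
console.log(numInitialChunks(10000000, 1, 1)); // 8192
```

The inputs are the sizes you expect the collection to reach, since the collection itself is still empty when it is pre-split.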
Scaling Sitecore - Sharding
Move Primary
Move each Sitecore database to a different shard:
(analytics, tracking_live …)
db.runCommand( { movePrimary: <databaseName>, to: <newPrimaryShard> } )
Requires downtime for live databases
Scaling Sitecore – Secondary Reads
You can configure secondary reads from the driver (secondary or secondaryPreferred):
connectionString="mongodb://_mongo_server_01_:_port_number_/_session_database_name_?readPreference=secondary" />
MongoDB 3.4 introduced maxStalenessSeconds to control stale reads. It specifies, in seconds, how stale a secondary can be before the client stops using it for read operations.
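A connection string with both options can be assembled as follows. This is a sketch: `withSecondaryReads` and the host name are hypothetical, while readPreference and maxStalenessSeconds are the standard URI options, and drivers reject maxStalenessSeconds values below 90.

```javascript
const MIN_MAX_STALENESS_SECONDS = 90; // smallest value the drivers accept

// Append secondary-read options to a MongoDB connection string.
function withSecondaryReads(baseUri, mode, maxStalenessSeconds) {
  if (!["secondary", "secondaryPreferred"].includes(mode)) {
    throw new Error("mode must be secondary or secondaryPreferred");
  }
  const options = [`readPreference=${mode}`];
  if (maxStalenessSeconds !== undefined) {
    if (maxStalenessSeconds < MIN_MAX_STALENESS_SECONDS) {
      throw new Error("maxStalenessSeconds must be at least 90");
    }
    options.push(`maxStalenessSeconds=${maxStalenessSeconds}`);
  }
  // Use "&" if the URI already carries options, "?" otherwise.
  const separator = baseUri.includes("?") ? "&" : "?";
  return baseUri + separator + options.join("&");
}

console.log(
  withSecondaryReads("mongodb://mongo01.example.com:27017/sessions", "secondaryPreferred", 120)
);
// mongodb://mongo01.example.com:27017/sessions?readPreference=secondaryPreferred&maxStalenessSeconds=120
```

When the result is pasted into an XML configuration file, each `&` has to be written as `&amp;`.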
Scaling Sitecore – Secondary Reads
Use ReplicaSet Tags to target reads:
- Direct reads to specific replica set nodes
- Reduces availability
conf = rs.conf();
conf.members[0].tags = {"db": "analytics"}
rs.reconfig(conf)
Set readPreferenceTags on the connection string:
connectionString="mongodb://_mongo_server_01_:_port_number_/_session_database_name_?readPreferenceTags=db:analytics" />
Order matters when setting multiple tags
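Why order matters can be shown with a simplified model of tag-based selection: tag sets are tried in the order given, and the first one that matches at least one member decides which nodes are eligible. The member list, host names, and helper functions below are hypothetical.

```javascript
// Hypothetical replica set members with their tags.
const members = [
  { host: "mongo01", tags: { db: "analytics", region: "us" } },
  { host: "mongo02", tags: { db: "sessions", region: "us" } },
  { host: "mongo03", tags: { region: "eu" } },
];

// A member matches a tag set if it carries every tag in the set.
function matches(member, tagSet) {
  return Object.entries(tagSet).every(([key, value]) => member.tags[key] === value);
}

// Try the tag sets in order; the first one with any matching member wins.
function selectByTagSets(candidates, tagSets) {
  for (const tagSet of tagSets) {
    const eligible = candidates.filter((m) => matches(m, tagSet));
    if (eligible.length > 0) return eligible.map((m) => m.host);
  }
  return []; // no member matches: the read cannot be served
}

console.log(selectByTagSets(members, [{ db: "analytics" }, { region: "eu" }])); // [ 'mongo01' ]
console.log(selectByTagSets(members, [{ region: "eu" }, { db: "analytics" }])); // [ 'mongo03' ]
console.log(selectByTagSets(members, [{ db: "reporting" }]));                   // []
```

The last call illustrates the availability cost: if the only tagged node is down or no node carries the tag, the read has nowhere to go unless a fallback tag set is listed.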
Scaling Sitecore – Multi Region
Challenges:
- Direct reads to the closest node
- Direct writes to the closest node
- Single database entity for reporting
- Minimum complexity
Scaling Sitecore – Multi Region
Replica Set:
- Target reads using the nearest read preference
- Target reads using region based tags
- Writes must go to the Primary
- Requires at least one secondary per region
Scaling Sitecore – Multi Region
Sharded cluster:
- Target reads using the nearest read preference
- Target reads using region based tags
- Requires at least one secondary per region
- Writes must go to the Primaries
- Tags or Zones are based on shard key ranges
- Add the location to the shard key as a prefix (requires changing the source code)
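How a location prefix lets zones pin data to a region can be sketched with a simplified range lookup. All names are hypothetical: the `region` field, the zone names, and the ranges are invented, and only the prefix of a compound shard key such as { region: 1, _id: 1 } is compared.

```javascript
// Hypothetical zone ranges on the location prefix of the shard key.
// Each range is inclusive at min and exclusive at max.
const zoneRanges = [
  { zone: "EU", min: { region: "eu" }, max: { region: "ev" } },
  { zone: "US", min: { region: "us" }, max: { region: "ut" } },
];

// Find the zone whose range contains the document's shard key prefix.
function zoneFor(doc) {
  const range = zoneRanges.find(
    (r) => doc.region >= r.min.region && doc.region < r.max.region
  );
  return range ? range.zone : null;
}

console.log(zoneFor({ region: "eu", _id: "d4c9e0d5" })); // EU
console.log(zoneFor({ region: "us", _id: "a20f96ba" })); // US
console.log(zoneFor({ region: "ap", _id: "0f96ba58" })); // null: not covered by any zone
```

Without the prefix, a key like _id alone carries no location information, so there is no range that could map a document to the shards of one region.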
Scaling Sitecore – Multi Region
Mongo to Mongo connector:
- Creates a pipeline from one MongoDB cluster to another MongoDB cluster
- Reads and replicates oplog operations
- Easy deployment
mongo-connector -m <name:port> -t <name:port> -d <database>
Scaling Sitecore – Connector
oplog → oplog
db.foo.insert({a:1})
db.foo.insert({_id:1, a:1})
{ "ts" : Timestamp(), "h" : NumberLong(), "v" : 2, "op" : "i", "ns" : "foo.foo", "o" : { "_id" : 1, "a" : 1 } }
Scaling Sitecore – Multi Region
Mongo to Mongo Connector
Benchmarks
Benchmark 1: Single/Replica set MMAP vs Single shard/Replica set WiredTiger (3.2.8)
Results: WiredTiger is 9.5% faster
Benchmark 2: Sharded cluster MMAP vs Sharded cluster WiredTiger (Analytics sharded on {_id:1})
Results: WiredTiger is 9.4% faster
So what?
- Evaluate your MongoDB architecture to determine if it
would benefit from scaling
- If scaling is in order, consider this talk as a
reference
- Recognize how MongoDB’s versatility makes it
relevant to a wide variety of applications
What's next?
- Test MongoRocks (Percona Server) against Sitecore
- Test In-Memory (Percona Server) for sessions or
cache(s)
- Expand sharding recommendations on add-ons
- Evaluate other Sitecore modules for suitability with
MongoDB
- Re-invent our benchmarks
We’re Hiring!
Looking to join a dynamic & innovative team?
Justine is here at Percona Live 2017,
Reach out directly to our Recruiter at justine.marmolejo@rackspace.com
Questions?
Thank you!!!
antonios.giannopoulos@rackspace.co.uk
@iamantonios
🍍
grant.killian@rackspace.com
@sitecoreagent
