MongoDB Indexing and Query Optimizer Details Antoine Girbal Mongo FR March 23, 2011
What will we cover? Many details of how indexing and the query optimizer work
A full understanding of these details is not required to use mongo, but this knowledge can be helpful when making optimizations.
We’ll discuss functionality of Mongo 1.8 (for our purposes pretty similar to 1.6 and almost identical to 1.7 edge).
Much of the material will be presented through examples.
Diagrams are to aid understanding – some details will be left out.
Btree (conceptual diagram: index keys 1 through 9; an index key references a document such as {_id:4,x:6})
Find One Document db.c.find( {x:6} ).limit( 1 )
Index {x:1}
Find One Document (diagram: keys 1 through 9; the search for key 6 reaches {_id:4,x:6})
Find One Document
> db.c.find( {x:6} ).limit( 1 ).explain()
{ "cursor" : "BtreeCursor x_1", "nscanned" : 1, "nscannedObjects" : 1, "n" : 1, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 6, 6 ] ] } }
Uses a btree cursor to find the object. Index ranges are around a single value.
Find One Document (diagram: keys 1 2 3 4 5 6 6 6 9; the search for key 6 reaches {_id:4,x:6})
Now we have duplicate x values.
Equality Match db.c.find( {x:6} )
Index {x:1}
Several documents to be returned
Equality Match (diagram: keys 1 2 3 4 5 6 6 6 9; the search for key 6 reaches {_id:4,x:6}, {_id:5,x:6} and {_id:1,x:6})
Equality Match
> db.c.find( {x:6} ).explain()
{ "cursor" : "BtreeCursor x_1", "nscanned" : 3, "nscannedObjects" : 3, "n" : 3, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 6, 6 ] ] } }
Full Document Matcher db.c.find( {x:6,y:1} )
Index {x:1}
Object content needs to be checked
Full Document Matcher (diagram: keys 1 2 3 4 5 6 6 6 9; the search for key 6 reaches {y:4,x:6}, {y:5,x:6} and {y:1,x:6})
Full Document Matcher
> db.c.find( {x:6,y:1} ).explain()
{ "cursor" : "BtreeCursor x_1", "nscanned" : 3, "nscannedObjects" : 3, "n" : 1, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 6, 6 ] ] } }
Documents for all matching index keys are scanned, but only one document matched on the non-indexed keys.
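The counters in this explain output can be reproduced with a small in-memory model. This is only a sketch: the documents, the sorted array standing in for the btree, and the `scanWithMatcher` helper are all hypothetical and are not MongoDB code.

```javascript
// Hypothetical in-memory model of the Full Document Matcher slide; not MongoDB code.
const docs = [
  { _id: 1, x: 6, y: 4 },
  { _id: 2, x: 6, y: 5 },
  { _id: 3, x: 6, y: 1 },
  { _id: 4, x: 2, y: 1 },
  { _id: 5, x: 9, y: 1 },
];

// Index {x:1}: entries sorted by key, each pointing at a document.
const indexX = docs
  .map((doc) => ({ key: doc.x, doc }))
  .sort((a, b) => a.key - b.key);

// Scan keys in [low, high], then apply the remaining criteria to each document.
function scanWithMatcher(index, low, high, matches) {
  let nscanned = 0;
  const results = [];
  for (const entry of index) {
    if (entry.key < low) continue; // a real btree would seek directly to `low`
    if (entry.key > high) break; // past the upper bound: stop scanning
    nscanned += 1;
    if (matches(entry.doc)) results.push(entry.doc);
  }
  return { nscanned, n: results.length, results };
}

const explainLike = scanWithMatcher(indexX, 6, 6, (doc) => doc.y === 1);
console.log(explainLike.nscanned, explainLike.n); // 3 1
```

The index narrows the scan to the three x:6 keys, and the matcher then rejects two of the three documents on y.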
Range Match db.c.find( {x:{$gte:4,$lte:7}} )
Index {x:1}
Range Match (diagram: keys 1 through 9; scan of keys with 4 <= key <= 7)
Range Match
> db.c.find( {x:{$gte:4,$lte:7}} ).explain()
{ "cursor" : "BtreeCursor x_1", "nscanned" : 4, "nscannedObjects" : 4, "n" : 4, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 4, 7 ] ] } }
Exclusive Range Match db.c.find( {x:{$gt:4,$lt:7}} )
Index {x:1}
The index range is the same as for the inclusive range match, but the boundary keys are neither scanned nor returned.
Multikeys db.c.find( {x:{$gt:7}} )
Index {x:1}
Documents contain arrays with several values, such as [8,9].
Multikeys (diagram: keys 1 through 9; scan of keys > 7; the document shown is {_id:4,x:[8,9]})
Multikeys
> db.c.find( {x:{$gt:7}} ).explain()
{ "cursor" : "BtreeCursor x_1", "nscanned" : 2, "nscannedObjects" : 2, "n" : 1, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : true, "indexOnly" : false, "indexBounds" : { "x" : [ [ 7, 1.7976931348623157e+308 ] ] } }
All keys in the valid range are scanned, but the matcher rejects duplicate documents, making n == 1.
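A rough model of why nscanned is 2 while n is 1 for the multikey query: each array element gets its own index entry, and the scan returns each document once. The collection and the helpers `buildMultikeyIndex` and `findGreaterThan` are hypothetical illustrations, not MongoDB internals.

```javascript
// Hypothetical model of a multikey index; not MongoDB internals.
const docs = [
  { _id: 1, x: [1, 2] },
  { _id: 4, x: [8, 9] },
  { _id: 7, x: 5 },
];

// One index entry per array element, so a document can appear several times.
function buildMultikeyIndex(collection) {
  const entries = [];
  for (const doc of collection) {
    const keys = Array.isArray(doc.x) ? doc.x : [doc.x];
    for (const key of keys) entries.push({ key, id: doc._id });
  }
  return entries.sort((a, b) => a.key - b.key);
}

// Scan all keys greater than `bound`, returning each document at most once.
function findGreaterThan(index, bound) {
  let nscanned = 0;
  const seen = new Set();
  for (const entry of index) {
    if (entry.key <= bound) continue;
    nscanned += 1;
    seen.add(entry.id); // a second key for an already-seen document adds nothing
  }
  return { nscanned, n: seen.size, ids: [...seen] };
}

const result = findGreaterThan(buildMultikeyIndex(docs), 7);
console.log(result.nscanned, result.n); // 2 1
```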
Range Types
Explicit inequality
db.c.find( {x:{$gt:4,$lt:7}} )
db.c.find( {x:{$gt:4}} )
db.c.find( {x:{$ne:4}} )
Regular expression prefix
db.c.find( {x:/^a/} )
Data type
db.c.find( {x:/a/} )
Range Types
db.c.find( {x:/^a/} )
"indexBounds" : { "x" : [ [ "a", "b" ], [ /^a/, /^a/ ] ] }
2 ranges of 2 different types are scanned: string and regex.
Range Types
db.c.find( {x:/a/} )
"indexBounds" : { "x" : [ [ "", { } ], [ /a/, /a/ ] ] }
Here the index only helps to restrict the type, which is not efficient in practice.
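The string range for a prefix regex can be derived by incrementing the last character of the prefix. The helper below is a minimal sketch under that assumption; `prefixBounds` is a hypothetical name, it only handles plain alphanumeric prefixes, and it uses `null` to stand for "end of the string type" where the slide shows `{ }`.

```javascript
// Hypothetical helper: derive string index bounds from a simple /^prefix/ regex.
// Assumes the prefix is plain alphanumeric characters with no flags; anything
// else falls back to the whole string range (the "restrict type only" case).
function prefixBounds(regex) {
  const match = /^\^([A-Za-z0-9]+)$/.exec(regex.source);
  if (!match || regex.flags !== "") {
    return { low: "", high: null }; // null stands for "end of the string type"
  }
  const prefix = match[1];
  const last = prefix.charCodeAt(prefix.length - 1);
  // Incrementing the last character gives a string above every "prefix..." value.
  const high = prefix.slice(0, -1) + String.fromCharCode(last + 1);
  return { low: prefix, high };
}

console.log(prefixBounds(/^a/)); // low "a", high "b"
console.log(prefixBounds(/a/)); // low "", high null
```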
Set Match db.c.find( {x:{$in:[3,6]}} )
Index {x:1}
Set Match (diagram: keys 1 through 9; lookups of keys 3 and 6)
Set Match
> db.c.find( {x:{$in:[3,6]}} ).explain()
{ "cursor" : "BtreeCursor x_1 multi", "nscanned" : 3, "nscannedObjects" : 2, "n" : 2, "millis" : 8, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 3, 3 ], [ 6, 6 ] ] }}
Why is nscanned 3? This is an algorithmic detail: when there are disjoint ranges for a key, nscanned may be higher than the number of matching keys.
All Match db.c.find( {x:{$all:[3,6]}} )
Index {x:1}
All Match (diagram: keys 1 through 9; lookup of key 3, which reaches {_id:4,x:[3,6]})
All Match
> db.c.find( {x:{$all:[3,6]}} ).explain()
{ "cursor" : "BtreeCursor x_1", "nscanned" : 1, "nscannedObjects" : 1, "n" : 1, "millis" : 0, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : true, "indexOnly" : false, "indexBounds" : { "x" : [ [ 3, 3 ] ] } }
The first entry in the $all match array is always used for index bounds. Note that this may not be the least numerous indexed value in the $all array.
Limit db.c.find( {x:{$lt:6},y:3} ).limit( 3 )
Index {x:1}
Limit (diagram: keys 1 through 9; scan of keys < 6; the scanned documents have y:3, y:1, y:3, y:3, y:3)
Limit
> db.c.find( {x:{$lt:6},y:3} ).limit( 3 ).explain()
{ "cursor" : "BtreeCursor x_1", "nscanned" : 4, "nscannedObjects" : 4, "n" : 3, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : true, "indexOnly" : false, "indexBounds" : { "x" : [ [ -1.7976931348623157e+308, 6 ] ] } }
Scan until three matches are found, then stop.
Skip db.c.find( {x:{$lt:6},y:3} ).skip( 3 )
Index {x:1}
Skip (diagram: keys 1 through 9; scan of keys < 6; the scanned documents have y:3, y:1, y:3, y:3, y:3)
Skip
> db.c.find( {x:{$lt:6},y:3} ).skip( 3 ).explain()
{ "cursor" : "BtreeCursor x_1", "nscanned" : 5, "nscannedObjects" : 5, "n" : 1, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : true, "indexOnly" : false, "indexBounds" : { "x" : [ [ -1.7976931348623157e+308, 6 ] ] } }
All skipped documents are scanned.
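The Limit and Skip counters can both be reproduced with one scan loop over data shaped like the diagram (y values 3, 1, 3, 3, 3 for x from 1 to 5). This is a sketch only; the array and `findWithSkipAndLimit` are hypothetical and not MongoDB code.

```javascript
// Hypothetical model of the Limit and Skip slides: index {x:1}, y checked by the matcher.
const indexX = [
  { x: 1, y: 3 }, { x: 2, y: 1 }, { x: 3, y: 3 }, { x: 4, y: 3 },
  { x: 5, y: 3 }, { x: 6, y: 3 }, { x: 7, y: 3 },
];

function findWithSkipAndLimit(index, upperX, y, skip, limit) {
  let nscanned = 0;
  let skipped = 0;
  const results = [];
  for (const doc of index) {
    if (doc.x >= upperX) break; // end of the index range x < upperX
    if (results.length === limit) break; // enough matches: stop scanning
    nscanned += 1;
    if (doc.y !== y) continue; // rejected by the matcher
    if (skipped < skip) {
      skipped += 1; // skipped documents still count as scanned
      continue;
    }
    results.push(doc);
  }
  return { nscanned, n: results.length };
}

const limited = findWithSkipAndLimit(indexX, 6, 3, 0, 3);
const skippedRun = findWithSkipAndLimit(indexX, 6, 3, 3, Infinity);
console.log(limited.nscanned, limited.n); // 4 3
console.log(skippedRun.nscanned, skippedRun.n); // 5 1
```

With a limit the scan stops early, while with a skip every skipped document still has to be scanned and matched.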
Sort db.c.find( {x:{$lt:6}} ).sort( {x:1} )
Index {x:1}
Sorting along index key uses index btree ordering
Sort (diagram: keys 1 through 9; scan of keys < 6; the scanned documents have y:3, y:1, y:3, y:3, y:3)
Sort
> db.c.find( {x:{$lt:6},y:3} ).sort( {x:1} ).explain()
{ "cursor" : "BtreeCursor x_1", "nscanned" : 5, "nscannedObjects" : 5, "n" : 4, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : true, "indexOnly" : false, "indexBounds" : { "x" : [ [ -1.7976931348623157e+308, 6 ] ] } }
Find uses the btree cursor to easily sort data.
Sort db.c.find( {x:{$lt:6}} ).sort( {y:1} )
Index {x:1}
Sorting on a non-indexed key requires a scan and order step (scanAndOrder)
Sort (diagram: keys 1 through 9; scan of keys < 6; the scanned documents have y:3, y:1, y:3, y:3, y:3)
Sort
> db.c.find( {x:{$lt:6},y:3} ).sort( {y:1} ).explain()
{ "cursor" : "BtreeCursor x_1", "nscanned" : 5, "nscannedObjects" : 5, "n" : 4, "scanAndOrder" : true, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : true, "indexOnly" : false, "indexBounds" : { "x" : [ [ -1.7976931348623157e+308, 6 ] ] } }
Results are sorted on the fly to match the requested order. The scanAndOrder field is only printed when its value is true.
Sort and scanAndOrder With “scanAndOrder” sort, all documents must be touched even if there is a limit spec.
With scanAndOrder, sorting is performed in memory and the memory footprint is constrained by the limit spec if present.
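One way to picture these two properties is a bounded buffer that keeps only the best `limit` documents while still visiting every candidate. This is a sketch of the general technique, not MongoDB's implementation; `scanAndOrder` here is a hypothetical function name.

```javascript
// Hypothetical sketch of an in-memory sort bounded by a limit; not MongoDB's implementation.
function scanAndOrder(docs, sortKey, limit) {
  let touched = 0;
  const best = []; // never holds more than `limit` documents after each step
  for (const doc of docs) {
    touched += 1; // every candidate must be visited before the order is known
    // Insert in sorted position, then drop the overflow beyond the limit.
    let i = best.length;
    while (i > 0 && best[i - 1][sortKey] > doc[sortKey]) i -= 1;
    best.splice(i, 0, doc);
    if (best.length > limit) best.pop();
  }
  return { touched, results: best };
}

const docs = [
  { x: 1, y: 9 }, { x: 2, y: 4 }, { x: 3, y: 7 }, { x: 4, y: 1 }, { x: 5, y: 5 },
];
const out = scanAndOrder(docs, "y", 2);
console.log(out.touched, out.results.map((d) => d.y)); // 5 [ 1, 4 ]
```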
Count
Count uses the same indexes but only scans the index, not the object data in storage.
With some operators the full document must be checked. Some of these cases:
$all
$size
Negation: $ne, $nin, $not, etc. With current semantics, all multikey elements must match negation constraints.
Multikey de-duplication works without loading the full document.
Covered Indexes
db.c.find( {x:6}, {x:1,_id:0} )
Index {x:1}
_id would be returned by default, but it isn't in the index, so we need to exclude it to return only indexed fields.
Covered Indexes
> db.c.find( {x:6}, {x:1,_id:0} ).explain()
{ "cursor" : "BtreeCursor x_1", "nscanned" : 1, "nscannedObjects" : 1, "n" : 1, "millis" : 0, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : true, "indexBounds" : { "x" : [ [ 6, 6 ] ] } }
indexOnly is true, and isMultiKey must be false. Currently we set isMultiKey to true the first time we save a doc where the field is a multikey array.
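The conditions on this slide can be written as a small predicate. This is a hypothetical check for illustration, not a MongoDB API; `isCoveredBy` and its parameters are assumptions, and it only handles flat field names with include-style projections.

```javascript
// Hypothetical check, not a MongoDB API: can a query be answered from index keys alone?
function isCoveredBy(indexSpec, filter, projection, isMultiKey) {
  if (isMultiKey) return false; // per the slide, indexOnly requires isMultiKey to be false
  const indexed = new Set(Object.keys(indexSpec));
  // Every field in the filter must be an index key.
  if (!Object.keys(filter).every((field) => indexed.has(field))) return false;
  // _id is returned unless the projection excludes it explicitly.
  const returnsId = projection._id !== 0;
  if (returnsId && !indexed.has("_id")) return false;
  // At least one field must be included, and every included field must be an index key.
  const included = Object.keys(projection).filter((field) => projection[field] === 1);
  if (included.length === 0) return false;
  return included.every((field) => indexed.has(field));
}

console.log(isCoveredBy({ x: 1 }, { x: 6 }, { x: 1, _id: 0 }, false)); // true
console.log(isCoveredBy({ x: 1 }, { x: 6 }, { x: 1 }, false)); // false
```

The second call is not covered because _id would be returned by default and is not part of the index.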
Two Equality Bounds
db.c.find( {x:5,y:'c'} )
Two Equality Bounds (diagram: compound keys 1 b, 3 d, 4 g, 5 c, 5 d, 5 f, 6 c, 7 a, 9 b; lookup of key 5 c)
Two Equality Bounds
> db.c.find( {x:5,y:'c'} ).explain()
{ "cursor" : "BtreeCursor x_1_y_1", "nscanned" : 1, "nscannedObjects" : 1, "n" : 1, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 5, 5 ] ], "y" : [ [ "c", "c" ] ]}}
2 ranges are applied to narrow down the data to scan.
Two Set Bounds
db.c.find( {x:{$in:[5,9]},y:{$in:['c','f']}} )
Two Set Bounds (diagram: compound keys 1 b, 3 d, 4 g, 5 c, 5 d, 5 f, 6 c, 7 a, 9 f; lookups of keys 5 c, 5 f, 9 c, 9 f)
Two Set Bounds
> db.c.find( {x:{$in:[5,9]},y:{$in:['c','f']}} ).explain()
{ "cursor" : "BtreeCursor x_1_y_1 multi", "nscanned" : 5, "nscannedObjects" : 3, "n" : 3, "millis" : 0, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, ... "indexBounds" : { "x" : [ [ 5, 5 ], [ 9, 9 ] ], "y" : [ [ "c", "c" ], [ "f", "f" ] ] } }
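The four lookup keys in the diagram are the cross product of the two $in sets. The sketch below expands them and checks each against the compound keys read off the diagram; `keyPoints` is a hypothetical helper, and the sketch does not attempt to reproduce the nscanned value of 5.

```javascript
// Hypothetical sketch: expand per-field $in sets into compound key points to look up.
const compare = (a, b) => (a < b ? -1 : a > b ? 1 : 0);

function keyPoints(xValues, yValues) {
  const points = [];
  for (const x of [...xValues].sort(compare)) {
    for (const y of [...yValues].sort(compare)) points.push([x, y]);
  }
  return points;
}

// Compound index keys as read from the diagram.
const indexKeys = [
  [1, "b"], [3, "d"], [4, "g"], [5, "c"], [5, "d"],
  [5, "f"], [6, "c"], [7, "a"], [9, "f"],
];

const points = keyPoints([5, 9], ["c", "f"]);
const hits = points.filter(([x, y]) =>
  indexKeys.some(([kx, ky]) => kx === x && ky === y)
);
console.log(points.length, hits.length); // 4 3
```

Three of the four candidate points exist in the index, which agrees with n being 3 in the explain output.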
Disjoint $or Criteria
db.c.find( {$or:[{x:5},{y:'d'}]} )
Runs 2 sequential finds, one for each clause
Must not return the same document twice, so each document is checked against the previous clauses
Disjoint $or Criteria (diagram: documents 1 b, 3 d, 4 g, 5 c, 5 d, 6 a, 7 e, 7 g, 9 f; one lookup for x = 5 and one lookup for y = d)
Disjoint $or Criteria
> db.c.find( {$or:[{x:5},{y:'d'}]} ).explain()
{ "clauses" : [ { "cursor" : "BtreeCursor x_1", "nscanned" : 2, "nscannedObjects" : 2, "n" : 2, "millis" : 0, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 5, 5 ] ] } }, { "cursor" : "BtreeCursor y_1", "nscanned" : 2, "nscannedObjects" : 2, "n" : 1, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "y" : [ [ "d", "d" ] ] } }], "nscanned" : 4, "nscannedObjects" : 4, "n" : 3, "millis" : 1}
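The per-clause counters can be reproduced with a small model in which each clause is an equality lookup and later clauses reject documents that an earlier clause already matched. The data is taken from the diagram; `runClause` and `findOr` are hypothetical helpers, not MongoDB code.

```javascript
// Hypothetical model of the $or slide; the documents are read off the diagram.
const docs = [
  { x: 1, y: "b" }, { x: 3, y: "d" }, { x: 4, y: "g" }, { x: 5, y: "c" }, { x: 5, y: "d" },
  { x: 6, y: "a" }, { x: 7, y: "e" }, { x: 7, y: "g" }, { x: 9, y: "f" },
];

// Run one clause as an equality lookup on `field`, rejecting documents that
// an earlier clause would already have returned.
function runClause(field, value, earlierClauses) {
  let nscanned = 0;
  const results = [];
  for (const doc of docs) {
    if (doc[field] !== value) continue; // an index on `field` would skip these keys
    nscanned += 1;
    const alreadyReturned = earlierClauses.some(([f, v]) => doc[f] === v);
    if (!alreadyReturned) results.push(doc);
  }
  return { nscanned, n: results.length };
}

function findOr(clauses) {
  const perClause = clauses.map(([field, value], i) =>
    runClause(field, value, clauses.slice(0, i))
  );
  return {
    clauses: perClause,
    nscanned: perClause.reduce((sum, c) => sum + c.nscanned, 0),
    n: perClause.reduce((sum, c) => sum + c.n, 0),
  };
}

const explainLike = findOr([["x", 5], ["y", "d"]]);
console.log(explainLike.nscanned, explainLike.n); // 4 3
```

The document with x:5 and y:'d' is scanned by both clauses but returned only by the first, so the second clause reports n of 1.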
Unindexed $or Clause
db.c.find( {$or:[{x:5},{y:'d'}]} )
Index {x:1} (no index on y)
Unindexed $or Clause
> db.c.find( {$or:[{x:5},{y:'d'}]} ).explain()
{ "cursor" : "BasicCursor", "nscanned" : 9, "nscannedObjects" : 9, "n" : 3, "millis" : 0, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { } }
Since y is not indexed, we must do a full collection scan to match y:'d'. Since a full scan is required, we don't use the index on x to match x:5.
Automatic Index Selection (Query Optimizer)
Optimal Index
find( {x:5} ) Index {x:1}
Index {x:1,y:1}
find( {x:5} ).sort( {y:1} ) Index {x:1,y:1}
find( {} ).sort( {x:1} ) Index {x:1}
find( {x:{$gt:1,$lt:7}} ).sort( {x:1} ) Index {x:1}
Optimal Index Rule of Thumb
No scanAndOrder
All fields with index-useful constraints are indexed
If there is a range or sort, it is the last field of the index used to resolve the query
If multiple optimal indexes exist, one is chosen arbitrarily.
Multiple Candidate Indexes
find( {x:4,y:'a'} ) Index {x:1} or {y:1}?
find( {x:4} ).sort( {y:1} ) Index {x:1} or {y:1}?
Note: {x:1,y:1} is optimal
find( {x:{$gt:2,$lt:7},y:{$gt:'a',$lt:'f'}} ) Index {x:1,y:1} or {y:1,x:1}?
Multiple Candidate Indexes
The only index selection criterion is nscanned
find( {x:4,y:'a'} ) Index {x:1} or {y:1}?
If fewer documents match {y:'a'} than {x:4}, then nscanned for {y:1} will be lower, so we pick {y:1}
find( {x:{$gt:2,$lt:7},y:{$gt:'b',$lt:'f'}} ) Index {x:1,y:1} or {y:1,x:1}?
If there are fewer distinct values of 2 < x < 7 than distinct values of 'b' < y < 'f', then {x:1,y:1} is chosen (rule of thumb)
Multiple Candidate Indexes
Using nscanned as the only index selection criterion is pretty good, but it doesn't cover every case, e.g.:
Overhead of using an index versus doing a collection scan
Cost of scanAndOrder vs ordered index
Cost of loading full document vs just index key
Cost of scanning adjacent btree keys vs non-adjacent keys/documents
Competing Indexes
At most one query plan per index
Plans are run in interleaved fashion
Plans are kept in a priority queue ordered by nscanned. We always continue progress on the plan with the lowest nscanned.
Competing Indexes
Plans run until one plan returns all results, or enough results to satisfy the initial query request (based on soft limit spec / data size requirement for initial query).
We only allow plans to compete in the initial query. In getMore, we continue reading from the index cursor established by the initial query.
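The interleaving can be sketched as a loop that always advances the plan with the lowest nscanned until one plan is exhausted or has produced enough matches. Everything here is illustrative: `racePlans` is a hypothetical function, each plan is modeled as an array of booleans (true means the scanned key yields a match), and a linear search stands in for the priority queue.

```javascript
// Hypothetical sketch of interleaved plan competition; not MongoDB's implementation.
// `plans` must be a non-empty array of { name, steps }, where steps[i] is true
// when the i-th scanned key produces a matching document.
function racePlans(plans, wanted) {
  const state = plans.map((plan) => ({
    name: plan.name,
    steps: plan.steps,
    nscanned: 0,
    matches: 0,
  }));
  for (;;) {
    // Pick the plan with the lowest nscanned (first one wins ties).
    const current = state.reduce((best, s) => (s.nscanned < best.nscanned ? s : best));
    if (current.nscanned < current.steps.length) {
      if (current.steps[current.nscanned]) current.matches += 1;
      current.nscanned += 1;
    }
    const exhausted = current.nscanned === current.steps.length;
    if (exhausted || current.matches >= wanted) return current;
  }
}

const winner = racePlans(
  [
    { name: "x_1", steps: [true, false, true, true] },
    { name: "y_1", steps: [false, false, false, true, false, true, true] },
  ],
  3
);
console.log(winner.name, winner.nscanned); // x_1 4
```

Both plans advance in lockstep, and the plan on x_1 wins because it collects three matches after scanning four keys.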
"Learning" a Query Plan
When an index is chosen for a query, the query's "pattern" and nscanned are recorded
find( {x:3,y:'c'} )
{Pattern: {x:'equality', y:'equality'}, Index: {x:1}, nscanned: 50}
{Pattern: {x:'gt bound', y:'lt bound'}, Index: {y:1}, nscanned: 500}
"Learning" a Query Plan
When a new query matches the same pattern, the same query plan is used
find( {x:5,y:'z'} ) Use index {x:1}
"Un-Learning" a Query Plan
100 writes to the collection
Indexes added / removed
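Learning and un-learning can be pictured as a cache keyed by query pattern that is cleared after 100 writes or any index change. This is a sketch under stated assumptions: `queryPattern` and `PlanCache` are hypothetical names, and the pattern encoding (operator names joined by commas) is a simplification of the slide's 'gt bound' style labels.

```javascript
// Hypothetical plan cache; the pattern encoding is a simplification for illustration.
function queryPattern(query) {
  const pattern = {};
  for (const [field, cond] of Object.entries(query)) {
    const isOperatorDoc =
      cond !== null &&
      typeof cond === "object" &&
      !Array.isArray(cond) &&
      Object.keys(cond).some((k) => k.startsWith("$"));
    pattern[field] = isOperatorDoc ? Object.keys(cond).sort().join(",") : "equality";
  }
  // Serialize with sorted field names so field order in the query does not matter.
  return JSON.stringify(pattern, Object.keys(pattern).sort());
}

class PlanCache {
  constructor(maxWrites = 100) {
    this.maxWrites = maxWrites;
    this.writes = 0;
    this.plans = new Map();
  }
  record(query, index, nscanned) {
    this.plans.set(queryPattern(query), { index, nscanned });
  }
  lookup(query) {
    return this.plans.get(queryPattern(query));
  }
  noteWrite() {
    this.writes += 1;
    if (this.writes >= this.maxWrites) this.clear(); // un-learn after enough writes
  }
  noteIndexChange() {
    this.clear(); // un-learn when indexes are added or removed
  }
  clear() {
    this.plans.clear();
    this.writes = 0;
  }
}

const cache = new PlanCache();
cache.record({ x: 3, y: "c" }, { x: 1 }, 50);
console.log(cache.lookup({ x: 5, y: "z" })); // same pattern: the recorded plan
console.log(cache.lookup({ x: { $gt: 5 }, y: "z" })); // different pattern: undefined
```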
Bad Plan Insurance
If nscanned for a new query using a recorded plan is much worse than the recorded nscanned for an earlier query with the same pattern, we start interleaving other plans with the current plan.
Thanks!
Feature Requests: jira.mongodb.org
Support: groups.google.com/group/mongodb-user