#mongodbdays




       Aggregation Framework
       Emily Stolfo
       Ruby Engineer/Evangelist, 10gen
       @EmStolfo




Tuesday, January 29, 13
Agenda
       • State of Aggregation
       • Pipeline
       • Usage and Limitations
       • Optimization
       • Sharding
       • (Expressions)
       • Looking Ahead



State of Aggregation




State of Aggregation
       • We're storing our data in MongoDB
       • We need to do ad-hoc reporting, grouping,
          common aggregations, etc.
       • What are we using for this?




Data Warehousing
       • SQL for reporting and analytics
       • Infrastructure complications
             – Additional maintenance
             – Data duplication
             – ETL processes
             – Real time?




MapReduce
       • Extremely versatile, powerful
       • Intended for complex data analysis
       • Overkill for simple aggregation tasks, such as
             – Averages
             – Summation
             – Grouping




MapReduce in MongoDB
       • Implemented with JavaScript
             – Single-threaded
             – Difficult to debug

       • Concurrency
             – Appearance of parallelism
             – Write locks




Aggregation Framework
       • Declared in JSON, executes in C++
       • Flexible, functional, and simple
             – Operation pipeline
             – Computational expressions

       • Works well with sharding




Enabling Developers
       • Doing more within MongoDB, faster
       • Refactoring MapReduce and groupings
             – Replace pages of JavaScript
             – Longer aggregation pipelines

       • Quick aggregations from the shell




Pipeline




Pipeline
       • Process a stream of documents
             – Original input is a collection
             – Final output is a result document

       • Series of operators
             – Filter or transform data
             – Input/output chain


                          ps ax | grep mongod | head -n 1
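The pipe analogy above can be sketched in plain JavaScript (runnable with Node.js, no MongoDB needed). This illustrates the idea only, not how the server executes pipelines: it assumes each stage is a function from an array of documents to a new array, and `runPipeline`, `onlyEnglish` and `titlesOnly` are hypothetical names.

```javascript
// Sample documents, trimmed from the book data used in these slides.
const books = [
  { title: "The Great Gatsby", pages: 218, language: "English" },
  { title: "War and Peace", pages: 1440, language: "Russian" },
  { title: "Atlas Shrugged", pages: 1088, language: "English" },
];

// Hypothetical runner: feed each stage's output into the next stage,
// the way `|` chains processes in a shell.
function runPipeline(docs, stages) {
  return stages.reduce((input, stage) => stage(input), docs);
}

// Two hypothetical stages: one filters documents, one reshapes them.
const onlyEnglish = (docs) => docs.filter((doc) => doc.language === "English");
const titlesOnly = (docs) => docs.map((doc) => ({ title: doc.title }));

const result = runPipeline(books, [onlyEnglish, titlesOnly]);
console.log(result.map((doc) => doc.title)); // [ 'The Great Gatsby', 'Atlas Shrugged' ]
```

Swapping the order of the two stages would fail to filter anything, since `titlesOnly` drops the `language` field; stage order matters in the same way it does for a shell pipe.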




Pipeline Operators

                • $match     • $sort
                • $project   • $limit
                • $group     • $skip
                • $unwind




Example book data
       {
           _id: 375,
           title: "The Great Gatsby",
           ISBN: "9781857150193",
           available: true,
           pages: 218,
           chapters: 9,
           subjects: [
              "Long Island",
              "New York",
              "1920s"
           ],
           language: "English"
       }




$match
       • Filter documents
       • Uses existing query syntax
       • (No geospatial operations or $where)




Matching Field Values
       {                                { $match: {
           title: "The Great Gatsby",     language: "Russian"
           pages: 218,                  }}
           language: "English"
       }

       {                                {
           title: "War and Peace",          title: "War and Peace",
           pages: 1440,                     pages: 1440,
           language: "Russian"              language: "Russian"
       }                                }

       {
           title: "Atlas Shrugged",
           pages: 1088,
           language: "English"
       }




Matching with Query Operators
       {                                { $match: {
           title: "The Great Gatsby",     pages: { $gt: 1000 }
           pages: 218,                  }}
           language: "English"
       }

       {                                {
           title: "War and Peace",          title: "War and Peace",
           pages: 1440,                     pages: 1440,
           language: "Russian"              language: "Russian"
       }                                }

       {                                {
           title: "Atlas Shrugged",         title: "Atlas Shrugged",
           pages: 1088,                     pages: 1088,
           language: "English"              language: "English"
       }                                }
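As a rough model of $match, the sketch below filters the sample books with a hypothetical `matches` helper that understands only plain equality and `$gt`; MongoDB's real query language covers far more operators than this.

```javascript
const books = [
  { title: "The Great Gatsby", pages: 218, language: "English" },
  { title: "War and Peace", pages: 1440, language: "Russian" },
  { title: "Atlas Shrugged", pages: 1088, language: "English" },
];

// Hypothetical matcher: every field in the query must match.
// Supports plain equality and the $gt operator only.
function matches(doc, query) {
  return Object.entries(query).every(([field, condition]) => {
    if (condition !== null && typeof condition === "object" && "$gt" in condition) {
      return doc[field] > condition.$gt;
    }
    return doc[field] === condition;
  });
}

// A $match stage keeps only the documents that satisfy the query.
const match = (query) => (docs) => docs.filter((doc) => matches(doc, query));

const russian = match({ language: "Russian" })(books);
const longBooks = match({ pages: { $gt: 1000 } })(books);
console.log(russian.map((b) => b.title));   // [ 'War and Peace' ]
console.log(longBooks.map((b) => b.title)); // [ 'War and Peace', 'Atlas Shrugged' ]
```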



$project
       • Reshape documents
       • Include, exclude or rename fields
       • Inject computed fields
       • Create sub-document fields




Including and Excluding Fields
      {                                 { $project: {
          _id: 375,                       _id: 0,
          title: "The Great Gatsby",      title: 1,
          ISBN: "9781857150193",          language: 1
          available: true,              }}
          pages: 218,
          subjects: [
             "Long Island",
             "New York",
             "1920s"                    {
          ],                                title: "The Great Gatsby",
          language: "English"               language: "English"
      }                                 }




Renaming and Computing Fields
       {                                 { $project: {
           _id: 375,                       avgChapterLength: {
           title: "The Great Gatsby",        $divide: ["$pages", "$chapters"]
           ISBN: "9781857150193",          },
           available: true,                lang: "$language"
           pages: 218,                   }}
           chapters: 9,
           subjects: [
              "Long Island",
              "New York",
              "1920s"                    {
           ],                                _id: 375,
           language: "English"               avgChapterLength: 24.2222,
       }                                     lang: "English"
                                         }
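To make the computed field concrete, here is a hypothetical hand-written equivalent of this $project stage in plain JavaScript. It assumes the field values shown on the slide; `projectChapterStats` is an illustrative name, not a MongoDB API.

```javascript
const gatsby = {
  _id: 375,
  title: "The Great Gatsby",
  pages: 218,
  chapters: 9,
  language: "English",
};

// Hypothetical equivalent of
// { $project: { avgChapterLength: { $divide: ["$pages", "$chapters"] }, lang: "$language" } }
// _id is copied because $project keeps it unless it is explicitly excluded.
function projectChapterStats(doc) {
  return {
    _id: doc._id,
    avgChapterLength: doc.pages / doc.chapters, // $divide
    lang: doc.language, // rename language -> lang
  };
}

const projected = projectChapterStats(gatsby);
console.log(projected.avgChapterLength.toFixed(4)); // 24.2222
```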




Creating Sub-Document Fields
       {                                 { $project: {
           _id: 375,                       title: 1,
           title: "The Great Gatsby",      stats: {
           ISBN: "9781857150193",            pages: "$pages",
           available: true,                  language: "$language"
           pages: 218,                     }
           subjects: [                   }}
              "Long Island",
              "New York",
              "1920s"                    {
           ],                                _id: 375,
           language: "English"               title: "The Great Gatsby",
       }                                     stats: {
                                               pages: 218,
                                               language: "English"
                                             }
                                         }
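A similar hypothetical sketch for building a sub-document: the new `stats` object is assembled from existing fields, and fields that are not listed (such as ISBN) are dropped. `projectWithStats` is an illustrative name.

```javascript
const gatsby = {
  _id: 375,
  title: "The Great Gatsby",
  ISBN: "9781857150193",
  pages: 218,
  language: "English",
};

// Hypothetical equivalent of
// { $project: { title: 1, stats: { pages: "$pages", language: "$language" } } }
function projectWithStats(doc) {
  return {
    _id: doc._id, // included by default
    title: doc.title, // title: 1
    stats: {
      // new embedded document built from existing top-level fields
      pages: doc.pages,
      language: doc.language,
    },
  };
}

const shaped = projectWithStats(gatsby);
console.log(JSON.stringify(shaped));
// {"_id":375,"title":"The Great Gatsby","stats":{"pages":218,"language":"English"}}
```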




$group
       • Group documents by an ID
             – Field reference, object, constant

       • Other output fields are computed
             – $max, $min, $avg, $sum
             – $addToSet, $push
             – $first, $last

       • Processes all data in memory




Calculating an Average
       {                                { $group: {
           title: "The Great Gatsby",     _id: "$language",
           pages: 218,                    avgPages: { $avg:
           language: "English"                    "$pages" }
       }                                }}

       {
           title: "War and Peace",
           pages: 1440,                 {
           language: "Russian"              _id: "Russian",
       }                                    avgPages: 1440
                                        }
       {
           title: "Atlas Shrugged",     {
           pages: 1088,                     _id: "English",
           language: "English"              avgPages: 653
       }                                }
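The $avg accumulator can be modeled with a running sum and count per group. This in-memory sketch (hypothetical `groupAveragePages`) reproduces the averages on the slide. MongoDB does not guarantee the order of $group output, so it is not significant that English comes first here and Russian first on the slide.

```javascript
const books = [
  { title: "The Great Gatsby", pages: 218, language: "English" },
  { title: "War and Peace", pages: 1440, language: "Russian" },
  { title: "Atlas Shrugged", pages: 1088, language: "English" },
];

// Hypothetical in-memory equivalent of
// { $group: { _id: "$language", avgPages: { $avg: "$pages" } } }
// Keeps a running sum and count per group, then divides once at the end.
function groupAveragePages(docs) {
  const totals = new Map(); // language -> { sum, count }
  for (const doc of docs) {
    const entry = totals.get(doc.language) ?? { sum: 0, count: 0 };
    entry.sum += doc.pages;
    entry.count += 1;
    totals.set(doc.language, entry);
  }
  return [...totals].map(([language, { sum, count }]) => ({
    _id: language,
    avgPages: sum / count,
  }));
}

const averages = groupAveragePages(books);
console.log(JSON.stringify(averages));
// [{"_id":"English","avgPages":653},{"_id":"Russian","avgPages":1440}]
```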



Summing Fields and Counting
       {                                { $group: {
           title: "The Great Gatsby",     _id: "$language",
           pages: 218,                    numTitles: { $sum: 1 },
           language: "English"            sumPages: { $sum: "$pages" }
       }                                }}

       {                                {
           title: "War and Peace",          _id: "Russian",
           pages: 1440,                     numTitles: 1,
           language: "Russian"              sumPages: 1440
       }                                }

       {                                {
           title: "Atlas Shrugged",         _id: "English",
           pages: 1088,                     numTitles: 2,
           language: "English"              sumPages: 1306
       }                                }
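Generalizing, each $group output field is a fold over the documents in one group. The sketch below assumes a hypothetical `group(docs, keyOf, accumulators)` helper and a `sum` accumulator; `$sum: 1` becomes an expression that always contributes 1, which is why it counts documents.

```javascript
const books = [
  { title: "The Great Gatsby", pages: 218, language: "English" },
  { title: "War and Peace", pages: 1440, language: "Russian" },
  { title: "Atlas Shrugged", pages: 1088, language: "English" },
];

// Hypothetical generic $group: keyOf computes each document's group _id, and
// every accumulator is an { init, step } pair folded over that group's documents.
function group(docs, keyOf, accumulators) {
  const groups = new Map(); // group _id -> output document
  for (const doc of docs) {
    const key = keyOf(doc);
    if (!groups.has(key)) {
      const fresh = { _id: key };
      for (const [name, acc] of Object.entries(accumulators)) {
        fresh[name] = acc.init;
      }
      groups.set(key, fresh);
    }
    const out = groups.get(key);
    for (const [name, acc] of Object.entries(accumulators)) {
      out[name] = acc.step(out[name], doc);
    }
  }
  return [...groups.values()];
}

// $sum over an expression: $sum: 1 counts, $sum: "$pages" adds up a field.
const sum = (expr) => ({ init: 0, step: (total, doc) => total + expr(doc) });

const totals = group(books, (doc) => doc.language, {
  numTitles: sum(() => 1),
  sumPages: sum((doc) => doc.pages),
});
console.log(JSON.stringify(totals));
// [{"_id":"English","numTitles":2,"sumPages":1306},{"_id":"Russian","numTitles":1,"sumPages":1440}]
```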




Collecting Distinct Values
       {                                { $group: {
           title: "The Great Gatsby",     _id: "$language",
           pages: 218,                    titles: { $addToSet: "$title" }
           language: "English"          }}
       }

       {                                {
           title: "War and Peace",          _id: "Russian",
           pages: 1440,                     titles: [ "War and Peace" ]
           language: "Russian"          }
       }

       {                                {
           title: "Atlas Shrugged",         _id: "English",
           pages: 1088,                     titles: [
           language: "English"                "Atlas Shrugged",
       }                                      "The Great Gatsby"
                                            ]
                                        }
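$addToSet behaves like a set: a repeated value is kept once. In this hypothetical sketch a JavaScript `Set` plays that role, and the fourth sample document is a made-up duplicate added only to show the de-duplication. A JavaScript `Set` preserves insertion order, whereas MongoDB makes no ordering promise for $addToSet, so the order of titles in real results may differ from both this sketch and the slide.

```javascript
const books = [
  { title: "The Great Gatsby", language: "English" },
  { title: "War and Peace", language: "Russian" },
  { title: "Atlas Shrugged", language: "English" },
  { title: "The Great Gatsby", language: "English" }, // hypothetical duplicate
];

// Hypothetical equivalent of
// { $group: { _id: "$language", titles: { $addToSet: "$title" } } }
function groupDistinctTitles(docs) {
  const groups = new Map(); // language -> Set of titles
  for (const doc of docs) {
    if (!groups.has(doc.language)) groups.set(doc.language, new Set());
    groups.get(doc.language).add(doc.title); // a Set ignores repeats
  }
  return [...groups].map(([language, titles]) => ({ _id: language, titles: [...titles] }));
}

const distinct = groupDistinctTitles(books);
console.log(JSON.stringify(distinct));
// [{"_id":"English","titles":["The Great Gatsby","Atlas Shrugged"]},{"_id":"Russian","titles":["War and Peace"]}]
```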




$unwind
       • Applied to an array field
       • Yield new documents for each array element
             – Array replaced by element value
             – Missing/empty fields → no output
             – Non-array fields → error

       • Pipe to $group to aggregate array values




Yielding Multiple Documents from One
       {                                { $unwind: "$subjects" }
           title: "The Great Gatsby",
           ISBN: "9781857150193",       {
           subjects: [                      title: "The Great Gatsby",
             "Long Island",                 ISBN: "9781857150193",
             "New York",                    subjects: "Long Island"
             "1920s"                    }
           ]
       }                                {
                                            title: "The Great Gatsby",
                                            ISBN: "9781857150193",
                                            subjects: "New York"
                                        }

                                        {
                                            title: "The Great Gatsby",
                                            ISBN: "9781857150193",
                                            subjects: "1920s"
                                        }
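The $unwind rules listed earlier translate directly into a small hypothetical `unwind` function. It follows the slide's description for the server version of the time; later MongoDB releases (3.2 onward) treat a non-array value as a one-element array instead of raising an error, so check the documentation for the version you run.

```javascript
// Hypothetical $unwind following the rules on the $unwind slide:
// one output document per array element, nothing for a missing or empty
// array, and an error for a non-array value.
function unwind(docs, field) {
  return docs.flatMap((doc) => {
    const value = doc[field];
    if (value === undefined || (Array.isArray(value) && value.length === 0)) {
      return [];
    }
    if (!Array.isArray(value)) {
      throw new TypeError(`$unwind: field "${field}" is not an array`);
    }
    // Copy the document once per element, replacing the array with that element.
    return value.map((element) => ({ ...doc, [field]: element }));
  });
}

const gatsby = {
  title: "The Great Gatsby",
  ISBN: "9781857150193",
  subjects: ["Long Island", "New York", "1920s"],
};

const unwound = unwind([gatsby], "subjects");
console.log(unwound.map((doc) => doc.subjects)); // [ 'Long Island', 'New York', '1920s' ]
```

Feeding `unwound` into a grouping step keyed on `subjects` is how the slide's "pipe to $group" advice would count books per subject.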



$sort, $limit, $skip
       • Sort documents by one or more fields
             – Same order syntax as cursors
             – Blocks until the preceding stage has passed on all of its documents
             – Sorts in memory unless it comes early in the pipeline and can use an index

       • Limit and skip follow cursor behavior




Sort All the Documents in the Pipeline

       { title: "The Great Gatsby" }    { $sort: { title: 1 }}
       { title: "Brave New World" }
       { title: "Grapes of Wrath" }     { title: "Animal Farm" }
       { title: "Animal Farm" }         { title: "Brave New World" }
       { title: "Lord of the Flies" }   { title: "Fahrenheit 451" }
       { title: "Fathers and Sons" }    { title: "Fathers and Sons" }
       { title: "Invisible Man" }       { title: "Grapes of Wrath" }
       { title: "Fahrenheit 451" }      { title: "Invisible Man" }
                                        { title: "Lord of the Flies" }
                                        { title: "The Great Gatsby" }




Limit Documents Through the Pipeline

       { title: "The Great Gatsby" }    { $limit: 5 }
       { title: "Brave New World" }
       { title: "Grapes of Wrath" }     { title: "The Great Gatsby" }
       { title: "Animal Farm" }         { title: "Brave New World" }
       { title: "Lord of the Flies" }   { title: "Grapes of Wrath" }
       { title: "Fathers and Sons" }    { title: "Animal Farm" }
       { title: "Invisible Man" }       { title: "Lord of the Flies" }
       { title: "Fahrenheit 451" }




Skip Over Documents in the Pipeline

       { title: "The Great Gatsby" }    { $skip: 5 }
       { title: "Brave New World" }
       { title: "Grapes of Wrath" }
       { title: "Animal Farm" }         { title: "Fathers and Sons" }
       { title: "Lord of the Flies" }   { title: "Invisible Man" }
       { title: "Fathers and Sons" }    { title: "Fahrenheit 451" }
       { title: "Invisible Man" }
       { title: "Fahrenheit 451" }
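$sort, $limit and $skip map onto ordinary array operations. In this hypothetical sketch each stage returns a new array; `sortBy` handles only ascending order and compares strings by UTF-16 code units, which is adequate for these ASCII titles but is not MongoDB's full BSON sort order.

```javascript
const titles = [
  "The Great Gatsby", "Brave New World", "Grapes of Wrath", "Animal Farm",
  "Lord of the Flies", "Fathers and Sons", "Invisible Man", "Fahrenheit 451",
];
const docs = titles.map((title) => ({ title }));

// Hypothetical stage functions; each returns a new array and leaves its input untouched.
// sortBy models { $sort: { <field>: 1 } } (ascending only).
const sortBy = (field) => (input) =>
  [...input].sort((a, b) => (a[field] < b[field] ? -1 : a[field] > b[field] ? 1 : 0));
const limit = (n) => (input) => input.slice(0, n); // { $limit: n }
const skip = (n) => (input) => input.slice(n); // { $skip: n }

console.log(sortBy("title")(docs).map((d) => d.title));
console.log(limit(5)(docs).map((d) => d.title));
console.log(skip(5)(docs).map((d) => d.title)); // [ 'Fathers and Sons', 'Invisible Man', 'Fahrenheit 451' ]
```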




Usage and Limitations




Usage
       • collection.aggregate() method
             – Mongo shell
             – Most drivers

       • aggregate database command




Collection
         db.books.aggregate([
           { $project: { language: 1 }},
           { $group: { _id: "$language", numTitles: { $sum: 1 }}}
         ])




         {
             result: [
                { _id: "Russian", numTitles: 1 },
                { _id: "English", numTitles: 2 }
             ],
             ok: 1
         }




Database Command
         db.runCommand({
           aggregate: "books",
           pipeline: [
             { $project: { language: 1 }},
             { $group: { _id: "$language", numTitles: { $sum: 1 }}}
           ]
         })



         {
             result: [
                { _id: "Russian", numTitles: 1 },
                { _id: "English", numTitles: 2 }
             ],
             ok: 1
         }



Limitations
       • Result limited by the maximum BSON document size (16 MB)
             – Final command result
             – Intermediate shard results

       • Pipeline operator memory limits
       • Some BSON types unsupported
             – Binary, Code, deprecated types




Sharding




Sharding
       • Split the pipeline at first $group or $sort
             – Shards execute pipeline up to that point
             – mongos merges results and continues

       • An early $match on the shard key can exclude shards from the query
       • Merging on mongos adds CPU and memory load there




Sharding
       [
           {   $match: { /* filter by shard key */ }},
           {   $project: { /* select fields  */ }},
           {   $group: { /* group by some field */ }},
           {   $sort: { /* sort by some field */ }},
           {   $project: { /* reshape result   */ }}
       ]
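The split rule from the previous slide can be expressed as a small function over this pipeline. `splitPipeline` is hypothetical and encodes only the slide's simplified rule; the real division of work between shards and mongos differs between server versions.

```javascript
// Hypothetical helper following the simplified rule on the Sharding slide:
// stages before the first $group or $sort run on each shard, the rest on mongos.
function splitPipeline(pipeline) {
  const splitAt = pipeline.findIndex((stage) => "$group" in stage || "$sort" in stage);
  if (splitAt === -1) {
    return { shardStages: pipeline, mongosStages: [] };
  }
  return {
    shardStages: pipeline.slice(0, splitAt),
    mongosStages: pipeline.slice(splitAt),
  };
}

// The stage shapes from the slide, with their bodies left empty.
const pipeline = [
  { $match: {} },
  { $project: {} },
  { $group: {} },
  { $sort: {} },
  { $project: {} },
];

const { shardStages, mongosStages } = splitPipeline(pipeline);
console.log(shardStages.map((s) => Object.keys(s)[0])); // [ '$match', '$project' ]
console.log(mongosStages.map((s) => Object.keys(s)[0])); // [ '$group', '$sort', '$project' ]
```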




Aggregation in a sharded cluster

Expressions




Expressions
       • Return computed values
       • Used with $project and $group
       • Reference fields using $ (e.g. "$x")
       • Expressions may be nested




Boolean Operators
       • Input array of one or more values
             – $and, $or
             – Short-circuit logic

       • Invert values with $not
       • Evaluation of non-boolean types
             – null, undefined, zero ▶ false
             – Non-zero, strings, dates, objects ▶ true

                           { $and: [true, false] } ▶ false
                           { $or: ["foo", 0] } ▶ true
                           { $not: null       } ▶ true
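The coercion rules above can be written as a small hypothetical `toBool` function, with `and`, `or` and `not` as stand-ins for $and, $or and $not. `Array.prototype.every` and `some` stop at the first deciding element, which mirrors the short-circuit behavior.

```javascript
// Hypothetical model of the coercion rules on this slide.
function toBool(value) {
  if (value === null || value === undefined) return false;
  if (typeof value === "boolean") return value;
  if (typeof value === "number") return value !== 0;
  return true; // strings, dates and objects are all true
}

const and = (args) => args.every(toBool); // stops at the first false value
const or = (args) => args.some(toBool); // stops at the first true value
const not = (value) => !toBool(value);

console.log(and([true, false])); // false
console.log(or(["foo", 0])); // true
console.log(not(null)); // true
```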




Comparison Operators
       • Compare numbers, strings, and dates
       • Input array with two operands
             – $cmp, $eq, $ne
             – $gt, $gte, $lt, $lte


                          {   $cmp: [3, 4]       } ▶ -1
                          {   $eq: ["foo", "bar"] } ▶ false
                          {   $ne: ["foo", "bar"] } ▶ true
                          {   $gt: [9, 7]      } ▶ true
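A hypothetical sketch of the comparison operators: everything is derived from `cmp`, so Date objects also compare by timestamp. It assumes both operands have the same type; MongoDB's ordering across different BSON types is not modeled.

```javascript
// Hypothetical two-operand comparisons; $cmp returns -1, 0 or 1,
// and the other operators are booleans derived from it.
const cmp = ([a, b]) => (a < b ? -1 : a > b ? 1 : 0);
const eq = (args) => cmp(args) === 0;
const ne = (args) => cmp(args) !== 0;
const gt = (args) => cmp(args) > 0;
const gte = (args) => cmp(args) >= 0;
const lt = (args) => cmp(args) < 0;
const lte = (args) => cmp(args) <= 0;

console.log(cmp([3, 4])); // -1
console.log(eq(["foo", "bar"])); // false
console.log(ne(["foo", "bar"])); // true
console.log(gt([9, 7])); // true
```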




Arithmetic Operators
       • Input array of one or more numbers
             – $add, $multiply

       • Input array of two operands
             – $subtract, $divide, $mod


                          {   $add:    [1, 2, 3] } ▶ 6
                          {   $multiply: [2, 2, 2] } ▶ 8
                          {   $subtract: [10, 7] } ▶ 3
                          {   $divide: [10, 2] } ▶ 5
                          {   $mod:     [8, 3] } ▶ 2




String Operators
       • $strcasecmp case-insensitive comparison
             – $cmp is case-sensitive

       • $toLower and $toUpper case change
       • $substr for sub-string extraction
       • Not encoding aware (assumes ASCII alphabet)

                          {   $strcasecmp: ["foo", "bar"] } ▶ 1
                          {   $substr: ["foo", 1, 2]      } ▶ "oo"
                          {   $toUpper: "foo"             } ▶ "FOO"
                          {   $toLower: "BAR"             } ▶ "bar"
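The string operators can be approximated in JavaScript. These hypothetical helpers agree with the slide for ASCII input only: JavaScript's case conversion is Unicode-aware, while the slide notes that the server's operators are not. The arguments of `substr` are a start index and a length, not an end index.

```javascript
// Hypothetical case-insensitive comparison returning -1, 0 or 1.
const strcasecmp = ([a, b]) => {
  const x = a.toUpperCase();
  const y = b.toUpperCase();
  return x < y ? -1 : x > y ? 1 : 0;
};
// $substr: [string, start, length]
const substr = ([s, start, length]) => s.slice(start, start + length);
const toUpper = (s) => s.toUpperCase();
const toLower = (s) => s.toLowerCase();

console.log(strcasecmp(["foo", "bar"])); // 1
console.log(substr(["foo", 1, 2])); // oo
console.log(toUpper("foo")); // FOO
console.log(toLower("BAR")); // bar
```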




Date Operators
       • Extract values from date objects
             – $dayOfYear, $dayOfMonth, $dayOfWeek
             – $year, $month, $week
             – $hour, $minute, $second


               {   $year:   ISODate("2012-10-24T00:00:00.000Z") } ▶ 2012
               {   $month:    ISODate("2012-10-24T00:00:00.000Z") } ▶ 10
               {   $dayOfMonth: ISODate("2012-10-24T00:00:00.000Z") } ▶ 24
               {   $dayOfWeek: ISODate("2012-10-24T00:00:00.000Z") } ▶ 4
               {   $dayOfYear: ISODate("2012-10-24T00:00:00.000Z") } ▶ 298
               {   $week:    ISODate("2012-10-24T00:00:00.000Z") } ▶ 43
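The date values can be cross-checked with JavaScript's UTC getters, since the server's date operators of this era work in UTC. The offsets (1-based months, Sunday = 1 for $dayOfWeek) and the week rule (weeks start on Sunday; days before the year's first Sunday are week 0, like strftime's %U) are noted in the comments. For this date the sketch computes a day of year of 298.

```javascript
const date = new Date("2012-10-24T00:00:00.000Z");
const MS_PER_DAY = 24 * 60 * 60 * 1000;

const year = date.getUTCFullYear(); // $year
const month = date.getUTCMonth() + 1; // $month: JS months are 0-11, $month is 1-12
const dayOfMonth = date.getUTCDate(); // $dayOfMonth
const dayOfWeek = date.getUTCDay() + 1; // $dayOfWeek: JS uses 0 = Sunday, $dayOfWeek uses 1 = Sunday
const dayOfYear =
  Math.floor((date.getTime() - Date.UTC(year, 0, 1)) / MS_PER_DAY) + 1; // $dayOfYear: 1-366
// $week: weeks start on Sunday; days before the year's first Sunday are week 0.
const week = Math.floor((dayOfYear - 1 + 7 - date.getUTCDay()) / 7);

console.log({ year, month, dayOfMonth, dayOfWeek, dayOfYear, week });
```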




Conditional Operators
       • $cond ternary operator
       • $ifNull



                          { $cond: [{ $eq: [1, 2] }, "same", "different"] } ▶ "different"

                          { $ifNull: ["foo", "bar"] } ▶ "foo"
                          { $ifNull: [null, "bar"] } ▶ "bar"
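Finally, hypothetical JavaScript equivalents of $cond and $ifNull. `cond` expects its test to already be a boolean, and `ifNull` treats a missing value (undefined) the same as null; `eq` is a simplified stand-in for the inner $eq.

```javascript
// $cond: [test, valueIfTrue, valueIfFalse]
const cond = ([test, ifTrue, ifFalse]) => (test ? ifTrue : ifFalse);
// $ifNull: [value, replacement]; a missing field (undefined) counts as null too.
const ifNull = ([value, replacement]) =>
  value === null || value === undefined ? replacement : value;

const eq = ([a, b]) => a === b; // simplified stand-in for { $eq: [a, b] }

console.log(cond([eq([1, 2]), "same", "different"])); // different
console.log(ifNull(["foo", "bar"])); // foo
console.log(ifNull([null, "bar"])); // bar
```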




Looking Ahead




Framework Use Cases
       • Basic aggregation queries
       • Ad-hoc reporting
       • Real-time analytics
       • Visualizing time series data




Extending the Framework
       • Adding new pipeline operators, expressions
       • $out and $tee for output control
             – https://jira.mongodb.org/browse/SERVER-3253




Future Enhancements
       • Automatically move $match earlier if possible
       • Pipeline explain facility
       • Memory usage improvements
             – Grouping input sorted by _id
             – Sorting with limited output








       Thank You
       Emily Stolfo
       Ruby Engineer/Evangelist, 10gen
       @EmStolfo




Tuesday, January 29, 13

More Related Content

PPTX
ETL for Pros: Getting Data Into MongoDB
PDF
Aggregation Framework MongoDB Days Munich
PPTX
Aggregation Framework
PDF
De normalised london aggregation framework overview
PPTX
Building Your First App with MongoDB
PPTX
Building Your First App with MongoDB
PDF
Building a MongoDB App with Perl
PPTX
Building Your First App with MongoDB
ETL for Pros: Getting Data Into MongoDB
Aggregation Framework MongoDB Days Munich
Aggregation Framework
De normalised london aggregation framework overview
Building Your First App with MongoDB
Building Your First App with MongoDB
Building a MongoDB App with Perl
Building Your First App with MongoDB

Similar to Aggregation Framework (7)

PPTX
Building Your First App: An Introduction to MongoDB
PPTX
Building Your First App: An Introduction to MongoDB
PDF
Persisting dynamic data with mongodb and mongomapper
PDF
Schema Design
PPTX
Introduction to MongoDB and Hadoop
PPTX
Introduction to Underscore.js
PDF
Mongo db
Building Your First App: An Introduction to MongoDB
Building Your First App: An Introduction to MongoDB
Persisting dynamic data with mongodb and mongomapper
Schema Design
Introduction to MongoDB and Hadoop
Introduction to Underscore.js
Mongo db
Ad

More from MongoDB (20)

PDF
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
PDF
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
PDF
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
PDF
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
PDF
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
PDF
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
PDF
MongoDB SoCal 2020: MongoDB Atlas Jump Start
PDF
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
PDF
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
PDF
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
PDF
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
PDF
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
PDF
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
PDF
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
PDF
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
PDF
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
PDF
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
PDF
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
PDF
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
PDF
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
Ad

Aggregation Framework

  • 1. #mongodbdays Aggregation Framework Emily Stolfo Ruby Engineer/Evangelist, 10gen @EmStolfo Tuesday, January 29, 13
  • 2. Agenda • State of Aggregation • Pipeline • Usage and Limitations • Optimization • Sharding • (Expressions) • Looking Ahead Tuesday, January 29, 13
  • 4. State of Aggregation • We're storing our data in MongoDB • We need to do ad-hoc reporting, grouping, common aggregations, etc. • What are we using for this? Tuesday, January 29, 13
  • 6. Data Warehousing • SQL for reporting and analytics • Infrastructure complications – Additional maintenance – Data duplication – ETL processes – Real time? Tuesday, January 29, 13
  • 8. MapReduce • Extremely versatile, powerful • Intended for complex data analysis • Overkill for simple aggregation tasks, such as – Averages – Summation – Grouping Tuesday, January 29, 13
  • 9. MapReduce in MongoDB • Implemented with JavaScript – Single-threaded – Difficult to debug • Concurrency – Appearance of parallelism – Write locks Tuesday, January 29, 13
  • 11. Aggregation Framework • Declared in JSON, executes in C++ • Flexible, functional, and simple – Operation pipeline – Computational expressions • Works well with sharding Tuesday, January 29, 13
  • 12. Enabling Developers • Doing more within MongoDB, faster • Refactoring MapReduce and groupings – Replace pages of JavaScript – Longer aggregation pipelines • Quick aggregations from the shell Tuesday, January 29, 13
  • 14. Pipeline • Process a stream of documents – Original input is a collection – Final output is a result document • Series of operators – Filter or transform data – Input/output chain ps ax | grep mongod | head -n 1 Tuesday, January 29, 13
  • 15. Pipeline Operators • $match • $sort • $project • $limit • $group • $skip • $unwind Tuesday, January 29, 13
  • 16. Example book data { _id: 375, title: "The Great Gatsby", ISBN: "9781857150193", available: true, pages: 218, chapters: 9, subjects: [ "Long Island", "New York", "1920s" ], language: "English" } Tuesday, January 29, 13
  • 17. $match • Filter documents • Uses existing query syntax • (No geospatial operations or $where) Tuesday, January 29, 13
  • 18. Matching Field Values { { $match: { title: "The Great Gatsby", language: "Russian" pages: 218, }} language: "English" } { title: "War and Peace", { pages: 1440, title: "War and Peace", language: "Russian" pages: 1440, } language: "Russian" } { title: "Atlas Shrugged", pages: 1088, language: "English" } Tuesday, January 29, 13
  • 19. Matching with Query Operators { { $match: { title: "The Great Gatsby", pages: { $gt: 1000 } pages: 218, }} language: "English" } { { title: "War and Peace", title: "War and Peace", pages: 1440, pages: 1440, language: "Russian" language: "Russian" } } { { title: "Atlas Shrugged", title: "Atlas Shrugged", pages: 1088, pages: 1088, language: "English" language: "English" } } Tuesday, January 29, 13
  • 20. $project • Reshape documents • Include, exclude or rename fields • Inject computed fields • Create sub-document fields Tuesday, January 29, 13
  • 21. Including and Excluding Fields { { $project: { _id: 375, _id: 0, title: "Great Gatsby", title: 1, ISBN: "9781857150193", language: 1 available: true, }} pages: 218, subjects: [ "Long Island", "New York", "1920s" { ], title: " Great Gatsby", language: "English" language: "English" } } Tuesday, January 29, 13
  • 22. Renaming and Computing Fields { { $project: { _id: 375, avgChapterLength: { title: "Great Gatsby", $divide: ["$pages", ISBN: "9781857150193", "$chapters"] available: true, }, pages: 218, lang: "$language" chapters: 9, }} subjects: [ "Long Island", "New York", "1920s" { ], _id: 375, language: "English" avgChapterLength: 24.2222 , } lang: "English" } Tuesday, January 29, 13
  • 23. Creating Sub-Document Fields { $project: { { title: 1, _id: 375, stats: { title: "Great Gatsby", pages: "$pages", ISBN: "9781857150193", language: "$language", available: true, } pages: 218, }} subjects: [ "Long Island", "New York", "1920s" { ], _id: 375, language: "English" title: " Great Gatsby", } stats: { pages: 218, language: "English" } Tuesday, January 29, 13
  • 24. $group • Group documents by an ID – Field reference, object, constant • Other output fields are computed – $max, $min, $avg, $sum – $addToSet, $push – $first, $last • Processes all data in memory Tuesday, January 29, 13
  • 25. Calculating an Average { { $group: { title: "The Great Gatsby", _id: "$language", pages: 218, avgPages: { $avg: language: "English" "$pages" } } }} { title: "War and Peace", pages: 1440, { language: "Russian" _id: "Russian", } avgPages: 1440 } { title: "Atlas Shrugged", { pages: 1088, _id: "English", language: "English" avgPages: 653 } } Tuesday, January 29, 13
  • 26. Summating Fields and Counting { { $group: { title: "The Great Gatsby", _id: "$language", pages: 218, numTitles: { $sum: 1 }, language: "English" sumPages: { $sum: "$pages" } }} } { title: "War and Peace", { pages: 1440, _id: "Russian", language: "Russian” numTitles: 1, } sumPages: 1440 } { { title: "Atlas Shrugged", _id: "English", pages: 1088, numTitles: 2, language: "English" sumPages: 1306 } } Tuesday, January 29, 13
  • 27. Collecting Distinct Values { { $group: { title: "The Great Gatsby", _id: "$language", pages: 218, titles: { $addToSet: "$title" } language: "English" }} } { { title: "War and Peace", _id: "Russian", titles: [ "War and Peace" ] pages: 1440, } language: "Russian" } { _id: "English", { titles: [ title: "Atlas Shrugged", "Atlas Shrugged", pages: 1088, "The Great Gatsby" language: "English" ] } } Tuesday, January 29, 13
  • 28. $unwind • Applied to an array field • Yield new documents for each array element – Array replaced by element value – Missing/empty fields → no output – Non-array fields → error • Pipe to $group to aggregate array values Tuesday, January 29, 13
  • 29. Yielding Multiple Documents from One { { $unwind: "$subjects" } title: "The Great Gatsby", ISBN: "9781857150193", { subjects: [ title: "The Great Gatsby", "Long Island", ISBN: "9781857150193", "New York", subjects: "Long Island" "1920s" } ] } { title: "The Great Gatsby", ISBN: "9781857150193", subjects: "New York" } { title: "The Great Gatsby", ISBN: "9781857150193", subjects: "1920s" } Tuesday, January 29, 13
  • 30. $sort, $limit, $skip • Sort documents by one or more fields – Same order syntax as cursors – Waits for earlier pipeline operator to return – In-memory unless early and indexed • Limit and skip follow cursor behavior Tuesday, January 29, 13
  • 31. Sort All the Documents in the Pipeline { title: "The Great Gatsby" } { $sort: { title: 1 }} { title: "Brave New World" } { title: "Grapes of Wrath" } { title: "Animal Farm" } { title: "Animal Farm" } { title: "Brave New World" } { title: "Lord of the Flies" } { title: "Fahrenheit 451" } { title: "Fathers and Sons" } { title: "Fathers and Sons" } { title: "Invisible Man" } { title: "Grapes of Wrath" } { title: "Fahrenheit 451" } { title: "Invisible Man" } { title: "Lord of the Flies" } { title: "The Great Gatsby" } Tuesday, January 29, 13
  • 32. Limit Documents Through the Pipeline { title: "The Great Gatsby" } { $limit: 5 } { title: "Brave New World" } { title: "Grapes of Wrath" } { title: "The Great Gatsby" } { title: "Animal Farm" } { title: "Brave New World" } { title: "Lord of the Flies" } { title: "Grapes of Wrath" } { title: "Fathers and Sons" } { title: "Animal Farm" } { title: "Invisible Man" } { title: "Lord of the Flies" } { title: "Fahrenheit 451" } Tuesday, January 29, 13
  • 33. Skip Over Documents in the Pipeline { title: "The Great Gatsby" } { $skip: 5 } { title: "Brave New World" } { title: "Grapes of Wrath" } { title: "Animal Farm" } { title: "Fathers and Sons" } { title: "Lord of the Flies" } { title: "Invisible Man" } { title: "Fathers and Sons" } { title: "Fahrenheit 451" } { title: "Invisible Man" } { title: "Fahrenheit 451" } Tuesday, January 29, 13
  • 35. Usage • collection.aggregate() method – Mongo shell – Most drivers • aggregate database command Tuesday, January 29, 13
  • 36. Collection
       db.books.aggregate([
         { $project: { language: 1 }},
         { $group: { _id: "$language", numTitles: { $sum: 1 }}}
       ])
       ▶ { result: [
             { _id: "Russian", numTitles: 1 },
             { _id: "English", numTitles: 2 }
           ],
           ok: 1 }
  • 37. Database Command
       db.runCommand({
         aggregate: "books",
         pipeline: [
           { $project: { language: 1 }},
           { $group: { _id: "$language", numTitles: { $sum: 1 }}}
         ]
       })
       ▶ { result: [
             { _id: "Russian", numTitles: 1 },
             { _id: "English", numTitles: 2 }
           ],
           ok: 1 }
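Both calls run the same two-stage pipeline. The sketch below mimics its effect on an in-memory array so the shape of the result is easy to see. The three sample books and their languages are hypothetical because the slides do not show the `books` collection. The `project` and `groupCount` helpers are illustrative stand-ins, not driver or server APIs. The server does not guarantee the order of `$group` output, so the first-seen order produced here may differ from the slide.

```javascript
// Hypothetical sample data chosen to reproduce the counts on the slide.
const books = [
  { _id: 1, title: "The Great Gatsby", language: "English" },
  { _id: 2, title: "Fathers and Sons", language: "Russian" },
  { _id: 3, title: "Brave New World", language: "English" },
];

// Approximates { $project: { language: 1 } }: keep _id (included by default)
// plus every field whose spec value is truthy.
function project(docs, spec) {
  return docs.map((doc) => {
    const out = { _id: doc._id };
    for (const [field, include] of Object.entries(spec)) {
      if (include && field in doc) out[field] = doc[field];
    }
    return out;
  });
}

// Approximates { $group: { _id: "$<key>", <countField>: { $sum: 1 } } }.
function groupCount(docs, keyPath, countField) {
  const key = keyPath.slice(1); // "$language" -> "language"
  const counts = new Map();
  for (const doc of docs) {
    const groupKey = doc[key] ?? null; // missing keys group under null
    counts.set(groupKey, (counts.get(groupKey) ?? 0) + 1);
  }
  return Array.from(counts, ([id, n]) => ({ _id: id, [countField]: n }));
}

// Wrapped the same way as the shell output on the slides.
const output = {
  result: groupCount(project(books, { language: 1 }), "$language", "numTitles"),
  ok: 1,
};
console.log(JSON.stringify(output));
```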
  • 38. Limitations
       • Result limited by BSON document size (16 MB)
             – Final command result
             – Intermediate shard results
       • Pipeline operator memory limits
       • Some BSON types unsupported
             – Binary, Code, deprecated types
  • 40. Sharding
       • Split the pipeline at the first $group or $sort
             – Shards execute the pipeline up to that point
             – mongos merges the results and continues
       • An early $match on the shard key may let mongos skip shards that cannot hold matching documents
       • CPU and memory implications for mongos
  • 41. Sharding
       [
         { $match: { /* filter by shard key */ }},
         { $project: { /* select fields */ }},
         { $group: { /* group by some field */ }},
         { $sort: { /* sort by some field */ }},
         { $project: { /* reshape result */ }}
       ]
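The split described on slide 40 can be sketched as a function that cuts a pipeline at its first `$group` or `$sort`. The helper names are hypothetical, and the stage bodies are left empty as on the slide. The sketch simplifies deliberately. It shows only where the cut falls and how per-shard partial `$sum` counts can be combined on mongos. It does not show how the server actually rewrites each stage.

```javascript
// Hypothetical helper: everything before the first $group or $sort runs on each shard;
// that stage and everything after it run on mongos.
function splitPipeline(pipeline) {
  const cut = pipeline.findIndex((stage) => "$group" in stage || "$sort" in stage);
  if (cut === -1) return { shardPart: [...pipeline], mergerPart: [] };
  return { shardPart: pipeline.slice(0, cut), mergerPart: pipeline.slice(cut) };
}

// The pipeline from the slide, with empty stage bodies as placeholders.
const pipeline = [
  { $match: {} },   // filter by shard key
  { $project: {} }, // select fields
  { $group: {} },   // group by some field
  { $sort: {} },    // sort by some field
  { $project: {} }, // reshape result
];

const { shardPart, mergerPart } = splitPipeline(pipeline);
console.log(shardPart.map((s) => Object.keys(s)[0]));  // stages run on shards
console.log(mergerPart.map((s) => Object.keys(s)[0])); // stages run on mongos

// Merging step for a counting $group: partial counts from each shard are summed.
function mergePartialCounts(shardResults, countField) {
  const totals = new Map();
  for (const groups of shardResults) {
    for (const group of groups) {
      totals.set(group._id, (totals.get(group._id) ?? 0) + group[countField]);
    }
  }
  return Array.from(totals, ([id, n]) => ({ _id: id, [countField]: n }));
}

// Hypothetical per-shard results.
const merged = mergePartialCounts(
  [
    [{ _id: "English", numTitles: 1 }, { _id: "Russian", numTitles: 1 }], // shard A
    [{ _id: "English", numTitles: 1 }],                                  // shard B
  ],
  "numTitles",
);
console.log(JSON.stringify(merged));
```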
  • 42. Aggregation in a sharded cluster
  • 44. Expressions
       • Return computed values
       • Used with $project and $group
       • Reference fields using $ (e.g. "$x")
       • Expressions may be nested
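A toy evaluator shows how field references and nesting fit together. A string starting with `$` is looked up in the current document. An operator's operands are evaluated before the operator itself, so any operand may be another expression. Only `$add` and `$multiply` are wired up here. The `evaluate` function, the operator table, and the `order` document are hypothetical, and dotted paths such as `"$a.b"` are not handled.

```javascript
// Hypothetical operator table; a real engine has many more operators.
const OPERATORS = {
  $add: (args) => args.reduce((sum, x) => sum + x, 0),
  $multiply: (args) => args.reduce((product, x) => product * x, 1),
};

function evaluate(expr, doc) {
  // "$x" refers to field x of the current document; other strings are literals.
  if (typeof expr === "string" && expr.startsWith("$")) return doc[expr.slice(1)];
  // Arrays hold operator operands: evaluate each one.
  if (Array.isArray(expr)) return expr.map((e) => evaluate(e, doc));
  // An object such as { $add: [...] } is an operator application.
  if (expr !== null && typeof expr === "object") {
    const [op] = Object.keys(expr);
    const apply = Object.prototype.hasOwnProperty.call(OPERATORS, op) ? OPERATORS[op] : undefined;
    if (!apply) throw new Error(`unsupported operator: ${op}`);
    const operands = evaluate(expr[op], doc); // inner expressions first
    return apply(Array.isArray(operands) ? operands : [operands]);
  }
  return expr; // numbers, booleans, null
}

// Hypothetical document and a nested expression: price * qty + shipping
const order = { item: "notebook", price: 10, qty: 3, shipping: 5 };
const total = evaluate({ $add: [{ $multiply: ["$price", "$qty"] }, "$shipping"] }, order);
console.log(total); // 35
```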
  • 45. Boolean Operators
       • Input array of one or more values
             – $and, $or
             – Short-circuit logic
       • Invert values with $not
       • Evaluation of non-boolean types
             – null, undefined, zero ▶ false
             – Non-zero, strings, dates, objects ▶ true
       { $and: [true, false] } ▶ false
       { $or: ["foo", 0] } ▶ true
       { $not: null } ▶ true
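The truthiness rules and short-circuiting on this slide can be sketched as follows. Operands are passed as zero-argument functions so that short-circuiting is observable: `$or` stops calling operands once one is true, and `$and` stops once one is false. The helper names are hypothetical. The truthiness table follows the slide: null, undefined, zero, and false are false, and everything else is true, including strings, dates, and objects. This differs from JavaScript's own rule for the empty string.

```javascript
// Truthiness as listed on the slide: null, undefined, zero (and false) are false;
// non-zero numbers, strings (even ""), dates, and objects are true.
function toBool(value) {
  return !(value === null || value === undefined || value === 0 || value === false);
}

// Operands are thunks so that unevaluated operands can be skipped.
function andExpr(operands) {
  for (const operand of operands) {
    if (!toBool(operand())) return false; // stop at the first false operand
  }
  return true;
}

function orExpr(operands) {
  for (const operand of operands) {
    if (toBool(operand())) return true; // stop at the first true operand
  }
  return false;
}

function notExpr(value) {
  return !toBool(value);
}

const lit = (value) => () => value; // wrap a literal as a thunk

console.log(andExpr([lit(true), lit(false)])); // false
console.log(orExpr([lit("foo"), lit(0)]));     // true
console.log(notExpr(null));                    // true
```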
  • 46. Comparison Operators
       • Compare numbers, strings, and dates
       • Input array with two operands
             – $cmp, $eq, $ne
             – $gt, $gte, $lt, $lte
       { $cmp: [3, 4] } ▶ -1
       { $eq: ["foo", "bar"] } ▶ false
       { $ne: ["foo", "bar"] } ▶ true
       { $gt: [9, 7] } ▶ true
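Every comparison operator on this slide can be derived from one three-way comparison, which is what `$cmp` returns. The sketch below assumes both operands have the same type (numbers, strings, or Date objects). For mixed types the server compares by BSON type order, which this sketch does not model. Function names are hypothetical.

```javascript
// Three-way comparison for two values of the same type; Dates compare by timestamp.
function cmp(a, b) {
  const x = a instanceof Date ? a.getTime() : a;
  const y = b instanceof Date ? b.getTime() : b;
  if (x < y) return -1;
  if (x > y) return 1;
  return 0;
}

// Each operator is a predicate on the three-way result.
const COMPARISONS = {
  $cmp: (c) => c,
  $eq: (c) => c === 0,
  $ne: (c) => c !== 0,
  $gt: (c) => c > 0,
  $gte: (c) => c >= 0,
  $lt: (c) => c < 0,
  $lte: (c) => c <= 0,
};

function compare(op, operands) {
  if (!Object.prototype.hasOwnProperty.call(COMPARISONS, op)) {
    throw new Error(`unknown operator: ${op}`);
  }
  if (!Array.isArray(operands) || operands.length !== 2) {
    throw new Error(`${op} takes exactly two operands`);
  }
  return COMPARISONS[op](cmp(operands[0], operands[1]));
}

console.log(compare("$cmp", [3, 4]));        // -1
console.log(compare("$eq", ["foo", "bar"])); // false
console.log(compare("$ne", ["foo", "bar"])); // true
console.log(compare("$gt", [9, 7]));         // true
```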
  • 47. Arithmetic Operators
       • Input array of one or more numbers
             – $add, $multiply
       • Input array of two operands
             – $subtract, $divide, $mod
       { $add: [1, 2, 3] } ▶ 6
       { $multiply: [2, 2, 2] } ▶ 8
       { $subtract: [10, 7] } ▶ 3
       { $divide: [10, 2] } ▶ 5
       { $mod: [8, 3] } ▶ 2
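The arity rules split the arithmetic operators into two groups. `$add` and `$multiply` fold over one or more operands, while `$subtract`, `$divide`, and `$mod` need exactly two. Below is a sketch of that dispatch with hypothetical names. As its own design choice, the sketch also rejects a zero divisor instead of returning JavaScript's `Infinity` or `NaN`.

```javascript
// Variadic operators: one or more numeric operands.
const VARIADIC = {
  $add: (xs) => xs.reduce((sum, x) => sum + x, 0),
  $multiply: (xs) => xs.reduce((product, x) => product * x, 1),
};

// Binary operators: exactly two operands, order matters.
const BINARY = {
  $subtract: (a, b) => a - b,
  $divide: (a, b) => a / b,
  $mod: (a, b) => a % b,
};

function arithmetic(op, operands) {
  if (Object.prototype.hasOwnProperty.call(VARIADIC, op)) {
    if (operands.length < 1) throw new Error(`${op} needs at least one operand`);
    return VARIADIC[op](operands);
  }
  if (Object.prototype.hasOwnProperty.call(BINARY, op)) {
    if (operands.length !== 2) throw new Error(`${op} takes exactly two operands`);
    const [a, b] = operands;
    if ((op === "$divide" || op === "$mod") && b === 0) {
      throw new Error(`${op} by zero`); // this sketch's choice
    }
    return BINARY[op](a, b);
  }
  throw new Error(`unknown operator: ${op}`);
}

console.log(arithmetic("$add", [1, 2, 3]));      // 6
console.log(arithmetic("$multiply", [2, 2, 2])); // 8
console.log(arithmetic("$subtract", [10, 7]));   // 3
console.log(arithmetic("$divide", [10, 2]));     // 5
console.log(arithmetic("$mod", [8, 3]));         // 2
```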
  • 48. String Operators
       • $strcasecmp for case-insensitive comparison
             – $cmp is case-sensitive
       • $toLower and $toUpper for case changes
       • $substr for substring extraction
       • Not encoding aware (assumes the ASCII alphabet)
       { $strcasecmp: ["foo", "bar"] } ▶ 1
       { $substr: ["foo", 1, 2] } ▶ "oo"
       { $toUpper: "foo" } ▶ "FOO"
       { $toLower: "BAR" } ▶ "bar"
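Because these operators are not encoding aware, a faithful sketch maps case only within the ASCII letters and leaves every other character alone. JavaScript's own `toUpperCase` is Unicode-aware and would not behave this way. The helpers are hypothetical. `$strcasecmp` is approximated by folding both strings to upper case before comparing, and `substr` takes a start offset and a length, as on the slide. JavaScript indexes UTF-16 code units, which coincide with byte offsets only for ASCII text.

```javascript
// ASCII-only case mapping: only a-z / A-Z change; other characters pass through.
function asciiUpper(s) {
  return s.replace(/[a-z]/g, (c) => String.fromCharCode(c.charCodeAt(0) - 32));
}

function asciiLower(s) {
  return s.replace(/[A-Z]/g, (c) => String.fromCharCode(c.charCodeAt(0) + 32));
}

// Case-insensitive three-way comparison (-1, 0, 1).
function strcasecmp(a, b) {
  const x = asciiUpper(a);
  const y = asciiUpper(b);
  if (x < y) return -1;
  if (x > y) return 1;
  return 0;
}

// { $substr: [string, start, length] }
function substr(s, start, length) {
  return s.slice(start, start + length);
}

console.log(strcasecmp("foo", "bar"));   // 1
console.log(substr("foo", 1, 2));        // oo
console.log(asciiUpper("foo"));          // FOO
console.log(asciiLower("BAR"));          // bar
console.log(asciiUpper("caf\u00e9"));    // CAFé (the non-ASCII é is unchanged)
```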
  • 49. Date Operators
       • Extract values from date objects
             – $dayOfYear, $dayOfMonth, $dayOfWeek
             – $year, $month, $week
             – $hour, $minute, $second
       { $year: ISODate("2012-10-24T00:00:00.000Z") } ▶ 2012
       { $month: ISODate("2012-10-24T00:00:00.000Z") } ▶ 10
       { $dayOfMonth: ISODate("2012-10-24T00:00:00.000Z") } ▶ 24
       { $dayOfWeek: ISODate("2012-10-24T00:00:00.000Z") } ▶ 4
       { $dayOfYear: ISODate("2012-10-24T00:00:00.000Z") } ▶ 298
       { $week: ISODate("2012-10-24T00:00:00.000Z") } ▶ 43
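The date operators read fields of a date in UTC. Below is a sketch using JavaScript's `getUTC*` methods, written to match the numbering on the slide. Months run from 1 to 12, and `$dayOfWeek` runs from 1 (Sunday) to 7 (Saturday). `$week` counts Sunday-started weeks, with the days before the year's first Sunday in week 0, the same convention as `strftime`'s `%U`. The `datePart` helper is hypothetical.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// 1-based day of the year in UTC (1 = January 1).
function dayOfYear(d) {
  const startOfYear = Date.UTC(d.getUTCFullYear(), 0, 1);
  const startOfDay = Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate());
  return (startOfDay - startOfYear) / MS_PER_DAY + 1;
}

function datePart(op, d) {
  switch (op) {
    case "$year": return d.getUTCFullYear();
    case "$month": return d.getUTCMonth() + 1;   // JavaScript months are 0-based
    case "$dayOfMonth": return d.getUTCDate();
    case "$dayOfWeek": return d.getUTCDay() + 1; // 1 = Sunday ... 7 = Saturday
    case "$dayOfYear": return dayOfYear(d);
    case "$week":                                 // strftime %U convention
      return Math.floor((dayOfYear(d) - 1 + 7 - d.getUTCDay()) / 7);
    case "$hour": return d.getUTCHours();
    case "$minute": return d.getUTCMinutes();
    case "$second": return d.getUTCSeconds();
    default: throw new Error(`unknown date operator: ${op}`);
  }
}

const date = new Date("2012-10-24T00:00:00.000Z");
for (const op of ["$year", "$month", "$dayOfMonth", "$dayOfWeek", "$dayOfYear", "$week"]) {
  console.log(op, datePart(op, date));
}
```

For this date the sketch yields 2012, 10, 24, 4, 298, and 43. 2012 is a leap year, so the 274 days of January through September plus 24 give day 298.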
  • 50. Conditional Operators
       • $cond ternary operator
       • $ifNull for a fallback value when the first operand is null
       { $cond: [{ $eq: [1, 2] }, "same", "different"] } ▶ "different"
       { $ifNull: ["foo", "bar"] } ▶ "foo"
       { $ifNull: [null, "bar"] } ▶ "bar"
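`$cond` picks one of two values based on a condition, and `$ifNull` substitutes a fallback when its first operand is null. Below is a sketch with hypothetical helpers that reuses the truthiness rules from the Boolean operators slide. In this sketch `$ifNull` treats a missing field (`undefined` here) like null, and the plain comparison `1 === 2` stands in for `{ $eq: [1, 2] }`.

```javascript
// Truthiness per the Boolean operators slide: null, undefined, zero, false are false.
function toBool(value) {
  return !(value === null || value === undefined || value === 0 || value === false);
}

// { $cond: [condition, valueIfTrue, valueIfFalse] }
function cond([condition, ifTrue, ifFalse]) {
  return toBool(condition) ? ifTrue : ifFalse;
}

// { $ifNull: [value, replacement] }
function ifNull([value, replacement]) {
  return value === null || value === undefined ? replacement : value;
}

console.log(cond([1 === 2, "same", "different"])); // different
console.log(ifNull(["foo", "bar"]));               // foo
console.log(ifNull([null, "bar"]));                // bar

// Hypothetical use on a document: default a missing field.
const book = { title: "Animal Farm" };
console.log(ifNull([book.subtitle, "(none)"]));    // (none)
```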
  • 52. Framework Use Cases
       • Basic aggregation queries
       • Ad-hoc reporting
       • Real-time analytics
       • Visualizing time series data
  • 53. Extending the Framework
       • Adding new pipeline operators, expressions
       • $out and $tee for output control
             – https://jira.mongodb.org/browse/SERVER-3253
  • 54. Future Enhancements
       • Automatically move $match earlier if possible
       • Pipeline explain facility
       • Memory usage improvements
             – Grouping input sorted by _id
             – Sorting with limited output
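"Sorting with limited output" most likely refers to a `$sort` whose output is cut by a following `$limit`. One common way to exploit that combination is to keep only the best k documents seen so far, so memory grows with k instead of with the input size. This sketch shows that approach as an illustration, not as the planned server change. The `topK` helper is hypothetical. Insertion into a sorted array costs O(n·k) time, and a binary heap would be the usual choice for large k.

```javascript
// Keep only the k smallest documents under `compare`, in sorted order.
// Memory is O(k); each insertion shifts at most k elements.
function topK(docs, k, compare) {
  const kept = [];
  if (k <= 0) return kept;
  for (const doc of docs) {
    // Once full, skip anything that would not beat the current worst entry.
    if (kept.length === k && compare(doc, kept[k - 1]) >= 0) continue;
    // Find the insertion point, placing doc after any equal entries (stable).
    let i = kept.length;
    while (i > 0 && compare(doc, kept[i - 1]) < 0) i -= 1;
    kept.splice(i, 0, doc);
    if (kept.length > k) kept.pop(); // drop the entry that fell out of the top k
  }
  return kept;
}

const byTitle = (a, b) => (a.title < b.title ? -1 : a.title > b.title ? 1 : 0);

const titles = [
  "The Great Gatsby", "Brave New World", "Grapes of Wrath", "Animal Farm",
  "Lord of the Flies", "Fathers and Sons", "Invisible Man", "Fahrenheit 451",
].map((title) => ({ title }));

// Equivalent in effect to [{ $sort: { title: 1 } }, { $limit: 3 }]
console.log(topK(titles, 3, byTitle).map((d) => d.title));
```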
  • 55. Thank You
       #mongodbdays
       Emily Stolfo
       Ruby Engineer/Evangelist, 10gen
       @EmStolfo