Understanding
Aggregation Framework
@ MongoDB 3.6
Matan Zohar
Disruptive Technologies Leader – Matrix
2
Why Aggregation Framework?
SELECT cust_id,ord_date,
SUM(price) AS total
FROM orders
GROUP BY cust_id,ord_date
HAVING total > 250
3
Our Standard Database Tools
GROUP BY HAVING
JOIN
WHERE
AVG
MIN MAX
SUM
COUNT SELECT
ORDER BY
4
We deserve Tools for Documents
$group
$abs $lookup $match $avg
$min
$max db.collection.aggregate()
$sum
db.collection.find() $sort
$mergeObjects
$count
$bucket
$limit
$project
$sample
$skip
$unwind
$eq
$divide
$cond
$exp
$concat
$log
$map
$reduce
$split $substr
$size $cmp $dateFromString $filter
5
What is the big deal?
Relational Model
• ARTICLE – Name, Publish date, URL, Text
• CATEGORY – Name, URL
• TAG – Name, URL
• COMMENT – Text, Date, Author
• USER – Name, Email
Document Model
• ARTICLE – Name, Publish date, URL, Text
• USER – Name, Email
• COMMENT [] – Text, Date, Author
• TAG [] – Name, URL
• CATEGORY [] – Name, URL
6
What is Aggregation Framework?
• A processing pipeline of stages: each stage transforms the documents it receives and passes them on, ending in an aggregated result.
• Built-in query optimization, designed to work on a sharded cluster.
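The pipeline idea can be pictured with a toy, in-memory model in plain JavaScript. This is only an analogy under stated assumptions: the sample orders and the helper names (`matchStage`, `groupStage`, `sortStage`, `runPipeline`) are made up for illustration, and it says nothing about how MongoDB executes a pipeline internally.

```javascript
// Toy model only: documents flow through an ordered list of stages.
// Sample data and all helper names are hypothetical.
const orders = [
  { cust_id: "A", price: 200 },
  { cust_id: "A", price: 100 },
  { cust_id: "B", price: 50 },
  { cust_id: "B", price: 80 },
];

// Each stage takes an array of documents and returns a new array.
const matchStage = (docs) => docs.filter((d) => d.price > 60);

const groupStage = (docs) => {
  const totals = new Map();
  for (const d of docs) {
    totals.set(d.cust_id, (totals.get(d.cust_id) || 0) + d.price);
  }
  return [...totals].map(([cust_id, total]) => ({ _id: cust_id, total }));
};

const sortStage = (docs) => [...docs].sort((a, b) => b.total - a.total);

// Feed the output of each stage into the next one.
const runPipeline = (docs, stages) =>
  stages.reduce((acc, stage) => stage(acc), docs);

const result = runPipeline(orders, [matchStage, groupStage, sortStage]);
console.log(result);
```

The order of the stages matters: filtering first means the grouping step sees fewer documents.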
7
Get to know the Tools (Stages)
• $project – Changes the document structure: selects fields, removes fields, adds newly computed fields (SELECT).
• $lookup – Joins two collections at a stage by an expression (JOIN / subquery).
• $match – Filters the documents according to a condition (WHERE / HAVING).
• $group – Groups the documents by an expression (GROUP BY).
• $sort – Sorts all input documents by the selected fields and order (ORDER BY).
• $unwind – Flattens an array field, emitting one document per array element.
• $count – Counts the documents that reach the current stage.
• $bucket – Categorizes documents into groups (buckets) according to an expression.
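As a rough sketch of how these stages combine, the two pipelines below run against a hypothetical `articles` collection shaped like the document model from the earlier slide. The collection name and every field name (`publishDate`, `tags`, `comments`, `name`) are assumptions, and `$size` assumes `comments` is always an array.

```javascript
// Hypothetical collection and field names; adapt to your own schema.

// Count articles per tag, keeping only tags used at least twice.
const tagCountsPipeline = [
  { $match: { publishDate: { $gte: new Date("2018-01-01") } } }, // WHERE
  { $unwind: "$tags" },                                          // one document per tag
  { $group: { _id: "$tags.name", articles: { $sum: 1 } } },      // GROUP BY
  { $match: { articles: { $gte: 2 } } },                         // HAVING
  { $sort: { articles: -1 } },                                   // ORDER BY
];

// Categorize articles by how many comments they have.
const commentBucketsPipeline = [
  { $project: { name: 1, commentCount: { $size: "$comments" } } },
  {
    $bucket: {
      groupBy: "$commentCount",
      boundaries: [0, 1, 10, 100],   // buckets: [0,1), [1,10), [10,100)
      default: "100+",               // anything outside the boundaries
      output: { articles: { $sum: 1 } },
    },
  },
];

// Only runs inside the mongo shell, where `db` and `printjson` are defined.
if (typeof db !== "undefined") {
  db.articles.aggregate(tagCountsPipeline).forEach(printjson);
  db.articles.aggregate(commentBucketsPipeline).forEach(printjson);
}
```

Using `$match` twice mirrors the WHERE / HAVING split: the first filters input documents, the second filters the grouped results.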
8
How does it actually look?
SELECT cust_id, ord_date,
       SUM(price) AS total
FROM orders
GROUP BY cust_id, ord_date
HAVING total > 250

db.orders.aggregate( [
   { $group: {
        _id: {
           cust_id: "$cust_id",
           ord_date: {
              month: { $month: "$ord_date" },
              day: { $dayOfMonth: "$ord_date" },
              year: { $year: "$ord_date" }
           }
        },
        total: { $sum: "$price" }
     }
   },
   { $match: { total: { $gt: 250 } } }
] )
9
MongoDB 3.6 – What's new?
• $lookup – More expressive: it now supports non-equi joins and subqueries.
This can improve performance for analytics and BI tools, as more operations are pushed down to the database.
• $expr – Lets you use aggregation expressions within the query language.
This gives simpler code and can be used from db.collection.find(), but use it with caution: it does not yet fully leverage indexes.
When in doubt, use db.collection.aggregate().
• New aggregation operators:
– $arrayToObject
– $objectToArray
– $mergeObjects
– $dateFromString
– $dateFromParts
– $dateToParts
• $$REMOVE – New aggregation variable, allows for the conditional exclusion of a field.
• hint, comment – New options for the aggregate command and the db.collection.aggregate() method.
– hint – An index to use for the aggregation (on the initial collection).
– comment – A string to help trace the operation in the database profiler, currentOp, and logs.
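The following sketch shows what these 3.6 additions can look like in the shell. All collection and field names (`orders`, `warehouses`, `monthlyBudget`, `people`, `item`, `instock`, and so on) are hypothetical placeholders, and the `hint` assumes an index on `{ item: 1 }` already exists on the initial collection.

```javascript
// All collection and field names below are hypothetical placeholders.

// 1. $lookup with let + pipeline: a join with an extra non-equality condition
//    (the warehouse must hold at least the ordered quantity).
const lookupStage = {
  $lookup: {
    from: "warehouses",
    let: { orderItem: "$item", orderQty: "$ordered" },
    pipeline: [
      { $match: { $expr: { $and: [
        { $eq: ["$stock_item", "$$orderItem"] },
        { $gte: ["$instock", "$$orderQty"] },
      ] } } },
      { $project: { _id: 0, stock_item: 0 } },
    ],
    as: "stockdata",
  },
};

// 2. $expr: compare two fields of the same document in a find() filter.
const overBudgetFilter = { $expr: { $gt: ["$spent", "$budget"] } };

// 3. $$REMOVE: leave "middleName" out of the output when it is an empty string.
const removeEmptyStage = {
  $project: {
    name: 1,
    middleName: {
      $cond: { if: { $eq: ["$middleName", ""] }, then: "$$REMOVE", else: "$middleName" },
    },
  },
};

// 4. hint and comment options (assumes an index on { item: 1 } exists).
const aggregateOptions = { hint: { item: 1 }, comment: "stock-check report" };

// Only runs inside the mongo shell, where `db` and `printjson` are defined.
if (typeof db !== "undefined") {
  db.orders.aggregate([lookupStage], aggregateOptions).forEach(printjson);
  db.monthlyBudget.find(overBudgetFilter).forEach(printjson);
  db.people.aggregate([removeEmptyStage]).forEach(printjson);
}
```

Inside the `$lookup` sub-pipeline, fields of the joined collection use a single `$`, while variables declared in `let` use `$$`.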
10
Optimizing Aggregation Pipelines
• Filter as early as possible.
• Use indexes to improve sorts, matches, and lookups.
• If performance is worse than expected, use explain to analyze the plan.
• Use hint when there is a better index that is not being used.
• Use projection at the beginning to keep only the fields you need;
the pipeline will take it into consideration and pass less data between stages.
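A minimal sketch of these tips applied together, assuming a hypothetical `orders` collection with an existing index on `{ cust_id: 1 }`. The customer id and the comment string are placeholders.

```javascript
// Hypothetical collection, fields, and index; adapt to your own data.
const optimizedPipeline = [
  { $match: { cust_id: "A123" } },                  // filter first, can use the index
  { $project: { _id: 0, ord_date: 1, price: 1 } },  // keep only the needed fields
  { $group: { _id: "$ord_date", total: { $sum: "$price" } } },
  { $sort: { total: -1 } },
];

// hint assumes an index on { cust_id: 1 } exists; aggregate fails otherwise.
const runOptions = { hint: { cust_id: 1 }, comment: "daily totals for A123" };

// Only runs inside the mongo shell, where `db` and `printjson` are defined.
if (typeof db !== "undefined") {
  // Inspect the plan before running the pipeline for real.
  const plan = db.orders.aggregate(optimizedPipeline, { explain: true });
  printjson(plan);

  db.orders.aggregate(optimizedPipeline, runOptions).forEach(printjson);
}
```

Checking the explain output before and after adding the hint is one way to see whether the intended index is actually used.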
11
Let's play with Compass
21
Aggregation Pipeline & SQL
db.restaurants.aggregate([
   { $match: { name: "Riviera Caterer" } },
   { $project: { name: 1, cuisine: 1, borough: 1, "grades.score": 1 } },
   { $unwind: "$grades" }
])
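To picture what the final `$unwind` stage does, here is a toy, in-memory imitation in plain JavaScript. The sample document and its scores are made up, and the `unwind` helper is a simplification: it only handles array fields and drops documents where the field is not an array, which does not cover every case the real stage handles.

```javascript
// Toy illustration only; sample values and the helper are hypothetical.
const projected = [
  {
    name: "Riviera Caterer",
    cuisine: "American",
    borough: "Brooklyn",
    grades: [{ score: 5 }, { score: 7 }],
  },
];

// Emit one copy of the document per array element, replacing the array
// field with that single element.
const unwind = (docs, field) =>
  docs.flatMap((doc) =>
    (Array.isArray(doc[field]) ? doc[field] : []).map((element) => ({
      ...doc,
      [field]: element,
    }))
  );

const unwound = unwind(projected, "grades");
console.log(unwound);
```

Each output document repeats the restaurant fields and carries exactly one grade, which is the shape a later `$group` or `$sort` on `grades.score` would work with.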
