@talevy
All About Aggregations
Tal Levy, Software Engineer
- http://localhost:9200
{ }
{ "tagline": "You Know, for Search" }
More than search
• Originally built on Lucene for text-based searching
• Lucene and Elasticsearch work together to provide new storage formats and data types specific to numeric and keyword metrics
• Aggregations alongside searching
[Diagram: Query]
[Diagram: Query, Aggs]
And Analytics
Searching & Aggregating
price   color   make     sold
10000   red     honda    10/28/2016
20000   red     honda    11/05/2016
30000   green   ford     05/08/2016
15000   blue    toyota   07/02/2016
12000   green   toyota   08/19/2016
20000   red     honda    11/05/2016
80000   red     bmw      01/01/2016
25000   blue    ford     02/12/2016
Data Structures For Field Values on Shards
color: red, red, green, blue, green, red, red, blue
• Two considerations for our data
  • Fast querying by values
  • Fast aggregating by values
Inverted Index: terms-to-documents
color    doc1   doc2   doc3
red      ◉      ◉      ◉
blue     ◉      ◉      ◉
green    ◉      ◉      ◉
purple   ◉      ◉      ◉
orange   ◉      ◉      ◉
white    ◉      ◉      ◉
black    ◉      ◉      ◉
brown    ◉      ◉      ◉
Doc Values: documents-to-terms
1 value per document, 1 column per field
price   color   make     sold
10000   red     honda    10/28/2016
20000   red     honda    11/05/2016
30000   green   ford     05/08/2016
15000   blue    toyota   07/02/2016
12000   green   toyota   08/19/2016
20000   red     honda    11/05/2016
80000   red     bmw      01/01/2016
25000   blue    ford     02/12/2016
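To make the two layouts concrete, here is a minimal in-memory sketch in Python built from the sample rows. It is only an illustration of the idea (terms-to-documents for querying, documents-to-terms for aggregating), not how Lucene stores either structure on disk. The helper names `build_inverted_index` and `build_doc_values` are made up for this example.

```python
from collections import defaultdict

# Rows from the sample table; a document's ID is its position in the list.
cars = [
    {"price": 10000, "color": "red", "make": "honda"},
    {"price": 20000, "color": "red", "make": "honda"},
    {"price": 30000, "color": "green", "make": "ford"},
    {"price": 15000, "color": "blue", "make": "toyota"},
    {"price": 12000, "color": "green", "make": "toyota"},
    {"price": 20000, "color": "red", "make": "honda"},
    {"price": 80000, "color": "red", "make": "bmw"},
    {"price": 25000, "color": "blue", "make": "ford"},
]


def build_inverted_index(docs, field):
    """Terms-to-documents: map each term to the doc IDs that contain it."""
    index = defaultdict(list)
    for doc_id, doc in enumerate(docs):
        index[doc[field]].append(doc_id)
    return dict(index)


def build_doc_values(docs, field):
    """Documents-to-terms: one value per document, stored as a column."""
    return [doc[field] for doc in docs]


make_index = build_inverted_index(cars, "make")
price_column = build_doc_values(cars, "price")

# Querying by value: a single lookup in the inverted index.
honda_docs = make_index["honda"]

# Aggregating by value: read the column only for the matching doc IDs.
honda_prices = [price_column[doc_id] for doc_id in honda_docs]
honda_avg_price = sum(honda_prices) / len(honda_prices)
print(honda_docs, honda_avg_price)
```

The query side never touches the price column, and the aggregation side never scans the term dictionary, which is the reason for keeping both structures.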
How Do Distributed Aggregations Work?
[Diagram: data nodes and a coordinating node]
• Run inline with the search query
• Executed in isolation on each shard
• 4 phases
  • Parse
  • Collect
  • Combine
  • Reduce
Phase 1: Parse
[Diagram: data nodes and a coordinating node]
• Coordinating node splits the request into shard requests
• Shards parse aggregations and initialize data structures
Phase 2,3: Collect, Combine
[Diagram: data nodes and a coordinating node]
• Shards process all matching documents
• Once done, they combine aggregated data into an aggregation
Phase 4: Reduce
[Diagram: data nodes and a coordinating node]
• Shards send their aggregations to the coordinating node
• The coordinating node reduces them into a single aggregation
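The four phases can be sketched as a small map-reduce in Python. This is a toy model under stated assumptions: two hypothetical in-memory "shards", a count-per-term aggregation, and function names (`parse`, `collect`, `combine`, `reduce_partials`) chosen to mirror the phase names rather than any Elasticsearch API.

```python
from collections import Counter


def parse(agg_request):
    """Phase 1 (per shard): read the request and initialize data structures."""
    return {"field": agg_request["field"], "counts": Counter()}


def collect(state, matching_docs):
    """Phase 2 (per shard): one pass over the documents that match the query."""
    for doc in matching_docs:
        state["counts"][doc[state["field"]]] += 1


def combine(state):
    """Phase 3 (per shard): package the collected data as one partial result."""
    return dict(state["counts"])


def reduce_partials(partials):
    """Phase 4 (coordinating node): merge the partial results from all shards."""
    merged = Counter()
    for partial in partials:
        merged.update(partial)
    return dict(merged)


# Hypothetical split of the sample color values across two shards.
shards = [
    [{"color": "red"}, {"color": "red"}, {"color": "green"}, {"color": "blue"}],
    [{"color": "green"}, {"color": "red"}, {"color": "red"}, {"color": "blue"}],
]
agg_request = {"field": "color"}

partials = []
for shard_docs in shards:
    state = parse(agg_request)
    collect(state, shard_docs)
    partials.append(combine(state))

result = reduce_partials(partials)
print(result)
```

Each shard works only on its own documents, and the coordinating node sees nothing but the partial results.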
Designed for speed
Single network round-trip
Single pass through data on shards
Aggregates are computed in memory
Trades accuracy for speed
Only pay for documents that match the query
Can be composed (e.g., average response time, broken down by day)
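As a sketch of composition, the "average response time by day" example could be expressed as a date histogram with an `avg` sub-aggregation. The request body is built here as a Python dict. The field names `@timestamp` and `response_time` and the aggregation names are assumptions for illustration. The `calendar_interval` parameter is the form used by recent Elasticsearch versions. Releases from the era of this deck used `interval` instead.

```python
import json


def avg_per_day(timestamp_field, value_field):
    """Build a search body: bucket documents by day, then average a field per bucket."""
    return {
        "size": 0,  # return aggregation results only, no hits
        "aggs": {
            "per_day": {
                "date_histogram": {
                    "field": timestamp_field,
                    "calendar_interval": "day",
                },
                # Sub-aggregation: computed once inside every daily bucket.
                "aggs": {
                    "avg_response_time": {"avg": {"field": value_field}}
                },
            }
        },
    }


# Hypothetical field names; substitute the ones from your own mapping.
body = avg_per_day("@timestamp", "response_time")
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted into a `_search` request. The nesting of `aggs` inside `per_day` is what makes the metric run per bucket.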
Types of Aggregations
• Bucket
  • Terms
  • (Date) Histograms
  • Filter
  • Range
  • …
• Metric
  • Stats
  • Percentiles
  • Cardinality (unique counts)
  • Top Hits
  • Scripted
  • …
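The difference between the two families can be shown with plain Python on the sample car data: a bucket aggregation decides which group each document falls into, and a metric aggregation computes a number over the documents in a group. This is a local illustration only. The result shape (`doc_count`, `avg_price`) loosely imitates an aggregation response and is an assumption of this sketch.

```python
from collections import defaultdict
from statistics import mean

# (price, color) pairs from the sample table.
cars = [
    (10000, "red"), (20000, "red"), (30000, "green"), (15000, "blue"),
    (12000, "green"), (20000, "red"), (80000, "red"), (25000, "blue"),
]

# Bucket step (like a terms aggregation on color): group documents by term.
buckets = defaultdict(list)
for price, color in cars:
    buckets[color].append(price)

# Metric step (like an avg aggregation on price): one number per bucket.
result = {
    color: {"doc_count": len(prices), "avg_price": mean(prices)}
    for color, prices in buckets.items()
}
print(result)
```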
Example Terms Aggregation Query
GET products/_search
{
  "size": 0,
  "query": { "match_all": {} },
  "aggs": {
    "my_product_ids": {
      "terms": {
        "field": "pid",
        "size": 3
      }
    }
  }
}
Example Terms Aggregation Response
{
  "hits": {…},
  "aggregations": {
    "my_product_ids": {
      "doc_count_error_upper_bound": 3302,
      "sum_other_doc_count": 8879020,
      "buckets": [
        { "key": "030758836X", "doc_count": 7440 },
        { "key": "0439023483", "doc_count": 6717 },
        { "key": "0375831002", "doc_count": 4864 }
      ]
    }
  }
}
Things To Consider
Two fields in the terms aggregation response deserve attention:
• "doc_count_error_upper_bound": 3302 is the upper bound on the error in the count for each term
• "sum_other_doc_count": 8879020 is the number of docs not included in the returned buckets
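A short Python sketch of how a client might read these two fields, using the numbers from the example response (the elided `hits` section is left out). It assumes each document holds a single `pid` value, so bucket counts plus `sum_other_doc_count` add up to the number of matching documents. The variable names are illustrative.

```python
# Aggregation part of the example response.
response = {
    "aggregations": {
        "my_product_ids": {
            "doc_count_error_upper_bound": 3302,
            "sum_other_doc_count": 8879020,
            "buckets": [
                {"key": "030758836X", "doc_count": 7440},
                {"key": "0439023483", "doc_count": 6717},
                {"key": "0375831002", "doc_count": 4864},
            ],
        }
    }
}

agg = response["aggregations"]["my_product_ids"]

# Documents accounted for by the returned buckets.
docs_in_buckets = sum(bucket["doc_count"] for bucket in agg["buckets"])

# Assumption: one pid per document, so this is the total number of matches.
total_docs = docs_in_buckets + agg["sum_other_doc_count"]
coverage = docs_in_buckets / total_docs

# How large the possible error is relative to the smallest returned bucket.
smallest_bucket = min(bucket["doc_count"] for bucket in agg["buckets"])
relative_error = agg["doc_count_error_upper_bound"] / smallest_bucket

print(f"top buckets cover {coverage:.4%} of matching docs")
print(f"error bound is {relative_error:.0%} of the smallest bucket")
```

When the error bound is a large fraction of the smallest bucket, as here, the ranking of the returned terms should be treated as approximate.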
Locality Bias: Top N(1)
        Node A’s Counts   Node B’s Counts   Global Counts
RED     5                 2                 7
GREEN   4                 4                 8
BLUE    2                 1                 3
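The bias can be reproduced with the slide's numbers in a few lines of Python. The sketch below assumes each node returns only its local top 1 term and the coordinating node merges just those answers. It then compares that against the true global counts.

```python
from collections import Counter

# Per-node counts from the slide.
node_a = Counter({"RED": 5, "GREEN": 4, "BLUE": 2})
node_b = Counter({"RED": 2, "GREEN": 4, "BLUE": 1})

# Each node reports only its local top 1 term.
top_a = node_a.most_common(1)
top_b = node_b.most_common(1)

# The coordinating node can merge only what it was sent.
merged = Counter()
merged.update(dict(top_a))
merged.update(dict(top_b))
reported_top = merged.most_common(1)

# Ground truth: add up the full counts from both nodes.
global_counts = node_a + node_b
true_top = global_counts.most_common(1)

print("reported:", reported_top)
print("actual:  ", true_top)
```

RED wins the merged result with a count of 5, although GREEN is the most frequent term overall with 8: node A never reported its 4 GREEN documents.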
Shard Size: Top 3
[Diagram: data nodes and a coordinating node, labeled 15, 15, 15, 15 and 3]
• How many buckets to return per shard?
• "shard_size"
Example Terms Aggregation Query
GET products/_search
{
  "size": 0,
  "query": { "match_all": {} },
  "aggs": {
    "my_product_ids": {
      "terms": {
        "field": "pid",
        "size": 3,
        "shard_size": 999999
      }
    }
  }
}
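The effect of `shard_size` can be simulated in Python with the RED/GREEN/BLUE counts from the locality-bias slide. The function `terms_agg` is a hypothetical stand-in for the real execution: each shard returns its top `shard_size` terms and the coordinating node keeps the top `size` after merging. It is a simplified model, not Elasticsearch's implementation.

```python
from collections import Counter

# Per-shard term counts (nodes A and B from the locality-bias slide).
shards = {
    "A": {"RED": 5, "GREEN": 4, "BLUE": 2},
    "B": {"RED": 2, "GREEN": 4, "BLUE": 1},
}


def terms_agg(shards, size, shard_size):
    """Merge each shard's top `shard_size` terms, then keep the top `size`."""
    merged = Counter()
    for counts in shards.values():
        shard_top = Counter(counts).most_common(shard_size)
        merged.update(dict(shard_top))
    return merged.most_common(size)


# Ask for the single most frequent term with increasing shard_size.
results = {n: terms_agg(shards, size=1, shard_size=n) for n in (1, 2, 3)}
for n, top in results.items():
    print(f"shard_size={n}: {top}")
```

With `shard_size=1` the answer is RED with 5. From `shard_size=2` onward it is GREEN with 8, the correct result. The cost is that every shard sends more buckets to the coordinating node, so a very large value such as 999999 trades memory and network traffic for accuracy.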
Summary
Aggregations are powerful & fast
Need to trade accuracy for speed/memory in some cases
Use `shard_size` to help manage accuracy with terms aggregation
Leverage Kibana to help write aggregations!
Profile your aggregations using the Query Profiler
What We Missed
Pipeline Aggregations: Aggregations of Aggregations
Using the shard request cache (`index.requests.cache.enable`) to cache complex static aggregations
Matrix Aggregations: covariance and correlation
New aggregation types introduced all the time
What’s New In 6.0
What to expect?
Efficient sparse doc-value reading and writing
Index-time sorting
Removal of types
Cross-cluster search
Upgrading to 6.0 with rolling restarts!
and so much more!
Resources
• Elastic Discussion Forums: https://discuss.elastic.co/
• Aggregation Documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html
• Terms Aggregation Approximation: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#search-aggregations-bucket-terms-aggregation-approximate-counts
• Similar deck from my colleagues Adrien and Colin: https://www.elastic.co/elasticon/2015/sf/all-about-aggregations
Q & A
