Elasticsearch Group By Field Aggregation & Bucketing

Last Updated : 23 Jul, 2025

Elasticsearch is a search and analytics engine with a rich set of aggregations for analyzing and summarizing data. One of the most commonly used is the terms aggregation, which plays the role of a "group by field" operation: it sorts documents into buckets, one per distinct value of a field. This article explains this kind of grouping (bucketing) in detail, including its functionality, use cases, syntax, examples, and outputs, all in beginner-friendly language.

Understanding Aggregations in Elasticsearch

Bucket aggregations in Elasticsearch play a role similar to SQL's GROUP BY clause: they group documents into buckets based on field values so that each group can be counted or summarized. Elasticsearch offers many aggregation types, but the primary ones for grouping data are:

  • Terms Aggregation
  • Histogram Aggregation
  • Date Histogram Aggregation

Terms Aggregation

Terms Aggregation is used to group documents by unique values of a specified field. This is particularly useful for categorical data, such as tags, categories, or keywords.

Syntax:

{
  "aggs": {
    "agg_name": {
      "terms": {
        "field": "field_name",
        "size": 10
      }
    }
  }
}
  • agg_name: The name of the aggregation.
  • field_name: The field to group by.
  • size: The maximum number of buckets (unique terms) to return. The default is 10.

Example: Grouping Articles by Category

Suppose you have a dataset of news articles with different categories, and you want to see the distribution of these categories.

Indexing Data:

PUT /news_articles/_doc/1
{
  "title": "Tech Giants Unveil New Products",
  "category": "Technology"
}

PUT /news_articles/_doc/2
{
  "title": "Fashion Week Trends 2023",
  "category": "Fashion"
}

PUT /news_articles/_doc/3
{
  "title": "Stock Market Update: Bullish Trends Continue",
  "category": "Finance"
}

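Performing Terms Aggregation:

With the default dynamic mapping, "category" is indexed as a text field with a "category.keyword" sub-field. A terms aggregation on the text field itself is rejected unless fielddata is enabled, so the request targets the keyword sub-field. If you map "category" explicitly as a keyword field, use "category" directly.

GET /news_articles/_search
{
  "size": 0,
  "aggs": {
    "categories": {
      "terms": {
        "field": "category.keyword",
        "size": 10
      }
    }
  }
}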

Output:

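{
  "aggregations": {
    "categories": {
      "buckets": [
        {
          "key": "Fashion",
          "doc_count": 1
        },
        {
          "key": "Finance",
          "doc_count": 1
        },
        {
          "key": "Technology",
          "doc_count": 1
        }
      ]
    }
  }
}

The response is abbreviated to the "aggregations" section. Buckets are sorted by document count in descending order, and buckets with equal counts are ordered by key.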
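To make the bucketing concrete, here is a minimal pure-Python sketch of what a terms aggregation computes. It does not call Elasticsearch and says nothing about how the engine works internally; the function name terms_buckets is hypothetical. It counts documents per distinct field value and keeps the top "size" buckets, sorted by descending count with ties ordered by key.

```python
from collections import Counter


def terms_buckets(docs, field, size=10):
    """Emulate a terms aggregation: one bucket per distinct value of `field`."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    # Highest doc_count first; equal counts are ordered by key, ascending.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered[:size]]


articles = [
    {"title": "Tech Giants Unveil New Products", "category": "Technology"},
    {"title": "Fashion Week Trends 2023", "category": "Fashion"},
    {"title": "Stock Market Update: Bullish Trends Continue", "category": "Finance"},
]

for bucket in terms_buckets(articles, "category"):
    print(bucket)
```

Lowering "size" in this sketch simply truncates the sorted list, which mirrors how the parameter caps the number of buckets in the response.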

Histogram Aggregation

Histogram Aggregation is used to group documents into buckets based on numerical field ranges. This is useful for data such as prices, ages, or any continuous numerical field.

Syntax:

{
  "aggs": {
    "agg_name": {
      "histogram": {
        "field": "field_name",
        "interval": interval_value
      }
    }
  }
}
  • agg_name: The name of the aggregation.
  • field_name: The numerical field to group by.
  • interval_value: The bucket interval size.

Example: Grouping Products by Price Range

Consider a dataset of products with different prices, and you want to group these products into price ranges.

Indexing Data:

PUT /products/_doc/1
{
  "name": "Product A",
  "price": 25
}

PUT /products/_doc/2
{
  "name": "Product B",
  "price": 50
}

PUT /products/_doc/3
{
  "name": "Product C",
  "price": 75
}

Performing Histogram Aggregation:

GET /products/_search
{
  "size": 0,
  "aggs": {
    "price_ranges": {
      "histogram": {
        "field": "price",
        "interval": 25
      }
    }
  }
}

Output:

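{
  "aggregations": {
    "price_ranges": {
      "buckets": [
        {
          "key": 25.0,
          "doc_count": 1
        },
        {
          "key": 50.0,
          "doc_count": 1
        },
        {
          "key": 75.0,
          "doc_count": 1
        }
      ]
    }
  }
}

Each key is the lower bound of its bucket, so the bucket with key 25.0 covers prices from 25 up to, but not including, 50.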
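The bucket keys above follow from rounding each value down to a multiple of the interval. The sketch below emulates that rule in pure Python, assuming an offset of 0; the function name histogram_buckets is hypothetical and the code does not call Elasticsearch. It also fills in empty buckets between the smallest and largest key, which matches the default behaviour of the histogram aggregation (a minimum document count of 0).

```python
import math
from collections import Counter


def histogram_buckets(docs, field, interval):
    """Emulate a histogram aggregation with offset 0."""
    # Bucket index n covers values in [n * interval, (n + 1) * interval).
    counts = Counter(
        math.floor(doc[field] / interval) for doc in docs if field in doc
    )
    if not counts:
        return []
    lowest, highest = min(counts), max(counts)
    # Counter returns 0 for missing indexes, so gaps become empty buckets.
    return [
        {"key": float(n * interval), "doc_count": counts[n]}
        for n in range(lowest, highest + 1)
    ]


products = [
    {"name": "Product A", "price": 25},
    {"name": "Product B", "price": 50},
    {"name": "Product C", "price": 75},
]

for bucket in histogram_buckets(products, "price", 25):
    print(bucket)
```

A price of 30 would land in the same bucket as 25, because both round down to the key 25.0.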

Date Histogram Aggregation

Date Histogram Aggregation is used to group documents into buckets based on date or time intervals. This is particularly useful for time-series data, such as logs or event data.

Syntax:

{
  "aggs": {
    "agg_name": {
      "date_histogram": {
        "field": "date_field",
        "calendar_interval": "interval"
      }
    }
  }
}
  • agg_name: The name of the aggregation.
  • date_field: The date field to group by.
  • interval: The calendar interval (e.g., day, week, month).

Example: Grouping Events by Day

Suppose you have a dataset of events with timestamps, and you want to group these events by day.

Indexing Data:

PUT /events/_doc/1
{
  "event": "Login",
  "timestamp": "2023-01-01T10:00:00Z"
}

PUT /events/_doc/2
{
  "event": "Logout",
  "timestamp": "2023-01-01T12:00:00Z"
}

PUT /events/_doc/3
{
  "event": "Purchase",
  "timestamp": "2023-01-02T14:00:00Z"
}

Performing Date Histogram Aggregation:

GET /events/_search
{
  "size": 0,
  "aggs": {
    "events_per_day": {
      "date_histogram": {
        "field": "timestamp",
        "calendar_interval": "day"
      }
    }
  }
}

Output:

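{
  "aggregations": {
    "events_per_day": {
      "buckets": [
        {
          "key_as_string": "2023-01-01T00:00:00.000Z",
          "key": 1672531200000,
          "doc_count": 2
        },
        {
          "key_as_string": "2023-01-02T00:00:00.000Z",
          "key": 1672617600000,
          "doc_count": 1
        }
      ]
    }
  }
}

Each bucket is identified by the start of its day: "key" is that instant in milliseconds since the epoch, and "key_as_string" is the same instant as a formatted date.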
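Grouping by day amounts to truncating every timestamp to midnight and counting the results. The following pure-Python sketch illustrates this for UTC timestamps; the function name daily_buckets is hypothetical, and the code does not call Elasticsearch. Unlike the real aggregation, this sketch does not emit empty buckets for days without events.

```python
from collections import Counter
from datetime import datetime, timezone


def daily_buckets(docs, field):
    """Emulate a date_histogram with a one-day calendar interval in UTC."""
    counts = Counter()
    for doc in docs:
        # Replace the trailing "Z" so older Python versions can parse it.
        ts = datetime.fromisoformat(doc[field].replace("Z", "+00:00"))
        day = ts.astimezone(timezone.utc).replace(
            hour=0, minute=0, second=0, microsecond=0
        )
        counts[day] += 1
    return [
        {
            "key_as_string": day.strftime("%Y-%m-%dT%H:%M:%S.000Z"),
            "key": int(day.timestamp() * 1000),  # milliseconds since epoch
            "doc_count": counts[day],
        }
        for day in sorted(counts)
    ]


events = [
    {"event": "Login", "timestamp": "2023-01-01T10:00:00Z"},
    {"event": "Logout", "timestamp": "2023-01-01T12:00:00Z"},
    {"event": "Purchase", "timestamp": "2023-01-02T14:00:00Z"},
]

for bucket in daily_buckets(events, "timestamp"):
    print(bucket)
```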

Understanding Group By Field Aggregation

Grouping by a field in Elasticsearch is done with the terms aggregation introduced above. It divides the matching documents into "buckets," where each bucket represents a unique value of the chosen field. Sub-aggregations nested inside the terms aggregation then compute metrics, such as an average or a sum, within each bucket.

How Does Group By Field Aggregation Work?

When a search request contains a terms aggregation, Elasticsearch takes the documents that match the query (all documents if no query is given) and groups them by the value of the specified field. It creates one bucket per unique value, counts the documents in each bucket, and runs any sub-aggregations on the documents in that bucket. The response contains up to "size" buckets, sorted by document count in descending order by default, so you can compare categories or dimensions side by side.

Syntax:

{
  "aggs": {
    "agg_name": {
      "terms": {
        "field": "field_name",
        "size": 10
      },
      "aggs": {
        "sub_agg": {
          "aggregation_type": { ... }
        }
      }
    }
  }
}
  • agg_name: The name of the aggregation.
  • field_name: The field to group by.
  • size: The maximum number of buckets to return.

Example: Grouping Documents by Category

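Let's consider an example where we have a dataset of products with different categories, and we want to group them by category to compute the average price within each category. Note that indexing these documents under the same IDs replaces any documents already stored in the "products" index with IDs 1 to 3.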

Indexing Data:

PUT /products/_doc/1
{
  "name": "iPhone 13",
  "category": "Smartphones",
  "price": 999
}

PUT /products/_doc/2
{
  "name": "Samsung Galaxy S21",
  "category": "Smartphones",
  "price": 899
}

PUT /products/_doc/3
{
  "name": "MacBook Pro",
  "category": "Laptops",
  "price": 1999
}

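Performing Group By Field Aggregation:

The request targets "category.keyword" because, with the default dynamic mapping, "category" is a text field and terms aggregations run on its keyword sub-field. If "category" is mapped explicitly as a keyword field, use "category" directly.

GET /products/_search
{
  "size": 0,
  "aggs": {
    "products_by_category": {
      "terms": {
        "field": "category.keyword",
        "size": 10
      },
      "aggs": {
        "avg_price": {
          "avg": {
            "field": "price"
          }
        }
      }
    }
  }
}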

Output:

{
  "aggregations": {
    "products_by_category": {
      "buckets": [
        {
          "key": "Smartphones",
          "doc_count": 2,
          "avg_price": {
            "value": 949.0
          }
        },
        {
          "key": "Laptops",
          "doc_count": 1,
          "avg_price": {
            "value": 1999.0
          }
        }
      ]
    }
  }
}

Analysis:

  • There are two buckets: "Smartphones" and "Laptops," representing the unique values of the "category" field.
  • Within the "Smartphones" bucket, there are two documents, and the average price is calculated as $949.
  • Within the "Laptops" bucket, there is one document, and the average price is $1999.
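The analysis above can be reproduced with a short pure-Python sketch that groups documents by one field and averages another within each group. It is only an emulation of the result, not of Elasticsearch internals; the function name terms_with_avg is hypothetical, and the output key "avg_price" is hard-coded to mirror the example.

```python
from collections import defaultdict


def terms_with_avg(docs, group_field, metric_field, size=10):
    """Emulate a terms aggregation with a nested avg sub-aggregation."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[group_field]].append(doc[metric_field])
    # Largest buckets first; equal counts are ordered by key, ascending.
    ordered = sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [
        {
            "key": key,
            "doc_count": len(values),
            "avg_price": {"value": sum(values) / len(values)},
        }
        for key, values in ordered[:size]
    ]


products = [
    {"name": "iPhone 13", "category": "Smartphones", "price": 999},
    {"name": "Samsung Galaxy S21", "category": "Smartphones", "price": 899},
    {"name": "MacBook Pro", "category": "Laptops", "price": 1999},
]

for bucket in terms_with_avg(products, "category", "price"):
    print(bucket)
```

For the "Smartphones" bucket the average is (999 + 899) / 2 = 949, which is the value reported in the response.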

Real-World Use Cases

1. E-commerce Sales Analysis:

Grouping sales data by product categories, brands, or regions can provide insights into sales performance, customer preferences, and market trends.

2. Log Analysis:

Grouping log data by severity levels, error codes, or timestamps can help identify patterns, anomalies, and trends in system behavior.

3. Marketing Campaign Analysis:

Grouping marketing data by campaign types, channels, or demographics can help evaluate the effectiveness of different campaigns and target audiences.

Advanced Options

1. Bucket Ordering:

The "order" parameter of the terms aggregation controls how buckets are sorted: by document count ("_count", descending by default), by key ("_key"), or by the value of a metric sub-aggregation such as "avg_price".
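As an illustration, the sketch below builds search request bodies as Python dictionaries with different "order" settings. The helper name ordered_terms_body is hypothetical, and the field "category.keyword" assumes the default dynamic mapping of the products example; the bodies would be sent to the _search endpoint with whatever client you use.

```python
import json


def ordered_terms_body(field, order, size=10):
    """Build a search body whose terms buckets are sorted by `order`."""
    return {
        "size": 0,  # return only aggregation results, no hits
        "aggs": {
            "products_by_category": {
                "terms": {"field": field, "size": size, "order": order},
                "aggs": {"avg_price": {"avg": {"field": "price"}}},
            }
        },
    }


# Sort buckets alphabetically by key.
by_key = ordered_terms_body("category.keyword", {"_key": "asc"})
# Sort buckets by the avg_price sub-aggregation, most expensive first.
by_avg_price = ordered_terms_body("category.keyword", {"avg_price": "desc"})

print(json.dumps(by_avg_price, indent=2))
```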

2. Bucket Filtering:

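The "include" and "exclude" parameters of the terms aggregation restrict which terms get a bucket, using either exact values or a regular expression, and "min_doc_count" drops buckets that contain fewer documents than a given threshold.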
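A request body with bucket filtering could be assembled as in the following sketch. The helper name filtered_terms_body and the aggregation name "filtered_categories" are hypothetical, and "category.keyword" again assumes the default dynamic mapping of the products example.

```python
import json


def filtered_terms_body(field, include=None, exclude=None, min_doc_count=1, size=10):
    """Build a search body whose terms buckets are filtered."""
    terms = {"field": field, "size": size, "min_doc_count": min_doc_count}
    if include is not None:
        terms["include"] = include  # exact values or a regular expression
    if exclude is not None:
        terms["exclude"] = exclude
    return {"size": 0, "aggs": {"filtered_categories": {"terms": terms}}}


# Skip the "Laptops" bucket and keep only categories with at least 2 documents.
body = filtered_terms_body("category.keyword", exclude=["Laptops"], min_doc_count=2)

print(json.dumps(body, indent=2))
```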

Conclusion

Elasticsearch's Group By Field Aggregation, also known as Terms Aggregation, is a powerful feature for analyzing and summarizing data based on the values of a specific field. By dividing the dataset into buckets and aggregating the documents within each bucket, you can gain valuable insights into your data's patterns, trends, and distributions. Whether you're analyzing sales data, log files, or marketing campaigns, Group By Field Aggregation provides a versatile and effective way to explore and understand your data's structure and content.

