Aggregations with Search API

Aggregations let you perform high-level analysis of search results. The available types, such as Terms, Significant Terms, and Date Histogram, help you detect trends and generate insights.

Using aggregations in the Search API, you can power dashboards and run high-level analysis on the results of your searches. Aggregations use the same Lucene-based logic, and results are returned as ordinary JSON documents.

We allow different types of aggregation:

  • Terms: The Terms aggregation returns an aggregated view of the results that match the query, giving the document count for each value of the chosen field. It is mostly used to generate dashboards, reports, and graphs.
  • Trend (or Significant Terms): Unlike the Terms aggregation, the Trend aggregation does not return the most popular terms in a set. Instead, it highlights the values of a field whose popularity has changed significantly between a foreground set and a background set.
  • Date Histogram: The Date Histogram aggregation groups the aggregated results by time interval. It is mostly used to generate dashboards, reports, and graphs.
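The three types differ mainly in the aggregation body that is sent. The Python sketch below shows what each body might look like, assuming the API accepts Elasticsearch-style aggregation bodies. Only the `terms` form is confirmed by the request example on this page. The `significant_terms` and `date_histogram` keys, the `calendar_interval` parameter, the aggregation names, and the `published_at` field are assumptions for illustration and should be checked against the API reference before use.

```python
# Assumption: Elasticsearch-style aggregation bodies. Only "terms" is
# confirmed by this page; the other two bodies are illustrative guesses.
terms_agg = {"terms": {"field": "forum.name.keyword", "size": 5}}

significant_terms_agg = {
    "significant_terms": {"field": "forum.name.keyword"}
}

date_histogram_agg = {
    "date_histogram": {
        "field": "published_at",      # hypothetical date field name
        "calendar_interval": "day",   # assumed parameter name and value
    }
}

# Each aggregation is sent under a caller-chosen name (hypothetical names).
aggregations = {
    "posts_by_subreddit": terms_agg,
    "trending_subreddits": significant_terms_agg,
    "posts_per_day": date_histogram_agg,
}
```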

Aggregations have many use cases, for example:

  • Finding which organizations have more results for "fraud" than usual.
  • Detecting micro-trends within hashtags, locations, or other enriched fields.
  • Seeing patterns of sentiment towards company products over a time period.

📘

Private data sources only

Aggregations are available only for private data sources.

To send an aggregation with a Search API request, use the aggregations field. The following request uses a terms aggregation:

{
  "query": {
    "from": 0,
    "size": 0,
    "query": "content.title:*pizza*",
    "data_sources": [
      "private.socialgist_reddit"
    ],
    "aggregations": {
      "posts_by_subreddit": {
        "terms": {
          "field": "forum.name.keyword",
          "size": 5
        }
      }
    }
  }
}

This search requests an aggregated view of the number of posts containing "pizza" in the title, grouped by the subreddit in which they were posted. This information can help identify which subreddits are most active or relevant for pizza-related discussions.

Fields used in an aggregation must carry the .keyword suffix, as in the example above: forum.name + .keyword.
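A request body like the one above can also be assembled programmatically. The Python sketch below rebuilds the same payload and appends the .keyword suffix when it is missing. The helper name `build_terms_request` and its parameters are hypothetical and not part of the Search API. Sending the request is left out because the endpoint and authentication are not covered here.

```python
import json


def build_terms_request(query, data_sources, agg_name, field, size=5):
    """Build a request body with a single terms aggregation.

    Hypothetical helper: it only mirrors the JSON example above.
    """
    # Aggregation fields need the .keyword suffix; add it if it is missing.
    if not field.endswith(".keyword"):
        field = field + ".keyword"
    return {
        "query": {
            "from": 0,
            "size": 0,  # same value as in the example above
            "query": query,
            "data_sources": list(data_sources),
            "aggregations": {
                agg_name: {"terms": {"field": field, "size": size}}
            },
        }
    }


body = build_terms_request(
    query="content.title:*pizza*",
    data_sources=["private.socialgist_reddit"],
    agg_name="posts_by_subreddit",
    field="forum.name",
)
print(json.dumps(body, indent=2))
```

Normalizing the suffix in one place keeps callers from sending a bare field name by mistake.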

The response will look like this:

{
  "total": {
    "value": 20,
    "relation": "eq"
  },
  "aggregations": {
    "posts_by_subreddit": {
      "buckets": [
        {
          "key": "mildlyinfuriating",
          "doc_count": 2
        },
        {
          "key": "newjersey",
          "doc_count": 2
        },
        {
          "key": "AskACanadian",
          "doc_count": 1
        },
        {
          "key": "CharacterAMARoleplay",
          "doc_count": 1
        },
        {
          "key": "Dominos",
          "doc_count": 1
        }
      ]
    }
  }
}
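
To use these counts in a dashboard or report, the buckets have to be pulled out of the response. The Python sketch below does this for a response shaped like the one above. The function name `top_buckets` is hypothetical, and the sample is an abbreviated copy of that response.

```python
def top_buckets(response, agg_name):
    """Return (key, doc_count) pairs for one named aggregation."""
    # Fall back to an empty list if the aggregation is absent.
    buckets = (
        response.get("aggregations", {})
        .get(agg_name, {})
        .get("buckets", [])
    )
    return [(b["key"], b["doc_count"]) for b in buckets]


# Abbreviated copy of the response shown above.
response = {
    "total": {"value": 20, "relation": "eq"},
    "aggregations": {
        "posts_by_subreddit": {
            "buckets": [
                {"key": "mildlyinfuriating", "doc_count": 2},
                {"key": "newjersey", "doc_count": 2},
                {"key": "AskACanadian", "doc_count": 1},
            ]
        }
    },
}

for key, count in top_buckets(response, "posts_by_subreddit"):
    print(f"{key}: {count}")
```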