Elasticsearch usage summary (5)

Elasticsearch is an open-source search engine that provides full-text retrieval and relevance ranking, and also supports complex statistical aggregations over stored documents.
Aggregations in ES fall into two categories: metrics and buckets. Put simply, a metric aggregation is very similar to AVG, MAX, MIN and other functions in SQL, while a bucket aggregation is somewhat similar to GROUP BY.
This article briefly introduces how to use metric aggregations.
Metric aggregations can be divided into two types according to what they return: single-value aggregations and multi-value aggregations.
Single-value aggregations
Sum: compute the total
This aggregation returns a single value. The DSL looks like this:

{
    "aggs" : {
        "intraday_return" : { "sum" : { "field" : "change" } }
    }
}

It returns the sum of the change field over the matching documents:

{
    ...

    "aggregations": {
        "intraday_return": {
           "value": 2.18
        }
    }
}

Here intraday_return is the name of the aggregation, which is also the key under which its result appears in the response. The aggregation also supports scripts; I won't go into details here, so refer to the official documentation.
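To make the link between the request and the response concrete, here is a minimal local sketch in Python. It does not talk to Elasticsearch; it only mimics what the sum aggregation computes and how the result is keyed by the aggregation name. The sample documents and the helper `run_sum_agg` are hypothetical, and the sketch assumes documents without the field are simply skipped.

```python
import json

# Hypothetical sample documents; in ES these would be the documents matched by the query.
docs = [
    {"symbol": "A", "change": 1.5},
    {"symbol": "B", "change": -0.32},
    {"symbol": "C", "change": 1.0},
    {"symbol": "D"},  # no "change" field: skipped by this sketch
]

# The same request body as the DSL above, written as a Python dict.
request = {"aggs": {"intraday_return": {"sum": {"field": "change"}}}}


def run_sum_agg(docs, request):
    """Hypothetical helper: mimic the response shape of sum aggregations.

    Each aggregation name becomes a key under "aggregations",
    and its result is stored under "value".
    """
    result = {}
    for name, body in request["aggs"].items():
        field = body["sum"]["field"]
        total = sum(d[field] for d in docs if field in d)
        result[name] = {"value": round(total, 2)}  # round away float noise
    return {"aggregations": result}


print(json.dumps(run_sum_agg(docs, request)))
```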
Min: find the minimum value

{
    "aggs" : {
        "min_price" : { "min" : { "field" : "price" } }
    }
}

Max: find the maximum value

{
    "aggs" : {
        "max_price" : { "max" : { "field" : "price" } }
    }
}

Avg: compute the average

{
    "aggs" : {
        "avg_grade" : { "avg" : { "field" : "grade" } }
    }
}
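Several metric aggregations can be sent together under one "aggs" object. The following Python sketch is a rough local equivalent of the min, max and avg examples above, run over hypothetical documents; the helper names `field_values` and `metric` are illustrative only, and returning None when there are no values is a choice of this sketch, not a statement about ES's output.

```python
# Hypothetical sample documents; some lack "price" or "grade".
docs = [
    {"price": 10.0, "grade": 80},
    {"price": 25.0, "grade": 95},
    {"grade": 70},
    {"price": 5.0},
]


def field_values(docs, field):
    """Collect the field's values, skipping documents that lack it."""
    return [d[field] for d in docs if field in d]


def metric(docs, kind, field):
    """Hypothetical local equivalent of the min/max/avg metric aggregations."""
    values = field_values(docs, field)
    if not values:
        return None  # nothing to aggregate (this sketch's choice)
    if kind == "min":
        return min(values)
    if kind == "max":
        return max(values)
    if kind == "avg":
        return sum(values) / len(values)
    raise ValueError(f"unsupported metric: {kind}")


# The three aggregations above, combined into one "aggs" object.
request = {
    "min_price": {"min": {"field": "price"}},
    "max_price": {"max": {"field": "price"}},
    "avg_grade": {"avg": {"field": "grade"}},
}

# Build a response keyed by aggregation name, each with a single "value".
response = {
    name: {"value": metric(docs, kind, body["field"])}
    for name, spec in request.items()
    for kind, body in spec.items()
}
print(response)
```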

Cardinality: count distinct values, that is, how many different values the field has

{
    "aggs" : {
        "author_count" : {
            "cardinality" : {
                "field" : "author"
            }
        }
    }
}

Multi-value aggregations
Percentile ranks: find what percentage of values fall at or below given values

{
    "aggs" : {
        "load_time_outlier" : {
            "percentile_ranks" : {
                "field" : "load_time", 
                "values" : [15, 30]
            }
        }
    }
}

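The result contains one value per entry in "values", giving the percentage of load_time values at or below it (here 92% are at or below 15 and 100% at or below 30):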

{
    ...

   "aggregations": {
      "load_time_outlier": {
         "values" : {
            "15": 92,
            "30": 100
         }
      }
   }
}
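The meaning of these numbers can be checked with an exact computation on a small, hypothetical list of load times. ES computes percentile ranks with an approximate algorithm, so its figures can differ slightly on real data; the helper `percentile_ranks` below is an illustrative exact version, not the ES implementation.

```python
from bisect import bisect_right

# Hypothetical load times for 10 page loads.
load_times = [3, 5, 7, 8, 10, 12, 14, 15, 22, 30]


def percentile_ranks(values, targets):
    """Exact percentage of values that are <= each target.

    Keys are strings to mirror the "values" object in the response above.
    """
    ordered = sorted(values)
    n = len(ordered)
    # bisect_right counts how many sorted values are <= t.
    return {str(t): 100.0 * bisect_right(ordered, t) / n for t in targets}


print(percentile_ranks(load_times, [15, 30]))
```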

Summary
The above is not comprehensive. Newer versions of ES also support, for example, the multi-value percentiles aggregation, Geo Bounds for geographic location information, Scripted Metric for script-defined metrics, and top hits.

In terms of performance, ES also applies many optimizations. For max and min, for example, if the field is already sorted, the computation step is skipped and the target value is read directly.
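The idea behind this shortcut can be sketched in plain Python. This only illustrates why sorted data makes min and max cheap (a constant-time read of the two ends instead of a full scan); it is not how ES implements the optimization internally, and the sample values are hypothetical.

```python
# Hypothetical values of a field that is already stored in ascending order.
sorted_prices = [1.0, 2.5, 4.0, 9.9, 12.0]


def min_max_scan(values):
    """General case: one pass over every value."""
    lo = hi = values[0]
    for v in values[1:]:
        lo = min(lo, v)
        hi = max(hi, v)
    return lo, hi


def min_max_sorted(values):
    """Shortcut when values are known to be sorted ascending: read the ends."""
    return values[0], values[-1]


# Both approaches agree on sorted input; only the shortcut skips the scan.
print(min_max_scan(sorted_prices), min_max_sorted(sorted_prices))
```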
Of course, some aggregations only suit specific situations. For example, cardinality counts distinct values by hashing them and returns an approximate count; on fields with a large amount of data it can consume a lot of resources.
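Why hashing gives an approximate count with bounded memory can be shown with a small Python sketch. ES's cardinality aggregation is documented as based on HyperLogLog++; the code below swaps in a simpler hash-based technique, linear counting, purely for illustration. The bitmap size `m` and the sample author values are assumptions of this sketch.

```python
import hashlib
import math


def exact_cardinality(values):
    """Exact distinct count: memory grows with the number of distinct values."""
    return len(set(values))


def linear_counting(values, m=4096):
    """Approximate distinct count with a fixed m-slot bitmap (linear counting).

    Each value is hashed to one slot; duplicates always hit the same slot.
    The estimate is -m * ln(empty_slots / m).
    """
    bits = bytearray(m)  # one byte per slot, for simplicity
    for v in values:
        digest = hashlib.sha1(str(v).encode("utf-8")).digest()
        bits[int.from_bytes(digest[:8], "big") % m] = 1
    empty = m - sum(bits)
    if empty == 0:
        raise ValueError("bitmap saturated; use a larger m")
    return -m * math.log(empty / m)


# Hypothetical author field: 1000 distinct authors, each appearing 3 times.
authors = [f"author-{i % 1000}" for i in range(3000)]
print(exact_cardinality(authors), round(linear_counting(authors)))
```

The memory used by `linear_counting` is fixed by `m`, no matter how many documents are scanned, at the cost of a small estimation error.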
In addition, aggregations can be nested inside buckets. For example, if a max aggregation is nested under a range aggregation, the max is computed separately for each bucket produced by the range.
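The nesting can be sketched locally in Python. The request dict follows the ES DSL shape (a range aggregation with a sub-"aggs"), while the documents, the field names price and rating, and the helper `range_with_max` are hypothetical. Range buckets treat "from" as inclusive and "to" as exclusive; the real response also carries a key and the bounds for each bucket, which this sketch omits.

```python
# Hypothetical documents with "price" and "rating" fields.
docs = [
    {"price": 5, "rating": 3.0},
    {"price": 40, "rating": 4.5},
    {"price": 120, "rating": 4.0},
    {"price": 80, "rating": 2.0},
    {"price": 150, "rating": 5.0},
]

# A range bucket aggregation with a max metric nested under it.
request = {
    "aggs": {
        "price_ranges": {
            "range": {
                "field": "price",
                "ranges": [{"to": 50}, {"from": 50, "to": 100}, {"from": 100}],
            },
            "aggs": {"max_rating": {"max": {"field": "rating"}}},
        }
    }
}


def range_with_max(docs, range_field, ranges, sub_name, max_field):
    """Bucket docs by range ("from" inclusive, "to" exclusive),
    then compute the max of max_field inside each bucket."""
    buckets = []
    for r in ranges:
        lo = r.get("from", float("-inf"))
        hi = r.get("to", float("inf"))
        members = [d for d in docs if range_field in d and lo <= d[range_field] < hi]
        values = [d[max_field] for d in members if max_field in d]
        buckets.append({
            "doc_count": len(members),
            sub_name: {"value": max(values) if values else None},
        })
    return buckets


agg = request["aggs"]["price_ranges"]
sub_name, sub_body = next(iter(agg["aggs"].items()))
result = range_with_max(
    docs, agg["range"]["field"], agg["range"]["ranges"], sub_name, sub_body["max"]["field"]
)
print(result)
```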
Support for scripts in aggregations makes the statistics more flexible.
Much of this still needs to be used in practice before its advantages become clear.

