Table of Contents
Introduction to Aggregation Analysis
What is ES aggregation analysis?
Writing ES aggregation analysis queries
Value sources for aggregation analysis
Value Count: counts the number of documents that have a value in a field
Stats: returns five values: count, max, min, avg, sum
Percentiles: the values at the given percentiles
Geo Bounds aggregation: finds the bounding range of the coordinate points in the document set
Geo Centroid aggregation: finds the coordinates of the center point
Terms Aggregation: groups and aggregates by the values of a field
Filter Aggregation: aggregates over the documents that match a filter query
Filters Aggregation: aggregates over multiple filter groups
Date Range Aggregation: groups and aggregates by time range
Date Histogram Aggregation: time histogram (bar chart) aggregation
Missing Aggregation: a bucket for documents with a missing value
Introduction to Aggregation Analysis
What is ES aggregation analysis?
Aggregation analysis is an important database feature: it performs aggregate calculations over the data set returned by a query, such as finding the maximum or minimum of a field (or of a computed expression), or calculating a sum or an average. As both a search engine and a database, ES also provides powerful aggregation analysis capabilities.
- Aggregating a metric over a data set, such as the maximum, minimum, sum, or average, is called metric aggregation in ES.
- As in a relational database, the queried data can also be grouped first (group by) and then aggregated per group. In ES, grouping is called bucketing, and an aggregation that groups documents is a bucket aggregation.
ES also provides matrix aggregation and pipeline aggregation, but they are still being improved.
Writing ES aggregation analysis queries
In the query request body, use the aggregations node to define the aggregation analysis, following this syntax:
"aggregations" : {
"<aggregation_name>" : {
"<aggregation_type>" : {
<aggregation_body>
}
[,"meta" : { [<meta_data_body>] } ]?
[,"aggregations" : { [<sub_aggregation>]+ } ]?
}
[,"<aggregation_name_2>" : { ... } ]*
}
// aggregations can also be abbreviated as aggs
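The nesting in the syntax above can be easier to see when the request body is assembled programmatically. The following is a minimal Python sketch, not part of any Elasticsearch client: the helper name `agg` is hypothetical, and the field names `age` and `balance` are borrowed from the bank examples used later in this article.

```python
import json

def agg(agg_type, body, sub_aggs=None, meta=None):
    """Build the definition of one named aggregation (hypothetical helper)."""
    node = {agg_type: body}
    if meta is not None:
        node["meta"] = meta          # optional metadata block
    if sub_aggs:
        node["aggs"] = sub_aggs      # optional nested sub-aggregations
    return node

request_body = {
    "size": 0,  # return no hits, only aggregation results
    "aggs": {
        "age_terms": agg(
            "terms",
            {"field": "age"},
            sub_aggs={"max_balance": agg("max", {"field": "balance"})},
        )
    },
}

print(json.dumps(request_body, indent=2))
```

The printed JSON can be pasted as the body of a `_search` request. Each aggregation name maps to exactly one aggregation type, with `meta` and nested `aggs` as optional siblings of that type.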
Value sources for aggregation analysis
The value used in an aggregate calculation can be the value of a field or the result of a script.
Metric aggregation
max, min, sum, avg
POST /bank/_search?
{
"size": 0,
"aggs": {
"max_balance": {
"max": {
"field": "balance"
}
}
}
}
// Find the maximum balance among all customers
POST /bank/_search?
{
"size": 2,
"query": {
"match": {
"age": 24
}
},
"sort": [
{
"balance": {
"order": "desc"
}
}
],
"aggs": {
"max_balance": {
"max": {
"field": "balance"
}
}
}
}
// Maximum balance among customers aged 24
POST /bank/_search?size=0
{
"aggs" : {
"avg_age" : {
"avg" : {
"script" : {
"source" : "doc.age.value"
}
}
},
"avg_age10" : {
"avg" : {
"script" : {
"source" : "doc.age.value + 10"
}
}
}
}}
// The value comes from a script
// Find the average age of all customers
POST /bank/_search?size=0
{
"aggs": {
"sum_balance": {
"sum": {
"field": "balance",
"script": {
"source": "_value * 1.03"
}
}
}
}
}
// When a field is specified, use _value in the script to get the field's value
POST /bank/_search?size=0
{
"aggs": {
"avg_age": {
"avg": {
"field": "age",
"missing": 18
}
}
}
}
// Specify a value to use for documents that are missing the field. If none is specified, documents missing the field are ignored.
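To make the effect of `missing` concrete, here is a small Python sketch that imitates the behaviour described above on an in-memory list. The function name `avg_age` and the sample documents are made up for illustration; this is not how Elasticsearch computes the result internally.

```python
def avg_age(docs, missing=None):
    """Average the 'age' field; docs without it are skipped unless `missing` is set."""
    values = []
    for doc in docs:
        if "age" in doc:
            values.append(doc["age"])
        elif missing is not None:
            values.append(missing)   # substitute the default for the missing field
    return sum(values) / len(values) if values else None

# Hypothetical sample documents: the third one has no age field.
docs = [{"age": 20}, {"age": 30}, {"name": "no age here"}]

print(avg_age(docs))              # third document ignored
print(avg_age(docs, missing=18))  # third document counted as age 18
```

Without `missing`, only two documents contribute to the average. With `missing` set to 18, all three do, so the average drops.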
Document count
POST /bank/_doc/_count
{
"query": {
"match": {
"age" : 24
}
}
}
Value Count: counts the number of documents that have a value in a field
POST /bank/_search?size=0
{
"aggs" : {
"age_count" : { "value_count" : { "field" : "age" } }
}
}
Cardinality: counts the number of distinct values
POST /bank/_search?size=0
{
"aggs": {
"age_count": {
"cardinality": {
"field": "age"
}
},
"state_count": {
"cardinality": {
"field": "state.keyword"
}
}
}
}
// For state, use its keyword sub-field
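The difference between value count and cardinality can be illustrated with a short Python sketch. It assumes single-valued fields and computes exact results on made-up documents. As far as I know, Elasticsearch computes cardinality approximately on large data sets, so treat this as a model of the meaning rather than of the implementation.

```python
def value_count(docs, field):
    """Number of documents that have a value for `field` (single-valued fields assumed)."""
    return sum(1 for doc in docs if doc.get(field) is not None)

def cardinality(docs, field):
    """Number of distinct values of `field` (exact, unlike an approximate engine)."""
    return len({doc[field] for doc in docs if doc.get(field) is not None})

# Hypothetical sample documents
docs = [
    {"age": 24, "state": "CA"},
    {"age": 24, "state": "NY"},
    {"age": 31},
    {"state": "CA"},
]

print(value_count(docs, "age"), cardinality(docs, "age"))
print(value_count(docs, "state"), cardinality(docs, "state"))
```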
Stats: returns five values: count, max, min, avg, sum
POST /bank/_search?size=0
{
"aggs": {
"age_stats": {
"stats": {
"field": "age"
}
}
}
}
Extended stats
Extended statistics return 4 more results than stats: sum of squares, variance, standard deviation, and the interval of the mean plus/minus two standard deviations.
POST /bank/_search?size=0
{
"aggs": {
"age_stats": {
"extended_stats": {
"field": "age"
}
}
}
}
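The four extra results can be written out in a few lines of Python. This sketch assumes the population variance (dividing by the count) and the default factor of two standard deviations. The result key names imitate the response fields but should be read as illustrative, and the input values are made up.

```python
import math

def extended_stats(values, sigma=2.0):
    """Compute stats plus the four extended results for a list of numbers."""
    count = len(values)
    total = sum(values)
    avg = total / count
    variance = sum((v - avg) ** 2 for v in values) / count  # population variance (assumption)
    std = math.sqrt(variance)
    return {
        "count": count,
        "min": min(values),
        "max": max(values),
        "avg": avg,
        "sum": total,
        "sum_of_squares": sum(v * v for v in values),
        "variance": variance,
        "std_deviation": std,
        "std_deviation_bounds": {"upper": avg + sigma * std, "lower": avg - sigma * std},
    }

result = extended_stats([2, 4, 4, 4, 5, 5, 7, 9])  # made-up sample values
print(result)
```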
Percentiles: the values at the given percentiles
For the values of the specified field (or script), accumulate, from the smallest value to the largest, the share of documents at each value (as a percentage of all hit documents), and return the value at which the cumulative share reaches each requested percentile. By default it returns the values at the percentiles [1, 5, 25, 50, 75, 95, 99]. The result below can be read as: 50% of the documents have an age <= 31, or equivalently, documents with age <= 31 make up 50% of all hit documents.
POST /bank/_search?size=0
{
"aggs": {
"age_percents": {
"percentiles": {
"field": "age"
}
}
}
}
"aggregations": {
"age_percents": {
"values": {
"1.0": 20,
"5.0": 21,
"25.0": 25,
"50.0": 31,
"75.0": 35,
"95.0": 39,
"99.0": 40
}
}
}
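The "cumulative share" reading of a percentile can be sketched in Python with the simple nearest-rank method. This is an exact calculation on a made-up list of ages. Elasticsearch itself uses an approximation algorithm on large data sets, so its results may differ slightly from this sketch.

```python
import math

def percentile(values, percent):
    """Nearest-rank percentile: the smallest value whose cumulative share reaches `percent`."""
    ordered = sorted(values)
    rank = max(1, math.ceil(percent / 100 * len(ordered)))  # 1-based position in the sorted list
    return ordered[rank - 1]

ages = [20, 21, 25, 25, 31, 31, 35, 39, 40, 40]  # hypothetical ages

for p in (1, 25, 50, 99):
    print(p, percentile(ages, p))
```

With these ten ages, the 50th percentile is 31 because five of the ten values (50%) are less than or equal to 31 only once the first 31 is included.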
POST /bank/_search?size=0
{
"aggs": {
"age_percents": {
"percentiles": {
"field": "age",
"percents" : [95, 99, 99.9]
}
}
}
}
// Specify the percentiles to return
Percentile Ranks: the percentage of documents whose value is less than or equal to the specified value
POST /bank/_search?size=0
{
"aggs": {
"age_perc_rank": {
"percentile_ranks": {
"field": "age",
"values": [
25,
30
]
}
}
}
}
"aggregations": {
"age_perc_rank": {
"values": {
"25.0": 26.1,
"30.0": 49.3
}
}
}
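A percentile rank is the inverse question of a percentile: given a value, what share of documents is at or below it? The Python sketch below computes this exactly on a made-up list of ages. The function name is hypothetical and the numbers are not taken from the bank index.

```python
def percentile_rank(values, threshold):
    """Percentage of values that are less than or equal to `threshold`."""
    at_or_below = sum(1 for v in values if v <= threshold)
    return 100.0 * at_or_below / len(values)

ages = [20, 21, 25, 25, 31, 31, 35, 39, 40, 40]  # hypothetical ages

print(percentile_rank(ages, 25))
print(percentile_rank(ages, 31))
```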
Geo Bounds aggregation: finds the bounding range of the coordinate points in the document set
Geo Centroid aggregation: finds the coordinates of the center point
Bucket aggregation
Terms Aggregation: groups and aggregates by the values of a field
POST /bank/_search?size=0
{
"aggs": {
"age_terms": {
"terms": {
"field": "age"
}
}
}
}
"aggregations": {
"age_terms": {
"doc_count_error_upper_bound": 0, // maximum error in the document counts
"sum_other_doc_count": 463, // number of documents in the other terms that were not returned
"buckets": [ // by default, the top 10 groups by document count, from high to low
{
"key": 31,
"doc_count": 61
},
{
"key": 39,
"doc_count": 60
},
{
"key": 26,
"doc_count": 59
},
….
]
}
}
- size specifies how many groups to return
POST /bank/_search?size=0
{
"aggs": {
"age_terms": {
"terms": {
"field": "age",
"size": 20
}
} }}
POST /bank/_search?size=0
{
"aggs": {
"age_terms": {
"terms": {
"field": "age",
"size": 5,
"shard_size":20
}
} }}
// shard_size specifies how many groups each shard returns
// Default value of shard_size: if the index has only one shard: = size; with multiple shards: = size * 1.5 + 10
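The default described in the comment above can be written as a small Python function. The function name is hypothetical, and truncating the result to an integer is an assumption of this sketch, since the source only gives the formula.

```python
def default_shard_size(size, number_of_shards):
    """Default shard_size as described above: size for one shard, else size * 1.5 + 10."""
    if number_of_shards == 1:
        return size
    return int(size * 1.5 + 10)  # truncation to an integer is an assumption

print(default_shard_size(5, 1))
print(default_shard_size(5, 3))
print(default_shard_size(10, 2))
```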
POST /bank/_search?size=0
{
"aggs": {
"age_terms": {
"terms": {
"field": "age",
"size": 5,
"shard_size":20,
"show_term_doc_count_error": true
} } }}
// Show the error bound on each group
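Why document counts can be off, and why a larger shard_size helps, is easier to see in a toy simulation. The Python sketch below assumes a simplified model: each shard reports only its top shard_size terms and a coordinating step sums what it receives. The shard contents and function names are invented for illustration.

```python
from collections import Counter

def shard_top_terms(values, shard_size):
    """Each shard reports only its `shard_size` most frequent terms."""
    return dict(Counter(values).most_common(shard_size))

def merge_shard_results(shards, shard_size):
    """Sum the per-shard counts; terms a shard did not report contribute nothing."""
    merged = Counter()
    for values in shards:
        merged.update(shard_top_terms(values, shard_size))
    return dict(merged)

# Hypothetical data: "b" and "c" rank differently on the two shards.
shard_1 = ["a"] * 5 + ["b"] * 4 + ["c"] * 3
shard_2 = ["a"] * 5 + ["c"] * 4 + ["b"] * 3

print(merge_shard_results([shard_1, shard_2], shard_size=2))  # "b" and "c" undercounted
print(merge_shard_results([shard_1, shard_2], shard_size=3))  # all counts exact
```

With shard_size 2, each shard drops its third-ranked term, so "b" and "c" are reported as 4 instead of their true count of 7. Raising shard_size to 3 removes the error at the cost of more data sent per shard.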
- order specifies the sort order of the groups
POST /bank/_search?size=0
{
"aggs": {
"age_terms": {
"terms": {
"field": "age",
"order" : { "_count" : "asc" }
}
}
}
}
// Sort by document count
POST /bank/_search?size=0
{
"aggs": {
"age_terms": {
"terms": {
"field": "age",
"order" : { "_key" : "asc" }
}
}
}
}
// Sort by group key
- Compute metric values for each group
POST /bank/_search?size=0
{
"aggs": {
"age_terms": {
"terms": {
"field": "age",
"order": {
"max_balance": "asc"
}
},
"aggs": {
"max_balance": {
"max": {
"field": "balance"
}
},
"min_balance": {
"min": {
"field": "balance"
}
} } } }}
- Sort by a metric value of the group
POST /bank/_search?size=0
{
"aggs": {
"age_terms": {
"terms": {
"field": "age",
"order": {
"max_balance": "asc"
}
},
"aggs": {
"max_balance": {
"max": {
"field": "balance"
}
}
}
} }}
POST /bank/_search?size=0
{
"aggs": {
"age_terms": {
"terms": {
"field": "age",
"order": {
"stats_balance.max": "asc"
}
},
"aggs": {
"stats_balance": {
"stats": {
"field": "balance"
}
}
}
} }}
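Ordering groups by a sub-aggregation amounts to "group, compute the metric per group, then sort by that metric". A minimal Python sketch of that idea on made-up documents follows. The function name and the bucket layout are illustrative and do not mirror the exact response format.

```python
from collections import defaultdict

def terms_ordered_by_max(docs, group_field, metric_field):
    """Group docs by `group_field`, compute max of `metric_field`, sort ascending by it."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[group_field]].append(doc[metric_field])
    buckets = [
        {"key": key, "doc_count": len(vals), "max_balance": max(vals)}
        for key, vals in groups.items()
    ]
    return sorted(buckets, key=lambda bucket: bucket["max_balance"])

# Hypothetical sample documents
docs = [
    {"age": 24, "balance": 1000},
    {"age": 24, "balance": 5000},
    {"age": 31, "balance": 3000},
    {"age": 39, "balance": 800},
]

for bucket in terms_ordered_by_max(docs, "age", "balance"):
    print(bucket)
```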
- Filter groups
POST /bank/_search?size=0
{
"aggs": {
"age_terms": {
"terms": {
"field": "age",
"min_doc_count": 60
}
}
}
}
// Filter by document count
POST /bank/_search?size=0
{
"aggs": {
"age_terms": {
"terms": {
"field": "age",
"include": [20,24]
}
}
}
}
// Keep only the values in the specified list
GET /_search
{
"aggs" : {
"tags" : {
"terms" : {
"field" : "tags",
"include" : ".*sport.*",
"exclude" : "water_.*"
}
}
}
}
// Match values with regular expressions
GET /_search
{
"aggs" : {
"JapaneseCars" : {
"terms" : {
"field" : "make",
"include" : ["mazda", "honda"]
}
},
"ActiveCarManufacturers" : {
"terms" : {
"field" : "make",
"exclude" : ["rover", "jensen"]
}
}
}
}
// Specify lists of values
- Group by script calculated value
GET /_search
{
"aggs" : {
"genres" : {
"terms" : {
"script" : {
"source": "doc['genre'].value",
"lang": "painless"
}
}
}
}
}
- Handling missing values
GET /_search
{
"aggs" : {
"tags" : {
"terms" : {
"field" : "tags",
"missing": "N/A"
}
}
}
}
Filter Aggregation: aggregates over the documents that match a filter query
From the documents hit by the query, select those that meet the filter criteria and aggregate over them.
POST /bank/_search?size=0
{
"aggs": {
"age_terms": {
"filter": {"match":{"gender":"F"}},
"aggs": {
"avg_age": {
"avg": {
"field": "age"
}
}
}
}
}
}
Filters Aggregation: aggregates over multiple filter groups
PUT /logs/_doc/_bulk?refresh
{ "index" : { "_id" : 1 } }
{ "body" : "warning: page could not be rendered" }
{ "index" : { "_id" : 2 } }
{ "body" : "authentication error" }
{ "index" : { "_id" : 3 } }
{ "body" : "warning: connection timed out" }
GET logs/_search
{
"size": 0,
"aggs" : {
"messages" : {
"filters" : {
"filters" : {
"errors" : { "match" : { "body" : "error" }},
"warnings" : { "match" : { "body" : "warning" }}
}
} } }}
GET logs/_search
{
"size": 0,
"aggs" : {
"messages" : {
"filters" : {
"other_bucket_key": "other_messages",
"filters" : {
"errors" : { "match" : { "body" : "error" }},
"warnings" : { "match" : { "body" : "warning" }}
}
}
}
}
}
// Specify a key for the bucket that collects the other documents
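The bucketing logic of a filters aggregation with an "other" bucket can be imitated in Python. The sketch assumes a very rough stand-in for text analysis (lowercased word tokens) and a match on a single term. The fourth document is a hypothetical addition so that the other bucket is not empty.

```python
import re

def analyze(text):
    """Rough stand-in for text analysis: the set of lowercased word tokens."""
    return set(re.findall(r"\w+", text.lower()))

def filters_agg(docs, filters, other_bucket_key=None):
    """Count docs per named filter; unmatched docs go to the optional other bucket."""
    counts = {name: 0 for name in filters}
    if other_bucket_key is not None:
        counts[other_bucket_key] = 0
    for doc in docs:
        words = analyze(doc["body"])
        hits = [name for name, term in filters.items() if term in words]
        for name in hits:
            counts[name] += 1
        if not hits and other_bucket_key is not None:
            counts[other_bucket_key] += 1
    return counts

docs = [
    {"body": "warning: page could not be rendered"},
    {"body": "authentication error"},
    {"body": "warning: connection timed out"},
    {"body": "info: nothing to report"},  # hypothetical extra document
]
filters = {"errors": "error", "warnings": "warning"}

print(filters_agg(docs, filters, other_bucket_key="other_messages"))
```

A document may land in several named buckets if it matches several filters. It lands in the other bucket only when it matches none.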
Range Aggregation
POST /bank/_search?size=0
{
"aggs": {
"age_range": {
"range": {
"field": "age",
"ranges": [
{"to":25},
{"from": 25,"to": 35},
{"from": 35}
]
},
"aggs": {
"bmax": {
"max": {
"field": "balance"
}
}
} } }}
POST /bank/_search?size=0
{
"aggs": {
"age_range": {
"range": {
"field": "age",
"keyed": true,
"ranges": [
{"to":25,"key": "Ld"},
{"from": 25,"to": 35,"key": "Md"},
{"from": 35,"key": "Od"}
]
}
}
}
}
// Specify a key for each group
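One detail worth spelling out is how boundary values are assigned: as far as I know, a range bucket includes its from value and excludes its to value, so age 25 falls in the middle bucket and age 35 in the last. The Python sketch below models this on made-up ages and reuses the keys from the request above. The function name is hypothetical.

```python
def range_buckets(values, ranges):
    """Count values per range; `from` is inclusive and `to` is exclusive."""
    result = {}
    for r in ranges:
        low = r.get("from", float("-inf"))   # open-ended when "from" is absent
        high = r.get("to", float("inf"))     # open-ended when "to" is absent
        result[r["key"]] = sum(1 for v in values if low <= v < high)
    return result

ranges = [
    {"to": 25, "key": "Ld"},
    {"from": 25, "to": 35, "key": "Md"},
    {"from": 35, "key": "Od"},
]
ages = [20, 24, 25, 30, 34, 35, 40]  # hypothetical ages

print(range_buckets(ages, ranges))
```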
Date Range Aggregation: groups and aggregates by time range
POST /sales/_search?size=0
{
"aggs": {
"range": {
"date_range": {
"field": "date",
"format": "MM-yyy",
"ranges": [
{ "to": "now-10M/M" },
{ "from": "now-10M/M" }
]
}
}
}
}
Date Histogram Aggregation: time histogram (bar chart) aggregation
It aggregates statistics by time interval, such as day, month, or year. The interval can be a year (1y), quarter (1q), month (1M), week (1w), day (1d), hour (1h), minute (1m), second (1s), or another specified time interval.
POST /sales/_search?size=0
{
"aggs" : {
"sales_over_time" : {
"date_histogram" : {
"field" : "date",
"interval" : "month"
}
}
}
}
POST /sales/_search?size=0
{
"aggs" : {
"sales_over_time" : {
"date_histogram" : {
"field" : "date",
"interval" : "90m"
}
}
}
}
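For a fixed interval such as 90m, bucketing can be modelled as rounding each timestamp down to a multiple of the interval. The Python sketch below assumes buckets are aligned to the Unix epoch in UTC and lists only non-empty buckets. The timestamps and the function name are invented for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

def fixed_interval_histogram(timestamps, interval):
    """Count timestamps per bucket, where each bucket starts at a multiple of `interval`."""
    step = int(interval.total_seconds())
    counts = Counter()
    for ts in timestamps:
        epoch = int(ts.timestamp())
        counts[epoch - epoch % step] += 1   # round down to the bucket start
    return {
        datetime.fromtimestamp(start, tz=timezone.utc): count
        for start, count in sorted(counts.items())
    }

utc = timezone.utc
sales = [  # hypothetical sale times
    datetime(2015, 1, 1, 0, 10, tzinfo=utc),
    datetime(2015, 1, 1, 1, 20, tzinfo=utc),
    datetime(2015, 1, 1, 1, 40, tzinfo=utc),
    datetime(2015, 1, 1, 3, 5, tzinfo=utc),
]

histogram = fixed_interval_histogram(sales, timedelta(minutes=90))
for start, count in histogram.items():
    print(start.isoformat(), count)
```

Calendar intervals such as month cannot be handled by this rounding, because months have different lengths. The sketch covers only fixed-length intervals.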
Missing Aggregation: a bucket for documents with a missing value
Documents that are missing a value for the specified field are collected into one bucket for aggregation analysis.
POST /bank/_search?size=0
{
"aggs" : {
"account_without_a_age" : {
"missing" : { "field" : "age" }
}
}
}