Elasticsearch Study Notes (6): Aggregation Analysis

Introduction to Aggregation Analysis

What is ES aggregation analysis?

Aggregation analysis is an important database feature. It computes aggregates over the data set returned by a query, for example the maximum and minimum of a field (or of a computed expression), the sum, or the average. As both a search engine and a database, ES also provides powerful aggregation analysis capabilities.

  • Aggregations that compute metrics such as the maximum, minimum, sum, and average of a data set are called metric aggregations in ES.
  • Besides aggregate functions, a relational database can also group the queried data with GROUP BY and then aggregate each group. In ES, this grouping is called bucketing, and the corresponding aggregation is called a bucket aggregation.

ES also provides matrix aggregations and pipeline aggregations, but these are still being improved.

ES aggregation analysis query writing

In the request body of a query, define the aggregation analysis under the "aggregations" node with the following syntax:

"aggregations" : {
    "<aggregation_name>" : {
        "<aggregation_type>" : {
            <aggregation_body>
        }
        [,"meta" : {  [<meta_data_body>] } ]?
        [,"aggregations" : { [<sub_aggregation>]+ } ]?
    }
    [,"<aggregation_name_2>" : { ... } ]*
}
// aggregations can also be abbreviated as aggs
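The nested structure above can also be assembled programmatically. The Python sketch below only builds the request body as a dict and prints it as JSON; the helper name agg is hypothetical and not part of any Elasticsearch client.

```python
import json


def agg(agg_type, body, sub_aggs=None, meta=None):
    """Build one aggregation node: {agg_type: body, "meta": ..., "aggs": ...}."""
    node = {agg_type: body}
    if meta is not None:
        node["meta"] = meta  # optional metadata, returned with the result
    if sub_aggs:
        node["aggs"] = sub_aggs  # optional sub-aggregations, keyed by name
    return node


# Hypothetical request: bucket by age, then take the max balance per bucket
request = {
    "size": 0,
    "aggs": {
        "age_terms": agg(
            "terms",
            {"field": "age"},
            sub_aggs={"max_balance": agg("max", {"field": "balance"})},
        ),
    },
}

print(json.dumps(request, indent=2))
```

Each aggregation name maps to exactly one aggregation type, and sub-aggregations hang off the same node under "aggs", which is why the helper returns a single-key dict plus optional siblings.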

Value source for aggregate analysis

The values being aggregated can come either from a field or from the result of a script.

Metric aggregation

max, min, sum, avg

POST /bank/_search
{
  "size": 0,
  "aggs": {
    "max_balance": {
      "max": {
        "field": "balance"
      }
    }
  }
}
// Maximum balance across all customers
POST /bank/_search
{
  "size": 2, 
  "query": {
    "match": {
      "age": 24
    }
  },
  "sort": [
    {
      "balance": {
        "order": "desc"
      }
    }
  ],
  "aggs": {
    "max_balance": {
      "max": {
        "field": "balance"
      }
    }
  }
}
// Maximum balance among customers aged 24; the hits themselves are sorted by balance and limited to 2
POST /bank/_search?size=0
{
    "aggs" : {
        "avg_age" : {
            "avg" : {
                "script" : {
                    "source" : "doc.age.value"
                }
            }
        },
        "avg_age10" : {
            "avg" : {
                "script" : {
                    "source" : "doc.age.value + 10"
                }
            }
        }
    }
}
// The values come from scripts
// Average age of all customers (avg_age10 adds 10 to each age before averaging)
POST /bank/_search?size=0
{
  "aggs": {
    "sum_balance": {
      "sum": {
        "field": "balance",
        "script": {
            "source": "_value * 1.03"
        }
      }
    }
  }
}
// With field set, the script reads the field's value through _value (here each balance is multiplied by 1.03 before summing)
POST /bank/_search?size=0
{
  "aggs": {
    "avg_age": {
      "avg": {
        "field": "age",
        "missing": 18
      }
    }
  }
}
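// missing supplies a value for documents that lack the field. If it is not set, documents without a value for the field are ignored.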
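To make the field, script and missing options concrete, here is a minimal Python sketch that computes max, min, sum and avg over an in-memory list of documents. It imitates the semantics described above rather than calling Elasticsearch; the docs list and the metric helper are hypothetical, a value script is modelled as a Python function that receives _value, and returning None when no values remain is a simplification.

```python
from statistics import mean

# Hypothetical bank documents; the third one has no "age" field
docs = [
    {"age": 24, "balance": 1000.0},
    {"age": 30, "balance": 2500.0},
    {"balance": 400.0},
]

OPS = {"max": max, "min": min, "sum": sum, "avg": mean}


def metric(docs, op, field, missing=None, value_script=None):
    """Compute one metric over `field`, mimicking the missing and value-script options."""
    values = []
    for doc in docs:
        value = doc.get(field, missing)  # "missing" substitutes a value for absent fields
        if value is None:
            continue  # without "missing", such documents are ignored
        if value_script is not None:
            value = value_script(value)  # like a script reading the field via _value
        values.append(value)
    return OPS[op](values) if values else None


print(metric(docs, "avg", "age"))               # ignores the document without age
print(metric(docs, "avg", "age", missing=18))   # treats it as 18
print(metric(docs, "sum", "balance", value_script=lambda v: v * 1.03))
```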

Document count

POST /bank/_doc/_count
{
  "query": {
    "match": {
      "age" : 24
    }
  }
}

Value count: counts the values in a field (for a single-valued field, the number of documents that have a value)

POST /bank/_search?size=0
{
    "aggs" : {
        "age_count" : { "value_count" : { "field" : "age" } }
    }
}
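Strictly speaking, value_count counts the values extracted from the field rather than documents; for a single-valued field such as age the two numbers are the same. A simplified Python sketch of that distinction, with hypothetical documents and ignoring doc-values details such as how repeated values are stored:

```python
def value_count(docs, field):
    """Count values of `field`; a list (multi-valued field) contributes one per element."""
    count = 0
    for doc in docs:
        value = doc.get(field)
        if value is None:
            continue  # documents without the field add nothing
        count += len(value) if isinstance(value, list) else 1
    return count


# Hypothetical documents: two have an age, one has a two-element tags list
docs = [{"age": 24}, {"age": 31}, {"name": "no age"}, {"tags": ["a", "b"]}]

print(value_count(docs, "age"))
print(value_count(docs, "tags"))
```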

Cardinality: count of distinct values

POST /bank/_search?size=0
{
  "aggs": {
    "age_count": {
      "cardinality": {
        "field": "age"
      }
    },
    "state_count": {
      "cardinality": {
        "field": "state.keyword"
      }
    }
  }
}
// For state, aggregate on its keyword sub-field (state.keyword)
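Elasticsearch computes cardinality approximately (memory use and accuracy are tuned with the precision_threshold option), so large counts may be slightly off. For comparison, the exact distinct count that it approximates can be sketched in Python with a set; the documents below are hypothetical.

```python
def exact_cardinality(docs, field):
    """Exact number of distinct values of `field` across all documents."""
    distinct = set()
    for doc in docs:
        value = doc.get(field)
        if value is None:
            continue
        distinct.update(value if isinstance(value, list) else [value])
    return len(distinct)


# Hypothetical documents: ages {24, 31}, states {TX, CA, NY}
docs = [
    {"age": 24, "state": "TX"},
    {"age": 24, "state": "CA"},
    {"age": 31, "state": "TX"},
    {"state": "NY"},
]

print(exact_cardinality(docs, "age"), exact_cardinality(docs, "state"))
```

An exact set grows with the number of distinct values, which is why a search engine aggregating over many shards trades a little accuracy for bounded memory.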

Stats: count, max, min, avg, and sum (5 values) in one aggregation

POST /bank/_search?size=0
{
  "aggs": {
    "age_stats": {
      "stats": {
        "field": "age"
      }
    }
  }
}
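The five stats values can be computed in one pass over the field values. A minimal Python sketch with made-up ages; for an empty input it returns null-like values for min, max and avg and 0 for count and sum, which is an assumption about the empty case:

```python
def stats(values):
    """Return count, min, max, avg and sum, like the stats aggregation."""
    if not values:
        return {"count": 0, "min": None, "max": None, "avg": None, "sum": 0}
    total = sum(values)
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": total / len(values),
        "sum": total,
    }


print(stats([20, 25, 31, 40]))  # hypothetical ages
```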

Extended stats

Extended statistics return four more results than stats: the sum of squares, the variance, the standard deviation, and the interval of the mean plus/minus two standard deviations.

POST /bank/_search?size=0
{
  "aggs": {
    "age_stats": {
      "extended_stats": {
        "field": "age"
      }
    }
  }
}
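The extra extended_stats fields follow from the basic ones. This Python sketch assumes population variance (dividing by the count) and a sigma of 2 for the bounds, which is the default of the sigma option; the sample values are made up and at least one value is assumed.

```python
import math


def extended_stats(values, sigma=2.0):
    """Basic stats plus sum of squares, variance, std deviation and +/- sigma bounds."""
    n = len(values)  # assumes at least one value
    total = sum(values)
    avg = total / n
    sum_of_squares = sum(v * v for v in values)
    variance = sum_of_squares / n - avg * avg  # population variance
    std = math.sqrt(variance)
    return {
        "count": n,
        "min": min(values),
        "max": max(values),
        "avg": avg,
        "sum": total,
        "sum_of_squares": sum_of_squares,
        "variance": variance,
        "std_deviation": std,
        "std_deviation_bounds": {"upper": avg + sigma * std, "lower": avg - sigma * std},
    }


print(extended_stats([2, 4, 4, 4, 5, 5, 7, 9]))
```

For these values the mean is 5 and the standard deviation is 2, so the bounds are 1 and 9.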

Percentiles: the values at given percentiles

For the values of the specified field (or script), the documents are ordered by value from smallest to largest and the share of documents seen so far (as a percentage of all hit documents) is accumulated; the value reached at each requested percentage is returned. By default, the percentiles [1, 5, 25, 50, 75, 95, 99] are returned. The "50.0": 31 entry in the result below means that 50% of the documents have an age <= 31; equivalently, documents with age <= 31 make up 50% of all hit documents.

POST /bank/_search?size=0
{
  "aggs": {
    "age_percents": {
      "percentiles": {
        "field": "age"
      }
    }
  }
}
 "aggregations": {
    "age_percents": {
      "values": {
        "1.0": 20,
        "5.0": 21,
        "25.0": 25,
        "50.0": 31,
        "75.0": 35,
        "95.0": 39,
        "99.0": 40
      }
    }
  }
POST /bank/_search?size=0
{
  "aggs": {
    "age_percents": {
      "percentiles": {
        "field": "age",
        "percents" : [95, 99, 99.9] 
      }
    }
  }
}
// Specify which percentiles to return
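Elasticsearch estimates percentiles with an approximate algorithm (TDigest), so its results can differ slightly from an exact calculation. The Python sketch below computes exact percentiles with linear interpolation between neighboring ranks, one common definition chosen here as an assumption, and formats the keys like the response above.

```python
def percentile(values, p):
    """Exact p-th percentile using linear interpolation between closest ranks."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * p / 100.0
    lo = int(pos)  # floor, since pos >= 0
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] + (ordered[hi] - ordered[lo]) * frac


def percentiles(values, percents=(1, 5, 25, 50, 75, 95, 99)):
    """Map "p.0" keys to percentile values, like the aggregation response."""
    return {f"{float(p)}": percentile(values, p) for p in percents}


print(percentiles([1, 2, 3, 4], percents=(50,)))
```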

Percentile ranks: the percentage of documents whose value is less than or equal to each given value

POST /bank/_search?size=0
{
  "aggs": {
    "age_perc_rank": {
      "percentile_ranks": {
        "field": "age",
        "values": [
          25,
          30
        ]
      }
    }
  }
}
"aggregations": {
    "age_perc_rank": {
      "values": {
        "25.0": 26.1,
        "30.0": 49.3
      }
    }
  }
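Percentile ranks are the inverse question: given a value, what share of documents is at or below it. An exact Python sketch with hypothetical ages follows; like percentiles, the real aggregation is approximate, so Elasticsearch's numbers need not match this exactly.

```python
from bisect import bisect_right


def percentile_ranks(values, targets):
    """Percentage of values <= each target, keyed like the aggregation response."""
    ordered = sorted(values)
    n = len(ordered)
    # bisect_right gives the number of values <= t in the sorted list
    return {f"{float(t)}": 100.0 * bisect_right(ordered, t) / n for t in targets}


ages = [20, 22, 25, 28, 30, 35, 40, 41]  # hypothetical ages
print(percentile_ranks(ages, [25, 30]))
```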

Geo Bounds aggregation: the bounding box that encloses all geo points in the document set

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-geobounds-aggregation.html

Geo Centroid aggregation: the coordinates of the center point (centroid) of the geo points

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-geocentroid-aggregation.html

Bucket aggregation

Terms aggregation: grouping and aggregating by field values (terms)

POST /bank/_search?size=0
{
  "aggs": {
    "age_terms": {
      "terms": {
        "field": "age"
      }
    }
  }
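}
 "aggregations": {
    "age_terms": {
      "doc_count_error_upper_bound": 0,   // upper bound on the error in each term's document count
      "sum_other_doc_count": 463,         // total documents in terms that were not returned
      "buckets": [                        // by default, the top 10 buckets by document count, highest first
        {
          "key": 31,
          "doc_count": 61
        },
        {
          "key": 39,
          "doc_count": 60
        },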
        },
        {
          "key": 26,
          "doc_count": 59
        },
        ….
       ]
    }
  }
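The bucket list above can be reproduced in miniature with Python's collections.Counter. This is a single-shard illustration with hypothetical data: it sorts by document count (ties broken by ascending key, as assumed here) or by key, keeps the top size buckets, and reports the remainder as sum_other_doc_count.

```python
from collections import Counter


def terms_agg(values, size=10, order=("_count", "desc")):
    """Top-`size` term buckets from a flat list of field values."""
    counts = Counter(values)
    field, direction = order
    if field == "_count":
        sign = -1 if direction == "desc" else 1
        # ties on doc_count are broken by ascending key
        buckets = sorted(counts.items(), key=lambda kv: (sign * kv[1], kv[0]))
    else:  # "_key"
        buckets = sorted(counts.items(), reverse=(direction == "desc"))
    top = buckets[:size]
    return {
        "sum_other_doc_count": sum(counts.values()) - sum(c for _, c in top),
        "buckets": [{"key": k, "doc_count": c} for k, c in top],
    }


ages = [31] * 3 + [39] * 2 + [26] * 2 + [40]  # hypothetical ages
print(terms_agg(ages, size=2))
print(terms_agg(ages, size=2, order=("_key", "asc")))
```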
  • size: how many buckets to return
POST /bank/_search?size=0
{
  "aggs": {
    "age_terms": {
      "terms": {
        "field": "age",
        "size": 20
      }
    }
  }
}
POST /bank/_search?size=0
{
  "aggs": {
    "age_terms": {
      "terms": {
        "field": "age",
        "size": 5,
        "shard_size":20
      }
    }
  }
}
// shard_size: how many buckets each shard returns
// Default shard_size: size if the index has a single shard; size * 1.5 + 10 if it has multiple shards
POST /bank/_search?size=0
{
  "aggs": {
    "age_terms": {
      "terms": {
        "field": "age",
        "size": 5,
        "shard_size":20,
        "show_term_doc_count_error": true
      }
    }
  }
}
// Show the count error bound on each bucket
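Why shard_size matters can be seen in a small simulation. In this Python sketch, in-memory lists stand in for shards and the helper names are hypothetical: each shard reports only its top shard_size terms and the merge step sums what it receives, so a term that misses the cut on one shard is undercounted. default_shard_size encodes the rule quoted above; truncating to an integer is an assumption.

```python
from collections import Counter


def default_shard_size(size, num_shards):
    """Rule from these notes: size for one shard, size * 1.5 + 10 otherwise."""
    return size if num_shards == 1 else int(size * 1.5 + 10)


def terms_across_shards(shards, size, shard_size):
    """Merge per-shard top-`shard_size` term counts, then keep the top `size`."""
    merged = Counter()
    for shard in shards:
        top = Counter(shard).most_common(shard_size)  # what this shard reports
        merged.update(dict(top))
    return merged.most_common(size)


# Hypothetical shards: true totals are a=5, b=5, c=9
shards = [["a"] * 5 + ["b"] * 4 + ["c"] * 3, ["c"] * 6 + ["b"]]

print(terms_across_shards(shards, size=2, shard_size=1))  # c is undercounted as 6
print(terms_across_shards(shards, size=3, shard_size=3))  # exact counts
```

Raising shard_size makes each shard send more candidates, which reduces this error at the cost of more work per request.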
  • order: how the buckets are sorted
POST /bank/_search?size=0
{
  "aggs": {
    "age_terms": {
      "terms": {
        "field": "age",
        "order" : { "_count" : "asc" }
      }
    }
  }
}
// Sort buckets by document count
POST /bank/_search?size=0
{
  "aggs": {
    "age_terms": {
      "terms": {
        "field": "age",
        "order" : { "_key" : "asc" }
      }
    }
  }
}
// Sort buckets by key (the field value)
  • Compute metrics for each bucket with sub-aggregations
POST /bank/_search?size=0
{
  "aggs": {
    "age_terms": {
      "terms": {
        "field": "age",
        "order": {
          "max_balance": "asc"
        }
      },
      "aggs": {
        "max_balance": {
          "max": {
            "field": "balance"
          }
        },
        "min_balance": {
          "min": {
            "field": "balance"
          }
        }
      }
    }
  }
}
  • Sort buckets by a sub-aggregation's metric value
POST /bank/_search?size=0
{
  "aggs": {
    "age_terms": {
      "terms": {
        "field": "age",
        "order": {
          "max_balance": "asc"
        }
      },
      "aggs": {
        "max_balance": {
          "max": {
            "field": "balance"
          }
        }
      }
    }
  }
}
POST /bank/_search?size=0
{
  "aggs": {
    "age_terms": {
      "terms": {
        "field": "age",
        "order": {
          "stats_balance.max": "asc"
        }
      },
      "aggs": {
        "stats_balance": {
          "stats": {
            "field": "balance"
          }
        }
      }
    }
  }
}
  • Filter buckets
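Ordering by a sub-aggregation means each bucket's metric is computed first and the buckets are then sorted on it. A minimal Python sketch of the max_balance ordering above, using hypothetical documents and a hypothetical helper name:

```python
from collections import defaultdict


def terms_ordered_by_max(docs, group_field, metric_field, direction="asc"):
    """Group docs by `group_field`, compute max(`metric_field`) per bucket, sort on it."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[group_field]].append(doc[metric_field])
    buckets = [
        {"key": key, "doc_count": len(vals), "max_balance": max(vals)}
        for key, vals in groups.items()
    ]
    buckets.sort(key=lambda b: b["max_balance"], reverse=(direction == "desc"))
    return buckets


docs = [  # hypothetical customers
    {"age": 24, "balance": 100},
    {"age": 24, "balance": 900},
    {"age": 30, "balance": 500},
    {"age": 35, "balance": 50},
]

for bucket in terms_ordered_by_max(docs, "age", "balance"):
    print(bucket)
```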
POST /bank/_search?size=0
{
  "aggs": {
    "age_terms": {
      "terms": {
        "field": "age",
        "min_doc_count": 60
      }
    }
  }
}
// Keep only buckets with at least 60 documents
POST /bank/_search?size=0
{
  "aggs": {
    "age_terms": {
      "terms": {
        "field": "age",
        "include": [20,24]
      }
    }
  }
}
// Keep only the listed values
GET /_search
{
    "aggs" : {
        "tags" : {
            "terms" : {
                "field" : "tags",
                "include" : ".*sport.*",
                "exclude" : "water_.*"
            }
        }
    }
}
// Include and exclude values that match regular expressions
GET /_search
{
    "aggs" : {
        "JapaneseCars" : {
             "terms" : {
                 "field" : "make",
                 "include" : ["mazda", "honda"]
             }
         },
        "ActiveCarManufacturers" : {
             "terms" : {
                 "field" : "make",
                 "exclude" : ["rover", "jensen"]
             }
         }
    }
}
// Include or exclude exact lists of values
  • Group by a value computed by a script
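The include/exclude rules can be sketched in Python as a filter over term values. Assumptions: Python's re.fullmatch stands in for Elasticsearch's regular-expression engine (the pattern must match the whole term, but the syntax is not identical), a list means exact values, and exclude takes precedence over include. The helper name is hypothetical.

```python
import re


def filter_terms(terms, include=None, exclude=None):
    """Keep terms allowed by include and not removed by exclude (exclude wins)."""

    def matches(term, rule):
        if isinstance(rule, str):
            return re.fullmatch(rule, term) is not None  # regex over the whole term
        return term in rule  # exact-value list

    return [
        t for t in terms
        if (include is None or matches(t, include))
        and (exclude is None or not matches(t, exclude))
    ]


tags = ["sport", "water_sports", "sporting", "tennis"]  # hypothetical tag values
print(filter_terms(tags, include=".*sport.*", exclude="water_.*"))
print(filter_terms(["mazda", "honda", "rover"], include=["mazda", "honda"]))
```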
GET /_search
{
    "aggs" : {
        "genres" : {
            "terms" : {
                "script" : {
                    "source": "doc['genre'].value",
                    "lang": "painless"
                }
            }
        }
    }
}
  • Handle missing values: documents without the field are put in a bucket with the given key
GET /_search
{
    "aggs" : {
        "tags" : {
             "terms" : {
                 "field" : "tags",
                 "missing": "N/A" 
             }
         }
    }
}

Filter aggregation: aggregates only the documents that match a filter

From the documents hit by the query, only those that match the filter condition are aggregated.

POST /bank/_search?size=0
{
  "aggs": {
    "age_terms": {
      "filter": {"match":{"gender":"F"}},
      "aggs": {
        "avg_age": {
          "avg": {
            "field": "age"
          }
        }
      }
    }
  }
}
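The filter aggregation narrows the hit documents first and then runs its sub-aggregations on what remains. A Python sketch with hypothetical documents; a plain equality predicate stands in for the match query on gender, which is an assumption (a real match query analyzes the text).

```python
from statistics import mean


def filter_agg(docs, predicate, field):
    """Count docs passing `predicate` and average `field` over them."""
    matched = [d for d in docs if predicate(d)]
    return {
        "doc_count": len(matched),
        "avg_age": mean(d[field] for d in matched) if matched else None,
    }


docs = [  # hypothetical customers
    {"gender": "F", "age": 30},
    {"gender": "M", "age": 40},
    {"gender": "F", "age": 20},
]

print(filter_agg(docs, lambda d: d["gender"] == "F", "age"))
```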

Filters aggregation: one bucket per filter, each aggregated separately

PUT /logs/_doc/_bulk?refresh
{ "index" : { "_id" : 1 } }
{ "body" : "warning: page could not be rendered" }
{ "index" : { "_id" : 2 } }
{ "body" : "authentication error" }
{ "index" : { "_id" : 3 } }
{ "body" : "warning: connection timed out" }

GET logs/_search
{
  "size": 0,
  "aggs" : {
    "messages" : {
      "filters" : {
        "filters" : {
          "errors" :   { "match" : { "body" : "error"   }},
          "warnings" : { "match" : { "body" : "warning" }}
        }
      }
    }
  }
}
GET logs/_search
{
  "size": 0,
  "aggs" : {
    "messages" : {
      "filters" : {
        "other_bucket_key": "other_messages",
        "filters" : {
          "errors" :   { "match" : { "body" : "error"   }},
          "warnings" : { "match" : { "body" : "warning" }}
        }
      }
    }
  }
}
// other_bucket_key names the extra bucket for documents that match none of the filters
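With filters, one document can land in several buckets, and the optional other bucket collects documents that match none. A Python sketch over the three log lines indexed above; lowercased word tokens are a rough stand-in for the analyzer behind the match query, which is an assumption, and the helper names are hypothetical.

```python
import re


def tokens(text):
    """Rough stand-in for text analysis: the set of lowercase word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def filters_agg(docs, filters, other_bucket_key=None):
    """Count docs per named filter (a word to match); optionally count the rest."""
    buckets = {name: 0 for name in filters}
    other = 0
    for doc in docs:
        hit = False
        for name, word in filters.items():
            if word in tokens(doc["body"]):
                buckets[name] += 1
                hit = True
        if not hit:
            other += 1
    result = {name: {"doc_count": c} for name, c in buckets.items()}
    if other_bucket_key is not None:
        result[other_bucket_key] = {"doc_count": other}
    return result


logs = [
    {"body": "warning: page could not be rendered"},
    {"body": "authentication error"},
    {"body": "warning: connection timed out"},
]

print(filters_agg(logs, {"errors": "error", "warnings": "warning"}, "other_messages"))
```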

Range Aggregation

POST /bank/_search?size=0
{
  "aggs": {
    "age_range": {
      "range": {
        "field": "age",
        "ranges": [
          {"to":25},
          {"from": 25,"to": 35},
          {"from": 35}
        ]
      },
      "aggs": {
        "bmax": {
          "max": {
            "field": "balance"
          }
        }
      }
    }
  }
}
POST /bank/_search?size=0
{
  "aggs": {
    "age_range": {
      "range": {
        "field": "age",
        "keyed": true, 
        "ranges": [
          {"to":25,"key": "Ld"},
          {"from": 25,"to": 35,"key": "Md"},
          {"from": 35,"key": "Od"}
        ]
      }
    }
  }
}
// Assign a key to each range; with keyed: true, buckets are returned as an object keyed by these names
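In a range aggregation, from is inclusive and to is exclusive, so a value of exactly 25 falls in the second range above, not the first. A Python sketch with hypothetical ages and a hypothetical helper name:

```python
def range_agg(values, ranges):
    """Count values per range; "from" is inclusive and "to" is exclusive."""
    buckets = []
    for r in ranges:
        lo, hi = r.get("from"), r.get("to")
        count = sum(
            1 for v in values
            if (lo is None or v >= lo) and (hi is None or v < hi)
        )
        buckets.append({**r, "doc_count": count})
    return buckets


ranges = [{"to": 25}, {"from": 25, "to": 35}, {"from": 35}]
for bucket in range_agg([20, 25, 30, 35, 40], ranges):
    print(bucket)
```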

Date Range aggregation: grouping by date ranges

POST /sales/_search?size=0
{
    "aggs": {
        "range": {
            "date_range": {
                "field": "date",
                "format": "MM-yyy",
                "ranges": [
                    { "to": "now-10M/M" }, 
                    { "from": "now-10M/M" } 
                ]
            }
        }
    }
}
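The expression now-10M/M is date math: subtract 10 months from now, then round down (/M) to the start of that month. The Python sketch below reproduces that rounding and the two resulting buckets, where from is inclusive and to is exclusive. The helper names are hypothetical, and naive datetimes are assumed to be in UTC.

```python
from datetime import datetime


def months_ago_rounded(now, months):
    """now-<months>M/M: go back whole months, then round down to the 1st of the month."""
    total = now.year * 12 + (now.month - 1) - months
    year, month0 = divmod(total, 12)
    return datetime(year, month0 + 1, 1)


def date_range_buckets(dates, boundary):
    """Two buckets split at `boundary`: {"to": boundary} and {"from": boundary}."""
    before = sum(1 for d in dates if d < boundary)
    after = sum(1 for d in dates if d >= boundary)
    return {"to": before, "from": after}


boundary = months_ago_rounded(datetime(2021, 1, 27, 15, 30), 10)
print(boundary)  # 10 months before 2021-01-27, rounded down to the month start
```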

Date Histogram aggregation: date histogram (bar chart) buckets

Aggregates statistics by day, month, year, and so on. The interval can be year (1y), quarter (1q), month (1M), week (1w), day (1d), hour (1h), minute (1m), second (1s), or a custom interval such as 90m. In Elasticsearch 7.2 and later, interval is deprecated in favor of calendar_interval and fixed_interval.

POST /sales/_search?size=0
{
    "aggs" : {
        "sales_over_time" : {
            "date_histogram" : {
                "field" : "date",
                "interval" : "month"
            }
        }
    }
}
POST /sales/_search?size=0
{
    "aggs" : {
        "sales_over_time" : {
            "date_histogram" : {
                "field" : "date",
                "interval" : "90m"
            }
        }
    }
}
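Calendar intervals such as month follow the calendar, while fixed intervals such as 90m are multiples of a fixed length counted from the Unix epoch by default. The Python sketch below computes bucket keys both ways for UTC timestamps. It is an illustration with hypothetical helpers; unlike Elasticsearch, which by default also returns empty buckets between the first and last non-empty bucket, it returns only non-empty buckets.

```python
from collections import Counter
from datetime import datetime, timezone


def month_key(ts):
    """Calendar-month bucket key: the first instant of the month (UTC)."""
    return datetime(ts.year, ts.month, 1, tzinfo=timezone.utc)


def fixed_key(ts, interval_ms):
    """Fixed-interval bucket key: epoch millis floored to a multiple of the interval."""
    millis = int(ts.timestamp() * 1000)
    return millis - millis % interval_ms


def histogram(timestamps, key_fn):
    """Sorted (bucket_key, doc_count) pairs for non-empty buckets."""
    return sorted(Counter(key_fn(t) for t in timestamps).items())


utc = timezone.utc
sales = [  # hypothetical sale dates
    datetime(2021, 1, 5, tzinfo=utc),
    datetime(2021, 1, 20, tzinfo=utc),
    datetime(2021, 3, 2, tzinfo=utc),
]
print(histogram(sales, month_key))
print(histogram(sales, lambda t: fixed_key(t, 90 * 60 * 1000)))
```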

Missing aggregation: a bucket for documents with a missing value

Documents that have no value for the specified field are collected into a single bucket for aggregation analysis.

POST /bank/_search?size=0
{
    "aggs" : {
        "accounts_without_age" : {
            "missing" : { "field" : "age" }
        }
    }
}
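A field counts as missing when it is absent, null, or an empty array. A minimal Python sketch of the missing bucket's document count, with hypothetical documents and a hypothetical helper name:

```python
def missing_count(docs, field):
    """Number of documents with no value for `field` (absent, None, or [])."""
    return sum(1 for d in docs if d.get(field) in (None, []))


docs = [{"age": 24}, {"age": None}, {}, {"age": []}, {"age": 0}]  # hypothetical
print(missing_count(docs, "age"))  # 0 is a real value, so only three are missing
```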

Geo Distance aggregation: buckets geo points by distance ranges around an origin point

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-geodistance-aggregation.html

Origin blog.csdn.net/qq_34050399/article/details/113245442