Elasticsearch Study Notes (6): Aggregations

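Contents

Introduction to aggregations

What is an ES aggregation?

How to write an ES aggregation query

Where aggregation values come from

Metric aggregations

max  min  sum  avg

Document count: count

Value count: count the values of a field

cardinality: distinct value count

stats: count, max, min, avg and sum in one go

Extended stats

Percentiles: values at given percentiles

Percentile ranks: percentage of documents whose value is less than or equal to a given value

Geo Bounds aggregation: bounding box of the geo points in a document set

Geo Centroid aggregation: centroid of the geo points

Bucket aggregations

Terms Aggregation: group by field value

Filter Aggregation: aggregate the documents that match a filter

Filters Aggregation: aggregate several filter groups

Range Aggregation: group by value ranges

Date Range Aggregation: group by date ranges

Date Histogram Aggregation: date histogram (bar chart) aggregation

Missing Aggregation: bucket for documents missing a value

Geo Distance Aggregation: group by distance from a point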


Introduction to aggregations

What is an ES aggregation?

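Aggregation is an important database feature. It computes aggregate values over the data set returned by a query, for example the maximum or minimum of a field (or of an expression), or a sum or an average. Because ES is both a search engine and a data store, it also provides powerful aggregation capabilities.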

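  • Aggregations that compute metrics such as max, min, sum and average over a data set are called metric aggregations in ES.
  • Besides aggregate functions, relational databases also let you group query results with GROUP BY and then compute metrics per group. In ES, this grouping is called bucketing, and the corresponding aggregations are bucket aggregations.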

ES also provides matrix aggregations and pipeline aggregations. At the time of writing, both were still being refined.

How to write an ES aggregation query

Define aggregations under the aggregations node of the search request body, using the following syntax:

"aggregations" : {
    "<aggregation_name>" : {
        "<aggregation_type>" : {
            <aggregation_body>
        }
        [,"meta" : {  [<meta_data_body>] } ]?
        [,"aggregations" : { [<sub_aggregation>]+ } ]?
    }
    [,"<aggregation_name_2>" : { ... } ]*
}
//aggregations can also be abbreviated to aggs
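The nesting rules of this syntax can be easier to see in code. The following minimal Python sketch assembles the same structure as a plain dict. The helper name `agg` is hypothetical and not part of any Elasticsearch client. The resulting dict can be serialized with `json.dumps` and sent as the request body.

```python
import json

def agg(agg_type, body, meta=None, sub_aggs=None):
    """Build one aggregation node: {"<aggregation_type>": {...}, optional "meta", optional "aggs"}."""
    node = {agg_type: body}
    if meta is not None:
        node["meta"] = meta          # optional metadata, returned with the aggregation result
    if sub_aggs:
        node["aggs"] = sub_aggs      # optional nested aggregations, keyed by name
    return node

# One top-level aggregation named "age_terms" with a nested "max_balance" metric
request_body = {
    "size": 0,
    "aggs": {
        "age_terms": agg(
            "terms", {"field": "age"},
            sub_aggs={"max_balance": agg("max", {"field": "balance"})},
        )
    },
}

print(json.dumps(request_body, indent=2))
```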

Where aggregation values come from

The values being aggregated can come from a field or from the result of a script.

Metric aggregations

max  min  sum  avg

POST /bank/_search
{
  "size": 0, 
  "aggs": {
    "max_balance": {
      "max": {
        "field": "balance"
      }
    }
  }
}
//maximum balance across all customers
POST /bank/_search
{
  "size": 2, 
  "query": {
    "match": {
      "age": 24
    }
  },
  "sort": [
    {
      "balance": {
        "order": "desc"
      }
    }
  ],
  "aggs": {
    "max_balance": {
      "max": {
        "field": "balance"
      }
    }
  }
}
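//maximum balance among customers aged 24 (the request also returns the top 2 hits, sorted by balance in descending order)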
POST /bank/_search?size=0
{
    "aggs" : {
        "avg_age" : {
            "avg" : {
                "script" : {
                    "source" : "doc.age.value"
                }
            }
        },
        "avg_age10" : {
            "avg" : {
                "script" : {
                    "source" : "doc.age.value + 10"
                }
            }
        }
    }
}
//the values come from a script
//average age of all customers; avg_age10 adds 10 to each age before averaging
POST /bank/_search?size=0
{
  "aggs": {
    "sum_balance": {
      "sum": {
        "field": "balance",
        "script": {
            "source": "_value * 1.03"
        }
      }
    }
  }
}
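//when field is specified, the script reads each field value through _value (here each balance is multiplied by 1.03 before summing)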
POST /bank/_search?size=0
{
  "aggs": {
    "avg_age": {
      "avg": {
        "field": "age",
        "missing": 18
      }
    }
  }
}
//missing supplies a value for documents that lack the field. If it is not set, documents without the field are ignored.
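Here is a minimal Python sketch of what `missing` changes, run on made-up documents (the list `docs` is hypothetical sample data). Without `missing`, documents lacking `age` are skipped. With it, they count as if they had the substitute value.

```python
def avg_agg(docs, field, missing=None):
    values = []
    for doc in docs:
        if field in doc:
            values.append(doc[field])
        elif missing is not None:
            values.append(missing)   # substitute value for documents lacking the field
        # otherwise the document is ignored
    return sum(values) / len(values) if values else None  # no values: result is null

docs = [{"age": 30}, {"age": 40}, {"name": "no age"}]
print(avg_agg(docs, "age"))               # 35.0
print(avg_agg(docs, "age", missing=18))   # (30 + 40 + 18) / 3
```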

Document count: count

POST /bank/_count
{
  "query": {
    "match": {
      "age" : 24
    }
  }
}

Value count: count the values of a field (for a single-valued field, this is the number of documents that have a value)

POST /bank/_search?size=0
{
    "aggs" : {
        "age_count" : { "value_count" : { "field" : "age" } }
    }
}

cardinality: distinct value count

POST /bank/_search?size=0
{
  "aggs": {
    "age_count": {
      "cardinality": {
        "field": "age"
      }
    },
    "state_count": {
      "cardinality": {
        "field": "state.keyword"
      }
    }
  }
}
//for state, aggregate on its keyword sub-field (text fields cannot be aggregated by default)
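To contrast the two aggregations, here is a small Python sketch on hypothetical documents. `value_count` counts every value, so a multi-valued field contributes several. `cardinality` counts distinct values. Note that Elasticsearch's `cardinality` is an approximate count based on HyperLogLog++, while this sketch computes the exact figure.

```python
def value_count(docs, field):
    total = 0
    for doc in docs:
        value = doc.get(field)
        if value is None:
            continue                      # no value: contributes nothing
        total += len(value) if isinstance(value, list) else 1
    return total

def cardinality(docs, field):
    distinct = set()
    for doc in docs:
        value = doc.get(field)
        if value is None:
            continue
        distinct.update(value if isinstance(value, list) else [value])
    return len(distinct)

docs = [{"state": "TX"}, {"state": "TX"}, {"state": "CA"}, {"age": 30}, {"state": ["NY", "TX"]}]
print(value_count(docs, "state"))   # 5
print(cardinality(docs, "state"))   # 3 (TX, CA, NY)
```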

stats: count, max, min, avg and sum in one go

POST /bank/_search?size=0
{
  "aggs": {
    "age_stats": {
      "stats": {
        "field": "age"
      }
    }
  }
}

Extended stats

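Advanced statistics. Compared with stats, it returns four extra results: the sum of squares, the variance, the standard deviation, and the interval of the mean plus/minus two standard deviations.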

POST /bank/_search?size=0
{
  "aggs": {
    "age_stats": {
      "extended_stats": {
        "field": "age"
      }
    }
  }
}
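The following Python sketch shows how the four extra numbers relate to the basic ones. It uses the population variance, which is what Elasticsearch's default `variance`/`std_deviation` report, and the default bounds of two standard deviations. The sample ages are made up.

```python
import math

def extended_stats(values, sigma=2.0):
    count = len(values)
    total = sum(values)
    avg = total / count
    sum_of_squares = sum(v * v for v in values)
    variance = sum_of_squares / count - avg * avg     # population variance
    std_dev = math.sqrt(variance)
    return {
        "count": count, "min": min(values), "max": max(values),
        "avg": avg, "sum": total,
        "sum_of_squares": sum_of_squares,
        "variance": variance,
        "std_deviation": std_dev,
        # mean plus/minus sigma standard deviations (sigma defaults to 2)
        "std_deviation_bounds": {"upper": avg + sigma * std_dev,
                                 "lower": avg - sigma * std_dev},
    }

stats = extended_stats([20, 25, 30, 35, 40])
print(stats["sum_of_squares"], stats["variance"])   # 4750 50.0
```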

Percentiles: values at given percentiles

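This aggregation sorts the values of the given field (or script) in ascending order, accumulates the share of matching documents up to each value, and returns the value found at each requested percentile. By default it returns the values at the [ 1, 5, 25, 50, 75, 95, 99 ] percentiles. In the response below, read the 50th-percentile entry as follows: 50% of the documents have an age <= 31. Conversely, documents with age <= 31 account for 50% of all matching documents. ES computes percentiles approximately (with the TDigest algorithm), so on large data sets the values are estimates.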

POST /bank/_search?size=0
{
  "aggs": {
    "age_percents": {
      "percentiles": {
        "field": "age"
      }
    }
  }
}
//response (excerpt):
 "aggregations": {
    "age_percents": {
      "values": {
        "1.0": 20,
        "5.0": 21,
        "25.0": 25,
        "50.0": 31,
        "75.0": 35,
        "95.0": 39,
        "99.0": 40
      }
    }
  }
POST /bank/_search?size=0
{
  "aggs": {
    "age_percents": {
      "percentiles": {
        "field": "age",
        "percents" : [95, 99, 99.9] 
      }
    }
  }
}
//specify which percentiles to compute
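Elasticsearch estimates percentiles with TDigest, and its figures may interpolate between values. The Python sketch below illustrates only the definition given above. It uses the exact nearest-rank method, which is a different technique from what Elasticsearch runs. The ages are hypothetical.

```python
import math

def nearest_rank_percentile(values, percent):
    """Smallest value v such that at least `percent`% of the values are <= v."""
    ordered = sorted(values)
    rank = max(1, math.ceil(percent * len(ordered) / 100))   # 1-based rank
    return ordered[rank - 1]

ages = [20, 21, 25, 25, 31, 31, 35, 39, 40, 40]
result = {p: nearest_rank_percentile(ages, p) for p in [1, 5, 25, 50, 75, 95, 99]}
print(result)   # {1: 20, 5: 20, 25: 25, 50: 31, 75: 39, 95: 40, 99: 40}
```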

Percentile ranks: percentage of documents whose value is less than or equal to a given value

POST /bank/_search?size=0
{
  "aggs": {
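    "age_perc_rank": {
      "percentile_ranks": {
        "field": "age",
        "values": [
          25,
          30
        ]
      }
    }
  }
}
//response (excerpt):
"aggregations": {
    "age_perc_rank": {
      "values": {
        "25.0": 26.1,
        "30.0": 49.3
      }
    }
  }
//about 26.1% of the documents have age <= 25, and about 49.3% have age <= 30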
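The next sketch covers the reverse direction, again as exact Python on hypothetical data. Elasticsearch's `percentile_ranks` is also approximate. For each requested value, the sketch returns the percentage of values less than or equal to it.

```python
def percentile_ranks(values, targets):
    n = len(values)
    # share of values <= t, expressed as a percentage
    return {t: 100.0 * sum(1 for v in values if v <= t) / n for t in targets}

ages = [20, 21, 25, 25, 31, 31, 35, 39, 40, 40]
ranks = percentile_ranks(ages, [25, 31])
print(ranks)   # {25: 40.0, 31: 60.0}
```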

Geo Bounds aggregation: bounding box of the geo points in a document set

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-geobounds-aggregation.html

Geo Centroid aggregation: centroid of the geo points

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-geocentroid-aggregation.html

Bucket aggregations

Terms Aggregation: group by field value

POST /bank/_search?size=0
{
  "aggs": {
    "age_terms": {
      "terms": {
        "field": "age"
      }
    }
  }
}
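//response (excerpt):
 "aggregations": {
    "age_terms": {
      "doc_count_error_upper_bound": 0,    //upper bound on the error of the document counts
      "sum_other_doc_count": 463,          //number of documents in terms that were not returned
      "buckets": [                         //by default, the top 10 buckets by document count, highest first
        {
          "key": 31,
          "doc_count": 61
        },
        {
          "key": 39,
          "doc_count": 60
        },
        {
          "key": 26,
          "doc_count": 59
        },
        ...
       ]
    }
  }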
  • size: how many buckets to return
POST /bank/_search?size=0
{
  "aggs": {
    "age_terms": {
      "terms": {
        "field": "age",
        "size": 20
      }
    }
  }
}
POST /bank/_search?size=0
{
  "aggs": {
    "age_terms": {
      "terms": {
        "field": "age",
        "size": 5,
        "shard_size":20
      }
    }
  }
}
//shard_size: how many buckets each shard returns
//default shard_size: size if the index has a single shard; size * 1.5 + 10 if it has several shards
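Here is the default rule from the comment above, written as a small Python function. The function name is hypothetical. The result of size * 1.5 + 10 is truncated to an integer.

```python
def default_shard_size(size, number_of_shards):
    if number_of_shards == 1:
        return size                      # single shard: its top `size` terms are already exact
    return int(size * 1.5 + 10)          # several shards: over-fetch to reduce counting errors

print(default_shard_size(10, 1))   # 10
print(default_shard_size(10, 5))   # 25
print(default_shard_size(5, 5))    # 17
```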
POST /bank/_search?size=0
{
  "aggs": {
    "age_terms": {
      "terms": {
        "field": "age",
        "size": 5,
        "shard_size":20,
        "show_term_doc_count_error": true
      }
    }
  }
}
//show the document-count error bound on each bucket
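Counts can be off because each shard returns only its own top `shard_size` terms, and the coordinating node then merges those partial lists. Below is a simplified Python simulation of that merge over two hypothetical shards. It mirrors the idea behind `doc_count_error_upper_bound`, which sums the count of the last term returned by each truncated shard. It is not Elasticsearch's actual code.

```python
from collections import Counter

def shard_top(counts, shard_size):
    # a shard returns its own top terms: highest count first, ties broken by key (ascending)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:shard_size]

def terms_agg(shards, size, shard_size):
    merged = Counter()
    error_upper_bound = 0
    for counts in shards:
        top = shard_top(counts, shard_size)
        merged.update(dict(top))
        if len(counts) > shard_size:
            # a term this shard did not return has at most the count of its last returned term
            error_upper_bound += top[-1][1]
    buckets = sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))[:size]
    return buckets, error_upper_bound

# hypothetical per-shard term counts for one field
shards = [
    {"a": 10, "b": 8, "c": 7},
    {"c": 9, "b": 2, "a": 1},
]
print(terms_agg(shards, size=1, shard_size=1))   # ([('a', 10)], 19): wrong winner, true top is c = 16
print(terms_agg(shards, size=1, shard_size=3))   # ([('c', 16)], 0): exact, c = 7 + 9
```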
  • order: sort order of the buckets
POST /bank/_search?size=0
{
  "aggs": {
    "age_terms": {
      "terms": {
        "field": "age",
        "order" : { "_count" : "asc" }
      }
    }
  }
}
//sort by document count
POST /bank/_search?size=0
{
  "aggs": {
    "age_terms": {
      "terms": {
        "field": "age",
        "order" : { "_key" : "asc" }
      }
    }
  }
}
//sort by bucket key
  • Compute metrics for each bucket
POST /bank/_search?size=0
{
  "aggs": {
    "age_terms": {
      "terms": {
        "field": "age",
        "order": {
          "max_balance": "asc"
        }
      },
      "aggs": {
        "max_balance": {
          "max": {
            "field": "balance"
          }
        },
        "min_balance": {
          "min": {
            "field": "balance"
          }
        }
      }
    }
  }
}
  • Sort buckets by a metric sub-aggregation
POST /bank/_search?size=0
{
  "aggs": {
    "age_terms": {
      "terms": {
        "field": "age",
        "order": {
          "max_balance": "asc"
        }
      },
      "aggs": {
        "max_balance": {
          "max": {
            "field": "balance"
          }
        }
      }
    }
  }
}
POST /bank/_search?size=0
{
  "aggs": {
    "age_terms": {
      "terms": {
        "field": "age",
        "order": {
          "stats_balance.max": "asc"
        }
      },
      "aggs": {
        "stats_balance": {
          "stats": {
            "field": "balance"
          }
        }
      }
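    }
  }
}
//for a multi-value metric such as stats, reference a single value with a dot path: stats_balance.max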
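The last request can be pictured with the following Python sketch over hypothetical documents. It groups by `age`, computes `stats` of `balance` in each bucket, and then orders the buckets by `stats_balance.max` in ascending order.

```python
from collections import defaultdict

def terms_with_stats(docs, group_field, metric_field):
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[group_field]].append(doc[metric_field])
    buckets = []
    for key, values in groups.items():
        stats = {"count": len(values), "min": min(values), "max": max(values),
                 "avg": sum(values) / len(values), "sum": sum(values)}
        buckets.append({"key": key, "doc_count": len(values), "stats_balance": stats})
    # equivalent of "order": {"stats_balance.max": "asc"}
    buckets.sort(key=lambda b: b["stats_balance"]["max"])
    return buckets

docs = [
    {"age": 24, "balance": 5000},
    {"age": 24, "balance": 42000},
    {"age": 31, "balance": 12000},
    {"age": 39, "balance": 30000},
]
result = terms_with_stats(docs, "age", "balance")
print([(b["key"], b["stats_balance"]["max"]) for b in result])
# [(31, 12000), (39, 30000), (24, 42000)]
```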
  • Filter buckets
POST /bank/_search?size=0
{
  "aggs": {
    "age_terms": {
      "terms": {
        "field": "age",
        "min_doc_count": 60
      }
    }
  }
}
//filter by document count: only buckets with at least 60 documents are returned
POST /bank/_search?size=0
{
  "aggs": {
    "age_terms": {
      "terms": {
        "field": "age",
        "include": [20,24]
      }
    }
  }
}
//keep only the listed values
GET /_search
{
    "aggs" : {
        "tags" : {
            "terms" : {
                "field" : "tags",
                "include" : ".*sport.*",
                "exclude" : "water_.*"
            }
        }
    }
}
//match values with regular expressions (when both are set, exclude takes precedence over include)
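Here is a rough Python sketch of how `include` and `exclude` combine, using hypothetical tag values. Elasticsearch uses Lucene regular expressions, which match the whole term, so the sketch uses `re.fullmatch`. Lucene's regex syntax differs from Python's `re` in places. Simple patterns like these behave the same in both.

```python
import re

def filter_terms(terms, include=None, exclude=None):
    kept = []
    for term in terms:
        if include is not None and not re.fullmatch(include, term):
            continue                         # not allowed by include
        if exclude is not None and re.fullmatch(exclude, term):
            continue                         # exclude wins over include
        kept.append(term)
    return kept

tags = ["basketball_sport", "water_sport", "sportswear", "chess"]
print(filter_terms(tags, include=".*sport.*", exclude="water_.*"))
# ['basketball_sport', 'sportswear']
```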
GET /_search
{
    "aggs" : {
        "JapaneseCars" : {
             "terms" : {
                 "field" : "make",
                 "include" : ["mazda", "honda"]
             }
         },
        "ActiveCarManufacturers" : {
             "terms" : {
                 "field" : "make",
                 "exclude" : ["rover", "jensen"]
             }
         }
    }
}
//explicit value lists
  • Group by a script-computed value
GET /_search
{
    "aggs" : {
        "genres" : {
            "terms" : {
                "script" : {
                    "source": "doc['genre'].value",
                    "lang": "painless"
                }
            }
        }
    }
}
  • Handling missing values
GET /_search
{
    "aggs" : {
        "tags" : {
             "terms" : {
                 "field" : "tags",
                 "missing": "N/A" 
             }
         }
    }
}
//documents without a tags value are put into the "N/A" bucket

Filter Aggregation: aggregate the documents that match a filter

Of the documents matched by the query, only those that satisfy the filter are aggregated.

POST /bank/_search?size=0
{
  "aggs": {
    "age_terms": {
      "filter": {"match":{"gender":"F"}},
      "aggs": {
        "avg_age": {
          "avg": {
            "field": "age"
          }
        }
      }
    }
  }
}

Filters Aggregation: aggregate several filter groups

PUT /logs/_doc/_bulk?refresh
{ "index" : { "_id" : 1 } }
{ "body" : "warning: page could not be rendered" }
{ "index" : { "_id" : 2 } }
{ "body" : "authentication error" }
{ "index" : { "_id" : 3 } }
{ "body" : "warning: connection timed out" }

GET logs/_search
{
  "size": 0,
  "aggs" : {
    "messages" : {
      "filters" : {
        "filters" : {
          "errors" :   { "match" : { "body" : "error"   }},
          "warnings" : { "match" : { "body" : "warning" }}
        }
      }
    }
  }
}
GET logs/_search
{
  "size": 0,
  "aggs" : {
    "messages" : {
      "filters" : {
        "other_bucket_key": "other_messages",
        "filters" : {
          "errors" :   { "match" : { "body" : "error"   }},
          "warnings" : { "match" : { "body" : "warning" }}
        }
      }
    }
  }
}
//other_bucket_key names the bucket for documents that match none of the filters
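The following Python sketch simulates the `filters` request on the three log documents indexed above. It makes two simplifying assumptions. First, a lowercase split on non-word characters stands in for the standard analyzer. Second, each `match` filter is treated as a single-word lookup. A document can land in several buckets, and the other bucket collects documents that match no filter.

```python
import re

def analyze(text):
    # rough stand-in for the standard analyzer: lowercase, split on non-word characters
    return set(re.findall(r"\w+", text.lower()))

def filters_agg(docs, filters, other_bucket_key=None):
    buckets = {name: 0 for name in filters}
    if other_bucket_key is not None:
        buckets[other_bucket_key] = 0
    for doc in docs:
        tokens = analyze(doc["body"])
        matched = False
        for name, word in filters.items():
            if word in tokens:               # single-word "match" filter
                buckets[name] += 1           # a document may land in several buckets
                matched = True
        if not matched and other_bucket_key is not None:
            buckets[other_bucket_key] += 1
    return buckets

logs = [
    {"body": "warning: page could not be rendered"},
    {"body": "authentication error"},
    {"body": "warning: connection timed out"},
]
print(filters_agg(logs, {"errors": "error", "warnings": "warning"}, "other_messages"))
# {'errors': 1, 'warnings': 2, 'other_messages': 0}
```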

Range Aggregation: group by value ranges

POST /bank/_search?size=0
{
  "aggs": {
    "age_range": {
      "range": {
        "field": "age",
        "ranges": [
          {"to":25},
          {"from": 25,"to": 35},
          {"from": 35}
        ]
      },
      "aggs": {
        "bmax": {
          "max": {
            "field": "balance"
          }
        }
      }
    }
  }
}
//from is inclusive, to is exclusive
POST /bank/_search?size=0
{
  "aggs": {
    "age_range": {
      "range": {
        "field": "age",
        "keyed": true, 
        "ranges": [
          {"to":25,"key": "Ld"},
          {"from": 25,"to": 35,"key": "Md"},
          {"from": 35,"key": "Od"}
        ]
      }
    }
  }
}
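//give each range a key; with keyed: true, the buckets are returned as an object keyed by these names instead of an array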
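Here is a minimal Python sketch of the range bucketing with a `max` sub-aggregation, using hypothetical documents. Because `from` is inclusive and `to` is exclusive, an age of exactly 25 falls into the middle bucket, not the first one.

```python
def range_agg(docs, field, ranges, metric_field):
    buckets = []
    for r in ranges:
        lo, hi = r.get("from"), r.get("to")
        # "from" is inclusive, "to" is exclusive
        values = [d[metric_field] for d in docs
                  if (lo is None or d[field] >= lo) and (hi is None or d[field] < hi)]
        buckets.append({"key": r.get("key"), "doc_count": len(values),
                        "bmax": max(values) if values else None})
    return buckets

docs = [{"age": 20, "balance": 100}, {"age": 25, "balance": 300},
        {"age": 34, "balance": 200}, {"age": 35, "balance": 50}]
ranges = [{"to": 25, "key": "Ld"}, {"from": 25, "to": 35, "key": "Md"}, {"from": 35, "key": "Od"}]
for bucket in range_agg(docs, "age", ranges, "balance"):
    print(bucket)
```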

Date Range Aggregation: group by date ranges

POST /sales/_search?size=0
{
    "aggs": {
        "range": {
            "date_range": {
                "field": "date",
                "format": "MM-yyyy",
                "ranges": [
                    { "to": "now-10M/M" }, 
                    { "from": "now-10M/M" } 
                ]
            }
        }
    }
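}
//two ranges: before now-10M/M (10 months ago, rounded down to the start of that month), and from then on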
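To make the date math concrete, the following Python sketch evaluates `now-10M/M` for a fixed, hypothetical "now" and assigns dates to the two ranges. It assumes UTC and ignores time zones. The helper names are hypothetical.

```python
from datetime import datetime

def now_minus_months_floor_month(now, months):
    """Evaluate the date-math expression "now-<months>M/M" for a given `now`."""
    total = now.year * 12 + (now.month - 1) - months   # count of months, shifted back
    year, month_index = divmod(total, 12)
    return datetime(year, month_index + 1, 1)          # /M: round down to the first day of the month

def date_range_bucket(date, boundary):
    # the first range has only "to" (exclusive), the second only "from" (inclusive)
    return "to now-10M/M" if date < boundary else "from now-10M/M"

now = datetime(2021, 1, 27, 15, 30)
boundary = now_minus_months_floor_month(now, 10)
print(boundary)   # 2020-03-01 00:00:00
```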

Date Histogram Aggregation: date histogram (bar chart) aggregation

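This aggregation buckets documents by day, month, year and so on. The interval can be a calendar unit: year (1y), quarter (1q), month (1M), week (1w), day (1d), hour (1h), minute (1m) or second (1s). It can also be a custom fixed interval such as 90m. Note that 1M (month) and 1m (minute) differ only in case. Since ES 7.2, interval is deprecated in favor of calendar_interval and fixed_interval.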

POST /sales/_search?size=0
{
    "aggs" : {
        "sales_over_time" : {
            "date_histogram" : {
                "field" : "date",
                "interval" : "month"
            }
        }
    }
}
POST /sales/_search?size=0
{
    "aggs" : {
        "sales_over_time" : {
            "date_histogram" : {
                "field" : "date",
                "interval" : "90m"
            }
        }
    }
}
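Here is a Python sketch of how the two requests above assign bucket keys, assuming UTC and hypothetical sale timestamps. A calendar month starts its bucket on the first day of the month. A fixed interval such as 90 minutes rounds the epoch time down to a multiple of the interval. This sketch returns only non-empty buckets, whereas the real aggregation can also return empty buckets in between.

```python
from collections import Counter
from datetime import datetime, timezone

def month_bucket(ts):
    # calendar interval "month" / "1M": key is the first instant of the month (UTC)
    return datetime(ts.year, ts.month, 1, tzinfo=timezone.utc)

def fixed_bucket(ts, minutes):
    # fixed interval such as "90m": epoch seconds rounded down to a multiple of the interval
    step = minutes * 60
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % step, tz=timezone.utc)

def date_histogram(dates, key_fn):
    return dict(sorted(Counter(key_fn(d) for d in dates).items()))

utc = timezone.utc
sales = [datetime(2021, 1, 5, 0, 0, tzinfo=utc),
         datetime(2021, 1, 5, 1, 0, tzinfo=utc),
         datetime(2021, 2, 3, 1, 40, tzinfo=utc)]
print(date_histogram(sales, month_bucket))                  # January: 2, February: 1
print(date_histogram(sales, lambda d: fixed_bucket(d, 90)))  # 01-05 00:00: 2, 02-03 01:30: 1
```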

Missing Aggregation: bucket for documents missing a value

Documents that lack a value for the given field are collected into a single bucket and aggregated.

POST /bank/_search?size=0
{
    "aggs" : {
        "account_without_a_age" : {
            "missing" : { "field" : "age" }
        }
    }
}

Geo Distance Aggregation: group by distance from a point

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-geodistance-aggregation.html

Reposted from blog.csdn.net/qq_34050399/article/details/113245442