ElasticSearch学习笔记之二十三 桶聚合

桶聚合

桶聚合不同于指标聚合,它不对文档字段进行计算,而是对我们的文档进行分组, 每一个分组都关联一个标准 (依赖于聚合类型),这个 标准决定文档是否会划分到分组. 换句话说,桶就是一个文档的集合,除了桶本身,桶计算还计算并返回划分到每个桶的文档数量。

与指标聚合不用,桶聚合支持子聚合, 这些子聚合可以聚合由它们的父聚合创建的分组。

桶聚合有很多种, 每一个都有不同的 “bucketing” 策略. 有的策略定义一个分组(单分组聚合),有的策略定义固定数量的多个分组(多分组聚合),还有的策略在聚合执行的过程中动态的分组。

注意:
一次响应返回的分组的最大数被elasticsearch集群设置的search.max_buckets属性限制。一般来说它被设置为-1不作限制。但是当结果超过10,000(版本支持的默认最大值)分组的时候会得到一个弃用警告。

Children Aggregation(子聚合)

Children Aggregation 是下面 join字段定义的选择有特定type字段的子文档的的单分组聚合。

这类聚合有一个参数:

  • type - 应该被选择的子文档的类型

举例来说, 我们有一个有questions 和 answers的索引. 有join字段的answer类型文档映射:

PUT child_example
{
  "mappings": {
    "_doc": {
      "properties": {
        "join": {
          "type": "join",
          "relations": {
            "question": "answer"
          }
        }
      }
    }
  }
}

问题文档包含一个 tag 字段,答案文档包含一个owner 字段。 children aggregation可以把问题文档的 tag 字段分组映射到答案文档的owner 分组。

问题文档

PUT child_example/_doc/1
{
  "join": {
    "name": "question"
  },
  "body": "<p>I have Windows 2003 server and i bought a new Windows 2008 server...",
  "title": "Whats the best way to file transfer my site from server to a newer one?",
  "tags": [
    "windows-server-2003",
    "windows-server-2008",
    "file-transfer"
  ]
}

答案如下:

PUT child_example/_doc/2?routing=1
{
  "join": {
    "name": "answer",
    "parent": "1"
  },
  "owner": {
    "location": "Norfolk, United Kingdom",
    "display_name": "Sam",
    "id": 48
  },
  "body": "<p>Unfortunately you're pretty much limited to FTP...",
  "creation_date": "2009-05-04T13:45:37.030"
}
PUT child_example/_doc/3?routing=1&refresh
{
  "join": {
    "name": "answer",
    "parent": "1"
  },
  "owner": {
    "location": "Norfolk, United Kingdom",
    "display_name": "Troll",
    "id": 49
  },
  "body": "<p>Use Linux...",
  "creation_date": "2009-05-05T13:45:37.030"
}

下面的请求可以把2者联合在一起:

POST child_example/_search?size=0
{
  "aggs": {
    "top-tags": {
      "terms": {
        "field": "tags.keyword",
        "size": 10
      },
      "aggs": {
        "to-answers": {
          "children": {
            "type" : "answer" 
          },
          "aggs": {
            "top-names": {
              "terms": {
                "field": "owner.display_name.keyword",
                "size": 10
              }
            }
          }
        }
      }
    }
  }
}

type 指向名为answer 的类型/ 映射 .

上面的案例返回置顶的问题标签和每个标签下置顶答案的所有者。

返回如下:

{
  "took": 25,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped" : 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0.0,
    "hits": []
  },
  "aggregations": {
    "top-tags": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "file-transfer",
          "doc_count": 1, 
          "to-answers": {
            "doc_count": 2, 
            "top-names": {
              "doc_count_error_upper_bound": 0,
              "sum_other_doc_count": 0,
              "buckets": [
                {
                  "key": "Sam",
                  "doc_count": 1
                },
                {
                  "key": "Troll",
                  "doc_count": 1
                }
              ]
            }
          }
        },
        {
          "key": "windows-server-2003",
          "doc_count": 1, 
          "to-answers": {
            "doc_count": 2, 
            "top-names": {
              "doc_count_error_upper_bound": 0,
              "sum_other_doc_count": 0,
              "buckets": [
                {
                  "key": "Sam",
                  "doc_count": 1
                },
                {
                  "key": "Troll",
                  "doc_count": 1
                }
              ]
            }
          }
        },
        {
          "key": "windows-server-2008",
          "doc_count": 1, 
          "to-answers": {
            "doc_count": 2, 
            "top-names": {
              "doc_count_error_upper_bound": 0,
              "sum_other_doc_count": 0,
              "buckets": [
                {
                  "key": "Sam",
                  "doc_count": 1
                },
                {
                  "key": "Troll",
                  "doc_count": 1
                }
              ]
            }
          }
        }
      ]
    }
  }
}

Range Aggregation(范围聚合)

Range Aggregation是一个可以用户自定义一系列范围,每个范围代表一个分组的多值分组聚合. 在聚合的过程中,从每个文档提取出值然后检查每个分组的范围并且正确的分组。 注意,聚合的每个范围会包含from但是排除to
例如:

GET /_search
{
    "aggs" : {
        "price_ranges" : {
            "range" : {
                "field" : "price",
                "ranges" : [
                    { "to" : 100.0 },
                    { "from" : 100.0, "to" : 200.0 },
                    { "from" : 200.0 }
                ]
            }
        }
    }
}

结果如下:

{
    ...
    "aggregations": {
        "price_ranges" : {
            "buckets": [
                {
                    "key": "*-100.0",
                    "to": 100.0,
                    "doc_count": 2
                },
                {
                    "key": "100.0-200.0",
                    "from": 100.0,
                    "to": 200.0,
                    "doc_count": 2
                },
                {
                    "key": "200.0-*",
                    "from": 200.0,
                    "doc_count": 3
                }
            ]
        }
    }
}

Keyed Response

设置 keyedtrue 会将每个分组和一个独一无二的key关联并将返回作为hash返回而不是array:

GET /_search
{
    "aggs" : {
        "price_ranges" : {
            "range" : {
                "field" : "price",
                "keyed" : true,
                "ranges" : [
                    { "to" : 100 },
                    { "from" : 100, "to" : 200 },
                    { "from" : 200 }
                ]
            }
        }
    }
}

结果如下:

{
    ...
    "aggregations": {
        "price_ranges" : {
            "buckets": {
                "*-100.0": {
                    "to": 100.0,
                    "doc_count": 2
                },
                "100.0-200.0": {
                    "from": 100.0,
                    "to": 200.0,
                    "doc_count": 2
                },
                "200.0-*": {
                    "from": 200.0,
                    "doc_count": 3
                }
            }
        }
    }
}

也支持为每个范围范围自定义key:

GET /_search
{
    "aggs" : {
        "price_ranges" : {
            "range" : {
                "field" : "price",
                "keyed" : true,
                "ranges" : [
                    { "key" : "cheap", "to" : 100 },
                    { "key" : "average", "from" : 100, "to" : 200 },
                    { "key" : "expensive", "from" : 200 }
                ]
            }
        }
    }
}

结果如下:

{
    ...
    "aggregations": {
        "price_ranges" : {
            "buckets": {
                "cheap": {
                    "to": 100.0,
                    "doc_count": 2
                },
                "average": {
                    "from": 100.0,
                    "to": 200.0,
                    "doc_count": 2
                },
                "expensive": {
                    "from": 200.0,
                    "doc_count": 3
                }
            }
        }
    }
}

猜你喜欢

转载自blog.csdn.net/weixin_43430036/article/details/83311483