Composite aggregation
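Composite is pronounced [kəmˈpɑːzət]. The composite aggregation is a bucket aggregation.
A composite aggregation creates composite buckets from different sources. It can paginate over the results of a multi-level aggregation: it provides a way to stream all buckets of a given aggregation, much as scroll does for documents.
The composite aggregation is currently not compatible with pipeline aggregations.
It builds combinations from the values found in each document, and each combination becomes one composite bucket.
For example, take the following document: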
{
"keyword": ["foo", "bar"],
"number": [23, 65, 76]
}
A composite aggregation that uses both fields as sources produces the following composite buckets for this document:
{ "keyword": "foo", "number": 23 }
{ "keyword": "foo", "number": 65 }
{ "keyword": "foo", "number": 76 }
{ "keyword": "bar", "number": 23 }
{ "keyword": "bar", "number": 65 }
{ "keyword": "bar", "number": 76 }
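The main parameters are:
- sources: the list of value sources to combine. Each source name must be unique.
- missing_bucket: set per source, defaults to false. By default, a document that has no value for a source is ignored, so if no document has a value for that source the composite aggregation returns an empty bucket list []. When set to true, such documents are still included, with null as the key for that source, while the other sources output their values as usual.
- size: the number of composite buckets returned per page. Defaults to 10.
- after: the starting point of the current page, that is, the key of the last bucket of the previous page. Only buckets that sort after this key are returned.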
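The parameters above fit together in one request body. The following Python sketch assembles such a body as a dict. The helper name build_composite_body and its signature are assumptions made for illustration; only the JSON keys (composite, sources, size, after) come from the aggregation itself.

```python
import json


def build_composite_body(name, sources, size=10, after=None):
    """Build a search body holding a single composite aggregation.

    `sources` is a list of one-key dicts such as
    {"by_day": {"date_histogram": {...}}}.
    """
    # Source names must be unique within one composite aggregation.
    source_names = [next(iter(s)) for s in sources]
    if len(set(source_names)) != len(source_names):
        raise ValueError("source names must be unique")
    composite = {"size": size, "sources": sources}
    if after is not None:
        # Start after the last key of the previous page.
        composite["after"] = after
    return {"size": 0, "aggs": {name: {"composite": composite}}}


body = build_composite_body(
    "composite_FlightTimeMin",
    [{"terms_FlightTimeMin": {"terms": {"field": "FlightTimeMin"}}}],
    size=5,
)
print(json.dumps(body, indent=2))
```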
Sources
Four aggregation types can be used as sources: terms, histogram, date_histogram, and geotile_grid.
terms as a source
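GET kibana_sample_data_flights/_search
{
  "track_total_hits": true,
  "size": 0,
  "aggs": {
    "composite_FlightTimeMin": {
      "composite": {
        "sources": [
          {
            "terms_FlightTimeMin": {
              "terms": {
                "field": "FlightTimeMin"
              }
            }
          }
        ]
      }
    }
  }
}
With a single terms source, this returns the same buckets as a plain terms aggregation on the field, except that the buckets are sorted by key rather than by doc_count and are returned page by page.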
GET kibana_sample_data_flights/_search
{
"track_total_hits": true,
"size": 0,
"runtime_mappings": {
"FlightTimeMinChanged": {
"type": "double",
"script": {
"source": """
emit(doc['FlightTimeMin'].value / 10)
"""
}
}
},
"aggs": {
"composite_FlightTimeMinChanged": {
"composite": {
"sources": [
{
"terms_FlightTimeMinChanged": {
"terms": {
"field": "FlightTimeMinChanged"
}
}
}
]
}
}
}
}
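A runtime field can also serve as the source field. In the request above, FlightTimeMinChanged is defined in runtime_mappings as FlightTimeMin divided by 10 and is then used by the terms source.
histogram as a source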
GET kibana_sample_data_flights/_search
{
"track_total_hits": true,
"size": 0,
"aggs": {
"composite_FlightTimeMin": {
"composite": {
"sources": [
{
"histogram_FlightTimeMin": {
"histogram": {
"field": "FlightTimeMin",
"interval": 10
}
}
}
]
}
}
}
}
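With "interval": 10, each value falls into the bucket whose key is the largest multiple of 10 not greater than the value. A short Python sketch of that rounding, assuming no offset (the function name histogram_key is hypothetical):

```python
import math


def histogram_key(value, interval):
    """Return the key (lower bound) of the histogram bucket holding value."""
    return math.floor(value / interval) * interval


for v in (9.99, 10.0, 16.2, 32.96):
    print(v, "->", histogram_key(v, 10))
```

Under this rule a FlightTimeMin of 16.2 lands in the bucket keyed 10, together with every other value in the range [10, 20).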
date_histogram as a source
GET kibana_sample_data_flights/_search
{
"track_total_hits": true,
"size": 0,
"aggs": {
"composite_timestamp": {
"composite": {
"sources": [
{
"date_histogram_timestamp": {
"date_histogram": {
"field": "timestamp",
"calendar_interval": "1d",
"format": "yyyy-MM-dd"
}
}
}
]
}
}
}
}
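With "calendar_interval": "1d" and "format": "yyyy-MM-dd", every timestamp is rounded down to the start of its day and the key is printed as a date string. The Python sketch below mimics this, assuming timestamps in epoch milliseconds and the UTC time zone (no time_zone is set in the request above). The function name day_bucket_key is hypothetical.

```python
from datetime import datetime, timezone


def day_bucket_key(epoch_millis):
    """Format a UTC epoch-millisecond timestamp as its calendar day."""
    ts = datetime.fromtimestamp(epoch_millis / 1000, tz=timezone.utc)
    return ts.strftime("%Y-%m-%d")


# Two timestamps on the same UTC day share one bucket key.
print(day_bucket_key(1661644800000))  # midnight UTC
print(day_bucket_key(1661644800000 + 3_600_000))  # one hour later
```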
Combining multiple sources
GET kibana_sample_data_flights/_search
{
"track_total_hits": true,
"size": 0,
"aggs": {
"composite_timestamp_FlightTimeMin": {
"composite": {
"sources": [
{
"date_histogram_timestamp": {
"date_histogram": {
"field": "timestamp",
"calendar_interval": "1d",
"format": "yyyy-MM-dd"
}
}
},
{
"terms_FlightTimeMin": {
"terms": {
"field": "FlightTimeMin"
}
}
}
]
}
}
}
}
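Setting a sort order per source
Buckets are sorted by the first source, then by the second, and so on. Each source can set its own order, asc (the default) or desc.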
GET kibana_sample_data_flights/_search
{
"track_total_hits": true,
"size": 0,
"aggs": {
"composite_timestamp_FlightTimeMin": {
"composite": {
"sources": [
{
"date_histogram_timestamp": {
"date_histogram": {
"field": "timestamp",
"calendar_interval": "1d",
"format": "yyyy-MM-dd",
"order": "desc"
}
}
},
{
"terms_FlightTimeMin": {
"terms": {
"field": "FlightTimeMin",
"order": "asc"
}
}
}
]
}
}
}
}
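The resulting bucket order can be pictured as a multi-key sort in which each source has its own direction. A minimal Python sketch, assuming composite keys held as dicts (the function name sort_composite_keys and the short source names are hypothetical):

```python
from functools import cmp_to_key


def sort_composite_keys(keys, orders):
    """Sort key dicts source by source.

    `orders` is a list of (source_name, "asc" or "desc") pairs,
    given in the same order as the sources.
    """
    def compare(a, b):
        for name, direction in orders:
            if a[name] == b[name]:
                continue  # tie on this source, look at the next one
            result = -1 if a[name] < b[name] else 1
            return result if direction == "asc" else -result
        return 0

    return sorted(keys, key=cmp_to_key(compare))


keys = [
    {"day": "2022-08-27", "minutes": 5.0},
    {"day": "2022-08-28", "minutes": 9.0},
    {"day": "2022-08-28", "minutes": 3.0},
]
for key in sort_composite_keys(keys, [("day", "desc"), ("minutes", "asc")]):
    print(key)
```

The latest day comes first, and within one day the smaller flight time comes first.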
Composite aggregation versus sub-aggregations
First, use a composite aggregation with two terms sources, on the OriginCountry and DestCountry fields.
GET kibana_sample_data_flights/_search
{
"track_total_hits": true,
"size": 0,
"aggs": {
"composite_OriginCountry_DestCountry": {
"composite": {
"sources": [
{
"terms_OriginCountry": {
"terms": {
"field": "OriginCountry"
}
}
},
{
"terms_DestCountry": {
"terms": {
"field": "DestCountry"
}
}
}
]
}
}
}
}
The result is as follows:
"aggregations" : {
"composite_OriginCountry_DestCountry" : {
"after_key" : {
"terms_OriginCountry" : "AE",
"terms_DestCountry" : "CA"
},
"buckets" : [
{
"key" : {
"terms_OriginCountry" : "AE",
"terms_DestCountry" : "AE"
},
"doc_count" : 9
},
{
"key" : {
"terms_OriginCountry" : "AE",
"terms_DestCountry" : "AR"
},
"doc_count" : 10
},
...
For comparison, here is the same grouping written as a terms aggregation with a terms sub-aggregation.
GET kibana_sample_data_flights/_search
{
"track_total_hits": true,
"size": 0,
"aggs": {
"terms_OriginCountry": {
"terms": {
"field": "OriginCountry"
},
"aggs": {
"terms_DestCountry": {
"terms": {
"field": "DestCountry"
}
}
}
}
}
}
The result is as follows:
"aggregations" : {
"terms_OriginCountry" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 4114,
"buckets" : [
{
"key" : "IT",
"doc_count" : 2278,
"terms_DestCountry" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 513,
"buckets" : [
{
"key" : "IT",
"doc_count" : 459
},
{
"key" : "US",
"doc_count" : 328
},
{
"key" : "CN",
"doc_count" : 195
},
{
"key" : "CA",
"doc_count" : 192
},
...
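The two responses have different shapes: the composite aggregation returns a flat list of key pairs, while the nested terms aggregation returns a tree. The Python sketch below flattens the nested shape into composite-style rows to make the correspondence visible. The function name flatten_nested_terms is hypothetical, and the sample dict reuses only the first two inner buckets of the output above. Each terms level returns only its top buckets (the rest is summarized in sum_other_doc_count), so the flattened rows are not guaranteed to cover every combination.

```python
def flatten_nested_terms(agg, outer_name, inner_name):
    """Turn a terms aggregation with a terms sub-aggregation into flat rows."""
    rows = []
    for outer in agg["buckets"]:
        for inner in outer[inner_name]["buckets"]:
            rows.append(
                {
                    "key": {outer_name: outer["key"], inner_name: inner["key"]},
                    "doc_count": inner["doc_count"],
                }
            )
    return rows


nested = {
    "buckets": [
        {
            "key": "IT",
            "doc_count": 2278,
            "terms_DestCountry": {
                "buckets": [
                    {"key": "IT", "doc_count": 459},
                    {"key": "US", "doc_count": 328},
                ]
            },
        }
    ]
}
for row in flatten_nested_terms(nested, "terms_OriginCountry", "terms_DestCountry"):
    print(row)
```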
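The missing_bucket parameter
In the second source we specify a field that does not exist, FlightTimeMin2. Changing the value of missing_bucket and running the request again shows what the parameter does.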
GET kibana_sample_data_flights/_search
{
"track_total_hits": true,
"size": 0,
"aggs": {
"composite_timestamp_FlightTimeMin": {
"composite": {
"sources": [
{
"date_histogram_timestamp": {
"date_histogram": {
"field": "timestamp",
"calendar_interval": "1d",
"format": "yyyy-MM-dd",
"order": "desc"
}
}
},
{
"terms_FlightTimeMin": {
"terms": {
"field": "FlightTimeMin2",
"order": "asc",
"missing_bucket": false
}
}
}
]
}
}
}
}
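The effect of missing_bucket can be simulated in a few lines of Python. This is a simplified sketch that assumes single-valued fields and documents held as dicts. The function name composite_keys_single_valued and the allow_missing argument (the names of the sources that have missing_bucket set to true) are hypothetical.

```python
def composite_keys_single_valued(docs, fields, allow_missing=()):
    """Build one composite key per document, honoring missing values."""
    keys = []
    for doc in docs:
        key = {}
        for field in fields:
            if doc.get(field) is not None:
                key[field] = doc[field]
            elif field in allow_missing:
                key[field] = None  # missing_bucket: true gives a null key
            else:
                key = None  # missing_bucket: false drops the document
                break
        if key is not None:
            keys.append(key)
    return keys


docs = [{"timestamp": "2022-08-28", "FlightTimeMin": 13.0}]
fields = ["timestamp", "FlightTimeMin2"]
print(composite_keys_single_valued(docs, fields))
print(composite_keys_single_valued(docs, fields, allow_missing={"FlightTimeMin2"}))
```

The first call prints an empty list because no document has FlightTimeMin2. The second call keeps the document and uses None for the missing source.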
The after parameter
The after_key returned with the previous page holds the key of the last bucket on that page.
"after_key" : {
"date_histogram_timestamp" : "2022-08-28",
"terms_FlightTimeMin" : 32.9625244140625
}
Next, set the after parameter to the content of that after_key, so that the request returns the page that follows the previous one.
GET kibana_sample_data_flights/_search
{
"track_total_hits": true,
"size": 0,
"aggs": {
"composite_timestamp_FlightTimeMin": {
"composite": {
"size": 5,
"after": {
"date_histogram_timestamp" : "2022-08-28",
"terms_FlightTimeMin" : 32.9625244140625
},
"sources": [
{
"date_histogram_timestamp": {
"date_histogram": {
"field": "timestamp",
"calendar_interval": "1d",
"format": "yyyy-MM-dd"
}
}
},
{
"terms_FlightTimeMin": {
"terms": {
"field": "FlightTimeMin",
"missing_bucket": true
}
}
}
]
}
}
}
}
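Paging through all buckets means repeating the request and copying each after_key into after until a page comes back empty. The following Python sketch shows that loop. The search argument is a hypothetical callable that takes a request body and returns the response as a dict; a real client call would take its place. Here fake_search stands in for the cluster so that the example runs on its own.

```python
import copy


def iter_composite_buckets(search, body, agg_name):
    """Yield every composite bucket, following after_key page by page."""
    body = copy.deepcopy(body)  # leave the caller's body untouched
    composite = body["aggs"][agg_name]["composite"]
    while True:
        agg = search(body)["aggregations"][agg_name]
        buckets = agg["buckets"]
        if not buckets:
            break
        yield from buckets
        after_key = agg.get("after_key")
        if after_key is None:
            break
        composite["after"] = after_key


def fake_search(body):
    """Stand-in for a real search call: serves the keys 1..5 in pages."""
    composite = body["aggs"]["pages"]["composite"]
    start = composite.get("after", {}).get("n", 0)
    keys = [n for n in range(1, 6) if n > start][: composite["size"]]
    agg = {"buckets": [{"key": {"n": n}, "doc_count": 1} for n in keys]}
    if keys:
        agg["after_key"] = {"n": keys[-1]}
    return {"aggregations": {"pages": agg}}


body = {
    "size": 0,
    "aggs": {
        "pages": {
            "composite": {"size": 2, "sources": [{"n": {"terms": {"field": "n"}}}]}
        }
    },
}
print([b["key"]["n"] for b in iter_composite_buckets(fake_search, body, "pages")])
```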
Sub-aggregations are supported
GET kibana_sample_data_flights/_search
{
"track_total_hits": true,
"size": 0,
"aggs": {
"composite_timestamp_FlightTimeMin": {
"composite": {
"size": 2,
"after": {
"date_histogram_timestamp" : "2022-08-28",
"terms_FlightTimeMin" : 13.010112762451172
},
"sources": [
{
"date_histogram_timestamp": {
"date_histogram": {
"field": "timestamp",
"calendar_interval": "1d",
"format": "yyyy-MM-dd"
}
}
},
{
"terms_FlightTimeMin": {
"terms": {
"field": "FlightTimeMin",
"missing_bucket": true
}
}
}
]
},
"aggs": {
"stats_FlightTimeMin": {
"stats": {
"field": "FlightTimeMin"
}
}
}
}
}
}
The aggregation output is as follows:
"aggregations" : {
"composite_timestamp_FlightTimeMin" : {
"after_key" : {
"date_histogram_timestamp" : "2022-08-28",
"terms_FlightTimeMin" : 17.2014217376709
},
"buckets" : [
{
"key" : {
"date_histogram_timestamp" : "2022-08-28",
"terms_FlightTimeMin" : 16.21676254272461
},
"doc_count" : 1,
"stats_FlightTimeMin" : {
"count" : 1,
"min" : 16.21676254272461,
"max" : 16.21676254272461,
"avg" : 16.21676254272461,
"sum" : 16.21676254272461
}
},
{
"key" : {
"date_histogram_timestamp" : "2022-08-28",
"terms_FlightTimeMin" : 17.2014217376709
},
"doc_count" : 1,
"stats_FlightTimeMin" : {
"count" : 1,
"min" : 17.2014217376709,
"max" : 17.2014217376709,
"avg" : 17.2014217376709,
"sum" : 17.2014217376709
}
}
]
}
}
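A response of this shape is often easier to consume as flat rows, one per composite bucket, with the sub-aggregation values placed beside the key. A small Python sketch, assuming the response has already been parsed into a dict (the function name buckets_to_rows is hypothetical; the sample reuses the first bucket of the output above):

```python
def buckets_to_rows(agg, metric_name):
    """Flatten composite buckets and one metric sub-aggregation into rows."""
    rows = []
    for bucket in agg["buckets"]:
        row = dict(bucket["key"])  # one column per source
        row["doc_count"] = bucket["doc_count"]
        for stat, value in bucket[metric_name].items():
            row[f"{metric_name}.{stat}"] = value
        rows.append(row)
    return rows


agg = {
    "buckets": [
        {
            "key": {
                "date_histogram_timestamp": "2022-08-28",
                "terms_FlightTimeMin": 16.21676254272461,
            },
            "doc_count": 1,
            "stats_FlightTimeMin": {
                "count": 1,
                "min": 16.21676254272461,
                "max": 16.21676254272461,
                "avg": 16.21676254272461,
                "sum": 16.21676254272461,
            },
        }
    ]
}
for row in buckets_to_rows(agg, "stats_FlightTimeMin"):
    print(row)
```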