Deduplicated aggregation queries and statistics in Elasticsearch

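Elasticsearch queries are written in JSON. This is harder to follow than SQL, especially when aggregations are nested. I'm recording a few special-purpose queries here so they are easy to reuse.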

1. Deduplicated query that also displays fields

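Assume the fields are userUUID, userName and score. The index contains many records that share the same userUUID, and each record has a score. The goal is to list the highest scores, at most one per user, and to show the user information alongside them. The terms aggregation creates one bucket per userUUID, the max sub-aggregation computes each user's highest score and is used to order the buckets, and top_hits returns one document from each bucket so the user fields can be displayed. The JSON is as follows: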

{
  "size": 0,
  "aggs": {
    "unique_users": {
      "terms": {
        "field": "userUUID.keyword",
        "size": 10,
        "order": {
          "max_score": "desc"
        }
      },
      "aggs": {
        "max_score": {
          "max": {
            "field": "score"
          }
        },
        "top_documents": {
          "top_hits": {
            "_source": {
              "includes": ["userUUID", "userName"]
            },
            "size": 1
          }
        }
      }
    }
  }
}
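To make the semantics of this nested aggregation concrete, here is a minimal in-memory Python sketch of what it computes, assuming each record is a dict with the keys userUUID, userName and score. It groups by userUUID (the terms aggregation), takes the maximum score per group (the max sub-aggregation), orders the groups by that maximum in descending order, keeps the first size groups, and attaches one sample record per group (the role of top_hits). The function name top_users_by_max_score is hypothetical. Elasticsearch computes this per shard and then merges the results, so on an index with several shards the outcome can be approximate.

```python
from typing import Any


def top_users_by_max_score(records: list[dict[str, Any]], size: int = 10) -> list[dict[str, Any]]:
    """In-memory sketch of terms(userUUID) + max(score) + top_hits(size=1)."""
    buckets: dict[str, dict[str, Any]] = {}
    for rec in records:
        key = rec["userUUID"]
        bucket = buckets.get(key)
        if bucket is None:
            # The first record seen for this user plays the role of the top_hits sample document.
            buckets[key] = {
                "key": key,
                "doc_count": 1,
                "max_score": rec["score"],
                "top_document": {f: rec[f] for f in ("userUUID", "userName") if f in rec},
            }
        else:
            bucket["doc_count"] += 1
            bucket["max_score"] = max(bucket["max_score"], rec["score"])
    # Order buckets by the max sub-aggregation, highest first, and keep `size` of them.
    ordered = sorted(buckets.values(), key=lambda b: b["max_score"], reverse=True)
    return ordered[:size]


records = [
    {"userUUID": "u1", "userName": "alice", "score": 90},
    {"userUUID": "u2", "userName": "bob", "score": 95},
    {"userUUID": "u1", "userName": "alice", "score": 99},
    {"userUUID": "u3", "userName": "carol", "score": 80},
]
for b in top_users_by_max_score(records, size=2):
    print(b["key"], b["max_score"], b["top_document"])
```

With size=2, user u1 comes first (max 99 over two records) and u2 second; u3 is dropped, which mirrors how sum_other_doc_count counts the documents of users that did not make it into the returned buckets.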

Possible output:

{
	"took": 3,
	"timed_out": false,
	"_shards": {
		"total": 1,
		"successful": 1,
		"skipped": 0,
		"failed": 0
	},
	"hits": {
		"total": {
			"value": 781,
			"relation": "eq"
		},
		"max_score": null,
		"hits": []
	},
	"aggregations": {
		"unique_users": {
			"doc_count_error_upper_bound": -1,
			"sum_other_doc_count": 584,
			"buckets": [
				{
					"key": "xxxxxxxxxxx",
					"doc_count": 176,
					"max_score": {
						"value": 137731.0
					},
					"top_documents": {
						"hits": {
							"total": {
								"value": 176,
								"relation": "eq"
							},
							"max_score": 1.0,
							"hits": [
								{
									"_index": "xxx",
									"_id": "xxxxxx",
									"_score": 1.0,
									"_source": {
										"userUUID": "xxxxxxxxxxx"
									}
								}
							]
						}
					}
				},
				{
					"key": "xxxxxxxxxxxx",
					"doc_count": 21,
					"max_score": {
						"value": 129747.0
					},
					"top_documents": {
						"hits": {
							"total": {
								"value": 21,
								"relation": "eq"
							},
							"max_score": 1.0,
							"hits": [
								{
									"_index": "xxx",
									"_id": "xxxxxx",
									"_score": 1.0,
									"_source": {
										"userUUID": "xxxxxxxxxxxxxxxx",
										"userName": "玩家7160"
									}
								}
							]
						}
					}
				}
			]
		}
	}
}
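Reading this result in code means walking aggregations → unique_users → buckets and, inside each bucket, reading max_score.value and top_documents.hits.hits[0]._source. Below is a minimal Python sketch, assuming the response has already been decoded into a dict (for example with json.loads); the helper name extract_top_users and the sample data are hypothetical. userName is read with .get() because a document may not contain that field, as in the first bucket above.

```python
import json
from typing import Any, Optional


def extract_top_users(response: dict[str, Any]) -> list[tuple[str, Optional[str], float]]:
    """Return (userUUID, userName, max score) for each bucket of the unique_users aggregation."""
    rows = []
    for bucket in response["aggregations"]["unique_users"]["buckets"]:
        hits = bucket["top_documents"]["hits"]["hits"]
        source = hits[0]["_source"] if hits else {}
        # userName may be missing from a document, so fall back to None.
        # float() also tolerates a value that was serialized as a string.
        rows.append((bucket["key"], source.get("userName"), float(bucket["max_score"]["value"])))
    return rows


raw = """
{"aggregations": {"unique_users": {"buckets": [
  {"key": "u1", "doc_count": 2, "max_score": {"value": 99.0},
   "top_documents": {"hits": {"hits": [{"_source": {"userUUID": "u1"}}]}}},
  {"key": "u2", "doc_count": 1, "max_score": {"value": 95.0},
   "top_documents": {"hits": {"hits": [{"_source": {"userUUID": "u2", "userName": "bob"}}]}}}
]}}}
"""
for row in extract_top_users(json.loads(raw)):
    print(row)
```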

2. Aggregating the highest scores under a condition

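This works like the previous query, except that a filter aggregation is added so that only records with a score greater than 12000 are aggregated.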

{
	"size": 0,
	"aggs": {
		"test": {
			"filter": {
				"range": {
					"score": {
						"gt": 12000
					}
				}
			},
			"aggs": {
				"userUUID": {
					"terms": {
						"field": "userUUID.keyword",
						"size": 10,
						"order": {
							"max_score": "desc"
						}
					},
					"aggs": {
						"max_score": {
							"max": {
								"field": "score"
							}
						},
						"top_documents": {
							"top_hits": {
								"size": 1,
								"sort": [
									{
										"score": {
											"order": "desc"
										}
									}
								],
								"_source": {
									"includes": [
										"userUUID",
										"userName",
										"score"
									]
								}
							}
						}
					}
				}
			}
		}
	}
}
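Since only the threshold tends to change between runs, it can help to generate this body in code. Below is a minimal Python sketch, assuming the same field names as above; the function name build_filtered_top_users_query and its parameters are hypothetical. It returns a plain dict that can be serialized with json.dumps and sent as the body of a _search request. The top_hits part sorts by score in descending order so that the sample document is the one carrying the user's highest score.

```python
import json
from typing import Any


def build_filtered_top_users_query(min_score: float, size: int = 10) -> dict[str, Any]:
    """Build the filter + terms + max + top_hits body with a configurable score threshold."""
    return {
        "size": 0,  # only aggregations are needed, no top-level hits
        "aggs": {
            "test": {
                # filter aggregation: only documents with score > min_score reach the sub-aggregations
                "filter": {"range": {"score": {"gt": min_score}}},
                "aggs": {
                    "userUUID": {
                        "terms": {
                            "field": "userUUID.keyword",
                            "size": size,
                            "order": {"max_score": "desc"},
                        },
                        "aggs": {
                            "max_score": {"max": {"field": "score"}},
                            "top_documents": {
                                "top_hits": {
                                    "size": 1,
                                    "sort": [{"score": {"order": "desc"}}],
                                    "_source": {"includes": ["userUUID", "userName", "score"]},
                                }
                            },
                        },
                    }
                },
            }
        },
    }


print(json.dumps(build_filtered_top_users_query(12000), indent=2))
```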

3. Counting matches under a condition after deduplicating by ID

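Since no concrete documents need to be returned, the top_hits section is dropped. The query clause keeps only records with a score above 132000, and the cardinality aggregation counts the distinct userUUID values among them. Note that cardinality returns an approximate count; below its precision_threshold (3000 by default) the result is expected to be close to exact: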

{
	"query": {
		"bool": {
			"must": [
				{
					"range": {
						"score": {
							"gt": 132000
						}
					}
				}
			]
		}
	},
	"from": 0,
	"size": 0,
	"aggregations": {
		"total_unique_count": {
			"cardinality": {
				"field": "userUUID.keyword"
			}
		}
	}
}
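In SQL terms this is roughly SELECT COUNT(DISTINCT userUUID) ... WHERE score > 132000. Below is a minimal, exact in-memory Python sketch of that count, assuming each record is a dict with userUUID and score keys; the function name count_unique_users and the sample records are hypothetical. Elasticsearch's cardinality aggregation approximates this exact value.

```python
from typing import Any, Iterable


def count_unique_users(records: Iterable[dict[str, Any]], min_score: float) -> int:
    """Exact count of distinct userUUID values among records with score > min_score."""
    return len({rec["userUUID"] for rec in records if rec["score"] > min_score})


records = [
    {"userUUID": "u1", "score": 140000},
    {"userUUID": "u1", "score": 135000},
    {"userUUID": "u2", "score": 133000},
    {"userUUID": "u3", "score": 120000},
]
print(count_unique_users(records, 132000))  # 2
```

Three records match the condition (the analogue of hits.total), but they belong to only two distinct users, which is the number the cardinality aggregation reports.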

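A possible response, where hits.total.value (495) is the number of matching records and total_unique_count.value (56) is the number of distinct users among them: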

{
	"took": 3,
	"timed_out": false,
	"_shards": {
		"total": 1,
		"successful": 1,
		"skipped": 0,
		"failed": 0
	},
	"hits": {
		"total": {
			"value": 495,
			"relation": "eq"
		},
		"max_score": null,
		"hits": []
	},
	"aggregations": {
		"total_unique_count": {
			"value": 56
		}
	}
}

Reposted from blog.csdn.net/a17432025/article/details/130434973