Elasticsearch 2.4 cardinality (distinct count) aggregation returns inaccurate results

Problem Discovery

A tester ran into a problem: only 4 documents in ES match the query, yet the deduplicated (distinct) count comes back as 5. The actual query result is as follows: [figure: query result]

The figure shows doc_count: 4 but dc_hostname: 5. This is strange: if the bucket contains only 4 documents, how can the count after deduplication be 5?
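This kind of inconsistency can be checked mechanically. The following Python sketch flags any bucket whose distinct count exceeds its doc_count. It assumes that hostname holds a single value per document (with a multi-valued field, a larger distinct count could be legitimate). The helper name suspicious_buckets and the bucket key are hypothetical.

```python
def suspicious_buckets(buckets, metric="dc_hostname"):
    """Return the keys of buckets whose distinct count exceeds doc_count.

    Assumes the counted field is single-valued, so a bucket of N documents
    can contain at most N distinct values.
    """
    return [
        bucket["key"]
        for bucket in buckets
        if bucket[metric]["value"] > bucket["doc_count"]
    ]


# A bucket shaped like the response described above; the key is made up.
buckets = [
    {"key": "example-bucket", "doc_count": 4, "dc_hostname": {"value": 5}},
]
print(suspicious_buckets(buckets))
```

A check like this is cheap to run over aggregation responses in a test suite, which is where the problem was found in the first place.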

Cause Analysis and Solutions

According to the official ES documentation, this behavior is expected in ES 2.4. The cardinality aggregation does not count exactly: it computes an approximate distinct count in memory (based on the HyperLogLog++ algorithm) and trades accuracy for memory. The trade-off is controlled by the precision_threshold parameter. If the distinct count is greater than this value, the returned result may be inaccurate, with an error of about 5%; if the distinct count is less than this value, the returned result is expected to be close to accurate.
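To get a feel for the memory side of this trade-off: the ES documentation puts the cost of a precision threshold c at about c * 8 bytes. The Python sketch below turns that into a rough estimate. The helper name estimated_bytes is hypothetical, and multiplying by the number of buckets is a simplifying assumption (one counter per bucket of the parent aggregation).

```python
MAX_PRECISION_THRESHOLD = 40000  # larger values behave like 40000


def estimated_bytes(precision_threshold, bucket_count=1):
    """Rough memory estimate for a cardinality aggregation.

    Uses the "about c * 8 bytes" rule of thumb from the ES documentation,
    and assumes one counter per parent bucket.
    """
    effective = min(precision_threshold, MAX_PRECISION_THRESHOLD)
    return effective * 8 * bucket_count


print(estimated_bytes(3000))        # 24000 bytes for a single bucket
print(estimated_bytes(3000, 100))   # 2400000 bytes across 100 buckets
print(estimated_bytes(100000))      # 320000 bytes, capped at the maximum
```

Under these assumptions a threshold of 3000 costs roughly 24 KB per bucket, which is small unless the parent aggregation produces a very large number of buckets.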

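  • ES 2.4 parameter description: the default value of precision_threshold is not fixed. It depends on the number of parent aggregations that create multiple buckets (such as terms or histogram), so a cardinality aggregation nested under other aggregations gets a lower default. The maximum supported value is 40000; larger values have the same effect as 40000.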

  • Parameter adjustment: set precision_threshold explicitly to 3000

"aggregations": {
	"dc_hostname": {
		"cardinality": {
			"field": "hostname",
			"precision_threshold": 3000
		}
	}
}
  • Search result after the adjustment: the distinct count now matches doc_count
"buckets": [
  {
    "key": "AM项目监控",
    "doc_count": 4,
    "dc_hostname": {
      "value": 4
    }
  }
]
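
Putting the pieces together, a complete search body might look like the Python sketch below. The original query is not shown in full, so the parent terms aggregation, its name by_project, and the field name project are assumptions made for illustration; only the dc_hostname part comes from the fragment above.

```python
import json


def build_search_body(precision_threshold=3000):
    """Build a search body with a cardinality aggregation per terms bucket.

    "by_project" and the "project" field are hypothetical placeholders for
    whatever parent aggregation the real query uses.
    """
    return {
        "size": 0,  # only the aggregation results are needed, not the hits
        "aggregations": {
            "by_project": {
                "terms": {"field": "project"},
                "aggregations": {
                    "dc_hostname": {
                        "cardinality": {
                            "field": "hostname",
                            "precision_threshold": precision_threshold,
                        }
                    }
                },
            }
        },
    }


body = build_search_body()
print(json.dumps(body, indent=2, ensure_ascii=False))
```

The printed JSON can be sent as the body of a _search request. Setting precision_threshold explicitly keeps the accuracy independent of how deeply the cardinality aggregation is nested.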

cardinality-aggregation official documentation


Origin http://43.154.161.224:23101/article/api/json?id=325287710&siteId=291194637