Problem Description
A tester ran into a problem: the Elasticsearch (ES) query matched only 4 documents, yet the deduplicated (cardinality) count came back as 5. The actual query is as follows:
As the figure shows, doc_count is 4 but dc_hostname is 5. This is odd: with only 4 documents in the bucket, how can the count be 5 after deduplication?
Cause Analysis and Solutions
According to the official ES documentation, the problem shows up in ES 2.4. ES computes cardinality in memory with an approximate algorithm (HyperLogLog++), so when the memory budget is too small the result may be inaccurate. That budget is controlled by the precision_threshold parameter. If the distinct count is greater than this value, the returned result is approximate, with an error of about 5%; if the distinct count is less than this value, the returned result is expected to be close to accurate.
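To make the memory trade-off concrete, here is a rough Python sketch. It assumes the rule of thumb from the Elasticsearch documentation that a threshold of c costs about c * 8 bytes per cardinality computation, and it treats each bucket as holding its own computation. The helper name `estimated_sketch_bytes` and the per-bucket multiplication are illustrative assumptions, not part of any ES API.

```python
MAX_PRECISION_THRESHOLD = 40000  # thresholds above this behave like 40000
BYTES_PER_UNIT = 8  # assumption: "about c * 8 bytes" rule of thumb from the ES docs


def estimated_sketch_bytes(precision_threshold: int, buckets: int = 1) -> int:
    """Rough memory estimate for cardinality counting (hypothetical helper)."""
    if precision_threshold < 0 or buckets < 0:
        raise ValueError("precision_threshold and buckets must be non-negative")
    # Values above the maximum have the same effect as the maximum.
    effective = min(precision_threshold, MAX_PRECISION_THRESHOLD)
    return effective * BYTES_PER_UNIT * buckets


if __name__ == "__main__":
    for threshold in (100, 3000, 40000):
        print(threshold, estimated_sketch_bytes(threshold), "bytes per bucket")
```

Under these assumptions a threshold of 3000 costs roughly 24 KB per bucket, which is cheap for a handful of buckets but adds up when the cardinality aggregation sits under a terms aggregation with many buckets.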
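- ES 2.4 parameter description: the default value of precision_threshold is not fixed. It depends on the number of parent aggregations that create multiple buckets (such as terms or histogram), so it gets lower as the cardinality aggregation is nested more deeply. The maximum value is 40000.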
- Parameter adjustment: set precision_threshold explicitly to 3000
    "aggregations": {
        "dc_hostname": {
            "cardinality": {
                "field": "hostname",
                "precision_threshold": 3000
            }
        }
    }
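The fragment above only shows the cardinality part. As a minimal sketch of how it might sit inside a complete search body, the Python below nests it under a terms aggregation. The outer aggregation name `by_group` and the field name `project_name` are assumptions for illustration; the original query's bucket field is not shown in this article.

```python
import json


def build_distinct_hostname_query(group_field: str,
                                  precision_threshold: int = 3000) -> dict:
    """Build a search body: terms buckets with a cardinality sub-aggregation."""
    return {
        "size": 0,  # only aggregation results are needed, no hits
        "aggregations": {
            "by_group": {  # hypothetical name for the outer bucket aggregation
                "terms": {"field": group_field},
                "aggregations": {
                    "dc_hostname": {
                        "cardinality": {
                            "field": "hostname",
                            "precision_threshold": precision_threshold,
                        }
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    # "project_name" is a placeholder; substitute the real bucket field.
    body = build_distinct_hostname_query("project_name")
    print(json.dumps(body, indent=2))
```

Setting the threshold explicitly on the sub-aggregation avoids relying on the nesting-dependent default.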
- Search result: after the adjustment, dc_hostname is 4, which matches doc_count
    "buckets": [
        {
            "key": "AM项目监控",
            "doc_count": 4,
            "dc_hostname": {
                "value": 4
            }
        }
    ]
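A check like the one the tester did by eye can be automated. The sketch below flags buckets whose distinct count exceeds doc_count. It assumes hostname holds a single value per document (with a multi-valued field a distinct count above doc_count can be legitimate), and the function name `find_impossible_counts` is made up for this example. The "before" bucket is reconstructed from the values quoted in the text.

```python
def find_impossible_counts(buckets, metric="dc_hostname"):
    """Return the keys of buckets whose distinct count exceeds doc_count."""
    suspicious = []
    for bucket in buckets:
        distinct = bucket[metric]["value"]
        # With one hostname per document, distinct values cannot outnumber documents.
        if distinct > bucket["doc_count"]:
            suspicious.append(bucket["key"])
    return suspicious


if __name__ == "__main__":
    before = [{"key": "AM项目监控", "doc_count": 4, "dc_hostname": {"value": 5}}]
    after = [{"key": "AM项目监控", "doc_count": 4, "dc_hostname": {"value": 4}}]
    print(find_impossible_counts(before))  # the bucket is flagged
    print(find_impossible_counts(after))   # nothing is flagged
```

This only catches overestimates that are provably wrong; an approximate count that is too low, or too high but still within doc_count, passes unnoticed.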