Paging Elasticsearch aggregation results

1. Does Elasticsearch support paging over aggregation results, and why not?

No. Elasticsearch staff give two reasons:
1) Performance: paging over aggregation results causes performance problems when the number of records is large.
2) Correctness: the document counts returned by an aggregation are not exact.
As a result, strange things can happen, such as the first item on the second page having a higher count than the last item on the first page.
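
The correctness point can be illustrated with a toy model. The sketch below is not Elasticsearch code; it only assumes that each shard reports its own top terms and that the coordinating node sums what it receives. The class name, the shardSize parameter, and the sample counts are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ShardTopTermsDemo {

    // Keep only the shardSize highest-count terms of a single shard.
    static Map<String, Long> topTerms(Map<String, Long> shardCounts, int shardSize) {
        List<Map.Entry<String, Long>> entries = new ArrayList<>(shardCounts.entrySet());
        entries.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        Map<String, Long> top = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : entries.subList(0, Math.min(shardSize, entries.size()))) {
            top.put(e.getKey(), e.getValue());
        }
        return top;
    }

    // Sum the truncated per-shard lists (simplified model of the merge step).
    static Map<String, Long> merge(List<Map<String, Long>> shards, int shardSize) {
        Map<String, Long> merged = new HashMap<>();
        for (Map<String, Long> shard : shards) {
            for (Map.Entry<String, Long> e : topTerms(shard, shardSize).entrySet()) {
                merged.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        return merged;
    }

    // Hypothetical per-shard term counts; true totals are a=20, c=14, d=9, b=8.
    static List<Map<String, Long>> sampleShards() {
        Map<String, Long> shard1 = new HashMap<>();
        shard1.put("a", 10L);
        shard1.put("b", 8L);
        shard1.put("c", 7L);
        Map<String, Long> shard2 = new HashMap<>();
        shard2.put("a", 10L);
        shard2.put("d", 9L);
        shard2.put("c", 7L);
        List<Map<String, Long>> shards = new ArrayList<>();
        shards.add(shard1);
        shards.add(shard2);
        return shards;
    }

    public static void main(String[] args) {
        System.out.println("shardSize=2: " + merge(sampleShards(), 2));
        System.out.println("shardSize=3: " + merge(sampleShards(), 3));
    }
}
```

With shardSize 2, term c never reaches the merge step even though its true total (14) is the second largest; with shardSize 3 it appears with the correct total. Any page boundary drawn over such truncated results inherits the same error.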

2. How can paging over aggregation results be implemented with Elasticsearch?

Solution: to display the total number of results that match the conditions, the data must be aggregated in full and sorted by some rule.
Note that if the amount of data to aggregate is large (one hundred thousand, one million, or even ten million records), this will inevitably be slow and may time out.

Step 1: Full aggregation. Set size to the maximum value, 2147483647.
In ES 5.x/6.x, set size to 2147483647, which equals 2^31-1, the largest value
of a signed 32-bit integer; in ES 1.x/2.x, set size to 0, which those versions treat as "return all buckets".

For example:

  "aggregations": {
    "statistics_assets": {
      "terms": {
        "field": "one_account.one_account_no",
        "size": 2147483647,
        "order": {
          "assets": "asc"
        }
      },
      "aggregations": {
        "assets": {
          "sum": {
            "field": "assets.merge"
          }
        }
      }
    }
  }

Setting size to the maximum value of 2147483647 is acceptable here because the aggregation produces only a few hundred or a few thousand buckets (if there are too many buckets to return in one response, consider bucket_sort, which is discussed below). Once all the aggregation results can be returned in one response, paging parameters can be applied to them to obtain the requested part of the result. In Java, store them in a List or a LinkedHashMap; a HashMap does not guarantee iteration order, so the sort order may be lost.

Step 2: Store the aggregation result in memory. In Java, consider a List or a Map.
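
As a minimal sketch of Step 2, the class below copies bucket keys and their summed values into a LinkedHashMap so that the order returned by the aggregation is kept. The class name, the method, and the sample account numbers are hypothetical; how the keys and sums are read from the search response depends on the client library and is not shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class BucketStore {

    // Copy (bucket key, sum) pairs into a map that keeps insertion order.
    // keys.get(i) is assumed to belong to sums.get(i).
    static Map<String, Double> store(List<String> keys, List<Double> sums) {
        if (keys.size() != sums.size()) {
            throw new IllegalArgumentException("keys and sums must have the same length");
        }
        Map<String, Double> ordered = new LinkedHashMap<>();
        for (int i = 0; i < keys.size(); i++) {
            ordered.put(keys.get(i), sums.get(i));
        }
        return ordered;
    }

    public static void main(String[] args) {
        // Hypothetical buckets, already sorted by the aggregation (assets ascending).
        List<String> keys = Arrays.asList("A-003", "A-001", "A-002");
        List<Double> sums = Arrays.asList(10.0, 25.5, 40.0);
        Map<String, Double> stored = store(keys, sums);
        // Iteration follows insertion order, not key order.
        System.out.println(new ArrayList<>(stored.keySet()));
    }
}
```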

Step 3: In-memory paging. The data stored in the list is paged in Java and returned.
With 10 items per page, the first page consists of elements 0 through 9 of the list, the second page of elements 10 through 19, and so on.
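
The in-memory paging of Step 3 could look like the following sketch. ListPager and its page method are illustrative names, and page numbers are assumed to start at 1.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ListPager {

    // Return the elements of the 1-based page pageNumber, pageSize items per page.
    // A page past the end of the list yields an empty list.
    static <T> List<T> page(List<T> all, int pageNumber, int pageSize) {
        if (pageNumber < 1 || pageSize < 1) {
            throw new IllegalArgumentException("pageNumber and pageSize must be >= 1");
        }
        long from = (long) (pageNumber - 1) * pageSize; // long avoids int overflow
        if (from >= all.size()) {
            return Collections.emptyList();
        }
        int to = (int) Math.min(from + pageSize, all.size());
        // Copy the view so the result does not depend on later changes to the source list.
        return new ArrayList<>(all.subList((int) from, to));
    }

    public static void main(String[] args) {
        List<Integer> all = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            all.add(i);
        }
        System.out.println(page(all, 1, 10)); // elements 0..9
        System.out.println(page(all, 3, 10)); // elements 20..24 (short last page)
    }
}
```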

Summary:
The method described in this article applies when the amount of data to aggregate is not large and the aggregation does not produce many results: all aggregation results are returned at once, and the Java back end stores them in a list and pages over it.

Disadvantages:
If the data volume is too large, however, the aggregation in ES is very slow and may time out, and if it produces too many buckets they cannot all be returned to the Java side in one response. If the amount of data to aggregate is small and you do not want to return all the aggregation results at once, consider using bucket_sort,
for example:

  "aggregations": {
    "statistics_assets": {
      "terms": {
        "field": "one_account.one_account_no",
        "size": 245645500
      },
      "aggregations": {
        "assets": {
          "sum": {
            "field": "assets.merge"
          }
        },
        "assets_bucket_sort": {
          "bucket_sort": {
            "sort": {
              "assets": {
                "order": "desc"
              }
            },
            "from": 0,
            "size": 10
          }
        }
      }
    }
  }
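
In the bucket_sort example, from is the offset of the first bucket to return and size is the number of buckets per page, so page n (counting from 1) starts at from = (n - 1) * size. The sketch below computes that offset and assembles the bucket_sort fragment as a plain string; BucketSortPage and its methods are hypothetical helpers, not part of any Elasticsearch client, and the inputs are assumed to be simple trusted identifiers (no JSON escaping is done).

```java
final class BucketSortPage {

    // Offset of the first bucket on a 1-based page.
    static int fromOffset(int pageNumber, int pageSize) {
        if (pageNumber < 1 || pageSize < 1) {
            throw new IllegalArgumentException("pageNumber and pageSize must be >= 1");
        }
        return Math.multiplyExact(pageNumber - 1, pageSize); // throws on int overflow
    }

    // Build the bucket_sort fragment in the same shape as the example above.
    // sortPath is the name of the sibling metric aggregation, e.g. "assets".
    static String bucketSortJson(String sortPath, String order, int pageNumber, int pageSize) {
        int from = fromOffset(pageNumber, pageSize);
        return "{\"bucket_sort\":{\"sort\":{\"" + sortPath + "\":{\"order\":\"" + order + "\"}},"
                + "\"from\":" + from + ",\"size\":" + pageSize + "}}";
    }

    public static void main(String[] args) {
        // Third page, 10 buckets per page, sorted by the "assets" sum, descending.
        System.out.println(bucketSortJson("assets", "desc", 3, 10));
    }
}
```

Only from changes between pages; the rest of the request stays the same, so the full aggregation is still computed on every call.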

Refer to the following official documents or articles:
Bucket sort aggregation
How to add Bucket Sort to Query Aggregation
Aggregation + sorting +pagination in elastic search

If the amount of data to aggregate is large and bucket_sort cannot be used, it is recommended to have the middle-platform team or the data department precompute the values and store them in a new field. The aggregation then no longer has to run at query time; you simply read the value of that field.

Reference article:
Elasticsearch post-aggregation paging in-depth explanation

Origin blog.csdn.net/qq_33697094/article/details/109820792