Pitfalls in paginated, sorted queries when integrating Elasticsearch with Spring Boot: only 10 documents are returned each time

Background

We use Elasticsearch for tokenized (full-text) queries and return a specified number of documents per page. However, when a page should contain more than ten documents, Elasticsearch still returns only ten. This is because Elasticsearch returns only 10 hits by default to keep queries fast. To get a different page size, set the size parameter (the number of documents returned) in the paging query. If size is not set, at most 10 documents are returned.

1. The query should specify from and size

If a query in ES does not specify from and size, ES defaults to from=0 and size=10, so each query returns at most 10 documents.

The following query matches 11 unique ids. Because from and size are not specified, only 10 documents are returned instead of 11:

{
  "query": {
    "bool": {
      "must": [
        {
          "terms": {
            "id": [4190,4191,4192,4193,4194,4195,4196,4197,4198,4199,4200]
          }
        }
      ]
    }
  },
  "sort": [
    {
      "created_time": {
        "order": "desc"
      }
    }
  ]
}

So the correct query adds from=0 and size=11, that is, it states the expected number of results explicitly.
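As a minimal sketch, the request body above with from and size added would look like this. The query and sort clauses are unchanged from the example; only the two paging parameters are new.

```json
{
  "from": 0,
  "size": 11,
  "query": {
    "bool": {
      "must": [
        {
          "terms": {
            "id": [4190,4191,4192,4193,4194,4195,4196,4197,4198,4199,4200]
          }
        }
      ]
    }
  },
  "sort": [
    {
      "created_time": {
        "order": "desc"
      }
    }
  ]
}
```

With size set to 11, all 11 matching documents can come back in a single response.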

2. The query needs to specify a sort field

If a query in ES does not specify a sort field, paging through the results may return the same documents more than once and the pages become inconsistent.

For example, the query below fetches 10 documents per page. When several pages are requested, duplicate documents may be returned. To avoid this, sort the query on a field whose values are as unique as possible, such as the creation time, the primary key, or another unique ID field.

{
	"from": 0,
	"size": 10,
	"query": {
		"bool": {
			"must": [{
				"term": {
					"month": "2022-12"
				}
			}]
		}
	}
}

Generally speaking, this is caused by the way ES stores and retrieves data across shards. A common cause is the _score: by default ES sorts results by _score in descending order. When every doc has the same _score (for example 0), the tied docs have no stable order, so paging becomes inconsistent and results from the first page may reappear on the second or third page, depending on the order in which the shards returned their results at that moment.
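A minimal sketch of the same paging query with an explicit sort follows. It assumes that this index also has the created_time and id fields used in the first example, which the original query does not confirm. The id field serves as a tiebreaker, so documents with the same created_time still come back in a stable order.

```json
{
  "from": 0,
  "size": 10,
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "month": "2022-12"
          }
        }
      ]
    }
  },
  "sort": [
    { "created_time": { "order": "desc" } },
    { "id": { "order": "asc" } }
  ]
}
```

For the next page, only from changes (10, 20, and so on); the sort clause must stay identical on every page.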

3. from + size paging is limited by the result window size

If you are new to ES, you may not expect this: when from + size in a paging query exceeds 10,000, the following exception is reported:

Result window is too large, from + size must be less than or equal to: [10000] but was [22020]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting

In the case above, from + size = (pageNum - 1) * size + size = (1101 - 1) * 20 + 20 = 22020 > 10000, so the exception is thrown.
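To make the arithmetic concrete, here is a sketch of a request body for page 1101 with 20 documents per page. The match_all query is only a placeholder and is not taken from the original. Here from = (1101 - 1) * 20 = 22000, so from + size = 22020, which exceeds the default window of 10000 and triggers the error shown above.

```json
{
  "from": 22000,
  "size": 20,
  "query": {
    "match_all": {}
  }
}
```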

Of course, the 10000 limit can be adjusted, for example by raising the upper limit to 800000:

PUT my_index/_settings
{
  "index": {
    "max_result_window": 800000
  }
}

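ES sets the default maximum result window to 10,000 because of its distributed storage and the way from + size paging works; see the previous article for details.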

If this article helped you, please give 「翎野君」 a like. Thank you for your support.

First published at: https://www.cnblogs.com/lingyejun/p/17557526.html

Origin blog.csdn.net/lingyejun/article/details/131778597