Elasticsearch query: request body search

1. Basic query syntax

Query example

GET /users/_search?q=21&from=0&size=2&sort=age:desc&search_type=dfs_query_then_fetch&request_cache=true&terminate_after=1&timeout=10s
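The example above is a URI search: every option is a key=value pair in the query string. The minimal Python sketch below rebuilds that request path from a plain dict using the standard library's urlencode. The helper name build_search_path is hypothetical and only illustrates how the parameters are encoded. It does not send anything to a cluster.

```python
from urllib.parse import urlencode

# The parameters of the example request, in the same order as the URL above.
params = {
    "q": "21",
    "from": 0,
    "size": 2,
    "sort": "age:desc",
    "search_type": "dfs_query_then_fetch",
    "request_cache": "true",
    "terminate_after": 1,
    "timeout": "10s",
}

def build_search_path(index: str, params: dict) -> str:
    """Return the URI-search path for `index` with `params` encoded as a query string."""
    # safe=":" keeps "age:desc" readable instead of percent-encoding the colon.
    return f"/{index}/_search?" + urlencode(params, safe=":")

print("GET " + build_search_path("users", params))
```

Values that contain spaces or other reserved characters, for example a q value such as "name:alice smith", are escaped automatically by urlencode.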
Parameter meanings
timeout: Search timeout. Limits the search request to the specified time value. When the timeout expires, the request returns the hits accumulated up to that point. The default is no timeout.
from: The offset of the first hit to return. The default is 0.
size: The number of hits to return. The default is 10. If you don't need the hits themselves and only care about the number of matches and/or aggregations, setting this value to 0 helps performance.
search_type: The type of search operation to perform, either dfs_query_then_fetch or query_then_fetch. The default is query_then_fetch.
request_cache: Set to true or false to enable or disable caching of search results for requests where size is 0, i.e. aggregations and suggestions (no top hits are returned).
terminate_after: The maximum number of documents to collect on each shard. When this number is reached, query execution terminates early. If set, the response contains a boolean field, terminated_early, that indicates whether query execution actually terminated early. The default is no terminate_after.
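The same search can also be sent as a request body search. The sketch below uses a hypothetical Python helper, to_body_request, to split the URI parameters. It relies on our reading of the Elasticsearch 6.x/7.x request body search documentation, which says search_type and request_cache must stay in the query string. Under that reading, from, size, sort, timeout and terminate_after can move into the JSON body, and q corresponds to a query_string query. Check these details against the documentation for your version.

```python
import json
from typing import Tuple
from urllib.parse import urlencode

# Assumption: these parameters are only accepted in the URL query string.
URL_ONLY = ("search_type", "request_cache")

def to_body_request(params: dict) -> Tuple[str, dict]:
    """Split URI-search parameters into a query string and a JSON request body."""
    query_string = urlencode({k: params[k] for k in URL_ONLY if k in params})
    body = {}
    if "q" in params:
        # The URI "q" parameter corresponds to a query_string query in the body.
        body["query"] = {"query_string": {"query": params["q"]}}
    for key in ("from", "size", "timeout", "terminate_after"):
        if key in params:
            body[key] = params[key]
    if "sort" in params:
        # "age:desc" becomes [{"age": {"order": "desc"}}]
        field, order = params["sort"].split(":")
        body["sort"] = [{field: {"order": order}}]
    return query_string, body

params = {"q": "21", "from": 0, "size": 2, "sort": "age:desc",
          "search_type": "dfs_query_then_fetch", "request_cache": "true",
          "terminate_after": 1, "timeout": "10s"}
qs, body = to_body_request(params)
print(f"GET /users/_search?{qs}")
print(json.dumps(body, indent=2))
```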

2. search_type classification

By search_type, searches can be divided into the following four types:

  • QUERY_THEN_FETCH
  • QUERY_AND_FETCH
  • DFS_QUERY_THEN_FETCH
  • DFS_QUERY_AND_FETCH

2.1 query and fetch (not supported in newer versions)

  The query is sent to all shards of the index, and each shard returns both the matching documents and their computed scores in a single response.

  • Advantages: This is the fastest search type, because unlike the others it only needs to query the shards once.
  • Disadvantages: The number of results is inaccurate. Each shard returns up to size documents, so up to size × number of shards documents may come back, N times what the user asked for. The ranking is also inaccurate, because each shard scores documents using only its own local statistics.
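The toy Python simulation below shows where the size × number of shards count comes from. It assumes three shards with made-up scores. Every shard returns its own local top size hits in the single round trip, and nothing trims the combined list back to size. The function name and data are hypothetical. This models the behaviour described above, not Elasticsearch's actual code.

```python
# Toy shards: lists of (doc_id, score) pairs with made-up scores.
SHARDS = [
    [("a1", 0.9), ("a2", 0.5), ("a3", 0.1)],
    [("b1", 0.8), ("b2", 0.7), ("b3", 0.2)],
    [("c1", 0.6), ("c2", 0.4), ("c3", 0.3)],
]

def query_and_fetch(shards, size):
    """Single round trip: each shard scores locally and returns its own top `size` hits."""
    results = []
    for shard in shards:
        # (doc_id, score) pairs stand in for the full documents a shard would send.
        top = sorted(shard, key=lambda hit: hit[1], reverse=True)[:size]
        results.extend(top)
    # The combined list is NOT cut back to `size`.
    return results

hits = query_and_fetch(SHARDS, size=2)
print(len(hits))  # 6, i.e. size (2) x number of shards (3)
```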

2.2 query then fetch (the default search type in ES)

  The first step sends the request to all shards. Each shard returns only document ids (not the documents themselves) and ranking information (each document's score). The results from all shards are then merged and re-sorted by score, and the top size documents are selected.
  The second step fetches the selected documents from the relevant shards by document id. The number of documents returned this way equals the size requested by the user.

  • Advantages: The amount of data returned is accurate.
  • Disadvantages: moderate performance. The ranking can be inaccurate, because each shard scores documents using only its own local statistics.
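The two phases can be sketched in Python with made-up data. In the query phase, each shard sends back only (score, doc_id, shard number) for its local top size. These candidates are merged and cut to the global top size. In the fetch phase, only the owning shard is asked for each selected document. The shard layout and function names are hypothetical, and this is a simulation of the idea rather than Elasticsearch's implementation.

```python
import heapq

# Toy shards: doc_id -> (score, document source). Scores are made up.
SHARDS = [
    {"a1": (0.9, {"name": "alice"}), "a2": (0.5, {"name": "amy"})},
    {"b1": (0.8, {"name": "bob"}), "b2": (0.7, {"name": "ben"})},
    {"c1": (0.6, {"name": "carl"}), "c2": (0.4, {"name": "cora"})},
]

def query_phase(shards, size):
    """Each shard returns (score, doc_id, shard_no) for its local top `size`, without sources."""
    candidates = []
    for shard_no, shard in enumerate(shards):
        local = [(score, doc_id, shard_no) for doc_id, (score, _src) in shard.items()]
        candidates.extend(heapq.nlargest(size, local))
    return candidates

def query_then_fetch(shards, size):
    # Merge the candidates from all shards and keep only the global top `size`.
    top = heapq.nlargest(size, query_phase(shards, size))
    # Fetch phase: read each selected document's source from the shard that holds it.
    return [(doc_id, shards[shard_no][doc_id][1]) for _score, doc_id, shard_no in top]

hits = query_then_fetch(SHARDS, size=2)
print(hits)  # [('a1', {'name': 'alice'}), ('b1', {'name': 'bob'})]
```

Unlike query_and_fetch, the number of hits returned here always equals size (as long as enough documents match).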

2.3 DFS query and fetch (not supported in newer versions)

  This method adds a DFS step in front of query_and_fetch, which makes scoring and ranking more precise. Before querying, a request is sent to all shards to collect scoring statistics such as term frequency and document frequency. These are summed across all shards, and the query and fetch then run as in query_and_fetch using the global statistics.

  • Advantages: accurate ranking.
  • Disadvantages: moderate performance. The number of results is inaccurate: up to size × number of shards documents may be returned.
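The extra DFS round trip amounts to summing per-shard term statistics before any scoring happens. The Python sketch below assumes BM25's inverse document frequency as computed by Lucene, idf = ln(1 + (N - df + 0.5) / (df + 0.5)). Here N is the number of documents and df is the number of documents containing the term. The class and function names are hypothetical and the statistics are made up.

```python
import math
from dataclasses import dataclass

@dataclass
class TermStats:
    doc_count: int  # N: documents in the shard (or in the whole index)
    doc_freq: int   # df: documents that contain the query term

def bm25_idf(stats: TermStats) -> float:
    """BM25 inverse document frequency in the form used by Lucene."""
    return math.log(1 + (stats.doc_count - stats.doc_freq + 0.5) / (stats.doc_freq + 0.5))

def dfs_phase(shard_stats):
    """The extra round trip: collect every shard's statistics and sum them into global totals."""
    return TermStats(
        doc_count=sum(s.doc_count for s in shard_stats),
        doc_freq=sum(s.doc_freq for s in shard_stats),
    )

# Made-up statistics: the term is rare on shard 0 but common on shard 1.
shard_stats = [TermStats(1000, 10), TermStats(1000, 500)]
global_stats = dfs_phase(shard_stats)

for i, s in enumerate(shard_stats):
    print(f"shard {i}: local idf = {bm25_idf(s):.3f}")
print(f"global idf = {bm25_idf(global_stats):.3f}")
```

Without the DFS step, shard 0 would weight the term far more heavily than shard 1. With it, both shards use the single global value.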

2.4 DFS query then fetch (supported by ES)

  Before querying, a request is first sent to all shards to collect scoring statistics such as term frequency and document frequency. These are summed across all shards, and the search then proceeds as in query_then_fetch using the global statistics.

  • Advantages: The number of documents returned is accurate, and the ranking is accurate.
  • Disadvantages: the worst performance.

DFS collects the term frequency and document frequency from every shard before the real query runs. Each shard then scores and ranks documents using these global term and document frequencies. DFS_QUERY_THEN_FETCH is the least efficient search type, because a single search needs three round trips to the shards (DFS, query and fetch). In exchange, it gives the most accurate scoring.
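The toy Python example below shows why the global statistics change the ranking. It scores one query term on two shards using a simplified BM25 with k1 = 1.2 and length normalization switched off (b = 0). That simplification, the statistics, and the names are all assumptions made for illustration. Document A contains the term once, on a shard where the term is rare. Document B contains it three times, on a shard where it is common.

```python
import math

K1 = 1.2  # BM25 term-frequency saturation; length normalization is ignored (b = 0)

def bm25_idf(doc_count, doc_freq):
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

def bm25_score(tf, idf):
    return idf * tf * (K1 + 1) / (tf + K1)

# Made-up shards: per-shard term statistics and doc_id -> term frequency.
SHARDS = {
    "shard0": {"doc_count": 1000, "doc_freq": 10, "docs": {"A": 1}},
    "shard1": {"doc_count": 1000, "doc_freq": 500, "docs": {"B": 3}},
}

def rank(use_global_stats):
    total_docs = sum(s["doc_count"] for s in SHARDS.values())
    total_df = sum(s["doc_freq"] for s in SHARDS.values())
    scored = []
    for s in SHARDS.values():
        if use_global_stats:  # DFS: every shard uses the same summed statistics
            idf = bm25_idf(total_docs, total_df)
        else:                 # non-DFS: each shard uses only its own statistics
            idf = bm25_idf(s["doc_count"], s["doc_freq"])
        for doc_id, tf in s["docs"].items():
            scored.append((bm25_score(tf, idf), doc_id))
    return [doc_id for _score, doc_id in sorted(scored, reverse=True)]

print(rank(use_global_stats=False))  # ['A', 'B']
print(rank(use_global_stats=True))   # ['B', 'A']
```

With per-shard statistics, A ranks first only because the term happens to be rare on its shard. With the summed statistics, both documents share one IDF, so B's higher term frequency decides the order.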

2.5 Summary

  • Performance: QUERY_AND_FETCH is the best, DFS_QUERY_THEN_FETCH is the worst.
  • Search accuracy: DFS is more accurate than non-DFS.


Origin blog.csdn.net/qq_42979842/article/details/108089403