A detailed look at the Elasticsearch search process

We all know that es is a distributed storage and retrieval system. When indexing, each document is routed to a shard based on its _id field by default, which means the es cluster knows exactly which shard every document lives on.
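As a rough sketch of that routing rule (the real implementation hashes the routing value with Murmur3; the hash function, document ids, and shard count below are only illustrative):

```python
import hashlib

def pick_shard(doc_id, num_primary_shards, routing=None):
    """Illustrative version of: shard = hash(_routing) % number_of_primary_shards.

    Elasticsearch uses the document _id as the routing value by default and a
    Murmur3 hash internally; MD5 merely stands in for it in this sketch.
    """
    routing_value = routing if routing is not None else doc_id
    digest = hashlib.md5(routing_value.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_primary_shards

# The same _id always maps to the same shard, so the cluster always knows
# exactly where a given document lives.
print(pick_shard("user-42", num_primary_shards=5))
print(pick_shard("user-42", num_primary_shards=5))  # same shard again
```
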
Compared with CRUD operations, search follows a more complicated execution model. Because we do not know in advance which documents will match (they could be on any shard), a search request has to query every shard of the index, or of multiple indexes, in order to assemble the complete result set we want.
Finding all matching documents is only the first step. Result sets from multiple shards must be merged into one sorted list before the requested page can be returned to the client. Because of this top-N operation, a search runs in two stages: query and fetch.

1. The query phase

When a search request arrives, the query is broadcast to a copy of every shard in the index (either the primary shard or a replica shard). Each shard executes the query locally and builds a priority queue of the documents it matched.
This queue is a sorted top-N list whose size equals from + size. In other words, if from is 10 and size is 10, the queue holds 20 entries. This is also why from + size cannot be used for deep paging: the larger from grows, the worse the performance.
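As a sketch of such a request against the _search REST endpoint (the localhost address, the index name my_index, and the match query are only placeholders for illustration):

```python
import json
import urllib.request

# A paginated search: every shard must build a priority queue of from + size = 20
# entries, even though only 10 documents end up in the response.
search_body = {
    "from": 10,
    "size": 10,
    "query": {"match": {"title": "elasticsearch"}},
}

req = urllib.request.Request(
    "http://localhost:9200/my_index/_search",  # assumed local cluster and index name
    data=json.dumps(search_body).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    hits = json.loads(resp.read())["hits"]["hits"]
    print([hit["_id"] for hit in hits])
```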

The query phase of a distributed search in es proceeds as follows:

[Figure: the query phase of a distributed search]

1. The client sends a search request to Node 3, which creates an empty priority queue of size from + size.
2. Node 3 forwards the search request to a copy (primary or replica) of every shard in the index. Each shard executes the query locally and adds its hits to its own sorted priority queue of size from + size.
3. Each shard returns only the docId and the values of the fields that participate in sorting, such as _score, to the coordinating node, Node 3, which then merges the results from all shards into a single globally sorted list.
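A minimal sketch of what the coordinating node does with the per-shard results in step 3 (the shard names, scores, and docIds are invented; the real merge happens inside Elasticsearch, not in user code):

```python
import heapq

# Each shard returns only (score, docId) pairs for its local top from + size hits,
# already sorted by _score in descending order.
shard_results = {
    "shard-0": [(9.1, "doc-17"), (7.4, "doc-3"), (5.0, "doc-21")],
    "shard-1": [(8.8, "doc-40"), (6.2, "doc-8")],
    "shard-2": [(9.7, "doc-55"), (4.9, "doc-2")],
}

from_, size = 1, 3

# Merge the already-sorted per-shard lists into one globally sorted list
# (descending score), then keep only the requested from/size window.
merged = heapq.merge(*shard_results.values(), key=lambda hit: hit[0], reverse=True)
page = list(merged)[from_:from_ + size]

print(page)  # [(9.1, 'doc-17'), (8.8, 'doc-40'), (7.4, 'doc-3')]
```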

The node mentioned above is called the coordinating node: whichever node happens to receive the search request takes on this role. Its responsibility is to broadcast the search request to all relevant shards, merge their responses into a globally sorted list, and then drive the second stage, the fetch. Note that this intermediate result set contains only docIds and the sort field values. Because the search request can be handled by either the primary shard or a replica shard, adding replicas increases search throughput; the coordinating node balances the load across shard copies via round-robin.
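A toy sketch of that round-robin idea (the shard copy names are invented, and newer Elasticsearch versions use adaptive replica selection rather than plain round-robin):

```python
from itertools import cycle

# Each shard has several copies (one primary plus replicas); the coordinating node
# may send the query-phase request to any one of them, spreading the load.
shard_copies = {
    0: cycle(["shard0-primary", "shard0-replica1"]),
    1: cycle(["shard1-primary", "shard1-replica1"]),
}

def pick_copy(shard_id):
    """Round-robin over the copies of a single shard."""
    return next(shard_copies[shard_id])

print(pick_copy(0))  # shard0-primary
print(pick_copy(0))  # shard0-replica1 on the next request
```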

2. The fetch phase

[Figure: the fetch phase of a distributed search]
The process is as follows:

1. The coordinating node identifies which documents need to be fetched and sends a batched multi-get request to each relevant shard.
2. Each shard loads the requested documents, enriches them if required, and returns them to the coordinating node.
3. Once all documents have been fetched, the coordinating node returns the result set to the client.

Note that the coordinating node fetches only the documents it actually needs. For example, with from = 90 and size = 10, the fetch phase reads only those 10 documents, which may live on a single shard or be spread across several. The coordinating node builds a multi-get request for each relevant shard, each shard reads the required data from the _source field, and once everything has come back the coordinating node assembles it into a single response and returns it to the client.
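A conceptual sketch of that fetch step (the docIds, scores, and shard assignments are invented; internally Elasticsearch issues its own transport-level fetch requests rather than calls you write yourself):

```python
from collections import defaultdict

# Globally sorted list from the query phase: (score, docId, shard) triples.
global_hits = [(10.0 - i * 0.1, f"doc-{i}", i % 3) for i in range(200)]

from_, size = 90, 10

# Only the documents inside the requested window are fetched...
window = global_hits[from_:from_ + size]

# ...and they are grouped by owning shard so that one batched multi-get request
# goes to each shard instead of one request per document.
fetch_requests = defaultdict(list)
for _score, doc_id, shard in window:
    fetch_requests[shard].append(doc_id)

for shard, doc_ids in sorted(fetch_requests.items()):
    print(f"shard {shard}: fetch _source for {doc_ids}")
```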

3. Summary:

This article walked through the two stages of a distributed search in es: query and fetch. In the query stage, the docIds and sort field values of matching documents are read from every shard and merged on the coordinating node into a globally sorted list, from which the page specified by from and size is selected. In the fetch stage, the coordinating node uses those docIds to build multi-get requests to the relevant shards, loads the required data from _source, and finally returns the assembled response to the client, at which point the entire search request has been served.

