A search runs in three phases: query, fetch, and expand. The first two are the important ones. The previous article covered the query phase; this one covers the fetch phase, whose job is to resolve the results produced by the query phase.
// Suppose there are 5 shards: only once every shard has finished querying does the next phase begin
private void successfulShardExecution(SearchShardIterator shardsIt) {
    final int remainingOpsOnIterator;
    if (shardsIt.skip()) {
        remainingOpsOnIterator = shardsIt.remaining();
    } else {
        remainingOpsOnIterator = shardsIt.remaining() + 1;
    }
    final int xTotalOps = totalOps.addAndGet(remainingOpsOnIterator);
    if (xTotalOps == expectedTotalOps) {
        // this is where the transition to the next phase happens
        onPhaseDone();
    }
}
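The transition logic above boils down to an atomic running total: each completed shard iterator contributes its remaining op count (plus one for itself unless it was skipped), and the phase transition fires exactly once, when the total reaches the expected total. A minimal self-contained sketch of that counting (the class and method names here are hypothetical, not the Elasticsearch ones):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the op counting that drives the query -> fetch transition.
class PhaseTransitionCounter {
    private final AtomicInteger totalOps = new AtomicInteger();
    private final int expectedTotalOps;
    private boolean phaseDone = false; // single-threaded here; the real code fires onPhaseDone()

    PhaseTransitionCounter(int expectedTotalOps) {
        this.expectedTotalOps = expectedTotalOps;
    }

    // Called once per shard iterator: a skipped shard contributes only its
    // remaining ops, an executed one contributes remaining + 1 (itself).
    void onShardDone(int remaining, boolean skipped) {
        int contributed = skipped ? remaining : remaining + 1;
        if (totalOps.addAndGet(contributed) == expectedTotalOps) {
            phaseDone = true; // corresponds to onPhaseDone() in Elasticsearch
        }
    }

    boolean isPhaseDone() {
        return phaseDone;
    }
}
```

With 5 shards, each executed shard contributes 1 op when none remain on its iterator, so the flag flips exactly on the fifth call.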
// here "this" is the query phase; this method formally moves the search from query to fetch
final void onPhaseDone() { // as a tribute to @kimchy aka. finishHim()
    executeNextPhase(this, getNextPhase(results, this));
}
// the next phase is the FetchSearchPhase created here
@Override
protected SearchPhase getNextPhase(final SearchPhaseResults<SearchPhaseResult> results, final SearchPhaseContext context) {
    return new FetchSearchPhase(results, searchPhaseController, context, clusterState());
}
Since the query phase touched 5 shards, there are also 5 results:
[2020-09-14T22:05:10,997][TRACE][o.e.a.s.TransportSearchAction] [] [query] Moving to next phase: [fetch], based on results from: [zTVfnC6sRReXE9mbtqd6Aw][90][0],[zTVfnC6sRReXE9mbtqd6Aw][90][1],[zTVfnC6sRReXE9mbtqd6Aw][90][2],[zTVfnC6sRReXE9mbtqd6Aw][90][3],[zTVfnC6sRReXE9mbtqd6Aw][90][4] (cluster state version: 138)
The log line above confirms the transition.
In the fetch phase, execution also runs on the thread pool named search.
Like the query phase, the fetch phase builds request objects and sends them out:
ShardFetchSearchRequest fetchSearchRequest = createFetchRequest(queryResult.queryResult().getContextId(), i, entry,
lastEmittedDocPerShard, searchShardTarget.getOriginalIndices());
executeFetch(i, searchShardTarget, counter, fetchSearchRequest, queryResult.queryResult(),
connection);
At this point the action name is
indices:data/read/search[phase/fetch/id]
I originally assumed a fetch request would be sent only once, but that is not the case: how many are sent depends on the parsed results of the query phase. The relevant code:
private void innerRun() throws Exception {
    // iterate over the per-shard results of the query phase
    for (int i = 0; i < docIdsToLoad.length; i++) {
        IntArrayList entry = docIdsToLoad[i];
        SearchPhaseResult queryResult = queryResults.get(i);
        // whenever entry is non-null, the else branch is taken and a fetch request is sent
        if (entry == null) { // no results for this shard ID
            if (queryResult != null) {
                // if we got some hits from this shard we have to release the context there
                // we do this as we go since it will free up resources and passing on the request on the
                // transport layer is cheap.
                releaseIrrelevantSearchContext(queryResult.queryResult());
                progressListener.notifyFetchResult(i);
            }
            // in any case we count down this result since we don't talk to this shard anymore
            counter.countDown();
        } else {
            // so fetch requests may be sent multiple times, one per shard that has hits
            SearchShardTarget searchShardTarget = queryResult.getSearchShardTarget();
            Transport.Connection connection = context.getConnection(searchShardTarget.getClusterAlias(),
                searchShardTarget.getNodeId());
            ShardFetchSearchRequest fetchSearchRequest = createFetchRequest(queryResult.queryResult().getContextId(), i, entry,
                lastEmittedDocPerShard, searchShardTarget.getOriginalIndices());
            executeFetch(i, searchShardTarget, counter, fetchSearchRequest, queryResult.queryResult(),
                connection);
        }
    }
}
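The dispatch decision in that loop can be sketched on its own: one slot per shard, and only a non-null doc-id list triggers a fetch request, while empty slots count down toward phase completion immediately. A minimal sketch under those assumptions (a plain int[][] stands in for the IntArrayList-based docIdsToLoad; the names are hypothetical, not Elasticsearch's):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the fetch dispatch loop.
class FetchDispatcher {
    // Returns the shard indices that would receive a fetch request. Shards
    // with no doc ids to load are counted down right away; in Elasticsearch
    // the fetch responses count the remaining shards down later.
    static List<Integer> shardsToFetch(int[][] docIdsToLoad, AtomicInteger counter) {
        List<Integer> targets = new ArrayList<>();
        for (int i = 0; i < docIdsToLoad.length; i++) {
            if (docIdsToLoad[i] == null) {
                counter.decrementAndGet(); // nothing to fetch from this shard
            } else {
                targets.add(i); // a fetch request would be sent to shard i
            }
        }
        return targets;
    }
}
```

So with 5 shards where only 3 slots hold doc ids, exactly 3 fetch requests go out.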
ArraySearchPhaseResults has a dedicated field that stores the per-shard results in an array:
final AtomicArray<Result> results;
The array length equals the number of shards, but not every slot leads to a fetch request.
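The idea behind that field can be sketched with the JDK's AtomicReferenceArray: a fixed-size slot per shard, filled concurrently as responses arrive (ShardResults here is a hypothetical stand-in for ArraySearchPhaseResults, not the real class):

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hypothetical stand-in for ArraySearchPhaseResults: one slot per shard.
class ShardResults<R> {
    private final AtomicReferenceArray<R> results;

    ShardResults(int numShards) {
        this.results = new AtomicReferenceArray<>(numShards);
    }

    // store the result for one shard as its response arrives
    void consumeResult(int shardIndex, R result) {
        results.set(shardIndex, result);
    }

    R get(int shardIndex) {
        return results.get(shardIndex);
    }
}
```

A shard that produced no hits simply leaves its slot null, which is why not every slot turns into a fetch request.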
Since there are multiple fetch results, they need to be merged. Merging here can be understood as taking the result sets, ordering them by score (or by sort fields), and placing them into a single collection; the response is then built from the merged result.
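As a rough illustration of that merge, the sketch below combines per-shard hit lists into one list ordered by descending score and truncates it to the requested size (Hit and HitMerger are hypothetical stand-ins, not Elasticsearch types):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class HitMerger {
    // minimal stand-in for a search hit: a document id and its relevance score
    record Hit(String id, float score) {}

    // combine per-shard hit lists into one list sorted by descending score,
    // truncated to the requested size (the "merge" step in spirit)
    static List<Hit> merge(List<List<Hit>> perShardHits, int size) {
        List<Hit> all = new ArrayList<>();
        perShardHits.forEach(all::addAll);
        all.sort(Comparator.comparingDouble((Hit h) -> h.score()).reversed());
        return all.subList(0, Math.min(size, all.size()));
    }
}
```

Sorting by field instead of score would just swap in a different Comparator; the structure of the merge stays the same.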
After the fetch phase completes comes the expand phase. The run method of ExpandSearchPhase:
@Override
public void run() {
    // the first argument is the result returned by the fetch phase
    context.sendSearchResponse(searchResponse, scrollId);
}
The main work of this phase is field collapsing, which only happens when the request asks for it; if it does not, the phase passes straight through to the next step.
For query-then-fetch, the search is then complete.