[Elasticsearch Source Code] Write Path Source Analysis (Part 2)

Continued from the previous post: [Elasticsearch Source Code] Write Path Source Analysis (Part 1)

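When indices need to be created automatically, every index referenced by the bulk request is visited and checked. Indices that do not exist and are eligible are added to the auto-create set, and createIndex is then called for each of them. Index creation is controlled by the master: it uses the shard allocation and balancing algorithms to decide on which data nodes the index's shards are created, then publishes that information to the data nodes, which perform the actual creation.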
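
The per-index check can be pictured with a small model. The sketch below follows the four checks this post lists under Step 2; it is not the Elasticsearch implementation. The class `AutoCreateCheck`, the `Decision` enum, the method signatures and the simplified `*` matching are all assumptions made for illustration (the real pattern syntax is richer than a plain glob).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Simplified, hypothetical model of the per-index auto-create decision.
final class AutoCreateCheck {
    enum Decision { ALREADY_EXISTS, AUTO_CREATE, CANNOT_CREATE }

    static Decision decide(String index,
                           Set<String> existingNames,       // index names and aliases already known
                           boolean autoCreateEnabled,
                           boolean dynamicMappingEnabled,
                           List<String> allowedPatterns) {  // empty list means "no pattern restriction"
        if (existingNames.contains(index)) {
            return Decision.ALREADY_EXISTS;   // check 1: nothing to create
        }
        if (!autoCreateEnabled) {
            return Decision.CANNOT_CREATE;    // check 2: auto-creation is switched off
        }
        if (!dynamicMappingEnabled) {
            return Decision.CANNOT_CREATE;    // check 3: dynamic mapping is disabled
        }
        boolean restricted = !allowedPatterns.isEmpty();
        if (restricted && allowedPatterns.stream().noneMatch(p -> globMatches(p, index))) {
            return Decision.CANNOT_CREATE;    // check 4: patterns exist but none matches
        }
        return Decision.AUTO_CREATE;
    }

    // Treats '*' as "any characters" and every other character literally.
    static boolean globMatches(String glob, String name) {
        String regex = Arrays.stream(glob.split("\\*", -1))
            .map(Pattern::quote)
            .collect(Collectors.joining(".*"));
        return name.matches(regex);
    }
}
```

In this model only `AUTO_CREATE` results would end up in autoCreateIndices, while `CANNOT_CREATE` corresponds to an entry in indicesThatCannotBeCreated.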

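  // Step 1: Filter the bulkRequest and collect all index names. The filter looks at opType and versionType; opType is the operation type and takes four values: INDEX, CREATE, UPDATE and DELETE. A DELETE request must not create an index that does not exist, unless external versioning is in use.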
            final Set<String> indices = bulkRequest.requests.stream()
                .filter(request -> request.opType() != DocWriteRequest.OpType.DELETE
                    || request.versionType() == VersionType.EXTERNAL
                    || request.versionType() == VersionType.EXTERNAL_GTE)
                .map(DocWriteRequest::index)
                .collect(Collectors.toSet());
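// Step 2: Check each index. indicesThatCannotBeCreated is a Map holding the indices that cannot be created; autoCreateIndices is a Set holding the indices that can be auto-created.
// Whether an index can be auto-created depends on four checks: 1. whether the index or an alias with that name already exists (if so, nothing is created); 2. whether auto-creation is allowed for it (a second check, in case the earlier check information was lost); 3. whether dynamic mapping is disabled (if so, it cannot be created); 4. whether auto-create matching rules exist and match (if the expression list is non-empty and the index matches none of them, it cannot be created).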
            final Map<String, IndexNotFoundException> indicesThatCannotBeCreated = new HashMap<>();
            Set<String> autoCreateIndices = new HashSet<>();
            ClusterState state = clusterService.state();
            for (String index : indices) {
                boolean shouldAutoCreate;
                try {
                    shouldAutoCreate = shouldAutoCreate(index, state);
                } catch (.....) { .....}
                if (shouldAutoCreate) {
                    autoCreateIndices.add(index);
                }
            }
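// Step 3: If no index needs to be created, go straight to executeBulk. Otherwise create the indices one by one and listen for each result: on success the counter is decremented by 1; on failure the matching requests in bulkRequest are set to null and the counter is decremented by 1. Once every create-index operation has finished, i.e. the counter reaches 0, executeBulk is called.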
            if (autoCreateIndices.isEmpty()) {
                executeBulk(task, bulkRequest, startTime, listener, responses, indicesThatCannotBeCreated);
            } else {
                final AtomicInteger counter = new AtomicInteger(autoCreateIndices.size());
                for (String index : autoCreateIndices) {
                    createIndex(index, bulkRequest.timeout(), new ActionListener<CreateIndexResponse>() {
                        @Override
                        public void onResponse(CreateIndexResponse result) {
                            if (counter.decrementAndGet() == 0) {
                                executeBulk(task, bulkRequest, startTime, listener, responses, indicesThatCannotBeCreated);
                            }
                        }
                        @Override
                        public void onFailure(Exception e) {
						..........
                    });
                }
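
The counter pattern in Step 3 is easy to try in isolation. The following is a minimal sketch, assuming hypothetical `Listener` and `IndexCreator` interfaces that only loosely mirror ActionListener and createIndex. Unlike the real code, which nulls out the affected requests on failure, it simply records the failure in a map passed in by the caller.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

final class CreateThenBulk {
    /** Hypothetical callback pair, loosely modelled on ActionListener. */
    interface Listener {
        void onResponse(String index);
        void onFailure(String index, Exception e);
    }

    /** Hypothetical asynchronous index creator. */
    interface IndexCreator {
        void create(String index, Listener listener);
    }

    static void createAllThen(Set<String> autoCreateIndices, IndexCreator creator,
                              Map<String, Exception> failures, Runnable executeBulk) {
        if (autoCreateIndices.isEmpty()) {
            executeBulk.run();              // nothing to create: go straight to the bulk step
            return;
        }
        final AtomicInteger counter = new AtomicInteger(autoCreateIndices.size());
        for (String index : autoCreateIndices) {
            creator.create(index, new Listener() {
                @Override
                public void onResponse(String created) {
                    countDown();
                }

                @Override
                public void onFailure(String failed, Exception e) {
                    failures.put(failed, e); // remember why this index could not be created
                    countDown();
                }

                // Whichever callback arrives last triggers the bulk step exactly once.
                private void countDown() {
                    if (counter.decrementAndGet() == 0) {
                        executeBulk.run();
                    }
                }
            });
        }
    }
}
```

Because both callbacks decrement the same AtomicInteger, the bulk step runs once regardless of the order or thread in which the create-index results arrive. If callbacks can arrive on different threads, `failures` should be a concurrent map.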

3.2.3 Coordinating node processes and forwards the request

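Once the index has been created, its shards have been set up on the data nodes, and the coordinating node then forwards the write requests to the primary shard that owns each document. Execution enters BulkOperation#doRun.
It first checks that the cluster has no block exception (ClusterBlockException); if there is one, it keeps retrying until the timeout. It then iterates over all sub-requests of the BulkRequest and runs the logic for each request's operation type. For a write request, it uses the IndexMetaData to generate routing information for each IndexRequest via resolveRouting, and generates a doc id on demand via process (if no id is specified, an auto-generated UUID is used).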

            for (int i = 0; i < bulkRequest.requests.size(); i++) {
                DocWriteRequest docWriteRequest = bulkRequest.requests.get(i);
				.......
                Index concreteIndex = concreteIndices.resolveIfAbsent(docWriteRequest);
                try {
                    switch (docWriteRequest.opType()) {
                        case CREATE:
                        case INDEX:
                            .......
                            indexRequest.resolveRouting(metaData);
                            indexRequest.process(indexCreated, mappingMd, concreteIndex.getName());
                            break;
							......
                    }
                } catch (.......) {.......}
            }

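Next, the target shard id is computed from each IndexRequest's routing information (if no routing is specified at write time, ES uses the doc id as the routing value). The DocWriteRequest is then wrapped in a BulkItemRequest and appended to the request list of that shardId. The code is as follows: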

            // requestsByShard: the key is the shard id, the value is the list of single-doc write requests (each wrapped as a BulkItemRequest) for that shard
            Map<ShardId, List<BulkItemRequest>> requestsByShard = new HashMap<>();
            for (int i = 0; i < bulkRequest.requests.size(); i++) {
                // Get each doc write request from the bulk request
                DocWriteRequest request = bulkRequest.requests.get(i);
                ......
                String concreteIndex = concreteIndices.getConcreteIndex(request.index()).getName();
                // Use the routing to find the target shard id for the doc
                ShardId shardId = clusterService.operationRouting().indexShards(clusterState, concreteIndex, request.id(),
                    request.routing()).shardId();
                List<BulkItemRequest> shardRequests = requestsByShard.computeIfAbsent(shardId, shard -> new ArrayList<>());
                shardRequests.add(new BulkItemRequest(i, request));
            }
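
The grouping step can be reproduced without any Elasticsearch classes. This is a minimal sketch under stated assumptions: `Doc` is a hypothetical stand-in for a write request, shards are plain integers, and `String.hashCode()` replaces the real routing hash purely to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class GroupByShard {
    /** Hypothetical stand-in for a single doc write request. */
    static final class Doc {
        final String id;
        final String routing; // null when the client did not set a routing value

        Doc(String id, String routing) {
            this.id = id;
            this.routing = routing;
        }
    }

    // Stand-in hash: the routing value is used when present, otherwise the doc id.
    static int shardFor(Doc doc, int numberOfShards) {
        String effectiveRouting = doc.routing != null ? doc.routing : doc.id;
        return Math.floorMod(effectiveRouting.hashCode(), numberOfShards);
    }

    // Returns shard number -> positions (in original bulk order) of the docs routed to it.
    static Map<Integer, List<Integer>> group(List<Doc> bulk, int numberOfShards) {
        Map<Integer, List<Integer>> byShard = new HashMap<>();
        for (int i = 0; i < bulk.size(); i++) {
            int shard = shardFor(bulk.get(i), numberOfShards);
            byShard.computeIfAbsent(shard, s -> new ArrayList<>()).add(i);
        }
        return byShard;
    }
}
```

Keeping the original position `i` next to each request, as BulkItemRequest does, is what later allows per-shard results to be matched back to the items of the bulk request.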

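The ShardId is computed by the code below. partitionOffset is derived from the index.routing_partition_size setting, which defaults to 1. When writes specify a custom routing value, documents can be distributed unevenly; increasing this setting lets a single routing value map to a wider range of shards, so the distribution becomes more even. routingFactor defaults to 1 and mainly changes during split and shrink operations.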

    private static int calculateScaledShardId(IndexMetaData indexMetaData, String effectiveRouting, int partitionOffset) {
        final int hash = Murmur3HashFunction.hash(effectiveRouting) + partitionOffset;
        return Math.floorMod(hash, indexMetaData.getRoutingNumShards()) / indexMetaData.getRoutingFactor();
    }
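
To make the arithmetic concrete, here is a small sketch that takes the hash as a plain int instead of calling Murmur3HashFunction. The class name `ShardIdMath` and the sample numbers are assumptions chosen for illustration.

```java
final class ShardIdMath {
    // Same formula as calculateScaledShardId, with the hash supplied by the caller.
    static int scaledShardId(int hash, int partitionOffset, int routingNumShards, int routingFactor) {
        // floorMod keeps the slot in [0, routingNumShards) even when the hash is negative
        int slot = Math.floorMod(hash + partitionOffset, routingNumShards);
        // routingFactor consecutive slots collapse into one shard
        return slot / routingFactor;
    }

    public static void main(String[] args) {
        // 8 routing slots over 2 shards gives routingFactor 4
        System.out.println(scaledShardId(13, 0, 8, 4));
        // the same index viewed as 4 shards gives routingFactor 2
        System.out.println(scaledShardId(13, 0, 8, 2));
        // a negative hash still lands on a valid shard
        System.out.println(scaledShardId(-3, 0, 8, 4));
    }
}
```

With 8 routing slots, hash 13 falls into slot 5. At routingFactor 4 (2 shards) that is shard 1; at routingFactor 2 (4 shards) it is shard 2, one of the two shards that slots 4 to 7 are divided into. For hash -3, floorMod also yields slot 5, whereas Java's `%` operator would yield -3.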

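The previous step built the mapping from each shard to the doc write requests it must execute, which in effect splits the bulk request by shard. Next, all requests for one shard are wrapped in a BulkShardRequest and handed to TransportShardBulkAction: requests with the same shard id are merged and forwarded as a single TransportShardBulkAction request.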

            for (Map.Entry<ShardId, List<BulkItemRequest>> entry : requestsByShard.entrySet()) {
                final ShardId shardId = entry.getKey();
                final List<BulkItemRequest> requests = entry.getValue();
                // For each shard id and its list of BulkItemRequests, build a single BulkShardRequest
                BulkShardRequest bulkShardRequest = new BulkShardRequest(shardId, bulkRequest.getRefreshPolicy(),
                    requests.toArray(new BulkItemRequest[requests.size()]));
                ......
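                if (task != null) {
                    bulkShardRequest.setParentTask(nodeId, task.getId());
                }
                // Handle the request. The listener waits for the responses, which come back per shard. If some requests within a shard fail, the exception is recorded in the response. When all requests have completed, i.e. the counter reaches 0, finishHim() is called and the bulk request as a whole is treated as successful: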
                shardBulkAction.execute(bulkShardRequest, new ActionListener<BulkShardResponse>() {
         			........
                });
            }
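
How per-shard responses might be reassembled into one bulk response can be sketched as follows. This is an assumption-laden illustration, not the elided listener body: `BulkResponseCollector` is a hypothetical class, results are plain strings, and each result is stored at the original item position carried by BulkItemRequest.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReferenceArray;
import java.util.function.Consumer;

final class BulkResponseCollector {
    private final AtomicReferenceArray<String> responses; // one slot per item, in request order
    private final AtomicInteger pendingShards;             // shard requests still in flight
    private final Consumer<List<String>> finishHim;        // called once with the ordered results

    BulkResponseCollector(int itemCount, int shardCount, Consumer<List<String>> finishHim) {
        this.responses = new AtomicReferenceArray<>(itemCount);
        this.pendingShards = new AtomicInteger(shardCount);
        this.finishHim = finishHim;
    }

    // A shard answered: store each item result at its original position.
    void onShardResponse(Map<Integer, String> resultBySlot) {
        resultBySlot.forEach((slot, result) -> responses.set(slot, result));
        countDown();
    }

    // The whole shard request failed: every item of that shard gets the failure.
    void onShardFailure(List<Integer> slots, Exception e) {
        for (int slot : slots) {
            responses.set(slot, "failed: " + e.getMessage());
        }
        countDown();
    }

    private void countDown() {
        if (pendingShards.decrementAndGet() == 0) {
            List<String> ordered = new ArrayList<>();
            for (int i = 0; i < responses.length(); i++) {
                ordered.add(responses.get(i));
            }
            finishHim.accept(ordered);
        }
    }
}
```

In this sketch a failed shard does not fail the bulk request: its items carry a failure entry, and the caller still receives one complete, ordered result list.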

3.2.4 Sending the request to the primary shard

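The forwarded TransportShardBulkAction request ends up in TransportReplicationAction#doExecute and then in TransportReplicationAction.ReroutePhase#doRun. There, the routing information of the primary shard is read from the ClusterState to find the node that hosts the primary shard. If that node is the current coordinating node, the request is dispatched locally; otherwise it is sent to the remote node: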

            setPhase(task, "routing"); // mark the routing phase
            final ClusterState state = observer.setAndGetObservedState();
            .......
            } else {
                // Get the routing information of the primary shard and find the node that hosts it
                final IndexMetaData indexMetaData = state.metaData().index(concreteIndex);
                .........
                final DiscoveryNode node = state.nodes().get(primary.currentNodeId());
                if (primary.currentNodeId().equals(state.nodes().getLocalNodeId())) {
                    // The primary is on the current node: continue locally
                    performLocalAction(state, primary, node, indexMetaData);
                } else {
                    // The primary is on another node: forward the request to that node
                    performRemoteAction(state, primary, node);
                }
            }

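If the shard is on the current node, the task phase is set to "waiting_on_primary"; otherwise it is set to "rerouted". Both paths go through the same entry point, performAction(…), which calls TransportService's sendRequest method to send the request.
If the remote side returns an exception, for example because the remote node failed or the primary shard went down, the coordinating node retries. The retry waits for the next cluster state and then re-runs the doRun logic above against it (the new state carries the new primary shard information). If waiting for a cluster state update times out, one final retry (doRun) is performed. The code is as follows: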

        void retry(Exception failure) {
            if (observer.isTimedOut()) {
                // The final attempt was already made when the timeout fired, so do not retry again; the timeout defaults to 1 min
                finishAsFailed(failure);
                return;
            }
            setPhase(task, "waiting_for_retry");
            request.onRetry();
            observer.waitForNextChange(new ClusterStateObserver.Listener() {
                @Override
                public void onNewClusterState(ClusterState state) {
                    run(); // calls doRun
                }
                .......
                @Override
                public void onTimeout(TimeValue timeout) { // timed out: make one last retry
                    // Try one more time...
                    run(); // calls doRun
                }
            });
        }
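
The control flow of this retry logic can be simulated deterministically. The sketch below is a simplified, synchronous model rather than the asynchronous observer code: `RetryUntilTimeout`, its `Event` and `Outcome` enums, and the event queue are hypothetical, and an empty queue is treated as a timeout.

```java
import java.util.Deque;
import java.util.function.BooleanSupplier;

final class RetryUntilTimeout {
    enum Event { NEW_CLUSTER_STATE, TIMEOUT }
    enum Outcome { SUCCEEDED, FAILED }

    // attempt plays the role of doRun; observedEvents plays the role of the cluster state observer.
    static Outcome run(BooleanSupplier attempt, Deque<Event> observedEvents) {
        boolean timedOut = false;
        while (true) {
            if (attempt.getAsBoolean()) {
                return Outcome.SUCCEEDED;
            }
            // retry(failure): once the observer has timed out, the last attempt is already spent
            if (timedOut) {
                return Outcome.FAILED;
            }
            // wait for the next event; a new state or a timeout both trigger one more attempt
            Event next = observedEvents.poll();
            if (next == null || next == Event.TIMEOUT) {
                timedOut = true;
            }
        }
    }
}
```

With an operation that always fails and the events NEW_CLUSTER_STATE, NEW_CLUSTER_STATE, TIMEOUT, the model makes four attempts in total: the initial one, one per new cluster state, and the final one after the timeout.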

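Execution then enters the primary shard write process, covered in the next post: [Elasticsearch Source Code] Write Path Source Analysis (Part 3).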

少加点香菜
