[Source] Elasticsearch write source code analysis (three)

Continued from the previous part: [Source] Elasticsearch write source code analysis (two).

3.2.5 Write flow on the primary shard node

Code entry point: TransportReplicationAction.PrimaryOperationTransportHandler#messageReceived, which then enters the AsyncPrimaryAction#doRun method.
The request is checked first: 1. whether the current shard is a primary; 2. whether the allocationId is the expected value; 3. whether the primary term is the expected value.

            if (shardRouting.primary() == false) {
                .....
            }
            final String actualAllocationId = shardRouting.allocationId().getId();
            if (actualAllocationId.equals(targetAllocationID) == false) {
                ......
            }
            final long actualTerm = indexShard.getPendingPrimaryTerm();
            if (actualTerm != primaryTerm) {
                ......
            }
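
The three checks can be illustrated in isolation. The following is a minimal sketch, assuming a hypothetical helper class PrimaryRequestCheck that takes plain values instead of ShardRouting and IndexShard; the rejection messages are invented for illustration, and the elided bodies of the original if-blocks are not reproduced here.

```java
import java.util.Optional;

// Hypothetical, simplified stand-in for the three request checks; not Elasticsearch code.
final class PrimaryRequestCheck {

    /** Returns a rejection reason, or an empty Optional when all three checks pass. */
    static Optional<String> validate(boolean isPrimary,
                                     String actualAllocationId, String targetAllocationId,
                                     long actualTerm, long requestTerm) {
        // 1. The shard that received the request must be a primary.
        if (!isPrimary) {
            return Optional.of("shard is not a primary");
        }
        // 2. The allocation id in the request must match the local shard copy.
        if (!actualAllocationId.equals(targetAllocationId)) {
            return Optional.of("allocation id mismatch: expected " + targetAllocationId
                + " but found " + actualAllocationId);
        }
        // 3. The primary term in the request must match the shard's current term.
        if (actualTerm != requestTerm) {
            return Optional.of("primary term mismatch: expected " + requestTerm
                + " but found " + actualTerm);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(validate(true, "a1", "a1", 3L, 3L));
        System.out.println(validate(true, "a1", "a1", 4L, 3L));
    }
}
```

The checks run in the same order as in the excerpt, so a request sent to a non-primary copy is rejected before the allocation id or term is compared.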

Next, check whether the primary shard has been relocated.
If it has been relocated: 1. the phase is set to "primary_delegation"; 2. the current shard's primaryShardReference is closed so that resources are released promptly; 3. the node the shard was relocated to is obtained, the request is forwarded to that node, and the result is awaited; 4. once the result arrives, the task phase is updated to "finished".

                    transportService.sendRequest(relocatingNode, transportPrimaryAction,
                        new ConcreteShardRequest<>(request, primary.allocationId().getRelocationId(), primaryTerm),
                        transportOptions,
                        new TransportChannelResponseHandler<Response>(logger, channel, "rerouting indexing to target primary " + primary,
                            reader) {
                            @Override
                            public void handleResponse(Response response) {
                                setPhase(replicationTask, "finished");
                                super.handleResponse(response);
                            }
                            @Override
                            public void handleException(TransportException exp) {
                                setPhase(replicationTask, "finished");
                                super.handleException(exp);
                            }
                        });

If it has not been relocated:
1. the task phase is updated to "primary"; 2. the operation is executed on the primary shard (the main part); 3. the request is forwarded to the replica shards.

                    setPhase(replicationTask, "primary");
                    final ActionListener<Response> listener = createResponseListener(primaryShardReference);
                    createReplicatedOperation(request,
                        ActionListener.wrap(result -> result.respond(listener), listener::onFailure),
                        primaryShardReference).execute(); // entry point
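
The two branches differ only in where the operation runs and which allocation id it targets. As a rough sketch, assuming a hypothetical PrimaryRoutingSketch class with plain string arguments (none of these types exist in Elasticsearch), the decision might be modelled like this:

```java
// Hypothetical simplification of the relocation branch; not Elasticsearch code.
final class PrimaryRoutingSketch {

    /** Where the request should run, and the task phase recorded for it. */
    static final class Decision {
        final String phase;
        final String targetNode;
        final String targetAllocationId;

        Decision(String phase, String targetNode, String targetAllocationId) {
            this.phase = phase;
            this.targetNode = targetNode;
            this.targetAllocationId = targetAllocationId;
        }
    }

    static Decision decide(boolean relocated,
                           String localNode, String allocationId,
                           String relocatingNode, String relocationId) {
        if (relocated) {
            // Delegate to the relocation target, addressed by the relocation id.
            return new Decision("primary_delegation", relocatingNode, relocationId);
        }
        // Otherwise run the primary operation on this node.
        return new Decision("primary", localNode, allocationId);
    }

    public static void main(String[] args) {
        Decision d = decide(true, "node-1", "alloc-1", "node-2", "alloc-2");
        System.out.println(d.phase + " -> " + d.targetNode + " / " + d.targetAllocationId);
    }
}
```

In the delegated case the phase moves to "finished" only when the response or exception from the target node comes back, as the handler in the first excerpt shows.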

After the coordinating node sends the write request to the node holding the primary shard, the actual write logic starts in the execute method of the ReplicationOperation class. The method performs two key steps: it first writes the primary shard, and if that succeeds, it sends the write request to the nodes holding the replica shards.

    public void execute() throws Exception {
        .......
        // Key step: the write to the primary shard starts here
        primaryResult = primary.perform(request);
        .......
        final ReplicaRequest replicaRequest = primaryResult.replicaRequest();
        if (replicaRequest != null) {
            ........
            markUnavailableShardsAsStale(replicaRequest, replicationGroup);
            // Key step: after the primary is written, forward the request to the replicas
            performOnReplicas(replicaRequest, globalCheckpoint, maxSeqNoOfUpdatesOrDeletes, replicationGroup);
        }
        successfulShards.incrementAndGet();  // mark primary as successful
        decPendingAndFinishIfNeeded();
    }

Next, let's look at the key code for writing the primary. The entry function for the primary write is TransportShardBulkAction#shardOperationOnPrimary, which eventually goes through index(...) into InternalEngine#index(), the main path for writing data. It first obtains the indexing strategy for the operation, the plan, and then executes the operation that the plan calls for. A normal write goes to indexIntoLucene(...) and is then written to the translog, as follows:

    public IndexResult index(Index index) throws IOException {
                .......
                final IndexResult indexResult;
                if (plan.earlyResultOnPreFlightError.isPresent()) {
                    indexResult = plan.earlyResultOnPreFlightError.get();
                    assert indexResult.getResultType() == Result.Type.FAILURE : indexResult.getResultType();
                } else if (plan.indexIntoLucene || plan.addStaleOpToLucene) {
                    // Write the data into Lucene; this eventually calls Lucene's document-writing API
                    indexResult = indexIntoLucene(index, plan);
                } else {
                    indexResult = new IndexResult(
                        plan.versionForIndexing, getPrimaryTerm(), plan.seqNoForIndexing, plan.currentNotFoundOrDeleted);
                }
                if (index.origin().isFromTranslog() == false) {
                    final Translog.Location location;
                    if (indexResult.getResultType() == Result.Type.SUCCESS) {
                        location = translog.add(new Translog.Index(index, indexResult)); // write the translog
                    ......
                    indexResult.setTranslogLocation(location);
                }
              .......
        }

On a write, ES writes to Lucene first, which puts the data in memory, and only then writes the translog. The likely reason for this order is that Lucene runs its own checks on the data while indexing, so a Lucene write can fail. If the translog were written first, ES would have to handle the case where the translog write succeeded but the Lucene write kept failing, so ES writes Lucene first.
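
The ordering argument can be shown with a toy model. This is a minimal sketch, assuming a hypothetical WriteOrderSketch class in which two in-memory lists stand in for Lucene and the translog; the validation rule is invented, and the elided failure branch of the real InternalEngine#index is not modelled.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of "index first, log second"; not Elasticsearch code.
final class WriteOrderSketch {
    final List<String> index = new ArrayList<>();    // stands in for Lucene
    final List<String> translog = new ArrayList<>(); // stands in for the translog

    /** Returns true if the document was indexed and logged, false if indexing rejected it. */
    boolean write(String doc) {
        try {
            indexIntoStore(doc); // may reject the document
        } catch (IllegalArgumentException e) {
            // Nothing has been logged yet, so there is no log entry to compensate for.
            return false;
        }
        translog.add(doc); // only documents that were indexed reach the log
        return true;
    }

    // Hypothetical validation standing in for the checks Lucene performs.
    private void indexIntoStore(String doc) {
        if (doc == null || doc.isEmpty()) {
            throw new IllegalArgumentException("empty document");
        }
        index.add(doc);
    }

    public static void main(String[] args) {
        WriteOrderSketch engine = new WriteOrderSketch();
        System.out.println(engine.write("doc-1"));
        System.out.println(engine.write(""));
        System.out.println(engine.index.size() + " indexed, " + engine.translog.size() + " logged");
    }
}
```

With the opposite order, the rejected document would already sit in the log when indexing failed, and replaying the log would hit the same failure again.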

After the primary is written, the write continues on the replicas, so the request has to be forwarded to the nodes holding them. If a replica shard is unassigned, it is simply skipped; if a replica shard is relocating to another node, the request is forwarded to the relocation target shard; otherwise it is forwarded to the replica shard. Note that replicaRequest is taken from primaryResult after the primary shard has been written; it is not the original request. The code is as follows:

    private void performOnReplicas(final ReplicaRequest replicaRequest, final long globalCheckpoint,
                                   final long maxSeqNoOfUpdatesOrDeletes, final ReplicationGroup replicationGroup) {
        totalShards.addAndGet(replicationGroup.getSkippedShards().size());
        final ShardRouting primaryRouting = primary.routingEntry();
        for (final ShardRouting shard : replicationGroup.getReplicationTargets()) {
            if (shard.isSameAllocation(primaryRouting) == false) {
                performOnReplica(shard, replicaRequest, globalCheckpoint, maxSeqNoOfUpdatesOrDeletes);
            }
        }
    }
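
The loop above can be illustrated with a minimal sketch, assuming a hypothetical ShardCopy type that carries only an allocation id and a node id (a stand-in, not the Elasticsearch ShardRouting API). It returns the nodes that would receive the replica request, skipping the copy whose allocation id matches the primary's.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical simplification of the target loop; not Elasticsearch code.
final class ReplicaTargetsSketch {

    static final class ShardCopy {
        final String allocationId;
        final String nodeId;

        ShardCopy(String allocationId, String nodeId) {
            this.allocationId = allocationId;
            this.nodeId = nodeId;
        }
    }

    /** Nodes that should receive the replica request: every target except the primary itself. */
    static List<String> replicaNodes(ShardCopy primary, List<ShardCopy> replicationTargets) {
        List<String> nodes = new ArrayList<>();
        for (ShardCopy copy : replicationTargets) {
            // The primary is part of the target list but has already been written locally.
            if (!copy.allocationId.equals(primary.allocationId)) {
                nodes.add(copy.nodeId);
            }
        }
        return nodes;
    }

    public static void main(String[] args) {
        ShardCopy primary = new ShardCopy("alloc-p", "node-1");
        List<ShardCopy> targets = Arrays.asList(
            primary,
            new ShardCopy("alloc-r1", "node-2"),
            new ShardCopy("alloc-r2", "node-3"));
        System.out.println(replicaNodes(primary, targets));
    }
}
```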

The performOnReplica method forwards the request to the target node. If an exception occurs, for example the peer node is down or the write to the shard fails, the primary treats the replica shard as failed and unavailable, reports it to the master, and the replica is removed. The code is as follows:

    private void performOnReplica(final ShardRouting shard, final ReplicaRequest replicaRequest,
                                  final long globalCheckpoint, final long maxSeqNoOfUpdatesOrDeletes) {
        .....
        totalShards.incrementAndGet();
        pendingActions.incrementAndGet();
        replicasProxy.performOn(shard, replicaRequest, globalCheckpoint, maxSeqNoOfUpdatesOrDeletes, new ActionListener<ReplicaResponse>() {
            @Override
            public void onResponse(ReplicaResponse response) {
                successfulShards.incrementAndGet();
                try {
                    primary.updateLocalCheckpointForShard(shard.allocationId().getId(), response.localCheckpoint());
                    primary.updateGlobalCheckpointForShard(shard.allocationId().getId(), response.globalCheckpoint());
                }
                ......
                decPendingAndFinishIfNeeded();
            }
            @Override
            public void onFailure(Exception replicaException) {
                if (TransportActions.isShardNotAvailableException(replicaException) == false) {
                    RestStatus restStatus = ExceptionsHelper.status(replicaException);
                    shardReplicaFailures.add(new ReplicationResponse.ShardInfo.Failure(
                        shard.shardId(), shard.currentNodeId(), replicaException, restStatus, false));
                }
                String message = String.format(Locale.ROOT, "failed to perform %s on replica %s", opType, shard);
                replicasProxy.failShardIfNeeded(shard, message, replicaException,
                    ActionListener.wrap(r -> decPendingAndFinishIfNeeded(), ReplicationOperation.this::onNoLongerPrimary));
            }
        });
    }
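
Both callbacks end in decPendingAndFinishIfNeeded(), which suggests a countdown that completes the operation once the primary and every replica have reported. The following is a minimal sketch of that pattern, assuming a hypothetical PendingCounterSketch class; the initial count of one for the primary and the method names other than decPendingAndFinishIfNeeded are assumptions made for illustration.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical countdown of outstanding shard operations; not Elasticsearch code.
final class PendingCounterSketch {
    // Starts at 1 so the operation cannot finish before the primary step reports in.
    private final AtomicInteger pendingActions = new AtomicInteger(1);
    private final AtomicInteger successfulShards = new AtomicInteger();
    private final AtomicInteger failedShards = new AtomicInteger();
    private final AtomicBoolean finished = new AtomicBoolean();
    private final Runnable onFinished;

    PendingCounterSketch(Runnable onFinished) {
        this.onFinished = onFinished;
    }

    /** Called when a request is sent to one replica. */
    void replicaDispatched() {
        pendingActions.incrementAndGet();
    }

    void replicaSucceeded() {
        successfulShards.incrementAndGet();
        decPendingAndFinishIfNeeded();
    }

    void replicaFailed() {
        failedShards.incrementAndGet();
        decPendingAndFinishIfNeeded();
    }

    /** Called once the primary has been written and all replica requests are dispatched. */
    void primaryDone() {
        successfulShards.incrementAndGet();
        decPendingAndFinishIfNeeded();
    }

    private void decPendingAndFinishIfNeeded() {
        // The last participant to report triggers completion, exactly once.
        if (pendingActions.decrementAndGet() == 0 && finished.compareAndSet(false, true)) {
            onFinished.run();
        }
    }

    int successful() { return successfulShards.get(); }

    int failed() { return failedShards.get(); }

    public static void main(String[] args) {
        PendingCounterSketch op = new PendingCounterSketch(() -> System.out.println("finished"));
        op.replicaDispatched();
        op.primaryDone();
        op.replicaSucceeded();
    }
}
```

Because replica responses arrive asynchronously, atomic counters let the callbacks run on any thread without extra locking.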

The write logic on a replica is similar to that on the primary, so it is not described in detail here.

This flow also involves checkpoint updates; those, along with the detailed translog write flow, will be covered in a later addition.


Origin blog.csdn.net/wudingmei1023/article/details/103938670