Hadoop source code reading (3): HDFS upload

Instructions:
1. Hadoop version: 3.1.3
2. Reading tool: IDEA 2023.1.2
3. Source code acquisition: Index of /dist/hadoop/core/hadoop-3.1.3 (apache.org)
4. Project import: after downloading you get the archive hadoop-3.1.3-src.tar.gz; open PowerShell in that directory, decompress it with tar -zxvf hadoop-3.1.3-src.tar.gz, then open the extracted folder with IDEA. Be careful to configure the Maven or Gradle repository, otherwise importing the jar packages will be slow.
5. Reference course: www.bilibili.com/video/BV1Qp…

HDFS upload

A simple piece of upload code:

public void test() throws IOException {
    // fs is an initialized FileSystem instance (see the sketch below)
    FSDataOutputStream fos = fs.create(new Path("/input"));
    fos.write("hello world".getBytes());
}
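
The fs above is a FileSystem instance. A minimal, self-contained sketch of how it could be obtained (the NameNode URI hdfs://localhost:8020 and the user "hadoop" are assumptions for illustration; adjust them to your cluster):

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsUploadDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // the NameNode address and user name are placeholders
        FileSystem fs = FileSystem.get(URI.create("hdfs://localhost:8020"), conf, "hadoop");
        try (FSDataOutputStream fos = fs.create(new Path("/input"))) {
            fos.write("hello world".getBytes());
        }
        fs.close();
    }
}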

As you can see, an FSDataOutputStream is created first, and then data is written to it; the source reading below is therefore divided into the create creation process and the write upload process.

create creation process

1. The client sends a creation request to NN

First enter the create method, which brings us to FileSystem.java:

Find the create method and keep stepping in until you reach the static create method:

Then go back to the call site of that static method:

Ctrl+Alt+B finds the implementation class of this static method:

Entering DistributedFileSystem:

Continue looking down:

You can see that an output stream object is created in the doCall method;

Continue into the create method, arriving at DFSClient.java:

Keep searching downward to find the newStreamForCreate method:

Enter the newStreamForCreate method, which brings us to DFSOutputStream.java.

Here the client sends the create request to the NN for processing via RPC communication.
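
Paraphrased, the call has roughly this shape (a sketch of the statement in newStreamForCreate, not a runnable excerpt; dfsClient.namenode is the ClientProtocol RPC proxy, and the arguments follow the create signature shown in the next section):

// paraphrase of the RPC call made inside newStreamForCreate
HdfsFileStatus stat = dfsClient.namenode.create(
    src, masked, dfsClient.clientName,
    new EnumSetWritable<>(flag), createParent,
    replication, blockSize, SUPPORTED_CRYPTO_VERSIONS, ecPolicyName);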

A thread is also started here via out.start() (the DataStreamer; see part 3 below).

2. NN handles the creation request from the client

Inside newStreamForCreate, enter the create method, which leads to ClientProtocol.java:

Find its implementation class:

Enter NameNodeRpcServer's create, which looks like this:

 @Override // ClientProtocol
  public HdfsFileStatus create(String src, FsPermission masked,
      String clientName, EnumSetWritable<CreateFlag> flag,
      boolean createParent, short replication, long blockSize,
      CryptoProtocolVersion[] supportedVersions, String ecPolicyName)
      throws IOException {
    checkNNStartup(); // check whether the NN has started
    String clientMachine = getClientMachine();
    if (stateChangeLog.isDebugEnabled()) {
      stateChangeLog.debug("*DIR* NameNode.create: file "
          +src+" for "+clientName+" at "+clientMachine);
    }
    if (!checkPathLength(src)) { // check the path length
      throw new IOException("create: Pathname too long.  Limit "
          + MAX_PATH_LENGTH + " characters, " + MAX_PATH_DEPTH + " levels.");
    }
    namesystem.checkOperation(OperationCategory.WRITE);
    CacheEntryWithPayload cacheEntry = RetryCache.waitForCompletion(retryCache, null);
    if (cacheEntry != null && cacheEntry.isSuccess()) { // cache-related check (retry cache)
      return (HdfsFileStatus) cacheEntry.getPayload();
    }

    HdfsFileStatus status = null;
    try {
      PermissionStatus perm = new PermissionStatus(getRemoteUser()
          .getShortUserName(), null, masked);
      // start the file (important)
      status = namesystem.startFile(src, perm, clientName, clientMachine,
          flag.get(), createParent, replication, blockSize, supportedVersions,
          ecPolicyName, cacheEntry != null);
    } finally {
      RetryCache.setState(cacheEntry, status != null, status);
    }

    metrics.incrFilesCreated();
    metrics.incrCreateFileOps();
    return status;
  }

Next enter the startFile method, arriving at FSNamesystem.java:

Enter startFileInt:

Encapsulate src (file path) into INodesInPath;

An explanation of the INodesInPath class: it contains the INode information resolved from a given path.

First we need to clarify the INode class:

INode is an abstract class, and its official explanation is as follows:

In simple terms, a basic INode class is an in-memory representation of the file/block hierarchy, containing common fields for file and directory inodes.

You can see that INode is the base class, holding attributes common to files and directories, while INodesInPath holds the INode information resolved from a given path;

Next locate startFile:

Enter startFile:

  • First verify whether the file path exists:

Enter getLastINode:

Enter getINode:

You can see that when i = -1, it returns inodes[inodes.length - 1];

In other words, it fetches the inode at the last position. If that entry exists, the file path already exists. For example, resolving /user/data/file.txt yields the inode array [root, user, data, file.txt]; getLastINode() returns the last entry, which is null if the file has not been created yet;

Next, determine whether overwriting is allowed:

If overwriting is not allowed, an exception is thrown stating that the file path already exists and the file may not be uploaded repeatedly;
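
On the client side this corresponds to the overwrite parameter of FileSystem.create. For example (reusing the fs from the sketch at the top):

// FileSystem#create(Path) defaults to overwrite = true;
// with overwrite = false the NN reports that /input already exists
// (a FileAlreadyExistsException) instead of replacing it
FSDataOutputStream fos = fs.create(new Path("/input"), false);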

  • Then determine whether the parent directory exists:

If the parent directory exists, the file's metadata is added to it (the addFile method).

Enter the addFile method:

Enter addINode:

The data is written into the INode directory tree; at this point the file entry has been created.

3. DataStreamer startup process

After the NN processing is completed, return to the client again and start the corresponding thread;

Open DFSOutputStream.java and find the newStreamForCreate method. After the NN completes the creation request, the output stream is created:

Go into the DFSOutputStream constructor:

Here the chunk size is calculated (Directory => File => Block (128M) => packet (64K) => chunk (512-byte chunk data + 4-byte checksum)):
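
A quick back-of-the-envelope check of those sizes (a sketch of the arithmetic only; the real computePacketChunkSize also reserves room for the packet header):

public class ChunkMath {
    public static void main(String[] args) {
        int chunkData = 512;                              // data bytes per chunk
        int chunkChecksum = 4;                            // checksum bytes per chunk
        int chunkTotal = chunkData + chunkChecksum;       // 516 bytes per chunk on the wire
        int packetBudget = 64 * 1024;                     // ~64K per packet
        int chunksPerPacket = packetBudget / chunkTotal;  // = 127 chunks per packet
        long blockSize = 128L * 1024 * 1024;              // 128M per block
        long packetsPerBlock = blockSize / ((long) chunksPerPacket * chunkData); // ≈ 2064
        System.out.println(chunksPerPacket + " chunks/packet, ~" + packetsPerBlock + " packets/block");
    }
}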

Return to the newStreamForCreate method and enter out.start():

Continue to enter:

Continue to enter DataStreamer:

Enter Daemon:

As you can see, the out.start method starts a thread, so go back to the DataStreamer and search for the run method:

If there is no data in the dataQueue, the code will block;

If the dataQueue is not empty, a packet is taken from it.

write upload process

1. Write data to the queue of DataStreamer

The DataStreamer is started in the create phase and data is written to it in the write phase;

Enter the write method and go to FilterOutputStream.java:

Keep going until you reach the abstract method write:

Ctrl+Alt+B to find its implementation class:

Enter FSOutputSummer.java and locate the write method:

Enter the flushBuffer method, which, as its name suggests, flushes the buffer:

Enter the writeChecksumChunks method:

Enter the writeChunk method (which writes a chunk toward the data queue):

It is an abstract method, so find its implementation class:

Enter DFSOutputStream.java to see the concrete implementation of writeChunk, which is as follows:

  @Override
  protected synchronized void writeChunk(byte[] b, int offset, int len,
      byte[] checksum, int ckoff, int cklen) throws IOException {
    writeChunkPrepare(len, ckoff, cklen);

    currentPacket.writeChecksum(checksum, ckoff, cklen); // write the chunk's 4-byte checksum into the packet
    currentPacket.writeData(b, offset, len); // write one 512-byte chunk into the packet
    // count the chunks written into this packet; at 127 chunks the packet is full
    currentPacket.incNumChunks();
    getStreamer().incBytesCurBlock(len);

    // if the packet is full, put it on the queue to wait for transmission
    if (currentPacket.getNumChunks() == currentPacket.getMaxChunks() ||
        getStreamer().getBytesCurBlock() == blockSize) {
      enqueueCurrentPacketFull(); 
    }
  }

Enter the enqueueCurrentPacketFull method:

Enter the enqueueCurrentPacket method:

Enter the waitAndQueuePacket method:

void waitAndQueuePacket(DFSPacket packet) throws IOException {
    synchronized (dataQueue) {
      try {
        // if the queue is full, wait
        boolean firstWait = true;
        try {
          while (!streamerClosed && dataQueue.size() + ackQueue.size() >
              dfsClient.getConf().getWriteMaxPackets()) {
            if (firstWait) {
              Span span = Tracer.getCurrentSpan();
              if (span != null) {
                span.addTimelineAnnotation("dataQueue.wait");
              }
              firstWait = false;
            }
            try {
              dataQueue.wait(); // wait for the queue to have enough space
            } catch (InterruptedException e) {
              // If we get interrupted while waiting to queue data, we still need to get rid
              // of the current packet. This is because we have an invariant that if
              // currentPacket gets full, it will get queued before the next writeChunk.
              //
              // Rather than wait around for space in the queue, we should instead try to
              // return to the caller as soon as possible, even though we slightly overrun
              // the MAX_PACKETS length.
              Thread.currentThread().interrupt();
              break;
            }
          }
        } finally {
          Span span = Tracer.getCurrentSpan();
          if ((span != null) && (!firstWait)) {
            span.addTimelineAnnotation("end.wait");
          }
        }
        checkClosed();
        // if the queue is not full, add the packet to it
        queuePacket(packet);
      } catch (ClosedChannelException ignored) {
      }
    }
  }

Enter the queuePacket method (the logic that adds data to the queue), which is in DataStreamer.java:
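
Taken together, waitAndQueuePacket/queuePacket form the producer side and the run method shown below forms the consumer side of a classic wait/notify hand-off on dataQueue. A stripped-down sketch of the pattern (names mirror the source; the 80-packet cap stands in for the client's max-packets-in-flight limit, and heartbeats, tracing, and error handling are omitted):

import java.util.LinkedList;

class MiniStreamer {
    private final LinkedList<Object> dataQueue = new LinkedList<>();
    private final int maxPackets = 80; // stand-in for getWriteMaxPackets()

    // producer side: called by the writing thread, blocks when the queue is full
    void queuePacket(Object packet) throws InterruptedException {
        synchronized (dataQueue) {
            while (dataQueue.size() >= maxPackets) {
                dataQueue.wait();      // back-pressure on the writer
            }
            dataQueue.addLast(packet);
            dataQueue.notifyAll();     // wake the streamer thread
        }
    }

    // consumer side: called by the streamer thread, blocks when the queue is empty
    Object takePacket() throws InterruptedException {
        synchronized (dataQueue) {
            while (dataQueue.isEmpty()) {
                dataQueue.wait();      // nothing to send yet
            }
            Object one = dataQueue.removeFirst();
            dataQueue.notifyAll();     // wake writers waiting for space
            return one;
        }
    }
}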

2. Build a pipeline

2.1 Rack awareness (determining the storage location of blocks)

Ctrl+N to globally search for DataStreamer, then find the run method:

  @Override
  public void run() {
    long lastPacket = Time.monotonicNow();
    TraceScope scope = null;
    while (!streamerClosed && dfsClient.clientRunning) {
      // if the Responder encountered an error, shutdown Responder
      if (errorState.hasError()) {
        closeResponder();
      }

      DFSPacket one;
      try {
        // process datanode IO errors if any
        boolean doSleep = processDatanodeOrExternalError();

        final int halfSocketTimeout = dfsClient.getConf().getSocketTimeout()/2;
        // step 1: wait for a packet to send
        synchronized (dataQueue) {
          // wait for a packet to be sent.
          long now = Time.monotonicNow();
          while ((!shouldStop() && dataQueue.size() == 0 &&
              (stage != BlockConstructionStage.DATA_STREAMING ||
                  now - lastPacket < halfSocketTimeout)) || doSleep) {
            long timeout = halfSocketTimeout - (now-lastPacket);
            timeout = timeout <= 0 ? 1000 : timeout;
            timeout = (stage == BlockConstructionStage.DATA_STREAMING)?
                timeout : 1000;
            try {
              // if there is no data in dataQueue, the code blocks here
              dataQueue.wait(timeout);
            } catch (InterruptedException  e) {
              LOG.warn("Caught exception", e);
            }
            doSleep = false;
            now = Time.monotonicNow();
          }
          if (shouldStop()) {
            continue;
          }
          // get the packet to send
          if (dataQueue.isEmpty()) {
            one = createHeartbeatPacket();
          } 
          else {
            try {
              backOffIfNecessary();
            } catch (InterruptedException e) {
              LOG.warn("Caught exception", e);
            }
            // if the data queue is not empty, take a packet from it
            one = dataQueue.getFirst(); 
            SpanId[] parents = one.getTraceParents();
            if (parents.length > 0) {
              scope = dfsClient.getTracer().
                  newScope("dataStreamer", parents[0]);
              scope.getSpan().setParents(parents);
            }
          }
        }

        // step 2: get a new block from the NN
        if (LOG.isDebugEnabled()) {
          LOG.debug("stage=" + stage + ", " + this);
        }
        if (stage == BlockConstructionStage.PIPELINE_SETUP_CREATE) {
          LOG.debug("Allocating new block: {}", this);
          // ask the NN for a new block and set up the data pipeline
          setPipeline(nextBlockOutputStream());
          // start the ResponseProcessor to watch whether packets are sent successfully
          initDataStreaming();
        } else if (stage == BlockConstructionStage.PIPELINE_SETUP_APPEND) {
          LOG.debug("Append to block {}", block);
          setupPipelineForAppendOrRecovery();
          if (streamerClosed) {
            continue;
          }
          initDataStreaming();
        }

        long lastByteOffsetInBlock = one.getLastByteOffsetBlock();
        if (lastByteOffsetInBlock > stat.getBlockSize()) {
          throw new IOException("BlockSize " + stat.getBlockSize() +
              " < lastByteOffsetInBlock, " + this + ", " + one);
        }

        if (one.isLastPacketInBlock()) {
          // wait for all data packets have been successfully acked
          synchronized (dataQueue) {
            while (!shouldStop() && ackQueue.size() != 0) {
              try {
                // wait for acks to arrive from datanodes
                dataQueue.wait(1000);
              } catch (InterruptedException  e) {
                LOG.warn("Caught exception", e);
              }
            }
          }
          if (shouldStop()) {
            continue;
          }
          stage = BlockConstructionStage.PIPELINE_CLOSE;
        }

        // step 3: send the packet
        SpanId spanId = SpanId.INVALID;
        synchronized (dataQueue) {
          // move packet from dataQueue to ackQueue
          if (!one.isHeartbeatPacket()) {
            if (scope != null) {
              spanId = scope.getSpanId();
              scope.detach();
              one.setTraceScope(scope);
            }
            scope = null;
            dataQueue.removeFirst(); // remove this outgoing packet from dataQueue
            ackQueue.addLast(one); // add it to ackQueue
            packetSendTime.put(one.getSeqno(), Time.monotonicNow());
            dataQueue.notifyAll();
          }
        }

        LOG.debug("{} sending {}", this, one);

        // step 4: write the data to the DN
        try (TraceScope ignored = dfsClient.getTracer().
            newScope("DataStreamer#writeTo", spanId)) {
          one.writeTo(blockStream); // write the data out
          blockStream.flush();
        } catch (IOException e) {
          // HDFS-3398 treat primary DN is down since client is unable to
          // write to primary DN. If a failed or restarting node has already
          // been recorded by the responder, the following call will have no
          // effect. Pipeline recovery can handle only one node error at a
          // time. If the primary node fails again during the recovery, it
          // will be taken out then.
          errorState.markFirstNodeIfNotMarked();
          throw e;
        }
        lastPacket = Time.monotonicNow();

        // update bytesSent
        long tmpBytesSent = one.getLastByteOffsetBlock();
        if (bytesSent < tmpBytesSent) {
          bytesSent = tmpBytesSent;
        }

        if (shouldStop()) {
          continue;
        }

        // Is this block full?
        if (one.isLastPacketInBlock()) {
          // wait for the close packet has been acked
          synchronized (dataQueue) {
            while (!shouldStop() && ackQueue.size() != 0) {
              dataQueue.wait(1000);// wait for acks to arrive from datanodes
            }
          }
          if (shouldStop()) {
            continue;
          }

          endBlock();
        }
        if (progress != null) { progress.progress(); }

        // This is used by unit test to trigger race conditions.
        if (artificialSlowdown != 0 && dfsClient.clientRunning) {
          Thread.sleep(artificialSlowdown);
        }
      } catch (Throwable e) {
        // Log warning if there was a real error.
        if (!errorState.isRestartingNode()) {
          // Since their messages are descriptive enough, do not always
          // log a verbose stack-trace WARN for quota exceptions.
          if (e instanceof QuotaExceededException) {
            LOG.debug("DataStreamer Quota Exception", e);
          } else {
            LOG.warn("DataStreamer Exception", e);
          }
        }
        lastException.set(e);
        assert !(e instanceof NullPointerException);
        errorState.setInternalError();
        if (!errorState.isNodeMarked()) {
          // Not a datanode issue
          streamerClosed = true;
        }
      } finally {
        if (scope != null) {
          scope.close();
          scope = null;
        }
      }
    }
    closeInternal();
  }

Enter nextBlockOutputStream (the call in step 2 of the run method above):

Enter locateFollowingBlock:

Enter addBlock:

Enter addBlock, which brings us to the ClientProtocol class:

So this method is invoked through the client's RPC proxy to the NN.

Find its implementation class:

Enter NameNodeRpcServer and locate addBlock:

Enter getAdditionalBlock:

Select the storage location of the block;

Enter chooseTargetForNewBlock:

Enter chooseTarget4NewBlock:

Enter chooseTarget:

Continue to enter chooseTarget:

You can see that it is an abstract class, so find its implementation class:

Enter BlockPlacementPolicyDefault.java:

Enter chooseTarget:

Enter chooseTarget:

Enter chooseTargetInOrder, the logic of rack awareness:

  protected Node chooseTargetInOrder(int numOfReplicas, 
                                 Node writer,
                                 final Set<Node> excludedNodes,
                                 final long blocksize,
                                 final int maxNodesPerRack,
                                 final List<DatanodeStorageInfo> results,
                                 final boolean avoidStaleNodes,
                                 final boolean newBlock,
                                 EnumMap<StorageType, Integer> storageTypes)
                                 throws NotEnoughReplicasException {
    final int numOfResults = results.size();
    if (numOfResults == 0) {
      // the first replica is stored on the local (writer's) node
      DatanodeStorageInfo storageInfo = chooseLocalStorage(writer,
          excludedNodes, blocksize, maxNodesPerRack, results, avoidStaleNodes,
          storageTypes, true);

      writer = (storageInfo != null) ? storageInfo.getDatanodeDescriptor()
                                     : null;

      if (--numOfReplicas == 0) {
        return writer;
      }
    }
    final DatanodeDescriptor dn0 = results.get(0).getDatanodeDescriptor();
    if (numOfResults <= 1) {
      // the second replica is stored on a different rack
      chooseRemoteRack(1, dn0, excludedNodes, blocksize, maxNodesPerRack,
          results, avoidStaleNodes, storageTypes);
      if (--numOfReplicas == 0) {
        return writer;
      }
    }
    if (numOfResults <= 2) {
      final DatanodeDescriptor dn1 = results.get(1).getDatanodeDescriptor();
      if (clusterMap.isOnSameRack(dn0, dn1)) {
        // if the first and second are on the same rack, put the third on another rack
        chooseRemoteRack(1, dn0, excludedNodes, blocksize, maxNodesPerRack,
            results, avoidStaleNodes, storageTypes);
      } else if (newBlock){
        // for a new block, store the third on the same rack as the second replica
        chooseLocalRack(dn1, excludedNodes, blocksize, maxNodesPerRack,
            results, avoidStaleNodes, storageTypes);
      } else {
        // otherwise, place it on the writer's local rack
        chooseLocalRack(writer, excludedNodes, blocksize, maxNodesPerRack,
            results, avoidStaleNodes, storageTypes);
      }
      if (--numOfReplicas == 0) {
        return writer;
      }
    }
    chooseRandom(numOfReplicas, NodeBase.ROOT, excludedNodes, blocksize,
        maxNodesPerRack, results, avoidStaleNodes, storageTypes);
    return writer;
  }
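
In short, for the default replication factor of 3: the first replica goes on the writer's local node (or a random node when the writer is not a DataNode), the second on a node on a different rack, and the third, for a new block, on another node on the same rack as the second; any further replicas are chosen randomly across the cluster.
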
2.2 Socket sending

Return to nextBlockOutputStream:

Enter createBlockOutputStream:

As can be seen from the comments, the main function of this method is to establish a connection with the first DN in the pipeline;

  boolean createBlockOutputStream(DatanodeInfo[] nodes,
      StorageType[] nodeStorageTypes, String[] nodeStorageIDs,
      long newGS, boolean recoveryFlag) {
    if (nodes.length == 0) {
      LOG.info("nodes are empty for write pipeline of " + block);
      return false;
    }
    String firstBadLink = "";
    boolean checkRestart = false;
    if (LOG.isDebugEnabled()) {
      LOG.debug("pipeline = " + Arrays.toString(nodes) + ", " + this);
    }

    // persist blocks on namenode on next flush
    persistBlocks.set(true);

    int refetchEncryptionKey = 1;
    while (true) {
      boolean result = false;
      DataOutputStream out = null;
      try {
        assert null == s : "Previous socket unclosed";
        assert null == blockReplyStream : "Previous blockReplyStream unclosed";
        // create a socket connection to the DN
        s = createSocketForPipeline(nodes[0], nodes.length, dfsClient);
        long writeTimeout = dfsClient.getDatanodeWriteTimeout(nodes.length);
        long readTimeout = dfsClient.getDatanodeReadTimeout(nodes.length);
        // output stream, used to write data to the DN
        OutputStream unbufOut = NetUtils.getOutputStream(s, writeTimeout);
        // input stream, used to read back the result of writing to the DN
        InputStream unbufIn = NetUtils.getInputStream(s, readTimeout);
        IOStreamPair saslStreams = dfsClient.saslClient.socketSend(s,
            unbufOut, unbufIn, dfsClient, accessToken, nodes[0]);
        unbufOut = saslStreams.out;
        unbufIn = saslStreams.in;
        out = new DataOutputStream(new BufferedOutputStream(unbufOut,
            DFSUtilClient.getSmallBufferSize(dfsClient.getConfiguration())));
        blockReplyStream = new DataInputStream(unbufIn);

        //
        // Xmit header info to datanode
        //

        BlockConstructionStage bcs = recoveryFlag ?
            stage.getRecoveryStage() : stage;

        // We cannot change the block length in 'block' as it counts the number
        // of bytes ack'ed.
        ExtendedBlock blockCopy = block.getCurrentBlock();
        blockCopy.setNumBytes(stat.getBlockSize());

        boolean[] targetPinnings = getPinnings(nodes);
        // send the data
        new Sender(out).writeBlock(blockCopy, nodeStorageTypes[0], accessToken,
            dfsClient.clientName, nodes, nodeStorageTypes, null, bcs,
            nodes.length, block.getNumBytes(), bytesSent, newGS,
            checksum4WriteBlock, cachingStrategy.get(), isLazyPersistFile,
            (targetPinnings != null && targetPinnings[0]), targetPinnings,
            nodeStorageIDs[0], nodeStorageIDs);

        // receive ack for connect
        BlockOpResponseProto resp = BlockOpResponseProto.parseFrom(
            PBHelperClient.vintPrefixed(blockReplyStream));
        Status pipelineStatus = resp.getStatus();
        firstBadLink = resp.getFirstBadLink();

        // Got an restart OOB ack.
        // If a node is already restarting, this status is not likely from
        // the same node. If it is from a different node, it is not
        // from the local datanode. Thus it is safe to treat this as a
        // regular node error.
        if (PipelineAck.isRestartOOBStatus(pipelineStatus) &&
            !errorState.isRestartingNode()) {
          checkRestart = true;
          throw new IOException("A datanode is restarting.");
        }

        String logInfo = "ack with firstBadLink as " + firstBadLink;
        DataTransferProtoUtil.checkBlockOpStatus(resp, logInfo);

        assert null == blockStream : "Previous blockStream unclosed";
        blockStream = out;
        result =  true; // success
        errorState.resetInternalError();
        lastException.clear();
        // remove all restarting nodes from failed nodes list
        failed.removeAll(restartingNodes);
        restartingNodes.clear();
      } catch (IOException ie) {
        if (!errorState.isRestartingNode()) {
          LOG.info("Exception in createBlockOutputStream " + this, ie);
        }
        if (ie instanceof InvalidEncryptionKeyException &&
            refetchEncryptionKey > 0) {
          LOG.info("Will fetch a new encryption key and retry, "
              + "encryption key was invalid when connecting to "
              + nodes[0] + " : " + ie);
          // The encryption key used is invalid.
          refetchEncryptionKey--;
          dfsClient.clearDataEncryptionKey();
          // Don't close the socket/exclude this node just yet. Try again with
          // a new encryption key.
          continue;
        }

        // find the datanode that matches
        if (firstBadLink.length() != 0) {
          for (int i = 0; i < nodes.length; i++) {
            // NB: Unconditionally using the xfer addr w/o hostname
            if (firstBadLink.equals(nodes[i].getXferAddr())) {
              errorState.setBadNodeIndex(i);
              break;
            }
          }
        } else {
          assert !checkRestart;
          errorState.setBadNodeIndex(0);
        }

        final int i = errorState.getBadNodeIndex();
        // Check whether there is a restart worth waiting for.
        if (checkRestart) {
          errorState.initRestartingNode(i,
              "Datanode " + i + " is restarting: " + nodes[i],
              shouldWaitForRestart(i));
        }
        errorState.setInternalError();
        lastException.set(ie);
        result =  false;  // error
      } finally {
        if (!result) {
          IOUtils.closeSocket(s);
          s = null;
          IOUtils.closeStream(out);
          IOUtils.closeStream(blockReplyStream);
          blockReplyStream = null;
        }
      }
      return result;
    }
  }

Enter writeBlock:

Enter send:

The data is flushed out with flush();

2.3 Socket reception

Receiving the data is the DN's job, so enter DataXceiverServer.java and locate the run method:

It receives the socket request;

Each time the client sends a block, one DataXceiver is started to process that block.
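
Structurally this is the standard "accept loop plus one worker per connection" server. A minimal sketch of that shape (the real DataXceiverServer works against a PeerServer abstraction and a daemon thread group, omitted here):

import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

public class MiniXceiverServer implements Runnable {
    private final ServerSocket server;

    public MiniXceiverServer(int port) throws IOException {
        this.server = new ServerSocket(port);
    }

    @Override
    public void run() {
        while (!server.isClosed()) {
            try {
                Socket peer = server.accept();           // blocks until a client connects
                new Thread(() -> handle(peer)).start();  // one worker per connection
            } catch (IOException e) {
                break;                                   // server socket closed
            }
        }
    }

    private void handle(Socket peer) {
        // read the op code and dispatch (WRITE_BLOCK, READ_BLOCK, ...),
        // the way DataXceiver#run dispatches via processOp()
    }
}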

Enter DataXceiver and locate the run method:

It first reads the operation type from the stream;

then processes the data according to that operation type;

Enter processOp:

You can see the different operation types.

Enter opWriteBlock (writing data):

Ctrl+Alt+B to find the implementation class of writeBlock, entering DataXceiver.java:

Create a BlockReceiver;

Send the data on to the downstream socket in the pipeline;

Next enter getBlockReceiver:

Enter BlockReceiver:

Create pipeline;

Enter createRbw:

Enter FsDatasetImpl.java:

Enter createRbw:

The file is created via createRbwFile ("rbw" stands for replica being written);

3. The client receives the response from the DN

Go back to DataStreamer.java and navigate to run:

The ResponseProcessor is started through the initDataStreaming method to monitor whether packets are sent successfully;

Create a ResponseProcessor and start the thread;

Enter ResponseProcessor and locate run:

    @Override
    public void run() {

      setName("ResponseProcessor for block " + block);
      PipelineAck ack = new PipelineAck();

      TraceScope scope = null;
      while (!responderClosed && dfsClient.clientRunning && !isLastPacketInBlock) {
        // process acks coming back from the DNs
        try {
          // read one ack from the pipeline
          ack.readFields(blockReplyStream);
          if (ack.getSeqno() != DFSPacket.HEART_BEAT_SEQNO) {
            Long begin = packetSendTime.get(ack.getSeqno());
            if (begin != null) {
              long duration = Time.monotonicNow() - begin;
              if (duration > dfsclientSlowLogThresholdMs) {
                LOG.info("Slow ReadProcessor read fields for block " + block
                    + " took " + duration + "ms (threshold="
                    + dfsclientSlowLogThresholdMs + "ms); ack: " + ack
                    + ", targets: " + Arrays.asList(targets));
              }
            }
          }

          if (LOG.isDebugEnabled()) {
            LOG.debug("DFSClient {}", ack);
          }

          long seqno = ack.getSeqno();
          // processes response status from datanodes.
          ArrayList<DatanodeInfo> congestedNodesFromAck = new ArrayList<>();
          for (int i = ack.getNumOfReplies()-1; i >=0  && dfsClient.clientRunning; i--) {
            final Status reply = PipelineAck.getStatusFromHeader(ack
                .getHeaderFlag(i));
            if (PipelineAck.getECNFromHeader(ack.getHeaderFlag(i)) ==
                PipelineAck.ECN.CONGESTED) {
              congestedNodesFromAck.add(targets[i]);
            }
            // Restart will not be treated differently unless it is
            // the local node or the only one in the pipeline.
            if (PipelineAck.isRestartOOBStatus(reply)) {
              final String message = "Datanode " + i + " is restarting: "
                  + targets[i];
              errorState.initRestartingNode(i, message,
                  shouldWaitForRestart(i));
              throw new IOException(message);
            }
            // node error
            if (reply != SUCCESS) {
              errorState.setBadNodeIndex(i); // mark bad datanode
              throw new IOException("Bad response " + reply +
                  " for " + block + " from datanode " + targets[i]);
            }
          }

          if (!congestedNodesFromAck.isEmpty()) {
            synchronized (congestedNodes) {
              congestedNodes.clear();
              congestedNodes.addAll(congestedNodesFromAck);
            }
          } else {
            synchronized (congestedNodes) {
              congestedNodes.clear();
              lastCongestionBackoffTime = 0;
            }
          }

          assert seqno != PipelineAck.UNKOWN_SEQNO :
              "Ack for unknown seqno should be a failed ack: " + ack;
          if (seqno == DFSPacket.HEART_BEAT_SEQNO) {  // a heartbeat ack
            continue;
          }

          // an ack marking a successful transfer
          DFSPacket one;
          synchronized (dataQueue) {
            one = ackQueue.getFirst();
          }
          if (one.getSeqno() != seqno) {
            throw new IOException("ResponseProcessor: Expecting seqno " +
                " for block " + block +
                one.getSeqno() + " but received " + seqno);
          }
          isLastPacketInBlock = one.isLastPacketInBlock();

          // Fail the packet write for testing in order to force a
          // pipeline recovery.
          if (DFSClientFaultInjector.get().failPacket() &&
              isLastPacketInBlock) {
            failPacket = true;
            throw new IOException(
                "Failing the last packet for testing.");
          }

          // update bytesAcked
          block.setNumBytes(one.getLastByteOffsetBlock());

          synchronized (dataQueue) {
            scope = one.getTraceScope();
            if (scope != null) {
              scope.reattach();
              one.setTraceScope(null);
            }
            lastAckedSeqno = seqno;
            pipelineRecoveryCount = 0;
            ackQueue.removeFirst(); // remove it from the ack queue
            packetSendTime.remove(seqno);
            dataQueue.notifyAll(); // notify dataQueue that the ack has been processed

            one.releaseBuffer(byteArrayManager);
          }
        } catch (Exception e) {
          if (!responderClosed) {
            lastException.set(e);
            errorState.setInternalError();
            errorState.markFirstNodeIfNotMarked();
            synchronized (dataQueue) {
              dataQueue.notifyAll();
            }
            if (!errorState.isRestartingNode()) {
              LOG.warn("Exception for " + block, e);
            }
            responderClosed = true;
          }
        } finally {
          if (scope != null) {
            scope.close();
          }
          scope = null;
        }
      }
    }

At this point, after the client successfully receives the response from the DN, the upload process is completed.

Origin: blog.csdn.net/qq_51235856/article/details/132973642