How the HDFS asynchronous editlog works: delayed return of RPC calls

Preface


In the previous article, I described how the Hadoop community increased internal RPC throughput by deferring the response of an RPC call: the Handler on the Server side is released early, so that as much of its capacity as possible goes to processing actual RPC requests. The asynchronous editlog mechanism that HDFS uses today builds on this optimization. Asynchronous editlog writing does not mean, as one might naively assume, that the NameNode writes the editlog to the JournalNode service fully asynchronously and returns the result to the client right away. If it did, how would the client ever learn that an asynchronous editlog write had failed afterwards? It would simply carry on with the "expected" result it had already received. This is exactly the scenario where delayed return proves its worth. This article explains in detail how the delayed return mechanism works in the HDFS asynchronous editlog.

Normal RPC request processing in existing HDFS


Before describing the asynchronous editlog mechanism, let's look at how HDFS processes a normal RPC request:

  • 1) The client initiates a request.
  • 2) The NameNode receives the request and executes the method for the corresponding RPC call. If the operation succeeds, it logs the editlog entry for that operation.
  • 3) At the end of the method, the NameNode executes logSync, which writes the editlog entry for this operation to the JournalNodes. At this point the RPC call has been fully processed.
  • 4) The NameNode returns the result to the client.

The process is shown in the following simplified diagram:
[Figure: normal RPC request processing in HDFS]

The following is an example of an RPC request processing method in the NameNode:


  boolean setReplication(final String src, final short replication)
      throws IOException {
    final String operationName = "setReplication";
    boolean success = false;
    checkOperation(OperationCategory.WRITE);
    final FSPermissionChecker pc = getPermissionChecker();
    FSPermissionChecker.setOperationType(operationName);
    try {
      writeLock();
      // 1) Perform the actual set-replication operation
      try {
        checkOperation(OperationCategory.WRITE);
        checkNameNodeSafeMode("Cannot set replication for " + src);
        success = FSDirAttrOp.setReplication(dir, pc, blockManager, src,
            replication);
      } finally {
        writeUnlock(operationName, getLockReportInfoSupplier(src));
      }
    } catch (AccessControlException e) {
      logAuditEvent(false, operationName, src);
      throw e;
    }
    if (success) {
      // 3) On success, call logSync to write the editlog out to the JournalNodes
      getEditLog().logSync();
      logAuditEvent(true, operationName, src);
    }
    return success;
  }

The FSDirAttrOp.setReplication method called above:

  static boolean setReplication(
      FSDirectory fsd, FSPermissionChecker pc, BlockManager bm, String src,
      final short replication) throws IOException {
    bm.verifyReplication(src, replication, null);
    final boolean isFile;
    fsd.writeLock();
    try {
      final INodesInPath iip = fsd.resolvePath(pc, src, DirOp.WRITE);
      if (fsd.isPermissionEnabled()) {
        fsd.checkPathAccess(pc, iip, FsAction.WRITE);
      }

      final BlockInfo[] blocks = unprotectedSetReplication(fsd, iip,
                                                           replication);
      isFile = blocks != null;
      if (isFile) {
        // 2) Reaching this point means setReplication succeeded;
        //    log the editlog entry for the setReplication operation
        fsd.getEditLog().logSetReplication(iip.getPath(), replication);
      }
    } finally {
      fsd.writeUnlock();
    }
    return isFile;
  }

Asynchronous editlog mechanism based on RPC Call delayed return


Now that we understand the RPC call that writes the editlog synchronously, let's look at how the asynchronous editlog works.

First of all, the asynchronous editlog must uphold one fundamental premise:

The request processing result received by the Client must be reliable.

The real problem to solve is this: we want the editlog to be written asynchronously, yet the result returned to the calling client thread must still depend on the editlog write completing, that is, on the outcome of the preceding step. The asynchronous editlog writing described here is therefore not asynchronous in the full sense.

The core improvement of the asynchronous editlog is that it moves the heavier synchronous editlog write, logSync, out of the RPC call processing method and hands it to a separate thread. Within the call, logSync is reduced to simply enqueuing the editlog entry. The Handler on the NameNode can then move on to other requests immediately. Once the queued editlog entry has actually been written out, the thread that syncs the editlog triggers the response to the client, and only then does the client receive the result of its request. Throughout this process the premise still holds: a successful result is returned only once the editlog has been written successfully.

Put simply, from the server's point of view the editlog write is executed asynchronously, but from the client's point of view the result still waits for the editlog write to complete.
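The flow described above can be sketched as a small, self-contained Java toy. This is not the HDFS implementation: the class AsyncEditLogSketch, the PendingEdit type, the in-memory list standing in for the JournalNodes, and the "fail" prefix used to simulate a write error are all assumptions made for illustration. It only shows the shape of the idea: the handler enqueues and returns, while a single sync thread performs the slow write and only then completes the deferred response.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.LinkedBlockingQueue;

/** Toy model of an asynchronous edit log; every name here is hypothetical. */
class AsyncEditLogSketch {

  /** One queued edit plus the deferred response that depends on it. */
  static final class PendingEdit {
    final String op;
    final CompletableFuture<Boolean> response = new CompletableFuture<>();
    PendingEdit(String op) { this.op = op; }
  }

  private final BlockingQueue<PendingEdit> pending = new LinkedBlockingQueue<>();
  // Stands in for the JournalNodes (assumption: a plain in-memory list).
  private final List<String> durableLog = new ArrayList<>();
  private final Thread syncThread = new Thread(this::runSyncLoop, "edit-sync");

  void start() {
    syncThread.setDaemon(true);
    syncThread.start();
  }

  /** Called by a handler thread: enqueue only, never wait for the slow write. */
  CompletableFuture<Boolean> logEditAsync(String op) {
    PendingEdit edit = new PendingEdit(op);
    pending.add(edit);
    return edit.response;
  }

  /** Sync thread: write each edit, then release the response that waited on it. */
  private void runSyncLoop() {
    try {
      while (true) {
        PendingEdit edit = pending.take();
        try {
          writeDurably(edit.op);
          edit.response.complete(true);           // success reaches the client now
        } catch (IOException e) {
          edit.response.completeExceptionally(e); // failure reaches the client too
        }
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  /** Simulated slow write; ops starting with "fail" simulate a journal error. */
  private void writeDurably(String op) throws IOException, InterruptedException {
    if (op.startsWith("fail")) {
      throw new IOException("simulated journal failure: " + op);
    }
    Thread.sleep(20);
    synchronized (durableLog) {
      durableLog.add(op);
    }
  }

  List<String> durableSnapshot() {
    synchronized (durableLog) {
      return new ArrayList<>(durableLog);
    }
  }

  public static void main(String[] args) {
    AsyncEditLogSketch log = new AsyncEditLogSketch();
    log.start();
    CompletableFuture<Boolean> ok = log.logEditAsync("setReplication /a 3");
    CompletableFuture<Boolean> bad = log.logEditAsync("fail /b");
    // The "handler" is free again here, whether or not the edits are durable yet.
    System.out.println("client result: " + ok.join());
    try {
      bad.join();
    } catch (CompletionException e) {
      System.out.println("client error: " + e.getCause().getMessage());
    }
    System.out.println("durable edits: " + log.durableSnapshot());
  }
}
```

In this sketch the CompletableFuture plays the role of the deferred RPC response: the client side blocks on it, not the handler. A failed write surfaces to the client as an error instead of being hidden behind an early "success", which is the premise the article insists on.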

A simple diagram of this process is as follows:
[Figure: asynchronous editlog processing with delayed RPC response]
The figure above shows two dashed lines. The first, smaller dashed line marks the point where the Handler thread has executed logSync and added the editlog entry to the pending editlog queue; the Handler has finished with the request and can go on to process others. The second, larger dashed line indicates that the response is returned only after the editlog writer thread has actually performed the sync.

From this we can see that the delayed return strategy for RPC calls works in two ways:

  • The fixed and potentially heavy part of the original RPC call is split off into another thread, so that the Handler thread can finish the main operation quickly, which increases the server's request throughput.
  • While the asynchronous thread executes the split-off part, the response is held back, so the client still waits until the whole requested operation has completed, which keeps data processing correct.

For the delayed return change itself, interested readers can refer to the community JIRA HADOOP-10300: Allowed deferred sending of call responses [1]. The asynchronous editlog described in this article is only one use of the deferred response mechanism; other scenarios with a similar shape can benefit from it as well.
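One way to picture deferred sending of a call response is a call object that counts how many parties still have to release it. The sketch below is a hypothetical model of that idea, not Hadoop's own code: the class DeferredCallSketch, its nested Call type, the method names, and the counter field are all chosen for illustration. The response goes out only when the last party (here, the thread that syncs the editlog) lets go.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for an RPC call whose response can be postponed. */
class DeferredCallSketch {

  static final class Call {
    // Starts at 1: by default only the handler has to release the response.
    private final AtomicInteger responseWaitCount = new AtomicInteger(1);
    private volatile String response;
    private volatile boolean sent;

    void setResponse(String r) {
      response = r;
    }

    /** Another party (for example a sync thread) must also release the response. */
    void postponeResponse() {
      responseWaitCount.incrementAndGet();
    }

    /** Each party calls this once; the last caller actually sends. */
    void sendResponse() {
      int remaining = responseWaitCount.decrementAndGet();
      if (remaining < 0) {
        throw new IllegalStateException("response already sent");
      }
      if (remaining == 0) {
        sent = true;
        System.out.println("sent to client: " + response);
      }
    }

    boolean isSent() {
      return sent;
    }
  }

  public static void main(String[] args) throws InterruptedException {
    Call call = new Call();

    // Handler thread: queue the edit, postpone the response, finish quickly.
    call.postponeResponse();   // count 1 -> 2
    call.setResponse("true");
    call.sendResponse();       // count 2 -> 1, nothing is sent yet
    System.out.println("sent after handler finished? " + call.isSent());

    // Sync thread: the edit is durable now, so release the response.
    Thread syncer = new Thread(call::sendResponse); // count 1 -> 0, send
    syncer.start();
    syncer.join();
    System.out.println("sent after sync thread finished? " + call.isSent());
  }
}
```

With a counter rather than a boolean flag, the handler and the sync thread do not need to know which of them finishes last; whichever releases the call last triggers the send.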

Related Links


[1] https://issues.apache.org/jira/browse/HADOOP-10300
[2] https://blog.csdn.net/Androidlushangderen/article/details/106316751

Origin blog.csdn.net/Androidlushangderen/article/details/106535484