Preface

In a running distributed system, client requests occasionally time out because of network instability. In this fairly rare situation, the client cannot tell whether the server actually processed its request. It has to assume the worst case, namely that the request was not processed, and retry. This is where the problem arises: for non-idempotent operations, a retried request can return a different result from the original one. If the server has already processed the client's first request successfully, it should not execute the second one. This article discusses RetryCache, a mechanism for handling retried non-idempotent operations so that duplicate requests are not processed twice.

The problem of repeated processing of non-idempotent operations

First, let us look at the problems that repeated processing of non-idempotent operations can cause.

To summarize briefly, there are several potential problems:

1) Application failures caused by an unexpected result from the server. When the server processes a non-idempotent request a second time, it may return a different, incorrect result. For example, when a file creation request is repeated, the second attempt fails with an error such as FileAlreadyExistsException.

2) Corruption of metadata on the server side. Suppose a client creates a file, a related task reads the file, and the file is then cleaned up normally. If the client's retry now causes the file to be created again, the server's metadata may be left in an inconsistent state.

3) Metadata consistency problems during server HA failover. An active/standby switch is a heavyweight operation, and while it is in progress some client requests get no response and time out. Some of those requests may in fact have been processed, and others not. After the switch, to keep the server state fully consistent, we need RetryCache to help the server recognize and handle the repeated requests. This, of course, requires the new active server to rebuild its internal RetryCache.
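
The first problem is easy to reproduce in miniature. The following is a minimal sketch, not Hadoop code: ToyNamespace and RetryProblemDemo are hypothetical names, and the in-memory set merely stands in for a real metadata server. It shows a blind retry of a create request failing even though the first attempt succeeded.

```java
import java.nio.file.FileAlreadyExistsException;
import java.util.HashSet;
import java.util.Set;

/** Toy in-memory namespace (hypothetical); a stand-in for a real metadata server. */
final class ToyNamespace {
  private final Set<String> files = new HashSet<>();

  /** Non-idempotent: succeeds only the first time a given path is created. */
  void create(String path) throws FileAlreadyExistsException {
    if (!files.add(path)) {
      throw new FileAlreadyExistsException(path);
    }
  }

  /** Idempotent: gives the same answer however many times it is called. */
  boolean exists(String path) {
    return files.contains(path);
  }
}

final class RetryProblemDemo {
  /** Simulates a client whose first response was lost and who therefore retries. */
  static String createWithBlindRetry(ToyNamespace ns, String path) {
    try {
      ns.create(path);  // first attempt: processed, but pretend the response timed out
      ns.create(path);  // blind retry of the same logical request
      return "created";
    } catch (FileAlreadyExistsException e) {
      return "error: " + e.getClass().getSimpleName();
    }
  }

  public static void main(String[] args) {
    // The file exists afterwards, yet the client is told the request failed.
    System.out.println(createWithBlindRetry(new ToyNamespace(), "/tmp/a"));
  }
}
```

From the client's point of view the request failed, although the file was in fact created by the first attempt. The exists call, by contrast, can be retried any number of times without harm.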

To address these problems, we introduce an internal cache that stores the results of requests that have already been executed, so that non-idempotent operations are not processed twice. We call this cache the RetryCache.

Implementation details of RetryCache

If we want to implement a complete RetryCache, what are the key points to consider?

The main ones are:

1) A unique identifier for each client call. RPC servers generally already have something like a callId to tell requests apart, but a callId alone cannot tell whether two requests come from the same client or from clients on different machines. We therefore add a clientId field and identify a call by the combined id <callId+clientId>.
2) A marker for whether an operation is idempotent or non-idempotent. Only the results of non-idempotent requests are stored in the RetryCache.
3) An expiration time for every cache entry. Entries in the RetryCache cannot be kept forever.
4) Persistence and rebuilding of the RetryCache. This matters mainly when an HA service switches between active and standby.
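
The identifier and expiration points can be sketched as follows. This is a minimal illustration under stated assumptions, not the Hadoop implementation: CallKey and ExpiringResultCache are hypothetical names, the cache is single-threaded, and the current time is passed in explicitly so the behavior is easy to follow. Marking operations as idempotent and rebuilding the cache after failover are left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.UUID;

/** Hypothetical composite key: the client that sent the call plus the call's id. */
final class CallKey {
  private final UUID clientId;
  private final int callId;

  CallKey(UUID clientId, int callId) {
    this.clientId = Objects.requireNonNull(clientId, "clientId");
    this.callId = callId;
  }

  @Override
  public boolean equals(Object obj) {
    if (this == obj) {
      return true;
    }
    if (!(obj instanceof CallKey)) {
      return false;
    }
    CallKey other = (CallKey) obj;
    // Both parts must match: the same callId from another client is a different call.
    return callId == other.callId && clientId.equals(other.clientId);
  }

  @Override
  public int hashCode() {
    return Objects.hash(clientId, callId);
  }
}

/** Hypothetical single-threaded result cache whose entries expire after ttlMillis. */
final class ExpiringResultCache {
  private static final class Entry {
    final Object result;
    final long expiresAtMillis;

    Entry(Object result, long expiresAtMillis) {
      this.result = result;
      this.expiresAtMillis = expiresAtMillis;
    }
  }

  private final Map<CallKey, Entry> entries = new HashMap<>();
  private final long ttlMillis;

  ExpiringResultCache(long ttlMillis) {
    this.ttlMillis = ttlMillis;
  }

  /** Records the result of a completed non-idempotent call. */
  void put(CallKey key, Object result, long nowMillis) {
    entries.put(key, new Entry(result, nowMillis + ttlMillis));
  }

  /** Returns the cached result, or null if there is none or it has expired. */
  Object get(CallKey key, long nowMillis) {
    Entry entry = entries.get(key);
    if (entry == null) {
      return null;
    }
    if (nowMillis >= entry.expiresAtMillis) {
      entries.remove(key);  // drop expired entries lazily, on lookup
      return null;
    }
    return entry.result;
  }

  public static void main(String[] args) {
    UUID client = UUID.randomUUID();
    ExpiringResultCache cache = new ExpiringResultCache(1000L);
    cache.put(new CallKey(client, 7), "created", 0L);
    System.out.println(cache.get(new CallKey(client, 7), 500L));   // still within the TTL
    System.out.println(cache.get(new CallKey(client, 7), 1000L));  // expired
  }
}
```

The expiration time is a trade-off: it should be longer than the period during which a client may still retry, yet short enough to keep the cache small.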

RetryCache implementation example

To make these points concrete, let us look at a specific example: the RetryCache class used by Hadoop.

The first is the definition of Cache Entry:

  /**
   * CacheEntry is tracked using unique client ID and callId of the RPC request
   */
  public static class CacheEntry implements LightWeightCache.Entry {
    /**
     * Processing state of the requests
     */
    private static byte INPROGRESS = 0;
    private static byte SUCCESS = 1;
    private static byte FAILED = 2;

    /** Current state of the request this entry represents: in progress, succeeded, or failed. */
    private byte state = INPROGRESS;

    ...

    private final int callId;
    private final long expirationTime;
    private LightWeightGSet.LinkedElement next;

    /**
     * A new cache entry needs a clientId, a callId and an expiration time.
     */
    CacheEntry(byte[] clientId, int callId, long expirationTime) {
      // ClientId must be a UUID - that is 16 octets.
      Preconditions.checkArgument(clientId.length == ClientId.BYTE_LENGTH,
          "Invalid clientId - length is " + clientId.length
              + " expected length " + ClientId.BYTE_LENGTH);
      // Convert UUID bytes to two longs
      clientIdMsb = ClientId.getMsb(clientId);
      clientIdLsb = ClientId.getLsb(clientId);
      this.callId = callId;
      this.expirationTime = expirationTime;
    }
    ...

    @Override
    public boolean equals(Object obj) {
      if (this == obj) {
        return true;
      }
      if (!(obj instanceof CacheEntry)) {
        return false;
      }
      CacheEntry other = (CacheEntry) obj;
      // Entries are equal when both callId and clientId match, which
      // identifies a retry of the same request from the same client.
      return callId == other.callId && clientIdMsb == other.clientIdMsb
          && clientIdLsb == other.clientIdLsb;
    }
  }
  /**
   * CacheEntry with payload that tracks the previous response or parts of
   * previous response to be used for generating response for retried requests.
   */
  public static class CacheEntryWithPayload extends CacheEntry {
    // The payload is simply the result object returned by the RPC call.
    private Object payload;

    CacheEntryWithPayload(byte[] clientId, int callId, Object payload,
        long expirationTime) {
      super(clientId, callId, expirationTime);
      this.payload = payload;
    }
    ...
  }

Next is the core RetryCache method that looks up the result for a call:

  /**
   * Returns the existing cache entry for this call if there is one, waiting
   * for it to complete if it is still in progress. Otherwise adds the new
   * entry to the cache and returns it.
   */
  private CacheEntry waitForCompletion(CacheEntry newEntry) {
    CacheEntry mapEntry = null;
    lock.lock();
    try {
      // 1) Look up the corresponding entry in the cache
      mapEntry = set.get(newEntry);
      // 2) If there is none, add this entry to the cache
      if (mapEntry == null) {
        if (LOG.isTraceEnabled()) {
          LOG.trace("Adding Rpc request clientId "
              + newEntry.clientIdMsb + newEntry.clientIdLsb + " callId "
              + newEntry.callId + " to retryCache");
        }
        set.put(newEntry);
        retryCacheMetrics.incrCacheUpdated();
        return newEntry;
      } else {
        retryCacheMetrics.incrCacheHit();
      }
    } finally {
      lock.unlock();
    }
    // Entry already exists in cache. Wait for completion and return its state
    Preconditions.checkNotNull(mapEntry,
        "Entry from the cache should not be null");
    // Wait for in progress request to complete
    // 3) If an entry was found and it is still in progress, wait for it to finish
    synchronized (mapEntry) {
      while (mapEntry.state == CacheEntry.INPROGRESS) {
        try {
          mapEntry.wait();
        } catch (InterruptedException ie) {
          // Restore the interrupted status
          Thread.currentThread().interrupt();
        }
      }
      // Previous request has failed, the expectation is that it will be
      // retried again.
      if (mapEntry.state != CacheEntry.SUCCESS) {
        mapEntry.state = CacheEntry.INPROGRESS;
      }
    }
    // 4) The call for this entry has finished; return the cached entry
    return mapEntry;
  }
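
The waiting in step 3 only ends if the thread that executed the original request updates the entry and wakes the waiters up. The sketch below illustrates that hand-off with plain wait/notifyAll. It is an assumption-laden simplification rather than the Hadoop code: InFlightTracker, Claim, claim, completed and awaitCompletion are hypothetical names, keys are plain strings, and a failed entry is not reset to in progress the way the method above does.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical tracker for calls that are in flight or already finished. */
final class InFlightTracker {
  enum State { INPROGRESS, SUCCESS, FAILED }

  static final class Entry {
    private State state = State.INPROGRESS;

    /** Called by the thread that executed the request; wakes up waiting retries. */
    synchronized void completed(boolean success) {
      state = success ? State.SUCCESS : State.FAILED;
      notifyAll();
    }

    /** Blocks until the entry leaves INPROGRESS, then returns its final state. */
    synchronized State awaitCompletion() throws InterruptedException {
      while (state == State.INPROGRESS) {
        wait();
      }
      return state;
    }
  }

  /** The entry for a call, plus whether this caller is the one that must execute it. */
  static final class Claim {
    final Entry entry;
    final boolean owner;

    Claim(Entry entry, boolean owner) {
      this.entry = entry;
      this.owner = owner;
    }
  }

  private final Map<String, Entry> entries = new HashMap<>();

  /** The first caller for a key becomes the owner; later callers get the same entry. */
  synchronized Claim claim(String key) {
    Entry existing = entries.get(key);
    if (existing != null) {
      return new Claim(existing, false);
    }
    Entry fresh = new Entry();
    entries.put(key, fresh);
    return new Claim(fresh, true);
  }

  /** Runs a two-thread scenario and returns the state observed by the retry. */
  static State demo() {
    InFlightTracker tracker = new InFlightTracker();
    Claim first = tracker.claim("client-1:42");
    Claim retry = tracker.claim("client-1:42");
    State[] seen = new State[1];
    Thread waiter = new Thread(() -> {
      try {
        seen[0] = retry.entry.awaitCompletion();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    waiter.start();
    first.entry.completed(true);  // the original request finishes successfully
    try {
      waiter.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return seen[0];
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

Checking the state in a loop around wait() makes the result independent of thread timing: it does not matter whether the retry starts waiting before or after the original request completes.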

Let's look at the actual RetryCache call scenario:

  public long addCacheDirective(
      CacheDirectiveInfo path, EnumSet<CacheFlag> flags) throws IOException {
    checkNNStartup();
    namesystem.checkOperation(OperationCategory.WRITE);
    // 1) Check the RetryCache to see whether this RPC call has already been executed
    CacheEntryWithPayload cacheEntry = RetryCache.waitForCompletion
      (retryCache, null);
    // 2) If the same call exists and it succeeded, return the payload from
    // last time; otherwise go on to execute the operation
    if (cacheEntry != null && cacheEntry.isSuccess()) {
      return (Long) cacheEntry.getPayload();
    }

    boolean success = false;
    long ret = 0;
    try {
      ret = namesystem.addCacheDirective(path, flags, cacheEntry != null);
      success = true;
    } finally {
      // 3) Once the operation finishes, update the entry's state in the
      // RetryCache and set the payload (the returned result)
      RetryCache.setState(cacheEntry, success, ret);
    }
    return ret;
  }
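
The look up, execute, record pattern in this method can be reduced to a small generic helper. The following is a minimal sketch under stated assumptions, not Hadoop's API: AtMostOnceExecutor and ToyDirectiveService are hypothetical names, the request id is a plain string standing in for <callId+clientId>, and one lock is held for the whole call, which a real server would avoid.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical helper: runs an operation at most once per request id. */
final class AtMostOnceExecutor {
  private final Map<String, Object> completed = new HashMap<>();

  /** Executes the operation on first sight of requestId; replays the stored result on retries. */
  @SuppressWarnings("unchecked")
  synchronized <T> T execute(String requestId, Supplier<T> operation) {
    // 1) Has this call already been executed successfully?
    if (completed.containsKey(requestId)) {
      // 2) Yes: return the stored result without running the operation again.
      return (T) completed.get(requestId);
    }
    // If the operation throws, nothing is recorded, so a retry will execute it again.
    T result = operation.get();
    // 3) Record the result so that retries can be answered from the cache.
    completed.put(requestId, result);
    return result;
  }
}

/** Toy non-idempotent service (hypothetical): every call hands out a new id. */
final class ToyDirectiveService {
  private long nextId = 1;

  long addDirective() {
    return nextId++;
  }

  long directiveCount() {
    return nextId - 1;
  }

  public static void main(String[] args) {
    AtMostOnceExecutor executor = new AtMostOnceExecutor();
    ToyDirectiveService service = new ToyDirectiveService();
    long first = executor.<Long>execute("client-1:42", service::addDirective);
    long retry = executor.<Long>execute("client-1:42", service::addDirective);
    // The retry gets the id allocated by the first call; only one directive exists.
    System.out.println(first + " " + retry + " " + service.directiveCount());
  }
}
```

Without the helper, the retried call would allocate a second id and leave two directives behind for what the client considers a single request.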

For further implementation details, see the code linked below.

References

[1] https://issues.apache.org/jira/browse/HDFS-4979
[2] https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/RetryCache.java
