[Redis Advanced] One article to understand the underlying implementation of Redisson's watchdog mechanism

1. Overview of watchdog mechanism

The watchdog mechanism is an automatic extension mechanism provided by Redisson. It allows the distributed locks provided by Redisson to be renewed automatically.

private long lockWatchdogTimeout = 30 * 1000;

The default timeout provided by the watchdog mechanism is 30 * 1000 milliseconds, that is, 30 seconds.

If a thread acquires the lock and the work it does before releasing the lock takes longer than the lock's automatic release time (the 30 s watchdog timeout), Redisson automatically extends the lock's expiration in Redis.

To enable the watchdog mechanism in Redisson, do not specify leaseTime (the lock's automatic release time) when acquiring the lock.

If you specify the automatic release time yourself, the watchdog mechanism is not enabled, whether you acquire the lock through lock or through tryLock.

However, if you pass -1 as leaseTime, the watchdog mechanism is still enabled.
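The rule above can be condensed into a small sketch. The class LeasePolicy and its method names are hypothetical and are not part of Redisson; only the 30 s default and the -1 sentinel come from the text.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not Redisson code: models when the watchdog is active.
final class LeasePolicy {
    // Default watchdog timeout quoted above (lockWatchdogTimeout).
    static final long WATCHDOG_TIMEOUT_MS = 30 * 1000;

    // The watchdog only runs when the caller did not choose a lease time (-1).
    static boolean watchdogEnabled(long leaseTime) {
        return leaseTime == -1;
    }

    // Initial TTL of the lock key in milliseconds: the watchdog timeout
    // when no lease time was chosen, otherwise the caller's lease time.
    static long initialTtlMillis(long leaseTime, TimeUnit unit) {
        return watchdogEnabled(leaseTime) ? WATCHDOG_TIMEOUT_MS : unit.toMillis(leaseTime);
    }
}
```

With this model, a lease time of -1 gives a 30 s TTL that the watchdog keeps extending, while an explicit 10 s lease gives a fixed 10 s TTL with no renewal.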

A distributed lock cannot be set to never expire. Otherwise, if a node acquires the lock and then goes down, the lock is never released and a deadlock results. A distributed lock therefore needs an expiration time. But this creates another problem: if the program has not finished running when the lock expires, the lock is released early, other threads can then acquire it, and the critical section is no longer protected.

Therefore, the automatic renewal of the watchdog mechanism solves this problem well.


2. Interpretation of source code

Step into tryLock. The two-argument overload delegates to tryLock(waitTime, -1, unit), which takes three parameters:

  1. waitTime: the maximum time to wait for the lock (if not passed, the default is -1)
  2. leaseTime: the time after which the lock is automatically released (-1 if not passed)
  3. unit: the time unit for both waitTime and leaseTime

public boolean tryLock(long waitTime, TimeUnit unit) throws InterruptedException {
    return tryLock(waitTime, -1, unit);
}

@Override
public boolean tryLock(long waitTime, long leaseTime, TimeUnit unit) throws InterruptedException {
    // Remaining wait budget in milliseconds
    long time = unit.toMillis(waitTime);
    long current = System.currentTimeMillis();
    long threadId = Thread.currentThread().getId();
    Long ttl = tryAcquire(waitTime, leaseTime, unit, threadId);
    // lock acquired
    if (ttl == null) {
        return true;
    }

    // Subtract the time spent on the first attempt from the budget
    time -= System.currentTimeMillis() - current;
    if (time <= 0) {
        acquireFailed(waitTime, unit, threadId);
        return false;
    }

    // Subscribe to the unlock notification of the current holder
    current = System.currentTimeMillis();
    RFuture<RedissonLockEntry> subscribeFuture = subscribe(threadId);
    if (!subscribeFuture.await(time, TimeUnit.MILLISECONDS)) {
        if (!subscribeFuture.cancel(false)) {
            subscribeFuture.onComplete((res, e) -> {
                if (e == null) {
                    unsubscribe(subscribeFuture, threadId);
                }
            });
        }
        acquireFailed(waitTime, unit, threadId);
        return false;
    }

    try {
        time -= System.currentTimeMillis() - current;
        if (time <= 0) {
            acquireFailed(waitTime, unit, threadId);
            return false;
        }

        // Retry until the lock is acquired or the wait budget runs out
        while (true) {
            long currentTime = System.currentTimeMillis();
            ttl = tryAcquire(waitTime, leaseTime, unit, threadId);
            // lock acquired
            if (ttl == null) {
                return true;
            }

            time -= System.currentTimeMillis() - currentTime;
            if (time <= 0) {
                acquireFailed(waitTime, unit, threadId);
                return false;
            }

            // waiting for message: wait for the shorter of the lock's TTL and the remaining budget
            currentTime = System.currentTimeMillis();
            if (ttl >= 0 && ttl < time) {
                subscribeFuture.getNow().getLatch().tryAcquire(ttl, TimeUnit.MILLISECONDS);
            } else {
                subscribeFuture.getNow().getLatch().tryAcquire(time, TimeUnit.MILLISECONDS);
            }

            time -= System.currentTimeMillis() - currentTime;
            if (time <= 0) {
                acquireFailed(waitTime, unit, threadId);
                return false;
            }
        }
    } finally {
        unsubscribe(subscribeFuture, threadId);
    }
//        return get(tryLockAsync(waitTime, leaseTime, unit));
}

The code above is mainly the lock retry logic. If you are interested, see the CSDN blog post "[Redis] 4. A ten-thousand-word in-depth interpretation of Redisson and its source code (recommended to bookmark)".
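As a rough illustration of that retry logic, here is a minimal sketch that uses only the JDK. The class BudgetedRetry and its parameters are hypothetical: attempt stands in for tryAcquire (null means acquired, otherwise the remaining TTL), and a plain Semaphore stands in for the latch that is released when the holder unlocks.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical sketch of a retry loop with a shrinking wait budget; not Redisson code.
final class BudgetedRetry {
    static boolean tryWithBudget(Supplier<Long> attempt, Semaphore latch, long waitMillis) {
        long time = waitMillis; // remaining budget in ms
        while (true) {
            long start = System.currentTimeMillis();
            Long ttl = attempt.get();
            if (ttl == null) {
                return true; // lock acquired
            }
            time -= System.currentTimeMillis() - start;
            if (time <= 0) {
                return false; // budget used up by the attempt itself
            }
            // Wait for an unlock signal, but never longer than the lock's TTL or the budget
            start = System.currentTimeMillis();
            long wait = (ttl >= 0 && ttl < time) ? ttl : time;
            try {
                latch.tryAcquire(wait, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
            time -= System.currentTimeMillis() - start;
            if (time <= 0) {
                return false; // budget used up while waiting
            }
        }
    }
}
```

The point of the design is that every attempt and every wait is charged against the same budget, so the caller never blocks much longer than waitTime in total.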

The watchdog-related code is mainly in the tryAcquire method, which in turn delegates to tryAcquireAsync(waitTime, leaseTime, unit, threadId).

private Long tryAcquire(long waitTime, long leaseTime, TimeUnit unit, long threadId) {
    return get(tryAcquireAsync(waitTime, leaseTime, unit, threadId));
}

Because no leaseTime is passed to tryLock, leaseTime has its default value of -1.

tryAcquireAsync calls tryLockInnerAsync. If the lock acquisition fails, the result is the remaining validity period of the key; if it succeeds, the result is null.

If no exception occurred and the lock was acquired successfully (ttlRemaining == null), this.scheduleExpirationRenewal(threadId) is executed to start the watchdog mechanism.

private <T> RFuture<Long> tryAcquireAsync(long waitTime, long leaseTime, TimeUnit unit, long threadId) {
    if (leaseTime != -1L) {
        // An explicit leaseTime was given: acquire with that TTL, without starting the watchdog
        return this.tryLockInnerAsync(waitTime, leaseTime, unit, threadId, RedisCommands.EVAL_LONG);
    } else {
        // If the lock acquisition fails, the result is the remaining validity period of the key
        RFuture<Long> ttlRemainingFuture = this.tryLockInnerAsync(waitTime, this.commandExecutor.getConnectionManager().getCfg().getLockWatchdogTimeout(), TimeUnit.MILLISECONDS, threadId, RedisCommands.EVAL_LONG);
        // Runs once the acquisition attempt above has completed
        ttlRemainingFuture.onComplete((ttlRemaining, e) -> {
            // No exception occurred
            if (e == null) {
                // A null remaining validity period means the lock was acquired
                if (ttlRemaining == null) {
                    // Start the watchdog so that the lock's validity period keeps being renewed
                    this.scheduleExpirationRenewal(threadId);
                }
            }
        });
        return ttlRemainingFuture;
    }
}

<T> RFuture<T> tryLockInnerAsync(long waitTime, long leaseTime, TimeUnit unit, long threadId, RedisStrictCommand<T> command) {
    internalLockLeaseTime = unit.toMillis(leaseTime);

    return evalWriteAsync(getName(), LongCodec.INSTANCE, command,
                          // The lock does not exist: write the lock information to Redis
                          "if (redis.call('exists', KEYS[1]) == 0) then " +
                          "redis.call('hincrby', KEYS[1], ARGV[2], 1); " +
                          "redis.call('pexpire', KEYS[1], ARGV[1]); " +
                          "return nil; " +
                          "end; " +
                          // The lock exists and is held by this thread: reentry
                          "if (redis.call('hexists', KEYS[1], ARGV[2]) == 1) then " +
                          "redis.call('hincrby', KEYS[1], ARGV[2], 1); " +
                          "redis.call('pexpire', KEYS[1], ARGV[1]); " +
                          "return nil; " +
                          "end; " +
                          // The lock is held by someone else: return its remaining TTL
                          "return redis.call('pttl', KEYS[1]);",
                          Collections.singletonList(getName()), internalLockLeaseTime, getLockName(threadId));
}
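To make the three branches of the Lua script easier to follow, here is an in-memory model of its behaviour. It is only a sketch: the class LockScriptModel is hypothetical, time is passed in as a virtual clock, and a HashMap stands in for the Redis hash stored under the lock key.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory stand-in for the Lua script above; not Redisson code.
final class LockScriptModel {
    private final Map<String, Integer> holds = new HashMap<>(); // hash field -> reentry count
    private long expiresAt = 0;                                  // virtual expiry time (ms)

    // Returns null when the lock is acquired, otherwise the remaining TTL in ms.
    Long tryLock(String owner, long leaseMillis, long now) {
        if (now >= expiresAt) {
            holds.clear(); // the key has expired, so Redis would have deleted the hash
        }
        if (holds.isEmpty() || holds.containsKey(owner)) {
            holds.merge(owner, 1, Integer::sum); // hincrby: first acquisition or reentry
            expiresAt = now + leaseMillis;        // pexpire
            return null;
        }
        return expiresAt - now;                   // pttl: held by another owner
    }

    int holdCount(String owner) {
        return holds.getOrDefault(owner, 0);
    }
}
```

For example, if owner "a:1" locks at time 0 and again at time 1000 with a 30 s lease, its hold count is 2 and the key expires at 31000; a different owner trying at time 5000 gets back 26000 instead of null.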

Each lock has its own ExpirationEntry object.

EXPIRATION_RENEWAL_MAP stores all of these entries.

scheduleExpirationRenewal looks up the entry in EXPIRATION_RENEWAL_MAP by the lock's name. If an entry already exists (a reentry), only the thread ID is added to it. If it does not exist, the new entry is placed in EXPIRATION_RENEWAL_MAP and the watchdog mechanism is started.

private static final ConcurrentMap<String, ExpirationEntry> EXPIRATION_RENEWAL_MAP = new ConcurrentHashMap<>();

private void scheduleExpirationRenewal(long threadId) {
    ExpirationEntry entry = new ExpirationEntry();
    // Here the entry name refers to the name of the lock
    ExpirationEntry oldEntry = (ExpirationEntry)EXPIRATION_RENEWAL_MAP.putIfAbsent(this.getEntryName(), entry);
    if (oldEntry != null) {
        // Reentry: add the thread ID to the existing entry
        oldEntry.addThreadId(threadId);
    } else {
        // First acquisition: add the thread ID
        entry.addThreadId(threadId);
        // Start the renewal
        this.renewExpiration();
    }
}
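The putIfAbsent pattern used here guarantees that only one renewal task is started per lock, even when several acquisitions race. A minimal sketch of the idea follows; the class RenewalRegistry is hypothetical, and an AtomicInteger stands in for ExpirationEntry.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of "start the watchdog once per lock"; not Redisson code.
final class RenewalRegistry {
    private final ConcurrentMap<String, AtomicInteger> entries = new ConcurrentHashMap<>();

    // Returns true only for the call that should start the renewal task.
    boolean register(String entryName) {
        AtomicInteger fresh = new AtomicInteger();
        AtomicInteger old = entries.putIfAbsent(entryName, fresh);
        if (old != null) {
            old.incrementAndGet(); // reentry: just record one more hold
            return false;
        }
        fresh.incrementAndGet();   // first acquisition: the caller starts the renewal
        return true;
    }

    int holds(String entryName) {
        AtomicInteger count = entries.get(entryName);
        return count == null ? 0 : count.get();
    }
}
```

Because putIfAbsent is atomic, exactly one caller sees null for a given entry name, which is what makes it safe to start the timer in that branch only.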

renewExpiration first gets the lock's entry from EXPIRATION_RENEWAL_MAP and then defines a delayed task, task. The task works as follows:

  1. It runs on a separate thread, so it can be invoked repeatedly
  2. Get the lock's entry from EXPIRATION_RENEWAL_MAP
  3. Obtain threadId, the ID of the thread that acquired the lock, from the entry
  4. Call the renewExpirationAsync method to refresh the lock's expiration time
  5. If the refresh succeeds, call renewExpiration() again recursively

The delay of task is this.internalLockLeaseTime / 3L. Because no leaseTime was passed, internalLockLeaseTime is the watchdog timeout of 30 s, so the delay is 10 s.

In other words, this delayed task is executed once every ten seconds.

Finally, the delayed task is stored on the lock's entry with ee.setTimeout(task).

private void renewExpiration() {
    // First get this lock's ExpirationEntry from the map
    ExpirationEntry ee = (ExpirationEntry)EXPIRATION_RENEWAL_MAP.get(this.getEntryName());
    if (ee != null) {
        // This is a delayed task
        Timeout task = this.commandExecutor.getConnectionManager().newTimeout(new TimerTask() {
            // Body of the delayed task
            public void run(Timeout timeout) throws Exception {
                // Fetch the ExpirationEntry
                ExpirationEntry ent = (ExpirationEntry)RedissonLock.EXPIRATION_RENEWAL_MAP.get(RedissonLock.this.getEntryName());
                if (ent != null) {
                    // Take the thread ID from the ExpirationEntry
                    Long threadId = ent.getFirstThreadId();
                    if (threadId != null) {
                        // Call renewExpirationAsync to refresh the lock's expiration time
                        RFuture<Boolean> future = RedissonLock.this.renewExpirationAsync(threadId);
                        future.onComplete((res, e) -> {
                            if (e != null) {
                                RedissonLock.log.error("Can't update lock " + RedissonLock.this.getName() + " expiration", e);
                            } else {
                                if (res) {
                                    // After renewExpirationAsync succeeds, call this method again recursively.
                                    // The first call schedules a delayed task that runs after 10s.
                                    // After 10s the task refreshes the validity period and, on success,
                                    // schedules another delayed task, so the lock keeps being renewed.
                                    RedissonLock.this.renewExpiration();
                                }
                            }
                        });
                    }
                }
            }
        },
        // internalLockLeaseTime is the lock's automatic release time; because none was passed,
        // it is the watchdog timeout of 30 * 1000 ms, so the delay is 10s
        this.internalLockLeaseTime / 3L,
        // Time unit
        TimeUnit.MILLISECONDS);
        // Store the delayed task on the current ExpirationEntry
        ee.setTimeout(task);
    }
}



// Refresh the lock's expiration time
protected RFuture<Boolean> renewExpirationAsync(long threadId) {
    return evalWriteAsync(getName(), LongCodec.INSTANCE, RedisCommands.EVAL_BOOLEAN,
                          // Only renew if this thread still holds the lock
                          "if (redis.call('hexists', KEYS[1], ARGV[2]) == 1) then " +
                          "redis.call('pexpire', KEYS[1], ARGV[1]); " +
                          "return 1; " +
                          "end; " +
                          "return 0;",
                          Collections.singletonList(getName()),
                          internalLockLeaseTime, getLockName(threadId));
}
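The effect of renewing every internalLockLeaseTime / 3 can be checked with a small virtual-time simulation. This is a sketch under simplifying assumptions: the class WatchdogTimeline is hypothetical, the lock is acquired at time 0, every renewal succeeds instantly, and the owner process stops running (for example, it crashes) at ownerAliveUntil.

```java
// Hypothetical virtual-time model of the renewal loop; not Redisson code.
final class WatchdogTimeline {
    // Returns the virtual time (ms) at which the lock key finally expires.
    static long expiryTime(long leaseMillis, long ownerAliveUntil) {
        long interval = leaseMillis / 3; // delay of the self-rescheduling task
        long expiresAt = leaseMillis;    // TTL set when the lock is acquired at t = 0
        // Each iteration models one run of the delayed task while the owner is alive
        for (long now = interval; now < ownerAliveUntil; now += interval) {
            expiresAt = now + leaseMillis; // pexpire resets the TTL to the full lease
        }
        return expiresAt;
    }
}
```

With a 30 s lease, an owner that stops at 25 s has renewed at 10 s and 20 s, so the key expires at 50 s. An owner that stops at 5 s never renewed, so the key expires at 30 s.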

Finally, when the lock is released, the delayed task for that lock is cancelled. The core code is as follows:

public RFuture<Void> unlockAsync(long threadId) {
    RPromise<Void> result = new RedissonPromise<>();
    RFuture<Boolean> future = this.unlockInnerAsync(threadId);
    future.onComplete((opStatus, e) -> {
        // Cancel the lock renewal task
        this.cancelExpirationRenewal(threadId);
        if (e != null) {
            result.tryFailure(e);
        } else if (opStatus == null) {
            IllegalMonitorStateException cause = new IllegalMonitorStateException("attempt to unlock lock, not locked by current thread by node id: " + this.id + " thread-id: " + threadId);
            result.tryFailure(cause);
        } else {
            result.trySuccess(null);
        }
    });
    return result;
}

void cancelExpirationRenewal(Long threadId) {
    // Get the entry for the current lock
    ExpirationEntry task = (ExpirationEntry)EXPIRATION_RENEWAL_MAP.get(this.getEntryName());
    if (task != null) {
        // The entry exists and a thread ID was given
        if (threadId != null) {
            // First remove the thread ID
            task.removeThreadId(threadId);
        }

        if (threadId == null || task.hasNoThreads()) {
            // Then fetch the delayed task
            Timeout timeout = task.getTimeout();
            if (timeout != null) {
                // Cancel the delayed task
                timeout.cancel();
            }
            // Finally remove the ExpirationEntry from the map
            EXPIRATION_RENEWAL_MAP.remove(this.getEntryName());
        }
    }
}
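The reason the timer is only cancelled when hasNoThreads() is true is that a reentrant lock may be held several times. The sketch below shows one way such bookkeeping could work. The class EntryModel and its per-thread counter are assumptions made for illustration; the source does not show the internals of ExpirationEntry.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of per-thread hold counting; not Redisson's ExpirationEntry.
final class EntryModel {
    private final Map<Long, Integer> threadIds = new LinkedHashMap<>(); // thread ID -> holds

    void addThreadId(long id) {
        threadIds.merge(id, 1, Integer::sum);
    }

    void removeThreadId(long id) {
        // Decrement the count; returning null removes the mapping when it reaches zero
        threadIds.computeIfPresent(id, (k, n) -> n > 1 ? n - 1 : null);
    }

    boolean hasNoThreads() {
        return threadIds.isEmpty();
    }

    // Mirrors the branch structure of cancelExpirationRenewal:
    // returns true when the renewal task should be cancelled.
    boolean release(Long threadId) {
        if (threadId != null) {
            removeThreadId(threadId);
        }
        return threadId == null || hasNoThreads();
    }
}
```

With two holds by thread 1, the first release returns false (the watchdog keeps running) and the second returns true (the watchdog is cancelled).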

3. Summary

Several problems can arise when using Redis to implement distributed locks.

For example, if the business logic takes longer than the automatic release time you set for the lock, Redis releases the lock when it times out, and other threads can then grab the lock in that gap and cause problems. A renewal operation is therefore required.

Moreover, when the lock is released in a finally block, you must first check that the lock still belongs to the current thread, so that you do not release another thread's lock. Doing the check and the release as two separate commands is not atomic. This problem is easy to solve: perform both in a single Lua script.
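The idea of an atomic check-and-release can be sketched without Redis. In the example below, the class OwnerCheckedUnlock is hypothetical, and ConcurrentHashMap.remove(key, value) stands in for the Lua script: it deletes the entry only if it still maps to the given owner, as one atomic step. It does not model reentry counts or expiration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical in-memory model of owner-checked unlock; not Redisson code.
final class OwnerCheckedUnlock {
    private final ConcurrentMap<String, String> locks = new ConcurrentHashMap<>();

    // Acquire: succeeds only if nobody holds the key.
    boolean lock(String key, String owner) {
        return locks.putIfAbsent(key, owner) == null;
    }

    // Release: compare the owner and delete in one atomic operation.
    boolean unlock(String key, String owner) {
        return locks.remove(key, owner);
    }
}
```

If owner A holds the key, an unlock attempt by owner B returns false and leaves the lock in place, while A's own unlock returns true.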

Redisson's watchdog mechanism solves the renewal problem well. Its main steps are as follows:

  1. When acquiring the lock, either do not specify leaseTime or set leaseTime to -1. Only then is the watchdog mechanism enabled.
  2. The tryLockInnerAsync method tries to acquire the lock. If the lock is acquired successfully, scheduleExpirationRenewal is called to start the watchdog mechanism.
  3. The key call inside scheduleExpirationRenewal is renewExpiration. When a thread acquires the lock for the first time (that is, not a reentry), renewExpiration is called to start the watchdog mechanism.
  4. renewExpiration attaches a delayed task, task, to the current lock. The task runs after 10s and refreshes the validity period of the lock to 30s (the default lock release time of the watchdog mechanism).
  5. When the refresh succeeds, the task calls renewExpiration again, which schedules the next run.

In short, the general process is as follows. The lock is acquired first (it is released automatically after 30s). A delayed task is then set for the lock (executed after 10s). The delayed task refreshes the release time of the lock to 30s and sets the same delayed task again (executed after another 10s). As long as the lock is not released (the program has not finished), the watchdog mechanism refreshes the automatic release time of the lock to 30s every 10s.

If the program fails abnormally (for example, the node goes down), the watchdog mechanism no longer calls renewExpiration, so the lock expires on its own at most 30s after the last renewal.

When the program releases the lock itself, the process is as follows:

  1. Remove the thread ID from the lock's entry
  2. Then get the delayed task from the entry and cancel it
  3. Remove the lock's entry from EXPIRATION_RENEWAL_MAP

Origin blog.csdn.net/weixin_51146329/article/details/129612350