Analysis of Redis-based distributed locks and the watchdog mechanism

Principle

Distributed locks must meet the basic requirements of mutual exclusion and deadlock prevention.

Further desirable properties are reentrancy (not strictly necessary, but important) and efficiency.

Implementation in Redisson

The following source-code walkthrough is based on Redisson 3.12.0.

Marking resources and their holders

Each resource is represented by a key in a Redis instance that all participants can access. The key is given an expiration time and a value that identifies the holder (for example an ObjectId, a snowflake-style ID, or any other scheme that identifies a thread uniquely across the whole system).
The value is mainly used to support reentrancy. (In Redisson the key is actually a hash: the field is the holder identifier and the value is the hold count, as the Lua scripts below show.)
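To make the idea concrete, here is a minimal in-memory sketch of a key that carries a holder value and an expiration time, roughly what a plain Redis `SET key value NX PX ttl` provides. The class `ExpiringLockStore` and its methods are hypothetical, the current time is passed in explicitly so the behavior is deterministic, and this is not how Redisson stores its lock (it uses a hash).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of "key + holder value + expiration",
// i.e. the semantics of SET key holder NX PX ttl. Not Redisson code.
final class ExpiringLockStore {
    private static final class Entry {
        final String holder;
        final long expiresAt; // absolute time in ms

        Entry(String holder, long expiresAt) {
            this.holder = holder;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<String, Entry> keys = new HashMap<>();

    // Succeeds only if the key is absent or already expired (mutual exclusion).
    // The TTL makes the key disappear even if the holder crashes (no deadlock).
    synchronized boolean setIfAbsent(String key, String holder, long ttlMs, long now) {
        Entry e = keys.get(key);
        if (e != null && now < e.expiresAt) {
            return false;
        }
        keys.put(key, new Entry(holder, now + ttlMs));
        return true;
    }

    // Releases only if the caller is the recorded holder, so one client cannot
    // delete a lock that has since been taken by another.
    synchronized boolean release(String key, String holder, long now) {
        Entry e = keys.get(key);
        if (e == null || now >= e.expiresAt || !e.holder.equals(holder)) {
            return false;
        }
        keys.remove(key);
        return true;
    }
}
```

Once the first holder's TTL has passed, another holder can take the key even though it was never released, which is exactly the deadlock protection the expiration time buys.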

Redisson simply uses a UUID as the globally unique identifier of the connection manager and combines it with the local thread ID, which yields an identifier for the thread that is unique across all machines.
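A minimal sketch of composing such an identifier, assuming the `<uuid>:<threadId>` layout that Redisson's `getLockName(threadId)` produces in this version. `HolderIds` and its methods are hypothetical names used only for illustration.

```java
import java.util.UUID;

// Hypothetical helper: builds a cluster-wide holder ID from a per-client UUID
// and the local thread ID, assuming the "<uuid>:<threadId>" format.
final class HolderIds {
    // One UUID per client instance (in Redisson, per connection manager)
    static final UUID CLIENT_ID = UUID.randomUUID();

    private HolderIds() {}

    static String lockName(UUID clientId, long threadId) {
        return clientId + ":" + threadId;
    }

    // Identifier of the calling thread
    static String currentHolder() {
        return lockName(CLIENT_ID, Thread.currentThread().getId());
    }
}
```

The UUID distinguishes clients (and machines); the thread ID only needs to be unique within one JVM.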

Mutual exclusion and reentrancy

Reentrancy is determined by checking whether the holder identifier stored under the resource key matches the identifier of the current thread.
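One generic way to express this check locally is to keep each thread's token in a `ThreadLocal` and compare it with the stored owner. The class `ThreadLocalReentrancy` is hypothetical, a single in-process field stands in for the Redis key, and Redisson itself derives the identifier from the UUID and thread ID rather than storing it in a `ThreadLocal`.

```java
import java.util.UUID;

// Hypothetical sketch: the owner field stands in for the value stored under the
// resource key; each thread carries its own token in a ThreadLocal.
final class ThreadLocalReentrancy {
    private static final ThreadLocal<String> TOKEN =
            ThreadLocal.withInitial(() -> UUID.randomUUID().toString());

    private String owner; // token of the current owner, null when free
    private int holds;    // re-entry count

    synchronized boolean tryLock() {
        String mine = TOKEN.get();
        if (owner == null) {          // free: take it
            owner = mine;
            holds = 1;
            return true;
        }
        if (owner.equals(mine)) {     // stored value matches our token: re-entry
            holds++;
            return true;
        }
        return false;                 // held by another thread
    }

    synchronized void unlock() {
        if (!TOKEN.get().equals(owner)) {
            throw new IllegalMonitorStateException("not the owner");
        }
        if (--holds == 0) {
            owner = null;             // fully released
        }
    }

    synchronized int holdCount() {
        return TOKEN.get().equals(owner) ? holds : 0;
    }
}
```

The count is what lets a thread lock twice and still require two unlocks, which is the role the hash value plays in Redisson.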

Deadlock-free locking process

Locks are acquired with an expiration time so that a crashed holder cannot block the resource forever, and the watchdog mechanism renews the lock automatically to cover the case where the work outlasts that expiration time.
There are four ways to acquire the lock; each successful acquisition is released with unlock():

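import java.util.concurrent.TimeUnit;
import org.redisson.Redisson;
import org.redisson.api.RLock;
import org.redisson.api.RedissonClient;

RedissonClient client = Redisson.create();

// Get the lock for a specific resource
RLock lock = client.getLock("resource-1");

// The four variants are alternatives. The lock is reentrant, so every
// successful acquisition must be matched by exactly one unlock().

// 1. Block until acquired; default 30 s lease, renewed by the watchdog
lock.lock();
try {
    // critical section
} finally {
    lock.unlock();
}

// 2. Block until acquired; fixed 30 s lease, no watchdog
lock.lock(30, TimeUnit.SECONDS);
try {
    // critical section
} finally {
    lock.unlock();
}

try {
    // 3. Wait up to 10 s for the lock; once acquired, the watchdog renews it
    if (lock.tryLock(10, TimeUnit.SECONDS)) {
        try {
            // critical section
        } finally {
            lock.unlock();
        }
    }
    // 4. Wait up to 10 s for the lock, then hold it for at most 30 s
    if (lock.tryLock(10, 30, TimeUnit.SECONDS)) {
        try {
            // critical section
        } finally {
            lock.unlock();
        }
    }
} catch (InterruptedException e) {
    Thread.currentThread().interrupt();
}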

// RedissonLock#tryAcquireAsync: chooses the lease time
private <T> RFuture<Long> tryAcquireAsync(long leaseTime, TimeUnit unit, long threadId) {
    // If a lease time was specified, use it
    if (leaseTime != -1) {
        return tryLockInnerAsync(leaseTime, unit, threadId, RedisCommands.EVAL_LONG);
    }
    // Otherwise lock with the watchdog timeout, 30 s by default
    RFuture<Long> ttlRemainingFuture = tryLockInnerAsync(commandExecutor.getConnectionManager().getCfg().getLockWatchdogTimeout(), TimeUnit.MILLISECONDS, threadId, RedisCommands.EVAL_LONG);
    ttlRemainingFuture.onComplete((ttlRemaining, e) -> {
        if (e != null) {
            return;
        }

        // lock acquired
        if (ttlRemaining == null) {
            scheduleExpirationRenewal(threadId);
        }
    });
    return ttlRemainingFuture;
}
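The branch above decides whether the watchdog is involved at all. Below is a minimal sketch of that decision using `CompletableFuture` in place of Redisson's `RFuture`; `AcquireSketch`, `NO_LEASE` and the `lockScript` parameter are hypothetical, and scheduling a renewal is reduced to incrementing a counter.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.LongFunction;

// Hypothetical sketch of the lease-time branch. The "lock script" is a function from
// lease time to remaining TTL (null = acquired); renewal scheduling is only counted.
final class AcquireSketch {
    static final long NO_LEASE = -1;                 // caller gave no lease time
    static final long WATCHDOG_TIMEOUT_MS = 30_000;  // mirrors the default lockWatchdogTimeout

    int renewalsScheduled = 0;

    CompletableFuture<Long> tryAcquireAsync(long leaseTimeMs,
                                            LongFunction<CompletableFuture<Long>> lockScript) {
        if (leaseTimeMs != NO_LEASE) {
            // Explicit lease: the key simply expires, no watchdog
            return lockScript.apply(leaseTimeMs);
        }
        CompletableFuture<Long> ttlRemaining = lockScript.apply(WATCHDOG_TIMEOUT_MS);
        ttlRemaining.whenComplete((ttl, e) -> {
            if (e == null && ttl == null) {
                renewalsScheduled++; // acquired with the default lease: start renewing
            }
        });
        return ttlRemaining;
    }
}
```

The returned future is the script's own future, so the caller still sees `null` on success or the remaining TTL on failure, as in the original method.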
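
-- tryLockInnerAsync: Lua script that acquires the lock
-- KEYS[1]: getName(), the resource name
-- ARGV[1]: internalLockLeaseTime, the lease time in ms; ARGV[2]: getLockName(threadId), the holder's thread identifier

-- Check whether the lock key exists (0 = no, 1 = yes)
if (redis.call('exists', KEYS[1]) == 0)
then
    -- Not locked yet: create the hash with field = holder ID and count = 1 (the count supports reentrancy)
    redis.call('hset', KEYS[1], ARGV[2], 1);
    -- Set the expiration time
    redis.call('pexpire', KEYS[1], ARGV[1]);
    return nil;
end;

-- The lock exists: check whether the current thread's ID is a field of the hash
if (redis.call('hexists', KEYS[1], ARGV[2]) == 1)
then
    -- It is: increment the hold count (re-entry)
    redis.call('hincrby', KEYS[1], ARGV[2], 1);
    -- Refresh the expiration time
    redis.call('pexpire', KEYS[1], ARGV[1]);
    return nil;
end;
-- Held by another thread: return the key's remaining TTL in ms, i.e. the longest the caller needs to wait
return redis.call('pttl', KEYS[1]);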
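To trace the acquire script step by step without a Redis server, here is a hypothetical in-memory port. `HashLockModel` keeps one map per resource (field = holder ID, value = hold count) plus an expiry time, and the caller passes the current time in. It mirrors the script's return contract (null on success, remaining TTL otherwise) but is a sketch, not Redisson code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory port of the acquire script: one "Redis hash" per resource
// (field = holder ID, value = hold count) plus a per-key expiry. Time is passed in.
final class HashLockModel {
    final Map<String, Map<String, Long>> hashes = new HashMap<>();
    final Map<String, Long> expiresAt = new HashMap<>();

    // Emulates Redis key expiry: drop the key once its deadline has passed
    private void expireIfDue(String key, long now) {
        Long at = expiresAt.get(key);
        if (at != null && now >= at) {
            hashes.remove(key);
            expiresAt.remove(key);
        }
    }

    // Returns null when the lock is acquired or re-entered, otherwise the remaining TTL (like PTTL)
    synchronized Long tryLock(String key, long leaseMs, String holder, long now) {
        expireIfDue(key, now);
        Map<String, Long> hash = hashes.get(key);
        if (hash == null) {                          // exists == 0
            hash = new HashMap<>();
            hash.put(holder, 1L);                    // hset key holder 1
            hashes.put(key, hash);
            expiresAt.put(key, now + leaseMs);       // pexpire key lease
            return null;
        }
        if (hash.containsKey(holder)) {              // hexists == 1
            hash.merge(holder, 1L, Long::sum);       // hincrby key holder 1
            expiresAt.put(key, now + leaseMs);       // pexpire key lease
            return null;
        }
        return expiresAt.get(key) - now;             // pttl key
    }
}
```

Note that every successful call, including a re-entry, pushes the expiry forward by a full lease, just as the script's second `pexpire` does.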
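
-- unlockInnerAsync: Lua script that releases the lock
-- KEYS[1]: getName(), the resource name; KEYS[2]: getChannelName(), the channel that waiters listen on
-- ARGV[1]: LockPubSub.UNLOCK_MESSAGE, the message published on release; ARGV[2]: internalLockLeaseTime, the lease time; ARGV[3]: getLockName(threadId), the holder's thread identifier

-- Check whether the current thread still holds the lock
if (redis.call('hexists', KEYS[1], ARGV[3]) == 0)
then
    -- It does not: return without doing anything
    return nil;
end;
-- Decrement the current thread's hold count and keep the new value
local counter = redis.call('hincrby', KEYS[1], ARGV[3], -1);

if (counter > 0)
then
    -- Still positive: the lock was re-entered, so the thread will release it again later
    -- Refresh the expiration time
    redis.call('pexpire', KEYS[1], ARGV[2]);
    return 0;
else
    -- The lock is fully released
    -- Delete the key that marks the resource as held
    redis.call('del', KEYS[1]);
    -- Publish the release message to every client subscribed to the channel
    redis.call('publish', KEYS[2], ARGV[1]);
    return 1;
end;
return nil;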
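In the same spirit, a hypothetical in-memory port of the release script. `HashUnlockModel`, its collections, and the channel and message strings are assumptions for illustration; the return values follow the script: null when the caller is not a holder, 0 while re-entered holds remain, 1 when the lock is fully released and the message is published.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory port of the release script. The hash, expiry map and
// "channel" are plain collections; channel and message names are placeholders.
final class HashUnlockModel {
    final Map<String, Map<String, Long>> hashes = new HashMap<>();
    final Map<String, Long> expiresAt = new HashMap<>();
    final List<String> published = new ArrayList<>(); // messages "sent" to channels

    synchronized Long unlock(String key, String channel, String message,
                             long leaseMs, String holder, long now) {
        Map<String, Long> hash = hashes.get(key);
        if (hash == null || !hash.containsKey(holder)) {   // hexists == 0
            return null;
        }
        long counter = hash.merge(holder, -1L, Long::sum); // hincrby key holder -1
        if (counter > 0) {
            expiresAt.put(key, now + leaseMs);             // pexpire key lease
            return 0L;
        }
        hashes.remove(key);                                // del key
        expiresAt.remove(key);
        published.add(channel + ":" + message);            // publish channel message
        return 1L;
    }
}
```

Only the final release publishes; intermediate releases of a re-entered lock stay silent, so waiters are not woken while the lock is still held.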

Redis EVAL command

Command format: EVAL script numkeys key [key ...] arg [arg ...]

  • script is a Lua 5.1 program. The script does not need to (and should not) be defined as a Lua function.
  • numkeys is the number of key arguments that follow, i.e. how many entries key [key ...] has; it is 0 if there are no keys.
  • key [key ...] starts at the third argument of EVAL and lists the Redis keys the script uses. The script reads them as KEYS[1], KEYS[2], and so on.
  • arg [arg ...] are additional arguments, read in the script as ARGV[1], ARGV[2], and so on.
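How these arguments line up with KEYS and ARGV can be sketched with a hypothetical helper, `EvalArgs`, that only assembles the EVAL argument list; it does not talk to Redis.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that lays out an EVAL call:
// EVAL script numkeys key [key ...] arg [arg ...]
final class EvalArgs {
    static List<String> eval(String script, List<String> keys, List<String> args) {
        List<String> cmd = new ArrayList<>();
        cmd.add("EVAL");
        cmd.add(script);
        cmd.add(String.valueOf(keys.size())); // numkeys: how many of the following are keys
        cmd.addAll(keys);                     // become KEYS[1], KEYS[2], ...
        cmd.addAll(args);                     // become ARGV[1], ARGV[2], ...
        return cmd;
    }
}
```

For the acquire script, `keys` would hold the resource name (KEYS[1]) and `args` the lease time and holder ID (ARGV[1], ARGV[2]).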

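Watchdog mechanism

The watchdog compensates for a weakness of deadlock prevention: a lock that expires on its own can run out before the holder has finished its work.

It does this by renewing the lock automatically at a fixed interval (internalLockLeaseTime / 3, i.e. every 10 s with the default 30 s lease) for as long as the holder keeps it.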

// Source of the spinning lock() loop
private void lock(long leaseTime, TimeUnit unit, boolean interruptibly) throws InterruptedException {
    long threadId = Thread.currentThread().getId();
    // tryAcquire returns null if the lock was acquired, otherwise the lock's remaining TTL in ms
    Long ttl = tryAcquire(leaseTime, unit, threadId);
    // lock acquired
    if (ttl == null) {
        return;
    }

    // Subscribe to the lock's release channel
    RFuture<RedissonLockEntry> future = subscribe(threadId);
    if (interruptibly) {
        commandExecutor.syncSubscriptionInterrupted(future);
    } else {
        commandExecutor.syncSubscription(future);
    }

    try {
        // Spin: keep trying to acquire the lock
        while (true) {
            ttl = tryAcquire(leaseTime, unit, threadId);
            // lock acquired
            if (ttl == null) {
                break;
            }

            // waiting for message
            // With the subscription in place, the latch is released when an unlock message
            // arrives (or the wait times out after ttl ms), and the loop tries again
            if (ttl >= 0) {
                try {
                    future.getNow().getLatch().tryAcquire(ttl, TimeUnit.MILLISECONDS);
                } catch (InterruptedException e) {
                    if (interruptibly) {
                        throw e;
                    }
                    future.getNow().getLatch().tryAcquire(ttl, TimeUnit.MILLISECONDS);
                }
            } else {
                // Negative ttl (the key has no expiration): wait for the unlock message without a timeout
                if (interruptibly) {
                    future.getNow().getLatch().acquire();
                } else {
                    future.getNow().getLatch().acquireUninterruptibly();
                }
            }
        }
    } finally {
        // Unsubscribe once the lock is acquired or the wait is interrupted
        unsubscribe(future, threadId);
    }
}
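The try, wait-with-timeout, retry loop can be reduced to a local sketch. `SpinLockSketch` is hypothetical: a `Semaphore` plays the role of the `RedissonLockEntry` latch, and `unlock()` releasing a permit stands in for the PUBLISH in the release script. As in the loop above, a waiter blocks for at most the remaining TTL, so it still retries if a notification is missed or the lock simply expires.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical in-memory stand-in for the lock key plus its release channel.
final class SpinLockSketch {
    private final Semaphore latch = new Semaphore(0); // released on "unlock message"
    private String holder;                            // holder ID, null when free
    private long expiresAt;                           // absolute expiry in ms

    // Same contract as the acquire script: null = acquired, otherwise remaining TTL in ms
    synchronized Long tryAcquire(String id, long leaseMs) {
        long now = System.currentTimeMillis();
        if (holder == null || now >= expiresAt) {
            holder = id;
            expiresAt = now + leaseMs;
            return null;
        }
        return Math.max(0, expiresAt - now);
    }

    synchronized void unlock(String id) {
        if (id.equals(holder)) {
            holder = null;
            latch.release(); // like PUBLISH to the channel: wake a waiter
        }
    }

    void lock(String id, long leaseMs) throws InterruptedException {
        while (true) {
            Long ttl = tryAcquire(id, leaseMs);
            if (ttl == null) {
                return; // acquired
            }
            // Wait for an unlock notification, but no longer than the remaining TTL
            latch.tryAcquire(ttl, TimeUnit.MILLISECONDS);
        }
    }
}
```

Re-running `tryAcquire` after every wake-up, instead of trusting the notification, keeps the loop correct when several waiters race for the same release.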

// Source of the watchdog mechanism
private <T> RFuture<Long> tryAcquireAsync(long leaseTime, TimeUnit unit, long threadId) {
    // If a lease time was given, use it as-is and do not start the watchdog
    if (leaseTime != -1) {
        return tryLockInnerAsync(leaseTime, unit, threadId, RedisCommands.EVAL_LONG);
    }
    RFuture<Long> ttlRemainingFuture = tryLockInnerAsync(commandExecutor.getConnectionManager().getCfg().getLockWatchdogTimeout(), TimeUnit.MILLISECONDS, threadId, RedisCommands.EVAL_LONG);
    ttlRemainingFuture.onComplete((ttlRemaining, e) -> {
        if (e != null) {
            return;
        }

        // lock acquired
        if (ttlRemaining == null) {
            // Lock acquired: register it for automatic renewal by the watchdog
            scheduleExpirationRenewal(threadId);
        }
    });
    return ttlRemainingFuture;
}

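// Register the lock with the expiration-renewal scheduler, i.e. the watchdog
private void scheduleExpirationRenewal(long threadId) {
    ExpirationEntry entry = new ExpirationEntry();
    ExpirationEntry oldEntry = EXPIRATION_RENEWAL_MAP.putIfAbsent(getEntryName(), entry);
    if (oldEntry != null) {
        // Already registered: just record the thread ID
        oldEntry.addThreadId(threadId);
    } else {
        // First registration: record the thread ID and start the renewal timer
        entry.addThreadId(threadId);
        renewExpiration();
    }
}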
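The `putIfAbsent` step ensures that only one renewal task runs per lock, no matter how often it is re-entered. A hypothetical sketch of that registration: `RenewalRegistry` is an illustrative name, starting the timer is reduced to a counter, and the entry keeps thread IDs in a deque (Redisson's `ExpirationEntry` has its own bookkeeping).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of watchdog registration: one entry per lock name;
// only the caller that creates the entry starts the renewal timer.
final class RenewalRegistry {
    static final class Entry {
        private final Deque<Long> threadIds = new ArrayDeque<>();

        synchronized void addThreadId(long id) { threadIds.add(id); }

        synchronized Long firstThreadId() { return threadIds.peekFirst(); }
    }

    final ConcurrentMap<String, Entry> entries = new ConcurrentHashMap<>();
    final AtomicInteger timersStarted = new AtomicInteger();

    void scheduleRenewal(String entryName, long threadId) {
        Entry entry = new Entry();
        Entry old = entries.putIfAbsent(entryName, entry); // atomic check-and-insert
        if (old != null) {
            old.addThreadId(threadId);       // renewal already running for this lock
        } else {
            entry.addThreadId(threadId);
            timersStarted.incrementAndGet(); // stands in for renewExpiration()
        }
    }
}
```

Because `putIfAbsent` is atomic, two concurrent registrations for the same lock cannot both start a timer.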

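The code below performs the automatic renewal. It is built on Netty's timer: newTimeout schedules a TimerTask (Timeout and TimerTask come from io.netty.util), and the Netty documentation covers the details. The task fires after internalLockLeaseTime / 3 ms and, after a successful renewal, schedules itself again.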

private void renewExpiration() {
    ExpirationEntry ee = EXPIRATION_RENEWAL_MAP.get(getEntryName());
    if (ee == null) {
        return;
    }

    Timeout task = commandExecutor.getConnectionManager().newTimeout(new TimerTask() {
        @Override
        public void run(Timeout timeout) throws Exception {
            ExpirationEntry ent = EXPIRATION_RENEWAL_MAP.get(getEntryName());
            if (ent == null) {
                return;
            }
            Long threadId = ent.getFirstThreadId();
            if (threadId == null) {
                return;
            }

            RFuture<Boolean> future = renewExpirationAsync(threadId);
            future.onComplete((res, e) -> {
                if (e != null) {
                    log.error("Can't update lock " + getName() + " expiration", e);
                    return;
                }

                if (res) {
                    // reschedule itself
                    renewExpiration();
                }
            });
        }
    }, internalLockLeaseTime / 3, TimeUnit.MILLISECONDS);

    ee.setTimeout(task);
}
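The self-rescheduling timer can be sketched with a `ScheduledExecutorService` instead of Netty's timer. `Watchdog` is a hypothetical class; `renew` stands in for `renewExpirationAsync`, which in Redisson extends the key's expiration only while the holder still owns the lock, and a `false` result ends the renewals just like the `res == false` case above.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical watchdog: every leaseMs / 3 it calls renew (standing in for the
// renewal script) and reschedules itself only while renew reports success.
final class Watchdog {
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
    private final long leaseMs;
    private final BooleanSupplier renew;

    Watchdog(long leaseMs, BooleanSupplier renew) {
        this.leaseMs = leaseMs;
        this.renew = renew;
    }

    void start() {
        timer.schedule(this::tick, leaseMs / 3, TimeUnit.MILLISECONDS);
    }

    private void tick() {
        if (renew.getAsBoolean()) {
            start();          // still held: schedule the next renewal
        } else {
            timer.shutdown(); // lock gone: stop renewing
        }
    }
}
```

Renewing at a third of the lease leaves room for two failed or delayed renewals before the key would actually expire.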

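How blocked threads are notified when the lock is released

This is implemented with Redis publish/subscribe: waiters subscribe to a channel for the lock, and the release script publishes an unlock message to it.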

[Figure: subscription process]

[Figure: release process]

Publish/subscribe commands

  • PSUBSCRIBE pattern [pattern ...]: subscribe to one or more channels matching the given patterns.
  • PUBSUB subcommand [argument [argument ...]]: inspect the state of the publish/subscribe system.
  • PUBLISH channel message: send a message to the given channel.
  • PUNSUBSCRIBE [pattern [pattern ...]]: unsubscribe from the given patterns (from all patterns if none are given).
  • SUBSCRIBE channel [channel ...]: subscribe to the given channel or channels.
  • UNSUBSCRIBE [channel [channel ...]]: unsubscribe from the given channels (from all channels if none are given).
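A hypothetical in-process stand-in for SUBSCRIBE, UNSUBSCRIBE and PUBLISH, enough to show the semantics the lock relies on: every current subscriber of a channel receives each published message, and PUBLISH reports how many received it. `MiniPubSub` is an illustration only, with no pattern subscriptions and no network.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical in-process stand-in for SUBSCRIBE / UNSUBSCRIBE / PUBLISH on named channels.
final class MiniPubSub {
    private final Map<String, List<Consumer<String>>> channels = new ConcurrentHashMap<>();

    void subscribe(String channel, Consumer<String> listener) {
        channels.computeIfAbsent(channel, c -> new CopyOnWriteArrayList<>()).add(listener);
    }

    void unsubscribe(String channel, Consumer<String> listener) {
        List<Consumer<String>> listeners = channels.get(channel);
        if (listeners != null) {
            listeners.remove(listener);
        }
    }

    // Like PUBLISH: deliver to every current subscriber and return how many there were
    int publish(String channel, String message) {
        List<Consumer<String>> listeners = channels.getOrDefault(channel, List.of());
        listeners.forEach(l -> l.accept(message));
        return listeners.size();
    }
}
```

As with real Redis pub/sub, a message published while nobody is subscribed is simply lost, which is why the lock's waiters also time out after the remaining TTL.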

Summary

Readers who have studied the source of Java's ReentrantLock will notice, after walking through Redisson's distributed lock, that the two designs are strikingly similar:

from how the lock is acquired, to how reentrancy is implemented, to how the next waiter is notified when the lock is released.
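The JDK's own `ReentrantLock` shows the same owner plus hold-count idea locally. This short demo uses only `java.util.concurrent.locks.ReentrantLock`; `LocalReentrancyDemo` is a hypothetical wrapper class.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical demo: ReentrantLock tracks an owner plus a hold count,
// much like the holder field and counter in the Redis hash.
final class LocalReentrancyDemo {
    static int[] run() {
        ReentrantLock lock = new ReentrantLock();
        lock.lock();
        lock.lock();                          // re-entry: hold count 2
        int afterTwoLocks = lock.getHoldCount();
        lock.unlock();                        // hold count 1, still held
        int afterOneUnlock = lock.getHoldCount();
        lock.unlock();                        // hold count 0, released
        return new int[] {afterTwoLocks, afterOneUnlock, lock.isLocked() ? 1 : 0};
    }
}
```

Where the Redis version uses pub/sub to wake the next waiter, ReentrantLock relies on the wait queue of its AbstractQueuedSynchronizer.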

Special attention: Redis does not roll back. Running the commands in a Lua script makes them atomic with respect to other clients, but if the script stops partway (for example because a command fails or the Redis server goes down), the commands already executed are not undone and the remaining ones never run.
Therefore, command order matters. Do all checks first. When writing, make sure the expiration time is set together with the value (or immediately after it), so that a half-finished script cannot leave behind a key that never expires and causes a deadlock.
Consider the commands in the scripts above: if a statement fails and the following commands are not executed, will a key written earlier live forever?
If it does, can the program still run correctly?
And if it can, will such never-expiring keys cause a serious memory leak in Redis?
Of course, if Redis itself goes down and you have no recovery logic anyway, none of this matters much.
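The ordering concern can be illustrated with a hypothetical model of a script that stops partway with no rollback. `PartialScriptDemo` is an assumption for illustration: running an HSET-like step and a PEXPIRE-like step separately and stopping after the first leaves a key with no TTL, while a single step that writes value and expiration together (as `SET key value NX PX ttl` does in one command) cannot be half applied.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a script aborted partway: steps run in order and the
// "server" stops after the first failAfter steps, with no rollback.
final class PartialScriptDemo {
    final Map<String, String> values = new HashMap<>();
    final Map<String, Long> ttls = new HashMap<>(); // no entry = key never expires

    void run(List<Runnable> steps, int failAfter) {
        for (int i = 0; i < steps.size() && i < failAfter; i++) {
            steps.get(i).run();
        }
    }

    // True if the key exists but has no expiration: the deadlock / leak case
    boolean neverExpires(String key) {
        return values.containsKey(key) && !ttls.containsKey(key);
    }
}
```

Running `[write value, set TTL]` with `failAfter = 1` leaves `neverExpires("resource-1")` true, while one combined step with the same cutoff leaves a key that still expires.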


Source: blog.csdn.net/weixin_46080554/article/details/123043176