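Using and Implementing Redisson's Pessimistic Distributed Locks on a Single Redis Instance

This article is based on Redisson 3.7.5.

2. Fair lock

This lock is used in exactly the same way as the fair lock in Java's own concurrency framework (ReentrantLock in fair mode):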

RLock fairLock = redisson.getFairLock("testLock");
try{
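    // The most common usage
    fairLock.lock();

    // Lease expiry is supported: the lock is released automatically after 10 seconds, with no need to call unlock manually
    fairLock.lock(10, TimeUnit.SECONDS);

    // Try to acquire the lock, waiting at most 100 seconds; once acquired, it is released automatically after 10 seconds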
    boolean res = fairLock.tryLock(100, 10, TimeUnit.SECONDS);
} catch (InterruptedException e) {
    e.printStackTrace();
} finally {
    fairLock.unlock();
}
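
The sample above shows the three call styles side by side and unlocks unconditionally in `finally`. With `tryLock`, the usual idiom is to unlock only when the lock was actually obtained. Since `RLock` extends `java.util.concurrent.locks.Lock`, the idiom can be sketched with the JDK's own fair lock; the class and method names below (`FairLockIdiom`, `runLocked`) are made up for illustration, and nothing here touches Redis.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

class FairLockIdiom {
    // JDK fair lock: 'true' asks for first-come-first-served ordering among waiters.
    static final ReentrantLock FAIR_LOCK = new ReentrantLock(true);

    /** Runs the task under the lock; returns false if the lock was not obtained in time. */
    static boolean runLocked(Runnable task, long waitMillis) {
        boolean acquired;
        try {
            acquired = FAIR_LOCK.tryLock(waitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt visible to callers
            return false;
        }
        if (!acquired) {
            return false; // never acquired, so there is nothing to unlock
        }
        try {
            task.run();
            return true;
        } finally {
            FAIR_LOCK.unlock(); // only reached when this thread holds the lock
        }
    }

    public static void main(String[] args) {
        boolean ran = runLocked(() -> System.out.println("in critical section"), 100);
        System.out.println("ran = " + ran);
    }
}
```

The same shape should carry over to `fairLock.tryLock(100, 10, TimeUnit.SECONDS)`: check the boolean result, and call `unlock()` only on the success path.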

Here is the source of redisson.getFairLock("testLock"):

@Override
public RLock getFairLock(String name) {
    return new RedissonFairLock(connectionManager.getCommandExecutor(), name);
}

As you can see, the implementing class is RedissonFairLock:

public class RedissonFairLock extends RedissonLock implements RLock {

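    // Default time a thread waits for the lock
    private final long threadWaitTime = 5000;
    private final CommandAsyncExecutor commandExecutor;
    // Redis key name of the queue of threads waiting for the lock (each element is the LockName for a thread's threadId)
    private final String threadsQueueName;
    // Redis key name of the timeout ZSET (each member is the LockName for a thread's threadId; its score is the expiry timestamp)
    private final String timeoutSetName;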

    protected RedissonFairLock(CommandAsyncExecutor commandExecutor, String name) {
        super(commandExecutor, name);
        this.commandExecutor = commandExecutor;
        threadsQueueName = prefixName("redisson_lock_queue", name);
        timeoutSetName = prefixName("redisson_lock_timeout", name);
    }
}

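As you can see, RedissonFairLock extends RedissonLock. First, let's look at how the fair lock is laid out in Redis:

Compared with RedissonLock, the fair lock keeps two extra structures in Redis: a list, threadsQueue (the queue of waiting threads), and a ZSET, timeoutSet (the waiting threads ordered by expiry timestamp).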

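2.1. Fair lock design

Let's start with a guess at how it could work:

When a thread tries to acquire the lock and fails, it joins the waiting queue threadsQueue and records the timestamp of its attempt in the ZSET timeoutSet.
After that, as with the reentrant lock from the previous section, it subscribes to the CHANNEL for unlock messages and waits with getEntry(threadId).getLatch().tryAcquire(ttl, TimeUnit.MILLISECONDS);.
When it receives an unlock message, it checks whether it is first in the queue. If so, it tries to acquire the lock, and on success its threadId is removed from threadsQueue and timeoutSet.
Unlocking is similar to the reentrant lock. It can check whether threadsQueue is empty; if it is, there is no need to publish an unlock message at all.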

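This looks like enough for a simple fair lock, but it still breaks down on failure. If the thread at the head of the queue exits abnormally, it stays in threadsQueue and timeoutSet forever, and the healthy threads behind it can never acquire the lock. To handle this, we add an expiry mechanism:

On every acquire attempt and every unlock, first check whether the threadId at the head of the queue has expired (whether the current timestamp is greater than its value in timeoutSet). If it has, remove that threadId from threadsQueue and timeoutSet.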
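
The head-expiry check can be modelled in plain Java to make its behaviour concrete. This is a minimal in-memory sketch, not Redisson code: `ExpiredHeadPruner`, `enqueue` and `pruneExpiredHeads` are hypothetical names, an `ArrayDeque` stands in for the Redis list and a `HashMap` for the ZSET.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

class ExpiredHeadPruner {
    // In-memory stand-ins for the Redis list (threadsQueue) and ZSET (timeoutSet).
    final Deque<String> threadsQueue = new ArrayDeque<>();
    final Map<String, Long> timeoutSet = new HashMap<>();

    void enqueue(String lockName, long timeoutAt) {
        threadsQueue.addLast(lockName);
        timeoutSet.put(lockName, timeoutAt);
    }

    /** Drops heads whose timeout is at or before 'now'; stops at the first live head. */
    int pruneExpiredHeads(long now) {
        int removed = 0;
        while (!threadsQueue.isEmpty()) {
            String head = threadsQueue.peekFirst();
            Long timeoutAt = timeoutSet.get(head);
            // Modelling choice: an entry without a recorded timeout is treated as expired.
            if (timeoutAt != null && timeoutAt > now) {
                break; // the head is still live, so everyone behind it stays queued
            }
            threadsQueue.pollFirst();
            timeoutSet.remove(head);
            removed++;
        }
        return removed;
    }

    public static void main(String[] args) {
        ExpiredHeadPruner p = new ExpiredHeadPruner();
        p.enqueue("dead-thread", 100);
        p.enqueue("live-thread", 300);
        System.out.println("removed = " + p.pruneExpiredHeads(150));
        System.out.println("queue   = " + p.threadsQueue);
    }
}
```

Only the head is ever examined, so an expired entry sitting behind a live head stays in the queue until it reaches the front.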

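There is one more case to consider. With this expiry mechanism in place, if timeoutSet still recorded the timestamp of the acquire attempt, every entry would expire immediately. A natural fix is to use the ttl and record the attempt timestamp plus the current lock's ttl instead. That is still not ideal: if the lock times out and Redis removes it, the next request that tries to acquire the lock may also treat the threadId at the head of the queue as expired and drop it. So a threadWaitTime (5 s by default in Redisson) is added as a buffer.

As the source analysis below shows, Redisson's buffering works as follows. Suppose thread A already holds the lock with a lease time of 30 s, and threads B, C and D then try to acquire it by calling tryLock(1, 30, TimeUnit.SECONDS), that is, with a 1 s wait timeout and a 30 s lease time.

Suppose B tries at time T, C at T+5ms and D at T+10ms. threadsQueue and timeoutSet then look like this: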

threadsQueue order: B, C, D

timeoutSet:
SCORE(B) = current timestamp + remaining lock TTL + threadWaitTime = now() + ttl + 5000
SCORE(C) = SCORE(B) - now() + now() + 5000 = SCORE(B) + 5000
SCORE(D) = SCORE(B) - now() + now() + 5000 = SCORE(B) + 5000

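In the code analysis below we will see that each thread's expiry timestamp (its value in timeoutSet) is:

If the thread is first in the queue: current timestamp + remaining lock TTL + threadWaitTime
If the thread is not first in the queue: the expiry timestamp of the first thread in the queue + threadWaitTime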
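
The two rules can be checked with a few lines of arithmetic. The sketch below mirrors the expression used by the locking script in section 2.2 (`timeout = ttl + currentTime + threadWaitTime`); the class name `TimeoutScoreExample` and the concrete value chosen for T are illustrative assumptions.

```java
class TimeoutScoreExample {
    static final long THREAD_WAIT_TIME = 5000;

    /** timeout = ttl + (now + threadWaitTime), the expression the script evaluates. */
    static long score(long ttl, long now) {
        return ttl + now + THREAD_WAIT_TIME;
    }

    /** First waiter: ttl is the remaining lease time of the lock. */
    static long scoreForFirstWaiter(long lockRemainingTtl, long now) {
        return score(lockRemainingTtl, now);
    }

    /** Later waiter: ttl is the time left until the head's entry expires. */
    static long scoreBehindHead(long headScore, long now) {
        long ttl = headScore - now;
        return score(ttl, now); // 'now' cancels out, leaving headScore + threadWaitTime
    }

    public static void main(String[] args) {
        long t = 1_000_000L; // arbitrary value for T, in milliseconds
        long scoreB = scoreForFirstWaiter(30_000, t);
        long scoreC = scoreBehindHead(scoreB, t + 5);
        long scoreD = scoreBehindHead(scoreB, t + 10);
        System.out.println("SCORE(B) = " + scoreB);
        System.out.println("SCORE(C) - SCORE(B) = " + (scoreC - scoreB));
        System.out.println("SCORE(D) - SCORE(B) = " + (scoreD - scoreB));
    }
}
```

Because the arrival time cancels out for every waiter behind the head, C and D end up with the same score even though they arrived 5 ms apart.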

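This keeps the flow as simple as possible, but it also introduces a subtle hazard.
If many threads keep calling tryLock(1, 30, TimeUnit.SECONDS) without getting the lock, the values in timeoutSet keep growing.

For example, at T+1s another thread E tries to acquire the lock. By then B has timed out and been removed from threadsQueue and timeoutSet:

threadsQueue order: C, D, E

timeoutSet:
SCORE(C)=SCORE(B) + 5000
SCORE(D)=SCORE(B) + 5000
SCORE(E)=SCORE(B) + 5000 + 5000

Then at T+1s+10ms yet another thread F tries to acquire the lock. By then C and D have timed out and been removed from threadsQueue and timeoutSet: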

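threadsQueue order: E, F

timeoutSet:
SCORE(E)=SCORE(B) + 5000 + 5000
SCORE(F)=SCORE(B) + 5000 + 5000 + 5000

And so on: the SCORE keeps growing. What is the problem with that? Nothing under normal conditions, because when everything works the SCORE does not affect lock acquisition. But if thread E dies at this point and A releases the lock, F has to wait until SCORE(B) + 5000 + 5000, when E is dropped and F becomes head of the queue, before it can acquire the lock. That is unacceptable in production.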

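So Redisson introduced a mechanism: when a call to tryLock(1, 30, TimeUnit.SECONDS) fails to acquire the lock, it checks whether the caller is at the head of the queue, and if so subtracts threadWaitTime from the SCORE of every thread in timeoutSet.

With this mechanism in place, go back to T+1s, when thread E tries to acquire the lock. By then B has timed out and been removed from threadsQueue and timeoutSet: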

threadsQueue order: C, D, E

timeoutSet:
SCORE(C)=SCORE(B)
SCORE(D)=SCORE(B)
SCORE(E)=SCORE(B) + 5000

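Then at T+1s+10ms thread F tries to acquire the lock. By then C and D have timed out and been removed from threadsQueue and timeoutSet. Each of them was at the head of the queue when it gave up, so each triggered one more subtraction of threadWaitTime:

threadsQueue order: E, F

timeoutSet:
SCORE(E)=SCORE(B) - 5000
SCORE(F)=SCORE(B)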

As you can see, the scores no longer grow without bound, so the problem is solved nicely. For how it was discovered, see this Redisson issue: RFairLock dead lock issue
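
To see the effect of the subtraction, the B to F scenario can be replayed in a small in-memory simulation. This is a sketch under assumptions, not Redisson code: `ScoreDriftSimulation` and its methods are hypothetical, T is taken as 0, every waiter is assumed to give up in arrival order, and the subtraction is applied each time the thread that gives up is the head of the queue, as the cleanup script in section 2.4 does.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

class ScoreDriftSimulation {
    static final long THREAD_WAIT_TIME = 5000;

    final Deque<String> threadsQueue = new ArrayDeque<>();
    final Map<String, Long> timeoutSet = new LinkedHashMap<>();
    final boolean shiftOnHeadFailure; // true = apply the subtraction mechanism

    ScoreDriftSimulation(boolean shiftOnHeadFailure) {
        this.shiftOnHeadFailure = shiftOnHeadFailure;
    }

    /** A waiter joins: first waiter uses now + ttl + wait, others use head score + wait. */
    void enqueue(String name, long now, long lockTtl) {
        String head = threadsQueue.peekFirst();
        long score = (head == null)
                ? now + lockTtl + THREAD_WAIT_TIME
                : timeoutSet.get(head) + THREAD_WAIT_TIME;
        timeoutSet.put(name, score);
        threadsQueue.addLast(name);
    }

    /** A waiter gives up: optionally shift every score, then remove the waiter. */
    void acquireFailed(String name) {
        if (shiftOnHeadFailure && name.equals(threadsQueue.peekFirst())) {
            timeoutSet.replaceAll((k, v) -> v - THREAD_WAIT_TIME);
        }
        timeoutSet.remove(name);
        threadsQueue.remove(name);
    }

    /** Replays the scenario and returns the final scores relative to B's original score. */
    static Map<String, Long> replay(boolean shiftOnHeadFailure) {
        ScoreDriftSimulation sim = new ScoreDriftSimulation(shiftOnHeadFailure);
        long t = 0, ttl = 30_000;
        sim.enqueue("B", t, ttl);
        long scoreB = sim.timeoutSet.get("B");
        sim.enqueue("C", t + 5, ttl);
        sim.enqueue("D", t + 10, ttl);
        sim.acquireFailed("B");            // B's 1 s wait runs out
        sim.enqueue("E", t + 1000, ttl);
        sim.acquireFailed("C");
        sim.acquireFailed("D");
        sim.enqueue("F", t + 1010, ttl);
        Map<String, Long> relative = new LinkedHashMap<>();
        sim.timeoutSet.forEach((k, v) -> relative.put(k, v - scoreB));
        return relative;
    }

    public static void main(String[] args) {
        System.out.println("without shift: " + replay(false));
        System.out.println("with shift:    " + replay(true));
    }
}
```

The exact offsets depend on how many departing heads trigger the subtraction. The point to observe is that without the shift each newcomer lands another 5000 ms further out, while with it the newcomer's score stays close to SCORE(B).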

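Since the fair lock is an extension of the reentrant lock, only the parts of the source that differ are analyzed here.

2.2. The core of locking: tryLockInnerAsync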

@Override
<T> RFuture<T> tryLockInnerAsync(long leaseTime, TimeUnit unit, long threadId, RedisStrictCommand<T> command) {
    internalLockLeaseTime = unit.toMillis(leaseTime);

    long currentTime = System.currentTimeMillis();
    // EVAL_NULL_BOOLEAN means tryLock without a waitTime: no blocking, just one attempt to take the lock
    // Such an attempt does not join the queue; it only checks whether the lock can be taken
    if (command == RedisCommands.EVAL_NULL_BOOLEAN) {
        return commandExecutor.evalWriteAsync(getName(), LongCodec.INSTANCE, command,
                // Remove the LockNames of threads whose wait for the lock has timed out
                "while true do "
                // Look at the LockName of the first threadId in the queue
                + "local firstThreadId2 = redis.call('lindex', KEYS[2], 0);"
                // Exit the loop if the queue is empty
                + "if firstThreadId2 == false then "
                    + "break;"
                + "end; "
                // Check whether this LockName has expired:
                // first read its expiry timestamp (its score) from timeoutSet
                + "local timeout = tonumber(redis.call('zscore', KEYS[3], firstThreadId2));"
                // If the expiry timestamp is not later than the current time, the entry has expired; remove it from threadsQueue and timeoutSet
                + "if timeout <= tonumber(ARGV[3]) then "
                    + "redis.call('zrem', KEYS[3], firstThreadId2); "
                    + "redis.call('lpop', KEYS[2]); "
                + "else "
                    + "break;"
                + "end; "
              + "end;"
                +
                // Check whether this thread can take the lock directly:
                // the lock key does not exist in Redis (the lock has been released), and either threadsQueue does not exist (no other thread is competing) or the head of threadsQueue is this thread's LockName (it is this thread's turn)
                "if (redis.call('exists', KEYS[1]) == 0) and ((redis.call('exists', KEYS[2]) == 0) "
                        + "or (redis.call('lindex', KEYS[2], 0) == ARGV[2])) then " +
                        // Remove from threadsQueue
                        "redis.call('lpop', KEYS[2]); " +
                        // Remove from timeoutSet
                        "redis.call('zrem', KEYS[3], ARGV[2]); " +
                        // Mark the lock as held, same as the reentrant lock
                        "redis.call('hset', KEYS[1], ARGV[2], 1); " +
                        "redis.call('pexpire', KEYS[1], ARGV[1]); " +
                        "return nil; " +
                    "end; " +
                    // If this thread already holds the lock, increment the count, same as the reentrant lock
                    "if (redis.call('hexists', KEYS[1], ARGV[2]) == 1) then " +
                        "redis.call('hincrby', KEYS[1], ARGV[2], 1); " +
                        "redis.call('pexpire', KEYS[1], ARGV[1]); " +
                        "return nil; " +
                    "end; " +
                    // Otherwise the attempt failed
                    "return 1;", 
                Arrays.<Object>asList(getName(), threadsQueueName, timeoutSetName), 
                internalLockLeaseTime, getLockName(threadId), currentTime);
    }

    // EVAL_LONG means lock (blocking acquisition) or a tryLock call with a waitTime
    // These need threadsQueue and timeoutSet to implement fair blocking waits
    if (command == RedisCommands.EVAL_LONG) {
        return commandExecutor.evalWriteAsync(getName(), LongCodec.INSTANCE, command,
                // Same as above: first remove queue entries whose wait time has expired
                "while true do "
                + "local firstThreadId2 = redis.call('lindex', KEYS[2], 0);"
                + "if firstThreadId2 == false then "
                    + "break;"
                + "end; "

                + "local timeout = tonumber(redis.call('zscore', KEYS[3], firstThreadId2));"
                + "if timeout <= tonumber(ARGV[4]) then "
                    + "redis.call('zrem', KEYS[3], firstThreadId2); "
                    + "redis.call('lpop', KEYS[2]); "
                + "else "
                    + "break;"
                + "end; "
              + "end;"

                    // Also the same as before: check whether this thread can take the lock directly
                  + "if (redis.call('exists', KEYS[1]) == 0) and ((redis.call('exists', KEYS[2]) == 0) "
                        + "or (redis.call('lindex', KEYS[2], 0) == ARGV[2])) then " +
                        "redis.call('lpop', KEYS[2]); " +
                        "redis.call('zrem', KEYS[3], ARGV[2]); " +
                        "redis.call('hset', KEYS[1], ARGV[2], 1); " +
                        "redis.call('pexpire', KEYS[1], ARGV[1]); " +
                        "return nil; " +
                    "end; " +
                    // If this thread already holds the lock, increment the count, same as the reentrant lock
                    "if (redis.call('hexists', KEYS[1], ARGV[2]) == 1) then " +
                        "redis.call('hincrby', KEYS[1], ARGV[2], 1); " +
                        "redis.call('pexpire', KEYS[1], ARGV[1]); " +
                        "return nil; " +
                    "end; " +

                    // Look at the LockName of the first thread in the waiting queue
                    "local firstThreadId = redis.call('lindex', KEYS[2], 0); " +
                    "local ttl; " +
                    // If the head of the queue is not this thread's LockName, it is not this thread's turn and it must wait; set ttl to the head's expiry timestamp minus the current timestamp
                    "if firstThreadId ~= false and firstThreadId ~= ARGV[2] then " +
                        "ttl = tonumber(redis.call('zscore', KEYS[3], firstThreadId)) - tonumber(ARGV[4]);" +
                    "else "
                    // Otherwise this thread is first in line; set ttl to the remaining lease time of the lock
                      + "ttl = redis.call('pttl', KEYS[1]);" +
                    "end; " +
                    // timeout is the expiry timestamp: current time + default wait time (threadWaitTime) + the ttl above
                    // In summary, timeout is:
                        // for the first thread in the queue: current timestamp + remaining lock TTL + threadWaitTime
                        // for any other thread: the expiry timestamp of the first thread in the queue + threadWaitTime
                    "local timeout = ttl + tonumber(ARGV[3]);" +
                    // Add to the expiry-ordered set timeoutSet and, if newly added, to the waiting queue threadsQueue
                    "if redis.call('zadd', KEYS[3], timeout, ARGV[2]) == 1 then " +
                        "redis.call('rpush', KEYS[2], ARGV[2]);" +
                    "end; " +
                    "return ttl;", 
                    Arrays.<Object>asList(getName(), threadsQueueName, timeoutSetName), 
                                internalLockLeaseTime, getLockName(threadId), currentTime + threadWaitTime, currentTime);
    }
    
    throw new IllegalArgumentException();
}
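
The EVAL_LONG branch is easier to follow as ordinary code. The following is an in-memory Java model of the steps that script performs, offered as a reading aid under assumptions: `FairLockAcquireModel` and `tryAcquire` are hypothetical names, the lock hash is reduced to a single holder plus a counter, and the key's lease is modelled with an absolute expiry timestamp instead of `pexpire`/`pttl`.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

class FairLockAcquireModel {
    static final long THREAD_WAIT_TIME = 5000;

    final Deque<String> threadsQueue = new ArrayDeque<>(); // stands in for KEYS[2]
    final Map<String, Long> timeoutSet = new HashMap<>();  // stands in for KEYS[3]
    String holder;      // LockName stored in the lock hash; null when the key does not exist
    int holdCount;      // reentrancy counter
    long lockExpiresAt; // absolute lease expiry, replacing pexpire/pttl

    /** Returns null when the lock is acquired, otherwise the ttl the caller should wait. */
    Long tryAcquire(String me, long leaseMillis, long now) {
        // 1. Drop expired heads (this model keeps queue and timeoutSet in sync).
        while (!threadsQueue.isEmpty() && timeoutSet.get(threadsQueue.peekFirst()) <= now) {
            timeoutSet.remove(threadsQueue.pollFirst());
        }
        // The lock key disappears once its lease has run out.
        if (holder != null && lockExpiresAt <= now) {
            holder = null;
            holdCount = 0;
        }
        // 2. Lock is free and nobody is queued, or this thread is the head: take it.
        if (holder == null && (threadsQueue.isEmpty() || me.equals(threadsQueue.peekFirst()))) {
            threadsQueue.pollFirst();
            timeoutSet.remove(me);
            holder = me;
            holdCount = 1;
            lockExpiresAt = now + leaseMillis;
            return null;
        }
        // 3. Reentrant acquisition by the current holder.
        if (me.equals(holder)) {
            holdCount++;
            lockExpiresAt = now + leaseMillis;
            return null;
        }
        // 4. Must wait: derive ttl from the head's timeout, or from the lock's lease.
        String head = threadsQueue.peekFirst();
        long ttl = (head != null && !head.equals(me))
                ? timeoutSet.get(head) - now
                : lockExpiresAt - now;
        long timeout = ttl + now + THREAD_WAIT_TIME;
        boolean isNew = timeoutSet.put(me, timeout) == null; // like ZADD returning 1
        if (isNew) {
            threadsQueue.addLast(me);
        }
        return ttl;
    }

    public static void main(String[] args) {
        FairLockAcquireModel m = new FairLockAcquireModel();
        System.out.println("A: " + m.tryAcquire("A", 30_000, 0));
        System.out.println("B: " + m.tryAcquire("B", 30_000, 0));
        System.out.println("C: " + m.tryAcquire("C", 30_000, 5));
        System.out.println("queue: " + m.threadsQueue + ", timeouts: " + m.timeoutSet);
    }
}
```

Note that a thread that retries while already queued only has its score refreshed; it is not pushed onto the queue a second time, which matches the `zadd ... == 1` guard in the script.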

2.3. The core of unlocking: unlockInnerAsync

@Override
protected RFuture<Boolean> unlockInnerAsync(long threadId) {
    return commandExecutor.evalWriteAsync(getName(), LongCodec.INSTANCE, RedisCommands.EVAL_BOOLEAN,
            // Same as when locking: first remove queue entries whose wait time has expired
            "while true do "
            + "local firstThreadId2 = redis.call('lindex', KEYS[2], 0);"
            + "if firstThreadId2 == false then "
                + "break;"
            + "end; "
            + "local timeout = tonumber(redis.call('zscore', KEYS[3], firstThreadId2));"
            + "if timeout <= tonumber(ARGV[4]) then "
                + "redis.call('zrem', KEYS[3], firstThreadId2); "
                + "redis.call('lpop', KEYS[2]); "
            + "else "
                + "break;"
            + "end; "
          + "end;"
            // If the lock key no longer exists (it has already expired)
          + "if (redis.call('exists', KEYS[1]) == 0) then " +
                "local nextThreadId = redis.call('lindex', KEYS[2], 0); " +
                // and the waiting queue is not empty
                "if nextThreadId ~= false then " +
                    // publish an unlock message to the head of the queue
                    "redis.call('publish', KEYS[4] .. ':' .. nextThreadId, ARGV[1]); " +
                "end; " +
                "return 1; " +
            "end;" +
            // If the lock has not expired but is not held by the current thread, return nil
            "if (redis.call('hexists', KEYS[1], ARGV[3]) == 0) then " +
                "return nil;" +
            "end; " +
            // If the current thread holds the lock, decrement the hold count
            "local counter = redis.call('hincrby', KEYS[1], ARGV[3], -1); " +
            // If the count is still greater than 0, the reentrant holds are not used up yet; return 0
            "if (counter > 0) then " +
                "redis.call('pexpire', KEYS[1], ARGV[2]); " +
                "return 0; " +
            "end; " +
            // If the count is no longer greater than 0, delete the lock key
            "redis.call('del', KEYS[1]); " +
            // If the waiting queue is not empty, publish an unlock message
            "local nextThreadId = redis.call('lindex', KEYS[2], 0); " +
            "if nextThreadId ~= false then " +
                "redis.call('publish', KEYS[4] .. ':' .. nextThreadId, ARGV[1]); " +
            "end; " +
            // Return 1
            "return 1; ",
            Arrays.<Object>asList(getName(), threadsQueueName, timeoutSetName, getChannelName()), 
            LockPubSub.unlockMessage, internalLockLeaseTime, getLockName(threadId), System.currentTimeMillis());
}
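
One detail worth noticing in the script is that the unlock message is published to a channel whose name ends with the head's LockName, so only the thread at the head of the queue is notified. A minimal in-memory Java model of the three return values (nil, 0, 1) and of who gets notified might look like this; `FairUnlockModel`, `unlock` and `lastNotified` are hypothetical names, and the publish call is replaced by recording the recipient.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

class FairUnlockModel {
    final Deque<String> threadsQueue = new ArrayDeque<>(); // waiting LockNames
    final Map<String, Long> timeoutSet = new HashMap<>();  // LockName -> expiry timestamp
    final Map<String, Integer> holds = new HashMap<>();    // lock hash: LockName -> hold count
    String lastNotified;                                   // stands in for the publish call

    void enqueue(String lockName, long timeoutAt) {
        threadsQueue.addLast(lockName);
        timeoutSet.put(lockName, timeoutAt);
    }

    private void notifyHead() {
        String head = threadsQueue.peekFirst();
        if (head != null) {
            lastNotified = head; // only the head of the queue is told to retry
        }
    }

    /** null = lock held by someone else, false = still held (reentrant), true = released. */
    Boolean unlock(String me, long now) {
        // Drop expired heads first, as in the locking script.
        while (!threadsQueue.isEmpty() && timeoutSet.get(threadsQueue.peekFirst()) <= now) {
            timeoutSet.remove(threadsQueue.pollFirst());
        }
        if (holds.isEmpty()) {          // lock key does not exist any more
            notifyHead();
            return true;
        }
        Integer count = holds.get(me);
        if (count == null) {            // lock exists but this thread does not hold it
            return null;
        }
        if (count - 1 > 0) {            // reentrant holds remain
            holds.put(me, count - 1);
            return false;
        }
        holds.remove(me);               // last hold released: the key is deleted
        notifyHead();
        return true;
    }

    public static void main(String[] args) {
        FairUnlockModel m = new FairUnlockModel();
        m.holds.put("A", 2);
        m.enqueue("B", 40_000);
        System.out.println(m.unlock("A", 1_000) + ", notified " + m.lastNotified);
        System.out.println(m.unlock("A", 1_000) + ", notified " + m.lastNotified);
    }
}
```

Waking only the head avoids having every waiter race for the lock on each release, which is what keeps the handover in queue order.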

2.4. Cleanup after a failed tryLock with waitTime

@Override
protected RFuture<Void> acquireFailedAsync(long threadId) {
    return commandExecutor.evalWriteAsync(getName(), LongCodec.INSTANCE, RedisCommands.EVAL_VOID,
                // Look at the first thread in the waiting queue
         "local firstThreadId = redis.call('lindex', KEYS[1], 0); " +
                // Check whether the current thread is first in the queue (threads enter the queue through a blocking lock call or a tryLock with a wait time)
                 // If it is, subtract threadWaitTime from every expiry timestamp in timeoutSet; the reason is explained in section 2.1
                "if firstThreadId == ARGV[1] then " +
                    "local keys = redis.call('zrange', KEYS[2], 0, -1); " +
                    "for i = 1, #keys, 1 do " +
                        "redis.call('zincrby', KEYS[2], -tonumber(ARGV[2]), keys[i]);" +
                    "end;" +
                "end;" +
                 // Remove the current thread from threadsQueue and timeoutSet
                "redis.call('zrem', KEYS[2], ARGV[1]); " +
                "redis.call('lrem', KEYS[1], 0, ARGV[1]); ",
                Arrays.<Object>asList(threadsQueueName, timeoutSetName), 
                getLockName(threadId), threadWaitTime);
}

Author: 张哈希


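Redis series, production applications, distributed locks (3): a Java implementation of distributed locks on a single Redis instance (Redisson usage and internals), fair lock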