Common Redis problems, and the advantages and disadvantages of each distributed lock (05)

Why does a Redis cluster require at least three master nodes, and why is an odd number of masters recommended?

Because electing a new master requires the agreement of more than half of the cluster's master nodes, a cluster with only two masters cannot satisfy the election condition once one of them goes down.

An odd number of masters saves one node while still satisfying the election condition. For example, compare a cluster of three masters with one of four: if each loses one master, both can still elect a new master; if each loses two masters, neither can. So the odd-number recommendation is mostly about saving machine resources.

Redis cluster support for batch operation commands

For native multi-key batch commands such as mset and mget, Redis Cluster only supports the case where all keys fall into the same slot. If a command like mset must operate on several keys in a cluster, you can add a hash tag such as {XX} in front of each key; the slot hash is then calculated only on the value inside the braces, which guarantees that the different keys land in the same slot. For example:

mset {user1}:1:name zhuge {user1}:1:age 18

Even though the hash slots calculated for name and age would normally differ, when this command is executed on a cluster Redis hashes only the user1 inside the curly braces, so the computed slot is the same for both keys and they fall into the same slot.

What data eviction policies does Redis have?

Redis's maximum memory can be configured with maxmemory in redis.conf. A value of 0 means no memory limit on 64-bit systems and an implicit 3 GB limit on 32-bit systems. When the memory actually used (mem_used) reaches the configured maxmemory threshold, Redis evicts data according to the configured eviction policy.

Eviction policy: meaning

noeviction: the default policy; no data is evicted, and most write commands return an error (with a few exceptions such as DEL)

allkeys-lru: evict data chosen by the LRU algorithm from all keys

volatile-lru: evict data chosen by the LRU algorithm from keys that have an expiration time set

allkeys-random: evict randomly chosen data from all keys

volatile-random: evict randomly chosen data from keys that have an expiration time set

volatile-ttl: among keys with an expiration time set, evict those that expire soonest

allkeys-lfu: evict data chosen by the LFU algorithm from all keys (available since 4.0)

volatile-lfu: evict data chosen by the LFU algorithm from keys that have an expiration time set (available since 4.0)
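
As a minimal sketch (the connection details and the 100mb limit are illustrative, not from this article), the policy can be set in redis.conf or changed at runtime with CONFIG SET, which the Jedis client exposes as configSet:

import redis.clients.jedis.Jedis;

public class EvictionPolicyConfig {
    public static void main(String[] args) {
        // illustrative connection details
        try (Jedis jedis = new Jedis("127.0.0.1", 6379)) {
            // cap memory at 100 MB and evict least-recently-used keys across the whole keyspace
            jedis.configSet("maxmemory", "100mb");
            jedis.configSet("maxmemory-policy", "allkeys-lru");
            System.out.println(jedis.configGet("maxmemory-policy"));
        }
    }
}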

What are the expiration policies of Redis Key?

Lazy deletion: when an already-expired key is read or written, the lazy deletion strategy is triggered and the expired key is deleted on the spot. This is clearly passive (for example, a GET on the key finds it expired and deletes it).
Periodic deletion: because lazy deletion cannot guarantee that cold data is removed in time, Redis also periodically and actively evicts a batch of expired keys.
Active cleanup: when the memory currently in use exceeds the maxmemory limit, the active cleanup strategy is triggered. This only applies if a maxmemory value has been configured.

network jitter

Real-world data-center networks are rarely perfectly smooth, and small problems occur all the time. Network jitter, for example, is very common: some connections suddenly become unreachable and then quickly recover.

To cope with this, Redis Cluster provides the option cluster-node-timeout: a node is considered faulty, and a master-slave switchover is triggered, only after it has been unreachable for longer than this timeout. Without such an option, network jitter would cause frequent master-slave switchovers (and data re-replication).

Inconsistency between cache and database double-write

Under heavy concurrency, operating on the database and the cache at the same time can lead to data inconsistency:

1. Double-write inconsistency

2. Read/write concurrency inconsistency

Solutions:
1. For data that is rarely accessed concurrently (such as per-user order data, user profile data, etc.), this problem hardly needs to be considered and cache inconsistency rarely occurs. Simply give the cached data an expiration time and let reads trigger an active refresh every so often.
2. Even under high concurrency, if the business can tolerate briefly inconsistent cached data (such as product names or product category menus), caching plus an expiration time still covers most caching needs.
3. If stale cached data cannot be tolerated at all, a distributed read-write lock can be added so that concurrent read-write and write-write operations queue up in order, while read-read access is effectively lock-free (see the sketch after this list).
4. Alibaba's open-source canal can also be used to update the cache promptly by subscribing to the database binlog, but introducing new middleware adds complexity to the system.
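
A minimal sketch of option 3 using Redisson's distributed read-write lock; the lock name, cached key, and the commented-out DB/cache helpers are illustrative assumptions, not code from this article:

import org.redisson.Redisson;
import org.redisson.api.RReadWriteLock;
import org.redisson.api.RedissonClient;
import org.redisson.config.Config;

public class ProductCacheDemo {
    public static void main(String[] args) {
        Config config = new Config();
        config.useSingleServer().setAddress("redis://127.0.0.1:6379");
        RedissonClient redisson = Redisson.create(config);

        // illustrative lock key guarding the cached entry "product:1001"
        RReadWriteLock rwLock = redisson.getReadWriteLock("lock:product:1001");

        // writer: update the database and cache under the write lock
        rwLock.writeLock().lock();
        try {
            // hypothetical helpers: update the database first, then refresh the cache
            // updateDatabase("product:1001");
            // refreshCache("product:1001");
        } finally {
            rwLock.writeLock().unlock();
        }

        // read-read access is not mutually exclusive; read-write and write-write are
        rwLock.readLock().lock();
        try {
            // hypothetical helper: read from cache, falling back to the database
            // readProduct("product:1001");
        } finally {
            rwLock.readLock().unlock();
        }

        redisson.shutdown();
    }
}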

To summarize:
All of the above applies to adding a cache to improve performance in read-heavy, write-light scenarios. If the workload is both write-heavy and read-heavy and cached-data inconsistency cannot be tolerated, then there is no point in adding a cache; just operate on the database directly. Of course, if the database cannot withstand the load, the cache can instead serve as the primary store for reads and writes, with data synchronized to the database asynchronously and the database kept only as a backup.
Data placed in the cache should be data that does not demand strong real-time consistency. Do not pile over-design and extra control onto the cache in pursuit of absolute consistency; it only increases system complexity.

cache invalidation (breakdown)

When a large number of cache entries expire at the same time, a flood of requests may bypass the cache and hit the database simultaneously, which can overload or even bring down the database in an instant. The fix is to stagger the expiration times, for example by setting each key's TTL to a different value within a time window.
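
A minimal sketch of staggering expiration times, assuming the Jedis client; the key name, base TTL, and random spread are illustrative:

import java.util.concurrent.ThreadLocalRandom;
import redis.clients.jedis.Jedis;

public class StaggeredExpireDemo {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("127.0.0.1", 6379)) {
            String key = "hot:product:1001";      // illustrative key
            String value = "cached-json";         // illustrative cached value
            int baseSeconds = 600;                // 10-minute base TTL
            // spread expirations over an extra 0-300 seconds so the keys do not all expire together
            int ttl = baseSeconds + ThreadLocalRandom.current().nextInt(300);
            jedis.setex(key, ttl, value);
        }
    }
}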

cache avalanche

A cache avalanche means that once the cache layer cannot keep up or crashes, traffic stampedes into the back-end storage layer like a herd of bison.
Because the cache layer absorbs a large share of requests, it effectively protects the storage layer. But if the cache layer can no longer serve for some reason (for example, the concurrency is too large for it, or poor cache design lets a large number of requests hit a bigkey and sharply reduces the concurrency the cache can sustain), a large number of requests will reach the storage layer, its call volume will surge, and the storage layer may cascade into failure as well. To prevent and handle a cache avalanche, we can start from the following three aspects.
1) Ensure the cache layer service is highly available, for example by using Redis Sentinel or Redis Cluster.
2) Use isolation components to rate-limit, circuit-break, and degrade calls to the back end, for example the Sentinel or Hystrix flow-control and degradation components.
For example, with service degradation we can handle different data differently. When the application accesses non-core data (such as e-commerce product attributes or user information), temporarily stop querying the cache for it and directly return a predefined default degraded value, a null value, or an error message; when the application accesses core data (such as e-commerce product inventory), it is still allowed to query the cache, and on a cache miss it can continue reading from the database.
3) Rehearse in advance. Before the project goes live, rehearse how the application and the back end behave after the cache layer crashes and what problems might arise, and prepare contingency plans on that basis.

cache penetration

Cache penetration means querying for data that does not exist at all. On a cache miss the data has to be looked up in the database, and since nothing is found, nothing is written back to the cache, so every request for this non-existent data goes straight to the database and puts pressure on it.

Solutions:

1. Symmetrically encrypt the key/id values exposed in URLs so that real key values are not easily revealed, which helps prevent crafted-key attacks.

2. Whether or not the data actually exists, store the key in the cache anyway (with a short TTL, say one to three minutes) and set its value to a special marker. If the business layer reads back this marker, it returns an error or "not found" instead of querying the database again.
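
A minimal sketch of caching a placeholder value for missing records, assuming the Jedis client; the key pattern, sentinel value, TTLs, and loadFromDatabase helper are illustrative:

import redis.clients.jedis.Jedis;

public class CacheNullValueDemo {
    private static final String EMPTY_MARKER = "{}";   // illustrative sentinel for "no such record"

    public static String queryById(Jedis jedis, String id) {
        String cacheKey = "user:" + id;                // illustrative key pattern
        String cached = jedis.get(cacheKey);
        if (cached != null) {
            // either real data or the sentinel; the caller treats the sentinel as "not found"
            return EMPTY_MARKER.equals(cached) ? null : cached;
        }
        String dbValue = loadFromDatabase(id);         // hypothetical DB lookup
        if (dbValue == null) {
            // cache the sentinel with a short TTL (60-180s) so repeated misses stop hitting the DB
            jedis.setex(cacheKey, 60, EMPTY_MARKER);
            return null;
        }
        jedis.setex(cacheKey, 600, dbValue);
        return dbValue;
    }

    private static String loadFromDatabase(String id) {
        return null; // placeholder: pretend the record does not exist
    }
}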

bloom filter

For malicious attacks that cause cache penetration by requesting large amounts of non-existent data, a Bloom filter can also be placed in front as a first filter. A Bloom filter can generally screen out data that does not exist and stop those requests from being forwarded to the back end. When a Bloom filter says a value exists, the value may or may not actually exist; when it says a value does not exist, it definitely does not.

A Bloom filter is essentially a large bit array plus several different unbiased hash functions, where "unbiased" means each hash spreads the element's hash values fairly uniformly.
When a key is added to the Bloom filter, each hash function hashes the key to an integer, which is then taken modulo the length of the bit array to get a position; each hash function yields a different position. Setting all of those positions in the bit array to 1 completes the add operation.
When the Bloom filter is asked whether a key exists, it computes the same positions as in add and checks whether all of those bits are 1. If even one bit is 0, the key is definitely not in the Bloom filter. If all of them are 1, the key is not guaranteed to exist, only very likely to, because those bits may have been set by other keys. The sparser the bit array, the higher the accuracy; the more crowded it is, the lower. This approach suits scenarios where the hit rate is not high, the data set is relatively fixed and large, and real-time freshness is not required; the code is more complex to maintain, but the cache space used is very small.
Note: if the cached data set is updated, the Bloom filter cannot be updated in place (elements cannot be removed), so the filter has to be rebuilt from the full data set periodically.
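
A minimal sketch using Redisson's RBloomFilter; the filter name, capacity, false-positive rate, and ids are illustrative. The filter is preloaded with the known keys and consulted before the cache and database:

import org.redisson.Redisson;
import org.redisson.api.RBloomFilter;
import org.redisson.api.RedissonClient;
import org.redisson.config.Config;

public class BloomFilterDemo {
    public static void main(String[] args) {
        Config config = new Config();
        config.useSingleServer().setAddress("redis://127.0.0.1:6379");
        RedissonClient redisson = Redisson.create(config);

        // sized for roughly 100,000 keys with a 1% false-positive rate
        RBloomFilter<String> bloomFilter = redisson.getBloomFilter("user:id:filter");
        bloomFilter.tryInit(100_000L, 0.01);

        // preload all known keys (e.g. from the database) whenever the filter is (re)built
        bloomFilter.add("user:1001");

        // before hitting the cache/DB, reject ids the filter says definitely do not exist
        if (!bloomFilter.contains("user:9999")) {
            System.out.println("id definitely does not exist, reject the request");
        }

        redisson.shutdown();
    }
}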

Redis distributed lock

setnx pros and cons

setnx(key, value)

Scenario: a request sets the key and value with setnx; if the key already exists the call returns false, and if not it returns true and sets it. This way a second, concurrent request will not execute the protected logic.

Problem: if the first request acquires the lock and then the process crashes, the lock will never be released.

Adding a timeout

set(key, value, nx, ex=timeout), i.e. the atomic SET key value NX EX seconds

Scenario: to address the problem above, a timeout is attached when the key and value are set, so that even if the process crashes, the lock is still released once the timeout expires (see the sketch below).
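
A minimal sketch of this pattern with the Jedis client, using the atomic SET ... NX EX form; the key, value, and 30-second timeout are illustrative:

import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

public class SetNxExDemo {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("127.0.0.1", 6379)) {
            // SET lock:order request-1 NX EX 30: create the key only if absent, with a 30s timeout, in one atomic command
            String result = jedis.set("lock:order", "request-1", SetParams.setParams().nx().ex(30));
            if ("OK".equals(result)) {
                System.out.println("lock acquired");
            } else {
                System.out.println("lock held by someone else");
            }
        }
    }
}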

Problems and deficiencies

This approach has an obvious flaw: once the business logic runs longer than the timeout and the lock expires automatically, a thread-safety problem arises in which the wrong lock gets deleted. Consider this scenario:

Business logic 1 holds the lock first and starts executing.
Because of network fluctuations or other reasons, business logic 1 fails to finish before the lock expires; the lock is released automatically, and at this point there is no lock in Redis.
Business logic 2 then acquires the lock and, now holding it, starts executing its own work.
Business logic 1 finally finishes and deletes the lock. Note that what business logic 1 deletes at this point is actually business logic 2's lock, which causes a thread-safety problem.

Make sure you delete your own lock

Use set(key, value, nx, ex=timeout) where value is a unique identifier, such as the thread id or a UUID.

When releasing the lock, first check whether the lock's value is your own identifier, and delete it only if it is (a sketch of this naive approach follows).
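
A minimal sketch of this naive release, assuming the Jedis client; the lockKey and requestId parameters are illustrative:

import redis.clients.jedis.Jedis;

public class NaiveUnlockDemo {
    // "check then delete" release; lockKey/requestId are illustrative names
    public static void unlock(Jedis jedis, String lockKey, String requestId) {
        String current = jedis.get(lockKey);
        if (requestId.equals(current)) {
            // the lock can expire and be re-acquired by another client right here,
            // between the GET and the DEL: this gap is the thread-safety problem described below
            jedis.del(lockKey);
        }
    }
}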

Problems and deficiencies

This approach still has a thread-safety issue, because checking that the lock is your own and deleting it are not atomic. Consider this scenario:

When business logic 1 is about to delete the lock, the LikeLock value it reads really does equal its own uuid.
Just before business logic 1 performs the delete, LikeLock happens to reach its expiration time and is automatically released by Redis; there is no LikeLock left in Redis, i.e. no lock.
Business logic 2 acquires LikeLock, locks successfully, and starts executing its own work.
Business logic 1 now performs its delete, which removes business logic 2's LikeLock and causes a concurrency-safety problem.

Use lua script to ensure the atomicity of deleting locks

We can use a Lua script so that the two commands, the ownership check and the delete, run on the Redis server as a single unit with no other command interleaved.
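
A minimal sketch of the classic compare-and-delete script, here executed through Jedis eval; the class, method, and parameter names are illustrative:

import java.util.Collections;
import redis.clients.jedis.Jedis;

public class LuaUnlockDemo {
    // compare-and-delete: the GET and the DEL run as one atomic unit on the server
    private static final String UNLOCK_SCRIPT =
            "if redis.call('get', KEYS[1]) == ARGV[1] then " +
            "    return redis.call('del', KEYS[1]) " +
            "else " +
            "    return 0 " +
            "end";

    public static boolean unlock(Jedis jedis, String lockKey, String requestId) {
        Object result = jedis.eval(UNLOCK_SCRIPT,
                Collections.singletonList(lockKey),
                Collections.singletonList(requestId));
        return Long.valueOf(1L).equals(result);
    }
}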

Lua is a lightweight, compact, open-source scripting language written in standard C. It was designed to be embedded in applications, giving them flexible extension and customization capabilities.

Lua script advantages:

Reduced network overhead: logic that would otherwise take several requests is completed on the Redis server, so using a script cuts network round-trip latency.

Atomic execution: Redis executes the whole script as one unit, with no other command inserted in the middle (think of it as a transaction).

Reuse: a script sent by one client is cached on the Redis server, so other clients can reuse the same script without re-implementing the logic in code.

The defect of the setnx lock outside single-instance mode: it is fair to say that on a single Redis instance, the setnx-style distributed lock works very well.

But the biggest drawback of the setnx lock is that it is taken on only one Redis node. Even if Redis is made highly available via Sentinel, a master-slave switchover for whatever reason can lose the lock. For example:

1. The lock is acquired on the Redis master node.
2. The lock key has not yet been replicated to the slave node.
3. The master fails, a failover occurs, and the slave is promoted to master.
4. The lock that existed on the old master is lost.

RedLock

Its core idea: set up several independent masters, for example 5. Lock them one by one, and as long as more than half succeed (here 5 / 2 + 1 = 3) the lock is considered acquired; releasing is likewise done one by one. The benefit is that if one master goes down the others remain, so there is no single point of failure. It looks as if the problem above is perfectly solved, but it is not 100% safe, as discussed later.

The specific details are:

Use the same key and the same random value to acquire the lock on each of the N master nodes. The per-node time allowed for acquiring the lock is much shorter than the lock's timeout, to prevent us from blocking too long trying to lock a master that has gone down. In other words, if the lock expires after 30 seconds but it took 31 seconds to lock three of the nodes, the lock acquisition has naturally failed.

The lock is considered acquired only if it was obtained on a majority of the nodes (N/2 + 1) and the total time spent acquiring it is less than the lock's timeout.

If the lock is acquired successfully, its effective validity time is the initial lock timeout minus the total time spent acquiring it.

If the lock acquisition fails, whether because a majority of nodes did not succeed or because the total acquisition time exceeded the lock's validity time, the key is deleted on every master where it had already been set.

There are two points to note:

1. The clocks of the Redis masters must be kept in sync.
2. If one of the RedLock machines goes down, its restart should be delayed, for example by 1 minute (just longer than the lock timeout). The reason: with three masters, suppose two were written successfully so the lock succeeded, while one went down. When the lock is released, the master that is down naturally never executes the DEL; if it restarts an instant later, it will find the lock still present (it has not expired yet), which can cause unexpected problems. So delay the Redis restart.
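
A minimal sketch of the RedLock idea using Redisson's RedissonRedLock, which handles the majority-locking across several independent masters; the three addresses and the lock name are illustrative:

import org.redisson.Redisson;
import org.redisson.RedissonRedLock;
import org.redisson.api.RLock;
import org.redisson.api.RedissonClient;
import org.redisson.config.Config;

public class RedLockDemo {
    // illustrative: three independent Redis masters, each with its own Redisson client
    private static RedissonClient createClient(String address) {
        Config config = new Config();
        config.useSingleServer().setAddress(address);
        return Redisson.create(config);
    }

    public static void main(String[] args) {
        RedissonClient client1 = createClient("redis://127.0.0.1:6379");
        RedissonClient client2 = createClient("redis://127.0.0.1:6380");
        RedissonClient client3 = createClient("redis://127.0.0.1:6381");

        RLock lock1 = client1.getLock("anyRedLock");
        RLock lock2 = client2.getLock("anyRedLock");
        RLock lock3 = client3.getLock("anyRedLock");

        // the lock counts as acquired only when it succeeds on a majority of the masters
        RedissonRedLock redLock = new RedissonRedLock(lock1, lock2, lock3);
        redLock.lock();
        try {
            // business logic protected by the red lock
        } finally {
            redLock.unlock();
        }
    }
}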

The main problems:

1. The implementation is extremely complicated, as you have probably noticed.
2. It is still not a completely safe way to lock. For example: all 5 masters are being locked with a 3-second expiration, but because of network jitter or similar conditions, only 3 machines have been locked by the time the 3 seconds are up, so those locks expire; the last two have not been locked yet and the first three have already become invalid. At this moment another thread tries to lock and finds that the first 3 lock normally; since that is more than half, it concludes it has acquired the lock. The result is two threads "successfully" locked at the same time: the first three nodes hold the later thread's lock and the last two hold the first thread's lock. That is a mess, and the threads are no longer safe. You might say enabling watchdog renewal fixes this, and it does seem to help, but change the question slightly: suppose a node does not expire but simply crashes before the lock is synchronized to its slave. The slave is promoted to master, other threads see no lock on it, and can still lock 3 nodes successfully, again more than half. Still concurrent, still unsafe. So what to do, stop using slaves altogether? RedLock is simply too much trouble.

Redisson

Redisson is a Java in-memory data grid built on top of Redis. It makes full use of the capabilities the Redis key-value store provides and, on top of the familiar interfaces from the Java utility packages, offers users a series of common tools with distributed features.

A key is designated as the lock marker and stored in Redis, with a unique client identifier as its value.
The value can only be set when the key does not exist, ensuring that only one client process holds the lock at a time, which satisfies mutual exclusion.
An expiration time is set so that the key is still removed even if a system failure prevents it from being deleted, which satisfies deadlock prevention.
After the business logic completes, the key is cleared to release the lock; before clearing, the value is checked, so that only the one who locked can unlock.
The watchdog mechanism solves the lock-renewal problem nicely and prevents deadlocks.
The lock time, the lock wait time, and how long the lock lives after a failed release can all be configured flexibly.
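
Before diving into the source code, a minimal sketch of everyday Redisson lock usage (the address and lock name are illustrative):

import java.util.concurrent.TimeUnit;
import org.redisson.Redisson;
import org.redisson.api.RLock;
import org.redisson.api.RedissonClient;
import org.redisson.config.Config;

public class RedissonLockDemo {
    public static void main(String[] args) throws InterruptedException {
        Config config = new Config();
        config.useSingleServer().setAddress("redis://127.0.0.1:6379");
        RedissonClient redisson = Redisson.create(config);

        RLock lock = redisson.getLock("anyLock");   // "anyLock" is the illustrative lock key

        // blocking lock: no leaseTime given, so the 30s watchdog renewal described below applies
        lock.lock();
        try {
            // business logic
        } finally {
            lock.unlock();
        }

        // non-blocking variant: wait up to 3s for the lock, hold it for at most 10s (no watchdog renewal)
        if (lock.tryLock(3, 10, TimeUnit.SECONDS)) {
            try {
                // business logic
            } finally {
                lock.unlock();
            }
        }

        redisson.shutdown();
    }
}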

acquire lock

public RLock getLock(String name) {
    return new RedissonLock(connectionManager.getCommandExecutor(), name);
}
public RedissonLock(CommandAsyncExecutor commandExecutor, String name) {
        super(commandExecutor, name);
        // command executor that runs commands asynchronously
        this.commandExecutor = commandExecutor;
        // generate the unique id of this Redisson instance
        this.id = commandExecutor.getConnectionManager().getId();
        // lock lease time, 30s by default
        this.internalLockLeaseTime = commandExecutor.getConnectionManager().getCfg().getLockWatchdogTimeout();
        // concatenate the id and the lock name to form the entry name
        this.entryName = id + ":" + name;
        this.pubSub = commandExecutor.getConnectionManager().getSubscribeService().getLockPubSub();
}

Locking process

private void lock(long leaseTime, TimeUnit unit, boolean interruptibly) throws InterruptedException {
    long threadId = Thread.currentThread().getId();
    Long ttl = tryAcquire(-1, leaseTime, unit, threadId);
    // lock acquired
    if (ttl == null) {
        return;
    }

    RFuture<RedissonLockEntry> future = subscribe(threadId);
    if (interruptibly) {
        commandExecutor.syncSubscriptionInterrupted(future);
    } else {
        commandExecutor.syncSubscription(future);
    }

    try {
        while (true) {
            ttl = tryAcquire(-1, leaseTime, unit, threadId);
            // lock acquired
            if (ttl == null) {
                break;
            }

            // waiting for message
            if (ttl >= 0) {
                try {
                    future.getNow().getLatch().tryAcquire(ttl, TimeUnit.MILLISECONDS);
                } catch (InterruptedException e) {
                    if (interruptibly) {
                        throw e;
                    }
                    future.getNow().getLatch().tryAcquire(ttl, TimeUnit.MILLISECONDS);
                }
            } else {
                if (interruptibly) {
                    future.getNow().getLatch().acquire();
                } else {
                    future.getNow().getLatch().acquireUninterruptibly();
                }
            }
        }
    } finally {
        unsubscribe(future, threadId);
    }
//        get(lockAsync(leaseTime, unit));
}
private Long tryAcquire(long waitTime, long leaseTime, TimeUnit unit, long threadId) {
        return get(tryAcquireAsync(waitTime, leaseTime, unit, threadId));
}
private <T> RFuture<Long> tryAcquireAsync(long waitTime, long leaseTime, TimeUnit unit, long threadId) {
    if (leaseTime != -1) {
        return tryLockInnerAsync(waitTime, leaseTime, unit, threadId, RedisCommands.EVAL_LONG);
    }
    RFuture<Long> ttlRemainingFuture = tryLockInnerAsync(waitTime, internalLockLeaseTime,
                                                            TimeUnit.MILLISECONDS, threadId, RedisCommands.EVAL_LONG);
    ttlRemainingFuture.onComplete((ttlRemaining, e) -> {
        if (e != null) {
            return;
        }

        // lock acquired
        if (ttlRemaining == null) {
            scheduleExpirationRenewal(threadId);
        }
    });
    return ttlRemainingFuture;
}

When we call the lock() method directly, leaseTime is -1, so the if branch is not taken and the code below runs.

<T> RFuture<T> tryLockInnerAsync(long waitTime, long leaseTime, TimeUnit unit, long threadId, RedisStrictCommand<T> command) {
        internalLockLeaseTime = unit.toMillis(leaseTime);

        return evalWriteAsync(getName(), LongCodec.INSTANCE, command,
                "if (redis.call('exists', KEYS[1]) == 0) then " +
                        "redis.call('hincrby', KEYS[1], ARGV[2], 1); " +
                        "redis.call('pexpire', KEYS[1], ARGV[1]); " +
                        "return nil; " +
                        "end; " +
                        "if (redis.call('hexists', KEYS[1], ARGV[2]) == 1) then " +
                        "redis.call('hincrby', KEYS[1], ARGV[2], 1); " +
                        "redis.call('pexpire', KEYS[1], ARGV[1]); " +
                        "return nil; " +
                        "end; " +
                        "return redis.call('pttl', KEYS[1]);",
                Collections.singletonList(getName()), internalLockLeaseTime, getLockName(threadId));
}

At this point leaseTime is the default 30s, and the execution of this Lua script is the key point:

First it checks with the exists command whether the lock key anyLock exists. If it does not, it uses hincrby to store a field such as 4afd01d9-48e8-4341-9358-19f0507a9dcc:397 with value 1 in the hash keyed by the lock key, sets a 30000 ms expiration on anyLock with pexpire, and returns nil.
If anyLock already exists, it takes the other branch: it checks whether the anyLock hash contains the field for the current client and thread (e.g. 37f75873-494a-439c-a0ed-f102bc2f3204:1); if so, it calls hincrby to increment that field, resets anyLock's expiration to 30000 ms, and returns nil.
If neither case applies, it returns the remaining time to live of anyLock.

The script also guarantees that the commands execute atomically. tryLockInnerAsync then returns an RFuture, ttlRemainingFuture, and a listener is added to it; the listener is invoked when the asynchronous locking step completes. If it executed successfully, a Long ttlRemaining is obtained synchronously. From the locking Lua script we know that if locking (or re-entrant locking) succeeded, ttlRemaining is null, so the following lines run; as the comment says, the lock has been acquired.

// lock acquired

if (ttlRemaining == null) {
  scheduleExpirationRenewal(threadId);
}

We have now walked through the Redisson locking flow. Overall the process is not complicated and the code is quite direct; the core of it is the locking logic executed asynchronously through the Lua script.

The watchdog mechanism
In the code above we noticed some details, such as the variable internalLockLeaseTime in RedissonLock, whose default is 30000 milliseconds, and the getLockWatchdogTimeout() value that tryLockInnerAsync() obtains from the connection manager, whose default is also 30000 milliseconds. Both relate to the watchdog mechanism described in the official Redisson documentation; "watchdog" is a fairly vivid name for it. So what exactly does the watchdog do, and why? Let's analyze it below.

Problems after successful locking

Suppose in a distributed environment several service instances request the lock and service instance 1 acquires it. If, while executing its business logic, the instance suddenly crashes or hangs, will the lock be released, and when?
Recalling the Lua script we analyzed, the first time the lock is taken, pexpire gives the lock key an expiration time, 30000 ms by default. So even if the service instance dies, the lock is eventually released and other instances can acquire it and proceed. But what about after those 30000 ms, if service instance 1 has not died and its business logic has simply not finished yet? If the lock were released at that point it would cause thread-safety problems. How does Redisson solve this? It has to implement a mechanism that automatically extends the lock's validity period.


Earlier we saw that after the Lua script is executed asynchronously, a listener is set up to handle the follow-up work:

private void scheduleExpirationRenewal(long threadId) {
        ExpirationEntry entry = new ExpirationEntry();
        ExpirationEntry oldEntry = EXPIRATION_RENEWAL_MAP.putIfAbsent(getEntryName(), entry);
        if (oldEntry != null) {
            oldEntry.addThreadId(threadId);
        } else {
            entry.addThreadId(threadId);
            renewExpiration();
        }
}
First it checks whether the entryName already exists in EXPIRATION_RENEWAL_MAP; this map essentially records which lock keys held by locking clients in this service instance are being renewed. If the entry already exists, the current thread id is simply added to it; on the first lock it will not exist, so renewExpiration() is called.
Next a TimerTask is scheduled to run after internalLockLeaseTime / 3, using the variable mentioned at the start, which works out to roughly every 10 seconds. The task calls the asynchronous method renewExpirationAsync, which again runs a Lua script asynchronously.

private void renewExpiration() {
        ExpirationEntry ee = EXPIRATION_RENEWAL_MAP.get(getEntryName());
        if (ee == null) {
            return;
        }
        
        Timeout task = commandExecutor.getConnectionManager().newTimeout(new TimerTask() {
            @Override
            public void run(Timeout timeout) throws Exception {
                ExpirationEntry ent = EXPIRATION_RENEWAL_MAP.get(getEntryName());
                if (ent == null) {
                    return;
                }
                Long threadId = ent.getFirstThreadId();
                if (threadId == null) {
                    return;
                }
                
                RFuture<Boolean> future = renewExpirationAsync(threadId);
                future.onComplete((res, e) -> {
                    if (e != null) {
                        log.error("Can't update lock " + getName() + " expiration", e);
                        EXPIRATION_RENEWAL_MAP.remove(getEntryName());
                        return;
                    }
                    
                    if (res) {
                        // reschedule itself
                        renewExpiration();
                    }
                });
            }
        }, internalLockLeaseTime / 3, TimeUnit.MILLISECONDS);
        
        ee.setTimeout(task);
}

First it checks whether the lock key's hash contains the field for the current client and thread (e.g. 4afd01d9-48e8-4341-9358-19f0507a9dcc:397). If it does, it calls pexpire to reset the lock key's expiration time, 30000 milliseconds by default.

protected RFuture<Boolean> renewExpirationAsync(long threadId) {
        return evalWriteAsync(getName(), LongCodec.INSTANCE, RedisCommands.EVAL_BOOLEAN,
                "if (redis.call('hexists', KEYS[1], ARGV[2]) == 1) then " +
                        "redis.call('pexpire', KEYS[1], ARGV[1]); " +
                        "return 1; " +
                        "end; " +
                        "return 0;",
                Collections.singletonList(getName()),
                internalLockLeaseTime, getLockName(threadId));
}

In the scheduling method above, the renewal is also executed asynchronously with a listener attached; the callback runs once the operation completes. If the call fails, an error is logged, the entry is removed, and the lock's expiration time is not updated.
Then the result of the asynchronous execution is examined: if it is true, renewExpiration() calls itself again, so this logic runs again 10 seconds later. In other words, once you have acquired the lock, this logic runs every ten seconds and, as long as the lock key has not expired, keeps extending its expiration back to 30000 milliseconds. So as long as this service instance has not died and has not actively released the lock, the watchdog renews the lock for you every ten seconds and it stays in your hands. A neat piece of machinery.

The flow when other instances fail to acquire the lock

What happens if another service instance, or another thread of the current client, tries to lock at this point? Clearly it will block; let's see how that is done in the code, again focusing on the locking Lua script analyzed earlier.

When the lock key exists and the current client's unique field also exists in the hash for that lock key, the hincrby command increments that field by one, pexpire resets the key's expiration to 30000 milliseconds, and nil is returned. In this case locking has succeeded, the scheduled renewal task keeps running to extend the lock key's expiration, and this is how lock re-entrancy is implemented.

<T> RFuture<T> tryLockInnerAsync(long waitTime, long leaseTime, TimeUnit unit, long threadId, RedisStrictCommand<T> command) {
        internalLockLeaseTime = unit.toMillis(leaseTime);

        return evalWriteAsync(getName(), LongCodec.INSTANCE, command,
                "if (redis.call('exists', KEYS[1]) == 0) then " +
                        "redis.call('hincrby', KEYS[1], ARGV[2], 1); " +
                        "redis.call('pexpire', KEYS[1], ARGV[1]); " +
                        "return nil; " +
                        "end; " +
                        "if (redis.call('hexists', KEYS[1], ARGV[2]) == 1) then " +
                        "redis.call('hincrby', KEYS[1], ARGV[2], 1); " +
                        "redis.call('pexpire', KEYS[1], ARGV[1]); " +
                        "return nil; " +
                        "end; " +
                        "return redis.call('pttl', KEYS[1]);",
                Collections.singletonList(getName()), internalLockLeaseTime, getLockName(threadId));
}

When neither of those cases applies, the script simply returns the remaining validity time of the current lock, and the renewal logic is not executed. Execution then returns all the way back to the method above:

If locking succeeded the method returns immediately; otherwise it enters a loop that keeps retrying, waiting a while between attempts, and blocks until the first service instance releases the lock. Other service instances trying to acquire the lock follow the same logic, and that is how mutual exclusion is achieved.

private void lock(long leaseTime, TimeUnit unit, boolean interruptibly) throws InterruptedException {
    long threadId = Thread.currentThread().getId();
    Long ttl = tryAcquire(-1, leaseTime, unit, threadId);
    // lock acquired
    if (ttl == null) {
        return;
    }

    RFuture<RedissonLockEntry> future = subscribe(threadId);
    if (interruptibly) {
        commandExecutor.syncSubscriptionInterrupted(future);
    } else {
        commandExecutor.syncSubscription(future);
    }

    try {
        while (true) {
            ttl = tryAcquire(-1, leaseTime, unit, threadId);
            // lock acquired
            if (ttl == null) {
                break;
            }

            // waiting for message
            if (ttl >= 0) {
                try {
                    future.getNow().getLatch().tryAcquire(ttl, TimeUnit.MILLISECONDS);
                } catch (InterruptedException e) {
                    if (interruptibly) {
                        throw e;
                    }
                    future.getNow().getLatch().tryAcquire(ttl, TimeUnit.MILLISECONDS);
                }
            } else {
                if (interruptibly) {
                    future.getNow().getLatch().acquire();
                } else {
                    future.getNow().getLatch().acquireUninterruptibly();
                }
            }
        }
    } finally {
        unsubscribe(future, threadId);
    }
//        get(lockAsync(leaseTime, unit));
}

release lock

public void unlock() {
        try {
            get(unlockAsync(Thread.currentThread().getId()));
        } catch (RedisException e) {
            if (e.getCause() instanceof IllegalMonitorStateException) {
                throw (IllegalMonitorStateException) e.getCause();
            } else {
                throw e;
            }
        }
}
public RFuture<Void> unlockAsync(long threadId) {
        RPromise<Void> result = new RedissonPromise<Void>();
        RFuture<Boolean> future = unlockInnerAsync(threadId);

        future.onComplete((opStatus, e) -> {
            cancelExpirationRenewal(threadId);

            if (e != null) {
                result.tryFailure(e);
                return;
            }

            if (opStatus == null) {
                IllegalMonitorStateException cause = new IllegalMonitorStateException("attempt to unlock lock, not locked by current thread by node id: "
                        + id + " thread-id: " + threadId);
                result.tryFailure(cause);
                return;
            }

            result.trySuccess(null);
        });

        return result;
}

It checks whether the field for the current client and thread exists in the lock hash; if not, it returns nil. Otherwise it decrements that field by 1; if the result is still greater than zero it refreshes the expiration time and returns 0, otherwise it deletes the lock key, publishes an unlock message, and returns 1.

protected RFuture<Boolean> unlockInnerAsync(long threadId) {
        return evalWriteAsync(getName(), LongCodec.INSTANCE, RedisCommands.EVAL_BOOLEAN,
                "if (redis.call('hexists', KEYS[1], ARGV[3]) == 0) then " +
                        "return nil;" +
                        "end; " +
                        "local counter = redis.call('hincrby', KEYS[1], ARGV[3], -1); " +
                        "if (counter > 0) then " +
                        "redis.call('pexpire', KEYS[1], ARGV[2]); " +
                        "return 0; " +
                        "else " +
                        "redis.call('del', KEYS[1]); " +
                        "redis.call('publish', KEYS[2], ARGV[1]); " +
                        "return 1; " +
                        "end; " +
                        "return nil;",
                Arrays.asList(getName(), getChannelName()), LockPubSub.UNLOCK_MESSAGE, internalLockLeaseTime, getLockName(threadId));
}

Back in the calling method, the completion callback cancels the scheduled renewal task and then inspects the result: if the operation failed, or the lock was not held by the current thread, the promise is completed with the corresponding exception.

void cancelExpirationRenewal(Long threadId) {
        ExpirationEntry task = EXPIRATION_RENEWAL_MAP.get(getEntryName());
        if (task == null) {
            return;
        }
        
        if (threadId != null) {
            task.removeThreadId(threadId);
        }

        if (threadId == null || task.hasNoThreads()) {
            Timeout timeout = task.getTimeout();
            if (timeout != null) {
                timeout.cancel();
            }
            EXPIRATION_RENEWAL_MAP.remove(getEntryName());
        }
}

Origin blog.csdn.net/u011134399/article/details/131150555