Pitfalls of Redis distributed locks

1. Non-atomic operations (setnx + expire)

When it comes to implementing a distributed lock with Redis, many people immediately think of the setnx + expire commands: use setnx to grab the lock, and then use expire to set an expiration time on the lock once it has been acquired.

The pseudo code is as follows:

if (jedis.setnx(lock_key, lock_value) == 1) { // acquire the lock
    jedis.expire(lock_key, timeout); // set the expiration time
    doBusiness(); // business logic
}

This code has a pitfall: setnx and expire are two separate commands, so together they are not an atomic operation! If the process crashes or is restarted for maintenance right after setnx succeeds but before expire is executed, the lock becomes "immortal", and other threads will never be able to acquire it.
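
The difference between the two-step approach and a single atomic command can be sketched without a Redis server. The class below is a hypothetical in-memory stand-in (AtomicLockSketch and its method names are assumptions for illustration, not part of Jedis): setnx stores the key with no expiration, while setNxPx models SET key value NX PX milliseconds by storing the value and the expiration in one step. The current time is passed in explicitly so the behavior is easy to follow.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a Redis server (illustration only, not a Redis client).
class AtomicLockSketch {
    private final Map<String, String> values = new HashMap<>();
    private final Map<String, Long> expiresAt = new HashMap<>(); // absent entry = no expiration

    private void evictIfExpired(String key, long nowMs) {
        Long deadline = expiresAt.get(key);
        if (deadline != null && deadline <= nowMs) {
            values.remove(key);
            expiresAt.remove(key);
        }
    }

    // Models SETNX: the key is stored WITHOUT an expiration.
    // If the client dies before calling EXPIRE, the key stays forever.
    synchronized boolean setnx(String key, String value, long nowMs) {
        evictIfExpired(key, nowMs);
        return values.putIfAbsent(key, value) == null;
    }

    // Models SET key value NX PX ttl: value and expiration are stored in one step,
    // so there is no window in which the lock exists without a TTL.
    synchronized boolean setNxPx(String key, String value, long ttlMs, long nowMs) {
        evictIfExpired(key, nowMs);
        if (values.containsKey(key)) {
            return false;
        }
        values.put(key, value);
        expiresAt.put(key, nowMs + ttlMs);
        return true;
    }

    synchronized boolean exists(String key, long nowMs) {
        evictIfExpired(key, nowMs);
        return values.containsKey(key);
    }
}
```

Against a real server, Jedis 3 and later should let you express the atomic form as jedis.set(lock_key, lock_value, SetParams.setParams().nx().px(timeout)); check the API of the client version you actually use.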

2. Overwritten by other client requests (setnx + value is the expiration time)

To solve the problem that the lock cannot be released when an exception occurs, some people suggest putting the expiration time in the value passed to setnx. If acquiring the lock fails, read the value back and compare it with the current system time to check whether the lock has expired. The pseudocode is as follows:

long expireTime = System.currentTimeMillis() + timeout; // system time + configured timeout
String expireTimeStr = String.valueOf(expireTime); // convert to a String

// If the lock does not exist yet, locking succeeds
if (jedis.setnx(lock_key, expireTimeStr) == 1) {
    return true;
}

// The lock already exists: read its expiration time
String oldExpireTimeStr = jedis.get(lock_key);

// If the stored expiration time is earlier than the current system time, the lock has expired
if (oldExpireTimeStr != null && Long.parseLong(oldExpireTimeStr) < System.currentTimeMillis()) {
    // The lock has expired: set the new expiration time and get the previous one back
    // (see the Redis documentation if you are not familiar with the GETSET command)
    String oldValueStr = jedis.getSet(lock_key, expireTimeStr);

    if (oldValueStr != null && oldValueStr.equals(oldExpireTimeStr)) {
        // Under concurrency, only the thread whose GETSET returned the value it read earlier gets the lock
        return true;
    }
}

// In all other cases, locking fails
return false;

This scheme also has pitfalls: if multiple clients request the lock at the moment it expires, all of them execute jedis.getSet(). In the end only one client succeeds in acquiring the lock, but the expiration time stored for that client's lock may have been overwritten by the other clients.

3. Forgot to set the expiration time

In a previous code review, I saw a distributed lock implemented like this (pseudocode):

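try {
    if (jedis.setnx(lockKey, lockValue) == 1) { // acquire the lock
        doBusiness(); // business logic
        return true; // lock acquired; return after the business logic finishes
    }
    return false; // failed to acquire the lock
} finally {
    unlock(lockKey); // release the lock
}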

What's wrong here? That's right: the expiration time was never set. If the machine suddenly goes down while the program is running, execution never reaches the finally block, so the lock is not deleted before the shutdown. In that case unlocking cannot be guaranteed, so lockKey needs an expiration time. When using distributed locks, you must set an expiration time.

4. Forgetting to release the lock after the business logic finishes

Many people use the extended parameters of the Redis SET command to implement distributed locks.

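Extended parameters of the SET command: SET key value [EX seconds] [PX milliseconds] [NX|XX]

- NX: the set succeeds only when the key does not exist, which guarantees that only the first client request obtains the lock;
  other client requests can obtain it only after it has been released.
- EX seconds: sets the key's expiration time, in seconds.
- PX milliseconds: sets the key's expiration time, in milliseconds.
- XX: sets the value only when the key already exists.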

They then write pseudocode like the following:

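// SET returns "OK" on success and null when the key already exists
if ("OK".equals(jedis.set(lockKey, requestId, "NX", "PX", expireTime))) { // acquire the lock
    doBusiness(); // business logic
    return true; // lock acquired; return after the business logic finishes
}
return false; // failed to acquire the lock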

At first glance this pseudocode looks fine, but on reflection it is not quite right, because it forgets to release the lock! If every successful acquisition has to wait for the timeout before the lock is released, the program is inefficient. The lock should be released as soon as the business logic has finished.

For example:

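try {
    // SET returns "OK" on success and null when the key already exists
    if ("OK".equals(jedis.set(lockKey, requestId, "NX", "PX", expireTime))) { // acquire the lock
        doBusiness(); // business logic
        return true; // lock acquired; return after the business logic finishes
    }
    return false; // failed to acquire the lock
} finally {
    unlock(lockKey); // release the lock
}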

5. B's lock is released by A

Let's look at this piece of pseudocode:

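try {
    // SET returns "OK" on success and null when the key already exists
    if ("OK".equals(jedis.set(lockKey, requestId, "NX", "PX", expireTime))) { // acquire the lock
        doBusiness(); // business logic
        return true; // lock acquired; return after the business logic finishes
    }
    return false; // failed to acquire the lock
} finally {
    unlock(lockKey); // release the lock
}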

What pitfalls do you see here?

Consider this concurrency scenario: two threads, A and B, both try to lock the Redis key lockKey, and thread A gets the lock first (suppose the lock expires after 3 seconds). If thread A's business logic is slow and has still not finished after more than 3 seconds, Redis automatically releases the lock on lockKey. Just then thread B comes along, grabs the lock, and starts executing its own business logic. When thread A finally finishes and releases the lock, it releases B's lock.

The correct approach is to store a unique token for this thread's request, for example requestId, when locking with the extended SET parameters, and to check when releasing the lock that it still belongs to that request.

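try {
    // SET returns "OK" on success and null when the key already exists
    if ("OK".equals(jedis.set(lockKey, requestId, "NX", "PX", expireTime))) { // acquire the lock
        doBusiness(); // business logic
        return true; // lock acquired; return after the business logic finishes
    }
    return false; // failed to acquire the lock
} finally {
    if (requestId.equals(jedis.get(lockKey))) { // check that the lock holds our own requestId
        unlock(lockKey); // release the lock
    }
}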

6. Releasing the lock is not atomic

The above piece of code still has pitfalls:

if (requestId.equals(jedis.get(lockKey))) { // check that the lock holds our own requestId
    unlock(lockKey); // release the lock
}

Checking whether the lock was added by the current thread and releasing the lock are two steps, not one atomic operation. If the lock expires between the check and the call to unlock(lockKey), it may by then belong to another client, and the lock added by someone else gets released.

So the pitfall is that the check and the delete are two separate operations; they are not atomic, and there is a consistency problem. Releasing the lock must be atomic, which can be done with a Redis Lua script similar to the following:

if redis.call('get',KEYS[1]) == ARGV[1] then 
   return redis.call('del',KEYS[1]) 
else
   return 0
end;  
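
The contract of the script above is "delete the key only if its value still equals my requestId, as one indivisible step". A minimal Java sketch of the same contract, using a hypothetical in-memory stand-in for Redis (SafeUnlockSketch and its methods are assumptions for illustration): ConcurrentHashMap.remove(key, value) performs the compare and the delete atomically inside one process, which is the role the Lua script plays on the Redis server.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in for Redis (illustration only).
class SafeUnlockSketch {
    private final ConcurrentHashMap<String, String> store = new ConcurrentHashMap<>();

    // Stand-in for SET key value NX (expiration omitted to keep the sketch short).
    boolean tryLock(String lockKey, String requestId) {
        return store.putIfAbsent(lockKey, requestId) == null;
    }

    // Same contract as the Lua script: compare and delete in one atomic step.
    // Returns 1 if the lock was released, 0 if it no longer belongs to requestId.
    int unlock(String lockKey, String requestId) {
        return store.remove(lockKey, requestId) ? 1 : 0;
    }

    // Simulates the key expiring on the server.
    void expire(String lockKey) {
        store.remove(lockKey);
    }
}
```

In the scenario from section 5, A's lock expires, B acquires it, and A's late unlock returns 0 and leaves B's lock untouched. With Jedis, the Lua script itself would typically be sent with jedis.eval(script, Collections.singletonList(lockKey), Collections.singletonList(requestId)).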

7. The lock expires and is released before the business logic completes

After locking, once the timeout elapses, Redis automatically releases and clears the lock. As a result, the lock may be released before the business logic has finished. What can be done?

Some people think it is enough to set the lock's expiration time a little longer. A better idea is to start a scheduled daemon thread for the thread that acquired the lock, which checks at regular intervals whether the lock still exists and, if it does, extends its expiration time so that the lock is not released early.

The open source framework Redisson solves this problem. Its underlying mechanism works as follows:

As soon as a thread acquires the lock, a watch dog is started. It is a background thread that checks every 10 seconds; if thread 1 still holds the lock, it keeps extending the lifetime of the lock key. This is how Redisson solves the problem of the lock expiring and being released before the business logic completes.
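
The renewal idea can be sketched in plain Java. This is not Redisson's code: WatchdogSketch is a hypothetical in-memory lock, the clock is injected so the logic can be followed step by step, and the renewal period of one third of the lease is an assumption chosen so that a 30 second lease is checked every 10 seconds.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

// Hypothetical in-memory lock with a renewing watchdog (illustration only).
class WatchdogSketch {
    private final LongSupplier clockMs; // injectable clock, e.g. System::currentTimeMillis
    private final long ttlMs;
    private String owner;               // current holder, or null if unlocked
    private long expiresAtMs;
    private ScheduledExecutorService scheduler;

    WatchdogSketch(long ttlMs, LongSupplier clockMs) {
        this.ttlMs = ttlMs;
        this.clockMs = clockMs;
    }

    synchronized boolean tryLock(String requestId) {
        if (owner != null && clockMs.getAsLong() < expiresAtMs) {
            return false; // still held and not yet expired
        }
        owner = requestId;
        expiresAtMs = clockMs.getAsLong() + ttlMs;
        return true;
    }

    synchronized boolean isHeldBy(String requestId) {
        return requestId.equals(owner) && clockMs.getAsLong() < expiresAtMs;
    }

    // One watchdog tick: extend the lease only if requestId still holds the lock.
    synchronized boolean renew(String requestId) {
        if (!isHeldBy(requestId)) {
            return false;
        }
        expiresAtMs = clockMs.getAsLong() + ttlMs;
        return true;
    }

    // Starts a daemon thread that calls renew() every ttl/3 until unlock() stops it.
    synchronized void startWatchdog(String requestId) {
        scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "lock-watchdog");
            t.setDaemon(true);
            return t;
        });
        long periodMs = Math.max(1, ttlMs / 3);
        scheduler.scheduleAtFixedRate(() -> renew(requestId), periodMs, periodMs, TimeUnit.MILLISECONDS);
    }

    synchronized void unlock(String requestId) {
        if (!requestId.equals(owner)) {
            return; // not our lock: leave it alone
        }
        owner = null;
        if (scheduler != null) {
            scheduler.shutdownNow();
        }
    }
}
```

A caller would do tryLock(requestId), then startWatchdog(requestId), run the business logic, and call unlock(requestId) in a finally block. If the process dies, the watchdog dies with it and the lease still expires on its own, which is why the expiration time is kept rather than removed.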

8. Redis distributed lock fails when combined with @Transactional

Let's take a look at this pseudocode:

@Transactional
public void updateDB(int lockKey) {
    boolean lockFlag = redisLock.lock(lockKey);
    if (!lockFlag) {
        throw new RuntimeException("Please try again later");
    }
    doBusiness(); // business logic
    redisLock.unlock(lockKey);
}

Here a Redis distributed lock is used inside a transaction. When the method is invoked, the transaction starts first, and then the Redis distributed lock is acquired. When the code finishes, the Redis distributed lock is released first, then the transaction is committed, and finally the transaction ends. Because the distributed lock is released before the transaction commits, the lock fails to protect the data.

This is because:

Spring transactions are implemented with AOP: the transaction is opened before the updateDB method runs, and only then is the lock acquired. The transaction is committed after the locked code has finished. So the locked code block executes inside the transaction, which means that when the code block finishes, the lock has been released while the transaction has not yet been committed. Another thread that acquires the lock at this point and runs the locked code block will read inventory data that is not the latest.

The correct approach is to acquire the lock before the updateDB method is called, that is, before the transaction is opened; then thread safety can be guaranteed.
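
The intended ordering can be sketched without Spring or Redis. In the hypothetical class below, inTransaction stands in for what the @Transactional proxy does around the method body, and a local ReentrantLock stands in for redisLock; both are assumptions for illustration. The events list records the order of operations so it is easy to see that the commit happens before the unlock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: lock outside the transaction, unlock after the commit.
class LockOutsideTransactionSketch {
    final List<String> events = new ArrayList<>();
    private final ReentrantLock redisLock = new ReentrantLock(); // stand-in for the Redis lock

    // Stand-in for the @Transactional proxy: begin, run the body, commit.
    private void inTransaction(Runnable body) {
        events.add("begin");
        body.run();
        events.add("commit");
    }

    // Stand-in for the @Transactional updateDB method; it no longer touches the lock.
    void updateDB() {
        inTransaction(() -> events.add("doBusiness"));
    }

    // Caller-side wrapper: the lock is taken before the transaction opens
    // and released only after the transaction has committed.
    void updateDBWithLock() {
        if (!redisLock.tryLock()) {
            throw new RuntimeException("Please try again later");
        }
        events.add("lock");
        try {
            updateDB();
        } finally {
            redisLock.unlock();
            events.add("unlock");
        }
    }
}
```

In a real Spring application the wrapper would normally live in a different bean from the @Transactional method, because a call from one method to another inside the same bean does not go through the transactional proxy.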

9. Lock reentrancy

The Redis distributed locks discussed above are not reentrant.

Non-reentrant means that once the current thread has acquired the lock while executing a method, any attempt to acquire the lock again inside that method blocks and fails. The same holder can take the lock only once, not twice.

Non-reentrant distributed locks satisfy most business scenarios, but some scenarios do need reentrant distributed locks. When implementing a distributed lock, consider whether your business scenario requires reentrancy.

A reentrant lock can be implemented in Redis as long as these two problems are solved:

  • How to record the thread that currently holds the lock
  • How to maintain the lock count (that is, how many times the holder has re-entered)

To implement a reentrant distributed lock, we can borrow the design of the JDK's ReentrantLock. In practice, you can simply use the Redisson framework, which supports reentrant locks.
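
The two pieces of state, the holder and the re-entry count, can be sketched in memory as follows. ReentrantLockSketch and ownerId are hypothetical names; in Redis the same record could be kept, for example, in a hash per lock key and updated atomically with a Lua script, and the expiration handling is left out here to keep the sketch short.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory sketch of a reentrant lock record: lock key -> (owner id, hold count).
class ReentrantLockSketch {
    private static final class Holder {
        final String ownerId;
        int count = 1;
        Holder(String ownerId) { this.ownerId = ownerId; }
    }

    private final Map<String, Holder> locks = new HashMap<>();

    // Succeeds if the lock is free or already held by the same owner (re-entry).
    synchronized boolean lock(String lockKey, String ownerId) {
        Holder h = locks.get(lockKey);
        if (h == null) {
            locks.put(lockKey, new Holder(ownerId));
            return true;
        }
        if (h.ownerId.equals(ownerId)) {
            h.count++; // same holder re-enters: just bump the count
            return true;
        }
        return false;  // held by someone else
    }

    // Decrements the count; the lock is really released only when it reaches zero.
    synchronized boolean unlock(String lockKey, String ownerId) {
        Holder h = locks.get(lockKey);
        if (h == null || !h.ownerId.equals(ownerId)) {
            return false;
        }
        if (--h.count == 0) {
            locks.remove(lockKey);
        }
        return true;
    }

    synchronized int holdCount(String lockKey) {
        Holder h = locks.get(lockKey);
        return h == null ? 0 : h.count;
    }
}
```

The owner id has to identify both the client process and the thread, for instance a per-process UUID combined with the thread id, so that two threads on different machines never look like the same holder.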

10. Pitfalls caused by Redis master-slave replication

When implementing Redis distributed locks, watch out for the pitfalls of Redis master-slave replication, because Redis is generally deployed as a cluster.

Suppose thread one acquires the lock on the Redis master node, but the lock key has not yet been synchronized to the slave node. If the master node fails at exactly this moment, a slave node is promoted to master. Thread two can then acquire a lock on the same key even though thread one already holds it, and the safety of the lock is gone.

To solve this problem, Redis author antirez proposed an advanced distributed lock algorithm: Redlock. The core idea of Redlock is this:

Deploy multiple Redis masters so that they do not all go down at the same time. These master nodes are completely independent of each other, with no data synchronization between them. The lock must be acquired and released on each of these master instances in the same way as on a single Redis instance.

Assume there are currently 5 Redis master nodes, with the Redis instances running on 5 servers.

The implementation steps of RedLock are as follows:

  1. Get the current time in milliseconds.
  2. Request the lock from the 5 master nodes in order. The client sets a network connection and response timeout, which should be less than the lock's expiration time. (Assuming the automatic lock expiration time is 10 seconds, the timeout is generally between 5 and 50 milliseconds; let's assume it is 50ms.) If a request times out, skip that master node and try the next master node as soon as possible.
  3. The client subtracts the start time (the time recorded in step 1) from the current time to get the time spent acquiring the lock. The lock counts as acquired if and only if more than half of the Redis master nodes (N/2+1, here 5/2+1=3 nodes) granted it and the time spent is less than the lock's expiration time (for example, 10s > 30ms+40ms+50ms+40ms+50ms).
  4. If the lock is acquired, the real validity time of the key changes: the time spent acquiring the lock must be subtracted.
  5. If the lock acquisition fails (the lock was not acquired on at least N/2+1 master instances, or the acquisition time exceeded the validity time), the client needs to unlock on all master nodes (even master nodes where locking did not succeed at all still need to be unlocked, so that nothing slips through the net).

The simplified steps are:

  • Request the lock from the 5 master nodes in sequence.
  • Decide whether to skip a master node based on the configured timeout.
  • If 3 or more nodes are locked successfully and the time spent is less than the validity period of the lock, the lock is considered acquired.
  • If acquiring the lock fails, unlock on all nodes!
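
The success check in steps 3 and 4 is plain arithmetic and can be sketched without any network calls. RedlockCheckSketch and its parameter names are hypothetical; the sketch assumes the nodes are tried one after another, so the per-node times add up, and it leaves out the clock drift compensation that the full algorithm also subtracts.

```java
// Hypothetical sketch of the Redlock success check (steps 3 and 4), with no network calls.
class RedlockCheckSketch {
    // Returns the remaining validity in milliseconds if the lock counts as acquired,
    // or -1 if the acquisition failed and the client should unlock on all nodes.
    static long remainingValidityMs(boolean[] acquired, long[] elapsedMsPerNode, long lockTtlMs) {
        int successes = 0;
        long totalElapsedMs = 0;
        for (int i = 0; i < acquired.length; i++) {
            if (acquired[i]) {
                successes++;
            }
            totalElapsedMs += elapsedMsPerNode[i]; // nodes are tried sequentially
        }
        int quorum = acquired.length / 2 + 1;       // N/2+1, i.e. 3 of 5
        long remainingMs = lockTtlMs - totalElapsedMs;
        return (successes >= quorum && remainingMs > 0) ? remainingMs : -1;
    }
}
```

With the numbers from step 3, five successful nodes taking 30, 40, 50, 40 and 50 ms against a 10 s lock leave 10000 - 210 = 9790 ms of validity. If only two nodes grant the lock, or the total time exceeds the lock's expiration time, the method returns -1.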

Origin blog.csdn.net/zhw21w/article/details/129563500