An exploration of cache-based distributed locks

First, the business scenarios:

Scenario 1: MQ messages carry data from various sources; the consumer performs user initialization and writes to a detail table. Because of MQ's retry mechanism and the highly concurrent environment, the same user record may be written more than once. During initialization, each user (identified below by a pin) must be handled by only one transaction, and each unique key must be inserted by only one thread, so a lock is needed.

Scenario 2: a background worker recalculates the newly inserted detail-table data for each time period, and only one worker is allowed to run at a time. In theory the best solution is a job-scheduling platform plus DNS load balancing, but for simplicity this is also implemented with a lock (there is a serious problem with this use of a lock, discussed at the end of the article).
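
To make the scenarios concrete, here is a minimal usage sketch built around the tryLock/unlock methods developed below; the key prefix and the initializeUser call are illustrative names, not the real business code. Scenario 2 follows the same pattern with a single fixed key per job.

//usage sketch for scenario 1 (illustrative names)
public void onInitMessage(String pin) {
    String key = "lock_init_" + pin;       //one lock per user pin
    if (!tryLock(key)) {
        return;                            //another thread/instance is already initializing this pin
    }
    try {
        initializeUser(pin);               //insert the detail record exactly once
    } finally {
        unlock(key);
    }
}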

Enough talk; here is the first version.

//version 1
public Boolean tryLock(String key) {
    Boolean isLock = true;
    try {
        if (cluster.incr(key) > 1) {    //someone else already incremented the key: lock is taken
            return false;
        }
    } catch (Exception e) {
        cluster.del(key);
        isLock = false;
    }
    cluster.expire(key, 1, TimeUnit.MINUTES);   //set a TTL to avoid deadlock -- a separate, non-atomic step
    return isLock;
}

The first idea is to rely on the atomicity of incr: if the incremented value is greater than 1, acquiring the lock fails; after acquiring the lock, an expiration time is set to prevent deadlock. At first glance this looks fine, but the group review found a problem: incr and expire are two separate operations rather than one atomic operation, so if something goes wrong between them (for example, the instance crashes after incr but before expire), the key never expires and stays deadlocked forever. Hence the next version.

 

//version 2
public Boolean tryLock(String key) {
    Boolean isLock = true;
    try {
        if (cluster.incr(key) > 1) {
            if (cluster.incr(key) > 2) {    //the count keeps growing: assume the earlier holder died before expire and delete the stale key
                cluster.del(key);
                return false;
            }
            return false;
        }
    } catch (Exception e) {
        cluster.del(key);
        isLock = false;
    }
    cluster.expire(key, 1, TimeUnit.MINUTES);
    return isLock;
}

     Since an error can leave the key deadlocked, the idea is to delete the key when it is found to be locked again. It sounds good, but think it through: suppose the deadlocked key's value is 2 and four instances request the lock at the same time. The requests that hit the delete branch remove the key, which then lets a later request (the fourth, say) acquire the lock even though an earlier one may already hold it, so the lock no longer guarantees mutual exclusion.

     The root problem with incr is that writing the value and setting the expiration time are not atomic, so a different approach is needed. After looking at other solutions online, version 3 emerged.
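
As an aside, on Redis 2.6.12 and later the value and the TTL can be written in one atomic command (SET with the NX and EX options), which closes the gap completely. Below is a minimal sketch using the open-source Jedis client, assuming Jedis 3.x; the cluster wrapper used in this article may expose a different API. Version 3 takes a different route and stores the expiration time inside the value via setNX.

//aside: atomic "set if absent, with TTL" in one command (sketch, assumes Jedis 3.x)
import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

public boolean tryLockAtomic(Jedis jedis, String key) {
    //NX = only set if the key does not exist, EX = expire after 60 seconds, done in a single round trip
    String result = jedis.set(key, "locked", SetParams.setParams().nx().ex(60));
    return "OK".equals(result);            //Jedis returns null when NX prevented the write
}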

//version 3
public Boolean tryLock(String key) {
    try {
        long timeout = TimeUnit.MINUTES.toMillis(15);
        long timestamp = System.currentTimeMillis() + timeout + 1;   //the expiration time is carried in the value
        if (cluster.setNX(key, String.valueOf(timestamp))) {         //atomic: the key and its expiration are written together
            return true;
        }

        long lockTimestamp = Long.valueOf(cluster.get(key));
        if (System.currentTimeMillis() > lockTimestamp) {            //the current holder's lock has expired
            lockTimestamp = Long.valueOf(cluster.getSet(key, String.valueOf(timestamp)));
            if (System.currentTimeMillis() > lockTimestamp) {        //only the claimer that gets the expired value back wins
                return true;
            }
        }

        return false;
    } catch (Exception e) {
        cluster.del(key);
        return false;
    }
}

      By writing the expiration time into the value, and because setNX is atomic, whoever acquires the lock is guaranteed to have written an expiration time as well, and the lock is treated as invalid the moment that time passes. When several clients notice an expired lock at once, only the one whose getSet returns the old, expired timestamp takes it over, so this is close to an ideal solution.

     Thought this was over? Too naive: where there is a lock, there must be an unlock. The unlock code at this point looks like this:

public void unlock(String key) {
    if (cluster.get(key) == null) {
        return;
    }
    long lockTimestamp = Long.valueOf(cluster.get(key));
    if (System.currentTimeMillis() > lockTimestamp) {   //only delete the key once the stored timestamp has passed
        cluster.del(key);
    }
}

   Deleting the lock once the key is judged to have expired seems reasonable, but it does not stand up to scrutiny: if the business processing takes longer than the expiration time, the lock becomes invalid while the work is still in progress, which leads to inconsistent data.

 

//final version
private long timeout = TimeUnit.MINUTES.toMillis(30);              //long enough to cover the worst-case processing time
private ThreadLocal<Boolean> threadOwner = new ThreadLocal<Boolean>();

public Boolean tryLock(String key) {
    try {
        long timestamp = System.currentTimeMillis() + timeout + 1;
        if (cluster.setNX(key, String.valueOf(timestamp))) {
            threadOwner.set(true);                                  //remember that this thread owns the lock
            return true;
        }

        long lockTimestamp = Long.valueOf(cluster.get(key));
        if (System.currentTimeMillis() > lockTimestamp) {
            lockTimestamp = Long.valueOf(cluster.getSet(key, String.valueOf(timestamp)));
            if (System.currentTimeMillis() > lockTimestamp) {
                threadOwner.set(true);                              //this thread took over an expired lock and owns it now
                return true;
            }
        }

        return false;
    } catch (Exception e) {
        cluster.del(key);
        return false;
    }
}

public void unlock(String key) {
    if (threadOwner.get() != null && threadOwner.get()) {           //only the thread that acquired the lock deletes it
        threadOwner.remove();
        cluster.del(key);
    } else {
        if (cluster.get(key) == null) {
            return;
        }
        long lockTimestamp = Long.valueOf(cluster.get(key));
        if (System.currentTimeMillis() > lockTimestamp) {           //otherwise only clean up a lock that has already expired
            cluster.del(key);
        }
    }
}

    A thread-local flag is used here so that only the thread that acquired the lock can delete it, and the expiration time has been increased so that it comfortably exceeds the business processing time.
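
Because ownership is tracked in a ThreadLocal, tryLock and unlock have to be called on the same thread, typically with unlock in a finally block. A sketch of the intended call pattern for the scenario 2 worker; the key name and the doCalculation call are illustrative:

//usage sketch for the background worker (illustrative names)
public void runWorker() {
    String key = "lock_worker_detail_calc";     //one fixed key, so only one instance runs the job at a time
    if (!tryLock(key)) {
        return;                                 //another instance already holds the lock
    }
    try {
        doCalculation();                        //must finish well within the 30-minute timeout
    } finally {
        unlock(key);                            //same thread that locked, so the ThreadLocal branch applies
    }
}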

    Finally, the problem with this lock-based scheme: in a multi-instance environment, because the background workers run periodically, if one instance's trigger time is slightly earlier than the others', that single machine ends up executing every worker run, creating a performance bottleneck. A scheduling platform will therefore be used to distribute the workers in the future.
