Java distributed locks: this one article is enough #Multiple distributed locks

### What is a lock?

  • In a single-process system, when multiple threads can modify a shared variable at the same time, access to that variable or code block must be synchronized so that the modifications execute one after another rather than concurrently.
  • Synchronization is essentially implemented with locks. To let only one thread at a time execute a given code block, a mark has to be placed somewhere that every thread can see. A thread may set the mark when it does not exist; threads that arrive later and find the mark set wait until the owning thread leaves the synchronized block and clears the mark, and then try to set it themselves. This mark can be understood as a lock.
  • Locks are implemented differently in different places; all that matters is that every thread can see the mark. For example, synchronized in Java sets a mark in the object header, while the implementations of the Lock interface are basically built on a volatile int variable, which guarantees that every thread sees the int and can modify it atomically. The Linux kernel likewise uses in-memory data such as a mutex or semaphore as the mark.
  • Besides in-memory data, anything that provides mutual exclusion can serve as a lock (considering only mutual exclusion). For example, the serial number and timestamp in a transaction log table can be regarded as a lock that is never released, or the existence of a file can be used as a lock. The only requirement is that modifying the mark is atomic and visible to all participants.
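The idea of a mark that every thread can see and modify atomically can be sketched with a compare-and-set on an AtomicBoolean. This is only an illustration of the principle, not how synchronized or the JDK Lock classes are actually implemented; the names MarkLock and MarkLockDemo are made up for this example.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** A minimal spin lock: the AtomicBoolean is the "mark" every thread can see. */
final class MarkLock {
    private final AtomicBoolean marked = new AtomicBoolean(false);

    /** Spins until this thread is the one that flips the mark from false to true. */
    void lock() {
        while (!marked.compareAndSet(false, true)) {
            Thread.onSpinWait(); // hint to the CPU that we are busy-waiting
        }
    }

    /** Clears the mark so that a waiting thread can set it. */
    void unlock() {
        marked.set(false);
    }
}

final class MarkLockDemo {
    static int counter = 0; // shared variable protected by the lock

    /** Runs two threads that each increment the shared counter n times under the lock. */
    static int runDemo(int n) {
        counter = 0;
        MarkLock lock = new MarkLock();
        Runnable task = () -> {
            for (int i = 0; i < n; i++) {
                lock.lock();
                try {
                    counter++;
                } finally {
                    lock.unlock();
                }
            }
        };
        Thread a = new Thread(task);
        Thread b = new Thread(task);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter;
    }
}
```

Without the lock the two threads could lose increments; with it, every increment is applied because only the thread that set the mark may touch the counter.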

### What is distributed?

The CAP theorem for distributed systems tells us:

No distributed system can satisfy Consistency, Availability, and Partition tolerance all at once; at most two of the three can be guaranteed at the same time.

Many large websites and applications are now deployed in a distributed manner, and data consistency in distributed scenarios has always been an important topic. Because of the CAP theorem, many systems have to choose among these three properties from the very start of their design. In most scenarios in the Internet field, strong consistency is sacrificed in exchange for high availability, and the system often only needs to guarantee eventual consistency.

Distributed scenario


This mainly refers to cluster mode, in which multiple instances of the same service run at the same time.

In many scenarios we need a number of technical solutions to guarantee eventual consistency of the data, such as distributed transactions and distributed locks. Often we need to ensure that a method is executed by only one thread at a time. In a single-machine environment this can be solved with the concurrency APIs that Java provides, but in a distributed environment it is not that simple.

  • The biggest difference between a distributed system and a single machine is that it is multi-process rather than multi-threaded.
  • Because threads share heap memory, they can simply use memory as the place to store the mark. Processes may not even run on the same physical machine, so the mark has to be stored somewhere that all processes can see.

### What is a distributed lock?

  • In a distributed model there is only one copy of the data (or a limited number of copies), so a lock is needed to control how many processes may modify the data at any given moment.
  • Compared with a lock in standalone mode, a distributed lock must not only be visible to every process but must also cope with the network between the process and the lock. (I think the problem becomes complicated in the distributed case precisely because network latency and unreliability have to be considered... a big pit.)
  • A distributed lock can still store the mark in memory, but in memory that is not owned by any one process: a shared store such as Redis or Memcached. Using a database, a file, and so on as the lock works the same way as in the standalone case, as long as the mark is mutually exclusive.
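As a small illustration of using something other than process memory as the mark, the sketch below uses the existence of a file. It relies on Files.createFile, whose existence check and creation form a single atomic operation. The class name FileMarkLock is hypothetical, and for processes on different machines the file would have to live on storage they all share, where atomicity depends on the file system.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Uses the existence of a file as the lock mark (mutual exclusion only, no expiry). */
final class FileMarkLock {
    private final Path lockFile;

    FileMarkLock(Path lockFile) {
        this.lockFile = lockFile;
    }

    /** Returns true if this caller created the file, i.e. acquired the lock. */
    boolean tryLock() {
        try {
            Files.createFile(lockFile); // existence check and creation are one atomic step
            return true;
        } catch (FileAlreadyExistsException e) {
            return false; // somebody else holds the mark
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Removes the mark; a no-op if the file is already gone. */
    void unlock() {
        try {
            Files.deleteIfExists(lockFile);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Like the database lock discussed later, this mark has no expiry: if the holder crashes before calling unlock(), the file stays behind and has to be cleaned up by other means.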

### What kind of distributed lock do we need?

  • It guarantees that, within a distributed application cluster, a given method is executed by only one thread on one machine at a time.
  • The lock should be reentrant (to avoid deadlock).
  • The lock should preferably be blocking (decide according to business needs).
  • The lock should preferably be fair (decide according to business needs).
  • Acquiring and releasing the lock must be highly available.
  • Acquiring and releasing the lock should perform well.

### Distributed lock based on database

Based on optimistic locking

##### Distributed lock based on the uniqueness of a table's primary key

Idea: take advantage of the uniqueness of the primary key. If multiple requests are submitted to the database at the same time, the database guarantees that only one insert succeeds, so we can consider the thread whose insert succeeded to hold the lock for the method. To release the lock after the method has run, simply delete this database record.

The above simple implementation has the following problems:

  • The lock depends heavily on the availability of the database. The database is a single point of failure; once it goes down, the business system becomes unavailable.
  • The lock has no expiration time. If the unlock operation fails, the lock record stays in the database and no other thread can acquire the lock.
  • The lock can only be non-blocking, because a failed insert reports an error immediately. Threads that did not get the lock are not queued; to acquire the lock they must trigger the acquisition again.
  • The lock is non-reentrant: the same thread cannot acquire it again before releasing it, because the record already exists in the table.
  • The lock is unfair: all waiting threads compete for it by luck.
  • In MySQL, relying on primary key conflicts to prevent duplicates may lead to table locks under high concurrency.

Of course, these problems can be worked around in other ways.

  • Database is a single point? Run two databases with two-way replication and switch quickly to the standby when the primary goes down.
  • No expiration time? Run a scheduled task that periodically cleans up timed-out records in the database.
  • Non-blocking? Wrap the insert in a while loop and return success only once the insert succeeds.
  • Non-reentrant? Add a column to the table that records the host and thread information of the current lock holder. On the next acquisition, query the database first; if the host and thread information of the current caller is found, grant the lock to it directly.
  • Unfair? Create an intermediate table that records all threads waiting for the lock, sorted by creation time, and allow only the earliest one to acquire the lock.
  • For the primary key conflict problem, a better approach is to generate the primary key in the application to prevent duplicates.
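To make the workarounds above concrete, here is an in-memory model of such a lock table. It is a sketch only: a ConcurrentHashMap stands in for the table, the map key plays the role of the primary key, and the owner, count and expiresAt fields are assumed columns that the article does not spell out. A real implementation would issue INSERT, UPDATE and DELETE statements instead.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** In-memory stand-in for a lock table whose primary key is the method name. */
final class TableLockModel {
    /** One row: who holds the lock, how many times, and when the row may be cleaned up. */
    private static final class Row {
        final String owner;   // host + thread information of the holder
        final int count;      // reentrancy count
        final long expiresAt; // time after which the row is considered stale

        Row(String owner, int count, long expiresAt) {
            this.owner = owner;
            this.count = count;
            this.expiresAt = expiresAt;
        }
    }

    private final ConcurrentMap<String, Row> table = new ConcurrentHashMap<>();

    /** Models INSERT (new row), the reentrant UPDATE, and replacement of an expired row. */
    boolean tryLock(String methodName, String owner, long ttlMillis, long now) {
        Row result = table.compute(methodName, (key, row) -> {
            if (row == null || row.expiresAt <= now) {
                return new Row(owner, 1, now + ttlMillis); // insert succeeds
            }
            if (row.owner.equals(owner)) {
                return new Row(owner, row.count + 1, now + ttlMillis); // reentrant acquire
            }
            return row; // primary-key conflict: somebody else holds the lock
        });
        return result.owner.equals(owner);
    }

    /** Models DELETE of the row once the owner has released every acquisition. */
    void unlock(String methodName, String owner) {
        table.computeIfPresent(methodName, (key, row) -> {
            if (!row.owner.equals(owner)) {
                return row; // not ours, leave it alone
            }
            return row.count > 1 ? new Row(owner, row.count - 1, row.expiresAt) : null;
        });
    }
}
```

The current time is passed in as a parameter so that expiry can be exercised without waiting; a blocking variant would simply call tryLock in a loop with a short sleep.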

##### Distributed lock based on table field version number

This strategy is borrowed from MySQL's MVCC mechanism. There is nothing wrong with the strategy itself; its only drawback is that it is intrusive to the table design. A version number column has to be added to every table, and every update needs an extra SQL statement that checks the version, which increases the number of database operations. Under high concurrency the cost in database connections also becomes intolerable.
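The version check can be modelled in a few lines. The sketch below keeps the row in memory, and updateIfVersion plays the role of an UPDATE statement with a version condition in its WHERE clause; the class name, column names and SQL in the comment are assumptions for illustration.

```java
/** In-memory stand-in for a row that carries a version column. */
final class VersionedRow {
    private String value;
    private int version;

    VersionedRow(String value) {
        this.value = value;
        this.version = 0;
    }

    synchronized int readVersion() {
        return version;
    }

    synchronized String readValue() {
        return value;
    }

    /**
     * Models: UPDATE t SET value = ?, version = version + 1 WHERE id = ? AND version = ?
     * Returns true when the row was updated, i.e. nobody changed it in between.
     */
    synchronized boolean updateIfVersion(int expectedVersion, String newValue) {
        if (version != expectedVersion) {
            return false; // 0 rows affected: the caller must re-read and retry
        }
        value = newValue;
        version++;
        return true;
    }
}
```

If two callers read the same version and both try to update, only the first update matches the version; the second sees zero affected rows and has to re-read before retrying.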

Based on pessimistic lock

##### Distributed lock based on database exclusive lock

When for update is appended to a query, the database takes an exclusive lock during the query. (Note: the InnoDB engine uses row-level locks only when the rows are found through an index; otherwise it uses table-level locks. Since we want row-level locks here, an index must be added on the column that stores the name of the method to be executed. This index must be a unique index, otherwise overloaded methods cannot be accessed at the same time. It is recommended to include the parameter types in the stored method name to tell overloads apart.) Once an exclusive lock has been placed on a record, other threads can no longer place an exclusive lock on that row.


We can consider the thread that obtains the exclusive lock to hold the distributed lock. Once the lock is held, the business logic of the method can run; after the method has run, the lock is released by calling connection.commit().


This method effectively solves the problems mentioned above of locks that cannot be released and of blocking locks.

  • Blocking lock? The for update statement returns immediately once it succeeds and stays blocked while it cannot obtain the lock, until it succeeds.
  • The service goes down after locking and cannot release the lock? With this approach the database releases the lock by itself once the service goes down.

But it still cannot directly solve the database single-point problem or the reentrancy problem.

There may be another problem here. Although we use a unique index on the method name column and explicitly use for update to get row-level locks, MySQL optimizes the query. Even if the indexed column is used in the condition, whether the index is actually used to retrieve the data is decided by MySQL based on the cost of the different execution plans. If MySQL decides that a full table scan is more efficient, for example for some very small tables, it will not use the index. In that case InnoDB uses table locks instead of row locks, and if that happens it is a tragedy...

Another problem is that the distributed lock relies on an exclusive lock. If an exclusive lock is not committed for a long time, it keeps a database connection occupied. Once there are many such connections, the database connection pool may be exhausted.
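The flow described above (query with for update, run the business logic, commit to release) might look roughly like this with plain JDBC. It is a sketch under assumptions: the table method_lock and its columns id and method_name are hypothetical names, the lock row is expected to exist already, and the caller supplies the Connection.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

final class ForUpdateLock {
    // Hypothetical table and column names; method_name is assumed to carry a unique index.
    private static final String SQL =
            "SELECT id FROM method_lock WHERE method_name = ? FOR UPDATE";

    /** Runs work while holding the exclusive row lock; returns false if there is no lock row. */
    static boolean runLocked(Connection conn, String methodName, Runnable work) throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // the row lock lives as long as the transaction
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            ps.setString(1, methodName);
            try (ResultSet rs = ps.executeQuery()) { // blocks while another transaction holds the row
                if (!rs.next()) {
                    conn.rollback();
                    return false; // no lock row for this method
                }
            }
            work.run();
            conn.commit(); // committing releases the exclusive lock
            return true;
        } catch (SQLException | RuntimeException e) {
            conn.rollback(); // rolling back releases the lock as well
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
    }
}
```

Keeping the business logic short matters here, because the connection stays occupied for as long as the transaction is open.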

##### Advantages and disadvantages

Advantages: simple and easy to understand.


Disadvantages: a variety of problems can occur (database operations carry a certain overhead, database row-level locks are not always reliable, and performance is not dependable).

### Distributed lock based on Redis

Distributed lock based on Redis SETNX() and EXPIRE()


setnx()


SETNX means "SET if Not eXists" and takes two parameters: setnx(key, value). The command is atomic. If the key does not exist, it is set and 1 is returned; if the key already exists, nothing is set and 0 is returned.

expire()


expire() sets the expiration time. Note that the setnx command itself cannot set a timeout for the key; expire() has to be used for that.

Usage steps


1. setnx(lockkey, 1): if it returns 0, taking the placeholder failed; if it returns 1, it succeeded.

2. expire() sets a timeout on lockkey in order to avoid deadlock.

3. After the business code has run, delete the key with the delete command.

This scheme is good enough for everyday work, but from a technical point of view it can be improved. For example, if the client crashes after setnx in step 1 has succeeded but before expire() has been executed, the key never expires and the lock is deadlocked. To improve on this, the lock can be implemented with Redis's setnx(), get() and getset() methods.
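The gap between setnx and expire disappears if the value and its expiry are written in one atomic step. Redis itself offers this through the NX and EX/PX options of the SET command, which is a different route from the getset scheme this article goes on to describe. The sketch below is only an in-memory model of that behaviour, with a made-up class name and an injected clock, not a Redis client.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** In-memory model of a key store whose "set if absent" also sets the expiry in one step. */
final class ExpiringKeyStore {
    private final Map<String, String> values = new HashMap<>();
    private final Map<String, Long> deadlines = new HashMap<>();
    private final LongSupplier clock; // injected so that expiry can be tested without sleeping

    ExpiringKeyStore(LongSupplier clock) {
        this.clock = clock;
    }

    /** Models "SET key value NX PX ttlMillis": true only if the key was absent or expired. */
    synchronized boolean setIfAbsent(String key, String value, long ttlMillis) {
        long now = clock.getAsLong();
        Long deadline = deadlines.get(key);
        if (deadline != null && deadline > now) {
            return false; // key exists and has not expired
        }
        values.put(key, value);
        deadlines.put(key, now + ttlMillis); // value and expiry are written together
        return true;
    }

    /** Models DEL. */
    synchronized void delete(String key) {
        values.remove(key);
        deadlines.remove(key);
    }
}
```

Because there is no moment at which the key exists without a deadline, a client that crashes right after acquiring the lock can no longer leave it locked forever.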

Distributed lock based on Redis SETNX(), GET() and GETSET()


This scheme exists mainly to fix the possible deadlock in the setnx() and expire() scheme.

getset()


This command takes two parameters: getset(key, newValue). It is atomic: it sets the key to newValue and returns the key's previous value. Assuming the key does not exist at first, executing the command several times gives the following results:

  • getset(key, "value1") returns null and the key is set to value1
  • getset(key, "value2") returns value1 and the key is set to value2
  • And so on!

Usage steps


1. setnx(lockkey, current time + expiration timeout): if it returns 1, the lock has been acquired; if it returns 0, the lock has not been acquired, so go to step 2.

2. get(lockkey) returns the value oldExpireTime. Compare it with the current system time; if it is less than the current system time, the lock is considered to have timed out and other requests may try to acquire it, so go to step 3.

3. Compute newExpireTime = current time + expiration timeout, then call getset(lockkey, newExpireTime), which returns the current value of lockkey, currentExpireTime.

4. Check whether currentExpireTime equals oldExpireTime. If they are equal, this getset was the one that took effect and the lock has been acquired. If they differ, another request acquired the lock first, and the current request can either return failure or keep retrying.

5. After acquiring the lock, the current thread can start its business processing. When processing is finished, compare the current time with the timeout stored for the lock. If the lock has not yet timed out, execute delete to release it; if it has already timed out, there is nothing left to release.

import cn.com.tpig.cache.redis.RedisService;
import cn.com.tpig.utils.SpringUtils;

// Redis distributed lock
public final class RedisLockUtil {

    private static final int defaultExpire = 60;

    private RedisLockUtil() {
        //
    }

    /**
     * Acquire the lock with SETNX followed by EXPIRE.
     * @param key redis key
     * @param expire expiration time in seconds
     * @return true if the lock was acquired, false otherwise
     */
    public static boolean lock(String key, int expire) {

        RedisService redisService = SpringUtils.getBean(RedisService.class);
        long status = redisService.setnx(key, "1");

        if(status == 1) {
            redisService.expire(key, expire);
            return true;
        }

        return false;
    }

    public static boolean lock(String key) {
        return lock2(key, defaultExpire);
    }

    /**
     * Acquire the lock with SETNX, GET and GETSET.
     * @param key redis key
     * @param expire expiration time in seconds
     * @return true if the lock was acquired, false otherwise
     */
    public static boolean lock2(String key, int expire) {

        RedisService redisService = SpringUtils.getBean(RedisService.class);

        // expire is given in seconds, the stored deadline is in milliseconds
        long value = System.currentTimeMillis() + expire * 1000L;
        long status = redisService.setnx(key, String.valueOf(value));

        if(status == 1) {
            return true;
        }
        long oldExpireTime = Long.parseLong(redisService.get(key, "0"));
        if(oldExpireTime < System.currentTimeMillis()) {
            // the previous holder has timed out
            long newExpireTime = System.currentTimeMillis() + expire * 1000L;
            String current = redisService.getSet(key, String.valueOf(newExpireTime));
            // null means the key was deleted in the meantime, so this getSet created it
            if(current == null || Long.parseLong(current) == oldExpireTime) {
                return true;
            }
        }
        return false;
    }

    public static void unLock1(String key) {
        RedisService redisService = SpringUtils.getBean(RedisService.class);
        redisService.del(key);
    }

    public static void unLock2(String key) {    
        RedisService redisService = SpringUtils.getBean(RedisService.class);    
        long oldExpireTime = Long.parseLong(redisService.get(key, "0"));   
        if(oldExpireTime > System.currentTimeMillis()) {        
            redisService.del(key);    
        }
   }
}
public void drawRedPacket(long userId) {
    String key = "draw.redpacket.userid:" + userId;

    boolean lock = RedisLockUtil.lock2(key, 60);
    if(lock) {
        try {
            // claim the red packet
        } finally {
            // release the lock
            RedisLockUtil.unLock2(key);
        }
    } else {
        throw new RuntimeException("Reward has already been claimed");
    }
}

Distributed lock based on REDLOCK


Redlock is a distributed lock algorithm for Redis proposed by antirez, the author of Redis. It is based on N completely independent Redis nodes (N is usually set to 5).

The steps of the algorithm are as follows:

  • 1. The client obtains the current time in milliseconds.
  • 2. The client tries to acquire the lock on all N nodes (on each node in the same way as the cache lock described above), using the same key and value on every node. The client must set a timeout for each node request, and this timeout must be much shorter than the lock timeout. For example, if the lock is released automatically after 10s, the request timeout should be about 5-50ms. That way, when a Redis node is down, the request to it times out quickly and the impact on normal use of the lock is kept small.
  • 3. The client calculates how long acquiring the lock took by subtracting the time obtained in step 1 from the current time. The client holds the distributed lock only if it acquired the lock on a majority of the nodes (at least 3 of 5) and the time spent acquiring it is less than the lock timeout.
  • 4. The time for which the client actually holds the lock is the configured lock timeout minus the acquisition time calculated in step 3.
  • 5. If the client fails to acquire the lock, it deletes the lock on all nodes in turn.
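The five steps can be sketched as follows. This is a simplified model rather than a production Redlock client: LockNode is a hypothetical interface that would wrap one Redis connection with its own short request timeout, InMemoryNode exists only to exercise the logic and ignores the TTL, and clock drift is not taken into account.

```java
import java.util.List;

/** One independent lock node; in a real deployment this would wrap a Redis connection. */
interface LockNode {
    boolean tryLock(String key, String value, long ttlMillis);
    void unlock(String key, String value);
}

final class QuorumLock {
    /** Returns the remaining validity time in milliseconds, or -1 if the lock was not obtained. */
    static long acquire(List<LockNode> nodes, String key, String value, long ttlMillis) {
        long start = System.currentTimeMillis();           // step 1
        int locked = 0;
        for (LockNode node : nodes) {                       // step 2
            try {
                if (node.tryLock(key, value, ttlMillis)) {
                    locked++;
                }
            } catch (RuntimeException unreachable) {
                // a node that is down or timed out simply does not count
            }
        }
        long elapsed = System.currentTimeMillis() - start;  // step 3
        long validity = ttlMillis - elapsed;                // step 4
        int quorum = nodes.size() / 2 + 1;
        if (locked >= quorum && validity > 0) {
            return validity;
        }
        release(nodes, key, value);                         // step 5
        return -1;
    }

    /** Releases the key on every node, including nodes where locking may have failed. */
    static void release(List<LockNode> nodes, String key, String value) {
        for (LockNode node : nodes) {
            try {
                node.unlock(key, value);
            } catch (RuntimeException ignored) {
                // best effort: the key will expire on its own
            }
        }
    }
}

/** In-memory node used only to exercise the sketch; it ignores the TTL. */
final class InMemoryNode implements LockNode {
    private final boolean up;
    private String heldKey;
    private String heldValue;

    InMemoryNode(boolean up) {
        this.up = up;
    }

    public synchronized boolean tryLock(String key, String value, long ttlMillis) {
        if (!up) throw new IllegalStateException("node is down");
        if (heldKey != null) return false;
        heldKey = key;
        heldValue = value;
        return true;
    }

    public synchronized void unlock(String key, String value) {
        if (!up) throw new IllegalStateException("node is down");
        if (key.equals(heldKey) && value.equals(heldValue)) {
            heldKey = null;
            heldValue = null;
        }
    }

    synchronized boolean isHeld() {
        return heldKey != null;
    }
}
```

With five nodes the quorum is three, so the lock can still be obtained while two nodes are unreachable; with three nodes down the attempt fails and the partial locks are removed again.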

With the Redlock algorithm, the distributed lock service keeps working even when up to 2 of the 5 nodes are down. This greatly improves availability compared with the database lock and the single-node cache lock described earlier, and thanks to the efficiency of Redis the performance of the distributed cache lock is no worse than that of database locks.

However, Martin Kleppmann, a distributed systems expert, wrote the article "How to do distributed locking", which questions the correctness of Redlock.

https://mp.weixin.qq.com/s/1bPLk_VZhZ0QYNZS8LkviA

https://blog.csdn.net/jek123456/article/details/72954106

Pros and cons


Advantages:  high performance

Disadvantages:

How long should the expiration time be? If it is too short, the lock is released automatically before the method has finished, and concurrency problems follow. If it is too long, other threads waiting for the lock may have to wait unnecessarily.

Distributed lock based on REDISSON


Redisson is a Redis client for Java that ships a distributed lock component (it is a third-party project, not part of Redis itself). GitHub address: https://github.com/redisson/redisson

As for the question above, how long the expiration time should be: Redisson's approach is to set only a short timeout each time a lock is acquired and to start a thread that refreshes the timeout whenever it is about to expire. That thread is stopped when the lock is released.
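The refresh-thread idea can be sketched with a scheduled task that keeps pushing a deadline forward. This is not Redisson's code: Lease and Watchdog are made-up names, the lease lives in local memory instead of Redis, and renewing at one third of the TTL is an assumption of this sketch.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** A lease that expires unless somebody keeps pushing its deadline forward. */
final class Lease {
    private final AtomicLong deadline = new AtomicLong();

    void renew(long ttlMillis) {
        deadline.set(System.currentTimeMillis() + ttlMillis);
    }

    boolean isHeld() {
        return System.currentTimeMillis() < deadline.get();
    }
}

/** Background task that renews the lease while the business code is still running. */
final class Watchdog implements AutoCloseable {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "lock-watchdog");
                t.setDaemon(true); // never keep the JVM alive just for renewals
                return t;
            });
    private final ScheduledFuture<?> task;

    Watchdog(Lease lease, long ttlMillis) {
        lease.renew(ttlMillis);      // initial short timeout
        long period = ttlMillis / 3; // renew well before the deadline (assumed ratio)
        task = scheduler.scheduleAtFixedRate(
                () -> lease.renew(ttlMillis), period, period, TimeUnit.MILLISECONDS);
    }

    /** Stops renewing; the lease then runs out on its own after at most one TTL. */
    @Override
    public void close() {
        task.cancel(false);
        scheduler.shutdownNow();
    }
}
```

If the process holding the lock dies, the renewals stop with it and the short timeout frees the lock quickly, while a long-running but healthy holder keeps the lock for as long as it needs.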

Distributed lock based on ZooKeeper


ZOOKEEPER lock related basic knowledge


  • zk is generally deployed as multiple nodes (an odd number) and uses the ZAB consensus protocol. zk can therefore be treated as a single logical system: when data is modified, it propagates the change to its nodes internally before serving it to queries.
  • The data in zk is organized as a directory tree. Each directory is called a znode. A znode can store data (generally no more than 1 MB) and can also have child nodes.
  • There are three types of child nodes. Sequential nodes: each time a node is added under the parent, an automatically incremented number is appended to its name. Ephemeral nodes: once the client that created the znode loses its connection to the server, the znode is deleted automatically. Finally, there are ordinary (persistent) nodes.
  • Watch mechanism: a client can watch a node for changes, and when a change occurs an event is sent to the client.

ZK basic lock


  • Principle: use ephemeral nodes and the watch mechanism. Each lock corresponds to an ordinary node /lock. To acquire the lock, create an ephemeral node under /lock. If the creation succeeds, the lock has been acquired; if it fails, watch the /lock node and try again once the existing node has been deleted. The advantage of the ephemeral node is that when the process holding the lock dies, its node is deleted automatically and the lock is released.
  • Disadvantages: all processes that failed to acquire the lock watch the parent node, which easily leads to the herd effect: when the lock is released, all waiting processes try to create the node at once, producing a burst of concurrent requests.

ZK lock optimization


  • Principle: acquiring the lock is changed to creating an ephemeral sequential node. Every contender can create its node successfully, but each gets a different sequence number. Only the node with the smallest sequence number owns the lock. A node whose sequence number is not the smallest watches the node directly in front of it (a fair lock).

Steps:

  • 1. Create an ephemeral sequential node (EPHEMERAL_SEQUENTIAL) under the /lock node.
  • 2. Check whether the sequence number of the created node is the smallest. If it is, the lock has been acquired. If it is not, the lock has not been acquired, so watch the node directly in front of it, that is, the one with the next smaller sequence number.
  • 3. If the lock was not acquired, wait for the watch event after setting the watch, then check again whether the sequence number is the smallest.
  • 4. Once the lock has been acquired, run the code, and finally release the lock (delete the node).
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class DistributedLock implements Lock, Watcher{
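    private ZooKeeper zk;
    private String root = "/locks";// root node
    private String lockName;// name of the contended resource
    private String waitNode;// the preceding node to wait for
    private String myZnode;// the node created by this lock
    private CountDownLatch latch;// signalled by the watcher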
    private int sessionTimeout = 30000;
    private List<Exception> exception = new ArrayList<Exception>();

    /**
     * Create a distributed lock; make sure the ZooKeeper service given in config is available first
     * @param config 127.0.0.1:2181
     * @param lockName name of the contended resource; it must not contain "_lock_"
     */
    public DistributedLock(String config, String lockName){
        this.lockName = lockName;
        // open a connection to the server
        try {
            zk = new ZooKeeper(config, sessionTimeout, this);
            Stat stat = zk.exists(root, false);
            if(stat == null){
                // create the root node
                zk.create(root, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE,CreateMode.PERSISTENT);
            }
        } catch (IOException e) {
            exception.add(e);
        } catch (KeeperException e) {
            exception.add(e);
        } catch (InterruptedException e) {
            exception.add(e);
        }
    }

    /**
     * Watcher callback for ZooKeeper node events
     */
    public void process(WatchedEvent event) {
        if(this.latch != null) {
            this.latch.countDown();
        }
    }

    public void lock() {
        if(exception.size() > 0){
            throw new LockException(exception.get(0));
        }
        try {
            if(this.tryLock()){
                System.out.println("Thread " + Thread.currentThread().getId() + " " +myZnode + " get lock true");
                return;
            }
            else{
                waitForLock(waitNode, sessionTimeout);// wait for the lock
            }
        } catch (KeeperException e) {
            throw new LockException(e);
        } catch (InterruptedException e) {
            throw new LockException(e);
        }
    }

    public boolean tryLock() {
        try {
            String splitStr = "_lock_";
            if(lockName.contains(splitStr))
                throw new LockException("lockName can not contain " + splitStr);
            // create an ephemeral sequential child node
            myZnode = zk.create(root + "/" + lockName + splitStr, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE,CreateMode.EPHEMERAL_SEQUENTIAL);
            System.out.println(myZnode + " is created ");
            // fetch all child nodes
            List<String> subNodes = zk.getChildren(root, false);
            // keep only the nodes that belong to lockName
            List<String> lockObjNodes = new ArrayList<String>();
            for (String node : subNodes) {
                String _node = node.split(splitStr)[0];
                if(_node.equals(lockName)){
                    lockObjNodes.add(node);
                }
            }
            Collections.sort(lockObjNodes);
            System.out.println(myZnode + "==" + lockObjNodes.get(0));
            if(myZnode.equals(root+"/"+lockObjNodes.get(0))){
                // the smallest node holds the lock
                return true;
            }
            // otherwise find the node directly in front of this one
            String subMyZnode = myZnode.substring(myZnode.lastIndexOf("/") + 1);
            waitNode = lockObjNodes.get(Collections.binarySearch(lockObjNodes, subMyZnode) - 1);
        } catch (KeeperException e) {
            throw new LockException(e);
        } catch (InterruptedException e) {
            throw new LockException(e);
        }
        return false;
    }

    public boolean tryLock(long time, TimeUnit unit) {
        try {
            if(this.tryLock()){
                return true;
            }
            // convert the caller's unit to the milliseconds that waitForLock expects
            return waitForLock(waitNode, unit.toMillis(time));
        } catch (Exception e) {
            e.printStackTrace();
        }
        return false;
    }

    private boolean waitForLock(String lower, long waitTime) throws InterruptedException, KeeperException {
        Stat stat = zk.exists(root + "/" + lower,true);
        // check whether the preceding node still exists and register a watch; if it is gone there is nothing to wait for
        if(stat != null){
            System.out.println("Thread " + Thread.currentThread().getId() + " waiting for " + root + "/" + lower);
            this.latch = new CountDownLatch(1);
            // await returns false when the wait timed out before the watch fired
            boolean signalled = this.latch.await(waitTime, TimeUnit.MILLISECONDS);
            this.latch = null;
            return signalled;
        }
        return true;
    }

    public void unlock() {
        try {
            System.out.println("unlock " + myZnode);
            zk.delete(myZnode,-1);
            myZnode = null;
            zk.close();
        } catch (InterruptedException e) {
            e.printStackTrace();
        } catch (KeeperException e) {
            e.printStackTrace();
        }
    }

    public void lockInterruptibly() throws InterruptedException {
        this.lock();
    }

    public Condition newCondition() {
        return null;
    }

    public class LockException extends RuntimeException {
        private static final long serialVersionUID = 1L;
        public LockException(String e){
            super(e);
        }
        public LockException(Exception e){
            super(e);
        }
    }
}
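Step 2 of the optimized scheme, deciding whether a node holds the lock or which node it should watch, can be isolated as a pure function. The sketch below is separate from the class above: SequenceNodes is a made-up name, and it assumes that every child name ends in an underscore followed by the zero-padded sequence number that ZooKeeper appends and that all children belong to the same lock.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class SequenceNodes {
    /** Extracts the numeric suffix that ZooKeeper appends to sequential nodes. */
    static long sequenceOf(String nodeName) {
        return Long.parseLong(nodeName.substring(nodeName.lastIndexOf('_') + 1));
    }

    /**
     * Returns null if myNode has the smallest sequence number (lock acquired),
     * otherwise the name of the node directly in front of it, which is the one to watch.
     */
    static String nodeToWatch(List<String> children, String myNode) {
        List<String> sorted = new ArrayList<>(children);
        sorted.sort(Comparator.comparingLong(SequenceNodes::sequenceOf));
        int index = sorted.indexOf(myNode);
        if (index < 0) {
            throw new IllegalArgumentException(myNode + " is not among the children");
        }
        return index == 0 ? null : sorted.get(index - 1);
    }
}
```

Because each waiter watches only its direct predecessor, releasing the lock wakes up exactly one contender instead of all of them, which is what avoids the herd effect of the basic lock.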

Pros and cons


Advantages:

It effectively solves the single-point problem, the non-reentrancy problem, the non-blocking problem, and the problem of locks that cannot be released. It is also relatively simple to implement.

Disadvantages:

Performance may not be as high as with a cache service, because every acquisition and release of the lock has to create and destroy an ephemeral node. In ZK, nodes can only be created and deleted through the Leader server, after which the data is synchronized to all Follower machines. You also need to understand how ZK works.

Distributed lock based on Consul


DD has written an article on this; in essence it uses the acquire and release operations of Consul's Key/Value storage API.

Article address: http://blog.didispace.com/spring-cloud-consul-lock-and-semphore/

Precautions for using distributed locks


1. Pay attention to the overhead of distributed locks

2. Pay attention to the granularity of locking

3. The way of locking

Summary


No matter what kind of company you work for, at the beginning your work will probably have to start from the simplest solution. Don't dwell on how large the QPS of Ali's or Tencent's business scenarios is: in scenarios of that size you may not get to take part in the project yourself, those who take part may not be the core designers, and even the core designers may not design it alone. I hope everyone chooses a solution that suits their own project according to their company's business scenarios.


Origin blog.csdn.net/superiorpengFight/article/details/102724663