Distributed Locks

What is a lock?

  • In a single-process system, when multiple threads can modify a variable at the same time (a shared variable), that variable, or the code block that modifies it, must be synchronized. The modifications then execute one after another instead of concurrently.

  • Synchronization is essentially implemented with locks. To ensure that only one thread at a time can execute a given code block, a mark must be placed somewhere that every thread can see. When the mark is absent, a thread may set it. Threads that arrive later find the mark already set and wait until the thread holding it finishes the synchronized block and clears the mark, and then they try to set it themselves. This mark can be understood as the lock.

  • Locks are implemented differently in different places, as long as every thread can see the mark. For example, Java's synchronized records the mark in the object header. Implementations of the Lock interface essentially rely on a volatile int variable that every thread can see and modify atomically (via CAS). The Linux kernel also uses in-memory data such as a mutex or semaphore as the mark.

  • Besides in-memory data, anything that is mutually exclusive can serve as a lock (considering mutual exclusion only), for example whether a certain file exists. The only requirements are that modifying the mark is atomic and that the mark is visible to everyone.
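To make the idea of a "mark" concrete inside a single JVM, here is a minimal sketch that uses an `AtomicBoolean` as the mark. `compareAndSet` sets the mark atomically only when it is unset, and the atomic variable is visible to all threads. `MarkLock` is a hypothetical name. This is not how `synchronized` or `ReentrantLock` are actually implemented; it only illustrates the principle.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical example class: a spin lock whose only state is one shared "mark".
class MarkLock {
    // AtomicBoolean gives both cross-thread visibility and atomic updates of the mark.
    private final AtomicBoolean marked = new AtomicBoolean(false);

    // Set the mark only if nobody else has; true means the caller now holds the lock.
    boolean tryLock() {
        return marked.compareAndSet(false, true);
    }

    // Keep trying until the mark can be set (busy-waiting, acceptable for a demo only).
    void lock() {
        while (!tryLock()) {
            Thread.onSpinWait();
        }
    }

    // Clear the mark so that a waiting thread can set it.
    void unlock() {
        marked.set(false);
    }
}
```

If four threads each increment a shared counter 10,000 times between `lock()` and `unlock()`, the final count is exactly 40,000, because only the thread that set the mark can be inside the critical section.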

 

What is distributed?

The CAP theorem of distributed systems tells us:

No distributed system can satisfy Consistency, Availability, and Partition tolerance at the same time, and can only satisfy two at most.

Many large websites and applications are now deployed in a distributed manner, and data consistency in distributed scenarios has always been an important topic. Because of CAP, many systems must decide at design time which of the three properties to trade off. In the vast majority of Internet scenarios, strong consistency is sacrificed in exchange for high availability, and the system only needs to guarantee eventual consistency.

 

Distributed scene

Here this mainly refers to cluster mode, in which multiple identical service instances run at the same time.

In many scenarios, ensuring eventual consistency of data requires various technical solutions, such as distributed transactions and distributed locks. We often need to ensure that a method is executed by only one thread at a time. In a stand-alone environment, the concurrency APIs provided by Java solve this, but in a distributed environment it is not so simple.

  • The biggest difference between distributed and stand-alone is that we are dealing with multiple processes instead of multiple threads.

  • Since threads share heap memory, they can simply use memory as the place to store the mark. Processes may not even run on the same physical machine, so the mark must be stored somewhere all processes can see.

 

What is a distributed lock?

  • In a distributed deployment, when there is only one copy of some data (or access to it is otherwise limited), a lock is needed to control how many processes can modify that data at a given moment.

  • Unlike a lock in stand-alone mode, a distributed lock must not only be visible to all processes but must also account for the network between each process and the lock. (I think the distributed case becomes complicated because network latency and unreliability have to be considered... a big pit.)

  • A distributed lock can still store the mark in memory, but not in memory allocated by one process. It uses shared memory stores such as Redis or Memcache instead. Using a database, files, and so on as the lock works the same way as on a single machine, as long as the marks are mutually exclusive.

 

What kind of distributed lock do we need?

  • In a distributed application cluster, the same method can be executed by only one thread on one machine at any given moment.

  • The lock must be reentrant (to avoid deadlock).

  • The lock should preferably be blocking (decide according to business needs).

  • The lock should preferably be fair (decide according to business needs).

  • Acquiring and releasing the lock must be highly available.

  • Acquiring and releasing the lock should perform well.

 

Distributed lock based on database

Based on optimistic locking

Unique distributed lock based on table primary key

Because the primary key is unique, when multiple requests insert the same key into the database at the same time, the database guarantees that only one of them succeeds. We can therefore treat the thread whose insert succeeded as the holder of the lock for that method. To release the lock after the method has executed, delete the database record.
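A minimal JDBC sketch of the primary-key lock follows. It assumes a hypothetical table `method_lock` whose primary key is `method_name`, and a connection in auto-commit mode. A duplicate-key failure is detected through the SQL-standard SQLState class `23` (integrity constraint violation). The `owner` column is an added assumption so that one client cannot delete another client's row.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Sketch of a unique-key lock. Assumed (hypothetical) table:
//   CREATE TABLE method_lock (
//     method_name VARCHAR(64)  NOT NULL PRIMARY KEY,
//     owner       VARCHAR(128) NOT NULL,
//     created_at  TIMESTAMP    NOT NULL DEFAULT CURRENT_TIMESTAMP
//   );
class UniqueKeyLock {
    static final String INSERT_SQL = "INSERT INTO method_lock (method_name, owner) VALUES (?, ?)";
    static final String DELETE_SQL = "DELETE FROM method_lock WHERE method_name = ? AND owner = ?";

    private final Connection conn; // assumed to be in auto-commit mode

    UniqueKeyLock(Connection conn) {
        this.conn = conn;
    }

    // True if our INSERT won; a duplicate-key error means someone else holds the lock.
    boolean tryLock(String methodName, String owner) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            ps.setString(1, methodName);
            ps.setString(2, owner);
            return ps.executeUpdate() == 1;
        } catch (SQLException e) {
            // SQLState class "23" = integrity constraint violation (e.g. duplicate primary key)
            if (e.getSQLState() != null && e.getSQLState().startsWith("23")) {
                return false;
            }
            throw e;
        }
    }

    // Deleting the row releases the lock; matching on owner avoids deleting someone else's row.
    boolean unlock(String methodName, String owner) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(DELETE_SQL)) {
            ps.setString(1, methodName);
            ps.setString(2, owner);
            return ps.executeUpdate() == 1;
        }
    }
}
```

Recording an owner (for example host name plus thread ID) is also the starting point for the reentrancy workaround discussed below.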

The above simple implementation has the following problems:

  • This lock depends heavily on the availability of the database. The database is a single point of failure: once it goes down, the business system becomes unavailable.

  • This lock has no expiration time. If an unlock fails, the lock record stays in the database forever and no other thread can acquire the lock.

  • This lock can only be non-blocking, because a failed insert reports an error immediately. Threads that fail to get the lock do not enter a wait queue; to get it, they must trigger the acquisition again.

  • This lock is not reentrant: the same thread cannot acquire it again before releasing it, because the record already exists in the table.

  • This lock is unfair: all threads waiting for it compete by luck.

  • In MySQL, relying on primary-key conflicts to prevent duplicates may cause table locking under high concurrency.

Of course, there are other ways to address these problems.

  • Is the database a single point? Use two databases with bidirectional replication, and switch to the standby quickly if the primary goes down.

  • No expiration time? Run a scheduled task that periodically cleans up timed-out records in the database.

  • Non-blocking? Use a while loop that keeps retrying the insert until it succeeds.

  • Not reentrant? Add columns to the table recording the host and thread that currently hold the lock. On the next acquisition, query the table first; if the current host and thread are found, grant the lock to them directly.

  • Unfair? Create another intermediate table recording all threads waiting for the lock, ordered by creation time, and let only the earliest one acquire the lock.

  • A better way to prevent duplicates is to generate the primary key in the application.
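The "while loop" workaround for non-blocking acquisition can be sketched as a small polling helper. `RetryingLock` and its parameters are hypothetical. It wraps any non-blocking attempt (such as an INSERT-based `tryLock`) and adds a deadline plus a pause between attempts, so that waiting clients do not hammer the database. The result is still unfair: whichever poller happens to retry first after a release wins.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical helper that turns a non-blocking tryLock into a bounded blocking acquire.
final class RetryingLock {
    private RetryingLock() {
    }

    // Calls tryLock every pollMillis until it succeeds or the timeout elapses.
    static boolean acquire(BooleanSupplier tryLock, long timeout, TimeUnit unit, long pollMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        while (true) {
            if (tryLock.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // gave up: the lock is still held by someone else
            }
            TimeUnit.MILLISECONDS.sleep(pollMillis); // back off instead of spinning on the database
        }
    }
}
```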

Distributed lock based on table field version number

This strategy borrows from MySQL's MVCC mechanism. It works, but its drawback is that it is intrusive to the data tables: every table needs a version-number field, and every update needs an extra check in SQL. This increases the number of database operations, and under high concurrency the overhead on database connections becomes unbearable.
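Here is a sketch of the version-number approach in JDBC, assuming a hypothetical table `product(id, stock, version)`. The `WHERE ... AND version = ?` condition lets the update succeed only if nobody has changed the row since it was read. The extra SELECT before each update is exactly the additional database work the paragraph above complains about.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical optimistic update guarded by a version column.
final class VersionedUpdate {
    static final String SELECT_SQL = "SELECT stock, version FROM product WHERE id = ?";
    static final String UPDATE_SQL =
            "UPDATE product SET stock = ?, version = version + 1 WHERE id = ? AND version = ?";

    private VersionedUpdate() {
    }

    // One optimistic attempt; false means no row, out of stock, or another writer got there first.
    static boolean decrementStock(Connection conn, long id) throws SQLException {
        long stock = 0;
        long version = 0;
        try (PreparedStatement ps = conn.prepareStatement(SELECT_SQL)) {
            ps.setLong(1, id);
            try (ResultSet rs = ps.executeQuery()) {
                if (!rs.next()) {
                    return false;
                }
                stock = rs.getLong(1);
                version = rs.getLong(2);
            }
        }
        if (stock <= 0) {
            return false;
        }
        try (PreparedStatement ps = conn.prepareStatement(UPDATE_SQL)) {
            ps.setLong(1, stock - 1);
            ps.setLong(2, id);
            ps.setLong(3, version);
            // 0 rows updated means the version changed after we read it
            return ps.executeUpdate() == 1;
        }
    }
}
```

A caller that gets `false` because of a version conflict typically re-reads and retries a bounded number of times.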

Based on pessimistic locking

Distributed lock based on database exclusive lock

Append FOR UPDATE to the query statement, and the database places an exclusive lock during the query. (Note: InnoDB uses row-level locks only when rows are retrieved through an index; otherwise it effectively locks the whole table. We want row-level locks here, so the column holding the method name must be indexed. This index must be unique, otherwise multiple overloaded methods cannot be accessed at the same time. For overloaded methods, it is recommended to include the parameter types as well.) Once an exclusive lock is placed on a record, no other thread can place an exclusive lock on that row.

The thread that obtains the exclusive lock can be considered to hold the distributed lock, and it may then execute the method's business logic. After the method finishes, the lock is released by calling connection.commit().
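Here is a JDBC sketch of the `FOR UPDATE` lock. It assumes the same hypothetical `method_lock` table with a unique index on `method_name` and one pre-inserted row per lockable method. The row lock lasts as long as the transaction, so the connection must stay open (and out of the pool) until `unlock()` commits. This is also why long-held locks tie up connections.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical pessimistic lock: the row lock taken by SELECT ... FOR UPDATE is the distributed lock.
class ForUpdateLock {
    static final String SELECT_FOR_UPDATE_SQL =
            "SELECT method_name FROM method_lock WHERE method_name = ? FOR UPDATE";

    private final Connection conn;

    ForUpdateLock(Connection conn) {
        this.conn = conn;
    }

    // Blocks while another transaction holds the row (up to the database's lock-wait timeout).
    boolean lock(String methodName) throws SQLException {
        conn.setAutoCommit(false); // the row lock lasts until this transaction ends
        boolean found = false;
        try (PreparedStatement ps = conn.prepareStatement(SELECT_FOR_UPDATE_SQL)) {
            ps.setString(1, methodName);
            try (ResultSet rs = ps.executeQuery()) {
                found = rs.next();
            }
        } catch (SQLException e) {
            conn.rollback(); // e.g. lock-wait timeout: end the transaction cleanly
            throw e;
        }
        if (!found) {
            conn.rollback(); // no row for this method, so nothing was locked
        }
        return found;
    }

    // Committing ends the transaction and releases the row lock.
    void unlock() throws SQLException {
        conn.commit();
    }
}
```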

This approach effectively solves two of the problems mentioned above: locks that cannot be released, and locks that cannot block.

  • Blocking? A FOR UPDATE statement returns immediately if it can lock the row; otherwise it blocks until it succeeds (or until the database's lock-wait timeout, innodb_lock_wait_timeout in InnoDB, expires).

  • The service goes down after locking and the lock is never released? With this approach, the database releases the lock itself once the service goes down, because the dead client's connection is closed and its transaction is rolled back.

However, it still does not directly solve the single-point and reentrancy problems.

There may be another problem. Even though we put a unique index on the method-name column and use FOR UPDATE to get row-level locks, MySQL optimizes queries. Even if the condition uses an indexed column, MySQL decides whether to use the index by comparing the cost of different execution plans. If it judges a full table scan to be cheaper, as it may for some very small tables, it will not use the index, and InnoDB will then use a table lock instead of a row lock. That would be painful...

Another problem is that we use an exclusive database lock as the distributed lock. If the transaction holding the exclusive lock is not committed for a long time, it keeps occupying a database connection. Once too many such connections accumulate, the connection pool may be exhausted.

Advantages and disadvantages

Pros: simple and easy to understand

Cons: various problems (operating the database has a certain overhead, the database's row-level locks are not necessarily reliable, and performance is not guaranteed)

 

Distributed lock based on Redis

Distributed locks based on redis's setnx() and expire() methods

setnx()

setnx stands for SET if Not eXists and takes two parameters, setnx(key, value). It is atomic: if the key does not exist, it sets the key and returns 1; if the key already exists, it leaves the key unchanged and returns 0.

expire()

expire() sets the expiration time. Note that the setnx command cannot set a timeout on the key; the timeout can only be set with expire().

Steps for usage

1. setnx(lockkey, 1): a return value of 0 means another client already holds the lock; 1 means the lock was acquired.

2. Use expire() to set a timeout on lockkey, so that the lock cannot be held forever (deadlock).

3. After executing the business code, delete the key with the del command.

This solution meets everyday needs, but from a technical standpoint it can be improved. For example, if the process crashes after setnx succeeds in step 1 but before expire() runs, the key never expires and the lock is deadlocked. One improvement is to implement the distributed lock with Redis's setnx(), get(), and getset() methods.
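Since Redis 2.6.12, `SET` accepts `NX` and `PX` options, so "set if absent" and "set timeout" can be done in one atomic command: `SET lockkey token NX PX ttl`. A common companion is to store a random token as the value and release with a compare-and-delete Lua script, `if redis.call("get", KEYS[1]) == ARGV[1] then return redis.call("del", KEYS[1]) else return 0 end`, so that a client never deletes a lock that has expired and been taken by someone else. The sketch below shows that logic against a hypothetical `LockStore` interface, with an in-memory stand-in in place of a real Redis client. It does not reproduce any particular client library's API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical minimal view of the two atomic Redis operations this lock needs.
interface LockStore {
    // Real Redis: SET key value NX PX ttlMillis
    boolean setIfAbsent(String key, String value, long ttlMillis);

    // Real Redis: EVAL of the compare-and-delete Lua script
    boolean deleteIfEquals(String key, String value);
}

// In-memory stand-in for experimenting; synchronized methods mimic Redis's atomic commands.
class InMemoryLockStore implements LockStore {
    private final Map<String, String> values = new HashMap<>();
    private final Map<String, Long> expiresAt = new HashMap<>();

    private void evictIfExpired(String key) {
        Long deadline = expiresAt.get(key);
        if (deadline != null && deadline <= System.currentTimeMillis()) {
            values.remove(key);
            expiresAt.remove(key);
        }
    }

    public synchronized boolean setIfAbsent(String key, String value, long ttlMillis) {
        evictIfExpired(key);
        if (values.containsKey(key)) {
            return false;
        }
        values.put(key, value);
        expiresAt.put(key, System.currentTimeMillis() + ttlMillis);
        return true;
    }

    public synchronized boolean deleteIfEquals(String key, String value) {
        evictIfExpired(key);
        if (!value.equals(values.get(key))) {
            return false;
        }
        values.remove(key);
        expiresAt.remove(key);
        return true;
    }
}

class TokenLock {
    private final LockStore store;

    TokenLock(LockStore store) {
        this.store = store;
    }

    // Returns the owner token on success, or null if the lock is held by someone else.
    String tryLock(String key, long ttlMillis) {
        String token = UUID.randomUUID().toString(); // unique per acquisition
        return store.setIfAbsent(key, token, ttlMillis) ? token : null;
    }

    // Only the holder of the matching token can release the lock.
    boolean unlock(String key, String token) {
        return store.deleteIfEquals(key, token);
    }
}
```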

Distributed lock based on redis setnx(), get(), getset() methods

This scheme mainly addresses the possible deadlock in the setnx() + expire() scheme.

getset()

getset(key, newValue) takes two parameters. It is atomic: it sets the key to newValue and returns the key's old value. Assuming the key does not exist initially, calling it repeatedly behaves as follows:

  1. getset(key, "value1") returns null and the value of key will be set to value1

  2. getset(key, "value2") returns value1 and the value of key will be set to value2

  3. And so on!

Steps for usage
  1. setnx(lockkey, current time + expiration timeout): if it returns 1, the lock is acquired; if it returns 0, the lock was not acquired, so go to step 2.

  2. get(lockkey) returns oldExpireTime; compare it with the current system time. If it is earlier than the current time, the lock is considered expired and may be taken over by another request, so go to step 3.

  3. Compute newExpireTime = current time + expiration timeout, then call getset(lockkey, newExpireTime), which returns the lock's current value, currentExpireTime.

  4. Check whether currentExpireTime equals oldExpireTime. If they are equal, this getset won and the lock is acquired. If not, another request acquired the lock first, so the current request can return failure or retry.

  5. After acquiring the lock, the thread performs its business logic. When it finishes, it compares its processing time with the lock's timeout. If processing took less than the timeout, it deletes the key to release the lock. If it took longer, the lock has already expired (and may now belong to someone else), so it must not delete it.

    import cn.com.tpig.cache.redis.RedisService;
    import cn.com.tpig.utils.SpringUtils;

    // Redis distributed lock
    public final class RedisLockUtil {

        private static final int defaultExpire = 60;

        private RedisLockUtil() {
        }

        /**
         * Lock with setnx() + expire().
         * @param key    redis key
         * @param expire expiration time, in seconds
         * @return true: lock acquired; false: lock not acquired
         */
        public static boolean lock(String key, int expire) {
            RedisService redisService = SpringUtils.getBean(RedisService.class);
            long status = redisService.setnx(key, "1");
            if (status == 1) {
                // Not atomic with setnx: a crash before this line leaves a key with no TTL
                redisService.expire(key, expire);
                return true;
            }
            return false;
        }

        public static boolean lock(String key) {
            return lock2(key, defaultExpire);
        }

        /**
         * Lock with setnx() + get() + getSet(); the value is the expiry timestamp in milliseconds.
         * @param key    redis key
         * @param expire expiration time, in seconds
         * @return true: lock acquired; false: lock not acquired
         */
        public static boolean lock2(String key, int expire) {
            RedisService redisService = SpringUtils.getBean(RedisService.class);
            long expireMillis = expire * 1000L; // expire is in seconds, timestamps are in milliseconds
            long value = System.currentTimeMillis() + expireMillis;
            long status = redisService.setnx(key, String.valueOf(value));
            if (status == 1) {
                return true;
            }
            long oldExpireTime = Long.parseLong(redisService.get(key, "0"));
            if (oldExpireTime < System.currentTimeMillis()) {
                // The lock has expired: try to take it over
                long newExpireTime = System.currentTimeMillis() + expireMillis;
                String current = redisService.getSet(key, String.valueOf(newExpireTime));
                // getSet returns null if the key disappeared in the meantime
                long currentExpireTime = current == null ? 0L : Long.parseLong(current);
                if (currentExpireTime == oldExpireTime) {
                    return true;
                }
            }
            return false;
        }

        public static void unLock1(String key) {
            RedisService redisService = SpringUtils.getBean(RedisService.class);
            redisService.del(key);
        }

        public static void unLock2(String key) {
            RedisService redisService = SpringUtils.getBean(RedisService.class);
            long oldExpireTime = Long.parseLong(redisService.get(key, "0"));
            // Only delete a lock that has not expired yet; an expired one may already belong to someone else
            if (oldExpireTime > System.currentTimeMillis()) {
                redisService.del(key);
            }
        }
    }

    public void drawRedPacket(long userId) {
        String key = "draw.redpacket.userid:" + userId;
        boolean lock = RedisLockUtil.lock2(key, 60);
        if (lock) {
            try {
                // receive operation
            } finally {
                // Release the lock
                RedisLockUtil.unLock2(key);
            }
        } else {
            throw new RuntimeException("Duplicate reward");
        }
    }

Distributed lock based on Redlock

Redlock is a cluster-mode Redis distributed lock algorithm proposed by antirez, the author of Redis. It is based on N completely independent Redis nodes (typically N = 5).

The steps of the algorithm are as follows:

  • 1. The client gets the current time in milliseconds.

  • 2. The client tries to acquire the lock on all N nodes in turn, using the same key and value on each (each node is locked in the same way as the single-instance cache lock above). The client must set a network timeout for each request that is much smaller than the lock's auto-release time. For example, if the lock expires after 10 s, the request timeout might be about 5-50 ms. That way, if a Redis node is down, the client stops waiting for it quickly and moves on instead of stalling the whole acquisition.

  • 3. The client computes how long acquisition took by subtracting the time from step 1 from the current time. The client holds the distributed lock only if it acquired the lock on a majority of nodes (at least N/2 + 1, i.e. 3 of 5) and the total time spent is less than the lock's expiration time.

  • 4. The lock's effective validity time is the configured expiration time minus the acquisition time computed in step 3.

  • 5. If the client fails to acquire the lock, it releases the lock on all nodes (including the ones where it believes acquisition failed).
    With Redlock, the lock service keeps working as long as at most 2 of the 5 nodes are down, which greatly improves availability compared with the database locks and single-instance cache locks above. Thanks to Redis's efficiency, the performance of this cache-based lock is no worse than that of database locks.
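The five steps can be sketched as follows. This is only an illustration of the quorum and validity arithmetic, not antirez's reference code or any client library. `RedlockNode` is a hypothetical per-node interface (in real Redis, `SET key value NX PX ttl` plus a compare-and-delete script, each with a short client-side timeout). `CLOCK_DRIFT_FACTOR` is an assumed allowance for clock drift, and `FakeRedlockNode` is an in-memory stand-in that ignores TTLs.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical per-node operations; a down or timed-out node throws.
interface RedlockNode {
    boolean setIfAbsent(String key, String value, long ttlMillis);

    void deleteIfEquals(String key, String value);
}

final class Redlock {
    private static final double CLOCK_DRIFT_FACTOR = 0.01; // assumed, illustrative value

    static final class Lease {
        final String token;
        final long validityMillis;

        Lease(String token, long validityMillis) {
            this.token = token;
            this.validityMillis = validityMillis;
        }
    }

    private final List<RedlockNode> nodes;

    Redlock(List<RedlockNode> nodes) {
        this.nodes = nodes;
    }

    // Returns a Lease if a majority granted the lock in time, otherwise null.
    Lease tryLock(String key, long ttlMillis) {
        String token = UUID.randomUUID().toString(); // same value on every node
        long start = System.currentTimeMillis(); // step 1
        int granted = 0;
        for (RedlockNode node : nodes) { // step 2
            try {
                if (node.setIfAbsent(key, token, ttlMillis)) {
                    granted++;
                }
            } catch (RuntimeException unreachableNode) {
                // a node that is down or too slow just counts as "not granted"
            }
        }
        long elapsed = System.currentTimeMillis() - start; // step 3
        long drift = (long) (ttlMillis * CLOCK_DRIFT_FACTOR) + 2;
        long validity = ttlMillis - elapsed - drift; // step 4
        if (granted >= nodes.size() / 2 + 1 && validity > 0) {
            return new Lease(token, validity);
        }
        unlock(key, token); // step 5
        return null;
    }

    // Release on every node, including those that appeared to refuse the lock.
    void unlock(String key, String token) {
        for (RedlockNode node : nodes) {
            try {
                node.deleteIfEquals(key, token);
            } catch (RuntimeException unreachableNode) {
                // best effort: the key expires on that node anyway
            }
        }
    }
}

// In-memory node for experimenting; it can be switched "down".
class FakeRedlockNode implements RedlockNode {
    private final Map<String, String> data = new HashMap<>();
    volatile boolean down;

    public synchronized boolean setIfAbsent(String key, String value, long ttlMillis) {
        if (down) throw new IllegalStateException("node unreachable");
        return data.putIfAbsent(key, value) == null;
    }

    public synchronized void deleteIfEquals(String key, String value) {
        if (down) throw new IllegalStateException("node unreachable");
        data.remove(key, value);
    }
}
```

With five fake nodes and two of them down, `tryLock` still succeeds (3 of 5). With a third node down, it fails and cleans up after itself.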

However, distributed-systems expert Martin Kleppmann wrote an article, "How to do distributed locking", questioning the correctness of Redlock.

https://mp.weixin.qq.com/s/1bPLk_VZhZ0QYNZS8LkviA

https://blog.csdn.net/jek123456/article/details/72954106

Advantages and disadvantages

advantage:

high performance

shortcoming:

How long should the expiration time be? If it is too short, the lock may be released automatically before the method finishes, which causes concurrency problems. If it is too long, other threads waiting for the lock may have to wait unnecessarily, for example after the holder crashes.

Distributed lock based on redisson

Redisson is a widely used Redis client for Java that provides distributed lock implementations. GitHub address: https://github.com/redisson/redisson

Redisson handles the question above (how long should the expiration time be?) as follows: each time a lock is acquired, only a short timeout is set, and a background task keeps extending the lock's timeout before it runs out. The task is stopped when the lock is released.
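With Redisson itself, this is hidden behind its `RLock` (`redisson.getLock("name")`, then `lock()` and `unlock()`). To show the renewal idea independently, here is a sketch that is not Redisson's code. It assumes a hypothetical `LeaseStore` whose three operations are atomic on the server (in Redis: `SET NX PX`, plus small Lua scripts that check the value before `PEXPIRE` or `DEL`). A short lease is renewed every third of its length and the renewal is cancelled on unlock. If the process dies, renewals stop and the lease simply expires.

```java
import java.util.UUID;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical store operations; each must be atomic on the server side.
interface LeaseStore {
    boolean setIfAbsent(String key, String value, long ttlMillis);

    boolean expireIfEquals(String key, String value, long ttlMillis);

    boolean deleteIfEquals(String key, String value);
}

// Illustrates the watchdog idea only.
final class WatchdogLock {
    private static final ScheduledExecutorService WATCHDOG =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "lock-watchdog");
                t.setDaemon(true); // do not keep the JVM alive just for renewals
                return t;
            });

    private final LeaseStore store;
    private final String key;
    private final long leaseMillis; // deliberately short
    private String token;
    private ScheduledFuture<?> renewal;

    WatchdogLock(LeaseStore store, String key, long leaseMillis) {
        this.store = store;
        this.key = key;
        this.leaseMillis = leaseMillis;
    }

    synchronized boolean tryLock() {
        String candidate = UUID.randomUUID().toString();
        if (!store.setIfAbsent(key, candidate, leaseMillis)) {
            return false;
        }
        token = candidate;
        long period = Math.max(1, leaseMillis / 3); // renew well before the lease runs out
        renewal = WATCHDOG.scheduleAtFixedRate(() -> {
            if (!store.expireIfEquals(key, candidate, leaseMillis)) {
                // Lease lost: throwing suppresses further executions of this periodic task.
                throw new IllegalStateException("lease lost for " + key);
            }
        }, period, period, TimeUnit.MILLISECONDS);
        return true;
    }

    synchronized void unlock() {
        if (renewal != null) {
            renewal.cancel(false); // stop renewing before deleting
            renewal = null;
        }
        if (token != null) {
            store.deleteIfEquals(key, token);
            token = null;
        }
    }
}
```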

 

Distributed lock based on ZooKeeper

Basic knowledge of zookeeper locks

  • ZooKeeper usually runs as an ensemble of an odd number of servers and uses the ZAB consensus protocol, so it can be treated as a single logical service: a write is acknowledged only after a majority of servers have accepted it.

  • ZooKeeper data is organized as a directory tree, and each node in the tree is called a znode. A znode can store data (by default no more than 1 MB) and can have child nodes.

  • There are three kinds of nodes relevant here. Sequential nodes: each time a child is created under a parent, ZooKeeper appends an increasing sequence number to its name. Ephemeral nodes: once the session of the client that created the znode ends (for example, the client crashes and its session times out), the znode is deleted automatically. Finally, persistent (normal) nodes.

  • Watch mechanism: a client can watch a znode, and when the znode changes, the client receives an event. (A classic watch fires only once and must be registered again.)

zk basic lock

  • Principle: use an ephemeral node plus the watch mechanism. Each lock corresponds to a single znode, such as /lock. To acquire the lock, a process tries to create /lock as an ephemeral node; if creation succeeds, it holds the lock. If creation fails because the node already exists, the process watches /lock and competes for the lock again once the node is deleted. The advantage of an ephemeral node is that if the holding process dies, its node is deleted automatically, so the lock is released.

  • Disadvantage: every process that fails to get the lock watches the same node, which easily causes a herd effect. When the lock is released, all waiting processes try to create the node at once, producing a burst of concurrent requests.

zk lock optimization

  • Principle: instead, each client creates an ephemeral sequential node. Every client succeeds in creating a node, but each node gets a different sequence number, and only the node with the smallest sequence number holds the lock. If a client's node is not the smallest, the client watches the node immediately before its own (the next smaller sequence number), which makes this a fair lock.

  • Steps:

  1. Create an ephemeral sequential node (EPHEMERAL_SEQUENTIAL) under the /lock node.

  2. Check whether the created node has the smallest sequence number. If so, the lock is acquired. If not, acquisition fails, and the client watches the node whose sequence number is immediately before its own.

  3. If acquisition failed, wait for the watch event after setting the watch, then check again whether the client's node now has the smallest sequence number.

  4. Once the lock is acquired, execute the code and finally release the lock (delete the node).

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.locks.Condition;
    import java.util.concurrent.locks.Lock;

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    public class DistributedLock implements Lock, Watcher {
        private static final String SPLIT_STR = "_lock_";

        private ZooKeeper zk;
        private String root = "/locks";          // root node
        private String lockName;                 // identifies the contended resource
        private String waitNode;                 // the node just before ours, which we wait on
        private String myZnode;                  // our own node
        private volatile CountDownLatch latch;   // released by the watcher
        private int sessionTimeout = 30000;
        private List<Exception> exception = new ArrayList<Exception>();

        /**
         * Create a distributed lock. Make sure the ZooKeeper service in config is available before use.
         * @param config   connect string, e.g. 127.0.0.1:2181
         * @param lockName identifies the contended resource; must not contain "_lock_"
         */
        public DistributedLock(String config, String lockName) {
            if (lockName.contains(SPLIT_STR)) {
                throw new LockException("lockName can not contain " + SPLIT_STR);
            }
            this.lockName = lockName;
            // Create a connection to the server
            try {
                zk = new ZooKeeper(config, sessionTimeout, this);
                Stat stat = zk.exists(root, false);
                if (stat == null) {
                    // Create the root node
                    zk.create(root, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
                }
            } catch (IOException | KeeperException | InterruptedException e) {
                exception.add(e);
            }
        }

        /** Watcher for ZooKeeper node events: wake up whoever is waiting. */
        public void process(WatchedEvent event) {
            CountDownLatch current = this.latch;
            if (current != null) {
                current.countDown();
            }
        }

        public void lock() {
            if (exception.size() > 0) {
                throw new LockException(exception.get(0));
            }
            try {
                if (!createAndCheck()) {
                    // Wait (in sessionTimeout slices) until our node is the smallest
                    while (!waitForLock(waitNode, sessionTimeout)) {
                        // still waiting
                    }
                }
                System.out.println("Thread " + Thread.currentThread().getId() + " " + myZnode + " get lock true");
            } catch (KeeperException | InterruptedException e) {
                throw new LockException(e);
            }
        }

        public boolean tryLock() {
            try {
                if (createAndCheck()) {
                    return true;
                }
                releaseNode(); // do not leave a node behind that later waiters would queue on
                return false;
            } catch (KeeperException | InterruptedException e) {
                throw new LockException(e);
            }
        }

        public boolean tryLock(long time, TimeUnit unit) {
            try {
                if (createAndCheck() || waitForLock(waitNode, unit.toMillis(time))) {
                    return true;
                }
                releaseNode(); // timed out: give up our place in the queue
            } catch (KeeperException | InterruptedException e) {
                e.printStackTrace();
            }
            return false;
        }

        // Create an ephemeral sequential child node and report whether it is the smallest.
        private boolean createAndCheck() throws KeeperException, InterruptedException {
            myZnode = zk.create(root + "/" + lockName + SPLIT_STR, new byte[0],
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
            System.out.println(myZnode + " is created ");
            waitNode = predecessorOf(myZnode);
            return waitNode == null;
        }

        // Returns the lockName node just smaller than ours, or null if ours is the smallest.
        private String predecessorOf(String znode) throws KeeperException, InterruptedException {
            // Collect all child nodes that belong to lockName
            List<String> lockObjNodes = new ArrayList<String>();
            for (String node : zk.getChildren(root, false)) {
                if (node.split(SPLIT_STR)[0].equals(lockName)) {
                    lockObjNodes.add(node);
                }
            }
            Collections.sort(lockObjNodes);
            String subMyZnode = znode.substring(znode.lastIndexOf("/") + 1);
            int index = Collections.binarySearch(lockObjNodes, subMyZnode);
            if (index < 0) {
                throw new LockException("lock node " + znode + " no longer exists");
            }
            return index == 0 ? null : lockObjNodes.get(index - 1);
        }

        // Wait until no node precedes ours, or until waitTime ms pass; true once we hold the lock.
        private boolean waitForLock(String lower, long waitTime) throws InterruptedException, KeeperException {
            long deadline = System.currentTimeMillis() + waitTime;
            while (lower != null) {
                this.latch = new CountDownLatch(1); // create before the watch so no event is missed
                Stat stat = zk.exists(root + "/" + lower, true); // watch the node just before ours
                if (stat != null) {
                    System.out.println("Thread " + Thread.currentThread().getId() + " waiting for " + root + "/" + lower);
                    long remaining = deadline - System.currentTimeMillis();
                    boolean signalled = remaining > 0 && this.latch.await(remaining, TimeUnit.MILLISECONDS);
                    if (!signalled) {
                        this.latch = null;
                        waitNode = lower;
                        return false; // timed out
                    }
                }
                this.latch = null;
                // The predecessor changed or vanished (released or crashed): re-check our position
                lower = predecessorOf(myZnode);
            }
            waitNode = null;
            return true;
        }

        public void unlock() {
            try {
                System.out.println("unlock " + myZnode);
                releaseNode();
                zk.close();
            } catch (InterruptedException | KeeperException e) {
                e.printStackTrace();
            }
        }

        // Deleting our node releases the lock (or removes us from the queue).
        private void releaseNode() throws InterruptedException, KeeperException {
            if (myZnode != null) {
                zk.delete(myZnode, -1);
                myZnode = null;
            }
        }

        public void lockInterruptibly() throws InterruptedException {
            this.lock();
        }

        public Condition newCondition() {
            throw new UnsupportedOperationException();
        }

        public static class LockException extends RuntimeException {
            private static final long serialVersionUID = 1L;

            public LockException(String e) {
                super(e);
            }

            public LockException(Exception e) {
                super(e);
            }
        }
    }

Advantages and disadvantages

advantage:

Effectively solves the single-point, non-reentrancy, non-blocking, and unreleasable-lock problems. It is also relatively simple to implement.

shortcoming:

Performance may be lower than with a cache service, because every acquisition and release must dynamically create and delete ephemeral nodes. In ZooKeeper, node creation and deletion can only be performed by the Leader server, and the changes are then synchronized to the Followers. It also requires some understanding of how ZooKeeper works.

 

Distributed lock based on Consul

DD has written an article on this. The approach mainly uses the acquire and release operations of Consul's Key/Value store API.

Article address: http://blog.didispace.com/spring-cloud-consul-lock-and-semphore/
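To show what the acquire and release operations look like, here is a sketch against Consul's HTTP API using the JDK 11+ `HttpClient`, with no Consul library. It follows the documented pattern: create a session (`PUT /v1/session/create`), then `PUT /v1/kv/<key>?acquire=<session>`, which returns `true` only for the session that gets the key, and `?release=<session>` to give it up. The agent address and the naive JSON parsing are assumptions. A real implementation would also renew the session TTL (`PUT /v1/session/renew/<id>`) while holding the lock.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of a Consul KV lock over the HTTP API.
final class ConsulLock {
    private static final Pattern SESSION_ID = Pattern.compile("\"ID\"\\s*:\\s*\"([^\"]+)\"");

    private final HttpClient http = HttpClient.newHttpClient();
    private final String baseUrl; // e.g. "http://127.0.0.1:8500" for a local agent (assumption)

    ConsulLock(String baseUrl) {
        this.baseUrl = baseUrl;
    }

    // Session with a TTL; Behavior "release" frees its keys when the session ends.
    HttpRequest createSessionRequest(String ttl) {
        String body = "{\"TTL\": \"" + ttl + "\", \"Behavior\": \"release\"}";
        return HttpRequest.newBuilder(URI.create(baseUrl + "/v1/session/create"))
                .PUT(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    // The response body is "true" if this session now holds the key.
    HttpRequest acquireRequest(String key, String sessionId) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/v1/kv/" + key + "?acquire=" + sessionId))
                .PUT(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    // Gives the key up so another session can acquire it.
    HttpRequest releaseRequest(String key, String sessionId) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/v1/kv/" + key + "?release=" + sessionId))
                .PUT(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    // Minimal extraction of "ID" from the session-create response (no JSON library).
    static String parseSessionId(String json) {
        Matcher m = SESSION_ID.matcher(json);
        if (!m.find()) {
            throw new IllegalArgumentException("no session ID in: " + json);
        }
        return m.group(1);
    }

    String createSession(String ttl) throws IOException, InterruptedException {
        return parseSessionId(send(createSessionRequest(ttl)));
    }

    boolean acquire(String key, String sessionId) throws IOException, InterruptedException {
        return Boolean.parseBoolean(send(acquireRequest(key, sessionId)).trim());
    }

    boolean release(String key, String sessionId) throws IOException, InterruptedException {
        return Boolean.parseBoolean(send(releaseRequest(key, sessionId)).trim());
    }

    private String send(HttpRequest request) throws IOException, InterruptedException {
        return http.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }
}
```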

 

Notes on using distributed locks

1. Pay attention to the overhead of distributed locks

2. Pay attention to the granularity of locking

3. How to lock

 

Summarize

Whatever kind of company you work for, you will probably start with the simplest tasks. Don't fixate on how large the business scenarios at Alibaba or Tencent are. In a scenario that big, you may not get to work on the project yourself; even if you do, you may not be the core designer; and even the core designer may not design it alone. I hope you will choose the solution that fits your own company's business scenario.
