Distributed lock

When is a lock needed?

A scenario that needs a lock has two characteristics:

  • Several parties compete for a shared resource
  • Access to the shared resource must be mutually exclusive

What problem does a distributed lock solve?

It ensures that, across a distributed application cluster, a given method is executed by only one thread on one machine at any one time.

What properties does a distributed lock need?

  • Mutual exclusion: at any time only one client can hold the lock; other clients that try to acquire it either fail and return immediately or block and wait
  • Robustness: if a client crashes while holding the lock and never releases it, later clients must still be able to acquire the lock
  • Uniqueness: the lock must be released by the same client that acquired it
  • High availability: lock and unlock operations must stay available and be efficient

1. Based on MySQL

A MySQL-based distributed lock relies on a MySQL unique index. The main process is:

  • Acquire the lock: insert a record into the lock table. If the insert succeeds, the lock has been acquired
  • Execute the business logic
  • Release the lock: delete the record from the lock table

The unique index guarantees that a given row can be inserted only once, so exactly one client's insert succeeds and every other client gets an insertion failure.

For example:

CREATE TABLE `method_lock` (
    `id` INT(11) NOT NULL AUTO_INCREMENT COMMENT 'auto-increment id',
    `method_name` VARCHAR(64) NOT NULL COMMENT 'method name',
    `method_desc` VARCHAR(1024) NOT NULL COMMENT 'method description',
    `create_time` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'creation time',
    `update_time` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'update time',
    PRIMARY KEY (`id`),
    UNIQUE INDEX `uniq_method_name` (`method_name`)
)
COMMENT='distributed lock'
COLLATE='utf8_general_ci'
ENGINE=InnoDB;

The unique index is on the method name, so a row for a given method can be inserted only once, which means only one client can obtain the lock. When the client's business logic has finished, it must delete the record to release the lock.

  • Lock:
insert into method_lock(method_name, method_desc) values('methodName', 'desc');
  • Unlock:
delete from method_lock where method_name = 'methodName';
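The lock and unlock statements above can be sketched end to end as follows. This is a minimal illustration, not a production implementation: it assumes Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL (so it runs without a server), and the helper names make_lock_table, try_lock and unlock are hypothetical. The only property it relies on is the one the article uses: a UNIQUE constraint rejects the second insert.

```python
import sqlite3

def make_lock_table(conn):
    # Simplified version of method_lock; sqlite3 stands in for MySQL here.
    conn.execute(
        "CREATE TABLE method_lock ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " method_name TEXT NOT NULL UNIQUE,"
        " method_desc TEXT NOT NULL)"
    )

def try_lock(conn, method_name, desc=""):
    """Return True if this caller inserted the row, i.e. acquired the lock."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO method_lock (method_name, method_desc) VALUES (?, ?)",
                (method_name, desc),
            )
        return True
    except sqlite3.IntegrityError:
        # The unique index rejected the row: someone else holds the lock.
        return False

def unlock(conn, method_name):
    with conn:
        conn.execute("DELETE FROM method_lock WHERE method_name = ?", (method_name,))

conn = sqlite3.connect(":memory:")
make_lock_table(conn)
first = try_lock(conn, "methodName")
second = try_lock(conn, "methodName")   # fails while the row exists
unlock(conn, "methodName")
third = try_lock(conn, "methodName")    # succeeds again after release
print(first, second, third)  # True False True
```

Against a real MySQL server the same shape should apply, with the driver's duplicate-key error taking the place of sqlite3.IntegrityError.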

What are the disadvantages of implementing distributed locks based on MySQL?

  • No expiration time: if unlocking fails, the record stays in the table forever and no other node can obtain the lock
  • Non-blocking: a failed insert returns immediately (0 affected rows with insert ignore, or a duplicate-key exception otherwise); there is no built-in waiting or retry, so the caller has to retry by itself
  • Availability: if the database goes down, all business logic that depends on the lock becomes unavailable
  • Non-reentrant: because the lock is a row under a unique index, a thread that already holds the lock cannot acquire it again

Solutions:

  • For the missing expiration time:
    • Add a monitoring node that periodically checks for lock records older than the timeout period and deletes them
  • For availability:
    • Deploy the database as a master-slave cluster
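The monitoring-node idea can be sketched as a single sweep that deletes lock rows older than a timeout. The sketch below again assumes sqlite3 as a stand-in for MySQL, stores create_time as a plain number of seconds, and passes the current time in as a parameter so the result is deterministic; LOCK_TIMEOUT_SECONDS and sweep_expired are hypothetical names.

```python
import sqlite3

LOCK_TIMEOUT_SECONDS = 30  # hypothetical maximum time a lock may be held

def sweep_expired(conn, now, timeout=LOCK_TIMEOUT_SECONDS):
    """Delete lock rows older than `timeout` seconds; return how many were removed."""
    with conn:
        cur = conn.execute(
            "DELETE FROM method_lock WHERE create_time < ?", (now - timeout,)
        )
    return cur.rowcount

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE method_lock ("
    " method_name TEXT NOT NULL UNIQUE,"
    " create_time REAL NOT NULL)"
)
conn.execute("INSERT INTO method_lock VALUES ('stale', 100.0)")  # held for 40 s
conn.execute("INSERT INTO method_lock VALUES ('fresh', 125.0)")  # held for 15 s
removed = sweep_expired(conn, now=140.0)
remaining = [row[0] for row in conn.execute("SELECT method_name FROM method_lock")]
print(removed, remaining)  # 1 ['fresh']
```

A real monitor would run this sweep on a schedule. Note that deleting a lock whose holder is merely slow reintroduces the concurrency problem discussed for Redis timeouts later in the article.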

2. Based on Redis

2.1 Based on setnx and setex

The Redis command setnx:

  • If the key does not exist, set it and return 1
  • If the key already exists, do nothing and return 0

Locking and unlocking:

if (setnx(key, 1) == 1) {
    expire(key, 30)
    try {
        // TODO: business logic
    } finally {
        del(key)
    }
}

There is a problem:

  • Each Redis command is atomic on its own, but a sequence of commands is not. The setnx + expire combination is non-atomic: setnx may succeed and expire may then fail (for example, if the client crashes in between). Without a timeout on the key, other nodes can again be blocked from obtaining the lock indefinitely.

    • Solution: a Lua script, because Redis executes a Lua script atomically

      if (redis.call('setnx', KEYS[1], ARGV[1]) < 1)
      then return 0;
      end;
      redis.call('expire', KEYS[1], tonumber(ARGV[2]));
      return 1;

      Usage example:
      EVAL "if (redis.call('setnx',KEYS[1],ARGV[1]) < 1) then return 0; end; redis.call('expire',KEYS[1],tonumber(ARGV[2])); return 1;" 1 key value 100

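The Redis command SET with the NX and EX options solves the non-atomic problem of the setnx + expire combination

  • setex key seconds value sets a value and its timeout atomically, but it overwrites an existing key, so on its own it cannot provide mutual exclusion
  • SET key value NX EX seconds (Redis 2.6.12 and later) is a single atomic command that combines setnx with expire: it sets the key only if it does not exist and attaches the timeout at the same time. It returns OK on success and nil if the key already exists
  • The timeout prevents a permanent lock: if the node holding the lock never runs del key after its business logic, or crashes before it can, the key still expires, so later nodes can acquire the lock and complete their business logic

Locking and unlocking:

if (set(key, 1, "NX", "EX", 30) == "OK") {
    try {
        // TODO: business logic
    } finally {
        del(key)
    }
}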
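To make the "set only if absent, with a timeout" semantics concrete, here is a small sketch. It does not talk to Redis: TinyStore is a hypothetical in-memory stand-in with a hand-advanced fake clock, and set_nx_ex only mimics the behaviour of an atomic set-if-absent-with-expiry command.

```python
class TinyStore:
    """In-memory stand-in for a Redis server (illustration only, not a real client)."""

    def __init__(self):
        self.data = {}   # key -> (value, expires_at)
        self.now = 0.0   # fake clock in seconds, advanced by hand

    def set_nx_ex(self, key, value, seconds):
        """Set key only if it is absent or expired; return True if it was set."""
        entry = self.data.get(key)
        if entry is not None and entry[1] > self.now:
            return False                      # a live lock already exists
        self.data[key] = (value, self.now + seconds)
        return True

store = TinyStore()
a = store.set_nx_ex("lock:order", "client-A", 30)  # A acquires the lock
b = store.set_nx_ex("lock:order", "client-B", 30)  # B is rejected
store.now += 31                                    # A never unlocks; the key expires
c = store.set_nx_ex("lock:order", "client-B", 30)  # B can now acquire it
print(a, b, c)  # True False True
```

Because the value and the expiry are written in one step, there is no window in which the key exists without a timeout.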

The second problem: the lock is released by the wrong client, that is, the thread that unlocks is not the thread that locked

Scenario: thread A acquires the lock with an expiration time of 30s, but because of network congestion or some other reason its execution takes longer than 30s. At the 30s mark thread A has not finished, the lock expires and is deleted automatically, and thread B acquires the lock and starts its business logic. Thread A then finishes and runs del(key) in its finally block, which releases thread B's lock.

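Solution:

  • Set the lock value (the value argument of the locking command) to a unique identifier of the holder, such as the current thread's identifier plus a UUID string. When unlocking, first check that the stored value is your own and delete the key only if it matches. The check and the delete should run atomically, for example in a Lua script.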
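The owner check can be sketched as a compare-and-delete. UNLOCK_SCRIPT below is the Lua script commonly used for this on a real Redis server; the rest is an in-memory simulation (a plain dict stands in for Redis, and release is a hypothetical helper) that replays the scenario above.

```python
import uuid

# Lua script commonly used for a compare-and-delete unlock on a real Redis server.
UNLOCK_SCRIPT = """
if redis.call('get', KEYS[1]) == ARGV[1] then
    return redis.call('del', KEYS[1])
else
    return 0
end
"""

def release(store, key, token):
    """In-memory equivalent of UNLOCK_SCRIPT: delete only if we still own the key."""
    if store.get(key) == token:
        del store[key]
        return 1
    return 0

store = {}                                   # stand-in for Redis: key -> value
token_a = f"thread-A:{uuid.uuid4()}"
token_b = f"thread-B:{uuid.uuid4()}"

store["lock"] = token_a                      # A acquires the lock
del store["lock"]                            # A's lock expires (simulated)
store["lock"] = token_b                      # B acquires the lock
released_by_a = release(store, "lock", token_a)  # A's late unlock is refused
released_by_b = release(store, "lock", token_b)  # B's own unlock succeeds
print(released_by_a, released_by_b)  # 0 1
```

Running the comparison and the delete inside one script matters: if they were two separate commands, the key could expire and be re-acquired by another client between them.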

The third problem: unlocking on timeout leads to concurrent execution

Scenario: thread A acquires the lock with an expiration time of 30 seconds, but its execution takes longer than 30 seconds, so the lock is released automatically when it expires. Thread B then acquires the lock, and threads A and B execute concurrently.

Solution:

  • Crude: set the expiration time long enough that the business logic is certain to finish before the lock is released
  • Start a daemon (watchdog) thread alongside the thread that holds the lock; it extends the expiration time of a lock that is about to expire but has not yet been released
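The daemon-thread approach can be sketched with Python's threading module. The Lease class is a hypothetical in-memory stand-in for the lock key and its expiry; in a real deployment renew would reset the key's timeout in Redis. The timings are deliberately short so the example finishes in about a second.

```python
import threading
import time

class Lease:
    """In-memory stand-in for a lock key with an expiry deadline."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.deadline = time.monotonic() + ttl
        self.renewals = 0
        self._guard = threading.Lock()

    def renew(self):
        with self._guard:
            self.deadline = time.monotonic() + self.ttl
            self.renewals += 1

    def expired(self):
        with self._guard:
            return time.monotonic() >= self.deadline

def watchdog(lease, stop, interval):
    # Event.wait returns False on timeout and True once `stop` is set.
    while not stop.wait(interval):
        lease.renew()

lease = Lease(ttl=0.6)
stop = threading.Event()
worker = threading.Thread(target=watchdog, args=(lease, stop, 0.1), daemon=True)
worker.start()

time.sleep(1.0)                  # "business logic" that outlives the original TTL
still_held = not lease.expired() # the watchdog kept the lease alive
stop.set()                       # on release, stop renewing
worker.join()
print(still_held, lease.renewals)
```

The renewal interval is kept well below the TTL so that a single delayed renewal does not let the lease lapse. If the process crashes, the watchdog dies with it and the lock still expires on its own.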

Lock reentrancy implementation

Among Redis-based distributed locks, the one provided by Redisson implements reentrancy using the Redis hash structure.

Redisson lock and unlock source code analysis:

At the bottom layer, Redisson also uses Lua scripts to make locking and unlocking atomic. Clients that cannot obtain the lock subscribe to a channel for that lock and wait to be notified when it is released.

  • Lock:

    [Image: Redisson lock source code]
  • Unlock:

    [Image: Redisson unlock source code]
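Since the screenshots are not available, the idea of a hash-based reentrant lock can be sketched as follows. This is not Redisson's source code: it is an in-memory illustration in which each lock maps a holder id to a hold count, the way a Redis hash field and its value could. ReentrantLockTable is a hypothetical name, and expiry handling is left out for brevity.

```python
class ReentrantLockTable:
    """In-memory stand-in for one Redis hash per lock: {holder_id: hold_count}."""

    def __init__(self):
        self.locks = {}

    def lock(self, name, holder):
        holders = self.locks.setdefault(name, {})
        if not holders or holder in holders:
            holders[holder] = holders.get(holder, 0) + 1   # first entry or re-entry
            return True
        return False                                       # held by someone else

    def unlock(self, name, holder):
        holders = self.locks.get(name, {})
        if holder not in holders:
            return False                                   # not the owner
        holders[holder] -= 1
        if holders[holder] == 0:
            del self.locks[name]                           # last release removes the key
        return True

table = ReentrantLockTable()
r1 = table.lock("L", "A")      # A acquires, count 1
r2 = table.lock("L", "A")      # A re-enters, count 2
r3 = table.lock("L", "B")      # B is rejected
table.unlock("L", "A")         # count back to 1
r4 = table.lock("L", "B")      # still rejected, A holds it once
table.unlock("L", "A")         # count 0, lock removed
r5 = table.lock("L", "B")      # B acquires
print(r1, r2, r3, r4, r5)  # True True False False True
```

On a real server each of lock and unlock would have to run as a single Lua script so that the check and the update are atomic.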

Redis availability issues

Because Redis replication is asynchronous, the mutual exclusion of the lock cannot be guaranteed, and two clients may end up holding the lock at the same time.

To keep Redis available, a master-slave deployment is generally used. Replication between master and slave is asynchronous: the master records write commands in a local memory buffer and then sends the buffered commands to the slave asynchronously. The slave executes the replicated command stream to reach the same state as the master and reports its replication progress back to the master.

In a cluster deployment that includes master-slave mode, when the master node goes down a slave node takes its place, and the client does not clearly notice the switch. Suppose client A locks successfully and the master fails before the command has been replicated. The slave is promoted to master, but the new master has no lock data, so when client B tries to lock it also succeeds. Two clients now hold the lock.
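The failover scenario can be replayed in a few lines. This is a toy model, with plain dicts standing in for the master and the slave and a list standing in for the replication buffer; none of it is Redis code.

```python
master = {}
replica = {}
pending = []   # commands accepted by the master but not yet replicated

def lock_on_master(key, client):
    """Set-if-absent on whichever node is currently the master."""
    if key in master:
        return False
    master[key] = client
    pending.append((key, client))   # replication happens later, asynchronously
    return True

a_locked = lock_on_master("lock", "A")

# The master crashes before `pending` is shipped; the slave is promoted as-is.
pending.clear()
master = replica

b_locked = lock_on_master("lock", "B")
print(a_locked, b_locked)  # True True
```

Both calls return True, so clients A and B each believe they hold the lock.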

2.2 Distributed locks in a multi-node Redis environment

The Redlock algorithm: with several independent Redis master nodes, clients can lock and unlock as long as a majority of the nodes are alive

Using multiple Redis instances for the distributed lock keeps it available when a single node fails

RedLock steps:

  1. Get the current Unix time in milliseconds.
  2. Try to acquire the lock on the 5 instances in turn, using the same key and the same unique value (such as a UUID). For each request the client should set a network connection and response timeout that is much smaller than the lock expiration time. For example, if the lock expires automatically after 10 seconds, the timeout should be between 5 and 50 milliseconds. This prevents the client from waiting on a Redis instance that is already down. If an instance does not respond within the timeout, the client should move on to the next Redis instance as soon as possible.
  3. The client subtracts the time recorded in step 1 from the current time to get the time spent acquiring the lock. The lock is considered acquired if and only if it was obtained on a majority of the Redis nodes (N/2+1, here 3 nodes) and the time spent is less than the lock expiration time.
  4. If the lock is acquired, the real validity time of the key is the original validity time minus the time spent acquiring the lock (calculated in step 3).
  5. If the lock acquisition fails for any reason (the lock was not acquired on at least N/2+1 Redis instances, or the acquisition time exceeded the validity time), the client should unlock on all Redis instances, even those where it believes it did not acquire the lock. An instance may have set the key without the client receiving the response, and without this cleanup the lock could not be reacquired until it expires.
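The five steps can be sketched against in-memory stand-ins. The Instance class below is hypothetical and greatly simplified: it has no per-key expiry and no network timeouts, and a down node simply raises ConnectionError. The point is only to show the majority count and the validity-time calculation.

```python
import time
import uuid

class Instance:
    """In-memory stand-in for one independent Redis master."""

    def __init__(self, up=True):
        self.up = up
        self.data = {}

    def set_nx(self, key, value):
        if not self.up:
            raise ConnectionError("instance down")
        if key in self.data:
            return False
        self.data[key] = value
        return True

    def delete_if_owner(self, key, value):
        if self.up and self.data.get(key) == value:
            del self.data[key]

def redlock_acquire(instances, key, ttl_ms, clock=time.monotonic):
    token = str(uuid.uuid4())
    start = clock()                                # step 1: note the start time
    acquired = 0
    for inst in instances:                         # step 2: try every instance
        try:
            if inst.set_nx(key, token):
                acquired += 1
        except ConnectionError:
            pass                                   # move on to the next instance
    elapsed_ms = (clock() - start) * 1000          # step 3: time spent acquiring
    quorum = len(instances) // 2 + 1
    validity_ms = ttl_ms - elapsed_ms              # step 4: remaining validity
    if acquired >= quorum and validity_ms > 0:
        return token, validity_ms
    for inst in instances:                         # step 5: unlock everywhere
        inst.delete_if_owner(key, token)
    return None, 0

nodes = [Instance(), Instance(), Instance(up=False), Instance(), Instance(up=False)]
token, validity = redlock_acquire(nodes, "lock", ttl_ms=10_000)
got_lock = token is not None          # 3 of 5 instances answered: majority reached

nodes[0].up = False                   # another node fails while the lock is held
token2, _ = redlock_acquire(nodes, "lock", ttl_ms=10_000)
print(got_lock, token2)  # True None
```

The second attempt fails because the two instances that are still up already hold the key for the first client, so no majority can be reached.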

Failure retry mechanism:

  • If a client fails to acquire the lock, it waits for a random time before trying again, so that multiple clients do not all request the lock at the same moment. If the overall lock acquisition timeout expires, the client gives up and can try again later.
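A retry loop with a random delay might look like the sketch below. The function name acquire_with_retry and its parameters are hypothetical, and the sleep function is injectable so the demonstration does not actually wait.

```python
import random
import time

def acquire_with_retry(try_acquire, attempts=5, max_delay=0.05, sleep=time.sleep):
    """Call try_acquire up to `attempts` times, pausing a random time between tries."""
    for attempt in range(attempts):
        if try_acquire():
            return True
        if attempt < attempts - 1:
            # A random delay keeps competing clients from retrying in lockstep.
            sleep(random.uniform(0, max_delay))
    return False

results = iter([False, False, True])   # hypothetical: succeeds on the third try
delays = []
ok = acquire_with_retry(lambda: next(results), sleep=delays.append)
print(ok, len(delays))  # True 2
```

The attempt limit plays the role of the overall acquisition timeout: once it is exhausted, the caller gets False and decides what to do next.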

Unlock:

  • Execute del key on all Redis instances

If the nodes have no persistence configured, suppose a client acquires the lock on 3 of the 5 masters and one of those 3 then restarts. There are again 3 masters on which another client can acquire the same lock, which violates mutual exclusion. Enabling AOF persistence improves the situation somewhat: Redis expiration is implemented semantically, so time keeps elapsing while the server is down, and the lock state is not corrupted after a restart. However, on a power failure some AOF commands can be lost before they are flushed to disk, unless the fsync policy is set to appendfsync always, which hurts performance. The way to solve this is to keep a restarted node unavailable for the maximum TTL, so that it cannot interfere with locks that were granted before it crashed. Once every lock from before the crash has expired and no historical locks remain, the node rejoins and works normally.

Problems that may occur in the cluster environment:

  • Replication delay in the master-slave architecture, or during a master/backup switch on failover, may cause two clients to hold the lock at the same time, or may delay unlocking
  • A network partition may produce two master nodes, in which case two clients can also hold the lock at the same time

3. Based on ZooKeeper

ZooKeeper provides a hierarchical, tree-structured namespace. For example, the parent of the node /app1/p_1 is /app1.

Node types

  • Persistent node: once created, it is not deleted when the client session ends
  • Temporary (ephemeral) node: deleted when the client session ends or times out
  • Sequential node: a numeric suffix is appended to the node name, and the suffixes are ordered. For example, if the generated node is /lock/node-0000000000, the next one is /lock/node-0000000001, and so on.

Watch (listener) mechanism:

A client registers a watch on a node; when the node's state changes, ZooKeeper sends a notification to the client

Implementation:

  • Create a lock directory /lock
  • When a client needs to acquire the lock, it creates a temporary (ephemeral) sequential child node under /lock
  • The client lists the children of /lock. If the child node it created has the smallest sequence number, it has acquired the lock. Otherwise it watches the child node immediately before its own, which is equivalent to queuing. When that node changes, the client checks again whether its own node now has the smallest sequence number
  • Execute the business code
  • Release the lock: delete the corresponding child node
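The "smallest sequence number wins, otherwise watch your predecessor" decision can be sketched without a ZooKeeper server. The functions below are hypothetical helpers that operate on a plain list of child node names, such as a client would get from listing /lock.

```python
def sequence_of(node):
    """Numeric suffix that ZooKeeper appends to a sequential node name."""
    return int(node.rsplit("-", 1)[1])

def lock_state(children, mine):
    """Return (holds_lock, node_to_watch) for `mine` among the children of /lock."""
    ordered = sorted(children, key=sequence_of)
    index = ordered.index(mine)
    if index == 0:
        return True, None                   # smallest sequence number: lock acquired
    return False, ordered[index - 1]        # otherwise watch the immediate predecessor

children = ["node-0000000002", "node-0000000000", "node-0000000001"]
head = lock_state(children, "node-0000000000")
tail = lock_state(children, "node-0000000002")
print(head)  # (True, None)
print(tail)  # (False, 'node-0000000001')

# The holder releases the lock (or its session ends), so its node disappears.
children.remove("node-0000000000")
promoted = lock_state(children, "node-0000000001")
print(promoted)  # (True, None)
```

A real client would re-run this check each time the watch on its predecessor fires.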

Lock timeout

If the session that holds the lock times out, its node is deleted automatically because it is a temporary (ephemeral) node, and other sessions can then obtain the lock. ZooKeeper distributed locks therefore do not suffer from the failed-release problem of locks built on a database unique index.

Herd effect:

Another implementation treats the successful creation of a single lock node as acquiring the lock, and every client that failed to acquire it watches that node. When the lock is released and the node is deleted, all waiting clients are notified and all of them try to create the node on the ZooKeeper server at once, which can cause temporary network congestion.

This is why a node that has not obtained the lock only watches the child node immediately before it. If every waiting node watched all the others, any change to one child node would notify all of them (the herd effect), when we only want the next child node in line to be notified.
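A rough way to see the difference is to count how many waiting clients are notified when the lock holder's node is deleted. The sketch below is a toy model with hypothetical names, not ZooKeeper code: it builds the table of registered watches for both strategies with one holder and 100 waiters.

```python
def build_watches(nodes, watch_all):
    """Map each watched node to the waiting nodes that registered a watch on it."""
    watches = {}
    holder, waiters = nodes[0], nodes[1:]
    for i, waiter in enumerate(waiters):
        # nodes[i] is the node immediately before `waiter` in sequence order.
        target = holder if watch_all else nodes[i]
        watches.setdefault(target, []).append(waiter)
    return watches

nodes = [f"node-{i:010d}" for i in range(101)]   # 1 holder + 100 waiters
herd = len(build_watches(nodes, watch_all=True)[nodes[0]])
chain = len(build_watches(nodes, watch_all=False)[nodes[0]])
print(herd, chain)  # 100 1
```

When every waiter watches the holder, releasing the lock wakes all 100 clients; when each waiter watches only its predecessor, it wakes exactly one.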

4. Summary of advantages and disadvantages

  • Based on MySQL:

    • Low performance: throughput is limited by the database, so it is not suitable for high-concurrency scenarios, and you have to handle lock timeouts, transactions and so on yourself
    • The advantage is that there is no middleware to maintain and the implementation is simple
  • Based on ZK:

    • ZK performance is actually not much different from MySQL
    • With temporary nodes and the watch mechanism we do not need to manage the lock timeout ourselves. ZK implements a fair distributed lock
  • Based on Redis:

    • High availability has a cost: a Redis cluster has to be maintained, and implementing RedLock requires maintaining even more instances

Origin blog.csdn.net/weixin_41922289/article/details/108019042