Redis (8): Distributed Locks

Outline

This article describes distributed locks in Redis: the ideas behind the implementation, the RedLock algorithm, and a few supporting features.

Distributed Lock motivation

In a distributed architecture, when multiple clients compete but only one of them may perform an operation at a time, a distributed lock can be used. The first thing to understand is that the purpose of a lock is mutual exclusion: conflicting operations must execute one at a time.

RedLock design

The design must meet two kinds of goals: safety and liveness.

  1. Safety property: mutual exclusion. At any given moment, only one client can hold the lock.
  2. Liveness property A: deadlock freedom. A lock can eventually be released even if the client that acquired it crashes or is partitioned from the cluster.
  3. Liveness property B: fault tolerance. As long as the majority of Redis nodes are alive, clients are able to acquire and release locks.

Why a failover-based implementation is not enough

This is because high availability is typically achieved with master-slave replication, which allows the following sequence of events:

  1. Client A acquires the lock on the master.
  2. The master crashes before the write of the key is replicated to the slave.
  3. The slave is promoted to master.
  4. Client B acquires the lock on the same resource that A still holds a lock for. SAFETY VIOLATION! (Because the old master went down before the write could be replicated, client B acquires the lock on the new master, and at that point two clients hold the same lock.)

RedLock implementation

Distributed lock on a single Redis node

Acquire a lock

SET resource_name my_random_value NX PX 30000

Two things are used here: my_random_value and an expiration time. my_random_value is unique per client and is used to release the lock in a safe way. The release is expressed with the Lua script below: if the key exists and its value is exactly my own my_random_value, then it can safely be deleted.

About the expiration time

If a client successfully acquires the lock and then crashes, or a network partition prevents it from communicating with the Redis node, without an expiration time it would hold the lock forever and no other client could ever acquire it.

-- release the lock only if its value still matches the caller's my_random_value
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
else
    return 0
end

Why use my_random_value at all? What would happen if every client used the same value? Consider the following scenario:

  1. Client 1 acquires the lock successfully.
  2. Client 1 blocks for a long time on some operation.
  3. The expiration time is reached and the lock is released automatically.
  4. Client 2 acquires the lock on the same resource.
  5. Client 1 recovers from blocking and releases the lock now held by client 2.

This is similar to the ABA problem with CAS: after the blocking operation, the client has no way of knowing that the resource has since been locked by another client. Checking my_random_value before deleting prevents this, as sketched below.
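Below is a minimal sketch, using the redis-py client, of how a client might acquire and safely release a single-node lock in this way. The client library, connection details, and helper names are illustrative assumptions, not something prescribed by the original article:

import uuid
import redis

# Assumption: a local Redis instance; any redis-py client works the same way.
r = redis.Redis(host="localhost", port=6379)

RELEASE_SCRIPT = """
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
else
    return 0
end
"""

def acquire_lock(resource_name, ttl_ms=30000):
    my_random_value = str(uuid.uuid4())          # unique per client/acquisition
    # SET resource_name my_random_value NX PX ttl_ms
    ok = r.set(resource_name, my_random_value, nx=True, px=ttl_ms)
    return my_random_value if ok else None

def release_lock(resource_name, my_random_value):
    # delete only if the value still matches, so we never release another client's lock
    return r.eval(RELEASE_SCRIPT, 1, resource_name, my_random_value) == 1

token = acquire_lock("resource_name")
if token:
    try:
        pass  # do the work protected by the lock
    finally:
        release_lock("resource_name", token)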

OK, now let's look at the distributed case: the RedLock algorithm.

RedLock distributed lock: algorithm outline

The following description is taken from the references:

The algorithm is based on N completely independent Redis nodes.

  1. Get the current time in milliseconds.

  2. Sequentially try to acquire the lock on all N Redis nodes. On each node the acquisition is performed exactly as on a single Redis node, using the same random string my_random_value and the same expiration time (for example PX 30000, the lock validity time). So that the algorithm can keep running when a Redis node is unavailable, each per-node acquisition also has a timeout that is much smaller than the lock validity time (on the order of tens of milliseconds). If acquiring the lock on one node fails, the client should try the next node immediately. "Fails" here should include any kind of failure, for example the node being unreachable, or the lock on that node already being held by another client (note: the Redlock description only mentions the unreachable case, but other failures should be treated the same way).

  3. Compute how long the whole acquisition took by subtracting the timestamp recorded in step 1 from the current time. If the client acquired the lock on a majority of the Redis nodes (>= N/2 + 1) and the total elapsed time does not exceed the lock validity time, the lock is considered acquired; otherwise acquisition has failed.

  4. If the lock was acquired, its effective validity time should be recomputed as the initial validity time minus the elapsed time computed in step 3.

  5. If acquisition ultimately failed (either because fewer than N/2 + 1 nodes granted the lock, or because the total elapsed time exceeded the lock validity time), the client must immediately send a release request to all N Redis nodes (i.e. the Lua script shown earlier).

Note that there is no single standard clock in this scheme: the timestamps come from each node's and each client's own clock, so if one node's clock runs noticeably fast, the lock stored on it can expire earlier than expected. This is relevant to the failure scenarios discussed later.
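The following is a minimal sketch of this acquisition loop, again using redis-py; the node addresses, per-node timeout, and helper names are illustrative assumptions, and a production system should prefer one of the vetted Redlock client libraries listed in the official documentation:

import time
import uuid
import redis

# Assumption: five independent Redis nodes on these illustrative addresses,
# each with a 50 ms per-node timeout (much smaller than the lock validity time).
NODES = [redis.Redis(host="127.0.0.1", port=p, socket_timeout=0.05)
         for p in (6379, 6380, 6381, 6382, 6383)]

RELEASE_SCRIPT = """
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
else
    return 0
end
"""

def redlock_acquire(resource, ttl_ms=30000):
    value = str(uuid.uuid4())
    start = time.monotonic()                      # step 1: record the start time
    acquired = 0
    for node in NODES:                            # step 2: try every node in turn
        try:
            if node.set(resource, value, nx=True, px=ttl_ms):
                acquired += 1
        except redis.RedisError:
            pass                                  # any failure: move on to the next node
    elapsed_ms = (time.monotonic() - start) * 1000    # step 3: total elapsed time
    validity_ms = ttl_ms - elapsed_ms                 # step 4: remaining validity
    # (real implementations also subtract a small clock-drift allowance here)
    if acquired >= len(NODES) // 2 + 1 and validity_ms > 0:
        return value, validity_ms
    redlock_release(resource, value)              # step 5: failed, release on ALL nodes
    return None, 0

def redlock_release(resource, value):
    for node in NODES:                            # release on every node, even "failed" ones
        try:
            node.eval(RELEASE_SCRIPT, 1, resource, value)
        except redis.RedisError:
            pass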

Notes on node restarts

Assume there are five Redis nodes in total: A, B, C, D, E. Imagine the following sequence of events:

  1. Client 1 successfully acquires the lock on A, B, C (but not on D and E).
  2. Node C crashes and restarts; because client 1's lock on C was not persisted, it is lost.
  3. After C restarts, client 2 acquires the lock on C, D, E and so also holds the lock.

As this analysis shows, lock failures caused by node restarts are always possible. To address this, antirez proposed the idea of delayed restarts: after a node crashes, it is not restarted immediately but only after a period longer than the lock validity time. This way, any lock the node participated in before the crash will already have expired by the time it rejoins, so the restart cannot affect existing locks.

Notes on releasing the lock

In the algorithm description, antirez emphasizes that when finally releasing the lock, the client should send the release operation to all Redis nodes, including those on which acquisition appeared to fail. Why? Imagine a client sends an acquire request to a Redis node, the request reaches the node and the SET succeeds, but the response packet is lost on the way back. From the client's point of view the acquisition timed out and failed, yet from the node's point of view the lock was granted. Therefore, when releasing the lock, the client should also send release requests to the nodes where acquisition seemed to fail. This situation arises easily in an asynchronous communication model: the client-to-server direction works fine while the reverse direction is broken.

Martin's analysis

On 2016-02-08 Martin Kleppmann published a blog post called "How to do distributed locking" (https://martin.kleppmann.com/2016/02/08/how-to-do-distributed-locking.html), in which he raised several problems with Redlock:

  • RedLock can fail because of GC pauses.
  • RedLock depends strongly on timing, so its safety is not sufficient on its own.

The impact of GC pauses on distributed locks

(Figure: client 1's GC pause outlasts the lock's expiration; client 2 then acquires the lock, and both clients write to the storage service.)

As the figure shows, a GC pause can stall the client until the lock expires. When client 1 recovers from the GC pause, it does not know that the lock it holds has already expired, so it still issues a write to the shared resource (a storage service in the figure above). But at this point the lock is actually held by client 2, so the two clients' writes can conflict: the mutual exclusion the lock was supposed to provide has been broken.

Since GC is one important factor that can break mutual exclusion, and we cannot simply run without GC, Martin also points out that other things in a complex computer system, such as memory page faults, can cause the same phenomenon. He proposes fencing tokens to guard against it, as illustrated below.

(Figure: Martin's fencing-token scheme: the storage service rejects a write whose token is older than one it has already accepted.)
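A minimal sketch of the fencing-token idea on the storage-service side; the class and its API are illustrative assumptions (Martin's post only describes the token as a number that increases every time the lock service grants a lock):

class FencedStore:
    """Toy storage service that rejects writes carrying a stale fencing token."""

    def __init__(self):
        self.highest_token_seen = -1
        self.data = {}

    def write(self, key, value, fencing_token):
        # A write with a token no newer than one already seen must come from a client
        # whose lock has since expired, so it is rejected.
        if fencing_token <= self.highest_token_seen:
            raise RuntimeError("stale fencing token: lock no longer valid")
        self.highest_token_seen = fencing_token
        self.data[key] = value

store = FencedStore()
store.write("file", "client 2's write", fencing_token=34)       # client 2 writes with the newer token
try:
    store.write("file", "client 1's late write", fencing_token=33)  # client 1 wakes up after its GC pause
except RuntimeError as e:
    print(e)                                                     # stale fencing token: lock no longer valid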

Personally I find this a little strange: isn't the role of the fencing token much the same as my_random_value in RedLock? Admittedly the token additionally maintains an ordering, but my_random_value can likewise be used to detect that the resource has since been locked by another client.

Safety problems caused by the strong dependence on timing

The following description is from the references:

In his post, Martin constructs a sequence of events that makes Redlock fail (two clients holding the lock at the same time). To illustrate Redlock's over-reliance on system timing, he first gives the following example (assume there are five Redis nodes A, B, C, D, E):

  • Client 1 acquires the lock from nodes A, B, C (a majority). Due to network problems, communication with D and E fails.
  • The clock on node C jumps forward, causing the lock stored on it to expire early.
  • Client 2 acquires the lock on the same resource from nodes C, D, E (a majority).
  • Client 1 and client 2 now both believe they hold the lock.

This can happen because Redlock's safety property depends rather strongly on the system clock: once the clock becomes inaccurate, the algorithm's safety can no longer be guaranteed. Martin's point here is really a basic principle of distributed algorithms: a good distributed algorithm should be based on the asynchronous model, and its safety should not rely on any timing assumption. In the asynchronous model, a process may pause for an arbitrarily long time, a message may be delayed in the network for an arbitrarily long time or even lost, and the system clock may be wrong in arbitrary ways. A good distributed algorithm must not let these factors affect its safety property; they may only affect its liveness property. In other words, even in extreme cases (such as a badly broken system clock), the worst the algorithm may do is fail to produce a result within a bounded time; it must never produce a wrong result. Such algorithms do exist, the best known being Paxos and Raft. By this standard, Redlock's level of safety is clearly not attainable.

This discussion is reminiscent of CP in CAP: to maintain consistency, it is availability that has to be sacrificed.

Supplement

SETNX

The SETNX command means "SET if Not eXists": the value is set only when the key does not already exist. SETNX does not support setting an expiration time, which is why the examples above use SET with the NX and PX options, and a Lua script for the release, so that each operation executes atomically.
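For illustration, a redis-py sketch contrasting the non-atomic SETNX-plus-EXPIRE pattern with the atomic SET NX PX form (the connection and key names are assumptions):

import redis

r = redis.Redis()  # assumption: local Redis instance

# Non-atomic: if the client crashes between these two calls, the lock never expires.
if r.setnx("resource_name", "my_random_value"):
    r.expire("resource_name", 30)

# Atomic equivalent: NX and PX are applied in a single command.
r.set("resource_name", "my_random_value", nx=True, px=30000)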

Expiration before the work finishes causes concurrency

A client may acquire the lock with an expiration time that is too short, so the lock expires before the client finishes its work and releases it. Another client can then acquire the lock, and two clients end up accessing the resource at the same time. Possible solutions:

  • Set the expiration time longer than the expected execution time, i.e. simply increase the expiration time.
  • Add a daemon (watchdog) thread that extends the expiration time whenever it is about to expire (sketched below).
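A minimal sketch of the watchdog idea in Python; the renewal interval, helper names, and the simple get-then-pexpire check are illustrative assumptions (a real implementation, such as Redisson's watchdog, performs the check-and-extend atomically in a Lua script):

import threading
import redis

r = redis.Redis()  # assumption: local Redis instance

def start_watchdog(resource, my_random_value, ttl_ms=30000):
    """Periodically extend the lock's TTL while the protected work is still running."""
    stop_event = threading.Event()

    def renew():
        while not stop_event.wait(ttl_ms / 3000):           # wake up every ttl/3 seconds
            # Only extend the TTL if we still own the lock.
            # (Not atomic here; a Lua script would make the check-and-extend atomic.)
            if r.get(resource) == my_random_value.encode():
                r.pexpire(resource, ttl_ms)
            else:
                break                                        # lock lost or released: stop renewing

    threading.Thread(target=renew, daemon=True).start()
    return stop_event

# Usage: stop = start_watchdog("resource_name", token); ...do the work...; stop.set(); release the lock.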

Lock reentrancy

We know that in Java a reentrant lock can be implemented by tracking the owning thread, for example with a structure like ThreadLocal, and checking it on re-entry. How can Redis achieve the same thing? Let's look at how Redisson implements it.

-- if lock_key does not exist
if (redis.call('exists', KEYS[1]) == 0) then
    -- record the locking thread's identifier in a hash field with count 1
    redis.call('hset', KEYS[1], ARGV[2], 1);
    -- set the expiration time
    redis.call('pexpire', KEYS[1], ARGV[1]);
    return nil;
end;
-- if lock_key exists and its field matches the current thread's identifier
if (redis.call('hexists', KEYS[1], ARGV[2]) == 1) then
    -- increment the reentrancy count
    redis.call('hincrby', KEYS[1], ARGV[2], 1);
    -- reset the expiration time
    redis.call('pexpire', KEYS[1], ARGV[1]);
    return nil;
end;
-- lock acquisition failed: return the remaining TTL of the lock
return redis.call('pttl', KEYS[1]);

As you can see, this is implemented with a hash (hset): the field identifies the owning thread and the value holds the reentrancy count. In fact, ThreadLocal in Java likewise uses a hash table internally to store the corresponding per-thread data.
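To make the KEYS/ARGV layout concrete, here is a minimal sketch of invoking an equivalent script from redis-py (KEYS[1] is the lock key, ARGV[1] the TTL in milliseconds, ARGV[2] the owner identifier; the connection and identifiers are illustrative assumptions, not Redisson's actual client code):

import redis

r = redis.Redis()  # assumption: local Redis instance

REENTRANT_LOCK_SCRIPT = """
if (redis.call('exists', KEYS[1]) == 0) then
    redis.call('hset', KEYS[1], ARGV[2], 1);
    redis.call('pexpire', KEYS[1], ARGV[1]);
    return nil;
end;
if (redis.call('hexists', KEYS[1], ARGV[2]) == 1) then
    redis.call('hincrby', KEYS[1], ARGV[2], 1);
    redis.call('pexpire', KEYS[1], ARGV[1]);
    return nil;
end;
return redis.call('pttl', KEYS[1]);
"""

def try_reentrant_lock(lock_key, owner_id, ttl_ms=30000):
    # Returns None when the lock was acquired (or re-entered),
    # otherwise the remaining TTL of the lock held by someone else.
    return r.eval(REENTRANT_LOCK_SCRIPT, 1, lock_key, ttl_ms, owner_id)

print(try_reentrant_lock("lock_key", "client-1:thread-7"))  # None: acquired
print(try_reentrant_lock("lock_key", "client-1:thread-7"))  # None: re-entered, count is now 2
print(try_reentrant_lock("lock_key", "client-2:thread-3"))  # remaining TTL in ms: held by someone else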

Retrying after a failed acquisition

When a client fails to acquire the lock, it will want to retry the acquisition. The retry can be implemented in two ways:

  • Polling.
  • Notification: using Redis publish/subscribe, a client that failed to acquire the lock subscribes to a channel and is notified when the lock is released. The flow is shown in the figure below, taken from https://xiaomi-info.github.io/2019/12/17/redis-distributed-lock/ (a sketch follows after the figure).

(Figure: lock-release notification flow using Redis pub/sub, from the Xiaomi engineering blog linked above.)
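A minimal sketch of the publish/subscribe approach with redis-py; the channel name and the simplified release are illustrative assumptions (a real release would use the Lua check shown earlier before publishing):

import redis

r = redis.Redis()  # assumption: local Redis instance
UNLOCK_CHANNEL = "lock_release:resource_name"   # illustrative channel name

def acquire_with_subscribe(resource, value, ttl_ms=30000, wait_s=10):
    """Try to acquire the lock; if it is held, wait for a release notification and retry."""
    pubsub = r.pubsub(ignore_subscribe_messages=True)
    pubsub.subscribe(UNLOCK_CHANNEL)
    try:
        while True:
            if r.set(resource, value, nx=True, px=ttl_ms):
                return True
            # Block until the lock holder publishes a release message (or we time out).
            if pubsub.get_message(timeout=wait_s) is None:
                return False
    finally:
        pubsub.close()

def release_with_publish(resource, value):
    # Simplified: a real release would check `value` atomically with the Lua script shown earlier.
    r.delete(resource)
    r.publish(UNLOCK_CHANNEL, "released")       # wake up waiting clients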

Summary

The RedLock implementation relies on the clock of each node, which deserves particular attention. This article covered RedLock's design ideas and implementation, some of the controversy around it, and, in the last section, a few supplementary features found in practical implementations.

Reference material

  • https://redis.io/topics/distlock (official documentation)
  • http://martin.kleppmann.com/2016/02/08/how-to-do-distributed-locking.html (Martin Kleppmann's critique of Redlock)
  • http://zhangtielei.com/posts/blog-redlock-reasoning.html (highly recommended)
  • https://www.one-tab.com/page/Wuz27GojRK6uiiBMgKcbwQ (collection of related pages)
