Is the widely praised Redis distributed lock really foolproof?

Within a single JVM instance there are common ways to handle most concurrency problems, such as access control with the synchronized keyword, the volatile keyword, and ReentrantLock. In a distributed environment, however, these methods cannot handle concurrency across JVMs. When a business scenario has to deal with concurrency in a distributed environment, a distributed lock is required.

A distributed lock is a locking mechanism that, in a distributed deployment, lets multiple clients access a shared resource in a mutually exclusive way.

The more common distributed lock implementations today are:

  1. Database-based, such as MySQL
  2. Cache-based, such as Redis
  3. Based on Zookeeper, etcd, and so on.

The previous article, "Database-Based Implementation of a Distributed Lock", described how to implement a distributed lock on top of a database. This article explains how to implement one with a cache (Redis).

The simplest way to implement a distributed lock with Redis is the SETNX command. SETNX (SET if Not eXists) is used as SETNX key value: the key is set to value only if the key does not already exist; if the key exists, SETNX does nothing. SETNX returns 1 when the key was set and 0 when it was not. To acquire the lock, call SETNX directly; to release it, use the DEL command to delete the corresponding key.

This scheme has a fatal flaw: if a thread that has acquired the lock cannot perform the unlock operation because of some abnormal condition (such as a crash), the lock is never released. To fix this, we can add a timeout to the lock. The first thing that comes to mind is the Redis EXPIRE command (EXPIRE key seconds). But we cannot rely on EXPIRE here, because SETNX and EXPIRE are two separate operations and a failure may occur between them, so the desired result is still not guaranteed. For example:

// STEP 1
SETNX key value
// If the program crashes here (between STEP 1 and STEP 2), the expiration time is never set and the lock may never be released
// STEP 2
EXPIRE key expireTime

The correct approach is to use the "SET key value [EX seconds] [PX milliseconds] [NX | XX]" command.

Starting with Redis 2.6.12, the behavior of the SET command can be modified by a series of parameters:

  • EX seconds: set the key's expiration time to the given number of seconds. SET key value EX seconds has the same effect as SETEX key seconds value.
  • PX milliseconds: set the key's expiration time to the given number of milliseconds. SET key value PX milliseconds has the same effect as PSETEX key milliseconds value.
  • NX: set the key only if it does not already exist. SET key value NX has the same effect as SETNX key value.
  • XX: set the key only if it already exists.

For example, to create a distributed lock with an expiration time of 10s, you can execute either of the following commands:

SET lockKey lockValue EX 10 NX
or
SET lockKey lockValue PX 10000 NX

Note that EX and PX cannot be used together; otherwise Redis reports an error: ERR syntax error.

Unlocking is still done with the DEL command.
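The semantics of SET with NX and EX can be modeled in a few lines of plain Java. This is only an in-memory sketch, not a Redis client: the class ExpiringLockStore, its methods, and the manually advanced clock are all hypothetical names introduced for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory model of "SET key value EX seconds NX" (hypothetical, not a Redis client).
class ExpiringLockStore {
    private static final class Entry {
        final String value;
        final long expiresAtMillis;

        Entry(String value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private long nowMillis = 0; // manual clock so the example is deterministic

    // Advance the simulated clock.
    synchronized void advance(long millis) {
        nowMillis += millis;
    }

    // Mimics SET key value EX seconds NX: succeeds only if the key is absent or expired.
    synchronized boolean setIfAbsent(String key, String value, int ttlSeconds) {
        Entry current = entries.get(key);
        if (current != null && current.expiresAtMillis > nowMillis) {
            return false; // the lock is held and has not expired yet
        }
        entries.put(key, new Entry(value, nowMillis + ttlSeconds * 1000L));
        return true;
    }

    // Mimics DEL key.
    synchronized void delete(String key) {
        entries.remove(key);
    }

    public static void main(String[] args) {
        ExpiringLockStore store = new ExpiringLockStore();
        System.out.println(store.setIfAbsent("lockKey", "A", 10)); // true: lock was free
        System.out.println(store.setIfAbsent("lockKey", "B", 10)); // false: A still holds it
        store.advance(10_000);                                     // 10s pass, the key expires
        System.out.println(store.setIfAbsent("lockKey", "B", 10)); // true: B can take over
    }
}
```

Because setting the value and setting the expiration happen in one step, there is no window in which a crash can leave a lock without a timeout.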

After this change the scheme looks perfect, but there is still a problem. Suppose thread A acquires the lock with an expiration time of 10s and then spends 15s executing its business logic. By then, the lock A acquired has already been released automatically by Redis's expiration mechanism, and after those 10s the lock may have been acquired by another thread. When thread A finishes its business logic and unlocks (DEL key), it may delete a lock that another thread now holds.

So the best approach is to check, when unlocking, whether the lock is still one's own. When setting the key, we can set the value to a unique value uniqueValue (a random value, a UUID, a combination of machine number and thread number, a signature, and so on). When unlocking, that is, when deleting the key, first check whether the value stored under the key equals the value set earlier, and delete the key only if they are equal. In pseudo-code:

if uniqueValue == GET(key) {
	DEL key
}

The problem here is obvious at a glance: GET and DEL are two separate operations, and something can go wrong in the gap after GET executes and before DEL does (for instance, the lock may expire and be acquired by another client in between). The problem is solved if the unlock code is made atomic. For this we introduce a new tool, the Lua script:

if redis.call("get",KEYS[1]) == ARGV[1] then
    return redis.call("del",KEYS[1])
else
    return 0
end

Here ARGV[1] is the unique value specified when the key was set.

Because Lua scripts execute atomically, while Redis is running the script, commands from other clients must wait until the script completes.
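The difference between a plain DEL and the compare-and-delete script can be illustrated without Redis. The sketch below uses ConcurrentMap.remove(key, value), which removes an entry only if it currently maps to the given value, as a stand-in for the Lua script. The class OwnerCheckedUnlock and its method names are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// In-memory stand-in for the lock key space (hypothetical, not a Redis client).
class OwnerCheckedUnlock {
    private final ConcurrentMap<String, String> locks = new ConcurrentHashMap<>();

    // Simulates the key being written by a new owner, e.g. after the old lock expired.
    void setOwner(String key, String uniqueValue) {
        locks.put(key, uniqueValue);
    }

    // Plain DEL: removes the key no matter who owns it.
    boolean unsafeUnlock(String key) {
        return locks.remove(key) != null;
    }

    // Compare-and-delete as one atomic step, analogous to the Lua script.
    boolean safeUnlock(String key, String uniqueValue) {
        return locks.remove(key, uniqueValue);
    }

    String owner(String key) {
        return locks.get(key);
    }

    public static void main(String[] args) {
        OwnerCheckedUnlock store = new OwnerCheckedUnlock();
        // A's lock has expired and B has acquired the same key in the meantime.
        store.setOwner("lockKey", "B-unique-value");

        // A finishes late and tries to unlock with its own value: nothing is deleted.
        System.out.println(store.safeUnlock("lockKey", "A-unique-value")); // false
        System.out.println(store.owner("lockKey"));                        // B-unique-value

        // A plain DEL from A would have removed B's lock.
        System.out.println(store.unsafeUnlock("lockKey"));                 // true
        System.out.println(store.owner("lockKey"));                        // null
    }
}
```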

Here is an implementation of lock and unlock using Jedis:

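import java.util.Collections;
import redis.clients.jedis.params.SetParams;

// Acquire the lock: SET lockKey uniqueValue NX EX seconds
public boolean lock(String lockKey, String uniqueValue, int seconds){
    SetParams params = new SetParams();
    params.nx().ex(seconds);
    String result = jedis.set(lockKey, uniqueValue, params);
    // Redis replies "OK" when the key was set; a null reply means the lock is already held
    return "OK".equals(result);
}

// Release the lock only if it still holds our unique value (atomic via Lua)
public boolean unlock(String lockKey, String uniqueValue){
    String script = "if redis.call('get', KEYS[1]) == ARGV[1] " +
            "then return redis.call('del', KEYS[1]) else return 0 end";
    Object result = jedis.eval(script,
            Collections.singletonList(lockKey),
            Collections.singletonList(uniqueValue));
    // Jedis returns an integer reply as a Long, so result.equals(1) would always be false
    return Long.valueOf(1L).equals(result);
}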

So is it foolproof now? Obviously not!

On the surface this method seems to work well, but there is a problem: our architecture has a single point of failure. What if the Redis master node goes down? Some might say: add a slave node, and switch to the slave when the master goes down!

In fact, this scheme is clearly not workable, because Redis replication is asynchronous. For example:

  1. Thread A acquires the lock on the master node.
  2. The master goes down before the key created by A is written to the slave.
  3. The slave is promoted to master.
  4. Thread B acquires the same lock that A already holds. (The original slave has no record yet that A holds the lock.)
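The four steps above can be replayed with two in-memory maps standing in for the master and the slave. This is only a sketch of the failure sequence: the class FailoverDemo, the Node type, and the replicatedBeforeCrash flag are hypothetical and do not model real Redis replication.

```java
import java.util.HashMap;
import java.util.Map;

class FailoverDemo {
    // One node's key space; a hypothetical stand-in for a Redis instance.
    static final class Node {
        final Map<String, String> data = new HashMap<>();

        boolean setIfAbsent(String key, String value) {
            return data.putIfAbsent(key, value) == null;
        }
    }

    // Returns true if A and B both end up believing they hold the lock.
    static boolean bothHoldLock(boolean replicatedBeforeCrash) {
        Node master = new Node();
        Node slave = new Node();

        // Step 1: A acquires the lock on the master.
        boolean aLocked = master.setIfAbsent("lockKey", "A");

        // Asynchronous replication may or may not have caught up before the crash.
        if (replicatedBeforeCrash) {
            slave.data.putAll(master.data);
        }

        // Steps 2 and 3: the master goes down and the slave is promoted.
        Node newMaster = slave;

        // Step 4: B tries to acquire the same lock on the new master.
        boolean bLocked = newMaster.setIfAbsent("lockKey", "B");
        return aLocked && bLocked;
    }

    public static void main(String[] args) {
        System.out.println(bothHoldLock(false)); // true: the key was lost, mutual exclusion is broken
        System.out.println(bothHoldLock(true));  // false: the key was replicated in time
    }
}
```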

Of course, in some scenarios this scheme is perfectly fine. If the business model tolerates the lock occasionally being held by two parties at once, it is a reasonable choice.

For example, a service has two instances, A and B. Initially A acquires the lock and operates on the resource (assume this operation is resource intensive), while B does not hold the lock and does nothing; B can be regarded as A's hot standby. When A fails, B can take over. When the lock itself misbehaves, for example when the Redis master goes down, B may hold the lock and operate on the resource at the same time as A. If the result of the operation is idempotent (or the overlap is otherwise tolerable), this scheme can be used. Here the distributed lock serves to avoid duplicate computation and wasted resources under normal circumstances.

To deal with this, antirez proposed the Redlock algorithm. The main idea is: suppose we have N Redis master nodes that are completely independent of one another. We lock and unlock each of them using the single-master method described above, and if we obtain the lock on at least N / 2 + 1 of them within a reasonable time, we consider the lock acquired; otherwise it is not acquired (compare the Quorum model). Although the principle of Redlock is easy to understand, its implementation details are complex and many factors have to be considered; for details see https://redis.io/topics/distlock. For concrete usage of Redlock, see two articles I reposted earlier: "Redis distributed locks for Best of" and "N postures Redission achieve Redis distributed lock".
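The quorum idea can be sketched as follows. This is a simplified illustration, not the full algorithm from redis.io: the nodes are in-memory stand-ins without real expiration, and the class RedlockSketch, its methods, and the drift allowance (1% of the TTL plus 2 ms) are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RedlockSketch {
    // Hypothetical stand-in for one independent Redis master (no real TTL handling).
    static final class Node {
        private final Map<String, String> data = new HashMap<>();
        boolean up = true;

        boolean tryLock(String key, String value) {
            return up && data.putIfAbsent(key, value) == null;
        }

        void unlock(String key, String value) {
            if (up) {
                data.remove(key, value); // delete only if the value is still ours
            }
        }
    }

    // Majority of N nodes: N / 2 + 1.
    static int quorum(int n) {
        return n / 2 + 1;
    }

    // Remaining validity: TTL minus the time spent acquiring minus a drift allowance.
    static long validityMillis(long ttlMillis, long elapsedMillis, long driftMillis) {
        return ttlMillis - elapsedMillis - driftMillis;
    }

    static boolean acquire(List<Node> nodes, String key, String value, long ttlMillis) {
        long start = System.nanoTime();
        int acquired = 0;
        for (Node node : nodes) {
            if (node.tryLock(key, value)) {
                acquired++;
            }
        }
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        long driftMillis = ttlMillis / 100 + 2; // assumed allowance for clock drift
        if (acquired >= quorum(nodes.size())
                && validityMillis(ttlMillis, elapsedMillis, driftMillis) > 0) {
            return true;
        }
        // No majority (or no validity time left): release whatever was acquired.
        for (Node node : nodes) {
            node.unlock(key, value);
        }
        return false;
    }

    public static void main(String[] args) {
        List<Node> nodes = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            nodes.add(new Node());
        }
        nodes.get(3).up = false; // two of five nodes are unreachable
        nodes.get(4).up = false;

        System.out.println(acquire(nodes, "lockKey", "A", 10_000)); // true: 3 of 5 is a majority
        System.out.println(acquire(nodes, "lockKey", "B", 10_000)); // false: A holds the majority
    }
}
```

A real implementation would also apply a short per-node timeout and rely on the key expiration of each Redis node, which this sketch leaves out.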

The Redlock algorithm is not a "silver bullet": besides its somewhat demanding prerequisites, the algorithm itself has been questioned. Distributed systems expert Martin Kleppmann and Redis author antirez had a debate about the safety of Redis distributed locks. The debate went roughly as follows:

Martin Kleppmann published a blog post called "How to do distributed locking", at https://martin.kleppmann.com/2016/02/08/how-to-do-distributed-locking.html. In it Martin discusses many fundamental problems in distributed systems (especially the asynchronous model of distributed computing), and it is well worth reading for distributed systems practitioners.

Martin's article was published on 2016-02-08, but according to Martin he had sent a draft to antirez for review a week before publication, and the two discussed it by email. Whether or not Martin expected it, antirez responded quickly: the day after Martin's article appeared, antirez published a rebuttal on his own blog, titled "Is Redlock safe?", at http://antirez.com/news/101.

This is a contest between masters. antirez's article is also very clearly organized and goes into many details. In antirez's view, Martin's criticism of Redlock comes down to two points (corresponding to the first and second halves of Martin's article):

  • A distributed lock with automatic expiration must provide some kind of fencing mechanism to guarantee truly exclusive protection of shared resources. Redlock provides no such mechanism.
  • Redlock is built on a system model that is not safe enough. It makes relatively strong timing assumptions about the system, and these cannot be guaranteed in real systems.

antirez rebuts both points.

First, the fencing mechanism. antirez questions Martin's argument: if a fencing mechanism already exists that keeps access to the resource exclusive even when the lock fails, why use a distributed lock at all, and why require it to provide such strong safety guarantees? Taking a step back, although Redlock does not provide the incrementing fencing token Martin describes, the random string (my_random_value) that Redlock generates can achieve the same effect. This random string is not incrementing, but it is unique, so it can be called a unique token.
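To make the idea of an incrementing fencing token concrete, here is a minimal sketch of a resource that rejects writes carrying an older token than one it has already seen. The class FencedResource and the token numbers are hypothetical; how tokens are issued by the lock service is assumed and not shown.

```java
// A shared resource that checks an incrementing fencing token on every write (hypothetical).
class FencedResource {
    private long highestToken = -1;
    private String content = "";

    // Accepts the write unless a newer token has already been seen.
    synchronized boolean write(long fencingToken, String newContent) {
        if (fencingToken < highestToken) {
            return false; // stale lock holder: its lock has been superseded
        }
        highestToken = fencingToken;
        content = newContent;
        return true;
    }

    synchronized String read() {
        return content;
    }

    public static void main(String[] args) {
        FencedResource resource = new FencedResource();

        // Client 1 obtained the lock with token 33 but is paused (e.g. a long GC pause).
        // Its lock expires, and client 2 obtains the lock with token 34 and writes.
        System.out.println(resource.write(34, "written by client 2")); // true

        // Client 1 wakes up and writes with its old token: the write is rejected.
        System.out.println(resource.write(33, "written by client 1")); // false
        System.out.println(resource.read());                           // written by client 2
    }
}
```

A unique but unordered random value cannot be compared this way, which is why the resource here relies on the token being incrementing.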

antirez's rebuttal then concentrates on the second point: the algorithm's timing assumptions in its system model. As noted in the earlier analysis of Martin's article, Martin believes Redlock fails in three situations: 1. clock jumps; 2. long GC pauses; 3. long network delays.

antirez certainly realizes that of the three cases, the one most fatal to Redlock is the first: clock jumps. Once that happens, Redlock cannot work correctly. As for the latter two, Redlock took them into account in its original design and has some immunity to their consequences. So antirez goes on to argue that with proper operations, large clock jumps can be avoided, and that Redlock's clock requirements can be fully met in real systems.

When the immortals fight, we can only watch from the sidelines. Leaving that level of debate aside, to understand Redlock you need to understand the concept that "each node is completely independent". Redis itself has several deployment modes: standalone, master-slave, Sentinel, and Cluster. With the Cluster deployment mode, for example, if you need five nodes, you need to deploy five separate Redis Cluster clusters. Clearly, Redlock's requirement that every master node be independent is somewhat demanding: it consumes more resources, and the extra overhead of requesting a lock from every node cannot be ignored. It is only worth considering when the business genuinely needs it or when existing resources can be reused.

Using Redis for distributed locks is not foolproof. Generally speaking, Redis distributed locks have the advantage in performance, but if reliability is the main concern, components such as Zookeeper and etcd are more reliable than Redis. Of course, in certain scenarios a database-based distributed lock may be more appropriate; see "Database-Based Implementation of a Distributed Lock".

As far as reliability goes, no component is completely reliable. A programmer's value lies not merely in knowing how to use these components, but in building a reliable system on top of unreliable ones.

As the old saying goes, whichever solution you choose, what matters most is that it fits.

References:

  1. https://redis.io/topics/distlock
  2. https://www.jianshu.com/p/7e47a4503b87
  3. http://ifeve.com/redis-lock/
  4. https://www.cnblogs.com/linjiqin/p/8003838.html
  5. http://zhangtielei.com/posts/blog-redlock-reasoning.html
  6. http://zhangtielei.com/posts/blog-redlock-reasoning-part2.html

Readers are welcome to support the author's new books, "In-depth understanding of Kafka: the core design principles and practice" and "RabbitMQ practical guide", and to follow the author's WeChat official account: Zhu servant of the blog.

Origin blog.csdn.net/u013256816/article/details/93305532