Distributed locks with Redis and ZooKeeper

Why use distributed locks?

Before discussing this issue, let's first look at a business scenario:

System A is an e-commerce system, currently deployed on a single machine. It has an interface for users to place orders, but before placing an order the system must check the inventory to make sure there is enough stock.

Because the system has a certain level of concurrency, the product inventory is stored in Redis in advance, and the inventory in Redis is updated when a user places an order.

At this time, the system architecture is as follows:

But this causes a problem: suppose at some moment the inventory of a product in Redis is 1, and two requests arrive at the same time. One of them has executed step 3 in the figure above and updated the database inventory to 0, but has not yet executed step 4.

Meanwhile the other request reaches step 2, finds that the inventory is still 1, and proceeds to step 3.

The result is that two units are sold, even though there is actually only one in stock.

Obviously something is wrong! This is the classic inventory oversell problem.

At this point, a solution comes to mind easily: use a lock around steps 2, 3, and 4, so that only after one thread has finished them can another thread come in and execute step 2.

According to the above figure, when performing step 2, we use synchronized or ReentrantLock provided by Java to acquire a lock, and then release the lock after step 4 finishes.

In this way, steps 2, 3, and 4 are "locked", and multiple threads can only execute them serially.
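As a concrete illustration, the single-machine version can be sketched roughly like this (the class, method names, and the stubbed Redis/database calls are illustrative, not taken from the original system):

import java.util.concurrent.locks.ReentrantLock;

public class OrderService {

    // One lock per JVM: this only works while the system runs on a single machine.
    private final ReentrantLock stockLock = new ReentrantLock();

    public boolean placeOrder(String productId, int quantity) {
        stockLock.lock();            // lock before step 2
        try {
            int stock = checkStockInRedis(productId);           // step 2: check inventory in Redis
            if (stock < quantity) {
                return false;                                   // not enough stock
            }
            updateStockInDb(productId, stock - quantity);       // step 3: update database inventory
            updateStockInRedis(productId, stock - quantity);    // step 4: update Redis inventory
            return true;
        } finally {
            stockLock.unlock();      // release after step 4
        }
    }

    // Illustrative stubs; a real implementation would call Redis and the database.
    private int checkStockInRedis(String productId) { return 1; }
    private void updateStockInDb(String productId, int newStock) { }
    private void updateStockInRedis(String productId, int newStock) { }
}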

But the good times didn't last long: the concurrency of the whole system soared, and one machine could no longer handle it. Now we need to add a machine, as shown below:

After adding the machine, the system becomes as shown in the picture above.

If two user requests arrive at the same time but land on different machines, the two requests can execute concurrently, and the inventory oversell problem appears again.

Why? Because the two instances of System A run in two different JVMs, so the locks they acquire are only valid for threads within their own JVM and have no effect on threads in other JVMs.

Therefore, the problem here is that the native lock mechanism provided by Java fails in a multi-machine deployment scenario.

This is because the locks added by the two machines are not the same lock (the two locks are in different JVMs).

Then, as long as we ensure that the locks added by the two machines are the same, won't the problem be solved?

This is where the distributed lock makes its debut. The idea behind a distributed lock is:

Provide a single, globally unique "thing" in the entire system for acquiring locks. Whenever a system needs to lock, it asks this "thing" for a lock, so that different systems can be considered to be acquiring the same lock.

As for this "thing", it can be Redis, Zookeeper, or a database.

The text description is not very intuitive, let's look at the picture below:

Through the above analysis, we know that in the inventory oversell scenario, Java's native lock mechanism cannot guarantee thread safety when the system is deployed across multiple machines, so we need a distributed lock solution.

So, how do we implement a distributed lock? Read on!

Implementing a distributed lock based on Redis

 

The analysis above explains why distributed locks are used. Now let's look at how to implement a distributed lock in practice.

The most common solution is to use Redis for distributed locks.

The idea of using Redis for a distributed lock is roughly this: set a value in Redis to indicate that the lock is held, and delete the key when the lock is released.

The specific code is like this:

// Acquire the lock
// NX: succeed only if the key does not already exist (fail if it does); PX: set the expiration time in milliseconds
SET anyLock unique_value NX PX 30000


// Release the lock by executing a Lua script.
// Releasing involves two commands (GET and DEL), which are not atomic on their own,
// so we rely on Redis's Lua scripting support: Redis executes a Lua script atomically.
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
else
    return 0
end

There are several main points of this approach:

  • Must use the SET key value NX PX milliseconds command

    If you don't, and instead set the value first and then set the expiration time separately, the operation is not atomic: the process may crash before the expiration time is set, which causes a deadlock (the key exists forever).

  • The value must be unique

    This is so that, when unlocking, you can verify that the value matches the one set when locking before deleting the key.

    This avoids the following situation: suppose client A acquires the lock with a 30s expiration. After 35s the lock has already been released automatically, and client B may have acquired it in the meantime. When A then goes to release the lock, it must not be able to delete B's lock.
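Putting these two points together, a minimal client-side sketch might look like the following (Jedis is used here purely as an example client; the class name and the caller-supplied timeout are illustrative):

import java.util.Collections;
import java.util.UUID;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

public class SimpleRedisLock {

    // Lua script: delete the key only if it still holds our own unique value
    private static final String RELEASE_SCRIPT =
        "if redis.call('get', KEYS[1]) == ARGV[1] then " +
        "  return redis.call('del', KEYS[1]) " +
        "else return 0 end";

    private final Jedis jedis;
    private final String key;
    private final String value = UUID.randomUUID().toString(); // unique per lock holder

    public SimpleRedisLock(Jedis jedis, String key) {
        this.jedis = jedis;
        this.key = key;
    }

    public boolean tryLock(long expireMillis) {
        // SET key value NX PX <millis>: atomic "set if absent, with expiration"
        String reply = jedis.set(key, value, SetParams.setParams().nx().px(expireMillis));
        return "OK".equals(reply);
    }

    public void unlock() {
        jedis.eval(RELEASE_SCRIPT,
                   Collections.singletonList(key),
                   Collections.singletonList(value));
    }
}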

Besides how the client implements the lock, the deployment of Redis itself also needs to be considered.

There are 3 ways to deploy redis:

  • Stand-alone mode

  • master-slave + sentinel election mode

  • redis cluster mode

The drawback of using Redis for distributed locks is that in stand-alone deployment mode there is a single point of failure: as soon as Redis goes down, locking stops working.

In master-slave mode, the lock is only written to one node when locking. Even if high availability is provided through Sentinel, if the master node fails and a master-slave switchover occurs, the lock may be lost.

With these considerations in mind, the author of Redis also thought about this problem and proposed the RedLock algorithm, which roughly works like this:

Assume there are 5 Redis master nodes deployed independently of one another. A lock is acquired through the following steps:

  • Get the current timestamp in milliseconds

  • Try to acquire the lock on each master node in turn, using a short per-node timeout (usually tens of milliseconds) so that an unreachable node does not block the whole process

  • The lock must be established on a majority of the nodes, e.g. at least 3 out of 5 (n/2 + 1)

  • The client computes how long acquiring the lock took; the lock is considered acquired only if this time is less than the lock's validity time

  • If acquisition fails, delete the lock on every node in turn

  • As long as someone else holds the lock, you have to keep retrying to acquire it

However, this algorithm is still quite controversial: it can run into problems, and there is no guarantee that the locking process is always correct.
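Purely as an illustration of the steps listed above (not a complete or production-grade RedLock implementation; it omits clock-drift compensation and per-node request timeouts), the majority-acquisition logic could be sketched like this, again assuming the Jedis client:

import java.util.Collections;
import java.util.List;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

public class RedLockSketch {

    public boolean tryAcquire(List<Jedis> masters, String key, String value, long validityMs) {
        long start = System.currentTimeMillis();                // step 1: current timestamp
        int acquired = 0;
        for (Jedis node : masters) {                            // step 2: try each master in turn
            try {
                String reply = node.set(key, value, SetParams.setParams().nx().px(validityMs));
                if ("OK".equals(reply)) {
                    acquired++;
                }
            } catch (Exception e) {
                // an unreachable node simply does not count toward the quorum
            }
        }
        long elapsed = System.currentTimeMillis() - start;      // step 4: time spent acquiring
        int quorum = masters.size() / 2 + 1;                    // step 3: majority, e.g. 3 out of 5
        if (acquired >= quorum && elapsed < validityMs) {
            return true;                                        // lock considered acquired
        }
        release(masters, key, value);                           // step 5: failed, release on every node
        return false;
    }

    public void release(List<Jedis> masters, String key, String value) {
        String script = "if redis.call('get', KEYS[1]) == ARGV[1] then "
                      + "  return redis.call('del', KEYS[1]) "
                      + "else return 0 end";
        for (Jedis node : masters) {
            try {
                node.eval(script, Collections.singletonList(key), Collections.singletonList(value));
            } catch (Exception e) {
                // ignore nodes that cannot be reached
            }
        }
    }
}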

Another way: Redisson

 

To implement a Redis distributed lock, besides writing your own implementation on top of the Redis client's native API, you can also use an open-source framework: Redisson.

Redisson is an enterprise-grade open-source Redis client that also provides distributed lock support. I highly recommend it. Why?

Recall what I said above: if you write your own code to set a value in Redis, you do it with the following command:

  • SET anyLock unique_value NX PX 30000

The timeout set here is 30s. If the business logic takes longer than 30s, the key expires and another thread may acquire the lock.

Then the first thread has not finished its business logic when a second thread comes in, and we are back to a thread-safety problem. So we would also have to maintain the expiration time ourselves, which is too troublesome.

Let's take a look at how Redisson implements this. First, feel how clean it is to use Redisson:

Config config = new Config();

config.useClusterServers()

.addNodeAddress("redis://192.168.31.101:7001")

.addNodeAddress("redis://192.168.31.101:7002")

.addNodeAddress("redis://192.168.31.101:7003")

.addNodeAddress("redis://192.168.31.102:7001")

.addNodeAddress("redis://192.168.31.102:7002")

.addNodeAddress("redis://192.168.31.102:7003");


RedissonClient redisson = Redisson.create(config);



RLock lock = redisson.getLock("anyLock");

lock.lock();

lock.unlock();

It's that simple: we only need to call lock and unlock in its API to get a distributed lock. Redisson takes care of many details for us:

  • All Redisson commands are executed through Lua scripts, and Redis executes Lua scripts atomically

  • Redisson sets the default expiration time of a key to 30s. What if a client holds a lock for more than 30s?

    Redisson has a "watchdog": after you acquire the lock, every 10 seconds it resets the key's expiration time back to 30s for you.

    This way, even if the lock is held for a long time, the key will not expire and other threads will not grab the lock.

  • Redisson's "watchdog" logic ensures that no deadlock occurs.

    (If the machine is down, the watchdog will disappear. At this time, the key expiration time will not be extended, and it will automatically expire after 30s, and other threads can acquire the lock)

The implementation code is posted here:

 
// Lock acquisition logic

private <T> RFuture<Long> tryAcquireAsync(long leaseTime, TimeUnit unit, final long threadId) {

    if (leaseTime != -1) {

        return tryLockInnerAsync(leaseTime, unit, threadId, RedisCommands.EVAL_LONG);

    }

    // Call a Lua script that sets the lock key and its expiration time

    RFuture<Long> ttlRemainingFuture = tryLockInnerAsync(commandExecutor.getConnectionManager().getCfg().getLockWatchdogTimeout(), TimeUnit.MILLISECONDS, threadId, RedisCommands.EVAL_LONG);

    ttlRemainingFuture.addListener(new FutureListener<Long>() {

        @Override

        public void operationComplete(Future<Long> future) throws Exception {

            if (!future.isSuccess()) {

                return;

            }


            Long ttlRemaining = future.getNow();

            // lock acquired

            if (ttlRemaining == null) {

                // Watchdog logic

                scheduleExpirationRenewal(threadId);

            }

        }

    });

    return ttlRemainingFuture;

}



<T> RFuture<T> tryLockInnerAsync(long leaseTime, TimeUnit unit, long threadId, RedisStrictCommand<T> command) {

    internalLockLeaseTime = unit.toMillis(leaseTime);


    return commandExecutor.evalWriteAsync(getName(), LongCodec.INSTANCE, command,
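              // The Lua script below implements a reentrant lock stored as a Redis hash:
              //  - if the lock key does not exist, create it with field ARGV[2] (client id + thread id) = 1 and set the TTL: lock acquired
              //  - if the key exists and already contains our field, increment it (reentrant acquire) and refresh the TTL
              //  - otherwise the lock is held by someone else: return its remaining TTL in milliseconds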

              "if (redis.call('exists', KEYS[1]) == 0) then " +

                  "redis.call('hset', KEYS[1], ARGV[2], 1); " +

                  "redis.call('pexpire', KEYS[1], ARGV[1]); " +

                  "return nil; " +

              "end; " +

              "if (redis.call('hexists', KEYS[1], ARGV[2]) == 1) then " +

                  "redis.call('hincrby', KEYS[1], ARGV[2], 1); " +

                  "redis.call('pexpire', KEYS[1], ARGV[1]); " +

                  "return nil; " +

              "end; " +

              "return redis.call('pttl', KEYS[1]);",

                Collections.<Object>singletonList(getName()), internalLockLeaseTime, getLockName(threadId));

}




// The watchdog eventually ends up calling this method

private void scheduleExpirationRenewal(final long threadId) {

    if (expirationRenewalMap.containsKey(getEntryName())) {

        return;

    }


    // This task runs after a delay of 10s (internalLockLeaseTime / 3)

    Timeout task = commandExecutor.getConnectionManager().newTimeout(new TimerTask() {

        @Override

        public void run(Timeout timeout) throws Exception {


        // This resets the key's expiration time back to 30s

            RFuture<Boolean> future = renewExpirationAsync(threadId);


            future.addListener(new FutureListener<Boolean>() {

                @Override

                public void operationComplete(Future<Boolean> future) throws Exception {

                    expirationRenewalMap.remove(getEntryName());

                    if (!future.isSuccess()) {

                        log.error("Can't update lock " + getName() + " expiration", future.cause());

                        return;

                    }


                    if (future.getNow()) {

                        // reschedule itself

                        // By recursively calling this method, the expiration time is extended in an endless loop

                        scheduleExpirationRenewal(threadId);

                    }

                }

            });

        }


    }, internalLockLeaseTime / 3, TimeUnit.MILLISECONDS);


    if (expirationRenewalMap.putIfAbsent(getEntryName(), new ExpirationEntry(threadId, task)) != null) {

        task.cancel();

    }

}
 
In addition, Redisson also provides support for the RedLock algorithm, and it is simple to use:

RedissonClient redisson = Redisson.create(config);
RLock lock1 = redisson.getFairLock("lock1");
RLock lock2 = redisson.getFairLock("lock2");
RLock lock3 = redisson.getFairLock("lock3");
RedissonRedLock multiLock = new RedissonRedLock(lock1, lock2, lock3);
multiLock.lock();
multiLock.unlock();

Summary:

This section analyzed the concrete approach of using Redis for distributed locks and some of its limitations, and then introduced Redisson, a Redis client framework. Redisson is also what I recommend: compared with writing the implementation yourself, it takes care of many more details.

Implementing a distributed lock based on ZooKeeper

There are two types of locks: shared locks (read locks) and exclusive locks (write locks). Read lock: when one thread holds the read lock, other threads can also acquire the read lock, but no thread can acquire the write lock until all read locks are released. Write lock: when a thread holds the write lock, no other thread can acquire either a read lock or a write lock.

Zookeeper has a node type called an ephemeral sequential node, which is automatically numbered in creation order and deleted when the session ends. It can be used as a building block for distributed locks.

How a read lock is acquired:

  1. Create an ephemeral sequential node for the resource id, e.g. /lock/mylockR0000000005 (R marks a read lock).

  2. Get all child nodes under /lock and check whether every node with a smaller sequence number is a read lock. If they all are, the read lock is acquired successfully.

  3. If not, block and wait, watching the previous node.

  4. When the previous node changes, execute step 2 again.

How a write lock is acquired:

  1. Create an ephemeral sequential node for the resource id, e.g. /lock/mylockW0000000006 (W marks a write lock).

  2. Get all child nodes under /lock and check whether your own node has the smallest sequence number. If it does, the write lock is acquired successfully.

  3. If not, block and wait, watching the previous node.

  4. When the previous node changes, execute step 2 again.

The behavior is easier to see in a picture: the write lock in the figure is blocked because its node is not the first one, and the 008 read lock is blocked because not all of the nodes ahead of it are read locks.
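Curator (the ZooKeeper client introduced later in this article) ships a ready-made read/write lock recipe built on exactly this principle. A minimal usage sketch, assuming an already-started CuratorFramework instance named client and an illustrative lock path:

// org.apache.curator.framework.recipes.locks.InterProcessReadWriteLock
InterProcessReadWriteLock rwLock = new InterProcessReadWriteLock(client, "/lock/mylock");

rwLock.readLock().acquire();     // multiple clients may hold the read lock at the same time
// ... read the shared resource ...
rwLock.readLock().release();

rwLock.writeLock().acquire();    // exclusive: waits until all read/write locks are released
// ... modify the shared resource ...
rwLock.writeLock().release();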

 

Zookeeper is a centralized service that provides configuration management, distributed coordination, and naming.

The zk data model is like this: zk contains a set of nodes called znodes, organized like a file system where each znode acts like a directory. Znodes have the following characteristics (a small code illustration follows the list):

  • Ordered node: If there is currently a parent node /lock, we can create child nodes under this parent node;

    Zookeeper provides an optional sequential feature. For example, if we create a child node "/lock/node-" and mark it as sequential, zookeeper automatically appends an integer suffix based on the current number of child nodes when creating the node.

    In other words, if it is the first child node created, then the generated child node is /lock/node-0000000000, the next node is /lock/node-0000000001, and so on.

  • Ephemeral node: a client can create an ephemeral (temporary) node, which Zookeeper automatically deletes when the session ends or times out.

  • Watches (event listeners): when reading data, we can also set a watch on the node. When the node's data or structure changes, zookeeper notifies the client. Zookeeper currently supports the following four events:

    • Node creation

    • Node delete

    • Node data modification

    • Child node change
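To make these features concrete, here is a small sketch using the native ZooKeeper client (the connection string, paths, and session timeout are examples only, and the /lock parent node is assumed to already exist):

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ZnodeFeaturesDemo {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 30000, event -> { });

        // Ordered + ephemeral node: ZooKeeper appends an increasing suffix,
        // e.g. /lock/node-0000000000, and deletes the node when the session ends.
        String path = zk.create("/lock/node-", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);

        // Watch: get notified once when this node is deleted or its data changes.
        zk.exists(path, event -> System.out.println("event: " + event.getType()));

        zk.close();
    }
}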

Based on these characteristics of zk, we can easily come up with a scheme for implementing distributed locks with zk:

  1. Using zk's ephemeral sequential nodes, each thread that wants the lock creates an ephemeral sequential node in zk, for example under the /lock/ directory.

  2. After the node is created successfully, get all the child nodes under the /lock directory and determine whether the node created by the current thread has the smallest sequence number of all nodes.

  3. If the node created by the current thread is the node with the smallest sequence number of all nodes, it is considered that the lock is acquired successfully.

  4. If the node created by the current thread does not have the smallest sequence number, add a watch on the node immediately before it in sequence order.

    For example, if the node created by the current thread is /lock/003 and the full node list is [/lock/001, /lock/002, /lock/003], then it adds a watch on /lock/002.

When the lock is released, the thread watching the next node in sequence is woken up, and it executes step 3 again to determine whether its own node now has the smallest sequence number.

For example, when /lock/001 is released and /lock/002 receives the watch event, the node list becomes [/lock/002, /lock/003]; /lock/002 now has the smallest sequence number, so it acquires the lock.

The whole process is as follows:

That is the implementation idea. The full code is fairly involved, so only a minimal sketch of the core flow is shown below.
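A rough sketch using the native ZooKeeper client (the paths, class name, and helper structure are illustrative; the /lock parent node is assumed to already exist):

import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class ZkLockSketch {

    private final ZooKeeper zk;
    private String ourNode;   // full path of the node we created, e.g. /lock/node-0000000003

    public ZkLockSketch(ZooKeeper zk) {
        this.zk = zk;
    }

    public void lock() throws Exception {
        // Step 1: create an ephemeral sequential node under /lock
        ourNode = zk.create("/lock/node-", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
        while (true) {
            // Step 2: list all children and sort them by sequence number
            List<String> children = zk.getChildren("/lock", false);
            Collections.sort(children);
            int index = children.indexOf(ourNode.substring("/lock/".length()));

            // Step 3: if our node has the smallest sequence number, we hold the lock
            if (index == 0) {
                return;
            }

            // Step 4: otherwise watch the node just before ours and wait for it to change
            String previous = "/lock/" + children.get(index - 1);
            CountDownLatch latch = new CountDownLatch(1);
            Stat stat = zk.exists(previous, event -> latch.countDown());
            if (stat != null) {
                latch.await();   // woken up when the previous node changes, then re-check
            }
            // if stat == null, the previous node was already deleted: loop and re-check
        }
    }

    public void unlock() throws Exception {
        zk.delete(ourNode, -1);   // -1 skips the version check
    }
}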

Introduction to Curator



Curator is an open-source ZooKeeper client that also provides a distributed lock implementation.

Its usage is quite simple:

InterProcessMutex interProcessMutex = new InterProcessMutex(client,"/anyLock");

interProcessMutex.acquire();

interProcessMutex.release();

The core source code of its distributed lock implementation is as follows:

private boolean internalLockLoop(long startMillis, Long millisToWait, String ourPath) throws Exception

{

    boolean  haveTheLock = false;

    boolean  doDelete = false;

    try {

        if ( revocable.get() != null ) {

            client.getData().usingWatcher(revocableWatcher).forPath(ourPath);

        }


        while ( (client.getState() == CuratorFrameworkState.STARTED) && !haveTheLock ) {

            // Get the sorted list of all current child nodes

            List<String>        children = getSortedChildren();

            // Get the name of the current node

            String              sequenceNodeName = ourPath.substring(basePath.length() + 1); // +1 to include the slash

            // Determine whether the current node is the smallest node

            PredicateResults    predicateResults = driver.getsTheLock(client, children, sequenceNodeName, maxLeases);

            if ( predicateResults.getsTheLock() ) {

                // Lock acquired

                haveTheLock = true;

            } else {

                // Lock not acquired: register a watcher on the node immediately before the current one

                String  previousSequencePath = basePath + "/" + predicateResults.getPathToWatch();

                synchronized(this){

                    Stat stat = client.checkExists().usingWatcher(watcher).forPath(previousSequencePath);

                    if ( stat != null ){

                        if ( millisToWait != null ){

                            millisToWait -= (System.currentTimeMillis() - startMillis);

                            startMillis = System.currentTimeMillis();

                            if ( millisToWait <= 0 ){

                                doDelete = true;    // timed out - delete our node

                                break;

                            }

                            wait(millisToWait);

                        }else{

                            wait();

                        }

                    }

                }

                // else it may have been deleted (i.e. lock released). Try to acquire again

            }

        }

    }

    catch ( Exception e ) {

        doDelete = true;

        throw e;

    } finally{

        if ( doDelete ){

            deleteOurPath(ourPath);

        }

    }

    return haveTheLock;

}

 

In fact, the underlying principle of Curator's distributed lock is the same as the analysis above. Here is a picture describing its principle in detail:

Summary:

This section introduced the ZooKeeper-based implementation of distributed locks and the basic usage of Curator, zk's open-source client, and briefly explained its implementation principles.

Comparison of the advantages and disadvantages of the two schemes

 

Having learned the two implementation schemes for distributed locks, this section discusses the advantages and disadvantages of the Redis and zk approaches.

Redis distributed locks have the following disadvantages:

  • The way it acquires the lock is crude: if it fails to acquire the lock, it keeps retrying, which wastes performance.

  • In addition, Redis's design positioning means its data is not strongly consistent, so problems can occur in extreme cases; the lock model is not robust enough.

  • Even with the RedLock algorithm, there is no guarantee that the implementation is 100% problem-free in complex scenarios. For a discussion of RedLock, see the article "How to do distributed locking".

  • Redis distributed locks actually need to keep trying to acquire locks by themselves, which consumes performance.

But on the other hand, using Redis to implement distributed locks is very common in many companies, and in most cases you will not run into the so-called "extremely complex scenarios".

Therefore, using redis as a distributed lock is also a good solution. The most important point is that redis has high performance and can support high-concurrency acquisition and release lock operations.

For zk distributed locks:

  • Zookeeper's natural design positioning is distributed coordination and strong consistency. The lock model is robust, easy to use, and suitable for distributed locks.

  • If you can't get the lock, you only need to add a listener instead of polling all the time, so the performance overhead is small.

But zk also has its shortcomings: if many clients frequently acquire and release locks, the pressure on the zk cluster is higher.

Summary:

In summary, Redis and ZooKeeper each have advantages and disadvantages, which we can use as reference factors when making a technology selection.

Recommendation

 

Through the previous analysis we have seen two common solutions for implementing distributed locks, Redis and ZooKeeper, each with its own merits. How should we choose?

Personally, I prefer the lock implemented by zk:

Because Redis has potential pitfalls that may lead to incorrect data. However, the choice depends on the specific situation in your company.

If the company already has a zk cluster, zk is preferred; but if the company only has Redis clusters and has no way to set up a zk cluster, then Redis can be used instead. In addition, the system designer may choose Redis simply because the system already uses Redis and they don't want to introduce yet another external dependency.

This is an architectural trade-off for the system designer to make.
