[Java Interview] Several common distributed locks

Foreword

As the Internet has grown, more and more scenarios involve high concurrency and massive data processing. To build highly available, scalable systems, distributed architectures are commonly used; they avoid single points of failure and the CPU and memory bottlenecks of a single ordinary machine.

However, distributed systems also introduce data consistency problems. For example, when users rush to buy flash-sale (seckill) products, several machines may process orders at the same time and oversell the stock. Some people confuse distributed locks with thread safety. Thread safety concerns coordination between threads inside one process; coordination between multiple processes, often on different machines, requires a distributed lock. This article summarizes several common distributed locks.

Database-based

Pessimistic locking - based on transactions

For example, in a flash sale, several machines receive purchase requests at the same time. The database operations involved (reading the inventory, checking availability, taking payment, and deducting the inventory) can be placed in a single transaction. Once one machine has opened a connection and started the purchase transaction, the other machines can operate on the same data only after that transaction completes, provided the transaction locks the rows it reads. In practice, inventory and order processing are often two independent systems. The transaction then becomes a distributed transaction, which requires a two-phase or three-phase commit protocol.
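The single-database case might look roughly like the JDBC sketch below. It is only an illustration under assumptions: the inventory table, its product_id and stock columns, and the use of SELECT ... FOR UPDATE to lock the row are not from the original description, and payment handling is left as a comment.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

class PessimisticPurchase {
    // Hypothetical schema: inventory(product_id, stock).
    static final String LOCK_ROW_SQL =
        "SELECT stock FROM inventory WHERE product_id = ? FOR UPDATE";
    static final String DEDUCT_SQL =
        "UPDATE inventory SET stock = stock - 1 WHERE product_id = ?";

    /** Returns true if one unit was bought, false if the product is sold out. */
    static boolean purchase(Connection conn, long productId) throws SQLException {
        boolean oldAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // everything below runs in one transaction
        try {
            int stock;
            // The locking read makes other transactions wait on this row.
            try (PreparedStatement lock = conn.prepareStatement(LOCK_ROW_SQL)) {
                lock.setLong(1, productId);
                try (ResultSet rs = lock.executeQuery()) {
                    stock = rs.next() ? rs.getInt(1) : 0;
                }
            }
            if (stock <= 0) {
                conn.rollback();
                return false;
            }
            // Payment handling would go here, inside the same transaction.
            try (PreparedStatement deduct = conn.prepareStatement(DEDUCT_SQL)) {
                deduct.setLong(1, productId);
                deduct.executeUpdate();
            }
            conn.commit();
            return true;
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        } finally {
            conn.setAutoCommit(oldAutoCommit);
        }
    }
}
```

The caller supplies a Connection from whatever driver or pool the application already uses; the sketch does not cover the distributed-transaction case.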

Advantages: It is a relatively safe implementation.

Disadvantages: The overhead is hard to tolerate under high concurrency, and database deadlocks occur easily.

Optimistic locking - based on version number

Optimistic locking is often used in distributed systems when updating a specific table in the database. Consider online seat selection: users A and B select the same seat for a movie at the same time, and both try to set the seat status to sold.

Imagine this execution sequence:

1. User A checks the seat and finds it unsold;

2. User B checks the seat and finds it unsold;

3. User A executes the update and marks the seat as sold;

4. User B executes the update and marks the seat as sold.

In this case, the same seat is sold twice. The solution is to add a version number column to the table. Read the version number before performing the operation, put it in the WHERE clause of the UPDATE statement, and increment the version in the same statement. If a row was updated, the operation succeeded; if no row was updated, the operation failed.

Execution sequence with optimistic locking added:

1. User A queries the seat and finds that the seat is unsold and the version number is 5;

2. User B queries the seat and finds that the seat is unsold and the version number is 5;

3. User A executes the update statement to update the seat status to sold and the version number to 6;

4. When user B executes the update statement, the seat's version number is already 6, so no row matches version number 5 and the update fails.
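The sequence above can be simulated in plain Java. This is only a sketch: the class VersionedSeat and its methods are invented for illustration and stand in for a real seat table; the SQL in the comments is an assumed shape for the statements, not taken from any particular schema.

```java
class VersionedSeat {
    private boolean sold = false;
    private int version = 5; // starting version, as in the sequence above

    /** Mimics: SELECT version FROM seat WHERE id = ? */
    synchronized int readVersion() {
        return version;
    }

    /**
     * Mimics: UPDATE seat SET sold = 1, version = version + 1
     *         WHERE id = ? AND sold = 0 AND version = ?
     * Returns the number of rows updated: 1 on success, 0 on failure.
     */
    synchronized int markSold(int expectedVersion) {
        if (sold || version != expectedVersion) {
            return 0; // no row matches the WHERE clause
        }
        sold = true;
        version++;
        return 1;
    }

    public static void main(String[] args) {
        VersionedSeat seat = new VersionedSeat();
        int versionSeenByA = seat.readVersion();
        int versionSeenByB = seat.readVersion();
        System.out.println("A updated rows: " + seat.markSold(versionSeenByA)); // 1
        System.out.println("B updated rows: " + seat.markSold(versionSeenByB)); // 0
    }
}
```

User B sees an update count of 0 and can then report that the seat is taken or read the row again.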

Advantages: Optimistic locking performs better than pessimistic locking, and deadlocks are unlikely.

Disadvantages: Optimistic locking can only protect data in a single table. If the operations that need a distributed lock span multiple tables, version-number-based optimistic locking cannot do the job.

Based on memcached

A lock can be built on memcached's add command. add fails if the key already exists and succeeds if it does not. The command is atomic and accepts an expiration time, so when several clients add the same key concurrently, only one of them succeeds.

The idea is as follows: define a key to serve as the distributed lock. If adding the key with an expiration time succeeds, perform the business operation, and when it finishes, check whether the lock has expired. If the lock has expired, do not delete it; if it has not expired, delete it. The expiration time exists so that a lock held by a machine that has gone down does not stay unreleased forever.

Many memcached-based lock implementations skip the expiry check and delete the lock directly after the business operation finishes, which leads to the following problem.

Imagine this execution sequence:

1. Machine A successfully adds the lock key with an expiration time;

2. Machine A pauses for a long time while performing its business operation, for example in a long GC pause;

3. While machine A is still paused, the lock expires, and machine B successfully adds the lock key with a new expiration time;

4. Machine A recovers from the pause, finishes its business operation, and deletes the lock that machine B added;

5. Machine B's business operation now runs without lock protection.

However, memcached has no dedicated command for checking whether a key exists. One option is to subtract the time at which the lock was acquired from the time at which the business operation finished, and compare that execution time with the lock's expiration time. Another is to store the acquisition time plus the expiration time as the value of the lock key, and after the business operation compare that stored value with the current clock.

Note: The expiration time must be longer than the execution time of the business operation.
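The first approach (compare the elapsed time with the expiration time before deleting) can be sketched as follows. Everything here is an assumption made for illustration: FakeCache is an in-memory stand-in that models only add-with-expiry and delete, not a real memcached client, and the clock is injected so that the long pause can be simulated.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

class AddLockSketch {
    /** In-memory stand-in for memcached; only add-with-expiry and delete are modeled. */
    static class FakeCache {
        private final Map<String, Long> expiresAt = new HashMap<>();
        private final LongSupplier clock;

        FakeCache(LongSupplier clock) { this.clock = clock; }

        /** Like add: succeeds only if the key is absent or has already expired. */
        synchronized boolean add(String key, long ttlMillis) {
            long now = clock.getAsLong();
            Long expiry = expiresAt.get(key);
            if (expiry != null && expiry > now) return false;
            expiresAt.put(key, now + ttlMillis);
            return true;
        }

        synchronized void delete(String key) { expiresAt.remove(key); }
    }

    /** Runs the task under the lock; returns false if the lock was not acquired. */
    static boolean runWithLock(FakeCache cache, LongSupplier clock, String key,
                               long ttlMillis, Runnable task) {
        long lockedAt = clock.getAsLong();
        if (!cache.add(key, ttlMillis)) return false;
        task.run();
        long elapsed = clock.getAsLong() - lockedAt;
        if (elapsed < ttlMillis) {
            cache.delete(key); // the lock has not expired, so it is still ours
        }
        // Otherwise the lock expired and may now belong to another machine: leave it.
        return true;
    }

    public static void main(String[] args) {
        long[] now = {0L}; // controllable clock for the demo
        LongSupplier clock = () -> now[0];
        FakeCache cache = new FakeCache(clock);
        // Machine A holds the lock but "pauses" for longer than the 1000 ms expiry.
        runWithLock(cache, clock, "lock", 1000L, () -> {
            now[0] += 1500L;
            System.out.println("B acquired during A's pause: " + cache.add("lock", 1000L));
        });
        System.out.println("B's lock survived A's release: " + !cache.add("lock", 1000L));
    }
}
```

Note that the elapsed-time check and the delete are still two separate steps, so a pause between them can reopen the same window; the check narrows the problem rather than removing it.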

Advantages: Higher performance than database-based implementations.

Based on Redis

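Redis provides the atomic setNx (SETNX) operation, and Redis-based distributed locks are built on it. setNx fails if the key already exists and succeeds if it does not, but setNx itself cannot set a timeout. (Since Redis 2.6.12, SET with the NX and EX/PX options can set a value and an expiration atomically; the scheme here uses plain setNx.)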

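A Redis-based distributed lock works as follows:

1. setNx a lock key whose value is the current time plus the expiration time;

2. If setNx succeeds, or the current clock is later than the timestamp stored under the key, the lock is acquired; otherwise acquisition fails and the client exits;

3. Once the lock is acquired, perform the business operation (access the shared data source);

4. When releasing the lock, check whether the current clock is earlier than the value of the lock key; if it is, delete the lock key.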
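The four steps can be sketched with an in-memory map standing in for a single Redis node. This is an illustration under assumptions, not a real Redis client: putIfAbsent plays the role of setNx, and the atomic replace used to take over an expired lock is a choice made for this sketch, since the steps do not say how the takeover is performed.

```java
import java.util.concurrent.ConcurrentHashMap;

class SetNxLockSketch {
    /**
     * Steps 1 and 2: returns the expiry timestamp written under the key on
     * success, or -1 if the lock is held by someone else.
     */
    static long tryLock(ConcurrentHashMap<String, Long> redis, String key,
                        long nowMillis, long ttlMillis) {
        long myExpiry = nowMillis + ttlMillis;
        if (redis.putIfAbsent(key, myExpiry) == null) {
            return myExpiry; // setNx succeeded
        }
        Long current = redis.get(key);
        // The stored timestamp has passed: take the lock over atomically
        // (assumed behaviour for this sketch).
        if (current != null && nowMillis > current
                && redis.replace(key, current, myExpiry)) {
            return myExpiry;
        }
        return -1;
    }

    /** Step 4: delete the key only while the lock has not yet expired. */
    static void unlock(ConcurrentHashMap<String, Long> redis, String key, long nowMillis) {
        Long expiry = redis.get(key);
        if (expiry != null && nowMillis < expiry) {
            redis.remove(key);
        }
    }

    public static void main(String[] args) {
        ConcurrentHashMap<String, Long> redis = new ConcurrentHashMap<>();
        System.out.println("A at t=1000: " + tryLock(redis, "lock", 1000L, 500L));
        System.out.println("B at t=1200: " + tryLock(redis, "lock", 1200L, 500L));
        unlock(redis, "lock", 1300L); // step 3 (the business operation) would run before this
        System.out.println("B at t=1400: " + tryLock(redis, "lock", 1400L, 500L));
    }
}
```

Timestamps are passed in explicitly so the behaviour is easy to follow; a real client would read the system clock, which means the scheme also depends on the machines' clocks agreeing reasonably well.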

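Note: This implements a distributed lock well on a single Redis node. With a Redis cluster, the master may go down. If the master fails before the lock key has been replicated to a slave node, machine B can acquire a duplicate lock from the new master.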

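Imagine the following execution sequence:

1. Machine A runs setNx on the lock key with a value of the current time plus the expiration time, and the master stores the lock key;

2. The master goes down, a new master is elected, and the new master is still synchronizing data;

3. The new master does not contain the lock key, so machine B's setNx on the lock key, with a value of the current time plus the expiration time, succeeds.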

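Machine A and machine B now both hold the same lock. One way to address this is to refine step 3 of the locking scheme: keep the value of the lock key in local memory, and before accessing the shared data source, check again whether the locally stored value equals the value of the lock key currently in Redis. If they are equal, the lock is held; if not, another machine has modified the lock key in the meantime and acquisition fails. Likewise, in step 4, besides checking that the current clock is earlier than the value of the lock key, the client can also check that its stored value equals the current value, and delete the key only if they match.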

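The execution sequence then becomes:

1. Machine A runs setNx on the lock key with a value of the current time plus the expiration time, and the master stores the lock key;

2. The master goes down, a new master is elected, and the new master is still synchronizing data;

3. Machine B runs setNx on the lock key with a value of its current time plus the expiration time;

4. When machine A checks again, its locally stored value differs from the current value of the lock key, so machine A fails to acquire the lock;

5. When machine B checks again, its locally stored value equals the current value of the lock key, so machine B acquires the lock.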
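The second check and the guarded delete can be sketched like this. Again the maps are in-memory stand-ins for the old and new master rather than real Redis nodes, and the names stillHolds and release are invented for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class RecheckLockSketch {
    /** Second check before touching shared data: is the value in Redis still ours? */
    static boolean stillHolds(Map<String, Long> redis, String key, long myValue) {
        Long current = redis.get(key);
        return current != null && current == myValue;
    }

    /** Release: delete only if the lock has not expired and the value is still ours. */
    static boolean release(ConcurrentHashMap<String, Long> redis, String key,
                           long myValue, long nowMillis) {
        if (nowMillis >= myValue) {
            return false; // already expired, leave the key alone
        }
        return redis.remove(key, myValue); // removes only if the value matches
    }

    public static void main(String[] args) {
        ConcurrentHashMap<String, Long> oldMaster = new ConcurrentHashMap<>();
        long valueA = 1_000L + 500L;
        oldMaster.putIfAbsent("lock", valueA); // A's setNx on the old master

        // Failover: the new master never received the lock key.
        ConcurrentHashMap<String, Long> newMaster = new ConcurrentHashMap<>();
        long valueB = 1_020L + 500L;
        newMaster.putIfAbsent("lock", valueB); // B's setNx on the new master

        System.out.println("A holds: " + stillHolds(newMaster, "lock", valueA)); // false
        System.out.println("B holds: " + stillHolds(newMaster, "lock", valueB)); // true
    }
}
```

Two machines that happen to compute the same timestamp would be indistinguishable here, so a real implementation would probably append something unique to the value, such as a machine identifier.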

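Note: If the distributed lock is used for efficiency, for example several machines are deployed to run scheduled jobs and only one of them should run a given job at a time, occasional failures are acceptable and a single-node Redis lock is sufficient. If the lock is used for correctness, it is better to use the Redis lock with the second check; it is slower, but it is correct more often.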

Based on ZooKeeper

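The general idea of a ZooKeeper-based distributed lock is:

1. The client tries to create an ephemeral znode /lock;

2. If creation succeeds, the lock is acquired and the client may access the shared data source; if creation fails, another client already holds the lock and this attempt fails;

3. After a client that holds the lock finishes accessing the shared data source, it deletes the znode /lock.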

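A ZooKeeper-based lock needs no expiration time. The node is ephemeral: if ZooKeeper receives no heartbeat from the client within the session timeout, it considers the session expired and deletes all ephemeral nodes created by that client.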

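However, two machines can still end up holding the lock at the same time. Imagine the following execution sequence:

1. Machine A creates the znode /lock;

2. Machine A starts its operation and enters a long GC pause;

3. Machine A's session with ZooKeeper expires, and the /lock node is deleted;

4. Machine B creates the znode /lock;

5. Machine A recovers from the long pause;

6. Machine A and machine B now both believe they hold the lock.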

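Compared with the Redis-based lock, the ZooKeeper-based lock can add a watch mechanism: a machine that fails to create /lock can wait, and ZooKeeper notifies it through the watch when /lock is deleted. However, this approach only suits small client clusters. If many clients are waiting for /lock to be deleted, ZooKeeper has to notify all of them when the node is deleted, which hurts ZooKeeper's performance considerably. This is the so-called herd effect.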

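The general idea of the optimization is:

1. The client creates a node named "lock/number_lock_" of type EPHEMERAL_SEQUENTIAL;

2. The client fetches all children of the lock node;

3. The client checks whether its own node has the smallest sequence number. If so, the lock is acquired; if not, it registers a watch on the node whose sequence number is closest below its own;

4. When the watched node is deleted, the client fetches all children of the lock node again and checks whether its node now has the smallest sequence number; if so, the lock is acquired.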

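The optimized approach avoids the herd effect to some extent, but it still cannot prevent two machines from holding the lock at the same time.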
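The decision each client makes in steps 3 and 4 of the optimized scheme (am I the smallest node, and if not, which node do I watch?) can be sketched without a ZooKeeper connection. The helper names are invented for this example, and the child names assume the zero-padded numeric suffix that ZooKeeper appends to sequential nodes; creating the node, listing the children, and registering the watch are left to a real client.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

class SequentialLockSketch {
    /** Parses the numeric suffix after the last underscore, e.g. number_lock_0000000003. */
    static int sequenceOf(String child) {
        return Integer.parseInt(child.substring(child.lastIndexOf('_') + 1));
    }

    /**
     * Returns empty if myNode has the smallest sequence number (lock acquired),
     * otherwise the child with the closest smaller sequence number, which is
     * the only node this client needs to watch.
     */
    static Optional<String> nodeToWatch(List<String> children, String myNode) {
        List<String> sorted = new ArrayList<>(children);
        sorted.sort(Comparator.comparingInt(SequentialLockSketch::sequenceOf));
        int myIndex = sorted.indexOf(myNode);
        if (myIndex < 0) {
            throw new IllegalArgumentException("unknown node: " + myNode);
        }
        return myIndex == 0 ? Optional.empty() : Optional.of(sorted.get(myIndex - 1));
    }

    public static void main(String[] args) {
        List<String> children = Arrays.asList(
            "number_lock_0000000003", "number_lock_0000000001", "number_lock_0000000002");
        System.out.println(nodeToWatch(children, "number_lock_0000000001")); // Optional.empty
        System.out.println(nodeToWatch(children, "number_lock_0000000003"));
    }
}
```

Because each waiting client watches only its immediate predecessor, deleting one node wakes up a single client instead of all of them.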


Origin http://43.154.161.224:23101/article/api/json?id=325771236&siteId=291194637