Understanding Distributed Lock Technical Solutions

weixin_33709364

Foreword

In my daily work, online services are deployed across multiple servers, so I often have to solve data consistency problems in a distributed setting. Distributed locks are the tool I use to solve them.

Step one: my business scenarios:

The projects I work on day to day involve the following business scenarios:

Scenario one: task assignment. This is the company's back-office system, used mainly by auditors, so concurrency is not very high. However, the assignment rule is pull-based: each time an auditor actively requests work, the server picks tasks at random from the task pool and assigns them. Described this way the scenario looks fairly simple. The real assignment process also involves things like user clustering, so it is more complex than what I describe here, but a simplified picture is enough to illustrate the problem. The main thing to avoid is two auditors acquiring the same task at the same time. I ended up solving it with a distributed lock based on a database resource table.

Scenario two: payment. Here I offer each user three privacy-protection phone numbers (obtained from the carriers, and they look just like real mobile numbers) and let the user pick one to buy. After the user pays, I need to assign the chosen number to that user and release the ones that were not chosen. During this process the numbers shown to a user must be held for that user for a certain period (long enough for a normal selection), so that the user has exclusive access to the product and is 100% guaranteed to get it after paying. At the same time, the resource pool is limited, so resources must keep circulating and cannot be held by one user for a long time. The design target at launch was to support a peak of at least 300 QPS, with user experience also taken into account. I ended up solving it with the memcached add() method plus a distributed lock based on a database resource table.

Scenario three: I run a data service that receives 300 million calls a day. Spread over 86,400 seconds, that is roughly 4,000 QPS on average (300,000,000 / 86,400 ≈ 3,472). Because traffic is much higher during the day than at night, the daytime peaks reach 6,000 QPS. There are four servers in total, and each one must be able to handle more than 3,000 QPS. I ended up solving the distributed lock problem with Redis setnx() and expire().

Scenario four: an upgraded version of scenarios one and two. No payment is involved here. However, the resource allocation process involves more places where consistency must be maintained, and the design target is a peak of 500 QPS, so the approach needs further optimization. I ended up solving it with Redis setnx() and expire() plus a distributed lock based on a database table.

Reading this far, you may think the QPS in my scenarios is not very large. I hope you will read on anyway, because whatever company you are at, the work usually starts with the simplest cases. Don't compare with the QPS of Alibaba's or Tencent's business scenarios: at that scale you may not be personally involved in the project; if you are involved, you may not be the core designer; and if you are the core designer, you may not be designing it on your own. If all three of those really do apply to you, feel free to close this page. If not, I suggest you keep reading. I welcome suggestions on anything I got wrong or left out, and if you find something useful, a like or a comment would be the greatest encouragement to me.

Step two: distributed lock solutions:

  1. First, to be clear: some people may ask whether ReentrantLock can be used. It cannot. ReentrantLock requires lock and unlock to happen on the same thread, and it only exists inside a single JVM. In a distributed application, lock and unlock are two unrelated requests, so they are certainly not on the same thread, and ReentrantLock is unusable.
  2. Use optimistic locking on a database table.
  3. Use the memcached add() method.
  4. Use the memcached cas() method. (rarely used)
  5. Use the Redis setnx() and expire() methods.
  6. Use the Redis setnx(), get(), and getset() methods.
  7. Use the Redis watch, multi, and exec commands. (rarely used)
  8. Use ZooKeeper. (rarely used)
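
Item 1 can be illustrated with a small sketch. It uses Python's threading.RLock as a stand-in for Java's ReentrantLock (an assumption made only for illustration; the two are analogous, not identical). A lock acquired on one thread cannot be released from another, which is exactly the situation when lock and unlock arrive as two unrelated requests.

```python
import threading

lock = threading.RLock()
lock.acquire()  # the "lock" request is handled on the main thread

errors = []


def unlock_request():
    # The "unlock" request is handled on a different thread.
    try:
        lock.release()
    except RuntimeError as exc:
        errors.append(exc)


worker = threading.Thread(target=unlock_request)
worker.start()
worker.join()

released_by_other_thread = not errors
lock.release()  # only the owning thread is allowed to release
```

Across servers the situation is even worse, since an in-process lock object is not visible to other processes at all.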

Step three: a distributed lock using optimistic locking on a database resource table:

1. First, what optimistic locking means:

Optimistic locking is mostly implemented with a data version mechanism. What is a data version? It is a version identifier added to the data. In a database-table solution, this is usually done by adding a "version" field to the table. The version is read together with the data, and every update increments it by one.

During an update, the version read earlier is compared with the version currently stored. If they match, the data has not changed and the update succeeds; if they differ, the update fails.

2. With that understanding of optimistic locking, let's work through a concrete example:

(1) Suppose we have a resource table t_resource with six fields: id, resource, state, add_time, update_time, version. They are, respectively, the primary key, the resource, the allocation state (1 = unassigned, 2 = assigned), the creation time, the update time, and the data version number.

(2) Suppose we want to assign the row with id = 5780. In a non-distributed setting, we would usually first query the rows with state = 1 (unassigned), pick one of them, and then run an update that sets state = 2 on condition that state is still 1. If the update succeeds, we have occupied the resource.

(3) In a distributed setting, a single database update is atomic, so in theory that statement is fine. But if a typical "ABA" situation occurs around it, we have no way to notice. What is the ABA problem? You can search for the details; put simply, the select and the later update are two separate operations and are not atomic together. If another thread occupies the resource (state = 2) and then releases it (state = 1) in between, your update still succeeds and you never learn that the resource changed. You might say that is harmless in this scenario, but in cases such as deposits or debits on a bank account, it is far more dangerous.
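
The ABA situation in (3) can be reproduced in a few lines. This is a minimal sketch using Python's built-in sqlite3 module with an in-memory database; the table is cut down to two columns, and the update statement is an assumed form of the state-only update described in (2).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t_resource (id integer primary key, state integer)")
conn.execute("insert into t_resource (id, state) values (5780, 1)")

# Request A reads the row and sees it as unassigned (state = 1).
seen_state = conn.execute(
    "select state from t_resource where id = 5780"
).fetchone()[0]

# Meanwhile request B occupies the resource and then releases it again.
conn.execute("update t_resource set state = 2 where id = 5780 and state = 1")
conn.execute("update t_resource set state = 1 where id = 5780 and state = 2")

# Request A now runs its update. It succeeds, and A cannot tell that
# the row changed twice between its select and its update.
cur = conn.execute(
    "update t_resource set state = 2 where id = 5780 and state = 1"
)
rows_affected = cur.rowcount
```

The update reports one affected row even though the resource was taken and returned in the meantime, which is the change a state-only check cannot detect.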

(4) So how does optimistic locking solve this problem?

a. First run a select to read the current version number of the row. Say the current version is 26: select id, resource, state, version from t_resource where state = 1 and id = 5780;

b. Execute the update: update t_resource set state = 2, version = 27, update_time = now() where resource = xxxxxx and state = 1 and version = 26

c. If the update affects one row, the resource has been occupied successfully. If it affects no rows, someone else has already occupied the resource.
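
Steps a to c can be sketched with Python's built-in sqlite3 module. This is only an illustration under some assumptions: the table is cut down to three columns, rows are matched by id rather than by the resource value, and the helper names try_occupy and release are hypothetical. Every update increments the version, as described in the definition of optimistic locking.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t_resource ("
    "id integer primary key, state integer, version integer)"
)
conn.execute("insert into t_resource values (5780, 1, 26)")


def read_version(conn, resource_id):
    """Step a: read the current version of the row."""
    return conn.execute(
        "select version from t_resource where id = ?", (resource_id,)
    ).fetchone()[0]


def try_occupy(conn, resource_id, seen_version):
    """Steps b and c: occupy the row only if the version is unchanged."""
    cur = conn.execute(
        "update t_resource set state = 2, version = version + 1 "
        "where id = ? and state = 1 and version = ?",
        (resource_id, seen_version),
    )
    return cur.rowcount == 1


def release(conn, resource_id):
    """Return the resource to the pool; this is also a versioned update."""
    conn.execute(
        "update t_resource set state = 1, version = version + 1 "
        "where id = ? and state = 2",
        (resource_id,),
    )


version_a = read_version(conn, 5780)           # request A reads version 26
b_won = try_occupy(conn, 5780, 26)             # request B occupies the row
release(conn, 5780)                            # and releases it again
a_stale = try_occupy(conn, 5780, version_a)    # A's stale version is rejected
a_retry = try_occupy(conn, 5780, read_version(conn, 5780))  # re-read, retry
final_version = read_version(conn, 5780)
```

Even though the state is back to 1 when request A updates, the version has moved on, so the ABA change is detected and A has to read again before it can occupy the row.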

3. After the explanation in 2, you should have a reasonable idea of how to do optimistic locking on a database table. There are still some disadvantages to point out:

(1) What used to be a single update now takes two operations: a select to get the version number, then the update. This increases the number of database operations.

(2) If one business flow needs several resources to stay consistent and every one of them uses optimistic locking on a database table, each resource needs its own resource table, which is rarely practical. And since everything goes through the database, the cost in database connections becomes unbearable under high concurrency.

(3) Optimistic locking is usually implemented in the application's own data access logic, so updates that bypass that logic can still write dirty data to the database. In the system design stage, we should take these possibilities fully into account and adjust accordingly, for example by implementing the optimistic locking strategy in a database stored procedure and exposing only update paths that go through that stored procedure, rather than opening the table directly.

4. After hearing the implementation and all these disadvantages, are you now afraid of optimistic locking?

Of course not. Of the scenarios at the start of the article, scenarios one and two both rely on optimistic locking on a database resource table, and it solved our production problems well. So choose the technical solution according to the specific business scenario. Picking the most complex or the trendiest technology does not make a good solution. For example, in scenario one I could have used a ZooKeeper lock. It would work, but is it really necessary? The answer is no!

Step four: a distributed lock using the memcached add() method:

Using memcached add() for a distributed lock is a fairly common approach online, and it can solve most of the scenarios you will run into. Before using it, make sure you thoroughly understand the difference between memcached add() and set(), and why add() can serve as a distributed lock. In short, add() stores the key only if it does not already exist, while set() always overwrites. If you are not familiar with them, please look them up yourself.

What I want to highlight here is another question. When people judge whether a distributed lock design is good or bad, they focus on one thing: can deadlock be avoided?

If add() occupies the resource successfully, is that the end of it? Of course not! We need to specify an expiration time for the key added with add(). Without one, under normal circumstances you delete the key with the delete method after your business logic finishes, which releases the resource. But if the business server goes down after taking the lock and before calling delete, the resource is never released. With a timeout set on the key, the resource is not held forever even if a server goes down, and deadlock is avoided.
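
The role of the expiration time can be sketched as follows. To keep the example runnable without a memcached server, it uses FakeMemcached, a hypothetical in-memory stand-in that mimics only the add and delete semantics described above. The key name, the owner values, the helper with_lock, and the simulated clock are all assumptions for illustration.

```python
import time


class FakeMemcached:
    """In-memory stand-in that mimics memcached add/delete with expiry."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, expires_at or None)

    def _purge(self, key):
        entry = self._data.get(key)
        if entry and entry[1] is not None and entry[1] <= self._clock():
            del self._data[key]

    def add(self, key, value, expire=0):
        """Store only if the key is absent; return True when stored."""
        self._purge(key)
        if key in self._data:
            return False
        expires_at = self._clock() + expire if expire else None
        self._data[key] = (value, expires_at)
        return True

    def delete(self, key):
        self._purge(key)
        return self._data.pop(key, None) is not None


def with_lock(cache, key, ttl_seconds, work):
    """Run work() under the lock; return None if the lock is taken."""
    if not cache.add(key, "locked", expire=ttl_seconds):
        return None
    try:
        return work()
    finally:
        cache.delete(key)  # normal release after the business logic


# A simulated clock keeps the example instant and deterministic.
now = [0.0]
cache = FakeMemcached(clock=lambda: now[0])

first = cache.add("lock:number:1", "owner-a", expire=30)   # acquired
second = cache.add("lock:number:1", "owner-b", expire=30)  # rejected
now[0] += 31  # owner-a crashed and never called delete
third = cache.add("lock:number:1", "owner-b", expire=30)   # lock had expired
```

Without the expire argument, the second owner would be rejected forever after the first owner's crash.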

Step five: a distributed lock using the memcached cas() method: omitted

Step six: a distributed lock using the Redis setnx() and expire() methods:

Compared with the memcached add() approach, the advantage of Redis is that it supports more data types, while memcached supports only strings. Beyond that, there is little difference in performance or ease of use. The choice is yours: use whichever your company already uses more.

First, the setnx() command. SETNX means SET if Not eXists, and it takes two parameters: setnx(key, value). The command is atomic. If the key does not exist, it is set and the command returns 1; if the key already exists, nothing is set and the command returns 0. Note that setnx cannot set a timeout on the key; the timeout has to be set separately with expire().

The specific steps are as follows:

  1. Call setnx(lockkey, 1). If it returns 0, the lock was not acquired; if it returns 1, the lock was acquired.
  2. Call expire() to set a timeout on lockkey, to avoid deadlock.
  3. After running the business code, delete the key with the delete command.

This approach is good enough for everyday needs, but as a technical design it is not perfect. For example, if the process crashes after setnx succeeds in step 1 but before expire() runs, the deadlock problem is still there. If you want to close that gap, you can implement the lock with Redis setnx(), get(), and getset().
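
The three steps, and the gap between setnx and expire, can be sketched as follows. FakeRedis is a hypothetical in-memory stand-in that implements only the commands used here, with a simulated clock; the acquire helper and its crash_before_expire flag are likewise assumptions for illustration.

```python
class FakeRedis:
    """Tiny in-memory stand-in for setnx, expire, and delete."""

    def __init__(self):
        self.now = 0      # simulated clock, in seconds
        self._data = {}   # key -> [value, expires_at or None]

    def _purge(self, key):
        entry = self._data.get(key)
        if entry is not None and entry[1] is not None and entry[1] <= self.now:
            del self._data[key]

    def setnx(self, key, value):
        self._purge(key)
        if key in self._data:
            return 0
        self._data[key] = [value, None]
        return 1

    def expire(self, key, seconds):
        self._purge(key)
        if key not in self._data:
            return 0
        self._data[key][1] = self.now + seconds
        return 1

    def delete(self, key):
        self._purge(key)
        return 1 if self._data.pop(key, None) is not None else 0


def acquire(r, key, ttl, crash_before_expire=False):
    if r.setnx(key, 1) == 0:   # step 1: try to take the lock
        return False
    if crash_before_expire:
        raise RuntimeError("simulated crash between setnx and expire")
    r.expire(key, ttl)         # step 2: set the timeout
    return True


r = FakeRedis()
first = acquire(r, "lockkey", ttl=10)    # lock taken
second = acquire(r, "lockkey", ttl=10)   # rejected while the lock is held
r.now += 11
third = acquire(r, "lockkey", ttl=10)    # old lock timed out, taken again
r.delete("lockkey")                      # step 3: release

try:
    acquire(r, "lockkey", ttl=10, crash_before_expire=True)
except RuntimeError:
    pass
r.now += 3600
stuck = acquire(r, "lockkey", ttl=10)    # the key has no timeout
```

After the simulated crash the key never expires, so every later attempt fails. As a side note, Redis 2.6.12 and later also accept SET key value NX EX seconds, which sets the value and the timeout in one atomic command and avoids this gap.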

Step seven: a distributed lock using the Redis setnx(), get(), and getset() methods:

Background: this approach is an optimized version of the setnx() and expire() approach, aimed at the deadlock problem that approach may have.

First, the three commands. setnx() and get() need no further explanation. What about getset()? It takes two parameters: getset(key, newValue). The command is atomic: it sets the key to newValue and returns the key's old value. Assuming the key does not exist at first, calling the command repeatedly has the following effect:

  1. getset(key, "value1") returns nil; the key is now set to value1.
  2. getset(key, "value2") returns value1; the key is now set to value2.
  3. And so on.
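
The behaviour listed above can be mimicked with a plain dictionary. This is only a sketch of the semantics: the getset function below is a hypothetical stand-in, and it is not atomic across threads the way the real Redis command is.

```python
store = {}


def getset(key, new_value):
    """Set key to new_value and return the previous value (None if absent)."""
    old_value = store.get(key)
    store[key] = new_value
    return old_value


first = getset("key", "value1")    # the key did not exist yet
second = getset("key", "value2")   # returns what the first call stored
```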

With the commands covered, the specific steps are as follows:

  1. Call setnx(lockkey, current time + timeout). If it returns 1, the lock is acquired. If it returns 0, the lock was not acquired; go to step 2.
  2. Call get(lockkey) to read oldExpireTime and compare it with the current system time. If it is less than the current system time, the lock is considered to have timed out and other requests may reacquire it; go to step 3.
  3. Compute newExpireTime = current time + timeout, then call getset(lockkey, newExpireTime), which returns the current value of lockkey, currentExpireTime.
  4. Compare currentExpireTime with oldExpireTime. If they are equal, the getset succeeded and the lock is acquired. If they are not equal, another request got the lock first; the current request can return failure directly or retry.
  5. After acquiring the lock, the current thread runs its business logic. When it finishes, compare the processing time with the lock timeout. If it is less than the lock timeout, delete the key to release the lock; if it is greater, the lock has already timed out and needs no handling.
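
The five steps above can be sketched as follows. FakeRedis is a hypothetical in-memory stand-in for the four commands involved, and try_lock and unlock are illustrative helper names. Times are plain integers passed in by the caller so the example is deterministic; with a real Redis client the stored values come back as strings and would need converting.

```python
class FakeRedis:
    """In-memory stand-in for setnx, get, getset, and delete."""

    def __init__(self):
        self._data = {}

    def setnx(self, key, value):
        if key in self._data:
            return 0
        self._data[key] = value
        return 1

    def get(self, key):
        return self._data.get(key)

    def getset(self, key, value):
        old = self._data.get(key)
        self._data[key] = value
        return old

    def delete(self, key):
        return 1 if self._data.pop(key, None) is not None else 0


def try_lock(r, key, now, timeout):
    """Return the expiry time we wrote if the lock was acquired, else None."""
    new_expire = now + timeout
    if r.setnx(key, new_expire) == 1:          # step 1
        return new_expire
    old_expire = r.get(key)                    # step 2
    if old_expire is None or old_expire >= now:
        return None                            # still held, or just released
    current = r.getset(key, new_expire)        # step 3
    if current == old_expire:                  # step 4
        return new_expire
    return None


def unlock(r, key, expire_at, now):
    """Step 5: delete the key only if our lock has not timed out yet."""
    if now < expire_at:
        r.delete(key)
        return True
    return False


r = FakeRedis()
a = try_lock(r, "lockkey", now=100, timeout=10)    # A acquires, expiry 110
b = try_lock(r, "lockkey", now=105, timeout=10)    # B fails, lock is live
c = try_lock(r, "lockkey", now=120, timeout=10)    # A's lock timed out
a_late = unlock(r, "lockkey", a, now=121)          # A is too late to delete
held_by_c = r.get("lockkey")
c_done = unlock(r, "lockkey", c, now=125)          # C releases in time
```

The late unlock from A leaves the key alone, so it does not delete the lock that C now holds.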

Note: I used this approach in production without problems, so when I first wrote this article I believed it had none. On 2017.05.13 (a Saturday) I took the time to revisit the article, read through the many comments readers had left, and found that they concentrate on two questions:

Question 1: Between the "get(lockkey) to read oldExpireTime" operation and the "getset(lockkey, newExpireTime)" operation, suppose N threads all read the same oldExpireTime and then all call getset. Will every getset return the same value, so that all of them succeed and all of them get the lock?

I think this approach does not have that problem, for two reasons. First, Redis uses a single-threaded model and executes commands serially. Second, given serial execution, only the first getset returns a value equal to oldExpireTime; each later one returns the value written by the getset before it, so its comparison with oldExpireTime fails.

Question 2: In the same window between get and getset, suppose N threads read the same oldExpireTime and then call getset. The first thread acquires the lock and the others fail, but the failing threads did execute their getset commands. Won't that extend the timeout that the winning thread set for the lock?

I believe this problem does exist. Personally I think the small error is negligible, but it is a flaw in the design, so choose for yourself.
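
Both questions can be checked with a small deterministic simulation. This is a sketch under assumptions: a dictionary stands in for Redis, the get and getset functions are hypothetical stand-ins, the commands run one after another as they would on a single-threaded Redis, and the times and timeout are made-up integers.

```python
store = {"lockkey": 110}  # an expired lock left behind by a crashed holder


def get(key):
    return store.get(key)


def getset(key, new_value):
    old_value = store.get(key)
    store[key] = new_value
    return old_value


timeout = 10

# Both contenders read the same oldExpireTime before either calls getset.
old_a = get("lockkey")   # contender A, at time 120
old_b = get("lockkey")   # contender B, at time 121

# Redis executes the two getset calls serially, in some order.
current_a = getset("lockkey", 120 + timeout)
current_b = getset("lockkey", 121 + timeout)

a_won = current_a == old_a   # A sees the old value, so A holds the lock
b_won = current_b == old_b   # B sees A's value, so B fails

stored_expiry = get("lockkey")
drift = stored_expiry - (120 + timeout)  # how far B moved A's expiry
```

Only one contender wins, which matches the answer to Question 1. The stored expiry is the loser's value rather than the winner's, which is the drift described in Question 2; here it is one second, the gap between the two contenders' clocks.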

Step eight: a distributed lock using the Redis watch, multi, and exec commands:

Step nine: a distributed lock using ZooKeeper:

Step ten: summary

That concludes my first article on distributed locks. It covers the four approaches most commonly used in everyday projects. Once you have mastered these four, you can solve many of the distributed lock problems that come up in day-to-day business scenarios. As the scenarios at the start of the article show, this is grounded in my own practical experience. As for the other three approaches, I will explore them with you in the next article on distributed locks.

The four commonly used approaches:

  1. Use optimistic locking on a database table.
  2. Use the memcached add() method.
  3. Use the Redis setnx() and expire() methods.
  4. Use the Redis setnx(), get(), and getset() methods.

Less common approaches that are still worth exploring:

  1. Use the memcached cas() method.
  2. Use the Redis watch, multi, and exec commands.
  3. Use ZooKeeper.

In closing

Original link: https://shimo.im/docs/f2ajdNJBQJItSobT/

Reproduced at: https://www.cnblogs.com/Java-no-1/p/11061262.html

Origin blog.csdn.net/cxu123321/article/details/105092497