Baidu Java architects share their technical selection and thinking on distributed locks

 

This article is based on a talk about the technical selection of distributed locks that the author shared on GitChat.

Locks and Distributed Locks

In computing, a lock solves the problem of mutually exclusive access to shared resources under concurrency, ensuring that only one process/thread controls a resource at any given moment.

For example, consider the following situations:

  1. File locks solve the problem of different users reading and writing the same file at the same time, preventing the file content from being corrupted.

  2. A queue implemented on an array generally needs a lock around the push operation, so that concurrent pushes do not contend for the same slot and lose data.

  3. For 12306 (China's railway ticketing system), train tickets are the resource. When a ticket is finally issued, it must be locked to guarantee a unique mapping between ticket, passenger, and seat.

  4. ……

The examples above cover both the traditional stand-alone locks we usually talk about and the distributed locks I want to discuss here.

In a stand-alone environment, all resource competitors (processes/threads) live inside the same machine, so a lock only needs to rely on local resources such as disk, memory, and registers.

In a distributed environment, however, resource competitors live in a far more complicated environment, and solutions that rely on a single machine no longer work. A coordinator that every party recognizes is needed to resolve the competition, and that coordinator is the distributed lock.

It is like a conflict between two employees, which can be settled as soon as a company leader steps in; when two companies are in conflict, the judiciary has to step in instead. The principle is the same.

Simply put, a distributed lock is a means of resolving resource competition in a distributed environment.

Application Scenarios of Distributed Locks

Any distributed environment with resource competition needs the coordination of a distributed lock. Besides the 12306 ticket issuance described above, examples include concurrent editing on shared document platforms, hero selection in King of Glory, and globally auto-incrementing primary keys. Below I briefly introduce the scenario of a multi-person collaborative editing platform, such as a company's internal wiki.

Multi-person online editing in the wiki

Scenario 1: Before the Qingming Festival, the team asked us to register our vacation plans on the wiki. Suppose we record our vacation dates and contact numbers in the document with id=1. Colleagues A and C start editing at the same time and submit at the same time, and the document was empty before either submission. How should the service handle these two requests? Whose version prevails? Will one overwrite the other, so that A's record is lost?

Scenario 2: In another case, I am colleague Z, and others have already filled in their entries before me. I have a bad habit of pressing Ctrl+S 3-5 times in a row when saving. Each Ctrl+S triggers a request, and each request takes about 1 s to process, yet all of the requests are sent within 20 ms.

The problem is the same as above: how do we make sure my entry is not appended more than once?

Suppose your storage service and storage architecture look like this:

[Figure: storage service and storage architecture]

The typical processing code is as follows:

// Get the file content by docId from the distributed file system; the latency is uncontrollable
$nowFileContent = getFileByDocId($docId);
// Do something with it, e.g. a diff or append operation
$newFileContent = doSomeThing($nowFileContent);
// Store the result back to the file system
setNewFileContent($docId, $newFileContent);

Take the two requests from A and C in Scenario 1. They reach this code segment at the same time, but because of network timing, A gets the document content first, and C reads the content before A has written it back. As a result, whichever request writes last overwrites the other, and one of the two writes is lost.
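To make the lost update concrete, here is a minimal PHP sketch that replays the A/C timeline by hand against an in-memory array instead of a real file system. `$store` and `appendRecord()` are hypothetical stand-ins for `getFileByDocId()`/`setNewFileContent()` and `doSomeThing()`; nothing here talks to a real storage service.

```php
<?php
// Hypothetical in-memory stand-in for the distributed file system; not a real API.
$store = ['1' => ''];

// Appends one record to the document (stands in for doSomeThing()).
function appendRecord(string $content, string $record): string
{
    return $content . $record . "\n";
}

// Hand-interleaved timeline: both requests read before either one writes.
$readByA = $store['1'];
$readByC = $store['1'];
$store['1'] = appendRecord($readByA, 'A: vacation record');
$store['1'] = appendRecord($readByC, 'C: vacation record'); // overwrites A's write

echo $store['1']; // only C's record survives; A's is lost
```

Both requests computed their new content from the same empty snapshot, so the second write silently discards the first.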

[Figure: concurrent read/write timeline of requests A and C]

Therefore, the read and write operations need to be wrapped in a lock to ensure the integrity and consistency of the transaction.

The figure below is an illustration from Modern Operating Systems; the effect we want here is the same.

[Figure: illustration from Modern Operating Systems]

Scenarios such as the wiki are resource-processing problems involving long-running transactions. A lock ensures that the long gap between the read and the write inside a transaction cannot cause overwrites, because requests are queued and processed one at a time.
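The queuing effect can be sketched by replaying the same A/C timeline, this time guarded by a lock. `FakeLockStore` is a hypothetical in-memory stand-in for "SET key owner NX" plus an owner-checked delete; expiry is deliberately left out so the example focuses on how the lock forces C to wait for A.

```php
<?php
// Hypothetical in-memory stand-in for "SET key owner NX" plus an owner-checked delete.
// Expiry is left out here to keep the queuing behavior in focus.
final class FakeLockStore
{
    private array $owners = [];

    public function acquire(string $key, string $owner): bool
    {
        if (isset($this->owners[$key])) {
            return false; // held by someone else
        }
        $this->owners[$key] = $owner;
        return true;
    }

    public function release(string $key, string $owner): void
    {
        if (($this->owners[$key] ?? null) === $owner) {
            unset($this->owners[$key]);
        }
    }
}

$locks = new FakeLockStore();
$doc = '';

// A takes the lock, then reads.
$aGot = $locks->acquire('doc:1', 'A');
$readByA = $doc;

// C arrives while A is between its read and write: rejected, must wait or retry.
$cFirstTry = $locks->acquire('doc:1', 'C');

// A writes and releases.
$doc = $readByA . "A: vacation record\n";
$locks->release('doc:1', 'A');

// C retries, reads A's result, and appends to it.
$cSecondTry = $locks->acquire('doc:1', 'C');
$readByC = $doc;
$doc = $readByC . "C: vacation record\n";
$locks->release('doc:1', 'C');

echo $doc; // both records are present
```

Because C can only read after A has written, C builds on A's result instead of on a stale snapshot.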

Solution selection

The problem I ran into is also a long-transaction problem like the wiki one. My first instinct when facing a problem is to look for solutions online.

There are many implementations online based on MySQL, ZooKeeper (ZK), and Redis. Which one should I choose? How should I choose? What do I need to weigh?

When I read books on distributed systems, one word came up again and again: trade-off. I understand it as weighing what you gain against what you give up.

As a web developer, my main considerations are the following:

  1. Can it implement my feature, and how long will it take to be ready for production?

  2. Implementation difficulty and learning cost;

  3. Operation and maintenance cost.

Let's evaluate the current options against these criteria:

| Implementation | Functional requirements | Implementation difficulty | Learning cost | Operation and maintenance cost |
|---|---|---|---|---|
| MySQL, using table locks/row locks | Meets basic requirements | Not difficult | Familiar | Fine at small volume; at large volume it affects existing business; the 1-master/multi-slave architecture is inconvenient to scale |
| ZK, by creating data nodes | Meets requirements | Familiarity with the ZK API is enough | Needs learning | Heavy: needs dedicated machines, and there are cross-data-center requests |
| Redis, using SET with the NX and EX options | Meets basic requirements | Not difficult | Familiar | Easy to scale; an existing service is available |

With MySQL's single-master architecture, all writes go to the master, which becomes a bottleneck. The ZK approach requires us to build, operate, and maintain it ourselves on dedicated machines, so utilization would be low. In the end I chose Redis: traffic and storage can scale out, and we do not have to operate it ourselves.

Implementation

With the approach chosen, next comes the implementation. What requirements must this lock meet?

  1. Locking must be atomic, and only one contender may hold the lock at any time;

  2. Unlocking must be atomic, and a holder may only release its own lock;

  3. No deadlocks: if the holding process dies, other contenders must still be able to acquire the lock;

  4. It must work both with a stand-alone Redis and with the cluster architecture in Twemproxy mode;

  5. Latency must be acceptable.

Based on these requirements, my implementation is as follows (simplified, with sensitive information removed):

<?php

class LockUtility
{
    const DEFAULT_UNLOCK_TIME = 4;
    const COMMON_REDISKEY_PREFIX = 'xxxxx';

    private $_objRedis;
    private $_redisKey;
    private $_unLockTime;
    private $_guid;

    /**
     * @param $ukey       the key to be locked
     * @param $unlockTime lock holding time, in seconds
     */
    public function __construct($ukey, $unlockTime = self::DEFAULT_UNLOCK_TIME)
    {
        $this->_objRedis = RedisFactory::getRedis();
        $this->_redisKey = self::COMMON_REDISKEY_PREFIX . $ukey;
        $this->_unLockTime = $unlockTime;
        // Generate a unique guid for this lock instance
        $this->_guid = genGuid();
    }

    /**
     * @brief Locks the given key
     *
     * @return true if the lock is acquired.
     *
     * If an exception is thrown, the lock was not acquired; choose how strictly to
     * handle it according to your business. Exception error codes:
     *  1. Network error: Constants_ErrorCodes::REDIS_ERROR. Depending on how strict
     *     the business is, this error may be ignored.
     *  2. Lock in use: Constants_ErrorCodes::LOCK_IS_USED. The lock is definitely
     *     held by someone else.
     */
    public function lock()
    {
        /*
         * Setting the lock must be atomic, so a single SET is used:
         *   SET key value [EX seconds] [PX milliseconds] [NX|XX]
         * Since Redis 2.6.12, SET supports the setnx + expire behavior through these options.
         *
         * PHP syntax: $redis->set('key', 'value', array('xx', 'px' => 1000));
         */
        $setRet = $this->_objRedis->set($this->_redisKey, $this->_guid, array('nx', 'ex' => $this->_unLockTime));
        // false means the key already exists: the lock is occupied
        if (false === $setRet) {
            throw new Exception("Get lock failed: lock is in use", Constants_ErrorCodes::LOCK_IS_USED);
        }
        // null means a network error, authorization failure, syntax error, etc.
        if (is_null($setRet)) {
            throw new Exception("Request Redis Failed", Constants_ErrorCodes::REDIS_ERROR);
        }
        return $setRet;
    }

    /**
     * @brief Unlocks the key. In principle you do not need to care about the return
     * value, and it is safe to call multiple times.
     *
     * @return
     *  1: the Redis call succeeded and the key was deleted
     *  0: the Redis call succeeded, but the key no longer exists or is held by someone else
     */
    public function unlock()
    {
        // Redis 2.6 added Lua scripting, which finally gives an efficient atomic
        // check-and-set (CAS): compare the value, and delete only if it matches
        $luaScript = "if redis.call('get', KEYS[1]) == ARGV[1] then return redis.call('del', KEYS[1]) else return 0 end";
        $delRet = $this->_objRedis->eval($luaScript, array($this->_redisKey, $this->_guid), 1);
        if (is_null($delRet)) {
            // null means a network error, authorization failure, syntax error, etc.
            throw new Exception("Request Redis Failed", Constants_ErrorCodes::REDIS_ERROR);
        }
        return $delRet;
    }
}
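A caller typically pairs lock() and unlock() with try/finally so the lock is released even when the transaction fails. The sketch below is hypothetical: `InMemoryLock` only mimics the lock()/unlock() shape of LockUtility (a guid per instance, compare-then-delete on unlock) so that the pattern runs without Redis, and `runLocked()` is an illustrative helper, not part of the original code. It assumes PHP 8.

```php
<?php
// Hypothetical stand-in with the same lock()/unlock() shape as LockUtility,
// so the calling pattern can run without Redis.
final class InMemoryLock
{
    /** @var array<string, string> lock key => guid of the current holder */
    private static array $held = [];
    private string $guid;

    public function __construct(private string $key)
    {
        $this->guid = bin2hex(random_bytes(8));
    }

    public function lock(): bool
    {
        if (isset(self::$held[$this->key])) {
            throw new Exception('lock is in use'); // LockUtility uses LOCK_IS_USED here
        }
        self::$held[$this->key] = $this->guid;
        return true;
    }

    public function unlock(): int
    {
        // Only the holder whose guid matches may delete the lock.
        if ((self::$held[$this->key] ?? null) === $this->guid) {
            unset(self::$held[$this->key]);
            return 1;
        }
        return 0;
    }

    public static function isHeld(string $key): bool
    {
        return isset(self::$held[$key]);
    }
}

// Acquire, run the work, and always release, even if the work throws.
function runLocked(InMemoryLock $lock, callable $work): mixed
{
    $lock->lock(); // throws if the lock is taken; nothing is held in that case
    try {
        return $work();
    } finally {
        $lock->unlock();
    }
}

$result = runLocked(new InMemoryLock('doc:1'), fn () => 'saved');

try {
    runLocked(new InMemoryLock('doc:1'), function () {
        throw new RuntimeException('write failed');
    });
} catch (RuntimeException $e) {
    // The finally block has already released the lock.
}
$stillHeld = InMemoryLock::isHeld('doc:1');

// Contention: a second instance cannot take a lock that another guid holds.
$holder = new InMemoryLock('doc:2');
$holder->lock();
try {
    runLocked(new InMemoryLock('doc:2'), fn () => 'never runs');
    $contended = false;
} catch (Exception $e) {
    $contended = true;
}
$holder->unlock();
```

lock() sits outside the try block on purpose: if acquisition fails there is nothing to release, and unlock() would return 0 anyway because the guid does not match.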

Does this code solve the problems above? Let's look at how it behaves with stand-alone and clustered Redis.

Stand-alone Redis architecture

[Figure: stand-alone Redis architecture]

In the single-node architecture shown above, reads and writes are not separated.

So does the code above meet the requirements?

  1. lock uses SET with the NX and EX options, and Redis executes commands on a single thread, so locking is atomic: it either succeeds or fails outright. This satisfies requirement 1. It also handles deadlock (requirement 3), because the key expires after the timeout;

  2. unlock uses a Lua script so that the compare and the delete happen atomically, which also ensures that a holder can only delete its own lock;

  3. What about latency? Each operation is a single request, which is acceptable; within the same data center it takes on the order of milliseconds.
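The three properties above can be checked against a small in-memory model. This is a hypothetical sketch: `FakeRedis::setNxEx()` models `SET key value NX EX ttl`, `compareAndDelete()` models the GET-compare-DEL Lua script, and time is passed in explicitly so expiry can be shown without sleeping. A single PHP process is trivially atomic; in real Redis that guarantee comes from single-threaded command execution.

```php
<?php
// In-memory model of the two Redis operations LockUtility relies on (illustration only):
//   SET key value NX EX ttl  -> setNxEx()
//   GET-compare-DEL Lua      -> compareAndDelete()
final class FakeRedis
{
    /** @var array<string, array{value: string, expiresAt: int}> */
    private array $data = [];

    private function purgeIfExpired(string $key, int $now): void
    {
        if (isset($this->data[$key]) && $this->data[$key]['expiresAt'] <= $now) {
            unset($this->data[$key]);
        }
    }

    public function setNxEx(string $key, string $value, int $ttl, int $now): bool
    {
        $this->purgeIfExpired($key, $now);
        if (isset($this->data[$key])) {
            return false; // NX: key already exists
        }
        $this->data[$key] = ['value' => $value, 'expiresAt' => $now + $ttl];
        return true;
    }

    public function compareAndDelete(string $key, string $value, int $now): int
    {
        $this->purgeIfExpired($key, $now);
        if (($this->data[$key]['value'] ?? null) === $value) {
            unset($this->data[$key]);
            return 1;
        }
        return 0;
    }
}

$redis = new FakeRedis();
$t = 0;

$a = $redis->setNxEx('lock:doc:1', 'guid-A', 4, $t);                     // requirement 1: A acquires
$c = $redis->setNxEx('lock:doc:1', 'guid-C', 4, $t + 1);                 // requirement 1: C is rejected
$wrongUnlock = $redis->compareAndDelete('lock:doc:1', 'guid-C', $t + 2); // requirement 2: C cannot release A's lock
$afterExpiry = $redis->setNxEx('lock:doc:1', 'guid-C', 4, $t + 5);       // requirement 3: A died, key expired at t=4
$ownerUnlock = $redis->compareAndDelete('lock:doc:1', 'guid-C', $t + 6); // C releases its own lock
```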

Multi-region and multi-shard master-slave architecture in Twemproxy mode

[Figure: multi-region, multi-shard master-slave architecture in Twemproxy mode]

Twemproxy is a proxy for Redis/Memcached. Its main job is to route each request to a shard according to its key. Some operations are not supported, for example KEYS *, because it would have to traverse all shards to complete. For a simple set/get, or anything else that can be routed to a single shard, it works the same way as a single instance.

What about Lua scripts? How are they routed, and are they supported at all?

When looking into running scripts with eval, I found that our cluster's documentation says:

There must be at least one key after the script. The command is sent to the shard where the first key is located.

In other words, when eval is used, the command goes to the shard that holds the first key. Our first key is exactly the lock key we want to operate on, so this code also works in cluster mode.
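The routing rule can be pictured with a simplified model: the proxy hashes the first key of the command and picks a shard from the result. This sketch is hypothetical; Twemproxy's real hash function and distribution are configurable, and `crc32($key) % $shardCount` is only an assumption for illustration. What it shows is that SET (lock) and EVAL with the lock key as KEYS[1] (unlock) always land on the same shard.

```php
<?php
// Simplified, hypothetical model of proxy routing: the shard is chosen by hashing
// the FIRST key of the command. crc32 % shardCount is an assumption for illustration.
function shardFor(string $key, int $shardCount): int
{
    return crc32($key) % $shardCount;
}

// For EVAL script numkeys key [key ...] arg [arg ...], only the first key routes the command.
function routeEval(array $keys, int $shardCount): int
{
    if (count($keys) === 0) {
        throw new InvalidArgumentException('EVAL needs at least one key to be routed');
    }
    return shardFor($keys[0], $shardCount);
}

$shards = 4;
$lockKey = 'xxxxx' . 'doc:1'; // prefix + key, as LockUtility builds it

// SET (lock) and EVAL (unlock) both land on the shard that owns the lock key.
$setShard = shardFor($lockKey, $shards);
$evalShard = routeEval([$lockKey], $shards);
```

The guid is passed as ARGV rather than as a key, so it does not affect routing.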

However, our cluster currently adopts an eventually consistent model: the master lives in a single region, slaves are spread across multiple regions, and all writes go to the master's region.

Does that mean every write request crosses regions? I optimized this by adding one read step. Reads are served locally and do not cross regions, so if the local replica already shows the key, the request can be rejected without a cross-region write. For more than 99% of requests the master-slave lag is small enough for this check to be reliable (the 99% figure is my own guess, of course).

The code is as follows:

function lock()
{
    // First use exists on the local read path to see whether the key is present
    if ($objRedis->exists($key)) {
        // The key is definitely occupied: throw an exception and save a cross-data-center write
        throw new Exception("Get lock failed: lock is in use", Constants_ErrorCodes::LOCK_IS_USED);
    }
    // If the key does not exist, the lock is not necessarily free; it may be master-slave lag.
    // To be safe, fall through to the SET NX EX code above, which the master decides.
}
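Here is a runnable model of the read-first optimization, assuming two hypothetical in-memory stores: `$replica` for the local-region read path and `$master` for the remote master region. `crossRegionWrites` counts how many SET NX requests actually reach the master, which shows both the saving and why a lagging replica is still safe.

```php
<?php
// Hypothetical in-memory stores: $replica stands for the local-region read path,
// $master for the remote master region.
final class Store
{
    public array $keys = [];
    public int $crossRegionWrites = 0;

    public function exists(string $key): bool
    {
        return isset($this->keys[$key]);
    }

    // SET key value NX: every call is a round trip to the master region.
    public function setNx(string $key, string $value): bool
    {
        $this->crossRegionWrites++;
        if (isset($this->keys[$key])) {
            return false;
        }
        $this->keys[$key] = $value;
        return true;
    }
}

function tryLock(Store $replica, Store $master, string $key, string $guid): bool
{
    // Local read first: if the replica already sees the key, the lock is taken.
    if ($replica->exists($key)) {
        return false;
    }
    // Not visible locally (free, or replication lag): let the master decide.
    return $master->setNx($key, $guid);
}

$master = new Store();
$replica = new Store();

$a = tryLock($replica, $master, 'lock:doc:1', 'guid-A');      // acquired on the master
$lagged = tryLock($replica, $master, 'lock:doc:1', 'guid-C'); // replica lags, master rejects
$replica->keys = $master->keys;                               // replication catches up
$c = tryLock($replica, $master, 'lock:doc:1', 'guid-C');      // rejected locally, no master write
```

Correctness never depends on the replica: a stale "not found" only costs one extra cross-region write, while the master's SET NX remains the single source of truth.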

Usage notes:

  1. You must control lockTime yourself, and it must be longer than your transaction's execution time;

  2. When the upper layer fails to acquire the lock, it must decide whether to block and wait, or to reject the request and let the client retry.
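Both choices in note 2 can share one helper. In this hypothetical sketch, `acquireWithRetry()` takes any callable that attempts the lock once; with LockUtility, that callable would wrap lock() and return false when it catches LOCK_IS_USED. `maxAttempts = 1` means reject immediately, while a larger value blocks with a bounded exponential backoff.

```php
<?php
// Hypothetical helper for the two choices in the usage notes:
// maxAttempts = 1 rejects immediately (let the client retry);
// maxAttempts > 1 blocks, retrying with a bounded exponential backoff.
function acquireWithRetry(callable $tryLock, int $maxAttempts, int $baseDelayMs): bool
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        if ($tryLock()) {
            return true;
        }
        if ($attempt < $maxAttempts) {
            // base, 2x base, 4x base, ... capped at 1 s between attempts
            $delayMs = min($baseDelayMs * (2 ** ($attempt - 1)), 1000);
            usleep($delayMs * 1000);
        }
    }
    return false;
}

// Simulated lock that becomes free on the third attempt.
$calls = 0;
$tryLock = function () use (&$calls): bool {
    $calls++;
    return $calls >= 3;
};

$blocked = acquireWithRetry($tryLock, 5, 10); // waits 10 ms and 20 ms, then succeeds
$callsWhenBlocking = $calls;

$calls = 0;
$rejected = acquireWithRetry($tryLock, 1, 10); // reject mode: one try, no waiting
```

Keep the total wait well below lockTime; otherwise a waiting request may outlast the transaction it was queued behind.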

Open questions:

  1. If your process is suspended because the CPU is overloaded, and the suspension lasts longer than the lock's expiration time, will there still be problems?

  2. What happens if a shard goes down in cluster mode?

  3. Do you have a solution? Feel free to leave a comment for discussion.

Summary

To sum up, this talk reaches the following conclusions:

  1. A distributed lock is a lock required by a distributed business environment; the service that provides the lock does not itself have to be distributed;

  2. A lock essentially acts as a resource coordinator, managing control of resources under concurrency;

  3. Choosing a solution is like investing: you need to consider the input-output ratio;

  4. The stand-alone and cluster Redis solutions each have their own optimization points, which can be applied according to the scenario;

  5. After writing the article, I realized the title is not quite right. A more accurate title would be "Thoughts on Implementing Distributed Locks with Redis". If I misled you, please let me know.

References

  1. Wu Dashan's blog: reminded me that "whoever ties the bell must untie it", i.e. only the lock holder may release the lock (the Lua script).

  2. Twemproxy: I did not read Twemproxy's source code; I set up a service and tested it instead.
