Using memcached for concurrency control (repost)

Copyright statement: This article is an original article by the blogger and may not be reproduced without the blogger's permission.

Introduction

A discussion of using a cache for concurrency control, which taught me about the balance between cost and benefit, and what real availability means...

There are many ways to prevent concurrency problems; this article covers only control via the memcached cache.

Concurrency scenario:

     Use case: a premium member of an SNS system creates an activity.

     Business rules: 1. One person may have only one activity at a time. 2. The person must hold a premium membership.

     The basic process is as follows (the flowchart from the original post is omitted): verify the business rules, then create the activity.
There is an obvious concurrency problem in this process. Suppose process A has verified that member M is eligible and has no existing activity, but before A starts the creation operation, another process B performs the same rule check, passes it, and completes its own creation. When A then continues to execute, two activities for M will exist. (This concurrency scenario is simple and common.)

The initial solution:

     The plan is to use the atomicity of memcached's add operation to control concurrency, as follows:

     1. Apply for a lock: before verifying whether an activity has already been created, execute an add operation whose key is the memberId. If the add fails, another process is concurrently creating an activity for that memberId, so return a creation failure. Otherwise there is no concurrency.

     2. Create the activity.

     3. Release the lock: after the activity is created, execute a delete operation to remove the memberId key.
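The three steps above can be sketched as follows. This is a minimal in-process simulation, not real memcached code: `ConcurrentHashMap.putIfAbsent` stands in for memcached's atomic add, and `remove` stands in for delete; the class and method names are illustrative.

```java
import java.util.concurrent.ConcurrentHashMap;

// Minimal simulation of the add-based lock described above.
class AddLock {
    private final ConcurrentHashMap<String, Boolean> cache = new ConcurrentHashMap<>();

    // Step 1: apply for the lock; false means another process holds it.
    boolean tryLock(String memberId) {
        return cache.putIfAbsent(memberId, Boolean.TRUE) == null;
    }

    // Step 3: release the lock once the activity is created.
    void unlock(String memberId) {
        cache.remove(memberId);
    }

    boolean createActivity(String memberId) {
        if (!tryLock(memberId)) {
            return false;            // concurrent creation in progress
        }
        try {
            // Step 2: verify the rules and create the activity here.
            return true;
        } finally {
            unlock(memberId);        // always release, even on failure
        }
    }
}
```

Note that the simulation also sidesteps the expiry and eviction problems discussed next, which only exist in the real memcached version.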

Problems:

     This implementation has some problems:

     1. Values stored in memcached have an expiry time; they automatically become invalid after expiration. For example, after M1 is added, M1 may expire, after which a second add of M1 succeeds.

     2. Even if memcached is configured so that entries never expire, it still has a capacity limit. When capacity runs out, entries are automatically evicted, so after M1 is added it may be evicted in favor of other keys, and adding it again then succeeds.

     3. In addition, memcached is memory-based: all data is lost on a power failure, so after a restart every memberId can be added again.

Coping with the problems:

     The root cause of all the problems above is that the effect of the add operation is transient: expiry, eviction, and restarts all undo the original add. There are two ways to address this:

     1. Use a persistent cache solution, such as TT (Tokyo Tyrant: http://fallabs.com/tokyotyrant/).

     2. Reduce the impact of transience by using memcached's CAS (check-and-set) approach.

     The first needs little explanation: all the original problems stem from transience, and the transience stems from memcached being memory-based, so TT, which uses persistent storage, solves the problem completely.

     The second method needs to be briefly introduced:

     Besides the atomic add operation, memcached has two other atomic operations: incr and decr. The CAS-style approach is:

     1. Set a key in memcached in advance, say CREATEKEY=1.

     2. Each time an activity is to be created, read CREATEKEY=x before rule verification.

     3. Perform rule verification.

     4. Execute incr CREATEKEY and check whether the return value is the expected x+1. If not, another process performed an incr in the meantime, i.e. there is concurrency, so abandon the update. Otherwise,

     5. create the activity.
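The incr-based check can be sketched like this. `AtomicLong.incrementAndGet` stands in for memcached's atomic incr; `IncrGuard` and its method names are illustrative, not a real client API.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the incr-based optimistic check described above.
class IncrGuard {
    private final AtomicLong createKey = new AtomicLong(1);  // step 1: CREATEKEY=1

    // Step 2: read CREATEKEY=x before rule verification.
    long readCreateKey() {
        return createKey.get();
    }

    // Step 4: incr and compare with the expected x+1; false means another
    // process incremented the counter in the meantime.
    boolean confirm(long expected) {
        return createKey.incrementAndGet() == expected + 1;
    }
}
```

If `confirm` returns false, the caller abandons the update, exactly as step 4 prescribes.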

     Comparing the two methods by effect: the first is 100% reliable and has no problems; the second can misjudge, i.e. report concurrency where there is none. For example, after a cache restart, or after the key expires, the incr result may differ from the expected value, producing a false positive.

     In terms of cost, however, TT is a persistent caching solution, and perfection means expense: persistent data must be maintained. Memcached's CAS approach solves the transience problem at almost zero cost; it has a small flaw, but one a simple retry can fix. Considering the actual cost-benefit ratio, memcached's CAS approach fits the situation better.

     The balance between cost and benefit: the difference between doing science and doing engineering~




Application scenario analysis:

       For example, suppose the value stored under some key in memcached is A, and clients C1 and C2 both read A. C1 intends to append B and C2 intends to append C, so after both execute, the cached value will be either AB or AC instead of the ABC we expect. On a single server (not a cluster), this could be solved by taking a synchronization lock when writing the cached value, but in a cluster environment a local synchronization lock obviously cannot solve the problem.

Is memcached atomic? Macro analysis
         All single commands sent to memcached are completely atomic. If you send a set and a get for the same data at the same time, they will not affect each other; they are serialized and executed one after the other. Even in multithreaded mode, every individual command is atomic. Sequences of commands, however, are not atomic: if you fetch an item with get, modify it, and then set it back to memcached, there is no guarantee that the item was not touched by another process in the meantime (process in the broad sense, not necessarily an operating-system process). Under concurrency you may overwrite an item set by another process.
Memcached 1.2.5 and later provide the gets and cas commands, which solve this problem. If you query an item with gets, memcached returns a unique identifier for the current value of that item. When you want to write the modified item back, you send that identifier along with the cas command. If the identifier stored in memcached still matches the one you supplied, the write succeeds; if another process modified the item in the meantime, the stored identifier will have changed and your write fails.
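A minimal sketch of the gets/cas protocol just described, modeling a single versioned item in-process. `gets` returns the value together with its unique identifier (casUnique), and `cas` writes back only if the identifier is unchanged. This is a simulation, not a memcached client; `Versioned` is an illustrative holder type.

```java
import java.util.concurrent.atomic.AtomicReference;

// In-process model of memcached's gets/cas commands for one item.
class CasItem {
    static final class Versioned {
        final long casUnique;
        final String value;
        Versioned(long casUnique, String value) {
            this.casUnique = casUnique;
            this.value = value;
        }
    }

    private final AtomicReference<Versioned> item =
        new AtomicReference<>(new Versioned(0, "A"));

    // Like gets: the value plus its current unique identifier.
    Versioned gets() {
        return item.get();
    }

    // Like cas: succeeds only if casUnique still matches the stored one.
    boolean cas(String newValue, long casUnique) {
        Versioned cur = item.get();
        if (cur.casUnique != casUnique) {
            return false;                 // someone else wrote in between
        }
        return item.compareAndSet(cur, new Versioned(casUnique + 1, newValue));
    }
}
```

With two clients both reading A, the first cas wins and the second fails, so the lost-update (AB vs. AC) problem above is detected instead of silently overwriting.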

Microscopic analysis

memcached provides several atomic operations to avoid such races: add, cas, incr, decr.

"add" means "store this data, but only if the server *doesn't* already hold data for this key".

"cas" is a check-and-set operation: "store this data, but only if no one else has updated it since I last fetched it".

Their implementation follows the CAS (check-and-set) pattern:

The CAS-style implementation, restated:

     1. Set a key in memcached in advance, say CREATEKEY=1.

     2. Each time an activity is to be created, read CREATEKEY=x before rule verification.

     3. Perform rule verification.

     4. Execute incr CREATEKEY and check whether the return value is the expected x+1. If not, another process performed an incr in the meantime, i.e. there is concurrency, so abandon the update. Otherwise,

     5. create the activity.

Each key value saved by memcached carries a unique identifier, casUnique. When performing an incr or decr, first obtain casUnique, execute the incr, and check whether the return value is casUnique+1. If it is, apply the update; otherwise the update fails.

Although this design has flaws in handling concurrency, a simple retry can solve the problem!
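The "simple retry" fix can be sketched as a bounded loop, where each attempt stands for one full gets + validate + cas round. The `BooleanSupplier` hook is a placeholder, not a memcached API.

```java
import java.util.function.BooleanSupplier;

// Bounded retry around an optimistic (CAS-style) attempt: on a spurious
// failure (e.g. after a restart or eviction), simply try the whole
// gets + validate + cas round again, up to maxTries times.
class Retry {
    static boolean withRetries(BooleanSupplier attempt, int maxTries) {
        for (int i = 0; i < maxTries; i++) {
            if (attempt.getAsBoolean()) {
                return true;             // the CAS round succeeded
            }
        }
        return false;                    // still contended after maxTries
    }
}
```

A bounded attempt count matters: retrying forever would turn a genuinely contended key into a livelock.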

Interface analysis:

The gets methods return a MemcachedItem object:

public MemcachedItem gets(String key) {
    return client.gets(key);
}

public MemcachedItem gets(String key, Integer hashCode) {
    return gets(OPCODE_GET, key, hashCode, false);
}

The ordinary get method returns the value object:

public Object get(String key) {
    return client.get(key);
}

casUnique is the unique identifier; the public cas method delegates to the underlying set:

public boolean cas(String key, Object value, long casUnique) {
    return client.cas(key, value, casUnique);
}

public boolean cas(String key, Object value, long casUnique) {
    return set(OPCODE_SET, key, value, null, null, casUnique, primitiveAsString);
}

MemcachedItem class structure:

public final class MemcachedItem {
    public long casUnique;
    public Object value;
}

Other constraints:

The value used with incr/decr is treated as the decimal representation of a 64-bit unsigned integer.

The article reposted above, on using the atomicity of memcached's add in practice, is originally from: http://blog.csdn.net/jiangbo_hit/article/details/6211704


http://zhengjunwei2007.blog.163.com/blog/static/3529794220117325112464/
