Talk about idempotence

What is idempotence?
A few points to note

When an exception occurs, should the retry be treated as idempotent?
What is the basis for judging idempotence?
Is returning only the return code enough for an idempotent request?
Does the idempotence check need concurrency control?
Other

Common practices

Use a distributed lock + cache for idempotence
Use a distributed lock + database for idempotence
Use a database unique key for idempotence
Other

Talk about idempotence

Let us start with a question.

Suppose you go to the cafeteria to eat. You shout into the serving window: "Master, two steamed buns!" After waiting a long time, your steamed buns still have not appeared, so you shout into the window again: "Master, two steamed buns!"

At this point, what do you want the cafeteria chef to do?

A: Give you two buns.

B: Give you four buns.

C: Answer "Coming!" but give you nothing.

D: Bring a bowl of porridge and throw it in your face.

Any normal person would choose A. This is idempotence.

(As for anyone who chose B: the next time I say "You owe me 10 yuan" twice, please remember to pay me 20.)

What is idempotence?

We will not go into the various mathematical definitions of idempotence. For practical purposes, we define it like this:

Idempotence means that, with other conditions unchanged, executing the same method with the same data yields the same result whether it is executed once or 10,000 times.

For example, as long as nobody modifies the data in the database, the same SQL query should return the same data whether you run it once or 10,000 times.
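To make the definition concrete, here is a minimal sketch in Python. The names set_buns and add_buns and the tray dictionary are hypothetical, made up for illustration only. It contrasts an operation that is idempotent with one that is not.

```python
def set_buns(tray: dict, buns: int) -> dict:
    """Idempotent: the tray ends up holding exactly `buns`, however often this runs."""
    tray["buns"] = buns
    return tray


def add_buns(tray: dict, buns: int) -> dict:
    """Not idempotent: every call puts more buns on the tray."""
    tray["buns"] = tray.get("buns", 0) + buns
    return tray
```

Calling set_buns(tray, 2) twice leaves two buns on the tray (answer A), while calling add_buns(tray, 2) twice leaves four (answer B).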

A few points to note

Idempotence looks simple, but several details need special attention.

When an exception occurs, should the retry be treated as idempotent?

Consider this scenario: a method throws an exception partway through execution; we then fix the problem inside the method; and the caller calls it again. If the method is required to be idempotent, must we throw the same exception, or return the same error code, to the caller again?

Besides idempotence, there is one more thing to consider: the transactional nature of the business operation, that is, the business transaction.

Business transactions are similar to database transactions and should likewise have atomicity, consistency, isolation, and durability. For this reason, we often bind a business transaction to a database transaction: as long as the database transaction commits successfully, the business transaction is considered successful.

However, the scope of a business transaction is larger than that of a database transaction: it includes not only the database operations in the business operation, but possibly also third-party interface calls, MQ message sends, cache writes, and other operations. Of course, guaranteeing all four transactional properties across all of these operations, especially the distributed ones, becomes very complicated to implement. Even so, we must ensure the transactional nature of the whole business operation at least at the design level, and at least in terms of data consistency.

So, from the perspective of data consistency in a business transaction, what should we do when an exception is thrown partway through a business operation?

Clearly, we should roll back the current business transaction and undo the results of everything done before the exception. Otherwise, the operations before the exception succeeded and wrote some data, while the operations after it never ran and wrote nothing, leaving the data inconsistent.

Once the rollback is complete, our system has executed this request zero times. When the caller then calls us again with the same data, it is the caller's second request; but for us, from the perspective of business transactions and idempotence, it is the first call.

Since it is the first request, simply let it through and process it normally.
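The rollback-then-retry reasoning can be sketched with Python's built-in sqlite3 module. This is only an illustration under simplifying assumptions: the business transaction is bound to a single database transaction, and the orders and stock tables, the place_order function, and the fail_midway switch are all hypothetical names invented for this example.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create a throwaway in-memory database with hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_no TEXT PRIMARY KEY, buns INTEGER)")
    conn.execute("CREATE TABLE stock (buns INTEGER)")
    conn.execute("INSERT INTO stock VALUES (10)")
    conn.commit()
    return conn


def place_order(conn, order_no, buns, fail_midway=False):
    """Write the order and the stock change as one transaction."""
    # `with conn` commits if the block succeeds and rolls back
    # if an exception escapes from it.
    with conn:
        conn.execute(
            "INSERT INTO orders (order_no, buns) VALUES (?, ?)", (order_no, buns)
        )
        if fail_midway:
            # Simulated bug between the two writes.
            raise RuntimeError("simulated failure between the two writes")
        conn.execute("UPDATE stock SET buns = buns - ?", (buns,))
```

If place_order fails midway, the order row written before the exception is rolled back, so a later call with the same order number finds nothing and runs as a first request.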

What is the basis for judging idempotence?

Idempotence has three conditions: other conditions unchanged, the same data, and the same method. Of the three, "the same data" is the hardest to judge: the caller passes a=1. How do we know whether this is a new business operation or a replay of an earlier one?

Many interfaces define a field such as a "request serial number" in the request message. Most of the time, we use this field directly as the basis for idempotence.

This approach is simple and easy to implement, but it hides a danger: even if the business data of two requests is completely different, as long as the serial number is the same, the second request is treated as an idempotent request; and even if the business data of two requests is exactly the same, as long as the serial number differs, the second request is treated as a new request.

The direct cause of this problem is that we have handed an important business constraint to callers, who may well be uncontrollable and unreliable. The root cause is that the "request serial number" is a business-independent "unique key": it cannot truly identify a piece of "business data" uniquely.

Therefore, a more rigorous approach is to find the business unique key in the request data and base idempotence on that. Sometimes the unique key has to combine many fields, which makes uniqueness checks awkward. In that case we can concatenate the fields and compute an MD5 digest: although the digest cannot be traced back to the business data, it is still a business-related key.
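One possible way to derive such a digest is sketched below with Python's standard hashlib and json modules. The function name business_key and the example fields are assumptions made for illustration. Serializing the fields as JSON with sorted keys, rather than plainly concatenating them, is a design choice of this sketch: it keeps the digest independent of field order and avoids ambiguity between adjacent values.

```python
import hashlib
import json


def business_key(fields: dict) -> str:
    """Return an MD5 hex digest over the business fields of a request."""
    # Sorted keys and fixed separators give one canonical string per
    # set of field values, regardless of the order the caller used.
    canonical = json.dumps(
        fields, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    # MD5 serves here only as a compact fingerprint, not as a security measure.
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()
```

For example, business_key({"customer": "c-1", "item": "bun", "qty": 2}) and business_key({"qty": 2, "item": "bun", "customer": "c-1"}) yield the same key, while changing qty to 4 yields a different one.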

However, such a business unique key is hard to find, and sometimes the business logic genuinely is not unique: after eating my two buns, I really can go back to the window and shout "Master, two steamed buns" again; and after eating those, I can ask for two more. In that case, the master should give me six buns in total.

In such cases there is no way around it. Using a special field passed in through the interface, such as a serial number or unique key, as the basis for idempotence is not the best solution, but it is a solution. When we do this, however, we must strictly require callers to fill in the field according to the agreed constraints.

Is returning only the return code enough for an idempotent request?

Most interface responses contain code and msg fields that indicate the status of the operation. For such interfaces, returning only code and msg for an idempotent request is clearly sufficient.

Note that the code and msg here should be the same as those of the first, normal request. If the normal request returns code=0 and the idempotent request returns code=1, the two requests do not get the "same result": it is as if, after you called "Master, two steamed buns" twice, the master gave you two buns and then threw a bowl of porridge in your face.

However, many interfaces also return business data in addition to code and msg. For these, returning only the code is clearly not enough. Otherwise, isn't it just like shouting "Master, two steamed buns" at the window and having the master answer "Coming!" without ever giving you the buns?

In either case, the key is to make sure that idempotent requests and normal requests return the same result.

Does the idempotence check need concurrency control?

In most cases, we determine idempotence by querying some data using the unique key of the request: if data is found, it is an idempotent request; if not, it is a new request.

This logic basically works, except under concurrency: when two requests with the same key arrive at the same time, neither finds any data for the key, so both are judged to be new requests. The request is then effectively executed twice, which means the idempotence check has failed.

Therefore, if you use this query-based check, you must take care to control concurrency.
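The following sketch shows the check and the execution placed under one lock, so that concurrent requests with the same key cannot both pass the check. It is an in-process illustration only: threading.Lock stands in for whatever distributed lock a real deployment would use, and the BunWindow class and its fields are hypothetical.

```python
import threading


class BunWindow:
    """Serves each unique key once, even when requests arrive concurrently."""

    def __init__(self):
        self._served = {}              # unique key -> result of the first execution
        self._lock = threading.Lock()  # stand-in for a distributed lock
        self.executions = 0            # how many times the business logic really ran

    def order(self, key: str, buns: int) -> dict:
        # The check and the write happen under the same lock, so two
        # concurrent requests cannot both see "no data" for the key.
        with self._lock:
            if key in self._served:
                return self._served[key]
            self.executions += 1
            result = {"code": 0, "msg": "ok", "buns": buns}
            self._served[key] = result
            return result
```

With this arrangement, twenty threads calling order with the same key cause the business logic to run once, and all of them receive the same result. A single global lock is the simplest choice for a sketch; it serializes all requests, which a real system would likely avoid by locking per key.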

Other

If you have any other questions, feel free to raise them for discussion.

Common practices

Use a distributed lock + cache for idempotence

For the unique key in the request, first acquire a distributed lock, then check whether the cache holds a value for it. If there is no value, execute the request normally and store the normal response (whether success or failure) in the cache; if there is a value, return the cached response directly.
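A rough sketch of this flow follows, with in-process stand-ins: a plain dict plays the cache and a per-key threading.Lock plays the distributed lock. The IdempotentExecutor class and the handler callback are hypothetical names. One assumption of this sketch is worth stating: the handler returns a complete response (a success or a business failure), which is cached as-is, while an exception raised by the handler is not cached, so a retry after a rollback is processed as a first request.

```python
import threading
from collections import defaultdict


class IdempotentExecutor:
    """Runs a handler at most once per key and replays its response afterwards."""

    def __init__(self):
        self._cache = {}                            # key -> full response
        self._locks = defaultdict(threading.Lock)   # one lock per key
        self._locks_guard = threading.Lock()        # protects the lock table itself

    def _lock_for(self, key: str) -> threading.Lock:
        with self._locks_guard:
            return self._locks[key]

    def execute(self, key: str, handler) -> dict:
        with self._lock_for(key):
            if key in self._cache:
                # Repeat: return exactly what the first call returned.
                return self._cache[key]
            # First call: an exception here propagates and nothing is cached.
            response = handler()
            self._cache[key] = response
            return response
```

Because the whole response is cached, a repeated request gets back the same code, msg, and business data as the first one.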

This approach is the most efficient and the easiest to implement. However, it has two problems.

First, if the cache entry is lost or expires, the idempotence check fails. That said, this situation is fairly controllable and unlikely.

Second, if the premise that "other conditions remain unchanged" is broken, the cached data often becomes stale. Should the idempotent response then be the result from before the conditions changed, or the result after? This question deserves serious thought.

For example, a query is naturally idempotent, but only on the premise that the database data has not been updated. If the query has a cache, and the cache is not refreshed after the database is updated, should the query return the result in the cache, or the actual result in the database?

Similarly, if operation A has a cache, its idempotence rests on the premise that its underlying data has not changed. Suppose operation B updates the underlying data of operation A but does not update A's cache. If we run operation A, then operation B, then operation A again, should the second run of A return the cached result from before B's modification, or the result in the database after B's modification?

Use a distributed lock + database for idempotence

Compared with a cache, a database is a more reliable persistence tool, and most business operations eventually store their results in the database anyway. Using the database to determine idempotence is therefore more reliable than using a cache.

In this case, however, after finding data in the database and judging the request to be idempotent, we often have to assemble the interface response manually from the business data in the database. This approach also adds an extra database query; under heavy interface load, the impact on database performance should not be underestimated.

Use a database unique key for idempotence

Among the usual insert, delete, update, and query operations, insert is the one that is naturally not idempotent (setting aside relative updates such as incrementing a counter). However, we can make it idempotent by adding a unique index to the database table.

When using this approach, we need to catch the unique key conflict exception thrown by the database and treat it as the idempotent result; sometimes we also need to assemble the response again.

This approach needs no distributed lock and takes one fewer database operation than the second approach. However, it still requires assembling the response manually, and it is not very friendly to database performance: the unique index also slows down inserts.
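The unique-index approach can be sketched with Python's built-in sqlite3 module, where a unique key conflict surfaces as sqlite3.IntegrityError. Other databases and drivers raise their own exception types. The orders table, the biz_key column, and the create_order function are hypothetical names chosen for this example.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create a throwaway in-memory table with a unique index on the business key."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, biz_key TEXT NOT NULL, buns INTEGER NOT NULL)"
    )
    conn.execute("CREATE UNIQUE INDEX uk_orders_biz_key ON orders (biz_key)")
    return conn


def create_order(conn: sqlite3.Connection, biz_key: str, buns: int) -> dict:
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(
                "INSERT INTO orders (biz_key, buns) VALUES (?, ?)", (biz_key, buns)
            )
    except sqlite3.IntegrityError:
        # Unique key conflict: this is a repeat of an earlier request.
        # Fall through and rebuild the response from the stored row.
        pass
    row = conn.execute(
        "SELECT id, buns FROM orders WHERE biz_key = ?", (biz_key,)
    ).fetchone()
    return {"code": 0, "msg": "ok", "order_id": row[0], "buns": row[1]}
```

Calling create_order twice with the same biz_key inserts one row and returns the same response both times. The response for the repeat is assembled from the stored row, which is the manual assembly step mentioned above.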

Other

If you know of other approaches, feel free to bring them up so we can discuss them together.
