How to ensure 100% message delivery, and an idempotent design plan

How to ensure 100% data transmission

For example, suppose the order service produces a message. How can we ensure that this message is 100% delivered to the logistics service?

Let me talk about my solution.
(1) The order service delivers the message to the MQ middleware
(2) The logistics service listens to the MQ middleware and consumes the message

  • Take RabbitMQ for example.

Consider a scenario. What happens if the MQ server suddenly goes down? Are all the messages sent by our order service gone? Yes: MQ middleware generally stores messages in memory to improve system throughput. If nothing else is done, all messages are lost once the MQ server goes down. The business cannot tolerate this, and the impact is severe.

The solution is to persist the order service's messages, so that even if MQ goes down, the logistics service can consume them normally after a restart.

  • Persistence

Experienced readers will say: I know, one way is to persist the message. In RabbitMQ, a queue can be declared with the durable parameter set to true, and a message can be marked as persistent when it is published (delivery mode 2). Both are needed for the message to be written to disk.

In this case, even if the MQ server goes down, the messages are stored in disk files and are still there after a restart. This makes it likely, but not certain, that messages survive.

There is another scenario: the message has just been stored in MQ memory and has not yet been written to the disk file when the server suddenly crashes. (You might object that this window is so short that the probability is negligible.) When a large number of messages are being delivered continuously, however, this scenario is quite common.
What to do then? How can we be sure the message has been persisted to disk? This requires MQ's confirm mechanism.

  • confirm mechanism

The problem above is that nothing tells us whether persistence succeeded. Fortunately, many MQs provide callback notifications. RabbitMQ has a confirm mechanism that notifies us whether the message was persisted successfully.

Principle of confirm mechanism:

(1) The message producer sends the message to MQ, and if MQ receives it successfully, MQ returns an ack to the producer;
(2) If MQ fails to receive the message, MQ returns a nack to the producer;

Does this guarantee that 100% of the messages will not be lost?

Look at the confirm mechanism more closely. Suppose that every time our producer sends a message, MQ had to persist it to disk before issuing the ack or nack callback. MQ throughput would then be very low, because writing to disk is slow and it would happen for every message. That is unacceptable in high-concurrency scenarios.

Therefore, MQ actually persists to disk asynchronously. It follows some batching policy, for example flushing to disk once thousands of messages have accumulated, instead of flushing every time a message arrives.

The confirm mechanism is therefore an asynchronous notification mechanism, which preserves the high throughput of the system. But it also means that confirms alone cannot guarantee 100% that no message is lost: a message may still be sitting in MQ memory, not yet flushed, when the server crashes, and the confirm mechanism by itself does nothing to recover it.
After all this we still cannot be sure, so what should we do?
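To make the batching problem concrete, here is a toy model in Python. It is not RabbitMQ's implementation: `ToyBroker`, its `batch_size`, and the assumption that an ack is sent only once a batch has been flushed are all invented for illustration.

```python
from typing import Callable, List


class ToyBroker:
    """Toy in-memory broker that flushes to 'disk' in batches (illustrative only)."""

    def __init__(self, batch_size: int, on_ack: Callable[[str], None]) -> None:
        self.batch_size = batch_size
        self.on_ack = on_ack            # the producer's confirm callback
        self.memory: List[str] = []     # accepted but not yet flushed
        self.disk: List[str] = []       # survives a crash

    def publish(self, msg_id: str) -> None:
        self.memory.append(msg_id)
        if len(self.memory) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        # One disk write for the whole batch, then confirm each message.
        batch, self.memory = self.memory, []
        self.disk.extend(batch)
        for msg_id in batch:
            self.on_ack(msg_id)

    def crash(self) -> None:
        # Everything still in memory is lost, and no ack is ever sent for it.
        self.memory.clear()


acked: List[str] = []
broker = ToyBroker(batch_size=3, on_ack=acked.append)
for i in range(5):
    broker.publish(f"order-{i}")
broker.crash()

unconfirmed = [f"order-{i}" for i in range(5) if f"order-{i}" not in acked]
print(acked)        # ['order-0', 'order-1', 'order-2']
print(unconfirmed)  # ['order-3', 'order-4']
```

The last two messages were handed to the broker but never confirmed, so the producer cannot tell whether they will ever reach a consumer.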

  • Message persistence in advance + scheduled task

The root cause is that we cannot determine whether MQ has persisted the message. Can we persist the message ourselves? The answer is yes, and the plan evolves one step further.

(1) Before delivering the message, the order service (the producer) persists it to Redis or a DB. Redis is recommended for its performance. The message status is [Sending].
(2) The confirm mechanism reports whether the message was sent successfully. On an ack, delete the message from Redis.
(3) On a nack, decide according to your business whether to resend the message or simply delete it.
(4) Add a scheduled task that periodically pulls the messages whose status is still [Sending] after a certain period of time. That status means the order service has not received an ack for them.
(5) The scheduled task redelivers these messages as compensation. If the ack callback from MQ then arrives, delete the message from Redis.

This is essentially a compensation mechanism. It does not care whether MQ actually received the message: as long as the message status in Redis is still [Sending], the message is treated as not successfully delivered, and the scheduled task picks it up and redelivers it.

We can also cap the number of compensation attempts. If the ack has still not arrived after more than 3 attempts, set the message status to [Failed] and investigate the cause manually.
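The steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a production design: a dict stands in for Redis, `send` is a hypothetical function that publishes to MQ, time is passed in explicitly so the example is deterministic, and nack handling (step 3) is left out because it depends on the business.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

SENDING, FAILED = "sending", "failed"
MAX_RETRIES = 3  # the retry cap mentioned in the text


@dataclass
class Record:
    payload: str
    status: str = SENDING
    retries: int = 0
    last_sent: float = 0.0


class Outbox:
    """Producer-side message store; a dict stands in for Redis or a DB."""

    def __init__(self, send: Callable[[str, str], None]) -> None:
        self.send = send                      # hypothetical: publishes to MQ
        self.records: Dict[str, Record] = {}

    def deliver(self, msg_id: str, payload: str, now: float) -> None:
        # Step 1: persist with status "sending" BEFORE publishing.
        self.records[msg_id] = Record(payload, last_sent=now)
        self.send(msg_id, payload)

    def on_ack(self, msg_id: str) -> None:
        # Step 2: the broker confirmed, so the record can be deleted.
        self.records.pop(msg_id, None)

    def compensate(self, now: float, timeout: float) -> None:
        # Steps 4-5: the scheduled task re-sends records stuck in "sending".
        for msg_id, rec in list(self.records.items()):
            if rec.status != SENDING or now - rec.last_sent < timeout:
                continue
            if rec.retries >= MAX_RETRIES:
                rec.status = FAILED           # left for manual investigation
                continue
            rec.retries += 1
            rec.last_sent = now
            self.send(msg_id, rec.payload)


sent: List[str] = []
outbox = Outbox(send=lambda msg_id, payload: sent.append(msg_id))
outbox.deliver("m1", "order created", now=0.0)
outbox.deliver("m2", "order paid", now=0.0)
outbox.on_ack("m1")                           # an ack arrives for m1 only
for tick in (10.0, 20.0, 30.0, 40.0):         # four runs of the scheduled task
    outbox.compensate(now=tick, timeout=5.0)

print(sent)                                   # ['m1', 'm2', 'm2', 'm2', 'm2']
print(outbox.records["m2"].status)            # failed
```

Here m2 is sent once and redelivered three times, then marked as failed on the fourth run, while m1 disappears from the store as soon as its ack arrives.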

With this, the plan is close to complete, and 100% of the messages are guaranteed not to be lost (excluding, of course, the case where the disk itself fails, which a master-slave setup can cover).
However, this scheme may send the same message more than once. MQ may well have received the message already, with the ack lost to a network failure on its way back to the producer.
So consumers must be idempotent when consuming!

What is idempotence

What is idempotence? An operation is idempotent if executing it repeatedly produces the same result as executing it once: each further execution does not disturb the result of the previous one.

Common idempotent and non-idempotent

1) A SELECT query is naturally idempotent;
2) A DELETE is also idempotent: deleting the same data several times has the same effect as deleting it once;
3) An UPDATE that sets a column directly to a given value is idempotent;
4) An UPDATE that accumulates (for example, adds to the current value) is not idempotent;
5) An INSERT is not idempotent: each execution adds another row.
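The difference is easy to see by running each kind of statement twice. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the `account` and `log` tables are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("CREATE TABLE log (note TEXT)")
conn.execute("INSERT INTO account VALUES (1, 100)")


def balance() -> int:
    return conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]


# Direct assignment: running it twice leaves the same state (idempotent).
for _ in range(2):
    conn.execute("UPDATE account SET balance = 50 WHERE id = 1")
after_set = balance()       # 50

# Accumulation: every run changes the state again (non-idempotent).
for _ in range(2):
    conn.execute("UPDATE account SET balance = balance + 10 WHERE id = 1")
after_add = balance()       # 70, not 60

# Insert without a unique key: every run adds another row (non-idempotent).
for _ in range(2):
    conn.execute("INSERT INTO log VALUES ('paid')")
log_rows = conn.execute("SELECT COUNT(*) FROM log").fetchone()[0]        # 2

# Delete: the second run finds nothing left to delete (idempotent).
for _ in range(2):
    conn.execute("DELETE FROM account WHERE id = 1")
rows_left = conn.execute("SELECT COUNT(*) FROM account").fetchone()[0]   # 0
```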

Cause

Repeated requests come from repeated clicks or network retransmissions:

1) Clicking the submit button twice;
2) Clicking the refresh button;
3) Using the browser back button to repeat the previous operation, which resubmits the form;
4) Using the browser history to resubmit the form;
5) The browser repeating an HTTP request;
6) Nginx retransmission;
7) Retry retransmission in distributed RPC, and so on.

When do you need it

Repeated submissions come up often in business development. Whether a request is re-sent because a network problem kept the result from arriving, or a jittery front-end operation submits twice, the operation should be idempotent.

For example, a user taps submit several times on an order in the app, and only one order should be created in the back end.

Or a payment request is sent to Alipay: whatever network problems or system bugs occur, Alipay should deduct the money only once.

So a service that declares itself idempotent assumes that external callers may call it multiple times, and it is designed so that those repeated calls do not change the system's data state more than once.

Drawbacks of idempotence

Idempotence simplifies client-side logic, but it adds logic and cost on the service provider's side. Whether it is needed has to be analyzed for the specific scenario. So unless the business specifically requires it, try not to provide idempotent interfaces.

  1. It adds extra business logic for controlling idempotence, which complicates the business function;

  2. It turns operations that could run in parallel into serial execution, which reduces efficiency.

Solution

Page redirect

After the front-end form is submitted, redirect the page to a submission-success page.
This avoids repeated submissions caused by the user refreshing the page, avoids the browser's warning about resubmitting the form, and also eliminates the same problem when the user presses the browser's forward and back buttons.

Advantages: Simple to implement
Disadvantages: Insufficient reliability

Optimistic lock

Optimistic locking is generally implemented with a version column in MySQL, which preserves execution efficiency while also guaranteeing idempotence.
For example: UPDATE tab1 SET col1=1,version=version+1 WHERE version=#version#
The update is conditioned on the version: read the current version number of the product before operating on the inventory, then pass that version number along with the operation.
Walking through it: the first time we operate on the inventory we read version 1, and after the inventory service is called the version becomes 2. But something goes wrong when returning to the order service, so the order service calls the inventory service again. The version it passes is still 1, so when the SQL statement above is executed again it updates nothing: the version is now 2 and the WHERE condition no longer holds. This ensures that no matter how many times the call is made, it is processed only once.

Advantages: relatively simple to implement
Disadvantages: not very efficient
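The version check described above can be tried out with Python's built-in sqlite3 module. This is a minimal sketch: the `stock` table, its columns, and the `deduct` helper are hypothetical names, and SQLite stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stock (id INTEGER PRIMARY KEY, qty INTEGER, version INTEGER)"
)
conn.execute("INSERT INTO stock VALUES (1, 10, 1)")


def deduct(item_id: int, amount: int, seen_version: int) -> bool:
    """Deduct stock only if the row is still at the version the caller read."""
    cur = conn.execute(
        "UPDATE stock SET qty = qty - ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (amount, item_id, seen_version),
    )
    return cur.rowcount == 1   # 0 rows updated means the version has moved on


first = deduct(1, 2, seen_version=1)   # True: applied, version becomes 2
retry = deduct(1, 2, seen_version=1)   # False: stale version, nothing updated
qty, version = conn.execute(
    "SELECT qty, version FROM stock WHERE id = 1"
).fetchone()                           # (8, 2)
```

The retried call carries the old version, so the stock is deducted once even though the request was made twice.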

Distributed lock

Take Redis as an example. When an order initiates a payment request, the payment system checks whether a key for the order number exists in the Redis cache. If it does not exist, the payment system adds the order number to Redis as a key and continues with the subsequent steps. If the key already exists, the request is abandoned.
The subsequent steps: check whether the order has already been paid; if not, carry out the payment, and delete the order-number key once the payment is complete. The distributed lock is implemented through Redis: the next request can come in only after the current payment request has finished. Compared with a deduplication table, handling the concurrency in the cache is more efficient. The idea is the same: only one payment request for an order can be in progress at a time.

Advantages: Simple to implement
Disadvantages: Once the ID is repeated, there will be big problems
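The lock flow described above can be sketched as follows. To keep the example runnable without a server, `FakeRedis` is an in-memory stand-in whose `set_nx` imitates Redis's "set only if the key does not exist" behaviour; `pay`, `paid_orders`, and the key format are hypothetical.

```python
from typing import Dict, List, Set


class FakeRedis:
    """In-memory stand-in for the two Redis operations used here."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}

    def set_nx(self, key: str, value: str) -> bool:
        # Set the key only if it does not exist yet; report whether we set it.
        if key in self.data:
            return False
        self.data[key] = value
        return True

    def delete(self, key: str) -> None:
        self.data.pop(key, None)


redis = FakeRedis()
paid_orders: Set[str] = set()   # stands in for the payment status in the DB
charges: List[str] = []


def pay(order_no: str) -> str:
    key = f"pay:{order_no}"
    if not redis.set_nx(key, "1"):
        return "rejected"              # another request holds the lock
    try:
        if order_no in paid_orders:
            return "already paid"
        charges.append(order_no)       # the real deduction would happen here
        paid_orders.add(order_no)
        return "paid"
    finally:
        redis.delete(key)              # release the lock when we are done


first = pay("A1001")                   # "paid"
second = pay("A1001")                  # "already paid"
redis.set_nx("pay:B2002", "1")         # simulate a request still in progress
concurrent = pay("B2002")              # "rejected"
```

With a real Redis, the lock key would normally also carry an expiry (for example `SET key value NX EX seconds`) so that a crashed holder cannot block the order forever.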

Token

This method has two stages: the token application stage and the payment stage. In the first stage, before the user enters the order submission page, the order system requests a token from the payment system based on the user's information. The payment system saves the token in the Redis cache for use in the second stage. In the second stage, the order system initiates the payment request with the token it obtained, and the payment system checks whether the token exists in Redis. If it exists, this is the first payment request: the payment system deletes the token from the cache and then starts the payment logic. If it does not exist, the request is treated as illegal. The token is effectively a credential: the payment system uses it to confirm that the request is the one it issued the token for. The disadvantage is that it requires two interactions between the systems, so the process is more complicated than the methods above.

Advantages: Simple to implement
Disadvantages: The selection of the token generation algorithm is very important
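The two stages can be sketched in a few lines of Python. This is an illustration under assumptions: a set stands in for the Redis cache, tokens are random UUIDs (the text does not say which generation algorithm to use), and `apply_token` and `pay` are hypothetical names.

```python
import uuid
from typing import List, Set

issued_tokens: Set[str] = set()    # stands in for the Redis cache
payments: List[str] = []


def apply_token() -> str:
    """Stage 1: issue a one-time token and remember it."""
    token = uuid.uuid4().hex
    issued_tokens.add(token)
    return token


def pay(token: str, order_no: str) -> bool:
    """Stage 2: accept the payment only if the token is still unused."""
    try:
        issued_tokens.remove(token)    # check and delete in a single step
    except KeyError:
        return False                   # unknown or already used token
    payments.append(order_no)
    return True


token = apply_token()
first = pay(token, "A1001")            # True: first use of the token
repeat = pay(token, "A1001")           # False: the token was already consumed
forged = pay("not-a-token", "A1001")   # False: the token was never issued
```

The check and the delete have to happen as one atomic step, otherwise two concurrent requests could both see the token. With a real Redis, one way to get this is to rely on the return value of `DEL`, which reports how many keys were actually removed.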


Origin blog.csdn.net/Shangxingya/article/details/115056971