Eventual consistency based on MQ messages

As distributed service architectures have become popular, many logical operations that used to run inside a single monolithic application are now split into remote calls between multiple services. Services give our systems the ability to scale horizontally, but the challenge that comes with them is distributed transactions. Each service maintains its own separate database, so the services do not share a transaction: if A executes successfully and B then fails, A's transaction has already been committed and cannot be rolled back, which eventually leaves the data on the two sides inconsistent. XA-based two-phase commit for distributed transactions has existed for a long time, but it has to lock global resources and therefore performs poorly. This gradually gave rise to flexible-transaction schemes for distributed services, such as message-based eventual consistency and TCC. This article focuses on the message-based eventual consistency scheme.

Normal message processing flow

  1. The producer creates a message and sends it
  2. The MQ receives the message and persists it, adding a new record to its store
  3. The MQ returns an ACK to the producer
  4. The MQ pushes the message to the corresponding consumer and waits for the consumer to return an ACK
  5. If the consumer returns an ACK within the specified time, the MQ considers the message successfully consumed and deletes it from the store (step 6). If the MQ receives no ACK within the specified time, it considers consumption to have failed and pushes the message again, repeating steps 4, 5 and 6
  6. The MQ deletes the message
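
The flow above can be sketched with a small in-memory model. This is only an illustration: `SimpleBroker`, `maxAttempts` and the `Predicate`-based consumer are hypothetical names, not the API of any real MQ, and a real broker would redeliver after a timeout rather than in a tight loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical in-memory broker: stores a message, pushes it until the consumer ACKs, then deletes it. */
class SimpleBroker {
    private final List<String> store = new ArrayList<>();
    private final int maxAttempts;

    SimpleBroker(int maxAttempts) {
        this.maxAttempts = maxAttempts;
    }

    /** Steps 2-3: persist the message; the return value stands in for the ACK to the producer. */
    boolean send(String message) {
        store.add(message);
        return true;
    }

    /** Steps 4-6: push each stored message until the consumer ACKs or attempts run out. Returns the push count. */
    int deliver(Predicate<String> consumer) {
        int pushes = 0;
        for (String message : new ArrayList<>(store)) {
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                pushes++;
                if (consumer.test(message)) { // true stands in for the consumer's ACK
                    store.remove(message);    // step 6: delete the message
                    break;
                }
            }
        }
        return pushes;
    }

    int pending() {
        return store.size();
    }
}

class SimpleBrokerDemo {
    public static void main(String[] args) {
        SimpleBroker broker = new SimpleBroker(5);
        broker.send("order-1 created");
        int[] calls = {0};
        // The consumer fails twice and ACKs on the third push.
        int pushes = broker.deliver(m -> ++calls[0] >= 3);
        System.out.println("pushes=" + pushes + ", pending=" + broker.pending());
    }
}
```

Note that the consumer sees the same message three times here, which is why redelivery and idempotent consumption go together.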

Consistency problems with normal messages

Take order creation as an example. The order system creates the order (a local transaction) and then sends a message for downstream processing. If the order is created successfully but the message is not sent, no downstream system can perceive the event, and dirty data results.

public void processOrder() {
    // Process the order (business operation)
    orderService.process();
    // Send the "order processed" message (message sending)
    sendBizMsg();
}

If the message is sent first and the order is created afterwards, the message may be sent successfully while order creation fails. The downstream system then believes the order has been created, and dirty data results again.

public void processOrder() {
    // Send the "order processed" message (message sending)
    sendBizMsg();
    // Process the order (business operation)
    orderService.process();
}

A wrong idea

At this point some readers may be thinking: can we put message sending and business processing in the same local transaction, so that the local transaction is rolled back if sending the message fails? Wouldn't that solve the consistency problem of message sending?

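@Transactional
public void processOrder() {
    try {
        // Process the order (business operation)
        orderService.process();
        // Send the "order processed" message (message sending)
        sendBizMsg();
    } catch (Exception e) {
        // Roll back the transaction. Assuming Spring's @Transactional,
        // rethrowing an unchecked exception triggers the rollback.
        throw new IllegalStateException("order processing failed", e);
    }
}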

Analysis of message sending failures

| Possible situation | Consistency |
| --- | --- |
| Order processing succeeds, then the system suddenly goes down; the transaction is not committed and the message is not sent | Consistent |
| Order processing succeeds, but the message is not sent because of a network failure or MQ downtime; the transaction is rolled back | Consistent |
| Order processing succeeds and the message is sent, but the MQ fails to store it for some other reason; the transaction is rolled back | Consistent |
| Order processing succeeds and the MQ stores the message, but MQ processing times out, so the ACK fails and the sender rolls back its local transaction | Inconsistent |

From the analysis above we can see that the usual approaches cannot guarantee consistency between business processing and message sending. The fundamental reason is that a remote call can end in one of three ways: success, failure, or timeout. In the timeout case, the processing side may in the end have succeeded or failed, and the caller has no way of knowing which. We once ran into a similar situation in a project: the caller wrote data locally and then made an RPC call, but the processing side timed out because of the large amount of data in its DB. When the timeout exception occurred, the caller immediately rolled back its local transaction, so the caller had no data while the processing side had already written its data, leaving the business data on the two sides inconsistent. To guarantee consistency on both sides, we have to look for a breakthrough elsewhere.
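
The timeout case can be reproduced in a few lines. The sketch below is an in-memory simulation under stated assumptions: `SlowRemoteService` and `RollbackOnTimeoutCaller` are hypothetical classes, sets stand in for the two databases, and an exception stands in for the RPC timeout.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical remote side: it writes its data, but the reply may arrive too late. */
class SlowRemoteService {
    final Set<String> data = new HashSet<>();

    void handle(String orderId, boolean replyInTime) {
        data.add(orderId); // the remote write succeeds either way
        if (!replyInTime) {
            // Stands in for the caller's RPC timeout.
            throw new IllegalStateException("timed out waiting for reply");
        }
    }
}

/** Hypothetical caller: writes locally, calls the remote side, rolls back on any failure. */
class RollbackOnTimeoutCaller {
    final Set<String> data = new HashSet<>();

    void process(String orderId, SlowRemoteService remote, boolean replyInTime) {
        data.add(orderId); // local write inside the "transaction"
        try {
            remote.handle(orderId, replyInTime);
        } catch (IllegalStateException e) {
            data.remove(orderId); // roll back the local write
        }
    }
}

class TimeoutDemo {
    public static void main(String[] args) {
        SlowRemoteService remote = new SlowRemoteService();
        RollbackOnTimeoutCaller caller = new RollbackOnTimeoutCaller();
        caller.process("order-1", remote, false);
        System.out.println("caller has order: " + caller.data.contains("order-1"));
        System.out.println("remote has order: " + remote.data.contains("order-1"));
    }
}
```

After the simulated timeout the caller has rolled back while the remote side has kept its write, which is exactly the inconsistent outcome described above.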

Transactional messages

Because the usual approaches cannot solve the consistency problem between "the producer's local transaction succeeded" and "the message was sent successfully", transactional messages were born. They make the producer's local transaction and the sending of the message atomic, which guarantees eventual consistency between the success of the producer's local transaction and the success of sending the message.

Transactional message processing flow

  1. The difference between a transactional message and a normal message lies in the production step: the producer first pre-sends a message to the MQ (this is also called sending a half message)

  2. After receiving the message, the MQ first persists it, adding a message with status "pending" to the store

  3. The MQ then returns an ACK to the producer; at this point it does not trigger a push to consumers

  4. After the pre-send succeeds, the producer executes its local transaction

  5. When the local transaction finishes, the producer sends the result to the MQ

  6. Depending on the result, the MQ either deletes the message or updates its status to "sendable"

  7. If the status is updated to "sendable", the MQ pushes the message to the consumer; from here on, consumption is the same as for a normal message
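
As a rough illustration of these seven steps, here is a minimal in-memory sketch. The names `HalfMessageBroker`, `TxMessageProducer`, `prepare` and `finish` are hypothetical and do not correspond to any real MQ client; the sketch covers only the happy path and the explicit rollback, not lost results.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;

/** Hypothetical in-memory MQ that supports half (pre-sent) messages. */
class HalfMessageBroker {
    enum Status { PENDING, SENDABLE }

    private final Map<String, Status> store = new HashMap<>();
    final List<String> pushed = new ArrayList<>();

    /** Steps 1-3: persist the half message as PENDING and ACK; nothing is pushed yet. */
    boolean prepare(String msgId) {
        store.put(msgId, Status.PENDING);
        return true;
    }

    /** Steps 6-7: on success mark the message SENDABLE and push it, otherwise delete it. */
    void finish(String msgId, boolean localTxSucceeded) {
        if (!store.containsKey(msgId)) {
            return;
        }
        if (localTxSucceeded) {
            store.put(msgId, Status.SENDABLE);
            pushed.add(msgId); // stands in for the push to consumers
        } else {
            store.remove(msgId);
        }
    }

    Status statusOf(String msgId) {
        return store.get(msgId);
    }
}

class TxMessageProducer {
    /** Steps 1, 4 and 5: pre-send, run the local transaction, report the result. */
    static boolean send(HalfMessageBroker broker, String msgId, BooleanSupplier localTx) {
        if (!broker.prepare(msgId)) {
            return false; // no ACK for the pre-send: do not run the business operation
        }
        boolean ok;
        try {
            ok = localTx.getAsBoolean();
        } catch (RuntimeException e) {
            ok = false;
        }
        broker.finish(msgId, ok);
        return ok;
    }
}

class HalfMessageDemo {
    public static void main(String[] args) {
        HalfMessageBroker broker = new HalfMessageBroker();
        TxMessageProducer.send(broker, "m1", () -> true);  // local transaction commits
        TxMessageProducer.send(broker, "m2", () -> false); // local transaction fails
        System.out.println("pushed=" + broker.pushed);
    }
}
```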

Note: because the MQ usually guarantees that a message is delivered successfully, it may deliver the same message more than once if the consumer's business does not return an ACK. Any message-based eventual consistency scheme therefore requires the consumer to process messages idempotently, so that receiving the same message repeatedly does not cause it to be consumed twice.
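
One common way to make a consumer idempotent is to remember the IDs of messages it has already handled. The sketch below assumes every message carries a unique ID; `IdempotentConsumer` and its fields are hypothetical, and in a real system the processed-ID record and the business update would normally be written in the same local transaction.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical consumer that applies each message at most once, keyed by message ID. */
class IdempotentConsumer {
    private final Set<String> processedIds = new HashSet<>();
    private int balance = 0;

    /** Returns true (ACK) for first deliveries and duplicates alike; the effect is applied only once. */
    boolean onMessage(String messageId, int amount) {
        if (!processedIds.add(messageId)) {
            return true; // duplicate delivery: already handled, just ACK again
        }
        balance += amount;
        return true;
    }

    int balance() {
        return balance;
    }
}

class IdempotentConsumerDemo {
    public static void main(String[] args) {
        IdempotentConsumer consumer = new IdempotentConsumer();
        consumer.onMessage("msg-1", 100);
        consumer.onMessage("msg-1", 100); // redelivered by the MQ
        System.out.println("balance=" + consumer.balance());
    }
}
```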

Analysis of transactional message failures

| Failure scenario | Consistency | Handling |
| --- | --- | --- |
| The message is not stored and the business operation is not executed | Consistent | None |
| The "pending" message is stored, but the ACK fails, so the business operation is not executed (possibly because of an MQ processing timeout, network jitter, or similar) | Inconsistent | The MQ confirms the business result and handles the message (deletes it) |
| The "pending" message is stored, the ACK succeeds and the business operation is executed (it may succeed or fail), but the MQ never receives the producer's final result | Inconsistent | The MQ confirms the business result and handles the message according to it (delivers the message if the business operation succeeded, deletes it if it failed) |
| The business operation succeeds and the result is sent to the MQ, but the MQ fails to update the message, so its status is still "pending" | Inconsistent | Same as above |

MQ support for transactional messages

Among the mainstream MQs today, such as ActiveMQ, RabbitMQ, Kafka and RocketMQ, only RocketMQ supports transactional messages of this kind. As I understand it, this is because Alibaba's Alipay business needed them in its early years and added transactional messages to its MQ. So if we want to rely on MQ transactional messages to achieve eventual consistency, RocketMQ is currently the only choice when selecting technology. The failure scenarios analyzed above all share one feature: the MQ has stored a "pending" message but cannot learn the final result of the upstream processing. RocketMQ's solution is simple. Internally it runs a scheduled job that polls for messages in the "pending" state and sends a check request to the producer. The producer must implement a check listener, which usually checks whether the corresponding local transaction succeeded (typically by querying the DB). If it did, the MQ sets the message to "sendable"; otherwise it deletes the message.
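
The check mechanism can be modelled in memory as follows. This is a simplified sketch and not RocketMQ's client API: `CheckBackBroker`, `checkPending` and the `Predicate` standing in for the producer's check listener are all hypothetical, and a real broker runs the check on a schedule rather than on demand.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical broker that resolves pending messages by asking the producer. */
class CheckBackBroker {
    private final Set<String> pending = new LinkedHashSet<>();
    final List<String> delivered = new ArrayList<>();

    /** A half message whose final result never reached the broker. */
    void storePending(String msgId) {
        pending.add(msgId);
    }

    /** One round of the periodic job: check every pending message with the producer's listener. */
    void checkPending(Predicate<String> localTxCommitted) {
        for (String msgId : new ArrayList<>(pending)) {
            if (localTxCommitted.test(msgId)) {
                delivered.add(msgId); // local transaction succeeded: make sendable and deliver
            }
            pending.remove(msgId);    // resolved either way; failed ones are simply deleted
        }
    }

    int pendingCount() {
        return pending.size();
    }
}

class CheckBackDemo {
    public static void main(String[] args) {
        // Stands in for the producer's DB: IDs whose local transaction committed.
        Set<String> committedOrders = new HashSet<>();
        committedOrders.add("order-1");

        CheckBackBroker broker = new CheckBackBroker();
        broker.storePending("order-1"); // business succeeded, result was lost
        broker.storePending("order-2"); // business never committed
        broker.checkPending(committedOrders::contains);
        System.out.println("delivered=" + broker.delivered + ", pending=" + broker.pendingCount());
    }
}
```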

Common questions

  1. Q: If the pre-send fails, does that mean the business operation is not executed?

    A: Yes. Message-based eventual consistency schemes usually depend strongly on this step. If this step cannot be guaranteed, eventual consistency cannot be achieved.

  2. Q: Why add a pre-send mechanism, which means sending twice, instead of sending after the business operation succeeds and retrying if the send fails?

    A: If the message is sent only after the business operation succeeds, the business system may go down before it has had time to send it. After the restart there is no record that a message was to be sent, so the business operation has succeeded but the message never goes out.

  3. Q: If the consumer fails to consume the message, does the producer need to roll back?

    A: No. With transactional messages, the producer does not roll back because consumption failed. Applications use transactional messages in pursuit of high availability and eventual consistency. If consumption fails, the MQ itself redelivers the message until the consumer succeeds. Transactional messages therefore address consistency on the producer side; the consumer side is completed through the MQ's retry mechanism.

  4. Q: If the consumer rolls back because of a business exception, does that mean consistency between the two sides cannot be guaranteed in the end?

    A: Message-based eventual consistency requires that the consumer's business operation face no business-level obstacle. Only system-level failures are allowed, not business-level ones. If your business code throws something like an NPE and the consumer's transaction fails, consistency is hard to reach.

Not every MQ supports transactional messages. If we do not choose RocketMQ as our MQ, can we still achieve message-based eventual consistency? The answer is yes.

Eventual consistency based on local messages

The core of eventual consistency based on local messages is that, while executing the business operation, we also write a message record to the DB, and the message record and the business data must be written in the same transaction. This is the core guarantee on which the scheme rests. Once the message record is written, a scheduled task polls the DB for messages in the "pending" state and delivers them to the MQ. Delivery may fail, in which case a retry mechanism keeps trying until the MQ's ACK is received, after which the message status is updated or the message is deleted. If consumption fails later, the MQ's own retries take over, so the data on the two sides eventually becomes consistent. Although the local-message scheme achieves eventual consistency, it has a fairly serious drawback: every business system that uses it has to create a message table in its own business database. To address this, we can extract the functionality into a unified message service, which leads to the scheme discussed next.
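
A minimal sketch of the local message scheme follows. Everything here is assumed for illustration: `LocalMessageStore` keeps the order table and the message table in memory, a snapshot-and-restore stands in for a real DB transaction, and the `Predicate` passed to `relay` stands in for a send to the MQ that returns whether an ACK arrived.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical business DB holding both the order table and the local message table. */
class LocalMessageStore {
    final List<String> orders = new ArrayList<>();
    final Map<String, Boolean> messages = new LinkedHashMap<>(); // message -> already sent?

    /** Writes the order row and the message row together; on failure neither is kept. */
    void createOrder(String orderId, boolean failMidway) {
        List<String> ordersBackup = new ArrayList<>(orders);
        Map<String, Boolean> messagesBackup = new LinkedHashMap<>(messages);
        try {
            orders.add(orderId);
            if (failMidway) {
                throw new IllegalStateException("business failure");
            }
            messages.put("order-created:" + orderId, false);
        } catch (RuntimeException e) {
            // Stand-in for a transaction rollback: restore both tables.
            orders.clear();
            orders.addAll(ordersBackup);
            messages.clear();
            messages.putAll(messagesBackup);
            throw e;
        }
    }

    /** One pass of the scheduled task: send every unsent message, mark it sent only after the MQ ACKs. */
    int relay(Predicate<String> mqSend) {
        int sent = 0;
        for (Map.Entry<String, Boolean> entry : messages.entrySet()) {
            if (!entry.getValue() && mqSend.test(entry.getKey())) {
                entry.setValue(true);
                sent++;
            }
        }
        return sent;
    }
}

class LocalMessageDemo {
    public static void main(String[] args) {
        LocalMessageStore db = new LocalMessageStore();
        db.createOrder("o1", false);
        System.out.println("first pass sent: " + db.relay(m -> false)); // MQ unreachable
        System.out.println("second pass sent: " + db.relay(m -> true)); // retry succeeds
    }
}
```

Because a message is marked as sent only after the ACK, a crash between the send and the status update leads to a duplicate send on the next pass, which again relies on idempotent consumers.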

Eventual consistency with an independent message service

The biggest difference between eventual consistency with an independent message service and eventual consistency based on local messages is that message storage is split out into a separate RPC service. This effectively simulates the pre-send step of a transactional message: if the pre-send fails, the producer does not carry out the business operation, so the producer's business depends strongly on the message service. Fortunately the independent message service scales horizontally, so deploying multiple instances as an HA cluster is enough to ensure its reliability. The message service also runs a separate scheduled task that periodically polls for messages that have stayed in the "pending" state for a long time and uses a check-based compensation mechanism to confirm whether the corresponding business operation succeeded. If it succeeded, the message is changed to "sendable" and delivered to the MQ; if it failed, the message is updated or deleted. A producer that uses this scheme must therefore implement a check service that the message service can call to confirm a message. On the consumer side the scheme is the same as above: the MQ's own redelivery mechanism ensures that messages are consumed.
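
The message service's side of this scheme might look roughly like the sketch below. All names (`MessageService`, `prepare`, `confirm`, `cancel`, `checkStale`) are hypothetical, timestamps are passed in explicitly to keep the example deterministic, and the `Predicate` stands in for an RPC call to the producer's check service.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical stand-alone message service sitting between producers and the MQ. */
class MessageService {
    private final Map<String, Long> pendingSince = new LinkedHashMap<>(); // msgId -> time stored
    final List<String> sentToMq = new ArrayList<>();

    /** Pre-send: store the message as pending before the producer runs its business operation. */
    void prepare(String msgId, long now) {
        pendingSince.put(msgId, now);
    }

    /** Business succeeded: the message becomes sendable and is delivered to the MQ. */
    void confirm(String msgId) {
        if (pendingSince.remove(msgId) != null) {
            sentToMq.add(msgId);
        }
    }

    /** Business failed: drop the message. */
    void cancel(String msgId) {
        pendingSince.remove(msgId);
    }

    /** Scheduled task: for messages pending longer than timeoutMillis, ask the producer's check service. */
    void checkStale(long now, long timeoutMillis, Predicate<String> businessSucceeded) {
        for (String msgId : new ArrayList<>(pendingSince.keySet())) {
            if (now - pendingSince.get(msgId) < timeoutMillis) {
                continue; // still fresh: the producer may simply not have finished yet
            }
            if (businessSucceeded.test(msgId)) {
                confirm(msgId);
            } else {
                cancel(msgId);
            }
        }
    }

    int pendingCount() {
        return pendingSince.size();
    }
}

class MessageServiceDemo {
    public static void main(String[] args) {
        MessageService service = new MessageService();
        service.prepare("m1", 0L);   // business succeeded, confirm was lost
        service.prepare("m2", 0L);   // business failed, cancel was lost
        service.prepare("m3", 900L); // recently prepared
        service.checkStale(1000L, 500L, id -> id.equals("m1"));
        System.out.println("sentToMq=" + service.sentToMq + ", pending=" + service.pendingCount());
    }
}
```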

Summary: in MQ-based schemes, once the upstream transaction has committed, rollback is not considered. Failures may be caused by the network or by service downtime; as noted above, the business operation itself is assumed to face no obstacle. If a downstream service does not recover for a long time, an alarm should be raised. There are several mechanisms for handling such stubborn cases. If the upstream keeps failing to send a message (which should basically not happen unless the code is wrong), we can set up alerting: when an exception occurs, write a log entry, send an SMS or other notification, and save the abnormal order to the database. These measures can also be used to bring abnormal orders back in sync downstream. Alternatively, when an exception occurs we can publish to a dedicated exception topic to prompt manual intervention and correct the data.

Origin juejin.im/post/5d8882bdf265da03c9273821