RocketMQ message loss scenarios and solutions

Author: Ji-yun HYY

Source: https://blog.csdn.net/LO_YUN/article/details/103949317

 

Once a project uses a message queue, message loss has to be considered. In scenarios that involve money, a lost message can be fatal. So where can messages be lost in RocketMQ?

Let's start with the simplest consumption flow chart:

The figure above covers roughly three steps:

  • The producer generates a message and sends it to RocketMQ.

  • After RocketMQ receives the message, it must save it to disk; otherwise the data is lost after a power failure or crash.

  • The consumer fetches the message from RocketMQ and consumes it. Once consumption succeeds, the whole process ends.

A message can be lost in each of these three scenarios, as shown in the following figure:

1. In scenario 1, when the producer sends a message to RocketMQ, network jitter or a communication error may cause the message to be lost.

2. In scenario 2, the message has to be persisted to disk, and there are two ways it can be lost at this point:

  • To reduce disk I/O, RocketMQ first writes the message to the OS cache instead of writing it directly to disk. A consumer that reads the message from the OS cache is effectively reading from memory, which is faster. After some time, an OS thread flushes the cache to disk asynchronously, and only then is the message truly persisted. If the Broker goes down before the asynchronous flush has completed, the message is lost.

  • If the message has been flushed to disk but the data has not been backed up, the message is also lost once the disk is damaged.

3. In scenario 3, the consumer successfully obtains the message from RocketMQ and tells RocketMQ that the message has been consumed before it has actually finished processing it. If the consumer then goes down, RocketMQ still believes the message was consumed successfully, so the data is lost.

So how can zero message loss be ensured?

1. For scenario 1, the way to ensure that messages are not lost is to send them with RocketMQ's built-in transaction mechanism. The general process is:

  • First, the producer sends a half message to RocketMQ. At this point consumers cannot consume the half message. If sending the half message fails, the producer executes the corresponding rollback logic.

  • After the half message has been sent successfully and RocketMQ has returned a success response, the producer executes its own core business flow.

  • If the producer's core business flow fails, the producer rolls back and notifies RocketMQ to delete the half message.

  • If the producer's core business flow succeeds, the producer notifies RocketMQ to commit the half message so that consumers can consume it.

There are further details, such as RocketMQ calling back an interface on the producer when it has not received a commit or rollback for a long time. If you are interested, see "RocketMQ Distributed Transaction Principle" (https://blog.csdn.net/LO_YUN/article/details/101673893).

Once the producer's message has been sent to RocketMQ successfully through a RocketMQ transaction, it is guaranteed not to be lost at this stage.
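The half-message flow above can be sketched with a small in-memory model. This is only an illustration under stated assumptions: `TransactionalSendSketch`, its nested `Broker` class and all method names are hypothetical and are not part of the RocketMQ client, and the Broker's check-back for unresolved half messages is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TransactionalSendSketch {
    // Hypothetical in-memory stand-in for a broker; it only models half-message visibility.
    static class Broker {
        private final Map<String, String> halfMessages = new HashMap<>();
        private final List<String> visibleMessages = new ArrayList<>();

        // Store the half message; consumers cannot see it yet.
        void sendHalf(String txId, String body) {
            halfMessages.put(txId, body);
        }

        // Commit: the half message becomes visible to consumers.
        void commit(String txId) {
            String body = halfMessages.remove(txId);
            if (body != null) {
                visibleMessages.add(body);
            }
        }

        // Rollback: the half message is discarded and never delivered.
        void rollback(String txId) {
            halfMessages.remove(txId);
        }

        List<String> visibleToConsumers() {
            return new ArrayList<>(visibleMessages);
        }
    }

    // Sends a half message, runs the core business flow, then commits or rolls back.
    static boolean send(Broker broker, String txId, String body, Runnable coreFlow) {
        broker.sendHalf(txId, body);
        try {
            coreFlow.run();
        } catch (RuntimeException e) {
            broker.rollback(txId);
            return false;
        }
        broker.commit(txId);
        return true;
    }

    public static void main(String[] args) {
        Broker broker = new Broker();
        send(broker, "tx-1", "order paid", () -> { });
        send(broker, "tx-2", "order refunded", () -> {
            throw new IllegalStateException("core flow failed");
        });
        System.out.println(broker.visibleToConsumers()); // prints [order paid]
    }
}
```

In a real project these steps would go through the RocketMQ client's transactional producer (`TransactionMQProducer` together with a `TransactionListener`) rather than hand-written code like this.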

2. For scenario 2, first change the flushing strategy from asynchronous to synchronous. This is done in the Broker configuration file by setting flushDiskType to SYNC_FLUSH (synchronous flush); the default is ASYNC_FLUSH (asynchronous flush).
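The gap between the two flushing strategies can be illustrated with plain Java file I/O. This is a minimal sketch, not RocketMQ's storage code: `FlushSketch` and `append` are hypothetical names, and the only point is that a write may return while the bytes still sit in the OS cache, whereas `FileChannel.force` returns only after the data has been forced to the storage device.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class FlushSketch {
    // Appends one message; when syncFlush is true, blocks until the data
    // has been forced to the storage device.
    static void append(FileChannel log, String message, boolean syncFlush) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap((message + "\n").getBytes(StandardCharsets.UTF_8));
        while (buf.hasRemaining()) {
            log.write(buf); // after this call the bytes may live only in the OS cache
        }
        if (syncFlush) {
            log.force(true); // force file content and metadata to disk
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("commitlog-sketch", ".log");
        try (FileChannel log = FileChannel.open(file, StandardOpenOption.WRITE)) {
            append(log, "async-style write", false); // fast, but exposed to a crash
            append(log, "sync-style write", true);   // slower, durable once it returns
        }
        System.out.println(Files.readAllLines(file));
        Files.delete(file);
    }
}
```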

Once a synchronous flush returns successfully, the message is guaranteed to have been persisted to disk. To avoid losing data when a disk is damaged, RocketMQ should also be deployed as a cluster with a master-slave mechanism, so that the data on the master (leader) is backed up on multiple slaves (followers) and there is no single point of failure.
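As a configuration sketch, the two measures might look like this in the master Broker's configuration file. `flushDiskType=SYNC_FLUSH` is the setting named above; `brokerRole=SYNC_MASTER` is an assumption added here, chosen so that the master waits for a slave to receive the message before acknowledging the producer. The right value depends on the deployment.

```properties
# Master Broker configuration (sketch)
# Flush each message to disk before acknowledging it (default: ASYNC_FLUSH)
flushDiskType=SYNC_FLUSH
# Assumption: replicate to a slave synchronously before acknowledging
brokerRole=SYNC_MASTER
```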

3. In scenario 3, once the message has reached the consumer, the consumer code can ensure that the message is not lost:

// Register a message listener to process messages
consumer.registerMessageListener(new MessageListenerConcurrently() {
    @Override
    public ConsumeConcurrentlyStatus consumeMessage(List<MessageExt> msgs, ConsumeConcurrentlyContext context) {
        // Process the messages here
        return ConsumeConcurrentlyStatus.CONSUME_SUCCESS;
    }
});

In the code above, a listener is registered on the consumer. When the consumer receives a message, it calls back the listener, and the message is processed inside it.

When processing is finished, the listener returns ConsumeConcurrentlyStatus.CONSUME_SUCCESS. Only when CONSUME_SUCCESS is returned does the consumer tell RocketMQ that consumption is complete. If the consumer goes down after that, the message has already been processed, so nothing is lost.

If the consumer goes down before returning CONSUME_SUCCESS, RocketMQ considers that consumer node to be down and fails over automatically, handing the message to another consumer in the same consumer group, so the message is not lost.
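The failover behaviour can be pictured with a toy queue in which a message stays pending until some consumer acknowledges it. This is a simplified sketch with hypothetical names (`RedeliverySketch`, `PendingQueue`); a real Broker tracks consumption progress per consumer group rather than deleting messages on acknowledgement.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class RedeliverySketch {
    // Toy queue: a message stays at the head until some consumer acknowledges it.
    static class PendingQueue {
        private final Deque<String> pending = new ArrayDeque<>();

        void publish(String msg) {
            pending.addLast(msg);
        }

        // Hands out the head message without removing it (null when empty).
        String deliver() {
            return pending.peekFirst();
        }

        // Called only after a consumer reports success.
        void ack(String msg) {
            pending.removeFirstOccurrence(msg);
        }
    }

    public static void main(String[] args) {
        PendingQueue queue = new PendingQueue();
        queue.publish("deduct balance");

        // Consumer A receives the message and crashes before acknowledging it.
        String seenByA = queue.deliver();

        // Consumer B in the same group receives the same message and acknowledges it.
        String seenByB = queue.deliver();
        queue.ack(seenByB);

        System.out.println(seenByA.equals(seenByB)); // prints true
        System.out.println(queue.deliver());         // prints null
    }
}
```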

To make sure messages are not lost, it is enough to write the consumption business logic directly in the consumeMessage method. Problems arise if you insist on doing something fancy, such as in the following code:

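// Register a message listener to process messages
consumer.registerMessageListener(new MessageListenerConcurrently() {
    @Override
    public ConsumeConcurrentlyStatus consumeMessage(List<MessageExt> msgs, ConsumeConcurrentlyContext context) {
        // Start a child thread to process the messages asynchronously
        new Thread() {
            @Override
            public void run() {
                // Process the messages here
            }
        }.start();
        // Returned immediately, possibly before the child thread has finished
        return ConsumeConcurrentlyStatus.CONSUME_SUCCESS;
    }
});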

If a newly started child thread processes the messages asynchronously, the consumer may tell RocketMQ that the message has been consumed before processing has actually finished. If the consumer then goes down, the message is lost.
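If work really has to be handed to other threads, one option is to wait for all of it to finish before returning a status. The sketch below uses a stand-in `Status` enum that mirrors the two values of `ConsumeConcurrentlyStatus` so that it runs without the RocketMQ client; `AckAfterProcessingSketch` and `consume` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

class AckAfterProcessingSketch {
    // Stand-in for RocketMQ's ConsumeConcurrentlyStatus (assumption for this sketch).
    enum Status { CONSUME_SUCCESS, RECONSUME_LATER }

    // Processes every message on the pool, but reports success only after all tasks have finished.
    static Status consume(List<String> msgs, ExecutorService pool, Consumer<String> handler) {
        List<CompletableFuture<Void>> tasks = new ArrayList<>();
        for (String msg : msgs) {
            tasks.add(CompletableFuture.runAsync(() -> handler.accept(msg), pool));
        }
        try {
            // Block until every task is done; throws if any task failed.
            CompletableFuture.allOf(tasks.toArray(new CompletableFuture[0])).join();
            return Status.CONSUME_SUCCESS;
        } catch (CompletionException e) {
            // At least one message failed, so ask for the batch to be redelivered.
            return Status.RECONSUME_LATER;
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            System.out.println(consume(List.of("m1", "m2"), pool, msg -> { }));
            System.out.println(consume(List.of("m3"), pool, msg -> {
                throw new IllegalStateException("processing failed");
            }));
        } finally {
            pool.shutdown();
        }
    }
}
```

Inside a real listener the same pattern would sit in consumeMessage, returning ConsumeConcurrentlyStatus.CONSUME_SUCCESS only after the join has completed.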

The set of solutions above can guarantee zero message loss with RocketMQ, but performance and throughput drop significantly:

  • Sending messages through the transaction mechanism involves many more steps than an ordinary send, which costs performance.

  • Synchronous flushing writes to disk while asynchronous flushing writes to memory, so their speeds are not even of the same order of magnitude.

  • With a master-slave setup, the leader also has to synchronize data to the followers.

  • Messages cannot be processed asynchronously during consumption. The consumer has to wait until processing is complete before notifying RocketMQ that consumption is complete.

Zero message loss is a double-edged sword. Whether to use it depends on the specific business scenario; the best solution is the one that fits that scenario.


Origin blog.csdn.net/csdn_lulinwei/article/details/108596549