Several ways to ensure that RabbitMQ messages are not lost


When using a message queue over an unreliable network, we must consider how to make sure that every message is actually consumed. Before looking at how to prevent message loss, we first need to identify the situations in which a message can be lost, so that each remedy matches its cause.

1. Three cases of RabbitMQ message loss

Before looking at how messages are lost, let's review the path a message takes from production to final consumption.
[Figure: the path of a message from producer to exchange to queue to consumer]

The figure above, taken from the official website, shows the whole process of sending a message. A message goes through the following steps:

  • The producer sends the message to an Exchange
  • The Exchange routes the message to a Queue according to the Routing Key
  • Consumers subscribe to the Queue and consume messages from it

From this model we can see that a message may be lost at the following points:

Case 1: the producer loses the message. The message is lost on its way from the producer to the Exchange, for example because the send fails due to a network problem, or because the message is sent to an Exchange that does not exist.

Case 2: routing fails. The message has reached the Exchange, but the Exchange cannot route it to a Queue by its Routing Key, for example because no Queue is bound to the Exchange at all.

Case 3: the consumer fails while processing the message. The consumer has received the message, but an unhandled exception occurs during processing, and the message is lost.

All of these cases are failures in handing a message from one component to the next. Even if all of them are resolved, there is still no guarantee that messages will not be lost: if the RabbitMQ service goes down and the messages have not been persisted, they are lost when the service restarts.

Having identified the situations that can lead to message loss, we now look at a solution for each of them.

2. RabbitMQ message loss solution


2.1 For producers

Producer fails to send message to Exchange

A send that fails because of a network problem is easy to deal with: we only need to handle the exception thrown when sending. Apart from that, by default the broker returns nothing to the producer when a message is published to an Exchange, so the producer cannot know whether the message really reached the server.

RabbitMQ offers two ways to solve this problem:

  • The transaction mechanism
  • The publisher confirm mechanism

2.1.1 Solution 1: Open RabbitMQ transaction

You can use the transaction feature provided by RabbitMQ: the producer puts the channel into transaction mode with channel.txSelect before sending, and then sends the message. If RabbitMQ does not receive the message successfully, the producer gets an exception and can roll back the transaction with channel.txRollback and then retry the send. If the message is received, the producer commits the transaction with channel.txCommit.

Note that a RabbitMQ transaction differs slightly from a database transaction. A database transaction has to be opened every time and ends with one matching commit or rollback, whereas a RabbitMQ channel only needs to be put into transaction mode once, after which it can commit or roll back many times.

An example of opening a transaction is as follows:

// Put the channel into transaction mode
channel.txSelect();
try {
    // Publish the message here

    // Commit only if publishing succeeded
    channel.txCommit();
} catch (Exception e) {
    channel.txRollback();
    // Resend the message here
}

This may not be very intuitive, so here is a short piece of code that uses RabbitMQ transactions, followed by an explanation.

// Put the channel into transaction mode
channel.txSelect();
// Publish 3 messages
String msgTemplate = "Test transaction message [%d]";
channel.basicPublish("tx.exchange", "tx", new AMQP.BasicProperties(), String.format(msgTemplate, 1).getBytes(StandardCharsets.UTF_8));
channel.basicPublish("tx.exchange", "tx", new AMQP.BasicProperties(), String.format(msgTemplate, 2).getBytes(StandardCharsets.UTF_8));
channel.basicPublish("tx.exchange", "tx", new AMQP.BasicProperties(), String.format(msgTemplate, 3).getBytes(StandardCharsets.UTF_8));
// Roll back the 3 messages above
channel.txRollback();
// Publish a 4th message and commit it
channel.basicPublish("tx.exchange", "tx", new AMQP.BasicProperties(), String.format(msgTemplate, 4).getBytes(StandardCharsets.UTF_8));
channel.txCommit();

The code above publishes four messages. txRollback is called after the first three, so those three are rolled back and never delivered. txCommit is called after the fourth, so in the end only one message is in RabbitMQ.

Transactions guarantee that a message is committed to the server, and the client code is simple. The drawback is performance: the RabbitMQ transaction mechanism is synchronous, so the producer blocks on every commit until the broker responds, and throughput drops considerably.

2.1.2 Solution 2: Use the confirm mechanism

The biggest difference between the two mechanisms is that transactions are synchronous (the producer blocks after committing), whereas confirms are asynchronous.

The confirm mechanism was introduced to solve the performance problem of transactions. Confirm mode is enabled with channel.confirmSelect. Once it is enabled, every message published on the channel is assigned a unique id, and when the message has been written to RabbitMQ, RabbitMQ sends back an ack to tell the producer that the message arrived.

If RabbitMQ fails to process the message, it calls back the producer's nack handler to report the failure, and the producer can retry. On top of this mechanism the producer can keep the id of every unconfirmed message in memory: if no callback arrives for a message within a certain time, the producer resends it.

Code sample:

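  • Producer
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.ConfirmListener;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class ConfirmProducer {
    public static void main(String[] args) throws Exception {
        ConnectionFactory connectionFactory = new ConnectionFactory();
        connectionFactory.setHost("127.0.0.1");
        connectionFactory.setPort(5672);
        connectionFactory.setUsername("guest");
        connectionFactory.setPassword("guest");
        // Set the virtual host
        connectionFactory.setVirtualHost("/");

        // Create a connection
        Connection connection = connectionFactory.newConnection();

        // Create a channel
        Channel channel = connection.createChannel();

        // Put the channel into confirm mode
        channel.confirmSelect();

        // Register the listener before publishing so that no confirm is missed
        channel.addConfirmListener(new ConfirmListener() {
            /**
             * The broker accepted the message.
             * @param deliveryTag sequence number of the confirmed message
             * @param multiple    true if all messages up to deliveryTag are confirmed
             */
            @Override
            public void handleAck(long deliveryTag, boolean multiple) throws IOException {
                System.out.println("**********Ack*********");
            }

            /**
             * The broker could not handle the message.
             * @param deliveryTag sequence number of the rejected message
             * @param multiple    true if all messages up to deliveryTag are rejected
             */
            @Override
            public void handleNack(long deliveryTag, boolean multiple) throws IOException {
                System.out.println("**********No Ack*********");
            }
        });

        // The exchange is declared by the consumer below, so start the consumer first
        String exchangeName = "test_confirm_exchange";
        String routeKey = "confirm.test";
        String msg = "RabbitMQ send message confirm test!";
        for (int i = 0; i < 5; i++) {
            channel.basicPublish(exchangeName, routeKey, null, msg.getBytes(StandardCharsets.UTF_8));
        }
    }
}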

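  • Consumer
import com.rabbitmq.client.AMQP;
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.DefaultConsumer;
import com.rabbitmq.client.Envelope;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class ConfirmConsumer {
    public static void main(String[] args) throws Exception {
        System.out.println("====== message receiving start ==========");
        ConnectionFactory connectionFactory = new ConnectionFactory();
        connectionFactory.setHost("127.0.0.1");
        connectionFactory.setPort(5672);
        connectionFactory.setUsername("guest");
        connectionFactory.setPassword("guest");
        // Set the virtual host
        connectionFactory.setVirtualHost("/");
        // Create a connection
        Connection connection = connectionFactory.newConnection();

        // Create a channel
        Channel channel = connection.createChannel();
        String exchangeName = "test_confirm_exchange";
        String exchangeType = "topic";
        // Declare a durable Exchange
        channel.exchangeDeclare(exchangeName, exchangeType, true, false, false, null);
        String queueName = "test_confirm_queue";
        // Declare a durable queue
        channel.queueDeclare(queueName, true, false, false, null);
        String routeKey = "confirm.#";
        // Bind the queue to the exchange
        channel.queueBind(queueName, exchangeName, routeKey);
        // autoAck = true: this sample only demonstrates publisher confirms
        channel.basicConsume(queueName, true, new DefaultConsumer(channel) {
            @Override
            public void handleDelivery(String consumerTag, Envelope envelope, AMQP.BasicProperties properties, byte[] body) throws IOException {
                System.out.println("Received message: " + new String(body, StandardCharsets.UTF_8));
            }
        });
    }
}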
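The sample above only prints a line for each ack or nack. The idea of keeping the ids of unconfirmed messages in memory could look roughly like the following sketch. It is a minimal illustration and not part of the original article: PendingConfirms and its method names are hypothetical, it uses only the JDK, and it assumes the caller passes in the sequence number obtained from channel.getNextPublishSeqNo() just before each publish and forwards handleAck and handleNack to it. Timeout-based resending is left out.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

/** Tracks published-but-unconfirmed messages by sequence number (hypothetical helper). */
class PendingConfirms {
    // Sequence number -> message body, sorted so that a "multiple" confirm can clear a range.
    private final ConcurrentNavigableMap<Long, String> pending = new ConcurrentSkipListMap<>();

    /** Call just before basicPublish with the channel's next publish sequence number. */
    void track(long seqNo, String body) {
        pending.put(seqNo, body);
    }

    /** Call from handleAck: the broker has accepted these messages. */
    void onAck(long deliveryTag, boolean multiple) {
        remove(deliveryTag, multiple);
    }

    /** Call from handleNack: returns the messages that should be published again. */
    Map<Long, String> onNack(long deliveryTag, boolean multiple) {
        Map<Long, String> toResend = new TreeMap<>();
        if (multiple) {
            // Everything up to and including deliveryTag was rejected.
            toResend.putAll(pending.headMap(deliveryTag, true));
        } else {
            String body = pending.get(deliveryTag);
            if (body != null) {
                toResend.put(deliveryTag, body);
            }
        }
        remove(deliveryTag, multiple);
        return toResend;
    }

    /** Number of messages still waiting for a confirm. */
    int size() {
        return pending.size();
    }

    private void remove(long deliveryTag, boolean multiple) {
        if (multiple) {
            pending.headMap(deliveryTag, true).clear();
        } else {
            pending.remove(deliveryTag);
        }
    }

    public static void main(String[] args) {
        PendingConfirms pending = new PendingConfirms();
        for (long seqNo = 1; seqNo <= 4; seqNo++) {
            pending.track(seqNo, "message-" + seqNo);
        }
        pending.onAck(2, true); // one ack covering sequence numbers 1 and 2
        System.out.println("to resend: " + pending.onNack(3, false));
        System.out.println("still waiting: " + pending.size());
    }
}
```

A sorted map is used because a single ack or nack with multiple set to true covers every sequence number up to the given tag, so the whole range has to be cleared at once.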

Note that the confirm mechanism and transactions cannot coexist on the same channel: a channel in transaction mode cannot be put into confirm mode, and a channel in confirm mode cannot use transactions.

2.2 The Exchange fails to route to a Queue

When the producer publishes a message to RabbitMQ, we can use transactions or confirm mode to make sure the message is not lost on the way. However, these two measures only guarantee that the message reaches the Exchange. If the message cannot be routed to a Queue by its Routing Key, it is still lost in the end.

For this case, RabbitMQ provides the mandatory parameter when publishing a message. If mandatory is true and the Exchange cannot find a matching Queue based on its type and the Routing Key, the message is not discarded but returned to the producer.

Code sample:

// Declare the Exchange
channel.exchangeDeclare("mandatory.exchange", BuiltinExchangeType.DIRECT, true, false, new HashMap<>());
// Declare the Queue
channel.queueDeclare("mandatory.queue", true, false, false, new HashMap<>());
// Bind the queue to the exchange
channel.queueBind("mandatory.queue", "mandatory.exchange", "mandatory");
// Receive messages that the exchange could not route to any queue
channel.addReturnListener(new ReturnListener() {
    @Override
    public void handleReturn(int replyCode, String replyText, String exchange, String routingKey, AMQP.BasicProperties properties, byte[] body) throws IOException {
        log.error("replyCode = {},replyText ={},exchange={},routingKey={},body={}", replyCode, replyText, exchange, routingKey, new String(body, StandardCharsets.UTF_8));
    }
});
// Publish with mandatory = true
// void basicPublish(String exchange, String routingKey, boolean mandatory, BasicProperties props, byte[] body)
channel.basicPublish("mandatory.exchange", "mandatory-1", true, new AMQP.BasicProperties(), "Test message for mandatory".getBytes(StandardCharsets.UTF_8));

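When calling basicPublish we set mandatory to true and register a ReturnListener on the channel to receive messages that could not be routed to a queue. In the sample, the routing key mandatory-1 does not match the binding key mandatory, so the message is returned to the producer.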
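What mandatory changes can be pictured with a toy model. The sketch below is a minimal in-memory imitation of a direct exchange, written only to illustrate the rule described above. ToyDirectExchange and its methods are hypothetical names, it keeps one queue per binding key for brevity, and it is not the RabbitMQ client API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy in-memory model of a direct exchange (illustration only, not the RabbitMQ API). */
class ToyDirectExchange {
    // Binding key -> contents of the queue bound under that key.
    private final Map<String, List<String>> queuesByBindingKey = new HashMap<>();
    // Messages handed back to the producer, as a ReturnListener would receive them.
    private final List<String> returned = new ArrayList<>();

    /** Binds a queue to this exchange under the given key. */
    void bind(String bindingKey) {
        queuesByBindingKey.putIfAbsent(bindingKey, new ArrayList<>());
    }

    /** Publishes a message; returns true if it was routed to a queue. */
    boolean publish(String routingKey, boolean mandatory, String body) {
        List<String> queue = queuesByBindingKey.get(routingKey);
        if (queue != null) {
            queue.add(body);
            return true;
        }
        if (mandatory) {
            returned.add(body); // handed back to the producer instead of being dropped
        }
        return false;           // with mandatory = false the message is silently dropped
    }

    List<String> returned() {
        return returned;
    }

    List<String> queue(String bindingKey) {
        return queuesByBindingKey.getOrDefault(bindingKey, new ArrayList<>());
    }

    public static void main(String[] args) {
        ToyDirectExchange exchange = new ToyDirectExchange();
        exchange.bind("mandatory");
        exchange.publish("mandatory", true, "routed");
        exchange.publish("mandatory-1", true, "unroutable, returned");
        exchange.publish("mandatory-1", false, "unroutable, dropped");
        System.out.println("queue:    " + exchange.queue("mandatory"));
        System.out.println("returned: " + exchange.returned());
    }
}
```

In the model, the only difference between the two unroutable publishes is whether the producer gets the message back, which is the behaviour the ReturnListener relies on.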

2.3 Message loss caused by RabbitMQ itself

On the RabbitMQ side there are three main points to address:

  • To make sure RabbitMQ does not lose messages, enable its persistence mechanism so that messages are written to disk. Even if RabbitMQ crashes and restarts, the messages can still be read from disk;

  • Avoid a single point of failure. A node going down does not by itself lose persisted messages, but it makes the service unavailable. RabbitMQ has three deployment modes: single-node mode, normal cluster mode and mirrored cluster mode. For high availability, use the mirrored cluster mode together with HAProxy;

  • Decide how to keep messages from being lost if the hard disk itself fails.

2.3.1 Message Persistence

By default, RabbitMQ messages are not persisted to disk. Unless persistence is configured, messages are lost when the node restarts or crashes, so messages must be persisted.

In RabbitMQ, persistence is enabled by setting durable to true. To make a message persistent, all three of the following conditions must be met:

  • The Exchange is declared durable

  • The Queue is declared durable

  • The message is sent as persistent: publish it with deliveryMode=2 in its properties, which marks it as a persistent message

2.3.2 Set the cluster mirroring mode

First, the three deployment modes of RabbitMQ:

  • Single-node mode: the simplest, non-clustered setup. If the node goes down, the service is unavailable, the business may be paralyzed, and all you can do is wait.

  • Normal cluster mode: a message exists only on the node that owns its queue and is not synchronized to other nodes. If that node goes down, the affected business is paralyzed, and the messages become available again only after the node recovers and restarts (provided they were persisted).

  • Mirrored cluster mode: messages are synchronized to other nodes, and the number of nodes to synchronize to can be configured, at the cost of lower throughput. This is RabbitMQ's HA solution.

Why set up a mirrored cluster? Because in a normal cluster the contents of a queue exist only on one node, not on all nodes. The other nodes store only the queue's structure and metadata.

To solve this problem and make sure messages are not lost, use HA mirrored queues.

Mirrored queues support three HA policy modes:

  • Synchronize to all nodes (ha-mode: all)

  • Synchronize to at most N nodes (ha-mode: exactly)

  • Synchronize only to the nodes with the specified names (ha-mode: nodes)

However, HA mirrored queues have one big disadvantage: the throughput of the system decreases.

2.3.3 Message Compensation Mechanism

Systems run in complex environments. Although the measures above largely guarantee that messages stay available and are not lost, loss is still possible. For example, a persistent message has been saved to disk, then the node that owns the queue goes down and the disk of that node fails. In this case the message is still lost.

To avoid this, the producer can first store the business data and the message data in the database within the same transaction. If storing the message data fails, the whole transaction is rolled back.

Then a compensation job checks the message status in the message table. If a message failed to be delivered, the job resends it.
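The message table and the compensation pass described above can be sketched as follows. This is a simplified, single-threaded illustration under stated assumptions: an in-memory map stands in for the database table, and MessageTable, its status values and the maxAttempts limit are hypothetical names chosen for the example. In a real system the insert would share a database transaction with the business write, and the returned ids would be published to RabbitMQ again.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** In-memory stand-in for the message table (hypothetical; a real one lives in the database). */
class MessageTable {
    enum Status { SENDING, CONFIRMED, FAILED }

    static final class Row {
        final String payload;
        Status status = Status.SENDING;
        int attempts = 1; // the initial send counts as the first attempt

        Row(String payload) {
            this.payload = payload;
        }
    }

    private final Map<String, Row> rows = new LinkedHashMap<>();
    private final int maxAttempts;

    MessageTable(int maxAttempts) {
        this.maxAttempts = maxAttempts;
    }

    /** Records a message that is about to be sent. */
    void insert(String messageId, String payload) {
        rows.put(messageId, new Row(payload));
    }

    /** Called when the broker confirms the message. */
    void markConfirmed(String messageId) {
        Row row = rows.get(messageId);
        if (row != null) {
            row.status = Status.CONFIRMED;
        }
    }

    /** One pass of the compensation job: returns the ids that should be sent again. */
    List<String> collectForResend() {
        List<String> resend = new ArrayList<>();
        for (Map.Entry<String, Row> entry : rows.entrySet()) {
            Row row = entry.getValue();
            if (row.status != Status.SENDING) {
                continue; // already confirmed or already given up
            }
            if (row.attempts >= maxAttempts) {
                row.status = Status.FAILED; // stop retrying; this is where an alarm would be raised
            } else {
                row.attempts++;
                resend.add(entry.getKey());
            }
        }
        return resend;
    }

    Status statusOf(String messageId) {
        return rows.get(messageId).status;
    }

    public static void main(String[] args) {
        MessageTable table = new MessageTable(2);
        table.insert("order-1", "{\"orderId\":1}");
        table.insert("order-2", "{\"orderId\":2}");
        table.markConfirmed("order-1");
        System.out.println("first pass resends:  " + table.collectForResend());
        System.out.println("second pass resends: " + table.collectForResend());
        System.out.println("order-2 status:      " + table.statusOf("order-2"));
    }
}
```

The attempt limit keeps a message that can never be delivered from being retried forever; such a message ends up in the FAILED state for manual handling.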

2.4 For consumers

The consumer fails to process the message after receiving it

With the methods above, messages are not lost on the way from the producer to RabbitMQ. Now it is the consumer's turn to consume the message.

While the consumer is running its business logic, an exception in our code may prevent the message from being processed properly. If the message has already been removed from the queue in RabbitMQ by then, it is lost.

This situation can also be avoided with the ack mechanism.

Just as the producer can use acks to confirm that a message reached the server, the consumer can acknowledge messages manually. In automatic ack mode (autoAck = true), RabbitMQ treats a message as handled as soon as it has been delivered to the consumer. With manual ack, the consumer decides when the ack is sent, so a message is not lost when the business code throws an exception.

DeliverCallback deliverCallback = new DeliverCallback() {
    @Override
    public void handle(String consumerTag, Delivery message) throws IOException {
        try {
            byte[] body = message.getBody();
            String messageContent = new String(body, StandardCharsets.UTF_8);
            if ("error".equals(messageContent)) {
                throw new RuntimeException("Business exception");
            }
            log.info("Received message: {}", messageContent);
            // Ack only after the business logic has finished
            channel.basicAck(message.getEnvelope().getDeliveryTag(), false);
        } catch (Exception e) {
            log.info("Failed to consume the message, requeueing it");
            // requeue = true puts the message back on the queue for redelivery
            channel.basicNack(message.getEnvelope().getDeliveryTag(), false, true);
        }
    }
};
CancelCallback cancelCallback = new CancelCallback() {
    @Override
    public void handle(String consumerTag) throws IOException {
        log.info("Subscription cancelled: {}", consumerTag);
    }
};
// autoAck = false enables manual ack
channel.basicConsume("confirm.queue", false, deliverCallback, cancelCallback);
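Because a failed message is put back on the queue, and because a producer may resend a message it believes was lost, a consumer can receive the same message more than once. One possible way to make the handler idempotent is to remember which message ids have already been processed. The sketch below is an illustration under assumptions: IdempotentHandler is a hypothetical name, the message id is assumed to be set by the producer (for example in the messageId property), and the in-memory set would normally be replaced by something durable such as a unique key in the business database.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Runs business logic at most once per message id (hypothetical helper). */
class IdempotentHandler {
    // Ids of messages whose business logic has already run (or is running).
    private final Set<String> processedIds = ConcurrentHashMap.newKeySet();

    /**
     * Returns true if the business logic ran, false if this id was seen before.
     * In both cases the caller can ack the delivery afterwards.
     */
    boolean handleOnce(String messageId, Runnable businessLogic) {
        if (!processedIds.add(messageId)) {
            return false; // duplicate delivery: skip the work
        }
        try {
            businessLogic.run();
            return true;
        } catch (RuntimeException e) {
            // Forget the id so that a redelivery can retry the work.
            processedIds.remove(messageId);
            throw e;
        }
    }

    public static void main(String[] args) {
        IdempotentHandler handler = new IdempotentHandler();
        Runnable chargeOrder = () -> System.out.println("charging order 42");
        System.out.println("first delivery handled:  " + handler.handleOnce("msg-42", chargeOrder));
        System.out.println("second delivery handled: " + handler.handleOnce("msg-42", chargeOrder));
    }
}
```

In the consumer above, the call to handleOnce would wrap the business logic inside the try block, with basicAck sent whether the method returns true or false.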

3. Summary

By following a message all the way from the producer to the consumer, we identified the scenarios in which it can be lost and gave a solution for each. To make sure a message is not lost anywhere along the path, the producer side, MQ itself and the consumer side must all play their part.

Producer side: record the status of every message produced, enable the confirm mechanism, update the message status according to the MQ response, use a scheduled task to redeliver messages that have timed out, and raise an alarm when delivery fails repeatedly.

MQ itself: enable persistence and send the ack only after the message has been written to disk. In a mirrored deployment, send the ack only after the message has been synchronized to the mirrors.

Consumer side: enable manual ack mode, send the ack only after business processing has completed, and make the processing idempotent.

The whole process is shown in the figure below:
[Figure: the end-to-end flow for reliable message delivery]

With all of this in place, messages are in theory no longer lost, but the throughput and performance of the system are reduced. In real projects, weigh the impact of a lost message to find the right trade-off between reliability and performance.


Origin blog.csdn.net/zhiyikeji/article/details/130190175