RabbitMQ message reliability analysis

Introduction

A lot of people have asked me: **How does RabbitMQ ensure message reliability?** My usual answer is that it is a long story. Indeed, message reliability cannot be explained in a few sentences, and the same is true of Kafka. Reliability is not an absolute concept. Someone once commented that losing every disk would lead to message loss; I joked that bombing the data center would lose messages too. Reliability is relative: it is a matter of how many nines the system can guarantee under reasonable conditions. Nothing can be perfect, but we can make RabbitMQ messages as reliable as possible. Before discussing RabbitMQ's message reliability in detail, let's review the path a message takes through RabbitMQ.

As shown in the figure, from the AMQP protocol level:

A message first travels from the producer to the exchange; the exchange forwards it to the corresponding queue according to its routing rules; the message is stored in the queue; and a consumer subscribed to the queue consumes it. Our analysis of message reliability works through these four phases one by one.

Phase 1

In this phase the message travels from the producer to the exchange, and several things can go wrong along the way. After the producer client sends the message, packet loss or a network failure can cause it to be lost. If no extra measures are taken, the producer has no way of knowing whether the message reached the exchange intact. If the producer can detect that delivery to the exchange failed, it can take further action, such as resending the message, to keep the message reliable.

For this reason, the AMQP protocol has included a transaction mechanism from the very beginning. The RabbitMQ client has three methods related to transactions:

  • channel.txSelect
  • channel.txCommit
  • channel.txRollback

channel.txSelect puts the current channel into transaction mode, channel.txCommit commits the transaction, and channel.txRollback rolls it back. After enabling transactions with channel.txSelect, we can publish messages to RabbitMQ. If the transaction commits successfully, the messages are guaranteed to have reached RabbitMQ. If RabbitMQ crashes or throws an exception for some other reason before the commit completes, we can catch the exception and roll back the transaction by calling channel.txRollback. Note that transactions in RabbitMQ are not the same concept as transactions in most databases, and the two should not be confused.
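The control flow described above can be sketched roughly as follows. This is a minimal, self-contained sketch rather than real client code: TxChannel is a hypothetical stand-in interface that borrows the txSelect, txCommit and txRollback names, publish is a simplified placeholder for channel.basicPublish, and FakeTxChannel is an in-memory fake that exists only to exercise the flow.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the three transaction methods plus a simplified publish. */
interface TxChannel {
    void txSelect() throws Exception;
    void txCommit() throws Exception;
    void txRollback() throws Exception;
    void publish(String body) throws Exception;
}

final class TxPublisher {
    /** Returns true if the transaction committed, false if it was rolled back. */
    static boolean publishInTransaction(TxChannel channel, String body) {
        try {
            channel.txSelect();     // put the channel into transaction mode
            channel.publish(body);  // publish inside the transaction
            channel.txCommit();     // a successful commit means the broker has the message
            return true;
        } catch (Exception e) {
            try {
                channel.txRollback(); // undo the uncommitted publish
            } catch (Exception ignored) {
                // The channel may already be unusable; the caller can retry on a new one.
            }
            return false;           // the caller may now resend the message
        }
    }
}

/** In-memory fake used only to exercise the control flow above. */
final class FakeTxChannel implements TxChannel {
    final List<String> calls = new ArrayList<>();
    private final boolean failOnCommit;

    FakeTxChannel(boolean failOnCommit) {
        this.failOnCommit = failOnCommit;
    }

    public void txSelect() { calls.add("txSelect"); }

    public void publish(String body) { calls.add("publish:" + body); }

    public void txCommit() throws Exception {
        if (failOnCommit) {
            throw new Exception("simulated broker failure");
        }
        calls.add("txCommit");
    }

    public void txRollback() { calls.add("txRollback"); }
}
```

A caller would typically resend the message when publishInTransaction returns false, which is the redelivery step mentioned earlier in this phase.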

Transactions do solve the problem of confirming delivery between the message sender and RabbitMQ: the transaction commits successfully only if RabbitMQ has received the message. Otherwise, we can catch the exception, roll back the transaction, and resend the message. However, the transaction mechanism severely drains RabbitMQ's performance. Is there a better way for the sender to confirm that a message was delivered correctly, with almost no performance cost? At the AMQP protocol level there is not, but RabbitMQ provides an improved solution: the publisher confirm mechanism.

The producer puts the channel into confirm mode. Once the channel is in confirm mode, every message published on it is assigned a unique ID (starting from 1). Once the message has been delivered to all matching queues, RabbitMQ sends the producer an acknowledgment (Basic.Ack) containing that ID, which tells the producer that the message has arrived at its destination. The deliveryTag field of the acknowledgment carries the sequence number of the confirmed message. In addition, RabbitMQ can set the multiple parameter of the acknowledgment (the same parameter found in channel.basicAck) to indicate that all messages up to and including this sequence number have been handled.

The transaction mechanism blocks the sender after each message, waiting for a response from RabbitMQ before the next message can be sent. In contrast, the biggest advantage of the publisher confirm mechanism is that it is asynchronous. Once a message is published, the producer application can go on sending the next message while it waits for the channel to return the confirmation. When the message is finally confirmed, the producer application can handle the confirmation in a callback method. If RabbitMQ loses the message because of an internal error, it sends a nack (Basic.Nack) command, which the producer application can likewise handle in a callback method.

The producer puts the channel into confirm mode by calling channel.confirmSelect (that is, the Confirm.Select command), and RabbitMQ replies with Confirm.Select-Ok to indicate that it agrees to put the current channel into confirm mode. Every message sent afterwards is either acked or nacked exactly once; no message is both acked and nacked. RabbitMQ makes no guarantee about how quickly a message is confirmed.
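To make the sequence-number bookkeeping concrete, here is a minimal sketch of how a producer might track unconfirmed messages and react to acks and nacks, including the multiple flag. ConfirmTracker and its method names are hypothetical helpers invented for this illustration; it uses only the JDK and does not talk to a broker. In the Java client, the corresponding callbacks arrive through a ConfirmListener as handleAck(long deliveryTag, boolean multiple) and handleNack(long deliveryTag, boolean multiple).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

/** Hypothetical helper: remembers published messages until the broker confirms them. */
final class ConfirmTracker {
    private final ConcurrentNavigableMap<Long, String> outstanding = new ConcurrentSkipListMap<>();
    private final List<String> toResend = new ArrayList<>();
    private long nextSeqNo = 1; // confirm sequence numbers start at 1

    /** Record a message just before publishing it; returns its sequence number. */
    synchronized long track(String body) {
        long seqNo = nextSeqNo++;
        outstanding.put(seqNo, body);
        return seqNo;
    }

    /** Ack: forget one message, or every message up to and including seqNo if multiple is set. */
    void handleAck(long seqNo, boolean multiple) {
        if (multiple) {
            outstanding.headMap(seqNo, true).clear();
        } else {
            outstanding.remove(seqNo);
        }
    }

    /** Nack: move the same range of messages to a resend list. */
    synchronized void handleNack(long seqNo, boolean multiple) {
        if (multiple) {
            ConcurrentNavigableMap<Long, String> nacked = outstanding.headMap(seqNo, true);
            toResend.addAll(nacked.values());
            nacked.clear();
        } else {
            String body = outstanding.remove(seqNo);
            if (body != null) {
                toResend.add(body);
            }
        }
    }

    int outstandingCount() {
        return outstanding.size();
    }

    synchronized List<String> pendingResends() {
        return new ArrayList<>(toResend);
    }
}
```

A sorted map is used here so that a single ack with multiple set to true can clear a whole prefix of sequence numbers in one call.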

The transaction mechanism and the publisher confirm mechanism are mutually exclusive and cannot coexist. If you try to put a channel that is already in transaction mode into publisher confirm mode, RabbitMQ reports an error: {amqp_error, precondition_failed, "cannot switch from tx to confirm mode", 'confirm.select'}. If you try to put a channel that is already in publisher confirm mode into transaction mode, RabbitMQ also reports an error: {amqp_error, precondition_failed, "cannot switch from confirm to tx mode", 'tx.select' }.

The transaction mechanism and the publisher confirm mechanism ensure that a message is sent to RabbitMQ correctly. Here, "sent to RabbitMQ" means that the message correctly reaches the RabbitMQ exchange. If the exchange has no matching queue, the message is still lost. So when using either mechanism, make sure that the exchanges involved have matching queues. Going further, the sender should combine these mechanisms with the mandatory parameter or an alternate exchange to improve the reliability of message delivery.

Phase 2

mandatory and immediate are two parameters of the channel.basicPublish method. Both return a message to the producer when its destination cannot be reached during delivery. The alternate exchange (Alternate Exchange) provided by RabbitMQ can instead store messages that the exchange cannot route (no bound queue, or no matching binding) without returning them to the client.

RabbitMQ 3.0 removed support for the immediate parameter. The official explanation is that the immediate parameter affects the performance of mirrored queues and increases code complexity; using TTL and DLX instead is recommended. So this article only briefly covers the mandatory parameter and the alternate exchange.

When the mandatory parameter is set to true and the exchange cannot find a matching queue based on its type and the routing key, RabbitMQ sends a Basic.Return command to return the message to the producer. When the mandatory parameter is set to false, the message is simply discarded in that situation. So how does the producer get hold of messages that were not routed to a suitable queue? It can register a ReturnListener by calling channel.addReturnListener. The key code for using the mandatory parameter is as follows:

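// Assumes an open com.rabbitmq.client.Channel named channel; needs imports for
// com.rabbitmq.client.AMQP, MessageProperties, ReturnListener and java.io.IOException.
// Register the listener before publishing so that no Basic.Return is missed.
channel.addReturnListener(new ReturnListener() {
    @Override
    public void handleReturn(int replyCode, String replyText, String exchange, String routingKey,
            AMQP.BasicProperties basicProperties, byte[] body) throws IOException {
        String message = new String(body);
        System.out.println("The result returned by Basic.Return is: " + message);
    }
});
// mandatory = true: an unroutable message comes back through Basic.Return.
channel.basicPublish(EXCHANGE_NAME, "", true, MessageProperties.PERSISTENT_TEXT_PLAIN, "mandatory test".getBytes());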

In the above code, the message published by the producer cannot be routed to any queue, so RabbitMQ returns the message "mandatory test" through Basic.Return. The producer client picks up this event through its ReturnListener. The final output of the above code should be "The result returned by Basic.Return is: mandatory test".

The producer can resend the message returned to the ReturnListener, or handle it some other way, to improve reliability.

The alternate exchange (Alternate Exchange, AE for short) could more bluntly be called a "spare tire exchange". If the producer does not set the mandatory parameter when sending a message, the message is lost when it cannot be routed. If the mandatory parameter is set, ReturnListener logic has to be added and the producer code becomes more complicated. If you want neither to complicate the producer's logic nor to lose messages, you can use an alternate exchange, which stores unrouted messages in RabbitMQ so they can be processed when needed. It can be configured by adding the alternate-exchange argument when declaring the exchange (calling the channel.exchangeDeclare method), or through a policy. If both are used at the same time, the former takes priority and overrides the policy setting.

Referring to the figure below, if we send a message to normalExchange at this time, when the routing key is equal to "normalKey", the message can be correctly routed to the normalQueue queue. If the routing key is set to another value, such as "errorKey", that is, the message cannot be correctly routed to any queue bound to normalExchange, then it will be sent to myAe, and then sent to the unroutedQueue queue.
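The fallback behavior in this example can be illustrated with a small toy model. This is only a sketch of the routing decision, not how RabbitMQ implements exchanges: ToyExchange is a hypothetical class, normalExchange is assumed to behave like a direct exchange, and myAe is assumed to be a fanout exchange.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of an exchange with an optional alternate exchange (hypothetical class). */
final class ToyExchange {
    final String name;
    final boolean fanout;   // true: ignore the routing key; false: exact match (direct style)
    ToyExchange alternate;  // null when no alternate exchange is configured
    private final Map<String, List<String>> bindings = new LinkedHashMap<>();

    ToyExchange(String name, boolean fanout) {
        this.name = name;
        this.fanout = fanout;
    }

    void bind(String queue, String routingKey) {
        bindings.computeIfAbsent(routingKey, k -> new ArrayList<>()).add(queue);
    }

    /** Returns the queues that receive the message; an empty list means it was unroutable. */
    List<String> route(String routingKey) {
        List<String> targets = new ArrayList<>();
        if (fanout) {
            for (List<String> queues : bindings.values()) {
                targets.addAll(queues);
            }
        } else {
            targets.addAll(bindings.getOrDefault(routingKey, Collections.<String>emptyList()));
        }
        if (targets.isEmpty() && alternate != null) {
            // Unroutable here: hand the message to the alternate exchange, routing key unchanged.
            return alternate.route(routingKey);
        }
        return targets;
    }
}
```

In this model, binding normalQueue to normalExchange under "normalKey", binding unroutedQueue to myAe, and setting myAe as the alternate makes route("errorKey") fall through to the queues bound to myAe.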

In fact, an alternate exchange is not much different from an ordinary exchange. For convenience, the fanout type is recommended, although nothing prevents the reader from using the direct or topic type. Note that when a message is forwarded to the alternate exchange, it keeps the routing key it was published with. In essence, the alternate exchange is a "spare tire" for the original exchange: every message that the original exchange cannot route is sent to it. You can set the same AE for all exchanges, but you must make sure in advance that queues are correctly bound to the AE, and fanout is again the best type. If an alternate exchange is used together with the mandatory parameter, the mandatory parameter has no effect for any message that the alternate exchange manages to route.

Phase 3

The mandatory parameter or an AE gives a message strong reliability guarantees up to the point where it is routed to a queue. But how do we keep the message reliable once it is stored in the queue?

The first measure is persistence. Persistence improves reliability by preventing data loss under abnormal conditions (restart, shutdown, crash, and so on). A queue is made persistent by setting the durable parameter to true when declaring it. If the queue is not persistent, its metadata is lost when the RabbitMQ service restarts, and the messages in it are lost as well. As the saying goes, "with the skin gone, what can the hair attach to?" If the queue is gone, where can the messages live? Queue persistence ensures that the queue's own metadata is not lost under abnormal conditions, but it does not guarantee that the messages stored inside survive. To keep messages from being lost, the messages themselves must be made persistent. Message persistence is achieved by setting the delivery mode of the message (the deliveryMode property in BasicProperties) to 2.

With both queue persistence and message persistence set, messages still exist after the RabbitMQ service restarts. If only the queue is persistent, the messages are lost after a restart; if only the messages are persistent, the queue disappears after the restart and the messages are lost with it. Setting message persistence without queue persistence is pointless.

After a persistent message has been accepted by RabbitMQ, there is still a short (but not negligible) delay before it is written to disk. RabbitMQ does not perform a synchronous flush (calling the kernel's fsync) for every message, so a message may sit only in the operating system cache rather than on the physical disk. If the RabbitMQ node crashes or restarts during this window, before the messages have been flushed, those messages are lost.

If the transaction mechanism or the publisher confirm mechanism is used in Phase 1, the broker responds only after the message has been written to disk, which further improves reliability. Even so, message loss caused by an unrecoverable single-machine failure (such as disk damage) cannot be avoided. This is where mirrored queues come in. A mirrored queue amounts to configuring replicas, and most distributed systems use multiple replicas to ensure HA. With a mirrored queue, if the master node dies during this short window, the queue automatically fails over to a slave node, which effectively ensures high availability unless the entire cluster goes down. Although this still cannot completely guarantee that RabbitMQ messages are never lost (for example, if the data center is bombed...), a mirrored queue is far more reliable than an unmirrored one. In real production environments, key business queues are generally configured as mirrored queues.

Phase 4

Going one step further, from the consumer's point of view, if the consumer receives a message but fails to process it correctly, that also counts as data loss.

To ensure that messages reach consumers reliably from the queue, RabbitMQ provides a message acknowledgement mechanism. Consumers can specify the autoAck parameter when subscribing to a queue. When autoAck is false, RabbitMQ waits for the consumer to explicitly send an acknowledgement before removing the message from memory (or disk); in practice it first marks the message for deletion and deletes it later. When autoAck is true, RabbitMQ automatically treats a message as acknowledged as soon as it is sent and then deletes it from memory (or disk), regardless of whether the consumer actually consumed it.

With the acknowledgement mechanism in place, as long as autoAck is set to false, the consumer has enough time to process the message (task) and does not need to worry about the message being lost if the consumer process dies midway, because RabbitMQ holds the message until the consumer explicitly calls Basic.Ack.

When autoAck is set to false, the RabbitMQ server divides the messages in a queue into two parts: messages waiting to be delivered to a consumer, and messages that have been delivered to a consumer but not yet acknowledged. If RabbitMQ has not received the acknowledgement and the consumer that was handling the message disconnects, RabbitMQ requeues the message and waits to deliver it to the next consumer, which may of course be the original consumer again.

RabbitMQ does not set an expiration time for unacknowledged messages. The only basis for deciding whether a message needs to be redelivered is whether the connection of the consumer handling it has been closed. The reason for this design is that RabbitMQ allows a consumer to take a long time to process a single message.

If message consumption fails, the consumer can also call Basic.Reject or Basic.Nack to reject the current message instead of acknowledging it. A plain rejection loses the message; to avoid this, the requeue parameter must be set to true, in which case RabbitMQ puts the message back into the queue so that it can be sent to the next subscribed consumer. If the requeue parameter is set to false, RabbitMQ immediately removes the message from the queue without sending it to another consumer.
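The broker-side bookkeeping described in this phase (ready messages, unacknowledged messages, requeue on disconnect, and reject with or without requeue) can be modeled with a short toy class. ToyQueue is hypothetical and only illustrates the state transitions for a single consumer with autoAck set to false; it is not RabbitMQ's implementation. It places requeued messages at the head of the queue, and the removed list stands in for messages that are discarded or, if a dead letter exchange is configured, dead-lettered.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a queue's ready and unacknowledged messages (hypothetical class). */
final class ToyQueue {
    private final Deque<String> ready = new ArrayDeque<>();
    private final Map<Long, String> unacked = new LinkedHashMap<>(); // delivery tag -> body
    private final List<String> removed = new ArrayList<>();          // rejected without requeue
    private long nextTag = 1;

    void enqueue(String body) {
        ready.addLast(body);
    }

    /** Delivers the head message with autoAck=false; returns its delivery tag, or -1 if empty. */
    long deliver() {
        String body = ready.pollFirst();
        if (body == null) {
            return -1;
        }
        unacked.put(nextTag, body);
        return nextTag++;
    }

    /** Basic.Ack: the message can finally be deleted. */
    void ack(long tag) {
        unacked.remove(tag);
    }

    /** Basic.Reject: requeue=true puts the message back, requeue=false removes it. */
    void reject(long tag, boolean requeue) {
        String body = unacked.remove(tag);
        if (body == null) {
            return;
        }
        if (requeue) {
            ready.addFirst(body);
        } else {
            removed.add(body);
        }
    }

    /** Consumer connection lost: every unacknowledged message becomes ready again, in order. */
    void consumerDisconnected() {
        List<String> pending = new ArrayList<>(unacked.values());
        unacked.clear();
        for (int i = pending.size() - 1; i >= 0; i--) {
            ready.addFirst(pending.get(i));
        }
    }

    int readyCount() { return ready.size(); }

    int unackedCount() { return unacked.size(); }

    List<String> removedMessages() { return new ArrayList<>(removed); }
}
```

The model makes the key point visible: an unacknowledged message is never deleted on its own; it leaves the unacked set only through an ack, a reject, or a consumer disconnect.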

There is another situation to consider: a requeued message is placed at the head of the queue, so it is quickly sent to a consumer again. If the consumer still cannot process it and requeues it once more, the result is an endless loop. In this case, my suggestion is not to rely on requeue for reliability when a message cannot be consumed correctly, but to redirect it to a new queue, such as a configured dead letter queue. This avoids the infinite loop described above and ensures that the message is not lost. The messages in the dead letter queue can then be consumed and analyzed separately to find the root cause of the problem.
