How do you ensure reliable message delivery? (How do you deal with lost messages?)

Copyright notice: please credit the source when reproducing; if anything here infringes, message me and I will delete it immediately. https://blog.csdn.net/baidu_26954625/article/details/90647771

The content of this series is reproduced from the git project advancejava.

Interview question analysis

Data loss can occur at the producer, in the MQ itself, or at the consumer. Let's analyze RabbitMQ and Kafka in turn.

RabbitMQ


The producer loses data

When the producer sends data to RabbitMQ, the data can get lost in transit, for example because of network problems.
One option is to use the transaction feature RabbitMQ provides: the producer opens a transaction before sending (channel.txSelect), then sends the message. If RabbitMQ did not receive the message, the producer gets an exception, can roll back the transaction (channel.txRollback) and retry sending; if RabbitMQ did receive the message, the producer can commit the transaction (channel.txCommit).

// open a transaction (exchange, routingKey and payload are the producer's own values)
channel.txSelect();
try {
    // publish the message here
    channel.basicPublish(exchange, routingKey, null, payload);
    // commit the transaction
    channel.txCommit();
} catch (Exception e) {
    // roll back the transaction, then resend the message
    channel.txRollback();
}

The problem is that RabbitMQ's transaction mechanism is synchronous: after you commit a transaction, you block until it completes, so in practice throughput drops sharply because it costs too much performance.
So in general, if you want to make sure messages written to RabbitMQ are not lost, you enable confirm mode instead. Once the producer enables confirm mode, every message you write is assigned a unique id. If the message was written to RabbitMQ successfully, RabbitMQ sends you back an ack telling you the message arrived. If RabbitMQ failed to handle the message, it calls back one of your nack interfaces, telling you reception failed so you can retry. On top of this mechanism you can keep the state of each message id in memory yourself: if you have not received a callback for a message after a certain time, you can resend it.
The biggest difference between the transaction mechanism and the confirm mechanism is that the transaction mechanism is synchronous (you block while committing a transaction), while the confirm mechanism is asynchronous: after sending a message you can immediately send the next one, and once RabbitMQ has received a message it asynchronously calls back one of your interfaces to notify you it was received.
So on the producer side, the confirm mechanism is generally used to avoid losing data.
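As a rough illustration, here is a minimal sketch of confirm mode using the RabbitMQ Java client. The queue name and payload are made up for this example, and channel is assumed to be an open com.rabbitmq.client.Channel; a real producer would also track unconfirmed ids (e.g. in a map keyed by sequence number) and resend on nack or timeout.

// switch the channel to confirm mode
channel.confirmSelect();
channel.addConfirmListener(new ConfirmListener() {
    @Override
    public void handleAck(long deliveryTag, boolean multiple) {
        // RabbitMQ received the message(s) up to deliveryTag; remove them from your pending set
    }
    @Override
    public void handleNack(long deliveryTag, boolean multiple) {
        // RabbitMQ failed to handle the message(s); look them up in your pending set and resend
    }
});
// the sequence number is the unique id RabbitMQ assigns to the next published message
long seqNo = channel.getNextPublishSeqNo();
channel.basicPublish("", "my_queue", null, "hello".getBytes());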

RabbitMQ loses data

To prevent RabbitMQ itself from losing data, you must enable RabbitMQ's persistence: messages are persisted to disk after being written, so even if RabbitMQ itself goes down, it automatically reads the previously stored data back after recovery, and data is generally not lost. The exception is the rare case where a message has not yet been persisted when RabbitMQ dies, which can lose a small amount of data, but the probability is small.
There are two steps to enable persistence (see the sketch after this list):
• First, set the queue as durable when you create it. This ensures RabbitMQ persists the queue's metadata, but it does not persist the data in the queue.
• Second, set the message's deliveryMode to 2 when sending it. This marks the message itself as persistent, so RabbitMQ will persist the message to disk.
Both settings are required. With both in place, even if RabbitMQ goes down, it will recover the queue, and the data in it, from disk when it restarts.
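A minimal sketch of both steps with the RabbitMQ Java client (the queue name is made up for this example; channel is assumed to be an open com.rabbitmq.client.Channel, and MessageProperties comes from com.rabbitmq.client):

// durable = true: RabbitMQ persists the queue metadata
channel.queueDeclare("my_queue", true, false, false, null);
// PERSISTENT_TEXT_PLAIN sets deliveryMode = 2, marking the message itself persistent
channel.basicPublish("", "my_queue", MessageProperties.PERSISTENT_TEXT_PLAIN, "hello".getBytes());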
Note that even with persistence enabled, there is still a window in which a message has been written to RabbitMQ but not yet persisted to disk; if RabbitMQ happens to die at that moment, that small amount of in-memory data is lost.
Therefore persistence can be combined with the producer-side confirm mechanism: RabbitMQ sends the ack only after the message has been persisted to disk. If RabbitMQ dies before persisting and the data is lost, the producer never receives the ack and can resend the message itself.

The consumer loses data

With RabbitMQ, losing data on the consumer side mainly happens when you have just consumed a message but have not finished processing it when the process dies (e.g., a restart). RabbitMQ then thinks you have consumed the message, and the data is lost.
The answer is RabbitMQ's ack mechanism. In short, you must turn off RabbitMQ's automatic ack (it can be set via an API call) and explicitly ack in your code each time your program has actually finished processing a message. Then, if you die before finishing, there is no ack; RabbitMQ considers the message not yet processed and will redeliver it to another consumer, so the message is not lost.
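A minimal sketch of manual ack with the RabbitMQ Java client (the queue name and the process step are made up for this example; channel is assumed to be an open com.rabbitmq.client.Channel, and DefaultConsumer, Envelope and AMQP come from com.rabbitmq.client):

boolean autoAck = false; // turn off automatic ack
channel.basicConsume("my_queue", autoAck, new DefaultConsumer(channel) {
    @Override
    public void handleDelivery(String consumerTag, Envelope envelope,
                               AMQP.BasicProperties properties, byte[] body) throws IOException {
        process(body); // hypothetical business logic; if it throws, no ack is sent
        // ack only after processing succeeds, so a crash triggers redelivery
        channel.basicAck(envelope.getDeliveryTag(), false);
    }
});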


Kafka

The consumer loses data

The only situation in which the consumer can lose data is this: you consume a message, the consumer automatically commits the offset, so Kafka thinks you have already processed it, but in fact you were only about to process it, and your process dies before doing so. That message is lost.
This is almost the same as with RabbitMQ. Kafka auto-commits offsets by default, so the fix is to turn off auto-commit and commit the offset manually after processing; that guarantees the data is not lost. You may still get duplicate consumption: for example, you finish processing but die before committing the offset, so the message is consumed again after a restart. As long as you guarantee idempotency yourself, that is fine.
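A minimal sketch with the Kafka Java consumer (the bootstrap server, topic, group id, and the process step are made up for this example):

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "my-group");
props.put("enable.auto.commit", "false"); // turn off auto-commit
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Collections.singletonList("my-topic"));
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, String> record : records) {
        process(record); // hypothetical business logic
    }
    // commit only after the whole batch is processed; a crash before this replays the batch
    consumer.commitSync();
}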
One problem we hit in production: after our Kafka consumer consumed data, it first wrote it into an in-memory buffer queue. Sometimes a message had only just been written to the memory queue when the consumer auto-committed its offset. If we then restarted the system at that point, the data still sitting unprocessed in the memory queue was lost.

Kafka loses data

A fairly common scenario: a Kafka broker goes down, and a partition leader is re-elected. Think about it: if some followers have not yet finished syncing some data, the leader dies at that moment, and one of those followers is elected as the new leader, isn't some data missing? Some data is lost.
We ran into this in production too: a Kafka leader's machine went down, a follower switched to leader, and some data was indeed lost.
In that case, it is generally required to set at least the following four parameters (see the config sketch after this list):
• Set the replication.factor parameter on the topic: this value must be greater than 1, requiring each partition to have at least 2 replicas.
• Set the min.insync.replicas parameter on the Kafka server: this value must be greater than 1, requiring the leader to perceive at least one follower that is still keeping in sync with it and not lagging behind, so there is still a follower available when the leader dies.
• Set acks=all on the producer: this requires that each write is considered successful only after it has been written to all (in-sync) replicas.
• Set retries=MAX on the producer (a very large value, meaning unlimited retries): this requires that once a write fails, the producer retries indefinitely, blocking there.
Our production environment is configured according to these requirements. With this configuration, at least on the Kafka broker side, it is guaranteed that when the broker hosting the leader fails and the leader switches, no data is lost.
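A sketch of what this configuration might look like (the broker address, topic name, and replica counts are made up for this example; replication.factor is set when creating the topic, min.insync.replicas on the broker or topic, and acks/retries on the producer):

// topic: create it with replication.factor > 1, e.g. via the kafka-topics CLI:
//   kafka-topics.sh --create --topic my-topic --partitions 3 --replication-factor 3
// broker (server.properties) or per-topic config: min.insync.replicas=2

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("acks", "all");                // success only after all in-sync replicas have the write
props.put("retries", Integer.MAX_VALUE); // retry (effectively) forever on failure

KafkaProducer<String, String> producer = new KafkaProducer<>(props);
producer.send(new ProducerRecord<>("my-topic", "key", "value"));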

Will the producer lose data?

If you set acks=all as described above, the producer will not lose data. The requirement is that a write is considered successful only after the leader has received the message and all (in-sync) followers have synced it. If that condition is not met, the producer automatically keeps retrying, indefinitely.
