What to do about message loss? Causes of message loss in RabbitMQ and Kafka, and how to fix them

If you use an MQ to carry business-critical messages, such as billing or charge-deduction messages, you must make sure that none of them are lost in transit through the MQ.

First, to be clear: data can be lost in three places: the producer, the MQ itself, and the consumer. Let's analyze RabbitMQ and Kafka in turn.

1、RabbitMQ

1.1 The producer loses data

When a producer sends data to RabbitMQ, the data may be lost in transit, for example because of a network problem.

One option is RabbitMQ's transaction support. Before sending, the producer opens a transaction with channel.txSelect and then sends the message. If RabbitMQ does not receive the message successfully, the producer gets an exception; it can then roll back with channel.txRollback and retry the send. If RabbitMQ does receive the message, the producer commits with channel.txCommit.

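// Enable transaction mode on the channel
channel.txSelect();
try {
    // Publish the message here

    // Commit the transaction once the publish has succeeded
    channel.txCommit();
} catch (Exception e) {
    // Roll back the failed transaction
    channel.txRollback();

    // Resend the message here
}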

The problem is that RabbitMQ transactions are synchronous, so enabling them sharply reduces throughput; the performance cost is too high.

So, in general, if you want to be sure that messages written to RabbitMQ are not lost, enable confirm mode instead.

Once the producer has enabled confirm mode, every message it publishes is assigned a unique id. If the message is written to RabbitMQ successfully, RabbitMQ sends back an ack telling you the message is OK. If RabbitMQ fails to process the message, it invokes your nack callback to tell you the message failed, and you can retry. You can also build on this mechanism by keeping the state of each message id in memory: if no callback has arrived for a message after a certain time, you resend it.
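The in-memory bookkeeping described above can be sketched roughly as follows. This is a minimal illustration that uses only the Java standard library; the class PendingConfirms and its method names are hypothetical. In a real producer you would enable confirms with channel.confirmSelect(), register ack and nack callbacks on the channel, and call into something like this from those callbacks. The timeout-based resend is left out to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical helper: tracks messages that were published but not yet confirmed.
class PendingConfirms {
    // Publish sequence number -> message body still waiting for a confirm.
    private final ConcurrentNavigableMap<Long, String> pending = new ConcurrentSkipListMap<>();

    // Record a message right before it is published.
    void track(long seqNo, String body) {
        pending.put(seqNo, body);
    }

    // Broker acked: forget this message, or every message up to seqNo when multiple is true.
    void onAck(long seqNo, boolean multiple) {
        if (multiple) {
            pending.headMap(seqNo, true).clear();
        } else {
            pending.remove(seqNo);
        }
    }

    // Broker nacked: remove the affected messages and return them so the caller can resend.
    List<String> onNack(long seqNo, boolean multiple) {
        List<String> toResend = new ArrayList<>();
        if (multiple) {
            ConcurrentNavigableMap<Long, String> affected = pending.headMap(seqNo, true);
            toResend.addAll(affected.values());
            affected.clear();
        } else {
            String body = pending.remove(seqNo);
            if (body != null) {
                toResend.add(body);
            }
        }
        return toResend;
    }

    // Number of messages still waiting for an ack or nack.
    int outstanding() {
        return pending.size();
    }
}
```

A sorted map is used here because a single confirm may cover every message up to a given sequence number, which makes "everything at or below this key" the natural operation.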

The biggest difference between the transaction mechanism and the confirm mechanism:

  • The transaction mechanism is synchronous: after you commit a transaction, you block until it completes;
  • The confirm mechanism is asynchronous: after sending one message you can immediately send the next, and RabbitMQ later invokes your callback asynchronously to tell you that a message has been received.

So to avoid data loss on the producer side, the confirm mechanism is what is generally used.

1.2 RabbitMQ loses data

To guard against RabbitMQ itself losing data, you must enable RabbitMQ persistence, so that messages are persisted to disk after they are written. Even if RabbitMQ goes down, it reads the stored data back automatically when it recovers, so data is generally not lost. The exception is the rare case where RabbitMQ goes down before a message has been persisted, which can lose a small amount of data, but the probability is low.

Enabling persistence takes two steps:

  • When creating the queue, declare it as durable;

    This makes RabbitMQ persist the queue's metadata, but not the messages in the queue.
  • When sending a message, set its deliveryMode to 2;

    This marks the message as persistent, and RabbitMQ will then persist it to disk.

Both must be set for persistence to work. Then, even if RabbitMQ goes down, it restores the queue and the messages in it from disk when it restarts.
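To make the "both settings are required" rule concrete, here is a toy model of what survives a broker restart. It is not RabbitMQ code and does not talk to a broker; the class DurabilityModel and all of its methods are hypothetical and exist only to illustrate the rule stated above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: mimics which queues and messages survive a restart.
class DurabilityModel {
    static final int TRANSIENT = 1;   // deliveryMode 1: not persisted
    static final int PERSISTENT = 2;  // deliveryMode 2: persisted to disk

    private static final class Stored {
        final String body;
        final int deliveryMode;

        Stored(String body, int deliveryMode) {
            this.body = body;
            this.deliveryMode = deliveryMode;
        }
    }

    private final Map<String, Boolean> durable = new HashMap<>();
    private final Map<String, List<Stored>> messages = new HashMap<>();

    // Step 1: declare the queue, choosing whether it is durable.
    void declareQueue(String name, boolean isDurable) {
        durable.put(name, isDurable);
        messages.putIfAbsent(name, new ArrayList<>());
    }

    // Step 2: publish to a previously declared queue with a delivery mode.
    void publish(String queue, String body, int deliveryMode) {
        messages.get(queue).add(new Stored(body, deliveryMode));
    }

    // Simulated restart: non-durable queues vanish, and durable queues
    // keep only the messages that were marked persistent.
    void restart() {
        durable.values().removeIf(isDurable -> !isDurable);
        messages.keySet().retainAll(durable.keySet());
        for (List<Stored> queue : messages.values()) {
            queue.removeIf(m -> m.deliveryMode != PERSISTENT);
        }
    }

    // Bodies currently held in the queue (empty if the queue no longer exists).
    List<String> contents(String queue) {
        List<String> out = new ArrayList<>();
        for (Stored m : messages.getOrDefault(queue, new ArrayList<>())) {
            out.add(m.body);
        }
        return out;
    }
}
```

In the model, a persistent message in a non-durable queue and a transient message in a durable queue are both gone after restart(); only the combination of the two settings keeps the message.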

Note:

Even with persistence enabled, there is still a window: a message has been written to RabbitMQ but not yet persisted to disk when RabbitMQ goes down, so the small amount of data still in memory is lost.

For this reason, combine persistence with the producer-side confirm mechanism. RabbitMQ sends the ack to the producer only after the message has been persisted to disk, so if RabbitMQ goes down before persisting and the data is lost, the producer never receives the ack and can resend the message itself.

1.3 The consumer loses data

On the consumer side, data is lost mainly when you have just received a message and your process dies, for example on a restart, before you have processed it. Awkwardly, RabbitMQ believes you have consumed the message, so the data is lost.

To handle this, use RabbitMQ's ack mechanism. Put simply, turn off RabbitMQ's automatic ack (this can be done through an API call) and have your own code send the ack only after it has finished processing each message. Then, if you have not finished processing, no ack is sent; RabbitMQ concludes that the message has not been processed and hands it to another consumer, so the message is not lost.
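The redelivery behaviour described above can be pictured with a small toy model. This is not the RabbitMQ client API; ManualAckModel and its methods are hypothetical stand-ins for the broker's side of the conversation. With the real Java client, the equivalent steps would be consuming with automatic ack turned off and calling basicAck after processing.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Toy model only: a queue that keeps delivered messages until they are acked.
class ManualAckModel {
    private final Deque<String> ready = new ArrayDeque<>();
    private final Map<Long, String> unacked = new HashMap<>();
    private long nextTag = 1;

    void enqueue(String body) {
        ready.addLast(body);
    }

    // Hand the next message to a consumer; returns its delivery tag, or -1 if the queue is empty.
    long deliver() {
        String body = ready.pollFirst();
        if (body == null) {
            return -1;
        }
        long tag = nextTag++;
        unacked.put(tag, body);
        return tag;
    }

    // Consumer finished processing: the message can now be discarded for good.
    void ack(long tag) {
        unacked.remove(tag);
    }

    // Consumer died before acking: everything it still held goes back to the queue.
    // Requeue order is not modelled here.
    void consumerDied() {
        ready.addAll(unacked.values());
        unacked.clear();
    }

    int readyCount() {
        return ready.size();
    }
}
```

A message that was delivered but never acked reappears in the queue after consumerDied(), while an acked message does not.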

2、Kafka

2.1 The consumer loses data

The only situation in which a consumer can lose data is this: you have received a message and the consumer has automatically committed the offset, so Kafka believes you have consumed the message, but in fact you were only about to process it and went down before doing so. That message is then lost.

This is much like RabbitMQ. Kafka commits offsets automatically by default, so turn off automatic offset commits and commit the offset manually after processing; that ensures data is not lost. Duplicate consumption is still possible, though: if you have just finished processing and go down before committing the offset, the message will be consumed once more. Make your processing idempotent and this is not a problem.
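One common way to get idempotent processing is to remember the ids of messages that have already been applied and skip any repeat. The sketch below assumes each message carries a unique id and keeps the seen ids in memory; both are assumptions made for illustration, and the class IdempotentHandler is hypothetical. A real system would more likely rely on a database unique key or similar durable record.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical handler: applies each charge at most once, keyed by message id.
class IdempotentHandler {
    private final Set<String> processedIds = new HashSet<>();
    private long totalCharged = 0;

    // Returns true if the charge was applied, false if the message was a duplicate.
    boolean handle(String messageId, long amount) {
        // Set.add returns false when the id was already present.
        if (!processedIds.add(messageId)) {
            return false;
        }
        totalCharged += amount;
        return true;
    }

    long totalCharged() {
        return totalCharged;
    }
}
```

With this in place, a message that is redelivered after a crash between processing and the offset commit changes nothing the second time.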

We hit one such problem in production: our Kafka consumer first wrote consumed data into an in-memory queue as a buffer. Sometimes a message had only just been written to the in-memory queue when the consumer automatically committed the offset. If we restarted the system at that moment, the data in the in-memory queue that had not yet been processed was lost.
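One way to avoid this, sketched here under the assumption that automatic commits are turned off, is to commit only offsets whose records have actually been processed by the workers draining the in-memory queue. The hypothetical OffsetTracker below handles a single partition: workers report finished offsets in any order, and the tracker exposes the highest offset that is safe to commit. It follows Kafka's convention that the committed offset is the offset of the next record to read.

```java
import java.util.TreeSet;

// Hypothetical single-partition tracker for "commit only what has been processed".
class OffsetTracker {
    // Offsets that finished processing but sit above a gap of unfinished ones.
    private final TreeSet<Long> processed = new TreeSet<>();
    // First offset that has not been fully processed yet.
    private long nextToCommit;

    OffsetTracker(long firstOffset) {
        this.nextToCommit = firstOffset;
    }

    // Called by a worker once the record at this offset is fully processed.
    synchronized void markProcessed(long offset) {
        if (offset < nextToCommit) {
            return; // already covered by an earlier advance
        }
        processed.add(offset);
        // Advance over every contiguous processed offset.
        while (processed.remove(nextToCommit)) {
            nextToCommit++;
        }
    }

    // The value to pass to a manual commit: everything below it has been processed.
    synchronized long committableOffset() {
        return nextToCommit;
    }
}
```

If the process restarts, consumption resumes from the committed offset, so records that were buffered but not processed are read again instead of being lost.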

2.2 Kafka loses data

A fairly common scenario: a Kafka broker goes down and a new leader is elected for its partitions. If some followers had not yet synchronized all the data when the leader went down, and one of those followers is elected leader, some data is missing. That data is lost.

We have hit this in production too: the machine hosting a Kafka leader went down, a follower was switched to leader, and we found that data had been lost.

To prevent this, at a minimum set the following four parameters:

  • Set replication.factor on the topic: the value must be greater than 1, so that every partition has at least two replicas.
  • Set min.insync.replicas on the Kafka server: the value must be greater than 1, so that the leader requires at least one follower to be in sync with it and not lagging behind. That way a follower still has the data if the leader goes down.
  • Set acks=all on the producer: every message must be written to all in-sync replicas before the write is considered successful.
  • Set retries=MAX on the producer (a very large value, meaning retry indefinitely): once a write fails, the producer keeps retrying and blocks there.
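As a rough sketch, the two producer-side settings can be expressed as a plain java.util.Properties object, with a small sanity check for the two replication values. The class ReliableKafkaConfig and its methods are hypothetical, connection and serializer settings are omitted, and the extra condition that min.insync.replicas must not exceed the replication factor is an added assumption: otherwise a write with acks=all could never succeed.

```java
import java.util.Properties;

// Hypothetical helper collecting the settings listed above.
class ReliableKafkaConfig {
    // Producer-side settings: wait for all in-sync replicas and retry (almost) forever.
    static Properties producerProps() {
        Properties props = new Properties();
        props.setProperty("acks", "all");
        props.setProperty("retries", String.valueOf(Integer.MAX_VALUE));
        return props;
    }

    // Checks the two replication values against the rules in the list above.
    static boolean replicationIsSafe(int replicationFactor, int minInsyncReplicas) {
        return replicationFactor > 1
                && minInsyncReplicas > 1
                // Added assumption: more required in-sync replicas than replicas can never be met.
                && minInsyncReplicas <= replicationFactor;
    }
}
```

For example, replicationIsSafe(3, 2) holds, while replicationIsSafe(3, 1) and replicationIsSafe(1, 1) do not.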

Our production environment is configured this way. With this configuration, at least on the Kafka broker side, data is not lost when the broker hosting a leader fails and leadership switches.

2.3 Can the producer lose data?

If you set acks=all as described above, the producer will not lose data.

The requirement is that a write counts as successful only after the leader has received the message and all followers have synchronized it.

If this condition is not met, the producer automatically keeps retrying, without limit.

 

Related: "How do you clear a backlog of several million messages that has sat in a message queue for several hours?": https://blog.csdn.net/weixin_44259720/article/details/104845731


Original article: https://jsbintask.cn/2019/01/28/interview/interview-middleware-reliable/

 


Origin blog.csdn.net/weixin_44259720/article/details/104844231