Java Spring Interview Q&A Series: What if MQ data is lost? How do you ensure reliable message delivery?

1 Interview question

How do you ensure reliable message delivery? (That is, how do you deal with message loss?)

2 What this question tests

You will definitely be asked this. A basic principle of using MQ is that data can be neither duplicated nor lost. "No duplicates" is the repeated-consumption and idempotence problem discussed earlier; "no loss" means no message may simply disappear. That second half is what you have to think through here.

This matters most when MQ carries truly core messages, such as billing and deduction messages. I previously designed and developed the billing system of a company's core advertising platform. Billing is heavyweight business logic and very time-consuming, so in the overall architecture of the advertising system, billing was made asynchronous, with an MQ in the middle.

At the time, we spent a lot of effort ensuring that billing messages were never lost while passing through the MQ. An advertiser runs an ad on the understanding that each user click costs 1 yuan. If a user clicks and the deduction message is lost along the way, the company quietly loses a few yuan each time, and that adds up to a serious loss.

3 Detailed answer

With MQ, data loss generally happens in one of three places: the producer loses the message on the way in, the MQ itself loses it, or the consumer loses it after taking it.

Let's analyze RabbitMQ and Kafka separately.

RabbitMQ usually carries a company's core business, where data absolutely must not be lost.

3.1 RabbitMQ

3.1.1 The producer loses data

When the producer sends data to RabbitMQ, the data may be lost in transit, for example because of network problems.

One option is to use the transaction feature RabbitMQ provides: the producer opens a RabbitMQ transaction (channel.txSelect) before sending, then sends the message.

  • If RabbitMQ did not receive the message successfully, the producer gets an exception; at that point it can roll back the transaction (channel.txRollback) and try sending the message again;
  • If the message was received, the producer can commit the transaction (channel.txCommit).

The problem is that once you use RabbitMQ's transaction mechanism, throughput basically collapses, because it consumes too much performance.
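Concretely, a minimal sketch of the transactional publish just described, using the RabbitMQ Java client (the host and the billing-queue queue name are placeholders, and the queue is assumed to already exist):

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

public class TxProducer {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost"); // assumed broker address
        try (Connection conn = factory.newConnection();
             Channel channel = conn.createChannel()) {
            channel.txSelect(); // open the transaction before publishing
            try {
                channel.basicPublish("", "billing-queue", null,
                        "deduct 1 yuan".getBytes("UTF-8"));
                channel.txCommit(); // broker received the message: commit
            } catch (Exception e) {
                channel.txRollback(); // broker did not get it: roll back
                // ... retry sending the message here
            }
        }
    }
}
```

Note that txCommit blocks until the broker answers, which is exactly the throughput problem mentioned above.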

So in general, if you want to make sure messages written to RabbitMQ are not lost, turn on confirm mode instead. With confirm mode enabled on the producer, every message you write is assigned a unique id. If the message makes it into RabbitMQ, RabbitMQ sends back an ack telling you the message arrived. If RabbitMQ failed to handle the message, it calls back a nack telling you the message was not received, and you can retry. On top of this mechanism you can keep the status of each message id in memory yourself: if you haven't received a callback for a message after some timeout, you resend it.

The biggest difference between the transaction mechanism and the confirm mechanism is that the transaction mechanism is synchronous: after you commit a transaction, you block there until it completes. The confirm mechanism is asynchronous: after you send a message you can immediately send the next one, and once RabbitMQ has received a message it calls an interface back asynchronously to notify you.

Therefore, on the producer side, the confirm mechanism is generally used to avoid data loss.
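A sketch of confirm mode with the RabbitMQ Java client, including the in-memory tracking of outstanding messages described above (host and queue name are again placeholders, and the resend logic is left out):

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.ConfirmListener;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

import java.nio.charset.StandardCharsets;
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

public class ConfirmProducer {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost"); // assumed broker address
        Connection conn = factory.newConnection();
        Channel channel = conn.createChannel();

        channel.confirmSelect(); // switch the channel into confirm mode

        // Track outstanding messages by publish sequence number so that
        // nacked (or timed-out) ones can be resent.
        ConcurrentNavigableMap<Long, String> outstanding = new ConcurrentSkipListMap<>();

        channel.addConfirmListener(new ConfirmListener() {
            @Override
            public void handleAck(long deliveryTag, boolean multiple) {
                // broker safely received the message(s): forget them
                if (multiple) {
                    outstanding.headMap(deliveryTag, true).clear();
                } else {
                    outstanding.remove(deliveryTag);
                }
            }

            @Override
            public void handleNack(long deliveryTag, boolean multiple) {
                // broker failed to handle the message(s): look them up in
                // `outstanding` and resend (resend logic omitted here)
            }
        });

        String msg = "deduct 1 yuan";
        outstanding.put(channel.getNextPublishSeqNo(), msg);
        channel.basicPublish("", "billing-queue", null,
                msg.getBytes(StandardCharsets.UTF_8));
        // no blocking here: keep publishing, confirms arrive asynchronously
    }
}
```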

3.1.2 RabbitMQ loses data

This is the case where RabbitMQ itself loses the data. To guard against it, you must enable RabbitMQ's persistence: once a message is written, it is persisted to disk, and even if RabbitMQ dies, it automatically reads back the stored data after recovering, so data is generally not lost. The exception is the rare window where RabbitMQ dies before it has persisted a message; that can lose a small amount of data, but the probability is small.

Setting up persistence takes two steps. First, declare the queue as durable when creating it; this makes RabbitMQ persist the queue's metadata, but not the messages in it. Second, set the message's deliveryMode to 2 when sending it; this marks the message itself as persistent, so RabbitMQ will write it to disk. Both settings must be applied together: only then, even if RabbitMQ dies, will it restore the queue and the messages in it from disk after a restart.
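Both steps together look roughly like this with the Java client (durable = true in queueDeclare persists the queue metadata; MessageProperties.PERSISTENT_TEXT_PLAIN sets deliveryMode to 2 for the message; host and queue name are placeholders):

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.MessageProperties;

public class DurableProducer {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost"); // assumed broker address
        try (Connection conn = factory.newConnection();
             Channel channel = conn.createChannel()) {
            // Step 1: durable = true, so RabbitMQ persists the queue metadata
            channel.queueDeclare("billing-queue", true, false, false, null);
            // Step 2: PERSISTENT_TEXT_PLAIN sets deliveryMode = 2,
            // so the message itself is written to disk
            channel.basicPublish("", "billing-queue",
                    MessageProperties.PERSISTENT_TEXT_PLAIN,
                    "deduct 1 yuan".getBytes("UTF-8"));
        }
    }
}
```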

Persistence can be combined with the confirm mechanism on the producer side: the ack is only sent to the producer after the message has been persisted to disk. So even if RabbitMQ dies before persisting, the producer never receives the ack and can resend the message itself.

To repeat the caveat: even with RabbitMQ's persistence mechanism turned on, there is still the possibility that a message has been written to RabbitMQ but not yet persisted to disk when, unluckily, RabbitMQ dies, and that little bit of in-memory data is lost.

3.1.3 The consumer loses data

The main way the consumer loses data with RabbitMQ: you have just consumed a message and not yet processed it when your process dies, for example during a restart. RabbitMQ then thinks you have consumed the message, and the data is lost.

To handle this, use the ack mechanism RabbitMQ provides. Simply put, turn off RabbitMQ's automatic ack (this is an option on the consume API), and have your program send the ack explicitly only after your own code has definitely finished processing the message.

Then, if you haven't finished processing, no ack is sent. RabbitMQ considers the message not yet fully consumed, and if your consumer dies, RabbitMQ redelivers the message to another consumer, so it is not lost.
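A minimal manual-ack consumer sketch with the RabbitMQ Java client (queue name is a placeholder; the point is autoAck = false plus an explicit basicAck after processing):

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.DeliverCallback;

public class ManualAckConsumer {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost"); // assumed broker address
        Connection conn = factory.newConnection();
        Channel channel = conn.createChannel();

        DeliverCallback callback = (consumerTag, delivery) -> {
            String body = new String(delivery.getBody(), "UTF-8");
            // ... process the message (e.g. perform the deduction) ...
            // ack only after processing has really finished
            channel.basicAck(delivery.getEnvelope().getDeliveryTag(), false);
        };

        // autoAck = false: RabbitMQ keeps the message until we ack it;
        // if this process dies first, the message is redelivered elsewhere
        channel.basicConsume("billing-queue", false, callback, consumerTag -> { });
    }
}
```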

3.2 Kafka

3.2.1 Data loss on the consumer side

The only situation that can lose data on the consumer side: you pull the message, the consumer automatically commits the offset, and Kafka now believes you have consumed it; but in fact you were just about to process it and hadn't yet, and then your process dies. That message is lost.

Sound familiar? This is just like RabbitMQ. Kafka commits offsets automatically by default, so simply turn off automatic offset commits and commit the offset manually after processing; then data will not be lost. You can indeed still get repeated consumption: for example, you finish processing but die before committing the offset, and the message is consumed again after restart. Just guarantee idempotence yourself.
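A sketch of this with the plain Kafka Java consumer, assuming a hypothetical billing-topic and consumer group (the key settings are enable.auto.commit=false and the commitSync() after processing):

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class ManualCommitConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("group.id", "billing-group");           // hypothetical group
        props.put("enable.auto.commit", "false");         // the key setting
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("billing-topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
                for (ConsumerRecord<String, String> record : records) {
                    // ... process the message idempotently ...
                }
                // commit only after processing; a crash before this line
                // means re-consumption (duplicates), never loss
                consumer.commitSync();
            }
        }
    }
}
```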

One problem we hit in production: our Kafka consumer read data and first wrote it into an in-memory queue to buffer it. Sometimes a message had only just been written to the in-memory queue when the consumer auto-committed its offset.

If we restarted the system at that moment, the data still sitting in the in-memory queue was lost before it could be processed.

3.2.2 Kafka loses data

A common scenario here: a Kafka broker goes down, and the partition's leader is re-elected. Think about it: if some followers are not yet fully in sync when the leader dies, and one of those followers is then elected leader, the data it never replicated is gone.

We hit this in production too: a Kafka leader machine went down, and after a follower was switched to leader, we found some data was missing.

Therefore, you generally need to set at least the following 4 parameters:

  • Set the replication.factor parameter on the topic: this value must be greater than 1, so that every partition has at least 2 replicas.
  • Set the min.insync.replicas parameter on the Kafka broker side: this value must be greater than 1. It requires a leader to perceive at least one follower still in sync with it, so that if the leader goes down, an up-to-date follower remains.
  • Set acks=all on the producer side: this requires every message to be written to all in-sync replicas before the write is considered successful.
  • Set retries=MAX on the producer side (a very, very large value, meaning unlimited retries): once a write fails, the producer retries indefinitely, stuck there until it succeeds.

Our production environment is configured exactly this way. With this configuration, at least on the Kafka broker side, it is guaranteed that if the broker hosting the leader fails and the leader switches, no data is lost.
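On the producer side, the last two settings look like this with the Kafka Java client (broker address and topic are placeholders; note that replication.factor and min.insync.replicas are topic/broker-side configs set when the topic is created, not producer properties):

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;

public class ReliableProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("acks", "all");                         // wait for all in-sync replicas
        props.put("retries", Integer.MAX_VALUE);          // retry "infinitely" on failure
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("billing-topic", "click-123", "deduct 1 yuan"));
        }
    }
}
```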

3.2.3 Will the producer lose data?

If you set acks=all as above, it will not: a write is only considered successful once the leader has received the message and all in-sync followers have replicated it. If that condition is not met, the producer automatically retries, an unlimited number of times.

Origin: blog.csdn.net/weixin_43314519/article/details/112389921