Interviewer: How do you ensure that messages are not consumed more than once? In other words, how do you make message consumption idempotent?

This article was first published by yanglbme in Doocs, a technology community on GitHub whose project currently has more than 30k stars.
Project Address: github.com/doocs/advan...


Interview question

How do you ensure that messages are not consumed more than once? In other words, how do you make message consumption idempotent?

What the interviewer is thinking

This is a very common question, and the two questions above are usually asked together. Since you are consuming messages, you have to think about repeated consumption. Can you avoid it? And if a message is consumed more than once, can the system stay correct? This is a fundamental issue in the MQ field. In essence, the interviewer is asking how you guarantee idempotency when using a message queue, which is something your architecture has to account for.

Analysis of the question

When you answer this question, don't act as if you have never heard of duplicate messages and know nothing about them. Start by explaining what kinds of repeated consumption can occur.

First, RabbitMQ, RocketMQ and Kafka can all run into repeated consumption, and that is normal. The MQ itself usually does not guarantee against it; that is up to us as developers. Let's take Kafka as an example and look at how repeated consumption happens.

Kafka has the concept of an offset. Every message written to Kafka gets an offset, which is the sequence number of that message. After a consumer has consumed some data, it periodically commits the offset of the messages it has consumed. In effect it says: "I have consumed up to here. If I restart, let me continue consuming from the offset I last committed."

But accidents happen. A common one in production: sometimes you have to restart the system, and how you restart matters. If you are in a hurry, you just kill the process and start it again. The consumer may then have processed some messages without having had time to commit their offsets, which is awkward. After the restart, those few messages are consumed again.

For example:

Consider this scenario. Records 1, 2 and 3 are written to Kafka in turn, and Kafka assigns each one an offset that represents its sequence number; assume the offsets are 152, 153 and 154. The consumer reads from Kafka in the same order. Suppose the consumer has just consumed the record at offset=153 and is about to commit the offset to ZooKeeper (where older Kafka versions stored consumer offsets) when the consumer process is restarted. The offsets for records 1 and 2, which have already been consumed, were never committed, so Kafka does not know that you have consumed the record at offset=153. After the restart, the consumer asks Kafka: "Hey, buddy, please keep sending me data from where I last left off." Because the earlier offset commit never succeeded, records 1 and 2 are delivered again. If the consumer does not deduplicate at this point, the result is repeated consumption.
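
The scenario can be reproduced with a small in-memory simulation. This is only a sketch and does not use a real Kafka client: `FakePartition` and `run_consumer` are hypothetical names, and the periodic commit is modeled as a single commit at the end of each polled batch.

```python
class FakePartition:
    """In-memory stand-in for one Kafka partition plus its committed offset."""

    def __init__(self, records, first_offset):
        # Give each record a sequential offset, e.g. 152, 153, 154.
        self.log = {first_offset + i: r for i, r in enumerate(records)}
        self.committed = first_offset  # where delivery resumes after a restart

    def poll(self):
        """Return (offset, record) pairs starting at the committed position."""
        return [(o, r) for o, r in sorted(self.log.items()) if o >= self.committed]

    def commit(self, next_offset):
        self.committed = next_offset


def run_consumer(partition, sink, crash_after_offset=None):
    """Process a batch, then commit; optionally die before the commit happens."""
    last = None
    for offset, record in partition.poll():
        sink.append(record)  # the side effect, e.g. a database insert
        last = offset
        if offset == crash_after_offset:
            return  # process killed: nothing from this batch was committed
    if last is not None:
        partition.commit(last + 1)


partition = FakePartition(["data-1", "data-2", "data-3"], first_offset=152)
database = []

run_consumer(partition, database, crash_after_offset=153)  # killed at offset 153
run_consumer(partition, database)                          # restarted consumer

print(database)  # ['data-1', 'data-2', 'data-1', 'data-2', 'data-3']
```

Records 1 and 2 appear twice because the first run wrote them to the sink but never committed their offsets.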

If the consumer's job is to take each record and write it to a database, you may end up inserting records 1 and 2 twice, and the data is then wrong.

Repeated consumption itself is not the scary part. The scary part is not having thought about how to guarantee idempotency once messages are consumed repeatedly.

Here is an example. Suppose your system inserts one row into the database for every message it consumes. If the same message arrives twice, you insert two rows, and the data is wrong. But if, when the message arrives the second time, you check whether you have already consumed it and simply discard it if so, only one row is kept and the data stays correct.

A message arrives twice, but the database contains only one row for it. That is what makes the system idempotent.
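
A minimal sketch of that "check, then discard" consumer, assuming every message carries an ID. The `consume` function, the `seen_ids` set and the sample messages are all made up for illustration; a real system would keep the seen IDs in durable storage rather than in memory.

```python
def consume(messages, rows, seen_ids):
    """Insert each message once; drop any message whose ID was already handled."""
    for message_id, payload in messages:
        if message_id in seen_ids:
            continue  # duplicate delivery: discard it
        rows.append(payload)      # stands in for the database insert
        seen_ids.add(message_id)  # remember that this message was consumed


# Messages 1 and 2 are delivered twice, as after an uncommitted restart.
delivered = [(1, "data-1"), (2, "data-2"), (1, "data-1"), (2, "data-2"), (3, "data-3")]
rows, seen_ids = [], set()
consume(delivered, rows, seen_ids)

print(rows)  # ['data-1', 'data-2', 'data-3']
```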

Idempotency, put simply, means that when the same data or the same request is sent to you multiple times, you must make sure the resulting data does not change and nothing goes wrong.
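
To make the definition concrete, the sketch below replays the same message three times against two hypothetical handlers. The account name and amounts are invented for the example.

```python
def apply_increment(balances, account, amount):
    """Not idempotent: every replay changes the result again."""
    balances[account] = balances.get(account, 0) + amount


def apply_set(balances, account, value):
    """Idempotent: replaying the message leaves the same final state."""
    balances[account] = value


incremented = {}
assigned = {}
for _ in range(3):  # the same message delivered three times
    apply_increment(incremented, "acct-1", 100)
    apply_set(assigned, "acct-1", 100)

print(incremented)  # {'acct-1': 300}
print(assigned)     # {'acct-1': 100}
```

An overwrite-style operation tolerates replays on its own, while an accumulate-style operation needs some form of deduplication in front of it.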

So the second question is: how do you guarantee idempotent consumption from a message queue?

In practice this has to be worked out in the context of the business. Here are a few ideas:

  • If you take a record and write it to a database, first look it up by primary key. If the row already exists, don't insert it; just update it.
  • If you are writing to Redis, there is no problem: every write is a set, which is naturally idempotent.
  • If neither of the two scenarios above applies, things get a little more complicated. Have the producer add a globally unique ID, something like an order ID, to every message it sends. When the consumer receives a message, it first looks the ID up, for example in Redis, to see whether the message has been consumed before. If not, process the message and then write the ID to Redis. If it has, skip it, so the same message is never processed twice.
  • Rely on a unique key in the database to ensure that duplicate data is not inserted more than once. Because of the unique key constraint, a duplicate insert simply fails with an error and leaves no dirty data in the database.
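
The database-backed ideas in the list can be sketched with Python's built-in `sqlite3` module. Everything here is illustrative: the table names, `upsert_order` and `handle_once` are assumptions, and a SQLite table stands in for the Redis lookup mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id TEXT PRIMARY KEY, status TEXT)")
conn.execute("CREATE TABLE processed_messages (message_id TEXT PRIMARY KEY)")


def upsert_order(order_id, status):
    """Look the row up by primary key: update it if present, insert otherwise."""
    row = conn.execute(
        "SELECT 1 FROM orders WHERE order_id = ?", (order_id,)
    ).fetchone()
    if row:
        conn.execute(
            "UPDATE orders SET status = ? WHERE order_id = ?", (status, order_id)
        )
    else:
        conn.execute(
            "INSERT INTO orders (order_id, status) VALUES (?, ?)", (order_id, status)
        )
    conn.commit()


def handle_once(message_id, handler):
    """Run handler only the first time a message ID is seen.

    The unique key on processed_messages rejects a repeated ID, so a
    duplicate delivery raises IntegrityError instead of being processed.
    Returns True if the message was handled, False if it was a duplicate.
    """
    try:
        with conn:  # commits on success, rolls back if anything raises
            conn.execute(
                "INSERT INTO processed_messages (message_id) VALUES (?)",
                (message_id,),
            )
            handler()
    except sqlite3.IntegrityError:
        return False
    return True


upsert_order("order-1", "PAID")
upsert_order("order-1", "PAID")  # replayed message: still a single row
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

shipped = []
results = [handle_once("msg-42", lambda: shipped.append("order-1")) for _ in range(2)]

print(order_count, results, shipped)  # 1 [True, False] ['order-1']
```

Recording the message ID and running the handler inside one transaction means a handler failure rolls the ID back as well, so the message can be retried later. That only holds when the handler's side effects live in the same database, which is an assumption of this sketch.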

Of course, how to guarantee idempotent MQ consumption still depends on the specific business.




Origin juejin.im/post/5dcdfd49f265da0bc10e3224