Distributed message queues RocketMQ & Kafka: "sequential consumption" of messages, a problem that looks simple but is actually complex

When discussing message middleware, one feature comes up again and again: sequential consumption of messages. The problem looks simple: the producer sends messages 1, 2, 3, ... and the consumer consumes them in the order 1, 2, 3, ...

But in practice, neither RocketMQ nor Kafka guarantees strictly ordered consumption of messages by default!

This feature seems simple enough, so why is it not guaranteed by default?

How difficult is "strictly sequential consumption"?

Let's analyze from three aspects how difficult, or even impossible, it is for message middleware to guarantee "strictly ordered consumption".

Sender

The sender cannot send asynchronously. With asynchronous sending, there is no way to guarantee the order of messages.

For example, suppose you send 1, 2, 3 in a row. A moment later the results come back: 1 failed, while 2 and 3 succeeded. If you now resend 1, it lands after 2 and 3, and the order is broken.
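The scenario above can be sketched as a small simulation. This is only an illustration under stated assumptions, not real client code: `send_all`, `fails_first_attempt`, and `wait_for_result` are hypothetical names, and the "broker" is just a list that records the order in which messages are stored.

```python
def send_all(messages, fails_first_attempt, wait_for_result):
    """Return the order in which a simulated broker stores the messages.

    Messages listed in fails_first_attempt fail once and succeed on retry.
    """
    broker_log = []
    pending_retries = []
    for msg in messages:
        if msg not in fails_first_attempt:
            broker_log.append(msg)       # stored on the first attempt
        elif wait_for_result:
            broker_log.append(msg)       # synchronous: retry succeeds before the next send
        else:
            pending_retries.append(msg)  # asynchronous: the failure is noticed only later
    broker_log.extend(pending_retries)   # resends arrive after the later messages
    return broker_log


async_log = send_all([1, 2, 3], fails_first_attempt={1}, wait_for_result=False)
sync_log = send_all([1, 2, 3], fails_first_attempt={1}, wait_for_result=True)
print(async_log)  # [2, 3, 1]
print(sync_log)   # [1, 2, 3]
```

The synchronous variant keeps the order only because it refuses to send message 2 until message 1 has been stored, which is exactly the throughput cost the sender has to pay.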

Storage side

On the storage side, guaranteeing message order raises the following problems:
(1) Messages cannot be partitioned. That is, one topic can have only one queue. Kafka calls it a partition; RocketMQ calls it a queue. With multiple queues, the messages of one topic are spread across them, and the order across queues cannot be guaranteed.

(2) Even with only one queue, there is a second problem: after the machine goes down, can you switch to another machine? This is the high-availability problem.

For example, suppose the current machine goes down while it still holds messages that have not been consumed. If you switch to another machine at this point, availability is preserved, but the message order is broken.

To preserve the order, on the one hand replication must be synchronous rather than asynchronous; on the other hand, every message on the machine must have been consumed before the switch, with nothing left behind. Obviously, this is hard!
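Both storage-side problems can be illustrated with a toy model. The sketch below is not broker code; it assumes round-robin distribution across queues for problem (1), and for problem (2) it assumes one particular failure scenario: the primary crashes, traffic moves to a standby that holds only the replicated prefix of the log, and the stranded messages surface once the old primary recovers. All function and parameter names are hypothetical.

```python
def distribute_round_robin(messages, num_queues):
    """Problem (1): spread the messages of one topic across several queues."""
    queues = [[] for _ in range(num_queues)]
    for i, msg in enumerate(messages):
        queues[i % num_queues].append(msg)
    return queues


def drain_queue_by_queue(queues):
    """A consumer that reads each queue to the end before moving to the next."""
    seen = []
    for queue in queues:
        seen.extend(queue)
    return seen


def consume_across_failover(primary_log, replicated_count, consumed_count, new_messages):
    """Problem (2): order observed by the consumer across a failover.

    Assumes consumed_count <= replicated_count <= len(primary_log).
    """
    seen = primary_log[:consumed_count]                  # consumed before the crash
    standby_log = primary_log[:replicated_count] + new_messages
    seen = seen + standby_log[consumed_count:]           # consumer continues on the standby
    seen = seen + primary_log[replicated_count:]         # stranded messages appear after recovery
    return seen


two_queues = distribute_round_robin([1, 2, 3, 4, 5, 6], num_queues=2)
print(two_queues)                        # [[1, 3, 5], [2, 4, 6]]
print(drain_queue_by_queue(two_queues))  # [1, 3, 5, 2, 4, 6]

lagging = consume_across_failover([1, 2, 3, 4, 5], replicated_count=3,
                                  consumed_count=2, new_messages=[6, 7])
in_sync = consume_across_failover([1, 2, 3, 4, 5], replicated_count=5,
                                  consumed_count=2, new_messages=[6, 7])
print(lagging)  # [1, 2, 3, 6, 7, 4, 5]
print(in_sync)  # [1, 2, 3, 4, 5, 6, 7]
```

Each queue stays ordered internally, but the topic as a whole does not. In the failover case, the order survives in this model only when nothing is left behind on the failed machine.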

Receiving end

On the receiving end, parallel consumption is not possible: multiple threads or multiple clients cannot consume the same queue.

Summary
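To see why parallel consumption breaks the order, here is a deterministic sketch that replaces real threads with a simulated clock. The function `completion_order` and the processing times are made up for illustration; messages are handed out in queue order to whichever worker becomes free first.

```python
import heapq


def completion_order(messages, processing_time, num_workers):
    """Return the messages in the order their processing finishes."""
    # Each heap entry is (time at which the worker becomes free, worker id).
    free_at = [(0, worker) for worker in range(num_workers)]
    heapq.heapify(free_at)
    finished = []  # (finish time, dispatch index, message)
    for index, msg in enumerate(messages):
        start, worker = heapq.heappop(free_at)
        end = start + processing_time[msg]
        finished.append((end, index, msg))
        heapq.heappush(free_at, (end, worker))
    return [msg for _, _, msg in sorted(finished)]


times = {1: 5, 2: 1, 3: 1}  # message 1 is slow to process
parallel = completion_order([1, 2, 3], times, num_workers=2)
serial = completion_order([1, 2, 3], times, num_workers=1)
print(parallel)  # [2, 3, 1]
print(serial)    # [1, 2, 3]
```

With two workers, messages 2 and 3 finish while message 1 is still being processed, so the effects are applied out of order even though the queue itself was perfectly ordered.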

From the above analysis, we can see how difficult it is to ensure the strict order of messages!

The sender and receiver problems are relatively easy to solve: forbid asynchronous sending and forbid parallel consumption. On the storage side, however, the problem of switching over after a machine goes down is hard to solve.

If you switch, the message order may be broken; if you don't switch, the service is temporarily unavailable. You have to trade one off against the other.

Does the business need global order?

The analysis above shows that keeping messages strictly ordered within a topic is very difficult, or at least requires very harsh conditions.

So what should we do? Must we go to any length and use every means available to guarantee strict message order?

Here we need to consider this issue from another perspective: the business perspective. As stated in this blog post: 
http://www.jianshu.com/p/453c6e7ff81c

In practice:
(1) A large number of services do not care about order;
(2) An unordered queue does not mean the messages are unordered.

Item (2) means: we do not guarantee global ordering across the queues, but we can guarantee local ordering of the messages.

For example: guarantee that messages with the same order id are ordered!

Let's take a look at how Kafka and RocketMQ deal with this problem:

In Kafka: when sending a message, you can specify three parameters: (topic, partition, key). partition and key are optional.

If you specify a partition, all of those messages are sent to the same partition and are therefore ordered. On the consumer side, Kafka guarantees that a partition is consumed by only one consumer within a consumer group.

Alternatively, if you specify a key (such as an order id), all messages with the same key are sent to the same partition, so they are ordered as well.
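Key-based routing can be sketched in a few lines. This is a minimal illustration, not Kafka's actual partitioner: `partition_for` and `route` are hypothetical helpers, and `zlib.crc32` is only a stand-in for whatever stable hash the real client uses. The point is that a stable hash of the key, taken modulo the partition count, always sends the same order id to the same partition.

```python
import zlib


def partition_for(key, num_partitions):
    # A stable hash, so the same key always maps to the same partition.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(events, num_partitions):
    """Append each (order_id, step) event to the partition chosen by its key."""
    partitions = [[] for _ in range(num_partitions)]
    for order_id, step in events:
        partitions[partition_for(order_id, num_partitions)].append((order_id, step))
    return partitions


def steps_seen(partitions, order_id):
    """Steps of one order in the order they appear inside its partition."""
    index = partition_for(order_id, len(partitions))
    return [step for oid, step in partitions[index] if oid == order_id]


events = [
    ("order-A", "created"),
    ("order-B", "created"),
    ("order-A", "paid"),
    ("order-B", "paid"),
    ("order-A", "shipped"),
]
partitions = route(events, num_partitions=4)
print(steps_seen(partitions, "order-A"))  # ['created', 'paid', 'shipped']
print(steps_seen(partitions, "order-B"))  # ['created', 'paid']
```

Different orders may interleave or land in different partitions, but the steps of any single order keep their relative order, which is the local ordering most businesses actually need. Note that changing the number of partitions changes the mapping, so the guarantee holds only while the partition count is stable.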

RocketMQ: building on Kafka's approach, RocketMQ relaxes this restriction one step further. You specify only (topic, key) and do not specify which queue to send to. In other words, it does not want the business side to depend on a strict global order.

Key point: this relaxation actually touches on a bigger issue, namely the major difference between RocketMQ and Kafka in their underlying storage. I introduced this in the follow-up to the previous article, "Setting things right".

Later, in the source code analysis series, this issue will be analyzed further.

This concludes the discussion of "message order".

 

http://m.blog.csdn.net/chunlongyu/article/details/53977819
