The following article comes from Junge Chat Technology, author Zhu Jinjun

Interviewer: When RocketMQ has a message backlog, does adding consumers help?

Me: It depends on the scenario; the answer differs from case to case.

Interviewer: Can you explain in detail?

Me: If the number of consumers is less than the number of MessageQueues, adding consumers speeds up consumption and reduces the backlog. For example, if a Topic has 4 MessageQueues and 2 consumers, adding one more consumer noticeably increases the overall pull rate. As shown below:

If the number of consumers is greater than or equal to the number of MessageQueues, adding consumers is useless, because the new consumers are not assigned any queue. For example, a Topic has 4 MessageQueues and is already consumed by 4 consumers. As shown below:
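As a rough sketch of the two cases above, assuming clustering-mode consumption in which each MessageQueue is assigned to at most one consumer of the group, the number of consumers doing useful work is capped by the number of queues. The class and method names here are illustrative and not part of RocketMQ.

```java
public class ActiveConsumerSketch {
    // Hypothetical helper: how many consumers actually own at least one queue,
    // assuming a MessageQueue is never shared inside one consumer group.
    public static int activeConsumers(int queueCount, int consumerCount) {
        return Math.min(queueCount, consumerCount);
    }

    public static void main(String[] args) {
        System.out.println(activeConsumers(4, 2)); // prints 2
        System.out.println(activeConsumers(4, 3)); // prints 3: the new consumer takes over a queue
        System.out.println(activeConsumers(4, 5)); // prints 4: the fifth consumer stays idle
    }
}
```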

Interviewer: In the first case you mentioned, will adding consumers always speed up message consumption?

Me: Well... under normal circumstances, yes.

Interviewer: Are there any special circumstances?

Me: Of course. How fast a consumer pulls messages also depends on how fast it consumes the messages it has already pulled. If local consumption is slow, the consumer waits for a while before pulling again.

Interviewer: Under what circumstances will consumers wait for a period of time before pulling?

Me: Pulled messages are cached in a ProcessQueue, and the consumer applies flow control. In any of the following three situations, it postpones the next pull:

The number of messages cached in the ProcessQueue exceeds the threshold (pullThresholdForQueue, 1000 by default, configurable);
The total size of the messages cached in the ProcessQueue exceeds the threshold (pullThresholdSizeForQueue, 100 MB by default, configurable);
For non-sequential consumption, the difference between the offsets of the last and first messages cached in the ProcessQueue exceeds the threshold (consumeConcurrentlyMaxSpan, 2000 by default, configurable).
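The three flow-control conditions can be summarized in a small sketch. This is not RocketMQ's source: the class, method, and constant names are made up for illustration, and only the default values are taken from the text above.

```java
public class FlowControlSketch {
    // Default thresholds quoted in the text; all of them are configurable.
    static final long MAX_CACHED_MSG_COUNT = 1000;
    static final long MAX_CACHED_MSG_SIZE_MB = 100;
    static final long MAX_OFFSET_SPAN = 2000;

    // Returns true when the consumer should postpone the next pull for this queue.
    public static boolean shouldDelayPull(long cachedCount, long cachedSizeMb,
                                          long firstOffset, long lastOffset,
                                          boolean sequential) {
        if (cachedCount > MAX_CACHED_MSG_COUNT) {
            return true; // too many messages waiting locally
        }
        if (cachedSizeMb > MAX_CACHED_MSG_SIZE_MB) {
            return true; // too many bytes waiting locally
        }
        // The offset-span check applies to non-sequential consumption only.
        return !sequential && (lastOffset - firstOffset) > MAX_OFFSET_SPAN;
    }

    public static void main(String[] args) {
        System.out.println(shouldDelayPull(1001, 1, 0, 10, false));  // prints true
        System.out.println(shouldDelayPull(10, 1, 0, 2500, false));  // prints true
        System.out.println(shouldDelayPull(10, 1, 0, 2500, true));   // prints false
    }
}
```

The third call shows why a single stuck message hurts concurrent consumption in particular: the offset span keeps growing even though few messages are cached.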

Interviewer: Any other circumstances?

Me: For sequential consumption, if the ProcessQueue cannot be locked, the pull is also delayed. The delay is 3 s.

Interviewer: What are the possible reasons for consumers to delay pulling messages?

Me: In essence, a delayed pull means the consumer is consuming slowly, so the messages cached in the ProcessQueue still exceed a threshold when the next pull is due. Take the following architecture diagram as an example:

Slow consumption can have the following causes:

The business logic the consumer runs is complex and time-consuming;
The consumer issues slow queries, or the database is under heavy load and responds slowly;
Middleware such as a cache (for example Redis) responds slowly;
Calls to external service interfaces respond slowly.

Interviewer: Are there any countermeasures for the slow response of external interfaces?

Me: This needs to be discussed on a case-by-case basis.

If the call to the external system is only a notification, or its result is not used, make the call asynchronously and retry inside the asynchronous logic until it succeeds.
If the result must be processed, consider whether a default value can be cached for it (provided the business allows this); when the call fails, degrade quickly and use the default value instead of the interface's return value.
If the result must be processed and cannot be cached, save the pulled messages locally and return CONSUME_SUCCESS to the Broker right away; once the external system recovers, take the messages out of local storage and process them.
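The default-value fallback described above might look like the following sketch. It assumes Java 9 or later (for CompletableFuture.completeOnTimeout) and models the external interface as a plain Supplier; the class and method names are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public class FallbackCallSketch {
    // Runs the external call asynchronously and degrades to defaultValue
    // when the call throws or does not answer within timeoutMillis.
    public static <T> T callWithFallback(Supplier<T> externalCall, T defaultValue,
                                         long timeoutMillis) {
        return CompletableFuture.supplyAsync(externalCall)
                .completeOnTimeout(defaultValue, timeoutMillis, TimeUnit.MILLISECONDS)
                .exceptionally(error -> defaultValue)
                .join();
    }

    public static void main(String[] args) {
        // Healthy call: the real result is used.
        System.out.println(callWithFallback(() -> "real result", "cached default", 1000));
        // Failing call: the cached default is used instead.
        System.out.println(FallbackCallSketch.<String>callWithFallback(
                () -> { throw new IllegalStateException("external system down"); },
                "cached default", 1000));
    }
}
```

Whether a default value is acceptable is a business decision, so a fallback like this only fits the second case; the other two cases need asynchronous retries or local storage instead.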

Interviewer: Suppose the number of consumers is less than the number of MessageQueues and the external system responds normally. If we add consumers to drain the backlog quickly, is there anything else to consider?

Me: Even though the external system responds normally now, adding several consumers makes the number of calls to it rise sharply. If it hits its throughput limit, it will respond slowly or even go down. The load on the local database and cache must be considered as well. If the database slows down, message processing slows down too, and the backlog is not relieved.

Interviewer: After a new consumer is added, how are MessageQueues allocated to it?

Me: Before pulling messages, a consumer must first run load balancing (rebalancing) to determine which MessageQueues it owns. RocketMQ does this with a timer; by default, rebalancing runs every 20 s.

Interviewer: Can you elaborate on the load balancing strategies?

Me: RocketMQ provides 6 load balancing strategies; let's go through them in turn.

Average allocation strategy

Sort the consumers;
Calculate how many MessageQueues each consumer gets when they are divided equally;
If there are more consumers than MessageQueues, the extra consumers get nothing;
If the MessageQueues cannot be divided equally, compute the remainder mod = number of MessageQueues % number of consumers;
Each of the first mod consumers gets one extra MessageQueue, which gives the number of MessageQueues allocated to each consumer.

For example, the case of 4 MessageQueue and 3 consumers:

The logic of the source code is very simple, as follows:

// Class AllocateMessageQueueAveragely
// With 4 MessageQueues and 3 consumers, for the first consumer: index = 0
int index = cidAll.indexOf(currentCID);
// mod = 1
int mod = mqAll.size() % cidAll.size();
// averageSize = 2
int averageSize =
    mqAll.size() <= cidAll.size() ? 1 : (mod > 0 && index < mod ? mqAll.size() / cidAll.size()
                                         + 1 : mqAll.size() / cidAll.size());
// startIndex = 0
int startIndex = (mod > 0 && index < mod) ? index * averageSize : index * averageSize + mod;
// range = 2, so the first consumer is allocated 2 MessageQueues
int range = Math.min(averageSize, mqAll.size() - startIndex);
for (int i = 0; i < range; i++) {
    result.add(mqAll.get((startIndex + i) % mqAll.size()));
}
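To see the result for every consumer rather than only the first, the same arithmetic can be wrapped in a self-contained sketch. Queues and consumers are plain strings here, and the class name is illustrative; only the calculation mirrors the excerpt above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AverageAllocationSketch {
    // Same arithmetic as the excerpt, applied to plain string identifiers.
    public static List<String> allocate(List<String> mqAll, List<String> cidAll,
                                        String currentCID) {
        List<String> result = new ArrayList<>();
        int index = cidAll.indexOf(currentCID);
        if (index < 0) {
            return result; // consumer is not part of the group
        }
        int mod = mqAll.size() % cidAll.size();
        int averageSize = mqAll.size() <= cidAll.size() ? 1
                : (mod > 0 && index < mod ? mqAll.size() / cidAll.size() + 1
                                          : mqAll.size() / cidAll.size());
        int startIndex = (mod > 0 && index < mod)
                ? index * averageSize : index * averageSize + mod;
        int range = Math.min(averageSize, mqAll.size() - startIndex);
        for (int i = 0; i < range; i++) {
            result.add(mqAll.get((startIndex + i) % mqAll.size()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> queues = Arrays.asList("q0", "q1", "q2", "q3");
        List<String> consumers = Arrays.asList("c0", "c1", "c2");
        for (String cid : consumers) {
            System.out.println(cid + " -> " + allocate(queues, consumers, cid));
        }
        // prints:
        // c0 -> [q0, q1]
        // c1 -> [q2]
        // c2 -> [q3]
    }
}
```

With 5 consumers and the same 4 queues, the fifth consumer ends up with a negative range and therefore an empty list, which matches the rule that extra consumers get nothing.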

Round-robin allocation strategy

This one is easy to understand. Traverse the consumers and assign one MessageQueue to each consumer visited. If there are more MessageQueues than consumers, several passes are needed; the number of passes is the number of MessageQueues divided by the number of consumers, rounded up. Again with 4 MessageQueues and 3 consumers, the result is as follows:

The source code is as follows:

// Class AllocateMessageQueueAveragelyByCircle
// With 4 MessageQueues and 3 consumers, for the first consumer: index = 0
int index = cidAll.indexOf(currentCID);
for (int i = index; i < mqAll.size(); i++) {
    if (i % cidAll.size() == index) {
        // i == 0 and i == 3 both reach this point
        result.add(mqAll.get(i));
    }
}
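For comparison with the average strategy, here is the same loop as a self-contained sketch over plain strings. The class name is illustrative; the loop itself follows the excerpt above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RoundRobinAllocationSketch {
    // Every queue whose position modulo the consumer count equals the
    // consumer's own index goes to that consumer.
    public static List<String> allocate(List<String> mqAll, List<String> cidAll,
                                        String currentCID) {
        List<String> result = new ArrayList<>();
        int index = cidAll.indexOf(currentCID);
        if (index < 0) {
            return result; // consumer is not part of the group
        }
        for (int i = index; i < mqAll.size(); i++) {
            if (i % cidAll.size() == index) {
                result.add(mqAll.get(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> queues = Arrays.asList("q0", "q1", "q2", "q3");
        List<String> consumers = Arrays.asList("c0", "c1", "c2");
        for (String cid : consumers) {
            System.out.println(cid + " -> " + allocate(queues, consumers, cid));
        }
        // prints:
        // c0 -> [q0, q3]
        // c1 -> [q1]
        // c2 -> [q2]
    }
}
```

Each consumer gets the same number of queues as under the average strategy, but the first consumer receives q0 and q3 instead of the adjacent pair q0 and q1.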

Custom allocation strategy

With this strategy, you specify which MessageQueues to consume when the consumer starts. You can refer to the following code:

AllocateMessageQueueByConfig allocateMessageQueueByConfig = new AllocateMessageQueueByConfig();
// Bind this consumer to queue 0 of topic "messageQueue1" on broker1
allocateMessageQueueByConfig.setMessageQueueList(Arrays.asList(new MessageQueue("messageQueue1","broker1",0)));
consumer.setAllocateMessageQueueStrategy(allocateMessageQueueByConfig);
consumer.start();

Machine room allocation strategy

With this strategy, a Consumer only consumes MessageQueues in the specified machine rooms. As shown in the figure below, Consumer0, Consumer1, and Consumer2 are bound to machine rooms room1 and room2, while room3 has no consumers.

When the Consumer starts, it must be bound to the machine room names. You can refer to the following code:

AllocateMessageQueueByMachineRoom allocateMessageQueueByMachineRoom = new AllocateMessageQueueByMachineRoom();
// Bind to the two machine rooms room1 and room2
allocateMessageQueueByMachineRoom.setConsumeridcs(new HashSet<>(Arrays.asList("room1","room2")));
consumer.setAllocateMessageQueueStrategy(allocateMessageQueueByMachineRoom);
consumer.start();

With this strategy, brokers must be named in the format machineRoomName@brokerName, because when allocating queues the consumer first filters the MessageQueues by machine room name and then distributes them using the average allocation strategy.

// Class AllocateMessageQueueByMachineRoom
List<MessageQueue> premqAll = new ArrayList<MessageQueue>();
for (MessageQueue mq : mqAll) {
    String[] temp = mq.getBrokerName().split("@");
    if (temp.length == 2 && consumeridcs.contains(temp[0])) {
        premqAll.add(mq);
    }
}
// The loop above filters the MessageQueues by machine room name into premqAll; what follows is the average allocation strategy
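The filtering step can be tried in isolation with plain broker names. In this sketch the broker names such as "room1@broker-a" are made-up examples of the machineRoomName@brokerName format, and the class name is illustrative; only the split-and-check logic mirrors the excerpt above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MachineRoomFilterSketch {
    // Keeps only the broker names whose machine room prefix is in consumeridcs.
    public static List<String> filterByMachineRoom(List<String> brokerNames,
                                                   Set<String> consumeridcs) {
        List<String> kept = new ArrayList<>();
        for (String brokerName : brokerNames) {
            String[] temp = brokerName.split("@");
            if (temp.length == 2 && consumeridcs.contains(temp[0])) {
                kept.add(brokerName);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> brokers = Arrays.asList(
                "room1@broker-a", "room2@broker-b", "room3@broker-c", "broker-d");
        Set<String> rooms = new HashSet<>(Arrays.asList("room1", "room2"));
        System.out.println(filterByMachineRoom(brokers, rooms));
        // prints [room1@broker-a, room2@broker-b]
    }
}
```

Note that "broker-d", which does not follow the naming format, is dropped just like the broker in room3, so its queues would never be consumed under this strategy.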

Nearby machine room allocation strategy

Compared with the machine room strategy, the advantage of nearby allocation is that MessageQueues in machine rooms without consumers are still allocated. As shown in the figure below, the MessageQueues in room3 are also allocated to consumers:

If a machine room has no consumers, its MessageQueues are allocated among all consumers in the cluster.
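The idea can be illustrated with a simplified sketch. It is not RocketMQ's implementation: queues and consumers are plain strings mapped to a machine room, a basic round robin is assumed for spreading queues inside each group, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class NearbyRoomSketch {
    // Assumed inner strategy: every consumers.size()-th queue, starting at the consumer's index.
    static List<String> roundRobin(List<String> queues, List<String> consumers, String cid) {
        List<String> out = new ArrayList<>();
        int index = consumers.indexOf(cid);
        for (int i = index; index >= 0 && i < queues.size(); i += consumers.size()) {
            out.add(queues.get(i));
        }
        return out;
    }

    // queueRoom maps queue name -> machine room; consumerRoom maps consumer id -> machine room.
    public static List<String> allocate(Map<String, String> queueRoom,
                                        Map<String, String> consumerRoom,
                                        String currentCID) {
        String myRoom = consumerRoom.get(currentCID);
        List<String> allConsumers = new ArrayList<>(new TreeMap<>(consumerRoom).keySet());
        List<String> sameRoomConsumers = new ArrayList<>();
        for (String cid : allConsumers) {
            if (consumerRoom.get(cid).equals(myRoom)) {
                sameRoomConsumers.add(cid);
            }
        }
        List<String> sameRoomQueues = new ArrayList<>();
        List<String> orphanQueues = new ArrayList<>(); // queues in rooms without consumers
        for (Map.Entry<String, String> e : new TreeMap<>(queueRoom).entrySet()) {
            if (e.getValue().equals(myRoom)) {
                sameRoomQueues.add(e.getKey());
            } else if (!consumerRoom.containsValue(e.getValue())) {
                orphanQueues.add(e.getKey());
            }
        }
        // Queues in the consumer's own room are shared with its room mates only;
        // orphan queues are shared among all consumers in the cluster.
        List<String> result = roundRobin(sameRoomQueues, sameRoomConsumers, currentCID);
        result.addAll(roundRobin(orphanQueues, allConsumers, currentCID));
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> queueRoom = new TreeMap<>();
        queueRoom.put("q0", "room1");
        queueRoom.put("q1", "room1");
        queueRoom.put("q2", "room2");
        queueRoom.put("q3", "room3");
        queueRoom.put("q4", "room3");
        Map<String, String> consumerRoom = new TreeMap<>();
        consumerRoom.put("c0", "room1");
        consumerRoom.put("c1", "room1");
        consumerRoom.put("c2", "room2");
        for (String cid : consumerRoom.keySet()) {
            System.out.println(cid + " -> " + allocate(queueRoom, consumerRoom, cid));
        }
        // prints:
        // c0 -> [q0, q3]
        // c1 -> [q1, q4]
        // c2 -> [q2]
    }
}
```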

Consistent hash allocation strategy

All consumers are hashed onto a hash ring. Each MessageQueue is hashed as well and bound to the first consumer node found clockwise from its position. As shown below:
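A minimal sketch of the ring lookup follows. It is a generic consistent-hash illustration, not RocketMQ's code: the MD5-based hash, the number of virtual nodes per consumer, and all names are assumptions made for the example, and the consumer list must not be empty.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ConsistentHashSketch {
    // Assumed hash function: the first 8 bytes of the MD5 digest as a long.
    static long hash(String key) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (digest[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Places virtualNodes points per consumer on the ring, then walks "clockwise"
    // from the queue's hash to the first consumer point, wrapping around at the end.
    public static String ownerOf(String queue, List<String> consumers, int virtualNodes) {
        TreeMap<Long, String> ring = new TreeMap<>();
        for (String consumer : consumers) {
            for (int v = 0; v < virtualNodes; v++) {
                ring.put(hash(consumer + "#" + v), consumer);
            }
        }
        Map.Entry<Long, String> next = ring.ceilingEntry(hash(queue));
        return (next != null ? next : ring.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        List<String> consumers = Arrays.asList("c0", "c1", "c2");
        for (int q = 0; q < 4; q++) {
            String queue = "broker-a:" + q;
            System.out.println(queue + " -> " + ownerOf(queue, consumers, 10));
        }
    }
}
```

The property that makes this strategy attractive is that when a consumer leaves, only the queues it owned move to another consumer; queues owned by the remaining consumers stay where they are.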