RocketMQ source code analysis: summary of the core process of message sending and consumption


A bird's-eye view of message sending and consumption

When working with RocketMQ, RocketMQ-Dashboard is a very handy graphical tool.
We first create a topic on RocketMQ-Dashboard, with 4 queues under it.

A topic is a collection of messages of one type, and each topic is subdivided into queues to increase the concurrency of message consumption.
When a producer sends a message to a topic, which of the topic's queues should it go to?

The producer sends messages to the queues in round-robin order.
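A minimal sketch of round-robin queue selection, assuming queues are identified by plain integer IDs. The class name RoundRobinQueueSelector is hypothetical and this is not RocketMQ's own code; the real producer also has optional logic for avoiding brokers that recently failed, which is left out here.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of round-robin queue selection (not RocketMQ's actual class).
class RoundRobinQueueSelector {
    private final AtomicInteger counter = new AtomicInteger();

    // Returns the next queue ID in round-robin order.
    int select(List<Integer> queueIds) {
        // floorMod keeps the index non-negative even after the counter overflows.
        int index = Math.floorMod(counter.getAndIncrement(), queueIds.size());
        return queueIds.get(index);
    }

    public static void main(String[] args) {
        RoundRobinQueueSelector selector = new RoundRobinQueueSelector();
        List<Integer> queues = List.of(0, 1, 2, 3); // a topic with 4 queues
        for (int i = 0; i < 6; i++) {
            System.out.print(selector.select(queues) + " ");
        }
    }
}
```

Running main prints the sequence 0 1 2 3 0 1: after the fourth message the selector wraps around to queue 0.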

So which queues should a consumer consume?

With only one consumer, it naturally consumes all the queues.

What if there are multiple consumers?

You just assign the queues to the consumers according to a load balancing strategy. The figures below show two load balancing methods.
How are these two strategies implemented? Go read the source code; I won't analyze the detailed process here.
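For a rough idea of what such strategies look like, here is a simplified sketch of two common schemes: contiguous blocks and circular (card-dealing) assignment. RocketMQ ships strategies in this spirit, such as AllocateMessageQueueAveragely and AllocateMessageQueueAveragelyByCircle, but the class QueueAllocation and its methods below are hypothetical, and queues and consumers are reduced to indexes. Because every consumer runs the computation on its own, they all need to see the same sorted lists of queues and consumer IDs for the results to fit together.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified allocation strategies. Queues and consumers are
// identified by index; every consumer must see the same sorted lists.
class QueueAllocation {

    // Contiguous blocks: consumer i gets a run of adjacent queues; the first
    // (queueCount % consumerCount) consumers get one extra queue.
    static List<Integer> averagely(int queueCount, int consumerCount, int consumerIndex) {
        int base = queueCount / consumerCount;
        int extra = queueCount % consumerCount;
        int size = base + (consumerIndex < extra ? 1 : 0);
        int start = consumerIndex * base + Math.min(consumerIndex, extra);
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            result.add(start + i);
        }
        return result;
    }

    // Circular: deal queues out like cards, one per consumer in turn.
    static List<Integer> byCircle(int queueCount, int consumerCount, int consumerIndex) {
        List<Integer> result = new ArrayList<>();
        for (int q = consumerIndex; q < queueCount; q += consumerCount) {
            result.add(q);
        }
        return result;
    }

    public static void main(String[] args) {
        // 4 queues shared by 3 consumers
        for (int c = 0; c < 3; c++) {
            System.out.println("consumer " + c + ": averagely=" + averagely(4, 3, c)
                    + " byCircle=" + byCircle(4, 3, c));
        }
    }
}
```

With 4 queues and 3 consumers, main prints averagely=[0, 1], [2], [3] and byCircle=[0, 3], [1], [2] for consumers 0, 1 and 2. With 4 queues and 5 consumers, consumer 4 gets an empty list under both schemes.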

What happens if the number of consumers exceeds the number of queues?

The extra consumers will not consume any queues.
Why can each queue be consumed by only one consumer?

If several consumers consumed the same queue, they would run into concurrency problems, so the queue would have to be locked. It is better to simply create a few more queues under the topic.

Can the number of queues under a topic be changed at runtime?

Of course. Not only can the number of queues be changed, but consumers can also be added or removed in real time to cope with different traffic levels.

So when the number of queues or consumers changes, does load balancing have to be run again?

Yes. This process is generally called rebalancing.
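Rebalancing can be pictured as recomputing the assignment and diffing it against what the consumer currently holds. The sketch below is a hypothetical simplification (RebalanceSketch and QueueState are made-up names): queues that moved to another consumer have their state flagged as dropped, echoing the dropped attribute mentioned later, and queues that are newly assigned are returned so the caller can start pulling them.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the rebalance step on one consumer.
class RebalanceSketch {
    static final class QueueState {
        volatile boolean dropped; // set when this consumer loses the queue
    }

    // Queues this consumer currently processes, keyed by queue ID.
    final Map<Integer, QueueState> owned = new HashMap<>();

    // Applies a freshly computed assignment; returns the queues that were newly added.
    Set<Integer> rebalance(Set<Integer> newAssignment) {
        // Release queues that are no longer assigned to this consumer.
        Iterator<Map.Entry<Integer, QueueState>> it = owned.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Integer, QueueState> e = it.next();
            if (!newAssignment.contains(e.getKey())) {
                e.getValue().dropped = true; // in-flight work for this queue should stop
                it.remove();
            }
        }
        // Take over queues that were newly assigned.
        Set<Integer> added = new HashSet<>();
        for (Integer q : newAssignment) {
            if (!owned.containsKey(q)) {
                owned.put(q, new QueueState());
                added.add(q);
            }
        }
        return added;
    }
}
```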

Let's go through the details below.

Message sending process

There are three main ways to send messages: one-way (send only and ignore the result), synchronous, and asynchronous.
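RocketMQ's DefaultMQProducer exposes these three modes as sendOneway(msg), send(msg) and send(msg, sendCallback). To show how they differ for the caller without needing a running broker, the sketch below simulates them in memory. SendModesDemo, its Callback interface and the fake SEND_OK acknowledgement are all hypothetical, and a single-threaded executor stands in for the network.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical simulation of the three send modes; not the RocketMQ client API.
class SendModesDemo {
    // Shaped like a send callback: one method for success, one for failure.
    interface Callback {
        void onSuccess(String result);
        void onException(Throwable e);
    }

    // A single I/O thread stands in for the network, so sends are handled in order.
    private final ExecutorService io = Executors.newSingleThreadExecutor();
    private long nextOffset = 0; // only touched on the I/O thread

    // Simulated broker round trip: "stores" the message and returns an ack.
    private String roundTrip(String body) {
        return "SEND_OK offset=" + (nextOffset++);
    }

    // One-way: hand off the message and return immediately; the ack is ignored.
    void sendOneway(String body) {
        io.submit(() -> roundTrip(body));
    }

    // Synchronous: the caller blocks until the ack comes back.
    String send(String body) {
        try {
            return io.submit(() -> roundTrip(body)).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    // Asynchronous: returns at once; the callback runs when the ack arrives.
    void send(String body, Callback callback) {
        CompletableFuture.supplyAsync(() -> roundTrip(body), io)
                .whenComplete((result, error) -> {
                    if (error != null) {
                        callback.onException(error);
                    } else {
                        callback.onSuccess(result);
                    }
                });
    }

    void shutdown() {
        io.shutdown();
    }
}
```

One-way returns before anything is known about delivery, synchronous ties up the calling thread for a full round trip, and asynchronous returns immediately while still reporting success or failure through the callback.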

Message consumption process

Is message consumption push-based or pull-based?

There are two modes of message consumption:

  1. Pull: the Consumer keeps pulling from the Broker
  2. Push: the Broker pushes to the Consumer

Both approaches have their own drawbacks:

  1. Pull: the pull interval is hard to choose. If it is too short, bandwidth is wasted; if it is too long, messages are not consumed in time.
  2. Push: the push rate is hard to match to the consumption rate. What if the Broker pushes faster than consumers can consume? And if it pushes too slowly, messages are not consumed in time.

It seems that choosing between pull and push is difficult.

Then some experts reworked the pull mode so that it wastes no bandwidth and the pull frequency adapts to the consumption rate!

Guess how?

It is actually very simple. The Consumer sends a pull request to the Broker. If the Broker has data, it returns the data, and the Consumer pulls again. If the Broker has no data, it does not return immediately but holds the request for a period of time (for example, 5s).

  1. If a message arrives during the wait, it is returned right away, and the Consumer pulls again.
  2. If the wait times out, the Broker returns anyway instead of holding the request indefinitely, and the Consumer pulls again.
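The essence of long polling fits in a few lines. This is a hypothetical, heavily simplified broker-side sketch for a single queue (LongPollingBroker is a made-up name, and RocketMQ's real broker code is considerably more involved): a blocking queue's timed poll plays the role of holding the request until a message arrives or the hold time expires.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical broker-side long polling for one queue (greatly simplified).
class LongPollingBroker {
    private final LinkedBlockingQueue<String> messages = new LinkedBlockingQueue<>();

    // Called when a producer's message is stored.
    void append(String message) {
        messages.add(message);
    }

    // Handles one pull request: returns at once if messages are available,
    // otherwise holds the request for up to holdMillis. An empty list means timeout.
    List<String> pull(int maxBatch, long holdMillis) {
        List<String> batch = new ArrayList<>();
        try {
            // Blocks until a message arrives or the hold time expires.
            String first = messages.poll(holdMillis, TimeUnit.MILLISECONDS);
            if (first == null) {
                return batch; // timed out: empty response, the consumer will pull again
            }
            batch.add(first);
            messages.drainTo(batch, maxBatch - 1); // grab whatever else is ready
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return batch;
    }
}
```

A consumer would call pull in a loop, for example with a batch size of 32 and a hold of 5000 ms, processing whatever comes back and pulling again immediately, whether the response was full or empty.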

By the way, this strategy is called long polling.

RocketMQ offers two consumption modes, pull and push, but push is implemented on top of long polling.

The consumption process in detail

What happens to messages after they are pulled?

The member variables of the PullRequest class are shown in the figure below.
Pulled messages are put into msgTreeMap, a field of the ProcessQueue that each PullRequest holds; the key is the message's offset and the value is the message itself.

There is also an important attribute, dropped, which is related to rebalancing. Rebalancing can cause messages to be consumed repeatedly; I won't analyze the specific mechanism here. See the column for

msgCount (the number of unconsumed messages) and msgSize (the total size of unconsumed messages) are used for flow control.
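A simplified, hypothetical per-queue cache built from the fields described above (msgTreeMap, msgCount, msgSize, dropped) shows why a sorted map is handy. With concurrent consumption, messages can finish out of order, so the offset that is safe to commit is the smallest offset still in the map; committing anything beyond it could skip a message after a restart. RocketMQ's ProcessQueue does something similar, but SimpleProcessQueue below is not its actual code, and size is counted in characters of a String body for simplicity.

```java
import java.util.TreeMap;

// Hypothetical, simplified per-queue cache; not RocketMQ's ProcessQueue.
class SimpleProcessQueue {
    // Pulled but not yet consumed messages: offset -> message body, sorted by offset.
    private final TreeMap<Long, String> msgTreeMap = new TreeMap<>();
    private long msgCount = 0;   // number of unconsumed messages (used for flow control)
    private long msgSize = 0;    // total size of unconsumed messages (used for flow control)
    private long maxOffset = -1; // highest offset cached so far
    volatile boolean dropped = false; // set during rebalancing when the queue is taken away

    // Called when a pull returns: cache the message and update the counters.
    synchronized void putMessage(long offset, String body) {
        if (msgTreeMap.put(offset, body) == null) { // count each offset only once
            msgCount++;
            msgSize += body.length();
        }
        maxOffset = Math.max(maxOffset, offset);
    }

    // Called after a message is consumed. Returns the offset that is safe to commit:
    // the smallest offset still in flight, or maxOffset + 1 when nothing is pending.
    synchronized long removeMessage(long offset) {
        String body = msgTreeMap.remove(offset);
        if (body != null) {
            msgCount--;
            msgSize -= body.length();
        }
        return msgTreeMap.isEmpty() ? maxOffset + 1 : msgTreeMap.firstKey();
    }

    synchronized long msgCount() { return msgCount; }

    synchronized long msgSize() { return msgSize; }
}
```

For example, with offsets 100, 101 and 102 cached, finishing 101 first still returns 100 as the commit offset; only after 100 is finished does it advance to 102, and once the map is empty it becomes 103.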

What is flow control?

Flow control means slowing down pulling when consumers consume slowly, as shown in the figure below. When a PullRequest is taken from the blocking queue, the consumer does not send a network request right away; it first checks whether any flow control rule is triggered, for example whether the number of unconsumed messages or their total size exceeds a threshold.
The next steps are to receive the response, process the messages, and put the PullRequest back into the blocking queue.
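The take, check, pull and requeue cycle described above might look like the following. It is a single-threaded, hypothetical sketch: PullLoopSketch, the fetch function and the two limits are illustrative assumptions (the real client has comparable settings such as pullThresholdForQueue and pullThresholdSizeForQueue), and the cache counters live on the request itself for brevity.

```java
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

// Hypothetical sketch of the consumer's pull loop with a flow-control check.
class PullLoopSketch {
    // Hypothetical limits on what may be cached per queue before pulling pauses.
    static final long MAX_CACHED_COUNT = 1000;
    static final long MAX_CACHED_BYTES = 100L * 1024 * 1024;

    static final class PullRequest {
        final String queue;
        long nextOffset;  // where the next pull starts
        long cachedCount; // unconsumed messages held for this queue
        long cachedBytes; // their total size
        PullRequest(String queue) { this.queue = queue; }
    }

    final LinkedBlockingQueue<PullRequest> pullRequestQueue = new LinkedBlockingQueue<>();
    int flowControlHits = 0;

    static boolean flowControlTriggered(PullRequest r) {
        return r.cachedCount > MAX_CACHED_COUNT || r.cachedBytes > MAX_CACHED_BYTES;
    }

    // One iteration: take a request, check flow control, "pull", and requeue.
    // fetch stands in for the network call and returns message bodies.
    boolean runOnce(Function<PullRequest, List<byte[]>> fetch) {
        PullRequest r = pullRequestQueue.poll();
        if (r == null) {
            return false; // nothing to do
        }
        if (flowControlTriggered(r)) {
            flowControlHits++;
            // A real client would retry after a short delay; here we just requeue.
            pullRequestQueue.offer(r);
            return true;
        }
        List<byte[]> bodies = fetch.apply(r);
        for (byte[] b : bodies) {
            r.cachedCount++;
            r.cachedBytes += b.length;
        }
        r.nextOffset += bodies.size();
        pullRequestQueue.offer(r); // put the request back for the next pull
        return true;
    }
}
```

Consumer threads would decrement cachedCount and cachedBytes as they finish messages, which is what eventually lifts the flow control for that queue.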

Did we miss a step? Doesn't the Consumer have to tell the Broker which messages it has consumed?

Hmm, do you think offsets are submitted synchronously? Not really; it happens asynchronously.

How does the consumer submit its offset?

When the consumer finishes consuming a message, it only stores the offset locally, and a scheduled task submits the offset to the Broker. Likewise, when the Broker receives a request to submit an offset, it only stores the offset in a map, and a scheduled task persists it to a file.
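The consumer side of this can be sketched as an in-memory map plus a timer. LocalOffsetStore, the commitToBroker callback and the interval are hypothetical; RocketMQ's client does something comparable in RemoteBrokerOffsetStore, driven by a periodic task. The Broker's map-plus-timer persistence to a file follows the same pattern.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BiConsumer;

// Hypothetical sketch of a consumer-side offset store with periodic commits.
class LocalOffsetStore {
    // queue ID -> latest offset to commit, kept only in memory
    private final Map<String, Long> offsets = new ConcurrentHashMap<>();
    private final BiConsumer<String, Long> commitToBroker; // stand-in for the network call
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    LocalOffsetStore(BiConsumer<String, Long> commitToBroker) {
        this.commitToBroker = commitToBroker;
    }

    // Called after consumption: only updates the local map, no network I/O.
    void updateOffset(String queue, long offset) {
        offsets.merge(queue, offset, Long::max); // never move an offset backwards
    }

    // Sends every locally stored offset to the broker.
    void persistAll() {
        offsets.forEach(commitToBroker);
    }

    // Starts the periodic commit (the interval is a hypothetical parameter).
    void start(long intervalMillis) {
        scheduler.scheduleAtFixedRate(this::persistAll, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
    }

    void shutdown() {
        scheduler.shutdown();
        persistAll(); // a clean shutdown flushes once more; a crash would skip this
    }
}
```

Note what updateOffset does not do: it never talks to the Broker. Anything recorded since the last persistAll exists only in this process.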

This can cause messages to be consumed repeatedly:

  1. After consuming messages, the consumer does not sync the offset to the Broker in real time; it first saves it in a local map, and a scheduled task submits it later. If the consumer goes down after consuming messages but before their offsets are submitted, those messages will be consumed again next time.
  2. Even if the offset reaches the Broker, the Broker may go down before persisting it. On restart, the Broker reads the offsets saved in consumerOffset.json, so the messages whose offsets were not yet persisted will be consumed again.
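To make case 1 concrete, here is a small hypothetical simulation (RedeliverySim is a made-up name): the consumer commits after every commitEvery messages, standing in for the timer, crashes partway, and restarts from the offset the Broker last received.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of case 1: the consumer crashes between commits.
class RedeliverySim {
    // Consumes `total` messages, committing the local offset to the broker after every
    // `commitEvery` messages (a stand-in for the timer), and crashes after `crashAfter`.
    // Returns the offsets processed across both runs, duplicates included.
    static List<Long> run(long total, long commitEvery, long crashAfter) {
        List<Long> processed = new ArrayList<>();
        long brokerOffset = 0; // what the broker knows
        long localOffset = 0;  // what the consumer knows (lost in the crash)
        // First run: consume until the crash.
        for (long off = 0; off < crashAfter; off++) {
            processed.add(off);
            localOffset = off + 1;
            if (localOffset % commitEvery == 0) {
                brokerOffset = localOffset; // the scheduled commit fires
            }
        }
        // Crash, then restart from the broker's offset.
        for (long off = brokerOffset; off < total; off++) {
            processed.add(off);
        }
        return processed;
    }
}
```

With total = 10, commitEvery = 4 and crashAfter = 7, offsets 0 to 6 are processed before the crash but only offset 4 was committed, so offsets 4, 5 and 6 are processed twice after the restart. Case 2 has the same shape, with the Broker's periodic write to consumerOffset.json playing the role of the commit.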



Origin: blog.csdn.net/zzti_erlie/article/details/123598615