RocketMQ's tags have this "pitfall" too!

> About the author: Hello everyone, I am the author of the book "RocketMQ Technology Insider", chief evangelist of the RocketMQ open source community, and maintainer of the public account "Middleware Interest Circle", which features 15 columns on Dubbo, Sentinel, Canal, ElasticJob, and other middleware.

RocketMQ provides a tag-based message filtering mechanism, but many people run into questions when using it. In the official RocketMQ DingTalk group, I remember the following questions coming up again and again:

Today I will share a few issues around RocketMQ tags that are worth paying attention to. If you find this article helpful after reading it, I would appreciate your likes and support.

  • Why are messages lost when the subscription relationships within a consumer group are inconsistent?
  • If a tag has only a small number of messages, will its consumer show high latency?

1. Inconsistent subscriptions within a consumer group lead to message loss

From the perspective of message consumption, a consumer group is a basic unit of physical isolation: each consumer group has its own consumption offset, its own consumer thread pool, and so on.

RocketMQ beginners are prone to this mistake: different consumers in the same consumer group subscribe to different tags of the same topic, which leads to message loss (some messages are never consumed). To reason about this problem, let's first look at a diagram:

Briefly, its core points are:

  1. For example, a topic has a total of 4 queues.

  2. The producer sends 4 messages with tagA in a row, then 4 messages with tagB in a row. Because the producer uses round-robin load balancing by default, every queue of the topic ends up holding messages of both tagA and tagB.

  3. In consumer group dw_tag_test, the consumer at IP 192.168.3.10 subscribes to tagA, while another consumer in the same group at IP 192.168.3.11 subscribes to tagB.

  4. Before consuming, the consumers in the group first perform queue rebalancing, which allocates queues evenly by default. The allocation result is as follows:

    • 192.168.3.10 is assigned to q0, q1.

    • 192.168.3.11 is assigned to q2, q3.

  5. Each consumer then initiates pull requests to the Broker. The consumer at 192.168.3.10 subscribes only to tagA, so the tagB messages in q0 and q1 are filtered out during the pull; however, those filtered tagB messages are never re-delivered to the consumer that subscribes to tagB. That part of the traffic is simply never delivered, i.e. the messages are lost.

  6. Likewise, the consumer at 192.168.3.11 subscribes only to tagB, so the tagA messages in q2 and q3 are filtered out and never delivered to the consumer subscribed to tagA; that part of the traffic is lost as well.
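
To make the misconfiguration concrete, here is a minimal sketch (the topic name, NameServer address, and class name are placeholders of mine): two application instances use the same consumer group name but subscribe to different tags of the same topic.

```java
import java.util.List;

import org.apache.rocketmq.client.consumer.DefaultMQPushConsumer;
import org.apache.rocketmq.client.consumer.listener.ConsumeConcurrentlyContext;
import org.apache.rocketmq.client.consumer.listener.ConsumeConcurrentlyStatus;
import org.apache.rocketmq.client.consumer.listener.MessageListenerConcurrently;
import org.apache.rocketmq.common.message.MessageExt;

public class InconsistentSubscriptionDemo {

    // Instance deployed on 192.168.3.10: subscribes only to tagA.
    static DefaultMQPushConsumer consumerOnHostA() throws Exception {
        DefaultMQPushConsumer consumer = new DefaultMQPushConsumer("dw_tag_test");
        consumer.setNamesrvAddr("127.0.0.1:9876");    // placeholder NameServer address
        consumer.subscribe("tag_test_topic", "tagA"); // placeholder topic
        consumer.registerMessageListener((MessageListenerConcurrently)
            (List<MessageExt> msgs, ConsumeConcurrentlyContext ctx) -> ConsumeConcurrentlyStatus.CONSUME_SUCCESS);
        consumer.start();
        return consumer;
    }

    // Instance deployed on 192.168.3.11: SAME group name, but subscribes only to tagB.
    // This is the mistake: subscriptions within one consumer group must be identical;
    // otherwise the "other" tag's messages in the queues assigned to each instance are
    // filtered out and never delivered anywhere.
    static DefaultMQPushConsumer consumerOnHostB() throws Exception {
        DefaultMQPushConsumer consumer = new DefaultMQPushConsumer("dw_tag_test");
        consumer.setNamesrvAddr("127.0.0.1:9876");
        consumer.subscribe("tag_test_topic", "tagB");
        consumer.registerMessageListener((MessageListenerConcurrently)
            (List<MessageExt> msgs, ConsumeConcurrentlyContext ctx) -> ConsumeConcurrentlyStatus.CONSUME_SUCCESS);
        consumer.start();
        return consumer;
    }
}
```

The fix is to make every instance of dw_tag_test subscribe to the same expression (for example "tagA || tagB") and branch by tag inside the listener if needed, or to use separate consumer groups.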

2. If a tag has only a small number of messages, will its consumer show high latency?

As mentioned at the beginning of this article, some group members raised exactly this concern; the scenario looks roughly like the following figure:

After the consumer consumes the tag1 message at offset=100, 10 million consecutive non-tag1 messages follow. Will the backlog of this consumer group keep growing, all the way up to 10 million?

To understand this problem, we should focus on the source code of at least the following two areas:

  • The message pull process
  • The offset commit mechanism

> This article does not attempt a full walkthrough of this part of the source code. If you are interested in the details, you can refer to the author's book "RocketMQ Technology Insider".

Instead, this article is problem-oriented: I reason through the question, locate the key source code to confirm the reasoning, and finally verify it with a simple code sample.

Before diving into the source code, it is worth thinking the problem through ourselves: if we had to implement this feature, how would we approach it?

Suppose the consumer group has consumed the message at offset=100 and the next 10 million messages will all be filtered out. If we still want the consume offset to keep being committed, how should this be designed? I think there are at least the following key points:

  • During a single pull, 10 million consecutive messages contain no suitable message. What does the server do?
  • How is the offset committed when a pull request brings back no messages?

2.1 Key Designs in the Message Pulling Process

The client pulls messages from the server, but 10 million consecutive messages fail the filter. Scanning and filtering that many messages in a single pull would be very time-consuming, and the client cannot wait that long, so the server must have an exit condition that stops the search and returns NO_MESSAGE to the client. How long, then, will the client wait for a pull?

Core key point 1: the client sets a timeout when it initiates a message pull request to the server. The code is as follows:
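
Below is a minimal sketch of the two timeout values as they appear in the open-source Apache RocketMQ client's DefaultMQPushConsumerImpl, with the pull call shown only in comments (simplified; exact code may differ between versions):

```java
// Simplified excerpt, based on the open-source Apache RocketMQ client (may vary by version).
public class DefaultMQPushConsumerImpl {

    // Maximum time a pull request may stay suspended (long-polled) on the broker side: 15s.
    private static final long BROKER_SUSPEND_MAX_TIME_MILLIS = 1000 * 15;

    // Overall RPC timeout for a single pull request: 30s.
    private static final long CONSUMER_TIMEOUT_MILLIS_WHEN_SUSPEND = 1000 * 30;

    // In pullMessage(...), these constants are passed to the pull API as the
    // brokerSuspendMaxTimeMillis and timeoutMillis parameters, roughly:
    //
    //   this.pullAPIWrapper.pullKernelImpl(
    //       pullRequest.getMessageQueue(), subExpression, /* ... */,
    //       BROKER_SUSPEND_MAX_TIME_MILLIS,        // brokerSuspendMaxTimeMillis
    //       CONSUMER_TIMEOUT_MILLIS_WHEN_SUSPEND,  // timeoutMillis
    //       CommunicationMode.ASYNC, pullCallback);
}
```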

The two timeout-related variables have different meanings:

  • long brokerSuspendMaxTimeMillis: the maximum time the request may be suspended (long-polled) on the broker side when no matching message is currently available. The default is 15s; customization is not supported at the moment.
  • long timeoutMillis: the overall timeout of the pull request. The default is 30s; customization is not supported at the moment.

In other words, a single message pull times out after at most 30s.

Core key point 2: when processing a pull request, the Broker defines clear exit conditions, implemented in the getMessage method of DefaultMessageStore. The specific code is as follows:
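
The real method is long, so the following self-contained illustration captures only the exit conditions discussed here (the names maxFilterMessageCount and isTheBatchFull come from DefaultMessageStore, but this is my own illustration rather than the actual store code):

```java
import java.util.List;

// Illustration of the broker-side exit conditions during a pull; not the actual DefaultMessageStore code.
public class PullScanIllustration {

    // One ConsumeQueue entry occupies 20 bytes (ConsumeQueue.CQ_STORE_UNIT_SIZE).
    static final int CQ_STORE_UNIT_SIZE = 20;

    /**
     * Scans the consume queue starting at startOffset and returns the next begin offset.
     * tagHashes simulates the tag hash codes stored in the ConsumeQueue entries.
     */
    static long scan(int maxMsgNums, long startOffset, List<Long> tagHashes, long subscribedTagHash) {
        // Exit condition 2: at most max(16000, maxMsgNums * 20) index bytes are scanned per pull.
        final int maxFilterMessageCount = Math.max(16000, maxMsgNums * CQ_STORE_UNIT_SIZE);

        int found = 0;
        int i = 0;
        for (; i < tagHashes.size() * CQ_STORE_UNIT_SIZE && i < maxFilterMessageCount; i += CQ_STORE_UNIT_SIZE) {
            // Exit condition 1: stop once the requested batch is full (cf. isTheBatchFull).
            if (found >= maxMsgNums) {
                break;
            }
            // Entries whose tag hash does not match are skipped,
            // but they still consume the scan budget above.
            if (tagHashes.get(i / CQ_STORE_UNIT_SIZE) != subscribedTagHash) {
                continue;
            }
            found++;
        }
        // Whatever happened, the next pull offset moves past everything that was scanned,
        // including the filtered entries, so the consume offset keeps advancing.
        return startOffset + (i / CQ_STORE_UNIT_SIZE);
    }
}
```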

Core points:

  • First, when the client initiates a pull it passes in the number of messages it expects to pull this time, which corresponds to maxMsgNums in the code above. Once that many messages have been found (interested readers can look at the isTheBatchFull method), the search exits normally.
  • Another critical exit condition is the maximum number of index bytes the server may scan during one pull, i.e. the number of ConsumeQueue bytes scanned per pull: it is the larger of 16000 and the expected pull count multiplied by 20, because one ConsumeQueue entry occupies 20 bytes.
  • The server also has a long-polling mechanism: if the allowed bytes have been scanned but no message was found, the request is suspended on the broker side for a while; if a new message that matches the filter conditions arrives, the request is woken up and the message is returned to the client.

Coming back to the question: even if the server holds 10 million consecutive non-tag1 messages, a single pull request will not try to screen them all at once; it returns within the scan limit, so the client will not time out.

This dispels the first concern: the server does not block indefinitely when no matching message is found. The remaining key to whether a backlog builds up is how the offset is committed.

2.2 Offset Commit Mechanism

2.2.1 Offset Commit When Suitable Messages Are Pulled

After the pull thread receives the pull result from the server, it submits the messages to the consumer group's consumption thread pool; this happens when the pull callback in DefaultMQPushConsumerImpl processes the result. The specific code is as follows:
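
As a reference, this is roughly what the FOUND branch of that callback does (a simplified paraphrase of the open-source client, with most details omitted):

```java
// Simplified paraphrase of the PullCallback FOUND branch in DefaultMQPushConsumerImpl#pullMessage.
case FOUND:
    // Put the pulled messages into the process queue (the TreeMap discussed below),
    // then hand them over to the consumer group's consume thread pool.
    boolean dispatchToConsume = processQueue.putMessage(pullResult.getMsgFoundList());
    DefaultMQPushConsumerImpl.this.consumeMessageService.submitConsumeRequest(
        pullResult.getMsgFoundList(),
        processQueue,
        pullRequest.getMessageQueue(),
        dispatchToConsume);
    // The pull request is then re-submitted so that the next pull follows immediately
    // (or after the configured pull interval).
    break;
```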

As is well known, RocketMQ commits the offset after a message has been consumed successfully. The code lives in ConsumeMessageConcurrentlyService, as shown below:
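
Here is a simplified paraphrase of the tail end of ConsumeMessageConcurrentlyService#processConsumeResult, where the offset is updated after a batch has been processed (based on the open-source client; details may vary by version):

```java
// Simplified paraphrase of the end of ConsumeMessageConcurrentlyService#processConsumeResult.
// removeMessage() returns the smallest offset still being processed in this queue,
// or a value past the last pulled message when nothing is in flight.
long offset = consumeRequest.getProcessQueue().removeMessage(consumeRequest.getMsgs());
if (offset >= 0 && !consumeRequest.getProcessQueue().isDropped()) {
    // Only the in-memory offset cache is updated here; a scheduled task persists it to the Broker.
    this.defaultMQPushConsumerImpl.getOffsetStore().updateOffset(
        consumeRequest.getMessageQueue(), offset, true);
}
```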

Here are the core takeaways:

  • After the consumer successfully finishes consuming a batch of messages, a minimum-offset commit mechanism is used to ensure that no message is lost.

  • The minimum-offset commit mechanism puts the pulled messages into a TreeMap; after a consumer thread successfully consumes a message, the message is removed from the TreeMap and the committable offset is computed:

    • If messages are still being processed in the TreeMap, the offset of its first entry (the minimum offset) is returned.
    • If no message is being processed in the TreeMap, the returned offset is this.queueOffsetMax, where queueOffsetMax is the maximum offset pulled so far from the current consume queue, because at that point every pulled message has been consumed.
  • Finally, the updateOffset method is called to update the local offset cache (which a scheduled task persists periodically).
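
To make the minimum-offset rule concrete, here is a minimal, self-contained illustration of the idea (an illustration only, not the actual ProcessQueue code; the class and method names are mine):

```java
import java.util.List;
import java.util.TreeMap;

// Illustration of the minimum-offset commit rule described above; not the actual ProcessQueue code.
public class MinOffsetTracker {

    private final TreeMap<Long, Object> inFlight = new TreeMap<>(); // offset -> message
    private long queueOffsetMax = -1;                               // highest offset pulled so far

    // Called when a batch of messages is pulled from the broker.
    public synchronized void putMessages(List<Long> offsets) {
        for (Long offset : offsets) {
            inFlight.put(offset, new Object());
            queueOffsetMax = Math.max(queueOffsetMax, offset);
        }
    }

    // Called after a batch has been consumed successfully; returns the offset that is safe to commit.
    public synchronized long removeMessages(List<Long> offsets) {
        for (Long offset : offsets) {
            inFlight.remove(offset);
        }
        // If messages are still in flight, only the smallest outstanding offset may be committed;
        // otherwise everything pulled so far has been consumed.
        return inFlight.isEmpty() ? queueOffsetMax : inFlight.firstKey();
    }
}
```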

2.2.2 Offset Commit When No Suitable Messages Are Pulled

If the client does not pull any suitable messages, for example because all of them were filtered out by the tag, the handling is likewise in the pull callback of DefaultMQPushConsumerImpl, as shown below:

The key code is in correctTagsOffset; see below:
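
The method is only a few lines; here is a simplified sketch of what DefaultMQPushConsumerImpl#correctTagsOffset does (paraphrased from the open-source client):

```java
// Simplified paraphrase of DefaultMQPushConsumerImpl#correctTagsOffset.
private void correctTagsOffset(final PullRequest pullRequest) {
    // If nothing previously pulled for this queue is still waiting to be consumed,
    // move the local offset straight to the next pull offset returned by the broker,
    // even though every message in this pull was filtered out by the tag.
    if (0L == pullRequest.getProcessQueue().getMsgCount().get()) {
        this.offsetStore.updateOffset(pullRequest.getMessageQueue(), pullRequest.getNextOffset(), true);
    }
}
```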

Core point: if the number of messages in the processing queue is 0 at that moment, the next pull offset is used as the offset to commit, and that value is driven forward by the server during message lookup; the code is in the getMessage method of DefaultMessageStore mentioned earlier.

So even if all the messages are filtered out, the offset still moves forward and no large backlog builds up.

2.2.3 Offset Commit Piggybacked on the Pull Request

In RocketMQ, when the client commits an offset it first stores it in a local cache, and a scheduled task then periodically reports the cached offsets to the Broker in one batch. Beyond that, there is another, less obvious offset commit mechanism:

When a message pull is issued, if there is offset information in the local cache, a system flag named FLAG_COMMIT_OFFSET is set on the request, which triggers an offset commit on the server side. The specific code is as follows:
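
A simplified sketch of how the flag is built in DefaultMQPushConsumerImpl#pullMessage (paraphrased from the open-source client; exact code may differ by version):

```java
// Simplified paraphrase of the sysFlag construction in DefaultMQPushConsumerImpl#pullMessage.
boolean commitOffsetEnable = false;
long commitOffsetValue = 0L;
if (MessageModel.CLUSTERING == this.defaultMQPushConsumer.getMessageModel()) {
    // Read the locally cached offset; if one exists, piggyback it on the pull request.
    commitOffsetValue = this.offsetStore.readOffset(
        pullRequest.getMessageQueue(), ReadOffsetType.READ_FROM_MEMORY);
    if (commitOffsetValue > 0) {
        commitOffsetEnable = true;
    }
}

// commitOffsetEnable=true sets FLAG_COMMIT_OFFSET in the sysFlag, so the Broker
// also persists the consumer's offset while serving the pull request.
int sysFlag = PullSysFlag.buildSysFlag(
    commitOffsetEnable,     // commitOffset
    true,                   // suspend (long polling)
    subExpression != null,  // subscription expression present
    classFilter             // class filter
);
```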

2.2.4 Summary and Verification

In summary, using tags does not cause a large backlog simply because only a small fraction of messages carry the subscribed tag.

To verify this, I ran a simple test: a producer keeps sending messages with tag B to a given topic while the consumer subscribes only to tag A; the consumer shows no consumption backlog. The test code is shown below:
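
A minimal sketch of such a verification (topic name, group names, and NameServer address are placeholders):

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

import org.apache.rocketmq.client.consumer.DefaultMQPushConsumer;
import org.apache.rocketmq.client.consumer.listener.ConsumeConcurrentlyContext;
import org.apache.rocketmq.client.consumer.listener.ConsumeConcurrentlyStatus;
import org.apache.rocketmq.client.consumer.listener.MessageListenerConcurrently;
import org.apache.rocketmq.client.producer.DefaultMQProducer;
import org.apache.rocketmq.common.message.Message;
import org.apache.rocketmq.common.message.MessageExt;

public class TagBacklogVerification {

    public static void main(String[] args) throws Exception {
        // The consumer in group dw_tag_test subscribes only to tagA.
        DefaultMQPushConsumer consumer = new DefaultMQPushConsumer("dw_tag_test");
        consumer.setNamesrvAddr("127.0.0.1:9876");    // placeholder NameServer address
        consumer.subscribe("tag_test_topic", "tagA"); // placeholder topic
        consumer.registerMessageListener((MessageListenerConcurrently)
            (List<MessageExt> msgs, ConsumeConcurrentlyContext ctx) -> {
                msgs.forEach(m -> System.out.println("consumed: " + m.getMsgId()));
                return ConsumeConcurrentlyStatus.CONSUME_SUCCESS;
            });
        consumer.start();

        // The producer only sends messages carrying tagB.
        DefaultMQProducer producer = new DefaultMQProducer("tag_test_producer");
        producer.setNamesrvAddr("127.0.0.1:9876");
        producer.start();
        for (int i = 0; i < 1000; i++) {
            producer.send(new Message("tag_test_topic", "tagB",
                ("msg-" + i).getBytes(StandardCharsets.UTF_8)));
        }
    }
}
```

The backlog of dw_tag_test can then be checked in the RocketMQ console or with the mqadmin consumerProgress command; it stays near zero even though none of the tagB messages are delivered to the consumer.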

The backlog of the consumer group can then be checked, as shown in the figure below:

Article first published: https://www.codingw.net/Article?id=759

One last word (please follow me, don't just read and run)

If this article is helpful or enlightening to you, please give it a thumbs up.

Mastering one or two mainstream Java middleware technologies is a must-have skill for getting into BAT and other big companies. I will share a Java middleware learning roadmap to help you level up your career.

A ladder for Java advancement: growth routes and learning materials to help you break into the middleware field.
