How RocketMQ handles consumption-failure messages

In some idle time I studied how RocketMQ handles messages that fail to be consumed, and I'm recording it here. To be precise, this only covers Push mode (which, under the hood, is still implemented on top of Pull) with non-ordered consumption. Pull mode and ordered messages are not discussed for now~ (haven't researched them yet - -)

Consumption failure processing logic

  • If consumption succeeds, RocketMQ marks the message as processed by moving the consume offset forward.
  • For messages that fail in the business logic, the strategy is to send them back to the broker (where they are stored under a %RETRY%XX topic). The moment you hear this, you can guess the next question: what happens if the broker is down?
  • If the send-back fails, a task is started on the local machine to retry it. Well then, what if the consumer machine dies and that local retry dies with it, is the message lost?
  • Therefore, to guarantee that a message which failed consumption and whose send-back also failed is not lost when the consumer is powered off, the code ensures that the offset in the OffsetStore (local or remote) never moves past the offset of that send-back-failed message. This guarantees that the next time the consumer comes alive (at that point the offset must be read from the OffsetStore, although in normal operation it is not read on every pull, except for ordered messages...), it will definitely re-pull the failed message (more on this later). The price is that many messages already consumed last time get pulled again, but business code is supposed to be idempotent, so we get away with it~
  • For messages that fail consumption but are sent back successfully, the offset is updated directly, pretending those messages were consumed successfully, because they have been reincarnated in the %RETRY%XX topic as new messages waiting to be consumed~ the current consumer can move on to other things. (Supplement: the RETRY topic actually involves a delay, so a message really goes into SCHEDULE_TOPIC_XXXX first and only then into %RETRY%XX; see other people's analyses of delayed messages for details~)

As you can see, the send-back mechanism never loses a message. Even if the broker dies and the consumer dies, the message will definitely get consumed in the end, although possibly along with a lot of unwanted duplicates - - A toy sketch of the decision flow follows.
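To make the flow concrete, here is a minimal sketch of that decision tree. This is my own illustration, not RocketMQ's actual code: the names SendBackSketch, markDone, sendBackToBroker, and scheduleLocalRetry are all invented.

```java
// Toy model of the send-back decision flow described above; all names invented.
public class SendBackSketch {

    public void onConsumeResult(long offset, boolean success) {
        if (success) {
            markDone(offset);                  // consumed: the consume offset may move past it
        } else if (sendBackToBroker(offset)) { // failed: try to park it in %RETRY%XX
            markDone(offset);                  // reborn on the broker, so locally it counts as done
        } else {
            scheduleLocalRetry(offset);        // broker down: pin the offset here and retry later
        }
    }

    private void markDone(long offset)            { /* remove from the in-flight set */ }
    private boolean sendBackToBroker(long offset) { /* RPC to the broker */ return true; }
    private void scheduleLocalRetry(long offset)  { /* timer task on this machine */ }
}
```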

Why do it this way?

The reason for writing this article is that a few friends in my group found this design rather strange: why do it like this~? Below is my personal understanding after digging into it.... (Actually I've only just started studying RocketMQ, so some of this may be wrong. Discussion and corrections welcome hahaha)

Let's calmly look at the characteristics of this message consumption scenario:

  • We are given a Queue, and every access to it goes through a network hop
  • To let consumers get a large number of messages as quickly as possible (combined with the previous article), it is better for a consumer to fetch a batch of messages at a time rather than one by one
  • Because ordering doesn't matter, the consumer should be able to process that batch concurrently
  • Because messages are consumed concurrently, rather than waiting for the previous one to finish before starting the next, the order of acks becomes a problem
  • For server-side acks to be efficient, what should we track: one status per record, or just a single successfully-consumed offset?
  • Going further, must we wait for the previous batch to finish before fetching the next? Not really: as long as the consumer has an idle thread, it might as well grab the second batch first. Even if some messages in the first batch are slow, in most cases they recover after a while, and meanwhile other threads get a head start on the second batch. So pulling and processing should be separated
  • And, as the previous article showed, the pull speed should be throttled by the consumer's processing capacity~ If the consumer has spare time, grab from the Queue as fast as you like, as long as you don't ack what hasn't been processed successfully; if the consumer is seriously lagging and can't keep up, slow the pulling down (sketched below)
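Here is a minimal sketch of that separation of pulling and processing, under my own invented names (PullProcessSketch, pullOnce, fetchBatchFromBroker, and the thresholds are all illustrative): one pull loop keeps fetching batches while a worker pool consumes them, and pulling backs off when too many messages are still unacked.

```java
import java.util.List;
import java.util.concurrent.*;

// Illustrative only: a pull loop decoupled from processing, with crude flow control.
public class PullProcessSketch {
    private final ExecutorService workers = Executors.newFixedThreadPool(4);
    private final ScheduledExecutorService pullLoop = Executors.newSingleThreadScheduledExecutor();
    private final Semaphore inFlight = new Semaphore(1000); // cap on unacked messages

    public void startPulling() {
        pullLoop.execute(this::pullOnce);
    }

    private void pullOnce() {
        if (inFlight.availablePermits() < 32) {
            // Consumer is too far behind: wait a while before the next pull,
            // like re-queueing a pull request with a delay.
            pullLoop.schedule(this::pullOnce, 50, TimeUnit.MILLISECONDS);
            return;
        }
        List<String> batch = fetchBatchFromBroker(32);
        if (batch.isEmpty()) {
            pullLoop.schedule(this::pullOnce, 100, TimeUnit.MILLISECONDS); // nothing new yet
            return;
        }
        for (String msg : batch) {
            inFlight.acquireUninterruptibly();
            workers.execute(() -> { consume(msg); inFlight.release(); });
        }
        pullLoop.execute(this::pullOnce); // immediately go for the next batch
    }

    private List<String> fetchBatchFromBroker(int max) { return List.of(); } // stub
    private void consume(String msg) { /* business logic */ }
}
```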

The complete mechanism

I find RocketMQ's code for this part quite clever... The core participating classes:

  • ConsumeQueue: a Queue seen from the consumer's perspective (somewhat like a partition in Kafka: a Queue can only be consumed by one consumer, although one consumer may consume multiple Queues)
  • PullRequest: for each ConsumeQueue allocated to the current consumer, a PullRequest is created. It records nextOffset, the pull offset from which to fetch from the server; keeping the pull offset and the consume offset separate is what allows the second batch to be pulled before the first batch is fully consumed. The PullRequest is created in the RebalanceService, and then, with its nextOffset updated each time, it re-enters the PullRequestQueue again and again, producing a continuous pull loop - - (it can also be re-enqueued with a delay, or held back for a while, to control the rate)
  • PullRequestQueue: an in-memory queue that serves as the input to PullMessageService
  • PullMessageService: the pull thread, responsible for fetching messages. It keeps reading the PullRequestQueue, pulls messages according to each request, puts the messages into the ProcessQueue, wraps them in a ConsumeRequest and submits it to the ConsumeService for processing, then generates the PullRequest for the next batch and throws it back into the PullRequestQueue, so the next batch gets consumed and the continuous pull loop takes effect
  • ConsumeRequest: despite being called a request, besides the message data to be consumed it also carries the actual consumption logic (it is a Runnable - -). The key pieces are the list of messages to process and the logic that processes them: run() invokes the user-registered listener, sends failed messages back to the broker, and updates the ProcessQueue and OffsetStore according to the consumption results and the send-back results
  • ProcessQueue: another in-memory queue, implemented as a TreeMap. A message in processing is removed from this queue if it is processed successfully, or if it fails but the send-back succeeds. The consume offset is reported based on the smallest offset remaining in the ProcessQueue (so messages that failed and whose send-back also failed are never removed and hold the offset back). In addition, when the gap between the largest and smallest offsets in the ProcessQueue grows too large (MaxSpan), the PullMessageService slows down and waits a while before fetching again (it waits before throwing the next PullRequest into the PullRequestQueue). See the sketch after this list
  • ConsumeService: an executor for ConsumeRequests; think of it as a thread pool
  • OffsetStore: maintains the consume offset (that is, the offset before which everything has been fully processed)
  • RebalanceService: responsible for allocating Queues to consumers; for the flow discussed here, its role is to create the first PullRequest for each allocated Queue and put it into the PullRequestQueue, with the offset read from the OffsetStore
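The ProcessQueue is the heart of the trick, so here is a stripped-down imitation of it (mine, not the real class; the MAX_SPAN value is arbitrary), showing how a single TreeMap yields both the committable consume offset and the MaxSpan flow-control check:

```java
import java.util.TreeMap;

// Stripped-down imitation of ProcessQueue; field and method names are mine.
public class ProcessQueueSketch {
    private final TreeMap<Long, Object> msgs = new TreeMap<>(); // offset -> message
    private static final long MAX_SPAN = 2000;

    public synchronized void put(long offset, Object msg) { msgs.put(offset, msg); }

    // Called on success, or on failure once the send-back has succeeded.
    public synchronized void remove(long offset) { msgs.remove(offset); }

    // The consume offset reported to OffsetStore: nothing before the smallest
    // remaining offset is still pending, so it is safe to commit. A crash at
    // any point only replays duplicates, never loses a message.
    public synchronized long committableOffset(long nextPullOffset) {
        return msgs.isEmpty() ? nextPullOffset : msgs.firstKey();
    }

    // Flow control: if the oldest pending message is too far behind the newest,
    // PullMessageService should delay the next pull for a while.
    public synchronized boolean exceedsMaxSpan() {
        return !msgs.isEmpty() && msgs.lastKey() - msgs.firstKey() > MAX_SPAN;
    }
}
```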

I drew a simple picture to illustrate the relationships between the classes above~ (drawn with a finger on an iPad, no stylus, so it's particularly ugly - - it'll do for now)

[Figure: Notes - Page 1.png — relationships between the classes described above]

The details of the failure retry didn't make it into the drawing = = it's not easy to draw. . Read the code together with the description above~--
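To make up for the drawing, here is a rough imitation of the continuous pull cycle (again with invented names): PullMessageService blocks on the in-memory PullRequestQueue, and each completed pull re-enqueues the next PullRequest, so the loop never ends.

```java
import java.util.concurrent.LinkedBlockingQueue;

// Rough imitation of the PullMessageService loop; all names are illustrative.
public class PullCycleSketch {
    static class PullRequest { long nextOffset; /* plus queue identity etc. */ }

    private final LinkedBlockingQueue<PullRequest> pullRequestQueue = new LinkedBlockingQueue<>();

    public void runLoop() throws InterruptedException {
        while (true) {
            PullRequest req = pullRequestQueue.take(); // blocks until RebalanceService
                                                       // or a previous pull enqueues one
            long pulled = pullFromBroker(req.nextOffset); // fetch a batch, put it in the
                                                          // ProcessQueue, submit a ConsumeRequest
            req.nextOffset += pulled;                  // pull offset advances independently
                                                       // of the consume offset
            pullRequestQueue.put(req);                 // re-enqueue -> endless pull cycle
        }
    }

    private long pullFromBroker(long offset) { return 32; } // stub: number of messages pulled
}
```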

Final effect

  • Messages are pulled from the server in batches
  • The pull thread can start pulling the next batch without waiting for the previous batch to finish. As long as the ProcessQueue does not exceed MaxSpan (that is, no message has been stuck for too long), pulling continues
  • Consumption listeners run concurrently, each returning its own completion status. A message that gets stuck for a while does not block the consumption of the other messages
  • The committed consume offset is guaranteed to only cover messages that were either processed successfully, or failed but were successfully sent back
  • If the send-back fails, it is retried locally and the remote offset does not move forward
  • If a queue is restarted or newly allocated, the initial offset is read from the OffsetStore, so there may be unnecessary duplicate messages; message processing therefore needs to be idempotent (see the usage example below)
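From the application side, all of this machinery is hidden behind the push-consumer API; returning RECONSUME_LATER from the listener is what triggers the send-back path described above. A minimal example against the Apache RocketMQ client (the group, topic, and name-server address are placeholders; older client versions used com.alibaba.rocketmq package names):

```java
import org.apache.rocketmq.client.consumer.DefaultMQPushConsumer;
import org.apache.rocketmq.client.consumer.listener.ConsumeConcurrentlyStatus;
import org.apache.rocketmq.client.consumer.listener.MessageListenerConcurrently;
import org.apache.rocketmq.common.message.MessageExt;

public class IdempotentConsumer {
    public static void main(String[] args) throws Exception {
        DefaultMQPushConsumer consumer = new DefaultMQPushConsumer("demo_group");
        consumer.setNamesrvAddr("127.0.0.1:9876"); // placeholder name server
        consumer.subscribe("DemoTopic", "*");
        consumer.registerMessageListener((MessageListenerConcurrently) (msgs, ctx) -> {
            for (MessageExt msg : msgs) {
                try {
                    handleIdempotently(msg); // must tolerate duplicates, as argued above
                } catch (Exception e) {
                    // Any failure: ask for the messages to be sent back and retried later.
                    return ConsumeConcurrentlyStatus.RECONSUME_LATER;
                }
            }
            return ConsumeConcurrentlyStatus.CONSUME_SUCCESS;
        });
        consumer.start();
    }

    private static void handleIdempotently(MessageExt msg) { /* business logic */ }
}
```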

Summary

  • Batching, with pulling separated from processing, improves efficiency while guaranteeing no data loss, at the cost of duplicate messages
  • Ack efficiency is improved by recording only offsets (a whole batch can be acked at once, with no need to record per-record status)
  • By separating the pull offset (in PullRequest) from the consume offset (in OffsetStore), pull progress is decoupled from processing progress, which improves pull efficiency; the pull rate is throttled according to how stuck the consumer's processing is
  • Sending failed messages back accelerates acking the consume offset for the other, successfully processed messages in the batch. Without send-back, the consume offset couldn't move forward and a restart would replay even more messages; with it, the offset can keep advancing (after all, we don't care about order here, only efficiency)
  • When the send-back itself fails, the consume offset cannot move forward. . . Local retry then at least covers the case where the consumer doesn't go down



Author: lysu
Link: https://www.jianshu.com/p/4bbf8ed23af4
Source: Jianshu
The copyright belongs to the author. For commercial reprints, please contact the author for authorization; for non-commercial reprints, please indicate the source.
