Questions
- When consuming, there is a batch of messages. If one of the messages fails to be consumed, will all the messages be retried?
- Can users control the number of retries and the retry interval?
- When consuming messages in batches, can I control the starting offset of retry? For example, if there are 10 messages and the 5th message fails, then only the 5th message and all subsequent messages will be retried.
- How are retried messages re-consumed?
- If the broker's write permission is turned off, will it have any impact on the retry of message consumption?
- What happens if a Topic is consumed sequentially or concurrently by different consumers of the same ConsumerGroup?
For more details, please see: RocketMQ concurrent consumption failure retry mechanism
Concurrent consumption
Trigger time
After the consumption is completed, the consumer needs to process the result of the consumption, whether it is success or failure.
ConsumeMessageConcurrentlyService#processConsumeResult
/**
* 石臻臻的杂货铺
* vx: shiyanzu001
**/
public void processConsumeResult(
    final ConsumeConcurrentlyStatus status,
    final ConsumeConcurrentlyContext context,
    final ConsumeRequest consumeRequest
) {
    int ackIndex = context.getAckIndex();
    if (consumeRequest.getMsgs().isEmpty())
        return;
    switch (status) {
        case CONSUME_SUCCESS:
            if (ackIndex >= consumeRequest.getMsgs().size()) {
                ackIndex = consumeRequest.getMsgs().size() - 1;
            }
            // Even when you return CONSUME_SUCCESS, you can still set ackIndex to mark the index
            // from which consumption failed; those messages are counted in the failed-consumption TPS metric.
            int ok = ackIndex + 1;
            int failed = consumeRequest.getMsgs().size() - ok;
            this.getConsumerStatsManager().incConsumeOKTPS(consumerGroup, consumeRequest.getMessageQueue().getTopic(), ok);
            this.getConsumerStatsManager().incConsumeFailedTPS(consumerGroup, consumeRequest.getMessageQueue().getTopic(), failed);
            break;
        case RECONSUME_LATER:
            ackIndex = -1;
            this.getConsumerStatsManager().incConsumeFailedTPS(consumerGroup, consumeRequest.getMessageQueue().getTopic(),
                consumeRequest.getMsgs().size());
            break;
        default:
            break;
    }
    List<MessageExt> msgBackFailed = new ArrayList<>(consumeRequest.getMsgs().size());
    for (int i = ackIndex + 1; i < consumeRequest.getMsgs().size(); i++) {
        MessageExt msg = consumeRequest.getMsgs().get(i);
        // Maybe message is expired and cleaned, just ignore it.
        if (!consumeRequest.getProcessQueue().containsMessage(msg)) {
            log.info("Message is not found in its process queue; skip send-back-procedure, topic={}, "
                + "brokerName={}, queueId={}, queueOffset={}", msg.getTopic(), msg.getBrokerName(),
                msg.getQueueId(), msg.getQueueOffset());
            continue;
        }
        boolean result = this.sendMessageBack(msg, context);
        if (!result) {
            msg.setReconsumeTimes(msg.getReconsumeTimes() + 1);
            msgBackFailed.add(msg);
        }
    }
    if (!msgBackFailed.isEmpty()) {
        consumeRequest.getMsgs().removeAll(msgBackFailed);
        this.submitConsumeRequestLater(msgBackFailed, consumeRequest.getProcessQueue(), consumeRequest.getMessageQueue());
    }
    // ..... part of the code is omitted ....
}
Part of the code is omitted above. The code shown mainly sends failed messages back to the Broker.
Reading the code, the logic is as follows:
- If the result is CONSUME_SUCCESS, the monitoring metrics for successful and failed consumption TPS are recorded. The user can set the ACK index with context.setAckIndex(). For example, with a batch of 10 messages and an ackIndex of 4, the first 5 messages count as successful and the last 5 as failed; the loop after the switch then sends those last 5 back to the Broker for retry.
- If the result is RECONSUME_LATER, ackIndex is forced to -1, so every message in the batch is traversed and sent back to the Broker with a synchronous request. A message whose send-back request fails is recorded and consumed again on the local client after a delay.
- The messages are then removed from the TreeMap of messages awaiting consumption (except those whose send-back request failed), and the smallest offset remaining in the TreeMap is obtained.
- The consumed offset in the local cache is updated with that value so that it can be committed.
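To make the effect of ackIndex concrete, here is a minimal sketch that mirrors only the switch and the for loop quoted above. The class and method names are hypothetical and nothing here calls RocketMQ; it just computes which batch indices would be handed to sendMessageBack.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of RocketMQ: reproduces the index arithmetic
// of processConsumeResult for a batch of the given size.
class AckIndexSketch {

    static List<Integer> indicesToSendBack(boolean consumeSuccess, int ackIndex, int batchSize) {
        if (consumeSuccess) {
            // CONSUME_SUCCESS: an ackIndex beyond the batch is clamped to the last index
            if (ackIndex >= batchSize) {
                ackIndex = batchSize - 1;
            }
        } else {
            // RECONSUME_LATER: whatever the user set is overwritten with -1
            ackIndex = -1;
        }
        List<Integer> sendBack = new ArrayList<>();
        for (int i = ackIndex + 1; i < batchSize; i++) {
            sendBack.add(i);
        }
        return sendBack;
    }

    public static void main(String[] args) {
        // ackIndex left at its default of Integer.MAX_VALUE: nothing is sent back
        System.out.println(indicesToSendBack(true, Integer.MAX_VALUE, 10));
        // ackIndex = 4 with CONSUME_SUCCESS: indices 5..9 are sent back
        System.out.println(indicesToSendBack(true, 4, 10));
        // RECONSUME_LATER: the whole batch is sent back regardless of ackIndex
        System.out.println(indicesToSendBack(false, 4, 10));
    }
}
```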
A few key points:
- Messages that need to be retried are first sent back to the retry queue. Once sent back successfully, they are treated as consumed. The purpose is to keep one failed message from blocking the commit of the whole consumption offset. For example, of four messages 1, 2, 3, and 4, suppose the first fails and the others succeed. Because the smallest offset (1) failed, the later messages cannot be committed as consumed. If 1 is also treated as successful, it no longer blocks the commit; of course, it must still be sent to the retry queue to wait for retry.
- The committable consumption offset is always the minimum key in the TreeMap. This TreeMap holds every Msg fetched by pullMessage that is still awaiting consumption; an entry is deleted after successful consumption. For example, take four messages 1, 2, 3, and 4. If 1 and 2 are consumed and deleted, the smallest offset is 3, so every offset before it can be committed. If 2, 3, and 4 are consumed and deleted but 1 is still there, the committable offset remains the current minimum, 1.
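The minimum-key rule can be illustrated with a plain TreeMap. This is a simplified sketch under stated assumptions: the class, field, and method names are hypothetical, and it assumes that once the map is empty the committable offset is one past the largest offset seen.

```java
import java.util.TreeMap;

// Hypothetical, simplified stand-in for the client-side queue of pulled messages.
class ProcessQueueSketch {
    private final TreeMap<Long, String> pending = new TreeMap<>();
    private long maxOffsetSeen = -1L;

    // Called when a message is pulled and is waiting to be consumed.
    void put(long offset, String body) {
        pending.put(offset, body);
        maxOffsetSeen = Math.max(maxOffsetSeen, offset);
    }

    // Removes a handled message and returns the offset that may now be committed.
    long remove(long offset) {
        pending.remove(offset);
        // Everything below the smallest pending offset is done.
        // Assumption: with nothing pending, commit one past the largest offset seen.
        return pending.isEmpty() ? maxOffsetSeen + 1 : pending.firstKey();
    }

    public static void main(String[] args) {
        ProcessQueueSketch queue = new ProcessQueueSketch();
        for (long offset = 1; offset <= 4; offset++) {
            queue.put(offset, "msg-" + offset);
        }
        // 2, 3 and 4 finish first, but 1 is still pending and pins the commit point
        System.out.println(queue.remove(2));
        System.out.println(queue.remove(3));
        System.out.println(queue.remove(4));
        // Once 1 is removed (consumed, or sent back for retry) the offset can move on
        System.out.println(queue.remove(1));
    }
}
```

The first three calls print 1 and the last prints 5, which is why a message that can neither be consumed nor sent back holds up the offset of everything behind it.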
Users can decide which message to start retrying from
As mentioned above, users can control where retry starts by setting ackIndex on the ConsumeConcurrentlyContext parameter:
consumer.registerMessageListener((MessageListenerConcurrently) (msg, context) -> {
    System.out.printf(" ----- %s consuming: %s batch size: %s ------ ", Thread.currentThread().getName(), msg, msg.size());
    for (int i = 0; i < msg.size(); i++) {
        System.out.println("Message #" + i + ", MSG: " + msg.get(i));
        try {
            // consumption logic
        } catch (Exception e) {
            // This message failed; it and every message after it must be retried
            context.setAckIndex(i - 1);
            return ConsumeConcurrentlyStatus.CONSUME_SUCCESS;
        }
    }
    return ConsumeConcurrentlyStatus.CONSUME_SUCCESS;
});
PS: The version I am reading is 5.1.3. Something about the ackIndex design has always felt off to me:
- ackIndex only takes effect when consumption succeeds. Returning success signals that no retry is needed, so using this value to trigger retries feels awkward.
- When consumption fails, ackIndex is forcibly set to -1, so every message must be retried. Normally, when one message in a batch fails, only that message and the ones after it should be retried; the ones already consumed successfully should not be.
I am inclined to think this is a bug, or at least a design flaw.
Optimization suggestions:
- To stay compatible with existing behavior, leave the logic for the success state unchanged.
- In the failure case, do not force ackIndex to -1 and retry everything; let users mark part of the batch for retry through ackIndex instead.
The client initiates a request CONSUMER_SEND_MSG_BACK
If consumption of a batch fails, it is retried: the client tries to send the messages back to the Broker one by one.
DefaultMQPushConsumerImpl#sendMessageBack
Request header ConsumerSendMsgBackRequestHeader
Attribute | Description |
---|---|
group | GroupName |
originTopic | Topic |
offset | The offset of the message in the CommitLog |
delayLevel | Delay level for the retry, which also acts as the retry policy: -1 means no retry (the message goes straight to the dead letter queue), 0 means the Broker controls the retry frequency, and >0 means the client controls it. If it is greater than 0, the retry is delayed by the corresponding delay level; if it is 0, the delay level is the number of retries + 3, so the delay grows by one level with each retry. The delay levels here are the 18 levels of delayed messages |
originMsgId | Message ID |
maxReconsumeTimes | The maximum number of retries; defaults to 16 in concurrent mode and Integer.MAX_VALUE in ordered mode |
bname | BrokerName |
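As a worked example of the delayLevel rule in the table, the sketch below resolves the effective level and maps it to a delay. It assumes the Broker still uses the default 18 delay levels (1s 5s 10s 30s 1m 2m 3m 4m 5m 6m 7m 8m 9m 10m 20m 30m 1h 2h) and that a level above 18 is capped at 18; the class and method names are hypothetical.

```java
// Hypothetical helper, not RocketMQ code: resolves the retry delay for a send-back request.
class RetryDelaySketch {

    // Assumption: the Broker's default delay levels, index 0 = level 1.
    static final String[] DEFAULT_LEVELS = {
        "1s", "5s", "10s", "30s", "1m", "2m", "3m", "4m", "5m",
        "6m", "7m", "8m", "9m", "10m", "20m", "30m", "1h", "2h"
    };

    static String retryDelay(int delayLevel, int reconsumeTimes) {
        if (delayLevel < 0) {
            // -1: no retry, the message goes to the dead letter queue
            return "no retry (dead letter queue)";
        }
        // 0: Broker-controlled, level = 3 + number of retries so far; >0: client-chosen level
        int level = (delayLevel == 0) ? 3 + reconsumeTimes : delayLevel;
        // Assumption: levels beyond the highest configured one are capped
        level = Math.min(level, DEFAULT_LEVELS.length);
        return DEFAULT_LEVELS[level - 1];
    }

    public static void main(String[] args) {
        System.out.println(retryDelay(0, 0));   // first Broker-controlled retry: level 3
        System.out.println(retryDelay(0, 1));   // second retry: level 4
        System.out.println(retryDelay(5, 7));   // client-chosen level 5, retry count ignored
        System.out.println(retryDelay(-1, 0));  // no retry
    }
}
```

With the default levels, Broker-controlled retries therefore start at 10s and, after 16 attempts (retry counts 0 to 15), end at level 18, which is 2h.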
Target address
The address of the Broker where the Message is stored: msg.getStoreHost()
Request method
Synchronous request
Request process
private void sendMessageBack(MessageExt msg, int delayLevel, final String brokerName, final MessageQueue mq)
    throws RemotingException, MQBrokerException, InterruptedException, MQClientException {
    boolean needRetry = true;
    try {
        // part of the code is omitted ....
        String brokerAddr = (null != brokerName) ? this.mQClientFactory.findBrokerAddressInPublish(brokerName)
            : RemotingHelper.parseSocketAddressAddr(msg.getStoreHost());
        this.mQClientFactory.getMQClientAPIImpl().consumerSendMessageBack(brokerAddr, brokerName, msg,
            this.defaultMQPushConsumer.getConsumerGroup(), delayLevel, 5000, getMaxReconsumeTimes());
    } catch (Throwable t) {
        log.error("Failed to send message back, consumerGroup={}, brokerName={}, mq={}, message={}",
            this.defaultMQPushConsumer.getConsumerGroup(), brokerName, mq, msg, t);
        if (needRetry) {
            // Send the retry message as an ordinary message
            sendMessageBackAsNormalMessage(msg);
        }
    } finally {
        msg.setTopic(NamespaceUtil.withoutNamespace(msg.getTopic(), this.defaultMQPushConsumer.getNamespace()));
    }
}
- The first send has a timeout of 5000 ms; the request code is RequestCode.CONSUMER_SEND_MSG_BACK.
- If that request fails, the fallback strategy is to send the message as an ordinary message, but to the Topic %RETRY%{consumerGroup} and with delay level 3 + msg.getReconsumeTimes(). The Producer that sends it is the built-in Producer the Consumer creates when its instance is built; the client instance name is CLIENT_INNER_PRODUCER. This send is also synchronous, with a timeout of 3000 ms.
- If the fallback also fails and throws an exception, the local client retries consumption after a delay of 5 seconds.
Does the local client keep retrying, or is there a limit on the number of retries?
If sending back keeps failing, the client retries with no limit on the number of attempts, delaying consumption by 5 seconds each time. The message becomes a blocking point for the consumption offset, and the messages after it may be consumed again (for example, after a client restart).
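The three-step fallback described above can be sketched without any RocketMQ types. The two suppliers stand in for the CONSUMER_SEND_MSG_BACK request and the ordinary-message fallback; all names here are hypothetical.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch of the order in which the client tries to hand off a failed message.
class SendBackFallbackSketch {

    enum Outcome { SENT_BACK, SENT_AS_NORMAL_MESSAGE, RETRY_LOCALLY_AFTER_5S }

    static Outcome handleFailedMessage(BooleanSupplier sendBackRequest, BooleanSupplier sendAsNormalMessage) {
        // Step 1: synchronous send-back request to the Broker
        if (sendBackRequest.getAsBoolean()) {
            return Outcome.SENT_BACK;
        }
        // Step 2: fall back to an ordinary message on the retry Topic via the built-in Producer
        if (sendAsNormalMessage.getAsBoolean()) {
            return Outcome.SENT_AS_NORMAL_MESSAGE;
        }
        // Step 3: both failed, so the message stays on the client and is consumed again later
        return Outcome.RETRY_LOCALLY_AFTER_5S;
    }

    public static void main(String[] args) {
        System.out.println(handleFailedMessage(() -> true, () -> true));
        System.out.println(handleFailedMessage(() -> false, () -> true));
        // For example, a Broker that rejects both writes
        System.out.println(handleFailedMessage(() -> false, () -> false));
    }
}
```

Only the last outcome keeps the message in the local queue, and that is the case in which the consumption offset stops advancing.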
Broker handles CONSUMER_SEND_MSG_BACK request
AbstractSendMessageProcessor#consumerSendMsgBack
- If the current Broker is not the Master, a system-error code is returned.
- If the subscription relationship of the consuming Group does not exist, an error code is returned.
- If the Broker's brokerPermission is not writable, a no-permission error code is returned.
- If the number of retry queues of the current Group, retryQueueNums, is <= 0, a no-permission error code is returned.
- If the Group's retry Topic does not exist, it is created with the name %RETRY%GroupName and read/write permissions.
- The Message is looked up by the offset in the request; if it is not found, a system-error code is returned.
- If the message's retry count has reached the maximum, or the retry policy is no retry, the message is sent to the dead letter queue, Topic: %DLQ%GroupName.
- Otherwise, the message is sent to the retry Topic: %RETRY%GroupName.
- If there is a ConsumeMessageHook list, the consumeMessageAfter method is executed.
- The Response is returned.
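The choice between the retry Topic and the dead letter queue comes down to one comparison. Below is a minimal sketch of that decision; the class and method names are hypothetical, and the permission and lookup checks from the list are left out.

```java
// Hypothetical sketch of the Broker-side routing decision for a send-back request.
class BrokerRoutingSketch {

    static String targetTopic(String group, int reconsumeTimes, int maxReconsumeTimes, int delayLevel) {
        // Retry budget used up, or the client asked for no retry (delayLevel < 0)
        if (reconsumeTimes >= maxReconsumeTimes || delayLevel < 0) {
            return "%DLQ%" + group;
        }
        return "%RETRY%" + group;
    }

    public static void main(String[] args) {
        System.out.println(targetTopic("GroupName", 3, 16, 0));   // still within the retry budget
        System.out.println(targetTopic("GroupName", 16, 16, 0));  // budget used up
        System.out.println(targetTopic("GroupName", 0, 16, -1));  // no-retry policy
    }
}
```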
Note: No matter how many times a message is retried, the Message IDs of these retried messages will not change. Therefore, we need to do idempotent consumption operations on the consumer side.
Sequential consumption
The process of executing the processing results after sequential consumption
ConsumeMessageOrderlyService#processConsumeResult
Several important points
- In sequential consumption, only one consumption task (ConsumeRequest) runs for a given ProcessQueue at a time.
- If the user returns SUSPEND_CURRENT_QUEUE_A_MOMENT, the message is retried. The retry logic decides whether to send the message back to the retry queue based on whether the maximum number of retries has been exceeded.
- When it is sent back, the message goes directly to the retry Topic %RETRY%{consumerGroup} through the Consumer's built-in Producer instance.
- The maximum number of retries is normally Integer.MAX_VALUE, so it is generally never exceeded. The message is then retried locally again and again, with a delay of 1s before each retry, and is never written back to the Broker.
- If a message keeps failing, consumption of the entire queue is blocked.
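The points above can be traced with a small simulation of a message that always fails in ordered mode. It is a sketch under the assumptions stated in the list, with hypothetical names, and it only records which action each attempt would lead to.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of ordered-mode retry for a message that never succeeds.
class OrderlyRetrySketch {

    enum Action { RETRY_LOCALLY_AFTER_1S, SEND_TO_RETRY_TOPIC }

    static List<Action> simulateAlwaysFailing(int maxReconsumeTimes, int attempts) {
        List<Action> actions = new ArrayList<>();
        int reconsumeTimes = 0;
        for (int i = 0; i < attempts; i++) {
            if (reconsumeTimes >= maxReconsumeTimes) {
                // Budget exceeded: hand the message to %RETRY%{consumerGroup} and stop blocking the queue
                actions.add(Action.SEND_TO_RETRY_TOPIC);
                break;
            }
            // Budget not exceeded: the queue is suspended and the same message is retried locally
            actions.add(Action.RETRY_LOCALLY_AFTER_1S);
            reconsumeTimes++;
        }
        return actions;
    }

    public static void main(String[] args) {
        // With a small explicit limit the message eventually leaves the queue
        System.out.println(simulateAlwaysFailing(2, 5));
        // With the ordered-mode default the limit is never reached, so every attempt is local
        System.out.println(simulateAlwaysFailing(Integer.MAX_VALUE, 4));
    }
}
```

Setting a finite maximum number of retries on the consumer is therefore the only way, in this model, for a permanently failing message to stop blocking its queue.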
Q&A
When consuming, there is a batch of messages. If one of the messages fails to be consumed, will all the messages be retried?
If you return ConsumeConcurrentlyStatus#RECONSUME_LATER, consumption is treated as failed and must be retried, and every Msg handed to the listener in that batch is retried.
The number of Msgs per batch is determined by consumer.setConsumeMessageBatchMaxSize(1); the default is 1, meaning one message is consumed at a time.
Can users control the number of retries and the retry interval?
Yes.
Number of retries:
Before 3.4.9, retryMaxTimes in the subscriptionGroupConfig consumer group configuration was used. Since 3.4.9, the value is specified by the client (requestHeader.getMaxReconsumeTimes()) and can be set with Consumer#setMaxReconsumeTimes(maxTimes). In concurrent mode the default is 16.
Retry interval:
By default, the retry interval is controlled by the Broker and is implemented with delay messages. The delay level is 3 + number of retries, so the first retry uses level 3, which by default corresponds to 10s.
If you want to customize the retry interval, you need to handle it yourself when consuming, for example:
consumer.registerMessageListener((MessageListenerConcurrently) (msg, context) -> {
    System.out.printf(" ----- %s consuming: %s batch size: %s ------ ", Thread.currentThread().getName(), msg, msg.size());
    for (int i = 0; i < msg.size(); i++) {
        System.out.println("Message #" + i + ", MSG: " + msg.get(i));
        try {
            // consumption logic
        } catch (Exception e) {
            // Consumption failed. Delay level 5 = a delay of 1 minute
            context.setDelayLevelWhenNextConsume(5);
            // Alternatively, raise the delay level with the number of retries:
            // context.setDelayLevelWhenNextConsume(3 + msg.get(i).getReconsumeTimes());
            // Needs to be retried
            return ConsumeConcurrentlyStatus.RECONSUME_LATER;
        }
    }
    return ConsumeConcurrentlyStatus.CONSUME_SUCCESS;
});
When consuming messages in batches, can I control the starting offset of retry? For example, if there are 10 messages and the 5th message fails, then only the 5th message and all subsequent messages will be retried.
Yes, but currently only when ConsumeConcurrentlyStatus.CONSUME_SUCCESS is returned. If ConsumeConcurrentlyStatus.RECONSUME_LATER is returned, the entire batch of messages is retried.
For details, see the code below:
consumer.registerMessageListener((MessageListenerConcurrently) (msg, context) -> {
    System.out.printf(" ----- %s consuming: %s batch size: %s ------ ", Thread.currentThread().getName(), msg, msg.size());
    for (int i = 0; i < msg.size(); i++) {
        System.out.println("Message #" + i + ", MSG: " + msg.get(i));
        try {
            // consumption logic
        } catch (Exception e) {
            // This message failed; it and every message after it must be retried
            context.setAckIndex(i - 1);
            return ConsumeConcurrentlyStatus.CONSUME_SUCCESS;
        }
    }
    return ConsumeConcurrentlyStatus.CONSUME_SUCCESS;
});
In my view, the failure case (RECONSUME_LATER) should also let users control which messages need to be retried.
How are retried messages re-consumed?
Messages that need to be retried are written to the retry queue %RETRY%{consumerGroup}. When the delay expires, the client consumes them again.
If the maximum number of retries is exceeded, the message is placed in the dead letter queue %DLQ%{consumerGroup} and is not retried again.
If the broker's write permission is turned off, will it have any impact on the retry of message consumption?
Answer: Yes, it has an impact.
Consumption retry works by first sending the retry message back to the Broker. If write permission is turned off, that step fails, and the client keeps retrying locally, with no limit on the number of attempts, delaying consumption by 5 seconds each time.
While it keeps retrying locally, this Msg is a blocking point for the consumption offset: the offsets of the messages after it cannot be committed even if they have been consumed.
So be careful when turning off a Broker's write permission.