RocketMQ retry message delivery

Get into the habit of writing together! This is the 9th day of my participation in the "Nuggets Daily New Plan · April Update Challenge", click to view the details of the event

retry message delivery

  • How does the client implement the delivery of retry topic messages?
  1. Send out through Consumer client (timeout)
  2. Send it out through the producer (consumption failure)

ConsumerTwo consumption methods are provided: pullandpush

In fact, it pushis also wrapped to pullachieve message consumption.

DefaultMQPushConsumerImpl#sendMessageBack

Several scenarios for calling sendMessageBack?

  1. In concurrent consumption mode, message consumption timeout (default 15 minutes)
  2. In the concurrent consumption mode, if the message consumption fails, it may be that the business returns to the action abnormally, or the display returns.ConsumeConcurrentlyStatus.RECONSUME_LATER
 public void sendMessageBack(MessageExt msg, int delayLevel, final String brokerName)
            throws RemotingException, MQBrokerException, InterruptedException, MQClientException {
        try {
            String brokerAddr = (null != brokerName) ? this.mQClientFactory.findBrokerAddressInPublish(brokerName)
                    : RemotingHelper.parseSocketAddressAddr(msg.getStoreHost());
            this.mQClientFactory.getMQClientAPIImpl().consumerSendMessageBack(brokerAddr, msg,
                    this.defaultMQPushConsumer.getConsumerGroup(), delayLevel, 5000, getMaxReconsumeTimes());
        } catch (Exception e) {
            log.error("sendMessageBack Exception, " + this.defaultMQPushConsumer.getConsumerGroup(), e);

            Message newMsg = new Message(MixAll.getRetryTopic(this.defaultMQPushConsumer.getConsumerGroup()), msg.getBody());

            String originMsgId = MessageAccessor.getOriginMessageId(msg);
            MessageAccessor.setOriginMessageId(newMsg, UtilAll.isBlank(originMsgId) ? msg.getMsgId() : originMsgId);

            newMsg.setFlag(msg.getFlag());
            MessageAccessor.setProperties(newMsg, msg.getProperties());
            MessageAccessor.putProperty(newMsg, MessageConst.PROPERTY_RETRY_TOPIC, msg.getTopic());
            MessageAccessor.setReconsumeTime(newMsg, String.valueOf(msg.getReconsumeTimes() + 1));
            MessageAccessor.setMaxReconsumeTimes(newMsg, String.valueOf(getMaxReconsumeTimes()));
            MessageAccessor.clearProperty(newMsg, MessageConst.PROPERTY_TRANSACTION_PREPARED);
            newMsg.setDelayTimeLevel(3 + msg.getReconsumeTimes());

            this.mQClientFactory.getDefaultMQProducer().send(newMsg);
        } finally {
            msg.setTopic(NamespaceUtil.withoutNamespace(msg.getTopic(), this.defaultMQPushConsumer.getNamespace()));
        }
    }
复制代码

Timeout Scenario

The consumption timeout is called by the timing thread when the consumer client is started. The timing time is the expiration time of the message, and the default is 15min.

 public void start() {
        this.cleanExpireMsgExecutors.scheduleAtFixedRate(new Runnable() {

            @Override
            public void run() {
                try {
                    cleanExpireMsg();
                } catch (Throwable e) {
                    log.error("scheduleAtFixedRate cleanExpireMsg exception", e);
                }
            }

        }, this.defaultMQPushConsumer.getConsumeTimeout(), this.defaultMQPushConsumer.getConsumeTimeout(), TimeUnit.MINUTES);
    }
    
public long getConsumeTimeout() {
        return consumeTimeout;
    }
/**
     * Maximum amount of time in minutes a message may block the consuming thread.
     */
    private long consumeTimeout = 15;    
复制代码

The consumption timeout call path is as follows:

ConsumeMessageConcurrentlyService#start -> ConsumeMessageConcurrentlyService#cleanExpireMsg -> ProcessQueue#cleanExpiredMsg -> DefaultMQPushConsumer#sendMessageBack -> DefaultMQPushConsumerImpl#sendMessageBack
复制代码

The cleanExpireMsg code is implemented as follows:

public void cleanExpiredMsg(DefaultMQPushConsumer pushConsumer) {
        if (pushConsumer.getDefaultMQPushConsumerImpl().isConsumeOrderly()) {
            return;
        }

        int loop = msgTreeMap.size() < 16 ? msgTreeMap.size() : 16;
        for (int i = 0; i < loop; i++) {
            MessageExt msg = null;
            try {
                this.treeMapLock.readLock().lockInterruptibly();
                try {
                    if (!msgTreeMap.isEmpty()) {
                        String consumeStartTimeStamp = MessageAccessor.getConsumeStartTimeStamp(msgTreeMap.firstEntry().getValue());
                        if (StringUtils.isNotEmpty(consumeStartTimeStamp) && System.currentTimeMillis() - Long.parseLong(consumeStartTimeStamp) > pushConsumer.getConsumeTimeout() * 60 * 1000) {
                            msg = msgTreeMap.firstEntry().getValue();
                        } else {
                            break;
                        }
                    } else {
                        break;
                    }
                } finally {
                    this.treeMapLock.readLock().unlock();
                }
            } catch (InterruptedException e) {
                log.error("getExpiredMsg exception", e);
            }

            try {

                pushConsumer.sendMessageBack(msg, 3);
                log.info("send expire msg back. topic={}, msgId={}, storeHost={}, queueId={}, queueOffset={}", msg.getTopic(), msg.getMsgId(), msg.getStoreHost(), msg.getQueueId(), msg.getQueueOffset());
                try {
                    this.treeMapLock.writeLock().lockInterruptibly();
                    try {
                        if (!msgTreeMap.isEmpty() && msg.getQueueOffset() == msgTreeMap.firstKey()) {
                            try {
                                removeMessage(Collections.singletonList(msg));
                            } catch (Exception e) {
                                log.error("send expired msg exception", e);
                            }
                        }
                    } finally {
                        this.treeMapLock.writeLock().unlock();
                    }
                } catch (InterruptedException e) {
                    log.error("getExpiredMsg exception", e);
                }
            } catch (Exception e) {
                log.error("send expired msg exception", e);
            }
        }
    }
复制代码

cleanExpiredMsgTwo lines in the code of concern

 pushConsumer.sendMessageBack(msg, 3);
                log.info("send expire msg back. topic={}, msgId={}, storeHost={}, queueId={}, queueOffset={}", msg.getTopic(), msg.getMsgId(), msg.getStoreHost(), msg.getQueueId(), msg.getQueueOffset());     
复制代码

After the message times out, the brokerName parameter of the sent message is null DefaultMQPushConsumer#sendMessageBack

    public void sendMessageBack(MessageExt msg, int delayLevel)
        throws RemotingException, MQBrokerException, InterruptedException, MQClientException {
        msg.setTopic(withNamespace(msg.getTopic()));
        this.defaultMQPushConsumerImpl.sendMessageBack(msg, delayLevel, null);
    }
复制代码

That is to say, the message that causes the timeout is eventually resent to the original broker, and the delayLevel is written to 3 by default, which means 30s. The consumption timeout here means that the message is still in the queue and has not been taken out and delivered to the handler for processing .

Consumption Abnormal Scenarios

消息消费失败的场景 是指可能存在消费异常(抛出exception), 或者业务handler显式返回 ConsumeConcurrentlyStatus.RECONSUME_LATER 或者没有返回结果null, 都会触发最终发送回 retry message的逻辑.

调用路径如下:

ConsumeMessageConcurrentlyService$ConsumeRequest#run -> ConsumeMessageConcurrentlyService#processConsumeResult -> DefaultMQPushConsumerImpl#sendMessageBack
复制代码

重新投递为啥抛出异常,或者,没有返回 null 或者显示返回 ConsumeConcurrentlyStatus.RECONSUME_LATER

具体看代码 ConsumeMessageConcurrentlyService#run 中的这几行。

  if (null == status) {
                if (hasException) {
                    returnType = ConsumeReturnType.EXCEPTION;
                } else {
                    returnType = ConsumeReturnType.RETURNNULL;
                }
                
...

 if (null == status) {
                log.warn("consumeMessage return null, Group: {} Msgs: {} MQ: {}",
                    ConsumeMessageConcurrentlyService.this.consumerGroup,
                    msgs,
                    messageQueue);
                status = ConsumeConcurrentlyStatus.RECONSUME_LATER;
            }
                            
复制代码

processConsumeResult 代码实现如下:


    public void processConsumeResult(
        final ConsumeConcurrentlyStatus status,
        final ConsumeConcurrentlyContext context,
        final ConsumeRequest consumeRequest
    ) {
        int ackIndex = context.getAckIndex();

        if (consumeRequest.getMsgs().isEmpty())
            return;

        switch (status) {
            case CONSUME_SUCCESS:
                if (ackIndex >= consumeRequest.getMsgs().size()) {
                    ackIndex = consumeRequest.getMsgs().size() - 1;
                }
                int ok = ackIndex + 1;
                int failed = consumeRequest.getMsgs().size() - ok;
                this.getConsumerStatsManager().incConsumeOKTPS(consumerGroup, consumeRequest.getMessageQueue().getTopic(), ok);
                this.getConsumerStatsManager().incConsumeFailedTPS(consumerGroup, consumeRequest.getMessageQueue().getTopic(), failed);
                break;
            case RECONSUME_LATER:
                ackIndex = -1;
                this.getConsumerStatsManager().incConsumeFailedTPS(consumerGroup, consumeRequest.getMessageQueue().getTopic(),
                    consumeRequest.getMsgs().size());
                break;
            default:
                break;
        }

        switch (this.defaultMQPushConsumer.getMessageModel()) {
            case BROADCASTING:
                for (int i = ackIndex + 1; i < consumeRequest.getMsgs().size(); i++) {
                    MessageExt msg = consumeRequest.getMsgs().get(i);
                    log.warn("BROADCASTING, the message consume failed, drop it, {}", msg.toString());
                }
                break;
            case CLUSTERING:
                List<MessageExt> msgBackFailed = new ArrayList<MessageExt>(consumeRequest.getMsgs().size());
                for (int i = ackIndex + 1; i < consumeRequest.getMsgs().size(); i++) {
                    MessageExt msg = consumeRequest.getMsgs().get(i);
                    boolean result = this.sendMessageBack(msg, context);
                    if (!result) {
                        msg.setReconsumeTimes(msg.getReconsumeTimes() + 1);
                        msgBackFailed.add(msg);
                    }
                }

                if (!msgBackFailed.isEmpty()) {
                    consumeRequest.getMsgs().removeAll(msgBackFailed);

                    this.submitConsumeRequestLater(msgBackFailed, consumeRequest.getProcessQueue(), consumeRequest.getMessageQueue());
                }
                break;
            default:
                break;
        }

        long offset = consumeRequest.getProcessQueue().removeMessage(consumeRequest.getMsgs());
        if (offset >= 0 && !consumeRequest.getProcessQueue().isDropped()) {
            this.defaultMQPushConsumerImpl.getOffsetStore().updateOffset(consumeRequest.getMessageQueue(), offset, true);
        }
    }
复制代码

为啥能调用 sendMessageBack 方法 ,具体查看下面几行:

 case RECONSUME_LATER:
                ackIndex = -1;
...

  case BROADCASTING:
                for (int i = ackIndex + 1; i < consumeRequest.getMsgs().size(); i++) {
                    MessageExt msg = consumeRequest.getMsgs().get(i);
                    log.warn("BROADCASTING, the message consume failed, drop it, {}", msg.toString());
                }
                break;
            case CLUSTERING:
                List<MessageExt> msgBackFailed = new ArrayList<MessageExt>(consumeRequest.getMsgs().size());
                for (int i = ackIndex + 1; i < consumeRequest.getMsgs().size(); i++) {
                    MessageExt msg = consumeRequest.getMsgs().get(i);
                    boolean result = this.sendMessageBack(msg, context);
                    if (!result) {
                        msg.setReconsumeTimes(msg.getReconsumeTimes() + 1);
                        msgBackFailed.add(msg);
                    }
                }

                if (!msgBackFailed.isEmpty()) {
                    consumeRequest.getMsgs().removeAll(msgBackFailed);

                    this.submitConsumeRequestLater(msgBackFailed, consumeRequest.getProcessQueue(), consumeRequest.getMessageQueue());
                }
                break;
            default:
                break;
复制代码

发现没,如果消息模式是广播的,会直接 drop ,如果是集群的,或重新投递。

和消费超时的参数不同,消息消费失败,不近需要重新计算 delayLevel ,并且 brokerNamemessagebrokerName,不再是 null。

public boolean sendMessageBack(final MessageExt msg, final ConsumeConcurrentlyContext context) {
        int delayLevel = context.getDelayLevelWhenNextConsume();

        // Wrap topic with namespace before sending back message.
        msg.setTopic(this.defaultMQPushConsumer.withNamespace(msg.getTopic()));
        try {
            this.defaultMQPushConsumerImpl.sendMessageBack(msg, delayLevel, context.getMessageQueue().getBrokerName());
            return true;
        } catch (Exception e) {
            log.error("sendMessageBack exception, group: " + this.consumerGroup + " msg: " + msg, e);
        }

        return false;
    }
复制代码

retry 投递核心函数

DefaultMQPushConsumerImpl#sendMessageBack

 public void sendMessageBack(MessageExt msg, int delayLevel, final String brokerName)
            throws RemotingException, MQBrokerException, InterruptedException, MQClientException {
        try {
            String brokerAddr = (null != brokerName) ? this.mQClientFactory.findBrokerAddressInPublish(brokerName)
                    : RemotingHelper.parseSocketAddressAddr(msg.getStoreHost());
            this.mQClientFactory.getMQClientAPIImpl().consumerSendMessageBack(brokerAddr, msg,
                    this.defaultMQPushConsumer.getConsumerGroup(), delayLevel, 5000, getMaxReconsumeTimes());
        } catch (Exception e) {
            log.error("sendMessageBack Exception, " + this.defaultMQPushConsumer.getConsumerGroup(), e);

            Message newMsg = new Message(MixAll.getRetryTopic(this.defaultMQPushConsumer.getConsumerGroup()), msg.getBody());

            String originMsgId = MessageAccessor.getOriginMessageId(msg);
            MessageAccessor.setOriginMessageId(newMsg, UtilAll.isBlank(originMsgId) ? msg.getMsgId() : originMsgId);

            newMsg.setFlag(msg.getFlag());
            MessageAccessor.setProperties(newMsg, msg.getProperties());
            MessageAccessor.putProperty(newMsg, MessageConst.PROPERTY_RETRY_TOPIC, msg.getTopic());
            MessageAccessor.setReconsumeTime(newMsg, String.valueOf(msg.getReconsumeTimes() + 1));
            MessageAccessor.setMaxReconsumeTimes(newMsg, String.valueOf(getMaxReconsumeTimes()));
            MessageAccessor.clearProperty(newMsg, MessageConst.PROPERTY_TRANSACTION_PREPARED);
            newMsg.setDelayTimeLevel(3 + msg.getReconsumeTimes());

            this.mQClientFactory.getDefaultMQProducer().send(newMsg);
        } finally {
            msg.setTopic(NamespaceUtil.withoutNamespace(msg.getTopic(), this.defaultMQPushConsumer.getNamespace()));
        }
    }
复制代码

消费者发送&生产者重试消息投递的区别

  1. consumer client 发送
  • 消费者发送,在消费超时的情况下,用 sendMessageBack 的brokerName 参数是 null, 也就意味着 超时的消息会被重新投递会 原来的broker,根据ip指定的。 那么, 如果这条消息是从 slave 拉取的话, 这个retry message 就会发送到 slave 上。

  • 消息超时的场景中, delay level 写死的 3, 对应的延迟是 30s. 也就意味着, 当消息消费超时的时候, 再次消费是 30s之后.

  1. 生产者发送
  • 消费失败的消息, 调用 sendMessageBack 的brokerName 参数是 消息的brokerName, 那么, 消费失败的消息会被投递给 这个消息所有的broker的master. 如果这条消息是从slave消费, 那么消费失败后也会被投递给 master

  • ConsumeConcurrentlyContext 进行显式设置, 如果没有设置, 默认是0. 但是在broker的处理流程中, 会尝试进行修正, 重置 delayLevel = 3 + msgExt.getReconsumeTimes(). 也就是说 重试次数越大, 消费延迟越久, 至少 >=30s.

broker在收到 CONSUMER_SEND_MSG_BACK 的请求后, 因为设置了 delayLevel 的原因, 消息会被发送到 延迟消息队列, 消息到期后, 重新投递回 retry topic 最终被 consumer client 消费. 需要注意的是, broker在处理 CONSUMER_SEND_MSG_BACK 的时候, 如果 retry topic不存在, 也会触发 retry topic的创建. 除此之外, 当消息的重复消费次数大于group配置的最大重试次数 (客户端默认16; 老版本<3.4.9客户端没有配置, 使用broker配置, 是 16),会被投递到 dead letter topic 。

总结一下:

  1. Retry topic 是什么时候创建的?

上一篇文章已经谈到:心跳和发送重试消息都会触发

  1. Retry topic 的queue有多少个?

1个, subscriptionGroupConfig.getRetryQueueNums(), 默认是1 3. rocketmq是有master-slave的,slave上也会有retryTopic吗?

会, 心跳机制保证了slave 会创建, 除此之外, 消费重试场景下, 也会触发。 客户端和所有broker 都是长链接,master ,slave 都会创建。

  1. consumer client 从 broker A上 拉取的消息消费失败后, 还会投递到 broker A上吗? 负载均衡?

When brokerA is not down, it will still be delivered to brokerA. If there is a network failure or broker down, it will rely on the producer's routing mechanism for load balancing. In the consumption timeout scenario, you can see that it is still delivered to the original broker.

Guess you like

Origin juejin.im/post/7084919543499325448