Message Queue (VIII) --- RocketMQ Delayed Message Delivery and Consumer Retries (semi-original)

The images and some of the summaries in this article come from the reference material; this post is semi-original, and content will be taken down on request.

Problems

  • If a retried message runs into a timeout, how does RocketMQ handle it? Is the message re-sent after the timeout, or does it just keep waiting?
  • After a message enters the retry queue (%RETRY_XXX%), how does it end up being consumed successfully?

Outline

    This article introduces RocketMQ's scheduled (delayed) message mechanism and its consumer retry mechanism.

Scheduled messages

Overview of scheduled messages

    RocketMQ creates a single dedicated topic for scheduled messages. Delay times are divided into fixed levels, and each level corresponds to its own queue under that topic. A scheduled-task execution service then works through the tasks in each queue; when a message is due, the service rewrites its topic and queue ID back to the real ones and sends it on.

Sending a delayed message

A producer marks a message as delayed by setting a delay time level on it:

Message msg =
    new Message("TopicTest" /* Topic */,
                "TagA" /* Tag */,
                ("Hello RocketMQ " + i).getBytes(RemotingHelper.DEFAULT_CHARSET) /* Message body */);
msg.setDelayTimeLevel(i + 1);
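
With the default levels shown next, msg.setDelayTimeLevel(3) means a 10s delay; levels are numbered from 1, and a message whose level is 0 is delivered immediately.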

    Delay levels are defined in MessageStoreConfig:

public class MessageStoreConfig {

    private String messageDelayLevel = "1s 5s 10s 30s 1m 2m 3m 4m 5m 6m 7m 8m 9m 10m 20m 30m 1h 2h";
    
}
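
To make the level-to-delay mapping concrete, here is a minimal standalone sketch (my own code, not the broker's; the broker has its own parser inside ScheduleMessageService) that turns a string in this format into a level-to-milliseconds table. Note that levels are 1-based, and delay level N is served by queue N-1 of the scheduled topic, since delayLevel2QueueId simply returns delayLevel - 1.

import java.util.HashMap;
import java.util.Map;

public class DelayLevelParser {

    // time-unit suffixes used by the messageDelayLevel format
    private static final Map<String, Long> UNIT_MILLIS = new HashMap<>();
    static {
        UNIT_MILLIS.put("s", 1000L);
        UNIT_MILLIS.put("m", 60 * 1000L);
        UNIT_MILLIS.put("h", 60 * 60 * 1000L);
        UNIT_MILLIS.put("d", 24 * 60 * 60 * 1000L);
    }

    // parse "1s 5s 10s ..." into (level -> delay in milliseconds)
    public static Map<Integer, Long> parse(String messageDelayLevel) {
        Map<Integer, Long> delayLevelTable = new HashMap<>();
        String[] levels = messageDelayLevel.split(" ");
        for (int i = 0; i < levels.length; i++) {
            String level = levels[i];
            String unit = level.substring(level.length() - 1);
            long amount = Long.parseLong(level.substring(0, level.length() - 1));
            delayLevelTable.put(i + 1, amount * UNIT_MILLIS.get(unit));
        }
        return delayLevelTable;
    }

    public static void main(String[] args) {
        Map<Integer, Long> table = parse("1s 5s 10s 30s 1m 2m");
        System.out.println(table.get(3)); // 10000 -> level 3 is a 10s delay
    }
}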

Writing scheduled messages

    The crucial part happens when the message is written to the CommitLog; this matters because it is also the foundation of consumer retry. CommitLog swaps the message's topic and queue ID for the dedicated scheduled topic and the queue ID corresponding to the delay level, and backs up the real topic and queue ID as properties on the message, so that when the message is processed later it can be delivered back to its own queue.

    public class CommitLog {

        public PutMessageResult putMessage(final MessageExtBrokerInner msg) {

            // Delay Delivery
            if (msg.getDelayTimeLevel() > 0) {

                topic = ScheduleMessageService.SCHEDULE_TOPIC;
                queueId = ScheduleMessageService.delayLevel2QueueId(msg.getDelayTimeLevel());

                // Backup real topic, queueId
                MessageAccessor.putProperty(msg, MessageConst.PROPERTY_REAL_TOPIC, msg.getTopic());
                MessageAccessor.putProperty(msg, MessageConst.PROPERTY_REAL_QUEUE_ID, String.valueOf(msg.getQueueId()));
                msg.setPropertiesString(MessageDecoder.messageProperties2String(msg.getProperties()));

                // Replace topic and queueId with the scheduled ones
                msg.setTopic(topic);
                msg.setQueueId(queueId);
            } 
            
        } 
        
    }

Processing scheduled messages

    The service that executes the scheduled tasks is ScheduleMessageService; its start method:

    public void start() {

        for (Map.Entry<Integer, Long> entry : this.delayLevelTable.entrySet()) {
            Integer level = entry.getKey();
            Long timeDelay = entry.getValue();
            Long offset = this.offsetTable.get(level);
            if (offset == null) {
                offset = 0L;
            }

            if (timeDelay != null) {
                // The Timer holds the scheduled tasks and runs them when due,
                // but a Timer has only one internal worker thread, so timing
                // accuracy is not guaranteed (while one task runs, another
                // task's time may already have arrived).
                // Note: one task is built per delay level.
                this.timer.schedule(new DeliverDelayedMessageTimerTask(level, offset), FIRST_DELAY_TIME);
            }
        }

        this.timer.scheduleAtFixedRate(new TimerTask() {

            @Override
            public void run() {
                try {
                    ScheduleMessageService.this.persist();
                } catch (Throwable e) {
                    log.error("scheduleAtFixedRate flush exception", e);
                }
            }
        }, 10000, this.defaultMessageStore.getMessageStoreConfig().getFlushDelayOffsetInterval());
    }
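
The comment about timing accuracy is easy to verify on its own: java.util.Timer runs all of its tasks on a single worker thread, so one long-running task delays the others. A standalone sketch (not RocketMQ code):

import java.util.Timer;
import java.util.TimerTask;

public class TimerAccuracyDemo {
    public static void main(String[] args) {
        Timer timer = new Timer();
        long start = System.currentTimeMillis();

        // the first task blocks the Timer's only thread for 3 seconds
        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                try {
                    Thread.sleep(3000);
                } catch (InterruptedException ignored) {
                }
            }
        }, 0);

        // this task is due at t+1s but actually runs at about t+3s
        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                System.out.println("ran after " + (System.currentTimeMillis() - start) + " ms");
                timer.cancel();
            }
        }, 1000);
    }
}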

class DeliverDelayedMessageTimerTask extends TimerTask {

    public void executeOnTimeup() {
        // ...
        for (; i < bufferCQ.getSize(); i += ConsumeQueue.CQ_STORE_UNIT_SIZE) {
            // check whether the delivery time has arrived
            long countdown = deliverTimestamp - now;

            if (countdown <= 0) {
                // fetch the message
                MessageExt msgExt =
                    ScheduleMessageService.this.defaultMessageStore.lookMessageByOffset(offsetPy, sizePy);
                // correct the message: set the real topic and queue ID back
                MessageExtBrokerInner msgInner = this.messageTimeup(msgExt);
                // store the message again
                PutMessageResult putMessageResult =
                    ScheduleMessageService.this.defaultMessageStore
                        .putMessage(msgInner);
            } else {
                // deliver this message again after the countdown
                ScheduleMessageService.this
                    .timer
                    .schedule(new DeliverDelayedMessageTimerTask(this.delayLevel, nextOffset), countdown);
                // update the offset
            }
        } // end of for

        // update the offset
    }

}
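
messageTimeup, called above, is the inverse of the CommitLog step from earlier: it reads the backed-up properties and restores the real topic and queue ID. Below is a simplified sketch of just that restore step (restoreRealDestination is a made-up name, and the real method also copies fields such as the flags, hosts, and timestamps):

// a simplified sketch of the restore logic inside messageTimeup
private MessageExtBrokerInner restoreRealDestination(MessageExt msgExt) {
    MessageExtBrokerInner msgInner = new MessageExtBrokerInner();
    msgInner.setBody(msgExt.getBody());
    // copy the original properties across
    MessageAccessor.setProperties(msgInner, msgExt.getProperties());
    // drop the delay level so the message is not scheduled again when re-stored
    MessageAccessor.clearProperty(msgInner, MessageConst.PROPERTY_DELAY_TIME_LEVEL);
    // read back the real destination saved by CommitLog.putMessage
    msgInner.setTopic(msgExt.getProperty(MessageConst.PROPERTY_REAL_TOPIC));
    msgInner.setQueueId(Integer.parseInt(msgExt.getProperty(MessageConst.PROPERTY_REAL_QUEUE_ID)));
    return msgInner;
}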

    Meanwhile, the progress of the scheduled tasks is also persisted: each delay level keeps a consumption offset recording how far its queue has been processed.

Consumer message retry

    RocketMQ will retry a message in any of the following cases (see the sketch after this list):

  • The consumer throws an exception
  • The consumer returns null
  • The consumer returns the RECONSUME_LATER status
  • The consumer does not respond within the consume timeout (15 minutes by default)
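
For example, in the following push-consumer sketch (the group, topic, and name-server address are made up for illustration), returning RECONSUME_LATER sends the messages back to the broker for retry, just as throwing an exception or returning null would:

import java.util.List;

import org.apache.rocketmq.client.consumer.DefaultMQPushConsumer;
import org.apache.rocketmq.client.consumer.listener.ConsumeConcurrentlyContext;
import org.apache.rocketmq.client.consumer.listener.ConsumeConcurrentlyStatus;
import org.apache.rocketmq.client.consumer.listener.MessageListenerConcurrently;
import org.apache.rocketmq.common.message.MessageExt;

public class RetryDemoConsumer {
    public static void main(String[] args) throws Exception {
        DefaultMQPushConsumer consumer = new DefaultMQPushConsumer("demo_consumer_group");
        consumer.setNamesrvAddr("localhost:9876");
        consumer.subscribe("TopicTest", "*");
        consumer.registerMessageListener(new MessageListenerConcurrently() {
            @Override
            public ConsumeConcurrentlyStatus consumeMessage(List<MessageExt> msgs,
                                                            ConsumeConcurrentlyContext context) {
                for (MessageExt msg : msgs) {
                    // getReconsumeTimes tells us how often this message was retried
                    System.out.println("attempt " + msg.getReconsumeTimes() + ": " + msg.getMsgId());
                }
                // sends the whole batch back to the broker for retry; throwing an
                // exception or returning null has the same effect
                return ConsumeConcurrentlyStatus.RECONSUME_LATER;
            }
        });
        consumer.start();
    }
}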


Subscribing to the retry queue

    When a consumer starts, it also subscribes to the "%RETRY_XXX%" topic; this is the topic that carries retry messages the consumer previously failed to process. As shown below:


public class DefaultMQPushConsumerImpl implements MQConsumerInner {

    public synchronized void start() throws MQClientException {
        switch (this.serviceState) {
        case CREATE_JUST:
            // ...
            this.copySubscription();
            // ...
        
            this.serviceState = ServiceState.RUNNING;
            break;
        }
    }

    private void copySubscription() throws MQClientException {
        switch (this.defaultMQPushConsumer.getMessageModel()) {
        case BROADCASTING:
            break;
            
        case CLUSTERING:
            // the retry topic for this consumer group
            final String retryTopic = MixAll.getRetryTopic(this.defaultMQPushConsumer.getConsumerGroup());
            SubscriptionData subscriptionData = FilterAPI.buildSubscriptionData(this.defaultMQPushConsumer.getConsumerGroup(),
                                                                                retryTopic, SubscriptionData.SUB_ALL);
            this.rebalanceImpl.getSubscriptionInner().put(retryTopic, subscriptionData);
            break;
            
        default:
            break;
        }
    }
    
}
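
Note that MixAll.getRetryTopic simply prefixes the consumer group name with "%RETRY%", so a group named demo_consumer_group ends up subscribed to %RETRY%demo_consumer_group; that is the "%RETRY_XXX%" topic mentioned above.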

Consume timeout

    Consider this question: if a consumer goes offline, messages simply cannot be delivered to it. But if the consumer is online and its business logic just runs for too long, something has to trigger a consume timeout so the message can be retried.

The start method of ConsumeMessageConcurrentlyService:

    public void start() {
        this.cleanExpireMsgExecutors.scheduleAtFixedRate(new Runnable() {

            @Override
            public void run() {
                cleanExpireMsg();
            }

        }, this.defaultMQPushConsumer.getConsumeTimeout(), this.defaultMQPushConsumer.getConsumeTimeout(), TimeUnit.MINUTES);
    }

Every getConsumeTimeout interval, this periodic task scans for consumption that has timed out and calls sendMessageBack, which makes an RPC call to send the message back to the broker so the failed consumption can be retried.

    In the previous post we walked through the consumption process: in clustering mode, a successful consumption updates the local consumption progress, while a failure makes an RPC call to send the message back to the broker. The broker-side handling lives in SendMessageProcessor:

    @Override
    public RemotingCommand processRequest(ChannelHandlerContext ctx,
        RemotingCommand request) throws RemotingCommandException {
        SendMessageContext mqtraceContext;
        switch (request.getCode()) {

            // the consumer failed to consume this message
            case RequestCode.CONSUMER_SEND_MSG_BACK:
                return this.consumerSendMsgBack(ctx, request);
            default:

                SendMessageRequestHeader requestHeader = parseRequestHeader(request);
                if (requestHeader == null) {
                    return null;
                }

                mqtraceContext = buildMsgContext(ctx, requestHeader);
                this.executeSendMessageHookBefore(ctx, request, mqtraceContext);

                RemotingCommand response;
                if (requestHeader.isBatch()) {
                    response = this.sendBatchMessage(ctx, request, mqtraceContext, requestHeader);
                } else {
                    response = this.sendMessage(ctx, request, mqtraceContext, requestHeader);
                }

                this.executeSendMessageHookAfter(response, mqtraceContext);
                return response;
        }
    }

  Note that the consume timeout defaults to 15 minutes; in production it can be configured shorter, as sketched below.
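
A small configuration sketch (the group name and values are made up): DefaultMQPushConsumer exposes setConsumeTimeout, in minutes, along with setMaxReconsumeTimes to cap retries:

DefaultMQPushConsumer consumer = new DefaultMQPushConsumer("demo_consumer_group");
// the consume timeout defaults to 15 minutes; shorten it so that stuck
// consumption is detected and sent back for retry sooner
consumer.setConsumeTimeout(5);
// cap the retries; after this many failures the message goes to the
// dead-letter (%DLQ%) topic for manual handling
consumer.setMaxReconsumeTimes(3);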

The batch-processing problem

    If a batch of messages is consumed and RECONSUME_LATER is returned, the whole batch is sent back to the broker for retry, so the business logic must make re-consumption idempotent.

    The run method of ConsumeRequest:

        @Override
        public void run() {
            ....

            try {
                ConsumeMessageConcurrentlyService.this.resetRetryTopic(msgs);
                if (msgs != null && !msgs.isEmpty()) {
                    for (MessageExt msg : msgs) {
                        MessageAccessor.setConsumeStartTimeStamp(msg, String.valueOf(System.currentTimeMillis()));
                    }
                }
                //NO.1 invoke the business logic
                status = listener.consumeMessage(Collections.unmodifiableList(msgs), context);
            } catch (Throwable e) {
                log.warn("consumeMessage exception: {} Group: {} Msgs: {} MQ: {}",
                        RemotingHelper.exceptionSimpleDesc(e),
                        ConsumeMessageConcurrentlyService.this.consumerGroup,
                        msgs,
                        messageQueue);
                hasException = true;
            }

            ...

            if (!processQueue.isDropped()) {
                //NO.2 handle the consume result
                ConsumeMessageConcurrentlyService.this.processConsumeResult(status, context, this);
            } else {
                log.warn("processQueue is dropped without process consume result. messageQueue={}, msgs={}", messageQueue, msgs);
            }
        }

We can set the maximum batch size to 1, so that each message is retried individually, but performance will certainly be worse than batch processing; see the sketch below.
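
For instance (the group name is made up):

DefaultMQPushConsumer consumer = new DefaultMQPushConsumer("demo_consumer_group");
// hand messages to the listener one at a time, so that RECONSUME_LATER (or
// an exception) sends back only that single message instead of a whole batch
consumer.setConsumeMessageBatchMaxSize(1);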

The ack mechanism

    public void processConsumeResult(
            final ConsumeConcurrentlyStatus status,
            final ConsumeConcurrentlyContext context,
            final ConsumeRequest consumeRequest
    ) {
        int ackIndex = context.getAckIndex();

        if (consumeRequest.getMsgs().isEmpty())
            return;

        switch (status) {
            case CONSUME_SUCCESS:
                if (ackIndex >= consumeRequest.getMsgs().size()) {
                    ackIndex = consumeRequest.getMsgs().size() - 1;
                }
                int ok = ackIndex + 1;
                int failed = consumeRequest.getMsgs().size() - ok;
                this.getConsumerStatsManager().incConsumeOKTPS(consumerGroup, consumeRequest.getMessageQueue().getTopic(), ok);
                this.getConsumerStatsManager().incConsumeFailedTPS(consumerGroup, consumeRequest.getMessageQueue().getTopic(), failed);
                break;
            case RECONSUME_LATER:
                ackIndex = -1;
                this.getConsumerStatsManager().incConsumeFailedTPS(consumerGroup, consumeRequest.getMessageQueue().getTopic(),
                        consumeRequest.getMsgs().size());
                break;
            default:
                break;
        }

        switch (this.defaultMQPushConsumer.getMessageModel()) {
            case BROADCASTING:
                for (int i = ackIndex + 1; i < consumeRequest.getMsgs().size(); i++) {
                    MessageExt msg = consumeRequest.getMsgs().get(i);
                    log.warn("BROADCASTING, the message consume failed, drop it, {}", msg.toString());
                }
                break;
            case CLUSTERING:
                //send the failed messages back to the broker so this batch is retried
                List<MessageExt> msgBackFailed = new ArrayList<MessageExt>(consumeRequest.getMsgs().size());
                for (int i = ackIndex + 1; i < consumeRequest.getMsgs().size(); i++) {
                    MessageExt msg = consumeRequest.getMsgs().get(i);
                    boolean result = this.sendMessageBack(msg, context);
                    if (!result) {
                        msg.setReconsumeTimes(msg.getReconsumeTimes() + 1);
                        msgBackFailed.add(msg);
                    }
                }

                if (!msgBackFailed.isEmpty()) {
                    consumeRequest.getMsgs().removeAll(msgBackFailed);

                    this.submitConsumeRequestLater(msgBackFailed, consumeRequest.getProcessQueue(), consumeRequest.getMessageQueue());
                }
                break;
            default:
                break;
        }

        //remove the consumed messages from the message tree, return the smallest remaining node, and update it as the current progress!!
        long offset = consumeRequest.getProcessQueue().removeMessage(consumeRequest.getMsgs());
        if (offset >= 0 && !consumeRequest.getProcessQueue().isDropped()) {
            this.defaultMQPushConsumerImpl.getOffsetStore().updateOffset(consumeRequest.getMessageQueue(), offset, true);
        }
    }

As you can see, after consumption completes, the progress that gets updated corresponds to the lowest node remaining in the ProcessQueue's message tree (i.e. the minimum offset node). You might sense a problem here; the passage below, quoted from the reference material, explains it.

This approach is fundamentally different from the traditional scheme of acking each message individually. Along with the performance gain comes a potential duplication problem: because the consumption progress records only a single index, you can pull 100 messages, say offsets 2101-2200, finish the last 99, and have only 2101 still unfinished.

In that situation, to guarantee the message is eventually consumed, RocketMQ keeps the consumption progress at 2101; only when 2101 also finishes can the local progress mark everything up to 2200 as consumed (note: consumerOffset=2201).

Under this design there is a risk of consuming large numbers of duplicates. Suppose the consumer instance exits abruptly (power failure, or killed) while 2101 is still unfinished. The queue's progress stays at 2101, and when the queue is reassigned to a new instance, the progress it fetches from the broker is still 2101, so it starts consuming from 2101 again: messages 2102-2200, although already consumed, are delivered once more.
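
Given this risk, the consumer side should deduplicate on a stable business key. A minimal in-memory sketch (handleOnce and the choice of key are illustrative assumptions; production code would use a durable store such as a database unique constraint):

import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.rocketmq.common.message.MessageExt;

public class IdempotentHandler {

    // keys already processed; in-memory only, so it is lost on restart --
    // which is exactly why production code needs a durable store
    private final Set<String> processed = ConcurrentHashMap.newKeySet();

    public void consume(MessageExt msg) {
        // use a business key set by the producer via setKeys(); unlike
        // broker-side identifiers, it stays stable across retries
        String key = msg.getKeys();
        if (key != null && !processed.add(key)) {
            return; // duplicate delivery, skip it
        }
        handleOnce(msg); // hypothetical business logic
    }

    private void handleOnce(MessageExt msg) {
        // ...
    }
}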

Summary

    I learned all of this from the reference material. Where this write-up differs from other people's is that it condenses the code snippets and summarizes the core logical steps, to deepen the understanding of the logic.

Reference material

  • https://www.jianshu.com/p/5843cdcd02aa
  • http://jaskey.github.io/blog/2017/01/25/rocketmq-consume-offset-management/
