Exploring the RocketMQ source code [1]: Transaction messages from the perspective of the Producer

Preface

Apache RocketMQ, a well-known open source messaging middleware, originated at Alibaba and was donated to Apache in 2016. From RocketMQ 4.0 to the latest release, v4.7.1, it has attracted wide attention and praise both inside Alibaba and in the external community.
Out of interest and for work, I recently studied part of the RocketMQ 4.7.1 code. Along the way I ran into many confusing points, and I also learned a great deal.

This article takes the sender's perspective and, by reading the RocketMQ Producer source code, analyzes how RocketMQ sends transaction messages. Note that all code quoted here comes from RocketMQ 4.7.1. "Sending" in this article refers only to the path from Producer to Broker; it does not cover the Broker delivering messages to the Consumer.

Macro overview

RocketMQ transaction message sending process:

In the source code, the sendMessageInTransaction method of TransactionMQProducer, RocketMQ's transaction message producer, delegates to the sendMessageInTransaction method of DefaultMQProducerImpl. Inside that method, the whole sending process of a transaction message is clearly visible:

First, check the arguments before sending and fill in the necessary properties, including the flag that marks the message as a prepared (half) transaction message.

Source list-1

public TransactionSendResult sendMessageInTransaction(final Message msg,
    final LocalTransactionExecuter localTransactionExecuter, final Object arg)
    throws MQClientException {
    TransactionListener transactionListener = getCheckListener();
    if (null == localTransactionExecuter && null == transactionListener) {
        throw new MQClientException("tranExecutor is null", null);
    }

    // ignore DelayTimeLevel parameter
    if (msg.getDelayTimeLevel() != 0) {
        MessageAccessor.clearProperty(msg, MessageConst.PROPERTY_DELAY_TIME_LEVEL);
    }

    Validators.checkMessage(msg, this.defaultMQProducer);

    SendResult sendResult = null;
    MessageAccessor.putProperty(msg, MessageConst.PROPERTY_TRANSACTION_PREPARED, "true");
    MessageAccessor.putProperty(msg, MessageConst.PROPERTY_PRODUCER_GROUP, this.defaultMQProducer.getProducerGroup());

Then send the half message:

Source list-2

    try {
        sendResult = this.send(msg);
    } catch (Exception e) {
        throw new MQClientException("send message Exception", e);
    }

Decide whether to execute the local transaction based on the result returned by the broker. The local transaction starts only if the half message was sent successfully:

Source list-3

    LocalTransactionState localTransactionState = LocalTransactionState.UNKNOW;
    Throwable localException = null;
    switch (sendResult.getSendStatus()) {
        case SEND_OK: {
            try {
                if (sendResult.getTransactionId() != null) {
                    msg.putUserProperty("__transactionId__", sendResult.getTransactionId());
                }
                String transactionId = msg.getProperty(MessageConst.PROPERTY_UNIQ_CLIENT_MESSAGE_ID_KEYIDX);
                if (null != transactionId && !"".equals(transactionId)) {
                    msg.setTransactionId(transactionId);
                }
                if (null != localTransactionExecuter) { 
                    localTransactionState = localTransactionExecuter.executeLocalTransactionBranch(msg, arg);
                } else if (transactionListener != null) { 
                    log.debug("Used new transaction API");
                    localTransactionState = transactionListener.executeLocalTransaction(msg, arg); 
                }
                if (null == localTransactionState) {
                    localTransactionState = LocalTransactionState.UNKNOW;
                }

                if (localTransactionState != LocalTransactionState.COMMIT_MESSAGE) {
                    log.info("executeLocalTransactionBranch return {}", localTransactionState);
                    log.info(msg.toString());
                }
            } catch (Throwable e) {
                log.info("executeLocalTransactionBranch exception", e);
                log.info(msg.toString());
                localException = e;
            }
        }
        break;
        case FLUSH_DISK_TIMEOUT:
        case FLUSH_SLAVE_TIMEOUT:
        case SLAVE_NOT_AVAILABLE:  // When the slave broker is unavailable, roll back the half message and do not execute the local transaction
            localTransactionState = LocalTransactionState.ROLLBACK_MESSAGE;
            break;
        default:
            break;
    }

After the local transaction finishes, phase-two processing is performed according to the local transaction state:

Source list-4

    try {
        this.endTransaction(sendResult, localTransactionState, localException);
    } catch (Exception e) {
        log.warn("local transaction execute " + localTransactionState + ", but end broker transaction failed", e);
    }

    // Assemble the send result
    // ...
    return transactionSendResult;
}
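
The control flow of Source list-1 to Source list-4 can be condensed into a small, self-contained sketch. It does not use the RocketMQ client library: the enum constants mirror names from the listings, while `TransactionFlowSketch`, `LocalTransaction`, and `decide` are hypothetical names introduced only for illustration. It also catches `Exception` rather than `Throwable`, which is a simplification.

```java
/** Simplified model of the decision made in sendMessageInTransaction (not the real client code). */
final class TransactionFlowSketch {

    // Enum constants mirror the names used in the listings above.
    enum SendStatus { SEND_OK, FLUSH_DISK_TIMEOUT, FLUSH_SLAVE_TIMEOUT, SLAVE_NOT_AVAILABLE }

    enum LocalTransactionState { COMMIT_MESSAGE, ROLLBACK_MESSAGE, UNKNOW }

    /** Hypothetical stand-in for the user's local transaction callback. */
    interface LocalTransaction {
        LocalTransactionState execute() throws Exception;
    }

    /** Returns the state that would be reported to the broker in phase two. */
    static LocalTransactionState decide(SendStatus halfMessageStatus, LocalTransaction localTx) {
        if (halfMessageStatus != SendStatus.SEND_OK) {
            // Half message not stored reliably: roll back and never run the local transaction.
            return LocalTransactionState.ROLLBACK_MESSAGE;
        }
        try {
            LocalTransactionState state = localTx.execute();
            // A null result is treated as UNKNOW, as in Source list-3.
            return state != null ? state : LocalTransactionState.UNKNOW;
        } catch (Exception e) {
            // An exception leaves the state at UNKNOW; the broker resolves it later by checking back.
            return LocalTransactionState.UNKNOW;
        }
    }

    public static void main(String[] args) {
        System.out.println(decide(SendStatus.SEND_OK, () -> LocalTransactionState.COMMIT_MESSAGE));
        System.out.println(decide(SendStatus.SLAVE_NOT_AVAILABLE, () -> LocalTransactionState.COMMIT_MESSAGE));
        System.out.println(decide(SendStatus.SEND_OK, () -> { throw new IllegalStateException("db down"); }));
    }
}
```

The point of the sketch is that the local transaction runs only after a SEND_OK half message, and that every other path still ends with a state being reported in phase two.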

Next, we analyze the code of each phase in depth.

In depth

Phase-one sending

Focus on the send method. Stepping into it, we find that phase one of a RocketMQ transaction message uses the synchronous SYNC mode:

Source list-5

public SendResult send(Message msg,
    long timeout) throws MQClientException, RemotingException, MQBrokerException, InterruptedException {
    return this.sendDefaultImpl(msg, CommunicationMode.SYNC, null, timeout);
}

This is easy to understand. A transaction message decides whether to execute the local transaction based on the result of phase one, so the sender must block and wait for the broker's ack.

Let's open DefaultMQProducerImpl.java and look at the implementation of sendDefaultImpl. By reading this method we will try to understand how the producer behaves during phase one of sending a transaction message. Note that this method is not specific to transaction messages, or even to SYNC mode, so after reading it you will have a fairly complete picture of RocketMQ's message sending mechanism.
The logic of this code flows so smoothly that I could not bring myself to cut it into pieces. To save space, the more complicated but less informative parts are replaced with comments, keeping the overall flow as intact as possible. The parts that I consider important or easy to overlook are marked with notes, and some details are explained later.

Source list-6

private SendResult sendDefaultImpl(
    Message msg,
    final CommunicationMode communicationMode,
    final SendCallback sendCallback,
    final long timeout
    ) throws MQClientException, RemotingException, MQBrokerException, InterruptedException {
    this.makeSureStateOK();
    // 1. Message validity check. See below
    Validators.checkMessage(msg, this.defaultMQProducer);
    final long invokeID = random.nextLong();
    long beginTimestampFirst = System.currentTimeMillis();
    long beginTimestampPrev = beginTimestampFirst;
    long endTimestamp = beginTimestampFirst;

    // Get the publish route info for the current topic (mainly the brokers); if it is not found locally, fetch it from the name server
    TopicPublishInfo topicPublishInfo = this.tryToFindTopicPublishInfo(msg.getTopic());
    if (topicPublishInfo != null && topicPublishInfo.ok()) {
        boolean callTimeout = false;
        MessageQueue mq = null;
        Exception exception = null;
        SendResult sendResult = null;
        // 2. Send retry mechanism. See below
        int timesTotal = communicationMode == CommunicationMode.SYNC ? 1 + this.defaultMQProducer.getRetryTimesWhenSendFailed() : 1;
        int times = 0;
        String[] brokersSent = new String[timesTotal];
        for (; times < timesTotal; times++) {
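            // On the first attempt mq == null; on later attempts it carries the previous broker's info
            String lastBrokerName = null == mq ? null : mq.getBrokerName();
            // 3. How does RocketMQ choose a queue when sending? The broker fault avoidance mechanism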
            MessageQueue mqSelected = this.selectOneMessageQueue(topicPublishInfo, lastBrokerName);

            if (mqSelected != null) {
                mq = mqSelected;
                brokersSent[times] = mq.getBrokerName();
                try {
                    beginTimestampPrev = System.currentTimeMillis();
                    if (times > 0) {
                        //Reset topic with namespace during resend.
                        msg.setTopic(this.defaultMQProducer.withNamespace(msg.getTopic()));
                    }
                    long costTime = beginTimestampPrev - beginTimestampFirst;
                    if (timeout < costTime) {
                        callTimeout = true;
                        break;
                    }
                    // Core sending code
                    sendResult = this.sendKernelImpl(msg, mq, communicationMode, sendCallback, topicPublishInfo, timeout - costTime);
                    endTimestamp = System.currentTimeMillis();
                    // Avoidance mechanism used when RocketMQ selects a broker; takes effect only when sendLatencyFaultEnable == true
                    this.updateFaultItem(mq.getBrokerName(), endTimestamp - beginTimestampPrev, false);

                    switch (communicationMode) {
                    // 4. RocketMQ's three CommunicationModes. See below
                        case ASYNC: // asynchronous mode
                            return null;
                        case ONEWAY: // one-way mode
                            return null;
                        case SYNC: // synchronous mode
                            if (sendResult.getSendStatus() != SendStatus.SEND_OK) {
                                if (this.defaultMQProducer.isRetryAnotherBrokerWhenNotStoreOK()) {
                                    continue;
                                }
                            }
                            return sendResult;
                        default:
                            break;
                    }
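                } catch (RemotingException e) {
                    // ...
                    // retry automatically
                } catch (MQClientException e) {
                    // ...
                    // retry automatically
                } catch (MQBrokerException e) {
                    // ...
                    // retry automatically only when the response code == NOT_IN_CURRENT_UNIT == 205
                    // otherwise do not retry; throw the exception
                } catch (InterruptedException e) {
                    // ...
                    // do not retry; throw the exception
                }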
            } else {
                break;
            }
        }

        if (sendResult != null) {
            return sendResult;
        }

        // Assemble the info message, which is finally thrown as an MQClientException
        // ... ...

        // On timeout, throw RemotingTooMuchRequestException
        if (callTimeout) {
            throw new RemotingTooMuchRequestException("sendDefaultImpl call timeout");
        }

        // Fill in the MQClientException details
        // ...
    }

    validateNameServerSetting();

    throw new MQClientException("No route info of this topic: " + msg.getTopic() + FAQUrl.suggestTodo(FAQUrl.NO_TOPIC_ROUTE_INFO),
        null).setResponseCode(ClientErrorCode.NOT_FOUND_TOPIC_EXCEPTION);
}

1. Message validity check:

Source list-7

 Validators.checkMessage(msg, this.defaultMQProducer);

This method checks the validity of the message, covering both the topic and the message body. The topic name must conform to the naming rules and must not be one of the built-in system topics. The body must satisfy 0 < body length <= maxMessageSize, which defaults to 1024 * 1024 * 4 bytes = 4 MB.

Source list-8

public static void checkMessage(Message msg, DefaultMQProducer defaultMQProducer)
    throws MQClientException {
    if (null == msg) {
        throw new MQClientException(ResponseCode.MESSAGE_ILLEGAL, "the message is null");
    }
    // topic
    Validators.checkTopic(msg.getTopic());
    Validators.isNotAllowedSendTopic(msg.getTopic());

    // body
    if (null == msg.getBody()) {
        throw new MQClientException(ResponseCode.MESSAGE_ILLEGAL, "the message body is null");
    }

    if (0 == msg.getBody().length) {
        throw new MQClientException(ResponseCode.MESSAGE_ILLEGAL, "the message body length is zero");
    }

    if (msg.getBody().length > defaultMQProducer.getMaxMessageSize()) {
        throw new MQClientException(ResponseCode.MESSAGE_ILLEGAL,
            "the message body size over max value, MAX: " + defaultMQProducer.getMaxMessageSize());
    }
}

2. Send retry mechanism

The Producer retries automatically when a message is not sent successfully. The maximum number of attempts is retryTimesWhenSendFailed + 1, which is 3 with the default setting.

Note that not every failure is retried. From the source code above, an automatic retry happens in the following three situations:
1) A RemotingException or an MQClientException occurs
2) An MQBrokerException occurs and its ResponseCode is NOT_IN_CURRENT_UNIT = 205
3) In SYNC mode, no exception occurs but the status of the send result is not SEND_OK

Before each attempt, the Producer first checks whether the time already spent exceeds the timeout (3000 ms by default). If it does, the Producer stops sending and reports a timeout without retrying. This has two implications:
1) The automatic retries inside the producer are invisible to the business application, and the send time seen by the application includes the time consumed by all retries;
2) Once the timeout is reached, this send has failed because of the timeout. The failure is finally thrown as a RemotingTooMuchRequestException.
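
The two points above follow from the fact that all attempts share a single time budget. The following is a minimal sketch of that idea, not the real sendDefaultImpl: `RetryBudgetSketch`, `Attempt`, and `sendWithRetry` are hypothetical names, every exception is treated as retryable for simplicity, and the clock is injected so the behavior can be checked deterministically.

```java
import java.util.function.LongSupplier;

/** Simplified model of the retry loop in sendDefaultImpl: all attempts share one timeout budget. */
final class RetryBudgetSketch {

    /** Hypothetical stand-in for one send attempt; returns true when the broker stored the message. */
    interface Attempt {
        boolean send(long remainingMillis) throws Exception;
    }

    /**
     * Makes up to 1 + retryTimes attempts. Returns the number of attempts used on success,
     * or -1 if the budget ran out or every attempt failed.
     */
    static int sendWithRetry(Attempt attempt, int retryTimes, long timeoutMillis, LongSupplier clock) {
        long begin = clock.getAsLong();
        int timesTotal = 1 + retryTimes;
        for (int times = 0; times < timesTotal; times++) {
            long cost = clock.getAsLong() - begin;
            if (timeoutMillis < cost) {
                return -1; // budget exhausted: stop instead of retrying
            }
            try {
                // Each attempt only gets the time that is left, not a fresh timeout.
                if (attempt.send(timeoutMillis - cost)) {
                    return times + 1;
                }
            } catch (Exception e) {
                // Treated as retryable in this sketch; the real code distinguishes exception types.
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Succeeds on the third attempt, well within a 3000 ms budget.
        int attempts = sendWithRetry(remaining -> ++calls[0] == 3, 2, 3000, System::currentTimeMillis);
        System.out.println("attempts = " + attempts);
    }
}
```

Because the remaining time shrinks with every attempt, a slow first attempt can leave no room for a second one, which is why a timeout ends the send rather than triggering another retry.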

It should be pointed out that the official RocketMQ documentation gives the send timeout as 10 s, that is, 10000 ms, and many people online also believe the timeout is 10 s. However, the code clearly says 3000 ms, and after debugging I confirmed that the default timeout is indeed 3000 ms. I suggest that the RocketMQ team check the documentation and, if it is wrong, correct it as soon as possible.

3. Broker fault avoidance mechanism

Source list-8

MessageQueue mqSelected = this.selectOneMessageQueue(topicPublishInfo, lastBrokerName);  

This line of code selects the queue before sending.

It involves latencyFaultTolerance, a core mechanism behind the high availability of RocketMQ message sending. The mechanism is part of the producer's load balancing and is controlled by sendLatencyFaultEnable. The default is false, which leaves the broker fault-latency mechanism off; setting it to true on the Producer turns the mechanism on.

With fault avoidance enabled, queue selection takes each broker's working state into account and avoids brokers that are currently unhealthy; an unhealthy broker is avoided for a period of time. With fault avoidance disabled, the next queue is selected in order, but on a retry the Producer tries to pick a queue on a broker different from the one used last time. After every send, the broker's state information is maintained through the updateFaultItem method.

Source list-9

public void updateFaultItem(final String brokerName, final long currentLatency, boolean isolation) {
    if (this.sendLatencyFaultEnable) {
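        // Compute how long to avoid this broker. isolation says whether the broker should be isolated: if so, a latency of 30s is assumed,
        // the latency table is searched backward for the first threshold not above 30s, and its index gives the avoidance period (10 min for 30s);
        // otherwise the avoidance duration is determined by the latency of the last send.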
        long duration = computeNotAvailableDuration(isolation ? 30000 : currentLatency);
        this.latencyFaultTolerance.updateFaultItem(brokerName, currentLatency, duration);
    }
}  
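
The comment above describes a table lookup. A minimal sketch of it follows. The two arrays are what I believe MQFaultStrategy uses in RocketMQ 4.x, but they are reproduced from memory and should be treated as assumptions to verify against your version; `FaultDurationSketch` is a hypothetical class name.

```java
/** Sketch of how a send latency is mapped to an avoidance duration (table values are assumptions). */
final class FaultDurationSketch {

    // Assumed to match MQFaultStrategy in RocketMQ 4.x; verify against your version.
    static final long[] LATENCY_MAX = {50L, 100L, 550L, 1000L, 2000L, 3000L, 15000L};
    static final long[] NOT_AVAILABLE_DURATION = {0L, 0L, 30000L, 60000L, 120000L, 180000L, 600000L};

    /** Scans the thresholds from largest to smallest and returns the matching avoidance time in ms. */
    static long computeNotAvailableDuration(long currentLatency) {
        for (int i = LATENCY_MAX.length - 1; i >= 0; i--) {
            if (currentLatency >= LATENCY_MAX[i]) {
                return NOT_AVAILABLE_DURATION[i];
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(computeNotAvailableDuration(30000)); // isolation case, prints 600000
        System.out.println(computeNotAvailableDuration(600));   // slow but successful send, prints 30000
        System.out.println(computeNotAvailableDuration(20));    // fast send, prints 0
    }
}
```

Under these assumed values, a send that takes less than 550 ms never causes the broker to be avoided, while an isolated broker (latency treated as 30 s) is avoided for 10 minutes.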

Let's step into the selectOneMessageQueue method to see how this works:

Source list-10

public MessageQueue selectOneMessageQueue(final TopicPublishInfo tpInfo, final String lastBrokerName) {
    if (this.sendLatencyFaultEnable) {
        // Fault avoidance enabled
        try {
            int index = tpInfo.getSendWhichQueue().getAndIncrement();
            for (int i = 0; i < tpInfo.getMessageQueueList().size(); i++) {
                int pos = Math.abs(index++) % tpInfo.getMessageQueueList().size();
                if (pos < 0)
                    pos = 0;
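                // Take the next message queue in order as the candidate send queue
                MessageQueue mq = tpInfo.getMessageQueueList().get(pos);
                // If the broker of this queue is available and is the same as the broker of the previous queue,
                // or this is the first send, use this queue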
                if (latencyFaultTolerance.isAvailable(mq.getBrokerName())) {
                    if (null == lastBrokerName || mq.getBrokerName().equals(lastBrokerName))
                        return mq;
                }
            }

            final String notBestBroker = latencyFaultTolerance.pickOneAtLeast();
            int writeQueueNums = tpInfo.getQueueIdByBroker(notBestBroker);
            if (writeQueueNums > 0) {
                final MessageQueue mq = tpInfo.selectOneMessageQueue();
                if (notBestBroker != null) {
                    mq.setBrokerName(notBestBroker);
                    mq.setQueueId(tpInfo.getSendWhichQueue().getAndIncrement() % writeQueueNums);
                }
                return mq;
            } else {
                latencyFaultTolerance.remove(notBestBroker);
            }
        } catch (Exception e) {
            log.error("Error occurred when selecting message queue", e);
        }

        return tpInfo.selectOneMessageQueue();
    }
    // Fault avoidance disabled: pick the queue by incrementing a counter that starts from a random value
    return tpInfo.selectOneMessageQueue(lastBrokerName);
}
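
The last line of the listing delegates to tpInfo.selectOneMessageQueue(lastBrokerName), whose body is not shown here. The sketch below illustrates the behavior described earlier, round-robin selection that skips the broker used by the previous attempt. It is an illustration under assumptions rather than the library code: `QueueSelectSketch` and `Queue` are hypothetical types, and the counter starts at 0 instead of a random value so that the result is predictable.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch of round-robin queue selection that skips the broker used by the previous attempt. */
final class QueueSelectSketch {

    /** Minimal stand-in for MessageQueue: a broker name plus a queue id. */
    static final class Queue {
        final String brokerName;
        final int queueId;

        Queue(String brokerName, int queueId) {
            this.brokerName = brokerName;
            this.queueId = queueId;
        }
    }

    private final List<Queue> queues;
    private final AtomicInteger sendWhichQueue = new AtomicInteger(0);

    QueueSelectSketch(List<Queue> queues) {
        this.queues = queues;
    }

    Queue select(String lastBrokerName) {
        int index = sendWhichQueue.getAndIncrement();
        for (int i = 0; i < queues.size(); i++) {
            // floorMod keeps the position non-negative even after the counter overflows.
            Queue q = queues.get(Math.floorMod(index + i, queues.size()));
            if (lastBrokerName == null || !q.brokerName.equals(lastBrokerName)) {
                return q;
            }
        }
        // Every queue lives on the last broker: fall back to plain round-robin.
        return queues.get(Math.floorMod(index, queues.size()));
    }

    public static void main(String[] args) {
        QueueSelectSketch selector = new QueueSelectSketch(Arrays.asList(
            new Queue("broker-a", 0), new Queue("broker-a", 1),
            new Queue("broker-b", 0), new Queue("broker-b", 1)));
        Queue first = selector.select(null);              // first attempt: no broker to avoid
        Queue retry = selector.select(first.brokerName);  // retry: avoid the broker just used
        System.out.println(first.brokerName + " -> " + retry.brokerName);
    }
}
```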

4. RocketMQ's three CommunicationModes

Source list-11

 public enum CommunicationMode {
    SYNC,
    ASYNC,
    ONEWAY,
}

The three modes above refer to the phase in which the message travels from the sender to the broker, not the process of the broker delivering the message to subscribers.
The differences among the three sending modes are:

  • One-way mode: ONEWAY. The sender just sends the message and does not care about the result of the broker's processing. Because the processing flow is short, sending takes very little time and throughput is high, but there is no guarantee that the message is delivered reliably and not lost. It is often used for high-volume, unimportant messages, such as heartbeats.
  • Asynchronous mode: ASYNC. After sending the message to the broker, the sender does not wait for the broker to process it. An asynchronous thread handles the send, and when it completes, the sender is notified of the result through a callback. If an exception occurs during asynchronous processing, the send is retried internally before a failure is reported to the sender (up to 3 attempts by default, invisible to the sender). In this mode the sender waits very little, throughput is high, and the message is reliable. It suits high-volume, important messages.
  • Synchronous mode: SYNC. The sender waits for the broker to finish processing and to explicitly return success or failure. Before the sender receives a failure result, the send also goes through internal retries (up to 3 attempts by default, invisible to the sender). In this mode the sender blocks while waiting for the result, so the wait is longer, but the message is reliable. It suits low-volume, important messages. It should be emphasized that phase one of a transaction message, the half message, is sent in synchronous mode.
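
To make the difference in calling style concrete, here is a toy model of the three modes built only on the JDK. It is not the RocketMQ API: `CommunicationModeSketch`, `FakeBroker`, and the three send methods are hypothetical, and retries are left out.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.BiConsumer;

/** Toy model of the three send modes; FakeBroker is a hypothetical in-memory stand-in. */
final class CommunicationModeSketch {

    static final class FakeBroker {
        String store(String body) {
            if (body.isEmpty()) {
                throw new IllegalArgumentException("empty body");
            }
            return "SEND_OK:" + body;
        }
    }

    private final FakeBroker broker = new FakeBroker();

    /** SYNC: the caller blocks until the broker answers (or throws). */
    String sendSync(String body) {
        return broker.store(body);
    }

    /** ASYNC: the call returns at once; the outcome arrives later through the callback. */
    CompletableFuture<String> sendAsync(String body, BiConsumer<String, Throwable> callback) {
        return CompletableFuture.supplyAsync(() -> broker.store(body)).whenComplete(callback);
    }

    /** ONEWAY: fire and forget; neither the result nor a failure is reported to the caller. */
    void sendOneway(String body) {
        CompletableFuture.runAsync(() -> broker.store(body));
    }

    public static void main(String[] args) {
        CommunicationModeSketch producer = new CommunicationModeSketch();
        System.out.println("sync result: " + producer.sendSync("order-1"));
        producer.sendAsync("order-2", (result, error) ->
            System.out.println("async callback: " + (error == null ? result : error))).join();
        producer.sendOneway("heartbeat"); // nothing to observe on the caller's side
    }
}
```

Only the synchronous call gives the caller a result it can branch on immediately, which is what phase one of a transaction message needs.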

The concrete differences can also be seen in the sendKernelImpl method. ONEWAY mode is the simplest and does nothing with the result. Compared with the synchronous call, the asynchronous call to sendMessage passes several extra arguments: the callback (sendCallback), topicPublishInfo with the topic's publish routing metadata, the client instance (mQClientFactory) that holds the broker information, and the retry count (retryTimesWhenSendAsyncFailed). In addition, in asynchronous mode a message whose body has been compressed is cloned first.

Source list-12

    switch (communicationMode) {
                case ASYNC:
                    Message tmpMessage = msg;
                    boolean messageCloned = false;
                    if (msgBodyCompressed) {
                        //If msg body was compressed, msgbody should be reset using prevBody.
                        //Clone new message using commpressed message body and recover origin massage.
                        //Fix bug:https://github.com/apache/rocketmq-externals/issues/66
                        tmpMessage = MessageAccessor.cloneMessage(msg);
                        messageCloned = true;
                        msg.setBody(prevBody);
                    }

                    if (topicWithNamespace) {
                        if (!messageCloned) {
                            tmpMessage = MessageAccessor.cloneMessage(msg);
                            messageCloned = true;
                        }
                        msg.setTopic(NamespaceUtil.withoutNamespace(msg.getTopic(), this.defaultMQProducer.getNamespace()));
                    }

                    long costTimeAsync = System.currentTimeMillis() - beginStartTime;
                    if (timeout < costTimeAsync) {
                        throw new RemotingTooMuchRequestException("sendKernelImpl call timeout");
                    }
                    sendResult = this.mQClientFactory.getMQClientAPIImpl().sendMessage(
                        brokerAddr,
                        mq.getBrokerName(),
                        tmpMessage,
                        requestHeader,
                        timeout - costTimeAsync,
                        communicationMode,
                        sendCallback,
                        topicPublishInfo,
                        this.mQClientFactory,
                        this.defaultMQProducer.getRetryTimesWhenSendAsyncFailed(),
                        context,
                        this);
                    break;
                case ONEWAY:
                case SYNC:
                    long costTimeSync = System.currentTimeMillis() - beginStartTime;
                    if (timeout < costTimeSync) {
                        throw new RemotingTooMuchRequestException("sendKernelImpl call timeout");
                    }
                    sendResult = this.mQClientFactory.getMQClientAPIImpl().sendMessage(
                        brokerAddr,
                        mq.getBrokerName(),
                        msg,
                        requestHeader,
                        timeout - costTimeSync,
                        communicationMode,
                        context,
                        this);
                    break;
                default:
                    assert false;
                    break;
            } 

The official documentation includes a diagram that describes the asynchronous communication process in detail (the image is not reproduced here).

Phase-two sending

Source list-3 shows the execution of the local transaction. The localTransactionState variable links the result of the local transaction to the phase-two sending of the transaction message.
Note that if the phase-one send result is SLAVE_NOT_AVAILABLE, that is, the slave broker is unavailable, localTransactionState is set to ROLLBACK_MESSAGE and the local transaction is not executed. After that, the endTransaction method handles the phase-two submission (see Source list-4). Its implementation is as follows:

Source list-13

public void endTransaction(
    final SendResult sendResult,
    final LocalTransactionState localTransactionState,
    final Throwable localException) throws RemotingException, MQBrokerException, InterruptedException, UnknownHostException {
    final MessageId id;
    if (sendResult.getOffsetMsgId() != null) {
        id = MessageDecoder.decodeMessageId(sendResult.getOffsetMsgId());
    } else {
        id = MessageDecoder.decodeMessageId(sendResult.getMsgId());
    }
    String transactionId = sendResult.getTransactionId();
    final String brokerAddr = this.mQClientFactory.findBrokerAddressInPublish(sendResult.getMessageQueue().getBrokerName());
    EndTransactionRequestHeader requestHeader = new EndTransactionRequestHeader();
    requestHeader.setTransactionId(transactionId);
    requestHeader.setCommitLogOffset(id.getOffset());
    switch (localTransactionState) {
        case COMMIT_MESSAGE:
            requestHeader.setCommitOrRollback(MessageSysFlag.TRANSACTION_COMMIT_TYPE);
            break;
        case ROLLBACK_MESSAGE:
            requestHeader.setCommitOrRollback(MessageSysFlag.TRANSACTION_ROLLBACK_TYPE);
            break;
        case UNKNOW:
            requestHeader.setCommitOrRollback(MessageSysFlag.TRANSACTION_NOT_TYPE);
            break;
        default:
            break;
    }

    requestHeader.setProducerGroup(this.defaultMQProducer.getProducerGroup());
    requestHeader.setTranStateTableOffset(sendResult.getQueueOffset());
    requestHeader.setMsgId(sendResult.getMsgId());
    String remark = localException != null ? ("executeLocalTransactionBranch exception: " + localException.toString()) : null;
    // Send the phase-two message in oneway mode
    this.mQClientFactory.getMQClientAPIImpl().endTransactionOneway(brokerAddr, requestHeader, remark,
        this.defaultMQProducer.getSendMsgTimeout());
}
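
The switch in endTransaction maps the local transaction state to a commitOrRollback value. The sketch below shows that mapping with concrete numbers. The constant values are what I believe MessageSysFlag defines in RocketMQ 4.x (a two-bit field shifted left by two), but they are written from memory and should be treated as assumptions; `TransactionFlagSketch`, `toCommitOrRollback`, and `transactionValue` are hypothetical names.

```java
/** Sketch of the transaction-type bits carried in a message flag word (values are assumptions). */
final class TransactionFlagSketch {

    // Assumed to mirror MessageSysFlag in RocketMQ 4.x: two bits, shifted left by two.
    static final int TRANSACTION_NOT_TYPE = 0;
    static final int TRANSACTION_PREPARED_TYPE = 0x1 << 2;
    static final int TRANSACTION_COMMIT_TYPE = 0x2 << 2;
    static final int TRANSACTION_ROLLBACK_TYPE = 0x3 << 2;

    enum LocalTransactionState { COMMIT_MESSAGE, ROLLBACK_MESSAGE, UNKNOW }

    /** Maps the local transaction result to the commitOrRollback value, as in the switch above. */
    static int toCommitOrRollback(LocalTransactionState state) {
        switch (state) {
            case COMMIT_MESSAGE:
                return TRANSACTION_COMMIT_TYPE;
            case ROLLBACK_MESSAGE:
                return TRANSACTION_ROLLBACK_TYPE;
            default:
                return TRANSACTION_NOT_TYPE;
        }
    }

    /** Extracts the transaction-type bits from a flag word that may carry other bits too. */
    static int transactionValue(int sysFlag) {
        // The rollback value has both bits set, so it doubles as the mask for the field.
        return sysFlag & TRANSACTION_ROLLBACK_TYPE;
    }

    public static void main(String[] args) {
        for (LocalTransactionState state : LocalTransactionState.values()) {
            System.out.println(state + " -> " + toCommitOrRollback(state));
        }
    }
}
```

UNKNOW maps to TRANSACTION_NOT_TYPE, so the broker learns nothing definite from such a request and the message stays a half message until a later check-back resolves it.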

Phase two is sent in oneway mode. My personal understanding is that this is acceptable precisely because transaction messages have a dedicated reliability mechanism: the check-back.

Message check-back

When a certain period of time has passed and the Broker still has not received a definite phase-two instruction to commit or roll back a transaction message, it cannot know what happened to the Producer (the producer may be down, or it may have sent a commit that was lost to network jitter, and so on), so it takes the initiative and starts a check-back.
The check-back mechanism of transaction messages is mostly implemented on the broker side. The broker isolates transaction messages in different sending phases using three kinds of topics, for half messages, op messages, and real messages, so that the Consumer only sees messages whose commit has been finally confirmed and that need to be delivered. The detailed logic is not covered in this article; a separate post could interpret it from the Broker's perspective.

Back to the Producer's perspective: on receiving a check-back request from the Broker, the Producer checks the local transaction state for that message and decides to commit or roll back based on the result. This requires the Producer to provide a check-back implementation for such cases.
Of course, under normal circumstances it is not recommended to return the UNKNOW state deliberately, because it imposes extra check-back overhead on the broker. Relying on the check-back mechanism only when an unpredictable exception occurs is the more reasonable choice.
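
One common way to satisfy this requirement is to record the outcome of each local transaction so that a later check-back can look it up. The sketch below illustrates the idea without the RocketMQ client library. The method names echo the TransactionListener callbacks, but the signatures are simplified, and `CheckBackSketch` and its in-memory map are hypothetical; a real application would more likely query its own database.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Sketch of a producer-side check-back handler backed by a local state table. */
final class CheckBackSketch {

    enum LocalTransactionState { COMMIT_MESSAGE, ROLLBACK_MESSAGE, UNKNOW }

    // Hypothetical store: transaction id -> recorded outcome.
    private final Map<String, LocalTransactionState> outcomes = new ConcurrentHashMap<>();

    /** Runs the local transaction and records its outcome before reporting it. */
    LocalTransactionState executeLocalTransaction(String transactionId, Runnable businessLogic) {
        LocalTransactionState state;
        try {
            businessLogic.run();
            state = LocalTransactionState.COMMIT_MESSAGE;
        } catch (RuntimeException e) {
            state = LocalTransactionState.ROLLBACK_MESSAGE;
        }
        outcomes.put(transactionId, state);
        return state;
    }

    /** Answers a broker check-back: the recorded outcome, or UNKNOW if none exists yet. */
    LocalTransactionState checkLocalTransaction(String transactionId) {
        return outcomes.getOrDefault(transactionId, LocalTransactionState.UNKNOW);
    }

    public static void main(String[] args) {
        CheckBackSketch listener = new CheckBackSketch();
        listener.executeLocalTransaction("tx-1", () -> { /* business logic succeeds */ });
        System.out.println(listener.checkLocalTransaction("tx-1"));
        System.out.println(listener.checkLocalTransaction("tx-unknown"));
    }
}
```

With this shape, UNKNOW is returned only when the producer truly has no record, for example because it crashed before the local transaction finished.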

In addition, check-back in version 4.7.1 is not unlimited: a message is checked at most 15 times:

Source list-14

/**
 * The maximum number of times the message was checked, if exceed this value, this message will be discarded.
 */
@ImportantField
private int transactionCheckMax = 15;
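
The effect of this limit can be pictured as a bounded loop. The following is only a sketch of the idea stated in the Javadoc comment above, not the broker's actual implementation: `BoundedCheckSketch`, `Outcome`, and `resolve` are hypothetical names, and the real broker spaces its checks out over time rather than looping.

```java
import java.util.function.Supplier;

/** Sketch of a check-back loop that gives up after transactionCheckMax attempts. */
final class BoundedCheckSketch {

    enum LocalTransactionState { COMMIT_MESSAGE, ROLLBACK_MESSAGE, UNKNOW }

    enum Outcome { COMMITTED, ROLLED_BACK, DISCARDED }

    /** Asks the producer up to transactionCheckMax times; an unresolved message is discarded. */
    static Outcome resolve(Supplier<LocalTransactionState> checkBack, int transactionCheckMax) {
        for (int checkTimes = 0; checkTimes < transactionCheckMax; checkTimes++) {
            LocalTransactionState state = checkBack.get();
            if (state == LocalTransactionState.COMMIT_MESSAGE) {
                return Outcome.COMMITTED;
            }
            if (state == LocalTransactionState.ROLLBACK_MESSAGE) {
                return Outcome.ROLLED_BACK;
            }
            // UNKNOW (or null): no decision yet, so check again later.
        }
        return Outcome.DISCARDED;
    }

    public static void main(String[] args) {
        // A producer that never gives a definite answer exhausts all 15 checks.
        System.out.println(resolve(() -> LocalTransactionState.UNKNOW, 15));
    }
}
```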

Appendix

The official default parameters of the Producer are listed in the RocketMQ documentation (the parameter table image is not reproduced here). As mentioned earlier, debugging shows that the default send timeout is 3000 ms, not 10000 ms.

RocketMQ is an excellent open source messaging middleware, and many developers have built on top of it. For example, SOFAStack MQ, a commercial message queue product from Ant Group, is a financial-grade messaging middleware redeveloped on the RocketMQ kernel, with a great deal of excellent work in message management and control and in transparent operations and maintenance.
I hope that RocketMQ will keep growing with even greater vitality through the joint efforts of the developer community.

Origin blog.51cto.com/14898876/2605860