Source code analysis: how a RocketMQ Producer selects the queue when sending a message

I. Description

There are two ways to send. One is to send the message directly: the client picks the queue with a built-in algorithm that callers cannot change. The other is to send with a queue selection algorithm of your choice (three are built in; if none of them fits, you can implement your own).

// org.apache.rocketmq.client.producer.DefaultMQProducer
public class DefaultMQProducer {
    // Send only; the queue is selected by the default algorithm
    @Override
    public SendResult send(Message msg) {}

    // Send with a custom queue selection algorithm
    @Override
    public SendResult send(Message msg, MessageQueueSelector selector, Object arg) {}
}

II. Source code

1. send(msg, selector, arg)

1.1. Usage scenarios

Sometimes the default queue selection algorithm is not what we want and we need our own. The most common scenario is ordered messages: messages that share some characteristic are all sent to the same queue, which guarantees their order because a single queue is ordered.

If ordered messages are unfamiliar, see my earlier article on the topic:
"The principle and code of RocketMQ ordered messages"

1.2. Principle analysis

There are three built-in algorithms, all of which implement a common interface:

org.apache.rocketmq.client.producer.MessageQueueSelector

  • SelectMessageQueueByRandom

  • SelectMessageQueueByHash

  • SelectMessageQueueByMachineRoom

  • If you want to customize the logic, you can directly implement the interface and override the select method.

This is a textbook strategy pattern: each algorithm has its own implementation class, and all of them sit behind one top-level interface.

1.2.1. SelectMessageQueueByRandom

public class SelectMessageQueueByRandom implements MessageQueueSelector {
    private Random random = new Random(System.currentTimeMillis());
    @Override
    public MessageQueue select(List<MessageQueue> mqs, Message msg, Object arg) {
        // mqs.size() is the number of queues. With 4 queues, value is random in the range 0 to 3.
        int value = random.nextInt(mqs.size());
        return mqs.get(value);
    }
}

Simple: it is purely random.

mqs.size() is the number of queues. With 4 queues, value is a random number from 0 to 3.

1.2.2. SelectMessageQueueByHash

public class SelectMessageQueueByHash implements MessageQueueSelector {
    @Override
    public MessageQueue select(List<MessageQueue> mqs, Message msg, Object arg) {
        int value = arg.hashCode();
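        // Guard against a negative value by taking the absolute value; worth keeping in mind in everyday development
        if (value < 0) {
            value = Math.abs(value);
        }
        // Take the remainder by the number of queues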
        value = value % mqs.size();
        return mqs.get(value);
    }
}

Simple again: it just takes a remainder.

mqs.size() is the number of queues. Suppose there are 4 queues and the hashCode of arg is 3: then 3 % 4 = 3, which selects the last queue, that is, the fourth one, because indexes start at 0.
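To make the remainder arithmetic concrete, here is a minimal standalone sketch. It does not use the RocketMQ classes: HashPickDemo, pickIndex and absThenMod are hypothetical names, and a queue is represented only by its index. It also illustrates an edge case worth knowing: in Java, Math.abs(Integer.MIN_VALUE) is still negative, so the sketch uses Math.floorMod as one possible guard. That guard is an assumption of this sketch, not something the snippet above does.

```java
class HashPickDemo {
    // Hypothetical helper: maps a sharding key to a queue index in [0, queueCount).
    // Math.floorMod never returns a negative result for a positive divisor,
    // so it also covers hashCode() == Integer.MIN_VALUE.
    static int pickIndex(Object key, int queueCount) {
        return Math.floorMod(key.hashCode(), queueCount);
    }

    // Same arithmetic as the snippet above (abs first, then remainder), for comparison.
    static int absThenMod(int hash, int queueCount) {
        int value = hash;
        if (value < 0) {
            value = Math.abs(value);
        }
        return value % queueCount;
    }

    public static void main(String[] args) {
        // Integer.hashCode() is the int value itself, so key 3 with 4 queues gives index 3.
        System.out.println(pickIndex(3, 4));
        // The same key always maps to the same index, which ordered messages rely on.
        System.out.println(pickIndex("order-1001", 4) == pickIndex("order-1001", 4));
        // Edge case: abs(Integer.MIN_VALUE) stays negative, so the remainder can be negative.
        System.out.println(absThenMod(Integer.MIN_VALUE, 3));
        // floorMod keeps the index inside the valid range.
        System.out.println(pickIndex(Integer.MIN_VALUE, 3));
    }
}
```

A negative index would make mqs.get(value) throw, so when writing your own hash-based selector it may be worth guarding this case explicitly.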

1.2.3. SelectMessageQueueByMachineRoom

public class SelectMessageQueueByMachineRoom implements MessageQueueSelector {
    private Set<String> consumeridcs;
    @Override
    public MessageQueue select(List<MessageQueue> mqs, Message msg, Object arg) {
        return null;
    }
    public Set<String> getConsumeridcs() {
        return consumeridcs;
    }
    public void setConsumeridcs(Set<String> consumeridcs) {
        this.consumeridcs = consumeridcs;
    }
}

I cannot see what this one is for: it simply returns null. If you have a custom requirement, implement your own selector directly; this class does not appear to be of any practical use.

1.2.4. Custom algorithm

public class MySelectMessageQueue implements MessageQueueSelector {
    @Override
    public MessageQueue select(List<MessageQueue> mqs, Message msg, Object arg) {
        return mqs.get(0);
    }
}

This always chooses queue 0, the first queue. It is only an example; the real logic depends on your business needs.
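A slightly more realistic custom strategy routes by a business key. The sketch below assumes an order ID is passed as arg and keeps every message of one order on one queue. To stay runnable without the RocketMQ dependency, it uses a hypothetical stand-in interface (Selector) with plain String queues and messages instead of MessageQueueSelector, MessageQueue and Message; OrderSelectorDemo and ByOrderId are likewise made-up names.

```java
import java.util.Arrays;
import java.util.List;

class OrderSelectorDemo {
    // Stand-in for MessageQueueSelector, reduced to plain types so the sketch runs on its own.
    interface Selector {
        String select(List<String> queues, String msg, Object arg);
    }

    // Hypothetical strategy: arg is an order ID; all messages of one order go to one queue.
    static class ByOrderId implements Selector {
        @Override
        public String select(List<String> queues, String msg, Object arg) {
            long orderId = (Long) arg;
            // floorMod keeps the index non-negative even for negative IDs.
            int index = (int) Math.floorMod(orderId, (long) queues.size());
            return queues.get(index);
        }
    }

    public static void main(String[] args) {
        List<String> queues = Arrays.asList("q0", "q1", "q2", "q3");
        Selector selector = new ByOrderId();
        // Two messages of order 1001 land on the same queue.
        System.out.println(selector.select(queues, "created", 1001L));
        System.out.println(selector.select(queues, "paid", 1001L));
        // A different order may land elsewhere.
        System.out.println(selector.select(queues, "created", 1002L));
    }
}
```

With the real client, the same idea would be expressed by implementing MessageQueueSelector and passing the order ID as the arg parameter of send(msg, selector, arg).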

1.3. Call chain

org.apache.rocketmq.client.producer.DefaultMQProducer#send(Message msg, MessageQueueSelector selector, Object arg)
->
org.apache.rocketmq.client.impl.producer.DefaultMQProducerImpl#send(Message msg, MessageQueueSelector selector, Object arg)
->
org.apache.rocketmq.client.impl.producer.DefaultMQProducerImpl#send(Message msg, MessageQueueSelector selector, Object arg, long timeout)
->
org.apache.rocketmq.client.impl.producer.DefaultMQProducerImpl#sendSelectImpl(xxx)
->
mq = mQClientFactory.getClientConfig().queueWithNamespace(selector.select(messageQueueList, userMessage, arg));
->
selector.select(messageQueueList, userMessage, arg)
->
org.apache.rocketmq.client.producer.MessageQueueSelector#select(final List<MessageQueue> mqs, final Message msg, final Object arg)

2. send(msg)

2.1. Usage scenarios

This is the usual choice when there are no special requirements. The default queue selection algorithm is well designed and already accounts for a range of optimization scenarios.

2.2. Principle analysis

// {@link org.apache.rocketmq.client.impl.producer.DefaultMQProducerImpl#sendDefaultImpl}
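// This is the core of message sending; if it is unclear, see my earlier article analyzing the send source code

// Select the queue the message will be sent to
MessageQueue mq = null;
for (int times = 0; times < 3; times++) {
    // Always null on the first attempt
    String lastBrokerName = null == mq ? null : mq.getBrokerName();
    // Call the method below to select a queue
    MessageQueue mqSelected = this.selectOneMessageQueue(topicPublishInfo, lastBrokerName);
    if (mqSelected != null) {
        // Assign mq; if the first attempt fails, mq has a value on the next retry (the next loop iteration)
        mq = mqSelected;
        ......
        // Key point; it answers two questions raised later:
        // 1. When is an entry put into faultItemTable?
        // 2. Why can isAvailable() tell whether a Broker is available by checking a single timestamp?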
        this.updateFaultItem(mq.getBrokerName(), endTimestamp - beginTimestampPrev, false);    
    }
}

The main entry point for queue selection:

public MessageQueue selectOneMessageQueue(final TopicPublishInfo tpInfo, final String lastBrokerName) {
    // false by default, meaning broker fault latency is not enabled
    if (this.sendLatencyFaultEnable) {
        try {
            // Take the random number and increment it
            int index = tpInfo.getSendWhichQueue().getAndIncrement();
            // Iterate over the queues
            for (int i = 0; i < tpInfo.getMessageQueueList().size(); i++) {
                // (random number + 1) % queue.size()
                int pos = Math.abs(index++) % tpInfo.getMessageQueueList().size();
                if (pos < 0) {
                    pos = 0;
                }
                MessageQueue mq = tpInfo.getMessageQueueList().get(pos);
                // Check whether the broker that owns this queue is available
                if (latencyFaultTolerance.isAvailable(mq.getBrokerName())) {
                    // Not a retry: return the queue directly.
                    // Retry: return the queue only if its broker is the one selected last time.
                    if (null == lastBrokerName || mq.getBrokerName().equals(lastBrokerName)) {
                        return mq;
                    }
                }
            }

            // No queue qualified: pick a relatively good broker, regardless of availability
            final String notBestBroker = latencyFaultTolerance.pickOneAtLeast();
            int writeQueueNums = tpInfo.getQueueIdByBroker(notBestBroker);
            if (writeQueueNums > 0) {
                final MessageQueue mq = tpInfo.selectOneMessageQueue();
                if (notBestBroker != null) {
                    mq.setBrokerName(notBestBroker);
                    mq.setQueueId(tpInfo.getSendWhichQueue().getAndIncrement() % writeQueueNums);
                }
                return mq;
            } else {
                latencyFaultTolerance.remove(notBestBroker);
            }
        } catch (Exception e) {
            log.error("Error occurred when selecting message queue", e);
        }
        // Select a queue at random
        return tpInfo.selectOneMessageQueue();
    }
    // Queue selection when sendLatencyFaultEnable=false, which is the default
    return tpInfo.selectOneMessageQueue(lastBrokerName);
}

2.2.1. Do not enable broker failure delay

Since sendLatencyFaultEnable is false by default, start with the logic for sendLatencyFaultEnable=false:

public MessageQueue selectOneMessageQueue(final String lastBrokerName) {
    // null on the first attempt, non-null on the second (that is, on retry)
    if (lastBrokerName == null) {
        // Queue selection for the first attempt
        return selectOneMessageQueue();
    } else {
        // The first send failed; this is the queue selection for the retry

        int index = this.sendWhichQueue.getAndIncrement();
        for (int i = 0; i < this.messageQueueList.size(); i++) {
            int pos = Math.abs(index++) % this.messageQueueList.size();
            if (pos < 0)
                pos = 0;
            MessageQueue mq = this.messageQueueList.get(pos);
            // Skip queues on the broker that failed last time
            if (!mq.getBrokerName().equals(lastBrokerName)) {
                return mq;
            }
        }
        return selectOneMessageQueue();
    }
}

Then continue to look at the logic of selecting the queue for the first time:

public MessageQueue selectOneMessageQueue() {
    // The current thread has a ThreadLocal variable holding a random number {@link org.apache.rocketmq.client.common.ThreadLocalIndex#getAndIncrement}
    // Take that number modulo the queue count, and increment the stored number by 1
    int index = this.sendWhichQueue.getAndIncrement();
    int pos = Math.abs(index) % this.messageQueueList.size();
    if (pos < 0) {
        pos = 0;
    }
    return this.messageQueueList.get(pos);
}

So the choice is somewhat random. The neat part is that after the random number is taken modulo the queue length, the stored number is incremented by 1 (getAndIncrement on the thread-local index). That +1 is the key.

When the first send fails, lastBrokerName holds the broker that was just selected (mq = mqSelected). On the retry, lastBrokerName is non-null, which means the previously selected broker failed to send. The thread-local sendWhichQueue is incremented again and the queues are traversed until one on a different broker is found. This is the logic that avoids the broker that failed last time.

For example: suppose the random number is 1 and the queue length is 4, so 1 % 4 = 1, and the send fails and enters the retry. The number was already incremented to 2 after the first selection, so the retry computes 2 % 4 = 2. If that queue belongs to a different broker it is returned; otherwise the loop moves on to the next position until it leaves the failed broker.

Next, the queue selection logic on retry:

// +1
int index = this.sendWhichQueue.getAndIncrement();
for (int i = 0; i < this.messageQueueList.size(); i++) {
    // Modulo
    int pos = Math.abs(index++) % this.messageQueueList.size();
    if (pos < 0)
        pos = 0;
    MessageQueue mq = this.messageQueueList.get(pos);
    // Skip queues on the broker that failed last time
    if (!mq.getBrokerName().equals(lastBrokerName)) {
        return mq;
    }
}
// If no usable queue was found, fall back to the default selection
return selectOneMessageQueue();

Simple enough: the last send failed and we are back for a retry. The logic barely changes: take the random number, increment it, and take it modulo the queue length. If the resulting queue belongs to the broker that failed last time, skip it and keep traversing the queues until a usable one is found.
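The retry behaviour described above can be reproduced in a small standalone sketch. It is a simplification under stated assumptions: RetrySelectDemo and Mq are hypothetical stand-ins for TopicPublishInfo and MessageQueue, the counter is a plain int instead of the thread-local index, and the starting value is passed in so the run matches the worked example (start at 1, four queues).

```java
import java.util.Arrays;
import java.util.List;

class RetrySelectDemo {
    // Hypothetical stand-in for MessageQueue: only broker name and queue id.
    static final class Mq {
        final String broker;
        final int queueId;
        Mq(String broker, int queueId) { this.broker = broker; this.queueId = queueId; }
    }

    private final List<Mq> queues;
    private int counter; // plays the role of sendWhichQueue; plain int because the sketch is single-threaded

    RetrySelectDemo(List<Mq> queues, int start) {
        this.queues = queues;
        this.counter = start;
    }

    private int getAndIncrement() { return counter++; }

    // First attempt: plain modulo on the counter.
    Mq selectOne() {
        int pos = Math.abs(getAndIncrement()) % queues.size();
        return queues.get(pos);
    }

    // Retry: walk forward from the counter and skip queues on the broker that just failed.
    Mq selectOne(String lastBrokerName) {
        if (lastBrokerName == null) {
            return selectOne();
        }
        int index = getAndIncrement();
        for (int i = 0; i < queues.size(); i++) {
            int pos = Math.abs(index++) % queues.size();
            Mq q = queues.get(pos);
            if (!q.broker.equals(lastBrokerName)) {
                return q;
            }
        }
        // Every queue is on the failed broker: fall back to the plain selection.
        return selectOne();
    }

    public static void main(String[] args) {
        List<Mq> queues = Arrays.asList(
            new Mq("broker-a", 0), new Mq("broker-a", 1),
            new Mq("broker-b", 0), new Mq("broker-b", 1));
        RetrySelectDemo demo = new RetrySelectDemo(queues, 1);
        Mq first = demo.selectOne(null);          // 1 % 4 = 1
        Mq retry = demo.selectOne(first.broker);  // 2 % 4 = 2, which is on a different broker
        System.out.println(first.broker + ":" + first.queueId);
        System.out.println(retry.broker + ":" + retry.queueId);
    }
}
```

Note that the check is on the broker name, not on the queue: if positions 1 and 2 had been on the same broker, the loop would have kept going until it reached a queue on another broker.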

2.2.2. Enable broker failure delay

This is the logic inside the following if block:

if (this.sendLatencyFaultEnable) {
    ....
}

The comments above spell it out. First compute (random number + 1) % queue.size(), then check whether the broker that owns that queue is available. If it is available, and either this is not a retry or the queue's broker is the same one that was selected last time, return the queue and you are done. So how is broker availability determined?

// {@link org.apache.rocketmq.client.latency.LatencyFaultToleranceImpl#isAvailable(String)}
public boolean isAvailable(final String name) {
    final FaultItem faultItem = this.faultItemTable.get(name);
    if (faultItem != null) {
        return faultItem.isAvailable();
    }
    return true;
}

// {@link org.apache.rocketmq.client.latency.LatencyFaultToleranceImpl.FaultItem#isAvailable()}
public boolean isAvailable() {
    return (System.currentTimeMillis() - startTimestamp) >= 0;
}

Questions:

  • When is an entry put into faultItemTable?
  • Why can isAvailable() tell whether a Broker is available just by checking a single timestamp?

The answer lies in the method that is called after the message is sent, as shown earlier:

// {@link org.apache.rocketmq.client.impl.producer.DefaultMQProducerImpl#updateFaultItem}
// Send start time
beginTimestampPrev = System.currentTimeMillis();
// Send the message
sendResult = this.sendKernelImpl(msg, mq, communicationMode, sendCallback, topicPublishInfo, timeout);
// Send end time
endTimestamp = System.currentTimeMillis();
// Update the broker's latency record
this.updateFaultItem(mq.getBrokerName(), endTimestamp - beginTimestampPrev, false);

The detailed logic is as follows:

// {@link org.apache.rocketmq.client.latency.MQFaultStrategy#updateFaultItem}
public void updateFaultItem(final String brokerName, final long currentLatency, boolean isolation) {
    if (this.sendLatencyFaultEnable) {
        // In the call shown earlier, isolation is false and currentLatency is the time the send took:
        // this.updateFaultItem(mq.getBrokerName(), endTimestamp - beginTimestampPrev, false);
        long duration = computeNotAvailableDuration(isolation ? 30000 : currentLatency);
        this.latencyFaultTolerance.updateFaultItem(brokerName, currentLatency, duration);
    }
}

private long[] latencyMax = {50L, 100L, 550L, 1000L, 2000L, 3000L, 15000L};
private long[] notAvailableDuration = {0L, 0L, 30000L, 60000L, 120000L, 180000L, 600000L};

// Compare the latency against the latencyMax thresholds in MQFaultStrategy and look up the matching notAvailableDuration; the result is used when the broker is added to faultItemTable.
private long computeNotAvailableDuration(final long currentLatency) {
    for (int i = latencyMax.length - 1; i >= 0; i--) {
        // If currentLatency is 10 ms, none of the latencyMax thresholds is reached, so the method falls through to return 0
        if (currentLatency >= latencyMax[i])
            return this.notAvailableDuration[i];
    }
    return 0;
}

// {@link org.apache.rocketmq.client.latency.LatencyFaultToleranceImpl#updateFaultItem()}
@Override
// In essence this sets startTimestamp to the current time plus the result of computeNotAvailableDuration(isolation ? 30000 : currentLatency), for isAvailable() to use
// So isAvailable() returns true right away only when notAvailableDuration == 0; otherwise it returns true once that duration has elapsed.
public void updateFaultItem(final String name, final long currentLatency, final long notAvailableDuration) {
    FaultItem old = this.faultItemTable.get(name);
    if (null == old) {
        final FaultItem faultItem = new FaultItem(name);
        faultItem.setCurrentLatency(currentLatency);
        // startTimestamp = current time + notAvailableDuration, used by isAvailable()
        faultItem.setStartTimestamp(System.currentTimeMillis() + notAvailableDuration);

        old = this.faultItemTable.putIfAbsent(name, faultItem);
        if (old != null) {
            old.setCurrentLatency(currentLatency);
            // startTimestamp = current time + notAvailableDuration, used by isAvailable()
            old.setStartTimestamp(System.currentTimeMillis() + notAvailableDuration);
        }
    } else {
        old.setCurrentLatency(currentLatency);
        // startTimestamp = current time + notAvailableDuration, used by isAvailable()
        old.setStartTimestamp(System.currentTimeMillis() + notAvailableDuration);
    }
}

The following two lines of code are explained in detail:

private long[] latencyMax = {50L, 100L, 550L, 1000L, 2000L, 3000L, 15000L};
private long[] notAvailableDuration = {0L, 0L, 30000L, 60000L, 120000L, 180000L, 600000L};

latencyMax (ms)    notAvailableDuration (ms)
50                 0
100                0
550                30000
1000               60000
2000               120000
3000               180000
15000              600000

That is:

  • currentLatency is less than 50: no threshold is reached, so notAvailableDuration is 0
  • currentLatency is greater than or equal to 50 and less than 100: notAvailableDuration is 0
  • currentLatency is greater than or equal to 100 and less than 550: notAvailableDuration is 0
  • currentLatency is greater than or equal to 550 and less than 1000: notAvailableDuration is 30000
  • …and so on
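The mapping in the list above can be checked with a small standalone sketch. The two arrays and the loop are copied from the snippets shown earlier; the class name LatencyTableDemo, the constant names and the sample latencies are assumptions made for illustration.

```java
class LatencyTableDemo {
    // Values copied from the latencyMax and notAvailableDuration arrays above (milliseconds).
    static final long[] LATENCY_MAX = {50L, 100L, 550L, 1000L, 2000L, 3000L, 15000L};
    static final long[] NOT_AVAILABLE_DURATION = {0L, 0L, 30000L, 60000L, 120000L, 180000L, 600000L};

    // Scan from the largest threshold down and return the duration of the first one reached.
    static long computeNotAvailableDuration(long currentLatency) {
        for (int i = LATENCY_MAX.length - 1; i >= 0; i--) {
            if (currentLatency >= LATENCY_MAX[i]) {
                return NOT_AVAILABLE_DURATION[i];
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        // Sample latencies chosen to hit different rows of the table.
        long[] samples = {10L, 60L, 300L, 600L, 1200L, 30000L};
        for (long latency : samples) {
            System.out.println(latency + " ms -> " + computeNotAvailableDuration(latency) + " ms");
        }
    }
}
```

The last sample, 30000, is the value passed when isolation is true; it reaches the 15000 threshold and therefore yields the largest penalty.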

Here is another example.

Assume isolation is passed as true:

long duration = computeNotAvailableDuration(isolation ? 30000 : currentLatency);

Then 30000 is passed to computeNotAvailableDuration. Since 30000 reaches the top threshold (15000), the resulting notAvailableDuration is 600000L. Combined with the isAvailable method, the overall process is roughly as follows:

RocketMQ predicts a time at which each Broker becomes available again (current time + notAvailableDuration). Once the current time reaches that point, the Broker is considered available. notAvailableDuration has seven entries that map one-to-one to the latencyMax thresholds, so the incoming currentLatency determines when the Broker is predicted to be available.

Now look at this again:

public boolean isAvailable() {
    return (System.currentTimeMillis() - startTimestamp) >= 0;
}

The result depends on which interval the send time falls into. For latencies below 550 ms, notAvailableDuration is 0, so the Broker stays available. Above that, the time until the Broker counts as available again grows with the latency; until then the Broker is not considered the best choice and is skipped.
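The timestamp comparison is easier to follow with a controllable clock. The sketch below is a hypothetical stand-in for FaultItem (FaultItemDemo is a made-up name): instead of calling System.currentTimeMillis() it takes the current time as a parameter, which is an assumption made so that the behaviour is deterministic.

```java
class FaultItemDemo {
    // Time (ms) from which the broker counts as available again.
    private long startTimestamp;

    // Mirrors updateFaultItem: the broker becomes usable again at now + notAvailableDuration.
    void update(long now, long notAvailableDuration) {
        this.startTimestamp = now + notAvailableDuration;
    }

    // Mirrors isAvailable(): available once the clock has reached startTimestamp.
    boolean isAvailable(long now) {
        return now - startTimestamp >= 0;
    }

    public static void main(String[] args) {
        FaultItemDemo item = new FaultItemDemo();
        long t0 = 1_000_000L;      // arbitrary starting clock value in ms
        item.update(t0, 30_000L);  // penalty for a send in the 550..1000 ms band
        System.out.println(item.isAvailable(t0));            // inside the 30 s window
        System.out.println(item.isAvailable(t0 + 29_999L));  // still inside the window
        System.out.println(item.isAvailable(t0 + 30_000L));  // window has elapsed
        item.update(t0, 0L);       // a fast send: no penalty
        System.out.println(item.isAvailable(t0));
    }
}
```

This is why a single comparison is enough: the "prediction" was already stored in startTimestamp when the send finished.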

2.3. Call chain

org.apache.rocketmq.client.producer.DefaultMQProducer#send(org.apache.rocketmq.common.message.Message)
->
org.apache.rocketmq.client.impl.producer.DefaultMQProducerImpl#send(org.apache.rocketmq.common.message.Message)
->
org.apache.rocketmq.client.impl.producer.DefaultMQProducerImpl#send(org.apache.rocketmq.common.message.Message, long)
->
org.apache.rocketmq.client.impl.producer.DefaultMQProducerImpl#sendDefaultImpl(xxx)
->
MessageQueue mqSelected = this.selectOneMessageQueue(topicPublishInfo, lastBrokerName);
->
org.apache.rocketmq.client.impl.producer.DefaultMQProducerImpl#selectOneMessageQueue(xxx)
->
org.apache.rocketmq.client.latency.MQFaultStrategy#selectOneMessageQueue(final TopicPublishInfo tpInfo, final String lastBrokerName)

2.4. Summary

  • When fault tolerance is not enabled, queues are chosen round-robin. If a send fails, the retry skips the Broker that failed
  • When the fault tolerance strategy is enabled, RocketMQ's prediction mechanism decides whether a Broker is available
  • On a retry, if the Broker that failed last time is predicted to be available, a queue on that Broker is still selected
  • If none of the above yields a queue, one is picked at random
  • After each send, the elapsed time and whether an error occurred are recorded, and the Broker's next available time is predicted from them
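The fault-tolerant path summarized above can be sketched end to end. This is a simplified model, not the RocketMQ implementation: FaultAwareSelectDemo is a hypothetical class, queues are represented by the name of their broker, the clock is passed in as a parameter, and the fallback is reduced to a plain modulo pick (the real code first asks pickOneAtLeast() for a relatively good broker).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FaultAwareSelectDemo {
    // Broker name -> time (ms) from which the broker counts as available again.
    private final Map<String, Long> availableFrom = new HashMap<>();
    private final List<String> queueBrokers; // broker name of each queue, by position
    private int counter;                     // stand-in for the thread-local index

    FaultAwareSelectDemo(List<String> queueBrokers) {
        this.queueBrokers = queueBrokers;
    }

    // Records a penalty after a slow or failed send.
    void penalize(String broker, long now, long notAvailableDuration) {
        availableFrom.put(broker, now + notAvailableDuration);
    }

    // Brokers without a record count as available.
    boolean isAvailable(String broker, long now) {
        Long from = availableFrom.get(broker);
        return from == null || now - from >= 0;
    }

    // Returns the position of the chosen queue.
    int select(String lastBrokerName, long now) {
        int index = counter++;
        for (int i = 0; i < queueBrokers.size(); i++) {
            int pos = Math.abs(index++) % queueBrokers.size();
            String broker = queueBrokers.get(pos);
            if (isAvailable(broker, now)
                    && (lastBrokerName == null || broker.equals(lastBrokerName))) {
                return pos;
            }
        }
        // Simplified fallback: plain modulo pick, ignoring availability.
        return Math.abs(counter++) % queueBrokers.size();
    }

    public static void main(String[] args) {
        FaultAwareSelectDemo demo =
            new FaultAwareSelectDemo(Arrays.asList("broker-a", "broker-a", "broker-b", "broker-b"));
        demo.penalize("broker-a", 0L, 30_000L);
        // While broker-a is penalized, its queues are skipped.
        System.out.println(demo.select(null, 0L));
        // Once the penalty window has elapsed, broker-a is eligible again.
        System.out.println(demo.select(null, 30_000L));
    }
}
```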

III. Summary

1. Questions

There are two overloaded send() methods: one accepts a selector and the other does not. Queue selection is a typical strategy pattern, so why is the built-in queue selection algorithm of send(message) not extracted into a separate class that implements the org.apache.rocketmq.client.producer.MessageQueueSelector interface? It could be called, say, SelectMessageQueueByBest, as in the following:

// org.apache.rocketmq.client.producer.DefaultMQProducer
public class DefaultMQProducer {
    // Send only; the queue is selected by the default algorithm
    @Override
    public SendResult send(Message msg) {
        return this.send(msg, new SelectMessageQueueByBest(), xxx);
    }

    // Send with a custom queue selection algorithm
    @Override
    public SendResult send(Message msg, MessageQueueSelector selector, Object arg) {}
}

My guess is that this algorithm is too complicated, interacts with too many other classes, and may need different parameters from the three built-in selectors, so it was not grouped with them. Conceptually, though, they belong together: they do the same job with different algorithms, which is a typical strategy pattern.

2. Interview

Q: What are the algorithms for selecting a queue when sending messages?

A: There are two kinds. One is to send the message directly, without choosing the queue yourself. The built-in queue selection works as follows:

  • When fault tolerance is not enabled, queues are chosen round-robin. If a send fails, the retry skips the Broker that failed
  • When the fault tolerance strategy is enabled, RocketMQ's prediction mechanism decides whether a Broker is available
  • On a retry, if the Broker that failed last time is predicted to be available, a queue on that Broker is still selected
  • If none of the above yields a queue, one is picked at random
  • After each send, the elapsed time and whether an error occurred are recorded, and the Broker's next available time is predicted from them

The other is to pass a selection algorithm when sending, or to implement the interface with a custom algorithm:

  • SelectMessageQueueByRandom: random
  • SelectMessageQueueByHash: hash of arg modulo the number of queues
  • SelectMessageQueueByMachineRoom: returns null in the source shown above
  • A custom implementation of the interface

Origin blog.csdn.net/qq_33762302/article/details/114783849