Thoroughly understand Ribbon - Load balancing strategy source code exploration

Ribbon load balancing strategy source code analysis

RandomRule

Let's start with the source code of the random strategy. The choose method is its core:

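public Server choose(ILoadBalancer lb, Object key) {
    // If the LoadBalancer passed in is null, return null directly
    if (lb == null) {
        return null;
    } else {
        Server server = null;
        // While server is null, keep looping until an available server is found
        while (server == null) {
            // If the thread has been interrupted, return null
            if (Thread.interrupted()) {
                return null;
            }
            // List of reachable servers
            List<Server> upList = lb.getReachableServers();
            // List of all servers
            List<Server> allList = lb.getAllServers();
            // Total number of servers
            int serverCount = allList.size();
            // If no nodes are registered, return null
            if (serverCount == 0) {
                return null;
            }

            // Pass the total server count to chooseRandomInt to get an index
            int index = this.chooseRandomInt(serverCount);
            // Use that index to fetch a server instance from upList.
            // Note: the index is drawn from the size of allList but applied to upList,
            // so it can be out of range when upList is shorter than allList.
            server = (Server) upList.get(index);
            // server can be null when the server list is being modified at this moment.
            // The comment below explains this as well.
            if (server == null) {
                /*
                 * The only time this should happen is if the server list were
                 * somehow trimmed. This is a transient condition. Retry after
                 * yielding.
                 */
                // Hint to the scheduler that this thread is willing to give up the CPU.
                // Because the loop yields here, it also checks for interruption above.
                // This is an example of defensive programming.
                Thread.yield();
                continue;
            } else {
                // If the server is alive, return it directly
                if (server.isAlive()) {
                    return server;
                }

                // Otherwise reset server to null and try again
                server = null;
                Thread.yield();
            }
        }

        return server;
    }
}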

In the choose method, the index comes from the chooseRandomInt method.
That method calls ThreadLocalRandom.current().nextInt(serverCount), which uses the current thread's generator and returns a value from 0 (inclusive) to serverCount (exclusive).
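As a minimal sketch of the index selection described above, the method can be reduced to a single call. The class name RandomIndexSketch is only for illustration and is not part of Ribbon.

```java
import java.util.concurrent.ThreadLocalRandom;

final class RandomIndexSketch {
    // Returns a uniformly distributed index in [0, serverCount).
    // serverCount must be positive, otherwise nextInt throws IllegalArgumentException.
    static int chooseRandomInt(int serverCount) {
        return ThreadLocalRandom.current().nextInt(serverCount);
    }
}
```

The upper bound is exclusive, so with 5 servers the only possible results are 0 through 4, which are exactly the valid indexes of a 5-element list.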

Since randomness has come up, here is a short digression. Whether we use the Random class or ThreadLocalRandom, the numbers Java produces are not truly random but pseudo-random. Once all the inputs are fixed, the output is fixed too: no matter what you use as the seed, whether the current time or its last few digits, the results can in principle be predicted.
So what are true random numbers? They are generated from a seed that mixes in something unpredictable, such as the current CPU temperature. This is why true random numbers usually come from hardware: such devices have sensors, for example temperature or noise sensors, that sample some property of the current environment and use it as the seed.
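The predictability of pseudo-random numbers is easy to observe with java.util.Random, which accepts an explicit seed. The sketch below is only an illustration (the class and method names are made up for this example): two generators created with the same seed produce the same sequence.

```java
import java.util.Random;

final class SeededRandomSketch {
    // Draws `count` values in [0, bound) from a generator created with the given seed.
    static int[] draw(long seed, int count, int bound) {
        Random random = new Random(seed);
        int[] values = new int[count];
        for (int i = 0; i < count; i++) {
            values[i] = random.nextInt(bound);
        }
        return values;
    }
}
```

Calling draw(42L, 5, 100) twice yields two identical arrays, because the seed fully determines the sequence.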

RoundRobinRule

The code of RandomRule is relatively simple to read.
Let's look at the next load balancing strategy, RoundRobinRule.

Its choose method looks basically the same as the one in RandomRule.

The difference is that it uses a counter to limit the number of loops. If no available server is found after 10 attempts, a line of log is printed.
The rest of the logic is basically the same as in RandomRule. The key is the incrementAndGetModulo method, which produces the index.
It uses a spin loop. Here current is the index of the server that was selected last time.
It then takes current + 1 modulo the modulo argument to get the new index.
The modulo here is the serverCount passed in from outside. For example, if there are 100 servers in total and index 90 was polled last time, the result is (90 + 1) % 100 = 91. You may wonder why it does not simply return current + 1.
The reason is that the index has to stay inside the list, and the size of the server list can change. When the counter reaches the last index, the modulo wraps it back to 0. And if the last server selected was the 100th one and it has just died and been removed from the server list, a plain + 1 would point past the end of the list, so a modulo operation is required.

The loop then uses CAS (compareAndSet): if the counter still holds the value that was read as current, it is replaced with next and next is returned. If another thread has changed it in the meantime, the loop tries again.
This combination of spinning and compareAndSet is a very common, lightweight synchronization scheme.
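The spin-and-CAS counter described above can be sketched as follows. This is a simplified reconstruction from the description, not a copy of Ribbon's class. The class name and the field name nextServerCyclicCounter are assumptions for this example.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class RoundRobinCounterSketch {
    // Holds the index handed out most recently (assumed field name).
    private final AtomicInteger nextServerCyclicCounter = new AtomicInteger(0);

    // Advances the counter by one, wrapping at `modulo`, and returns the new index.
    int incrementAndGetModulo(int modulo) {
        for (;;) {
            int current = nextServerCyclicCounter.get();
            int next = (current + 1) % modulo;
            // Succeeds only if no other thread changed the counter since we read it.
            if (nextServerCyclicCounter.compareAndSet(current, next)) {
                return next;
            }
            // Otherwise another thread won the race: read again and retry.
        }
    }
}
```

With modulo 3, a fresh counter hands out 1, 2, 0, 1, and so on. No lock is taken. A thread that loses the race simply repeats the read and the compareAndSet.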

BestAvailableRule

Let's look at the choose method in BestAvailableRule.
It starts with a check: if loadBalancerStats is null, it calls the parent class's choose method and returns the result.
The parent class, ClientConfigEnabledRoundRobinRule, is built on RoundRobinRule: if its roundRobinRule field is not null, it calls roundRobinRule's choose method directly.

Back in BestAvailableRule, the loop uses loadBalancerStats to obtain the stats of each server.
Following that call leads to getServerStats.
It first tries to get the ServerStats from a cache. The catch block below it holds the fault-tolerance logic: if the stats cannot be obtained, a default ServerStats is created and put into the cache, and then the value is read from the cache again and returned.
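The "create a default, cache it, then read it back" pattern can be sketched like this. The sketch is simplified and rests on assumptions: it uses a plain ConcurrentHashMap keyed by a server id string as a stand-in for the cache, and the Stats class is a hypothetical placeholder for ServerStats.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class StatsCacheSketch {
    // Hypothetical placeholder for the per-server statistics object.
    static final class Stats {
        final String serverId;
        Stats(String serverId) { this.serverId = serverId; }
    }

    private final ConcurrentMap<String, Stats> cache = new ConcurrentHashMap<>();

    Stats getServerStats(String serverId) {
        Stats stats = cache.get(serverId);
        if (stats == null) {
            // Fallback: store a default entry unless another thread already did.
            cache.putIfAbsent(serverId, new Stats(serverId));
            // Read back from the cache so every caller sees the same instance.
            stats = cache.get(serverId);
        }
        return stats;
    }
}
```

Reading the value back from the cache, instead of returning the freshly created object, is what keeps concurrent callers on one shared instance.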

Next, there is a check that determines whether the server is currently in a circuit breaker (tripped) state.
The check takes the current time as its argument, and its logic is as follows.
It first gets the circuit breaker timeout. If the timeout is less than or equal to 0, it returns false. Otherwise it returns true if the timeout is later than the current time. A tripped circuit breaker means the service is treated as unavailable. The concept becomes clearer once you have worked with Hystrix.
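The check described above boils down to two comparisons. The following is a minimal sketch. Passing the timeout in as a parameter is a simplification made for this example, and the class name is made up.

```java
final class CircuitCheckSketch {
    // circuitBreakerTimeout: timestamp (ms) until which the server is considered tripped,
    // or a value <= 0 when the breaker is not tripped at all.
    static boolean isCircuitBreakerTripped(long circuitBreakerTimeout, long currentTime) {
        if (circuitBreakerTimeout <= 0) {
            return false;
        }
        return circuitBreakerTimeout > currentTime;
    }
}
```

For example, with a timeout of 200 and a current time of 100 the server is still tripped. With a timeout of 50 the blackout has already expired.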

Let's look at how this timeout is calculated.
It depends on a blackOutPeriod, which is obtained as follows.
First, it gets the number of failures from a counter and a threshold from a cached property. It then checks whether the number of failures is less than the threshold. If it is, it returns 0, meaning the circuit breaker is not tripped.
Otherwise, it calculates the difference between the two, derives blackOutSeconds from this diff and some cached properties, and multiplies the number of seconds by 1000 to get milliseconds.
Once this period is known, the blackOutPeriod is added to the time of the last connection failure. In other words, for a buffer period after the last failure, the server is still considered failed.
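The text does not spell out how blackOutSeconds is derived from the diff, so the sketch below fills that step with an assumption: exponential growth (2 to the power of diff, times a factor) capped at a maximum. The parameter names timeoutFactorSeconds and maxTimeoutSeconds are hypothetical. Check the ServerStats source for the real formula and defaults.

```java
final class BlackoutSketch {
    // Returns how long (ms) the server stays blacked out after its last failure.
    static long blackoutPeriodMillis(int failureCount, int threshold,
                                     int timeoutFactorSeconds, int maxTimeoutSeconds) {
        if (failureCount < threshold) {
            return 0L; // below the threshold: breaker not tripped
        }
        // Assumed formula: double the blackout for every failure past the threshold.
        int diff = Math.min(failureCount - threshold, 16);
        long blackOutSeconds = (1L << diff) * timeoutFactorSeconds;
        blackOutSeconds = Math.min(blackOutSeconds, maxTimeoutSeconds);
        return blackOutSeconds * 1000L;
    }

    // Timestamp until which the breaker is tripped, or 0 when it is not tripped.
    static long circuitBreakerTimeout(long lastFailureTimestamp, long blackoutPeriodMillis) {
        if (blackoutPeriodMillis <= 0) {
            return 0L;
        }
        return lastFailureTimestamp + blackoutPeriodMillis;
    }
}
```

Under these assumptions, with a threshold of 3, a factor of 10 seconds and a cap of 30 seconds, 3 failures give 10 000 ms, 4 failures give 20 000 ms, and 5 or more are capped at 30 000 ms.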

Finally, we return to the choose method.

If the server is not in the circuit breaker state, the body of the if is executed. concurrentConnections is the number of connections currently active on that server.
This number is compared with minimalConcurrentConnections. If the server's concurrentConnections is less than minimalConcurrentConnections, concurrentConnections is assigned to minimalConcurrentConnections,
and the current server becomes the chosen server.

The for loop contains no break, so every node is traversed and the node with the fewest connections is the one that remains selected at the end.

Finally, if no server was chosen, the request is handed over to RoundRobinRule through the parent class's choose method.
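The selection loop of BestAvailableRule can be sketched in isolation. This is a simplified model, not Ribbon's code: Candidate is a hypothetical value class that carries the two facts the loop needs, and a null result stands for "fall back to round robin".

```java
import java.util.List;

final class LeastConnectionsSketch {
    // Hypothetical snapshot of one server's state.
    static final class Candidate {
        final String id;
        final int activeConnections;
        final boolean circuitTripped;
        Candidate(String id, int activeConnections, boolean circuitTripped) {
            this.id = id;
            this.activeConnections = activeConnections;
            this.circuitTripped = circuitTripped;
        }
    }

    // Returns the non-tripped server with the fewest active connections, or null if none.
    static Candidate chooseLeastLoaded(List<Candidate> servers) {
        int minimalConcurrentConnections = Integer.MAX_VALUE;
        Candidate chosen = null;
        for (Candidate candidate : servers) { // no break: every server is inspected
            if (!candidate.circuitTripped
                    && candidate.activeConnections < minimalConcurrentConnections) {
                minimalConcurrentConnections = candidate.activeConnections;
                chosen = candidate;
            }
        }
        return chosen; // null means the caller should fall back to round robin
    }
}
```

A tripped server is skipped even when it has the fewest connections, which is why the circuit breaker check comes before the comparison.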

RetryRule

Let's look at RetryRule.
Its choose method first gets the current time and adds maxRetryMillis to it. The result is the deadline: once it has passed, retrying stops.
Next, it calls the choose method of subRule.
By default subRule is a RoundRobinRule, but a different subRule can be supplied through the constructor.

Let's continue.
If the answer is null or the server is down, and the current time is still before the deadline, a retry is attempted. The retry is a while loop that runs as long as the thread has not been interrupted. On each pass it calls subRule again to get a new answer. If the answer is again null or the server is down, and the current time is still before the deadline, the current thread yields and the condition is checked again on the next pass.

Finally, let's look at the task that is created before the loop.

It subtracts the current time from the deadline and passes the result to InterruptTask.
InterruptTask uses a timer to start a background task. Once the retry time has elapsed, the background task interrupts the whole retry process.

After the retry loop finishes, the timer is cancelled.
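Putting the pieces together, the retry flow can be sketched as below. This is a simplified model under stated assumptions: a Supplier stands in for subRule, a null result stands for "no usable server", and a java.util.Timer plays the role of InterruptTask. The class and method names are made up for this example.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.function.Supplier;

final class RetrySketch {
    static <T> T chooseWithRetry(Supplier<T> subRule, long maxRetryMillis) {
        long deadline = System.currentTimeMillis() + maxRetryMillis;
        T answer = subRule.get();
        if (answer == null && System.currentTimeMillis() < deadline) {
            final Thread caller = Thread.currentThread();
            // Background task that interrupts the retrying thread at the deadline.
            Timer timer = new Timer("retry-interrupt", true);
            timer.schedule(new TimerTask() {
                @Override
                public void run() {
                    caller.interrupt();
                }
            }, Math.max(0L, deadline - System.currentTimeMillis()));
            try {
                while (!Thread.interrupted()) {
                    answer = subRule.get();
                    if (answer == null && System.currentTimeMillis() < deadline) {
                        Thread.yield(); // let other threads run, then try again
                    } else {
                        break; // got a server, or ran out of time
                    }
                }
            } finally {
                timer.cancel(); // stop the background task once retrying is over
            }
        }
        return answer;
    }
}
```

One caveat of this design: the timer can fire between the end of the loop and the cancel call, which leaves the caller's interrupt flag set. The sketch does not try to handle that case.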

Reading the source code shows that the implementation of RetryRule is relatively simple. It relies on the underlying load balancing strategy to do the actual selection and only adds a layer of retry logic on top.

Origin blog.csdn.net/qq_45455361/article/details/121389320