Dubbo source code analysis 12: cluster fault tolerance


Introduction

In a real production environment, we deploy multiple instances of the same service to keep it stable and reliable. However, remote services are not always healthy, so when a service call fails, automatic fault tolerance is needed.

The process of cluster fault tolerance is shown in the following figure.
Directory interface: the service directory interface. It holds the collection of Invokers for a service, and this collection is dynamic. It is the basis for the subsequent cluster fault tolerance, routing, and load balancing.
Cluster interface: the cluster fault-tolerance interface. When the Consumer fails to call some Providers, the request can be forwarded to healthy Provider nodes.
Router interface: the routing interface, which selects the Providers that match the rules specified by the user.
LoadBalance interface: the load-balancing interface, which selects one Provider from the Provider collection to handle the request according to the configured load-balancing strategy.
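To make the division of labor concrete, here is a minimal Java sketch of one call flowing through Directory, then Router, then LoadBalance. The interfaces SimpleDirectory, SimpleRouter, SimpleLoadBalance and the class ClusterPipeline are hypothetical simplifications invented for illustration: providers are plain address strings rather than Invokers, and the Cluster's fault-tolerance (retry) layer is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical, heavily simplified stand-ins for Dubbo's Directory / Router / LoadBalance.
// A "provider" is just its address string; real Dubbo works with Invoker<T> objects.
interface SimpleDirectory {
    List<String> list();                        // all providers currently known
}

interface SimpleRouter {
    List<String> route(List<String> providers); // keep only providers matching a rule
}

interface SimpleLoadBalance {
    String select(List<String> providers);      // pick exactly one provider
}

final class ClusterPipeline {
    private final SimpleDirectory directory;
    private final List<SimpleRouter> routers;
    private final SimpleLoadBalance loadBalance;

    ClusterPipeline(SimpleDirectory directory, List<SimpleRouter> routers, SimpleLoadBalance loadBalance) {
        this.directory = directory;
        this.routers = routers;
        this.loadBalance = loadBalance;
    }

    // Directory -> Router(s) -> LoadBalance: returns the provider that would receive the call.
    String pick() {
        List<String> candidates = new ArrayList<>(directory.list());
        for (SimpleRouter router : routers) {
            candidates = router.route(candidates);
        }
        if (candidates.isEmpty()) {
            throw new IllegalStateException("No provider available after routing");
        }
        return loadBalance.select(candidates);
    }

    // Convenience router built from a predicate on the provider address.
    static SimpleRouter filter(Predicate<String> keep) {
        return providers -> {
            List<String> out = new ArrayList<>();
            for (String p : providers) {
                if (keep.test(p)) {
                    out.add(p);
                }
            }
            return out;
        };
    }
}
```

Keeping each stage behind its own small interface means a routing rule or a balancing strategy can be swapped without touching the other stages, which is the same separation the four Dubbo interfaces provide.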

Directory (Service Directory)

Directory (Service Directory) represents multiple Invokers (on the consumer side, each Invoker represents one service provider).

RegistryDirectory: its Invoker list changes according to pushes from the registry. It implements the NotifyListener interface: when a path watched in the registry changes, the NotifyListener#notify method is called back, so the Invoker list can be updated.
StaticDirectory: when multiple registries are used, the Invoker lists of all registries are gathered into one Invoker list.
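A minimal sketch of the two kinds of directory, assuming providers are plain address strings. ProviderDirectory, FixedDirectory and NotifiedDirectory are hypothetical names: FixedDirectory merges per-registry lists once, in the spirit of StaticDirectory, while NotifiedDirectory only mimics the idea behind RegistryDirectory implementing NotifyListener#notify by swapping in a new list whenever it is notified.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical simplified directory; provider addresses stand in for Invokers.
interface ProviderDirectory {
    List<String> list();
}

// Static: the list is fixed at construction, here merged from several registries.
final class FixedDirectory implements ProviderDirectory {
    private final List<String> providers;

    @SafeVarargs
    FixedDirectory(List<String>... perRegistry) {
        List<String> merged = new ArrayList<>();
        for (List<String> registryList : perRegistry) {
            merged.addAll(registryList);
        }
        this.providers = Collections.unmodifiableList(merged);
    }

    public List<String> list() {
        return providers;
    }
}

// Dynamic: whoever watches the registry calls notify(...) with the latest provider list.
final class NotifiedDirectory implements ProviderDirectory {
    // volatile so that callers always see the most recently published list
    private volatile List<String> providers = Collections.emptyList();

    // Overloads (does not override) Object#notify(); the name mirrors NotifyListener#notify.
    void notify(List<String> latest) {
        this.providers = Collections.unmodifiableList(new ArrayList<>(latest));
    }

    public List<String> list() {
        return providers;
    }
}
```

Replacing the whole list reference (instead of mutating it) lets concurrent callers keep iterating over a consistent snapshot while an update arrives.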

The following is the process of subscribing to services in ZooKeeper and generating Invokers during service reference. Because this process is closely related to this section, we analyze it here: first the Invoker generation process, and along the way, why the Invokers in RegistryDirectory can be refreshed dynamically.

The figure below shows the implementation of ZookeeperRegistry#doSubscribe. ZookeeperRegistry#notify is executed on the first subscription and whenever a watched node changes; it calls back RegistryDirectory#notify and updates the cache.
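The subscribe-then-notify flow can be sketched without ZooKeeper. In this hypothetical InMemoryRegistry (not a Dubbo class), a map replaces the ZooKeeper nodes and a java.util.function.Consumer replaces the NotifyListener: subscribing triggers one immediate notification with the current providers, and every later change triggers another.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical in-memory registry illustrating the doSubscribe / notify flow.
final class InMemoryRegistry {
    private final Map<String, List<String>> providersByService = new ConcurrentHashMap<>();
    private final Map<String, List<Consumer<List<String>>>> listeners = new ConcurrentHashMap<>();

    // Subscribe: remember the listener, then notify it once with the current providers.
    void subscribe(String service, Consumer<List<String>> listener) {
        listeners.computeIfAbsent(service, k -> new CopyOnWriteArrayList<>()).add(listener);
        listener.accept(current(service));
    }

    // Simulates a watched node changing: store the new list and notify every subscriber.
    void updateProviders(String service, List<String> providers) {
        providersByService.put(service, new ArrayList<>(providers));
        for (Consumer<List<String>> listener : listeners.getOrDefault(service, List.of())) {
            listener.accept(current(service));
        }
    }

    private List<String> current(String service) {
        return new ArrayList<>(providersByService.getOrDefault(service, List.of()));
    }
}
```

A consumer-side directory would pass its own update method as the listener, so its Invoker list is rebuilt on the first subscription and again after every change.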
The RegistryDirectory#refreshOverrideAndInvoker(providerURLs) method generates Invokers from providerURLs.

The main logic is as follows:

  1. If there is only one provider URL and its protocol is empty, the service is disabled
  2. Generate new Invokers from providerURLs. If an Invoker for a providerURL already exists, it is reused instead of being regenerated; otherwise a new one is generated.
  3. Destroy the Invokers of the old providerURLs that are no longer present (Invokers still used by the new providerURLs are not destroyed)
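The three steps above can be sketched as follows. InvokerCache and FakeInvoker are hypothetical; the sketch only assumes that Invokers are cached by provider URL and that a single URL with the `empty` protocol means there are no providers. It is not RegistryDirectory's actual code.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the three refresh steps, with invokers keyed by provider URL.
final class InvokerCache {
    static final class FakeInvoker {
        final String url;
        boolean destroyed;

        FakeInvoker(String url) { this.url = url; }

        void destroy() { destroyed = true; }
    }

    private Map<String, FakeInvoker> byUrl = new HashMap<>();
    private boolean forbidden;
    int created; // number of invokers built so far, to show that reuse avoids rebuilding

    void refresh(List<String> providerUrls) {
        // Step 1: a single "empty://" URL means no providers, so disable the service.
        if (providerUrls.size() == 1 && providerUrls.get(0).startsWith("empty://")) {
            forbidden = true;
            byUrl.values().forEach(FakeInvoker::destroy);
            byUrl = new HashMap<>();
            return;
        }
        forbidden = false;
        // Step 2: reuse existing invokers and create only the missing ones.
        Map<String, FakeInvoker> next = new HashMap<>();
        for (String url : providerUrls) {
            FakeInvoker invoker = byUrl.get(url);
            if (invoker == null) {
                invoker = new FakeInvoker(url);
                created++;
            }
            next.put(url, invoker);
        }
        // Step 3: destroy invokers whose URL is absent from the new list.
        for (Map.Entry<String, FakeInvoker> e : byUrl.entrySet()) {
            if (!next.containsKey(e.getKey())) {
                e.getValue().destroy();
            }
        }
        byUrl = next;
    }

    boolean isForbidden() { return forbidden; }

    Set<String> urls() { return new HashSet<>(byUrl.keySet()); }

    FakeInvoker get(String url) { return byUrl.get(url); }
}
```

Reusing by URL keeps existing connections alive across refreshes; only providers that actually appeared or disappeared cause work.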

The approximate logic is shown in the following figure.

Cluster

During service reference, Cluster merges multiple Invokers and exposes only one Invoker to the caller:

// RegistryProtocol#doRefer
Invoker invoker = cluster.join(directory);

Cluster encapsulates a fault-tolerance strategy. Which strategy is actually used is decided through Dubbo SPI; the default is FailoverCluster.

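Implementation class    Description
AvailableCluster        Finds an available node and calls it directly
FailoverCluster         Fail over: retries other providers on failure (default)
FailfastCluster         Fail fast: makes only one call and reports a failure immediately
FailsafeCluster         Fail safe: ignores exceptions
FailbackCluster         Fail back: records failed requests and resends them later in the background
ForkingCluster          Forking: calls several providers in parallel
BroadcastCluster        Broadcast: calls every provider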

@SPI(FailoverCluster.NAME)
public interface Cluster {

    @Adaptive
    <T> Invoker<T> join(Directory<T> directory) throws RpcException;

}

public class FailoverCluster implements Cluster {

    public final static String NAME = "failover";

    @Override
    public <T> Invoker<T> join(Directory<T> directory) throws RpcException {
        return new FailoverClusterInvoker<T>(directory);
    }

}

The FailoverCluster#join method simply returns a FailoverClusterInvoker. The other cluster fault-tolerance strategies work the same way: each returns its corresponding Invoker.
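As an illustration of "one strategy per name, each join returning its own wrapper Invoker", here is a sketch with hypothetical types CallTarget, JoinStrategy and Strategies. A plain map replaces Dubbo SPI's ExtensionLoader, and only three strategies are modeled, in greatly simplified form. Mirroring @SPI(FailoverCluster.NAME), no configured name means failover; an unknown name is rejected.

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for Invoker and Cluster (not Dubbo's real types).
interface CallTarget {
    String invoke(String request);
}

interface JoinStrategy {
    // Merge many targets into the single target exposed to the caller.
    CallTarget join(List<CallTarget> targets);
}

final class Strategies {
    // Fail fast: call one target and let any exception propagate immediately.
    static final JoinStrategy FAILFAST = targets -> request -> targets.get(0).invoke(request);

    // Fail safe: swallow the exception and return null instead of failing the caller.
    static final JoinStrategy FAILSAFE = targets -> request -> {
        try {
            return targets.get(0).invoke(request);
        } catch (RuntimeException e) {
            return null;
        }
    };

    // Fail over: try each target in turn until one succeeds.
    static final JoinStrategy FAILOVER = targets -> request -> {
        RuntimeException last = null;
        for (CallTarget target : targets) {
            try {
                return target.invoke(request);
            } catch (RuntimeException e) {
                last = e;
            }
        }
        throw last != null ? last : new IllegalStateException("no targets");
    };

    private static final Map<String, JoinStrategy> BY_NAME =
            Map.of("failfast", FAILFAST, "failsafe", FAILSAFE, "failover", FAILOVER);

    // No name configured -> default strategy; unknown name -> error.
    static JoinStrategy byName(String name) {
        if (name == null || name.isEmpty()) {
            return FAILOVER;
        }
        JoinStrategy strategy = BY_NAME.get(name);
        if (strategy == null) {
            throw new IllegalArgumentException("No such cluster strategy: " + name);
        }
        return strategy;
    }
}
```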

All cluster fault-tolerance Invokers extend the abstract class AbstractClusterInvoker.

AbstractClusterInvoker mainly abstracts the following two pieces of logic:

  1. Use the routing configuration to select the eligible Invokers
  2. Initialize the load-balancing strategy, which is then applied to List<Invoker> invokers
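This split between a shared skeleton and a strategy-specific doInvoke is the template method pattern. The following sketch shows it with hypothetical, simplified classes (ClusterInvokerTemplate, SingleShotInvoker). Routing is reduced to a predicate and load balancing to a function, so this is not Dubbo's real AbstractClusterInvoker API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical template-method sketch of what AbstractClusterInvoker factors out.
abstract class ClusterInvokerTemplate {
    private final List<String> allProviders;                  // what the Directory would return
    private final Predicate<String> routingRule;              // stands in for the Router chain
    private final Function<List<String>, String> loadBalance; // stands in for LoadBalance

    ClusterInvokerTemplate(List<String> allProviders, Predicate<String> routingRule,
                           Function<List<String>, String> loadBalance) {
        this.allProviders = allProviders;
        this.routingRule = routingRule;
        this.loadBalance = loadBalance;
    }

    // Fixed skeleton shared by every strategy: route first, then hand over with the load balancer.
    final String invoke(String request) {
        List<String> routed = new ArrayList<>();
        for (String provider : allProviders) {
            if (routingRule.test(provider)) {
                routed.add(provider);
            }
        }
        return doInvoke(request, routed, loadBalance);
    }

    // Each fault-tolerance strategy only implements this step.
    protected abstract String doInvoke(String request, List<String> providers,
                                       Function<List<String>, String> loadBalance);
}

// A trivial strategy: one load-balanced call, no retry.
final class SingleShotInvoker extends ClusterInvokerTemplate {
    SingleShotInvoker(List<String> allProviders, Predicate<String> routingRule,
                      Function<List<String>, String> loadBalance) {
        super(allProviders, routingRule, loadBalance);
    }

    @Override
    protected String doInvoke(String request, List<String> providers,
                              Function<List<String>, String> loadBalance) {
        if (providers.isEmpty()) {
            throw new IllegalStateException("No provider available");
        }
        return loadBalance.apply(providers) + " handled " + request;
    }
}
```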

Let's analyze the implementation of FailoverClusterInvoker. The other Invoker implementations are similar; interested readers can look at them on their own.

public class FailoverClusterInvoker<T> extends AbstractClusterInvoker<T> {

    @Override
    @SuppressWarnings({"unchecked", "rawtypes"})
    public Result doInvoke(Invocation invocation, final List<Invoker<T>> invokers, LoadBalance loadbalance) throws RpcException {
        List<Invoker<T>> copyInvokers = invokers;
        checkInvokers(copyInvokers, invocation);
        String methodName = RpcUtils.getMethodName(invocation);
        // Number of attempts = configured retries + 1 (the first call)
        int len = getUrl().getMethodParameter(methodName, Constants.RETRIES_KEY, Constants.DEFAULT_RETRIES) + 1;
        if (len <= 0) {
            len = 1;
        }
        // retry loop.
        RpcException le = null; // last exception.
        List<Invoker<T>> invoked = new ArrayList<Invoker<T>>(copyInvokers.size()); // invoked invokers.
        Set<String> providers = new HashSet<String>(len);
        // Call in a loop, retrying on failure
        for (int i = 0; i < len; i++) {
            //Reselect before retry to avoid a change of candidate `invokers`.
            //NOTE: if `invokers` changed, then `invoked` also lose accuracy.
            if (i > 0) {
                // Throw an exception if this invoker has already been destroyed
                checkWhetherDestroyed();
                // Fetch the current list of service providers again
                copyInvokers = list(invocation);
                // check again
                checkInvokers(copyInvokers, invocation);
            }
            // Select an Invoker via load balancing, avoiding already-invoked ones where possible
            Invoker<T> invoker = select(loadbalance, invocation, copyInvokers, invoked);
            invoked.add(invoker);
            RpcContext.getContext().setInvokers((List) invoked);
            try {
                // Make the remote call
                Result result = invoker.invoke(invocation);
                if (le != null && logger.isWarnEnabled()) {
                    logger.warn("Although retry the method " + methodName
                            + " in the service " + getInterface().getName()
                            + " was successful by the provider " + invoker.getUrl().getAddress()
                            + ", but there have been failed providers " + providers
                            + " (" + providers.size() + "/" + copyInvokers.size()
                            + ") from the registry " + directory.getUrl().getAddress()
                            + " on the consumer " + NetUtils.getLocalHost()
                            + " using the dubbo version " + Version.getVersion() + ". Last error is: "
                            + le.getMessage(), le);
                }
                return result;
            } catch (RpcException e) {
                // Business exceptions are rethrown directly, without retrying
                if (e.isBiz()) { // biz exception.
                    throw e;
                }
                le = e;
            } catch (Throwable e) {
                le = new RpcException(e.getMessage(), e);
            } finally {
                providers.add(invoker.getUrl().getAddress());
            }
        }
        throw new RpcException(le.getCode(), "Failed to invoke the method "
                + methodName + " in the service " + getInterface().getName()
                + ". Tried " + len + " times of the providers " + providers
                + " (" + providers.size() + "/" + copyInvokers.size()
                + ") from the registry " + directory.getUrl().getAddress()
                + " on the consumer " + NetUtils.getLocalHost() + " using the dubbo version "
                + Version.getVersion() + ". Last error is: "
                + le.getMessage(), le.getCause() != null ? le.getCause() : le);
    }

}

When a call is made, doInvoke is executed. Its invokers argument (the Invokers left after routing) and its loadbalance argument (the load-balancing strategy) are both prepared by AbstractClusterInvoker according to the configuration. doInvoke returns as soon as a call succeeds. Otherwise, unless the exception is a business exception (which is rethrown immediately), it re-lists the providers and selects another Invoker, until the number of attempts (retries + 1) is used up. If no call has succeeded by then, an RpcException is thrown.
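The core of this retry loop can be distilled into a small generic sketch. Failover, BizException and pick are hypothetical names; providers are modeled as functions, and the "prefer providers not tried yet" rule is a simple stand-in for Dubbo's select/reselect logic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical simplified failover loop modeled on the doInvoke method above.
final class Failover {
    // Thrown by a provider for a business error: must not be retried.
    static final class BizException extends RuntimeException {
        BizException(String msg) { super(msg); }
    }

    static <R> R call(List<Function<String, R>> providers, String request, int retries) {
        if (providers.isEmpty()) {
            throw new IllegalStateException("No provider available");
        }
        int attempts = Math.max(retries + 1, 1); // same "+ 1" and lower bound as the source
        List<Integer> tried = new ArrayList<>();
        RuntimeException last = null;
        for (int i = 0; i < attempts; i++) {
            int idx = pick(providers.size(), tried);
            tried.add(idx);
            try {
                return providers.get(idx).apply(request);
            } catch (BizException e) {
                throw e;   // business exception: rethrow at once, no retry
            } catch (RuntimeException e) {
                last = e;  // remember it and try another provider
            }
        }
        throw new IllegalStateException("Tried " + attempts + " times, last error: " + last.getMessage(), last);
    }

    // Prefer a provider not tried yet; once all were tried, cycle through them again.
    private static int pick(int size, List<Integer> tried) {
        for (int i = 0; i < size; i++) {
            if (!tried.contains(i)) {
                return i;
            }
        }
        return tried.size() % size;
    }
}
```

With the default of 2 retries, a call that fails on the first provider but succeeds on the second returns normally, just as the warn branch in doInvoke describes.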

Router

The Invokers that the Directory finds for a call cannot be invoked directly; only the Invokers that pass the routing rules can be called.

The following routing rule means that the consumer whose IP is 172.22.3.1 will only call the provider whose IP is 172.22.3.2:

host = 172.22.3.1 => host = 172.22.3.2
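To show what such a rule does at call time, here is a hypothetical HostRule class that evaluates exactly this single-clause form. It supports only the `host` key and the `=` operator. When the rule matches but no provider satisfies the then-part, it returns all providers, like ConditionRouter with force=false. Dubbo's real router handles far more.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical evaluator for a rule of the form "host = A => host = B".
final class HostRule {
    private final String whenHost;
    private final String thenHost;

    HostRule(String rule) {
        String[] sides = rule.split("=>");
        if (sides.length != 2) {
            throw new IllegalArgumentException("Expected 'when => then': " + rule);
        }
        this.whenHost = valueOf(sides[0]);
        this.thenHost = valueOf(sides[1]);
    }

    // "host = 172.22.3.1" -> "172.22.3.1"
    private static String valueOf(String clause) {
        String[] kv = clause.split("=");
        if (kv.length != 2 || !kv[0].trim().equals("host")) {
            throw new IllegalArgumentException("Unsupported clause: " + clause);
        }
        return kv[1].trim();
    }

    // If the consumer matches the when-part, keep only providers matching the then-part.
    List<String> route(String consumerHost, List<String> providerHosts) {
        if (!consumerHost.equals(whenHost)) {
            return providerHosts; // rule does not apply to this consumer
        }
        List<String> result = new ArrayList<>();
        for (String provider : providerHosts) {
            if (provider.equals(thenHost)) {
                result.add(provider);
            }
        }
        return result.isEmpty() ? providerHosts : result; // force=false behavior
    }
}
```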

For details on routing rules, see the official documentation:
http://dubbo.apache.org/zh/docs/v2.7/user/examples/routing-rule/

There are three types of routing:

  1. Condition routing: routing rules are written in a syntax defined by Dubbo
  2. File routing: the framework reads the routing rules from a file
  3. Script routing: the JDK's built-in script engine parses the routing-rule script

Condition routing is the most commonly used, so here is a brief look at its implementation.

public class ConditionRouter extends AbstractRouter {

    protected Map<String, MatchPair> whenCondition;
    protected Map<String, MatchPair> thenCondition;

}

Routing mainly relies on whenCondition and thenCondition, two maps that are initialized in the constructor.

For the following routing rule, the two generated maps are shown in the figures below:

host != 4.4.4.4 & host = 2.2.2.2,1.1.1.1,3.3.3.3 & method = sayHello => host = 1.2.3.4 & host != 4.4.4.4
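A simplified parser shows how a rule like this could be turned into the two maps. ConditionParser and its nested MatchPair borrow the names used in ConditionRouter, but the parsing itself (splitting on `=>`, `&`, `!=` or `=`, and `,`) is our own simplification: Dubbo uses a regex-based tokenizer and supports features such as wildcards, which are omitted here. Following ConditionRouter's behavior as we understand it, a rule without `=>` is treated as a then-part only, and an empty then-part yields null (the blacklist case in route()).

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical simplified parser producing the shape of whenCondition / thenCondition.
final class ConditionParser {
    static final class MatchPair {
        final Set<String> matches = new LinkedHashSet<>();    // values after "="
        final Set<String> mismatches = new LinkedHashSet<>(); // values after "!="

        // Exact-value matching only: mismatches win, then matches (if any) must contain the value.
        boolean isMatch(String value) {
            if (mismatches.contains(value)) {
                return false;
            }
            return matches.isEmpty() || matches.contains(value);
        }
    }

    // Parses "k = v1,v2 & k != v3" into {k -> MatchPair}.
    static Map<String, MatchPair> parse(String condition) {
        Map<String, MatchPair> result = new LinkedHashMap<>();
        for (String clause : condition.split("&")) {
            if (clause.isBlank()) {
                continue;
            }
            boolean negated = clause.contains("!=");
            String[] kv = clause.split(negated ? "!=" : "=", 2);
            if (kv.length != 2) {
                throw new IllegalArgumentException("Malformed clause: " + clause);
            }
            MatchPair pair = result.computeIfAbsent(kv[0].trim(), k -> new MatchPair());
            Set<String> target = negated ? pair.mismatches : pair.matches;
            for (String value : kv[1].split(",")) {
                target.add(value.trim());
            }
        }
        return result;
    }

    // No "=>" means there is no when-part, i.e. it matches every consumer.
    static Map<String, MatchPair> whenCondition(String rule) {
        int i = rule.indexOf("=>");
        return i < 0 ? new LinkedHashMap<>() : parse(rule.substring(0, i));
    }

    // An empty then-part yields null, which route() treats as "consumer is blacklisted".
    static Map<String, MatchPair> thenCondition(String rule) {
        int i = rule.indexOf("=>");
        String then = i < 0 ? rule : rule.substring(i + 2);
        return then.isBlank() ? null : parse(then);
    }
}
```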

The following is the code that executes the routing logic. The matchWhen and matchThen methods match against the whenCondition and thenCondition initialized above; I will not analyze them in detail. If you are interested in their implementation, you can debug the official test class (the Ctrl + Shift + T shortcut jumps to the corresponding test class).

// ConditionRouter
public <T> List<Invoker<T>> route(List<Invoker<T>> invokers, URL url, Invocation invocation)
        throws RpcException {
    // The router is disabled
    if (!enabled) {
        return invokers;
    }

    if (CollectionUtils.isEmpty(invokers)) {
        return invokers;
    }
    try {
        // The when-rule does not match: return all invokers
        if (!matchWhen(url, invocation)) {
            return invokers;
        }
        List<Invoker<T>> result = new ArrayList<Invoker<T>>();
        // No then-rule: the consumer is blacklisted, so return an empty list
        if (thenCondition == null) {
            logger.warn("The current consumer in the service blacklist. consumer: " + NetUtils.getLocalHost() + ", service: " + url.getServiceKey());
            return result;
        }
        for (Invoker<T> invoker : invokers) {
            // The provider matches the then-rule
            if (matchThen(invoker.getUrl(), url)) {
                result.add(invoker);
            }
        }
        if (!result.isEmpty()) {
            // Non-empty result: return it
            return result;
        } else if (force) {
            // Empty result with force=true: return the empty list anyway
            logger.warn("The route result is empty and force execute. consumer: " + NetUtils.getLocalHost() + ", service: " + url.getServiceKey() + ", router: " + url.getParameterAndDecoded(Constants.RULE_KEY));
            return result;
        }
    } catch (Throwable t) {
        logger.error("Failed to execute condition router rule: " + getUrl() + ", invokers: " + invokers + ", cause: " + t.getMessage(), t);
    }
    // Empty result with force=false (or an exception): return all Invokers
    return invokers;
}

LoadBalance (load balancing)

If only one Invoker remains after the routing rules are applied, the call can be made directly. If there are several, a load-balancing strategy is needed to choose one. Dubbo provides the following strategies; if none of them meets your needs, you can implement the LoadBalance interface yourself.

Implementation class         Description
RandomLoadBalance            Random (default)
RoundRobinLoadBalance        Round robin
LeastActiveLoadBalance       Least active calls
ConsistentHashLoadBalance    Consistent hashing

AbstractLoadBalance mainly provides the getWeight method, which adjusts a provider's weight according to how long the provider has been running since it started.
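A sketch of the warm-up idea behind getWeight, assuming (as we understand Dubbo 2.7's defaults) a configured weight of 100 and a 10-minute warm-up period: during warm-up the effective weight grows linearly with uptime and is clamped to [1, weight]. WarmupWeight is a hypothetical class, and integer division is used here for simplicity.

```java
// Hypothetical restatement of the warm-up weight idea; not the exact Dubbo source.
final class WarmupWeight {
    static int weight(int configuredWeight, long uptimeMillis, long warmupMillis) {
        if (configuredWeight <= 0) {
            return 0; // negative or zero weights never receive traffic
        }
        if (uptimeMillis <= 0 || uptimeMillis >= warmupMillis) {
            return configuredWeight; // unknown start time or warm-up finished: full weight
        }
        // Linear ramp: uptime / warmup of the configured weight, at least 1.
        long ramped = uptimeMillis * configuredWeight / warmupMillis;
        return (int) Math.max(1, Math.min(ramped, configuredWeight));
    }
}
```

Under these assumptions, a provider that started one minute ago gets weight 10 out of 100, and half-way through warm-up it gets 50, so a freshly started JVM is not flooded with requests.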

Next, let's analyze RandomLoadBalance.

The idea is as follows:
RandomLoadBalance is an implementation of the weighted random algorithm, and the idea is simple. Suppose we have servers = [A, B, C] with weights = [5, 3, 2], so the total weight is 10. Lay these weights out on a number line: the interval [0, 5) belongs to server A, [5, 8) to server B, and [8, 10) to server C. Then generate a random number in [0, 10) and find which interval it falls into. For example, 3 falls into A's interval, so server A is returned. The larger a machine's weight, the wider its interval on the number line, so the random number is more likely to land in it. As long as the random numbers are well distributed, the share of selections each server receives after many draws approaches its share of the total weight. For example, after 10,000 selections, server A is chosen about 5,000 times, server B about 3,000 times, and server C about 2,000 times.

public class RandomLoadBalance extends AbstractLoadBalance {

    public static final String NAME = "random";

    @Override
    protected <T> Invoker<T> doSelect(List<Invoker<T>> invokers, URL url, Invocation invocation) {
        // Number of invokers
        int length = invokers.size();
        // Every invoker has the same weight?
        boolean sameWeight = true;
        // the weight of every invokers
        int[] weights = new int[length];
        // the first invoker's weight
        int firstWeight = getWeight(invokers.get(0), invocation);
        weights[0] = firstWeight;
        // The sum of weights
        int totalWeight = firstWeight;
        // This loop does two things:
        // 1. computes the total weight
        // 2. checks whether all invokers have the same weight
        for (int i = 1; i < length; i++) {
            int weight = getWeight(invokers.get(i), invocation);
            // save for later use
            weights[i] = weight;
            // Sum
            totalWeight += weight;
            if (sameWeight && weight != firstWeight) {
                sameWeight = false;
            }
        }
        // The if branch below draws a random number and finds the interval it falls into
        if (totalWeight > 0 && !sameWeight) {
            // If (not every invoker has the same weight & at least one invoker's weight>0), select randomly based on totalWeight.
            // Pick a random number in [0, totalWeight)
            int offset = ThreadLocalRandom.current().nextInt(totalWeight);
            // Return an invoker based on the random value.
            // Subtract each provider's weight from offset in turn; once offset drops below 0, return that Invoker.
            // Example: servers = [A, B, C], weights = [5, 3, 2], offset = 7.
            // First iteration: 7 - 5 = 2 >= 0, i.e. offset >= 5,
            // so it does not fall into A's interval.
            // Second iteration: 2 - 3 = -1 < 0, i.e. 5 <= offset < 8,
            // so it falls into B's interval.
            for (int i = 0; i < length; i++) {
                offset -= weights[i];
                if (offset < 0) {
                    return invokers.get(i);
                }
            }
        }
        // If all invokers have the same weight value or totalWeight=0, return evenly
        // (just pick one uniformly at random).
        return invokers.get(ThreadLocalRandom.current().nextInt(length));
    }

}
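The claim that selection counts approach the weight ratio can be checked with a stand-alone re-implementation of the same interval walk. WeightedRandom is a hypothetical class that uses plain int weights instead of Invokers and omits the equal-weight shortcut; it uses a seeded java.util.Random only so that runs are repeatable, whereas doSelect uses ThreadLocalRandom.

```java
import java.util.Random;

// Stand-alone version of the interval walk in doSelect, for experimentation.
final class WeightedRandom {
    // Maps an offset in [0, totalWeight) to the index whose interval contains it.
    static int indexFor(int[] weights, int offset) {
        for (int i = 0; i < weights.length; i++) {
            offset -= weights[i];
            if (offset < 0) {
                return i;
            }
        }
        throw new IllegalArgumentException("offset out of range");
    }

    // Draws `draws` times and counts how often each index is chosen.
    static int[] simulate(int[] weights, int draws, long seed) {
        int total = 0;
        for (int w : weights) {
            total += w;
        }
        Random random = new Random(seed);
        int[] counts = new int[weights.length];
        for (int d = 0; d < draws; d++) {
            counts[indexFor(weights, random.nextInt(total))]++;
        }
        return counts;
    }
}
```

For weights {5, 3, 2}, indexFor maps offset 7 to index 1 (server B), matching the worked example in the comments, and 10,000 simulated draws land close to 5,000 / 3,000 / 2,000.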

For the other strategies, see the analysis on the official website, which explains them very clearly:
http://dubbo.apache.org/zh/docs/v2.7/dev/source/loadbalance/

Reference blog

[0] Service Directory: http://dubbo.apache.org/zh/docs/v2.7/dev/source/directory/
[1] Load Balancing: http://dubbo.apache.org/zh/docs/v2.7/dev/source/loadbalance/


Origin blog.csdn.net/zzti_erlie/article/details/109696511