Everyone is welcome to follow github.com/hsfxuebao; I hope you find it useful. If you think it is worthwhile, please give it a Star.

When designing a system, we must consider not only how the code should behave under normal conditions but also how it should behave when something goes wrong. When an error occurs while a service consumer invokes a service provider, Dubbo offers several fault-tolerance schemes. The default is Failover Cluster, which retries the call on failure. Let's look at the cluster fault-tolerance modes that Dubbo provides.

1. Diagram of fault-tolerant invoker creation and invocation

2. Analysis of the fault-tolerance strategies

2.1 failover

Failover strategy. When a consumer fails to call one server in the provider cluster, it automatically tries other servers. The number of retries is set by the retries attribute.

org.apache.dubbo.rpc.cluster.support.FailoverClusterInvoker#doInvoke:

public Result doInvoke(Invocation invocation, final List<Invoker<T>> invokers, LoadBalance loadbalance) throws RpcException {
    List<Invoker<T>> copyInvokers = invokers;
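    // check that the invokers list is not empty
    checkInvokers(copyInvokers, invocation);
    // get the name of the RPC method being invoked
    String methodName = RpcUtils.getMethodName(invocation);
    // total number of attempts: the retries attribute (default 2) plus 1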
    int len = calculateInvokeTimes(methodName);
    // retry loop.
    RpcException le = null; // last exception.
    // all invokers tried so far; every one except the last has already failed
    List<Invoker<T>> invoked = new ArrayList<Invoker<T>>(copyInvokers.size()); // invoked invokers.
    Set<String> providers = new HashSet<String>(len);

    for (int i = 0; i < len; i++) {
        //Reselect before retry to avoid a change of candidate `invokers`.
        //NOTE: if `invokers` changed, then `invoked` also lose accuracy.
        if (i > 0) {
            // make sure this cluster invoker has not been destroyed
            checkWhetherDestroyed();
            // refresh the local invoker list (providers may have changed)
            copyInvokers = list(invocation);
            // check again that the invokers list is not empty
            checkInvokers(copyInvokers, invocation);
        }
        // choose an invoker via load balancing
        Invoker<T> invoker = select(loadbalance, invocation, copyInvokers, invoked);
        // add the selected invoker to the invoked list
        invoked.add(invoker);
        RpcContext.getServiceContext().setInvokers((List) invoked);
        try {
            // remote invocation
            Result result = invokeWithContext(invoker, invocation);
            // the call succeeded after earlier failures: log the last exception at warn level
            if (le != null && logger.isWarnEnabled()) {
                logger.warn("Although retry the method " + methodName
                        + " in the service " + getInterface().getName()
                        + " was successful by the provider " + invoker.getUrl().getAddress()
                        + ", but there have been failed providers " + providers
                        + " (" + providers.size() + "/" + copyInvokers.size()
                        + ") from the registry " + directory.getUrl().getAddress()
                        + " on the consumer " + NetUtils.getLocalHost()
                        + " using the dubbo version " + Version.getVersion() + ". Last error is: "
                        + le.getMessage(), le);
            }
            return result;
        } catch (RpcException e) {
            // business exceptions are not retried; rethrow them directly
            if (e.isBiz()) { // biz exception.
                throw e;
            }
            // any other RpcException: remember it and retry
            le = e;
        } catch (Throwable e) {
            le = new RpcException(e.getMessage(), e);
        } finally {
            // record the address of the provider just tried
            providers.add(invoker.getUrl().getAddress());
        }
    }  // end-for
    // all attempts failed: throw an RpcException carrying the last error
    throw new RpcException(le.getCode(), "Failed to invoke the method "
            + methodName + " in the service " + getInterface().getName()
            + ". Tried " + len + " times of the providers " + providers
            + " (" + providers.size() + "/" + copyInvokers.size()
            + ") from the registry " + directory.getUrl().getAddress()
            + " on the consumer " + NetUtils.getLocalHost() + " using the dubbo version "
            + Version.getVersion() + ". Last error is: "
            + le.getMessage(), le.getCause() != null ? le.getCause() : le);
}
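The retry loop above can be reduced to a small plain-Java sketch with no Dubbo dependency. It is an illustration only. Provider and BizException are hypothetical stand-ins for an invoker and a business exception. Load balancing is replaced by a naive "first provider not yet tried" rule. The sketch only shows three things: the attempt count (retries + 1), the avoidance of providers that were already tried, and the rule that business exceptions are never retried.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

class FailoverSketch {
    // Hypothetical marker for a business (biz) error, which must not be retried.
    static class BizException extends RuntimeException {
        BizException(String message) { super(message); }
    }

    // Hypothetical stand-in for a provider invoker: an address plus a call that may throw.
    static final class Provider {
        final String address;
        final Supplier<String> call;

        Provider(String address, Supplier<String> call) {
            this.address = address;
            this.call = call;
        }
    }

    // Tries up to retries + 1 times; `tried` collects the address of every attempt in order.
    static String invoke(List<Provider> providers, int retries, List<String> tried) {
        int attempts = Math.max(retries + 1, 1); // at least one attempt
        List<Provider> invoked = new ArrayList<>();
        RuntimeException last = null;
        for (int i = 0; i < attempts; i++) {
            Provider provider = select(providers, invoked);
            invoked.add(provider);
            tried.add(provider.address);
            try {
                return provider.call.get();
            } catch (BizException e) {
                throw e; // business error: fail immediately, no retry
            } catch (RuntimeException e) {
                last = e; // remember the failure and move on to another provider
            }
        }
        throw new IllegalStateException(
                "Tried " + attempts + " times, last error: " + last.getMessage(), last);
    }

    // Naive stand-in for load balancing: the first provider not tried yet,
    // falling back to the first provider once every provider has been tried.
    static Provider select(List<Provider> providers, List<Provider> invoked) {
        for (Provider provider : providers) {
            if (!invoked.contains(provider)) {
                return provider;
            }
        }
        return providers.get(0);
    }
}
```

With a failing provider A and a healthy provider B, the sketch tries A, then B, and returns B's result. If A throws a BizException instead, the call stops after that single attempt.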

2.2 failfast

Failfast strategy. The consumer makes only one call and reports an error immediately if that call fails. It is usually used for non-idempotent write operations, such as inserting records.

org.apache.dubbo.rpc.cluster.support.FailfastClusterInvoker#doInvoke:

public Result doInvoke(Invocation invocation, List<Invoker<T>> invokers, LoadBalance loadbalance) throws RpcException {
    checkInvokers(invokers, invocation);
    Invoker<T> invoker = select(loadbalance, invocation, invokers, null);
    try {
        return invokeWithContext(invoker, invocation);
    } catch (Throwable e) {
        if (e instanceof RpcException && ((RpcException) e).isBiz()) { // biz exception.
            throw (RpcException) e;
        }
        throw new RpcException(e instanceof RpcException ? ((RpcException) e).getCode() : 0,
                "Failfast invoke providers " + invoker.getUrl() + " " + loadbalance.getClass().getSimpleName()
                        + " for service " + getInterface().getName()
                        + " method " + invocation.getMethodName() + " on consumer " + NetUtils.getLocalHost()
                        + " use dubbo version " + Version.getVersion()
                        + ", but no luck to perform the invocation. Last error is: " + e.getMessage(),
                e.getCause() != null ? e.getCause() : e);
    }
}

2.3 failsafe

Failsafe strategy. When an exception occurs while the consumer calls the provider, the exception is logged and ignored, and an empty result is returned to the caller. This strategy is typically used for relatively unimportant services.

org.apache.dubbo.rpc.cluster.support.FailsafeClusterInvoker#doInvoke:

public Result doInvoke(Invocation invocation, List<Invoker<T>> invokers, LoadBalance loadbalance) throws RpcException {
    try {
        checkInvokers(invokers, invocation);
        Invoker<T> invoker = select(loadbalance, invocation, invokers, null);
        return invokeWithContext(invoker, invocation);
    } catch (Throwable e) {
        logger.error("Failsafe ignore exception: " + e.getMessage(), e);
        return AsyncRpcResult.newDefaultAsyncResult(null, null, invocation); // ignore
    }
}
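The difference between failfast and failsafe is easiest to see side by side. The following is a minimal plain-Java sketch, not Dubbo code:

- a Supplier stands in for the selected invoker;
- failfast's distinction between business and non-business exceptions is omitted;
- failsafe's empty result is modeled as null.

```java
import java.util.function.Supplier;

class FailfastFailsafeSketch {
    // Failfast: exactly one attempt; any failure is reported to the caller immediately.
    static <T> T failfast(Supplier<T> call) {
        try {
            return call.get();
        } catch (RuntimeException e) {
            throw new IllegalStateException("Failfast invoke failed: " + e.getMessage(), e);
        }
    }

    // Failsafe: one attempt; a failure is logged and swallowed, and an empty (null) result is returned.
    static <T> T failsafe(Supplier<T> call) {
        try {
            return call.get();
        } catch (RuntimeException e) {
            System.err.println("Failsafe ignore exception: " + e.getMessage());
            return null;
        }
    }
}
```

Given the same broken call, failfast throws while failsafe quietly returns null. That is why failsafe only suits calls whose outcome the caller can afford to lose.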

2.4 failback

Failback strategy. When the consumer fails to call the provider, the caller immediately receives an empty result. Dubbo records the failed request and retries it periodically in the background, and the number of retries is again set by the retries attribute in the configuration. This strategy is usually used for services with low real-time requirements.

org.apache.dubbo.rpc.cluster.support.FailbackClusterInvoker#doInvoke:

protected Result doInvoke(Invocation invocation, List<Invoker<T>> invokers, LoadBalance loadbalance) throws RpcException {
    Invoker<T> invoker = null;
    try {
        checkInvokers(invokers, invocation);
        invoker = select(loadbalance, invocation, invokers, null);
        return invokeWithContext(invoker, invocation);
    } catch (Throwable e) {
        logger.error("Failback to invoke method " + invocation.getMethodName() + ", wait for retry in background. Ignored exception: "
                + e.getMessage() + ", ", e);
        addFailed(loadbalance, invocation, invokers, invoker);
        return AsyncRpcResult.newDefaultAsyncResult(null, null, invocation); // ignore
    }
}

private void addFailed(LoadBalance loadbalance, Invocation invocation, List<Invoker<T>> invokers, Invoker<T> lastInvoker) {
    // lazily create the shared retry timer (double-checked locking)
    if (failTimer == null) {
        synchronized (this) {
            if (failTimer == null) {
                failTimer = new HashedWheelTimer(
                        new NamedThreadFactory("failback-cluster-timer", true),
                        1,
                        TimeUnit.SECONDS, 32, failbackTasks);
            }
        }
    }
    // schedule a retry task that first runs RETRY_FAILED_PERIOD seconds from now
    RetryTimerTask retryTimerTask = new RetryTimerTask(loadbalance, invocation, invokers, lastInvoker, retries, RETRY_FAILED_PERIOD);
    try {
        failTimer.newTimeout(retryTimerTask, RETRY_FAILED_PERIOD, TimeUnit.SECONDS);
    } catch (Throwable e) {
        logger.error("Failback background works error,invocation->" + invocation + ", exception: " + e.getMessage());
    }
}
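The failback flow can be approximated with the JDK's ScheduledExecutorService in place of Dubbo's HashedWheelTimer. This is a simplified sketch under stated assumptions:

- the class name, the retry period in milliseconds, and the policy of retrying one Supplier up to retries times are illustrative;
- the result of a background retry is discarded, because the caller has already received an empty result.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

class FailbackSketch {
    // Single daemon timer thread, created up front (Dubbo creates its timer lazily).
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor(runnable -> {
        Thread thread = new Thread(runnable, "failback-retry-timer");
        thread.setDaemon(true);
        return thread;
    });
    private final AtomicInteger retryCount = new AtomicInteger();
    private final int retries;
    private final long periodMillis;

    FailbackSketch(int retries, long periodMillis) {
        this.retries = retries;
        this.periodMillis = periodMillis;
    }

    // The first attempt runs on the caller's thread. On failure the caller immediately
    // gets an empty (null) result and background retries are scheduled.
    <T> T invoke(Supplier<T> call) {
        try {
            return call.get();
        } catch (RuntimeException e) {
            System.err.println("Failback: retrying in background, ignored exception: " + e.getMessage());
            if (retries > 0) {
                scheduleRetry(call, 1);
            }
            return null;
        }
    }

    // Retry number `attempt` runs periodMillis from now; a failed retry schedules the next
    // one until `retries` retries have been made. Results of retries are discarded.
    private <T> void scheduleRetry(Supplier<T> call, int attempt) {
        timer.schedule(() -> {
            retryCount.incrementAndGet();
            try {
                call.get();
            } catch (RuntimeException e) {
                if (attempt < retries) {
                    scheduleRetry(call, attempt + 1);
                } else {
                    System.err.println("Failback: giving up after " + attempt + " retries");
                }
            }
        }, periodMillis, TimeUnit.MILLISECONDS);
    }

    // Number of background retries executed so far.
    int retryCount() {
        return retryCount.get();
    }
}
```

For a call that fails twice and then succeeds, invoke returns null at once. The second background retry then succeeds, so retryCount() ends at 2.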

2.5 forking

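Forking (parallel) strategy. The consumer calls several provider servers of the same service in parallel, and as soon as one of them succeeds, the call ends and that result is returned. It is usually used for read operations with strict real-time requirements, at the cost of wasting more server resources. The maximum number of parallel calls is set by the forks attribute.

org.apache.dubbo.rpc.cluster.support.ForkingClusterInvoker#doInvoke: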

public Result doInvoke(final Invocation invocation, List<Invoker<T>> invokers, LoadBalance loadbalance) throws RpcException {
    try {
        checkInvokers(invokers, invocation);
        // invokers selected to be called in parallel
        final List<Invoker<T>> selected;
        // the forks attribute: maximum number of parallel calls
        final int forks = getUrl().getParameter(FORKS_KEY, DEFAULT_FORKS);
        // the timeout attribute: time limit for the remote calls
        final int timeout = getUrl().getParameter(TIMEOUT_KEY, DEFAULT_TIMEOUT);
        if (forks <= 0 || forks >= invokers.size()) {
            selected = invokers;
        } else {  // forks is within the range (0, invokers.size())
            selected = new ArrayList<>(forks);
            while (selected.size() < forks) {
                // pick one invoker via load balancing
                Invoker<T> invoker = select(loadbalance, invocation, invokers, selected);
                if (!selected.contains(invoker)) {
                    //Avoid add the same invoker several times.
                    selected.add(invoker);
                }
            }
        }
        RpcContext.getServiceContext().setInvokers((List) selected);

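        // counter of parallel calls that have failed
        final AtomicInteger count = new AtomicInteger();

        // queue holding the results of the parallel calls
        final BlockingQueue<Object> ref = new LinkedBlockingQueue<>();

        // start the parallel calls
        for (final Invoker<T> invoker : selected) {
            // each call runs on a thread-pool thread; this is where the parallelism happens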
            executor.execute(() -> {
                try {
                    // remote invocation
                    Result result = invokeWithContext(invoker, invocation);
                    // put this invoker's result into the queue
                    ref.offer(result);
                } catch (Throwable e) {
                    // this invoker failed: increment the failure counter
                    int value = count.incrementAndGet();
                    if (value >= selected.size()) {
                        // reaching here means none of the parallel calls succeeded.
                        // put the exception into ref so that the poll() below wakes up
                        ref.offer(e);
                    }
                }
            });
        }  // end-for
        try {
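            // poll() blocks until ref contains an element: it returns as soon as
            // one element is offered, or null once the timeout expires.
            // note that poll() runs concurrently with the parallel remote calls started above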
            Object ret = ref.poll(timeout, TimeUnit.MILLISECONDS);
            if (ret instanceof Throwable) {
                Throwable e = (Throwable) ret;
                throw new RpcException(e instanceof RpcException ? ((RpcException) e).getCode() : 0, "Failed to forking invoke provider " + selected + ", but no luck to perform the invocation. Last error is: " + e.getMessage(), e.getCause() != null ? e.getCause() : e);
            }
            return (Result) ret;
        } catch (InterruptedException e) {
            throw new RpcException("Failed to forking invoke provider " + selected + ", but no luck to perform the invocation. Last error is: " + e.getMessage(), e);
        }
    } finally {
        // clear attachments which is binding to current thread.
        RpcContext.getClientAttachment().clearAttachments();
    }
}
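The core trick of the forking invoker uses a shared queue plus a failure counter, so that poll() wakes on the first success or on the last failure. It can be reproduced in a plain-Java sketch under these assumptions:

- each provider is a hypothetical Supplier whose result must be non-null, because LinkedBlockingQueue rejects null;
- the calls run on a shared pool of daemon threads;
- unlike the code above, a timeout is reported as an exception rather than returned as null.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

class ForkingSketch {
    // Shared pool of daemon threads used for the parallel calls.
    private static final ExecutorService EXECUTOR = Executors.newCachedThreadPool(runnable -> {
        Thread thread = new Thread(runnable, "forking-sketch");
        thread.setDaemon(true);
        return thread;
    });

    static <T> T invoke(List<Supplier<T>> calls, long timeoutMillis) throws InterruptedException {
        BlockingQueue<Object> ref = new LinkedBlockingQueue<>();
        AtomicInteger failures = new AtomicInteger();
        for (Supplier<T> call : calls) {
            EXECUTOR.execute(() -> {
                try {
                    // offer(null) throws, so a null result also counts as a failure here
                    ref.offer(call.get());
                } catch (RuntimeException e) {
                    // only the last failure is queued, so poll() wakes when every call has failed
                    if (failures.incrementAndGet() >= calls.size()) {
                        ref.offer(e);
                    }
                }
            });
        }
        // blocks until the first success, the final failure, or the timeout
        Object ret = ref.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        if (ret == null) {
            throw new IllegalStateException("Forking invoke timed out after " + timeoutMillis + " ms");
        }
        if (ret instanceof Throwable) {
            throw new IllegalStateException("All forked calls failed", (Throwable) ret);
        }
        @SuppressWarnings("unchecked")
        T result = (T) ret;
        return result;
    }
}
```

Because failures are only published when the counter reaches the number of calls, one failing provider cannot win the race against a healthy one.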

2.6 broadcast

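Broadcast strategy. The consumer calls all providers one by one, and the call fails if any provider reports an error. It is usually used to notify all providers to update local resources such as caches or logs.

org.apache.dubbo.rpc.cluster.support.BroadcastClusterInvoker#doInvoke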
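Here is a minimal plain-Java sketch of the broadcast behavior described above. It assumes each provider is a hypothetical Supplier. The sketch keeps calling the remaining providers after a failure, so every provider is still notified, and then rethrows the last failure at the end. Treat it as an illustration; check BroadcastClusterInvoker#doInvoke for the exact behavior of your Dubbo version.

```java
import java.util.List;
import java.util.function.Supplier;

class BroadcastSketch {
    // Calls every provider in order. A failure does not stop the loop, so the remaining
    // providers are still notified; if any call failed, the last failure is rethrown.
    static <T> T invoke(List<Supplier<T>> calls) {
        RuntimeException last = null;
        T result = null;
        for (Supplier<T> call : calls) {
            try {
                result = call.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure, keep broadcasting
            }
        }
        if (last != null) {
            throw last;
        }
        return result; // every call succeeded: return the last provider's result
    }
}
```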

2.7 available

First-available strategy. Searches through all invokers and calls the first available one.

org.apache.dubbo.rpc.cluster.support.AvailableClusterInvoker#doInvoke
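The first-available rule fits in a few lines. In this sketch, a hypothetical Node type carries an availability flag that stands in for Invoker#isAvailable(). The first available node is called, with no load balancing and no retry.

```java
import java.util.List;
import java.util.function.Supplier;

class AvailableSketch {
    // Hypothetical provider: an address, an availability flag, and the call to make.
    static final class Node<T> {
        final String address;
        final boolean available;
        final Supplier<T> call;

        Node(String address, boolean available, Supplier<T> call) {
            this.address = address;
            this.available = available;
            this.call = call;
        }
    }

    // Invoke the first node that reports itself available; failures propagate unchanged.
    static <T> T invoke(List<Node<T>> nodes) {
        for (Node<T> node : nodes) {
            if (node.available) {
                return node.call.get();
            }
        }
        throw new IllegalStateException("No provider available");
    }
}
```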

2.8 mergeable

Merge strategy. Merges the results returned by the invokers of multiple groups.

org.apache.dubbo.rpc.cluster.support.MergeableClusterInvoker#doInvoke
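As a rough illustration of merging, the sketch below calls one hypothetical Supplier per group and concatenates the List results. It makes three simplifying assumptions:

- the groups are called one after another for simplicity;
- list concatenation is hardcoded, where Dubbo selects a merger according to its configuration;
- the map of group names is an illustrative input, not a Dubbo API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

class MergeableSketch {
    // Calls one provider per group and concatenates the List results.
    // Element order follows the map's own iteration order.
    static <E> List<E> invoke(Map<String, Supplier<List<E>>> groups) {
        List<E> merged = new ArrayList<>();
        for (Map.Entry<String, Supplier<List<E>>> group : groups.entrySet()) {
            List<E> partial = group.getValue().get();
            if (partial != null) {
                merged.addAll(partial); // concatenation stands in for a real merger
            }
        }
        return merged;
    }
}
```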

2.9 zone-aware

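When there are multiple registries to subscribe to, this fault-tolerance mechanism provides a strategy for deciding how to distribute traffic among them: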

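First, a registry marked preferred=true has the highest priority.
Next, check which zone the current request belongs to, and prefer registries in the same zone.
Then, balance traffic among all registries according to each registry's weight.
Finally, pick any registry that is available.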
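The four rules above can be written as one selection function. This is a plain-Java sketch rather than Dubbo's zone-aware invoker, and it differs in three ways:

- the Registry type and its fields are hypothetical;
- the request zone is passed in as an argument instead of being read from the RPC context;
- step 3 uses a simple weighted random pick.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class ZoneAwareSketch {
    // Hypothetical view of one registry's cluster of providers.
    static final class Registry {
        final String name;
        final boolean preferred;
        final String zone;
        final int weight;
        final boolean available;

        Registry(String name, boolean preferred, String zone, int weight, boolean available) {
            this.name = name;
            this.preferred = preferred;
            this.zone = zone;
            this.weight = weight;
            this.available = available;
        }
    }

    static Registry select(List<Registry> registries, String requestZone, Random random) {
        // 1. an available registry marked preferred=true wins
        for (Registry r : registries) {
            if (r.available && r.preferred) {
                return r;
            }
        }
        // 2. otherwise an available registry in the same zone as the request
        if (requestZone != null) {
            for (Registry r : registries) {
                if (r.available && requestZone.equals(r.zone)) {
                    return r;
                }
            }
        }
        // 3. otherwise a weighted random pick among available registries with positive weight
        List<Registry> candidates = new ArrayList<>();
        int totalWeight = 0;
        for (Registry r : registries) {
            if (r.available && r.weight > 0) {
                candidates.add(r);
                totalWeight += r.weight;
            }
        }
        if (totalWeight > 0) {
            int offset = random.nextInt(totalWeight);
            for (Registry r : candidates) {
                offset -= r.weight;
                if (offset < 0) {
                    return r;
                }
            }
        }
        // 4. otherwise any available registry
        for (Registry r : registries) {
            if (r.available) {
                return r;
            }
        }
        throw new IllegalStateException("No registry available");
    }
}
```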

3. Customizing the cluster fault-tolerance strategy through the extension interface

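Dubbo itself provides a rich set of cluster fault-tolerance strategies, but if you have custom requirements, you can build your own on top of Dubbo's Cluster extension interface. To create a custom extension, first implement the Cluster interface: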

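import org.apache.dubbo.rpc.Invoker;
import org.apache.dubbo.rpc.RpcException;
import org.apache.dubbo.rpc.cluster.Cluster;
import org.apache.dubbo.rpc.cluster.Directory;

public class MyCluster implements Cluster {
    @Override
    public <T> Invoker<T> join(Directory<T> directory) throws RpcException {
        // hand the directory of provider invokers to the custom cluster invoker
        return new MyClusterInvoker<>(directory);
    }
}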

In the code above, MyCluster implements the join method of the Cluster interface. Next, extend the AbstractClusterInvoker class to create your own ClusterInvoker class:

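import java.util.List;

import org.apache.dubbo.rpc.Invocation;
import org.apache.dubbo.rpc.Invoker;
import org.apache.dubbo.rpc.Result;
import org.apache.dubbo.rpc.RpcContext;
import org.apache.dubbo.rpc.RpcException;
import org.apache.dubbo.rpc.cluster.Directory;
import org.apache.dubbo.rpc.cluster.LoadBalance;
import org.apache.dubbo.rpc.cluster.support.AbstractClusterInvoker;

public class MyClusterInvoker<T> extends AbstractClusterInvoker<T> {
    public MyClusterInvoker(Directory<T> directory) {
        super(directory);
    }

    @Override
    protected Result doInvoke(Invocation invocation, List<Invoker<T>> invokers,
                              LoadBalance loadbalance) throws RpcException {
        checkInvokers(invokers, invocation);
        RpcContext.getServiceContext().setInvokers((List) invokers);
        RpcException exception = null;
        Result result = null;
        // implement your own cluster fault-tolerance strategy here
        return result;
    }
}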

As the code above shows, you must override the doInvoke method, and that method is where you implement your own cluster fault-tolerance strategy. Then, in the META-INF/dubbo directory, create a file named org.apache.dubbo.rpc.cluster.Cluster and add the following line to it: myCluster=org.apache.dubbo.demo.cluster.MyCluster.

Finally, switch the cluster fault-tolerance mode to myCluster as follows:

<dubbo:reference id="greetingService"
    interface="com.books.dubbo.demo.api.GreetingService" group="dubbo"
    cluster="myCluster"/>


Dubbo3 Source Code, Chapter 6: Cluster Fault-Tolerance Strategies