Explanation of Spring Cloud Nacos source code (7) - the core process of Nacos client service subscription mechanism

The core process of Nacos client service subscription mechanism

The service subscription mechanism of Nacos is often considered hard to understand, so let's analyze it in detail, starting with an overview of Nacos subscription.

Nacos subscription overview

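In one sentence, the subscription mechanism of Nacos works like this: through a scheduled task, the Nacos client pulls the instance list from the registration center every 6 seconds; when it finds that the instances have changed, it publishes a change event, and the subscribers handle it (updating the instances, changing the local cache).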

The following is the main flow of the subscription method. It involves a lot of content and intricate details, so here we focus on the core part.

[Figure: main flow of the subscription method (image-20211025182722346.png, image not available)]

Starting the scheduled task

Subscription is essentially a form of service discovery: the subscription method is executed during service discovery and triggers a scheduled task that pulls data from the server.

NacosNamingService exposes many overloaded subscribe methods. The overloads exist so that callers can pass fewer parameters, with Nacos supplying defaults for the rest. In the end, all of them call the following method:

@Override
public void subscribe(String serviceName, String groupName, List<String> clusters, EventListener listener)
    throws NacosException {
    if (null == listener) {
        return;
    }
    String clusterString = StringUtils.join(clusters, ",");
    changeNotifier.registerListener(groupName, serviceName, clusterString, listener);
    clientProxy.subscribe(serviceName, groupName, clusterString);
}
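
To make the role of the listener registration more concrete, here is a minimal JDK-only sketch of a listener registry. It is not the Nacos implementation: the class name `ListenerRegistrySketch`, the key format, and the use of `Consumer<List<String>>` in place of `EventListener` are all assumptions made for illustration. It only shows the idea that listeners are stored per service/group/clusters, that a null listener is ignored, and that a detected change is pushed to every registered listener.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Simplified stand-in for a change notifier; not the Nacos implementation.
final class ListenerRegistrySketch {
    private final Map<String, List<Consumer<List<String>>>> listeners = new ConcurrentHashMap<>();

    // Hypothetical key format, used only inside this sketch.
    static String key(String groupName, String serviceName, String clusters) {
        return groupName + "|" + serviceName + "|" + clusters;
    }

    void registerListener(String groupName, String serviceName, String clusters,
                          Consumer<List<String>> listener) {
        if (listener == null) {
            return; // mirrors the null check in subscribe()
        }
        listeners.computeIfAbsent(key(groupName, serviceName, clusters),
                k -> new CopyOnWriteArrayList<>()).add(listener);
    }

    boolean isSubscribed(String groupName, String serviceName, String clusters) {
        List<Consumer<List<String>>> registered = listeners.get(key(groupName, serviceName, clusters));
        return registered != null && !registered.isEmpty();
    }

    // Called when a pull detects that the instance list has changed.
    void publishChange(String groupName, String serviceName, String clusters, List<String> instances) {
        List<Consumer<List<String>>> registered =
                listeners.getOrDefault(key(groupName, serviceName, clusters), Collections.emptyList());
        for (Consumer<List<String>> listener : registered) {
            listener.accept(instances);
        }
    }

    public static void main(String[] args) {
        ListenerRegistrySketch notifier = new ListenerRegistrySketch();
        List<String> clusters = new ArrayList<>();
        clusters.add("c1");
        clusters.add("c2");
        String clusterString = String.join(",", clusters); // "c1,c2"
        notifier.registerListener("demo-group", "demo-service", clusterString,
                instances -> System.out.println("instances changed: " + instances));
        notifier.publishChange("demo-group", "demo-service", clusterString,
                Collections.singletonList("10.0.0.1:8080"));
    }
}
```

The `isSubscribed` check in this sketch corresponds to the kind of question the scheduled task asks later: is anyone still listening for this service?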

Let's look at the subscribe call first. It may look familiar: it is invoked on clientProxy, which is actually NamingClientProxyDelegate.subscribe(), the same method used earlier for service discovery. What it does here is query the service list, so both the query and the subscription go through the same method:

@Override
public ServiceInfo subscribe(String serviceName, String groupName, String clusters) throws NacosException {
    String serviceNameWithGroup = NamingUtils.getGroupedName(serviceName, groupName);
    String serviceKey = ServiceInfo.getKey(serviceNameWithGroup, clusters);
    // Schedule the UpdateTask
    serviceInfoUpdateService.scheduleUpdateIfAbsent(serviceName, groupName, clusters);
    // Get the ServiceInfo from the cache
    ServiceInfo result = serviceInfoHolder.getServiceInfoMap().get(serviceKey);
    if (null == result) {
        // Nothing cached: run the subscription logic over gRPC
        result = grpcClientProxy.subscribe(serviceName, groupName, clusters);
    }
    // Update the local ServiceInfo cache
    serviceInfoHolder.processServiceInfo(result);
    return result;
}

What deserves attention here is the task scheduling. scheduleUpdateIfAbsent builds the serviceKey, uses it to check whether a task has already been scheduled, and finally adds an UpdateTask; addTask is what actually starts the scheduled task:

public void scheduleUpdateIfAbsent(String serviceName, String groupName, String clusters) {
    String serviceKey = ServiceInfo.getKey(NamingUtils.getGroupedName(serviceName, groupName), clusters);
    if (futureMap.get(serviceKey) != null) {
        return;
    }
    synchronized (futureMap) {
        if (futureMap.get(serviceKey) != null) {
            return;
        }
        // Build the UpdateTask
        ScheduledFuture<?> future = addTask(new UpdateTask(serviceName, groupName, clusters));
        futureMap.put(serviceKey, future);
    }
}
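
The check-lock-check shape of scheduleUpdateIfAbsent can be tried in isolation. The following is a minimal sketch using only the JDK; the class name `ScheduleIfAbsentSketch`, the `demo-key` string, and the `tasksCreated` counter are made up for the demonstration, and the scheduled task is an empty placeholder rather than a real UpdateTask. It suggests why concurrent subscribers for the same key end up with a single scheduled task.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class ScheduleIfAbsentSketch {
    private final Map<String, ScheduledFuture<?>> futureMap = new ConcurrentHashMap<>();
    // Daemon thread so the sketch never keeps the JVM alive.
    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "sketch-updater");
                t.setDaemon(true);
                return t;
            });
    final AtomicInteger tasksCreated = new AtomicInteger();

    void scheduleUpdateIfAbsent(String serviceKey) {
        if (futureMap.get(serviceKey) != null) {
            return; // fast path: already scheduled, no lock needed
        }
        synchronized (futureMap) {
            if (futureMap.get(serviceKey) != null) {
                return; // another thread scheduled it while we waited for the lock
            }
            tasksCreated.incrementAndGet();
            // Placeholder task; a real client would schedule its update task here.
            ScheduledFuture<?> future = executor.schedule(() -> { }, 1000L, TimeUnit.MILLISECONDS);
            futureMap.put(serviceKey, future);
        }
    }

    // Calls scheduleUpdateIfAbsent from several threads and reports how many tasks were created.
    static int raceDemo(int threadCount) {
        ScheduleIfAbsentSketch sketch = new ScheduleIfAbsentSketch();
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> sketch.scheduleUpdateIfAbsent("demo-key"));
            threads[i].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        sketch.executor.shutdownNow();
        return sketch.tasksCreated.get();
    }

    public static void main(String[] args) {
        System.out.println("tasks created: " + raceDemo(8)); // tasks created: 1
    }
}
```

The first unsynchronized lookup keeps the common case (task already scheduled) cheap; the second lookup inside the lock is what prevents two threads from both creating a task.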

The scheduled task is executed after a delay of one second (DEFAULT_DELAY, in milliseconds):

private synchronized ScheduledFuture<?> addTask(UpdateTask task) {
    return executor.schedule(task, DEFAULT_DELAY, TimeUnit.MILLISECONDS);
}

So the core conclusion here is: calling the subscription method starts a scheduled task.
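
Note that addTask uses the one-shot `schedule` method rather than a fixed-rate schedule, so a task that should repeat has to submit itself again. A minimal sketch of that pattern with the JDK's `ScheduledExecutorService` follows. The class name `SelfReschedulingSketch` and the `CountDownLatch` that stops the demo after a few runs are assumptions added so the example terminates; they are not part of Nacos.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class SelfReschedulingSketch implements Runnable {
    private final ScheduledExecutorService executor;
    private final CountDownLatch remainingRuns; // sketch-only stop condition
    private final long delayMillis;

    SelfReschedulingSketch(ScheduledExecutorService executor, CountDownLatch remainingRuns,
                           long delayMillis) {
        this.executor = executor;
        this.remainingRuns = remainingRuns;
        this.delayMillis = delayMillis;
    }

    @Override
    public void run() {
        try {
            // A real task would pull the instance list and refresh the cache here.
        } finally {
            remainingRuns.countDown();
            // schedule() runs a task only once, so the task re-arms itself.
            if (remainingRuns.getCount() > 0) {
                executor.schedule(this, delayMillis, TimeUnit.MILLISECONDS);
            }
        }
    }

    // Runs the task `runs` times and reports whether all runs completed within 5 seconds.
    static boolean runDemo(int runs, long delayMillis) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch latch = new CountDownLatch(runs);
        executor.schedule(new SelfReschedulingSketch(executor, latch, delayMillis),
                delayMillis, TimeUnit.MILLISECONDS);
        try {
            return latch.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("completed 3 runs: " + runDemo(3, 50L));
    }
}
```

Rescheduling from a finally block means the next run is submitted even if the current run throws, and it lets each run choose its own delay, which a fixed-rate schedule would not allow.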

What the scheduled task does

UpdateTask encapsulates the core business logic of the subscription mechanism. Let's take a look at the flow chart:

[Figure: UpdateTask flow chart (image not available)]

With the overall process in mind, let's look at the corresponding source code:

@Override
public void run() {
    long delayTime = DEFAULT_DELAY;

    try {
        // If the service is not subscribed and no scheduled task is registered for it, log and return without updating
        if (!changeNotifier.isSubscribed(groupName, serviceName, clusters) && !futureMap.containsKey(serviceKey)) {
            NAMING_LOGGER
                .info("update task is stopped, service:{}, clusters:{}", groupedServiceName, clusters);
            return;
        }

        // Get the cached service info
        ServiceInfo serviceObj = serviceInfoHolder.getServiceInfoMap().get(serviceKey);
        // Nothing cached yet
        if (serviceObj == null) {
            // Fetch the service info from the registration center by serviceName
            serviceObj = namingClientProxy.queryInstancesOfService(serviceName, groupName, clusters, 0, false);
            // Update the local cache
            serviceInfoHolder.processServiceInfo(serviceObj);
            lastRefTime = serviceObj.getLastRefTime();
            return;
        }

        // Stale entry: the cached service's last refresh time is not newer than the time
        // recorded at the last pull, so query the registration center again
        if (serviceObj.getLastRefTime() <= lastRefTime) {
            serviceObj = namingClientProxy.queryInstancesOfService(serviceName, groupName, clusters, 0, false);
            // Update the local cache
            serviceInfoHolder.processServiceInfo(serviceObj);
        }
        // Record the latest refresh time
        lastRefTime = serviceObj.getLastRefTime();
        if (CollectionUtils.isEmpty(serviceObj.getHosts())) {
            incFailCount();
            return;
        }
        // Set the delay before the next cache update, 6 seconds by default
        // TODO multiple time can be configured.
        delayTime = serviceObj.getCacheMillis() * DEFAULT_UPDATE_CACHE_TIME_MULTIPLE;
        // Reset the failure count to 0 (failures can happen, e.g. no ServiceInfo or a failed connection)
        resetFailCount();
    } catch (Throwable e) {
        incFailCount();
        NAMING_LOGGER.warn("[NA] failed to update serviceName: {}", groupedServiceName, e);
    } finally {
        // Schedule the next run. The delay depends on failCount: with failCount = 0 it is 6 seconds,
        // and it is capped at 1 minute. In other words, without errors the cached instances
        // are refreshed every 6 seconds.
        executor.schedule(this, Math.min(delayTime << failCount, DEFAULT_DELAY * 60), TimeUnit.MILLISECONDS);
    }
}

At the end of the business logic, the delay before the next run is computed as delayTime. It defaults to 1000L * 6, that is, 6 seconds. The finally block is where the next scheduled task is actually submitted. When an exception occurs, the next delay depends on the number of failures (delayTime << failCount), but it never exceeds 1 minute (DEFAULT_DELAY * 60).
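
The delay expression from the finally block can be worked through by hand. The sketch below reuses that expression with assumed constants: `DEFAULT_DELAY = 1000L` follows from the one-second initial delay mentioned earlier, and a multiplier of 6 with a cacheMillis of 1000 ms is inferred from the "1000L * 6" default. The class and method names are hypothetical. It also assumes that on a failed run delayTime keeps its initial value of DEFAULT_DELAY, which is what the code above does when the failure happens before delayTime is recomputed.

```java
final class UpdateDelaySketch {
    // Assumed values, inferred from the text rather than read from the Nacos source.
    static final long DEFAULT_DELAY = 1000L;
    static final int DEFAULT_UPDATE_CACHE_TIME_MULTIPLE = 6;

    // Same expression as in the finally block: shift left by failCount, cap at 60 seconds.
    static long nextDelayMillis(long delayTime, int failCount) {
        return Math.min(delayTime << failCount, DEFAULT_DELAY * 60);
    }

    public static void main(String[] args) {
        // Successful run: delayTime = cacheMillis * multiplier, failCount reset to 0.
        long healthyDelay = 1000L * DEFAULT_UPDATE_CACHE_TIME_MULTIPLE;
        System.out.println("ok -> " + nextDelayMillis(healthyDelay, 0)); // ok -> 6000

        // Failed runs: delayTime stays at DEFAULT_DELAY and doubles with each failure.
        // Prints 2000, 4000, 8000, 16000, 32000, 60000, 60000 for failCount 1..7.
        for (int failCount = 1; failCount <= 7; failCount++) {
            System.out.println("failCount " + failCount + " -> "
                    + nextDelayMillis(DEFAULT_DELAY, failCount));
        }
    }
}
```

Under these assumptions the first retry after a failure comes sooner than a normal refresh (2 seconds instead of 6), and the delay only reaches the 1-minute cap from the sixth consecutive failure onward.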

Summary:

  1. Calling the subscription method registers the EventListener, which UpdateTask later uses to decide whether the service is still subscribed;

  2. The subscription logic is handled through the proxy class, using the same method that fetches the instance list;

  3. UpdateTask is executed as a scheduled task. The default interval is 6 seconds; when an exception occurs the interval is extended, but never beyond 1 minute;

  4. UpdateTask checks whether a local cache entry exists and whether it has expired. If it is missing or expired, it queries the registration center for the latest instances, updates the last refresh time, and processes the ServiceInfo;

  5. The next scheduled time is recalculated and the process repeats in a loop.
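
The cache check in step 4 boils down to a small decision, sketched below with plain JDK types. `RefreshDecisionSketch` and `shouldQueryServer` are hypothetical names; the timestamps stand in for the ServiceInfo lastRefTime of the cached entry and the lastRefTime remembered by the task.

```java
final class RefreshDecisionSketch {
    /**
     * @param cachedLastRefTime last refresh time of the cached entry, or null if nothing is cached
     * @param taskLastRefTime   refresh time the task recorded on its previous run
     * @return true if the task should query the registration center
     */
    static boolean shouldQueryServer(Long cachedLastRefTime, long taskLastRefTime) {
        if (cachedLastRefTime == null) {
            return true; // no local cache yet
        }
        // Not newer than what the task saw last time: treat as expired and pull again.
        return cachedLastRefTime <= taskLastRefTime;
    }

    public static void main(String[] args) {
        System.out.println(shouldQueryServer(null, 0L));   // true
        System.out.println(shouldQueryServer(100L, 100L)); // true
        System.out.println(shouldQueryServer(200L, 100L)); // false
    }
}
```

The last case is the one where the cached entry was refreshed by something other than this task since its previous run, so the task can skip the pull and only record the newer time.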

Origin blog.csdn.net/qq_27566167/article/details/129155640