nacos source code analysis - heartbeat detection (server)

Foreword

The earlier articles "nacos source code analysis - service registration (client)" and "nacos source code analysis - service registration (server)" covered the service registration process. This chapter looks at the service heartbeat detection mechanism.

Heartbeat renewal client

We already touched on heartbeats when discussing the Nacos service registration client. The service registration process is:

[Figure: service registration process]

The Nacos client starts sending heartbeats during service registration. Here is the source code of NacosNamingService#registerInstance again:

public void registerInstance(String serviceName, String groupName, Instance instance) throws NacosException {
    if (instance.isEphemeral()) {
        BeatInfo beatInfo = new BeatInfo();
        beatInfo.setServiceName(NamingUtils.getGroupedName(serviceName, groupName));
        beatInfo.setIp(instance.getIp());
        beatInfo.setPort(instance.getPort());
        beatInfo.setCluster(instance.getClusterName());
        beatInfo.setWeight(instance.getWeight());
        beatInfo.setMetadata(instance.getMetadata());
        beatInfo.setScheduled(false);
        beatInfo.setPeriod(instance.getInstanceHeartBeatInterval());

        // Add the heartbeat task
        this.beatReactor.addBeatInfo(NamingUtils.getGroupedName(serviceName, groupName), beatInfo);
    }

    this.serverProxy.registerService(NamingUtils.getGroupedName(serviceName, groupName), groupName, instance);
}

The logic is straightforward. For an ephemeral instance, the service IP, port, service name and other information are encapsulated into a BeatInfo object, and beatReactor.addBeatInfo adds the current service instance to the heartbeat mechanism (heartbeat renewal). The instance is then registered through serverProxy.registerService.

Let's look at how the heartbeat itself is implemented. The following is the source code of BeatReactor#addBeatInfo:

public void addBeatInfo(String serviceName, BeatInfo beatInfo) {
    LogUtils.NAMING_LOGGER.info("[BEAT] adding beat: {} to beat map.", beatInfo);
    String key = this.buildKey(serviceName, beatInfo.getIp(), beatInfo.getPort());
    BeatInfo existBeat = null;
    if ((existBeat = (BeatInfo)this.dom2Beat.remove(key)) != null) {
        existBeat.setStopped(true);
    }

    this.dom2Beat.put(key, beatInfo);
    // Schedule the first heartbeat on the thread pool; beatInfo.getPeriod() is the delay (5000 ms by default)
    this.executorService.schedule(new BeatTask(beatInfo), beatInfo.getPeriod(), TimeUnit.MILLISECONDS);
    MetricsMonitor.getDom2BeatSizeMonitor().set((double)this.dom2Beat.size());
}

// Heartbeat task
class BeatTask implements Runnable {
    BeatInfo beatInfo;

    public BeatTask(BeatInfo beatInfo) {
        this.beatInfo = beatInfo;
    }

    public void run() {
        if (!this.beatInfo.isStopped()) {
            long nextTime = this.beatInfo.getPeriod();

            try {
                // Send the heartbeat request and read the result
                JSONObject result = BeatReactor.this.serverProxy.sendBeat(this.beatInfo, BeatReactor.this.lightBeatEnabled);
                long interval = (long)result.getIntValue("clientBeatInterval");
                boolean lightBeatEnabled = false;
                if (result.containsKey("lightBeatEnabled")) {
                    lightBeatEnabled = result.getBooleanValue("lightBeatEnabled");
                }

                BeatReactor.this.lightBeatEnabled = lightBeatEnabled;
                if (interval > 0L) {
                    nextTime = interval;
                }

                int code = 10200;
                if (result.containsKey("code")) {
                    code = result.getIntValue("code");
                }

                if (code == 20404) {
                    // The instance does not exist on the server, so create it
                    Instance instance = new Instance();
                    instance.setPort(this.beatInfo.getPort());
                    instance.setIp(this.beatInfo.getIp());
                    instance.setWeight(this.beatInfo.getWeight());
                    instance.setMetadata(this.beatInfo.getMetadata());
                    instance.setClusterName(this.beatInfo.getCluster());
                    instance.setServiceName(this.beatInfo.getServiceName());
                    instance.setInstanceId(instance.getInstanceId());
                    instance.setEphemeral(true);

                    try {
                        // Register the service
                        BeatReactor.this.serverProxy.registerService(this.beatInfo.getServiceName(), NamingUtils.getGroupName(this.beatInfo.getServiceName()), instance);
                    } catch (Exception var10) {
                    }
                }
            } catch (NacosException var11) {
                LogUtils.NAMING_LOGGER.error("[CLIENT-BEAT] failed to send beat: {}, code: {}, msg: {}", new Object[]{JSON.toJSONString(this.beatInfo), var11.getErrCode(), var11.getErrMsg()});
            }

            // Schedule the next heartbeat task (every 5 s by default)
            BeatReactor.this.executorService.schedule(BeatReactor.this.new BeatTask(this.beatInfo), nextTime, TimeUnit.MILLISECONDS);
        }
    }
}

Like Eureka, Nacos implements the heartbeat with a ScheduledExecutorService thread pool, and by default a heartbeat is sent every 5 seconds.

  • BeatInfo: the heartbeat renewal object, holding the service IP, port, service name, weight, etc.
  • executorService.schedule: schedules the task once; beatInfo.getPeriod() is the delay (5000 milliseconds by default) before a heartbeat renewal request is sent to NacosServer. The task schedules itself again at the end of each run, which is what makes the heartbeat periodic
  • BeatTask: a Runnable whose run method calls BeatReactor.this.serverProxy.sendBeat to send the heartbeat request

In its run method, BeatTask sends the heartbeat through BeatReactor.this.serverProxy.sendBeat. If the server answers with code 20404, meaning the instance is not registered, it registers the service again through BeatReactor.this.serverProxy.registerService.
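The self-rescheduling pattern described above can be sketched in isolation. This is a minimal illustration, not the Nacos implementation: the class SelfReschedulingBeat, its methods and the LongSupplier standing in for sendBeat are all hypothetical names, and the "server" here never suggests a new interval.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/** Hypothetical stand-in for BeatReactor: a one-shot task that re-schedules itself. */
final class SelfReschedulingBeat {
    private final ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
    private volatile boolean stopped;

    /** sendBeat sends one beat and returns the interval suggested by the server (<= 0: none). */
    void start(long defaultPeriodMs, LongSupplier sendBeat) {
        schedule(defaultPeriodMs, defaultPeriodMs, sendBeat);
    }

    private void schedule(long delayMs, long defaultPeriodMs, LongSupplier sendBeat) {
        try {
            executor.schedule(() -> runOnce(defaultPeriodMs, sendBeat), delayMs, TimeUnit.MILLISECONDS);
        } catch (RejectedExecutionException e) {
            // The executor was shut down concurrently; nothing more to schedule.
        }
    }

    private void runOnce(long defaultPeriodMs, LongSupplier sendBeat) {
        if (stopped) {
            return;
        }
        long nextDelay = defaultPeriodMs;
        try {
            long suggested = sendBeat.getAsLong();
            if (suggested > 0) {
                nextDelay = suggested; // the server may override the interval
            }
        } catch (RuntimeException e) {
            // A failed beat keeps the default delay so the chain is not broken.
        }
        schedule(nextDelay, defaultPeriodMs, sendBeat);
    }

    void stop() {
        stopped = true;
        executor.shutdownNow();
    }

    /** Demo helper: true if at least {@code wanted} beats fire within the timeout. */
    static boolean beatsAtLeast(int wanted, long periodMs, long timeoutMs) {
        CountDownLatch latch = new CountDownLatch(wanted);
        SelfReschedulingBeat beat = new SelfReschedulingBeat();
        beat.start(periodMs, () -> {
            latch.countDown();
            return 0L; // no interval suggested by the "server"
        });
        try {
            return latch.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            beat.stop();
        }
    }
}
```

Scheduling one run at a time, rather than using a fixed-rate schedule, is what lets each response change the delay before the next beat.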

The following is the com.alibaba.nacos.client.naming.net.NamingProxy#sendBeat method, which sends the heartbeat:

public JSONObject sendBeat(BeatInfo beatInfo, boolean lightBeatEnabled) throws NacosException {
    if (LogUtils.NAMING_LOGGER.isDebugEnabled()) {
        LogUtils.NAMING_LOGGER.debug("[BEAT] {} sending beat to server: {}", this.namespaceId, beatInfo.toString());
    }

    Map<String, String> params = new HashMap(8);
    String body = "";
    if (!lightBeatEnabled) {
        try {
            body = "beat=" + URLEncoder.encode(JSON.toJSONString(beatInfo), "UTF-8");
        } catch (UnsupportedEncodingException var6) {
            throw new NacosException(500, "encode beatInfo error", var6);
        }
    }

    params.put("namespaceId", this.namespaceId);
    params.put("serviceName", beatInfo.getServiceName());
    params.put("clusterName", beatInfo.getCluster());
    params.put("ip", beatInfo.getIp());
    params.put("port", String.valueOf(beatInfo.getPort()));
    String result = this.reqAPI(UtilAndComs.NACOS_URL_BASE + "/instance/beat", params, body, "PUT");
    return JSON.parseObject(result);
}

This method builds the heartbeat URL, 127.0.0.1:8848/nacos/v1/ns/instance/beat, with the following parameters: namespaceId (namespace ID), serviceName (service name), clusterName (cluster name), ip (service IP) and port. Unless light beats are enabled, the full BeatInfo is also sent in the body as URL-encoded JSON. The request is a PUT. Underneath, one of the NacosServers is selected at random and the request is executed by httpClient.
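To make the shape of the request concrete, here is a small sketch that only builds the parameters and the body, without sending anything. BeatRequestSketch and its methods are hypothetical names, and the JSON string is passed in ready-made instead of being serialized from a BeatInfo.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that mirrors how sendBeat assembles its request parts. */
final class BeatRequestSketch {

    /** Query parameters sent with every beat, as listed in the text. */
    static Map<String, String> params(String namespaceId, String serviceName,
                                      String clusterName, String ip, int port) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("namespaceId", namespaceId);
        params.put("serviceName", serviceName);
        params.put("clusterName", clusterName);
        params.put("ip", ip);
        params.put("port", String.valueOf(port));
        return params;
    }

    /** Body: empty for a light beat, otherwise "beat=" plus the URL-encoded JSON. */
    static String body(String beatJson, boolean lightBeatEnabled) {
        if (lightBeatEnabled) {
            return "";
        }
        try {
            return "beat=" + URLEncoder.encode(beatJson, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java platform must support.
            throw new IllegalStateException("UTF-8 not available", e);
        }
    }
}
```

With light beats enabled the body is empty, so the server can identify the instance only from the ip, port and clusterName parameters.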

Heartbeat renewal server

The server side is again in InstanceController, which provides a beat method. Besides how it handles the heartbeat request, we also need to look at how it checks for expired heartbeats. The source code is as follows:

/**
 * Create a beat for instance.
 * Heartbeat detection.
 *
 * @param request http request
 * @return detail information of instance
 * @throws Exception any error during handle
 */
@CanDistro
@PutMapping("/beat")
@Secured(parser = NamingResourceParser.class, action = ActionTypes.WRITE)
public ObjectNode beat(HttpServletRequest request) throws Exception {
    // Client heartbeat interval: once every 5 s
    ObjectNode result = JacksonUtils.createEmptyJsonNode();
    result.put(SwitchEntry.CLIENT_BEAT_INTERVAL, switchDomain.getClientBeatInterval());
    // Read the beat data from the request and convert it to a clientBeat object
    String beat = WebUtils.optional(request, "beat", StringUtils.EMPTY);
    RsInfo clientBeat = null;
    if (StringUtils.isNotBlank(beat)) {
        clientBeat = JacksonUtils.toObj(beat, RsInfo.class);
    }
    // Cluster name
    String clusterName = WebUtils
            .optional(request, CommonParams.CLUSTER_NAME, UtilsAndCommons.DEFAULT_CLUSTER_NAME);
    // Client IP and port
    String ip = WebUtils.optional(request, "ip", StringUtils.EMPTY);
    int port = Integer.parseInt(WebUtils.optional(request, "port", "0"));
    if (clientBeat != null) {
        if (StringUtils.isNotBlank(clientBeat.getCluster())) {
            clusterName = clientBeat.getCluster();
        } else {
            // fix #2533
            clientBeat.setCluster(clusterName);
        }
        ip = clientBeat.getIp();
        port = clientBeat.getPort();
    }
    // Namespace ID and service name
    String namespaceId = WebUtils.optional(request, CommonParams.NAMESPACE_ID, Constants.DEFAULT_NAMESPACE_ID);
    String serviceName = WebUtils.required(request, CommonParams.SERVICE_NAME);
    // Validate the service name
    NamingUtils.checkServiceNameFormat(serviceName);
    Loggers.SRV_LOG.debug("[CLIENT-BEAT] full arguments: beat: {}, serviceName: {}", clientBeat, serviceName);
    // Look up the service instance in the service registry
    Instance instance = serviceManager.getInstance(namespaceId, serviceName, clusterName, ip, port);
    // If the lookup fails, the instance is not registered yet
    if (instance == null) {
        if (clientBeat == null) {
            // No beat payload in the request: return RESOURCE_NOT_FOUND
            result.put(CommonParams.CODE, NamingResponseCode.RESOURCE_NOT_FOUND);
            return result;
        }

        Loggers.SRV_LOG.warn("[CLIENT-BEAT] The instance has been removed for health mechanism, "
                + "perform data compensation operations, beat: {}, serviceName: {}", clientBeat, serviceName);
        // Create an instance
        instance = new Instance();
        instance.setPort(clientBeat.getPort());
        instance.setIp(clientBeat.getIp());
        instance.setWeight(clientBeat.getWeight());
        instance.setMetadata(clientBeat.getMetadata());
        instance.setClusterName(clusterName);
        instance.setServiceName(serviceName);
        instance.setInstanceId(instance.getInstanceId());
        instance.setEphemeral(clientBeat.isEphemeral());
        // Register the instance
        serviceManager.registerInstance(namespaceId, serviceName, instance);
    }
    // Get the service
    Service service = serviceManager.getService(namespaceId, serviceName);

    if (service == null) {
        // The service does not exist
        throw new NacosException(NacosException.SERVER_ERROR,
                "service not found: " + serviceName + "@" + namespaceId);
    }
    if (clientBeat == null) {
        clientBeat = new RsInfo();
        clientBeat.setIp(ip);
        clientBeat.setPort(port);
        clientBeat.setCluster(clusterName);
    }
    // Process the heartbeat request
    service.processClientBeat(clientBeat);

    result.put(CommonParams.CODE, NamingResponseCode.OK);
    if (instance.containsMetadata(PreservedMetadataKeys.HEART_BEAT_INTERVAL)) {
        result.put(SwitchEntry.CLIENT_BEAT_INTERVAL, instance.getInstanceHeartBeatInterval());
    }
    result.put(SwitchEntry.LIGHT_BEAT_ENABLED, switchDomain.isLightBeatEnabled());
    return result;
}

The general logic of the method is as follows:

  • Read the heartbeat request parameters (beat), including the IP, port, service name, namespace, etc. of the client service
  • Look up the service instance for the current heartbeat request in the server-side service registry through serviceManager
  • If the instance is missing and the request carries no beat payload, return RESOURCE_NOT_FOUND; if a payload is present, create a new instance and register it through serviceManager
  • Get the Service object of the current service and call service.processClientBeat to process the heartbeat
  • Finally return OK
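The decision made by the beat endpoint can be reduced to a few lines. The sketch below is a simplified model under stated assumptions: BeatHandlerSketch is a hypothetical class, the registry is a plain in-memory map keyed by "ip:port", and the numeric codes are the ones the client code shown earlier compares against (10200 and 20404).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical, in-memory model of the server-side beat decision. */
final class BeatHandlerSketch {
    static final int OK = 10200;                 // value checked by the client code above
    static final int RESOURCE_NOT_FOUND = 20404; // value checked by the client code above

    /** "ip:port" -> last beat timestamp; stands in for the real service registry. */
    private final Map<String, Long> registry = new ConcurrentHashMap<>();

    /**
     * @param hasBeatBody false for a light beat that carries only ip and port
     * @return the response code the client would receive
     */
    int handleBeat(String ip, int port, boolean hasBeatBody, long nowMs) {
        String key = ip + ":" + port;
        if (!registry.containsKey(key) && !hasBeatBody) {
            // Unknown instance and no payload to rebuild it from:
            // the client reacts to this code by registering again.
            return RESOURCE_NOT_FOUND;
        }
        // Either the instance exists, or the full beat lets us re-create it
        // (the "data compensation" branch). In both cases record the beat.
        registry.put(key, nowMs);
        return OK;
    }

    boolean isRegistered(String ip, int port) {
        return registry.containsKey(ip + ":" + port);
    }
}
```

A light beat for an unknown instance is the only case that fails; a full beat repairs the registry on the spot.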

The following is the source code of the Service#processClientBeat method, together with HealthCheckReactor#scheduleNow, which it calls:

public void processClientBeat(final RsInfo rsInfo) {
    // Heartbeat processor, a Runnable
    ClientBeatProcessor clientBeatProcessor = new ClientBeatProcessor();
    clientBeatProcessor.setService(this);
    clientBeatProcessor.setRsInfo(rsInfo);
    // Schedule a task with no delay; in effect the heartbeat renewal
    // logic is handled asynchronously on another thread
    HealthCheckReactor.scheduleNow(clientBeatProcessor);
}

/**
 * Schedule client beat check task without a delay.
 *
 * @param task health check task
 * @return scheduled future
 */
public static ScheduledFuture<?> scheduleNow(Runnable task) {
    return GlobalExecutor.scheduleNamingHealth(task, 0, TimeUnit.MILLISECONDS);
}

As shown above, the heartbeat is processed by ClientBeatProcessor, a Runnable that is submitted as a scheduled task with zero delay:

public class ClientBeatProcessor implements Runnable {

    public static final long CLIENT_BEAT_TIMEOUT = TimeUnit.SECONDS.toMillis(15);

    private RsInfo rsInfo;

    private Service service;

    @JsonIgnore
    public PushService getPushService() {
        return ApplicationUtils.getBean(PushService.class);
    }

    public RsInfo getRsInfo() {
        return rsInfo;
    }

    public void setRsInfo(RsInfo rsInfo) {
        this.rsInfo = rsInfo;
    }

    public Service getService() {
        return service;
    }

    public void setService(Service service) {
        this.service = service;
    }

    @Override
    public void run() {
        // The service being renewed
        Service service = this.service;
        if (Loggers.EVT_LOG.isDebugEnabled()) {
            Loggers.EVT_LOG.debug("[CLIENT-BEAT] processing beat: {}", rsInfo.toString());
        }
        // IP, port, cluster name
        String ip = rsInfo.getIp();
        String clusterName = rsInfo.getCluster();
        int port = rsInfo.getPort();
        // The Cluster object of the service
        Cluster cluster = service.getClusterMap().get(clusterName);
        // All instances of the cluster
        List<Instance> instances = cluster.allIPs(true);

        for (Instance instance : instances) {
            // Find the instance that sent the heartbeat by comparing IP and port
            if (instance.getIp().equals(ip) && instance.getPort() == port) {
                if (Loggers.EVT_LOG.isDebugEnabled()) {
                    Loggers.EVT_LOG.debug("[CLIENT-BEAT] refresh beat: {}", rsInfo.toString());
                }
                // Record the time of the last heartbeat (important)
                instance.setLastBeat(System.currentTimeMillis());
                if (!instance.isMarked() && !instance.isHealthy()) {
                    // Set the health status to true
                    instance.setHealthy(true);
                    Loggers.EVT_LOG
                            .info("service: {} {POS} {IP-ENABLED} valid: {}:{}@{}, region: {}, msg: client beat ok",
                                    cluster.getService().getName(), ip, port, cluster.getName(),
                                    UtilsAndCommons.LOCALHOST_SITE);
                    // Publish a change event: ServiceChangeEvent.
                    // PushService publishes it and pushes the update to all clients over UDP
                    getPushService().serviceChanged(service);
                }
            }
        }
    }
}

The run method finds the instance that sent the heartbeat among the instances of the service, then updates its last heartbeat time and, if necessary, its health status:

  • instance.setLastBeat(System.currentTimeMillis()); : sets the last renewal time to the current system time
  • instance.setHealthy(true); : sets the health status to true when the instance was previously unhealthy and is not marked
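The two updates above amount to a very small state transition. The following sketch isolates it; BeatRenewalSketch and its Inst record are hypothetical, trimmed-down stand-ins, and the boolean return value is an assumption used here to signal "clients should be notified".

```java
/** Hypothetical model of what one heartbeat does to an instance record. */
final class BeatRenewalSketch {

    /** Trimmed-down instance state; the real Instance class has many more fields. */
    static final class Inst {
        long lastBeat;
        boolean healthy;
        boolean marked;
    }

    /**
     * Applies one heartbeat.
     *
     * @return true when the health state flipped to healthy, which is the
     *         case in which a change notification would be pushed
     */
    static boolean renew(Inst inst, long nowMs) {
        inst.lastBeat = nowMs;               // always refresh the renewal time
        if (!inst.marked && !inst.healthy) { // same guard as in ClientBeatProcessor#run
            inst.healthy = true;
            return true;
        }
        return false;
    }
}
```

Only the transition from unhealthy to healthy triggers a push, so steady-state heartbeats cost one timestamp write and nothing more.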

Heartbeat timeout detection

Instances in Nacos are divided into temporary instances and permanent instances. A temporary instance is removed by the registry once its heartbeat renewal times out, whereas a permanent instance is not. For permanent instances (ephemeral=false), Nacos uses active health checks: it sends requests to the instance periodically and judges its health from the response.

The above covers only the processing of heartbeat renewals. The entry point for heartbeat expiration detection is ServiceManager#registerInstance, the service registration method. It calls ServiceManager#putServiceAndInit(service) to initialize the service, and that method in turn calls Service#init to start the heartbeat check. This happens as soon as the service has been registered successfully.

// ServiceManager#putServiceAndInit: service initialization
private void putServiceAndInit(Service service) throws NacosException {
    putService(service);
    service = getService(service.getNamespaceId(), service.getName());
    // Initialize the service; this is the entry point of the heartbeat check
    service.init();
    consistencyService
            .listen(KeyBuilder.buildInstanceListKey(service.getNamespaceId(), service.getName(), true), service);
    consistencyService
            .listen(KeyBuilder.buildInstanceListKey(service.getNamespaceId(), service.getName(), false), service);
    Loggers.SRV_LOG.info("[NEW-SERVICE] {}", service.toJson());
}

The following is the Service#init() method, followed by HealthCheckReactor#scheduleCheck:

@JsonInclude(Include.NON_NULL)
public class Service extends com.alibaba.nacos.api.naming.pojo.Service implements Record, RecordListener<Instances> {

    public void init() {
        // Heartbeat check: schedules the check task for temporary instances
        HealthCheckReactor.scheduleCheck(clientBeatCheckTask);
        // Iterate over the cluster map and initialize each cluster
        for (Map.Entry<String, Cluster> entry : clusterMap.entrySet()) {
            entry.getValue().setService(this);
            // Initialize the checks for permanent instances by calling Cluster.init()
            entry.getValue().init();
        }
    }
}

// HealthCheckReactor#scheduleCheck: periodic heartbeat timeout check, once every 5 s
public static void scheduleCheck(ClientBeatCheckTask task) {
    futureMap.computeIfAbsent(task.taskKey(),
            k -> GlobalExecutor.scheduleNamingHealth(task, 5000, 5000, TimeUnit.MILLISECONDS));
}

For temporary instances, the heartbeat check runs as a scheduled task every 5 s and is carried out by the ClientBeatCheckTask Runnable.
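The computeIfAbsent call in scheduleCheck is what guarantees that a service gets only one periodic check, however often init is reached. A minimal sketch of that idea follows. CheckSchedulerSketch is a hypothetical class, and using scheduleWithFixedDelay is an assumption, since the article does not show what GlobalExecutor.scheduleNamingHealth does internally.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical scheduler that registers at most one periodic task per key. */
final class CheckSchedulerSketch {
    private final ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
    private final Map<String, ScheduledFuture<?>> futureMap = new ConcurrentHashMap<>();

    /** Schedules the task only if no task is registered under taskKey yet. */
    ScheduledFuture<?> scheduleCheck(String taskKey, Runnable task, long periodMs) {
        // The mapping function runs only when the key is absent, so a second
        // call with the same key returns the existing future unchanged.
        return futureMap.computeIfAbsent(taskKey,
                k -> executor.scheduleWithFixedDelay(task, periodMs, periodMs, TimeUnit.MILLISECONDS));
    }

    int scheduledCount() {
        return futureMap.size();
    }

    void shutdown() {
        executor.shutdownNow();
    }
}
```

Keeping the future in the map also gives the caller a handle for cancelling the check later, for example when the service is removed.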

// Client heartbeat check
public class ClientBeatCheckTask implements Runnable {

    @Override
    public void run() {
        try {
            if (!getDistroMapper().responsible(service.getName())) {
                return;
            }

            if (!getSwitchDomain().isHealthCheckEnabled()) {
                return;
            }
            // Get all instances from the registry
            List<Instance> instances = service.allIPs(true);

            // first set health status of instances:
            for (Instance instance : instances) {
                // Heartbeat timed out if: system time - last heartbeat time > timeout
                if (System.currentTimeMillis() - instance.getLastBeat() > instance.getInstanceHeartBeatTimeOut()) {
                    if (!instance.isMarked()) {
                        // If the instance is healthy, mark it unhealthy
                        if (instance.isHealthy()) {
                            instance.setHealthy(false);
                            Loggers.EVT_LOG
                                    .info("{POS} {IP-DISABLED} valid: {}:{}@{}@{}, region: {}, msg: client timeout after {}, last beat: {}",
                                            instance.getIp(), instance.getPort(), instance.getClusterName(),
                                            service.getName(), UtilsAndCommons.LOCALHOST_SITE,
                                            instance.getInstanceHeartBeatTimeOut(), instance.getLastBeat());
                            // Publish the service change event
                            getPushService().serviceChanged(service);
                            // Publish the heartbeat timeout event
                            ApplicationUtils.publishEvent(new InstanceHeartbeatTimeoutEvent(this, instance));
                        }
                    }
                }
            }

            if (!getGlobalConfig().isExpireInstance()) {
                return;
            }

            // then remove obsolete instances:
            for (Instance instance : instances) {
                // Skip instances whose marked flag is set
                if (instance.isMarked()) {
                    continue;
                }
                // Remove the instance once the last heartbeat is older than the delete timeout (30 s)
                if (System.currentTimeMillis() - instance.getLastBeat() > instance.getIpDeleteTimeout()) {
                    // delete instance
                    Loggers.SRV_LOG.info("[AUTO-DELETE-IP] service: {}, ip: {}", service.getName(),
                            JacksonUtils.toJson(instance));
                    // Remove the instance
                    deleteIp(instance);
                }
            }

        } catch (Exception e) {
            Loggers.SRV_LOG.warn("Exception while processing client beat time out.", e);
        }

    }
}
    

The method does the following:

  • Gets all temporary service instances from the registry
  • Decides whether a heartbeat has timed out with the test: system time - last heartbeat time > timeout. The default heartbeat timeout is 15 s
  • Sets the health status of a timed-out instance to false, then publishes the service change event ServiceChangeEvent and the heartbeat timeout event InstanceHeartbeatTimeoutEvent. As a result, the Nacos console shows the instance's health status as false
  • Finally, deletes the instance if its last heartbeat is more than 30 s old
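The two-pass check in the list above can be tried out with synthetic timestamps. This is a sketch only: BeatTimeoutSketch and Inst are hypothetical names, the 15 s and 30 s thresholds are the defaults quoted in the text, and events, the marked flag and the expire switch are left out.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical model of the two-pass heartbeat timeout check. */
final class BeatTimeoutSketch {
    static final long HEARTBEAT_TIMEOUT_MS = 15_000; // default quoted in the text
    static final long DELETE_TIMEOUT_MS = 30_000;    // default quoted in the text

    static final class Inst {
        final String name;
        final long lastBeat;
        boolean healthy = true;

        Inst(String name, long lastBeat) {
            this.name = name;
            this.lastBeat = lastBeat;
        }
    }

    /** Runs one check at time nowMs; returns the names of the removed instances. */
    static List<String> check(List<Inst> instances, long nowMs) {
        // Pass 1: mark instances unhealthy once the heartbeat timeout is exceeded.
        for (Inst inst : instances) {
            if (nowMs - inst.lastBeat > HEARTBEAT_TIMEOUT_MS && inst.healthy) {
                inst.healthy = false;
            }
        }
        // Pass 2: remove instances whose last beat is older than the delete timeout.
        List<String> removed = new ArrayList<>();
        for (Iterator<Inst> it = instances.iterator(); it.hasNext();) {
            Inst inst = it.next();
            if (nowMs - inst.lastBeat > DELETE_TIMEOUT_MS) {
                removed.add(inst.name);
                it.remove();
            }
        }
        return removed;
    }
}
```

An instance therefore spends the window between 15 s and 30 s registered but unhealthy, which gives a client with a brief network problem time to recover before it is dropped.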

The service change event is published through PushService#serviceChanged, which pushes the current service status to all clients over UDP.

Permanent instance checks

The following is the source code of com.alibaba.nacos.naming.core.Cluster#init method

public synchronized void init() {
    if (inited) {
        return;
    }
    checkTask = new HealthCheckTask(this);
    // Start the periodic health check for permanent instances
    HealthCheckReactor.scheduleCheck(checkTask);
    inited = true;
}

// HealthCheckReactor#scheduleCheck
public static ScheduledFuture<?> scheduleCheck(HealthCheckTask task) {
    task.setStartTime(System.currentTimeMillis());
    // Schedule the health check task
    return GlobalExecutor.scheduleNamingHealth(task, task.getCheckRtNormalized(), TimeUnit.MILLISECONDS);
}

Here the health check of permanent instances is handled by HealthCheckTask, which runs as a scheduled task. The following is the source code of HealthCheckTask:

// Computes the delay of the scheduled task
private void initCheckRT() {
    // first check time delay: the delay before the active check runs,
    // 2000 ms plus a random value below getTcpHealthParams().getMax() (5000 ms)
    checkRtNormalized =
            2000 + RandomUtils.nextInt(0, RandomUtils.nextInt(0, switchDomain.getTcpHealthParams().getMax()));
    checkRtBest = Long.MAX_VALUE;
    checkRtWorst = 0L;
}

@Override
public void run() {

    try {
        if (distroMapper.responsible(cluster.getService().getName()) && switchDomain
                .isHealthCheckEnabled(cluster.getService().getName())) {
            // Run the check logic; handled by TcpSuperSenseProcessor, i.e. in TCP mode
            healthCheckProcessor.process(this);
            if (Loggers.EVT_LOG.isDebugEnabled()) {
                Loggers.EVT_LOG
                        .debug("[HEALTH-CHECK] schedule health check task: {}", cluster.getService().getName());
            }
        }
    } catch (Throwable e) {
        Loggers.SRV_LOG
                .error("[HEALTH-CHECK] error while process health check for {}:{}", cluster.getService().getName(),
                        cluster.getName(), e);
    } finally {
        ...
    }
}

The call healthCheckProcessor.process(this); performs the health check. The implementation class used is TcpSuperSenseProcessor, which is itself a Runnable. Its source code is as follows:

@Override
public void process(HealthCheckTask task) {
    // Get all non-temporary (ephemeral=false) instances of the cluster
    List<Instance> ips = task.getCluster().allIPs(false);

    if (CollectionUtils.isEmpty(ips)) {
        return;
    }

    for (Instance ip : ips) {

        ...
        Beat beat = new Beat(ip, task);
        // Add to the LinkedBlockingQueue; every health check task goes into this one blocking queue
        taskQueue.add(beat);
        MetricsMonitor.getTcpHealthCheckMonitor().incrementAndGet();
    }
}

// Process the queued tasks
private void processTask() throws Exception {
    Collection<Callable<Void>> tasks = new LinkedList<>();
    do {
        Beat beat = taskQueue.poll(CONNECT_TIMEOUT_MS / 2, TimeUnit.MILLISECONDS);
        if (beat == null) {
            return;
        }
        // Wrap the task in a TaskProcessor
        tasks.add(new TaskProcessor(beat));
    } while (taskQueue.size() > 0 && tasks.size() < NIO_THREAD_COUNT * 64);
    // Execute all tasks as a batch
    for (Future<?> f : GlobalExecutor.invokeAllTcpSuperSenseTask(tasks)) {
        f.get();
    }
}

@Override
public void run() {
    // Loop forever, taking Beat tasks from the queue and executing them
    while (true) {
        try {
            // Execute the tasks
            processTask();

            int readyCount = selector.selectNow();
            if (readyCount <= 0) {
                continue;
            }

            Iterator<SelectionKey> iter = selector.selectedKeys().iterator();
            while (iter.hasNext()) {
                SelectionKey key = iter.next();
                iter.remove();

                GlobalExecutor.executeTcpSuperSense(new PostProcessor(key));
            }
        } catch (Throwable e) {
            SRV_LOG.error("[HEALTH-CHECK] error while processing NIO task", e);
        }
    }
}
    

At this point the picture is fairly clear: healthCheckProcessor probes the instance over TCP, and underneath the Beat tasks are stored in a LinkedBlockingQueue. TcpSuperSenseProcessor is itself a Runnable that keeps taking Beat tasks from the queue and wraps them in TaskProcessor objects for batch execution. The following is the source code of TaskProcessor:
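The queue-and-batch pattern just described can be sketched independently of Nacos. BatchDrainSketch is a hypothetical class; plain Runnables stand in for Beat tasks, and an ordinary ExecutorService stands in for GlobalExecutor. Unlike processTask, this sketch still runs whatever it has collected when a poll times out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of draining a blocking queue in bounded batches. */
final class BatchDrainSketch {

    /**
     * Takes up to maxBatch jobs from the queue, waiting at most waitMs for each,
     * runs them on the pool and waits for all of them.
     *
     * @return the number of jobs executed in this round
     */
    static int drainOnce(BlockingQueue<Runnable> queue, int maxBatch, long waitMs, ExecutorService pool) {
        List<Callable<Void>> batch = new ArrayList<>();
        try {
            do {
                Runnable job = queue.poll(waitMs, TimeUnit.MILLISECONDS);
                if (job == null) {
                    break; // queue stayed empty for waitMs
                }
                batch.add(() -> {
                    job.run();
                    return null;
                });
            } while (!queue.isEmpty() && batch.size() < maxBatch);

            // invokeAll blocks until every job of the batch has finished.
            for (Future<Void> f : pool.invokeAll(batch)) {
                f.get(); // surfaces an exception thrown by a job
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (ExecutionException e) {
            throw new IllegalStateException("job failed", e.getCause());
        }
        return batch.size();
    }
}
```

Bounding the batch size keeps one round short, so the loop in run gets back to the selector regularly even when the queue is long.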

private class TaskProcessor implements Callable<Void> {

    private static final int MAX_WAIT_TIME_MILLISECONDS = 500;

    Beat beat;

    public TaskProcessor(Beat beat) {
        this.beat = beat;
    }

    @Override
    public Void call() {
        long waited = System.currentTimeMillis() - beat.getStartTime();
        if (waited > MAX_WAIT_TIME_MILLISECONDS) {
            Loggers.SRV_LOG.warn("beat task waited too long: " + waited + "ms");
        }

        SocketChannel channel = null;
        try {
            Instance instance = beat.getIp();

            BeatKey beatKey = keyMap.get(beat.toString());
            if (beatKey != null && beatKey.key.isValid()) {
                if (System.currentTimeMillis() - beatKey.birthTime < TCP_KEEP_ALIVE_MILLIS) {
                    instance.setBeingChecked(false);
                    return null;
                }

                beatKey.key.cancel();
                beatKey.key.channel().close();
            }

            channel = SocketChannel.open();
            channel.configureBlocking(false);
            // only by setting this can we make the socket close event asynchronous
            channel.socket().setSoLinger(false, -1);
            channel.socket().setReuseAddress(true);
            channel.socket().setKeepAlive(true);
            channel.socket().setTcpNoDelay(true);

            Cluster cluster = beat.getTask().getCluster();
            int port = cluster.isUseIPPort4Check() ? instance.getPort() : cluster.getDefCkport();
            channel.connect(new InetSocketAddress(instance.getIp(), port));

            SelectionKey key = channel.register(selector, SelectionKey.OP_CONNECT | SelectionKey.OP_READ);
            key.attach(beat);
            keyMap.put(beat.toString(), new BeatKey(key));

            beat.setStartTime(System.currentTimeMillis());

            GlobalExecutor
                    .scheduleTcpSuperSenseTask(new TimeOutTask(key), CONNECT_TIMEOUT_MS, TimeUnit.MILLISECONDS);
        } catch (Exception e) {
            beat.finishCheck(false, false, switchDomain.getTcpHealthParams().getMax(),
                    "tcp:error:" + e.getMessage());

            if (channel != null) {
                try {
                    channel.close();
                } catch (Exception ignore) {
                }
            }
        }

        return null;
    }
}

As the code shows, TaskProcessor is a Callable that opens a TCP connection to the instance through NIO.
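For readers who have not used non-blocking connects before, here is a much smaller, synchronous sketch of the same idea: open a non-blocking SocketChannel, wait for OP_CONNECT on a Selector, then finish the connect. TcpProbeSketch and its methods are hypothetical, and the probe uses one private selector per call instead of the shared selector, key map and timeout task that TaskProcessor relies on.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

/** Hypothetical one-shot TCP reachability probe built on NIO. */
final class TcpProbeSketch {

    /** Returns true if a TCP connection to host:port completes within timeoutMs (must be > 0). */
    static boolean isReachable(String host, int port, long timeoutMs) {
        try (Selector selector = Selector.open();
             SocketChannel channel = SocketChannel.open()) {
            channel.configureBlocking(false);
            if (channel.connect(new InetSocketAddress(host, port))) {
                return true; // connected immediately, which can happen on loopback
            }
            channel.register(selector, SelectionKey.OP_CONNECT);
            if (selector.select(timeoutMs) == 0) {
                return false; // nothing became ready: treat as a timeout
            }
            return channel.finishConnect(); // throws if the connection was refused
        } catch (IOException e) {
            return false; // refused, unreachable, or another I/O failure
        }
    }

    /** Demo helper: probes a local listener, then the same port after it is closed. */
    static boolean demo() {
        try {
            int port;
            boolean whileOpen;
            try (ServerSocket server = new ServerSocket(0)) {
                port = server.getLocalPort();
                whileOpen = isReachable("127.0.0.1", port, 1000);
            }
            boolean afterClose = isReachable("127.0.0.1", port, 1000);
            return whileOpen && !afterClose;
        } catch (IOException e) {
            return false;
        }
    }
}
```

The probe only establishes a connection and sends no application data, so it answers "is the port accepting connections", not "is the service working".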

To summarize, Nacos health checks come in two kinds, one for temporary instances and one for permanent instances:

  • For temporary instances: the client sends a heartbeat every 5 seconds. If no heartbeat arrives for more than 15 seconds the instance is marked unhealthy, and after more than 30 seconds it is deleted from the service list
  • For permanent instances: the server actively checks the instance. The check delay starts at 2000 ms plus a random value below 5000 ms. When a check fails, the instance is only marked unhealthy and is not deleted
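The delay used for permanent instances comes from the nested random call in initCheckRT. The sketch below reproduces that arithmetic with ThreadLocalRandom in place of the RandomUtils helper from the source; FirstCheckDelaySketch and boundedRandom are hypothetical names, and a bound of zero is handled explicitly because ThreadLocalRandom rejects a non-positive bound.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical re-statement of the delay formula 2000 + random(0, random(0, max)). */
final class FirstCheckDelaySketch {
    static final long BASE_DELAY_MS = 2000;

    /** Returns the delay in milliseconds for the given upper bound max. */
    static long firstCheckDelay(int max) {
        int inner = boundedRandom(max);             // in [0, max)
        return BASE_DELAY_MS + boundedRandom(inner); // adds a value in [0, inner)
    }

    /** Uniform value in [0, endExclusive), or 0 when the range is empty. */
    private static int boundedRandom(int endExclusive) {
        return endExclusive <= 0 ? 0 : ThreadLocalRandom.current().nextInt(endExclusive);
    }
}
```

Because the outer bound is itself random, the added jitter is skewed toward small values. With max = 5000 the result always lies between 2000 ms and just under 7000 ms, which spreads the checks of different clusters over time.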

That concludes the article. The following figure summarizes service registration and heartbeat:

[Figure: summary of service registration and heartbeat]
If the article is helpful to you, please give a good review, your affirmation is my biggest motivation


Origin blog.csdn.net/u014494148/article/details/128680864