Nacos source code analysis: notes following the Heima (itheima) course

1. Download the Nacos source code and run it

To study the Nacos source code, you cannot just run the packaged Nacos server jar. You need to download the source code and compile it yourself.

1.1. Download Nacos source code

Nacos GitHub address: https://github.com/alibaba/nacos

The pre-class materials already include the Nacos 1.4.2 source code:

Students who want to study other versions can download the source themselves.

Open the tags page, https://github.com/alibaba/nacos/tags, and find version 1.4.2:

Open that tag and download Source code (zip):

1.2. Import Demo project

The pre-class materials provide a microservice demo project that covers service registration and discovery.
After importing the project, view its project structure:

Structure description:

  • cloud-source-demo: project parent directory
    • cloud-demo: parent project of the microservices; manages their dependencies
      • order-service: the order microservice; it calls user-service, so it is a service consumer
      • user-service: the user microservice; it exposes an API that queries users by id, so it is a service provider

1.3. Import Nacos source code

Unzip the previously downloaded Nacos source code into the cloud-source-demo project directory:

Then, use IDEA to import it as a module:

1) Select the Project Structure option:

Then click Import Module:

In the pop-up window, select the Nacos source code directory:

Then select Maven and click Finish:

Finally, click OK:


Imported project structure:


1.4. Proto compilation

Nacos serializes and deserializes the data it exchanges internally with protobuf. The corresponding proto files are defined in the consistency submodule:

We first need to compile these proto files into Java code.

1.4.1. What is protobuf

Protobuf is short for Protocol Buffers, a data serialization protocol from Google. Google's official definition is:

Protocol Buffers is a lightweight and efficient format for storing structured data. It can serialize structured data and is well suited to data storage or to RPC data exchange. It is a language-neutral, platform-neutral, extensible way of serializing structured data for use in communication protocols, data storage, and other fields.

Put simply, it is a cross-language, cross-platform data transmission format. It plays a role similar to JSON, but it is typically faster to process and produces much smaller payloads.

Protobuf works across languages because data structures are defined in .proto files. The protoc compiler then compiles these files into code for each target language.

1.4.2. Install protoc

The Protobuf releases page on GitHub: https://github.com/protocolbuffers/protobuf/releases

Download the Windows version:

The pre-class materials also include the downloaded package:

Unzip it to any directory whose path contains no Chinese characters. The protoc.exe in the bin directory is the compiler we will use:

Then add this bin directory to the Path environment variable, the same way you configure the JDK:

1.4.3. Compile proto

Go to the src/main directory of the consistency module in nacos-1.4.2:

Then open a cmd window in that directory and run the following two commands:

```shell
protoc --java_out=./java ./proto/consistency.proto
protoc --java_out=./java ./proto/Data.proto
```

As shown in the picture:

The generated Java code ends up in the consistency module of Nacos:

1.5. Run

The entry point of the Nacos server is the Nacos class in the console module:

We need to start it in standalone mode. For Nacos this is normally done by passing the JVM option -Dnacos.standalone=true:

Then create a new Spring Boot run configuration:

Then fill in the application information:

Then run the main method of the Nacos class:

After starting order-service and user-service, you can see both services in the Nacos console:

2. Service Registration

After a service registers with Nacos, the server keeps it in an in-memory registry with the following structure:

The outermost layer is a Map with the structure Map<String, Map<String, Service>>:

  • key: the namespace_id, which isolates environments. A namespace can contain multiple groups
  • value: another Map<String, Service>, holding the groups and the services in them. A group can contain multiple services
    • key: identifies the group and the service. As a key, it has the format group_name@@service_name
    • value: one service in the group, for example userservice (the user service). Its type is Service, which internally holds a Map<String, Cluster>, because one service can have multiple clusters
      • key: the cluster name
      • value: of type Cluster, holding the details of the cluster. A cluster can contain multiple instances (concrete nodes), stored in a Set<Instance>
        • Instance: instance information such as IP, port, health status, and weight

When a service registers with Nacos, its information is organized and stored in this Map.
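To make the nesting concrete, here is a minimal sketch of the same shape built from plain JDK collections. RegistrySketch, ServiceSketch, ClusterSketch and InstanceSketch are hypothetical simplifications, not the real Nacos Service, Cluster and Instance classes. The cluster name DEFAULT is only sample data, and the sketch ignores thread safety and health checking.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified stand-ins for Nacos's Instance, Cluster and Service.
record InstanceSketch(String ip, int port, double weight, boolean healthy) {}

class ClusterSketch {
    // Instances of one cluster (a Set, so duplicates collapse)
    final Set<InstanceSketch> instances = new HashSet<>();
}

class ServiceSketch {
    // cluster name -> cluster
    final Map<String, ClusterSketch> clusterMap = new HashMap<>();
}

class RegistrySketch {
    // namespaceId -> ("group@@service" -> service)
    final Map<String, Map<String, ServiceSketch>> serviceMap = new HashMap<>();

    void register(String namespaceId, String group, String service, String cluster, InstanceSketch instance) {
        String groupedName = group + "@@" + service;
        // Each computeIfAbsent creates the missing level on the way down
        serviceMap.computeIfAbsent(namespaceId, ns -> new HashMap<>())
                .computeIfAbsent(groupedName, name -> new ServiceSketch())
                .clusterMap.computeIfAbsent(cluster, c -> new ClusterSketch())
                .instances.add(instance);
    }

    int instanceCount(String namespaceId, String group, String service, String cluster) {
        ServiceSketch s = serviceMap.getOrDefault(namespaceId, Map.of()).get(group + "@@" + service);
        if (s == null || !s.clusterMap.containsKey(cluster)) {
            return 0;
        }
        return s.clusterMap.get(cluster).instances.size();
    }
}
```

Creating each missing level on demand mirrors how ServiceManager "creates service or cluster silently if they don't exist" the first time an instance registers.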

2.1. Service registration interface

Nacos exposes an HTTP API for service registration. A client registers simply by sending a request to it.

**Interface description:** Register an instance to Nacos service.

Request method: POST

Request path: /nacos/v1/ns/instance

Request parameters:

| Name | Type | Required | Description |
|---|---|---|---|
| ip | string | yes | Service instance IP |
| port | int | yes | Service instance port |
| namespaceId | string | no | Namespace ID |
| weight | double | no | Weight |
| enabled | boolean | no | Whether the instance is online (enabled) |
| healthy | boolean | no | Whether the instance is healthy |
| metadata | string | no | Extended information |
| clusterName | string | no | Cluster name |
| serviceName | string | yes | Service name |
| groupName | string | no | Group name |
| ephemeral | boolean | no | Whether the instance is ephemeral (temporary) |

Error codes:

| Error code | Description | Meaning |
|---|---|---|
| 400 | Bad Request | Syntax error in the client request |
| 403 | Forbidden | Permission denied |
| 404 | Not Found | Resource not found |
| 500 | Internal Server Error | Internal server error |
| 200 | OK | Normal |
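For reference, the request can also be built by hand. The sketch below uses the JDK's java.net.http API (Java 11+) to build a registration request, but it does not send it. The request carries the required parameters from the table plus groupName. The server address http://127.0.0.1:8848 and the sample values are assumptions. Putting the parameters in the query string is also a choice made for this sketch.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

class RegisterRequestSketch {

    // URL-encodes each key/value pair and joins the pairs with '&'
    static String toQuery(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                        + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    // Builds (but does not send) a POST to /nacos/v1/ns/instance with the
    // required parameters serviceName, ip, port plus the optional groupName
    static HttpRequest buildRegisterRequest(String server, String serviceName, String groupName,
                                            String ip, int port) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps a stable parameter order
        params.put("serviceName", serviceName);
        params.put("groupName", groupName);
        params.put("ip", ip);
        params.put("port", String.valueOf(port));
        URI uri = URI.create(server + "/nacos/v1/ns/instance?" + toQuery(params));
        return HttpRequest.newBuilder(uri)
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }
}
```

To send the request, pass it to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()). A running server should answer with the body ok, which is the value InstanceController.register returns.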

2.2. Client

First, we need to find the entry point of service registration.

2.2.1. NacosServiceRegistryAutoConfiguration

The Nacos client integrates through Spring Boot auto-configuration, so we start from the nacos-discovery dependency:

spring-cloud-starter-alibaba-nacos-discovery-2.2.6.RELEASE.jar

The Nacos auto-configuration entries are declared in this jar (in META-INF/spring.factories):

Many auto-configuration classes are registered there. The one related to service registration is NacosServiceRegistryAutoConfiguration, which we follow next.

The NacosServiceRegistryAutoConfiguration class defines a bean related to automatic registration:

2.2.2. NacosAutoServiceRegistration

The source code of NacosAutoServiceRegistration is shown in the figure:

When it is initialized, its parent class AbstractAutoServiceRegistration is initialized as well.

AbstractAutoServiceRegistration is shown in the figure:

It implements the ApplicationListener interface, so it listens to events published while the Spring container starts.

When it receives WebServerInitializedEvent (published once the web server has finished initializing), it calls the bind method:

The bind method is as follows:

```java
public void bind(WebServerInitializedEvent event) {
    // Get the ApplicationContext
    ApplicationContext context = event.getApplicationContext();
    // Check the server namespace (usually null); the "management" web server is skipped
    if (context instanceof ConfigurableWebServerApplicationContext) {
        if ("management".equals(((ConfigurableWebServerApplicationContext) context)
                                .getServerNamespace())) {
            return;
        }
    }
    // Record the port of the current web server
    this.port.compareAndSet(0, event.getWebServer().getPort());
    // Start the registration process for the current service
    this.start();
}
```

The start method works as follows:

```java
public void start() {
    if (!isEnabled()) {
        if (logger.isDebugEnabled()) {
            logger.debug("Discovery Lifecycle disabled. Not starting");
        }
        return;
    }

    // Initialize only if the service is not running yet
    if (!this.running.get()) {
        // Publish the "instance about to register" event
        this.context.publishEvent(
                new InstancePreRegisteredEvent(this, getRegistration()));
        // *** Register the service ***
        register();
        if (shouldRegisterManagement()) {
            registerManagement();
        }
        // Publish the "registration completed" event
        this.context.publishEvent(
                new InstanceRegisteredEvent<>(this, getConfiguration()));
        // Mark the service as running (running is an AtomicBoolean)
        this.running.compareAndSet(false, true);
    }
}
```
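The bind/start pair comes down to two atomic guards. The port is recorded only once, through compareAndSet(0, port), and registration runs only while running is false. The framework-free sketch below reproduces just that guard logic. LifecycleSketch and its registerCalls counter are hypothetical; they stand in for the Spring Cloud class and its real register() call.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

class LifecycleSketch {
    final AtomicInteger port = new AtomicInteger(0);
    final AtomicBoolean running = new AtomicBoolean(false);
    int registerCalls = 0; // counts simulated registrations

    // Mirrors bind(): remember the first port seen, then try to start
    void onWebServerInitialized(int webServerPort) {
        port.compareAndSet(0, webServerPort);
        start();
    }

    // Mirrors start(): register only when not running yet, then mark as running
    void start() {
        if (!running.get()) {
            register();
            running.compareAndSet(false, true);
        }
    }

    void register() {
        registerCalls++; // a real implementation would call the ServiceRegistry here
    }
}
```

As in the original, the get() check and the later compareAndSet are separate steps. That is enough to ignore repeated events, but it is not a lock against truly concurrent calls.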

The key step is the register() method, which performs the actual service registration:

```java
protected void register() {
    this.serviceRegistry.register(getRegistration());
}
```

Here, this.serviceRegistry is a NacosServiceRegistry:

2.2.3. NacosServiceRegistry

NacosServiceRegistry implements the ServiceRegistry interface from Spring Cloud. ServiceRegistry is the common abstraction for service registration and declares methods such as register and deregister.

NacosServiceRegistry implements register as follows:

```java
@Override
public void register(Registration registration) {
    // serviceId must not be empty, i.e. spring.application.name must be set
    if (StringUtils.isEmpty(registration.getServiceId())) {
        log.warn("No service to register for nacos client...");
        return;
    }
    // Get the Nacos NamingService, i.e. the registry client
    NamingService namingService = namingService();
    // Get the serviceId and the group
    String serviceId = registration.getServiceId();
    String group = nacosDiscoveryProperties.getGroup();
    // Build the instance info: cluster name, ephemeral flag, weight, IP, port, etc.
    Instance instance = getNacosInstanceFromRegistration(registration);

    try {
        // Register the service instance
        namingService.registerInstance(serviceId, group, instance);
        log.info("nacos registry, {} {} {}:{} register finished", group, serviceId,
                 instance.getIp(), instance.getPort());
    }
    catch (Exception e) {
        if (nacosDiscoveryProperties.isFailFast()) {
            log.error("nacos registry, {} register failed...{},", serviceId,
                      registration.toString(), e);
            rethrowRuntimeException(e);
        }
        else {
            log.warn("Failfast is false. {} register failed...{},", serviceId,
                     registration.toString(), e);
        }
    }
}
```

The method ends by calling NamingService.registerInstance to perform the registration.

The default implementation of the NamingService interface is NacosNamingService.

2.2.4. NacosNamingService

NacosNamingService provides service registration, subscription, and related functions.

Its registerInstance method registers a service instance. The source code is as follows:

```java
@Override
public void registerInstance(String serviceName, String groupName, Instance instance) throws NacosException {
    // Validate the timeouts: the heartbeat timeout (default 15 s) must be greater
    // than the heartbeat interval (default 5 s)
    NamingUtils.checkInstanceIsLegal(instance);
    // Build the grouped service name, formatted as groupName@@serviceId
    String groupedServiceName = NamingUtils.getGroupedName(serviceName, groupName);
    // Is this an ephemeral instance? The default is true
    if (instance.isEphemeral()) {
        // Ephemeral instances must send periodic heartbeats to the Nacos server
        BeatInfo beatInfo = beatReactor.buildBeatInfo(groupedServiceName, instance);
        beatReactor.addBeatInfo(groupedServiceName, beatInfo);
    }
    // Send the request that registers the instance
    serverProxy.registerService(groupedServiceName, groupName, instance);
}
```
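The two helpers at the top of registerInstance can be approximated as follows. This is a hypothetical re-implementation for illustration, not the NamingUtils source. It assumes the @@ separator and the default 5 s interval and 15 s timeout stated in the comments above. It also throws IllegalArgumentException where Nacos throws NacosException.

```java
class NamingSketch {
    static final String GROUP_SEPARATOR = "@@";          // separator from the comment above
    static final long DEFAULT_BEAT_INTERVAL_MS = 5_000;  // default heartbeat interval (5 s)
    static final long DEFAULT_BEAT_TIMEOUT_MS = 15_000;  // default heartbeat timeout (15 s)

    // groupName + "@@" + serviceName, e.g. DEFAULT_GROUP@@user-service
    static String groupedName(String serviceName, String groupName) {
        return groupName + GROUP_SEPARATOR + serviceName;
    }

    // Rejects a heartbeat timeout that is not greater than the heartbeat interval
    static void checkBeatTimes(long beatIntervalMs, long beatTimeoutMs) {
        if (beatTimeoutMs <= beatIntervalMs) {
            throw new IllegalArgumentException("heartbeat timeout (" + beatTimeoutMs
                    + " ms) must be greater than the interval (" + beatIntervalMs + " ms)");
        }
    }
}
```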

The registration request is finally sent by the registerService method of NamingProxy, the type of serverProxy.

The code is as follows:

```java
public void registerService(String serviceName, String groupName, Instance instance) throws NacosException {

    NAMING_LOGGER.info("[REGISTER-SERVICE] {} registering service {} with instance: {}", namespaceId, serviceName,
                       instance);
    // Build the request parameters
    final Map<String, String> params = new HashMap<String, String>(16);
    params.put(CommonParams.NAMESPACE_ID, namespaceId);
    params.put(CommonParams.SERVICE_NAME, serviceName);
    params.put(CommonParams.GROUP_NAME, groupName);
    params.put(CommonParams.CLUSTER_NAME, instance.getClusterName());
    params.put("ip", instance.getIp());
    params.put("port", String.valueOf(instance.getPort()));
    params.put("weight", String.valueOf(instance.getWeight()));
    params.put("enable", String.valueOf(instance.isEnabled()));
    params.put("healthy", String.valueOf(instance.isHealthy()));
    params.put("ephemeral", String.valueOf(instance.isEphemeral()));
    params.put("metadata", JacksonUtils.toJson(instance.getMetadata()));
    // Send the parameters above to /nacos/v1/ns/instance via POST
    reqApi(UtilAndComs.nacosUrlInstance, params, HttpMethod.POST);

}
```

These are exactly the parameters required by the Nacos service registration API. The core ones are:

  • namespaceId: the environment (namespace)
  • serviceName: the service name
  • groupName: the group name
  • clusterName: the cluster name
  • ip: the IP address of the current instance
  • port: the port of the current instance

The registerInstance method of NacosNamingService also contains code related to service heartbeats, which we will study later.

2.2.5. Client registration flow chart

As shown in the picture:

2.3. Server

The nacos-console module depends on the nacos-naming module:
Its module structure is as follows:

The com.alibaba.nacos.naming.controllers package contains the various APIs for service registration and discovery. Service registration is handled in the InstanceController class:

2.3.1. InstanceController

In the InstanceController class, the register method handles service registration:

```java
@CanDistro
@PostMapping
@Secured(parser = NamingResourceParser.class, action = ActionTypes.WRITE)
public String register(HttpServletRequest request) throws Exception {
    // Get namespaceId, falling back to the default namespace
    final String namespaceId = WebUtils
        .optional(request, CommonParams.NAMESPACE_ID, Constants.DEFAULT_NAMESPACE_ID);
    // Get serviceName (required), formatted as group_name@@service_name
    final String serviceName = WebUtils.required(request, CommonParams.SERVICE_NAME);
    NamingUtils.checkServiceNameFormat(serviceName);
    // Parse the instance info into an Instance object
    final Instance instance = parseInstance(request);
    // Register the instance
    serviceManager.registerInstance(namespaceId, serviceName, instance);
    return "ok";
}
```
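The parameter handling in register has three parts: an optional namespaceId with a default, a required serviceName, and a format check. The small sketch below illustrates them over a plain Map instead of an HttpServletRequest. The helpers mimic WebUtils.optional/required and NamingUtils.checkServiceNameFormat, but they are hypothetical. "public" is Nacos's default namespace ID. The rule that the name must look like group@@service is an assumption about what the format check enforces.

```java
import java.util.Map;

class RegisterParamsSketch {
    static final String DEFAULT_NAMESPACE_ID = "public";

    // Returns the parameter, or the default when it is missing or blank
    static String optional(Map<String, String> req, String key, String defaultValue) {
        String value = req.get(key);
        return (value == null || value.isBlank()) ? defaultValue : value.trim();
    }

    // Returns the parameter, failing when it is missing or blank
    static String required(Map<String, String> req, String key) {
        String value = req.get(key);
        if (value == null || value.isBlank()) {
            throw new IllegalArgumentException("Param '" + key + "' is required.");
        }
        return value.trim();
    }

    // Expects "group@@service" with both parts non-empty
    static void checkServiceNameFormat(String groupedName) {
        String[] parts = groupedName.split("@@");
        if (parts.length != 2 || parts[0].isEmpty() || parts[1].isEmpty()) {
            throw new IllegalArgumentException(
                    "serviceName should look like 'groupName@@serviceName': " + groupedName);
        }
    }
}
```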

Next, we follow serviceManager.registerInstance().

2.3.2. ServiceManager

ServiceManager is the core class Nacos uses to manage services and instances, and it holds the Nacos service registry:

Its registerInstance method registers a service instance:

```java
/**
 * Register an instance to a service in AP mode.
 *
 * <p>This method creates service or cluster silently if they don't exist.
 *
 * @param namespaceId id of namespace
 * @param serviceName service name
 * @param instance    instance to register
 * @throws Exception any error occurred in the process
 */
public void registerInstance(String namespaceId, String serviceName, Instance instance) throws NacosException {
    // Create an empty service if none exists yet: on the first registration, an
    // empty service is created and put into the registry. It has no instances yet
    createEmptyService(namespaceId, serviceName, instance.isEphemeral());
    // Get the service that was just created
    Service service = getService(namespaceId, serviceName);
    // Throw if it cannot be found
    if (service == null) {
        throw new NacosException(NacosException.INVALID_PARAM,
                                 "service not found, namespace: " + namespaceId + ", service: " + serviceName);
    }
    // Add the instance being registered to the service
    addInstance(namespaceId, serviceName, instance.isEphemeral(), instance);
}
```

After the service is created, the next step is to add an instance to the service:

```java
/**
 * Add instance to service.
 *
 * @param namespaceId namespace
 * @param serviceName service name
 * @param ephemeral   whether instance is ephemeral
 * @param ips         instances
 * @throws NacosException nacos exception
 */
public void addInstance(String namespaceId, String serviceName, boolean ephemeral, Instance... ips)
    throws NacosException {
    // Key used to listen to the service's instance list (a unique service identifier), e.g.
    // com.alibaba.nacos.naming.iplist.ephemeral.public##DEFAULT_GROUP@@order-service
    String key = KeyBuilder.buildInstanceListKey(namespaceId, serviceName, ephemeral);
    // Get the service
    Service service = getService(namespaceId, serviceName);
    // Lock the service to prevent unsafe concurrent modification
    synchronized (service) {
        // 1) Get the updated instance list
        List<Instance> instanceList = addIpAddresses(service, ephemeral, ips);
        // 2) Wrap the instance list in an Instances object
        Instances instances = new Instances();
        instances.setInstanceList(instanceList);
        // 3) Update the registry and synchronize data across the Nacos cluster
        consistencyService.put(key, instances);
    }
}
```

In this method, changes to the service's instance list are made under a lock to ensure thread safety. The synchronized block performs the following steps:

  • 1) Get the updated instance list with addIpAddresses(service, ephemeral, ips)
  • 2) Wrap the updated data in an Instances object, which is used later when updating the registry
  • 3) Call consistencyService.put() to update the registry and synchronize the data across the Nacos cluster, keeping the cluster consistent

Note: addIpAddresses in step 1 copies the old instance list and adds the new instance to the copy. In step 3, once the instance state is updated, the new list directly replaces the old one. During the update, the old instance list is left untouched, so users can keep reading it.

As a result, updating the list never blocks readers and never exposes half-updated (dirty) data, which gives better performance. This approach is called copy-on-write (CopyOnWrite).
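The copy-on-write idea can be shown in isolation. CopyOnWriteInstanceList is a hypothetical class that holds instance addresses as strings. It is not Nacos code, but it follows the same pattern: copy under a lock, modify the copy, then swap a volatile reference. Readers never lock and never see a partially updated list. The JDK also provides this pattern ready-made as java.util.concurrent.CopyOnWriteArrayList.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class CopyOnWriteInstanceList {
    // Readers always see a complete, immutable snapshot through this volatile reference
    private volatile List<String> instances = Collections.emptyList();

    // Lock-free read: return the current snapshot
    List<String> read() {
        return instances;
    }

    // Writers are serialized: copy the old list, modify the copy,
    // then publish it by replacing the reference in one assignment
    synchronized void add(String address) {
        List<String> copy = new ArrayList<>(instances);
        copy.add(address);
        instances = Collections.unmodifiableList(copy);
    }
}
```

A reader that grabbed a snapshot before a write keeps seeing that old snapshot, which is exactly the "old list stays readable" property described above.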

1) Update service list

Let's look at how the instance list is updated, in the addIpAddresses(service, ephemeral, ips) method:

```java
private List<Instance> addIpAddresses(Service service, boolean ephemeral, Instance... ips) throws NacosException {
    return updateIpAddresses(service, UtilsAndCommons.UPDATE_INSTANCE_ACTION_ADD, ephemeral, ips);
}
```

It delegates to the updateIpAddresses method:

```java
public List<Instance> updateIpAddresses(Service service, String action, boolean ephemeral, Instance... ips)
    throws NacosException {
    // Get the service's stored instance list (a Datum) by namespaceId and serviceName;
    // on the first registration it is always null
    Datum datum = consistencyService
        .get(KeyBuilder.buildInstanceListKey(service.getNamespaceId(), service.getName(), ephemeral));
    // Get the instances currently in the service
    List<Instance> currentIPs = service.allIPs(ephemeral);
    // Map of current instances: key is the address (ip:port), value is the Instance
    Map<String, Instance> currentInstances = new HashMap<>(currentIPs.size());
    // Set of the current instances' instanceIds
    Set<String> currentInstanceIds = Sets.newHashSet();
    // Iterate over the existing instances
    for (Instance instance : currentIPs) {
        // Add it to the map
        currentInstances.put(instance.toIpAddr(), instance);
        // Add its instanceId to the set
        currentInstanceIds.add(instance.getInstanceId());
    }

    // Map that will hold the updated instance list
    Map<String, Instance> instanceMap;
    if (datum != null && null != datum.value) {
        // Old data exists: start from the old instance list
        instanceMap = setValid(((Instances) datum.value).getInstanceList(), currentInstances);
    } else {
        // No old data: start with a new, empty map
        instanceMap = new HashMap<>(ips.length);
    }
    // Iterate over the instances being added or removed
    for (Instance instance : ips) {
        // Does the service already contain the instance's cluster?
        if (!service.getClusterMap().containsKey(instance.getClusterName())) {
            // If not, create the cluster
            Cluster cluster = new Cluster(instance.getClusterName(), service);
            cluster.init();
            // Put the cluster into the service's cluster map
            service.getClusterMap().put(instance.getClusterName(), cluster);
            Loggers.SRV_LOG
                .warn("cluster: {} not found, ip: {}, will create new cluster with default configuration.",
                      instance.getClusterName(), instance.toJson());
        }
        // Remove the instance, or add it?
        if (UtilsAndCommons.UPDATE_INSTANCE_ACTION_REMOVE.equals(action)) {
            instanceMap.remove(instance.getDatumKey());
        } else {
            // Add: reuse the instanceId of an already known instance, otherwise generate a new one
            Instance oldInstance = instanceMap.get(instance.getDatumKey());
            if (oldInstance != null) {
                instance.setInstanceId(oldInstance.getInstanceId());
            } else {
                instance.setInstanceId(instance.generateInstanceId(currentInstanceIds));
            }
            // Put it into the instance map
            instanceMap.put(instance.getDatumKey(), instance);
        }

    }

    if (instanceMap.size() <= 0 && UtilsAndCommons.UPDATE_INSTANCE_ACTION_ADD.equals(action)) {
        throw new IllegalArgumentException(
            "ip list can not be empty, service: " + service.getName() + ", ip list: " + JacksonUtils
            .toJson(instanceMap.values()));
    }
    // Return all instances in instanceMap as a List
    return new ArrayList<>(instanceMap.values());
}
```

In short, the method fetches the old instance list, merges the new instance information into it (adding new instances and reusing the IDs of existing ones), and returns the latest instance list.
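The merge rule can be isolated in a small sketch. InstanceMergeSketch and its Inst class are hypothetical. A boolean stands in for the REMOVE/ADD action string. The ID scheme id-<datumKey> is invented for illustration; Nacos generates IDs through Instance.generateInstanceId.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class InstanceMergeSketch {
    // Hypothetical minimal instance: a datum key (e.g. "ip:port:cluster") plus an id
    static final class Inst {
        final String datumKey;
        String instanceId;

        Inst(String datumKey) {
            this.datumKey = datumKey;
        }
    }

    // Merges the changed instances into a copy of the old list:
    // remove=true deletes by key; otherwise reuse an existing ID or generate one
    static List<Inst> merge(List<Inst> oldList, boolean remove, Inst... changed) {
        Map<String, Inst> instanceMap = new HashMap<>();
        for (Inst old : oldList) {
            instanceMap.put(old.datumKey, old); // start from the old data
        }
        for (Inst inst : changed) {
            if (remove) {
                instanceMap.remove(inst.datumKey);
                continue;
            }
            Inst existing = instanceMap.get(inst.datumKey);
            inst.instanceId = existing != null
                    ? existing.instanceId       // keep the ID of the known instance
                    : "id-" + inst.datumKey;    // hypothetical ID scheme for new instances
            instanceMap.put(inst.datumKey, inst);
        }
        if (!remove && instanceMap.isEmpty()) {
            throw new IllegalArgumentException("ip list can not be empty");
        }
        return new ArrayList<>(instanceMap.values());
    }
}
```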

2) Nacos cluster consistency

After computing the updated instance list, Nacos calls the following to update the registry and keep the cluster consistent:

```java
consistencyService.put(key, instances);
```

ConsistencyService is the cluster-consistency interface, and it has several implementations:

Let's look at DelegateConsistencyServiceImpl:

```java
@Override
public void put(String key, Record value) throws NacosException {
    // Choose the delegate depending on whether the instance is ephemeral
    mapConsistencyService(key).put(key, value);
}
```

The mapConsistencyService(key) method chooses the delegate:

```java
private ConsistencyService mapConsistencyService(String key) {
    // Is the key for an ephemeral instance?
    // Yes: use ephemeralConsistencyService, i.e. DistroConsistencyServiceImpl
    // No:  use persistentConsistencyService, i.e. PersistentConsistencyServiceDelegateImpl
    return KeyBuilder.matchEphemeralKey(key) ? ephemeralConsistencyService : persistentConsistencyService;
}
```

Instances are ephemeral by default, so we focus on DistroConsistencyServiceImpl.
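The routing decision depends only on the key. The example key in the addInstance comment is com.alibaba.nacos.naming.iplist.ephemeral.public##DEFAULT_GROUP@@order-service, which suggests a prefix check is enough. The sketch below assumes exactly that prefix. It also assumes that a persistent key is the same key without the ephemeral. segment. It illustrates the idea and is not the KeyBuilder source.

```java
class KeyRoutingSketch {
    // Prefixes inferred from the example key shown in the addInstance comment
    static final String INSTANCE_LIST_PREFIX = "com.alibaba.nacos.naming.iplist.";
    static final String EPHEMERAL_PREFIX = INSTANCE_LIST_PREFIX + "ephemeral.";

    // Builds <prefix>[ephemeral.]<namespaceId>##<group@@service>
    static String buildInstanceListKey(String namespaceId, String groupedServiceName, boolean ephemeral) {
        return (ephemeral ? EPHEMERAL_PREFIX : INSTANCE_LIST_PREFIX) + namespaceId + "##" + groupedServiceName;
    }

    // Chooses the Distro (ephemeral) or the persistent consistency service from the key alone
    static String route(String key) {
        return key.startsWith(EPHEMERAL_PREFIX) ? "distro" : "persistent";
    }
}
```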

2.3.4. DistroConsistencyServiceImpl

For ephemeral instances, consistency is implemented by the put method of DistroConsistencyServiceImpl:

```java
public void put(String key, Record value) throws NacosException {
    // First write the updated instance info into the local instance list
    onPut(key, value);
    // Then start synchronizing with the rest of the cluster
    distroProtocol.sync(new DistroKey(key, KeyBuilder.INSTANCE_LIST_KEY_PREFIX), DataOperation.CHANGE,
                        globalConfig.getTaskDispatchPeriod() / 2);
}
```

This method has only two lines:

  • onPut(key, value): value is the Instances object holding the updated instance list. This call stores the data and queues a change task. A thread pool then writes the service information into the registry (the nested Map) asynchronously
  • distroProtocol.sync(): synchronizes the data to the other Nacos nodes in the cluster via the Distro protocol

Let's look at the onPut method first.

2.3.4.1. Update the local instance list

1) Put into the blocking queue

The onPut method is as follows:

```java
public void onPut(String key, Record value) {
    // Is this an ephemeral instance-list key?
    if (KeyBuilder.matchEphemeralInstanceListKey(key)) {
        // Wrap the Instances in a Datum
        Datum<Instances> datum = new Datum<>();
        datum.value = (Instances) value;
        datum.key = key;
        datum.timestamp.incrementAndGet();
        // Put it into the DataStore
        dataStore.put(key, datum);
    }

    if (!listeners.containsKey(key)) {
        return;
    }
    // Queue a task: notifier maintains a blocking queue and executes its tasks
    // asynchronously on a thread pool
    notifier.addTask(key, DataOperation.CHANGE);
}
```

notifier is of type DistroConsistencyServiceImpl.Notifier, which internally maintains a blocking queue that stores service-list changes:

addTask puts the task into the blocking queue:

```java
// addTask method of DistroConsistencyServiceImpl.Notifier:
public void addTask(String datumKey, DataOperation action) {

    // A CHANGE task for this key is already pending, so skip the duplicate
    if (services.containsKey(datumKey) && action == DataOperation.CHANGE) {
        return;
    }
    // Mark this key as having a pending CHANGE task
    if (action == DataOperation.CHANGE) {
        services.put(datumKey, StringUtils.EMPTY);
    }
    // Put the task into the blocking queue
    tasks.offer(Pair.with(datumKey, action));
}
```
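Notice that addTask drops a CHANGE for a key that is already queued. This is safe because the queued task does not carry the data. When it is processed, the consumer looks up the latest value in the DataStore, as the handle method shows. Below is a minimal sketch of this de-duplication. The names (DedupTaskQueueSketch, Task, Op) are hypothetical, and the queue capacity is arbitrary.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;

class DedupTaskQueueSketch {
    enum Op { CHANGE, DELETE }

    record Task(String key, Op op) {}

    // Keys that already have a CHANGE task waiting in the queue
    final ConcurrentHashMap<String, Boolean> pending = new ConcurrentHashMap<>();
    // Capacity 1024 is an arbitrary choice for the sketch
    final BlockingQueue<Task> tasks = new ArrayBlockingQueue<>(1024);

    void addTask(String key, Op op) {
        if (op == Op.CHANGE && pending.containsKey(key)) {
            return; // a CHANGE for this key is already queued and will read the latest data
        }
        if (op == Op.CHANGE) {
            pending.put(key, Boolean.TRUE);
        }
        tasks.offer(new Task(key, op));
    }

    // Non-blocking variant for the sketch (the real loop blocks on take()).
    // Clearing the pending mark lets later changes to the key be queued again.
    Task pollNext() {
        Task task = tasks.poll();
        if (task != null) {
            pending.remove(task.key());
        }
        return task;
    }
}
```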

2) Asynchronous update by the Notifier

The Notifier is also a Runnable. A single-threaded thread pool runs it, and it keeps taking tasks from the blocking queue and updating the service list. Here is its run method:

```java
// run method of DistroConsistencyServiceImpl.Notifier:
@Override
public void run() {
    Loggers.DISTRO.info("distro notifier started");
    // Infinite loop that keeps processing tasks. take() blocks while the queue
    // is empty, so the loop does not burn CPU
    for (; ; ) {
        try {
            // Take a task from the blocking queue
            Pair<String, DataOperation> pair = tasks.take();
            // Handle the task: update the service list
            handle(pair);
        } catch (Throwable e) {
            Loggers.DISTRO.error("[NACOS-DISTRO] Error while handling notifying task", e);
        }
    }
}
```
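The run loop is the classic single-consumer pattern. Below is a runnable sketch of it on a single-thread executor. NotifierLoopSketch is hypothetical, and its handled list stands in for handle(pair). Unlike the original, which loops forever, the sketch exits when interrupted so that shutdownNow() can stop it.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

class NotifierLoopSketch implements Runnable {
    final BlockingQueue<String> tasks = new LinkedBlockingQueue<>();
    final List<String> handled = new CopyOnWriteArrayList<>();

    @Override
    public void run() {
        // take() parks the thread while the queue is empty, so this loop does not spin
        while (!Thread.currentThread().isInterrupted()) {
            try {
                String key = tasks.take();
                handled.add(key); // stands in for handle(pair)
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the flag so the loop exits
            } catch (RuntimeException e) {
                // Log and continue, so one bad task does not kill the loop
                System.err.println("error while handling task: " + e);
            }
        }
    }

    // Runs the notifier on its own single worker thread
    static ExecutorService startOnSingleThread(NotifierLoopSketch notifier) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        executor.submit(notifier);
        return executor;
    }
}
```

Because only one thread consumes the queue, tasks are handled strictly in the order they were queued.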

Let's take a look at the handle method:

```java
// handle method of DistroConsistencyServiceImpl.Notifier:
private void handle(Pair<String, DataOperation> pair) {
    try {
        String datumKey = pair.getValue0();
        DataOperation action = pair.getValue1();

        services.remove(datumKey);

        int count = 0;

        if (!listeners.containsKey(datumKey)) {
            return;
        }
        // Notify every listener of the changed key; here each RecordListener is a Service
        for (RecordListener listener : listeners.get(datumKey)) {

            count++;

            try {
                // CHANGE event on the service's instance list
                if (action == DataOperation.CHANGE) {
                    // Update the service's instance list
                    listener.onChange(datumKey, dataStore.get(datumKey).value);
                    continue;
                }
                // DELETE event on the service's instance list
                if (action == DataOperation.DELETE) {
                    listener.onDelete(datumKey);
                    continue;
                }
            } catch (Throwable e) {
                Loggers.DISTRO.error("[NACOS-DISTRO] error while notifying listener of key: {}", datumKey, e);
            }
        }

        if (Loggers.DISTRO.isDebugEnabled()) {
            Loggers.DISTRO
                .debug("[NACOS-DISTRO] datum change notified, key: {}, listener count: {}, action: {}",
                       datumKey, count, action.name());
        }
    } catch (Throwable e) {
        Loggers.DISTRO.error("[NACOS-DISTRO] Error while handling notifying task", e);
    }
}
```

3) Overwrite the instance list

In the onChange method of Service, you can see the logic of updating the instance list:

```java
@Override
public void onChange(String key, Instances value) throws Exception {

    Loggers.SRV_LOG.info("[NACOS-RAFT] datum is changed, key: {}, value: {}", key, value);

    // Update the instance list
    updateIPs(value.getInstanceList(), KeyBuilder.matchEphemeralInstanceListKey(key));

    recalculateChecksum();
}
```

The updateIPs method:

```java
public void updateIPs(Collection<Instance> instances, boolean ephemeral) {
    // Map from cluster name to the instances in that cluster
    Map<String, List<Instance>> ipMap = new HashMap<>(clusterMap.size());
    // Pre-populate it with every cluster name of the service
    for (String clusterName : clusterMap.keySet()) {
        ipMap.put(clusterName, new ArrayList<>());
    }
    // Iterate over the instances to update
    for (Instance instance : instances) {
        try {
            if (instance == null) {
                Loggers.SRV_LOG.error("[NACOS-DOM] received malformed ip: null");
                continue;
            }
            // If the instance has no clusterName, use the default cluster
            if (StringUtils.isEmpty(instance.getClusterName())) {
                instance.setClusterName(UtilsAndCommons.DEFAULT_CLUSTER_NAME);
            }
            // Create the cluster if it does not exist yet
            if (!clusterMap.containsKey(instance.getClusterName())) {
                Loggers.SRV_LOG
                    .warn("cluster: {} not found, ip: {}, will create new cluster with default configuration.",
                          instance.getClusterName(), instance.toJson());
                Cluster cluster = new Cluster(instance.getClusterName(), this);
                cluster.init();
                getClusterMap().put(instance.getClusterName(), cluster);
            }
            // Get this cluster's instance list, creating it if missing
            List<Instance> clusterIPs = ipMap.get(instance.getClusterName());
            if (clusterIPs == null) {
                clusterIPs = new LinkedList<>();
                ipMap.put(instance.getClusterName(), clusterIPs);
            }
            // Add the instance to the cluster's list
            clusterIPs.add(instance);
        } catch (Exception e) {
            Loggers.SRV_LOG.error("[NACOS-DOM] failed to process ip: " + instance, e);
        }
    }

    for (Map.Entry<String, List<Instance>> entry : ipMap.entrySet()) {
        //make every ip mine
        List<Instance> entryIPs = entry.getValue();
        // Write the cluster's instances into clusterMap (the registry)
        clusterMap.get(entry.getKey()).updateIps(entryIPs, ephemeral);
    }

    setLastModifiedMillis(System.currentTimeMillis());
    // Publish a service-changed notification
    getPushService().serviceChanged(this);
    StringBuilder stringBuilder = new StringBuilder();

    for (Instance instance : allIPs()) {
        stringBuilder.append(instance.toIpAddr()).append("_").append(instance.isHealthy()).append(",");
    }

    Loggers.EVT_LOG.info("[IP-UPDATED] namespace: {}, service: {}, ips: {}", getNamespaceId(), getName(),
                         stringBuilder.toString());

}
```
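The grouping step of updateIPs can be sketched on its own. ClusterGroupingSketch and its Inst record are hypothetical. DEFAULT stands in for UtilsAndCommons.DEFAULT_CLUSTER_NAME, and the sketch only groups addresses instead of creating Cluster objects.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ClusterGroupingSketch {
    static final String DEFAULT_CLUSTER_NAME = "DEFAULT";

    // Hypothetical minimal instance: address plus an optional cluster name
    record Inst(String address, String clusterName) {}

    // Groups instance addresses by cluster. Every known cluster gets an entry (possibly empty),
    // and instances without a cluster name fall back to the default cluster.
    static Map<String, List<String>> groupByCluster(Set<String> knownClusters, List<Inst> instances) {
        Map<String, List<String>> ipMap = new HashMap<>();
        for (String cluster : knownClusters) {
            ipMap.put(cluster, new ArrayList<>());
        }
        for (Inst inst : instances) {
            if (inst == null) {
                continue; // the original logs and skips malformed (null) entries
            }
            String cluster = (inst.clusterName() == null || inst.clusterName().isEmpty())
                    ? DEFAULT_CLUSTER_NAME : inst.clusterName();
            ipMap.computeIfAbsent(cluster, c -> new ArrayList<>()).add(inst.address());
        }
        return ipMap;
    }
}
```

Pre-filling the map with every existing cluster matters. A cluster whose instances have all gone away still receives an empty list, so the following updateIps call clears it instead of leaving stale instances behind.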

The key line in updateIPs is clusterMap.get(entry.getKey()).updateIps(entryIPs, ephemeral);, which updates the registry through Cluster.updateIps:

public void updateIps(List<Instance> ips, boolean ephemeral) {
    // Get the old instance list
    Set<Instance> toUpdateInstances = ephemeral ? ephemeralInstances : persistentInstances;

    HashMap<String, Instance> oldIpMap = new HashMap<>(toUpdateInstances.size());

    for (Instance ip : toUpdateInstances) {
        oldIpMap.put(ip.getDatumKey(), ip);
    }

    // Find newly added instances and reset their health check status
    List<Instance> newIPs = subtract(ips, oldIpMap.values());
    if (newIPs.size() > 0) {
        Loggers.EVT_LOG
            .info("{} {SYNC} {IP-NEW} cluster: {}, new ips size: {}, content: {}", getService().getName(),
                  getName(), newIPs.size(), newIPs.toString());

        for (Instance ip : newIPs) {
            HealthCheckStatus.reset(ip);
        }
    }
    // Find instances that were removed and clear their health check status
    List<Instance> deadIPs = subtract(oldIpMap.values(), ips);

    if (deadIPs.size() > 0) {
        Loggers.EVT_LOG
            .info("{} {SYNC} {IP-DEAD} cluster: {}, dead ips size: {}, content: {}", getService().getName(),
                  getName(), deadIPs.size(), deadIPs.toString());

        for (Instance ip : deadIPs) {
            HealthCheckStatus.remv(ip);
        }
    }

    toUpdateInstances = new HashSet<>(ips);
    // Overwrite the old instance list directly with the new one
    if (ephemeral) {
        ephemeralInstances = toUpdateInstances;
    } else {
        persistentInstances = toUpdateInstances;
    }
}
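The copy-on-write idea behind updateIps can be reduced to a small standalone sketch. The CowCluster class below is hypothetical and is not Nacos code: it keys instances by a plain "ip:port" string instead of Instance.getDatumKey(), computes added and removed instances the way subtract() is used above, builds the new set off to the side, and then swaps it in through a volatile field, so a concurrent reader always sees either the complete old set or the complete new set. The synchronized modifier is an assumption standing in for Nacos's single-threaded update queue.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified stand-in for Cluster.updateIps(); not the Nacos implementation.
class CowCluster {
    // Readers iterate whatever set this field points to; that set is never mutated in place.
    private volatile Set<String> instances = Collections.emptySet();

    // Elements of 'a' that are not in 'b' (plays the role of subtract()).
    private static Set<String> subtract(Set<String> a, Set<String> b) {
        Set<String> result = new HashSet<>(a);
        result.removeAll(b);
        return result;
    }

    /** Replaces the instance list; returns [added, removed] so callers can log or reset them. */
    synchronized List<Set<String>> updateIps(List<String> ips) {
        Set<String> old = instances;               // snapshot of the current list
        Set<String> next = new HashSet<>(ips);     // build the copy without touching 'old'
        Set<String> added = subtract(next, old);   // Nacos would call HealthCheckStatus.reset()
        Set<String> removed = subtract(old, next); // Nacos would call HealthCheckStatus.remv()
        instances = Collections.unmodifiableSet(next); // a single reference swap publishes the update
        return List.of(added, removed);
    }

    Set<String> snapshot() {
        return instances;
    }
}
```

A reader that grabbed snapshot() before an update keeps iterating the old set safely, which is the point of overwriting the reference instead of editing the set.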

2.3.4.2. Cluster data synchronization

The put method of DistroConsistencyServiceImpl does two things:
[Figure: DistroConsistencyServiceImpl.put(), which calls onPut() and then distroProtocol.sync()]

The first step, onPut, has already been analyzed.

The second step, distroProtocol.sync(), contains the cluster synchronization logic.

The sync method of the DistroProtocol class is as follows:

public void sync(DistroKey distroKey, DataOperation action, long delay) {
    // Iterate over every node in the Nacos cluster except this one
    for (Member each : memberManager.allMembersWithoutSelf()) {
        DistroKey distroKeyWithTarget = new DistroKey(distroKey.getResourceKey(), distroKey.getResourceType(),
                                                      each.getAddress());
        // Create a Distro sync task
        DistroDelayTask distroDelayTask = new DistroDelayTask(distroKeyWithTarget, action, delay);
        // Hand it over to the task engine (backed by a thread pool) for execution
        distroTaskEngineHolder.getDelayTaskExecuteEngine().addTask(distroKeyWithTarget, distroDelayTask);
        if (Loggers.DISTRO.isDebugEnabled()) {
            Loggers.DISTRO.debug("[DISTRO-SCHEDULE] {} to {}", distroKey, each.getAddress());
        }
    }
}

Each synchronization task is wrapped in a DistroDelayTask object and handed over to distroTaskEngineHolder.getDelayTaskExecuteEngine().

That call returns a NacosDelayTaskExecuteEngine, a class that maintains a thread pool which receives and executes tasks.

Tasks are executed in its processTasks() method:

protected void processTasks() {
    Collection<Object> keys = getAllTaskKeys();
    for (Object taskKey : keys) {
        AbstractDelayTask task = removeTask(taskKey);
        if (null == task) {
            continue;
        }
        NacosTaskProcessor processor = getProcessor(taskKey);
        if (null == processor) {
            getEngineLog().error("processor not found for task, so discarded. " + task);
            continue;
        }
        try {
            // Try to run the sync task; on failure it is retried
            if (!processor.process(task)) {
                retryFailedTask(taskKey, task);
            }
        } catch (Throwable e) {
            getEngineLog().error("Nacos task execute error : " + e.toString(), e);
            retryFailedTask(taskKey, task);
        }
    }
}

As you can see, Distro-based synchronization runs asynchronously, and a failed task is simply re-queued and retried. Strong consistency of the synchronized data is therefore not guaranteed: this is an AP-style consistency strategy.
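To see why a failed sync only delays convergence instead of blocking the caller, here is a minimal sketch of a delay-task engine in the spirit of NacosDelayTaskExecuteEngine. Everything in it (DelayTaskEngine, the Processor interface, string keys and payloads) is hypothetical: tasks are stored in a map keyed by task key, a single scheduled thread drains them, and a task whose processor returns false or throws is put back to be retried on a later round. As a simplification, a newer task for the same key just replaces the older one.

```java
import java.util.ArrayList;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical delay-task engine; loosely modeled on the behavior described above.
class DelayTaskEngine {
    /** Returns true on success, false if the task should be retried later. */
    interface Processor {
        boolean process(String key, String payload);
    }

    private final Map<String, String> tasks = new ConcurrentHashMap<>();
    private final Processor processor;
    private final ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();

    DelayTaskEngine(Processor processor, long intervalMillis) {
        this.processor = processor;
        executor.scheduleWithFixedDelay(this::processTasks, intervalMillis, intervalMillis,
                TimeUnit.MILLISECONDS);
    }

    // The caller returns immediately; a newer task for the same key replaces the pending one.
    void addTask(String key, String payload) {
        tasks.put(key, payload);
    }

    // Drains every pending task once; failures are re-queued instead of being propagated.
    synchronized void processTasks() {
        for (String key : new ArrayList<>(tasks.keySet())) { // snapshot of the keys
            String payload = tasks.remove(key);
            if (payload == null) {
                continue;
            }
            boolean ok;
            try {
                ok = processor.process(key, payload);
            } catch (RuntimeException e) {
                ok = false;
            }
            if (!ok) {
                tasks.putIfAbsent(key, payload); // retry later unless a newer task arrived meanwhile
            }
        }
    }

    int pendingCount() {
        return tasks.size();
    }

    void shutdown() {
        executor.shutdownNow();
    }
}
```

Because a failure only puts the task back in the map, other nodes may briefly hold stale data until a later round succeeds, which is the AP trade-off described above.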

2.3.5. Server flow chart

[Figure: server-side flow chart]

2.4. Summary

  • What is the registry structure of Nacos?

    • Answer: Nacos uses a multi-level storage model. The outermost layer isolates environments by namespace; below that come groups, and under each group are services. A service can be divided into clusters, and each cluster contains multiple instances. The registry is therefore a Map of type:

      Map<String, Map<String, Service>>

      The outer key is the namespace_id, and the inner key combines the group name and the service name (group@@serviceName).

      Each Service maintains an internal Map<String, Cluster>, whose key is the cluster name and whose value holds the cluster information.

      Each Cluster maintains internal Set collections of Instance objects (separate ones for ephemeral and persistent instances, as updateIps shows), representing the instances in the cluster.

  • How does Nacos make concurrent writes safe?

    • Answer: First, registering an instance locks the service. Different services use different locks, so writes to different services never conflict or affect each other, while writes to the same service are serialized by the lock. Moreover, the instance list is updated asynchronously by a thread pool that has only one thread.
  • How does Nacos avoid conflicts between concurrent reads and writes?

    • Answer: When updating the instance list, Nacos uses copy-on-write: it copies the old instance list, updates the copy, and then overwrites the old list with the updated copy.
  • How does Nacos handle concurrent write requests from hundreds of thousands of services inside Alibaba?

    • Answer: Nacos puts service registration tasks into an internal blocking queue and uses a thread pool to apply the instance updates asynchronously, which raises its concurrent write capacity.
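The registry layering from the first answer can be written down as types. The sketch below is a hypothetical, heavily simplified model (the real Service and Cluster classes carry far more state); it only shows how the namespace, group@@serviceName, cluster and instance levels nest. The names "public", "DEFAULT_GROUP" and "DEFAULT" used in the test are Nacos's default namespace, group and cluster.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, heavily simplified model of the registry layers; not the Nacos classes.
class Registry {
    static final class Instance {
        final String ip;
        final int port;

        Instance(String ip, int port) {
            this.ip = ip;
            this.port = port;
        }
    }

    // A cluster keeps separate instance sets for ephemeral and persistent instances.
    static final class Cluster {
        volatile Set<Instance> ephemeralInstances = Set.of();
        volatile Set<Instance> persistentInstances = Set.of();
    }

    // A service maps clusterName -> Cluster.
    static final class Service {
        final Map<String, Cluster> clusterMap = new ConcurrentHashMap<>();
    }

    // Outer key: namespaceId; inner key: groupName@@serviceName.
    final Map<String, Map<String, Service>> serviceMap = new ConcurrentHashMap<>();

    static String groupedName(String groupName, String serviceName) {
        return groupName + "@@" + serviceName;
    }

    Service getOrCreateService(String namespaceId, String groupName, String serviceName) {
        return serviceMap
                .computeIfAbsent(namespaceId, ns -> new ConcurrentHashMap<>())
                .computeIfAbsent(groupedName(groupName, serviceName), key -> new Service());
    }
}
```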

3. Service Heartbeat

Nacos instances are divided into temporary instances and permanent instances, which can be configured in the yaml file:

spring:
  application:
    name: order-service
  cloud:
    nacos:
      discovery:
        ephemeral: false # set the instance as permanent. true: temporary; false: permanent
      server-addr: 192.168.150.1:8845

Temporary instances do health checks based on heartbeat, while permanent instances are actively detected by Nacos.

The heartbeat API interface provided by Nacos is:

Interface description: send a heartbeat for an instance

Request type: PUT

Request path:

/nacos/v1/ns/instance/beat

Request parameters:

Name | Type | Required | Description
serviceName | string | yes | Service name
groupName | string | no | Group name
ephemeral | boolean | no | Whether the instance is temporary
beat | JSON string | yes | Heartbeat content of the instance

Error codes:

Error code | Description | Meaning
400 | Bad Request | Syntax error in the client request
403 | Forbidden | Permission denied
404 | Not Found | Resource not found
500 | Internal Server Error | Internal server error
200 | OK | Normal
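As an illustration of the API above, the hypothetical BeatRequests helper below builds (but does not send) a heartbeat request using only the JDK. It passes serviceName, groupName, ephemeral and the beat JSON as URL-encoded query parameters on a PUT to /nacos/v1/ns/instance/beat. The server address and the JSON fields in the test are made-up example values, and the real client (NamingProxy, shown later) adds more parameters such as namespaceId, ip and port.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper that builds (but does not send) a heartbeat request for the API above.
class BeatRequests {
    static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    static URI beatUri(String serverAddr, String serviceName, String groupName,
                       boolean ephemeral, String beatJson) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps parameter order stable
        params.put("serviceName", serviceName);
        params.put("groupName", groupName);
        params.put("ephemeral", String.valueOf(ephemeral));
        params.put("beat", beatJson); // the JSON is URL-encoded like any other value
        String query = params.entrySet().stream()
                .map(e -> encode(e.getKey()) + "=" + encode(e.getValue()))
                .collect(Collectors.joining("&"));
        return URI.create("http://" + serverAddr + "/nacos/v1/ns/instance/beat?" + query);
    }

    // The API table specifies PUT; all parameters travel in the query string here.
    static HttpRequest beatRequest(URI uri) {
        return HttpRequest.newBuilder(uri).PUT(HttpRequest.BodyPublishers.noBody()).build();
    }
}
```

Sending it would be a matter of passing the request to java.net.http.HttpClient; the response JSON carries fields such as clientBeatInterval, which the client code below reads.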

3.1. Client

In section 2.2.4 (Service Registration), we saw that the NacosNamingService class implements service registration; it also implements the service heartbeat:

@Override
public void registerInstance(String serviceName, String groupName, Instance instance) throws NacosException {
    NamingUtils.checkInstanceIsLegal(instance);
    String groupedServiceName = NamingUtils.getGroupedName(serviceName, groupName);
    // Check whether this is a temporary (ephemeral) instance
    if (instance.isEphemeral()) {
        // For a temporary instance, build the heartbeat info BeatInfo
        BeatInfo beatInfo = beatReactor.buildBeatInfo(groupedServiceName, instance);
        // Add the heartbeat task
        beatReactor.addBeatInfo(groupedServiceName, beatInfo);
    }
    serverProxy.registerService(groupedServiceName, groupName, instance);
}

3.1.1. BeatInfo

The BeatInfo here contains all the information the heartbeat needs:
[Figure: fields of the BeatInfo class]

3.1.2. BeatReactor

The BeatReactor class maintains a thread pool:
[Figure: the thread pool maintained by BeatReactor]

Calling BeatReactor's addBeatInfo(groupedServiceName, beatInfo) method starts the heartbeat:

public void addBeatInfo(String serviceName, BeatInfo beatInfo) {
    NAMING_LOGGER.info("[BEAT] adding beat: {} to beat map.", beatInfo);
    String key = buildKey(serviceName, beatInfo.getIp(), beatInfo.getPort());
    BeatInfo existBeat = null;
    //fix #1733
    if ((existBeat = dom2Beat.remove(key)) != null) {
        existBeat.setStopped(true);
    }
    dom2Beat.put(key, beatInfo);
    // Use the thread pool to run the heartbeat task after beatInfo.getPeriod() ms;
    // BeatTask then reschedules itself, so the heartbeat repeats periodically
    executorService.schedule(new BeatTask(beatInfo), beatInfo.getPeriod(), TimeUnit.MILLISECONDS);
    MetricsMonitor.getDom2BeatSizeMonitor().set(dom2Beat.size());
}

The default heartbeat period is defined in the com.alibaba.nacos.api.common.Constants class:
[Figure: the default heartbeat interval constant in Constants]
It is 5 seconds, so by default a heartbeat is sent every 5 seconds.

3.1.3. BeatTask

The heartbeat task is encapsulated in the BeatTask class, a Runnable whose run method is as follows:

@Override
public void run() {
    if (beatInfo.isStopped()) {
        return;
    }
    // Get the heartbeat period
    long nextTime = beatInfo.getPeriod();
    try {
        // Send the heartbeat
        JsonNode result = serverProxy.sendBeat(beatInfo, BeatReactor.this.lightBeatEnabled);
        long interval = result.get("clientBeatInterval").asLong();
        boolean lightBeatEnabled = false;
        if (result.has(CommonParams.LIGHT_BEAT_ENABLED)) {
            lightBeatEnabled = result.get(CommonParams.LIGHT_BEAT_ENABLED).asBoolean();
        }
        BeatReactor.this.lightBeatEnabled = lightBeatEnabled;
        if (interval > 0) {
            nextTime = interval;
        }
        // Check the heartbeat result
        int code = NamingResponseCode.OK;
        if (result.has(CommonParams.CODE)) {
            code = result.get(CommonParams.CODE).asInt();
        }
        if (code == NamingResponseCode.RESOURCE_NOT_FOUND) {
            // The server does not know this instance: re-register it
            Instance instance = new Instance();
            instance.setPort(beatInfo.getPort());
            instance.setIp(beatInfo.getIp());
            instance.setWeight(beatInfo.getWeight());
            instance.setMetadata(beatInfo.getMetadata());
            instance.setClusterName(beatInfo.getCluster());
            instance.setServiceName(beatInfo.getServiceName());
            instance.setInstanceId(instance.getInstanceId());
            instance.setEphemeral(true);
            try {
                serverProxy.registerService(beatInfo.getServiceName(),
                                            NamingUtils.getGroupName(beatInfo.getServiceName()), instance);
            } catch (Exception ignore) {
            }
        }
    } catch (NacosException ex) {
        NAMING_LOGGER.error("[CLIENT-BEAT] failed to send beat: {}, code: {}, msg: {}",
                            JacksonUtils.toJson(beatInfo), ex.getErrCode(), ex.getErrMsg());

    } catch (Exception unknownEx) {
        NAMING_LOGGER.error("[CLIENT-BEAT] failed to send beat: {}, unknown exception msg: {}",
                            JacksonUtils.toJson(beatInfo), unknownEx.getMessage(), unknownEx);
    } finally {
        // Always schedule the next heartbeat, whether or not this one succeeded
        executorService.schedule(new BeatTask(beatInfo), nextTime, TimeUnit.MILLISECONDS);
    }
}
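The essential pattern in BeatTask is a one-shot task that reschedules itself in finally, using the server-suggested interval when one is returned and the default period otherwise. The HeartbeatLoop class below is a hypothetical sketch of just that pattern: the LongSupplier stands in for serverProxy.sendBeat(...) plus reading clientBeatInterval, and unlike BeatTask it also skips rescheduling once stopped is set.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

// Hypothetical self-rescheduling heartbeat loop modeled on BeatTask; not Nacos code.
class HeartbeatLoop implements Runnable {
    private final ScheduledExecutorService executor;
    // Sends one beat and returns the server-suggested interval in ms (<= 0 means "not provided").
    private final LongSupplier sendBeat;
    private final long defaultPeriodMillis;
    volatile boolean stopped;
    volatile long lastScheduledDelay;

    HeartbeatLoop(ScheduledExecutorService executor, LongSupplier sendBeat, long defaultPeriodMillis) {
        this.executor = executor;
        this.sendBeat = sendBeat;
        this.defaultPeriodMillis = defaultPeriodMillis;
    }

    // Like addBeatInfo(): the first beat fires after one period.
    void start() {
        schedule(defaultPeriodMillis);
    }

    private void schedule(long delayMillis) {
        lastScheduledDelay = delayMillis;
        executor.schedule(this, delayMillis, TimeUnit.MILLISECONDS);
    }

    @Override
    public void run() {
        if (stopped) {
            return;
        }
        long next = defaultPeriodMillis;
        try {
            long interval = sendBeat.getAsLong();
            if (interval > 0) {
                next = interval; // the server may override the client's period
            }
        } catch (RuntimeException e) {
            // BeatTask logs the failure; either way the next beat is still scheduled
        } finally {
            if (!stopped) {
                schedule(next);
            }
        }
    }
}
```

Using schedule() repeatedly instead of scheduleAtFixedRate() is what lets each round pick a new delay from the server's answer.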

3.1.5. Send heartbeat

The heartbeat is ultimately sent by NamingProxy's sendBeat method:

public JsonNode sendBeat(BeatInfo beatInfo, boolean lightBeatEnabled) throws NacosException {

    if (NAMING_LOGGER.isDebugEnabled()) {
        NAMING_LOGGER.debug("[BEAT] {} sending beat to server: {}", namespaceId, beatInfo.toString());
    }
    // Build the request parameters
    Map<String, String> params = new HashMap<String, String>(8);
    Map<String, String> bodyMap = new HashMap<String, String>(2);
    // A light beat omits the full BeatInfo body
    if (!lightBeatEnabled) {
        bodyMap.put("beat", JacksonUtils.toJson(beatInfo));
    }
    params.put(CommonParams.NAMESPACE_ID, namespaceId);
    params.put(CommonParams.SERVICE_NAME, beatInfo.getServiceName());
    params.put(CommonParams.CLUSTER_NAME, beatInfo.getCluster());
    params.put("ip", beatInfo.getIp());
    params.put("port", String.valueOf(beatInfo.getPort()));
    // Send the request; the path is /v1/ns/instance/beat
    String result = reqApi(UtilAndComs.nacosUrlBase + "/instance/beat", params, bodyMap, HttpMethod.PUT);
    return JacksonUtils.toObj(result);
}

3.2. Server

For temporary instances, the server-side code has two parts:

  • 1) InstanceController provides an interface that handles the client's heartbeat requests
  • 2) A scheduled task checks whether each instance's heartbeat arrives on time

3.2.1. InstanceController

As with service registration, the InstanceController class in the nacos-naming module defines a method that handles heartbeat requests:

@CanDistro
@PutMapping("/beat")
@Secured(parser = NamingResourceParser.class, action = ActionTypes.WRITE)
public ObjectNode beat(HttpServletRequest request) throws Exception {
    // Parse the heartbeat request parameters
    ObjectNode result = JacksonUtils.createEmptyJsonNode();
    result.put(SwitchEntry.CLIENT_BEAT_INTERVAL, switchDomain.getClientBeatInterval());

    String beat = WebUtils.optional(request, "beat", StringUtils.EMPTY);
    RsInfo clientBeat = null;
    if (StringUtils.isNotBlank(beat)) {
        clientBeat = JacksonUtils.toObj(beat, RsInfo.class);
    }
    String clusterName = WebUtils
        .optional(request, CommonParams.CLUSTER_NAME, UtilsAndCommons.DEFAULT_CLUSTER_NAME);
    String ip = WebUtils.optional(request, "ip", StringUtils.EMPTY);
    int port = Integer.parseInt(WebUtils.optional(request, "port", "0"));
    if (clientBeat != null) {
        if (StringUtils.isNotBlank(clientBeat.getCluster())) {
            clusterName = clientBeat.getCluster();
        } else {
            // fix #2533
            clientBeat.setCluster(clusterName);
        }
        ip = clientBeat.getIp();
        port = clientBeat.getPort();
    }
    String namespaceId = WebUtils.optional(request, CommonParams.NAMESPACE_ID, Constants.DEFAULT_NAMESPACE_ID);
    String serviceName = WebUtils.required(request, CommonParams.SERVICE_NAME);
    NamingUtils.checkServiceNameFormat(serviceName);
    Loggers.SRV_LOG.debug("[CLIENT-BEAT] full arguments: beat: {}, serviceName: {}", clientBeat, serviceName);
    // Try to look up the instance in the Nacos registry
    // using namespaceId, serviceName, clusterName, ip and port from the request
    Instance instance = serviceManager.getInstance(namespaceId, serviceName, clusterName, ip, port);
    // If the lookup fails, the heartbeat failed: the instance is not registered yet
    if (instance == null) {
        if (clientBeat == null) {
            result.put(CommonParams.CODE, NamingResponseCode.RESOURCE_NOT_FOUND);
            return result;
        }

        Loggers.SRV_LOG.warn("[CLIENT-BEAT] The instance has been removed for health mechanism, "
                             + "perform data compensation operations, beat: {}, serviceName: {}", clientBeat, serviceName);
        // Re-register the instance here
        instance = new Instance();
        instance.setPort(clientBeat.getPort());
        instance.setIp(clientBeat.getIp());
        instance.setWeight(clientBeat.getWeight());
        instance.setMetadata(clientBeat.getMetadata());
        instance.setClusterName(clusterName);
        instance.setServiceName(serviceName);
        instance.setInstanceId(instance.getInstanceId());
        instance.setEphemeral(clientBeat.isEphemeral());

        serviceManager.registerInstance(namespaceId, serviceName, instance);
    }
    // Try to get the Service from the registry by namespaceId and serviceName
    Service service = serviceManager.getService(namespaceId, serviceName);
    // If it is missing, the service does not exist and an error is thrown
    if (service == null) {
        throw new NacosException(NacosException.SERVER_ERROR,
                                 "service not found: " + serviceName + "@" + namespaceId);
    }
    if (clientBeat == null) {
        clientBeat = new RsInfo();
        clientBeat.setIp(ip);
        clientBeat.setPort(port);
        clientBeat.setCluster(clusterName);
    }
    // The heartbeat is valid: start processing it
    service.processClientBeat(clientBeat);

    result.put(CommonParams.CODE, NamingResponseCode.OK);
    if (instance.containsMetadata(PreservedMetadataKeys.HEART_BEAT_INTERVAL)) {
        result.put(SwitchEntry.CLIENT_BEAT_INTERVAL, instance.getInstanceHeartBeatInterval());
    }
    result.put(SwitchEntry.LIGHT_BEAT_ENABLED, switchDomain.isLightBeatEnabled());
    return result;
}

Finally, after confirming that the service and instance targeted by the heartbeat request exist, the request is handed over to the Service class by calling its processClientBeat method.

3.2.2. Handling heartbeat requests

Here is how Service handles service.processClientBeat(clientBeat):

public void processClientBeat(final RsInfo rsInfo) {
    ClientBeatProcessor clientBeatProcessor = new ClientBeatProcessor();
    clientBeatProcessor.setService(this);
    clientBeatProcessor.setRsInfo(rsInfo);
    HealthCheckReactor.scheduleNow(clientBeatProcessor);
}

As you can see, the heartbeat information is wrapped in a ClientBeatProcessor and handed over to HealthCheckReactor. HealthCheckReactor is just a wrapper around a thread pool, so there is no need to look at it in detail.

The key business logic is in the class ClientBeatProcessor, which is a Runnable, and the run method is as follows:

@Override
public void run() {
    Service service = this.service;
    if (Loggers.EVT_LOG.isDebugEnabled()) {
        Loggers.EVT_LOG.debug("[CLIENT-BEAT] processing beat: {}", rsInfo.toString());
    }

    String ip = rsInfo.getIp();
    String clusterName = rsInfo.getCluster();
    int port = rsInfo.getPort();
    // Get the cluster
    Cluster cluster = service.getClusterMap().get(clusterName);
    // Get all (ephemeral) instances in the cluster
    List<Instance> instances = cluster.allIPs(true);

    for (Instance instance : instances) {
        // Find the instance that sent this heartbeat
        if (instance.getIp().equals(ip) && instance.getPort() == port) {
            if (Loggers.EVT_LOG.isDebugEnabled()) {
                Loggers.EVT_LOG.debug("[CLIENT-BEAT] refresh beat: {}", rsInfo.toString());
            }
            // Update the instance's last heartbeat time, lastBeat
            instance.setLastBeat(System.currentTimeMillis());
            if (!instance.isMarked()) {
                if (!instance.isHealthy()) {
                    instance.setHealthy(true);
                    Loggers.EVT_LOG
                        .info("service: {} {POS} {IP-ENABLED} valid: {}:{}@{}, region: {}, msg: client beat ok",
                              cluster.getService().getName(), ip, port, cluster.getName(),
                              UtilsAndCommons.LOCALHOST_SITE);
                    getPushService().serviceChanged(service);
                }
            }
        }
    }
}

At its core, processing a heartbeat request means updating the instance's last heartbeat time, lastBeat, which becomes the key indicator for deciding whether the instance's heartbeat has expired.

3.2.3. Abnormal heartbeat detection

Registering a service creates a Service object, and Service has an init method that is called during registration:

public void init() {
    // Start the heartbeat check task
    HealthCheckReactor.scheduleCheck(clientBeatCheckTask);
    for (Map.Entry<String, Cluster> entry : clusterMap.entrySet()) {
        entry.getValue().setService(this);
        entry.getValue().init();
    }
}

Here, HealthCheckReactor.scheduleCheck schedules the task that performs heartbeat detection:
[Figure: HealthCheckReactor.scheduleCheck scheduling the task with a 5000 ms delay]

As shown, the task runs every 5000 ms; that is, the heartbeat status of the instances is checked once every 5 seconds.

The ClientBeatCheckTask here is also a Runnable; its run method is:

@Override
public void run() {
    try {
        // Get the list of all temporary (ephemeral) instances
        List<Instance> instances = service.allIPs(true);

        // first set health status of instances:
        for (Instance instance : instances) {
            // Check whether the time since the last heartbeat (now - lastBeat)
            // exceeds the heartbeat timeout, 15 seconds by default
            if (System.currentTimeMillis() - instance.getLastBeat() > instance.getInstanceHeartBeatTimeOut()) {
                if (!instance.isMarked()) {
                    if (instance.isHealthy()) {
                        // On timeout, mark the instance as unhealthy (healthy = false)
                        instance.setHealthy(false);

                        // Publish instance status change events
                        getPushService().serviceChanged(service);
                        ApplicationUtils.publishEvent(new InstanceHeartbeatTimeoutEvent(this, instance));
                    }
                }
            }
        }

        if (!getGlobalConfig().isExpireInstance()) {
            return;
        }

        // then remove obsolete instances:
        for (Instance instance : instances) {

            if (instance.isMarked()) {
                continue;
            }
            // Check whether the time since the last heartbeat (now - lastBeat)
            // exceeds the instance deletion timeout, 30 seconds by default
            if (System.currentTimeMillis() - instance.getLastBeat() > instance.getIpDeleteTimeout()) {
                // More than 30 seconds without a heartbeat: delete the instance
                Loggers.SRV_LOG.info("[AUTO-DELETE-IP] service: {}, ip: {}", service.getName(),
                                     JacksonUtils.toJson(instance));
                deleteIp(instance);
            }
        }

    } catch (Exception e) {
        Loggers.SRV_LOG.warn("Exception while processing client beat time out.", e);
    }

}

The timeout values are also defined in the com.alibaba.nacos.api.common.Constants class:
[Figure: the heartbeat timeout (15 s) and instance deletion timeout (30 s) constants in Constants]
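The interplay between ClientBeatProcessor (which refreshes lastBeat) and ClientBeatCheckTask (which compares the silence against the two timeouts) can be modeled in a few lines. BeatHealth below is a hypothetical simplification, not Nacos code: it takes the current time as a parameter so the behavior is easy to test, ignores the marked flag, and assumes instance expiry is enabled (the isExpireInstance() switch above).

```java
// Hypothetical model of lastBeat bookkeeping and the two timeouts; not Nacos code.
class BeatHealth {
    static final long HEART_BEAT_TIMEOUT_MS = 15_000; // silence after which the instance is unhealthy
    static final long IP_DELETE_TIMEOUT_MS = 30_000;  // silence after which the instance is deleted

    enum Action { NONE, MARK_UNHEALTHY, DELETE }

    private long lastBeat;
    private boolean healthy = true;

    BeatHealth(long nowMillis) {
        lastBeat = nowMillis;
    }

    // What ClientBeatProcessor does: refresh lastBeat and restore health.
    void onBeat(long nowMillis) {
        lastBeat = nowMillis;
        healthy = true;
    }

    // What one run of the periodic check does for this instance.
    Action check(long nowMillis) {
        long silence = nowMillis - lastBeat;
        if (silence > IP_DELETE_TIMEOUT_MS) {
            return Action.DELETE;
        }
        if (silence > HEART_BEAT_TIMEOUT_MS && healthy) {
            healthy = false; // only the transition is reported, as in ClientBeatCheckTask
            return Action.MARK_UNHEALTHY;
        }
        return Action.NONE;
    }

    boolean isHealthy() {
        return healthy;
    }
}
```

With a 5-second beat, one or two lost heartbeats are tolerated; only three consecutive misses (more than 15 seconds of silence) make the instance unhealthy.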

3.2.4. Active health detection

For non-temporary instances (ephemeral=false), Nacos uses active health detection: it probes the instance periodically and judges the instance's health from the result.

The entry point is the registerInstance method of ServiceManager, covered in Section 2.3.2:
[Figure: ServiceManager.registerInstance, which first creates an empty service]

When creating an empty service:

public void createEmptyService(String namespaceId, String serviceName, boolean local) throws NacosException {
    // Create the service if it does not exist
    createServiceIfAbsent(namespaceId, serviceName, local, null);
}

The service creation process:

public void createServiceIfAbsent(String namespaceId, String serviceName, boolean local, Cluster cluster)
    throws NacosException {
    // Try to get the service
    Service service = getService(namespaceId, serviceName);
    if (service == null) {
        // The service does not exist: create a new one
        Loggers.SRV_LOG.info("creating empty service {}:{}", namespaceId, serviceName);
        service = new Service();
        service.setName(serviceName);
        service.setNamespaceId(namespaceId);
        service.setGroupName(NamingUtils.getGroupName(serviceName));
        // now validate the service. if failed, exception will be thrown
        service.setLastModifiedMillis(System.currentTimeMillis());
        service.recalculateChecksum();
        if (cluster != null) {
            cluster.setService(service);
            service.getClusterMap().put(cluster.getName(), cluster);
        }
        service.validate();
        // ** Write the service into the registry and initialize it **
        putServiceAndInit(service);
        if (!local) {
            addOrReplaceService(service);
        }
    }
}

The key part is the putServiceAndInit(service) method:

private void putServiceAndInit(Service service) throws NacosException {
    // Write the service into the registry
    putService(service);
    service = getService(service.getNamespaceId(), service.getName());
    // Initialize the service
    service.init();
    consistencyService
        .listen(KeyBuilder.buildInstanceListKey(service.getNamespaceId(), service.getName(), true), service);
    consistencyService
        .listen(KeyBuilder.buildInstanceListKey(service.getNamespaceId(), service.getName(), false), service);
    Loggers.SRV_LOG.info("[NEW-SERVICE] {}", service.toJson());
}

The initialization logic, service.init(), lives in the Service class:

/**
 * Init service.
 */
public void init() {
    // Start the heartbeat check task for temporary instances
    HealthCheckReactor.scheduleCheck(clientBeatCheckTask);
    // Iterate over the service's clusters
    for (Map.Entry<String, Cluster> entry : clusterMap.entrySet()) {
        entry.getValue().setService(this);
        // Initialize the cluster
        entry.getValue().init();
    }
}

The cluster initialization call, entry.getValue().init(), goes into the init() method of the Cluster class:

/**
 * Init cluster.
 */
public void init() {
    if (inited) {
        return;
    }
    // Create the health check task
    checkTask = new HealthCheckTask(this);
    // Start the scheduled health check for non-temporary instances
    HealthCheckReactor.scheduleCheck(checkTask);
    inited = true;
}

HealthCheckReactor.scheduleCheck(checkTask) starts a scheduled task that performs health checks on non-temporary instances. The check logic is defined in the HealthCheckTask class, a Runnable whose run method is:

public void run() {

    try {
        if (distroMapper.responsible(cluster.getService().getName()) && switchDomain
            .isHealthCheckEnabled(cluster.getService().getName())) {
            // Start the health check
            healthCheckProcessor.process(this);
            // logging omitted ...
        }
    } catch (Throwable e) {
        // logging omitted ...
    } finally {
        if (!cancelled) {
            // When done, schedule the task again to run after a delay
            HealthCheckReactor.scheduleCheck(this);

            // ...
        }
    }
}
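HealthCheckTask follows the same "reschedule in finally" pattern as the client's BeatTask, but on the server side. The hypothetical JitteredCheckTask below sketches that pattern with a randomized delay of 2000 ms plus a random offset below 5000 ms, matching the period given in the summary below. Two assumptions to note: this sketch draws a new random offset on every run, which may differ from how Nacos computes its delay, and the jitter is presumably there so that checks for many clusters do not all fire at the same moment.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

// Hypothetical self-rescheduling check task with jitter; loosely modeled on HealthCheckTask.
class JitteredCheckTask implements Runnable {
    static final long BASE_DELAY_MS = 2000;
    static final int MAX_JITTER_MS = 5000;

    private final ScheduledExecutorService executor;
    private final Runnable check;
    volatile boolean cancelled;
    volatile long lastDelay;

    JitteredCheckTask(ScheduledExecutorService executor, Runnable check) {
        this.executor = executor;
        this.check = check;
    }

    // Base delay plus a random offset in [0, MAX_JITTER_MS).
    static long nextDelay() {
        return BASE_DELAY_MS + ThreadLocalRandom.current().nextInt(MAX_JITTER_MS);
    }

    @Override
    public void run() {
        try {
            check.run();
        } catch (RuntimeException e) {
            // a failed check must not stop future checks
        } finally {
            if (!cancelled) {
                lastDelay = nextDelay();
                executor.schedule(this, lastDelay, TimeUnit.MILLISECONDS);
            }
        }
    }
}
```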

The health check logic is defined in healthCheckProcessor.process(this). HealthCheckProcessor is an interface with many implementations; the default one is TcpSuperSenseProcessor:
[Figure: implementations of the HealthCheckProcessor interface]

Here is the process method of TcpSuperSenseProcessor:

@Override
public void process(HealthCheckTask task) {
    // Get all non-temporary instances of the cluster
    List<Instance> ips = task.getCluster().allIPs(false);

    if (CollectionUtils.isEmpty(ips)) {
        return;
    }

    for (Instance ip : ips) {
        // Wrap the health check info in a Beat
        Beat beat = new Beat(ip, task);
        // Put it into a blocking queue
        taskQueue.add(beat);
        MetricsMonitor.getTcpHealthCheckMonitor().incrementAndGet();
    }
}

As you can see, health check tasks are not executed immediately; they are put into a blocking queue instead. This is asynchronous execution again, a design you will find in many places in Nacos.

TcpSuperSenseProcessor is itself a Runnable; its constructor submits it to a thread pool for execution. Its run method is as follows:

public void run() {
    while (true) {
        try {
            // Process tasks
            processTask();
            // ...
        } catch (Throwable e) {
            SRV_LOG.error("[HEALTH-CHECK] error while processing NIO task", e);
        }
    }
}

Health check tasks are processed in processTask:

private void processTask() throws Exception {
    // Wrap each queued task in a TaskProcessor and collect them
    Collection<Callable<Void>> tasks = new LinkedList<>();
    do {
        Beat beat = taskQueue.poll(CONNECT_TIMEOUT_MS / 2, TimeUnit.MILLISECONDS);
        if (beat == null) {
            return;
        }

        tasks.add(new TaskProcessor(beat));
    } while (taskQueue.size() > 0 && tasks.size() < NIO_THREAD_COUNT * 64);
    // Run the collected tasks as a batch
    for (Future<?> f : GlobalExecutor.invokeAllTcpSuperSenseTask(tasks)) {
        f.get();
    }
}
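The queue-then-batch design of process() and processTask() can be sketched independently of Nacos. In the hypothetical BatchingChecker below, submit() only enqueues (like process()), and processBatch() polls with a timeout, collects up to maxBatch items, and runs them through ExecutorService.invokeAll (like processTask()). One deliberate difference: when poll() times out after some items were collected, this sketch still runs those items instead of returning.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical producer/consumer sketch of the queue-then-batch pattern; not Nacos code.
class BatchingChecker<T> {
    private final BlockingQueue<T> queue = new LinkedBlockingQueue<>();
    private final int maxBatch;
    private final long pollTimeoutMillis;
    private final Consumer<T> check;

    BatchingChecker(int maxBatch, long pollTimeoutMillis, Consumer<T> check) {
        this.maxBatch = maxBatch;
        this.pollTimeoutMillis = pollTimeoutMillis;
        this.check = check;
    }

    // Producer side: only enqueue, never run the check on the caller's thread.
    void submit(T item) {
        queue.add(item);
    }

    // Consumer side: drain one batch and run it in parallel; returns the batch size.
    int processBatch(ExecutorService pool) throws InterruptedException, ExecutionException {
        List<Callable<Void>> tasks = new ArrayList<>();
        do {
            T item = queue.poll(pollTimeoutMillis, TimeUnit.MILLISECONDS);
            if (item == null) {
                break; // nothing arrived within the timeout
            }
            tasks.add(() -> {
                check.accept(item);
                return null;
            });
        } while (!queue.isEmpty() && tasks.size() < maxBatch);
        for (Future<Void> f : pool.invokeAll(tasks)) {
            f.get(); // surfaces any exception thrown by a check
        }
        return tasks.size();
    }
}
```

Batching bounds how many probes are in flight at once while letting a single consumer loop keep up with a large number of instances.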

Each task is wrapped in a TaskProcessor for execution. TaskProcessor is a Callable whose call method is:

@Override
public Void call() {
    // How long this check task has been waiting
    long waited = System.currentTimeMillis() - beat.getStartTime();
    if (waited > MAX_WAIT_TIME_MILLISECONDS) {
        Loggers.SRV_LOG.warn("beat task waited too long: " + waited + "ms");
    }

    SocketChannel channel = null;
    try {
        // Get the instance info
        Instance instance = beat.getIp();
        // Open a non-blocking TCP connection via NIO
        channel = SocketChannel.open();
        channel.configureBlocking(false);
        // only by setting this can we make the socket close event asynchronous
        channel.socket().setSoLinger(false, -1);
        channel.socket().setReuseAddress(true);
        channel.socket().setKeepAlive(true);
        channel.socket().setTcpNoDelay(true);

        Cluster cluster = beat.getTask().getCluster();
        int port = cluster.isUseIPPort4Check() ? instance.getPort() : cluster.getDefCkport();
        channel.connect(new InetSocketAddress(instance.getIp(), port));
        // Register for connect and read events
        SelectionKey key = channel.register(selector, SelectionKey.OP_CONNECT | SelectionKey.OP_READ);
        key.attach(beat);
        keyMap.put(beat.toString(), new BeatKey(key));

        beat.setStartTime(System.currentTimeMillis());

        GlobalExecutor
            .scheduleTcpSuperSenseTask(new TimeOutTask(key), CONNECT_TIMEOUT_MS, TimeUnit.MILLISECONDS);
    } catch (Exception e) {
        beat.finishCheck(false, false, switchDomain.getTcpHealthParams().getMax(),
                         "tcp:error:" + e.getMessage());

        if (channel != null) {
            try {
                channel.close();
            } catch (Exception ignore) {
            }
        }
    }

    return null;
}
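The probe itself is simply "can a TCP connection be opened within a timeout?". The hypothetical TcpProbe below is a single-shot, self-contained version of that idea using the same NIO calls (SocketChannel in non-blocking mode plus a Selector waiting for OP_CONNECT). Nacos instead shares one selector across all probes and uses a separate TimeOutTask, so treat this as an illustration of the mechanism, not a copy of TcpSuperSenseProcessor.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

// Hypothetical, single-shot version of the non-blocking TCP probe described above.
class TcpProbe {
    /** Returns true if a TCP connection to host:port completes within timeoutMillis. */
    static boolean isReachable(String host, int port, long timeoutMillis) {
        try (Selector selector = Selector.open(); SocketChannel channel = SocketChannel.open()) {
            channel.configureBlocking(false);
            // connect() may complete immediately (common on loopback) or remain pending
            if (channel.connect(new InetSocketAddress(host, port))) {
                return true;
            }
            channel.register(selector, SelectionKey.OP_CONNECT);
            if (selector.select(timeoutMillis) == 0) {
                return false; // timed out: the role TimeOutTask plays in Nacos
            }
            return channel.finishConnect(); // throws ConnectException if the port refused
        } catch (IOException e) {
            return false; // refused, unreachable, or any other I/O failure
        }
    }
}
```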

3.3. Summary

Nacos has two modes of health detection:

  • Temporary instances:
    • Use client heartbeats; the heartbeat period is 5 seconds
    • An instance with no heartbeat for more than 15 seconds is marked unhealthy
    • An instance with no heartbeat for more than 30 seconds is removed from the service list
  • Permanent instances:
    • Use active health detection by the server
    • The check period is 2000 ms plus a random offset of less than 5000 ms
    • An instance that fails the check is only marked unhealthy; it is not deleted

So why does Nacos have both temporary and permanent instances?

Take Taobao as an example. During the Double Eleven promotion, traffic is much higher than usual, so services must add instances to cope with the high concurrency. These instances are no longer needed after Double Eleven, so temporary instances suit them better. For a service's long-standing instances, permanent instances are more appropriate.

Compared with Eureka: for temporary instances, both Nacos and Eureka rely on heartbeats and differ little. The main difference is the heartbeat period, which is 30 seconds in Eureka and 5 seconds in Nacos.

In addition, Nacos supports permanent instances, while Eureka does not. Eureka only monitors health through heartbeats and has no active detection.

4. Service Discovery

Nacos provides an interface to query the instance list by service ID:

Interface description: query the list of instances under a service

Request type: GET

Request path:

/nacos/v1/ns/instance/list

Request parameters:

Name | Type | Required | Description
serviceName | string | yes | Service name
groupName | string | no | Group name
namespaceId | string | no | Namespace ID
clusters | string (separate multiple clusters with commas) | no | Cluster names
healthyOnly | boolean | no (default false) | Whether to return only healthy instances

Error codes:

Error code | Description | Meaning
400 | Bad Request | Syntax error in the client request
403 | Forbidden | Permission denied
404 | Not Found | Resource not found
500 | Internal Server Error | Internal server error
200 | OK | Normal

4.1. Client

4.1.1. Periodically updating the service list

4.1.1.1. NacosNamingService

In Section 2.2.4 we met the NacosNamingService class. Besides service registration, this class also provides service discovery.

All the overloaded methods eventually end up in this one:

@Override
public List<Instance> getAllInstances(String serviceName, String groupName, List<String> clusters,
                                      boolean subscribe) throws NacosException {
    ServiceInfo serviceInfo;
    // 1. Check whether to subscribe to the service information (true by default)
    if (subscribe) {
        // 1.1. Subscribe to the service information
        serviceInfo = hostReactor.getServiceInfo(NamingUtils.getGroupedName(serviceName, groupName),
                                                 StringUtils.join(clusters, ","));
    } else {
        // 1.2. Pull the service information directly from Nacos
        serviceInfo = hostReactor
            .getServiceInfoDirectlyFromServer(NamingUtils.getGroupedName(serviceName, groupName),
                                              StringUtils.join(clusters, ","));
    }
    // 2. Get the instance list from the service information and return it
    List<Instance> list;
    if (serviceInfo == null || CollectionUtils.isEmpty(list = serviceInfo.getHosts())) {
        return new ArrayList<Instance>();
    }
    return list;
}

4.1.1.2. HostReactor

Step 1.1 (subscribing to the service information) is implemented by the getServiceInfo() method of the HostReactor class:

public ServiceInfo getServiceInfo(final String serviceName, final String clusters) {
    NAMING_LOGGER.debug("failover-mode: " + failoverReactor.isFailoverSwitch());
    // Build the key from the service name and cluster names, joined with @@
    String key = ServiceInfo.getKey(serviceName, clusters);
    if (failoverReactor.isFailoverSwitch()) {
        return failoverReactor.getService(key);
    }
    // Read the local service list cache, a Map<String, ServiceInfo>
    ServiceInfo serviceObj = getServiceInfo0(serviceName, clusters);
    // Check whether a cache entry exists
    if (null == serviceObj) {
        // Not cached: create an empty ServiceInfo
        serviceObj = new ServiceInfo(serviceName, clusters);
        // Put it into the cache
        serviceInfoMap.put(serviceObj.getKey(), serviceObj);
        // Mark the service as being updated (updatingMap)
        updatingMap.put(serviceName, new Object());
        // Update the service list immediately
        updateServiceNow(serviceName, clusters);
        // Remove the being-updated mark
        updatingMap.remove(serviceName);
    } else if (updatingMap.containsKey(serviceName)) {
        // Cached, but an update is in progress
        if (UPDATE_HOLD_INTERVAL > 0) {
            // hold a moment waiting for update finish (up to UPDATE_HOLD_INTERVAL, i.e. 5 seconds)
            synchronized (serviceObj) {
                try {
                    serviceObj.wait(UPDATE_HOLD_INTERVAL);
                } catch (InterruptedException e) {
                    NAMING_LOGGER
                        .error("[getServiceInfo] serviceName:" + serviceName + ", clusters:" + clusters, e);
                }
            }
        }
    }
    // Start the scheduled update of the service list
    scheduleUpdateIfAbsent(serviceName, clusters);
    // Return the cached service information
    return serviceInfoMap.get(serviceObj.getKey());
}

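The basic logic is to read the local cache first and then decide based on the result:

  • If the local cache has no entry, fetch the list from Nacos immediately: updateServiceNow(serviceName, clusters)

  • If the local cache has an entry, return the cached result. In both cases, the scheduled update is then started if it is not already running:

    • scheduleUpdateIfAbsent(serviceName, clusters)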
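The cache-first logic can be modeled in a few lines of plain Java. This is a simplified sketch, not the HostReactor code. ServiceListCache, the fetcher function (which stands in for the remote query) and the fixed refresh delay are all hypothetical. The sketch also leaves out the failover and hold-while-updating steps.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Hypothetical, heavily simplified model of HostReactor.getServiceInfo(); not Nacos code.
final class ServiceListCache implements AutoCloseable {
    private final Map<String, List<String>> cache = new ConcurrentHashMap<>(); // like serviceInfoMap
    private final Map<String, Boolean> scheduled = new ConcurrentHashMap<>();  // services with a refresh task
    private final Function<String, List<String>> fetcher;                      // stands in for the remote query
    private final long refreshMillis;
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "service-list-refresher");
        t.setDaemon(true);
        return t;
    });

    ServiceListCache(Function<String, List<String>> fetcher, long refreshMillis) {
        this.fetcher = fetcher;
        this.refreshMillis = refreshMillis;
    }

    List<String> get(String serviceKey) {
        List<String> hosts = cache.get(serviceKey);
        if (hosts == null) {
            hosts = refresh(serviceKey); // cache miss: fetch synchronously (cf. updateServiceNow)
        }
        // In both cases make sure exactly one periodic refresh exists (cf. scheduleUpdateIfAbsent)
        if (scheduled.putIfAbsent(serviceKey, Boolean.TRUE) == null) {
            scheduler.scheduleWithFixedDelay(() -> {
                try {
                    refresh(serviceKey);
                } catch (RuntimeException e) {
                    // keep serving the last good list; an uncaught exception would cancel the task
                }
            }, refreshMillis, refreshMillis, TimeUnit.MILLISECONDS);
        }
        return hosts;
    }

    private List<String> refresh(String serviceKey) {
        List<String> fresh = Collections.unmodifiableList(new ArrayList<>(fetcher.apply(serviceKey)));
        cache.put(serviceKey, fresh);
        return fresh;
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

The try/catch inside the scheduled task is a deliberate choice. Without it, a single failed fetch would silently cancel all future refreshes.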

Inside UpdateTask, the updateService method is ultimately called as well.

Whether the service list is updated immediately or on a schedule, HostReactor's updateService() method is what finally runs:

public void updateService(String serviceName, String clusters) throws NacosException {
    ServiceInfo oldService = getServiceInfo0(serviceName, clusters);
    try {
        // Query the service list remotely through ServerProxy, also reporting the local UDP port
        String result = serverProxy.queryList(serviceName, clusters, pushReceiver.getUdpPort(), false);
        if (StringUtils.isNotEmpty(result)) {
            // Process the query result
            processServiceJson(result);
        }
    } finally {
        if (oldService != null) {
            // Wake up threads waiting in getServiceInfo()
            synchronized (oldService) {
                oldService.notifyAll();
            }
        }
    }
}
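Together, getServiceInfo() and updateService() form a hold-and-notify handshake. A reader that finds an update in progress waits on the ServiceInfo object for up to UPDATE_HOLD_INTERVAL, and updateService() calls notifyAll() in its finally block. Below is a minimal sketch of the same pattern using a hypothetical HeldServiceInfo class. Unlike the single wait() call in the original, it loops on a condition, so spurious wakeups are handled.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the hold-and-notify handshake between getServiceInfo() and updateService().
final class HeldServiceInfo {
    private final Object lock = new Object();
    private List<String> hosts = Collections.emptyList(); // guarded by lock
    private boolean updating;                             // guarded by lock (cf. updatingMap)

    void beginUpdate() {
        synchronized (lock) {
            updating = true;
        }
    }

    /** Updater side: publish the new list, then wake every waiting reader (cf. notifyAll in finally). */
    void finishUpdate(List<String> newHosts) {
        synchronized (lock) {
            hosts = newHosts;
            updating = false;
            lock.notifyAll();
        }
    }

    /** Reader side: wait at most holdMillis for an in-flight update, then return the cached list. */
    List<String> read(long holdMillis) {
        synchronized (lock) {
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(holdMillis);
            while (updating) {
                long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                if (remainingMs <= 0) {
                    break; // give up waiting and serve whatever is cached
                }
                try {
                    lock.wait(remainingMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // restore the flag and stop waiting
                    break;
                }
            }
            return hosts;
        }
    }
}
```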

4.1.1.3. ServerProxy

ServerProxy's queryList method looks like this:

public String queryList(String serviceName, String clusters, int udpPort, boolean healthyOnly)
    throws NacosException {
    // Prepare the request parameters
    final Map<String, String> params = new HashMap<String, String>(8);
    params.put(CommonParams.NAMESPACE_ID, namespaceId);
    params.put(CommonParams.SERVICE_NAME, serviceName);
    params.put("clusters", clusters);
    params.put("udpPort", String.valueOf(udpPort));
    params.put("clientIP", NetUtils.localIP());
    params.put("healthyOnly", String.valueOf(healthyOnly));
    // Send the request; the path is the same as the /nacos/v1/ns/instance/list API
    return reqApi(UtilAndComs.nacosUrlBase + "/instance/list", params, HttpMethod.GET);
}

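4.1.2. Handling service change notifications

Besides periodically updating the service list, Nacos also actively pushes changes to clients when a service list changes.

The HostReactor constructor contains several important steps related to this.

The basic idea is:

  • PushReceiver listens for change data pushed by the server
  • After parsing the data, a service change event is published through NotifyCenter
  • InstancesChangeNotifier listens for the change event and notifies the listeners subscribed to that service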
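The push path above is a publish/subscribe chain. As a hypothetical illustration, a tiny type-keyed event bus looks like the sketch below. MiniNotifyCenter and ServiceChanged are made-up names, not Nacos APIs. For simplicity, the sketch dispatches synchronously on the publishing thread, whereas Nacos's NotifyCenter hands events to dedicated publisher threads.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical event type, loosely modeled on InstancesChangeEvent.
final class ServiceChanged {
    final String serviceName;
    final List<String> hosts;

    ServiceChanged(String serviceName, List<String> hosts) {
        this.serviceName = serviceName;
        this.hosts = hosts;
    }
}

// Hypothetical, synchronous mini event bus; not the Nacos NotifyCenter.
final class MiniNotifyCenter {
    private final Map<Class<?>, List<Consumer<Object>>> subscribers = new ConcurrentHashMap<>();

    /** Registers a handler for one event type (cf. a notifier subscribing to InstancesChangeEvent). */
    <E> void subscribe(Class<E> type, Consumer<? super E> handler) {
        subscribers.computeIfAbsent(type, k -> new CopyOnWriteArrayList<>())
                   .add(event -> handler.accept(type.cast(event)));
    }

    /** Delivers the event to every handler registered for its exact class (cf. NotifyCenter.publishEvent). */
    void publish(Object event) {
        List<Consumer<Object>> handlers =
                subscribers.getOrDefault(event.getClass(), Collections.emptyList());
        for (Consumer<Object> handler : handlers) {
            handler.accept(event);
        }
    }
}
```

With this shape, the receiver only needs to know the bus and the event type, not who consumes the event.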

4.1.2.1. PushReceiver

Let's first look at PushReceiver. This class receives the service change data that the Nacos server pushes over UDP.

First, its constructor:

public PushReceiver(HostReactor hostReactor) {
    try {
        this.hostReactor = hostReactor;
        // Create the UDP socket (on a random port unless one is configured)
        String udpPort = getPushReceiverUdpPort();
        if (StringUtils.isEmpty(udpPort)) {
            this.udpSocket = new DatagramSocket();
        } else {
            this.udpSocket = new DatagramSocket(new InetSocketAddress(Integer.parseInt(udpPort)));
        }
        // Prepare a single-thread pool that uses a daemon thread
        this.executorService = new ScheduledThreadPoolExecutor(1, new ThreadFactory() {
            @Override
            public Thread newThread(Runnable r) {
                Thread thread = new Thread(r);
                thread.setDaemon(true);
                thread.setName("com.alibaba.nacos.naming.push.receiver");
                return thread;
            }
        });
        // Start the task that receives change data
        this.executorService.execute(this);
    } catch (Exception e) {
        NAMING_LOGGER.error("[NA] init udp socket failed", e);
    }
}

The constructor submits the receiver itself (this) to the thread pool. This works because PushReceiver also implements Runnable. Its run method contains the business logic:

@Override
public void run() {
    while (!closed) {
        try {
            // byte[] is initialized with 0 full filled by default
            byte[] buffer = new byte[UDP_MSS];
            DatagramPacket packet = new DatagramPacket(buffer, buffer.length);
            // Receive the pushed data
            udpSocket.receive(packet);
            // Decompress if needed and decode into a JSON string
            String json = new String(IoUtils.tryDecompress(packet.getData()), UTF_8).trim();
            NAMING_LOGGER.info("received push data: " + json + " from " + packet.getAddress().toString());
            // Deserialize into a PushPacket object
            PushPacket pushPacket = JacksonUtils.toObj(json, PushPacket.class);
            String ack;
            if ("dom".equals(pushPacket.type) || "service".equals(pushPacket.type)) {
                // Hand the data over to HostReactor
                hostReactor.processServiceJson(pushPacket.data);
                // send ack to server: building and sending the ACK receipt is omitted here
            }
        } catch (Exception e) {
            if (closed) {
                return;
            }
            NAMING_LOGGER.error("[NA] error while receiving push data", e);
        }
    }
}
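The following self-contained loopback sketch isolates the receive, decode and ACK cycle of run(). UdpPushDemo, the 64 KB buffer and the ACK text are assumptions made for illustration. Its tryDecompress only mimics the apparent purpose of IoUtils.tryDecompress here: it gunzips the payload when the payload starts with the GZIP magic bytes 0x1f 0x8b. It is not the actual Nacos code. It also copies exactly packet.getLength() bytes instead of relying on trim() to strip the zero-filled tail of the buffer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical loopback demo of PushReceiver's receive -> decode -> ACK cycle; not Nacos code.
final class UdpPushDemo {
    static final int MAX_DATAGRAM = 64 * 1024; // assumed buffer size, large enough for one UDP datagram

    /** Gunzips data that starts with the GZIP magic bytes 0x1f 0x8b; returns anything else unchanged. */
    static byte[] tryDecompress(byte[] data) throws IOException {
        if (data.length < 2 || data[0] != (byte) 0x1f || data[1] != (byte) 0x8b) {
            return data;
        }
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            for (int n; (n = in.read(buf)) != -1; ) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }

    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    /** Client side: receive one push, decode it, answer with an ACK, return the decoded JSON. */
    static String receiveOnce(DatagramSocket socket) throws IOException {
        byte[] buffer = new byte[MAX_DATAGRAM];
        DatagramPacket packet = new DatagramPacket(buffer, buffer.length);
        socket.receive(packet);
        // Use only the bytes actually received instead of trimming the zero-filled tail
        byte[] payload = Arrays.copyOf(packet.getData(), packet.getLength());
        String json = new String(tryDecompress(payload), StandardCharsets.UTF_8).trim();
        byte[] ack = "{\"type\":\"push-ack\"}".getBytes(StandardCharsets.UTF_8); // hypothetical ACK body
        socket.send(new DatagramPacket(ack, ack.length, packet.getSocketAddress()));
        return json;
    }

    /** Pushes gzipped JSON from a "server" socket to a "client" socket on loopback; returns {json, ack}. */
    static String[] roundTrip(String json) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (DatagramSocket client = new DatagramSocket(new InetSocketAddress(loopback, 0));
             DatagramSocket server = new DatagramSocket(new InetSocketAddress(loopback, 0))) {
            client.setSoTimeout(2_000);
            server.setSoTimeout(2_000);
            byte[] push = gzip(json);
            server.send(new DatagramPacket(push, push.length, client.getLocalSocketAddress()));
            String received = receiveOnce(client);
            DatagramPacket ack = new DatagramPacket(new byte[1024], 1024);
            server.receive(ack);
            String ackText = new String(ack.getData(), 0, ack.getLength(), StandardCharsets.UTF_8);
            return new String[] {received, ackText};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```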

4.1.2.2. HostReactor

Processing of the notification data is handed over to HostReactor's processServiceJson method:

public ServiceInfo processServiceJson(String json) {
    // Parse the ServiceInfo
    ServiceInfo serviceInfo = JacksonUtils.toObj(json, ServiceInfo.class);
    String serviceKey = serviceInfo.getKey();
    if (serviceKey == null) {
        return null;
    }
    // Look up the cached ServiceInfo
    ServiceInfo oldService = serviceInfoMap.get(serviceKey);

    // If a cached entry exists, work out which data needs to be updated
    boolean changed = false;
    if (oldService != null) {
        // Check whether the received data is older than the cached data
        if (oldService.getLastRefTime() > serviceInfo.getLastRefTime()) {
            NAMING_LOGGER.warn("out of date data received, old-t: " + oldService.getLastRefTime() + ", new-t: "
                               + serviceInfo.getLastRefTime());
        }
        // Put it into the cache
        serviceInfoMap.put(serviceInfo.getKey(), serviceInfo);

        // (omitted) compare the cached data with the new data to obtain newHosts (added instances),
        // remvHosts (instances to remove) and modHosts (instances to modify)
        if (newHosts.size() > 0 || remvHosts.size() > 0 || modHosts.size() > 0) {
            // Publish an instance change event
            NotifyCenter.publishEvent(new InstancesChangeEvent(
                serviceInfo.getName(), serviceInfo.getGroupName(),
                serviceInfo.getClusters(), serviceInfo.getHosts()));
            DiskCache.write(serviceInfo, cacheDir);
        }
    } else {
        // No local cache entry
        changed = true;
        // Put it into the cache
        serviceInfoMap.put(serviceInfo.getKey(), serviceInfo);
        // Publish the instance change event directly
        NotifyCenter.publishEvent(new InstancesChangeEvent(
            serviceInfo.getName(), serviceInfo.getGroupName(),
            serviceInfo.getClusters(), serviceInfo.getHosts()));
        serviceInfo.setJsonFromServer(json);
        DiskCache.write(serviceInfo, cacheDir);
    }
    // ... (omitted)
    return serviceInfo;
}
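The comparison that the excerpt omits is, in essence, a three-way diff between the cached instance list and the received one. The following is a hypothetical reconstruction of the idea, not the Nacos code. Instances are keyed by ip:port, and only weight and health status are compared, whereas the real Instance class carries many more fields.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, stripped-down instance type; Nacos's Instance has many more fields.
final class Host {
    final String ip;
    final int port;
    final double weight;
    final boolean healthy;

    Host(String ip, int port, double weight, boolean healthy) {
        this.ip = ip;
        this.port = port;
        this.weight = weight;
        this.healthy = healthy;
    }

    String key() {
        return ip + ":" + port;
    }

    boolean sameState(Host other) {
        return Double.compare(weight, other.weight) == 0 && healthy == other.healthy;
    }
}

// Hypothetical three-way diff of the cached list against the received one.
final class HostDiff {
    final List<Host> newHosts = new ArrayList<>();  // present only in the new data
    final List<Host> remvHosts = new ArrayList<>(); // present only in the cache
    final List<Host> modHosts = new ArrayList<>();  // present in both, but with a different state

    boolean changed() {
        return !newHosts.isEmpty() || !remvHosts.isEmpty() || !modHosts.isEmpty();
    }

    static HostDiff of(List<Host> cached, List<Host> received) {
        Map<String, Host> remaining = new HashMap<>();
        for (Host h : cached) {
            remaining.put(h.key(), h);
        }
        HostDiff diff = new HostDiff();
        for (Host h : received) {
            Host previous = remaining.remove(h.key());
            if (previous == null) {
                diff.newHosts.add(h);
            } else if (!previous.sameState(h)) {
                diff.modHosts.add(h);
            }
        }
        diff.remvHosts.addAll(remaining.values()); // whatever was not matched has disappeared
        return diff;
    }
}
```

Only when changed() is true would an InstancesChangeEvent be worth publishing, which mirrors the size checks in the excerpt.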

4.2. Server

4.2.1. Interface for pulling the service list

The InstanceController introduced in Section 2.3.1 provides an interface for pulling the service list:

/**
 * Get all instance of input service.
 *
 * @param request http request
 * @return list of instance
 * @throws Exception any error during list
 */
@GetMapping("/list")
@Secured(parser = NamingResourceParser.class, action = ActionTypes.READ)
public ObjectNode list(HttpServletRequest request) throws Exception {
    // Read namespaceId and serviceName from the request
    String namespaceId = WebUtils.optional(request, CommonParams.NAMESPACE_ID, Constants.DEFAULT_NAMESPACE_ID);
    String serviceName = WebUtils.required(request, CommonParams.SERVICE_NAME);
    NamingUtils.checkServiceNameFormat(serviceName);

    String agent = WebUtils.getUserAgent(request);
    String clusters = WebUtils.optional(request, "clusters", StringUtils.EMPTY);
    String clientIP = WebUtils.optional(request, "clientIP", StringUtils.EMPTY);
    // The client's UDP port
    int udpPort = Integer.parseInt(WebUtils.optional(request, "udpPort", "0"));
    String env = WebUtils.optional(request, "env", StringUtils.EMPTY);
    boolean isCheck = Boolean.parseBoolean(WebUtils.optional(request, "isCheck", "false"));

    String app = WebUtils.optional(request, "app", StringUtils.EMPTY);

    String tenant = WebUtils.optional(request, "tid", StringUtils.EMPTY);

    boolean healthyOnly = Boolean.parseBoolean(WebUtils.optional(request, "healthyOnly", "false"));

    // Get the service list
    return doSrvIpxt(namespaceId, serviceName, agent, clusters, clientIP, udpPort, env, isCheck, app, tenant,
                     healthyOnly);
}

It then calls the doSrvIpxt() method to get the service list:

public ObjectNode doSrvIpxt(String namespaceId, String serviceName, String agent,
                            String clusters, String clientIP,
                            int udpPort, String env, boolean isCheck,
                            String app, String tid, boolean healthyOnly) throws Exception {
    ClientInfo clientInfo = new ClientInfo(agent);
    ObjectNode result = JacksonUtils.createEmptyJsonNode();
    // Get the service information
    Service service = serviceManager.getService(namespaceId, serviceName);
    long cacheMillis = switchDomain.getDefaultCacheMillis();

    // now try to enable the push
    try {
        if (udpPort > 0 && pushService.canEnablePush(agent)) {
            // Register the current client's IP and UDP port with PushService
            pushService
                .addClient(namespaceId, serviceName, clusters, agent, new InetSocketAddress(clientIP, udpPort),
                           pushDataSource, tid, app);
            cacheMillis = switchDomain.getPushCacheMillis(serviceName);
        }
    } catch (Exception e) {
        Loggers.SRV_LOG
            .error("[NACOS-API] failed to added push client {}, {}:{}", clientInfo, clientIP, udpPort, e);
        cacheMillis = switchDomain.getDefaultCacheMillis();
    }

    if (service == null) {
        // Service not found: return an empty list
        if (Loggers.SRV_LOG.isDebugEnabled()) {
            Loggers.SRV_LOG.debug("no instance to serve for service: {}", serviceName);
        }
        result.put("name", serviceName);
        result.put("clusters", clusters);
        result.put("cacheMillis", cacheMillis);
        result.replace("hosts", JacksonUtils.createEmptyArrayNode());
        return result;
    }
    // (omitted) checking the results, removing abnormal instances, etc.
    // Finally assemble the result and return it ...

    result.replace("hosts", hosts);
    if (clientInfo.type == ClientInfo.ClientType.JAVA
        && clientInfo.version.compareTo(VersionUtil.parseVersion("1.0.0")) >= 0) {
        result.put("dom", serviceName);
    } else {
        result.put("dom", NamingUtils.getServiceName(serviceName));
    }
    result.put("name", serviceName);
    result.put("cacheMillis", cacheMillis);
    result.put("lastRefTime", System.currentTimeMillis());
    result.put("checksum", service.getChecksum());
    result.put("useSpecifiedURL", false);
    result.put("clusters", clusters);
    result.put("env", env);
    result.replace("metadata", JacksonUtils.transferToJsonNode(service.getMetadata()));
    return result;
}

4.2.2. Publishing UDP notifications of service changes

The doSrvIpxt() method of InstanceController from the previous section contains this line:

pushService.addClient(namespaceId, serviceName, clusters, agent,
                      new InetSocketAddress(clientIP, udpPort),
                      pushDataSource, tid, app);

This line wraps the consumer's IP, UDP port and other information into a PushClient object and stores it in PushService, so that messages can be pushed to the consumer when the service changes later.

The PushService class itself implements the ApplicationListener interface.

ApplicationListener is Spring's event listener interface. Here, PushService listens for ServiceChangeEvent (the service change event).

As a result, PushService is notified whenever a service list changes. It then pushes the new data over UDP to the PushClients registered for that service.
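Sections 4.2.1 and 4.2.2 combine as follows. The server remembers each subscriber's UDP address per service, and when a ServiceChangeEvent arrives, it pushes the new data to every one of those subscribers. The sketch below makes several assumptions. MiniPushService and its methods are hypothetical. The real PushService does more, such as handling ACKs, and none of that is modeled here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetSocketAddress;
import java.net.SocketException;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of the PushService idea; not Nacos code (no ACK tracking, no retries).
final class MiniPushService implements AutoCloseable {
    // service key -> ("ip:port" of a consumer -> that consumer's UDP address)
    private final Map<String, Map<String, InetSocketAddress>> clients = new ConcurrentHashMap<>();
    private final DatagramSocket socket;

    MiniPushService() {
        try {
            socket = new DatagramSocket(); // ephemeral local port used for sending pushes
        } catch (SocketException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Remembers a subscriber, as addClient(...) does when a list query carries a udpPort > 0. */
    void addClient(String serviceKey, InetSocketAddress udpAddress) {
        clients.computeIfAbsent(serviceKey, k -> new ConcurrentHashMap<>())
               .put(udpAddress.getHostString() + ":" + udpAddress.getPort(), udpAddress);
    }

    /** Plays the role of the ServiceChangeEvent listener: push the new data to every subscriber. */
    int onServiceChanged(String serviceKey, String json) {
        byte[] data = json.getBytes(StandardCharsets.UTF_8);
        int pushed = 0;
        for (InetSocketAddress target : clients.getOrDefault(serviceKey, Collections.emptyMap()).values()) {
            try {
                socket.send(new DatagramPacket(data, data.length, target));
                pushed++;
            } catch (IOException e) {
                // skip this subscriber; its periodic pull will still pick up the change
            }
        }
        return pushed;
    }

    @Override
    public void close() {
        socket.close();
    }
}
```

Because UDP gives no delivery guarantee, the client-side periodic pull from 4.1.1 is what ultimately keeps a consumer correct if a push is lost.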

4.3. Summary

Nacos service discovery works in two modes:

  • Mode 1: Pull mode. Consumers periodically pull the service list from Nacos and cache it. When calling a service, they read the list from the local cache first.
  • Mode 2: Subscription mode. Consumers subscribe to a service list in Nacos and receive change notifications over UDP. When the service list in Nacos changes, a UDP notification is pushed to each subscriber.

Compared with Eureka, the subscription mode lets Nacos propagate service status changes more promptly. Consumers therefore notice changes in the service list sooner and can stop calling faulty instances.


Origin blog.csdn.net/sinat_38316216/article/details/129862342