Nacos source code analysis - service registration (server)

Install Nacos source code

In the previous article we went through the client-side source of "Nacos Service Registration". In this article we look at how the Nacos server executes service registration. First, download the Nacos source code from https://github.com/alibaba/nacos/releases/tag/1.4.3, decompress it, and import it into IDEA.

After compiling, however, the code reports errors, mainly because some entity classes are missing.

Install protobuf

This is because the underlying data communication in Nacos uses protobuf for serialization (similar in purpose to JSON), a data serialization protocol provided by Google; the missing classes are the ones protoc generates.

Protocol Buffers is a lightweight, efficient format for structured data, well suited to serializing data for storage or for RPC exchange. It is language-neutral, platform-neutral, and extensible, and is used for communication protocols, data storage, and more.
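To make the serialization pattern concrete, here is a minimal, runnable sketch using protobuf-java's built-in StringValue message (a well-known type that ships with the library). Classes generated from consistency.proto and Data.proto are used the same way: build with a builder, write with toByteArray, read with parseFrom.

import com.google.protobuf.InvalidProtocolBufferException;
import com.google.protobuf.StringValue;

public class ProtobufDemo {
    
    public static void main(String[] args) throws InvalidProtocolBufferException {
        // Serialize: generated messages are immutable and built through a builder
        StringValue message = StringValue.newBuilder().setValue("DEFAULT_GROUP@@nacos-client").build();
        byte[] wire = message.toByteArray();   // compact binary wire format
        
        // Deserialize: parseFrom() restores the message from the bytes
        StringValue parsed = StringValue.parseFrom(wire);
        System.out.println(parsed.getValue()); // DEFAULT_GROUP@@nacos-client
    }
}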

So we need to install protobuf. First download it from https://github.com/protocolbuffers/protobuf/releases and pick the Windows build.

  • Unzip after downloading


  • Then you need to configure environment variables


  • Find the consistency module and enter src/main


  • Enter the main directory and run the following commands in cmd:
protoc --java_out=./java ./proto/consistency.proto
protoc --java_out=./java ./proto/Data.proto

After running the commands, the generated Java classes appear under the java directory and the compile errors are gone.

Start Nacos

Find the console module and start Nacos. The first start reports an error: Nacos defaults to cluster mode and complains that jdbc.properties cannot be found.

  • Then switch to standalone startup by specifying a VM parameter, as shown below

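In the IDEA run configuration of the startup class, the standard switch for standalone mode is the following VM option (this is the documented Nacos flag):

-Dnacos.standalone=true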

  • Startup succeeds


  • Visit http://localhost:8848/nacos/index.html to enter the console


  • At this point the nacos-server source is up and running. Next, start the nacos-client program and let it register with nacos-server; a minimal client configuration is sketched below

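Assuming, as in the previous article, that the client is a Spring Cloud application using spring-cloud-starter-alibaba-nacos-discovery, a minimal registration setup in application.properties looks roughly like this (the application name is illustrative):

spring.application.name=nacos-client
spring.cloud.nacos.discovery.server-addr=127.0.0.1:8848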

  • Check the console: nacos-client has registered with the server


Service registration

In the previous chapter, "Nacos Source Code Analysis - Service Registration (Client)", we established that nacos-client submits its registration with POST /nacos/v1/ns/instance. We can locate this endpoint in the nacos-server source: it is the InstanceController under the controllers package of the naming module. The source code is as follows:

@RestController
@RequestMapping(UtilsAndCommons.NACOS_NAMING_CONTEXT + "/instance")
public class InstanceController {
    
    @Autowired
    private SwitchDomain switchDomain;
    
    @Autowired
    private PushService pushService;
    
    @Autowired
    private ServiceManager serviceManager;
    
    ... omitted ...
    
    /**
     * Register new instance.
     *
     * @param request http request
     * @return 'ok' if success
     * @throws Exception any error during register
     */
    @CanDistro
    @PostMapping
    @Secured(parser = NamingResourceParser.class, action = ActionTypes.WRITE)
    // The request carries the registering service's port, namespaceId, groupName, serviceName, ip, cluster name, etc.
    public String register(HttpServletRequest request) throws Exception {
        
        // Get the namespaceId of the registering service; the default is public
        final String namespaceId = WebUtils
                .optional(request, CommonParams.NAMESPACE_ID, Constants.DEFAULT_NAMESPACE_ID);
        // Get the serviceName; the group name is prefixed, e.g. DEFAULT_GROUP@@nacos-client
        final String serviceName = WebUtils.required(request, CommonParams.SERVICE_NAME);
        // Check the service name format: groupName@@serviceName
        NamingUtils.checkServiceNameFormat(serviceName);
        // Parse the request parameters and wrap the registering service as an Instance (IP, port, service name, etc.)
        final Instance instance = parseInstance(request);
        // Register the service instance via ServiceManager
        serviceManager.registerInstance(namespaceId, serviceName, instance);
        return "ok";
    }
    
    // Parse the service instance to register
    private Instance parseInstance(HttpServletRequest request) throws Exception {
        
        // Get the service name, e.g. DEFAULT_GROUP@@nacos-client
        String serviceName = WebUtils.required(request, CommonParams.SERVICE_NAME);
        // Get the app; defaults to DEFAULT if not configured
        String app = WebUtils.optional(request, "app", "DEFAULT");
        // Get the IP, enabled flag, weight, health status, etc. and wrap them as an Instance
        Instance instance = getIpAddress(request);
        instance.setApp(app);
        instance.setServiceName(serviceName);
        // Generate a simple instance id first, e.g. 192.168.174.1#8080#DEFAULT#DEFAULT_GROUP@@nacos-client
        instance.setInstanceId(instance.generateInstanceId());
        // Set the last heartbeat time to now
        instance.setLastBeat(System.currentTimeMillis());
        String metadata = WebUtils.optional(request, "metadata", StringUtils.EMPTY);
        if (StringUtils.isNotEmpty(metadata)) {
            instance.setMetadata(UtilsAndCommons.parseMetadata(metadata));
        }
        // Validate the instance
        instance.validate();
        
        return instance;
    }

The register method reads the registration parameters (IP, enabled flag, weight, health status, and so on) from the request object, encapsulates them into an Instance object, and hands it to serviceManager.registerInstance to register. The following is the source of serviceManager.registerInstance:

Cache and initialize the service

@Component
public class ServiceManager implements RecordListener<Service> {
    
    /**
     * Map(namespace, Map(group::serviceName, Service)).
     */
    private final Map<String, Map<String, Service>> serviceMap = new ConcurrentHashMap<>();
    
    ... part of the code omitted ...
    
    // Register a service instance
    public void registerInstance(String namespaceId, String serviceName, Instance instance) throws NacosException {
        
        // 1. Try to get the service from serviceMap (the service registry); if absent, create a Service,
        // set its groupName, namespaceId and serviceName, and store it in ServiceManager's ConcurrentHashMap.
        // The registry has the structure Map<String, Map<String, Service>>
        createEmptyService(namespaceId, serviceName, instance.isEphemeral());
        // Fetch the service from the registry: first get Map<String, Service> by namespaceId,
        // then get the Service by serviceName
        Service service = getService(namespaceId, serviceName);
        // Invalid parameters: service not found
        if (service == null) {
            throw new NacosException(NacosException.INVALID_PARAM,
                    "service not found, namespace: " + namespaceId + ", service: " + serviceName);
        }
        // Add the instance to the registry
        addInstance(namespaceId, serviceName, instance.isEphemeral(), instance);
    }
    
    ... part of the code omitted ...
    
    // 2. Create the service if absent, and initialize it
    public void createServiceIfAbsent(String namespaceId, String serviceName, boolean local, Cluster cluster)
            throws NacosException {
        
        Service service = getService(namespaceId, serviceName);
        // If the service does not exist, create one
        if (service == null) {
            
            Loggers.SRV_LOG.info("creating empty service {}:{}", namespaceId, serviceName);
            service = new Service();
            service.setName(serviceName);
            service.setNamespaceId(namespaceId);
            service.setGroupName(NamingUtils.getGroupName(serviceName));
            // now validate the service. if failed, exception will be thrown
            service.setLastModifiedMillis(System.currentTimeMillis());
            service.recalculateChecksum();
            if (cluster != null) {
                cluster.setService(service);
                service.getClusterMap().put(cluster.getName(), cluster);
            }
            service.validate();
            // Save and initialize the service
            putServiceAndInit(service);
            if (!local) {
                addOrReplaceService(service);
            }
        }
    }
    
    // Save and initialize the service
    private void putServiceAndInit(Service service) throws NacosException {
        
        // Save the service
        putService(service);
        service = getService(service.getNamespaceId(), service.getName());
        // Initialize the service
        service.init();
        // consistencyService.listen registers listeners for data-consistency changes
        consistencyService
                .listen(KeyBuilder.buildInstanceListKey(service.getNamespaceId(), service.getName(), true), service);
        consistencyService
                .listen(KeyBuilder.buildInstanceListKey(service.getNamespaceId(), service.getName(), false), service);
        Loggers.SRV_LOG.info("[NEW-SERVICE] {}", service.toJson());
    }
    
    // Save the service into the registry
    public void putService(Service service) {
        
        if (!serviceMap.containsKey(service.getNamespaceId())) {
            synchronized (putServiceLock) {
                if (!serviceMap.containsKey(service.getNamespaceId())) {
                    serviceMap.put(service.getNamespaceId(), new ConcurrentSkipListMap<>());
                }
            }
        }
        // Store the registered service in the map
        serviceMap.get(service.getNamespaceId()).putIfAbsent(service.getName(), service);
    }

registerInstance does three things:

  • Cache the service to memory through the putService() method

  • service.init() establishes a heartbeat mechanism

  • consistencyService.listen implements data consistency monitoring

The registerInstance method first tries to get the Service from ServiceManager#serviceMap (the service registry); if it is absent, it creates a Service, sets its groupName, namespaceId, and serviceName, and stores it back into ServiceManager#serviceMap.

serviceMap is a ConcurrentHashMap with the structure Map<String, Map<String, Service>>. The first key is the namespaceId, e.g. public; the second key is the grouped service name, e.g. DEFAULT_GROUP@@nacos-client.


This map is the Nacos service registry; it stores all registered service instances.

Note: the relationship between service and instance is that a Service holds a Map<String, Cluster>, and each Cluster holds a Set<Instance>.

  • Service represents a service, such as a user service
  • Cluster represents a cluster of that service, for example two user-service nodes forming one cluster
  • A cluster contains multiple service instances, so Cluster keeps a Set<Instance> to hold them

A stripped-down sketch of this structure follows.
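As a mental model only (class and field names are abridged stand-ins, not the real Nacos types):

import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// namespaceId -> (groupName@@serviceName -> service)
class MiniRegistry {
    final Map<String, Map<String, MiniService>> serviceMap = new ConcurrentHashMap<>();
}

// A service owns its clusters
class MiniService {
    String namespaceId;                                              // e.g. "public"
    String name;                                                     // e.g. "DEFAULT_GROUP@@nacos-client"
    Map<String, MiniCluster> clusterMap = new ConcurrentHashMap<>(); // clusterName -> cluster
}

// A cluster owns its instances
class MiniCluster {
    String name;                                                     // e.g. "DEFAULT"
    Set<MiniInstance> ephemeralInstances = new HashSet<>();
}

class MiniInstance {
    String ip;
    int port;
    boolean healthy;
    long lastBeat;                                                   // last heartbeat timestamp
}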

In addition, the com.alibaba.nacos.naming.core.Service#init method will be called to initialize the service. The following is the source code of the init method

public void init() {
    
    // clientBeatCheckTask is a Runnable holding this service; its job is to
    // check and update the status of ephemeral instances, deleting them if they have expired
    HealthCheckReactor.scheduleCheck(clientBeatCheckTask);
    for (Map.Entry<String, Cluster> entry : clusterMap.entrySet()) {
        entry.getValue().setService(this);
        entry.getValue().init();
    }
}

// Scheduled task: check the health of the service every 5s
public static void scheduleCheck(ClientBeatCheckTask task) {
    futureMap.computeIfAbsent(task.taskKey(),
            k -> GlobalExecutor.scheduleNamingHealth(task, 5000, 5000, TimeUnit.MILLISECONDS));
}

service.init mainly wraps the service in a ClientBeatCheckTask, a Runnable, and schedules it as a recurring task that runs every 5s. The task checks and updates the status of the service's ephemeral instances and deletes those that have expired.

The following is the source of com.alibaba.nacos.naming.healthcheck.ClientBeatCheckTask#run:

public void run() {
    try {
        if (!getDistroMapper().responsible(service.getName())) {
            return;
        }
        
        if (!getSwitchDomain().isHealthCheckEnabled()) {
            return;
        }
        // Get all instances of the service
        List<Instance> instances = service.allIPs(true);
        
        // first set health status of instances:
        for (Instance instance : instances) {
            // current time - instance's last heartbeat > 15s (default) means the heartbeat timed out
            if (System.currentTimeMillis() - instance.getLastBeat() > instance.getInstanceHeartBeatTimeOut()) {
                if (!instance.isMarked()) {
                    if (instance.isHealthy()) {
                        // Mark the instance unhealthy
                        instance.setHealthy(false);
                        Loggers.EVT_LOG
                                .info("{POS} {IP-DISABLED} valid: {}:{}@{}@{}, region: {}, msg: client timeout after {}, last beat: {}",
                                        instance.getIp(), instance.getPort(), instance.getClusterName(),
                                        service.getName(), UtilsAndCommons.LOCALHOST_SITE,
                                        instance.getInstanceHeartBeatTimeOut(), instance.getLastBeat());
                        // Publish event: service changed
                        getPushService().serviceChanged(service);
                        // Publish event: instance heartbeat timeout
                        ApplicationUtils.publishEvent(new InstanceHeartbeatTimeoutEvent(this, instance));
                    }
                }
            }
        }
        
        if (!getGlobalConfig().isExpireInstance()) {
            return;
        }
        
        // then remove obsolete instances:
        for (Instance instance : instances) {
            
            if (instance.isMarked()) {
                continue;
            }
            
            if (System.currentTimeMillis() - instance.getLastBeat() > instance.getIpDeleteTimeout()) {
                // delete instance
                Loggers.SRV_LOG.info("[AUTO-DELETE-IP] service: {}, ip: {}", service.getName(),
                        JacksonUtils.toJson(instance));
                deleteIp(instance);
            }
        }
        
    } catch (Exception e) {
        Loggers.SRV_LOG.warn("Exception while processing client beat time out.", e);
    }
    
}

In the run method, all instances of the current service are fetched and looped over. If the current time minus the instance's last heartbeat exceeds the default 15s, the instance has timed out: its healthy flag is set to false and an instance-heartbeat-timeout event is published. A sketch of the two timeout thresholds follows.
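This is a minimal sketch only; 15s (mark unhealthy) and 30s (delete) are the Nacos 1.x defaults, and both can be overridden per instance through metadata:

public class BeatTimeouts {
    
    static final long HEART_BEAT_TIMEOUT_MS = 15_000; // default: mark the instance unhealthy
    static final long IP_DELETE_TIMEOUT_MS = 30_000;  // default: delete the instance
    
    static boolean timedOut(long lastBeat) {
        return System.currentTimeMillis() - lastBeat > HEART_BEAT_TIMEOUT_MS;
    }
    
    static boolean expired(long lastBeat) {
        return System.currentTimeMillis() - lastBeat > IP_DELETE_TIMEOUT_MS;
    }
}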

getPushService().serviceChanged(service) is interesting: it notifies nacos-clients over UDP that the service has changed, so each client can remove the offline instance from its local cache. This is where Nacos differs from Eureka: Eureka only uses pull, while Nacos uses pull + push. See the source in PushService#onApplicationEvent:

public void onApplicationEvent(ServiceChangeEvent event) {
    
    Service service = event.getService();
    String serviceName = service.getName();
    String namespaceId = service.getNamespaceId();
    // Schedule the push task with a 1s delay
    Future future = GlobalExecutor.scheduleUdpSender(() -> {
        try {
            // The service changed; add it to the push queue
            Loggers.PUSH.info(serviceName + " is changed, add it to push queue.");
            ConcurrentMap<String, PushClient> clients = clientMap
                    .get(UtilsAndCommons.assembleFullServiceName(namespaceId, serviceName));
            if (MapUtils.isEmpty(clients)) {
                return;
            }
            
            Map<String, Object> cache = new HashMap<>(16);
            long lastRefTime = System.nanoTime();
            for (PushClient client : clients.values()) {
                if (client.zombie()) {
                    Loggers.PUSH.debug("client is zombie: " + client.toString());
                    clients.remove(client.toString());
                    Loggers.PUSH.debug("client is zombie: " + client.toString());
                    continue;
                }
                
                Receiver.AckEntry ackEntry;
                Loggers.PUSH.debug("push serviceName: {} to client: {}", serviceName, client.toString());
                String key = getPushCacheKey(serviceName, client.getIp(), client.getAgent());
                byte[] compressData = null;
                Map<String, Object> data = null;
                if (switchDomain.getDefaultPushCacheMillis() >= 20000 && cache.containsKey(key)) {
                    org.javatuples.Pair pair = (org.javatuples.Pair) cache.get(key);
                    compressData = (byte[]) (pair.getValue0());
                    data = (Map<String, Object>) pair.getValue1();
                    
                    Loggers.PUSH.debug("[PUSH-CACHE] cache hit: {}:{}", serviceName, client.getAddrStr());
                }
                
                if (compressData != null) {
                    ackEntry = prepareAckEntry(client, compressData, data, lastRefTime);
                } else {
                    ackEntry = prepareAckEntry(client, prepareHostsData(client), lastRefTime);
                    if (ackEntry != null) {
                        cache.put(key, new org.javatuples.Pair<>(ackEntry.origin.getData(), ackEntry.data));
                    }
                }
                
                Loggers.PUSH.info("serviceName: {} changed, schedule push for: {}, agent: {}, key: {}",
                        client.getServiceName(), client.getAddrStr(), client.getAgent(),
                        (ackEntry == null ? null : ackEntry.key));
                // Push over UDP
                udpPush(ackEntry);
            }
        } catch (Exception e) {
            Loggers.PUSH.error("[NACOS-PUSH] failed to push serviceName: {} to client, error: {}", serviceName, e);
            
        } finally {
            futureMap.remove(UtilsAndCommons.assembleFullServiceName(namespaceId, serviceName));
        }
        
    }, 1000, TimeUnit.MILLISECONDS);
    
    futureMap.put(UtilsAndCommons.assembleFullServiceName(namespaceId, serviceName), future);
    
}

Add instance

At this point the service has been cached and initialized, and the flow returns to com.alibaba.nacos.naming.core.ServiceManager#registerInstance. The next step is the addInstance method:

// Add instance to service.
public void addInstance(String namespaceId, String serviceName, boolean ephemeral, Instance... ips)
        throws NacosException {
    
    // Build the key, e.g. com.alibaba.nacos.naming.iplist.ephemeral.public##DEFAULT_GROUP@@nacos-client
    String key = KeyBuilder.buildInstanceListKey(namespaceId, serviceName, ephemeral);
    // Get the service
    Service service = getService(namespaceId, serviceName);
    // Lock on the service to avoid concurrent modification
    synchronized (service) {
        // Get all instances of the service, including the new ones
        List<Instance> instanceList = addIpAddresses(service, ephemeral, ips);
        // Wrap the instance list in an Instances object
        Instances instances = new Instances();
        instances.setInstanceList(instanceList);
        // Call consistencyService.put() to sync the data across the Nacos cluster for consistency
        consistencyService.put(key, instances);
    }
}

In the addInstance method, the service's List<Instance> is obtained, wrapped in an Instances object, and consistencyService is called to synchronize it to the Nacos cluster.

A CopyOnWrite scheme is used here: addIpAddresses copies the old instance list and adds the new instances to the copy. Once the Nacos cluster has synchronized and instance status has been updated, the old list is simply overwritten by the new one. The old list is untouched during the update, so readers can keep using it.

Because updating the list never blocks readers and never exposes dirty data, performance is better. This is the CopyOnWrite scheme; a minimal sketch follows.
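The sketch below illustrates the pattern independent of the Nacos classes (all names are illustrative):

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class CopyOnWriteList {
    
    // Readers always see a complete, immutable snapshot
    private volatile List<String> instances = Collections.emptyList();
    
    // Writer: copy the old list, mutate the copy, then swap the reference
    synchronized void add(String instance) {
        List<String> copy = new ArrayList<>(instances);
        copy.add(instance);
        instances = copy; // readers are never blocked and never see a half-built list
    }
    
    List<String> snapshot() {
        return instances; // reads take no lock
    }
}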

ConsistencyService is the interface representing consistency within the Nacos cluster; it is what synchronizes service data. An abridged sketch of the interface is shown below.
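Roughly, as it appears in the 1.4.x naming module (javadoc and some members omitted):

public interface ConsistencyService {
    
    // Put data keyed by key into the consistency store
    void put(String key, Record value) throws NacosException;
    
    // Remove the data for the key
    void remove(String key) throws NacosException;
    
    // Get the data for the key
    Datum get(String key) throws NacosException;
    
    // Listen for changes of the data behind the key
    void listen(String key, RecordListener listener) throws NacosException;
    
    // Cancel listening
    void unListen(String key, RecordListener listener) throws NacosException;
    
    // Whether this consistency service is available
    boolean isAvailable();
}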

Let's take a look at the consistencyService.put method. The call is routed by key, and for ephemeral instances it lands in DistroConsistencyServiceImpl#put. The source is as follows:

@Override
public void put(String key, Record value) throws NacosException {
    // Choose ephemeralConsistencyService or persistentConsistencyService based on the key
    mapConsistencyService(key).put(key, value);
}

private ConsistencyService mapConsistencyService(String key) {
    // A key starting with the ephemeral prefix denotes an ephemeral instance.
    // Ephemeral instances use ephemeralConsistencyService, i.e. DistroConsistencyServiceImpl;
    // persistent instances use persistentConsistencyService, i.e. PersistentConsistencyServiceDelegateImpl
    return KeyBuilder.matchEphemeralKey(key) ? ephemeralConsistencyService : persistentConsistencyService;
}

// Initialization: submit the notifier to the thread pool
@PostConstruct
public void init() {
    GlobalExecutor.submitDistroNotifyTask(notifier);
}

@Override
public void put(String key, Record value) throws NacosException {
    // Save the instances into the local store
    onPut(key, value);
    // Sync to the cluster via the Distro protocol
    distroProtocol.sync(new DistroKey(key, KeyBuilder.INSTANCE_LIST_KEY_PREFIX), DataOperation.CHANGE,
            globalConfig.getTaskDispatchPeriod() / 2);
}

In the put method, the service key determines whether the ephemeral consistency service (ephemeralConsistencyService) or the persistent one (persistentConsistencyService) handles the call. The ephemeral put then does two things:

  • Call onPut: Save the instance to the local instance list.
  • Call distroProtocol.sync to synchronize the instance to the cluster

Update service list

Two things are done in the onPut method:

  • One is to encapsulate the instances into a Datum object and hand it to the dataStore for storage.
  • The other is to put the key into a blocking queue via notifier.addTask; a thread pool then drains the queue asynchronously.
public void onPut(String key, Record value) {
    
    // Check whether this is an ephemeral instance key
    if (KeyBuilder.matchEphemeralInstanceListKey(key)) {
        Datum<Instances> datum = new Datum<>();
        datum.value = (Instances) value;
        datum.key = key;
        datum.timestamp.incrementAndGet();
        // Store the datum in the dataStore, which maintains a map internally
        dataStore.put(key, datum);
    }
    
    if (!listeners.containsKey(key)) {
        return;
    }
    // Put the key into a blocking queue; a thread pool drains the queue asynchronously
    notifier.addTask(key, DataOperation.CHANGE);
}

public class Notifier implements Runnable {
    
    private ConcurrentHashMap<String, String> services = new ConcurrentHashMap<>(10 * 1024);
    
    // A blocking queue of pending tasks
    private BlockingQueue<Pair<String, DataOperation>> tasks = new ArrayBlockingQueue<>(1024 * 1024);
    
    public void addTask(String datumKey, DataOperation action) {
        
        if (services.containsKey(datumKey) && action == DataOperation.CHANGE) {
            return;
        }
        if (action == DataOperation.CHANGE) {
            // For a change, record the key in the map to de-duplicate pending changes
            services.put(datumKey, StringUtils.EMPTY);
        }
        // Add the task to the blocking queue
        tasks.offer(Pair.with(datumKey, action));
    }
    
    @Override
    public void run() {
        Loggers.DISTRO.info("distro notifier started");
        
        for (; ; ) {
            try {
                // Take a task from the blocking queue
                Pair<String, DataOperation> pair = tasks.take();
                // Handle the task: update the service list
                handle(pair);
            } catch (Throwable e) {
                Loggers.DISTRO.error("[NACOS-DISTRO] Error while handling notifying task", e);
            }
        }
    }
Notifier is a Runnable that maintains a tasks queue (an ArrayBlockingQueue) storing service-list change events. Its run method is an infinite loop that keeps taking tasks off the queue and passing them to the handle method. Below is DistroConsistencyServiceImpl.Notifier#handle:

private void handle(Pair<String, DataOperation> pair) {
    try {
        String datumKey = pair.getValue0();
        DataOperation action = pair.getValue1();
        
        services.remove(datumKey);
        
        int count = 0;
        
        ConcurrentLinkedQueue<RecordListener> recordListeners = listeners.get(datumKey);
        if (recordListeners == null) {
            Loggers.DISTRO.info("[DISTRO-WARN] RecordListener not found, key: {}", datumKey);
            return;
        }
        // Iterate over the listeners of the changed key; Service itself implements RecordListener
        for (RecordListener listener : recordListeners) {
            
            count++;
            
            try {
                // Handle a change event
                if (action == DataOperation.CHANGE) {
                    // Fetch the datum
                    Datum datum = dataStore.get(datumKey);
                    if (datum != null) {
                        // Fire the listener's change callback to update the service list
                        listener.onChange(datumKey, datum.value);
                    } else {
                        Loggers.DISTRO.info("[DISTRO-WARN] data not found, key: {}", datumKey);
                    }
                    continue;
                }
                // Handle a delete event
                if (action == DataOperation.DELETE) {
                    listener.onDelete(datumKey);
                    continue;
                }
            } catch (Throwable e) {
                Loggers.DISTRO.error("[NACOS-DISTRO] error while notifying listener of key: {}", datumKey, e);
            }
        }
        
        if (Loggers.DISTRO.isDebugEnabled()) {
            Loggers.DISTRO
                    .debug("[NACOS-DISTRO] datum change notified, key: {}, listener count: {}, action: {}",
                            datumKey, count, action.name());
        }
    } catch (Throwable e) {
        Loggers.DISTRO.error("[NACOS-DISTRO] Error while handling notifying task", e);
    }
}

The handle method looks up the RecordListeners registered for the changed key; Service itself implements RecordListener, and the action is either a change or a delete event. Triggering the onChange callback means calling com.alibaba.nacos.naming.core.Service#onChange:

public void onChange(String key, Instances value) throws Exception {
    
    Loggers.SRV_LOG.info("[NACOS-RAFT] datum is changed, key: {}, value: {}", key, value);
    // Iterate over all instances in the payload
    for (Instance instance : value.getInstanceList()) {
        
        if (instance == null) {
            // Reject this abnormal instance list:
            throw new RuntimeException("got null instance " + key);
        }
        
        if (instance.getWeight() > 10000.0D) {
            // Clamp the weight to the upper bound
            instance.setWeight(10000.0D);
        }
        
        if (instance.getWeight() < 0.01D && instance.getWeight() > 0.0D) {
            // Clamp the weight to the lower bound
            instance.setWeight(0.01D);
        }
    }
    // Update the instance lists
    updateIPs(value.getInstanceList(), KeyBuilder.matchEphemeralInstanceListKey(key));
    
    recalculateChecksum();
}

This method calls updateIPs to update the service's instances. The source code is as follows:

public void updateIPs(Collection<Instance> instances, boolean ephemeral) {
    
    // Prepare a HashMap: the key is the cluster name, the value is that cluster's instances
    Map<String, List<Instance>> ipMap = new HashMap<>(clusterMap.size());
    // Seed the map with the existing cluster names
    for (String clusterName : clusterMap.keySet()) {
        ipMap.put(clusterName, new ArrayList<>());
    }
    // Iterate over the instances to update
    for (Instance instance : instances) {
        try {
            if (instance == null) {
                Loggers.SRV_LOG.error("[NACOS-DOM] received malformed ip: null");
                continue;
            }
            // If the instance carries no clusterName, use the default cluster
            if (StringUtils.isEmpty(instance.getClusterName())) {
                instance.setClusterName(UtilsAndCommons.DEFAULT_CLUSTER_NAME);
            }
            // If the cluster does not exist yet, create it
            if (!clusterMap.containsKey(instance.getClusterName())) {
                Loggers.SRV_LOG
                    .warn("cluster: {} not found, ip: {}, will create new cluster with default configuration.",
                          instance.getClusterName(), instance.toJson());
                Cluster cluster = new Cluster(instance.getClusterName(), this);
                cluster.init();
                getClusterMap().put(instance.getClusterName(), cluster);
            }
            // Get the instance list of this cluster, creating it if absent
            List<Instance> clusterIPs = ipMap.get(instance.getClusterName());
            if (clusterIPs == null) {
                clusterIPs = new LinkedList<>();
                ipMap.put(instance.getClusterName(), clusterIPs);
            }
            // Add the instance to the cluster's list
            clusterIPs.add(instance);
        } catch (Exception e) {
            Loggers.SRV_LOG.error("[NACOS-DOM] failed to process ip: " + instance, e);
        }
    }
    
    for (Map.Entry<String, List<Instance>> entry : ipMap.entrySet()) {
        //make every ip mine
        List<Instance> entryIPs = entry.getValue();
        // This is where the registry is actually updated
        clusterMap.get(entry.getKey()).updateIps(entryIPs, ephemeral);
    }
    // Record the last modification time
    setLastModifiedMillis(System.currentTimeMillis());
    // Publish the service-changed notification
    getPushService().serviceChanged(this);
    StringBuilder stringBuilder = new StringBuilder();
    
    for (Instance instance : allIPs()) {
        stringBuilder.append(instance.toIpAddr()).append("_").append(instance.isHealthy()).append(",");
    }
    
    Loggers.EVT_LOG.info("[IP-UPDATED] namespace: {}, service: {}, ips: {}", getNamespaceId(), getName(),
                         stringBuilder.toString());
    
}

In the code above, clusterMap.get(entry.getKey()).updateIps(entryIPs, ephemeral) is what updates the registry: Service#clusterMap is a Map<String, Cluster>, and each Cluster holds the service instances. Cluster#updateIps then updates those instances. The source code is as follows:

public void updateIps(List<Instance> ips, boolean ephemeral) {
    
    Set<Instance> toUpdateInstances = ephemeral ? ephemeralInstances : persistentInstances;
    // Snapshot the old instance list
    HashMap<String, Instance> oldIpMap = new HashMap<>(toUpdateInstances.size());
    
    for (Instance ip : toUpdateInstances) {
        oldIpMap.put(ip.getDatumKey(), ip);
    }
    ... part of the code omitted ...
    
    // Check the status of newly added instances
    List<Instance> newIPs = subtract(ips, oldIpMap.values());
    
    if (newIPs.size() > 0) {
        Loggers.EVT_LOG
            .info("{} {SYNC} {IP-NEW} cluster: {}, new ips size: {}, content: {}", getService().getName(),
                  getName(), newIPs.size(), newIPs.toString());
        
        for (Instance ip : newIPs) {
            // Reset the health-check status of the new instance
            HealthCheckStatus.reset(ip);
        }
    }
    // Instances in the old list but not in the new one are to be removed
    List<Instance> deadIPs = subtract(oldIpMap.values(), ips);
    
    if (deadIPs.size() > 0) {
        Loggers.EVT_LOG
            .info("{} {SYNC} {IP-DEAD} cluster: {}, dead ips size: {}, content: {}", getService().getName(),
                  getName(), deadIPs.size(), deadIPs.toString());
        
        for (Instance ip : deadIPs) {
            // Remove its health-check status
            HealthCheckStatus.remv(ip);
        }
    }
    
    toUpdateInstances = new HashSet<>(ips);
    
    if (ephemeral) {
        // Directly overwrite the old instance list with the new one
        ephemeralInstances = toUpdateInstances;
    } else {
        persistentInstances = toUpdateInstances;
    }
}

Synchronize services to the cluster

Next, go back to the DistroConsistencyServiceImpl#put method. As noted above, it does two things:

  • onPut(key, value): update the service list
  • distroProtocol.sync: synchronize the service to the cluster

Let's now look at how the sync method works. The following is its source code:

/**
 * Start to sync data to all remote servers.
 *
 * @param distroKey distro key of sync data
 * @param action    the action of data operation
 */
public void sync(DistroKey distroKey, DataOperation action, long delay) {
    
    // Get every member of the Nacos cluster except the current node
    for (Member each : memberManager.allMembersWithoutSelf()) {
        // Build a key targeting that member
        DistroKey distroKeyWithTarget = new DistroKey(distroKey.getResourceKey(), distroKey.getResourceType(),
                each.getAddress());
        // Build a delayed task object
        DistroDelayTask distroDelayTask = new DistroDelayTask(distroKeyWithTarget, action, delay);
        // Hand the task to the NacosDelayTaskExecuteEngine, which maintains a ScheduledExecutorService
        distroTaskEngineHolder.getDelayTaskExecuteEngine().addTask(distroKeyWithTarget, distroDelayTask);
        if (Loggers.DISTRO.isDebugEnabled()) {
            Loggers.DISTRO.debug("[DISTRO-SCHEDULE] {} to {}", distroKey, each.getAddress());
        }
    }
}

This method iterates over all Nacos cluster members except the current node, builds a DistroKey per target, wraps it in a DistroDelayTask, and submits the task to the delay-task engine for synchronization.

The engine behind this is the delay-task execution engine NacosDelayTaskExecuteEngine. Tasks are executed through its processTasks method, com.alibaba.nacos.common.task.engine.NacosDelayTaskExecuteEngine#processTasks:

protected void processTasks() {
    // Get all pending task keys
    Collection<Object> keys = getAllTaskKeys();
    for (Object taskKey : keys) {
        AbstractDelayTask task = removeTask(taskKey);
        if (null == task) {
            continue;
        }
        // Find the processor for the task
        NacosTaskProcessor processor = getProcessor(taskKey);
        if (null == processor) {
            getEngineLog().error("processor not found for task, so discarded. " + task);
            continue;
        }
        try {
            // ReAdd task if process failed
            // Execute the task; a failed task is re-queued for retry
            if (!processor.process(task)) {
                retryFailedTask(taskKey, task);
            }
        } catch (Throwable e) {
            getEngineLog().error("Nacos task execute error : " + e.toString(), e);
            // Retry the failed task
            retryFailedTask(taskKey, task);
        }
    }
}

Summary

The article is a bit long, so let's summarize. At a high level, the process breaks down into the following steps:

  1. InstanceController: after receiving the registration request, the nacos server parses it into an Instance and calls serviceManager#registerInstance to register it.
  2. serviceManager#registerInstance first creates the Service object if needed and caches it in the service registry, a Map<String, Map<String, Service>>. It then initializes the service: a scheduled task checks every 5s whether its instances are healthy, and expired instances are deleted.
  3. serviceManager#registerInstance then calls the addInstance method to add the instance, which triggers the service-list update and synchronizes the service to the other Nacos cluster nodes.

That's the end of the article. If it helped you, please leave a good review; your encouragement is my biggest motivation.


Origin: blog.csdn.net/u014494148/article/details/127991145