Spring Cloud Nacos source code explanation (9) - Nacos client local cache and failover

Nacos client local cache and failover

Faults will inevitably occur around the Nacos client's local cache, and they have to be handled. The core classes involved are ServiceInfoHolder and FailoverReactor.

Local caching has two aspects. First, the instance information obtained from the registry is cached in memory in a Map, which makes lookups convenient. Second, the same information is periodically written to disk files as a backup for emergencies.

Failover also has two aspects. First, the failover switch is a marker file on disk. Second, once failover is enabled, service instances can be obtained from the failover backup files when a failure occurs.

ServiceInfoHolder function overview

As the name suggests, the ServiceInfoHolder class is the holder of service information. Every time the client obtains new service information from the registry, it calls this class's processServiceInfo method to process it locally: updating the cached service, publishing events, updating the local files, and so on.

Besides these core functions, instantiating the class also initializes the local cache directory and failover. Let's analyze each below.

Local memory cache of ServiceInfo

ServiceInfo is the registered service information. It includes the service name, group name, cluster information, instance list, last update time, and so on. In other words, the information the client obtains from the registry is carried locally by ServiceInfo objects.

The ServiceInfoHolder class in turn holds the ServiceInfo objects, storing them in a ConcurrentMap:

// ServiceInfoHolder
private final ConcurrentMap<String, ServiceInfo> serviceInfoMap;

This is the first level of the Nacos client's cache of registration information obtained from the server. When we analyzed the processServiceInfo method in the previous lesson, we saw that the entry in serviceInfoMap is updated as soon as the service information changes:

public ServiceInfo processServiceInfo(ServiceInfo serviceInfo) {
    ....
    // Cache the service information
    serviceInfoMap.put(serviceInfo.getKey(), serviceInfo);
    // Check whether the registered instance information has changed
    boolean changed = isChangedServiceInfo(oldService, serviceInfo);
    if (StringUtils.isBlank(serviceInfo.getJsonFromServer())) {
        serviceInfo.setJsonFromServer(JacksonUtils.toJson(serviceInfo));
    }
    ....
    return serviceInfo;
}

serviceInfoMap is used in a simple way: when instances change, the latest data is put into the map, and when an instance is needed, a get on the key is enough.

serviceInfoMap is initialized in the constructor of ServiceInfoHolder. By default an empty ConcurrentMap is created, but if the client is configured to read the cache files at startup, the map is loaded from the local cache instead.
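
As a minimal sketch of this put-on-change, get-by-key pattern, the following stand-alone example uses two hypothetical classes, `SimpleServiceInfo` and `ServiceCacheSketch`, instead of the real Nacos types. The change check here only compares host lists, which is an assumption made for brevity and not the logic of isChangedServiceInfo.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for Nacos's ServiceInfo; not the real class.
class SimpleServiceInfo {
    final String key;
    final List<String> hosts;

    SimpleServiceInfo(String key, List<String> hosts) {
        this.key = key;
        this.hosts = hosts;
    }
}

// Hypothetical holder that mimics the first-level memory cache.
class ServiceCacheSketch {
    private final ConcurrentMap<String, SimpleServiceInfo> serviceInfoMap = new ConcurrentHashMap<>();

    // Store the latest snapshot and report whether the host list changed.
    boolean process(SimpleServiceInfo latest) {
        SimpleServiceInfo old = serviceInfoMap.put(latest.key, latest);
        return old == null || !old.hosts.equals(latest.hosts);
    }

    // Readers look the snapshot up by key; null means "not cached yet".
    SimpleServiceInfo get(String key) {
        return serviceInfoMap.get(key);
    }
}
```

Because the map is a ConcurrentHashMap, the thread that receives updates and the threads that read instances can share it without extra locking.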

public ServiceInfoHolder(String namespace, Properties properties) {
    initCacheDir(namespace, properties);
    // Whether to load information from the cache directory at startup; false by default.
    if (isLoadCacheAtStart(properties)) {
        this.serviceInfoMap = new ConcurrentHashMap<String, ServiceInfo>(DiskCache.read(this.cacheDir));
    } else {
        this.serviceInfoMap = new ConcurrentHashMap<String, ServiceInfo>(16);
    }
    this.failoverReactor = new FailoverReactor(this, cacheDir);
    this.pushEmptyProtection = isPushEmptyProtect(properties);
}

Note the local cache directory here. As we saw in the previous lesson, when the service instances change, processServiceInfo writes the ServiceInfo to this directory through the DiskCache#write method.

public ServiceInfo processServiceInfo(ServiceInfo serviceInfo) {
    .....
    // The service instances have changed
    if (changed) {
        NAMING_LOGGER.info("current ips:({}) service: {} -> {}", serviceInfo.ipCount(), serviceInfo.getKey(),
                           JacksonUtils.toJson(serviceInfo.getHosts()));
        // Publish an InstancesChangeEvent to notify subscribers
        NotifyCenter.publishEvent(new InstancesChangeEvent(serviceInfo.getName(), serviceInfo.getGroupName(),
                                                           serviceInfo.getClusters(), serviceInfo.getHosts()));
        // Write the service information to the local file
        DiskCache.write(serviceInfo, cacheDir);
    }
    return serviceInfo;
}

Local cache directory

The local cache directory cacheDir is a field of ServiceInfoHolder. It specifies the root directory of both the local cache and failover.

The cache directory is generated in the constructor of ServiceInfoHolder, through the initCacheDir method.

initCacheDir does not need a close reading; it simply builds the cache directory path. The default path is ${user.home}/nacos/naming/public, where the last segment is the namespace. The root can be customized through the JM.SNAPSHOT.PATH system property, for example System.setProperty("JM.SNAPSHOT.PATH", "<custom path>").

Once the directory is initialized here, the failover information is also stored under it.

private void initCacheDir(String namespace, Properties properties) {
    String jmSnapshotPath = System.getProperty(JM_SNAPSHOT_PATH_PROPERTY);

    String namingCacheRegistryDir = "";
    if (properties.getProperty(PropertyKeyConst.NAMING_CACHE_REGISTRY_DIR) != null) {
        namingCacheRegistryDir = File.separator + properties.getProperty(PropertyKeyConst.NAMING_CACHE_REGISTRY_DIR);
    }

    if (!StringUtils.isBlank(jmSnapshotPath)) {
        cacheDir = jmSnapshotPath + File.separator + FILE_PATH_NACOS + namingCacheRegistryDir
            + File.separator + FILE_PATH_NAMING + File.separator + namespace;
    } else {
        cacheDir = System.getProperty(USER_HOME_PROPERTY) + File.separator + FILE_PATH_NACOS + namingCacheRegistryDir
            + File.separator + FILE_PATH_NAMING + File.separator + namespace;
    }
}
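
To make the path concatenation easier to follow, here is a minimal sketch with a hypothetical `CacheDirSketch.resolve` helper. It takes the inputs as plain parameters instead of reading system properties, and it assumes the path segments are the literal strings "nacos" and "naming", as the default path above suggests.

```java
import java.io.File;

// Hypothetical helper mirroring the path concatenation in initCacheDir.
class CacheDirSketch {
    static String resolve(String snapshotPath, String userHome, String registryDir, String namespace) {
        // Prefer the custom snapshot root when it is set, else fall back to the user home.
        String root = (snapshotPath == null || snapshotPath.trim().isEmpty()) ? userHome : snapshotPath;
        // The optional registry dir is inserted between "nacos" and "naming".
        String extra = (registryDir == null) ? "" : File.separator + registryDir;
        return root + File.separator + "nacos" + extra
            + File.separator + "naming" + File.separator + namespace;
    }
}
```

For example, on a Unix-like system `CacheDirSketch.resolve(null, "/home/demo", null, "public")` yields `/home/demo/nacos/naming/public`.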

Failover

The constructor of ServiceInfoHolder also creates a FailoverReactor, which is stored as a member variable of ServiceInfoHolder. FailoverReactor is responsible for handling failover.

public ServiceInfoHolder(String namespace, Properties properties) {
    ....
    // "this" is the current ServiceInfoHolder; the two objects can be understood as holding references to each other
    this.failoverReactor = new FailoverReactor(this, cacheDir);
    .....
}

Let's look at the constructor of FailoverReactor, which shows essentially everything the class does:

1. Hold a reference to ServiceInfoHolder
2. Build the failover directory path: ${user.home}/nacos/naming/public/failover, where public may be another, custom namespace
3. Initialize executorService (the executor service)
4. init method: start several scheduled tasks through executorService

public FailoverReactor(ServiceInfoHolder serviceInfoHolder, String cacheDir) {
    // Hold a reference to ServiceInfoHolder
    this.serviceInfoHolder = serviceInfoHolder;
    // Build the failover directory path: ${user.home}/nacos/naming/public/failover
    this.failoverDir = cacheDir + FAILOVER_DIR;
    // Initialize executorService
    this.executorService = new ScheduledThreadPoolExecutor(1, new ThreadFactory() {
        @Override
        public Thread newThread(Runnable r) {
            Thread thread = new Thread(r);
            // Run as a daemon thread
            thread.setDaemon(true);
            thread.setName("com.alibaba.nacos.naming.failover");
            return thread;
        }
    });
    // Remaining initialization: start several scheduled tasks through executorService
    this.init();
}

init method execution

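This method schedules three tasks. Two of them are inner classes of FailoverReactor, and the third is an anonymous Runnable that delegates to one of them:

1. SwitchRefresher: starts immediately and repeats with a 5-second delay between runs
2. DiskFileWriter: starts after an initial delay of 30 minutes and repeats every 24 hours
3. A one-shot task: runs once, 10 seconds after initialization, and calls DiskFileWriter only if the failover directory is empty
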
public void init() {
    // Starts immediately, then repeats with a 5-second delay: SwitchRefresher
    executorService.scheduleWithFixedDelay(new SwitchRefresher(), 0L, 5000L, TimeUnit.MILLISECONDS);
    // Starts after 30 minutes, then repeats every 24 hours: DiskFileWriter
    executorService.scheduleWithFixedDelay(new DiskFileWriter(), 30, DAY_PERIOD_MINUTES, TimeUnit.MINUTES);

    // backup file on startup if failover directory is empty.
    // One-shot task: runs once after a 10-second delay and delegates to DiskFileWriter
    executorService.schedule(new Runnable() {
        @Override
        public void run() {
            try {
                File cacheDir = new File(failoverDir);

                if (!cacheDir.exists() && !cacheDir.mkdirs()) {
                    throw new IllegalStateException("failed to create cache dir: " + failoverDir);
                }

                File[] files = cacheDir.listFiles();
                if (files == null || files.length <= 0) {
                    new DiskFileWriter().run();
                }
            } catch (Throwable e) {
                NAMING_LOGGER.error("[NA] failed to backup file on startup.", e);
            }
        }
    }, 10000L, TimeUnit.MILLISECONDS);
}
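
The difference between the repeating tasks and the one-shot task can be illustrated with a small, self-contained sketch. `ScheduleSketch` and its `demo` method are hypothetical names, and the millisecond values are chosen only to keep the demo short; they are not the intervals Nacos uses.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ScheduleSketch {
    // Runs both kinds of task for runMillis and returns {periodicRuns, oneShotRuns}.
    static int[] demo(long delayBetweenRunsMillis, long oneShotDelayMillis, long runMillis) {
        AtomicInteger periodic = new AtomicInteger();
        AtomicInteger oneShot = new AtomicInteger();
        // Single daemon thread, similar in spirit to the executor in FailoverReactor.
        ScheduledExecutorService executor = Executors.newScheduledThreadPool(1, r -> {
            Thread thread = new Thread(r, "failover-sketch");
            thread.setDaemon(true);
            return thread;
        });
        // Repeating task: the next run starts a fixed delay after the previous run ends.
        executor.scheduleWithFixedDelay(() -> { periodic.incrementAndGet(); },
            0L, delayBetweenRunsMillis, TimeUnit.MILLISECONDS);
        // One-shot task: runs exactly once after the given delay.
        executor.schedule(() -> { oneShot.incrementAndGet(); },
            oneShotDelayMillis, TimeUnit.MILLISECONDS);
        try {
            Thread.sleep(runMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            executor.shutdownNow();
        }
        return new int[] {periodic.get(), oneShot.get()};
    }
}
```

Calling `ScheduleSketch.demo(20L, 50L, 400L)` should report several runs of the repeating task and a single run of the one-shot task.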

Let's look at DiskFileWriter first. The logic is simple: it takes the ServiceInfo objects cached in ServiceInfoHolder, skips a few reserved entries, and writes the rest to the failover directory. The second and third scheduled tasks both run DiskFileWriter, but the third one has a precondition: it writes the backup files only when the failover directory contains no files yet.

class DiskFileWriter extends TimerTask {

    @Override
    public void run() {
        Map<String, ServiceInfo> map = serviceInfoHolder.getServiceInfoMap();
        for (Map.Entry<String, ServiceInfo> entry : map.entrySet()) {
            ServiceInfo serviceInfo = entry.getValue();
            if (StringUtils.equals(serviceInfo.getKey(), UtilAndComs.ALL_IPS) || StringUtils
                .equals(serviceInfo.getName(), UtilAndComs.ENV_LIST_KEY) || StringUtils
                .equals(serviceInfo.getName(), UtilAndComs.ENV_CONFIGS) || StringUtils
                .equals(serviceInfo.getName(), UtilAndComs.VIP_CLIENT_FILE) || StringUtils
                .equals(serviceInfo.getName(), UtilAndComs.ALL_HOSTS)) {
                continue;
            }
            // Write the cached entry to disk
            DiskCache.write(serviceInfo, failoverDir);
        }
    }
}

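Next, let's look at the core implementation of the first scheduled task, SwitchRefresher. The logic is as follows:

1. If the failover switch file does not exist, failover mode is set to off and the task returns (the file acts as the switch)
2. Compare the file's modification time; if the file has been modified, read its content
3. The switch file stores a 0 or a 1: 0 means off, 1 means on
4. When the switch is on, run FailoverFileReader (its run method is called directly, on the same thread)
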
class SwitchRefresher implements Runnable {

    long lastModifiedMillis = 0L;

    @Override
    public void run() {
        try {
            File switchFile = new File(failoverDir + UtilAndComs.FAILOVER_SWITCH);
            // Return if the switch file does not exist
            if (!switchFile.exists()) {
                switchParams.put(FAILOVER_MODE_PARAM, Boolean.FALSE.toString());
                NAMING_LOGGER.debug("failover switch is not found, {}", switchFile.getName());
                return;
            }

            long modified = switchFile.lastModified();

            if (lastModifiedMillis < modified) {
                lastModifiedMillis = modified;
                // Read the content of the failover switch file
                String failover = ConcurrentDiskUtil.getFileContent(failoverDir + UtilAndComs.FAILOVER_SWITCH,
                                                                    Charset.defaultCharset().toString());
                if (!StringUtils.isEmpty(failover)) {
                    String[] lines = failover.split(DiskCache.getLineSeparator());

                    for (String line : lines) {
                        String line1 = line.trim();
                        // 1 turns failover mode on
                        if (IS_FAILOVER_MODE.equals(line1)) {
                            switchParams.put(FAILOVER_MODE_PARAM, Boolean.TRUE.toString());
                            NAMING_LOGGER.info("failover-mode is on");
                            new FailoverFileReader().run();
                        // 0 turns failover mode off
                        } else if (NO_FAILOVER_MODE.equals(line1)) {
                            switchParams.put(FAILOVER_MODE_PARAM, Boolean.FALSE.toString());
                            NAMING_LOGGER.info("failover-mode is off");
                        }
                    }
                } else {
                    switchParams.put(FAILOVER_MODE_PARAM, Boolean.FALSE.toString());
                }
            }

        } catch (Throwable e) {
            NAMING_LOGGER.error("[NA] failed to read failover switch.", e);
        }
    }
}
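
The file-as-switch idea can be reduced to a short sketch. `FailoverSwitchSketch` is a hypothetical, simplified counterpart of SwitchRefresher: it keeps a single boolean instead of the switchParams map, reads the file as UTF-8, and does not trigger any file reader. The `writeSwitch` helper exists only so the demo can create the file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical, simplified counterpart of SwitchRefresher; not the Nacos class.
class FailoverSwitchSketch {
    private final Path switchFile;
    private long lastModifiedMillis = 0L;
    private volatile boolean failoverOn = false;

    FailoverSwitchSketch(Path switchFile) {
        this.switchFile = switchFile;
    }

    // Demo helper: an operator would normally create or edit the file by hand.
    static void writeSwitch(Path file, String value) {
        try {
            Files.write(file, value.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // One polling round: a missing file means "off"; an unchanged file is skipped.
    void refresh() {
        if (!Files.exists(switchFile)) {
            failoverOn = false;
            return;
        }
        try {
            long modified = Files.getLastModifiedTime(switchFile).toMillis();
            if (lastModifiedMillis >= modified) {
                return;
            }
            lastModifiedMillis = modified;
            String content = new String(Files.readAllBytes(switchFile), StandardCharsets.UTF_8);
            for (String line : content.split("\\R")) {
                String flag = line.trim();
                if ("1".equals(flag)) {
                    failoverOn = true;
                } else if ("0".equals(flag)) {
                    failoverOn = false;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    boolean isFailoverOn() {
        return failoverOn;
    }
}
```

Comparing the modification time first means the file content is parsed only when someone has actually touched the switch, which keeps a frequent polling loop cheap.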

FailoverFileReader

As the name implies, FailoverFileReader reads the failover files. It reads the backup service information files stored in the failover directory, converts each one into a ServiceInfo, and stores all of them in the serviceMap field of FailoverReactor.

The process is as follows:

1. List all files in the failover directory and iterate over them
2. Skip entries that are not regular files
3. Skip the failover switch marker file
4. Read the backup content of the file and convert it into a ServiceInfo object
5. Put the ServiceInfo object into domMap if it contains any hosts
6. Finally, if domMap is not empty, assign it to serviceMap

class FailoverFileReader implements Runnable {

    @Override
    public void run() {
        Map<String, ServiceInfo> domMap = new HashMap<String, ServiceInfo>(16);

        BufferedReader reader = null;
        try {

            File cacheDir = new File(failoverDir);
            if (!cacheDir.exists() && !cacheDir.mkdirs()) {
                throw new IllegalStateException("failed to create cache dir: " + failoverDir);
            }

            File[] files = cacheDir.listFiles();
            if (files == null) {
                return;
            }

            for (File file : files) {
                if (!file.isFile()) {
                    continue;
                }
                // Skip the failover switch marker file
                if (file.getName().equals(UtilAndComs.FAILOVER_SWITCH)) {
                    continue;
                }

                ServiceInfo dom = new ServiceInfo(file.getName());

                try {
                    String dataString = ConcurrentDiskUtil
                        .getFileContent(file, Charset.defaultCharset().toString());
                    reader = new BufferedReader(new StringReader(dataString));

                    String json;
                    if ((json = reader.readLine()) != null) {
                        try {
                            dom = JacksonUtils.toObj(json, ServiceInfo.class);
                        } catch (Exception e) {
                            NAMING_LOGGER.error("[NA] error while parsing cached dom : {}", json, e);
                        }
                    }

                } catch (Exception e) {
                    NAMING_LOGGER.error("[NA] failed to read cache for dom: {}", file.getName(), e);
                } finally {
                    try {
                        if (reader != null) {
                            reader.close();
                        }
                    } catch (Exception e) {
                        //ignore
                    }
                }
                if (!CollectionUtils.isEmpty(dom.getHosts())) {
                    domMap.put(dom.getKey(), dom);
                }
            }
        } catch (Exception e) {
            NAMING_LOGGER.error("[NA] failed to read cache file", e);
        }

        // Publish the freshly read map as the failover cache
        if (domMap.size() > 0) {
            serviceMap = domMap;
        }
    }
}
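
The steps above can be sketched without any file I/O. In the hypothetical `FailoverReloadSketch` below, the directory listing is represented as a map from file name to first line, the switch file name "failover-switch" is made up for illustration, and raw strings stand in for parsed ServiceInfo objects. What the sketch does keep is the core pattern: build a fresh map, then replace the reference only if something was read.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, I/O-free illustration of the "build, then swap" pattern.
class FailoverReloadSketch {
    // Made-up name for the switch marker file; the real constant lives in UtilAndComs.
    static final String SWITCH_FILE = "failover-switch";

    private volatile Map<String, String> serviceMap = new HashMap<>();

    // files: file name -> first line of that file (stand-in for reading the directory).
    void reload(Map<String, String> files) {
        Map<String, String> fresh = new HashMap<>();
        for (Map.Entry<String, String> file : files.entrySet()) {
            // Skip the switch marker file.
            if (SWITCH_FILE.equals(file.getKey())) {
                continue;
            }
            // Skip backups without usable content.
            String firstLine = file.getValue();
            if (firstLine == null || firstLine.trim().isEmpty()) {
                continue;
            }
            fresh.put(file.getKey(), firstLine.trim());
        }
        // Swap only when something was read, so an empty pass keeps the old data.
        if (!fresh.isEmpty()) {
            serviceMap = fresh;
        }
    }

    String get(String key) {
        return serviceMap.get(key);
    }
}
```

Replacing the whole map reference, instead of mutating the live map, means readers see either the old snapshot or the new one, never a half-filled map.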

One question remains: where is serviceMap used? The answer is the getServiceInfo method we saw earlier when reading instances.

Once failover is enabled, the failoverReactor.getService method is called first. This method obtains the ServiceInfo from serviceMap:

public ServiceInfo getService(String key) {
    ServiceInfo serviceInfo = serviceMap.get(key);

    if (serviceInfo == null) {
        serviceInfo = new ServiceInfo();
        serviceInfo.setName(key);
    }

    return serviceInfo;
}

getService is called from the getServiceInfo method of ServiceInfoHolder:

// ServiceInfoHolder
public ServiceInfo getServiceInfo(final String serviceName, final String groupName, final String clusters) {
    NAMING_LOGGER.debug("failover-mode: {}", failoverReactor.isFailoverSwitch());
    String groupedServiceName = NamingUtils.getGroupedName(serviceName, groupName);
    String key = ServiceInfo.getKey(groupedServiceName, clusters);
    if (failoverReactor.isFailoverSwitch()) {
        return failoverReactor.getService(key);
    }
    return serviceInfoMap.get(key);
}
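
Putting the two lookups together, a minimal sketch of the read path might look like the following. `FailoverLookupSketch` is a hypothetical class, plain strings stand in for ServiceInfo, and the key format (`group@@service`, with `@@clusters` appended when clusters are given) is an assumption made for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical read path: the failover snapshot wins while the switch is on.
class FailoverLookupSketch {
    final Map<String, String> liveCache = new ConcurrentHashMap<>();
    final Map<String, String> failoverCache = new ConcurrentHashMap<>();
    volatile boolean failoverOn = false;

    // Assumed key format; the real one is built by NamingUtils and ServiceInfo.
    static String key(String serviceName, String groupName, String clusters) {
        String grouped = groupName + "@@" + serviceName;
        return (clusters == null || clusters.isEmpty()) ? grouped : grouped + "@@" + clusters;
    }

    String lookup(String serviceName, String groupName, String clusters) {
        String key = key(serviceName, groupName, clusters);
        if (failoverOn) {
            // Like getService, never return null here: fall back to an empty placeholder.
            return failoverCache.getOrDefault(key, "");
        }
        // In normal mode a missing entry is simply null.
        return liveCache.get(key);
    }
}
```

Note that while the switch is on, the live cache is bypassed entirely, so the failover backups should be reasonably fresh before the switch is flipped.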

Origin blog.csdn.net/qq_27566167/article/details/129155807