ES 5.6.4 Source Code Analysis: How Multiple Instances Work

Background

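This article grew out of troubleshooting a problem in a production environment.
First, note that ES stores its data under a nodes directory by default. If the data directory specified in the ES configuration file is /tmp/elasticsearch, ES stores its data in the following directory: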

/tmp/elasticsearch/nodes

Normally the nodes directory contains a single directory, 0, that is:

/tmp/elasticsearch/nodes/0

All of the ES data is stored under the 0 directory.

In our production environment, however, ES often ended up creating a directory 1 under nodes as well:

/tmp/elasticsearch/nodes/0
/tmp/elasticsearch/nodes/1

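Each directory here represents one ES instance. /tmp/elasticsearch/nodes/0 is where ES instance 0 stores its data, and /tmp/elasticsearch/nodes/1 is where ES instance 1 stores its data.
The odd part is that in production only one ES instance is started per node, so why were two created? It turned out that ES there is managed by a watchdog, and the watchdog sometimes launches ES again while an instance already exists, which ends up creating more than one instance. This situation can easily lead to loss of ES data.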
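
A quick way to spot this situation is to list the instance directories under nodes. The following is a minimal sketch, assuming a single data path; the class name NodeDirCheck and its method are hypothetical helpers for illustration and are not part of ES.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class NodeDirCheck {

    /** Returns the names of the instance directories under <dataPath>/nodes, sorted. */
    public static List<String> listNodeDirs(Path dataPath) throws IOException {
        Path nodes = dataPath.resolve("nodes");
        if (!Files.isDirectory(nodes)) {
            return Collections.emptyList(); // no nodes directory yet
        }
        try (Stream<Path> children = Files.list(nodes)) {
            return children
                    .filter(Files::isDirectory)
                    .map(p -> p.getFileName().toString())
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // The default path is the example path used in this article.
        Path dataPath = Paths.get(args.length > 0 ? args[0] : "/tmp/elasticsearch");
        List<String> dirs = listNodeDirs(dataPath);
        System.out.println("instance directories: " + dirs);
        if (dirs.size() > 1) {
            System.out.println("warning: more than one ES instance has used this data path");
        }
    }
}
```

More than one entry in the result means that, at some point, a second instance was started on the same data path.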

Solution

So how do we prevent multiple instances from being created?

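The maximum number of ES instances that may share a data path is set by the node.max_local_storage_nodes parameter in elasticsearch.yml. If it is set to 1 and one ES instance is already running, any attempt by the watchdog to bring up another ES instance will fail, which guarantees that a node has at most one ES instance. In ES 2.x this setting defaults to 50; our production ES is 2.x and did not configure it, which is why multiple instances were created. It is worth noting that ES 5.x recognized the danger of this default and changed it to 1; see https://github.com/elastic/elasticsearch/pull/19964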
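
For an ES 2.x deployment, the fix described above comes down to one line in elasticsearch.yml. This is only a fragment; the rest of the file is assumed to stay unchanged.

```yaml
# elasticsearch.yml
# Allow at most one ES instance to use this node's data path
node.max_local_storage_nodes: 1
```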

Source Code Analysis

Next, let's dig into the key source code behind ES multi-instance support.
This code is in the constructor of the NodeEnvironment class.

ES2.3

// Read node.max_local_storage_nodes; fall back to the default of 50 if it is not set
int maxLocalStorageNodes = settings.getAsInt("node.max_local_storage_nodes", 50);

for (int possibleLockId = 0; possibleLockId < maxLocalStorageNodes; possibleLockId++) {
    for (int dirIndex = 0; dirIndex < environment.dataWithClusterFiles().length; dirIndex++) {
        // Create the directory where this instance stores its data, e.g. /tmp/elasticsearch/nodes/0
        Path dir = environment.dataWithClusterFiles()[dirIndex].resolve(NODES_FOLDER).resolve(Integer.toString(possibleLockId));
        Files.createDirectories(dir);

        try (Directory luceneDir = FSDirectory.open(dir, NativeFSLockFactory.INSTANCE)) {
            logger.trace("obtaining node lock on {} ...", dir.toAbsolutePath());
            try {
                // Try to take the file lock on this instance directory. If another instance
                // already holds it, this fails and the outer loop moves on to the next ID
                locks[dirIndex] = luceneDir.obtainLock(NODE_LOCK_FILENAME);
                nodePaths[dirIndex] = new NodePath(dir, environment);
                localNodeId = possibleLockId;
            } catch (LockObtainFailedException ex) {
                logger.trace("failed to obtain node lock on {}", dir.toAbsolutePath());
                // release all the ones that were obtained up until now
                releaseAndNullLocks(locks);
                break;
            }

        } catch (IOException e) {
            logger.trace("failed to obtain node lock on {}", e, dir.toAbsolutePath());
            lastException = new IOException("failed to obtain lock on " + dir.toAbsolutePath(), e);
            // release all the ones that were obtained up until now
            releaseAndNullLocks(locks);
            break;
        }
    }
    // If the file lock was obtained, stop searching
    if (locks[0] != null) {
        // we found a lock, break
        break;
    }
}
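
To make the probing loop easier to experiment with outside of ES, here is a simplified, self-contained sketch of the same idea using only java.nio. It is not the ES implementation. It assumes a single data path, uses FileChannel.tryLock directly instead of Lucene's NativeFSLockFactory, and keeps a small in-JVM registry of held lock files so that two "instances" can be simulated in one process. The names NodeLockSketch, NodeLock and acquire are hypothetical.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class NodeLockSketch {

    // Lock files already held inside this JVM (OS file locks are per process).
    private static final Set<Path> HELD_IN_THIS_JVM = ConcurrentHashMap.newKeySet();

    /** Hypothetical holder for the slot an instance has claimed. */
    public static final class NodeLock implements AutoCloseable {
        public final int id;
        public final Path dir;
        private final Path lockFile;
        private final FileChannel channel;
        private final FileLock lock;

        NodeLock(int id, Path dir, Path lockFile, FileChannel channel, FileLock lock) {
            this.id = id;
            this.dir = dir;
            this.lockFile = lockFile;
            this.channel = channel;
            this.lock = lock;
        }

        @Override
        public void close() throws IOException {
            try {
                lock.release();
                channel.close();
            } finally {
                HELD_IN_THIS_JVM.remove(lockFile);
            }
        }
    }

    /** Claims the first free nodes/<id> directory, trying ids 0 .. maxLocalStorageNodes - 1. */
    public static NodeLock acquire(Path dataPath, int maxLocalStorageNodes) throws IOException {
        for (int possibleLockId = 0; possibleLockId < maxLocalStorageNodes; possibleLockId++) {
            Path dir = dataPath.resolve("nodes").resolve(Integer.toString(possibleLockId));
            Files.createDirectories(dir);
            Path lockFile = dir.resolve("node.lock").toAbsolutePath().normalize();
            if (!HELD_IN_THIS_JVM.add(lockFile)) {
                continue; // slot already taken by an "instance" in this JVM
            }
            FileChannel channel = null;
            FileLock lock = null;
            try {
                channel = FileChannel.open(lockFile, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
                lock = channel.tryLock(); // null if another process holds the lock
            } finally {
                if (lock == null) { // not acquired: undo and try the next id
                    if (channel != null) {
                        channel.close();
                    }
                    HELD_IN_THIS_JVM.remove(lockFile);
                }
            }
            if (lock != null) {
                return new NodeLock(possibleLockId, dir, lockFile, channel, lock);
            }
        }
        throw new IllegalStateException("failed to obtain node lock under " + dataPath
                + " with max_local_storage_nodes=" + maxLocalStorageNodes);
    }

    public static void main(String[] args) throws IOException {
        Path dataPath = Files.createTempDirectory("es-data");
        // With a limit of 2, the second "instance" silently falls through to nodes/1.
        try (NodeLock first = acquire(dataPath, 2); NodeLock second = acquire(dataPath, 2)) {
            System.out.println("first instance uses  " + first.dir);
            System.out.println("second instance uses " + second.dir);
        }
    }
}
```

With a limit of 1, the second call to acquire throws instead of creating nodes/1, which is the behavior the watchdog scenario needs.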

ES5.6.4

Compared with ES 2.3, this part of the ES 5.6.4 code has no significant changes other than the default of MAX_LOCAL_STORAGE_NODES_SETTING being changed to 1.

int maxLocalStorageNodes = MAX_LOCAL_STORAGE_NODES_SETTING.get(settings);
for (int possibleLockId = 0; possibleLockId < maxLocalStorageNodes; possibleLockId++) {
    for (int dirIndex = 0; dirIndex < environment.dataFiles().length; dirIndex++) {
        Path dataDirWithClusterName = environment.dataWithClusterFiles()[dirIndex];
        Path dataDir = environment.dataFiles()[dirIndex];
        // TODO: Remove this in 6.0, we are no longer going to read from the cluster name directory
        if (readFromDataPathWithClusterName(dataDirWithClusterName)) {
            DeprecationLogger deprecationLogger = new DeprecationLogger(startupTraceLogger);
            deprecationLogger.deprecated("ES has detected the [path.data] folder using the cluster name as a folder [{}], " +
                            "Elasticsearch 6.0 will not allow the cluster name as a folder within the data path", dataDir);
            dataDir = dataDirWithClusterName;
        }
        Path dir = resolveNodePath(dataDir, possibleLockId);
        Files.createDirectories(dir);

        try (Directory luceneDir = FSDirectory.open(dir, NativeFSLockFactory.INSTANCE)) {
            startupTraceLogger.trace("obtaining node lock on {} ...", dir.toAbsolutePath());
            try {
                locks[dirIndex] = luceneDir.obtainLock(NODE_LOCK_FILENAME);
                nodePaths[dirIndex] = new NodePath(dir);
                nodeLockId = possibleLockId;
            } catch (LockObtainFailedException ex) {
                startupTraceLogger.trace(
                        new ParameterizedMessage("failed to obtain node lock on {}", dir.toAbsolutePath()), ex);
                // release all the ones that were obtained up until now
                releaseAndNullLocks(locks);
                break;
            }

        } catch (IOException e) {
            startupTraceLogger.trace(
                (Supplier<?>) () -> new ParameterizedMessage("failed to obtain node lock on {}", dir.toAbsolutePath()), e);
            lastException = new IOException("failed to obtain lock on " + dir.toAbsolutePath(), e);
            // release all the ones that were obtained up until now
            releaseAndNullLocks(locks);
            break;
        }
    }
    if (locks[0] != null) {
        // we found a lock, break
        break;
    }
}

Flow

[Figure: flowchart; image not included]

Reposted from blog.csdn.net/qqqq0199181/article/details/82988445