How Elasticsearch determines the data storage path of an index

In Elasticsearch, the node setting path.data specifies the directory in which a node stores its data, and it can be given multiple paths. How, then, does Elasticsearch decide which of these paths a new shard is stored under? Today I am writing down the answer.

Elasticsearch index creation process

  1. After the master node receives a create-index request, it goes through a series of steps to create the index and finally submits the result as an update to the ClusterState.
  2. The master publishes the updated ClusterState to all nodes.
  3. Each node that has to create a shard of the index reads its locally available path.data entries and picks one of them according to certain rules.
  4. The node creates the base path of the shard and saves the shard's basic information there.

How the target directory is chosen

Source

The core logic is in the selectNewPathForShard method of the ShardPath class:

    // Excerpt from ShardPath.selectNewPathForShard. env, shardId, indexSettings,
    // avgShardSizeInBytes and dataPathToShardCount come from the enclosing method.
    if (indexSettings.hasCustomDataPath()) {
        // The index has a custom data path; that branch is omitted from this excerpt.
    } else {
        // Sum the usable space of all of this node's data paths
        BigInteger totFreeSpace = BigInteger.ZERO;
        for (NodeEnvironment.NodePath nodePath : env.nodePaths()) {
            totFreeSpace = totFreeSpace.add(BigInteger.valueOf(nodePath.fileStore.getUsableSpace()));
        }

        // TODO: this is a hack!!  We should instead keep track of incoming (relocated) shards since we know
        // how large they will be once they're done copying, instead of a silly guess for such cases:

        // Very rough heuristic of how much disk space we expect the shard will use over its lifetime, the max of current average
        // shard size across the cluster and 5% of the total available free space on this node:
        BigInteger estShardSizeInBytes = BigInteger.valueOf(avgShardSizeInBytes).max(totFreeSpace.divide(BigInteger.valueOf(20)));

        // TODO - do we need something more extensible? Yet, this does the job for now...
        final NodeEnvironment.NodePath[] paths = env.nodePaths();

        // If no better path is chosen, use the one with the most space by default
        NodeEnvironment.NodePath bestPath = getPathWithMostFreeSpace(env);

        if (paths.length != 1) {
            Map<NodeEnvironment.NodePath, Long> pathToShardCount = env.shardCountPerPath(shardId.getIndex());

            // Compute how much space there is on each path
            final Map<NodeEnvironment.NodePath, BigInteger> pathsToSpace = new HashMap<>(paths.length);
            for (NodeEnvironment.NodePath nodePath : paths) {
                FileStore fileStore = nodePath.fileStore;
                BigInteger usableBytes = BigInteger.valueOf(fileStore.getUsableSpace());
                pathsToSpace.put(nodePath, usableBytes);
            }

            bestPath = Arrays.stream(paths)
                    // Keep only the paths whose usable space exceeds the estimated shard size
                    .filter((path) -> pathsToSpace.get(path).subtract(estShardSizeInBytes).compareTo(BigInteger.ZERO) > 0)
                    // Sort by the number of shards of this index on each path
                    .sorted((p1, p2) -> {
                        int cmp = Long.compare(pathToShardCount.getOrDefault(p1, 0L),
                            pathToShardCount.getOrDefault(p2, 0L));
                        if (cmp == 0) {
                            // if the number of shards is equal, tie-break with the number of total shards
                            cmp = Integer.compare(dataPathToShardCount.getOrDefault(p1.path, 0),
                                dataPathToShardCount.getOrDefault(p2.path, 0));
                            if (cmp == 0) {
                                // if the total number of shards is also equal, tie-break with the usable bytes (most first)
                                cmp = pathsToSpace.get(p2).compareTo(pathsToSpace.get(p1));
                            }
                        }
                        return cmp;
                    })
                    // Return the first result
                    .findFirst()
                    // Or the existing best path if there aren't any that fit the criteria
                    .orElse(bestPath);
        }

        statePath = bestPath.resolve(shardId);
        dataPath = statePath;
    }
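
To make the 5% rule concrete, here is a minimal standalone Java sketch of just the size estimate from the excerpt. The class name ShardSizeEstimate and the method estimateShardSizeInBytes are hypothetical; only the formula (the larger of the average shard size and one twentieth of the total usable space) comes from the source.

```java
import java.math.BigInteger;

public class ShardSizeEstimate {

    // Hypothetical helper: max(average shard size, 5% of the total usable space
    // summed over all data paths), computed with BigInteger as in the excerpt.
    static BigInteger estimateShardSizeInBytes(long avgShardSizeInBytes, long... usableBytesPerPath) {
        BigInteger totFreeSpace = BigInteger.ZERO;
        for (long usable : usableBytesPerPath) {
            totFreeSpace = totFreeSpace.add(BigInteger.valueOf(usable));
        }
        return BigInteger.valueOf(avgShardSizeInBytes).max(totFreeSpace.divide(BigInteger.valueOf(20)));
    }

    public static void main(String[] args) {
        long gb = 1L << 30;
        // Two data paths with 600 GB and 400 GB free; the cluster's average shard is 10 GB.
        // 5% of 1000 GB is 50 GB, which is larger than 10 GB, so 50 GB is the estimate.
        BigInteger est = estimateShardSizeInBytes(10 * gb, 600 * gb, 400 * gb);
        System.out.println(est.divide(BigInteger.valueOf(gb)) + " GB"); // prints "50 GB"
    }
}
```

On a node with plenty of free disk, the 5% term usually dominates, so a path needs noticeably more headroom than a typical shard before it passes the space filter.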

Process Analysis

  1. First, check whether the index has a custom data path; if it does not, the shard is created under one of the node's path.data directories.
  2. Estimate how much space the shard will need: the larger of the current average shard size in the cluster and 5% of the total usable space across all of this node's data paths.
  3. Get all of the node's data paths.
  4. Set the default best path to the path that currently has the most free space.
  5. If the node has more than one data path, go through all paths and discard those whose usable space does not exceed the estimated shard size. If no path is left, return the default path from step 4; otherwise continue with step 6.
  6. Sort the remaining paths and return the first one. The primary key is the number of shards of this index already on each path, fewest first;
    if that is tied, compare the total number of shards (of all indices) on each path, fewest first;
    if that is also tied, compare usable space, most first.
  7. Resolve the shard path under the chosen directory, then create the directories and other shard information.
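
The steps above can be condensed into a small self-contained Java sketch (Java 16+ for records). DataPath, selectPath and the sample paths are hypothetical stand-ins, not Elasticsearch types: each DataPath simply carries the three values the excerpt looks up per path (usable bytes from the FileStore, shards of this index from env.shardCountPerPath, and total shards from dataPathToShardCount). It is a simplified model of the logic, not the actual implementation.

```java
import java.math.BigInteger;
import java.util.Comparator;
import java.util.List;

public class PathSelectionSketch {

    // Hypothetical stand-in for a data path plus the counters the excerpt looks up.
    record DataPath(String name, long usableBytes, long shardsOfThisIndex, int totalShards) {}

    // Assumes at least one data path, as a node always has one.
    static DataPath selectPath(List<DataPath> paths, long avgShardSizeInBytes) {
        // Step 2: estimated shard size = max(average shard size, 5% of total usable space).
        BigInteger totFreeSpace = BigInteger.ZERO;
        for (DataPath p : paths) {
            totFreeSpace = totFreeSpace.add(BigInteger.valueOf(p.usableBytes()));
        }
        BigInteger estShardSize = BigInteger.valueOf(avgShardSizeInBytes)
                .max(totFreeSpace.divide(BigInteger.valueOf(20)));

        // Step 4: default choice is the path with the most usable space.
        DataPath mostFree = paths.stream()
                .max(Comparator.comparingLong(DataPath::usableBytes))
                .orElseThrow();
        if (paths.size() == 1) {
            return mostFree;
        }

        return paths.stream()
                // Step 5: keep only paths whose usable space exceeds the estimate.
                .filter(p -> BigInteger.valueOf(p.usableBytes()).compareTo(estShardSize) > 0)
                // Step 6: fewest shards of this index, then fewest shards overall, then most usable space.
                .sorted(Comparator.comparingLong(DataPath::shardsOfThisIndex)
                        .thenComparingInt(DataPath::totalShards)
                        .thenComparing(Comparator.comparingLong(DataPath::usableBytes).reversed()))
                .findFirst()
                // Fall back to the default when no path has enough space.
                .orElse(mostFree);
    }

    public static void main(String[] args) {
        long gb = 1L << 30;
        List<DataPath> paths = List.of(
                new DataPath("/data1", 500 * gb, 1, 10),
                new DataPath("/data2", 300 * gb, 0, 12),
                new DataPath("/data3", 30 * gb, 0, 2));
        // Estimate = max(10 GB, 5% of 830 GB = 41.5 GB), so /data3 (30 GB free) is filtered out.
        // /data2 holds no shard of this index yet, so it beats /data1.
        System.out.println(selectPath(paths, 10 * gb).name()); // prints "/data2"
    }
}
```

Chaining Comparator.thenComparing mirrors the nested `if (cmp == 0)` tie-breaks in the original lambda; the net effect is that shards of one index are spread across disks before free space is even considered.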

Origin blog.csdn.net/ywl470812087/article/details/104874698