Basic concepts and relationships of cluster, node, index, shard, and replica in Elasticsearch

[cluster]
Cluster: an ES cluster consists of one or more nodes, and each cluster is identified by its cluster name.
------------------------------------------------
[node]
Node: one ES instance is one node. A single machine can run several instances, so a machine is not necessarily one node. In most cases, though, each node runs in its own environment or virtual machine.
------------------------------------------------
[index]
Index: a collection of documents.
------------------------------------------------
[shard]

1. Sharding: ES is a distributed search engine. Each index has one or more shards, and the indexed data is spread across them, like pouring one bucket of water into N cups.

2. Sharding helps horizontal scaling. The N shards are distributed across the nodes as evenly as possible (rebalance). For example, with 2 nodes and 4 primary shards (ignoring replicas), each node holds 2 shards. If you later add 2 nodes, each of the 4 nodes ends up with 1 shard. This process is called relocation, and ES performs it automatically once it detects the new nodes.

3. Shards are independent. For a search request, each shard executes the request.

4. Each shard is a Lucene index, so a shard can hold at most Integer.MAX_VALUE - 128 = 2,147,483,519 documents.
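The 2-node and 4-node example above can be sketched in a few lines of Python. This is only an illustration: the round-robin `distribute` function and the node names are assumptions made for the sketch, not Elasticsearch's real allocator, which weighs many more factors.

```python
# Simplified sketch: spread primary shards over nodes in round-robin order.
# NOT Elasticsearch's actual allocation algorithm; names are hypothetical.

# Per-shard document limit quoted above: Integer.MAX_VALUE - 128.
MAX_DOCS_PER_SHARD = (2**31 - 1) - 128


def distribute(num_shards, nodes):
    """Assign shard ids 0..num_shards-1 to the given nodes as evenly as possible."""
    layout = {node: [] for node in nodes}
    for shard_id in range(num_shards):
        node = nodes[shard_id % len(nodes)]
        layout[node].append(shard_id)
    return layout


# 4 primary shards on 2 nodes: 2 shards per node.
two_nodes = distribute(4, ["node-1", "node-2"])

# After adding 2 more nodes: 1 shard per node.
four_nodes = distribute(4, ["node-1", "node-2", "node-3", "node-4"])

print(two_nodes)
print(four_nodes)
print(MAX_DOCS_PER_SHARD)
```

In a real cluster the move from the first layout to the second happens through shard relocation, without any manual step.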

------------------------------------------------
[replica]

1. Replica: can be understood as a backup shard; its counterpart is the primary shard.

2. A primary shard and its replica are never placed on the same node (to avoid a single point of failure). By default, an index is created with 5 primary shards and 1 replica of each (i.e. 5 primary + 5 replica = 10 shards). This default applies to ES versions before 7.0; from 7.0 on, the default is 1 primary shard with 1 replica.

3. If you only have one node, none of the 5 replicas can be allocated (they stay unassigned), and the cluster status changes to Yellow. The main roles of replicas are listed after the cluster states below.
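The single-node case can be illustrated with a small Python sketch. It assumes a simplified placement rule (a replica may not share a node with another copy of the same shard); the `allocate` function and node names are hypothetical and do not reproduce Elasticsearch's allocator.

```python
# Simplified sketch of the "no two copies of a shard on one node" rule.
# Hypothetical helper, not an Elasticsearch API.

def allocate(num_primaries, num_replicas, nodes):
    """Place each primary, then its replicas, on distinct nodes where possible."""
    assigned = []    # tuples of (shard_id, role, node)
    unassigned = []  # tuples of (shard_id, role)
    for shard_id in range(num_primaries):
        primary_node = nodes[shard_id % len(nodes)]
        assigned.append((shard_id, "primary", primary_node))
        used = {primary_node}  # nodes already holding a copy of this shard
        for replica_no in range(num_replicas):
            free = [n for n in nodes if n not in used]
            if free:
                node = free[(shard_id + replica_no) % len(free)]
                assigned.append((shard_id, "replica", node))
                used.add(node)
            else:
                # No node left without a copy of this shard.
                unassigned.append((shard_id, "replica"))
    return assigned, unassigned


# One node, 5 primaries, 1 replica each: all 5 replicas stay unassigned.
assigned_one, unassigned_one = allocate(5, 1, ["node-1"])
print(len(assigned_one), len(unassigned_one))

# Two nodes: every replica finds a node that differs from its primary's.
assigned_two, unassigned_two = allocate(5, 1, ["node-1", "node-2"])
print(len(assigned_two), len(unassigned_two))
```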

Three states of an ES cluster:

1) Green: all primary shards and replica shards are allocated and ready. Even if one machine goes down (assuming one instance per machine), no data is lost, but the cluster turns Yellow.

2) Yellow: all primary shards are ready, but at least one primary shard (call it A) has a replica that is not ready. The cluster is in a warning state: high availability and disaster recovery are reduced. If the machine hosting A goes down and A has only one replica (the one that is already unassigned), A's data becomes unavailable (queries return incomplete results) and the cluster turns Red.

3) Red: at least one primary shard is not ready (the direct cause being that no replica could be found to promote to the new primary). Queries return incomplete results because that shard's data is missing.
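The three states reduce to a simple decision rule, sketched below in Python. The shard records (dicts with `role` and `assigned` keys) are a made-up representation for this example, not the format returned by any Elasticsearch API.

```python
# Sketch of the Green / Yellow / Red rule described above.
# The input format is hypothetical and chosen for illustration only.

def cluster_status(shards):
    """Return 'red', 'yellow' or 'green' for a list of shard records."""
    # Any unassigned primary means part of the data cannot be queried.
    if any(s["role"] == "primary" and not s["assigned"] for s in shards):
        return "red"
    # All primaries are fine, but at least one replica is missing.
    if any(s["role"] == "replica" and not s["assigned"] for s in shards):
        return "yellow"
    return "green"


healthy = [
    {"role": "primary", "assigned": True},
    {"role": "replica", "assigned": True},
]
missing_replica = [
    {"role": "primary", "assigned": True},
    {"role": "replica", "assigned": False},
]
missing_primary = [
    {"role": "primary", "assigned": False},
    {"role": "replica", "assigned": False},
]

print(cluster_status(healthy))
print(cluster_status(missing_replica))
print(cluster_status(missing_primary))
```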

The main roles of replicas:

1. Disaster recovery: if a primary shard is lost, its replica is promoted to be the new primary, and a new replica is created from the new primary, so the cluster's data stays intact.

2. Better query performance: a replica holds the same data as its primary, so a query can be served by either one. Within a reasonable range, more replicas give better query performance (but resource usage [CPU/disk/heap] also increases). Index (write) requests, however, must go through the primary shard first, which then forwards them to its replicas; a replica cannot accept an index request directly.

3. For an index, the number of primary shards (number_of_shards) cannot be changed unless the index is rebuilt, but the number of replicas (number_of_replicas) can be changed at any time.
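As a rough illustration of the two settings, the Python sketch below builds the request bodies for creating an index and for later changing only its replica count. The index name `my_index` is a placeholder, the requests are only constructed and printed (nothing is sent), and `total_shards` is a helper introduced here for the arithmetic.

```python
import json

# Hypothetical index name, used only for this example.
INDEX_NAME = "my_index"

# Create the index: number_of_shards is fixed from this point on.
create_request = {
    "method": "PUT",
    "path": f"/{INDEX_NAME}",
    "body": {"settings": {"number_of_shards": 5, "number_of_replicas": 1}},
}

# Later: change only the replica count on the live index.
update_replicas_request = {
    "method": "PUT",
    "path": f"/{INDEX_NAME}/_settings",
    "body": {"index": {"number_of_replicas": 2}},
}


def total_shards(primaries, replicas):
    """Total shard copies: each primary plus its replicas."""
    return primaries * (1 + replicas)


for request in (create_request, update_replicas_request):
    print(request["method"], request["path"])
    print(json.dumps(request["body"], indent=2))

print(total_shards(5, 1))  # 5 primary + 5 replica
print(total_shards(5, 2))  # after raising number_of_replicas to 2
```

Raising number_of_replicas only helps if there are enough nodes to host the extra copies; otherwise the new replicas stay unassigned and the cluster turns Yellow.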


Origin blog.csdn.net/selectgoodboy/article/details/86611810