Elasticsearch: Cluster/Index Red/Yellow Status (2021-02-22)

Problem Description

After we modified the JVM heap parameters and restarted the cluster, the cluster went Red, and some indexes were Red as well.

We used the GET /_cluster/allocation/explain?pretty API to check the status and got the following information:

reached the limit of incoming shard recoveries [2], cluster setting [cluster.routing.allocation.node_concurrent_incoming_recoveries=2] (can also be set via [cluster.routing.allocation.node_concurrent_recoveries])

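Over time, the index transitioned from RED => YELLOW => GREEN on its own. We were running Elasticsearch 7.6. The message above is a throttling decision rather than a hard failure: by default each node accepts at most 2 concurrent incoming shard recoveries, so the remaining shards wait their turn and are allocated as earlier recoveries finish.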
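If the nodes have spare disk and network capacity, the recovery limit named in the message can be raised dynamically through the cluster settings API and reset afterwards. The sketch below mainly builds the request body; the cluster URL `http://localhost:9200`, the helper names, and the value 4 are assumptions for illustration, and `put_cluster_settings` needs a live cluster.

```python
import json
import urllib.request

# Assumed cluster address; adjust for your environment.
ES_URL = "http://localhost:9200"


def recovery_limit_body(limit):
    """Build a PUT /_cluster/settings body for the recovery throttle.

    node_concurrent_recoveries is a shortcut that sets both the incoming
    and the outgoing per-node limits. Passing None sends null, which
    resets the setting to its default.
    """
    return {
        "persistent": {
            "cluster.routing.allocation.node_concurrent_recoveries": limit
        }
    }


def put_cluster_settings(body, base_url=ES_URL):
    """Send the settings update (hypothetical helper; requires a running cluster)."""
    req = urllib.request.Request(
        f"{base_url}/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Raise the limit while recovery is in progress...
    print(json.dumps(recovery_limit_body(4)))
    # ...and reset it to the default afterwards.
    print(json.dumps(recovery_limit_body(None)))
```

Keep the raised value modest: more concurrent recoveries means more disk and network load on each node while they run.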

Additional notes:
1) This note describes the general method for troubleshooting a Red cluster rather than any specific cause: there are many possible causes, but the troubleshooting approach is similar for all of them.

Background Knowledge

When we run a single instance...

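When we run a single instance, all primary shards are on the same node. If the indexes are configured with replicas, the replica shards stay unassigned, because Elasticsearch never places a replica on the same node as its primary, so the cluster is in the Yellow state.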

The cluster functions normally, but if that node fails, data may be lost.

When we add a node...

When we add a new node to the cluster, the cluster immediately starts allocating replica shards to it.

When allocation finishes, the cluster enters the Green state, and the shards end up spread across the different nodes.
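The effect of node count on health can be modelled in a few lines. This is a simplified sketch, not Elasticsearch's allocator: it encodes only the rule that two copies of the same shard never share a node, and it ignores disk watermarks, allocation filtering, and the other deciders. `assignable_replicas` and `expected_health` are hypothetical helpers.

```python
def assignable_replicas(nodes, replicas):
    """Replicas of one shard that can be placed on `nodes` nodes.

    The primary occupies one node and no node may hold two copies
    of the same shard, so at most nodes - 1 replicas fit.
    """
    if nodes < 1:
        return 0
    return min(replicas, nodes - 1)


def expected_health(nodes, replicas):
    """Health an index would settle at under this simplified model."""
    if nodes < 1:
        return "red"  # nowhere to put the primary
    if assignable_replicas(nodes, replicas) < replicas:
        return "yellow"  # primary assigned, some replicas left unassigned
    return "green"


# Single instance with one replica configured: Yellow.
print(expected_health(nodes=1, replicas=1))  # yellow
# After a second node joins, the replica can be allocated: Green.
print(expected_health(nodes=2, replicas=1))  # green
# With replicas set to 0, a single node is already Green.
print(expected_health(nodes=1, replicas=0))  # green
```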

When the cluster is in the Red state...

What It Means

It means that at least one primary shard (and therefore its replicas) is not allocated to any node.

Simply put, primary shard allocation has failed, and the affected shard's replicas cannot be allocated either.

Possible Causes

1) A cluster node fails, for example because excessive load causes the Elasticsearch process to exit.
Explanation: Primary shards are spread across different nodes. When a node fails, the primary shards it held are lost. If a replica of such a shard exists on another node, it is promoted to primary; if not, the shard has no active copy, that part of the data cannot be used, and the cluster turns Red.
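The reasoning above can be illustrated with a toy model: when a node leaves, each shard that had a copy there either keeps a surviving copy (which is promoted if it was the primary) or loses every copy, and only the latter turns the cluster Red. The placement layout, node names, and function name are hypothetical; a real cluster performs the promotion automatically.

```python
def health_after_node_loss(placements, failed_node):
    """Return (health, lost_shards) right after `failed_node` leaves.

    `placements` maps a shard id to {"primary": node, "replicas": [nodes]}
    (a hypothetical layout). A surviving replica is assumed to be promoted.
    """
    lost = []
    degraded = False
    for shard, copies in placements.items():
        all_copies = [copies["primary"]] + copies["replicas"]
        surviving = [n for n in all_copies if n != failed_node]
        if not surviving:
            lost.append(shard)  # no copy left to promote: Red
        elif len(surviving) < len(all_copies):
            degraded = True  # a copy is gone but one survives: Yellow
    if lost:
        return "red", lost
    return ("yellow" if degraded else "green"), lost


placements = {
    ("test1", 0): {"primary": "node-1", "replicas": ["node-2"]},
    ("test4", 0): {"primary": "node-1", "replicas": []},  # no replica
}
print(health_after_node_loss(placements, "node-1"))  # ('red', [('test4', 0)])
print(health_after_node_loss(placements, "node-2"))  # ('yellow', [])
```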

Troubleshooting ideas

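A Red shard (one whose primary is unassigned) makes its index Red, and a Red index makes the cluster Red. Troubleshooting therefore works backwards: from the cluster, to the Red index, to the unassigned shard.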
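The roll-up is a "worst color wins" rule at each level. A minimal sketch with hypothetical input structures; Elasticsearch computes this itself and reports the result through GET /_cluster/health.

```python
SEVERITY = {"green": 0, "yellow": 1, "red": 2}


def shard_health(primary_assigned, unassigned_replicas):
    """Red if the primary is unassigned, Yellow if only replicas are."""
    if not primary_assigned:
        return "red"
    return "yellow" if unassigned_replicas > 0 else "green"


def worst(colors):
    """Worst color in an iterable; an empty input counts as green."""
    return max(colors, key=lambda c: SEVERITY[c], default="green")


def cluster_health(indices):
    """`indices` maps index name -> list of (primary_assigned, unassigned_replicas)."""
    index_health = {
        name: worst(shard_health(p, r) for p, r in shards)
        for name, shards in indices.items()
    }
    return worst(index_health.values()), index_health


indices = {
    "test3": [(True, 0)],   # primary and replica both assigned
    "test4": [(False, 0)],  # primary unassigned
}
print(cluster_health(indices))  # ('red', {'test3': 'green', 'test4': 'red'})
```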

Calling GET /_cluster/allocation/explain without a request body explains the first unassigned shard it finds, including the reason it cannot be allocated:

{
    "index": "test4",
    "shard": 0,
    "primary": true,
    "current_state": "unassigned",
    "can_allocate": "no",
    "allocate_explanation": "cannot allocate because allocation is not permitted to any of the nodes"
}
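Reading the explain output by hand works for one shard; a small helper can summarise it. This sketch assumes the JSON has already been fetched, uses the fields shown above, and optionally walks `node_allocation_decisions`, which the full response includes per node with a list of `deciders`. Missing keys are tolerated because the response shape varies with the shard state. To explain a specific shard instead of the first unassigned one, the API also accepts a request body such as `{"index": "test4", "shard": 0, "primary": true}`.

```python
import json


def summarize_explain(response):
    """Turn a /_cluster/allocation/explain response into short lines."""
    role = "primary" if response.get("primary") else "replica"
    lines = [
        f'{response.get("index")}[{response.get("shard")}] ({role}): '
        f'{response.get("current_state")}, can_allocate={response.get("can_allocate")}',
        f'  {response.get("allocate_explanation", "")}',
    ]
    # Per-node decisions, when present, name the decider behind each verdict.
    for node in response.get("node_allocation_decisions", []):
        for decider in node.get("deciders", []):
            lines.append(
                f'  {node.get("node_name")}: {decider.get("decider")} '
                f'-> {decider.get("decision")}: {decider.get("explanation")}'
            )
    return lines


# The response shown above.
sample = json.loads("""
{
    "index": "test4",
    "shard": 0,
    "primary": true,
    "current_state": "unassigned",
    "can_allocate": "no",
    "allocate_explanation": "cannot allocate because allocation is not permitted to any of the nodes"
}
""")
for line in summarize_explain(sample):
    print(line)
```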

Use GET /_cat/indices?v to view the index status:

health status index            uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   test1            30h1EiMvS5uAFr2t5CEVoQ   5   0        820            0       14mb           14mb
green  open   test2            sdIxs_WDT56afFGu5KPbFQ   1   0          0            0       233b           233b
green  open   test3            GGRZp_TBRZuSaZpAGk2pmw   1   1          2            0     14.7kb          7.3kb
red    open   test4            BJxfAErbTtu5HBjIXJV_7A   1   0
green  open   test5            _8C6MIXOSxCqVYicH3jsEA   1   0          7            0     24.3kb         24.3kb
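When there are many indexes, filtering this listing is quicker than scanning it. The text table is whitespace-aligned and a Red index leaves its document and size columns empty, so this sketch reads only the first three columns (health, status, index). The sample is the output above; requesting GET /_cat/indices?format=json&h=health,index instead would avoid text parsing altogether.

```python
CAT_INDICES = """\
health status index            uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   test1            30h1EiMvS5uAFr2t5CEVoQ   5   0        820            0       14mb           14mb
green  open   test2            sdIxs_WDT56afFGu5KPbFQ   1   0          0            0       233b           233b
green  open   test3            GGRZp_TBRZuSaZpAGk2pmw   1   1          2            0     14.7kb          7.3kb
red    open   test4            BJxfAErbTtu5HBjIXJV_7A   1   0
green  open   test5            _8C6MIXOSxCqVYicH3jsEA   1   0          7            0     24.3kb         24.3kb
"""


def indices_by_health(cat_output, wanted=("red", "yellow")):
    """Return (health, index) pairs from `_cat/indices?v` text output."""
    rows = cat_output.strip().splitlines()[1:]  # skip the header line
    result = []
    for row in rows:
        fields = row.split()
        # Only the leading columns are reliable; trailing ones may be blank.
        if len(fields) >= 3 and fields[0] in wanted:
            result.append((fields[0], fields[2]))
    return result


print(indices_by_health(CAT_INDICES))  # [('red', 'test4')]
```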

Solution

The previous steps usually reveal the cause; resolve it by following the explanation in the output.

The Amazon Elasticsearch Service troubleshooting guide (Red Cluster Status) gives no single specific fix, only general options:
1) The fastest option is to delete the affected index, if that is acceptable. Depending on the situation, you can scale out the cluster afterwards.
2) If you cannot delete the index, you can restore it from a snapshot, delete documents from the index, change the configuration, reduce the number of replicas, or delete other indexes to free disk space.
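Several of these options map to standard REST calls. This sketch only assembles the method, path, and body for each one so they can be reviewed before anything is sent; `my_backup` and `snapshot_1` are hypothetical repository and snapshot names, and deleting an index is irreversible. Note that a snapshot cannot be restored over an open index with the same name, so the existing Red index has to be deleted or closed first.

```python
def remediation_requests(index, repo="my_backup", snapshot="snapshot_1"):
    """Build (method, path, body) tuples for some of the options above.

    `repo` and `snapshot` are hypothetical names. Nothing is sent here.
    """
    return {
        # Option 1: drop the Red index entirely (irreversible).
        "delete_index": ("DELETE", f"/{index}", None),
        # Option 2a: restore only this index from a snapshot.
        "restore_snapshot": (
            "POST",
            f"/_snapshot/{repo}/{snapshot}/_restore",
            {"indices": index},
        ),
        # Option 2b: lower the replica count to free disk and node capacity.
        "reduce_replicas": (
            "PUT",
            f"/{index}/_settings",
            {"index": {"number_of_replicas": 0}},
        ),
    }


for name, (method, path, body) in remediation_requests("test4").items():
    print(name, method, path, body)
```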

When the cluster is in the Yellow state...

What It Means

All primary shards have been allocated to nodes, but at least one replica shard has not.

Simply put, primary shard allocation is complete, but replica shard allocation has a problem.

Possible Causes

1) The cluster has only a single node.
Explanation: A single-node cluster stays in the Yellow state as long as its indexes are configured with replicas. All primary shards are on this node, and the replica shards cannot be assigned because Elasticsearch never places a replica on the same node as its primary and there is no other node. The cluster functions normally, but there is a risk of data loss.
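Whether this cause applies can be read from GET /_cluster/health, whose response includes `status`, `number_of_data_nodes`, and `unassigned_shards`. The helper below is a hypothetical sketch that applies the reasoning above to such a response; the sample values are made up for illustration.

```python
def single_node_yellow(health):
    """True if the cluster is Yellow, has one data node, and has unassigned shards.

    `health` is a parsed GET /_cluster/health response (assumed already fetched).
    """
    return (
        health.get("status") == "yellow"
        and health.get("number_of_data_nodes") == 1
        and health.get("unassigned_shards", 0) > 0
    )


# Made-up sample response for a single node with one replica per index.
sample = {
    "status": "yellow",
    "number_of_nodes": 1,
    "number_of_data_nodes": 1,
    "active_primary_shards": 5,
    "unassigned_shards": 5,
}
print(single_node_yellow(sample))  # True
```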

Solution

If the cause is a single node, add nodes to form a multi-node cluster.

The Amazon Elasticsearch Service troubleshooting guide (Yellow Cluster Status) does not give a detailed solution, only:
1) Add nodes (if it is a single-node cluster).



Source: blog.csdn.net/u013670453/article/details/113956806