Problem Description
After we modified the JVM heap parameters and restarted the cluster, the cluster went into the Red state, and some indices were Red as well.
We checked the status with GET /_cluster/allocation/explain?pretty and got the following message:
reached the limit of incoming shard recoveries [2], cluster setting [cluster.routing.allocation.node_concurrent_incoming_recoveries=2] (can also be set via [cluster.routing.allocation.node_concurrent_recoveries])
As time passed, the indices moved from RED => YELLOW => GREEN. We run Elasticsearch version 7.6.
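The RED => YELLOW => GREEN progression can be watched by polling GET /_cluster/health and reading its status field. The following is a minimal Python sketch, not the procedure used in this note: fetch_status is a hypothetical callable that returns the current status string, so the loop can be tried without a live cluster.

```python
import time
from typing import Callable, List


def watch_until_green(fetch_status: Callable[[], str],
                      max_polls: int = 60,
                      delay_seconds: float = 5.0) -> List[str]:
    """Poll the cluster status until it is green or max_polls is reached.

    fetch_status is a hypothetical helper (an assumption of this sketch);
    in practice it could wrap GET /_cluster/health and return the "status" field.
    Returns the sequence of statuses, recording a value only when it changes.
    """
    transitions: List[str] = []
    for _ in range(max_polls):
        status = fetch_status().lower()
        # Record the status only when it differs from the last one seen.
        if not transitions or transitions[-1] != status:
            transitions.append(status)
        if status == "green":
            break
        time.sleep(delay_seconds)
    return transitions
```

With a real cluster, fetch_status would issue the HTTP request; the polling interval and the limit of 60 polls are arbitrary values chosen for illustration.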
Additional notes:
1) This note introduces a general method for troubleshooting a Red cluster rather than specific causes (there are many possible causes, but the troubleshooting approach is similar).
Background Knowledge
When we run a single instance...
When we run a single instance, all Primary Shards are on the same node and no Replica Shards can be allocated, so the cluster is in the Yellow state.
At this point the cluster works normally, but data may be lost if that node fails.
When we add a node...
When we add a new node to the cluster, the cluster immediately begins to allocate Replica Shards.
When the allocation is finished, the cluster enters the Green state. In the end, the Primary Shards are spread across different nodes.
When the cluster is in the Red state...
Implicit information
It means that at least one Primary Shard and its Replica Shards are not allocated to any node.
Simply put, there are problems with the allocation of both Primary Shards and Replica Shards.
Possible Causes
1) A cluster node fails, for example because the load is so high that the process exits.
Explanation: The Primary Shards are spread across different nodes. When a node fails, the Primary Shards assigned to it are lost. If no Replica Shard is available to take over, the cluster has unassigned Primary Shards, the affected indices cannot be used, and the cluster turns Red.
Troubleshooting ideas
A Red Shard makes its Index Red, and a Red Index makes the Cluster Red.
Use GET /_cluster/allocation/explain to find the first shard that cannot be allocated and the reason why:
{
  "index": "test4",
  "shard": 0,
  "primary": true,
  "current_state": "unassigned",
  "can_allocate": "no",
  "allocate_explanation": "cannot allocate because allocation is not permitted to any of the nodes"
}
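When several shards have to be checked, it can help to reduce each explain response to one line. The following Python sketch works on the response shown above; the function names are made up for illustration, and the node_allocation_decisions and deciders fields are read only if the response contains them.

```python
import json
from typing import Any, Dict, List

# The allocation explain response shown above, used as sample input.
SAMPLE = json.loads("""
{
  "index": "test4",
  "shard": 0,
  "primary": true,
  "current_state": "unassigned",
  "can_allocate": "no",
  "allocate_explanation": "cannot allocate because allocation is not permitted to any of the nodes"
}
""")


def summarize_explain(response: Dict[str, Any]) -> str:
    """Build a one-line summary of an allocation explain response."""
    kind = "primary" if response.get("primary") else "replica"
    return "{kind} shard {shard} of index '{index}' is {state}: {why}".format(
        kind=kind,
        shard=response.get("shard"),
        index=response.get("index"),
        state=response.get("current_state", "unknown"),
        why=response.get("allocate_explanation", "no explanation given"),
    )


def decider_reasons(response: Dict[str, Any]) -> List[str]:
    """Collect per-node decider explanations when the response includes them."""
    reasons: List[str] = []
    for node in response.get("node_allocation_decisions", []):
        for decider in node.get("deciders", []):
            reasons.append("{}: {}".format(node.get("node_name"),
                                           decider.get("explanation")))
    return reasons


print(summarize_explain(SAMPLE))
```

The sample above has no per-node decisions, so decider_reasons returns an empty list for it; on a full response it would list messages such as the recovery limit quoted in the problem description.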
Use GET /_cat/indices?v to view the index status:
health status index uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   test1 30h1EiMvS5uAFr2t5CEVoQ   5   0        820            0       14mb           14mb
green  open   test2 sdIxs_WDT56afFGu5KPbFQ   1   0          0            0       233b           233b
green  open   test3 GGRZp_TBRZuSaZpAGk2pmw   1   1          2            0     14.7kb          7.3kb
red    open   test4 BJxfAErbTtu5HBjIXJV_7A   1   0
green  open   test5 _8C6MIXOSxCqVYicH3jsEA   1   0          7            0     24.3kb         24.3kb
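On a cluster with many indices, the Red rows can be picked out of this output programmatically. The following is a small Python sketch that assumes the default column order of GET /_cat/indices?v (health, status, index, ...) as shown above; red_indices is an illustrative name, not an Elasticsearch API.

```python
from typing import List

# The _cat/indices output shown above, used as sample input.
SAMPLE = """\
health status index uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   test1 30h1EiMvS5uAFr2t5CEVoQ   5   0        820            0       14mb           14mb
green  open   test2 sdIxs_WDT56afFGu5KPbFQ   1   0          0            0       233b           233b
green  open   test3 GGRZp_TBRZuSaZpAGk2pmw   1   1          2            0     14.7kb          7.3kb
red    open   test4 BJxfAErbTtu5HBjIXJV_7A   1   0
green  open   test5 _8C6MIXOSxCqVYicH3jsEA   1   0          7            0     24.3kb         24.3kb
"""


def red_indices(cat_output: str) -> List[str]:
    """Return the names of indices whose health column is 'red'.

    Assumes the first three columns are health, status and index.
    Red rows may lack the document and size columns, so only the
    first three fields are required.
    """
    names: List[str] = []
    for line in cat_output.strip().splitlines():
        fields = line.split()
        # Skip blank or malformed lines and the header row.
        if len(fields) < 3 or fields[0] == "health":
            continue
        if fields[0] == "red":
            names.append(fields[2])
    return names


print(red_indices(SAMPLE))
```

The resulting names can then be checked one by one with the allocation explain API.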
Solution
The previous steps usually reveal the cause; follow the explanation they return.
Amazon Elasticsearch Service Troubleshooting/Red Cluster Status does not give a specific solution, only general options:
1) The fastest way is to delete the index (if possible). Depending on the situation, the cluster can be expanded later.
2) If you cannot delete the index, you can restore a snapshot, delete documents from the index, modify the configuration, reduce the number of replicas, delete other indexes to free disk space, etc.
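Two of the options above, deleting an index and reducing the number of replicas, map to simple REST calls (DELETE /<index> and PUT /<index>/_settings). The following Python sketch only builds the request as a (method, path, body) tuple and sends nothing; the helper names are made up for illustration, and whether either call is appropriate depends on the cause found earlier.

```python
from typing import Any, Dict, Optional, Tuple

# (HTTP method, path, JSON body or None)
Request = Tuple[str, str, Optional[Dict[str, Any]]]


def delete_index_request(index: str) -> Request:
    """Build the request that deletes an index. This is irreversible."""
    if not index:
        raise ValueError("index name must not be empty")
    return ("DELETE", "/" + index, None)


def set_replicas_request(index: str, replicas: int) -> Request:
    """Build the request that changes the replica count of an index."""
    if replicas < 0:
        raise ValueError("replicas must be >= 0")
    body = {"index": {"number_of_replicas": replicas}}
    return ("PUT", "/" + index + "/_settings", body)


# Example: drop the replica of test3 from the listing above.
print(set_replicas_request("test3", 0))
```

Keeping request construction separate from sending makes it easy to review a destructive call such as the delete before it is executed.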
When the cluster is in the Yellow state...
Implicit information
All Primary Shards of every Index have been allocated to nodes, but at least one Replica Shard has not been allocated to a node.
Simply put, Primary Shard allocation is complete, but there are problems with Replica Shard allocation.
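The rules for the three states can be written down compactly. The following Python sketch is a simplified model of the descriptions in this note, not Elasticsearch's actual implementation: each shard is represented by a hypothetical (is_primary, is_assigned) pair.

```python
from typing import Iterable, Tuple


def cluster_health(shards: Iterable[Tuple[bool, bool]]) -> str:
    """Derive the cluster state from (is_primary, is_assigned) pairs.

    red:    at least one Primary Shard is unassigned
    yellow: all Primary Shards are assigned, at least one Replica Shard is not
    green:  every shard is assigned
    """
    status = "green"
    for is_primary, is_assigned in shards:
        if is_assigned:
            continue
        if is_primary:
            # An unassigned primary decides the result immediately.
            return "red"
        status = "yellow"
    return status


# Single-node cluster: the primary is assigned, its replica is not.
print(cluster_health([(True, True), (False, False)]))
```

The same function applied to the shards of one index gives that index's state, which matches the way a Red shard propagates to its index and then to the cluster.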
Possible Causes
1) The cluster has only a single node.
Explanation: A single-node cluster is always in the Yellow state when it first starts. All Primary Shards are on this node, and there is no other node to which the Replica Shards can be assigned. The cluster works normally at this point, but there is a risk of data loss.
Solution
If the problem is caused by a single node, you can add nodes to form a cluster.
Amazon Elasticsearch Service Troubleshooting/Yellow Cluster Status does not give detailed solutions either:
1) You can add nodes (if it is a single-node cluster)
References
Elasticsearch Reference [7.7] » Set up Elasticsearch » Adding nodes to your cluster