Elasticsearch node restart problem
Elasticsearch is a high-availability cluster: when a node goes down or restarts, the cluster automatically copies its shards to other nodes and rebalances itself, which incurs a large amount of network and I/O overhead.
If a node leaves the cluster and later rejoins, Elasticsearch rebalances the shards and assigns shards to the rejoining node again. When a node goes down because of excessive load, the nodes holding the replicas of its data take over the extra work and come under even greater pressure, so the failure can cascade and the entire cluster can avalanche.
It is therefore advisable to turn off automatic rebalancing in production.
Automatic shard balancing
1. Disable shard allocation: no shards will be allocated, not even for newly created indices:
curl -XPUT http://192.168.1.213:9200/_cluster/settings -d '{
    "transient" : {
        "cluster.routing.allocation.enable" : "none"
    }
}'
2. Disable automatic rebalancing: shards will not be rebalanced automatically when ES nodes are added or removed:
curl -XPUT http://192.168.1.213:9200/_cluster/settings?pretty -d '{
    "transient" : {
        "cluster.routing.rebalance.enable" : "none"
    }
}'
After applying the settings, verify that they were added successfully:
curl http://192.168.1.213:9200/_cluster/settings?pretty
Re-enable automatic shard allocation:
curl -XPUT http://192.168.1.213:9200/_cluster/settings -d '{
    "transient" : {
        "cluster.routing.allocation.enable" : "all"
    }
}'
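During a planned node restart, the two switches are typically used together: disable allocation, restart the node, then re-enable allocation once the node has rejoined. A minimal sketch as shell functions, assuming the same cluster address as the examples above:

```shell
#!/bin/sh
# Address of any node in the cluster (taken from the examples above).
ES_HOST="http://192.168.1.213:9200"

# Disable shard allocation before stopping the node.
disable_allocation() {
    curl -s -XPUT "$ES_HOST/_cluster/settings" -d '{
        "transient" : { "cluster.routing.allocation.enable" : "none" }
    }'
}

# Re-enable shard allocation after the node has rejoined.
enable_allocation() {
    curl -s -XPUT "$ES_HOST/_cluster/settings" -d '{
        "transient" : { "cluster.routing.allocation.enable" : "all" }
    }'
}

# Typical sequence (commented out so the script has no side effects):
#   disable_allocation
#   ...restart the Elasticsearch node and wait for it to rejoin...
#   enable_allocation
```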
Delaying replica reallocation
PUT /_all/_settings
{
    "settings": {
        "index.unassigned.node_left.delayed_timeout": "5m"
    }
}
Reallocation of shards left unassigned by a departed node is now delayed for 5 minutes.
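If the node is known to be gone for good, waiting out the timeout only postpones recovery; per the Elasticsearch reference, setting the timeout to 0 makes the unassigned shards eligible for reallocation immediately. A sketch, reusing the cluster address from the examples above:

```shell
#!/bin/sh
# Address of any node in the cluster (taken from the examples above).
ES_HOST="http://192.168.1.213:9200"

# Cancel the delay for shards left unassigned by a departed node:
# a timeout of 0 lets them be reallocated right away.
cancel_delay() {
    curl -s -XPUT "$ES_HOST/_all/_settings" -d '{
        "settings": { "index.unassigned.node_left.delayed_timeout": "0" }
    }'
}

# cancel_delay   # uncomment to apply
```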
Modify the following settings in elasticsearch.yml:
gateway.recover_after_nodes: 8
This prevents Elasticsearch from starting data recovery until at least 8 nodes (data nodes or master nodes) are present in the cluster.
gateway.expected_nodes: 10
gateway.recover_after_time: 5m
Cluster data recovery starts after 5 minutes, or as soon as 10 nodes have joined, whichever comes first.
Reference:
https://www.elastic.co/guide/en/elasticsearch/reference/2.4/modules-gateway.html
Split-brain problem
After an instance restarts, it may fail to find the existing master and elect itself as master instead. To prevent this, adjust elasticsearch.yml:
discovery.zen.minimum_master_nodes: 2
This configuration tells Elasticsearch not to hold a master election unless enough master-eligible candidate nodes are available.
This setting should always be configured as (number of master-eligible nodes / 2) + 1, for example:
with 10 master-eligible nodes, set it to 6;
with 3 master-eligible nodes, set it to 2.
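The majority formula above is simple integer arithmetic; a small shell helper makes it concrete (the function name quorum is purely illustrative, not an Elasticsearch command):

```shell
#!/bin/sh
# Majority (quorum) of master-eligible nodes: floor(n / 2) + 1.
# "quorum" is a hypothetical helper name, not part of Elasticsearch.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# quorum 10   -> 6
# quorum 3    -> 2
```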
When settings take effect
persistent settings survive a full cluster restart;
transient settings disappear after a full cluster restart.
PUT /_cluster/settings
{
"persistent" : {
"discovery.zen.minimum_master_nodes" : 2
}
}
In general, the following two settings are sufficient:
# Prevent split brain by requiring a majority of nodes (total nodes / 2 + 1)
#
discovery.zen.minimum_master_nodes: 2
# After a full cluster restart, block initial recovery until N nodes have started
#
gateway.recover_after_nodes: 3
Reprinted from: https://www.jianshu.com/p/9752709bfea4