Notes on minor ES cluster problems

This is my first time running into these issues, so the description below may contain mistakes and is for reference only; please also check the official documentation and the links listed below.

1. Split brain issues
ref: https://qbox.io/blog/split-brain-problem-elasticsearch
A split brain can appear when a node crashes or communication between nodes is interrupted for some reason. If a slave node cannot communicate with the master node, a new master is elected from the master-eligible nodes it can still reach, and the new master takes over the responsibilities of the previous one. If the old master later rejoins the cluster or communication is restored, the new master demotes it to a slave node, so there is no conflict. In most cases this process is seamless and just works.

However, consider the case of only two nodes: one master and one slave. If communication between the two is interrupted, the slave is promoted to master, but once communication is restored there are two masters: the original master believes the slave went offline and should rejoin as a slave, while the new master believes the original master went offline and should become a slave. At this point the cluster has a split brain. (My own tests did not reproduce the two-node split brain; perhaps I did not test deeply enough. In my setup, after the slave of the two nodes went down, the master could still perform modification and query operations normally. After the master went down, since the slave is also a data node, queries still worked but modifications did not. Because the Python ES client is used at the code level, the client may in effect shield against the two-node split brain: when the code performs a modification, it keeps trying to connect to the failed master and ends up with a timeout exception, while data synchronization during normal operation is fine.)
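
For reference, a minimal sketch of how the Python client might be pointed at both nodes; it assumes the elasticsearch-py 7.x client, and the addresses and parameters are placeholders rather than the exact setup used in the test above:

from elasticsearch import Elasticsearch

# Both nodes are listed so that requests can be retried against the survivor.
# ip1/ip2 stand for the two test nodes described above.
es = Elasticsearch(
    ["http://ip1:9200", "http://ip2:9200"],
    retry_on_timeout=True,  # retry a timed-out request on the other node
    max_retries=2,
    timeout=10,
)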

Another blog describes it this way: suppose you have a 10-node cluster and 3 nodes get disconnected from it. Because of the discovery mechanism, those 3 nodes may form a new cluster, which leaves two clusters with the same name; that is a split brain. To prevent it you need to set discovery.zen.minimum_master_nodes: 6 in that case. (I have no way to test with that many nodes, but the description is easy to follow; in the test below I use 4 nodes, so the quorum is 3.)
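
The quorum value follows the usual more-than-half rule; a small sketch of the arithmetic (the function name is mine, for illustration only):

def minimum_master_nodes(master_eligible_nodes):
    # discovery.zen.minimum_master_nodes should be a majority of the
    # master-eligible nodes: n // 2 + 1
    return master_eligible_nodes // 2 + 1

print(minimum_master_nodes(10))  # 6, as in the 10-node example above
print(minimum_master_nodes(4))   # 3, as in the 4-node test below
print(minimum_master_nodes(2))   # 2, so a two-node cluster cannot tolerate losing either node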


There is a post on the official forum discussing the two-node split-brain issue that you can refer to in order to avoid it.
ref: https://discuss.elastic.co/t/how-can-a-configure-two-node-in-one-cluster/171088/11
A member of the ES team said: "That's the recommendation to avoid split brain issues. All 3 nodes must be master eligible nodes. Basically." and listed the configuration of the 3 nodes. So it is recommended to configure at least 3 nodes. Of course, more nodes may mean better HA, depending on the actual business situation and the server resources available.


The official ES documentation also covers two-node clusters, and it makes an important point: "Because it's not resilient to failures, we do not recommend deploying a two-node cluster in production."

When I tested failover with two nodes, I used the following minimal configuration:
discovery.seed_hosts: ["ip1", "ip2"]
cluster.initial_master_nodes: ["ip1", "ip2"]
and observed the following:
If the slave node goes down: one master node remains. The ES log shows the slave node left the cluster, no exception is thrown, and modification and query operations from the code side still complete normally.
If the master node goes down: one slave node remains. The ES log reports a connect error and keeps trying to trigger an election, but the election keeps trying to reach the old master (see the log below). Query operations from the code side still complete normally, but modification operations cannot be performed; a modification from the code side throws a connection-to-master timeout exception. (I was hoping for the same effect as Redis master-slave replication, where the slave is automatically promoted after the master goes down, but that does not happen here: the slave keeps trying to ping the master, which results in connect error exceptions.)
Slave node log after the master goes down: master not discovered or elected yet, an election requires a node with id [master node ID], ... have discovered [slave node] which is not a quorum; discovery will continue using [master node]
ref: https://www.elastic.co/guide/en/elasticsearch/reference/current/high-availability-cluster-small-clusters.html#high-availability-cluster-design-two-nodes
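
To make the code-side behaviour concrete, here is a rough sketch of how it can be checked against the surviving node; it assumes the elasticsearch-py 7.x client and an invented index name, so treat it only as an illustration:

from elasticsearch import Elasticsearch
from elasticsearch.exceptions import ElasticsearchException

es = Elasticsearch(["http://ip2:9200"])  # ip2 = the surviving slave/data node

# Query operations still complete normally against the remaining data node.
result = es.search(index="test-index", body={"query": {"match_all": {}}})
print(result["hits"]["total"])

# Modification operations fail: the node keeps looking for the old master,
# and in my test this surfaced as a connection/timeout error.
try:
    es.index(index="test-index", body={"msg": "write while master is down"})
except ElasticsearchException as e:
    print("write failed, master not discovered:", e)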


2. ES built-in automatic discovery mechanism: Zen
ref: https://www.elastic.co/guide/en/elasticsearch/reference/6.8/modules-discovery-zen.html
Nodes only need to be configured with the same cluster.name to join the same cluster.


4-node cluster configuration (elasticsearch.yml for each node; note that discovery.zen.minimum_master_nodes is a pre-7.x setting and is ignored in 7.x, where the cluster manages its voting quorum itself, but it is kept here as in my original configuration)
 

cluster.name: my-application
node.name: node-1
path.data: /opt/es/elasticsearch-7.6.1/data
path.logs: /opt/es/elasticsearch-7.6.1/log
network.host: ip1
http.port: 9200
discovery.seed_hosts: ["ip1", "ip2", "ip3", "ip4"]
cluster.initial_master_nodes: ["ip1", "ip2", "ip3", "ip4"]
bootstrap.system_call_filter: false
bootstrap.memory_lock: false
http.cors.enabled: true
http.cors.allow-origin: "*"
discovery.zen.minimum_master_nodes: 3

cluster.name: my-application
node.name: node-2
path.data: /opt/es/elasticsearch-7.6.1/data
path.logs: /opt/es/elasticsearch-7.6.1/log
network.host: ip2
http.port: 9200
discovery.seed_hosts: ["ip1", "ip2", "ip3", "ip4"]
cluster.initial_master_nodes: ["ip1", "ip2", "ip3", "ip4"]
bootstrap.system_call_filter: false
bootstrap.memory_lock: false
http.cors.enabled: true
http.cors.allow-origin: "*"
discovery.zen.minimum_master_nodes: 3

cluster.name: my-application
node.name: node-3
path.data: /opt/es/elasticsearch-7.6.1/data
path.logs: /opt/es/elasticsearch-7.6.1/log
network.host: ip3
http.port: 9200
discovery.seed_hosts: ["ip1", "ip2", "ip3", "ip4"]
cluster.initial_master_nodes: ["ip1", "ip2", "ip3", "ip4"]
bootstrap.system_call_filter: false
bootstrap.memory_lock: false
http.cors.enabled: true
http.cors.allow-origin: "*"
discovery.zen.minimum_master_nodes: 3

cluster.name: my-application
node.name: node-4
path.data: /opt/es/elasticsearch-7.6.1/data
path.logs: /opt/es/elasticsearch-7.6.1/log
network.host: ip4
http.port: 9200
discovery.seed_hosts: ["ip1", "ip2", "ip3", "ip4"]
cluster.initial_master_nodes: ["ip1", "ip2", "ip3", "ip4"]
bootstrap.system_call_filter: false
bootstrap.memory_lock: false
http.cors.enabled: true
http.cors.allow-origin: "*"
discovery.zen.minimum_master_nodes: 3

 

Check the cluster health from any node
Visit: http://ip:9200/_cluster/health
which returns:

{
    "cluster_name": "my-application",
    "status": "green",
    "timed_out": false,
    "number_of_nodes": 4,
    "number_of_data_nodes": 4,
    "active_primary_shards": 6,
    "active_shards": 12,
    "relocating_shards": 0,
    "initializing_shards": 0,
    "unassigned_shards": 0,
    "delayed_unassigned_shards": 0,
    "number_of_pending_tasks": 0,
    "number_of_in_flight_fetch": 0,
    "task_max_waiting_in_queue_millis": 0,
    "active_shards_percent_as_number": 100
}
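
The same check can be done from the code side; a minimal sketch assuming the elasticsearch-py client (the host address is a placeholder):

from elasticsearch import Elasticsearch

es = Elasticsearch(["http://ip1:9200"])

# Equivalent to GET /_cluster/health
health = es.cluster.health()
print(health["status"], health["number_of_nodes"], health["active_shards_percent_as_number"])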


 
