Table of contents
1. Cluster split brain problem
1. Division of cluster responsibilities
2. Cluster distributed storage
1. Cluster split brain problem
1. Division of cluster responsibilities
Cluster nodes in Elasticsearch can take on different roles: master-eligible, data, ingest, and coordinating.
By default, every node in the cluster holds all four roles at the same time.
In a production cluster, however, these responsibilities should be separated:
master node: high CPU requirements, but low memory requirements
data node: high requirements for both CPU and memory
coordinating node: high requirements for network bandwidth and CPU
Separating duties lets us allocate different hardware to each kind of node according to its needs, and keeps the services from interfering with each other.
A typical es cluster responsibility division is shown in the figure:
2. Split brain problem
A split-brain is caused by the disconnection of nodes in the cluster.
For example, in a cluster, the master node loses connection with other nodes:
At this time, node2 and node3 think that node1 is down, and they will re-elect the master:
After node3 is elected, the cluster continues to serve external requests. node2 and node3 form one cluster, and node1 forms another on its own. The data of the two clusters is no longer synchronized, so they diverge.
When the network is restored, because there are two master nodes in the cluster, the status of the cluster is inconsistent, and a split-brain situation occurs:
The solution to split-brain is to require a candidate to win at least (number of master-eligible nodes + 1) / 2 votes, that is, a strict majority, before it can become master. For this reason the number of master-eligible nodes should preferably be odd. Before Elasticsearch 7.0 this threshold was set with discovery.zen.minimum_master_nodes; since 7.0 the cluster manages the voting quorum itself and that setting is no longer needed, so the split-brain problem generally does not occur.
For example, in a cluster of 3 nodes a candidate needs at least (3 + 1) / 2 = 2 votes. node3 receives the votes of node2 and node3 and is elected master. node1 has only its own vote and is not elected. The cluster still has exactly one master node, so there is no split brain.
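The majority rule can be checked with a few lines of arithmetic. The sketch below is only an illustration of the vote counting described above; the function names `quorum` and `elected` are made up for this example and are not part of Elasticsearch.

```python
def quorum(eligible_nodes: int) -> int:
    """Smallest number of votes that is a strict majority of the eligible nodes."""
    return eligible_nodes // 2 + 1


def elected(votes: int, eligible_nodes: int) -> bool:
    """A candidate becomes master only if it reaches the quorum."""
    return votes >= quorum(eligible_nodes)


# Partition from the example: node1 is isolated, node2 and node3 see each other.
print(quorum(3))      # 2
print(elected(2, 3))  # True: node3 has the votes of node2 and node3
print(elected(1, 3))  # False: node1 only has its own vote
print(quorum(4))      # 3
```

With 4 eligible nodes the quorum rises to 3, so the cluster still tolerates only one lost node, which is why an odd number of master-eligible nodes is preferred.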
3. Summary
What is the role of the master eligible node?
Participate in master election
The elected master node manages the cluster state, manages shard information, and handles requests to create and delete indices
What is the role of the data node?
CRUD of data
What is the role of the coordinating node?
Route requests to other nodes
Combine the query results and return them to the user
2. Cluster distributed storage
When new documents are added, they should be spread across different shards to keep the data balanced. So how does the coordinating node decide which shard a document should be stored in?
1. Shard storage test
2. Shard storage principle
Elasticsearch uses a hash algorithm to calculate which shard a document should be stored in:
shard = hash(_routing) % number_of_shards
Notes:
- _routing defaults to the id of the document
- The algorithm depends on the number of shards, so once the index is created, its number of primary shards cannot be changed!
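To see why the shard count is fixed, the routing rule can be sketched as follows. This is a minimal illustration, not Elasticsearch's implementation: `pick_shard` is a hypothetical helper, and `zlib.crc32` is a stand-in chosen only because it is deterministic and in the standard library, whereas Elasticsearch uses its own hash function internally.

```python
import zlib


def pick_shard(routing: str, number_of_shards: int) -> int:
    # Stand-in hash (assumption): any deterministic hash shows the same effect.
    return zlib.crc32(routing.encode("utf-8")) % number_of_shards


doc_ids = [str(i) for i in range(1, 11)]
with_3 = {d: pick_shard(d, 3) for d in doc_ids}
with_4 = {d: pick_shard(d, 4) for d in doc_ids}

# Ids whose target shard would change if the index went from 3 to 4 shards.
moved = [d for d in doc_ids if with_3[d] != with_4[d]]
print(with_3)
print(f"{len(moved)} of {len(doc_ids)} ids map to a different shard with 4 shards")
```

Any id in `moved` would be looked up on a shard that does not hold it, which is the reason the number of shards cannot simply be changed after the index exists.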
The process of adding new documents is as follows:
Interpretation:
1) Add a document with id=1
2) Hash the id; if the result is 2, the document belongs in shard-2
3) The primary shard of shard-2 is on node3, so the request is routed to node3
4) node3 saves the document
5) The write is synchronized to replica-2, the replica of shard-2, which is on node2
6) The result is returned to the coordinating node
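The steps above can be mimicked with an in-memory model. This is a rough sketch under simplifying assumptions: the `Node` class and `index_document` function are invented for illustration, the shard number is taken as given (step 2), and real replication, acknowledgements, and failure handling are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    # Shard copies held by this node, e.g. {"P-2": {doc_id: doc}}.
    copies: dict = field(default_factory=dict)


def index_document(doc_id, doc, shard, primary: Node, replica: Node) -> dict:
    # Steps 3-4: route to the node holding the primary shard and save there first.
    primary.copies.setdefault(f"P-{shard}", {})[doc_id] = doc
    # Step 5: copy the write to the replica of the same shard.
    replica.copies.setdefault(f"R-{shard}", {})[doc_id] = doc
    # Step 6: report back to the coordinating node.
    return {"_id": doc_id, "shard": shard,
            "primary": primary.name, "replica": replica.name}


node2, node3 = Node("node2"), Node("node3")
# Step 2 is assumed to have produced shard 2 for id=1, as in the walkthrough.
result = index_document("1", {"title": "hello"}, 2, primary=node3, replica=node2)
print(result)
```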
3. Cluster distributed query
An Elasticsearch query is divided into two phases:
- scatter phase: the coordinating node distributes the request to every shard
- gather phase: the coordinating node collects the search results from the data nodes, merges them into the final result set, and returns it to the user
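The two phases can be sketched with plain lists. The shard contents, scores, and the `search_shard` and `search` helpers below are hypothetical and exist only for this illustration; a real query also involves fetching the document bodies, which is left out here.

```python
import heapq

# Hypothetical per-shard contents: (doc_id, score) pairs.
SHARDS = {
    0: [("a", 0.9), ("d", 0.4)],
    1: [("b", 0.7), ("e", 0.2)],
    2: [("c", 0.8), ("f", 0.1)],
}


def search_shard(shard_id: int, size: int) -> list:
    # Each shard returns its own local top `size` hits.
    return heapq.nlargest(size, SHARDS[shard_id], key=lambda hit: hit[1])


def search(size: int) -> list:
    # Scatter: send the request to every shard.
    partial = [search_shard(shard_id, size) for shard_id in SHARDS]
    # Gather: merge the partial lists and keep the global top `size`.
    merged = [hit for hits in partial for hit in hits]
    return heapq.nlargest(size, merged, key=lambda hit: hit[1])


print(search(3))  # [('a', 0.9), ('c', 0.8), ('b', 0.7)]
```

Each shard has to return its own top `size` hits because the coordinating node cannot know in advance which shard holds the best matches.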
4. Cluster failover
The master node of the cluster monitors the status of the nodes in the cluster. If it finds that a node is down, it immediately migrates the shard data of that node to other nodes to keep the data safe. This is called failover.
1) For example, a cluster structure as shown in the figure:
Here node1 is the master node and the other two nodes are non-master nodes.
2) Suddenly, node1 fails:
The first step after the failure is to re-elect a master; suppose node2 is chosen:
After node2 becomes the master node, it checks the cluster state and finds that shard-1 and shard-0 no longer have a replica. The data that was on node1 therefore needs to be migrated to node2 and node3:
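The recovery can be sketched on a toy model. The shard layout below is an assumption, since the original figure is not available: each node is taken to hold one primary (P) and one replica (R) of a different shard. The `fail_over` function is a hypothetical helper and ignores everything a real cluster does beyond promoting and re-creating shard copies.

```python
# Assumed layout: one primary (P) and one replica (R) per node.
cluster = {
    "node1": {"P-0", "R-1"},
    "node2": {"P-1", "R-2"},
    "node3": {"P-2", "R-0"},
}


def fail_over(cluster: dict, failed: str) -> dict:
    # Work on copies so the original layout is left untouched.
    survivors = {n: set(c) for n, c in cluster.items() if n != failed}
    for lost in sorted(cluster[failed]):
        kind, shard = lost.split("-")
        if kind == "P":
            # The primary is gone: promote the surviving replica to primary.
            for copies in survivors.values():
                if f"R-{shard}" in copies:
                    copies.remove(f"R-{shard}")
                    copies.add(f"P-{shard}")
        # The shard now has a single copy; rebuild a replica on another node.
        holder = next(n for n, c in survivors.items() if f"P-{shard}" in c)
        target = min((n for n in survivors if n != holder),
                     key=lambda n: len(survivors[n]))
        survivors[target].add(f"R-{shard}")
    return survivors


print(fail_over(cluster, "node1"))
```

Under this assumed layout, shard-0 and shard-1 are the two shards left with a single copy after node1 fails, and after recovery every shard again has a primary and a replica on different nodes.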