[ElasticSearch] (9) - cluster problems

Table of contents

1. Cluster split brain problem
   1. Division of cluster responsibilities
   2. Split brain problem
   3. Summary
2. Cluster distributed storage
   1. Shard storage test
   2. Shard storage principle
3. Cluster distributed query
4. Cluster failover


1. Cluster split brain problem

1. Division of cluster responsibilities

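Cluster nodes in Elasticsearch have different responsibilities:

  • master eligible: can be elected master; the master manages the cluster state, decides shard allocation, and handles requests to create and delete indices

  • data: stores data and performs search, aggregation, and CRUD operations

  • ingest: preprocesses documents before they are stored

  • coordinating: routes requests to other nodes, merges the results they return, and sends them back to the user

By default, every node in the cluster has all four roles at the same time.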

In a real cluster, however, these responsibilities should be separated:

  • master node: high CPU requirements, but low memory requirements

  • data node: high CPU and memory requirements

  • coordinating node: high network bandwidth and CPU requirements

Separating duties lets us deploy each type of node on hardware that matches its needs, and it keeps the services from interfering with one another.
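Here is a sketch of how this separation is usually declared. The snippet prints the role setting for each dedicated node type. It assumes Elasticsearch 7.9 or later, where a node's roles are listed in `node.roles` in `elasticsearch.yml`. Older 7.x versions use the `node.master`, `node.data` and `node.ingest` booleans instead. The `DEDICATED_ROLES` table and the `render_roles` helper are hypothetical names used only for illustration.

```python
# Hypothetical table: dedicated node type -> roles it should carry
# (assumes Elasticsearch >= 7.9 and the `node.roles` setting).
DEDICATED_ROLES = {
    "master": ["master"],  # high CPU, low memory: cluster state, shard allocation
    "data": ["data"],  # high CPU and memory: stores shards, CRUD, search
    "ingest": ["ingest"],  # preprocessing before documents are stored
    "coordinating": [],  # no roles at all: only routes requests and merges results
}


def render_roles(node_type: str) -> str:
    """Return the elasticsearch.yml line for a dedicated node of this type."""
    roles = DEDICATED_ROLES[node_type]
    if not roles:
        return "node.roles: [ ]"  # an empty list makes a coordinating-only node
    return "node.roles: [ " + ", ".join(roles) + " ]"


for node_type in DEDICATED_ROLES:
    print(f"# {node_type} node")
    print(render_roles(node_type))
```

Each printed line would go into the `elasticsearch.yml` of the matching machine, so the hardware can be sized per role as described above.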

A typical division of responsibilities in an Elasticsearch cluster is shown in the figure:

 

 

2. Split brain problem

Split brain is caused by nodes in the cluster losing their network connection to one another.

For example, in a cluster, the master node loses connection with other nodes:

At this time, node2 and node3 think that node1 is down, and they will re-elect the master:

 

After node3 is elected, the cluster continues to serve requests. Now node2 and node3 form one cluster, and node1 forms another cluster on its own. The data in the two clusters is not synchronized, so the data diverges.

When the network recovers, the cluster contains two master nodes and its state is inconsistent. This is the split-brain situation:

 

The solution to split brain is to require votes from more than half of the master-eligible nodes before a candidate can be elected master. That threshold is at least (number of eligible nodes + 1) / 2 votes, so the number of eligible nodes should preferably be odd. Before 7.0 this threshold was configured with discovery.zen.minimum_master_nodes. Since es7.0 the voting quorum is managed automatically and that setting is ignored, so split brain generally no longer occurs.

For example, in a cluster of 3 nodes a candidate needs at least (3 + 1) / 2 = 2 votes. node3 gets the votes of node2 and node3 and is elected master. node1 has only its own vote and is not elected. The cluster still has only one master node, so there is no split brain.
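The majority rule can be checked with a small simulation. This is a minimal sketch, not Elasticsearch's actual election code. It assumes each node can only vote for a candidate in its own network partition. A partition can elect a master only if it holds a strict majority of the master-eligible nodes. `quorum` and `partitions_that_can_elect` are hypothetical helpers.

```python
def quorum(eligible_nodes: int) -> int:
    """Smallest strict majority of the master-eligible nodes.

    For an odd count this equals (eligible_nodes + 1) / 2.
    """
    return eligible_nodes // 2 + 1


def partitions_that_can_elect(partitions, eligible_nodes):
    """Return the partitions whose nodes alone can reach the quorum."""
    needed = quorum(eligible_nodes)
    return [p for p in partitions if len(p) >= needed]


# The example above: node1 is cut off from node2 and node3.
three_nodes = [{"node1"}, {"node2", "node3"}]
print(quorum(3))  # 2
print(partitions_that_can_elect(three_nodes, 3))  # only the node2/node3 side

# With 4 eligible nodes a 2/2 split leaves neither side with 3 votes,
# so no master is elected: the fourth node adds no extra fault tolerance.
four_nodes = [{"node1", "node2"}, {"node3", "node4"}]
print(partitions_that_can_elect(four_nodes, 4))  # []
```

At most one partition can ever hold a strict majority, which is why two masters cannot coexist under this rule.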

3. Summary

What is the role of a master-eligible node?

  • Participate in master election

  • The elected master manages the cluster state and shard allocation information, and handles requests to create and delete indices

What is the role of the data node?

  • CRUD of data

What is the role of the coordinating node?

  • Route requests to other nodes

  • Combine the query results and return them to the user

2. Cluster distributed storage

When new documents are added, they should be spread across different shards to keep the data balanced. So how does the coordinating node determine which shard a document should be stored in?

1. Shard storage test

2. Shard storage principle

Elasticsearch uses a hash algorithm to calculate which shard a document should be stored in:

shard = hash(_routing) % number_of_shards

Explanation:

  • _routing defaults to the id of the document

  • Because the result depends on the number of shards, the number of shards cannot be modified once the index is created!
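Here is a minimal sketch of the routing formula that shows why the shard count is fixed. Elasticsearch itself hashes the routing value with Murmur3. This sketch swaps in `zlib.crc32` as a stand-in hash, so the concrete shard numbers will not match a real cluster. `shard_for` is a hypothetical helper.

```python
import zlib
from typing import Optional


def stand_in_hash(routing: str) -> int:
    # Stand-in for Elasticsearch's Murmur3 hash: CRC32 is deterministic and in the stdlib.
    return zlib.crc32(routing.encode("utf-8"))


def shard_for(doc_id: str, number_of_shards: int, routing: Optional[str] = None) -> int:
    """shard = hash(_routing) % number_of_shards, where _routing defaults to the document id."""
    _routing = routing if routing is not None else doc_id
    return stand_in_hash(_routing) % number_of_shards


doc_ids = [str(i) for i in range(1, 21)]
placed_with_3 = {d: shard_for(d, 3) for d in doc_ids}
placed_with_4 = {d: shard_for(d, 4) for d in doc_ids}

# If the index were changed from 3 to 4 shards, these ids would hash to a
# different shard than the one they were written to, so lookups would miss them.
moved = [d for d in doc_ids if placed_with_3[d] != placed_with_4[d]]
print(f"{len(moved)} of {len(doc_ids)} documents would be looked up in the wrong shard")
```

Documents that share a custom routing value always land in the same shard, whatever their ids are.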

The process of adding new documents is as follows:

Interpretation:

  • 1) Add a document with id=1

  • 2) Hash the id; if the result is 2, the document should be stored in shard-2

  • 3) The primary shard of shard-2 is on node3, so the request is routed to node3

  • 4) node3 saves the document

  • 5) The document is synchronized to replica-2 of shard-2, which is on node2

  • 6) The result is returned to the coordinating node
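The six steps can be simulated in a few lines. This is a simplified sketch, not Elasticsearch internals. The `Node` class, the `routing_table` layout and `index_document` are hypothetical. The shard number is taken as given (2, the result assumed in step 2) rather than computed.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    name: str
    # shard number -> {doc_id: document} for every shard copy held on this node
    store: Dict[int, Dict[str, dict]] = field(default_factory=dict)

    def save(self, shard: int, doc_id: str, doc: dict) -> None:
        self.store.setdefault(shard, {})[doc_id] = doc


def index_document(nodes: Dict[str, Node], routing_table: dict, shard: int,
                   doc_id: str, doc: dict) -> List[str]:
    """Write a document to its primary shard, copy it to the replicas, and log each step."""
    steps = []
    primary = nodes[routing_table[shard]["primary"]]
    steps.append(f"route id={doc_id} to {primary.name}, primary of shard-{shard}")
    primary.save(shard, doc_id, doc)
    steps.append(f"saved on {primary.name}")
    for replica_name in routing_table[shard]["replicas"]:
        nodes[replica_name].save(shard, doc_id, doc)
        steps.append(f"synchronized to replica-{shard} on {replica_name}")
    steps.append("result returned to the coordinating node")
    return steps


cluster = {name: Node(name) for name in ("node1", "node2", "node3")}
routing_table = {2: {"primary": "node3", "replicas": ["node2"]}}
for step in index_document(cluster, routing_table, shard=2, doc_id="1", doc={"title": "hello"}):
    print(step)
```

Writing to the primary first and only then copying to replicas keeps the primary as the single source of truth for each shard.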

3. Cluster distributed query

An Elasticsearch query is divided into two phases:

  • scatter phase: the coordinating node distributes the request to every shard

  • gather phase: the coordinating node collects the search results from the data nodes, merges them into the final result set, and returns it to the user
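Here is a minimal sketch of scatter/gather ranking. It assumes each shard can score its own documents locally. Every shard returns its local top `size` hits, and the coordinating node merges them into the global top `size`. The shard contents and scores are made-up illustration data. Real Elasticsearch also runs a separate fetch round to load the matching documents, which is omitted here.

```python
import heapq

# Made-up data: each shard holds (doc_id, score) pairs.
SHARDS = {
    0: [("a", 0.9), ("b", 0.4)],
    1: [("c", 0.7), ("d", 0.95)],
    2: [("e", 0.2), ("f", 0.8)],
}


def query_shard(hits, size):
    """Scatter step: a shard returns only its own local top-`size` hits."""
    return heapq.nlargest(size, hits, key=lambda hit: hit[1])


def search(shards, size):
    """Coordinating node: scatter to every shard, then gather and merge the global top-`size`."""
    partial = [query_shard(hits, size) for hits in shards.values()]  # scatter
    merged = [hit for shard_hits in partial for hit in shard_hits]  # gather
    return heapq.nlargest(size, merged, key=lambda hit: hit[1])


print(search(SHARDS, size=3))  # [('d', 0.95), ('a', 0.9), ('f', 0.8)]
```

Asking each shard for `size` hits is enough. Any document in the global top `size` must also be in the top `size` of its own shard.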

 

4. Cluster failover

The master node monitors the status of the nodes in the cluster. If it finds that a node is down, it immediately migrates that node's shard data to other nodes to keep the data safe. This is called failover.

1) For example, consider the cluster structure shown in the figure:

 

Now, node1 is the master node and the other two nodes are ordinary (non-master) nodes.

2) Suddenly, node1 fails:

The first thing that happens after the failure is a new master election; suppose node2 is elected:

 

After node2 becomes the master node, it checks the cluster state and finds that shard-1 and shard-0 no longer have replicas. Therefore, the data that was on node1 needs to be migrated to node2 and node3:
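The failover steps can be sketched as follows. This is a simplified model, not Elasticsearch's allocation algorithm. The shard layout is hypothetical, chosen to match the text: node1 holds a copy of both shard-0 and shard-1. `min(live)` stands in for a real master election. A missing replica is placed on the live node that holds the fewest shard copies.

```python
def copies_on(node, allocation):
    """Number of shard copies (primary or replica) currently held by `node`."""
    return sum(node == c["primary"] or node in c["replicas"] for c in allocation.values())


def fail_over(nodes, master, allocation, failed, replicas_per_shard=1):
    """Remove `failed`, re-elect a master if needed, and restore the replica count."""
    live = [n for n in nodes if n != failed]
    if master == failed:
        master = min(live)  # stand-in for a real election among eligible nodes
    for copies in allocation.values():
        copies["replicas"] = [n for n in copies["replicas"] if n != failed]
        if copies["primary"] == failed:
            if not copies["replicas"]:
                copies["primary"] = None  # no surviving copy: the shard is unassigned
                continue
            copies["primary"] = copies["replicas"].pop(0)  # promote a surviving replica
        while len(copies["replicas"]) < replicas_per_shard:
            holders = {copies["primary"], *copies["replicas"]}
            candidates = [n for n in live if n not in holders]
            if not candidates:
                break  # too few nodes to place another copy
            target = min(candidates, key=lambda n: (copies_on(n, allocation), n))
            copies["replicas"].append(target)  # new replica, copied from the primary
    return live, master


# Hypothetical layout: each node holds one primary and one replica.
cluster_nodes = ["node1", "node2", "node3"]
allocation = {
    0: {"primary": "node1", "replicas": ["node3"]},
    1: {"primary": "node2", "replicas": ["node1"]},
    2: {"primary": "node3", "replicas": ["node2"]},
}
live, master = fail_over(cluster_nodes, "node1", allocation, failed="node1")
print(master)  # node2
for shard, copies in allocation.items():
    print(f"shard-{shard}: primary={copies['primary']}, replicas={copies['replicas']}")
```

In this model shard-0's replica on node3 is promoted to primary. Shard-0 and shard-1 then get new replicas on node2 and node3, which mirrors the migration described above.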

 

 

 

 

 


Origin blog.csdn.net/a6470831/article/details/125667996