[Liangshan Heroes Explain IT] Understanding the split-brain problem

This series illustrates IT concepts through examples drawn from the Liangshan heroes of Water Margin.

1. Problem Description

Split-brain happens when an abnormal condition (typically a network partition) splits a cluster into two sub-clusters. Each sub-cluster elects its own leader (master) node, and the two leaders then compete for the same resources, which leads to failures.
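To make the failure concrete, here is a minimal Java sketch of an election with no split-brain protection. It assumes a hypothetical rule where each sub-cluster simply elects the lowest-numbered node it can still reach; `NaiveElection` and `electLeader` are illustrative names, not part of any real system.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical illustration: a naive election with NO split-brain protection.
class NaiveElection {
    // Each sub-cluster elects the smallest node id it can still reach.
    static int electLeader(List<Integer> reachableNodes) {
        return Collections.min(reachableNodes);
    }

    public static void main(String[] args) {
        // A 6-node cluster {1..6} is partitioned into two sub-clusters.
        List<Integer> left = List.of(1, 2, 3);
        List<Integer> right = List.of(4, 5, 6);
        int leaderLeft = electLeader(left);
        int leaderRight = electLeader(right);
        // Both sides now believe they own the leader: split-brain.
        System.out.println("Left leader: " + leaderLeft + ", right leader: " + leaderRight);
    }
}
```

With the cluster split 3 + 3, it prints `Left leader: 1, right leader: 4`, that is, two leaders at the same time.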

2. How to solve it?
In summary, there are several common approaches (in software or in hardware): intervention by a higher authority, a dedicated direct link, a majority mechanism, and a longer election timeout.
Suppose the 108 Liangshan heroes become separated into two groups of 54, one led by Song Jiang and the other by Lu Junyi. Each group believes the other half has perished, so each sets up its own Zhongyi Hall (Hall of Loyalty and Righteousness) on Liangshan.
When the two sides meet again, how do they decide who the boss is?

a. Higher intervention
Use an additional arbitration node. The two sides agree in advance that, when the direct connection between them breaks, they will consult a common third node to check whether only the direct connection has failed.
Back at Liangshan, this means both groups go to the temple of the Nine Heavens and let the goddess Xuannv (the Mysterious Lady of the Nine Heavens) decide who the boss is.
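A minimal Java sketch of the arbitration idea, assuming a simple "first claimant wins" policy at the agreed third node. The `Arbiter` class and its methods are hypothetical; real arbitration schemes may use other rules, but the essential property is the same: only one side can ever be granted leadership.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical arbiter ("higher intervention"): a third node both sides agreed on in advance.
class Arbiter {
    private final AtomicReference<String> leader = new AtomicReference<>();

    // Grants leadership to the first side that asks; a repeat claim by that same side
    // also succeeds, while any other side is refused.
    boolean claimLeadership(String side) {
        return leader.compareAndSet(null, side) || side.equals(leader.get());
    }

    String currentLeader() {
        return leader.get();
    }

    public static void main(String[] args) {
        Arbiter xuannv = new Arbiter();
        System.out.println(xuannv.claimLeadership("Song Jiang")); // true
        System.out.println(xuannv.claimLeadership("Lu Junyi"));   // false
        System.out.println(xuannv.currentLeader());               // Song Jiang
    }
}
```

`compareAndSet` makes the grant atomic, so even if both sides ask at the same moment only one can succeed.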

b. Dedicated direct link
Suppose Song Jiang's group includes Xie Zhen and Lu Junyi's group includes his brother Xie Bao, and the two brothers share a dedicated heartbeat connection. Through it, each group knows the other has not actually perished, so neither side elects a boss on its own; instead they wait until they are reunited and then hold an internal election.
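A minimal Java sketch of the dedicated-link idea, assuming a peer is declared dead only when both the ordinary network link and the dedicated heartbeat link have been silent longer than a timeout. `PeerMonitor` and its methods are hypothetical; timestamps are passed in explicitly so the behavior is deterministic.

```java
// Hypothetical monitor: a peer counts as dead only if BOTH links have gone silent.
class PeerMonitor {
    private final long timeoutMillis;
    private long lastNetworkMsg; // last message over the normal network
    private long lastHeartbeat;  // last heartbeat over the dedicated link

    PeerMonitor(long timeoutMillis, long now) {
        this.timeoutMillis = timeoutMillis;
        this.lastNetworkMsg = now;
        this.lastHeartbeat = now;
    }

    void onNetworkMessage(long now) { lastNetworkMsg = now; }

    void onHeartbeat(long now) { lastHeartbeat = now; }

    boolean peerAlive(long now) {
        boolean networkOk = now - lastNetworkMsg <= timeoutMillis;
        boolean heartbeatOk = now - lastHeartbeat <= timeoutMillis;
        return networkOk || heartbeatOk;
    }

    // Elect a local leader only when the peer really seems to be gone.
    boolean mayElectLocally(long now) {
        return !peerAlive(now);
    }

    public static void main(String[] args) {
        PeerMonitor m = new PeerMonitor(1000, 0);
        // The main network fails at t=0, but Xie Zhen and Xie Bao keep exchanging heartbeats.
        m.onHeartbeat(1500);
        System.out.println(m.mayElectLocally(2000)); // false: heartbeat at 1500 is recent
        System.out.println(m.mayElectLocally(3000)); // true: both links silent for > 1000 ms
    }
}
```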

c. Majority mechanism
Only a sub-cluster containing more than half of the original cluster's nodes is allowed to elect a leader.
At Liangshan, each group has exactly 54 people, which is not more than half of 108, so neither side may hold an election; they can only elect a boss once they are reunited.
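The rule above is a single comparison against the original cluster size. A minimal Java sketch (class and method names hypothetical), applied to the Liangshan numbers and to a hypothetical uneven 55 / 53 split:

```java
// Majority rule: a sub-cluster may elect a leader only if it holds strictly more than
// half of the ORIGINAL cluster size, not of the nodes it can currently see.
class Majority {
    static boolean mayElect(int subClusterSize, int originalSize) {
        return subClusterSize > originalSize / 2;
    }

    public static void main(String[] args) {
        int heroes = 108;
        System.out.println(mayElect(54, heroes)); // false: 54 is exactly half
        System.out.println(mayElect(55, heroes)); // true: 55 > 54
        System.out.println(mayElect(53, heroes)); // false
    }
}
```

Because two disjoint groups cannot both exceed half of the same total, at most one side can ever pass this check.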

d. Longer election timeout
For example, Elasticsearch (ES) has a parameter for this: if the master node does not respond to the other nodes within n seconds, they assume by default that the master has failed. This timeout can be increased.
At Liangshan, suppose the original agreement was that if the groups lost contact for a month, each would elect its own boss. Now the waiting period before an election is extended to one year, and it is quite likely that the groups will have reunited within that year.
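A minimal Java sketch of a timeout-based failure detector, assuming the master is presumed dead once it has been silent longer than the configured timeout. `MasterFailureDetector` is a hypothetical name, and the 3 s / 6 s values are chosen only for illustration.

```java
// Hypothetical failure detector: start an election once the master has been
// silent for longer than the configured timeout.
class MasterFailureDetector {
    private final long timeoutMillis;

    MasterFailureDetector(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    boolean shouldStartElection(long lastResponseMillis, long nowMillis) {
        return nowMillis - lastResponseMillis > timeoutMillis;
    }

    public static void main(String[] args) {
        long silence = 4000; // the master is unreachable for 4 s, then recovers
        MasterFailureDetector shortTimeout = new MasterFailureDetector(3000);
        MasterFailureDetector longTimeout = new MasterFailureDetector(6000);
        System.out.println(shortTimeout.shouldStartElection(0, silence)); // true: premature election
        System.out.println(longTimeout.shouldStartElection(0, silence));  // false: the hiccup is tolerated
    }
}
```

The trade-off is that a longer timeout also delays failover when the master really has died.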

3. Specific cases
a. ZooKeeper solves it with the "majority mechanism"
During leader election, a zkServer can become the leader only if it wins more than half of the votes.
The key code:
return (set.size() > n / 2);
n is the number of zkServers in the original cluster, and
set holds the zkServers in the current sub-cluster (after the failure) that voted for the candidate.
The key point is that a leader must win more than half of the original cluster. This guarantees that, however the cluster splits, either no sub-cluster has a leader or exactly one does (the one holding more than half). For example, with 6 servers originally, n / 2 = 3. If the cluster splits into two sub-clusters of 3, neither has a leader; if it splits into 4 and 2, the sub-cluster of 4 elects the leader.
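The guarantee can be checked exhaustively for the 6-server example. This Java sketch is modeled loosely on the key line quoted above (ZooKeeper keeps a similar check in `QuorumMaj.containsQuorum`), but `QuorumCheck` is a simplified, hypothetical class, and it assumes every server in a sub-cluster votes for the same candidate.

```java
import java.util.HashSet;
import java.util.Set;

// Simplified quorum check; not the actual ZooKeeper class.
class QuorumCheck {
    private final int n; // number of zkServers in the original cluster

    QuorumCheck(int n) { this.n = n; }

    // set: ids of servers in this sub-cluster that voted for the candidate
    boolean containsQuorum(Set<Long> set) {
        return set.size() > n / 2;
    }

    public static void main(String[] args) {
        QuorumCheck q = new QuorumCheck(6);
        // Try every split of 6 servers into two sub-clusters of sizes k and 6 - k.
        for (int k = 0; k <= 6; k++) {
            Set<Long> a = new HashSet<>();
            Set<Long> b = new HashSet<>();
            for (long id = 1; id <= 6; id++) {
                if (id <= k) a.add(id); else b.add(id);
            }
            int leaders = (q.containsQuorum(a) ? 1 : 0) + (q.containsQuorum(b) ? 1 : 0);
            System.out.println(k + " + " + (6 - k) + " -> leaders: " + leaders);
        }
    }
}
```

Every split prints 1 leader except `3 + 3 -> leaders: 0`; no split ever yields 2.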

b. Elasticsearch tunes parameters
discovery.zen.ping_timeout: 3s
This parameter means that if the master node does not respond to the other nodes within 3 seconds, it is assumed to be down by default. Raising it somewhat, for example to 5-6 s, reduces the probability of split-brain.
discovery.zen.minimum_master_nodes: 1
This parameter means that a new master node is elected only when the number of master-eligible nodes that consider the master to be down reaches this value. For example, if an ES cluster has three master-eligible nodes and all three consider the master to be down, they hold an election; if the parameter were set to 4, no election would take place.
Raising this value appropriately reduces the probability of split-brain. The official recommendation is (n / 2) + 1, where n is the number of master-eligible nodes (nodes with node.master: true).
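A minimal Java sketch of the (n / 2) + 1 recommendation, assuming a partition side may elect a master only if it sees at least `minimum_master_nodes` master-eligible nodes. The helper names are hypothetical and this simplifies the real zen discovery behavior (used up to Elasticsearch 6.x; 7.x manages its quorum automatically and ignores this setting).

```java
// Hypothetical helpers around the minimum_master_nodes recommendation.
class MinimumMasterNodes {
    static int recommended(int masterEligibleNodes) {
        return masterEligibleNodes / 2 + 1;
    }

    // A side may elect a master only if it sees at least minimumMasterNodes master-eligible nodes.
    static boolean mayElectMaster(int visibleMasterEligible, int minimumMasterNodes) {
        return visibleMasterEligible >= minimumMasterNodes;
    }

    public static void main(String[] args) {
        int n = 3;                // three nodes with node.master: true
        int min = recommended(n); // 3 / 2 + 1 = 2
        System.out.println("minimum_master_nodes = " + min);
        // Partition into 2 + 1: only the side with 2 master-eligible nodes may elect.
        System.out.println(mayElectMaster(2, min)); // true
        System.out.println(mayElectMaster(1, min)); // false
        // With the default value of 1, both sides could elect a master: split-brain.
        System.out.println(mayElectMaster(1, 1) && mayElectMaster(2, 1)); // true
    }
}
```

Note that "at least (n / 2) + 1" is the same condition as ZooKeeper's "more than n / 2".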

Source: www.cnblogs.com/rossiXYZ/p/11755955.html