What is split brain, and how does Zookeeper solve it?

What is split brain

Split brain means that what was originally one "brain" splits into two or more "brains". If a person had several brains acting independently of each other, the body would flail about and go out of control.

Split brain typically occurs in clustered environments such as ElasticSearch or Zookeeper clusters. What these clusters have in common is that each has a single "brain": the Master node in an ElasticSearch cluster, and the Leader node in a Zookeeper cluster.

This article describes the split-brain problem in Zookeeper and how Zookeeper solves it.

Zookeeper cluster split brain scenarios

To improve a cluster's availability, we usually deploy it across multiple machine rooms. For example, suppose a cluster of six zkServers is deployed in two rooms:

image.png

Under normal circumstances this cluster has only one Leader. If the network between the two rooms is cut, the zkServers inside each room can still communicate with each other. If we ignore the more-than-half mechanism for now, each room will elect its own Leader.

image.png

The original single cluster is now effectively divided into two clusters with two "brains". This is split brain.

What was supposed to be one cluster providing a unified service to the outside has become two clusters providing service at the same time. If the network is restored after a while, a problem appears: both clusters have been serving requests, so how should their data be merged, and how should conflicting data be resolved?

Note that the scenario above rests on one premise: the more-than-half mechanism was not taken into account. In practice a Zookeeper cluster does not readily suffer from split brain, and the reason is precisely the more-than-half mechanism.

The more-than-half mechanism

During leader election, a zkServer becomes the Leader once it has won more than half of the votes.

The source code that implements the more-than-half mechanism is actually very simple:

public class QuorumMaj implements QuorumVerifier {
    private static final Logger LOG = LoggerFactory.getLogger(QuorumMaj.class);
    
    int half;
    
    // n is the number of zkServers in the cluster (strictly speaking, the number of participants; observer nodes are not participants)
    public QuorumMaj(int n){
        this.half = n/2;
    }

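    // Check whether the votes satisfy the more-than-half mechanism
    public boolean containsQuorum(Set<Long> set){
        // half is assigned in the constructor
        // set.size() is the number of votes a zkServer has received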
        return (set.size() > half);
    }
    
}

Look carefully at the comments in the code above. The core is the following two lines:

this.half = n/2;
return (set.size() > half);

Here is a simple example. If the cluster has five zkServers, then half = 5/2 = 2. During leader election, at least three zkServers must vote for the same zkServer before the more-than-half condition is met and a Leader is elected.
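
The arithmetic in this example can be checked with a small standalone sketch. The class name QuorumSizes and the helper minVotesForQuorum are hypothetical and are not part of Zookeeper; the sketch only mirrors the n/2 and "> half" logic from the excerpt above.

```java
// Hypothetical helper, not part of Zookeeper: mirrors the n/2 and "> half" logic.
class QuorumSizes {

    // Same test as the excerpt: strictly more than half of the participants.
    static boolean containsQuorum(int votes, int participants) {
        int half = participants / 2; // integer division, e.g. 5 / 2 == 2
        return votes > half;
    }

    // Smallest number of votes that passes the check above.
    static int minVotesForQuorum(int participants) {
        return participants / 2 + 1;
    }

    public static void main(String[] args) {
        for (int n = 3; n <= 7; n++) {
            System.out.println(n + " participants -> at least "
                    + minVotesForQuorum(n) + " votes needed");
        }
    }
}
```

Because of integer division, a cluster of five and a cluster of four both need three votes, while a cluster of six needs four.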

Now consider a question about the election: why use a more-than-half check at all? Because a Leader can be elected without waiting for every zkServer to vote for the same zkServer, which makes the election faster. This is why it is called the fast leader election algorithm.

Now consider another question: why does the more-than-half check use "greater than" rather than "greater than or equal to"?

The answer is related to split brain. Return to the split-brain scenario above:

image.png

When the network between the rooms is cut, the three servers in room 1 hold an election. The more-than-half condition is set.size() > 3, so at least four zkServers are needed to elect a Leader, and room 1 cannot elect one. Room 2 cannot elect one either. So when the network between the rooms is cut, the entire cluster has no Leader.

If the condition were set.size() >= 3 instead, room 1 and room 2 would each elect a Leader, and split brain would occur. This is why the more-than-half check uses "greater than" rather than "greater than or equal to": to prevent split brain.
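
The difference between the two comparisons can be made concrete with a short sketch of the six-server, two-room partition. The class PartitionCheck and its methods are hypothetical illustrations, not Zookeeper code, and the server ids 1 to 6 are made up for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch, not Zookeeper code: compares ">" with ">=" on a 3/3 partition.
class PartitionCheck {

    // The check used in the excerpt: strictly more than half.
    static boolean strictQuorum(Set<Long> votes, int participants) {
        return votes.size() > participants / 2;
    }

    // The variant discussed above, which would allow split brain.
    static boolean nonStrictQuorum(Set<Long> votes, int participants) {
        return votes.size() >= participants / 2;
    }

    public static void main(String[] args) {
        int participants = 6; // six zkServers in total
        Set<Long> room1 = new HashSet<>(Arrays.asList(1L, 2L, 3L));
        Set<Long> room2 = new HashSet<>(Arrays.asList(4L, 5L, 6L));

        System.out.println("strict, room 1: " + strictQuorum(room1, participants));
        System.out.println("strict, room 2: " + strictQuorum(room2, participants));
        System.out.println("non-strict, room 1: " + nonStrictQuorum(room1, participants));
        System.out.println("non-strict, room 2: " + nonStrictQuorum(room2, participants));
    }
}
```

With these inputs the strict check rejects both rooms, so neither can elect a Leader, while the non-strict check accepts both rooms, which is exactly the split-brain case.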

Now suppose there are only five machines, also deployed in two rooms:

image.png

Now the more-than-half condition is set.size() > 2, so at least three servers are needed to elect a Leader. Cutting the network between the rooms does not affect room 1, which holds the majority of the servers and the current Leader: its Leader remains the Leader. Room 2 cannot elect a Leader, so the entire cluster still has only one Leader.

So we can conclude that, with the more-than-half mechanism, a Zookeeper cluster either has no Leader or has exactly one Leader, which avoids the split-brain problem.
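
This conclusion can be checked by brute force for small clusters. The sketch below is a hypothetical illustration, not Zookeeper code. It assumes the cluster is cut into exactly two groups and counts how many of them satisfy the "greater than half" condition.

```java
// Hypothetical sketch, not Zookeeper code: checks every two-way split of a cluster.
class SplitBrainCheck {

    // Strictly more than half of all participants.
    static boolean hasQuorum(int groupSize, int participants) {
        return groupSize > participants / 2;
    }

    // Number of sides (0, 1, or 2) that could elect a Leader for a given split.
    static int sidesWithQuorum(int room1, int room2) {
        int participants = room1 + room2;
        int count = 0;
        if (hasQuorum(room1, participants)) count++;
        if (hasQuorum(room2, participants)) count++;
        return count;
    }

    // True if no two-way split of any cluster up to maxSize gives both sides a quorum.
    static boolean neverTwoLeaders(int maxSize) {
        for (int n = 1; n <= maxSize; n++) {
            for (int room1 = 0; room1 <= n; room1++) {
                if (sidesWithQuorum(room1, n - room1) > 1) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("6 servers split 3/3: " + sidesWithQuorum(3, 3) + " side(s) with quorum");
        System.out.println("5 servers split 3/2: " + sidesWithQuorum(3, 2) + " side(s) with quorum");
        System.out.println("no split yields two Leaders: " + neverTwoLeaders(100));
    }
}
```

The 3/3 split leaves no side with a quorum and the 3/2 split leaves exactly one, matching the two scenarios in the article. The reason no split can produce two is simple: two disjoint groups cannot both contain more than half of the same set of servers.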


Origin juejin.im/post/5d36c2f25188257f6a209d37