The primary election process in a MongoDB replica set

A MongoDB replica set automatically tolerates the failure of some of its nodes. When a problem occurs in the replica set, an election-related process is triggered and the switch to a new primary node is completed automatically. Each replica set member runs a background heartbeat thread against all nodes in the replica set, and the state detection process is triggered in the following two cases:

  1. A member's heartbeat detection result changes, for example a node goes down or a new node joins;
  2. The state detection process has not run for more than 4s.

The state detection process generally comprises the following steps:

(1) Check whether the node is already in an election; if so, exit this process.
(2) Maintain a list of candidate primary nodes. Every node in the list may be elected primary. Each node checks whether it and the replica set as a whole meet the following conditions:
a. It can see a Majority of the replica set online.
b. Its own priority is greater than 0.
c. It is not an arbiter.
d. Its own opTime is no more than 10s behind that of the most up-to-date node.
e. The cluster information it has stored is the latest.
If all conditions are met, the node adds itself to the candidate primary list; otherwise it removes itself from the list.
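
The candidate check in step (2) can be sketched roughly as follows. This is only an illustration of conditions a-e, not MongoDB source code: the `Member` class, its fields, and the function names are all hypothetical, opTime is simplified to an integer number of seconds, and `me` is assumed to be an online entry of `members`.

```python
from dataclasses import dataclass

MAX_OPTIME_LAG_SECS = 10  # the 10s bound from condition d


@dataclass
class Member:
    """Hypothetical view of one replica set member (not a MongoDB type)."""
    name: str
    priority: float = 1.0
    is_arbiter: bool = False
    optime: int = 0          # last applied operation time, in seconds
    config_version: int = 1
    online: bool = True


def sees_majority(members):
    """Condition a: strictly more than half of all members are online."""
    online = sum(1 for m in members if m.online)
    return online * 2 > len(members)


def is_candidate(me, members):
    """Return True when `me` meets conditions a-e and may join the candidate list."""
    reachable = [m for m in members if m.online]
    latest_optime = max(m.optime for m in reachable)
    latest_config = max(m.config_version for m in reachable)
    return (
        sees_majority(members)                                  # a
        and me.priority > 0                                     # b
        and not me.is_arbiter                                   # c
        and latest_optime - me.optime <= MAX_OPTIME_LAG_SECS    # d
        and me.config_version >= latest_config                  # e
    )
```

With three members whose opTimes are 100, 95 and 80, the first two pass condition d while the third is 20s behind and stays off the list.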

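Check the following conditions; if all are met, step the primary down to a secondary (if the primary to be stepped down is the node itself, it calls the step-down method directly; otherwise it issues the replSetStepDown command to step the replica set's primary down to a secondary):
    a. A primary exists in the cluster.
    b. The candidate primary list contains a node with a higher priority than the current primary.
    c. The highest-priority node in the candidate primary list has an opTime no more than 10s behind the latest opTime of all other nodes.
    d. Check whether the node itself is primary; if it is and it cannot see a Majority of the replica set online, it steps itself down to a secondary.
    e. If no primary is visible in the cluster, check whether the node itself is in the candidate primary list; if not, print a log message and exit this process.
    f. If the node is in the candidate primary list, it starts to decide whether it may send the replica set a notice to elect itself as primary. The decision covers:
         1> Whether it can see a Majority of the replica set online.
         2> Whether it is in the candidate primary list.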
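
As a rough sketch of conditions a-c, the decision to step down the current primary could look like the code below. The `Node` class and `should_step_down_primary` are hypothetical names made up for this example, and opTime is again simplified to seconds. Steps d-f are not covered.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_OPTIME_LAG_SECS = 10  # the 10s bound from condition c


@dataclass
class Node:
    """Hypothetical member record used only for this illustration."""
    name: str
    priority: float
    optime: int  # seconds


def should_step_down_primary(primary: Optional[Node],
                             candidates: List[Node],
                             all_nodes: List[Node]) -> bool:
    # a. a primary must exist (and there must be someone to replace it)
    if primary is None or not candidates:
        return False
    best = max(candidates, key=lambda n: n.priority)
    # b. some candidate has a higher priority than the current primary
    if best.priority <= primary.priority:
        return False
    # c. that candidate is at most 10s behind the latest opTime of the others
    others = [n.optime for n in all_nodes if n is not best]
    latest_other = max(others, default=best.optime)
    return latest_other - best.optime <= MAX_OPTIME_LAG_SECS
```

The point of condition c is that a higher-priority node only displaces the primary when it is nearly caught up, so a priority takeover does not hand the primary role to a node that is far behind.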

If the conditions are satisfied, the node sets its "already in an election" flag to true and enters the "elect self as primary" method.
The method verifies that the node satisfies the following conditions:
a. This thread has acquired the thread lock.
b. This node has no slaveDelay option configured, or slaveDelay is set to 0.
c. This node is not configured as an arbiter.
If they are met, an environment check is run. If any of the following conditions is triggered, the "elect me as primary" vote request is not sent:
1> The current time is earlier than the end of the steppedDown freeze period (the time at which steppedDown was executed plus the configured freeze time; the internal call uses 60s).
2> Its own opTime is not the most recent among all nodes.
3> If another node has a newer opTime than its own, exit this process.
If other nodes are exactly as up to date as itself, it sleeps for a random period for each such node and then continues checking.
a. It has been online for less than 5 min and not all nodes in the replica set are online.
b. If there is no other problem, it tries to obtain its own vote. This step checks whether it has voted within the last 30s; if it has, it exits the whole process.
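
The environment check can be pictured as a function that returns the first reason not to ask for votes. This is a simplified sketch under my own assumptions: the function name, parameters and reason strings are hypothetical, times are plain seconds, and the random sleep used when several nodes are equally up to date is left out.

```python
from typing import List, Optional

FREEZE_SECS = 60          # freeze after a step-down (the 60s in item 1>)
STARTUP_GRACE_SECS = 300  # the 5 min in item a
VOTE_LEASE_SECS = 30      # the 30s in item b


def election_blocker(now: float,
                     stepped_down_at: Optional[float],
                     my_optime: int,
                     other_optimes: List[int],
                     uptime_secs: float,
                     all_nodes_online: bool,
                     last_voted_at: Optional[float]) -> Optional[str]:
    """Return the reason the node must not request votes, or None if it may."""
    if stepped_down_at is not None and now < stepped_down_at + FREEZE_SECS:
        return "in step-down freeze period"
    if any(t > my_optime for t in other_optimes):
        return "another node has a newer opTime"
    if uptime_secs < STARTUP_GRACE_SECS and not all_nodes_online:
        return "recently started and not all nodes are online"
    if last_voted_at is not None and now - last_voted_at < VOTE_LEASE_SECS:
        return "voted within the last 30s"
    return None
```

Returning a reason rather than a bare boolean makes it easy to log why a node stayed out of an election.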
After all these complex checks, the node can finally send the "elect me as primary" vote request to the replica set.
After sending it, the node collects the votes from all nodes. If the votes it receives amount to half or fewer, it does not become primary; if they amount to more than half, it sets itself as primary.
After the vote, it sets the "already in an election" flag back to false.
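
A minimal sketch of the tally and of the flag handling, assuming a hypothetical `Elector` class and a caller-supplied `collect_votes` function that returns the vote counts received from the other nodes:

```python
from typing import Callable, Iterable


class Elector:
    """Hypothetical holder of the "already in an election" flag."""

    def __init__(self, total_votes: int):
        self.total_votes = total_votes
        self.electing = False
        self.is_primary = False

    def run_election(self, collect_votes: Callable[[], Iterable[int]]) -> bool:
        if self.electing:          # an election is already running: exit
            return False
        self.electing = True
        try:
            received = sum(collect_votes())
            # Strictly more than half is required; exactly half is not enough.
            self.is_primary = received * 2 > self.total_votes
            return self.is_primary
        finally:
            self.electing = False  # always cleared once the vote is over
```

For a replica set with 4 votes in total, 2 votes are not enough and 3 are. The `try`/`finally` is one way to guarantee the flag is reset even if collecting votes fails.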
As we can see, some of the decision logic above is checked more than once, but this does not affect the final result. It is probably a consequence of how complex the logic is: before every decision, all conditions must be verified again so that none is missed.
When a node in the replica set receives an "elect me as primary" vote request from another node, it makes the following checks:
a. If its own stored replica set configuration version is too low, it does not vote.
b. If the replica set configuration version stored by the requesting node is too low, it votes against.
c. If the requesting node is not a member of the replica set as this node knows it, it votes against.
d. If a primary already exists in the replica set, it votes against.
e. If any node that can take part in the election has a higher priority than the requesting node, it votes against.
If all checks pass, it obtains its own number of votes (this step also checks whether it has taken part in a vote within the last 30s; if it has, it does not vote) and casts them.
It should be noted that a single vote against ultimately subtracts 10000 votes, so in most cases, as long as one node objects, the requesting node cannot become primary.
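
The voter's side can be sketched as a function that returns how many votes to cast, with a vote against modelled as -10000. The function name and its boolean parameters are hypothetical simplifications of the checks listed above, and "too low" is read here as lower than the other side's configuration version.

```python
VETO_WEIGHT = -10000  # a vote against subtracts 10000, as described above


def decide_vote(my_config_version, requester_config_version, requester_known,
                primary_exists, higher_priority_candidate_exists,
                voted_recently, my_votes=1):
    """Return the number of votes to cast: 0 = abstain, negative = vote against."""
    if my_config_version < requester_config_version:
        return 0                  # a. own config is stale: do not vote
    if requester_config_version < my_config_version:
        return VETO_WEIGHT        # b. requester's config is stale
    if not requester_known:
        return VETO_WEIGHT        # c. requester is not in our replica set
    if primary_exists:
        return VETO_WEIGHT        # d. a primary already exists
    if higher_priority_candidate_exists:
        return VETO_WEIGHT        # e. a better candidate is available
    if voted_recently:
        return 0                  # voted within the last 30s: do not vote
    return my_votes
```

Summing two ordinary votes and one vote against gives `1 + 1 - 10000`, which is far below any majority threshold. This is why one objection is normally enough to block the requesting node.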
The election process is very complicated, but in practice it comes down to two points:
  1. Under normal conditions, electing a primary takes about 5s.
  2. If the newly elected primary goes down immediately, electing a primary again takes at least 30s.

Origin blog.51cto.com/8413723/2436423