Redis Cluster architecture explained (13): Communication failure

5.6. Communication failure

5.6.1. Fault detection

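Every node in the cluster periodically sends PING messages to the other nodes in the cluster. These messages exchange state information and let each node detect the state of the others: online, suspected offline (PFAIL), or offline (FAIL).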


When master node A learns from a message that master node B considers master node D to be in the suspected offline (PFAIL) state, A looks up the clusterNode structure for D in its own clusterState.nodes dictionary and adds B's offline report (failure report) to the fail_reports list of that clusterNode structure:

struct clusterNode {
    // ...

    // Offline reports about this node received from all other nodes
    list *fail_reports;

    // ...
};

Each offline report is represented by a clusterNodeFailReport structure:

typedef struct clusterNodeFailReport {
    // The node that reported the target node as offline
    struct clusterNode *node;

    // Time at which the most recent offline report was received from node
    mstime_t time;
} clusterNodeFailReport;
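As a rough illustration of how such a report list might be maintained, the sketch below records one report per reporting node and refreshes the timestamp when the same node reports again. The types `failReport` and `node`, the singly linked list, and the functions `add_fail_report` and `count_fail_reports` are simplified stand-ins invented for this example; they are not the structures or functions used by Redis itself.

```c
#include <stdlib.h>
#include <string.h>

#define NAME_LEN 40          /* Redis node IDs are 40 hex characters */

typedef long long mstime_t;  /* milliseconds, as in the structs above */

/* Hypothetical stand-in for clusterNodeFailReport: the reporter is
 * identified by name instead of by a clusterNode pointer. */
typedef struct failReport {
    char reporter[NAME_LEN + 1];
    mstime_t time;               /* when the latest report arrived */
    struct failReport *next;
} failReport;

/* Hypothetical stand-in for clusterNode, keeping only the report list. */
typedef struct node {
    char name[NAME_LEN + 1];
    failReport *fail_reports;
} node;

/* Record that `reporter` considers `target` PFAIL at time `now`.
 * Returns 1 if a new report was created, 0 if an existing report was
 * refreshed, and -1 if memory allocation failed. */
int add_fail_report(node *target, const char *reporter, mstime_t now) {
    for (failReport *r = target->fail_reports; r != NULL; r = r->next) {
        if (strcmp(r->reporter, reporter) == 0) {
            r->time = now;       /* same reporter: only update the time */
            return 0;
        }
    }
    failReport *r = malloc(sizeof(*r));
    if (r == NULL) return -1;
    strncpy(r->reporter, reporter, NAME_LEN);
    r->reporter[NAME_LEN] = '\0';
    r->time = now;
    r->next = target->fail_reports;  /* prepend to the list */
    target->fail_reports = r;
    return 1;
}

/* Number of distinct nodes currently reporting `target` as PFAIL. */
int count_fail_reports(const node *target) {
    int n = 0;
    for (const failReport *r = target->fail_reports; r != NULL; r = r->next)
        n++;
    return n;
}
```

Keeping a single entry per reporter means the length of the list is directly the number of nodes that currently suspect the target, which is the figure the FAIL decision depends on.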

If more than half of the master nodes in the cluster report master node D as suspected offline, D is marked as offline (FAIL). The node that marks D as offline then broadcasts a FAIL message about D to the cluster.

Every node that receives this FAIL message immediately marks master node D as offline.

A node is marked as FAIL only when both of the following conditions are met:

1. More than half of the master nodes have marked the node as PFAIL.

2. The current node has also marked the node as PFAIL.
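The two conditions above can be sketched as a single predicate. This is a minimal illustration, not the Redis source: the enum `nodeState`, the function `can_mark_fail`, and its parameters are hypothetical names, and the sketch assumes that the current node's own opinion counts as one vote when the current node is itself a master.

```c
#include <stdbool.h>

/* Hypothetical node states for this sketch. */
typedef enum { NODE_ONLINE, NODE_PFAIL, NODE_FAIL } nodeState;

/* Decide whether the current node may promote a target from PFAIL to FAIL.
 *   own_view       - state the current node itself assigns to the target
 *   reports        - PFAIL reports received from other master nodes
 *   self_is_master - whether the current node's own opinion counts as a vote
 *   total_masters  - number of master nodes in the cluster */
bool can_mark_fail(nodeState own_view, int reports,
                   bool self_is_master, int total_masters) {
    if (own_view != NODE_PFAIL) return false;    /* condition 2 */
    int needed = total_masters / 2 + 1;          /* more than half */
    int votes = reports + (self_is_master ? 1 : 0);
    return votes >= needed;                      /* condition 1 */
}
```

With five masters, for example, two reports from other masters plus the current master's own PFAIL view reach the required three votes, while one report plus its own view does not.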

5.6.2. Electing a new master node

The new master node is chosen from among the slave nodes of the offline master node. The election process is based on the leader election of the Raft protocol.

1. When a slave node finds that its master node has entered the offline state, it broadcasts a CLUSTERMSG_TYPE_FAILOVER_AUTH_REQUEST message, asking every master node that receives the message and has the right to vote to vote for it.

2. If a master node has the right to vote and has not yet voted for another slave node, it returns a CLUSTERMSG_TYPE_FAILOVER_AUTH_ACK message to the slave node that requested the vote, indicating that it supports this slave node becoming the new master node.

3. Each slave node taking part in the election receives CLUSTERMSG_TYPE_FAILOVER_AUTH_ACK messages and counts them to determine how many master nodes support it.

4. If the cluster has N master nodes with the right to vote, a slave node that collects at least N/2 + 1 votes becomes the new master node.

5. If no slave node collects enough votes in a configuration epoch, the cluster enters a new configuration epoch and holds the election again, until a new master node is elected.
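The voting rules in steps 2 and 4 can be sketched as two small functions. The struct `masterVoter` and the functions `grant_vote` and `has_won_election` are hypothetical names chosen for this example; the sketch assumes each election request carries an epoch number and that a master grants at most one vote per epoch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-master voting state. */
typedef struct {
    uint64_t last_vote_epoch;   /* epoch of the last vote granted, 0 if none */
} masterVoter;

/* Step 2: grant at most one vote per epoch. Returns true when the master
 * would reply with CLUSTERMSG_TYPE_FAILOVER_AUTH_ACK. */
bool grant_vote(masterVoter *m, uint64_t request_epoch) {
    if (request_epoch <= m->last_vote_epoch) return false; /* already voted */
    m->last_vote_epoch = request_epoch;
    return true;
}

/* Step 4: a slave wins once it holds at least N/2 + 1 votes, where N is
 * the number of masters with the right to vote (integer division). */
bool has_won_election(int votes, int voting_masters) {
    return votes >= voting_masters / 2 + 1;
}
```

Because N/2 uses integer division, the threshold is 3 for both N = 4 and N = 5, so at most one slave can reach it in any given epoch.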

5.6.3. Failover

When a slave node finds that its master node has entered the offline (FAIL) state, it attempts a failover in order to become the new master.

The failover steps are as follows:

1. One slave node is selected from among all the slave nodes of the offline master node;

2. The selected slave node executes the SLAVEOF NO ONE command and becomes the new master node;

3. The new master node revokes all slot assignments of the offline master node and assigns all of those slots to itself;

4. The new master node broadcasts a PONG message to the cluster to inform the other nodes that it has become the new master node;

5. The new master node starts receiving and handling command requests for the slots it is now responsible for.
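Step 3 amounts to rewriting the owner of every slot that belonged to the offline master. The sketch below illustrates this with a flat owner table over the 16384 hash slots of a Redis Cluster. The `slotTable` type, the integer node ids, and `take_over_slots` are assumptions made for this example and do not reflect how Redis stores slot ownership internally.

```c
#include <stddef.h>

#define CLUSTER_SLOTS 16384   /* Redis Cluster has 16384 hash slots */

/* Hypothetical slot table: owner[i] is the id of the node serving slot i,
 * or -1 when the slot is unassigned. */
typedef struct {
    int owner[CLUSTER_SLOTS];
} slotTable;

/* Step 3: move every slot owned by `old_master` to `new_master`.
 * Returns the number of slots that changed hands. */
int take_over_slots(slotTable *t, int old_master, int new_master) {
    int moved = 0;
    for (size_t i = 0; i < CLUSTER_SLOTS; i++) {
        if (t->owner[i] == old_master) {
            t->owner[i] = new_master;
            moved++;
        }
    }
    return moved;
}
```

Slots owned by other masters are left untouched, so the rest of the cluster keeps serving its own key ranges throughout the failover.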

Origin blog.csdn.net/makyan/article/details/104798780