Redis Cluster failover

1. Failure detection

    1.1 Subjective offline (PFAIL). Redis Cluster nodes communicate with each other through Gossip ping and pong messages. For example, node A sends a ping to node B. If A receives no pong from B within cluster-node-timeout, A marks B as subjectively offline. A then spreads this state to the rest of the cluster in its own Gossip messages.
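The timeout check in 1.1 can be sketched as follows. This is a minimal illustration rather than Redis source code: PeerState, check_peer and on_pong are hypothetical names, and the 15-second value mirrors the default cluster-node-timeout of 15000 ms.

```python
from dataclasses import dataclass

CLUSTER_NODE_TIMEOUT = 15.0  # seconds; mirrors the default cluster-node-timeout


@dataclass
class PeerState:
    """What node A remembers about node B (hypothetical structure)."""
    ping_sent: float = 0.0  # time the outstanding ping was sent; 0.0 means none pending
    pfail: bool = False     # True once B is considered subjectively offline


def check_peer(peer: PeerState, now: float,
               timeout: float = CLUSTER_NODE_TIMEOUT) -> bool:
    """Flag the peer as subjectively offline if a ping has gone unanswered too long."""
    if peer.ping_sent and now - peer.ping_sent > timeout:
        peer.pfail = True
    return peer.pfail


def on_pong(peer: PeerState) -> None:
    """A pong arrived: clear the pending ping and the subjective-offline flag."""
    peer.ping_sent = 0.0
    peer.pfail = False
```

Calling check_peer periodically from a timer and on_pong from the message handler would reproduce the behavior described above under these assumptions.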

    1.2 Objective offline (FAIL). When more than half of the master nodes that hold slots have marked B as subjectively offline, the objective offline process is triggered:

          1.2.1 Notify all nodes in the cluster to mark B as objectively offline. The mark takes effect immediately.

          1.2.2 Notify the slave nodes of the failed node so that they trigger a failover.
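The majority rule in 1.2 can be sketched like this. The function name and the use of plain node-name sets are assumptions made for illustration. Reports from nodes that are not slot-holding masters are ignored, and the failed master is assumed to still count toward the cluster size.

```python
from typing import Set


def is_objectively_offline(reporters: Set[str], slot_masters: Set[str]) -> bool:
    """Return True when more than half of the slot-holding masters report the node as down.

    reporters    -- nodes that currently consider the target subjectively offline
    slot_masters -- all master nodes that hold slots (assumed to include the target)
    """
    valid_reports = reporters & slot_masters   # replicas and empty masters do not count
    needed = len(slot_masters) // 2 + 1        # strict majority
    return len(valid_reports) >= needed
```

With five slot-holding masters, three of them must report the failure before the node is treated as objectively offline.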

2. Failure recovery

    After a node goes objectively offline, if it is a master node that holds slots, one of its slave nodes must replace it to keep the cluster highly available. Each slave node runs an internal scheduled task. When that task finds that the master is objectively offline, it triggers the failure recovery process.

    2.1 Qualification check. Each slave node checks how long it has been disconnected from its master. If that time exceeds a configured limit (derived from cluster-node-timeout and cluster-slave-validity-factor), the slave is not eligible to replace the master.
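A simplified sketch of the qualification check, assuming the limit is the product of cluster-node-timeout and cluster-slave-validity-factor (the real check also accounts for the replication ping period, which is left out here). The function name and the defaults of 15 seconds and a factor of 10 are illustrative.

```python
def is_eligible(disconnect_seconds: float,
                node_timeout: float = 15.0,
                validity_factor: int = 10) -> bool:
    """Decide whether a slave may take part in the failover.

    A validity factor of 0 disables the check, so the slave is always eligible.
    """
    if validity_factor == 0:
        return True
    return disconnect_seconds <= node_timeout * validity_factor
```

With these defaults, a slave that lost contact with its master more than 150 seconds ago is excluded.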

    2.2 Preparing the election time. Each eligible slave node delays the start of its election. The slave with the largest replication offset gets the smallest delay and therefore initiates the election first.
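The delay in 2.2 can be sketched with the formula from the Redis Cluster specification: a fixed 500 ms, plus a random 0 to 500 ms, plus 1000 ms per rank, where rank is the number of sibling slaves with a larger replication offset. The function names are hypothetical.

```python
import random
from typing import Iterable


def replica_rank(my_offset: int, other_offsets: Iterable[int]) -> int:
    """Rank 0 means no sibling slave has replicated more data than this one."""
    return sum(1 for offset in other_offsets if offset > my_offset)


def election_delay_ms(rank: int) -> int:
    """Delay before starting the election, in milliseconds."""
    fixed = 500                    # lets the objective-offline state propagate first
    jitter = random.randrange(500) # desynchronizes slaves with the same rank
    return fixed + jitter + rank * 1000
```

Because each rank adds a full second, the most up-to-date slave normally starts its election before any sibling does.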

    2.3 Initiating an election

          2.3.1 Update the configuration epoch (clusterNode.configEpoch). Each master node maintains its own configuration epoch, and the epochs of different masters differ. A slave node copies the configuration epoch of its master. The cluster as a whole also maintains a global configuration epoch (clusterState.currentEpoch), which records the largest configuration epoch among all masters. Each time a slave node initiates an election, it increments the global configuration epoch and saves the new value separately to identify the election it started.

          2.3.2 Broadcast the election message (FAILOVER_AUTH_REQUEST) to the cluster. A slave node may send this message only once per configuration epoch.
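Steps 2.3.1 and 2.3.2 can be sketched as a small state object. ReplicaElectionState and its fields are hypothetical names. The send argument stands in for whatever transport broadcasts the request.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ReplicaElectionState:
    current_epoch: int                    # global configuration epoch as this node knows it
    election_epoch: Optional[int] = None  # epoch saved for the election this slave started
    request_sent: bool = False            # guards against a second request in the same epoch

    def start_election(self) -> int:
        """Increment the global epoch and remember it as this election's version."""
        self.current_epoch += 1
        self.election_epoch = self.current_epoch
        self.request_sent = False
        return self.election_epoch

    def broadcast_request(self, send: Callable[[int], None]) -> bool:
        """Send the election request at most once per election epoch."""
        if self.election_epoch is None or self.request_sent:
            return False
        send(self.election_epoch)
        self.request_sent = True
        return True
```

A second call to broadcast_request in the same epoch is a no-op. Only a new start_election, with a new epoch, allows another request.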

    2.4 Voting in elections

          Only master nodes that hold slots process election messages. The voting process is in effect a leader election. Within one configuration epoch each master has exactly one vote, so at most one slave node can collect N/2+1 votes, where N is the number of master nodes that hold slots. If no slave is elected within a certain period, the election is void and the next round begins.
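The one-vote-per-epoch rule and the N/2+1 threshold can be sketched as follows. VotingMaster and has_won are hypothetical names, and the real vote handling performs further checks that are omitted here.

```python
class VotingMaster:
    """A slot-holding master that grants at most one vote per configuration epoch."""

    def __init__(self) -> None:
        self.last_vote_epoch = 0

    def handle_request(self, request_epoch: int) -> bool:
        """Grant the vote only if this master has not yet voted in that epoch or a later one."""
        if request_epoch <= self.last_vote_epoch:
            return False
        self.last_vote_epoch = request_epoch
        return True


def has_won(votes: int, slot_master_count: int) -> bool:
    """A slave wins once it holds N/2+1 votes, N being the number of slot-holding masters."""
    return votes >= slot_master_count // 2 + 1
```

Since each master votes once per epoch, two slaves competing in the same epoch cannot both reach the majority.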

    2.5 Replacing the master node

           When the slave node has collected enough votes, it replaces its master. It becomes a master node itself, takes over the slots that the failed master was responsible for, and then broadcasts a pong message to notify all nodes that it is now the master for those slots.
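The slot takeover in 2.5 can be sketched on a plain slot-to-owner mapping. The promote function and the dictionary layout are assumptions for illustration. In a real cluster the new ownership is announced to the other nodes through the pong broadcast.

```python
from typing import Dict, List


def promote(slot_owner: Dict[int, str], failed_master: str, new_master: str) -> List[int]:
    """Reassign every slot of the failed master to the promoted slave.

    Returns the list of slots that changed owner, which is the information
    the promoted node would announce to the rest of the cluster.
    """
    moved = [slot for slot, owner in slot_owner.items() if owner == failed_master]
    for slot in moved:
        slot_owner[slot] = new_master
    return moved
```

Slots owned by other masters are left untouched, so only the failed master's share of the key space changes hands.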

