ZooKeeper Leader election in detail

I. Introduction

  Earlier articles covered the details of the ZooKeeper server. For cluster startup, an important part is Leader election, so this article studies Leader election in depth.

II. Leader election

  2.1 Leader Election Overview

  Leader election is the key to guaranteeing the consistency of distributed data. A server in a ZooKeeper cluster needs to enter Leader election when either of the following happens:

  (1) The server starts up and initializes.

  (2) While running, the server cannot stay connected to the Leader.

  The two cases are analyzed below.

  1. Leader election during server startup

  Leader election needs at least two machines; here a cluster of three servers is used as the example. During cluster initialization, when only Server1 has started, it cannot complete a Leader election on its own. Once the second server, Server2, has started, the two machines can communicate with each other, each tries to find a Leader, and the Leader election process begins. The election proceeds as follows:

  (1) Each server casts a vote. In the initial case, Server1 and Server2 each vote for themselves as Leader. Every vote contains the myid and ZXID of the recommended server, written as (myid, ZXID). Server1's vote is therefore (1, 0) and Server2's vote is (2, 0). Each server then sends its vote to the other machines in the cluster.

  (2) Receive votes from each server. After a server receives a vote, it first checks that the vote is valid, for example whether it belongs to the current round of voting and whether it comes from a server in the LOOKING state.

  (3) Process the votes. For each vote received, the server pits the other server's vote against its own (a comparison this article calls PK). The PK rules are as follows:

    • Check ZXID first. The server with the larger ZXID takes priority as Leader.

    • If the ZXIDs are equal, compare myid. The server with the larger myid becomes the Leader.

  For Server1, its own vote is (1, 0) and the vote received from Server2 is (2, 0). It first compares the ZXIDs, which are both 0, and then the myids. Server2's myid is larger, so Server1 updates its vote to (2, 0) and votes again. Server2 does not need to update its vote; it simply sends its previous vote to all machines in the cluster once more.

  (4) Count the votes. After each round of voting, every server tallies the votes to determine whether more than half of the machines have accepted the same vote. For Server1 and Server2, the tally shows that two machines in the cluster have accepted the vote (2, 0), so the Leader is considered elected.

  (5) Change the server state. Once the Leader is determined, every server updates its state: a Follower changes to FOLLOWING and the Leader changes to LEADING.
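
  The five startup steps above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: votes are exchanged synchronously in memory, and the names `better` and `startup_election` are hypothetical helpers, not ZooKeeper APIs.

```python
def better(candidate, current):
    """Return True if `candidate` beats `current`; votes are (myid, zxid)."""
    cand_id, cand_zxid = candidate
    cur_id, cur_zxid = current
    if cand_zxid != cur_zxid:
        return cand_zxid > cur_zxid   # larger ZXID takes priority
    return cand_id > cur_id           # equal ZXID: larger myid wins


def startup_election(zxids, cluster_size):
    """zxids maps myid -> zxid for the servers that are currently up."""
    # (1) every server first votes for itself
    votes = {myid: (myid, zxid) for myid, zxid in zxids.items()}
    # (2) + (3) each server compares the votes it receives with its own
    changed = True
    while changed:
        changed = False
        for me in votes:
            for other in votes:
                if better(votes[other], votes[me]):
                    votes[me] = votes[other]
                    changed = True
    # (4) count: a vote needs more than half of the whole cluster
    tally = {}
    for vote in votes.values():
        tally[vote] = tally.get(vote, 0) + 1
    winner = next((v for v, n in tally.items() if n * 2 > cluster_size), None)
    if winner is None:
        return None, {}
    # (5) update the state of every server
    states = {myid: "LEADING" if myid == winner[0] else "FOLLOWING"
              for myid in votes}
    return winner, states


# Server1 and Server2 are up in a three-machine cluster, both with ZXID 0.
leader_vote, states = startup_election({1: 0, 2: 0}, cluster_size=3)
```

  Calling `startup_election({1: 0}, cluster_size=3)` returns no winner, which matches the remark that a single server cannot complete the election alone.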

  

  2. Leader election while the servers are running

  While ZooKeeper is running, the Leader and the non-Leader servers each perform their own duties. A non-Leader server going down or a new server joining does not affect the Leader. But once the Leader goes down, the whole cluster suspends external service and enters a new round of Leader election, whose process is basically the same as the election during startup. Suppose three servers, Server1, Server2, and Server3, are running and the current Leader is Server2. If the Leader goes down at some moment, Leader election begins. The election proceeds as follows:

  (1) Change state. After the Leader goes down, the remaining non-Observer servers change their state to LOOKING and then enter the Leader election process.

  (2) Each server casts a vote. While running, the ZXID on each server may differ. Assume Server1's ZXID is 123 and Server3's ZXID is 122. On the first ballot, Server1 and Server3 each vote for themselves, producing the votes (1, 123) and (3, 122), and then send their votes to all machines in the cluster.

  (3) Receive votes from each server. Same as during startup.

  (4) Process the votes. Same as during startup. This time Server1 becomes the Leader, because its ZXID is larger.

  (5) Count the votes. Same as during startup.

  (6) Change the server state. Same as during startup.
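
  A quick way to see why Server1 wins despite its smaller myid is to order the votes by ZXID first and myid second. The snippet below is a small sketch of that ordering for the example values; the variable names are made up for illustration.

```python
# Votes cast after Server2 (the old Leader) goes down, written as (myid, zxid).
votes = [(1, 123), (3, 122)]

# Order by ZXID first and myid second: the larger ZXID dominates.
winner = max(votes, key=lambda vote: (vote[1], vote[0]))

# Both surviving servers end up backing the winner: 2 of the 3 cluster members.
supporters = len(votes)
has_majority = supporters * 2 > 3
```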

  2.2 Leader Election Algorithm Analysis

  Since version 3.4.0, ZooKeeper retains only the TCP version of the FastLeaderElection algorithm. When a machine enters Leader election, the cluster may be in one of two states:

    · The cluster already has a Leader.

    · The cluster has no Leader.

  When a Leader already exists, the machine is usually one that started late: the cluster was already running normally before it started. When such a machine tries to elect a Leader, it is told the current Leader's information, and it only needs to connect to the Leader and synchronize state. When the cluster has no Leader, the situation is more complex. The procedure is as follows:

  (1) First ballot. Whatever the reason for the election, every machine in the cluster is in the state of trying to elect a Leader, namely LOOKING. A machine in the LOOKING state sends a message to all other machines; this message is called a vote. A vote contains the SID (the server's unique identifier) and the ZXID (transaction ID), and is written as (SID, ZXID). Suppose the ZooKeeper cluster consists of 5 machines with SIDs 1, 2, 3, 4, 5 and ZXIDs 9, 9, 9, 8, 8, and the machine with SID 2 is the Leader. At some moment machines 1 and 2 fail, so the cluster starts a Leader election. On the first ballot each machine votes for itself, so the machines with SIDs 3, 4, 5 cast (3, 9), (4, 8), (5, 8).

  (2) Change the vote. After sending its vote, each machine also receives votes from the other machines. It processes each received vote according to certain rules and decides whether to change its own vote. These rules are the core of the whole Leader election algorithm. The terms used are:

    · vote_sid: the SID of the Leader recommended in the received vote.

    · vote_zxid: the ZXID of the Leader recommended in the received vote.

    · self_sid: the current server's own SID.

    · self_zxid: the current server's own ZXID.

  Processing each received vote means comparing (vote_sid, vote_zxid) with (self_sid, self_zxid).

    Rule 1: If vote_zxid is greater than self_zxid, accept the received vote and send that vote out again.

    Rule 2: If vote_zxid is less than self_zxid, keep the current vote and change nothing.

    Rule 3: If vote_zxid equals self_zxid, compare the two SIDs. If vote_sid is greater than self_sid, accept the received vote and send that vote out again.

    Rule 4: If vote_zxid equals self_zxid and vote_sid is less than self_sid, keep the current vote and change nothing.

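  Applying these rules to the example cluster: Server3 keeps its vote (3, 9), while Server4 and Server5 both change their votes to (3, 9), because its ZXID of 9 is larger than their own ZXID of 8.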

  (3) Determine the Leader. After the second round of voting, every machine in the cluster again receives the votes of the other machines and starts counting. If one machine has received the same vote from more than half of the machines, the SID in that vote is the Leader. Here, Server3 becomes the Leader.

  These rules show that the server with newer data (a larger ZXID) is more likely to become Leader, which better guarantees data recovery. If the ZXIDs are equal, the server with the larger SID is more likely to win.
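
  The four rules and the 5-machine example can be checked with a short sketch. The function name `should_switch` and the in-memory ballots are assumptions made for illustration; the sketch also assumes a server compares each incoming vote against the vote it currently holds.

```python
def should_switch(vote_sid, vote_zxid, self_sid, self_zxid):
    """Apply rules 1-4: True means adopt the received vote and resend it."""
    if vote_zxid > self_zxid:      # rule 1
        return True
    if vote_zxid < self_zxid:      # rule 2
        return False
    return vote_sid > self_sid     # rules 3 and 4 (equal ZXID)


CLUSTER_SIZE = 5
alive = {3: 9, 4: 8, 5: 8}         # SID -> ZXID; machines 1 and 2 are down

# First ballot: every machine votes for itself.
first_ballot = {sid: (sid, zxid) for sid, zxid in alive.items()}

# Change the vote: each machine processes the votes of the other machines.
second_ballot = {}
for sid in alive:
    current = first_ballot[sid]
    for other, (vote_sid, vote_zxid) in first_ballot.items():
        if other != sid and should_switch(vote_sid, vote_zxid, *current):
            current = (vote_sid, vote_zxid)
    second_ballot[sid] = current

# Determine the Leader: count how many machines back (3, 9).
support = sum(1 for vote in second_ballot.values() if vote == (3, 9))
elected = support * 2 > CLUSTER_SIZE
```

  With 3 of the 5 machines backing (3, 9), the majority condition holds even though two machines are down.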

  2.3 Leader Election implementation details

  1. Server Status

  A server is in one of four states: LOOKING, FOLLOWING, LEADING, or OBSERVING.

  LOOKING: looking for a Leader. A server in this state believes the cluster currently has no Leader and therefore needs to enter Leader election.

  FOLLOWING: follower state. The current server's role is Follower.

  LEADING: leader state. The current server's role is Leader.

  OBSERVING: observer state. The current server's role is Observer.
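
  As a rough illustration, the four states map naturally onto an enumeration. The Python names below (`ServerState`, `must_run_election`) are assumptions for this sketch and not the identifiers used in the ZooKeeper source.

```python
from enum import Enum


class ServerState(Enum):
    LOOKING = "looking for a Leader"
    FOLLOWING = "role is Follower"
    LEADING = "role is Leader"
    OBSERVING = "role is Observer"


def must_run_election(state):
    """Only a server that believes there is no Leader enters the election."""
    return state is ServerState.LOOKING
```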

  2. Vote data structure

  Every vote contains the two most basic pieces of information: the SID and ZXID of the recommended server. In ZooKeeper, a vote (Vote) contains the following fields:

  id: the SID of the server recommended as Leader.

  zxid: the transaction ID of the server recommended as Leader.

  electionEpoch: the logical clock, used to determine whether multiple votes belong to the same election round. On each server this value is an auto-incrementing sequence that is incremented every time a new round of voting begins.

  peerEpoch: the epoch of the server recommended as Leader.

  state: the current state of the server.
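
  The fields above can be pictured as a small immutable record. The sketch below is a Python stand-in with snake_case names chosen for illustration; it is not the actual ZooKeeper Vote class, and `same_round` is a hypothetical helper showing what electionEpoch is used for.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    id: int               # SID of the server recommended as Leader
    zxid: int             # transaction ID of the recommended server
    election_epoch: int   # logical clock (electionEpoch)
    peer_epoch: int       # epoch of the recommended server (peerEpoch)
    state: str            # state of the sending server, e.g. "LOOKING"


def same_round(a, b):
    """Two votes can only be compared if they belong to the same election round."""
    return a.election_epoch == b.election_epoch


# Illustrative values only.
v1 = Vote(id=3, zxid=9, election_epoch=2, peer_epoch=1, state="LOOKING")
v2 = Vote(id=4, zxid=8, election_epoch=1, peer_epoch=1, state="LOOKING")
```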

  3. QuorumCnxManager: network I/O

  During startup, every server starts a QuorumCnxManager, which handles the low-level network communication between servers during Leader election.

  (1) Message queues. QuorumCnxManager maintains a set of internal queues that hold received messages, messages to be sent, and the senders of those messages. Except for the receive queue, all of them are grouped by SID into collections of queues. For example, if a cluster has three machines besides the current one, a separate send queue is created for each of those three machines so that they do not interfere with one another.

    · recvQueue: the receive queue, which stores messages received from other servers.

    · queueSendMap: the send queues, which store messages waiting to be sent, grouped by SID.

    · senderWorkerMap: the set of senders. Each SendWorker corresponds to one remote ZooKeeper server and is responsible for sending messages to it; also grouped by SID.

    · lastMessageSent: the most recently sent message, one kept for each SID.

  (2) Establishing connections. To be able to vote for one another, all machines in the ZooKeeper cluster need pairwise network connections. When QuorumCnxManager starts, it creates a ServerSocket that listens on the Leader election port (3888 by default). Once listening, ZooKeeper keeps accepting connection requests from other servers and handles each TCP connection request as it arrives. To avoid creating duplicate TCP connections between two machines, ZooKeeper only allows the server with the larger SID to initiate a connection to another machine; otherwise the connection is dropped. On receiving a connection request, a server compares its own SID with the remote server's SID to decide whether to accept it. If the current server finds that its own SID is larger, it closes the incoming connection and then initiates its own connection to the remote server. Once a connection is established, a message sender SendWorker and a message receiver RecvWorker are created for the remote server's SID and started.

  (3) Receiving and sending messages. Receiving: the message receiver RecvWorker is responsible. Because ZooKeeper assigns a separate RecvWorker to each remote server, each RecvWorker only needs to keep reading messages from its TCP connection and saving them into recvQueue. Sending: because ZooKeeper also assigns a separate SendWorker to each remote server, each SendWorker only needs to keep taking a message from its corresponding send queue and sending it, while also storing that message in lastMessageSent. If a SendWorker finds that the send queue for its server is empty, it takes the most recently sent message from lastMessageSent and sends it again. This handles the case where the receiver went down before or just after receiving the message, so that the message was never processed correctly. ZooKeeper also ensures that the receiver handles duplicate messages correctly.
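
  The queue layout, the connection rule, and the resend behavior described above can be sketched as follows. This is a single-threaded Python model written for illustration; the class `CnxManagerSketch` and its method names are assumptions, and real sockets and worker threads are left out.

```python
from collections import deque


class CnxManagerSketch:
    def __init__(self, my_sid, remote_sids):
        self.my_sid = my_sid
        self.recv_queue = deque()                    # one shared receive queue
        self.queue_send_map = {sid: deque() for sid in remote_sids}  # per SID
        self.last_message_sent = {}                  # SID -> last message sent

    def accepts_connection_from(self, remote_sid):
        # Only the server with the larger SID may initiate a connection, so an
        # incoming connection is kept only if the remote SID is larger.
        return remote_sid > self.my_sid

    def to_send(self, sid, message):
        # Queue a message for one remote server without touching the others.
        self.queue_send_map[sid].append(message)

    def next_for_send_worker(self, sid):
        # What the sender for `sid` transmits next: a queued message if there
        # is one, otherwise the last message sent to that server (a resend).
        queue = self.queue_send_map[sid]
        if queue:
            message = queue.popleft()
            self.last_message_sent[sid] = message
            return message
        return self.last_message_sent.get(sid)


manager = CnxManagerSketch(my_sid=2, remote_sids=[1, 3])
manager.to_send(3, "vote (2, 0)")
first = manager.next_for_send_worker(3)    # taken from the send queue
resent = manager.next_for_send_worker(3)   # queue empty: last message again
```

  Because a resend can deliver the same message twice, the receiving side has to tolerate duplicates, as the text notes.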

  4. FastLeaderElection: Election Algorithm Core

  · External vote: a vote sent to this server by another server.

  · Internal vote: the server's own current vote.

  · Election round: the round of the ZooKeeper server's Leader election, i.e. logicalclock.

  · PK: comparing the internal vote with an external vote to determine whether the internal vote needs to change.

  (1) Vote management

  · sendqueue: the vote send queue, which holds votes waiting to be sent.

  · recvqueue: the vote receive queue, which holds external votes that have been received.

  · WorkerReceiver: the vote receiver. It continuously fetches election messages sent by other servers from QuorumCnxManager, converts each into a vote, and saves it to recvqueue. While receiving, if it finds that the external vote's election round is smaller than the current server's, it ignores the external vote and immediately sends out its own internal vote.

  · WorkerSender: the vote sender. It continuously takes votes to be sent from sendqueue and passes them to the underlying QuorumCnxManager.
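
  The WorkerReceiver behavior for stale ballots can be illustrated with a small function. This is a hedged sketch: the function `worker_receive` and the dictionary layout of a ballot are invented for the example, and the queues are plain in-memory deques.

```python
from collections import deque


def worker_receive(ballot, logicalclock, internal_vote, recvqueue, sendqueue):
    """Handle one incoming ballot: a dict with 'sender', 'round' and 'vote'."""
    if ballot["round"] < logicalclock:
        # The sender is in an older election round: ignore its vote and
        # immediately reply with this server's own internal vote.
        sendqueue.append({"to": ballot["sender"],
                          "round": logicalclock,
                          "vote": internal_vote})
        return False
    recvqueue.append(ballot)   # keep it for the election algorithm to process
    return True


recvqueue, sendqueue = deque(), deque()
stale = {"sender": 4, "round": 1, "vote": (4, 8)}
fresh = {"sender": 5, "round": 2, "vote": (5, 8)}
kept_stale = worker_receive(stale, 2, (3, 9), recvqueue, sendqueue)
kept_fresh = worker_receive(fresh, 2, (3, 9), recvqueue, sendqueue)
```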

  (2) Core algorithm

  The figure shows how the FastLeaderElection module interacts with the underlying network I/O. The basic Leader election process is as follows:

  1. Increment the election round. ZooKeeper requires all valid votes to be in the same round, so at the start of a new round of voting the server first increments logicalclock.

  2. Initialize the vote. Before starting a new round of voting, each server initializes its own vote. In this initialization phase, each server votes for itself as Leader.

  3. Send the initial vote. After initializing its vote, the server casts its first ballot. ZooKeeper puts the freshly initialized vote into sendqueue, and the sender WorkerSender is responsible for sending it out.

  4. Receive external votes. Each server keeps taking external votes from recvqueue. If a server finds that it cannot get any external vote, it immediately checks whether it still has valid connections to the other servers in the cluster. If a connection is missing, it establishes it at once; if the connections are already established, it sends its current internal vote again.

  5. Judge the election round. After sending its initial vote, the server starts processing external votes. An external vote is handled differently depending on its election round.

    · The external vote's election round is greater than the internal vote's. If the server's own election round lags behind that of the external vote, it immediately updates its election round (logicalclock), clears all votes it has received, and then PKs its initialized vote against the external vote to decide whether to change the internal vote. Finally it sends out the internal vote.

    · The external vote's election round is less than the internal vote's. If the election round of the received external vote lags behind the server's own, ZooKeeper simply ignores the external vote without any processing and returns to step 4.

    · The external vote's election round equals the internal vote's. In this case the vote PK can begin.

  6. Vote PK. During the PK, the vote needs to change if any of the following conditions holds:

    • If the election round of the Leader server proposed in the external vote is greater than that of the internal vote, the vote needs to change.

    • If the election rounds are the same, compare the two ZXIDs. If the external vote's ZXID is larger, the vote needs to change.

    • If the two ZXIDs are the same, compare the two SIDs. If the external vote's SID is larger, the vote needs to change.

  7. Change the vote. If the PK determines that the external vote is better than the internal vote, the vote is changed: the external vote's information overwrites the internal vote. After the change, the server sends out the changed internal vote again.

  8. Archive the vote. Whether or not the internal vote changed, the external vote just received is placed in the vote collection recvset. recvset records all external votes the current server has received in this round of Leader election, keyed by the SID of the sending server, for example {(1, vote1), (2, vote2), ...}.

  9. Count the votes. After archiving, the server counts the votes to determine whether more than half of the servers in the cluster endorse the current internal vote. If more than half do, voting terminates. Otherwise, return to step 4.

  10. Update the server state. Once voting can terminate, the server updates its state. It first checks whether the Leader endorsed by more than half of the servers is itself. If so, it sets its state to LEADING; if not, it sets its state to FOLLOWING or OBSERVING according to its own role.

  These 10 steps are the core of FastLeaderElection. Steps 4-9 loop for several rounds until a Leader is elected.
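
  To tie the ten steps together, here is a compact single-threaded simulation of the loop. It is a sketch under several assumptions and not the real FastLeaderElection code: delivery is in-memory and reliable, there are no Observers, the reconnect-and-resend part of step 4 is omitted, each server counts its own vote in the tally, and the names `Peer`, `wins`, and `run_election` are hypothetical.

```python
from collections import deque


def wins(external, internal):
    """Step 6 (PK) within one round: larger ZXID, then larger SID. Votes are (sid, zxid)."""
    return (external[1], external[0]) > (internal[1], internal[0])


class Peer:
    def __init__(self, sid, zxid, cluster_size):
        self.sid, self.zxid, self.cluster_size = sid, zxid, cluster_size
        self.logicalclock = 0
        self.state = "LOOKING"
        self.vote = (sid, zxid)      # internal vote
        self.recvset = {}            # sender SID -> external vote
        self.recvqueue = deque()     # (sender SID, election round, vote)

    def send_vote(self, peers):
        for peer in peers:
            if peer is not self:
                peer.recvqueue.append((self.sid, self.logicalclock, self.vote))

    def start_election(self, peers):
        self.logicalclock += 1                   # step 1
        self.vote = (self.sid, self.zxid)        # step 2
        self.recvset.clear()
        self.send_vote(peers)                    # step 3

    def process_one(self, peers):
        """Steps 4-10 for one external vote; False means nothing to do."""
        if self.state != "LOOKING" or not self.recvqueue:
            return False
        sender, election_round, external = self.recvqueue.popleft()  # step 4
        if election_round < self.logicalclock:   # step 5: stale vote, ignore
            return True
        changed = False
        if election_round > self.logicalclock:   # step 5: this server is behind
            self.logicalclock = election_round
            self.recvset.clear()
            self.vote = (self.sid, self.zxid)
            changed = True                       # always resend in this case
        if wins(external, self.vote):            # steps 6 and 7
            self.vote = external
            changed = True
        if changed:
            self.send_vote(peers)
        self.recvset[sender] = external          # step 8
        # step 9: own vote plus every archived vote that matches it
        support = 1 + sum(v == self.vote for v in self.recvset.values())
        if support * 2 > self.cluster_size:      # step 10
            self.state = "LEADING" if self.vote[0] == self.sid else "FOLLOWING"
        return True


def run_election(peers):
    for peer in peers:
        peer.start_election(peers)
    progress = True
    while progress:                              # steps 4-9 repeat
        progress = False
        for peer in peers:
            progress = peer.process_one(peers) or progress
    return {peer.sid: peer.state for peer in peers}


# The 5-machine example from section 2.2: machines 1 and 2 are down.
peers = [Peer(3, 9, 5), Peer(4, 8, 5), Peer(5, 8, 5)]
states = run_election(peers)
```

  Running it on the example from section 2.2 leaves the machine with SID 3 in LEADING and the other two in FOLLOWING.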

 

Origin www.cnblogs.com/myseries/p/11285832.html