(5) ZooKeeper Leader Election: QuorumPeer

  • QuorumPeer
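    This class is the entry point for ZooKeeper leader election. It creates the election algorithm, restores the ZooKeeper database, and starts leader election.
  • ZooKeeper server states: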
    public enum ServerState {
        LOOKING, FOLLOWING, LEADING, OBSERVING;
    }

1. LOOKING: the server has not yet found a Leader. Leader election runs only while the server is in this state.
2. FOLLOWING: the server's current role is Follower.
3. LEADING: the server's current role is Leader.
4. OBSERVING: the server's current role is Observer.
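
To make the role of these states concrete, here is a minimal sketch of a loop that dispatches on the state. It is not the real QuorumPeer: the class PeerStateLoopSketch, the electedRole field, and the fixed iteration count are assumptions made for illustration, and the election and the role's work are replaced by plain state assignments.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of a peer's state loop; not the real QuorumPeer.
class PeerStateLoopSketch {
    enum ServerState { LOOKING, FOLLOWING, LEADING, OBSERVING }

    private ServerState state = ServerState.LOOKING;
    private final ServerState electedRole;          // role an imaginary election assigns
    final List<String> trace = new ArrayList<>();   // states visited, for inspection

    PeerStateLoopSketch(ServerState electedRole) {
        this.electedRole = electedRole;
    }

    // Runs a fixed number of iterations instead of looping until shutdown.
    void run(int iterations) {
        for (int i = 0; i < iterations; i++) {
            trace.add(state.name());
            switch (state) {
                case LOOKING:
                    // An election would run here; its outcome decides the next role.
                    state = electedRole;
                    break;
                case FOLLOWING:
                case LEADING:
                case OBSERVING:
                    // The role's work would block here until it ends or fails;
                    // afterwards the peer falls back to LOOKING.
                    state = ServerState.LOOKING;
                    break;
            }
        }
    }

    public static void main(String[] args) {
        PeerStateLoopSketch peer = new PeerStateLoopSketch(ServerState.FOLLOWING);
        peer.run(4);
        System.out.println(peer.trace); // prints [LOOKING, FOLLOWING, LOOKING, FOLLOWING]
    }
}
```

The point of the sketch is only that LOOKING is the one state from which a role is chosen, and that every role eventually returns to it.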

The run() method that drives leader election:
    @Override
    public void run() {
        updateThreadName();

        LOG.debug("Starting quorum peer");
        try {
            jmxQuorumBean = new QuorumBean(this);
            MBeanRegistry.getInstance().register(jmxQuorumBean, null);
            for(QuorumServer s: getView().values()){
                ZKMBeanInfo p;
                if (getId() == s.id) {
                    p = jmxLocalPeerBean = new LocalPeerBean(this);
                    try {
                        MBeanRegistry.getInstance().register(p, jmxQuorumBean);
                    } catch (Exception e) {
                        LOG.warn("Failed to register with JMX", e);
                        jmxLocalPeerBean = null;
                    }
                } else {
                    RemotePeerBean rBean = new RemotePeerBean(this, s);
                    try {
                        MBeanRegistry.getInstance().register(rBean, jmxQuorumBean);
                        jmxRemotePeerBean.put(s.id, rBean);
                    } catch (Exception e) {
                        LOG.warn("Failed to register with JMX", e);
                    }
                }
            }
        } catch (Exception e) {
            LOG.warn("Failed to register with JMX", e);
            jmxQuorumBean = null;
        }

        try {
            /*
             * Main loop
             */
            while (running) {
                switch (getPeerState()) {
                case LOOKING:
                    LOG.info("LOOKING");
                    ServerMetrics.getMetrics().LOOKING_COUNT.add(1);

                    if (Boolean.getBoolean("readonlymode.enabled")) {
                        LOG.info("Attempting to start ReadOnlyZooKeeperServer");

                        // Create read-only server but don't start it immediately
                        final ReadOnlyZooKeeperServer roZk =
                            new ReadOnlyZooKeeperServer(logFactory, this, this.zkDb);
    
                        // Instead of starting roZk immediately, wait some grace
                        // period before we decide we're partitioned.
                        //
                        // Thread is used here because otherwise it would require
                        // changes in each of election strategy classes which is
                        // unnecessary code coupling.
                        Thread roZkMgr = new Thread() {
                            public void run() {
                                try {
                                    // lower-bound grace period to 2 secs
                                    sleep(Math.max(2000, tickTime));
                                    if (ServerState.LOOKING.equals(getPeerState())) {
                                        roZk.startup();
                                    }
                                } catch (InterruptedException e) {
                                    LOG.info("Interrupted while attempting to start ReadOnlyZooKeeperServer, not started");
                                } catch (Exception e) {
                                    LOG.error("FAILED to start ReadOnlyZooKeeperServer", e);
                                }
                            }
                        };
                        try {
                            roZkMgr.start();
                            reconfigFlagClear();
                            if (shuttingDownLE) {
                                shuttingDownLE = false;
                                startLeaderElection();
                            }
                            setCurrentVote(makeLEStrategy().lookForLeader());
                        } catch (Exception e) {
                            LOG.warn("Unexpected exception", e);
                            setPeerState(ServerState.LOOKING);
                        } finally {
                            // If the thread is in the grace period, interrupt
                            // to come out of waiting.
                            roZkMgr.interrupt();
                            roZk.shutdown();
                        }
                    } else {
                        try {
                            reconfigFlagClear();
                            if (shuttingDownLE) {
                                shuttingDownLE = false;
                                startLeaderElection();
                            }
                            setCurrentVote(makeLEStrategy().lookForLeader());
                        } catch (Exception e) {
                            LOG.warn("Unexpected exception", e);
                            setPeerState(ServerState.LOOKING);
                        }
                    }
                    break;
                case OBSERVING:
                    try {
                        LOG.info("OBSERVING");
                        setObserver(makeObserver(logFactory));
                        observer.observeLeader();
                    } catch (Exception e) {
                        LOG.warn("Unexpected exception",e );
                    } finally {
                        observer.shutdown();
                        setObserver(null);
                        updateServerState();

                        // Add delay jitter before we switch to LOOKING
                        // state to reduce the load of ObserverMaster
                        if (isRunning()) {
                            Observer.waitForObserverElectionDelay();
                        }
                    }
                    break;
                case FOLLOWING:
                    try {
                        LOG.info("FOLLOWING");
                        setFollower(makeFollower(logFactory));
                        follower.followLeader();
                    } catch (Exception e) {
                        LOG.warn("Unexpected exception", e);
                    } finally {
                        follower.shutdown();
                        setFollower(null);
                        updateServerState();
                    }
                    break;
                case LEADING:
                    LOG.info("LEADING");
                    try {
                        setLeader(makeLeader(logFactory));
                        leader.lead();
                        setLeader(null);
                    } catch (Exception e) {
                        LOG.warn("Unexpected exception",e);
                    } finally {
                        if (leader != null) {
                            leader.shutdown("Forcing shutdown");
                            setLeader(null);
                        }
                        updateServerState();
                    }
                    break;
                }
                start_fle = Time.currentElapsedTime();
            }
        } finally {
            LOG.warn("QuorumPeer main thread exited");
            MBeanRegistry instance = MBeanRegistry.getInstance();
            instance.unregister(jmxQuorumBean);
            instance.unregister(jmxLocalPeerBean);

            for (RemotePeerBean remotePeerBean : jmxRemotePeerBean.values()) {
                instance.unregister(remotePeerBean);
            }

            jmxQuorumBean = null;
            jmxLocalPeerBean = null;
            jmxRemotePeerBean = null;
        }
    }

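The method proceeds as follows:
1. Register the JMX beans related to leader election.
2. Loop while the peer is running, dispatching on the current server state:

  • LOOKING:
    1. Update the metrics (increment LOOKING_COUNT).
    2. Check whether read-only mode is enabled (the readonlymode.enabled system property).
    3. If read-only mode is enabled, create a read-only ZooKeeper server without starting it.
    4. Start a helper thread that waits for a grace period of max(2000, tickTime) milliseconds. If the peer is still LOOKING when the wait ends, the thread starts the read-only server so that clients can still be served reads.
    5. Run leader election and store the resulting vote. When the election returns, whether it succeeded or threw an exception, the finally block interrupts the helper thread and shuts down the read-only server.
    6. If read-only mode is not enabled, run leader election directly and store the resulting vote.
  • OBSERVING:
    1. A Leader has been elected and this server's role is Observer.
    2. Create an Observer instance and cache it.
    3. Call observeLeader() on the Observer instance.
  • FOLLOWING:
    1. A Leader has been elected and this server's role is Follower.
    2. Create a Follower instance and cache it.
    3. Call followLeader() on the Follower instance.
  • LEADING:
    1. A Leader has been elected and this server is the Leader.
    2. Create a Leader instance and cache it.
    3. Call lead() on the Leader instance to start the Leader's work.

3. If the connection to the Leader is lost, the finally block shuts down the current role and updateServerState() resets the server state, normally back to LOOKING. This may trigger a new round of leader election; if it was only a brief network glitch, the peer resumes serving as a Follower once it receives messages again.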
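
The grace-period logic of the LOOKING branch can be sketched in isolation. This is a minimal illustration under stated assumptions, not the ZooKeeper implementation: GracePeriodSketch and electWithFallback are hypothetical names, the election is replaced by a sleep, and starting the read-only server is replaced by setting a flag.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of the grace-period pattern; all names are illustrative.
class GracePeriodSketch {

    // Returns true if the fallback was started before the "election" finished.
    static boolean electWithFallback(long graceMillis, long electionMillis) {
        AtomicBoolean fallbackStarted = new AtomicBoolean(false);
        AtomicBoolean stillLooking = new AtomicBoolean(true);

        Thread fallbackStarter = new Thread(() -> {
            try {
                Thread.sleep(graceMillis);          // wait out the grace period
                if (stillLooking.get()) {
                    fallbackStarted.set(true);      // stands in for roZk.startup()
                }
            } catch (InterruptedException e) {
                // The election finished during the grace period: start nothing.
            }
        });

        try {
            fallbackStarter.start();
            Thread.sleep(electionMillis);           // stands in for lookForLeader()
            stillLooking.set(false);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();     // preserve the interrupt status
        } finally {
            // Wake the helper if it is still waiting; run() also shuts the
            // read-only server down at this point.
            fallbackStarter.interrupt();
        }
        try {
            fallbackStarter.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return fallbackStarted.get();
    }

    public static void main(String[] args) {
        System.out.println("fast election, fallback started: " + electWithFallback(1000, 50));
        System.out.println("slow election, fallback started: " + electWithFallback(50, 1000));
    }
}
```

Running the wait on a separate thread keeps the election code unaware of read-only mode, which is the decoupling the comment in run() describes.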

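Setting the quorum verifier

The quorum verifier (QuorumVerifier) is set in two situations:
1. When the ZooKeeper server starts.
2. When the reconfig command loads a new configuration and the service is restarted with it.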

  • setQuorumVerifier(QuorumVerifier qv, boolean writeToDisk)
    public QuorumVerifier setQuorumVerifier(QuorumVerifier qv, boolean writeToDisk){
        synchronized (QV_LOCK) {
            if ((quorumVerifier != null) && (quorumVerifier.getVersion() >= qv.getVersion())) {
                // this is normal. For example - server found out about new config through FastLeaderElection gossiping
                // and then got the same config in UPTODATE message so its already known
                LOG.debug(getId() + " setQuorumVerifier called with known or old config " + qv.getVersion() +
                        ". Current version: " + quorumVerifier.getVersion());
                return quorumVerifier;
            }
            QuorumVerifier prevQV = quorumVerifier;
            quorumVerifier = qv;
            if (lastSeenQuorumVerifier == null || (qv.getVersion() > lastSeenQuorumVerifier.getVersion()))
                lastSeenQuorumVerifier = qv;

            if (writeToDisk) {
                // some tests initialize QuorumPeer without a static config file
                if (configFilename != null) {
                    try {
                        String dynamicConfigFilename = makeDynamicConfigFilename(
                                qv.getVersion());
                        QuorumPeerConfig.writeDynamicConfig(
                                dynamicConfigFilename, qv, false);
                        QuorumPeerConfig.editStaticConfig(configFilename,
                                dynamicConfigFilename,
                                needEraseClientInfoFromStaticConfig());
                    } catch (IOException e) {
                        LOG.error("Error closing file: ", e.getMessage());
                    }
                } else {
                    LOG.info("writeToDisk == true but configFilename == null");
                }
            }

            if (qv.getVersion() == lastSeenQuorumVerifier.getVersion()) {
                QuorumPeerConfig.deleteFile(getNextDynamicConfigFilename());
            }
            QuorumServer qs = qv.getAllMembers().get(getId());
            if (qs != null) {
                setAddrs(qs.addr, qs.electionAddr, qs.clientAddr);
            }
            updateObserverMasterList();
            return prevQV;
        }
    }

    private void updateObserverMasterList() {
        if (observerMasterPort <= 0) {
            return; // observer masters not enabled
        }
        observerMasters.clear();
        StringBuilder sb = new StringBuilder();
        for (QuorumServer server : quorumVerifier.getVotingMembers().values()) {
            InetSocketAddress addr = new InetSocketAddress(server.addr.getAddress(), observerMasterPort);
            observerMasters.add(new QuorumServer(server.id, addr));
            sb.append(addr).append(",");
        }
        LOG.info("Updated learner master list to be {}", sb.toString());
        Collections.shuffle(observerMasters);
        // Reset the internal index of the observerMaster when
        // the observerMaster List is refreshed
        nextObserverMaster = 0;
    }

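The method works as follows:
1. Acquire the QV_LOCK monitor, blocking if necessary.
2. If a quorumVerifier is already cached and its version is greater than or equal to the new one's, return the cached verifier unchanged. Otherwise replace it with the new verifier.
3. If lastSeenQuorumVerifier is null or its version is lower than the new one's, set it to the new verifier as well.
4. If writeToDisk is true and a static config file is configured, write the dynamic config file and edit the static config file accordingly.
5. If the new version equals the last-seen version, delete the pending next dynamic config file.
6. If this server is a member of the new configuration, reset its Leader/Follower communication address, election address, and client address.
7. Refresh the observerMaster list and reset the internal observerMaster index.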
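
The version check in steps 2 and 3 can be isolated into a small sketch. The class VersionedConfigSketch, its Config type, and the method names are assumptions made for illustration; only the comparison logic and the return convention follow the listing above.

```java
// Hypothetical, simplified stand-in for the version check in setQuorumVerifier.
class VersionedConfigSketch {
    static final class Config {
        final long version;
        final String members;

        Config(long version, String members) {
            this.version = version;
            this.members = members;
        }
    }

    private final Object lock = new Object();   // plays the role of QV_LOCK
    private Config committed;                   // plays the role of quorumVerifier
    private Config lastSeen;                    // plays the role of lastSeenQuorumVerifier

    // Returns the current config if the candidate is rejected as known or old,
    // and the previous config (possibly null) if the candidate is accepted.
    Config setCommitted(Config candidate) {
        synchronized (lock) {
            if (committed != null && committed.version >= candidate.version) {
                return committed;
            }
            Config previous = committed;
            committed = candidate;
            if (lastSeen == null || candidate.version > lastSeen.version) {
                lastSeen = candidate;
            }
            return previous;
        }
    }

    Config committed() {
        synchronized (lock) { return committed; }
    }

    Config lastSeen() {
        synchronized (lock) { return lastSeen; }
    }

    public static void main(String[] args) {
        VersionedConfigSketch holder = new VersionedConfigSketch();
        holder.setCommitted(new Config(2, "server.1,server.2,server.3"));
        holder.setCommitted(new Config(1, "server.1"));    // stale, ignored
        System.out.println(holder.committed().version);    // prints 2
    }
}
```

Note that the return value means two different things depending on the branch, so callers cannot tell from it alone whether the new configuration was applied.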

  • setLastSeenQuorumVerifier(QuorumVerifier qv, boolean writeToDisk)
    public void setLastSeenQuorumVerifier(QuorumVerifier qv, boolean writeToDisk){
        // If qcm is non-null, we may call qcm.connectOne(), which will take the lock on qcm
        // and then take QV_LOCK.  Take the locks in the same order to ensure that we don't
        // deadlock against other callers of connectOne().  If qcmRef gets set in another
        // thread while we're inside the synchronized block, that does no harm; if we didn't
        // take a lock on qcm (because it was null when we sampled it), we won't call
        // connectOne() on it.  (Use of an AtomicReference is enough to guarantee visibility
        // of updates that provably happen in another thread before entering this method.)
        QuorumCnxManager qcm = qcmRef.get();
        Object outerLockObject = (qcm != null) ? qcm : QV_LOCK;
        synchronized (outerLockObject) {
            synchronized (QV_LOCK) {
                if (lastSeenQuorumVerifier != null && lastSeenQuorumVerifier.getVersion() > qv.getVersion()) {
                    LOG.error("setLastSeenQuorumVerifier called with stale config " + qv.getVersion() +
                            ". Current version: " + quorumVerifier.getVersion());
                }
                // assuming that a version uniquely identifies a configuration, so if
                // version is the same, nothing to do here.
                if (lastSeenQuorumVerifier != null &&
                        lastSeenQuorumVerifier.getVersion() == qv.getVersion()) {
                    return;
                }
                lastSeenQuorumVerifier = qv;
                if (qcm != null) {
                    connectNewPeers(qcm);
                }

                if (writeToDisk) {
                    try {
                        String fileName = getNextDynamicConfigFilename();
                        if (fileName != null) {
                            QuorumPeerConfig.writeDynamicConfig(fileName, qv, true);
                        }
                    } catch (IOException e) {
                        LOG.error("Error writing next dynamic config file to disk: ", e.getMessage());
                    }
                }
            }
        }
    }

    private void connectNewPeers(QuorumCnxManager qcm){
        if (quorumVerifier != null && lastSeenQuorumVerifier != null) {
            Map<Long, QuorumServer> committedView = quorumVerifier.getAllMembers();
            for (Entry<Long, QuorumServer> e : lastSeenQuorumVerifier.getAllMembers().entrySet()) {
                if (e.getKey() != getId() && !committedView.containsKey(e.getKey()))
                    qcm.connectOne(e.getKey());
            }
        }
    }

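The steps above break down as follows:
1. If a QuorumCnxManager object (qcm) is cached, the method first takes the lock on qcm and then takes QV_LOCK. The reason is that the later call to QuorumCnxManager.connectOne() also locks qcm; acquiring the locks in the same order everywhere prevents deadlock between threads.
2. Check whether lastSeenQuorumVerifier exists and compare versions. A stale version is logged as an error, and an identical version returns immediately because there is nothing to do.
3. If qcm is cached, connect to the servers that appear in the new configuration but not in the committed one.
4. Write the next dynamic config file to disk if writeToDisk is true.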
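
The lock ordering and the peer selection can be sketched together. This is a minimal illustration under stated assumptions: LockOrderSketch and its ConnectionManager are hypothetical stand-ins for QuorumPeer and QuorumCnxManager, views are plain maps from server id to an address string, and connecting only records the id.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; ConnectionManager stands in for QuorumCnxManager.
class LockOrderSketch {
    static final class ConnectionManager {
        final List<Long> connected = new ArrayList<>();

        // Locks the manager itself, as connectOne() is described to do above.
        synchronized void connectOne(long sid) {
            connected.add(sid);
        }
    }

    private final Object qvLock = new Object();          // plays the role of QV_LOCK
    private final long myId;
    private final Map<Long, String> committedView;       // members of the committed config
    private Map<Long, String> lastSeenView;               // members of the last-seen config

    LockOrderSketch(long myId, Map<Long, String> committedView) {
        this.myId = myId;
        this.committedView = new TreeMap<>(committedView);
        this.lastSeenView = new TreeMap<>(committedView);
    }

    void setLastSeen(Map<Long, String> proposedView, ConnectionManager manager) {
        // Always take the manager's lock first (when there is one), then qvLock,
        // so this path uses the same order as other callers of connectOne().
        Object outer = (manager != null) ? manager : qvLock;
        synchronized (outer) {
            synchronized (qvLock) {
                lastSeenView = new TreeMap<>(proposedView);
                if (manager == null) {
                    return;
                }
                for (Long sid : lastSeenView.keySet()) {
                    // Connect only to servers that are new relative to the committed view.
                    if (sid != myId && !committedView.containsKey(sid)) {
                        manager.connectOne(sid);   // re-entrant: lock already held
                    }
                }
            }
        }
    }

    public static void main(String[] args) {
        Map<Long, String> committed = new TreeMap<>();
        committed.put(1L, "host1");
        committed.put(2L, "host2");
        Map<Long, String> proposed = new TreeMap<>(committed);
        proposed.put(4L, "host4");
        proposed.put(5L, "host5");

        ConnectionManager manager = new ConnectionManager();
        new LockOrderSketch(1L, committed).setLastSeen(proposed, manager);
        System.out.println(manager.connected); // prints [4, 5]
    }
}
```

Java monitors are re-entrant, so when no manager is present the two synchronized blocks simply lock the same object twice, which is harmless.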

Those are essentially the core methods of the QuorumPeer class. ZooKeeper's reconfig will be covered in a dedicated later article.

Reposted from blog.csdn.net/long9870/article/details/93737245