ZooKeeper Source Code Analysis: Leader Election

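Leader election, also called master election, is one of the most classic use cases for ZooKeeper. So why is leader election needed?
A cluster has to elect one of its services (think of them as servers) as the Leader and let that Leader manage the cluster. The other servers in the cluster then become Followers of that Leader. When the Leader fails, a new Leader must be elected quickly from among the Followers. This is the Leader mechanism built on ZooKeeper. Below we briefly introduce how to implement Leader Election with ZooKeeper.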

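The core idea is as follows. First create a parent node, for example "/election". This parent has to be a persistent node, because ephemeral nodes cannot have children. Then every participating server creates a SEQUENCE|EPHEMERAL node under it, for example "/election/n_". With the SEQUENCE flag, ZooKeeper automatically appends a sequence number that is larger than any number it has assigned before under that parent. Among the servers that created nodes, the one whose node has the smallest sequence number becomes the Leader.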
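
The idea can be tried out without a running ensemble. The following is a minimal in-memory sketch, not ZooKeeper code: the class `ElectionSketch` and its methods `join`, `leave` and `leader` are hypothetical names made up for illustration. It mimics a parent node that hands out increasing, 10-digit zero-padded sequence suffixes and treats the owner of the smallest surviving one as the Leader.

```java
import java.util.TreeMap;

/** In-memory stand-in for a parent znode with sequential children (hypothetical, for illustration only). */
final class ElectionSketch {
    private int nextSequence = 0;
    // sequence number -> host name, kept sorted by sequence number
    private final TreeMap<Integer, String> offers = new TreeMap<>();

    /** Mimics creating an EPHEMERAL_SEQUENTIAL child "n_" and returns the generated node name. */
    String join(String hostName) {
        int seq = nextSequence++;
        offers.put(seq, hostName);
        return String.format("n_%010d", seq);
    }

    /** Mimics the ephemeral node disappearing when its owner's session ends. */
    void leave(String nodeName) {
        offers.remove(Integer.parseInt(nodeName.substring("n_".length())));
    }

    /** The participant holding the smallest sequence number is the leader; null if nobody has joined. */
    String leader() {
        return offers.isEmpty() ? null : offers.firstEntry().getValue();
    }

    public static void main(String[] args) {
        ElectionSketch election = new ElectionSketch();
        String a = election.join("host-a");
        election.join("host-b");
        System.out.println(a + " created, leader is " + election.leader());
        election.leave(a);
        System.out.println("after host-a leaves, leader is " + election.leader());
    }
}
```

Sequence numbers are never reused in this sketch, so a server that rejoins always lines up behind everyone who is already waiting.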

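In practice there is one more requirement: when the Leader fails, the system must be able to pick the next Leader quickly. A simple scheme is to have every Follower watch the Leader's node. When the Leader fails, its ephemeral node is deleted automatically, which triggers the watch of every server that is watching the Leader. Those servers learn that the Leader has failed and start the next round of election. However, this scheme causes the "herd effect", which is especially noticeable when the cluster has many servers and network latency is high. To avoid the herd effect, the recipe works like this: each Follower sets a watch only on the node that has the largest sequence number among all nodes with a sequence number smaller than its own. A Follower runs the election procedure only when its own watch fires. If the deleted node belonged to the Leader, that Follower becomes the next Leader; otherwise it simply starts watching the next node ahead of it. Election is therefore fast, because each round involves almost nothing but a single Follower.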
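
To make the difference from the "everyone watches the Leader" scheme concrete, here is a small sketch under stated assumptions: `WatchChainSketch`, `watchTargets` and `notified` are hypothetical helpers, and the node list is assumed to be sorted by sequence number already. It computes who watches whom in the chain and which watchers fire when one node is deleted.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Models the predecessor-watch chain (hypothetical helper, not part of ZooKeeper). */
final class WatchChainSketch {

    /** Maps every non-leader node to the node just ahead of it; input must be sorted by sequence number. */
    static Map<String, String> watchTargets(List<String> sortedNodes) {
        Map<String, String> targets = new LinkedHashMap<>();
        for (int i = 1; i < sortedNodes.size(); i++) {
            targets.put(sortedNodes.get(i), sortedNodes.get(i - 1));
        }
        return targets;
    }

    /** Returns the nodes whose watch fires when failedNode is deleted. */
    static List<String> notified(Map<String, String> targets, String failedNode) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> entry : targets.entrySet()) {
            if (entry.getValue().equals(failedNode)) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("n_0000000000", "n_0000000001", "n_0000000002", "n_0000000003");
        Map<String, String> targets = watchTargets(nodes);
        System.out.println("watch targets: " + targets);
        System.out.println("notified when the leader fails: " + notified(targets, "n_0000000000"));
    }
}
```

With N participants, a Leader failure wakes up one watcher in this model, whereas the simple scheme would wake up all N - 1 Followers at once.
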
Now let's look at how the source code implements this. The logic lives in the class org.apache.zookeeper.recipes.leader.LeaderElectionSupport.
We start with its start() method. As you can see, it first calls makeOffer() and then determineElectionStatus().

    /**
     * Entry point of the election.
     */
    public synchronized void start() {
        state = State.START;
        // Notify listeners that the election is starting
        dispatchEvent(EventType.START);

        LOG.info("Starting leader election support");

        if (zooKeeper == null) {
            throw new IllegalStateException(
                "No instance of zookeeper provided. Hint: use setZooKeeper()");
        }

        if (hostName == null) {
            throw new IllegalStateException(
                "No hostname provided. Hint: use setHostName()");
        }

        try {
            makeOffer();
            determineElectionStatus();
        } catch (KeeperException | InterruptedException e) {
            becomeFailed(e);
        }
    }

Next, let's look at makeOffer(). Its main job is to create the ephemeral sequential node.


    /**
     * Where the election really begins: creates this participant's node under the root node.
     * @throws KeeperException
     * @throws InterruptedException
     */
    private void makeOffer() throws KeeperException, InterruptedException {
        state = State.OFFER;
        dispatchEvent(EventType.OFFER_START);

        LeaderOffer newLeaderOffer = new LeaderOffer();
        byte[] hostnameBytes;
        synchronized (this) {
            newLeaderOffer.setHostName(hostName);
            hostnameBytes = hostName.getBytes();
            newLeaderOffer.setNodePath(zooKeeper.create(rootNodeName + "/" + "n_",
                                                        hostnameBytes, ZooDefs.Ids.OPEN_ACL_UNSAFE,
                                                        // ephemeral node with a sequence suffix
                                                        CreateMode.EPHEMERAL_SEQUENTIAL));
            leaderOffer = newLeaderOffer;
        }
        LOG.debug("Created leader offer {}", leaderOffer);

        dispatchEvent(EventType.OFFER_COMPLETE);
    }

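Then comes determineElectionStatus(). It lists all child nodes under the root node. The participant whose node has the smallest sequence number becomes the leader, and every other participant sets a watch on the node immediately ahead of its own.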


    /**
     * Picks the node with the smallest sequence number; the machine that
     * created it is the leader.
     * @throws KeeperException
     * @throws InterruptedException
     */
    private void determineElectionStatus() throws KeeperException, InterruptedException {

        state = State.DETERMINE;
        dispatchEvent(EventType.DETERMINE_START);

        LeaderOffer currentLeaderOffer = getLeaderOffer();

        String[] components = currentLeaderOffer.getNodePath().split("/");

        currentLeaderOffer.setId(Integer.valueOf(components[components.length - 1].substring("n_".length())));

        List<LeaderOffer> leaderOffers = toLeaderOffers(zooKeeper.getChildren(rootNodeName, false));

        /*
         * For each leader offer, find out where we fit in. If we're first, we
         * become the leader. If we're not elected the leader, attempt to stat the
         * offer just less than us. If they exist, watch for their failure, but if
         * they don't, become the leader.
         */
        for (int i = 0; i < leaderOffers.size(); i++) {
            LeaderOffer leaderOffer = leaderOffers.get(i);

            if (leaderOffer.getId().equals(currentLeaderOffer.getId())) {
                LOG.debug("There are {} leader offers. I am {} in line.", leaderOffers.size(), i);

                dispatchEvent(EventType.DETERMINE_COMPLETE);

                if (i == 0) {
                    // the smallest one becomes the leader
                    becomeLeader();
                } else {
                    // everyone else is a non-leader and watches the offer just ahead of it
                    becomeReady(leaderOffers.get(i - 1));
                }

                /* Once we've figured out where we are, we're done. */
                break;
            }
        }
    }
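
The two steps in this method, extracting the numeric id from the node path and finding our own position in line, can be sketched in isolation. This is a simplified stand-in rather than the recipe's code: `DetermineSketch`, `idOf` and `neighborToWatch` are hypothetical names, plain strings replace LeaderOffer objects, and the sketch sorts the children itself because getChildren() does not guarantee any order. Unlike the method above, it throws when our own node is missing from the list.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Stand-alone model of the position lookup (hypothetical helper, not part of ZooKeeper). */
final class DetermineSketch {

    /** Extracts the numeric suffix from a path such as "/election/n_0000000007" or a bare child name. */
    static int idOf(String nodePath) {
        String[] components = nodePath.split("/");
        return Integer.parseInt(components[components.length - 1].substring("n_".length()));
    }

    /**
     * Returns null if ownPath holds the smallest id (we are the leader),
     * otherwise the child name immediately ahead of us, which is the one to watch.
     */
    static String neighborToWatch(String ownPath, List<String> children) {
        List<String> sorted = new ArrayList<>(children);
        sorted.sort(Comparator.comparingInt(DetermineSketch::idOf));
        int ownId = idOf(ownPath);
        for (int i = 0; i < sorted.size(); i++) {
            if (idOf(sorted.get(i)) == ownId) {
                return i == 0 ? null : sorted.get(i - 1);
            }
        }
        throw new IllegalStateException("own offer not found among children: " + ownPath);
    }

    public static void main(String[] args) {
        List<String> children = List.of("n_0000000002", "n_0000000000", "n_0000000001");
        System.out.println(neighborToWatch("/election/n_0000000000", children)); // null means leader
        System.out.println(neighborToWatch("/election/n_0000000002", children));
    }
}
```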

A node that did not become the leader watches the node ahead of it. If that node fails, the method above is executed again.


    private void becomeReady(LeaderOffer neighborLeaderOffer)
        throws KeeperException, InterruptedException {

        LOG.info(
            "{} not elected leader. Watching node: {}",
            getLeaderOffer().getNodePath(),
            neighborLeaderOffer.getNodePath());

        /*
         * Make sure to pass an explicit Watcher because we could be sharing this
         * zooKeeper instance with someone else.
         */
        /*
         * Set the watch on the previous node. If that node is deleted,
         * determineElectionStatus() is called again.
         */
        Stat stat = zooKeeper.exists(neighborLeaderOffer.getNodePath(), this);

        if (stat != null) {
            dispatchEvent(EventType.READY_START);
            LOG.debug(
                "We're behind {} in line and they're alive. Keeping an eye on them.",
                neighborLeaderOffer.getNodePath());
            state = State.READY;
            dispatchEvent(EventType.READY_COMPLETE);
        } else {
            /*
             * If the stat fails, the node has gone missing between the call to
             * getChildren() and exists(). We need to try and become the leader.
             */
            LOG.info(
                "We were behind {} but it looks like they died. Back to determination.",
                neighborLeaderOffer.getNodePath());
            determineElectionStatus();
        }

    }
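
The else branch covers a race: the neighbor can vanish between getChildren() and exists(). The following sketch models that retry with plain collections. It is an illustration under assumptions, not the recipe's code: `ReadySketch` and `settle` are hypothetical names, `staleChildren` stands in for an outdated getChildren() result, and `live` stands in for the nodes that actually still exist.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Models "check the neighbor, and if it is gone, determine again" (hypothetical helper). */
final class ReadySketch {

    /**
     * Settles the role of the node named own and returns "LEADER" or "WATCHING <node>".
     * Assumes 10-digit zero-padded names, so lexical order equals sequence order.
     */
    static String settle(String own, List<String> staleChildren, Set<String> live) {
        TreeSet<String> view = new TreeSet<>(staleChildren);
        while (true) {
            String neighbor = view.lower(own); // node just ahead of us, or null if there is none
            if (neighbor == null) {
                return "LEADER";
            }
            if (live.contains(neighbor)) {     // corresponds to exists() returning a Stat
                return "WATCHING " + neighbor;
            }
            view = new TreeSet<>(live);        // neighbor vanished: take a fresh listing and retry
        }
    }

    public static void main(String[] args) {
        List<String> stale = List.of("n_0000000000", "n_0000000001", "n_0000000002");
        // n_0000000001 died after the listing was taken
        Set<String> live = Set.of("n_0000000000", "n_0000000002");
        System.out.println(settle("n_0000000002", stale, live));
    }
}
```

The loop ends after at most one refresh in this model, because every neighbor found in the fresh listing is alive by construction. In a real cluster nodes can keep disappearing, which is why the method above simply calls determineElectionStatus() again.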

More annotated code can be found here:
https://github.com/haha174/zookeeper/commit/1174717483578074654bbc6a8a1e4744b9c255a9

Reposted from blog.csdn.net/u012957549/article/details/105009297