Implementing Cluster Leader Election with ZooKeeper

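In practice, the library most commonly used for ZooKeeper development is Apache Curator. Working directly against the ZooKeeper API means dealing with several awkward problems yourself, such as connection management and session expiry. Curator handles these for us: it monitors the ZooKeeper connection state, reconnects when necessary, and fires events on state changes.
Better still, Curator ships solid implementations of many common ZooKeeper use cases, along with a number of extensions, all of which follow ZooKeeper usage conventions.
Its main components are: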

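Recipes: implementations of the ZooKeeper recipes, built on the Curator Framework.
Framework: wraps a large number of common ZooKeeper API operations to make them easier to use, and adds features on top of ZooKeeper such as connection management and automatic reconnection when the connection is lost.
Utilities: utility classes for ZooKeeper operations, including ZK cluster testing tools and path-building helpers; they live in the curator-client package under org.apache.curator.utils.
Client: a wrapper around the ZooKeeper client API that replaces the official ZooKeeper class, takes care of tedious low-level handling, and provides some utility classes.
Errors: exception handling, connection errors, and so on.
Extensions: extensions to curator-recipes, split into curator-x-discovery and curator-x-discovery-server, which provide RESTful web services for the recipes.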

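The dictionary meaning of "recipe" is a set of cooking instructions; by extension, it means a plan or series of steps for reaching a predetermined result. The word has no good Chinese equivalent in computing, so it is usually left as is. If ZooKeeper is the food, a recipe is the instructions for a particular dish, such as mapo tofu or kung pao chicken.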

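With the exception of ZooKeeper's "Two-phased Commit" recipe, Curator implements all of the ZooKeeper recipes, and categorizes them in more detail. This article introduces these recipes by example. Once you understand them, you can put ZooKeeper to good use in your own projects.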

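In distributed computing, leader election is an important capability. The election designates one process as the organizer, which distributes tasks to the other nodes. Before the task starts, no node knows which one is the leader (or coordinator). Once the election algorithm has run, every node recognizes one particular, unique node as the task leader.
Elections are also frequently triggered when the leader goes down unexpectedly and a new leader has to be chosen.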
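The election and failover behaviour described above can be sketched without any ZooKeeper dependency. The model below assumes one common ZooKeeper-style rule, namely that each participant receives an increasing sequence number and the lowest surviving number is the leader. That rule and the class name `ElectionSketch` are assumptions made for illustration; this is not Curator's implementation.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

/** In-memory model of sequence-based leader election (illustrative only). */
class ElectionSketch {
    // Sequence number -> participant id; TreeMap keeps entries sorted by sequence.
    private final TreeMap<Long, String> participants = new TreeMap<>();
    private long nextSequence = 0;

    /** Registers a participant and returns its sequence number. */
    public synchronized long join(String id) {
        long seq = nextSequence++;
        participants.put(seq, id);
        return seq;
    }

    /** Removes a participant, e.g. after a crash or an explicit release. */
    public synchronized void leave(long seq) {
        participants.remove(seq);
    }

    /** Assumed rule: the participant with the lowest sequence number leads. */
    public synchronized Optional<String> leader() {
        Map.Entry<Long, String> first = participants.firstEntry();
        return first == null ? Optional.empty() : Optional.of(first.getValue());
    }

    public static void main(String[] args) {
        ElectionSketch election = new ElectionSketch();
        long a = election.join("node-a");
        election.join("node-b");
        election.join("node-c");
        System.out.println("leader: " + election.leader().orElse("none"));
        election.leave(a); // simulate the leader going down
        System.out.println("leader after failover: " + election.leader().orElse("none"));
    }
}
```

Every participant that asks `leader()` gets the same answer, and removing the current leader promotes the next participant in line, which is the failover case mentioned above.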

Curator has two leader election recipes; pick the one that fits your needs.

Leader latch

First, let's look at an example that uses the LeaderLatch class for election.
Its constructors are:

public LeaderLatch(CuratorFramework client, String latchPath)

public LeaderLatch(CuratorFramework client, String latchPath, String id)

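A LeaderLatch must be started: leaderLatch.start();
Once started, the LeaderLatch negotiates with every other LeaderLatch that uses the same latch path, and one of them is randomly chosen as the leader. You can check at any time whether a given instance is the leader: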

	
public boolean hasLeadership()

Like the JDK's CountDownLatch, LeaderLatch has blocking methods for waiting to acquire leadership:

public void await() throws InterruptedException, EOFException

Causes the current thread to wait until this instance acquires leadership unless the thread is interrupted or closed.

public boolean await(long timeout, TimeUnit unit) throws InterruptedException
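Since the text compares these methods to the JDK's CountDownLatch, the timed-wait behaviour can be illustrated with CountDownLatch itself. This is only an analogy: the helper `waitFor` and the class name `AwaitAnalogy` are hypothetical, and counting the latch down merely stands in for "leadership acquired".

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Shows how a timed await reports success or timeout via its boolean result. */
class AwaitAnalogy {
    /** Waits up to timeoutMillis; returns true only if the latch opened in time. */
    public static boolean waitFor(CountDownLatch latch, long timeoutMillis) {
        try {
            return latch.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
            return false;
        }
    }

    public static void main(String[] args) {
        CountDownLatch neverOpened = new CountDownLatch(1);
        System.out.println("timed out: " + !waitFor(neverOpened, 100));

        CountDownLatch opened = new CountDownLatch(1);
        opened.countDown(); // stands in for "leadership acquired"
        System.out.println("acquired: " + waitFor(opened, 100));
    }
}
```

The point of the analogy is that a timed wait can return without the condition having been met, so the caller should check the boolean result before acting as leader.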

Once you are done with a LeaderLatch, you must call close(). If it is the leader, it releases leadership, and the remaining participants elect a new leader.

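Error handling
A LeaderLatch instance can add a ConnectionStateListener to watch for connection problems. When the state becomes SUSPENDED or LOST, the leader no longer considers itself the leader. When a LOST connection is RECONNECTED, the LeaderLatch deletes its previous ZNode and creates a new one.
LeaderLatch users must account for connection problems that cause leadership to be lost. Using a ConnectionStateListener is strongly recommended.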
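The state handling described above can be modelled in a few lines. The sketch below is library-free so that it runs on its own: the `State` enum is a hypothetical stand-in that only mirrors the state names from the text, and `LeadershipGuard` is an invented name. In a real application the callback would presumably be a ConnectionStateListener registered with Curator rather than a direct method call.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Tracks whether this process may still act as leader (illustrative model). */
class LeadershipGuard {
    /** Hypothetical stand-in mirroring the connection state names in the text. */
    enum State { CONNECTED, SUSPENDED, LOST, RECONNECTED }

    private final AtomicBoolean leader = new AtomicBoolean(false);

    /** Called when this process wins the election. */
    public void onElected() {
        leader.set(true);
    }

    /** Gives up leadership as soon as the connection becomes unreliable. */
    public void onStateChanged(State newState) {
        if (newState == State.SUSPENDED || newState == State.LOST) {
            leader.set(false);
        }
        // RECONNECTED does not restore leadership; it has to be won again.
    }

    public boolean isLeader() {
        return leader.get();
    }

    public static void main(String[] args) {
        LeadershipGuard guard = new LeadershipGuard();
        guard.onElected();
        System.out.println("leader: " + guard.isLeader());
        guard.onStateChanged(State.SUSPENDED);
        System.out.println("leader after SUSPENDED: " + guard.isLeader());
        guard.onStateChanged(State.RECONNECTED);
        System.out.println("leader after RECONNECTED: " + guard.isLeader());
    }
}
```

Leader-only work should consult a flag like `isLeader()` before each step, so that it stops promptly when the connection degrades instead of continuing on stale leadership.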

Example:

package com.colobu.zkrecipe.leaderelection;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.List;
import java.util.concurrent.TimeUnit;
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.leader.LeaderLatch;
import org.apache.curator.retry.ExponentialBackoffRetry;
import org.apache.curator.test.TestingServer;
import org.apache.curator.utils.CloseableUtils;
import com.google.common.collect.Lists;

public class LeaderLatchExample {
    private static final int CLIENT_QTY = 10;
    private static final String PATH = "/examples/leader";

    public static void main(String[] args) throws Exception {
        List<CuratorFramework> clients = Lists.newArrayList();
        List<LeaderLatch> examples = Lists.newArrayList();
        // In-process ZooKeeper server, so the example needs no external cluster
        TestingServer server = new TestingServer();
        try {
            // Start one client and one latch per participant, all on the same latch path
            for (int i = 0; i < CLIENT_QTY; ++i) {
                CuratorFramework client = CuratorFrameworkFactory.newClient(server.getConnectString(), new ExponentialBackoffRetry(1000, 3));
                clients.add(client);
                LeaderLatch example = new LeaderLatch(client, PATH, "Client #" + i);
                examples.add(example);
                client.start();
                example.start();
            }

            // Give the election time to complete
            Thread.sleep(20000);

            // Find the participant that currently holds leadership
            LeaderLatch currentLeader = null;
            for (int i = 0; i < CLIENT_QTY; ++i) {
                LeaderLatch example = examples.get(i);
                if (example.hasLeadership()) {
                    currentLeader = example;
                }
            }
            // Guard against the case where no leader has been elected yet
            if (currentLeader == null) {
                System.out.println("no leader has been elected yet");
                return;
            }
            System.out.println("current leader is " + currentLeader.getId());
            System.out.println("release the leader " + currentLeader.getId());
            currentLeader.close();

            // Client #0 waits up to 2 seconds for leadership; it may not get it
            examples.get(0).await(2, TimeUnit.SECONDS);
            System.out.println("Client #0 may or may not have been elected leader, even though it asked to be");
            System.out.println("the new leader is " + examples.get(0).getLeader().getId());

            System.out.println("Press enter/return to quit\n");
            new BufferedReader(new InputStreamReader(System.in)).readLine();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            System.out.println("Shutting down...");
            for (LeaderLatch exampleClient : examples) {
                // The released leader is already closed; closing a latch twice throws IllegalStateException
                if (exampleClient.getState() == LeaderLatch.State.STARTED) {
                    CloseableUtils.closeQuietly(exampleClient);
                }
            }
            for (CuratorFramework client : clients) {
                CloseableUtils.closeQuietly(client);
            }
            CloseableUtils.closeQuietly(server);
        }
    }
}

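We first create 10 LeaderLatch instances; once they are started, one of them is elected leader. Because the election takes some time, the leader is not available immediately after start().
hasLeadership() tells an instance whether it is the leader; it returns true if so.
getLeader().getId() returns the ID of the current leader.
Leadership can only be released by calling close().
await() is a blocking method that tries to acquire leadership, but it is not guaranteed to succeed.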

車輪の唄
