ZK in action笔记一

第2章

简介

ZK提供了一些简单的操作原语，对于具体的案例实现，需要自己实现，比如分布式锁。这里的案例称为recipe

ZK的数据以tree来展示，每个节点成为znode

ZK提供的操作原语：

create /path data
Creates a znode named with /path and containing data

delete /path
Deletes the znode /path

exists /path
Checks whether /path exists

setData /path data
Sets the data of znode /path to data

getData /path
Returns the data in /path

getChildren /path
Returns the list of children under /path

后续会关注对应的watcher

znode的类型

1. 持久

只能通过delete api去删除

2. 临时

session过期或者关闭，临时节点会删除。

临时节点只能是叶子节点。

[zk: localhost:2181(CONNECTED) 0] create -e /master ""
Created /master
[zk: localhost:2181(CONNECTED) 1] create -e /master/hahah ""
Ephemerals cannot have children: /master/hahah

3. 持久+顺序

4. 临时+顺序

Watch 和 Notification

轮询的坏处：

1. 延迟性不好控制

2. 服务端的压力

客户端设置watch，在watch对象满足条件时，客户端可以接收到notification，而一次watch只能接受到一次notification，如果要继续监控，需要重新设置wath。（one shot operation）

因为one shot的原因，客户端接到一次notification，到再次watch之间，可能zk上再次发生了节点变化，这时候，客户端是不会感知到的。

除非客户端重新发起操作，比如getData，才能获取到新的数据。模式变为：

notification code：

case OK：

String data = getData("/master", true); // sets the watch

-- 此处被notified，但是在这个过程中，可能/master节点又发生了变化，会丢失变化通知，但是数据不会丢失，因为我们会调用getData获取数据

画了示意图：

这里提到一点： ZK会保证一点， notification如果触发，会先通知给客户端，然后才会进行同一个节点后续的更改，即使2次更改是连续的

另外，在重连的情况下，watch是会被重新携带到新的zk service的，此时如果 zk servic发生过相应的改变，则立刻触发notification；如果未变化，则自动注册上watch进行监听。

通过代码可以看到，在重连时，会携带客户端的watch 到zk service：

if (!disableAutoWatchReset) {
    List<String> dataWatches = zooKeeper.getDataWatches();
    List<String> existWatches = zooKeeper.getExistWatches();
    List<String> childWatches = zooKeeper.getChildWatches();
    if (!dataWatches.isEmpty()
                || !existWatches.isEmpty() || !childWatches.isEmpty()) {
        SetWatches sw = new SetWatches(lastZxid,
                prependChroot(dataWatches),
                prependChroot(existWatches),
                prependChroot(childWatches));
        RequestHeader h = new RequestHeader();
        h.setType(ZooDefs.OpCode.setWatches);
        h.setXid(-8);
        Packet packet = new Packet(h, new ReplyHeader(), sw, null, null);
        outgoingQueue.addFirst(packet);
    }
}

Version

每个节点都有version的属性，更新都会提高版本。

ZK的API的操作，都有基于version的条件操作，这可以作为一种并发数据的修改方法。

ZK Quorums

先说ZK集群

假设ZK集群中有5台机器

1. 客户端发送一个数据写请求（包括create、delete和setData）等，什么时候应该返回给客户端，ZK集群已经存储成功？

如果等ZK集群的5台所有机器都存储成功，会导致客户端等待的时间过长。这里只要ZK Quorums的机器存储成功，即可返回。

2. ZK Quorums是什么

姑且就认为是为了解决脑裂问题引入的吧，防止数据不丢失，为集群能正常工作的最小ZK数量，一般为 N/2+1，其中N为集群数

Session演示

其实与一般的会话建立是一样的过程：

1. 建立TCP连接

2. 建立会话

如果会话成功，客户端会收到 SyncConnected的通知

zkCli.sh:

2016-01-06 18:16:08,263 [myid:] - INFO [main:ZooKeeper@438] - Initiating client connection, connectString=localhost:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@67ad77a7
Welcome to ZooKeeper!
JLine support is enabled
2016-01-06 18:16:08,348 [myid:] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@975] - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (java.lang.SecurityException: Unable to locate a login configuration)
2016-01-06 18:16:08,379 [myid:] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@852] - Socket connection established to localhost/127.0.0.1:2181, initiating session
2016-01-06 18:16:08,401 [myid:] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@1235] - Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x152172efbb90002, negotiated timeout = 20000

WATCHER::

WatchedEvent state:SyncConnected type:None path:null

zkServer.sh:

2016-01-06 18:16:08,369 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /127.0.0.1:42870
2016-01-06 18:16:08,388 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to establish new session at /127.0.0.1:42870
2016-01-06 18:16:08,395 [myid:] - INFO [SyncThread:0:ZooKeeperServer@617] - Established session 0x152172efbb90002 with negotiated timeout 20000 for client /127.0.0.1:428

Session的生命周期

先说明造成生命周期转换的几个事件：

这里截取zk client lib的代码注释：

/**
* Enumeration of states the ZooKeeper may be at the event
*/
public enum KeeperState

/** The client is in the connected state - it is connected
* to a server in the ensemble (one of the servers specified
* in the host connection parameter during ZooKeeper client
* creation). */
SyncConnected (3),

/** The client is in the disconnected state - it is not connected
* to any server in the ensemble. */
Disconnected (0),

/** The serving cluster has expired this session. The ZooKeeper
* client connection (the session) is no longer valid. You must
* create a new client connection (instantiate a new ZooKeeper
* instance) if you with to access the ensemble. */
Expired (-112);

这几个事件会在ZooKeeper的watch中处理。

结合上图：

SyncConnected ：触发CONNECTED状态，对应3

Disconnected ：触发CONNECTING状态，对应2

Expired：触发CLOSED状态，对应5

这里要强调Expired事件：这个事件只能由服务端触发，客户端无法触发。

也就是说：如果Session建立，服务端宕机，即使宕机半小时后启动成功，客户端的Session仍然是有效的，因为服务端并不认为Session过期。

We have this behavior because the ZooKeeper ensemble is the one responsible for declaring a session expired, not the client. Until the client hears that a ZooKeeper session has expired, the client cannot declare the session expired. The client may
choose to close the session, however.

再演示Session过期的事件，是通过在eclipse中挂起所有线程的方式（如果运行时，可以通过HSDB等工具），来使得客户端不发送心跳，最终服务端会认为Session过期。

同样，如果把服务端挂起，客户段也会发现会话问题，会触发 Disconnected 事件

会话重连

在集群模式下，如何选择重连的zookeeper service？规则随意，但是有一点很重要。

因为集群之间的数据复制存在延迟，客户端在重连时，一定会选择一个zk zxid > client zxid的zk服务器

通过代码发现，客户端的zk库里，有一个： private volatile long lastZxid;

在发起会话时，会携带给服务端：

ConnectRequest conReq = new ConnectRequest(0, lastZxid,
sessionTimeout, sessId, sessionPasswd);

其中lastZxid的声明： private long lastZxidSeen; 看见Seen了吧，意思很明显。

如示意图，如果client C1 重连S2时，发现S2的zxid < 1，因此不会重连S2，而是再去连接S3

猜你喜欢