ZooKeeper Series (1): ZooKeeper Introduction and Core Concepts

I. Zookeeper Overview

Zookeeper is an open-source distributed coordination service, currently maintained by Apache. It can be used to implement features that distributed systems commonly need: publish/subscribe, load balancing, naming services, distributed coordination and notification, cluster management, Master election, distributed locks, and distributed queues. It provides the following guarantees:

  • Sequential consistency : transaction requests issued by the same client are applied to Zookeeper strictly in the order in which they were issued;
  • Atomicity : the result of processing a transaction request is the same on every machine in the cluster; it never happens that some machines apply a transaction while others do not;
  • Single view : every client sees the same server-side data model, whichever server it is connected to;
  • Reliability : once a transaction has been applied successfully, the change it made persists until another transaction changes it;
  • Timeliness (real-time) : once a transaction has been applied successfully, Zookeeper guarantees that clients can read the resulting state within a bounded period of time.

II. Zookeeper Design Goals

Zookeeper aims to provide a high-performance, highly available coordination service with strict sequential access control for large, high-throughput distributed systems. It has the following four design goals:

2.1 Goal One: A Simple Data Model

Zookeeper stores data in a tree structure made up of data nodes called ZNodes, similar to a common file system. Unlike a common file system, however, Zookeeper keeps the full data set in memory in order to achieve high throughput and low access latency.

2.2 Goal Two: Clustering

A Zookeeper cluster is made up of a set of Zookeeper services. Each machine in the cluster maintains its own state in memory, and the machines stay in communication with one another. As long as more than half of the machines in the cluster are working properly, the cluster as a whole can serve requests normally.
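The "more than half" rule can be made concrete with a little arithmetic. The sketch below is plain Python rather than Zookeeper code, and the function names quorum_size and tolerated_failures are made up for illustration.

```python
def quorum_size(cluster_size: int) -> int:
    """Smallest number of machines that is strictly more than half."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size: int) -> int:
    """How many machines may fail while a majority is still running."""
    return cluster_size - quorum_size(cluster_size)


if __name__ == "__main__":
    # Print cluster size, required majority, and tolerated failures.
    for n in (3, 4, 5, 6):
        print(n, quorum_size(n), tolerated_failures(n))
```

A 4-machine cluster tolerates one failure, the same as a 3-machine cluster, which is why clusters are usually deployed with an odd number of machines.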

2.3 Goal Three: Sequential Access

Zookeeper assigns every update request from a client a globally unique, increasing ID, which reflects the order of all transaction requests.
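As a rough illustration of how a single increasing number can order every transaction: in Zookeeper's implementation this ID (the ZXID) is a 64-bit value whose high 32 bits hold the Leader epoch and whose low 32 bits hold a counter within that epoch. That layout is background knowledge brought in here, not something stated above, and the helper names below are hypothetical.

```python
from typing import Tuple

COUNTER_BITS = 32
COUNTER_MASK = (1 << COUNTER_BITS) - 1


def make_zxid(epoch: int, counter: int) -> int:
    """Pack an epoch and a per-epoch counter into one 64-bit style ID."""
    if not 0 <= counter <= COUNTER_MASK:
        raise ValueError("counter does not fit in 32 bits")
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    return (epoch << COUNTER_BITS) | counter


def split_zxid(zxid: int) -> Tuple[int, int]:
    """Return (epoch, counter) for a packed ID."""
    return zxid >> COUNTER_BITS, zxid & COUNTER_MASK


if __name__ == "__main__":
    ids = [make_zxid(1, 7), make_zxid(1, 8), make_zxid(2, 0)]
    # Comparing the packed integers orders transactions by epoch first,
    # then by counter, so later transactions always compare greater.
    print(ids == sorted(ids))
```

Because the epoch sits in the high bits, any transaction from a newer Leader compares greater than every transaction from an older one.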

2.4 Goal Four: High Performance and High Availability

Zookeeper keeps the full data set in memory to maintain high performance, and achieves high availability by running as a cluster. Because all updates and deletes are transaction-based, it performs best in scenarios where reads far outnumber writes.

III. Core Concepts

3.1 Cluster Roles

Each machine in a Zookeeper cluster takes one of the following three roles:

  • Leader : serves client reads and writes and maintains the state of the cluster; it is chosen by an election within the cluster;
  • Follower : serves client reads directly and forwards client writes to the Leader, and regularly reports its status to the Leader. It also votes in the "more than half must succeed" write strategy and in Leader elections;
  • Observer : serves client reads directly and forwards client writes to the Leader, and regularly reports its status to the Leader, but does not vote in the "more than half must succeed" write strategy or in Leader elections. Observers can therefore improve the cluster's read performance without affecting its write performance.

3.2 Sessions

A Zookeeper client connects to the service cluster over a long-lived TCP connection. The session begins when the connection is first established, and is then kept alive by a heartbeat mechanism. Over this connection the client can send requests and receive responses, and can also receive Watch event notifications.

Another core concept related to sessions is sessionTimeout (the session timeout). When the connection is broken by a network failure, a client-side interruption, or a similar cause, the previously created session remains valid as long as the connection is re-established within the session timeout.
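The timeout rule can be sketched as a small state machine. This is a simplified model, not how a Zookeeper server tracks sessions internally; the Session class and its heartbeat method are hypothetical, and time is passed in explicitly so the behavior is easy to follow.

```python
class Session:
    """Toy model of a session that stays valid while contact is frequent enough."""

    def __init__(self, session_id: int, timeout: float, now: float) -> None:
        self.session_id = session_id
        self.timeout = timeout      # seconds allowed between contacts
        self.last_seen = now        # time of the last heartbeat or reconnect
        self.expired = False

    def heartbeat(self, now: float) -> bool:
        """Record a heartbeat or reconnect at time `now`.

        Returns True if the session is still valid, False if it had already
        timed out. An expired session never becomes valid again.
        """
        if self.expired or now - self.last_seen > self.timeout:
            self.expired = True
            return False
        self.last_seen = now
        return True


if __name__ == "__main__":
    s = Session(session_id=1, timeout=10.0, now=0.0)
    print(s.heartbeat(4.0))    # contact after 4s, within the timeout
    print(s.heartbeat(13.0))   # reconnect 9s after the last contact
    print(s.heartbeat(30.0))   # 17s of silence, longer than the timeout
```

The second call models a dropped connection that is re-established in time, so the same session carries on; the third models a gap longer than the timeout, after which the session is gone for good.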

3.3 Data Nodes

The Zookeeper data model is a tree whose basic data unit is the Znode (data node), with / as the root. Each node stores its own data along with its node information. Zookeeper nodes fall into two categories:

  • Persistent node : once created, the node exists until it is explicitly deleted;
  • Ephemeral (temporary) node : once the client session that created the node expires, all ephemeral nodes created by that client are deleted.

Both ephemeral and persistent nodes can carry one additional attribute, SEQUENTIAL, which indicates whether the node name gets an increasing sequence number. If the attribute is specified, then when the node is created Zookeeper automatically appends to its name an increasing number maintained by the parent node.
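A minimal in-memory sketch of the SEQUENTIAL behavior follows. It assumes one counter per parent node, shared by all children, and zero-pads the number to 10 digits as Zookeeper does; the class name SequenceParent and the prefixes are invented for the example.

```python
from typing import Dict


class SequenceParent:
    """Toy parent node that numbers its sequential children."""

    def __init__(self) -> None:
        self._next = 0                      # counter maintained by the parent
        self.children: Dict[str, bytes] = {}

    def create_sequential(self, prefix: str, data: bytes = b"") -> str:
        """Create a child named prefix + 10-digit counter and return the name."""
        name = f"{prefix}{self._next:010d}"
        self._next += 1
        self.children[name] = data
        return name


if __name__ == "__main__":
    parent = SequenceParent()
    print(parent.create_sequential("order-"))
    print(parent.create_sequential("order-"))
    # The counter belongs to the parent, so a different prefix continues it.
    print(parent.create_sequential("lock-"))
```

Since every call receives a different number, two clients that create a node with the same prefix can never end up with the same name.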

3.4 Node Information

Alongside its data, each ZNode maintains a data structure called Stat, which stores all of the node's state information, as follows:

State property    Explanation
czxid             ID of the transaction that created the node
ctime             Time at which the node was created
mzxid             ID of the transaction that last updated the node
mtime             Time at which the node was last updated
pzxid             ID of the transaction that last modified the node's list of children
cversion          Number of changes to the node's children
version           Number of changes to the node's data
aversion          Number of changes to the node's ACL
ephemeralOwner    For an ephemeral node, the SessionID of the session that created it; for a persistent node, 0
dataLength        Length of the node's data
numChildren       Number of children of the node

3.5 Watcher

A commonly used Zookeeper feature is the Watcher (event listener). It lets users register a listener on a specified node for the events they are interested in; when such an event occurs, the listener is triggered and the event information is pushed to the client. This mechanism is central to how Zookeeper implements distributed coordination.
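The register-then-notify flow can be sketched with a small in-memory store. This is a model, not a client library: TinyStore and its methods are hypothetical. It also assumes one-shot behavior, meaning a watch fires once and must be registered again to see later changes, which is how standard Zookeeper watches behave but is not stated above.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List, Optional

WatchCallback = Callable[[str, str], None]  # called with (event_type, path)


class TinyStore:
    """Toy key-value store with one-shot data watches."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}
        self._watches: DefaultDict[str, List[WatchCallback]] = defaultdict(list)

    def get(self, path: str, watch: Optional[WatchCallback] = None) -> Optional[bytes]:
        """Read a node and optionally leave a watch on it."""
        if watch is not None:
            self._watches[path].append(watch)
        return self._data.get(path)

    def set(self, path: str, value: bytes) -> None:
        """Write a node, then notify and discard every watch registered on it."""
        self._data[path] = value
        for callback in self._watches.pop(path, []):
            callback("NodeDataChanged", path)


if __name__ == "__main__":
    store = TinyStore()
    events = []
    store.get("/config", watch=lambda ev, p: events.append((ev, p)))
    store.set("/config", b"v1")   # the watch fires here
    store.set("/config", b"v2")   # no watch is left, so nothing fires
    print(events)
```

A client that wants continuous updates would register a new watch inside its callback, typically by reading the node again.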

3.6 ACL

Zookeeper uses ACLs (Access Control Lists) for permission control, similar to access control in the UNIX file system. It defines the following five permissions:

  • CREATE : allows creating child nodes;
  • READ : allows reading the node's data and listing its child nodes;
  • WRITE : allows setting the node's data;
  • DELETE : allows deleting child nodes;
  • ADMIN : allows setting the node's permissions.
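The five permissions combine naturally as bit flags. In the sketch below the bit values follow the constants in Zookeeper's Java class ZooDefs.Perms, which are not given in the text above; the Perm enum and the allows helper are illustrative names, not part of any client API.

```python
from enum import IntFlag


class Perm(IntFlag):
    """Permission bits, one per Zookeeper ACL permission."""
    READ = 1
    WRITE = 2
    CREATE = 4
    DELETE = 8
    ADMIN = 16
    ALL = READ | WRITE | CREATE | DELETE | ADMIN


def allows(granted: Perm, wanted: Perm) -> bool:
    """True if every bit in `wanted` is present in `granted`."""
    return (granted & wanted) == wanted


if __name__ == "__main__":
    read_write = Perm.READ | Perm.WRITE
    print(allows(read_write, Perm.READ))                 # reading is granted
    print(allows(read_write, Perm.DELETE))               # deleting children is not
    print(allows(Perm.ALL, Perm.CREATE | Perm.ADMIN))    # ALL covers everything
```

Note that CREATE and DELETE apply to a node's children, so removing a node is governed by the DELETE bit on its parent.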

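IV. The ZAB Protocol

4.1 The ZAB Protocol and Data Consistency

ZAB is an atomic broadcast protocol with crash-recovery support, designed specifically for Zookeeper. With it, Zookeeper uses a leader/follower system architecture to keep the data on the replicas in the cluster consistent. Specifically:

Zookeeper uses a single main process to receive and handle all transaction requests from clients, and uses the atomic broadcast protocol to broadcast each change of data state to all replica processes in the form of a transaction Proposal.

The detailed flow is as follows:

All transaction requests must be handled by the single Leader. The Leader converts each transaction request into a transaction Proposal and distributes that Proposal to all Followers in the cluster. If more than half of the Followers respond with correct feedback, the Leader then sends a Commit message to all Followers, asking them to commit the Proposal.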

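4.2 Contents of the ZAB Protocol

The ZAB protocol has two basic modes, crash recovery and message broadcast:

1. Crash recovery

When the service framework is starting up, or when the Leader server fails, ZAB enters recovery mode and elects a new Leader through a majority vote. The other machines then synchronize their state from the new Leader. Once more than half of the machines have finished synchronizing, ZAB leaves recovery mode and enters message broadcast mode.

2. Message broadcast

ZAB's message broadcast uses an atomic broadcast protocol. Throughout the broadcast, the Leader generates a Proposal for each transaction request, assigns it a globally unique, increasing transaction ID (ZXID), and then broadcasts it. The detailed process is as follows:

The Leader allocates a separate queue for each Follower, places transaction Proposals into the queue in order, and sends them according to a FIFO (first in, first out) policy. When a Follower receives a Proposal, it writes it to local disk as a transaction log entry and, once the write succeeds, replies to the Leader with an Ack. When the Leader has received Acks from more than half of the Followers, it broadcasts a Commit message to all Followers to tell them to commit the transaction, and then commits the transaction itself. Each Follower commits the transaction when it receives the Commit message.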
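The broadcast steps above can be traced in a single-threaded simulation. This is a teaching sketch under strong assumptions: no network, no crash recovery, and a Follower's health does not change during a run. The Follower and Leader classes, their fields, and the plain integer used as a ZXID are all made up for the example.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple

Proposal = Tuple[int, str]  # (zxid, payload)


class Follower:
    def __init__(self, name: str, healthy: bool = True) -> None:
        self.name = name
        self.healthy = healthy
        self.queue: Deque[Proposal] = deque()  # FIFO queue the Leader keeps for this Follower
        self.log: List[Proposal] = []          # stands in for the on-disk transaction log
        self.committed: List[int] = []

    def receive_next(self) -> Optional[int]:
        """Log the oldest queued Proposal and return its zxid as the Ack."""
        if not self.healthy or not self.queue:
            return None
        proposal = self.queue.popleft()
        self.log.append(proposal)
        return proposal[0]

    def commit(self, zxid: int) -> None:
        if self.healthy:
            self.committed.append(zxid)


class Leader:
    def __init__(self, followers: List[Follower]) -> None:
        self.followers = followers
        self.next_zxid = 1
        self.committed: List[int] = []

    def broadcast(self, payload: str) -> bool:
        """Propose, collect Acks, and commit if more than half of the Followers acked."""
        zxid = self.next_zxid
        self.next_zxid += 1
        for follower in self.followers:
            follower.queue.append((zxid, payload))
        acks = sum(1 for f in self.followers if f.receive_next() == zxid)
        if acks * 2 <= len(self.followers):
            return False                       # no majority, so nothing is committed
        for follower in self.followers:
            follower.commit(zxid)              # Commit goes to the Followers first
        self.committed.append(zxid)            # then the Leader commits itself
        return True


if __name__ == "__main__":
    followers = [Follower("f1"), Follower("f2"), Follower("f3", healthy=False)]
    leader = Leader(followers)
    print(leader.broadcast("set /a 1"))        # two of three Followers ack
    print([f.committed for f in followers])
```

With two healthy Followers out of three, the Proposal gathers a majority of Acks and is committed; with only one healthy Follower, broadcast returns False and no replica commits.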

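V. Typical Zookeeper Application Scenarios

5.1 Data Publish/Subscribe

A data publish/subscribe system is commonly used as a configuration center. A distributed system may have thousands of service nodes; if you want to change a configuration item for every service, there are too many nodes to modify one machine at a time, so the design should include a unified configuration center from the start. A publisher then only needs to send the new configuration to the configuration center, and all service nodes download it and update automatically, which gives centralized management and dynamic updating of configuration.

Zookeeper implements data publish/subscribe through the Watcher mechanism. Every service node in the distributed system can register a listener on a ZNode; after that, writing the new configuration to the ZNode is enough for every service node to receive the event.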

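5.2 Naming Service

Distributed systems often need globally unique names, for example to generate globally unique order numbers. Zookeeper can generate globally unique IDs using sequential nodes, and can therefore provide a naming service for distributed systems.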

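5.3 Master Election

An important pattern in distributed systems is the master/slave pattern (Master/Slaves), and Zookeeper can be used for Master election under it. All service nodes compete to create the same ZNode; because Zookeeper does not allow two ZNodes with the same path, exactly one service node succeeds, and that node becomes the Master.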

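5.4 Distributed Locks

A distributed lock can be built from Zookeeper's ephemeral nodes and the Watcher mechanism. An exclusive lock is used as the example here:

All service nodes in the distributed system compete to create the same ephemeral ZNode. Because Zookeeper does not allow two ZNodes with the same path, only one service node can succeed, and that node is considered to hold the lock. The service nodes that did not get the lock register a listener on the ZNode so that they can compete for the lock again when it is released. The lock is released in one of two ways:

  • After the business logic has run to completion, the client deletes the ephemeral ZNode itself, and the lock is released;
  • When the client holding the lock crashes, the ephemeral ZNode is deleted automatically, and the lock is considered released.

Once the lock is released, the other service nodes compete to create the node again, but each time only one of them obtains the lock. This is what makes it an exclusive lock.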
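The exclusive lock described above can be walked through with an in-memory stand-in for Zookeeper. Nothing here talks to a real server: EphemeralStore, try_lock, contend, and the path /exclusive_lock are all hypothetical names, and session expiry is triggered by hand to model a crashed client.

```python
from typing import Callable, Dict, List

LOCK_PATH = "/exclusive_lock"  # hypothetical lock node


class EphemeralStore:
    """Toy store of ephemeral nodes with one-shot delete watches."""

    def __init__(self) -> None:
        self._owner: Dict[str, int] = {}                          # path -> session id
        self._watches: Dict[str, List[Callable[[str], None]]] = {}

    def create_ephemeral(self, path: str, session_id: int) -> bool:
        """Create the node unless the path already exists."""
        if path in self._owner:
            return False
        self._owner[path] = session_id
        return True

    def watch_delete(self, path: str, callback: Callable[[str], None]) -> None:
        self._watches.setdefault(path, []).append(callback)

    def delete(self, path: str) -> None:
        """Delete the node and notify everyone watching it."""
        if self._owner.pop(path, None) is not None:
            for callback in self._watches.pop(path, []):
                callback(path)

    def expire_session(self, session_id: int) -> None:
        """Model a crashed client: drop every node its session created."""
        for path in [p for p, s in self._owner.items() if s == session_id]:
            self.delete(path)


def try_lock(store: EphemeralStore, session_id: int,
             on_released: Callable[[str], None]) -> bool:
    """Return True if the lock was taken; otherwise watch for its release."""
    if store.create_ephemeral(LOCK_PATH, session_id):
        return True
    store.watch_delete(LOCK_PATH, on_released)
    return False


def contend(store: EphemeralStore, session_id: int, holders: List[int]) -> None:
    """Try to take the lock; if it is held, retry when the node is deleted."""
    def retry(_path: str) -> None:
        contend(store, session_id, holders)

    if try_lock(store, session_id, retry):
        holders.append(session_id)


if __name__ == "__main__":
    store, holders = EphemeralStore(), []
    contend(store, 1, holders)    # session 1 takes the lock
    contend(store, 2, holders)    # session 2 fails and waits
    store.expire_session(1)       # session 1 "crashes"; session 2 retries
    print(holders)
```

When many clients wait on the same node, every one of them is woken on each release and all but one fail again, so this simple form suits low contention best.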

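5.5 Cluster Management

Zookeeper can also solve many of the common problems in distributed systems:

  • Ephemeral nodes can be used to build a heartbeat detection mechanism. If a service node in the distributed system goes down, the session it holds times out, its ephemeral node is deleted, and the corresponding watch event fires.
  • Each service node can also write its own state into an ephemeral node, to report its status or work progress.
  • Through data publish/subscribe, Zookeeper can also decouple modules and schedule tasks in a distributed system.
  • Through the watch mechanism, service nodes can be brought online and taken offline dynamically, which enables dynamic scaling of services.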




Origin www.cnblogs.com/heibaiying/p/11361836.html