ZooKeeper knowledge essentials

1. What is ZooKeeper?

ZooKeeper is an open-source coordination service for distributed applications and an open-source implementation of Google's Chubby. It acts as a cluster manager: it monitors the status of every node in the cluster and takes the appropriate next action based on the feedback the nodes submit. The result is a system that offers users a simple interface together with efficient performance and stable functionality.
A client read request can be handled by any machine in the cluster; if the read request registers a watcher on a node, that watcher is handled by the ZooKeeper server the client is connected to. A write request is sent on to the other ZooKeeper machines, and success is returned only after they reach agreement. Consequently, as machines are added to a ZooKeeper cluster, read throughput increases while write throughput decreases.
Ordering is a very important property of ZooKeeper: all updates are globally ordered, and each update carries a unique timestamp called the zxid (ZooKeeper Transaction Id). Reads are ordered only relative to updates; that is, a read returns a result tagged with the newest zxid of the ZooKeeper server that served it.

2. What does ZooKeeper provide?

1. A file system
2. A notification mechanism

3. ZooKeeper File System

ZooKeeper provides a multi-level namespace of nodes (each node is called a znode). Unlike a file system, where only files can store data and directories cannot, every znode can have data associated with it. To guarantee low latency and high throughput, ZooKeeper keeps this tree in memory, which means it cannot be used to store large amounts of data: the data stored in each node is limited to 1 MB.

4. The four types of znode

1. PERSISTENT: persistent node
After the client disconnects from ZooKeeper, the node still exists.
2. PERSISTENT_SEQUENTIAL: persistent sequential node
After the client disconnects, the node still exists, and ZooKeeper appends a sequence number to the node name.
3. EPHEMERAL: ephemeral node
The node is deleted once the client's session ends (the client disconnects and its session expires).
4. EPHEMERAL_SEQUENTIAL: ephemeral sequential node
The node is deleted once the client's session ends, and ZooKeeper also appends a sequence number to its name.
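The four types can be illustrated with a small in-memory model. This is a toy sketch, not the ZooKeeper client API: the class `ToyZooKeeper`, its keyword flags and `close_session` are hypothetical names. It assumes that sequential names get a 10-digit, zero-padded counter kept by the parent (in this toy the counter simply advances on every child creation) and that ephemeral nodes belong to a session and cannot have children.

```python
from typing import Dict, Optional


class ToyZooKeeper:
    """Hypothetical in-memory model of the four znode types (not the real client API)."""

    def __init__(self) -> None:
        self.nodes: Dict[str, bytes] = {"/": b""}
        self.owner: Dict[str, Optional[int]] = {"/": None}  # session id of ephemerals
        self.counter: Dict[str, int] = {}  # per-parent counter for sequential names

    @staticmethod
    def _parent(path: str) -> str:
        return path.rsplit("/", 1)[0] or "/"

    def create(self, path: str, data: bytes = b"", *, ephemeral: bool = False,
               sequential: bool = False, session: Optional[int] = None) -> str:
        parent = self._parent(path)
        if parent not in self.nodes:
            raise KeyError(f"parent does not exist: {parent}")
        if self.owner[parent] is not None:
            raise ValueError("ephemeral nodes cannot have children")
        if ephemeral and session is None:
            raise ValueError("an ephemeral node needs an owning session")
        n = self.counter.get(parent, 0)
        if sequential:
            path = f"{path}{n:010d}"  # 10-digit, zero-padded suffix
        if path in self.nodes:
            raise FileExistsError(path)
        self.nodes[path] = data
        self.owner[path] = session if ephemeral else None
        self.counter[parent] = n + 1  # toy rule: advance on every child creation
        return path

    def close_session(self, session: int) -> None:
        """Ending a session deletes every ephemeral node it owns."""
        for path in [p for p, s in self.owner.items() if s == session]:
            del self.nodes[path]
            del self.owner[path]


zk = ToyZooKeeper()
zk.create("/app")                                            # PERSISTENT
job = zk.create("/app/job-", sequential=True)                # PERSISTENT_SEQUENTIAL
zk.create("/app/worker", ephemeral=True, session=1)          # EPHEMERAL
w = zk.create("/app/w-", ephemeral=True, sequential=True, session=1)  # EPHEMERAL_SEQUENTIAL
zk.close_session(1)
print(job, w, sorted(zk.nodes))
```

In the real Java client the same choice is made with the `CreateMode` enum (`PERSISTENT`, `PERSISTENT_SEQUENTIAL`, `EPHEMERAL`, `EPHEMERAL_SEQUENTIAL`) passed to `ZooKeeper.create`.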

5. ZooKeeper notification mechanism

A client registers a watcher on a znode. When that znode changes, ZooKeeper notifies the client, and the client can then adjust its own behavior according to the change.

6. What can ZooKeeper be used for?

1. Naming service
2. Configuration management
3. Cluster management
4. Distributed locks
5. Queue management

7. zk naming service (file system)

A naming service lets clients obtain a resource or the address of a service through a specified name. With zk you create a global path; because the path is unique, it can serve as a name that points to a cluster, the address of a service it provides, a remote object, and so on.

8. zk configuration management (file system + notification mechanism)

When a program is deployed across several machines, its configuration is stored in a zk znode. When the configuration changes, you update the content of that znode, and zk uses watchers to notify every client, which then applies the new configuration.
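A minimal sketch of this flow, assuming a toy `ConfigZnode` class in place of a real server (`ConfigZnode`, `ConfigClient` and their methods are hypothetical names). Each client reads the configuration with a watch; because a watch fires only once, the callback re-reads the data and re-registers the watch in a single step.

```python
from typing import Callable, List

Watcher = Callable[[str], None]


class ConfigZnode:
    """Hypothetical stand-in for a config znode with one-shot data watches."""

    def __init__(self, data: str) -> None:
        self.data = data
        self.watchers: List[Watcher] = []

    def get_data(self, watcher: Watcher) -> str:
        self.watchers.append(watcher)  # plays the role of getData(path, watcher)
        return self.data

    def set_data(self, data: str) -> None:
        self.data = data
        fired, self.watchers = self.watchers, []  # a watch fires at most once
        for watcher in fired:
            watcher("NodeDataChanged")


class ConfigClient:
    """A deployed program instance that keeps its local copy of the config current."""

    def __init__(self, node: ConfigZnode) -> None:
        self.node = node
        self.config = node.get_data(self.on_event)

    def on_event(self, event: str) -> None:
        # Re-read the new value and re-register the watch in the same call.
        self.config = self.node.get_data(self.on_event)


node = ConfigZnode("timeout=30")
clients = [ConfigClient(node) for _ in range(3)]
node.set_data("timeout=60")  # an operator edits the config znode
print([c.config for c in clients])
```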

9. ZooKeeper cluster management (file system + notification mechanism)

Cluster management comes down to two concerns: whether machines leave or join, and electing a master.
For the first point, all machines agree to create an ephemeral node under a common parent directory and then watch the parent for changes to its children. Once a machine goes down, its connection to ZooKeeper is lost and the ephemeral node it created is deleted; all other machines are notified that a sibling node was removed, so everyone knows that machine has left.
Joining works the same way: all machines are notified that a new sibling has been added, so everyone knows a new member has come on board. For the second point, a small change is enough: every machine creates an ephemeral sequential node, and each time the machine with the smallest sequence number is chosen as master.
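The membership and master-selection scheme can be modelled as follows. `Membership`, `join`, `crash` and the `member-` prefix are hypothetical; `crash` stands in for a session expiring, which is what actually removes an ephemeral node, and `notifications` records what the child watches would report.

```python
from typing import Dict, List


class Membership:
    """Hypothetical model of a parent node whose children are ephemeral sequential nodes."""

    def __init__(self) -> None:
        self.children: Dict[str, str] = {}  # child node name -> machine
        self.counter = 0
        self.notifications: List[str] = []  # what the child watches would report

    def join(self, machine: str) -> str:
        name = f"member-{self.counter:010d}"
        self.counter += 1
        self.children[name] = machine
        self.notifications.append(f"joined: {machine}")
        return name

    def crash(self, machine: str) -> None:
        # Stands in for the machine's session expiring: its ephemeral node vanishes.
        for name, owner in list(self.children.items()):
            if owner == machine:
                del self.children[name]
                self.notifications.append(f"left: {machine}")

    def master(self) -> str:
        # Zero-padded suffixes sort numerically, so min() is the smallest number.
        return self.children[min(self.children)]


cluster = Membership()
for machine in ("node-a", "node-b", "node-c"):
    cluster.join(machine)
print(cluster.master())
cluster.crash("node-a")
print(cluster.master(), cluster.notifications[-1])
```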

10. ZooKeeper distributed lock (file system + notification mechanism)

Thanks to the consistency of ZooKeeper's file system, locking becomes a simpler problem. Lock services fall into two categories: one maintains exclusivity, the other controls ordering.
For the first category, a znode on ZooKeeper is treated as the lock, and locking is done by creating that znode. All clients try to create the /distribute_lock node, and the client whose create succeeds owns the lock. Deleting the /distribute_lock node it created releases the lock.
For the second category, /distribute_lock already exists, and every client creates an ephemeral sequential node under it. As in master election, the client with the smallest sequence number obtains the lock and deletes its node when it is done, which is convenient.
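A sketch of the first (exclusive) category, using a toy `ToyServer` whose `create` is atomic and fails if the node already exists, in the way a real `create` fails with `KeeperException.NodeExistsException`. `ToyServer`, `try_lock` and `unlock` are hypothetical names. In practice /distribute_lock should be created as an EPHEMERAL node so the lock is released automatically if the holder's session dies; the toy only records the owner.

```python
import threading
from typing import Dict

LOCK_PATH = "/distribute_lock"


class ToyServer:
    """Hypothetical server whose create() is atomic and fails if the node exists."""

    def __init__(self) -> None:
        self._mutex = threading.Lock()
        self.nodes: Dict[str, str] = {}  # path -> owning client

    def create(self, path: str, owner: str) -> bool:
        with self._mutex:
            if path in self.nodes:
                return False  # a real client would get NodeExistsException here
            self.nodes[path] = owner
            return True

    def delete(self, path: str, owner: str) -> None:
        with self._mutex:
            if self.nodes.get(path) == owner:  # toy rule: only the holder may release
                del self.nodes[path]


def try_lock(server: ToyServer, client: str) -> bool:
    """Whoever creates LOCK_PATH first owns the lock."""
    return server.create(LOCK_PATH, client)


def unlock(server: ToyServer, client: str) -> None:
    server.delete(LOCK_PATH, client)


server = ToyServer()
results: Dict[str, bool] = {}


def contend(client: str) -> None:
    results[client] = try_lock(server, client)


threads = [threading.Thread(target=contend, args=(f"client-{i}",)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
winner = next(c for c, ok in results.items() if ok)
print(winner, "holds the lock")
```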

11. The process of acquiring a distributed lock


To acquire the distributed lock, a client creates an ephemeral sequential node under the locker node; to release the lock, it deletes that node. In detail, the client calls the createNode method to create an ephemeral sequential node under locker,
then calls getChildren("locker") to fetch all child nodes of locker; note that no watcher is set at this point. Once the client has the paths of all child nodes, if the node it created has the smallest sequence number among them, the client is considered to hold the lock. If its node is not the smallest child of locker, it has not obtained the lock yet; the client then finds the node immediately smaller than its own and calls the exists() method on it, registering an event listener at the same time. Afterwards, when the watched node is deleted, the client's watcher receives a notification, and the client checks again whether its own node is now the smallest child of locker. If it is, the client has acquired the lock; if not, it repeats the steps above, finding the next smaller node and registering a listener on it. The process also involves a number of logical checks along the way.

The implementation is mainly based on a mutex. The core logic for acquiring the distributed lock lives in BaseDistributedLock, which implements the details of a ZooKeeper-based distributed lock.
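The acquisition steps above can be sketched as an event-driven simulation. `Locker`, `LockClient` and the `lock-` prefix are hypothetical stand-ins for the locker node and the createNode/getChildren/exists calls; this is not the blog's `BaseDistributedLock`, whose code is not shown here. Watches are modelled as one-shot callbacks fired when a node is deleted.

```python
from typing import Callable, Dict, List, Optional


class Locker:
    """Hypothetical '/locker' node: sequential children plus one-shot exists() watches."""

    def __init__(self) -> None:
        self.children: List[str] = []
        self.counter = 0
        self.watches: Dict[str, List[Callable[[], None]]] = {}

    def create_sequential(self) -> str:
        name = f"lock-{self.counter:010d}"
        self.counter += 1
        self.children.append(name)
        return name

    def get_children(self) -> List[str]:
        return list(self.children)  # no watch is set here, as in the text

    def exists(self, name: str, watcher: Callable[[], None]) -> bool:
        if name not in self.children:
            return False
        self.watches.setdefault(name, []).append(watcher)
        return True

    def delete(self, name: str) -> None:
        self.children.remove(name)
        for watcher in self.watches.pop(name, []):  # fire once, then forget
            watcher()


class LockClient:
    def __init__(self, locker: Locker) -> None:
        self.locker = locker
        self.node: Optional[str] = None
        self.has_lock = False

    def acquire(self) -> None:
        self.node = self.locker.create_sequential()
        self._check()

    def _check(self) -> None:
        children = sorted(self.locker.get_children())
        if self.node not in children:
            return  # we already gave up (or our session ended)
        idx = children.index(self.node)
        if idx == 0:
            self.has_lock = True  # smallest number: we own the lock
            return
        predecessor = children[idx - 1]
        # Watch only the next-smaller node; if it vanished meanwhile, check again.
        if not self.locker.exists(predecessor, self._check):
            self._check()

    def release(self) -> None:
        self.has_lock = False
        if self.node is not None:
            self.locker.delete(self.node)
            self.node = None


locker = Locker()
a, b, c = LockClient(locker), LockClient(locker), LockClient(locker)
for client in (a, b, c):
    client.acquire()
print(a.has_lock, b.has_lock, c.has_lock)
a.release()
print(b.has_lock, c.has_lock)
```

Watching only the immediate predecessor, rather than the whole locker directory, means each release wakes exactly one waiter instead of every client (the so-called herd effect). The guard at the top of `_check` covers a waiter that gave up while its watch was still pending.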

12. ZooKeeper queue management (file system + notification mechanism)

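Two types of queues:
1. Synchronous queue: the queue becomes usable only when all of its members have gathered; otherwise it keeps waiting for all members to arrive.
2. FIFO queue: elements are enqueued and dequeued in first-in, first-out order.
For the first type, create ephemeral nodes under an agreed-upon directory and watch whether the number of child nodes has reached the required count.
The second type follows the same principle as the ordering scenario of the distributed lock service: items are numbered when enqueued and dequeued by number. Create PERSISTENT_SEQUENTIAL nodes under a specific directory; when a create succeeds, a watcher notifies the waiting consumers, and a consumer deletes the node with the smallest sequence number to consume it. In this scenario the znodes serve as message storage: the data stored in a znode is the content of a message, and the SEQUENTIAL sequence number is the message number, so messages are simply taken out in order. Because the nodes are persistent, there is no need to worry about losing queued messages.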
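Both queue types, sketched as toy in-memory classes (`SyncQueue`, `FifoQueue` and the `qn-` prefix are hypothetical names, not a ZooKeeper API). The barrier counts members the way the text counts ephemeral children, and the FIFO queue relies on the zero-padded sequence suffix so that the lexically smallest name is the oldest message.

```python
from typing import Dict, List, Optional


class SyncQueue:
    """Hypothetical synchronous queue (barrier): usable once `size` members have joined."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.members: List[str] = []  # stand-ins for ephemeral child nodes

    def join(self, member: str) -> bool:
        self.members.append(member)
        # With ZooKeeper each member would watch the child count instead of polling.
        return self.ready()

    def ready(self) -> bool:
        return len(self.members) >= self.size


class FifoQueue:
    """Hypothetical FIFO queue built on PERSISTENT_SEQUENTIAL-style names."""

    def __init__(self) -> None:
        self.nodes: Dict[str, bytes] = {}
        self.counter = 0

    def offer(self, message: bytes) -> str:
        name = f"qn-{self.counter:010d}"  # the sequence number is the message number
        self.counter += 1
        self.nodes[name] = message
        return name

    def poll(self) -> Optional[bytes]:
        if not self.nodes:
            return None
        smallest = min(self.nodes)  # consume the smallest sequence number first
        return self.nodes.pop(smallest)


barrier = SyncQueue(size=3)
print([barrier.join(m) for m in ("a", "b", "c")])
queue = FifoQueue()
for msg in (b"m1", b"m2", b"m3"):
    queue.offer(msg)
print(queue.poll(), queue.poll())
```

With several consumers against a real server, two of them may race to delete the same smallest node; the loser's delete fails and it should simply move on to the next node.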

13. ZooKeeper data replication

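As a cluster that provides a consistent data service, ZooKeeper naturally has to replicate data across all of its machines. The benefits of data replication:
1. Fault tolerance: if one node fails, the whole system does not stop working, because other nodes can take over its work;
2. Better scalability: load can be spread across multiple nodes, or nodes can be added to increase the system's capacity;
3. Better performance: clients access a nearby node locally, which speeds up user access.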

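In terms of how transparent reads and writes are to the client, replicated cluster systems fall into two kinds:
1. Write Master: modifications are submitted to a designated node, while reads have no such restriction and can go to any node. In this case the client must distinguish reads from writes, which is commonly called read/write splitting;
2. Write Any: modifications can be submitted to any node, just like reads. In this case the roles of the cluster nodes, and changes to those roles, are transparent to the client.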

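ZooKeeper uses Write Any. Adding machines scales its read throughput and responsiveness very well, whereas write throughput inevitably drops as machines are added (this is why observers were introduced: they serve reads but do not vote on writes). Write responsiveness depends on the implementation: delayed replication that maintains eventual consistency, or immediate replication for a fast response.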

14. How ZooKeeper works

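The core of ZooKeeper is atomic broadcast, a mechanism that keeps the servers synchronized. The protocol that implements it is called Zab. Zab has two modes: recovery mode (leader election) and broadcast mode (synchronization). When the service starts or after the leader crashes, Zab enters recovery mode; recovery mode ends once a leader has been elected and a majority of servers have finished synchronizing their state with the leader. State synchronization ensures that the leader and the servers have the same system state.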

15. How does ZooKeeper guarantee sequential consistency of transactions?

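ZooKeeper identifies transactions with an increasing transaction ID: every proposal is stamped with a zxid when it is made. The zxid is a 64-bit number. Its high 32 bits are the epoch, which identifies whether the leader has changed (whenever a new leader is elected, the epoch is incremented), and its low 32 bits are an incrementing counter. When a new proposal is generated, ZooKeeper follows a two-phase process similar to that of databases: it first sends the transaction request to the other servers, and if more than half of the machines can execute it successfully, the transaction is committed.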
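The zxid layout and the majority rule can be written out directly. `make_zxid`, `split_zxid` and `committed` are illustrative helpers, not ZooKeeper functions; the only facts assumed are the ones stated above (a 32-bit epoch in the high bits, a 32-bit counter in the low bits, and commit once more than half the servers agree).

```python
from typing import Tuple

COUNTER_BITS = 32
COUNTER_MASK = (1 << COUNTER_BITS) - 1


def make_zxid(epoch: int, counter: int) -> int:
    """Pack a 32-bit epoch (high bits) and a 32-bit counter (low bits) into a zxid."""
    if not (0 <= epoch <= COUNTER_MASK and 0 <= counter <= COUNTER_MASK):
        raise ValueError("epoch and counter must each fit in 32 bits")
    return (epoch << COUNTER_BITS) | counter


def split_zxid(zxid: int) -> Tuple[int, int]:
    """Return (epoch, counter)."""
    return zxid >> COUNTER_BITS, zxid & COUNTER_MASK


def committed(acks: int, ensemble_size: int) -> bool:
    """A proposal may be committed once more than half of the ensemble acknowledged it."""
    return acks > ensemble_size // 2


old = make_zxid(1, 0xFFFFFFFF)  # last possible transaction of leader epoch 1
new = make_zxid(2, 1)           # a transaction from leader epoch 2
print(hex(old), hex(new), new > old, split_zxid(new))
```

Because the epoch sits in the high bits, any zxid from a newer leader compares greater than every zxid from an older one, which is why plain integer comparison gives a valid global order.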

16. Server states in ZooKeeper

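While working, each server is in one of three states:
LOOKING: the server does not know who the leader is and is searching for it
LEADING: the server is the elected leader
FOLLOWING: a leader has been elected, and the server is synchronizing with it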

17. How does ZooKeeper elect the leader?

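When the leader crashes or loses a majority of its followers, zk enters recovery mode, which must elect a new leader and bring all servers back to a correct state. zk has two election algorithms: one based on basic paxos and one based on fast paxos. The default election algorithm is fast paxos.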

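1. ZooKeeper leader election (basic paxos)
(1) The election thread is the thread through which the current server initiates the election; its main job is to tally the votes and pick the recommended server;
(2) The election thread first sends a query to all servers (including itself);
(3) On receiving a reply, the election thread verifies that it answers its own query (checks that the zxid matches), then obtains the other server's id (myid) and stores it in the list of queried servers, and finally obtains the leader information that server proposes (id, zxid) and stores it in the vote record for this round of election;
(4) After receiving replies from all servers, it determines the server with the largest zxid and sets that server's information as the server to vote for next;
(5) The thread sets the server with the current largest zxid as the leader this server recommends. If the winning server has received n/2 + 1 votes, the current recommended leader is set to the winner and the server sets its own state according to the winner's information; otherwise the process continues until a leader is elected.
From this process we can conclude that for a leader to gain the support of a majority, the total number of servers is best chosen odd, 2n+1, and at least n+1 servers must be alive. Every server repeats this process after starting. In recovery mode, a server that has just recovered from a crash or just started also restores its data and session information from a disk snapshot; zk records a transaction log and takes periodic snapshots so that state can be restored during recovery.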

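2. ZooKeeper leader election (fast paxos)
In the fast paxos flow, a server first proposes to all servers that it become the leader. When the other servers receive the proposal, they resolve conflicts in epoch and zxid, accept the proposal, and send back a message saying they have accepted it. This process repeats, and a leader is always elected in the end.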
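The vote-tallying idea of the basic paxos flow can be simulated in a few lines. This is a simplified sketch, not ZooKeeper's election code: `elect` and `Vote` are hypothetical, every live server starts by voting for itself, votes are exchanged until nobody changes its mind, and ties on zxid are broken by the larger myid (an assumption of the sketch, chosen to mirror ZooKeeper's usual tie-break). The quorum is n/2 + 1 of the whole configured ensemble, as in step (5).

```python
from collections import Counter
from typing import Dict, Optional, Set, Tuple

Vote = Tuple[int, int]  # (zxid, myid) of the server being proposed as leader


def elect(last_zxid: Dict[int, int], alive: Set[int]) -> Optional[int]:
    """Return the myid of the elected leader, or None if no quorum is reachable.

    last_zxid maps every configured server's myid to its last zxid.
    """
    if not alive:
        return None
    # Every live server starts by voting for itself.
    votes: Dict[int, Vote] = {sid: (last_zxid[sid], sid) for sid in alive}
    changed = True
    while changed:  # exchange votes until nobody changes its mind
        changed = False
        for sid in alive:
            for other in alive:
                # Larger zxid wins; equal zxids fall back to the larger myid.
                better = max(votes[sid], votes[other])
                if better != votes[sid]:
                    votes[sid] = better
                    changed = True
    leader, count = Counter(v[1] for v in votes.values()).most_common(1)[0]
    quorum = len(last_zxid) // 2 + 1  # n/2 + 1 of the whole ensemble
    return leader if count >= quorum else None


ensemble = {1: 0x100000003, 2: 0x100000005, 3: 0x100000005}
print(elect(ensemble, {1, 2, 3}), elect(ensemble, {1, 2}), elect(ensemble, {1}))
```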

18. ZooKeeper synchronization process

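After the leader is elected, zk enters the state synchronization process.
1. The leader waits for servers to connect;
2. Each follower connects to the leader and sends it its largest zxid;
3. The leader determines the synchronization point from the follower's zxid;
4. After synchronization completes, the leader notifies the follower that it is now in the uptodate state;
5. On receiving the uptodate message, the follower can once again accept client requests and serve them.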
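Step 3, determining the synchronization point, can be illustrated with a simplified decision function. `plan_sync` is hypothetical; its three outcomes are loosely modelled on the DIFF, TRUNC and SNAP modes ZooKeeper's leader uses, but the real rules handle more cases (for example, uncommitted proposals left over from an old epoch).

```python
from typing import List, Tuple


def plan_sync(follower_zxid: int, leader_log: List[int]) -> Tuple[str, List[int]]:
    """Decide how to bring a follower up to the leader's state (simplified sketch).

    leader_log holds the zxids of committed transactions the leader still keeps
    in memory, in increasing order.
    """
    if not leader_log or follower_zxid < leader_log[0]:
        return "SNAP", []  # too far behind: send a full snapshot
    if follower_zxid > leader_log[-1]:
        return "TRUNC", [leader_log[-1]]  # follower has extra txns: roll back to here
    missing = [z for z in leader_log if z > follower_zxid]
    return "DIFF", missing  # replay only the transactions it lacks


log = [0x100000001, 0x100000002, 0x100000003]
print(plan_sync(0x100000001, log))
```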

19. Distributed notification and coordination

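For system scheduling: an operator sends a notification by changing the state of some node through a console, and zk then pushes the change to every client that registered a watcher on that node.
For execution reporting: each worker process creates an ephemeral node under some directory that carries its progress data, so the aggregating process can watch the directory's child nodes for changes and obtain a real-time global view of progress.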

20. Why is there a leader among the machines?

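In a distributed environment, some business logic needs to be executed by only one machine in the cluster, and the other machines can share the result. This greatly reduces repeated computation and improves performance, so a leader election is needed.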

21. How are zk node failures handled?

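ZooKeeper is itself a cluster, and no fewer than 3 servers are recommended. ZooKeeper must also ensure that when one node goes down, the remaining nodes keep providing service.
If a follower goes down, 2 servers are still available to serve requests, and because ZooKeeper keeps multiple replicas of its data, no data is lost;
If the leader goes down, ZooKeeper elects a new leader.
A ZK cluster keeps serving requests normally as long as more than half of its nodes are healthy. The cluster fails only when so many ZK nodes are down that only half or fewer of them are still working.
Therefore:
A 3-node cluster can lose 1 node (the leader can still get 2 votes > 1.5);
A 2-node cluster cannot lose any node (the leader could get only 1 vote <= 1).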
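The arithmetic behind these two examples generalizes to any ensemble size. `quorum` and `tolerated_failures` are illustrative helpers, not ZooKeeper functions:

```python
def quorum(ensemble_size: int) -> int:
    """Smallest number of servers that is more than half of the ensemble."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can fail while a quorum is still alive."""
    return ensemble_size - quorum(ensemble_size)


for n in range(1, 8):
    print(f"{n} servers: quorum {quorum(n)}, can lose {tolerated_failures(n)}")
```

For 1 to 7 servers this gives 0, 0, 1, 1, 2, 2, 3 tolerated failures: an even-sized ensemble survives no more failures than the odd size just below it, which is why odd sizes are recommended.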

22. Differences between ZooKeeper load balancing and nginx load balancing

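zk's load balancing can be controlled programmatically, whereas nginx can only adjust weights, and anything else you want to control requires writing your own plugin. On the other hand, nginx's throughput is much higher than zk's, so the choice should depend on the business requirements.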

23. ZooKeeper watch mechanism

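The official definition of the watch mechanism: a watch event is a one-time trigger; when data on which a watch has been set changes, the server sends the change to the client that set the watch, in order to notify it.
Characteristics of the ZooKeeper watch mechanism: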
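1. One-time trigger: when the data changes, a watcher event is sent to the client, but the client receives such a notification only once.
2. Asynchronous delivery: watcher notifications travel from the server to the client asynchronously. This raises an issue: clients communicate with servers over sockets, and network latency or other factors cause different clients to see an event at different moments. Because ZooKeeper provides an ordering guarantee, a client perceives that the znode it watches has changed only after it receives the watch event. Therefore, when using ZooKeeper we cannot expect to observe every single change to a node. ZooKeeper guarantees only eventual consistency, not strong consistency.
3. Data watches: ZooKeeper has data watches and child watches; getData() and exists() set data watches, and getChildren() sets child watches.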
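4. Registering watchers: getData, exists, getChildren
5. Triggering watchers: create, delete, setData
6. A setData() call triggers the data watches set on the znode (if the set succeeds). A successful create() triggers the data watch on the created znode and the child watch on its parent. A successful delete() triggers both the data watch and the child watch of the znode (since it can no longer have children), as well as the child watch of its parent.
7. When a client connects to a new server, its watches are triggered by any session event. While disconnected from a server, the client cannot receive watch events. When the client reconnects, all previously registered watches are re-registered if needed, which is usually fully transparent. There is only one special case in which a watch can be lost: an exists watch on a znode that has not yet been created will miss its event if the znode is created and then deleted while the client is disconnected.
8. Watches are lightweight: a watch is essentially just a callback in the local JVM, and the server only stores a boolean indicating whether a watcher has been set.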
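The rules in items 1 and 4 to 6 can be checked against a toy model. `WatchedTree` and its methods are hypothetical stand-ins; the event names mirror the `Watcher.Event.EventType` values of the Java client (`NodeCreated`, `NodeDeleted`, `NodeDataChanged`, `NodeChildrenChanged`), but the model ignores sessions, reconnection and delivery ordering.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List, Tuple

Watcher = Callable[[str, str], None]  # called with (event type, path)


class WatchedTree:
    """Hypothetical model of one-shot data/child watches and which writes fire them."""

    def __init__(self) -> None:
        self.data: Dict[str, bytes] = {"/": b""}
        self.data_watches: DefaultDict[str, List[Watcher]] = defaultdict(list)
        self.child_watches: DefaultDict[str, List[Watcher]] = defaultdict(list)

    @staticmethod
    def _parent(path: str) -> str:
        return path.rsplit("/", 1)[0] or "/"

    @staticmethod
    def _fire(table: DefaultDict[str, List[Watcher]], path: str, event: str) -> None:
        for w in table.pop(path, []):  # one-shot: a watch is removed when it fires
            w(event, path)

    # Registration: getData/exists set data watches, getChildren sets a child watch.
    def get_data(self, path: str, w: Watcher) -> bytes:
        value = self.data[path]  # raises KeyError (like NoNode) before registering
        self.data_watches[path].append(w)
        return value

    def exists(self, path: str, w: Watcher) -> bool:
        self.data_watches[path].append(w)  # allowed even if the node does not exist
        return path in self.data

    def get_children(self, path: str, w: Watcher) -> List[str]:
        self.child_watches[path].append(w)
        return [p for p in self.data if p != path and self._parent(p) == path]

    # Triggers: create, delete, setData.
    def create(self, path: str, value: bytes = b"") -> None:
        self.data[path] = value
        self._fire(self.data_watches, path, "NodeCreated")
        self._fire(self.child_watches, self._parent(path), "NodeChildrenChanged")

    def set_data(self, path: str, value: bytes) -> None:
        self.data[path] = value
        self._fire(self.data_watches, path, "NodeDataChanged")

    def delete(self, path: str) -> None:
        del self.data[path]
        self._fire(self.data_watches, path, "NodeDeleted")
        self._fire(self.child_watches, path, "NodeDeleted")
        self._fire(self.child_watches, self._parent(path), "NodeChildrenChanged")


tree = WatchedTree()
events: List[Tuple[str, str]] = []
tree.exists("/cfg", lambda e, p: events.append((e, p)))
tree.create("/cfg", b"v1")
tree.set_data("/cfg", b"v2")  # not reported: the exists watch already fired once
print(events)
```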


Source: blog.csdn.net/qq_35488412/article/details/91042445