ZooKeeper learning (1) - basic concepts

What is ZooKeeper?

ZooKeeper is a distributed, open source coordination service for distributed applications. It provides a simple set of primitives on which distributed applications can build synchronization services, configuration maintenance, and naming services. ZooKeeper began as a sub-project of Hadoop, and its history does not need to be repeated here. In distributed applications, locking mechanisms are hard for engineers to use correctly, and message-based coordination is unsuitable for some applications, so there is a need for a reliable, scalable, distributed, and configurable coordination mechanism that unifies the state of the system. That is what ZooKeeper is for. This article briefly analyzes how ZooKeeper works; how to use ZooKeeper is not its focus.

 

The basic concepts of ZooKeeper

1. Roles: There are three main types of roles in Zookeeper, as shown in the following table:


 

The system model is shown in the figure:




2. Design purpose:

2.1. Eventual consistency: No matter which server a client connects to, it is shown the same view. This is the most important property of ZooKeeper.

2.2. Reliability: ZooKeeper is simple, robust, and performs well. If message m is accepted by one server, it will be accepted by all servers.

2.3. Real-time: ZooKeeper guarantees that within a bounded time interval a client either obtains the server's updated information or learns that the server has failed. However, because of network delay and other factors, ZooKeeper cannot guarantee that two clients see newly updated data at the same moment. If the latest data is required, call the sync() interface before reading.

2.4. Wait-free: A slow or failed client must not interfere with the requests of a fast client, so that no client is kept waiting by another.

2.5. Atomicity: An update either succeeds or fails; there are no intermediate states.

2.6. Ordering: This includes total order and partial order. Total order means that if message a is published before message b on one server, a is published before b on all servers. Partial order means that if message b is published after message a by the same sender, a is ordered before b.

 

ZooKeeper's data structure:

1. Each directory entry, such as NameService, is called a znode and is uniquely identified by its path.

2. A znode can have child znodes, and each znode can store data. Note that EPHEMERAL znodes cannot have children.

3. Znodes are versioned. Each znode carries a version number that increases every time its data is modified, and an update can be made conditional on the expected version.

4. A znode can be ephemeral (temporary). Once the client that created it loses contact with the server, the znode is deleted automatically. ZooKeeper clients and servers communicate over a long-lived connection that each client keeps alive with heartbeats; this connection state is called a session. When the session of the client that created an ephemeral znode expires, the znode is deleted.

5. A znode's name can be numbered automatically (a sequential znode). For example, if App1 already exists, creating the node again automatically names it App2.

6. A znode can be watched, both for changes to the data stored in it and for changes to its children. Once a change occurs, the clients that set the watch are notified. This is the core feature of ZooKeeper, and many of its functions are built on it.
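
The properties listed above can be illustrated with a toy in-memory model. This is only a sketch, not the real server or any client library: the class names `ZNode` and `Tree` and all method names are hypothetical, sequential names are assumed to be built by appending a zero-padded per-parent counter, and data watches are assumed to fire once.

```python
class ZNode:
    def __init__(self, data=b"", ephemeral_owner=None):
        self.data = data
        self.version = 0                        # bumped on every data change
        self.ephemeral_owner = ephemeral_owner  # session id, None if persistent
        self.seq = 0                            # counter for sequential children
        self.watchers = []                      # one-shot data-watch callbacks


class Tree:
    """Toy model of the znode namespace (hypothetical, for illustration)."""

    def __init__(self):
        self.nodes = {"/": ZNode()}

    @staticmethod
    def _parent(path):
        return path.rsplit("/", 1)[0] or "/"

    def create(self, path, data=b"", session=None, sequential=False):
        parent = self.nodes[self._parent(path)]
        if parent.ephemeral_owner is not None:
            raise ValueError("ephemeral znodes cannot have children")
        if sequential:
            # Assumption: append the parent's counter, zero-padded to 10 digits.
            path = f"{path}{parent.seq:010d}"
            parent.seq += 1
        if path in self.nodes:
            raise KeyError(f"{path} already exists")
        self.nodes[path] = ZNode(data, ephemeral_owner=session)
        return path

    def set(self, path, data, expected_version=-1):
        node = self.nodes[path]
        # -1 means "any version"; otherwise the update is conditional.
        if expected_version not in (-1, node.version):
            raise ValueError("version mismatch")
        node.data = data
        node.version += 1
        # Fire each registered watch once, then forget it.
        watchers, node.watchers = node.watchers, []
        for callback in watchers:
            callback(path)

    def watch(self, path, callback):
        self.nodes[path].watchers.append(callback)

    def close_session(self, session):
        # Deleting a session removes every ephemeral znode it created.
        owned = [p for p, n in self.nodes.items() if n.ephemeral_owner == session]
        for path in owned:
            del self.nodes[path]
```

In this model, creating `/NameService/App` twice with `sequential=True` yields two distinct paths, an ephemeral node created with `session=7` disappears after `close_session(7)`, and a callback registered with `watch` is invoked only for the next `set` on that path.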

 

 

How does ZooKeeper work?

The core of ZooKeeper is atomic broadcast, which keeps the servers synchronized. The protocol that implements it is called the Zab protocol.

The Zab protocol has two modes: recovery mode (leader election) and broadcast mode (synchronization).

Zab enters recovery mode when the service starts or after the leader crashes. Recovery mode ends once a leader has been elected and a majority of servers have synchronized their state with the leader. State synchronization ensures that the leader and the other servers have the same system state.

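To guarantee that transactions are applied in a consistent order, ZooKeeper identifies each transaction with a monotonically increasing transaction id (zxid). Every proposal is stamped with a zxid when it is issued. In the implementation a zxid is a 64-bit number. The high 32 bits hold the epoch, which indicates whether leadership has changed: each newly elected leader gets a new epoch that marks its term of leadership. The low 32 bits are an incrementing counter.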
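
The zxid layout (epoch in the high 32 bits, counter in the low 32 bits) can be sketched with plain integer arithmetic. The function names below are hypothetical helpers for illustration, not part of ZooKeeper's API.

```python
COUNTER_BITS = 32
COUNTER_MASK = (1 << COUNTER_BITS) - 1


def make_zxid(epoch, counter):
    """Pack an epoch and a counter into one 64-bit zxid."""
    return (epoch << COUNTER_BITS) | (counter & COUNTER_MASK)


def epoch_of(zxid):
    """High 32 bits: the leader epoch."""
    return zxid >> COUNTER_BITS


def counter_of(zxid):
    """Low 32 bits: the per-epoch transaction counter."""
    return zxid & COUNTER_MASK
```

Because the epoch occupies the high bits, comparing two zxids as plain integers orders them by epoch first and by counter second, so any transaction from a newer leader compares greater than every transaction from an older one.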

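Each server is in one of three states while it runs:

             LOOKING: the server does not know who the leader is and is searching for one

             LEADING: the server is the elected leader

             FOLLOWING: a leader has been elected and the server is synchronizing with it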

 

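Election process:

When the leader crashes or loses contact with a majority of its followers, zk enters recovery mode. Recovery mode must elect a new leader and bring every server back to a correct state. Zk has two election algorithms: one based on basic paxos and one based on fast paxos. The default is fast paxos. The basic paxos process is described first:

1. The election thread is the thread with which the current server initiates the election. Its main job is to tally the votes and select the recommended server.

2. The election thread first sends a query to all servers (including itself).

3. On receiving a reply, the election thread verifies that the reply belongs to the query it initiated (by checking that the zxid matches), obtains the responder's id (myid) and stores it in the list of servers queried in this round, then obtains the leader the responder proposes (id, zxid) and stores it in the vote table for this round.

4. After all servers have replied, it finds the server with the largest zxid and makes that server its vote for the next round.

5. The thread sets the server with the largest zxid as the leader recommended by the current server. If that server has won more than half of the votes (n/2 + 1 of n servers), it becomes the recommended leader and the current server sets its own state according to the winner's information. Otherwise the process repeats until a leader is elected.

From this process we can conclude that for a leader to win the support of a majority of servers, the total number of servers should be an odd number 2n+1, in which case at least n+1 servers must be alive.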
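
One tallying round of the election described above can be sketched as follows. This is a simplified illustration rather than ZooKeeper's actual election code: the function names and the shape of `replies` are assumptions, and breaking zxid ties by the larger server id is also an assumption.

```python
from collections import Counter


def quorum(ensemble_size):
    """Smallest number of servers that forms a majority."""
    return ensemble_size // 2 + 1


def tally(replies, ensemble_size):
    """Process one round of replies.

    replies maps each responding server id to the (leader_id, zxid) pair
    that server currently proposes. Returns (candidate, elected): the
    server to vote for next, and whether it already holds a majority.
    """
    # Pick the proposal with the largest zxid (ties: larger id, assumed).
    _, candidate = max((zxid, leader_id) for leader_id, zxid in replies.values())
    votes = Counter(leader_id for leader_id, _ in replies.values())
    return candidate, votes[candidate] >= quorum(ensemble_size)
```

With three servers that each propose themselves in the first round, `tally` returns the one with the largest zxid but `elected` is false, since it has only one vote. Once every server has switched its vote to that candidate, the next round reports `elected` as true. `quorum(5)` is 3, which matches the n+1 of 2n+1 rule with n = 2.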

 

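Synchronization process:

Once the leader has been elected, zk enters the state synchronization process.

1. The leader waits for servers to connect.

2. Each follower connects to the leader and sends it its largest zxid.

3. The leader determines the synchronization point from the follower's zxid.

4. When synchronization is complete, the leader notifies the follower that it is now in the uptodate state.

5. After receiving the uptodate message, the follower can again accept and serve client requests.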
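
Step 3, finding the synchronization point, can be sketched under a strong simplifying assumption: the follower's history is a prefix of the leader's, so the leader only has to send what comes after the follower's largest zxid. The function name and the representation of the log as a sorted list of zxids are hypothetical.

```python
import bisect


def transactions_to_send(leader_log, follower_last_zxid):
    """Return the zxids the follower is missing.

    leader_log is the leader's committed zxids in ascending order.
    Assumption: the follower never holds a zxid the leader lacks.
    """
    sync_point = bisect.bisect_right(leader_log, follower_last_zxid)
    return leader_log[sync_point:]
```

A follower that reports a zxid equal to the leader's latest gets an empty list and can be told it is up to date immediately. The case where a follower is ahead of the leader, or too far behind to catch up from the log, is outside this sketch.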

The flow chart is shown below:



 

 

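Workflow:

Leader workflow:

The leader has three main functions:

1. Recover data.

2. Maintain heartbeats with the learners, receive learner requests, and determine the message type of each request.

3. Handle each learner message according to its type. The main types are PING, REQUEST, ACK, and REVALIDATE.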

 

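Follower workflow:

The follower has four main functions:

1. Send requests to the leader (PING, REQUEST, ACK, and REVALIDATE messages).

2. Receive and process messages from the leader.

3. Receive client requests; a write request is forwarded to the leader for a vote.

4. Return results to the client.

The follower's message loop handles the following messages from the leader:

1. PING: heartbeat message.

2. PROPOSAL: a proposal initiated by the leader, on which the follower is asked to vote.

3. COMMIT: information about the latest proposal on the server side.

4. UPTODATE: indicates that synchronization is complete.

5. REVALIDATE: depending on the leader's REVALIDATE result, either close the session awaiting revalidation or allow it to accept messages.

6. SYNC: return the SYNC result to the client. This message is originally initiated by the client to force the latest updates to be fetched.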
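
The follower's message loop can be sketched as a dispatch on message type. This is a minimal illustration, not the real implementation: the `Follower` class and its fields are hypothetical, it assumes that the follower votes on a PROPOSAL by logging it and replying with an ACK and applies the transaction when the matching COMMIT arrives, and it leaves out REVALIDATE and SYNC.

```python
class Follower:
    """Toy follower that reacts to messages from the leader."""

    def __init__(self):
        self.pending = {}        # zxid -> payload, proposed but not committed
        self.applied = []        # committed (zxid, payload) pairs, in order
        self.up_to_date = False  # set once synchronization has finished
        self.outbox = []         # messages queued for the leader

    def handle(self, msg_type, zxid=None, payload=None):
        if msg_type == "PING":
            # Answer the heartbeat.
            self.outbox.append(("PING", None))
        elif msg_type == "PROPOSAL":
            # Log the proposal and vote for it with an ACK (assumed).
            self.pending[zxid] = payload
            self.outbox.append(("ACK", zxid))
        elif msg_type == "COMMIT":
            # Apply the previously logged proposal.
            self.applied.append((zxid, self.pending.pop(zxid)))
        elif msg_type == "UPTODATE":
            # Synchronization is complete; client requests may be served.
            self.up_to_date = True
        else:
            raise ValueError(f"unhandled message type: {msg_type}")
```

Nothing is applied to the follower's state until the COMMIT arrives, so a proposal that never gathers enough votes leaves `applied` untouched.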

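A simplified diagram of the follower workflow is shown below. In the actual implementation, the follower uses five threads to do its work.

The observer process is not described separately; the only difference from the follower is that an observer does not take part in votes initiated by the leader.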
