Zookeeper design principles and working principles

1. What is zookeeper

ZooKeeper is an open source coordination service for distributed applications.

For details, see the official website: https://zookeeper.apache.org/

2. What can zookeeper do

1. Configuration maintenance: In a distributed system, a service is generally deployed on n machines that all share the same configuration file. If a configuration option changes, we would otherwise have to update the machines one by one. This is where zookeeper comes in. You can use zk as highly available configuration storage: copy the cluster configuration file to a node in zookeeper's file system and have every server in the distributed system watch that node. Once the configuration is found to have changed, each server synchronizes its copy from zk. zk also guarantees that the update is atomic, so every server ends up with the updated configuration.

2. Naming service: In distributed applications, a complete naming rule is usually required, which can generate a unique name and is easy for people to identify and remember. Zk provides this kind of service, which is similar to the correspondence between domain names and IPs. Domain names are easy to remember, and information such as addresses and providers of resources and services can be obtained through names.

3. Distributed locks: Processes of a distributed program running on different hosts need a lock when they access a mutually exclusive resource. Think of it this way: many distributed systems have several service windows, but only one of them is allowed to work at any given time. When that server fails, the lock is released and the work fails over to another service. For example, when you go to an office to apply for a certificate, only one window serves you at a time. If the clerk at that window has to leave, the system or the manager assigns another window to continue serving you.

4. Cluster management: In a distributed cluster, hardware failures, network problems and other causes mean that some nodes go down while new nodes join. The other machines need to notice these changes and react to them, and zk provides this kind of cluster management.

5. Queue management: zk can implement queues similar to those of some mq products. This is not commonly used and is not suitable for high-performance applications.

3. Roles in zookeeper

Leader: The leader is responsible for initiating votes and deciding their outcome, and for updating the system state.

Learner: Learners are divided into followers and observers.

Follower: A follower accepts client requests and returns results to the client, and takes part in voting during leader election.

Observer: An observer accepts client connections and forwards write requests to the leader, but it does not take part in voting and only synchronizes the leader's state. Observers exist to scale out the system and improve read speed.

Client: The originator of requests.

4. Design principles of zookeeper

1. Eventual consistency: No matter which zk server a client connects to, it is shown the same view.

2. Reliability: Once a message is accepted by one server, it will be accepted by all servers.

3. Timeliness: zk guarantees that a client obtains server updates or server failure information within a bounded time interval. Because of network delay and similar causes, however, zk cannot guarantee that two clients receive an update or failure notification at the same moment.

4. Wait-free: A slow or failed client must not interfere with the requests of a fast client, so that no client is held up by another.

5. Atomicity: An update either succeeds or fails; there is no intermediate state.

6. Ordering: This covers global order and partial order. Global order means that if message a is published before message b on one server, then a is published before b on all servers. Partial order means that if message b is published after message a by the same sender, a is ordered before b.

5. Working principle of zookeeper

The core of zk is atomic broadcast, which keeps the servers synchronized. The protocol that implements it is called Zab. Zab has two modes: recovery mode (leader election) and broadcast mode (synchronization). Zab enters recovery mode when the service starts or after the leader crashes. Recovery mode ends once a leader has been elected and a majority of servers have finished synchronizing their state with it.

To guarantee sequential consistency of transactions, zk tags every transaction with an increasing transaction id, the zxid. In the implementation, a zxid is a 64-bit number. Its upper 32 bits hold the epoch, which indicates whether the leader has changed: every newly elected leader gets a new epoch. The lower 32 bits are an incrementing counter.
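
The 64-bit layout described above can be sketched with plain integer arithmetic. This is only an illustration of the bit layout, not code from zookeeper itself; the names make_zxid and split_zxid are made up for this example.

```python
from typing import Tuple

EPOCH_SHIFT = 32                 # the epoch lives in the upper 32 bits
COUNTER_MASK = (1 << 32) - 1     # the counter lives in the lower 32 bits


def make_zxid(epoch: int, counter: int) -> int:
    """Pack an epoch and a counter into a single 64-bit zxid (hypothetical helper)."""
    return (epoch << EPOCH_SHIFT) | (counter & COUNTER_MASK)


def split_zxid(zxid: int) -> Tuple[int, int]:
    """Return (epoch, counter) for a zxid (hypothetical helper)."""
    return zxid >> EPOCH_SHIFT, zxid & COUNTER_MASK


# A new leader starts a new epoch, so its first zxid is larger than
# every zxid issued under the previous leader.
old = make_zxid(epoch=1, counter=500)
new = make_zxid(epoch=2, counter=0)
print(split_zxid(old))  # (1, 500)
print(old < new)        # True
```

Because the epoch occupies the high bits, comparing two zxids as ordinary integers compares the epoch first and the counter second.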

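(1) Serverid: The id assigned to a server when it is configured.

(2) Zxid: The id of the data a server produces at run time. The larger the zxid, the newer the data.

(3) Epoch: The election round, that is, the logical clock. It is incremented with every round of election.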

1. Leader election process

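When the leader crashes or loses the majority of its followers, zk enters recovery mode and must elect a new leader so that all servers return to a correct state. zk has two election algorithms: one based on basic paxos and one based on fast paxos. The default is fast paxos.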

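While running, each server is in one of three states:

LOOKING: The server does not know who the leader is and is searching for one.

LEADING: The server is the elected leader.

FOLLOWING: A leader has been elected and the server is synchronizing with it.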

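basic paxos process:

1. The election thread is the thread on the current server that initiates the election. Its main job is to tally the votes and choose the server to recommend.

2. The election thread first sends a query to all servers, including itself.

3. When a reply arrives, the election thread verifies that it answers its own query (by checking that the zxid matches), then reads the responder's id (myid) and stores it in the list of servers queried in this round. Finally it reads the leader that the responder proposes (myid, zxid) and stores that in the vote table for this election.

4. After all servers have replied, the thread works out which server has the largest zxid and sets that server as the one to vote for in the next round.

5. The thread makes the server with the largest zxid the leader that the current server recommends. If that server has now won n/2+1 votes (n being the number of servers), it is set as the leader and the current server sets its own state according to the winner's information. Otherwise the process repeats until a leader is elected.

Note: For the leader to gain the support of a majority, the total number of servers should be an odd number 2n+1, and at least n+1 servers must be alive.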
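
The majority rule in the note above comes down to a little arithmetic. The following is a minimal sketch; quorum_size and has_quorum are names chosen for this example only.

```python
def quorum_size(total_servers: int) -> int:
    """Smallest number of servers that forms a majority."""
    return total_servers // 2 + 1


def has_quorum(alive_servers: int, total_servers: int) -> bool:
    """True if enough servers are alive to elect a leader."""
    return alive_servers >= quorum_size(total_servers)


for total in (3, 4, 5):
    tolerated = total - quorum_size(total)  # failures the cluster survives
    print(total, quorum_size(total), tolerated)
# 3 2 1
# 4 3 1
# 5 3 2
```

The output also shows why odd cluster sizes are preferred: a cluster of 4 tolerates no more failures than a cluster of 3.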

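fast paxos process:

1. When a server starts, or recovers and prepares to join the cluster, it reads its own zxid and related information.

2. Every server joining the cluster first recommends itself as leader. It broadcasts (leader id, zxid, epoch) to all servers in the cluster and waits for their replies.

3. Replies from other servers fall into two classes, depending on whether the sender is in the LOOKING state or in another state.

(1) The sender is in the LOOKING state

First compare the logical clock, Epoch:

(a) If the received Epoch is greater than the server's own logical clock, the server updates its Epoch and clears the election data it has received from other servers. It then decides whether to update its own vote (initially the leader id it votes for is its own).

Decision rule: compare the largest saved zxid and the leader id. The zxid is compared first and the larger zxid wins; if they are equal, the larger leader id wins. The server then broadcasts its latest vote to the other servers.

(b) If the received Epoch is smaller than the server's own logical clock, the sender is in an earlier election round, and the server only needs to send it its own vote.

(c) If the received Epoch equals the server's own logical clock, the server applies the decision rule from (a) and broadcasts its latest vote to the other servers.

The server also has to handle two cases:

(a) If the server has received votes from all other servers, it determines its own state (FOLLOWING or LEADING) from them, leaves LOOKING, and exits the election.

(b) Even if it has not received votes from every server, it can check whether the leader chosen so far is supported by more than half of the servers. If so, it tries to receive further data; if nothing new arrives, everyone has accepted the result and the server exits the election as well.

(2) The sender is in another state (FOLLOWING or LEADING)

(a) If the logical clock Epoch is the same, the server saves the data to recvset. If the sender claims to be the leader, the server checks whether more than half of the servers have voted for it; if so, it sets its election state and exits the election.

(b) If the Epoch differs, another election has already produced a result. The server adds that result to the outofelection set and uses outofelection to decide whether the election can end. If it can, the server saves the logical clock, sets its election state, and exits the election.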
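
The decision rule and the majority check from the steps above can be sketched as follows. This is a simplified model, assuming all votes belong to the same election round (same Epoch) and ignoring networking entirely; Vote, beats and elected are names invented for this example, not zookeeper APIs.

```python
from collections import Counter
from typing import Dict, NamedTuple, Optional


class Vote(NamedTuple):
    leader_id: int  # server id of the proposed leader
    zxid: int       # largest zxid known for the proposed leader


def beats(candidate: Vote, current: Vote) -> bool:
    """Decision rule: larger zxid wins, ties are broken by larger leader id."""
    return (candidate.zxid, candidate.leader_id) > (current.zxid, current.leader_id)


def elected(votes: Dict[int, Vote], total_servers: int) -> Optional[int]:
    """Return the leader id backed by more than half of the servers, if any."""
    if not votes:
        return None
    tally = Counter(v.leader_id for v in votes.values())
    leader_id, count = tally.most_common(1)[0]
    return leader_id if count > total_servers // 2 else None


# Three servers each start by voting for themselves: no majority yet.
votes = {1: Vote(1, 100), 2: Vote(2, 120), 3: Vote(3, 120)}
print(elected(votes, 3))  # None

# After exchanging votes, every server adopts the best vote it has seen.
best = votes[1]
for v in votes.values():
    if beats(v, best):
        best = v
votes = {server_id: best for server_id in votes}
print(best.leader_id, elected(votes, 3))  # 3 3
```

Servers 2 and 3 hold the same zxid, so the larger server id decides and server 3 becomes leader.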

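2. Synchronization process

1. The leader waits for servers to connect.

2. A follower connects to the leader and sends it its largest zxid.

3. The leader determines the synchronization point from that zxid.

4. When synchronization is complete, the leader notifies the follower that it is now in the uptodate state.

5. After receiving the uptodate message, the follower starts accepting client requests.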
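
Step 3 above, finding the synchronization point, might look roughly like this. It is a toy model under strong assumptions: the leader keeps its committed zxids in an ascending list, and the follower is never ahead of the leader or too far behind it. The function name missing_proposals is hypothetical.

```python
from bisect import bisect_right
from typing import List


def missing_proposals(leader_log: List[int], follower_last_zxid: int) -> List[int]:
    """Return the committed zxids the follower still needs.

    leader_log must be sorted in ascending order (an assumption of this sketch).
    """
    sync_point = bisect_right(leader_log, follower_last_zxid)
    return leader_log[sync_point:]


leader_log = [101, 102, 103, 104]
print(missing_proposals(leader_log, 102))  # [103, 104]
print(missing_proposals(leader_log, 104))  # []
```

An empty result means the follower is already up to date and can be sent the uptodate message straight away.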

3. Main functions

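1. Main functions of the Leader

(a) Recover data.

(b) Maintain heartbeats with the Learners, receive Learner requests, and determine the message type of each request.

Note: The main Learner message types are ping, request, ack and revalidate.

ping message: the Learner's heartbeat.

request message: a proposal sent by a follower, including write requests and sync requests.

ack message: a follower's reply to a proposal. Once more than half of the followers have acknowledged it, the proposal is committed.

revalidate message: used to extend a session's validity period.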
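
The ack rule above can be modelled with a small tracker. This sketch follows the wording of the text and counts followers only; the class Proposal and its fields are invented for illustration and do not mirror zookeeper's own classes.

```python
class Proposal:
    """Tracks acks for one proposal (hypothetical illustration class)."""

    def __init__(self, zxid: int, follower_count: int) -> None:
        self.zxid = zxid
        self.follower_count = follower_count
        self.acks = set()        # ids of followers that have acked
        self.committed = False

    def ack(self, follower_id: int) -> bool:
        """Record an ack and return True once the proposal is committed."""
        self.acks.add(follower_id)  # a repeated ack from one follower counts once
        if len(self.acks) > self.follower_count // 2:
            self.committed = True
        return self.committed


proposal = Proposal(zxid=1, follower_count=4)
print(proposal.ack(1))  # False
print(proposal.ack(2))  # False
print(proposal.ack(3))  # True
```

With 4 followers, the third distinct ack is the first one that exceeds half, so the proposal commits there.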

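2. Main functions of the Follower

(a) Send requests to the Leader.

(b) Receive and process messages from the Leader.

(c) Accept Client requests; a write request is forwarded to the Leader to be voted on.

(d) Return results to the Client.

Note: A follower handles the following messages from the Leader:

ping message: heartbeat.

proposal message: the leader issues a proposal and asks the followers to vote.

commit message: information about the latest proposal on the server side.

uptodate message: indicates that synchronization is complete.

revalidate message: depending on the Leader's REVALIDATE result, either close the session being revalidated or allow it to keep receiving messages.

sync message: return the sync result to the client.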

Source: https://www.toutiao.com/i6500040588123963917/
