Zookeeper Literacy 2: Introduction, Principles, and Applications of Zookeeper, a Distributed Service Framework

http://www.jianshu.com/p/bf32e44d3113




Introduction to Zookeeper

  Zookeeper is a distributed service framework and a sub-project of Apache Hadoop. It is mainly used to solve data management problems frequently encountered in distributed applications, such as unified naming, state synchronization, cluster management, and management of distributed application configuration items.

Basic concepts of Zookeeper

zk role

  The roles in Zookeeper fall into three main categories, summarized in the table below:




[Table: Zookeeper roles]

zk service network structure

  A working Zookeeper cluster contains two kinds of nodes: exactly one leader, and the remaining nodes as followers. The leader is chosen through an internal election.




[Figure: Zookeeper service]

1. The leader and each follower communicate with one another. The zk system's data is held in memory, with a copy backed up on disk. Every zk node presents the same namespace, i.e. holds the same data (see the tree structure below).
2. If the leader goes down, the zk cluster holds a new election, and a new leader is chosen within milliseconds.
3. The zk service becomes unavailable only when more than half of the zk nodes in the cluster are down.
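The majority rule in point 3 can be expressed as a tiny check. This is an illustrative sketch, not part of any ZooKeeper API; the function name is ours:

```python
def has_quorum(alive_nodes: int, total_nodes: int) -> bool:
    """A ZooKeeper ensemble stays available only while a strict
    majority of its nodes are alive."""
    return alive_nodes > total_nodes // 2

# A 5-node ensemble tolerates 2 failures, but not 3.
print(has_quorum(3, 5))  # True
print(has_quorum(2, 5))  # False
```

This is also why ensembles are usually deployed with an odd number of nodes: a 6-node ensemble tolerates no more failures than a 5-node one.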

zk namespace structure

  Zookeeper's namespace is the file system of a zk application. It closely resembles the Linux file system: it is tree-shaped, every path is unique, and operations on the namespace use absolute paths. Unlike the Linux file system, which distinguishes directories from files, zk has a single concept, the znode: a znode can contain child znodes and can also hold data.




[Figure: Zookeeper tree structure]


Tips:
Take /Nginx/conf as an example: / is a znode, and /Nginx is a child znode of /. /Nginx can also hold data, say the IPs of all machines running Nginx, while /Nginx/conf, a child znode of /Nginx, can hold content of its own, say the Nginx configuration file. An application can then use these paths to obtain both the IP list of all machines running Nginx and the Nginx configuration on those machines.

zk reads and writes data



[Figure: Zookeeper reads and writes data]
• Writing data: when a client issues a write request, it is addressed to some node in the zk cluster. If a follower receives the write, it forwards the request to the leader, which performs an atomic broadcast via the internal Zab protocol. Once all zk nodes have written the data successfully (memory update plus disk backup), the write request is complete and the zk service returns a response to the client.
• Reading data: because all zk nodes in the cluster present the same namespace view (that is, the same structure and data), and the write path above keeps the namespaces of all zk nodes in sync, a read can be served by any zk node.

PS: In reality a write does not wait for every zk node to respond; it only requires that more than half of the nodes have written the change and applied it to memory, which then counts as the latest namespace. A read may therefore hit a zk node that is not yet up to date, which can only be addressed with sync(). We ignore this here and assume the whole zk service keeps its metadata in sync; a later article will revisit it.


zk znode types

  The type of a znode can be specified when it is created. The main types are as follows.
1. PERSISTENT: a persistent znode. Once created, it does not disappear on its own; it is removed only when a client explicitly deletes it.
2. SEQUENCE: a sequentially numbered znode. For example, if ClientA asks the zk service to create a znode named /Nginx/conf with this type, zk actually creates /Nginx/conf0000000000; ClientB's create then yields /Nginx/conf0000000001, and ClientC's yields /Nginx/conf0000000002. Any client creating such a znode receives the largest number currently in the zk namespace plus one, so every created znode is guaranteed to be both increasing and unique.
3. EPHEMERAL: a temporary znode. When a client connects to the zk service, a session is established, and znodes of this type created over that connection live only as long as the session. Once the client closes its zk connection, the server clears the session, and the znodes created during it vanish from the namespace. In short, the lifetime of this type of znode matches the client's connection. For example, if ClientA creates an EPHEMERAL znode /Nginx/conf0000000011, that znode is deleted from the entire zk service namespace as soon as ClientA's connection closes.
4. PERSISTENT|SEQUENTIAL: an automatically numbered persistent znode. It takes the number of the current znode sequence plus one, and does not disappear when the session ends.
5. EPHEMERAL|SEQUENTIAL: an automatically numbered temporary znode. Its number increments automatically, but it disappears when the session does.
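The sequential-suffix behaviour described above (a 10-digit, zero-padded counter kept per parent) can be mimicked with a small Python sketch. This is a toy stand-in for illustration, not ZooKeeper's implementation; the class name is ours:

```python
class SequentialNamer:
    """Illustrative stand-in for how ZooKeeper suffixes SEQUENTIAL
    znodes: the parent keeps a counter, and each create appends it
    as a 10-digit, zero-padded number."""

    def __init__(self) -> None:
        self.counter = 0

    def create(self, prefix: str) -> str:
        name = f"{prefix}{self.counter:010d}"
        self.counter += 1  # next creator gets the next number
        return name

namer = SequentialNamer()
print(namer.create("/Nginx/conf"))  # /Nginx/conf0000000000
print(namer.create("/Nginx/conf"))  # /Nginx/conf0000000001
```

Because the counter only moves forward, every client is guaranteed a unique, strictly increasing name, which is exactly the property the election and lock recipes below rely on.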

Zookeeper design goals
1. Eventual consistency: no matter which server a client connects to, it sees the same view. This is Zookeeper's most important property.
2. Reliability: simple, robust, and performant. If a message m is accepted by one server, it will be accepted by all servers.
3. Timeliness: Zookeeper guarantees that clients learn of server updates, or of server failure, within a bounded interval. Because of network delay and similar factors, however, it cannot guarantee that two clients see a fresh update at the same moment; a client that needs the latest data should call sync() before reading.
4. Wait-free: a slow or failed client must not interfere with the requests of fast clients, so every client is served effectively.
5. Atomicity: an update either succeeds or fails; there are no intermediate states.
6. Ordering: both global and partial. Global ordering means that if message a is published before message b on one server, then a is published before b on every server. Partial ordering means that if message b is published by the same sender after message a, then a is ordered before b.

How Zookeeper Works

  At its core, Zookeeper keeps servers in sync through atomic broadcast. The protocol that implements this mechanism is called the Zab protocol.
  Zab has two modes: recovery mode (leader election) and broadcast mode (synchronization). When the service starts, or after the leader crashes, Zab enters recovery mode; recovery ends once a leader has been elected and a majority of servers have synchronized their state with the leader. State synchronization ensures that the leader and the servers share the same system state. To preserve the ordering of transactions, Zookeeper tags every transaction with an increasing transaction id (zxid), and every proposal carries one. In the implementation, the zxid is a 64-bit number: the high 32 bits are an epoch that identifies a change of leadership (each newly elected leader gets a fresh epoch marking its reign), and the low 32 bits are a counter that increments within the epoch.
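The zxid layout just described (epoch in the high 32 bits, counter in the low 32) is easy to make concrete with a few bit operations. A minimal sketch; the helper names are ours, not ZooKeeper's:

```python
EPOCH_BITS = 32
COUNTER_MASK = 0xFFFFFFFF

def make_zxid(epoch: int, counter: int) -> int:
    """Pack the leader epoch into the high 32 bits and the
    per-epoch counter into the low 32 bits of a 64-bit zxid."""
    return (epoch << EPOCH_BITS) | (counter & COUNTER_MASK)

def epoch_of(zxid: int) -> int:
    return zxid >> EPOCH_BITS

def counter_of(zxid: int) -> int:
    return zxid & COUNTER_MASK

zxid = make_zxid(epoch=3, counter=7)
print(hex(zxid))         # 0x300000007
print(epoch_of(zxid))    # 3
print(counter_of(zxid))  # 7
```

A useful consequence of this layout: comparing two zxids as plain integers first compares epochs, then counters, so "larger zxid" always means "more recent transaction".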

During operation, each server is in one of three states:
• LOOKING: the server does not yet know who the leader is and is searching for one.
• LEADING: the server is the elected leader.
• FOLLOWING: a leader has been elected, and the server is synchronizing with it.
Main election process
  When the leader crashes, or loses a majority of its followers, zk enters recovery mode, which elects a new leader and brings all servers back to a correct state.
Zookeeper has two election algorithms:
  one based on basic paxos, and one based on fast paxos.
The system's default election algorithm is fast paxos.
Basic paxos process:
1. An election thread is started by the thread that initiates the election on the current server; its job is to tally the votes and pick the recommended server.
2. The election thread first queries all servers, including itself.
3. On receiving a reply, the election thread verifies that it corresponds to its own query (by checking that the zxid matches), records the responder's id (myid) in the current query list, and stores the responder's proposed leader (id, zxid) in the voting record for the current round.
4. After replies from all servers have arrived, it determines the server with the largest zxid and makes that server the candidate for the next vote.
5. The thread sets the server with the largest zxid as the leader recommended by the current server. If that winning server has by now gathered n/2 + 1 votes, it is set as the elected leader and each server updates its own state accordingly; otherwise the process repeats until a leader is elected.
From this process we can see that for the leader to win the support of a majority, the total number of servers should be odd, 2n+1, and at least n+1 of them must be alive. The process above is repeated after every server start. In recovery mode, a server that has just recovered from a crash or just started restores its data and session information from disk snapshots; zk records transaction logs and takes periodic snapshots precisely to make this state recovery possible.
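Steps 4 and 5 above can be sketched as a toy vote tally. This is a simplification for illustration only, not the real election code: ZooKeeper's actual comparison also involves the election epoch, but here we reduce a vote to (zxid, myid), breaking zxid ties by the larger server id:

```python
def pick_candidate(servers):
    """Toy version of the comparison used when tallying votes:
    prefer the largest zxid, breaking ties by the larger myid."""
    return max(servers, key=lambda s: (s["zxid"], s["myid"]))

def is_elected(votes_for_winner: int, total_servers: int) -> bool:
    """A candidate wins once it holds n/2 + 1 votes."""
    return votes_for_winner >= total_servers // 2 + 1

servers = [
    {"myid": 1, "zxid": 0x100000005},
    {"myid": 2, "zxid": 0x100000007},
    {"myid": 3, "zxid": 0x100000007},  # ties with 2 on zxid, larger myid wins
]
print(pick_candidate(servers)["myid"])  # 3
print(is_elected(2, 3))                 # True
```

Preferring the largest zxid ensures the new leader has seen the most recent committed transaction, so no acknowledged write is lost across the election.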
The specific flow chart of the election is as follows:




[Figure: zk basic paxos election]


Fast paxos process:
  During the election, some server first proposes to all servers that it become the leader. When the other servers receive the proposal, they resolve any conflicts between epoch and zxid, accept the proposal, and reply that they have accepted it. This process repeats until a leader is elected.
The specific flow chart of the election is as follows:



[Figure: zk fast paxos election]

synchronization process

After a leader has been elected, zk enters the state synchronization phase.
1. The leader waits for servers to connect;
2. Each follower connects to the leader and sends it its largest zxid;
3. The leader determines the synchronization point from the follower's zxid;
4. When synchronization completes, the leader notifies the follower that it is now in the uptodate state;
5. Once the follower receives the uptodate message, it can again accept client requests.
The specific flowchart of synchronization is as follows:



[Figure: zk synchronization process]

Workflow

Leader workflow
1. Restore data;
2. Maintain heartbeats with the Learners, receive Learner requests, and determine the type of each request message;
3. The main Learner message types are PING, REQUEST, ACK, and REVALIDATE; each type is handled differently.

A PING message carries the Learner's heartbeat;
a REQUEST message is a proposal sent by a follower, covering both write requests and synchronization requests;
an ACK message is a follower's reply to a proposal, and once more than half of the followers have acknowledged, the proposal is committed;
a REVALIDATE message is used to extend a session's validity.


The leader's workflow diagram is as follows:



[Figure: Leader workflow]

Follower workflow

A Follower has four main functions:
1. Send requests to the Leader (PING, REQUEST, ACK, and REVALIDATE messages);
2. Receive Leader messages and process them;
3. Receive client requests, forwarding write requests to the Leader for voting;
4. Return results to the client.
The follower's message loop handles the following messages from the leader:
1. PING: heartbeat message;
2. PROPOSAL: a proposal initiated by the leader, on which followers must vote;
3. COMMIT: information about the latest committed proposal on the server side;
4. UPTODATE: indicates that synchronization is complete;
5. REVALIDATE: depending on the Leader's REVALIDATE result, close the session being revalidated or allow it to accept messages again;
6. SYNC: return the SYNC result to the client; this message is originally initiated by the client to force a read of the latest state.
The workflow diagram of Follower is as follows:



[Figure: Follower workflow]

Applications

  Operating a distributed system is complicated, because it involves uncontrollable factors such as network communication and node failures. The sections below describe the main problems encountered in the classic master-workers model, how they are traditionally solved, and how Zookeeper solves them.

Master node management

  The most important node in a cluster is the Master, so a Backup of the Master is generally kept. The Backup periodically fetches meta information from the Master and checks whether the Master is alive; once the Master dies, the Backup starts up immediately and takes over the Master's work as the new Master. But distributed reality is messy, because network communication can jitter. Consider the following cases:
1. The traditional way for the Backup to check the Master's liveness is to send probe packets periodically; if no response arrives within some interval, the Master is declared dead and the Backup starts. But what if the Master is not actually down, and the missing or delayed response was caused by network congestion? The Backup starts anyway, and now the cluster has two Masters: quite possibly some workers report to the original Master while others report to the newly started Backup, and the service is in chaos.
2. The Backup synchronizes the Master's meta information only periodically, so it always lags behind. When the Master dies, the Backup's information is necessarily stale, which may affect the running state of the cluster.
Problems to solve:
high availability of the Master node, with uniqueness guaranteed;
timely synchronization of meta information.
Zookeeper Master election
  Zookeeper assigns a number to each client registered with it, and guarantees that the numbers are unique and increasing. Among N machines, simply pick the client with the smallest number as the Master, while keeping all machines' views of the meta information consistent. Once the Master dies, the machine with the smallest remaining number among the N is qualified to become the Master, and its meta information is consistent.

Cluster worker management

  Workers in a cluster can and do die. Once worker A dies, any workers that communicate with it must update their host lists as quickly as possible, remove the dead worker, and stop talking to it; meanwhile the Master must reschedule the jobs that were running on the dead worker onto other workers. Likewise, when the worker comes back to normal, the other workers should be told to update their host lists again. The traditional approach is a dedicated monitoring system that continuously sends heartbeats (such as ping) to determine whether a worker is alive. Its weakness is timeliness: it cannot serve scenarios with demanding availability requirements.
Problem to solve:
cluster worker monitoring.
Monitoring the cluster with Zookeeper
  Zookeeper's strongly consistent znodes can be used in scenarios with high demands on the state and availability of machines in the cluster, reacting quickly to machine changes.
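The mechanism behind this is the EPHEMERAL znode from the types section: each worker registers one under a membership path, and its disappearance signals the worker's death. The in-memory sketch below imitates that session binding; the class and path names are ours, not a real client API:

```python
class ToyZk:
    """In-memory sketch of ephemeral-znode behaviour: each znode is
    tied to a session and vanishes when that session closes."""

    def __init__(self) -> None:
        self.znodes = {}  # path -> owning session id

    def create_ephemeral(self, path: str, session_id: int) -> None:
        self.znodes[path] = session_id

    def close_session(self, session_id: int) -> None:
        # Session gone: all its ephemeral znodes disappear at once.
        self.znodes = {p: s for p, s in self.znodes.items() if s != session_id}

    def children(self, prefix: str):
        return sorted(p for p in self.znodes if p.startswith(prefix + "/"))

zk = ToyZk()
zk.create_ephemeral("/workers/w1", session_id=101)
zk.create_ephemeral("/workers/w2", session_id=102)
print(zk.children("/workers"))  # ['/workers/w1', '/workers/w2']
zk.close_session(101)           # w1 crashes: its znode disappears
print(zk.children("/workers"))  # ['/workers/w2']
```

In real ZooKeeper, the Master and peer workers would additionally set a watch on the membership path, so they are notified of the change rather than polling.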

Distributed locks

  On a single machine it is relatively simple for multiple processes or threads to operate on the same resource, because plenty of state or log information is available to provide guarantees. For example, if two processes A and B write the same file simultaneously, a lock solves it. But what about a distributed system? A third-party lock-allocation mechanism is needed. When hundreds of workers on the same network write to one file, how do they coordinate, and how is the operation kept efficient?
Problem to solve:
an efficient distributed lock.
Zookeeper distributed locks
  Distributed locks rely on the strong data consistency that ZooKeeper guarantees, and on the uniqueness and monotonicity of znode creation, which give atomicity to all the workers competing for the lock.
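The standard recipe builds on EPHEMERAL|SEQUENTIAL znodes: every contender creates one under the lock path, the lowest sequence number holds the lock, and each waiter watches only the znode immediately ahead of it (avoiding a thundering herd). A toy sketch of that ordering logic, with hypothetical znode names:

```python
def lock_holder(waiters):
    """Sequential-znode lock discipline: among znodes queued under
    the lock path, the lowest sequence number owns the lock; every
    other waiter watches the znode directly ahead of it."""
    queue = sorted(waiters)  # fixed-width suffixes sort numerically
    holder = queue[0]
    watches = {queue[i]: queue[i - 1] for i in range(1, len(queue))}
    return holder, watches

waiters = ["lock-0000000003", "lock-0000000001", "lock-0000000002"]
holder, watches = lock_holder(waiters)
print(holder)                      # lock-0000000001
print(watches["lock-0000000003"])  # lock-0000000002
```

Because the znodes are ephemeral, a lock holder that crashes releases the lock automatically: its znode disappears, its watcher fires, and the next waiter in line takes over.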

Configuration file management

  Configuration files in a cluster are updated and synchronized very frequently. Traditional distribution pushes the configuration data to every worker and then reloads the worker. This is the clumsiest method and the structure is hard to maintain, because many kinds of applications in the cluster may need their configurations synchronized; it is also very inefficient when the cluster is large and heavily loaded. Another approach stores each configuration update in a database and has the workers pull the data periodically, but then the data cannot be synchronized in a timely way.
Problem to solve:
unified configuration distribution that takes effect on workers promptly.
Zookeeper publish/subscribe model
  The publish/subscribe model is the so-called configuration center: as the name implies, publishers push data to zk nodes, and subscribers dynamically fetch it, achieving centralized management and dynamic update of configuration information. Global configuration data and the service address lists of service-oriented frameworks are very good fits.
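In ZooKeeper this pattern is implemented with watches on the configuration znode; the observer-style sketch below only imitates that shape in memory. All names are ours, and real watches are one-shot and must be re-registered, which this toy ignores:

```python
class ConfigCenter:
    """Toy publish/subscribe over a znode path: subscribers register
    a callback and are notified whenever the data at that path
    changes."""

    def __init__(self) -> None:
        self.data = {}      # path -> current value
        self.watchers = {}  # path -> list of callbacks

    def subscribe(self, path, callback):
        self.watchers.setdefault(path, []).append(callback)
        return self.data.get(path)  # current value, if any

    def publish(self, path, value) -> None:
        self.data[path] = value
        for cb in self.watchers.get(path, []):
            cb(path, value)  # push the change to every subscriber

seen = []
center = ConfigCenter()
center.subscribe("/Nginx/conf", lambda p, v: seen.append(v))
center.publish("/Nginx/conf", "worker_processes 4;")
print(seen)  # ['worker_processes 4;']
```

The push-on-change shape is what gives this model its timeliness advantage over the periodic database pull described above.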
