Introduction to ZooKeeper Basics (1)

When we built the Hadoop distributed environment earlier, we used ZooKeeper to register the Hadoop services. So what role does ZooKeeper actually play in a distributed environment? That is the question this article discusses.

In a distributed system, multiple machines usually form a cluster that provides a service to the outside world. External clients do not care how many machines are behind the service; internally, however, the machines that make up the cluster must coordinate with one another. ZooKeeper provides exactly this capability: distributed coordination.

1. Overview of ZooKeeper

ZooKeeper is a highly available, high-performance, consistent, open-source coordination service designed for distributed applications. At its core it provides one basic service: a distributed lock service. Because ZooKeeper is open source, developers have since built other usage patterns on top of it: configuration maintenance, group membership services, distributed message queues, distributed notification/coordination, and so on.

ZooKeeper's performance characteristics make it suitable for large distributed systems. In terms of reliability, it does not fail because of a single faulty node. Its strict, sequential access semantics also allow complex synchronization primitives to be built on the client side. ZooKeeper's guarantees of consistency, availability, and fault tolerance are central to its success, and all of them rest on the protocol it adopts: the Zab protocol.

To implement these services, ZooKeeper first defines a data structure, the Znode, and then defines a set of primitives, i.e., operations on that structure. Data structures and primitives alone are not enough, though: ZooKeeper runs in a distributed environment, and its services reach distributed applications as messages over the network, so a notification mechanism, the Watcher mechanism, is also needed. In short, the services ZooKeeper provides rest on three parts: data structure + primitives + watcher mechanism. The rest of this article introduces ZooKeeper from these three angles.

2. ZooKeeper's data model: the Znode

A Znode stores node information, and its data model is very similar to the directory tree of an ordinary file system. The differences are:

1) Path reference

Znodes manage child nodes through path references. Paths must be absolute, so they always start with a slash, and, as in a file system, each path must be unique.

2) Znode data structure

A Znode not only holds node data but also stores related metadata about itself. This information can be divided into 3 parts:

  1. stat: status information, describing the Znode's version and permission information;
  2. data: the data associated with the Znode;
  3. children: information about the child nodes under the Znode.

The information in stat is closely related to our usual operations, and stat contains the following fields:

  1. czxid: the zxid of the change that caused this znode to be created
  2. mzxid: the zxid of the change that last modified this znode
  3. ctime: the creation time of the znode, in milliseconds since the epoch (1970)
  4. mtime: the last modification time of the znode, in milliseconds since the epoch (1970)
  5. version: the number of changes to the znode's data
  6. cversion: the number of changes to the znode's children
  7. aversion: the number of changes to the znode's access control list (ACL)
  8. ephemeralOwner: the session id of the owner if the znode is ephemeral, otherwise 0
  9. dataLength: the length of the znode's data
  10. numChildren: the number of children of the znode

We will analyze the meaning of these fields in more detail later; for now, the sketch below shows how to read them.
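
As a quick way to inspect these fields, here is a minimal sketch using the standard ZooKeeper Java client. The connect string localhost:2181, the session timeout, and the node /demo are illustrative assumptions, not values from this article.

    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    public class StatDemo {
        public static void main(String[] args) throws Exception {
            // Connect to a local ZooKeeper server (address/timeout are assumptions).
            ZooKeeper zk = new ZooKeeper("localhost:2181", 15000, event -> {});
            // exists() returns the Stat of the node, or null if it does not exist.
            Stat stat = zk.exists("/demo", false);
            if (stat != null) {
                System.out.println("czxid:          " + stat.getCzxid());
                System.out.println("mzxid:          " + stat.getMzxid());
                System.out.println("ctime:          " + stat.getCtime());
                System.out.println("mtime:          " + stat.getMtime());
                System.out.println("version:        " + stat.getVersion());
                System.out.println("cversion:       " + stat.getCversion());
                System.out.println("aversion:       " + stat.getAversion());
                System.out.println("ephemeralOwner: " + stat.getEphemeralOwner());
                System.out.println("dataLength:     " + stat.getDataLength());
                System.out.println("numChildren:    " + stat.getNumChildren());
            }
            zk.close();
        }
    }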

The data part is not meant to store large payloads. ZooKeeper is not designed as a conventional data warehouse or database; as a distributed coordination service it typically stores necessary configuration, status, and routing information, which is small. Both the ZooKeeper server and the client strictly enforce a maximum data size of 1 MB per Znode.

3) Data access

Reads and writes of a node's data in ZooKeeper are atomic operations. Each node also has its own ACL (Access Control List), which specifies the permissions of users.

4) Node type

Nodes in zk come in two types: ephemeral nodes and persistent nodes. A node's type is determined when it is created and cannot be changed afterwards.

  1. Ephemeral nodes: the lifetime of the node is tied to the session that created it; once the session ends, the node is deleted automatically. Although an ephemeral node is created by one client's session, it is visible to all clients. In addition, zk stipulates that ephemeral nodes cannot have child nodes.
  2. Persistent nodes: the life cycle of the node does not depend on any session; the node is deleted only when a client explicitly issues a delete command. Both types are illustrated in the sketch after this list.
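
A minimal sketch of creating both node types with the Java client; the paths /config and /config/worker-1 and the connection details are illustrative assumptions.

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class NodeTypeDemo {
        public static void main(String[] args) throws Exception {
            ZooKeeper zk = new ZooKeeper("localhost:2181", 15000, event -> {});

            // Persistent node: survives the session; removed only by an explicit delete.
            zk.create("/config", "v1".getBytes(),
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

            // Ephemeral node: deleted automatically when this session ends,
            // and not allowed to have children of its own.
            zk.create("/config/worker-1", new byte[0],
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);

            zk.close(); // closing the session removes /config/worker-1; /config remains
        }
    }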

5) Watches

Clients can set watches, i.e., monitors, on nodes. When a node changes, the operation associated with the watch is triggered, and zk sends one and only one notification to the client. Because a watch fires only once, this reduces network traffic.

2.1 Time in zk

The time recorded in zk is not a simple timestamp; it involves the following notions:

Zxid

Every operation that changes the state of a node in zk is stamped with a Zxid-format timestamp, and these are globally ordered: each change to a node records a globally unique timestamp, and if Zxid1 is less than Zxid2, then Zxid1 happened before Zxid2. In fact, each zk node maintains three Zxids: cZxid, mZxid, and pZxid:

 cZxid: the Zxid-format timestamp corresponding to the node's creation time.
 mZxid: the Zxid-format timestamp corresponding to the node's last data modification.
 pZxid: the Zxid of the most recent change to the node's child list; unlike mZxid, it tracks changes to the children rather than to the node's own data.

A Zxid is a 64-bit number. The upper 32 bits are the epoch, which identifies whether the leader has changed: every time a new leader is elected, it gets a new epoch. The lower 32 bits are an incrementing counter.
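
To make that layout concrete, here is a small sketch that splits a zxid into its two halves; the sample value is made up for illustration.

    public class ZxidDemo {
        // Decompose a zxid into its leader epoch (high 32 bits)
        // and per-epoch counter (low 32 bits).
        static void describe(long zxid) {
            long epoch   = zxid >>> 32;           // changes when a new leader is elected
            long counter = zxid & 0xFFFFFFFFL;    // increments with each state change
            System.out.printf("zxid=%#x epoch=%d counter=%d%n", zxid, epoch, counter);
        }

        public static void main(String[] args) {
            describe(0x100000003L); // epoch 1, third change in that epoch
        }
    }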

Version numbers

Every operation that modifies a node increments the corresponding version number. Each node maintains three version numbers (their use in conditional updates is sketched after this list):

  1. version: the version number of the node's data
  2. cversion: the version number of the node's children
  3. aversion: the version number of the node's ACL
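
These version numbers enable optimistic concurrency control: a conditional update succeeds only if the version is still the one the client read. A minimal sketch, assuming a node /config already exists:

    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    public class VersionDemo {
        public static void main(String[] args) throws Exception {
            ZooKeeper zk = new ZooKeeper("localhost:2181", 15000, event -> {});
            Stat stat = new Stat();
            byte[] data = zk.getData("/config", false, stat); // getData fills in stat

            try {
                // setData succeeds only if the data version is still what we read;
                // passing -1 instead would skip the version check entirely.
                zk.setData("/config", "v2".getBytes(), stat.getVersion());
            } catch (KeeperException.BadVersionException e) {
                System.out.println("someone else updated /config first; re-read and retry");
            }
            zk.close();
        }
    }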

3. Basic operations in zk

create: create a Znode (if it is a child node, the parent node must already exist)
delete: delete a Znode (the Znode must have no child nodes)
exists: test whether a Znode exists and, if it does, fetch its metadata
getACL/setACL: get/set the ACL information of a Znode
getChildren: get the list of all child nodes of a Znode
getData/setData: get/set the data associated with a Znode
sync: synchronize the client's view of a Znode with ZooKeeper
These operations are exercised together in the sketch below.
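
Here is a sketch that strings the basic operations together with the Java client; the path /app and the connection details are assumptions for illustration.

    import java.util.List;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    public class BasicOpsDemo {
        public static void main(String[] args) throws Exception {
            ZooKeeper zk = new ZooKeeper("localhost:2181", 15000, event -> {});

            // create: for a child path, the parent node must already exist
            zk.create("/app", "cfg".getBytes(),
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

            // exists: returns the node's metadata (Stat), or null if absent
            Stat stat = zk.exists("/app", false);

            // getData / setData: read the data, then conditionally update it
            byte[] data = zk.getData("/app", false, stat);
            System.out.println("read: " + new String(data));
            zk.setData("/app", "cfg2".getBytes(), stat.getVersion());

            // getChildren: list all child nodes
            List<String> children = zk.getChildren("/app", false);
            System.out.println("children: " + children);

            // delete: only allowed when the node has no children (-1 skips the version check)
            zk.delete("/app", -1);
            zk.close();
        }
    }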


4. Watch triggers

ZooKeeper lets you set watches on all read operations: exists(), getChildren(), and getData(). Watch events are one-time triggers.

Watch types can be divided into two categories:

  • Data watches: getData() and exists() set data watches
  • Child watches: getChildren() sets child watches (both kinds are registered in the sketch below)
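
A minimal sketch that registers both kinds of watch on an assumed node /app. Because watches are one-time triggers, a watch must be registered again after it fires in order to receive further events.

    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;

    public class WatchDemo {
        public static void main(String[] args) throws Exception {
            ZooKeeper zk = new ZooKeeper("localhost:2181", 15000, event -> {});

            Watcher watcher = new Watcher() {
                @Override
                public void process(WatchedEvent event) {
                    // Fires at most once per registration:
                    // NodeDataChanged / NodeDeleted for data watches,
                    // NodeChildrenChanged for child watches.
                    System.out.println(event.getType() + " on " + event.getPath());
                }
            };

            zk.getData("/app", watcher, null);   // sets a data watch
            zk.getChildren("/app", watcher);     // sets a child watch

            Thread.sleep(60_000); // keep the session alive to observe events
            zk.close();
        }
    }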

5. Application scenarios of zk

1) Distributed lock

Shared locks are easy to implement within a single process but much harder across processes or across servers. With zk, implementing one is straightforward.

In this implementation, a server that wants the lock creates an EPHEMERAL_SEQUENTIAL directory node, then calls getChildren() to check whether the smallest-numbered node in the lock directory is the node it created. If so, it holds the lock. If not, it calls exists(String path, boolean watch) to watch for changes in the directory node list, repeating until its own node is the smallest-numbered one, at which point it has acquired the lock. Releasing the lock is very simple: the server just deletes the node it created earlier.

Specific steps are as follows:

Locking: the lock operation is implemented as follows:

  1. The client calls create() to create a node with a path of the form "locknode/lock-", with both the sequence and ephemeral flags set. The created nodes are therefore temporary and consecutively numbered, i.e., of the form "lock-i".

  2. The client calls getChildren() on the lock directory to find the node with the smallest number, without setting a watch.

  3. If the node obtained in step 2 is the node this client created in step 1, the client has acquired the lock and exits the protocol.

  4. Otherwise, the client calls exists() with a watch set on the node in the lock directory whose sequence number is immediately below its own.

  5. When the watched node changes state, the client jumps back to step 2 and repeats until it either acquires the lock or exits the competition. A sketch of this recipe follows.
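
A hedged sketch of the recipe above, assuming the lock directory /locknode already exists; a production implementation would also handle connection loss and retries.

    import java.util.Collections;
    import java.util.List;
    import java.util.concurrent.CountDownLatch;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class LockSketch {
        private final ZooKeeper zk;
        private String myNode; // e.g. /locknode/lock-0000000003

        LockSketch(ZooKeeper zk) { this.zk = zk; }

        void lock() throws Exception {
            // Step 1: ephemeral + sequential node under the lock directory.
            myNode = zk.create("/locknode/lock-", new byte[0],
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
            while (true) {
                // Step 2: list the children without setting a watch.
                List<String> children = zk.getChildren("/locknode", false);
                Collections.sort(children);
                String mine = myNode.substring("/locknode/".length());
                int idx = children.indexOf(mine);
                if (idx == 0) return; // Step 3: smallest number -> lock acquired.

                // Step 4: watch only the node immediately below ours.
                String prev = "/locknode/" + children.get(idx - 1);
                CountDownLatch latch = new CountDownLatch(1);
                if (zk.exists(prev, event -> latch.countDown()) != null) {
                    latch.await(); // Step 5: predecessor changed -> re-check.
                }
            }
        }

        void unlock() throws Exception {
            zk.delete(myNode, -1); // releasing the lock = deleting our own node
        }
    }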

2) Configuration management (data publishing and subscription)

In a distributed system, we may deploy the same service application to n servers, all sharing identical configuration files. When a configuration option changes, we would otherwise have to update each file one by one. With only a few servers this is merely tedious, but with many distributed servers it becomes both troublesome and risky. Instead, we can store the configuration in a ZooKeeper directory node and have every machine that uses it watch that node. As soon as the configuration changes, each machine receives a notification from ZooKeeper, fetches the new configuration, and applies it to its system.
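
A minimal sketch of this pattern, assuming the configuration lives at an illustrative node /app/config; re-reading with the same watcher re-registers the one-time watch each time.

    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;

    public class ConfigWatcher implements Watcher {
        private final ZooKeeper zk;

        ConfigWatcher(ZooKeeper zk) throws Exception {
            this.zk = zk;
            load(); // initial read + watch registration
        }

        private void load() throws Exception {
            // Reading with `this` as the watcher re-registers the one-time watch.
            byte[] data = zk.getData("/app/config", this, null);
            System.out.println("applying config: " + new String(data));
        }

        @Override
        public void process(WatchedEvent event) {
            if (event.getType() == Event.EventType.NodeDataChanged) {
                try {
                    load(); // fetch the new configuration and watch again
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        }
    }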

3) Cluster management

ZooKeeper also makes cluster management easy. When multiple servers form a service cluster, a "manager" must know the current service status of every machine in the cluster: once a machine stops providing service, the rest of the cluster must learn of it and adjust the service allocation strategy. Likewise, when the cluster's capacity is expanded by adding one or more servers, the "manager" must be informed as well.

ZooKeeper can not only maintain the service status of the machines in the current cluster, it can also elect the "manager" that manages the cluster; this is another of ZooKeeper's functions, Leader Election.

Membership monitoring is implemented by having each server create an EPHEMERAL directory node under a shared parent, and then call getChildren(String path, boolean watch) on that parent with watch set to true. Because the nodes are EPHEMERAL, when the server that created one dies, its node is deleted, the parent's child list changes, and the watch set by getChildren() fires, so the other servers learn that a server has died. Adding a new server works on the same principle.

Leader Election, i.e., choosing a Master server, works in much the same way. Each server creates an EPHEMERAL node as before, except that the node is also SEQUENTIAL, making it an EPHEMERAL_SEQUENTIAL node. This gives every server a number, and we simply choose the server with the smallest number as the Master. If that server dies, its node is deleted (because it is EPHEMERAL), a new smallest-numbered node appears in the list, and the server that owns it becomes the new Master. This realizes dynamic Master election and avoids the single point of failure of a traditional fixed Master. A sketch appears below.
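
A simplified sketch of this election scheme, assuming an illustrative /election parent directory already exists; a fuller implementation would watch only its predecessor node and re-run the check whenever the children change.

    import java.util.Collections;
    import java.util.List;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class LeaderElectionSketch {
        // Returns true if this server's node carries the smallest sequence
        // number under /election, i.e. it is the current Master.
        static boolean runForLeader(ZooKeeper zk, String serverId) throws Exception {
            String me = zk.create("/election/n-", serverId.getBytes(),
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);

            // Watch the children so every server learns when the leader dies.
            List<String> children = zk.getChildren("/election", true);
            Collections.sort(children);

            // The smallest node wins; if it later disappears (its session died),
            // the resulting NodeChildrenChanged event triggers a re-election.
            return me.endsWith(children.get(0));
        }
    }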

4) Queue management

Zookeeper can handle two types of queues:

  • A queue becomes available only when all of its members have gathered; otherwise it waits for all members to arrive. This is a synchronous queue.
  • Elements are enqueued and dequeued in FIFO order, as in a producer/consumer model.

A. The idea for implementing a synchronous queue with ZooKeeper is as follows:

Create a parent directory /synchronizing. Each member watches whether the flag node /synchronizing/start exists (Set Watch), and then joins the queue by creating a temporary node /synchronizing/member_i. Each member then fetches all the directory nodes under /synchronizing, i.e., the member_i nodes, and checks whether their count already equals the number of members. If it is still smaller, the member waits for /synchronizing/start to appear; if it is already equal, the member creates /synchronizing/start. A sketch follows.
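
A sketch of this idea, assuming the /synchronizing parent already exists and that member nodes are identified by the member_ prefix.

    import java.util.List;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class SyncQueueSketch {
        // Each member joins; the last one to arrive creates the start flag.
        static void join(ZooKeeper zk, int groupSize) throws Exception {
            // Watch for the start flag before joining (one-time watch).
            zk.exists("/synchronizing/start", true);

            // Join the queue as an ephemeral, sequentially numbered member node.
            zk.create("/synchronizing/member_", new byte[0],
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);

            List<String> children = zk.getChildren("/synchronizing", false);
            // "start" itself may be among the children, so count only members.
            long count = children.stream().filter(n -> n.startsWith("member_")).count();

            if (count >= groupSize) {
                // Everyone has arrived: create the flag that wakes the others.
                zk.create("/synchronizing/start", new byte[0],
                        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
            }
            // Otherwise this member simply waits for the watch set above to fire.
        }
    }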

B. FIFO queue:

The implementation idea is also very simple: create SEQUENTIAL children /queue_i under a specific directory, which guarantees that all members are numbered as they join the queue. getChildren() then returns all current elements of the queue, and consuming the smallest-numbered one preserves FIFO order. See the sketch below.
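
A sketch of both halves of the queue, assuming an illustrative /queue parent node; competing consumers would additionally need to handle the race on delete (e.g., by catching NoNodeException and retrying).

    import java.util.Collections;
    import java.util.List;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class FifoQueueSketch {
        // Producer: sequential numbering preserves arrival order.
        static void produce(ZooKeeper zk, byte[] payload) throws Exception {
            zk.create("/queue/queue_", payload,
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT_SEQUENTIAL);
        }

        // Consumer: take the element with the smallest number (head of the queue).
        static byte[] consume(ZooKeeper zk) throws Exception {
            List<String> items = zk.getChildren("/queue", false);
            if (items.isEmpty()) return null;
            Collections.sort(items);
            String head = "/queue/" + items.get(0);
            byte[] data = zk.getData(head, false, null);
            zk.delete(head, -1); // dequeue
            return data;
        }
    }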
