2021-03-25 An Easy-to-Understand Analysis of ZooKeeper

       ZooKeeper is a very useful production tool, a top-level Apache project, and it is widely used. Its range of applications, ways of being used, architectural logic, and composition can offer many insights for our own specific applications.

       There are in fact many introductions to ZooKeeper on the Internet, but they tend to be scattered and not sufficiently accessible or comprehensive. After reading various materials about ZooKeeper, I will try to summarize what ZooKeeper is. This article analyzes ZooKeeper from the perspectives of its principles, design, and application scenarios, and tries to interpret the technology in a more accessible and comprehensive way, so that readers can understand it and use it as a reference for applications and technical decisions.

1. Introduction to ZooKeeper

        ZooKeeper is a distributed coordination system that supports high throughput ("ZooKeeper is a distributed, open-source coordination service for distributed applications"). This sentence contains several concepts: high throughput, distributed, and coordination system. What exactly do these concepts mean?
        First, let's look at "distributed". Distributed here means distributed deployment, a concept everyone is familiar with. A distributed deployment usually involves not just a few machines but a fairly large cluster. When the cluster needs configuration or data updates, should operations engineers ssh into the nodes one by one and update them manually? Of course not. So batch management of a distributed cluster and coordination of its transactions become real problems, and ZooKeeper is a product born to solve them. Although the analogy is not exact, ZooKeeper can be thought of as the management center of a distributed cluster. Interestingly, ZooKeeper itself is also deployed in a distributed manner.
        Next, let's look at "high throughput". To be clear, the high throughput here refers to ZooKeeper itself, not to the cluster it manages. The clusters managed by ZooKeeper are usually fairly large, and ZooKeeper has to communicate with many nodes in them, so it needs relatively high throughput. How does ZooKeeper achieve this? The answer is actually simple: like Redis, it keeps its data in memory (although the internal data structure is a tree that resembles a file system), and because ZooKeeper itself is distributed, many ZooKeeper servers provide the service together (ZooKeeper does also write data to disk, but that is essentially data snapshots and logs kept for backup and recovery).
        Finally, let's look at "coordination", that is, distributed coordination technology. What is distributed coordination technology? It is mainly used to control synchronization between multiple processes in a distributed environment so that they access a critical resource in an orderly manner and "dirty data" is avoided. At this point someone may say this is simple: just write a scheduling algorithm. Whoever says that probably does not know much about distributed systems, hence the misunderstanding. If all the processes ran on one machine, the problem would indeed be relatively easy to handle. The difficulty is that we are in a distributed environment. ZooKeeper is a tool for solving exactly this problem, and that is what "coordination" means: it provides a lock that guarantees a shared resource is exclusively held by one process at any given time, so that multiple processes can access critical resources safely in a distributed environment. A lock in such an environment is called a distributed lock.
        Therefore, in general, ZooKeeper is a coordination tool, itself a distributed cluster, used to manage distributed clusters. It is efficient and open source. From a technical perspective, ZooKeeper is a highly available, high-performance, and consistent open-source coordination service designed for distributed applications, and it provides one basic service: a distributed lock service. Thanks to its open-source nature, developers later explored other ways to use it: configuration maintenance, group services, distributed message queues, distributed notification/coordination, and so on.
        Implementing an efficient distributed lock like ZooKeeper's is not as simple as implementing task-scheduling primitives on a single machine, because a distributed system is connected by a network and the network is unreliable. If a process X accesses a critical resource and the call returns "failed", did it really fail? Not necessarily; the operation may actually have succeeded. Likewise, if services A and B both call service C and A sends its request first, can you guarantee that A's request arrives first?

        So how does ZooKeeper implement all of this?

2. The Principles of ZooKeeper

Before discussing ZooKeeper's principles, let's first look at its internal data structure.

1. ZooKeeper's data structure

        The data structure of ZooKeeper is a tree, similar to a Unix file system. The tree consists of nodes, and each node can contain many descendant nodes at many levels; a schematic is shown below. Ignoring for now the contents shown inside the node boxes, this is the template of ZooKeeper's internal data structure, where each node is called a Znode. There are four types of Znode:
(1) PERSISTENT: persistent node
(2) PERSISTENT_SEQUENTIAL: persistent sequentially numbered node
(3) EPHEMERAL: ephemeral (temporary) node
(4) EPHEMERAL_SEQUENTIAL: ephemeral sequentially numbered node

        These four types fall into two categories: persistent nodes and ephemeral nodes. As the name implies, once a persistent node is created it exists until it is actively deleted. An ephemeral node is created by a client for the duration of its session; once the session ends, the node is deleted automatically. SEQUENTIAL simply means that the persistent or ephemeral node is created with a monotonically increasing sequence number appended to its name, so such children are stored in order.

2. Internal structure of Znode

        Internally, a Znode contains data and attributes.
(1) Data
        ZooKeeper guarantees that reads and writes of a node's data are atomic: a read returns all of the data and a write replaces all of it; partial reads and partial writes are not supported. Note that ZooKeeper is not a database and must not be used as one. ZooKeeper stipulates that the data stored in each node cannot exceed 1 MB, and large amounts of data will certainly hurt performance, so store as little data as possible in each node.
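        To illustrate this all-or-nothing behavior, below is a minimal sketch using the standard ZooKeeper Java client (the path and the assumption that the node stores a numeric string are purely illustrative). It performs an atomic read-modify-write by passing the node's dataVersion to the conditional setData: the write either replaces the entire value or fails with BadVersionException if another client changed the node in between.

import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class AtomicUpdate {
    // Read the whole value, modify it locally, and write it back only if the
    // node's dataVersion has not changed in the meantime (illustrative sketch).
    static void incrementCounter(ZooKeeper zk, String path) throws Exception {
        while (true) {
            Stat stat = new Stat();
            byte[] raw = zk.getData(path, false, stat);          // atomic full read
            long next = Long.parseLong(new String(raw)) + 1;     // assumes the node holds a number
            try {
                // Conditional write: succeeds only if dataVersion still matches.
                zk.setData(path, Long.toString(next).getBytes(), stat.getVersion());
                return;
            } catch (KeeperException.BadVersionException e) {
                // Someone else wrote first; retry with the fresh value.
            }
        }
    }
}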
(2) Attributes
        Each Znode has various attributes, which a client can obtain with the get command. Some of the more important ones are shown below:


get /module1/app2                        // read node /module1/app2
app2                                     // the node's data
cZxid = 0x20000000e                      // zxid of the transaction that created the node
ctime = Thu Jun 30 20:41:55 HKT 2016     // creation time
mZxid = 0x20000000e                      // zxid of the transaction that last modified the node
mtime = Thu Jun 30 20:41:55 HKT 2016     // last modification time
pZxid = 0x20000000e                      // zxid of the last change to the node's children
cversion = 0                             // version number of the children list
dataVersion = 0                          // version number of the node's data
aclVersion = 0                           // version number of the node's ACL
ephemeralOwner = 0x0                     // session id of the owner if ephemeral, otherwise 0
dataLength = 4                           // length of the data in bytes
numChildren = 0                          // number of children

Please keep these attributes in mind; they will be used again below.
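These attributes are also available programmatically: the Java client fills an org.apache.zookeeper.data.Stat object when reading a node. A minimal sketch (the path is illustrative):

import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class ReadStat {
    static void printStat(ZooKeeper zk, String path) throws Exception {
        Stat stat = new Stat();
        byte[] data = zk.getData(path, false, stat);   // fills 'stat' as a side effect
        System.out.println("data           = " + new String(data));
        System.out.println("cZxid          = " + stat.getCzxid());          // creating transaction
        System.out.println("mZxid          = " + stat.getMzxid());          // last modifying transaction
        System.out.println("dataVersion    = " + stat.getVersion());        // data change count
        System.out.println("cversion       = " + stat.getCversion());       // child change count
        System.out.println("ephemeralOwner = " + stat.getEphemeralOwner()); // 0 for persistent nodes
        System.out.println("numChildren    = " + stat.getNumChildren());
    }
}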


3. Basic operations on a Znode

        The basic operations of ZooKeeper on nodes are mainly divided into the following categories:

create: create a node
delete: delete a node
exists: determine whether a node exists
getData: get the data of a node
setData: set the data of a node
getChildren: get all the child nodes under the node

        Here, exists, getData, and getChildren are read operations, and the rest are writes. This is easy to see.
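        As a quick illustration, here is a hedged sketch of these six operations using the standard synchronous Java API (the connect string, paths, and data are made up for the example):

import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class BasicOps {
    public static void main(String[] args) throws Exception {
        // Illustrative connect string; 15 s session timeout; empty default watcher.
        ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 15000, event -> {});

        // create: a persistent node with some data
        zk.create("/demo", "v1".getBytes(), ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

        // exists: returns a Stat, or null if the node is missing
        Stat stat = zk.exists("/demo", false);

        // getData / setData: read and overwrite the node's data
        byte[] data = zk.getData("/demo", false, stat);
        zk.setData("/demo", "v2".getBytes(), stat.getVersion());

        // getChildren: list the child names under a node
        List<String> children = zk.getChildren("/demo", false);

        // delete: remove the node (version -1 skips the version check)
        zk.delete("/demo", -1);

        zk.close();
    }
}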
        ZooKeeper is a distributed client/server (C/S) system, but to make the problem easier to understand, let's first simplify it to a single Server and a single Client. When performing a read against the Server, the Client can choose whether to set a watch. What is a watch? It is essentially a callback the client registers on a particular Znode. Once the node changes, for example its data content is modified, the node triggers the callback to notify the client asynchronously, and the client then pulls the data again.
        This callback is called a Watcher, and the callbacks of multiple clients form the node's Watcher list. Note that a data update here is delivered neither purely by the server pushing the data nor purely by the client polling (pulling); instead the server first notifies and the client then pulls, which is ZooKeeper's push-pull mechanism. Why not simply push the data, or just wait for the client to pull it? It is worth thinking about: pushing full data to every watcher would be expensive, while pure polling would be slow and wasteful; a lightweight notification followed by an on-demand read avoids both problems.
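        A hedged sketch of this notify-then-pull pattern with the Java client follows (the path is illustrative). Note that a ZooKeeper watch fires only once, so the callback re-registers it when it re-reads the data:

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class WatchAndPull {
    // Register a one-shot watch; when notified, pull the new data and watch again.
    static void watchNode(ZooKeeper zk, String path) throws Exception {
        Watcher watcher = new Watcher() {
            @Override
            public void process(WatchedEvent event) {
                if (event.getType() == Event.EventType.NodeDataChanged) {
                    try {
                        // The notification carries no data: the client pulls it,
                        // passing 'this' to re-arm the watch for the next change.
                        byte[] data = zk.getData(path, this, null);
                        System.out.println("new value: " + new String(data));
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            }
        };
        zk.getData(path, watcher, null);   // the initial read registers the watch
    }
}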
        Now a brief question and answer: how does this way of organizing Znode data support ZooKeeper's basic function, the distributed lock? What are its advantages?
        A: Distributed locks are implemented with the help of ephemeral sequential nodes (EPHEMERAL_SEQUENTIAL). When multiple clients compete for a shared resource, each client creates an ephemeral sequential node, and access is granted in sequence order; when a client finishes (or its session ends), its ephemeral node is removed. In this way multiple clients can access the shared resource without conflict, which is the basic principle of the distributed lock.
This way of organizing data lays the structural foundation on which distributed locks can be implemented.

 

3. The Basic Architecture of ZooKeeper


1. Basic structure
        The basic operating principle of ZooKeeper was introduced above using the simplification of one server and one client, which does not fully reflect the structure of a distributed ZooKeeper cluster. Let's now look at ZooKeeper's distributed architecture in detail:

        In the figure, the upper part is the ZooKeeper service, which has five nodes; the lower part is the clients, which can also simply be regarded as the distributed cluster managed by ZooKeeper.
        Look at the lower half first: these are the managed clients, which cooperate with the ZooKeeper service to manage the distributed cluster. One or more clients connect to a single server; a single client never connects to multiple servers at the same time.
        Now look at the ZooKeeper service in the upper half. It contains 5 servers; it could also be 3, 7, 9, 11, and so on. Why an odd number? This is related to ZooKeeper's election mechanism, which is discussed below. One of the 5 servers is the Leader and the remaining 4 are Followers. As the figure shows, every Follower interacts with the Leader, and Followers do not exchange data with one another.
        From this interaction diagram we can roughly see the overall architecture of ZooKeeper: clients interact with servers (Followers), and servers (Followers) interact with the server acting as Leader. The interaction here is nothing more than reads and writes. When a client issues a read request, it can simply send it to the server it is connected to and get the data directly (because the data of the ZooKeeper server cluster is consistent; note, however, that this is not strong consistency but eventual consistency, which is discussed later). A write request, on the other hand, must go through the Leader: the Follower forwards the write request to the Leader, and after the write completes, the Leader synchronizes the change to every Follower.
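        For reference, a typical ensemble configuration with an odd number of servers looks roughly like the zoo.cfg sketch below (all values and host names are illustrative); each server.N line lists the peer-communication port and the leader-election port, and every member carries the same list:

# zoo.cfg (illustrative values)
# tickTime is the basic time unit in ms; initLimit/syncLimit are measured in ticks
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
# ensemble members: server.<myid>=<host>:<peer port>:<leader election port>
server.1=zk1:2888:3888
server.2=zk2:2888:3888
server.3=zk3:2888:3888
server.4=zk4:2888:3888
server.5=zk5:2888:3888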

2. Consistency principles
        Consistency here covers both the consistency protocol and data consistency.
        Based on the basic structure of ZooKeeper described above, first consider a question: what if the cluster managed by ZooKeeper is fine, but ZooKeeper itself goes down?
        If the Leader dies, the remaining servers have to elect a new Leader. This is ZooKeeper's Leader election process. How does it work?
     (1) The ZAB protocol
        The core mechanism by which ZooKeeper achieves data consistency is the ZAB protocol (ZooKeeper Atomic Broadcast). The protocol must satisfy the following requirements:

a. The cluster can continue to provide service as long as fewer than half of its nodes are down;
b. All client write requests are forwarded to the Leader, and the Leader must ensure that the changes are synchronized to all Followers in real time;
c. When the Leader goes down or the entire cluster restarts, transactions that have already been committed on the Leader must eventually be committed by all servers, transactions that were only proposed on the Leader must be discarded, and the cluster must be able to recover quickly to its pre-failure state.

        The ZAB protocol has two modes: crash recovery (Leader election plus data synchronization) and message broadcast (transactional operation). It must ensure that only one main process is responsible for transaction operations at any time, and if the Leader crashes or a server cannot maintain a normal connection with the Leader, a new Leader must be elected quickly. The Leader-election mechanism and the transaction-operation mechanism are closely related. The protocol rules for these three scenarios (Leader election, data synchronization, and message broadcast) are explained below, exploring the data-consistency principles of ZAB from the details.
     (2) Leader Election
        Leader election is one of the most important techniques in ZooKeeper and the key to ensuring distributed data consistency. During an election, each server enters the LOOKING state. The election proceeds by voting: each server broadcasts its vote, whose content consists mainly of two fields, its own server id (myid) and its latest transaction id (zxid, corresponding to the zxid attributes shown above).
        Election process:

a. The server sets its state to LOOKING, initializes its internal vote Vote(id, zxid) in memory, and broadcasts it to the other nodes in the cluster. A node's first vote is for itself as Leader: it broadcasts its own server id, the ZXID of the latest transaction it has processed (taken from the in-memory database, i.e. the transaction id of the last commit completed on this node), and its current state. It then enters a loop, waiting for and processing the voting information of other nodes;
b. In the waiting loop, every time a node receives an external vote it compares it against the vote held in its own memory. The rule is: the larger ZXID wins, and if the ZXIDs are equal, the vote with the larger server id wins. If the external vote wins, the node overwrites its in-memory vote and broadcasts it again. At the same time it counts whether more than half of the servers agree with its current in-memory vote; if not, it keeps looping and waiting for new votes. If so, it checks whether the would-be Leader is among those that agree; if it is, the node exits the loop and the election ends. The servers then switch roles according to the election result: the Leader switches to LEADING and the Followers switch to FOLLOWING;
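The core comparison in step b can be sketched as follows (an illustrative simplification, not the actual FastLeaderElection code; the real implementation also compares an election epoch before the zxid):

public class VoteRule {
    // A simplified vote: the candidate's server id and its latest transaction id.
    record Vote(long myid, long zxid) {}

    // Returns true if the incoming external vote beats our current in-memory vote:
    // the larger zxid wins; on a tie, the larger server id wins.
    static boolean externalVoteWins(Vote external, Vote current) {
        if (external.zxid() != current.zxid()) {
            return external.zxid() > current.zxid();
        }
        return external.myid() > current.myid();
    }
}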

The figure above shows the logical flow of the election on a single server; multiple servers cooperate according to this flow to complete the election for the whole cluster. The way multiple servers cooperate during an election is as follows:

        For the algorithmic details of the election process, see FastLeaderElection.lookForLeader(). Three threads are mainly at work: the election thread (the thread that actively calls lookForLeader and cooperates with the other two threads through the blocking queues sendqueue and recvqueue), the WorkerReceiver thread (the vote receiver, which continuously obtains election messages from other servers, filters them, and saves them into the recvqueue queue; it starts when the zk server starts and never stops), and the WorkerSender thread (the vote sender, which continuously takes pending votes from the sendqueue queue and broadcasts them to the cluster). Further details are not repeated here.
     (3) Data synchronization
         Let's talk about the data synchronization process and explain the eventual consistency problem mentioned above.
        After the Leader election is complete, the Leader must send data-synchronization messages to the Followers to ensure that the data of the ZooKeeper cluster stays consistent. The specific process is in fact a broadcast of data. What is a broadcast? Simply put, when ZooKeeper updates data under normal circumstances, the Leader broadcasts the update to all Followers. Taking a client request as an example, the process is as follows:

a. The client sends a write request to any Follower;
b. The Follower forwards the write request to the Leader;
c. The Leader uses a two-phase commit: it first broadcasts a Proposal to the Followers;
d. Each Follower receives the Proposal, and after successfully writing the log entry, returns an ACK to the Leader;
e. Once the Leader has received ACKs from more than half of the servers, it returns success to the client and broadcasts a Commit to the Followers.
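The "more than half" decision in step e is simply a strict-majority check over the ensemble; a minimal illustrative sketch:

import java.util.HashSet;
import java.util.Set;

public class QuorumTracker {
    private final int ensembleSize;                     // total number of voting servers
    private final Set<Long> ackedServers = new HashSet<>();

    QuorumTracker(int ensembleSize) {
        this.ensembleSize = ensembleSize;
    }

    // Record an ACK from a server; returns true once a strict majority has ACKed,
    // at which point the leader may commit and reply to the client.
    boolean ack(long serverId) {
        ackedServers.add(serverId);
        return ackedServers.size() > ensembleSize / 2;
    }
}

// Example: in a 5-server ensemble, 3 ACKs (e.g. the leader plus two followers)
// already form a quorum, so the cluster tolerates 2 failed servers.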

        In the figure above, if this is failure recovery rather than a client write, there is no Request: the Leader takes the data from its in-memory snapshot (as mentioned above, ZooKeeper's data lives in memory) and then synchronizes it.
        From this process it can be seen that ZooKeeper is neither strongly consistent nor weakly consistent; what ZooKeeper guarantees is the eventual consistency of the cluster.

        As can be seen, ZooKeeper relies on the "more than half" strategy in many places. This strategy is a trade-off zk makes between A (availability) and C (consistency), and it is also the essence of zk's fault tolerance. However, ZooKeeper is at heart a distributed coordination system whose purpose is to keep data consistent among the services it governs. So remember that ZooKeeper is a CP system rather than an AP system, which is one reason Alibaba's Dubbo moved away from ZooKeeper towards Nacos for service registration and discovery (Note: for Dubbo registration and discovery, see: https://mp.weixin.qq.com/s/RML_nofuh0vIagQc6To54A).


4. Application Scenarios of ZooKeeper

        As a distributed coordination system built around a basic distributed lock, ZooKeeper is most commonly used as a distributed lock; this is also related to ZooKeeper's origin as a Hadoop sub-project.
 1. Distributed locks
        Distributed locks mainly benefit from the strong data consistency that ZooKeeper guarantees for us. Lock services can be divided into two categories: one keeps the resource exclusive, and the other controls ordering.

(1) "Keeping exclusive" means that of all the clients trying to acquire the lock, only one can ultimately succeed. The usual practice is to treat a znode on zk as the lock and acquire it by creating the znode: all clients try to create /distribute_lock, and the client that succeeds in creating it holds the lock;
(2) "Controlling ordering" means that all clients trying to acquire the lock will eventually be scheduled to execute, just in a global order. The approach is similar to the above, except that /distribute_lock already exists in advance and each client creates an ephemeral sequential node under it (specified via the creation mode CreateMode.EPHEMERAL_SEQUENTIAL). The parent node (/distribute_lock) maintains a sequence that guarantees the creation order of the child nodes, which gives each client its position in the global order. A minimal sketch of such a lock is shown below.
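The following is a hedged sketch of an ordering lock built on ephemeral sequential nodes with the plain Java client (paths and naming are illustrative; production code would more likely use a ready-made recipe such as Apache Curator's InterProcessMutex). Each client creates an ephemeral sequential child under /distribute_lock; it holds the lock when its node has the smallest sequence number, otherwise it watches only the node immediately before it, which avoids a herd effect.

import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class SimpleDistributedLock {
    private final ZooKeeper zk;
    private final String lockRoot = "/distribute_lock";   // assumed to exist already
    private String myNode;                                 // e.g. /distribute_lock/lock-0000000007

    SimpleDistributedLock(ZooKeeper zk) {
        this.zk = zk;
    }

    void lock() throws Exception {
        // 1. Create an ephemeral sequential child; the server appends the sequence number.
        myNode = zk.create(lockRoot + "/lock-", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
        while (true) {
            List<String> children = zk.getChildren(lockRoot, false);
            Collections.sort(children);                    // sequence suffixes sort lexicographically
            String myName = myNode.substring(lockRoot.length() + 1);
            int myIndex = children.indexOf(myName);
            if (myIndex == 0) {
                return;                                    // smallest sequence number: lock acquired
            }
            // 2. Watch only the immediate predecessor to avoid a thundering herd.
            String predecessor = lockRoot + "/" + children.get(myIndex - 1);
            CountDownLatch latch = new CountDownLatch(1);
            if (zk.exists(predecessor, event -> latch.countDown()) != null) {
                latch.await();                             // woken when the predecessor goes away
            }
            // Predecessor already gone (or just deleted): re-check our position.
        }
    }

    void unlock() throws Exception {
        zk.delete(myNode, -1);                             // removing the node releases the lock
    }
}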

2. Data publication and subscription
         This is the publish/subscribe model, i.e. the so-called configuration center: publishers publish data to ZooKeeper nodes, and subscribers obtain the data dynamically, achieving centralized management and dynamic updating of configuration information. Global configuration information and the service address lists of service frameworks, for example, are very well suited to this.
3. Load balancing
        Load balancing here refers to soft load balancing. In a distributed environment, to ensure high availability, the same application or service provider is usually deployed in multiple copies to provide peer services. Consumers then need to choose one of these peer servers to execute the relevant business logic; a typical example is producer and consumer load balancing in message middleware.
        For load balancing of publishers and subscribers in message middleware, LinkedIn's open-source Kafka and Alibaba's open-source MetaQ both use ZooKeeper to achieve load balancing between producers and consumers.
4. Distributed notification/coordination
        ZooKeeper's watcher registration and asynchronous notification mechanism is well suited to notification and coordination between different systems in a distributed environment, enabling real-time handling of data changes. The usual approach is that different systems register watches on the same znode on ZK and monitor changes to it (including the znode's own content and its children); when one system updates the znode, the other systems receive the notification and handle it accordingly.
        Beyond the scenarios above, ZooKeeper has other use cases that will not be repeated here. Having read these introductions, you should be able to make a clearer decision about whether and how to apply ZooKeeper in a specific technology selection.


5. Using ZooKeeper

         ZooKeeper has an active community and rich documentation, so the specific installation, configuration, and usage steps are not repeated here. When using it, you can refer to the classic practices mentioned at the beginning of this article or ask the community for support; most importantly, read the manual on the official website.

 

Thank you for taking the time to read this article; corrections are welcome if anything is wrong.

Reference:
http://zookeeper.apache.org/
http://zookeeper.apache.org/doc/current/index.html
https://blog.csdn.net/gs80140/article/details/51496925
https://blog.csdn.net/WeiJiFeng_/article/details/79775738
https://www.cnblogs.com/tommyli/p/3766189.html
https://mp.weixin.qq.com/s?__biz=MjM5MjAwODM4MA==&mid=2650712100&idx=4&sn=94ea4679c1565f69023ef1b63e6ca5ab
