Introduction and core principles of ZooKeeper (ZK)

1 Introduction

ZooKeeper is an open-source, distributed Apache project that provides coordination services for distributed applications.

ZooKeeper is a distributed, open-source coordination service for distributed applications. It exposes a small set of simple primitives on which distributed applications can build higher-level services for synchronization, configuration maintenance, groups, and naming. It is designed to be easy to program against and uses a data model styled after the familiar directory tree of a file system. It runs in Java and has bindings for both Java and C.

Coordination services are notoriously hard to get right. They are especially prone to errors such as race conditions and deadlocks. The motivation behind ZooKeeper is to relieve distributed applications of the responsibility of implementing coordination services from scratch.

Official website: https://zookeeper.apache.org

2 Core functions

2.1 Working mechanism

From a design-pattern perspective, ZooKeeper is a distributed service-management framework built on the observer pattern. It stores and manages the data that everyone cares about and accepts registrations from observers. Once the state of that data changes, ZooKeeper notifies the observers registered with it so that they can react accordingly.

Zookeeper = file system + notification mechanism

2.2 Features

  1. A ZooKeeper cluster consists of one leader (Leader) and multiple followers (Follower).
  2. As long as more than half of the nodes in the cluster are alive, the ZooKeeper cluster can serve requests normally.
  3. Global data consistency: every server keeps a copy of the same data, so a client sees the same data no matter which server it is connected to.
  4. Updates are applied in order: update requests from the same client are executed in the order in which they were sent.
  5. Updates are atomic: a data update either succeeds completely or fails.
  6. Real-time: within a certain time window, a client can read the latest data.

2.3 Data structure

The structure of the ZooKeeper data model is very similar to a Unix file system: as a whole it can be viewed as a tree, and each node is called a ZNode. By default each ZNode can store 1 MB of data, and each ZNode is uniquely identified by its path.
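To make the tree model concrete, here is a minimal in-memory sketch in Python. It is not ZooKeeper's actual implementation, and the class ZNodeTree and its methods are hypothetical. The model keys every znode by its absolute path, refuses to create a node whose parent does not exist, and rejects payloads above the 1 MB default mentioned above.

```python
# Hypothetical in-memory model of the ZooKeeper data tree (illustration only).
MAX_DATA_BYTES = 1024 * 1024  # assumed 1 MB default limit per znode


class ZNodeTree:
    def __init__(self):
        # The root "/" always exists; every znode is keyed by its absolute path.
        self.nodes = {"/": b""}

    @staticmethod
    def parent_of(path):
        parent = path.rsplit("/", 1)[0]
        return parent or "/"

    def create(self, path, data=b""):
        if path in self.nodes:
            raise KeyError(f"node exists: {path}")
        if self.parent_of(path) not in self.nodes:
            raise KeyError(f"parent missing for: {path}")
        if len(data) > MAX_DATA_BYTES:
            raise ValueError("data exceeds the 1 MB default limit")
        self.nodes[path] = data

    def get(self, path):
        # A path alone is enough to find a znode: it is the unique identifier.
        return self.nodes[path]

    def children(self, path):
        prefix = path.rstrip("/") + "/"
        return sorted(
            p[len(prefix):]
            for p in self.nodes
            if p != path and p.startswith(prefix) and "/" not in p[len(prefix):]
        )
```

For example, after creating "/app" and then "/app/config", children("/app") returns ["config"], while creating "/missing/child" fails because its parent does not exist.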

2.4 Application scenarios

The services it provides include unified naming, unified configuration management, unified cluster management, servers going online and offline dynamically, soft load balancing, and so on.

2.4.1 Uniform Naming Service

Naming is a common scenario in distributed systems. In a distributed system, a named entity is usually a machine in the cluster, the address of a provided service, a remote object, and so on. Through a naming service, a client can use a given name to obtain information about a resource's entity, service address, and provider. The most common example is naming the service address lists of an RPC framework. ZooKeeper can thus help applications locate and use resources by reference. In a broader sense, the resource located by a naming service need not be a real physical resource: in a distributed environment, upper-level applications often just need a globally unique name, and ZooKeeper can implement a distributed mechanism for allocating globally unique IDs. (The drawback of UUIDs is that the generated strings are long and waste storage space, and their irregular form is inconvenient for development and debugging.)
By calling ZooKeeper's node-creation API, a client can create a sequential node, and the API returns the full name of the created node. This feature can be used to generate global IDs as follows:

1. Depending on the task type, the client calls the create API to create a sequential node under the directory for that type of task, using a name prefix such as "job-".

2. When creation completes, the full node name is returned, such as "job-0000000001".
3. The client concatenates the task type and the returned name to form a globally unique ID, such as "type2-job-0000000001".
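The three steps above can be simulated in a few lines of Python. This is a sketch under stated assumptions. SequentialParent and make_global_id are hypothetical names, and the per-parent counter stands in for what the ZooKeeper server does when a client creates a node with a *_SEQUENTIAL mode. The 10-digit zero-padded suffix matches ZooKeeper's naming format. In this model, a fresh parent hands out 0000000000 first.

```python
# Hypothetical model of sequential node creation and global ID generation.

class SequentialParent:
    """Stands in for a parent znode that hands out sequence numbers."""

    def __init__(self):
        self.counter = 0  # monotonically increasing, maintained per parent

    def create_sequential(self, prefix):
        # ZooKeeper appends a 10-digit, zero-padded counter to the name.
        name = f"{prefix}{self.counter:010d}"
        self.counter += 1
        return name


def make_global_id(task_type, parent, prefix="job-"):
    # Step 1 and 2: create a sequential node and receive its full name.
    node_name = parent.create_sequential(prefix)
    # Step 3: join the task type and the returned name.
    return f"{task_type}-{node_name}"
```

Each call such as make_global_id("type2", parent) yields a new ID ("type2-job-0000000000", then "type2-job-0000000001", ...). Uniqueness comes from the single counter per parent, which the real server enforces even when many clients call create concurrently.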

2.4.2 Unified Configuration Management

1) In a distributed environment, configuration file synchronization is very common.

  1. It is generally required that the configuration information of all nodes in a cluster is consistent, such as a Kafka cluster.
  2. After a configuration file is modified, the change should be synchronized to every node quickly.

2) Configuration management can be implemented by ZooKeeper

  1. The configuration information can be written to a ZNode on ZooKeeper.
  2. Each client server watches this ZNode.
  3. Once the data in the ZNode is modified, ZooKeeper notifies each client server.
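A minimal Python simulation of this flow follows. It assumes one-shot watches as in classic ZooKeeper, where a standard watch fires once and must be re-registered. ConfigNode and ConfigClient are hypothetical stand-ins, not the ZooKeeper client API. The notification carries no data, so each client re-reads the znode and re-registers its watch in the same call.

```python
# Hypothetical simulation of configuration management with one-shot watches.

class ConfigNode:
    def __init__(self, data):
        self.data = data
        self.watchers = []

    def get(self, watcher=None):
        # Reading with a watcher registers interest in the next change.
        if watcher is not None:
            self.watchers.append(watcher)
        return self.data

    def set(self, data):
        self.data = data
        # Watches fire once and are removed before being invoked.
        fired, self.watchers = self.watchers, []
        for watcher in fired:
            watcher()


class ConfigClient:
    def __init__(self, name, node):
        self.name = name
        self.node = node
        self.config = node.get(watcher=self.on_change)

    def on_change(self):
        # Re-read the config and re-register, since the old watch was consumed.
        self.config = self.node.get(watcher=self.on_change)
```

If two ConfigClient instances share one ConfigNode, a single set() updates both clients, and afterwards each of them again has exactly one watch registered.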

2.4.3 Unified cluster management

1) In a distributed environment, the status of each node needs to be tracked in real time.

  1. Adjustments can then be made based on each node's real-time status.

2) ZooKeeper can monitor node status changes in real time

  1. The node information can be written to a ZNode on ZooKeeper.

  2. Monitor this ZNode to get its real-time status changes.
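One common way to track node status, which also underlies the online/offline scenario that follows, is to have each server register an ephemeral znode under a parent such as /servers. A monitor then watches that parent's children. The Python sketch below models this in memory. Registry, Monitor and the /servers layout are hypothetical, and session handling is reduced to an explicit close_session() call.

```python
# Hypothetical membership registry built on ephemeral nodes and child watches.

class Registry:
    def __init__(self):
        self.children = {}       # child name under /servers -> owning session id
        self.child_watchers = []

    def _fire(self):
        # Child watches are one-shot: clear the list before invoking them.
        fired, self.child_watchers = self.child_watchers, []
        for watcher in fired:
            watcher()

    def create_ephemeral(self, name, session_id):
        self.children[name] = session_id
        self._fire()

    def close_session(self, session_id):
        # Ephemeral nodes are deleted when the session that created them ends.
        removed = [n for n, s in self.children.items() if s == session_id]
        for name in removed:
            del self.children[name]
        if removed:
            self._fire()

    def ls(self, watcher=None):
        if watcher is not None:
            self.child_watchers.append(watcher)
        return sorted(self.children)


class Monitor:
    def __init__(self, registry):
        self.registry = registry
        self.online = registry.ls(watcher=self.refresh)

    def refresh(self):
        # Re-list the children and re-arm the watch.
        self.online = self.registry.ls(watcher=self.refresh)
```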

2.4.4 Server nodes dynamically going online and offline

Clients can see, in real time, when servers go online or offline.

2.4.5 Soft load balancing

Record the number of visits to each server in ZooKeeper, and let the server with the fewest visits handle the next client request.
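The selection rule can be sketched as below. In this hypothetical Python model, the visit counts, which the scheme above would store in ZooKeeper, are held in a plain dict. Ties are broken by server name so that the result is deterministic.

```python
# Hypothetical least-visits load balancer; counts are kept in a dict here.

def pick_server(visits):
    # Fewest visits wins; ties are broken by server name.
    return min(visits, key=lambda server: (visits[server], server))


def handle_request(visits):
    server = pick_server(visits)
    visits[server] += 1  # record the visit so the next pick sees it
    return server
```

With real ZooKeeper, several clients may update the same counter concurrently. A production version would therefore need an atomic update, for example a setData call conditioned on the znode's expected version.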

3 Internal principles of ZooKeeper

3.1 Election mechanism

1) Majority mechanism: the cluster is available as long as more than half of its machines are alive. For this reason ZooKeeper is best installed on an odd number of servers: for example, both a 5-server and a 6-server cluster can tolerate only 2 failed servers.
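The majority rule reduces to simple arithmetic, sketched here in Python (has_quorum and tolerated_failures are illustrative names). It shows why odd cluster sizes are preferred: a sixth server adds cost without letting the cluster survive any additional failure.

```python
def has_quorum(cluster_size, alive):
    # The cluster is usable only if strictly more than half the servers are up.
    return alive > cluster_size // 2


def tolerated_failures(cluster_size):
    # Largest number of servers that can fail while a majority survives.
    return (cluster_size - 1) // 2
```

For instance, has_quorum(6, 3) is False, because 3 of 6 is exactly half, not more than half.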

2) ZooKeeper does not designate a master and slaves in its configuration file. Nevertheless, at run time one node acts as the Leader and the others as Followers, and the Leader is chosen dynamically through an internal election mechanism.

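3) A simple example illustrates the whole election process. Suppose a cluster of five servers (ids 1 to 5), none of which has any historical data, is started one server at a time.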

(1) Server 1 starts. It is the only server running, so the election messages it sends get no replies and it stays in the LOOKING state.

(2) Server 2 starts. It communicates with server 1, and the two exchange their votes. Since neither has any historical data, server 2, which has the larger id, wins. However, it does not have the votes of more than half of the servers (in this example that means at least 3), so servers 1 and 2 both remain in the LOOKING state.

(3) Server 3 starts. By the same reasoning, server 3 has the largest id among servers 1, 2 and 3. This time three servers vote for it, which is a majority, so it becomes the Leader of this election.

(4) Server 4 starts. In theory server 4 has the largest id among servers 1, 2, 3 and 4. However, since a majority of servers has already elected server 3, server 4 can only become a follower.

(5) Server 5 starts, and can only become a follower.
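The five steps can be reproduced with a deliberately simplified Python simulation. It assumes every running server immediately adopts the best vote, ranked by last zxid and then by server id. Real fast leader election also compares election epochs and exchanges votes over the network. It further assumes that a server joining after a leader exists simply follows it. simulate_startup is a hypothetical name.

```python
def simulate_startup(cluster_size, start_order, last_zxid=None):
    """Return each started server's state after servers start in order."""
    last_zxid = last_zxid or {}  # server id -> last zxid (0 if no history)
    states, leader, started = {}, None, []
    for sid in start_order:
        started.append(sid)
        if leader is not None:
            # A leader already exists, so a late joiner just follows it.
            states[sid] = "FOLLOWING"
            continue
        # Best candidate: highest last zxid first, then largest server id.
        candidate = max(started, key=lambda s: (last_zxid.get(s, 0), s))
        if len(started) > cluster_size // 2:
            leader = candidate
            for s in started:
                states[s] = "LEADING" if s == leader else "FOLLOWING"
        else:
            for s in started:
                states[s] = "LOOKING"
    return states
```

Running simulate_startup(5, [1, 2, 3, 4, 5]) reproduces the walkthrough: server 3 leads and the others follow. Passing a larger last zxid for server 1 shows that fresher data outranks a larger id.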

3.2 Node type

1) Znodes come in two basic kinds:

  • Ephemeral: when the client's session with the server ends, the node it created is deleted automatically
  • Persistent: after the client disconnects from the server, the node it created is not deleted

2) More precisely, there are four types of znode (persistent by default):

(1) Persistent node (PERSISTENT)

The node still exists after the client disconnects from ZooKeeper.

(2) Persistent sequential node (PERSISTENT_SEQUENTIAL)

The node still exists after the client disconnects from ZooKeeper, and ZooKeeper appends a sequence number to the node name.

(3) Ephemeral node (EPHEMERAL)

The node is deleted after the client's session with ZooKeeper ends.

(4) Ephemeral sequential node (EPHEMERAL_SEQUENTIAL)

The node is deleted after the client's session with ZooKeeper ends, and ZooKeeper appends a sequence number to the node name.

3) If the sequential flag is set when a znode is created, a sequence number is appended to the znode name. The number comes from a monotonically increasing counter maintained by the parent node.

4) In a distributed system, sequence numbers can be used to order events (among the children of the same parent), so that a client can infer the order in which events occurred.
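Because ZooKeeper zero-pads the sequence suffix to 10 digits, the creation order of a parent's sequential children can be recovered by parsing that suffix. Below is a small Python sketch (sequence_of and order_events are illustrative names). Parsing the suffix, rather than sorting whole names, matters when children carry different prefixes.

```python
def sequence_of(name):
    # The last 10 characters of a sequential znode name are its counter.
    return int(name[-10:])


def order_events(children):
    # Sort child names by sequence number to recover creation order.
    return sorted(children, key=sequence_of)
```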

3.3 Stat structure

1) czxid: the zxid of the transaction that created the znode

Every change to ZooKeeper's state is stamped with a zxid, the ZooKeeper transaction ID.

Zxids define a total order over all changes in ZooKeeper: each change has a unique zxid, and if zxid1 is less than zxid2, the change with zxid1 happened before the change with zxid2.
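In ZooKeeper's implementation a zxid is a 64-bit number. The high 32 bits hold the leader epoch, and the low 32 bits hold a counter that increases with each proposal in that epoch. This is why comparing zxids as plain integers gives the ordering described above, and why any change made under a new leader sorts after every change from an earlier epoch. The helper names below are illustrative.

```python
def make_zxid(epoch, counter):
    # High 32 bits: leader epoch; low 32 bits: counter within that epoch.
    return (epoch << 32) | (counter & 0xFFFFFFFF)


def epoch_of(zxid):
    return zxid >> 32


def counter_of(zxid):
    return zxid & 0xFFFFFFFF
```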

2) ctime: when the znode was created, in milliseconds since 1970 (the Unix epoch)

3) mzxid: the zxid of the transaction that last modified the znode

4) mtime: when the znode was last modified, in milliseconds since 1970

5) pZxid: the zxid of the transaction that last modified the znode's children

6) cversion: the number of changes to the znode's children

7) dataVersion: the number of changes to the znode's data

8) aclVersion: the number of changes to the znode's access control list (ACL)

9) ephemeralOwner: if the znode is ephemeral, the session id of its owner; otherwise 0

10) dataLength: the length of the znode's data

11) numChildren: the number of children of the znode
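The following Python sketch shows how these fields change when a znode is created, when its data is written, and when a child is added under it. Stat here is a hypothetical dataclass that reuses the field names listed above. It is not ZooKeeper's own class, and it models only this bookkeeping, not persistence or ACLs.

```python
from dataclasses import dataclass


@dataclass
class Stat:
    czxid: int
    ctime: int
    mzxid: int
    mtime: int
    pZxid: int
    cversion: int = 0
    dataVersion: int = 0
    aclVersion: int = 0
    ephemeralOwner: int = 0  # owner's session id for ephemeral nodes, else 0
    dataLength: int = 0
    numChildren: int = 0


def new_stat(zxid, now_ms, data, owner=0):
    # On creation, czxid, mzxid and pZxid all equal the creating zxid.
    return Stat(czxid=zxid, ctime=now_ms, mzxid=zxid, mtime=now_ms,
                pZxid=zxid, ephemeralOwner=owner, dataLength=len(data))


def on_set_data(stat, zxid, now_ms, data):
    # A data write moves mzxid/mtime and bumps dataVersion.
    stat.mzxid, stat.mtime = zxid, now_ms
    stat.dataVersion += 1
    stat.dataLength = len(data)


def on_child_created(parent_stat, zxid):
    # Adding a child bumps the parent's cversion and records the zxid in pZxid.
    parent_stat.cversion += 1
    parent_stat.pZxid = zxid
    parent_stat.numChildren += 1
```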

3.4 Monitoring principle

1. How watches work:

  1. First, there is a main() thread.
  2. The main thread creates a ZooKeeper client, which creates two threads: one responsible for network communication (connect) and one responsible for listening (listener).
  3. Registered watch events are sent to ZooKeeper through the connect thread.
  4. ZooKeeper adds each registered watch to its list of registered listeners.
  5. When ZooKeeper detects a change to the watched data or path, it sends a notification to the listener thread.
  6. The listener thread calls the process() method.

2. Common watches:

  1. Watch changes to a node's data: get path [watch]
  2. Watch changes to a node's children: ls path [watch]

(In ZooKeeper 3.5 and later, the CLI forms are get -w path and ls -w path.)
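The notification path (steps 5 and 6) can be modeled with two Python threads and two queues. MiniClient is hypothetical and is not the ZooKeeper client. In the Java client, the corresponding threads are the SendThread and EventThread inside ClientCnxn. Registration (steps 3 and 4) is omitted here, and a server notification is simulated by calling server_notifies().

```python
import queue
import threading


class Watcher:
    def __init__(self):
        self.events = []

    def process(self, event):
        # Runs on the listener thread, not on the main thread.
        self.events.append((threading.current_thread().name, event))


class MiniClient:
    _STOP = object()

    def __init__(self, watcher):
        self.watcher = watcher
        self.inbox = queue.Queue()    # notifications arriving "from the server"
        self.pending = queue.Queue()  # events handed to the listener thread
        self.connect = threading.Thread(target=self._connect_loop, name="connect")
        self.listener = threading.Thread(target=self._listener_loop, name="listener")
        self.connect.start()
        self.listener.start()

    def _connect_loop(self):
        # The connect thread receives network messages and forwards them.
        while True:
            msg = self.inbox.get()
            self.pending.put(msg)
            if msg is self._STOP:
                return

    def _listener_loop(self):
        # The listener thread dispatches each event to process(), in order.
        while True:
            event = self.pending.get()
            if event is self._STOP:
                return
            self.watcher.process(event)

    def server_notifies(self, event):
        self.inbox.put(event)

    def close(self):
        self.inbox.put(self._STOP)
        self.connect.join()
        self.listener.join()
```

Splitting network I/O and callback dispatch onto separate threads means a slow process() implementation does not stall the connection, while events are still delivered in the order they arrived.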

3.5 Write data flow

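As a rough, simplified sketch of ZooKeeper's write path (based on the ZAB atomic broadcast protocol, not the actual implementation), a write proceeds in four steps:

1. A client may send a write to any server, and a follower forwards it to the leader.
2. The leader assigns a zxid and proposes the change to all servers.
3. Once a majority of the ensemble has acknowledged, the leader commits.
4. The server connected to the client replies.

The Python model below collapses the network into direct calls. Server and replicate_write are hypothetical names.

```python
class Server:
    def __init__(self, sid):
        self.sid = sid
        self.alive = True
        self.log = []   # proposals this server has logged and acknowledged
        self.data = {}  # committed state


def replicate_write(servers, leader_id, contact_id, key, value, zxid):
    """Return True if the write commits and the contacted server can answer."""
    leader, contact = servers[leader_id], servers[contact_id]
    if not (leader.alive and contact.alive):
        return False
    # Step 1: a follower contact forwards the request to the leader.
    # Step 2: the leader proposes (zxid, key, value); live servers log and ACK.
    ackers = [s for s in servers.values() if s.alive]
    for s in ackers:
        s.log.append((zxid, key, value))
    # Step 3: commit only once a majority of the whole ensemble has ACKed.
    if len(ackers) <= len(servers) // 2:
        return False
    for s in ackers:
        s.data[key] = value
    # Step 4: the contacted server replies to the client.
    return contact.data.get(key) == value
```

In a three-server ensemble, a write still commits with one follower down (2 of 3 acknowledge), but not with two servers down. This is the same majority rule described in the election mechanism.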

4 Related information

Origin blog.csdn.net/qq_15769939/article/details/115224121