ZooKeeper Tutorial: Getting Started

Author's other platforms:

| CSDN:blog.csdn.net/qq_41153943

| Nuggets: juejin.cn/user/651387…

| Zhihu: www.zhihu.com/people/1024…

| GitHub: github.com/JiangXia-10…

| Official account: 1024 notes


Foreword

In a distributed system, the registry (registration center) plays an important role: it is indispensable for service discovery and client-side load balancing. Beyond its basic functions, the registry's stability, availability and robustness have a great impact on the smooth operation of the whole distributed system. Dubbo, a mainstream distributed service framework in China, supports third-party middleware such as ZooKeeper, Nacos and Redis as its registry.

The technology stack for high-concurrency distributed development is already very large. I was preparing to look for a job a while ago and went to some interviews, and from those interviews it is clear that RPC, Dubbo, ZooKeeper, Nacos, distributed systems, microservices and so on have become the most basic skill requirements for job seekers.

A previous article introduced how to use Nacos as a registry: SpringCloud: Building Nacos services and service discovery. Nacos is not the only option, though: ZooKeeper can also serve as a registry, and it can do much more than that.

According to its official documentation, ZooKeeper is a distributed service framework that began as a sub-project of Apache Hadoop (it is now a top-level Apache project). It is mainly used to solve data management problems that distributed applications often encounter, such as unified naming services, state synchronization services, cluster management, and management of distributed application configuration items. Put simply, ZooKeeper = file system + monitoring notification mechanism.

In today's article we will learn ZooKeeper together. I am still learning myself, so if something is wrong, please discuss and correct me!

What is ZooKeeper

As the applications in a system expand and the data volume grows, we often run into situations like these:

How can all servers in a cluster keep their shared configuration information consistent?

If a machine in the cluster goes down, how do the other machines sense this change and take over its tasks?

In a distributed system, how can multiple services be coordinated efficiently to write to the same network file while maintaining consistency?

How can machines be added without restarting the cluster?

To solve these problems, we need a tool, similar to a thread coordination mechanism, that lets the various services work together. ZooKeeper is such a tool.

As mentioned above, the official documentation describes ZooKeeper as a distributed service framework that began as a sub-project of Apache Hadoop. It is mainly used to solve data management problems common in distributed applications, such as unified naming services, state synchronization services, cluster management, and management of distributed application configuration items.

So ZooKeeper can be understood as a high-performance coordination service for distributed applications. Its data is stored in memory and persisted through logs. Its in-memory structure resembles a tree, which gives it high throughput and low latency. ZooKeeper can help us implement a distributed unified configuration center, service registration, distributed locks, and so on.

The servers that make up the ZooKeeper service maintain an image of the state in memory, along with transaction logs and snapshots in persistent storage. The ZooKeeper service is available as long as a majority of the servers are available.

A client connects to a single ZooKeeper server. It maintains a TCP connection over which it sends requests, receives responses, receives watch events, and sends heartbeats. If the TCP connection to the server breaks, the client connects to a different server.

So you can simply think of ZooKeeper = file system + monitoring notification mechanism.

We can also understand it this way: the word zookeeper literally means a zoo administrator (zoo + keeper), whose job is to manage the animals in the zoo and keep them in order. ZooKeeper is an open source project under Apache, and many Apache open source projects use animals as their icons, such as Hadoop (elephant), Hive (bee), Pig (pig) and Tomcat (cat).

So you can remember it like this: the projects under Apache are the zoo, and ZooKeeper is the keeper responsible for managing these animals (open source projects).


The data structure of ZooKeeper

ZooKeeper maintains a hierarchical data structure that is very similar to a standard file system:

[Figure: a ZooKeeper znode tree, including a NameService node]

Each node (directory entry) in the tree shown in the figure above, such as NameService, is called a znode (directory node). Znodes are referenced by paths. Paths must be absolute, so they begin with a slash character, and they are canonical: each path has exactly one representation, with no relative forms or aliases. In ZooKeeper, paths consist of Unicode strings with some restrictions. The path "/zookeeper" is reserved for management information, such as quota information.

A znode has the characteristics of both a file and a directory. Like a file, it maintains data, meta information, an access control list (ACL), timestamps and so on; like a directory, it can serve as part of a path identifier, and child znodes can be freely added and deleted under it.

Each znode is composed of three parts:

stat: This is status information, describing the znode version, permissions and other information

data: the data associated with this znode

children: the child nodes under the znode

Note that child nodes under the same parent cannot share a name, and names must follow the naming rules. There is no concept of a relative path: every path is absolute and starts with "/". The size of the data a znode can store is also limited.
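To make the three parts of a znode more concrete, here is a minimal in-memory sketch in Python. It is only a toy model for illustration and not how ZooKeeper itself is implemented: the class names ToyTree and Znode, the child name Server1, and the choice to keep only a version counter as "stat" are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Znode:
    """Toy znode: data, a little stat information, and named children."""
    data: bytes = b""
    version: int = 0                      # stands in for the "stat" part
    children: dict = field(default_factory=dict)


class ToyTree:
    """Hypothetical in-memory namespace; not the real ZooKeeper data tree."""

    def __init__(self):
        self.root = Znode()

    @staticmethod
    def split(path):
        # There are no relative paths: every path must start with "/".
        if not path.startswith("/"):
            raise ValueError(f"path must be absolute: {path!r}")
        return [part for part in path.split("/") if part]

    def create(self, path, data=b""):
        parts = self.split(path)
        if not parts:
            raise ValueError("the root znode already exists")
        node = self.root
        for name in parts[:-1]:
            node = node.children[name]    # KeyError if a parent is missing
        if parts[-1] in node.children:
            # Child names under the same parent must be unique.
            raise KeyError(f"node already exists: {path}")
        node.children[parts[-1]] = Znode(data)

    def get_children(self, path):
        node = self.root
        for name in self.split(path):
            node = node.children[name]
        return sorted(node.children)
```

Creating "/NameService" and then "/NameService/Server1" works, while creating "/NameService/Server1" a second time or using the relative path "NameService/x" raises an error.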

The node types of ZooKeeper

There are two types of nodes in zookeeper, namely Ephemeral Node and Persistent Node. A node's type is determined when it is created and cannot be changed.

The difference between the two is whether the node depends on a session to survive. A connection between a client and a ZooKeeper server is called a session. When the client starts, it first establishes a long-lived TCP connection with the server. Over this connection, the client keeps the session valid through heartbeats, and it can also send requests to the ZooKeeper server and receive responses.

(1) Ephemeral (temporary) nodes: the life cycle of these nodes depends on the session that created them. An ephemeral node is deleted automatically once the session ends, and it can also be deleted manually. Although each ephemeral znode is bound to a client session, it is still visible to all clients. In addition, ephemeral nodes are not allowed to have child nodes. Ephemeral nodes can be subdivided into ephemeral directory nodes and ephemeral sequentially numbered directory nodes.

Ephemeral directory node (EPHEMERAL): after the client's session with ZooKeeper ends, the node is deleted;

Ephemeral sequentially numbered directory node (EPHEMERAL_SEQUENTIAL): the node is likewise deleted after the client's session ends, and ZooKeeper additionally appends a sequence number to the node name;

(2) Persistent nodes: the life cycle of these nodes does not depend on the session; they are deleted only when a client explicitly performs a delete operation. Persistent nodes can be subdivided into persistent directory nodes and persistent sequentially numbered directory nodes.

Persistent directory node (PERSISTENT): after the client disconnects from ZooKeeper, the node still exists;

Persistent sequentially numbered directory node (PERSISTENT_SEQUENTIAL): after the client disconnects from ZooKeeper, the node still exists, and ZooKeeper additionally appends a sequence number to the node name.

The classification above relies on the concept of sequential nodes: when creating a node, the user can ask ZooKeeper to append a monotonically increasing counter to the end of the path. The counter is unique to the node's parent. When a client requests such a node, ZooKeeper assigns it the next number from state tracked on the parent node, and this number only ever increases. Such nodes are called sequential nodes. The zxid, which ZooKeeper uses to order all such changes, is described next.
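The four node types can be imitated with a small Python sketch. This is a toy model under stated assumptions, not ZooKeeper code: the class ToyParent, the "lock-" prefix and the integer session ids are made up for the example, and the counter here advances only on sequential creates, which is a simplification. The one detail taken from ZooKeeper itself is that the sequence number is appended as a 10-digit, zero-padded counter.

```python
class ToyParent:
    """Hypothetical parent znode that hands out sequence numbers."""

    def __init__(self):
        self.next_seq = 0      # counter kept per parent node
        self.children = {}     # child name -> owning session id (None = persistent)

    def create(self, prefix, session_id=None, sequential=False):
        name = prefix
        if sequential:
            # ZooKeeper appends a 10-digit, zero-padded counter to the name.
            name = f"{prefix}{self.next_seq:010d}"
            self.next_seq += 1
        if name in self.children:
            raise KeyError(name)
        # A session id marks the child as ephemeral; None marks it persistent.
        self.children[name] = session_id
        return name

    def close_session(self, session_id):
        if session_id is None:
            return
        # Ephemeral children are removed together with their session.
        self.children = {
            name: owner
            for name, owner in self.children.items()
            if owner != session_id
        }
```

Two sequential ephemeral creates with the prefix "lock-" yield "lock-0000000000" and "lock-0000000001"; closing the first session removes only the first of them, and a persistent child such as "config" is unaffected.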

The zxid mentioned above works as follows: every operation that changes the state of a ZooKeeper node is stamped with a timestamp in zxid format, and these timestamps are globally ordered. In other words, every change produces a unique transaction id, the zxid. If Zxid1 is smaller than Zxid2, the event corresponding to Zxid1 happened before the event corresponding to Zxid2. Each ZooKeeper node records the zxid of key changes to it, including cZxid and mZxid:

  • cZxid: refers to the Zxid format timestamp corresponding to the creation time of the node.

  • mZxid: refers to the Zxid format timestamp corresponding to the modification time of the node.

In the implementation, a zxid is a 64-bit number. The upper 32 bits are the epoch, which identifies whether the leader has changed: every time a new leader is elected, it gets a new epoch. The lower 32 bits are an incrementing counter.
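The 64-bit layout described above can be sketched with plain bit operations. The helper names make_zxid and split_zxid are hypothetical and exist only for this illustration; they simply follow the layout of 32 epoch bits followed by 32 counter bits.

```python
COUNTER_BITS = 32
COUNTER_MASK = (1 << COUNTER_BITS) - 1


def make_zxid(epoch, counter):
    """Pack an epoch (upper 32 bits) and a counter (lower 32 bits)."""
    return (epoch << COUNTER_BITS) | (counter & COUNTER_MASK)


def split_zxid(zxid):
    """Return the (epoch, counter) pair stored in a zxid."""
    return zxid >> COUNTER_BITS, zxid & COUNTER_MASK
```

Because the epoch sits in the high bits, any zxid from a newer epoch compares greater than every zxid from an older epoch, which is what keeps the ordering global across leader changes.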

Features of ZooKeeper

1. Orderliness

ZooKeeper tracks time in several ways. It stamps each update with a number (the zxid described above) that reflects the order of all ZooKeeper transactions; this strict ordering is what allows complex synchronization to be implemented on the client. Besides the zxid, there are also version numbers and the ticks configured in zoo.cfg.

Version numbers: a version number records how many times a node's data, its list of child nodes, or its permission information has been modified. If a node's version is 1, the node has been modified once since it was created.

Each node maintains three version numbers, they are:

  • version: node data version number

  • cversion: child node version number

  • aversion: ACL version number owned by the node

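A write to a node increments the corresponding version number: data changes increase version, changes to the children increase cversion, and ACL changes increase aversion. A client can pass the version it expects along with an update, so that the update is rejected if the node has changed in the meantime. The principle is similar to that of an optimistic lock.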
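The optimistic-lock idea can be sketched as a conditional update in Python. This is a toy model, not the ZooKeeper client API: VersionedNode and BadVersion are hypothetical names. The convention that an expected version of -1 skips the check mirrors ZooKeeper's setData call.

```python
class BadVersion(Exception):
    """Raised when the caller's expected version is stale."""


class VersionedNode:
    """Hypothetical node that tracks only the data version."""

    def __init__(self, data=b""):
        self.data = data
        self.version = 0          # data version, 0 right after creation

    def set_data(self, data, expected_version):
        # -1 means "skip the check", mirroring ZooKeeper's setData convention.
        if expected_version != -1 and expected_version != self.version:
            raise BadVersion(
                f"expected {expected_version}, found {self.version}"
            )
        self.data = data
        self.version += 1
        return self.version
```

If two clients both read version 0 and both try to write with expected version 0, the first write succeeds and bumps the version to 1, and the second is rejected and has to re-read before retrying.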

Ticks: configured in the zoo.cfg file. In a multi-server ZooKeeper deployment, the servers use "ticks" to define the timing of events such as status uploads and session timeouts. The tick time is exposed only indirectly, through the minimum session timeout (by default 2 × tick time). If a server hears nothing from a client for longer than the session timeout, the session expires and the client can no longer use it.
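How the tick time bounds a session timeout can be sketched as a simple clamp. The function name negotiate_session_timeout is hypothetical, and the tick time of 2000 ms is just a commonly used sample value. The lower bound of 2 × tick time comes from the text above; the upper bound of 20 × tick time is ZooKeeper's documented default maximum, which this article does not otherwise cover.

```python
def negotiate_session_timeout(requested_ms, tick_time_ms=2000):
    """Clamp a requested session timeout to the server's allowed range."""
    min_timeout = 2 * tick_time_ms     # default minimum session timeout
    max_timeout = 20 * tick_time_ms    # default maximum session timeout
    return max(min_timeout, min(requested_ms, max_timeout))
```

With a tick time of 2000 ms, a client asking for 1000 ms would get 4000 ms, and a client asking for 100000 ms would get 40000 ms.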

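Real time: ZooKeeper does not use real (clock) time at all, except to put timestamps into the stat structure when a znode is created or modified.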

So ZooKeeper can be understood as a coordinator that keeps interactions in order!

2. High speed

As mentioned earlier, ZooKeeper keeps its data in memory, so it achieves high throughput and low latency. Reads are particularly fast, and the data stored in a single znode is limited to 1 MB. These characteristics make ZooKeeper suitable for large distributed systems.

3. Replicable

ZooKeeper data can be replicated and backed up. A ZooKeeper cluster can be set up quickly, and it ships with the necessary tools and mechanisms: we only need to set some configuration to get a reliable service, so ZooKeeper does not become a single point of failure.
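The majority rule mentioned earlier (the service stays available as long as a majority of servers are available) can be written down as a small calculation. The function names here are hypothetical and only illustrate the arithmetic.

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def is_available(ensemble_size, servers_up):
    """The service is available while a majority of servers are up."""
    return servers_up >= quorum_size(ensemble_size)


def tolerated_failures(ensemble_size):
    """How many servers can fail before the majority is lost."""
    return ensemble_size - quorum_size(ensemble_size)
```

A cluster of 5 servers needs 3 to be up and tolerates 2 failures. A cluster of 6 needs 4 and still tolerates only 2, which is one reason clusters are usually built with an odd number of servers.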

Watcher mechanism

ZooKeeper allows users to register watchers on designated nodes. When a watched data node changes, the ZooKeeper server sends a notification of the change to the interested clients. This is a core feature of ZooKeeper, and many of its functions are built on it.

Suppose two clients have registered watchers (event listeners) with the ZooKeeper cluster. When the node data changes, ZooKeeper sends a notification of the change to those clients, and each client runs its predefined handling when the notification arrives.

A watch is a one-time trigger: ZooKeeper sends only one notification to the client. If the same watcher is registered through several calls (for example exists and getData) on one node and the node is then deleted, the event applies to both registrations, but the watcher is invoked only once. After a watch fires, it is removed immediately; to keep monitoring changes, the client has to set the watch again. Because there is a delay between receiving the notification and re-registering, a client cannot reliably observe every single change that happens to a node. The client also sees the result of a change only after it has received the watch notification.


Four kinds of events can trigger a watch: create, delete, change, and child (a change to the node's children).
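The one-time nature of watches can be imitated with a small registry in Python. This is a toy sketch, not ZooKeeper's server code: ToyWatchRegistry, the callback signature and the event names passed to fire are all assumptions made for this example.

```python
from collections import defaultdict


class ToyWatchRegistry:
    """Hypothetical one-shot watch table; not the real ZooKeeper server code."""

    def __init__(self):
        self.watches = defaultdict(list)   # path -> list of callbacks

    def watch(self, path, callback):
        # Registering the same watcher twice on a path keeps a single entry,
        # so one event invokes it only once.
        if callback not in self.watches[path]:
            self.watches[path].append(callback)

    def fire(self, path, event_type):
        # Remove the watches before calling them: each fires at most once.
        callbacks = self.watches.pop(path, [])
        for callback in callbacks:
            callback(event_type, path)
        return len(callbacks)
```

After fire has run for a path, a second change on the same path notifies nobody until the client calls watch again, which matches the need to re-register described above.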

So the characteristics of zookeeper can be summarized as follows:

1. Atomicity: an update either succeeds or fails; there are no partial results;

2. Reliability: once an update has been applied, it is not lost; it persists until a client overwrites it;

3. Timeliness: the data a client reads is guaranteed to be up to date within a certain time bound;

4. Orderliness: updates from a client take effect in the order in which they were sent;

5. Consistency: also known as a single system image; no matter which server a client connects to, it sees the same content.

Summary

The above is a brief introduction to ZooKeeper, written as part of my own learning process. It summarizes some concepts and related knowledge points about ZooKeeper. If something is wrong, please point it out so we can discuss it!

Follow-up articles will introduce more ZooKeeper-related content!


Origin blog.csdn.net/qq_41153943/article/details/125584130