ZooKeeper (1): Basic introduction

What is ZooKeeper?

ZooKeeper is an open source coordination service for distributed applications. Its design goal is to encapsulate complex, error-prone distributed consistency services into an efficient and reliable set of primitives, and to expose them to users through a small set of simple, easy-to-use interfaces.

ZooKeeper development history

ZooKeeper originated in a research group at Yahoo Research. Researchers there found that many large systems within Yahoo relied on some similar system for distributed coordination, but these systems often had a single point of failure.

Yahoo's developers therefore built a general-purpose distributed coordination framework with no single point of failure, which became ZooKeeper. ZooKeeper has since been widely adopted in the open source community. Here is how three well-known open source projects use it:

  • Hadoop: uses ZooKeeper for NameNode high availability.
  • HBase: ensures that there is only one active master in the cluster, stores the location of the hbase:meta table, and stores the list of RegionServers in the cluster.
  • Kafka: cluster membership management and controller election.


ZooKeeper application scenarios

Many distributed coordination services can be implemented with ZooKeeper. Typical application scenarios are as follows:

  • Configuration management: an ordinary Java application usually keeps its configuration items in a local configuration file. In a microservice system, the independent services need centralized configuration management, and ZooKeeper can serve as the central store.
  • DNS service
  • Group membership management: for example, HBase, mentioned above, uses ZooKeeper to manage the membership of its cluster.
  • Various distributed locks

ZooKeeper is suited to storing the key data that coordination depends on, not to storing large amounts of data. To store key-value data or large volumes of business data, use a database or a NoSQL store instead.

Why is ZooKeeper not suitable for large data storage? There are two main reasons:

  1. Design: ZooKeeper loads all of its data (the data tree) into memory, so the amount of data it can store is bounded by memory. In this respect ZooKeeper is similar to Redis. General-purpose database systems such as MySQL (with the InnoDB storage engine) can store more data than fits in memory, because InnoDB is a B-tree-based storage engine. Both B-tree and LSM-tree storage engines can store data sets larger than memory.
  2. Engineering: ZooKeeper is designed to store data for coordination services. High availability and performance are its most important metrics. Handling large volumes of data is not a primary goal, so ZooKeeper does little engineering optimization for large-scale storage.

Use of the ZooKeeper service

To use the ZooKeeper service, an application first adds the ZooKeeper client library as a dependency. The client library then communicates with the ZooKeeper cluster over the network. This is essentially a client-server architecture: the application acts as the client and calls services on the ZooKeeper server side.


ZooKeeper data model


ZooKeeper's data model is hierarchical, like the model commonly found in file systems. The hierarchical model and the key-value model are the two mainstream data models. ZooKeeper uses the file system model for two main reasons:

  1. The tree structure of the file system is convenient for expressing the hierarchical relationship between data.
  2. The tree structure of the file system facilitates the allocation of separate namespaces for different applications.

ZooKeeper's hierarchical model is called the data tree, and each node of the data tree is called a znode. Unlike in a file system, where directories hold no data of their own, every znode can store data, including znodes that have children. Each znode also has a version number, which starts at 0.

Consider a data tree with two subtrees, one for application 1 (/app1) and the other for application 2 (/app2).

The subtree of application 1 implements a simple group membership protocol: each client process p_i creates a znode p_i under /app1. As long as /app1/p_i exists, process p_i is considered to be running normally.
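The group membership idea can be sketched with a toy in-memory tree. This is only an illustration: the DataTree class and its methods are hypothetical stand-ins written for this example, not the ZooKeeper client API, and the deletion is done by hand here rather than by ZooKeeper.

```python
class DataTree:
    """Toy in-memory stand-in for a data tree (not the real client API)."""

    def __init__(self):
        # Map each absolute path to the bytes stored at that znode.
        self.nodes = {"/": b""}

    def create(self, path, data=b""):
        # A znode can only be created under an existing parent.
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.nodes:
            raise KeyError(f"parent {parent} does not exist")
        if path in self.nodes:
            raise KeyError(f"{path} already exists")
        self.nodes[path] = data

    def delete(self, path):
        del self.nodes[path]

    def get_children(self, path):
        # Return the names of the direct children of `path`, sorted.
        prefix = path.rstrip("/") + "/"
        return sorted(
            p[len(prefix):]
            for p in self.nodes
            if p != "/" and p.startswith(prefix) and "/" not in p[len(prefix):]
        )


tree = DataTree()
tree.create("/app1")  # namespace for application 1
tree.create("/app2")  # namespace for application 2

# Each process p_i registers itself under /app1.
for i in (1, 2, 3):
    tree.create(f"/app1/p_{i}", data=b"host info")

# Process 2 stops, so its znode disappears from the group.
tree.delete("/app1/p_2")

members = tree.get_children("/app1")
print(members)  # ['p_1', 'p_3']
```

Listing the children of /app1 gives the current members of the group, which is all the protocol needs.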

Data tree interface

ZooKeeper provides a simplified file system API for accessing data trees:

  • UNIX-style path names locate znodes, e.g. /A/X is child node X of znode A.
  • Znode data is read and written only as a whole. Partial reads and writes, as in an ordinary file system, are not supported.
  • All data tree APIs are wait-free: a call that is in progress does not prevent other calls from completing.
  • The data tree APIs do not directly provide distributed coordination mechanisms such as locks. They are nevertheless powerful enough to implement a variety of such mechanisms on top of them.
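One way to see how whole-value reads and writes can still support coordination without locks is a version-checked write. The sketch below is a toy in-memory model: Znode, DataTree, BadVersionError, and increment are hypothetical names made up for this example. It assumes a write may carry an expected version and is rejected when the znode has changed in the meantime, with -1 meaning "any version".

```python
class BadVersionError(Exception):
    """Raised when a write names a version that is no longer current."""


class Znode:
    def __init__(self, data=b""):
        self.data = data
        self.version = 0  # versions count from 0


class DataTree:
    """Toy model: flat dict of path -> Znode, no parent checks."""

    def __init__(self):
        self.nodes = {}

    def create(self, path, data=b""):
        self.nodes[path] = Znode(data)

    def get_data(self, path):
        # Always returns the whole value together with its version.
        node = self.nodes[path]
        return node.data, node.version

    def set_data(self, path, data, expected_version=-1):
        # Replaces the whole value; -1 skips the version check.
        node = self.nodes[path]
        if expected_version != -1 and expected_version != node.version:
            raise BadVersionError(path)
        node.data = data
        node.version += 1
        return node.version


def increment(tree, path):
    """Read-modify-write that retries instead of blocking on a lock."""
    while True:
        data, version = tree.get_data(path)
        new_value = str(int(data.decode()) + 1).encode()
        try:
            return tree.set_data(path, new_value, expected_version=version)
        except BadVersionError:
            continue  # someone else wrote first; read again and retry


tree = DataTree()
tree.create("/app1/counter", b"0")
increment(tree, "/app1/counter")
increment(tree, "/app1/counter")
final_data, final_version = tree.get_data("/app1/counter")

# A writer still holding version 0 is rejected rather than overwriting.
stale_rejected = False
try:
    tree.set_data("/app1/counter", b"99", expected_version=0)
except BadVersionError:
    stale_rejected = True
```

No call ever waits for another client: a conflicting write simply fails and the caller decides whether to retry.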

Znode classification

A znode is either persistent or ephemeral, and either kind can also be sequential. ZooKeeper appends a unique, monotonically increasing integer to the name of each sequential znode. This gives four main types of znode:

  • Persistent znode (PERSISTENT): once created, the znode is not lost when the ZooKeeper service or the client goes down. It remains until it is explicitly deleted.
  • Ephemeral znode (EPHEMERAL): the znode is tied to the session of the client that created it. If the session ends, for example because the client fails to reconnect to a server within the session timeout, the znode is removed.
  • Persistent sequential znode (PERSISTENT_SEQUENTIAL): has the properties of a persistent znode, and its name carries a sequence number.
  • Ephemeral sequential znode (EPHEMERAL_SEQUENTIAL): has the properties of an ephemeral znode, and its name carries a sequence number.
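The behaviour of these types can be illustrated with a toy in-memory model. Everything here is hypothetical: the DataTree class, the session ids, and the /lock path are made up for the example, and the per-parent counter is a simplification. The 10-digit zero-padded suffix follows the naming ZooKeeper uses for sequential znodes. The sketch also shows a common way such znodes are used for locking: the client holding the lowest sequence number is treated as the lock holder.

```python
class DataTree:
    """Toy model of znode types (not the real client API)."""

    def __init__(self):
        self.nodes = {}     # path -> owning session id, None if persistent
        self.counters = {}  # parent path -> next sequence number

    def create(self, path, session=None, ephemeral=False, sequential=False):
        if ephemeral and session is None:
            raise ValueError("an ephemeral znode needs an owning session")
        if sequential:
            # Append a monotonically increasing, zero-padded counter.
            parent = path.rsplit("/", 1)[0] or "/"
            seq = self.counters.get(parent, 0)
            self.counters[parent] = seq + 1
            path = f"{path}{seq:010d}"
        self.nodes[path] = session if ephemeral else None
        return path

    def close_session(self, session):
        # Ephemeral znodes vanish with their session; persistent ones stay.
        owned = [p for p, owner in self.nodes.items()
                 if owner is not None and owner == session]
        for path in owned:
            del self.nodes[path]

    def children(self, parent):
        prefix = parent + "/"
        return sorted(
            p[len(prefix):]
            for p in self.nodes
            if p.startswith(prefix) and "/" not in p[len(prefix):]
        )


tree = DataTree()
tree.create("/lock")  # persistent parent znode

# Two clients each create an ephemeral sequential znode under /lock.
a = tree.create("/lock/n-", session="client-a", ephemeral=True, sequential=True)
b = tree.create("/lock/n-", session="client-b", ephemeral=True, sequential=True)

holder_before = tree.children("/lock")[0]  # lowest sequence number
tree.close_session("client-a")             # client-a's session ends
holder_after = tree.children("/lock")[0]   # next client in line
```

Because the znodes are ephemeral, a client that crashes releases its place in the queue once its session ends, and because they are sequential, the remaining clients have a well-defined order.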

Summary

This article introduced the basic concepts, history, and application scenarios of ZooKeeper and described the ZooKeeper data model, laying a foundation for deeper study later.

Origin blog.csdn.net/weixin_44816664/article/details/130829147