What is Zookeeper and how to use it

1. Zookeeper overview

Zookeeper is an open-source distributed coordination service framework, mainly used to solve consistency and data-management problems for application systems running in a distributed cluster.


2. Features of Zookeeper

  • Zookeeper is essentially a distributed file system, suitable for storing small files, and can also be understood as a database


  • What Zookeeper stores is actually one Znode after another, and a Znode is a node in Zookeeper's tree

  • Each Znode has a path, such as /data/host1 or /data/host2; this path can also be understood as the Znode's name

  • A Znode can also carry data; for example, a Znode with the path /data/host1 may hold the string "192.168.0.1" as its value

  • Because of these characteristics of Znodes, Zookeeper can present a view similar to a file system and can be operated much like one:

    • Get Znode using path
    • Get data carried by Znode
    • Modify the data carried by Znode
    • Delete Znodes
    • Add Znodes

3. Zookeeper application scenarios

3.1 Data publish/subscribe

A data publish/subscribe system requires publishers to publish data to Zookeeper nodes for subscribers to consume, so that data can be obtained dynamically and configuration information can be managed centrally and updated dynamically.

There are generally two design modes for publish/subscribe: push mode and pull mode. In push mode the server actively sends data updates to all subscribed clients; in pull mode the client actively requests the latest data.

Zookeeper adopts a combination of push and pull. A client registers with the server the nodes it needs to watch; once a node's data changes, the server pushes a Watcher event notification to the corresponding clients, and after receiving the notification the client actively fetches the latest data from the server, as in the minimal sketch below.
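
A minimal sketch of this push-then-pull pattern using Curator's NodeCache recipe (the same client library introduced in section 8); the connection string and the configuration path /config/app1 are placeholder assumptions:

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.cache.NodeCache;
import org.apache.curator.framework.recipes.cache.NodeCacheListener;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class ConfigSubscriber {
    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "node01:2181,node02:2181,node03:2181",
                new ExponentialBackoffRetry(1000, 3));
        client.start();

        // NodeCache watches a single Znode; Zookeeper pushes the change event,
        // then the cache pulls the latest data from the server
        final NodeCache cache = new NodeCache(client, "/config/app1");
        cache.getListenable().addListener(new NodeCacheListener() {
            @Override
            public void nodeChanged() throws Exception {
                if (cache.getCurrentData() != null) {
                    System.out.println("latest config: "
                            + new String(cache.getCurrentData().getData()));
                }
            }
        });
        cache.start(true);   // true = build the initial cache before returning
    }
}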

3.2 Naming service

Naming service is a common scenario in distributed systems. In a distributed system, the named entity can be a machine in the cluster, a service address, or a remote object. Through the naming service, a client can locate the corresponding resource by name, and upper-layer applications only need a globally unique name. Zookeeper can implement a distributed, globally unique ID allocation mechanism.


A sequential node can be created by calling Zookeeper's node-creation API, and the full node name is returned in the API's return value. Using this feature, a globally unique ID can be generated. The steps are as follows (a minimal Curator sketch follows the list):

  1. According to the task type, the client calls the creation interface to create a sequential node under the node for that task type, with a prefix such as "job-".
  2. After the creation is complete, a complete node name will be returned, such as "job-00000001".
  3. After the client concatenates the task type with the returned name, the result can be used as a globally unique ID, such as "type2-job-00000001".
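
A hedged sketch of these steps with Curator; the connection string and the parent path /type2 are assumptions for illustration:

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;
import org.apache.zookeeper.CreateMode;

public class GlobalIdGenerator {
    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "node01:2181,node02:2181,node03:2181",
                new ExponentialBackoffRetry(1000, 3));
        client.start();

        // Create a sequential node under the (hypothetical) task-type parent /type2;
        // Zookeeper appends a monotonically increasing sequence number to the "job-" prefix
        String fullPath = client.create()
                .creatingParentsIfNeeded()
                .withMode(CreateMode.PERSISTENT_SEQUENTIAL)
                .forPath("/type2/job-");

        // e.g. fullPath = "/type2/job-0000000001" -> ID "type2-job-0000000001"
        String id = "type2-" + fullPath.substring(fullPath.lastIndexOf('/') + 1);
        System.out.println(id);

        client.close();
    }
}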

3.3 Distributed coordination/notification

Zookeeper's unique Watcher registration and asynchronous notification mechanism is a good way to coordinate and notify different machines, or even different systems, in a distributed environment, so that data changes can be handled in real time. The usual practice is for different clients to register a Watcher on the same data node in Zookeeper and monitor changes to that node (including the node itself and its child nodes). When the data node changes, all subscribed clients receive the corresponding Watcher notification and handle it accordingly.

In most distributed systems, communication between machines boils down to **heartbeat detection, work progress reporting, and system scheduling**.

(1) Heartbeat detection: different machines need to detect whether each other is running normally. Zookeeper can implement heartbeat detection between machines based on its ephemeral (temporary) node characteristic: the life cycle of an ephemeral node is tied to a client session, so if the client goes offline, its ephemeral node disappears. Different machines can therefore create ephemeral child nodes under a designated Zookeeper node, and each machine can judge whether the corresponding client machine is alive based on whether its ephemeral child node still exists. Doing this through Zookeeper greatly reduces system coupling. A minimal sketch of this pattern is shown below.
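
A minimal Curator sketch of heartbeat detection via ephemeral nodes; the parent path /cluster/workers, the worker name, and the payload are assumptions for illustration:

import java.util.List;

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;
import org.apache.zookeeper.CreateMode;

public class HeartbeatExample {
    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "node01:2181,node02:2181,node03:2181",
                new ExponentialBackoffRetry(1000, 3));
        client.start();

        // Each worker registers itself as an ephemeral child of the (hypothetical)
        // /cluster/workers node; the node vanishes automatically when the session ends
        client.create()
              .creatingParentsIfNeeded()
              .withMode(CreateMode.EPHEMERAL)
              .forPath("/cluster/workers/worker-1", "192.168.174.100".getBytes());

        // Any other machine can list the children to see which workers are alive
        List<String> alive = client.getChildren().forPath("/cluster/workers");
        System.out.println("alive workers: " + alive);

        // Note: closing the client would end the session and remove the ephemeral node,
        // so a real worker keeps the client open for its whole lifetime.
    }
}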

(2) Work progress reporting: after tasks are distributed to different machines, each machine usually needs to report its task execution progress to the distribution system in real time. A node can be chosen on Zookeeper, and each task client creates an ephemeral child node under it. This not only shows whether a machine is alive, but also lets each machine write its task execution progress into its ephemeral node, so that the central system can obtain task progress in real time.

(3) System scheduling: Zookeeper can support the following scheduling mode. The distributed system consists of two parts, a console and a number of client systems, and the console's responsibility is to send instructions to all clients to control their business logic. When administrators perform operations on the console, they are in fact modifying the data of certain nodes on Zookeeper, and Zookeeper pushes the data changes to the subscribed clients in the form of event notifications.

3.4 Distributed lock

Distributed locks are used to control synchronized access to shared resources between distributed systems. They ensure consistency when different systems access one or a group of resources, and are mainly divided into exclusive locks and shared locks.

Exclusive locks are also called write locks. If transaction T1 adds an exclusive lock to data object O1, then during the entire locking period only T1 is allowed to read and update O1; no other transaction may perform any operation on this data object until T1 releases the exclusive lock.

① Acquire the lock. When an exclusive lock is needed, all clients try to create an ephemeral child node /exclusive_lock/lock under the /exclusive_lock node. Zookeeper guarantees that only one client can create it successfully; the clients that fail register a watch on the /exclusive_lock node.

② Release the lock. When the client holding the lock goes down or completes its business logic normally, the ephemeral node is deleted. All clients watching the /exclusive_lock node then receive a notification and can re-initiate lock acquisition.

Shared locks are also called read locks. If transaction T1 adds a shared lock to data object O1, the current transaction can only read O1, and other transactions can only add shared locks to this data object, until all shared locks on it are released. To acquire a shared lock, each client creates an ephemeral sequential node under /shared_lock. Curator's recipes (a dependency used in section 8) encapsulate both lock types; a minimal sketch follows.
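
A sketch assuming the curator-recipes dependency from section 8 and the same /exclusive_lock and /shared_lock paths used above; InterProcessMutex and InterProcessReadWriteLock implement these patterns internally with ephemeral sequential nodes:

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.locks.InterProcessMutex;
import org.apache.curator.framework.recipes.locks.InterProcessReadWriteLock;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class LockExample {
    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "node01:2181,node02:2181,node03:2181",
                new ExponentialBackoffRetry(1000, 3));
        client.start();

        // Exclusive (write) lock: only one process at a time may hold it
        InterProcessMutex writeLock = new InterProcessMutex(client, "/exclusive_lock");
        writeLock.acquire();
        try {
            // ... access the shared resource exclusively ...
        } finally {
            writeLock.release();
        }

        // Shared (read) / exclusive (write) lock pair on the same path
        InterProcessReadWriteLock rwLock =
                new InterProcessReadWriteLock(client, "/shared_lock");
        rwLock.readLock().acquire();
        try {
            // ... multiple readers may hold the read lock at the same time ...
        } finally {
            rwLock.readLock().release();
        }

        client.close();
    }
}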

3.5 Distributed queue

Sometimes, multiple teams need to complete a task together. For example, team A hands over the results of Hadoop cluster computing to team B to continue the calculation, and B completes its own task before handing it over to team C to continue. This is a bit like the workflow of the business system, which is passed down one by one.

In a distributed environment, we also need a component similar to a single-process queue to share and transfer data across processes, hosts, and networks. This is the distributed queue; one simple way to build it on Zookeeper is sketched below.
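
A minimal sketch of a FIFO queue built directly on sequential Znodes (Curator also ships a DistributedQueue recipe); the /queue path and the payload are placeholders:

import java.util.Collections;
import java.util.List;

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;
import org.apache.zookeeper.CreateMode;

public class SimpleQueueExample {
    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "node01:2181,node02:2181,node03:2181",
                new ExponentialBackoffRetry(1000, 3));
        client.start();

        // Producer: enqueue by creating a persistent sequential node under /queue
        client.create()
              .creatingParentsIfNeeded()
              .withMode(CreateMode.PERSISTENT_SEQUENTIAL)
              .forPath("/queue/item-", "task payload".getBytes());

        // Consumer: dequeue by taking the child with the smallest sequence number
        List<String> items = client.getChildren().forPath("/queue");
        if (!items.isEmpty()) {
            Collections.sort(items);        // equal-width sequence numbers sort lexicographically
            String head = "/queue/" + items.get(0);
            byte[] payload = client.getData().forPath(head);
            client.delete().forPath(head);  // remove the item once it has been taken
            System.out.println(new String(payload));
        }

        client.close();
    }
}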

4. Zookeeper architecture

A Zookeeper cluster is a highly available cluster based on a master-slave architecture.


Each server assumes one of the following three roles

Leader: a Zookeeper cluster has only one working Leader at any time, and it initiates and maintains heartbeats with every Follower and Observer. All write operations must go through the Leader, which then broadcasts the writes to the other servers.

Follower: a Zookeeper cluster may have multiple Followers at the same time, and they respond to the Leader's heartbeats. A Follower can directly process and answer a client's read requests, forwards write requests to the Leader, and votes on proposals while the Leader processes a write request.

Observer: similar to a Follower, but without voting rights.


5. Zookeeper's election mechanism

Leader election is the key to ensuring the consistency of distributed data. When one of the following two situations occurs on a server in the Zookeeper cluster, it needs to enter the Leader election.

5.1. Leader election during server startup

Leader election requires at least two machines; a cluster of three machines is used here as an example. In the cluster initialization phase, when the first server, Server1, starts, it cannot complete the Leader election on its own. When the second server, Server2, starts, the two machines can communicate with each other, each tries to find a Leader, and the Leader election process begins. The process is as follows.

(1) Each server issues a vote. Since this is the initial state, both Server1 and Server2 vote for themselves as the Leader server. Each vote contains the myid and ZXID of the recommended server, written as (myid, ZXID). At this point Server1's vote is (1, 0) and Server2's vote is (2, 0); each then sends its vote to the other machines in the cluster.

(2) Receive votes from the other servers. After a server in the cluster receives a vote, it first checks the vote's validity, for example whether it belongs to the current round and whether it comes from a server in the LOOKING state.

(3) Process the votes. For each received vote, a server must compare ("PK") it against its own vote. The rules are as follows.

  • Compare ZXIDs first: the server with the larger ZXID takes priority as the Leader.
  • If the ZXIDs are equal, compare myids: the server with the larger myid becomes the Leader server.

For Server1, its own vote is (1, 0) and the vote it receives from Server2 is (2, 0). It first compares the ZXIDs, which are both 0, then compares the myids. Server2's myid is larger, so Server1 updates its vote to (2, 0) and sends it again. Server2 does not need to update its vote; it simply resends its previous vote to all machines in the cluster. This comparison rule is sketched in code below.
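
A purely illustrative sketch of the vote "PK" rule; the class and field names are not Zookeeper's internal ones:

public class VotePk {

    static final class Vote {
        final long myid;
        final long zxid;
        Vote(long myid, long zxid) { this.myid = myid; this.zxid = zxid; }
    }

    /** Returns true if the received vote should replace our current vote. */
    static boolean receivedWins(Vote received, Vote mine) {
        if (received.zxid != mine.zxid) {
            return received.zxid > mine.zxid;   // larger ZXID wins
        }
        return received.myid > mine.myid;       // tie on ZXID: larger myid wins
    }

    public static void main(String[] args) {
        Vote server1 = new Vote(1, 0);
        Vote server2 = new Vote(2, 0);
        // Server1 receives (2, 0): ZXIDs are equal, myid 2 > 1, so Server1 updates its vote
        System.out.println(receivedWins(server2, server1));   // true
    }
}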

(4) Count the votes. After each round of voting, a server tallies the votes to determine whether more than half of the machines have accepted the same vote. For Server1 and Server2, two machines in the cluster have accepted the vote (2, 0), so the Leader is considered elected.

(5) Change the server state. Once the Leader is determined, each server updates its state: a Follower changes to FOLLOWING, and the Leader changes to LEADING.

5.2. Leader election during server running

While Zookeeper is running, the Leader and the non-Leader servers each perform their own duties. Even if a non-Leader server goes down or a new server joins, the Leader is not affected. However, once the Leader server fails, the entire cluster suspends external service and enters a new round of Leader election, whose process is basically the same as the election during startup.

6. Zookeeper installation

Cluster planning

| Server IP       | Hostname | myid value |
| --------------- | -------- | ---------- |
| 192.168.174.100 | node01   | 1          |
| 192.168.174.110 | node02   | 2          |
| 192.168.174.120 | node03   | 3          |

Replace the network segment (192.168.174) in the server IPs with your own network segment.

Step 1: Download the Zookeeper archive from the following URL

http://archive.apache.org/dist/zookeeper/

From this URL we download version 3.4.9, the Zookeeper version used here.

After the download is complete, upload it to the /export/software directory on Linux to prepare for installation.

Step 2: Unzip

Unzip the zookeeper compressed package to the /export/servers path, and then prepare for installation (the path can be set by yourself, I use /export/servers here)

cd /export/software                               # go to the directory where the archive was uploaded
tar -zxvf zookeeper-3.4.9.tar.gz -C ../servers/   # extract into the servers directory

Step 3: Modify the configuration file

Modify the configuration file on the first machine

cd /export/servers/zookeeper-3.4.9/conf/
cp zoo_sample.cfg zoo.cfg
mkdir -p /export/servers/zookeeper-3.4.9/zkdatas/
vim zoo.cfg

Add the following at the end of the zoo.cfg file (adjust dataDir to your own path)

# data directory (use your own path)
dataDir=/export/servers/zookeeper-3.4.9/zkdatas
# number of snapshots to retain
autopurge.snapRetainCount=3
# purge interval in hours
autopurge.purgeInterval=1
# servers in the cluster
server.1=node01:2888:3888
server.2=node02:2888:3888
server.3=node03:2888:3888

Step 4: Add myid configuration


On the first machine, create a file named myid under /export/servers/zookeeper-3.4.9/zkdatas/ with the content 1

echo 1 > /export/servers/zookeeper-3.4.9/zkdatas/myid

Step 5: Install the package distribution and modify the value of myid

Distribute the installation package to other machines

Execute the following two commands on the first machine

scp -r /export/servers/zookeeper-3.4.9/ node02:/export/servers/
scp -r /export/servers/zookeeper-3.4.9/ node03:/export/servers/

Modify the value of myid on the second machine to 2

echo 2 > /export/servers/zookeeper-3.4.9/zkdatas/myid

Modify the value of myid on the third machine to 3

echo 3 > /export/servers/zookeeper-3.4.9/zkdatas/myid

Step 6: Three machines start the zookeeper service


This command must be executed on all three machines

/export/servers/zookeeper-3.4.9/bin/zkServer.sh start

View startup status

/export/servers/zookeeper-3.4.9/bin/zkServer.sh status

7. Shell client operation of Zookeeper


Start the client: go to the zookeeper directory and run the following command, where node01 is the hostname (an IP address can also be used)

 bin/zkCli.sh -server node01:2181 

1: Create a normal node

create /app1 hello

2: Create sequence nodes

create -s /app3 world

3: Create a temporary node

create -e /tempnode world

4: Create sequential temporary nodes

create -s -e /tempnode2 aaa

5: Get node data

get /app1

6: Modify node data

set /app1 xxx

7: delete node

delete /app1    the node to delete must have no child nodes
rmr /app1       recursively delete the node and its children

Features of Znodes

  • The core of the file system is the Znode
  • If you want to select a Znode, you need to use the path form, for example /test1/test11
  • A Znode itself is neither a file nor a folder; because a Znode has a path that acts like a name, Zookeeper can logically present a tree-like file system
  • ZK guarantees the atomicity of Znode access: there is no situation where some ZK servers update successfully while others fail
  • The data in a Znode is limited in size; the maximum is 1 MB
  • Znode is composed of three parts
    • stat : status, Znode permission information, version, etc.
    • data : data, each Znode can carry data, no matter whether there are child nodes or not
    • children : list of child nodes

Types of Znodes

Each Znode has two attributes, persistence and sequence, whose combinations form four different types of Znodes (a minimal sketch creating all four follows this list)

  • Persistence

    • Persistent: the Znode is not deleted when the client that created it disconnects
    • Ephemeral (temporary): when the client that created it disconnects, all of its ephemeral Znodes are deleted; ephemeral Znodes are not allowed to have child Znodes
  • Sequence

    • Sequential: a sequence number is appended to the end of the Znode's name; the number is auto-incremented and managed by the parent node
    • Non-sequential: no sequence number is appended
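
A minimal Curator sketch creating one Znode of each of the four types; the paths are placeholders:

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;
import org.apache.zookeeper.CreateMode;

public class ZnodeTypes {
    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "node01:2181,node02:2181,node03:2181",
                new ExponentialBackoffRetry(1000, 3));
        client.start();

        // The four combinations of persistence and sequence
        client.create().withMode(CreateMode.PERSISTENT).forPath("/type-persistent");
        client.create().withMode(CreateMode.PERSISTENT_SEQUENTIAL).forPath("/type-persistent-seq-");
        client.create().withMode(CreateMode.EPHEMERAL).forPath("/type-ephemeral");
        client.create().withMode(CreateMode.EPHEMERAL_SEQUENTIAL).forPath("/type-ephemeral-seq-");

        client.close();   // the two ephemeral nodes disappear when the session closes
    }
}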

Properties of Znodes

  • dataVersion: data version; incremented every time the Znode's data changes
  • cversion: child version; incremented every time the Znode's children change
  • aclVersion: ACL (Access Control List) version; incremented every time the Znode's permission information changes
  • zxid: transaction ID
  • ctime: creation time
  • mtime: time of the last update
  • ephemeralOwner: if the Znode is an ephemeral node, ephemeralOwner is the sessionId associated with the node

Notification mechanism

  • Notifications are similar to triggers in a database: a Watcher is set on a Znode, and when the Znode changes, the WatchManager calls the corresponding Watcher
  • When a Znode is created, deleted, or modified, or its child nodes change, the corresponding Watcher is notified
  • Features of Watcher
    • One-time trigger: a Watcher fires only once; to keep monitoring, the Watcher must be registered again
    • Event encapsulation: the event a Watcher receives is encapsulated and contains three pieces of information: keeperState, eventType, and path

8. Zookeeper's Java API operation

The Java API used here is based on Curator, a Zookeeper client framework that shields developers from many of the low-level details of Zookeeper client development.
Curator contains several packages:

  • curator-framework: an encapsulation of Zookeeper's low-level API
  • curator-recipes: encapsulations of advanced features such as cache event listening, leader election, distributed locks, and distributed counters

Maven dependencies (Curator version 2.12.0 is used, which corresponds to Zookeeper 3.4.x; cross-version incompatibilities are likely to cause node operations to fail):

8.1. Create a Java project and import the dependencies

<dependencies>
        <dependency>
            <groupId>org.apache.curator</groupId>
            <artifactId>curator-framework</artifactId>
            <version>2.12.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.curator</groupId>
            <artifactId>curator-recipes</artifactId>
            <version>2.12.0</version>
        </dependency>
        <dependency>
            <groupId>com.google.collections</groupId>
            <artifactId>google-collections</artifactId>
            <version>1.0</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>RELEASE</version>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-simple</artifactId>
            <version>1.7.25</version>
        </dependency>
    </dependencies>

8.2 Node operation

import org.apache.curator.RetryPolicy;
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.cache.ChildData;
import org.apache.curator.framework.recipes.cache.TreeCache;
import org.apache.curator.framework.recipes.cache.TreeCacheEvent;
import org.apache.curator.framework.recipes.cache.TreeCacheListener;
import org.apache.curator.retry.ExponentialBackoffRetry;
import org.apache.zookeeper.CreateMode;
import org.junit.Test;

/**
 * @Description Curator examples for basic Znode operations
 * @Author wugongzi
 * @Date 2020/7/10 21:12
 */
public class ZKTest {

    /*
     * Watch mechanism on a node
     */
    @Test
    public void watchZnode() throws Exception {
        // 1: define a retry policy
        RetryPolicy retryPolicy = new ExponentialBackoffRetry(3000, 1);

        // 2: create the client
        String connectionStr = "192.168.79.100:2181,192.168.79.110:2181,192.168.79.120:2181";
        CuratorFramework client = CuratorFrameworkFactory.newClient(connectionStr, 8000, 8000, retryPolicy);

        // 3: start the client
        client.start();

        // 4: create a TreeCache object and specify the node path to watch
        TreeCache treeCache = new TreeCache(client, "/hello3");

        // 5: register a custom listener
        treeCache.getListenable().addListener(new TreeCacheListener() {
            @Override
            public void childEvent(CuratorFramework curatorFramework, TreeCacheEvent treeCacheEvent) throws Exception {
                ChildData data = treeCacheEvent.getData();
                if (data != null) {
                    switch (treeCacheEvent.getType()) {
                        case NODE_ADDED:
                            System.out.println("A new node was added!");
                            break;
                        case NODE_REMOVED:
                            System.out.println("A node was removed!");
                            break;
                        case NODE_UPDATED:
                            System.out.println("A node was updated!");
                            break;
                        default:
                            break;
                    }
                }
            }
        });

        // start watching
        treeCache.start();

        Thread.sleep(1000000);
    }

    /*
     * Get node data
     */
    @Test
    public void getZnodeData() throws Exception {
        // 1: define a retry policy
        RetryPolicy retryPolicy = new ExponentialBackoffRetry(1000, 1);
        // 2: create the client
        String connectionStr = "192.168.79.100:2181,192.168.79.110:2181,192.168.79.120:2181";
        CuratorFramework client = CuratorFrameworkFactory.newClient(connectionStr, 8000, 8000, retryPolicy);

        // 3: start the client
        client.start();
        // 4: get the node data
        byte[] bytes = client.getData().forPath("/hello");
        System.out.println(new String(bytes));

        // 5: close the client
        client.close();
    }

    /*
     * Set node data
     */
    @Test
    public void setZnodeData() throws Exception {
        // 1: define a retry policy
        RetryPolicy retryPolicy = new ExponentialBackoffRetry(1000, 1);
        // 2: create the client
        String connectionStr = "192.168.79.100:2181,192.168.79.110:2181,192.168.79.120:2181";
        CuratorFramework client = CuratorFrameworkFactory.newClient(connectionStr, 8000, 8000, retryPolicy);
        // 3: start the client
        client.start();
        // 4: modify the node data
        client.setData().forPath("/hello", "zookeeper".getBytes());
        // 5: close the client
        client.close();
    }

    /*
     * Create an ephemeral (temporary) node
     */
    @Test
    public void createTmpZnode() throws Exception {
        // 1: define a retry policy
        /*
         * param1: interval between retries (ms)
         * param2: maximum number of retries
         */
        RetryPolicy retryPolicy = new ExponentialBackoffRetry(1000, 1);
        // 2: create a client object
        /*
         * param1: list of Zookeeper servers to connect to
         * param2: session timeout (ms)
         * param3: connection timeout (ms)
         * param4: retry policy
         */
        String connectionStr = "192.168.79.100:2181,192.168.79.110:2181,192.168.79.120:2181";
        CuratorFramework client = CuratorFrameworkFactory.newClient(connectionStr, 8000, 8000, retryPolicy);

        // 3: start the client
        client.start();
        // 4: create the node
        client.create().creatingParentsIfNeeded().withMode(CreateMode.EPHEMERAL).forPath("/hello4", "world".getBytes());

        Thread.sleep(5000);
        // 5: close the client
        client.close();
    }

    /*
     * Create a persistent node
     */
    @Test
    public void createZnode() throws Exception {
        // 1: define a retry policy
        /*
         * param1: interval between retries (ms)
         * param2: maximum number of retries
         */
        RetryPolicy retryPolicy = new ExponentialBackoffRetry(1000, 1);
        // 2: create a client object
        /*
         * param1: list of Zookeeper servers to connect to
         * param2: session timeout (ms)
         * param3: connection timeout (ms)
         * param4: retry policy
         */
        String connectionStr = "192.168.79.100:2181,192.168.79.110:2181,192.168.79.120:2181";
        CuratorFramework client = CuratorFrameworkFactory.newClient(connectionStr, 8000, 8000, retryPolicy);

        // 3: start the client
        client.start();
        // 4: create the node
        client.create().creatingParentsIfNeeded().withMode(CreateMode.PERSISTENT).forPath("/hello2", "world".getBytes());
        // 5: close the client
        client.close();
    }

}


Origin blog.csdn.net/wgzblog/article/details/107280141