Hadoop Big Data Technology (5) - Zookeeper

Table of contents

 

1. Understanding Zookeeper

1. Concept

2. Characteristics

3. Cluster role

2. Data model

1. Data storage structure

2. Types of Znodes

3. Znode properties

 

3. Watch mechanism of Zookeeper

1. Understanding of the Watch mechanism

2. Notification states and event types of the Watch mechanism

4. Zookeeper's election mechanism

1. Understanding of the election mechanism

2. Types of election mechanism

5. Zookeeper distributed cluster deployment

1. Download and install the Zookeeper installation package

(1) Download the Zookeeper installation package

(2) Upload the Zookeeper installation package

 (3) Unzip the Zookeeper installation package

2. Zookeeper related configuration

(1) Modify the configuration file of Zookeeper

(2) Create directory and myid file

(3) Configure environment variables

(4) Distribute zookeeper files to other servers

(5) Modify the content of the myid file on hadoop02.bgd01, hadoop03.bgd01

(6) Make environment variables effective on 3 hosts

3. The startup and shutdown of the Zookeeper service

(1) Start the Zookeeper service

(2) View the role of Zookeeper

 (3) Close the Zookeeper service

6. Shell operation of Zookeeper

1. Operate Zookeeper through Shell commands

(1) Display all operation instructions

(2) View the content contained in the current Zookeeper

(3) View current node data

(4) Create a node

(5) Get the node

(6) Modify the node

(7) Listening node

(8) Delete node

7. Zookeeper's Java API operation

(1) Configuration related dependencies

(2) Operate Zookeeper


        The Zookeeper installation package has been uploaded to a Baidu network disk; fetch it yourself if needed.

Link: https://pan.baidu.com/s/1cxNjkDg8Jp1HNKQGhC3_-w?pwd=v443 Extraction code: v443

1. Understanding Zookeeper

1. Concept

        Zookeeper is a distributed coordination framework mainly used to solve consistency problems for applications in a distributed cluster, such as avoiding the dirty reads caused by concurrent operations on the same data. It is essentially a small distributed file storage system: it stores data in a directory tree similar to a file system, and it can effectively manage the nodes in that tree to maintain and monitor state changes of the stored data. By monitoring these state changes, data-based cluster management can be achieved, supporting functions such as unified naming service, distributed configuration management, distributed message queues, distributed locks, and distributed coordination.

2. Characteristics

(1) Global data uniqueness

        Each server keeps an identical copy of the data. No matter which node in the cluster a client connects to, the directory tree it sees is the same (that is, the data is consistent).

(2) Reliability

        If a message (a create, delete, update, or query operation on the directory tree) is accepted by one server, it will eventually be accepted by all servers.

(3) Sequence

        Zookeeper's ordering guarantee covers global order and partial order. Global order means that if message A is published before message B on one server, then A is published before B on every server. Partial order means that if message B is published by the same sender after message A, then A is ordered before B. In both cases, the purpose is to keep Zookeeper's global data consistent.

(4) Data update atomicity

        A data update operation either succeeds (more than half of the nodes succeed), or fails, and there is no intermediate state.

(5) Real-time

        Zookeeper guarantees that within a bounded time interval the client will either receive the server's updated information or learn that the server has failed.

3. Cluster role

(1)Leader

        The Leader is the core of a Zookeeper cluster and the sole scheduler and processor of write operations. It guarantees the order of cluster transaction processing and is responsible for initiating and resolving votes and updating the system state.

(2)Follower

        A Follower processes clients' read requests. If it receives a transactional (write) request from a client, it forwards the request to the Leader. It also participates in voting during Leader elections.

(3)Observer

        An Observer watches the latest state changes of the Zookeeper cluster and synchronizes those states. It processes non-transactional requests independently and forwards transactional requests to the Leader. It does not participate in any form of voting and provides only non-transactional services. Observers are typically used to increase the cluster's read capacity without affecting transaction processing, while also keeping the cost and complexity of elections down.

2. Data model

1. Data storage structure

        The data storage structure in Zookeeper is very similar to a standard file system: it has a hierarchical namespace separated by slashes ("/") and adopts a tree-like hierarchy. But while a standard file system is a tree of folders and files, Zookeeper's tree is composed of nodes. Each node in the tree is called a Znode, each node can have child nodes, each Znode can store up to 1MB of data by default, and each Znode is uniquely identified by its path.
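To make the tree model concrete, here is a minimal sketch that stores Znodes as full paths in a map — hypothetical helper names, not ZooKeeper's actual data structure — just to illustrate the "/"-separated hierarchical namespace:

```java
import java.util.*;

public class ZnodeTree {
    // Sketch: each Znode is keyed by its full path, and both interior
    // nodes and leaves can hold data, as described above.
    static Map<String, byte[]> tree = new TreeMap<>();

    static void create(String path, String data) {
        tree.put(path, data.getBytes());
    }

    // List the direct children of a node by scanning for paths that extend
    // the parent by exactly one more path segment.
    static List<String> children(String parent) {
        String prefix = parent.equals("/") ? "/" : parent + "/";
        List<String> out = new ArrayList<>();
        for (String p : tree.keySet())
            if (!p.equals(parent) && p.startsWith(prefix)
                    && p.indexOf('/', prefix.length()) < 0)
                out.add(p);
        return out;
    }

    public static void main(String[] args) {
        create("/app", "cfg");
        create("/app/node1", "a");
        create("/app/node2", "b");
        System.out.println(children("/app")); // [/app/node1, /app/node2]
    }
}
```

The point of the sketch is only that a Znode's path uniquely identifies it and implies its position in the tree.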

2. Types of Znodes

        The type of a node is specified when it is created and cannot be changed afterwards. There are two basic types of Znodes: temporary (ephemeral) nodes and permanent (persistent) nodes.

(1) Temporary node

        Its life cycle depends on the session that created it: once the session ends, the temporary node is deleted automatically (it can of course also be deleted manually). Although each ephemeral Znode is bound to a single client session, it is still visible to all clients. Note also that temporary nodes are not allowed to have child nodes.

(2) Permanent node

        Its life cycle does not depend on the session; a permanent node is deleted only when a client explicitly performs a delete operation.

       

        Thanks to the sequential feature of Znodes, when creating a node the user can request that a monotonically increasing sequence number be appended to the end of the Znode's path. The sequence number is unique under the node's parent, so each child node records the order in which it was created. Its format is "%010d" (10 digits, padded with leading zeros, such as 0000000001). The counter is a signed 32-bit integer, so it overflows once the count exceeds 2147483647 (2^31 - 1). Combining the two node types with the sequential feature yields four kinds of nodes, as follows.

PERSISTENT: permanent node;

EPHEMERAL: temporary node;

PERSISTENT_SEQUENTIAL: sequential permanent node;

EPHEMERAL_SEQUENTIAL: sequential temporary node.
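The "%010d" suffix format can be checked with a one-line sketch (seqName is a hypothetical helper for illustration, not a ZooKeeper API):

```java
public class SequentialName {
    // Append a 10-digit, zero-padded sequence number to a Znode path,
    // mimicking the suffix ZooKeeper adds to sequential nodes.
    static String seqName(String path, int counter) {
        return path + String.format("%010d", counter);
    }

    public static void main(String[] args) {
        System.out.println(seqName("/testnode", 1)); // /testnode0000000001
    }
}
```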

3. Znode properties

Attribute — Description
czxid — the zxid of the transaction that created this node
ctime — the time the node was created
mzxid — the zxid of the transaction that last modified this node
mtime — the time the node was last modified
pZxid — the zxid of the transaction that last modified this node's children
cversion — the version number of changes to this node's children
dataVersion — the version number of this node's data
aclVersion — the version number of this node's ACL
ephemeralOwner — if the node is ephemeral, the session ID of its owner; otherwise 0
dataLength — the length of the node's data field
numChildren — the number of the node's children


3. Watch mechanism of Zookeeper

1. Understanding of the Watch mechanism

        ZooKeeper provides a distributed data publish/subscribe facility. A typical publish/subscribe system defines a one-to-many subscription relationship, allowing multiple subscribers to monitor the same topic object; when the topic object changes, all subscribers are notified so they can act accordingly. ZooKeeper introduces the Watch mechanism to implement this distributed notification: a client registers a Watch with the server, and when certain server-side events trigger the Watch, the server sends an event notification to the specified client. The Watch mechanism has the following characteristics.

(1) One-time trigger

        When the watched object changes, the Watch registered on it fires. The watch is one-time: if the same event occurs again later, it will not trigger again unless the client re-registers the Watch.

(2) Event encapsulation

        Zookeeper uses WatchedEvent objects to encapsulate server-side events and deliver them. The object contains the three basic properties of each event: the notification state (KeeperState), the event type (EventType), and the node path (path).

(3) Send asynchronously

        Watch notification events are sent asynchronously from the server to the client.

(4) Register first and then trigger

        For Zookeeper's Watch mechanism, the client must first register a Watch with the server; only then can an event trigger the Watch and notify the client.

2. Notification states and event types of the Watch mechanism

Notification state (KeeperState):
Disconnected — the client is disconnected from the server
SyncConnected — the client is connected to the server
AuthFailed — authentication failed
Expired — the session has expired

Event type (EventType):
NodeCreated — the node is created
NodeDataChanged — the node's data changes
NodeChildrenChanged — the node's children change
NodeDeleted — the node is deleted


4. Zookeeper's election mechanism

1. Understanding of the election mechanism

        To keep its nodes working in coordination, Zookeeper needs a Leader. By default Zookeeper uses the FastLeaderElection algorithm, in which a server wins the election once it receives more than half of the votes. The election involves the following concepts.

(1) Server ID

        This is set in the myid file when configuring the cluster; the values identify server 1, server 2, server 3, and so on. In the FastLeaderElection algorithm, the larger the number, the greater its weight.

(2) Election status

        During the election process a Zookeeper server is in one of four states: looking (LOOKING, electing a leader), following (FOLLOWING, synchronized with the Leader and participating in voting), observing (OBSERVING, synchronized with the Leader but not voting), and leading (LEADING).

(3) Data ID

        This is the latest data version number stored on the server. The larger the value, the newer the data; during an election, newer data carries greater weight.

(4) Logic clock

        The logical clock is, loosely speaking, the election round number. All ballots in the same round carry the same logical clock value; it starts at 0 and increases with each round of voting. A server compares this value with the values in the ballots returned by other servers and acts according to the difference. If a machine is down, it does not participate in the round, so its logical clock falls behind the others'.

2. Types of election mechanism

        There are two types of Zookeeper election mechanisms, namely new cluster elections and non-new cluster elections.

(1) New cluster election

        In a brand-new cluster there is no data ID or logical clock to influence the election. Assume there are 5 servers, numbered 1 to 5, and the Zookeeper service is started on them in order of their numbers. The new-cluster election proceeds as follows.

Step 1: Server 1 starts. It first votes for itself and then sends out its ballot. Because no other machine is up yet, it receives no responses, so server 1 remains in the LOOKING state.

Step 2: Server 2 starts. It first votes for itself, then exchanges ballots with server 1, the only other machine running the Zookeeper service. Since server 2 has the larger ID, it wins the comparison, and server 1 switches its vote to server 2. However, server 2's vote count is not more than half of the cluster (2 votes out of 5), so both servers remain in the LOOKING state.

Step 3: Server 3 starts. It first votes for itself, then exchanges ballots with the previously started servers 1 and 2. Because server 3 has the largest ID, it wins, and servers 1 and 2 switch their votes to server 3. Its vote count now exceeds half of the cluster (3 > 5/2), so server 3 becomes the Leader and servers 1 and 2 become Followers.

Step 4: Server 4 starts, first votes for itself, then exchanges ballots with the previously started servers 1, 2, and 3. Although server 4 has a larger ID, server 3 has already won, so server 4 can only become a Follower.

Step 5: Server 5 starts and, like server 4, becomes a Follower.
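The half-plus-one rule driving the walkthrough above can be sketched as a single predicate (a simplification with hypothetical names, not ZooKeeper's real quorum code):

```java
public class QuorumSketch {
    // A candidate becomes Leader once its vote count exceeds half the cluster size.
    static boolean hasQuorum(int votes, int clusterSize) {
        return votes > clusterSize / 2;
    }

    public static void main(String[] args) {
        System.out.println(hasQuorum(2, 5)); // false: servers 1 and 2 stay LOOKING
        System.out.println(hasQuorum(3, 5)); // true: server 3 becomes Leader
    }
}
```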

(2) Non-new cluster election

        For a normally running Zookeeper cluster, once a server fails and a re-election is required, the server ID, data ID, and logical clock must all be considered during the election, because the cluster has been running for some time and the servers hold live data. The non-new-cluster election proceeds as follows.

Step 1: First, compare logical clocks. A server with a smaller logical clock may have been down for part of the run and therefore holds incomplete data, so its ballot is discarded and the vote is recast.

Step 2: Once the logical clocks agree, compare data IDs. The data ID reflects how new the data is, so the larger data ID wins.

Step 3: If both the logical clock and the data ID are the same, compare server IDs (the myid numbers); the larger value wins.

        Simply put, a non-new-cluster election picks the best of the best, ensuring the Leader is the server with the most complete and reliable data in the Zookeeper cluster.
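The three comparison steps can be condensed into one function — a sketch of the ordering described above, not ZooKeeper's actual FastLeaderElection code:

```java
public class VoteCompare {
    // Returns true if the candidate ballot (epoch, zxid, sid) beats the current one.
    static boolean wins(long epoch, long zxid, long sid,
                        long curEpoch, long curZxid, long curSid) {
        if (epoch != curEpoch) return epoch > curEpoch; // step 1: larger logical clock wins
        if (zxid != curZxid) return zxid > curZxid;     // step 2: larger data ID (newer data) wins
        return sid > curSid;                            // step 3: larger server ID wins
    }

    public static void main(String[] args) {
        System.out.println(wins(2, 50, 1, 1, 99, 3)); // true: a newer round beats older data
    }
}
```

Note that the logical clock dominates: a ballot from a newer election round wins even if the other server's data ID is larger.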

5. Zookeeper distributed cluster deployment

1. Download and install the Zookeeper installation package

(1) Download the Zookeeper installation package

        Download address: http://172.16.1.89/tools/HadoopInstall/zookeeper/apache-zookeeper-3.7.1-bin.tar.gz

Alternatively, download the relevant version from the Alibaba Cloud mirror: http://mirrors.aliyun.com/apache/zookeeper/

(2) Upload the Zookeeper installation package

Make sure the rz file-upload tool is installed on the virtual machine; if not, run the following command.
yum install lrzsz -y

Upload the installation package to the target directory; here the author uploads it to /export/software.
cd /export/software

rz

(3) Unzip the Zookeeper installation package

Unzip the package to the target directory; here the author unzips it to /export/servers.
tar -zxvf apache-zookeeper-3.7.1-bin.tar.gz -C /export/servers/

Rename the directory for easier management later.
mv /export/servers/apache-zookeeper-3.7.1-bin /export/servers/zookeeper

2. Zookeeper related configuration

(1) Modify the configuration file of Zookeeper

Enter the Zookeeper configuration directory:
cd /export/servers/zookeeper/conf

Copy the sample configuration file:
cp zoo_sample.cfg zoo.cfg

Edit the file zoo.cfg:
vi zoo.cfg

Change the line "dataDir=/tmp/zookeeper" to:
dataDir=/export/data/zookeeper/zkdata
dataLogDir=/export/data/zookeeper/zklog

Add the following lines at the end of the file:

# Configure the server IDs of the Zookeeper cluster with their host names, communication (heartbeat) port, and election port
server.1=hadoop01.bgd01:2888:3888
server.2=hadoop02.bgd01:2888:3888
server.3=hadoop03.bgd01:2888:3888

Note: use your own virtual machines' host names here.


(2) Create directory and myid file

Create the directories /export/data/zookeeper/zkdata and /export/data/zookeeper/zklog:
mkdir -p /export/data/zookeeper/zkdata
mkdir -p /export/data/zookeeper/zklog

Enter the /export/data/zookeeper/zkdata directory:
cd /export/data/zookeeper/zkdata

Create the myid file:
echo 1 > myid

(3) Configure environment variables

Configure the Zookeeper environment variables in /etc/profile:
export ZK_HOME=/export/servers/zookeeper
export PATH=$PATH:$ZK_HOME/bin


(4) Distribute zookeeper files to other servers

scp -r /export/servers/zookeeper hadoop02.bgd01:/export/servers/
scp -r /export/servers/zookeeper hadoop03.bgd01:/export/servers/

scp -r /export/data/zookeeper  hadoop02.bgd01:/export/data/
scp -r /export/data/zookeeper  hadoop03.bgd01:/export/data/

scp -r /etc/profile hadoop02.bgd01:/etc/profile 
scp -r /etc/profile hadoop03.bgd01:/etc/profile 

(5) Modify the content of the myid file on hadoop02.bgd01, hadoop03.bgd01

Set the content of /export/data/zookeeper/zkdata/myid on hadoop02.bgd01 to 2:
vi /export/data/zookeeper/zkdata/myid

Set the content of /export/data/zookeeper/zkdata/myid on hadoop03.bgd01 to 3:
vi /export/data/zookeeper/zkdata/myid

(6) Make environment variables effective on 3 hosts

source /etc/profile

3. The startup and shutdown of the Zookeeper service

(1) Start the Zookeeper service

Run the following command on each of the three virtual machines.
zkServer.sh start


(2) View the role of Zookeeper

Run the following command on each of the three machines.
zkServer.sh status


(3) Close the Zookeeper service

Run the following command on each of the three virtual machines.
zkServer.sh stop


6. Shell operation of Zookeeper

1. Operate Zookeeper through Shell commands

        Start and connect the Zookeeper service.

zkServer.sh start

zkCli.sh -server localhost:2181


(1) Display all operation instructions

Typing help at the client prompt prints all available shell commands.
help

(2) View the content contained in the current Zookeeper

ls /

        Note: there is a built-in /zookeeper child node under the root directory, which stores Zookeeper's configuration management information; do not delete it casually.

(3) View current node data

ls2 /

(Note: ls2 is deprecated in newer ZooKeeper versions; ls -s / produces the same output.)

(4) Create a node

create [-s] [-e] path data acl

Here -s enables the node's sequential feature and -e creates a temporary node; if neither is given, a permanent node is created. path is the node path and data is the data stored in the node (a Znode can act like a directory and also hold data like a file); acl is used for permission control.

Create a sequential permanent node:
create -s /testnode test

Create a temporary node:
create -e /testnode-temp testtemp

Create a permanent node:
create /testnode-p testp


(5) Get the node

ls Path [watch]
ls2 Path [watch]
get Path [watch]

The get command retrieves the data content and attribute information of the specified node.


(6) Modify the node

set path data [version]

Here data is the new content and version is the version number.

For example:
set /testnode-temp 123


(7) Listening node

Listening on a node means watching for changes to it, which breaks down into three steps: the client registers a Watch with the server, a server-side event triggers the Watch, and the client's callback receives the triggered event.
First, the client registers the Watch. On the hadoop01 client, enter the following command:
get /testnode-temp watch

Next, a server-side event triggers the Watch. On the hadoop02 client, enter the following command:

set /testnode-temp testwatch

Finally, the client's Watch callback reports the triggered event.

(8) Delete node

delete path [version]
rmr path [version]

When deleting with the delete command, a node that still has child nodes cannot be deleted; the children must be deleted first, and only then the parent. The rmr command deletes a node recursively, whether or not it has children (in newer ZooKeeper versions this command is deleteall).

7. Zookeeper's Java API operation

(1) Configuration related dependencies

       <dependency>
            <groupId>org.apache.zookeeper</groupId>
            <artifactId>zookeeper</artifactId>
            <version>3.7.1</version>
        </dependency>

(2) Operate Zookeeper

package cn.itcast.zookeeper;

import org.apache.zookeeper.*;

public class Zookeepertext {
    public static void main(String[] args) throws Exception {
        // Initialize a ZooKeeper instance (server address, session timeout, watcher)
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, new Watcher() {
            public void process(WatchedEvent event) {
                System.out.println("Event type: " + event.getType());
                System.out.println("Event path: " + event.getPath());
                System.out.println("Notification state: " + event.getState());
            }
        });
        // Create a node
        zk.create("/testRootPath", "testRootData".getBytes(), ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        // Create a child node
        zk.create("/testRootPath/testChildPathOne", "testChildDataOne".getBytes(), ZooDefs.Ids.OPEN_ACL_UNSAFE,
                CreateMode.PERSISTENT);
        System.out.println(new String(zk.getData("/testRootPath", false, null)));
        // List the child nodes
        System.out.println(zk.getChildren("/testRootPath", true));
        // Modify the child node's data
        zk.setData("/testRootPath/testChildPathOne", "modifyChildDataOne".getBytes(), -1);
        // Check whether the node exists
        System.out.println("Node stat: [" + zk.exists("/testRootPath", true) + "]");
        // Create another child node
        zk.create("/testRootPath/testChildPathTwo", "testChildDataTwo".getBytes(), ZooDefs.Ids.OPEN_ACL_UNSAFE,
                CreateMode.PERSISTENT);
        System.out.println(new String(zk.getData("/testRootPath/testChildPathTwo", true, null)));
        // Delete the child nodes
        zk.delete("/testRootPath/testChildPathTwo", -1);
        zk.delete("/testRootPath/testChildPathOne", -1);
        // Delete the parent node
        zk.delete("/testRootPath", -1);
        zk.close();
    }
}



Origin blog.csdn.net/weixin_63507910/article/details/128571275