Setting up a ZooKeeper environment
1. Introduction to ZooKeeper
ZooKeeper is one of the most widely used components in distributed systems.
ZooKeeper is a highly available framework for distributed data management and coordination that keeps data consistent in a distributed environment.
ZooKeeper was created at Yahoo and is generally regarded as an open source counterpart of Google Chubby. Chubby's consistency is based on the Paxos algorithm, while ZooKeeper uses ZAB (ZooKeeper Atomic Broadcast), a protocol closely related to Paxos.
The main application scenarios of ZooKeeper include data publishing/subscription, load balancing, naming services, distributed coordination/notification, cluster management, master election, distributed locks, and distributed queues. ZooKeeper is used as a core component in more and more distributed systems (Hadoop, HBase, Storm, Kafka).
A server has one of two main roles: Leader or Follower.
- Leader: initiates and resolves votes, and updates the system state
- Follower: receives client requests and returns results to the client, and takes part in voting during elections
A fun fact about the ZooKeeper name: in the early stage of the project, many internal projects were named after animals (such as the famous Pig project), and Yahoo engineers wanted to give this project an animal name too. Raghu Ramakrishnan, the chief scientist of the research institute at the time, joked: "If this goes on, we will become a zoo!" With all the components put together, Yahoo's distributed system looked like a large zoo, and this project was the one that coordinated the distributed environment, so the name ZooKeeper was born.
2. ZooKeeper installation
There are two ways to deploy ZooKeeper:
- Stand-alone mode: a single server, used in development environments
- Cluster mode: multiple servers, used in production environments; the number of servers should be odd
Why should a ZooKeeper cluster have an odd number of servers?
ZooKeeper has the following property: the cluster is available to clients as long as more than half of its servers are working normally. With 2 servers, if 1 fails the cluster becomes unavailable, because the remaining 1 is not more than half; the fault tolerance of 2 servers is therefore 0. With 3 servers, if 1 fails there are still 2 working, which is more than half, so the fault tolerance of 3 servers is 1. Listing a few more cases (servers -> tolerated failures): 2 -> 0; 3 -> 1; 4 -> 1; 5 -> 2; 6 -> 2. The pattern is that 2n and 2n-1 servers have the same fault tolerance, n-1, so the extra server in an even-sized cluster adds cost without adding fault tolerance.
It follows that a ZooKeeper cluster that can survive a server failure needs at least three servers.
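The majority rule above can be checked with a few lines of shell arithmetic. This is only a sketch: the function names quorum_size and fault_tolerance are made up for this illustration and are not part of ZooKeeper.

```shell
# Smallest number of servers that is strictly more than half of N.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# Number of servers that may fail while a quorum still remains.
fault_tolerance() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 2 3 4 5 6; do
  echo "$n servers: quorum $(quorum_size "$n"), tolerates $(fault_tolerance "$n") failure(s)"
done
```

The loop reproduces the list in the text: 3 and 4 servers both tolerate 1 failure, 5 and 6 both tolerate 2.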
0. Preparations
Requirements list:
- OS: Ubuntu 18.04
  If you need the installation steps for the operating system, please refer to: Virtual Machine Installation (Nanny Level Tutorial)
- ZooKeeper 3.7.0
  Official download page: https://zookeeper.apache.org/releases.html
- JDK: JDK 1.8
  ZooKeeper runs on the JVM. The version installed in this article requires JDK 1.8 or later (JDK 8 LTS, JDK 11 LTS, JDK 12; Java 9 and 10 are not supported).
  Official download page: http://java.sun.com/javase/downloads/index.jsp
For convenience, the required software has also been placed on Baidu Netdisk.
Link: https://pan.baidu.com/s/1kjcuNNCY2FxYA5o7Z2tgkQ Extraction code: nuli
Official installation steps:
1) Install JDK
2) Set the Java heap size
This step is important for avoiding swapping, which seriously degrades ZooKeeper performance. To determine the right value, run load tests and make sure the heap stays well below the level at which the machine would start to swap.
3) Install ZooKeeper
4) Create a configuration file. The file can have any name, but it is recommended to put it in ZooKeeper's conf directory and name it zoo.cfg, so that the service can be started without specifying the configuration file.
Use the following configuration:
tickTime=2000
initLimit=5
syncLimit=2
dataDir=/var/lib/zookeeper/
clientPort=2181
maxClientCnxns=60
server.1=zoo1:2888:3888
server.2=zoo2:2888:3888
server.3=zoo3:2888:3888
Parameter description:
Parameter | Default | Description |
---|---|---|
tickTime | 2000 | Heartbeat interval. The time between heartbeats exchanged between ZooKeeper servers or between clients and servers; one heartbeat is sent every tickTime. The unit is milliseconds. |
initLimit | 10 | Time limit for the initial leader-follower connection. The maximum number of heartbeats (multiples of tickTime) tolerated while a follower makes its initial connection to the leader. |
syncLimit | 5 | Time limit for leader-follower synchronization. The maximum number of heartbeats (multiples of tickTime) tolerated between a request and its response between a follower and the leader. |
dataDir | /tmp/zookeeper | Data directory. The directory where ZooKeeper stores its data; by default the transaction log files are also written here. |
clientPort | 2181 | Client port. ZooKeeper listens on this port and accepts connection requests from clients. |
maxClientCnxns | 60 | Maximum number of concurrent connections from a single client IP address. |
server.id=host:port:port | (none) | Cluster information (server number, server address, leader-follower communication port, election port), written as server.N=YYY:A:B. N is the server's number in the cluster; a file named myid whose content is N must be created in the dataDir directory. YYY is the server's address. A is the port used for communication between machines in the cluster (only the leader listens on it). B is the port used for leader election (every ZooKeeper server listens on it). |
For more information about parameters, please refer to: https://zookeeper.apache.org/doc/r3.7.0/zookeeperAdmin.html#sc_configuration
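Because initLimit and syncLimit are counted in ticks, the real time limits depend on tickTime. The sketch below reads the three values from a configuration file and prints the limits in milliseconds. The helper names cfg_value and limits_in_ms are hypothetical, and the parsing assumes plain key=value lines with no surrounding spaces.

```shell
# Print the value of a key from a zoo.cfg-style file (last occurrence wins).
cfg_value() {
  # $1 = config file, $2 = key
  sed -n "s/^$2=//p" "$1" | tail -n 1
}

# Convert initLimit and syncLimit from ticks to milliseconds.
limits_in_ms() {
  tick=$(cfg_value "$1" tickTime)
  init=$(cfg_value "$1" initLimit)
  sync=$(cfg_value "$1" syncLimit)
  echo "initLimit: $(( tick * init )) ms"
  echo "syncLimit: $(( tick * sync )) ms"
}

# Demo with the values from the configuration above, in a throwaway file.
cfg=$(mktemp)
printf 'tickTime=2000\ninitLimit=5\nsyncLimit=2\n' > "$cfg"
limits_in_ms "$cfg"
rm -f "$cfg"
```

With the configuration shown earlier, a follower has 10000 ms for its initial connection and 4000 ms for each synchronization round trip.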
5) Create the myid file
Create a file named myid in the dataDir directory set in the previous step. The file contains only the ID of this server.
The ID must be between 1 and 255. If extended features such as TTL nodes are enabled, the ID must be between 1 and 254.
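As a minimal sketch, the file can be written with a small helper that rejects IDs outside the range given above. The function name write_myid is an assumption of this example, not a ZooKeeper tool, and the check uses the 1 to 255 range (use 254 as the upper bound if TTL nodes are enabled).

```shell
# Write the server ID into <dataDir>/myid after validating it.
write_myid() {
  # $1 = dataDir, $2 = server ID
  case "$2" in
    ''|*[!0-9]*) echo "ID must be a positive integer" >&2; return 1 ;;
  esac
  if [ "$2" -lt 1 ] || [ "$2" -gt 255 ]; then
    echo "ID must be between 1 and 255" >&2
    return 1
  fi
  mkdir -p "$1" && echo "$2" > "$1/myid"
}

# Usage (path taken from the sample configuration):
#   write_myid /var/lib/zookeeper 1
```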
6) Create the initialize marker file
Create an empty file named initialize in the dataDir directory. It is created only when a new cluster is being started.
7) Start the ZooKeeper service as follows:
$ java -cp zookeeper.jar:lib/*:conf org.apache.zookeeper.server.quorum.QuorumPeerMain conf/zoo.cfg
1. Stand-alone mode
Stand-alone mode is the first choice for beginners and for users with limited resources. This article mainly covers installing ZooKeeper in stand-alone mode.
The steps below assume that the current user name is xiaobai. If your user name is different, either create a xiaobai user or adjust the paths in the configuration to match your user name. Following the official installation steps, install as follows:
1) Install JDK
Skip this step if the JDK is already installed.
Conventions:
- Installation packages are uploaded or downloaded to the soft directory under the home directory: ~/soft
- Software is installed in the opt directory under the home directory: ~/opt
Upload the JDK package to the ~/soft directory, then run ls ~/soft to verify that the file is there.
Next, unpack the file:
mkdir ~/opt
tar -xvf ~/soft/jdk-8u261-linux-x64.tar.gz -C ~/opt
Create a symbolic link:
cd ~/opt
ln -s jdk1.8.0_261/ jdk
Configure the environment variables. Open the bash configuration file:
cd
vi .bashrc
Press i to enter insert mode, add the following lines at the end of the file, then press Esc to leave insert mode and type :x to save and exit.
export JAVA_HOME=/home/xiaobai/opt/jdk
export PATH=$PATH:$JAVA_HOME/bin
Run the following commands to apply the change and to verify with the java command that the configuration works:
source .bashrc
java -version
2) Install ZooKeeper
(1) Upload the ZooKeeper installation package apache-zookeeper-3.7.0-bin.tar.gz, downloaded from Baidu Netdisk, to the ~/soft directory.
Alternatively, copy the download link from the official website and download the package with the wget command.
Run ls ~/soft to verify that the file is there.
(2) Unpack the ZooKeeper installation package into the ~/opt directory:
tar -xvf ~/soft/apache-zookeeper-3.7.0-bin.tar.gz -C ~/opt
ls ~/opt/apache-zookeeper-3.7.0-bin
(3) Create a symbolic link:
cd ~/opt
ln -s apache-zookeeper-3.7.0-bin zookeeper
The main directories of the installation are:
- The bin directory contains executable scripts, such as the commonly used zkServer.sh and zkCli.sh
- The conf directory contains configuration files
- The docs directory contains the documentation
- The lib directory contains the required jar packages
(4) Modify the configuration file
cd ~/opt/zookeeper/conf
cp zoo_sample.cfg zoo.cfg
vi zoo.cfg
Change the value of dataDir to /home/xiaobai/opt/zookeeper/tmp
Note for beginners: if you do not want to use the vi command, you can open the file in gedit, an editor similar to Notepad on Windows, and do the same in similar situations later:
gedit ~/opt/zookeeper/conf/zoo.cfg
Basic use of vi: after opening the file, type i to enter insert mode => modify the content of the file => press the Esc key to return to command mode => type : to enter bottom-line mode => type x or wq to save and exit.
If you do not want to keep your changes, enter bottom-line mode and type q! to exit without saving.
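The same change can also be made without an editor. The following sketch rewrites the dataDir line with sed; the function name set_data_dir is hypothetical, and the code assumes the file already contains a line starting with dataDir= and that the new path contains no | or & characters.

```shell
# Replace the dataDir line of a zoo.cfg-style file in place.
set_data_dir() {
  # $1 = path to zoo.cfg, $2 = new data directory
  tmp="$1.tmp"
  sed "s|^dataDir=.*|dataDir=$2|" "$1" > "$tmp" && mv "$tmp" "$1"
}

# Usage with the paths from this article:
#   set_data_dir ~/opt/zookeeper/conf/zoo.cfg /home/xiaobai/opt/zookeeper/tmp
```

Writing to a temporary file and renaming it avoids sed -i, whose syntax differs between GNU and BSD systems.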
3) Configure environment variables
vi ~/.bashrc
Add the following at the end of the file:
export ZOOKEEPER_HOME=/home/xiaobai/opt/zookeeper
export PATH=$ZOOKEEPER_HOME/bin:$PATH
Make environment variables take effect:
source ~/.bashrc
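If these steps are repeated, the same export lines are easily appended to .bashrc more than once. A possible way to avoid that is a small helper that appends a line only when it is not already present; append_once is a name invented for this sketch.

```shell
# Append a line to a file unless an identical line already exists.
append_once() {
  # $1 = file, $2 = exact line to add
  touch "$1"
  grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

# Usage with the variables from this step:
#   append_once ~/.bashrc 'export ZOOKEEPER_HOME=/home/xiaobai/opt/zookeeper'
#   append_once ~/.bashrc 'export PATH=$ZOOKEEPER_HOME/bin:$PATH'
```

The single quotes in the usage lines keep $ZOOKEEPER_HOME and $PATH unexpanded, so they are evaluated when .bashrc is sourced rather than when the line is written.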
4) Start ZooKeeper
zkServer.sh start
View the process
Run the jps command to check whether startup succeeded; a QuorumPeerMain process should be listed.
View the status
Run zkServer.sh status to view the server status.
5) Client connection
Access methods:
- Via client tools:
  - Command-line tool: zkCli.sh
  - GUI tool: ZooInspector
    Download address: https://issues.apache.org/jira/secure/attachment/12436620/ZooInspector.zip
    It can also be downloaded from the Baidu Netdisk link provided earlier.
- Via the Java API
Here is a simple demonstration with the command-line tool zkCli.sh.
(1) Start the client
zkCli.sh -server localhost:2181
(2) Create a node
create /test 888
create -s /test/lock 666
create -s /test/lock 666
(3) View nodes
ls /
ls -s /test
ZooKeeper maintains a tree-like hierarchy. The nodes in the tree are called znodes. Each znode stores its own data along with a set of attributes, and each znode is identified by a unique path. Note that the data of a znode cannot exceed 1 MB.
ZooKeeper's directory tree can be viewed with the ZooInspector tool.
Detailed node information can be viewed with the command ls -s path. The output for /test is briefly explained below:
[lock0000000000, lock0000000001] //Child nodes under this path
cZxid = 0xd //Created ZXID: the transaction ID with which the znode was created
ctime = Thu Dec 16 20:52:57 CST 2021 //Created time: when the znode was created
mZxid = 0xd //Modified ZXID: the transaction ID of the znode's last update
mtime = Thu Dec 16 20:52:57 CST 2021 //Modified time: when the znode was last updated
pZxid = 0xf //Transaction ID of the last change to this znode's list of children. pZxid changes only when the list of children changes; changes to the content of a child do not affect it.
cversion = 2 //Version number of the list of children
dataVersion = 0 //Version number of the znode's data
aclVersion = 0 //Version number of the znode's ACL
ephemeralOwner = 0x0 //Session ID of the session that created this znode. For a persistent node the value is 0.
dataLength = 3 //Length of the data content
numChildren = 2 //Number of child nodes
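Since every attribute is printed as name = value, a single field is easy to pick out in a script. The sketch below extracts one field from such output and uses ephemeralOwner to tell persistent and ephemeral nodes apart. The helper name stat_field is hypothetical, and the sample text is copied from the output above without the explanatory comments.

```shell
# Read `ls -s` or `stat` output on stdin and print the value of one field.
stat_field() {
  # $1 = field name, e.g. numChildren
  sed -n "s/^$1 = //p"
}

sample='cZxid = 0xd
mZxid = 0xd
pZxid = 0xf
cversion = 2
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 3
numChildren = 2'

owner=$(printf '%s\n' "$sample" | stat_field ephemeralOwner)
if [ "$owner" = "0x0" ]; then
  echo "persistent node"
else
  echo "ephemeral node owned by session $owner"
fi
```

For the sample this prints "persistent node". Against a live server, the same filter could probably be fed from a one-shot client call such as zkCli.sh -server localhost:2181 stat /test, which was not tested for this article.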
(4) Delete node
delete /test/lock0000000001
deleteall /test
ls /
(5) Exit the client
quit
Commonly used commands are listed below:
Category | Command | Description |
---|---|---|
Help | help | View help |
Create node | create | create [-s] [-e] path data acl. -s creates a sequential node and -e an ephemeral node; if neither is given, a persistent node is created. acl is used for permission control |
Read node | ls | ls path [watch] |
Read node | get | get path [watch] |
Read node | ls2 | ls2 path [watch] |
Read node | stat | stat path [watch]. Gets the status information of the node |
Update node | set | set path data [version]. data is the new content; version is the data version |
Delete node | delete | delete path [version] |
Delete node | deleteall | Recursive delete command |
Synchronize | sync | Synchronizes the client's view of a znode with ZooKeeper |
ACL | getAcl/setAcl | Gets/sets the ACL of a znode |
Quota | setquota | Sets quotas on the number of child nodes and the data length. setquota -n 4 /zookeeper/node limits /zookeeper/node to at most 4 child nodes |
Quota | delquota | Deletes a quota; -n refers to the number of child nodes and -b to the data length, e.g. delquota -n 2 |
Quota | listquota | Shows a quota, e.g. listquota /storm |
History | history/redo | history lists the most recent commands; redo runs a command again, used as redo cmdid, e.g. redo 20 |
Session | connect | Connects to a server |
Session | close | Closes the current connection; you can reconnect with connect without exiting the client |
Session | quit | Closes the connection and exits the client |
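2. Cluster mode
Cluster mode is only briefly introduced here. Assume there are three servers: node1, node2, and node3.
The steps build on step 2) Install ZooKeeper -> (4) Modify the configuration file of the stand-alone mode.
1) When modifying the zoo.cfg file, append the following cluster information: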
server.1=node1:2888:3888
server.2=node2:2888:3888
server.3=node3:2888:3888
2) On each of the three servers node1, node2, and node3, create two files in the /home/xiaobai/opt/zookeeper/tmp directory:
touch myid
touch initialize
- myid: set the content of the myid file on node1, node2, and node3 to 1, 2, and 3 respectively. For example, server node1 has cluster ID 1, so its myid file contains 1.
- initialize: leave the initialize file empty.
Note:
- If host names rather than IP addresses are used between the servers, remember to update the hosts file on each server
- When configuring multiple servers, you can configure one server first, copy the setup to the others with the remote copy command scp, and then adjust each server individually, for example by editing its myid file
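The server lines and the matching myid values follow directly from the order of the host list, so both can be generated. The sketch below assumes the ports 2888 and 3888 used throughout this article; the function names cluster_lines and myid_for are made up for the illustration.

```shell
# Print one server.N line per host, numbering hosts from 1 in the given order.
cluster_lines() {
  id=1
  for host in "$@"; do
    echo "server.$id=$host:2888:3888"
    id=$(( id + 1 ))
  done
}

# Print the ID that belongs in the myid file of one host.
myid_for() {
  # $1 = this host; remaining arguments = all hosts, in server.N order
  self=$1
  shift
  id=1
  for host in "$@"; do
    if [ "$host" = "$self" ]; then
      echo "$id"
      return 0
    fi
    id=$(( id + 1 ))
  done
  return 1
}

cluster_lines node1 node2 node3
myid_for node2 node1 node2 node3
```

The first call prints the three server lines shown in step 1), and the second prints 2, the myid content for node2.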
3. Common exceptions and solutions
1. The port is occupied
Error message: Address already in use
Solution:
- Either stop the process that currently occupies the port; use the command netstat -nltp together with grep to find it
- Or edit zoo.cfg and change the port number
2. Not enough disk space
Error message: No space left on device
Solution: free up disk space or expand the disk
3. Unable to find myid file
Error message: myid file is missing
Solution: create a myid file in the directory that dataDir points to and set the correct content (the ID of that server)
4. The leader election port of other machines in the cluster is not open
Error message: Cannot open channel to 2 at election address /122.228.242.21:3888
Solution:
- Check whether the firewall on each server is turned off, using the command sudo ufw status
- Check whether /etc/hosts is consistent across the servers and contains the IP addresses of all nodes
- Check that the clocks of all servers are consistent
- Edit zoo.cfg on each server and change the host in that server's own cluster entry to 0.0.0.0
For example, for the server node1 in the example, change the cluster information in its zoo.cfg to:
server.1=0.0.0.0:2888:3888
server.2=node2:2888:3888
server.3=node3:2888:3888
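When the configuration is copied to every server with scp, this last change has to be made separately on each machine. One way to script it is sketched below: the server's own entry is selected by its ID and only the host part is replaced. The function name bind_self_to_any is hypothetical.

```shell
# Rewrite the host of this server's own server.N entry to 0.0.0.0.
bind_self_to_any() {
  # $1 = path to zoo.cfg, $2 = this server's ID (the content of its myid file)
  tmp="$1.tmp"
  sed "s|^server\.$2=[^:]*:|server.$2=0.0.0.0:|" "$1" > "$tmp" && mv "$tmp" "$1"
}

# Usage on node1, with the paths from this article:
#   bind_self_to_any ~/opt/zookeeper/conf/zoo.cfg 1
```

Entries of the other servers are left untouched, because the pattern matches only the line whose number equals the given ID.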