Zookeeper study notes (1) - basic knowledge

Zookeeper overview

Zookeeper is an open-source, distributed Apache project that provides coordination services for distributed frameworks

Working Mechanism

From the perspective of design patterns, Zookeeper is a distributed service management framework based on the Observer pattern. It is responsible for storing and managing the data that everyone cares about, and it accepts registrations from observers. Once the state of this data changes, Zookeeper notifies the observers registered with it so that they can respond accordingly

Zookeeper = file system + notification mechanism

Its main functions are: data storage + notification updates


Take servers going online and offline as an example:


1. A server starts and registers its information in the zookeeper cluster.

2. The client obtains the list of currently online servers from the zookeeper cluster and registers a listener.

3. A server node goes offline.

4. The zookeeper cluster notifies the client of the offline event.

5. The client re-obtains the server list and registers a new listener.

Features

1) A Zookeeper cluster is composed of one Leader and multiple Followers

2) As long as more than half of the nodes in the cluster are alive, the Zookeeper cluster can serve normally. Therefore, Zookeeper is best deployed on an odd number of servers.

An even number of servers does not improve Zookeeper's fault tolerance: a cluster of 2n servers tolerates the same number of failures (n-1) as a cluster of 2n-1 servers

3) Global data consistency: each server saves a copy of the same data; no matter which server the client connects to, the data is consistent

4) Update requests are executed sequentially: update requests from the same client are executed in the order in which they were sent

5) Atomicity of data updates: a data update either succeeds completely or fails completely

6) Real-time: within a certain time range, the Client can read the latest data

Data structure

The structure of the ZooKeeper data model is very similar to the Unix file system: as a whole it can be regarded as a tree, and each node in the tree is called a ZNode. Each ZNode can store 1MB of data by default, and each ZNode can be uniquely identified by its path

The ZNode structure determines that ZooKeeper is only suitable for storing some simple configuration files and is not suitable for storing massive data.

Application scenarios

The services provided by Zookeeper include: unified naming service, unified configuration management, unified cluster management, dynamic server node online/offline, soft load balancing, etc.

Unified naming service

In a distributed environment, applications and services often need a unified, easily identifiable name, much like a domain name.

Unified configuration management

In a distributed environment, the configuration information of each node is often required to be consistent. Therefore, after a configuration file is modified, it is hoped that the change can be quickly synchronized to each node;


A simple process for unified configuration management of zookeeper:

(1) Configuration information can be written to a Znode on ZooKeeper

(2) Each client server monitors this Znode

(3) Once the data in the Znode is modified, ZooKeeper will notify each client server
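A minimal zkCli.sh sketch of this flow (the /config path and the values are illustrative, not from the original):

# On any node: write the shared configuration to a ZNode
create /config "timeout=3000"

# On each client server: read the value and register a (one-shot) watch
get -w /config

# When the configuration is later changed...
set /config "timeout=5000"

# ...every watching client receives an event and re-reads the node:
# WATCHER:: WatchedEvent state:SyncConnected type:NodeDataChanged path:/config

Note that zkCli watches are one-shot: after an event fires, the client must re-register the watch to keep listening.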

Unified cluster management

Write the node information to the ZNode of zookeeper, and then monitor the ZNode to obtain the real-time status changes of the cluster nodes;

Server dynamic online and offline

The client can gain real-time insight into server online and offline changes:

1. A server starts and registers its information in the zookeeper cluster.

2. The client obtains the list of currently online servers from the zookeeper cluster and registers a listener.

3. A server node goes offline.

4. The zookeeper cluster notifies the client of the offline event.

5. The client re-obtains the server list and registers a new listener.
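A minimal zkCli.sh sketch of this pattern, assuming an illustrative /servers parent node:

# One-time setup: create the persistent parent node
create /servers "servers"

# Each server registers itself as an ephemeral (and here sequential) node on startup
create -e -s /servers/server "hadoop102:9000"

# Clients list the children and watch for changes
ls -w /servers

# When a server's session ends (it goes offline), its ephemeral node is removed
# automatically and every watching client receives a NodeChildrenChanged event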

Soft load balancing

Record the number of visits to each server in Zookeeper, and let the server with the fewest visits handle the latest client request

Zookeeper cluster construction

Installation package download

Official website address: https://zookeeper.apache.org (Apache ZooKeeper)

From the download page, select the tar package (this walkthrough uses apache-zookeeper-3.5.7-bin.tar.gz).

Installation process

The cluster uses a total of three servers to deploy zookeeper, and the server names are hadoop102-hadoop104.

1. Upload the installation package to the server and use tar -zxvf to decompress it to the /opt/module/ path (a customized path)

2. Rename the decompressed apache-zookeeper-3.5.7-bin to zookeeper-3.5.7

3. Configure the server number:

Create a zkData directory under /opt/module/zookeeper-3.5.7/

Then create a file named myid in this directory

The file name is fixed, because myid is the file name that the source code reads

Then write the number corresponding to the server into the file (the numbers of the three servers are 2, 3, and 4 respectively)
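For example, on hadoop102 (a sketch using the paths from this walkthrough; use 3 on hadoop103 and 4 on hadoop104):

mkdir /opt/module/zookeeper-3.5.7/zkData
echo 2 > /opt/module/zookeeper-3.5.7/zkData/myid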

4. Configure the zoo.cfg file:

Rename zoo_sample.cfg in the /opt/module/zookeeper-3.5.7/conf directory to zoo.cfg

Then open zoo.cfg:

① Modify the data storage path dataDir: dataDir=/opt/module/zookeeper-3.5.7/zkData

② Add the cluster configuration:

server.2=hadoop102:2888:3888
server.3=hadoop103:2888:3888
server.4=hadoop104:2888:3888

Configuration parameter format: server.A=B:C:D

A is a number indicating which server this is. In cluster mode, the myid file in the dataDir directory contains exactly this value; Zookeeper reads the file at startup and compares the number inside with the configuration in zoo.cfg to determine which server it is.

B is the address of this server;

C is the port (2888) that this server, as a Follower, uses to exchange information with the Leader in the cluster;

D is the port (3888) used to re-elect a new Leader when the cluster's Leader goes down; the servers communicate with each other over this port during the election.

5. Distribute the myid and zoo.cfg configuration to all servers (note that the server number in myid must be modified on each server)
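For example, with plain scp (a hypothetical sketch; environments like this often use a custom sync script instead):

# Copy the whole installation (including conf/zoo.cfg) to the other servers
scp -r /opt/module/zookeeper-3.5.7 hadoop103:/opt/module/
scp -r /opt/module/zookeeper-3.5.7 hadoop104:/opt/module/

# Then fix the server number on each target
ssh hadoop103 'echo 3 > /opt/module/zookeeper-3.5.7/zkData/myid'
ssh hadoop104 'echo 4 > /opt/module/zookeeper-3.5.7/zkData/myid'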

Cluster startup

Enter the zookeeper installation path:

Start: bin/zkServer.sh start

Stop: bin/zkServer.sh stop

View status: bin/zkServer.sh status

(Attachment) Interpretation of zoo.cfg configuration parameters

1. tickTime = 2000: the communication heartbeat time between Zookeeper servers and clients, in milliseconds

2. initLimit = 10: the LF (Leader-Follower) initial connection time limit, i.e. the maximum number of heartbeats (in tickTime units) that the Leader and a Follower can tolerate when establishing the initial connection.

Under the current configuration (tickTime = 2000, initLimit = 10), if the Leader and a Follower do not establish a connection within 20 s, the communication is considered to have failed.

3. syncLimit = 5: the LF synchronous communication time limit.

If communication between the Leader and a Follower takes longer than syncLimit * tickTime (i.e. 10 s), the Leader considers the Follower down and deletes it from the server list.

4. dataDir: the path where Zookeeper stores its data.

The default tmp directory is not recommended, because Linux may delete it periodically.

5. clientPort = 2181: the client connection port, usually not modified
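Putting the above together, the zoo.cfg used in this walkthrough would look roughly like this:

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/opt/module/zookeeper-3.5.7/zkData
clientPort=2181
server.2=hadoop102:2888:3888
server.3=hadoop103:2888:3888
server.4=hadoop104:2888:3888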

(Attached) Cluster startup and shutdown script

Create a new zk.sh file in the path /home/username/bin (such as /home/why/bin):

#!/bin/bash
case $1 in
"start"){
    for i in hadoop102 hadoop103 hadoop104
    do
        echo ---------- zookeeper $i start ------------
        ssh $i "/opt/module/zookeeper-3.5.7/bin/zkServer.sh start"
    done
};;

"stop"){
    for i in hadoop102 hadoop103 hadoop104
    do
        echo ---------- zookeeper $i stop ------------
        ssh $i "/opt/module/zookeeper-3.5.7/bin/zkServer.sh stop"
    done
};;

"status"){
    for i in hadoop102 hadoop103 hadoop104
    do
        echo ---------- zookeeper $i status ------------
        ssh $i "/opt/module/zookeeper-3.5.7/bin/zkServer.sh status"
    done
};;

esac

That is, the script wraps the bin/zkServer.sh start, bin/zkServer.sh stop, and bin/zkServer.sh status commands so they run across the whole cluster.

Add execute permission: chmod u+x zk.sh

In this way, you can start and stop the cluster through zk.sh start and zk.sh stop, and check it with zk.sh status

Zookeeper election mechanism

First startup

Assume there are 5 servers in the cluster:

(1) Server 1 starts and initiates an election. Server 1 votes for itself. At this point server 1 has one vote, which is not more than half (3 votes), so the election cannot be completed and server 1 remains in the LOOKING state;

(2) Server 2 starts and another election is initiated. Servers 1 and 2 each vote for themselves and exchange vote information. Server 1 finds that server 2's myid is greater than that of its current candidate (itself), so it changes its vote to recommend server 2. Now server 1 has 0 votes and server 2 has 2 votes; there is still no majority, so the election cannot be completed and servers 1 and 2 remain LOOKING

(During the first startup, votes are compared by myid.)

(3) Server 3 starts and initiates an election. Both servers 1 and 2 change their votes to server 3. The result of this round: server 1 has 0 votes, server 2 has 0 votes, and server 3 has 3 votes. Server 3 now has more than half of the votes and is elected Leader. Servers 1 and 2 change their status to FOLLOWING, and server 3 changes its status to LEADING;

(4) Server 4 starts and initiates an election. At this point servers 1, 2, and 3 are no longer in the LOOKING state and will not change their votes. The result of the exchange: server 3 has 3 votes and server 4 has 1 vote. Server 4 then obeys the majority, changes its vote to server 3, and changes its status to FOLLOWING;

(5) Server 5 starts and, like server 4, votes for server 3 and becomes a Follower

After a Leader has been produced in the cluster, the startup of additional servers no longer triggers an election.

Not the first startup

When a server in the ZooKeeper cluster encounters one of the following two situations, it will begin to enter Leader election:

  • The server is starting up (initialization)
  • The server cannot maintain its connection with the Leader while running

When a machine enters the Leader election process, the current cluster may also be in the following two states:

  • There is already a Leader in the cluster.

For this first situation, when the machine attempts to elect a Leader, it is informed of the current Leader's information; it only needs to establish a connection with the Leader machine and synchronize its state.

  • There is indeed no Leader in the cluster.

The election rules at this time are as follows:

Assume that ZooKeeper consists of 5 servers, with SIDs 1, 2, 3, 4, and 5, and ZXIDs 8, 8, 8, 7, and 7, and the server with SID 3 is the Leader. At some point, servers 3 and 5 failed, so Leader election began.

The voting situation of machines with SIDs 1, 2, and 4: (EPOCH, ZXID, SID)

(1,8,1) (1,8,2) (1,7,4)

Leader election rules:

① The one with the larger EPOCH wins directly.

② If the EPOCHs are the same, the one with the larger transaction ID (ZXID) wins.

③ If the transaction IDs are also the same, the one with the larger server ID (SID) wins.

Parameter Description:

● SID: Server ID, which uniquely identifies a machine in the ZooKeeper cluster. It cannot repeat across machines and is identical to myid.

● ZXID: Transaction ID, which identifies a change of server state. At any given moment, the ZXID values across the machines in the cluster may not be exactly the same; this depends on how each ZooKeeper server has processed the clients' update requests.

● Epoch: the code name of each Leader's term. When there is no Leader, the logical clock value within the same round of voting is the same; this value increases after each round of voting.
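A minimal shell sketch of the comparison rule (illustrative only, not ZooKeeper's actual implementation): a vote is an (EPOCH, ZXID, SID) triple, and the lexicographically larger triple wins.

# Returns success (0) if vote 1 beats vote 2.
# Usage: vote_wins EPOCH1 ZXID1 SID1 EPOCH2 ZXID2 SID2
vote_wins() {
    if [ "$1" -ne "$4" ]; then [ "$1" -gt "$4" ]; return; fi  # larger EPOCH wins
    if [ "$2" -ne "$5" ]; then [ "$2" -gt "$5" ]; return; fi  # then larger ZXID
    [ "$3" -gt "$6" ]                                         # finally larger SID
}

# With the votes above: (1,8,2) beats (1,8,1) and (1,7,4), so SID 2 becomes Leader
vote_wins 1 8 2 1 8 1 && vote_wins 1 8 2 1 7 4 && echo "SID 2 wins"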

Zookeeper command line operations

Command line syntax

Basic command syntax and function description:

help: display all operation commands

ls path: view the child nodes of the current znode [watchable]
    -w  watch for child node changes
    -s  append secondary (status) information

create: create a znode
    -s  with a sequence number
    -e  ephemeral (deleted after client restart or session timeout)

get path: get the value of a node [watchable]
    -w  watch for node content changes
    -s  append secondary (status) information

set: set the value of a node

stat: view node status

delete: delete a node

deleteall: recursively delete nodes

Command line practice

First start the zookeeper cluster

Then enter the zookeeper installation path and start the client:

bin/zkCli.sh -server hadoop102:2181

help

Use help to view help:

Node data information (ls)

ls /: view all znodes in zookeeper

ls -s /: view detailed node information

(1) czxid: the zxid of the transaction that created the node

Every modification to the ZooKeeper state produces a ZooKeeper transaction ID (zxid). The zxids form a total order over all modifications in ZooKeeper: each modification has a unique zxid, and if zxid1 is smaller than zxid2, then the zxid1 modification occurred before the zxid2 one

(2) ctime: when the znode was created, in milliseconds since 1970

(3) mzxid: the zxid of the transaction that last updated the znode

(4) mtime: when the znode was last modified, in milliseconds since 1970

(5) pZxid: the zxid of the last update to the znode's child nodes

(6) cversion: the znode's child-node change number, i.e. the number of modifications to its child list

(7) dataversion: the znode's data change number

(8) aclVersion: the change number of the znode's access control list

(9) ephemeralOwner: if the znode is ephemeral, the session id of its owner; if it is not ephemeral, 0

(10) dataLength: data length of znode

(11)numChildren: the number of znode child nodes

Note that ls -s / views the root node of the entire znode tree, i.e. it lists all the child nodes directly under the root. If you want to view the detailed information of a specific child node, just use its specific path.

Example: ls -s /why:
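The output would look roughly like this (all values illustrative):

[zk: hadoop102:2181(CONNECTED) 1] ls -s /why
[]
cZxid = 0x200000003
ctime = Mon Nov 13 10:00:00 CST 2023
mZxid = 0x200000003
mtime = Mon Nov 13 10:00:00 CST 2023
pZxid = 0x200000003
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 3
numChildren = 0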

Node type (create/get/set) 

Node types are mainly divided into the following four types:
(1) Persistent directory node: After the client disconnects from Zookeeper, the node still exists

(2) Persistent sequential numbering directory node: After the client disconnects from Zookeeper, the node still exists, but Zookeeper sequentially numbers the node name.

(3) Temporary directory node: After the client disconnects from Zookeeper, the node is deleted

(4) Temporary sequential number directory node: After the client disconnects from Zookeeper, the node is deleted, but Zookeeper sequentially numbers the node name.

The meaning of the sequence number:
When a znode is created with the sequence flag set, a value is appended to the znode name. The sequence number is a monotonically increasing counter maintained by the parent node.

In distributed systems, sequence numbers can be used to order all events globally, so that clients can infer the sequence of events.

Create a normal node (permanent node + without serial number)

1. create /bigdata "bigdata": create an ordinary node, where /bigdata is the path and "bigdata" is the node's value

Zookeeper requires a value to be assigned when a node is created

2.create /bigdata/test1 "test1"


View the value of a node:

get -s /bigdata

get -s /bigdata/test1

Create a node with a serial number (permanent node + with a serial number)

First create a node: create /bigdata/test2 "test2"

Then create permanent nodes with sequence numbers under this node (created with -s)

If there were no sequence-numbered nodes before, the sequence numbers start from 0 and increase in order. If the parent node already has 2 child nodes, the numbering starts from 2, and so on.
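For example (names and counter values illustrative; the counter depends on how many children the parent has already had):

create -s /bigdata/test2/seq "a"
# Created /bigdata/test2/seq0000000000
create -s /bigdata/test2/seq "b"
# Created /bigdata/test2/seq0000000001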

Create ephemeral nodes

First create a node: create /bigdata/test3 "test3"

Then create an ephemeral node under that node (created with -e): create -e /bigdata/test3/e1 "e1"

You can view this node:

Next, exit the client, restart the zookeeper cluster, and then re-enter the client to view the node:

You can see that the ephemeral node no longer exists;
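A sketch of this check (output illustrative):

ls /bigdata/test3
# [e1]
quit

# ...restart the zookeeper cluster, reconnect with bin/zkCli.sh...

ls /bigdata/test3
# []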

Modify the value of a node

Use the set command:

set /bigdata "bigdata_why"

Listener principle

The client registers listeners on the directory nodes it cares about. When a directory node changes (its data changes, the node is deleted, or child nodes are added or removed), ZooKeeper notifies the client. This watch mechanism ensures that any change to data saved in ZooKeeper quickly reaches the application listening on that node.

Work process

1) First there must be a main() thread

2) When a Zookeeper client is created in the main thread, two more threads are created: one responsible for network connection communication (connect), the other responsible for listening (listener)

3) Registered listening events are sent to Zookeeper through the connect thread

4) Zookeeper adds the registered listening events to its list of registered listeners

5) When Zookeeper detects a data or path change, it sends a message to the listener thread

6) The listener thread internally calls the process() method to notify the client of the change

Common monitoring

1) Monitor changes in node data: get path [watch]

2) Monitor the addition and removal of child nodes: ls path [watch]

Node value change monitoring

Monitor changes to the bigdata node: get -w /bigdata

You can see the current value of the node:

Modify the node value on hadoop103:

You can monitor changes in node data in hadoop102:
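A sketch of the whole interaction (output illustrative):

# On hadoop102: register a watch and see the current value
get -w /bigdata
# bigdata_why

# On hadoop103: modify the value
set /bigdata "bigdata_new"

# Back on hadoop102, the one-shot watch fires:
# WATCHER:: WatchedEvent state:SyncConnected type:NodeDataChanged path:/bigdata

Since the watch is one-shot, run get -w /bigdata again to keep monitoring.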

Node's child node change monitoring

In hadoop102:

ls -w /bigdata: monitor the bigdata node's child nodes

Create a new child node in hadoop103:

Changes in child nodes can be monitored in hadoop102
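A sketch of this interaction as well (output illustrative):

# On hadoop102: watch the child list
ls -w /bigdata
# [test1, test3]

# On hadoop103: add a child node
create /bigdata/test5 "test5"

# Back on hadoop102:
# WATCHER:: WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/bigdata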

Node deletion and status viewing

Delete a node: delete /bigdata/test4

Recursive deletion: deleteall /bigdata/test2

You can see that the deletion was successful

View node status: stat /bigdata
