How ZooKeeper works

ZooKeeper

ZooKeeper is a distributed, open-source coordination service for distributed applications.

In practice, it stores information that servers in a distributed system use to synchronize with one another. All clients (which may themselves be servers) can access this information through the ZooKeeper client API.

The role of ZooKeeper can be understood like this: all the servers of a distributed system (a cluster of machines) save shared synchronization information at one location (a znode) on the ZooKeeper server, and each of them keeps a long-lived connection to the ZooKeeper server for communication.


Take a simple example: a client wants to read data from a cluster, but the cluster is composed of multiple servers, and they do not all hold the same data (only the master is up to date; the slaves may not have been updated yet). The client therefore does not know which server in the cluster to read from. (It could, of course, query every server once, but that is inefficient and not a practical approach.) To solve this problem, a middleware can be introduced: it selects a master and stores that server's information (IP address, port number, and so on). The client also connects to this middleware and retrieves the master's information from it, so the client knows which server in the cluster to exchange data with. ZooKeeper plays exactly the role of this middleware.

When the content held by the middleware changes, it notifies all clients connected to it (which includes every server in the cluster) to update their information. Although they act as clients of the middleware, those machines are themselves servers.



ZooKeeper data model
1. ZooKeeper has a hierarchical namespace, much like a distributed file system.
2. The difference is that a znode in the ZooKeeper namespace has the characteristics of both a file and a directory. Like a file, it maintains data, meta information, an ACL, timestamps, and other data structures; like a directory, it can serve as part of a path and can have child znodes.
3. A user can create, delete, update, and read znodes (permissions allowing).
4. ZooKeeper stores data in the form of znodes. A znode contains data and metadata (the descriptive information about the znode, discussed later).
5. Operations on a znode are atomic: a read returns all the data associated with the znode, and a write replaces it entirely. The data size of a znode is 1 MB by default (configurable), so znodes are generally not used to store large amounts of data.

Each node in the znode tree is called a znode. Each znode consists of 3 parts:
1. stat: the status information, describing the znode's version, permissions, and so on.
2. data: the data associated with the znode.
3. children: the child nodes under the znode.
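The three parts above can be sketched as a tiny in-memory structure. This is an illustrative sketch with hypothetical names, not the real ZooKeeper classes; it only shows how stat, data, and children relate:

```python
# Illustrative in-memory sketch of a znode (hypothetical class, not the real API).
class Znode:
    def __init__(self, data=b""):
        self.stat = {"version": 0}  # status info: version, permissions, etc.
        self.data = data            # data associated with the znode
        self.children = {}          # name -> Znode: child nodes under this znode

    def set_data(self, data):
        # a write replaces the data and bumps the version in stat
        self.data = data
        self.stat["version"] += 1

root = Znode()
root.children["app1"] = Znode(b"config")
root.children["app1"].set_data(b"config-v2")
print(root.children["app1"].stat["version"])  # 1
print(root.children["app1"].data)             # b'config-v2'
```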



File format analysis of zookeeper storage
1. ZooKeeper mainly stores two types of files: snapshots (memory snapshots) and logs (records of ZooKeeper operations). The former is a snapshot of the in-memory data; the latter is similar to MySQL's binlog and records every operation that modifies data.


snapshot (memory snapshot)
A snapshot saves the in-memory znode data (both znode data and metadata) to disk, making it easy to restore the in-memory znode structure from the snapshot after a restart.

Refer to the original text: http://blog.csdn.net/pwlazy/article/details/8080626


log
all operation records related to modifying data, this is a log record file for zookeeper operations, the addition, deletion and modification of znodes in zookeeper , and check operation
logs will be saved to this log.



ZooKeeper Client API (communication with the ZooKeeper server):
1. create(path, data, flags): create a znode; path is its path, data is the data to store on the znode. Commonly used flags are PERSISTENT, PERSISTENT_SEQUENTIAL, EPHEMERAL, and EPHEMERAL_SEQUENTIAL.
2. delete(path, version): delete a znode; version specifies the expected version, and a version of -1 skips the version check (any version matches).
3. exists(path, watch): determine whether the specified znode exists, and choose whether to watch it. The watch here uses the Watcher that was specified when the ZooKeeper instance was created; to set a specific Watcher, call the other overload, exists(path, watcher).
4. getData(path, watch): read the data on the specified znode, and choose whether to watch it.
5. setData(path, data, version): update the data of the specified znode, subject to a version check.
6. getChildren(path, watch): get the names of all child znodes of the specified znode, and choose whether to watch it.
7. sync(path): flush all update operations issued before the sync, so that each request takes effect on more than half of the ZooKeeper servers. The path parameter is currently unused.
8. setAcl(path, acl): set the ACL information of the specified znode.
9. getAcl(path): get the ACL information of the specified znode.
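The version checks in delete and setData give optimistic concurrency control. The following is a minimal in-memory mock of that behavior (illustrative only, not the real ZooKeeper client), showing how a matching version succeeds and how -1 skips the check:

```python
# Minimal in-memory mock of version-checked znode operations (not the real API).
class MockZk:
    def __init__(self):
        self.nodes = {}  # path -> {"data": bytes, "version": int}

    def create(self, path, data):
        self.nodes[path] = {"data": data, "version": 0}

    def get_data(self, path):
        return self.nodes[path]["data"]

    def set_data(self, path, data, version):
        node = self.nodes[path]
        if version != -1 and version != node["version"]:
            raise ValueError("bad version")  # optimistic concurrency check
        node["data"] = data
        node["version"] += 1

    def delete(self, path, version):
        if version != -1 and version != self.nodes[path]["version"]:
            raise ValueError("bad version")
        del self.nodes[path]

zk = MockZk()
zk.create("/app", b"v0")
zk.set_data("/app", b"v1", version=0)  # succeeds: versions match
zk.delete("/app", version=-1)          # -1 skips the version check
```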

The watch flag sets a monitor on the znode: once the znode changes (its children, the znode itself, or its data are updated), a notification is sent to the client that read the znode, so the client can react accordingly.



ZooKeeper Session
client and ZooKeeper are connected by heartbeat judged. Once each client is connected to ZooKeeper, it can create
its (but not necessarily).



ZooKeeper Watch
1. Zookeeper watch is a monitoring notification mechanism.
2. The monitoring event can be understood as a one-time trigger. When the monitoring data (ZNode) is changed, the monitoring event will be sent to the client. For example, if the client
When getData("/znode1", true) is called and the data on /znode1 is changed or deleted later, the client will get the watch event for the
change of /znode1, and if /znode1 happens again changes, the client will not receive event notifications (one-time) unless the client sets the watch on /znode1 again.
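The one-time-trigger behavior can be sketched as follows. This is an illustrative simulation of the semantics described above, not the real client library: a registered watcher is consumed the first time it fires, so the second change produces no event:

```python
# Sketch of one-time watch semantics: a watch fires once, then must be re-set.
class WatchedNode:
    def __init__(self, data):
        self.data = data
        self.watchers = []  # pending one-shot watchers

    def get_data(self, watcher=None):
        # like getData(path, true): optionally register a watch while reading
        if watcher is not None:
            self.watchers.append(watcher)
        return self.data

    def set_data(self, data):
        self.data = data
        fired, self.watchers = self.watchers, []  # watches are consumed on delivery
        for w in fired:
            w("/znode1 changed")

events = []
node = WatchedNode(b"a")
node.get_data(watcher=events.append)  # like getData("/znode1", true)
node.set_data(b"b")  # watcher fires once
node.set_data(b"c")  # no notification: the watch was one-time
print(events)  # ['/znode1 changed']
```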



The zookeeper cluster (to prevent the downtime of a single zookeeper)
In a ZooKeeper cluster (that is, a cluster of the ZooKeeper servers themselves, since any single ZooKeeper server may go down), each server node has one of the following 3 roles and 4 states:
1. Roles: leader, follower, observer
2. States: leading, following, observing, looking


The working principle of ZooKeeper cluster

1. When the leader fails (or the service starts), zk enters recovery mode. In recovery mode a new leader must be elected, so that all servers are restored to a correct, consistent state.
2. The data being kept consistent here is, of course, the data of the znode tree.
3. The ZooKeeper servers communicate with one another. When a client sends an update (creating a znode, changing a znode's data) to one of the servers, and that server is a follower, the follower forwards the transaction to the leader, and the leader broadcasts the transaction to the cluster. Each follower that receives the transaction updates its own data (the znode tree) and replies to the leader. When the leader has received correct responses from the followers (a majority), the transaction is considered committed, and the client is told the update succeeded. If the leader itself observes a znode state change, it likewise broadcasts the transaction to the cluster.
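The commit rule in step 3 can be stated as a one-line majority check. This is a sketch of the quorum arithmetic only, not of the ZAB protocol itself:

```python
# Majority-ack commit: a proposal is committed once more than half of the
# ensemble has acknowledged it.
def is_committed(acks: int, ensemble_size: int) -> bool:
    return acks > ensemble_size // 2

# 5-server ensemble: 3 acks is a majority, 2 is not.
print(is_committed(3, 5))  # True
print(is_committed(2, 5))  # False
# 4-server ensemble: 2 of 4 is exactly half, which is NOT a majority.
print(is_committed(2, 4))  # False
```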

ZooKeeper achieves distributed data consistency (weak consistency, to be precise): once more than half of the servers have applied an update, the data is considered consistent. If an application needs the very latest data (the server it is connected to may not be up to date), it can call the sync() method first. ZooKeeper uses the ZAB protocol to achieve this consistency.

A client should know the IPs of all the ZooKeeper servers, so that if one becomes unreachable it can connect to another; knowing only one is possible, depending on the requirements of the usage scenario.


Leader Election (principle of selecting the master)
1. Each transaction the leader commits carries a zxid (which can be understood as a transaction ID) along with other related information; the zxid increases by 1 with each transaction, and both the leader's and the followers' logs record it. The zxid's counter restarts from 0 each time the leader changes.
2. Because of the zxid, a larger zxid means more up-to-date data (of the znode tree). So when the leader goes down, the follower with the largest zxid is elected as the new leader, and the other followers then synchronize their data with the new leader.
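The election rule above can be sketched in a few lines. This is an illustrative simplification, not the real FastLeaderElection implementation; the tie-break by larger server id is an assumption stated here for completeness:

```python
# Sketch: the candidate with the largest zxid (most up-to-date log) wins;
# ties are broken by the larger server id (assumed tie-break rule).
def elect_leader(servers):
    # servers: {server_id: last_zxid}
    return max(servers, key=lambda sid: (servers[sid], sid))

print(elect_leader({1: 100, 2: 105, 3: 105}))  # 3: highest zxid, larger id wins tie
print(elect_leader({1: 200, 2: 105, 3: 105}))  # 1: strictly highest zxid
```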



zookeeper application
1. Configuration management (shared configuration information)
For example, all configurations of APP1 are configured under the /APP1 znode, and all machines in APP1 will monitor the /APP1 node as soon as they are started (zk.exist("/APP1", true)) , and realize
Callback method Watcher, then when the data on zookeeper /APP1 znode changes, each machine will receive a notification, the Watcher method will be executed,
then the application can remove the data (zk.getData(“/APP1 ”,false,null));


2. Cluster management
In application clusters, we often need every machine to know which machines in the cluster (or in a cluster it depends on) are alive, and to be notified quickly, without manual intervention, when machines go down or the network is disconnected.

Zookeeper makes this easy to implement. For example, suppose there is a znode named /APP1SERVERS on the ZooKeeper server. Each machine in the cluster creates an EPHEMERAL node under it: server1 creates /APP1SERVERS/SERVER1 (the IP can be used as the name to guarantee uniqueness), server2 creates /APP1SERVERS/SERVER2, and both SERVER1 and SERVER2 watch the parent node /APP1SERVERS, so any change to the data or children under that parent notifies the clients watching it. EPHEMERAL nodes have one very important property: if the connection between the client and the server is broken, or the session expires, the node disappears. So when a machine crashes or is disconnected, its corresponding node vanishes, and every client in the cluster watching /APP1SERVERS receives a notification and fetches the latest list.

Another application scenario is master election within the cluster: once the master dies, a new master can immediately be chosen from the slaves. The implementation steps are the same as above, except that the node each machine creates under APP1SERVERS at startup is of type EPHEMERAL_SEQUENTIAL, so each node is automatically numbered.

By convention, the node with the smallest number is the master. By watching the /APP1SERVERS node, every machine can get the server list; as long as all machines in the cluster agree that the smallest-numbered node is the master, a master is selected. When the master goes down, its znode disappears, the new server list is pushed to the clients, and each machine again takes the smallest-numbered node as the master, achieving dynamic master election. (The master here is not the same thing as the leader above: this master is for other applications. For example, if every client always wants to send data to the master, it needs to know the master's information, such as its IP.)
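The "smallest number wins" rule above can be sketched directly. This simulation assumes child names shaped like `server-0000000001` (the sequence suffix ZooKeeper appends to EPHEMERAL_SEQUENTIAL nodes); it is illustrative only:

```python
# Sketch of the recipe: under /APP1SERVERS each live machine holds an
# EPHEMERAL_SEQUENTIAL child; everyone treats the smallest number as master.
def current_master(children):
    # children like ["server-0000000003", "server-0000000001", ...]
    return min(children, key=lambda name: int(name.rsplit("-", 1)[1]))

live = ["server-0000000003", "server-0000000001", "server-0000000002"]
print(current_master(live))       # server-0000000001
live.remove("server-0000000001")  # master dies, its ephemeral znode disappears
print(current_master(live))       # server-0000000002 becomes the new master
```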


3. Shared locks
Zookeeper makes this easy to implement as well. To acquire the lock, a client creates an EPHEMERAL_SEQUENTIAL node under the lock directory, then calls getChildren to see whether the smallest-numbered node in the directory is the one it created. If so, it holds the lock; if not, it calls exists(String path, boolean watch) and watches for changes to the directory's node list, until the node it created is the smallest-numbered one in the list, at which point it obtains the lock. To release the lock, it simply deletes the node it created.
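The lock recipe reduces to "I hold the lock while my sequential node is the smallest." The following in-memory sketch shows that invariant (it omits the watching/blocking part and is not the real ZooKeeper lock recipe implementation):

```python
# Sketch of the lock recipe: a client owns the lock while its sequential node
# is the smallest in the lock directory; deleting the node releases the lock.
class LockDir:
    def __init__(self):
        self.seq = 0
        self.nodes = set()

    def create_sequential(self):
        # like creating an EPHEMERAL_SEQUENTIAL child: auto-numbered name
        name = f"lock-{self.seq:010d}"
        self.seq += 1
        self.nodes.add(name)
        return name

    def holds_lock(self, name):
        return bool(self.nodes) and name == min(self.nodes)

    def release(self, name):
        self.nodes.discard(name)  # deleting one's node releases the lock

d = LockDir()
a, b = d.create_sequential(), d.create_sequential()
print(d.holds_lock(a), d.holds_lock(b))  # True False
d.release(a)
print(d.holds_lock(b))                   # True: b is now the smallest node
```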


4. Queue management
Zookeeper can handle two types of queues: one becomes available only when all of its members have gathered, and otherwise keeps waiting for all members to arrive (a synchronous queue); the other is enqueued and dequeued in FIFO order, as in a producer-consumer model.

For the synchronous queue: create a parent directory /synchronizing, and have each member watch for the existence of /synchronizing/start. Each member then joins the queue by creating a temporary node /synchronizing/member_i, fetches all child nodes of /synchronizing, and checks whether the member count has reached the required number. If it is less than the required number, the member waits for /synchronizing/start to appear; if it is equal, the member creates /synchronizing/start.
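The synchronous-queue steps above amount to a barrier. This sketch simulates the logic in memory (the names and the Barrier class are illustrative; a real implementation would use znodes and a watch on /synchronizing/start):

```python
# Sketch of the barrier: members create /synchronizing/member_i; when the member
# count reaches the target, /synchronizing/start is created and all may proceed.
class Barrier:
    def __init__(self, size):
        self.size = size          # required number of members
        self.members = []         # stands in for children of /synchronizing
        self.start_exists = False # stands in for /synchronizing/start

    def join(self, member):
        self.members.append(member)
        if len(self.members) >= self.size:
            # the last member to arrive creates /synchronizing/start
            self.start_exists = True
        return self.start_exists  # False => keep waiting for start

bar = Barrier(3)
print(bar.join("member_1"))  # False: still waiting for /synchronizing/start
print(bar.join("member_2"))  # False
print(bar.join("member_3"))  # True: barrier released
```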

Reference: http://blog.csdn.net/l1028386804/article/details/52226265

Summary:
Zookeeper shares information in the form of a directory tree and notifies all Zookeeper clients connected to it when that information changes. A client can thus obtain the shared information and learn of changes to it, and then react accordingly.
