What is ZooKeeper?

I. Definition

1. The literal meaning of the name

Taken literally, "zookeeper" means the keeper of a zoo. Why would a software component be given such a name? It comes from the Hadoop ecosystem. Hadoop was named after a toy elephant belonging to its author's child; the word was easy to remember, so it became the project's name, and many of the big data components that followed were also named after animals. Together they make up our zoo (the Hadoop ecosystem). Animals are animals, though, and with so many of them things would fall into disarray if nobody looked after them. Hence the name ZooKeeper: just as its name suggests, it is a manager, a coordination service for distributed environments.

2. The official explanation

The official website explains that when many distributed applications compete for the same resources, the lack of coordination between them can sometimes cause system errors. In essence, ZooKeeper solves the synchronization problem in a distributed environment.
[Figure: screenshot of the explanation on the official ZooKeeper website]
The official explanation is rather vague, although it roughly conveys what ZooKeeper is for. Let's look at the explanation on Wikipedia:
[Figure: screenshot of the Wikipedia entry for ZooKeeper]
Wikipedia explains it much more clearly. In short:
ZooKeeper is used to maintain configuration information, provide naming, provide distributed synchronization, and provide group services, all as centralized services. All of these kinds of services are used in some form by distributed applications. Each time they are implemented, a lot of work goes into fixing the bugs and race conditions that are inevitable. ZooKeeper's aim is to distill the essence of these different services into a very simple interface to a centralized coordination service. The service itself is distributed and highly reliable. Consensus, group management, and presence protocols are implemented by the service, so applications do not need to implement them on their own. Application-specific uses consist of a mixture of specific ZooKeeper components and application-specific conventions.

II. What ZooKeeper does

1. Its structure

  • On Wikipedia we find this sentence:

      ZooKeeper nodes store their data in a hierarchical name space, much like a file system or a tree data structure
    

    What does this mean? ZooKeeper stores its nodes in a hierarchical namespace, somewhat like the tree structure of the Linux file system.
    [Figure: the tree structure of the Linux file system]
    The Linux file system is organized as shown above. What are the characteristics of the ZooKeeper tree?

    • Each node in ZooKeeper is called a znode, and znodes come in two types:
      1. Ephemeral (temporary) nodes: when the client's session with the server ends (for example, after its connection is lost), the ephemeral nodes created during that session are deleted automatically.
      2. Persistent nodes: these are not deleted automatically when the client disconnects; they remain until they are explicitly deleted.
      From this explanation you may already have guessed that ZooKeeper uses a client/server (C/S) architecture, made up of a server side and a client side, as the following figure shows:
      [Figure: ZooKeeper's client/server architecture]
  • Watches (listeners)
    Above we saw ZooKeeper's simple data structure. Combined with watches (listeners), ZooKeeper can do a great deal.
    There are two common watch scenarios:
    (1) Watching a znode for data changes
    Because a znode can store data, we can watch a node for changes to that data.
    [Figure: watching a znode's data for changes]
    (2) Watching a znode for child nodes being added or removed
    In the same way, we can watch a node for changes in its set of child nodes.
    [Figure: watching a znode's children]
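The tree, the two node types, and the two kinds of watches can be tied together in a small in-memory model. This is a Python sketch under stated assumptions, not ZooKeeper's client API: the `ToyZooKeeper` class, its method names, and the integer session ids are hypothetical. Two behaviours are modelled on real ZooKeeper: ephemeral nodes disappear when their session ends (and cannot have children), and a standard watch is a one-time trigger that has to be registered again after it fires.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional

Watch = Callable[[str], None]


class ToyZooKeeper:
    """In-memory imitation of a ZooKeeper tree (hypothetical, illustration only)."""

    def __init__(self) -> None:
        self.data: Dict[str, bytes] = {"/": b""}
        # None marks a persistent node; an int is the session that owns an ephemeral node.
        self.owner: Dict[str, Optional[int]] = {"/": None}
        self.data_watches: Dict[str, List[Watch]] = defaultdict(list)
        self.child_watches: Dict[str, List[Watch]] = defaultdict(list)

    @staticmethod
    def parent(path: str) -> str:
        return path.rsplit("/", 1)[0] or "/"

    def create(self, path: str, value: bytes = b"", session: Optional[int] = None) -> None:
        parent = self.parent(path)
        if parent not in self.data or path in self.data:
            raise KeyError(path)
        if self.owner[parent] is not None:
            raise ValueError("ephemeral nodes cannot have children")
        self.data[path] = value
        self.owner[path] = session
        self._fire(self.child_watches, parent)  # parent's child list changed

    def delete(self, path: str) -> None:
        if self.get_children(path):
            raise ValueError(f"{path} still has children")
        del self.data[path]
        del self.owner[path]
        self._fire(self.data_watches, path)
        self._fire(self.child_watches, self.parent(path))

    def set_data(self, path: str, value: bytes) -> None:
        if path not in self.data:
            raise KeyError(path)
        self.data[path] = value
        self._fire(self.data_watches, path)

    def get_data(self, path: str, watch: Optional[Watch] = None) -> bytes:
        if watch is not None:
            self.data_watches[path].append(watch)
        return self.data[path]

    def get_children(self, path: str, watch: Optional[Watch] = None) -> List[str]:
        if watch is not None:
            self.child_watches[path].append(watch)
        return sorted(p.rsplit("/", 1)[1] for p in self.data
                      if p != "/" and self.parent(p) == path)

    def close_session(self, session: int) -> None:
        # When a session ends, every ephemeral node it created is removed.
        for path in [p for p, s in self.owner.items() if s == session]:
            self.delete(path)

    @staticmethod
    def _fire(table: Dict[str, List[Watch]], path: str) -> None:
        # Watches are one-shot: take them out of the table before calling them.
        for watch in table.pop(path, []):
            watch(path)


if __name__ == "__main__":
    zk = ToyZooKeeper()
    events: List[tuple] = []
    zk.create("/app")                       # persistent node
    zk.create("/app/worker1", session=1)    # ephemeral node owned by session 1
    zk.get_children("/app", watch=lambda p: events.append(("children", p)))
    zk.get_data("/app", watch=lambda p: events.append(("data", p)))
    zk.set_data("/app", b"v2")              # fires the data watch
    zk.close_session(1)                     # worker1 vanishes and fires the child watch
    print(zk.get_children("/app"), events)  # [] [('data', '/app'), ('children', '/app')]
```

Because the watches are one-shot, a second `set_data` on /app would notify nobody until someone calls `get_data` with a watch again.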

2. Unified configuration

  • Suppose you have many servers and need to configure a distributed service across this cluster. Configuring a service ultimately means changing its configuration files, and if someone had to change the configuration by hand on hundreds or thousands of servers, it would never get done. Operations staff can use synchronization commands to apply a configuration uniformly, and ZooKeeper can also solve this problem well. For example, if we want to distribute a configuration file called test-site.xml, ZooKeeper works as shown below:
    [Figure: distributing test-site.xml to all servers through ZooKeeper]
    All of our servers watch the /configuration node that holds the test-site.xml data. If that node's data changes, the watch fires and a notification is sent to every server watching the node; each server then applies the new configuration. This achieves unified configuration.
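The configuration flow can be sketched the same way. In this Python toy, `ConfigNode` stands in for the znode that holds test-site.xml, and each `Server` re-reads the data and registers a fresh watch every time it is notified, which is the usual way one-shot watches are consumed. The class names and the configuration strings are hypothetical; with a real client you would use its get-data call with a watch instead.

```python
from typing import Callable, List


class ConfigNode:
    """Toy stand-in for the znode holding the test-site.xml data (hypothetical)."""

    def __init__(self, data: str) -> None:
        self._data = data
        self._watchers: List[Callable[[], None]] = []

    def get(self, watch: Callable[[], None]) -> str:
        self._watchers.append(watch)  # one-shot watch, like ZooKeeper's
        return self._data

    def set(self, data: str) -> None:
        self._data = data
        # Detach the current watchers first so re-registrations land in a fresh list.
        watchers, self._watchers = self._watchers, []
        for notify in watchers:
            notify()


class Server:
    def __init__(self, name: str, node: ConfigNode) -> None:
        self.name = name
        self.node = node
        self.config = ""
        self.reload()

    def reload(self) -> None:
        # Read the latest configuration and watch for the next change.
        self.config = self.node.get(watch=self.reload)


if __name__ == "__main__":
    node = ConfigNode("<timeout>30</timeout>")
    servers = [Server(f"server{i}", node) for i in range(1, 4)]
    node.set("<timeout>60</timeout>")  # every server is notified and reloads
    print([s.config for s in servers])
```

Re-registering inside the callback matters: without it, each server would only ever see the first change.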

3. Unified naming service

  • Suppose I have the domain name www.nickwiki.com but several servers behind it, and I want to decide internally which machine handles a request made to that domain name. ZooKeeper can fulfil this requirement; in fact, this is somewhat similar to what nginx does.
    ZooKeeper's implementation is very simple and similar to the unified configuration above. If my machines' IP addresses are
    192.168.1.1
    192.168.1.2
    192.168.1.3
    192.168.1.4
    then we get the structure in the figure below:

[Figure: the domain name www.nickwiki.com mapped to the four server IP addresses]
Externally, clients only need to know the domain name; which server actually handles a request is decided on our side. This helps with server load balancing and improves service reliability.
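A naming service boils down to a lookup from a name to the addresses registered under it. The Python sketch below assumes, purely for illustration, that each server registers its IP under the domain name (in ZooKeeper, as a child of a znode for that name) and that the caller picks among them round-robin; round-robin is just one possible policy, and `NamingService`, `register`, `unregister`, and `resolve` are hypothetical names, not part of any ZooKeeper API.

```python
import itertools
from typing import Dict, Iterator, List


class NamingService:
    """Toy registry: a name maps to the addresses registered under it."""

    def __init__(self) -> None:
        self._addresses: Dict[str, List[str]] = {}
        self._rotation: Dict[str, Iterator[str]] = {}

    def register(self, name: str, address: str) -> None:
        # Corresponds to a server creating a child node under the name's znode.
        self._addresses.setdefault(name, []).append(address)
        self._rotation.pop(name, None)  # membership changed: restart the rotation

    def unregister(self, name: str, address: str) -> None:
        # Corresponds to that child node disappearing, e.g. when the server dies.
        self._addresses[name].remove(address)
        self._rotation.pop(name, None)

    def resolve(self, name: str) -> str:
        addresses = self._addresses.get(name)
        if not addresses:
            raise LookupError(f"no servers registered for {name}")
        if name not in self._rotation:
            # Cycle over a snapshot of the current members.
            self._rotation[name] = itertools.cycle(list(addresses))
        return next(self._rotation[name])


if __name__ == "__main__":
    ns = NamingService()
    for ip in ["192.168.1.1", "192.168.1.2", "192.168.1.3", "192.168.1.4"]:
        ns.register("www.nickwiki.com", ip)
    print([ns.resolve("www.nickwiki.com") for _ in range(5)])
```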

4. Implementing a distributed lock

  • I won't explain the concept of a lock here; you can look it up yourself. We can use ZooKeeper to implement a distributed lock. How is that done? Let's take a look:
    Suppose server1, server2, and server3 in the system all access the /locks node
    [Figure: server1, server2, and server3 accessing the /locks node]
    On access, each one creates an ephemeral sequential (EPHEMERAL_SEQUENTIAL) node under /locks. For example, server1 creates node id_000000, server2 creates node id_000002, and server3 creates node id_000001.
    [Figure: the ephemeral sequential nodes under /locks]

  • Each server then gets all the child nodes of /locks (id_000000, id_000001, id_000002) and checks whether the node it created is the smallest one.

    If it is, it acquires the lock.

    Releasing the lock: after finishing its work, the server deletes the node it created.

    If it is not, it watches the node immediately before its own (the next smaller sequence number).

For example:

  • server1 gets all the child nodes of /locks and, by comparison, finds that its own node (id_000000) is the smallest of all. So it acquires the lock.

  • server2 gets all the child nodes of /locks and, by comparison, finds that its own node (id_000002) is not the smallest. So it watches id_000001, the node immediately before its own.

  • server3 gets all the child nodes of /locks and, by comparison, finds that its own node (id_000001) is not the smallest. So it watches id_000000, the node immediately before its own.
    ......

  • When server1 finishes its work, it deletes the node it created (id_000000). Through its watch, server3 learns that id_000000 has been deleted, finds that its own node is now the smallest, and acquires the lock.
    ...server2 then acquires the lock in the same way once server3 deletes id_000001.
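The lock recipe above can be simulated in a few lines. This Python sketch is an in-memory model under stated assumptions: the hypothetical `LockDir` class plays the role of the /locks znode, a counter imitates EPHEMERAL_SEQUENTIAL naming (real ZooKeeper pads sequence numbers to ten digits; six are used here to match the example), and callbacks stand in for watches. Each waiter watches only its immediate predecessor, so deleting a node wakes exactly one server instead of all of them.

```python
from typing import Callable, Dict, List


class LockDir:
    """Toy model of the /locks znode and its sequential children (hypothetical)."""

    def __init__(self) -> None:
        self.children: Dict[str, str] = {}  # node name -> owning server
        self.watches: Dict[str, List[Callable[[], None]]] = {}
        self.next_seq = 0
        self.holders: List[str] = []        # order in which servers got the lock

    def create_sequential(self, owner: str) -> str:
        name = f"id_{self.next_seq:06d}"    # zero padding keeps string order numeric
        self.next_seq += 1
        self.children[name] = owner
        return name

    def try_acquire(self, name: str) -> None:
        if name not in self.children:
            return  # our own node is gone (released, or the session ended)
        ordered = sorted(self.children)
        index = ordered.index(name)
        if index == 0:
            self.holders.append(self.children[name])  # smallest node: lock acquired
            return
        predecessor = ordered[index - 1]
        # Watch only the node just before ours; re-check everything when it goes away,
        # because it may have been a crashed waiter rather than the lock holder.
        self.watches.setdefault(predecessor, []).append(lambda: self.try_acquire(name))

    def delete(self, name: str) -> None:
        """Called on release, or when the owner's session ends."""
        del self.children[name]
        for callback in self.watches.pop(name, []):
            callback()


if __name__ == "__main__":
    locks = LockDir()
    n1 = locks.create_sequential("server1")  # id_000000
    n3 = locks.create_sequential("server3")  # id_000001
    n2 = locks.create_sequential("server2")  # id_000002
    for node in (n1, n2, n3):
        locks.try_acquire(node)
    locks.delete(n1)  # server1 releases; server3 is woken up
    locks.delete(n3)  # server3 releases; server2 is woken up
    print(locks.holders)
```

Re-checking on every notification, instead of assuming the lock is now ours, is what keeps the recipe correct when a waiting server disappears before reaching the front of the queue.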

5. Cluster status

  • In many distributed environments we need to monitor the state of the whole cluster: which machine has gone down, which has restarted, and so on. Another example is Hadoop's HA (high availability) mode, which runs multiple NameNodes split into active and standby roles: only one NameNode is active and all the others are on standby. Only when the active NameNode fails is one of the standby NameNodes elected as the new active NameNode. Both monitoring the active NameNode and the subsequent NameNode election rely on ZooKeeper. Below is a brief look at how ZooKeeper monitors the state of a cluster:

[Figure: each cluster machine registered as a child of the /cluster node]

Each machine in the cluster maintains an ephemeral child node under /cluster. If server1 goes down, its /cluster/s1 node is deleted automatically, so by watching the children of /cluster the whole system can keep track of the status of every machine in the cluster.
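Cluster membership works the same way with ephemeral children. In the Python toy below, each machine's session owns an ephemeral child of /cluster, and a monitor re-reads the child list whenever its one-shot child watch fires. The node names s1, s2, s3 follow the example above; the `Cluster` and `Monitor` classes and the integer session ids are hypothetical.

```python
from typing import Callable, Dict, List, Set


class Cluster:
    """Toy /cluster znode: child name -> session that owns the ephemeral child."""

    def __init__(self) -> None:
        self.children: Dict[str, int] = {}
        self.child_watches: List[Callable[[], None]] = []

    def join(self, name: str, session: int) -> None:
        self.children[name] = session  # e.g. server1 creates /cluster/s1
        self._notify()

    def session_expired(self, session: int) -> None:
        # ZooKeeper deletes every ephemeral node owned by an expired session.
        gone = [n for n, s in self.children.items() if s == session]
        for name in gone:
            del self.children[name]
        if gone:
            self._notify()

    def get_children(self, watch: Callable[[], None]) -> List[str]:
        self.child_watches.append(watch)  # one-shot child watch
        return sorted(self.children)

    def _notify(self) -> None:
        watches, self.child_watches = self.child_watches, []
        for watch in watches:
            watch()


class Monitor:
    """Keeps a view of the live machines by re-reading /cluster on every change."""

    def __init__(self, cluster: Cluster) -> None:
        self.cluster = cluster
        self.alive: Set[str] = set()
        self.log: List[str] = []
        self.refresh()

    def refresh(self) -> None:
        current = set(self.cluster.get_children(watch=self.refresh))
        for name in sorted(self.alive - current):
            self.log.append(f"{name} is down")
        for name in sorted(current - self.alive):
            self.log.append(f"{name} is up")
        self.alive = current


if __name__ == "__main__":
    cluster = Cluster()
    monitor = Monitor(cluster)
    for i, name in enumerate(["s1", "s2", "s3"], start=1):
        cluster.join(name, session=i)
    cluster.session_expired(1)  # server1 goes down, so /cluster/s1 disappears
    print(monitor.log)
```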
The active NameNode election in Hadoop mentioned earlier can in fact be implemented in the same way as the distributed lock described above:

  • First, the current active NameNode goes down, and by watching the child nodes of /cluster the others learn of it.
  • The NameNodes still alive in the cluster then each register an ephemeral sequential node in ZooKeeper for the election, and each checks whether the node it created has the smallest sequence number:
    (1) If it does, it switches from the standby state to the active state.
    (2) If it does not, it gives up the election.
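The election rule can be sketched in Python as follows. This is a simplified model under stated assumptions: `Election` is a hypothetical class, the NameNode names are made up, and a real deployment would keep the sequential nodes in ZooKeeper and react through watches; Hadoop's own failover machinery is not shown. The rule is the one described above: the live candidate with the smallest sequence number becomes active, and everyone else stays on standby.

```python
import itertools
from typing import Dict


class Election:
    """Toy election znode: candidates create sequential nodes; the smallest wins."""

    def __init__(self) -> None:
        self._seq = itertools.count()
        self.nodes: Dict[str, str] = {}  # candidate -> its sequential node name

    def register(self, candidate: str) -> str:
        node = f"n_{next(self._seq):06d}"
        self.nodes[candidate] = node
        return node

    def withdraw(self, candidate: str) -> None:
        # A crashed candidate's ephemeral node vanishes with its session.
        self.nodes.pop(candidate, None)

    def state_of(self, candidate: str) -> str:
        smallest = min(self.nodes.values())
        return "active" if self.nodes[candidate] == smallest else "standby"


if __name__ == "__main__":
    first = Election()
    for nn in ["namenode1", "namenode2", "namenode3"]:
        first.register(nn)
    print({nn: first.state_of(nn) for nn in first.nodes})
    # namenode1 fails; the survivors register again for a new round,
    # in whatever order they happened to notice the failure.
    second = Election()
    for nn in ["namenode3", "namenode2"]:
        second.register(nn)
    print({nn: second.state_of(nn) for nn in second.nodes})
```

Since registration order decides the winner, the survivor that reacts first becomes active; every other candidate goes back to standby without any further negotiation.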

III. Performance

  • From the very start, ZooKeeper's goal was to be a high-performance distributed coordination framework, and experiments done at Yahoo bear this out:
    [Figure: results of Yahoo's ZooKeeper performance experiments]
    It performs especially well in applications where reads outnumber writes. A write has to be synchronized across the servers in the ensemble (a majority of them must accept it before it is committed), so writes are comparatively expensive. (Coordination services, however, usually see far more reads than writes.)
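Why are writes the expensive side? ZooKeeper commits a write only after a majority (quorum) of the servers in the ensemble have accepted it, while a read is answered by whichever server the client is connected to. The small Python helpers below only do the majority arithmetic for a few ensemble sizes; they are illustrative and say nothing about actual throughput figures.

```python
def quorum(ensemble_size: int) -> int:
    """Smallest majority of an ensemble of the given size."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can fail while a majority is still reachable."""
    return ensemble_size - quorum(ensemble_size)


if __name__ == "__main__":
    for n in (3, 4, 5, 7):
        print(f"{n} servers: quorum {quorum(n)}, survives {tolerated_failures(n)} failure(s)")
```

This arithmetic is also why ensembles usually have an odd number of servers: going from 3 to 4 servers raises the quorum from 2 to 3 without tolerating any additional failure.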

IV. A simple API

One of ZooKeeper's design goals is to provide a very simple programming interface, so it supports only the following operations:

  • create: creates a node at a location in the tree

  • delete: deletes a node

  • exists: tests whether a node exists at a location

  • get data: reads the data from a node

  • set data: writes data to a node

  • get children: retrieves the list of a node's children

  • sync: waits for data to propagate
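To show how small this interface is, here is a toy in-memory implementation of the seven operations in Python. It is not a client for a real ZooKeeper server; `TinyTree`, `BadVersionError`, and the method names are hypothetical and only mirror the list above. One real detail is borrowed: ZooKeeper keeps a version number per node, and set data can be made conditional on the expected version, with -1 meaning "any version". Here `exists` returns that version as a simplified stand-in for ZooKeeper's node metadata.

```python
from typing import Dict, List, Optional


class BadVersionError(Exception):
    """Raised when a conditional write names an out-of-date version."""


class TinyTree:
    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {"/": b""}
        self._version: Dict[str, int] = {"/": 0}

    def create(self, path: str, data: bytes = b"") -> str:
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self._data or path in self._data:
            raise KeyError(path)
        self._data[path] = data
        self._version[path] = 0
        return path

    def delete(self, path: str) -> None:
        if self.get_children(path):
            raise ValueError(f"{path} still has children")
        del self._data[path], self._version[path]

    def exists(self, path: str) -> Optional[int]:
        # The node's version, or None if the node does not exist.
        return self._version.get(path)

    def get_data(self, path: str) -> bytes:
        return self._data[path]

    def set_data(self, path: str, data: bytes, version: int = -1) -> int:
        if version != -1 and version != self._version[path]:
            raise BadVersionError(path)
        self._data[path] = data
        self._version[path] += 1
        return self._version[path]

    def get_children(self, path: str) -> List[str]:
        prefix = "/" if path == "/" else path + "/"
        return sorted(p[len(prefix):] for p in self._data
                      if p.startswith(prefix) and len(p) > len(prefix)
                      and "/" not in p[len(prefix):])

    def sync(self, path: str) -> None:
        # A single in-memory copy is always current, so there is nothing to wait for.
        pass


if __name__ == "__main__":
    tree = TinyTree()
    tree.create("/app", b"v1")
    tree.create("/app/db")
    print(tree.get_children("/app"))
    v = tree.exists("/app")
    tree.set_data("/app", b"v2", version=v)      # succeeds and bumps the version
    try:
        tree.set_data("/app", b"v3", version=v)  # stale version is rejected
    except BadVersionError:
        print("stale write rejected")
```

The version check is what lets clients do safe read-modify-write cycles on top of such a small interface.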

Today I have briefly covered how ZooKeeper works and how it is used. In practice it is far less simple than described here; to really understand ZooKeeper in depth, you also need to read some professional documentation to get a more complete picture. If you find any problems in this post, you are welcome to leave a comment below to discuss them.



Source: blog.csdn.net/qq_42359956/article/details/105403277