ZooKeeper getting started guide


Getting started: coordinating distributed applications with ZooKeeper

This document contains information to get you started quickly with ZooKeeper. It is aimed primarily at developers who want to try ZooKeeper out. It covers a simple installation with a single ZooKeeper server, a few commands to verify that the server is running, and a simple programming example. For convenience, the final sections cover somewhat more complex setups, such as running a replicated deployment and optimizing the transaction log. For commercial deployments, however, please refer to the ZooKeeper Administrator's Guide.

Prerequisites

See the system requirements in the Administrator's Guide.

Download

Download a stable ZooKeeper release from one of the Apache download mirrors.

Stand-alone operation

Setting up a ZooKeeper server in standalone mode is straightforward. The server is contained in a single JAR file, so installation consists of creating a configuration.

Once you have downloaded a stable ZooKeeper release, unpack it and change to its root directory.

To start ZooKeeper you need a configuration file. Here is a sample; create it in conf/zoo.cfg:

tickTime=2000 
dataDir=/var/lib/zookeeper
clientPort=2181

You can give this file any name, but for the sake of this discussion we call it conf/zoo.cfg. Change the value of dataDir to point to an existing (initially empty) directory. The fields have the following meanings:


tickTime
the basic time unit used by ZooKeeper, in milliseconds. It is used for heartbeats, and the minimum session timeout is twice the tickTime.

dataDir
the location where the in-memory database snapshots are stored and, unless specified otherwise, where the transaction log of updates to the database is written.

clientPort
the port on which the server listens for client connections.
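As a rough illustration of how these three fields fit together, the sketch below parses the sample configuration as plain key=value text and derives the minimum session timeout from tickTime. The helper name parse_zoo_cfg is hypothetical and is not part of ZooKeeper; the real server has its own configuration loader.

```python
def parse_zoo_cfg(text):
    """Parse key=value lines, skipping blank lines and # comments.

    Hypothetical helper for illustration only.
    """
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


# The sample conf/zoo.cfg from this guide.
EXAMPLE_CFG = """\
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
"""

cfg = parse_zoo_cfg(EXAMPLE_CFG)
tick_ms = int(cfg["tickTime"])

# The minimum session timeout is twice the tickTime.
min_session_timeout_ms = 2 * tick_ms
print(min_session_timeout_ms)  # 4000
```

With tickTime=2000, a client cannot negotiate a session timeout shorter than 4 seconds.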

Now that you have created the configuration file, you can start ZooKeeper:

bin/zkServer.sh start

ZooKeeper logs messages using log4j; see the logging section of the Programmer's Guide for details. Log messages go to the console and/or a log file, depending on the log4j configuration.

The steps outlined here run ZooKeeper in standalone mode. There is no replication, so if the ZooKeeper process fails, the service goes down. This is fine for a development environment. To run ZooKeeper in replicated mode, see the section on replicated mode below.
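One way to confirm that the standalone server is accepting connections is to send it the four-letter command ruok on the client port; a healthy server answers imok. The sketch below is a minimal example under a few assumptions: the helper names four_letter_word and is_running are hypothetical, the server listens on 127.0.0.1:2181 as configured above, and the ruok command is permitted by the server (some releases restrict four-letter commands through the 4lw.commands.whitelist setting, in which case this check reports False even though the server is up).

```python
import socket


def four_letter_word(host, port, command, timeout=2.0):
    """Send a four-letter command and return the server's text reply.

    Hypothetical helper: reads until the server closes the connection.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", errors="replace")


def is_running(host="127.0.0.1", port=2181):
    """Return True if the server answers 'ruok' with 'imok'."""
    try:
        return four_letter_word(host, port, "ruok") == "imok"
    except OSError:
        # Connection refused, timeout, or another socket-level failure.
        return False
```

Calling is_running() with no arguments checks the clientPort used in this guide; pass a different host or port for other setups.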

Manage ZooKeeper storage

For long-running production systems, ZooKeeper storage (dataDir and the logs) must be managed externally. See the maintenance section of the Administrator's Guide for details.

Connect to ZooKeeper

bin/zkCli.sh -server 127.0.0.1:2181

Once ZooKeeper is running, this command connects to it and lets you perform simple, file-like operations.

Once you have connected, you should see something like the following:

Connecting to localhost:2181
log4j:WARN No appenders could be found for logger (org.apache.zookeeper.ZooKeeper).
log4j:WARN Please initialize the log4j system properly.
Welcome to ZooKeeper!
JLine support is enabled
[zkshell: 0]

From the shell, type help to get a list of the commands that the client can execute:

[zkshell: 0] help
ZooKeeper host:port cmd args
        get path [watch]
        ls path [watch]
        set path data [version]
        delquota [-n|-b] path
        quit
        printwatches on|off
        create path data acl
        stat path [watch]
        listquota path
        history
        setAcl path acl
        getAcl path
        sync path
        redo cmdno
        addauth scheme auth
        delete path [version]
        deleteall path
        setquota -n|-b val path

From here, you can try a few simple commands to get a feel for this command-line interface. Start by issuing the list command, ls:

[zkshell: 8] ls /
[zookeeper]

Next, create a new znode by running create /zk_test my_data. This creates a new znode and associates the string "my_data" with it. You should see:

[zkshell: 9] create /zk_test my_data
Created /zk_test

Run ls / again to see what the root now contains:

[zkshell: 11] ls /
[zookeeper, zk_test]

Notice that the zk_test znode now appears in the listing.

Next, you can confirm the data associated with this node by running the get command, as follows:

[zkshell: 12] get /zk_test
my_data
cZxid = 5
ctime = Fri Jun 05 13:57:06 PDT 2009
mZxid = 5
mtime = Fri Jun 05 13:57:06 PDT 2009
pZxid = 5
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0
dataLength = 7
numChildren = 0
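The lines printed after the data are the znode's stat fields. As a small sketch, assuming the output layout shown in this guide (other client releases may format it differently), the transcript can be split into the data and a dictionary of fields. The helper name parse_get_output is hypothetical.

```python
GET_OUTPUT = """\
my_data
cZxid = 5
ctime = Fri Jun 05 13:57:06 PDT 2009
mZxid = 5
mtime = Fri Jun 05 13:57:06 PDT 2009
pZxid = 5
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0
dataLength = 7
numChildren = 0
"""


def parse_get_output(text):
    """Split 'get' output into (data, stat dict) for the layout above.

    Hypothetical helper: first line is the data, the rest are 'key = value'.
    """
    lines = text.strip().splitlines()
    data = lines[0]
    stat = {}
    for line in lines[1:]:
        key, _, value = line.partition(" = ")
        stat[key] = value
    return data, stat


data, stat = parse_get_output(GET_OUTPUT)

# dataLength is the size of the stored data: "my_data" is 7 bytes.
print(len(data.encode("utf-8")), stat["dataLength"])  # 7 7
# A freshly created znode has never been modified, so dataVersion is 0.
print(stat["dataVersion"])  # 0
```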

We can also change the data associated with zk_test through the set command, as follows:

[zkshell: 14] set /zk_test junk
cZxid = 5
ctime = Fri Jun 05 13:57:06 PDT 2009
mZxid = 6
mtime = Fri Jun 05 14:01:52 PDT 2009
pZxid = 5
cversion = 0
dataVersion = 1
aclVersion = 0
ephemeralOwner = 0
dataLength = 4
numChildren = 0
[zkshell: 15] get /zk_test
junk
cZxid = 5
ctime = Fri Jun 05 13:57:06 PDT 2009
mZxid = 6
mtime = Fri Jun 05 14:01:52 PDT 2009
pZxid = 5
cversion = 0
dataVersion = 1
aclVersion = 0
ephemeralOwner = 0
dataLength = 4
numChildren = 0

(Notice that we ran get after setting the data, and the data did indeed change.)

Finally, delete the zk_test znode with the delete command:

[zkshell: 16] delete /zk_test
[zkshell: 17] ls /
[zookeeper]
[zkshell: 18]

For more information, please see the Programmer's Guide.

Programming ZooKeeper

ZooKeeper has Java and C bindings, which are functionally equivalent. The C binding exists in two variants, single-threaded and multi-threaded, which differ only in how the messaging loop is implemented. For sample code that uses the different APIs, see the programming examples in the ZooKeeper Programmer's Guide.

Running ZooKeeper in replicated mode

Running ZooKeeper in standalone mode is convenient for evaluation, development, and testing. In production, however, you should run ZooKeeper in replicated mode. A replicated group of servers in the same application is called a quorum, and in replicated mode all servers in the quorum have copies of the same configuration file.

Note: Replicated mode requires at least three servers, and it is strongly recommended that you use an odd number of servers. With only two servers, if one of them fails there are not enough machines left to form a majority quorum. Two servers are inherently less stable than a single server, because there are two single points of failure.
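The majority arithmetic behind this note can be worked through in a few lines. This is only a sketch of the counting argument, not ZooKeeper code: a quorum needs strictly more than half of the servers, so the number of failures an ensemble can survive is the ensemble size minus that majority.

```python
def majority(ensemble_size):
    """Smallest number of servers that is strictly more than half."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can fail while a majority is still available."""
    return ensemble_size - majority(ensemble_size)


for n in range(1, 6):
    print(n, majority(n), tolerated_failures(n))
# 1 1 0
# 2 2 0
# 3 2 1
# 4 3 1
# 5 3 2
```

Two servers tolerate no failures at all, and a fourth server tolerates no more failures than three do, which is why odd ensemble sizes are recommended.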

Every ZooKeeper server uses the same configuration file. It is similar to the one used in standalone mode above, with a few additions:

tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
initLimit=5
syncLimit=2
server.1=zoo1:2888:3888
server.2=zoo2:2888:3888
server.3=zoo3:2888:3888

The new entry initLimit is the timeout ZooKeeper uses to limit how long the servers in the quorum have to connect to a leader. The entry syncLimit limits how far out of date a server can be from a leader.
For both of these timeouts, you specify the length of time in units of tickTime. In this example, initLimit is 5 ticks at 2000 milliseconds per tick, or 10 seconds.
The server.X entries list the servers that make up the ZooKeeper service. When a server starts up, it determines which server it is by looking for the file myid in the data directory. That file contains the server number, in ASCII.
Finally, note the two port numbers after each server name: "2888" and "3888". Peers use the first port to connect to other peers. Such a connection is necessary so that peers can communicate, for example, to agree on the order of updates. More specifically, a ZooKeeper server uses this port to connect followers to the leader: when a new leader arises, a follower opens a TCP connection to the leader using this port. Because the default leader election also uses TCP, a second port is required for leader election. This is the second port in each server entry.
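To make the replicated configuration concrete, the sketch below converts the two limits from ticks to milliseconds, splits each server.X entry into host, peer port, and election port, and writes a myid file. The helper names (parse_cfg, server_entries, write_myid) are hypothetical, and the myid file is written to a temporary directory standing in for dataDir.

```python
import tempfile
from pathlib import Path

# The replicated configuration from this guide.
REPLICATED_CFG = """\
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
initLimit=5
syncLimit=2
server.1=zoo1:2888:3888
server.2=zoo2:2888:3888
server.3=zoo3:2888:3888
"""


def parse_cfg(text):
    """Parse key=value lines into a dict (hypothetical helper)."""
    pairs = (line.split("=", 1) for line in text.splitlines() if "=" in line)
    return {key.strip(): value.strip() for key, value in pairs}


def server_entries(cfg):
    """Map server id -> (host, peer_port, election_port)."""
    servers = {}
    for key, value in cfg.items():
        if key.startswith("server."):
            host, peer_port, election_port = value.split(":")
            server_id = int(key.split(".", 1)[1])
            servers[server_id] = (host, int(peer_port), int(election_port))
    return servers


def write_myid(data_dir, server_id):
    """Write the server number, in ASCII, to <data_dir>/myid."""
    path = Path(data_dir) / "myid"
    path.write_text(f"{server_id}\n", encoding="ascii")
    return path


cfg = parse_cfg(REPLICATED_CFG)
tick_ms = int(cfg["tickTime"])
init_limit_ms = int(cfg["initLimit"]) * tick_ms  # 5 ticks -> 10000 ms
sync_limit_ms = int(cfg["syncLimit"]) * tick_ms  # 2 ticks -> 4000 ms
servers = server_entries(cfg)

# A temporary directory stands in for the real dataDir of server 1.
with tempfile.TemporaryDirectory() as data_dir:
    myid_text = write_myid(data_dir, 1).read_text(encoding="ascii")
```

On a real ensemble, each machine would get a myid file in its own dataDir containing the X from its server.X line.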

Note: If you want to test multiple servers on a single machine, specify the server name as localhost with unique peer and leader election ports (for example 2888:3888, 2889:3889, 2890:3890) for each server.X entry in that server's configuration file. Separate dataDir directories and distinct client ports are also necessary. (For the replicated example above, running on a single machine, you would still have one configuration file per server.)
Please be aware that setting up multiple servers on a single machine does not create any redundancy. If something causes the machine to go down, all of the ZooKeeper servers go offline. Full redundancy requires that each server have its own machine, and it must be a completely separate physical server: multiple virtual machines on the same physical host are still vulnerable to the complete failure of that host.
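A single-machine test ensemble along these lines could be generated as in the sketch below. Everything apart from the 2888:3888, 2889:3889, 2890:3890 port pairs is an assumption made for illustration: the function name local_ensemble_configs, the /tmp/zk-local base directory, and the client ports 2181 to 2183.

```python
def local_ensemble_configs(count=3, base_dir="/tmp/zk-local",
                           client_port=2181, peer_port=2888,
                           election_port=3888):
    """Return {server_id: config_text} for a localhost-only test ensemble.

    Hypothetical helper; paths and client ports are illustrative defaults.
    """
    # Every server lists the whole ensemble, each with unique port pairs.
    server_lines = [
        f"server.{i}=localhost:{peer_port + i - 1}:{election_port + i - 1}"
        for i in range(1, count + 1)
    ]
    configs = {}
    for i in range(1, count + 1):
        lines = [
            "tickTime=2000",
            f"dataDir={base_dir}/server{i}",      # separate dataDir per server
            f"clientPort={client_port + i - 1}",  # distinct client port
            "initLimit=5",
            "syncLimit=2",
            *server_lines,
        ]
        configs[i] = "\n".join(lines) + "\n"
    return configs


configs = local_ensemble_configs()
print(configs[2])
```

Each generated dataDir would also need its own myid file containing the matching server number before the servers are started.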

Other optimizations

A few other configuration parameters can improve performance:
- To get low latency on updates, it is important to have a dedicated transaction log directory. By default, transaction logs are stored in the same directory as the data snapshots and the myid file. The dataLogDir parameter specifies a different directory to use for the transaction logs.


Origin blog.csdn.net/killingbow/article/details/53113966