Getting started with ZooKeeper: building standalone and cluster environments

1. ZooKeeper Overview

ZooKeeper is a distributed coordination service. It used to be a subproject of Apache Hadoop and is now a separate top-level Apache project. It is mainly used to solve data management problems that distributed applications often run into, such as unified naming services, state synchronization, cluster management, and management of distributed application configuration items. For background on distributed systems, see the previous post: Distributed system problems and solutions.

2. Design goals

  • ZooKeeper is simple. ZooKeeper allows distributed processes to coordinate with each other through a shared hierarchical namespace that is organized much like a standard file system. The namespace consists of data registers (called znodes in ZooKeeper terms), which are similar to files and directories. Unlike a typical file system, which is designed for storage, ZooKeeper keeps its data in memory, which means that it can achieve high throughput and low latency.
    ZooKeeper's key properties are high performance, high availability, and strictly ordered access. The performance means that it can be used in large distributed systems. The reliability keeps it from becoming a single point of failure. The strict ordering means that sophisticated synchronization primitives can be implemented on the client.
  • ZooKeeper is replicated. Like the distributed processes it coordinates, ZooKeeper itself is meant to be replicated over a set of hosts called an ensemble. The servers that make up the ZooKeeper service must all know about each other. They maintain an in-memory image of the state, along with transaction logs and snapshots in persistent storage. As long as a majority of the servers are available, the ZooKeeper service is available. A client connects to a single ZooKeeper server. It maintains a TCP connection through which it sends requests, gets responses, receives watch events, and sends heartbeats. If the TCP connection to the server breaks, the client connects to a different server.
  • ZooKeeper is ordered. ZooKeeper stamps each update with a number that reflects the order of all ZooKeeper transactions. Subsequent operations can use this order to implement higher-level abstractions, such as synchronization primitives.
  • ZooKeeper is fast. It is especially fast in read-dominant workloads. ZooKeeper applications run on thousands of machines, and it performs best where reads are more common than writes, at ratios of around 10:1.
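
The "majority of servers" rule above comes down to simple arithmetic. The following is a minimal sketch of it; the function names quorum_size and tolerated_failures are made up for this illustration and are not part of ZooKeeper.

```shell
#!/bin/sh
# Majority quorum arithmetic for an ensemble of N voting servers.
# quorum_size and tolerated_failures are hypothetical helper names.

quorum_size() {
    # Smallest number of servers that forms a strict majority of $1.
    echo $(( $1 / 2 + 1 ))
}

tolerated_failures() {
    # How many servers may fail while a majority is still reachable.
    echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 3 4 5; do
    echo "servers=$n quorum=$(quorum_size "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

Three servers and four servers both tolerate only one failure, which is why ensembles are usually given an odd number of voting members.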

For more details, see the official documentation: http://zookeeper.apache.org/doc/r3.6.0/zookeeperOver.html

3. Downloading and installing ZooKeeper

3.1 Download

Download page: http://zookeeper.apache.org/releases.html . From that page you can download the latest release directly.

3.2 Extract the archive to a path of your choice


3.3 Modify the configuration

Go into the conf directory and copy zoo_sample.cfg to zoo.cfg

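# Heartbeat interval, in milliseconds, between ZooKeeper servers and between clients and servers: one heartbeat is sent every tickTime.
tickTime=2000

# Maximum number of heartbeat intervals (ticks) the leader tolerates while a follower makes its initial connection. If the leader has received no response from the follower after 10 ticks, the follower's connection is considered failed. The total time is 10*2000 ms = 20 seconds.
initLimit=10

# Maximum number of ticks allowed between a request and its response when the leader and a follower exchange messages. The total time is 5*2000 ms = 10 seconds.
syncLimit=5

# The directory where ZooKeeper stores its data. By default, ZooKeeper also writes its transaction log files to this directory.
dataDir=/FreeofInstallation/apache-zookeeper-3.6.0-bin/data

# The port on which ZooKeeper listens for client connections.
clientPort=2181

# Limit on the number of connections between a single client and a single server, applied per IP address. The default is 60; 0 means no limit. Note the scope: it limits connections from one client machine to one ZooKeeper server. It is not a limit for a specified client IP, for the whole cluster, or for all clients of a single server.
#maxClientCnxns=60

# Used together with the next parameter: the number of snapshot files to retain. The default is 3. Available since 3.4.
#autopurge.snapRetainCount=3

# Since 3.4.0, ZooKeeper can automatically purge old transaction logs and snapshot files. This parameter sets the purge frequency in hours and takes an integer of 1 or greater. The default is 0, which disables automatic purging.
#autopurge.purgeInterval=1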
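
The two time windows in the comments above (initLimit and syncLimit multiplied by tickTime) can be computed straight from a configuration file. This is only a sketch: cfg_get and limit_ms are hypothetical helper names, and the demo writes its own temporary file rather than reading a real zoo.cfg.

```shell
#!/bin/sh
# Read a key's value from a zoo.cfg-style file (key=value, '#' comments).
# cfg_get and limit_ms are hypothetical helper names.

cfg_get() {
    # $1 = config file, $2 = key; prints the last uncommented value.
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1" | tail -n 1
}

limit_ms() {
    # $1 = config file, $2 = initLimit or syncLimit; prints limit * tickTime.
    echo $(( $(cfg_get "$1" "$2") * $(cfg_get "$1" tickTime) ))
}

# Demo with the values from the configuration above, in a temporary file.
cfg=$(mktemp)
printf 'tickTime=2000\ninitLimit=10\nsyncLimit=5\n' > "$cfg"
echo "initLimit window: $(limit_ms "$cfg" initLimit) ms"
echo "syncLimit window: $(limit_ms "$cfg" syncLimit) ms"
rm -f "$cfg"
```

With the values shown, the windows come out as 20000 ms and 10000 ms, matching the 20 and 10 seconds stated in the comments.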

3.4 Starting and stopping the service

Go into the bin directory of the extracted ZooKeeper package, or add that directory to the PATH environment variable for convenience (on Windows, use the scripts with the .cmd suffix).

Start the service

./zkServer.sh start
# On success, the following message is printed:
# Starting zookeeper ... STARTED

Stop the service

./zkServer.sh stop
# After the service stops, the following message is printed:
# Stopping zookeeper ... STOPPED

Start the client

./zkCli.sh 
# Once connected, you are dropped into the client shell:
# [zk: localhost:2181(CONNECTED) 0]

4. Building a ZooKeeper cluster on a single machine

4.1 Build the cluster

Go into the conf directory and make three copies of the configuration file

cp zoo.cfg zoo-1.cfg
cp zoo.cfg zoo-2.cfg
cp zoo.cfg zoo-3.cfg

Create the data directories

mkdir -p /FreeofInstallation/zookeeper/data-1
mkdir -p /FreeofInstallation/zookeeper/data-2
mkdir -p /FreeofInstallation/zookeeper/data-3

Create the log directories

mkdir -p /FreeofInstallation/zookeeper/log-1
mkdir -p /FreeofInstallation/zookeeper/log-2
mkdir -p /FreeofInstallation/zookeeper/log-3

Create the myid files (each node needs its own id, matching its server.x entry)

echo "1" > /FreeofInstallation/zookeeper/data-1/myid
echo "2" > /FreeofInstallation/zookeeper/data-2/myid
echo "3" > /FreeofInstallation/zookeeper/data-3/myid

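Modify the three configuration files zoo-1.cfg, zoo-2.cfg and zoo-3.cfg. They differ only in dataDir, dataLogDir and clientPort.

zoo-1.cfg

tickTime=2000
initLimit=10
syncLimit=5
# Data directory
dataDir=/FreeofInstallation/zookeeper/data-1
# Transaction log directory
dataLogDir=/FreeofInstallation/zookeeper/log-1
clientPort=2181

# The x in server.x must match the content of the myid file created above.
# The first port is used for data synchronization, the second for leader election voting.
server.1=localhost:2887:3887
server.2=localhost:2888:3888
server.3=localhost:2889:3889

zoo-2.cfg

tickTime=2000
initLimit=10
syncLimit=5
# Data directory
dataDir=/FreeofInstallation/zookeeper/data-2
# Transaction log directory
dataLogDir=/FreeofInstallation/zookeeper/log-2
clientPort=2182

# The x in server.x must match the content of the myid file created above.
# The first port is used for data synchronization, the second for leader election voting.
server.1=localhost:2887:3887
server.2=localhost:2888:3888
server.3=localhost:2889:3889

zoo-3.cfg

tickTime=2000
initLimit=10
syncLimit=5
# Data directory
dataDir=/FreeofInstallation/zookeeper/data-3
# Transaction log directory
dataLogDir=/FreeofInstallation/zookeeper/log-3
clientPort=2183

# The x in server.x must match the content of the myid file created above.
# The first port is used for data synchronization, the second for leader election voting.
server.1=localhost:2887:3887
server.2=localhost:2888:3888
server.3=localhost:2889:3889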
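
Since the three nodes differ only in their index, the directory, myid and configuration steps can also be generated in a loop. The sketch below assumes the same ports and settings as the files above; generate_ensemble is a hypothetical helper, and the demo writes into a scratch directory instead of /FreeofInstallation.

```shell
#!/bin/sh
# Generate data/log directories, myid files and zoo-N.cfg files for a
# three-node ensemble on one machine. generate_ensemble is a hypothetical
# helper; the ports follow the configuration files shown above.

generate_ensemble() {
    base=$1      # parent of data-N and log-N, e.g. /FreeofInstallation/zookeeper
    conf=$2      # directory that receives zoo-N.cfg
    mkdir -p "$conf"
    for i in 1 2 3; do
        mkdir -p "$base/data-$i" "$base/log-$i"
        # myid must hold the same number as this node's server.N entry.
        echo "$i" > "$base/data-$i/myid"
        {
            echo "tickTime=2000"
            echo "initLimit=10"
            echo "syncLimit=5"
            echo "dataDir=$base/data-$i"
            echo "dataLogDir=$base/log-$i"
            echo "clientPort=$((2180 + i))"
            # Every node lists all members with identical server.N lines.
            for j in 1 2 3; do
                echo "server.$j=localhost:$((2886 + j)):$((3886 + j))"
            done
        } > "$conf/zoo-$i.cfg"
    done
}

# Demo in a scratch directory so nothing real is overwritten.
scratch=$(mktemp -d)
generate_ensemble "$scratch/zookeeper" "$scratch/conf"
cat "$scratch/conf/zoo-2.cfg"
rm -rf "$scratch"
```

To use it for real, one would pass the actual base directory and the ZooKeeper conf directory as the two arguments.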

Start the cluster

./zkServer.sh start ../conf/zoo-1.cfg
./zkServer.sh start ../conf/zoo-2.cfg
./zkServer.sh start ../conf/zoo-3.cfg

Check the node status (here node 2 is the leader; nodes 1 and 3 are followers)

./zkServer.sh status ../conf/zoo-1.cfg
./zkServer.sh status ../conf/zoo-2.cfg
./zkServer.sh status ../conf/zoo-3.cfg
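
The status output includes a line of the form "Mode: leader" or "Mode: follower". If you want to pick out just the role, for example in a script, something like the following sketch could work. zk_mode is a hypothetical helper name, and the demo feeds it sample text rather than querying a live node.

```shell
#!/bin/sh
# Extract the role from `zkServer.sh status` output, which includes a line
# such as "Mode: leader". zk_mode is a hypothetical helper name.

zk_mode() {
    # Reads status output on stdin; prints the text after "Mode:".
    sed -n 's/^Mode:[[:space:]]*//p'
}

# Demo on sample text; against a live node one would pipe in
#   ./zkServer.sh status ../conf/zoo-1.cfg 2>&1
printf 'Using config: ../conf/zoo-2.cfg\nMode: leader\n' | zk_mode
```

If the node is not running, no Mode line is printed and the function outputs nothing.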


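Add an Observer node
The other steps are the same (a data-4 and log-4 directory, and a myid file containing 4); only the configuration file (zoo-ob.cfg) differs slightly. The server.4 line with its :observer suffix should also be added to the configuration files of the other nodes.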

tickTime=2000
initLimit=10
syncLimit=5
# Data directory
dataDir=/FreeofInstallation/zookeeper/data-4
# Transaction log directory
dataLogDir=/FreeofInstallation/zookeeper/log-4
clientPort=2184
# Mark this node as an observer
peerType=observer
# The x in server.x must match the content of the myid file created above.
# The first port is used for data synchronization, the second for leader election voting.
server.1=localhost:2887:3887
server.2=localhost:2888:3888
server.3=localhost:2889:3889
server.4=localhost:2886:3886:observer

Start the node and check its status in the same way

4.2 Cluster roles

  • The leader is the core of the ZooKeeper cluster. It initiates votes and makes the final resolution on them, and it handles client requests.
  • A follower handles clients' non-transactional requests, forwards transactional requests to the leader, and takes part in voting, including leader election.
  • An observer watches the latest state changes in the ZooKeeper cluster and synchronizes them to itself, but it does not take part in voting. It works essentially like a follower; the only difference is that an observer does not participate in any kind of vote, neither leader election nor votes on transaction proposals. Put simply, an observer serves only non-transactional requests, and it is typically used to increase the cluster's capacity for non-transactional requests without affecting its transaction processing capacity.
  • Learner is the collective term for servers that synchronize their state with the leader; both followers and observers are learners.
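
Because observers do not vote, only the server.x lines without the :observer suffix count toward the majority. As a rough illustration, the sketch below counts both kinds of member in a configuration file. count_voters and count_observers are hypothetical helper names, and the demo uses the server list from the observer configuration in section 4.1.

```shell
#!/bin/sh
# Count voting members (server.N lines without an :observer suffix) and
# observers in a zoo.cfg-style file. Helper names are hypothetical.

count_observers() {
    grep -c '^server\.[0-9][0-9]*=.*:observer[[:space:]]*$' "$1"
}

count_voters() {
    total=$(grep -c '^server\.[0-9][0-9]*=' "$1")
    echo $(( total - $(count_observers "$1") ))
}

# Demo with the server list from the observer configuration.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
server.1=localhost:2887:3887
server.2=localhost:2888:3888
server.3=localhost:2889:3889
server.4=localhost:2886:3886:observer
EOF
voters=$(count_voters "$cfg")
echo "voters=$voters observers=$(count_observers "$cfg") quorum=$(( voters / 2 + 1 ))"
rm -f "$cfg"
```

Adding the fourth node as an observer leaves the voting set at three, so the majority needed for writes and elections stays at two.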

5. What ZooKeeper can do

  • Naming Service (Name Service)
    ZooKeeper can act as a distributed naming service. By calling the ZooKeeper node creation API you can easily create a globally unique path, and this path can be used as a name. These paths are hierarchical, which makes them easy to understand and manage.
  • Configuration Management
    Configuration management is very common in distributed application environments. For example, the same application may run on several servers and share some configuration items. To change those shared items, you would have to modify the configuration on every server that runs the application, which is cumbersome and error-prone.
    Such configuration can be handed over to ZooKeeper: store the configuration in a ZooKeeper node, and have every application that uses it watch that node. Once the configuration changes, ZooKeeper notifies each application, which then fetches the new configuration from ZooKeeper and applies it.
  • Cluster Management (Group Membership)
    Cluster management with ZooKeeper focuses on two things: monitoring whether machines leave or join the cluster, and electing a master.
    On the first point, the traditional approach is for a monitoring system to probe each machine periodically by some means (such as ping), or for each machine to report "I'm alive" to the monitoring system at regular intervals. This works, but it has two obvious problems: 1) when the machines in the cluster change, many things have to be modified; 2) there is a certain delay.
    With two ZooKeeper features, you can build a different, real-time system for monitoring machine liveness: all machines agree to create an ephemeral node under a parent node (such as /GroupMembers) and then watch that parent node for changes to its children. When a machine goes down, its connection to ZooKeeper is lost, the ephemeral node it created is removed, and all other machines are notified that a node has been deleted, that is, a machine has gone down. A new machine joining works in a similar way.
    On the second point, in a distributed environment the same application runs on different machines, and some business logic (e.g., time-consuming computation or network I/O) often needs to be executed by only one machine in the cluster, with the rest sharing the result. This greatly reduces duplicated work and improves performance, so master election is the main problem in this scenario. ZooKeeper's strong consistency guarantees that node creation is globally unique even under highly concurrent distributed access: when multiple clients request to create the /currentMaster node, only one of them succeeds. With this feature, electing a master in a distributed environment is straightforward.
  • Distributed Lock
    A distributed lock protects resources shared across processes, hosts and networks in a distributed environment, providing mutually exclusive access and ensuring consistency. Distributed locks are possible mainly because ZooKeeper guarantees data consistency: users can rely on the data of a given znode being consistent across every node (ZooKeeper server) in the cluster.
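
The "only one creator succeeds" idea behind master election can be tried out without a ZooKeeper server. The sketch below is a local analogy only: it uses the fact that mkdir either creates a directory or fails, so exactly one contender wins. It does not talk to ZooKeeper, and try_become_master is a hypothetical helper name.

```shell
#!/bin/sh
# Local analogy only: mkdir either creates the directory or fails, so exactly
# one contender wins, much like one client succeeding in creating
# /currentMaster. This does not talk to ZooKeeper. try_become_master is a
# hypothetical helper name.

try_become_master() {
    # $1 = election directory, $2 = contender name
    if mkdir "$1/currentMaster" 2>/dev/null; then
        echo "$2" > "$1/currentMaster/owner"
        return 0
    fi
    return 1
}

election=$(mktemp -d)
for node in node-a node-b node-c; do
    if try_become_master "$election" "$node"; then
        echo "$node became master"
    else
        echo "$node lost; master is $(cat "$election/currentMaster/owner")"
    fi
done
rm -rf "$election"
```

One important difference from the real thing: a directory stays behind if its owner crashes, whereas an ephemeral znode is removed when the owner's session ends, which is what lets the remaining machines elect a new master.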

For more detailed usage scenarios and ideas, see: ZooKeeper (b) What ZooKeeper do?
The next post covers operations: Zookeeper client node and basic operation


Origin blog.csdn.net/qq_43792385/article/details/104833440