ZooKeeper 01 - What is ZooKeeper + Cluster Deployment

1. What is ZooKeeper

The literal meaning of ZooKeeper is "zookeeper": it acts as the administrator for Hadoop (the elephant), Hive (the bee), Pig (the pig), and other big-data frameworks from the Apache Software Foundation. Distributed clusters such as Apache HBase and Apache Solr also use ZooKeeper.

(1) Baidu Encyclopedia's explanation:

  • ZooKeeper is a highly available, distributed coordination service for distributed programs; it is an open-source implementation of Google's Chubby and a key component of Hadoop and HBase.

  • It provides consistency services for distributed applications, including configuration maintenance, naming (domain name) service, distributed synchronization, and group services.

  • ZooKeeper's goal is to encapsulate complex, error-prone critical services and expose to users an easy-to-use interface with efficient performance and stable functionality.

(2) ZooKeeper's basic operating process:

① Elect a Leader;
② Synchronize data;
③ There are many algorithms for electing a Leader, but the election criteria they implement are consistent;
④ The Leader holds the highest execution ID, analogous to root privileges;
⑤ The election completes once a majority of the machines in the cluster respond and accept the elected Leader.
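The majority rule in step ⑤ can be illustrated with a quick shell calculation (a toy sketch only, not ZooKeeper's actual election logic; the vote counts are made up):

```shell
# Toy majority ("quorum") check: a Leader is accepted only when more
# than half of the ensemble's servers respond positively.
ensemble_size=3
accepted=2

quorum=$(( ensemble_size / 2 + 1 ))   # majority threshold: 2 of 3
if [ "$accepted" -ge "$quorum" ]; then
  echo "Leader elected"
else
  echo "No quorum yet"
fi
```

With 2 of 3 servers accepting, the quorum of 2 is reached and the script prints "Leader elected".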

2. ZooKeeper Features

2.1 Configuration Management

Most development projects involve various kinds of configuration information, such as JDBC connection settings. This information is usually placed in dedicated configuration files that the code then reads.

- This is common practice for single-server applications, but in a large application, especially a distributed project, there are many configuration files, and the configuration on multiple servers must stay consistent. If the configuration is modified frequently, plain configuration files are no longer a good approach.

- Modifying them manually one by one is not feasible: the repetitive work is too great, the chance of error is higher, and the maintenance cost is too high.

In such cases we often need centralized configuration management: the configuration is modified in one central place, and every service that depends on it picks up the change.

Also to consider: since applications on multiple servers depend on this configuration, the centralized configuration service must itself be highly reliable for the applications to run reliably.

Based on the above analysis, we can provide the configuration service through a cluster to guarantee reliability. The remaining question is: how do we keep the configuration consistent across the cluster?

To provide this consistency, our predecessors proposed consistency protocols for implementing such services. ZooKeeper uses one of them, the Zab protocol, to guarantee consistency.

Scenarios:

○ In HBase, the client connects to ZooKeeper to obtain the necessary configuration information about the HBase cluster before it can perform further operations.
○ The open-source message queue Kafka uses ZooKeeper to maintain broker information.
○ Dubbo, Alibaba's widely used open-source SOA framework, uses ZooKeeper to manage configuration information and implement service governance.
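With a running ensemble, the centralized-configuration idea looks like this in ZooKeeper's command-line client zkCli.sh (an illustrative session; the /config/jdbc znode and its value are made up, and the stat lines normally printed by get are omitted):

```
[zk: localhost:2181(CONNECTED) 0] create /config ""
[zk: localhost:2181(CONNECTED) 1] create /config/jdbc "jdbc:mysql://db:3306/app"
[zk: localhost:2181(CONNECTED) 2] get /config/jdbc
jdbc:mysql://db:3306/app
[zk: localhost:2181(CONNECTED) 3] set /config/jdbc "jdbc:mysql://newdb:3306/app"
```

Clients that have set a watch on /config/jdbc are notified of the set, so the configuration is changed once in the central place and picked up by every dependent service.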

2.2 Naming Service

Scenario: to access a system over the network, we need to know the other party's IP address, but an IP address is a string of numbers that is hard to remember and not user-friendly, so people came up with domain names for reaching a given IP address.

But computers do not recognize domain names. To solve this, designers initially stored a "domain name to IP address" mapping on each computer. A new question then arises: if the IP address behind a domain name changes, how should the mapping be updated everywhere?

Our predecessors therefore designed DNS (Domain Name System). We only need to reach a node that every machine already knows, and DNS tells us which IP address the domain name we are visiting currently maps to; in other words, DNS provides a unified access entry point.

The same problem exists in application development, especially in applications with a large number of services: if service addresses are stored locally, other users cannot obtain those addresses to access the services. But if we provide users with a unified entry point that maps each user request to the corresponding local address, this kind of problem is solved.
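As a toy illustration of the unified-entry idea (the service names and addresses below are invented), a lookup function maps a stable name to whatever address is currently correct, so callers never store addresses themselves:

```shell
# Toy name service: clients remember only the stable name; when a
# service moves, only this mapping table changes.
lookup() {
  case "$1" in
    order-service)   echo "192.168.1.10:8080" ;;
    payment-service) echo "192.168.1.11:8080" ;;
    *)               echo "unknown" ;;
  esac
}

lookup order-service    # prints 192.168.1.10:8080
```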

2.3 Distributed Lock

ZooKeeper is a distributed coordination service, and we can use it to coordinate activities among multiple distributed processes.

For example, in a distributed environment, to improve system reliability, every server in the cluster deploys the same service.

These identical services must perform the same tasks, and to keep their data consistent, the cluster members must coordinate with one another; solving this coordination problem with conventional programming is complex and cumbersome.

The usual practice is to use a distributed lock: at any moment only one service is doing the work, and when that service has a problem, it releases the lock immediately and fails over to another service. This design is called Leader Election, and HBase's Master uses exactly this mechanism.

Note: a distributed lock differs from an in-process lock and should be used with more caution.
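As a single-host analogy only (not ZooKeeper's mechanism: ZooKeeper locks are usually built on ephemeral sequential znodes and need a running ensemble), the "only one service works at a time" idea can be sketched with an atomic mkdir; the lock path is made up:

```shell
# Single-host lock analogy: mkdir is atomic, so exactly one process
# succeeds in creating the lock directory and becomes the "leader".
LOCK=/tmp/myservice.lock   # illustrative path

if mkdir "$LOCK" 2>/dev/null; then
  echo "lock acquired, doing work"
  # ... work that only the leader may do ...
  rmdir "$LOCK"            # release the lock when finished
else
  echo "another process holds the lock"
fi
```

The key difference from a real ZooKeeper lock: an ephemeral znode disappears automatically when its owner's session dies, so a crashed leader cannot leave the lock stuck, whereas the local directory above would remain.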

2.4 Cluster Management

In a distributed cluster, hardware and software failures, power outages, and network problems cause node churn: new nodes join the cluster and old nodes leave it. In these situations, the other nodes in the cluster must be able to perceive such changes and then make corresponding decisions based on them.

Scenarios:

○ In a distributed storage system there is a central control node responsible for allocating storage. When a new storage node joins, storage must be allocated dynamically according to the cluster's current state, which requires real-time awareness of cluster status.
○ In a distributed SOA architecture, a service is provided by a cluster; when a consumer accesses the service, some mechanism is needed to discover which nodes in the cluster can provide it (this is known as service discovery; for example, Alibaba's open-source SOA framework Dubbo uses ZooKeeper as the underlying mechanism for service discovery).
○ The open-source message queue Kafka manages Consumer joins and departures through ZooKeeper.

3 ZooKeeper cluster deployment

3.1 Download and unzip the installation package

ZooKeeper download page: http://hadoop.apache.org/zookeeper/releases.html.

# After downloading, upload the package to a dedicated directory, here /data/zookeeper:
mkdir -p /data/zookeeper && cd /data/zookeeper
# Unpack the ZooKeeper archive:
tar -zxf zookeeper-3.4.10.tar.gz

3.2 Creating the data and datalog directories

# Create the directories under /data/zookeeper, matching the
# dataDir and dataLogDir paths configured in zoo.cfg later
cd /data/zookeeper

# data holds ZooKeeper's snapshot data, datalog its transaction logs
# If datalog is not specified, the logs default to the data directory
mkdir data datalog

# Grant the current user write (and traverse) permission on the directories
chmod 755 data datalog

3.3 Creating the myid file

Create a file named myid in the data directory. The file contains a single line: the id of this node, matching the id in the server.id entries of zoo.cfg (for server.1, the myid file contains 1).

# Write the node's id into the myid file
echo 1 > /data/zookeeper/data/myid

# Verify that the write succeeded
cat /data/zookeeper/data/myid
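To see how myid differs per node, here is a harmless local simulation that writes each node's id under a throwaway /tmp path (illustrative paths only; on the real cluster, each server writes just its own id into its own dataDir):

```shell
# Simulate the data directories of three nodes locally.
for id in 1 2 3; do
  mkdir -p /tmp/zkdemo/node$id/data
  echo "$id" > /tmp/zkdemo/node$id/data/myid
done

cat /tmp/zkdemo/node2/data/myid   # prints 2
```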

3.4 Modifying the configuration file zoo.cfg

(1) Modify as follows:

cd /data/zookeeper/zookeeper-3.4.10/conf

# Copy the sample file and rename it zoo.cfg:
cp zoo_sample.cfg zoo.cfg

# Edit the zoo.cfg file:
vim zoo.cfg

# Add the following content:
dataDir=/data/zookeeper/data
dataLogDir=/data/zookeeper/datalog
server.1=zoo1:2888:3888
server.2=zoo2:2888:3888
server.3=zoo3:2888:3888

(2) Explanation of the zoo.cfg configuration:

  # Basic time unit in milliseconds, used to regulate heartbeats and timeouts.
  tickTime=2000
  
  # The cluster contains multiple Servers: one Leader, the rest Followers. initLimit is the maximum time (in multiples of tickTime) a Follower may take to connect and sync to the Leader during initialization; if it is exceeded, the connection fails.
  initLimit=5
  
  # Maximum time, in multiples of tickTime, for a request/response exchange between Leader and Follower. A Follower that cannot communicate with the Leader within this time is dropped.
  syncLimit=2
  
  # Directory for ZooKeeper's runtime data; must be created in advance.
  dataDir=/data/zookeeper/data
  
  # Log directory; if this parameter is not set, it defaults to dataDir. Must be created in advance.
  # Choose the log directory carefully: a dedicated log storage device can greatly improve system performance.
  dataLogDir=/data/zookeeper/datalog
  
  # Port listening for client connections.
  clientPort=2181
  
  # Maximum number of client connections to ZooKeeper (limits concurrent connections, distinguishing clients by IP). This option can mitigate certain DoS attacks; setting it to 0, or not setting it, removes the limit on concurrent connections.
  maxClientCnxns=0
  
  # Minimum session timeout; defaults to 2 * tickTime.
  minSessionTimeout=4000
  
  # Maximum session timeout; defaults to 20 * tickTime.
  maxSessionTimeout=10000
  
  # Information about each node in the cluster (server.id=host:port1:port2)
  server.1=zoo1:2888:3888 
  server.2=zoo2:2888:3888 
  server.3=zoo3:2888:3888

(3) Notes on server.id=host:port1:port2:

id is the number of each ZooKeeper node, stored in the myid file under the dataDir directory;

zoo1 ~ zoo3 are the hostnames (or IP addresses) of the ZooKeeper nodes; the hostname-to-IP mappings are configured in the system file /etc/hosts;

port1 specifies the port the Leader uses to communicate with the other Servers in the cluster;

port2 specifies the port the cluster uses when electing a Leader.
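For the zoo1 ~ zoo3 hostnames to resolve, /etc/hosts on every node would contain entries along the following lines (the IP addresses are placeholders; substitute your own):

```
192.168.1.101  zoo1
192.168.1.102  zoo2
192.168.1.103  zoo3
```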

(4) Common errors:

○ clientPort must not be the same as port1 or port2; otherwise the cluster will not start.
○ In a pseudo-distributed configuration (i.e. simulating a cluster on a single server), port1 and port2 must differ between the Servers.
○ In a pseudo-distributed configuration, dataDir and dataLogDir must also be configured differently for each Server.

3.5 Deploying the service on the other nodes

# Copy the ZooKeeper folder to the other servers (zoo2 and zoo3) --- make sure the target path /data exists:
scp -r /data/zookeeper zoo2:/data/
scp -r /data/zookeeper zoo3:/data/

# On zoo2 and zoo3 respectively, modify ZooKeeper's myid (under the dataDir configured in zoo.cfg):
echo 2 > /data/zookeeper/data/myid    # on zoo2
echo 3 > /data/zookeeper/data/myid    # on zoo3

4 Start ZooKeeper cluster

4.1 turn off the firewall

ZooKeeper clients use port 2181. For ZooKeeper to be reachable from outside, port 2181 must be opened, or the firewall turned off:

(1) Commands for CentOS versions before 7:

# Check the firewall status:
service iptables status

# Temporarily stop the firewall:
service iptables stop

# Permanently disable the firewall (do not start it on boot):
chkconfig iptables off

(2) Starting with CentOS 7, systemctl is used to manage services and programs, replacing service and chkconfig:

# Check the firewall status:
systemctl status firewalld.service

# Temporarily stop the firewall:
systemctl stop firewalld.service

# Permanently disable the firewall (do not start it on boot):
systemctl disable firewalld.service

4.2 start ZooKeeper cluster

(1) Startup procedure:

# Log in to each of the three servers in turn and run:
cd /data/zookeeper/zookeeper-3.4.10/bin
./zkServer.sh start

# Check ZooKeeper's running status:
./zkServer.sh status

(2) Common error: when checking ZooKeeper's status, you may find that the console throws an error:

[root@localhost bin]# ./zkServer.sh status 
JMX enabled by default 
Using config: /data/zookeeper/zookeeper-3.4.10/bin/../conf/zoo.cfg 
Error contacting service. It is probably not running 

Cause: this is a cluster deployment and the other servers have not been started yet. The current node initiates a Leader election against the server list configured in zoo.cfg, cannot communicate with the other nodes of the cluster, and therefore throws this error.

Error resolution: once the ZooKeeper service on a second node is started, a Leader is elected and the error disappears. In a ZooKeeper cluster of 2n + 1 servers, the service stays available as long as no more than n servers are down.
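The 2n + 1 rule can be checked with a quick calculation (a sketch: for an ensemble of s servers the majority quorum is s/2 + 1, so the cluster tolerates s minus quorum failures):

```shell
# Fault tolerance for common ensemble sizes: a majority must survive.
for size in 3 5 7; do
  quorum=$(( size / 2 + 1 ))
  tolerated=$(( size - quorum ))
  echo "ensemble of $size: quorum $quorum, tolerates $tolerated failure(s)"
done
```

This is why starting the second of three servers is enough: 2 of 3 is already a majority.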

4.3 ZooKeeper common commands

# Start the service:
sh zkServer.sh start 

# Check the service status:
sh zkServer.sh status 

# Stop the service:
sh zkServer.sh stop 

# Restart the service:
sh zkServer.sh restart


Origin blog.csdn.net/shichen2010/article/details/104550403