1. The role of ZooKeeper
The main uses of ZooKeeper in Hadoop are:
1.1 High availability (HA) of the NameNode in HDFS and of the ResourceManager in YARN.
1.2 Storing the ResourceManager state (RMStateStore).
2. Reasons for Migration
The original design assumed that the NameNode uses little CPU, so ZooKeeper and the NameNode were deployed on the same machines to improve hardware utilization. Testing showed that the NameNode caches a large amount of data in memory, which lengthened ZooKeeper's response time. As a result, connections to ZooKeeper timed out and the NameNode and ResourceManager failed over frequently. After discussion, we decided to migrate ZooKeeper to DataNode machines.
3. Migration steps
3.1 Back up the ZooKeeper configuration on the original ZooKeeper machines, and the Hadoop configuration on the two NameNode nodes
cp -r zookeeper-3.4.10/conf zookeeper-3.4.10/conf.bak
cp -r hadoop-2.6.0/etc/hadoop hadoop-2.6.0/etc/hadoop.bak
3.2 Copy the ZooKeeper installation package to the three selected DataNode machines
scp zookeeper-3.4.10.tar.gz datanode1:/home/hadoop
scp zookeeper-3.4.10.tar.gz datanode2:/home/hadoop
scp zookeeper-3.4.10.tar.gz datanode3:/home/hadoop
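The three copies can also be written as a loop, which makes it easy to add the unpacking step that the later copy of zoo.cfg relies on. The following is a minimal sketch that only prints the commands; pipe its output to sh to run them. The function name deploy_cmds is hypothetical, and unpacking into /home/hadoop is an assumption based on the paths used in step 3.3.

```shell
# deploy_cmds PACKAGE HOST...
# Prints, for each host, the scp command that copies the package and the
# ssh command that unpacks it. Nothing is executed here.
deploy_cmds() {
    pkg=$1
    shift
    for host in "$@"; do
        printf 'scp %s %s:/home/hadoop\n' "$pkg" "$host"
        # Assumption: the archive is unpacked in place under /home/hadoop
        printf 'ssh %s tar -xzf /home/hadoop/%s -C /home/hadoop\n' "$host" "$pkg"
    done
}

deploy_cmds zookeeper-3.4.10.tar.gz datanode1 datanode2 datanode3
```

Printing first and executing second gives a chance to review the host list before anything is copied.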
3.3 Modify the configuration file, start ZooKeeper, and check its status
scp namenode:/home/hadoop/zookeeper-3.4.10/conf/zoo.cfg datanode1:/home/hadoop/zookeeper-3.4.10/conf
scp namenode:/home/hadoop/zookeeper-3.4.10/conf/zoo.cfg datanode2:/home/hadoop/zookeeper-3.4.10/conf
scp namenode:/home/hadoop/zookeeper-3.4.10/conf/zoo.cfg datanode3:/home/hadoop/zookeeper-3.4.10/conf
# Change the addresses in the configuration file to those of the new ZooKeeper cluster
vi zoo.cfg
server.1=datanode1:2888:3888
server.2=datanode2:2888:3888
server.3=datanode3:2888:3888
# Create a myid file on each node, in the dataDir set in zoo.cfg. Its content must match
# that node's server.N number in zoo.cfg: 1 on datanode1, 2 on datanode2, 3 on datanode3
vi myid
# Start ZooKeeper on each node
bin/zkServer.sh start
# Once all nodes have started, check the status on each node (expect one leader and two followers)
bin/zkServer.sh status
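Because the server.N entries and the myid values must agree on every node, deriving both from one ordered host list reduces the chance of a mismatch. The following is a minimal sketch, assuming the three hostnames and the 2888/3888 ports shown above. The function names gen_server_lines and myid_for are hypothetical helpers, not part of ZooKeeper.

```shell
# gen_server_lines HOST...
# Prints one "server.N=HOST:2888:3888" line per host, numbered from 1.
gen_server_lines() {
    n=1
    for host in "$@"; do
        printf 'server.%d=%s:2888:3888\n' "$n" "$host"
        n=$((n + 1))
    done
}

# myid_for TARGET HOST...
# Prints the position of TARGET in the host list, which is the value its
# myid file should contain. Returns 1 if TARGET is not in the list.
myid_for() {
    target=$1
    shift
    n=1
    for host in "$@"; do
        if [ "$host" = "$target" ]; then
            printf '%d\n' "$n"
            return 0
        fi
        n=$((n + 1))
    done
    return 1
}

gen_server_lines datanode1 datanode2 datanode3
```

On each node, the myid file could then be written with something like `myid_for "$(hostname)" datanode1 datanode2 datanode3 > myid` in the dataDir, provided hostname returns the same names that zoo.cfg uses.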
3.4 Modify the Hadoop configuration files and restart the ResourceManager
Update the ZooKeeper addresses in hdfs-site.xml, core-site.xml, and yarn-site.xml.
cd /home/hadoop/hadoop-2.6.0
sbin/yarn-daemon.sh stop resourcemanager
sbin/yarn-daemon.sh start resourcemanager
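The ZooKeeper address in these files is a comma-separated list of host:port pairs. The following is a minimal sketch for building that string, assuming the default client port 2181 (the original steps do not state the port) and a hypothetical helper name zk_quorum. In Hadoop 2.x HA setups this string is typically the value of ha.zookeeper.quorum and yarn.resourcemanager.zk-address, but the original does not list the property names, so check your own files.

```shell
# zk_quorum PORT HOST...
# Prints "HOST:PORT,HOST:PORT,..." for use as a ZooKeeper quorum string.
zk_quorum() {
    port=$1
    shift
    out=
    for host in "$@"; do
        # Prepend a comma only when out already holds an entry
        out="${out:+$out,}$host:$port"
    done
    printf '%s\n' "$out"
}

# Assumption: client port 2181, the ZooKeeper default
zk_quorum 2181 datanode1 datanode2 datanode3
```

Building the string once and pasting the same value into every file avoids the files drifting apart.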
3.5 Stop ZKFC and the NameNode
sbin/hadoop-daemon.sh stop zkfc
sbin/hadoop-daemon.sh stop namenode
3.6 Format the ZKFC state in ZooKeeper, then start ZKFC and the NameNode
bin/hdfs zkfc -formatZK
sbin/hadoop-daemon.sh start zkfc
sbin/hadoop-daemon.sh start namenode
3.7 Test HDFS and YARN availability
# Check that the web UIs at namenode:50070 and namenode:8088 load normally
# Upload a test file to HDFS to test HDFS availability
hdfs dfs -put test.txt /user/
# Run the wordcount example to test YARN availability
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /user/test.txt /user/output
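Since the migration was motivated by frequent failovers, it may also be worth confirming that exactly one NameNode is active once everything is back up. The following is a minimal sketch. The helper name exactly_one_active is hypothetical, and nn1 and nn2 in the usage comment are placeholder NameNode IDs that must be replaced with the IDs configured in hdfs-site.xml.

```shell
# exactly_one_active STATE...
# Succeeds only if exactly one argument is "active" and all the others are
# "standby". Any other value (for example an empty string from a failed
# query) makes it fail.
exactly_one_active() {
    active=0
    for state in "$@"; do
        case $state in
            active)  active=$((active + 1)) ;;
            standby) ;;
            *)       return 1 ;;
        esac
    done
    [ "$active" -eq 1 ]
}

# Usage on a live cluster (nn1 and nn2 are placeholder NameNode IDs):
#   exactly_one_active "$(hdfs haadmin -getServiceState nn1)" \
#                      "$(hdfs haadmin -getServiceState nn2)" && echo "HA state OK"
```

The same check could be applied to the ResourceManagers by querying their states with yarn rmadmin -getServiceState.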
4. Summary
ZooKeeper is sensitive to network, disk, and memory latency, so it should not share hosts with applications that make heavy use of network, disk, or memory. It is best to run it on dedicated machines.