Hadoop cluster ZooKeeper migration

  1. The role of zookeeper

    The main applications of ZooKeeper in Hadoop are:

    1.1 HA of NameNode in HDFS and HA of ResourceManager in YARN.

    1.2 Storing the ResourceManager state information (RMStateStore).

  2. Reasons for Migration

    In the original design, because the namenode uses little CPU, zookeeper was deployed on the same devices as the namenode to improve equipment utilization. Testing showed that the namenode caches a large amount of data in memory, which lengthened zookeeper's response time. As a result, the namenode and resourcemanager failed over frequently because their connections to zookeeper timed out. After discussion, it was decided to migrate zookeeper to datanode nodes.
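
    One way to observe the longer response times described above is zookeeper's "mntr" four-letter command, which reports latency counters. The following is a minimal sketch, not the method used in the original test; the host name, the client port 2181 and the helper names zk_metric and latency_too_high are assumptions made for illustration.

```shell
#!/bin/sh
# Extract one metric from the output of ZooKeeper's "mntr" command.
# mntr prints one "name<TAB>value" pair per line, e.g. zk_max_latency.
zk_metric() {
    # $1 = metric name; the mntr output is read from stdin
    awk -v name="$1" '$1 == name { print $2 }'
}

# Succeed when the observed latency (ms) is above a threshold (ms).
# The threshold is whatever the caller chooses; none is recommended here.
latency_too_high() {
    # $1 = observed latency, $2 = threshold
    [ "$1" -gt "$2" ]
}

# Usage against a live server (host and port 2181 are assumptions):
#   echo mntr | nc namenode 2181 | zk_metric zk_max_latency
```

    Comparing zk_avg_latency and zk_max_latency before and after the migration gives a rough indication of whether the co-location was the cause.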

  3. Migration steps

    3.1 Back up the zookeeper configuration on the original zookeeper cluster devices, and the hadoop configuration on the two namenode nodes

cp -r zookeeper-3.4.10/conf zookeeper-3.4.10/conf.bak
cp -r hadoop-2.6.0/etc/hadoop hadoop-2.6.0/etc/hadoop.bak   

    3.2 Copy the zookeeper installation package to the selected three datanode devices

scp zookeeper-3.4.10.tar.gz datanode1:/home/hadoop
scp zookeeper-3.4.10.tar.gz datanode2:/home/hadoop
scp zookeeper-3.4.10.tar.gz datanode3:/home/hadoop
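
    The same copy can be written as a loop over the target hosts. The sketch below only prints the commands (a dry run) so they can be reviewed first. The helper name distribute_cmds is hypothetical, and the unpack step with tar is an assumption: the original steps do not show where the package is extracted.

```shell
#!/bin/sh
# Print the copy and unpack commands for each target host (dry run).
# $1 = package file, $2 = destination directory, remaining args = hosts
distribute_cmds() {
    pkg=$1
    dest=$2
    shift 2
    for host in "$@"; do
        echo "scp $pkg $host:$dest"
        # ssh -n keeps ssh from consuming stdin when the output is piped to sh
        echo "ssh -n $host tar -xzf $dest/$pkg -C $dest"
    done
}

# Review the output first, then pipe it to sh to execute:
#   distribute_cmds zookeeper-3.4.10.tar.gz /home/hadoop datanode1 datanode2 datanode3 | sh
```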

    3.3 Modify the configuration file, start zookeeper, and check the status of zookeeper

#Assumes the package has already been unpacked under /home/hadoop on each datanode
scp namenode:/home/hadoop/zookeeper-3.4.10/conf/zoo.cfg datanode1:/home/hadoop/zookeeper-3.4.10/conf
scp namenode:/home/hadoop/zookeeper-3.4.10/conf/zoo.cfg datanode2:/home/hadoop/zookeeper-3.4.10/conf
scp namenode:/home/hadoop/zookeeper-3.4.10/conf/zoo.cfg datanode3:/home/hadoop/zookeeper-3.4.10/conf
#Change the server addresses in the configuration file to those of the new zookeeper cluster devices
vi zoo.cfg
server.1=datanode1:2888:3888
server.2=datanode2:2888:3888
server.3=datanode3:2888:3888
#Create a myid file in the dataDir set in zoo.cfg; the number in each node's myid must match that node's server.N entry in the configuration file
#On datanode1
vi myid
 1
#On datanode2
vi myid
 2
#On datanode3
vi myid
 3
#Start zookeeper on each of the three nodes
bin/zkServer.sh start
#After all zookeeper nodes have started, check the status on each node; one should report "Mode: leader" and the others "Mode: follower"
bin/zkServer.sh status
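
    Because the server.N entries and the myid files must agree, both can be derived from one ordered host list. This is a minimal sketch under that assumption; gen_server_lines and myid_for are hypothetical helper names, not part of zookeeper.

```shell
#!/bin/sh
# Print the server.N entries for zoo.cfg, numbering hosts from 1 in the
# order given. Ports 2888 and 3888 match the entries used in this step.
gen_server_lines() {
    n=1
    for host in "$@"; do
        echo "server.$n=$host:2888:3888"
        n=$((n + 1))
    done
}

# Print the myid value for one host: its 1-based position in the list.
# $1 = host to look up, remaining args = the same ordered host list
myid_for() {
    target=$1
    shift
    n=1
    for host in "$@"; do
        if [ "$host" = "$target" ]; then
            echo "$n"
            return 0
        fi
        n=$((n + 1))
    done
    return 1   # host is not in the list
}

# Usage on each node (assumes `hostname` prints the name used in zoo.cfg):
#   myid_for "$(hostname)" datanode1 datanode2 datanode3
```

    Redirecting the output of myid_for into the myid file under dataDir avoids typing the number by hand on each node.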

    3.4 Modify the hadoop-related configuration files and restart resourcemanager

      Modify the zookeeper addresses in the hdfs-site.xml, core-site.xml, yarn-site.xml configuration files.
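
      Before restarting, it can help to confirm that the edited files really contain the new address list. In a typical Hadoop 2.x HA setup the address is held in ha.zookeeper.quorum and yarn.resourcemanager.zk-address, but which properties this cluster uses is not shown here, so the sketch below only searches for the connection string itself. The client port 2181 and the helper names zk_quorum and config_has_quorum are assumptions.

```shell
#!/bin/sh
# Build a connection string "host1:port,host2:port,..." from a port and hosts.
zk_quorum() {
    port=$1
    shift
    out=""
    for host in "$@"; do
        out="${out:+$out,}$host:$port"
    done
    echo "$out"
}

# Succeed if the given file contains the connection string literally.
config_has_quorum() {
    # $1 = config file, $2 = connection string
    grep -qF -- "$2" "$1"
}

# Usage from the hadoop home directory (port 2181 is an assumption):
#   q=$(zk_quorum 2181 datanode1 datanode2 datanode3)
#   for f in core-site.xml hdfs-site.xml yarn-site.xml; do
#       config_has_quorum "etc/hadoop/$f" "$q" || echo "$f: new quorum not found"
#   done
```

      A file that was never meant to hold a zookeeper address will also be reported, so the output needs to be read with the cluster's actual layout in mind.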

cd /home/hadoop/hadoop-2.6.0
sbin/yarn-daemon.sh stop resourcemanager
sbin/yarn-daemon.sh start resourcemanager

    3.5 Stop the zkfc and namenode processes

sbin/hadoop-daemon.sh stop zkfc
sbin/hadoop-daemon.sh stop namenode

    3.6 Format the zkfc state in zookeeper, then start zkfc and namenode

bin/hdfs zkfc -formatZK
sbin/hadoop-daemon.sh start zkfc
sbin/hadoop-daemon.sh start namenode
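
    After both namenodes and their zkfc processes are running again, exactly one namenode should be active and the other standby. The sketch below checks this from the output of "hdfs haadmin -getServiceState". The helper name ha_states_ok and the NameNode IDs nn1 and nn2 are hypothetical; the real IDs are those configured for the cluster.

```shell
#!/bin/sh
# Succeed only if exactly one reported state is "active" and every
# other state is "standby".
# Arguments: the states printed by "hdfs haadmin -getServiceState <id>".
ha_states_ok() {
    active=0
    for state in "$@"; do
        case $state in
            active)  active=$((active + 1)) ;;
            standby) ;;
            *)       return 1 ;;   # unknown state or error output
        esac
    done
    [ "$active" -eq 1 ]
}

# Usage (nn1 and nn2 are hypothetical NameNode IDs):
#   ha_states_ok "$(hdfs haadmin -getServiceState nn1)" \
#                "$(hdfs haadmin -getServiceState nn2)" || echo "HA state unexpected"
```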

    3.7 Test hadoop and yarn availability

#Check that the web UIs at namenode:50070 and namenode:8088 load normally
#Upload test files to hdfs to test hdfs availability
hdfs dfs -put test.txt /user/
#Execute wordcount test yarn availability
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /user/test.txt /user/output

 

  4. Summary

    Zookeeper is sensitive to network, disk and memory response times, so it should not share hosts with applications that make heavy use of network, disk or memory. It is best to run it on separate devices.
