Hadoop Big Data HA (High Availability) Cluster Deployment

1 Overview

Before Hadoop 2.0.0, a Hadoop cluster had only one NameNode, so the NameNode was a single point of failure. Hadoop 2.0.0 solved this problem by adding support for NameNode HA (high availability): the cluster runs two redundant NameNodes deployed on different servers, one in the Active state and the other in the Standby state. If the Active NameNode fails, the cluster immediately fails over to the other NameNode so that the whole cluster keeps running normally. This article introduces how to build a Hadoop HA cluster.

This environment builds on the previous article: https://my.oschina.net/feinik/blog/1621000 (Big Data Platform Hadoop Distributed Cluster Environment Construction).

2 List of cluster HA deployment nodes

Note: Hadoop version 2.7.5

NN (NameNode), DN (DataNode), ZK (Zookeeper), ZKFC (ZKFailoverController), JN (JournalNode, used for metadata sharing), RM (ResourceManager), DM (DataManager, the per-node manager)
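The node layout implied by the configuration in the sections below is roughly as follows (only the HDFS and Zookeeper roles configured in this article are shown):

master1: NN, ZKFC
master2: NN, ZKFC
slave1: DN, ZK, JN
slave2: DN, ZK, JN
slave3: DN, ZK, JN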

3 Install the Zookeeper cluster

(1) First install on slave1

    a) Download the zookeeper-3.4.9.tar.gz package

    b) Unzip: tar -zxvf zookeeper-3.4.9.tar.gz

    c) cd zookeeper-3.4.9/conf

    d) cp zoo_sample.cfg zoo.cfg

    e) Modify the zoo.cfg configuration file

tickTime=2000
dataDir=/home/hadoop/app/zookeeper/data
clientPort=2181
initLimit=10
syncLimit=5
server.1=slave1:2888:3888
server.2=slave2:2888:3888
server.3=slave3:2888:3888
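
Note that each node listed as server.N above also needs a myid file under dataDir whose content is that node's N. A minimal sketch for slave1 (run the same commands with 2 and 3 on slave2 and slave3):

# create the data directory and write this node's id (matches server.1)
$mkdir -p /home/hadoop/app/zookeeper/data
$echo 1 > /home/hadoop/app/zookeeper/data/myid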

(2) Copy the zookeeper-3.4.9 directory to the slave2 and slave3 servers, as sketched below
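
A sketch of the copy, assuming the software lives under /home/hadoop/app and SSH access between the nodes is already set up:

$scp -r /home/hadoop/app/zookeeper-3.4.9 hadoop@slave2:/home/hadoop/app/
$scp -r /home/hadoop/app/zookeeper-3.4.9 hadoop@slave3:/home/hadoop/app/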

(3) Configure the Zookeeper environment variables and start Zookeeper on each node, as sketched below, to complete the deployment of the Zookeeper cluster
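
A sketch of the environment variables and startup, assuming Zookeeper was unpacked to /home/hadoop/app/zookeeper-3.4.9 (add the exports to ~/.bashrc on each of the three nodes):

$export ZOOKEEPER_HOME=/home/hadoop/app/zookeeper-3.4.9
$export PATH=$PATH:$ZOOKEEPER_HOME/bin
$zkServer.sh start
$zkServer.sh status

With all three nodes started, zkServer.sh status should report one leader and two followers.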

4 Configure the Hadoop HA cluster

(1) Modify hdfs-site.xml as follows:

<configuration>
	<!--Logical nameservice name for HDFS HA-->
	<property>
	  <name>dfs.nameservices</name>
	  <value>masters</value>
	</property>
	
	<!--Unique IDs of the NameNodes. Note that the 'masters' in dfs.ha.namenodes.masters must be the logical service name from dfs.nameservices above; the same applies below-->
	<property>
		<name>dfs.ha.namenodes.masters</name>
		<value>master1,master2</value>
	</property>
	
	<!--RPC addresses the NameNodes listen on-->
	<property>
	  <name>dfs.namenode.rpc-address.masters.master1</name>
	  <value>master1:8020</value>
	</property>
	<property>
	  <name>dfs.namenode.rpc-address.masters.master2</name>
	  <value>master2:8020</value>
	</property>
	
	<!--HTTP addresses of the NameNodes-->
	<property>
	  <name>dfs.namenode.http-address.masters.master1</name>
	  <value>master1:50070</value>
	</property>
	<property>
	  <name>dfs.namenode.http-address.masters.master2</name>
	  <value>master2:50070</value>
	</property>
	
	<!--Shared edits directory of the NameNodes, i.e. the JournalNode quorum-->
	<property>
	  <name>dfs.namenode.shared.edits.dir</name>
	  <value>qjournal://slave1:8485;slave2:8485;slave3:8485/masters</value>
	</property>
	
	<!--Local disk path where the JournalNodes store their data-->
	<property>
	  <name>dfs.journalnode.edits.dir</name>
	  <value>/home/hadoop/app/hadoop-HA/data/journal</value>
	</property>
	
	<!--Java class used by DFS clients to determine which NameNode is currently serving client requests-->
	<property>
	  <name>dfs.client.failover.proxy.provider.masters</name>
	  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
	</property>
	
	<property>
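      <!--Fencing method used during failover; sshfence logs into the formerly Active NameNode over SSH and kills its process-->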
      <name>dfs.ha.fencing.methods</name>
      <value>sshfence</value>
    </property>
	
    <!--Path to the private key used for passwordless SSH when fencing during an HA failover-->
    <property>
      <name>dfs.ha.fencing.ssh.private-key-files</name>
      <value>/home/hadoop/.ssh/id_rsa</value>
    </property>
	
	<!--Enable automatic failover when the Active NameNode fails-->
	<property>
	  <name>dfs.ha.automatic-failover.enabled</name>
	  <value>true</value>
	</property>
	
	<!--HDFS block replication factor-->
	<property>
      <name>dfs.replication</name>
      <value>3</value>
    </property>

    <property>
      <name>dfs.namenode.name.dir</name>
      <value>/home/hadoop/app/hadoop-HA/data/hdfs/name</value>
    </property>

    <property>
      <name>dfs.datanode.data.dir</name>
      <value>/home/hadoop/app/hadoop-HA/data/hdfs/data</value>
    </property>
	
	<!--Whether HDFS permission checking is enabled-->
    <property>
      <name>dfs.permissions.enabled</name>
	  <value>false</value>
    </property>
</configuration>
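
Note that the sshfence method requires the hadoop user on each NameNode server to be able to SSH to the other NameNode without a password, using the private key configured above. A sketch, assuming the key pair does not exist yet (run on both master1 and master2):

$ssh-keygen -t rsa
$ssh-copy-id hadoop@master1
$ssh-copy-id hadoop@master2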

(2) Modify the core-site.xml file as follows:

<configuration>
	<!--The value 'masters' here must match the dfs.nameservices value in hdfs-site.xml-->
	<property>
	  <name>fs.defaultFS</name>
	  <value>hdfs://masters</value>
	</property>
	
	<!-- Hadoop temporary directory -->
	<property>
	   <name>hadoop.tmp.dir</name>
	   <value>/home/hadoop/app/hadoop-HA/data/tmp</value>
    </property>
	
	<!--Zookeeper quorum used for automatic failover-->
	<property>
   	   <name>ha.zookeeper.quorum</name>
       <value>slave1:2181,slave2:2181,slave3:2181</value>
 	</property>

	<!--Number of times the IPC client retries a failed connection to the server-->
	<property>
	   <name>ipc.client.connect.max.retries</name>
	   <value>10</value>
    </property>
	
	<!--Milliseconds the client waits between connection retries (default 1s). Without this setting, starting the cluster with start-dfs.sh may cause the NameNodes to time out connecting to the JournalNodes; it is unnecessary if the daemons are started one node at a time-->
	<property>
	   <name>ipc.client.connect.retry.interval</name>
	   <value>10000</value>
	</property>
</configuration>

(3) Modify the slaves file as follows:

slave1
slave2
slave3
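
After editing, distribute the modified files to every other node in the cluster. A sketch, assuming Hadoop is installed at /home/hadoop/app/hadoop-2.7.5 on all nodes (repeat for slave1, slave2, and slave3):

$cd /home/hadoop/app/hadoop-2.7.5/etc/hadoop
$scp core-site.xml hdfs-site.xml slaves hadoop@master2:/home/hadoop/app/hadoop-2.7.5/etc/hadoop/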

5 Initialize the cluster and start

(1) Start a JournalNode on each of the three journal nodes (slave1, slave2, and slave3)

$hadoop-daemon.sh start journalnode
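
A quick check on each of the three nodes, assuming the JDK's jps is on the PATH:

$jps

The output should include a JournalNode process.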

(2) Initialize (format) the NameNode on one of the NameNode servers; here the NameNode on master1 is chosen

$hdfs namenode -format

(3) Start the NameNode that was initialized in step (2)

$hadoop-daemon.sh start namenode

(4) Run the following command on the master2 server to synchronize the metadata from the NameNode on master1

$hdfs namenode -bootstrapStandby

(5) Initialize the ZKFC state in Zookeeper from one of the NameNode servers; here master1 is chosen

$hdfs zkfc -formatZK

(6) Start the Hadoop HA cluster

$start-dfs.sh
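
If everything started correctly, a quick check with jps on each server:

$jps

should show NameNode and DFSZKFailoverController on master1 and master2, and DataNode, JournalNode, and QuorumPeerMain on slave1, slave2, and slave3.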

6 Test whether failover works

(1) You can determine which NameNode is currently in the active state by opening http://master1:50070 and http://master2:50070 in a browser
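
The same information is available from the command line, using the NameNode IDs configured in hdfs-site.xml:

$hdfs haadmin -getServiceState master1
$hdfs haadmin -getServiceState master2

Each command prints active or standby.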

(2) If the NameNode on master2 is in the active state, kill the NameNode process on master2, then check whether the NameNode on master1 transitions from the standby state to the active state. If it does, the failover succeeded!
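
A sketch of the kill test, run on whichever node is currently active (assumed here to be master2):

$jps | grep NameNode
$kill -9 <PID printed by the previous command>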

Note: when testing Hadoop's HA failover it was found that the switch failed. Check the log on the node where the NameNode was killed:

vi hadoop-2.7.5/logs/hadoop-hadoop-zkfc-master2.log

The following error was found: PATH=$PATH:/sbin:/usr/sbin fuser -v -k -n tcp 8020 via ssh: bash: fuser: command not found

Solution: the fuser command is provided by the psmisc package, which was missing; install it on each NameNode server:

yum install psmisc

Reference: http://hadoop.apache.org/docs/r2.7.5/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html

 
