How to Build a Hadoop High Availability Cluster

1. Cluster configuration diagram

        Before building a cluster, we need to plan the configuration of each machine in the cluster. Here we take four machines as an example; the configuration diagram is as follows:

Cluster configuration diagram

                          ant151    ant152    ant153    ant154
NameNode                  yes       yes
DataNode                  yes       yes       yes       yes
NodeManager               yes       yes       yes       yes
ResourceManager                               yes       yes
JournalNode               yes       yes       yes
DFSZKFailoverController   yes       yes
ZooKeeper (server id)     zk0       zk1       zk2

         ant151, ant152, ant153, and ant154 are the four host names.

        ant151 and ant152 act as the active NameNode and standby NameNode respectively. When one NameNode goes down, an active/standby failover can be performed.

        Every node runs a DataNode and a NodeManager.

        I set the ResourceManager on ant153 and ant154 as an active/standby pair.

        I placed JournalNodes on ant151, ant152, and ant153. The JournalNode daemons store the NameNode edit log in a shared location so the standby NameNode can stay in sync with the active one. There should be at least 3 JournalNodes for clusters of up to 100 nodes, and at least 5 for more than 100 nodes. For details, please refer to the role of journalnode.

        The DFSZKFailoverController (ZKFC) is responsible, when HA is enabled, for monitoring the state of its NameNode (NN) and writing that state to ZK in a timely manner. It obtains the NN's health by periodically calling a specific interface on the NN from an independent thread. The ZKFC also takes part in choosing which node becomes the active NN; since there are at most two NameNodes, the current election strategy is fairly simple (first come, first served).

        zk0, zk1, and zk2 are the server IDs in the ZooKeeper cluster.

2. Set up each host

        First create the virtual machines and configure the network.

        Then set each host name, and run the command bash so that the new name takes effect in the current shell.

hostnamectl set-hostname ant151
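
        Each host also needs to resolve the other hosts' names. A minimal /etc/hosts sketch, assuming example addresses in 192.168.95.0/24 (the IPs here are placeholders; replace them with your own):

# /etc/hosts on every node (example IPs -- use your own)
192.168.95.151 ant151
192.168.95.152 ant152
192.168.95.153 ant153
192.168.95.154 ant154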

        Then, turn off the firewall on each host

systemctl stop firewalld
systemctl disable firewalld

        Then synchronize the time across the hosts. First install the time synchronization service.

# synchronize time
	[root@ant151 ~]# yum install -y ntpdate
	[root@ant151 ~]# ntpdate time.windows.com
	[root@ant151 ~]# date

        After installing the service, open the crontab editor with the command

crontab -e

        Add the following entry, then save and exit.

# run ntpdate every 5 hours to keep the clock in sync
0 */5 * * * /usr/sbin/ntpdate -u time.windows.com

        Then reload and start the cron service

	# reload the crontab configuration
	[root@ant151 ~]# service crond reload
	# start the cron service
	[root@ant151 ~]# service crond start

        Then set up passwordless SSH login. First generate the local key pair:

# configure passwordless login
ssh-keygen -t rsa -P ""

        After generating it, copy the public key to the other hosts

# copy the local public key to each target machine for passwordless login
ssh-copy-id -i /root/.ssh/id_rsa.pub -p22 root@ant151
ssh-copy-id -i /root/.ssh/id_rsa.pub -p22 root@ant152
ssh-copy-id -i /root/.ssh/id_rsa.pub -p22 root@ant153
ssh-copy-id -i /root/.ssh/id_rsa.pub -p22 root@ant154
# test
ssh -p22 <hostname>

        Finally, install psmisc, which provides the fuser command used by the sshfence fencing mechanism during active/standby failover

yum install psmisc -y

3. Install JDK

        Install the JDK on each machine and configure the environment variables.
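
        A minimal sketch, assuming the JDK tarball sits in /opt/install and is installed to /opt/software/jdk180, the JAVA_HOME used by hadoop-env.sh below (the archive name here is an assumption; use your own):

# unpack and rename (archive name is an example)
tar -zxf /opt/install/jdk-8u111-linux-x64.tar.gz -C /opt/software/
mv /opt/software/jdk1.8.0_111 /opt/software/jdk180
# append to /etc/profile, then run: source /etc/profile
export JAVA_HOME=/opt/software/jdk180
export PATH=$PATH:$JAVA_HOME/bin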

4. Build a ZooKeeper cluster

        A script can be used here. I store the tarball in the /opt/install folder, extract it to the /opt/software folder, and rename the directory to zk345; you can modify the installation path to suit your needs. The script is as follows:

#! /bin/bash
zk=true
hostname=`hostname`
if [ "$zk" = true ]; then
	echo 'ZK install started'
	tar -zxf /opt/install/zookeeper-3.4.5-cdh5.14.2.tar.gz -C /opt/software/
	mv /opt/software/zookeeper-3.4.5-cdh5.14.2 /opt/software/zk345
	cp /opt/software/zk345/conf/zoo_sample.cfg /opt/software/zk345/conf/zoo.cfg
	mkdir -p /opt/software/zk345/datas
	# line 12 of zoo.cfg is the dataDir setting
	sed -i '12c\dataDir=/opt/software/zk345/datas' /opt/software/zk345/conf/zoo.cfg
	echo 'server.0='$hostname':2287:3387' >> /opt/software/zk345/conf/zoo.cfg
	echo "0" > /opt/software/zk345/datas/myid
	# append the ZK environment variables after line 73 of /etc/profile
	sed -i '73a\export PATH=$PATH:$ZOOKEEPER_HOME/bin' /etc/profile
	sed -i '73a\export ZOOKEEPER_HOME=/opt/software/zk345/' /etc/profile
	sed -i '73a\#ZK' /etc/profile
	source /etc/profile
	echo 'ZK install finished'
fi
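
        Assuming the script above is saved as, say, zkinstall.sh (the file name is arbitrary), run it as root:

chmod +x zkinstall.sh
./zkinstall.sh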

        After the installation is complete, configure the cluster members in the zk345/conf/zoo.cfg file

vim /opt/software/zk345/conf/zoo.cfg
# ZooKeeper cluster configuration
	server.0=ant151:2287:3387
	server.1=ant152:2287:3387
	server.2=ant153:2287:3387

        Afterwards, copy the zk345 directory and the profile to the other ZooKeeper hosts. The commands for ant152 are shown below; ant153 is the same:

# copy the installation directory to the other hosts
scp -r /opt/software/zk345 root@ant152:/opt/software/
# copy the profile to the other hosts
scp /etc/profile root@ant152:/etc/

        Then modify /opt/software/zk345/datas/myid on each host to match the server IDs in the ZooKeeper configuration file: ant151 is 0, ant152 is 1, and ant153 is 2.
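
        For example, the IDs can be set from ant151 over SSH, using the passwordless login configured earlier:

# myid must match the server.N entry for each host in zoo.cfg
echo "0" > /opt/software/zk345/datas/myid
ssh root@ant152 'echo "1" > /opt/software/zk345/datas/myid'
ssh root@ant153 'echo "2" > /opt/software/zk345/datas/myid'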

        To make operating the cluster easier, we write a cluster operation script.

#! /bin/bash
# start/stop/query all ZooKeeper servers in the cluster
case $1 in
"start")
	for i in ant151 ant152 ant153
	do
		ssh $i "source /etc/profile;/opt/software/zk345/bin/zkServer.sh start"
	done
	;;
"stop")
	for i in ant151 ant152 ant153
	do
		ssh $i "source /etc/profile;/opt/software/zk345/bin/zkServer.sh stop"
	done
	;;
"status")
	for i in ant151 ant152 ant153
	do
		ssh $i "source /etc/profile;/opt/software/zk345/bin/zkServer.sh status"
	done
	;;
esac
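
        Save it as, say, zkop.sh (the name is arbitrary), make it executable, and use it like this:

chmod +x zkop.sh
./zkop.sh start
./zkop.sh status    # one node should report Mode: leader, the others Mode: follower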

5. Build a Hadoop cluster

        First install Hadoop

  tar -zxf /opt/install/hadoop-3.1.3.tar.gz -C /opt/software/
  mv /opt/software/hadoop-3.1.3 /opt/software/hadoop313
  chown -R root:root /opt/software/hadoop313/

      Go to the hadoop313/etc/hadoop directory and edit the configuration files

        Configuration of the core-site.xml file

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
	<property>
		<name>fs.defaultFS</name>
		<value>hdfs://gky</value>
		<description>Logical name of the filesystem; must match the dfs.nameservices value in hdfs-site.xml</description>
	</property>
	<property>
		<name>hadoop.tmp.dir</name>
		<value>/opt/software/hadoop313/tmpdata</value>
		<description>Local Hadoop temporary directory on the namenode</description>
	</property>
	<property>
		<name>hadoop.http.staticuser.user</name>
		<value>root</value>
		<description>Default static user for the web UIs</description>
	</property>
	<property>
		<name>hadoop.proxyuser.root.hosts</name>
		<value>*</value>
		<description>Hosts from which the root proxy user may connect</description>
	</property>
	<property>
		<name>hadoop.proxyuser.root.groups</name>
		<value>*</value>
		<description>Groups that the root proxy user may impersonate</description>
	</property>
	<property>
		<name>io.file.buffer.size</name>
		<value>131072</value>
		<description>Buffer size for reading and writing files: 128 KB</description>
	</property>
	<property>
		<name>ha.zookeeper.quorum</name>
		<value>ant151:2181,ant152:2181,ant153:2181</value>
		<description>ZooKeeper quorum used for NameNode HA</description>
	</property>
	<property>
		<name>ha.zookeeper.session-timeout.ms</name>
		<value>10000</value>
		<description>Timeout for Hadoop's ZooKeeper sessions: 10 s</description>
	</property>
</configuration>

        Configuration of the hadoop-env.sh file

export JAVA_HOME=/opt/software/jdk180
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export HDFS_JOURNALNODE_USER=root
export HDFS_ZKFC_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root

        Configuration of the hdfs-site.xml file

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
	<property>
		<name>dfs.replication</name>
		<value>3</value>
		<description>Number of replicas of each block in Hadoop</description>
	</property>
	<property>
		<name>dfs.namenode.name.dir</name>
		<value>/opt/software/hadoop313/data/dfs/name</value>
		<description>Directory on the namenode that stores the HDFS namespace metadata</description>
	</property>
	<property>
		<name>dfs.datanode.data.dir</name>
		<value>/opt/software/hadoop313/data/dfs/data</value>
		<description>Physical storage location of data blocks on the datanode</description>
	</property>
	<property>
		<name>dfs.namenode.secondary.http-address</name>
		<value>ant151:9869</value>
		<description>HTTP address of the secondary namenode (not used when HA is enabled)</description>
	</property>
	<property>
		<name>dfs.nameservices</name>
		<value>gky</value>
		<description>HDFS nameservice; must match the value in core-site.xml</description>
	</property>
	<property>
		<name>dfs.ha.namenodes.gky</name>
		<value>nn1,nn2</value>
		<description>Logical names of the two namenodes under the cluster's logical name gky</description>
	</property>
	<property>
		<name>dfs.namenode.rpc-address.gky.nn1</name>
		<value>ant151:9000</value>
		<description>RPC address of namenode1</description>
	</property>
	<property>
		<name>dfs.namenode.http-address.gky.nn1</name>
		<value>ant151:9870</value>
		<description>HTTP address of namenode1</description>
	</property>

	<property>
		<name>dfs.namenode.rpc-address.gky.nn2</name>
		<value>ant152:9000</value>
		<description>RPC address of namenode2</description>
	</property>
	<property>
		<name>dfs.namenode.http-address.gky.nn2</name>
		<value>ant152:9870</value>
		<description>HTTP address of namenode2</description>
	</property>
	<property>
		<name>dfs.namenode.shared.edits.dir</name>
		<value>qjournal://ant151:8485;ant152:8485;ant153:8485/gky</value>
		<description>Shared storage location for the NameNode edits metadata (the JournalNode list)</description>
	</property>
	<property>
		<name>dfs.journalnode.edits.dir</name>
		<value>/opt/software/hadoop313/data/journaldata</value>
		<description>Local disk location where the JournalNodes store their data</description>
	</property>
	<!-- Fault tolerance -->
	<property>
		<name>dfs.ha.automatic-failover.enabled</name>
		<value>true</value>
		<description>Enable automatic NameNode failover</description>
	</property>
	<property>
		<name>dfs.client.failover.proxy.provider.gky</name>
		<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
		<description>Implementation clients use to fail over to the active namenode</description>
	</property>
	<property>
		<name>dfs.ha.fencing.methods</name>
		<value>sshfence</value>
		<description>Fencing method used to prevent split-brain</description>
	</property>
	<property>
		<name>dfs.ha.fencing.ssh.private-key-files</name>
		<value>/root/.ssh/id_rsa</value>
		<description>The sshfence mechanism requires passwordless SSH; path to the private key</description>
	</property>
	<property>
		<name>dfs.permissions.enabled</name>
		<value>false</value>
		<description>Disable HDFS permission checking</description>
	</property>
	<property>
		<name>dfs.image.transfer.bandwidthPerSec</name>
		<value>1048576</value>
		<description>Bandwidth limit for fsimage transfers: 1 MB/s</description>
	</property>
	<property>
		<name>dfs.block.scanner.volume.bytes.per.second</name>
		<value>1048576</value>
		<description>Bandwidth limit for the datanode block scanner: 1 MB/s</description>
	</property>
</configuration>

        Configuration of the mapred-site.xml file

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
	<property>
		<name>mapreduce.framework.name</name>
		<value>yarn</value>
		<description>Job execution framework: local, classic, or yarn</description>
		<final>true</final>
	</property>
	<property>
		<name>mapreduce.application.classpath</name>
		<value>/opt/software/hadoop313/etc/hadoop:/opt/software/hadoop313/share/hadoop/common/lib/*:/opt/software/hadoop313/share/hadoop/common/*:/opt/software/hadoop313/share/hadoop/hdfs/*:/opt/software/hadoop313/share/hadoop/hdfs/lib/*:/opt/software/hadoop313/share/hadoop/mapreduce/*:/opt/software/hadoop313/share/hadoop/mapreduce/lib/*:/opt/software/hadoop313/share/hadoop/yarn/*:/opt/software/hadoop313/share/hadoop/yarn/lib/*</value>
	</property>
	<property>
		<name>mapreduce.jobhistory.address</name>
		<value>ant151:10020</value>
	</property>
	<property>
		<name>mapreduce.jobhistory.webapp.address</name>
		<value>ant151:19888</value>
	</property>
	
	<property>
		<name>mapreduce.map.memory.mb</name>
		<value>1024</value>
		<description>Working memory for map-stage tasks</description>
	</property>
	<property>
		<name>mapreduce.reduce.memory.mb</name>
		<value>2048</value>
		<description>Working memory for reduce-stage tasks</description>
	</property>
	
</configuration>

        Configuration of the yarn-site.xml file

<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>
	<property>
		<name>yarn.resourcemanager.ha.enabled</name>
		<value>true</value>
		<description>Enable ResourceManager high availability</description>
	</property>
	<property>
		<name>yarn.resourcemanager.cluster-id</name>
		<value>yrcabc</value>
		<description>ID of the YARN cluster</description>
	</property>
	<property>
		<name>yarn.resourcemanager.ha.rm-ids</name>
		<value>rm1,rm2</value>
		<description>Logical names of the resourcemanagers</description>
	</property>
	<property>
		<name>yarn.resourcemanager.hostname.rm1</name>
		<value>ant153</value>
		<description>Hostname for rm1</description>
	</property>
	<property>
		<name>yarn.resourcemanager.hostname.rm2</name>
		<value>ant154</value>
		<description>Hostname for rm2</description>
	</property>
	<property>
		<name>yarn.resourcemanager.webapp.address.rm1</name>
		<value>ant153:8088</value>
		<description>Web UI address of rm1</description>
	</property>
	<property>
		<name>yarn.resourcemanager.webapp.address.rm2</name>
		<value>ant154:8088</value>
		<description>Web UI address of rm2</description>
	</property>
	<property>
		<name>yarn.resourcemanager.zk-address</name>
		<value>ant151:2181,ant152:2181,ant153:2181</value>
		<description>Address of the ZooKeeper cluster</description>
	</property>
	<property>
		<name>yarn.nodemanager.aux-services</name>
		<value>mapreduce_shuffle</value>
		<description>Auxiliary service that must be configured to run MapReduce jobs</description>
	</property>
	<property>
		<name>yarn.nodemanager.local-dirs</name>
		<value>/opt/software/hadoop313/tmpdata/yarn/local</value>
		<description>Local storage directory of the nodemanager</description>
	</property>
	<property>
		<name>yarn.nodemanager.log-dirs</name>
		<value>/opt/software/hadoop313/tmpdata/yarn/log</value>
		<description>Local log directory of the nodemanager</description>
	</property>

	<property>
		<name>yarn.nodemanager.resource.memory-mb</name>
		<value>2048</value>
		<description>Memory the nodemanager can allocate to containers</description>
	</property>
	<property>
		<name>yarn.nodemanager.resource.cpu-vcores</name>
		<value>2</value>
		<description>Number of CPU cores the nodemanager can use</description>
	</property>
	<property>
		<name>yarn.scheduler.minimum-allocation-mb</name>
		<value>256</value>
		<description>Minimum container memory allocation</description>
	</property>
	<property>
		<name>yarn.log-aggregation-enable</name>
		<value>true</value>
		<description>Enable log aggregation</description>
	</property>
	<property>
		<name>yarn.log-aggregation.retain-seconds</name>
		<value>86400</value>
		<description>How many seconds to retain aggregated logs</description>
	</property>
	<property>
		<name>yarn.nodemanager.vmem-check-enabled</name>
		<value>false</value>
		<description>Disable virtual-memory limit checking for containers</description>
	</property>
	<property>
		<name>yarn.application.classpath</name>
		<value>/opt/software/hadoop313/etc/hadoop:/opt/software/hadoop313/share/hadoop/common/lib/*:/opt/software/hadoop313/share/hadoop/common/*:/opt/software/hadoop313/share/hadoop/hdfs/*:/opt/software/hadoop313/share/hadoop/hdfs/lib/*:/opt/software/hadoop313/share/hadoop/mapreduce/*:/opt/software/hadoop313/share/hadoop/mapreduce/lib/*:/opt/software/hadoop313/share/hadoop/yarn/*:/opt/software/hadoop313/share/hadoop/yarn/lib/*</value>
		<description>Classpath for YARN applications</description>
	</property>
	<property>
		<name>yarn.nodemanager.env-whitelist</name>
		<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
		<description>Environment variables containers may inherit from the nodemanager</description>
	</property>
</configuration>

        Configuration of the workers file

ant151
ant152
ant153
ant154

        Configuration of system environment variables (/etc/profile)

#hadoop
export JAVA_LIBRARY_PATH=/opt/software/hadoop313/lib/native
export HADOOP_HOME=/opt/software/hadoop313
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/lib
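
        After editing /etc/profile, reload it and verify that Hadoop is on the PATH:

source /etc/profile
hadoop version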

        Finally, copy Hadoop and the profile to the other hosts

# copy the installation directory to the other hosts
scp -r /opt/software/hadoop313 root@ant152:/opt/software/
# copy the profile to the other hosts
scp /etc/profile root@ant152:/etc/
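
        A small loop sketch to push both to the remaining hosts in one step:

# distribute the install directory and the profile to ant152-ant154
for i in ant152 ant153 ant154
do
	scp -r /opt/software/hadoop313 root@$i:/opt/software/
	scp /etc/profile root@$i:/etc/
done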

6. Start the cluster for the first time

The first start of the cluster, step by step:
1. Start the zk cluster.
2. Start the journalnode service on ant151, ant152, and ant153: hdfs --daemon start journalnode
3. Format the hdfs namenode on ant151: hdfs namenode -format
4. Start the namenode service on ant151: hdfs --daemon start namenode
5. Synchronize the namenode metadata to ant152: [root@ant152 soft]# hdfs namenode -bootstrapStandby
6. Start the namenode service on ant152: hdfs --daemon start namenode
   View the namenode node states: hdfs haadmin -getServiceState nn1 (or nn2)
7. Stop all dfs-related services: [root@ant151 soft]# stop-dfs.sh
8. Format zk: [root@ant151 soft]# hdfs zkfc -formatZK
9. Start dfs: [root@ant151 soft]# start-dfs.sh
10. Start yarn: [root@ant151 soft]# start-yarn.sh
   View the resourcemanager node states: yarn rmadmin -getServiceState rm1 (or rm2)
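
        To confirm that failover actually works, a quick test sketch (this briefly stops the active namenode; here we assume nn1 on ant151 is currently active):

# check which namenode is active
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
# on ant151, stop the active namenode, then re-check nn2
hdfs --daemon stop namenode
hdfs haadmin -getServiceState nn2    # should now report: active
# bring the stopped namenode back; it rejoins as standby
hdfs --daemon start namenode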
