Hadoop Cluster HA Configuration

HA configuration builds on a working ZooKeeper cluster; complete the ZooKeeper setup first.

Configuring the NameNode

I. Preparation

1. Stop all running services.

2. Clear out everything under /tmp.

3. Delete the data and logs directories in the Hadoop installation.

4. Make sure every node can log in to every other node over SSH without a password (see the combined sketch below).
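
A minimal shell sketch of these four steps, assuming the installation path /opt/ha/hadoop-3.1.3 used in the configs below and that Hadoop's bin and sbin directories are on the PATH:

	# Stop all services (run once; stop-dfs.sh/stop-yarn.sh stop the whole cluster over SSH)
	stop-dfs.sh
	stop-yarn.sh

	# On every node: clear /tmp and remove the old Hadoop data and logs directories
	sudo rm -rf /tmp/*
	rm -rf /opt/ha/hadoop-3.1.3/data /opt/ha/hadoop-3.1.3/logs

	# Verify passwordless SSH to each peer (repeat from every node)
	ssh hadoop103 hostname
	ssh hadoop104 hostname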

II. Manual failover

5. Configure core-site.xml

	<!-- Address of the NameNode in HDFS: the logical nameservice name -->
	<property>
		<name>fs.defaultFS</name>
		<value>hdfs://mycluster</value>
	</property>

	<!-- Custom property, referenced elsewhere as ${hadoop.data.dir} -->
	<property>
		<name>hadoop.data.dir</name>
		<value>/opt/ha/hadoop-3.1.3/data</value>
	</property>

	<!-- Storage directory for files Hadoop generates at runtime -->
	<property>
		<name>hadoop.tmp.dir</name>
		<value>/opt/ha/hadoop-3.1.3/data</value>
	</property>

6. Configure hdfs-site.xml

	<!-- Add the following -->
	<!-- Local directory where the NameNode stores the namespace -->
	<property>
		<name>dfs.namenode.name.dir</name>
		<value>file://${hadoop.data.dir}/name</value>
	</property>
	<!-- Local directory where DataNodes store blocks -->
	<property>
		<name>dfs.datanode.data.dir</name>
		<value>file://${hadoop.data.dir}/data</value>
	</property>
	<!-- Logical name of the nameservice -->
	<property>
		<name>dfs.nameservices</name>
		<value>mycluster</value>
	</property>
	<!-- NameNode ids in the mycluster nameservice -->
	<property>
		<name>dfs.ha.namenodes.mycluster</name>
		<value>nn1,nn2,nn3</value>
	</property>
	<!-- RPC address of each NameNode -->
	<property>
		<name>dfs.namenode.rpc-address.mycluster.nn1</name>
		<value>hadoop102:9820</value>
	</property>
	<property>
		<name>dfs.namenode.rpc-address.mycluster.nn2</name>
		<value>hadoop103:9820</value>
	</property>
	<property>
		<name>dfs.namenode.rpc-address.mycluster.nn3</name>
		<value>hadoop104:9820</value>
	</property>
	<!-- HTTP (web UI) address of each NameNode -->
	<property>
		<name>dfs.namenode.http-address.mycluster.nn1</name>
		<value>hadoop102:9870</value>
	</property>
	<property>
		<name>dfs.namenode.http-address.mycluster.nn2</name>
		<value>hadoop103:9870</value>
	</property>
	<property>
		<name>dfs.namenode.http-address.mycluster.nn3</name>
		<value>hadoop104:9870</value>
	</property>
	<!-- JournalNode quorum where the NameNodes write shared edits -->
	<property>
		<name>dfs.namenode.shared.edits.dir</name>
		<value>qjournal://hadoop102:8485;hadoop103:8485;hadoop104:8485/mycluster</value>
	</property>
	<!-- Proxy class the client uses to determine which NameNode is active -->
	<property>
		<name>dfs.client.failover.proxy.provider.mycluster</name>
		<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
	</property>
	<!-- Fencing method, so only one NameNode serves clients at any time -->
	<property>
		<name>dfs.ha.fencing.methods</name>
		<value>sshfence</value>
	</property>
	<!-- sshfence requires passwordless SSH; point it at the private key -->
	<property>
		<name>dfs.ha.fencing.ssh.private-key-files</name>
		<value>/home/atguigu/.ssh/id_rsa</value>
	</property>
	<!-- Local directory where the JournalNodes store the edit log -->
	<property>
		<name>dfs.journalnode.edits.dir</name>
		<value>${hadoop.data.dir}/jn</value>
	</property>

7. Distribute the configuration to every node (note: the other nodes also need to clear /tmp/* first), as shown below.
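
One hedged way to do the distribution with plain rsync from hadoop102 (substitute your own sync script if you have one); it assumes the same installation path on every node:

	# From hadoop102: push the configuration to the other nodes
	for host in hadoop103 hadoop104; do
		rsync -av /opt/ha/hadoop-3.1.3/etc/hadoop/ $host:/opt/ha/hadoop-3.1.3/etc/hadoop/
		ssh $host "sudo rm -rf /tmp/*"   # the other nodes must clear /tmp too
	done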

8. Start a JournalNode on each node (hadoop102, hadoop103, hadoop104):

	hdfs --daemon start journalnode
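
Since passwordless SSH is already in place (step 4), a sketch that starts all three JournalNodes from one machine; it assumes hdfs resolves in the remote non-interactive shell:

	for host in hadoop102 hadoop103 hadoop104; do
		ssh $host "hdfs --daemon start journalnode"
	done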

9. Format the NameNode on one of the nodes (e.g. hadoop102) and start it:

	hdfs namenode -format
	hdfs --daemon start namenode
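
An optional quick check that the NameNode came up, using the web UI port (9870, per hdfs-site.xml above); assumes curl is installed:

	jps                                  # should now list NameNode (and JournalNode)
	curl -s http://hadoop102:9870/ >/dev/null && echo "NameNode web UI is up"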

10. On the NameNodes that were not formatted (hadoop103 and hadoop104 here), sync the metadata from the formatted one:

	hdfs namenode -bootstrapStandby

11. Start the NameNode and the DataNode on the other nodes:

	hdfs --daemon start namenode
	hdfs --daemon start datanode
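
A quick sanity check across the three nodes (assumes jps resolves in the remote shell):

	for host in hadoop102 hadoop103 hadoop104; do
		echo "=== $host ==="
		ssh $host jps
	done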

12. Manual failover (transition nn1 to the active state):

	hdfs haadmin -transitionToActive nn1
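
Confirm the transition with the built-in state query:

	hdfs haadmin -getServiceState nn1    # should print: active
	hdfs haadmin -getServiceState nn2    # should print: standby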

III. Automatic failover

1. Configure the hdfs-site.xml file:

	<!-- Enable automatic failover -->
	<property>
		<name>dfs.ha.automatic-failover.enabled</name>
		<value>true</value>
	</property>

2. Add to core-site.xml:

	<!-- Address of the ZooKeeper quorum -->
	<property>
		<name>ha.zookeeper.quorum</name>
		<value>hadoop102:2181,hadoop103:2181,hadoop104:2181</value>
	</property>

3. Start
(1) Shut down all HDFS services:

	stop-dfs.sh

(2) Start the ZooKeeper cluster (run this on every ZooKeeper node):

	zkServer.sh start

(3) Initialize the HA state in ZooKeeper:

	hdfs zkfc -formatZK
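
Formatting creates a znode for the cluster under ZooKeeper's default /hadoop-ha parent znode; you can verify it from the ZooKeeper CLI:

	zkCli.sh -server hadoop102:2181
	# inside the CLI:
	ls /hadoop-ha    # should list [mycluster]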

(4) Start the HDFS service:

	start-dfs.sh
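
Once HDFS is up, jps on each NameNode host should also show a DFSZKFailoverController (ZKFC). A simple way to exercise automatic failover is to kill the active NameNode and watch a standby take over (the pid below is a placeholder; read it from jps):

	hdfs haadmin -getServiceState nn1      # find the active NameNode
	kill -9 <NameNode pid>                 # run on the active NameNode's host
	# after a few seconds, another NameNode should report active:
	hdfs haadmin -getServiceState nn2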

Configure YARN

Note: first remove the original (non-HA) ResourceManager address entry from yarn-site.xml.

1. Configure yarn-site.xml:

    <!-- Enable ResourceManager HA -->
    <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>

    <!-- Logical id of the ResourceManager cluster -->
    <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>cluster-yarn1</value>
    </property>

    <!-- Declare the ResourceManager ids and the host of each one -->
    <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2,rm3</value>
    </property>

    <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>hadoop102</value>
    </property>

    <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>hadoop103</value>
    </property>

    <property>
        <name>yarn.resourcemanager.hostname.rm3</name>
        <value>hadoop104</value>
    </property>

    <!-- Address of the ZooKeeper cluster -->
    <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>hadoop102:2181,hadoop103:2181,hadoop104:2181</value>
    </property>

    <!-- Enable automatic recovery -->
    <property>
        <name>yarn.resourcemanager.recovery.enabled</name>
        <value>true</value>
    </property>

    <!-- Store ResourceManager state in the ZooKeeper cluster -->
    <property>
        <name>yarn.resourcemanager.store.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
    </property>

2. Start

(1) Execute on hadoop102:

	start-yarn.sh

(2) Check the ResourceManager state:

	yarn rmadmin -getServiceState rm1
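
To find which ResourceManager is active, query each RM id in turn; requests to a standby RM's web UI are redirected to the active one:

	for rm in rm1 rm2 rm3; do
		echo -n "$rm: "
		yarn rmadmin -getServiceState $rm
	done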
