Hadoop Cluster HA Configuration

HA is configured on top of a working ZooKeeper cluster.

Configuring the NameNode

I. Preparation

1. Stop all running services.

2. Clear everything under /tmp.

3. Delete the data and logs directories under the Hadoop installation.

4. Make sure passwordless SSH login works between all nodes (a sketch of these four steps follows).
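A minimal sketch of the four preparation steps, assuming the Hadoop installation lives at /opt/ha/hadoop-3.1.3 (matching the configs below) and the nodes are hadoop102-104; adjust paths and hostnames to your environment:

	# Stop HDFS and YARN if they are running (on the node that started them)
	stop-dfs.sh
	stop-yarn.sh

	# On every node: clear /tmp and remove old Hadoop data and logs
	rm -rf /tmp/*
	rm -rf /opt/ha/hadoop-3.1.3/data /opt/ha/hadoop-3.1.3/logs

	# On every node: generate a key pair and copy it to all nodes (itself included)
	ssh-keygen -t rsa
	for host in hadoop102 hadoop103 hadoop104; do ssh-copy-id $host; done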

II. Manual Failover

5. Configure core-site.xml:

	<!-- Address of the NameNode nameservice in HDFS -->
	<property>
		<name>fs.defaultFS</name>
		<value>hdfs://mycluster</value>
	</property>

	<!-- Custom property, referenced elsewhere in the configs -->
	<property>
		<name>hadoop.data.dir</name>
		<value>/opt/ha/hadoop-3.1.3/data</value>
	</property>

	<!-- Storage directory for files generated at Hadoop runtime -->
	<property>
		<name>hadoop.tmp.dir</name>
		<value>/opt/ha/hadoop-3.1.3/data</value>
	</property>

6. Configure hdfs-site.xml:

<!-- Add the following -->
	<!-- Local storage directory for NameNode metadata -->
	<property>
		<name>dfs.namenode.name.dir</name>
		<value>file://${hadoop.data.dir}/name</value>
	</property>

	<!-- Local storage directory for DataNode blocks -->
	<property>
		<name>dfs.datanode.data.dir</name>
		<value>file://${hadoop.data.dir}/data</value>
	</property>

	<!-- Logical name of the nameservice -->
	<property>
		<name>dfs.nameservices</name>
		<value>mycluster</value>
	</property>

	<!-- NameNode IDs within the mycluster nameservice -->
	<property>
		<name>dfs.ha.namenodes.mycluster</name>
		<value>nn1,nn2,nn3</value>
	</property>

	<!-- RPC address of each NameNode -->
	<property>
		<name>dfs.namenode.rpc-address.mycluster.nn1</name>
		<value>hadoop102:9820</value>
	</property>
	<property>
		<name>dfs.namenode.rpc-address.mycluster.nn2</name>
		<value>hadoop103:9820</value>
	</property>
	<property>
		<name>dfs.namenode.rpc-address.mycluster.nn3</name>
		<value>hadoop104:9820</value>
	</property>

	<!-- HTTP (web UI) address of each NameNode -->
	<property>
		<name>dfs.namenode.http-address.mycluster.nn1</name>
		<value>hadoop102:9870</value>
	</property>
	<property>
		<name>dfs.namenode.http-address.mycluster.nn2</name>
		<value>hadoop103:9870</value>
	</property>
	<property>
		<name>dfs.namenode.http-address.mycluster.nn3</name>
		<value>hadoop104:9870</value>
	</property>

	<!-- JournalNode quorum where the NameNodes share their edit log -->
	<property>
		<name>dfs.namenode.shared.edits.dir</name>
		<value>qjournal://hadoop102:8485;hadoop103:8485;hadoop104:8485/mycluster</value>
	</property>

	<!-- Proxy provider the client uses to determine which NameNode is active -->
	<property>
		<name>dfs.client.failover.proxy.provider.mycluster</name>
		<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
	</property>

	<!-- Fencing method, so that only one NameNode serves clients at a time -->
	<property>
		<name>dfs.ha.fencing.methods</name>
		<value>sshfence</value>
	</property>

	<!-- sshfence requires passwordless SSH; point it at the private key -->
	<property>
		<name>dfs.ha.fencing.ssh.private-key-files</name>
		<value>/home/atguigu/.ssh/id_rsa</value>
	</property>

	<!-- Where each JournalNode stores the NameNode edit log -->
	<property>
		<name>dfs.journalnode.edits.dir</name>
		<value>${hadoop.data.dir}/jn</value>
	</property>

7. Distribute the configuration to all nodes (the other nodes also need /tmp/* cleared); see the sketch below.
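A minimal distribution sketch using rsync, assuming the same installation path on every node (tutorials often wrap this in a helper script such as xsync):

	# Run from hadoop102; copy the whole installation to the other nodes
	for host in hadoop103 hadoop104; do
		rsync -av /opt/ha/hadoop-3.1.3/ $host:/opt/ha/hadoop-3.1.3/
	done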

8. Start a JournalNode on each node (hadoop102, hadoop103, hadoop104):

	hdfs --daemon start journalnode

9. On any one node, format the NameNode and start the service:

	hdfs namenode -format
	hdfs --daemon start namenode

10. On the nodes that were not formatted, sync the metadata from the formatted NameNode:

	hdfs namenode -bootstrapStandby

11. On the other nodes, start the NameNode and DataNode:

	hdfs --daemon start namenode
	hdfs --daemon start datanode

12. Manual failover (transition nn1 to active):

	hdfs haadmin -transitionToActive nn1
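To confirm the transition, query each NameNode's state; exactly one should report active:

	hdfs haadmin -getServiceState nn1    # expected: active
	hdfs haadmin -getServiceState nn2    # expected: standby
	hdfs haadmin -getServiceState nn3    # expected: standby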

III. Automatic Failover

1. Add to hdfs-site.xml:

	<!-- Enable automatic failover -->
	<property>
		<name>dfs.ha.automatic-failover.enabled</name>
		<value>true</value>
	</property>

2. Add to core-site.xml:

	<!-- ZooKeeper quorum address -->
	<property>
		<name>ha.zookeeper.quorum</name>
		<value>hadoop102:2181,hadoop103:2181,hadoop104:2181</value>
	</property>
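Before restarting, the edited hdfs-site.xml and core-site.xml must reach every node again; a minimal sketch reusing the rsync loop from step 7 (same paths assumed):

	for host in hadoop103 hadoop104; do
		rsync -av /opt/ha/hadoop-3.1.3/etc/hadoop/ $host:/opt/ha/hadoop-3.1.3/etc/hadoop/
	done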

3. Startup
(1) Stop all HDFS services:

	stop-dfs.sh

(2) Start the ZooKeeper cluster (run on each ZooKeeper node):

	zkServer.sh start

(3) Initialize the HA state in ZooKeeper:

	hdfs zkfc -formatZK

(4) Start the HDFS services:

	start-dfs.sh
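A common smoke test for automatic failover (assuming a ZKFC process started next to each NameNode) is to stop the active NameNode and watch a standby take over:

	hdfs haadmin -getServiceState nn1    # find the currently active NameNode
	# on the active NameNode's host:
	hdfs --daemon stop namenode
	hdfs haadmin -getServiceState nn2    # a former standby should now be active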

Configuring YARN

Note: delete the previously configured ResourceManager address first.

1. Configure yarn-site.xml:

    <!-- Enable ResourceManager HA -->
    <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>
 
    <!-- Declare the cluster ID and the IDs/hostnames of the three ResourceManagers -->
    <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>cluster-yarn1</value>
    </property>

    <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2,rm3</value>
    </property>

    <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>hadoop102</value>
    </property>

    <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>hadoop103</value>
    </property>
	
	 <property>
        <name>yarn.resourcemanager.hostname.rm3</name>
        <value>hadoop104</value>
    </property>
 
    <!-- Address of the ZooKeeper cluster -->
    <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>hadoop102:2181,hadoop103:2181,hadoop104:2181</value>
    </property>

    <!-- Enable automatic recovery -->
    <property>
        <name>yarn.resourcemanager.recovery.enabled</name>
        <value>true</value>
    </property>
 
    <!-- Store ResourceManager state in the ZooKeeper cluster -->
    <property>
        <name>yarn.resourcemanager.store.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
    </property>

2. Startup

(1) On hadoop102, run:

	start-yarn.sh

(2) Check the service state:

	yarn rmadmin -getServiceState rm1
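The same subcommand works for the other ResourceManager IDs; exactly one should report active:

	yarn rmadmin -getServiceState rm2
	yarn rmadmin -getServiceState rm3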

Reposted from blog.csdn.net/qq_38705144/article/details/111697782