Configure HA (builds on a completed Zookeeper cluster configuration)
Configure NameNode HA
One: Preparation
1. Stop all running services
2. Clean out everything under /tmp
3. Delete the data and logs directories under the hadoop installation
4. Make sure passwordless SSH login works between all nodes
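The preparation steps above can be sketched as a short shell sequence. The install path (/opt/ha/hadoop-3.1.3) and host names (hadoop102-104) follow this guide; adjust them for your environment.

```shell
# Sketch of the preparation steps, assuming the paths and hosts used in this guide
stop-dfs.sh                                    # 1. stop all services
for host in hadoop102 hadoop103 hadoop104; do
  ssh "$host" "rm -rf /tmp/*"                  # 2. clean everything under /tmp
  ssh "$host" "rm -rf /opt/ha/hadoop-3.1.3/data /opt/ha/hadoop-3.1.3/logs"  # 3. delete data and logs
done
ssh hadoop103 hostname                         # 4. should print the hostname without prompting for a password
```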
Two: Manual failover
5. Configure core-site.xml
<!-- Address of the NameNode in HDFS (the nameservice ID) -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://mycluster</value>
</property>
<!-- Custom property, referenced elsewhere in the configuration -->
<property>
<name>hadoop.data.dir</name>
<value>/opt/ha/hadoop-3.1.3/data</value>
</property>
<!-- Storage directory for files generated at Hadoop runtime -->
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/ha/hadoop-3.1.3/data</value>
</property>
6. Configure hdfs-site.xml
<!-- Add the following -->
<property>
<name>dfs.namenode.name.dir</name>
<value>file://${hadoop.data.dir}/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file://${hadoop.data.dir}/data</value>
</property>
<property>
<name>dfs.nameservices</name>
<value>mycluster</value>
</property>
<property>
<name>dfs.ha.namenodes.mycluster</name>
<value>nn1,nn2,nn3</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn1</name>
<value>hadoop102:9820</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn2</name>
<value>hadoop103:9820</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn3</name>
<value>hadoop104:9820</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn1</name>
<value>hadoop102:9870</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn2</name>
<value>hadoop103:9870</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn3</name>
<value>hadoop104:9870</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://hadoop102:8485;hadoop103:8485;hadoop104:8485/mycluster</value>
</property>
<!-- Failover proxy class used by clients to determine which NameNode is Active -->
<property>
<name>dfs.client.failover.proxy.provider.mycluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- Fencing method: ensures only one NameNode responds at any given time -->
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<!-- sshfence requires passwordless SSH login -->
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/atguigu/.ssh/id_rsa</value>
</property>
<!-- Directory where the JournalNodes store the NameNode edit logs -->
<property>
<name>dfs.journalnode.edits.dir</name>
<value>${hadoop.data.dir}/jn</value>
</property>
7. Distribute the configuration to each node (the other nodes must also delete /tmp/*)
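One way to distribute the configuration is with rsync in a loop; this is only a sketch (many setups use a custom sync script instead), and the paths and host names are the ones assumed by this guide.

```shell
# Push the configured hadoop directory to the other nodes and clean their /tmp
for host in hadoop103 hadoop104; do
  rsync -av /opt/ha/hadoop-3.1.3/etc/hadoop/ "$host":/opt/ha/hadoop-3.1.3/etc/hadoop/
  ssh "$host" "rm -rf /tmp/*"   # other nodes need to delete /tmp/* too
done
```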
8. Each node (hadoop102, hadoop103, hadoop104) starts the journalnode:
hdfs --daemon start journalnode
9. Format the namenode on any node and start the service
hdfs namenode -format
hdfs --daemon start namenode
10. Sync metadata information on unformatted nodes
hdfs namenode -bootstrapStandby
11. Start namenode and DataNode on other nodes
hdfs --daemon start namenode
hdfs --daemon start datanode
12. Manual failover (turn nn1 into active state):
hdfs haadmin -transitionToActive nn1
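After the transition, the state of each NameNode can be confirmed with the standard haadmin subcommand:

```shell
hdfs haadmin -getServiceState nn1   # expect: active
hdfs haadmin -getServiceState nn2   # expect: standby
hdfs haadmin -getServiceState nn3   # expect: standby
```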
Three: Automatic failover
1. Configure hdfs-site.xml file
<!-- Enable automatic failover -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
2. Add in core-site.xml
<!-- Zookeeper quorum address -->
<property>
<name>ha.zookeeper.quorum</name>
<value>hadoop102:2181,hadoop103:2181,hadoop104:2181</value>
</property>
3. Start
(1) Shut down all HDFS services:
stop-dfs.sh
(2) Start the Zookeeper cluster:
zkServer.sh start
(3) Initialize the state of HA in Zookeeper:
hdfs zkfc -formatZK
(4) Start the HDFS service:
start-dfs.sh
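Once start-dfs.sh returns, it is worth verifying that the expected daemons came up on each NameNode host; the daemon names below are the standard ones for this setup.

```shell
# On each NameNode host, jps should show (among others):
#   NameNode, JournalNode, DFSZKFailoverController, QuorumPeerMain
jps
# The ZKFC should have elected one active NameNode automatically
hdfs haadmin -getServiceState nn1
```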
Configure Yarn
Note: remove the original single ResourceManager address setting (yarn.resourcemanager.hostname)
1. Configure yarn-site.xml
<!-- Enable ResourceManager HA -->
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<!-- Declare the addresses of the ResourceManagers -->
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>cluster-yarn1</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2,rm3</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>hadoop102</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>hadoop103</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm3</name>
<value>hadoop104</value>
</property>
<!-- Zookeeper cluster address -->
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>hadoop102:2181,hadoop103:2181,hadoop104:2181</value>
</property>
<!-- Enable automatic recovery -->
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<!-- Store ResourceManager state in the Zookeeper cluster -->
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
2. Start
(1) Execute on hadoop102:
start-yarn.sh
(2) Check the service status:
yarn rmadmin -getServiceState rm1
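To check all three ResourceManagers at once, the same rmadmin subcommand can be looped over the configured rm-ids:

```shell
# Query the HA state of every ResourceManager declared in yarn.resourcemanager.ha.rm-ids
for rm in rm1 rm2 rm3; do
  echo -n "$rm: "
  yarn rmadmin -getServiceState "$rm"   # one should report active, the others standby
done
```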