Hadoop high-availability architecture construction

Prepare the environment

Prepare three virtual machines and install JDK, Hadoop, and ZooKeeper on each. Configure passwordless SSH between the nodes, and make sure a fully distributed Hadoop cluster and a fully distributed ZooKeeper cluster are already installed and working.
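If passwordless SSH is not yet configured, one possible way to set it up (assuming the root user and the hostnames hadoop100, hadoop101 and hadoop102 used throughout this post) is to run the following on each of the three machines, so that every node can reach every other node without a password:

ssh-keygen -t rsa
ssh-copy-id root@hadoop100
ssh-copy-id root@hadoop101
ssh-copy-id root@hadoop102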

HDFS-HA cluster configuration

You can configure one virtual machine first and then copy the files to the other virtual machines, or use a tool to modify all three virtual machines at the same time. The configuration below is all new content; it does not affect the existing fully distributed Hadoop and ZooKeeper configuration.
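For example, assuming Hadoop is installed under /root/software/hadoop on every node (the path used in the configuration below), the edited configuration directory can be pushed to the other machines with scp:

scp -r /root/software/hadoop/etc/hadoop/ root@hadoop101:/root/software/hadoop/etc/
scp -r /root/software/hadoop/etc/hadoop/ root@hadoop102:/root/software/hadoop/etc/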

Configure the core-site.xml file

Set the value of hadoop.tmp.dir to a data directory under your actual Hadoop installation path.

<configuration>
<!-- Combine the addresses of the two NameNodes into a single cluster, hadoopHA -->
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoopHA</value>
</property>
<!-- Directory for files generated while Hadoop is running -->
<property>
    <name>hadoop.tmp.dir</name>
    <value>/root/software/hadoop/data/tmp</value>   
</property>
</configuration>

Configure hdfs-site.xml

<configuration>
<!-- Name of the fully distributed cluster (nameservice) -->
<property>
    <name>dfs.nameservices</name>
    <value>hadoopHA</value>
</property>
<!-- NameNodes in the cluster: here nn1 and nn2 -->
<property>
    <name>dfs.ha.namenodes.hadoopHA</name>
    <value>nn1,nn2</value>
</property>
<!-- RPC address of nn1 -->
<property>
    <name>dfs.namenode.rpc-address.hadoopHA.nn1</name>
    <value>hadoop100:9000</value>
</property>
<!-- RPC address of nn2 -->
<property>
    <name>dfs.namenode.rpc-address.hadoopHA.nn2</name>
    <value>hadoop102:9000</value>
</property>
<!-- HTTP address of nn1 -->
<property>
    <name>dfs.namenode.http-address.hadoopHA.nn1</name>
    <value>hadoop100:50070</value>
</property>
<!-- HTTP address of nn2 -->
<property>
    <name>dfs.namenode.http-address.hadoopHA.nn2</name>
    <value>hadoop102:50070</value>
</property>
<!-- Location where the NameNode metadata (edit log) is stored on the JournalNodes -->
<property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://hadoop100:8485;hadoop101:8485;hadoop102:8485/hadoopHA</value>
</property>
<!-- Fencing methods, so that only one NameNode responds to clients at any time -->
<property>
    <name>dfs.ha.fencing.methods</name>
    <value>
    sshfence
    shell(/bin/true)
    </value>
</property>
<!-- SSH fencing requires passwordless SSH login -->
<property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/root/.ssh/id_rsa</value>
</property>
<!-- Storage directory on the JournalNode servers -->
<property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/root/software/hadoop/data/jn</value>
</property>
<!-- Disable permission checking -->
<property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
</property>
<!-- Failover proxy provider: the class HDFS clients use to determine which NameNode is active and to fail over automatically -->
<property>
    <name>dfs.client.failover.proxy.provider.hadoopHA</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
</configuration>

Configure yarn-site.xml

<configuration>
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<!-- Enable ResourceManager HA -->
<property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
</property>
<!-- Declare the two ResourceManagers -->
<property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>cluster-yarn1</value>
</property>
<property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
</property>
<property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>hadoop100</value>
</property>
<property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>hadoop102</value>
</property>
<!-- Address of the ZooKeeper cluster -->
<property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>hadoop100:2181,hadoop101:2181,hadoop102:2181</value>
</property>
<!-- Enable automatic recovery -->
<property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
</property>
<!-- Store the ResourceManager state in the ZooKeeper cluster -->
<property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
</configuration>

Start HDFS-HA cluster

1. Start the ZooKeeper cluster

ZooKeeper needs to be started on all three virtual machines:

zkServer.sh start
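To confirm the ensemble is healthy, you can check each node's role; one node should report leader and the other two follower:

zkServer.sh status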

2. Start journalnode

The JournalNode needs to be started on all three virtual machines.

Note: the JournalNodes belong to qjournal (the Quorum Journal Manager), the distributed service that Hadoop 2.x uses to manage and share the NameNode edit log. Here the three JournalNodes are placed on the same three nodes that run ZooKeeper.

hadoop-daemon.sh start journalnode
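After starting, jps on each of the three nodes should list a JournalNode process; a quick check is:

jps | grep JournalNode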

3. Format HDFS (only needed on the first startup)

Run the following command on the hadoop100 master node

Note: if HDFS has been run before, first delete the directory that hadoop.tmp.dir points to. After formatting completes, the new tmp folder needs to be distributed to the other two virtual machines (a sample command follows the format command below).

hdfs namenode -format
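One way to distribute the freshly formatted metadata, assuming the hadoop.tmp.dir path configured above, is to copy it with scp (alternatively, running hdfs namenode -bootstrapStandby on hadoop102 copies the metadata for the standby NameNode automatically):

scp -r /root/software/hadoop/data/tmp root@hadoop101:/root/software/hadoop/data/
scp -r /root/software/hadoop/data/tmp root@hadoop102:/root/software/hadoop/data/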

4. Format ZKFC (only needed on the first startup)

Note: the ZKFC (ZKFailoverController) manages failover between the two NameNodes and also relies on ZooKeeper. When the active NameNode becomes abnormal, the ZKFC on that node reports its state to ZooKeeper; the ZKFC on the standby NameNode detects the abnormal state, logs in to the active node over SSH, kills the NameNode process there (kill -9), and then promotes its own NameNode to active. This fencing prevents a "fake dead" active NameNode and a split-brain situation. In case the SSH fencing fails, a custom shell script can also be configured to forcibly kill the active NameNode process.

Run the following command on the hadoop100 master node

hdfs zkfc -formatZK

5. Start HDFS

Run the following command on the hadoop100 master node

start-dfs.sh
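Once HDFS is up, you can query which NameNode is active and which is standby:

hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2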

6. Test HDFS

Check the services running on each virtual machine with the jps command.

You can also check the status of each service in the web UI:
HDFS: http://hadoop100:50070 (master node)
HDFS: http://hadoop102:50070 (sub-node)
Then manually kill the NameNode process on hadoop100:
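One way to do this on hadoop100 (the pid below is a placeholder; use whatever jps reports for the NameNode process):

jps | grep NameNode
kill -9 <NameNode pid>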
When the master node fails, HDFS automatically switches over to the sub-node; the HDFS-HA configuration is complete.

7. Start YARN

Run the following command on the hadoop100 master node

start-yarn.sh

Run the following command on the hadoop102 sub-node

yarn-daemon.sh start resourcemanager
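As with HDFS, you can verify which ResourceManager is currently active:

yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2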

8. Test yarn-HA

Open hadoop102:8088 (the standby ResourceManager) in a browser; it should automatically redirect to the active ResourceManager at hadoop100:8088, which shows that YARN-HA is working. In my test the redirect did not happen at first, apparently because the local computer could not resolve the cluster hostnames.
Note that there may be more than one hosts file on the C drive of the computer, so search for it and check the path carefully (it is normally C:\Windows\System32\drivers\etc\hosts).
Open the correct file and add the IP address and hostname of each node; after that the redirect works normally.
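For example, the entries to append look like this (the IP addresses are placeholders; use the actual addresses of your three virtual machines):

192.168.1.100 hadoop100
192.168.1.101 hadoop101
192.168.1.102 hadoop102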

The high-availability configuration ends here, thank you!
