Hadoop framework: HDFS high availability environment configuration


1. HDFS high availability

1. Basic description

In the event of a single-node or small-scale node failure, the cluster can still provide service normally. The HDFS high-availability mechanism eliminates the NameNode single point of failure by configuring two NameNodes in an Active/Standby pair, achieving hot standby for the NameNode within the cluster. If the Active node fails, service can quickly switch to the other NameNode.

2. Detailed mechanism


  • High availability is based on two NameNodes and relies on a shared edits log (JournalNodes) and a ZooKeeper cluster;
  • Each NameNode node runs a ZKFailoverController (ZKFC) process, responsible for monitoring the status of its NameNode;
  • Each NameNode maintains a persistent session with the ZooKeeper cluster;
  • If the Active node fails and shuts down, its session expires and ZooKeeper notifies the NameNode in the Standby state;
  • After the ZKFC process detects and confirms that the failed node can no longer work,
  • it notifies the NameNode in the Standby state to switch to the Active state and continue providing service.

ZooKeeper is very important in big data systems: it coordinates the work of different components and maintains and transmits state data. For example, the automatic failover under high availability described above depends on the ZooKeeper component.
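As a quick illustration (assuming the default Hadoop HA znode layout and that the cluster configured later in this article is already running), the current Active election lock can be inspected directly in ZooKeeper:

```shell
# Open a ZooKeeper client session (the install path matches the one used later in this article)
/opt/zookeeper3.4/bin/zkCli.sh -server hop01:2181

# Inside the client: the ZKFC of the Active NameNode holds this ephemeral lock znode
ls /hadoop-ha/mycluster
get /hadoop-ha/mycluster/ActiveStandbyElectorLock
```

If the Active NameNode dies, this ephemeral znode disappears with its session, which is what triggers the Standby's ZKFC to take over.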

2. HDFS high availability configuration

1. Overall configuration

| Node  | HDFS     | YARN        | Single service    | Shared edits | ZK cluster |
| ----- | -------- | ----------- | ----------------- | ------------ | ---------- |
| hop01 | DataNode | NodeManager | NameNode          | JournalNode  | ZK-hop01   |
| hop02 | DataNode | NodeManager | ResourceManager   | JournalNode  | ZK-hop02   |
| hop03 | DataNode | NodeManager | SecondaryNameNode | JournalNode  | ZK-hop03   |

2. Configure JournalNode

Create a directory

[root@hop01 opt]# mkdir hopHA

Copy Hadoop directory

cp -r /opt/hadoop2.7/ /opt/hopHA/

Configure core-site.xml

<configuration>
    <!-- NameNode cluster mode -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://mycluster</value>
    </property>
    <!-- Storage directory for files generated at Hadoop runtime -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/hopHA/hadoop2.7/data/tmp</value>
    </property>
</configuration>

Configure hdfs-site.xml and add the following content

<!-- Name of the distributed cluster (nameservice) -->
<property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
</property>

<!-- NameNode nodes in the cluster -->
<property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
</property>

<!-- NN1 RPC address -->
<property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>hop01:9000</value>
</property>

<!-- NN2 RPC address -->
<property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>hop02:9000</value>
</property>

<!-- NN1 HTTP address -->
<property>
    <name>dfs.namenode.http-address.mycluster.nn1</name>
    <value>hop01:50070</value>
</property>

<!-- NN2 HTTP address -->
<property>
    <name>dfs.namenode.http-address.mycluster.nn2</name>
    <value>hop02:50070</value>
</property>

<!-- Location on the JournalNodes where the NameNode edits are stored -->
<property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://hop01:8485;hop02:8485;hop03:8485/mycluster</value>
</property>

<!-- Fencing method, so that only one server responds at any given time -->
<property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
</property>

<!-- Passwordless SSH login is required when using the sshfence mechanism -->
<property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/root/.ssh/id_rsa</value>
</property>

<!-- JournalNode storage directory -->
<property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/opt/hopHA/hadoop2.7/data/jn</value>
</property>

<!-- Disable permission checking -->
<property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
</property>

<!-- Client failover proxy provider for automatic switching -->
<property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>

Start the JournalNode service on each node in turn

[root@hop01 hadoop2.7]# pwd
/opt/hopHA/hadoop2.7
[root@hop01 hadoop2.7]# sbin/hadoop-daemon.sh start journalnode
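To confirm the daemon is up on each node (a quick sanity check, not part of the original walkthrough), `jps` should list a JournalNode process:

```shell
[root@hop01 hadoop2.7]# jps
# Expect a line like "13204 JournalNode"; the PID will differ per machine
```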

Delete the old data and logs under hopHA

[root@hop01 hadoop2.7]# rm -rf data/ logs/

Format NN1 and start its NameNode

[root@hop01 hadoop2.7]# pwd
/opt/hopHA/hadoop2.7
[root@hop01 hadoop2.7]# bin/hdfs namenode -format
[root@hop01 hadoop2.7]# sbin/hadoop-daemon.sh start namenode

NN2 synchronizes NN1 data

[root@hop02 hadoop2.7]# bin/hdfs namenode -bootstrapStandby

NN2 starts NameNode

[root@hop02 hadoop2.7]# sbin/hadoop-daemon.sh start namenode

View current status

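At this stage (both NameNodes and the JournalNodes started, DataNodes not yet), a rough sketch of what `jps` should show on hop01:

```shell
[root@hop01 hadoop2.7]# jps
# Expected processes at this point (PIDs will vary):
#   NameNode
#   JournalNode
#   Jps
```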

Start all DataNodes on NN1

[root@hop01 hadoop2.7]# sbin/hadoop-daemons.sh start datanode

NN1 switches to Active state

[root@hop01 hadoop2.7]# bin/hdfs haadmin -transitionToActive nn1
[root@hop01 hadoop2.7]# bin/hdfs haadmin -getServiceState nn1
active
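For symmetry, nn2 should report the standby state at this point:

```shell
[root@hop01 hadoop2.7]# bin/hdfs haadmin -getServiceState nn2
standby
```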


3. Failover configuration

Configure hdfs-site.xml with the following new content, then synchronize it across the cluster

<property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
</property>

Configure core-site.xml with the following new content, then synchronize it across the cluster

<property>
    <name>ha.zookeeper.quorum</name>
    <value>hop01:2181,hop02:2181,hop03:2181</value>
</property>

Stop all HDFS services

[root@hop01 hadoop2.7]# sbin/stop-dfs.sh

Start the ZooKeeper cluster (execute on each node)

/opt/zookeeper3.4/bin/zkServer.sh start
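Before formatting the HA state, it is worth confirming each ZooKeeper node's role; exactly one of the three nodes should report leader:

```shell
/opt/zookeeper3.4/bin/zkServer.sh status
# Mode: follower   (or "Mode: leader" on exactly one node)
```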

hop01 initializes the HA state in Zookeeper

[root@hop01 hadoop2.7]# bin/hdfs zkfc -formatZK

hop01 starts the HDFS service

[root@hop01 hadoop2.7]# sbin/start-dfs.sh

Start the ZKFailoverController (ZKFC) on each NameNode node

Of hop01 and hop02, whichever starts ZKFC first becomes the Active node; here hop02 is started first.

[hadoop2.7]# sbin/hadoop-daemon.sh start zkfc


End the NameNode process of hop02

kill -9 14422
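The PID 14422 is specific to that run; on your own cluster, look up the NameNode PID first:

```shell
[root@hop02 hadoop2.7]# jps | grep NameNode
# e.g. "14422 NameNode" — then kill that PID to simulate the failure
```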

Wait a moment, then check the status of hop01

[root@hop01 hadoop2.7]# bin/hdfs haadmin -getServiceState nn1
active

3. YARN high availability

1. Basic description


The basic process and idea are similar to the HDFS mechanism, again relying on the ZooKeeper cluster: when the Active node fails, the Standby node switches to the Active state to continue providing service.

2. Detailed configuration

The environment is likewise demonstrated on hop01 and hop02.

Configure yarn-site.xml and synchronize it across the cluster

<configuration>

    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>

    <!-- Enable the HA mechanism -->
    <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>

    <!-- Declare the ResourceManager cluster ID -->
    <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>cluster-yarn01</value>
    </property>

    <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
    </property>

    <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>hop01</value>
    </property>

    <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>hop02</value>
    </property>

    <!-- ZooKeeper cluster address -->
    <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>hop01:2181,hop02:2181,hop03:2181</value>
    </property>

    <!-- Enable automatic recovery -->
    <property>
        <name>yarn.resourcemanager.recovery.enabled</name>
        <value>true</value>
    </property>

    <!-- Store ResourceManager state in the ZooKeeper cluster -->
    <property>
        <name>yarn.resourcemanager.store.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
    </property>

</configuration>

Restart the JournalNode nodes

sbin/hadoop-daemon.sh start journalnode

Format and start the NN1 service

[root@hop01 hadoop2.7]# bin/hdfs namenode -format
[root@hop01 hadoop2.7]# sbin/hadoop-daemon.sh start namenode

Synchronize NN1 metadata on NN2

[root@hop02 hadoop2.7]# bin/hdfs namenode -bootstrapStandby

Start the DataNode under the cluster

[root@hop01 hadoop2.7]# sbin/hadoop-daemons.sh start datanode

Set NN1 to the Active state

Start ZKFC on hop01 first, then on hop02, so that nn1 becomes the Active node.

[root@hop01 hadoop2.7]# sbin/hadoop-daemon.sh start zkfc

hop01 start yarn

[root@hop01 hadoop2.7]# sbin/start-yarn.sh

Start the ResourceManager on hop02

[root@hop02 hadoop2.7]# sbin/yarn-daemon.sh start resourcemanager

Check status

[root@hop01 hadoop2.7]# bin/yarn rmadmin -getServiceState rm1
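The standby ResourceManager can be checked the same way; with rm1 active, rm2 should report standby:

```shell
[root@hop01 hadoop2.7]# bin/yarn rmadmin -getServiceState rm2
standby
```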


4. Source code address

GitHub address
https://github.com/cicadasmile/big-data-parent
GitEE address
https://gitee.com/cicadasmile/big-data-parent


Origin: blog.51cto.com/14439672/2544554