Offline Data Systems HA (High Availability)

 

1 The Hadoop HA Mechanism

Preface: the HA mechanism was formally introduced starting with hadoop 2.0; earlier versions have no HA mechanism.

1.1 How the HA mechanism works

(1) Overview of how a hadoop-HA cluster operates

The so-called HA means high availability (7x24 uninterrupted service).

The most critical issue for high availability is eliminating the single point of failure.

Strictly speaking, hadoop-HA should be discussed per component: the HA mechanism of HDFS, and the HA of YARN.

 

(2) The HDFS HA mechanism in detail

The single point of failure is eliminated by running two namenodes.

Key points of coordination between the two namenodes:

    A. Metadata management has to change:

    Each namenode keeps a copy of the metadata in memory

    There can be only one edits log, and only the namenode in Active state may write to it

    Both namenodes can read the edits

    The shared edits live in shared storage (qjournal and NFS are the two mainstream implementations)

    B. A state-management module is needed:

    It is implemented as zkfailover (ZKFC), a process resident on each namenode node

    Each zkfailover monitors the namenode on its own node and records its state in zookeeper (zk)

    When a state switch is needed, the switch is carried out by zkfailover

    Split-brain must be prevented during the switch
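The "state recorded in zk" mentioned above takes the concrete form of an ephemeral lock znode held by the active namenode's ZKFC. A minimal way to observe it with the zookeeper CLI (a sketch; the host and nameservice names follow the weekend0x/ns1 examples used later in this document):

```shell
# Connect to any zookeeper node in the quorum
zkCli.sh -server weekend05:2181

# Inside the zk shell: the active ZKFC holds an ephemeral lock znode under
# /hadoop-ha/<nameservice>. If the active namenode (or its ZKFC) dies, the
# znode disappears and the standby's ZKFC wins the next election; fencing
# the old active before promotion is what prevents split-brain.
ls /hadoop-ha/ns1
get /hadoop-ha/ns1/ActiveStandbyElectorLock
```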

 

(3) HDFS-HA architecture diagram:

 

 

 

1.2 HA cluster installation and deployment

1.2.1 Cluster node planning

Planned node roles for the cluster deployment (10 nodes):

server01   namenode   zkfc       > start-dfs.sh

server02   namenode   zkfc

 

server03   resourcemanager    > start-yarn.sh

server04   resourcemanager

 

server05   datanode   nodemanager    

server06   datanode   nodemanager    

server07   datanode   nodemanager    

 

server08   journalnode    zookeeper

server09   journalnode    zookeeper

server10   journalnode    zookeeper

 

Planned node roles for the cluster deployment (3 nodes):

server01   namenode    resourcemanager   zkfc   nodemanager   datanode   zookeeper   journalnode

server02   namenode    resourcemanager   zkfc   nodemanager   datanode   zookeeper   journalnode

server05   datanode    nodemanager     zookeeper    journalnode

 

1.2.2 Environment preparation

1. Environment preparation

a/ Linux system preparation

    IP address configuration

    hostname configuration

    hosts mapping configuration

    disable the firewall

    change the init run level

    add the hadoop user to sudoers

    configure passwordless ssh login
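The passwordless ssh step can be sketched as follows (hostnames are from the 3-node plan above; adjust them to your own cluster, and run as the hadoop user):

```shell
# Generate an rsa key pair for the current user if one does not exist yet
mkdir -p ~/.ssh && chmod 700 ~/.ssh
[ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa -q

# Then push the public key to every node in the plan (including this one),
# so start-dfs.sh / start-yarn.sh can reach them without a password, e.g.:
#   for host in server01 server02 server05; do ssh-copy-id hadoop@$host; done
```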

 

b/ Java environment configuration

    Upload the JDK, unpack it, and edit /etc/profile

c/ Deploy the zookeeper cluster

 

1.2.3 Configuration files

 

core-site.xml

<configuration>
    <!-- Specify ns1 as the nameservice of hdfs -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://ns1/</value>
    </property>
    <!-- Specify the hadoop temp directory -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/app/hadoop-2.4.1/tmp</value>
    </property>
    <!-- Specify the zookeeper quorum addresses -->
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>weekend05:2181,weekend06:2181,weekend07:2181</value>
    </property>
</configuration>

 

 

hdfs-site.xml

<configuration>
    <!-- Specify ns1 as the nameservice of hdfs; must match core-site.xml -->
    <property>
        <name>dfs.nameservices</name>
        <value>ns1</value>
    </property>
    <!-- ns1 has two NameNodes: nn1 and nn2 -->
    <property>
        <name>dfs.ha.namenodes.ns1</name>
        <value>nn1,nn2</value>
    </property>
    <!-- RPC address of nn1 -->
    <property>
        <name>dfs.namenode.rpc-address.ns1.nn1</name>
        <value>weekend01:9000</value>
    </property>
    <!-- HTTP address of nn1 -->
    <property>
        <name>dfs.namenode.http-address.ns1.nn1</name>
        <value>weekend01:50070</value>
    </property>
    <!-- RPC address of nn2 -->
    <property>
        <name>dfs.namenode.rpc-address.ns1.nn2</name>
        <value>weekend02:9000</value>
    </property>
    <!-- HTTP address of nn2 -->
    <property>
        <name>dfs.namenode.http-address.ns1.nn2</name>
        <value>weekend02:50070</value>
    </property>
    <!-- Where the NameNode edits metadata is stored on the JournalNodes -->
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://weekend05:8485;weekend06:8485;weekend07:8485/ns1</value>
    </property>
    <!-- Where each JournalNode stores its data on local disk -->
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/home/hadoop/app/hadoop-2.4.1/journaldata</value>
    </property>
    <!-- Enable automatic failover when the NameNode fails -->
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>
    <!-- Configure how automatic failover is implemented on the client side -->
    <property>
        <name>dfs.client.failover.proxy.provider.ns1</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <!-- Configure fencing methods; multiple methods are separated by newlines, one per line -->
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>
            sshfence
            shell(/bin/true)
        </value>
    </property>
    <!-- The sshfence mechanism requires passwordless ssh -->
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/home/hadoop/.ssh/id_rsa</value>
    </property>
    <!-- Timeout for the sshfence mechanism -->
    <property>
        <name>dfs.ha.fencing.ssh.connect-timeout</name>
        <value>30000</value>
    </property>
</configuration>
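With both files in place, a typical first-start sequence for a Hadoop 2.x HA cluster is roughly as follows (a sketch; hostnames follow the weekend0x examples in the configuration, and each command is run on the host noted in its comment):

```shell
# 1. Start the journalnodes first (on weekend05, weekend06, weekend07)
hadoop-daemon.sh start journalnode

# 2. Format HDFS on the first namenode and start it (on weekend01)
hdfs namenode -format
hadoop-daemon.sh start namenode

# 3. Copy nn1's metadata over to the second namenode (on weekend02)
hdfs namenode -bootstrapStandby

# 4. Initialise the HA state znode in zookeeper (on weekend01)
hdfs zkfc -formatZK

# 5. Start everything: namenodes, datanodes, journalnodes, zkfc, then yarn
start-dfs.sh
start-yarn.sh
```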

 

 

1.2.4 Cluster operations and testing

1. Dynamically adding and removing datanodes

Adding a datanode dynamically is simple; the steps are:

  1. Prepare a server and set up its environment
  2. Deploy the hadoop installation package and sync the cluster configuration
  3. Bring it online; the new datanode joins the cluster automatically
  4. If a large batch of datanodes is added at once, the cluster load should also be rebalanced
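The steps above can be sketched as commands (assuming the install package and configuration have already been synced to the new node):

```shell
# On the newly prepared server: start the datanode and nodemanager daemons;
# they register with the active namenode / resourcemanager automatically
hadoop-daemon.sh start datanode
yarn-daemon.sh start nodemanager

# Confirm the new datanode shows up in the cluster report
hdfs dfsadmin -report

# After adding a large batch of datanodes, rebalance the cluster
start-balancer.sh
```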

 

2. Managing namenode state transitions

The command for this is hdfs haadmin.

Use hdfs haadmin -help to see all the help information.

Example state-management commands:

Check a namenode's working state:

hdfs haadmin -getServiceState nn1

Switch a standby namenode to active:

hdfs haadmin -transitionToActive nn1

Switch an active namenode to standby:

hdfs haadmin -transitionToStandby nn2
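Beyond manual switching, a simple way to exercise the automatic failover path (a sketch; assumes nn1 on weekend01 is currently active):

```shell
# On weekend01: kill the active namenode process
kill -9 $(jps | awk '/ NameNode$/ {print $1}')

# Within a few seconds the zkfc on weekend02 should promote nn2;
# this command should then report "active"
hdfs haadmin -getServiceState nn2
```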

 

3. Block balancing

Command to start the balancer:

start-balancer.sh -threshold 8

After it runs, a Balancer process appears.


The command above sets the Threshold to 8%. When the balancer runs, it first computes the average disk utilization across all DataNodes; if any DataNode's disk utilization exceeds that average by more than the Threshold, blocks are moved from it to DataNodes with lower utilization. This is very useful when new nodes join the cluster. The Threshold value ranges from 1 to 100 and defaults to 10 when not set explicitly.

 

1.2.5 HDFS API changes under HA

The client needs the nameservice configuration; everything else is unchanged.

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * When accessing a cluster with the HA mechanism, core-site.xml and
 * hdfs-site.xml must be on the client program's classpath, so the client
 * understands that "ns1" in hdfs://ns1/ is a nameservice (a namenode pair
 * in the HA mechanism) and knows the actual namenode addresses under ns1.
 */
public class UploadFile {

    public static void main(String[] args) throws Exception {

        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://ns1/");

        // Connect through the logical nameservice URI as the hadoop user
        FileSystem fs = FileSystem.get(new URI("hdfs://ns1/"), conf, "hadoop");

        fs.copyFromLocalFile(new Path("g:/eclipse-jee-luna-SR1-linux-gtk.tar.gz"), new Path("hdfs://ns1/"));

        fs.close();
    }
}
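The same logical URI also works from the command line once the HA configuration is visible (for example via $HADOOP_CONF_DIR), which gives a quick check that nameservice resolution is wired up:

```shell
# List the root of the HA filesystem through the logical nameservice name;
# the client resolves ns1 to whichever namenode is currently active
hdfs dfs -ls hdfs://ns1/
```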

 

Under Federation, the staging directory used when submitting MR jobs needs to be configured:

<property>
    <name>yarn.app.mapreduce.am.staging-dir</name>
    <value>/bi/tmp/hadoop-yarn/staging</value>
    <description>The staging dir used while submitting jobs.</description>
</property>


Origin blog.csdn.net/lixinkuan328/article/details/94844654