The Road to Big Data - Hadoop (3): Setting Up a Hadoop HA Cluster

Copyright notice: this is an original post; please credit the source when reposting: https://blog.csdn.net/liudongdong0909/article/details/79101677


1. Environment preparation

1.1 Prepare seven CentOS 6.5 virtual machines on a Mac

For the detailed installation steps, see the earlier post in this series on installing CentOS in VMware Fusion on macOS.

The machines in the Hadoop HA cluster are planned as follows:

Hostname  IP             Software installed       Processes                                             Ports
mini1     172.16.29.141  jdk, hadoop              NameNode, DFSZKFailoverController (zkfc)              50070
mini2     172.16.29.142  jdk, hadoop              NameNode, DFSZKFailoverController (zkfc)              50070
mini3     172.16.29.143  jdk, hadoop              ResourceManager                                       8088
mini4     172.16.29.144  jdk, hadoop              ResourceManager                                       8088
mini5     172.16.29.145  jdk, hadoop, zookeeper   DataNode, NodeManager, JournalNode, QuorumPeerMain    2181, 2888, 3888
mini6     172.16.29.146  jdk, hadoop, zookeeper   DataNode, NodeManager, JournalNode, QuorumPeerMain    2181, 2888, 3888
mini7     172.16.29.147  jdk, hadoop, zookeeper   DataNode, NodeManager, JournalNode, QuorumPeerMain    2181, 2888, 3888

1.2 Configure the Linux hostname (do this on all 7 machines)

[root@localhost hadoop]# vim /etc/sysconfig/network

NETWORKING=yes
HOSTNAME=mini1  ## set this on each machine accordingly, e.g. on mini2: HOSTNAME=mini2
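
To make the new hostname take effect in the current session without a reboot, you can also set it at runtime (a small optional step; the change in /etc/sysconfig/network only applies from the next boot):

[root@localhost hadoop]# hostname mini1     ## apply immediately; use each machine's own name
[root@localhost hadoop]# hostname           ## verify
mini1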

1.3 Configure the IP address (on each of the 7 machines)

[root@localhost ~]$ vim /etc/udev/rules.d/70-persistent-net.rules

SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:50:56:30:03:58", ATTR{type}=="1", KERNEL=="eth*", NAME="eth0"

[root@localhost ~]$ vim /etc/sysconfig/network-scripts/ifcfg-eth0

DEVICE=eth0
HWADDR=00:50:56:30:03:58
TYPE=Ethernet
UUID=9186432d-9f75-45f1-894a-f74d8cd684de
ONBOOT=yes
NM_CONTROLLED=yes
BOOTPROTO=dhcp

1.4 Map hostnames to IP addresses

[root@localhost hadoop]# vim /etc/hosts

172.16.29.141  mini1
172.16.29.142  mini2
172.16.29.143  mini3
172.16.29.144  mini4
172.16.29.145  mini5
172.16.29.146  mini6
172.16.29.147  mini7

Do this on all 7 machines; the file can simply be copied from mini1 to the others:

[root@mini1 hadoop]# scp -r /etc/hosts mini2:/etc/
[root@mini1 hadoop]# scp -r /etc/hosts mini3:/etc/
[root@mini1 hadoop]# scp -r /etc/hosts mini4:/etc/
[root@mini1 hadoop]# scp -r /etc/hosts mini5:/etc/
[root@mini1 hadoop]# scp -r /etc/hosts mini6:/etc/
[root@mini1 hadoop]# scp -r /etc/hosts mini7:/etc/

1.5 Set up passwordless SSH login

[hadoop@mini1 ~]$ ssh-keygen

After running the command above you will see a series of prompts; just press Enter through them. One of the questions asks whether to protect the private key with a passphrase; set one if you are worried about the key's security.

When it finishes, two new files appear under $HOME/.ssh/: id_rsa.pub and id_rsa. The former is your public key, the latter your private key.

Now copy the public key to the remote host:

[hadoop@mini1 ~]$ ssh-copy-id mini2

Repeat for the remaining machines:


[hadoop@mini1 ~]$ ssh-copy-id mini3
[hadoop@mini1 ~]$ ssh-copy-id mini4
[hadoop@mini1 ~]$ ssh-copy-id mini5
[hadoop@mini1 ~]$ ssh-copy-id mini6
[hadoop@mini1 ~]$ ssh-copy-id mini7

Test the passwordless login:

[hadoop@mini1 ~]$ ssh mini7
Last login: Wed Dec 27 12:19:14 2017 from mini1 ## logged in from mini1 to mini7 as the hadoop user
[hadoop@mini7 ~]$ ll ## file listing in mini7's hadoop home
total 0
[hadoop@mini7 ~]$ su ## switch to mini7's root user
Password:
[root@mini7 hadoop]# 

2. Install the ZooKeeper cluster

Install ZooKeeper on mini5, mini6 and mini7.

mini5:

[hadoop@mini5 ~]$ ll
total 21740
-rw-r--r--. 1 hadoop hadoop 22261552 Jul 29 16:05 zookeeper-3.4.8.tar.gz

2.1 Create an apps directory under the hadoop user's home

[hadoop@mini5 ~]$ mkdir apps
[hadoop@mini5 ~]$ ll
total 21744
drwxrwxr-x. 2 hadoop hadoop     4096 Dec 27 12:50 apps
-rw-r--r--. 1 hadoop hadoop 22261552 Jul 29 16:05 zookeeper-3.4.8.tar.gz
[hadoop@mini5 ~]$ pwd
/home/hadoop
[hadoop@mini5 ~]$ cd apps/
[hadoop@mini5 apps]$ pwd
/home/hadoop/apps
[hadoop@mini5 apps]$ 

2.2 Unpack zookeeper-3.4.8.tar.gz

[hadoop@mini5 ~]$ tar -zxvf zookeeper-3.4.8.tar.gz  -C /home/hadoop/apps/

2.3 Create the configuration file zoo.cfg from the sample

[hadoop@mini5 ~]$ cd /home/hadoop/apps/zookeeper-3.4.8/conf
[hadoop@mini5 conf]$ cp zoo_sample.cfg zoo.cfg
[hadoop@mini5 conf]$ ll
total 16
-rw-rw-r--. 1 hadoop hadoop  535 Feb  6 2016 configuration.xsl
-rw-rw-r--. 1 hadoop hadoop 2161 Feb  6 2016 log4j.properties
-rw-rw-r--. 1 hadoop hadoop  922 Dec 27 12:56 zoo.cfg
-rw-rw-r--. 1 hadoop hadoop  922 Feb  6 2016 zoo_sample.cfg
[hadoop@mini5 conf]$ 

2.4 Configure the ZooKeeper data directory in zoo.cfg

[hadoop@mini5 conf]$ 
[hadoop@mini5 conf]$ pwd
/home/hadoop/apps/zookeeper-3.4.8/conf
[hadoop@mini5 conf]$ vim zoo.cfg 
dataDir=/home/hadoop/apps/data/zookeeper

and append the following at the end of the file:

server.1=mini5:2888:3888
server.2=mini6:2888:3888
server.3=mini7:2888:3888

Create the directory specified by dataDir above:

[hadoop@mini5 conf]$ mkdir -p /home/hadoop/apps/data/zookeeper
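
For reference, after these edits the relevant settings in zoo.cfg look roughly like this (tickTime, initLimit, syncLimit and clientPort are the defaults inherited from zoo_sample.cfg):

tickTime=2000
initLimit=10
syncLimit=5
clientPort=2181
dataDir=/home/hadoop/apps/data/zookeeper
server.1=mini5:2888:3888
server.2=mini6:2888:3888
server.3=mini7:2888:3888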

2.5 Create the myid file under dataDir

[hadoop@mini5 conf]$ pwd
/home/hadoop/apps/zookeeper-3.4.8/conf
[hadoop@mini5 conf]$
[hadoop@mini5 conf]$ echo 1 > /home/hadoop/apps/data/zookeeper/myid
[hadoop@mini5 conf]$ 

2.6 Copy the configured ZooKeeper installation to mini6 and mini7

[hadoop@mini5 ~]$ scp -r /home/hadoop/apps/ mini6:/home/hadoop/
[hadoop@mini5 ~]$
[hadoop@mini5 ~]$ scp -r /home/hadoop/apps/ mini7:/home/hadoop/

2.6.1 Change the contents of /home/hadoop/apps/data/zookeeper/myid on mini6 and mini7

mini6:

[hadoop@mini6 ~]$ echo 2 > /home/hadoop/apps/data/zookeeper/myid 
[hadoop@mini6 ~]$ 

mini7:

[hadoop@mini7 ~]$ echo 3 > /home/hadoop/apps/data/zookeeper/myid 
[hadoop@mini7 ~]$ 

2.7 Open the required ports in the firewall

mini5:

[hadoop@mini5 ~]$ 
[hadoop@mini5 ~]$ su
Password:
[root@mini5 hadoop]# vim /etc/sysconfig/iptables

Add rules to open ports 2181, 2888 and 3888:

-A INPUT -m state --state NEW -m tcp -p tcp --dport 2181 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 2888 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 3888 -j ACCEPT

Reload the firewall:

[root@mini5 hadoop]# service iptables restart

Open ports 2181, 2888 and 3888 on mini6 and mini7 in the same way.
I simply disabled the firewall instead; a sketch of that is below.
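
If you also want to simply disable the firewall on a test cluster, a sketch for CentOS 6 (run as root on each machine):

[root@mini5 hadoop]# service iptables stop        ## stop the firewall now
[root@mini5 hadoop]# chkconfig iptables off       ## keep it off after reboots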

2.8 Start ZooKeeper

[hadoop@mini7 bin]$ pwd
/home/hadoop/apps/zookeeper-3.4.8/bin
[hadoop@mini7 bin]$ 
[hadoop@mini7 bin]$ 
[hadoop@mini7 bin]$ ./zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /home/hadoop/apps/zookeeper-3.4.8/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[hadoop@mini7 bin]$ 
[hadoop@mini7 bin]$ ./zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/hadoop/apps/zookeeper-3.4.8/bin/../conf/zoo.cfg
Mode: follower

Check the process:

[hadoop@mini7 bin]$ jps
1560 Jps
1465 QuorumPeerMain
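
Since the same start/status commands have to be run on mini5, mini6 and mini7, a small loop run from mini1 can do it in one go (a sketch; it assumes the passwordless SSH set up in 1.5 and that JAVA_HOME is visible to non-interactive shells):

[hadoop@mini1 ~]$ for host in mini5 mini6 mini7; do
>   ssh "$host" "/home/hadoop/apps/zookeeper-3.4.8/bin/zkServer.sh start"
> done
[hadoop@mini1 ~]$ for host in mini5 mini6 mini7; do
>   ssh "$host" "/home/hadoop/apps/zookeeper-3.4.8/bin/zkServer.sh status"
> done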

3. Install the Hadoop cluster

On mini1, first create an apps directory and unpack hadoop-2.6.4.tar.gz:

[hadoop@mini1 ~]$ 
[hadoop@mini1 ~]$ mkdir apps; tar -zxvf cenos-6.5-hadoop-2.6.4.tar.gz -C apps

3.1 Edit hadoop-env.sh

[hadoop@mini1 hadoop]$ 
[hadoop@mini1 hadoop]$ pwd
/home/hadoop/apps/hadoop-2.6.4/etc/hadoop
[hadoop@mini1 hadoop]$ vim hadoop-env.sh 

Set JAVA_HOME:

export JAVA_HOME=/usr/local/jdk1.8

3.2 Add Hadoop to the system environment variables

All of the Hadoop 2.x configuration files live under $HADOOP_HOME/etc/hadoop.

cd /home/hadoop/apps/hadoop-2.6.4/etc/hadoop
[hadoop@mini1 hadoop]$ su
Password:
[root@mini1 hadoop]# 
[root@mini1 hadoop]# 
[root@mini1 hadoop]# vim /etc/profile

Add HADOOP_HOME:

export JAVA_HOME=/usr/local/jdk1.8
export PATH=$JAVA_HOME/bin:$PATH
export HADOOP_HOME=/home/hadoop/apps/hadoop-2.6.4
export PATH=$HADOOP_HOME/bin:$PATH
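
To pick up the new variables in the current shell and sanity-check them (a quick verification; it assumes the JDK really is installed at /usr/local/jdk1.8):

[root@mini1 hadoop]# source /etc/profile
[root@mini1 hadoop]# echo $HADOOP_HOME
/home/hadoop/apps/hadoop-2.6.4
[root@mini1 hadoop]# hadoop version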

3.3 Edit core-site.xml

[hadoop@mini1 hadoop]$ pwd
/home/hadoop/apps/hadoop-2.6.4/etc/hadoop
[hadoop@mini1 hadoop]$ vim core-site.xml 

Add the following:

<configuration>

<!-- The HDFS nameservice (here: bi) -->
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://bi/</value>
</property>

<!-- Hadoop temporary/working directory -->
<property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/apps/hdpdata/</value>
</property>

<!-- ZooKeeper quorum addresses -->
<property>
    <name>ha.zookeeper.quorum</name>
    <value>mini5:2181,mini6:2181,mini7:2181</value>
</property>

</configuration>

3.4 Edit hdfs-site.xml

[hadoop@mini1 hadoop]$ 
[hadoop@mini1 hadoop]$ pwd
/home/hadoop/apps/hadoop-2.6.4/etc/hadoop
[hadoop@mini1 hadoop]$ vim hdfs-site.xml 

Add the following:

<configuration>
<!-- The HDFS nameservice is bi; must match the value in core-site.xml -->
<property>
    <name>dfs.nameservices</name>
    <value>bi</value>
</property>

<!-- The bi nameservice has two NameNodes: nn1 and nn2 -->
<property>
    <name>dfs.ha.namenodes.bi</name>
    <value>nn1,nn2</value>
</property>

<!-- RPC address of nn1 -->
<property>
    <name>dfs.namenode.rpc-address.bi.nn1</name>
    <value>mini1:9000</value>
</property>

<!-- HTTP address of nn1 -->
<property>
    <name>dfs.namenode.http-address.bi.nn1</name>
    <value>mini1:50070</value>
</property>

<!-- RPC address of nn2 -->
<property>
    <name>dfs.namenode.rpc-address.bi.nn2</name>
    <value>mini2:9000</value>
</property>

<!-- HTTP address of nn2 -->
<property>
    <name>dfs.namenode.http-address.bi.nn2</name>
    <value>mini2:50070</value>
</property>

<!-- Where the NameNode edits (shared metadata) are stored on the JournalNodes -->
<property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://mini5:8485;mini6:8485;mini7:8485/bi</value>
</property>

<!-- Local directory where each JournalNode keeps its data -->
<property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/home/hadoop/journaldata</value>
</property>

<!-- Enable automatic NameNode failover -->
<property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
</property>

<!-- Failover proxy provider used by clients to find the active NameNode -->
<property>
    <name>dfs.client.failover.proxy.provider.bi</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>

<!-- Fencing methods; list multiple methods on separate lines, one per line -->
<property>
    <name>dfs.ha.fencing.methods</name>
    <value>
        sshfence
        shell(/bin/true)
    </value>
</property>

<!-- sshfence needs passwordless SSH; point it at the private key -->
<property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/hadoop/.ssh/id_rsa</value>
</property>

<!-- sshfence connection timeout in milliseconds -->
<property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>30000</value>
</property>
</configuration>

3.5 Edit mapred-site.xml

[hadoop@mini1 hadoop]$ pwd
/home/hadoop/apps/hadoop-2.6.4/etc/hadoop
[hadoop@mini1 hadoop]$ cp mapred-site.xml.template  mapred-site.xml
[hadoop@mini1 hadoop]$ vim mapred-site.xml

Add the following:

<configuration>
        <!-- Run MapReduce on the YARN framework -->
        <property>              
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
        </property>
</configuration>    

3.6 Edit yarn-site.xml

[hadoop@mini1 hadoop]$ pwd
/home/hadoop/apps/hadoop-2.6.4/etc/hadoop
[hadoop@mini1 hadoop]$ vim yarn-site.xml

Add the following:

<configuration>
    <!-- Enable ResourceManager HA -->
    <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>

    <!-- Cluster id for the ResourceManager pair -->
    <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>yrc</value>
    </property>

    <!-- Logical ids of the two ResourceManagers -->
    <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
    </property>

    <!-- Hostnames of the two ResourceManagers -->
    <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>mini3</value>
    </property>

    <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>mini4</value>
    </property>

    <!-- ZooKeeper quorum used by the ResourceManagers -->
    <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>mini5:2181,mini6:2181,mini7:2181</value>
    </property>

    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>

3.7 Edit slaves

The slaves file lists the worker nodes. Because HDFS will be started from mini1 and YARN from mini3, the slaves file on mini1 specifies where the DataNodes run, and the slaves file on mini3 specifies where the NodeManagers run.

[hadoop@mini1 hadoop]$ 
[hadoop@mini1 hadoop]$ pwd
/home/hadoop/apps/hadoop-2.6.4/etc/hadoop
[hadoop@mini1 hadoop]$ 
[hadoop@mini1 hadoop]$ vim slaves 

Delete localhost and add the following:

mini5
mini6
mini7

3.8 Configure additional passwordless SSH

On top of what was done in 1.5, passwordless SSH also needs to be configured between the following servers (see the sketch after this list):

1. from mini2 to mini1 ... mini7
2. between mini3 and mini4, in both directions
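
A minimal sketch of that extra key exchange, using the same ssh-keygen / ssh-copy-id flow as in 1.5 (run as the hadoop user; each ssh-copy-id prompts once for the remote password):

[hadoop@mini2 ~]$ ssh-keygen
[hadoop@mini2 ~]$ for host in mini1 mini2 mini3 mini4 mini5 mini6 mini7; do ssh-copy-id "$host"; done

[hadoop@mini3 ~]$ ssh-keygen
[hadoop@mini3 ~]$ ssh-copy-id mini4

[hadoop@mini4 ~]$ ssh-keygen
[hadoop@mini4 ~]$ ssh-copy-id mini3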

3.9 Distribute apps/hadoop-2.6.4 from mini1 to the other servers

[hadoop@mini1 apps]$ pwd
/home/hadoop/apps
[hadoop@mini1 apps]$ scp -r hadoop-2.6.4/ mini2:$PWD
[hadoop@mini1 apps]$ scp -r hadoop-2.6.4/ mini3:$PWD
[hadoop@mini1 apps]$ scp -r hadoop-2.6.4/ mini4:$PWD
[hadoop@mini1 apps]$ scp -r hadoop-2.6.4/ mini5:$PWD
[hadoop@mini1 apps]$ scp -r hadoop-2.6.4/ mini6:$PWD
[hadoop@mini1 apps]$ scp -r hadoop-2.6.4/ mini7:$PWD

4. Starting the Hadoop HA cluster

Note: follow the startup steps below strictly, in this order.

4.1 Start the ZooKeeper cluster (start zk on mini5, mini6 and mini7)

[hadoop@mini7 bin]$ pwd
/home/hadoop/apps/zookeeper-3.4.8/bin
[hadoop@mini7 bin]$ 
[hadoop@mini7 bin]$ 
[hadoop@mini7 bin]$ ./zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /home/hadoop/apps/zookeeper-3.4.8/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[hadoop@mini7 bin]$ 
[hadoop@mini7 bin]$ ./zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/hadoop/apps/zookeeper-3.4.8/bin/../conf/zoo.cfg
Mode: follower

4.2 Start the JournalNodes (run on mini5, mini6 and mini7)

[hadoop@mini5 hadoop-2.6.4]$ 
[hadoop@mini5 hadoop-2.6.4]$ pwd
/home/hadoop/apps/hadoop-2.6.4
[hadoop@mini5 hadoop-2.6.4]$ 
[hadoop@mini5 hadoop-2.6.4]$ sbin/hadoop-daemon.sh start journalnode
starting journalnode, logging to /home/hadoop/apps/hadoop-2.6.4/logs/hadoop-hadoop-journalnode-mini5.out
[hadoop@mini5 hadoop-2.6.4]$ jps
2016 Jps
1970 JournalNode
1454 QuorumPeerMain
[hadoop@mini5 hadoop-2.6.4]$ 

4.3 Format HDFS

4.3.1 Run the format command on mini1:

[hadoop@mini1 apps]$ pwd
/home/hadoop/apps
[hadoop@mini1 apps]$ hdfs namenode -format

About four lines from the end, a line like this indicates success:

17/12/27 19:31:03 INFO common.Storage: Storage directory /home/hadoop/apps/hdpdata/dfs/name has been successfully formatted.

4.3.2 Copy the files under hadoop.tmp.dir to mini2

Formatting creates files under the directory configured as hadoop.tmp.dir in core-site.xml (here /home/hadoop/apps/hdpdata); copy /home/hadoop/apps/hdpdata to /home/hadoop/apps/ on mini2.

[hadoop@mini1 apps]$ pwd
/home/hadoop/apps
[hadoop@mini1 apps]$ ll
total 8
drwxrwxr-x. 9 hadoop hadoop 4096 Dec 27 17:30 hadoop-2.6.4
drwxrwxr-x. 3 hadoop hadoop 4096 Dec 27 19:31 hdpdata
[hadoop@mini1 apps]$ 
[hadoop@mini1 apps]$ 
[hadoop@mini1 apps]$ 
[hadoop@mini1 apps]$ 
[hadoop@mini1 apps]$ scp -r hdpdata mini2:/home/hadoop/apps/
seen_txid                                                                                                100%    2     0.0KB/s   00:00    
fsimage_0000000000000000000                                                                              100%  352     0.3KB/s   00:00    
VERSION                                                                                                  100%  204     0.2KB/s   00:00    
fsimage_0000000000000000000.md5                                                                          100%   62     0.1KB/s   00:00    
[hadoop@mini1 apps]$ 

Alternatively (and this is the recommended approach), bootstrap the standby NameNode instead of copying the directory by hand:

hdfs namenode -bootstrapStandby
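
For completeness, a sketch of how that alternative is normally run: the freshly formatted NameNode on mini1 must be up first, then the command is executed on mini2 so it pulls the namespace over the network instead of via scp:

[hadoop@mini1 hadoop-2.6.4]$ sbin/hadoop-daemon.sh start namenode     ## on mini1: start the formatted NameNode
[hadoop@mini2 hadoop-2.6.4]$ bin/hdfs namenode -bootstrapStandby      ## on mini2: copy the namespace from nn1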

4.4 Format ZKFC (run once, on mini1)

[hadoop@mini1 apps]$ pwd
/home/hadoop/apps
[hadoop@mini1 apps]$ hdfs zkfc -formatZK

A line like the following, roughly the third line from the bottom, indicates success:

17/12/27 19:48:22 INFO ha.ActiveStandbyElector: Successfully created /hadoop-ha/bi in ZK.

You can check the result in the ZooKeeper cluster:

[hadoop@mini5 zookeeper-3.4.8]$ pwd
/home/hadoop/apps/zookeeper-3.4.8
[hadoop@mini5 zookeeper-3.4.8]$ ./bin/zkCli.sh 
[zk: localhost:2181(CONNECTED) 0] ls /
[zookeeper, hadoop-ha]
[zk: localhost:2181(CONNECTED) 2] ls /hadoop-ha
[bi]
[zk: localhost:2181(CONNECTED) 3] 
[zk: localhost:2181(CONNECTED) 3] get /hadoop-ha/bi

cZxid = 0x200000003
ctime = Thu Dec 28 12:15:57 CST 2017
mZxid = 0x200000003
mtime = Thu Dec 28 12:15:57 CST 2017
pZxid = 0x200000003
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 0
numChildren = 0
[zk: localhost:2181(CONNECTED) 4] 

4.5 Start HDFS (run on mini1)

[hadoop@mini1 hadoop-2.6.4]$ pwd
/home/hadoop/apps/hadoop-2.6.4
[hadoop@mini1 hadoop-2.6.4]$ sbin/start-dfs.sh 

Note

On mini1, the following error may appear during startup:

permission denied (publickey,gssapi-keyex,gssapi-with-mic,password)

It turned out I had skipped an important step: adding the generated public key to authorized_keys on mini1 itself (start-dfs.sh also SSHes to the local machine). This can be done as follows:

[hadoop@mini1 .ssh]$ 
[hadoop@mini1 .ssh]$ ll
total 16
-rw-------. 1 hadoop hadoop  394 Dec 28 01:11 authorized_keys
-rw-------. 1 hadoop hadoop 1679 Dec 27 20:00 id_rsa
-rw-r--r--. 1 hadoop hadoop  394 Dec 27 20:00 id_rsa.pub
-rw-r--r--. 1 hadoop hadoop 2807 Dec 27 19:54 known_hosts
[hadoop@mini1 .ssh]$ pwd
/home/hadoop/.ssh
[hadoop@mini1 .ssh]$ 
[hadoop@mini1 .ssh]$ cp id_rsa.pub authorized_keys 

4.6 Start YARN

Run start-yarn.sh on mini3. The NameNode and ResourceManager are placed on different machines for performance reasons: both consume a lot of resources, so they are kept apart, which also means they have to be started on their respective machines.

[hadoop@mini3 sbin]$ pwd
/home/hadoop/apps/hadoop-2.6.4/sbin
[hadoop@mini3 sbin]$ ./start-yarn.sh 

start-yarn.sh only starts the ResourceManager on the local machine, so start the standby ResourceManager on mini4 manually:

[hadoop@mini4 sbin]$ ./yarn-daemon.sh start resourcemanager

Now both mini3 and mini4 are running a ResourceManager; see the quick check below.
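
You can confirm which ResourceManager is active with yarn rmadmin (a quick check; rm1 and rm2 are the ids configured in yarn-site.xml, and the active/standby output below is just the typical result):

[hadoop@mini3 ~]$ yarn rmadmin -getServiceState rm1
active
[hadoop@mini3 ~]$ yarn rmadmin -getServiceState rm2
standby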

5. Verify HDFS HA

5.1 Verify HA failover

Run kill -9 against the NameNode on either mini1 or mini2, then open http://mini1:50070 or http://mini2:50070 and check whether the surviving NameNode has switched to the active state (a sketch of this test follows below).

You can also upload a file first, kill -9 the active NameNode, and then check that the uploaded file is still there.
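
A sketch of the failover test itself (the NameNode pid shown by jps will differ on your machine):

[hadoop@mini1 ~]$ jps                                   ## find the NameNode pid on the active node
[hadoop@mini1 ~]$ kill -9 <NameNode-pid>                ## simulate a crash (placeholder pid)
[hadoop@mini1 ~]$ hdfs haadmin -getServiceState nn2     ## the other NameNode should now report: active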

5.2 Verify YARN

Run the WordCount program shipped with the Hadoop examples.

First, create a test.txt file:

[hadoop@mini1 ~]$ pwd
/home/hadoop
[hadoop@mini1 ~]$ vim test.txt 

with the following content:

hello word donggua  wang zi hahahhaah

Create an input directory in HDFS:

[hadoop@mini1 ~]$ hadoop fs -mkdir /input
[hadoop@mini1 ~]$ 
[hadoop@mini1 ~]$ hadoop fs -ls /
Found 4 items
drwxr-xr-x   - hadoop supergroup          0 2018-01-14 01:46 /input

Upload test.txt to /input:

[hadoop@mini1 ~]$ hadoop fs -put test.txt /input
[hadoop@mini1 ~]$ hadoop fs -ls /input
Found 1 items
-rw-r--r--   3 hadoop supergroup         39 2018-01-14 01:47 /input/test.txt
[hadoop@mini1 ~]$ 

Run the Hadoop WordCount example:

[hadoop@mini1 ~]$ 
[hadoop@mini1 ~]$ 
[hadoop@mini1 ~]$ hadoop jar apps/hadoop-2.6.4/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.4.jar  wordcount /input /output

Job output:

18/01/14 01:50:48 INFO input.FileInputFormat: Total input paths to process : 1
18/01/14 01:50:48 INFO mapreduce.JobSubmitter: number of splits:1
18/01/14 01:50:49 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1514401745125_0004
18/01/14 01:50:49 INFO impl.YarnClientImpl: Submitted application application_1514401745125_0004
18/01/14 01:50:49 INFO mapreduce.Job: The url to track the job: http://mini3:8088/proxy/application_1514401745125_0004/
18/01/14 01:50:49 INFO mapreduce.Job: Running job: job_1514401745125_0004
18/01/14 01:50:59 INFO mapreduce.Job: Job job_1514401745125_0004 running in uber mode : false
18/01/14 01:50:59 INFO mapreduce.Job:  map 0% reduce 0%
18/01/14 01:51:07 INFO mapreduce.Job:  map 100% reduce 0%
18/01/14 01:51:14 INFO mapreduce.Job:  map 100% reduce 100%
18/01/14 01:51:15 INFO mapreduce.Job: Job job_1514401745125_0004 completed successfully
18/01/14 01:51:16 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=79
                FILE: Number of bytes written=218255
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=128
                HDFS: Number of bytes written=49
                HDFS: Number of read operations=6
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters 
                Launched map tasks=1
                Launched reduce tasks=1
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=6042
                Total time spent by all reduces in occupied slots (ms)=4356
                Total time spent by all map tasks (ms)=6042
                Total time spent by all reduce tasks (ms)=4356
                Total vcore-milliseconds taken by all map tasks=6042
                Total vcore-milliseconds taken by all reduce tasks=4356
                Total megabyte-milliseconds taken by all map tasks=6187008
                Total megabyte-milliseconds taken by all reduce tasks=4460544
        Map-Reduce Framework
                Map input records=1
                Map output records=6
                Map output bytes=61
                Map output materialized bytes=79
                Input split bytes=89
                Combine input records=6
                Combine output records=6
                Reduce input groups=6
                Reduce shuffle bytes=79
                Reduce input records=6
                Reduce output records=6
                Spilled Records=12
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=205
                CPU time spent (ms)=2190
                Physical memory (bytes) snapshot=287481856
                Virtual memory (bytes) snapshot=4134383616
                Total committed heap usage (bytes)=137498624
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters 
                Bytes Read=39
        File Output Format Counters 
                Bytes Written=49

Check the result:

[hadoop@mini1 ~]$ hadoop fs -ls /output
Found 2 items
-rw-r--r--   3 hadoop supergroup          0 2018-01-14 01:51 /output/_SUCCESS
-rw-r--r--   3 hadoop supergroup         49 2018-01-14 01:51 /output/part-r-00000
[hadoop@mini1 ~]$ hadoop fs -cat /output/part-r-00000
donggua 1
hahahhaah       1
hello   1
wang    1
word    1
zi      1
[hadoop@mini1 ~]$ 

That completes the Hadoop HA setup.
A next step is to automate cluster deployment and startup with scripts; a rough sketch of a startup script follows.
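
As a starting point for that automation, here is a rough start-up script reflecting the order used in section 4 (a hypothetical helper, not part of the original setup; it assumes passwordless SSH for the hadoop user, the paths used in this post, and that the one-time format steps are already done, since for routine restarts start-dfs.sh also brings up the JournalNodes and zkfc):

#!/bin/bash
## start-ha-cluster.sh (sketch)
for host in mini5 mini6 mini7; do
    ssh "$host" "/home/hadoop/apps/zookeeper-3.4.8/bin/zkServer.sh start"
done
ssh mini1 "/home/hadoop/apps/hadoop-2.6.4/sbin/start-dfs.sh"
ssh mini3 "/home/hadoop/apps/hadoop-2.6.4/sbin/start-yarn.sh"
ssh mini4 "/home/hadoop/apps/hadoop-2.6.4/sbin/yarn-daemon.sh start resourcemanager"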

Some useful commands for checking the cluster's state:

bin/hdfs dfsadmin -report                     # show the status of each HDFS node

bin/hdfs haadmin -getServiceState nn1         # get the HA state of one NameNode

sbin/hadoop-daemon.sh start namenode          # start a single NameNode process

./hadoop-daemon.sh start zkfc                 # start a single zkfc process

That is all. Given my limited experience, there are bound to be mistakes in this post; corrections are welcome. Thanks!
