Hadoop Setup Notes (22)

This article collects my notes from learning how to set up Hadoop. The content comes from various public tutorials and starts from a very low baseline: it begins with Linux basics and ends with a working Hadoop installation on a PC, truly from scratch.

Thanks to all the teachers, those I know and those I do not, who helped me along the way.

38. Hadoop Cluster Configuration (02):

[root@hadoop01 hadoop-2.7.1]# ll ./etc/hadoop/mapred-site.xml.template

-rw-r--r--. 1 10021 10021 758 Jun 29  2015 ./etc/hadoop/mapred-site.xml.template

mv ./etc/hadoop/mapred-site.xml.template ./etc/hadoop/mapred-site.xml

[root@hadoop01 hadoop-2.7.1]# mv ./etc/hadoop/mapred-site.xml.template ./etc/hadoop/mapred-site.xml

Fourth configuration file: vi ./etc/hadoop/mapred-site.xml

<configuration>

<!-- Specify the framework MapReduce runs on -->

[MapReduce jobs run on top of YARN]

<property>

<name>mapreduce.framework.name</name>

<value>yarn</value>

<final>true</final>

</property>

<!-- Communication address of the history service -->

[mapreduce.jobhistory.address: internal communication address of the history service]

<property>

<name>mapreduce.jobhistory.address</name>

<value>hadoop01:10020</value>

[Default port: 10020]

</property>

<!-- Web UI address of the history service -->

<property>

<name>mapreduce.jobhistory.webapp.address</name>

<value>hadoop01:19888</value>

</property>

</configuration>
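The two jobhistory addresses above only matter if the JobHistory server process is actually running; it is not started by start-dfs.sh or start-yarn.sh. A minimal sketch of starting and checking it, assuming $HADOOP_HOME/sbin is on the PATH:

mr-jobhistory-daemon.sh start historyserver    # starts the MapReduce JobHistory server (ports 10020 / 19888 as configured above)

jps    # a JobHistoryServer process should now appear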

Fifth configuration file: vi ./etc/hadoop/yarn-site.xml

<configuration>

<!-- Specify the hostname where the ResourceManager runs -->

[(rm: ResourceManager) the node where the ResourceManager is to be started]

<property>

<name>yarn.resourcemanager.hostname</name>

<value>hadoop01</value>

[Because the ResourceManager is planned for hadoop01]

</property>

<!-- Specify the MapReduce shuffle service -->

[(mr: MapReduce) Without this, running MapReduce later will fail with an error]

<property>

<name>yarn.nodemanager.aux-services</name>

<value>mapreduce_shuffle</value>

</property>

<!-- Internal communication address of the ResourceManager -->

<property>

<name>yarn.resourcemanager.address</name>

<value>hadoop01:8032</value>

</property>

<!-- Internal communication address of the ResourceManager scheduler -->

<property>

<name>yarn.resourcemanager.scheduler.address</name>

<value>hadoop01:8030</value>

</property>

<!-- Internal communication address of the ResourceManager resource-tracker -->

<property>

<name>yarn.resourcemanager.resource-tracker.address</name>

<value>hadoop01:8031</value>

</property>

<!-- Internal communication address of the ResourceManager admin interface -->

<property>

<name>yarn.resourcemanager.admin.address</name>

<value>hadoop01:8033</value>

[The 803x ports are YARN's internal communication addresses]

</property>

<!-- Web UI monitoring address of the ResourceManager -->

<property>

<name>yarn.resourcemanager.webapp.address</name>

<value>hadoop01:8088</value>

</property>

</configuration>

Sixth configuration file: vi ./etc/hadoop/slaves

[slaves: this is the file the master uses to find its worker nodes]

[root@hadoop01 hadoop-2.7.1]# vi ./etc/hadoop/slaves

Delete: localhost

Enter:

hadoop01

hadoop02

hadoop03

 

Hands-on configuration:

Enter the Hadoop directory:

[root@hadoop01 ~]# cd $HADOOP_HOME

[root@hadoop01 hadoop-2.7.1]#

Configure the six related files:

No. 1: vi ./etc/hadoop/hadoop-env.sh

[root@hadoop01 hadoop-2.7.1]# vi ./etc/hadoop/hadoop-env.sh

Already configured earlier: export JAVA_HOME=/usr/local/jdk1.8.0_144/

No. 2: vi ./etc/hadoop/core-site.xml

[root@hadoop01 hadoop-2.7.1]# vi ./etc/hadoop/core-site.xml (three items are configured:)

<configuration>

<!-- Configure the HDFS filesystem namespace -->

<property>

<name>fs.defaultFS</name>

<value>hdfs://hadoop01:9001</value>

</property>

<!-- Configure the buffer size for HDFS operations -->

<property>

<name>io.file.buffer.size</name>

<value>4096</value>

</property>

<!-- Configure the temporary data storage directory -->

<property>

<name>hadoop.tmp.dir</name>

<value>/home/bigdata/tmp</value>

</property>

</configuration>
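Hadoop will normally create hadoop.tmp.dir on demand, but pre-creating the directory referenced above does no harm; a minimal sketch:

mkdir -p /home/bigdata/tmp    # directory referenced by hadoop.tmp.dir in core-site.xml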

3个:vi ./ect/hadoop/hdfs-site.xml

[root@hadoop01 hadoop-2.7.1]# vi ./ect/hadoop/hdfs-site.xml

<configuration>

<!-- Configure the replication factor -->

<property>

<name>dfs.replication</name>

<value>3</value>

</property>

<!-- Block size (134217728 bytes = 128 MB) -->

<property>

<name>dfs.block.size</name>

<value>134217728</value>

</property>

<!-- Location where the HDFS metadata is stored -->

<property>

<name>dfs.namenode.name.dir</name>

<value>/home/hadoopdata/dfs/name</value>

</property>


<!-- Location where the HDFS data is stored -->

<property>

<name>dfs.datanode.data.dir</name>

<value>/home/hadoopdata/dfs/data</value>

</property>

<!-- HDFS checkpoint directory -->

<property>

<name>fs.checkpoint.dir</name>

<value>/home/hadoopdata/checkpoint/dfs/cname</value>

</property>

<!-- Web UI address of the HDFS NameNode -->

<property>

<name>dfs.http.address</name>

<value>hadoop01:50070</value>

</property>

<!-- Web UI address of the HDFS SecondaryNameNode -->

<property>

<name>dfs.secondary.http.address</name>

<value>hadoop01:50090</value>

</property>

<!-- Whether to enable WebHDFS (operating HDFS over HTTP) -->

<property>

<name>dfs.webhdfs.enabled</name>

<value>false</value>

</property>

<!-- Whether to enable HDFS permissions (ACLs) -->

<property>

<name>dfs.permissions</name>

<value>false</value>

</property>

</configuration>
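After saving, individual values can be spot-checked from the command line; a minimal sketch, assuming the hdfs command is on the PATH:

hdfs getconf -confKey dfs.replication    # should print 3 with the configuration above

hdfs getconf -confKey fs.defaultFS       # should print hdfs://hadoop01:9001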

4个:vi ./etc/hadoop/mapred-site.xml

[root@hadoop01 hadoop-2.7.1]# ll ./etc/hadoop/mapred-site.xml.template

-rw-r--r--. 1 10021 10021 758 Jun 29  2015 ./etc/hadoop/mapred-site.xml.template

[root@hadoop01 hadoop-2.7.1]# mv ./etc/hadoop/mapred-site.xml.template ./etc/hadoop/mapred-site.xml

[root@hadoop01 hadoop-2.7.1]# vi ./etc/hadoop/mapred-site.xml

<configuration>

<!-- Specify the framework MapReduce runs on -->

<property>

<name>mapreduce.framework.name</name>

<value>yarn</value>

<final>true</final>

</property>

<!-- Communication address of the history service -->

<property>

<name>mapreduce.jobhistory.address</name>

<value>hadoop01:10020</value>

</property>

<!-- Web UI address of the history service -->

<property>

<name>mapreduce.jobhistory.webapp.address</name>

<value>hadoop01:19888</value>

</property>

</configuration>

 

5个:vi ./etc/hadoop/yarn-site.xml

[root@hadoop01 hadoop-2.7.1]# vi ./etc/hadoop/yarn-site.xml

<configuration>

<!-- Specify the hostname where the ResourceManager runs -->

<property>

<name>yarn.resourcemanager.hostname</name>

<value>hadoop01</value>

</property>

<!-- Specify the MapReduce shuffle service -->

<property>

<name>yarn.nodemanager.aux-services</name>

<value>mapreduce_shuffle</value>

</property>

<!-- Internal communication address of the ResourceManager -->

<property>

<name>yarn.resourcemanager.address</name>

<value>hadoop01:8032</value>

</property>

<!-- Internal communication address of the ResourceManager scheduler -->

<property>


<name>yarn.resourcemanager.scheduler.address</name>

<value>hadoop01:8030</value>

</property>

<!-- Internal communication address of the ResourceManager resource-tracker -->

<property>

<name>yarn.resourcemanager.resource-tracker.address</name>

<value>hadoop01:8031</value>

</property>

<!-- Internal communication address of the ResourceManager admin interface -->

<property>

<name>yarn.resourcemanager.admin.address</name>

<value>hadoop01:8033</value>

</property>

<!-- Web UI monitoring address of the ResourceManager -->

<property>

<name>yarn.resourcemanager.webapp.address</name>

<value>hadoop01:8088</value>

</property>

</configuration>

6个:vi ./etc/hadoop/slaves

[root@hadoop01 hadoop-2.7.1]# vi ./etc/hadoop/slaves

Delete: localhost

Enter:

hadoop01

hadoop02

hadoop03

After planning and configuration are complete, distribute the installation to the other servers in the plan, i.e., to all three planned servers, so that all three servers end up with identical configuration.

 

Remote distribution:

Before distributing, hadoop02 and hadoop03 both still have an old Hadoop installation; delete it: rm -rf /usr/local/hadoop-2.7.1

[root@hadoop02 ~]# rm -rf /usr/local/hadoop-2.7.1/

[root@hadoop03 ~]# rm -rf /usr/local/hadoop-2.7.1/

After deletion, hadoop02 and hadoop03 no longer have anything related to Hadoop.

On hadoop02 and hadoop03, which hadoop shows that the command can no longer be found:

[root@hadoop02 ~]# cd /usr/local/

[root@hadoop02 ~]# ll

[root@hadoop02 local]# which hadoop

/usr/bin/which: no hadoop in (/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/jdk1.8.0_144//bin:/usr/local/hadoop-2.7.1//bin:/usr/local/hadoop-2.7.1//sbin::/root/bin)

[root@hadoop03 local]# which hadoop

/usr/bin/which: no hadoop in (/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/jdk1.8.0_144//bin:/usr/local/hadoop-2.7.1//bin:/usr/local/hadoop-2.7.1//sbin::/root/bin)

 

39. Starting and Testing the Hadoop Cluster:

Remote distribution: scp

Distribute remotely to the other servers:

Distribute from hadoop01 to hadoop02 and hadoop03:

scp -r ../hadoop-2.7.1/ hadoop02:/usr/local

scp -r ../hadoop-2.7.1/ hadoop03:/usr/local

[root@hadoop01 hadoop-2.7.1]# scp -r ../hadoop-2.7.1/ hadoop02:/usr/local/

[root@hadoop01 hadoop-2.7.1]# scp -r ../hadoop-2.7.1/ hadoop03:/usr/local/

(..) refers to hadoop-2.7.1 in the parent directory; it is sent to the /usr/local directory on the hadoop02 and hadoop03 machines.

 

At this point a problem appears: the hostname cannot be resolved and the connection is lost, because there is no hostname mapping.

[root@hadoop01 hadoop-2.7.1]# scp -r ../hadoop-2.7.1/ hadoop02:/usr/local/

The authenticity of host 'hadoop02 (192.168.216.112)' can't be established.

RSA key fingerprint is 04:ae:11:51:c3:ac:4b:0d:9b:78:3c:c0:58:8e:82:04.

Are you sure you want to continue connecting (yes/no)?

Host key verification failed.

lost connection

[root@hadoop01 hadoop-2.7.1]# scp -r ../hadoop-2.7.1/ hadoop02:/usr/local/

ssh: connect to host hadoop02 port 22: Connection refused  (no hostname mapping)

 

Add the mapping: vi /etc/hosts

[root@hadoop01 hadoop-2.7.1]# vi /etc/hosts

192.168.216.111 hadoop01 www.hadoop01.com

192.168.216.112 hadoop02 www.hadoop02.com   (added)

192.168.216.113 hadoop03 www.hadoop03.com   (added)

The added entries 192.168.216.112 hadoop02 www.hadoop02.com and 192.168.216.113 hadoop03 www.hadoop03.com mean: when you type hadoop02, it resolves to the .112 IP; otherwise the host cannot be found.
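A quick way to verify the new mapping before retrying scp; a minimal sketch:

ping -c 1 hadoop02    # should resolve to 192.168.216.112

ping -c 1 hadoop03    # should resolve to 192.168.216.113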

 

After adding the mapping and before distributing, delete the doc documentation so the transfer is smaller: rm -rf ./share/doc/

[root@hadoop01 ~]# cd /usr/local/

[root@hadoop01 local]# cd ./hadoop-2.7.1/

[root@hadoop01 hadoop-2.7.1]# ll ./share

total 8

drwxr-xr-x. 3 10021 10021 4096 Jun 29  2015 doc

drwxr-xr-x. 9 10021 10021 4096 Jun 29  2015 hadoop

doc is learning documentation with a lot of miscellaneous content; delete it so the transfer goes faster:

[root@hadoop01 hadoop-2.7.1]# rm -rf ./share/doc/   

 

(Note: if you distribute to the same machine a second time, you do not need to type yes again, only the password, because the host key is only accepted on the first connection to that machine.)

 

After adding the mapping and deleting doc, distribute to hadoop02:

[root@hadoop01 hadoop-2.7.1]# scp -r ../hadoop-2.7.1/ hadoop02:/usr/local

It asks whether you really want to connect to this host; enter yes, then enter the password: root

 

Distribute to hadoop03:

[root@hadoop01 hadoop-2.7.1]# scp -r ../hadoop-2.7.1/ hadoop03:/usr/local/

The authenticity of host 'hadoop03 (192.168.216.113)' can't be established.

RSA key fingerprint is 58:0e:71:78:09:8c:54:ed:43:16:e3:71:eb:5c:20:57.

Are you sure you want to continue connecting (yes/no)? yes

root@hadoop03's password:

 

hadoop02上面查看有没有hadoop-2.7.1

[root@hadoop02 local]# ll

total 52

drwxrwxr-x. 9 1000 1000 4096 Apr 23 15:17 bashdb-4.4-0.93

………………

drwxr-xr-x. 9 root root 4096 Apr 25 10:43 hadoop-2.7.1

………………

drwxr-xr-x. 2 root root 4096 Sep 23  2011 src

[root@hadoop02 local]# which hadoop

/usr/local/hadoop-2.7.1/bin/hadoop

hadoop03上面查看:which hadoop

[root@hadoop03 local]# which hadoop

/usr/local/hadoop-2.7.1/bin/hadoop

 

The first time which hadoop was run, the following also appeared:

[root@hadoop03 local]# which hadoop

/usr/local/hadoop-2.7.1/bin/hadoop

You have new mail in /var/spool/mail/root

Remote distribution is done.

 

Hadoop cluster planning and configuration are complete, and remote distribution is done, so the Hadoop cluster can now be started. A few more steps are needed first, though: since the cluster has just been set up and HDFS is a filesystem, it must be formatted before starting.

Formatting on the NameNode server only needs to be done once; after that, the cluster can simply be started directly.

Be sure to format on hadoop01 (the NameNode). Format command: hadoop namenode -format

After formatting, start the NameNode, DataNode, ResourceManager, and NodeManager processes.

 

Check the /home directory; for now there is no hadoopdata directory and no bigdata temporary directory:

[root@hadoop01 hadoop-2.7.1]# ll /home/

total 387564

drwx------. 5 aidon aidon      4096 Apr 22 00:49 aidon

drwxrwxr-x. 9  1000  1000      4096 Aug  5  2017 bashdb-4.4-0.93

-rw-r--r--. 1 root  root     699632 Apr 23 11:50 bashdb-4.4-0.93.tar.bz2

-rw-r--r--. 1 root  root  210606807 Apr  6 08:54 hadoop-2.7.1.tar.gz

drwxr-xr-x. 2 root  root       4096 Apr 24 20:14 input

-rw-r--r--. 1 root  root  185515842 Mar 29 23:03 jdk-8u144-linux-x64.tar.gz

drwxr-xr-x. 2 root  root       4096 Apr 24 20:17 output

drwxr-xr-x. 2 root  root       4096 Apr 24 18:28 shell

drwxr-xr-x. 3 root  root       4096 Apr 21 17:14 test2

drwxr-xr-x. 2 root  root       4096 Apr 21 17:16 test4

drwxr-xr-x. 2 root  root       4096 Apr 21 17:16 test5

-rw-r--r--. 1 root  root        441 Apr 21 23:45 test.tar

 

hadoop01上面格式化:

[root@hadoop01 hadoop-2.7.1]# hadoop namenode –format

18/04/25 20:50:27 INFO common.Storage: Storage directory /home/hadoopdata/dfs/name

has been successfully formatted.

When this line appears after formatting, it means the format succeeded and the metadata directory is guaranteed to be new.

 

Duplicate: clone a new terminal window. In the cloned window, check that the /home directory now contains hadoopdata:

[root@hadoop01 ~]# ll /home/

total 387568

………………

drwxr-xr-x. 3 root  root       4096 Apr 25 20:50 hadoopdata

………………

 

Under hadoopdata there is a dfs directory:

[root@hadoop01 ~]# ll /home/hadoopdata/

total 4

drwxr-xr-x. 3 root root 4096 Apr 25 20:50 dfs

 

Under dfs there is a name directory; name holds the metadata:

[root@hadoop01 ~]# ll /home/hadoopdata/dfs

total 4

drwxr-xr-x. 3 root root 4096 Apr 25 20:50 name

 

current under name is the metadata:

[root@hadoop01 ~]# ll /home/hadoopdata/dfs/name/

total 4

drwxr-xr-x. 2 root root 4096 Apr 25 20:50 current

 

What current holds is the metadata (the fsimage metadata):

[root@hadoop01 ~]# ll /home/hadoopdata/dfs/name/current/

total 16

-rw-r--r--. 1 root root 351 Apr 25 20:50 fsimage_0000000000000000000

-rw-r--r--. 1 root root  62 Apr 25 20:50 fsimage_0000000000000000000.md5

-rw-r--r--. 1 root root   2 Apr 25 20:50 seen_txid

-rw-r--r--. 1 root root 208 Apr 25 20:50 VERSION

 

Start the services:

After formatting succeeds and a fresh metadata directory has been generated, the cluster can be started normally (start the NameNode, DataNode, ResourceManager, and NodeManager processes).

Three ways to start:

Full start:

start-all.sh

Module start:

start-dfs.sh

start-yarn.sh

Single-process start:

Note: everything after start/stop must be lowercase.

hadoop-daemon.sh start/stop namenode    (start/stop the namenode)

hadoop-daemons.sh start/stop datanode

(start/stop all of the cluster's datanodes)

yarn-daemon.sh start/stop resourcemanager    (start/stop the resourcemanager)

yarn-daemons.sh start/stop nodemanager    (start/stop all of the cluster's nodemanagers)

mr-jobhistory-daemon.sh start/stop historyserver

The startup scripts live here: ll ./sbin/

[root@hadoop01 hadoop-2.7.1]# ll ./sbin/

total 120

-rwxr-xr-x. 1 10021 10021 2752 Jun 29  2015 distribute-exclude.sh

-rwxr-xr-x. 1 10021 10021 6452 Jun 29  2015 hadoop-daemon.sh

-rwxr-xr-x. 1 10021 10021 1360 Jun 29  2015 hadoop-daemons.sh

-rwxr-xr-x. 1 10021 10021 1640 Jun 29  2015 hdfs-config.cmd

-rwxr-xr-x. 1 10021 10021 1427 Jun 29  2015 hdfs-config.sh

-rwxr-xr-x. 1 10021 10021 2291 Jun 29  2015 httpfs.sh

-rwxr-xr-x. 1 10021 10021 3128 Jun 29  2015 kms.sh

-rwxr-xr-x. 1 10021 10021 4080 Jun 29  2015 mr-jobhistory-daemon.sh

-rwxr-xr-x. 1 10021 10021 1648 Jun 29  2015 refresh-namenodes.sh

-rwxr-xr-x. 1 10021 10021 2145 Jun 29  2015 slaves.sh

-rwxr-xr-x. 1 10021 10021 1779 Jun 29  2015 start-all.cmd

-rwxr-xr-x. 1 10021 10021 1471 Jun 29  2015 start-all.sh

-rwxr-xr-x. 1 10021 10021 1128 Jun 29  2015 start-balancer.sh

-rwxr-xr-x. 1 10021 10021 1401 Jun 29  2015 start-dfs.cmd

-rwxr-xr-x. 1 10021 10021 3734 Jun 29  2015 start-dfs.sh

-rwxr-xr-x. 1 10021 10021 1357 Jun 29  2015 start-secure-dns.sh

-rwxr-xr-x. 1 10021 10021 1571 Jun 29  2015 start-yarn.cmd

-rwxr-xr-x. 1 10021 10021 1347 Jun 29  2015 start-yarn.sh

-rwxr-xr-x. 1 10021 10021 1770 Jun 29  2015 stop-all.cmd

-rwxr-xr-x. 1 10021 10021 1462 Jun 29  2015 stop-all.sh

-rwxr-xr-x. 1 10021 10021 1179 Jun 29  2015 stop-balancer.sh

-rwxr-xr-x. 1 10021 10021 1455 Jun 29  2015 stop-dfs.cmd

-rwxr-xr-x. 1 10021 10021 3206 Jun 29  2015 stop-dfs.sh

-rwxr-xr-x. 1 10021 10021 1340 Jun 29  2015 stop-secure-dns.sh

-rwxr-xr-x. 1 10021 10021 1642 Jun 29  2015 stop-yarn.cmd

-rwxr-xr-x. 1 10021 10021 1340 Jun 29  2015 stop-yarn.sh

-rwxr-xr-x. 1 10021 10021 4295 Jun 29  2015 yarn-daemon.sh

-rwxr-xr-x. 1 10021 10021 1353 Jun 29  2015 yarn-daemons.sh

Start the cluster using the module method:

At this point there is still no hadoopdata directory under /home on hadoop02, because hadoop02 has no metadata; only hadoop01 has metadata. hadoop01, hadoop02, and hadoop03 all run DataNodes, which hold the actual data content, but the data directory only appears on hadoop02 and hadoop03 once data is actually written.

 

hadoop01上启动: ./sbin/start-dfs.sh

[root@hadoop01 hadoop-2.7.1]# ./sbin/start-dfs.sh

Because passwordless SSH login has not been configured, you have to keep confirming whether to accept each host and entering yes and the password (root):

18/04/26 09:14:40 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Starting namenodes on [hadoop01]

root@hadoop01's password:

hadoop01: namenode running as process 12853. Stop it first.

root@hadoop02's password: root@hadoop01's password: root@hadoop03's password:

hadoop02: datanode running as process 12790. Stop it first.

root

hadoop01: starting datanode, logging to /usr/local/hadoop-2.7.1/logs/hadoop-root-datanode-hadoop01.out

root

hadoop03: starting datanode, logging to /usr/local/hadoop-2.7.1/logs/hadoop-root-datanode-hadoop03.out

root

Starting secondary namenodes [hadoop01]

root@hadoop01's password:

hadoop01: secondarynamenode running as process 13052. Stop it first.

18/04/26 09:15:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
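To get rid of these repeated password prompts, passwordless SSH from hadoop01 to every node can be set up first; a minimal sketch, assuming the root account is used on all three machines:

ssh-keygen -t rsa              # generate a key pair on hadoop01, accepting the defaults

ssh-copy-id root@hadoop01      # copy the public key to every node, including hadoop01 itself

ssh-copy-id root@hadoop02

ssh-copy-id root@hadoop03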

 

hadoop01hadoop02hadoop03上分别查看进程  jps(看进程)

查看hadoop01的进程:

[root@hadoop01 hadoop-2.7.1]# jps

13376 DataNode

12853 NameNode

13612 Jps

13052 SecondaryNameNode

 

Check the processes on hadoop02:

[root@hadoop02 local]# jps

13027 Jps

12790 DataNode

Check the processes on hadoop03:

[root@hadoop03 local]# jps

12925 DataNode

12989 Jps

 

Testing (in several steps):

1. Check whether the processes started according to the plan.

2. Check whether the web UI monitoring of each module works normally (see the curl check sketched after this list):

http://192.168.216.111:50070

192.168.216.111:50070 (to view the web UI, the NameNode's IP is enough)

3. Upload and download a file (to test HDFS), and run a MapReduce job (to test the YARN cluster).
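The web UIs can also be reachability-checked from the command line; a minimal sketch, assuming curl is installed (the 8088 UI only responds once YARN has been started):

curl -s -o /dev/null -w "%{http_code}\n" http://192.168.216.111:50070    # NameNode web UI, expect 200

curl -s -o /dev/null -w "%{http_code}\n" http://192.168.216.111:8088     # ResourceManager web UI, expect 200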

 

Testing the HDFS module:

Check whether there is anything under the root directory of the HDFS filesystem; right now there is nothing:

[root@hadoop01 hadoop-2.7.1]# hdfs dfs -ls /    (the hdfs dfs -ls / command will be covered later)

18/04/26 09:26:48 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

 

Check which directories and files the local Linux system has under hadoop-2.7.1:

[root@hadoop01 hadoop-2.7.1]# ll

total 56

drwxr-xr-x. 2 10021 10021  4096 Jun 29  2015 bin

drwxr-xr-x. 3 10021 10021  4096 Apr 25 10:15 etc

drwxr-xr-x. 2 10021 10021  4096 Jun 29  2015 include

drwxr-xr-x. 3 10021 10021  4096 Jun 29  2015 lib

drwxr-xr-x. 2 10021 10021  4096 Jun 29  2015 libexec

-rw-r--r--. 1 10021 10021 15429 Jun 29  2015 LICENSE.txt

drwxr-xr-x. 2 root  root   4096 Apr 26 09:15 logs

-rw-r--r--. 1 10021 10021   101 Jun 29  2015 NOTICE.txt

-rw-r--r--. 1 10021 10021  1366 Jun 29  2015 README.txt

drwxr-xr-x. 2 10021 10021  4096 Jun 29  2015 sbin

drwxr-xr-x. 3 10021 10021  4096 Apr 25 10:36 share

Upload a file; it can then be viewed on the web page: http://192.168.216.111:50070/

 

Upload the local Linux file README.txt to the root directory of the HDFS filesystem, keeping its original name README.txt:

[root@hadoop01 hadoop-2.7.1]# hdfs dfs -put ./README.txt /

18/04/26 09:29:58 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

 

Check the root directory of the HDFS filesystem; now the file is there:

[root@hadoop01 hadoop-2.7.1]# hdfs dfs -ls /

18/04/26 09:31:52 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Found 1 items

-rw-r--r--   3 root supergroup       1366 2018-04-26 09:30 /README.txt

 

hdfs文件系统从本地Linux系统上传过来的README.txt文件:(注意:hdfs文件系统,从根目录开始,没有相对目录,不要打点)

[root@hadoop01 hadoop-2.7.1]# hdfs dfs -cat /README.txt

18/04/26 09:35:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

For the latest information about Hadoop, please visit our website at:

   http://hadoop.apache.org/core/

and our wiki, at:

   http://wiki.apache.org/hadoop/

This distribution includes cryptographic software.  The country in

which you currently reside may have restrictions on the import,

possession, use, and/or re-export to another country, of

encryption software.  BEFORE using any encryption software, please

check your country's laws, regulations and policies concerning the

import, possession, or use, and re-export of encryption software, to

see if this is permitted.  See <http://www.wassenaar.org/> for more

information.

The U.S. Government Department of Commerce, Bureau of Industry and

Security (BIS), has classified this software as Export Commodity

Control Number (ECCN) 5D002.C.1, which includes information security

software using or performing cryptographic functions with asymmetric

algorithms.  The form and manner of this Apache Software Foundation

distribution makes it eligible for export under the License Exception

ENC Technology Software Unrestricted (TSU) exception (see the BIS

Export Administration Regulations, Section 740.13) for both object

code and source code.

The following provides more details on the included cryptographic

software:

  Hadoop Core uses the SSL libraries from the Jetty project written

by mortbay.org.

The file was read out successfully; the HDFS module of the cluster is set up correctly. ↑

Testing the YARN module:

Start YARN: start-yarn.sh

[root@hadoop01 hadoop-2.7.1]# start-yarn.sh

Check jps on hadoop01:

[root@hadoop01 hadoop-2.7.1]# jps

13376 DataNode

14465 Jps

13939 ResourceManager

12853 NameNode

14229 NodeManager

13052 SecondaryNameNode

 

hadoop02查看jps

[root@hadoop02 local]# jps

12790 DataNode

13110 NodeManager

13257 Jps

 

hadoop03查看jps

[root@hadoop03 local]# jps

12925 DataNode

13214 Jps

13071 NodeManager

Web UI monitoring, on the web page: http://192.168.216.111:8088

 

After YARN has started, run a MapReduce job:

[Running a job tests whether the started YARN can actually serve the cluster]

yarn jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount /README.txt /out/00

Run the wordcount job from the bundled example jar (it counts how often each word appears in a file). The input path: when the cluster runs a job, the input data must already be in the HDFS filesystem, and README.txt was just uploaded there. The output goes to /out/00.
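MapReduce will not overwrite an existing output directory; rerunning the job with the same output path fails. A minimal sketch of clearing the old output before a rerun (only needed from the second run onward):

hdfs dfs -rm -r /out/00    # remove the previous output directory so the job can be rerun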

 

[root@hadoop01 hadoop-2.7.1]# yarn jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount /README.txt /out/00

18/04/26 09:57:41 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

18/04/26 09:57:43 INFO client.RMProxy: Connecting to ResourceManager at hadoop01/192.168.216.111:8032

18/04/26 09:57:46 INFO input.FileInputFormat: Total input paths to process : 1

18/04/26 09:57:46 INFO mapreduce.JobSubmitter: number of splits:1

18/04/26 09:57:47 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1524706682045_0001

18/04/26 09:57:48 INFO impl.YarnClientImpl: Submitted application application_1524706682045_0001

18/04/26 09:57:48 INFO mapreduce.Job: The url to track the job: http://hadoop01:8088/proxy/application_1524706682045_0001/

18/04/26 09:57:48 INFO mapreduce.Job: Running job: job_1524706682045_0001

18/04/26 09:58:16 INFO mapreduce.Job: Job job_1524706682045_0001 running in uber mode : false

18/04/26 09:58:16 INFO mapreduce.Job:  map 0% reduce 0%  (if this appears, the YARN cluster is also set up correctly)

18/04/26 09:58:40 INFO mapreduce.Job:  map 100% reduce 0%  (the map phase is 100% done; waiting for the reduce phase to run)

18/04/26 09:58:59 INFO mapreduce.Job:  map 100% reduce 100%  (once the reduce phase finishes, the whole job is complete)

………………

The root directory did not contain /out before; check whether it has been created:

[root@hadoop01 hadoop-2.7.1]# hdfs dfs -ls /out

18/04/26 10:11:11 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Found 1 items

drwxr-xr-x   - root supergroup          0 2018-04-26 09:58 /out/00

Check the contents of /out:

[root@hadoop01 hadoop-2.7.1]# hdfs dfs -ls /out/00

18/04/26 10:00:50 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Found 2 items

-rw-r--r--   3 root supergroup          0 2018-04-26 09:58 /out/00/_SUCCESS

-rw-r--r--   3 root supergroup       1306 2018-04-26 09:58 /out/00/part-r-00000

_SUCCESS: the success marker file     part-r-00000: the result file
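If the result is also wanted on the local Linux filesystem, it can be copied out of HDFS; a minimal sketch (the local destination path is only an example):

hdfs dfs -get /out/00/part-r-00000 /home/wordcount-result.txt    # download the result file from HDFS to a local path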

The part-r-00000 result file:

[root@hadoop01 hadoop-2.7.1]# hdfs dfs -cat /out/00/part-r-00000

18/04/26 10:01:45 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

(BIS), 1

(ECCN) 1

(TSU) 1

(see 1

5D002.C.1, 1

740.13) 1

<http://www.wassenaar.org/> 1

Administration 1

Apache 1

BEFORE 1

BIS 1

Bureau 1

Commerce, 1

Commodity 1

Control 1

Core 1

Department 1

ENC 1

Exception 1

Export 2

For 1

Foundation 1

Government 1

Hadoop 1

Hadoop, 1

Industry 1

Jetty 1

License 1

Number 1

Regulations, 1

SSL 1

Section 1

Security 1

See 1

Software 2

Technology 1

The 4

This 1

U.S. 1

Unrestricted 1

about 1

algorithms. 1

and 6

and/or 1

another 1

any 1

as 1

asymmetric 1

at: 2

both 1

by 1

check 1

classified 1

code 1

code. 1

concerning 1

country 1

country's 1

country, 1

cryptographic 3

currently 1

details 1

distribution 2

eligible 1

encryption 3

exception 1

export 1

following 1

for 3

form 1

from 1

functions 1

has 1

have 1

http://hadoop.apache.org/core/ 1

http://wiki.apache.org/hadoop/ 1

if 1

import, 2

in 1

included 1

includes 2

information 2

information. 1

is 1

it 1

latest 1

laws, 1

libraries 1

makes 1

manner 1

may 1

more 2

mortbay.org. 1

object 1

of 5

on 2

or 2

our 2

performing 1

permitted. 1

please 2

policies 1

possession, 2

project 1

provides 1

re-export 2

regulations 1

reside 1

restrictions 1

security 1

see 1

software 2

software, 2

software. 2

software: 1

source 1

the 8

this 3

to 2

under 1

use, 2

uses 1

using 2

visit 1

website 1

which 2

wiki, 1

with 1

written 1

you 1

your 1

At this point, the frequency of every word in the README.txt file that was uploaded to the HDFS filesystem has been counted.

The HDFS and YARN modules start correctly and the cluster tests pass. ↑



Reposted from blog.csdn.net/zxqjinhu/article/details/80488911