This continues my post from two posts back (https://mp.csdn.net/postedit/89017772), where Hadoop standalone mode was installed and deployed.
This post covers installing Hadoop in distributed mode.
I. Experiment environment (RHEL 7.3)
1. selinux and firewalld are both disabled
2. Host information:
Host | IP |
---|---|
server1 (NameNode, Secondary NameNode, NFS server) | 172.25.83.1 |
server2 (DataNode, NFS client) | 172.25.83.2 |
server3 (DataNode, NFS client) | 172.25.83.3 |
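For the hostnames above to resolve on every node, an /etc/hosts mapping along these lines is assumed (the original setup does not show it, so treat this as a sketch rather than a verified step):

```
# /etc/hosts on every node (assumed name resolution via the hosts file)
172.25.83.1  server1
172.25.83.2  server2
172.25.83.3  server3
```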
3. Delete the data generated by the previous experiment and stop dfs
[hadoop@server1 hadoop]$ rm -rf /tmp/*
[hadoop@server1 hadoop]$ ls /tmp/
[hadoop@server1 hadoop]$
[hadoop@server1 hadoop]$ pwd
/home/hadoop/hadoop
[hadoop@server1 hadoop]$ sbin/stop-dfs.sh
Stopping namenodes on [server1]
Stopping datanodes
Stopping secondary namenodes [server1]
[hadoop@server1 hadoop]$ jps
8915 Jps
II. Installing and deploying Hadoop in distributed mode
1. First, configure the NFS network file system:
<1> On server2 and server3: add the hadoop user (with uid 1004, matching the hadoop user on server1), which creates the /home/hadoop directory that will be mounted later
#Adding a user must be done as root, otherwise the command fails
[root@server2 ~]# useradd -u 1004 hadoop
[root@server2 ~]# id hadoop
uid=1004(hadoop) gid=1004(hadoop) groups=1004(hadoop)
#server3: same operations as on server2
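Because /home/hadoop will be exported over NFS, a mismatched hadoop UID on any node would make the shared files appear owned by the wrong user. A quick sketch for verifying the UID on each node (the `check_uid` helper is hypothetical, not part of the deployment):

```shell
# check_uid EXPECTED ACTUAL -> prints "uid ok" or a mismatch message
check_uid() {
    if [ "$2" = "$1" ]; then
        echo "uid ok"
    else
        echo "uid mismatch: expected $1, got $2"
    fi
}

# On each node you would run e.g.: check_uid 1004 "$(id -u hadoop)"
check_uid 1004 1004    # prints "uid ok"
```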
<2> On server1, server2 and server3: install the nfs-utils package, which provides the rpcbind service, and start rpcbind
#Installing nfs-utils and starting rpcbind must be done as root, otherwise the commands fail
[root@server1 ~]# yum install nfs-utils -y
[root@server1 ~]# systemctl start rpcbind    #rpcbind starts automatically when nfs-utils is installed, so this step is optional here; after a reboot, however, it is required, because rpcbind will no longer be running
#server2 and server3: same operations as on server1
<3> On server1 (the NFS server): start the nfs service, write the exports file for the shared directory, and re-export
Starting nfs, editing the exports file, and re-exporting must be done as root, otherwise the commands fail
[root@server1 ~]# systemctl start nfs    #rpcbind must be running before starting the nfs service
[root@server1 ~]# vim /etc/exports
/home/hadoop    *(rw,anonuid=1004,anongid=1004)
[root@server1 ~]# exportfs -rv
exporting *:/home/hadoop
<4> On server2 and server3 (NFS clients): list the directories exported by the server, mount the share, and enter the shared directory
Listing the server's exports and mounting must be done as root, otherwise the commands fail
[root@server2 ~]# showmount -e 172.25.83.1
Export list for 172.25.83.1:
/home/hadoop *
[root@server2 ~]# mount 172.25.83.1:/home/hadoop /home/hadoop
[root@server2 ~]# df
172.25.83.1:/home/hadoop 17811456 6520832 11290624 37% /home/hadoop
Entering the shared directory to view its contents can be done as the hadoop user, because the mount point is the hadoop user's home directory /home/hadoop
[root@server2 ~]# su - hadoop
[hadoop@server2 ~]$ ll
total 488256
lrwxrwxrwx 1 hadoop hadoop 12 Apr 4 11:50 hadoop -> hadoop-3.0.3
drwxr-xr-x 10 hadoop hadoop 188 Apr 9 22:28 hadoop-3.0.3
-rw-r--r-- 1 root root 314322972 Apr 4 11:47 hadoop-3.0.3.tar.gz
lrwxrwxrwx 1 hadoop hadoop 13 Apr 4 11:50 java -> jdk1.8.0_181/
drwxr-xr-x 7 hadoop hadoop 245 Jul 7 2018 jdk1.8.0_181
-rw-r--r-- 1 root root 185646832 Apr 4 11:47 jdk-8u181-linux-x64.tar.gz
#server3: same operations as on server2
At this point the NFS network file system is deployed.
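One caveat: the mount above does not survive a reboot. If persistence is wanted, an /etc/fstab entry on server2 and server3 along these lines should work (the `_netdev` option is an assumption for this setup, deferring the mount until the network is up):

```
# /etc/fstab on server2 and server3
172.25.83.1:/home/hadoop  /home/hadoop  nfs  defaults,_netdev  0  0
```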
2. Configure Hadoop:
Configure on server1 (NameNode). Because the NFS file system is in place, /home/hadoop is identical on every node, so the configuration could just as well be done on server2 or server3.
<1> Install and deploy Hadoop standalone mode
Detailed steps: https://mp.csdn.net/postedit/89017772
<2> Set up passwordless ssh login (prerequisite: the ssh service is installed). In practice, because the contents of /home/hadoop on server2 and server3 are shared from server1, passwordless login among server1, server2 and server3 already works; the server1 -> server1 passwordless login was configured in the previous post when deploying pseudo-distributed mode.
- Verify, as the ordinary hadoop user, that passwordless login works between server1, server2 and server3
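The verification above can be sketched as a loop; `BatchMode=yes` makes ssh fail instead of prompting for a password, so any node that still requires one is reported (the `report` helper is hypothetical, added only for readable output):

```shell
# report STATUS HOST -> one-line summary of an ssh attempt
report() {
    if [ "$1" -eq 0 ]; then
        echo "$2: passwordless ok"
    else
        echo "$2: login required or unreachable"
    fi
}

# Run as the hadoop user on server1; hostnames assumed to resolve.
for host in server1 server2 server3; do
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true
    report $? "$host"
done
```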
<3> Modify the configuration files
(1) Specify the NameNode address
[hadoop@server1 hadoop]$ pwd
/home/hadoop/hadoop/etc/hadoop
[hadoop@server1 hadoop]$ vim core-site.xml
19 <configuration>
20 <property>
21 <name>fs.defaultFS</name>
22 <value>hdfs://172.25.83.1:9000</value>    #change the localhost from the previous post to 172.25.83.1
23 </property>
24 </configuration>
(2) Specify the DataNode addresses and the number of replicas HDFS keeps (the default replication factor is 3; since there are only two DataNodes here, it is set to 2)
[hadoop@server1 hadoop]$ pwd
/home/hadoop/hadoop/etc/hadoop
[hadoop@server1 hadoop]$ vim workers
172.25.83.2
172.25.83.3
[hadoop@server1 hadoop]$ vim hdfs-site.xml
19 <configuration>
20 <property>
21 <name>dfs.replication</name>
22 <value>2</value>    #change the 1 set in the previous post to 2
23 </property>
24 </configuration>
<4> Format the metadata node (NameNode)
[hadoop@server1 hadoop]$ pwd
/home/hadoop/hadoop
[hadoop@server1 hadoop]$ bin/hdfs namenode -format
[hadoop@server1 hadoop]$ ll /tmp/
total 4
drwxrwxr-x 3 hadoop hadoop 17 Apr 10 11:35 hadoop-hadoop
-rw-rw-r-- 1 hadoop hadoop 5 Apr 10 11:35 hadoop-hadoop-namenode.pid
drwxr-xr-x 2 hadoop hadoop 6 Apr 10 11:35 hsperfdata_hadoop
Figure: the files generated by formatting
<5> Start dfs
[hadoop@server1 hadoop]$ pwd
/home/hadoop/hadoop
[hadoop@server1 hadoop]$ sbin/start-dfs.sh
<6> Configure the environment variables so that the jps command works. In practice, because the contents of /home/hadoop on server2 and server3 are shared from server1, where the environment variables were already configured in the home directory, server2 and server3 are effectively configured too and nothing needs to be done here.
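For reference, the relevant lines in ~/.bash_profile would look roughly like this; the paths are assumed from the symlinks shown earlier (hadoop -> hadoop-3.0.3, java -> jdk1.8.0_181), so adjust them if the standalone-mode post used different ones:

```shell
# ~/.bash_profile on server1, shared to all nodes via NFS (paths assumed)
export JAVA_HOME=/home/hadoop/java
export PATH=$PATH:$JAVA_HOME/bin:/home/hadoop/hadoop/bin:/home/hadoop/hadoop/sbin
```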
<7> Run jps to view the Java processes, including on the DataNode nodes (server2 and server3)
[hadoop@server1 hadoop]$ jps
10451 Jps
9990 SecondaryNameNode
9789 NameNode
[hadoop@server2 ~]$ jps
10890 Jps
10813 DataNode
[hadoop@server3 ~]$ jps
2825 DataNode
2891 Jps
[hadoop@server1 hadoop]$ ps ax
9789 ? Sl 0:10 /home/hadoop/java/bin/java -Dproc_namenode -Djava.net.preferIPv4Stack=true -Dhdfs.audit.logger=INFO,NullA
9990 ? Sl 0:06 /home/hadoop/java/bin/java -Dproc_secondarynamenode -Djava.net.preferIPv4Stack=true -Dhdfs.audit.logger=I
[hadoop@server2 ~]$ ps ax
10813 ? Sl 0:14 /home/hadoop/java/bin/java -Dproc_datanode -Djava.net.preferIPv4Stack=true -Dhadoop.security.logger=ERROR
[hadoop@server3 ~]$ ps ax
2825 ? Sl 0:16 /home/hadoop/java/bin/java -Dproc_datanode -Djava.net.preferIPv4Stack=true -Dhadoop.security.logger=ERROR
- Check that port 9870 on server1 is open
Browse to: http://172.25.83.1:9870
Select Datanodes
The information shown in the browser can also be displayed in the terminal:
[hadoop@server1 hadoop]$ bin/hdfs dfsadmin -report
Configured Capacity: 36477861888 (33.97 GB)
Present Capacity: 33874124800 (31.55 GB)
DFS Remaining: 33874108416 (31.55 GB)
DFS Used: 16384 (16 KB)
DFS Used%: 0.00%
Replicated Blocks:
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
Pending deletion blocks: 0
Erasure Coded Block Groups:
Low redundancy block groups: 0
Block groups with corrupt internal blocks: 0
Missing block groups: 0
Pending deletion blocks: 0
-------------------------------------------------
Live datanodes (2):
Name: 172.25.83.2:9866 (server2)
Hostname: server2
Decommission Status : Normal
Configured Capacity: 18238930944 (16.99 GB)
DFS Used: 8192 (8 KB)
Non DFS Used: 1275748352 (1.19 GB)
DFS Remaining: 16963174400 (15.80 GB)
DFS Used%: 0.00%
DFS Remaining%: 93.01%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Wed Apr 10 11:48:39 CST 2019
Last Block Report: Wed Apr 10 11:37:40 CST 2019
Name: 172.25.83.3:9866 (server3)
Hostname: server3
Decommission Status : Normal
Configured Capacity: 18238930944 (16.99 GB)
DFS Used: 8192 (8 KB)
Non DFS Used: 1327988736 (1.24 GB)
DFS Remaining: 16910934016 (15.75 GB)
DFS Used%: 0.00%
DFS Remaining%: 92.72%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Wed Apr 10 11:48:39 CST 2019
Last Block Report: Wed Apr 10 11:37:40 CST 2019
<8> Testing:
<1> Step 1: create the directories and upload input
[hadoop@server1 hadoop]$ pwd
/home/hadoop/hadoop
[hadoop@server1 hadoop]$ bin/hdfs dfs -mkdir -p /user/hadoop
[hadoop@server1 hadoop]$ bin/hdfs dfs -mkdir input
[hadoop@server1 hadoop]$ bin/hdfs dfs -put etc/hadoop/*.xml input
Refresh the browser and check
Select Utilities -> Browse the file system
Select user
Select hadoop
Select input
<2> Step 2: run one of the examples bundled with Hadoop
[hadoop@server1 hadoop]$ pwd
/home/hadoop/hadoop
[hadoop@server1 hadoop]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.3.jar grep input output 'dfs[a-z.]+'
Refresh the browser and check
Click output
[hadoop@server1 hadoop]$ bin/hdfs dfs -ls
Found 2 items
drwxr-xr-x - hadoop supergroup 0 2019-04-10 11:59 input
drwxr-xr-x - hadoop supergroup 0 2019-04-10 12:02 output
[hadoop@server1 hadoop]$ bin/hdfs dfs -cat output/*
1 dfsadmin
1 dfs.replication
#The results can also be fetched to the local filesystem for viewing
[hadoop@server1 hadoop]$ bin/hdfs dfs -get output
[hadoop@server1 hadoop]$ ll -d output/
drwxr-xr-x 2 hadoop hadoop 42 Apr 10 12:06 output/
[hadoop@server1 hadoop]$ cat output/*
1 dfsadmin
1 dfs.replication
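As a sanity check, the same counts can be reproduced locally with ordinary shell tools; the `count_matches` helper below is just an illustration of what the MapReduce grep example computes, not part of Hadoop:

```shell
# count_matches REGEX: count occurrences of REGEX on stdin,
# most frequent first -- roughly the grep example's output format
count_matches() {
    grep -oE "$1" | sort | uniq -c | sort -rn
}

# e.g. run from /home/hadoop/hadoop:
#   cat etc/hadoop/*.xml | count_matches 'dfs[a-z.]+'
```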
[hadoop@server1 hadoop]$ rm -rf output/    #after viewing, delete the local output directory to clear this run's results before the next experiment