Building a 4-Node Fully Distributed Hadoop Cluster -- hadoop3.2.0 + jdk1.8

This article walks through building a fully distributed Hadoop cluster with 4 nodes. The Linux distribution is CentOS 7, the Hadoop version is 3.2.0, and the JDK version is 1.8.

I. Prepare the environment

  1. Create 4 Linux virtual machines in VMware Workstation and give each a static IP.

For details on creating the Linux VMs and configuring the network, see https://www.cnblogs.com/shireenlee4testing/p/9469855.html
2. Configure host name mapping (every node)
Edit the hosts file and add entries mapping the master and slave host names to their IPs.

#vim /etc/hosts
192.168.44.3 hadoop01
192.168.44.4 hadoop02
192.168.44.5 hadoop03
192.168.44.6 hadoop04
3. Disable the firewall (every node)

# stop the service
[root@hadoop01 opt]# systemctl stop firewalld
# disable start on boot
[root@hadoop01 opt]# systemctl disable firewalld
Removed symlink /etc/systemd/system/multi-user.target.wants/firewalld.service.
Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
4. Configure passwordless SSH login
For details on configuring passwordless login, see
https://www.cnblogs.com/shireenlee4testing/p/10366061.html
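For reference, a minimal sketch of one common way to set this up (assuming the root user and the host names above; adjust to your own environment):

# on hadoop01: generate a key pair (press Enter to accept the defaults)
ssh-keygen -t rsa
# push the public key to every node, including hadoop01 itself
ssh-copy-id root@hadoop01
ssh-copy-id root@hadoop02
ssh-copy-id root@hadoop03
ssh-copy-id root@hadoop04
# verify: this should log in without prompting for a password
ssh root@hadoop02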
5. Configure the Java environment (every node)
For details on configuring the Java environment, see
https://www.cnblogs.com/shireenlee4testing/p/10368961.html
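For reference, a minimal sketch (assuming the JDK tarball is uploaded to /opt and linked to /opt/jdk, which matches the JAVA_HOME used later in this article; adjust the paths to your installation):

# unpack the JDK into /opt and create a convenient symlink
tar -zxvf jdk-8u11-linux-x64.tar.gz -C /opt
ln -s /opt/jdk1.8.0_11 /opt/jdk
# add Java to the environment in /etc/profile
export JAVA_HOME=/opt/jdk
export PATH=$PATH:$JAVA_HOME/bin
# apply and verify
source /etc/profile
java -version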

II. Build the fully distributed Hadoop cluster

The steps for installing and configuring Hadoop are basically the same on every node, so you can install Hadoop on each node, do all the configuration once on the master node, and then copy the modified configuration files to each slave node with scp.

  1. Download the Hadoop package, unpack it, and configure the Hadoop environment variables

For details on downloading the Hadoop package, see
https://www.cnblogs.com/shireenlee4testing/p/10365692.html

The Hadoop version used here is 3.2.0. Pick a directory (for example /opt), upload the Hadoop package to the Linux system with the rz command, unpack it into that directory, configure the Hadoop environment variables, and make them take effect. The commands are:

# unpack into the /opt directory
[root@hadoop01 opt]# tar -zxvf hadoop-3.2.0.tar.gz
# symlink /opt/hadoop-3.2.0 to /opt/hadoop to simplify later configuration
[root@hadoop01 opt]# ln -s hadoop-3.2.0 hadoop

# configure the Hadoop environment variables
[root@hadoop01 opt]# vim /etc/profile
#Hadoop
export HADOOP_HOME=/opt/hadoop   # the directory Hadoop was unpacked/linked into
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop

# after saving, reload the profile so it takes effect
[root@hadoop01 opt]# source /etc/profile
2. Set the JAVA_HOME parameter in the Hadoop environment scripts

# change into the etc/hadoop directory under the Hadoop installation
[root@hadoop01 ~]# cd /opt/hadoop/etc/hadoop

# add or modify the following parameter in each of hadoop-env.sh, mapred-env.sh, and yarn-env.sh:
[root@hadoop01 hadoop]# vim hadoop-env.sh
[root@hadoop01 hadoop]# vim mapred-env.sh
[root@hadoop01 hadoop]# vim yarn-env.sh

export JAVA_HOME="/opt/jdk"   # path to the JDK installation

# verify that the Hadoop configuration took effect
[root@hadoop01 hadoop]# hadoop version
Hadoop 3.2.0
Source code repository https://github.com/apache/hadoop.git -r e97acb3bd8f3befd27418996fa5d4b50bf2e17bf
Compiled by sunilg on 2019-01-08T06:08Z
Compiled with protoc 2.5.0
From source with checksum d3f0795ed0d9dc378e2c785d3668f39
This command was run using /opt/hadoop-3.2.0/share/hadoop/common/hadoop-common-3.2.0.jar
3. Edit the Hadoop configuration files

In the etc/hadoop directory under the Hadoop installation, edit core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, and the workers file, adjusting the values to your environment.

(1) core-site.xml (common component properties)

<configuration>
  <property>
      <!-- HDFS address -->
      <name>fs.defaultFS</name>
      <value>hdfs://hadoop01:9000</value>
  </property>
  <property>
      <!-- directory for temporary files; create the tmp directory under /opt/hadoop first -->
      <name>hadoop.tmp.dir</name>
     <value>/opt/hadoop/tmp</value>
 </property>
 </configuration>
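Since hadoop.tmp.dir points at /opt/hadoop/tmp, create that directory before formatting/starting (a small sketch, run on the master; it is copied to the slaves together with the rest of the tree in step 4):

[root@hadoop01 opt]# mkdir -p /opt/hadoop/tmp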

(2) hdfs-site.xml (HDFS component properties)

<configuration>
      <property>
         <!-- NameNode web UI address on the master node -->
          <name>dfs.namenode.http-address</name>
          <value>hadoop01:50070</value>
      </property>
      <property>
          <name>dfs.namenode.name.dir</name>
          <value>file:/opt/hadoop/dfs/name</value>
     </property>
     <property>
         <name>dfs.datanode.data.dir</name>
         <value>file:/opt/hadoop/dfs/data</value>
     </property>
     <property>
        <!-- replication factor, kept at the default of 3 -->
        <name>dfs.replication</name>
         <value>3</value>
     </property>
        <property> 
      <name>dfs.webhdfs.enabled</name> 
      <value>true</value> 
     </property>

     <property>
      <name>dfs.permissions</name>
      <value>false</value>
      <description>With this set to false, files can be written to HDFS without permission checks. Convenient, but take care to avoid accidental deletions.</description>
     </property>

 </configuration>
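The name and data directories referenced above can be created ahead of time as well (a sketch; creating them explicitly avoids permission surprises on first start):

[root@hadoop01 opt]# mkdir -p /opt/hadoop/dfs/name /opt/hadoop/dfs/data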

(3) mapred-site.xml (MapReduce component properties)

 <configuration>
 
      <property>
          <name>mapreduce.framework.name</name>
          <value>yarn</value> 
          <!-- run MapReduce on YARN -->
      </property>

</configuration>

(4) yarn-site.xml (resource scheduling properties)

 <configuration>
	  <property>
         <name>yarn.resourcemanager.hostname</name> 
         <!-- host of the YARN ResourceManager; without this, Active Nodes stays at 0 -->
         <value>hadoop01</value>
 	   </property>
     
     <property>
         <name>yarn.nodemanager.aux-services</name> 
         <!-- how reducers fetch data -->
         <value>mapreduce_shuffle</value>
     </property>
    
	<property>
		<name>yarn.nodemanager.vmem-check-enabled</name>
		<value>false</value>
		<description>Disable the virtual memory check. This is useful when the nodes run inside virtual machines; with it disabled, later operations are less likely to fail.</description>
	</property>

</configuration>

(5) workers file

# add the slave node addresses (host names work if hosts mapping is configured; IP addresses also work)
[root@hadoop01 hadoop]# vim workers
hadoop02
hadoop03
hadoop04
4. Copy the configured directory to the other slave nodes

[root@hadoop01 hadoop]# scp -r /opt/hadoop-3.2.0 root@hadoop02:/opt/
[root@hadoop01 hadoop]# scp -r /opt/hadoop-3.2.0 root@hadoop03:/opt/
[root@hadoop01 hadoop]# scp -r /opt/hadoop-3.2.0 root@hadoop04:/opt/

[root@hadoop01 hadoop]# scp -r /opt/hadoop root@hadoop02:/opt/
[root@hadoop01 hadoop]# scp -r /opt/hadoop root@hadoop03:/opt/
[root@hadoop01 hadoop]# scp -r /opt/hadoop root@hadoop04:/opt/
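The Hadoop environment variables set in /etc/profile also need to exist on every slave. One way (a sketch, assuming root access on all nodes) is to copy the profile over and reload it on each slave:

[root@hadoop01 hadoop]# scp /etc/profile root@hadoop02:/etc/profile
[root@hadoop01 hadoop]# scp /etc/profile root@hadoop03:/etc/profile
[root@hadoop01 hadoop]# scp /etc/profile root@hadoop04:/etc/profile
# then on each slave:
source /etc/profile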
5. Edit the start/stop scripts to add the HDFS and YARN user definitions

Add the HDFS user definitions: edit the following scripts and add the lines below near the top (right after the first line):
[root@hadoop01 sbin]# vim sbin/start-dfs.sh
[root@hadoop01 sbin]# vim sbin/stop-dfs.sh

HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
Add the YARN user definitions: edit the following scripts and add the lines below near the top (right after the first line):

[root@hadoop01 sbin]# vim sbin/start-yarn.sh
[root@hadoop01 sbin]# vim sbin/stop-yarn.sh

YARN_RESOURCEMANAGER_USER=root
HDFS_DATANODE_SECURE_USER=yarn
YARN_NODEMANAGER_USER=root
Note: without these definitions, startup fails with errors like the following, caused by the missing user definitions:

ERROR: Attempting to launch hdfs namenode as root
ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting launch.
Starting datanodes
ERROR: Attempting to launch hdfs datanode as root
ERROR: but there is no HDFS_DATANODE_USER defined. Aborting launch.
Starting secondary namenodes [localhost.localdomain]
ERROR: Attempting to launch hdfs secondarynamenode as root
ERROR: but there is no HDFS_SECONDARYNAMENODE_USER defined. Aborting launch.
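As an aside, instead of editing the start/stop scripts, Hadoop 3.x also honors these variables when they are exported from etc/hadoop/hadoop-env.sh; a sketch of that alternative (not what is done above):

# in /opt/hadoop/etc/hadoop/hadoop-env.sh
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root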
6. Format & start

# format the NameNode
[root@hadoop01 hadoop-3.2.0]# bin/hdfs namenode -format

# start (either of the two ways works)
Option 1:
[root@hadoop01 hadoop-3.2.0]# sbin/start-all.sh

Option 2:
[root@hadoop01 hadoop-3.2.0]# sbin/start-dfs.sh
[root@hadoop01 hadoop-3.2.0]# sbin/start-yarn.sh
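Besides checking the processes with jps (next step), the cluster state can also be verified from the master with the standard reporting commands (a quick sketch):

# list the DataNodes registered with the NameNode
[root@hadoop01 hadoop-3.2.0]# bin/hdfs dfsadmin -report
# list the NodeManagers registered with the ResourceManager
[root@hadoop01 hadoop-3.2.0]# bin/yarn node -list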
7. Verify that Hadoop started successfully

# master node
[root@hadoop01 sbin]# jps
11329 NameNode
11831 ResourceManager
11592 SecondaryNameNode
12186 Jps
# slave nodes
[root@hadoop02 hadoop]# jps
5152 SecondaryNameNode
5085 DataNode
5245 NodeManager
5357 Jps

[root@hadoop03 opt]# jps
5080 DataNode
5178 NodeManager
5278 Jps

[root@hadoop04 opt]# jps
5090 NodeManager
5190 Jps
4991 DataNode
8. Web UI access

Note: open the required ports first, or simply disable the firewall.

Check the firewall status

firewall-cmd --state

Stop it temporarily

systemctl stop firewalld

Disable start on boot

systemctl disable firewalld
Open http://hadoop01:8088 in a browser to reach the ResourceManager page.

Open http://hadoop01:50070 in a browser to reach the Hadoop NameNode page.

Note: if the web pages cannot be opened by host name, add the mappings to the hosts file on Windows, located at:

C:\Windows\System32\drivers\etc\hosts

192.168.44.3 hadoop01
192.168.44.4 hadoop02
192.168.44.5 hadoop03
192.168.44.6 hadoop04

[Copyright note]
Everything above comes from:
https://www.cnblogs.com/shireenlee4testing/p/10472018.html
It is essentially a straight copy!
However, I have verified and improved the steps myself; they are reliable and worth recording.
My own deployment follows below:

Hadoop 3.2.0 deployment steps on Linux:

Note: deployment is done on servers 178-181, logged in as the ordinary user trs.

[Server 178]
I. Prepare the packages: hadoop-3.2.0.tar.gz and jdk-8u11-linux-x64.tar.gz
	Package locations, on server 10.50.144.178:
	Hadoop package: /home/trs/Hadoop/hadoop-3.2.0.tar.gz
	JDK package: /home/trs/Hadoop/jdk-8u11-linux-x64.tar.gz

II. Install the JDK
1. Unpack it under /home/trs/Hadoop/:
	cd into /home/trs/Hadoop/ (cd /home/trs/Hadoop/), then unpack with:
	tar -zxvf jdk-8u11-linux-x64.tar.gz -C /home/trs/Hadoop
	This creates the jdk1.8.0_11 directory under /home/trs/Hadoop.
	
2. Configure the ordinary user's environment variables
	Run: vim ~/.bash_profile
	Edit the file so its contents are as follows:
	if [ -f ~/.bashrc ]; then
        . ~/.bashrc
	fi

	export JAVA_HOME=/home/trs/Hadoop/jdk1.8.0_11
	export JRE_HOME=/home/trs/Hadoop/jdk1.8.0_11/jre
	export CLASSPATH=.:$CLASSPATH:${JAVA_HOME}/lib:$JRE_HOME/lib
	export PATH=$JAVA_HOME/bin:$PATH:$JRE_HOME/bin
	
3. Test whether the JDK is configured correctly
	# reload the configuration so it takes effect immediately:
	source ~/.bash_profile
	Then run the java and javac commands; if they print their usage information, the configuration is correct.
	
III. Install standalone Hadoop
1. Unpack it under /home/trs/Hadoop/:
	cd into /home/trs/Hadoop/ (cd /home/trs/Hadoop/), then unpack with:
	tar -zxvf hadoop-3.2.0.tar.gz -C /home/trs/Hadoop
	This creates the hadoop-3.2.0 directory under /home/trs/Hadoop.

2. Point Hadoop at the JDK
	vi /home/trs/Hadoop/hadoop-3.2.0/etc/hadoop/hadoop-env.sh
	Add the following line:
	export JAVA_HOME=/home/trs/Hadoop/jdk1.8.0_11
	
3. Configure the ordinary user's environment variables
	Run: vim ~/.bash_profile
	Edit the file and add the following at the bottom:
	
	export HADOOP_HOME=/home/trs/Hadoop/hadoop-3.2.0
	export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
	
4. Test whether Hadoop is configured correctly
	# reload the configuration so it takes effect immediately:
	source ~/.bash_profile
	Then run: hadoop version
	If version information is printed, the configuration is correct.

5. A small test
	5.1 Create an input directory and copy all the xml files under etc/hadoop into it:
		mkdir -p ~/resource/input
		cp /home/trs/Hadoop/hadoop-3.2.0/etc/hadoop/*.xml ~/resource/input
	5.2 Use the MapReduce grep example to search the files under the input directory for strings matching a pattern and save the results to the output directory.
		/home/trs/Hadoop/hadoop-3.2.0/bin/hadoop jar /home/trs/Hadoop/hadoop-3.2.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0.jar grep ~/resource/input ~/resource/output 'dfs[a-z.]+' 
	5.3 View the result:
		cat ~/resource/output/*
		
	5.4 With a correct configuration, the output looks like this:
		trs@node178:~> cat ~/resource/output/*
		1	dfsadmin
		1	dfs.replication
		trs@node178:~> 
		
[Note]: the ordinary user's environment variables can also be configured as follows:
		##########
		vim ~/.bashrc #set user environment variables so the hadoop and java commands can be used from any directory
		Add to the file:
		export JAVA_HOME=/home/trs/Hadoop/jdk1.8.0_11
		export JRE_HOME=${JAVA_HOME}/jre
		export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
		
		export HADOOP_HOME=/home/trs/Hadoop/hadoop-3.2.0
		export CLASSPATH=$($HADOOP_HOME/bin/hadoop classpath):$CLASSPATH
		export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
		
		export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

		##########
		source ~/.bashrc #apply the changes
		
IV. Pseudo-distributed Hadoop installation (building on the standalone installation from Part III)
1. Edit /home/trs/Hadoop/hadoop-3.2.0/etc/hadoop/core-site.xml with:
	vi /home/trs/Hadoop/hadoop-3.2.0/etc/hadoop/core-site.xml
	Change the contents to:
	<configuration>
		<property>
			<name>fs.defaultFS</name>
			<value>hdfs://localhost:9000</value>
		</property>
	</configuration>
2. Edit /home/trs/Hadoop/hadoop-3.2.0/etc/hadoop/hdfs-site.xml with:
	vi /home/trs/Hadoop/hadoop-3.2.0/etc/hadoop/hdfs-site.xml
	Change the contents to:
	<configuration>
		<property>
			<name>dfs.replication</name>
			<value>1</value>
		</property>
	</configuration>
3. Passwordless SSH setup:
	3.1 Command: ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
		Output:
		trs@node178:~/.ssh> ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
		Generating public/private rsa key pair.
		Your identification has been saved in /home/trs/.ssh/id_rsa.
		Your public key has been saved in /home/trs/.ssh/id_rsa.pub.
		The key fingerprint is:
		SHA256:4f8+BlMzV0ELlScla7MLDoO9eKBtCsHB8oOXW28TVBQ trs@node178
		The key's randomart image is:
		+---[RSA 2048]----+
		|         .E. .+=+|
		|   .     .    o++|
		|  . o   o     ++.|
		|   = o o +  +..o |
		|  . B . S +..+.  |
		|   . = + =o= . . |
		|    o . B +o. .  |
		|     . + o .o    |
		|      .    oo.   |
		+----[SHA256]-----+
		trs@node178:~/.ssh> 
	3.2 Command: cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
	3.3 Command: chmod 0600 ~/.ssh/authorized_keys
	3.4 Command: ssh node178
	3.5 Expected result:
			trs@node178:~/.ssh> ssh node178
			Last login: Wed May 22 10:01:42 2019 from 127.0.0.1
			trs@node178:~> 

4. Format the file system
	4.1 Run: /home/trs/Hadoop/hadoop-3.2.0/bin/hdfs namenode -format
	4.2 Run: /home/trs/Hadoop/hadoop-3.2.0/sbin/start-dfs.sh
	4.3 Run: jps
	4.4 Expected result:
		trs@node178:~> /home/trs/Hadoop/hadoop-3.2.0/sbin/start-dfs.sh
		Starting namenodes on [localhost]
		Starting datanodes
		Starting secondary namenodes [node178]
		trs@node178:~> jps
		17344 DataNode
		17649 SecondaryNameNode
		17873 Jps
		17178 NameNode
		trs@node178:~> 
	4.5 Open the NameNode web UI:
		http://localhost:9870/
		
5. Create the directories for running a MapReduce job, using the same test as in the standalone Hadoop section above.
	5.1 Command: /home/trs/Hadoop/hadoop-3.2.0/bin/hdfs dfs -mkdir -p /user/demo/input
		Explanation: use the hdfs command to create the /user/demo/input directory under the root of the HDFS file system
	5.2 Command: hdfs dfs -put ~/resource/input/*.xml /user/demo/input
		Explanation: upload the local ~/resource/input/*.xml files to the /user/demo/input path in HDFS
	5.3 Command: /home/trs/Hadoop/hadoop-3.2.0/bin/hadoop jar /home/trs/Hadoop/hadoop-3.2.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0.jar grep /user/demo/input output 'dfs[a-z.]+'
		Explanation: search the files under /user/demo/input in HDFS for strings matching the pattern and write the results to the output directory in HDFS
	5.4 Copy the output files from the distributed file system to the local file system for inspection
		Command: hdfs dfs -get output output
		Explanation: copy the output directory from HDFS into a new output directory under the current local path
	5.5 View the output files:
		Command: cat output/*
	5.6 Expected result:
		trs@node178:~/resource/output> cat output/*
		1	dfsadmin
		1	dfs.replication
		trs@node178:~/resource/output> 
	5.7 View the output files directly on the distributed file system
		Command: hdfs dfs -cat output/*
		Expected result:
		trs@node178:~/resource/output> hdfs dfs -cat output/*
		1	dfsadmin
		1	dfs.replication
		trs@node178:~/resource/output>
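	Note that MapReduce refuses to overwrite an existing output directory, so the old output must be removed before re-running the job; a small sketch:
		hdfs dfs -rm -r output
		rm -rf output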
		
6. When finished, stop the daemons
	Command: /home/trs/Hadoop/hadoop-3.2.0/sbin/stop-dfs.sh

7. YARN: to be continued...
		
V. Fully distributed Hadoop cluster setup
	The steps for installing and configuring Hadoop are basically the same on every node, so you can install Hadoop on each node, do all the configuration once on the master node,
	and then copy the modified configuration files to each slave node with scp.
1. Set the JAVA_HOME parameter in the Hadoop environment scripts, under /home/trs/Hadoop/hadoop-3.2.0/etc/hadoop
	1.1 Add or modify the following parameter in each of hadoop-env.sh, mapred-env.sh, and yarn-env.sh:
		export JAVA_HOME=/home/trs/Hadoop/jdk1.8.0_11
	1.2 Verify that the Hadoop configuration took effect
		Command: hadoop version
		Expected result:
		trs@node178:~/Hadoop/hadoop-3.2.0/etc/hadoop> hadoop version
		Hadoop 3.2.0
		Source code repository https://github.com/apache/hadoop.git -r e97acb3bd8f3befd27418996fa5d4b50bf2e17bf
		Compiled by sunilg on 2019-01-08T06:08Z
		Compiled with protoc 2.5.0
		From source with checksum d3f0795ed0d9dc378e2c785d3668f39
		This command was run using /home/trs/Hadoop/hadoop-3.2.0/share/hadoop/common/hadoop-common-3.2.0.jar
		trs@node178:~/Hadoop/hadoop-3.2.0/etc/hadoop> 
		
2. Further edit the Hadoop configuration files
	In the etc/hadoop directory under the Hadoop installation, edit core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, and the workers file, adjusting the values to your environment.

	2.1 core-site.xml
	<configuration>
		<property>
			<name>fs.defaultFS</name>
			<value>hdfs://node178:9000</value>
		</property>
		<property>
			<!-- directory for temporary files; create the tmp directory under /home/trs/Hadoop/hadoop-3.2.0/ first -->
			<name>hadoop.tmp.dir</name>
			<value>/home/trs/Hadoop/hadoop-3.2.0/tmp</value>
		</property>
	</configuration>
	
	2.2 hdfs-site.xml
	<configuration>
	
      <property>
         <!-- NameNode web UI address on the master node -->
          <name>dfs.namenode.http-address</name>
          <value>node178:50070</value>
      </property>
	  
      <property>
          <name>dfs.namenode.name.dir</name>
          <value>file:/home/trs/Hadoop/hadoop-3.2.0/dfs/name</value>
      </property>
	  
     <property>
         <name>dfs.datanode.data.dir</name>
         <value>file:/home/trs/Hadoop/hadoop-3.2.0/dfs/data</value>
     </property>
	 
     <property>
        <!-- replication factor, kept at the default of 3 -->
        <name>dfs.replication</name>
        <value>3</value>
     </property>
	 
     <property> 
    <name>dfs.webhdfs.enabled</name> 
    <value>true</value> 
  </property>

  <property>
    <name>dfs.permissions</name>
    <value>false</value>
    <description>With this set to false, files can be written to HDFS without permission checks. Convenient, but take care to avoid accidental deletions.</description>
  </property>

	</configuration>
	
	2.3 mapred-site.xml
	<configuration>
      <property>
          <name>mapreduce.framework.name</name>
          <value>yarn</value> 
		  <!-- run MapReduce on YARN -->
      </property>
	 
	</configuration>
	
	2.4 yarn-site.xml
	<configuration>
	
	<property>
         <!-- host of the YARN ResourceManager; without this, Active Nodes stays at 0 -->
         <name>yarn.resourcemanager.hostname</name>
         <value>node178</value>
    </property>
	 
     <property>
         <!-- how reducers fetch data -->
         <name>yarn.nodemanager.aux-services</name>
         <value>mapreduce_shuffle</value>
     </property>
	
	<property>
	<name>yarn.nodemanager.vmem-check-enabled</name>
	<value>false</value>
	<description>Disable the virtual memory check. This is useful when the nodes run inside virtual machines; with it disabled, later operations are less likely to fail.</description>
	</property>
	
	</configuration>
	2.5 workers file, with the following contents:
		node179
		node180
		node181
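	As noted in 2.1, the tmp directory (and, optionally, the dfs name/data directories) referenced in the configuration should exist before formatting; a small sketch:
		mkdir -p /home/trs/Hadoop/hadoop-3.2.0/tmp
		mkdir -p /home/trs/Hadoop/hadoop-3.2.0/dfs/name /home/trs/Hadoop/hadoop-3.2.0/dfs/data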

3. Copy the configured directory to the other slave nodes [3.0: first set up passwordless SSH on each machine; 3.1: then let the master (node178) log in to slave1 (node179), slave2 (node180), and slave3 (node181) without a password]
	3.1 Passwordless login from node178 to node179, node180, and node181
		a. node179 setup
			Copy the authorized_keys file to node179, then append that node's (node179's) own SSH public key id_rsa.pub to it.
			a.1 Command: scp ~/.ssh/authorized_keys trs@node179:/home/trs/.ssh/
			a.2 Expected output:
					trs@node178:~> scp ~/.ssh/authorized_keys trs@node179:/home/trs/.ssh/
					Password: 
					authorized_keys                                                                                             100%  393     0.4KB/s      
					trs@node178:~>
			a.3 Switch to node179 and append node179's id_rsa.pub to authorized_keys:
				trs@node179:~/.ssh> ls
				authorized_keys  id_rsa  id_rsa.pub  known_hosts
				trs@node179:~/.ssh> cat id_rsa.pub >> authorized_keys 
				trs@node179:~/.ssh> 
			a.4 Back on node178, verify that passwordless login works:
				trs@node178:~> ssh node179
				Last login: Thu May 23 10:25:36 2019 from 10.50.144.179
				trs@node179:~> exit
				logout
				Connection to node179 closed.
				trs@node178:~> 
			a.5 Passwordless login set up successfully!
		b. node180 setup
			Copy the authorized_keys file to node180, then append that node's (node180's) own SSH public key id_rsa.pub to it.
			b.1 Command: scp ~/.ssh/authorized_keys trs@node180:/home/trs/.ssh/
			b.2 Expected output:
					trs@node178:~> scp ~/.ssh/authorized_keys trs@node180:/home/trs/.ssh/
					Password: 
					authorized_keys                                                                                             100%  393     0.4KB/s      
					trs@node178:~>
			b.3 Switch to node180 and append node180's id_rsa.pub to authorized_keys:
				trs@node180:~/.ssh> ls
				authorized_keys  id_rsa  id_rsa.pub  known_hosts
				trs@node180:~/.ssh> cat id_rsa.pub >> authorized_keys 
				trs@node180:~/.ssh> 
			b.4 Back on node178, verify that passwordless login works:
				trs@node178:~> ssh node180
				Last login: Thu May 23 10:25:36 2019 from 10.50.144.180
				trs@node180:~> exit
				logout
				Connection to node180 closed.
				trs@node178:~> 
			b.5 Passwordless login set up successfully!
		c. node181 setup
			Copy the authorized_keys file to node181, then append that node's (node181's) own SSH public key id_rsa.pub to it.
			c.1 Command: scp ~/.ssh/authorized_keys trs@node181:/home/trs/.ssh/
			c.2 Expected output:
					trs@node178:~> scp ~/.ssh/authorized_keys trs@node181:/home/trs/.ssh/
					Password: 
					authorized_keys                                                                                             100%  393     0.4KB/s      
					trs@node178:~>
			c.3 Switch to node181 and append node181's id_rsa.pub to authorized_keys:
				trs@node181:~/.ssh> ls
				authorized_keys  id_rsa  id_rsa.pub  known_hosts
				trs@node181:~/.ssh> cat id_rsa.pub >> authorized_keys 
				trs@node181:~/.ssh> 
			c.4 Back on node178, verify that passwordless login works:
				trs@node178:~> ssh node181
				Last login: Thu May 23 10:25:36 2019 from 10.50.144.181
				trs@node181:~> exit
				logout
				Connection to node181 closed.
				trs@node178:~> 
			c.5 Passwordless login set up successfully!
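		For reference, the master-to-slave part of this key distribution can also be done with one command per node using ssh-copy-id (a sketch of an equivalent alternative, not what was run here):
			trs@node178:~> ssh-copy-id trs@node179
			trs@node178:~> ssh-copy-id trs@node180
			trs@node178:~> ssh-copy-id trs@node181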
		
	3.2 Copy the configured directory to node179, node180, and node181
		a. Command: scp -r /home/trs/Hadoop trs@node179:/home/trs/
			Explanation: copy the entire Hadoop directory, including subdirectories and files, into the trs user's home directory on node179
		b. Command: scp -r /home/trs/Hadoop trs@node180:/home/trs/
			Explanation: copy the entire Hadoop directory, including subdirectories and files, into the trs user's home directory on node180
		c. Command: scp -r /home/trs/Hadoop trs@node181:/home/trs/
			Explanation: copy the entire Hadoop directory, including subdirectories and files, into the trs user's home directory on node181
	
	3.3 On node179, node180, and node181, configure the ordinary user's environment variables; see the [Server 179], [Server 180], and [Server 181] sections for details.
		
	
4. Run and test
	4.1 Command: hdfs namenode -format
		Explanation: format the file system; this creates files under the temporary directory /home/trs/Hadoop/hadoop-3.2.0/tmp, so formatting does not need to be repeated every time
	4.2 Command: /home/trs/Hadoop/hadoop-3.2.0/sbin/start-all.sh
		Explanation: starts Hadoop's DFS and YARN, equivalent to running start-dfs.sh and start-yarn.sh separately
	4.3 Verify that Hadoop started successfully
		a. Master node node178
			Command: jps
			Output:
			trs@node178:~/Hadoop/hadoop-3.2.0/sbin> ./start-all.sh 
			WARNING: Attempting to start all Apache Hadoop daemons as trs in 10 seconds.
			WARNING: This is not a recommended production deployment configuration.
			WARNING: Use CTRL-C to abort.
			Starting namenodes on [localhost]
			Starting datanodes
			Starting secondary namenodes [node178]
			Starting resourcemanager
			Starting nodemanagers
			trs@node178:~/Hadoop/hadoop-3.2.0/sbin> jps
			7028 ResourceManager
			6663 SecondaryNameNode
			6347 NameNode
			7420 Jps
			trs@node178:~/Hadoop/hadoop-3.2.0/sbin> 
		b. Slave node node179
			Command: jps
			Output:
			trs@node179:~> jps
			838 DataNode
			1286 Jps
			1049 NodeManager
			trs@node179:~> 
		c. Slave node node180
			Command: jps
			Output:
			trs@node180:~> jps
			8689 DataNode
			8868 NodeManager
			9083 Jps
			trs@node180:~> 
		d. Slave node node181
			Command: jps
			Output:
			trs@node181:~> jps
			11765 DataNode
			11944 NodeManager
			12203 Jps
			trs@node181:~> 
		
	4.4 Deployment succeeded! Notes:
		[Points to note]: a. Hadoop only needs to be started with the start-all.sh command on the master node, and stopped with stop-all.sh;
					b. as an improvement to the setup above, the sbin path is added to the ordinary user's environment variables for convenience
					c. specific features will be refined in more detail, with explanations to follow
					d. to be continued...
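	For note a, stopping the whole cluster from the master looks like this (a sketch):
		trs@node178:~/Hadoop/hadoop-3.2.0/sbin> ./stop-all.sh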
			
VI. Acceptance test, requiring the files wc.jar, a.txt, and b.txt
	wc.jar is under /home/trs/resource; a.txt and b.txt are under /home/trs/resource/test
	1、hadoop fs -put /home/trs/resource/test /testwc1
	2、hadoop jar wc.jar com.trs.hadoop.mr.black.WCRunner /testwc1 /testwc1r
	3、hadoop fs -cat /testwc1r/part-r-00000 
	
	|grep -v 4$ > /home/trs/vdb20181111/vdbids.txt			
	
[Path: /home/trs/Hadoop/hadoop-3.2.0]


[Server 179]
I.

1. Passwordless SSH setup:
	1.1 Command: ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
		Output:
		trs@node179:~/.ssh> ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
		Generating public/private rsa key pair.
		Your identification has been saved in /home/trs/.ssh/id_rsa.
		Your public key has been saved in /home/trs/.ssh/id_rsa.pub.
		The key fingerprint is:
		SHA256:6JOV5JlzLx/N25EGsWDjDV+ZUlRpk8N+D6sYA7T8zEk trs@node179
		The key's randomart image is:
		+---[RSA 2048]----+
		|              oo=|
		|        .     .Bo|
		|       o.. = oo+o|
		|       +++E * *..|
		|      . S*.o = oo|
		|     . o oB. oo o|
		|      +   .+o.o+ |
		|       .  .o...o.|
		|            . . .|
		+----[SHA256]-----+
		trs@node179:~/.ssh> 
	1.2 Command: cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
	1.3 Command: chmod 0600 ~/.ssh/authorized_keys
	1.4 Command: ssh node179
	1.5 Expected result:
			trs@node179:~> cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
			trs@node179:~> chmod 0600 ~/.ssh/authorized_keys
			trs@node179:~> ssh node179
			The authenticity of host 'node179 (10.50.144.179)' can't be established.
			ECDSA key fingerprint is SHA256:5IMaWPLLwYck4zMXQFt4kU3JrGtdSdmwYSprlxGC2M8.
			Are you sure you want to continue connecting (yes/no)? yes
			Warning: Permanently added 'node179,10.50.144.179' (ECDSA) to the list of known hosts.
			Last login: Thu May 23 10:23:28 2019 from 10.75.12.2
			trs@node179:~> exit
			logout
			Connection to node179 closed.
			trs@node179:~> ssh node179
			Last login: Thu May 23 10:25:22 2019 from 10.50.144.179
			trs@node179:~>

2. Configure the ordinary user's environment variables
	2.1
		##########
		vim ~/.bashrc #set user environment variables so the hadoop and java commands can be used from any directory
		Add to the file:
		export JAVA_HOME=/home/trs/Hadoop/jdk1.8.0_11
		export JRE_HOME=${JAVA_HOME}/jre
		export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
		
		export HADOOP_HOME=/home/trs/Hadoop/hadoop-3.2.0
		export CLASSPATH=$($HADOOP_HOME/bin/hadoop classpath):$CLASSPATH
		export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
		
		export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
		##########
	2.2
		source ~/.bashrc #apply the changes
	2.3
		Expected output:
		trs@node179:~> java -version
		java version "1.8.0_11"
		Java(TM) SE Runtime Environment (build 1.8.0_11-b12)
		Java HotSpot(TM) 64-Bit Server VM (build 25.11-b03, mixed mode)
		trs@node179:~> hadoop version
		Hadoop 3.2.0
		Source code repository https://github.com/apache/hadoop.git -r e97acb3bd8f3befd27418996fa5d4b50bf2e17bf
		Compiled by sunilg on 2019-01-08T06:08Z
		Compiled with protoc 2.5.0
		From source with checksum d3f0795ed0d9dc378e2c785d3668f39
		This command was run using /home/trs/Hadoop/hadoop-3.2.0/share/hadoop/common/hadoop-common-3.2.0.jar
		trs@node179:~> 
		

		

[Server 180]
I.
1. Passwordless SSH setup:
	1.1 Command: ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
		Output:
		trs@node180:~> ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
		Generating public/private rsa key pair.
		Your identification has been saved in /home/trs/.ssh/id_rsa.
		Your public key has been saved in /home/trs/.ssh/id_rsa.pub.
		The key fingerprint is:
		SHA256:VQMa30cayaBwe0gnhN08wE5W0ib5eTU90A4ucZKsubU trs@node180
		The key's randomart image is:
		+---[RSA 2048]----+
		|      .=OB*+=o+. |
		|      .+B@*BoBoo.|
		|       +=+*oB.+..|
		|        .=oo.o . |
		|        S o.o    |
		|         . E     |
		|                 |
		|                 |
		|                 |
		+----[SHA256]-----+
		trs@node180:~> 
	1.2 Command: cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
	1.3 Command: chmod 0600 ~/.ssh/authorized_keys
	1.4 Command: ssh node180
	1.5 Expected result:
			trs@node180:~> cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
			trs@node180:~> chmod 0600 ~/.ssh/authorized_keys
			trs@node180:~> ssh node180
			Last login: Thu May 23 10:35:00 2019 from 10.50.144.180
			trs@node180:~> 
			
2. Configure the ordinary user's environment variables
	2.1
		##########
		vim ~/.bashrc #set user environment variables so the hadoop and java commands can be used from any directory
		Add to the file:
		export JAVA_HOME=/home/trs/Hadoop/jdk1.8.0_11
		export JRE_HOME=${JAVA_HOME}/jre
		export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
		
		export HADOOP_HOME=/home/trs/Hadoop/hadoop-3.2.0
		export CLASSPATH=$($HADOOP_HOME/bin/hadoop classpath):$CLASSPATH
		export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
		
		export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
		##########
	2.2
		source ~/.bashrc #apply the changes
	2.3
		Expected output:
		trs@node180:~> java -version
		java version "1.8.0_11"
		Java(TM) SE Runtime Environment (build 1.8.0_11-b12)
		Java HotSpot(TM) 64-Bit Server VM (build 25.11-b03, mixed mode)
		trs@node180:~> hadoop version
		Hadoop 3.2.0
		Source code repository https://github.com/apache/hadoop.git -r e97acb3bd8f3befd27418996fa5d4b50bf2e17bf
		Compiled by sunilg on 2019-01-08T06:08Z
		Compiled with protoc 2.5.0
		From source with checksum d3f0795ed0d9dc378e2c785d3668f39
		This command was run using /home/trs/Hadoop/hadoop-3.2.0/share/hadoop/common/hadoop-common-3.2.0.jar
		trs@node180:~> 

[Server 181]
I.
1. Passwordless SSH setup:
	1.1 Command: ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
		Output:
		trs@node181:~> ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
		Generating public/private rsa key pair.
		Your identification has been saved in /home/trs/.ssh/id_rsa.
		Your public key has been saved in /home/trs/.ssh/id_rsa.pub.
		The key fingerprint is:
		SHA256:cECieGX7F/xaMlcsvezNkoBu5XFq24Lj4bDnYRaz/CM trs@node181
		The key's randomart image is:
		+---[RSA 2048]----+
		|    +.o          |
		| . + o o   o     |
		|. o . . + . +    |
		| .   . o + + .   |
		|      . S B +    |
		|       + % * +   |
		|      . @.+ + o  |
		|       BE=oo .   |
		|      .++ooo.    |
		+----[SHA256]-----+
		trs@node181:~>  
	1.2 Command: cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
	1.3 Command: chmod 0600 ~/.ssh/authorized_keys
	1.4 Command: ssh node181
	1.5 Expected result:
			trs@node181:~> cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
			trs@node181:~> chmod 0600 ~/.ssh/authorized_keys
			trs@node181:~> ssh node181
			Last login: Thu May 23 10:42:31 2019 from 10.50.144.181
			trs@node181:~> 
 
2. Configure the ordinary user's environment variables
	2.1
		##########
		vim ~/.bashrc #set user environment variables so the hadoop and java commands can be used from any directory
		Add to the file:
		export JAVA_HOME=/home/trs/Hadoop/jdk1.8.0_11
		export JRE_HOME=${JAVA_HOME}/jre
		export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
		
		export HADOOP_HOME=/home/trs/Hadoop/hadoop-3.2.0
		export CLASSPATH=$($HADOOP_HOME/bin/hadoop classpath):$CLASSPATH
		export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
		
		export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
		##########
	2.2
		source ~/.bashrc #apply the changes
	2.3
		Expected output:
		trs@node181:~> java -version
		java version "1.8.0_11"
		Java(TM) SE Runtime Environment (build 1.8.0_11-b12)
		Java HotSpot(TM) 64-Bit Server VM (build 25.11-b03, mixed mode)
		trs@node181:~> hadoop version
		Hadoop 3.2.0
		Source code repository https://github.com/apache/hadoop.git -r e97acb3bd8f3befd27418996fa5d4b50bf2e17bf
		Compiled by sunilg on 2019-01-08T06:08Z
		Compiled with protoc 2.5.0
		From source with checksum d3f0795ed0d9dc378e2c785d3668f39
		This command was run using /home/trs/Hadoop/hadoop-3.2.0/share/hadoop/common/hadoop-common-3.2.0.jar
		trs@node181:~> 





















Reposted from blog.csdn.net/qq_41953807/article/details/90477448