[Big Data] Hadoop fully distributed configuration (super detailed)

Overview

  1. Hadoop fully distributed configuration - the specific steps are as follows

Default prerequisites:
1. VMware is installed on the Windows host (assumed already installed).
2. A Linux operating system is installed in VMware (assumed already installed).
Here I have installed VMware 16 and CentOS 7.

1. Prepare Linux

  1. Network settings: NAT mode
    Edit --> Virtual Network Editor --> Change settings (requires administrator privileges)

·Bridged: in bridged mode the virtual machine and the host are peers on the network, as if both were plugged into the same switch.
·NAT: in NAT mode the virtual machine must go through the host before it can communicate with the outside world.
·Host-only: the virtual machine is connected directly to the host only.

If you are deploying across different physical machines over a wired network, choose bridged mode.
If you choose NAT mode to build a LAN on a single machine, the IPs must also be set by hand.


  2. Set a static IP

    ·Command 1: vi /etc/sysconfig/network-scripts/ifcfg-ens33
    ·Command 2: service network restart	# restart the network so the settings above take effect
    ·Command 3: ping www.baidu.com	# test whether the static IP works
    

    (1) Execute command 1 to modify the configuration file.

     First, change the line:	BOOTPROTO=static
     Then, add the lines:	HWADDR=<the MAC address copied from the VM's settings>
     			IPADDR=192.168.111.131
     			GATEWAY=192.168.111.254
     			DNS1=8.8.8.8

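    For reference, a minimal complete ifcfg-ens33 might look like the sketch below; it uses the addresses from this tutorial and omits the HWADDR/UUID lines, so adapt everything to your own network:

    	TYPE=Ethernet
    	NAME=ens33
    	DEVICE=ens33
    	ONBOOT=yes	# bring the interface up at boot
    	BOOTPROTO=static	# static instead of dhcp
    	IPADDR=192.168.111.131
    	NETMASK=255.255.255.0
    	GATEWAY=192.168.111.254
    	DNS1=8.8.8.8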

(2) Execute command 2 to restart the network so the settings take effect.


(3) Execute command 3 to test whether the static IP setting works.

  3. Modify the hostname

    ·Command 1: vim /etc/hostname
    Edit the name directly.
    Reboot the system (reboot) for the change to take effect.
    
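    As a side note, CentOS 7 also provides hostnamectl, which sets the hostname without editing the file or rebooting:

    	hostnamectl set-hostname AY01	# takes effect immediately; open a new shell to see it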


  4. Turn off the firewall

    ·Permanent (survives a reboot):
    	enable:	systemctl enable firewalld
    	disable:	systemctl disable firewalld
    ·Immediate (reverts after a reboot):
    	start:	systemctl start firewalld
    	stop:	systemctl stop firewalld
    ·Check the firewall's running status: systemctl status firewalld

    Permanently disable the firewall, reboot the system, and then use systemctl status firewalld to confirm that it is no longer running.

2. Install JDK

Xshell and Xftp are assumed to be installed on Windows already.
The operations below are performed through Xshell and Xftp.
  1. First, connect with Xshell

  2. Use Xftp (launched from Xshell) to upload the JDK from Windows to the Linux system

    Preparation: create two folders under /opt
    	cd /opt
    	mkdir module	# module holds the unpacked files
    	mkdir source	# source holds the original archives
    
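    Both folders can also be created in one step:

    	mkdir -p /opt/module /opt/source	# -p creates missing parents and ignores existing dirs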


  3. Unpack the JDK

    ·Command: tar -zxvf jdk-8u181-linux-x64.tar.gz
    ·Rename the folder: mv jdk1.8.0_181 jdk1.8
    ·Move it into the module folder: mv jdk1.8 ../module
    


  4. Configure the environment variables

    Method 1: edit the .bash_profile file
    ·vim ~/.bash_profile
    ·Append the following to the end of .bash_profile:
    	export JAVA_HOME=/opt/module/jdk1.8
    	export PATH=$JAVA_HOME/bin:$PATH
    ·Then run the following so the environment variables take effect immediately:
    	source ~/.bash_profile

    Method 2: edit the /etc/profile file
    ·vim /etc/profile
    ·Append the following to the end of profile:
    	export JAVA_HOME=/opt/module/jdk1.8
    	export PATH=$JAVA_HOME/bin:$PATH
    ·Then run the following so the configuration takes effect immediately:
    	source /etc/profile

    ·Run the following to check the Java version:
    	java -version

    ·Method 1 is safer: it limits the environment variables to the user level, so if you want a particular user to have them, you only need to edit the .bash_profile in that user's home directory.
    ·Method 2 is more convenient: every user's shell gets the variables. It is recommended only when the machine is used purely for development, since otherwise it may raise security concerns.
    
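    To confirm the shell actually picked the variables up, a quick check:

    	echo $JAVA_HOME	# should print /opt/module/jdk1.8
    	which java	# should resolve to /opt/module/jdk1.8/bin/java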

    Here I use method 2.

3. Clone two virtual machines

We have now finished preparing virtual machine AY01 and installing the JDK.
Next we will clone that virtual machine.

Right-click the VM --> Manage --> Clone, then step through the clone wizard.
Select "Create Full Clone", and set the new virtual machine's name and location yourself.
Then repeat the same operation to create the third virtual machine.

After cloning, a few small adjustments are needed:
·Modify the hostname and IP address of VM AY02 and VM AY03
	Modify the hostname: vim /etc/hostname	# edit directly; the new hostname takes effect after a reboot
	Modify the IP address: vim /etc/sysconfig/network-scripts/ifcfg-ens33	# set the IPs of AY02 and AY03 to 192.168.111.132 and 192.168.111.133 respectively
·Modify the hosts file on all three systems so that each IP maps to its hostname
	vim /etc/hosts
	The content is as follows:
	192.168.111.131 AY01
	192.168.111.132 AY02
	192.168.111.133 AY03
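Once the hosts files are in place, name resolution can be checked from any node, for example:
	ping -c 3 AY02	# -c 3 sends three packets and stops
	ping -c 3 AY03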

(1) Modify the hostnames and IP addresses of AY02 and AY03, then reboot each VM for the changes to take effect.
(2) Modify the hosts files of the three systems (note: there are three systems, and each gets the same content shown above).
So far, we have completed the cloning part.

4. Password-free login

With the previous steps complete, we can now use Xshell to remotely control VM AY02 and VM AY03 as well.
The operations below are performed in Xshell (they can of course also still be done directly on the VMware console).
Preparation: first make sure the hostnames, hosts files, and firewall settings are correct.
  1. Generate each node's own public and private keys (that is, run the following two commands on every node):

    (1) Go to the home directory: cd ~
    (2) Generate the key pair: ssh-keygen -t rsa

    	·Enter the .ssh directory: cd .ssh
    Run the command above and press Enter three times without typing anything in between; two files are then generated in the .ssh directory: id_rsa (the private key) and id_rsa.pub (the public key).


  2. Send the public key to AY01, AY02, and AY03 respectively

    ·Command 1: ssh-copy-id AY01
    ·Command 2: ssh-copy-id AY02
    ·Command 3: ssh-copy-id AY03
    

    Test it: you should now be able to ssh into each node without a password prompt; a quick check is sketched below.
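    A minimal check of passwordless login to all three nodes (assuming the hostnames used above):

    	for host in AY01 AY02 AY03; do ssh $host hostname; done	# should print the three hostnames without any password prompt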

  3. Repeat the above operations on AY02 and AY03.
    So far, we have completed password-free login among the three systems.

5. Install Hadoop

  1. Use Xftp (from within Xshell) to upload the hadoop-2.7.2 archive from Windows to the /opt/source folder of virtual machine AY01

  2. Unpack the hadoop-2.7.2 archive

    ·Unpack command: tar -zxvf hadoop-2.7.2.tar.gz
    ·Move it into the /opt/module folder: mv hadoop-2.7.2 ../module
    


  3. Configure Hadoop environment variables

    Method 1:
    ·First enter the hadoop configuration folder:
    	cd /opt/module/hadoop-2.7.2/etc/hadoop
    ·Edit the hadoop-env.sh configuration file:
    	1) Change export JAVA_HOME=${JAVA_HOME} to export JAVA_HOME=/opt/module/jdk1.8
    	2) Add the following two lines:
    		export HADOOP_HOME=/opt/module/hadoop-2.7.2
    		export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

    Method 2:
    ·First enter the hadoop configuration folder:
    	cd /opt/module/hadoop-2.7.2/etc/hadoop
    ·Edit the hadoop-env.sh configuration file:
    	1) Change export JAVA_HOME=${JAVA_HOME} to export JAVA_HOME=/opt/module/jdk1.8
    ·Then edit the /etc/profile file and add the following two lines after the JDK configuration:
    	export HADOOP_HOME=/opt/module/hadoop-2.7.2
    	export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
    
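    After editing /etc/profile, reload it and sanity-check the installation:

    	source /etc/profile
    	hadoop version	# should report Hadoop 2.7.2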

    Here I use the second method.
    At this point we have only finished installing and configuring Hadoop on AY01. Next we will fill in Hadoop's configuration files and then copy the fully configured Hadoop to AY02 and AY03.

Cluster deployment plan:

        AY01          AY02              AY03
HDFS    NameNode                        SecondaryNameNode
        DataNode      DataNode          DataNode
YARN    NodeManager   NodeManager       NodeManager
                      ResourceManager

6. Configure the Hadoop configuration files

A total of 7 configuration files need to be set up (hadoop-env.sh was already handled during installation), and all of them live in the /opt/module/hadoop-2.7.2/etc/hadoop folder.
For the XML files, the configuration goes between the <configuration></configuration> tags.
  1. core-site.xml

    ·vim core-site.xml
    Copy the following between the <configuration></configuration> tags in the file; do not copy the comments, only the <property></property> parts.
    <!-- Address of the NameNode in HDFS -->
    <property>
    	<name>fs.defaultFS</name>
    	<value>hdfs://AY01:9000</value>
    </property>

    <!-- Storage directory for files Hadoop generates at runtime -->
    <property>
    	<name>hadoop.tmp.dir</name>
    	<value>/opt/module/hadoop-2.7.2/data/tmp</value>
    </property>
    


  2. There are 3 files in the HDFS section:
    (2.1) hadoop-env.sh

    (This file was already configured when installing Hadoop, so this step can be skipped.)
    ·Change: export JAVA_HOME=/opt/module/jdk1.8

    (2.2) hdfs-site.xml

    ·vim hdfs-site.xml
    ·Add the following content:
    <property>
    	<name>dfs.replication</name>
    	<value>3</value>
    </property>
    <property>
    	<name>dfs.namenode.secondary.http-address</name>
    	<value>AY03:50090</value>
    </property>
    


    (2.3) slaves

    ·vim slaves
    ·Delete the existing localhost line and enter the cluster node names:
    	AY01
    	AY02
    	AY03
    


  3. yarn
    (3.1) yarn-env.sh

    ·vim yarn-env.sh
    ·Modify as follows:
    	Change # export JAVA_HOME=/home/y/libexec/jdk1.6.0/ to export JAVA_HOME=/opt/module/jdk1.8
    	(that is, uncomment it and point it at the actual JDK path)
    


    (3.2) yarn-site.xml

    ·vim yarn-site.xml
    ·Add the <property></property> parts below:
    <!-- How reducers fetch data -->
    	<property>
    		<name>yarn.nodemanager.aux-services</name>
    		<value>mapreduce_shuffle</value>
    	</property>
    <!-- Address of YARN's ResourceManager -->
    	<property>
    		<name>yarn.resourcemanager.hostname</name>
    		<value>AY02</value>
    	</property>
    


  4. mapreduce
    (4.1) mapred-env.sh

    ·vim mapred-env.sh
    ·Modify as follows:
    	Change # export JAVA_HOME=/home/y/libexec/jdk1.6.0/ to export JAVA_HOME=/opt/module/jdk1.8
    

    (4.2) mapred-site.xml

    ·There is no mapred-site.xml in the hadoop folder, but there is a mapred-site.xml.template file.
    Copy that file, naming the copy mapred-site.xml:
    	cp mapred-site.xml.template mapred-site.xml
    ·Then open mapred-site.xml
    ·and add the <property></property> part:
    <!-- Run MapReduce on YARN -->
    <property>
    	<name>mapreduce.framework.name</name>
    	<value>yarn</value>
    </property>
    


  5. Distribute the Hadoop folder

    With all files configured, we now need to copy the hadoop folder from node AY01 to the corresponding location on AY02 and AY03.
    ·First go back to the /opt/module folder:
    	cd /opt/module
    ·Then run the following commands:
    	scp -r ./hadoop-2.7.2/ AY02:/opt/module
    	scp -r ./hadoop-2.7.2/ AY03:/opt/module
    
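    One caveat: if you used method 2 for the environment variables, /etc/profile on AY02 and AY03 needs the same JDK and Hadoop entries. One option is to distribute that file as well:

    	scp /etc/profile AY02:/etc/profile
    	scp /etc/profile AY03:/etc/profile
    	# then run `source /etc/profile` once on each node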

So far, the Hadoop configurations of the three systems are completely consistent.

7. Start the services

  1. Format the NameNode

    According to the cluster deployment plan above, the NameNode lives on node AY01, so formatting only needs to happen on AY01.
    ·First, enter the hadoop directory: cd /opt/module/hadoop-2.7.2
    ·Format command: bin/hdfs namenode -format
    
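    If the format ever has to be repeated (for example after a failed first attempt), first stop all services and clear the runtime data and log directories on every node; otherwise the DataNodes may refuse to join because their stored clusterID no longer matches the NameNode's newly generated one:

    	rm -rf /opt/module/hadoop-2.7.2/data /opt/module/hadoop-2.7.2/logs	# on each node, before re-formatting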


  2. Start HDFS

    ·Start command: sbin/start-dfs.sh
    ·Use the jps command to check whether the daemons started; the expected processes are sketched below.
    
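    Going by the deployment plan, jps should show roughly the following on each node at this point (process IDs omitted):

    	# AY01: NameNode, DataNode
    	# AY02: DataNode
    	# AY03: DataNode, SecondaryNameNode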


  3. Start YARN

    According to the deployment plan, YARN's ResourceManager sits on node AY02, so YARN must be started on AY02.
    ·Start command: sbin/start-yarn.sh
    ·Use the jps command to check whether the daemons started; see the sketch below.
    

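    With both HDFS and YARN running, jps on the three nodes should now show approximately:

    	# AY01: NameNode, DataNode, NodeManager
    	# AY02: DataNode, ResourceManager, NodeManager
    	# AY03: DataNode, SecondaryNameNode, NodeManager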

  4. Stop commands

    ·Stop YARN: sbin/stop-yarn.sh
    ·Stop HDFS: sbin/stop-dfs.sh

    Hadoop start/stop command reference:
    1. start-all.sh
     Starts all Hadoop daemons (NameNode, SecondaryNameNode, DataNode, JobTracker, TaskTracker).
    2. stop-all.sh
     Stops all Hadoop daemons (NameNode, SecondaryNameNode, DataNode, JobTracker, TaskTracker).
    3. start-dfs.sh
     Starts the HDFS daemons: NameNode, SecondaryNameNode, and DataNode.
    4. stop-dfs.sh
     Stops the HDFS daemons: NameNode, SecondaryNameNode, and DataNode.

    5. hadoop-daemon.sh start namenode
     Starts the NameNode daemon alone.
    6. hadoop-daemon.sh stop namenode
     Stops the NameNode daemon alone.
    7. hadoop-daemon.sh start datanode
     Starts the DataNode daemon alone.
    8. hadoop-daemon.sh stop datanode
     Stops the DataNode daemon alone.
    9. hadoop-daemon.sh start secondarynamenode
     Starts the SecondaryNameNode daemon alone.
    10. hadoop-daemon.sh stop secondarynamenode
     Stops the SecondaryNameNode daemon alone.
    11. start-mapred.sh
     Starts the MapReduce daemons JobTracker and TaskTracker.
    12. stop-mapred.sh
     Stops the MapReduce daemons JobTracker and TaskTracker.
    13. hadoop-daemon.sh start jobtracker
     Starts the JobTracker daemon alone.
    14. hadoop-daemon.sh stop jobtracker
     Stops the JobTracker daemon alone.
    15. hadoop-daemon.sh start tasktracker
     Starts the TaskTracker daemon alone.
    16. hadoop-daemon.sh stop tasktracker
     Stops the TaskTracker daemon alone.

    (Note: JobTracker, TaskTracker, and start/stop-mapred.sh belong to Hadoop 1.x. On Hadoop 2.x, such as the 2.7.2 used here, MapReduce runs on YARN, so the equivalents are start-yarn.sh / stop-yarn.sh and yarn-daemon.sh start|stop resourcemanager|nodemanager.)

At this point, the fully distributed construction of the Hadoop cluster is complete!

8. Test a jar on the cluster: the wordcount example

  1. Create a word.txt file locally

    ·Command 1: touch word.txt
    ·Command 2: vi word.txt
    Enter:
    小明 小张 小李 小明
    张三 王五 张三 小李
    


  2. Create an input folder in the HDFS root directory

    ·Command: bin/hdfs dfs -mkdir /input

  3. Upload the word.txt file to the input folder on HDFS

    ·Command: bin/hdfs dfs -put ./word.txt /input

  4. Check whether the upload succeeded

    ·Command: bin/hdfs dfs -ls /input
    


  5. Run the wordcount example

    Before running this command, confirm once more that the firewall is off on every node in the cluster!

    ·Command: bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /input /output

    ·The output folder contains 2 files.
     List them with: bin/hdfs dfs -ls /output
    ·Print the contents of the files in the output folder:
     bin/hdfs dfs -cat /output/*
    
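    For the word.txt above, the -cat output should look roughly like the sketch below: one word per line with a tab-separated count (the exact ordering depends on how the keys sort):

    	小张	1
    	小明	2
    	小李	2
    	张三	2
    	王五	1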


So far, the basic fully distributed cluster has been successfully established!

  1. Delete the output folder from the file system (good to know)
    ·Delete command: bin/hdfs dfs -rm -r /output

  2. Delete the part-r-00000 file inside the output folder (good to know)
    ·Delete command: bin/hdfs dfs -rm /output/part-r-00000

  3. View executed cluster jobs on YARN's web page
    Since YARN is deployed on node AY02, whose static IP is 192.168.111.132,
    we can open http://192.168.111.132:8088/cluster in a browser
    to see the jobs that have been executed on YARN.
    

Issues encountered

  1. While putting this together I referred to CentOS 6 documentation, and some CentOS 7 commands differ from it, such as modifying the hostname and turning off the firewall.
  2. CentOS 7 could not connect to Xshell.

Origin: blog.csdn.net/weixin_45954198/article/details/128461645