Deploying Hadoop 3.1.1 on DiDi Cloud

1. The cluster architecture used in this example is as follows:

[Cluster architecture diagram]

This example uses the private IPs of the DiDi Cloud instances. If Hadoop needs to be reachable from outside, bind a public IP (EIP) to the instance. For details on using DiDi Cloud EIPs, see the following link.
https://help.didiyun.com/hc/kb/section/1035272/

  • The master node holds the distributed file system's metadata (for example the inode table) as well as the resource scheduler and its records. It runs two daemons:

    NameNode: manages the distributed file system and tracks where each data block is located in the cluster.
    ResourceManager: schedules resources on the data nodes (node1 and node2 in this example); each data node runs a NodeManager that performs the actual work.
  • node1 and node2 store the actual data and provide compute resources. Each runs two daemons:
    DataNode: manages the physical storage of the actual data.
    NodeManager: manages the execution of compute tasks on its node.

2. System configuration

The DiDi Cloud virtual machines used in this example have the following specs:
2-core CPU, 4 GB RAM, 40 GB HDD, 3 Mbps bandwidth, CentOS 7.4

  • For security reasons, DiDi Cloud instances do not allow direct root login by default; log in as dc2-user first, then switch to root with sudo su. In this example all commands are run as dc2-user, which is also the default Hadoop user.
  • Write the IPs and hostnames of all three nodes into /etc/hosts on each node, and comment out the first three lines:
sudo vi /etc/hosts
#127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
#::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
#127.0.0.1 10-254-149-24
10.254.149.24   master
10.254.88.218   node1
10.254.84.165   node2
  • The master node needs passwordless SSH connections to node1 and node2. Generate a key pair for dc2-user on the master node:
ssh-keygen -b 4096
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa): 
Created directory '/home/hadoop/.ssh'.
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:zRhhVpEfSIZydqV75775sZB0GBjZ/f7nnZ4mgfYrWa8 hadoop@10-254-149-24
The key's randomart image is:
+---[RSA 4096]----+
|        ++=*+ .  |
|      .o+o+o+. . |
|       +...o o  .|
|         = .. o .|
|        S + oo.o |
|           +.=o .|
|          . +o+..|
|           o +.+O|
|            .EXO=|
+----[SHA256]-----+

Run the following commands to copy the generated public key to all three nodes:

ssh-copy-id -i $HOME/.ssh/id_rsa.pub dc2-user@master
ssh-copy-id -i $HOME/.ssh/id_rsa.pub dc2-user@node1
ssh-copy-id -i $HOME/.ssh/id_rsa.pub dc2-user@node2

From the master node, run ssh dc2-user@node1 and ssh dc2-user@node2 to verify that you can connect without being prompted for a password.
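The manual checks above can also be scripted; a minimal sketch, assuming the /etc/hosts entries and key distribution above are in place (BatchMode makes ssh fail instead of prompting for a password):

```shell
# Try a no-op command on each node; BatchMode=yes fails fast instead of
# prompting, so a node that still wants a password shows up as FAILED.
for host in master node1 node2; do
  if ssh -o BatchMode=yes -o ConnectTimeout=3 dc2-user@"$host" true 2>/dev/null; then
    echo "$host: passwordless SSH OK"
  else
    echo "$host: FAILED"
  fi
done
```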

  • Configure the Java environment

Download the JDK on all three nodes:

mkdir /home/dc2-user/java
cd /home/dc2-user/java
wget --no-check-certificate --no-cookies --header "Cookie: oraclelicense=accept-securebackup-cookie" http://download.oracle.com/otn-pub/java/jdk/8u191-b12/2787e4a523244c269598db4e85c51e0c/jdk-8u191-linux-x64.tar.gz
tar -zxf jdk-8u191-linux-x64.tar.gz

Configure the Java environment variables on all three nodes:

sudo vi /etc/profile.d/jdk-1.8.sh
export JAVA_HOME=/home/dc2-user/java/jdk1.8.0_191
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH

Make the environment variables take effect:

source /etc/profile

Check the Java version:

java -version
java version "1.8.0_191"
Java(TM) SE Runtime Environment (build 1.8.0_191-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.191-b12, mixed mode)

If you see output like the above, the Java environment has been configured successfully.

3. Installing Hadoop

Download Hadoop 3.1.1 on the master node and unpack it:

cd /home/dc2-user
wget http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-3.1.1/hadoop-3.1.1.tar.gz
tar zxf hadoop-3.1.1.tar.gz

Six files under /home/dc2-user/hadoop-3.1.1/etc/hadoop need to be configured: hadoop-env.sh, core-site.xml, hdfs-site.xml, yarn-site.xml, mapred-site.xml, and workers.
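Before editing, a quick loop can confirm all six files are in place (using the path the tarball was unpacked to above):

```shell
# Check that each of the six configuration files exists under the conf directory.
conf=/home/dc2-user/hadoop-3.1.1/etc/hadoop
for f in hadoop-env.sh core-site.xml hdfs-site.xml yarn-site.xml mapred-site.xml workers; do
  if [ -f "$conf/$f" ]; then
    echo "$f: present"
  else
    echo "$f: MISSING"
  fi
done
```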

(1) Add the following to hadoop-env.sh:

export JAVA_HOME=/home/dc2-user/java/jdk1.8.0_191
export HDFS_NAMENODE_USER="dc2-user"
export HDFS_DATANODE_USER="dc2-user"
export HDFS_SECONDARYNAMENODE_USER="dc2-user"
export YARN_RESOURCEMANAGER_USER="dc2-user"
export YARN_NODEMANAGER_USER="dc2-user"

(2) core-site.xml

    <configuration>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://master:9000</value>
        </property>
    </configuration>

(3) hdfs-site.xml

<configuration>
        <property>
            <name>dfs.namenode.name.dir</name>
            <value>/home/dc2-user/data/nameNode</value>
        </property>
        <property>
            <name>dfs.datanode.data.dir</name>
            <value>/home/dc2-user/data/dataNode</value>
        </property>
        <property>
            <name>dfs.replication</name>
            <value>1</value>
       </property>
</configuration>

(4) yarn-site.xml

<configuration>
    <property>
            <name>yarn.acl.enable</name>
            <value>0</value>
    </property>
    <property>
            <name>yarn.resourcemanager.hostname</name>
            <value>master</value>
    </property>
    <property>
          <name>yarn.resourcemanager.webapp.address</name>
          <value>master:8088</value>
    </property>
    <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
    </property>
     <property>
	<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
	<value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>

</configuration>

(5) mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
    <property>
        <name>mapreduce.map.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
    <property>
        <name>mapreduce.reduce.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
    <property>
        <name>mapreduce.map.memory.mb</name>
        <value>1536</value>
    </property>
    <property>
        <name>mapreduce.map.java.opts</name>
        <value>-Xmx1024M</value>
    </property>
    <property>
        <name>mapreduce.reduce.memory.mb</name>
        <value>3072</value>
    </property>
    <property>
        <name>mapreduce.reduce.java.opts</name>
        <value>-Xmx2560M</value>
    </property>
  
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>master:10020</value>
    </property>

    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>master:19888</value>
    </property>
</configuration>

(6) Edit workers:

node1
node2

4. Starting Hadoop

  • Copy the configured Hadoop directory to node1 and node2:
scp -r /home/dc2-user/hadoop-3.1.1 dc2-user@node1:/home/dc2-user/
scp -r /home/dc2-user/hadoop-3.1.1 dc2-user@node2:/home/dc2-user/
  • Configure the Hadoop environment variables (on all three nodes):
sudo vi /etc/profile.d/hadoop-3.1.1.sh
export HADOOP_HOME="/home/dc2-user/hadoop-3.1.1"
export PATH="$HADOOP_HOME/bin:$PATH"
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
  • Make the environment variables take effect:
source /etc/profile
  • Run hadoop version on all three nodes and check for output, to verify that the environment variables took effect:
hadoop version
Hadoop 3.1.1
Source code repository https://github.com/apache/hadoop -r 2b9a8c1d3a2caf1e733d57f346af3ff0d5ba529c
Compiled by leftnoteasy on 2018-08-02T04:26Z
Compiled with protoc 2.5.0
From source with checksum f76ac55e5b5ff0382a9f7df36a3ca5a0
This command was run using /home/dc2-user/hadoop-3.1.1/share/hadoop/common/hadoop-common-3.1.1.jar
  • Format HDFS (on the master node only):
/home/dc2-user/hadoop-3.1.1/bin/hdfs namenode -format testCluster
  • Start the services:
/home/dc2-user/hadoop-3.1.1/sbin/start-dfs.sh
/home/dc2-user/hadoop-3.1.1/sbin/start-yarn.sh
  • Check that the services have started on all three nodes:

master

jps
1654 Jps
31882 NameNode
32410 ResourceManager
32127 SecondaryNameNode

node1

jps
19827 NodeManager
19717 DataNode
20888 Jps

node2

jps
30707 Jps
27675 NodeManager
27551 DataNode

If you see results like the above, the services have started correctly. You can now reach the ResourceManager web UI through the master's public IP; note that port 8088 must be open in the security group. For details on DiDi Cloud security groups, see the following link. https://help.didiyun.com/hc/kb/article/1091031/
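The per-node jps checks above can also be run from the master in one loop; a sketch assuming passwordless SSH and the JDK path used earlier in this guide:

```shell
# Run jps on every node over SSH and print whatever daemons it reports.
jdk=/home/dc2-user/java/jdk1.8.0_191   # JDK location from the Java setup step
for host in master node1 node2; do
  echo "== $host =="
  ssh -o BatchMode=yes -o ConnectTimeout=3 dc2-user@"$host" "$jdk/bin/jps" 2>/dev/null \
    || echo "(could not reach $host)"
done
```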
[ResourceManager web UI screenshot]

5. Verification with an example job

Finally, use the wordcount program bundled with Hadoop to verify that MapReduce works. Run the following steps on the master node.
First create two files, test1 and test2, in the current directory with the following contents:

vi test1
hello world
bye world
vi test2
hello hadoop
bye hadoop

Next, create a directory in HDFS and upload the two files into it:

hadoop fs -mkdir /input
hadoop fs -put test* /input

The NameNode enters safe mode when the cluster starts; it normally leaves safe mode on its own once enough blocks have been reported, but if it is still in safe mode you can leave it manually:

hdfs dfsadmin -safemode leave

Run the wordcount program to count how many times each word appears in the two files:

yarn jar /home/dc2-user/hadoop-3.1.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.1.jar wordcount /input /output


WARNING: YARN_CONF_DIR has been replaced by HADOOP_CONF_DIR. Using value of YARN_CONF_DIR.
2018-11-09 20:27:12,233 INFO client.RMProxy: Connecting to ResourceManager at master/10.254.149.24:8032
2018-11-09 20:27:12,953 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/root/.staging/job_1541766351311_0001
2018-11-09 20:27:14,483 INFO input.FileInputFormat: Total input files to process : 2
2018-11-09 20:27:16,967 INFO mapreduce.JobSubmitter: number of splits:2
2018-11-09 20:27:17,014 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enab
2018-11-09 20:27:17,465 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1541766351311_0001
2018-11-09 20:27:17,466 INFO mapreduce.JobSubmitter: Executing with tokens: []
2018-11-09 20:27:17,702 INFO conf.Configuration: resource-types.xml not found
2018-11-09 20:27:17,703 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2018-11-09 20:27:18,256 INFO impl.YarnClientImpl: Submitted application application_1541766351311_0001
2018-11-09 20:27:18,296 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1541766351311_0001/
2018-11-09 20:27:18,297 INFO mapreduce.Job: Running job: job_1541766351311_0001
2018-11-09 20:28:24,929 INFO mapreduce.Job: Job job_1541766351311_0001 running in uber mode : false
2018-11-09 20:28:24,931 INFO mapreduce.Job:  map 0% reduce 0%
2018-11-09 20:28:58,590 INFO mapreduce.Job:  map 50% reduce 0%
2018-11-09 20:29:19,437 INFO mapreduce.Job:  map 100% reduce 0%
2018-11-09 20:29:33,038 INFO mapreduce.Job:  map 100% reduce 100%
2018-11-09 20:29:36,315 INFO mapreduce.Job: Job job_1541766351311_0001 completed successfully
2018-11-09 20:29:36,619 INFO mapreduce.Job: Counters: 54
	File System Counters
		FILE: Number of bytes read=75
		FILE: Number of bytes written=644561
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=237
		HDFS: Number of bytes written=31
		HDFS: Number of read operations=11
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters 
		Killed map tasks=1
		Launched map tasks=3
		Launched reduce tasks=1
		Data-local map tasks=3
		Total time spent by all maps in occupied slots (ms)=164368
		Total time spent by all reduces in occupied slots (ms)=95475
		Total time spent by all map tasks (ms)=82184
		Total time spent by all reduce tasks (ms)=31825
		Total vcore-milliseconds taken by all map tasks=82184
		Total vcore-milliseconds taken by all reduce tasks=31825
		Total megabyte-milliseconds taken by all map tasks=168312832
		Total megabyte-milliseconds taken by all reduce tasks=97766400
	Map-Reduce Framework
		Map input records=5
		Map output records=8
		Map output bytes=78
		Map output materialized bytes=81
		Input split bytes=190
		Combine input records=8
		Combine output records=6
		Reduce input groups=4
		Reduce shuffle bytes=81
		Reduce input records=6
		Reduce output records=4
		Spilled Records=12
		Shuffled Maps =2
		Failed Shuffles=0
		Merged Map outputs=2
		GC time elapsed (ms)=2230
		CPU time spent (ms)=2280
		Physical memory (bytes) snapshot=756064256
		Virtual memory (bytes) snapshot=10772656128
		Total committed heap usage (bytes)=541589504
		Peak Map Physical memory (bytes)=281268224
		Peak Map Virtual memory (bytes)=3033423872
		Peak Reduce Physical memory (bytes)=199213056
		Peak Reduce Virtual memory (bytes)=4708827136
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=47
	File Output Format Counters 
		Bytes Written=31

If the following output appears, the job has completed; the results are stored in the /output directory in HDFS:

hadoop fs -ls /output
Found 2 items
-rw-r--r--   1 root supergroup          0 2018-11-09 20:29 /output/_SUCCESS
-rw-r--r--   1 root supergroup         31 2018-11-09 20:29 /output/part-r-00000

View the results in part-r-00000:

hadoop fs -cat /output/part-r-00000
bye	2
hadoop	2
hello	2
world	2
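As a cluster-free cross-check, the same counts can be reproduced locally with standard text tools on identical input (a sanity check only, not part of the deployment):

```shell
# Recreate test1/test2 locally and count words with sort | uniq -c.
printf 'hello world\nbye world\n'   > /tmp/test1
printf 'hello hadoop\nbye hadoop\n' > /tmp/test2
cat /tmp/test1 /tmp/test2 | tr ' ' '\n' | sort | uniq -c | awk '{print $2"\t"$1}'
# Prints: bye 2, hadoop 2, hello 2, world 2 — matching part-r-00000 above.
```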

Reposted from blog.csdn.net/java060515/article/details/84302102