Note: This Hadoop cluster consists of three virtual machines, with hostnames hadoop01, hadoop02, and hadoop03.
I. Hadoop Cluster Installation
1. For a standardized layout, create the working directories:
mkdir -p /export/data/
mkdir -p /export/servers/
mkdir -p /export/software/
2. Download the JDK and Hadoop:
JDK:https://www.oracle.com/technetwork/java/javase/downloads/index.html
Hadoop:https://hadoop.apache.org/releases.html
Place the downloaded archives in the /export/software/ directory.
3. Install (extract) the JDK and Hadoop:
cd /export/software/
tar -zxvf <jdk archive> -C /export/servers/
tar -zxvf <hadoop archive> -C /export/servers/
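For example, assuming the downloaded archives are jdk-8u161-linux-x64.tar.gz and hadoop-2.7.4.tar.gz (hypothetical version numbers; substitute the files you actually downloaded):
tar -zxvf jdk-8u161-linux-x64.tar.gz -C /export/servers/
tar -zxvf hadoop-2.7.4.tar.gz -C /export/servers/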
4. For convenience, rename the JDK and Hadoop directories:
cd /export/servers/
mv <extracted jdk directory>/ jdk
mv <extracted hadoop directory>/ hadoop
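Continuing the hypothetical versions above, this would be:
mv jdk1.8.0_161/ jdk
mv hadoop-2.7.4/ hadoop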
5. Configure the JDK and Hadoop environment variables (append the following to /etc/profile):
vi /etc/profile
export JAVA_HOME=/export/servers/jdk
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export HADOOP_HOME=/export/servers/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
6. Make the changes take effect, with or without a reboot:
With a reboot: reboot
Without a reboot: source /etc/profile
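To verify that the variables are in effect, check the versions (both tools ship with the JDK and Hadoop respectively):
java -version
hadoop version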
II. Hadoop Cluster Configuration
1. On the master node, go to the etc/hadoop/ directory under the extracted Hadoop installation:
cd /export/servers/hadoop/etc/hadoop/
2. Edit the hadoop-env.sh file (set the JDK environment variable that Hadoop needs at runtime, so the Hadoop daemons can be launched):
vi hadoop-env.sh
export JAVA_HOME=/export/servers/jdk
3. Edit the core-site.xml file (set the host running the HDFS master process NameNode, i.e. the cluster's master node, and the temporary directory for data Hadoop generates at runtime):
vi core-site.xml
<configuration>
  <!-- Set the Hadoop file system, specified by a URI -->
  <property>
    <name>fs.defaultFS</name>
    <!-- The NameNode runs on the hadoop01 machine -->
    <value>hdfs://hadoop01:9000</value>
  </property>
  <!-- Hadoop's temporary directory; the default is /tmp/hadoop-${user.name} -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/export/servers/hadoop/tmp</value>
  </property>
</configuration>
4. Edit the hdfs-site.xml file (set the number of replicas for HDFS data blocks, default 3, and the HTTP address of the service hosting the Secondary NameNode):
vi hdfs-site.xml
<configuration>
  <!-- Number of HDFS replicas -->
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <!-- Host and port of the Secondary NameNode -->
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>hadoop02:50090</value>
  </property>
</configuration>
5. Edit the mapred-site.xml file (set YARN as the runtime framework for Hadoop's MapReduce):
cp mapred-site.xml.template mapred-site.xml
vi mapred-site.xml
<configuration>
  <!-- Runtime framework for MapReduce; yarn here, the default is local -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
6. Edit the yarn-site.xml file (set hadoop01 as the host running YARN's master process, the ResourceManager, and set the NodeManager's auxiliary service to mapreduce_shuffle, which is required for the stock MapReduce programs to run):
vi yarn-site.xml
<configuration>
  <!-- Address of the YARN cluster manager (ResourceManager) -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop01</value>
  </property>
  <!-- Auxiliary service the NodeManager runs for the MapReduce shuffle -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
7. Edit the slaves file (it lists the hostnames of all worker nodes in the cluster and is used by the one-click start scripts; delete the default content first):
vi slaves
hadoop01
hadoop02
hadoop03
8. Distribute the master node's configuration files to the other worker nodes:
scp /etc/profile hadoop02:/etc/profile
scp /etc/profile hadoop03:/etc/profile
scp -r /export/ hadoop02:/
scp -r /export/ hadoop03:/
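Note: these scp commands, and the one-click scripts in Section III, assume passwordless SSH from hadoop01 to every node (including hadoop01 itself). If that is not set up yet, a minimal sketch, run on hadoop01:
ssh-keygen -t rsa
ssh-copy-id hadoop01
ssh-copy-id hadoop02
ssh-copy-id hadoop03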
9. On each worker node, reload the environment:
source /etc/profile
III. Hadoop Cluster Testing
1. Format the file system (only once, when the cluster is first set up; reformatting an existing cluster destroys the HDFS metadata):
hdfs namenode -format
or, via the older (deprecated) form:
hadoop namenode -format
2. Start and stop the Hadoop cluster: one daemon at a time, per node
(1) On the master node, start and stop the HDFS NameNode process
hadoop-daemon.sh start namenode
hadoop-daemon.sh stop namenode
(2) On each worker node, start and stop the HDFS DataNode process
hadoop-daemon.sh start datanode
hadoop-daemon.sh stop datanode
(3) On the master node, start and stop the YARN ResourceManager process
yarn-daemon.sh start resourcemanager
yarn-daemon.sh stop resourcemanager
(4) On each worker node, start and stop the YARN NodeManager process
yarn-daemon.sh start nodemanager
yarn-daemon.sh stop nodemanager
(5) On the designated node hadoop02, start and stop the SecondaryNameNode process
hadoop-daemon.sh start secondarynamenode
hadoop-daemon.sh stop secondarynamenode
3. Start and stop the Hadoop cluster: one-click start and stop
(1) On the master node, start and stop all HDFS service processes
start-dfs.sh
stop-dfs.sh
(2) On the master node, start and stop all YARN service processes
start-yarn.sh
stop-yarn.sh
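After starting everything, run jps on each node to verify the daemons. Given the configuration above (NameNode and ResourceManager on hadoop01, SecondaryNameNode on hadoop02, all three hosts listed in slaves), the expected processes are:
hadoop01: NameNode, DataNode, ResourceManager, NodeManager
hadoop02: SecondaryNameNode, DataNode, NodeManager
hadoop03: DataNode, NodeManager
The web UIs should also be reachable: HDFS at http://hadoop01:50070 and YARN at http://hadoop01:8088 (the Hadoop 2.x default ports).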