1. Building a Fully Distributed Hadoop 2.x Cluster

Environment planning

192.168.1.101 cmaster0
192.168.1.102 cslave0
192.168.1.103 cslave1
All three servers run CentOS 6.8.

Configure /etc/hosts

[root@cmaster0 ~]# vi /etc/hosts
192.168.1.101   cmaster0
192.168.1.102   cslave0
192.168.1.103   cslave1

Set the system locale

[root@cmaster0 ~]# vi /etc/sysconfig/i18n 
#LANG="zh_CN.UTF-8"	
LANG="en_US.UTF-8"

Disable SELinux

[root@cmaster0 ~]# vi /etc/sysconfig/selinux 

# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
#SELINUX=enforcing
SELINUX=disabled
# SELINUXTYPE= can take one of these two values:
#     targeted - Targeted processes are protected,
#     mls - Multi Level Security protection.
#SELINUXTYPE=targeted 

Disable iptables and ip6tables

[root@cmaster0 ~]# service iptables stop
[root@cmaster0 ~]# service ip6tables stop
[root@cmaster0 ~]# chkconfig iptables off
[root@cmaster0 ~]# chkconfig ip6tables off
[root@cmaster0 ~]# chkconfig iptables --list
[root@cmaster0 ~]# chkconfig ip6tables --list

Create the hadoop account

[root@cmaster0 ~]# useradd hadoop
[root@cmaster0 ~]# passwd hadoop

Configure sudo privileges

[root@cmaster0 ~]# vi /etc/sudoers
root    ALL=(ALL)       ALL
hadoop    ALL=(ALL)       ALL
## /etc/sudoers is read-only; force the save with :wq! when exiting vi

Install the JDK

[root@cmaster0 software]# tar -zxf jdk-7u79-linux-x64.gz -C /opt/module/
[root@cmaster0 software]# vi /etc/profile
#JAVA_HOME
export JAVA_HOME=/opt/module/jdk1.7.0_79
export CLASSPATH=.:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
[root@cmaster0 jdk1.7.0_79]# source /etc/profile

Create the Hadoop installation directory

[root@cmaster0 ~]# mkdir -p /opt/module
[root@cmaster0 software]# tar -zxf hadoop-2.7.2.tar.gz -C /opt/module/
# Every machine in the cluster needs the steps above. I am using virtual machines here, so I did them once and then cloned the VM.
[root@cmaster0 software]# vi /etc/profile
#HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop-2.7.2
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
[root@cmaster0 jdk1.7.0_79]# source /etc/profile

Configure passwordless SSH

[hadoop@cmaster0 ~]$ cd
[hadoop@cmaster0 ~]$ ssh-keygen -t rsa
[hadoop@cmaster0 .ssh]$ cd ~/.ssh
[hadoop@cmaster0 .ssh]$ cp id_rsa.pub authorized_keys
## Distribute the SSH public key; this step is required on every node in the cluster
## Append each node's authorized_keys content to every other node's authorized_keys; the nodes can then ssh to each other without a password
Test ssh (on every node):
# ssh cmaster0 date
# ssh cslave0 date
# ssh cslave1 date
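Instead of copying authorized_keys contents between nodes by hand, the standard OpenSSH ssh-copy-id tool can append the local public key to each remote node's authorized_keys. A minimal sketch, run as the hadoop user on every node (hostnames as planned above):

```shell
# Append this node's public key to authorized_keys on every host in the
# cluster (including itself). You are prompted for the password once per
# host; after that, ssh between the nodes is passwordless.
for host in cmaster0 cslave0 cslave1; do
    ssh-copy-id -i ~/.ssh/id_rsa.pub "hadoop@$host"
done
```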

Cluster deployment plan

          cmaster0       cslave0          cslave1
HDFS      DataNode       DataNode         DataNode
HDFS      NameNode       /                SecondaryNameNode
YARN      NodeManager    NodeManager      NodeManager
YARN      /              ResourceManager  /

Install Hadoop 2.x

Hadoop 2.x was downloaded and extracted above; now create its data directory:
[hadoop@cmaster0 module]$ cd hadoop-2.7.2/
[hadoop@cmaster0 hadoop-2.7.2]$ mkdir -p data/tmp
Modify the configuration files
Eight configuration files are involved:
$HADOOP_HOME/etc/hadoop/hadoop-env.sh
$HADOOP_HOME/etc/hadoop/mapred-env.sh
$HADOOP_HOME/etc/hadoop/yarn-env.sh
$HADOOP_HOME/etc/hadoop/core-site.xml
$HADOOP_HOME/etc/hadoop/hdfs-site.xml
$HADOOP_HOME/etc/hadoop/mapred-site.xml
$HADOOP_HOME/etc/hadoop/yarn-site.xml
$HADOOP_HOME/etc/hadoop/slaves
A few of these files do not exist by default; create them by copying the corresponding .template files.
Configure hadoop-env.sh
# The java implementation to use.
#export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/opt/module/jdk1.7.0_79
Configure mapred-env.sh
# export JAVA_HOME=/home/y/libexec/jdk1.6.0/
export JAVA_HOME=/opt/module/jdk1.7.0_79
Configure yarn-env.sh
# export JAVA_HOME=/home/y/libexec/jdk1.6.0/
export JAVA_HOME=/opt/module/jdk1.7.0_79
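The same JAVA_HOME edit goes into all three env scripts, so it can be made in one pass with sed. A sketch, assuming GNU sed and the paths used above:

```shell
cd /opt/module/hadoop-2.7.2/etc/hadoop
# Replace any existing (possibly commented-out) JAVA_HOME export with the
# concrete JDK path in all three env scripts.
for f in hadoop-env.sh mapred-env.sh yarn-env.sh; do
    sed -i 's|^#\{0,1\} *export JAVA_HOME=.*|export JAVA_HOME=/opt/module/jdk1.7.0_79|' "$f"
done
# Verify the edit took effect in each file.
grep '^export JAVA_HOME' hadoop-env.sh mapred-env.sh yarn-env.sh
```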
Configure core-site.xml
[hadoop@cmaster0 hadoop]$ vi core-site.xml
<configuration>
	<!-- The URI of the default filesystem, i.e. the address of the HDFS NameNode -->
	<property>
		<name>fs.defaultFS</name>
		<value>hdfs://cmaster0:8020</value>
	</property>
	<!-- Base directory for files Hadoop generates at runtime -->
	<property>
		<name>hadoop.tmp.dir</name>
		<value>/opt/module/hadoop-2.7.2/data/tmp</value>
	</property>
</configuration>
Configure hdfs-site.xml
[hadoop@cmaster0 hadoop]$ vi hdfs-site.xml
<configuration>
	<!-- HDFS replication factor -->
	<property>
		<name>dfs.replication</name>
		<value>3</value>
	</property>
	<!-- Address of the server running the SecondaryNameNode -->
	<property>
		<name>dfs.namenode.secondary.http-address</name>
		<value>cslave1:50090</value>
	</property>
</configuration>
Configure mapred-site.xml
[hadoop@cmaster0 hadoop]$ cp mapred-site.xml.template mapred-site.xml
[hadoop@cmaster0 hadoop]$ vi mapred-site.xml
<configuration>
	<!-- Run MapReduce on YARN -->
	<property>
		<name>mapreduce.framework.name</name>
		<value>yarn</value>
	</property>
</configuration>
Configure yarn-site.xml
[hadoop@cmaster0 hadoop]$ vi yarn-site.xml
<configuration>
	<!-- Address of the YARN ResourceManager -->
	<property>
		<name>yarn.resourcemanager.hostname</name>
		<value>cslave0</value>
	</property>
	<!-- Auxiliary service reducers use to fetch map output -->
	<property>
		<name>yarn.nodemanager.aux-services</name>
		<value>mapreduce_shuffle</value>
	</property>
</configuration>
Configure the slaves file
[hadoop@cmaster0 hadoop]$ vi slaves
cmaster0
cslave0
cslave1

Copy Hadoop and the JDK to the other nodes

[hadoop@cmaster0 module]$ scp -r ./hadoop-2.7.2/ cslave0:/opt/module/
[hadoop@cmaster0 module]$ scp -r ./hadoop-2.7.2/ cslave1:/opt/module/
[hadoop@cmaster0 module]$ scp -r ./jdk1.7.0_79/ cslave0:/opt/module/
[hadoop@cmaster0 module]$ scp -r ./jdk1.7.0_79/ cslave1:/opt/module/
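The four scp commands can be collapsed into one loop. A sketch, assuming it is run from /opt/module on cmaster0 with passwordless ssh in place:

```shell
# Copy both the Hadoop and JDK trees to each slave node in one pass.
for host in cslave0 cslave1; do
    scp -r hadoop-2.7.2 jdk1.7.0_79 "$host:/opt/module/"
done
```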

Format the NameNode

[hadoop@cmaster0 hadoop-2.7.2]$ hdfs namenode -format

Start the Hadoop cluster

Be sure to run start-dfs.sh on cmaster0, because the NameNode is configured on cmaster0:
[hadoop@cmaster0 sbin]$ start-dfs.sh
[hadoop@cmaster0 tmp]$ jps
2973 DataNode
3225 Jps
2876 NameNode

[hadoop@cslave0 tmp]$ jps
2734 Jps
2647 DataNode

[hadoop@cslave1 tmp]$ jps
2815 Jps
2728 SecondaryNameNode
2655 DataNode
Be sure to run start-yarn.sh on cslave0, because the ResourceManager is configured on cslave0:
  • Note: if the NameNode and the ResourceManager are not on the same machine, do not start YARN on the NameNode host; start it on the machine where the ResourceManager runs.
[hadoop@cslave0 sbin]$ start-yarn.sh
[hadoop@cmaster0 tmp]$ jps
3525 Jps
2973 DataNode
3375 NodeManager
2876 NameNode

[hadoop@cslave0 tmp]$ jps
3792 Jps
2647 DataNode
3122 ResourceManager
3230 NodeManager

[hadoop@cslave1 tmp]$ jps
2970 Jps
2728 SecondaryNameNode
2860 NodeManager
2655 DataNode
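Rather than logging in to each node, the jps checks above can be run from one host over ssh. A sketch, assuming passwordless ssh is already configured between the nodes:

```shell
# Print the running Java daemons on every node in the cluster.
for host in cmaster0 cslave0 cslave1; do
    echo "== $host =="
    ssh "$host" jps
done
```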

Web UIs

# On the server hosting the NameNode:
http://192.168.1.101:50070 (HDFS web UI)

# On the server hosting the ResourceManager:
http://192.168.1.102:8088   (YARN web UI)

# SecondaryNameNode status:
http://192.168.1.103:50090/status.html
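A quick way to confirm each UI port is actually listening is bash's built-in /dev/tcp redirection (a sketch; the IPs and ports are those assumed above, and this is bash-specific, not POSIX sh):

```shell
# Probe each web UI endpoint; prints "open" or "closed" per host:port.
for hostport in 192.168.1.101:50070 192.168.1.102:8088 192.168.1.103:50090; do
    host=${hostport%:*}
    port=${hostport#*:}
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
        echo "$hostport open"
    else
        echo "$hostport closed"
    fi
done
```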

Management scripts in $HADOOP_HOME/sbin

Command            Description
hadoop-daemon.sh   Starts or stops a single HDFS daemon on the local node: hadoop-daemon.sh {start|stop} {namenode|datanode|secondarynamenode}
hadoop-daemons.sh  Runs hadoop-daemon.sh on every machine in the cluster
yarn-daemon.sh     Starts or stops a single YARN daemon on the local node: yarn-daemon.sh {start|stop} {resourcemanager|nodemanager}
yarn-daemons.sh    Runs yarn-daemon.sh on every machine in the cluster
start-all.sh       Runs start-dfs.sh then start-yarn.sh (deprecated)
stop-all.sh        Counterpart of start-all.sh
start-dfs.sh       Starts HDFS (NameNode, SecondaryNameNode, DataNode)
stop-dfs.sh        Counterpart of start-dfs.sh
start-yarn.sh      Starts YARN (ResourceManager, NodeManager)
stop-yarn.sh       Counterpart of start-yarn.sh

Official wordcount example

Prepare an input file under the home directory
[hadoop@cmaster0 ~]$ cd
[hadoop@cmaster0 ~]$ mkdir wcinput
[hadoop@cmaster0 ~]$ cd wcinput/
[hadoop@cmaster0 wcinput]$ touch wc.input
[hadoop@cmaster0 wcinput]$ vi wc.input 
[hadoop@cmaster0 wcinput]$ cat wc.input 
hadoop yarn
hadoop mapreduce 
zhangsh
zhangyu
Upload the test file to HDFS
[hadoop@cmaster0 wcinput]$ hadoop fs -mkdir -p /user/hadoop/wcinput
[hadoop@cmaster0 wcinput]$ hadoop fs -put /home/hadoop/wcinput/wc.input /user/hadoop/wcinput
[hadoop@cmaster0 wcinput]$ hadoop fs -cat /user/hadoop/wcinput/wc.input
hadoop yarn
hadoop mapreduce 
zhangsh
zhangyu
Run the MapReduce job on the HDFS input
[hadoop@cmaster0 wcinput]$ hadoop jar /opt/module/hadoop-2.7.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /user/hadoop/wcinput /user/hadoop/wcoutput
[hadoop@cmaster0 wcinput]$ hadoop fs -ls -R /user
drwxr-xr-x   - hadoop supergroup          0 2018-12-21 08:03 /user/hadoop
drwxr-xr-x   - hadoop supergroup          0 2018-12-21 08:00 /user/hadoop/wcinput
-rw-r--r--   3 hadoop supergroup         46 2018-12-21 08:00 /user/hadoop/wcinput/wc.input
drwxr-xr-x   - hadoop supergroup          0 2018-12-21 08:03 /user/hadoop/wcoutput
-rw-r--r--   3 hadoop supergroup          0 2018-12-21 08:03 /user/hadoop/wcoutput/_SUCCESS
-rw-r--r--   3 hadoop supergroup         48 2018-12-21 08:03 /user/hadoop/wcoutput/part-r-00000
[hadoop@cmaster0 wcinput]$ hadoop fs -cat /user/hadoop/wcoutput/part-r-00000
hadoop	2
mapreduce	1
yarn	1
zhangsh	1
zhangyu	1
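As a local sanity check, a rough coreutils equivalent of wordcount can be run over the same input file; the counts should match part-r-00000 above:

```shell
# Split on whitespace, drop empty lines, then count occurrences of each word.
tr -s ' ' '\n' < /home/hadoop/wcinput/wc.input | sed '/^$/d' | sort | uniq -c
```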

Reposted from blog.csdn.net/zq9017197/article/details/85269563