hadoop2.6分布式环境搭建

1. 前言

在3个系统centos6.5的linux虚拟机搭建一个分布式hadoop环境，hadoop版本为2.6，节点ip分别为

192.168.17.133 
192.168.17.134 
192.168.17.135

2. 配置hosts文件

分别在3个节点上配置/etc/hosts文件，内容如下：

192.168.17.133 master 
192.168.17.134 slave1 
192.168.17.135 slave2 

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

3. 安装java环境

这里选择安装jdk1.7版本，安装成功后配置一下环境变量：

export JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

4. 关闭防火墙与selinux

临时关闭防火墙：

service iptables stop

永久关闭防火墙重启生效：

chkconfig iptables off

临时关闭Selinux:

setenforce 0

永久关闭Selinux:

编辑/etc/selinux/config,设置SELINUX=disabled

配置完成后重启各节点使配置生效

5. 配置免密码登录

首先到用户主目录（cd ~），ls -a查看文件，其中一个为“.ssh”，该文件价是存放密钥的。待会我们生成的密钥都会放到这个文件夹中。
现在执行命令生成密钥： ssh-keygen -t rsa -P "" (使用rsa加密方式生成密钥)回车后，会提示三次输入信息，我们直接回车即可。
进入文件夹cd .ssh (进入文件夹后可以执行ls -a 查看文件)
将生成的公钥id_rsa.pub内容追加到authorized_keys,执行命令：cat id_rsa.pub >> authorized_keys
把各个节点的authorized_keys的内容互相拷贝加入到对方的此文件中，然后就可以免密码彼此ssh连入

6. NTP服务器搭建

这里以master为ntp服务器，slave1,slave2作为客户端

首先在master上编辑/etc/ntp.conf配置文件，只显示部分内容：

# For more information about this file, see the man pages
# ntp.conf(5), ntp_acc(5), ntp_auth(5), ntp_clock(5), ntp_misc(5), ntp_mon(5).

driftfile /var/lib/ntp/drift

# Permit time synchronization with our time source, but do not
# permit the source to query or modify the service on this system.
#restrict default kod nomodify notrap nopeer noquery
#restrict -6 default kod nomodify notrap nopeer noquery
restrict default nomodify

# Permit all access over the loopback interface.  This could
# be tightened as well, but to do so would effect some of
# the administrative functions.
restrict 127.0.0.1
restrict -6 ::1

# Hosts on local network are less restricted.
#restrict 192.168.1.0 mask 255.255.255.0 nomodify notrap

# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
#server 0.centos.pool.ntp.org iburst
#server 1.centos.pool.ntp.org iburst
#server 2.centos.pool.ntp.org iburst
#server 3.centos.pool.ntp.org iburst
server     127.127.1.0 #local clock
fudge    127.127.1.0 stratum 8

在服务端master启动httpd服务：service ntpd start，设置开机启动：chkconfig ntpd on
在客户端slave1,slave2停止ntpd服务：service ntpd stop，并手动测试能否同步master：ntpdate master，成功的话如下图：
在slave1,slave2添加定时同步任务，执行crontab -e添加以下内容：

*/1 * * * * /usr/sbin/ntpdate master;hwclock –w

7. hadoop安装

以上的步骤都是基础环境配置，现在正式进入hadoop环境的配置

7.1 解压并配置环境变量

到官网下载安装包后解压到/usr/hadoop-2.6.0，然后配置一下环境变量，内容如下：

export HADOOP_HOME=/usr/hadoop-2.6.0
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

7.2 配置hadoop-env.sh、mapred-env.sh、yarn-env.sh

到$HADOOP_HOME/etc/hadoop目录下修改这三个文件，添加JAVA_HOME配置：

export JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera

7.3 配置slaves

根据实际的部署修改slaves文件，这里以slave1,slave2两台机子作为集群的slave，则内容为：

slave1
slave2

7.4 配置core-site.xml

<property>
    <name>hadoop.tmp.dir</name>
    <value>/hadoop/tmp</value>
</property>
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
</property>

<!-- root用户问题
<property>
    <name>hadoop.proxyuser.root.groups</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.root.hosts</name>
    <value>*</value>
</property>
-->

7.5 配置hdfs-site.xml

<property>
    <name>dfs.replication</name>
    <value>3</value>
</property>
<property>
    <name>dfs.namenode.name.dir</name>
    <value>/hadoop/dfs/name</value>
</property>
<property>
    <name>dfs.datannode.data.dir</name>
    <value>/hadoop/dfs/data</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>

7.6 配置mapred-site.xml

<property>                                                                  
<name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
<property>
    <name>mapreduce.jobhistory.address</name>
    <value>master:10020</value>
</property>
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master:19888</value>
</property>

7.7 配置yarn-site.xml

<property>
       <name>yarn.nodemanager.aux-services</name>
       <value>mapreduce_shuffle</value>
</property>
<property>                                                                
       <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
       <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
       <name>yarn.resourcemanager.address</name>
       <value>master:8032</value>
</property>
<property>
       <name>yarn.resourcemanager.scheduler.address</name>
       <value>master:8030</value>
</property>
<property>
       <name>yarn.resourcemanager.resource-tracker.address</name>
       <value>master:8031</value>
</property>
<property>
       <name>yarn.resourcemanager.admin.address</name>
       <value>master:8033</value>
</property>
<property>
       <name>yarn.resourcemanager.webapp.address</name>
       <value>master:8088</value>
</property>

7.8 格式化HDFS文件系统

执行命令hadoop namenode -format，看到以下信息表示成功：

这里写图片描述

7.9 启动集群

$HADOOP_HOME/sbin/start-dfs.sh
$HADOOP_HOME/sbin/start-yarn.sh
$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh stap historyserver

完成后可以通过jps查看一下各节点的进程是否正常启动，或者通过访问一下web界面：

http://master:50070
http://master:8088/

8. 连接收藏

hadoop2.2已经被遗弃的属性名称：https://www.iteblog.com/archives/923.html