Hadoop 2.6.5 Setup Tutorial

I recently needed to set up an HBase environment, so the first step was to build a Hadoop cluster. The setup process mainly follows 《Hadoop大数据挖掘》 (Hadoop Big Data Mining) by 张良均, 樊哲, 位文超, 刘名军 et al. (机械工业出版社 / China Machine Press), with a few steps based on material I found online whose sources I can no longer locate; my apologies for that.


Hadoop 2.6.5 Cluster Setup

Environment: Ubuntu 16.04

I created four virtual machines with VirtualBox. Each VM has two network adapters: one for the internal network connecting the four VMs, and one for Internet access so software can be installed. The next two bullets are two minor issues I ran into during setup; feel free to skip them.

  • Resizing a VirtualBox virtual disk: D:\Program Files\Oracle\VirtualBox>VBoxManage.exe modifyhd F:\virtualbox\ubuntu_slave3\ubuntu_slave3.vdi --resize 15360
  • Cannot access the shared folder because of insufficient permissions: sudo adduser boarmy vboxsf adds the user to the vboxsf group, which fixes it


1. Configure static IPs

Edit the hosts file as the root user with vim /etc/hosts and add the following four entries (a quick connectivity check follows them). The original file has a 127.0.1.1 entry pointing to the local machine, which caused my later ZooKeeper installation to fail, so it is best to comment it out.

a)     192.168.1.10    master.ubuntu.com       master

b)     192.168.1.11    slave1.ubuntu.com       slave1

c)     192.168.1.12    slave2.ubuntu.com       slave2

d)     192.168.1.13    slave3.ubuntu.com       slave3
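
A quick way to confirm that name resolution and the internal network are working is to ping each host by name from the master. This is my own check, not part of the book's steps:

ping -c 1 slave1    # should resolve to 192.168.1.11 and get a reply
ping -c 1 slave2    # 192.168.1.12
ping -c 1 slave3    # 192.168.1.13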

2. Configure the Java environment

a)     Download the JDK from http://www.oracle.com/technetwork/java/javase/downloads

b)     Extract the archive with tar -zxvf jdk-8u151-linux-x64.tar.gz and place it under /usr/local/

c)     Edit /etc/profile and append the following (a verification check follows the snippet):

# set Java environment

JAVA_HOME=/usr/local/jdk1.8.0_151

PATH=$JAVA_HOME/bin:$PATH

CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

export JAVA_HOME

export PATH

export CLASSPATH
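
After saving /etc/profile, reload it in the current shell and confirm the JDK is picked up (assuming the install path above):

source /etc/profile
echo $JAVA_HOME     # should print /usr/local/jdk1.8.0_151
java -version       # should report java version "1.8.0_151"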

3. Add a user

a)     sudo adduser hadoop

b)     Grant the hadoop user sudo privileges by adding a line to /etc/sudoers (a safer alternative using visudo follows the snippet):

# User privilege specification

root    ALL=(ALL:ALL) ALL

hadoop  ALL=(ALL:ALL) ALL   (newly added line)
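
Editing /etc/sudoers directly can lock you out of sudo if the syntax is wrong. A safer way to make the same change (my suggestion, not part of the original steps) is to go through visudo, which validates the file before saving:

sudo visudo
# then add the line:  hadoop  ALL=(ALL:ALL) ALL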

4. Configure passwordless SSH login

a)     Switch to the hadoop user: su - hadoop

b)     Generate a public/private key pair with ssh-keygen -t rsa, then press Enter three times

c)     Copy the public key into the authorized keys file on each node:

                i.         ssh-copy-id -i  ~/.ssh/id_rsa.pub master

               ii.         ssh-copy-id -i  ~/.ssh/id_rsa.pub slave1

              iii.         ssh-copy-id -i  ~/.ssh/id_rsa.pub slave2

              iv.         ssh-copy-id -i  ~/.ssh/id_rsa.pub slave3

d)     If the SSH server is not installed, install it with sudo apt-get install openssh-server.

e)     Alternatively, you can collect every node's public key on one machine first and then copy the resulting key file to all other nodes (a verification check follows these steps).

                i.         On the master node: ssh-copy-id -i ~/.ssh/id_rsa.pub master

               ii.         On slave1: ssh-copy-id -i ~/.ssh/id_rsa.pub master

              iii.         On slave2: ssh-copy-id -i ~/.ssh/id_rsa.pub master

              iv.         On slave3: ssh-copy-id -i ~/.ssh/id_rsa.pub master

                v.         Copy the /home/hadoop/.ssh/authorized_keys file from master to all other nodes:

scp ~/.ssh/authorized_keys hadoop@slave1:/home/hadoop/.ssh/authorized_keys   (repeat for slave2 and slave3)
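
Whichever approach you use, it is worth verifying from each node that every host can be reached without a password prompt, for example (a small check I added, assuming the hostnames above):

for h in master slave1 slave2 slave3; do
    ssh $h hostname    # should print the remote hostname with no password prompt
done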

5. Configure NTP

a)     Install NTP for time synchronization across the cluster: sudo apt-get install ntp

b)     Edit the configuration file /etc/ntp.conf

                i.         On the master node

# comment out the lines starting with server and add:

restrict 192.168.0.0 mask 255.255.255.0 nomodify notrap

       server 127.127.1.0

       fudge 127.127.1.0 stratum 10

               ii.         On the slave nodes

# comment out the lines starting with server and add:

server master

       Restart the NTP service so the new configuration takes effect: sudo service ntp restart   (a quick way to verify synchronization follows)
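
To verify that the slaves are actually syncing against the master, you can query the NTP peer list on a slave. This is a check I added; ntpq ships with the ntp package, while ntpdate is a separate package for one-off syncs:

ntpq -p                      # master should appear in the peer list
# optional one-off sync while testing, if ntpdate is installed:
sudo service ntp stop && sudo ntpdate master && sudo service ntp start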

6. Configure Hadoop

a)     Add the HADOOP_HOME environment variable to /etc/profile (a sanity check follows the snippet):

# set HADOOP_HOME environment

HADOOP_HOME=/usr/local/hadoop-2.6.5

PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

export HADOOP_HOME

export PATH

# hadoop

export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native

export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
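
This assumes the Hadoop 2.6.5 tarball has already been extracted to /usr/local/hadoop-2.6.5, the directory HADOOP_HOME points to. After reloading the profile, a quick sanity check:

source /etc/profile
hadoop version     # should report Hadoop 2.6.5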

b)     Seven configuration files are involved:

i.    $HADOOP_HOME/etc/hadoop/hadoop-env.sh
ii.    $HADOOP_HOME/etc/hadoop/yarn-env.sh
iii.    $HADOOP_HOME/etc/hadoop/slaves
iv.    $HADOOP_HOME/etc/hadoop/core-site.xml
v.    $HADOOP_HOME/etc/hadoop/hdfs-site.xml
vi.    $HADOOP_HOME/etc/hadoop/mapred-site.xml
vii.    $HADOOP_HOME/etc/hadoop/yarn-site.xml

c)    Configuration file 1: hadoop-env.sh

# The java implementation to use.

#export JAVA_HOME=${JAVA_HOME}

export JAVA_HOME=/usr/local/jdk1.8.0_151

d)     Configuration file 2: yarn-env.sh

# some Java parameters

# exportJAVA_HOME=/home/y/libexec/jdk1.6.0/

export JAVA_HOME=/usr/local/jdk1.8.0_151

e)     Configuration file 3: slaves

slave1

slave2

slave3

f)      Configuration file 4: core-site.xml (these <property> entries go inside the <configuration> element)

<property>

              <name>fs.defaultFS</name>

              <value>hdfs://master:8020</value>

       </property>

       <property>

              <name>hadoop.tmp.dir</name>

              <value>/hadoop/tmp</value>

       </property>

g)     Configuration file 5: hdfs-site.xml (the local directories referenced here need to exist; see the note after this file)

<property>

              <name>dfs.namenode.name.dir</name>

              <value>file:///hadoop/hdfs/name</value>

       </property>

       <property>

              <name>dfs.datanode.data.dir</name>

              <value>file:///hadoop/hdfs/data</value>

       </property>

       <property>

              <name>dfs.namenode.secondary.http-address</name>

              <value>master:50090</value>

       </property>

       <property>

              <name>dfs.replication</name>

              <value>3</value>

       </property>
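
The local directories referenced above (/hadoop/tmp from core-site.xml, /hadoop/hdfs/name and /hadoop/hdfs/data from hdfs-site.xml) have to exist and be writable by the hadoop user on every node. A minimal sketch, not spelled out in the original steps; adjust the paths if you used different values:

sudo mkdir -p /hadoop/tmp /hadoop/hdfs/name /hadoop/hdfs/data
sudo chown -R hadoop:hadoop /hadoop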

h)     Configuration file 6: mapred-site.xml (copy it from mapred-site.xml.template if it does not exist yet)

<property>

              <name>mapreduce.framework.name</name>

              <value>yarn</value>

       </property>

       <!-- jobhistory properties -->

       <property>

              <name>mapreduce.jobhistory.address</name>

              <value>master:10020</value>

       </property>

       <property>

              <name>mapreduce.jobhistory.webapp.address</name>

              <value>master:19888</value>

       </property>

i)      Configuration file 7: yarn-site.xml

<property>

              <name>yarn.resourcemanager.hostname</name>

              <value>master</value>

       </property>

       <property>

              <name>yarn.resourcemanager.address</name>

              <value>${yarn.resourcemanager.hostname}:8032</value>

       </property>

       <property>

              <name>yarn.resourcemanager.scheduler.address</name>

              <value>${yarn.resourcemanager.hostname}:8030</value>

       </property>

       <property>

              <name>yarn.resourcemanager.webapp.address</name>

              <value>${yarn.resourcemanager.hostname}:8088</value>

       </property>

       <property>

              <name>yarn.resourcemanager.webapp.https.address</name>

              <value>${yarn.resourcemanager.hostname}:8090</value>

       </property>

       <property>

              <name>yarn.resourcemanager.resource-tracker.address</name>

              <value>${yarn.resourcemanager.hostname}:8031</value>

       </property>

       <property>

              <name>yarn.resourcemanager.admin.address</name>

              <value>${yarn.resourcemanager.hostname}:8033</value>

       </property>

       <property>

              <name>yarn.nodemanager.local-dirs</name>

              <value>/hadoop/yarn/local</value>

       </property>

       <property>

              <name>yarn.log-aggregation-enable</name>

              <value>true</value>

       </property>

       <property>

              <name>yarn.nodemanager.remote-app-log-dir</name>

              <value>/hadoop/tmp/logs</value>

       </property>

       <property>

              <name>yarn.log.server.url</name>

              <value>http://master:19888/jobhistory/logs</value>

              <description>URL for the job history server</description>

       </property>

       <property>

              <name>yarn.nodemanager.vmem-check-enabled</name>

              <value>false</value>

       </property>

       <property>

              <name>yarn.nodemanager.aux-services</name>

              <value>mapreduce_shuffle</value>

       </property>

       <property>

              <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>

              <value>org.apache.hadoop.mapred.ShuffleHandler</value>

       </property>
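
Two preparation steps are not spelled out above but are needed before the first start: the configured Hadoop directory has to exist on every node, and the NameNode has to be formatted once. A rough sketch, assuming Hadoop lives in /usr/local/hadoop-2.6.5 and that directory is writable by the hadoop user on the slaves:

# copy the configured Hadoop directory from master to each slave
scp -r /usr/local/hadoop-2.6.5 hadoop@slave1:/usr/local/
scp -r /usr/local/hadoop-2.6.5 hadoop@slave2:/usr/local/
scp -r /usr/local/hadoop-2.6.5 hadoop@slave3:/usr/local/

# format the NameNode once, on master only (re-formatting later wipes HDFS metadata)
hdfs namenode -format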

7. Start Hadoop

a)     cd $HADOOP_HOME/sbin

b)     bash start-all.sh    (running it as sh start-all.sh fails; see https://issues.apache.org/jira/browse/HADOOP-8432)

c)     bash mr-jobhistory-daemon.sh start historyserver    starts the MapReduce JobHistory Server; without it, port 19888 is not available (a jps check follows the list)
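
To confirm everything came up, jps on each node should show the expected daemons (my own check, matching the configuration above):

jps
# on master:  NameNode, SecondaryNameNode, ResourceManager, JobHistoryServer
# on slaves:  DataNode, NodeManager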

8. Hadoop cluster monitoring ports

Service                        Web interface                        Default port
NameNode                       http://namenode_host:port/           50070
ResourceManager                http://resourcemanager_host:port     8088
MapReduce JobHistory Server    http://jobhistoryserver_host:port    19888


Reposted from blog.csdn.net/tft3640/article/details/78971668