Hadoop 2.5 Setup Guide

1 Resources Used for the Setup

VMware Workstation 9

ubuntu-14.04.2-desktop-amd64.iso

jdk-7u80-linux-x64.tar.gz

hadoop-2.5.0.tar.gz

zookeeper-3.4.5-cdh5.1.0.tar.gz

hbase-0.98.6-cdh5.3.0.tar.gz

One lab server

(The latest Hadoop release was not used because this setup follows someone else's tutorial.)

2 Preparation

2.1 Installing the Virtual Machines

Install four virtual machines in VMware, using the Ubuntu image.

If the left-hand panel does not show up, shut down the virtual machine and, in the VM settings, uncheck 3D graphics acceleration under Display.

Set a username and password.

2.2 Configuring IP Addresses

Assign IP addresses to the master and the slaves, for example:

(master01)10.109.252.94,

(slave01)10.109.252.95,

(slave02)10.109.252.96,

(slave03)10.109.252.97。

 

Subnet mask: 255.255.255.0

Default gateway: 10.109.252.1

DNS servers: 10.3.9.4, 10.3.9.5

 

Select bridged mode for the VM's network connection.

 

Run ifconfig in a terminal.

The name at the far left of the first line is the machine's network interface, here eth0; it may differ on other machines.

 

Run:

sudo gedit /etc/network/interfaces

 

In the opened file, add the following lines:

auto eth0                  # the network interface found above
iface eth0 inet static     # configure eth0 with a static IP
address 10.109.252.94      # IP address
netmask 255.255.255.0      # subnet mask
gateway 10.109.252.1       # gateway
dns-nameservers 10.3.9.4   # DNS server

 

Configure DNS:

sudo gedit /etc/resolv.conf

Add:

nameserver 10.3.9.4

nameserver 10.3.9.5

 

Apply the network settings with one of the following commands:

sudo service networking restart

sudo /etc/init.d/networking restart
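To confirm the new settings took effect (a quick check, not in the original post), the interface should now show the static address and the gateway should answer pings:

ifconfig eth0              # should show the static address
ping -c 3 10.109.252.1     # the gateway should reply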

2.3 Changing the Hostname

The hostname is stored in /etc/hostname:

sudo gedit /etc/hostname

The hostname is the machine name, i.e. master01, slave01, and so on.

 

Then edit /etc/hosts:

sudo gedit /etc/hosts

 

127.0.0.1 localhost

10.109.252.94   master01

10.109.252.95   slave01

10.109.252.96   slave02

10.109.252.97   slave03
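After editing the hosts file on every machine, name resolution can be checked from the master (a sanity check, not in the original):

ping -c 3 slave01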

2.4 Installing and Configuring SSH

The goal is passwordless remote login.

First, make sure the virtual machines have Internet access, then run:

sudo apt-get update
sudo apt-get install ssh

 

Run

ssh localhost

to check that the installation succeeded.

 

Disable the firewall:

sudo ufw disable
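The firewall state can be confirmed afterwards (an extra check, not in the original steps):

sudo ufw status            # should report: Status: inactive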

 

Configure passwordless login:

 

Step 1: generate a key pair

$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa

Step 2: append the public key to authorized_keys

$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
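If SSH still prompts for a password after this, over-permissive file modes are a common cause; tightening them usually helps (a precaution not mentioned in the original):

chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys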

 

The master controls each slave over passwordless SSH, so the master's public key must be copied to every slave.

On the master, run:

 

$ scp ~/.ssh/authorized_keys mcc@slave01:~/.ssh/

$ scp ~/.ssh/authorized_keys mcc@slave02:~/.ssh/

$ scp ~/.ssh/authorized_keys mcc@slave03:~/.ssh/

Replace mcc@slave01 with your own username@hostname.

To log in from master01 to slave01 without a password, run:

ssh slave01

 

Problem encountered:

Agent admitted failure to sign using the key.

Solution:

Use ssh-add to load the private key into the agent (adjust id_dsa if your key file is named differently):

ssh-add ~/.ssh/id_dsa

After that, the login succeeds.

2.5 Installing Java

Install Java 7 on the master and on every slave.

Create the directory:

sudo mkdir /usr/lib/jvm

Extract the archive into that directory:

sudo tar -zxvf jdk-7u80-linux-x64.tar.gz -C /usr/lib/jvm

 

Edit the environment variables:

sudo gedit ~/.bashrc

 

Append the following to the end of the file:

#set oracle jdk environment

export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_80  

export JRE_HOME=${JAVA_HOME}/jre  

export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib  

export PATH=${JAVA_HOME}/bin:$PATH  

 

Make the environment variables take effect immediately:

 source ~/.bashrc
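A quick way to confirm the JDK is picked up (not part of the original notes):

java -version              # should report java version "1.7.0_80"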

3 Deploying Hadoop

Create a directory:

sudo mkdir /opt/modules

Extract into /opt/modules:

sudo tar -zxf hadoop-2.5.0.tar.gz -C /opt/modules

Rename hadoop-2.5.0 to hadoop:

sudo mv hadoop-2.5.0 hadoop

Before configuring, create the following directories on the master's local filesystem:

~/dfs/name

~/dfs/data

~/tmp

mcc@master01:~$ mkdir /home/mcc/tmp

mcc@master01:~$ mkdir /home/mcc/dfs

mcc@master01:~$ mkdir /home/mcc/dfs/name

mcc@master01:~$ mkdir /home/mcc/dfs/data
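The same directory structure can be created in one step with mkdir -p (an equivalent shortcut, not in the original):

mkdir -p ~/tmp ~/dfs/name ~/dfs/data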

Use ll to check that the directories are owned by the current user and group.

Seven configuration files are involved here:

/opt/modules/hadoop/etc/hadoop/hadoop-env.sh
/opt/modules/hadoop/etc/hadoop/yarn-env.sh
/opt/modules/hadoop/etc/hadoop/slaves
/opt/modules/hadoop/etc/hadoop/core-site.xml
/opt/modules/hadoop/etc/hadoop/hdfs-site.xml
/opt/modules/hadoop/etc/hadoop/mapred-site.xml
/opt/modules/hadoop/etc/hadoop/yarn-site.xml

Files in this list that do not exist by default can be created by copying the corresponding .template files.

Change into etc/hadoop/.

Edit hadoop-env.sh:

sudo gedit hadoop-env.sh

Change this line to:

export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_80

Edit yarn-env.sh:

sudo gedit yarn-env.sh

Change this line to:

export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_80

Edit slaves:

sudo gedit slaves

Change it to:

slave01

slave02

slave03

Edit core-site.xml:

sudo gedit core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master01:9000</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/home/mcc/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
</configuration>

Edit hdfs-site.xml:

sudo gedit hdfs-site.xml

<configuration>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>master01:9001</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/home/mcc/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/home/mcc/dfs/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>

Rename mapred-site.xml.template to mapred-site.xml:

sudo mv mapred-site.xml.template mapred-site.xml

Edit mapred-site.xml:

sudo gedit mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>master01:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>master01:19888</value>
    </property>
</configuration>

Edit yarn-site.xml:

sudo gedit yarn-site.xml

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>master01:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>master01:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>master01:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>master01:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>master01:8088</value>
    </property>
</configuration>

Edit the environment variables (append to ~/.bashrc):

export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_80  

export JRE_HOME=${JAVA_HOME}/jre  

export HADOOP_HOME=/opt/modules/hadoop

export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib  

export PATH=${JAVA_HOME}/bin:${JRE_HOME}/bin:${HADOOP_HOME}/bin:$PATH  
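Reload the shell configuration and confirm that Hadoop is on the PATH (a check not shown in the original):

source ~/.bashrc
hadoop version             # should report Hadoop 2.5.0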

Format the NameNode:

mcc@master01:/opt/modules/hadoop$ sudo bin/hdfs namenode -format

An error occurred when starting up afterwards.

Solution:

Run this command on every VM to change the owner of the directory:

sudo chown -R mcc:mcc /opt/modules/

Then another problem appeared: jps did not show the NameNode.

Solution:

Run: sudo chmod -R 777 /home/mcc/dfs

The configuration on the slaves must stay identical to the master's; do not change the places that say master to slave.

On the master, from inside the hadoop directory, run:

sbin/start-all.sh 

The Hadoop cluster then starts successfully.
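To verify the cluster (these checks are not in the original post), jps on the master should list NameNode, SecondaryNameNode, and ResourceManager, each slave should show DataNode and NodeManager, and the HDFS report should list three live DataNodes:

jps
hdfs dfsadmin -report

The web UIs at http://master01:50070 (HDFS) and http://master01:8088 (YARN) should also be reachable.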

4 Configuring ZooKeeper

Extract:

tar -zxf zookeeper-3.4.5-cdh5.1.0.tar.gz -C /opt/modules/

Create a new directory:

mcc@slave01:/opt/modules/zookeeper-3.4.5-cdh5.1.0$ mkdir zkData

In that directory, create a file named myid:

mcc@slave01:/opt/modules/zookeeper-3.4.5-cdh5.1.0/zkData$ touch myid

In myid, write 1 on slave01, 2 on slave02, and 3 on slave03.
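For example, on slave01 (use 2 and 3 in the same way on the other two nodes):

echo 1 > /opt/modules/zookeeper-3.4.5-cdh5.1.0/zkData/myid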

Rename zoo_sample.cfg:

mcc@slave01:/opt/modules/zookeeper-3.4.5-cdh5.1.0/conf$ mv zoo_sample.cfg zoo.cfg

Edit zoo.cfg:

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/opt/modules/zookeeper-3.4.5-cdh5.1.0/zkData
# the port at which the clients will connect
clientPort=2181
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1

server.1=slave01:2888:3888
server.2=slave02:2888:3888
server.3=slave03:2888:3888

Copy the zookeeper directory to slave02 and slave03, then update myid on each:

scp -r zookeeper-3.4.5-cdh5.1.0/ slave02:/opt/modules/

scp -r zookeeper-3.4.5-cdh5.1.0/ slave03:/opt/modules/

Edit the environment variables (append to ~/.bashrc):

export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_80  

export JRE_HOME=${JAVA_HOME}/jre  

export HADOOP_HOME=/opt/modules/hadoop

export ZOOKEEPER_HOME=/opt/modules/zookeeper-3.4.5-cdh5.1.0

export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib  

export PATH=${JAVA_HOME}/bin:${JRE_HOME}/bin:${HADOOP_HOME}/bin:${ZOOKEEPER_HOME}/bin:$PATH

Start ZooKeeper on each of the three nodes:

$ZOOKEEPER_HOME/bin/zkServer.sh start

Check the processes with jps; ZooKeeper has started successfully.
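Each node's role can also be inspected (an extra check, not in the original):

zkServer.sh status         # one node reports Mode: leader, the others Mode: follower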

5 Configuring HBase

Extract:

tar -zxf hbase-0.98.6-cdh5.3.0.tar.gz -C /opt/modules/

 

Configure hbase-env.sh:

sudo gedit hbase-env.sh

 

export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_80

export HBASE_MANAGES_ZK=false

 

Configure hbase-site.xml:

sudo gedit hbase-site.xml

 

<configuration>
    <property>
        <name>hbase.rootdir</name>
        <value>hdfs://master01:9000/hbase</value>
    </property>
    <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
    </property>
    <property>
        <name>hbase.zookeeper.quorum</name>
        <value>slave01,slave02,slave03</value>
    </property>
</configuration>

 

 

Configure regionservers:

sudo gedit regionservers

 

slave01

slave02

slave03

 

Next, the Hadoop jars bundled with HBase must be replaced with the ones from our Hadoop installation.

In HBase's lib directory, delete all Hadoop-related jars:

rm -rf hadoop*.jar

 

Then copy in the replacements:

find /opt/modules/hadoop/share/hadoop -name "hadoop*jar" | xargs -i cp {} /opt/modules/hbase-0.98.6-cdh5.3.0/lib

 

HBase depends on Hadoop, and the Hadoop jars deployed under HBase's lib directory must match the cluster's Hadoop version.
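The result can be confirmed by listing the copied jars, which should now carry the 2.5.0 version (a check not in the original):

ls /opt/modules/hbase-0.98.6-cdh5.3.0/lib | grep hadoop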

 

Then a new problem came up:

 

FATAL [master:master01:60000] master.HMaster: Unhandled exception. Starting shutdown.

java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "master01":9000; java.net.UnknownHostException; For more details see:  

 

 

Solution (a bit of black magic):

Change one property in hbase-site.xml to:

<property>
    <name>hbase.rootdir</name>
    <value>hdfs://10.109.252.94:9000/hbase</value>
</property>

Copy the hbase directory to slave01, slave02, and slave03:

scp -r hbase-0.98.6-cdh5.3.0/ slave01:/opt/modules

scp -r hbase-0.98.6-cdh5.3.0/ slave02:/opt/modules

scp -r hbase-0.98.6-cdh5.3.0/ slave03:/opt/modules

Start HBase on the master:

mcc@master01:/opt/modules/hbase-0.98.6-cdh5.3.0$ ./bin/start-hbase.sh

The startup succeeds.
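To verify (not part of the original writeup), jps should show HMaster on master01 and HRegionServer on each slave, and the HBase shell can report the cluster status:

./bin/hbase shell
status                     # inside the shell; should report the three region servers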
