Hadoop CDH Environment Setup (Part 2)

 

Step 8: Install CDH5

a. Download the RPM installer package

      1. Change to the download directory, /usr/tool/:

      2. Download the package:

      wget http://archive.cloudera.com/cdh5/one-click-install/redhat/6/x86_64/cloudera-cdh-5-0.x86_64.rpm       # if the OS is CentOS 5.x, change the "6" in the URL path to "5"; the same applies to the other URLs below

      3. Disable the GPG signature check and install the local package:

      yum --nogpgcheck localinstall cloudera-cdh-5-0.x86_64.rpm

      4. Import the Cloudera repository signing key:

      rpm --import http://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/RPM-GPG-KEY-cloudera
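      To confirm that the repository was registered, the enabled repos can be listed (a quick sanity check, not part of the original steps):

      yum repolist | grep -i cloudera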

b. Install the Hadoop component packages

      1. On master, install the namenode, resourcemanager, nodemanager, datanode, mapreduce, historyserver, proxyserver and hadoop-client packages:

      yum install hadoop hadoop-hdfs hadoop-client hadoop-doc hadoop-debuginfo hadoop-hdfs-namenode hadoop-yarn-resourcemanager hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce hadoop-mapreduce-historyserver hadoop-yarn-proxyserver -y

      2. On slave1 and slave2, install the yarn, nodemanager, datanode, mapreduce and hadoop-client packages:

             yum install hadoop hadoop-hdfs hadoop-client hadoop-doc hadoop-debuginfo hadoop-yarn hadoop-hdfs-datanode hadoop-yarn-nodemanager hadoop-mapreduce -y

      3. Install HttpFS:

             yum install hadoop-httpfs -y

      4. Install the Secondary NameNode (optional):

      Pick one machine to act as the Secondary NameNode and install the package on it:

             yum install hadoop-hdfs-secondarynamenode -y

      Add the following to /etc/hadoop/conf/hdfs-site.xml:

 

<property>
<name>dfs.namenode.checkpoint.check.period</name>
<value>60</value>
</property>
<property>
<name>dfs.namenode.checkpoint.txns</name>
<value>1000000</value>
</property>
<property>
<name>dfs.namenode.checkpoint.dir</name>
<value>file:///data/cache1/dfs/namesecondary</value>
</property>
<property>
<name>dfs.namenode.checkpoint.edits.dir</name>
<value>file:///data/cache1/dfs/namesecondary</value>
</property>
<property>
<name>dfs.namenode.num.checkpoints.retained</name>
<value>2</value>
</property>
<!-- Run the SecondaryNameNode on slave1 (dfs.namenode.secondary.http-address is the non-deprecated name of this property in Hadoop 2.x / CDH5) -->
<property>
<name>dfs.secondary.http.address</name>
<value>slave1:50090</value>
</property>

For the full list of properties, see: http://archive.cloudera.com/cdh5/cdh/5/hadoop/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml

To run multiple Secondary NameNodes, see: http://blog.cloudera.com/blog/2009/02/multi-host-secondarynamenode-configuration/

Step 9: Create directories

a. On master, create the NameNode directory:

mkdir -p /data/cache1/dfs/nn

chown -R hdfs:hadoop /data/cache1/dfs/nn

chmod 700 /data/cache1/dfs/nn


b. On slave1 and slave2, create the DataNode and MapReduce local directories:

mkdir -p /data/cache1/dfs/dn
mkdir -p /data/cache1/dfs/mapred/local
chown -R hdfs:hadoop /data/cache1/dfs/dn
usermod -a -G mapred hadoop
chown -R mapred:hadoop /data/cache1/dfs/mapred/local

c. Create directories on HDFS (run these only after the cluster has been fully set up and started):

hdfs dfs -mkdir -p /user/hadoop/{done,tmp}
sudo -u hdfs hadoop fs -chown mapred:hadoop /user/hadoop/*
hdfs dfs -mkdir -p /var/log/hadoop-yarn/apps
sudo -u hdfs hadoop fs -chown hadoop:hdfs /var/log/hadoop-yarn/apps
hdfs dfs -mkdir -p /user/hive/warehouse
sudo -u hdfs hadoop fs -chown hive /user/hive/warehouse
sudo -u hdfs hadoop fs -chmod 1777 /user/hive/warehouse
hdfs dfs -mkdir /tmp/hive
sudo -u hdfs hadoop fs -chmod 777 /tmp/hive
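A quick way to verify the ownership and permissions that were just set (not part of the original steps):

sudo -u hdfs hadoop fs -ls /user/hadoop /var/log/hadoop-yarn /user/hive /tmp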

 

Step 10: Configure environment variables

a. Edit /etc/profile and add the following environment variables:

export HADOOP_HOME=/usr/lib/hadoop
export HIVE_HOME=/usr/lib/hive
export HBASE_HOME=/usr/lib/hbase
export HADOOP_HDFS_HOME=/usr/lib/hadoop-hdfs
export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_LIBEXEC_DIR=$HADOOP_HOME/libexec
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HDFS_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_YARN_HOME=/usr/lib/hadoop-yarn
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin:$HBASE_HOME/bin:$PATH

b. Apply the changes:

source /etc/profile
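To confirm that the variables are in effect, a quick check such as the following can be run (not part of the original steps):

echo $HADOOP_HOME $HADOOP_CONF_DIR
hadoop version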

 

Step 11: Edit the Hadoop configuration files

a. Overview of the configuration files:

Configuration file (type): description

hadoop-env.sh (Bash script): Hadoop runtime environment variables
core-site.xml (XML): Hadoop core settings, e.g. I/O
hdfs-site.xml (XML): HDFS daemon settings (NN, JN, DN)
yarn-env.sh (Bash script): YARN runtime environment variables
yarn-site.xml (XML): YARN framework settings
mapred-site.xml (XML): MapReduce properties
capacity-scheduler.xml (XML): YARN scheduler properties
container-executor.cfg (cfg): YARN container executor settings
mapred-queues.xml (XML): MapReduce queue settings
hadoop-metrics.properties (Java properties): Hadoop metrics settings
hadoop-metrics2.properties (Java properties): Hadoop metrics settings
slaves (plain text): list of DataNode hosts
exclude (plain text): list of DataNodes to remove (decommission)
log4j.properties (Java properties): system logging settings
configuration.xsl (XSL): stylesheet used to render the XML configuration files

b. Edit the configuration files on master, then scp them to the corresponding directory on each slave:

/etc/hadoop/conf/core-site.xml

<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.hosts</name>
<value>master</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.groups</name>
<value>hdfs</value>
</property>
<property>
<name>hadoop.proxyuser.mapred.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.mapred.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.yarn.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.yarn.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.httpfs.hosts</name>
<value>httpfs-host.foo.com</value>
</property>
<property>
<name>hadoop.proxyuser.httpfs.groups</name>
<value>*</value>
</property>

/etc/hadoop/conf/hdfs-site.xml

<property>
<name>dfs.namenode.name.dir</name>
<value>/data/cache1/dfs/nn/</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/data/cache1/dfs/dn/</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>dfs.permissions.superusergroup</name>
<value>hdfs</value>
</property>

 /etc/hadoop/conf/mapred-site.xml

<property>
<name>mapreduce.jobhistory.address</name>
<value>master:10020</value>
</property>

<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>master:19888</value>
</property>

<property>
<name>mapreduce.jobhistory.joblist.cache.size</name>
<value>50000</value>
</property>

<!-- the directories created on HDFS earlier (Step 9c) -->
<property>
<name>mapreduce.jobhistory.done-dir</name>
<value>/user/hadoop/done</value>
</property>

<property>
<name>mapreduce.jobhistory.intermediate-done-dir</name>
<value>/user/hadoop/tmp</value>
</property>

<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>

/etc/hadoop/conf/yarn-site.xml

<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>

<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>

<property>
<description>List of directories to store localized files in.</description>
<name>yarn.nodemanager.local-dirs</name>
<value>/var/lib/hadoop-yarn/cache/${user.name}/nm-local-dir</value>
</property>

<property>
<description>Where to store container logs.</description>
<name>yarn.nodemanager.log-dirs</name>
<value>/var/log/hadoop-yarn/containers</value>
</property>

<property>
<description>Where to aggregate logs to.</description>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>hdfs://master:9000/var/log/hadoop-yarn/apps</value>
</property>

<property>
<name>yarn.resourcemanager.address</name>
<value>master:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>master:8088</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>master:8033</value>
</property>

<property>
<description>Classpath for typical applications.</description>
<name>yarn.application.classpath</name>
<value>
$HADOOP_CONF_DIR,
$HADOOP_COMMON_HOME/*,
$HADOOP_COMMON_HOME/lib/*,
$HADOOP_HDFS_HOME/*,
$HADOOP_HDFS_HOME/lib/*,
$HADOOP_MAPRED_HOME/*,
$HADOOP_MAPRED_HOME/lib/*,
$HADOOP_YARN_HOME/*,
$HADOOP_YARN_HOME/lib/*
</value>
</property>

<property>
<name>yarn.web-proxy.address</name>
<value>master:54315</value>
</property>

c. List all of the slaves in /etc/hadoop/conf/slaves:

slave1

slave2

d. Finally, sync the modified files to the slaves:

scp -r /etc/hadoop/conf root@slave1:/etc/hadoop/

scp -r /etc/hadoop/conf root@slave2:/etc/hadoop/
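To confirm that the copies match, the checksums on master and the slaves can be compared (an optional check, not part of the original steps):

md5sum /etc/hadoop/conf/*-site.xml
ssh slave1 md5sum /etc/hadoop/conf/*-site.xml
ssh slave2 md5sum /etc/hadoop/conf/*-site.xml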

 

Step 12: Enable the HDFS trash feature (optional)

Add the following two parameters to /etc/hadoop/conf/core-site.xml:

1. fs.trash.interval: a time interval in minutes; the default is 0, which means the trash feature is disabled. The value determines how long deleted files are kept in the trash. If the parameter is set on the server side, any client-side setting is ignored; if it is disabled on the server side, the client-side setting (if present) is checked.

2. fs.trash.checkpoint.interval: a time interval in minutes; the default is 0. It determines how often trash checkpoints are created and should be no larger than fs.trash.interval. It is configured on the server side; if set to 0, the value of fs.trash.interval is used.
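For example, the following core-site.xml snippet keeps deleted files for one day and checkpoints the trash every hour; the values are illustrative, not from the original guide:

<property>
<name>fs.trash.interval</name>
<value>1440</value>
</property>
<property>
<name>fs.trash.checkpoint.interval</name>
<value>60</value>
</property>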

 

Step 13: Configure LZO (optional)

a. Download the repo file to /etc/yum.repos.d/ on traceMaster (the master node):

wget http://archive.cloudera.com/gplextras5/redhat/6/x86_64/gplextras/cloudera-gplextras5.repo

b. Install LZO: yum install hadoop-lzo* impala-lzo -y

c. Add the following to /etc/hadoop/conf/core-site.xml:

<property>
  <name>io.compression.codecs</name>
<value>
org.apache.hadoop.io.compress.DefaultCodec,
org.apache.hadoop.io.compress.GzipCodec,
org.apache.hadoop.io.compress.BZip2Codec,
com.hadoop.compression.lzo.LzoCodec,
com.hadoop.compression.lzo.LzopCodec
</value>
</property>
<property>
  <name>io.compression.codec.lzo.class</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>

If you also want MapReduce to compress intermediate (map) output with LZO, add the following to /etc/hadoop/conf/mapred-site.xml:

<property>
<name>mapred.map.output.compression.codec</name>
<value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
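Note that the codec property alone does not turn compression on; map-output compression also has to be enabled. A minimal addition (not in the original guide), using the MRv2 property name; mapred.map.output.compression.codec above is the older alias of mapreduce.map.output.compress.codec:

<property>
<name>mapreduce.map.output.compress</name>
<value>true</value>
</property>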

d. After the configuration is complete, run a test:

hadoop jar /usr/lib/hadoop/lib/hadoop-lzo.jar com.hadoop.compression.lzo.LzoIndexer hdfs://master:9000/user/hadoop/workflows/shellTest/workflow.xml


 

Step 14: Start the services

a. Start the services on master:

hdfs namenode -format    # format the NameNode metadata (first start only)
/etc/init.d/hadoop-hdfs-namenode init
/etc/init.d/hadoop-hdfs-namenode start
/etc/init.d/hadoop-yarn-resourcemanager start
/etc/init.d/hadoop-yarn-proxyserver start
/etc/init.d/hadoop-mapreduce-historyserver start

b. Start the services on slave1 and slave2:

/etc/init.d/hadoop-hdfs-datanode start
/etc/init.d/hadoop-yarn-nodemanager start

If a service fails to start, follow the error message to the corresponding log file and check the details there. Most failures are caused by missing file permissions and can be resolved by running chmod -R 777 on the directory in question.
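With the CDH packages, the daemon logs are normally written under /var/log/hadoop-hdfs, /var/log/hadoop-yarn and /var/log/hadoop-mapreduce; the exact file names include the host name, so the commands below are only a sketch of how to inspect them:

ls /var/log/hadoop-hdfs /var/log/hadoop-yarn /var/log/hadoop-mapreduce
tail -n 100 /var/log/hadoop-hdfs/*namenode*.log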

c. Post-start checks, using the web UIs:

http://192.168.157.130:50070 : HDFS (NameNode)
http://192.168.157.130:8088 : ResourceManager (YARN)
http://192.168.157.130:8088/cluster/nodes : nodes currently online
http://192.168.157.130:8042 : NodeManager
http://192.168.157.131:8042 : NodeManager
http://192.168.157.132:8042 : NodeManager
http://192.168.157.130:19888/ : JobHistory

 


Source: chaijuntao.iteye.com/blog/2237921