Compiling Cloudera (CDH) Hadoop and Hortonworks (HDP) Hadoop

Applicable versions: the steps are largely the same as for compiling Apache Hadoop, because CDH's Hadoop is ported over from the community version. This article therefore also applies to any other commercial Hadoop distribution derived from Apache Hadoop, such as Cloudera's (CDH) Hadoop and Hortonworks' (HDP) Hadoop. Let's get started:

1. Environment preparation (CentOS 6.x; other distributions are similar)

(1) Install the build dependencies with yum: sudo yum install -y autoconf automake libtool git gcc gcc-c++ make cmake openssl-devel ncurses-devel bzip2-devel
(2) Install JDK 1.7+
(3) Install Maven 3.0+
(4) Install Ant 1.8+
(5) Install protobuf-2.5.0.tar.gz
  Example installation:
  cd /home/search
  tar -zxvf  protobuf-2.5.0.tar.gz
  cd /home/search/protobuf-2.5.0
  ./configure --prefix=/home/search/protobuf  (an install prefix of your choice; if omitted, it installs under /usr/local)
  make && make install
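
The Hadoop build invokes protoc, so it is worth confirming that the freshly built protobuf is the one found on the PATH. A minimal check, assuming the --prefix used above:

export PATH=/home/search/protobuf/bin:$PATH
export LD_LIBRARY_PATH=/home/search/protobuf/lib:$LD_LIBRARY_PATH
protoc --version   (should print: libprotoc 2.5.0)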

(6) Install snappy1.1.0.tar.gz (optional; only needed if the compiled Hadoop should support Snappy compression)
   Example installation:
cd /home/search
tar -zxvf snappy1.1.0.tar.gz
cd /home/search/snappy1.1.0
./configure --prefix=/home/search/snappy  (an install prefix of your choice; if omitted, it installs under /usr/local)
make && make install
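
A quick way to confirm where the Snappy libraries ended up (assuming the --prefix used above):

ls /home/search/snappy/lib
(you should see libsnappy.so, libsnappy.a and related files)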
(7) Install hadoop-snappy
Clone it from GitHub:
git clone https://github.com/electrum/hadoop-snappy.git
Example installation:
After the download finishes:
cd hadoop-snappy
Run the Maven package command:
mvn package -Dsnappy.prefix=/home/search/snappy  (uses the Snappy install from step (6))
After the build succeeds, the compiled Snappy native libraries are in the hadoop-snappy/target/hadoop-snappy-0.0.1-SNAPSHOT-tar/hadoop-snappy-0.0.1-SNAPSHOT/lib directory. That directory also contains hadoop-snappy-0.0.1-SNAPSHOT.jar, which needs to be copied into $HADOOP_HOME/lib once Hadoop itself has been compiled.
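
For reference, the copy step after the Hadoop build could look like this (a sketch; it assumes HADOOP_HOME points at the hadoop directory created in section 3):

cp hadoop-snappy/target/hadoop-snappy-0.0.1-SNAPSHOT-tar/hadoop-snappy-0.0.1-SNAPSHOT/lib/hadoop-snappy-0.0.1-SNAPSHOT.jar $HADOOP_HOME/lib/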

The packages used above can be downloaded from Baidu Pan: http://pan.baidu.com/s/1mBjZ4

2. Download and compile hadoop 2.6.0
Download the CDH hadoop-2.6.0 source:
wget http://archive-primary.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.4.1-src.tar.gz
Unpack it:
tar -zxvf hadoop-2.6.0-cdh5.4.1-src.tar.gz
Change into the unpacked root directory and run the build command below (a filled-in example follows); it bundles the Snappy library into Hadoop's native libraries so the result can run on all machines:

       mvn clean package -DskipTests -Pdist,native -Dtar -Dsnappy.lib=(path to the Snappy library built above) -Dbundle.snappy
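
Filled in with the paths used earlier in this article, the command might look like the following (an assumption: point -Dsnappy.lib at the directory that actually contains libsnappy.so on your machine, whether that is the step (6) install prefix or the hadoop-snappy build output):

mvn clean package -DskipTests -Pdist,native -Dtar -Dsnappy.lib=/home/search/snappy/lib -Dbundle.snappy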

Some exceptions may be printed along the way; they can be ignored. If the build aborts with an error, just re-run the command above until it succeeds. Build time depends mostly on your network speed and is roughly 40 minutes.
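
If you would rather not restart from the first module after a failure, Maven can resume from the module that broke; the module name below is only a placeholder for whichever module the error message reports:

mvn package -DskipTests -Pdist,native -Dtar -Dsnappy.lib=/home/search/snappy/lib -Dbundle.snappy -rf :hadoop-hdfs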




3. Set up the Hadoop cluster
(1) Copy the tarball produced by the build, located at hadoop-2.6.0-cdh5.4.1/hadoop-dist/target/hadoop-2.6.0-cdh5.4.1.tar.gz, to the installation directory.
(2) Unpack it and rename the directory to hadoop: mv hadoop-2.6.0-cdh5.4.1 hadoop
(3) Enter the hadoop directory and run bin/hadoop checknative -a to see which native libraries are supported.
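
If the Snappy bundling worked, the output should look roughly like this (the exact paths will differ on your machine):

Native library checking:
hadoop:  true /home/search/hadoop/lib/native/libhadoop.so.1.0.0
zlib:    true /lib64/libz.so.1
snappy:  true /home/search/hadoop/lib/native/libsnappy.so.1
lz4:     true revision:99
bzip2:   true /lib64/libbz2.so.1
openssl: true /usr/lib64/libcrypto.so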



(4) Configure the Hadoop-related environment variables:
#hadoop
export HADOOP_HOME=/home/search/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export CLASSPATH=.:$CLASSPATH:$HADOOP_COMMON_HOME:$HADOOP_COMMON_HOME/lib:$HADOOP_MAPRED_HOME:$HADOOP_HDFS_HOME
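
These exports go into the shell profile of the user that runs Hadoop. A minimal way to apply and sanity-check them, assuming they were appended to ~/.bashrc:

source ~/.bashrc
hadoop version   (should report 2.6.0-cdh5.4.1)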

(5) Pick a data directory, e.g. /data/, and create three sub-directories under it (see the sketch below):
hadooptmp (holds Hadoop's temporary data)
nd (holds the NameNode data)
dd (holds the DataNode data)
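
For example (a sketch; note that the config files below reference /ROOT/tmp/data/..., so the directories you create must match whatever you put into hadoop.tmp.dir, dfs.namenode.name.dir and dfs.datanode.data.dir):

mkdir -p /data/hadooptmp /data/nd /data/dd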
(6) Enter the hadoop/etc/hadoop directory and edit the following files in turn.
The slaves file:

hadoop1
hadoop2
hadoop3




core-site.xml contents:

<configuration>
 <property>    
        <name>fs.default.name</name>    
        <value>hdfs://hadoop1:8020</value>    
    </property>
   
  <property>  
    <name>hadoop.tmp.dir</name>  
    <value>/ROOT/tmp/data/hadooptmp</value>  
  </property>

  <property>  
             <name>io.compression.codecs</name>  
             <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.SnappyCodec</value>  
       
</property>  
<property>    
  <name>fs.trash.interval</name>    
  <value>1440</value>    
  <description>Number of minutes between trash checkpoints.    
  If zero, the trash feature is disabled.    
  </description>    
</property>  

</configuration>



hdfs-site.xml contents:

<configuration>

<property>    
   <name>dfs.replication</name>    
   <value>1</value>    
 </property>    
 
 <property>    
   <name>dfs.namenode.name.dir</name>    
   <value>file:///ROOT/tmp/data/nd</value>    
 </property>    
 
 <property>    
   <name>dfs.datanode.data.dir</name>    
   <value>/ROOT/tmp/data/dd</value>    
 </property>    
 
<property>    
  <name>dfs.permissions</name>    
  <value>false</value>    
</property>  
 

<property>  
    <name>dfs.webhdfs.enabled</name>  
    <value>true</value>  
</property>  
<property>  
        <name>dfs.blocksize</name>  
        <value>134217728</value>  
</property>  
<property>  
        <name>dfs.namenode.handler.count</name>  
        <value>20</value>  
</property>
 
<property>  
        <name>dfs.datanode.max.xcievers</name>  
        <value>65535</value>  
</property>

</configuration>



mapred-site.xml contents:

 <configuration> 
<property>  
    <name>mapreduce.framework.name</name>  
    <value>yarn</value>  
</property>  
<property>  
    <name>mapreduce.jobtracker.address</name>  
    <value>hadoop1:8021</value>  
</property>  
<property>  
    <name>mapreduce.jobhistory.address</name>  
    <value>hadoop1:10020</value>  
</property>  
<property>  
    <name>mapreduce.jobhistory.webapp.address</name>  
    <value>hadoop1:19888</value>  
</property>  
<property>  
    <name>mapred.max.maps.per.node</name>  
    <value>4</value>  
</property>  
<property>  
    <name>mapred.max.reduces.per.node</name>  
    <value>2</value>  
</property>  
<property>  
    <name>mapreduce.map.memory.mb</name>  
    <value>1408</value>  
</property>  
<property>  
    <name>mapreduce.map.java.opts</name>  
    <value>-Xmx1126M</value>  
</property>  
 
<property>  
    <name>mapreduce.reduce.memory.mb</name>  
    <value>2816</value>  
</property>  
<property>  
    <name>mapreduce.reduce.java.opts</name>  
    <value>-Xmx2252M</value>  
</property>  
<property>  
    <name>mapreduce.task.io.sort.mb</name>  
    <value>512</value>  
</property>  
<property>  
    <name>mapreduce.task.io.sort.factor</name>  
    <value>100</value>  
</property>  
</configuration>


yarn-site.xml contents:

<configuration>
<property> 
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop1:10020</value>
</property>
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop1:19888</value>
</property>      
 <property>  
   <name>yarn.resourcemanager.address</name>  
    <value>hadoop1:8032</value>  
  </property>  
  <property>  
    <name>yarn.resourcemanager.scheduler.address</name>  
    <value>hadoop1:8030</value>  
  </property>  
  <property>  
    <name>yarn.resourcemanager.scheduler.class</name>  
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>  
  </property>  
  <property>  
    <name>yarn.resourcemanager.resource-tracker.address</name>  
    <value>hadoop1:8031</value>  
  </property>  
  <property>  
    <name>yarn.resourcemanager.admin.address</name>  
    <value>hadoop1:8033</value>  
  </property>  
  <property>  
    <name>yarn.resourcemanager.webapp.address</name>  
    <value>hadoop1:8088</value>  
  </property>  
  <property>  
    <name>yarn.nodemanager.aux-services</name>  
    <value>mapreduce_shuffle</value>  
  </property>  
  <property>  
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>  
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>  
  </property>  
  <property>    
    <description>Classpath for typical applications.</description>    
    <name>yarn.application.classpath</name>    
    <value>$HADOOP_CONF_DIR  
    ,$HADOOP_COMMON_HOME/share/hadoop/common/*  
    ,$HADOOP_COMMON_HOME/share/hadoop/common/lib/*  
    ,$HADOOP_HDFS_HOME/share/hadoop/hdfs/*  
    ,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*  
    ,$YARN_HOME/share/hadoop/yarn/*</value>    
  </property>  
   
<!-- Configurations for NodeManager -->  
  <property>  
    <name>yarn.nodemanager.resource.memory-mb</name>  
    <value>5632</value>  
  </property>  
 <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1408</value>
  </property>

 <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>5632</value>
  </property>

</configuration>


(7) Distribute the whole hadoop directory and the /data data directory to every node with scp, for example as sketched below.
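A sketch of the distribution, assuming the slave hostnames from the slaves file and the same directory layout on every node:

for host in hadoop2 hadoop3; do
  scp -r /home/search/hadoop $host:/home/search/
  scp -r /data $host:/
done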
(8) Format HDFS
Run bin/hadoop namenode -format on the NameNode (hadoop1).
(9) Start the cluster
sbin/start-dfs.sh          starts HDFS
sbin/start-yarn.sh         starts YARN
sbin/mr-jobhistory-daemon.sh start historyserver   starts the job history server
(10) Verify the cluster status
Check with jps:
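
On the master node (hadoop1) jps should show something along these lines (the PIDs are made up; the slave nodes should show DataNode and NodeManager instead):

jps
2721 NameNode
2915 SecondaryNameNode
3120 ResourceManager
3410 JobHistoryServer
3555 Jps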




Check via the web UIs:
http://hadoop1:50070
http://hadoop1:8088
(11) Benchmarks
Map test:
bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.4.1.jar randomwriter rand
Reduce test:
bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.4.1.jar sort rand sort-rand
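
As an extra end-to-end check beyond the two benchmarks, a small wordcount job over the config files exercises HDFS and YARN together; the /tmp paths below are arbitrary:

bin/hdfs dfs -mkdir -p /tmp/wc-in
bin/hdfs dfs -put etc/hadoop/*.xml /tmp/wc-in
bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.4.1.jar wordcount /tmp/wc-in /tmp/wc-out
bin/hdfs dfs -cat /tmp/wc-out/part-r-00000 | head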

Official Hadoop documentation: http://hadoop.apache.org/docs/r2.7.0/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html

Reposted from weitao1026.iteye.com/blog/2266985