Setting Up a Hadoop and Hive Environment in Detail

http://ilovejavaforever.iteye.com/blog/733247

 

1. Hadoop Environment Setup
     First, download the hadoop-0.20.2.tar.gz package from the Apache official website.
      Extract the hadoop-0.20.2.tar.gz package with the following command:
      tar zxvf hadoop-0.20.2.tar.gz
      Note that plain .tar archives are extracted with tar xvf, while .tar.gz archives need tar zxvf.
During extraction, if the archive is not recognized or cannot be extracted, it is most likely a permissions problem. The solution is to change the file's permissions with the following command:
      chmod 777 hadoop-0.20.2.tar.gz
      where 777 grants full permissions to everyone.
      If the error persists, for example: Archive contains obsolescent base-64 headers; Error exit delayed from previous errors.
      In this case the problem is usually a corrupted archive. Most people download the package in a Windows environment and then upload it to the Linux machine via FTP or a similar method, which easily corrupts the file. I suggest downloading it directly on Linux, with the following command:
wget http://labs.renren.com/apache-mirror/hadoop/core/hadoop-0.20.2/hadoop-0.20.2.tar.gz
       This downloads the file straight into the current directory.

       When the file is ready, we only have to modify the configuration and Hadoop will simply run.
       First, enter the hadoop-0.20.2/conf directory, which contains the configuration files (among them core-site.xml, hdfs-site.xml, mapred-site.xml, masters, and slaves).

       First, modify the masters and slaves files, which specify the IPs of our master and slave nodes. Here we use a single machine as the example, so simply enter the current machine's IP in both files, as in the sketch below.
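       A minimal sketch of the two files for a single-machine setup (assuming the machine's IP is 192.168.216.57, the address used in the mapred-site.xml below; substitute your own):

       # conf/masters
       192.168.216.57

       # conf/slaves
       192.168.216.57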


       Next, we modify the mapred-site.xml file. The specific configuration is as follows:


<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
        <name>mapred.job.tracker</name>
        <value>hdfs://192.168.216.57:8012</value>
        <description>The host and port that the MapReduce job tracker runs
            at. If "local", then jobs are run in-process as a single map
            and reduce task.
            Pass in the jobtracker hostname via the
            -Dhadoop.jobtracker=JOBTRACKER_HOST java option.
        </description>
    </property>
</configuration>





      The mapred.job.tracker property is the key: MapReduce takes a job and, through map(), splits it into n tasks.
      The next configuration file, core-site.xml, is configured in detail as follows:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration> 
<property>
     <name>fs.default.name</name>
     <value>hdfs://localhost:9100</value>
 </property>

 <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9101</value>
 </property>

 <property>
    <name>dfs.replication</name>
    <value>1</value>
 </property>

 <property>
     <name>hadoop.tmp.dir</name>
     <value>/home/admin/tmp/</value>
     <description>A base for other temporary directories. Set to a
           directory off of the user's home directory for the simple test.
     </description>
 </property>

 </configuration>





      This mainly configures our file system. Note that the value of fs.default.name must not be an IP address; it has to be a hostname. To look up the hostname, use the following commands:
      cd /etc
      vi hosts
      In the hosts file, find the hostname that corresponds to your IP.
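      For example, a typical hosts entry (this one pairs the IP used earlier in mapred-site.xml with the hostname used for the web consoles below; yours will differ):
      192.168.216.57   cap216057.sqa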
  
       At this point, Hadoop's own configuration is complete. However, Hadoop performs file operations between master and slaves, and operating across machines requires passwordless login. For that we have to set up the corresponding public and private keys.
      The specific commands are:
      ssh-keygen -t rsa -P ''
      -P specifies the passphrase, so -P '' means an empty passphrase. You can also omit the -P option, in which case you have to press Enter three times; with -P you press Enter only once. This generates a .ssh directory under ~/, and .ssh contains id_rsa and id_rsa.pub.
       If there are multiple machines, the public key id_rsa.pub needs to be copied via scp to the same directory on the other machines.
       Afterwards, append the public key to the appropriate file, as follows:
       cat id_rsa.pub >> .ssh/authorized_keys
       chmod 600 .ssh/authorized_keys

Note that authorized_keys must have permission 600.

       The first login asks you to type yes; after that it no longer does.
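       A quick sanity check that passwordless login works (a minimal sketch; run as the same user that generated the key):
       ssh localhost   # should log in without prompting for a password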

       OK, everything is set up. Enter the hadoop-0.20.2/bin directory and run the start-all.sh script directly to start the Hadoop services.
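       A minimal sketch of the start-up sequence (the namenode format step is an assumption for a first run and is not part of the original walkthrough; jps ships with the JDK):
       cd hadoop-0.20.2/bin
       ./hadoop namenode -format   # assumed step: run once, before the very first start
       ./start-all.sh
       jps   # on a single node, should list NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker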

       We can monitor Hadoop's operation through the web, at the following URLs:

       JobTracker console: http://cap216057.sqa:50030/jobtracker.jsp

       HDFS health page: http://cap216057.sqa:50070/dfshealth.jsp

       cap216057.sqa can be configured in hosts, or you can access the IP address directly.

2. Hive Setup
       Hive is built on top of Hadoop, so by comparison it is much simpler. You only need to set HADOOP_HOME and HIVE_HOME.

       First download and extract it, just as with Hadoop above. Note that the Apache website offers both bin and dev tarballs; I recommend the bin version.
       wget http://labs.renren.com/apache-mirror/hadoop/hive/hive-0.5.0/hive-0.5.0-bin.tar.gz

       After extraction, set the variables as follows:
       export HADOOP_HOME=/home/admin/hadoop-0.20.2/
       export HIVE_HOME=/home/admin/hive-0.5.0-bin
       With that done, start Hive directly from the hive/bin directory; any unoccupied port will do as the service port.
       ./hive --service hiveserver 10000 &
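       To confirm the Thrift service is actually listening (a minimal check; 10000 is the port chosen above):
       netstat -an | grep 10000   # should show a socket in LISTEN state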
   
3. Testing
      In the hive/bin directory, run ./hive to enter the Hive command console. There you execute HQL, which is Hive's SQL,
      as follows:
      create table user (id int);
      show tables;
      Note that the semicolon at the end of each statement must not be omitted.
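      Beyond that, a minimal sketch of loading and querying data (the local file /tmp/ids.txt is hypothetical; one integer per line matches the table's single INT column):
      load data local inpath '/tmp/ids.txt' overwrite into table user;
      select * from user;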
  
   Look at the result... ha ha, it's done!

Reproduced from: https://www.cnblogs.com/licheng/archive/2011/11/09/2242315.html
