Hadoop Study Notes 3: Multi-Node Cluster Configuration

Setting up a Hadoop cluster

1. Node layout
   Node      Hostname                         IP                 Installed software             Processes
   master    zhaosy-HP-Compaq-Pro-6380-MT     109.123.100.83     jdk_1.8.0_65, hadoop_2.7.3     namenode, resourcemanager
   slave1    tizen-HP-Compaq-Pro-6380-MT      109.123.121.193    jdk_1.8.0_65, hadoop_2.7.3     datanode, secondarynamenode
   slave2    OCI-Server                       109.123.120.200    jdk_1.8.0_65, hadoop_2.7.3     datanode

2. Passwordless SSH login setup
  On the host that will act as the master node, 109.123.100.83, run:
      ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
   This generates a key pair in ~/.ssh/ on the master: a public key id_rsa.pub and a private key id_rsa.
   Copy the public key to the other nodes (and to the master itself) with the following commands:
   ssh-copy-id zhaosy-HP-Compaq-Pro-6380-MT    Answer yes and enter the login password for 109.123.100.83; authorized_keys and known_hosts are created in ~/.ssh/ on that host.
     Now ssh zhaosy-HP-Compaq-Pro-6380-MT from the master logs in to zhaosy-HP-Compaq-Pro-6380-MT without a password, confirming passwordless login to zhaosy-HP-Compaq-Pro-6380-MT is set up.
   ssh-copy-id tizen-HP-Compaq-Pro-6380-MT     Answer yes and enter the login password for 109.123.121.193; authorized_keys is created in ~/.ssh/ on that host.
     Now ssh tizen-HP-Compaq-Pro-6380-MT from the master logs in to tizen-HP-Compaq-Pro-6380-MT without a password, confirming passwordless login to tizen-HP-Compaq-Pro-6380-MT is set up.
   ssh-copy-id OCI-Server                      Answer yes and enter the login password for 109.123.120.200; authorized_keys is created in ~/.ssh/ on that host.
     Now ssh OCI-Server from the master logs in to OCI-Server without a password, confirming passwordless login to OCI-Server is set up.
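   The three ssh-copy-id runs can also be scripted. A minimal sketch (the loop itself is an illustration, not part of the original setup; it assumes the same login user on every node):

       #!/usr/bin/env bash
       # Generate the master's key pair once (no passphrase), then push the
       # public key to every node in the cluster, including the master itself.
       set -e
       [ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
       for host in zhaosy-HP-Compaq-Pro-6380-MT tizen-HP-Compaq-Pro-6380-MT OCI-Server; do
           ssh-copy-id "$host"        # prompts once for that host's login password
           ssh "$host" hostname       # should now print the hostname with no password prompt
       done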

3. Install and configure the JDK and Hadoop
    Install the same versions of the JDK and Hadoop on all three hosts, using identical installation directories; a sketch of the corresponding environment variables follows.
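    A common way to keep the paths identical is to export the same variables on every node. A minimal sketch, assuming the Hadoop path used later in this article; the JDK path shown is an assumption:

       # Append to ~/.bashrc on every node, then run: source ~/.bashrc
       export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_65            # assumed JDK install location
       export HADOOP_HOME=$HOME/SoftWare/BigData/Hadoop/hadoop-2.7.3
       export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin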
4. Editing the Hadoop configuration files
   1) On the master node, edit ${HADOOP_HOME}/etc/hadoop/core-site.xml to:
  <configuration>
      <property>
          <!-- Address of HDFS (the namenode); it must be reachable from the slaves, so use the master's address rather than localhost -->
          <name>fs.defaultFS</name>
          <value>hdfs://109.123.100.83:9000</value>
      </property>
      <property>
          <!-- Directory for files Hadoop generates at runtime -->
          <name>hadoop.tmp.dir</name>
          <value>/home/yangyong/SoftWare/BigData/Hadoop/tmp</value>
      </property>
  </configuration>
  2) On the master node, edit ${HADOOP_HOME}/etc/hadoop/hdfs-site.xml to:
<configuration>
    <property>
        <!-- HTTP address of the namenode -->
        <name>dfs.namenode.http-address</name>
        <value>109.123.100.83:50070</value>
    </property>
    <property>
        <!-- HTTP address of the secondarynamenode -->
        <name>dfs.namenode.secondary.http-address</name>
        <value>109.123.121.193:50090</value>
    </property>
    <property>
        <!-- Storage path for namenode metadata -->
        <name>dfs.namenode.name.dir</name>
        <value>/home/yangyong/SoftWare/BigData/Hadoop/namenode</value>
    </property>
    <property>
        <!-- Storage path for datanode blocks -->
        <name>dfs.datanode.data.dir</name>
        <value>/home/yangyong/SoftWare/BigData/Hadoop/datanode</value>
    </property>
    <property>
        <!-- HDFS replication factor -->
        <name>dfs.replication</name>
        <value>2</value>
    </property>
</configuration>
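   The storage directories named above are best created up front on each node, which avoids permission surprises at startup. A minimal sketch, using the paths configured above:

       # Run on every node (the namenode dir is only used on the master,
       # the datanode dir only on the slaves, but creating both is harmless).
       mkdir -p /home/yangyong/SoftWare/BigData/Hadoop/tmp \
                /home/yangyong/SoftWare/BigData/Hadoop/namenode \
                /home/yangyong/SoftWare/BigData/Hadoop/datanode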
3) On the master node, edit ${HADOOP_HOME}/etc/hadoop/mapred-site.xml to:
  <configuration>
        <property>
           <!-- Tell the MapReduce framework to run on YARN -->
           <name>mapreduce.framework.name</name>
           <value>yarn</value>
       </property>
  </configuration>
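   Note that Hadoop 2.7.3 ships only a template for this file, so it usually has to be created first:

       cd ${HADOOP_HOME}/etc/hadoop
       cp mapred-site.xml.template mapred-site.xml   # then edit as above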
 
4) On the master node, edit ${HADOOP_HOME}/etc/hadoop/yarn-site.xml to:

<configuration>

<!-- Site specific YARN configuration properties -->
    <property>
        <!-- Which node runs the resourcemanager -->
        <name>yarn.resourcemanager.hostname</name>
        <value>109.123.100.83</value>
    </property>
    <!-- HTTP address of the resourcemanager web UI -->
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>109.123.100.83:8088</value>
    </property>

    <property>
        <!-- Reducers fetch map output via mapreduce_shuffle -->
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <!-- Total physical memory (MB) on this node that YARN may use. The default is 8192 (MB); if the node has less than 8 GB of RAM, lower this value, because YARN does not probe the node's actual physical memory. -->
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>20480</value>
    </property>
    <property>
        <!-- Minimum memory (MB) allocated per container request -->
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>5120</value>
    </property>
    <property>
        <!-- Maximum physical memory (MB) a single task may request; the default is 8192 (MB) -->
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>5120</value>
    </property>
    <property>
        <!-- Handler class backing the mapreduce_shuffle auxiliary service -->
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <!-- Disk-space health check: when a disk's utilization exceeds this percentage it is marked unhealthy, which can stop MapReduce from running -->
        <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
        <value>99</value>
    </property>
</configuration>
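   Typos in these XML files tend to surface only when the daemons start. A quick well-formedness check before starting anything, assuming xmllint (from libxml2-utils) is installed:

       cd ${HADOOP_HOME}/etc/hadoop
       for f in core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml; do
           xmllint --noout "$f" && echo "$f OK"   # prints parse errors, if any
       done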

5) Create a master file   (this step has no effect in 2.7.3 and can be skipped)
   Create the file ${HADOOP_HOME}/etc/hadoop/master containing the secondary namenode's hostname or IP; to be safe, the IP is preferable:
         109.123.121.193
6) Create a slaves file (configured on the master node only)
  Create the file ${HADOOP_HOME}/etc/hadoop/slaves containing the slaves' hostnames or IPs; to be safe, IPs are preferable:
       109.123.121.193
       109.123.120.200
7) Configure hosts
        Add the following to /etc/hosts:

         109.123.100.83   zhaosy-hp-compaq-pro-6380-mt
         109.123.120.200  oci-server
         109.123.121.193  tizen-HP-Compaq-Pro-6380-MT

      Note: Ubuntu machines ship with a 127.0.1.1 entry; it must be deleted, otherwise the URLs in step 6 below will not open. A cleanup sketch follows.
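      A minimal sketch of checking for and removing the offending entry (keep a backup of /etc/hosts before editing it):

          grep -n '127.0.1.1' /etc/hosts             # show the offending line, if present
          sudo cp /etc/hosts /etc/hosts.bak          # back up first
          sudo sed -i '/^127\.0\.1\.1/d' /etc/hosts  # remove the 127.0.1.1 entry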

8) Configure the slave nodes
    On each slave node, repeat steps 1) through 6).
    There are two ways to do this: i. use scp to copy the Hadoop configuration from the master to the slave nodes, as sketched below (remember that the environment variables must also take effect on every slave);
                                   ii. redo the steps by hand on each slave node.
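    A minimal sketch of option i, assuming identical ${HADOOP_HOME} paths and the same user on every node (which section 3 requires):

        # Push the finished configuration directory from the master to each slave.
        for host in tizen-HP-Compaq-Pro-6380-MT OCI-Server; do
            scp -r ${HADOOP_HOME}/etc/hadoop/ "$host":${HADOOP_HOME}/etc/
        done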
9) The clocks of the master and slave nodes must be synchronized, otherwise jobs will later fail to run because of clock skew  (e.g. sudo date -s "2017-06-20 17:10:30"; a more durable alternative is sketched below)
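    Setting the date by hand drifts again quickly. A hedged alternative, assuming an apt-based system with access to a public NTP pool:

        sudo apt-get install -y ntpdate     # assumption: Ubuntu/Debian node
        sudo ntpdate pool.ntp.org           # run once on every node to synchronize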

5. Starting the cluster from the master node
    1) On first startup, format the namenode: ${HADOOP_HOME}/bin/hdfs namenode -format
    2) Start DFS: run ${HADOOP_HOME}/sbin/start-dfs.sh
             109.123.121.193: starting datanode, logging to /home/yangyong/SoftWare/BigData/Hadoop/hadoop-2.7.3/logs/hadoop-yangyong-datanode-tizen-HP-Compaq-Pro-6380-MT.out
             109.123.120.200: starting datanode, logging to /home/yangyong/SoftWare/BigData/Hadoop/hadoop-2.7.3/logs/hadoop-yangyong-datanode-OCI-Server.out
            Starting secondary namenodes [tizen-HP-Compaq-Pro-6380-MT]
            tizen-HP-Compaq-Pro-6380-MT: starting secondarynamenode, logging to /home/yangyong/SoftWare/BigData/Hadoop/hadoop-2.7.3/logs/hadoop-yangyong-secondarynamenode-tizen-HP-Compaq-Pro-6380-MT.out

    3) Start YARN: run ${HADOOP_HOME}/sbin/start-yarn.sh
               109.123.121.193: starting nodemanager, logging to /home/yangyong/SoftWare/BigData/Hadoop/hadoop-2.7.3/logs/yarn-yangyong-nodemanager-tizen-HP-Compaq-Pro-6380-MT.out
               109.123.120.200: starting nodemanager, logging to /home/yangyong/SoftWare/BigData/Hadoop/hadoop-2.7.3/logs/yarn-yangyong-nodemanager-OCI-Server.out

    4) Run jps on the master:
              20081 ResourceManager
             19750 NameNode
             20344 Jps

    5) Run jps on slave1:

       5850 SecondaryNameNode
       5707 DataNode
       6139 Jps
       6015 NodeManager
    6) Run jps on slave2:
       59091 Jps
       58566 DataNode
       58839 NodeManager
6. Verification
   1) Open http://109.123.100.83:50070 in a browser
      (screenshot: HDFS namenode web UI)

   2) Open http://109.123.100.83:8088 in a browser
      (screenshot: YARN resourcemanager web UI)

   The pages must actually show data (live nodes); only then is the configuration confirmed to be working.

If jps in step 5 shows all processes up on every node but the step-6 pages will not load, run ./bin/hdfs dfsadmin -report and troubleshoot from its output; see also the sketch below.
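Two standard Hadoop 2.x commands that summarize cluster health from the master:

    ${HADOOP_HOME}/bin/hdfs dfsadmin -report   # capacity and state of every datanode
    ${HADOOP_HOME}/bin/yarn node -list         # nodemanagers registered with the resourcemanager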
7. Testing
      1) Create a local file words.txt with the following content:
          Hello World!
          Hello China!
          Hello Jim
          Hello Tom
          The People's Republic Of China!
         Upload words.txt to the HDFS root directory: ${HADOOP_HOME}/bin/hadoop fs -put words.txt /
         ${HADOOP_HOME}/bin/hadoop fs -ls / now lists the file:
         Found 1 items
         -rw-r--r--   2 yangyong supergroup         57 2017-06-20 17:44 /words.txt

         The uploaded file is also visible at http://109.123.100.83:50070/explorer.html#/

      2) Run the bundled wordcount example:
         ${HADOOP_HOME}/bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount /words.txt /output_wordcount

       yangyong@zhaosy-HP-Compaq-Pro-6380-MT:~/SoftWare/BigData/Hadoop/hadoop-2.7.3$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount /words.txt  /output_wordcount
17/06/20 17:19:46 INFO client.RMProxy: Connecting to ResourceManager at /109.123.100.83:8032
17/06/20 17:19:47 INFO input.FileInputFormat: Total input paths to process : 1
17/06/20 17:19:48 INFO mapreduce.JobSubmitter: number of splits:1
17/06/20 17:19:49 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1497947875315_0019
17/06/20 17:19:49 INFO impl.YarnClientImpl: Submitted application application_1497947875315_0019
17/06/20 17:19:49 INFO mapreduce.Job: The url to track the job: http://zhaosy-hp-compaq-pro-6380-mt:8088/proxy/application_1497947875315_0019/
17/06/20 17:19:49 INFO mapreduce.Job: Running job: job_1497947875315_0019
17/06/20 17:19:54 INFO mapreduce.Job: Job job_1497947875315_0019 running in uber mode : false
17/06/20 17:19:54 INFO mapreduce.Job:  map 0% reduce 0%
17/06/20 17:19:58 INFO mapreduce.Job:  map 100% reduce 0%
17/06/20 17:20:02 INFO mapreduce.Job:  map 100% reduce 100%
17/06/20 17:20:04 INFO mapreduce.Job: Job job_1497947875315_0019 completed successfully
17/06/20 17:20:04 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=74
                FILE: Number of bytes written=238345
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=158
                HDFS: Number of bytes written=44
                HDFS: Number of read operations=6
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters
                Launched map tasks=1
                Launched reduce tasks=1
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=1721
                Total time spent by all reduces in occupied slots (ms)=1933
                Total time spent by all map tasks (ms)=1721
                Total time spent by all reduce tasks (ms)=1933
                Total vcore-milliseconds taken by all map tasks=1721
                Total vcore-milliseconds taken by all reduce tasks=1933
                Total megabyte-milliseconds taken by all map tasks=1762304
                Total megabyte-milliseconds taken by all reduce tasks=1979392
        Map-Reduce Framework
                Map input records=5
                Map output records=10
                Map output bytes=97
                Map output materialized bytes=74
                Input split bytes=101
                Combine input records=10
                Combine output records=6
                Reduce input groups=6
                Reduce shuffle bytes=74
                Reduce input records=6
                Reduce output records=6
                Spilled Records=12
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=80
                CPU time spent (ms)=1070
                Physical memory (bytes) snapshot=440082432
                Virtual memory (bytes) snapshot=3844157440
                Total committed heap usage (bytes)=291504128
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=57
        File Output Format Counters
                Bytes Written=44


yangyong@zhaosy-HP-Compaq-Pro-6380-MT:~/SoftWare/BigData/Hadoop/hadoop-2.7.3$ ./bin/hadoop fs -cat  /output_wordcount/*
Bye     1
Hadoop  2
Hello   4
Jack    1
Tom     1
World   1

8. Shutdown
${HADOOP_HOME}/sbin/stop-yarn.sh
${HADOOP_HOME}/sbin/stop-dfs.sh

Reposted from yy9991818.iteye.com/blog/2380671