Hadoop Learning (3): Multi-Node Cluster Configuration

Hadoop cluster building

1 Node configuration
 Node name   Host name                      IP               Installed software            Processes
 master      zhaosy-HP-Compaq-Pro-6380-MT   109.123.100.83   jdk_1.8.0_65, hadoop_2.7.3    namenode, resourcemanager
 slave1      tizen-HP-Compaq-Pro-6380-MT    109.123.121.193  jdk_1.8.0_65, hadoop_2.7.3    datanode, secondarynamenode
 slave2      OCI-Server                     109.123.120.200  jdk_1.8.0_65, hadoop_2.7.3    datanode

2 Password-free login configuration
   On the host 109.123.100.83, which will become the master node, execute:
       ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
   This generates a public key id_rsa.pub and a private key id_rsa.
   Then copy the public key to the master itself and to the slave nodes:
   Run ssh-copy-id zhaosy-HP-Compaq-Pro-6380-MT, type yes when prompted, and enter the login password of host 109.123.100.83. The directory ~/.ssh/ on host 109.123.100.83 will then contain authorized_keys and known_hosts.
     At this point, running ssh zhaosy-HP-Compaq-Pro-6380-MT on the host zhaosy-HP-Compaq-Pro-6380-MT logs in without a password, which means the password-free login configuration for zhaosy-HP-Compaq-Pro-6380-MT is successful.
   Run ssh-copy-id tizen-HP-Compaq-Pro-6380-MT, type yes when prompted, and enter the login password of host 109.123.121.193. The directory ~/.ssh/ on host 109.123.121.193 will then contain authorized_keys.
     At this point, running ssh tizen-HP-Compaq-Pro-6380-MT on the host zhaosy-HP-Compaq-Pro-6380-MT logs in to the tizen-HP-Compaq-Pro-6380-MT host without a password, which means the password-free login configuration for tizen-HP-Compaq-Pro-6380-MT is successful.
   Run ssh-copy-id OCI-Server, type yes when prompted, and enter the login password of host 109.123.120.200. The directory ~/.ssh/ on that host will then contain authorized_keys.
     At this point, running ssh OCI-Server on the host zhaosy-HP-Compaq-Pro-6380-MT logs in to the OCI-Server host without a password, which means the password-free login configuration for OCI-Server is successful.
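   The same key distribution can be scripted in one pass. A minimal sketch, assuming the same user account exists on all three hosts and the host names resolve (see step 7 below):

       # Generate the master's key pair once (empty passphrase)
       ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
       # Push the public key to the master itself and to every slave;
       # each ssh-copy-id asks for that host's login password once
       for host in zhaosy-HP-Compaq-Pro-6380-MT tizen-HP-Compaq-Pro-6380-MT OCI-Server; do
           ssh-copy-id "$host"
       done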

3 Install and configure JDK and hadoop
    Install the same versions of jdk and hadoop on all three hosts, using the same installation directory on each.
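    Both tools also need to be on every node's PATH. A sketch of the ~/.bashrc entries, assuming hypothetical install paths (the hadoop path matches the logs later in this article; adjust both to your own layout):

        # Hypothetical paths; point them at where you actually unpacked the tarballs
        export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_65
        export HADOOP_HOME=/home/yangyong/SoftWare/BigData/Hadoop/hadoop-2.7.3
        export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin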
4 Modify the hadoop configuration file
   1) Modify ${HADOOP_HOME}/etc/hadoop/core-site.xml on the master node to:
  <configuration>
      <property>
          <!-- Communication address of HDFS (the namenode); point it at the master node so the slaves can reach it -->
          <name>fs.defaultFS</name>
          <value>hdfs://109.123.100.83:9000</value>
      </property>
      <property>
          <!-- Storage path for files generated while hadoop is running -->
          <name>hadoop.tmp.dir</name>
          <value>/home/yangyong/SoftWare/BigData/Hadoop/tmp</value>
      </property>
  </configuration>
  2) Modify ${HADOOP_HOME}/etc/hadoop/hdfs-site.xml on the master node to:
<configuration>
    <property>
        <!-- Set the http communication address of the namenode -->
        <name>dfs.namenode.http-address</name>
        <value>109.123.100.83:50070</value>
    </property>
    <property>
        <!-- Set the http communication address of the secondarynamenode -->
        <name>dfs.namenode.secondary.http-address</name>
        <value>109.123.121.193:50090</value>
    </property>
    <property>
        <!-- Set the storage path of the namenode -->
        <name>dfs.namenode.name.dir</name>
        <value>/home/yangyong/SoftWare/BigData/Hadoop/namenode</value>
    </property>
    <property>
        <!-- Set the storage path of the datanode -->
        <name>dfs.datanode.data.dir</name>
        <value>/home/yangyong/SoftWare/BigData/Hadoop/datanode</value>
    </property>
    <property>
        <!-- Set the number of hdfs replicas; it should not exceed the number of datanodes (2 here) -->
        <name>dfs.replication</name>
        <value>2</value>
    </property>
</configuration>
3) Modify ${HADOOP_HOME}/etc/hadoop/mapred-site.xml on the master node to:
  <configuration>
        <property>
           <!-- Tell the MR framework to use YARN -->
           <name>mapreduce.framework.name</name>
           <value>yarn</value>
        </property>
  </configuration>
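   Note that Hadoop 2.7.3 ships only a template for this file, so create it from the template first if it does not exist:

       cp ${HADOOP_HOME}/etc/hadoop/mapred-site.xml.template ${HADOOP_HOME}/etc/hadoop/mapred-site.xml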
 
4) Modify the ${HADOOP_HOME}/etc/hadoop/yarn-site.xml of the master node to:

<configuration>

<!-- Site specific YARN configuration properties -->
    <property>
        <!-- Set which node the resourcemanager runs on -->
        <name>yarn.resourcemanager.hostname</name>
        <value>109.123.100.83</value>
    </property>
    <property>
        <!-- Set the http access address of the resourcemanager -->
        <name>yarn.resourcemanager.webapp.address</name>
        <value>109.123.100.83:8088</value>
    </property>
    <property>
        <!-- The way the reducer fetches data is mapreduce_shuffle -->
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <!-- Total physical memory, in MB, that YARN may use on the node. The default is 8192 (MB); if the node has less than 8 GB of memory, reduce this value, because YARN does not detect the node's total physical memory on its own. -->
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>20480</value>
    </property>
    <property>
        <!-- Minimum memory, in MB, allocated per container -->
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>5120</value>
    </property>
    <property>
        <!-- Maximum memory, in MB, allocated per container -->
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>5120</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <!-- Disk space check: a disk is marked unhealthy once its utilization exceeds this percentage, which can stop mapreduce from running normally -->
        <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
        <value>99</value>
    </property>
</configuration>

5) Create a new masters file (this step has no effect in 2.7.3; you can skip it)
   Create the file ${HADOOP_HOME}/etc/hadoop/masters, containing the host name or IP of the secondary namenode; to be safe, prefer the IP:
         109.123.121.193
6) Create a new slaves file (configured only on the master node)
   Create the file ${HADOOP_HOME}/etc/hadoop/slaves, containing the host names or IPs of the slaves; to be safe, prefer the IPs:
       109.123.121.193
       109.123.120.200
7) Configure hosts
   Add the following to the /etc/hosts file on every node:
        109.123.100.83   zhaosy-HP-Compaq-Pro-6380-MT
        109.123.120.200  OCI-Server
        109.123.121.193  tizen-HP-Compaq-Pro-6380-MT
   Note: Ubuntu machines ship with a 127.0.1.1 entry; that entry must be deleted, otherwise the URLs in section 6 (Verification) cannot be opened.
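   To confirm the mapping took effect on a node, resolve each host name and check that the IP from /etc/hosts, not 127.0.1.1, comes back; getent is available on Ubuntu:

        getent hosts zhaosy-HP-Compaq-Pro-6380-MT tizen-HP-Compaq-Pro-6380-MT OCI-Server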
8) Configure the slave nodes
    On each slave node, repeat steps 1) through 6). There are two ways to do this:
      i:  use scp to copy the hadoop configuration from the master node to the slave nodes (see the sketch below); remember, the configured environment variables must take effect on every slave;
      ii: repeat the operations manually on each slave node.
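    For option i, a minimal sketch, assuming ${HADOOP_HOME} is the same path on every node and the password-free ssh from step 2 is in place:

        # Push the master's configuration directory to each slave
        for host in tizen-HP-Compaq-Pro-6380-MT OCI-Server; do
            scp -r ${HADOOP_HOME}/etc/hadoop "$host":${HADOOP_HOME}/etc/
        done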

5 Start from the master node
    1) For the first start, execute the format command ${HADOOP_HOME}/bin/hdfs namenode -format
    2) Start dfs by executing ${HADOOP_HOME}/sbin/start-dfs.sh
             109.123.121.193: starting datanode, logging to /home/yangyong/SoftWare/BigData/Hadoop/hadoop-2.7.3/logs/hadoop-yangyong-datanode-tizen-HP-Compaq-Pro-6380-MT.out
             109.123.120.200: starting datanode, logging to /home/yangyong/SoftWare/BigData/Hadoop/hadoop-2.7.3/logs/hadoop-yangyong-datanode-OCI-Server.out
            Starting secondary namenodes [tizen-HP-Compaq-Pro-6380-MT]
            tizen-HP-Compaq-Pro-6380-MT: starting secondarynamenode, logging to /home/yangyong/SoftWare/BigData/Hadoop/hadoop-2.7.3/logs/hadoop-yangyong-secondarynamenode-tizen-HP-Compaq-Pro-6380-MT.out

    3) Start yarn by executing ${HADOOP_HOME}/sbin/start-yarn.sh
               109.123.121.193: starting nodemanager, logging to /home/yangyong/SoftWare/BigData/Hadoop/hadoop-2.7.3/logs/yarn-yangyong-nodemanager-tizen-HP-Compaq-Pro-6380-MT.out
               109.123.120.200: starting nodemanager, logging to /home/yangyong/SoftWare/BigData/Hadoop/hadoop-2.7.3/logs/yarn-yangyong-nodemanager-OCI-Server.out

    4) The master executes jps
              20081 ResourceManager
              19750 NameNode
              20344 Jps

    5) slave1 executes jps
       5850 SecondaryNameNode
       5707 DataNode
       6139 Jps
       6015 NodeManager
    6) slave2 executes jps
       59091 Jps
       58566 DataNode
       58839 NodeManager
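    Besides running jps on each node, you can ask the resourcemanager itself which nodemanagers have registered. An extra check with the standard yarn CLI, run on the master:

       # Should list the two slave nodes with a RUNNING state
       ${HADOOP_HOME}/bin/yarn node -list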
6 Verification
   1) Enter http://109.123.100.83:50070 in the browser
   2) Enter http://109.123.100.83:8088 in the browser
   There must be data shown on these pages, i.e. live datanodes and active nodes, to indicate that the configuration is successful.

If jps in step 5 shows that every node is normal but the pages in step 6 cannot be displayed, run ${HADOOP_HOME}/bin/hdfs dfsadmin -report and troubleshoot according to the output.
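For example, the live-datanode count near the top of the report is the first thing to look at; a small sketch, with the report's wording assumed from Hadoop 2.7:

       # Should print a line like "Live datanodes (2):" for this cluster
       ${HADOOP_HOME}/bin/hdfs dfsadmin -report | grep "Live datanodes"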
7 Test
      1) Create a local file words.txt with the following content:
          Hello World!
          Hello China!
          Hello Jim
          Hello Tom
          The People's Republic Of China!
         Upload words.txt to the HDFS root directory: ${HADOOP_HOME}/bin/hadoop fs -put words.txt /
         Use the command ${HADOOP_HOME}/bin/hadoop fs -ls / to view the uploaded file:
         Found 1 items
         -rw-r--r-- 2 yangyong supergroup 57 2017-06-20 17:44 /words.txt

         At this point, you can view the uploaded file at http://109.123.100.83:50070/explorer.html#/
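         You can also read the file back from the command line as a sanity check:

         ${HADOOP_HOME}/bin/hadoop fs -cat /words.txt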

      2)${HADOOP_HOME}/bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount /words.txt  /output_wordcount

       yangyong@zhaosy-HP-Compaq-Pro-6380-MT:~/SoftWare/BigData/Hadoop/hadoop-2.7.3$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount /words.txt  /output_wordcount
17/06/20 17:19:46 INFO client.RMProxy: Connecting to ResourceManager at /109.123.100.83:8032
17/06/20 17:19:47 INFO input.FileInputFormat: Total input paths to process : 1
17/06/20 17:19:48 INFO mapreduce.JobSubmitter: number of splits:1
17/06/20 17:19:49 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1497947875315_0019
17/06/20 17:19:49 INFO impl.YarnClientImpl: Submitted application application_1497947875315_0019
17/06/20 17:19:49 INFO mapreduce.Job: The url to track the job: http://zhaosy-hp-compaq-pro-6380-mt:8088/proxy/application_1497947875315_0019/
17/06/20 17:19:49 INFO mapreduce.Job: Running job: job_1497947875315_0019
17/06/20 17:19:54 INFO mapreduce.Job: Job job_1497947875315_0019 running in uber mode : false
17/06/20 17:19:54 INFO mapreduce.Job:  map 0% reduce 0%
17/06/20 17:19:58 INFO mapreduce.Job:  map 100% reduce 0%
17/06/20 17:20:02 INFO mapreduce.Job:  map 100% reduce 100%
17/06/20 17:20:04 INFO mapreduce.Job: Job job_1497947875315_0019 completed successfully
17/06/20 17:20:04 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=74
                FILE: Number of bytes written=238345
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=158
                HDFS: Number of bytes written=44
                HDFS: Number of read operations=6
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters
                Launched map tasks=1
                Launched reduce tasks=1
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=1721
                Total time spent by all reduces in occupied slots (ms)=1933
                Total time spent by all map tasks (ms)=1721
                Total time spent by all reduce tasks (ms)=1933
                Total vcore-milliseconds taken by all map tasks=1721
                Total vcore-milliseconds taken by all reduce tasks=1933
                Total megabyte-milliseconds taken by all map tasks=1762304
                Total megabyte-milliseconds taken by all reduce tasks=1979392
        Map-Reduce Framework
                Map input records=5
                Map output records=10
                Map output bytes=97
                Map output materialized bytes=74
                Input split bytes=101
                Combine input records=10
                Combine output records=6
                Reduce input groups=6
                Reduce shuffle bytes=74
                Reduce input records=6
                Reduce output records=6
                Spilled Records=12
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=80
                CPU time spent (ms)=1070
                Physical memory (bytes) snapshot=440082432
                Virtual memory (bytes) snapshot=3844157440
                Total committed heap usage (bytes)=291504128
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=57
        File Output Format Counters
                Bytes Written=44


yangyong@zhaosy-HP-Compaq-Pro-6380-MT:~/SoftWare/BigData/Hadoop/hadoop-2.7.3$ ./bin/hadoop fs -cat  /output_wordcount/*
Bye     1
Hadoop  2
Hello   4
Jack    1
Tom     1
World   1

8 Shutdown
${HADOOP_HOME}/sbin/stop-yarn.sh
${HADOOP_HOME}/sbin/stop-dfs.sh
