Hadoop Study Notes 4: Pseudo-Distributed Operation Mode

  • Pseudo-distributed mode

    Hadoop can also run on a single node in pseudo-distributed mode, where each Hadoop daemon runs in a separate Java process.

    • Start HDFS and run the MapReduce program

      • Configure the cluster

        • Configure etc/hadoop/hadoop-env.sh and set JAVA_HOME to the explicit JDK path.

          [root@localhost hadoop]# vim hadoop-env.sh 
          # The only required environment variable is JAVA_HOME.  All others are
          # optional.  When running a distributed configuration it is best to
          # set JAVA_HOME in this file, so that it is correctly defined on
          # remote nodes.
          # The java implementation to use.
          export JAVA_HOME=/opt/module/jdk1.8.0_144
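
          To confirm the path is valid before starting any daemons, the JDK binary can be run directly (a quick sanity check, assuming the same install path as above):

          # Should print the JDK version if the path is correct
          [root@localhost hadoop]# /opt/module/jdk1.8.0_144/bin/java -version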
          
        • Configure etc/hadoop/core-site.xml to specify the NameNode address and the temporary file directory.

          [root@localhost hadoop]# vim core-site.xml
          <configuration>
          <!-- Specify the address of the NameNode in HDFS -->
          <property>
              <name>fs.defaultFS</name>
              <value>hdfs://192.168.116.100:9000</value>
          </property>
          <!-- Specify the storage directory for files generated at Hadoop runtime -->
          <property>
              <name>hadoop.tmp.dir</name>
              <value>/opt/module/hadoop-2.7.2/data/tmp</value>
          </property>
          </configuration>
          

          Since no hostname mapping is configured in the hosts file, the NameNode address is given as an IP address here; a hostname could be used instead, as sketched below.
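
          As an alternative to hard-coding the IP, a hostname mapping could be added to /etc/hosts on each machine involved; hadoop100 below is a hypothetical hostname used only for illustration:

          # Hypothetical mapping; any consistent hostname works
          [root@localhost hadoop]# echo "192.168.116.100 hadoop100" >> /etc/hosts
          # fs.defaultFS could then be written as hdfs://hadoop100:9000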

        • Configure etc/hadoop/hdfs-site.xml and set the number of replicas. The default is 3; the value set here is only local configuration, and blocks are replicated to other nodes automatically as they join. On this single node it is set to 1; a quick way to read the value back is shown after the snippet.

          [root@localhost hadoop]# vim hdfs-site.xml
          <configuration>
          <!-- Specify the number of HDFS replicas -->
          <property>
              <name>dfs.replication</name>
              <value>1</value>
          </property>
          </configuration>
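
          After editing, the effective value can be read back with hdfs getconf (a quick check, assuming the configuration above is in place):

          # Prints 1 with the hdfs-site.xml above; without it, the default 3
          [root@localhost hadoop-2.7.2]# bin/hdfs getconf -confKey dfs.replication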
          
      • Start the cluster

        • Format the NameNode (required only before the first start)

          [root@localhost hadoop-2.7.2]# bin/hdfs namenode -format
          
        • Start the NameNode and DataNode

        [root@localhost hadoop-2.7.2]# sbin/hadoop-daemon.sh start namenode
        starting namenode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-root-namenode-localhost.localdomain.out
        [root@localhost hadoop-2.7.2]# sbin/hadoop-daemon.sh start datanode
        starting datanode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-root-datanode-localhost.localdomain.out
        
      • View cluster

        • Check whether the daemons started successfully (jps is a JDK command; it becomes available once the JDK environment variables are configured)

          [root@localhost hadoop-2.7.2]# jps
          1362 DataNode
          1461 Jps
          1308 NameNode
          
        • View the HDFS file system through the web UI. Here it is accessed from a browser on the Windows host; since the hosts mapping is not configured, access is directly by IP.

          http://192.168.116.100:50070/dfshealth.html#tab-overview
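
          Before testing from the host browser, the UI can be fetched from the VM itself to confirm the NameNode web server is up (a minimal check; curl availability is assumed):

          # Dumps the start of the overview page if the web UI is serving
          [root@localhost hadoop-2.7.2]# curl -s http://192.168.116.100:50070/dfshealth.html | head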

        • View the generated log

          [root@localhost logs]# ll
          total 72
          -rw-r--r-- 1 root root 25277 Jul  5 20:10 hadoop-root-datanode-localhost.localdomain.log
          -rw-r--r-- 1 root root   714 Jul  5 19:52 hadoop-root-datanode-localhost.localdomain.out
          -rw-r--r-- 1 root root 30915 Jul  5 20:10 hadoop-root-namenode-localhost.localdomain.log
          -rw-r--r-- 1 root root  5002 Jul  5 20:00 hadoop-root-namenode-localhost.localdomain.out
          -rw-r--r-- 1 root root     0 Jul  5 19:52 SecurityAuth-root.audit
          [root@localhost logs]# cat hadoop-root-datanode-localhost.localdomain.log 
          
        • Notes on formatting the NameNode:

          • Enter the storage directory for files generated at Hadoop runtime (the hadoop.tmp.dir configured earlier):

          • The name directory (NameNode):

          [root@localhost hadoop-2.7.2]# cd data/tmp/dfs/name/current/
          [root@localhost current]# ll
          total 1040
          -rw-r--r-- 1 root root 1048576 Jul  5 20:10 edits_inprogress_0000000000000000001
          -rw-r--r-- 1 root root     350 Jul  5 19:50 fsimage_0000000000000000000
          -rw-r--r-- 1 root root      62 Jul  5 19:50 fsimage_0000000000000000000.md5
          -rw-r--r-- 1 root root       2 Jul  5 19:52 seen_txid
          -rw-r--r-- 1 root root     201 Jul  5 19:50 VERSION
          [root@localhost current]# cat VERSION 
          #Sun Jul 05 19:50:24 CST 2020
          namespaceID=253643691
          clusterID=CID-53139122-7fe0-405f-bdde-522fbfa9fe95
          cTime=0
          storageType=NAME_NODE
          blockpoolID=BP-1432435135-127.0.0.1-1593949824604
          layoutVersion=-63
          
          • The data directory (DataNode):
          [root@localhost hadoop-2.7.2]# cd data/tmp/dfs/data/current/
          [root@localhost current]# ll
          total 4
          drwx------ 4 root root  54 Jul  5 19:52 BP-1432435135-127.0.0.1-1593949824604
          -rw-r--r-- 1 root root 229 Jul  5 19:52 VERSION
          [root@localhost current]# cat VERSION 
          #Sun Jul 05 19:52:36 CST 2020
          storageID=DS-9a858421-29ac-4778-b625-6881374acfd6
          clusterID=CID-53139122-7fe0-405f-bdde-522fbfa9fe95
          cTime=0
          datanodeUuid=acc2d611-bd06-4a73-94e8-9672fed10714
          storageType=DATA_NODE
          layoutVersion=-56
          

          Notice that the clusterID in the NameNode and the DataNode is the same; in HDFS the two IDs must match for the nodes to communicate. Casually re-formatting the NameNode generates a new clusterID that no longer matches the DataNode's, so the two can no longer communicate and data becomes unreachable. Therefore, before re-formatting the NameNode, first delete the data and log directories and only then run the namenode -format operation, as sketched below.
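
          A minimal sketch of a safe re-format, following the rule above (paths assume the hadoop.tmp.dir and log locations used in these notes):

          # 1. Stop the daemons that hold the old storage
          [root@localhost hadoop-2.7.2]# sbin/hadoop-daemon.sh stop datanode
          [root@localhost hadoop-2.7.2]# sbin/hadoop-daemon.sh stop namenode
          # 2. Delete the old data and logs so no stale clusterID survives
          [root@localhost hadoop-2.7.2]# rm -rf data/ logs/
          # 3. Re-format; fresh, matching IDs are generated on the next start
          [root@localhost hadoop-2.7.2]# bin/hdfs namenode -format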

      • Operating the cluster

        • Create an input folder (input) in the HDFS file system

          [root@localhost hadoop-2.7.2]# bin/hdfs dfs -mkdir -p /user/bcxtm/input
          
        • Upload the test file to the file system with -put

          [root@localhost hadoop-2.7.2]# bin/hdfs dfs -put wcinput/wc.input /user/bcxtm/input/
          

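          To confirm the upload, the file can be listed and printed straight from HDFS (same paths as above):

          [root@localhost hadoop-2.7.2]# bin/hdfs dfs -ls /user/bcxtm/input
          [root@localhost hadoop-2.7.2]# bin/hdfs dfs -cat /user/bcxtm/input/wc.input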

        • Run the MapReduce program again for the wordcount example

          [root@localhost hadoop-2.7.2]# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /user/bcxtm/input/ /user/bcxtm/output
          


        • Download the test output file to the local file system with -get

          [root@localhost hadoop-2.7.2]# hdfs dfs -get /user/bcxtm/output/part-r-00000 /wcoutput/
          get: `/wcoutput/': No such file or directory
          [root@localhost hadoop-2.7.2]# mkdir wcoutput
          [root@localhost hadoop-2.7.2]# hdfs dfs -get /user/bcxtm/output/part-r-00000 ./wcoutput/
          # View the test output file downloaded locally
          [root@localhost hadoop-2.7.2]# cat wcoutput/part-r-00000 
          Alibaba	1
          Baidu	1
          Bcxtm	3
          ByteDance	1
          lisi	1
          wangwu	2
          zhangsan	1
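
          The output can also be read straight from HDFS without downloading it first:

          [root@localhost hadoop-2.7.2]# hdfs dfs -cat /user/bcxtm/output/part-r-00000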
          
    • Start YARN and run the MapReduce program

      • Configure the cluster

        • Configure etc/hadoop/yarn-env.sh and set JAVA_HOME to the explicit JDK path.

          [root@localhost hadoop]# vim yarn-env.sh 
          [root@localhost hadoop]# cat yarn-env.sh 
          # some Java parameters
          # export JAVA_HOME=/home/y/libexec/jdk1.6.0/
          if [ "$JAVA_HOME" != "" ]; then
            #echo "run java in $JAVA_HOME"
            JAVA_HOME=/opt/module/jdk1.8.0_144
          
        • Configure etc/hadoop/yarn-site.xml for the NodeManager and ResourceManager. The ResourceManager address is again configured with the IP address.

          [root@localhost hadoop]# vim yarn-site.xml 
          [root@localhost hadoop]# cat yarn-site.xml 
          <configuration>
          <!-- Site specific YARN configuration properties -->
          <!-- How the Reducer obtains data -->
          <property>
              <name>yarn.nodemanager.aux-services</name>
              <value>mapreduce_shuffle</value>
          </property>
          <!-- Specify the address of YARN's ResourceManager -->
          <property>
              <name>yarn.resourcemanager.hostname</name>
              <value>192.168.116.100</value>
          </property>
          </configuration>
          
        • Configure etc/hadoop/mapred-env.sh and set JAVA_HOME to the explicit JDK path.

          [root@localhost hadoop]# vim mapred-env.sh
          [root@localhost hadoop]# cat mapred-env.sh 
          # export JAVA_HOME=/home/y/libexec/jdk1.6.0/
          export JAVA_HOME=/opt/module/jdk1.8.0_144
          export HADOOP_JOB_HISTORYSERVER_HEAPSIZE=1000
          export HADOOP_MAPRED_ROOT_LOGGER=INFO,RFA
          
        • Configure etc/hadoop/mapred-site.xml; first rename the template file, then edit it.

          [root@localhost hadoop]# ll
          ## ...
          -rw-r--r-- 1 root root   758 May 22 2017 mapred-site.xml.template
          [root@localhost hadoop]# mv mapred-site.xml.template mapred-site.xml
          [root@localhost hadoop]# ll
          ## ...
          -rw-r--r-- 1 root root   758 May 22 2017 mapred-site.xml
          [root@localhost hadoop]# vim mapred-site.xml 
          [root@localhost hadoop]# cat mapred-site.xml 
          <configuration>
          <!-- Specify that MR runs on YARN -->
          <property>
              <name>mapreduce.framework.name</name>
              <value>yarn</value>
          </property>
          </configuration>
          
      • Start the cluster

        • Make sure the NameNode and DataNode are already running before starting YARN

          [root@localhost hadoop]# jps
          1936 Jps
          1362 DataNode
          1308 NameNode
          
        • Start the ResourceManager and NodeManager

          [root@localhost hadoop-2.7.2]# sbin/yarn-daemon.sh start resourcemanager
          starting resourcemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-root-resourcemanager-localhost.localdomain.out
          [root@localhost hadoop-2.7.2]# sbin/yarn-daemon.sh start nodemanager
          starting nodemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-root-nodemanager-localhost.localdomain.out
          [root@localhost hadoop-2.7.2]# jps
          2081 Jps
          1362 DataNode
          1308 NameNode
          1964 ResourceManager
          2014 NodeManager
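
          Beyond jps, the ResourceManager itself can confirm that the NodeManager registered (a quick check once both daemons are up):

          # Lists registered nodes; a single RUNNING node is expected here
          [root@localhost hadoop-2.7.2]# bin/yarn node -list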
          
      • Cluster operation

        • View via the web UI: http://192.168.116.100:8088/cluster


      • Configure history server

        • Configure mapred-site.xml and add the history server address and its web UI address


        [root@localhost hadoop]# vim mapred-site.xml 
        [root@localhost hadoop]# cat mapred-site.xml 
        <configuration>
        <!-- Specify that MR runs on YARN -->
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
        <!-- History server address -->
        <property>
            <name>mapreduce.jobhistory.address</name>
            <value>192.168.116.100:10020</value>
        </property>
        <!-- History server web UI address -->
        <property>
            <name>mapreduce.jobhistory.webapp.address</name>
            <value>192.168.116.100:19888</value>
        </property>
        </configuration>
        
        • Start the history server

          [root@localhost hadoop-2.7.2]# sbin/mr-jobhistory-daemon.sh start historyserver
          starting historyserver, logging to /opt/module/hadoop-2.7.2/logs/mapred-root-historyserver-localhost.localdomain.out
          [root@localhost hadoop-2.7.2]# jps
          1362 DataNode
          2474 JobHistoryServer
          2507 Jps
          1308 NameNode
          1964 ResourceManager
          2014 NodeManager
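
          Besides the browser, the history server exposes a REST endpoint that can be probed from the VM (a quick check; curl availability is assumed):

          # Returns JSON with the history server's start and build info
          [root@localhost hadoop-2.7.2]# curl -s http://192.168.116.100:19888/ws/v1/history/info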
          
        • View via the web UI: http://192.168.116.100:19888/jobhistory


      • Configure log aggregation (after an application finishes, its run logs are uploaded to HDFS)

        Note: to enable log aggregation, the NodeManager, ResourceManager, and HistoryServer need to be restarted.

        • Configure yarn-site.xml: enable log aggregation and set the log retention time (in seconds)


          [root@localhost hadoop]# vim yarn-site.xml 
          [root@localhost hadoop]# cat yarn-site.xml 
          <configuration>
          
          <!-- Site specific YARN configuration properties -->
          <!-- How the Reducer obtains data -->
          <property>
              <name>yarn.nodemanager.aux-services</name>
              <value>mapreduce_shuffle</value>
          </property>
          <!-- Specify the address of YARN's ResourceManager -->
          <property>
              <name>yarn.resourcemanager.hostname</name>
              <value>192.168.116.100</value>
          </property>
          <!-- Enable log aggregation -->
          <property>
              <name>yarn.log-aggregation-enable</name>
              <value>true</value>
          </property>
          <!-- Retain logs for 7 days -->
          <property>
              <name>yarn.log-aggregation.retain-seconds</name>
              <value>604800</value>
          </property>
          </configuration>
          
        • Stop the NodeManager, ResourceManager, and HistoryServer

          [root@localhost hadoop-2.7.2]# sbin/yarn-daemon.sh stop nodemanager
          stopping nodemanager
          [root@localhost hadoop-2.7.2]# sbin/yarn-daemon.sh stop resourcemanager
          stopping resourcemanager
          [root@localhost hadoop-2.7.2]# sbin/mr-jobhistory-daemon.sh stop historyserver
          stopping historyserver
          [root@localhost hadoop-2.7.2]# jps
          1362 DataNode
          2664 Jps
          1308 NameNode
          
        • Start NodeManager, ResourceManager and HistoryServer

          [root@localhost hadoop-2.7.2]# sbin/yarn-daemon.sh start nodemanager
          starting nodemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-root-nodemanager-localhost.localdomain.out
          [root@localhost hadoop-2.7.2]# sbin/yarn-daemon.sh start resourcemanager
          starting resourcemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-root-resourcemanager-localhost.localdomain.out
          [root@localhost hadoop-2.7.2]# sbin/mr-jobhistory-daemon.sh start historyserver
          starting historyserver, logging to /opt/module/hadoop-2.7.2/logs/mapred-root-historyserver-localhost.localdomain.out
          [root@localhost hadoop-2.7.2]# jps
          1362 DataNode
          2819 ResourceManager
          2965 JobHistoryServer
          2998 Jps
          2697 NodeManager
          1308 NameNode
          
      • Delete the output directory in the HDFS file system so the MapReduce program can be re-run

        [root@localhost hadoop-2.7.2]# hdfs dfs -rm -r /user/bcxtm/output
        20/07/05 21:48:10 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
        Deleted /user/bcxtm/output
        
      • Re-execute the MapReduce program

        [root@localhost hadoop-2.7.2]# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /user/bcxtm/input /user/bcxtm/output
        20/07/05 22:09:30 INFO client.RMProxy: Connecting to ResourceManager at /192.168.116.100:8032
        20/07/05 22:09:36 INFO input.FileInputFormat: Total input paths to process : 1
        20/07/05 22:09:36 INFO mapreduce.JobSubmitter: number of splits:1
        20/07/05 22:09:36 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1593957936940_0001
        20/07/05 22:10:12 INFO impl.YarnClientImpl: Submitted application application_1593957936940_0001
        20/07/05 22:10:37 INFO mapreduce.Job: The url to track the job: http://192.168.116.100:8088/proxy/application_1593957936940_0001/
        20/07/05 22:10:37 INFO mapreduce.Job: Running job: job_1593957936940_0001
        20/07/05 22:10:43 INFO mapreduce.Job: Job job_1593957936940_0001 running in uber mode : false
        20/07/05 22:10:43 INFO mapreduce.Job:  map 0% reduce 0%
        20/07/05 22:10:53 INFO mapreduce.Job:  map 100% reduce 0%
        20/07/05 22:11:21 INFO mapreduce.Job:  map 100% reduce 100%
        20/07/05 22:11:31 INFO mapreduce.Job: Job job_1593957936940_0001 completed successfully
        20/07/05 22:11:31 INFO mapreduce.Job: Counters: 49
        

        It can be seen that when a MapReduce program runs through YARN, a job is created and then goes through a Map phase followed by a Reduce phase. Finally, the task's execution status and history can be viewed through the web pages.

      • View history server information

      • View log aggregation information
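
        With aggregation enabled, the full container logs of a finished job can also be pulled from the command line; the application id below is the one from the wordcount run above:

        # Prints the aggregated logs for the given application
        [root@localhost hadoop-2.7.2]# yarn logs -applicationId application_1593957936940_0001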

Origin blog.csdn.net/Nerver_77/article/details/107146549