Linux Cluster Installation and Setup of Hadoop 3.1.2 (5) - Hadoop Operating Modes

Chapter 6: Hadoop Operating Modes

Hadoop supports three operating modes:

  • Local (standalone) mode, pseudo-distributed mode, and fully distributed mode.
  • Hadoop official website:
http://hadoop.apache.org/

6.1 Local Operating Mode

6.1.1 Official Grep Example

  1. Create an input folder under the hadoop-3.1.2 directory
[zpark@hadoop104 hadoop-3.1.2]$ mkdir input
  2. Copy Hadoop's XML configuration files into input
[zpark@hadoop104 hadoop-3.1.2]$ cp etc/hadoop/*.xml input
  3. Run the MapReduce example program from the share directory
[zpark@hadoop104 hadoop-3.1.2]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.2.jar grep input output 'dfs[a-z.]+'
  4. View the output
[zpark@hadoop104 hadoop-3.1.2]$ cat output/*
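As a rough illustration of what the Grep job matches (using plain `grep -E` here rather than Hadoop's Java regex engine, so this is only an approximation), the pattern `dfs[a-z.]+` extracts "dfs" followed by one or more lowercase letters or dots:

```shell
# Sample lines resembling the copied XML config files; 'dfs[a-z.]+'
# pulls out property names such as dfs.replication.
printf '<name>dfs.replication</name>\n<name>fs.defaultFS</name>\n' \
  | grep -Eo 'dfs[a-z.]+'
# prints: dfs.replication  (the second line contains no "dfs" token)
```

The Hadoop job does the same kind of matching across every file in `input` and writes the match counts to `output`.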

6.1.2 Official WordCount Example

  1. Create a wcinput folder under the hadoop-3.1.2 directory
[zpark@hadoop104 hadoop-3.1.2]$ mkdir wcinput
  2. Create a wc.input file in the wcinput folder
[zpark@hadoop104 hadoop-3.1.2]$ cd wcinput
[zpark@hadoop104 wcinput]$ touch wc.input
  3. Edit the wc.input file
[zpark@hadoop104 wcinput]$ vi wc.input

Enter the following in the file

hadoop yarn
hadoop mapreduce
zhangyong
zhangyong

Save and exit with :wq
4. Return to the Hadoop directory /opt/module/hadoop-3.1.2
5. Run the program

[zpark@hadoop104 hadoop-3.1.2]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.2.jar wordcount wcinput wcoutput
  6. View the results
[zpark@hadoop104 hadoop-3.1.2]$ cat wcoutput/part-r-00000
zhangyong 2
hadoop  2
mapreduce       1
yarn    1
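What the WordCount job computes can be sketched with standard shell tools: split the input into one word per line, then count identical words. This is only a local-mode analogue, not how MapReduce executes internally:

```shell
# Equivalent of WordCount on the wc.input contents above:
# tr splits on spaces, sort groups identical words, uniq -c counts them.
printf 'hadoop yarn\nhadoop mapreduce\nzhangyong\nzhangyong\n' \
  | tr -s ' ' '\n' | sort | uniq -c | awk '{print $2, $1}'
# prints:
#   hadoop 2
#   mapreduce 1
#   yarn 1
#   zhangyong 2
```

The real job does the same split/group/count, but distributes the split (map) and count (reduce) phases across the cluster.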


6.2 Pseudo-Distributed Mode

6.2.1 Starting HDFS and Running a MapReduce Program

  1. Analysis
    (1) Configure the cluster
    (2) Start the cluster; test adding, deleting, and viewing files
    (3) Run the WordCount example
  2. Steps
    (1) Configure the cluster
    (a) Configure hadoop-env.sh
    Get the JDK installation path on the Linux system:
[zpark@hadoop104 ~]$ echo $JAVA_HOME
/opt/module/jdk1.8.0_181

Modify the JAVA_HOME setting in hadoop-env.sh:

export JAVA_HOME=/opt/module/jdk1.8.0_181

(b) Configure core-site.xml

<!-- Specify the address of the NameNode in HDFS -->
<property>
	<name>fs.defaultFS</name>
	<value>hdfs://hadoop104:9000</value>
</property>
<!-- Specify the directory where files generated at Hadoop runtime are stored -->
<property>
	<name>hadoop.tmp.dir</name>
	<value>/opt/module/hadoop-3.1.2/data/tmp</value>
</property>

(c) Configure hdfs-site.xml

<!-- Specify the number of HDFS replicas -->
<property>
	<name>dfs.replication</name>
	<value>1</value>
</property>

(2) Start the cluster
(a) Format the NameNode (format only on the very first start; do not format repeatedly afterwards)

[zpark@hadoop104 hadoop-3.1.2]$ bin/hdfs namenode -format

(b) Start the NameNode

[zpark@hadoop104 hadoop-3.1.2]$ sbin/hadoop-daemon.sh start namenode

(c) Start the DataNode

[zpark@hadoop104 hadoop-3.1.2]$ sbin/hadoop-daemon.sh start datanode

(3) View the cluster
(a) Check whether the daemons started successfully

[zpark@hadoop104 hadoop-3.1.2]$ jps
13586 NameNode
13668 DataNode
13786 Jps

Note: jps is a JDK command, not a Linux command. If the JDK is not installed, jps cannot be used.
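A small helper (hypothetical, not part of Hadoop) can wrap this check so a script reports whether a daemon appears in a jps-style listing:

```shell
# check_daemon: report whether a daemon class name appears in a
# jps-style process listing read from stdin.
check_daemon() {
  if grep -qw "$1"; then
    echo "$1 is running"
  else
    echo "$1 is NOT running"
  fi
}

# Typical use on a live node:
#   jps | check_daemon NameNode
```

This is only a convenience sketch; on a real node you would pipe the actual `jps` output into it.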
(b) View the HDFS file system in a web browser (in Hadoop 3.x the NameNode web UI port is 9870; it was 50070 in Hadoop 2.x)

http://hadoop104:9870/dfshealth.html#tab-overview

Note: if the page does not load, see the following post:

http://www.cnblogs.com/zlslch/p/6604189.html

(c) View the generated logs
Note: when a bug is encountered in production, the log messages are often the basis for analyzing and fixing it.
Log directory:

/opt/module/hadoop-3.1.2/logs
[zpark@hadoop104 logs]$ ls
hadoop-zhangyong-datanode-hadoop.zhangyong.com.log
hadoop-zhangyong-datanode-hadoop.zhangyong.com.out
hadoop-zhangyong-namenode-hadoop.zhangyong.com.log
hadoop-zhangyong-namenode-hadoop.zhangyong.com.out
SecurityAuth-root.audit
[zpark@hadoop104 logs]$ cat hadoop-zhangyong-datanode-hadoop.zhangyong.com.log

(d) Question: why should the NameNode not be reformatted repeatedly, and what should you watch out for when formatting it?
Note: formatting the NameNode generates a new cluster ID. If the DataNode still holds the old one, the NameNode and DataNode cluster IDs become inconsistent and the cluster cannot find its past data. Therefore, before reformatting the NameNode, first delete the data and log directories, then format.
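The clusterID check described above can be sketched as a small shell helper (hypothetical, not a Hadoop tool) that compares the clusterID line recorded in two VERSION files:

```shell
# same_cluster_id: compare the clusterID recorded in a NameNode and a
# DataNode VERSION file; a mismatch means the DataNode still holds the
# ID of a previously formatted cluster.
same_cluster_id() {
  nn=$(grep '^clusterID=' "$1")
  dn=$(grep '^clusterID=' "$2")
  if [ "$nn" = "$dn" ]; then
    echo "clusterID match"
  else
    echo "clusterID MISMATCH - delete the data and log dirs before reformatting"
  fi
}

# Typical paths under hadoop.tmp.dir (assumed from the core-site.xml above):
#   same_cluster_id data/tmp/dfs/name/current/VERSION data/tmp/dfs/data/current/VERSION
```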
(4) Operate the cluster
(a) Create an input folder on the HDFS file system

[zpark@hadoop104 hadoop-3.1.2]$ bin/hdfs dfs -mkdir -p /user/zhangyong/input

(b) Upload the test file to the file system

[zpark@hadoop104 hadoop-3.1.2]$ bin/hdfs dfs -put wcinput/wc.input /user/zhangyong/input/

(c) Check that the file was uploaded correctly

[zpark@hadoop104 hadoop-3.1.2]$ bin/hdfs dfs -ls /user/zhangyong/input/
[zpark@hadoop104 hadoop-3.1.2]$ bin/hdfs dfs -cat /user/zhangyong/input/wc.input

(d) Run the MapReduce program

[zpark@hadoop104 hadoop-3.1.2]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.2.jar wordcount /user/zhangyong/input/ /user/zhangyong/output

(e) View the output
From the command line:

[zpark@hadoop104 hadoop-3.1.2]$ bin/hdfs dfs -cat /user/zhangyong/output/*

Or view it in the browser.
(f) Download the test file to the local machine

[zpark@hadoop104 hadoop-3.1.2]$ hdfs dfs -get /user/zhangyong/output/part-r-00000 ./wcoutput/

(g) Delete the output

[zpark@hadoop104 hadoop-3.1.2]$ hdfs dfs -rm -r /user/zhangyong/output
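The output directory is deleted here because Hadoop refuses to write into an output path that already exists. A local-mode sketch of that rerun pattern (with `rm -rf` standing in for `hdfs dfs -rm -r`; the helper name is made up for illustration):

```shell
# run_fresh: remove a stale output directory, then run the job command.
run_fresh() {
  out=$1; shift
  rm -rf "$out"    # clear the old output so the job can create it anew
  "$@"             # run the job
}

# e.g. run_fresh wcoutput hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.2.jar wordcount wcinput wcoutput
```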

6.2.2 Starting YARN and Running a MapReduce Program

  1. Analysis
    (1) Configure the cluster so that MR runs on YARN
    (2) Start and test the cluster: add, delete, view
    (3) Run the WordCount example on YARN
  2. Steps
    (1) Configure the cluster
    (a) Configure yarn-env.sh
    Set JAVA_HOME:
export JAVA_HOME=/opt/module/jdk1.8.0_181

(b) Configure yarn-site.xml

<!-- How the Reducer fetches data -->
<property>
	<name>yarn.nodemanager.aux-services</name>
	<value>mapreduce_shuffle</value>
</property>
<!-- Specify the address of the YARN ResourceManager -->
<property>
	<name>yarn.resourcemanager.hostname</name>
	<value>hadoop104</value>
</property>

(c) Configure mapred-env.sh
Set JAVA_HOME:

export JAVA_HOME=/opt/module/jdk1.8.0_181

(d) Configure mapred-site.xml (rename mapred-site.xml.template to mapred-site.xml; note that in Hadoop 3.x the file usually already ships as mapred-site.xml, so no rename is needed)

[zpark@hadoop104 hadoop]$ mv mapred-site.xml.template mapred-site.xml
[zpark@hadoop104 hadoop]$ vi mapred-site.xml
<!-- Run MR on YARN -->
<property>
	<name>mapreduce.framework.name</name>
	<value>yarn</value>
</property>

(2) Start the cluster
(a) Before starting, make sure the NameNode and DataNode are already running
(b) Start the ResourceManager

[zpark@hadoop104 hadoop-3.1.2]$ sbin/yarn-daemon.sh start resourcemanager

(c) Start the NodeManager

[zpark@hadoop104 hadoop-3.1.2]$ sbin/yarn-daemon.sh start nodemanager

(3) Cluster operations
(a) View the YARN page in a browser:

http://hadoop104:8088/cluster

Figure 2-35: The YARN browser page
(b) Delete the output folder on the file system

[zpark@hadoop104 hadoop-3.1.2]$ bin/hdfs dfs -rm -R /user/zhangyong/output

(c) Run the MapReduce program

[zpark@hadoop104 hadoop-3.1.2]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.2.jar wordcount /user/zhangyong/input /user/zhangyong/output

(d) View the result of the run:

[zpark@hadoop104 hadoop-3.1.2]$ bin/hdfs dfs -cat /user/zhangyong/output/*


6.2.3 Configuring the History Server

To view the history of completed jobs, you need to configure the history server. The steps are as follows:

  1. Configure mapred-site.xml
[zpark@hadoop104 hadoop]$ vi mapred-site.xml

Add the following to the file:

<!-- History server RPC address -->
<property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop104:10020</value>
</property>
<!-- History server web address -->
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop104:19888</value>
</property>
  2. Start the history server
[zpark@hadoop104 hadoop-3.1.2]$ sbin/mr-jobhistory-daemon.sh start historyserver
  3. Check whether the history server has started
[zpark@hadoop104 hadoop-3.1.2]$ jps
  4. View JobHistory
http://hadoop104:19888/jobhistory

6.2.4 Configuring Log Aggregation

Log aggregation concept: after an application finishes running, its log information is uploaded to HDFS.
Log aggregation benefits: you can conveniently view the details of a program run, which eases development and debugging.
Note: enabling log aggregation requires restarting the NodeManager, ResourceManager, and HistoryServer.
Steps to enable log aggregation:

  1. Configure yarn-site.xml
[zpark@hadoop104 hadoop]$ vi yarn-site.xml

Add the following to the file:

<!-- Enable log aggregation -->
<property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
</property>
<!-- Set log retention time to 7 days -->
<property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
</property>
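The retention value of 604800 seconds is exactly 7 days, which shell arithmetic confirms:

```shell
# 7 days x 24 hours x 60 minutes x 60 seconds
echo $((7 * 24 * 60 * 60))   # prints 604800
```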
  2. Stop the NodeManager, ResourceManager, and HistoryServer
[zpark@hadoop104 hadoop-3.1.2]$ sbin/yarn-daemon.sh stop resourcemanager
[zpark@hadoop104 hadoop-3.1.2]$ sbin/yarn-daemon.sh stop nodemanager
[zpark@hadoop104 hadoop-3.1.2]$ sbin/mr-jobhistory-daemon.sh stop historyserver
  3. Start the NodeManager, ResourceManager, and HistoryServer
[zpark@hadoop104 hadoop-3.1.2]$ sbin/yarn-daemon.sh start resourcemanager
[zpark@hadoop104 hadoop-3.1.2]$ sbin/yarn-daemon.sh start nodemanager
[zpark@hadoop104 hadoop-3.1.2]$ sbin/mr-jobhistory-daemon.sh start historyserver
  4. Delete the output file already on HDFS
[zpark@hadoop104 hadoop-3.1.2]$ bin/hdfs dfs -rm -R /user/zhangyong/output
  5. Run the WordCount program
[zpark@hadoop104 hadoop-3.1.2]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.2.jar wordcount /user/zhangyong/input /user/zhangyong/output
  6. View the logs:
    http://hadoop104:19888/jobhistory

6.2.5 Configuration File Description

Hadoop configuration files fall into two categories: default configuration files and custom configuration files. When a user wants to change a default value, they only need to modify the corresponding property in a custom configuration file.
(1) Default configuration files:

File                   Location within the Hadoop jars
[core-default.xml]     hadoop-common-3.1.2.jar / core-default.xml
[hdfs-default.xml]     hadoop-hdfs-3.1.2.jar / hdfs-default.xml
[yarn-default.xml]     hadoop-yarn-common-3.1.2.jar / yarn-default.xml
[mapred-default.xml]   hadoop-mapreduce-client-core-3.1.2.jar / mapred-default.xml

(2) Custom configuration files:

core-site.xml
hdfs-site.xml
yarn-site.xml
mapred-site.xml

These four files live in the $HADOOP_HOME/etc/hadoop directory; users can modify their configuration according to project requirements.


Origin blog.csdn.net/zy13765287861/article/details/104575812