One article to teach you how to quickly build a pseudo-distributed Hadoop cluster (super detailed!)

After working through the previous article, I believe my friends can already handle the basic environment setup and are ready for the operations below.


First of all, we need to know which configuration files have to be modified for a pseudo-distributed cluster.
All of the configuration files live under /opt/module/hadoop-2.7.2/etc/hadoop/.


  • 1. HDFS configuration files: hadoop-env.sh, core-site.xml, hdfs-site.xml
  • 2. YARN configuration files: yarn-env.sh, yarn-site.xml, mapred-env.sh
  • 3. History server configuration file: mapred-site.xml
  • 4. Log aggregation configuration file: yarn-site.xml
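
If you want to confirm these files are all in place before editing, a quick listing of the configuration directory works (same path as above):

[bigdata@hadoop001 ~]$ ls /opt/module/hadoop-2.7.2/etc/hadoop/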

1. Start HDFS

1. Configure the cluster

  • 1. Configure: hadoop-env.sh

① Get the JDK installation path on the Linux system (you can skip this step if you remember the path):

[bigdata@hadoop001 ~]$ echo $JAVA_HOME
/opt/module/jdk1.8.0_144

② Then modify the JAVA_HOME path in hadoop-env.sh:

export JAVA_HOME=/opt/module/jdk1.8.0_144
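
To double-check that the change took effect, you can grep for the line (a simple sanity check, run from the configuration directory):

[bigdata@hadoop001 hadoop]$ grep '^export JAVA_HOME' hadoop-env.sh
export JAVA_HOME=/opt/module/jdk1.8.0_144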


  • 2. Configure: core-site.xml
[bigdata@hadoop001 hadoop]$ vim core-site.xml 

<!-- Specify the address of the NameNode in HDFS -->
<property>
	<name>fs.defaultFS</name>
	<value>hdfs://hadoop001:9000</value>
</property>

<!-- Specify the storage directory for files generated while Hadoop is running -->
<property>
	<name>hadoop.tmp.dir</name>
	<value>/opt/module/hadoop-2.7.2/data/tmp</value>
</property>


  • 3. Configure: hdfs-site.xml
[bigdata@hadoop001 hadoop]$ vim hdfs-site.xml 

<!-- Specify the number of HDFS replicas (1 is enough here, since a pseudo-distributed cluster has only one DataNode) -->
<property>
	<name>dfs.replication</name>
	<value>1</value>
</property>
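
Once the cluster is up, you can confirm the value Hadoop actually reads with getconf (an optional check, run from the Hadoop home directory):

[bigdata@hadoop001 hadoop-2.7.2]$ bin/hdfs getconf -confKey dfs.replication
1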

2. Start the cluster

  • 1. Format the NameNode (only format on the first startup; do not format it again afterwards)
[bigdata@hadoop001 hadoop-2.7.2]$ bin/hdfs namenode -format

If the output contains a line saying the storage directory "has been successfully formatted", the format succeeded.

  • 2. Start NameNode and DataNode respectively
[bigdata@hadoop001 hadoop-2.7.2]$ sbin/hadoop-daemon.sh start namenode
[bigdata@hadoop001 hadoop-2.7.2]$ sbin/hadoop-daemon.sh start datanode
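
If the daemons start normally, each command prints a line pointing to its log file, similar to the following (the same pattern as the YARN daemons later):

starting namenode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-bigdata-namenode-hadoop001.out
starting datanode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-bigdata-datanode-hadoop001.out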

3. View the cluster

  • 1. Check whether the startup succeeded with the jps command (see the example below the note)
  • Note: jps is a command that comes with the JDK, not a Linux command; it cannot be used if the JDK is not installed
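
A quick check with jps might look like this (the process IDs are illustrative and will differ on your machine):

[bigdata@hadoop001 hadoop-2.7.2]$ jps
3327 NameNode
3414 DataNode
3520 Jps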
  • 2. View HDFS file system on the web
    http://hadoop001:50070/dfshealth.html#tab-overview
  • 3. View the logs
    When you hit a bug in day-to-day work, you will often analyze the problem and resolve it based on hints in the logs.
    Log directory: /opt/module/hadoop-2.7.2/logs
[bigdata@hadoop001 logs]$ ll

# The files below are the log files
total 220
-rw-rw-r--. 1 bigdata bigdata  82138 Apr 21 02:38 hadoop-bigdata-datanode-hadoop001.log
-rw-rw-r--. 1 bigdata bigdata    719 Apr 21 02:38 hadoop-bigdata-datanode-hadoop001.out
-rw-rw-r--. 1 bigdata bigdata    719 Apr 21 02:28 hadoop-bigdata-datanode-hadoop001.out.1
-rw-rw-r--. 1 bigdata bigdata 111269 Apr 21 02:38 hadoop-bigdata-namenode-hadoop001.log
-rw-rw-r--. 1 bigdata bigdata    719 Apr 21 02:38 hadoop-bigdata-namenode-hadoop001.out
-rw-rw-r--. 1 bigdata bigdata    719 Apr 21 02:36 hadoop-bigdata-namenode-hadoop001.out.1
-rw-rw-r--. 1 bigdata bigdata    719 Apr 21 02:30 hadoop-bigdata-namenode-hadoop001.out.2
-rw-rw-r--. 1 bigdata bigdata    719 Apr 21 02:28 hadoop-bigdata-namenode-hadoop001.out.3
-rw-rw-r--. 1 bigdata bigdata      0 Apr 21 02:28 SecurityAuth-bigdata.audit
[bigdata@hadoop001 logs]$ cat hadoop-bigdata-datanode-hadoop001.log 
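
When a daemon fails to start, scanning the tail of its log for errors is usually the fastest way in. A minimal sketch (pick whichever log file matches the failing daemon):

[bigdata@hadoop001 logs]$ tail -n 100 hadoop-bigdata-namenode-hadoop001.log | grep -i error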

2. Start YARN

1. Configure the cluster

  • 1. Configure yarn-env.sh: modify JAVA_HOME
[bigdata@hadoop001 hadoop]$ vim yarn-env.sh 

export JAVA_HOME=/opt/module/jdk1.8.0_144


  • 2. Configure yarn-site.xml
[bigdata@hadoop001 hadoop]$ vim yarn-site.xml

<!-- How the Reducer obtains data -->
<property>
	<name>yarn.nodemanager.aux-services</name>
	<value>mapreduce_shuffle</value>
</property>

<!-- Number of CPU vcores available to the NodeManager -->
<property>
	<name>yarn.nodemanager.resource.cpu-vcores</name>
	<value>2</value>
</property>

<!-- Specify the address of YARN's ResourceManager -->
<property>
	<name>yarn.resourcemanager.hostname</name>
	<value>hadoop001</value>
</property>


  • 3. Configure mapred-env.sh: modify JAVA_HOME
export JAVA_HOME=/opt/module/jdk1.8.0_144


  • 4. Configure mapred-site.xml (first rename mapred-site.xml.template to mapred-site.xml)
[bigdata@hadoop001 hadoop]$ mv mapred-site.xml.template mapred-site.xml
[bigdata@hadoop001 hadoop]$ vim mapred-site.xml

<!-- Specify that MapReduce runs on YARN -->
<property>
	<name>mapreduce.framework.name</name>
	<value>yarn</value>
</property>

2. Start the cluster

  • 1. Before starting, make sure that NameNode and DataNode are already running
  • 2. Start ResourceManager and NodeManager respectively
# Start the services
[bigdata@hadoop001 hadoop-2.7.2]$ sbin/yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-bigdata-resourcemanager-hadoop001.out
[bigdata@hadoop001 hadoop-2.7.2]$ sbin/yarn-daemon.sh start nodemanager
starting nodemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-bigdata-nodemanager-hadoop001.out

# Check whether startup succeeded
[bigdata@hadoop001 hadoop-2.7.2]$ jps
3414 DataNode
3993 ResourceManager
3722 NodeManager
3327 NameNode
4159 Jps

3. View on the web

View YARN's browser page at: http://hadoop001:8088/cluster

3. Configure the history server

If you want to view the history of completed jobs, you need to configure the history server. The specific configuration steps are as follows:

1. Configure mapred-site.xml

[bigdata@hadoop001 hadoop]$ vim mapred-site.xml

# Add the following configuration to the file.
<!-- History server address -->
<property>
	<name>mapreduce.jobhistory.address</name>
	<value>hadoop001:10020</value>
</property>
<!-- History server web UI address -->
<property>
	<name>mapreduce.jobhistory.webapp.address</name>
	<value>hadoop001:19888</value>
</property>

2. Start the history server

[bigdata@hadoop001 hadoop-2.7.2]$ sbin/mr-jobhistory-daemon.sh start historyserver

3. Check whether the history server is started

[bigdata@hadoop001 hadoop-2.7.2]$ jps
4304 JobHistoryServer
26210 Jps
3414 DataNode
3993 ResourceManager
3327 NameNode
4495 NodeManager

4. Check whether the history server is started on the web

http://hadoop001:19888/jobhistory

4. Configure log aggregation

The concept of log aggregation: after an application finishes, its run-log information is uploaded to HDFS.
Benefit of log aggregation: you can conveniently view the details of a program run, which makes development and debugging easier.

Note: To enable the log aggregation function, you need to restart NodeManager, ResourceManager, and HistoryServer.
The following are the specific steps to enable the log aggregation function:

1. Configure yarn-site.xml

[bigdata@hadoop001 hadoop]$ vim yarn-site.xml

# Add the following configuration to the file.
<!-- Enable the log aggregation function -->
<property>
	<name>yarn.log-aggregation-enable</name>
	<value>true</value>
</property>

<!-- Set log retention time to 7 days (604800 seconds = 7 * 24 * 3600) -->
<property>
	<name>yarn.log-aggregation.retain-seconds</name>
	<value>604800</value>
</property>

2. Stop NodeManager, ResourceManager, and HistoryServer

[bigdata@hadoop001 hadoop-2.7.2]$ sbin/yarn-daemon.sh stop resourcemanager
stopping resourcemanager
[bigdata@hadoop001 hadoop-2.7.2]$ sbin/yarn-daemon.sh stop nodemanager
stopping nodemanager
[bigdata@hadoop001 hadoop-2.7.2]$ sbin/mr-jobhistory-daemon.sh stop historyserver
stopping historyserver


3. Start NodeManager, ResourceManager and HistoryServer

[bigdata@hadoop001 hadoop-2.7.2]$ sbin/yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-bigdata-resourcemanager-hadoop001.out
[bigdata@hadoop001 hadoop-2.7.2]$ sbin/yarn-daemon.sh start nodemanager
starting nodemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-bigdata-nodemanager-hadoop001.out
[bigdata@hadoop001 hadoop-2.7.2]$ sbin/mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /opt/module/hadoop-2.7.2/logs/mapred-bigdata-historyserver-hadoop001.out
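
After the restart, jps should show all three services running again alongside HDFS (the process IDs below are illustrative):

[bigdata@hadoop001 hadoop-2.7.2]$ jps
3327 NameNode
3414 DataNode
5012 ResourceManager
5268 NodeManager
5530 JobHistoryServer
5621 Jps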

4. Delete any output directory that already exists on HDFS (skip this step if it does not exist)

[bigdata@hadoop001 hadoop-2.7.2]$ bin/hdfs dfs -rm -R /user/bigdata/output

5. Run the WordCount program

# If the input directory does not exist yet, create it first
[bigdata@hadoop001 hadoop-2.7.2]$ bin/hdfs dfs -mkdir -p /user/bigdata/input
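
# wordcount needs at least one input file; a minimal sketch using a
# hypothetical local test file named wc.input
[bigdata@hadoop001 hadoop-2.7.2]$ echo "hadoop yarn hadoop mapreduce" > wc.input
[bigdata@hadoop001 hadoop-2.7.2]$ bin/hdfs dfs -put wc.input /user/bigdata/input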

# Run the program
[bigdata@hadoop001 hadoop-2.7.2]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /user/bigdata/input /user/bigdata/output


6. View logs

http://hadoop001:19888/jobhistory

  • 1. Job History
  • 2. Job running status
  • 3. View logs

Dear friends, if you feel you have learned something, please give this post a like before you go. Passing experts are welcome to comment and correct any mistakes, and friends who run into problems are welcome to leave comments or send private messages. Every friend's follow is my motivation to keep updating this blog!!!

Origin blog.csdn.net/qq_16146103/article/details/105640196