-
Pseudo-distributed mode
Hadoop can also run on a single node in pseudo-distributed mode, where each Hadoop daemon runs in a separate Java process.
-
Start HDFS and run the MapReduce program
-
Configure the cluster
-
Configure etc/hadoop/hadoop-env.sh and set JAVA_HOME to the JDK installation path.

```shell
[root@localhost hadoop]# vim hadoop-env.sh
# The only required environment variable is JAVA_HOME. All others are
# optional. When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.

# The java implementation to use.
export JAVA_HOME=/opt/module/jdk1.8.0_144
```
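Before hard-coding a path into hadoop-env.sh, it can help to confirm that the candidate directory really is a JDK. A minimal sketch (the function name is illustrative, not part of Hadoop):

```shell
#!/bin/sh
# Sketch: a JAVA_HOME candidate is usable if it contains an executable
# bin/java. Pass the path you intend to export, e.g. /opt/module/jdk1.8.0_144.

valid_java_home() {
    [ -x "$1/bin/java" ]
}
```

Running `valid_java_home /opt/module/jdk1.8.0_144 && echo ok` before editing the file catches typos in the path early.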
-
Configure etc/hadoop/core-site.xml to specify the NameNode address and the temporary file directory.

```shell
[root@localhost hadoop]# vim core-site.xml
```
```xml
<configuration>
    <!-- Address of the NameNode in HDFS -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://192.168.116.100:9000</value>
    </property>
    <!-- Storage directory for files generated by Hadoop at runtime -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/module/hadoop-2.7.2/data/tmp</value>
    </property>
</configuration>
```
Because no hosts-file mapping has been configured, the IP address is used directly here.
-
Configure etc/hadoop/hdfs-site.xml and set the number of HDFS replicas. The default is 3, but the value is a local setting (other nodes back up automatically as they join), and a single node can only hold one replica, so it is set to 1 here.

```shell
[root@localhost hadoop]# vim hdfs-site.xml
```
```xml
<configuration>
    <!-- Number of HDFS replicas -->
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
```
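The replication factor directly multiplies raw storage use: a file of S bytes occupies roughly N × S bytes of DataNode storage when dfs.replication is N. A tiny sketch of that arithmetic (function name is illustrative):

```shell
#!/bin/sh
# Sketch: approximate raw datanode storage consumed by one file,
# ignoring block padding and metadata.

raw_storage_bytes() {
    # $1 = file size in bytes, $2 = replication factor
    echo $(( $1 * $2 ))
}
```

So a 1 MiB file under the default replication of 3 would consume about 3 MiB across the cluster, which is why 1 is the sensible value on a single node.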
-
-
Start the cluster
-
Format the NameNode (it needs to be formatted the first time it starts)
```shell
[root@localhost hadoop-2.7.2]# bin/hdfs namenode -format
```
-
Start the NameNode and DataNode

```shell
[root@localhost hadoop-2.7.2]# sbin/hadoop-daemon.sh start namenode
starting namenode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-root-namenode-localhost.localdomain.out
[root@localhost hadoop-2.7.2]# sbin/hadoop-daemon.sh start datanode
starting datanode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-root-datanode-localhost.localdomain.out
```
-
-
View cluster
-
Check whether startup succeeded (jps is a JDK command; it is available once the JDK environment variables are configured).

```shell
[root@localhost hadoop-2.7.2]# jps
1362 DataNode
1461 Jps
1308 NameNode
```
-
View the HDFS file system through the web UI. Here it is accessed from a browser on the Windows host; since no hosts mapping is configured, the IP address is used directly:

http://192.168.116.100:50070/dfshealth.html#tab-overview
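When scripting checks against the UI, the URL can be assembled from the host and the Hadoop 2.x default HTTP port 50070 (function name is illustrative):

```shell
#!/bin/sh
# Sketch: build the NameNode web UI URL. 50070 is the Hadoop 2.x default
# for dfs.namenode.http-address; pass whatever host/port your config uses.

namenode_ui_url() {
    echo "http://$1:$2/dfshealth.html#tab-overview"
}
```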
-
View the generated log
```shell
[root@localhost logs]# ll
total 72
-rw-r--r-- 1 root root 25277 Jul  5 20:10 hadoop-root-datanode-localhost.localdomain.log
-rw-r--r-- 1 root root   714 Jul  5 19:52 hadoop-root-datanode-localhost.localdomain.out
-rw-r--r-- 1 root root 30915 Jul  5 20:10 hadoop-root-namenode-localhost.localdomain.log
-rw-r--r-- 1 root root  5002 Jul  5 20:00 hadoop-root-namenode-localhost.localdomain.out
-rw-r--r-- 1 root root     0 Jul  5 19:52 SecurityAuth-root.audit
[root@localhost logs]# cat hadoop-root-datanode-localhost.localdomain.log
```
-
Please note when formatting the NameNode:
-
Go into the storage directory configured for the files generated at Hadoop runtime:
-
/name, the NameNode directory:

```shell
[root@localhost hadoop-2.7.2]# cd data/tmp/dfs/name/current/
[root@localhost current]# ll
total 1040
-rw-r--r-- 1 root root 1048576 Jul  5 20:10 edits_inprogress_0000000000000000001
-rw-r--r-- 1 root root     350 Jul  5 19:50 fsimage_0000000000000000000
-rw-r--r-- 1 root root      62 Jul  5 19:50 fsimage_0000000000000000000.md5
-rw-r--r-- 1 root root       2 Jul  5 19:52 seen_txid
-rw-r--r-- 1 root root     201 Jul  5 19:50 VERSION
[root@localhost current]# cat VERSION
#Sun Jul 05 19:50:24 CST 2020
namespaceID=253643691
clusterID=CID-53139122-7fe0-405f-bdde-522fbfa9fe95
cTime=0
storageType=NAME_NODE
blockpoolID=BP-1432435135-127.0.0.1-1593949824604
layoutVersion=-63
```
/data, the DataNode directory:

```shell
[root@localhost hadoop-2.7.2]# cd data/tmp/dfs/data/current/
[root@localhost current]# ll
total 4
drwx------ 4 root root  54 Jul  5 19:52 BP-1432435135-127.0.0.1-1593949824604
-rw-r--r-- 1 root root 229 Jul  5 19:52 VERSION
[root@localhost current]# cat VERSION
#Sun Jul 05 19:52:36 CST 2020
storageID=DS-9a858421-29ac-4778-b625-6881374acfd6
clusterID=CID-53139122-7fe0-405f-bdde-522fbfa9fe95
cTime=0
datanodeUuid=acc2d611-bd06-4a73-94e8-9672fed10714
storageType=DATA_NODE
layoutVersion=-56
```
Notice that the clusterID is the same in the NameNode and DataNode VERSION files; HDFS requires them to match for the two to communicate. Reformatting the NameNode arbitrarily generates a new clusterID that no longer matches the DataNode's, so the two can no longer communicate or exchange data. Therefore, before reformatting the NameNode, delete the data and log directories first, and only then run the
namenode -format
operation.
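The clusterID consistency check described above can be scripted. A minimal sketch (function names and paths are illustrative; point it at the VERSION files under your hadoop.tmp.dir):

```shell
#!/bin/sh
# Sketch: verify that a NameNode and a DataNode belong to the same cluster
# by comparing the clusterID field of their VERSION files.

cluster_id() {
    # Extract the clusterID value from a VERSION file
    grep '^clusterID=' "$1" | cut -d= -f2
}

same_cluster() {
    # Succeeds (exit 0) when both VERSION files carry the same clusterID
    [ "$(cluster_id "$1")" = "$(cluster_id "$2")" ]
}
```

Running `same_cluster data/tmp/dfs/name/current/VERSION data/tmp/dfs/data/current/VERSION` after a format quickly reveals the mismatch described above.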
-
-
Operating the cluster
-
Create an input folder (input) in the HDFS file system:

```shell
[root@localhost hadoop-2.7.2]# bin/hdfs dfs -mkdir -p /user/bcxtm/input
```
-
Upload the test file to the file system with -put:

```shell
[root@localhost hadoop-2.7.2]# bin/hdfs dfs -put wcinput/wc.input /user/bcxtm/input/
```
-
Run the MapReduce program to execute the wordcount example again:

```shell
[root@localhost hadoop-2.7.2]# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /user/bcxtm/input/ /user/bcxtm/output
```
-
Download the test output file to the local machine with -get:

```shell
[root@localhost hadoop-2.7.2]# hdfs dfs -get /user/bcxtm/output/part-r-00000 /wcoutput/
get: `/wcoutput/': No such file or directory
[root@localhost hadoop-2.7.2]# mkdir wcoutput
[root@localhost hadoop-2.7.2]# hdfs dfs -get /user/bcxtm/output/part-r-00000 ./wcoutput/
# view the test output file downloaded locally
[root@localhost hadoop-2.7.2]# cat wcoutput/part-r-00000
Alibaba 1
Baidu   1
Bcxtm   3
ByteDance       1
lisi    1
wangwu  2
zhangsan        1
```
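The wordcount result can be sanity-checked locally with standard Unix tools, since the example just counts whitespace-separated words. A sketch of an equivalent local pipeline (function name is illustrative):

```shell
#!/bin/sh
# Sketch: a local word count that mirrors the MapReduce wordcount example,
# useful for comparing against part-r-00000 on small inputs.

local_wordcount() {
    # One word per line, drop blanks, count occurrences,
    # then emit "word<TAB>count" sorted by word like the MR output.
    tr -s ' \t' '\n' < "$1" | grep -v '^$' | sort | uniq -c |
        awk '{print $2 "\t" $1}' | sort
}
```

Running `local_wordcount wcinput/wc.input` should reproduce the counts shown in part-r-00000 above.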
-
-
-
Start YARN and run the MapReduce program
-
Configure the cluster
-
Configure etc/hadoop/yarn-env.sh and set JAVA_HOME to the JDK installation path.

```shell
[root@localhost hadoop]# vim yarn-env.sh
[root@localhost hadoop]# cat yarn-env.sh
# some Java parameters
# export JAVA_HOME=/home/y/libexec/jdk1.6.0/
if [ "$JAVA_HOME" != "" ]; then
  #echo "run java in $JAVA_HOME"
  JAVA_HOME=/opt/module/jdk1.8.0_144
```
-
Configure etc/hadoop/yarn-site.xml for the NodeManager and ResourceManager. The ResourceManager address is again configured with the IP address.

```shell
[root@localhost hadoop]# vim yarn-site.xml
[root@localhost hadoop]# cat yarn-site.xml
```
```xml
<configuration>
<!-- Site specific YARN configuration properties -->
    <!-- How the Reducer obtains data -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <!-- Address of the YARN ResourceManager -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>192.168.116.100</value>
    </property>
</configuration>
```
-
Configure etc/hadoop/mapred-env.sh and set JAVA_HOME to the JDK installation path.

```shell
[root@localhost hadoop]# vim mapred-env.sh
[root@localhost hadoop]# cat mapred-env.sh
# export JAVA_HOME=/home/y/libexec/jdk1.6.0/
export JAVA_HOME=/opt/module/jdk1.8.0_144
export HADOOP_JOB_HISTORYSERVER_HEAPSIZE=1000
export HADOOP_MAPRED_ROOT_LOGGER=INFO,RFA
```
-
Configure etc/hadoop/mapred-site.xml; it ships only as a template, so rename the template file first.

```shell
[root@localhost hadoop]# ll
## ...
-rw-r--r-- 1 root root 758 May 22 2017 mapred-site.xml.template
[root@localhost hadoop]# mv mapred-site.xml.template mapred-site.xml
[root@localhost hadoop]# ll
## ...
-rw-r--r-- 1 root root 758 May 22 2017 mapred-site.xml
[root@localhost hadoop]# vim mapred-site.xml
[root@localhost hadoop]# cat mapred-site.xml
```
```xml
<configuration>
    <!-- Run MR on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
```
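The rename step above is destructive if mapred-site.xml already exists. A guarded sketch that only materializes the file from the template when needed (function name and directory are illustrative):

```shell
#!/bin/sh
# Sketch: create mapred-site.xml from the shipped template only if it does
# not already exist, so an existing configuration is never clobbered.

ensure_site_xml() {
    dir=$1
    if [ ! -f "$dir/mapred-site.xml" ] && [ -f "$dir/mapred-site.xml.template" ]; then
        mv "$dir/mapred-site.xml.template" "$dir/mapred-site.xml"
    fi
}
```

Calling `ensure_site_xml etc/hadoop` twice is safe: the second call is a no-op.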
-
-
Start the cluster
-
Before starting YARN, make sure the NameNode and DataNode are already running

```shell
[root@localhost hadoop]# jps
1936 Jps
1362 DataNode
1308 NameNode
```
-
Start the ResourceManager and NodeManager

```shell
[root@localhost hadoop-2.7.2]# sbin/yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-root-resourcemanager-localhost.localdomain.out
[root@localhost hadoop-2.7.2]# sbin/yarn-daemon.sh start nodemanager
starting nodemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-root-nodemanager-localhost.localdomain.out
[root@localhost hadoop-2.7.2]# jps
2081 Jps
1362 DataNode
1308 NameNode
1964 ResourceManager
2014 NodeManager
```
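Checking the jps output by eye works, but the check can also be scripted. A sketch that verifies a list of expected daemons against jps-style output (function name is illustrative; in practice the output would come from running `jps`):

```shell
#!/bin/sh
# Sketch: succeed only when every expected daemon name appears in the
# given jps-style output (one "pid Name" pair per line).

daemons_running() {
    jps_output=$1
    shift
    for d in "$@"; do
        echo "$jps_output" | grep -qw "$d" || return 1
    done
}
```

Usage: `daemons_running "$(jps)" NameNode DataNode ResourceManager NodeManager && echo all up`.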
-
-
Cluster operation
- View through the web: http://192.168.116.100:8088/cluster
-
Configure history server
- Configure mapred-site.xml, adding the history server address and its web address

```shell
[root@localhost hadoop]# vim mapred-site.xml
[root@localhost hadoop]# cat mapred-site.xml
```
```xml
<configuration>
    <!-- Run MR on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <!-- History server address -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>192.168.116.100:10020</value>
    </property>
    <!-- History server web address -->
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>192.168.116.100:19888</value>
    </property>
</configuration>
```
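When scripting against these configuration files, a property value can be pulled out without a full XML parser, assuming the one-property-per-line layout shown above. A sketch (function name is illustrative):

```shell
#!/bin/sh
# Sketch: extract a <value> for a given <name> from a Hadoop *-site.xml,
# relying on the conventional layout where <value> follows <name> on the
# next line. Not a general XML parser.

site_value() {
    # $1 = property name, $2 = site xml file
    grep -A1 "<name>$1</name>" "$2" |
        sed -n 's/.*<value>\(.*\)<\/value>.*/\1/p'
}
```

For example, `site_value mapreduce.jobhistory.webapp.address etc/hadoop/mapred-site.xml` yields the web UI address configured above.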
-
Start history server
```shell
[root@localhost hadoop-2.7.2]# sbin/mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /opt/module/hadoop-2.7.2/logs/mapred-root-historyserver-localhost.localdomain.out
[root@localhost hadoop-2.7.2]# jps
1362 DataNode
2474 JobHistoryServer
2507 Jps
1308 NameNode
1964 ResourceManager
2014 NodeManager
```
-
View through the web: http://192.168.116.100:19888/jobhistory
-
Configure log aggregation (after an application finishes, its run logs are uploaded to HDFS).
Note: enabling log aggregation requires restarting the NodeManager, ResourceManager and HistoryServer.
Configure yarn-site.xml to enable log aggregation and set the retention time (in seconds).

```shell
[root@localhost hadoop]# vim yarn-site.xml
[root@localhost hadoop]# cat yarn-site.xml
```
```xml
<configuration>
<!-- Site specific YARN configuration properties -->
    <!-- How the Reducer obtains data -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <!-- Address of the YARN ResourceManager -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>192.168.116.100</value>
    </property>
    <!-- Enable log aggregation -->
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <!-- Keep logs for 7 days -->
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>604800</value>
    </property>
</configuration>
```
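The 604800 above is simply 7 days expressed in seconds, which is the unit yarn.log-aggregation.retain-seconds expects. The conversion as a one-liner (function name is illustrative):

```shell
#!/bin/sh
# Sketch: convert a retention period in days to the seconds value expected
# by yarn.log-aggregation.retain-seconds.

days_to_seconds() {
    echo $(( $1 * 24 * 60 * 60 ))
}
```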
-
Close NodeManager, ResourceManager and HistoryServer
```shell
[root@localhost hadoop-2.7.2]# sbin/yarn-daemon.sh stop nodemanager
stopping nodemanager
[root@localhost hadoop-2.7.2]# sbin/yarn-daemon.sh stop resourcemanager
stopping resourcemanager
[root@localhost hadoop-2.7.2]# sbin/mr-jobhistory-daemon.sh stop historyserver
stopping historyserver
[root@localhost hadoop-2.7.2]# jps
1362 DataNode
2664 Jps
1308 NameNode
```
-
Start NodeManager, ResourceManager and HistoryServer
```shell
[root@localhost hadoop-2.7.2]# sbin/yarn-daemon.sh start nodemanager
starting nodemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-root-nodemanager-localhost.localdomain.out
[root@localhost hadoop-2.7.2]# sbin/yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-root-resourcemanager-localhost.localdomain.out
[root@localhost hadoop-2.7.2]# sbin/mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /opt/module/hadoop-2.7.2/logs/mapred-root-historyserver-localhost.localdomain.out
[root@localhost hadoop-2.7.2]# jps
1362 DataNode
2819 ResourceManager
2965 JobHistoryServer
2998 Jps
2697 NodeManager
1308 NameNode
```
-
-
Delete the output directory in the HDFS file system so the MapReduce program can be run again later

```shell
[root@localhost hadoop-2.7.2]# hdfs dfs -rm -r /user/bcxtm/output
20/07/05 21:48:10 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted /user/bcxtm/output
```
-
Re-execute the MapReduce program
```shell
[root@localhost hadoop-2.7.2]# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /user/bcxtm/input /user/bcxtm/output
20/07/05 22:09:30 INFO client.RMProxy: Connecting to ResourceManager at /192.168.116.100:8032
20/07/05 22:09:36 INFO input.FileInputFormat: Total input paths to process : 1
20/07/05 22:09:36 INFO mapreduce.JobSubmitter: number of splits:1
20/07/05 22:09:36 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1593957936940_0001
20/07/05 22:10:12 INFO impl.YarnClientImpl: Submitted application application_1593957936940_0001
20/07/05 22:10:37 INFO mapreduce.Job: The url to track the job: http://192.168.116.100:8088/proxy/application_1593957936940_0001/
20/07/05 22:10:37 INFO mapreduce.Job: Running job: job_1593957936940_0001
20/07/05 22:10:43 INFO mapreduce.Job: Job job_1593957936940_0001 running in uber mode : false
20/07/05 22:10:43 INFO mapreduce.Job:  map 0% reduce 0%
20/07/05 22:10:53 INFO mapreduce.Job:  map 100% reduce 0%
20/07/05 22:11:21 INFO mapreduce.Job:  map 100% reduce 100%
20/07/05 22:11:31 INFO mapreduce.Job: Job job_1593957936940_0001 completed successfully
20/07/05 22:11:31 INFO mapreduce.Job: Counters: 49
```
As the log shows, a MapReduce program executed through YARN creates a job that runs the Map phase and then the Reduce phase. Afterwards the task's execution status and history can be viewed on the web page.
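The application id in the submission log is what later commands such as `yarn logs -applicationId ...` expect. A sketch for extracting it from the log text (function name is illustrative):

```shell
#!/bin/sh
# Sketch: pull the first YARN application id (application_<cluster>_<seq>)
# out of job-submission log text read from stdin.

app_id() {
    grep -o 'application_[0-9]*_[0-9]*' | head -n1
}
```

For example, piping the "Submitted application ..." line above through `app_id` yields application_1593957936940_0001.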
-
View historical server information
-
View log aggregation information
-
-
Hadoop study notes _4: pseudo-distributed mode of operation mode
Origin blog.csdn.net/Nerver_77/article/details/107146549