Sword Pointing at the Data Warehouse: Hadoop II

1. The last course review

2. The second lesson of Hadoop

3. Homework for this course

1. The last course review

  • https://blog.csdn.net/SparkOnYarn/article/details/104997202
  • Apache Hadoop vs. Cloudera Hadoop: we use the Cloudera (CDH) distribution because of compatibility; within one CDH release, as long as the minor version numbers match there is no problem.
  • Hadoop is mainly divided into three blocks: storage, computation, and resource scheduling. The intranet IP and hostname are configured in /etc/hosts; three files need to be modified: core-site.xml, hdfs-site.xml, and slaves. For the passwordless SSH trust relationship, note which user starts the services; authorized_keys must be given 600 permissions. Also note which address the key file is bound to: if it is wrong, delete the corresponding line in known_hosts, then reconnect and enter yes to re-establish trust.
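The permission step in the recap above is the one most often gotten wrong, so here is a minimal sketch of it. It uses a temporary directory as a stand-in for `~/.ssh` (an assumption for this sketch only); on a real node these commands run as the user that starts Hadoop, against the real `~/.ssh`.

```shell
# Stand-in for ~/.ssh in this sketch; on a real host use ~/.ssh itself.
demo=$(mktemp -d)
touch "$demo/authorized_keys"
chmod 600 "$demo/authorized_keys"   # the 600 permission the lesson requires
chmod 700 "$demo"                   # ~/.ssh itself must be 700
stat -c '%a %n' "$demo/authorized_keys"
```

If authorized_keys is group- or world-readable, sshd silently ignores it and you are prompted for a password again.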

2. The second lesson of Hadoop

2.1 Single-node deployment of YARN

1. Reference connection website:

  • https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html#YARN_on_a_Single_Node
1. Make a backup copy first:
[hadoop@hadoop001 hadoop]$ cp mapred-site.xml.template mapred-site.xml

2. vi mapred-site.xml
 	<property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>

3. Edit yarn-site.xml:
	<property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>

###### Note
//If port 8088 is open to the Internet, the machine is a common target for cryptomining attacks, so change YARN's web port directly. Look up 8088 in yarn-default.xml, then add the following to yarn-site.xml:
	<property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>hadoop001:38088</value>
    </property>  
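To double-check which port the ResourceManager web UI will bind to, you can read it back out of yarn-site.xml. A self-contained sketch (it writes a sample file mirroring the property above, so the path `/tmp/yarn-site-sample.xml` is an assumption; on a real node grep `$HADOOP_HOME/etc/hadoop/yarn-site.xml` instead):

```shell
# Sample file mirroring the property added above (38088 is our chosen port).
cat > /tmp/yarn-site-sample.xml <<'EOF'
<configuration>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>hadoop001:38088</value>
  </property>
</configuration>
EOF

# Pull out the 4-5 digit port number following the property name.
port=$(grep -A1 'yarn.resourcemanager.webapp.address' /tmp/yarn-site-sample.xml \
       | grep -o '[0-9]\{4,5\}')
echo "RM web UI port: $port"   # prints: RM web UI port: 38088
```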

2. Since sbin is already configured in our environment variables, which start-yarn.sh finds the script directly:
[hadoop@hadoop001 hadoop]$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.16.2/logs/yarn-hadoop-resourcemanager-hadoop001.out
hadoop001: starting nodemanager, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.16.2/logs/yarn-hadoop-nodemanager-hadoop001.out
[hadoop@hadoop001 hadoop]$ jps
2240 Jps
15074 NameNode
1779 ResourceManager
15206 DataNode
15384 SecondaryNameNode
1881 NodeManager

[hadoop@hadoop001 hadoop]$ netstat -nlp|grep 1779
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp        0      0 :::8030                     :::*                        LISTEN      1779/java           
tcp        0      0 :::8031                     :::*                        LISTEN      1779/java           
tcp        0      0 :::8032                     :::*                        LISTEN      1779/java           
tcp        0      0 :::8033                     :::*                        LISTEN      1779/java           
tcp        0      0 ::ffff:172.17.0.5:38088     :::*                        LISTEN      1779/java           

2. Open the web interface (hadoop001:38088, as configured above) to check whether the deployment succeeded:

2.2. Using YARN to run wordcount for word-frequency statistics, and how to know a job succeeded without the web interface

1. Case presentation:

1. Use find to locate the example jars under the hadoop directory:
[hadoop@hadoop001 hadoop]$ find ./ -name "*example*.jar"
./share/hadoop/mapreduce2/sources/hadoop-mapreduce-examples-2.6.0-cdh5.16.2-sources.jar
./share/hadoop/mapreduce2/sources/hadoop-mapreduce-examples-2.6.0-cdh5.16.2-test-sources.jar
./share/hadoop/mapreduce2/hadoop-mapreduce-examples-2.6.0-cdh5.16.2.jar
./share/hadoop/mapreduce1/hadoop-examples-2.6.0-mr1-cdh5.16.2.jar

2. Run the wordcount example step by step with this command:
[hadoop@hadoop001 hadoop]$ hadoop jar ./share/hadoop/mapreduce2/hadoop-mapreduce-examples-2.6.0-cdh5.16.2.jar wordcount /wordcount/input/wordcount.log /wordcount/output

2. The running job is displayed in the web interface (screenshots not included):

3. How to know that the job ran successfully without going through the web interface?

1. The _SUCCESS marker file in the output directory is 0 bytes:
[hadoop@hadoop001 data]$ hdfs dfs -ls /wordcount/output
20/03/24 23:32:34 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r--   1 hadoop supergroup          0 2020-03-24 23:27 /wordcount/output/_SUCCESS
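Checking for that _SUCCESS marker is easy to script. A hedged sketch (`job_succeeded` is a hypothetical helper name, not part of Hadoop; it assumes the hdfs client is on the PATH, and uses the real `hdfs dfs -test -e` flag, which exits 0 when the path exists):

```shell
# Returns 0 when the output directory contains the _SUCCESS marker,
# i.e. the MR job completed successfully.
job_succeeded() {
  hdfs dfs -test -e "$1/_SUCCESS"
}

if job_succeeded /wordcount/output; then
  echo "job succeeded"
else
  echo "job failed or still running"
fi
```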

4. Use the command line to view the output on HDFS:

[hadoop@hadoop001 data]$ hdfs dfs -cat /wordcount/output/part-r-00000
20/03/24 23:34:19 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
hello   3
john    1
world   2
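For intuition, the same word-frequency logic can be reproduced locally with standard Unix tools. The sample input below is an assumption chosen to match the counts above (the real wordcount.log is on HDFS):

```shell
# Local analog of the MR wordcount job, using coreutils only.
printf 'hello world\nhello john\nhello world\n' > /tmp/wordcount.log

# Split into one word per line, then count occurrences of each word.
tr ' ' '\n' < /tmp/wordcount.log | sort | uniq -c | sort -k2
# prints: 3 hello, 1 john, 2 world (matching part-r-00000 above)
```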
To summarize at this point:

HDFS: stores the input data and the returned calculation results
MR jar package stored on HDFS: carries the calculation logic
YARN: resource management + job scheduling

2.3 What are the current big data storage and computation components?

Storage: HDFS (distributed file system), Hive, HBase, Kudu; Cassandra (not counted as a big data component here)

Computation: MR programming, HiveQL (which will be used for demonstrations later), Spark, Flink

Resources + job scheduling: YARN

2.4 Setting the hostname of a cloud host

1. Modification in CentOS 6:

1. vi /etc/sysconfig/network
# Created by cloud-init on instance boot automatically, do not edit.
#
NETWORKING=yes
HOSTNAME=hadoop001

2. Modification in CentOS 7 and later:

hostnamectl set-hostname hadoop001

After that, you need to restart.

2.5 What jps really does

1. Where can I see the jps process identification file?

1. The j stands for Java and ps for process status; jps is the command that lists Java processes:
[root@hadoop001 ~]# which jps
/usr/java/jdk1.8.0_45/bin/jps

2. jps -l shows more detailed information (the full main class name):
[hadoop@hadoop001 ~]$ jps -l	
15074 org.apache.hadoop.hdfs.server.namenode.NameNode
1779 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
15206 org.apache.hadoop.hdfs.server.datanode.DataNode
15384 org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode
1881 org.apache.hadoop.yarn.server.nodemanager.NodeManager
7738 sun.tools.jps.Jps

3. Where are the identification files for these processes?
[root@hadoop001 hsperfdata_hadoop]# pwd
/tmp/hsperfdata_hadoop
[root@hadoop001 hsperfdata_hadoop]# ll
total 160
-rw------- 1 hadoop hadoop 32768 Mar 24 23:52 15074
-rw------- 1 hadoop hadoop 32768 Mar 24 23:52 15206
-rw------- 1 hadoop hadoop 32768 Mar 24 23:52 15384
-rw------- 1 hadoop hadoop 32768 Mar 24 23:52 1779
-rw------- 1 hadoop hadoop 32768 Mar 24 23:52 1881

//Our processes were started by the hadoop user, so their files are stored under /tmp in hsperfdata_hadoop (named after the user)
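The directory name can be derived for any user, which is handy when checking several service accounts. A small sketch:

```shell
# jps reads per-JVM data from /tmp/hsperfdata_<username>,
# where each file inside is named after a JVM's pid.
dir="/tmp/hsperfdata_$(whoami)"
echo "jps data directory for this user: $dir"
ls "$dir" 2>/dev/null || echo "(no JVMs currently running for this user)"
```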

2. What is the difference between running jps as the root user and as the hadoop user?

1. As the root user:
[root@hadoop001 hsperfdata_hadoop]# jps
15074 -- process information unavailable
1779 -- process information unavailable
8454 Jps
15206 -- process information unavailable
15384 -- process information unavailable
1881 -- process information unavailable

2. As the hadoop user:
[hadoop@hadoop001 hadoop]$ jps
15074 NameNode
1779 ResourceManager
15206 DataNode
15384 SecondaryNameNode
1881 NodeManager
8539 Jps

Summary here: "process information unavailable"

  • jps shows full details only for processes owned by the user who runs it. As the root user, all the Java pids are listed, but their information is displayed as unavailable; after switching to the hadoop user, jps shows the details of exactly the processes that user started.

Judging whether the jps output is true or false:

[root@hadoop001 hsperfdata_hadoop]# ps -ef | grep 1779 | grep -v grep | wc -l
1
  • In production, CDH deployments run each component under a different user, so a shell script running with root permissions cannot rely on jps alone; with the command above, the count of matching pids confirms whether the process really exists.
  • Alternatively, run the check with sudo as a user with the necessary permissions.
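The ps-based check above can be wrapped in a small function for scripts (`is_alive` is a hypothetical name, not part of Hadoop; it uses only standard `ps` options):

```shell
# Liveness check that works regardless of which user owns the process,
# unlike jps: count matching pids with ps.
is_alive() {
  # $1: pid to check; returns 0 if a process with that pid exists
  [ "$(ps -p "$1" -o pid= | wc -l)" -ge 1 ]
}

if is_alive $$; then
  echo "process $$ is alive"
fi
```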

Test: move the 15074 binary file out of the /tmp/hsperfdata_hadoop directory

[root@hadoop001 hsperfdata_hadoop]# mv 15074 /home/hadoop/data/

//Moving the file neither affects nor stops the process, but after it is moved away, jps can no longer find a record for 15074:
[hadoop@hadoop001 hadoop]$ jps
1779 ResourceManager
15206 DataNode
15384 SecondaryNameNode
1881 NodeManager
10397 Jps

2.6. The OOM killer mechanism under Linux

  • When a process's memory usage is too high, the machine protects itself from freezing by killing the process that uses the most memory (the Linux OOM killer).

Simulation test:
1. kill -9 1779
// kill the resourcemanager process

2. jps shows that process 1779 is gone; entering the directory, the 1779 binary file is gone as well:
[hadoop@hadoop001 tmp]$ cd hsperfdata_hadoop/
[hadoop@hadoop001 hsperfdata_hadoop]$ ll
total 96
-rw------- 1 hadoop hadoop 32768 Mar 25 00:16 15206
-rw------- 1 hadoop hadoop 32768 Mar 25 00:15 15384
-rw------- 1 hadoop hadoop 32768 Mar 25 00:16 1881

3. Entering the /tmp directory, the pid file still records 1779:
[hadoop@hadoop001 tmp]$ pwd
/tmp
[hadoop@hadoop001 tmp]$ cat yarn-hadoop-resourcemanager.pid
1779

4. Start the killed YARN process again:
[hadoop@hadoop001 tmp]$ jps
11825 Jps
15206 DataNode
15384 SecondaryNameNode
1881 NodeManager
11501 ResourceManager
[hadoop@hadoop001 tmp]$ cat yarn-hadoop-resourcemanager.pid
11501

//A new pid has been recorded in the file

What this experiment shows:

  • After killing a pid, the service can be restarted, and the pid recorded in the file is updated accordingly

Linux mechanism: files under /tmp are kept for one month by default, after which files that do not match the retention rules are cleared automatically.

Example: on our 2 GB machine with CDH installed, starting the MySQL service makes it die almost instantly, because the system's OOM mechanism kills it directly, so no relevant information can be found in the service's own log.

In the future, if a process dies, first find the location of its log, locate the error, and analyze it if there is one; if there is no error, check the system log in /var/log/messages: cat /var/log/messages | grep oom, cat /var/log/secure | grep oom
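What that grep is looking for: an OOM kill leaves a kernel line in the system log. A self-contained sketch using a sample line (the exact wording of the sample is an assumption; the real /var/log/messages needs root to read):

```shell
# Sample of the kind of kernel message an OOM kill writes to /var/log/messages.
echo 'Mar 25 00:10:01 hadoop001 kernel: Out of memory: Kill process 1779 (java) score 900 or sacrifice child' > /tmp/messages.sample

# The same case-insensitive pattern you would grep for in the real log:
grep -ci 'out of memory' /tmp/messages.sample   # prints 1 (one OOM event found)
```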

A total of 4 points to take away:
1. After configuring an environment variable, verify it with which
2. cp a backup of a production configuration file before editing it
3. Before killing someone else's process, make sure no service is still using it
4. Understand the Linux OOM kill mechanism clearly

3. Homework for this course

1. Build a pseudo-distributed deployment of YARN
2. Run the MR wordcount command
3. Judge the true and false of the jps output
4. Understand the Linux OOM mechanism and the periodic /tmp cleanup


Origin blog.csdn.net/SparkOnYarn/article/details/105085009