Spark Monitoring


When you run a Spark job, especially one that starts late at night and finishes the next day, you often need to see what it is doing while it runs, observe its behavior, and tune it based on that information, so monitoring Spark jobs becomes very important.
There are several ways to monitor Spark applications; the commonly used ones are the Web UI, the history server, and the REST API. There are of course other, less common approaches, which you can find in an earlier blog post.

Monitoring via the Spark Web UI

Each SparkContext launches a web UI, on port 4040 by default, that displays useful information about the application.
You can see a lot of information in the Web UI, such as Status, Host, Duration, GC time, and Launch Time, which are often checked in production. But once the task finishes, whether it succeeds or fails, the Web UI is gone and no information can be seen; this is the same whether you run locally or on YARN.
For example, visit http://hadoop001:4040 and you will see something like this:
(screenshot of the Spark Web UI)
If you want to access the web UI even after sc.stop() has been called and your application has stopped, you must set the spark.eventLog.enabled parameter to true before starting the application.

Monitoring via the Spark History Server UI

By default, after an application ends, the Spark History Server can take the event information that was recorded and render it again in a UI; the rendered UI is exactly the same as the Web UI shown above.
So how do you use the History Server? Follow the steps below.

① Step 1: configure spark-defaults.conf
spark.eventLog.enabled true    # enable event logging
spark.eventLog.dir hdfs://namenode/shared/spark-logs   # write the event logs to this directory

[hadoop@hadoop001 conf]$ pwd   # go into the conf directory
/home/hadoop/app/spark-2.4.2-bin-2.6.0-cdh5.7.0/conf

# Edit the config file and change the two parameters below (leave the others unchanged).
# Note the port number 9000; on some clusters it may be 8020.
[hadoop@hadoop001 conf]$ vi spark-defaults.conf
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs://hadoop001:9000/g6_directory

# The directory itself must also be created:
[hadoop@hadoop001 conf]$ hdfs dfs -mkdir /g6_directory
② Step 2: configure spark-env.sh

Some environment variables need to be configured, such as SPARK_HISTORY_OPTS (there are other parameters too; see the earlier blog post):

SPARK_HISTORY_OPTS   # every property starting with spark.history.* is configured here

So what spark.history.* configuration options are there? For example:

# These parameters all have default values.
spark.history.fs.logDirectory    # where to read the logs from; spark.eventLog.dir above says where
                                 # to write them, and the history server must read from that same location
spark.history.fs.update.interval
spark.history.fs.cleaner.enabled
spark.history.fs.cleaner.interval
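The interval settings above take Spark-style time strings such as "10s" or "1d". As a rough illustration of what those strings mean (the unit table and helper below are my own sketch, not Spark's actual parser):

```python
# Illustrative sketch: convert Spark-style interval strings such as "10s" or "1d"
# (as used by spark.history.fs.update.interval and spark.history.fs.cleaner.interval)
# into seconds. The unit table is an assumption covering only common suffixes.

UNITS = {"ms": 0.001, "s": 1, "m": 60, "min": 60, "h": 3600, "d": 86400}

def interval_to_seconds(value: str) -> float:
    value = value.strip().lower()
    # Try the longest suffix first so "ms" is not mistaken for "s".
    for suffix in sorted(UNITS, key=len, reverse=True):
        if value.endswith(suffix):
            return float(value[: -len(suffix)]) * UNITS[suffix]
    return float(value)  # bare number: assume seconds

print(interval_to_seconds("10s"))  # 10.0
print(interval_to_seconds("1d"))   # 86400.0
```

For the authoritative list of properties and their defaults, always check the official monitoring documentation.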

How do you configure them? In the spark-env.sh configuration file, for example:

# Edit this file and add the configuration below; other parameters are similar.
# Note the port number 9000; on some clusters it may be 8020.
[hadoop@hadoop001 conf]$ vi spark-env.sh
# - SPARK_HISTORY_OPTS, to set config properties only for the history server (e.g. "-Dx=y")
SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=hdfs://hadoop001:9000/g6_directory"
③ Step 3: start the History Server

Once the above is configured, start the History Server:

[hadoop@hadoop001 sbin]$ pwd
/home/hadoop/app/spark-2.4.2-bin-2.6.0-cdh5.7.0/sbin

[hadoop@hadoop001 sbin]$ ./start-history-server.sh
starting org.apache.spark.deploy.history.HistoryServer, logging to /home/hadoop/app/spark-2.4.2-bin-2.6.0-cdh5.7.0/logs/spark-hadoop-org.apache.spark.deploy.history.HistoryServer-1-hadoop001.out
[hadoop@hadoop001 sbin]$ 

# jps shows that a new HistoryServer process has appeared
[hadoop@hadoop001 sbin]$ jps
10544 ResourceManager
10112 NameNode
10641 NodeManager
16633 HistoryServer
10234 DataNode
10395 SecondaryNameNode
16684 Jps

# The log shows that startup succeeded:
19/06/22 08:04:58 INFO Utils: Successfully started service on port 18080.
19/06/22 08:04:58 INFO HistoryServer: Bound HistoryServer to 0.0.0.0, and started at http://hadoop001:18080

# ps -ef shows it is a Java process with additional parameters, including the ones we set ourselves
[hadoop@hadoop001 conf]$ ps -ef |grep HistoryServer
hadoop   16633     1  0 08:04 pts/0    00:00:22 /usr/java/jdk1.8.0_45/bin/java -cp /home/hadoop/app/spark-2.4.2-bin-2.6.0-cdh5.7.0/conf/:/home/hadoop/app/spark-2.4.2-bin-2.6.0-cdh5.7.0/jars/*:/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/etc/hadoop/ -Dspark.history.fs.logDirectory=hdfs://hadoop001:9000/g6_directory -Xmx1g org.apache.spark.deploy.history.HistoryServer
hadoop   21772 21442  0 09:00 pts/1    00:00:00 grep HistoryServer
④ Step 4: access the UI

Access it at http://hadoop001:18080/
(screenshot of the History Server UI)
You can view both completed and incomplete applications; click Show incomplete applications to see the unfinished ones.
(screenshot)
The page shows nothing yet, so run a job:

scala> sc.parallelize(List(1,2,3,4,5)).count
res0: Long = 5

Look at the page again and an application now appears:
(screenshot)
(each application is identified by its Application ID)
Click local-1561162465170 and you jump to http://hadoop001:18080/history/local-1561162465170/jobs/, which looks just like the usual hadoop001:4040 page.
(screenshot)
In all of these web interfaces, you can sort the tables by clicking on their column headers, which makes it easy to identify slow-running tasks and to spot data skew.
One way to signal that a Spark job has completed is to explicitly stop the SparkContext (sc.stop()).
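As a rough illustration of why sorting task durations helps reveal data skew, here is a sketch; the durations below are invented sample numbers, not from any real job:

```python
# Illustrative sketch of spotting data skew from per-task durations (milliseconds).
# The sample durations are invented for demonstration.
import statistics

def skewed_tasks(durations, factor=3.0):
    """Return indices of tasks taking more than `factor` x the median duration."""
    median = statistics.median(durations)
    return [i for i, d in enumerate(durations) if d > factor * median]

durations = [120, 130, 115, 125, 118, 1900]  # one straggler task
print(skewed_tasks(durations))  # -> [5], the straggler's index
```

A stage where one task runs an order of magnitude longer than its peers, as above, is a classic sign that one partition received far more data than the others.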

⑤ Step 5: stop the History Server
# Stop it like this when it is no longer needed:
[hadoop@hadoop001 sbin]$ ./stop-history-server.sh    
stopping org.apache.spark.deploy.history.HistoryServer

You can go to the configured directory to look at the event logs:

[hadoop@hadoop001 sbin]$ hdfs dfs -ls /g6_directory
Found 1 items
-rwxrwx---   1 hadoop supergroup        213 2019-06-22 08:14 /g6_directory/local-1561162465170.inprogress

You can see that the file name is the application ID (here with an .inprogress suffix); if there are multiple applications, there will be multiple such files. If you open one, you will find it is in JSON format. All the information shown on the pages above comes from this JSON; it is simply rendered into the page.
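The event log is newline-delimited JSON, one event per line, each carrying an "Event" field naming the event type. A minimal sketch of reading such a file (the two sample lines below are simplified stand-ins, not a complete real log):

```python
import json

# Two simplified sample lines in the shape of a Spark event log;
# real logs contain many more fields and event types.
sample_log = """\
{"Event":"SparkListenerApplicationStart","App Name":"Spark shell","App ID":"local-1561162465170"}
{"Event":"SparkListenerApplicationEnd","Timestamp":1561162999000}
"""

def event_types(log_text):
    """Parse each JSON line and collect its Event field."""
    return [json.loads(line)["Event"] for line in log_text.splitlines() if line.strip()]

print(event_types(sample_log))
```

This is exactly the raw material the History Server replays to rebuild the UI pages.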

Monitoring via the REST API

With this approach you send a REST request and it returns information in JSON format, so you interact via JSON. The Web UI pages above are all fixed, whereas the REST API lets developers build their own visualization and monitoring tools and define whatever monitoring scheme they want.
Both running applications and previously run applications can be monitored through the REST API. For the history server the endpoints are usually accessible at http://hadoop001:18080/api/v1, and for a running application at http://hadoop001:4040/api/v1.
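A small sketch of how these api/v1 endpoint URLs are composed (the host names follow this post's setup; the helper itself is just an illustration, not a Spark API):

```python
def api_url(base, app_id=None, resource=None):
    """Build a Spark monitoring REST API URL under /api/v1 (illustrative helper)."""
    url = base.rstrip("/") + "/api/v1/applications"
    if app_id:
        url += "/" + app_id
        if resource:
            url += "/" + resource
    return url

# History server (completed/incomplete apps) vs a live application:
print(api_url("http://hadoop001:18080"))
print(api_url("http://hadoop001:4040", "local-1561162465170", "jobs"))
```

In practice you would fetch such a URL with any HTTP client and parse the JSON response.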

An application is identified by its Application ID.
(screenshot)
Get information about all applications:
http://hadoop001:18080/api/v1/applications
(screenshot)
Get running or completed applications:

http://hadoop001:18080/api/v1/applications/?status=running
http://hadoop001:18080/api/v1/applications/?status=completed

Gets a specific application:

http://hadoop001:18080/api/v1/applications/local-1561162465170

Get the jobs of an application:

http://hadoop001:18080/api/v1/applications/local-1561162465170/jobs

(screenshot)
View a job in an application:

http://hadoop001:18080/api/v1/applications/local-1561162465170/jobs/1

(screenshot)
Get only one application:
http://hadoop001:18080/api/v1/applications/?limit=1
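The JSON returned by /applications is a list of objects with fields such as "id", "name", and "attempts". The status and limit query parameters filter on the server, but the same filtering can be sketched client-side; the payload below is invented, merely shaped like the real response:

```python
import json

# Invented sample payload shaped like /api/v1/applications output.
sample = json.loads("""
[
  {"id": "local-1561162465170", "name": "Spark shell",
   "attempts": [{"completed": false}]},
  {"id": "local-1561160000000", "name": "Spark shell",
   "attempts": [{"completed": true}]}
]
""")

def filter_apps(apps, status=None, limit=None):
    """Mimic the ?status= and ?limit= query parameters client-side (illustration only)."""
    if status is not None:
        want = (status == "completed")
        apps = [a for a in apps if a["attempts"][-1]["completed"] == want]
    return apps[:limit] if limit else apps

print([a["id"] for a in filter_apps(sample, status="completed")])
print(len(filter_apps(sample, limit=1)))
```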

Other useful parameters can be set as well. For example, there are many Executor Task Metrics: the REST API exposes the values of task metrics, collected by the Spark executors, at the granularity of task execution. These metrics can be used for performance troubleshooting and workload characterization.
For more parameters, see:
http://spark.apache.org/docs/latest/monitoring.html#rest-api

This approach is meant for developers to customize together with front-end requirements: every company wants a different UI, so once the UI is designed and handed to the front end, the front end only needs to be told how to call these interfaces.


Origin blog.csdn.net/liweihope/article/details/92802140