Spark learning (6): High availability cluster configuration, HistoryServer

To simplify log management and keep a record of Spark application runs, the HistoryServer needs to be configured.

1. Conventional single node configuration

Step 1:

cp spark-defaults.conf.template  spark-defaults.conf  
Add the following to the file:  
 spark.eventLog.enabled           true  
 spark.eventLog.dir               hdfs://hadoop06:9000/sparklog  

Step 2:

Add the following to the spark-env.sh file:  
export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18080 -Dspark.history.retainedApplications=30 -Dspark.history.fs.logDirectory=hdfs://hadoop06:9000/sparklog"  
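The same export can be written one option per line for readability. The comments below explain what each setting does; the values are this guide's example cluster, so adjust the host and port for your own:

```shell
# spark.history.ui.port: port the history web UI listens on (18080 is the default)
# spark.history.retainedApplications: number of application UIs kept in memory
# spark.history.fs.logDirectory: HDFS path the server reads event logs from
export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18080 \
-Dspark.history.retainedApplications=30 \
-Dspark.history.fs.logDirectory=hdfs://hadoop06:9000/sparklog"
```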

Step 3:

Before starting the HistoryServer service, create the hdfs://hadoop06:9000/sparklog directory in advance.  

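Creating the directory is a one-liner with the HDFS client (run it as a user with write access to HDFS):

```shell
# Create the event-log directory before the first HistoryServer start;
# -p makes it a no-op if the directory already exists.
hdfs dfs -mkdir -p /sparklog
```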
2. HA high availability configuration

Step 1:

cp spark-defaults.conf.template  spark-defaults.conf  
Add the following to the file:  
 spark.eventLog.enabled           true  
 spark.eventLog.dir               hdfs://myha01/sparklog  
Here myha01 is the nameservice name configured in hdfs-site.xml.
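For reference, the nameservice is defined in hdfs-site.xml roughly as follows; this is a sketch, and the authoritative value is whatever your cluster's hdfs-site.xml actually declares:

```xml
<property>
  <name>dfs.nameservices</name>
  <value>myha01</value>
</property>
```

Because the URI hdfs://myha01/sparklog names the nameservice rather than a single NameNode host, the HistoryServer keeps working after a NameNode failover.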

Step 2:

Add the following to the spark-env.sh file:  
export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18080 -Dspark.history.retainedApplications=30 -Dspark.history.fs.logDirectory=hdfs://myha01/sparklog"

Step 3:

Before starting the HistoryServer service, create the hdfs://myha01/sparklog directory in advance.  

Usage:

Before starting the HistoryServer, ZooKeeper, HDFS, and YARN must already be running.
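A typical startup order looks like the following; the script names assume standard ZooKeeper and Hadoop installations with their bin/sbin directories on the PATH:

```shell
zkServer.sh start   # run on every ZooKeeper node
start-dfs.sh        # HDFS: NameNodes, DataNodes, JournalNodes
start-yarn.sh       # YARN: ResourceManager and NodeManagers
```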

Run start-history-server.sh on any node (hadoop03, for example); the history UI is then available on that node at http://hadoop03:18080
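To verify the setup end to end, start the server and run any Spark job with event logging enabled, for instance the bundled SparkPi example. The paths below assume a standard Spark distribution under $SPARK_HOME, and the examples jar name varies by version:

```shell
$SPARK_HOME/sbin/start-history-server.sh
$SPARK_HOME/bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  $SPARK_HOME/examples/jars/spark-examples_*.jar 100
# After the job finishes, refresh http://hadoop03:18080 and it should be listed.
```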


(Figure: the HistoryServer web UI.) Completed applications appear in this interface after Spark tasks have run.
