To simplify log management and preserve the results of completed Spark applications, the HistoryServer must be configured.
1. Conventional single node configuration
Step 1:
cp spark-defaults.conf.template spark-defaults.conf
Then add the following to the file:
spark.eventLog.enabled true
spark.eventLog.dir hdfs://hadoop06:9000/sparklog
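The step above can be sketched as a pair of shell commands; the `$SPARK_HOME` location is an assumption about where Spark is installed:

```shell
# Run from the Spark configuration directory (path is an assumption)
cd $SPARK_HOME/conf
cp spark-defaults.conf.template spark-defaults.conf

# Enable event logging and point it at the HDFS log directory
cat >> spark-defaults.conf <<'EOF'
spark.eventLog.enabled   true
spark.eventLog.dir       hdfs://hadoop06:9000/sparklog
EOF
```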
Step 2:
Add the following to the spark-env.sh file:
export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18080 -Dspark.history.retainedApplications=30 -Dspark.history.fs.logDirectory=hdfs://hadoop06:9000/sparklog"
Step 3:
Before starting the HistoryServer service, the hdfs://hadoop06:9000/sparklog directory must be created in advance.
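Creating the log directory and starting the service can be done with the HDFS CLI; a sketch, assuming `hdfs` is on the PATH and Spark lives under `$SPARK_HOME`:

```shell
# Create the event-log directory before starting the HistoryServer
hdfs dfs -mkdir -p hdfs://hadoop06:9000/sparklog

# Then start the service from the Spark sbin directory
$SPARK_HOME/sbin/start-history-server.sh
```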
2. HA high availability configuration
Step 1:
cp spark-defaults.conf.template spark-defaults.conf
Then add the following to the file:
spark.eventLog.enabled true
spark.eventLog.dir hdfs://myha01/sparklog
Here myha01 is the nameservice name configured in hdfs-site.xml.
Step 2:
Add the following to the spark-env.sh file:
export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18080 -Dspark.history.retainedApplications=30 -Dspark.history.fs.logDirectory=hdfs://myha01/sparklog"
Step 3:
Before starting the HistoryServer service, the hdfs://myha01/sparklog directory must be created in advance.
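As in the single-node case, a sketch of creating the directory and starting the service, this time addressing HDFS through the HA nameservice rather than a single NameNode:

```shell
# With HDFS HA, use the nameservice name (myha01) instead of host:port
hdfs dfs -mkdir -p hdfs://myha01/sparklog

# Start the HistoryServer (Spark install path is an assumption)
$SPARK_HOME/sbin/start-history-server.sh
```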
Usage:
Before starting, you need to start zookeeper, HDFS, and YARN.
Run start-history-server.sh on any node (hadoop03, for example), then open that node's Web UI at http://hadoop03:18080.
The page appears as shown in the figure, and applications are listed after Spark jobs have run.
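Besides the Web UI, the HistoryServer exposes a monitoring REST API, which is handy for scripted checks; a quick probe, assuming the service runs on hadoop03 at the port configured above:

```shell
# List completed applications known to the HistoryServer (returns JSON)
curl http://hadoop03:18080/api/v1/applications
```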