Teaching you how to easily configure the Spark history log server (JobHistoryServer)

        By default, once a Spark application finishes running and its window is closed, you can no longer view its logs in the Web UI (port 4040). The HistoryServer solves this by reading the event log files and serving them, so we can still view an application's run-time details after it has finished. In this post, I will walk you through the detailed process of configuring the JobHistoryServer for Spark.



1. Go to the conf folder under the Spark installation directory
cd /export/servers/spark/conf

2. Modify the configuration file spark-defaults.conf
vim spark-defaults.conf

spark.eventLog.enabled true
spark.eventLog.dir hdfs://node01:8020/sparklog

Note: the directory on HDFS needs to be created in advance

hadoop fs -mkdir -p /sparklog
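
To make sure the directory really exists before moving on (a quick sanity check, assuming HDFS is up on node01), you can run:

hadoop fs -test -d /sparklog && echo "/sparklog exists"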

3. Modify spark-env.sh file
vim spark-env.sh

export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=4000 \
-Dspark.history.retainedApplications=3 \
-Dspark.history.fs.logDirectory=hdfs://node01:8020/sparklog"

Parameter Description:

        spark.eventLog.dir: all the information recorded while an Application is running is written to the path specified by this property;

        spark.history.ui.port=4000: the history server's Web UI is accessed on port 4000;

        spark.history.fs.logDirectory=hdfs://node01:8020/sparklog: once this property is configured, you do not need to specify the log path explicitly when running start-history-server.sh, and the Spark History Server page will only show information for applications under the specified path;

        spark.history.retainedApplications=3: specifies how many Applications' history to retain; once this value is exceeded, the oldest application information is deleted. Note that this is the number of applications kept in memory, not the number of applications displayed on the page.
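
If you prefer, these history-server settings can also be placed in conf/spark-defaults.conf instead of SPARK_HISTORY_OPTS, since the history server reads that file when it starts. A minimal sketch, using the same values as above:

spark.history.ui.port              4000
spark.history.retainedApplications 3
spark.history.fs.logDirectory      hdfs://node01:8020/sparklog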

4. Synchronize the configuration files

Here you can use the scp command (an example is shown after the commands below), or a custom xsync command. For how to use xsync, refer to <Write super useful Linux scripts with me -- xsync (synchronize files and directories) and xcall (execute a command on all nodes)>.

xsync spark-defaults.conf
xsync spark-env.sh
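
If you do not have an xsync script, plain scp works just as well. A sketch, assuming node02 and node03 are the other cluster nodes (adjust the hostnames to your cluster):

scp spark-defaults.conf spark-env.sh node02:/export/servers/spark/conf/
scp spark-defaults.conf spark-env.sh node03:/export/servers/spark/conf/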

5. Restart the cluster
/export/servers/spark/sbin/stop-all.sh
/export/servers/spark/sbin/start-all.sh

6. Start the log server on the master
/export/servers/spark/sbin/start-history-server.sh
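
To confirm that it started, check the Java processes with jps; the output should include a HistoryServer process, for example:

jps
# expected output includes a line like (the PID will differ):
# 12345 HistoryServer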

7. Run an example PI calculation program

bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--executor-memory 1G \
--total-executor-cores 2 \
/export/servers/spark/examples/jars/spark-examples_2.11-2.2.0.jar \
100

After the run completes, open http://node01:4000/ in a browser.
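
You can also query the history server from the command line through its REST API; assuming the port 4000 configured above, something like the following should return the completed applications as JSON:

curl http://node01:4000/api/v1/applications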

  • If you run into a problem with Hadoop HDFS write permissions:

org.apache.hadoop.security.AccessControlException

Solution:

        Add the following to hdfs-site.xml to turn off permission checking (note that this change takes effect only after HDFS is restarted):

<property>
        <name>dfs.permissions</name>
        <value>false</value>
</property>
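
Alternatively, instead of turning off permission checking for the whole cluster, you could simply open up the event-log directory itself (a sketch, using the /sparklog path created earlier):

hadoop fs -chmod -R 777 /sparklog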

        That's all for this share. If it helped you, or if you are interested in big data technology, remember to give it a thumbs-up and follow me~

                
                



Origin blog.csdn.net/weixin_44318830/article/details/104445247