Spark startup optimization parameters: spark.yarn.archive and spark.yarn.jars

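This post tests how the spark.yarn.archive and spark.yarn.jars parameters behave. According to the official documentation, spark.yarn.jars is a list of libraries containing Spark code to distribute to YARN containers, and spark.yarn.archive is an archive containing the needed Spark jars for distribution to the YARN cache; if spark.yarn.archive is set, it replaces spark.yarn.jars.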

(1) Configure both parameters in spark-defaults.conf

spark.yarn.archive    hdfs://hd1:9000/archive/spark-libs.jar

spark.yarn.jars    hdfs://hd1:9000/spark_jars/*
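
The post does not show how spark-libs.jar was produced, so the following is only a minimal sketch of one way to build such an archive. It assumes the archive should hold the Spark jars in its root directory, as the Spark documentation asks; the function name build_spark_archive and the paths passed to it are hypothetical.

```python
import zipfile
from pathlib import Path
from typing import List


def build_spark_archive(jars_dir: str, archive_path: str) -> List[str]:
    """Pack every *.jar found in jars_dir into one uncompressed archive.

    Each jar is stored at the archive root, the layout expected for
    spark.yarn.archive. Returns the names of the jars that were added.
    """
    # Collect the jar list before the archive file is created.
    jars = sorted(Path(jars_dir).glob("*.jar"))
    # ZIP_STORED skips compression, since jars are already compressed.
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_STORED) as archive:
        for jar in jars:
            # arcname drops the directory part so the jar sits at the root.
            archive.write(jar, arcname=jar.name)
    return [jar.name for jar in jars]
```

The resulting file would still have to be copied to the HDFS location named in spark.yarn.archive, for example with hdfs dfs -put.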



 

Check the web UI:

 

 

(2) Configure only spark.yarn.jars (hdfs://hd1:9000/spark_jars/*)


 

In the web UI, the displayed value of spark.yarn.jars is empty. The log above nevertheless shows that the Spark jars were not uploaded: only a conf zip package appears under .staging.



 

 

(3) Configure only spark.yarn.archive (hdfs://hd1:9000/archive/spark-libs.jar)


 

Check the web UI:



 

 

 

(4) Neither parameter is set: the jars and conf are zip-compressed and uploaded


 

 

Summary:

  • If spark.yarn.archive and spark.yarn.jars are both configured, only spark.yarn.archive takes effect.
  • If either spark.yarn.archive or spark.yarn.jars is configured, the jars in the Spark directory are not uploaded to the application's temporary directory.
  • If only spark.yarn.jars is configured, the spark.yarn.jars value shown in the web UI is empty (not sure whether this is a bug).
  • If neither spark.yarn.archive nor spark.yarn.jars is configured, both the conf and the jars are uploaded to the application's temporary directory. Spark 2.x uploads them as zip archives.
  • With Spark 2.x, starting spark-sql --master yarn with the default configuration (neither parameter set) is about 1~2 s slower than with one of the two parameters configured.
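
The precedence observed in these tests can be sketched as a small decision function. This is an illustration of the behaviour summarized above, not Spark's actual code: the helper names parse_spark_defaults and spark_libs_source are hypothetical, and the parser is a simplification that only handles whitespace-separated key/value lines like the ones shown in this post.

```python
from typing import Dict


def parse_spark_defaults(text: str) -> Dict[str, str]:
    """Parse spark-defaults.conf style lines: key, whitespace, value."""
    conf: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        if len(parts) == 2:
            conf[parts[0]] = parts[1].strip()
    return conf


def spark_libs_source(conf: Dict[str, str]) -> str:
    """Return which source of Spark jars wins, per the tests in this post."""
    if conf.get("spark.yarn.archive"):
        return "archive"  # the archive overrides spark.yarn.jars when both are set
    if conf.get("spark.yarn.jars"):
        return "jars"     # jars already on HDFS are used, nothing is uploaded
    return "upload"       # fall back to zipping and uploading the local jars
```

For the configuration in case (1), where both parameters are set, spark_libs_source returns "archive"; with an empty configuration it returns "upload", which is the slower startup path described in the last bullet.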
