Spark Job Submission Parameters Explained

First, two submission examples from the official documentation (cluster mode only):

# Run on a Spark standalone cluster in cluster deploy mode with supervise
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://207.184.161.138:7077 \
  --deploy-mode cluster \
  --supervise \
  --executor-memory 20G \
  --total-executor-cores 100 \
  /path/to/examples.jar \
  1000

# Run on a YARN cluster (use --deploy-mode client for client mode)
export HADOOP_CONF_DIR=XXX
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode cluster \
  --executor-memory 20G \
  --num-executors 50 \
  /path/to/examples.jar \
  1000

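Options explained

--master
MASTER_URL: the **master URL of the cluster**. It can be spark://host:port, mesos://host:port, yarn, yarn-cluster, yarn-client, or local (yarn-cluster and yarn-client are deprecated since Spark 2.0; use yarn together with --deploy-mode instead).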
--deploy-mode
DEPLOY_MODE: where the Driver program runs, either client or cluster. Default: client.
--class
CLASS_NAME: the fully qualified name of the main class, including its package (**the application's entry point**).
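--jars
Comma-separated list of local JARs: the third-party jars that the Driver and executors depend on.
--files
Comma-separated list of files to be placed in the working directory of each executor.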
--conf
An arbitrary Spark configuration property, given as PROP=VALUE.
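--driver-memory
Memory for the Driver program (e.g. 1000M, 5G). Default: 1024M.
--executor-memory
Memory per executor (e.g. 1000M, 2G). Default: 1G.
The trailing number 1000 in both examples above is an application argument (application-arguments), i.e. an argument passed to the main method of the main class.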
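The options above can be combined into a single command. Below is a minimal dry-run sketch in Bash that collects the arguments in an array and prints them instead of running spark-submit. The main class com.example.MyApp, all jar and file paths, the spark.sql.shuffle.partitions value, the memory sizes and the arguments arg1 arg2 are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Dry-run sketch: builds a spark-submit command from the options above and
# prints it instead of running it. All names and paths are hypothetical.
set -euo pipefail

build_submit_cmd() {
  local -a cmd
  cmd=(
    ./bin/spark-submit
    --class com.example.MyApp                   # hypothetical main class (with package)
    --master yarn
    --deploy-mode cluster                       # the Driver runs inside the cluster
    --jars /path/to/dep1.jar,/path/to/dep2.jar  # comma-separated, no spaces
    --files /path/to/app.properties             # placed in each executor's working directory
    --conf spark.sql.shuffle.partitions=200     # any Spark property, as PROP=VALUE
    --driver-memory 2G
    --executor-memory 4G
    /path/to/my-app.jar                         # the application jar
    arg1 arg2                                   # application-arguments passed to main()
  )
  printf '%s ' "${cmd[@]}"
  printf '\n'
}

build_submit_cmd
```

To actually submit, replace the two printf lines with "${cmd[@]}". Building the command in an array also allows a comment on every line, which a backslash-continued command does not: there the backslash must be the last character on the line, so no comment can follow it.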

Spark standalone with cluster deploy mode only

--driver-cores
Number of cores used by the Driver program (default: 1). Spark standalone only.

Spark standalone or Mesos with cluster deploy mode only

--supervise
If given, restarts the Driver after a failure. Spark standalone or Mesos only.
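In standalone cluster mode, --driver-cores and --supervise are typically used together. The sketch below only prints the commands; the master URL is the one from the first example, while the class name, jar path and driver ID are hypothetical placeholders. Because a supervised Driver is restarted after every failure, one that keeps failing has to be killed explicitly; the Spark standalone documentation gives `./bin/spark-class org.apache.spark.deploy.Client kill <master url> <driver ID>` for this, with the driver ID shown in the Master web UI.

```shell
#!/usr/bin/env bash
# Dry-run sketch for Spark standalone in cluster deploy mode.
# Class name, jar path and driver ID are hypothetical placeholders.
set -euo pipefail

MASTER_URL="spark://207.184.161.138:7077"   # master from the first example

submit_cmd() {
  local -a cmd
  cmd=(
    ./bin/spark-submit
    --class com.example.MyApp   # hypothetical main class
    --master "$MASTER_URL"
    --deploy-mode cluster       # both options below need cluster mode
    --driver-cores 2            # cores for the Driver (default 1)
    --supervise                 # restart the Driver if it fails
    /path/to/my-app.jar
  )
  printf '%s ' "${cmd[@]}"
  printf '\n'
}

kill_cmd() {
  # $1: driver ID as shown in the Master web UI
  printf '%s\n' "./bin/spark-class org.apache.spark.deploy.Client kill $MASTER_URL $1"
}

submit_cmd
kill_cmd driver-20240101000000-0000   # hypothetical driver ID
```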

Spark standalone and Mesos only

--total-executor-cores
Total number of cores used by all executors combined. Spark standalone and Spark on Mesos only.

Spark standalone and YARN only

--executor-cores
Number of cores per executor. Default: 1 on Spark on YARN; in standalone mode, all available cores on the worker.
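In standalone mode, --executor-cores interacts with --total-executor-cores: when --executor-cores is set, the Master hands out executors of that size until the total core budget is used up, so the number of executors is at most the integer quotient of the two (fewer if the workers lack free cores or memory). A small Bash sketch of that arithmetic, using --total-executor-cores 100 from the first example and hypothetical --executor-cores values of 4 and 6:

```shell
#!/usr/bin/env bash
# Sketch: upper bound on the number of executors in standalone mode.
# 100 total cores comes from the first example; 4 and 6 are hypothetical.
set -euo pipefail

max_executors() {
  local total_cores=$1 executor_cores=$2
  # Integer division: cores that cannot fill a whole executor are not allocated.
  echo $(( total_cores / executor_cores ))
}

max_executors 100 4   # prints 25
max_executors 100 6   # prints 16 (96 of the 100 cores used)
```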

YARN-only

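--driver-cores
Number of cores used by the Driver, in cluster mode only. Default: 1.
--queue
QUEUE_NAME: the name of the YARN queue to submit to. Default: default.
--num-executors
Total number of executors to launch. Default: 2.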
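On YARN, the resources requested for executors are roughly --num-executors times the size of one executor, and each executor container also carries memory overhead. In recent Spark versions spark.executor.memoryOverhead defaults to 10% of the executor memory with a 384 MB minimum (some older releases used a different factor, so check the docs for your version). The Bash sketch below applies this to the second example (50 executors of 20G each) with a hypothetical --executor-cores 4. It ignores the Driver/ApplicationMaster container and any rounding YARN applies to container sizes.

```shell
#!/usr/bin/env bash
# Sketch: rough YARN resource request for the executors only.
# Assumes overhead = max(384 MB, 10% of executor memory); ignores the
# Driver/ApplicationMaster container and YARN's container-size rounding.
set -euo pipefail

yarn_executor_request() {
  local num_executors=$1 executor_mem_mb=$2 executor_cores=$3
  local overhead_mb=$(( executor_mem_mb / 10 ))
  if (( overhead_mb < 384 )); then
    overhead_mb=384
  fi
  local container_mb=$(( executor_mem_mb + overhead_mb ))
  echo "container_mb=$container_mb total_mb=$(( num_executors * container_mb )) total_vcores=$(( num_executors * executor_cores ))"
}

yarn_executor_request 50 20480 4
# prints: container_mb=22528 total_mb=1126400 total_vcores=200
```

1,126,400 MB is 1,100 GB for the executors alone, so --num-executors and --executor-memory have to be sized against what the target queue can actually grant.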

The next post explains how to actually choose these parameters and how to tune them.
