Apache Spark arguments detailed

spark-shell

spark-shell is Spark's interactive shell. It provides a user-friendly interactive programming environment in which users can write Spark programs in Scala directly at the command line.

Examples

spark-shell accepts parameters:

spark-shell --master local[N] runs the current task locally, using N threads to simulate a cluster of N workers.

spark-shell --master local[*] uses all available cores on the current machine.

When no --master parameter is given, the default is local[*].

spark-shell --master spark://hadoop01:7077,hadoop02:7077 runs against a standalone cluster.
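Since local[*] simply means "one worker thread per available core", the explicit equivalent can be derived from the machine itself. A minimal sketch, assuming a Linux shell where nproc is available (on macOS, sysctl -n hw.ncpu serves the same purpose):

```shell
# Derive the thread count that local[*] would use (assumes Linux with nproc)
cores=$(nproc)
master="local[$cores]"
echo "$master"   # e.g. local[8] on an 8-core machine
```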

spark-submit

The spark-submit command submits a packaged jar to a Spark standalone cluster or to YARN.

spark-shell's interactive programming is convenient for learning and quick tests, but in practice we usually develop Spark applications in an IDE such as IDEA, package them as a jar, and submit them to the cluster or YARN for execution.

spark-submit is the command we use most often in development!

Example: calculating π

cd /export/servers/spark
/export/servers/spark/bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://node01:7077  \
--executor-memory 1g \
--total-executor-cores 2 \
/export/servers/spark/examples/jars/spark-examples_2.11-2.2.0.jar \
10
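The SparkPi example estimates π by Monte Carlo sampling: it throws random points into the unit square and counts how many land inside the unit circle, so that 4 × (inside / total) ≈ π. A non-Spark sketch of the same math in plain awk (the sample size and seed are illustrative, not anything SparkPi uses):

```shell
# Monte Carlo estimate of pi -- the same math SparkPi distributes across executors
pi_estimate=$(awk 'BEGIN {
  srand(42)                       # illustrative seed
  n = 200000; inside = 0
  for (i = 0; i < n; i++) {
    x = 2 * rand() - 1            # random point in the [-1, 1] x [-1, 1] square
    y = 2 * rand() - 1
    if (x * x + y * y <= 1) inside++
  }
  printf "%.4f", 4 * inside / n   # fraction inside the circle is ~ pi/4
}')
echo "Estimated pi: $pi_estimate"
```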

Master parameter forms

  • local runs locally with one worker thread (i.e. with no parallelism at all).
  • local[N] runs locally with N worker threads (ideally, set N to the number of CPU cores on your machine).
  • local[*] runs locally with as many worker threads as this machine has cores.
  • spark://HOST:PORT connects to the master of the specified Spark standalone cluster.
    The port is the one your master is configured with; the default is 7077.
  • mesos://HOST:PORT connects to the specified Mesos cluster. The port is the one your Mesos cluster is configured with; the default is 5050.
    Or, with ZooKeeper, use the format mesos://zk://...
  • yarn-client connects to a YARN cluster in client mode. The cluster location is found from the HADOOP_CONF_DIR variable.
  • yarn-cluster connects to a YARN cluster in cluster mode. The cluster location is found from the HADOOP_CONF_DIR variable.
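The forms above can also be selected programmatically. A minimal sketch of a shell helper that maps a deployment target name to a --master URL (the host names and ports are the illustrative values used earlier in this article, not defaults your cluster will have):

```shell
# Map a target name to a --master URL (hosts and ports are illustrative)
master_url() {
  case "$1" in
    local)      echo "local[*]" ;;                # all cores on this machine
    standalone) echo "spark://node01:7077" ;;     # standalone master, default port 7077
    mesos)      echo "mesos://node01:5050" ;;     # Mesos master, default port 5050
    yarn)       echo "yarn-client" ;;             # YARN, client deploy mode
    *)          echo "unknown target: $1" >&2; return 1 ;;
  esac
}

master_url standalone    # prints spark://node01:7077
```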

Other parameter examples

  • --master spark://node01:7077 specifies the master address
  • --name "appName" specifies the application name
  • --class specifies the class containing the program's main method
  • --jars xx.jar specifies additional jar packages used by the program
  • --driver-memory 512m specifies the memory the driver needs to run; default 1g
  • --executor-memory 2g specifies 2g of memory for each executor; default 1g
  • --executor-cores 1 specifies the number of cores available to each executor
  • --total-executor-cores 2 caps the total number of cores the task may use across the whole cluster at 2
  • --queue default specifies the queue the task runs in
  • --deploy-mode specifies the deploy mode (client / cluster)
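Putting the flags together, a submission is often kept in a small script so the same parameters can be reused. The sketch below only builds and prints the command (a dry run), since executing it requires a live cluster; the class name, jar path, and resource sizes are the example values from above:

```shell
# Build a spark-submit command from variables (dry run -- prints, does not execute)
SPARK_HOME=/export/servers/spark          # example path from this article
APP_CLASS=org.apache.spark.examples.SparkPi
APP_JAR=$SPARK_HOME/examples/jars/spark-examples_2.11-2.2.0.jar

cmd="$SPARK_HOME/bin/spark-submit \
 --class $APP_CLASS \
 --master spark://node01:7077 \
 --deploy-mode client \
 --driver-memory 512m \
 --executor-memory 1g \
 --total-executor-cores 2 \
 $APP_JAR 10"

echo "$cmd"
```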

note:

If a worker node has insufficient memory, then when starting spark-submit, the memory assigned to each executor must not exceed the memory available on that worker.
If --executor-cores exceeds the cores available on each worker, the task stays in a waiting state.
If --total-executor-cores exceeds the total cores available, all available cores are used by default; when other resources in the cluster are later released, the program will use them as well.
If either the memory or the cores requested for a single executor cannot be satisfied, spark-submit reports an error on startup and the task stays in a waiting state, unable to execute properly.
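The first note above is easy to sanity-check with arithmetic: the number of executors a worker can host is bounded by both its memory and its cores, and the binding constraint is whichever resource runs out first. A sketch with hypothetical worker sizes (made-up numbers, not Spark defaults):

```shell
# How many executors fit on one worker? (worker sizes are hypothetical)
worker_mem_g=8        # hypothetical worker memory in GB
worker_cores=4        # hypothetical worker cores
executor_mem_g=2      # requested --executor-memory, in GB
executor_cores=1      # requested --executor-cores

by_mem=$(( worker_mem_g / executor_mem_g ))
by_cores=$(( worker_cores / executor_cores ))

# The binding constraint is whichever resource runs out first
fit=$(( by_mem < by_cores ? by_mem : by_cores ))
echo "Executors that fit on this worker: $fit"
```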

Origin blog.csdn.net/weixin_42072754/article/details/105285280