Spark command-line arguments in detail
spark-shell
spark-shell provides an interactive shell for Spark. It is convenient for interactive programming:
users can write Spark programs in Scala directly at the command line.
Examples
spark-shell can take parameters:
spark-shell --master local[N] runs locally with N threads simulating the current task
spark-shell --master local[*] uses all available cores on the current machine
If no --master parameter is given, the default is local[*]
spark-shell --master spark://hadoop01:7077,hadoop02:7077 connects to a running cluster
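To give a feel for what you type into spark-shell, here is a word count in the same style. This is a minimal sketch using plain Scala collections so it runs without a cluster; in spark-shell you would call the same kinds of methods on an RDD, e.g. one obtained from sc.textFile. The object and method names here are illustrative, not part of Spark.

```scala
// A word count in the style you would write interactively in spark-shell.
// Plain Scala collections stand in for an RDD so this runs anywhere;
// with Spark you would start from sc.textFile("...") instead of a Seq.
object WordCount {
  def count(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+"))               // split each line into words
      .filter(_.nonEmpty)                     // drop empty tokens
      .groupBy(identity)                      // group equal words together
      .map { case (w, ws) => (w, ws.size) }   // count each group

  def main(args: Array[String]): Unit = {
    val result = count(Seq("spark shell", "spark submit"))
    println(result)  // e.g. Map(spark -> 2, shell -> 1, submit -> 1)
  }
}
```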
spark-submit
The spark-submit command submits a packaged jar to a Spark cluster / YARN.
Interactive programming with spark-shell is easy to learn and handy for testing, but in practice we generally develop Spark applications in IDEA, package them as a jar, and submit them to the cluster / YARN for execution.
spark-submit is the command we use most often in development!
Example: computing π
cd /export/servers/spark
/export/servers/spark/bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://node01:7077 \
--executor-memory 1g \
--total-executor-cores 2 \
/export/servers/spark/examples/jars/spark-examples_2.11-2.2.0.jar \
10
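SparkPi estimates π by Monte Carlo sampling: it throws random points into a square and counts how many land inside the inscribed circle. The following is a plain-Scala sketch of that same computation (no cluster needed); the real SparkPi distributes the sampling across executors, and the trailing argument 10 above controls how many partitions of samples it uses. The object name and seed here are illustrative.

```scala
// Monte Carlo estimate of π, mirroring what the SparkPi example computes.
// Points are drawn uniformly from the square [-1, 1) x [-1, 1); the fraction
// landing inside the unit circle approximates (circle area / square area) = π/4.
object PiEstimate {
  def estimate(samples: Int, seed: Long = 42L): Double = {
    val rnd = new scala.util.Random(seed)
    val inside = (1 to samples).count { _ =>
      val x = rnd.nextDouble() * 2 - 1   // x in [-1, 1)
      val y = rnd.nextDouble() * 2 - 1   // y in [-1, 1)
      x * x + y * y <= 1                 // inside the unit circle?
    }
    4.0 * inside / samples               // fraction inside, times 4, ≈ π
  }

  def main(args: Array[String]): Unit =
    println(f"pi is roughly ${estimate(1000000)}%.4f")
}
```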
Forms of the --master parameter
- local run locally with one worker thread (i.e. no parallelism at all).
- local[N] run locally with N worker threads (ideally, set N to the number of CPU cores on your machine).
- local[*] run locally with as many worker threads as the machine has cores.
- spark://HOST:PORT connect to the master of the specified Spark standalone cluster. The port is the one configured for your master, 7077 by default.
- mesos://HOST:PORT connect to the specified Mesos cluster. The port is the one configured for your mesos master, 5050 by default. With ZooKeeper, use the form mesos://zk://...
- yarn-client connect to the YARN cluster in client mode. The cluster location is found via the HADOOP_CONF_DIR variable.
- yarn-cluster connect to the YARN cluster in cluster mode. The cluster location is found via the HADOOP_CONF_DIR variable.
Other common parameters
- --master spark://node01:7077 specifies the master address
- --name "appName" specifies the application name
- --class specifies the class containing the program's main method
- --jars xx.jar additional jars the program depends on
- --driver-memory 512m memory the driver needs to run, default 1g
- --executor-memory 2g gives each executor 2g of memory, default 1g
- --executor-cores 1 specifies the number of cores available to each executor
- --total-executor-cores 2 limits the whole application to 2 CPU cores across the cluster
- --queue default specifies the queue the task is submitted to
- --deploy-mode specifies the deploy mode (client / cluster)
Notes:
If a worker node's memory is insufficient, the executor memory requested when starting spark-submit must not exceed the worker's available memory.
If --executor-cores exceeds the cores available on each worker, the task stays in a waiting state.
If --total-executor-cores exceeds the available cores, all available cores are used by default; as other resources are freed in the cluster, they will also be taken by the program.
If the memory or cores for a single executor are insufficient, spark-submit will report an error at startup, or the task will stay in a waiting state and cannot execute properly.