1. Spark-Submit submission parameters
Parameter | Description
---|---
`--master` | Cluster URL to connect to: `spark://host:port`, `mesos://host:port`, `yarn` (with `yarn-cluster` / `yarn-client` modes), or `local`
`--deploy-mode` | Where the Driver program runs, `client` or `cluster`; the default is `client`
`--class` | Fully qualified name of the application's main class
`--jars` | Comma-separated list of local jars (third-party packages) that the Driver and Executors depend on
`--files` | Comma-separated list of files to be placed in the working directory of each Executor
`--conf` | Spark configuration properties, given as key=value pairs
`--driver-memory` | Memory used by the Driver program (for example: 512M, 2G); the default is 1024M
`--executor-memory` | Memory of each Executor (for example: 2G); the default is 1G
`--driver-cores` | Number of cores used by the Driver program (default 1); Spark standalone cluster mode only
`--supervise` | Restart the Driver automatically after failure; Spark standalone and Mesos clusters only
`--total-executor-cores` | Total number of cores used by all Executors; Spark standalone and Spark on Mesos only
`--executor-cores` | Number of cores used by each Executor; the default is 1 on Spark on YARN, and all available cores on the Worker in standalone mode
**YARN only** |
`--driver-cores` | Cores used by the Driver, cluster mode only; the default is 1
`--queue` | Name of the YARN resource queue to submit to; the default is `default`
`--num-executors` | Total number of Executors to launch; the default is 2
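To make the defaults in the table concrete, here is a minimal Python sketch. The `DEFAULTS` map and `effective_conf` helper are my own illustrations of how the documented defaults combine with user options; they are not part of Spark itself.

```python
# Hypothetical helper illustrating the documented spark-submit defaults;
# this is an illustration, not part of Spark.
DEFAULTS = {
    "driver-memory": "1024M",   # default Driver memory
    "executor-memory": "1G",    # default memory per Executor
    "driver-cores": 1,          # standalone cluster mode default
    "queue": "default",         # YARN only
    "num-executors": 2,         # YARN only
}

def effective_conf(user_args):
    """Overlay user-supplied spark-submit options on the documented defaults."""
    conf = dict(DEFAULTS)
    conf.update(user_args)
    return conf

print(effective_conf({"executor-memory": "2G"})["executor-memory"])  # prints 2G
```

Any option not given on the command line keeps its default, so `driver-memory` here stays at 1024M while `executor-memory` is overridden to 2G.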
2. Source code analysis of resource scheduling
2.1 Simple diagram of resource request
- Execute `./start-all.sh` to start the cluster.
- After the Master node starts, it communicates with the Worker nodes over SSH.
- The Worker nodes register themselves back with the Master node.
- The Master node stores the Worker information in a `workers` collection of type `HashSet[WorkerInfo]`.
- The client submits a job with the `spark-submit` command.
- The client asks the Master node to start a Driver.
- The Master node appends the request to `waitingDrivers`, a collection of type `ArrayBuffer[DriverInfo]` (Drivers waiting to be launched).
- The Master node selects a Worker node and starts the Driver on it.
- The Driver applies to the Master for resources for the current Application.
- The Master node allocates Worker resources to the Application and returns the information to the Driver.
- The Driver sends tasks to the Worker nodes for execution.
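The flow above can be sketched as a toy model. The names `WorkerInfo`, `workers`, and `waiting_drivers` mirror the structures named in the steps (`HashSet[WorkerInfo]`, `ArrayBuffer[DriverInfo]`); the scheduling logic itself is a deliberately simplified illustration, not Spark's actual implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WorkerInfo:
    host: str
    cores: int
    memory_mb: int

class ToyMaster:
    """Simplified model of the Master's bookkeeping described above."""
    def __init__(self):
        self.workers = set()        # HashSet[WorkerInfo] in the Spark source
        self.waiting_drivers = []   # ArrayBuffer[DriverInfo] analogue

    def register_worker(self, worker):
        # Workers register themselves with the Master after startup.
        self.workers.add(worker)

    def submit_driver(self, name, cores, memory_mb):
        # spark-submit asks the Master to start a Driver; it waits in a queue.
        self.waiting_drivers.append((name, cores, memory_mb))

    def schedule(self):
        # Launch each waiting Driver on some Worker with enough resources.
        launched = []
        for request in list(self.waiting_drivers):
            name, cores, memory_mb = request
            for w in self.workers:
                if w.cores >= cores and w.memory_mb >= memory_mb:
                    launched.append((name, w.host))
                    self.waiting_drivers.remove(request)
                    break
        return launched
```

Registering two Workers and submitting one Driver leaves `waiting_drivers` empty after `schedule()`, mirroring the step where the Master picks a node and starts the Driver on it.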
2.2 Resource scheduling source path
org.apache.spark.deploy.master.Master
2.3 Application submission source path
org.apache.spark.deploy.SparkSubmit
2.4 Summary
- Executors are started in a distributed manner across the cluster, which benefits data locality during task computation.
- By default (no `--executor-cores` option set when submitting a task), each Worker starts one Executor for the current Application, and this Executor uses all the cores of that Worker and 1G of memory.
- To start multiple Executors on one Worker, add the `--executor-cores` option when submitting the Application.
- If `--total-executor-cores` is not set, an Application will use all cores in the Spark cluster.
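The default behaviour summarized above can be sketched with a small allocation function. This is my own simplified model of the standalone Master's core assignment, not the actual Spark scheduling code; it only captures the rules stated in the summary.

```python
def assign_executors(worker_cores, executor_cores=None, total_executor_cores=None):
    """Return a list of (worker_index, cores) pairs, one per launched Executor.

    Simplified model: without --executor-cores, each Worker starts one
    Executor that takes all of that Worker's cores; with it, Executors are
    spread round-robin across Workers; --total-executor-cores caps the total.
    """
    remaining = total_executor_cores if total_executor_cores is not None else sum(worker_cores)
    assigned = []
    if executor_cores is None:
        # Default: one Executor per Worker, using all of that Worker's cores.
        for i, cores in enumerate(worker_cores):
            if cores <= remaining:
                assigned.append((i, cores))
                remaining -= cores
        return assigned
    # With --executor-cores set: spread Executors across Workers round-robin.
    free = list(worker_cores)
    progress = True
    while progress and remaining >= executor_cores:
        progress = False
        for i in range(len(free)):
            if free[i] >= executor_cores and remaining >= executor_cores:
                assigned.append((i, executor_cores))
                free[i] -= executor_cores
                remaining -= executor_cores
                progress = True
    return assigned

# Two 8-core Workers, no options: one Executor per Worker with all its cores.
print(assign_executors([8, 8]))  # [(0, 8), (1, 8)]
# --executor-cores 1 --total-executor-cores 3: spread across both Workers.
print(assign_executors([2, 2], executor_cores=1, total_executor_cores=3))
```

With `executor_cores=1` and `total_executor_cores=3` on two 2-core Workers, the sketch launches Executors alternately on each Worker until the cluster-wide cap of 3 cores is reached.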
2.5 Demonstration
Demonstrations of submitting a task with spark-submit.

- By default, each Worker starts one Executor for the current Application, and this Executor uses all the cores of that Worker and 1G of memory.

```shell
./spark-submit --master spark://masterNode:7077 --class org.apache.spark.examples.SparkPi ../lib/spark-examples-1.6.0-hadoop2.6.0.jar 1000
```

- Start multiple Executors on a Worker by setting the `--executor-cores` option to specify the number of cores each Executor uses.

```shell
./spark-submit --master spark://masterNode:7077 --executor-cores 1 --class org.apache.spark.examples.SparkPi ../lib/spark-examples-1.6.0-hadoop2.6.0.jar 1000
```

- Cores are not started when memory is insufficient: when assigning resources, Spark considers not only the core settings but also whether there is enough memory for each configured Executor.

```shell
./spark-submit --master spark://masterNode:7077 --executor-cores 1 --executor-memory 3g --class org.apache.spark.examples.SparkPi ../lib/spark-examples-1.6.0-hadoop2.6.0.jar 1000
```

- `--total-executor-cores` specifies how many cores the Application uses across the whole cluster. Note: a single Executor process cannot run across multiple nodes.

```shell
./spark-submit --master spark://masterNode:7077 --executor-cores 1 --executor-memory 3g --total-executor-cores 3 --class org.apache.spark.examples.SparkPi ../lib/spark-examples-1.6.0-hadoop2.6.0.jar 1000
```