Spark learning from 0 to 1 (3): Apache Spark submit parameters and resource scheduling source code analysis

1. spark-submit submission parameters

| Parameter | Description |
| --- | --- |
| --master | Master URL: spark://host:port, mesos://host:port, yarn, yarn-cluster, yarn-client, or local |
| --deploy-mode | Where the Driver program runs, client or cluster; default is client |
| --class | Main class name of the application |
| --jars | Comma-separated list of local jars: third-party packages that the Driver and Executors depend on |
| --files | Comma-separated list of files to be placed in the working directory of each Executor |
| --conf | Arbitrary Spark configuration property (key=value) |
| --driver-memory | Memory used by the Driver program (e.g. 512M, 2G); default is 1024M |
| --executor-memory | Memory per Executor (e.g. 2G); default is 1G |
| --driver-cores | Number of cores used by the Driver program (default 1); Spark standalone cluster mode only |
| --supervise | Restart the Driver after failure; Spark standalone or Mesos cluster mode only |
| --total-executor-cores | Total number of cores used by all Executors; Spark standalone and Spark on Mesos only |
| --executor-cores | Number of cores per Executor; defaults to 1 on Spark on YARN, and to all available cores of the Worker in standalone mode |

YARN only:

| Parameter | Description |
| --- | --- |
| --driver-cores | Cores used by the Driver; cluster mode only; default 1 |
| --queue | Name of the YARN resource queue; default: default |
| --num-executors | Total number of Executors to start; default 2 |
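For reference, the same options can also be set programmatically through the spark-launcher module. Below is a minimal, hedged sketch; the master URL, jar path, and main class are placeholder values reused from the demonstrations in section 2.5, not fixed parts of the API:

    import org.apache.spark.launcher.SparkLauncher

    // Programmatic equivalent of a spark-submit command line (sketch).
    object LauncherSketch {
      def main(args: Array[String]): Unit = {
        val app = new SparkLauncher()
          .setMaster("spark://masterNode:7077")              // --master (placeholder URL)
          .setDeployMode("client")                           // --deploy-mode
          .setAppResource("../lib/spark-examples-1.6.0-hadoop2.6.0.jar")
          .setMainClass("org.apache.spark.examples.SparkPi") // --class
          .setConf(SparkLauncher.EXECUTOR_MEMORY, "2g")      // --executor-memory
          .setConf(SparkLauncher.EXECUTOR_CORES, "1")        // --executor-cores
          .addAppArgs("1000")
          .launch()                                          // spawns spark-submit as a child process
        app.waitFor()
      }
    }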

2. Source code analysis of resource scheduling

2.1 Simplified diagram of the resource request flow

[Figure: resource request flow between the client, Master, Workers, and Driver]

  1. Execute ./start-all.sh to start the cluster.
  2. The Master node starts first; the Worker daemons are then brought up over SSH.
  3. The Worker nodes register themselves with the Master (reverse registration).
  4. The Master stores each Worker's information in workers, a collection of type HashSet[WorkerInfo].
  5. The client submits an application with the spark-submit command.
  6. The client requests the Master to start a Driver.
  7. The Master queues the request in waitingDrivers, an ArrayBuffer[DriverInfo] of Drivers waiting to be launched (both collections are sketched in code after this list).
  8. The Master selects a node and starts the Driver on it.
  9. The Driver applies to the Master for resources for the current Application.
  10. The Master assigns Worker resources to the Application and reports them back to the Driver.
  11. The Driver sends tasks to the Worker nodes for execution.
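As a reference for steps 4 and 7, here is a reduced, hedged sketch of the Master-side bookkeeping. The field names (workers, waitingDrivers) match org.apache.spark.deploy.master.Master in the Spark 1.x source, but the classes here are trimmed to the scheduling state only:

    import scala.collection.mutable.{ArrayBuffer, HashSet}

    class WorkerInfo(val host: String, val cores: Int, val memory: Int) {
      var coresUsed = 0
      var memoryUsed = 0
      def coresFree: Int = cores - coresUsed      // cores still available on this Worker
      def memoryFree: Int = memory - memoryUsed   // memory (MB) still available
    }

    class DriverInfo(val id: String, val desc: String)

    class MasterStateSketch {
      val workers = new HashSet[WorkerInfo]            // step 4: registered Workers
      val waitingDrivers = new ArrayBuffer[DriverInfo] // step 7: Drivers waiting to launch
    }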

2.2 Resource scheduling source path (Master)

org.apache.spark.deploy.master.Master
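To make the scheduling entry point concrete, here is a hedged, self-contained sketch of the control flow of Master.schedule() as of Spark 1.x. Worker and Driver are reduced stand-ins for WorkerInfo and DriverInfo, and launchDriver stands in for the LaunchDriver RPC message the real Master sends:

    import scala.collection.mutable.ArrayBuffer
    import scala.util.Random

    object ScheduleSketch {
      case class Worker(host: String, var coresFree: Int, var memoryFree: Int)
      case class Driver(id: String, cores: Int, memory: Int)

      val workers = ArrayBuffer(Worker("w1", 4, 4096), Worker("w2", 4, 4096))
      val waitingDrivers = ArrayBuffer(Driver("driver-0", 1, 1024))

      def launchDriver(w: Worker, d: Driver): Unit = {
        w.coresFree -= d.cores; w.memoryFree -= d.memory
        println(s"launching ${d.id} on ${w.host}")
      }

      def schedule(): Unit = {
        // shuffle the alive Workers so Drivers do not pile up on the first one
        val shuffled = Random.shuffle(workers.toList)
        for (driver <- waitingDrivers.toList) {
          // pick any Worker with enough free memory and cores, launch there
          shuffled.find(w => w.coresFree >= driver.cores && w.memoryFree >= driver.memory)
            .foreach { worker =>
              launchDriver(worker, driver)
              waitingDrivers -= driver
            }
        }
        // the real method then calls startExecutorsOnWorkers() for waiting apps
      }

      def main(args: Array[String]): Unit = schedule()
    }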

2.3 Application submission source path

org.apache.spark.deploy.SparkSubmit
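In client mode, the essence of SparkSubmit is that, after resolving the command-line arguments and classpath, it loads the user's main class and invokes its main method by reflection in the same JVM. A hedged, minimal sketch of that step:

    // Reduced sketch of how SparkSubmit hands control to the user class.
    object SparkSubmitSketch {
      def runMain(mainClass: String, appArgs: Array[String]): Unit = {
        val clazz = Class.forName(mainClass) // e.g. org.apache.spark.examples.SparkPi
        val main  = clazz.getMethod("main", classOf[Array[String]])
        main.invoke(null, appArgs)           // invoke the static main(String[])
      }
    }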

2.4 Summary

  1. Executors are started in a distributed fashion across the cluster, which benefits data locality of task computation.
  2. By default (no --executor-cores option set when submitting a task), each Worker starts one Executor for the current Application, and that Executor uses all of the Worker's cores and 1G of memory.
  3. To start multiple Executors on a Worker, add the --executor-cores option when submitting the Application.
  4. If --total-executor-cores is not set, an Application will use all the cores in the Spark cluster (see the sketch after this list).
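To make points 2–4 concrete, here is a hedged, simplified sketch of the "spread out" core assignment the standalone Master performs: round-robin over the usable Workers, one slice of cores at a time, until either the Application's core limit or the Workers' free cores run out. This illustrates the idea rather than reproducing the exact Master code:

    object CoreAssignmentSketch {
      // freeCores(i): free cores on Worker i
      // coresPerExecutor: the --executor-cores value (None when unset)
      // totalCores: the --total-executor-cores value (Int.MaxValue when unset)
      def assignCores(freeCores: Array[Int],
                      coresPerExecutor: Option[Int],
                      totalCores: Int): Array[Int] = {
        val assigned = Array.fill(freeCores.length)(0)
        val step = coresPerExecutor.getOrElse(1)
        var remaining = totalCores
        var progress = true
        while (remaining >= step && progress) {
          progress = false
          for (i <- freeCores.indices if remaining >= step) {
            if (freeCores(i) - assigned(i) >= step) {
              assigned(i) += step // give this Worker another slice of cores
              remaining -= step
              progress = true
            }
          }
        }
        assigned
      }

      def main(args: Array[String]): Unit = {
        // 3 Workers with 4 free cores each, --executor-cores 1,
        // --total-executor-cores 3  =>  [1, 1, 1]: one core on each Worker
        println(assignCores(Array(4, 4, 4), Some(1), 3).mkString("[", ", ", "]"))
        // no limits set => the Application takes every free core: [4, 4, 4]
        println(assignCores(Array(4, 4, 4), None, Int.MaxValue).mkString("[", ", ", "]"))
      }
    }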

2.5 Demonstration

A demonstration of submitting tasks with spark-submit.

  1. By default, each Worker starts one Executor for the current Application; each Executor uses all of its Worker's cores and 1G of memory.

    ./spark-submit \
    --master spark://masterNode:7077 \
    --class org.apache.spark.examples.SparkPi \
    ../lib/spark-examples-1.6.0-hadoop2.6.0.jar \
    1000
    
  2. Start multiple Executors on a Worker by setting the --executor-cores option to specify the number of cores each Executor uses.

    ./spark-submit \
    --master spark://masterNode:7077 \
    --executor-cores 1 \
    --class org.apache.spark.examples.SparkPi \
    ../lib/spark-examples-1.6.0-hadoop2.6.0.jar \
    1000
    
  3. Cores cannot be started when memory is insufficient. When launching Executors, Spark checks not only the core settings but also whether the Worker has enough free memory; a Worker that cannot satisfy --executor-memory will not start an Executor even if it still has idle cores.

    ./spark-submit \
    --master spark://masterNode:7077 \
    --executor-cores 1 \
    --executor-memory 3g \
    --class org.apache.spark.examples.SparkPi \
    ../lib/spark-examples-1.6.0-hadoop2.6.0.jar \
    1000
    
  4. Use --total-executor-cores to limit the total number of cores the Application uses in the cluster.

    Note: the cores assigned to a single Executor process all come from one Worker; one process cannot be started across multiple nodes in the cluster.

    ./spark-submit \
    --master spark://masterNode:7077 \
    --executor-cores 1 \
    --executor-memory 3g \
    --total-executor-cores 3 \
    --class org.apache.spark.examples.SparkPi \
    ../lib/spark-examples-1.6.0-hadoop2.6.0.jar \
    1000
    


Origin: blog.csdn.net/dwjf321/article/details/109047999