Spark Deployment Modes

When submitting a Spark application with spark-submit, the following parameters are generally used:

 

./bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <deploy-mode> \
  --conf <key>=<value> \
  ... # other options
  <application-jar> \
  [application-arguments]

The deploy-mode option controls how the application is deployed to the cluster, specifically where the Driver process runs. There are two modes, depending on where the Driver main process is placed: client and cluster. The default is client.
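For example, the same application can be submitted in either mode by changing only --deploy-mode. The master URL, class name, and jar path below are placeholders, not values from a real cluster:

```shell
# Client mode (the default): the Driver runs on the submitting machine
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://master-host:7077 \
  --deploy-mode client \
  examples/jars/spark-examples.jar 100

# Cluster mode: the Driver runs on a Worker chosen by the Master
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://master-host:7077 \
  --deploy-mode cluster \
  examples/jars/spark-examples.jar 100
```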

 



 

 

1. Client mode

The Master node is the node used to submit the task, that is, the node where the bin/spark-submit command is executed. The Driver is the process that runs the main function of your Spark program. Although the diagram places the Driver process on the Master node, note that the Driver is not necessarily on the Master node; it can be on any node. The Workers are the slave nodes, and the Executor processes that perform the actual computation always run on the Worker nodes.

1. In client mode, the Driver process runs on the submitting node (the Master node in the diagram), not on a Worker node. Relative to the Worker cluster that performs the actual computation, the Driver acts as a third-party "client".

2. Because the Driver process is not on a Worker node, it is independent of the cluster and does not consume the Worker cluster's resources.

3. In client mode, the submitting node and the Worker nodes must be on the same local area network, because the Driver needs to communicate with the Executors frequently. For example, the Driver distributes the Jar package to the Executors through Netty HTTP, and the Driver assigns tasks to the Executors.

4. Client mode has no supervised-restart mechanism. If the Driver process dies, an external program must restart it.
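Because of point 4, client-mode deployments sometimes wrap the Driver in an external watchdog. A minimal sketch (not production-grade; in practice the supervised command would be your spark-submit client-mode invocation, here stood in by `false`/`true` for illustration):

```shell
#!/usr/bin/env bash
# Minimal watchdog sketch: rerun a command until it succeeds or
# the retry budget is exhausted. Replace the demo command with
# your spark-submit client-mode invocation.
run_with_restart() {
  local max_retries=$1
  shift
  local attempt=0
  until "$@"; do
    attempt=$((attempt + 1))
    if [ "$attempt" -ge "$max_retries" ]; then
      echo "giving up after $attempt failed attempts"
      return 1
    fi
    echo "command failed, retrying ($attempt/$max_retries)..."
  done
  echo "command succeeded"
}

# Demo with a stand-in command that always fails:
run_with_restart 2 false || echo "driver did not come back"
```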

 



 

2. Cluster mode

1. The Driver program runs on a node in the Worker cluster, not on the Master node; the Master designates which Worker node hosts it.

2. The Driver program therefore consumes resources of the Worker that hosts it.

3. In cluster mode, the Master can supervise the Driver via the --supervise flag: if the Driver dies, it is restarted automatically.

4. In cluster mode, the submitting node and the Worker nodes are generally not on the same local area network, so the Driver cannot efficiently distribute the Jar package to each Worker. Cluster mode therefore requires the Jar package to be placed in advance in the corresponding directory on every Worker.
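Putting points 3 and 4 together, a cluster-mode submission might look like the following. The hostname and jar path are placeholder assumptions, and the jar must already exist at that path on every Worker:

```shell
# Cluster mode with automatic Driver restart (--supervise).
# /opt/spark-apps/app.jar is a placeholder path that must be
# pre-distributed to the same location on every Worker node.
./bin/spark-submit \
  --class <main-class> \
  --master spark://master-host:7077 \
  --deploy-mode cluster \
  --supervise \
  /opt/spark-apps/app.jar
```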

 

 

Should I choose client mode or cluster mode?

Generally speaking, if the node that submits the task (i.e. the Master) and the Worker cluster are on the same network, client mode is the better fit.

If the node submitting the task is far from the Worker cluster, cluster mode should be used to minimize the network latency between the Driver and the Executors.

 

Spark deploy modes inside spark-submit: cluster and client

 

When you run SparkSubmit --class [mainClass], SparkSubmit resolves a childMainClass as follows:

1. client mode: childMainClass = mainClass

2. standalone cluster mode: childMainClass = org.apache.spark.deploy.Client

3. yarn cluster mode: childMainClass = org.apache.spark.deploy.yarn.Client

The childMainClass is a wrapper around mainClass. SparkSubmit invokes the childMainClass; in cluster mode, the childMainClass talks to the cluster and launches a process on one Worker to run the mainClass.
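The selection above can be sketched as a small shell function, purely for illustration (Spark's real implementation is Scala code inside SparkSubmit; this mapping only mirrors the three cases listed):

```shell
#!/usr/bin/env bash
# Illustrative sketch of how SparkSubmit picks childMainClass
# from --master and --deploy-mode; mirrors the cases above.
child_main_class() {
  master=$1
  deploy_mode=$2
  main_class=$3
  if [ "$deploy_mode" = "client" ]; then
    # client mode: run the user's mainClass directly
    echo "$main_class"
    return 0
  fi
  case "$master" in
    spark://*) echo "org.apache.spark.deploy.Client" ;;      # standalone cluster
    yarn*)     echo "org.apache.spark.deploy.yarn.Client" ;; # yarn cluster
    *)         return 1 ;;
  esac
}

child_main_class "yarn" "cluster" "org.apache.spark.examples.JavaWordCount"
```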
 
ps. use "spark-submit -v" to print debug info.
 
Yarn client: spark-submit -v --class "org.apache.spark.examples.JavaWordCount" --master yarn JavaWordCount.jar
childMainClass: org.apache.spark.examples.JavaWordCount

Yarn cluster: spark-submit -v --class "org.apache.spark.examples.JavaWordCount" --master yarn-cluster JavaWordCount.jar
childMainClass: org.apache.spark.deploy.yarn.Client

Standalone client: spark-submit -v --class "org.apache.spark.examples.JavaWordCount" --master spark://aa01:7077 JavaWordCount.jar
childMainClass: org.apache.spark.examples.JavaWordCount

Standalone cluster: spark-submit -v --class "org.apache.spark.examples.JavaWordCount" --master spark://aa01:7077 --deploy-mode cluster JavaWordCount.jar
childMainClass: org.apache.spark.deploy.rest.RestSubmissionClient (if using the REST gateway, else org.apache.spark.deploy.Client)
 
