[Spark] Principle of Spark task execution

DAGScheduler: divides the job into stages at wide (shuffle) dependencies and encapsulates the tasks of each stage into a TaskSet.

TaskScheduler: receives the stages submitted by DAGScheduler and sends the tasks to the worker nodes, where the tasks are executed.

1) Transformation operations on an RDD are lazy; an action operation triggers job generation. A DAG is built from the RDD lineage, and DAGScheduler is responsible for analyzing this DAG.

2) DAGScheduler splits the DAG into interdependent stages at the wide dependencies. Each stage contains several tasks, and the tasks of each stage form a TaskSet, which is submitted to TaskScheduler for scheduling.

DAGScheduler computes the optimal scheduling strategy, for example by taking data locality into account.

DAGScheduler also monitors the scheduling process. If a stage fails, DAGScheduler is responsible for resubmitting it.

3) Each TaskScheduler serves only one SparkContext. TaskScheduler receives the TaskSets distributed by DAGScheduler and dispatches them to Executors for execution. TaskScheduler is responsible for monitoring task execution status and for the task retry mechanism.

4) The Executor receives the TaskSet from TaskScheduler; each task is handed to a thread for execution, and the task's status is reported back to TaskScheduler when the task completes.

ShuffleMapTask returns a MapStatus object, not the result itself; the result returned by ResultTask is described below.

Get execution results

For the calculation result returned by Executor:

1) If the result size exceeds 1GB (the limit set by spark.driver.maxResultSize), the result is discarded directly.

2) If the result size is between roughly 128MB - 200KB and 1GB, the result is stored in the BlockManager keyed by the taskId, and only that block id is sent to the Driver.

3) If the result size is below roughly 128MB - 200KB (the RPC frame size minus a reserved margin), the result is sent directly to the Driver through Netty.
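The three branches above can be sketched as a simple size-routing function. This is an illustrative model only: the constant names and `route_task_result` are hypothetical, not Spark's actual API, and the thresholds assume the default configuration values mentioned above.

```python
# Hypothetical sketch of the three result-size branches; names are
# illustrative, not Spark's API. Thresholds assume default settings.
GB = 1 << 30
MB = 1 << 20
KB = 1 << 10

MAX_RESULT_SIZE = 1 * GB                  # spark.driver.maxResultSize (default)
MAX_DIRECT_RESULT = 128 * MB - 200 * KB   # RPC frame size minus reserved bytes

def route_task_result(result_size):
    """Decide how a task result of `result_size` bytes reaches the driver."""
    if result_size > MAX_RESULT_SIZE:
        return "discarded"                   # too large: dropped with an error
    elif result_size > MAX_DIRECT_RESULT:
        return "indirect via BlockManager"   # store by taskId, send block id
    else:
        return "direct via Netty"            # serialized result sent inline
```

The middle branch trades an extra fetch for not overloading the driver's RPC channel with a large payload.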

 

Dividing the scheduling stages:

 

 


Dependent RDDs are traversed breadth-first:

1) When the job is submitted to run in SparkContext, handleJobSubmitted in DAGScheduler processes it; this method finds the last RDD and calls the getParentStage method.

2) The getParentStage method determines whether a shuffle operation exists in rddG's dependency tree. It finds that the join operation is a shuffle operation, and the RDDs feeding that operation are rddB and rddF.

3) The getAncestorShuffleDependencies method traverses backward from rddB and finds no shuffle operation on that dependency branch, that is, no wide dependency. The newOrUsedShuffleStage method is then called to generate the scheduling stage ShuffleMapStage0.

4) The getAncestorShuffleDependencies method traverses backward from rddF and finds the wide-dependency operation groupBy on that branch, which splits it: rddC and rddD form ShuffleMapStage1, and rddE and rddF form ShuffleMapStage2.

5) Finally, ResultStage3 for rddG is generated.
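The stage division above can be modeled as a backward traversal that cuts a new stage at every shuffle boundary. This is a toy sketch, not Spark's DAGScheduler: `RDD`, `shuffle_parents`, and `build_stages` are simplified stand-ins, and the example DAG mirrors the rddA-rddG example in the text.

```python
# Toy model of cutting stages at shuffle (wide) dependencies.
# All names are simplified stand-ins, not Spark's classes.
from collections import namedtuple

RDD = namedtuple("RDD", ["name", "deps"])  # deps: list of (parent_rdd, is_shuffle)

def shuffle_parents(rdd):
    """Backward BFS over narrow deps; return the RDDs just behind
    each shuffle boundary reachable from `rdd`."""
    found, queue, seen = [], [rdd], set()
    while queue:
        r = queue.pop(0)
        if r.name in seen:
            continue
        seen.add(r.name)
        for parent, is_shuffle in r.deps:
            (found if is_shuffle else queue).append(parent)
    return found

def build_stages(final_rdd):
    """Create one ShuffleMapStage per shuffle boundary (ancestors first),
    then a final ResultStage for `final_rdd`. Returns the last RDD of
    each stage, in stage-id order."""
    order, cache = [], {}
    def visit(r):
        if r.name in cache:
            return
        for p in shuffle_parents(r):       # create parent stages first
            visit(p)
        cache[r.name] = len(order)
        order.append(r.name)
    for p in shuffle_parents(final_rdd):
        visit(p)
    order.append(final_rdd.name)           # the ResultStage
    return order

# The example DAG from the text: rddG = rddB join rddF, with a groupBy
# (shuffle) between rddD and rddF; all other dependencies are narrow.
rddA = RDD("rddA", [])
rddB = RDD("rddB", [(rddA, False)])
rddC = RDD("rddC", [])
rddD = RDD("rddD", [(rddC, False)])
rddE = RDD("rddE", [])
rddF = RDD("rddF", [(rddE, False), (rddD, True)])
rddG = RDD("rddG", [(rddB, True), (rddF, True)])
```

Running `build_stages(rddG)` on this DAG yields stages ending at rddB, rddD, rddF, rddG, matching ShuffleMapStage0, ShuffleMapStage1, ShuffleMapStage2, and ResultStage3 in the walkthrough above.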

Submitting the scheduling stages

The last scheduling stage of the example, ResultStage3, is obtained in the handleJobSubmitted method, and the stage is submitted through the submitStage method.
In the submitStage method, a job instance is created first, and then it is determined whether the stage has parent stages. Since ResultStage3 has two parent stages, ShuffleMapStage0 and ShuffleMapStage2, it cannot be submitted to run immediately, so ResultStage3 is put into the waiting queue waitingStages.
By calling submitStage recursively, ShuffleMapStage0 is found to have no parent stage, while ShuffleMapStage2 has the parent stage ShuffleMapStage1; ShuffleMapStage2 is therefore also added to waitingStages, and ShuffleMapStage0 and ShuffleMapStage1 are submitted in the first round via the submitMissingTasks method.
The Executor sends a message when a task finishes executing; when DAGScheduler updates the status, it checks the progress of the stage. If some tasks failed, the stage is resubmitted; if all tasks completed, the next waiting stages are submitted to run. Since not all parent stages of ResultStage3 have completed, the second round submits only ShuffleMapStage2.
When ShuffleMapStage2 finishes running, all parent stages of ResultStage3 are complete, and it is submitted, completing the scheduling.
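The recursive submitStage logic described above can be sketched as follows. This is a simplified model: `parents`, `waiting`, `running`, and `submit_stage` are illustrative bookkeeping, not DAGScheduler's real fields or API.

```python
# Toy version of the recursive submitStage logic: a stage runs only when
# all of its parent stages have finished; otherwise it waits and its
# missing parents are submitted first. Names are illustrative.

def submit_stage(stage, parents, finished, waiting, running):
    missing = [p for p in parents.get(stage, []) if p not in finished]
    if not missing:
        running.append(stage)      # submitMissingTasks: run this stage now
    else:
        waiting.add(stage)         # park in waitingStages until parents finish
        for p in missing:
            submit_stage(p, parents, finished, waiting, running)

# Stage graph from the example: ResultStage3 depends on stages 0 and 2,
# and ShuffleMapStage2 depends on ShuffleMapStage1.
parents = {
    "ResultStage3": ["ShuffleMapStage0", "ShuffleMapStage2"],
    "ShuffleMapStage2": ["ShuffleMapStage1"],
}
waiting, running = set(), []
submit_stage("ResultStage3", parents, set(), waiting, running)
# First round: only ShuffleMapStage0 and ShuffleMapStage1 actually run,
# while ResultStage3 and ShuffleMapStage2 sit in the waiting set.
```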

Submitting tasks

  1. ShuffleMapStage0 and ShuffleMapStage1 are submitted first. Assuming each stage has only two partitions, ShuffleMapStage0 is split into two tasks, ShuffleMapTask(0,0) and ShuffleMapTask(0,1), and ShuffleMapStage1 is likewise split into two tasks. The tasks of ShuffleMapStage0 form TaskSet0 and those of ShuffleMapStage1 form TaskSet1, each with two tasks to execute.
  2. TaskScheduler receives the task sets TaskSet0 and TaskSet1, constructs TaskSetManager0 and TaskSetManager1 in the submitTasks method, puts both into the system's scheduling pool, and schedules them according to the algorithm configured for the system (FIFO or FAIR).
  3. In the resourceOffers method of TaskSchedulerImpl, resources are allocated according to data locality; each task is assigned its code, data partition, and processing resources. The launchTasks method of CoarseGrainedSchedulerBackend then sends the tasks to CoarseGrainedExecutorBackend on the Worker nodes, which calls its Executor to execute them.
  4. When ShuffleMapStage0 and ShuffleMapStage1 complete, ShuffleMapStage2 and ResultStage3 go through steps 1-3 in the same way, except that ResultStage3 generates ResultTasks instead of ShuffleMapTasks.
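The scheduling pool from step 2 can be sketched as a FIFO queue of TaskSetManagers dequeued in submission order. `FIFOPool` and its methods are hypothetical names for illustration, not TaskSchedulerImpl's API; the FAIR scheduler would instead weigh pools by share.

```python
# Illustrative FIFO scheduling pool: TaskSetManagers are dequeued in the
# order they were submitted. Names are hypothetical, not Spark's API.
from collections import deque

class FIFOPool:
    def __init__(self):
        self.queue = deque()

    def submit(self, taskset_manager):
        self.queue.append(taskset_manager)   # submitTasks: enqueue the manager

    def next_taskset(self):
        # the scheduler offers resources to managers in FIFO order
        return self.queue.popleft() if self.queue else None

pool = FIFOPool()
pool.submit("TaskSetManager0")   # built from TaskSet0 (ShuffleMapStage0)
pool.submit("TaskSetManager1")   # built from TaskSet1 (ShuffleMapStage1)
```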


Executing tasks

When CoarseGrainedExecutorBackend receives the LaunchTask message, it calls Executor's launchTask method. In launchTask, a TaskRunner is initialized to encapsulate the task and manage the details of its runtime, and the TaskRunner object is then put into the thread pool for execution.
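The launchTask pattern above, wrapping each task in a runner object and handing it to a thread pool, can be sketched like this. `TaskRunner` here is a simplification of Executor's inner class of the same name, and the task body is a trivial stand-in.

```python
# Sketch of the launchTask pattern: wrap the task in a runner object and
# submit it to a thread pool. TaskRunner is a simplified stand-in for
# Executor's inner class; the lambda is a trivial example task body.
from concurrent.futures import ThreadPoolExecutor

class TaskRunner:
    def __init__(self, task_id, fn):
        self.task_id = task_id
        self.fn = fn

    def run(self):
        # manage runtime details, execute, then report status back
        result = self.fn()
        return (self.task_id, "FINISHED", result)

pool = ThreadPoolExecutor(max_workers=4)        # the Executor's thread pool
runner = TaskRunner(0, lambda: sum(range(10)))  # launchTask wraps the task
future = pool.submit(runner.run)                # hand the runner to the pool
status = future.result()                        # (task_id, state, result)
pool.shutdown()
```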

For a ShuffleMapTask, its calculation results are written to the BlockManager, and what is finally returned to DAGScheduler is a MapStatus object. This object records where the ShuffleMapTask's results are stored in the BlockManager, rather than the results themselves; this storage information becomes the basis for locating the input data needed by the next stage's tasks.
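The MapStatus idea, returning the location of shuffle output rather than the data, can be shown with a minimal stand-in. The field names, `run_shuffle_map_task`, and the dict-as-BlockManager are all illustrative assumptions, not Spark's actual classes.

```python
# Minimal stand-in for the MapStatus idea: the task writes its shuffle
# output to the (local) BlockManager and returns only metadata about it.
# All names here are illustrative, not Spark's real classes.
from dataclasses import dataclass

@dataclass
class MapStatus:
    block_manager_id: str   # which BlockManager holds the output
    block_sizes: tuple      # output block size per reduce partition

def run_shuffle_map_task(block_manager, task_id, partitions):
    # write each partition's bytes into the BlockManager (a dict here) ...
    sizes = tuple(len(p) for p in partitions)
    for i, p in enumerate(partitions):
        block_manager[f"shuffle_{task_id}_{i}"] = p
    # ... and return only the metadata to the scheduler, not the data
    return MapStatus("executor-1", sizes)

bm = {}
status = run_shuffle_map_task(bm, 0, [b"abc", b"defgh"])
```

The next stage's tasks would use the returned locations and sizes to fetch their input blocks.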

 

 

 



Origin blog.csdn.net/hebaojing/article/details/104052360