Spark Kernel Analysis: Node Startup (Part 6)

1. Master node startup

Master is a specific instance of Endpoint. Below we introduce the Master startup process and the work performed after the OnStart instruction.

1.1 Script Overview

Here is an example:

/opt/jdk1.7.0_79/bin/java
-cp /opt/spark-2.1.0/conf/:/opt/spark-2.1.0/jars/*:/opt/hadoop-2.6.4/etc/hadoop/
-Xmx1g
-XX:MaxPermSize=256m
org.apache.spark.deploy.master.Master
--host zqh
--port 7077

1.2 Startup process

The startup process of Master is as follows:
[Figure: Master startup flow]
1) SparkConf: loads system properties whose keys start with spark. (Utils.getSystemProperties)
2) MasterArguments:
a) Parses the Master startup parameters (--ip -i --host -h --port -p --webui-port --properties-file)
b) Stores the spark.-prefixed configuration in --properties-file (defaults to conf/spark-defaults.conf if not specified) into SparkConf
3) The internal processing in NettyRpcEnv follows the unified RpcEndpoint handling and is not detailed here
4) BoundPortsResponse returns the actually bound ports: rpcEndpointPort, webUIPort and restPort
5) Finally the daemon stays alive and waits for the termination message (awaitTermination)
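
To make the flow concrete, here is a minimal, self-contained sketch of steps 1), 2) and 5); MasterArgs, parse and the println are illustrative stand-ins, not the real org.apache.spark.deploy.master.Master code.

// Simplified sketch of the Master startup flow described above.
// MasterArgs and the parsing logic are illustrative, not Spark's actual classes.
object MasterStartupSketch {

  // 1) Load system properties whose key starts with "spark."
  def loadSparkConf(): Map[String, String] =
    sys.props.filterKeys(_.startsWith("spark.")).toMap

  // 2) Parse --host/--port/--webui-port style arguments
  case class MasterArgs(host: String = "localhost",
                        port: Int = 7077,
                        webUiPort: Int = 8080)

  def parse(args: List[String], parsed: MasterArgs = MasterArgs()): MasterArgs = args match {
    case ("--host" | "-h") :: value :: tail => parse(tail, parsed.copy(host = value))
    case ("--port" | "-p") :: value :: tail => parse(tail, parsed.copy(port = value.toInt))
    case "--webui-port" :: value :: tail    => parse(tail, parsed.copy(webUiPort = value.toInt))
    case Nil                                => parsed
    case unknown :: _                       => sys.error(s"Unknown argument: $unknown")
  }

  def main(args: Array[String]): Unit = {
    val conf = loadSparkConf()
    val masterArgs = parse(args.toList)
    // 3)-4) The real Master creates an RpcEnv here, registers the Master endpoint,
    // and a BoundPortsResponse reports the actually bound ports.
    println(s"Master would bind ${masterArgs.host}:${masterArgs.port}, conf keys = ${conf.keys}")
    // 5) The real daemon then blocks in rpcEnv.awaitTermination()
  }
}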

1.3 OnStart listening event

After the Master has started, the following work is performed asynchronously:
[Figure: Master OnStart handling]
1) The [dispatcher-event-loop] thread picks up the OnStart instruction, starts the MasterWebUI (default port 8080), and optionally starts the RestServer (default port 6066) depending on configuration
2) In addition, a new [master-forward-message-thread] thread periodically checks whether Worker heartbeats have timed out
3) If a Worker's heartbeat times out, the Master sends ExecutorUpdated to the Drivers of all tasks running on that Worker, and re-launches the affected Drivers (LaunchDriver)
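
The heartbeat check in 2) and 3) can be modelled roughly as follows; WorkerInfo, the 60-second timeout and the scheduler thread are a simplified sketch, not the Master's actual implementation.

import java.util.concurrent.{Executors, TimeUnit}

// Simplified model of the periodic worker-heartbeat check.
object HeartbeatCheckSketch {
  case class WorkerInfo(id: String, var lastHeartbeat: Long, var alive: Boolean = true)

  val heartbeatTimeoutMs = 60 * 1000L   // spark.worker.timeout defaults to 60s
  val workers = scala.collection.mutable.ArrayBuffer.empty[WorkerInfo]

  def timeOutDeadWorkers(now: Long): Unit =
    workers.filter(w => w.alive && now - w.lastHeartbeat > heartbeatTimeoutMs).foreach { w =>
      w.alive = false
      // The real Master would now send ExecutorUpdated to the affected drivers
      // and re-launch any supervised drivers that were running on this worker.
      println(s"Worker ${w.id} timed out, marking DEAD")
    }

  def main(args: Array[String]): Unit = {
    val scheduler = Executors.newSingleThreadScheduledExecutor()
    // analogous to the [master-forward-message-thread] periodic task
    scheduler.scheduleAtFixedRate(new Runnable {
      override def run(): Unit = timeOutDeadWorkers(System.currentTimeMillis())
    }, 0, heartbeatTimeoutMs / 4, TimeUnit.MILLISECONDS)
  }
}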

1.4 RpcMessage processing (receiveAndReply)

[Figure: Master RpcMessage (receiveAndReply) handling]
1.5 OneWayMessage processing (receive)
[Figure: Master OneWayMessage (receive) handling]

1.6 Master's processing logic for RpcMessage/OneWayMessage

This part is fairly abstract and not essential for an overall understanding of the Master. You may read the later sections first and come back to it afterwards, or skip it entirely.

[Figure: Master message processing logic]

2. Worker node startup

Worker is a specific instance of Endpoint. Below we introduce the Worker startup process and the work performed after the OnStart instruction.

2.1 Script Overview

/opt/jdk1.7.0_79/bin/java
-cp /opt/spark-2.1.0/conf/:/opt/spark-2.1.0/jars/*:/opt/hadoop-2.6.4/etc/hadoop/
-Xmx1g
-XX:MaxPermSize=256m
org.apache.spark.deploy.worker.Worker
--webui-port 8081
spark://master01:7077

2.2 Startup process

The startup process of the Worker is as follows:

[Figure: Worker startup flow]
1) SparkConf: loads system properties whose keys start with spark. (Utils.getSystemProperties)
2) WorkerArguments:
a) Parses the Worker startup parameters (--ip -i --host -h --port -p --cores -c --memory -m --work-dir --webui-port --properties-file)
b) Stores the spark.-prefixed configuration in --properties-file (defaults to conf/spark-defaults.conf if not specified) into SparkConf
c) If not configured, cores defaults to the number of CPU cores of the server
d) If not configured, memory defaults to the server memory minus 1GB; if that is less than 1GB, 1GB is used
e) webUiPort defaults to 8081
3) The internal processing in NettyRpcEnv follows the unified RpcEndpoint handling and is not detailed here
4) Finally the daemon stays alive and waits for the termination message (awaitTermination)
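
The defaults in 2) c) and d) amount to something like the following sketch; inferDefaultCores/inferDefaultMemoryMb are illustrative helpers, and the memory figure here is only a rough stand-in for the machine's physical RAM.

// Sketch of the Worker's default cores/memory derivation described in 2) c) and d).
object WorkerDefaultsSketch {
  def inferDefaultCores(): Int = Runtime.getRuntime.availableProcessors()

  def inferDefaultMemoryMb(): Int = {
    // rough stand-in for the physical RAM the real Worker inspects
    val totalMb = (Runtime.getRuntime.maxMemory() / (1024 * 1024)).toInt
    math.max(totalMb - 1024, 1024)   // leave 1GB for the OS, but never go below 1GB
  }

  def main(args: Array[String]): Unit =
    println(s"cores=${inferDefaultCores()}, memory=${inferDefaultMemoryMb()}MB")
}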

2.3 OnStart listening event

After the Worker has started, the following work is performed asynchronously:
[Figure: Worker OnStart handling]
1) The [dispatcher-event-loop] thread picks up the OnStart instruction and starts the WorkerWebUI (default port 8081)
2) The Worker sends a RegisterWorker instruction to the Master
3) A [master-forward-message-thread] thread is also started to periodically run the ReregisterWithMaster task: if registration has already succeeded (RegisteredWorker), the task is skipped; otherwise the RegisterWorker instruction is sent to the Master again, until the maximum number of attempts is exceeded (16 by default)
4) If the Master can register the Worker, it maintains the corresponding WorkerInfo object and persists it, then sends a RegisteredWorker instruction back to the Worker; if the Master is in standby state, it sends MasterInStandby instead
5) After the Worker receives RegisteredWorker, it schedules the SendHeartbeat task on the [master-forward-message-thread] thread to run periodically, and then reports its latest state with a WorkerLatestState instruction
6) Each Worker heartbeat triggers an update of the corresponding WorkerInfo object on the Master; if the Master detects an anomaly, it sends a ReconnectWorker instruction to the Worker, and the Worker runs the ReregisterWithMaster job again
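
The retry behaviour in 3) can be pictured with this simplified model; the message exchange is reduced to a boolean stand-in, and only the 16-attempt limit mirrors the text above.

// Simplified model of the Worker's ReregisterWithMaster retry loop (step 3 above).
object RegisterRetrySketch {
  val maxRetries = 16
  @volatile var registered = false

  // stand-in for sending RegisterWorker to the Master and waiting for RegisteredWorker
  def tryRegisterWithMaster(): Boolean = scala.util.Random.nextInt(4) == 0

  def reregisterWithMaster(): Unit = {
    var attempts = 0
    while (!registered && attempts < maxRetries) {
      attempts += 1
      registered = tryRegisterWithMaster()
      if (!registered) Thread.sleep(5000)   // the real Worker waits between attempts
    }
    if (!registered) sys.error(s"Failed to register with Master after $maxRetries attempts")
  }

  def main(args: Array[String]): Unit = reregisterWithMaster()
}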

2.4 RpcMessage processing (receiveAndReply)

[Figure: Worker RpcMessage (receiveAndReply) handling]

2.5 OneWayMessage processing (receive)

[Figure: Worker OneWayMessage (receive) handling]

3. Client startup process

Client is a specific instance of Endpoint. Below we introduce the Client startup process and the work performed after the OnStart instruction.

3.1 Script Overview

Here is an example:

/opt/jdk1.7.0_79/bin/java
-cp /opt/spark-2.1.0/conf/:/opt/spark-2.1.0/jars/*:/opt/hadoop-2.6.4/etc/hadoop/
-Xmx1g
-XX:MaxPermSize=256m
org.apache.spark.deploy.SparkSubmit
--master spark://zqh:7077
--class org.apache.spark.examples.SparkPi
../examples/jars/spark-examples_2.11-2.1.0.jar 10

3.2 SparkSubmit startup process

The startup process of SparkSubmit is as follows:

[Figure: SparkSubmit startup flow]
1) SparkSubmitArguments:
a) Parses the parameters for Client startup:
i. --name --master --class --deploy-mode
ii. --num-executors --executor-cores --total-executor-cores --executor-memory
iii. --driver-memory --driver-cores --driver-class-path --driver-java-options --driver-library-path
iv. --properties-file
v. --kill --status --supervise --queue
vi. --files --py-files
vii. --archives --jars --packages --exclude-packages --repositories
viii. --conf (parsed and stored in the Map sparkProperties)
ix. --proxy-user --principal --keytab --help --verbose --version --usage-error
b) Merges the configuration items from --properties-file (defaults to conf/spark-defaults.conf if not specified) that are not already set via --conf into sparkProperties
c) Removes the entries in sparkProperties whose keys do not start with spark.
d) Fills in startup parameters that are still empty from sparkProperties
e) Validates, according to the action (SUBMIT, KILL, REQUEST_STATUS), that each required parameter has a value
2) Case Submit:
a) Determine childMainClass:
i. [--deploy-mode] = client (default): the user task startup class mainClass (--class)
ii. [--deploy-mode] = cluster & [--master] = spark:* & useRest: org.apache.spark.deploy.rest.RestSubmissionClient
iii. [--deploy-mode] = cluster & [--master] = spark:* & !useRest: org.apache.spark.deploy.Client
iv. [--deploy-mode] = cluster & [--master] = yarn: org.apache.spark.deploy.yarn.Client
v. [--deploy-mode] = cluster & [--master] = mesos:*: org.apache.spark.deploy.rest.RestSubmissionClient
b) Determine childArgs (the command-line arguments assembled for the child process):
i. [--deploy-mode] = cluster & [--master] = spark:* & useRest: contains primaryResource and mainClass
ii. [--deploy-mode] = cluster & [--master] = spark:* & !useRest: contains --supervise --memory --cores launch [childArgs], primaryResource, mainClass
iii. [--deploy-mode] = cluster & [--master] = yarn: --class --arg --jar/--primary-py-file/--primary-r-file
iv. [--deploy-mode] = cluster & [--master] = mesos:*: primaryResource
c) Determine childClasspath:
i. [--deploy-mode] = client: the --jars configuration plus the primaryResource information (.../examples/jars/spark-examples_2.11-2.1.0.jar)
d) Determine sysProps:
i. All configuration in sparkProperties is wrapped into a new sysProps object, with some additional configuration items added
e) Load childClasspath through the current class loader
f) Set sysProps into the current JVM environment
g) Finally, invoke childMainClass reflectively, passing childArgs as its arguments
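
Steps 2) a) and 2) g) boil down to a pattern match plus a reflective main() call; the sketch below uses the class names listed above, but SubmitArgs and the surrounding plumbing are illustrative, not the real SparkSubmit.

// Condensed sketch of choosing childMainClass and invoking it reflectively.
object SparkSubmitSketch {
  case class SubmitArgs(deployMode: String, master: String, mainClass: String, useRest: Boolean = true)

  def chooseChildMainClass(a: SubmitArgs): String = (a.deployMode, a.master) match {
    case ("client", _)                                        => a.mainClass
    case ("cluster", m) if m.startsWith("spark") && a.useRest => "org.apache.spark.deploy.rest.RestSubmissionClient"
    case ("cluster", m) if m.startsWith("spark")              => "org.apache.spark.deploy.Client"
    case ("cluster", m) if m.startsWith("yarn")               => "org.apache.spark.deploy.yarn.Client"
    case ("cluster", m) if m.startsWith("mesos")              => "org.apache.spark.deploy.rest.RestSubmissionClient"
    case other                                                => sys.error(s"Unsupported combination: $other")
  }

  // Step 2) g): load the class through the current class loader and call main(childArgs)
  def runMain(childMainClass: String, childArgs: Array[String]): Unit = {
    val clazz = Thread.currentThread().getContextClassLoader.loadClass(childMainClass)
    val mainMethod = clazz.getMethod("main", classOf[Array[String]])
    mainMethod.invoke(null, childArgs)
  }
}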

3.3 Client startup process

The startup process of the Client is as follows:

[Figure: Client startup flow]
1) SparkConf: loads system properties whose keys start with spark. (Utils.getSystemProperties)
2) ClientArguments:
a) Parses the Client startup parameters:
i. --cores -c --memory -m --supervise -s --verbose -v
ii. launch jarUrl master mainClass
iii. kill master driverId
b) Stores the spark.-prefixed configuration in --properties-file (defaults to conf/spark-defaults.conf if not specified) into SparkConf
c) If not configured, cores defaults to 1
d) If not configured, memory defaults to 1GB
e) The internal processing in NettyRpcEnv follows the unified RpcEndpoint handling and is not detailed here
3) Finally the daemon stays alive and waits for the termination message (awaitTermination)

3.4 Client's OnStart listening event

After the Client has started, the following work is performed asynchronously:

[Figure: Client OnStart handling]
1) If this is a launch task (case launch), the Client creates a DriverDescription and sends a RequestSubmitDriver request to the Master
[Figure: DriverDescription]
a) The mainClass in the Command is org.apache.spark.deploy.worker.DriverWrapper
b) The arguments in the Command are Seq("{{WORKER_URL}}", "{{USER_JAR}}", driverArgs.mainClass)
2) After the Master receives the RequestSubmitDriver request, it encapsulates the DriverDescription into a DriverInfo
[Figure: DriverInfo]
a) startTime and submitDate are both set to the current time
b) The driverId format is driver-yyyyMMddHHmmss-nextId, where nextId is globally unique
3) The Master persists the DriverInfo, adds it to the waiting driver list (waitingDrivers), and triggers the common resource-scheduling logic
4) After the Master's resource scheduling has completed, it returns a SubmitDriverResponse to the Client
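
A minimal sketch of the payload described in 1); the case classes are local stand-ins that mirror the fields above, not the real org.apache.spark.deploy classes.

// Simplified model of the RequestSubmitDriver payload built in step 1).
object SubmitDriverSketch {
  case class Command(mainClass: String, arguments: Seq[String], environment: Map[String, String])
  case class DriverDescription(jarUrl: String, mem: Int, cores: Int, supervise: Boolean, command: Command)

  def buildDriverDescription(jarUrl: String, userMainClass: String): DriverDescription = {
    val command = Command(
      mainClass = "org.apache.spark.deploy.worker.DriverWrapper",
      // the worker URL and user jar placeholders are substituted by the Worker at launch time
      arguments = Seq("{{WORKER_URL}}", "{{USER_JAR}}", userMainClass),
      environment = Map.empty)
    DriverDescription(jarUrl, mem = 1024, cores = 1, supervise = false, command)
  }
}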

3.5 RpcMessage processing (receiveAndReply)

3.6 OneWayMessage processing (receive)

[Figure: Client OneWayMessage (receive) handling]

4. Driver and DriverRunner

The Client initiates a RequestSubmitDriver request to the Master, and the Master adds the DriverInfo to the waiting list (waitingDrivers). The following is a further overview of the Driver.

4.1 Master allocates Driver resources

The general flow is as follows:

[Figure: Master driver resource allocation flow]
Resource matching between waitingDrivers and aliveWorkers:
1) For each waitingDriver, poll all aliveWorkers
2) If an aliveWorker satisfies the resource requirements of the current waitingDriver, send the LaunchDriver instruction to that Worker, remove the waitingDriver from waitingDrivers, and move on to the next waitingDriver
3) If after a full round of polling no aliveWorker satisfies the waitingDriver's resource requirements, move on to the next waitingDriver
4) Each round of polling starts at the position immediately after where the previous round ended
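
The polling described above can be sketched as follows; WorkerRes/DriverReq and the println are illustrative, and the real Master sends a LaunchDriver RPC instead.

// Sketch of the waitingDrivers / aliveWorkers matching loop.
object DriverScheduleSketch {
  case class WorkerRes(id: String, var freeCores: Int, var freeMemMb: Int)
  case class DriverReq(id: String, cores: Int, memMb: Int)

  def scheduleDrivers(waitingDrivers: scala.collection.mutable.Buffer[DriverReq],
                      aliveWorkers: IndexedSeq[WorkerRes]): Unit = {
    var curPos = 0   // the polling start point carries over between drivers
    for (driver <- waitingDrivers.toList) {
      var launched = false
      var visited = 0
      while (!launched && visited < aliveWorkers.size) {
        val worker = aliveWorkers(curPos)
        if (worker.freeCores >= driver.cores && worker.freeMemMb >= driver.memMb) {
          // the real Master sends LaunchDriver(driverId, driverDesc) to this worker here
          worker.freeCores -= driver.cores
          worker.freeMemMb -= driver.memMb
          waitingDrivers -= driver
          launched = true
          println(s"LaunchDriver ${driver.id} on ${worker.id}")
        }
        curPos = (curPos + 1) % aliveWorkers.size
        visited += 1
      }
    }
  }
}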

4.2 Worker runs DriverRunner

The Driver startup process is as follows:
[Figure: DriverRunner launch flow]
1) When the Worker receives the LaunchDriver instruction, it creates and starts a DriverRunner
2) The DriverRunner starts a thread [DriverRunner for [driverId]] to handle the Driver startup work
3) The [DriverRunner for [driverId]] thread:
a) Adds a JVM shutdown hook and creates a working directory for the driverId
b) Remotely copies the jar specified by DriverDesc.jarUrl via Netty
c) Builds a locally executable command from the DriverDesc.command template and starts the corresponding Process
d) Redirects the Process output streams to the files stdout/stderr; if the Process fails to start, it is retried with a 1-5 second delay until it starts successfully, after which the Worker node's DriverRunner resources are released
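
A simplified sketch of the thread's work in steps a) to d); the directory layout, retry policy and ProcessBuilder command are stand-ins for what the real DriverRunner does.

import java.io.File

// Simplified model of the DriverRunner thread.
class DriverRunnerSketch(driverId: String, command: Seq[String], workDir: File) {
  def start(): Unit = {
    val thread = new Thread(s"DriverRunner for $driverId") {
      override def run(): Unit = {
        val driverDir = new File(workDir, driverId)   // a) per-driver working directory
        driverDir.mkdirs()
        // b) the real runner downloads the jar at DriverDesc.jarUrl into driverDir here
        var exitCode = -1
        var waitSeconds = 1
        while (exitCode != 0 && waitSeconds <= 5) {   // d) simplified 1-5s retry backoff
          val pb = new ProcessBuilder(command: _*).directory(driverDir)
          pb.redirectOutput(new File(driverDir, "stdout"))   // c)+d) stream output to files
          pb.redirectError(new File(driverDir, "stderr"))
          exitCode = pb.start().waitFor()
          if (exitCode != 0) { Thread.sleep(waitSeconds * 1000L); waitSeconds += 1 }
        }
        // the real runner then reports the final driver state back to the Worker
      }
    }
    thread.start()
  }
}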

4.3 DriverRunner creates and runs DriverWrapper

The execution flow of DriverWrapper is as follows:

[Figure: DriverWrapper execution flow]
1) DriverWrapper creates an RpcEnv and an RpcEndpoint
2) The RpcEndpoint is a WorkerWatcher, whose main purpose is to monitor whether the Worker node is healthy and to exit immediately if an anomaly occurs
3) The current ClassLoader then loads the userJar and executes userMainClass
4) After the user's main method has finished, the workerWatcher is closed
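
A condensed sketch of what such a DriverWrapper main could look like under these assumptions; the RpcEnv/WorkerWatcher setup is only indicated in comments, and the class-loading details are simplified.

import java.io.File
import java.net.URLClassLoader

// Simplified stand-in for org.apache.spark.deploy.worker.DriverWrapper.
object DriverWrapperSketch {
  def main(args: Array[String]): Unit = args.toList match {
    case workerUrl :: userJar :: mainClass :: extraArgs =>
      // 1)-2) the real DriverWrapper creates an RpcEnv and registers a WorkerWatcher
      //       endpoint bound to workerUrl, so the driver exits if its Worker dies
      val loader = new URLClassLoader(Array(new File(userJar).toURI.toURL), getClass.getClassLoader)
      Thread.currentThread().setContextClassLoader(loader)
      // 3) load and run the user's main class
      val clazz = loader.loadClass(mainClass)
      clazz.getMethod("main", classOf[Array[String]]).invoke(null, extraArgs.toArray)
      // 4) the real DriverWrapper then shuts down the WorkerWatcher / RpcEnv
    case _ =>
      System.err.println("Usage: DriverWrapperSketch <workerUrl> <userJar> <mainClass> [args...]")
  }
}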

5. SparkContext analysis

5.1 SparkContext analysis

SparkContext is the user's only entry point to the Spark cluster. Wherever Spark is used, a SparkContext must be created first. So what does SparkContext actually do?
First of all, SparkContext is started in the Driver program; it can be regarded as the connection between the Driver program and the Spark cluster. During initialization, SparkContext creates many objects:
[Figure: Main components created by SparkContext]
The figure above lists the main components that SparkContext constructs when it is first created.

5.2 SparkContext creation process

The creation process is as follows:
[Figure: SparkContext creation flow]
1) When a SparkContext is created, it internally creates a SparkEnv, and the SparkEnv internally creates an RpcEnv
a) Inside the RpcEnv, a MapOutputTrackerMasterEndpoint is created and registered (this Endpoint is not introduced here yet)
2) It then creates the DAGScheduler, TaskSchedulerImpl and SchedulerBackend
a) When TaskSchedulerImpl is created, it creates a SchedulableBuilder; depending on the scheduling mode, the SchedulableBuilder is either a FIFOSchedulableBuilder or a FairSchedulableBuilder
3) Finally TaskSchedulerImpl is started, and TaskSchedulerImpl starts the SchedulerBackend
a) When the SchedulerBackend starts, it creates the ApplicationDescription, DriverEndpoint and StandaloneAppClient
b) The StandaloneAppClient contains a ClientEndpoint internally
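
As a small illustration of 2) a), the choice of SchedulableBuilder is essentially a switch on the scheduling mode; the trait and classes below are simplified stand-ins for Spark's own.

// Minimal sketch: spark.scheduler.mode picks the FIFO or FAIR builder.
trait SchedulableBuilderSketch { def buildPools(): Unit }
class FIFOBuilderSketch extends SchedulableBuilderSketch { def buildPools(): Unit = () }   // no extra pools
class FairBuilderSketch extends SchedulableBuilderSketch { def buildPools(): Unit = () }   // pools from fairscheduler.xml

object SchedulerModeSketch {
  def chooseBuilder(mode: String): SchedulableBuilderSketch = mode.toUpperCase match {
    case "FIFO" => new FIFOBuilderSketch
    case "FAIR" => new FairBuilderSketch
    case other  => throw new IllegalArgumentException(s"Unsupported spark.scheduler.mode: $other")
  }
}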






5.3 SparkContext simple structure and interaction relationships

[Figure: SparkContext components and their interactions]
1) SparkContext: the context in which the user's Spark tasks execute; the user program directly or indirectly creates a SparkContext through the API provided by Spark
2) SparkEnv: the environment information for user execution, including the communication-related endpoints
3) RpcEnv: the remote communication environment inside SparkContext
4) ApplicationDescription: the application description, mainly including appName, maxCores, memoryPerExecutorMB, coresPerExecutor, Command (CoarseGrainedExecutorBackend), appUiUrl, etc.
5) ClientEndpoint: the client endpoint, which sends a RegisterApplication request to the Master after startup
6) Master: after accepting the RegisterApplication request, allocates Worker resources and sends the LaunchExecutor instruction to the allocated resources
7) Worker: after accepting the LaunchExecutor instruction, runs an ExecutorRunner
8) ExecutorRunner: runs the Command of the ApplicationDescription, which eventually starts the Executor and registers the Executor information with the DriverEndpoint

5.4 Master allocates resources to the Application

When the Master receives the Driver's RegisterApplication request, the application is placed in the waitingApps queue, and resource allocation is performed in the same scheduling pass. The allocation process is as follows:

[Figure: Master resource allocation for an Application]
Resource matching between waitingApps and aliveWorkers:
1) If the waitingApp has configured app.desc.coresPerExecutor:
a) Poll all valid allocable workers, allocating one executor at a time with minCoresPerExecutor (app.desc.coresPerExecutor) cores, until there are no valid allocable resources left or all the resources the app requires have been allocated
2) If the waitingApp has not configured app.desc.coresPerExecutor:
a) Poll all valid allocable workers, allocating one executor per worker; the executor's core count grows from minCoresPerExecutor (a fixed value of 1) until there are no valid allocable resources left or all the resources the app requires have been allocated
3) A valid allocable worker is a worker that satisfies one round of resource allocation:
a) cores: usableWorkers(pos).coresFree - assignedCores(pos) >= minCoresPerExecutor
b) memory (if a new Executor is being launched): usableWorkers(pos).memoryFree - assignedExecutors(pos) * memoryPerExecutor >= memoryPerExecutor
Note: when the Master allocates resources for an ApplicationInfo, it allocates whatever valid resources are currently available; the remaining app.coresLeft is allocated in a later pass.
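
The two conditions in 3) can be written out directly; the sketch below mirrors the usableWorkers/assignedCores/assignedExecutors names used above but is a standalone model, not Spark's scheduleExecutorsOnWorkers.

// Sketch of the "valid allocable worker" test: a worker at position pos can receive
// more cores (and possibly a new executor) only if both conditions hold.
object ExecutorAllocSketch {
  case class WorkerState(coresFree: Int, memoryFree: Int)

  def canLaunchExecutor(usableWorkers: Array[WorkerState],
                        assignedCores: Array[Int],
                        assignedExecutors: Array[Int],
                        pos: Int,
                        minCoresPerExecutor: Int,
                        memoryPerExecutor: Int,
                        launchingNewExecutor: Boolean): Boolean = {
    val enoughCores  = usableWorkers(pos).coresFree - assignedCores(pos) >= minCoresPerExecutor
    val enoughMemory = !launchingNewExecutor ||
      usableWorkers(pos).memoryFree - assignedExecutors(pos) * memoryPerExecutor >= memoryPerExecutor
    enoughCores && enoughMemory
  }
}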

5.5 Worker creates the Executor

[Figure: Worker creating an Executor]
(In the figure, the orange components are Endpoint components.)
Worker starts the Executor:
1) Create the application and executor directories under the Worker's tempDir and chmod 700 their permissions
2) Create and start an ExecutorRunner to create the Executor
3) Send the Executor state to the Master
ExecutorRunner:
1) A new thread [ExecutorRunner for [executorId]] reads the ApplicationDescription and converts its Command into a local command
2) It invokes the command and writes the logs to the stdout and stderr files in the executor directory; the Java class behind the Command is CoarseGrainedExecutorBackend
CoarseGrainedExecutorBackend:
1) Creates a SparkEnv, an ExecutorEndpoint (CoarseGrainedExecutorBackend) and a WorkerWatcher
2) After the ExecutorEndpoint is created and started, it sends a RegisterExecutor request to the DriverEndpoint and waits for the reply
3) The DriverEndpoint processes the RegisterExecutor request and returns the registration result to the ExecutorEndpoint
4) If registration succeeds, the ExecutorEndpoint internally creates the Executor processing object
At this point, the container framework for running Spark tasks is complete.

6. Job submission and task splitting

In the previous chapters, Spark's DriverRunner has already started executing the user task class (for example org.apache.spark.examples.SparkPi). Next we analyze the user task class (i.e. the task code).

6.1 Overall preview

[Figure: Job submission and task splitting overview]
1) Code: the code written by the user
2) RDD: resilient distributed dataset; user code converts the Code into RDD data structures through the SparkContext and RDD APIs (the conversion details are introduced below)
3) DAGScheduler: the directed acyclic graph scheduler; it wraps the RDD into a JobSubmitted object and stores it in the EventLoop queue (implementation class DAGSchedulerEventProcessLoop)
4) EventLoop: periodically scans for unprocessed JobSubmitted objects and submits them to the DAGScheduler
5) DAGScheduler: processes the JobSubmitted object, eventually converting the RDD into an executable TaskSet and submitting the TaskSet to the TaskScheduler
6) TaskScheduler: creates a TaskSetManager for the TaskSet, stores it in the SchedulableBuilder's data pool (Pool), and asks the DriverEndpoint to revive consumption (ReviveOffers)
7) DriverEndpoint: after receiving the ReviveOffers instruction, distributes the Tasks in the TaskSet to the Executors according to the relevant rules
8) Executor: starts a TaskRunner to execute a Task

6.2 Code converted into initial RDDs

Our user code calls the Spark API (for example SparkSession.builder.appName("Spark Pi").getOrCreate()), which creates the Spark context (SparkContext). When we call a transformation method (for example parallelize(), map()), a Spark data structure (RDD) is created, or an existing one is wrapped. When an action operation is called (for example reduce()), the last wrapped RDD is submitted as a Job and stored in the scheduling queue (DAGSchedulerEventProcessLoop) to await asynchronous processing.
If action operations are called multiple times, the wrapped RDDs are submitted as multiple Jobs.
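
A minimal user program in the style of the SparkPi example used throughout this article; parallelize and map are transformations that only wrap RDDs, while reduce is the action that actually submits a job.

import org.apache.spark.sql.SparkSession

object SparkPiLike {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("Spark Pi").getOrCreate()
    val sc = spark.sparkContext
    val slices = if (args.length > 0) args(0).toInt else 2
    val n = 100000 * slices
    // parallelize/map are transformations: they only build up the RDD lineage
    val count = sc.parallelize(1 until n, slices).map { _ =>
      val x = math.random * 2 - 1
      val y = math.random * 2 - 1
      if (x * x + y * y <= 1) 1 else 0
    }.reduce(_ + _)   // reduce is an action: it submits a Job through the DAGScheduler
    println(s"Pi is roughly ${4.0 * count / (n - 1)}")
    spark.stop()
  }
}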
The process is as follows:

[Figure: Code-to-RDD conversion flow]
ExecuteEnv (execution environment):
1) This can be the MainClass submitted through spark-submit, or a spark-shell script
2) MainClass: a SparkContext is created or obtained in the code
3) spark-shell: a SparkContext is created by default
RDD (resilient distributed dataset):
1) create: an RDD can be created directly (e.g. sc.parallelize(1 until n, slices)) or read from elsewhere (e.g. sc.textFile("README.md")), etc.
2) transformation: the RDD API can repeatedly wrap existing RDDs into new RDDs; the decorator design pattern is used here, and a partial decorator class diagram follows.

[Figure: Partial RDD decorator class diagram]
3) action: when an RDD action method is called (collect, reduce, lookup, save), it triggers the DAGScheduler's Job submission
4) DAGScheduler: creates a message named JobSubmitted and posts it to the DAGSchedulerEventProcessLoop blocking message queue (LinkedBlockingDeque)
5) DAGSchedulerEventProcessLoop: starts a thread named [dag-scheduler-event-loop] to consume the message queue in real time
6) [dag-scheduler-event-loop]: calls back the JobWaiter after processing is complete
7) DAGScheduler: prints the Job execution result
8) JobSubmitted: the related code is as follows (jobId is the DAGScheduler's globally incremented Id):

eventProcessLoop.post(JobSubmitted(
        jobId, rdd, func2, partitions.toArray, callSite, waiter,
        SerializationUtils.clone(properties)))

[Figure: RDD lineage of the example job]
The RDD from the example above is finally wrapped into four layers, each layer depending on the RDD of the layer above it. The final ShuffleRDD is wrapped into a Job and stored in the DAGSchedulerEventProcessLoop for processing. If our code contains several snippets like the example above, several corresponding ShuffleRDDs are created and stored in the DAGSchedulerEventProcessLoop separately.

6.3 RDD is decomposed into a set of tasks to be executed (TaskSet)

After the Job is submitted, the DAGScheduler parses it into the corresponding Stages based on the RDD lineage, while maintaining the relationship between the Job and its Stages.
The top-level Stage is decomposed into multiple Tasks according to its partitions (findMissingPartitions), and these Tasks are wrapped into a TaskSet and submitted to the TaskScheduler. Non-top-level Stages are stored in a pending list (waitingStages += stage).
The process is as follows:
[Figure: Stage and TaskSet creation flow]
1) In the DAGSchedulerEventProcessLoop, the [dag-scheduler-event-loop] thread processes the JobSubmitted message
2) It calls DAGScheduler.handleJobSubmitted:
a) First, Stages are created in order according to the RDD dependencies; Stages are divided into ShuffleMapStage and ResultStage

[Figure: Stage hierarchy]
b) The Map of the relationship between jobId and StageIds is updated
c) An ActiveJob is created, the LiveListenerBus is called, and the SparkListenerJobStart event is sent
d) The uppermost Stage is found and submitted; lower Stages are stored in waitingStages for later processing
i. OutputCommitCoordinator is called for stageStart() processing
ii. The LiveListenerBus is called and the SparkListenerStageSubmitted event is sent
iii. The broadcast method of SparkContext is called to obtain the Broadcast object

[Figure: Task creation]
Multiple Tasks are then created according to the Stage type; a Stage is split into Tasks according to findMissingPartitions, and the Tasks are either ShuffleMapTask or ResultTask
iv. The Tasks are wrapped into a TaskSet and TaskScheduler.submitTasks(taskSet) is called for Task scheduling; the key code is as follows:

taskScheduler.submitTasks(new TaskSet(
        tasks.toArray, stage.id, stage.latestInfo.attemptId, jobId, properties))

6.4 TaskSet is encapsulated into a TaskSetManager and submitted to the Driver

The TaskScheduler wraps the TaskSet into a TaskSetManager (new TaskSetManager(this, taskSet, maxTaskFailures, blacklistTrackerOpt)), stores it in the pending task pool (Pool), and sends the DriverEndpoint the instruction to revive consumption (ReviveOffers).

[Figure: TaskSet submission to the Driver]
1) The DAGScheduler submits the TaskSet to the implementation class of TaskScheduler, here TaskSchedulerImpl
2) TaskSchedulerImpl creates a TaskSetManager to manage the TaskSet; the key code is as follows:
new TaskSetManager(this, taskSet, maxTaskFailures, blacklistTrackerOpt)
3) It then adds the TaskSetManager to the SchedulableBuilder's task pool (Pool)
4) It calls the implementation class of SchedulerBackend to perform reviveOffers; here it is the standalone-mode implementation StandaloneSchedulerBackend
5) The SchedulerBackend sends the ReviveOffers instruction to the DriverEndpoint

6.5 Driver decomposes the TaskSetManager into TaskDescriptions and publishes tasks to Executors

After the Driver receives the revive-consumption instruction, it matches all pending TaskSetManagers against the Executor resources registered with the Driver. Each TaskSetManager ultimately yields multiple TaskDescription objects, and the LaunchTask instruction is sent to the corresponding Executor according to each TaskDescription.
[Figure: TaskDescription dispatch to Executors]
When the Driver receives the ReviveOffers (revive consumption) instruction:
1) It first obtains the available Executor resource information (WorkerOffer) from the executorDataMap cache; the key code is as follows:

val activeExecutors = executorDataMap.filterKeys(executorIsAlive)
val workOffers = activeExecutors.map {
  case (id, executorData) =>
    new WorkerOffer(id, executorData.executorHost, executorData.freeCores)
}.toIndexedSeq

2) It then calls the TaskScheduler for resource matching; the method is defined as follows:
def resourceOffers(offers: IndexedSeq[WorkerOffer]): Seq[Seq[TaskDescription]] = synchronized {…}
a) The WorkerOffer resources are shuffled (val shuffledOffers = Random.shuffle(offers))
b) The pending TaskSetManagers are taken out of the Pool (val sortedTaskSets = rootPool.getSortedTaskSetQueue)
c) sortedTaskSets is looped over and matched against the shuffledOffers loop; if shuffledOffers(i) has enough CPU resources (if (availableCpus(i) >= CPUS_PER_TASK)), the TaskSetManager is asked to create a TaskDescription object (taskSet.resourceOffer(execId, host, maxLocality)), so that multiple TaskDescriptions are eventually created. TaskDescription is defined as follows:

new TaskDescription(
        taskId,
        attemptNum,
        execId,
        taskName,
        index,
        sched.sc.addedFiles,
        sched.sc.addedJars,
        task.localProperties,
        serializedTask)

3) If taskDescriptions is not empty, it loops over the TaskDescriptions, serializes each TaskDescription object, and sends the LaunchTask instruction to the corresponding ExecutorEndpoint; the key code is as follows:

for (task <- taskDescriptions.flatten) {
  val serializedTask = TaskDescription.encode(task)
  val executorData = executorDataMap(task.executorId)
  executorData.freeCores -= scheduler.CPUS_PER_TASK
  executorData.executorEndpoint.send(LaunchTask(new SerializableBuffer(serializedTask)))
}

7. Task execution and feedback

The DriverEndpoint eventually generates multiple executable TaskDescription objects and sends the LaunchTask instruction to each ExecutorEndpoint. This section focuses on how the ExecutorEndpoint handles the LaunchTask instruction, how it reports back to the DriverEndpoint after processing completes, and how the whole Job is scheduled round after round until it finishes.

7.1 Task execution process

After the Executor receives the LaunchTask instruction, it starts a new TaskRunner thread, which parses the RDD, calls the RDD's compute method, and merges the results with the function to obtain the final task execution result.

[Figure: Task execution flow]
1) After receiving the LaunchTask instruction, the ExecutorEndpoint decodes the TaskDescription and calls the Executor's launchTask method.
The Executor creates a TaskRunner thread, starts it, and adds it to the Executor's member object; the code is as follows:

private val runningTasks = new ConcurrentHashMap[Long, TaskRunner]
runningTasks.put(taskDescription.taskId, taskRunner)

TaskRunner:
1) First sends the task's latest state RUNNING to the DriverEndpoint
2) Parses the Task from the TaskDescription and calls the Task's run method
Task:
1) Creates the TaskContext and CallerContext (the context object for interacting with HDFS)
2) Executes the Task's runTask method:
a) If the Task instance is a ShuffleMapTask: it parses out the RDD and ShuffleDependency information, calls the RDD's compute() method, writes the results through the Writer (the Writer is not introduced here; it can be treated as a black box, for example writing to a file), and returns a MapStatus object
b) If the Task instance is a ResultTask: it parses out the RDD and the merge function, calls the function, and returns its result
The TaskRunner then serializes the result of the Task execution and sends the task's latest state FINISHED to the DriverEndpoint.
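
The TaskRunner flow above can be summarised by this skeleton; TaskLike, StatusReporter and the state strings are illustrative stand-ins for Spark's Task, ExecutorBackend and TaskState.

// Simplified skeleton: report RUNNING, run the task, then report FINISHED (or FAILED).
object TaskRunnerSketch {
  trait TaskLike { def runTask(): Any }   // ShuffleMapTask returns a MapStatus, ResultTask the value
  trait StatusReporter { def statusUpdate(taskId: Long, state: String, result: Array[Byte]): Unit }

  class TaskRunner(taskId: Long, task: TaskLike, reporter: StatusReporter) extends Runnable {
    override def run(): Unit = {
      reporter.statusUpdate(taskId, "RUNNING", Array.emptyByteArray)   // 1) tell the driver we started
      try {
        val value = task.runTask()                                     // 2) compute() + write/merge
        val serialized = value.toString.getBytes("UTF-8")              // stand-in for the real serializer
        reporter.statusUpdate(taskId, "FINISHED", serialized)
      } catch {
        case e: Exception =>
          reporter.statusUpdate(taskId, "FAILED", e.toString.getBytes("UTF-8"))
      }
    }
  }
}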

7.2 Task feedback process

After the TaskRunner finishes executing, it sends the execution state to the DriverEndpoint, and the DriverEndpoint finally feeds the CompletionEvent instruction back to the DAGSchedulerEventProcessLoop.

[Figure: Task result feedback flow]
1) After the DriverEndpoint receives the StatusUpdate message, it calls the TaskScheduler's statusUpdate(taskId, state, result) method
2) If the TaskScheduler sees that the task has completed, it clears the task's bookkeeping state and invokes the relevant TaskResultGetter methods; the key code is as follows:

val taskSet = taskIdToTaskSetManager.get(tid)

taskIdToTaskSetManager.remove(tid)
taskIdToExecutorId.remove(tid).foreach { executorId =>
  executorIdToRunningTaskIds.get(executorId).foreach { _.remove(tid) }
}
taskSet.removeRunningTask(tid)
if (state == TaskState.FINISHED) {
  taskResultGetter.enqueueSuccessfulTask(taskSet, tid, serializedData)
} else if (Set(TaskState.FAILED, TaskState.KILLED, TaskState.LOST).contains(state)) {
  taskResultGetter.enqueueFailedTask(taskSet, tid, state, serializedData)
}

TaskResultGetter starts a [task-result-getter] thread for the related processing:
1) It obtains the Task's TaskResult object, either by deserializing it directly or by fetching it remotely
2) It calls the TaskSet's handleSuccessfulTask method, which directly calls the TaskSetManager's handleSuccessfulTask method
TaskSetManager:
1) Updates the internal TaskInfo object's state and removes the Task from the set of running Tasks; the code is as follows:

val info = taskInfos(tid)
info.markFinished(TaskState.FINISHED, clock.getTimeMillis())
removeRunningTask(tid)

2) Call the taskEnded method of DAGScheduler. The key code is as follows:

sched.dagScheduler.taskEnded(tasks(index), Success, result.value(), result.accumUpdates, info)

DAGScheduler stores the CompletionEvent instruction into DAGSchedulerEventProcessLoop. The CompletionEvent object is defined as follows:

private[scheduler] case class CompletionEvent(
    task: Task[_],
    reason: TaskEndReason,
    result: Any,
    accumUpdates: Seq[AccumulatorV2[_, _]],
    taskInfo: TaskInfo) extends DAGSchedulerEvent

7.3 Task iteration process

For the CompletionEvent instruction in the DAGSchedulerEventProcessLoop, the DAGScheduler is called to process it. The DAGScheduler updates the state of the Stage and its Tasks; if all the Tasks of a Stage have reported back, the task splitting and computation for the next Stage is performed, until the whole Job has finished executing:

[Figure: Task iteration / stage completion flow]
1) After the DAGSchedulerEventProcessLoop receives the CompletionEvent instruction, it calls the DAGScheduler's handleTaskCompletion method
2) The DAGScheduler handles it differently depending on the type of Task
3) If the Task is a ShuffleMapTask:
a) The current partitionId is removed from the set of partitions still waiting to report back
b) If all tasks have reported back, markStageAsFinished(shuffleStage) is called, the MapOutputs information is registered with the MapOutputTrackerMaster, and markMapStageJobAsFinished is called
c) submitWaitingChildStages(shuffleStage) is called to process the lower Stages, so that the iteration finally reaches the ResultTask and the Job ends; the key code is as follows:

private def submitWaitingChildStages(parent: Stage) {
  ...
  val childStages = waitingStages.filter(_.parents.contains(parent)).toArray
  waitingStages --= childStages
  for (stage <- childStages.sortBy(_.firstJobId)) {
    submitStage(stage)
  }
}

4) If the Task is a ResultTask:
a) Update the Job's partitions; if all of them have reported back, markStageAsFinished(resultStage) and cleanupStateForJobAndIndependentStages(job) are called; the key code is as follows:

for (stage <- stageIdToStage.get(stageId)) {
  if (runningStages.contains(stage)) {
    logDebug("Removing running stage %d".format(stageId))
    runningStages -= stage
  }
  for ((k, v) <- shuffleIdToMapStage.find(_._2 == stage)) {
    shuffleIdToMapStage.remove(k)
  }
  if (waitingStages.contains(stage)) {
    logDebug("Removing stage %d from waiting set.".format(stageId))
    waitingStages -= stage
  }
  if (failedStages.contains(stage)) {
    logDebug("Removing stage %d from failed set.".format(stageId))
    failedStages -= stage
  }
}
// data structures based on StageId
stageIdToStage -= stageId
jobIdToStageIds -= job.jobId
jobIdToActiveJob -= job.jobId
activeJobs -= job

At this point, the code written by the user has finally been executed as a Spark distributed computation.

Source: blog.csdn.net/qq_44696532/article/details/135390525