Analysis of How a Spark Task Executes

1. The execution process of a Task

    1. After the executor receives a LaunchTask request, it wraps the task in a TaskRunner. The TaskRunner copies the required resources and initializes the relevant environment, and then runs the task from its run() method (TaskRunner implements Runnable, so it has a run() method). The run() method processes the task as follows:

override def run(): Unit = {
      threadId = Thread.currentThread.getId
      Thread.currentThread.setName(threadName)
      val threadMXBean = ManagementFactory.getThreadMXBean
      val taskMemoryManager = new TaskMemoryManager(env.memoryManager, taskId)
      val deserializeStartTime = System.currentTimeMillis()
      val deserializeStartCpuTime = if (threadMXBean.isCurrentThreadCpuTimeSupported) {
        threadMXBean.getCurrentThreadCpuTime
      } else 0L
      Thread.currentThread.setContextClassLoader(replClassLoader)
      val ser = env.closureSerializer.newInstance()
      logInfo(s"Running $taskName (TID $taskId)")
      execBackend.statusUpdate(taskId, TaskState.RUNNING, EMPTY_BYTE_BUFFER)
      var taskStart: Long = 0
      var taskStartCpu: Long = 0
      startGCTime = computeTotalGcTime()

      try {
        // Must be set before updateDependencies() is called, in case fetching dependencies
        // requires access to properties contained within (e.g. for access control).
        // Make the task's properties available (through a thread-local) for later use
        Executor.taskDeserializationProps.set(taskDescription.properties)

        // Fetch the files and JAR packages that the task depends on
        updateDependencies(taskDescription.addedFiles, taskDescription.addedJars)
        
        // Deserialize the task itself from its serialized bytes. The context class loader is
        // passed in because a class loader can load classes dynamically and read
        // context-specific resources, such as classes from the JARs fetched above.
        task = ser.deserialize[Task[Any]](
          taskDescription.serializedTask, Thread.currentThread.getContextClassLoader)
        task.localProperties = taskDescription.properties
        task.setTaskMemoryManager(taskMemoryManager)

        // If this task has been killed before we deserialized it, let's quit now. Otherwise,
        // continue executing the task.
        val killReason = reasonIfKilled
        if (killReason.isDefined) {
          // Throw an exception rather than returning, because returning within a try{} block
          // causes a NonLocalReturnControl exception to be thrown. The NonLocalReturnControl
          // exception will be caught by the catch block, leading to an incorrect ExceptionFailure
          // for the task.
          throw new TaskKilledException(killReason.get)
        }

        // The purpose of updating the epoch here is to invalidate executor map output status cache
        // in case FetchFailures have occurred. In local mode `env.mapOutputTracker` will be
        // MapOutputTrackerMaster and its cache invalidation is not based on epoch numbers so
        // we don't need to make any special calls here.
        if (!isLocal) {
          logDebug("Task " + taskId + "'s epoch is " + task.epoch)
          env.mapOutputTracker.asInstanceOf[MapOutputTrackerWorker].updateEpoch(task.epoch)
        }

        // Run the actual task and measure its runtime.
        // Record the wall-clock time at which the task starts running.
        taskStart = System.currentTimeMillis()
        taskStartCpu = if (threadMXBean.isCurrentThreadCpuTimeSupported) {
          threadMXBean.getCurrentThreadCpuTime
        } else 0L
        var threwException = true
        // For a ShuffleMapTask, value describes where the task wrote its output data
        val value = try {
          // Call the task's run method and keep the result it returns
          val res = task.run(
            taskAttemptId = taskId,
            attemptNumber = taskDescription.attemptNumber,
            metricsSystem = env.metricsSystem)
          threwException = false
          res
        }
        // ... (the remainder of run() is omitted from this excerpt)
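The wrapping step described above can be illustrated with a minimal sketch. It is not Spark's TaskRunner: `SketchTaskRunner`, `StatusLog` and the three state objects are hypothetical names invented here, and the status log merely stands in for the `execBackend.statusUpdate` calls. It only shows the general idea of wrapping a task body in a Runnable, handing it to a thread pool, and reporting a state before and after the body runs.

```scala
import java.util.concurrent.{Executors, TimeUnit}
import scala.collection.mutable.ArrayBuffer
import scala.util.control.NonFatal

object TaskRunnerSketch {
  // Hypothetical stand-ins for task states (not Spark's TaskState).
  sealed trait State
  case object Running extends State
  case object Finished extends State
  case object Failed extends State

  // Records status updates in arrival order; stands in for a backend's statusUpdate.
  final class StatusLog {
    private val buf = ArrayBuffer.empty[(Long, State)]
    def update(taskId: Long, state: State): Unit = synchronized { buf += ((taskId, state)) }
    def snapshot: List[(Long, State)] = synchronized { buf.toList }
  }

  // Wraps one task body so that a thread pool can execute it.
  final class SketchTaskRunner(taskId: Long, body: () => Any, log: StatusLog) extends Runnable {
    @volatile var result: Option[Any] = None

    override def run(): Unit = {
      log.update(taskId, Running)
      try {
        result = Some(body())
        log.update(taskId, Finished)
      } catch {
        case NonFatal(_) => log.update(taskId, Failed)
      }
    }
  }

  // Submits one runner per body to a small pool and waits for all of them.
  def runAll(bodies: Seq[() => Any]): (Seq[SketchTaskRunner], StatusLog) = {
    val log = new StatusLog
    val pool = Executors.newFixedThreadPool(2)
    val runners = bodies.zipWithIndex.map { case (b, i) => new SketchTaskRunner(i.toLong, b, log) }
    runners.foreach(r => pool.execute(r))
    pool.shutdown()
    pool.awaitTermination(10, TimeUnit.SECONDS)
    (runners, log)
  }
}
```

Calling `TaskRunnerSketch.runAll(Seq(() => 1 + 1))` returns the runners together with the log of state changes, so the caller can inspect each runner's `result`.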

    2. Task.run() calls the abstract runTask() method. The subclass implementation calls the RDD's iterator() method, which executes the operators we defined on the RDD partition that corresponds to the task

final def run(
      taskAttemptId: Long,
      attemptNumber: Int,
      metricsSystem: MetricsSystem): T = {
      .......
      .......
     try {
      // Call the abstract method
      runTask(context)
    }
   .........
  }
// The abstract method is declared as follows
  def runTask(context: TaskContext): T
// runTask() is implemented by the subclasses of Task.
// Here we take ShuffleMapTask as an example.
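The relationship between run() and runTask() shown above is the template-method pattern: a final method fixes the skeleton and delegates the real work to an abstract method. The following is a minimal sketch of that pattern only. `SketchTask`, `SketchContext` and `SumTask` are hypothetical names, and the real Task.run() does considerably more setup and cleanup than this.

```scala
object TemplateSketch {
  // Hypothetical, much reduced stand-in for TaskContext.
  final class SketchContext(val taskAttemptId: Long, val attemptNumber: Int) {
    @volatile var completed: Boolean = false
  }

  abstract class SketchTask[T] {
    // Fixed skeleton: build the context, delegate to runTask, always mark completion.
    final def run(taskAttemptId: Long, attemptNumber: Int): (T, SketchContext) = {
      val context = new SketchContext(taskAttemptId, attemptNumber)
      try {
        (runTask(context), context)
      } finally {
        context.completed = true
      }
    }

    // Each concrete task type supplies its own logic here.
    def runTask(context: SketchContext): T
  }

  // A trivial subclass, only to show where the task-specific logic lives.
  final class SumTask(data: Seq[Int]) extends SketchTask[Int] {
    override def runTask(context: SketchContext): Int = data.sum
  }
}
```

Because run() is final, a subclass can change what a task computes but not the steps that surround the computation.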

    (1) compute() applies the function we defined (the operator) to the data of one partition

override def compute(split: Partition, context: TaskContext): Iterator[U] =
    // f is the function we defined (the operator). It is applied to the parent RDD's
    // iterator for this partition and returns the partition data of the new RDD.
    f(context, split.index, firstParent[T].iterator(split, context))
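To make the chaining in compute() concrete, here is a minimal sketch under simplifying assumptions. `MiniRDD`, `SourceRDD` and `MappedRDD` are hypothetical classes, partitions are identified by a plain Int, and iterator() only delegates to compute(), whereas Spark's RDD.iterator() also consults the cache and checkpoints. The point is that each derived dataset pulls from its parent's iterator, so a chain of operators becomes one nested, lazily evaluated iterator per partition.

```scala
object ComputeSketch {
  // Hypothetical, heavily simplified stand-in for an RDD.
  abstract class MiniRDD[T] {
    def compute(partition: Int): Iterator[T]

    // This sketch only delegates; it does no caching or checkpointing.
    final def iterator(partition: Int): Iterator[T] = compute(partition)

    // Builds a child dataset whose compute() applies f to this dataset's iterator.
    def mapPartitions[U](f: (Int, Iterator[T]) => Iterator[U]): MiniRDD[U] =
      new MappedRDD(this, f)
  }

  // Leaf dataset backed by in-memory partitions.
  final class SourceRDD[T](partitions: Vector[Vector[T]]) extends MiniRDD[T] {
    override def compute(partition: Int): Iterator[T] = partitions(partition).iterator
  }

  // Mirrors the shape of the compute() excerpt: f(index, parent.iterator(...)).
  final class MappedRDD[T, U](parent: MiniRDD[T], f: (Int, Iterator[T]) => Iterator[U])
      extends MiniRDD[U] {
    override def compute(partition: Int): Iterator[U] = f(partition, parent.iterator(partition))
  }
}
```

For example, `new ComputeSketch.SourceRDD(Vector(Vector(1, 2, 3))).mapPartitions((_, it) => it.map(_ * 2)).iterator(0)` produces the doubled elements only when the returned iterator is consumed.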

    (2) The computed result is written to local disk files by the ShuffleWriter obtained from the ShuffleManager

override def runTask(context: TaskContext): MapStatus = {
  ..........
  ..........
   try {
      // Get the ShuffleManager and ask it for a writer for this map partition
      val manager = SparkEnv.get.shuffleManager
      writer = manager.getWriter[Any, Any](dep.shuffleHandle, partitionId, context)
      // Run the RDD's operators on this partition through rdd.iterator() and
      // write each resulting record to the partition it belongs to
      writer.write(rdd.iterator(partition, context).asInstanceOf[Iterator[_ <: Product2[Any, Any]]])
      // stop() returns a MapStatus, which describes the computed output,
      // including the relevant BlockManager information
      writer.stop(success = true).get
    }
 ..........
 .........
}
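The write step can be sketched as follows, with loud caveats. `SketchWriter` and `SketchMapStatus` are hypothetical, the sketch keeps records in memory instead of writing disk files, it assigns records to reducers by a simple key hash, and it reports record counts where a real MapStatus reports sizes. It only illustrates the shape of the protocol: write() consumes the partition's iterator and buckets each record, and stop(success = true) returns a summary of what was written.

```scala
import scala.collection.mutable.ArrayBuffer

object ShuffleWriteSketch {
  // Hypothetical stand-in for MapStatus: where the output lives and how much
  // was written for each reduce partition (record counts in this sketch).
  final case class SketchMapStatus(location: String, recordsPerReducer: Vector[Int])

  final class SketchWriter[K, V](location: String, numReducers: Int) {
    private val buckets = Vector.fill(numReducers)(ArrayBuffer.empty[(K, V)])

    // Consumes the iterator and places each record in the bucket chosen by its key's hash.
    def write(records: Iterator[Product2[K, V]]): Unit =
      records.foreach { r =>
        val h = r._1.hashCode % numReducers
        val bucket = if (h < 0) h + numReducers else h // keep the index non-negative
        buckets(bucket) += ((r._1, r._2))
      }

    // Returns the summary only when the write is reported as successful.
    def stop(success: Boolean): Option[SketchMapStatus] =
      if (success) Some(SketchMapStatus(location, buckets.map(_.size))) else None

    // Exposed only so the bucketing can be inspected.
    def bucket(i: Int): List[(K, V)] = buckets(i).toList
  }
}
```

A caller would create one writer per map task, pass it the partition's iterator, and forward the status returned by stop() to whatever component tracks map outputs.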

    (3) The MapStatus is sent back to the DAGScheduler. The DAGScheduler collects the MapStatus results and passes them to the MapOutputTracker, and the shuffle output is finally processed by a ResultTask
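The bookkeeping in step (3) can be pictured with a minimal sketch. `SketchMapOutputTracker`, its two methods and `SketchMapStatus` are hypothetical and are not Spark's MapOutputTracker API. The sketch assumes each map task reports one status holding a location and one size per reduce partition, and that a downstream task asks where the pieces for its own reduce partition are.

```scala
import scala.collection.mutable

object MapOutputSketch {
  // Hypothetical stand-in for MapStatus: one location plus one size per reduce partition.
  final case class SketchMapStatus(location: String, sizes: Vector[Long])

  final class SketchMapOutputTracker {
    // shuffleId -> (mapId -> status reported by that map task)
    private val outputs = mutable.Map.empty[Int, Map[Int, SketchMapStatus]]

    // Called once per finished map task with the status it produced.
    def registerMapOutput(shuffleId: Int, mapId: Int, status: SketchMapStatus): Unit =
      synchronized {
        val current = outputs.getOrElse(shuffleId, Map.empty[Int, SketchMapStatus])
        outputs(shuffleId) = current + (mapId -> status)
      }

    // For one reduce partition, lists (location, size) of every map output, ordered by mapId.
    def locationsFor(shuffleId: Int, reduceId: Int): Seq[(String, Long)] =
      synchronized {
        outputs
          .getOrElse(shuffleId, Map.empty[Int, SketchMapStatus])
          .toSeq
          .sortBy(_._1)
          .map { case (_, status) => (status.location, status.sizes(reduceId)) }
      }
  }
}
```

A downstream task for reduce partition r would call `locationsFor(shuffleId, r)` and then fetch each listed piece from the location that holds it.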

2. Schematic diagram of execution

        

    
