Spark Core (15): Principles and Source Code Analysis of Task Execution on the Executor (Part 1)

Copyright notice: this is an original article by the author and may not be reproduced without the author's permission. https://blog.csdn.net/Suubyy/article/details/81986566
  1. Preliminaries before the Executor runs a Task
    1. Before we look at how the Executor runs Tasks, let's first look at an important class: CoarseGrainedExecutorBackend.
    2. When this process is created, its onStart method is invoked.
    3. It is the coarse-grained ExecutorBackend process.
    4. It is responsible for sending the Executor registration request to the Driver.
    5. It is a communication endpoint and can exchange messages with the Driver.
    6. It is the process the Executor lives in; the Executor is the object that actually processes Tasks, and it processes them with a thread pool.
    7. It receives the Executor registration response returned by the Driver and then creates the Executor.
    8. It receives the LaunchTask message sent from the Driver (on behalf of the TaskScheduler) and kicks off launching and computing the Task.
  2. How the Executor runs a Task: the big picture
    1. When CoarseGrainedExecutorBackend receives the RegisteredExecutor message from the Driver, it creates the Executor.
    2. When it later receives a LaunchTask message from the Driver, it starts executing the Task: it first deserializes the TaskDescription that was sent over, then calls launchTask to hand the Task to the Executor.
    3. launchTask creates a TaskRunner (TaskRunner implements the Runnable interface), puts the TaskRunner into the in-memory cache of running tasks, and submits it to the thread pool, whose execute method starts running the Task.
  3. Source code analysis of Task execution on the Executor

    1. The onStart method of CoarseGrainedExecutorBackend: it is executed when CoarseGrainedExecutorBackend is created, and it registers the Executor with the Driver.

      override def onStart() {
          logInfo("Connecting to driver: " + driverUrl)
          rpcEnv.asyncSetupEndpointRefByURI(driverUrl).flatMap { ref =>
            // This is a very fast action so we can use "ThreadUtils.sameThread"
            driver = Some(ref)
            // Send the Executor registration request to the Driver
            ref.ask[RegisterExecutorResponse](
              RegisterExecutor(executorId, self, hostPort, cores, extractLogUrls))
          }(ThreadUtils.sameThread).onComplete {
            // This is a very fast action so we can use "ThreadUtils.sameThread"
            case Success(msg) => Utils.tryLogNonFatalError {
              Option(self).foreach(_.send(msg)) // msg must be RegisterExecutorResponse
            }
            case Failure(e) => {
              logError(s"Cannot register with driver: $driverUrl", e)
              System.exit(1)
            }
          }(ThreadUtils.sameThread)
      }
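
    The registration handshake above rides on a handful of message case classes defined in CoarseGrainedClusterMessages. The sketch below shows their rough shapes only; the exact field lists vary between Spark versions, so treat it as an assumption-laden simplification rather than the literal source.

      // Approximate shapes of the registration messages (simplified; fields differ across versions)
      case class RegisterExecutor(
          executorId: String,
          executorRef: RpcEndpointRef,          // the backend's own endpoint reference (self)
          hostPort: String,
          cores: Int,
          logUrls: Map[String, String])

      case class RegisteredExecutor(hostname: String)        // success reply from the Driver
      case class RegisterExecutorFailed(message: String)     // failure reply from the Driver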
    2. The receive method of CoarseGrainedExecutorBackend: it handles the various messages sent to this endpoint.

      override def receive: PartialFunction[Any, Unit] = {
          // The Driver replies that the Executor registered successfully, so create the Executor object.
          case RegisteredExecutor(hostname) =>
            logInfo("Successfully registered with driver")
            executor = new Executor(executorId, hostname, env, userClassPath, isLocal = false)
      
          // The Driver replies that Executor registration failed, so exit the process.
          case RegisterExecutorFailed(message) =>
            logError("Slave registration failed: " + message)
            System.exit(1)
      
          // The LaunchTask message from the Driver asks this Executor to start running a Task
          case LaunchTask(data) =>
            if (executor == null) {
              logError("Received LaunchTask command but executor was null")
              System.exit(1)
            } else {
              // First deserialize the TaskDescription that was sent over
              val taskDesc = ser.deserialize[TaskDescription](data.value)
              logInfo("Got assigned task " + taskDesc.taskId)
              // Call the executor's launchTask method to start running the Task.
              // this: the ExecutorBackend, taskId: the Task's id, attemptNumber: which attempt this is,
              // taskDesc.name: the Task's name, taskDesc.serializedTask: the serialized Task bytes
              executor.launchTask(this, taskId = taskDesc.taskId, attemptNumber = taskDesc.attemptNumber,
                taskDesc.name, taskDesc.serializedTask)
            }
      
          case KillTask(taskId, _, interruptThread) =>
            if (executor == null) {
              logError("Received KillTask command but executor was null")
              System.exit(1)
            } else {
              executor.killTask(taskId, interruptThread)
            }
      
          case StopExecutor =>
            logInfo("Driver commanded a shutdown")
            // Cannot shutdown here because an ack may need to be sent back to the caller. So send
            // a message to self to actually do the shutdown.
            self.send(Shutdown)
      
          case Shutdown =>
            executor.stop()
            stop()
            rpcEnv.shutdown()
        }
    3. The launchTask method of Executor: for each Task it creates a TaskRunner, puts the TaskRunner into the in-memory cache of running tasks, and then submits it to the thread pool, where it waits for a worker thread to run it.

      def launchTask(
            context: ExecutorBackend,
            taskId: Long,
            attemptNumber: Int,
            taskName: String,
            serializedTask: ByteBuffer): Unit = {
          // Create a TaskRunner for each Task; TaskRunner implements Java's Runnable interface
          val tr = new TaskRunner(context, taskId = taskId, attemptNumber = attemptNumber, taskName,
            serializedTask)
          // Put the TaskRunner into the in-memory cache of running tasks
          runningTasks.put(taskId, tr)
      
          // The Executor holds a Java thread pool; the Task, wrapped in its TaskRunner, is submitted
          // directly to the pool. If no thread is currently free, it waits until one becomes available.
          threadPool.execute(tr)
      }
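
    The threadPool and runningTasks used above are fields of Executor. In the actual source the pool is created through a Spark helper (roughly ThreadUtils.newDaemonCachedThreadPool), but conceptually it is just a daemon cached thread pool plus a concurrent map keyed by taskId. A minimal sketch with plain java.util.concurrent APIs, assuming that simplified shape:

      import java.util.concurrent.{ConcurrentHashMap, Executors, ThreadFactory}

      // Cache of tasks currently running on this Executor, keyed by taskId
      private val runningTasks = new ConcurrentHashMap[Long, TaskRunner]

      // A cached thread pool whose worker threads are daemons, so they never block JVM shutdown
      private val threadPool = Executors.newCachedThreadPool(new ThreadFactory {
        override def newThread(r: Runnable): Thread = {
          val t = new Thread(r, "Executor task launch worker")
          t.setDaemon(true)
          t
        }
      })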
    4. TaskRunner implements the Runnable interface, and all the logic for executing a Task lives in its run method. Each incoming Task gets its own TaskRunner, which is submitted to the thread pool and executed there on a worker thread. Below is an annotated walk-through of the run method's source.

      override def run(): Unit = {
      
            // Allocate a memory manager for this Task
            val taskMemoryManager = new TaskMemoryManager(env.memoryManager, taskId)
            // Record when deserialization starts
            val deserializeStartTime = System.currentTimeMillis()
            Thread.currentThread.setContextClassLoader(replClassLoader)
            // Create a serializer instance used to deserialize the Task data
            val ser = env.closureSerializer.newInstance()
            logInfo(s"Running $taskName (TID $taskId)")
      
            // Report the Task's current state (RUNNING) back to the Driver
            execBackend.statusUpdate(taskId, TaskState.RUNNING, EMPTY_BYTE_BUFFER)
            var taskStart: Long = 0
            startGCTime = computeTotalGcTime()
      
            try {
              // Unpack the serialized Task data into its file/jar dependencies and the task bytes
              val (taskFiles, taskJars, taskBytes) = Task.deserializeWithDependencies(serializedTask)
              // Fetch the files, resources and jars the Task depends on over the network, e.g. Hadoop configuration files
              updateDependencies(taskFiles, taskJars)
              // Deserialize the Task object itself
              // The context class loader matters here: it dynamically loads the Task's class via reflection so an instance can be created
              task = ser.deserialize[Task[Any]](taskBytes, Thread.currentThread.getContextClassLoader)
              task.setTaskMemoryManager(taskMemoryManager)
      
              // If the Task was already killed before it got this far, bail out immediately; otherwise keep going
              if (killed) {
                throw new TaskKilledException
              }
      
              logDebug("Task " + taskId + "'s epoch is " + task.epoch)
              env.mapOutputTracker.updateEpoch(task.epoch)
      
              // Record the Task's start time
              taskStart = System.currentTimeMillis()
              var threwException = true
              // For a ShuffleMapTask, value is a MapStatus: the shuffle output is persisted to shuffle files,
              // and MapStatus wraps the location of those files plus the sizes of the outputs. It is serialized
              // further down and handed back through this Executor's CoarseGrainedExecutorBackend to the Driver.
              val (value, accumUpdates) = try {
                // The core call that actually runs the Task; its source is analyzed further below
                val res = task.run(
                  taskAttemptId = taskId,
                  attemptNumber = attemptNumber,
                  metricsSystem = env.metricsSystem)
                threwException = false
                res
              } finally {
                // Whether the Task succeeded or failed, release the memory it allocated
                val freedMemory = taskMemoryManager.cleanUpAllAllocatedMemory()
                // Detect managed-memory leaks; depending on configuration, either throw or just log an error
                if (freedMemory > 0) {
                  val errMsg = s"Managed memory leak detected; size = $freedMemory bytes, TID = $taskId"
                  if (conf.getBoolean("spark.unsafe.exceptionOnMemoryLeak", false) && !threwException) {
                    throw new SparkException(errMsg)
                  } else {
                    logError(errMsg)
                  }
                }
              }
      
              // Record the Task's finish time
              val taskFinish = System.currentTimeMillis()
      
              // If the task has been killed, let's fail it.
              if (task.killed) {
                throw new TaskKilledException
              }
      
              // Create a serializer for the Task's result
              val resultSer = env.serializer.newInstance()
              // Record when result serialization starts
              val beforeSerialization = System.currentTimeMillis()
              // Serialize the Task's result, since it will be sent back to the Driver
              val valueBytes = resultSer.serialize(value)
              // Record when result serialization finishes
              val afterSerialization = System.currentTimeMillis()
      
              // Set the Task's runtime metrics; these show up in the Spark UI
              for (m <- task.metrics) {
      
                m.setExecutorDeserializeTime(
                  (taskStart - deserializeStartTime) + task.executorDeserializeTime)
                m.setExecutorRunTime((taskFinish - taskStart) - task.executorDeserializeTime)
                m.setJvmGCTime(computeTotalGcTime() - startGCTime)
                m.setResultSerializationTime(afterSerialization - beforeSerialization)
                m.updateAccumulators()
              }
              // A DirectTaskResult bundling the result bytes, accumulator updates and metrics
              val directResult = new DirectTaskResult(valueBytes, accumUpdates, task.metrics.orNull)
              // Serialize the TaskResult
              val serializedDirectResult = ser.serialize(directResult)
              // Compute the size of the serialized TaskResult
              val resultSize = serializedDirectResult.limit
      
              // directSend = sending directly back to the driver
              val serializedResult: ByteBuffer = {
                // If the serialized result exceeds the maximum allowed size (configurable, 1 GB by default), drop it and only send back its metadata
      
                if (maxResultSize > 0 && resultSize > maxResultSize) {
                  logWarning(s"Finished $taskName (TID $taskId). Result is larger than maxResultSize " +
                    s"(${Utils.bytesToString(resultSize)} > ${Utils.bytesToString(maxResultSize)}), " +
                    s"dropping it.")
                  ser.serialize(new IndirectTaskResult[Any](TaskResultBlockId(taskId), resultSize))
                // If the result exceeds the RPC frame-size threshold but not the maximum allowed size,
                // don't send it directly; store it in the BlockManager and let the Driver fetch it from there
                } else if (resultSize >= akkaFrameSize - AkkaUtils.reservedSizeBytes) {
                  val blockId = TaskResultBlockId(taskId)
                  env.blockManager.putBytes(
                    blockId, serializedDirectResult, StorageLevel.MEMORY_AND_DISK_SER)
                  logInfo(
                    s"Finished $taskName (TID $taskId). $resultSize bytes result sent via BlockManager)")
                  ser.serialize(new IndirectTaskResult[Any](blockId, resultSize))
                // Otherwise the result is small enough to be sent back to the Driver directly
                } else {
                  logInfo(s"Finished $taskName (TID $taskId). $resultSize bytes result sent to driver")
                  serializedDirectResult
                }
              }
            // Hand the Task's final state and serialized result to the CoarseGrainedExecutorBackend (execBackend),
            // which forwards them to the Driver.
              execBackend.statusUpdate(taskId, TaskState.FINISHED, serializedResult)
      
            // Below are the exception handlers; different kinds of failures
            // are handled and reported back to the Driver in different ways
            } catch {
              case ffe: FetchFailedException =>
                val reason = ffe.toTaskEndReason
                execBackend.statusUpdate(taskId, TaskState.FAILED, ser.serialize(reason))
      
              case _: TaskKilledException | _: InterruptedException if task.killed =>
                logInfo(s"Executor killed $taskName (TID $taskId)")
                execBackend.statusUpdate(taskId, TaskState.KILLED, ser.serialize(TaskKilled))
      
              case cDE: CommitDeniedException =>
                val reason = cDE.toTaskEndReason
                execBackend.statusUpdate(taskId, TaskState.FAILED, ser.serialize(reason))
      
              case t: Throwable =>
                // Attempt to exit cleanly by informing the driver of our failure.
                // If anything goes wrong (or this was a fatal exception), we will delegate to
                // the default uncaught exception handler, which will terminate the Executor.
                logError(s"Exception in $taskName (TID $taskId)", t)
      
                val metrics: Option[TaskMetrics] = Option(task).flatMap { task =>
                  task.metrics.map { m =>
                    m.setExecutorRunTime(System.currentTimeMillis() - taskStart)
                    m.setJvmGCTime(computeTotalGcTime() - startGCTime)
                    m.updateAccumulators()
                    m
                  }
                }
                val serializedTaskEndReason = {
                  try {
                    ser.serialize(new ExceptionFailure(t, metrics))
                  } catch {
                    case _: NotSerializableException =>
                      // t is not serializable so just send the stacktrace
                      ser.serialize(new ExceptionFailure(t, metrics, false))
                  }
                }
                execBackend.statusUpdate(taskId, TaskState.FAILED, serializedTaskEndReason)
      
                // Don't forcibly exit unless the exception was inherently fatal, to avoid
                // stopping other tasks unnecessarily.
                if (Utils.isFatalError(t)) {
                  SparkUncaughtExceptionHandler.uncaughtException(t)
                }
      
            } finally {
              // Once the Task has finished, remove it from the runningTasks cache
              runningTasks.remove(taskId)
            }
          }
        }
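
    The execBackend.statusUpdate calls above (RUNNING, FINISHED, FAILED, KILLED) are what carry the Task's state and serialized result back to the Driver. In CoarseGrainedExecutorBackend this method is essentially a thin wrapper that packs its arguments into a StatusUpdate message and sends it to the Driver endpoint; the sketch below is reconstructed from memory and may differ slightly from this exact Spark version.

      // Simplified sketch of CoarseGrainedExecutorBackend.statusUpdate (not guaranteed to match this version verbatim)
      override def statusUpdate(taskId: Long, state: TaskState, data: ByteBuffer): Unit = {
        // Bundle the executor id, task id, state and serialized result into one message
        val msg = StatusUpdate(executorId, taskId, state, data)
        driver match {
          case Some(driverRef) => driverRef.send(msg)   // forward to the Driver over RPC
          case None => logWarning(s"Drop $msg because the driver is not yet connected")
        }
      }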
    5. The updateDependencies method of Executor: it fetches, over the network, the files, resources, and jars the Task depends on, for example Hadoop configuration files.

       private def updateDependencies(newFiles: HashMap[String, Long], newJars: HashMap[String, Long]) {
      
          // Lazily build the Hadoop configuration
          lazy val hadoopConf = SparkHadoopUtil.get.newConfiguration(conf)
      
          // Synchronized block: the CoarseGrainedExecutorBackend process runs multiple threads executing
          // different Tasks, and they all read and update the same file/jar caches, so this block is
          // synchronized to avoid race conditions
          synchronized {
            // Iterate over the files that need to be fetched
            for ((name, timestamp) <- newFiles if currentFiles.getOrElse(name, -1L) < timestamp) {
              logInfo("Fetching " + name + " with timestamp " + timestamp)
              // Fetch file with useCache mode, close cache for local mode.
              // Fetch the dependency file over the network via Utils.fetchFile
              Utils.fetchFile(name, new File(SparkFiles.getRootDirectory()), conf,
                env.securityManager, hadoopConf, timestamp, useCache = !isLocal)
      
              currentFiles(name) = timestamp
            }
            // Iterate over the jars that need to be fetched
            for ((name, timestamp) <- newJars) {
      
              val localName = name.split("/").last
              val currentTimeStamp = currentJars.get(name)
                .orElse(currentJars.get(localName))
                .getOrElse(-1L)
              // Only fetch if our cached copy of the jar is older than the new timestamp
              if (currentTimeStamp < timestamp) {
                logInfo("Fetching " + name + " with timestamp " + timestamp)
                // Fetch the jar over the network via Utils.fetchFile
                Utils.fetchFile(name, new File(SparkFiles.getRootDirectory()), conf,
                  env.securityManager, hadoopConf, timestamp, useCache = !isLocal)
                currentJars(name) = timestamp
                // Add it to our class loader
                val url = new File(SparkFiles.getRootDirectory(), localName).toURI.toURL
                if (!urlClassLoader.getURLs().contains(url)) {
                  logInfo("Adding " + url + " to class loader")
                  urlClassLoader.addURL(url)
                }
              }
            }
          }
        }
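
    The entries in newFiles and newJars originate on the Driver side: anything registered through SparkContext.addFile / SparkContext.addJar is recorded with a timestamp and later pulled down here by each Executor. A small driver-side usage example (the paths and app name are hypothetical):

      import org.apache.spark.{SparkConf, SparkContext, SparkFiles}

      val conf = new SparkConf().setAppName("dependency-demo")
      val sc = new SparkContext(conf)

      // Register dependencies on the Driver; their names and timestamps are what
      // updateDependencies sees as newFiles / newJars on each Executor
      sc.addFile("hdfs:///config/core-site.xml")   // hypothetical path
      sc.addJar("hdfs:///libs/my-udfs.jar")        // hypothetical path

      // Inside a task, a fetched file can be resolved to its local path via SparkFiles
      sc.parallelize(1 to 10).map { i =>
        (i, SparkFiles.get("core-site.xml"))
      }.collect()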
    6. The run method of Task, which wraps up the preparation needed before the Task's real work starts.

       final def run(
          taskAttemptId: Long,
          attemptNumber: Int,
          metricsSystem: MetricsSystem)
        : (T, AccumulatorUpdates) = {
      
          // Create the TaskContext, i.e. the context the Task runs in, wrapping everything it needs:
          // stageId: the Stage this Task belongs to, partitionId: the partition it processes, attemptNumber: which attempt this is,
          // taskMemoryManager: the memory manager it uses, metricsSystem: the metrics system,
          // internalAccumulators: the internal accumulators
          context = new TaskContextImpl(
            stageId,
            partitionId,
            taskAttemptId,
            attemptNumber,
            taskMemoryManager,
            metricsSystem,
            internalAccumulators,
            runningLocally = false)
          TaskContext.setTaskContext(context)
          context.taskMetrics.setHostname(Utils.localHostName())
          context.taskMetrics.setAccumulatorsUpdater(context.collectInternalAccumulators)
          taskThread = Thread.currentThread()
          if (_killed) {
            kill(interruptThread = false)
          }
          try {
      
            // Call runTask; runTask is an abstract method, so the actual logic is implemented in the subclasses.
            // Task has two subclasses, ShuffleMapTask and ResultTask; to see how a concrete Task
            // is executed, look at the runTask implementations in those two classes
            (runTask(context), context.collectAccumulators())
          } finally {
            context.markTaskCompleted()
            try {
              Utils.tryLogNonFatalError {
                // Release memory used by this thread for unrolling blocks
                SparkEnv.get.blockManager.memoryStore.releaseUnrollMemoryForThisTask()
                // Notify any tasks waiting for execution memory to be freed to wake up and try to
                // acquire memory again. This makes impossible the scenario where a task sleeps forever
                // because there are no other tasks left to notify it. Since this is safe to do but may
                // not be strictly necessary, we should revisit whether we can remove this in the future.
                val memoryManager = SparkEnv.get.memoryManager
                memoryManager.synchronized { memoryManager.notifyAll() }
              }
            } finally {
              TaskContext.unset()
            }
          }
        }
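
    To make that subclass relationship concrete, here is a deliberately toy model (not the real Spark classes) of how the abstract runTask is specialized: the ShuffleMapTask analogue writes its partition's shuffle output and returns a descriptor of where that output lives, while the ResultTask analogue simply applies the user function to its partition and returns the value.

      // Toy model of the Task hierarchy; names mirror Spark's but the bodies are heavily simplified
      abstract class DemoTask[T](val stageId: Int, val partitionId: Int) {
        def runTask(): T                       // implemented by the concrete subclasses
      }

      // Analogue of ShuffleMapTask: writes shuffle output, returns where it was written
      class DemoShuffleMapTask(stageId: Int, partitionId: Int)
        extends DemoTask[String](stageId, partitionId) {
        override def runTask(): String =
          s"shuffle output location for partition $partitionId"   // stand-in for a MapStatus
      }

      // Analogue of ResultTask: applies the user's function to its partition, returns the value
      class DemoResultTask[U](stageId: Int, partitionId: Int,
          func: Iterator[Int] => U, data: Iterator[Int])
        extends DemoTask[U](stageId, partitionId) {
        override def runTask(): U = func(data)
      }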