Spark Core (14): How TaskScheduler Submits Tasks, Principles and Source Code

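How TaskScheduler Submits Tasks
1. Get all the Tasks in the current TaskSet.
2. Wrap the current TaskSet in a TaskSetManager. Each TaskSet gets exactly one TaskSetManager, whose job is to manage all the Tasks in that TaskSet and monitor their execution state. TaskScheduler uses the TaskSetManager as its unit of scheduling.
3. Add the TaskSetManager to the queue of pending work. The schedulableBuilder decides the scheduling order. SchedulableBuilder is a trait with two implementations, FIFOSchedulableBuilder and FairSchedulableBuilder; Spark uses FIFO scheduling by default.
4. When TaskSchedulerImpl is initialized, its start method is called, which starts the SchedulerBackend. The SchedulerBackend (in practice a CoarseGrainedSchedulerBackend) calls reviveOffers. The SchedulerBackend is the component that talks to the outside world: it accepts registrations from Executors and maintains their state.
5. The reviveOffers method of CoarseGrainedSchedulerBackend is called to schedule the Tasks.
6. reviveOffers sends a ReviveOffers message to the DriverEndpoint to trigger scheduling. On receiving ReviveOffers, the DriverEndpoint calls makeOffers.
7. The SchedulerBackend (in practice a CoarseGrainedSchedulerBackend) dispatches the newly created Tasks to the Executors for execution.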
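
Step 3 says the scheduling order is FIFO by default. As a minimal sketch of what that ordering means, the model below sorts stand-ins for TaskSetManagers by priority (which in FIFO mode corresponds to the job id, so earlier jobs come first) and then by stage id. ManagerStub, fifoLessThan and schedulingOrder are hypothetical names for illustration only; they are not Spark classes, and the comparison rule is my reading of Spark's FIFO behavior rather than a copy of its source.

```scala
// Simplified model of FIFO ordering; none of these names are Spark's own.
object FifoOrderingDemo {
  // Hypothetical stand-in for a TaskSetManager: only the fields used for ordering.
  final case class ManagerStub(name: String, priority: Int, stageId: Int)

  // Assumed FIFO rule: lower priority value (earlier job) first,
  // ties broken by the lower stage id.
  def fifoLessThan(a: ManagerStub, b: ManagerStub): Boolean =
    if (a.priority != b.priority) a.priority < b.priority
    else a.stageId < b.stageId

  // Returns the managers in the order they would be offered resources.
  def schedulingOrder(queue: Seq[ManagerStub]): Seq[ManagerStub] =
    queue.sortWith(fifoLessThan)

  def main(args: Array[String]): Unit = {
    val queue = Seq(
      ManagerStub("TaskSet_3.0", priority = 1, stageId = 3),
      ManagerStub("TaskSet_1.0", priority = 0, stageId = 1),
      ManagerStub("TaskSet_2.0", priority = 1, stageId = 2))
    schedulingOrder(queue).foreach(m => println(m.name))
  }
}
```

With fair scheduling the order would instead depend on pool weights and minimum shares, which this sketch does not model.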

Source Code for TaskScheduler Task Submission

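The submitTasks method of TaskSchedulerImpl is the entry point for submitting Tasks.
The reviveOffers method of CoarseGrainedSchedulerBackend is inherited from SchedulerBackend. It sends a ReviveOffers message to the DriverEndpoint to trigger task scheduling.
The receive method of DriverEndpoint (defined inside CoarseGrainedSchedulerBackend) handles the ReviveOffers message and calls makeOffers.
The makeOffers method of CoarseGrainedSchedulerBackend finds the Executors that are still alive, then calls the methods that send the Tasks to those Executors for execution.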
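
The reviveOffers, receive and makeOffers chain described above can be sketched as a tiny message-driven model. This is only an illustration under simplifying assumptions: the real backend sends the message asynchronously over Spark's RPC layer, whereas here the call is synchronous, and ExecutorStub, Offer, DriverEndpointStub and BackendStub are hypothetical names, not Spark's classes.

```scala
// Simplified, synchronous model of the ReviveOffers message flow.
object ReviveOffersDemo {
  sealed trait Message
  case object ReviveOffers extends Message

  // Hypothetical executor record: id, host, free cores, and a liveness flag.
  final case class ExecutorStub(id: String, host: String, freeCores: Int, alive: Boolean)
  // Hypothetical resource offer built from an alive executor.
  final case class Offer(executorId: String, host: String, cores: Int)

  // Stand-in for the driver endpoint: reacts to ReviveOffers by building offers
  // and handing them to the scheduler callback.
  class DriverEndpointStub(executors: Seq[ExecutorStub], scheduler: Seq[Offer] => Unit) {
    def receive(msg: Message): Unit = msg match {
      case ReviveOffers => makeOffers()
    }

    // Only executors that are still alive contribute an offer.
    private def makeOffers(): Unit = {
      val offers = executors.filter(_.alive).map(e => Offer(e.id, e.host, e.freeCores))
      scheduler(offers)
    }
  }

  // Stand-in for the backend: reviveOffers does nothing except send the message.
  class BackendStub(endpoint: DriverEndpointStub) {
    def reviveOffers(): Unit = endpoint.receive(ReviveOffers)
  }
}
```

The point of the indirection is that scheduling is triggered by a message rather than a direct call, so any event that frees resources can request a new round of offers in the same way.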
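The resourceOffers method of TaskSchedulerImpl implements the resource allocation algorithm. It takes Tasks out of the scheduling pool, assigns concrete compute resources to each one, wraps the Task and the resources it needs into a TaskDescription, and decides which Executor the Task will be sent to before it is dispatched there. The allocation principle is to spread Tasks as evenly as possible across the Nodes.
The resourceOfferSingleTaskSet method of TaskSchedulerImpl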
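
To make the "spread Tasks evenly" principle concrete, here is a minimal sketch of round-robin assignment: in each round every offer receives at most one task, and rounds repeat until nothing more can be launched. It deliberately ignores locality levels, multiple TaskSets and the randomization of offers, and it assumes executor ids are unique. Offer, assign and cpusPerTask are hypothetical names for this sketch, not Spark's API.

```scala
import scala.collection.mutable

// Simplified round-robin assignment; not Spark's actual resourceOffers.
object ResourceOffersDemo {
  // Hypothetical offer: an executor id and the cores it has free.
  final case class Offer(executorId: String, cores: Int)

  // Returns, for each executor id, the task ids assigned to it.
  def assign(
      offers: Seq[Offer],
      pendingTasks: Seq[Int],
      cpusPerTask: Int = 1): Map[String, Seq[Int]] = {
    val freeCores = mutable.LinkedHashMap(offers.map(o => o.executorId -> o.cores): _*)
    val assigned = mutable.LinkedHashMap(offers.map(o => o.executorId -> Vector.empty[Int]): _*)
    val queue = mutable.Queue(pendingTasks: _*)

    var launched = true
    // Keep making rounds while the previous round launched at least one task.
    while (launched && queue.nonEmpty) {
      launched = false
      for (o <- offers) {
        // At most one task per offer per round, which spreads tasks across executors.
        if (queue.nonEmpty && freeCores(o.executorId) >= cpusPerTask) {
          val task = queue.dequeue()
          assigned(o.executorId) = assigned(o.executorId) :+ task
          freeCores(o.executorId) -= cpusPerTask
          launched = true
        }
      }
    }
    assigned.toMap
  }
}
```

For example, with offers A (2 cores) and B (1 core) and four pending tasks, A ends up with tasks 0 and 2, B with task 1, and task 3 stays pending until more cores are offered.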

The resourceOffer method of TaskSetManager

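  /**
    * Dequeues one Task from this TaskSet for the given offer
    * and returns it as a single unit of execution (a TaskDescription).
    *
    * execId: id of the Executor offering the resources
    * host: host name of the machine offering the resources
    * maxLocality: the maximum (least local) locality level the Task may be scheduled at
    */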
def resourceOffer(
      execId: String,
      host: String,
      maxLocality: TaskLocality.TaskLocality)
    : Option[TaskDescription] =
  {
    if (!isZombie) {
        // Get the current time
      val curTime = clock.getTimeMillis()

      // The locality level we are allowed to schedule at
      var allowedLocality = maxLocality

      if (maxLocality != TaskLocality.NO_PREF) {
        allowedLocality = getAllowedLocalityLevel(curTime)
        if (allowedLocality > maxLocality) {
          // We're not allowed to search for farther-away tasks
          allowedLocality = maxLocality
        }
      }

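      // dequeueTask picks a pending Task that can run on this executor/host.
      // It returns index: the Task's index; taskLocality: the Task's locality level; speculative: whether it is a speculative copy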
      dequeueTask(execId, host, allowedLocality) match {
        case Some((index, taskLocality, speculative)) => {
          // Look up the Task by the index returned from dequeueTask
          val task = tasks(index)
          val taskId = sched.newTaskId()
          // Do various bookkeeping
          copiesRunning(index) += 1
          val attemptNum = taskAttempts(index).size

          // Create the TaskInfo
          val info = new TaskInfo(taskId, index, attemptNum, curTime,
            execId, host, taskLocality, speculative)
          // Keep the TaskInfo in memory: key is taskId, value is the TaskInfo
          taskInfos(taskId) = info
          taskAttempts(index) = info :: taskAttempts(index)
          // Update our locality level for delay scheduling
          // NO_PREF will not affect the variables related to delay scheduling
          if (maxLocality != TaskLocality.NO_PREF) {
              // Record the index of the current locality level within the valid locality levels
            currentLocalityIndex = getLocalityIndex(taskLocality)
            // Record when a Task was last launched; delay scheduling measures its wait from this time
            lastLaunchTime = curTime
          }
          // Serialize and return the task
          val startTime = clock.getTimeMillis()
          // Use the serializer to serialize the Task along with its dependent files and JARs into a ByteBuffer
          val serializedTask: ByteBuffer = try {
            Task.serializeWithDependencies(task, sched.sc.addedFiles, sched.sc.addedJars, ser)
          } catch {
            // If serialization fails, abort the TaskSet and throw an exception
            case NonFatal(e) =>
              val msg = s"Failed to serialize task $taskId, not attempting to retry it."
              logError(msg, e)
              abort(s"$msg Exception during serialization: $e")
              throw new TaskNotSerializableException(e)
          }
          // Check whether the serialized Task exceeds the recommended size (TASK_SIZE_TO_WARN_KB); if so, log a warning once
          if (serializedTask.limit > TaskSetManager.TASK_SIZE_TO_WARN_KB * 1024 &&
              !emittedTaskSizeWarning) {
            emittedTaskSizeWarning = true
            logWarning(s"Stage ${task.stageId} contains a task of very large size " +
              s"(${serializedTask.limit / 1024} KB). The maximum recommended task size is " +
              s"${TaskSetManager.TASK_SIZE_TO_WARN_KB} KB.")
          }
          // Add the Task to the running set, which tracks the number of running Tasks for the scheduling policy
          addRunningTask(taskId)

          // We used to log the time it takes to serialize the task, but task size is already
          // a good proxy to task serialization time.
          // val timeTaken = clock.getTime() - startTime
          // Build the task's display name
          val taskName = s"task ${info.id} in stage ${taskSet.id}"
          logInfo(s"Starting $taskName (TID $taskId, $host, partition ${task.partitionId}," +
            s"$taskLocality, ${serializedTask.limit} bytes)")

          // Notify the DAGScheduler that the Task has started
          sched.dagScheduler.taskStarted(task, info)

          // Return the TaskDescription
          return Some(new TaskDescription(taskId = taskId, attemptNumber = attemptNum, execId,
            taskName, index, serializedTask))
        }
        case _ =>
      }
    }
    None
  }
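
The first part of resourceOffer clamps the locality level: delay scheduling may currently allow a less local level than the offer permits, in which case the offer's maxLocality wins, and NO_PREF offers skip the check entirely. The sketch below isolates just that rule. The Locality enumeration mirrors the order of Spark's TaskLocality values, while allowedLocality and its delayAllowed parameter (standing in for the result of getAllowedLocalityLevel(curTime)) are hypothetical names for illustration.

```scala
// Isolated model of the locality clamp at the top of resourceOffer.
object LocalityClampDemo {
  // Ordered from most local to least local, as in Spark's TaskLocality.
  object Locality extends Enumeration {
    val PROCESS_LOCAL, NODE_LOCAL, NO_PREF, RACK_LOCAL, ANY = Value
  }
  import Locality._

  // delayAllowed is an assumed input: the level delay scheduling permits right now.
  def allowedLocality(maxLocality: Locality.Value, delayAllowed: Locality.Value): Locality.Value =
    if (maxLocality == NO_PREF) maxLocality          // NO_PREF bypasses delay scheduling
    else if (delayAllowed > maxLocality) maxLocality // never search farther than the offer allows
    else delayAllowed
}
```

For instance, if delay scheduling has already relaxed to ANY but the offer is made at NODE_LOCAL, the Task is still only searched for at NODE_LOCAL or better.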

At this point in the source, the Task has been assigned. The next step is for the Executor to start the Task.
