Spark Core scheduling: the DAGScheduler

Once the Executor has started, tasks can run on it. To understand how tasks get executed, we next look at the DAGScheduler and the TaskScheduler.

The TaskScheduler's role is to schedule the tasks created for a SparkContext: it accepts sets of tasks for the different Stages from the DAGScheduler and submits those tasks to the cluster.

The DAGScheduler analyzes the application submitted by the user and builds a DAG of computing tasks from the RDD dependencies. It then divides the DAG into Stages, where each Stage consists of a set of runnable Tasks: the Tasks of a Stage execute the same logic, just on different data partitions. The DAGScheduler works the same way under every resource management framework. Once it has finished dividing a Stage into its group of Tasks, it submits the Task set to the TaskScheduler, which has the cluster manager launch the tasks on Executors running on the cluster's Workers.
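
To make the division concrete, here is a minimal sketch (my own example, not part of the SparkPi program below, assuming sc is an existing SparkContext): narrow dependencies such as flatMap and map stay inside one stage, while a wide dependency such as reduceByKey forces a stage boundary.

// Hypothetical word count: narrow dependencies stay in one stage,
// the shuffle introduced by reduceByKey starts a new one.
val counts = sc.textFile("hdfs://...")  // Stage 0 (ShuffleMapStage) begins
  .flatMap(_.split(" "))                // narrow dependency, still Stage 0
  .map(word => (word, 1))               // narrow dependency, still Stage 0
  .reduceByKey(_ + _)                   // wide dependency: Stage 1 (ResultStage)
counts.collect()                        // action: submits one job with two stages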

In the SparkPi program, calling one of the RDD's action operators triggers a Spark job; here it is the reduce operator that triggers the job.

import scala.math.random

import org.apache.spark.{SparkConf, SparkContext}

object SparkPi {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Spark Pi")
    val spark = new SparkContext(conf)
    val slices = if (args.length > 0) args(0).toInt else 2
    val n = math.min(100000L * slices, Int.MaxValue).toInt // avoid overflow
    val count = spark.parallelize(1 until n, slices).map { i =>
      val x = random * 2 - 1
      val y = random * 2 - 1
      if (x*x + y*y < 1) 1 else 0
    }.reduce(_ + _)
    println("Pi is roughly " + 4.0 * count / n)
    spark.stop()
  }
}

Let's look at the reduce operator to see how the DAGScheduler gets triggered.

Inside reduce we find that it actually calls the runJob method of SparkContext:

def reduce(f: (T, T) => T): T = withScope {
    val cleanF = sc.clean(f)
    val reducePartition: Iterator[T] => Option[T] = iter => {
      if (iter.hasNext) {
        Some(iter.reduceLeft(cleanF))
      } else {
        None
      }
    }
    var jobResult: Option[T] = None
    val mergeResult = (index: Int, taskResult: Option[T]) => {
      if (taskResult.isDefined) {
        jobResult = jobResult match {
          case Some(value) => Some(f(value, taskResult.get))
          case None => taskResult
        }
      }
    }
    sc.runJob(this, reducePartition, mergeResult)
    jobResult.getOrElse(throw new UnsupportedOperationException("empty collection"))
  }
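
The division of labor here: reducePartition folds each partition locally (on an executor) into an Option[T], and mergeResult folds those per-partition results into jobResult on the driver. A minimal sketch of the same two-phase pattern on plain Scala collections (the names are mine, not Spark's):

// Phase 1 runs once per partition; phase 2 merges on the driver.
val partitions: Seq[Seq[Int]] = Seq(Seq(1, 2, 3), Seq.empty, Seq(4, 5))
val f: (Int, Int) => Int = _ + _

// reducePartition: an empty partition yields None
val partial: Seq[Option[Int]] = partitions.map { p =>
  if (p.nonEmpty) Some(p.reduceLeft(f)) else None
}

// mergeResult: fold the defined partial results into one Option
val jobResult: Option[Int] = partial.flatten.reduceLeftOption(f)
assert(jobResult.contains(15))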

SparkContext's runJob method calls down through several overloaded layers of runJob.

def runJob[T, U: ClassTag](rdd: RDD[T], func: Iterator[T] => U): Array[U] = {
    runJob(rdd, func, 0 until rdd.partitions.length)
  }

The first layer of runJob:

def runJob[T, U: ClassTag](
      rdd: RDD[T],
      func: Iterator[T] => U,
      partitions: Seq[Int]): Array[U] = {
    val cleanedFunc = clean(func)
    runJob(rdd, (ctx: TaskContext, it: Iterator[T]) => cleanedFunc(it), partitions)
  }

The second layer of runJob:

def runJob[T, U: ClassTag](
      rdd: RDD[T],
      func: (TaskContext, Iterator[T]) => U,
      partitions: Seq[Int]): Array[U] = {
    val results = new Array[U](partitions.size)
    runJob[T, U](rdd, func, partitions, (index, res) => results(index) = res)
    results
  }

The third layer of runJob:

def runJob[T, U: ClassTag](
      rdd: RDD[T],
      func: (TaskContext, Iterator[T]) => U,
      partitions: Seq[Int],
      resultHandler: (Int, U) => Unit): Unit = {
    if (stopped.get()) {
      throw new IllegalStateException("SparkContext has been shutdown")
    }
    val callSite = getCallSite
    val cleanedFunc = clean(func)
    logInfo("Starting job: " + callSite.shortForm)
    if (conf.getBoolean("spark.logLineage", false)) {
      logInfo("RDD's recursive dependencies:\n" + rdd.toDebugString)
    }
    dagScheduler.runJob(rdd, cleanedFunc, partitions, callSite, resultHandler, localProperties.get)
    progressBar.foreach(_.finishAll())
    rdd.doCheckpoint()
  }
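
A side note on clean(func): it runs Spark's ClosureCleaner over the closure so that accidentally captured outer references are removed (or a meaningful error is raised) before the function is serialized to executors. A hypothetical sketch of the problem it guards against (the Multiplier class is my own illustration, not Spark code):

import org.apache.spark.rdd.RDD

class Multiplier(val factor: Int) {
  // Captures `this` (the whole Multiplier), because `factor` is a field;
  // if Multiplier is not serializable, the job fails when it is submitted.
  def scaleBad(rdd: RDD[Int]): RDD[Int] = rdd.map(x => x * factor)

  // Copying the field into a local val means only an Int is captured.
  def scaleGood(rdd: RDD[Int]): RDD[Int] = {
    val f = factor
    rdd.map(x => x * f)
  }
}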

Now we move into DAGScheduler.scala.

The runJob method calls the submitJob method to submit the job; submitJob returns a JobWaiter, which is used to wait for the DAGScheduler to finish the job.

/**
    * Run an action job on the given RDD and pass all the results to the
    * resultHandler function as they arrive.
    */
  def runJob[T, U](
      rdd: RDD[T],
      func: (TaskContext, Iterator[T]) => U,
      partitions: Seq[Int],
      callSite: CallSite,
      resultHandler: (Int, U) => Unit,
      properties: Properties): Unit = {
    val start = System.nanoTime
    val waiter = submitJob(rdd, func, partitions, callSite, resultHandler, properties) 
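
The method then blocks on the JobWaiter until the job finishes. Abbreviated from the Spark 2.x source (ThreadUtils here is org.apache.spark.util.ThreadUtils, log messages are shortened, and details vary slightly between versions), the continuation looks roughly like this:

    ThreadUtils.awaitReady(waiter.completionFuture, Duration.Inf)
    waiter.completionFuture.value.get match {
      case scala.util.Success(_) =>
        logInfo("Job %d finished: %s".format(waiter.jobId, callSite.shortForm))
      case scala.util.Failure(exception) =>
        logInfo("Job %d failed: %s".format(waiter.jobId, callSite.shortForm))
        throw exception
    }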

The submitJob method calls eventProcessLoop.post to add a JobSubmitted event to the DAGScheduler's event queue.

This eventProcessLoop is a DAGSchedulerEventProcessLoop, which manages the DAGScheduler's main events.

eventProcessLoop.post(JobSubmitted(
      jobId, rdd, func2, partitions.toArray, callSite, waiter,
      SerializationUtils.clone(properties)))

private[scheduler] val eventProcessLoop = new DAGSchedulerEventProcessLoop(this)
  taskScheduler.setDAGScheduler(this)

Posting the JobSubmitted event, in the end, calls the DAGScheduler's handleJobSubmitted method:

case JobSubmitted(jobId, rdd, func, partitions, callSite, listener, properties) =>
      dagScheduler.handleJobSubmitted(jobId, rdd, func, partitions, callSite, listener, properties)
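
Under the hood, DAGSchedulerEventProcessLoop extends a generic event loop: posted events go into a blocking queue, and a single daemon thread drains the queue and dispatches each event. A simplified sketch of that pattern, modeled on org.apache.spark.util.EventLoop (not Spark's exact code):

import java.util.concurrent.LinkedBlockingDeque

abstract class SimpleEventLoop[E](name: String) {
  private val eventQueue = new LinkedBlockingDeque[E]()
  private val eventThread = new Thread(name) {
    setDaemon(true)
    override def run(): Unit = {
      try {
        while (!Thread.currentThread().isInterrupted) {
          onReceive(eventQueue.take()) // blocks until an event is posted
        }
      } catch {
        case _: InterruptedException => // stopped: exit the loop quietly
      }
    }
  }
  def start(): Unit = eventThread.start()
  def post(event: E): Unit = eventQueue.put(event) // what submitJob calls
  protected def onReceive(event: E): Unit // dispatches, e.g. JobSubmitted
}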

Here we have found handleJobSubmitted, the entry point to the DAGScheduler's core scheduling.

It performs four steps in total:

1. Use the job's last RDD to create the finalStage.

A word about stages: there are two kinds, the ShuffleMapStage and the ResultStage.

A ShuffleMapStage handles a shuffle: it writes the map output that later stages fetch.

The ResultStage is the stage executed last; it finishes the job and produces the result.

All stages before the ResultStage are ShuffleMapStages.

2. Create a Job; the last stage of this job is the finalStage.

3. Add the Job's information to the in-memory bookkeeping structures.

4. Submit the finalStage with the submitStage method.

private[scheduler] def handleJobSubmitted(jobId: Int,
      finalRDD: RDD[_],
      func: (TaskContext, Iterator[_]) => _,
      partitions: Array[Int],
      callSite: CallSite,
      listener: JobListener,
      properties: Properties) {

    var finalStage: ResultStage = null
    try {

      // Step 1:
      // use the job's last RDD to create the finalStage
      finalStage = createResultStage(finalRDD, func, partitions, jobId, callSite)
    } catch {
      case e: Exception =>
        logWarning("Creating new stage failed due to exception - job: " + jobId, e)
        listener.jobFailed(e)
        return
    }

      // Step 2: create a Job
      // the last stage of this job is the finalStage
    val job = new ActiveJob(jobId, finalStage, callSite, listener, properties)
    clearCacheLocs()
    logInfo("Got job %s (%s) with %d output partitions".format(
      job.jobId, callSite.shortForm, partitions.length))
    logInfo("Final stage: " + finalStage + " (" + finalStage.name + ")")
    logInfo("Parents of final stage: " + finalStage.parents)
    logInfo("Missing parents: " + getMissingParentStages(finalStage))

      // Step 3: add the Job's information to the in-memory bookkeeping structures
    val jobSubmissionTime = clock.getTimeMillis()
    jobIdToActiveJob(jobId) = job
    activeJobs += job
    finalStage.setActiveJob(job)
    val stageIds = jobIdToStageIds(jobId).toArray
    val stageInfos = stageIds.flatMap(id => stageIdToStage.get(id).map(_.latestInfo))
    listenerBus.post(
      SparkListenerJobStart(job.jobId, jobSubmissionTime, stageInfos, properties))
      // Step 4: submit the finalStage with the submitStage method
      // in effect this submits the earliest stage first and puts the other stages into the waitingStages queue
    submitStage(finalStage)

  }

The submitStage method submits a stage:

1. Look up an active job id for the stage. If one exists, go to step 2; if not, abort via abortStage.

2. Before submitting the stage, check that it is not already a waiting, running, or failed stage; if it is not, proceed to step 3 (if it is, it is simply skipped).

3. Call the getMissingParentStages method to get all of the current stage's unsubmitted parent stages.

    If no unsubmitted parent stage exists, call the submitMissingTasks method to submit all of the current stage's unsubmitted tasks.

    If unsubmitted parent stages do exist, recursively call submitStage on each of them (the recursion keeps walking up through unsubmitted parents until it reaches the very first stage, stage 0), and add the current stage to the waitingStages queue.

private def submitStage(stage: Stage) {
    val jobId = activeJobForStage(stage)
    if (jobId.isDefined) {
      logDebug("submitStage(" + stage + ")")
      if (!waitingStages(stage) && !runningStages(stage) && !failedStages(stage)) {
        // call getMissingParentStages to get this stage's parent stages
        val missing = getMissingParentStages(stage).sortBy(_.id)
        logDebug("missing: " + missing)
        /*
        This recurses until it reaches the very first stage, the one with no parent stage;
        at that point that first stage, stage 0, is submitted,
        and all the remaining stages sit in waitingStages.
         */
        if (missing.isEmpty) {
          logInfo("Submitting " + stage + " (" + stage.rdd + "), which has no missing parents")
          submitMissingTasks(stage, jobId.get)
        } else {
          //if there are parent stages,
          //recursively call the submitStage() method to submit each parent stage
          //this recursion is the driving force, and the highlight, of the stage-partitioning algorithm!
          for (parent <- missing) {
            submitStage(parent)
          }
          //and add the current stage to the waitingStages queue, where it waits to execute
          waitingStages += stage
        }
      }
    } else {
      abortStage(stage, "No active job for stage " + stage.id, None)
    }
  }

The getMissingParentStages method called here is where the stage-partitioning algorithm lives.

The partitioning algorithm does the following:

1. Create a stack for holding RDDs.

2. Push the stage's last RDD onto the stack.

3. While the stack is non-empty, pop an RDD and call the internally defined visit method on it.

The visit method does the following:

1. Iterate over the dependencies of the RDD passed in.

2. For a wide dependency, call the getOrCreateShuffleMapStage method to create a new stage.

   For a narrow dependency, push the parent RDD onto the stack.

3. Return the list of new stages.

/*
  Get the missing parent stages of a stage.
  For a stage, if all the dependencies of its last RDD are narrow dependencies,
  this method creates no new stage. But as soon as an RDD in this stage has a
  wide dependency on some RDD, it uses the RDD on the wide-dependency side to
  create a new stage, and returns the new stage immediately.
   */
  private def getMissingParentStages(stage: Stage): List[Stage] = {
    val missing = new HashSet[Stage]
    val visited = new HashSet[RDD[_]]
    val waitingForVisit = new Stack[RDD[_]]
    def visit(rdd: RDD[_]) {
      if (!visited(rdd)) {
        visited += rdd
        val rddHasUncachedPartitions = getCacheLocs(rdd).contains(Nil)
        // iterate over the RDD's dependencies
        if (rddHasUncachedPartitions) {
          for (dep <- rdd.dependencies) {
            dep match {
                // if it is a wide dependency
              case shufDep: ShuffleDependency[_, _, _] =>
                // then take the RDD on the wide-dependency side and create a
                // stage for it with getOrCreateShuffleMapStage()
                // the last stage is never a ShuffleMapStage,
                // but every stage before the finalStage is a ShuffleMapStage
                val mapStage = getOrCreateShuffleMapStage(shufDep, stage.firstJobId)
                if (!mapStage.isAvailable) {
                  missing += mapStage
                }
                // if it is a narrow dependency, push the parent RDD onto the stack
              case narrowDep: NarrowDependency[_] =>
                waitingForVisit.push(narrowDep.rdd)
            }
          }
        }
      }
    }
    // first, push the stage's last RDD onto the stack (for the finalStage)
    waitingForVisit.push(stage.rdd)
    // loop while the stack is non-empty
    while (waitingForVisit.nonEmpty) {
      // call the internally defined visit() method on the popped RDD
      visit(waitingForVisit.pop())
    }
    missing.toList
  }
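
Tracing the algorithm on a hypothetical lineage makes it concrete. Suppose the job's final RDD comes from textFile -> flatMap -> map -> reduceByKey (my own example); the comments walk through what visit does:

val words = sc.textFile("hdfs://...")   // narrow ancestors
  .flatMap(_.split(" "))                // narrow dependency
  .map(w => (w, 1))                     // narrow dependency
val counts = words.reduceByKey(_ + _)   // ShuffleDependency on `words`

// getMissingParentStages(ResultStage of counts):
//  - visit(counts's RDD): its only dependency is a ShuffleDependency, so
//    getOrCreateShuffleMapStage creates a ShuffleMapStage from `words` and,
//    since that stage has produced no map output yet (!isAvailable),
//    adds it to `missing`.
//  - the traversal does not push past the shuffle boundary: the narrow
//    ancestors of `words` are visited later, when the parent stage itself
//    is processed.
// Result: missing == List(the ShuffleMapStage for `words`)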

While we're here, let's look at the getOrCreateShuffleMapStage method, which is used to create stages.

It calls the createShuffleMapStage method to create a ShuffleMapStage for the given wide (shuffle) dependency.

private def getOrCreateShuffleMapStage(
      shuffleDep: ShuffleDependency[_, _, _],
      firstJobId: Int): ShuffleMapStage = {
    shuffleIdToMapStage.get(shuffleDep.shuffleId) match {
      ...
        // create a stage for the given shuffle dependency
        createShuffleMapStage(shuffleDep, firstJobId)
    }
  }

createShuffleMapStage creates a new ShuffleMapStage:

val stage = new ShuffleMapStage(id, rdd, numTasks, parents, jobId, rdd.creationSite, shuffleDep)

With that, the creation and partitioning of stages is complete.

On the Spark Web UI the current stages look like this:

The number of Active Stages is 1.

Clicking Description shows the stage's details.

The DAG generated by the Spark Pi program looks like this:

Because the program contains no shuffle operation, that is, no wide dependency, there is only one stage: the finalStage.
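
You can confirm this from the lineage itself: toDebugString (the same output that spark.logLineage prints) shows no shuffle boundary. A quick check, reusing the SparkPi context above; the printed lines are illustrative:

val samples = spark.parallelize(1 until n, slices).map { i =>
  val x = random * 2 - 1
  val y = random * 2 - 1
  if (x * x + y * y < 1) 1 else 0
}
// Indentation changes in toDebugString mark shuffle boundaries;
// here everything sits at one level, i.e. a single stage.
println(samples.toDebugString)
// (2) MapPartitionsRDD[1] at map at SparkPi.scala ...
//  |  ParallelCollectionRDD[0] at parallelize at SparkPi.scala ...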

The DAGScheduler's stage-partitioning algorithm matters a great deal. To really master Spark you need a clear picture of it: know how many jobs your Spark application is divided into, how many stages each job is divided into, and which code each stage contains, so you can map any stage back to the code.

Then when something goes wrong, such as a stage running too slowly or a stage failing, you can zero in on that stage's code to troubleshoot the problem or tune performance.

 
