Spark Job Execution Internals (2): Dividing the Scheduling Stages

        Stage division in Spark is implemented by the DAGScheduler. Starting from the last RDD, the DAGScheduler walks the entire dependency tree along the lineage using a breadth-first traversal (the traversal is used twice in total: once to delimit the range of the ResultStage, and once to collect the division basis for each ShuffleMapStage and delimit its range). Stages are thus divided according to whether an operation involves a shuffle.
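For orientation before diving into the scheduler source, here is a minimal, self-contained job (a sketch; the object and app names are made up for illustration) that contains exactly one shuffle, so the DAGScheduler will split it into one ShuffleMapStage plus one ResultStage. Without the reduceByKey, the whole job would be a single ResultStage:

import org.apache.spark.{SparkConf, SparkContext}

object StageSplitDemo {
    def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("StageSplitDemo").setMaster("local[2]")
        val sc = new SparkContext(conf)

        val words = sc.parallelize(Seq("a", "b", "a", "c"))

        // map is a narrow dependency and stays inside the current stage;
        // reduceByKey introduces a ShuffleDependency, i.e. a stage boundary.
        val counts = words.map(word => (word, 1)).reduceByKey(_ + _)

        println(counts.toDebugString)  // prints the lineage; indentation marks shuffle boundaries
        counts.collect()               // the action triggers handleJobSubmitted on the DAGScheduler

        sc.stop()
    }
}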

        The actual stage division begins in the handleJobSubmitted method, where a ResultStage object is instantiated from the last RDD (finalRDD). During instantiation, getParentStages is called on finalRDD to find out whether any ancestor RDD involves a shuffle operation. If no shuffle exists, the job has only this one ResultStage; if a shuffle does exist, the job contains, besides this ResultStage, at least one ShuffleMapStage.

Part of the handleJobSubmitted source code:

private[scheduler] def handleJobSubmitted(jobId: Int,
    finalRDD: RDD[_],
    func: (TaskContext, Iterator[_]) => _,
    partitions: Array[Int],
    callSite: CallSite,
    listener: JobListener,
    properties: Properties) {
    // A ResultStage object that stores the last stage the DAG is divided into
    var finalStage: ResultStage = null
    try {
        finalStage = new ResultStage(finalRDD, func, partitions, jobId, callSite)
    } catch { ... }

    // Generate the job based on the final stage
    val job = new ActiveJob(jobId, finalStage, callSite, listener, properties)
    clearCacheLocs()

    ...

    // Submit the stages
    submitStage(finalStage)
    submitWaitingStages()
}

        When the ResultStage above is instantiated, finalRDD is passed in. This finalRDD is in turn handed to the getParentStagesAndId method, which calls getParentStages to generate the final scheduling stage finalStage (this is the first use of the breadth-first traversal).

private def getParentStages(rdd: RDD[_], firstJobId: Int): List[Stage] = {
    val parents = new HashSet[Stage]      // parents is a HashSet whose elements are stages
    val visited = new HashSet[RDD[_]]     // stores the RDDs that have already been visited

    // Stores RDDs reached through non-ShuffleDependency edges
    val waitingForVisit = new Stack[RDD[_]]

    // The traversal: act differently depending on the type of the current RDD's dependencies
    def visit(r: RDD[_]) {
        if (!visited(r)) {
            visited += r    // mark the current RDD as visited, i.e. add it to the visited HashSet
            for (dep <- r.dependencies) {
                dep match {
                    // When the dependency on the parent RDD is a ShuffleDependency,
                    // obtain (or create) the corresponding ShuffleMapStage
                    case shufDep: ShuffleDependency[_, _, _] =>
                        parents += getShuffleMapStage(shufDep, firstJobId)
                    case _ =>
                        waitingForVisit.push(dep.rdd)
                }
            }
        }
    }

    waitingForVisit.push(rdd)
    // Traverse the RDDs on the stack
    while (waitingForVisit.nonEmpty) {
        visit(waitingForVisit.pop())
    }
    parents.toList    // return the parent stages
}

        The code above shows that when the currently visited RDD depends on a parent RDD through a ShuffleDependency, the traversal must continue backwards to find all ShuffleMapStages (or rather, all the RDDs that serve as the basis for dividing ShuffleMapStages). This also uses a breadth-first traversal, similar to getParentStages, and is implemented by the getAncestorShuffleDependencies method.

Part of the getAncestorShuffleDependencies source code:

private def getAncestorShuffleDependencies(rdd: RDD[_]): Stack[ShuffleDependency[_, _, _]] = {
    val parents = new Stack[ShuffleDependency[_, _, _]]
    val visited = new HashSet[RDD[_]]

    // Stores the RDDs still waiting to be visited
    val waitingForVisit = new Stack[RDD[_]]
    def visit(r: RDD[_]) {
        if (!visited(r)) {
            visited += r    // mark the current RDD as visited, i.e. add it to visited
            for (dep <- r.dependencies) {
                dep match {
                    case shufDep: ShuffleDependency[_, _, _] =>
                        if (!shuffleToMapStage.contains(shufDep.shuffleId)) {
                            parents.push(shufDep)    // push this shuffle, a division basis, onto the stack
                        }
                    case _ =>    // no-op
                }
                waitingForVisit.push(dep.rdd)    // keep walking backwards through the dependency tree
            }
        }
    }

    // Traverse the dependency tree backwards, collecting every ShuffleDependency
    // as a basis for dividing stages
    waitingForVisit.push(rdd)
    while (waitingForVisit.nonEmpty) {
        visit(waitingForVisit.pop())
    }
    parents    // return parents
}

        The getAncestorShuffleDependencies method, then, simply finds the RDDs reached through a ShuffleDependency, and those RDDs are the basis on which the individual ShuffleMapStages are divided.
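The same dependency information is also visible outside the scheduler through the public RDD API. As a quick illustration (assuming an existing SparkContext sc; the counts variable is made up for this sketch), you can pattern-match on dependencies exactly the way the scheduler does:

import org.apache.spark.ShuffleDependency

val counts = sc.parallelize(Seq("a", "b", "a")).map((_, 1)).reduceByKey(_ + _)

// reduceByKey's output depends on its parent through a ShuffleDependency,
// which is precisely what the scheduler treats as a stage boundary.
counts.dependencies.foreach {
    case shufDep: ShuffleDependency[_, _, _] =>
        println(s"shuffle dependency, shuffleId = ${shufDep.shuffleId}")
    case narrow =>
        println(s"narrow dependency on ${narrow.rdd}")
}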

        Once the division of all stages is complete, the stages are linked together by dependency relationships. That relationship is defined through the stage attribute parents: List[Stage]; through it, all ancestor stages of the current stage can be obtained, and with this information the scheduling stages can be submitted for execution in the correct order.
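Submission itself works recursively over that parents relationship. The following is a condensed sketch of DAGScheduler.submitStage (abridged from the Spark 1.x source; logging and some details are omitted): a stage is submitted only when none of its parent stages are still missing, otherwise its parents are submitted first and the stage is parked in waitingStages.

private def submitStage(stage: Stage) {
    val jobId = activeJobForStage(stage)
    if (jobId.isDefined) {
        if (!waitingStages(stage) && !runningStages(stage) && !failedStages(stage)) {
            // Ancestors first: find parent stages whose results are not yet available
            val missing = getMissingParentStages(stage).sortBy(_.id)
            if (missing.isEmpty) {
                // All parents are done (or none exist): run this stage's tasks now
                submitMissingTasks(stage, jobId.get)
            } else {
                // Otherwise submit the parents recursively and wait for them
                for (parent <- missing) {
                    submitStage(parent)
                }
                waitingStages += stage
            }
        }
    } else {
        abortStage(stage, "No active job for stage " + stage.id, None)
    }
}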

Below is a diagram of how a Spark job is divided into stages (the figure is not reproduced here):

The flow of stage division in the Spark scheduler:

  1. When the job is submitted from the SparkContext, the DAGScheduler's handleJobSubmitted method is called; it first locates the last RDD (RDD7 in the figure) and calls getParentStages;
  2. In getParentStages, the parent RDDs of RDD7 are checked for shuffle operations; in the figure, RDD6 is reached through a ShuffleDependency, so the traversal proceeds with RDD6;
  3. Via getAncestorShuffleDependencies, the traversal walks backwards from RDD6 looking for all division bases; it finds only RDD4, so RDD3 -> RDD4 is divided into ShuffleMapStage0 and RDD5 -> RDD6 into ShuffleMapStage1;
  4. Finally, what remains becomes ResultStage2, giving three stages in total, which are submitted and run in order (a code sketch reproducing this three-stage DAG follows below).
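To make the walk-through concrete, here is a minimal job whose lineage mirrors the figure: a chain with two shuffles splits into exactly three stages. (The rddN names are chosen to match the figure; any RDDs before RDD3 are omitted, and sc is assumed to be an existing SparkContext.)

val rdd3 = sc.parallelize(Seq("a", "b", "a", "c"))
val rdd4 = rdd3.map(word => (word, 1))            // RDD3 -> RDD4: ShuffleMapStage0
val rdd5 = rdd4.reduceByKey(_ + _)                // ShuffleDependency: stage boundary
val rdd6 = rdd5.map { case (w, n) => (n, w) }     // RDD5 -> RDD6: ShuffleMapStage1
val rdd7 = rdd6.groupByKey()                      // ShuffleDependency: stage boundary
rdd7.collect()                                    // RDD7 -> ResultStage2: three stages run in order

Calling rdd7.toDebugString before the action prints the same structure: each indentation step in the output corresponds to one of the shuffle boundaries above.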
