Why the Spark UI shows skipped stages and tasks [source code analysis]

The Spark UI home page often shows that some tasks and stages have been skipped, as in the screenshot below:
[Screenshot: Spark UI job page showing skipped stages and tasks]

This article explains under what circumstances a stage or task is displayed as skipped, and whether skipped stages and tasks indicate a problem with the Spark application's execution.
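To see the behavior end to end, here is a minimal spark-shell sketch (the RDD names are illustrative and sc is the usual SparkContext). The second action reuses the cached output of the first job, so the parent stage is never resubmitted and the UI reports it as skipped:

val pairs  = sc.parallelize(1 to 1000000).map(i => (i % 100, 1))
val counts = pairs.reduceByKey(_ + _).cache()   // cache the shuffled RDD

counts.count()    // Job 0: both the ShuffleMapStage and the ResultStage run
counts.collect()  // Job 1: the ShuffleMapStage appears in the UI as "skipped"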

After the last task of a Spark job's ResultStage finishes successfully, the DAGScheduler.handleTaskCompletion method posts the SparkListenerJobEnd event. The source code is as follows:

private[scheduler] def handleTaskCompletion(event: CompletionEvent) {
    val task = event.task
    val stageId = task.stageId
    val taskType = Utils.getFormattedClassName(task)

    outputCommitCoordinator.taskCompleted(stageId, task.partitionId,
      event.taskInfo.attempt, event.reason)

    // The success case is dealt with separately below, since we need to compute accumulator
    // updates before posting.
    if (event.reason != Success) {
      val attemptId = stageIdToStage.get(task.stageId).map(_.latestInfo.attemptId).getOrElse(-1)
      listenerBus.post(SparkListenerTaskEnd(stageId, attemptId, taskType, event.reason,
        event.taskInfo, event.taskMetrics))
    }

    if (!stageIdToStage.contains(task.stageId)) {
      // Skip all the actions if the stage has been cancelled.
      return
    }

    val stage = stageIdToStage(task.stageId)
    event.reason match {
      case Success =>
        listenerBus.post(SparkListenerTaskEnd(stageId, stage.latestInfo.attemptId, taskType,
          event.reason, event.taskInfo, event.taskMetrics))
        stage.pendingTasks -= task
        task match {
          case rt: ResultTask[_, _] =>
            // Cast to ResultStage here because it's part of the ResultTask
            // TODO Refactor this out to a function that accepts a ResultStage
            val resultStage = stage.asInstanceOf[ResultStage]
            resultStage.resultOfJob match {
              case Some(job) =>
                if (!job.finished(rt.outputId)) {
                  updateAccumulators(event)
                  job.finished(rt.outputId) = true
                  job.numFinished += 1
                  // If the whole job has finished, remove it
                  if (job.numFinished == job.numPartitions) {
                    // All tasks of the ResultStage have finished: post the SparkListenerJobEnd event
                    markStageAsFinished(resultStage)
                    cleanupStateForJobAndIndependentStages(job)
                    listenerBus.post(
                      SparkListenerJobEnd(job.jobId, clock.getTimeMillis(), JobSucceeded))
                  }

                  // taskSucceeded runs some user code that might throw an exception. Make sure
                  // we are resilient against that.
                  try {
                    job.listener.taskSucceeded(rt.outputId, event.result)
                  } catch {
                    case e: Exception =>
                      // TODO: Perhaps we want to mark the resultStage as failed?
                      job.listener.jobFailed(new SparkDriverExecutionException(e))
                  }
                }
              case None =>
                logInfo("Ignoring result from " + rt + " because its job has finished")
            }
        // ... (remaining match cases omitted in this excerpt)
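The event posted above travels through the listener bus to every registered listener. As a hedged, self-contained sketch (not Spark source), a user-defined listener can observe the same SparkListenerJobEnd event that JobProgressListener handles:

import org.apache.spark.scheduler.{JobSucceeded, SparkListener, SparkListenerJobEnd}

class JobEndLogger extends SparkListener {
  override def onJobEnd(jobEnd: SparkListenerJobEnd): Unit = {
    jobEnd.jobResult match {
      case JobSucceeded => println(s"Job ${jobEnd.jobId} succeeded at ${jobEnd.time}")
      case other        => println(s"Job ${jobEnd.jobId} ended with $other")
    }
  }
}

// Registration, assuming an existing SparkContext named sc:
// sc.addSparkListener(new JobEndLogger)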

The JobProgressListener.onJobEnd method handles the SparkListenerJobEnd event. The code is as follows:

override def onJobEnd(jobEnd: SparkListenerJobEnd): Unit = synchronized {
   val jobData = activeJobs.remove(jobEnd.jobId).getOrElse {
     logWarning(s"Job completed for unknown job ${jobEnd.jobId}")
     new JobUIData(jobId = jobEnd.jobId)
   }
   jobData.completionTime = Option(jobEnd.time).filter(_ >= 0)

   jobData.stageIds.foreach(pendingStages.remove)
   jobEnd.jobResult match {
     case JobSucceeded =>
       completedJobs += jobData
       trimJobsIfNecessary(completedJobs)
       jobData.status = JobExecutionStatus.SUCCEEDED
       numCompletedJobs += 1
     case JobFailed(exception) =>
       failedJobs += jobData
       trimJobsIfNecessary(failedJobs)
       jobData.status = JobExecutionStatus.FAILED
       numFailedJobs += 1
   }
   for (stageId <- jobData.stageIds) {
     stageIdToActiveJobIds.get(stageId).foreach { jobsUsingStage =>
       jobsUsingStage.remove(jobEnd.jobId)
       if (jobsUsingStage.isEmpty) {
         stageIdToActiveJobIds.remove(stageId)
       }
       stageIdToInfo.get(stageId).foreach { stageInfo =>
         if (stageInfo.submissionTime.isEmpty) {
           // A Stage of the job that was never submitted for execution is counted,
           // together with its tasks, as a skipped stage and skipped tasks.
           // if this stage is pending, it won't complete, so mark it as "skipped":
           skippedStages += stageInfo
           trimStagesIfNecessary(skippedStages)
           jobData.numSkippedStages += 1
           jobData.numSkippedTasks += stageInfo.numTasks
         }
       }
     }
   }
 }
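In other words, the only signal onJobEnd uses to decide that a stage was skipped is an empty submission time. A hedged one-line restatement of that rule (not Spark source):

import org.apache.spark.scheduler.StageInfo

// A stage that belongs to the finished job but was never submitted counts as skipped.
def isSkipped(stageInfo: StageInfo): Boolean = stageInfo.submissionTime.isEmpty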

StageInfo.submissionTime is set only when the Stage is actually decomposed into a TaskSet and that TaskSet is submitted to the TaskSetManager, inside DAGScheduler.submitMissingTasks. The source code is as follows:

private def submitMissingTasks(stage: Stage, jobId: Int) {
   logDebug("submitMissingTasks(" + stage + ")")
   // Get our pending tasks and remember them in our pendingTasks entry
   stage.pendingTasks.clear()

   // First figure out the indexes of partition ids to compute.
   // partitionsToCompute lists the indexes of all partitions this stage needs to compute.
   val partitionsToCompute: Seq[Int] = {
     stage match {
       case stage: ShuffleMapStage =>
         (0 until stage.numPartitions).filter(id => stage.outputLocs(id).isEmpty)
       case stage: ResultStage =>
         val job = stage.resultOfJob.get
         (0 until job.numPartitions).filter(id => !job.finished(id))
     }
   }

   val properties = jobIdToActiveJob.get(stage.firstJobId).map(_.properties).orNull

   runningStages += stage
   // SparkListenerStageSubmitted should be posted before testing whether tasks are
   // serializable. If tasks are not serializable, a SparkListenerStageCompleted event
   // will be posted, which should always come after a corresponding SparkListenerStageSubmitted
   // event.
   stage.latestInfo = StageInfo.fromStage(stage, Some(partitionsToCompute.size))
   outputCommitCoordinator.stageStart(stage.id)
   listenerBus.post(SparkListenerStageSubmitted(stage.latestInfo, properties))

   // TODO: Maybe we can keep the taskBinary in Stage to avoid serializing it multiple times.
   // Broadcasted binary for the task, used to dispatch tasks to executors. Note that we broadcast
   // the serialized copy of the RDD and for each task we will deserialize it, which means each
   // task gets a different copy of the RDD. This provides stronger isolation between tasks that
   // might modify state of objects referenced in their closures. This is necessary in Hadoop
   // where the JobConf/Configuration object is not thread-safe.
   var taskBinary: Broadcast[Array[Byte]] = null
   try {
     // For ShuffleMapTask, serialize and broadcast (rdd, shuffleDep).
     // For ResultTask, serialize and broadcast (rdd, func).
     val taskBinaryBytes: Array[Byte] = stage match {
       case stage: ShuffleMapStage =>
         closureSerializer.serialize((stage.rdd, stage.shuffleDep): AnyRef).array()
       case stage: ResultStage =>
         closureSerializer.serialize((stage.rdd, stage.resultOfJob.get.func): AnyRef).array()
     }

     // Wrap the task information in a broadcast variable that is broadcast to every Executor.
     taskBinary = sc.broadcast(taskBinaryBytes)
   } catch {
     // In the case of a failure during serialization, abort the stage.
     case e: NotSerializableException =>
       abortStage(stage, "Task not serializable: " + e.toString)
       runningStages -= stage

       // Abort execution
       return
     case NonFatal(e) =>
       abortStage(stage, s"Task serialization failed: $e\n${e.getStackTraceString}")
       runningStages -= stage
       return
   }
   // tasks describes every task of the stage: the stage id the task belongs to, the partition it
   // processes, the preferred host locations of that partition, and the executor id.
   val tasks: Seq[Task[_]] = try {
     stage match {
       case stage: ShuffleMapStage =>
         partitionsToCompute.map { id =>
           // Get the nodes where the task should run: tasks are preferentially launched on the
           // nodes that already hold the data; this is where ShuffleMapStage is used.
           val locs = getPreferredLocs(stage.rdd, id)
           val part = stage.rdd.partitions(id)
           new ShuffleMapTask(stage.id, taskBinary, part, locs) // taskBinary is the broadcast variable
         }

       case stage: ResultStage =>
         val job = stage.resultOfJob.get
         partitionsToCompute.map { id =>
           val p: Int = job.partitions(id)
           val part = stage.rdd.partitions(p)
           val locs = getPreferredLocs(stage.rdd, p)
           new ResultTask(stage.id, taskBinary, part, locs, id)
         }
     }
   } catch {
     case NonFatal(e) =>
       abortStage(stage, s"Task creation failed: $e\n${e.getStackTraceString}")
       runningStages -= stage
       return
   }

   if (tasks.size > 0) {
     logInfo("Submitting " + tasks.size + " missing tasks from " + stage + " (" + stage.rdd + ")")
     stage.pendingTasks ++= tasks
     logDebug("New pending tasks: " + stage.pendingTasks)
     taskScheduler.submitTasks(
       new TaskSet(tasks.toArray, stage.id, stage.newAttemptId(), stage.firstJobId, properties))
     // Set StageInfo.submissionTime to mark that this TaskSet will be executed and will not be skipped.
     stage.latestInfo.submissionTime = Some(clock.getTimeMillis())
   } else {
     // ... (else branch omitted in this excerpt)

If a Stage of the job is never decomposed into a TaskSet and submitted for execution, that Stage and its corresponding tasks are marked as skipped stages and skipped tasks in the statistics displayed on the UI.

Which Stages are never decomposed into TaskSets for execution?

When Spark submits a job, it sends a JobSubmitted event. When DAGScheduler.doOnReceive receives the JobSubmitted event, it calls DAGScheduler.handleJobSubmitted to process the submission.

DAGScheduler.handleJobSubmitted first calls DAGScheduler.newResultStage to create the final stage. Through the chain of calls below, newResultStage eventually reaches DAGScheduler.registerShuffleDependencies, which adds all of the final RDD's ancestor stages to the DAGScheduler.jobIdToStageIds HashMap. handleJobSubmitted then looks up the StageInfo of every Stage of the job, collects them into a Seq, and posts the SparkListenerJobStart event.
DAGScheduler.newResultStage ->
DAGScheduler.getParentStagesAndId ->
DAGScheduler.getParentStages ->
DAGScheduler.getShuffleMapStage ->
DAGScheduler.registerShuffleDependencies

DAGScheduler.registerShuffleDependencies first calls DAGScheduler.getAncestorShuffleDependencies to find all ancestor shuffle dependencies of the current RDD (its parents, grandparents, and so on), and then calls DAGScheduler.newOrUsedShuffleStage to create a ShuffleMapStage for each ancestor shuffle dependency:

private def registerShuffleDependencies(shuffleDep: ShuffleDependency[_, _, _], firstJobId: Int) {
    // Find all ancestor shuffle dependencies of this rdd: parents, grandparents, and beyond.
    val parentsWithNoMapStage = getAncestorShuffleDependencies(shuffleDep.rdd)
    while (parentsWithNoMapStage.nonEmpty) {
      val currentShufDep = parentsWithNoMapStage.pop()
      // Create a Stage from the ShuffleDependency and the job id. Because dependencies are popped
      // off a stack, the root stage is created first, so earlier-created stages get smaller shuffle ids.
      val stage = newOrUsedShuffleStage(currentShufDep, firstJobId)
      shuffleToMapStage(currentShufDep.shuffleId) = stage
    }
  }
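As a concrete, hedged illustration of what counts as an ancestor shuffle dependency (the path and names below are examples only), the lineage contains two shuffles; the stage-creation path described above ends up registering a ShuffleMapStage for each of them when count() is submitted:

val words   = sc.textFile("hdfs:///path/to/text")                 // example input path
val counts  = words.map(w => (w, 1)).reduceByKey(_ + _)           // shuffle dependency 1
val byCount = counts.map { case (w, n) => (n, w) }.groupByKey()   // shuffle dependency 2
byCount.count()  // one ResultStage plus two ancestor ShuffleMapStages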

DAGScheduler.newOrUsedShuffleStage calls DAGScheduler.newShuffleMapStage to create the stage. After creating it, DAGScheduler.newShuffleMapStage calls DAGScheduler.updateJobIdStageIdMaps, which adds the newly created stage.id to DAGScheduler.jobIdToStageIds. The source code is as follows:

private def updateJobIdStageIdMaps(jobId: Int, stage: Stage): Unit = {
   def updateJobIdStageIdMapsList(stages: List[Stage]) {
     if (stages.nonEmpty) {
       val s = stages.head
       s.jobIds += jobId
       jobIdToStageIds.getOrElseUpdate(jobId, new HashSet[Int]()) += s.id // add the stage id to jobIdToStageIds
       val parents: List[Stage] = getParentStages(s.rdd, jobId)
       val parentsWithoutThisJobId = parents.filter { ! _.jobIds.contains(jobId) }
       updateJobIdStageIdMapsList(parentsWithoutThisJobId ++ stages.tail)
     }
   }
   updateJobIdStageIdMapsList(List(stage))
 }

The source code of DAGScheduler.handleJobSubmitted is as follows:

private[scheduler] def handleJobSubmitted(jobId: Int,
      finalRDD: RDD[_],
      func: (TaskContext, Iterator[_]) => _,
      partitions: Array[Int],
      allowLocal: Boolean,
      callSite: CallSite,
      listener: JobListener,
      properties: Properties) {
    var finalStage: ResultStage = null
    try {
      // New stage creation may throw an exception if, for example, jobs are run on a
      // HadoopRDD whose underlying HDFS files have been deleted.
      // Create the ResultStage; inside this call every Stage the job may go through is created and registered.
      finalStage = newResultStage(finalRDD, partitions.size, jobId, callSite)
    } catch {
      case e: Exception =>
        logWarning("Creating new stage failed due to exception - job: " + jobId, e)
        listener.jobFailed(e)
        return
    }
    if (finalStage != null) {
      val job = new ActiveJob(jobId, finalStage, func, partitions, callSite, listener, properties)
      clearCacheLocs()
      logInfo("Got job %s (%s) with %d output partitions (allowLocal=%s)".format(
        job.jobId, callSite.shortForm, partitions.length, allowLocal))
      logInfo("Final stage: " + finalStage + "(" + finalStage.name + ")")
      logInfo("Parents of final stage: " + finalStage.parents)
      logInfo("Missing parents: " + getMissingParentStages(finalStage))
      val shouldRunLocally =
        localExecutionEnabled && allowLocal && finalStage.parents.isEmpty && partitions.length == 1
      val jobSubmissionTime = clock.getTimeMillis()
      if (shouldRunLocally) {
        // Compute very short actions like first() or take() with no parent stages locally.
        listenerBus.post(
          SparkListenerJobStart(job.jobId, jobSubmissionTime, Seq.empty, properties))
        runLocally(job)
      } else {
        jobIdToActiveJob(jobId) = job
        activeJobs += job
        finalStage.resultOfJob = Some(job)
        // Get all Stage ids of this job; the job's Stages were created while newResultStage ran,
        // so this lookup succeeds here.
        val stageIds = jobIdToStageIds(jobId).toArray
        // Get the StageInfo of each Stage.
        val stageInfos = stageIds.flatMap(id => stageIdToStage.get(id).map(_.latestInfo))
        // Post the job start event SparkListenerJobStart.
        listenerBus.post(
          SparkListenerJobStart(job.jobId, jobSubmissionTime, stageInfos, properties))
        submitStage(finalStage)
      }
    }
    submitWaitingStages()
  }

In conclusion:

JobProgressListener.onJobStart receives and processes SparkListenerJobStart events. It puts all the StageInfo objects created by DAGScheduler.handleJobSubmitted into the JobProgressListener.stageIdToInfo HashMap.
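As a hedged sketch of the same callback pattern (not Spark source), a custom listener can see exactly which StageInfos handleJobSubmitted attached to the SparkListenerJobStart event, including stages that may later be reported as skipped:

import org.apache.spark.scheduler.{SparkListener, SparkListenerJobStart}

class JobStartLogger extends SparkListener {
  override def onJobStart(jobStart: SparkListenerJobStart): Unit = {
    val ids = jobStart.stageInfos.map(_.stageId).mkString(", ")
    println(s"Job ${jobStart.jobId} starts with stages: $ids")
  }
}

// Registration, assuming an existing SparkContext named sc:
// sc.addSparkListener(new JobStartLogger)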

So far we can draw a conclusion: the JobProgressListener.stageIdToInfo entries processed in JobProgressListener.onJobEnd are generated while DAGScheduler.handleJobSubmitted runs, that is, before any of the job's stages are decomposed into tasks.

From the analysis above of how a Stage is decomposed into a TaskSet, we know that if an RDD has already been cached in the BlockManager, none of the ancestor Stages of that RDD are decomposed into TaskSets for execution. StageInfo.submissionTime therefore remains empty for those ancestor Stages (submissionTime.isEmpty returns true), so those Stages and their corresponding tasks are displayed in the Spark UI as skipped. For stages that do execute, JobProgressListener.onStageCompleted runs after completion and saves the Stage information to JobProgressListener.stageIdToInfo. The source code is as follows:

override def onStageCompleted(stageCompleted: SparkListenerStageCompleted): Unit = synchronized {
    val stage = stageCompleted.stageInfo
    stageIdToInfo(stage.stageId) = stage // save the Stage's info so it can be tracked and displayed
    val stageData = stageIdToData.getOrElseUpdate((stage.stageId, stage.attemptId), {
      logWarning("Stage completed for unknown stage " + stage.stageId)
      new StageUIData
    })
    // ... (remainder of the method omitted in this excerpt)

After all tasks in a Stage's TaskSet finish successfully, the Stage's StageInfo is written back into JobProgressListener.stageIdToInfo, so those tasks are not displayed as skipped.

It is normal for tasks to appear as skipped: the data they would compute has already been cached in memory, so there is no need to recompute it. Skipped stages and tasks have no effect on the result.
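A small hedged check of that claim (assuming a live SparkContext named sc): the second action below triggers a job whose parent stage is skipped, yet it returns exactly the same result as the first:

val totals = sc.parallelize(1 to 10000).map(i => (i % 7, i)).reduceByKey(_ + _).cache()
val first  = totals.collectAsMap()   // Job 0: every stage executes
val second = totals.collectAsMap()   // Job 1: the parent ShuffleMapStage is skipped
assert(first == second)              // skipped stages do not change the result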

Origin: blog.csdn.net/qq_32727095/article/details/113740277