Spark speculative execution

Spark speculative execution is an optimization technique.

In Spark, speculative execution (Speculative Execution) identifies slow-running tasks and launches duplicate copies of them on Executors of other nodes, processing the same data in parallel. Whichever copy finishes first is used and the remaining copy is killed, which speeds up overall task processing. It is suitable for scenarios where a few Spark tasks hang or run slowly and drag down the running time of the whole job.

Notes:

    1. Not every slow-running Spark task can be fixed by speculative execution.

    2. Use speculative execution with care: it needs an appropriate scenario and appropriate parameters. Unreasonable parameters may cause a large number of speculative tasks to occupy cluster resources.

    3. If Spark Streaming writes to Kafka slowly and speculative execution is enabled, it may produce duplicate data.

    4. A task that already has a speculative copy running will not be speculated again.

Spark speculative execution parameters

spark.speculation: The default is false. Whether to enable speculative execution.

spark.speculation.interval: The default is 100ms. How often to check for tasks that should be speculatively executed.

spark.speculation.multiplier: The default is 1.5. Within a stage, a task whose running time exceeds 1.5 times the median running time of the successfully completed tasks becomes a candidate for speculative execution.

spark.speculation.quantile: The default is 0.75. The speculation quantile: within a stage, at least 75% of the tasks must have completed before speculation starts.
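
To make the interaction of these parameters concrete, here is a small, self-contained Scala sketch of the decision rule with made-up numbers. It is illustrative only, not Spark's implementation; the real logic lives in TaskSetManager and is walked through in the source analysis below.

// Illustrative sketch only: a simplified version of the speculation decision rule.
object SpeculationRuleSketch {
  def main(args: Array[String]): Unit = {
    val speculationQuantile   = 0.75    // spark.speculation.quantile
    val speculationMultiplier = 1.5     // spark.speculation.multiplier

    val numTasks      = 100                     // tasks in the stage
    val finishedMs    = Seq.fill(80)(60000L)    // 80 tasks finished, each taking about 60 s
    val runningTaskMs = 150000L                 // one straggler has been running for 150 s

    // Speculation only starts once enough tasks in the stage have finished...
    val minFinished    = (speculationQuantile * numTasks).floor.toInt   // 75
    val enoughFinished = finishedMs.size >= minFinished

    // ...and a running task becomes a candidate once it exceeds the threshold.
    val medianMs     = finishedMs.sorted.apply(finishedMs.size / 2)     // simplified median: 60000 ms
    val thresholdMs  = speculationMultiplier * medianMs                 // 90000 ms
    val speculatable = enoughFinished && runningTaskMs > thresholdMs

    println(s"minFinished=$minFinished, threshold=${thresholdMs}ms, speculatable=$speculatable")
  }
}

With the default values, a straggler is only duplicated after 75 of the 100 tasks have finished and its running time has passed 1.5 times the median duration of the finished tasks.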

Spark speculative execution source code analysis

Source code analysis

  /**
    * When TaskSchedulerImpl starts, it decides whether to enable speculative execution of tasks.
    */
  override def start() {
    backend.start()
    if (!isLocal && conf.getBoolean("spark.speculation", false)) {
      logInfo("Starting speculative execution thread")
      // scheduleWithFixedDelay comes from `java.util.concurrent.ScheduledExecutorService#scheduleWithFixedDelay`
      // scheduleWithFixedDelay means: after an initial delay of the `first SPECULATION_INTERVAL_MS`, the
      // periodic task starts running, and then runs once every `second SPECULATION_INTERVAL_MS`.
      // SPECULATION_INTERVAL_MS can be set via the `spark.speculation.interval` parameter.
      speculationScheduler.scheduleWithFixedDelay(new Runnable {
        override def run(): Unit = Utils.tryOrStopSparkContext(sc) {
          // Check for tasks that need to be speculatively executed
          checkSpeculatableTasks()
        }
      }, SPECULATION_INTERVAL_MS, SPECULATION_INTERVAL_MS, TimeUnit.MILLISECONDS)
    }
  }

If speculative execution is enabled (that is, spark.speculation=true) and the application is not running in local mode, TaskSchedulerImpl waits for an initial delay of spark.speculation.interval (the first SPECULATION_INTERVAL_MS above) and then, every spark.speculation.interval (the second SPECULATION_INTERVAL_MS above), the speculation thread checks for tasks that need to be speculatively executed.
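
The scheduling semantics here are simply those of the JDK's ScheduledExecutorService. A minimal standalone sketch (not Spark code) that mirrors the call above:

import java.util.concurrent.{Executors, TimeUnit}

object FixedDelaySketch {
  def main(args: Array[String]): Unit = {
    // Stands in for SPECULATION_INTERVAL_MS (spark.speculation.interval); value chosen for the demo.
    val intervalMs = 100L
    val scheduler  = Executors.newSingleThreadScheduledExecutor()

    // The first run happens after an initial delay of intervalMs; after that the task repeats
    // with a delay of intervalMs between the end of one run and the start of the next.
    scheduler.scheduleWithFixedDelay(new Runnable {
      override def run(): Unit =
        println(s"checking for speculatable tasks at ${System.currentTimeMillis()}")
    }, intervalMs, intervalMs, TimeUnit.MILLISECONDS)

    Thread.sleep(1000)   // let it fire a few times
    scheduler.shutdown()
  }
}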

Following the checkSpeculatableTasks() call leads to org.apache.spark.scheduler.TaskSchedulerImpl#checkSpeculatableTasks, shown below:
 

 def checkSpeculatableTasks() {
    var shouldRevive = false
    synchronized {
      // MIN_TIME_TO_SPECULATION: a duplicate copy of a task is launched only after the original copy
      // has been running for at least this long.
      shouldRevive = rootPool.checkSpeculatableTasks(MIN_TIME_TO_SPECULATION)
    }
    if (shouldRevive) {
      // If there are tasks that need speculative execution, the SchedulerBackend sends a reviveOffers
      // message to the ApplicationMaster to obtain the list of available executors in the cluster and launch the tasks.
      backend.reviveOffers()
    }
  }

As you can see, this method internally calls rootPool.checkSpeculatableTasks(MIN_TIME_TO_SPECULATION), shown in the following code:

  override def checkSpeculatableTasks(minTimeToSpeculation: Int): Boolean = {
    var shouldRevive = false
    // schedulableQueue is of type ConcurrentLinkedQueue[Schedulable]; the Schedulable trait has two kinds
    // of scheduling entities: Pool and TaskSetManager
    for (schedulable <- schedulableQueue.asScala) {
      shouldRevive |= schedulable.checkSpeculatableTasks(minTimeToSpeculation)
    }
    shouldRevive
  }

As you can see, schedulable.checkSpeculatableTasks(minTimeToSpeculation) is eventually called for each element.

schedulable is an element of schedulableQueue, which is of type ConcurrentLinkedQueue[Schedulable]; the Schedulable trait has two kinds of scheduling entities: Pool and TaskSetManager.
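
To see why a single call on rootPool reaches every TaskSetManager, here is a toy model (not Spark code; the names are reused only for illustration) of a Schedulable tree in which a pool ORs the results of its children and the leaves decide whether they have speculatable tasks:

object SchedulableTreeSketch {
  trait Schedulable {
    def checkSpeculatableTasks(minTimeToSpeculation: Int): Boolean
  }

  // A pool delegates to its children and ORs their answers together.
  class Pool(children: Seq[Schedulable]) extends Schedulable {
    override def checkSpeculatableTasks(minTimeToSpeculation: Int): Boolean =
      children.foldLeft(false)((acc, s) => acc | s.checkSpeculatableTasks(minTimeToSpeculation))
  }

  // A leaf stands in for a TaskSetManager, which knows whether it has slow tasks.
  class TaskSetManager(hasSlowTask: Boolean) extends Schedulable {
    override def checkSpeculatableTasks(minTimeToSpeculation: Int): Boolean = hasSlowTask
  }

  def main(args: Array[String]): Unit = {
    val rootPool = new Pool(Seq(
      new TaskSetManager(hasSlowTask = false),
      new Pool(Seq(new TaskSetManager(hasSlowTask = true)))
    ))
    println(rootPool.checkSpeculatableTasks(100))   // true: one leaf reports a slow task
  }
}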

The org.apache.spark.scheduler.TaskSetManager#checkSpeculatableTasks method contains the actual logic for detecting speculatable tasks, as follows:
 

// The actual logic that detects tasks to speculatively execute
  override def checkSpeculatableTasks(minTimeToSpeculation: Int): Boolean = {
    // Can't speculate if we only have one task, and no need to speculate if the task set is a
    // zombie or is from a barrier stage.
    if (isZombie || isBarrier || numTasks == 1) {
      return false
    }
    var foundTasks = false
    // minFinishedForSpeculation = SPECULATION_QUANTILE * numTasks
    // SPECULATION_QUANTILE is spark.speculation.quantile
    // numTasks is the total number of tasks in this stage's TaskSet.
    val minFinishedForSpeculation = (SPECULATION_QUANTILE * numTasks).floor.toInt
    logDebug("Checking for speculative tasks: minFinished = " + minFinishedForSpeculation)

    // 1) The number of successful tasks must be at least `spark.speculation.quantile * numTasks`
    //    before this TaskSet is considered.
    if (tasksSuccessful >= minFinishedForSpeculation && tasksSuccessful > 0) {
      val time = clock.getTimeMillis()
      // medianDuration: the median running time of the tasks that have already succeeded
      // threshold = max(SPECULATION_MULTIPLIER * medianDuration, minTimeToSpeculation)
      // SPECULATION_MULTIPLIER is spark.speculation.multiplier
      val medianDuration = successfulTaskDurations.median
      val threshold = max(SPECULATION_MULTIPLIER * medianDuration, minTimeToSpeculation)
      // TODO: Threshold should also look at standard deviation of task durations and have a lower
      // bound based on that.
      logDebug("Task length threshold for speculation: " + threshold)
      // 2) Iterate over every running task in the TaskSet
      for (tid <- runningTasksSet) {
        val info = taskInfos(tid)
        val index = info.index
        // 3) If the task has not yet succeeded, has exactly one copy running, has been running longer
        //    than the threshold, and is not already marked as speculatable, add it to the list of
        //    tasks that need speculative execution.
        if (!successful(index) && copiesRunning(index) == 1 && info.timeRunning(time) > threshold &&
          !speculatableTasks.contains(index)) {
          logInfo(
            "Marking task %d in stage %s (on %s) as speculatable because it ran more than %.0f ms"
              .format(index, taskSet.id, info.host, threshold))
          speculatableTasks += index
          // 4) Finally, the DAGScheduler is notified that a speculative task has been submitted;
          //    a background thread will process the submitted task.
          sched.dagScheduler.speculativeTaskSubmitted(tasks(index))
          foundTasks = true
        }
      }
    }
    foundTasks
  }

(Figure: the general flow of detecting and launching speculative tasks)

Spark speculative execution example

Code example

package com.bigdata.spark

import org.apache.spark.TaskContext
import org.apache.spark.sql.SparkSession
import org.slf4j.LoggerFactory

/**
  * Author: Wang Pei
  * License: Copyright(c) Pei.Wang
  * Summary:
  *   Spark speculative execution
  */
object SparkSpeculative {
  def main(args: Array[String]): Unit = {

    @transient lazy val logger = LoggerFactory.getLogger(this.getClass)

    val spark = SparkSession.builder()
      // Enable Spark speculative execution
      .config("spark.speculation", true)
      .config("spark.speculation.interval", 1000)
      .config("spark.speculation.multiplier", 1.5)
      .config("spark.speculation.quantile", 0.10)
      .getOrCreate()


    logger.info("Start processing.........................................")

    // Use a parallelism of 5 so that 5 tasks run simultaneously in one stage.
    // To guarantee that the 5 tasks run at the same time, give the job 5 cores at spark-submit time.
    // This makes it easy to observe the 4th task (partition index 3) being speculatively executed.
    spark.sparkContext.parallelize(0 to 50, 5)
      .foreach(item => {
        if (item == 38) { Thread.sleep(200000) }
        val taskContext = TaskContext.get()
        val stageId = taskContext.stageId()
        val taskAttemptId = taskContext.taskAttemptId()
        logger.info(s"Current stage: ${stageId}, task: ${taskAttemptId}, printed number..............${item}..................")
      })

    logger.info("Processing finished.........................................")
  }
}

Task submission

/data/apps/spark-2.4.0-bin-2.7.3.2.6.5.3-10/bin/spark-submit \
    --master yarn \
    --deploy-mode cluster \
    --driver-memory 1g \
    --executor-memory 1g \
    --executor-cores  1 \
    --num-executors  5 \
    --queue offline \
    --name SparkSpeculative \
    --class com.bigdata.spark.SparkSpeculative \
    bigdata_spark.jar

Yarn log view

You can see the following logs on Yarn:

19/03/31 04:21:39 INFO scheduler.TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 318 ms on x.x.x.x (executor 4) (1/5)
19/03/31 04:21:39 INFO scheduler.TaskSetManager: Finished task 4.0 in stage 0.0 (TID 4) in 321 ms on x.x.x.x (executor 2) (2/5)
19/03/31 04:21:39 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 338 ms on x.x.x.x (executor 1) (3/5)
19/03/31 04:21:39 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 327 ms on x.x.x.x (executor 3) (4/5)
# Task 3 is marked as speculatable (the 486 ms threshold is 1.5 × the 324 ms median duration of the four finished tasks)
19/03/31 04:21:40 INFO scheduler.TaskSetManager: Marking task 3 in stage 0.0 (on x.x.x.x) as speculatable because it ran more than 486 ms
# A speculative copy of task 3 is started as attempt 3.1 (TID 5)
19/03/31 04:21:40 INFO scheduler.TaskSetManager: Starting task 3.1 in stage 0.0 (TID 5, x.x.x.x, executor 3, partition 3, PROCESS_LOCAL, 7855 bytes)
# The speculative copy of task 3 (TID 5) is killed because the original attempt has already succeeded
19/03/31 04:24:59 INFO scheduler.TaskSetManager: Killing attempt 1 for task 3.1 in stage 0.0 (TID 5) on x.x.x.x as the attempt 0 succeeded on x.x.x.x
19/03/31 04:24:59 INFO scheduler.TaskSetManager: Finished task 3.0 in stage 0.0 (TID 3) in 200311 ms on x.x.x.x (executor 5) (5/5)

Spark WebUI view

The same speculative attempt (task 3.1, killed after attempt 0 succeeded) can also be seen on the Spark Web UI.


Origin: blog.csdn.net/qq_32445015/article/details/115308734