Why speculation is needed
We all know that in a Spark job, when a stage finishes is determined by when its last task finishes, and a task's completion time is influenced by many factors: partition assignment, executor resource usage, the state of the host it runs on, the cluster network, and so on. In many cases a task runs slowly simply because of its runtime environment, and re-launching a copy of it can alleviate the problem, which is why Spark supports speculation (speculative execution). In this post we take a detailed look at what Spark speculation is.
spark.speculation
Spark's configuration documentation describes the speculation-related parameters as follows:
property name | default | meaning |
---|---|---|
spark.speculation | false | If set to "true", performs speculative execution of tasks: slow-running tasks in a stage get the chance to be re-launched. |
spark.speculation.interval | 100ms | How often Spark checks for tasks to speculate. |
spark.speculation.multiplier | 1.5 | How many times slower than the median task duration a task must be (the threshold factor) before it is considered for re-launch. |
spark.speculation.quantile | 0.75 | Fraction of tasks in a stage that must be complete before speculation is enabled for that stage. |
Note that speculation always operates within a single stage: tasks in different stages do not affect each other, and only tasks that are currently running are candidates. Once speculative execution is enabled, Spark takes the result of whichever copy of a task finishes first and marks that task as completed.
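As a quick illustration, these parameters can be set programmatically when constructing the SparkConf. This is a minimal sketch, not taken from any particular job; the app name and master URL are placeholders:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Minimal sketch: enable speculation and tune its knobs.
// The app name and master URL here are placeholders.
val conf = new SparkConf()
  .setAppName("speculation-demo")
  .setMaster("yarn")
  .set("spark.speculation", "true")            // turn speculative execution on
  .set("spark.speculation.interval", "100ms")  // how often to run the check
  .set("spark.speculation.multiplier", "1.5")  // slower-than-median factor
  .set("spark.speculation.quantile", "0.75")   // fraction of tasks that must finish first

val sc = new SparkContext(conf)
```

Note (visible in the start() code below) that speculation is skipped entirely when running in local mode.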
How speculation works
From the configuration parameters above it is not hard to infer the workflow: every spark.speculation.interval, once at least spark.speculation.quantile of a stage's tasks have finished successfully, Spark computes the median duration of the successful tasks; any still-running task whose elapsed time exceeds spark.speculation.multiplier times that median is marked speculatable, and a second copy of it is submitted.
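To make the thresholds concrete, here is a small self-contained sketch of the arithmetic described above; the task count and durations are made-up numbers for illustration:

```scala
object SpeculationMath extends App {
  val numTasks   = 10
  val quantile   = 0.75 // spark.speculation.quantile
  val multiplier = 1.5  // spark.speculation.multiplier

  // Speculation only kicks in once this many tasks in the stage have succeeded.
  val minFinishedForSpeculation = (quantile * numTasks).floor.toInt // 7

  // Suppose the median duration of the successful tasks is 10 000 ms.
  val medianDurationMs = 10000.0
  val minTimeToSpeculationMs = 100.0 // lower bound, so very short tasks are not speculated
  val thresholdMs = math.max(multiplier * medianDurationMs, minTimeToSpeculationMs) // 15 000 ms

  // Any still-running task older than thresholdMs becomes a speculation candidate.
  println(s"speculate after $minFinishedForSpeculation successes; threshold = $thresholdMs ms")
}
```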
Spark源码
The TaskScheduler's start function is invoked when the SparkContext is initialized. If speculation is enabled (and the job is not running in local mode), it schedules the periodic speculation check:
```scala
override def start() {
  backend.start()

  if (!isLocal && conf.getBoolean("spark.speculation", false)) {
    logInfo("Starting speculative execution thread")
    speculationScheduler.scheduleWithFixedDelay(new Runnable {
      override def run(): Unit = Utils.tryOrStopSparkContext(sc) {
        checkSpeculatableTasks()
      }
    }, SPECULATION_INTERVAL_MS, SPECULATION_INTERVAL_MS, TimeUnit.MILLISECONDS)
  }
}
```
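The periodic check above relies on a plain java.util.concurrent scheduled executor. As a standalone sketch of the same fixed-delay pattern (not Spark code), with a 100 ms delay standing in for SPECULATION_INTERVAL_MS:

```scala
import java.util.concurrent.{Executors, TimeUnit}

val speculationScheduler = Executors.newSingleThreadScheduledExecutor()

// With scheduleWithFixedDelay, the next run starts a fixed delay *after* the
// previous run completes, so a slow check never causes runs to pile up.
speculationScheduler.scheduleWithFixedDelay(new Runnable {
  override def run(): Unit = println("checking for speculatable tasks...")
}, 100, 100, TimeUnit.MILLISECONDS)

Thread.sleep(500) // let the check fire a few times
speculationScheduler.shutdown()
```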
In TaskSetManager, the check that detects tasks eligible for speculation:
```scala
/**
 * Check for tasks to be speculated and return true if there are any. This is called periodically
 * by the TaskScheduler.
 */
override def checkSpeculatableTasks(minTimeToSpeculation: Int): Boolean = {
  // Can't speculate if we only have one task, and no need to speculate if the task set is a
  // zombie.
  if (isZombie || numTasks == 1) {
    return false
  }
  var foundTasks = false
  val minFinishedForSpeculation = (SPECULATION_QUANTILE * numTasks).floor.toInt
  logDebug("Checking for speculative tasks: minFinished = " + minFinishedForSpeculation)

  if (tasksSuccessful >= minFinishedForSpeculation && tasksSuccessful > 0) {
    val time = clock.getTimeMillis()
    val medianDuration = successfulTaskDurations.median
    val threshold = max(SPECULATION_MULTIPLIER * medianDuration, minTimeToSpeculation)
    // TODO: Threshold should also look at standard deviation of task durations and have a lower
    // bound based on that.
    logDebug("Task length threshold for speculation: " + threshold)
    for (tid <- runningTasksSet) {
      val info = taskInfos(tid)
      val index = info.index
      if (!successful(index) && copiesRunning(index) == 1 && info.timeRunning(time) > threshold &&
        !speculatableTasks.contains(index)) {
        logInfo(
          "Marking task %d in stage %s (on %s) as speculatable because it ran more than %.0f ms"
            .format(index, taskSet.id, info.host, threshold))
        speculatableTasks += index
        sched.dagScheduler.speculativeTaskSubmitted(tasks(index))
        foundTasks = true
      }
    }
  }
  foundTasks
}
```
The DAGScheduler then re-launches the task; this is driven by Spark's internal event mechanism:
```scala
/**
 * Called by the TaskSetManager when it decides a speculative task is needed.
 */
def speculativeTaskSubmitted(task: Task[_]): Unit = {
  eventProcessLoop.post(SpeculativeTaskSubmitted(task))
}
```
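The event loop itself is essentially a queue plus a dedicated consumer thread. The following is a simplified, hypothetical sketch of that pattern, not DAGScheduler's actual implementation; SpeculativeTaskSubmitted here is a stand-in case class:

```scala
import java.util.concurrent.LinkedBlockingQueue

sealed trait SchedulerEvent
case class SpeculativeTaskSubmitted(taskIndex: Int) extends SchedulerEvent

class EventLoopSketch {
  private val queue = new LinkedBlockingQueue[SchedulerEvent]()

  private val consumer = new Thread(new Runnable {
    override def run(): Unit = while (true) {
      queue.take() match {
        case SpeculativeTaskSubmitted(idx) =>
          // This is where the real scheduler would arrange to launch
          // a second copy of the slow task.
          println(s"re-launching a copy of task $idx")
      }
    }
  })
  consumer.setDaemon(true)
  consumer.start()

  // post() returns immediately; the consumer thread handles events in order.
  def post(event: SchedulerEvent): Unit = queue.put(event)
}
```

Because post() is non-blocking, the TaskSetManager can hand a slow task off without stalling the scheduling loop.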
References
https://spark.apache.org/docs/latest/configuration.html