[Spark] Scheduling Modes (FIFO & FAIR)

Foreword

Scheduling in Spark happens at two levels. The first is scheduling between Spark applications, which is handled by YARN. The second is scheduling among the multiple TaskSetManagers inside a single Spark application (the same SparkContext). This article analyzes the second.

Spark has two scheduling modes: FIFO (first in, first out) and FAIR (fair scheduling). The default is FIFO: whoever submits first executes first. FAIR groups work into scheduling pools, which can have different weights, and decides what executes first according to weights, resources, and so on. The scheduling mode is set with spark.scheduler.mode.
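As a small illustration of how a mode string can be resolved, the sketch below models the lookup in plain Scala, without Spark on the classpath. The Mode enumeration, the ModeResolver helper and the Map-based configuration are stand-ins invented for this example; only the key spark.scheduler.mode and the FIFO default come from the text above. Upper-casing the value is a choice of this sketch.

```scala
// Simplified stand-in for Spark's scheduling mode enumeration (assumption: not the real class).
object Mode extends Enumeration {
  val FAIR, FIFO = Value
}

object ModeResolver {
  // Hypothetical helper: read spark.scheduler.mode from a plain Map, defaulting to FIFO.
  def resolve(conf: Map[String, String]): Mode.Value = {
    val raw = conf.getOrElse("spark.scheduler.mode", "FIFO")
    try {
      Mode.withName(raw.toUpperCase)
    } catch {
      // Enumeration.withName throws NoSuchElementException for unknown names.
      case _: NoSuchElementException =>
        throw new IllegalArgumentException(s"Unsupported spark.scheduler.mode: $raw")
    }
  }
}
```

In a real application the key is usually set on SparkConf, for example new SparkConf().set("spark.scheduler.mode", "FAIR"), or passed to spark-submit with --conf.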

Scheduling pool initialization

After DAGScheduler divides the job into stages and submits each one to TaskScheduler as a TaskSet, the TaskScheduler implementation class creates a TaskSetManager object for each TaskSet and adds it to the scheduling pool:

schedulableBuilder.addTaskSetManager(manager, manager.taskSet.properties)

schedulableBuilder is instantiated when SparkContext creates the scheduler: after new TaskSchedulerImpl(sc), SparkContext calls scheduler.initialize(backend), and the initialize method builds the schedulableBuilder.

def initialize(backend: SchedulerBackend) {
    this.backend = backend
    // temporarily set rootPool name to empty
    rootPool = new Pool("", schedulingMode, 0, 0)
    schedulableBuilder = {
      schedulingMode match {
        case SchedulingMode.FIFO =>
          new FIFOSchedulableBuilder(rootPool)
        case SchedulingMode.FAIR =>
          new FairSchedulableBuilder(rootPool, conf)
        case _ =>
          throw new IllegalArgumentException(s"Unsupported spark.scheduler.mode: $schedulingMode")
      }
    }
    schedulableBuilder.buildPools()
  }

The program creates a different SchedulableBuilder according to the configuration. There are two implementations, FIFOSchedulableBuilder and FairSchedulableBuilder, and schedulableBuilder.buildPools() is then called on whichever was chosen. Let's see how each implements it.

override def buildPools() {
    // nothing
  }

FIFOSchedulableBuilder does nothing.

override def buildPools() {
    var is: Option[InputStream] = None
    try {
      is = Option {
        schedulerAllocFile.map { f =>
          new FileInputStream(f)
        }.getOrElse {
          Utils.getSparkClassLoader.getResourceAsStream(DEFAULT_SCHEDULER_FILE)
        }
      }
      // build the fair scheduler pools from the configuration file
      is.foreach { i => buildFairSchedulerPool(i) }
    } finally {
      is.foreach(_.close())
    }

    // finally create "default" pool
    buildDefaultPool()
  }

The buildPools method of FairSchedulableBuilder first reads the configuration file for FAIR mode. By default this is fairscheduler.xml, loaded from the classpath (normally SPARK_HOME/conf/fairscheduler.xml). A user-defined configuration file can be set with the parameter spark.scheduler.allocation.file. The template is as follows:

<allocations>
  <pool name="production">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>2</minShare>
  </pool>
  <pool name="test">
    <schedulingMode>FIFO</schedulingMode>
    <weight>2</weight>
    <minShare>3</minShare>
  </pool>
</allocations>

Where:

  • name: The name of the scheduling pool. A program selects a pool by setting spark.scheduler.pool. If none is specified, the scheduling pool named default is used.
  • schedulingMode: The scheduling mode used inside the pool (FIFO or FAIR).
  • weight: The weight of the pool. A pool with weight 2 is allocated twice the resources of a pool with weight 1. The default is 1.
  • minShare: The minimum number of resources (cores) required by the scheduling pool. The default is 0.

FAIR mode can configure multiple scheduling pools: the rootPool contains a group of Pools, and those Pools contain the TaskSetManagers.
FairSchedulableBuilder creates the pools from the configuration file in buildFairSchedulerPool.

private def buildFairSchedulerPool(is: InputStream) {
    val xml = XML.load(is)
    for (poolNode <- (xml \\ POOLS_PROPERTY)) {

      val poolName = (poolNode \ POOL_NAME_PROPERTY).text
      var schedulingMode = DEFAULT_SCHEDULING_MODE
      var minShare = DEFAULT_MINIMUM_SHARE
      var weight = DEFAULT_WEIGHT

      val xmlSchedulingMode = (poolNode \ SCHEDULING_MODE_PROPERTY).text
      if (xmlSchedulingMode != "") {
        try {
          schedulingMode = SchedulingMode.withName(xmlSchedulingMode)
        } catch {
          case e: NoSuchElementException =>
            logWarning(s"Unsupported schedulingMode: $xmlSchedulingMode, " +
              s"using the default schedulingMode: $schedulingMode")
        }
      }

      val xmlMinShare = (poolNode \ MINIMUM_SHARES_PROPERTY).text
      if (xmlMinShare != "") {
        minShare = xmlMinShare.toInt
      }

      val xmlWeight = (poolNode \ WEIGHT_PROPERTY).text
      if (xmlWeight != "") {
        weight = xmlWeight.toInt
      }

      val pool = new Pool(poolName, schedulingMode, minShare, weight)
      rootPool.addSchedulable(pool)
      logInfo("Created pool %s, schedulingMode: %s, minShare: %d, weight: %d".format(
        poolName, schedulingMode, minShare, weight))
    }
  }

A Pool object is instantiated according to each field value (default value if not set) and added to rootPool.
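To make the defaulting behaviour concrete, here is a minimal sketch that parses the same XML layout and fills in defaults for missing fields. It is not Spark's implementation: it uses the JDK's DOM parser instead of scala.xml so that it runs without extra libraries, and PoolSpec and AllocationParser are hypothetical names. The defaults for weight (1) and minShare (0) come from the list above; FIFO as the default pool mode is an assumption of this sketch.

```scala
import java.io.ByteArrayInputStream
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Element

// Hypothetical value class for one parsed pool; not Spark's Pool.
case class PoolSpec(name: String, mode: String, minShare: Int, weight: Int)

object AllocationParser {
  val DefaultMode = "FIFO"   // assumption of this sketch
  val DefaultMinShare = 0
  val DefaultWeight = 1

  // Text of the first descendant element with this tag, or "" when absent.
  private def childText(pool: Element, tag: String): String = {
    val nodes = pool.getElementsByTagName(tag)
    if (nodes.getLength == 0) "" else nodes.item(0).getTextContent.trim
  }

  def parse(xml: String): Seq[PoolSpec] = {
    val doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
      .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")))
    val pools = doc.getElementsByTagName("pool")
    (0 until pools.getLength).map { i =>
      val p = pools.item(i).asInstanceOf[Element]
      val mode = childText(p, "schedulingMode")
      val minShare = childText(p, "minShare")
      val weight = childText(p, "weight")
      // An empty string means the field was not set, so the default applies.
      PoolSpec(
        p.getAttribute("name"),
        if (mode.isEmpty) DefaultMode else mode,
        if (minShare.isEmpty) DefaultMinShare else minShare.toInt,
        if (weight.isEmpty) DefaultWeight else weight.toInt)
    }
  }
}
```

Passing the template shown earlier to AllocationParser.parse yields one PoolSpec per pool element; a pool element with only a name attribute gets all defaults.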

A Spark application contains one TaskScheduler, and a TaskScheduler contains exactly one RootPool. In FIFO mode there is a single layer: the RootPool directly contains the TaskSetManagers. In FAIR mode there are two layers: the RootPool contains sub-Pools, and the sub-Pools contain the TaskSetManagers. The RootPool is created in initialize, just before the SchedulableBuilder is instantiated.

private def buildDefaultPool() {
    if (rootPool.getSchedulableByName(DEFAULT_POOL_NAME) == null) {
      val pool = new Pool(DEFAULT_POOL_NAME, DEFAULT_SCHEDULING_MODE,
        DEFAULT_MINIMUM_SHARE, DEFAULT_WEIGHT)
      rootPool.addSchedulable(pool)
      logInfo("Created default pool %s, schedulingMode: %s, minShare: %d, weight: %d".format(
        DEFAULT_POOL_NAME, DEFAULT_SCHEDULING_MODE, DEFAULT_MINIMUM_SHARE, DEFAULT_WEIGHT))
    }
  }

If there is no scheduling pool named default in the scheduling pool created according to the configuration file, a scheduling pool named default will be created with all parameters being default values.

Add TaskSetManager to the scheduling pool

Both scheduling modes end up adding the TaskSetManager through the same addSchedulable method of Pool, but FAIR first looks up the scheduling pool to use. The default is the scheduling pool named default.

override def addSchedulable(schedulable: Schedulable) {
    require(schedulable != null)
    schedulableQueue.add(schedulable)
    schedulableNameToSchedulable.put(schedulable.name, schedulable)
    schedulable.parent = this
  }

When a TaskSetManager is added, it goes to the tail of the queue, and retrieval starts from the head. In FIFO mode the parent pool is the RootPool; in FAIR mode the parent pool of every TaskSetManager is a child Pool of the RootPool.
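The one-layer and two-layer structures can be sketched with a toy model. Node, TaskSetNode, PoolNode and HierarchyDemo are hypothetical names, not Spark classes, and the task set names are made up. Creating a missing pool on demand in fairAdd is an assumption of this sketch rather than something the text above states.

```scala
import scala.collection.mutable

// Minimal stand-ins for the Schedulable hierarchy (assumption: heavily simplified).
sealed trait Node {
  def name: String
  var parent: Option[PoolNode] = None
}

final class TaskSetNode(val name: String) extends Node

final class PoolNode(val name: String) extends Node {
  // Insertion order is kept: new entries go to the tail.
  val queue = mutable.ArrayBuffer.empty[Node]
  private val byName = mutable.Map.empty[String, Node]

  def add(child: Node): Unit = {
    require(child != null)
    queue += child
    byName(child.name) = child
    child.parent = Some(this)
  }

  def get(childName: String): Option[Node] = byName.get(childName)
}

object HierarchyDemo {
  // FIFO: task sets hang directly off the root pool.
  def fifo(): PoolNode = {
    val root = new PoolNode("")
    root.add(new TaskSetNode("TaskSet_0.0"))
    root.add(new TaskSetNode("TaskSet_1.0"))
    root
  }

  // FAIR: pick the requested pool, fall back to "default", then add to that child pool.
  def fairAdd(root: PoolNode, requested: Option[String], ts: TaskSetNode): Unit = {
    val poolName = requested.getOrElse("default")
    val pool = root.get(poolName) match {
      case Some(p: PoolNode) => p
      case _ =>
        // Assumption of this sketch: create the pool when it does not exist yet.
        val created = new PoolNode(poolName)
        root.add(created)
        created
    }
    pool.add(ts)
  }
}
```

In the FAIR case the task set's parent is the child pool, and that pool's parent is the root, which is the two-layer shape described above.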

Scheduling pool sorting algorithm for TaskSetManager

After TaskScheduler obtains executor resources through SchedulerBackend, it schedules all TaskSetManagers. The sorted TaskSetManagers are obtained through rootPool.getSortedTaskSetQueue.

override def getSortedTaskSetQueue: ArrayBuffer[TaskSetManager] = {
    var sortedTaskSetQueue = new ArrayBuffer[TaskSetManager]
    val sortedSchedulableQueue =
      schedulableQueue.asScala.toSeq.sortWith(taskSetSchedulingAlgorithm.comparator)
    for (schedulable <- sortedSchedulableQueue) {
      sortedTaskSetQueue ++= schedulable.getSortedTaskSetQueue
    }
    sortedTaskSetQueue
  }

The core of the sorting is taskSetSchedulingAlgorithm.comparator, and the implementation of taskSetSchedulingAlgorithm differs between the two modes:

var taskSetSchedulingAlgorithm: SchedulingAlgorithm = {
    schedulingMode match {
      case SchedulingMode.FAIR =>
        new FairSchedulingAlgorithm()
      case SchedulingMode.FIFO =>
        new FIFOSchedulingAlgorithm()
      case _ =>
        val msg = s"Unsupported scheduling mode: $schedulingMode. Use FAIR or FIFO instead."
        throw new IllegalArgumentException(msg)
    }
  }

The algorithm class for FIFO mode is FIFOSchedulingAlgorithm, and for FAIR mode it is FairSchedulingAlgorithm. Let's look at the comparator in each mode, starting with FIFO:

override def comparator(s1: Schedulable, s2: Schedulable): Boolean = {
    val priority1 = s1.priority
    val priority2 = s2.priority
    var res = math.signum(priority1 - priority2)
    if (res == 0) {
      val stageId1 = s1.stageId
      val stageId2 = s2.stageId
      res = math.signum(stageId1 - stageId2)
    }
    res < 0
  }

  1. The priority is compared first. In FIFO mode the priority is the job ID: the earlier a job is submitted, the smaller its job ID, and a smaller priority value means it is scheduled earlier.
  2. If the priorities are equal, the two TaskSetManagers belong to the same job, so the stage IDs are compared. The smaller the stage ID, the earlier it is scheduled.
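The two FIFO rules can be tried out with a small worked example. The Fifo case class and the FifoOrder object are hypothetical stand-ins for a TaskSetManager and the algorithm class; only the comparison logic mirrors the comparator shown above.

```scala
// Hypothetical record standing in for a TaskSetManager: priority is the job ID.
case class Fifo(priority: Int, stageId: Int)

object FifoOrder {
  // Same ordering rule as the FIFO comparator: job ID first, then stage ID.
  def before(a: Fifo, b: Fifo): Boolean = {
    var res = math.signum(a.priority - b.priority)
    if (res == 0) {
      res = math.signum(a.stageId - b.stageId)
    }
    res < 0
  }

  def sort(xs: Seq[Fifo]): Seq[Fifo] = xs.sortWith(before)
}
```

Sorting Fifo(1, 3), Fifo(0, 2), Fifo(1, 1), Fifo(0, 0) puts both entries of job 0 ahead of job 1, and within each job the lower stage ID comes first.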

Let's look at the sorting algorithm of FAIR:

override def comparator(s1: Schedulable, s2: Schedulable): Boolean = {
    val minShare1 = s1.minShare
    val minShare2 = s2.minShare
    val runningTasks1 = s1.runningTasks
    val runningTasks2 = s2.runningTasks
    val s1Needy = runningTasks1 < minShare1
    val s2Needy = runningTasks2 < minShare2
    val minShareRatio1 = runningTasks1.toDouble / math.max(minShare1, 1.0)
    val minShareRatio2 = runningTasks2.toDouble / math.max(minShare2, 1.0)
    val taskToWeightRatio1 = runningTasks1.toDouble / s1.weight.toDouble
    val taskToWeightRatio2 = runningTasks2.toDouble / s2.weight.toDouble

    var compare = 0
    if (s1Needy && !s2Needy) {
      return true
    } else if (!s1Needy && s2Needy) {
      return false
    } else if (s1Needy && s2Needy) {
      compare = minShareRatio1.compareTo(minShareRatio2)
    } else {
      compare = taskToWeightRatio1.compareTo(taskToWeightRatio2)
    }
    if (compare < 0) {
      true
    } else if (compare > 0) {
      false
    } else {
      s1.name < s2.name
    }
  }

  1. A schedulable whose number of running tasks is below its minShare takes priority over one that has already reached its minShare.
  2. If both are below their minShare, the minShare usage ratio (runningTasks / minShare) is compared, and the lower ratio goes first.
  3. If neither is below its minShare, the task-to-weight ratio (runningTasks / weight) is compared, and the lower ratio goes first.
  4. If the compared ratios are equal, the names are compared.
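A worked example helps to see how the branches of the FAIR comparator interact. The Fair case class, the FairOrder object and the sample numbers are made up for illustration; the comparison follows the comparator shown above, with java.lang.Double.compare used for the ratio comparison.

```scala
// Hypothetical snapshot of a pool (or TaskSetManager) as seen by the comparator.
case class Fair(name: String, minShare: Int, weight: Int, runningTasks: Int)

object FairOrder {
  def before(a: Fair, b: Fair): Boolean = {
    val aNeedy = a.runningTasks < a.minShare
    val bNeedy = b.runningTasks < b.minShare
    val aShareRatio = a.runningTasks.toDouble / math.max(a.minShare, 1.0)
    val bShareRatio = b.runningTasks.toDouble / math.max(b.minShare, 1.0)
    val aWeightRatio = a.runningTasks.toDouble / a.weight.toDouble
    val bWeightRatio = b.runningTasks.toDouble / b.weight.toDouble

    if (aNeedy && !bNeedy) true          // only a is below its minShare
    else if (!aNeedy && bNeedy) false    // only b is below its minShare
    else {
      val cmp =
        if (aNeedy) java.lang.Double.compare(aShareRatio, bShareRatio)
        else java.lang.Double.compare(aWeightRatio, bWeightRatio)
      // Equal ratios fall back to comparing names.
      if (cmp != 0) cmp < 0 else a.name < b.name
    }
  }

  def sort(xs: Seq[Fair]): Seq[Fair] = xs.sortWith(before)
}
```

Take a(minShare 2, weight 1, running 1), b(minShare 3, weight 2, running 1), c(minShare 0, weight 1, running 4), d(minShare 0, weight 2, running 4) and e(minShare 0, weight 1, running 2). a and b are below their minShare, so they come first, and b wins with the lower usage ratio (1/3 against 1/2). Among the rest, d and e tie on a task-to-weight ratio of 2 and are ordered by name, and c (ratio 4) comes last, giving b, a, d, e, c.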

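In FAIR mode the sub-Pools are sorted first, and then the TaskSetManagers inside each sub-Pool are sorted. This works because both Pool and TaskSetManager extend the Schedulable trait, so the same comparator interface applies at both levels: the rootPool orders its sub-Pools with FairSchedulingAlgorithm, and each sub-Pool orders its TaskSetManagers with the algorithm selected by its own schedulingMode.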
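The two-level sort can be sketched as a flattening of sorted pools. This is a simplified model rather than Spark's code: Ts, SubPool and TwoLevelSort are hypothetical names, a pool's running task count is taken as the sum over its task sets, and task sets are given minShare 0 and weight 1 when compared under the FAIR rule (an assumption of this sketch). Each sub-pool here sorts its task sets according to its own mode, as in the Pool code shown earlier.

```scala
// Hypothetical stand-ins: a task set and a sub-pool with its own scheduling mode.
case class Ts(name: String, jobId: Int, stageId: Int, running: Int)
case class SubPool(name: String, mode: String, minShare: Int, weight: Int, taskSets: Seq[Ts]) {
  def running: Int = taskSets.map(_.running).sum
}

object TwoLevelSort {
  private case class Key(running: Int, minShare: Int, weight: Int, name: String)

  // FAIR rule: needy first, then the lower ratio, then the name.
  private def fairBefore(a: Key, b: Key): Boolean = {
    val aNeedy = a.running < a.minShare
    val bNeedy = b.running < b.minShare
    if (aNeedy != bNeedy) aNeedy
    else {
      val cmp =
        if (aNeedy) java.lang.Double.compare(
          a.running / math.max(a.minShare, 1.0), b.running / math.max(b.minShare, 1.0))
        else java.lang.Double.compare(
          a.running.toDouble / a.weight, b.running.toDouble / b.weight)
      if (cmp != 0) cmp < 0 else a.name < b.name
    }
  }

  // FIFO rule: job ID first, then stage ID.
  private def fifoBefore(a: Ts, b: Ts): Boolean =
    if (a.jobId != b.jobId) a.jobId < b.jobId else a.stageId < b.stageId

  // Sort the pools with the FAIR rule, then sort inside each pool and flatten.
  def sorted(pools: Seq[SubPool]): Seq[Ts] =
    pools
      .sortWith((p, q) =>
        fairBefore(Key(p.running, p.minShare, p.weight, p.name),
                   Key(q.running, q.minShare, q.weight, q.name)))
      .flatMap { p =>
        if (p.mode == "FIFO") p.taskSets.sortWith(fifoBefore)
        else p.taskSets.sortWith((s, t) =>
          fairBefore(Key(s.running, 0, 1, s.name), Key(t.running, 0, 1, t.name)))
      }
}
```

With a FAIR pool "production" (minShare 2, one running task in total) and a FIFO pool "test" (minShare 3, four running tasks), production is below its minShare and is drained first; inside test, the task sets come out in job and stage order.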
