Spark Source Code Study (7) - How an Application Requests Resources

Background

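While SparkContext is initializing, its SchedulerBackend sends a RegisterApplication message to the Master. Once registration succeeds, the Master calls schedule() to allocate cores to the application and tells the workers to launch executors.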


Walkthrough

1. Starting point - receive()

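The Master is a message loop, so its receive method handles the application registration request sent by the client. After registration succeeds, it calls schedule() to schedule and allocate resources. The code is as follows: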

case RegisterApplication(description, driver) =>
      // TODO Prevent repeated registrations from some driver
      if (state == RecoveryState.STANDBY) {
        // a standby Master does not accept registrations
        // ignore, don't send response
      } else {
        logInfo("Registering app " + description.name)
        val app = createApplication(description, driver)
        // register the application in the Master's in-memory state
        registerApplication(app)
        logInfo("Registered app " + description.name + " with ID " + app.id)
        // persist the application so it can be recovered
        persistenceEngine.addApplication(app)
        // reply to the driver that registration succeeded
        driver.send(RegisteredApplication(app.id, self))
        // schedule resources
        schedule()
      }

2. Core method - schedule()

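Purpose: allocate resources to waiting drivers, and allocate worker resources to applications.

When it is called: when a new application registers, or when worker resources change.

Note: in this run no resources are allocated to a driver, because the driver was already running when the application registered. This pass mainly allocates worker resources.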

private def schedule(): Unit = {
    // state check omitted

    // shuffle the alive workers
    val shuffledAliveWorkers = Random.shuffle(workers.toSeq.filter(_.state == WorkerState.ALIVE))
    val numWorkersAlive = shuffledAliveWorkers.size
    var curPos = 0


    // allocate resources to waiting drivers and launch them; not exercised in this run
    for (driver <- waitingDrivers.toList) { // iterate over a copy of waitingDrivers
      // We assign workers to each waiting driver in a round-robin fashion. For each driver, we
      // start from the last worker that was assigned a driver, and continue onwards until we have
      // explored all alive workers.
      var launched = false
      var numWorkersVisited = 0
      while (numWorkersVisited < numWorkersAlive && !launched) {
        val worker = shuffledAliveWorkers(curPos)
        numWorkersVisited += 1
        if (worker.memoryFree >= driver.desc.mem && worker.coresFree >= driver.desc.cores) {
          // the worker has enough free memory and cores, so launch the driver on it
          launchDriver(worker, driver)
          waitingDrivers -= driver
          launched = true
        }
        curPos = (curPos + 1) % numWorkersAlive
      }
    }

    // core method that launches executors on the workers
    startExecutorsOnWorkers()
  }
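The round-robin driver placement in schedule() can be tried out in isolation. The following is a minimal sketch, not Spark's code: WorkerSlot, DriverReq and DriverPlacement are hypothetical stand-ins for WorkerInfo, DriverInfo and the Master, the worker list is taken in the order given (no Random.shuffle), and "launching" a driver just records the chosen worker and deducts its resources.

```scala
// Hypothetical, simplified stand-ins for Spark's WorkerInfo and DriverInfo.
final case class WorkerSlot(id: String, var memoryFree: Int, var coresFree: Int)
final case class DriverReq(id: String, mem: Int, cores: Int)

object DriverPlacement {
  /** Returns driverId -> workerId for every driver that could be placed. */
  def place(workers: IndexedSeq[WorkerSlot], drivers: Seq[DriverReq]): Map[String, String] = {
    val placed = scala.collection.mutable.LinkedHashMap.empty[String, String]
    val n = workers.size
    var curPos = 0 // shared across drivers, so each search resumes where the last one stopped
    for (driver <- drivers) {
      var launched = false
      var visited = 0
      // Visit each worker at most once per driver.
      while (visited < n && !launched) {
        val w = workers(curPos)
        visited += 1
        if (w.memoryFree >= driver.mem && w.coresFree >= driver.cores) {
          // Stand-in for launchDriver: reserve the resources and record the placement.
          w.memoryFree -= driver.mem
          w.coresFree -= driver.cores
          placed(driver.id) = w.id
          launched = true
        }
        curPos = (curPos + 1) % n
      }
    }
    placed.toMap
  }
}
```

A driver that fits on no worker is simply left out of the result, which corresponds to it staying in waitingDrivers until the next schedule() call.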

3. Core method for launching executors - startExecutorsOnWorkers()

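The scheduleExecutorsOnWorkers method uses the spreadOutApps setting to decide whether the cores are packed onto as few workers as possible or spread across as many workers as possible. It returns the number of cores assigned on each worker.

The allocateWorkerResourceToExecutors method takes the cores already assigned on a worker and, depending on the coresPerExecutor parameter, launches one or more executors on that worker. The launch is done by sending a message through the RpcEndpoint.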

Steps:

a. Filter the usable workers

b. Decide how many cores to assign on each worker

c. Launch executors on each worker according to its assigned cores
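Step b is where spreadOutApps matters. The sketch below is a simplified model of that decision, not Spark's scheduleExecutorsOnWorkers: the name CoreAssignment.assign is hypothetical, memory and executor-count limits are ignored, and cores are handed out one at a time.

```scala
object CoreAssignment {
  /**
   * coresFree(i) is the free core count of worker i; coresWanted is what the
   * application still needs. Returns the number of cores assigned on each worker.
   */
  def assign(coresFree: Array[Int], coresWanted: Int, spreadOut: Boolean): Array[Int] = {
    val assigned = Array.fill(coresFree.length)(0)
    var toAssign = math.min(coresWanted, coresFree.sum)
    // A worker can take another core while the app still needs one and the worker has one free.
    def canTake(i: Int): Boolean = toAssign > 0 && coresFree(i) - assigned(i) > 0

    var candidates = coresFree.indices.filter(canTake)
    while (candidates.nonEmpty) {
      for (i <- candidates) {
        var keepGoing = true
        while (keepGoing && canTake(i)) {
          assigned(i) += 1
          toAssign -= 1
          // Spreading out: take one core, then move on to the next worker.
          // Otherwise keep filling this worker until it is full or the app is satisfied.
          if (spreadOut) keepGoing = false
        }
      }
      candidates = candidates.filter(canTake)
    }
    assigned
  }
}
```

With three workers that each have 4 free cores and an application asking for 6, spreading out gives 2 cores on every worker, while consolidating gives 4, 2 and 0.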

 private def startExecutorsOnWorkers(): Unit = {
   

    for (app <- waitingApps if app.coresLeft > 0) {
      val coresPerExecutor: Option[Int] = app.desc.coresPerExecutor
      
      // keep the alive workers with enough free memory and cores for at least one executor,
      // sorted by free cores in descending order
      val usableWorkers = workers.toArray.filter(_.state == WorkerState.ALIVE)
        .filter(worker => worker.memoryFree >= app.desc.memoryPerExecutorMB &&
          worker.coresFree >= coresPerExecutor.getOrElse(1))
        .sortBy(_.coresFree).reverse

      // decide how many cores to assign on each worker
      val assignedCores = scheduleExecutorsOnWorkers(app, usableWorkers, spreadOutApps)

      // Now that we've decided how many cores to allocate on each worker, let's allocate them

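      // launch executors according to the assigned cores
      /*
          How executors are launched here depends on coresPerExecutor:
          if coresPerExecutor is set, the assigned cores are split into executors of that size,
          so one worker may get several executors;
          if coresPerExecutor is not set, a single executor is launched on the worker and
          it holds all of the cores assigned there
      */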
      for (pos <- 0 until usableWorkers.length if assignedCores(pos) > 0) {
        allocateWorkerResourceToExecutors(
          app, assignedCores(pos), coresPerExecutor, usableWorkers(pos))
      }
    }
  }
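The effect of coresPerExecutor on a single worker comes down to a small piece of arithmetic. The following is a minimal sketch under stated assumptions: ExecutorSplit.executorsFor is a hypothetical helper, coresPerExecutor is assumed to be positive when set, and any remainder that does not fill a whole executor is dropped.

```scala
object ExecutorSplit {
  /** Returns the core count of each executor to launch on one worker. */
  def executorsFor(assignedCores: Int, coresPerExecutor: Option[Int]): Seq[Int] = {
    // Set: as many executors as fit into the assigned cores. Unset: exactly one executor.
    val numExecutors = coresPerExecutor.map(assignedCores / _).getOrElse(1)
    // Set: each executor gets coresPerExecutor cores. Unset: the one executor gets them all.
    val coresEach = coresPerExecutor.getOrElse(assignedCores)
    Seq.fill(numExecutors)(coresEach)
  }
}
```

For example, 8 assigned cores with coresPerExecutor set to 2 yields four executors of 2 cores each, while 8 assigned cores with coresPerExecutor unset yields one executor holding all 8.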

4. After the executors are launched

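Once the executors are running, what follows is communication between the ExecutorBackend and the driver: the backend registers its executor with the driver. The details are covered in an earlier post in this series.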

Summary

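Whenever a new application registers or worker information changes, the resource scheduling method schedule() is called.

When cores are allocated, the spreadOutApps parameter decides whether they are spread across as many workers as possible, which suits data-intensive jobs because it helps data locality, or packed onto as few workers as possible, which suits compute-intensive jobs.

The coresPerExecutor parameter decides how many executors are launched on a single worker.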



Reposted from blog.csdn.net/u013560925/article/details/80216905