[spark-src-core] 8. trivial bug in spark standalone executor assignment

 

   yep, from [1] we know that spark divides a job into two steps: a. the master launches executors on workers, and b. the driver assigns tasks to those executors. so how executors are assigned to workers by the master is very important!

  for standalone mode, if you dive into the source in Master#receiveWithLogging() for case RequestSubmitDriver, you will figure it out.

 

1.what

  as you step further, you will see the details in the code path below:

 

/**-Compared with mr slots in hadoop, spark's allocation of executors is smarter: it is based on the overall
   * requirements of cores and mem across the whole cluster, and workers with more resources get more executors.
   * Obviously this is not based on the number of splits like hadoop, so it is also smarter than hadoop slots.
   * Schedule executors to be launched on the workers.-note: this will not clear out the assigned app.
   * vip==> spread out purpose:
   * There are two modes of launching executors. The first attempts to spread out an application's
   * executors on as many workers as possible, while the second does the opposite (i.e. launch them
   * on as few workers as possible). The former is usually better for data locality purposes and is
   * the default.<==
   *
   * The number of cores assigned to each executor is configurable. When this is explicitly set,
   * multiple executors from the same application may be launched on the same worker if the worker
   * has enough cores and memory. Otherwise, each executor grabs all the cores available on the
   * worker by default, in which case only one executor may be launched on each worker.
   */
  private def startExecutorsOnWorkers(): Unit = {
    // Right now this is a very simple <<FIFO scheduler>>. We keep trying to fit in the first app
    // in the queue, then the second app, etc.
    if (spreadOutApps) { //- how is 'spread out' realized? see the 'while()' loop below
      // Try to spread out each app among all the workers, until it has all its cores
      for (app <- waitingApps if app.coresLeft > 0) { //-depth deferred: spread each app across all workers first
        //1 //-workers that satisfy the needs of at least one executor; reverse order to balance the workers' load
        //-this filter requires that a worker's free mem and free cores can satisfy at least one executor's mem and cpu request.
        val usableWorkers = workers.toArray.filter(_.state == WorkerState.ALIVE)
          .filter(worker => worker.memoryFree >= app.desc.memoryPerExecutorMB &&
            worker.coresFree >= app.desc.coresPerExecutor.getOrElse(1))
          .sortBy(_.coresFree).reverse //-the more free cores a worker has, the higher its priority
        //-2 balance the requested resources by spreading them across the cluster as far as possible
        val numUsable = usableWorkers.length
        val assigned = new Array[Int](numUsable) // Number of cores to give on each node
        //-this means if app.coresLeft > sum(workers' cores), more than one executor may be assigned to one worker
        // in the next round. app.coresLeft can be thought of as the property spark.cores.max per app
        var toAssign = math.min(app.coresLeft, usableWorkers.map(_.coresFree).sum) //-we can't know in advance which one is smaller
        var pos = 0
        ///-spread out the target cpus (spark.cores.max) across the cluster for load balancing.
        while (toAssign > 0) { //-coresFree is not changed inside this loop, so a worker's cpus will all be assigned as far as possible
          if (usableWorkers(pos).coresFree - assigned(pos) > 0) { //-is app.coresLeft a multiple of one executor's cores? not necessarily
            toAssign -= 1
            assigned(pos) += 1
          }
          pos = (pos + 1) % numUsable
        }
        //3 Now that we've decided how many cores to give on each node, let's actually give them
        // Executors are allocated strictly according to the worker's own cpu and mem resources; if they are
        // not enough for one executor, nothing is allocated
        for (pos <- 0 until numUsable if assigned(pos) > 0) { //Breadth first (horizontal)
          //-the worker's free mem (mainly), its coresFree and app.coresLeft will all be decreased below
          allocateWorkerResourceToExecutors(app, assigned(pos), usableWorkers(pos))
        }
      }
    } else {
      //-e.g. spark.deploy.spreadOut=false may launch 25 executors (i.e. 50 cores, the same number as specified)
      // Pack each app into as few workers as possible until we've assigned all its cores. Resources are
      // allocated worker by worker; if one worker is not enough, move on to the next.
      //-worker.coresFree will be decreased in allocateWorkerResourceToExecutors();
      // In fact, only mem is checked each time executors are allocated to a worker; the worker's cpus are only
      // checked in the next round of allocation! This means that when spreadOut=false, the cpus occupied by
      // executors on a single worker may exceed that worker's quota.
      for (worker <- workers if worker.coresFree > 0 && worker.state == WorkerState.ALIVE) { //-breadth deferred: fill one worker before moving on
        for (app <- waitingApps if app.coresLeft > 0) { //depth first (vertical): if this worker's resources are not enough, continue allocating on the next worker
          allocateWorkerResourceToExecutors(app, app.coresLeft, worker)
        }
      }
    }
  }
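
to make the spread-out branch concrete, here is a minimal, self-contained sketch of the round-robin core distribution above (my own simplification for illustration, not the real Master code; SpreadOutSketch/distribute are made-up names):

object SpreadOutSketch {
  //-distribute up to `coresMax` cores round-robin over the usable workers, mirroring the while() loop above
  def distribute(coresMax: Int, coresFreePerWorker: Array[Int]): Array[Int] = {
    val assigned = new Array[Int](coresFreePerWorker.length) // number of cores to give on each worker
    var toAssign = math.min(coresMax, coresFreePerWorker.sum)
    var pos = 0
    while (toAssign > 0) {
      if (coresFreePerWorker(pos) - assigned(pos) > 0) { // this worker still has an unassigned free core
        toAssign -= 1
        assigned(pos) += 1
      }
      pos = (pos + 1) % coresFreePerWorker.length
    }
    assigned
  }

  def main(args: Array[String]): Unit = {
    // e.g. spark.cores.max=10 spread over 10 workers with 16 free cores each -> each worker gets just 1 core
    println(distribute(10, Array.fill(10)(16)).mkString(",")) // prints 1,1,1,1,1,1,1,1,1,1
  }
}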

  /**-Generate executors based on worker.memoryFree and the parameter coresToAllocate. For spreadOut=true,
    * core allocation strictly follows the cores actually assigned to each worker.
    * assign cores and mem to an executor according to its requests (per-executor core and mem units).
    * -For spreadOut=false, in fact only the worker's mem is checked when allocating executors here; its cpus
    * are only checked when assigning executors to workers in the next round, see startExecutorsOnWorkers().
    * When spreadOut=true, allocation strictly follows both mem and cpus.
   * Allocate a worker's resources to one or more executors.-i.e. several executors may run on the same worker
   * @param app the info of the application which the executors belong to
   * @param coresToAllocate cores on this worker to be allocated to this application (-total cores to be
    *                        assigned on this worker)
   * @param worker the worker info
   */
  private def allocateWorkerResourceToExecutors(
      app: ApplicationInfo,
      coresToAllocate: Int,
      worker: WorkerInfo): Unit = {
    val memoryPerExecutor = app.desc.memoryPerExecutorMB
    val coresPerExecutor = app.desc.coresPerExecutor.getOrElse(coresToAllocate)
    var coresLeft = coresToAllocate
    ///-stop as soon as either the cpu or the mem condition is no longer met
    while (coresLeft >= coresPerExecutor && worker.memoryFree >= memoryPerExecutor) {
      val exec = app.addExecutor(worker, coresPerExecutor) //-this increases app.coresGranted, i.e. decreases app.coresLeft
      coresLeft -= coresPerExecutor
      launchExecutor(worker, exec) //-this decreases the worker's free cores and mem
      app.state = ApplicationState.RUNNING
    }
  }
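
likewise, a hedged sketch of what allocateWorkerResourceToExecutors() boils down to for a single worker (again a simplification with made-up names, not actual Spark code). note that a worker handed fewer cores than coresPerExecutor launches nothing, no matter how much memory it has free:

object AllocateSketch {
  //-how many executors fit on one worker, given the cores handed to it and its free memory,
  //-mirroring the while() loop in allocateWorkerResourceToExecutors()
  def executorsOnWorker(coresToAllocate: Int, memoryFreeMB: Int,
                        coresPerExecutor: Int, memoryPerExecutorMB: Int): Int = {
    var coresLeft = coresToAllocate
    var memFree = memoryFreeMB
    var executors = 0
    while (coresLeft >= coresPerExecutor && memFree >= memoryPerExecutorMB) { // stop when either runs short
      coresLeft -= coresPerExecutor
      memFree -= memoryPerExecutorMB
      executors += 1
    }
    executors
  }

  def main(args: Array[String]): Unit = {
    // a worker handed only 1 core cannot launch a single 2-core executor, even with 16g of free memory
    println(executorsOnWorker(coresToAllocate = 1, memoryFreeMB = 16384,
      coresPerExecutor = 2, memoryPerExecutorMB = 2048)) // prints 0
  }
}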

   its meaning is illustrated by the figure below:



 

 

2.how about the bug

   the annotation quoted from the spark source says that 'spark.cores.max' is the # of cores to be allocated to one app, as many as possible. that means there is a computation bug in spark, i.e. (spreadOut=true):

case | spark.cores.max | #workers | #worker cores | #worker mem | coresPerExecutor | memPerExecutor | result
---- | --------------- | -------- | ------------- | ----------- | ---------------- | -------------- | ------
1 | 10 | 10 | 16 | 16g | 2 | 2g | failed: no executors allocated, i.e. 10/10 = 1 core per worker < coresPerExecutor (2)
2 | 20 | 10 | 16 | 16g | 2 | 2g | 10 executors allocated in one wave, i.e. a. 20/10 = 2 >= coresPerExecutor, 2/2 = 1; b. 10 cores >= 2x1; c. 16g >= 2g x 1
3 | 40 | 10 | 16 | 16g | 2 | 2g | 20 executors in one wave
4 | 40 | 10 | 16 | 16g | 2 | 16g | 10 executors in one wave, 10 in another wave, 20 in total
5 | 40 | 10 | 16 | 16g | 16 | 2g | similar to the above
6 | 40 | 10 | 16 | 16g | 20 | 2g | failed: #worker cores (16) < 20
7 | 40 | 10 | 16 | 16g | 2 | 20g | failed: #worker mem (16g) < 20g
8 | 15 | 10 | 16 | 16g | 2 | 2g | only 5 executors allocated, i.e. each of the 10 workers gets 1 core in the first pass, then the remaining 15 - 10 = 5 cores go to 5 workers; only those 5 workers reach coresPerExecutor, so only 10 cores are actually assigned
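
putting the two phases together, a small self-contained model (an intentionally simplified stand-in for the two routines above, not actual Spark code; CasesSketch/executorsLaunched are made-up names) reproduces the arithmetic of cases 1 and 8:

object CasesSketch {
  def executorsLaunched(coresMax: Int, numWorkers: Int, coresPerWorker: Int, memPerWorkerMB: Int,
                        coresPerExecutor: Int, memPerExecutorMB: Int): Int = {
    val assigned = new Array[Int](numWorkers)
    var toAssign = math.min(coresMax, numWorkers * coresPerWorker)
    var pos = 0
    while (toAssign > 0) { // phase 1: round-robin core distribution across workers
      if (coresPerWorker - assigned(pos) > 0) { toAssign -= 1; assigned(pos) += 1 }
      pos = (pos + 1) % numWorkers
    }
    assigned.map { cores => // phase 2: per-worker executor allocation, bounded by cores and mem
      math.min(cores / coresPerExecutor, memPerWorkerMB / memPerExecutorMB)
    }.sum
  }

  def main(args: Array[String]): Unit = {
    // case 1: cores.max=10, 10 workers x 16 cores x 16g, 2 cores/2g per executor -> prints 0
    println(executorsLaunched(10, 10, 16, 16384, 2, 2048))
    // case 8: cores.max=15, same cluster -> prints 5 (only 10 of the 15 cores are ever used)
    println(executorsLaunched(15, 10, 16, 16384, 2, 2048))
  }
}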

 

  so from cases 1 and 8 we know that the cluster has enough resources to allocate executors, but in fact no executors (or no reasonable # of executors) get launched. then you will see something weird occur:

 

16/11/18 14:07:10 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 1 tasks

16/11/18 14:07:25 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
16/11/18 14:07:40 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

 

 3.workarounds

  a.use a reasonable # of cores, i.e. pick spark.cores.max so that

cores.max % (#workers x coresPerExecutor) == 0, i.e. cores.max is a whole number of executor waves

  b.append an embedded code block to check cores.max,

  i.e. check whether this max can be folded into whole waves (whether each worker's share ends up more or less than coresPerExecutor) after assigning cores to workers and before allocating executors; see the sketch below.
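
a hedged sketch of such a check (my own illustration, not existing Spark code; CoresMaxCheck is a made-up name) that could run before submitting the app:

object CoresMaxCheck {
  //-warn when spark.cores.max does not divide into whole waves of executors across the workers
  def check(coresMax: Int, numWorkers: Int, coresPerExecutor: Int): Unit = {
    val wave = numWorkers * coresPerExecutor // cores consumed by one full wave of executors
    if (coresMax % wave != 0) {
      val rounded = math.max(wave, (coresMax / wave) * wave)
      println(s"spark.cores.max=$coresMax is not a multiple of #workers x coresPerExecutor = $wave; " +
        s"some cores may never become executors, consider $rounded (or another multiple of $wave)")
    } else {
      println(s"spark.cores.max=$coresMax fits exactly ${coresMax / wave} executor(s) per worker")
    }
  }

  def main(args: Array[String]): Unit = {
    check(coresMax = 10, numWorkers = 10, coresPerExecutor = 2) // case 1: warns, suggests 20
    check(coresMax = 20, numWorkers = 10, coresPerExecutor = 2) // case 2: ok, 1 executor per worker
  }
}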

 

4.conclusion

  no doubt the property 'spark.cores.max' may give rise to certain misunderstandings, but you can avoid this case if you adopt the solutions above.

  generally speaking, this property lets spark allocate executors more intelligently and dynamically compared with other computation frameworks such as those on yarn.

 

ref:

[1] [spark-src-core] 4.2 communications b/t certain kernel components

[2] spark scheduling series ---- 1. how the Master allocates each executor's resources on workers in spark standalone mode

[3] Spark技术内幕 (Spark Internals): Executor allocation explained

