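Background
While SparkContext is being initialized, its SchedulerBackend sends a RegisterApplication message to the Master. Once the registration succeeds, the Master calls schedule() to allocate cores to the application and tells Workers to launch Executors.
Walkthrough
1. Starting point - receive()
The Master is a message-loop endpoint. Its receive method handles the client's request to register an application, and after a successful registration it calls schedule() to schedule and allocate resources. The code is: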
```scala
case RegisterApplication(description, driver) =>
  // TODO Prevent repeated registrations from some driver
  if (state == RecoveryState.STANDBY) {
    // The Master must be ALIVE; a standby Master ignores the request
    // ignore, don't send response
  } else {
    logInfo("Registering app " + description.name)
    val app = createApplication(description, driver)
    // Register the application
    registerApplication(app)
    logInfo("Registered app " + description.name + " with ID " + app.id)
    // Persist it so a recovering Master can restore it
    persistenceEngine.addApplication(app)
    // Reply to the driver with a success message
    driver.send(RegisteredApplication(app.id, self))
    // Schedule resources
    schedule()
  }
```
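To make the control flow of this handler concrete, here is a minimal, self-contained Scala sketch. `ToyMaster`, `onRegisterApplication`, the `Alive`/`Standby` states and the `log` buffer are hypothetical stand-ins, not Spark's real classes; the sketch only mirrors the order of steps in the excerpt: a standby Master ignores the request, while an alive Master registers, persists, replies, and then schedules.

```scala
// Hypothetical sketch of the RegisterApplication branch; not Spark's real Master.
sealed trait MasterState
case object Alive extends MasterState
case object Standby extends MasterState

final case class AppDescription(name: String)
final case class RegisteredApplication(appId: String)

class ToyMaster(val state: MasterState) {
  // Records each step in order, standing in for registerApplication,
  // persistenceEngine.addApplication, driver.send and schedule()
  val log = scala.collection.mutable.ArrayBuffer.empty[String]
  private var nextId = 0

  // Returns the reply sent to the driver, or None when the request is ignored
  def onRegisterApplication(desc: AppDescription): Option[RegisteredApplication] =
    if (state == Standby) {
      None // a standby Master ignores the request and sends no response
    } else {
      val appId = f"app-$nextId%04d"
      nextId += 1
      log += s"register ${desc.name} as $appId"
      log += s"persist $appId"
      val reply = RegisteredApplication(appId)
      log += s"reply $appId"
      log += "schedule"
      Some(reply)
    }
}
```

A standby `ToyMaster` returns `None` and logs nothing; an alive one returns `Some(RegisteredApplication("app-0000"))` for its first application.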
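2. Core method - schedule()
Purpose: launch waiting drivers on Workers, and allocate Worker resources to the applications' executors.
When it is called: when a new application registers, or when Worker resources change.
Note: in the scenario traced here no driver is launched, because the driver is already running by the time the application registers; this call mainly allocates Worker resources to executors.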
```scala
private def schedule(): Unit = {
  // State checks omitted
  // Shuffle the alive workers
  val shuffledAliveWorkers = Random.shuffle(workers.toSeq.filter(_.state == WorkerState.ALIVE))
  val numWorkersAlive = shuffledAliveWorkers.size
  var curPos = 0
  // Assign resources to waiting drivers and launch them (not exercised in this scenario)
  for (driver <- waitingDrivers.toList) { // iterate over a copy of waitingDrivers
    // We assign workers to each waiting driver in a round-robin fashion. For each driver, we
    // start from the last worker that was assigned a driver, and continue onwards until we have
    // explored all alive workers.
    var launched = false
    var numWorkersVisited = 0
    while (numWorkersVisited < numWorkersAlive && !launched) {
      val worker = shuffledAliveWorkers(curPos)
      numWorkersVisited += 1
      if (worker.memoryFree >= driver.desc.mem && worker.coresFree >= driver.desc.cores) {
        // This worker meets the driver's requirements: launch the driver here
        launchDriver(worker, driver)
        waitingDrivers -= driver
        launched = true
      }
      curPos = (curPos + 1) % numWorkersAlive
    }
  }
  // Core method for launching executors on workers
  startExecutorsOnWorkers()
}
```
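The driver-placement loop can be reproduced outside Spark as a small Scala sketch. The `ToyWorker`/`ToyDriver` types and `placeDrivers` are hypothetical; the worker list is taken as given instead of shuffled, so the result is deterministic, and "launching" a driver simply subtracts its memory and cores from the chosen worker.

```scala
// Hypothetical sketch of the round-robin driver placement in schedule().
final case class ToyWorker(id: String, var memoryFree: Int, var coresFree: Int)
final case class ToyDriver(id: String, mem: Int, cores: Int)

// Returns driverId -> workerId for each driver that could be placed; a driver
// that fits on no worker is left out (in Spark it stays in waitingDrivers).
def placeDrivers(aliveWorkers: IndexedSeq[ToyWorker], drivers: Seq[ToyDriver]): Map[String, String] = {
  val numWorkersAlive = aliveWorkers.size
  var curPos = 0 // carries over between drivers, so placement is round-robin
  val placed = scala.collection.mutable.Map.empty[String, String]
  for (driver <- drivers) {
    var launched = false
    var numWorkersVisited = 0
    while (numWorkersVisited < numWorkersAlive && !launched) {
      val worker = aliveWorkers(curPos)
      numWorkersVisited += 1
      if (worker.memoryFree >= driver.mem && worker.coresFree >= driver.cores) {
        // "Launching" the driver consumes the worker's free memory and cores
        worker.memoryFree -= driver.mem
        worker.coresFree -= driver.cores
        placed(driver.id) = worker.id
        launched = true
      }
      curPos = (curPos + 1) % numWorkersAlive
    }
  }
  placed.toMap
}
```

Because curPos is not reset between drivers, each driver starts its search at the worker after the previous one, which spreads drivers across the cluster; Spark additionally shuffles the alive workers before the loop.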
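3. Core method for launching executors on Workers - startExecutorsOnWorkers()
scheduleExecutorsOnWorkers uses the spreadOutApps setting (the spark.deploy.spreadOut configuration, true by default) to decide whether an application's cores are packed onto one Worker or spread over as many Workers as possible, and returns the number of cores assigned to each Worker.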
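The difference between spreading out and consolidating can be sketched with cores alone. `assignCores` below is a hypothetical simplification of scheduleExecutorsOnWorkers: it hands out coresPerExecutor cores (or 1 when unset) per step, moving to the next Worker after every step when spreading out, and filling one Worker before moving on otherwise. Spark's real method also checks memory per executor, which this sketch assumes away.

```scala
// Hypothetical, cores-only sketch of spread-out vs. consolidated assignment.
def assignCores(
    coresFree: Array[Int],         // free cores of each usable worker
    coresWanted: Int,              // cores the app still needs (coresLeft)
    coresPerExecutor: Option[Int],
    spreadOut: Boolean): Array[Int] = {
  val step = coresPerExecutor.getOrElse(1) // cores handed out per step
  val assigned = Array.fill(coresFree.length)(0)
  var toAssign = math.min(coresWanted, coresFree.sum)

  // A worker can take another step if the app still needs a full step
  // and the worker still has a full step of unassigned free cores
  def canAssign(pos: Int): Boolean =
    toAssign >= step && coresFree(pos) - assigned(pos) >= step

  var free = coresFree.indices.filter(canAssign)
  while (free.nonEmpty) {
    for (pos <- free) {
      var keepGoing = true
      while (keepGoing && canAssign(pos)) {
        toAssign -= step
        assigned(pos) += step
        // Spread out: one step per worker per round; consolidate: keep filling this worker
        if (spreadOut) keepGoing = false
      }
    }
    free = free.filter(canAssign)
  }
  assigned
}
```

For three Workers with 4 free cores each and an application asking for 6 cores, spreading out gives 2 + 2 + 2, while consolidating gives 4 + 2 + 0. Leftover cores smaller than one step (for example 1 core when coresPerExecutor is 2) stay unassigned.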
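allocateWorkerResourceToExecutors then takes the cores already assigned to a Worker and launches Executors on it, in a way that depends on coresPerExecutor. The launch is an RPC: the Master sends a LaunchExecutor message to the Worker's RpcEndpoint.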
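The number of Executors per Worker follows from simple division. The sketch below assumes the arithmetic of allocateWorkerResourceToExecutors: assignedCores / coresPerExecutor executors of coresPerExecutor cores each when the parameter is set, otherwise one executor holding all assigned cores. `LaunchExecutorMsg` and `executorsFor` are hypothetical names standing in for the LaunchExecutor messages the Master sends.

```scala
// Hypothetical stand-in for the LaunchExecutor message sent to a Worker
final case class LaunchExecutorMsg(workerId: String, execId: Int, cores: Int)

// Builds the launch messages for one Worker, given the cores assigned to it
def executorsFor(
    workerId: String,
    assignedCores: Int,
    coresPerExecutor: Option[Int],
    firstExecId: Int): Seq[LaunchExecutorMsg] = {
  // Set: several executors of coresPerExecutor cores each; unset: exactly one executor
  val numExecutors = coresPerExecutor.map(assignedCores / _).getOrElse(1)
  val coresEach = coresPerExecutor.getOrElse(assignedCores)
  (0 until numExecutors).map(i => LaunchExecutorMsg(workerId, firstExecId + i, coresEach))
}
```

With 6 cores assigned and coresPerExecutor = 2 this yields three 2-core executors; with coresPerExecutor unset it yields a single 6-core executor.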
Steps:
a. Filter the usable Workers
b. Assign a number of cores to each Worker
c. Launch executors on each Worker according to the cores assigned to it
```scala
private def startExecutorsOnWorkers(): Unit = {
  for (app <- waitingApps if app.coresLeft > 0) {
    val coresPerExecutor: Option[Int] = app.desc.coresPerExecutor
    // Keep only ALIVE workers with enough free memory and enough free cores
    // (coresPerExecutor, or 1 if unset), sorted by free cores, largest first
    val usableWorkers = workers.toArray.filter(_.state == WorkerState.ALIVE)
      .filter(worker => worker.memoryFree >= app.desc.memoryPerExecutorMB &&
        worker.coresFree >= coresPerExecutor.getOrElse(1))
      .sortBy(_.coresFree).reverse
    // Decide how many cores to assign on each worker
    val assignedCores = scheduleExecutorsOnWorkers(app, usableWorkers, spreadOutApps)

    // Now that we've decided how many cores to allocate on each worker, let's allocate them
    /* How executors are launched depends on coresPerExecutor:
       if it is set, several executors may be launched on a worker;
       if it is not set, one executor is launched that holds all the assigned cores */
    for (pos <- 0 until usableWorkers.length if assignedCores(pos) > 0) {
      allocateWorkerResourceToExecutors(
        app, assignedCores(pos), coresPerExecutor, usableWorkers(pos))
    }
  }
}
```
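Step a on its own is a filter followed by a sort. In this hypothetical sketch, `WorkerView` carries only the fields the filter reads; a Worker is usable when it is alive, has at least memoryPerExecutorMB of free memory, and has at least coresPerExecutor free cores (1 when unset). The result is ordered by free cores, largest first, as in the excerpt.

```scala
// Hypothetical view of a Worker with only the fields the filter needs
final case class WorkerView(id: String, alive: Boolean, memoryFree: Int, coresFree: Int)

def usableWorkers(
    workers: Seq[WorkerView],
    memoryPerExecutorMB: Int,
    coresPerExecutor: Option[Int]): Seq[WorkerView] =
  workers
    .filter(_.alive)
    // Enough memory and cores for at least one executor
    .filter(w => w.memoryFree >= memoryPerExecutorMB && w.coresFree >= coresPerExecutor.getOrElse(1))
    .sortBy(_.coresFree) // ascending ...
    .reverse             // ... then reversed: most free cores first
```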
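4. After the executors are launched
Once the executors are launched, the remaining work is communication between the ExecutorBackend and the Driver: the backend registers the executor with the driver, as described in an earlier post.
Summary
Whenever a new application registers or Worker resources change, the Master calls schedule() to allocate resources.
When cores are allocated, spreadOutApps decides between spreading an application's cores over as many Workers as possible (better for data locality, suiting data-intensive jobs) and packing them onto as few Workers as possible (suiting compute-intensive jobs).
coresPerExecutor decides how many Executors are launched on a Worker: assignedCores / coresPerExecutor when it is set, otherwise a single Executor holding all the assigned cores.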