The relationship between Spark resources and tasks

Before introducing the relationship between Spark tasks and resources, let's define a few terms:

Driver Program: the program that runs the Application's main function (the main function in the jar submitted by the user) and creates a SparkContext instance is called the Driver; the SparkContext usually stands for the Driver (it drives the tasks).

Cluster Manager: an external service that manages cluster resources. Spark currently supports three main cluster resource managers: Standalone, YARN, and Mesos. Spark's built-in Standalone mode satisfies the resource-management needs of most Spark cluster computing environments; YARN or Mesos are usually only worth considering when other computing frameworks must run on the same cluster. Generally speaking, "Spark on YARN" versus "Standalone" refers to a choice of cluster resource manager.

Worker Node: a node in the cluster that can run Application code (a computing resource).

Executor: a process started for an Application on a Worker Node. The process is responsible for running tasks (Tasks) and for storing data in memory or on disk; inside the Executor, a thread pool processes the Application's tasks concurrently (it is the worker process running on the computing resource).

       Each Application has its own Executors, so applications are isolated from each other.

Task: the unit of work that the Driver sends to an Executor. Usually one Task processes the data of one Partition, and each Partition is typically the size of one HDFS block (a Task runs as a thread inside the worker process).

Application: a user program that creates a SparkContext instance; it consists of a Driver Program and multiple Executors in the cluster (a Spark application running on the cluster).

Job: corresponds to a Spark action; each action triggers one Job. Each Job is split into multiple Stages, and each Stage contains a set of tasks (a TaskSet) that are executed in parallel and dispatched through the scheduling mechanism to the work units (Executors). (Stages are the granularity at which an Application's tasks are divided.) A small sketch follows.
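As a minimal sketch of these terms in action (the local setup and input values below are hypothetical, for illustration only), each action triggers its own Job, and the shuffle introduced by reduceByKey splits a Job into two Stages:

    import org.apache.spark.{SparkConf, SparkContext}

    object JobStageExample {
      def main(args: Array[String]): Unit = {
        // Hypothetical local setup, for illustration only.
        val conf = new SparkConf().setAppName("JobStageExample").setMaster("local[2]")
        val sc = new SparkContext(conf)

        val words = sc.parallelize(Seq("a", "b", "a", "c", "b", "a"))
        // reduceByKey shuffles, so the DAGScheduler cuts a Stage boundary here.
        val counts = words.map(w => (w, 1)).reduceByKey(_ + _)

        counts.count()    // action => Job 0 (two Stages: shuffle map + result)
        counts.collect()  // action => Job 1 (the map Stage's shuffle output may be reused)

        sc.stop()
      }
    }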

 

0. Resource scheduling and management: YARN vs. Standalone

Standalone: in this mode the Master node is responsible for scheduling, and it schedules the start of Executors on the Worker nodes. The cluster is deployed in a typical Master/Slave architecture.

Spark on YARN: when an application is submitted in yarn-cluster mode, the client first communicates with the ResourceManager, sending it a request to start an ApplicationMaster. After receiving the request, the ResourceManager allocates a Container and starts the ApplicationMaster on a NodeManager. Once started, the ApplicationMaster communicates with the ResourceManager; the ApplicationMaster (AM) plays the role of the Driver. The AM asks the RM for Containers in which to start Executors, and the RM allocates a batch of Containers for that purpose. The AM then connects to the corresponding NodeManagers to start the Executors; the NM plays the role of a Worker. After the Executors start, they register back with the AM. Compared with Standalone mode, the AM corresponds to the Driver, the NM to a Worker, and the RM to the Master. Once the Executors have registered back with the AM, the rest of the process is the same as in Standalone mode. This is submission in yarn-cluster mode.

Reference: https://zhuanlan.zhihu.com/p/61902619

1. The relationship between tasks and resources in Standalone mode

From the above, the computing resources in a Spark cluster correspond to Containers in YARN mode and to Workers in Standalone mode. An Application is a Spark application developed by the user and submitted to run on the cluster: the cluster allocates resources to start and run the Application, and reclaims them after the run finishes (the Application releases its resources). Multiple Applications can run on the same cluster at the same time and are isolated from one another; each Application corresponds to one SparkContext, and the SparkContext object maintains the Application's relationship with the cluster.

A Worker's resources are managed by the Master. After an Application registers with the Master, the Master allocates Worker resources according to the cluster's current load. Three core objects created by the SparkContext (DAGScheduler, TaskScheduler, and SchedulerBackend) divide and schedule the Application's tasks accordingly. A minimal application-side sketch of how the SparkContext creates these core objects and acquires computing resources follows.
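In the sketch below, the Standalone master URL and resource settings are hypothetical; creating the SparkContext is what registers the Application with the Master and wires up the three scheduler objects:

    import org.apache.spark.{SparkConf, SparkContext}

    // Hypothetical Standalone master URL and resource limits, for illustration only.
    val conf = new SparkConf()
      .setAppName("ResourceDemo")
      .setMaster("spark://master-host:7077")  // Standalone cluster Master
      .set("spark.executor.memory", "2g")     // memory per Executor
      .set("spark.cores.max", "8")            // cap on total cores for this Application

    // Creating the SparkContext registers the Application with the Master;
    // internally Spark creates the DAGScheduler, TaskSchedulerImpl and
    // StandaloneSchedulerBackend, and the Master assigns Worker resources.
    val sc = new SparkContext(conf)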

2. DAGScheduler

The DAGScheduler does task planning for the Application.

The DAGScheduler is the high-level, Stage-oriented scheduler. It splits the DAG into many Tasks, with each batch of Tasks forming a Stage. Stage boundaries are resolved by parsing the DAG in reverse from Shuffles (wide dependencies, where data must be synchronized via a Shuffle): every time a Shuffle is encountered, a new Stage is created. The tasks of each Stage are then wrapped into a TaskSet and submitted to the underlying task scheduler, the TaskScheduler. The DAGScheduler also records which RDDs are persisted to disk, seeks an optimized Task schedule (for example, data locality within a Stage), monitors the status of cross-node Shuffle data, and resubmits Stages on failure.

The core work of the DAGScheduler is dividing Stages, and Stage division is based on the width of RDD dependencies: when a partition of a parent RDD is depended on by multiple partitions of the child RDD, the dependency is wide; when a partition of the parent RDD is depended on by only one partition of the child RDD, the dependency is narrow. The sketch below makes this visible.
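As a quick way to see this (a sketch assuming an existing SparkContext named sc), map yields a narrow OneToOneDependency while reduceByKey yields a ShuffleDependency, which is exactly where the DAGScheduler cuts a Stage boundary:

    val pairs  = sc.parallelize(1 to 10).map(i => (i % 3, i)) // narrow dependency
    val summed = pairs.reduceByKey(_ + _)                     // wide (shuffle) dependency

    println(pairs.dependencies.head)   // org.apache.spark.OneToOneDependency
    println(summed.dependencies.head)  // org.apache.spark.ShuffleDependency

    // toDebugString prints the lineage, indenting at each Stage boundary (shuffle).
    println(summed.toDebugString)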

 

A Spark Application can start multiple Jobs because of its different actions; each Job consists of one or more Stages, with later Stages depending on earlier ones. Stage division happens during Job submission, and after the Stages are divided Spark computes the optimal location for each Task. A Task's optimal location is computed from data locality, where local data means data already present on the current node (for example, in memory). The DAGScheduler computes data locality using the RDD's own getPreferredLocations method, which reports the data locality of each Partition.
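The RDD API exposes this through the public preferredLocations method, which wraps getPreferredLocations and also consults the cache tracker. A small sketch (the HDFS path is hypothetical) that asks where each Partition's data lives:

    // Hypothetical HDFS path; for an HDFS-backed RDD the preferred locations
    // are the hosts holding each block's replicas.
    val logs = sc.textFile("hdfs://namenode:8020/data/access.log")

    logs.partitions.foreach { p =>
      println(s"partition ${p.index} -> ${logs.preferredLocations(p).mkString(", ")}")
    }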

3. TaskScheduler

The TaskScheduler is in charge of the concrete execution process of Tasks; it is task-oriented.

The TaskScheduler's core duty is to submit TaskSets to the cluster for execution and to report the results:

  a. For each TaskSet it creates and maintains a TaskSetManager, which tracks the tasks' locality and error information.

  b. When it encounters a straggler task, it re-launches the task on another node (see the configuration sketch after this list).

  c. It reports back to the DAGScheduler, including error reports such as lost shuffle outputs when a Shuffle fetch fails.
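For point b, re-launching stragglers is Spark's speculative execution, which is disabled by default. A minimal configuration sketch (the threshold values shown are illustrative, not recommendations):

    import org.apache.spark.SparkConf

    // Speculative execution re-launches slow ("straggler") tasks on other nodes.
    val conf = new SparkConf()
      .set("spark.speculation", "true")            // enable straggler re-launch
      .set("spark.speculation.multiplier", "1.5")  // straggler = 1.5x slower than the median task
      .set("spark.speculation.quantile", "0.75")   // fraction of tasks that must finish first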

The TaskScheduler needs to determine which computing resources a Task uses, that is, on which specific ExecutorBackend the Task runs, following the principle of data locality.

The TaskScheduler considers locality from the perspective of the concrete computation, which is a different level of data locality from the one the DAGScheduler considers.

TaskSchedulerImpl, the subclass of TaskScheduler, determines the specific ExecutorBackend on which each Task runs through its resourceOffers method. The process is as follows:

  1. Reshuffle all computing resources with the Random.shuffle method, to balance the computing load;

  2. Declare, for each ExecutorBackend and according to its number of cores, an ArrayBuffer[TaskDescription] array;

  3. If a new ExecutorBackend has been assigned to our Job, call executorAdded to obtain a complete view of the newly available computing resources;

  4. Determine the highest locality level at which tasks can be launched;

 

 /**
   * Called by cluster manager to offer resources on slaves. We respond by asking our active task
   * sets for tasks in order of priority. We fill each node with tasks in a round-robin manner so
   * that tasks are balanced across the cluster.
   */
  def resourceOffers(offers: IndexedSeq[WorkerOffer]): Seq[Seq[TaskDescription]] = synchronized {
    ...
    // Randomly shuffle offers to avoid always placing tasks on the same set of workers.
    val shuffledOffers = Random.shuffle(offers)
    // Build a list of tasks to assign to each worker.
    val tasks = shuffledOffers.map(o => new ArrayBuffer[TaskDescription](o.cores))
    val availableCpus = shuffledOffers.map(o => o.cores).toArray
    val sortedTaskSets = rootPool.getSortedTaskSetQueue
    for (taskSet <- sortedTaskSets) {
      logDebug("parentName: %s, name: %s, runningTasks: %s".format(
        taskSet.parent.name, taskSet.name, taskSet.runningTasks))
      if (newExecAvail) {
        taskSet.executorAdded()
      }
    }
    // The following code computes the highest-priority locality level.
    // Take each TaskSet in our scheduling order, and then offer it each node in increasing order
    // of locality levels so that it gets a chance to launch local tasks on all of them.
    // NOTE: the preferredLocality order: PROCESS_LOCAL, NODE_LOCAL, NO_PREF, RACK_LOCAL, ANY
    for (taskSet <- sortedTaskSets) {
      var launchedAnyTask = false
      var launchedTaskAtCurrentMaxLocality = false
      for (currentMaxLocality <- taskSet.myLocalityLevels) {
        do {
          launchedTaskAtCurrentMaxLocality = resourceOfferSingleTaskSet(
            taskSet, currentMaxLocality, shuffledOffers, availableCpus, tasks)
          launchedAnyTask |= launchedTaskAtCurrentMaxLocality
        } while (launchedTaskAtCurrentMaxLocality)
      }
      if (!launchedAnyTask) {
        taskSet.abortIfCompletelyBlacklisted(hostToExecutors)
      }
    }
    if (tasks.size > 0) {
      hasLaunchedTask = true
    }
    return tasks
  }

  5. Finally determine the specific ExecutorBackend each Task runs on by calling TaskSetManager's resourceOffer with the chosen locality level;

  6. Send the Tasks to the ExecutorBackend for execution via launchTasks. Each task is serialized first, and the serialized size must not exceed the limit (128MB by default), otherwise an error is raised. The limit is set by the parameter spark.rpc.message.maxSize.
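A minimal sketch of raising this limit (the value 256 is illustrative; the unit is MB):

    import org.apache.spark.SparkConf

    // spark.rpc.message.maxSize is expressed in MB (default 128); serialized
    // tasks sent to the ExecutorBackend must fit under this limit.
    val conf = new SparkConf()
      .set("spark.rpc.message.maxSize", "256")  // illustrative value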

4. SchedulerBackend

   The SchedulerBackend is the interface for resources; under different deployment modes, different subclass objects are created to manage them (for example, YarnSchedulerBackend). StandaloneSchedulerBackend is the management object in Standalone mode, responsible for collecting resources and allocating them to Tasks.

  After StandaloneSchedulerBackend receives submitTasks from TaskSchedulerImpl, it calls reviveOffers on its parent class CoarseGrainedSchedulerBackend, which finally calls the makeOffers method to allocate resources for Task execution.

  The makeOffers method executes as follows:

  1. First filter the Executors to keep only those in the Active state (those being killed are dropped), and build a WorkerOffer for each remaining Executor to represent its available resources (the available resources are constructed here);

  2. Call TaskSchedulerImpl's resourceOffers to obtain a two-dimensional array of TaskDescriptions, containing the information each Task needs to execute, such as the Task ID, Executor ID, and Task index;

  3. Call DriverEndpoint's launchTasks to send a LaunchTask message carrying the Task information to the Executor corresponding to each Task.

 

    // Make fake resource offers on all executors
    private def makeOffers() {
      // Filter out executors under killing
      val activeExecutors = executorDataMap.filterKeys(executorIsAlive)
      val workOffers = activeExecutors.map { case (id, executorData) =>
        new WorkerOffer(id, executorData.executorHost, executorData.freeCores)
      }.toIndexedSeq
      launchTasks(scheduler.resourceOffers(workOffers))
    }

  

Origin: www.cnblogs.com/beichenroot/p/11414173.html