A First Look at Spark Task Scheduling

Pre-knowledge

Spark task model

  1. Job: invoking an action triggers the submission of the DAG and the execution of the whole job.
  2. Stage: stages are split at shuffle boundaries; every operation that requires a shuffle introduces a new stage.
  3. TaskSet: each stage corresponds to one TaskSet, and a TaskSet contains multiple tasks, determined by the partitions of the RDD. The degree of parallelism is the number of partitions of the stage's RDD.
  4. Task: the data of one partition together with the processing pipeline of one stage is treated as one task. Viewed horizontally, there are as many tasks as there are partitions; viewed vertically, one task contains all of one stage's processing, for example the flatMap, map, and reduceByKey in the map stage shown below.
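The counting in the list above can be sketched as a toy model, not Spark source code: stages are split at shuffle boundaries, one TaskSet per stage, and (simplifying so every stage has the same partition count) tasks per stage equals the number of partitions.

```python
# Toy model (not Spark code): count stages and tasks for a job.
# A stage boundary is introduced by every shuffle-producing operation.

def plan_job(operations, num_partitions):
    """operations: list of (name, causes_shuffle) tuples in pipeline order.

    Simplification: assumes every stage's RDD has num_partitions partitions.
    """
    stages = [[]]
    for name, causes_shuffle in operations:
        if causes_shuffle:
            stages.append([])          # a shuffle starts a new stage
        stages[-1].append(name)
    # one TaskSet per stage; tasks per TaskSet = partitions of the stage's RDD
    return {"stages": stages,
            "tasks_per_stage": num_partitions,
            "total_tasks": len(stages) * num_partitions}

# Example: flatMap and map stay in the map stage; reduceByKey shuffles.
plan = plan_job([("flatMap", False), ("map", False), ("reduceByKey", True)], 4)
print(plan["stages"])       # [['flatMap', 'map'], ['reduceByKey']]
print(plan["total_tasks"])  # 8
```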

image

Spark resource model

image

An Executor is the process that actually executes tasks. It owns a number of CPU cores and an amount of memory, and runs computing tasks as threads. It is the smallest unit that the resource management system can grant.
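Executor resources are typically requested when submitting the application; for example (the flag values and the script name `my_app.py` are illustrative, not from the original text):

```shell
# Illustrative values: request 4 executors, each with 2 cores and 4 GiB of memory.
spark-submit \
  --master yarn \
  --num-executors 4 \
  --executor-cores 2 \
  --executor-memory 4g \
  my_app.py
```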

YARN resources

image

The basic structure of YARN: YARN is mainly composed of the ResourceManager, NodeManager, ApplicationMaster, and Container components.

The ResourceManager is a standalone process running on the master node; it is responsible for cluster-wide resource management, scheduling, and allocation.

The NodeManager is a standalone process running on each slave node; it reports the status of its node.

The ApplicationMaster and Containers are components that run on the slave nodes. A Container is YARN's unit of resource allocation and bundles memory, CPU, and other resources; YARN allocates resources in units of Containers.

The relationship between Spark executors and YARN containers

Running Spark Applications on YARN

When running Spark on YARN, each Spark executor runs as a YARN container.

  • Cluster Deployment Mode
    image

image
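Because each executor lives inside a YARN container, the container must be sized for the executor's heap plus off-heap overhead. As a sketch, assuming Spark's default overhead rule (the `spark.executor.memoryOverhead` default of max(10% of executor memory, 384 MiB)):

```python
# Sketch of how the YARN container for one executor is sized, assuming
# Spark's default memory-overhead rule: max(10% of executor memory, 384 MiB).

def container_memory_mib(executor_memory_mib,
                         overhead_factor=0.10,
                         min_overhead_mib=384):
    """Container size = executor memory + off-heap overhead (in MiB)."""
    overhead = max(int(executor_memory_mib * overhead_factor), min_overhead_mib)
    return executor_memory_mib + overhead

print(container_memory_mib(4096))  # 4096 + 409 = 4505 MiB
print(container_memory_mib(1024))  # 1024 + 384 = 1408 MiB (overhead floor applies)
```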

Two-layer model

How does spark's task model match the resource model?

image

As the figure above shows, the key components are the TaskScheduler and the SchedulerBackend, which match tasks to executors.

Spark's task model decomposes a submitted job into the smallest scheduling unit, the task, and the TaskScheduler hands tasks to a concrete SchedulerBackend (for example, the YARN backend) according to the scheduling policy and each task's resource requirements.

The smallest resource unit managed by the SchedulerBackend is the executor. It checks whether the executors on the workers have resources that are both "sufficient" for and "matched" to the task; if so, the task is launched. Note that judging whether resources are "sufficient" is easy: the number of CPUs each task needs is configured in the TaskScheduler (default 1), so it is just a comparison of core counts, subtracting the per-task requirement as the allocation proceeds. Whether the resources are "matched" depends on each task's locality settings.
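The "sufficient cores" check can be sketched as follows. This is a toy illustration, not Spark source; the executor names and the greedy walk are assumptions made for the example, and `cpus_per_task` stands in for the `spark.task.cpus` setting (default 1):

```python
# Toy sketch (not Spark source) of the "are the cores sufficient?" check:
# each task needs cpus_per_task cores; the scheduler compares core counts
# and subtracts the per-task requirement as it assigns.

def assign_tasks(executor_free_cores, num_pending_tasks, cpus_per_task=1):
    """executor_free_cores: dict of executor id -> free cores.

    Returns a list of (executor_id, task_index) assignments.
    """
    assignments = []
    task = 0
    for exec_id, free in executor_free_cores.items():
        while free >= cpus_per_task and task < num_pending_tasks:
            assignments.append((exec_id, task))
            free -= cpus_per_task      # subtract and keep walking
            task += 1
    return assignments

print(assign_tasks({"exec-1": 2, "exec-2": 3}, 4))
# [('exec-1', 0), ('exec-1', 1), ('exec-2', 2), ('exec-2', 3)]
```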

There are five task locality levels, in descending order of priority: PROCESS_LOCAL, NODE_LOCAL, NO_PREF, RACK_LOCAL, ANY. That is, it is best to run in the same process as the data, next best on the same node (i.e. machine), then on the same rack, and finally anywhere. Each task carries its own locality preference. What happens if a resource with the preferred locality is not currently available? Spark has a spark.locality.wait parameter, which defaults to 3000 ms; by default this is used as the waiting time for locality at each of the process, node, and rack levels. So once a task has a locality preference, delay scheduling may be triggered.
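The fallback behaviour can be sketched as below. This is a deliberate simplification (real Spark keeps a separate timer per level and can configure each level's wait individually); here a single cumulative wait of `spark.locality.wait` (default 3000 ms) is assumed at each localized level:

```python
# Toy sketch of delay scheduling (not Spark source): a task is allowed to
# fall back to a less local level only after waiting spark.locality.wait
# (default 3000 ms) at each of the process, node, and rack levels.
# Simplification: one cumulative clock instead of per-level timers.

LEVELS = ["PROCESS_LOCAL", "NODE_LOCAL", "NO_PREF", "RACK_LOCAL", "ANY"]

def allowed_level(ms_waited, locality_wait_ms=3000):
    """Least-local level the scheduler may use after waiting ms_waited."""
    if ms_waited < locality_wait_ms:
        return "PROCESS_LOCAL"
    if ms_waited < 2 * locality_wait_ms:
        return "NODE_LOCAL"
    if ms_waited < 3 * locality_wait_ms:
        return "RACK_LOCAL"
    return "ANY"

print(allowed_level(0))      # PROCESS_LOCAL
print(allowed_level(3500))   # NODE_LOCAL
print(allowed_level(9500))   # ANY
```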

The SchedulerBackend is in charge of the supply side. Once started, it periodically asks the TaskScheduler whether there are tasks to run; in effect it keeps asking, "I have this much spare capacity, do you want it?" When the SchedulerBackend asks, the TaskScheduler selects TaskSetManagers from the scheduling queue according to the configured scheduling policy and schedules them to run.

Scheduling strategies

  1. FIFO (default): whatever is submitted first executes first; later tasks must wait for earlier ones to finish.
  2. FAIR: supports grouping tasks into scheduling pools. Different pools have different weights, and tasks' execution order is determined by those weights.
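FAIR pools are declared in a fairscheduler.xml file; the pool names and values below are illustrative examples, not from the original text:

```xml
<!-- Illustrative fairscheduler.xml: pool names and values are examples. -->
<allocations>
  <pool name="production">
    <schedulingMode>FAIR</schedulingMode>
    <weight>2</weight>
    <minShare>2</minShare>
  </pool>
  <pool name="adhoc">
    <schedulingMode>FIFO</schedulingMode>
    <weight>1</weight>
    <minShare>0</minShare>
  </pool>
</allocations>
```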
