[In-depth introduction to Yarn architecture and implementation] 5-1 Basic framework of Yarn resource scheduler

The resource scheduler is one of the core components in YARN. It is a pluggable service component in the ResourceManager and is responsible for the management and allocation of the entire cluster resources.
Yarn provides three available resource schedulers by default, namely FIFO (First In First Out), Yahoo!'s Capacity Scheduler and Facebook's Fair Scheduler.
This section will focus on the basic framework of the resource scheduler, and the Capacity Scheduler and Fair Scheduler will be introduced in detail in subsequent articles.

1. Basic structure

The resource scheduler is one of YARN's core components and is pluggable: YARN defines a set of interface specifications so that users can implement their own scheduler, and it ships with three commonly used schedulers: FIFO, CapacityScheduler, and FairScheduler.

1) Resource Scheduling Model

Yarn uses a two-tier resource scheduling model.

  • In the first layer, the resource scheduler in RM allocates resources to each AM (the part processed by Scheduler)
  • In the second layer, the AM further allocates resources to its internal tasks (not the focus of this section)

Yarn's resource allocation process is asynchronous. After the resource scheduler allocates resources to an application, it does not immediately push them to the corresponding AM; instead it temporarily puts the allocation in a buffer and waits for the AM to fetch it actively through its periodic heartbeat (a pull-based communication model). A minimal AM-side sketch follows the list below.

  • NM reports node information to RM through periodic heartbeats
  • RM returns a heartbeat response to NM, including information such as the list of containers to be released
  • The node information received from NM triggers a NODE_UPDATE event; the scheduler then allocates the node's resources to applications according to its policy and puts the allocation results into an in-memory data structure
  • AM sends a heartbeat to RM and pulls the containers newly allocated to it
  • AM assigns the received containers to its internal tasks
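
As a rough illustration of this pull-based model from the AM side, here is a minimal sketch using Hadoop's AMRMClient; the container size, priority, and loop condition are chosen arbitrarily for illustration.

```java
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class AmHeartbeatSketch {
  public static void main(String[] args) throws Exception {
    AMRMClient<ContainerRequest> amRmClient = AMRMClient.createAMRMClient();
    amRmClient.init(new Configuration());
    amRmClient.start();
    amRmClient.registerApplicationMaster("", 0, "");

    // Ask for one container of 1024 MB / 1 vcore anywhere in the cluster.
    Resource capability = Resource.newInstance(1024, 1);
    amRmClient.addContainerRequest(
        new ContainerRequest(capability, null, null, Priority.newInstance(0)));

    // Periodic heartbeat: the RM does not push containers; the AM pulls
    // whatever the scheduler has already placed in its buffer.
    List<Container> allocated;
    do {
      AllocateResponse response = amRmClient.allocate(0.1f);
      allocated = response.getAllocatedContainers();
      Thread.sleep(1000);
    } while (allocated.isEmpty());

    // The AM would now launch its internal tasks in the received containers.
  }
}
```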

2) Resource representation model

When an NM starts, it registers with the RM. The registration information includes the total amount of CPU and memory the node can allocate; these values can be set through configuration options, as follows (a sketch that reads these settings follows the list):

  • yarn.nodemanager.resource.memory-mb: the total amount of physical memory that can be allocated, default 8192 MB (8 GB)
  • yarn.nodemanager.vmem-pmem-ratio: the maximum amount of virtual memory a task may use per unit of physical memory. The default is 2.1, meaning that for every 1 MB of physical memory used, at most 2.1 MB of virtual memory may be used
  • yarn.nodemanager.resource.cpu-vcores: the number of virtual CPUs that can be allocated, default 8. To divide CPU resources more finely and account for differences in CPU performance, YARN allows administrators to map each physical CPU to several virtual CPUs according to actual needs; the number of available virtual CPUs can be configured individually for each node, and when submitting an application users can also specify the number of virtual CPUs required by each task
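
These settings are also exposed as constants in Hadoop's YarnConfiguration class; the short sketch below simply reads them (with their defaults) from a configuration object.

```java
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class NodeResourceConfigSketch {
  public static void main(String[] args) {
    YarnConfiguration conf = new YarnConfiguration();

    // yarn.nodemanager.resource.memory-mb, default 8192 MB
    int memoryMb = conf.getInt(
        YarnConfiguration.NM_PMEM_MB, YarnConfiguration.DEFAULT_NM_PMEM_MB);

    // yarn.nodemanager.resource.cpu-vcores, default 8
    int vcores = conf.getInt(
        YarnConfiguration.NM_VCORES, YarnConfiguration.DEFAULT_NM_VCORES);

    // yarn.nodemanager.vmem-pmem-ratio, default 2.1
    float vmemRatio = conf.getFloat(
        YarnConfiguration.NM_VMEM_PMEM_RATIO,
        YarnConfiguration.DEFAULT_NM_VMEM_PMEM_RATIO);

    System.out.printf("memory=%d MB, vcores=%d, vmem/pmem=%.1f%n",
        memoryMb, vcores, vmemRatio);
  }
}
```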

Scheduling semantics supported by YARN (a request sketch follows the list):

  • Request a specific amount of resources on a specific node
  • Request a specific amount of resources on a specific rack
  • Add some nodes to (or remove them from) a blacklist, so that no further resources are allocated to the application on those nodes
  • Request that certain resources be returned
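
As a sketch of the first two semantics, the resource name in a ResourceRequest selects a specific node or rack, while blacklist changes travel with the allocate() heartbeat shown earlier; the host and rack names below are hypothetical.

```java
import java.util.Arrays;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceRequest;

public class RequestSemanticsSketch {
  public static void main(String[] args) {
    Priority priority = Priority.newInstance(1);
    Resource capability = Resource.newInstance(2048, 2); // 2 GB, 2 vcores

    // Ask for 3 containers on a specific node (hostname is hypothetical).
    ResourceRequest onNode =
        ResourceRequest.newInstance(priority, "node01.example.com", capability, 3);

    // Ask for 3 containers on a specific rack.
    ResourceRequest onRack =
        ResourceRequest.newInstance(priority, "/rack-1", capability, 3);

    // Blacklist additions/removals are plain lists of node names passed
    // through the AM's allocate() call.
    System.out.println(Arrays.asList(onNode, onRack));
  }
}
```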

Scheduling semantics not supported by Yarn (may be implemented in the future as Yarn continues to iterate):

  • Request a specific amount of resources on any node
  • Request a specific amount of resources on any rack
  • Request a group or groups of resources that match a certain trait
  • Ultra-fine-grained resource requirements, such as CPU performance requirements or CPU pinning
  • Dynamic adjustment of Container resources, i.e., changing the amount of resources assigned to a running Container as needed

3) Resource Guarantee Mechanism

When the idle resources on a single node cannot satisfy a container requested by an application, there are two strategies:

  • Skip the current node and wait for the next node report;
  • Make a reservation for the container on the current node and wait until enough resources are released on that node to satisfy the reservation first.

YARN adopts the second strategy, an incremental resource allocation mechanism: when the resources requested by an application cannot be satisfied immediately, resources on a node are reserved for the application until the cumulatively released idle resources meet its requirements. This mechanism can waste resources (reserved resources sit idle), but it prevents starvation.
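
The following is a purely conceptual sketch of this incremental allocation (reservation) idea; it is not the actual scheduler code, and the class and the numbers are invented for illustration.

```java
// Conceptual sketch of incremental allocation (reservation); NOT actual YARN code.
class ReservationSketch {
  private final int requestedMb;   // what the application asked for
  private int reservedMb = 0;      // resources held back on this node

  ReservationSketch(int requestedMb) { this.requestedMb = requestedMb; }

  /**
   * Called on each NODE_UPDATE heartbeat with the memory freed on the node
   * since the last heartbeat. Freed resources are added to the reservation
   * instead of being handed to other applications.
   */
  boolean onNodeUpdate(int newlyFreedMb) {
    reservedMb += newlyFreedMb;
    return reservedMb >= requestedMb;  // true once the reservation can be fulfilled
  }
}
```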

4) Hierarchical queue management

Yarn's queues are hierarchical: each queue can contain sub-queues, and users can only submit applications to leaf queues. The administrator can configure the operating-system users and user groups allowed for each leaf queue, and can also configure an administrator for each queue; a queue administrator can kill any application in the queue, change the priority of any application, and so on.
Queue names use '.' as the separator, for example root.A1 or root.A1.B1.
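
For example, an application is submitted to a leaf queue by its full hierarchical name; a minimal sketch using YarnClient (the queue name root.A1.B1 is taken from the example above, and the AM container setup is omitted):

```java
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class SubmitToLeafQueueSketch {
  public static void main(String[] args) throws Exception {
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(new YarnConfiguration());
    yarnClient.start();

    YarnClientApplication app = yarnClient.createApplication();
    ApplicationSubmissionContext ctx = app.getApplicationSubmissionContext();

    // Applications can only be submitted to leaf queues, named with '.' separators.
    ctx.setQueue("root.A1.B1");

    // ... set the AM container spec and resources, then:
    // yarnClient.submitApplication(ctx);
  }
}
```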

2. The three schedulers

Yarn's resource scheduler is configurable, and there are three default implementations FIFO, CapacityScheduler, FairScheduler.

1) FIFO

FIFO is the simplest scheduling mechanism provided by Hadoop at the beginning of its design: first come, first served.
All applications are submitted to a single queue, and Hadoop runs these jobs sequentially in the order of submission. Only when the resource requests of the application that arrived first are satisfied does the scheduler start allocating resources to the next application.
Advantages:

  • The principle and implementation are simple, and no separate configuration is required

Disadvantages:

  • It cannot provide QoS; all jobs are handled with the same priority
  • It is unsuitable for multi-tenant resource management: a large application that arrives first can fill up the cluster and prevent other users' programs from running in time
  • Application concurrency is low

2) Capacity Scheduler

Capacity Scheduler is a multi-user scheduler developed by Yahoo! that divides resources in units of queues.
Each queue can be given a minimum guaranteed share and an upper limit on resource usage, and each user can also be given a resource usage limit to prevent resource abuse. It also supports resource sharing: a queue's unused resources can be temporarily shared with other queues. Its configuration file is capacity-scheduler.xml.
Main features:

  • **Capacity guarantee:** a minimum guaranteed share (capacity) and a resource usage upper limit (maximum-capacity, default 100%) can be set for each queue, and all applications submitted to the queue share its resources.
  • **Elastic scheduling:** if a queue has remaining or idle resources, they can be temporarily lent to queues that need them. Once new applications in the lending queue need resources, resources released by other queues are returned to it, realizing elastic resource allocation and improving cluster utilization.
  • **Multi-tenant management:** supports multiple users sharing the cluster and running applications at the same time; an upper limit can be set on the amount of resources each user may use (user-limit-factor).
  • **Security isolation:** each queue has a strict ACL (acl_submit_applications) that limits which users or user groups may submit applications to it. (A configuration sketch follows this list.)
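
As a hedged illustration of where these knobs live, the snippet below sets the corresponding properties programmatically; in practice they are written in capacity-scheduler.xml, and the queue names and values here are invented.

```java
import org.apache.hadoop.conf.Configuration;

public class CapacityQueueConfigSketch {
  public static void main(String[] args) {
    // In practice these properties live in capacity-scheduler.xml;
    // the queue names (A1, A2) and values below are made up.
    Configuration conf = new Configuration();
    conf.set("yarn.scheduler.capacity.root.queues", "A1,A2");

    // Capacity guarantee and upper limit for queue root.A1.
    conf.set("yarn.scheduler.capacity.root.A1.capacity", "40");
    conf.set("yarn.scheduler.capacity.root.A1.maximum-capacity", "60");

    // Per-user limit and submission ACL for queue root.A1.
    conf.set("yarn.scheduler.capacity.root.A1.user-limit-factor", "1.5");
    conf.set("yarn.scheduler.capacity.root.A1.acl_submit_applications", "alice,bob");

    System.out.println(conf.get("yarn.scheduler.capacity.root.A1.capacity"));
  }
}
```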

3) Fair Scheduler

Fair Scheduler is a multi-user scheduler developed by Facebook. Its design goal is to allocate resources to all applications "fairly" (the definition of fairness can be set through parameters). Fairness applies not only among the applications within a queue but also across queues.
With the Fair Scheduler there is no need to reserve system resources in advance; the scheduler dynamically rebalances resources among all running jobs. For example, when the first large job is submitted and is the only one running, it obtains all cluster resources; when a second small job is submitted, the Fair Scheduler allocates half of the resources to it, so that the two jobs share the cluster fairly.
Differences from Capacity Scheduler:
(Figure: comparison of Fair Scheduler and Capacity Scheduler.)

4) Source code inheritance relationship

The three diagrams below show the schedulers' inheritance relationships. All three schedulers inherit from AbstractYarnScheduler, an abstract class declared as "extends AbstractService implements ResourceScheduler": extending AbstractService makes the scheduler a service, while implementing ResourceScheduler provides the scheduler's main functionality.

There are still some differences among the three: FairScheduler does not implement the Configurable interface and lacks the setConf() method; FifoScheduler does not support resource preemption; FairScheduler supports resource preemption but does not implement the PreemptableResourceScheduler interface.
(Figures: class inheritance diagrams for FifoScheduler, CapacityScheduler, and FairScheduler.)

YarnScheduler defines the methods a resource scheduler should implement. Most of them are already implemented in AbstractYarnScheduler; if you implement a scheduler yourself, you can extend this class and focus on the resource allocation logic.

public interface YarnScheduler extends EventHandler<SchedulerEvent> {

  // Get the basic information of a queue
  public QueueInfo getQueueInfo(String queueName, boolean includeChildQueues,
      boolean recursive) throws IOException;

  // Get the total cluster resources
  public Resource getClusterResource();

  /**
   * The most important method between the AM and the resource scheduler.
   * Through it the AM updates its resource requests, the list of containers
   * to release, and additions to/removals from its blacklist.
   */
  @Public
  @Stable
  Allocation allocate(ApplicationAttemptId appAttemptId,
      List<ResourceRequest> ask, List<ContainerId> release,
      List<String> blacklistAdditions, List<String> blacklistRemovals,
      List<UpdateContainerRequest> increaseRequests,
      List<UpdateContainerRequest> decreaseRequests);

  // Get a report on a node's resource usage
  public SchedulerNodeReport getNodeReport(NodeId nodeId);

  // ... (other methods omitted)
}

ResourceScheduler is essentially an event handler; it mainly processes 10 kinds of events (CapacityScheduler additionally handles several preemption-related events). The processing logic can be found in each scheduler's handle() method (a simplified sketch follows the list below):

  • NODE_ADDED: Add a node to the cluster
  • NODE_REMOVED: Remove a node from the cluster
  • NODE_RESOURCE_UPDATE: The resources of a node in the cluster have been updated
  • NODE_LABELS_UPDATE: update node labels
  • NODE_UPDATE: This event is sent when NM communicates with RM through heartbeat, it will report the resource usage of the node and trigger an allocation operation at the same time.
  • APP_ADDED: Add an Application
  • APP_REMOVED: remove an application
  • APP_ATTEMPT_ADDED: Add an application Attempt
  • APP_ATTEMPT_REMOVED: remove an application attempt
  • CONTAINER_EXPIRED: Recycle a timed-out container
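
The following is an abridged sketch of the dispatch pattern used in handle(); it is not the actual source and shows only a few of the event types.

```java
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.SchedulerEvent;

// Abridged sketch of the event-dispatch pattern inside a scheduler's handle();
// not the actual YARN source, and only a few of the 10 event types are shown.
public class HandleSketch {
  public void handle(SchedulerEvent event) {
    switch (event.getType()) {
      case NODE_ADDED:
        // register the new node and its resources with the scheduler
        break;
      case NODE_UPDATE:
        // NM heartbeat: record container statuses on the node and try to
        // allocate the node's free resources to pending applications
        break;
      case APP_ATTEMPT_ADDED:
        // create the bookkeeping structures for a new application attempt
        break;
      case CONTAINER_EXPIRED:
        // reclaim a container that was allocated but never launched in time
        break;
      default:
        break;
    }
  }
}
```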

3. Resource Scheduling Dimensions

There are currently two resource calculators: DefaultResourceCalculator and DominantResourceCalculator.

  • DefaultResourceCalculator: Only memory resources are considered
  • DominantResourceCalculator: considers both memory and CPU resources (later releases support more resource types, such as FPGA and GPU). This algorithm extends the max-min fairness algorithm.
    • In the DRF algorithm, the resource with the largest required share (ratio of demand to total) is called the dominant resource. The basic idea of DRF is to apply max-min fairness to the dominant resource, turning the multi-dimensional resource scheduling problem into a single-resource one: DRF always maximizes the smallest dominant share among all users (see the worked sketch after this list)
    • If you are interested, you can read DominantResourceCalculator#compare in the source code to explore the implementation
    • Corresponding paper "Dominant Resource Fairness: Fair Allocation of Multiple Resource Types"
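
A small worked sketch of the dominant-share comparison (the cluster size and per-user allocations are invented; this mirrors the idea behind DominantResourceCalculator rather than its actual code):

```java
// Worked sketch of the dominant-share idea behind DRF; the cluster size and
// per-user allocations are made up, and this is not the actual
// DominantResourceCalculator code.
public class DrfSketch {
  public static void main(String[] args) {
    int clusterMb = 36 * 1024, clusterVcores = 18;

    // User A has been allocated <8 GB, 3 vcores>; user B <4 GB, 12 vcores>.
    double aShare = dominantShare(8 * 1024, 3, clusterMb, clusterVcores);   // memory-dominant
    double bShare = dominantShare(4 * 1024, 12, clusterMb, clusterVcores);  // CPU-dominant

    // DRF offers the next container to the user with the smaller dominant share.
    System.out.println(aShare < bShare ? "allocate to A" : "allocate to B");
  }

  static double dominantShare(int mb, int vcores, int clusterMb, int clusterVcores) {
    return Math.max((double) mb / clusterMb, (double) vcores / clusterVcores);
  }
}
```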

(Attention here! Many articles and books write that "YARN resource scheduler adopts DominantResourceCalculator by default", which is not the case!)

  • FifoScheduler uses DefaultResourceCalculator by default, and this cannot be changed.
  • CapacityScheduler determines it via the yarn.scheduler.capacity.resource-calculator parameter in capacity-scheduler.xml (see the sketch after this list).
  • FairScheduler uses DominantResourceCalculator by default.
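
For example, switching CapacityScheduler to DRF means pointing that property at DominantResourceCalculator's fully qualified class name; normally this goes in capacity-scheduler.xml, shown programmatically below.

```java
import org.apache.hadoop.conf.Configuration;

public class ResourceCalculatorConfigSketch {
  public static void main(String[] args) {
    // Normally set in capacity-scheduler.xml; shown programmatically here.
    Configuration conf = new Configuration();
    conf.set("yarn.scheduler.capacity.resource-calculator",
        "org.apache.hadoop.yarn.util.resource.DominantResourceCalculator");
    System.out.println(conf.get("yarn.scheduler.capacity.resource-calculator"));
  }
}
```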

4. Resource preemption model

Here is only a brief introduction to the resource preemption model; a source-level analysis of the preemption process will follow in later articles.

  • In the resource scheduler, each queue can be given a minimum and a maximum amount of resources. The minimum is the amount the queue is guaranteed when cluster resources are scarce, and the maximum is the amount the queue may not exceed even in extreme cases.
  • To improve resource utilization, the resource schedulers (both Capacity Scheduler and Fair Scheduler) temporarily lend the resources of lightly loaded queues to heavily loaded ones. Only when a lightly loaded queue suddenly receives newly submitted applications does the scheduler reclaim resources belonging to that queue and return them to it.

5. Summary

This article introduced the basic framework of the YARN resource scheduler, briefly described the three schedulers that YARN ships with, and covered the resource scheduling dimensions and the resource preemption model.
In subsequent articles we will walk through the source code of the three YARN schedulers and see, step by step, how they implement these functions.


