Summary of the three resource schedulers in YARN (in the ResourceManager)

1. YARN resource scheduling

YARN allocates resources to applications; here, an "application" is the process (managed by a YARN ApplicationMaster) that runs to process a job. Three application models are common:

(1) MapReduce's model: one user job corresponds to one application.
(2) Spark's model: each workflow, or each user session, corresponds to one application that can run a series of jobs. Because containers are reused across jobs, this is more efficient than the first model.
(3) A long-running application shared by multiple users.

2. FIFO Scheduler

Hadoop's original scheduler was a single-queue, first-in-first-out scheduler. It grants resources to the first application in the queue and serves the next application only after the first is satisfied. As a result it cannot make full use of the cluster's hardware, and it is unsuitable for shared clusters.

3. Capacity Scheduler

The Capacity Scheduler is a multi-user scheduler developed at Yahoo!. It divides resources into queues: each queue can be given a minimum guaranteed share and an upper limit, and each user can also be given a resource cap to prevent abuse. When a queue has surplus resources, they can be temporarily shared with other queues. In short, the Capacity Scheduler has the following characteristics:

  1. Capacity guarantees. Administrators can set a minimum guaranteed share and a usage cap for each queue; all applications submitted to a queue share its resources.
  2. Flexibility (elasticity). If a queue has spare resources, they can be temporarily lent to queues that need them; once new applications are submitted to the lending queue, resources released by the borrowing queues are returned to it. Compared with the HOD scheduler, this elastic allocation significantly improves resource utilization.
  3. Multi-tenancy. Multiple users can share the cluster and run applications simultaneously. To prevent a single application, user, or queue from monopolizing the cluster, administrators can impose constraints (for example, a limit on the number of tasks a single application may run at once).
  4. Security. Each queue has a strict ACL specifying which users may access it, and each user can specify who is allowed to view or control (for example, kill) their applications. Administrators can also designate queue administrators and cluster system administrators.
  5. Dynamic configuration updates. Administrators can modify configuration parameters at runtime, enabling online cluster management.

The Capacity Scheduler lets multiple organizations share one Hadoop cluster: each organization is configured with a dedicated queue, each queue is allocated a portion of the cluster's resources, and within each queue applications are scheduled FIFO.

When a queue has too many jobs for its configured share, the Capacity Scheduler may assign it idle resources from other queues; this is called an "elastic queue". The scheduler will not, however, force a queue to release resources: a queue that lacks resources must wait for other queues to release them. A maximum resource usage can be set per queue so that a queue does not absorb so much idle capacity that other queues can no longer use it; choosing this cap is the trade-off inherent in elastic queues.
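The guaranteed-share and elasticity-cap settings described above live in `capacity-scheduler.xml`. A minimal sketch with two illustrative queues (the names `prod` and `dev` and the percentages are examples, not from the source):

```xml
<!-- capacity-scheduler.xml: illustrative two-queue layout -->
<configuration>
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>prod,dev</value>
  </property>
  <property>
    <!-- guaranteed share: 60% of the cluster -->
    <name>yarn.scheduler.capacity.root.prod.capacity</name>
    <value>60</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.dev.capacity</name>
    <value>40</value>
  </property>
  <property>
    <!-- elasticity cap: dev may borrow idle resources up to 75% -->
    <name>yarn.scheduler.capacity.root.dev.maximum-capacity</name>
    <value>75</value>
  </property>
</configuration>
```

Without `maximum-capacity`, `dev` could elastically grow to the whole cluster while it is idle; the cap limits how much `prod` can be crowded out.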
(Figure: Capacity Scheduler queue hierarchy)

4. Fair Scheduler

The Fair Scheduler is a multi-user scheduler developed at Facebook. Like the Capacity Scheduler, it divides resources into queues: each queue can have a minimum guaranteed share and an upper limit, and each user can also be given a cap to prevent resource abuse; when a queue has surplus resources, they can be temporarily shared with other queues. The Fair Scheduler nevertheless differs from the Capacity Scheduler in several ways:

  1. Fair sharing of resources. Within each queue, the Fair Scheduler can allocate resources to applications under a FIFO, Fair, or DRF policy. The Fair policy is based on the max-min fairness algorithm and is the default within each queue: if two applications run in a queue simultaneously, each gets 1/2 of the queue's resources; with three applications, each gets 1/3.
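A minimal sketch of the max-min fairness idea (progressive filling), assuming abstract resource units and per-application demands; this is an illustration of the algorithm, not YARN's actual implementation:

```python
def max_min_fair(capacity, demands):
    """Max-min fair allocation by progressive filling: repeatedly split
    the remaining capacity equally among still-unsatisfied applications;
    surplus from applications that need less than an equal share is
    redistributed to the others."""
    alloc = [0.0] * len(demands)
    unsatisfied = list(range(len(demands)))
    cap = float(capacity)
    while unsatisfied and cap > 1e-9:
        share = cap / len(unsatisfied)          # equal split of what is left
        still_unsatisfied = []
        for i in unsatisfied:
            give = min(share, demands[i] - alloc[i])
            alloc[i] += give
            cap -= give
            if demands[i] - alloc[i] > 1e-9:    # still wants more
                still_unsatisfied.append(i)
        unsatisfied = still_unsatisfied
    return alloc

# Two greedy apps split the queue evenly; a small app takes only what
# it needs and the surplus flows to the others.
print(max_min_fair(12, [10, 10]))      # -> [6.0, 6.0]
print(max_min_fair(12, [2, 10, 10]))   # -> [2.0, 5.0, 5.0]
```

This reproduces the behavior described above: n equally hungry applications each end up with 1/n of the queue's resources.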

  2. Resource preemption. When a queue has spare resources, the scheduler lends them to other queues; when a new application is later submitted to that queue, the scheduler reclaims resources for it. To minimize wasted computation, it waits first and only then forces reclamation: if resources have not been returned after a period of time, it preempts by killing some tasks in the queues that are over their share, thereby releasing resources.
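Preemption is off by default and must be switched on cluster-wide; a minimal sketch of the relevant `yarn-site.xml` setting:

```xml
<!-- yarn-site.xml: enable Fair Scheduler preemption (disabled by default) -->
<property>
  <name>yarn.scheduler.fair.preemption</name>
  <value>true</value>
</property>
```

The "wait first, then reclaim" grace period is tuned with per-queue preemption-timeout settings in the Fair Scheduler allocation file.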

  3. Load balancing. The Fair Scheduler provides a load-balancing mechanism based on task counts, spreading tasks across the cluster's nodes as evenly as possible. Users can also plug in a load-balancing mechanism of their own design.
    Flexible scheduling-policy configuration. The Fair Scheduler lets administrators set the scheduling policy individually for each queue (currently FIFO, Fair, or DRF).

  4. Better response time for small jobs. Thanks to the max-min fairness algorithm, small jobs can acquire resources quickly and run to completion.

    The Fair Scheduler emphasizes fair use of resources across multiple users and adjusts each application's allocation dynamically. For example, when a large job is submitted and is the only job running, it receives all of the cluster's resources; when a second job is submitted, the scheduler reassigns half of the first job's resources to it, possibly after a delay while it waits for the first job's resources to be released.

    The Fair Scheduler organizes applications into queues, and resources are shared fairly between queues.
    (Figure: Fair Scheduler resource division)
    As the figure shows, the cluster's resources are split in half between the users, and the resources inside each user's queue are then divided again among its applications.
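Per-queue shares and policies are declared in the Fair Scheduler's allocation file. A minimal sketch with two illustrative user queues (names and weights are examples, not from the source):

```xml
<!-- fair-scheduler.xml (allocation file): illustrative queues -->
<allocations>
  <queue name="userA">
    <weight>1.0</weight>                       <!-- equal weights: 50/50 split -->
    <schedulingPolicy>fair</schedulingPolicy>  <!-- max-min fair within the queue -->
  </queue>
  <queue name="userB">
    <weight>1.0</weight>
    <schedulingPolicy>fifo</schedulingPolicy>  <!-- per-queue policy is configurable -->
  </queue>
  <queuePlacementPolicy>
    <rule name="specified"/>  <!-- honor an explicitly requested queue -->
    <rule name="user"/>       <!-- otherwise place the app in a per-user queue -->
  </queuePlacementPolicy>
</allocations>
```

With equal weights, the two queues each receive half of the cluster when both are busy, matching the figure described above.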

5. Choosing a scheduler

For scenarios with high concurrency requirements and ample CPU, choose the Fair Scheduler; this is typical of large companies.

If the concurrency requirements are modest, choose the Capacity Scheduler; this is typical of small and medium-sized enterprises.
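Whichever scheduler is chosen, it is selected in `yarn-site.xml` via the ResourceManager's scheduler class:

```xml
<!-- yarn-site.xml: selecting the scheduler implementation -->
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <!-- CapacityScheduler shown here; substitute
       org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler
       or ...scheduler.fifo.FifoScheduler as needed -->
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>
```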


Origin: blog.csdn.net/wilde123/article/details/118943479