[Hadoop, Part 8] YARN Resource Scheduling Policies

1. Hadoop's three scheduling policies

Hadoop provides three job scheduling policies:

FIFO Scheduler
Fair Scheduler
Capacity Scheduler

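All three schedulers were already introduced in Hadoop MR1, and YARN improved and refined them. The Fair Scheduler and the Capacity Scheduler are intended for resource scheduling on clusters shared by multiple users.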

2. Scheduling for multi-user resource sharing

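Supports allocating resources by proportion or by absolute amount
Supports allocation across hierarchical queues
Supports resource preemption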

3. MR1 vs YARN resource scheduling

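Hadoop MR1 abstracts multi-dimensional resources into a one-dimensional unit, the slot, and resource scheduling is the process of assigning slots to tasks. What is a slot? It is a fixed-size share of a TaskTracker's resources, configured separately as map slots and reduce slots, and each slot runs one task at a time. In YARN, Hadoop schedules CPU and memory directly and drops the slot concept.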
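The difference between the two models can be sketched in a few lines of Java. This is an illustration only: SlotVsResourceSketch, Resource and allocate are hypothetical names invented here, not Hadoop or YARN APIs, and the greedy placement is an assumption made to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: hypothetical classes, not Hadoop or YARN APIs. */
class SlotVsResourceSketch {

    /** MR1 style: capacity is a count of identical slots, one task per slot. */
    static int tasksThatFitBySlots(int freeSlots, int pendingTasks) {
        return Math.min(freeSlots, pendingTasks);
    }

    /** YARN style: a request or a capacity described by memory and CPU. */
    static final class Resource {
        final int memoryMb;
        final int vcores;

        Resource(int memoryMb, int vcores) {
            this.memoryMb = memoryMb;
            this.vcores = vcores;
        }

        /** A request fits only if every dimension fits. */
        boolean fitsIn(Resource free) {
            return memoryMb <= free.memoryMb && vcores <= free.vcores;
        }

        Resource minus(Resource used) {
            return new Resource(memoryMb - used.memoryMb, vcores - used.vcores);
        }
    }

    /** Greedily grants requests in order while the node still has room. */
    static List<Resource> allocate(Resource nodeCapacity, List<Resource> requests) {
        Resource free = nodeCapacity;
        List<Resource> granted = new ArrayList<>();
        for (Resource request : requests) {
            if (request.fitsIn(free)) {
                granted.add(request);
                free = free.minus(request);
            }
        }
        return granted;
    }
}
```

With slots, a task that needs very little memory still occupies a whole slot. With per-dimension accounting, a node can accept small requests as long as both memory and CPU remain.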

4. FIFO Scheduler

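The FIFO Scheduler can be thought of as a plain Java queue, such as ArrayDeque or LinkedList. It is the resource scheduling policy that early versions of Hadoop used: in effect the cluster runs only one job at a time, and once that job finishes, the jobs submitted after it are scheduled one by one in FIFO order. The FIFO Scheduler runs each job with exclusive use of the cluster's resources. The advantage is that a single job can make full use of the whole cluster, but for short-running, high-importance, or interactive-query MR jobs, the waiting time becomes a problem. A pure FIFO scheduler is simple to implement, but it does not meet the needs of many real-world scenarios.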
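The waiting-time problem is easy to see in a toy simulation. The sketch below assumes each job has a known runtime and holds the whole cluster while it runs. FifoSketch and Job are hypothetical names, not Hadoop classes.

```java
import java.util.ArrayDeque;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Queue;

/** Illustrative only: a toy FIFO scheduler, not Hadoop code. */
class FifoSketch {

    static final class Job {
        final String name;
        final int runtime; // time units the job needs on the full cluster

        Job(String name, int runtime) {
            this.name = name;
            this.runtime = runtime;
        }
    }

    /** Runs jobs one at a time in submission order; returns each job's finish time. */
    static Map<String, Integer> run(Queue<Job> submitted) {
        Map<String, Integer> finishTimes = new LinkedHashMap<>();
        int clock = 0;
        while (!submitted.isEmpty()) {
            Job job = submitted.poll(); // head of the queue = earliest submission
            clock += job.runtime;       // the job owns the cluster until it is done
            finishTimes.put(job.name, clock);
        }
        return finishTimes;
    }

    public static void main(String[] args) {
        Queue<Job> queue = new ArrayDeque<>();
        queue.add(new Job("long-etl", 100));
        queue.add(new Job("short-query", 5));
        // The 5-unit query cannot finish before time 105.
        System.out.println(run(queue));
    }
}
```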

5. Priority-based job scheduling

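After the FIFO Scheduler came priority-based job scheduling, which can be thought of as a Java priority queue such as PriorityQueue: whenever a job is to be run, Hadoop takes the highest-priority job from the job queue and executes it. There are two ways to set a job's priority: set the mapred.job.priority property, or call the setJobPriority method of JobClient. The priority is given as one of the following constants, listed from highest to lowest: VERY_HIGH, HIGH, NORMAL, LOW, VERY_LOW.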
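The priority-queue analogy can be made concrete as follows. The Priority enum here is a local stand-in that only mirrors the five constant names, and PrioritySketch, Job and runOrder are hypothetical. Breaking ties by submission order is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Illustrative only: a toy priority scheduler, not Hadoop code. */
class PrioritySketch {

    /** Declared from highest to lowest, so a lower ordinal means more urgent. */
    enum Priority { VERY_HIGH, HIGH, NORMAL, LOW, VERY_LOW }

    static final class Job {
        final String name;
        final Priority priority;
        final long submitSeq; // submission order, used to break ties

        Job(String name, Priority priority, long submitSeq) {
            this.name = name;
            this.priority = priority;
            this.submitSeq = submitSeq;
        }
    }

    /** Highest priority first; equal priorities fall back to FIFO order. */
    static final Comparator<Job> ORDER =
        Comparator.comparing((Job j) -> j.priority)
                  .thenComparingLong(j -> j.submitSeq);

    /** Returns job names in the order the scheduler would pick them. */
    static List<String> runOrder(List<Job> submitted) {
        PriorityQueue<Job> queue = new PriorityQueue<>(ORDER);
        queue.addAll(submitted);
        List<String> order = new ArrayList<>();
        while (!queue.isEmpty()) {
            order.add(queue.poll().name);
        }
        return order;
    }
}
```

The sketch only decides which waiting job starts next. It does not interrupt a job that is already running, so a high-priority job can still wait behind a long low-priority job that started earlier.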

6. Fair Scheduler

The Fair Scheduler lets multiple job submitters share computing resources, so that a cluster can run several jobs at the same time.

The Fair Scheduler aims to give every user a fair share of the cluster capacity over time. If a single job is running, it gets all of the cluster. As more jobs are submitted, free task slots are given to the jobs in such a way as to give each user a fair share of the cluster. A short job belonging to one user will complete in a reasonable time even while another user’s long job is running, and the long job will still make progress.

Jobs are placed in pools, and by default, each user gets her own pool. A user who submits more jobs than a second user will not get any more cluster resources than the second, on average. It is also possible to define custom pools with guaranteed minimum capacities specified in terms of the number of map and reduce slots, and to set weightings for each pool. The Fair Scheduler supports preemption, so if a pool has not received its fair share for a certain period of time, the scheduler will kill tasks in pools running over capacity in order to give more slots to the pool running under capacity.

The Fair Scheduler is a “contrib” module. To enable it, place its JAR file on Hadoop’s classpath by copying it from Hadoop’s contrib/fairscheduler directory to the lib directory. Then set the mapred.jobtracker.taskScheduler property to org.apache.hadoop.mapred.FairScheduler. The Fair Scheduler will work without further configuration, but to take full advantage of its features and learn how to configure it (including its web interface), refer to the README file in the src/contrib/fairscheduler directory of the distribution.
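The claim that a user who submits more jobs gets no more of the cluster than another user follows from sharing per pool rather than per job. The sketch below is a simplification under stated assumptions: it splits slots across pools by weight only, then equally among the jobs in a pool. It ignores minimum shares, actual demand and preemption. FairShareSketch and its methods are hypothetical names and are not the Fair Scheduler's real algorithm or API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: weight-proportional sharing, not the real Fair Scheduler. */
class FairShareSketch {

    /** Splits totalSlots across pools in proportion to each pool's weight. */
    static Map<String, Double> poolShares(int totalSlots, Map<String, Double> poolWeights) {
        double weightSum = 0;
        for (double w : poolWeights.values()) {
            weightSum += w;
        }
        Map<String, Double> shares = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : poolWeights.entrySet()) {
            shares.put(e.getKey(), totalSlots * e.getValue() / weightSum);
        }
        return shares;
    }

    /** Inside a pool, running jobs divide the pool's share equally. */
    static double perJobShare(double poolShare, int runningJobs) {
        return runningJobs == 0 ? 0.0 : poolShare / runningJobs;
    }

    public static void main(String[] args) {
        Map<String, Double> weights = new LinkedHashMap<>();
        weights.put("alice", 1.0); // one pool per user, equal weights
        weights.put("bob", 1.0);
        Map<String, Double> shares = poolShares(100, weights);
        // alice runs 4 jobs, bob runs 1: both pools still get the same total.
        System.out.println("alice per job: " + perJobShare(shares.get("alice"), 4));
        System.out.println("bob per job:   " + perJobShare(shares.get("bob"), 1));
    }
}
```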

7. Capacity Scheduler

Hierarchical queue-based resource scheduling

The Capacity Scheduler takes a slightly different approach to multiuser scheduling. A cluster is made up of a number of queues (like the Fair Scheduler’s pools), which may be hierarchical (so a queue may be the child of another queue), and each queue has an allocated capacity. This is like the Fair Scheduler, except that within each queue, jobs are scheduled using FIFO scheduling (with priorities). In effect, the Capacity Scheduler allows users or organizations (defined using queues) to simulate a separate MapReduce cluster with FIFO scheduling for each user or organization. In contrast, the Fair Scheduler (which actually also supports FIFO job scheduling within pools as an option, making it like the Capacity Scheduler) enforces fair sharing within each pool, so running jobs share the pool’s resources.
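A hierarchy of queues with allocated capacities and FIFO ordering inside each queue can be modelled roughly as below. The sketch assumes each queue's capacity is a percentage of its parent's capacity and leaves out priorities. CapacitySketch and QueueNode are hypothetical names, not the Capacity Scheduler's classes.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Illustrative only: a toy queue hierarchy, not the real Capacity Scheduler. */
class CapacitySketch {

    static final class QueueNode {
        final String name;
        final double percentOfParent; // assumed convention: share of the parent queue
        final QueueNode parent;       // null for the root queue
        private final Queue<String> jobs = new ArrayDeque<>();

        QueueNode(String name, double percentOfParent, QueueNode parent) {
            this.name = name;
            this.percentOfParent = percentOfParent;
            this.parent = parent;
        }

        /** Fraction of the whole cluster allocated to this queue. */
        double absoluteCapacity() {
            double own = percentOfParent / 100.0;
            return parent == null ? own : own * parent.absoluteCapacity();
        }

        /** Jobs inside one queue are served first in, first out. */
        void submit(String job) {
            jobs.add(job);
        }

        /** Returns the next job to run, or null if the queue is empty. */
        String nextJob() {
            return jobs.poll();
        }
    }

    public static void main(String[] args) {
        QueueNode root = new QueueNode("root", 100, null);
        QueueNode dev = new QueueNode("dev", 30, root);
        QueueNode eng = new QueueNode("eng", 50, dev);
        eng.submit("job-1");
        eng.submit("job-2");
        System.out.println(eng.absoluteCapacity()); // half of dev's 30 percent
        System.out.println(eng.nextJob());
    }
}
```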