Hadoop Scheduler Principles and Application Scenarios Explained

Preface:

To fundamentally address the performance bottlenecks of the old MapReduce framework and to promote the long-term development of Hadoop, the computational framework was completely rebuilt starting with version 0.23.0, and fundamental changes took place. The new Hadoop MapReduce framework is named YARN. The core idea of the redesign is to split the two main functions of the JobTracker, resource management and task scheduling/monitoring, into separate components. The ResourceManager globally manages resource allocation for all applications, while an ApplicationMaster is responsible for the scheduling and coordination of each individual application; using resources obtained from the ResourceManager, it works together with the NodeManagers to run and monitor tasks.

The functions of each component are explained in more detail below.

ResourceManager: schedules resources according to the needs of applications; each application requires different types of resources and therefore different containers. Resources include memory, CPU, disk, network, and so on. The ResourceManager provides a pluggable scheduling policy that is responsible for allocating cluster resources among multiple queues and applications. The scheduling plug-in can be based on the existing Capacity Scheduler or Fair Scheduler models.

NodeManager: the agent on each machine. It hosts the containers in which applications execute, monitors the applications' resource usage (CPU, memory, disk, network), and reports it to the scheduler.

ApplicationMaster: requests appropriate resource containers from the scheduler, runs tasks, tracks application status, monitors progress, and handles the causes of task failures. One ApplicationMaster is created for each application submitted by a user, so the number of ApplicationMasters depends on the number of applications submitted. First, the application is submitted to the ResourceManager; the ResourceManager then creates an ApplicationMaster for that application. All subsequent resource allocation and task execution revolve around this ApplicationMaster.

Main content:

Next, let us study the three schedulers together: the FIFO Scheduler, the Capacity Scheduler, and the Fair Scheduler.
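Which of these schedulers YARN uses is selected in `yarn-site.xml`. A minimal sketch (the class names below are the standard ones shipped with Hadoop, but you should verify them against your Hadoop version's documentation):

```xml
<!-- yarn-site.xml: select the scheduler implementation the ResourceManager loads -->
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <!-- Capacity Scheduler (the default in recent Hadoop releases): -->
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
  <!-- Alternatives:
       org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler
       org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler -->
</property>
```

Changing this property requires a ResourceManager restart to take effect.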

1. FIFO Scheduler (First In, First Out scheduling)
[Figure: execution process of the FIFO Scheduler]
The FIFO Scheduler is the simplest and easiest to understand scheduler, but its disadvantage is that it is not suitable for shared clusters. A large application may consume all cluster resources, causing other applications to be blocked. In a shared cluster, the Capacity Scheduler or Fair Scheduler is more appropriate; both allow large and small tasks submitted at the same time to obtain a share of system resources. As can be seen in the execution diagram, under the FIFO Scheduler a small task is blocked by a large task.

2. Capacity Scheduler (pre-allocated capacity scheduling)
[Figure: execution process of the Capacity Scheduler]
With the Capacity Scheduler, a dedicated queue is available for running small tasks. However, setting aside a dedicated queue for small tasks pre-empts some cluster resources, so the execution time of a large task lags behind what it would be under the FIFO Scheduler.
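The dedicated small-task queue described above is declared in `capacity-scheduler.xml`. A minimal sketch, assuming two hypothetical queues named `default` and `small` (the property names are the standard Capacity Scheduler ones; the percentages are purely illustrative):

```xml
<!-- capacity-scheduler.xml: reserve 20% of cluster capacity for small jobs -->
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>default,small</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.default.capacity</name>
  <value>80</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.small.capacity</name>
  <value>20</value>
</property>
<property>
  <!-- let the default queue borrow idle capacity up to the whole cluster -->
  <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
  <value>100</value>
</property>
```

A small job would then be submitted to the reserved queue, for example with `-Dmapreduce.job.queuename=small`.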

3. Fair Scheduler (fair-share scheduling)
[Figure: execution process of the Fair Scheduler]
With the Fair Scheduler, there is no need to pre-empt system resources in advance; the Fair Scheduler dynamically adjusts the allocation of all system resources among running jobs. As the figure shows, when the first (large) job is submitted and is the only job running, it obtains all cluster resources; when a second (small) job is submitted, the Fair Scheduler allocates half of the resources to the small task, so the two tasks fairly share the cluster resources.
Note that in the Fair Scheduler figure there is some delay between the submission of the second task and its receipt of resources, because it must wait for the first task to release the containers it occupies. After the small task completes, it releases the resources it occupied, and the large task again obtains all system resources. The net effect is that the Fair Scheduler achieves high resource utilization while also ensuring the timely completion of small tasks.
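The Fair Scheduler's sharing behavior can be tuned through an allocation file (pointed to by the `yarn.scheduler.fair.allocation.file` property). A minimal sketch, with hypothetical queue names and illustrative weights and minimums:

```xml
<!-- fair-scheduler.xml: two queues sharing the cluster by equal weight -->
<allocations>
  <queue name="big">
    <weight>1.0</weight>
    <schedulingPolicy>fair</schedulingPolicy>
  </queue>
  <queue name="small">
    <weight>1.0</weight>
    <!-- guarantee a minimum share so small jobs start promptly -->
    <minResources>1024 mb,1 vcores</minResources>
  </queue>
</allocations>
```

Equal weights reproduce the half-and-half split described above; raising one queue's weight shifts the steady-state share in its favor.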


Origin blog.51cto.com/13665344/2413647