[Hadoop Series] (3) Introduction to YARN and Its Principles

YARN

1. YARN concept

YARN (Yet Another Resource Negotiator) is a component introduced in Hadoop 2.0. It is responsible for resource scheduling and management across the cluster, and allocates compute resources to applications such as MapReduce programs.

2. YARN components

YARN is mainly composed of four components: ResourceManager, NodeManager, ApplicationMaster, and Container.


  • ResourceManager

The core component of resource management, usually deployed on a dedicated node. It is responsible for allocating resources to jobs submitted by clients, providing resource scheduling as a shared, secure, multi-tenant service based on application priority, queue capacity, and data locality (see the YarnClient sketch after this list).

  • NodeManager

Runs on each worker node and is responsible for managing it, including lifecycle management, resource monitoring, and health tracking of all containers on the node.

  • ApplicationMaster

After a client submits an application, YARN asks the ResourceManager to launch a lightweight, per-application ApplicationMaster process. The ApplicationMaster negotiates resources with the ResourceManager, monitors resource usage and task execution on the NodeManagers, and handles fault tolerance. Different frameworks provide their own implementations, such as the MapReduce ApplicationMaster and the Giraph ApplicationMaster.

  • Container

Container is the resource abstraction in YARN; it encapsulates a node's resources such as memory, CPU, disk, and network.
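
To make the division of labor concrete, here is a minimal sketch (my own example, assuming a running cluster whose yarn-site.xml is on the classpath) that uses the YarnClient API to ask the ResourceManager for the NodeManagers it is currently tracking:

```java
import java.util.List;

import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.api.records.NodeState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ListClusterNodes {
    public static void main(String[] args) throws Exception {
        // YarnConfiguration picks up yarn-site.xml from the classpath.
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new YarnConfiguration());
        yarnClient.start();

        // Ask the ResourceManager for every NodeManager in RUNNING state.
        List<NodeReport> nodes = yarnClient.getNodeReports(NodeState.RUNNING);
        for (NodeReport node : nodes) {
            System.out.printf("%s: %d containers, used %s of %s%n",
                    node.getNodeId(), node.getNumContainers(),
                    node.getUsed(), node.getCapability());
        }
        yarnClient.stop();
    }
}
```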


3. Operating mechanism

As described above, the ResourceManager and NodeManager play mostly managerial roles; it is the ApplicationMaster that actually drives a job. The rest of this section therefore walks through the running process of the MapReduce ApplicationMaster (MRAppMaster) in detail.


(1) Job submission stage

The client submits the job to the cluster, and YarnRunner asks the ResourceManager for a new application (and its application ID). The client then copies the job resources (the job JAR, configuration, and split information) to HDFS and, once submission completes, requests that the MRAppMaster be started.
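
For reference, client-side submission usually looks like the classic driver below: a minimal sketch of a WordCount-style job, where WordCountMapper and WordCountReducer are hypothetical classes standing in for your own implementations. The waitForCompletion() call is where YarnRunner requests the application, stages resources to HDFS, and submits:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCountMapper.class);   // hypothetical mapper
        job.setReducerClass(WordCountReducer.class); // hypothetical reducer
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Submission happens here: YarnRunner asks the RM for an application ID,
        // copies the job resources to HDFS, and submits the application.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```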

(2) Job initialization stage

The ResourceManager hands the request to its scheduler, which allocates a container; the NodeManager on that node then launches the ApplicationMaster inside the container, and the MRAppMaster initializes the job.

(3) Task allocation stage

The MRAppMaster retrieves the input split information computed by the client and creates one Map task per split, together with the corresponding Reduce tasks. Small jobs may run directly in the MRAppMaster's own JVM (a so-called uber job); otherwise, the MRAppMaster requests containers from the ResourceManager through the heartbeat mechanism, and the scheduler designates NodeManagers to host the tasks and create containers.
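
To show what "requesting containers through the heartbeat" looks like in code, here is a hedged sketch using the AMRMClient API. It only works when run inside an ApplicationMaster container that YARN itself launched (it needs the AM's security token), and the resource values are illustrative:

```java
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class RequestContainers {
    public static void main(String[] args) throws Exception {
        AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
        rmClient.init(new YarnConfiguration());
        rmClient.start();

        // Register this ApplicationMaster with the RM (host/port/URL illustrative).
        rmClient.registerApplicationMaster("", 0, "");

        // Ask for one container with 1024 MB of memory and 1 vcore.
        Resource capability = Resource.newInstance(1024, 1);
        rmClient.addContainerRequest(
                new ContainerRequest(capability, null, null, Priority.newInstance(0)));

        // Each allocate() call doubles as the AM -> RM heartbeat; the response
        // carries whatever containers the scheduler has granted so far.
        int granted = rmClient.allocate(0.0f).getAllocatedContainers().size();
        System.out.println("Containers granted so far: " + granted);
    }
}
```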

(4) Task execution stage

The NodeManager creates and starts the container. Inside the container, a Java application called YarnChild executes the map or reduce task.
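
On the AM side, starting a granted container goes through the NMClient API. The sketch below launches an illustrative shell command rather than the real YarnChild (whose launch context MapReduce assembles internally); the container argument would come from a previous AMRMClient.allocate() response:

```java
import java.util.Collections;

import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.client.api.NMClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class StartGrantedContainer {
    // 'container' comes from an earlier AMRMClient.allocate() call.
    static void start(Container container) throws Exception {
        NMClient nmClient = NMClient.createNMClient();
        nmClient.init(new YarnConfiguration());
        nmClient.start();

        // Describe what the NodeManager should run inside the container.
        ContainerLaunchContext ctx = ContainerLaunchContext.newInstance(
                Collections.emptyMap(),   // local resources (job JAR, config, ...)
                Collections.emptyMap(),   // environment variables
                Collections.singletonList("echo hello from a YARN container"),
                null, null, null);        // service data, tokens, ACLs

        nmClient.startContainer(container, ctx);
    }
}
```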

(5) Progress and status update stage

Tasks running on YARN report their progress and status back to the ApplicationMaster, and the client polls the ApplicationMaster for progress updates every second (configurable via mapreduce.client.progressmonitor.pollinterval) so they can be displayed to the user.
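
A quick sketch of tuning that polling interval from client code; the 500 ms value here is just an illustration:

```java
import org.apache.hadoop.conf.Configuration;

public class ProgressPollInterval {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // How often (ms) the client asks the AM for progress; default is 1000.
        conf.setLong("mapreduce.client.progressmonitor.pollinterval", 500L);
        System.out.println("poll interval = "
                + conf.getLong("mapreduce.client.progressmonitor.pollinterval", 1000L));
    }
}
```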

(6) Job cleanup stage

In addition to polling the ApplicationMaster for progress, the client checks for job completion via waitForCompletion(), which polls roughly every 5 seconds; the interval can be set with mapreduce.client.completion.pollinterval. When the job completes, the ApplicationMaster and the task containers clean up their working state, and the job information is archived by the job history server so the user can consult it later.
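
The same waiting behavior can also be reproduced by hand. The sketch below (assuming a job configured as in step (1)) submits without blocking and polls for completion itself, mirroring what waitForCompletion() does internally at mapreduce.client.completion.pollinterval:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class PollForCompletion {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Interval (ms) at which waitForCompletion() would check for completion.
        conf.setLong("mapreduce.client.completion.pollinterval", 5000L);

        Job job = Job.getInstance(conf, "poll demo");
        // ... set mapper, reducer, and input/output paths as in step (1) ...
        job.submit();                       // non-blocking submission

        while (!job.isComplete()) {         // manual completion polling
            System.out.printf("map %.0f%% reduce %.0f%%%n",
                    job.mapProgress() * 100, job.reduceProgress() * 100);
            Thread.sleep(5000);
        }
        System.out.println("succeeded: " + job.isSuccessful());
    }
}
```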


Source: blog.csdn.net/qq_40589204/article/details/118244588