Introduction to Big Data (5): Introduction to YARN and Its Detailed Workflow

Apache YARN (Yet Another Resource Negotiator) is Hadoop's cluster resource management system. The core idea of YARN is to split resource management and job scheduling/monitoring into separate daemons: a ResourceManager that manages cluster resources, and NodeManagers that launch and monitor containers.

1. Key concepts in YARN

  1. Container: an abstraction and encapsulation of cluster resources (memory, CPU, disk, network, etc.). A Container is a bundle of allocated system resources on a node.
  2. Job / Application: a unit of work to be executed; it consists of the input data, the program (for example a MapReduce program), and configuration information.
  3. ResourceManager (RM): the global resource manager, responsible for resource management and allocation across the whole cluster. It consists mainly of two components: the Scheduler and the ApplicationsManager.
  4. Scheduler: allocates resources to applications according to a configured policy. YARN currently ships three schedulers: the FIFO Scheduler, the Capacity Scheduler, and the Fair Scheduler.
  5. ApplicationsManager: manages all applications in the system. It accepts application submissions, negotiates with the Scheduler for the resources needed to start each ApplicationMaster, monitors each ApplicationMaster's status, and restarts it on failure.
  6. ApplicationMaster (AM): a new ApplicationMaster is created for every Application a client submits. The ApplicationMaster requests container resources from the ResourceManager and, once they are granted, ships the program to be run to those containers and starts the distributed computation (see the sketch after this list).
  7. NodeManager (NM): the ResourceManager's agent on each machine. Its main job is to launch and manage containers, monitor their resource usage, and report that usage to the ResourceManager/Scheduler regularly; the ResourceManager then decides what to do with the node's resources (allocate, reclaim, etc.).
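
To make the Container and ApplicationMaster concepts concrete, here is a minimal sketch (not a complete ApplicationMaster) of how an AM registers with the ResourceManager and asks for one Container using Hadoop's Java AMRMClient API. The memory/vCore values, priority, and host settings are placeholder assumptions.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ContainerRequestSketch {
    public static void main(String[] args) throws Exception {
        // Client used by an ApplicationMaster to talk to the ResourceManager.
        AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
        rmClient.init(new YarnConfiguration());
        rmClient.start();

        // Register this AM with the RM (host, port, and tracking URL are placeholders).
        rmClient.registerApplicationMaster("localhost", 0, "");

        // A Container is described by a Resource: here 1024 MB of memory and 1 vCore.
        Resource capability = Resource.newInstance(1024, 1);
        Priority priority = Priority.newInstance(0);

        // Ask the RM for one container anywhere in the cluster (no node/rack constraints).
        rmClient.addContainerRequest(new ContainerRequest(capability, null, null, priority));

        // Heartbeat/poll the RM until a container is granted.
        List<Container> allocated = new ArrayList<>();
        while (allocated.isEmpty()) {
            AllocateResponse response = rmClient.allocate(0.0f);
            allocated.addAll(response.getAllocatedContainers());
            Thread.sleep(1000);
        }
        Container container = allocated.get(0);
        System.out.println("Got " + container.getId() + " on " + container.getNodeId()
                + " with " + container.getResource());

        // Unregister when the work is done, so the RM can reclaim resources.
        rmClient.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, "done", "");
    }
}
```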


2. Detailed workflow of submitting an Application to YARN

When a user submits an application to YARN, YARN runs it in two stages: in the first stage the ApplicationMaster is started; in the second stage the ApplicationMaster creates the application's tasks, requests resources for them, and monitors the entire run until it completes.

  1. First, the client submits the Application to the ResourceManager; the submission includes the ApplicationMaster program, the command used to start the ApplicationMaster, and the user program.
  2. The ResourceManager communicates with a NodeManager to allocate the first container for the Application, and starts the Application's ApplicationMaster inside that container.
  3. Once started, the ApplicationMaster registers itself with the ResourceManager and maintains a heartbeat connection with it.
  4. The ApplicationMaster then splits the Application into multiple tasks (for example, the MapTasks and ReduceTasks of a MapReduce job) and requests resources for each task from the ResourceManager via RPC, polling until the requests are satisfied.
  5. The ResourceManager returns the requested Container information to the ApplicationMaster. The ApplicationMaster initializes the launch context of each granted Container and then contacts the corresponding NodeManager, asking it to start the Container.
  6. After the containers are started, the ApplicationMaster keeps communicating with the NodeManagers to monitor and manage the tasks running on them. While a Container is running, it reports its progress and status to the ApplicationMaster via RPC.
  7. When the application finishes running, the ApplicationMaster unregisters from the ResourceManager and shuts itself down, allowing its Containers to be reclaimed.

The above is the general operation process of an Application.
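
As a concrete illustration of steps 1 and 2, below is a minimal sketch of a client submitting an application with the YarnClient Java API. The application name, queue, resource sizes, and the /bin/sleep launch command are placeholder assumptions; a real ApplicationMaster command and its local resources would go in the ContainerLaunchContext.

```java
import java.util.Collections;

import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.util.Records;

public class SubmitApplicationSketch {
    public static void main(String[] args) throws Exception {
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new YarnConfiguration());
        yarnClient.start();

        // Step 1: ask the ResourceManager for a new application and its submission context.
        YarnClientApplication app = yarnClient.createApplication();
        ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
        ApplicationId appId = appContext.getApplicationId();

        appContext.setApplicationName("demo-app");    // placeholder name
        appContext.setQueue("default");               // target scheduler queue

        // Describe how to launch the ApplicationMaster container; the command is a placeholder
        // (a real AM would also need local resources and environment settings here).
        ContainerLaunchContext amContainer = Records.newRecord(ContainerLaunchContext.class);
        amContainer.setCommands(Collections.singletonList("/bin/sleep 30"));
        appContext.setAMContainerSpec(amContainer);

        // Resources for the AM container: 512 MB of memory and 1 vCore.
        appContext.setResource(Resource.newInstance(512, 1));

        // Step 2: submit; the RM allocates the first container and starts the AM in it.
        yarnClient.submitApplication(appContext);

        // Poll the application report until the application reaches a terminal state.
        YarnApplicationState state;
        do {
            Thread.sleep(1000);
            ApplicationReport report = yarnClient.getApplicationReport(appId);
            state = report.getYarnApplicationState();
        } while (state != YarnApplicationState.FINISHED
                && state != YarnApplicationState.KILLED
                && state != YarnApplicationState.FAILED);

        System.out.println(appId + " ended in state " + state);
        yarnClient.stop();
    }
}
```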

3. YARN schedulers

1. FIFO Scheduler (first-in, first-out scheduler): applications are placed into a single queue in the order they are submitted, and resources are allocated in that same order. Only after the resource requests of the first application are satisfied are resources allocated to the next application.

2. Capacity Scheduler: the cluster's resources are divided into multiple queues, each configured with a capacity (a percentage of the cluster's resources) according to demand, and FIFO scheduling is used within each queue. Because capacity is assigned per queue, no single queue can take over all of the cluster's resources.

For example, the cluster's resources can be divided into two queues, A and B, where A is allocated 60% of the cluster's resources and B 40%. Each queue can be further divided into sub-queues; for example, queue B can be split into B1 and B2.

 

Note: a single job will not use more resources than its queue's capacity. However, if a queue's resources are insufficient while other queues have idle resources, the Capacity Scheduler will allocate those idle resources to the jobs in the busy queue; this is called queue elasticity. The upper limit (as a percentage) on how much a queue may use can be set with maximum-capacity.

3. Fair Scheduler: unlike the Capacity Scheduler, the Fair Scheduler does not reserve a fixed share of resources in advance; instead, it allocates resources dynamically.

For example, when job 1 is submitted and there are no other jobs in the cluster, job 1 monopolizes the whole cluster. When job 2 is submitted, resources released by job 1 are allocated to the new job until the two jobs each hold roughly the same share of resources, which is the fairness goal.

The Fair Scheduler can also work across multiple queues. Suppose there are two users, A and B, each with their own queue. When A starts a job while B has nothing to submit, A receives all of the cluster's resources. When B then starts a job, A's job keeps running, but queue A gradually releases some of its resources until the two jobs each hold half of the cluster. If B now starts a second job while the other jobs are still running, it shares queue B's resources with B's first job, so the two jobs in queue B each use a quarter of the cluster while the job in queue A still uses half. In the end, the cluster's resources are shared equally between the two users.

If a job is submitted while the cluster's resources are exhausted, it stays in the ACCEPTED state and waits for resources to be released. To make the time from submission to execution predictable, the Fair Scheduler supports preemption: the scheduler is allowed to kill containers in queues that are using more than their fair share. Note that preemption reduces overall cluster efficiency, because the killed containers must be re-executed.
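
Whichever scheduler is configured, the queue layout and current usage can be inspected programmatically. The sketch below uses the YarnClient Java API to print each queue's configured capacity, its maximum-capacity, and the share currently in use; it assumes a reachable cluster configuration on the classpath.

```java
import java.util.List;

import org.apache.hadoop.yarn.api.records.QueueInfo;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class QueueInfoSketch {
    public static void main(String[] args) throws Exception {
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new YarnConfiguration());
        yarnClient.start();

        // List every queue known to the ResourceManager's scheduler.
        List<QueueInfo> queues = yarnClient.getAllQueues();
        for (QueueInfo q : queues) {
            System.out.printf("%-20s capacity=%.2f max=%.2f current=%.2f%n",
                    q.getQueueName(),
                    q.getCapacity(),          // configured share of the parent/cluster
                    q.getMaximumCapacity(),   // upper bound (maximum-capacity)
                    q.getCurrentCapacity());  // share actually in use right now
        }

        yarnClient.stop();
    }
}
```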

4. Example of enterprise queue configuration

 root
 --> app (Fair Scheduler: 80%)
     --> etl-monthly (30% of the cluster's resources, up to 80%, FIFO strategy)
     --> etl-daily (30% of the cluster's resources, up to 100%, FIFO strategy)
     --> etl-eco (20% of the cluster's resources, up to 60%, FIFO strategy)
 --> default (Fair Scheduler: 20%)

5. YARN commands

Kill an Application: yarn application -kill <application_id>, for example yarn application -kill application_1618392236222_129696
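
The same operation is available from the Java API; below is a minimal sketch using YarnClient.killApplication, reusing the example application id from above:

```java
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class KillApplicationSketch {
    public static void main(String[] args) throws Exception {
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new YarnConfiguration());
        yarnClient.start();

        // Same effect as `yarn application -kill <application_id>` on the command line.
        ApplicationId appId = ApplicationId.fromString("application_1618392236222_129696");
        yarnClient.killApplication(appId);

        yarnClient.stop();
    }
}
```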

See the official documentation for more: Apache Hadoop 3.3.1 – YARN Commands

