Discussion of YARN resource management framework

1. Introduction

To enable cluster sharing, improve the scalability and reliability of Hadoop clusters, and eliminate the JobTracker performance bottleneck of the early MapReduce framework, the open source community introduced the unified resource management framework YARN.

YARN separates the two main functions of the JobTracker (resource management and job scheduling/monitoring) into distinct components: a global ResourceManager (RM) and a per-application ApplicationMaster (AM).

2. YARN structure

At the core of the YARN hierarchy is the ResourceManager. This entity controls the entire cluster and manages the allocation of applications to the underlying computing resources. The ResourceManager apportions the cluster's resources (compute, memory, bandwidth, etc.) among the NodeManagers (YARN's per-node agents). The ResourceManager also works with the ApplicationMasters to allocate resources, and with the NodeManagers to start and monitor their underlying applications. In this architecture, the ApplicationMaster takes over some of the roles of the former TaskTracker, and the ResourceManager takes over the role of the JobTracker.

An ApplicationMaster manages a single instance of an application running within YARN. It is responsible for negotiating resources from the ResourceManager and, through the NodeManagers, monitoring container execution and resource usage (CPU, memory, and so on).

The NodeManager manages each node in a YARN cluster. It provides per-node services, from overseeing the lifecycle of containers to monitoring resources and tracking node health. Whereas MRv1 managed the execution of Map and Reduce tasks through slots, the NodeManager manages abstract containers that represent the per-node resources available to a specific application.
The YARN structure is shown in the figure below.
[Figure: YARN architecture]

The main roles are described below:
  • Client: The YARN Application client. Users submit tasks to the ResourceManager through it and query the running status of an Application.
  • ResourceManager (RM): Responsible for the unified management and allocation of all resources in the cluster. It receives resource reports from each node (NodeManager) and allocates the collected resources to applications according to configured policies.
  • NodeManager (NM): The agent on each node in YARN. It manages a single compute node in the Hadoop cluster, which includes maintaining communication with the ResourceManager, supervising the lifecycle of Containers, monitoring the resource usage (memory, CPU, etc.) of each Container, tracking node health, and managing logs and the auxiliary services used by different applications.
  • ApplicationMaster (AM): The App Mstr in the figure. It is responsible for all work within the lifecycle of an Application, including negotiating with the RM scheduler to obtain resources, further allocating the obtained resources to its internal tasks (secondary allocation of resources), communicating with NMs to start or stop tasks, monitoring the running status of all tasks, and re-applying for resources to restart a task when it fails.
  • Container: A resource abstraction in YARN that encapsulates the multi-dimensional resources of a node, such as memory, CPU, disk and network (currently only memory and CPU are encapsulated). When the AM requests resources from the RM, the resources the RM returns are represented as Containers. YARN assigns a Container to each task, and a task can only use the resources described in its Container.
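The Client role above can also be exercised programmatically. Below is a minimal, illustrative sketch that uses Hadoop's public org.apache.hadoop.yarn.client.api.YarnClient API to connect to the ResourceManager and query the running status of applications; it assumes a yarn-site.xml with the ResourceManager address is available on the classpath.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class YarnClientSketch {
    public static void main(String[] args) throws Exception {
        // Loads yarn-site.xml from the classpath (ResourceManager address, etc.).
        Configuration conf = new YarnConfiguration();

        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(conf);
        yarnClient.start();

        // Query the running status of the applications known to the ResourceManager,
        // mirroring the "query the running status of an Application" duty of the Client role.
        for (ApplicationReport report : yarnClient.getApplications()) {
            System.out.printf("%s\t%s\t%s\t%s%n",
                    report.getApplicationId(),
                    report.getName(),
                    report.getYarnApplicationState(),
                    report.getQueue());
        }

        yarnClient.stop();
    }
}
```

Submitting a task takes the same route: the client would additionally call yarnClient.createApplication() and yarnClient.submitApplication(...), which is what frameworks such as MapReduce and Spark do on the user's behalf.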

In YARN, the resource scheduler organizes resources into hierarchical queues, which facilitates the allocation and sharing of resources among different queues and thereby improves cluster resource utilization. As shown in the figure below, the core resource allocation models of the Superior Scheduler and the Capacity Scheduler are the same.

The scheduler maintains the queue information. Users can submit applications to one or more queues. On every NodeManager heartbeat, the scheduler selects a queue according to certain rules, then selects an application in that queue and tries to allocate resources to it. If the allocation fails because of some parameter restriction, the next application is tried. After an application is selected, the scheduler processes its resource requests in priority order, from high to low: requests for node-local resources, requests for the same rack, and requests for any machine.
[Figure: hierarchical queue resource allocation]
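This locality preference is expressed in the resource requests that an ApplicationMaster queues for the scheduler. The following is a minimal, illustrative sketch using Hadoop's AMRMClient API; the node and rack names are assumptions, and in a real ApplicationMaster they would come from the input data's block locations.

```java
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class LocalityRequestSketch {

    // Queue a container request that prefers a specific node, then its rack,
    // then any machine (the same priority order the scheduler applies).
    static void requestLocalContainer(AMRMClient<ContainerRequest> amrmClient) {
        Resource capability = Resource.newInstance(2048, 1); // 2 GB of memory, 1 vcore
        Priority priority = Priority.newInstance(1);

        String[] nodes = {"node-1.example.com"}; // illustrative node name
        String[] racks = {"/rack-1"};            // illustrative rack name

        // relaxLocality = true lets the scheduler fall back from node-local to
        // rack-local and finally to any machine if the preferred resources are busy.
        amrmClient.addContainerRequest(
                new ContainerRequest(capability, nodes, racks, priority, true));
        // The request is actually sent to the ResourceManager on the AM's next allocate() call.
    }
}
```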

3. YARN principle

The new Hadoop MapReduce framework is named MRv2 or YARN. YARN mainly includes three parts: ResourceManager, ApplicationMaster and NodeManager.

  • ResourceManager: RM is a global resource manager responsible for resource management and allocation across the entire system. It mainly consists of two components: the Scheduler and the Applications Manager.

  • The scheduler allocates the system's resources to running applications subject to capacity, queue and other constraints (for example, each queue is assigned a certain amount of resources and may execute at most a certain number of jobs). The scheduler allocates resources purely based on each application's resource requirements, and the unit of allocation is an abstract concept called a Container. A Container is a dynamic resource allocation unit that bundles memory, CPU, disk, network and other resources together to limit the amount of resources each task may use. In addition, the scheduler is a pluggable component: users can design new schedulers to suit their own needs, and YARN ships with several ready-to-use schedulers, such as the Fair Scheduler and the Capacity Scheduler.

  • The Applications Manager is responsible for managing all applications in the system, including accepting application submissions, negotiating resources with the scheduler to start each application's ApplicationMaster, and monitoring the ApplicationMaster's running status and restarting it on failure.

  • NodeManager: NM is the resource and task manager on each node. On the one hand, it periodically reports to the RM the resource usage on its node and the running status of each Container; on the other hand, it receives and processes requests from the AM to start or stop Containers.

  • ApplicationMaster: AM is responsible for all work within the lifecycle of an Application (a minimal sketch of this lifecycle follows the list). This includes:
    Negotiating with the RM scheduler to obtain resources.
    Further allocating the obtained resources to its internal tasks (secondary allocation of resources).
    Communicating with NMs to start or stop tasks.
    Monitoring the running status of all tasks, and re-applying for resources to restart a task when it fails.
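The responsibilities above map onto a small set of calls in Hadoop's AMRMClient and NMClient libraries. The sketch below only illustrates the lifecycle under simplifying assumptions: error handling, the ContainerLaunchContext and the actual task logic are omitted, and the resource sizes are arbitrary.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.client.api.NMClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ApplicationMasterSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new YarnConfiguration();

        // Negotiate with the RM scheduler: register this ApplicationMaster.
        AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
        rmClient.init(conf);
        rmClient.start();
        rmClient.registerApplicationMaster("", 0, ""); // host/port/tracking URL omitted

        // Used to communicate with NodeManagers to start/stop tasks in granted containers.
        NMClient nmClient = NMClient.createNMClient();
        nmClient.init(conf);
        nmClient.start();

        // Ask the scheduler for one container (the secondary allocation happens when the
        // AM decides which internal task runs in which granted container).
        Resource capability = Resource.newInstance(1024, 1); // 1 GB of memory, 1 vcore
        rmClient.addContainerRequest(
                new ContainerRequest(capability, null, null, Priority.newInstance(0)));

        // Heartbeat loop: collect granted containers and launch tasks on them.
        int granted = 0;
        while (granted < 1) {
            AllocateResponse response = rmClient.allocate(0.1f);
            for (Container container : response.getAllocatedContainers()) {
                granted++;
                // nmClient.startContainer(container, launchContext);
                // The ContainerLaunchContext (command, local resources, environment)
                // is application specific and omitted from this sketch.
            }
            // response.getCompletedContainersStatuses() reports finished or failed
            // containers; a real AM re-applies for resources for failed tasks here.
            Thread.sleep(1000);
        }

        // Deregister once all work is done.
        rmClient.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, "", "");
    }
}
```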

4. Principle of the open source capacity scheduler Capacity Scheduler

Capacity Scheduler is a multi-user scheduler that divides resources into queues and sets a minimum resource guarantee and upper usage limit for each queue. At the same time, a resource usage limit is also set for each user to prevent resource abuse. When a queue has remaining resources, the remaining resources can be temporarily shared with other queues.

Capacity Scheduler supports multiple queues; each queue is configured with a certain amount of resources and uses a FIFO scheduling strategy internally. To prevent one user's applications from monopolizing a queue's resources, Capacity Scheduler limits the amount of resources occupied by jobs submitted by the same user. When scheduling, it first computes the resources used by each queue and selects the queue using the fewest resources; it then selects within the queue by job priority and submission time, while taking user resource limits and memory limits into account. Capacity Scheduler mainly has the following features:

  • Capacity guarantee. Cluster administrators can set a minimum resource guarantee and a resource usage cap for each queue, and these resources are shared by all applications submitted to that queue.
  • Flexibility. If a queue has spare resources, they can be temporarily shared with queues that need them; once a new application is submitted to the original queue, the borrowing queues release the resources back to it. This elastic allocation of resources can significantly improve resource utilization.
  • Multiple tenancy. Multiple users can share the cluster and multiple applications can run simultaneously. To prevent a single application, user, or queue from monopolizing cluster resources, the cluster administrator can impose additional constraints (such as the number of tasks a single application can run at the same time).
  • Security. Each queue has a strict ACL that specifies which users can access it, and each user can specify which other users are allowed to view or control their applications. In addition, the cluster administrator can designate queue administrators and cluster system administrators.
  • Dynamic configuration updates. Cluster administrators can modify configuration parameters at runtime to manage the cluster online.

Each queue in Capacity Scheduler can limit resource usage. Resource allocation among queues is based on usage, which gives queues with smaller capacity a competitive advantage. The overall throughput of the cluster is high, and the delay scheduling mechanism lets an application temporarily give up cross-machine or cross-rack allocations in order to obtain node-local scheduling.
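These guarantees and limits are configured per queue in capacity-scheduler.xml. The sketch below writes the equivalent property keys onto a Hadoop Configuration object purely to show their shape; the queue names and percentages are illustrative assumptions, not recommendations.

```java
import org.apache.hadoop.conf.Configuration;

public class CapacityQueueConfigSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // Two child queues under root; names and shares are illustrative.
        conf.set("yarn.scheduler.capacity.root.queues", "prod,dev");

        // Minimum capacity guarantee: 70% / 30% of the cluster.
        conf.set("yarn.scheduler.capacity.root.prod.capacity", "70");
        conf.set("yarn.scheduler.capacity.root.dev.capacity", "30");

        // Elastic upper limit: dev may temporarily grow to 50% of the cluster
        // when other queues have spare resources.
        conf.set("yarn.scheduler.capacity.root.dev.maximum-capacity", "50");

        // Per-user limits inside a queue, to stop one user monopolizing it.
        conf.set("yarn.scheduler.capacity.root.dev.minimum-user-limit-percent", "25");
        conf.set("yarn.scheduler.capacity.root.dev.user-limit-factor", "1");

        // ACL: only the listed users/groups may submit applications to prod.
        conf.set("yarn.scheduler.capacity.root.prod.acl_submit_applications",
                 "etl_user etl_group"); // illustrative user and group

        System.out.println("root queues: "
                + conf.get("yarn.scheduler.capacity.root.queues"));
    }
}
```

Dynamic configuration updates (the last feature above) are applied by editing capacity-scheduler.xml and running yarn rmadmin -refreshQueues, which reloads the queue definitions without restarting the ResourceManager.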

5. YARN HA principle and implementation plan

The ResourceManager in YARN is responsible for resource management and task scheduling for the entire cluster. Before Hadoop 2.4, the ResourceManager was a single point of failure in the YARN cluster. The YARN high availability solution addresses the reliability and fault tolerance of this basic service by introducing redundant ResourceManager nodes.
[Figure 1: ResourceManager high availability architecture]
The ResourceManager high availability solution is implemented by setting up a pair of Active/Standby ResourceManager nodes (Figure 1). Similar to the HDFS high availability solution, only one ResourceManager can be Active at any point in time. When the Active ResourceManager fails, a failover can be triggered automatically or manually to switch the Active/Standby states.

When automatic failover is not enabled, the cluster administrator must, after the YARN cluster starts, use the yarn rmadmin command to manually switch one of the ResourceManagers to the Active state. When planned maintenance is required or a fault occurs, the administrator first switches the Active ResourceManager to the Standby state and then switches the other ResourceManager to the Active state.

After automatic failover is enabled, the ResourceManagers use the built-in, ZooKeeper-based ActiveStandbyElector to decide which ResourceManager should become the Active node. When the Active ResourceManager fails, another ResourceManager is automatically elected Active to take over from the failed node.

When the cluster's ResourceManagers are deployed in HA mode, the yarn-site.xml used by clients must be configured with the addresses of all ResourceManagers. Clients (including the ApplicationMaster and NodeManagers) look for the Active ResourceManager by polling, which means clients must provide their own fault tolerance: if the current Active ResourceManager cannot be reached, they keep polling until they find the new Active ResourceManager.

After the standby RM is promoted to Active, it can restore the running state the upper-layer applications had when the failure occurred (see ResourceManager Restart for details). When ResourceManager Restart is enabled, the restarted ResourceManager continues execution by loading the state information saved by the previous Active ResourceManager and by rebuilding the runtime state from the container status reported by all NodeManagers. Applications can also avoid losing work by periodically checkpointing their own state. The state information must be accessible to both the Active and Standby ResourceManagers. The current system provides three ways to share state information: through the file system (FileSystemRMStateStore), through a LevelDB database (LeveldbRMStateStore), or through ZooKeeper (ZKRMStateStore). Of the three, only the ZooKeeper-based store supports the fencing mechanism, and Hadoop uses the ZooKeeper-based store by default.
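The HA and recovery behaviour described above is driven by a handful of yarn-site.xml properties. The sketch below sets the equivalent keys on a Hadoop Configuration object just to show their shape; the RM IDs, hostnames and ZooKeeper quorum are illustrative assumptions.

```java
import org.apache.hadoop.conf.Configuration;

public class RmHaConfigSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // Enable ResourceManager HA with two RM instances.
        conf.set("yarn.resourcemanager.ha.enabled", "true");
        conf.set("yarn.resourcemanager.cluster-id", "yarn-cluster-1");    // illustrative
        conf.set("yarn.resourcemanager.ha.rm-ids", "rm1,rm2");
        conf.set("yarn.resourcemanager.hostname.rm1", "rm1.example.com"); // illustrative
        conf.set("yarn.resourcemanager.hostname.rm2", "rm2.example.com"); // illustrative

        // Automatic failover through the built-in ZooKeeper-based ActiveStandbyElector.
        conf.set("yarn.resourcemanager.ha.automatic-failover.enabled", "true");
        conf.set("hadoop.zk.address",  // yarn.resourcemanager.zk-address in older releases
                 "zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181");

        // ResourceManager Restart: persist application state in ZooKeeper (ZKRMStateStore),
        // the only state store that also supports the fencing mechanism.
        conf.set("yarn.resourcemanager.recovery.enabled", "true");
        conf.set("yarn.resourcemanager.store.class",
                 "org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore");

        System.out.println("HA enabled: " + conf.get("yarn.resourcemanager.ha.enabled"));
    }
}
```

When automatic failover is disabled, the same pair of ResourceManagers is driven manually with yarn rmadmin -transitionToActive and yarn rmadmin -transitionToStandby, as described above.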

For more information about the YARN high availability solution, please refer to the following link:
http://hadoop.apache.org/docs/r3.1.1/hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html
https://hadoop.apache.org/docs/r3.3.1/hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html

6. The relationship between Yarn and Spark components

Spark can implement its computation scheduling through Yarn: Spark shares the Yarn cluster, which provides rich computing resources, and runs its tasks in a distributed manner. Spark on Yarn has two modes, Yarn Cluster and Yarn Client; a minimal submission sketch for both modes follows the two walk-throughs below.

  • Yarn Cluster mode: Spark on yarn-cluster running framework

    [Figure: Spark on yarn-cluster running framework]

  • Spark on yarn-cluster implementation process:
  1. The client generates the Application information and submits it to the ResourceManager.
  2. The ResourceManager allocates the first Container (for the ApplicationMaster) to the Spark Application and starts the Driver inside that Container.
  3. The ApplicationMaster applies to the ResourceManager for resources to run further Containers. The ResourceManager allocates Containers to the ApplicationMaster, which communicates with the relevant NodeManagers and starts Executors on the obtained Containers. Once an Executor has started, it registers with the Driver and requests Tasks.
  4. The Driver assigns Tasks to the Executors for execution.
  5. The Executors execute the Tasks and report their running status to the Driver.
  • Yarn Client mode: Spark on yarn-client running framework

    [Figure: Spark on yarn-client running framework]

    Spark on yarn-client implementation process:

In yarn-client mode, the Driver is deployed and started on the client side. This mode is not compatible with older client versions, so yarn-cluster mode is recommended.

  1. The client sends a Spark application submission request to the ResourceManager, and the ResourceManager returns a response containing various information (such as the ApplicationId and the upper and lower limits of available resources). The client then packages all the information required to start the ApplicationMaster and submits it to the ResourceManager.
  2. On receiving the request, the ResourceManager finds a suitable node for the ApplicationMaster and starts it on that node. The ApplicationMaster is a Yarn role; in Spark's yarn-client mode its process name is ExecutorLauncher.
  3. Based on the resource requirements of each task, the ApplicationMaster applies to the ResourceManager for a series of Containers to run the tasks.
  4. When the ApplicationMaster receives the list of newly allocated Containers from the ResourceManager, it contacts the corresponding NodeManagers to start the Containers: the ApplicationMaster communicates with the relevant NodeManagers and starts Executors on the obtained Containers. Once an Executor has started, it registers with the Driver and requests Tasks.

A running container will not be suspended to release resources.

  5. The Driver assigns Tasks to the Executors for execution. The Executors execute the Tasks and report their running status to the Driver.
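As mentioned above, both modes are normally driven by spark-submit. The sketch below uses Spark's Java SparkLauncher API, which wraps the same submission path; the application JAR, main class and resource settings are illustrative assumptions, and switching setDeployMode to "client" selects yarn-client mode.

```java
import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;

public class SparkOnYarnSubmitSketch {
    public static void main(String[] args) throws Exception {
        SparkAppHandle handle = new SparkLauncher()
                .setAppResource("/path/to/spark-job.jar") // illustrative path
                .setMainClass("com.example.SparkJob")     // illustrative class
                .setMaster("yarn")
                .setDeployMode("cluster")                 // use "client" for yarn-client mode
                .setConf("spark.executor.instances", "2")
                .setConf("spark.executor.memory", "2g")
                .setConf("spark.executor.cores", "1")
                .startApplication();                      // submits the application to Yarn

        // Poll the application state until it reaches a terminal state.
        while (!handle.getState().isFinal()) {
            System.out.println("state: " + handle.getState());
            Thread.sleep(5000);
        }
        System.out.println("final state: " + handle.getState());
    }
}
```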

7. The relationship between Yarn and MapReduce

MapReduce is a batch computing framework that runs on Yarn. MRv1 is the implementation of MapReduce in Hadoop 1.0. It consists of three parts: the programming model (old and new programming interfaces), the runtime environment (composed of the JobTracker and TaskTrackers), and the data processing engine (MapTask and ReduceTask). This framework has shortcomings in scalability, fault tolerance (the JobTracker is a single point of failure) and multi-framework support (it supports only the MapReduce computing framework). MRv2 is the implementation of MapReduce in Hadoop 2.0. It reuses MRv1's programming model and data processing engine at the source-code level, but its runtime environment is composed of Yarn's ResourceManager and an ApplicationMaster. The ResourceManager is a brand-new resource management system, while the ApplicationMaster is responsible for a MapReduce job's data splitting, task division, resource requests, task scheduling and fault tolerance.
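A minimal sketch of how an MRv2 job reaches Yarn from the client side, using the standard Job API: mapreduce.framework.name=yarn (normally set in mapred-site.xml) selects the Yarn runtime, and the identity mapper/reducer and paths below are illustrative assumptions.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MrOnYarnSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Run the job through Yarn (MRv2) rather than the local job runner;
        // in a real cluster this usually lives in mapred-site.xml.
        conf.set("mapreduce.framework.name", "yarn");

        Job job = Job.getInstance(conf, "mr-on-yarn-sketch");
        job.setJarByClass(MrOnYarnSketch.class);
        // Identity mapper/reducer keep the sketch self-contained; a real job
        // plugs in its own Mapper and Reducer implementations here.
        job.setMapperClass(Mapper.class);
        job.setReducerClass(Reducer.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path("/data/input"));     // illustrative path
        FileOutputFormat.setOutputPath(job, new Path("/data/output"));  // illustrative path

        // Submitting the job starts the MapReduce ApplicationMaster on Yarn, which
        // then requests containers for the MapTasks and ReduceTasks.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```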

8. The relationship between Yarn and ZooKeeper

[Figure: the relationship between ZooKeeper and Yarn]

  1. At system startup, each ResourceManager tries to write election information to ZooKeeper. The first ResourceManager that writes successfully is elected the Active ResourceManager, and the other becomes the Standby ResourceManager. The Standby ResourceManager periodically checks the Active ResourceManager election information in ZooKeeper.
  2. The Active ResourceManager also creates a Statestore directory in ZooKeeper to store Application-related information. When the Active ResourceManager fails, the Standby ResourceManager obtains the Application-related information from the Statestore directory and restores the data.

9. The relationship between Yarn and SmallFS

SmallFS periodically runs merge, delete, and cleanup tasks; these tasks are MapReduce jobs run on Yarn that perform merge, delete, and cleanup operations on file data in HDFS.

10. The relationship between Yarn and Tez

Hive on Tez job information requires Yarn to provide the TimeLine Server capability, so that Hive tasks can present the current and historical status of the application and this information can be stored and retrieved conveniently.
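On the Yarn side this boils down to enabling the Timeline Server. The sketch below sets the relevant yarn-site.xml keys on a Hadoop Configuration object for illustration; the hostname is an assumption, and Hive itself is pointed at Tez via hive.execution.engine=tez in hive-site.xml.

```java
import org.apache.hadoop.conf.Configuration;

public class TimelineServiceConfigSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // Enable the Yarn Timeline Server so Tez/Hive can publish and query
        // current and historical application data.
        conf.set("yarn.timeline-service.enabled", "true");
        conf.set("yarn.timeline-service.hostname", "timeline.example.com"); // illustrative

        System.out.println("timeline enabled: " + conf.get("yarn.timeline-service.enabled"));
    }
}
```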
