HDFS Study Notes (5): Principles of Yarn Architecture

1. Background of Yarn

    Apache Yarn (Yet Another Resource Negotiator) is Hadoop's cluster resource management system. Yarn was introduced in Hadoop 2. It was originally designed to improve the MapReduce implementation, but it is general-purpose and applicable to other distributed computing models as well.

1.1 Limitations of MapReduce1

In MapReduce1, there are the following limitations:

1) Poor scalability : In MapReduce1, the JobTracker handles both resource management and job control, which makes it the biggest bottleneck in the system and seriously limits the scalability of a Hadoop cluster (with many tasks its memory overhead is large, and clusters top out at roughly 4,000 nodes).
2) Poor reliability : MapReduce1 uses a master/slave structure in which the master is a single point of failure; once it fails, the entire cluster becomes unavailable.
3) Low resource utilization : MapReduce1 uses a slot-based resource allocation model. A slot is a coarse-grained unit of resource division; a task usually does not use up all the resources of its slot, yet other tasks cannot use the idle remainder. Hadoop1 also splits slots into Map slots and Reduce slots and does not allow them to be shared, so one kind of slot is often scarce while the other sits idle (for example, right after a job is submitted only Map tasks run, so the Reduce slots are idle).
4) No support for multiple computing frameworks : in-memory computing frameworks, streaming computing frameworks, iterative computing frameworks, and so on are not supported.
(Figure: Hadoop1/MapReduce1 architecture)

1.2 Yarn Design Ideas

The basic design idea of Yarn: split the two main functions of the JobTracker, resource management and job control (including job monitoring, fault tolerance, etc.), into two independent processes.
    The resource management process has nothing to do with any specific application: it is responsible for managing the resources (memory, CPU, disk, etc.) of the entire cluster. The job control process, by contrast, is tied directly to the application, and each job control process manages only one job. By separating the application-specific and application-independent parts of the original JobTracker, this design both reduces the load on the JobTracker and allows Hadoop to support more computing frameworks.
(Figure: MapReduce2 (Yarn) architecture)

Yarn has the following characteristics:
1) A general-purpose resource management and scheduling platform that supports multiple computing frameworks
2) Strong scalability
3) Improved resource utilization
4) Can be deployed with high availability (HA)

The difference between Hadoop1.x and Hadoop2.x:
(Figure: comparison between Hadoop1 and Hadoop2)

2. The basic structure of Yarn

    Yarn has a Master/Slave structure. In the resource management framework, the ResourceManager is the Master and the NodeManagers are the Slaves; the ResourceManager is responsible for the unified management and scheduling of the resources on every NodeManager. When a user submits an application, an ApplicationMaster must be provided to track and manage that application: it applies to the ResourceManager for resources and asks NodeManagers to start tasks that occupy a certain amount of resources. Because different ApplicationMasters are distributed across different nodes, they do not affect each other.

Yarn is composed of the following components:
1) ResourceManager
2) NodeManager
3) ApplicationMaster (the figure below shows the ApplicationMasters of the MapReduce and MPI computing frameworks, labelled MR AppMstr and MPI AppMstr respectively)
4) Container

(Figure: Yarn component structure)
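As a concrete illustration of this Master/Slave split, here is a minimal, hypothetical sketch (assuming a reachable cluster whose yarn-site.xml is on the classpath) that uses the YarnClient library to ask the ResourceManager for the NodeManagers it manages and the applications it is tracking; each running application listed has its own ApplicationMaster somewhere in the cluster.

```java
import java.util.List;

import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.api.records.NodeState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ClusterView {
    public static void main(String[] args) throws Exception {
        // Reads yarn-site.xml from the classpath to find the ResourceManager (the Master).
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new YarnConfiguration());
        yarnClient.start();

        // The RM tracks every NodeManager (the Slaves) and its resource capability.
        for (NodeReport node : yarnClient.getNodeReports(NodeState.RUNNING)) {
            System.out.println(node.getNodeId() + " capability=" + node.getCapability());
        }

        // Every entry here is an application with its own ApplicationMaster.
        List<ApplicationReport> apps = yarnClient.getApplications();
        for (ApplicationReport app : apps) {
            System.out.println(app.getApplicationId() + " " + app.getYarnApplicationState());
        }

        yarnClient.stop();
    }
}
```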

2.1 ResourceManager (RM)

    RM is a global resource manager responsible for resource management and allocation of the entire system. It mainly consists of two components:

  • Scheduler
  • Applications Manager (ASM)

2.1.1 Scheduler

    The scheduler allocates the resources in the system to the running applications according to constraints such as capacity and queues (for example, allocating a certain amount of resources to each queue, or executing at most a certain number of jobs).

    The scheduler does not do any work related to a specific application: it is not responsible for monitoring or tracking the application's execution status, nor for restarting tasks that fail because of application errors or hardware failures; all of that is handled by the application's own ApplicationMaster. The scheduler only allocates resources according to each application's resource requirements, and the unit of allocation is an abstraction called a "Resource Container" (Container for short). A Container is a dynamic resource allocation unit that bundles memory, CPU, disk, network and other resources together, thereby limiting the amount of resources each task can use.
    The scheduler is a pluggable component; users can implement their own schedulers as needed. Yarn ships with several ready-to-use schedulers (how the choice is configured is sketched in code after the list below).

FIFO Scheduler : first in, first out, with no notion of job priority or size; suitable for lightly loaded clusters.
Capacity Scheduler : divides resources into multiple queues, allows the cluster to be shared, and guarantees each queue a minimum amount of resources.
Fair Scheduler : allocates resources to applications fairly, so that over time all applications receive, on average, an equal share of resources.
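The scheduler choice is controlled by a single ResourceManager property, yarn.resourcemanager.scheduler.class. The snippet below is only a sketch of how that property is addressed programmatically; in a real cluster it is set in yarn-site.xml on the ResourceManager, not in client code.

```java
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class SchedulerChoice {
    public static void main(String[] args) {
        YarnConfiguration conf = new YarnConfiguration();

        // YarnConfiguration.RM_SCHEDULER == "yarn.resourcemanager.scheduler.class"
        conf.set(YarnConfiguration.RM_SCHEDULER,
                "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler");
        // Alternatives (one at a time):
        //   org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler
        //   org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler

        System.out.println(conf.get(YarnConfiguration.RM_SCHEDULER));
    }
}
```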

2.1.2 Applications Manager (ASM)

    The applications manager is responsible for managing all applications in the system: accepting application submissions, negotiating with the scheduler for the resources needed to start each ApplicationMaster, monitoring the ApplicationMaster's running status, and restarting it when it fails.

2.2 ApplicationMaster (AM)

    Each application submitted by a user has its own AM, whose main functions include (see the sketch after this list):

  1. Negotiate with the RM scheduler for resources (resources are represented as Containers).
  2. Further assign the obtained resources to its internal tasks.
  3. Communicate with NMs to start or stop tasks.
  4. Monitor the running status of all tasks, and re-apply for resources to restart a task when it fails.
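A minimal sketch of those duties, using the AMRMClient library that wraps ApplicationMasterProtocol; the 1 GB / 1 vcore capability, the container count and the sleep interval are arbitrary illustration values, and a real AM would also launch the granted Containers through an NMClient and track task state.

```java
import java.util.List;

import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class MiniAppMaster {
    public static void main(String[] args) throws Exception {
        // 1. Register this AM with the RM (ApplicationMasterProtocol underneath).
        AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
        rmClient.init(new YarnConfiguration());
        rmClient.start();
        rmClient.registerApplicationMaster("", 0, "");

        // 2. Ask the scheduler for Containers (1 GB, 1 vcore each).
        Resource capability = Resource.newInstance(1024, 1);
        Priority priority = Priority.newInstance(0);
        for (int i = 0; i < 2; i++) {
            rmClient.addContainerRequest(new ContainerRequest(capability, null, null, priority));
        }

        // 3. Poll (pull model): each allocate() call doubles as the heartbeat.
        int granted = 0;
        while (granted < 2) {
            AllocateResponse response = rmClient.allocate(0.1f);
            List<Container> containers = response.getAllocatedContainers();
            granted += containers.size();
            // 4. A real AM would hand each Container to an NMClient here (see 2.3).
            Thread.sleep(1000);
        }

        // 5. Tell the RM we are done so it can release everything.
        rmClient.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, "done", "");
        rmClient.stop();
    }
}
```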

2.3 NodeManager (NM)

    The NM is the resource and task manager on each node. On the one hand, it periodically reports the resource usage of its node and the running status of each Container to the RM; on the other hand, it receives and handles requests from AMs, such as starting and stopping Containers.
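On the AM side, asking an NM to start a task goes through the NMClient library (which wraps ContainerManagementProtocol). The sketch below is a hypothetical helper: it assumes the Container came back from an AMRMClient.allocate() call and launches a trivial shell command in it; the NM builds the environment, writes the launch script and starts the process.

```java
import java.util.Collections;

import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.client.api.NMClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.util.Records;

public class LaunchOnNodeManager {

    // Hypothetical helper: "container" is assumed to come from AMRMClient.allocate().
    static void launch(NMClient nmClient, Container container) throws Exception {
        ContainerLaunchContext ctx = Records.newRecord(ContainerLaunchContext.class);
        // The command the NM will wrap in a launch script; real AMs also set
        // local resources (jars) and environment variables here.
        ctx.setCommands(Collections.singletonList("echo hello from " + container.getId()));
        nmClient.startContainer(container, ctx);   // ContainerManagementProtocol call
    }

    public static void main(String[] args) {
        NMClient nmClient = NMClient.createNMClient();
        nmClient.init(new YarnConfiguration());
        nmClient.start();
        // launch(nmClient, someAllocatedContainer);  // wire in a real Container here
    }
}
```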

2.4 Container

    A Container is Yarn's resource abstraction. It encapsulates multi-dimensional resources on a node, such as memory, CPU, disk and network. When an AM applies to the RM for resources, the resources the RM returns are expressed as Containers. Yarn assigns each task a Container, and the task can only use the resources described by that Container. A Container is a dynamic resource division unit, generated on demand according to the application's requirements.
    Currently, Yarn only schedules CPU and memory resources, and uses the lightweight Cgroups mechanism for resource isolation.
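In code, the Resource vector carried by a Container currently holds exactly those two dimensions: memory (in MB) and virtual cores. A tiny sketch of how an AM would describe the Container it wants follows (the 2 GB / 2 vcore figures and the priority are arbitrary).

```java
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class ContainerAsk {
    public static void main(String[] args) {
        // The multi-dimensional resource description of one Container.
        Resource capability = Resource.newInstance(2048, 2);   // 2048 MB, 2 vcores
        ContainerRequest ask =
                new ContainerRequest(capability, null, null, Priority.newInstance(1));
        System.out.println("asking for " + ask.getCapability());
    }
}
```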

3. Yarn communication mechanism

    The RPC protocol is the main glue connecting the various components. Between any two components that need to communicate there is exactly one RPC protocol. Each RPC protocol has a Client side and a Server side, and the Client always actively connects to the Server, so Yarn uses a pull-based communication model. The main protocols are listed below, followed by a short code sketch.
(Figure: Yarn RPC protocols between components)

  • Protocol between the JobClient (job submission client) and the RM—ApplicationClientProtocol : the JobClient submits applications, queries application status, and so on through this RPC protocol.
  • Protocol between the Admin (administrator) and the RM—ResourceManagerAdministrationProtocol : the Admin updates system configuration through this RPC protocol, such as node black/white lists and user queue permissions.
  • Protocol between the AM and the RM—ApplicationMasterProtocol : the AM registers itself with and unregisters itself from the RM through this RPC protocol, and applies for resources for its tasks.
  • Protocol between the AM and the NM—ContainerManagementProtocol : through this RPC protocol the AM asks the NM to start or stop Containers and obtains information such as each Container's status.
  • Protocol between the NM and the RM—ResourceTracker : the NM registers with the RM through this RPC protocol and periodically sends heartbeats reporting the node's resource usage and the running status of its Containers.
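Application code rarely talks to these protocols directly; the client libraries used above (YarnClient, AMRMClient, NMClient) wrap them. Purely as an illustration of the Client/Server split, the sketch below (assuming a reachable ResourceManager) opens an ApplicationClientProtocol proxy by hand and issues one request over it.

```java
import org.apache.hadoop.yarn.api.ApplicationClientProtocol;
import org.apache.hadoop.yarn.api.protocolrecords.GetClusterMetricsRequest;
import org.apache.hadoop.yarn.api.protocolrecords.GetClusterMetricsResponse;
import org.apache.hadoop.yarn.client.ClientRMProxy;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class RawProtocolCall {
    public static void main(String[] args) throws Exception {
        YarnConfiguration conf = new YarnConfiguration();

        // Client side of the client<->RM protocol; YarnClient wraps exactly this.
        ApplicationClientProtocol rm =
                ClientRMProxy.createRMProxy(conf, ApplicationClientProtocol.class);

        GetClusterMetricsResponse metrics =
                rm.getClusterMetrics(GetClusterMetricsRequest.newInstance());
        System.out.println("NodeManagers: "
                + metrics.getClusterMetrics().getNumNodeManagers());
    }
}
```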

4. Yarn workflow

Yarn runs an application in two phases:
The first phase: start the ApplicationMaster.
The second phase: the ApplicationMaster creates the application, applies for resources for it, and monitors the entire run until the application completes.
(Figure: Yarn application workflow)

  1. The user submits an application to Yarn, including the MRAppMaster program, the command to start the MRAppMaster, the user program, and so on. (The MRAppMaster program is generated on the client side. At this step the application has been handed to the RM's applications manager, but the jar package, environment variables and split information needed to run the program are submitted to HDFS, and are only pulled to a node for execution once an available Container is returned for the MRAppMaster. This can be seen as lazy processing: things are submitted only when they are needed, which reduces resource occupation. A client-side sketch of this step in code follows the list.)
  2. The ResourceManager allocates the first Container for the application and communicates with the corresponding NodeManager, asking it to start the application's MRAppMaster in this Container. (The MRAppMaster is the vanguard of the job; this step is handled by the RM's applications manager.)
  3. The MRAppMaster first registers with the ResourceManager, so that the user can view the application's running status directly through the ResourceManager; it then applies for resources for each task and monitors their status until the whole application finishes.
  4. The MRAppMaster applies for and receives resources from the ResourceManager through an RPC protocol, using polling.
  5. Once the MRAppMaster has obtained resources, it communicates with the corresponding NodeManagers and asks them to start the tasks. (Because the available resources of different nodes change dynamically, the allocated Containers, and therefore the tasks, can run on different nodes.)
  6. After a NodeManager has set up the running environment for a task (including environment variables, JAR packages, binary programs, etc.), it writes the task's start command into a script and starts the task by running that script. (At this point the running resources uploaded by the client, such as the task's jar package, are actually used: the NodeManager fetches them and runs the corresponding task.)
  7. Each task reports its status and progress to the MRAppMaster through an RPC protocol, so that the MRAppMaster always knows the running status of every task and can restart a task when it fails. While the application is running, the user can query its current status from the MRAppMaster through RPC at any time. Steps 4 to 7 are repeated.
  8. After the application finishes running, the MRAppMaster unregisters from the ResourceManager and shuts itself down.
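A minimal sketch of steps 1 and 2 from a generic client's point of view, using the YarnClient API; the MapReduce client (YarnRunner) does the same thing but additionally ships the MRAppMaster jar, splits and configuration as local resources. The application name, command, queue and container size below are placeholders.

```java
import java.util.Collections;

import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.util.Records;

public class SubmitApplication {
    public static void main(String[] args) throws Exception {
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new YarnConfiguration());
        yarnClient.start();

        // Step 1: ask the RM for a new application (an application id comes back).
        YarnClientApplication app = yarnClient.createApplication();
        ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
        appContext.setApplicationName("yarn-notes-demo");

        // What the chosen NodeManager should run to start the ApplicationMaster.
        ContainerLaunchContext amContainer = Records.newRecord(ContainerLaunchContext.class);
        amContainer.setCommands(Collections.singletonList("sleep 30"));
        appContext.setAMContainerSpec(amContainer);
        appContext.setResource(Resource.newInstance(1024, 1));  // the AM's own Container
        appContext.setQueue("default");

        // Step 2: hand the application over to the RM's applications manager.
        yarnClient.submitApplication(appContext);
        System.out.println("submitted " + appContext.getApplicationId());
    }
}
```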

Summary:
Yarn can be regarded as a cloud operating system: it starts the ApplicationMaster (comparable to a main thread) for each application, and the ApplicationMaster is then responsible for data splitting, task allocation, task startup and monitoring. Each task (comparable to a child thread) is only responsible for its own computation. When all tasks have finished, the ApplicationMaster considers the application complete and exits.

5. Running an example on Yarn

(Figure: an example MapReduce job running on Yarn)

  • Step1: The client program is submitted, and a YarnRunner is obtained to apply to the RM for an Application (a runnable MapReduce submission sketch follows this list).
  • Step2: The RM returns information such as the application ID to YarnRunner.
  • Step3: The RM also returns the application's resource submission path to YarnRunner.
  • Step4: The client program submits the resources required for running to Yarn. The submitted content includes the MRAppMaster program, the MRAppMaster startup script, the user program (the actual MapReduce processing code), and so on. Internally, the RM maintains an applications manager and a resource scheduler, which are responsible, respectively, for interacting with the MRAppMaster and for managing resources.
  • Step5: After the resources are submitted, the application is placed in the scheduler as a Task.
  • Step6: The RM selects an idle NodeManager (NM), allocates the first Container for the submitted program, and communicates with that NM, asking it to run the MRAppMaster in this Container. (The MRAppMaster is responsible for monitoring and scheduling the running status and progress of this program.)
  • Step7: After the MRAppMaster starts, it registers itself with the RM.
  • Step8: The MRAppMaster then applies for and receives resources for its internal tasks from the RM through an RPC protocol, by polling; it copies the job-related information from HDFS and, based on that information, applies to the RM for the resources needed to run the tasks.
  • Step9: The MRAppMaster applies to the RM for the resources required by the MapTasks.
  • Step10: The RM assigns the corresponding NM information to the MRAppMaster.
  • Step11: The MRAppMaster communicates with the corresponding NMs and sends them the program startup scripts. (Each NM sets up the running environment for its task, including environment variables, JAR packages, binary programs, etc., accepts the script and launches the corresponding MapTask; the MapTask partitions and sorts its data.)
  • Step12: After all MapTasks have completed, the MRAppMaster applies to the RM for resources to run the ReduceTasks.
  • Step13: The RM allocates the corresponding resources to the ReduceTasks, and the Reduce phase fetches the Map phase's output.
  • Step14: The ReduceTasks are executed.
  • Step15: After the program finishes running, the MRAppMaster unregisters from the RM and shuts itself down.
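For comparison, a user triggers all fifteen steps above with an ordinary MapReduce job submission. The sketch below is a bare identity job (no custom mapper or reducer; the input and output paths come from the command line), shown only to make the entry point concrete; setting mapreduce.framework.name to yarn is what routes the submission through YarnRunner instead of the local runner.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RunOnYarn {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // With this setting the client in Step1 is YarnRunner; with "local" it is LocalJobRunner.
        conf.set("mapreduce.framework.name", "yarn");

        // Identity map/reduce job: enough to exercise the submission flow above.
        Job job = Job.getInstance(conf, "yarn-flow-demo");
        job.setJarByClass(RunOnYarn.class);            // the jar uploaded to HDFS in Step4
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // waitForCompletion(true) submits the job and then polls the MRAppMaster for
        // progress (Steps 5-15), returning once the AM reports completion.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```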


Source: blog.csdn.net/u011047968/article/details/126642835