Hadoop — Analysis of Yarn Principles

1. Overview

Yarn is a resource-scheduling platform responsible for providing server computing resources to computing programs. It is equivalent to a distributed operating system, while computing programs such as MapReduce are equivalent to applications running on that operating system.

2. Important concepts of YARN

1. Yarn knows nothing about the internal workings of the programs users submit to it.
2. Yarn only schedules computing resources: a user program applies to Yarn for resources, and Yarn is responsible for allocating them.
3. The supervising role in Yarn is called the ResourceManager.
4. The role that actually provides computing resources on each node is called the NodeManager.
5. Yarn is therefore completely decoupled from the user programs it runs, which means many kinds of distributed computing programs can run on it; MapReduce is only one of them, alongside Storm, Spark, Tez, and others.
6. Frameworks such as Spark and Storm can thus be integrated to run on Yarn, as long as each framework implements a resource-request mechanism that conforms to the Yarn specification.
7. Yarn thereby becomes a general-purpose resource-scheduling platform: the separate computing clusters that previously existed in an enterprise can be consolidated onto one physical cluster, improving resource utilization and making data sharing easier.
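The ResourceManager/NodeManager split above is reflected directly in Yarn's configuration. As a minimal sketch (the hostname and memory size below are placeholder values, not recommendations), a `yarn-site.xml` might point every node at the ResourceManager and declare how much of each node the NodeManager may offer:

```xml
<!-- yarn-site.xml: minimal illustrative sketch; values are placeholders -->
<configuration>
  <!-- Where every NodeManager and client finds the ResourceManager -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>rm-host.example.com</value>
  </property>
  <!-- Memory this NodeManager offers to the cluster, in MB -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8192</value>
  </property>
  <!-- Auxiliary service needed so MapReduce shuffle can run on Yarn -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
```

Note that nothing here mentions MapReduce, Spark, or Storm specifically (apart from the optional shuffle auxiliary service): the configuration describes resources, which is exactly the decoupling point 5 makes.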

3. An example of the flow of running a computing program on Yarn

Taking a MapReduce job as an example, the standard submission flow is:

1. The client submits the application to the ResourceManager.
2. The ResourceManager allocates a container on some NodeManager and launches the application's ApplicationMaster in it (for MapReduce, this is MRAppMaster).
3. The ApplicationMaster registers with the ResourceManager and requests containers for its tasks.
4. The ResourceManager grants containers on NodeManagers, and the ApplicationMaster asks those NodeManagers to launch the map and reduce tasks inside them.
5. Tasks report progress to the ApplicationMaster; when the job finishes, the ApplicationMaster deregisters from the ResourceManager and all containers are released.
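As a rough illustration of the ResourceManager's job of handing out containers, the following is a toy first-fit scheduler in Python. It is a conceptual sketch only (the class and field names are invented for this example, not Yarn APIs); real Yarn schedulers such as the CapacityScheduler are far more sophisticated.

```python
from dataclasses import dataclass

@dataclass
class Container:
    """A granted slice of one node's resources."""
    node: str
    memory_mb: int
    vcores: int

@dataclass
class NodeManager:
    """Tracks the free resources one node still offers."""
    host: str
    free_memory_mb: int
    free_vcores: int

class ResourceManager:
    """Toy scheduler: first-fit allocation across registered NodeManagers."""
    def __init__(self):
        self.nodes = []

    def register(self, nm: NodeManager):
        self.nodes.append(nm)

    def allocate(self, memory_mb: int, vcores: int):
        """Grant a container on the first node with enough free resources,
        or return None if the request cannot currently be satisfied."""
        for nm in self.nodes:
            if nm.free_memory_mb >= memory_mb and nm.free_vcores >= vcores:
                nm.free_memory_mb -= memory_mb
                nm.free_vcores -= vcores
                return Container(nm.host, memory_mb, vcores)
        return None

rm = ResourceManager()
rm.register(NodeManager("node1", free_memory_mb=4096, free_vcores=4))
rm.register(NodeManager("node2", free_memory_mb=2048, free_vcores=2))

# An ApplicationMaster asks only for resources; what runs inside the
# container is entirely up to the application (MapReduce, Spark, ...).
c1 = rm.allocate(memory_mb=3072, vcores=2)  # fits on node1
c2 = rm.allocate(memory_mb=2048, vcores=2)  # node1 now too small, goes to node2
print(c1.node, c2.node)
```

The point of the sketch is that the scheduler never inspects the program itself, only the resource request, which is why any framework with a Yarn-conformant request mechanism can run on the same cluster.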
