Good Programmer Big Data in Practice: YARN Resource Management

  This installment of Good Programmer's big data in practice covers YARN resource management. YARN is Hadoop's new resource manager: a general-purpose resource management system that provides unified resource management and scheduling for the applications running on top of it. Its introduction brought major benefits in cluster utilization, unified resource management, and data sharing.

  Overall, YARN still has a master/slave structure. Within the resource management framework, the ResourceManager is the master and the NodeManagers are the slaves. The ResourceManager is responsible for unified management and scheduling of the resources on every NodeManager. When a user submits an application, they must provide an ApplicationMaster to track and manage it. The ApplicationMaster applies to the ResourceManager for resources and asks NodeManagers to start tasks that occupy a certain amount of those resources. Because different ApplicationMasters are distributed across different nodes, they do not affect one another.
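  The flow described above can be sketched as a toy model. This is only an illustration under simplifying assumptions, not the Hadoop API: the class and function names (`ResourceManager.allocate`, `NodeManager.start_task`, `submit_application`), the first-fit placement policy, and all memory figures are made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """Slave: owns one node's resources and starts tasks on it."""
    name: str
    free_memory_mb: int
    running: list = field(default_factory=list)

    def start_task(self, task: str, memory_mb: int) -> None:
        if memory_mb > self.free_memory_mb:
            raise RuntimeError(f"{self.name}: not enough memory for {task}")
        self.free_memory_mb -= memory_mb
        self.running.append(task)


class ResourceManager:
    """Master: knows every NodeManager and decides where resources come from."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def allocate(self, memory_mb: int) -> NodeManager:
        # Hypothetical policy: pick the first node with enough free memory.
        for node in self.nodes:
            if node.free_memory_mb >= memory_mb:
                return node
        raise RuntimeError("no node can satisfy the request")


class ApplicationMaster:
    """Per-application coordinator that tracks and manages one program."""

    def __init__(self, app_name: str, rm: ResourceManager):
        self.app_name = app_name
        self.rm = rm

    def run_task(self, task: str, memory_mb: int) -> str:
        node = self.rm.allocate(memory_mb)                      # apply to the ResourceManager
        node.start_task(f"{self.app_name}/{task}", memory_mb)   # ask the NodeManager to start it
        return node.name


def submit_application(rm: ResourceManager, app_name: str, am_memory_mb: int = 512) -> ApplicationMaster:
    """Client side: the application's own ApplicationMaster also needs a place to run."""
    am_node = rm.allocate(am_memory_mb)
    am_node.start_task(f"{app_name}/ApplicationMaster", am_memory_mb)
    return ApplicationMaster(app_name, rm)


rm = ResourceManager([NodeManager("node1", 2048), NodeManager("node2", 4096)])
am = submit_application(rm, "wordcount")
placements = [am.run_task("map-0", 1024), am.run_task("map-1", 1024)]
print(placements)  # ['node1', 'node2']
```

  In this run the ApplicationMaster and the first task both land on node1, which leaves too little memory there for the second task, so it is placed on node2.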

  In terms of basic structure, YARN is composed mainly of the ResourceManager, NodeManager, ApplicationMaster, Container, and a few other components.

  The ResourceManager is a separate process running on the Master, in charge of unified resource management, scheduling, and allocation for the cluster. The NodeManager is a separate process on each Slave, responsible for reporting the node's state. The ApplicationMaster and Containers are components that run on the Slaves. A Container is YARN's unit of resource allocation: it encapsulates memory, CPU, and other resources, and YARN allocates resources in units of Containers. Each application that a Client submits to the ResourceManager must have an ApplicationMaster. Once the ResourceManager has allocated resources for it, the ApplicationMaster runs in a Container on some Slave node, and the Tasks that do the actual work also run in Containers on Slave nodes. Communication among the RM, NMs, AM, and even ordinary Containers all uses an RPC mechanism.
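  To make "Container as the unit of allocation" concrete, here is a small arithmetic sketch. The `Container` class, the `containers_that_fit` helper, and the node sizes are hypothetical and chosen only for illustration; a real scheduler applies more rules than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Container:
    """Unit of allocation: a fixed bundle of memory and CPU."""
    memory_mb: int
    vcores: int


def containers_that_fit(node_memory_mb: int, node_vcores: int, c: Container) -> int:
    # A node can host a container only if both resources are available,
    # so whichever resource runs out first sets the limit.
    return min(node_memory_mb // c.memory_mb, node_vcores // c.vcores)


small = Container(memory_mb=1024, vcores=1)
cpu_heavy = Container(memory_mb=1024, vcores=4)

print(containers_that_fit(8192, 8, small))      # 8
print(containers_that_fit(8192, 8, cpu_heavy))  # 2
```

  With the same node, the CPU-heavy container shape is limited by cores rather than memory, which is why a Container bundles both resources together.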

  YARN's architectural design makes it increasingly resemble a cloud operating system, or an operating system for data processing.

  Finally, YARN resource management can be understood from the following aspects:

  1. Resource scheduling and resource isolation are the two most important and most basic functions of YARN as a resource management system. Resource scheduling is done by the ResourceManager, while resource isolation is implemented by each NodeManager.

  2. After the ResourceManager assigns resources on a NodeManager to a task (this is "resource scheduling"), the NodeManager must provide those resources to the task as requested, and even guarantee that the task has exclusive use of them, so that the task has a baseline guarantee for running. This is "resource isolation".

  3. When we talk about resources, we usually mean three kinds: memory, CPU, and IO. So far, Hadoop YARN supports management and scheduling of only two of them, CPU and memory.

  4. The amount of memory decides whether a task lives or dies: if memory is insufficient, the task may fail. CPU is different: it only determines how fast the task runs and has no effect on whether the task survives.
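  The difference between memory and CPU in point 4 can be illustrated with a toy model. It is a sketch under loud assumptions, not a description of how a NodeManager enforces limits: `Allocation` and `run_task` are hypothetical names, a task is assumed to fail outright whenever its peak memory exceeds its allocation, and its work is assumed to divide perfectly across cores.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Allocation:
    memory_mb: int
    vcores: int


def run_task(alloc: Allocation, peak_memory_mb: int, cpu_seconds: float) -> Optional[float]:
    """Return the wall-clock seconds the task takes, or None if it fails."""
    if peak_memory_mb > alloc.memory_mb:
        # Too little memory: the task cannot finish at all.
        return None
    # Too little CPU: the task still finishes, only more slowly.
    return cpu_seconds / alloc.vcores


print(run_task(Allocation(2048, 4), peak_memory_mb=1500, cpu_seconds=120.0))  # 30.0
print(run_task(Allocation(2048, 1), peak_memory_mb=1500, cpu_seconds=120.0))  # 120.0
print(run_task(Allocation(1024, 4), peak_memory_mb=1500, cpu_seconds=120.0))  # None
```

  Cutting the cores from 4 to 1 makes the task four times slower but it still succeeds, whereas cutting memory below the task's peak usage makes it fail regardless of how many cores it has.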



Origin blog.51cto.com/14249543/2410043