A plain-language introduction to YARN
Apache Hadoop YARN (Yet Another Resource Negotiator) is Hadoop's resource manager. It is a general-purpose resource management and scheduling platform that provides unified resource management and scheduling for the applications running on it.
Its introduction brings a cluster major benefits in resource utilization, unified resource management, and data sharing.
YARN can be thought of as a distributed operating system platform, and computing programs such as MapReduce as applications running on top of that operating system. YARN provides these programs with the resources they need to run (memory, CPU).
- YARN knows nothing about the internal workings of the programs that users submit
- YARN only schedules computing resources (a user program requests resources from YARN, and YARN is responsible for allocating them)
- The coordinating role in YARN is called the ResourceManager
- The role that actually provides the computing resources is called the NodeManager
- YARN is completely decoupled from the user programs it runs, which means that many kinds of distributed computing programs can run on it, such as MapReduce, Storm, Spark, and Tez
- Computing frameworks such as Spark and Storm can be integrated to run on YARN, as long as each framework provides a resource-request mechanism that conforms to the YARN specification
- This makes YARN a general-purpose resource scheduling platform: the various computing clusters an enterprise previously ran separately can be consolidated onto one physical cluster, which improves resource utilization and makes data sharing easier
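The division of labor in the list above can be illustrated with a toy model. This is only a minimal sketch in plain Python, not the YARN API: the names `Request`, `ToyScheduler`, and `request`, as well as the resource figures, are hypothetical. It is meant to show that the scheduler looks only at the requested memory and CPU, and never at what the program actually does.

```python
from dataclasses import dataclass


@dataclass
class Request:
    app: str        # framework name; opaque to the scheduler
    memory_mb: int  # requested memory in MB
    vcores: int     # requested CPU cores


class ToyScheduler:
    """Hypothetical stand-in for a resource scheduler; not the YARN API."""

    def __init__(self, memory_mb: int, vcores: int):
        self.free_memory_mb = memory_mb
        self.free_vcores = vcores

    def request(self, req: Request) -> bool:
        """Grant the request if enough resources remain, otherwise refuse."""
        if req.memory_mb <= self.free_memory_mb and req.vcores <= self.free_vcores:
            self.free_memory_mb -= req.memory_mb
            self.free_vcores -= req.vcores
            return True
        return False


# Three different frameworks share one pool of resources.
scheduler = ToyScheduler(memory_mb=8192, vcores=4)
results = [
    scheduler.request(Request("mapreduce", 4096, 2)),
    scheduler.request(Request("spark", 2048, 1)),
    scheduler.request(Request("storm", 4096, 1)),  # refused: only 2048 MB left
]
```

The scheduler never inspects the `app` field when deciding. In this toy model, that is what decoupling from the user program amounts to.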
Basic architecture of YARN
YARN is a framework for resource management and task scheduling. It consists mainly of three modules: ResourceManager (RM), NodeManager (NM), and ApplicationMaster (AM).
The ResourceManager is responsible for monitoring, allocating, and managing all resources; a cluster has only one.
The NodeManager is responsible for maintaining an individual node; a cluster has many.
The ApplicationMaster is responsible for scheduling and coordinating one specific application; a cluster has many.
Across all applications, the RM has absolute authority over the control and allocation of resources. Each AM negotiates resources with the RM and communicates with the NodeManagers to run and monitor its tasks.
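The interaction between the three roles can be sketched roughly as follows. This is a simplified illustration in plain Python, not real YARN code. The class names mirror the roles described above, but the methods `allocate`, `launch`, and `run_task`, the node names, and the memory figures are all assumptions made for the example.

```python
class NodeManager:
    """Toy NM: represents one node and the tasks running on it."""

    def __init__(self, name, memory_mb):
        self.name = name
        self.free_memory_mb = memory_mb
        self.running = []

    def launch(self, task):
        self.running.append(task)


class ResourceManager:
    """Toy RM: the single authority that decides where resources come from."""

    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, memory_mb):
        # Reserve memory on the first node that has enough free; None if no node fits.
        for node in self.nodes:
            if node.free_memory_mb >= memory_mb:
                node.free_memory_mb -= memory_mb
                return node
        return None


class ApplicationMaster:
    """Toy AM: one per application; negotiates with the RM, then talks to NMs."""

    def __init__(self, app_name, rm):
        self.app_name = app_name
        self.rm = rm
        self.placements = {}  # task -> node name

    def run_task(self, task, memory_mb):
        node = self.rm.allocate(memory_mb)  # step 1: ask the RM for resources
        if node is None:
            return False
        node.launch(task)                   # step 2: start the task on that NM
        self.placements[task] = node.name   # step 3: track the task
        return True


nodes = [NodeManager("nm1", 2048), NodeManager("nm2", 4096)]
rm = ResourceManager(nodes)
am = ApplicationMaster("wordcount", rm)
ok = [
    am.run_task("map-0", 2048),
    am.run_task("map-1", 2048),
    am.run_task("reduce-0", 4096),  # refused: no node has 4096 MB free
]
```

In this sketch the AM never picks a node itself; it only acts on what the RM grants, which mirrors the RM's exclusive control over allocation.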