YARN Introduction and Common Parameters

I. Overview

  1. Apache Hadoop YARN (Yet Another Resource Negotiator) is Hadoop's new resource manager. It is a general-purpose resource management system that provides unified resource management and scheduling for the applications running on top of it. Its introduction brought clusters significant benefits in resource utilization, unified resource management, and data sharing.
  2. The basic idea of YARN is to separate the two main functions of the JobTracker: resource management and job scheduling/monitoring. It does this by creating a global ResourceManager (RM) and one ApplicationMaster (AM) per application. An application here refers to a traditional MapReduce job.
  3. At the top of the YARN hierarchy is the ResourceManager. This entity controls the whole cluster and manages the assignment of applications to the underlying computing resources. The ResourceManager parcels out each portion of the resources (compute, memory, bandwidth, and so on) to the underlying NodeManagers (YARN's per-node agents).
  4. The ResourceManager also works with the ApplicationMasters to allocate resources, and with the NodeManagers to start and monitor their underlying applications. In this context, the ApplicationMaster takes on part of the role of the former TaskTracker, and the ResourceManager takes on the role of the JobTracker.
  5. An ApplicationMaster manages each instance of an application running in YARN. It is responsible for negotiating resources from the ResourceManager and, through the NodeManagers, for monitoring the execution and resource usage of Containers (the allocation of CPU, memory, and other resources).

II. Architecture Diagram

[Figure: YARN architecture diagram]

III. Core Idea

  1. YARN splits up the JobTracker and TaskTracker of the old architecture and replaces them with the following main components:
  2. A global resource manager, the ResourceManager
  3. A per-node agent of the ResourceManager, the NodeManager
  4. One ApplicationMaster per application
  5. For each ApplicationMaster, multiple Containers running on NodeManagers
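The relationships in the list above can be sketched as a small data model. This is only an illustration: the class and field names are made up for the example and are not YARN's actual Java classes.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Container:
    container_id: str
    memory_mb: int
    vcores: int


@dataclass
class NodeManager:
    host: str
    # Containers currently placed on this node.
    containers: List[Container] = field(default_factory=list)


@dataclass
class ApplicationMaster:
    app_id: str
    # IDs of the Containers that run this application's tasks.
    container_ids: List[str] = field(default_factory=list)


@dataclass
class ResourceManager:
    # One global RM knows every node and every application.
    nodes: Dict[str, NodeManager] = field(default_factory=dict)
    apps: Dict[str, ApplicationMaster] = field(default_factory=dict)


rm = ResourceManager()
rm.nodes["node1"] = NodeManager("node1")
rm.nodes["node2"] = NodeManager("node2")
am = ApplicationMaster("app_0001")
rm.apps[am.app_id] = am

# One application, two Containers, placed on two different nodes.
for i, host in enumerate(["node1", "node2"]):
    c = Container(f"{am.app_id}_c{i}", memory_mb=1024, vcores=1)
    rm.nodes[host].containers.append(c)
    am.container_ids.append(c.container_id)
```

The point of the sketch is the cardinality: one RM, one NM per node, one AM per application, and many Containers per AM spread across nodes.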

Component Introduction

IV. ResourceManager (RM)

  1. The RM is a global resource manager, responsible for managing and allocating the resources of the entire system. It consists of two main components: the Scheduler and the Applications Manager (ASM).
  2. Scheduler: the scheduler allocates the system's resources to the running applications, subject to constraints such as capacity and queues (for example, each queue is allocated a certain amount of resources and may run at most a certain number of jobs). Note that the scheduler is a "pure scheduler": it no longer does any work tied to a specific application. It does not monitor or track the execution state of applications, and it does not restart tasks that fail because of application errors or hardware faults; all of this is left to the application's ApplicationMaster. The scheduler allocates resources solely according to each application's resource requirements. The unit of allocation is an abstraction called a "resource container" (Resource Container, or Container for short). A Container is a dynamic resource allocation unit that bundles memory and CPU together, thereby limiting the amount of resources each task can use.
  3. Applications Manager: the Applications Manager is responsible for managing all applications in the system. This includes accepting application submissions, negotiating with the scheduler for the resources to start each ApplicationMaster, monitoring the ApplicationMaster's running state, and restarting it if it fails.
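A minimal sketch of the "pure scheduler" idea described above, assuming a single hard capacity limit per queue. The `Queue` and `Scheduler` classes are hypothetical and far simpler than YARN's real schedulers; the sketch only shows that the scheduler grants or refuses Containers and never looks at what runs inside them.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Queue:
    capacity_mb: int        # memory limit configured for this queue
    capacity_vcores: int    # CPU limit configured for this queue
    used_mb: int = 0
    used_vcores: int = 0


class Scheduler:
    """Hands out Containers; knows nothing about tasks or their failures."""

    def __init__(self, queues: Dict[str, Queue]) -> None:
        self.queues = queues
        self._next_id = 0

    def allocate(self, queue_name: str, memory_mb: int, vcores: int) -> Optional[dict]:
        q = self.queues[queue_name]
        fits = (q.used_mb + memory_mb <= q.capacity_mb
                and q.used_vcores + vcores <= q.capacity_vcores)
        if not fits:
            return None  # over the queue limit; the caller has to wait or retry
        q.used_mb += memory_mb
        q.used_vcores += vcores
        self._next_id += 1
        return {"id": self._next_id, "queue": queue_name,
                "memory_mb": memory_mb, "vcores": vcores}

    def release(self, container: dict) -> None:
        q = self.queues[container["queue"]]
        q.used_mb -= container["memory_mb"]
        q.used_vcores -= container["vcores"]


scheduler = Scheduler({"default": Queue(capacity_mb=4096, capacity_vcores=4)})
first = scheduler.allocate("default", 2048, 2)
second = scheduler.allocate("default", 2048, 2)
third = scheduler.allocate("default", 1024, 1)    # queue is full, so this is None
scheduler.release(first)
fourth = scheduler.allocate("default", 1024, 1)   # fits again after the release
```

Retrying failed work is deliberately absent here; in the design described above, that belongs to the ApplicationMaster.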

V. ApplicationMaster (AM)

  1. Every application a user submits contains one AM. Its main functions are:
    a. negotiating with the RM scheduler to obtain resources (represented as Containers);
    b. further assigning the obtained resources to its internal tasks (a second level of resource allocation);
    c. communicating with NMs to start and stop tasks;
    d. monitoring the running state of all tasks and, when a task fails, requesting resources for it again so that the task can be restarted.
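The four duties (a) to (d) can be sketched as a simple loop. This is a hedged illustration, not YARN's client API: `run_application`, `request_container`, and `run_task` are hypothetical names, and a real AM works asynchronously through heartbeats rather than one task at a time.

```python
from typing import Callable, Dict, List, Optional


def run_application(
    tasks: List[str],
    request_container: Callable[[], Optional[int]],
    run_task: Callable[[str, int], bool],
    max_attempts: int = 3,
) -> Dict[str, int]:
    """Return the attempt on which each task succeeded (0 = never succeeded)."""
    attempts_used: Dict[str, int] = {}
    for task in tasks:
        attempts_used[task] = 0
        for attempt in range(1, max_attempts + 1):
            container = request_container()     # (a) negotiate with the RM
            if container is None:
                break                           # nothing granted; give up in this sketch
            ok = run_task(task, container)      # (b) assign to a task, (c) launch via the NM
            if ok:                              # (d) monitor the outcome
                attempts_used[task] = attempt
                break
            # The task failed: loop again to request a fresh container and retry.
    return attempts_used


# Stand-ins for the RM and NM so that the sketch can run on its own.
container_ids = iter(range(100))
failures_left = {"map-1": 1}    # map-1 fails once, then succeeds


def fake_request() -> Optional[int]:
    return next(container_ids)


def fake_run(task: str, container: int) -> bool:
    if failures_left.get(task, 0) > 0:
        failures_left[task] -= 1
        return False
    return True


result = run_application(["map-0", "map-1"], fake_request, fake_run)
```

Here `map-1` fails on its first Container and succeeds on a newly requested one, which is the retry behavior described in item (d).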

VI. NodeManager (NM)

  1. The NM is the resource and task manager on each node.
  2. It periodically reports to the RM the node's resource usage and the running state of each Container on it.
  3. It receives and handles requests from AMs, such as requests to start or stop Containers.
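As a rough sketch of these two responsibilities, the hypothetical class below handles start/stop requests and builds the periodic report. The report layout and the state strings are assumptions made for the example, not YARN's wire format.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class NodeManagerSketch:
    host: str
    total_memory_mb: int
    total_vcores: int
    # container id -> {"memory_mb", "vcores", "state"}
    containers: Dict[str, dict] = field(default_factory=dict)

    def start_container(self, cid: str, memory_mb: int, vcores: int) -> None:
        """Handle a start request from an AM."""
        self.containers[cid] = {"memory_mb": memory_mb, "vcores": vcores,
                                "state": "RUNNING"}

    def stop_container(self, cid: str) -> None:
        """Handle a stop request from an AM."""
        self.containers[cid]["state"] = "COMPLETE"

    def heartbeat(self) -> dict:
        """Build the periodic report that would be sent to the RM."""
        running = [c for c in self.containers.values() if c["state"] == "RUNNING"]
        return {
            "host": self.host,
            "used_memory_mb": sum(c["memory_mb"] for c in running),
            "used_vcores": sum(c["vcores"] for c in running),
            "container_states": {cid: c["state"]
                                 for cid, c in self.containers.items()},
        }


nm = NodeManagerSketch("node1", total_memory_mb=8192, total_vcores=8)
nm.start_container("c1", 1024, 1)
nm.start_container("c2", 2048, 2)
nm.stop_container("c1")
report = nm.heartbeat()
```

After `c1` is stopped, the report counts only `c2` toward the node's usage but still lists the state of both Containers.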

VII. Container

  1. A Container is YARN's resource abstraction. It encapsulates resources on a node, such as memory and CPU.
  2. When an AM requests resources from the RM, the resources the RM returns are expressed as Containers.
  3. YARN assigns a Container to each task, and the task may use only the resources described in that Container.
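The rule in item 3 can be illustrated with a small check. This is a simplified sketch with hypothetical names; how a real NodeManager measures usage and reacts to a violation is not covered here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContainerSpec:
    memory_mb: int
    vcores: int


class ContainerLimitExceeded(Exception):
    pass


def check_usage(spec: ContainerSpec, used_memory_mb: int, used_vcores: int) -> None:
    """Raise if a task uses more than its Container describes."""
    if used_memory_mb > spec.memory_mb:
        raise ContainerLimitExceeded(
            f"memory {used_memory_mb} MB exceeds limit {spec.memory_mb} MB")
    if used_vcores > spec.vcores:
        raise ContainerLimitExceeded(
            f"{used_vcores} vcores exceeds limit {spec.vcores}")


spec = ContainerSpec(memory_mb=1024, vcores=1)
check_usage(spec, used_memory_mb=900, used_vcores=1)    # within limits, no error

try:
    check_usage(spec, used_memory_mb=1500, used_vcores=1)
    over_limit = False
except ContainerLimitExceeded:
    over_limit = True
```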

Common parameters

[Figure: common YARN parameters]
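To show how such parameters interact, here is a hedged sketch built on a few standard `yarn-site.xml` property names. They are not necessarily the parameters from the original figure, the values are illustrative, and the rounding rule (round a memory request up to a multiple of the minimum allocation) is an assumption of this sketch; actual behavior depends on the scheduler in use.

```python
import math

# Illustrative values; the keys are yarn-site.xml property names.
conf = {
    "yarn.nodemanager.resource.memory-mb": 8192,    # memory one NM offers to Containers
    "yarn.nodemanager.resource.cpu-vcores": 8,      # vcores one NM offers to Containers
    "yarn.scheduler.minimum-allocation-mb": 1024,   # smallest Container the RM grants
    "yarn.scheduler.maximum-allocation-mb": 8192,   # largest Container the RM grants
}


def normalize_memory(requested_mb: int, conf: dict) -> int:
    """Return the Container size granted for a memory request (sketch only)."""
    minimum = conf["yarn.scheduler.minimum-allocation-mb"]
    maximum = conf["yarn.scheduler.maximum-allocation-mb"]
    if requested_mb > maximum:
        raise ValueError(
            f"{requested_mb} MB exceeds the maximum allocation of {maximum} MB")
    # Assumption: round up to the next multiple of the minimum allocation.
    return max(minimum, math.ceil(requested_mb / minimum) * minimum)


granted = normalize_memory(1500, conf)
containers_per_node = conf["yarn.nodemanager.resource.memory-mb"] // granted
```

Under these assumptions, a 1500 MB request becomes a 2048 MB Container, so a node offering 8192 MB fits four such Containers by memory.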

Origin blog.csdn.net/yang134679/article/details/93782653