MapReduce big data computation process

ResourceManager (RM) : oversees the cluster's computing resources, manages all NodeManagers, and allocates resources

NodeManager : manages the computing resources (Containers) on its own host and reports its status to the RM

MRAppMaster : the master of a computing job, responsible for requesting computing resources and coordinating the job's tasks

YarnChild : the process that does the actual computing work (MapTask / ReduceTask)

Container : an abstraction of computing resources that represents a bundle of memory / CPU / network. Both MRAppMaster and YarnChild need to consume a logical Container in order to run
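To make the roles above more concrete, here is a minimal toy model in Python. It is only a sketch and does not use the real YARN API: the class names mirror the components described above, while the fields (`memory_mb`, `vcores`), the methods (`can_fit`, `launch`, `allocate`) and the first-fit placement rule are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Container:
    """A logical bundle of resources on one host (toy model, not the YARN API)."""
    host: str
    memory_mb: int
    vcores: int


@dataclass
class NodeManager:
    """Tracks the free resources of a single host."""
    host: str
    free_memory_mb: int
    free_vcores: int

    def can_fit(self, memory_mb: int, vcores: int) -> bool:
        return self.free_memory_mb >= memory_mb and self.free_vcores >= vcores

    def launch(self, memory_mb: int, vcores: int) -> Container:
        # Reserve the resources and hand back a Container describing them.
        self.free_memory_mb -= memory_mb
        self.free_vcores -= vcores
        return Container(self.host, memory_mb, vcores)


@dataclass
class ResourceManager:
    """Knows every NodeManager and decides where a Container is placed."""
    nodes: list = field(default_factory=list)

    def allocate(self, memory_mb: int, vcores: int):
        # Hypothetical first-fit placement; real schedulers are far more elaborate.
        for node in self.nodes:
            if node.can_fit(memory_mb, vcores):
                return node.launch(memory_mb, vcores)
        return None  # no host has enough free resources


rm = ResourceManager([NodeManager("node1", 2048, 2), NodeManager("node2", 4096, 4)])
am_container = rm.allocate(1024, 1)    # would host an MRAppMaster
task_container = rm.allocate(2048, 2)  # would host a YarnChild
```

In this sketch the first request fits on `node1`, while the second no longer fits there and is placed on `node2`. A request that no host can satisfy returns `None`.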

 

 

  1. First, the programmer submits the MR program they have written, either from the local command line or from an IDE (cross-platform submission)

  2. An MR program is a Job. The Job's information is sent to the ResourceManager, and the ResourceManager registers the Job

  3. After the ResourceManager approves the registration, the Job copies its related resources to HDFS

  4. The Job then submits the complete application information (including the resource information) to the ResourceManager

  5. From this information the ResourceManager calculates the resources the Job needs and allocates a Container (resource unit) for the Job

  6. The Container information is dispatched to a NodeManager, and the NodeManager creates the MRAppMaster process

  7. MRAppMaster then initializes the Job

  8. Next, it retrieves the input splits that define the tasks

  9. It connects to the RM and requests resources. Once the resources are granted, it connects to the corresponding NodeManagers and starts a YarnChild in each allocated Container

  10. YarnChild copies the Job resources from the distributed file system

  11. The MR program executes
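The last few steps (splitting the input, running one task per split, then executing the program) can be imitated in a single Python process. This is a sketch under stated assumptions, not Hadoop code: word counting is used only as an example workload, and every function name here (`compute_splits`, `map_task`, `shuffle`, `reduce_task`, `run_job`) is hypothetical.

```python
from collections import defaultdict


def compute_splits(lines, split_size):
    """Step 8: cut the input into fixed-size splits, one per map task."""
    return [lines[i:i + split_size] for i in range(0, len(lines), split_size)]


def map_task(split):
    """What a map-side YarnChild would do here: emit (word, 1) pairs."""
    return [(word, 1) for line in split for word in line.split()]


def shuffle(map_outputs):
    """Group values by key, as happens between the map and reduce phases."""
    grouped = defaultdict(list)
    for output in map_outputs:
        for key, value in output:
            grouped[key].append(value)
    return grouped


def reduce_task(key, values):
    """What a reduce-side YarnChild would do here: sum the counts of one word."""
    return key, sum(values)


def run_job(lines, split_size=2):
    """Play the role of MRAppMaster: plan the tasks, run them, collect results."""
    splits = compute_splits(lines, split_size)
    map_outputs = [map_task(s) for s in splits]  # one map task per split
    grouped = shuffle(map_outputs)
    return dict(reduce_task(k, v) for k, v in grouped.items())


data = ["yarn runs mapreduce", "mapreduce runs tasks", "tasks run in containers"]
result = run_job(data)
```

In a real cluster each `map_task` and `reduce_task` call would run in its own YarnChild inside a separate Container, possibly on different hosts. Here they run one after another, just to show the order of the phases.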

 

 

 

 

 

 

Origin www.cnblogs.com/jia-0112/p/11432036.html