Introduction to MapReduce
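MapReduce is a programming model and a general way of structuring computation: a job is expressed as a Map step followed by a Reduce step, so it can run in parallel across many machines.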
YARN concepts
- ResourceManager
- ApplicationMaster
- NodeManager
ResourceManager
- Allocate and schedule resources
- Start and monitor ApplicationMaster
- Monitor NodeManager
ApplicationMaster
- Request resources for a MapReduce (MR) job and assign them to the job's tasks
- Split the input data
- Monitor task execution and handle fault tolerance
NodeManager
- Manage the resources of a single node
- Process commands from ResourceManager
- Process commands from ApplicationMaster
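The division of labor above can be sketched as a toy model. This is not the YARN API: the class and method names (`allocate`, `launch`, `run`) and the memory-only resource model are assumptions made for illustration, and the first-fit placement stands in for a real scheduler.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """Manages the resources of a single node (here: only memory, in MB)."""
    name: str
    free_mb: int
    running: list = field(default_factory=list)

    def launch(self, task: str, mb: int) -> None:
        # Handle a launch command coming from an ApplicationMaster.
        self.free_mb -= mb
        self.running.append(task)


class ResourceManager:
    """Allocates cluster resources across all NodeManagers."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def allocate(self, mb: int):
        # Hypothetical first-fit placement: return the first node with room.
        for node in self.nodes:
            if node.free_mb >= mb:
                return node
        return None


class ApplicationMaster:
    """Requests resources for one job and assigns them to its tasks."""

    def __init__(self, rm: ResourceManager):
        self.rm = rm

    def run(self, tasks, mb_per_task: int):
        placement = {}
        for task in tasks:
            node = self.rm.allocate(mb_per_task)
            if node is None:
                # No capacity left; a real ApplicationMaster would wait and retry.
                placement[task] = None
                continue
            node.launch(task, mb_per_task)
            placement[task] = node.name
        return placement


nodes = [NodeManager("node1", 2048), NodeManager("node2", 1024)]
am = ApplicationMaster(ResourceManager(nodes))
placement = am.run(["map-0", "map-1", "map-2", "reduce-0"], mb_per_task=1024)
print(placement)
```

In this sketch the last task is left unplaced because both nodes are full, which is the kind of situation the ApplicationMaster has to monitor and recover from.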
MapReduce programming model
- Take a large file as input and split it into multiple shards
- Each shard is processed by a separate machine. This is the Map method
- The results computed by each machine are aggregated to produce the final result. This is the Reduce method
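The three steps above can be illustrated with a single-process word count. This is a minimal sketch, not Hadoop code: the function names are hypothetical, the shards are processed in a plain loop rather than on separate machines, and the intermediate grouping step (`shuffle`) is an assumption added so that each key reaches the reducer with all of its values.

```python
from collections import defaultdict


def split_input(text, num_shards):
    """Split the input lines into roughly equal shards."""
    lines = text.splitlines()
    size = max(1, -(-len(lines) // num_shards))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def map_shard(lines):
    """Map: emit a (word, 1) pair for every word in one shard."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def shuffle(mapped):
    """Group the emitted values by key across all shards."""
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


text = "the quick fox\nthe lazy dog\nthe end"
shards = split_input(text, num_shards=2)
mapped = [map_shard(shard) for shard in shards]  # one call per shard
result = dict(reduce_counts(k, v) for k, v in shuffle(mapped).items())
print(result)
```

Because each `map_shard` call only sees its own shard, the calls are independent of one another, which is what allows the Map step to be spread across machines.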