The principle of MapReduce (enough to impress interviewers)

Principle of MapReduce

1. A large dataset to be processed is split into blocks of equal size (e.g., 64 MB), and a user job program is prepared for it.
2. The system has a master node (Master) responsible for scheduling, plus Map and Reduce worker nodes (Workers).
3. The user's job program is submitted to the master node.
4. The master node finds available Map workers, assigns them to the job, and transfers the program to them.
5. The master node likewise finds available Reduce workers, assigns them to the job, and transfers the program to them.
6. The master node starts the Map workers; each Map worker reads local or nearby data whenever possible for its computation (data locality).
7. Each Map worker processes its data block, performs local combining and sorting, and stores the intermediate results on its local disk; it then notifies the master that its task is complete and reports where the intermediate results are stored.
8. Once all Map workers have finished, the master starts the Reduce workers; each Reduce worker remotely reads its share of the intermediate data, using the location information held by the master.
9. The Reduce workers' results are merged and written to output files, giving the final result of the job (a minimal sketch of this flow follows the list).
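To make the flow above concrete, here is a minimal, self-contained word-count simulation in Python. It is only a sketch of the map → combine/sort → shuffle → reduce data flow, run on a single machine; the function names (split_into_blocks, map_task, combine_and_sort, partition, reduce_task), the block size, and the number of reducers are illustrative assumptions, not the API of Hadoop or any other framework.

```python
from collections import defaultdict

# Minimal word-count simulation of the MapReduce flow described above.
# All names and sizes here are illustrative, not a real framework API.

def split_into_blocks(lines, lines_per_block):
    """Step 1: split the input into fixed-size blocks (a few lines stand in for 64 MB)."""
    return [lines[i:i + lines_per_block] for i in range(0, len(lines), lines_per_block)]

def map_task(block):
    """Step 6: a Map worker turns its block into intermediate (key, value) pairs."""
    return [(word, 1) for line in block for word in line.split()]

def combine_and_sort(pairs):
    """Step 7: local combining and sorting before intermediate results are stored."""
    combined = defaultdict(int)
    for key, value in pairs:
        combined[key] += value
    return sorted(combined.items())

def partition(pairs, num_reducers):
    """Shuffle: decide which Reduce worker owns each key (hash partitioning)."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[hash(key) % num_reducers].append((key, value))
    return buckets

def reduce_task(pairs):
    """Steps 8-9: a Reduce worker merges all values that share a key."""
    result = defaultdict(int)
    for key, value in pairs:
        result[key] += value
    return dict(result)

if __name__ == "__main__":
    lines = [
        "the quick brown fox",
        "jumps over the lazy dog",
        "the fox jumps again",
    ]
    num_reducers = 2

    # Map phase: one task per block; each task keeps its intermediate output "locally".
    intermediate = [combine_and_sort(map_task(b)) for b in split_into_blocks(lines, 2)]

    # Shuffle phase: each reducer pulls its partition from every Map task's output.
    reducer_inputs = [[] for _ in range(num_reducers)]
    for task_output in intermediate:
        for r, bucket in enumerate(partition(task_output, num_reducers)):
            reducer_inputs[r].extend(bucket)

    # Reduce phase: combine the reducers' outputs into the final result.
    final = {}
    for pairs in reducer_inputs:
        final.update(reduce_task(pairs))
    print(final)  # e.g. {'the': 3, 'fox': 2, 'jumps': 2, ...}
```

The local combine in step 7 is the reason real frameworks run a combiner on the Map side: summing counts before the shuffle cuts down the amount of intermediate data sent over the network to the Reduce workers.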

Source: blog.csdn.net/qq_42706464/article/details/108812368