4 Detailed MapReduce workflow

One, MapTask working mechanism

The overall process can be summarized as follows:

1, The client submits a job to the cluster, with the number of input splits n specified

2, The cluster starts n MapTasks, one per split

3, Each MapTask reads its data as (k, v) pairs through RecordReader, where v is one line of the input

4, Each line that is read is processed by the mapper logic

5, The processed data is emitted as new (k1, v1) pairs, which Context.write sends to the ring buffer

6, Data written to the ring buffer is assigned to partitions and sorted by key within each partition

7, When the ring buffer is 80% full, its data is spilled to a file on disk; the spill file is partitioned and sorted within each partition

8, The spill files are merged partition by partition; the resulting file is again partitioned and sorted within each partition
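
The steps above can be sketched as a toy simulation. This is only an illustration, not Hadoop code: the buffer is a plain list whose capacity is counted in records rather than bytes, and `partition_of`, `mapper`, and `run_map_task` are hypothetical names. The partitioner is a stand-in for a hash partitioner (hash of the key modulo the number of partitions).

```python
import heapq
from zlib import crc32

NUM_PARTITIONS = 2       # assumed number of reduce tasks
BUFFER_CAPACITY = 10     # toy capacity, counted in records rather than bytes
SPILL_THRESHOLD = 0.8    # spill once the buffer is 80% full


def partition_of(key):
    # Deterministic stand-in for a hash partitioner: hash(key) mod partitions.
    return crc32(key.encode("utf-8")) % NUM_PARTITIONS


def mapper(line):
    # Word-count style map logic: emit (word, 1) for every word in the line.
    for word in line.split():
        yield word, 1


def run_map_task(lines):
    buffer, spills = [], []

    def spill():
        # Sort by (partition, key) so each spill is partitioned and
        # sorted by key inside every partition.
        spills.append(sorted(buffer, key=lambda rec: (rec[0], rec[1])))
        buffer.clear()

    for line in lines:                                  # step 3: read line by line
        for k1, v1 in mapper(line):                     # steps 4-5: map, emit (k1, v1)
            buffer.append((partition_of(k1), k1, v1))   # step 6: assign a partition
            if len(buffer) >= BUFFER_CAPACITY * SPILL_THRESHOLD:
                spill()                                 # step 7: spill at 80%
    if buffer:
        spill()                                         # flush what is left at the end
    # Step 8: merge the sorted spills into one partitioned, sorted output.
    merged = list(heapq.merge(*spills, key=lambda rec: (rec[0], rec[1])))
    return merged, len(spills)


merged, spill_count = run_map_task(["a b c d", "b c d e", "a a e f"])
print(spill_count, len(merged))  # 2 12
```

With 12 emitted records and a threshold of 8 records, the first spill happens after the eighth record and the remaining four are flushed at the end, so two spill files are merged.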

Two, ReduceTask working mechanism

The overall process can be summarized as follows:

1, According to its partition number, each ReduceTask fetches the matching partition from the final merged file of every MapTask

2, The fetched files are merged and sorted by key

3, The reduce logic runs on the sorted data and generates new (k, v) pairs

4, The result is written as output
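
A minimal sketch of these four steps, under simplifying assumptions: each map output is modelled as an in-memory dict from partition number to a list of (key, value) pairs already sorted by key, and `run_reduce_task` and `sum_reducer` are hypothetical names rather than Hadoop APIs.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def run_reduce_task(partition, map_outputs, reducer):
    # Step 1: fetch only this task's partition from every map output.
    fetched = [out.get(partition, []) for out in map_outputs]
    # Step 2: merge the already sorted runs into one stream sorted by key.
    merged = heapq.merge(*fetched, key=itemgetter(0))
    # Step 3: group by key and apply the reduce logic to each group.
    results = []
    for key, group in groupby(merged, key=itemgetter(0)):
        results.append((key, reducer(key, [value for _, value in group])))
    # Step 4: return the output (a real job would write it to storage).
    return results


def sum_reducer(key, values):
    # Word-count style reduce logic: add up the counts for one key.
    return sum(values)


map_outputs = [
    {0: [("apple", 1), ("cherry", 1)], 1: [("banana", 1)]},
    {0: [("apple", 1), ("apple", 1)], 1: [("banana", 1), ("date", 1)]},
]
print(run_reduce_task(0, map_outputs, sum_reducer))  # [('apple', 3), ('cherry', 1)]
```

Because every fetched run is already sorted, the reduce side only needs a merge rather than a full sort, which is why the map side sorts before spilling.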

Note:

The size of the ring buffer affects the efficiency of a MapReduce job: the larger the buffer, the fewer spills to disk are needed, so there is less disk I/O and the map side runs faster.
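
A rough back-of-the-envelope estimate shows the effect. It assumes every spill drains about buffer size × spill threshold of data and ignores record metadata and compression, so it is an approximation only. `estimate_spills` is a hypothetical helper; its default values mirror what I understand to be the Hadoop defaults for `mapreduce.task.io.sort.mb` (100) and `mapreduce.map.sort.spill.percent` (0.80), neither of which is named in the original text.

```python
import math


def estimate_spills(map_output_mb, sort_buffer_mb=100, spill_percent=0.80):
    # Each spill drains roughly sort_buffer_mb * spill_percent of map output.
    spill_size_mb = sort_buffer_mb * spill_percent
    return math.ceil(map_output_mb / spill_size_mb)


# Assumed example: one MapTask producing 1000 MB of output.
for buffer_mb in (100, 200, 400):
    print(buffer_mb, estimate_spills(map_output_mb=1000, sort_buffer_mb=buffer_mb))
# 100 13
# 200 7
# 400 4
```

Quadrupling the buffer in this example cuts the estimated number of spill files from 13 to 4, which also means fewer files to merge in step 8 of the MapTask.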

Three, Shuffle working mechanism

The stage after the map method of MapTask and before the reduce method of ReduceTask is called shuffle. Shuffle is the core operation of MapReduce.
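
To show where shuffle sits, here is a deliberately small word-count sketch with three hypothetical functions. The `shuffle` function only models the logical effect (group values by key, hand keys over in sorted order) and leaves out partitioning, spilling, and network transfer.

```python
from collections import defaultdict


def map_phase(lines):
    # Map: emit (word, 1) for every word.
    return [(word, 1) for line in lines for word in line.split()]


def shuffle(pairs):
    # Shuffle: group all values by key and hand the keys over in sorted order.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(grouped):
    # Reduce: collapse each key's list of values into one result.
    return [(key, sum(values)) for key, values in grouped]


pairs = map_phase(["to be or", "not to be"])
print(reduce_phase(shuffle(pairs)))  # [('be', 2), ('not', 1), ('or', 1), ('to', 2)]
```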

Origin blog.csdn.net/stable_zl/article/details/105133173