Hadoop Big Data Technology: MapReduce (3) - MapReduce Workflow

3.2 MapReduce workflow

1. Flow diagram:

[Figure: MapReduce workflow diagram]

2. Detailed Process

The diagram above shows the complete MapReduce workflow, but the Shuffle phase covers only steps 7 through 16. The detailed Shuffle process is as follows:
1) The MapTask collects the kv pairs output by our map() method and puts them into a memory ring buffer.
2) The buffer continually spills its contents to local disk files; multiple spill files may be produced.
3) The multiple spill files are merged into one large spill file.
4) During both the spill and merge steps, the Partitioner is called to partition the data, and the data is sorted by key (see the HashPartitioner sketch after this list).
5) Each ReduceTask, according to its partition number, fetches the corresponding partition of result data from every MapTask machine.
6) The ReduceTask pulls the result files belonging to the same partition from the different MapTasks, then merges these files (merge sort).
7) Once the files are merged into one larger file, the Shuffle process is over, and the ReduceTask's logical processing begins (reading keys from the file group by group and calling the user-defined reduce() method).
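
Step 4 relies on the Partitioner to decide which ReduceTask each key goes to. The default partitioner, HashPartitioner (also referenced in the source trace in section 4), boils down to essentially the following minimal sketch (generic types simplified):

import org.apache.hadoop.mapreduce.Partitioner;

// Default partitioning logic: mask off the sign bit of the key's hash so
// the result is non-negative, then take it modulo the number of
// ReduceTasks to pick the target partition.
public class HashPartitioner<K, V> extends Partitioner<K, V> {
    @Override
    public int getPartition(K key, V value, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}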

3. Note

The Shuffle buffer size affects the execution efficiency of a MapReduce program: in principle, the larger the buffer, the fewer the disk I/O operations and the faster the job runs.
The buffer size can be adjusted with the parameter io.sort.mb (named mapreduce.task.io.sort.mb in Hadoop 2.x and later); the default is 100 MB.
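
As a minimal sketch of tuning this when building a job (the 200 MB value is illustrative, not a recommendation):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SortBufferTuning {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Raise the map-side sort buffer from the default 100 MB to 200 MB;
        // "mapreduce.task.io.sort.mb" is the Hadoop 2.x name for io.sort.mb.
        conf.setInt("mapreduce.task.io.sort.mb", 200);
        Job job = Job.getInstance(conf, "sort-buffer-tuning-demo");
        // ... set mapper, reducer, input and output paths as usual,
        // then submit with job.waitForCompletion(true).
    }
}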

4. Source Code Call Chain
context.write(k, NullWritable.get());      // user map() emits a kv pair
  output.write(key, value);                // NewOutputCollector
    collector.collect(key, value, partitioner.getPartition(key, value, partitions));
      // default partitioner: HashPartitioner
      // collect() writes the kv pair into the ring buffer
close();                                   // after the last map() call
  collector.flush();
    sortAndSpill();
      sort();                              // QuickSort within each spill
    mergeParts();                          // merge all spill files into one
  collector.close();
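
For context, the whole chain above is triggered from the user's map() method. A minimal WordCount-style Mapper sketch (class and field names are illustrative) shows the context.write() entry point:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String token : value.toString().split("\\s+")) {
            word.set(token);
            // Each call enters the traced chain: output.write ->
            // collector.collect -> ring buffer -> sortAndSpill -> mergeParts.
            context.write(word, ONE);
        }
    }
}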

Origin blog.csdn.net/zy13765287861/article/details/104748512