In which stages does sorting occur in a Hadoop MR job?

Original address:

https://blog.csdn.net/qq_42246689/article/details/84590215

 

This is an interview question. I haven't written many MR jobs myself, so I am taking the opportunity to organize what I learned while reviewing the topic recently.

 

1. Partitioning happens at the end of the map stage, as the map output is written to the ring buffer. The partitioner is normally the class set by job.setPartitionerClass; if no custom partitioner is set, records are partitioned by the key's hashCode() method, taken modulo the number of reduce tasks. When the ring buffer reaches its spill threshold, its contents are sorted once before being written to disk. Within each partition, records are sorted with the key comparator class set by job.setSortComparatorClass; if none is set, the compareTo method implemented by the key is used.
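The partition-then-sort step can be sketched in plain Java without any Hadoop dependency. The class and method names below (SpillSortSketch, partitionFor, sortForSpill) are made up for illustration. The arithmetic in partitionFor is the same hash-and-modulo rule the default partitioner applies, but the rest is only a simplified model of a spill, not Hadoop's actual buffer code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Plain-Java model of map-side partitioning and spill sorting (hypothetical names).
class SpillSortSketch {

    // Clear the sign bit of the hash, then take the remainder by the
    // number of reduce tasks, so the result is always in [0, numReduceTasks).
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Assign every buffered key to a partition, then sort each partition
    // with the supplied sort comparator before it would be spilled to disk.
    static List<List<String>> sortForSpill(List<String> bufferedKeys,
                                           int numReduceTasks,
                                           Comparator<String> sortComparator) {
        List<List<String>> partitions = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            partitions.add(new ArrayList<>());
        }
        for (String key : bufferedKeys) {
            partitions.get(partitionFor(key, numReduceTasks)).add(key);
        }
        for (List<String> partition : partitions) {
            partition.sort(sortComparator);
        }
        return partitions;
    }
}
```

For example, calling sortForSpill with the keys "d", "c", "b", "a", two reduce tasks, and natural ordering returns [[b, d], [a, c]]: partition 0 holds the keys with even hash codes and partition 1 the odd ones, each sorted internally.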

 

2. The data is sorted again when all the spill files written from each map task's ring buffer are merged.
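Because every spill file is already sorted, the merge only has to interleave the runs rather than sort from scratch. The following is a minimal sketch of a k-way merge with a priority queue, assuming each spill file is modelled as an in-memory sorted list of keys. SpillMergeSketch and Cursor are hypothetical names, and this illustrates the general technique rather than Hadoop's own merge code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// K-way merge of already-sorted runs (hypothetical names, in-memory model).
class SpillMergeSketch {

    // One cursor per run: the current smallest unread key plus the rest of the run.
    private static final class Cursor {
        String head;
        final Iterator<String> rest;

        Cursor(Iterator<String> it) {
            this.rest = it;
            this.head = it.next();
        }
    }

    static List<String> merge(List<List<String>> sortedRuns,
                              Comparator<String> sortComparator) {
        // The heap always exposes the run whose head key is smallest.
        PriorityQueue<Cursor> heap =
                new PriorityQueue<>((a, b) -> sortComparator.compare(a.head, b.head));
        for (List<String> run : sortedRuns) {
            if (!run.isEmpty()) {
                heap.add(new Cursor(run.iterator()));
            }
        }
        List<String> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            Cursor smallest = heap.poll();
            merged.add(smallest.head);
            // Advance the run we just consumed from and put it back if not exhausted.
            if (smallest.rest.hasNext()) {
                smallest.head = smallest.rest.next();
                heap.add(smallest);
            }
        }
        return merged;
    }
}
```

The same comparator that sorted the individual runs must be used for the merge, otherwise the merged output is not guaranteed to be ordered.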

 

3. Once a reduce task has received all the data sent by the map tasks, it merges and sorts the data of its partition, again using the key comparator class set by job.setSortComparatorClass to order all the key/value pairs; if none is set, the key's compareTo method is used.
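The fallback rule for the sort order can be modelled in a few lines. This is only a sketch under the assumption that "no comparator set" is represented by null; SortComparatorFallbackSketch and its methods are illustrative names, not Hadoop APIs.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Models "use the custom sort comparator if present, else the key's compareTo".
class SortComparatorFallbackSketch {

    // A null custom comparator stands in for "job.setSortComparatorClass was never called".
    static <K extends Comparable<K>> Comparator<K> effectiveComparator(Comparator<K> custom) {
        if (custom != null) {
            return custom;
        }
        return Comparator.<K>naturalOrder(); // delegates to the key's own compareTo
    }

    // Returns a sorted copy of the keys, leaving the input list untouched.
    static <K extends Comparable<K>> List<K> sortKeys(List<K> keys, Comparator<K> custom) {
        List<K> sorted = new ArrayList<>(keys);
        sorted.sort(effectiveComparator(custom));
        return sorted;
    }
}
```

Passing null sorts the keys 3, 1, 2 as [1, 2, 3], while passing a reversed comparator yields [3, 2, 1], which is the kind of effect a custom sort comparator has on the order in which a reducer sees its keys.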

 

4. Next, the grouping comparator class set by job.setGroupingComparatorClass groups the sorted records, and the values whose keys compare as equal are placed in the same iterator. If no grouping comparator class is specified, the same comparison used for sorting (by default the key's compareTo method) decides the grouping.
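Grouping can be pictured as a single pass over the sorted records that starts a new group whenever the grouping comparator says the key has changed. The sketch below assumes String keys and Integer values held in memory; GroupingSketch and groupValues are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Single-pass grouping over records that are already sorted by key (hypothetical names).
class GroupingSketch {

    // Returns one list of values per group; each inner list plays the role
    // of the value iterator handed to a single reduce call.
    static List<List<Integer>> groupValues(List<Map.Entry<String, Integer>> sortedRecords,
                                           Comparator<String> groupingComparator) {
        List<List<Integer>> groups = new ArrayList<>();
        String previousKey = null;
        for (Map.Entry<String, Integer> record : sortedRecords) {
            // Open a new group for the first record, or when the key differs
            // from the previous one according to the grouping comparator.
            if (groups.isEmpty()
                    || groupingComparator.compare(previousKey, record.getKey()) != 0) {
                groups.add(new ArrayList<>());
            }
            groups.get(groups.size() - 1).add(record.getValue());
            previousKey = record.getKey();
        }
        return groups;
    }
}
```

With the sorted records ("2020-01", 5), ("2020-02", 7), ("2021-01", 1), a grouping comparator that compares only the first four characters of the key produces two groups, [5, 7] and [1], whereas grouping by the full key produces three groups of one value each. This is why a grouping comparator that is coarser than the sort comparator is a common way to deliver values to a reducer in a chosen order.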

 

In Hadoop 1.0 this sorting is unavoidable. In Hadoop 2.0 it can be turned off by setting the number of reduce tasks to 0.
 


Origin blog.csdn.net/u010003835/article/details/105301236