Understanding Map

Grouping and aggregation are performed in the Shuffle phase, and Reduce then computes its result from the grouped data.
Each input split corresponds to one MapTask, and each partition corresponds to one ReduceTask.
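The flow can be sketched in plain Python. This is only a simulation of the idea, not Hadoop code; the function names map_phase, shuffle and reduce_phase and the word-count data are made up for illustration.

```python
from collections import defaultdict

def map_phase(split):
    """One MapTask: emit a (word, 1) pair for every word in its input split."""
    return [(word, 1) for line in split for word in line.split()]

def shuffle(map_outputs):
    """Shuffle: gather the values from all map tasks under their key."""
    grouped = defaultdict(list)
    for pairs in map_outputs:
        for key, value in pairs:
            grouped[key].append(value)
    return dict(grouped)

def reduce_phase(grouped):
    """Reduce: compute one result per key from its grouped values."""
    return {key: sum(values) for key, values in grouped.items()}

splits = [["a b a"], ["b c"]]                  # two splits, so two map tasks
map_outputs = [map_phase(s) for s in splits]
grouped = shuffle(map_outputs)
counts = reduce_phase(grouped)
print(counts)
```

Each element of splits stands for the input of one MapTask, and the grouping step is where all values of the same key meet before Reduce sees them.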

You can also adjust the number of partitions by setting the number of reducers.

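Partition rule:
When the number of ReduceTasks is set to 1, the custom partitioner is not used at all and every record goes to the single partition.
If the number of ReduceTasks is greater than the number of partitions produced by the rule, the extra ReduceTasks receive no data and write empty output files.
If the number of ReduceTasks is less than the number of partitions produced by the rule (and is not 1), an error is reported.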
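The three cases can be reproduced with a small simulation. It is a sketch under assumptions, not Hadoop's implementation: assign_partitions and by_prefix are hypothetical helpers, the phone-number data is invented, and the error text only imitates the style of Hadoop's "Illegal partition" message.

```python
def assign_partitions(records, num_reduce_tasks, custom_partitioner):
    """Return one list of records per reduce task (hypothetical helper)."""
    partitions = [[] for _ in range(num_reduce_tasks)]
    for key, value in records:
        if num_reduce_tasks == 1:
            index = 0                      # single reducer: custom rule bypassed
        else:
            index = custom_partitioner(key)
        if not 0 <= index < num_reduce_tasks:
            raise ValueError(f"Illegal partition for {key} ({index})")
        partitions[index].append((key, value))
    return partitions

def by_prefix(phone):
    """Hypothetical custom rule that produces three partitions: 0, 1 and 2."""
    return {"136": 0, "137": 1}.get(phone[:3], 2)

records = [("13600000000", 1), ("13700000000", 2), ("13900000000", 3)]

one = assign_partitions(records, 1, by_prefix)    # everything in partition 0
five = assign_partitions(records, 5, by_prefix)   # partitions 3 and 4 stay empty
try:
    assign_partitions(records, 2, by_prefix)      # rule needs 3, only 2 given
    error = None
except ValueError as exc:
    error = str(exc)

print([len(p) for p in one], [len(p) for p in five], error)
```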

Processing of small files
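Set the input format class of the job (such as CombineTextInputFormat) to control how small files are handled, so that several small files can share one split instead of each getting its own MapTask.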
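To show why this helps, here is a deliberately simplified packing sketch. It is not the algorithm Hadoop uses for combined input formats; combine_small_files, the file names and the sizes are all assumptions chosen for illustration.

```python
def combine_small_files(file_sizes, max_split_size):
    """Greedily pack files into splits; close a split once it reaches the limit."""
    splits, current, current_size = [], [], 0
    for name, size in file_sizes:
        current.append(name)
        current_size += size
        if current_size >= max_split_size:
            splits.append(current)
            current, current_size = [], 0
    if current:                       # leftover files form the last split
        splits.append(current)
    return splits

files = [("a.txt", 2), ("b.txt", 3), ("c.txt", 1), ("d.txt", 4)]  # sizes in MB
combined_splits = combine_small_files(files, max_split_size=4)
print(len(files), "files ->", len(combined_splits), "splits:", combined_splits)
```

With one split per file, the four files would start four MapTasks; after packing there are only two splits to process.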
The Combiner merging here means that the calculation, such as a sum, is done directly during shuffle on the output of each map task.
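A minimal sketch of the idea, assuming a word-count style job. The names map_task and combine are hypothetical, and the code only imitates what a Combiner does to one map task's output.

```python
from collections import Counter

def map_task(words):
    """Emit a (word, 1) pair for every input word."""
    return [(word, 1) for word in words]

def combine(pairs):
    """Combiner: sum the values per key within a single map task's output."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())

raw = map_task(["a", "b", "a", "a"])
combined = combine(raw)
print(len(raw), "pairs before,", len(combined), "pairs after:", combined)
```

Because the sum is already taken locally, fewer pairs have to be transferred to the reduce side. This only works for operations such as sum, where combining partial results gives the same final answer.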
This partition merging combines the partitions that have the same partition number across different tasks. Example: partition 1 of task1 is merged with partition 1 of task2.
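The merge can be pictured as follows. This is an illustrative sketch only: merge_partitions and the two task dictionaries are invented, and real Hadoop performs this step on sorted files rather than in-memory lists.

```python
def merge_partitions(task_outputs):
    """task_outputs: per map task, a dict {partition number: [(key, value), ...]}."""
    merged = {}
    for output in task_outputs:
        for number, pairs in output.items():
            merged.setdefault(number, []).extend(pairs)
    # Each reduce task reads its merged partition sorted by key.
    return {number: sorted(pairs) for number, pairs in merged.items()}

task1 = {0: [("a", 1)], 1: [("x", 2)]}
task2 = {0: [("b", 5)], 1: [("x", 3), ("y", 1)]}
reducer_inputs = merge_partitions([task1, task2])
print(reducer_inputs[1])
```

Here reducer_inputs[1] holds everything that task1 and task2 wrote to partition 1, which is exactly the input of the ReduceTask responsible for that partition.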

Grouping before Reduce
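Grouping means that consecutive records with the same key in the sorted input are handed to Reduce as one key with a list of values. A small sketch, assuming the input is already sorted by key; group_for_reduce is a hypothetical helper, not a Hadoop API.

```python
from itertools import groupby
from operator import itemgetter

def group_for_reduce(sorted_pairs):
    """Yield (key, [values]) for each run of equal keys in key-sorted input."""
    for key, run in groupby(sorted_pairs, key=itemgetter(0)):
        yield key, [value for _, value in run]

pairs = sorted([("x", 2), ("a", 1), ("x", 3)])
groups = list(group_for_reduce(pairs))
print(groups)
```

Each tuple in groups corresponds to one call of the reduce function: one key together with all of its values.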

Origin blog.csdn.net/qq_42265608/article/details/132437017