Shuffle groups records by key (and pre-aggregates them when a Combiner is configured); Reduce then computes the final result from the grouped data.
Each input split corresponds to one MapTask, and each partition corresponds to one ReduceTask.
The number of partitions can therefore be adjusted by setting the number of reducers (job.setNumReduceTasks).
Partition rule:
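When the number of reducers is 1, the custom partitioner is not used at all; every record goes to partition 0, which is also what the default HashPartitioner returns when there is only one partition.
If the number of reducers is greater than the number of partitions the rule produces, the job still runs, but the extra reducers receive no data and write empty output files.
If the number of reducers is less than the number of partitions the rule produces (and is not 1), the job fails with an "Illegal partition" error.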
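
The three cases above can be sketched as a small simulation in plain Python. It is not Hadoop code: `route`, `prefix_partitioner`, `empty_partitions` and `IllegalPartitionError` are hypothetical names made up for illustration, and the phone-prefix rule is an assumed example of a custom partitioner. Only the formula in `hash_partition` follows Hadoop's HashPartitioner, `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`, under the assumption that keys hash like Java Strings.

```python
class IllegalPartitionError(Exception):
    """Stands in for the IOException Hadoop raises for a bad partition number."""


def java_string_hash(s):
    """Mimic Java's String.hashCode() as a signed 32-bit int (BMP characters only)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - 0x100000000 if h >= 0x80000000 else h


def hash_partition(key, num_reduce_tasks):
    """Same formula as HashPartitioner: (hash & Integer.MAX_VALUE) % n."""
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reduce_tasks


def prefix_partitioner(key, num_reduce_tasks):
    """Hypothetical custom rule: four known prefixes, everything else in partition 4."""
    table = {"136": 0, "137": 1, "138": 2, "139": 3}
    return table.get(key[:3], 4)


def route(key, num_reduce_tasks, partitioner):
    """Pick the partition for one record, following the three cases in the notes."""
    if num_reduce_tasks > 1:
        partition = partitioner(key, num_reduce_tasks)
    else:
        # With a single reducer the custom partitioner is bypassed entirely.
        partition = num_reduce_tasks - 1
    if partition < 0 or partition >= num_reduce_tasks:
        raise IllegalPartitionError(f"Illegal partition for {key} ({partition})")
    return partition


def empty_partitions(keys, num_reduce_tasks, partitioner):
    """Partitions that receive no records, i.e. reducers that write empty files."""
    used = {route(k, num_reduce_tasks, partitioner) for k in keys}
    return sorted(set(range(num_reduce_tasks)) - used)
```

With the five-partition rule above, 5 reducers use every partition, 8 reducers leave partitions 5 to 7 empty, 3 reducers raise the error as soon as a key maps to partition 3 or 4, and 1 reducer sends everything to partition 0 without consulting the rule.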
Processing of small files
Set the input format class of the job (for example CombineTextInputFormat) so that many small files are packed into fewer splits.
Combiner merging here means doing part of the calculation, such as a sum, directly in Shuffle on the map side.
Partition merging means merging the partitions that have the same partition number from different map tasks. Example: partition 1 of task1 is merged with partition 1 of task2.
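
A minimal sketch of these two steps in plain Python, not Hadoop code. The names `combine` and `merge_partition` and the sample records are hypothetical; each map task's output is modelled as a dict from partition number to a list of (key, value) pairs.

```python
from collections import defaultdict


def combine(records):
    """Combiner: sum the values per key inside one map task's partition."""
    totals = defaultdict(int)
    for key, value in records:
        totals[key] += value
    return sorted(totals.items())


def merge_partition(map_outputs, partition):
    """Gather the same partition number from every map task and sort by key."""
    merged = []
    for task_output in map_outputs:
        merged.extend(task_output.get(partition, []))
    return sorted(merged)


# Two map tasks, each holding combined output for partitions 0 and 1.
task1 = {0: combine([("a", 1), ("a", 1), ("b", 1)]), 1: combine([("x", 1), ("x", 1)])}
task2 = {0: combine([("a", 1)]), 1: combine([("x", 1), ("y", 1)])}

# Partition 1 of task1 is merged with partition 1 of task2.
partition_1 = merge_partition([task1, task2], 1)
```

The Combiner only shrinks each task's output; the same key can still arrive from several tasks, which is why Reduce has to compute the final result. This kind of pre-aggregation is safe for operations like sum, but not for something like an average, where combining partial results changes the answer.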
Records are grouped by key before they are passed to Reduce.
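
The grouping step can be illustrated with a short plain-Python sketch, again a simulation rather than Hadoop code. `group_and_reduce` is a hypothetical name, the reducer is assumed to be a sum, and the input is assumed to be already sorted by key, as it is after the merge.

```python
from itertools import groupby
from operator import itemgetter


def group_and_reduce(sorted_records):
    """Group key-sorted (key, value) pairs and apply a sum reducer to each group."""
    results = []
    for key, group in groupby(sorted_records, key=itemgetter(0)):
        # One reduce call per key: it sees the key and all of its values.
        values = [value for _, value in group]
        results.append((key, sum(values)))
    return results


merged = [("x", 1), ("x", 2), ("y", 1)]
totals = group_and_reduce(merged)
```

Because `groupby` only joins adjacent records, the sketch relies on the sorted input in the same way the reduce side relies on the merge sort that precedes grouping.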