This is the hash-partitioning code that MapTask uses for its map output in Hadoop MapReduce:
package org.apache.hadoop.mapreduce.lib.partition;
import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;
import org.apache.hadoop.mapreduce.Partitioner;
/** Partition keys by their {@link Object#hashCode()}. */
@InterfaceAudience.Public
@InterfaceStability.Stable
public class HashPartitioner<K, V> extends Partitioner<K, V> {
  /** Use {@link Object#hashCode()} to partition. */
  public int getPartition(K key, V value,
                          int numReduceTasks) {
    return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
  }
}
key: the map output key, whose type you define yourself
value: the map output value, whose type you define yourself
numReduceTasks: the number of reduce tasks (number of partitions == number of reducers)
(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
The & Integer.MAX_VALUE masks off the sign bit (Integer.MAX_VALUE is 0x7FFFFFFF, i.e. 31 one-bits), which prevents a negative result: hashCode() may return a negative int, and Java's % keeps the sign of its left operand. Remember that 0 counts as non-negative, so partition 0 is valid!
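A small sketch of why the mask matters. The class and method names below are my own for illustration; only the formula inside partition() comes from HashPartitioner. An Integer key of -7 is used because Integer.hashCode() simply returns the int value, making the arithmetic easy to follow:

```java
public class PartitionDemo {
    // Same formula as HashPartitioner.getPartition
    static int partition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        Integer key = -7;                    // hashCode() == -7 (negative)
        // Without the mask, Java's % keeps the sign of the left operand:
        System.out.println(key.hashCode() % 4);   // -3, an invalid partition index
        // With the mask, the sign bit is cleared first, so the result
        // is always in [0, numReduceTasks):
        System.out.println(partition(key, 4));    // 1
    }
}
```

Note the masked result is not simply Math.abs(hashCode) % n: clearing the sign bit maps -7 to 2147483641, and 2147483641 % 4 == 1. Either way, the point is only that every key lands deterministically in [0, numReduceTasks).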