Implementing the TopK algorithm in MapReduce: principle and code

1. Map stage

In the map method, insert each input value into a TreeMap. After every insertion, compare the TreeMap's size with K: whenever it holds more than K entries, remove the smallest key. When the map task finishes, the cleanup method runs and passes the K largest values it has kept on to the reduce task. The eviction step is sketched below.
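
This eviction step is plain Java and independent of Hadoop; here is a minimal sketch (K and the sample values are chosen only for illustration):

import java.util.TreeMap;

public class TopKSketch {
    public static void main(String[] args) {
        final int K = 3; // keep only the 3 largest, for illustration
        TreeMap<Long, Long> tree = new TreeMap<Long, Long>();
        for (long v : new long[]{7, 42, 3, 99, 15, 8}) {
            tree.put(v, v); // note: duplicate values collapse, since map keys are unique
            if (tree.size() > K) {
                tree.remove(tree.firstKey()); // evict the current smallest
            }
        }
        System.out.println(tree.keySet()); // prints [15, 42, 99]
    }
}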

2. Reduce stage

In the reduce method, the K values emitted by each map task are inserted into a TreeMap in the same way. Because TreeMap is backed by a red-black tree that keeps its keys in ascending order, firstKey always returns the current smallest entry, so removing it whenever the size exceeds K leaves exactly the K largest values. In the reducer's cleanup method, iterating over descendingKeySet then yields those K numbers from largest to smallest. A quick demonstration of this ordering follows.
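
The ordering behavior in isolation (the sample values are arbitrary):

import java.util.TreeMap;

public class DescendingDemo {
    public static void main(String[] args) {
        TreeMap<Long, Long> tree = new TreeMap<Long, Long>();
        for (long v : new long[]{5, 1, 9, 3}) {
            tree.put(v, v);
        }
        // TreeMap (a red-black tree) stores keys in ascending order, so
        // descendingKeySet() walks them largest-first: prints [9, 5, 3, 1]
        System.out.println(tree.descendingKeySet());
    }
}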

3. Code part: find the largest 100 numbers among 10 million values.

import java.io.IOException;
import java.net.URI;
import java.util.TreeMap;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

public class TopKApp {

    private static final String INPUT_PATH = "hdfs://xxx/topk_input";
    private static final String OUT_PATH = "hdfs://xxx/topk_out";

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        final FileSystem fileSystem = FileSystem.get(new URI(INPUT_PATH), conf);
        final Path outPath = new Path(OUT_PATH);
        // Delete the output directory if it already exists; otherwise the job fails.
        if (fileSystem.exists(outPath)) {
            fileSystem.delete(outPath, true);
        }

        final Job job = new Job(conf, TopKApp.class.getSimpleName());
        job.setJarByClass(TopKApp.class);
        FileInputFormat.setInputPaths(job, INPUT_PATH);
        job.setMapperClass(MyMapper.class);
        job.setPartitionerClass(HashPartitioner.class);
        job.setNumReduceTasks(1); // a single reducer sees every map task's local top K
        job.setReducerClass(MyReducer.class);
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(LongWritable.class);
        FileOutputFormat.setOutputPath(job, new Path(OUT_PATH));
        job.setOutputFormatClass(TextOutputFormat.class);
        job.waitForCompletion(true);
    }

    static class MyMapper extends Mapper<LongWritable, Text, NullWritable, LongWritable> {

        public static final int K = 100;
        // TreeMap keeps keys sorted ascending, so firstKey() is always the smallest.
        private TreeMap<Long, Long> tree = new TreeMap<Long, Long>();

        @Override
        protected void map(LongWritable key, Text text, Context context)
                throws IOException, InterruptedException {
            long temp = Long.parseLong(text.toString().trim());
            tree.put(temp, temp);
            if (tree.size() > K)
                tree.remove(tree.firstKey()); // evict the smallest once we hold more than K
        }

        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
            // Runs once after all input is mapped: emit this task's local top K.
            for (Long value : tree.values()) {
                context.write(NullWritable.get(), new LongWritable(value));
            }
        }
    }

    static class MyReducer extends Reducer<NullWritable, LongWritable, NullWritable, LongWritable> {

        public static final int K = 100;
        private TreeMap<Long, Long> tree = new TreeMap<Long, Long>();

        @Override
        protected void reduce(NullWritable key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            // Merge the local top-K lists from all map tasks, again keeping only K entries.
            for (LongWritable value : values) {
                tree.put(value.get(), value.get());
                if (tree.size() > K)
                    tree.remove(tree.firstKey());
            }
        }

        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
            // Output the global top K from largest to smallest.
            for (Long val : tree.descendingKeySet()) {
                context.write(NullWritable.get(), new LongWritable(val));
            }
        }
    }
}
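
To try the job end to end, a small generator like the one below can produce the 10 million input numbers. This helper is not part of the original post: the class name, output file name, and the upload step are assumptions.

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.util.Random;

// Hypothetical helper: writes 10 million random non-negative longs, one per
// line, to a local file. Upload the file to the job's input directory
// afterwards, e.g. with: hdfs dfs -put topk_input.txt /xxx/topk_input
public class GenerateTopKInput {
    public static void main(String[] args) throws Exception {
        Random random = new Random();
        try (BufferedWriter writer = new BufferedWriter(new FileWriter("topk_input.txt"))) {
            for (int i = 0; i < 10_000_000; i++) {
                writer.write(Long.toString(random.nextLong() >>> 1)); // >>> 1 keeps it non-negative
                writer.newLine();
            }
        }
    }
}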
