I love Beijing
I love China
Beijing is the capital of China
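The word count is computed as follows:
Note that in the figure above, the leftmost column is the offset of each line, which is the key passed to the mapper. The first line is shown with offset 1; in "I love China", "I" is the 4th word of the input, so that line is shown with offset 4. (With Hadoop's default TextInputFormat, the LongWritable key is actually the byte offset of the line's start within the file, beginning at 0.)
Each line is then split into words, and the mapper emits every word as K2 with the count 1 as V2.
K3 is the key that groups identical words, and V3 is the collection of their counts, for example (1, 1) for the word "I"; summing V3 gives the frequency of that word.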
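To make the key/value stages concrete, here is a minimal plain-Java sketch that imitates the flow without Hadoop. It is only an illustration: the class name WordCountSketch and its helper methods are hypothetical, the three sample sentences are used with their spelling normalized, and the offsets assume UTF-8 text with "\n" line endings.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class: simulates map -> shuffle -> reduce in memory.
class WordCountSketch {

    // Map input key: byte offset of each line start (assumes '\n' endings, UTF-8).
    static List<Long> lineOffsets(List<String> lines) {
        List<Long> offsets = new ArrayList<>();
        long offset = 0;
        for (String line : lines) {
            offsets.add(offset);
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1; // +1 for '\n'
        }
        return offsets;
    }

    // Map + shuffle: emit (word, 1) for every word, then group the 1s by word.
    // The result corresponds to K3 -> V3 in the text.
    static Map<String, List<Long>> mapAndShuffle(List<String> lines) {
        Map<String, List<Long>> grouped = new TreeMap<>(); // sorted by key, like the shuffle
        for (String line : lines) {
            for (String word : line.split(" ")) {
                grouped.computeIfAbsent(word, k -> new ArrayList<>()).add(1L);
            }
        }
        return grouped;
    }

    // Reduce: sum the values in V3 to get the frequency of each word.
    static Map<String, Long> reduce(Map<String, List<Long>> grouped) {
        Map<String, Long> totals = new TreeMap<>();
        for (Map.Entry<String, List<Long>> entry : grouped.entrySet()) {
            long total = 0;
            for (long one : entry.getValue()) {
                total += one;
            }
            totals.put(entry.getKey(), total);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "I love Beijing",
                "I love China",
                "Beijing is the capital of China");
        System.out.println("offsets: " + lineOffsets(lines));
        Map<String, List<Long>> grouped = mapAndShuffle(lines);
        System.out.println("K3 -> V3: " + grouped);
        System.out.println("totals: " + reduce(grouped));
    }
}
```

With these three lines the byte offsets come out as 0, 15 and 28, the word "I" is grouped as (1, 1), and its total is 2. A real job differs in that Hadoop performs the grouping across machines between the map and reduce phases.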
The mapper:
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        /*
         * key:     input key (the offset of the line)
         * value:   one line of input, e.g. "I love Beijing"
         * context: the map context, used to emit output
         */
        String data = value.toString();

        // Split the line into words
        String[] words = data.split(" ");

        // Emit (word, 1) for each word
        for (String w : words) {
            context.write(new Text(w), new LongWritable(1));
        }
    }
}
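The reducer:
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, LongWritable, Text, LongWritable> {

    @Override
    protected void reduce(Text k3, Iterable<LongWritable> v3, Context context)
            throws IOException, InterruptedException {
        // v3 is a collection; each element is a v2 emitted by the mapper for this word
        long total = 0;
        for (LongWritable l : v3) {
            total = total + l.get();
        }

        // Emit (word, total count)
        context.write(k3, new LongWritable(total));
    }
}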
Main program:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountMain {

    public static void main(String[] args) throws Exception {
        // Create a job (job = map + reduce)
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);

        // Specify the entry point of the job
        job.setJarByClass(WordCountMain.class);

        // Specify the mapper and its output types
        job.setMapperClass(WordCountMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);

        // Specify the reducer and its output types
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);

        // Specify the input and output paths of the job
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Submit the job and wait for it to finish
        job.waitForCompletion(true);
    }
}