WordCount experiment

Experiment:

An existing e-commerce website collects user data on products, recording each user's favorited product id and the date it was favorited, in a file named buyer_favorite1.

buyer_favorite1 comprises three fields: buyer id, product id, and collection date, separated by "\t". The sample data format is as follows (a short parsing sketch appears after the sample):

  1. buyer id    product id    collection date  
  2. 10181   1000481   2010-04-04 16:54:31  
  3. 20001   1001597   2010-04-07 15:07:52  
  4. 20001   1001560   2010-04-07 15:08:27  
  5. 20042   1001368   2010-04-08 08:20:30  
  6. 20067   1002061   2010-04-08 16:45:33  
  7. 20056   1003289   2010-04-12 10:50:55  
  8. 20056   1003290   2010-04-12 11:57:35  
  9. 20056   1003292   2010-04-12 12:05:29  
  10. 20054   1002420   2010-04-14 15:24:12  
  11. 20055   1001679   2010-04-14 19:46:04  
  12. 20054   1010675   2010-04-14 15:23:53  
  13. 20054   1002429   2010-04-14 17:52:45  
  14. 20076   1002427   2010-04-14 19:35:39  
  15. 20054   1003326   2010-04-20 12:54:44  
  16. 20056   1002420   2010-04-15 11:24:49  
  17. 20064   1002422   2010-04-15 11:35:54  
  18. 20056   1003066   2010-04-15 11:43:01  
  19. 20056   1003055   2010-04-15 11:43:06  
  20. 20056   1010183   2010-04-15 11:45:24  
  21. 20056   1002422   2010-04-15 11:45:49  
  22. 20056   1003100   2010-04-15 11:45:54  
  23. 20056   1003094   2010-04-15 11:45:57  
  24. 20056   1003064   2010-04-15 11:46:04  
  25. 20056   1010178   2010-04-15 16:15:20  
  26. 20076   1003101   2010-04-15 16:37:27  
  27. 20076   1003103   2010-04-15 16:37:05  
  28. 20076   1003100   2010-04-15 16:37:18  
  29. 20076   1003066   2010-04-15 16:37:31  
  30. 20054   1003103   2010-04-15 16:40:14  
  31. 20054   1003100   2010-04-15 16:40:16  
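
For illustration only, here is a minimal sketch of how one such record splits on the tab separator (the ParseRecord class is a hypothetical helper, not part of the experiment's code):

public class ParseRecord {
    public static void main(String[] args) {
        // One sample record from buyer_favorite1, fields separated by tabs.
        String line = "10181\t1000481\t2010-04-04 16:54:31";
        String[] fields = line.split("\t");
        System.out.println("buyer id:        " + fields[0]);
        System.out.println("product id:      " + fields[1]);
        System.out.println("collection date: " + fields[2]);
    }
}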

Requirement: write a MapReduce program that counts the number of products favorited by each buyer.

The resulting statistics are as follows (a plain-Java sketch of the same counting logic follows the table):

  1. buyer id   quantity  
  2. 10181   1  
  3. 20001   2  
  4. 20042   1  
  5. 20054   6  
  6. 20055   1  
  7. 20056   12  
  8. 20064   1  
  9. 20067   1  
  10. 20076   5 
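
Before turning to MapReduce, the counting logic itself can be sanity-checked locally. A minimal plain-Java sketch (the LocalCount class and its in-memory sample lines are illustrative assumptions, not part of the experiment): group records by the first field and sum:

import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Local (non-MapReduce) sketch of the per-buyer count.
public class LocalCount {
    public static void main(String[] args) {
        // A small in-memory sample of buyer_favorite1 records.
        List<String> lines = Arrays.asList(
                "10181\t1000481\t2010-04-04 16:54:31",
                "20001\t1001597\t2010-04-07 15:07:52",
                "20001\t1001560\t2010-04-07 15:08:27");
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            // The first tab-separated field is the buyer id; add 1 per record.
            counts.merge(line.split("\t")[0], 1, Integer::sum);
        }
        counts.forEach((buyer, n) -> System.out.println(buyer + "\t" + n));
    }
}

A TreeMap keeps the buyer ids sorted, matching the ordering of the statistics table above; the MapReduce job below achieves the same grouping via the shuffle phase.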

Code:

package mapreduce;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    public static class doMapper extends Mapper<Object, Text, Text, IntWritable> {
        // Type parameters, in order: input key type (Object), input value type (Text),
        // output key type (Text), output value type (IntWritable).
        public static final IntWritable one = new IntWritable(1);
        public static Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            // StringTokenizer is a JDK utility class, used here to split the line on tabs.
            StringTokenizer tokenizer = new StringTokenizer(value.toString(), "\t");
            // nextToken() returns the substring up to the next separator,
            // i.e. the first field of the record: the buyer id.
            word.set(tokenizer.nextToken());
            // Emit the buyer id as the key, with a count of one.
            context.write(word, one);
        }
    }

    public static class doReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        // Type parameters in the same order as the mapper's: input key type,
        // input value type, output key type, output value type.
        private IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            // Loop over the grouped values and accumulate the total for this buyer.
            for (IntWritable value : values) {
                sum += value.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Job job = Job.getInstance();
        job.setJobName("WordCount");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(doMapper.class);
        job.setReducerClass(doReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        Path in = new Path("hdfs://localhost:9000/mymapreduce1/in");
        Path out = new Path("hdfs://localhost:9000/mymapreduce1/out");
        FileInputFormat.addInputPath(job, in);
        FileOutputFormat.setOutputPath(job, out);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
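
To run the job, the class is typically compiled and packaged into a jar, then submitted through the hadoop launcher. A sketch of the shell steps (the jar name wordcount.jar is an assumption; the HDFS paths match those hard-coded in main):

hdfs dfs -mkdir -p /mymapreduce1/in
hdfs dfs -put buyer_favorite1 /mymapreduce1/in
hadoop jar wordcount.jar mapreduce.WordCount
hdfs dfs -cat /mymapreduce1/out/part-r-00000

Note that the output directory /mymapreduce1/out must not already exist when the job is submitted; FileOutputFormat fails the job if it does.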

The final result screenshot: [image not preserved]
