Experiment:
An existing e-commerce website collects user favorites data for its products, recording each user's favorite product id and the date it was favorited in a table called buyer_favorite1.
buyer_favorite1 contains three fields: buyer ID, product ID, and collection date, separated by "\t". Sample data is formatted as follows:
- Buyer id  Product id  Collection date
- 10181 1000481 2010-04-04 16:54:31
- 20001 1001597 2010-04-07 15:07:52
- 20001 1001560 2010-04-07 15:08:27
- 20042 1001368 2010-04-08 08:20:30
- 20067 1002061 2010-04-08 16:45:33
- 20056 1003289 2010-04-12 10:50:55
- 20056 1003290 2010-04-12 11:57:35
- 20056 1003292 2010-04-12 12:05:29
- 20054 1002420 2010-04-14 15:24:12
- 20055 1001679 2010-04-14 19:46:04
- 20054 1010675 2010-04-14 15:23:53
- 20054 1002429 2010-04-14 17:52:45
- 20076 1002427 2010-04-14 19:35:39
- 20054 1003326 2010-04-20 12:54:44
- 20056 1002420 2010-04-15 11:24:49
- 20064 1002422 2010-04-15 11:35:54
- 20056 1003066 2010-04-15 11:43:01
- 20056 1003055 2010-04-15 11:43:06
- 20056 1010183 2010-04-15 11:45:24
- 20056 1002422 2010-04-15 11:45:49
- 20056 1003100 2010-04-15 11:45:54
- 20056 1003094 2010-04-15 11:45:57
- 20056 1003064 2010-04-15 11:46:04
- 20056 1010178 2010-04-15 16:15:20
- 20076 1003101 2010-04-15 16:37:27
- 20076 1003103 2010-04-15 16:37:05
- 20076 1003100 2010-04-15 16:37:18
- 20076 1003066 2010-04-15 16:37:31
- 20054 1003103 2010-04-15 16:40:14
- 20054 1003100 2010-04-15 16:40:16
Requirement: write a MapReduce program that counts the number of products favorited by each buyer.
The expected statistics are as follows:
- Buyer id  Quantity
- 10181 1
- 20001 2
- 20042 1
- 20054 6
- 20055 1
- 20056 12
- 20064 1
- 20067 1
- 20076 5
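As a quick sanity check of the table above, the same per-buyer counts can be reproduced locally in plain Java (no Hadoop required) by splitting each record on "\t" and tallying the first field. The class and method names below are illustrative, not part of the experiment's code:

```java
import java.util.Map;
import java.util.TreeMap;

public class FavoriteCount {
    // Sample records from the experiment: buyerId \t productId \t timestamp.
    static final String[] SAMPLE = {
        "10181\t1000481\t2010-04-04 16:54:31", "20001\t1001597\t2010-04-07 15:07:52",
        "20001\t1001560\t2010-04-07 15:08:27", "20042\t1001368\t2010-04-08 08:20:30",
        "20067\t1002061\t2010-04-08 16:45:33", "20056\t1003289\t2010-04-12 10:50:55",
        "20056\t1003290\t2010-04-12 11:57:35", "20056\t1003292\t2010-04-12 12:05:29",
        "20054\t1002420\t2010-04-14 15:24:12", "20055\t1001679\t2010-04-14 19:46:04",
        "20054\t1010675\t2010-04-14 15:23:53", "20054\t1002429\t2010-04-14 17:52:45",
        "20076\t1002427\t2010-04-14 19:35:39", "20054\t1003326\t2010-04-20 12:54:44",
        "20056\t1002420\t2010-04-15 11:24:49", "20064\t1002422\t2010-04-15 11:35:54",
        "20056\t1003066\t2010-04-15 11:43:01", "20056\t1003055\t2010-04-15 11:43:06",
        "20056\t1010183\t2010-04-15 11:45:24", "20056\t1002422\t2010-04-15 11:45:49",
        "20056\t1003100\t2010-04-15 11:45:54", "20056\t1003094\t2010-04-15 11:45:57",
        "20056\t1003064\t2010-04-15 11:46:04", "20056\t1010178\t2010-04-15 16:15:20",
        "20076\t1003101\t2010-04-15 16:37:27", "20076\t1003103\t2010-04-15 16:37:05",
        "20076\t1003100\t2010-04-15 16:37:18", "20076\t1003066\t2010-04-15 16:37:31",
        "20054\t1003103\t2010-04-15 16:40:14", "20054\t1003100\t2010-04-15 16:40:16"
    };

    // Count favorites per buyer; TreeMap keeps buyer ids sorted, like the job output.
    public static Map<String, Integer> count(String[] rows) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String row : rows) {
            String buyer = row.split("\t")[0];
            counts.merge(buyer, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        count(SAMPLE).forEach((buyer, n) -> System.out.println(buyer + "\t" + n));
    }
}
```

Running this prints one "buyerId\tcount" line per buyer, matching the expected statistics.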
Code:
```java
package mapreduce;

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Type parameters: input key type, input value type, output key type, output value type.
    public static class doMapper extends Mapper<Object, Text, Text, IntWritable> {
        public static final IntWritable one = new IntWritable(1);
        public static Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            // StringTokenizer is a java.util class that splits a string on delimiters;
            // here each record is split on "\t".
            StringTokenizer tokenizer = new StringTokenizer(value.toString(), "\t");
            // nextToken() returns the first field of the record, i.e. the buyer id.
            word.set(tokenizer.nextToken());
            // Emit (buyerId, 1) for each favorite record.
            context.write(word, one);
        }
    }

    // Type parameters mirror the mapper's output: input key type, input value type,
    // output key type, output value type.
    public static class doReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            // Accumulate the 1s emitted for this buyer.
            for (IntWritable value : values) {
                sum += value.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args)
            throws IOException, ClassNotFoundException, InterruptedException {
        Job job = Job.getInstance();
        job.setJobName("WordCount");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(doMapper.class);
        job.setReducerClass(doReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        Path in = new Path("hdfs://localhost:9000/mymapreduce1/in");
        Path out = new Path("hdfs://localhost:9000/mymapreduce1/out");
        FileInputFormat.addInputPath(job, in);
        FileOutputFormat.setOutputPath(job, out);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```
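The mapper's key step is extracting the buyer id with StringTokenizer. The small standalone sketch below (class and method names are hypothetical, for illustration only) shows how a "\t"-delimited record from buyer_favorite1 is tokenized the same way:

```java
import java.util.StringTokenizer;

public class TokenDemo {
    // Extract the buyer id (first tab-separated field), as the mapper does.
    public static String buyerId(String record) {
        StringTokenizer tokenizer = new StringTokenizer(record, "\t");
        return tokenizer.nextToken();
    }

    public static void main(String[] args) {
        String record = "20056\t1003289\t2010-04-12 10:50:55";
        System.out.println(buyerId(record)); // prints 20056
    }
}
```

Because only the first token is used, the product id and timestamp are ignored and every record contributes exactly one (buyerId, 1) pair to the shuffle.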
Screenshot of the final result: