2017 Class Test Paper - Data Cleaning

Shijiazhuang Railway Institute, Fall 2019

Class of 2017 Test Paper - Data Cleaning

Course title: Big Data Technology    Instructor: Wang    Test time: 100 minutes

 

Result file data description:

Each record contains the following fields:

ip: 106.39.41.166 (the city is derived from the IP)
date: 10/Nov/2016:00:01:02 +0800 (date and time of access)
day: 10 (day of month)
traffic: 54 (traffic)
type: video (type: video or article)
id: 8701 (id of the video or article)

Testing requirements:

1. Data cleaning: clean the data according to the rules below and import the cleaned data into the Hive data warehouse.

Two-stage data cleaning:

(1) First stage: extract the required fields from the raw log

ip: 199.30.25.88  time: 10/Nov/2016:00:01:03 +0800  traffic: 62  article: article/11325  video: video/3235

(2) Second stage: refine the extracted fields

ip ---> city (city looked up from the IP)
date ---> time: 2016-11-10 00:01:03
day: 10
traffic: 62
type: article/video
id: 11325
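As a sketch of the second-stage refinement, the snippet below reformats one raw record. It is a hypothetical standalone helper, not part of the graded code: the class name `LogLineCleaner`, the comma-separated field order, and the fixed +0800 timezone are assumptions based on the sample record above, and the ip-to-city lookup (which needs an external IP database) is omitted.

```java
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper: refines one raw record "ip,date,day,traffic,type,id"
// by rewriting the date into the target "yyyy-MM-dd HH:mm:ss" format.
public class LogLineCleaner {
    // Raw timestamp format in the log, e.g. "10/Nov/2016:00:01:02 +0800"
    private static final SimpleDateFormat RAW =
            new SimpleDateFormat("dd/MMM/yyyy:HH:mm:ss Z", Locale.ENGLISH);
    // Target format required by the test paper, e.g. "2016-11-10 00:01:02"
    private static final SimpleDateFormat CLEAN =
            new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
    static {
        // Format in the log's own +0800 zone so output is deterministic
        CLEAN.setTimeZone(TimeZone.getTimeZone("GMT+8"));
    }

    public static String clean(String line) {
        try {
            String[] f = line.split(",");
            String time = CLEAN.format(RAW.parse(f[1].trim()));
            // Cleaned field order: ip, time, day, traffic, type, id
            return f[0].trim() + "," + time + "," + f[2].trim() + ","
                    + f[3].trim() + "," + f[4].trim() + "," + f[5].trim();
        } catch (java.text.ParseException e) {
            throw new IllegalArgumentException("Unparseable record: " + line, e);
        }
    }

    public static void main(String[] args) {
        String raw = "106.39.41.166,10/Nov/2016:00:01:02 +0800,10,54,video,8701";
        System.out.println(clean(raw));
        // prints: 106.39.41.166,2016-11-10 00:01:02,10,54,video,8701
    }
}
```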

(3) Hive database table structure:

create table data(ip string, time string, day string, traffic bigint, type string, id string)
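One note on loading: the cleaned output produced by the MapReduce job later in this post is tab-separated, while the table above uses Hive's default field delimiter, so in practice the DDL needs a ROW FORMAT clause. A hedged sketch (the HDFS path is the output directory used by the cleaning job and may differ per setup):

```sql
-- Delimiter must match the tab-separated cleaned output
create table data(ip string, time string, day string, traffic bigint,
                  type string, id string)
row format delimited fields terminated by '\t';

-- Assumed path: the cleaning job's HDFS output directory
load data inpath 'hdfs://192.168.1.100:9000/mymapreduce1/out_result' into table data;
```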

2. Data processing:

· Top 10 most popular videos/articles by visits in each region (video/article)

· Top 10 most popular courses by city (ip)

· Top 10 most popular courses by traffic (traffic)
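The Top-10 selection behind these statistics can be illustrated in plain Java. This is a hypothetical standalone sketch, not part of the test paper's MapReduce code; the class and method names are illustrative.

```java
import java.util.*;

// Hypothetical sketch: pick the Top-10 entries by count, in descending
// order -- the same idea the MapReduce sort job implements with an
// inverted IntWritable comparator.
public class Top10 {
    public static List<Map.Entry<String, Integer>> top10(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // Sort descending by count (the analogue of returning -a.compareTo(b))
        entries.sort((a, b) -> b.getValue().compareTo(a.getValue()));
        return entries.subList(0, Math.min(10, entries.size()));
    }

    public static void main(String[] args) {
        Map<String, Integer> visits = new HashMap<>();
        visits.put("video/3235", 62);
        visits.put("article/11325", 54);
        visits.put("video/8701", 90);
        System.out.println(top10(visits).get(0).getKey());
        // prints: video/8701
    }
}
```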

3. Data visualization: export the statistical results to a MySQL database and present them through graphical displays.

******************************************************************************

Description:

Operating environment: MyEclipse (outside Linux)

My understanding may contain errors; the code was adapted from the earlier exercise 11, but it does achieve a simple cleaning. Thanks to classmate Han for configuring the environment.

Data cleaning:

 

package mapreduce1;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class Result_1 {

    public static class Map extends Mapper<Object, Text, Text, Text> {
        private static final Text name = new Text();
        private static final Text num = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            // Input record: ip,date,day,traffic,type,id
            String[] arr = value.toString().split(",");
            name.set(arr[0]);          // key: ip
            num.set(arr[3].trim());    // value: traffic
            context.write(name, num);
        }
    }

    public static class Reduce extends Reducer<Text, Text, Text, Text> {
        @Override
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            // Count visits per ip and keep the last traffic value seen.
            // (The original used a static field shared across all keys,
            // which produced a running total rather than a per-key count.)
            int count = 0;
            String traffic = "";
            for (Text val : values) {
                traffic = val.toString();
                count++;
            }
            // Output record: ip \t traffic \t count
            context.write(key, new Text(traffic + "\t" + count));
        }
    }

    public static int run() throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://192.168.1.100:9000");
        FileSystem fs = FileSystem.get(conf);
        Job job = Job.getInstance(conf, "Result_1");
        job.setJarByClass(Result_1.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        Path in = new Path("hdfs://192.168.1.100:9000/mymapreduce1/in/result.txt");
        Path out = new Path("hdfs://192.168.1.100:9000/mymapreduce1/out_result");
        FileInputFormat.addInputPath(job, in);
        fs.delete(out, true);    // remove any previous output directory
        FileOutputFormat.setOutputPath(job, out);
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args)
            throws IOException, ClassNotFoundException, InterruptedException {
        run();
    }
}

 

Result: (screenshot omitted)

Simple sorting:

package mapreduce1;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class Result_2 {

    public static List<String> Names = new ArrayList<String>();
    public static List<String> Values = new ArrayList<String>();
    public static List<String> Texts = new ArrayList<String>();

    public static class Sort extends WritableComparator {
        public Sort() {
            // Use the same type here as the map output key.
            super(IntWritable.class, true);
        }

        @Override
        public int compare(WritableComparable a, WritableComparable b) {
            // The minus sign gives descending order; remove it for ascending.
            return -a.compareTo(b);
        }
    }

    public static class Map extends Mapper<Object, Text, IntWritable, Text> {
        private static final Text name = new Text();
        private static final IntWritable num = new IntWritable();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            // Input record (output of Result_1): ip \t traffic \t count
            String[] arr = value.toString().split("\t");
            if (!arr[0].startsWith(" ")) {
                num.set(Integer.parseInt(arr[2]));    // sort key: count
                name.set(arr[0] + "\t" + arr[1]);
                context.write(num, name);
            }
        }
    }

    public static class Reduce extends Reducer<IntWritable, Text, Text, IntWritable> {
        int i = 0;

        @Override
        public void reduce(IntWritable key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            for (Text val : values) {
                // Remember the first 10 records (the Top 10) for printing later.
                if (i < 10) {
                    i++;
                    String[] arr = val.toString().split("\t");
                    Texts.add(arr[1]);
                    Names.add(arr[0]);
                    Values.add(key.toString());
                }
                context.write(val, key);
            }
        }
    }

    public static int run() throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://192.168.1.100:9000");
        FileSystem fs = FileSystem.get(conf);
        Job job = Job.getInstance(conf, "Result_2");
        job.setJarByClass(Result_2.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setSortComparatorClass(Sort.class);
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(Text.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        Path in = new Path("hdfs://192.168.1.100:9000/mymapreduce1/out_result/part-r-00000");
        Path out = new Path("hdfs://192.168.1.100:9000/mymapreduce1/out_result1");
        FileInputFormat.addInputPath(job, in);
        fs.delete(out, true);    // remove any previous output directory
        FileOutputFormat.setOutputPath(job, out);
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args)
            throws IOException, ClassNotFoundException, InterruptedException {
        run();
        for (String n : Names) {
            System.out.println(n);
        }
    }
}

Result: (screenshot omitted)

Origin www.cnblogs.com/daisy99lijing/p/11853896.html