Hive Storage and MapReduce Processing: Data Cleaning

Date: 2019.11.13

Blog period: 115

Wednesday

  

Result file data description:

  Ip: 106.39.41.166 (resolved to a city)

  Date: 10/Nov/2016:00:01:02 +0800 (date)

  Day: 10 (day of the month)

  Traffic: 54 (traffic)

  Type: video (type: video or article)

  Id: 8701 (id of the video or article)

Test requirements:

1. Data cleaning: clean the data as specified below and import the cleaned data into a Hive database.

Data cleaning is done in two stages:

(1) First stage: extract the required information from the original log

  ip:    199.30.25.88

  time:  10/Nov/2016:00:01:03 +0800

  traffic:  62

  article: article/11325

  video: video/3235

(2) Second stage: refine the information extracted in the first stage

  ip ---> city (resolved from the IP)

  date--> time:2016-11-10 00:01:03

  day: 10

  traffic:62

  type:article/video

  id:11325

(3) Hive table structure:

  create table data(ip string, time string, day string, traffic bigint, type string, id string)
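The second-stage derivations of day, type and id can be sketched in isolation. This is a minimal illustration, not the author's code: the class name SecondStageSketch and its two helper methods are hypothetical, and it assumes the first stage delivers the time as "10/Nov/2016:00:01:03 +0800" and the resource as "article/11325" or "video/3235", as listed above.

```java
public class SecondStageSketch {
    /** Returns the day of month, i.e. everything before the first '/' of the raw log time. */
    public static String dayOf(String rawTime) {
        int slash = rawTime.indexOf('/');
        if (slash < 0) {
            throw new IllegalArgumentException("unexpected time format: " + rawTime);
        }
        return rawTime.substring(0, slash);
    }

    /** Splits a resource such as "article/11325" into {type, id}, tolerating spaces around '/'. */
    public static String[] typeAndId(String resource) {
        String[] parts = resource.trim().split("\\s*/\\s*", 2);
        if (parts.length != 2) {
            throw new IllegalArgumentException("unexpected resource format: " + resource);
        }
        return parts;
    }

    public static void main(String[] args) {
        String[] r = typeAndId("article/11325");
        // Prints the derived day, type and id as comma-separated fields.
        System.out.println(dayOf("10/Nov/2016:00:01:03 +0800") + "," + r[0] + "," + r[1]);
    }
}
```

The remaining two conversions, IP to city and the time format, are handled by the format() method of the Bean class.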

2. Data processing:

  · Top 10 most popular videos/articles by number of visits (video/article)

  · Top 10 most popular courses by city (ip)

  · Top 10 most popular courses by traffic (traffic)

3. Data visualization: import the statistical results into a MySQL database and present them graphically.
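To make the Top 10 statistics in requirement 2 concrete, here is a small in-memory sketch of "count per key, then keep the n largest". It is only an illustration of the statistic itself: in the actual task the counting would be done in Hive or MapReduce, and the class name TopTenSketch and the "type/id" key format are assumptions made for the example.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class TopTenSketch {
    /** Counts how often each key occurs and returns the n most frequent, ties broken by key. */
    public static List<Map.Entry<String, Long>> topN(List<String> keys, int n) {
        // Group identical keys and count the rows in each group.
        Map<String, Long> counts = keys.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        // Sort by count descending, then by key, and keep the first n entries.
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> visits = List.of("video/8701", "article/11325", "video/8701");
        topN(visits, 10).forEach(e -> System.out.println(e.getKey() + "\t" + e.getValue()));
    }
}
```

Grouping by the city field instead of "type/id" gives the per-city ranking, and replacing the count with a sum over the traffic column gives the ranking by traffic.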

 

 

  Implementation:

   A. Bean class (basic data record)

package com.hive.basic;

import com.hive.format.IPUtil;
import com.hive.format.TimeUtil;

/** One cleaned log record; the fields match the columns of the Hive table. */
public class Bean {
    protected String ip;
    protected String time;
    protected String day;
    protected int traffic;
    protected String type;
    protected String id;

    public String getIp() {
        return ip;
    }
    public void setIp(String ip) {
        this.ip = ip;
    }
    public String getTime() {
        return time;
    }
    public String getDay() {
        return day;
    }
    public void setDay(String day) {
        this.day = day;
    }
    public void setTime(String time) {
        this.time = time;
    }
    public int getTraffic() {
        return traffic;
    }
    public void setTraffic(int traffic) {
        this.traffic = traffic;
    }
    public String getType() {
        return type;
    }
    public void setType(String type) {
        this.type = type;
    }
    public String getId() {
        return id;
    }
    public void setId(String id) {
        this.id = id;
    }

    public Bean(String ip, String time, String day, int traffic, String type, String id) {
        super();
        this.ip = ip;
        this.time = time;
        this.day = day;
        this.traffic = traffic;
        this.type = type;
        this.id = id;
    }

    public Bean() {
        super();
    }

    /* Format conversion: IP -> city name, raw log time -> yyyy-MM-dd HH:mm:ss */
    public void format() {
        // Look up this record's own IP; field 3 of the '|'-separated result is taken
        // as the city, and the trailing "市" (city) suffix is stripped.
        this.ip = IPUtil.getCityInfo(this.ip).split("\\|")[3].replace("市", "");
        this.time = TimeUtil.deal(this.time);
    }

    /* Prints the record as one comma-separated line. */
    public void display() {
        System.out.println(ip + "," + time + "," + day + "," + traffic + "," + type + "," + id);
    }
}
Bean.java
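The split step inside format() can be looked at on its own. The sketch below assumes that IPUtil.getCityInfo returns a '|'-separated string whose fourth field is the city, possibly ending in "市" (city); IPUtil is not shown in this post, so the sample string and the class name CityFieldSketch are hypothetical.

```java
public class CityFieldSketch {
    /** Picks field 3 out of a '|'-separated region string and drops a trailing "市". */
    public static String cityOf(String regionInfo) {
        // String.split takes a regular expression, so the '|' must be escaped.
        String[] fields = regionInfo.split("\\|");
        if (fields.length < 4) {
            throw new IllegalArgumentException("expected at least 4 fields: " + regionInfo);
        }
        String city = fields[3];
        return city.endsWith("市") ? city.substring(0, city.length() - 1) : city;
    }

    public static void main(String[] args) {
        // Hypothetical lookup result, used only to exercise the parsing.
        System.out.println(cityOf("中国|0|北京|北京市|电信"));
    }
}
```

Without the escape, split("|") would treat the pattern as an alternation of two empty expressions and break the string into single characters.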

   B. Date format conversion class

package com.hive.format;

import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

public class TimeUtil {
    /**
     * Converts a log timestamp such as "10/Nov/2016:00:01:02 +0800" to "yyyy-MM-dd HH:mm:ss".
     * The result is rendered in the JVM's default time zone.
     * Returns null if the input cannot be parsed.
     */
    public static String deal(String time) {
        SimpleDateFormat sdf = new SimpleDateFormat("dd/MMM/yyyy:HH:mm:ss Z", Locale.ENGLISH);
        try {
            Date dd = sdf.parse(time); // parse the string into a Date
            return new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format(dd);
        } catch (ParseException e) {
            e.printStackTrace();
            return null; // avoids formatting a null Date
        }
    }

    public static void main(String[] args) throws ParseException {
        String dateString = "10/Nov/2016:00:01:02 +0800";
        SimpleDateFormat sdf = new SimpleDateFormat("dd/MMM/yyyy:HH:mm:ss Z", Locale.ENGLISH);
        Date dd = sdf.parse(dateString); // parse the string into a Date
        String resDate = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format(dd);
        System.out.println(resDate);
    }
}
TimeUtil.java
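One caveat with SimpleDateFormat is that the output side formats in the JVM's default time zone, so the result only matches the log's wall-clock time when the machine runs at +0800. As a possible alternative, not part of the original code, the same conversion can be sketched with java.time, which keeps the local time carried by the offset in the log. The class name TimeSketch is hypothetical.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class TimeSketch {
    // Input pattern of the raw log; Locale.ENGLISH is needed to parse "Nov".
    private static final DateTimeFormatter IN =
            DateTimeFormatter.ofPattern("dd/MMM/yyyy:HH:mm:ss Z", Locale.ENGLISH);
    // Output pattern expected by the cleaned data.
    private static final DateTimeFormatter OUT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    /** Converts the log time to "yyyy-MM-dd HH:mm:ss", keeping the local time of the logged offset. */
    public static String deal(String time) {
        return OffsetDateTime.parse(time, IN).format(OUT);
    }

    public static void main(String[] args) {
        System.out.println(deal("10/Nov/2016:00:01:02 +0800")); // prints 2016-11-10 00:01:02
    }
}
```

Unlike SimpleDateFormat, DateTimeFormatter instances are immutable and thread-safe, so they can be shared as constants. A malformed timestamp raises DateTimeParseException here instead of being swallowed.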

 

 

 

 


Origin www.cnblogs.com/onepersonwholive/p/11852910.html