2018-08-29 issue: Using Hive to Count Shuangseqiu (Double Color Ball) Winning Numbers

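Using Shuangseqiu lottery results as sample data, this post uses a Hive external table to count how many times each number has appeared in each column over the past 10 years.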

1. The file shuangseqiu.dat contains all Shuangseqiu winning numbers from the past 10 years, in the following format:

28 27 30 18 03 01 05

31 23 03 12 14 32 10

14 19 02 24 18 07 01

22 27 03 16 11 06 11

06 17 08 23 16 01 05

30 01 29 31 07 26 06

27 10 26 30 04 05 12

24 01 20 12 04 29 15

20 13 04 30 07 29 16

09 29 07 08 18 32 09

08 04 26 21 10 12 09

....................

Each row has 7 columns: the first 6 are red balls and the last one is the blue ball.
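
A minimal Python sketch of how one line in this format splits into its parts. It can be handy for spot-checking a sample of the file locally before handing it to Hive. The function name parse_draw and the range check (red 01 to 33, blue 01 to 16, the ranges that appear in the results at the end of this post) are illustrative assumptions, not part of the original workflow.

```python
def parse_draw(line):
    """Split one space-separated line into (six red balls, one blue ball).

    Values are kept as zero-padded strings, matching the string columns
    of the Hive table.
    """
    # Note: split() accepts any run of whitespace, which is slightly more
    # lenient than Hive's single-space field delimiter.
    fields = line.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}: {line!r}")
    reds, blue = fields[:6], fields[6]
    # Assumed valid ranges: red 01-33, blue 01-16.
    if not all(1 <= int(r) <= 33 for r in reds):
        raise ValueError(f"red ball out of range in: {line!r}")
    if not 1 <= int(blue) <= 16:
        raise ValueError(f"blue ball out of range in: {line!r}")
    return reds, blue


# First sample row from the listing above.
reds, blue = parse_draw("28 27 30 18 03 01 05")
print("red:", reds, "blue:", blue)
```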

2. Create an external table that maps onto this file layout

hive> create external table t_shuangseqiu

   > (red_col1 string,red_col2 string,red_col3 string,red_col4 string,red_col5 string,red_col6 string,blue_col string)

   > row format delimited

   > fields terminated by ' '

   > location '/user/shuangseqiu'

   > ;

OK

Time taken: 2.851 seconds

hive> show tables;

OK

t_shuangseqiu

Time taken: 0.093 seconds, Fetched: 1 row(s)

hive>

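3. Load the Shuangseqiu data into the HDFS directory /user/shuangseqiu. Here I use the file /shuangseqiu/datasrc/shuangseqiu.dat, which was already stored in HDFS, and load it with the Hive load command (without the local keyword, load data inpath moves the file from its source HDFS path into the table's location):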

hive> load data inpath '/shuangseqiu/datasrc/shuangseqiu.dat' into table t_shuangseqiu;

Loading data to table default.t_shuangseqiu

Table default.t_shuangseqiu stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 35110, raw_data_size: 0]

OK

Time taken: 0.402 seconds

hive>

[root@hadoop-server02 ~]# hadoop fs -ls /user/shuangseqiu

Found 1 items

-rw-r--r--   1 root supergroup      35110 2018-06-22 23:29 /user/shuangseqiu/shuangseqiu.dat

4. Use Hive SQL statements to count how many times each number appears in each column

select red_col1,count(red_col1) from t_shuangseqiu group by red_col1 order by red_col1 ;

select red_col2,count(red_col2) from t_shuangseqiu group by red_col2 order by red_col2 ;

select red_col3,count(red_col3) from t_shuangseqiu group by red_col3 order by red_col3 ;

select red_col4,count(red_col4) from t_shuangseqiu group by red_col4 order by red_col4 ;

select red_col5,count(red_col5) from t_shuangseqiu group by red_col5 order by red_col5 ;

select red_col6,count(red_col6) from t_shuangseqiu group by red_col6 order by red_col6 ;

select blue_col,count(blue_col) from t_shuangseqiu group by blue_col order by blue_col;
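
The seven queries all follow the same pattern: group by one column, count the rows in each group, and sort by the number. As a rough local cross-check, the same logic can be sketched in Python with collections.Counter. The helper name count_by_column and the inline sample (the first five rows shown at the top of this post) are assumptions for illustration; the real counts come from running the Hive queries over the full file.

```python
from collections import Counter

# Column names mirror the Hive table definition.
COLUMNS = ["red_col1", "red_col2", "red_col3", "red_col4",
           "red_col5", "red_col6", "blue_col"]


def count_by_column(lines):
    """Return {column name: [(number, count), ...]} sorted by number."""
    counters = {name: Counter() for name in COLUMNS}
    for line in lines:
        fields = line.split()
        if len(fields) != len(COLUMNS):
            continue  # skip blank or malformed lines
        for name, value in zip(COLUMNS, fields):
            counters[name][value] += 1
    # Sorting zero-padded strings gives the same order as
    # ORDER BY on a string column.
    return {name: sorted(c.items()) for name, c in counters.items()}


sample = [
    "28 27 30 18 03 01 05",
    "31 23 03 12 14 32 10",
    "14 19 02 24 18 07 01",
    "22 27 03 16 11 06 11",
    "06 17 08 23 16 01 05",
]

result = count_by_column(sample)
for name in COLUMNS:
    print(name, result[name])
```

On a small sample like this most counts are 1; the point is only to show the group-count-sort shape that each Hive query applies to one column.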

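The results are as follows. Each cell reads number/count: c1 to c6 correspond to red_col1 to red_col6 and c7 to blue_col. Blue balls only run from 01 to 16, so rows 17 to 33 have six entries: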

c1/n1 c2/n2 c3/n3 c4/n4 c5/n5 c6/n6 c7/n7

01/50 01/58 01/42 01/54 01/53 01/58 01/100

02/48 02/42 02/42 02/58 02/40 02/66 02/94

03/49 03/40 03/45 03/46 03/50 03/45 03/89

04/44 04/46 04/58 04/47 04/47 04/35 04/100

05/43 05/50 05/51 05/41 05/49 05/53 05/95

06/59 06/52 06/47 06/50 06/48 06/53 06/106

07/59 07/45 07/53 07/47 07/46 07/47 07/105

08/56 08/53 08/49 08/35 08/53 08/56 08/87

09/47 09/43 09/49 09/49 09/51 09/46 09/105

10/42 10/62 10/36 10/55 10/50 10/45 10/101

11/45 11/48 11/50 11/40 11/53 11/37 11/96

12/42 12/58 12/41 12/61 12/46 12/47 12/113

13/49 13/55 13/49 13/42 13/53 13/50 13/97

14/56 14/52 14/42 14/59 14/48 14/56 14/101

15/46 15/56 15/42 15/38 15/47 15/55 15/99

16/38 16/55 16/47 16/45 16/50 16/46 16/108

17/43 17/37 17/55 17/64 17/60 17/47

18/49 18/51 18/50 18/46 18/57 18/43

19/44 19/52 19/49 19/51 19/47 19/53

20/49 20/47 20/42 20/51 20/55 20/54

21/48 21/46 21/49 21/47 21/35 21/52

22/60 22/52 22/64 22/55 22/49 22/39

23/47 23/42 23/53 23/56 23/40 23/52

24/36 24/50 24/56 24/38 24/49 24/36

25/49 25/56 25/48 25/48 25/42 25/43

26/60 26/43 26/62 26/50 26/42 26/61

27/58 27/38 27/48 27/47 27/44 27/53

28/56 28/53 28/44 28/39 28/46 28/40

29/31 29/44 29/52 29/51 29/61 29/48

30/50 30/44 30/61 30/51 30/39 30/38

31/53 31/45 31/41 31/53 31/36 31/44

32/47 32/42 32/48 32/50 32/55 32/53

33/43 33/39 33/31 33/32 33/55 33/45
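
One quick sanity check on a table like this: every row of the file contributes exactly one value to each column, so the counts in each column should add up to the same total, the number of draws. A small Python sketch using the c1 and c7 counts copied from the table above (the variable names are illustrative):

```python
# Counts copied from the c1 (first red ball) and c7 (blue ball) columns above,
# listed in order of ball number 01, 02, 03, ...
c1_counts = [50, 48, 49, 44, 43, 59, 59, 56, 47, 42, 45,
             42, 49, 56, 46, 38, 43, 49, 44, 49, 48, 60,
             47, 36, 49, 60, 58, 56, 31, 50, 53, 47, 43]
c7_counts = [100, 94, 89, 100, 95, 106, 105, 87,
             105, 101, 96, 113, 97, 101, 99, 108]

total_c1 = sum(c1_counts)
total_c7 = sum(c7_counts)
print("c1 total:", total_c1)
print("c7 total:", total_c7)

# If the totals differ, some rows were probably split into the wrong
# number of fields or a value ended up NULL.
assert total_c1 == total_c7, "column totals differ"
```

Both columns sum to 1596, which is consistent with every row having been parsed into all seven fields. The same check can be repeated for c2 to c6.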



Reposted from blog.51cto.com/2951890/2165749