Understanding the Common Cascading-Sum (Running Total) Technique in Hive

1. Requirement:

Given the following visitor access-count table t_access_times:
Visitor Date Visits
A 2015-01-02 5
A 2015-01-03 15
B 2015-01-01 5
A 2015-01-04 8
B 2015-01-05 25
A 2015-01-06 5
A 2015-02-02 4
A 2015-02-06 6
B 2015-02-06 10
B 2015-02-07 5
…… …… ……

2. Required output report: t_access_times_accumulate

Visitor  Month    Monthly total  Cumulative total
A        2015-01  33             33
A        2015-02  10             43
…….      …….      …….            …….
B        2015-01  30             30
B        2015-02  15             45
…….      …….      …….            …….

3. From the daily table t_access_times, first compute each month's visit count, then use the monthly counts to produce running totals, for example:

January: 30 visits that month, 30 in total
February: 10 visits that month, 40 in total
March: 20 visits that month, 60 in total
...
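The two-stage idea (daily rows to monthly totals, then a per-visitor running total) can be sketched outside Hive in plain Python. This is only an illustration of the logic, not the Hive implementation; `DAILY` and `cascade_sum` are hypothetical names, and the rows are the sample daily data from the table above.

```python
from collections import defaultdict

# Sample daily rows from t_access_times: (visitor, date, visits)
DAILY = [
    ("A", "2015-01-02", 5), ("A", "2015-01-03", 15), ("B", "2015-01-01", 5),
    ("A", "2015-01-04", 8), ("B", "2015-01-05", 25), ("A", "2015-01-06", 5),
    ("A", "2015-02-02", 4), ("A", "2015-02-06", 6), ("B", "2015-02-06", 10),
    ("B", "2015-02-07", 5),
]

def cascade_sum(rows):
    """Return [(visitor, month, monthly_total, cumulative_total)] ordered by visitor, month."""
    monthly = defaultdict(int)
    for visitor, date, visits in rows:
        monthly[(visitor, date[:7])] += visits  # "2015-01-02"[:7] -> "2015-01"
    running = defaultdict(int)  # cumulative total kept separately per visitor
    result = []
    for visitor, month in sorted(monthly):  # zero-padded yyyy-MM sorts chronologically
        running[visitor] += monthly[(visitor, month)]
        result.append((visitor, month, monthly[(visitor, month)], running[visitor]))
    return result

if __name__ == "__main__":
    for row in cascade_sum(DAILY):
        print(*row)
```

Running it prints the same four rows as the required report (A 2015-01 33 33, A 2015-02 10 43, B 2015-01 30 30, B 2015-02 15 45).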

4. Approach:

-- Create the table (the salary column holds the visit count)
create table t_access_times(username string,month string,salary int) row format delimited fields terminated by ',';
-- Load the data
load data local inpath '/home/hadoop/t_access_times.dat' into table t_access_times;

Raw data (comma-separated, with the date already truncated to yyyy-MM):

A,2015-01,5
A,2015-01,15
B,2015-01,5
A,2015-01,8
B,2015-01,25
A,2015-01,5
A,2015-02,4
A,2015-02,6
B,2015-02,10
B,2015-02,5
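To make the `row format delimited fields terminated by ','` clause concrete, here is a rough Python approximation of how each line of the file maps onto the three columns. It assumes exactly three fields per line and only loosely mimics Hive's text SerDe, which reads an unparseable `int` field as NULL (modelled here as `None`). `RAW`, `to_int_or_none` and `parse_rows` are hypothetical names.

```python
# RAW inlines the sample contents of /home/hadoop/t_access_times.dat for illustration
RAW = """\
A,2015-01,5
A,2015-01,15
B,2015-01,5
A,2015-01,8
B,2015-01,25
A,2015-01,5
A,2015-02,4
A,2015-02,6
B,2015-02,10
B,2015-02,5
"""

def to_int_or_none(field):
    """Approximate an int column: text that cannot be parsed becomes NULL (None)."""
    try:
        return int(field)
    except ValueError:
        return None

def parse_rows(text):
    """Split each non-empty line on ',' into (username, month, salary)."""
    rows = []
    for line in text.splitlines():
        if not line:
            continue  # skip blank lines
        username, month, salary = line.split(",")  # assumes exactly three fields
        rows.append((username, month, to_int_or_none(salary)))
    return rows

rows = parse_rows(RAW)
```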

5. Step 1: compute each user's monthly total (sum is Hive's built-in aggregate function).

select username,month,sum(salary) as salary from t_access_times group by username,month;

+-----------+----------+---------+
| username  |  month   | salary  |
+-----------+----------+---------+
| A         | 2015-01  | 33      |
| A         | 2015-02  | 10      |
| B         | 2015-01  | 30      |
| B         | 2015-02  | 15      |
+-----------+----------+---------+
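The Step 1 query can be tried without a Hive cluster by running the same SQL against an in-memory SQLite table from Python's standard library. This is a stand-in, not Hive itself; `ROWS`, `make_conn` and `monthly_totals` are hypothetical names, and the `order by` is added only because neither engine guarantees the output order of a plain `group by`.

```python
import sqlite3

# Raw rows from t_access_times.dat: (username, month, salary = visit count)
ROWS = [
    ("A", "2015-01", 5), ("A", "2015-01", 15), ("B", "2015-01", 5),
    ("A", "2015-01", 8), ("B", "2015-01", 25), ("A", "2015-01", 5),
    ("A", "2015-02", 4), ("A", "2015-02", 6), ("B", "2015-02", 10),
    ("B", "2015-02", 5),
]

def make_conn(rows):
    """Create an in-memory SQLite table standing in for the Hive table t_access_times."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t_access_times(username text, month text, salary integer)")
    conn.executemany("insert into t_access_times values (?, ?, ?)", rows)
    return conn

def monthly_totals(conn):
    # Same GROUP BY as Step 1; ORDER BY only makes the result deterministic
    return conn.execute(
        "select username, month, sum(salary) as salary "
        "from t_access_times group by username, month "
        "order by username, month"
    ).fetchall()

if __name__ == "__main__":
    print(monthly_totals(make_conn(ROWS)))
```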

Step 2: join the monthly-total table with itself (a self-join) on username.

select A.*, B.* from
(select username,month,sum(salary) as salary from t_access_times group by username,month) A
inner join
(select username,month,sum(salary) as salary from t_access_times group by username,month) B
on A.username=B.username;

±------------±---------±----------±------------±---------±----------±-+
| a.username | a.month | a.salary | b.username | b.month | b.salary |
±------------±---------±----------±------------±---------±----------±-+
| A | 2015-01 | 33 | A | 2015-01 | 33 |
| A | 2015-01 | 33 | A | 2015-02 | 10 |
| A | 2015-02 | 10 | A | 2015-01 | 33 |
| A | 2015-02 | 10 | A | 2015-02 | 10 |
| B | 2015-01 | 30 | B | 2015-01 | 30 |
| B | 2015-01 | 30 | B | 2015-02 | 15 |
| B | 2015-02 | 15 | B | 2015-01 | 30 |
| B | 2015-02 | 15 | B | 2015-02 | 15 |
±------------±---------±----------±------------±---------±----------±-+

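Step 3: run a grouped query over the result of the previous step, grouping by a.username and a.month.

To get the cumulative value for each month, sum every b.salary whose b.month <= a.month. Comparing the month strings works because they are zero-padded yyyy-MM values, so string order matches chronological order.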

select A.username,A.month,max(A.salary) as salary,sum(B.salary) as accumulate -- A.salary is constant per group; max() just picks it
from 
(select username,month,sum(salary) as salary from t_access_times group by username,month) A 
inner join 
(select username,month,sum(salary) as salary from t_access_times group by username,month) B
on
A.username=B.username
where B.month <= A.month
group by A.username,A.month -- group, then sum B.salary within each group
order by A.username,A.month; -- make the overall output ordered
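The complete Step 3 query can be checked end to end with SQLite standing in for Hive. The sketch keeps the query's structure (self-join on username, `where B.month <= A.month`, `group by`, `max(A.salary)`, `sum(B.salary)`); `ROWS`, `MONTHLY`, `make_conn` and `accumulate` are hypothetical names.

```python
import sqlite3

ROWS = [
    ("A", "2015-01", 5), ("A", "2015-01", 15), ("B", "2015-01", 5),
    ("A", "2015-01", 8), ("B", "2015-01", 25), ("A", "2015-01", 5),
    ("A", "2015-02", 4), ("A", "2015-02", 6), ("B", "2015-02", 10),
    ("B", "2015-02", 5),
]

MONTHLY = ("(select username, month, sum(salary) as salary "
           "from t_access_times group by username, month)")

def make_conn(rows):
    """In-memory SQLite table standing in for t_access_times."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t_access_times(username text, month text, salary integer)")
    conn.executemany("insert into t_access_times values (?, ?, ?)", rows)
    return conn

def accumulate(conn):
    # Keep only B-months up to and including A.month, then sum them per (user, A.month)
    sql = ("select A.username, A.month, max(A.salary) as salary, "
           "sum(B.salary) as accumulate "
           f"from {MONTHLY} A inner join {MONTHLY} B on A.username = B.username "
           "where B.month <= A.month "
           "group by A.username, A.month "
           "order by A.username, A.month")
    return conn.execute(sql).fetchall()

if __name__ == "__main__":
    for row in accumulate(make_conn(ROWS)):
        print(*row)
```

The self-join emits one row per pair of months for the same user, so its size grows quadratically with the number of months per user. As an alternative not used in the original approach, Hive 0.11 and later support windowing, where `sum(salary) over (partition by username order by month rows between unbounded preceding and current row)` applied to the monthly totals yields the same running total without a self-join.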


Reposted from blog.csdn.net/qq_35688140/article/details/84634679