Hive: Cascading Cumulative Report Query

Original link: http://www.cnblogs.com/drl-blogs/p/11086870.html

The sample data below (one record per line: username, date, count) is the content of access_times.txt:
A,2015-01-08,5
A,2015-01-11,15
B,2015-01-12,5
A,2015-01-12,8
B,2015-01-13,25
A,2015-01-13,5
C,2015-01-09,10
C,2015-01-11,20
A,2015-02-10,4
A,2015-02-11,6
C,2015-01-12,30
C,2015-02-13,10
B,2015-02-10,10
B,2015-02-11,5
A,2015-03-20,14
A,2015-03-21,6
B,2015-03-11,20
B,2015-03-12,25
C,2015-03-10,10
C,2015-03-11,20

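The goal is a report with one row per user and month, showing the monthly total and the cumulative total up to that month:

[Figure: expected result]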

First, create the table and load the data:

create table 
	t_access_times(username string,month string,counts int)
row format delimited fields terminated by ',';
load data local inpath '/root/access_times.txt' into table t_access_times;

Next, we can first compute each user's total for each month:

[Figure: monthly totals per user]
The code is as follows:

select 
	username,tmp.mo, sum(counts) sums  
from
(
	select username,
		concat(split(month,'-')[0] ,'-', split(month,'-')[1]) mo,
		counts 
	from t_access_times
) tmp
group by username,tmp.mo;
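
As a quick cross-check of the monthly aggregation above, here is a minimal sketch that runs the same grouping with Python's built-in `sqlite3` module instead of Hive. Several things here are assumptions made for illustration only: SQLite stands in for Hive, `substr(month, 1, 7)` replaces the `concat(split(...))` expression, and the helper name `monthly_totals` is hypothetical.

```python
import sqlite3

# Sample records copied from the data listing: username,date,counts
ROWS = """
A,2015-01-08,5
A,2015-01-11,15
B,2015-01-12,5
A,2015-01-12,8
B,2015-01-13,25
A,2015-01-13,5
C,2015-01-09,10
C,2015-01-11,20
A,2015-02-10,4
A,2015-02-11,6
C,2015-01-12,30
C,2015-02-13,10
B,2015-02-10,10
B,2015-02-11,5
A,2015-03-20,14
A,2015-03-21,6
B,2015-03-11,20
B,2015-03-12,25
C,2015-03-10,10
C,2015-03-11,20
""".split()


def monthly_totals():
    """Return (username, month, total) rows, one per user and month."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table t_access_times(username text, month text, counts int)"
    )
    records = []
    for line in ROWS:
        username, date, counts = line.split(",")
        records.append((username, date, int(counts)))
    conn.executemany("insert into t_access_times values (?, ?, ?)", records)
    # Assumption: substr(month, 1, 7) plays the role of Hive's
    # concat(split(month,'-')[0], '-', split(month,'-')[1]).
    result = conn.execute(
        """
        select username, substr(month, 1, 7) as mo, sum(counts) as sums
        from t_access_times
        group by username, substr(month, 1, 7)
        order by username, mo
        """
    ).fetchall()
    conn.close()
    return result


for row in monthly_totals():
    print(row)
```

With the sample data this should print nine rows: A has 33, 10, 20 for January through March, B has 30, 15, 45, and C has 60, 10, 30.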

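Then join the monthly-total result to itself (both sides, a and b, are the query above) on username, and group by a's username and month. With no further condition, this gives the following result:
[Figure: result of the join without a month condition]
This is clearly wrong. Table b is there to supply the cumulative total, but here every row of a matches all of that user's rows in b, so the sum is the user's grand total in every month. To fix this, add a condition that b's month must be less than or equal to a's month, so that the sum over b only covers the months up to and including the current one. Finally, add an ORDER BY so the output rows do not come out in a jumbled order.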

select 
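	-- Column aliases: 姓名 = user name, 月份 = month, 月总额 = monthly total, 累计到当月的总额 = cumulative total up to the current month
	a.username as `姓名`,
	a.mo as `月份`,
	-- a.sums is not in the GROUP BY, so it cannot be selected directly. Each group holds several rows, but they all carry the same a.sums, so max() or min() returns the same value.
	max(a.sums) as `月总额`,
	sum(b.sums) as `累计到当月的总额` 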
from
(
	select 
		username,
		tmp.mo,
		sum(counts) sums  
	from
	(
		select 
			username,
			concat(split(month,'-')[0] ,'-', split(month,'-')[1]) mo,
			counts 
		from 
			t_access_times
	) tmp
	group by 
		username,
		tmp.mo
)a
join
(
	select 
		username,
		tmp.mo, 
		sum(counts) sums  
	from
	(
		select 
			username,
			concat(split(month,'-')[0] ,'-', split(month,'-')[1]) mo,
			counts 
		from 
			t_access_times
	) tmp
	group by 
		username,
		tmp.mo
)b
on a.username=b.username 
where b.mo<=a.mo
group by a.username,a.mo
order by a.username,a.mo;

The result is shown in the figure below:
[Figure: final result]
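
To make the effect of the `b.mo<=a.mo` condition concrete, the following minimal sketch reproduces the self-join with Python's built-in `sqlite3` module. It is an illustration under stated assumptions, not the Hive job itself: SQLite stands in for Hive, the `monthly` table is pre-filled with the monthly totals computed from the sample data, and the names `monthly`, `running_totals`, and `with_month_filter` are hypothetical.

```python
import sqlite3

# Monthly totals per user, computed from the sample data
# (this is the output of the first aggregation step).
MONTHLY = [
    ("A", "2015-01", 33), ("A", "2015-02", 10), ("A", "2015-03", 20),
    ("B", "2015-01", 30), ("B", "2015-02", 15), ("B", "2015-03", 45),
    ("C", "2015-01", 60), ("C", "2015-02", 10), ("C", "2015-03", 30),
]


def running_totals(with_month_filter=True):
    """Self-join the monthly totals and sum side b for each row of side a."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table monthly(username text, mo text, sums int)")
    conn.executemany("insert into monthly values (?, ?, ?)", MONTHLY)
    # Without the filter, every month of a user matches all of that
    # user's months in b, so sum(b.sums) becomes the grand total.
    where = "where b.mo <= a.mo" if with_month_filter else ""
    rows = conn.execute(
        f"""
        select a.username, a.mo, max(a.sums), sum(b.sums)
        from monthly a
        join monthly b on a.username = b.username
        {where}
        group by a.username, a.mo
        order by a.username, a.mo
        """
    ).fetchall()
    conn.close()
    return rows


print("without the month condition:")
for row in running_totals(with_month_filter=False):
    print(row)

print("with b.mo <= a.mo:")
for row in running_totals():
    print(row)
```

Without the condition, the last column is the same grand total in every month (63 for A, 90 for B, 100 for C). With the condition, it becomes a running total: 33, 43, 63 for A; 30, 45, 90 for B; 60, 70, 100 for C. The string comparison on `mo` gives the right order because the months are zero-padded in `YYYY-MM` form.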
