Hive: cascading cumulative report query

Original link: http://www.cnblogs.com/drl-blogs/p/11086870.html

A,2015-01-08,5
A,2015-01-11,15
B,2015-01-12,5
A,2015-01-12,8
B,2015-01-13,25
A,2015-01-13,5
C,2015-01-09,10
C,2015-01-11,20
A,2015-02-10,4
A,2015-02-11,6
C,2015-01-12,30
C,2015-02-13,10
B,2015-02-10,10
B,2015-02-11,5
A,2015-03-20,14
A,2015-03-21,6
B,2015-03-11,20
B,2015-03-12,25
C,2015-03-10,10
C,2015-03-11,20

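From the data above, the goal is a report that shows each user's monthly total and the cumulative total up to that month, as in the figure:

[Figure: expected result]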

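First, create the table and load the data. Note that the month column actually holds a full date (yyyy-MM-dd), not just a month.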

create table 
	t_access_times(username string,month string,counts int)
row format delimited fields terminated by ',';

load data local inpath '/root/access_times.txt' into table t_access_times;

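Next, we can compute each user's total for each month, which gives data like this:

[Figure: monthly totals per user]

The code is as follows: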

select 
	username,tmp.mo, sum(counts) sums  
from
(
	select username,
		concat(split(month,'-')[0] ,'-', split(month,'-')[1]) mo,
		counts 
	from t_access_times
) tmp
group by username,tmp.mo;
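
As a side note, the month prefix can also be taken with substr instead of split plus concat. This is only a minimal sketch of an alternative, and it assumes every value in the month column really is a 10-character yyyy-MM-dd string, as in the sample data:

```sql
-- Assumption: month is always formatted as yyyy-MM-dd,
-- so its first 7 characters are the yyyy-MM prefix.
select
	username,
	substr(month, 1, 7) as mo,   -- Hive's substr is 1-indexed
	sum(counts) as sums
from t_access_times
group by username, substr(month, 1, 7);
```

With the sample data, user A's rows should come out as 2015-01 with 33, 2015-02 with 10, and 2015-03 with 20.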

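Then join two tables (both are the monthly-totals result above) on username, and group by table a's username and month. With no further condition, this gives the following result:
[Figure: result of the join without a month condition]
This is clearly wrong. Table b is there to compute the running total, but here every row simply adds up all of the user's sales across all months. So we add a condition that b's month must be less than or equal to a's month, so that the sum over b only covers the months up to and including a's month. Comparing the months as strings works because yyyy-MM values sort in chronological order. Finally, add an order by so the rows come out in a predictable order.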

select 
	a.username as `姓名`,
	a.mo as `月份`,
	-- After grouping, a.sums cannot be selected directly because each group holds several rows; they all carry the same value, so max or min both work
	max(a.sums) as `月总额`,
	sum(b.sums) as `累计到当月的总额` 
from
(
	select 
		username,
		tmp.mo,
		sum(counts) sums  
		from
		(
		select 
			username,
			concat(split(month,'-')[0] ,'-', split(month,'-')[1]) mo,
			counts 
		from 
			t_access_times
		) tmp
	group by 
		username,
		tmp.mo
)a
join
(
	select 
		username,
		tmp.mo, 
		sum(counts) sums  
	from
	(
		select 
		username,
		concat(split(month,'-')[0] ,'-', split(month,'-')[1]) mo,
		counts 
		from 
			t_access_times
	) tmp
	group by 
		username,
		tmp.mo
)b
on a.username=b.username 
where b.mo<=a.mo
group by a.username,a.mo
order by a.username,a.mo;

The result is shown in the figure below:
[Figure: final result with monthly total and cumulative total]
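
The same report can probably also be produced without the self-join, using a window function. This is a different technique from the join-based query above, not the original author's method. The sketch assumes a Hive version that supports common table expressions and window functions; the names monthly and running_total are made up for illustration:

```sql
-- Step 1: monthly totals per user (same logic as the inner subquery above).
with monthly as (
	select
		username,
		concat(split(month,'-')[0], '-', split(month,'-')[1]) as mo,
		sum(counts) as sums
	from t_access_times
	group by
		username,
		concat(split(month,'-')[0], '-', split(month,'-')[1])
)
-- Step 2: running total within each user, ordered by month.
select
	username,
	mo,
	sums,
	sum(sums) over (
		partition by username
		order by mo
		rows between unbounded preceding and current row
	) as running_total
from monthly
order by username, mo;
```

For user A in the sample data, the monthly totals are 33, 10 and 20, so the running totals should be 33, 43 and 63, matching what the join with b.mo<=a.mo computes. The window version scans the monthly totals once instead of joining them to themselves, which may matter on larger tables.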
