Generating a cumulative report in Hive

1. Original data

u01 2019/1/21 5
u02 2019/1/23 6
u03 2019/1/22 8
u04 2019/1/20 3
u01 2019/1/23 6
u01 2019/2/21 8
u02 2019/1/23 6
u01 2019/2/22 4

2. Create a table that maps the data above

create table action (userId string, visitDate string, visitCount int) row format delimited fields terminated by "\t";
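
To fill the table, the sample rows from step 1 can be loaded from a file. This is a minimal sketch, assuming the rows are saved locally with a tab between the three fields (matching the delimiter declared above); the path /tmp/action.txt is hypothetical.

```sql
-- Hypothetical local path; the file must use tabs between the three fields
load data local inpath '/tmp/action.txt' overwrite into table action;

-- Quick check that each row was split into three columns
select * from action limit 10;
```

If visitCount comes back as NULL, the file is most likely separated by spaces rather than tabs.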

 

 

3. Group by user and month to get each user's total visits per month

create table action_amount
as
-- In the date pattern, uppercase 'M' is the month; lowercase 'mm' means minutes
select tmp.userid, tmp.month, sum(tmp.visitcount) amount
from (select userid, from_unixtime(unix_timestamp(visitdate,'yyyy/M/d'),'yyyy-MM') month, visitcount from action) tmp
group by tmp.userid, tmp.month;
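
As a sanity check, the monthly totals can be listed and compared against the sample data. The expected rows in the comments assume exactly the eight rows from step 1 were loaded (note that the u02 row for 2019/1/23 appears twice, so it is counted twice).

```sql
-- List the per-user monthly totals in a stable order
select userid, month, amount
from action_amount
order by userid, month;

-- Expected with the step 1 sample data:
-- u01  2019-01  11   (5 + 6)
-- u01  2019-02  12   (8 + 4)
-- u02  2019-01  12   (6 + 6)
-- u03  2019-01   8
-- u04  2019-01   3
```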

4. Self-join the monthly table and store the result in a temporary table, so that each month is paired with every earlier or equal month of the same user

create table action_tmp
as
select a.amount as a_amount,b.*
from action_amount a join action_amount b on a.userid=b.userid
where a.month <= b.month;
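
To see why the self-join helps, it may be useful to look at the rows it produces for a single user. The sketch below assumes the step 1 sample data; each target month (from b) appears once for every month up to and including it (from a), so summing a_amount per target month yields the running total.

```sql
-- Inspect the joined rows for one user
select a_amount, userid, month, amount
from action_tmp
where userid = 'u01'
order by month, a_amount;

-- Expected with the step 1 sample data:
-- a_amount  userid  month    amount
-- 11        u01     2019-01  11
-- 11        u01     2019-02  12
-- 12        u01     2019-02  12
```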

 

5. Group the table above by userid and month to get the monthly amount and the cumulative total

select userid,month,max(amount) as amount,sum(a_amount) as accumulate
from action_tmp
group by userid,month;
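
As an alternative to the self-join, the same report can be sketched with a window function, which Hive supports since version 0.11. This is a different technique from the one used above, not a restatement of it; it reads action_amount directly and skips the temporary table.

```sql
-- Running total per user, ordered by month, without a self-join
select userid,
       month,
       amount,
       sum(amount) over (partition by userid
                         order by month
                         rows between unbounded preceding and current row) as accumulate
from action_amount;

-- Expected with the step 1 sample data (row order may vary):
-- u01  2019-01  11  11
-- u01  2019-02  12  23
-- u02  2019-01  12  12
-- u03  2019-01   8   8
-- u04  2019-01   3   3
```

Both approaches depend on month being stored as 'yyyy-MM', so that string ordering matches calendar ordering.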

Origin www.cnblogs.com/zhangchenchuan/p/11973764.html