I recently had a requirement: find the maximum number of consecutive days on which each user logged in. In the login log table hive.traffic.access_user we only need two fields, uid and day. There is also a date helper table, hive.ods.dim_date, which has a single field, day.
Here is the idea:
uid | day | rownumber | day - rownumber (days) |
---|---|---|---|
101 | 20190911 | 1 | 20190911-1=20190910 |
101 | 20190912 | 2 | 20190912-2=20190910 |
101 | 20190913 | 3 | 20190913-3=20190910 |
101 | 20190916 | 4 | 20190916-4=20190912 |
101 | 20190917 | 5 | 20190917-5=20190912 |
As the table shows, for any run of consecutive login days the difference day - rownumber stays the same. The catch is that subtracting directly from the yyyymmdd number breaks when a run crosses a month or year boundary, so we first convert the dates into a sequence of consecutive numbers:
select day,ROW_NUMBER() OVER(ORDER BY day) daynum from hive.ods.dim_date
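To see why the date table is needed, here is a minimal sketch of the failure case. The two sample rows are made up for illustration, and the inline VALUES syntax assumes a Presto-style engine. The user logs in on two consecutive days that straddle a month boundary, and the raw subtraction gives two different results.

```sql
-- Hypothetical sample data: uid 101 logs in on 2019-09-30 and 2019-10-01.
select uid,
       day,
       row_number() over (partition by uid order by day) as rownum,
       -- Subtracting from the raw yyyymmdd integer:
       --   20190930 - 1 = 20190929
       --   20191001 - 2 = 20190999
       -- The two differences do not match, although the days are consecutive.
       day - row_number() over (partition by uid order by day) as raw_diff
from (values (101, 20190930), (101, 20191001)) as t (uid, day)
```

With daynum taken from the date table, these two days map to consecutive integers, so daynum - rownum is identical for both rows.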
Next, group the login log by uid and day (which also collapses several logins on the same day into one row), order each user's rows by date, and compute the row number:
with a as (select uid,day from hive.traffic.access_user where day>=20190801 and uid<>'')
select uid,day,ROW_NUMBER() OVER(PARTITION BY uid ORDER BY uid,day) rownum from a group by day,uid
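The last step is to compute the difference daynum - rownum. Rows of the same user with the same difference belong to the same run of consecutive login days. The complete SQL follows; the output columns "连续登录开始日" and "连续登录天数" mean "first day of the consecutive-login run" and "number of consecutive login days":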
with a as (select uid,day from hive.traffic.access_user where day>=20190801 and uid<>''),
b as (select uid,day,ROW_NUMBER() OVER(PARTITION BY uid ORDER BY uid,day) rownum from a group by day,uid ),
c as(select day,ROW_NUMBER() OVER(ORDER BY day) daynum from hive.ods.dim_date),
d as (select uid,b.day,daynum,rownum,daynum-rownum days from b join c on b.day=c.day )
select uid,min(day)"连续登录开始日",count(*) "连续登录天数" from d group by uid,days
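The query above returns one row per user per run of consecutive days. If the goal is the single longest run per user, as the original requirement suggests, one more aggregation is needed. The following is a sketch under that assumption. The CTE name e and the aliases streak_days and max_streak_days are hypothetical names added here.

```sql
with a as (select uid, day from hive.traffic.access_user where day >= 20190801 and uid <> ''),
-- b: one row per user per login day, numbered in date order
b as (select uid, day, row_number() over (partition by uid order by day) as rownum from a group by day, uid),
-- c: map every calendar day to a sequential number
c as (select day, row_number() over (order by day) as daynum from hive.ods.dim_date),
-- d: rows in the same run of consecutive days share the same "days" value
d as (select uid, b.day, daynum - rownum as days from b join c on b.day = c.day),
-- e: length of each run (hypothetical CTE, not in the original query)
e as (select uid, count(*) as streak_days from d group by uid, days)
-- longest run per user
select uid, max(streak_days) as max_streak_days from e group by uid
```

For the sample rows in the table above (uid 101), the runs have lengths 3 and 2, so this would return 3 for that user.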