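How to compute "consecutive" metrics, such as consecutive login days or consecutive claim days.
The example below uses consecutive login days to walk through the solutions.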
Input
Daily login table (one row per user per login day; dt is a date in YYYYMMDD form)
dt | user |
---|---|
20170601 | A |
20170602 | A |
20170603 | A |
20170601 | B |
20170701 | B |
20170702 | B |
20170801 | C |
20170802 | C |
Output (lianxu_num is the consecutive-day count; it resets to 1 whenever a day is missed)
dt | user | lianxu_num |
---|---|---|
20170601 | A | 1 |
20170602 | A | 2 |
20170603 | A | 3 |
20170601 | B | 1 |
20170701 | B | 1 |
20170702 | B | 2 |
20170801 | C | 1 |
20170802 | C | 2 |
Solutions
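Solution 1: write a running-count loop. Group by user and sort by date; if the current date is exactly 1 day after the previous date, add 1 to the streak, otherwise reset the streak to 1.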
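
Solution 1 can be sketched roughly as follows in Python. This is only a minimal illustration: the function name `consecutive_days`, the `login` list, and the choice to de-duplicate rows are assumptions made here, not part of the original post. The dates are parsed into real calendar dates so that a month boundary (for example 20170630 to 20170701) still counts as a 1-day gap.

```python
from datetime import datetime, timedelta

def consecutive_days(rows):
    """rows: iterable of (dt, user), dt as a 'YYYYMMDD' string.
    Returns a list of (dt, user, lianxu_num), ordered by user then dt."""
    result = []
    prev_user, prev_day, streak = None, None, 0
    # Assumption: duplicate (dt, user) rows are collapsed before counting.
    for dt, user in sorted(set(rows), key=lambda r: (r[1], r[0])):
        day = datetime.strptime(dt, "%Y%m%d").date()
        if user == prev_user and day - prev_day == timedelta(days=1):
            streak += 1      # same user, exactly one day later
        else:
            streak = 1       # new user or a gap: restart the streak
        result.append((dt, user, streak))
        prev_user, prev_day = user, day
    return result

# Sample data copied from the input table above.
login = [
    ("20170601", "A"), ("20170602", "A"), ("20170603", "A"),
    ("20170601", "B"), ("20170701", "B"), ("20170702", "B"),
    ("20170801", "C"), ("20170802", "C"),
]

for row in consecutive_days(login):
    print(row)
```

The loop keeps only the previous row's user and date, so it needs a single pass over the sorted data.
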
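Solution 2: use row_number(). Number each user's rows by date (rn), then compute dt-rn+1 as date arithmetic (treat dt as a calendar date, not as an integer). Rows in the same streak share the same dt-rn+1 value, so grouping by user and dt-rn+1, ordering by dt, and taking row_number again gives the consecutive count.
dt | user | rn = row_number() over(partition by user order by dt) | dt-rn+1 | lianxu_num = row_number() over(partition by user, dt-rn+1 order by dt) |
---|---|---|---|---|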
20170601 | A | 1 | 20170601 | 1 |
20170602 | A | 2 | 20170601 | 2 |
20170603 | A | 3 | 20170601 | 3 |
20170601 | B | 1 | 20170601 | 1 |
20170701 | B | 2 | 20170630 | 1 |
20170702 | B | 3 | 20170630 | 2 |
20170801 | C | 1 | 20170801 | 1 |
20170802 | C | 2 | 20170801 | 2 |
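
To make the dt-rn+1 trick concrete, here is a small Python sketch that emulates the two row_number() passes without a database. The function name `consecutive_days_by_rank` and the `login` list are made up for illustration, and de-duplicating rows is an assumption; in practice the same logic would normally be written with SQL window functions.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def consecutive_days_by_rank(rows):
    """rows: iterable of (dt, user), dt as a 'YYYYMMDD' string.
    Returns a list of (dt, user, dt-rn+1, lianxu_num), ordered by user then dt."""
    by_user = defaultdict(list)
    for dt, user in set(rows):          # assumption: duplicates are dropped
        by_user[user].append(dt)

    result = []
    for user in sorted(by_user):
        seen = defaultdict(int)         # rows seen so far per (dt-rn+1) group
        # First pass: rn = row_number() over(partition by user order by dt)
        for rn, dt in enumerate(sorted(by_user[user]), start=1):
            day = datetime.strptime(dt, "%Y%m%d").date()
            anchor = day - timedelta(days=rn - 1)   # dt-rn+1 as a date
            # Second pass: row_number() within (user, anchor) ordered by dt
            seen[anchor] += 1
            result.append((dt, user, anchor.strftime("%Y%m%d"), seen[anchor]))
    return result

# Sample data copied from the input table above.
login = [
    ("20170601", "A"), ("20170602", "A"), ("20170603", "A"),
    ("20170601", "B"), ("20170701", "B"), ("20170702", "B"),
    ("20170801", "C"), ("20170802", "C"),
]

for row in consecutive_days_by_rank(login):
    print(row)
```

Because the subtraction is done on dates, user B's rows for 20170701 and 20170702 both map to 20170630, which is what places them in the same streak.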