0 Requirements
There are two existing tables: the user activity table user_active(user_id, active_date) and the user registration table user_regist(user_id, regist_date).
Both tables are partitioned by dt (yyyy-MM-dd), and both identify users by user_id.
Design a registration-to-active retention table covering retention periods of 1 to 180 days.
1 Analysis
The requirement is a registration-to-active retention table with a retention period of 1 to 180 days. The target output looks like this:
registration date | retention period (days) | active users | registrations | retention rate |
2023-01-10 | 1 | 100 | 200 | active users / registrations |
2023-01-10 | 2 | 50 | 200 | |
2023-01-10 | 3 | 10 | 200 | |
... | ... | ... |
Key concept being tested: the Cartesian product (one-to-many join).
Looking at the target table, we can see that for each registration date the denominator (registrations) is fixed, while the numerator (active users) changes with the retention period.
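The observation above can be written as a formula, followed by the arithmetic for the three sample rows of the target table. The symbols d (registration date) and n (retention period in days) are notation introduced here for illustration and do not appear in the original.

```latex
\text{retention\_rate}(d, n) = \frac{\text{active\_users}(d, n)}{\text{registrations}(d)}
\qquad
\frac{100}{200} = 0.5, \quad \frac{50}{200} = 0.25, \quad \frac{10}{200} = 0.05
```

The denominator depends only on d, which is why it can be computed once per registration date and reused for every retention period.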
Step 1: Compute the number of registrations per day from the registration table. This count is the denominator and is a fixed value for each registration date, so we compute it with a window function, which attaches the daily total to every user row.
select user_id
      ,to_date(regist_date) as regist_date
      ,count(user_id) over(partition by to_date(regist_date)) as regist_count  -- registrations on that day
from user_regist
where dt >= date_sub(current_date(), 180)
Step 2: Use the user registration table as the main (left) table and join it to the activity table on user_id. Because the relationship is one-to-many, each registered user is paired with every day on which that user was active, which produces a Cartesian product.
Note: a user can be active several times on the same day, so the activity table must be deduplicated to one row per user per day.
select t1.regist_date
      ,t1.user_id
      ,t1.regist_count
      ,t2.user_id
      ,t2.active_date
      ,datediff(t2.active_date, t1.regist_date) as date_diff  -- days between registration and activity
from (
    select user_id
          ,to_date(regist_date) as regist_date
          ,count(user_id) over(partition by to_date(regist_date)) as regist_count
    from user_regist
    where dt >= date_sub(current_date(), 180)
) t1
left join (
    -- deduplicate: one row per user per active day
    select user_id
          ,to_date(active_date) as active_date
    from user_active
    where dt >= date_sub(current_date(), 180)
    group by user_id, to_date(active_date)
) t2
on t1.user_id = t2.user_id
regist_date | t1.user_id | t1.regist_count | t2.user_id | t2.active_date | date_diff |
2023-01-10 | A | 200 | A | 2023-01-11 | 1 |
2023-01-10 | A | 200 | A | 2023-01-12 | 2 |
2023-01-10 | A | 200 | A | 2023-01-13 | 3 |
2023-01-10 | A | 200 | A | 2023-01-14 | 4 |
2023-01-10 | B | 200 | B | 2023-01-13 | 3 |
2023-01-10 | B | 200 | B | 2023-01-14 | 4 |
2023-01-10 | B | 200 | B | 2023-01-15 | 5 |
2023-01-10 | B | 200 | B | 2023-01-16 | 6 |
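The deduplication of the activity table in Step 2 is done with group by. As a sketch, the same one-row-per-user-per-day result could also be obtained with select distinct; which form performs better depends on the engine and its version, so treat this as an equivalent alternative rather than a recommendation.

```sql
-- Alternative deduplication of the activity table (sketch):
-- one row per user per active day, same result as the group by version.
select distinct
       user_id
      ,to_date(active_date) as active_date
from user_active
where dt >= date_sub(current_date(), 180)
```

Without this deduplication, a user who is active several times on one day would be counted several times in the numerator of the retention rate.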
Step 3: Group by registration date and retention period, and count the active users for each combination.
select t1.regist_date
      ,max(t1.regist_count) as regist_cnt  -- fixed value per day; max() simply extracts it
      ,datediff(t2.active_date, t1.regist_date) as date_diff
      ,count(t1.user_id) as active_user_cnt
from (
    select user_id
          ,to_date(regist_date) as regist_date
          ,count(user_id) over(partition by to_date(regist_date)) as regist_count
    from user_regist
    where dt >= date_sub(current_date(), 180)
) t1
left join (
    select user_id
          ,to_date(active_date) as active_date
    from user_active
    where dt >= date_sub(current_date(), 180)
    group by user_id, to_date(active_date)
) t2 on t1.user_id = t2.user_id
-- this filter on t2 discards unmatched rows, so the left join behaves like an inner join here
where datediff(t2.active_date, t1.regist_date) >= 1
  and datediff(t2.active_date, t1.regist_date) <= 180
group by t1.regist_date, datediff(t2.active_date, t1.regist_date)
Step 4: Calculate the retention rate
select regist_date
      ,date_diff
      ,active_user_cnt
      ,regist_cnt  -- included so the output matches the target table
      ,case when nvl(regist_cnt, 0) != 0
            then active_user_cnt / regist_cnt end as retention_rate
from (
    select t1.regist_date
          ,max(t1.regist_count) as regist_cnt  -- fixed value per day; max() simply extracts it
          ,datediff(t2.active_date, t1.regist_date) as date_diff
          ,count(t1.user_id) as active_user_cnt
    from (
        select user_id
              ,to_date(regist_date) as regist_date
              ,count(user_id) over(partition by to_date(regist_date)) as regist_count
        from user_regist
        where dt >= date_sub(current_date(), 180)
    ) t1
    left join (
        select user_id
              ,to_date(active_date) as active_date
        from user_active
        where dt >= date_sub(current_date(), 180)
        group by user_id, to_date(active_date)
    ) t2 on t1.user_id = t2.user_id
    where datediff(t2.active_date, t1.regist_date) >= 1
      and datediff(t2.active_date, t1.regist_date) <= 180
    group by t1.regist_date, datediff(t2.active_date, t1.regist_date)
) t
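To sanity-check the logic of the query above, here is a small self-contained sketch that runs the same steps against toy data. The CTE names toy_regist and toy_active and all of their rows are hypothetical, made up for illustration. The dt partition filter is left out because the toy data has no partitions. It assumes an engine that allows select without a from clause, as recent Hive versions and Spark SQL do.

```sql
-- Toy data (hypothetical): four users register on 2023-01-10.
with toy_regist as (
    select 'A' as user_id, '2023-01-10' as regist_date
    union all select 'B', '2023-01-10'
    union all select 'C', '2023-01-10'
    union all select 'D', '2023-01-10'
),
toy_active as (
    select 'A' as user_id, '2023-01-11' as active_date
    union all select 'A', '2023-01-11'   -- duplicate: active twice on the same day
    union all select 'A', '2023-01-12'
    union all select 'B', '2023-01-11'
    union all select 'C', '2023-01-10'   -- same-day activity: date_diff = 0, filtered out
)
select t.regist_date
      ,t.date_diff
      ,t.active_user_cnt
      ,t.regist_cnt
      ,t.active_user_cnt / t.regist_cnt as retention_rate
from (
    select t1.regist_date
          ,datediff(t2.active_date, t1.regist_date) as date_diff
          ,max(t1.regist_count) as regist_cnt
          ,count(t1.user_id) as active_user_cnt
    from (
        -- daily registration count attached to every user row
        select user_id
              ,to_date(regist_date) as regist_date
              ,count(user_id) over(partition by to_date(regist_date)) as regist_count
        from toy_regist
    ) t1
    left join (
        -- one row per user per active day
        select user_id
              ,to_date(active_date) as active_date
        from toy_active
        group by user_id, to_date(active_date)
    ) t2 on t1.user_id = t2.user_id
    where datediff(t2.active_date, t1.regist_date) >= 1
      and datediff(t2.active_date, t1.regist_date) <= 180
    group by t1.regist_date, datediff(t2.active_date, t1.regist_date)
) t
order by t.regist_date, t.date_diff
```

With this toy data the query should return two rows for 2023-01-10: retention period 1 with 2 active users out of 4 registrations (rate 0.5), and retention period 2 with 1 active user out of 4 (rate 0.25). User A's duplicate activity on 2023-01-11 is counted once, user C's same-day activity is excluded, and user D, who was never active, contributes only to the denominator.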
2 Summary
This article presents a calculation model for a 1 to 180 day registration-to-active retention table, solved mainly by means of a Cartesian product (a one-to-many join). This technique is used frequently in data reporting and is worth mastering.