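I. Average Next-Day Retention Rate
(1) Problem: The operations team wants to know the average probability that a user who practices questions on a given day comes back to practice again the next day. Retrieve the corresponding data.
(2) Data:
Here question_practice_detail is the table name; id is an index-like column with no real meaning; device_id is the device ID and is not unique; quest_id is the ID of the question answered; result is the answer result; and date is the date.
(3) Breaking down the problem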
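Solution 1: Every row in the table can be treated as a "first day" on which the user practiced, so we need to construct a column that says whether the user came back on the second day. A left join of the table to itself does this: match rows on device_id so that both sides belong to the same user, and keep only next-day matches with datediff(date2, date1) = 1, where date1 is the left-table date and date2 is the right-table date.
Average probability: count(date1) over the left table gives the denominator, and count(date2) over the matched right-table rows gives the numerator (a match means the two dates are one day apart, which is the next-day retention condition). Both counts are deduplicated on (device_id, date), so the denominator is the number of distinct user-days rather than the number of raw rows. Dividing the numerator by the denominator gives the average probability.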
SELECT
    count(distinct q2.device_id, q2.date) / count(distinct q1.device_id, q1.date) as avg_ret
FROM
    question_practice_detail as q1
    left outer join question_practice_detail as q2
        on q1.device_id = q2.device_id
        and datediff(q2.date, q1.date) = 1;
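The counting logic of Solution 1 can be cross-checked outside the database. The following is a minimal Python sketch, not the author's method: the sample rows and the function name next_day_retention are hypothetical, and a set of (device_id, date) pairs stands in for count(distinct device_id, date).

```python
from datetime import date, timedelta

# Hypothetical sample rows: (device_id, practice date). Duplicates are allowed,
# mirroring several questions answered by one device on the same day.
rows = [
    (1, date(2021, 8, 13)),
    (1, date(2021, 8, 13)),
    (1, date(2021, 8, 14)),
    (2, date(2021, 8, 14)),
    (3, date(2021, 8, 15)),
    (3, date(2021, 8, 16)),
    (3, date(2021, 8, 18)),
]


def next_day_retention(rows):
    """Share of distinct (device, day) pairs that are followed by activity on the next day."""
    user_days = set(rows)  # plays the role of count(distinct device_id, date)
    if not user_days:
        return 0.0
    # A pair is retained when the same device also appears one day later,
    # which is the condition datediff(q2.date, q1.date) = 1 in the join.
    retained = sum(
        1
        for device, day in user_days
        if (device, day + timedelta(days=1)) in user_days
    )
    return retained / len(user_days)


avg_ret = next_day_retention(rows)
print(avg_ret)
```

With these sample rows there are 6 distinct user-days, and 2 of them are followed by a next-day visit, so the ratio is 2/6.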
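Solution 2: Use the lag function to pair each user's records from adjacent days. First partition by user with partition by device_id, then sort by date in ascending order with order by date, then pair each row with the row before it (the first row of each partition is paired with null), that is, lag(date) over (partition by device_id order by date). A row meets the next-day retention condition when datediff(date, lag(date) over (partition by device_id order by date)) = 1. The group by device_id, date in the subquery keeps one row per user per day, so repeated records on the same day are not counted twice.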
select count(if(diff = 1, 1, null)) / count(*) as avg_ret
from (
    -- group by keeps one row per device per day before lag() is applied
    select device_id, date,
           datediff(date, lag(date) over (partition by device_id order by date)) as diff
    from question_practice_detail
    group by device_id, date
) t;
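The lag-based pairing can also be traced step by step in ordinary code. This is a rough Python sketch under stated assumptions: the sample rows and the name retention_with_lag are hypothetical, and the variable previous plays the role of lag(date) within one device's partition.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample rows: (device_id, practice date), with one duplicate day.
rows = [
    (1, date(2021, 8, 13)),
    (1, date(2021, 8, 14)),
    (1, date(2021, 8, 14)),
    (1, date(2021, 8, 16)),
    (2, date(2021, 8, 14)),
]


def retention_with_lag(rows):
    # One entry per (device, day), like GROUP BY device_id, date.
    days_by_device = defaultdict(set)
    for device, day in rows:
        days_by_device[device].add(day)

    total = 0
    retained = 0
    for days in days_by_device.values():
        previous = None  # lag() yields NULL for the first row of a partition
        for day in sorted(days):  # ORDER BY date
            total += 1
            if previous is not None and (day - previous).days == 1:
                retained += 1  # same test as diff = 1
            previous = day
    return retained / total if total else 0.0


lag_ret = retention_with_lag(rows)
print(lag_ret)
```

Here device 1 has the days 13, 14 and 16 and device 2 has one day, so there are 4 user-days and only the pair 13 to 14 has a gap of one day, giving 1/4.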
(4) Query result
Appendix: "SQL中lag()和lead()函数使用" (Using the lag() and lead() functions in SQL), Schafferyy's blog on CSDN
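II. Consecutive Check-in Problem
(1) Problem: Find the maximum number of consecutive login days for each user.
(2) Data: create the table user_login and insert the following data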
create table user_login(
user_id INT,
visit_date datetime(6)
);
insert into user_login values(1, '2023-3-16');
insert into user_login values(1, '2023-3-17' );
insert into user_login values(2, '2023-3-18' );
insert into user_login values(2, '2023-3-19' );
insert into user_login values(1, '2023-3-20' );
insert into user_login values(1, '2023-3-21' );
insert into user_login values(1, '2023-3-22' );
insert into user_login values(1, '2023-3-23' );
commit;
select * from user_login;
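(3) Solution: Use the window function row_number() over(partition by user_id order by visit_date) as rk to give each user's login dates a consecutive ascending sequence number, and also output the login date minus rk days. Rows of one user that share the same difference belong to one run of consecutive logins. This relies on each user having at most one row per day with no time-of-day part, as in the sample data.
1. Result of row_number() over(partition by user_id order by visit_date):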
SELECT
    t.*,
    row_number() over(partition by t.user_id order by visit_date) as rk,
    -- login date minus rk days: constant within a run of consecutive days
    visit_date - interval row_number() over(partition by t.user_id order by visit_date) day as diff
from user_login as t;
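To see why the difference stays constant inside a run of consecutive days, the same computation can be reproduced by hand. The Python sketch below uses the values from the insert statements above; the function name with_rank_and_diff is hypothetical, and enumerate(..., start=1) stands in for row_number().

```python
from collections import defaultdict
from datetime import date, timedelta

# Same values as the INSERT statements for user_login.
logins = [
    (1, date(2023, 3, 16)), (1, date(2023, 3, 17)),
    (2, date(2023, 3, 18)), (2, date(2023, 3, 19)),
    (1, date(2023, 3, 20)), (1, date(2023, 3, 21)),
    (1, date(2023, 3, 22)), (1, date(2023, 3, 23)),
]


def with_rank_and_diff(logins):
    """Return (user_id, visit_date, rk, diff) rows, with diff = visit_date - rk days."""
    by_user = defaultdict(list)
    for user_id, visit_date in logins:
        by_user[user_id].append(visit_date)

    result = []
    for user_id in sorted(by_user):
        # enumerate(..., start=1) numbers the sorted dates like row_number()
        for rk, visit_date in enumerate(sorted(by_user[user_id]), start=1):
            diff = visit_date - timedelta(days=rk)
            result.append((user_id, visit_date, rk, diff))
    return result


ranked = with_rank_and_diff(logins)
for row in ranked:
    print(row)
```

For user 1, the dates 3-16 and 3-17 both map to 2023-03-15, while 3-20 through 3-23 all map to 2023-03-17. User 2's two dates also map to 2023-03-17, which is why the later grouping has to include user_id as well as diff.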
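For date arithmetic and related functions, see: "sql常用函数" (Common SQL Functions)
2. The query result above shows that grouping by user_id and diff and counting the rows in each group gives the length of every run of consecutive logins for each id; grouping by id and taking the maximum then gives each id's longest run.
# Length of each consecutive-login run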
SELECT user_id, count(diff) as num
from (
    SELECT
        t1.*,
        row_number() over(partition by t1.user_id order by visit_date) as rk,
        visit_date - interval row_number() over(partition by t1.user_id order by visit_date) day as diff
    from user_login as t1
) t2
GROUP BY user_id, diff;
3. Group by user_id to get the maximum run length
# Maximum number of consecutive login days
SELECT user_id, max(num) as max_num
FROM (
    SELECT user_id, count(diff) as num
    from (
        SELECT
            t1.*,
            row_number() over(partition by t1.user_id order by visit_date) as rk,
            visit_date - interval row_number() over(partition by t1.user_id order by visit_date) day as diff
        from user_login as t1
    ) t2
    GROUP BY user_id, diff
) t3
GROUP BY user_id;
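As a sanity check on the final query, the whole pipeline (rank, difference, count per run, maximum per user) can be sketched in a few lines of Python. The function name max_streaks is hypothetical. One deliberate difference from the SQL is flagged here as an assumption: the sketch deduplicates same-day logins with a set, which the query above does not do.

```python
from collections import Counter, defaultdict
from datetime import date, timedelta

# Same values as the INSERT statements for user_login.
logins = [
    (1, date(2023, 3, 16)), (1, date(2023, 3, 17)),
    (2, date(2023, 3, 18)), (2, date(2023, 3, 19)),
    (1, date(2023, 3, 20)), (1, date(2023, 3, 21)),
    (1, date(2023, 3, 22)), (1, date(2023, 3, 23)),
]


def max_streaks(logins):
    """Map each user_id to the length of its longest run of consecutive login days."""
    days_by_user = defaultdict(set)
    for user_id, visit_date in logins:
        # Assumption (not in the SQL): collapse repeated logins on the same day.
        days_by_user[user_id].add(visit_date)

    result = {}
    for user_id, days in days_by_user.items():
        # Key = date minus rank; equal keys mark one run (GROUP BY user_id, diff).
        run_sizes = Counter(
            day - timedelta(days=rk)
            for rk, day in enumerate(sorted(days), start=1)
        )
        result[user_id] = max(run_sizes.values())  # max(num) per user
    return result


streaks = max_streaks(logins)
print(streaks)  # {1: 4, 2: 2}
```

User 1 has runs of 2 days (3-16 to 3-17) and 4 days (3-20 to 3-23), so the maximum is 4; user 2 has a single run of 2 days.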