Calculating user retention rates in SQL: next-day and N-day retention

SQL 29: Calculate the average next-day retention rate of users

Topic: Operations wants to know the average probability that a user who practices questions on a given day comes back the next day. Write a query that returns this figure.

  • Solution 1: Treat each (device_id, date) row in the table as a day on which the user practiced questions (day 1). We then need a column holding the same user's visit on the following day, which a left self-join can provide: match on device_id to restrict to the same user, and on date_add(date1, interval 1 day)=date2 to restrict to the next day.
-- Use a left join to find the users who come back the next day
select distinct a.device_id,a.date date1,b.device_id,b.date date2
from question_practice_detail a
left join question_practice_detail b
on a.device_id=b.device_id and date_add(a.date,interval 1 day)=b.date

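Remember that this must be a left join: an inner join keeps only the rows that have a next-day match, so the users who did not come back disappear and the ratio computed later would always be 1.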

-- Without the left join (implicit inner join)
select distinct a.device_id a_device_id,a.date date1,b.device_id b_device_id,b.date date2
from question_practice_detail a,question_practice_detail b
where a.device_id=b.device_id and date_add(a.date,interval 1 day)=b.date

-- Use a left join to find the users who come back the next day
with t as (
select distinct a.device_id a_device_id,a.date date1,b.device_id b_device_id,b.date date2
from question_practice_detail a
left join question_practice_detail b
on a.device_id=b.device_id and date_add(a.date,interval 1 day)=b.date
)
-- Calculate the next-day retention rate
select count(date2)/count(date1) avg_ret
from t
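
To make the difference between the two joins concrete, here is a minimal sketch that runs the query on a few made-up rows. It assumes Python's built-in sqlite3 module in place of MySQL, so date(a.date, '+1 day') stands in for date_add(a.date, interval 1 day) and the count is multiplied by 1.0 to avoid SQLite's integer division. The sample rows and the run_retention helper are hypothetical and not part of the original exercise.

```python
import sqlite3

# Hypothetical sample data: (device_id, date).
# Device 1 has a duplicate row on 2021-08-02 to show why DISTINCT is needed.
SAMPLE_ROWS = [
    (1, "2021-08-01"), (1, "2021-08-02"), (1, "2021-08-02"),
    (2, "2021-08-01"), (2, "2021-08-03"),
    (3, "2021-08-05"),
]

LEFT_JOIN_SQL = """
with t as (
    select distinct a.device_id as a_device_id, a.date as date1,
                    b.device_id as b_device_id, b.date as date2
    from question_practice_detail a
    left join question_practice_detail b
      on a.device_id = b.device_id
     and date(a.date, '+1 day') = b.date
)
select count(date2) * 1.0 / count(date1) as avg_ret
from t
"""

# Same query with an inner join, for comparison only.
INNER_JOIN_SQL = LEFT_JOIN_SQL.replace("left join", "join")


def run_retention(sql, rows=SAMPLE_ROWS):
    """Load the rows into an in-memory table and return avg_ret."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table question_practice_detail (device_id integer, date text)"
    )
    conn.executemany(
        "insert into question_practice_detail values (?, ?)", rows
    )
    (avg_ret,) = conn.execute(sql).fetchone()
    conn.close()
    return avg_ret


print(run_retention(LEFT_JOIN_SQL))   # 1 of 5 distinct active days has a next-day visit
print(run_retention(INNER_JOIN_SQL))  # non-returning days are dropped before counting
```

On this sample the left join gives 0.2 (one returning day out of five distinct device-days), while the inner join gives 1.0 because every surviving row has a date2.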

  • Solution 2: Use the lead window function to pair each of a user's active dates with that user's next active date. Partition by user (partition by device_id), sort by date in ascending order (order by date), and take the date of the following row; the last row of each partition has no successor and gets NULL. That is, lead(date) over (partition by device_id order by date). A pair counts as retained when datediff(date2, date1)=1.
-- Using the lead function
select avg(if(datediff(date2, date1)=1, 1, 0)) as avg_ret
from (
    select
        distinct device_id,
        date as date1,
        lead(date) over (partition by device_id order by date) as date2
    from (
        select distinct device_id, date
        from question_practice_detail
    ) as a
) as b
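
The lead approach can be checked the same way. This is a sketch under assumptions rather than the exercise's reference answer: it uses Python's sqlite3 module (window functions need SQLite 3.25 or later), writes the MySQL test datediff(date2, date1)=1 as date(date1, '+1 day') = date2, and uses hypothetical sample rows and a hypothetical lead_retention helper.

```python
import sqlite3

# Hypothetical sample data: (device_id, date), including one duplicate row.
SAMPLE_ROWS = [
    (1, "2021-08-01"), (1, "2021-08-02"), (1, "2021-08-02"),
    (2, "2021-08-01"), (2, "2021-08-03"),
    (3, "2021-08-05"),
]

LEAD_SQL = """
select avg(case when date(date1, '+1 day') = date2 then 1.0 else 0.0 end) as avg_ret
from (
    select device_id,
           date as date1,
           lead(date) over (partition by device_id order by date) as date2
    from (
        -- de-duplicate first so each device has one row per active day
        select distinct device_id, date
        from question_practice_detail
    ) as a
) as b
"""


def lead_retention(rows):
    """Return the average next-day retention computed with lead()."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table question_practice_detail (device_id integer, date text)"
    )
    conn.executemany(
        "insert into question_practice_detail values (?, ?)", rows
    )
    (avg_ret,) = conn.execute(LEAD_SQL).fetchone()
    conn.close()
    return avg_ret


print(lead_retention(SAMPLE_ROWS))
```

Only device 1's first day is followed by a visit exactly one day later, so the result on this sample is 0.2, the same as the left join version. Rows whose date2 is NULL fall into the else branch and count as not retained.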

Calculating next-day, 3-day, and 7-day retention rates in SQL

The queries below read from a view_log table and use its user_guid and date_time columns.

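-- Flag the users who were retained on the next day
SELECT t1.user_guid,
MAX(CASE WHEN DATEDIFF(date_time,newdate)=1 THEN 1 ELSE 0 END) 'is_next_day_retained'
FROM(
-- Build a table of each user's first active date (the day the user counts as new)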
SELECT user_guid,MIN(date_time) newdate
FROM `view_log`
GROUP BY user_guid
) t1
JOIN view_log t2 ON t1.user_guid=t2.user_guid
GROUP BY t1.user_guid

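-- For each day's new users: next-day retention, and retention within 3 and 7 days
SELECT t1.date_time 'date',
COUNT(DISTINCT CASE WHEN DATEDIFF(t3.date_time,t2.newdate)=1 THEN t3.user_guid
ELSE NULL END)/COUNT(DISTINCT t2.`user_guid`) 'next_day_retention',
-- The lower bound of 1 excludes the first-day visit itself (DATEDIFF = 0),
-- which would otherwise count every new user as retained.
-- For strict day-N retention, use = 3 and = 7 instead.
COUNT(DISTINCT CASE WHEN DATEDIFF(t3.date_time,t2.newdate) BETWEEN 1 AND 3 THEN t3.user_guid
ELSE NULL END)/COUNT(DISTINCT t2.`user_guid`) '3_day_retention',
COUNT(DISTINCT CASE WHEN DATEDIFF(t3.date_time,t2.newdate) BETWEEN 1 AND 7 THEN t3.user_guid
ELSE NULL END)/COUNT(DISTINCT t2.`user_guid`) '7_day_retention'
FROM(
SELECT DISTINCT date_time
FROM view_log
)t1 LEFT JOIN
(SELECT user_guid,MIN(date_time) newdate
FROM view_log
GROUP BY user_guid
)t2 ON t2.newdate=t1.date_time
LEFT JOIN view_log t3 ON t2.user_guid=t3.user_guid
GROUP BY t1.date_time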
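
The lower bound of the day window matters in this query: a user's first visit has a date difference of 0, so a condition of the form "difference <= 3" on its own matches every new user. The sketch below shows the effect on a few made-up rows. It assumes Python's sqlite3 module, with julianday differences standing in for MySQL's DATEDIFF; the sample log and the retention_by_cohort helper are hypothetical.

```python
import sqlite3

# Hypothetical sample log: (user_guid, date_time), one row per visit day.
SAMPLE_LOG = [
    ("u1", "2021-08-01"), ("u1", "2021-08-02"), ("u1", "2021-08-08"),
    ("u2", "2021-08-01"), ("u2", "2021-08-03"),
    ("u3", "2021-08-02"),  # u3 never comes back
]


def retention_by_cohort(rows, lower_bound=1):
    """Return {first_day: (next_day, within_3_days, within_7_days)}.

    lower_bound=1 ignores the first-day visit; lower_bound=0 includes it,
    which is what a bare "<= 3" / "<= 7" condition does.
    """
    # Days between a visit and the user's first day (SQLite stand-in for DATEDIFF).
    diff = "cast(julianday(t3.date_time) - julianday(t2.newdate) as integer)"
    lo = int(lower_bound)
    sql = f"""
    select t1.date_time,
           count(distinct case when {diff} = 1 then t3.user_guid end) * 1.0
               / count(distinct t2.user_guid),
           count(distinct case when {diff} between {lo} and 3 then t3.user_guid end) * 1.0
               / count(distinct t2.user_guid),
           count(distinct case when {diff} between {lo} and 7 then t3.user_guid end) * 1.0
               / count(distinct t2.user_guid)
    from (select distinct date_time from view_log) t1
    left join (select user_guid, min(date_time) as newdate
               from view_log
               group by user_guid) t2
           on t2.newdate = t1.date_time
    left join view_log t3 on t2.user_guid = t3.user_guid
    group by t1.date_time
    order by t1.date_time
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("create table view_log (user_guid text, date_time text)")
    conn.executemany("insert into view_log values (?, ?)", rows)
    result = {day: (d1, d3, d7) for day, d1, d3, d7 in conn.execute(sql)}
    conn.close()
    return result


print(retention_by_cohort(SAMPLE_LOG)["2021-08-02"])                 # u3 counted as not retained
print(retention_by_cohort(SAMPLE_LOG, lower_bound=0)["2021-08-02"])  # u3's own first visit counted
```

For the 2021-08-02 cohort, which contains only u3, the window starting at day 1 gives (0.0, 0.0, 0.0), while the window starting at day 0 reports 3-day and 7-day retention of 1.0 even though u3 never returned. Days on which no new user appeared have an empty cohort, so their rates come back as NULL (None in Python).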

Origin blog.csdn.net/F13122298/article/details/127112159