SQL29 Calculate the average next-day retention rate of users


【Description】

Question: The operations team wants to know the average probability that a user who answers questions on a given day comes back to answer questions again on the next day. Please query the corresponding data.

[Example]: question_practice_detail


[Problem analysis] Excerpted from the solution posted by "Reg333" in the problem's solutions section

Next-day retention means that the same user (in this problem, the same device, identified by device_id) answers questions on a given day and again on the following day. Note that this problem does not care which questions a device answered on a given day or whether its answers were correct; it only cares whether the device answered any questions at all. Therefore, the table contains rows that are duplicates for our purposes (several rows with the same device_id and date), and we need to deduplicate them with DISTINCT.
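A minimal sketch of that deduplication step, using Python's built-in sqlite3 module and a few hypothetical rows. The column set here (device_id, question_id, result, date) is an assumption trimmed to what this problem needs, not necessarily the exact schema of the original table. Two answers by device 2315 on 2021-08-13 collapse into a single (device_id, date) row:

```python
import sqlite3

# Hypothetical sample rows: (device_id, question_id, result, date).
# Device 2315 answered two different questions on 2021-08-13, which is a
# duplicate (device_id, date) pair as far as retention is concerned.
ROWS = [
    (2315, 1, "right", "2021-08-13"),
    (2315, 2, "wrong", "2021-08-13"),
    (2315, 3, "right", "2021-08-14"),
    (2138, 1, "right", "2021-05-03"),
]

def distinct_device_days(rows):
    """Return the distinct (device_id, date) pairs, like the q1/q2 subqueries."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE question_practice_detail "
        "(device_id INTEGER, question_id INTEGER, result TEXT, date TEXT)"
    )
    conn.executemany(
        "INSERT INTO question_practice_detail VALUES (?, ?, ?, ?)", rows
    )
    # DISTINCT keeps one row per (device_id, date), whatever was answered.
    result = conn.execute(
        "SELECT DISTINCT device_id, date FROM question_practice_detail "
        "ORDER BY device_id, date"
    ).fetchall()
    conn.close()
    return result

print(distinct_device_days(ROWS))
# [(2138, '2021-05-03'), (2315, '2021-08-13'), (2315, '2021-08-14')]
```

Four input rows become three device-days, because question_id and result no longer distinguish rows once only device_id and date are selected.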


avg_ret = number of distinct (device_id, date) records whose device also answered questions on the next day / total number of distinct (device_id, date) records
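The ratio can be checked without SQL at all. The sketch below is a plain-Python restatement of the formula, assuming the activity log is given as (device_id, date) pairs; the function name and sample data are hypothetical. It deduplicates with a set (the counterpart of DISTINCT) and tests, for every device-day, whether the same device appears on the following day:

```python
from datetime import date, timedelta

def avg_next_day_retention(records):
    """avg_ret = retained device-days / all distinct device-days.

    `records` is an iterable of (device_id, datetime.date) pairs. Returns
    None for empty input, mirroring MySQL, where 0 / 0 evaluates to NULL.
    """
    device_days = set(records)  # deduplicate, like SELECT DISTINCT
    if not device_days:
        return None
    # A device-day is "retained" if the same device is active the next day.
    retained = sum(
        1
        for device_id, day in device_days
        if (device_id, day + timedelta(days=1)) in device_days
    )
    return retained / len(device_days)

# Hypothetical activity log; the first row is duplicated on purpose.
records = [
    (2315, date(2021, 8, 13)),
    (2315, date(2021, 8, 13)),
    (2315, date(2021, 8, 14)),
    (2315, date(2021, 8, 15)),
    (2138, date(2021, 5, 3)),
    (3214, date(2021, 5, 9)),
    (3214, date(2021, 6, 15)),
]
print(avg_next_day_retention(records))  # 2 of 6 device-days retained -> 0.333...
```

Only 2315's 08-13 and 08-14 records are followed by activity on the next day, so the numerator is 2 and the denominator is the 6 distinct device-days.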

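-- q1: every distinct (device_id, date) on which a device answered questions
-- q2: the same set, matched only when the device also answered on the next day
SELECT
  COUNT(q2.device_id) / COUNT(q1.device_id) AS avg_ret
FROM
  (
    SELECT
      DISTINCT device_id, date
    FROM
      question_practice_detail
  ) AS q1
LEFT JOIN
  (
    SELECT
      DISTINCT device_id, date
    FROM
      question_practice_detail
  ) AS q2
ON q1.device_id = q2.device_id AND q2.date = DATE_ADD(q1.date, INTERVAL 1 DAY);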
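To try the query locally without a MySQL server, here is a hedged port to SQLite via Python's sqlite3 module. Two changes are needed, and both are SQLite-specific substitutions rather than part of the original solution: SQLite has no DATE_ADD, so `date(q1.date, '+1 day')` is used instead, and SQLite truncates integer / integer, so the numerator is multiplied by 1.0 (MySQL's `/` already returns a decimal). The sample rows are hypothetical:

```python
import sqlite3

# Hypothetical activity: (device_id, date), with one duplicated row.
ROWS = [
    (2315, "2021-08-13"), (2315, "2021-08-13"), (2315, "2021-08-14"),
    (2315, "2021-08-15"), (2138, "2021-05-03"),
    (3214, "2021-05-09"), (3214, "2021-06-15"),
]

# SQLite port of the MySQL query:
#   DATE_ADD(q1.date, INTERVAL 1 DAY) -> date(q1.date, '+1 day')
#   1.0 * ... forces real division (SQLite's integer / integer truncates).
QUERY = """
SELECT 1.0 * COUNT(q2.device_id) / COUNT(q1.device_id) AS avg_ret
FROM (SELECT DISTINCT device_id, date FROM question_practice_detail) AS q1
LEFT JOIN (SELECT DISTINCT device_id, date FROM question_practice_detail) AS q2
  ON q1.device_id = q2.device_id
 AND q2.date = date(q1.date, '+1 day')
"""

def avg_ret(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE question_practice_detail (device_id INTEGER, date TEXT)"
    )
    conn.executemany("INSERT INTO question_practice_detail VALUES (?, ?)", rows)
    (value,) = conn.execute(QUERY).fetchone()
    conn.close()
    return value

print(avg_ret(ROWS))  # 2 retained device-days / 6 device-days
```

Because both sides of the join are deduplicated, each q1 row matches at most one q2 row, so the join cannot inflate either count.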

Note that in MySQL, COUNT(column) skips NULL values; only COUNT(*) counts every row.
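The NULL-skipping rule is standard SQL, so SQLite behaves the same way and can illustrate it. A minimal sketch with a hypothetical one-column table holding two NULLs:

```python
import sqlite3

def count_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?)", [(1,), (None,), (3,), (None,)])
    # COUNT(*) counts rows; COUNT(x) counts only rows where x IS NOT NULL.
    result = conn.execute("SELECT COUNT(*), COUNT(x) FROM t").fetchone()
    conn.close()
    return result

print(count_demo())  # (4, 2)
```

In the retention query, the NULLs come from unmatched rows of the LEFT JOIN, which is exactly why COUNT(q2.device_id) ends up smaller than COUNT(q1.device_id).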

[Takeaways]
Use a LEFT JOIN to join the deduplicated table to itself. When the condition q1.device_id = q2.device_id AND q2.date = DATE_ADD(q1.date, interval 1 day) holds (in a self-join a row with the same device_id always exists, so what decides the match is the DATE_ADD part), q2.device_id and q2.date are non-NULL, meaning the device answered questions on two consecutive days. When the condition does not hold, the LEFT JOIN fills q2.device_id with NULL, and COUNT does not count it. As a result, COUNT(q2.device_id) always equals the number of device-days followed by activity on the next day, while COUNT(q1.device_id) is the total number of distinct device-days.

[Example: the intermediate table produced by the LEFT JOIN]

| q1.device_id | q1.date | q2.device_id | q2.date |
|---|---|---|---|
| 2315 | 2021-08-13 | 2315 | 2021-08-14 |
| 2315 | 2021-08-14 | 2315 | 2021-08-15 |
| 2138 | 2021-05-03 | null | null |
| 3214 | 2021-05-09 | null | null |
| 3214 | 2021-06-15 | null | null |
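The intermediate table can be reproduced by selecting the joined columns instead of the counts. A sketch in SQLite via Python's sqlite3 module (again substituting `date(q1.date, '+1 day')` for MySQL's DATE_ADD), on hypothetical rows chosen to resemble the example table above, plus a 2021-08-15 row for device 2315 so that its second retained pair can appear:

```python
import sqlite3

# Hypothetical activity: (device_id, date), with one duplicated row.
ROWS = [
    (2315, "2021-08-13"), (2315, "2021-08-13"), (2315, "2021-08-14"),
    (2315, "2021-08-15"), (2138, "2021-05-03"),
    (3214, "2021-05-09"), (3214, "2021-06-15"),
]

# Same join as the solution, but returning the rows instead of COUNTs.
JOINED = """
SELECT q1.device_id, q1.date, q2.device_id, q2.date
FROM (SELECT DISTINCT device_id, date FROM question_practice_detail) AS q1
LEFT JOIN (SELECT DISTINCT device_id, date FROM question_practice_detail) AS q2
  ON q1.device_id = q2.device_id
 AND q2.date = date(q1.date, '+1 day')
ORDER BY q1.device_id, q1.date
"""

def joined_rows(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE question_practice_detail (device_id INTEGER, date TEXT)"
    )
    conn.executemany("INSERT INTO question_practice_detail VALUES (?, ?)", rows)
    result = conn.execute(JOINED).fetchall()
    conn.close()
    return result

for row in joined_rows(ROWS):
    print(row)  # unmatched q1 rows show None (SQL NULL) in the q2 columns
```

Of the six q1 rows, only two have a non-NULL q2.device_id, which matches a ratio of 2 / 6 for COUNT(q2.device_id) / COUNT(q1.device_id).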


Source: blog.csdn.net/QinLaoDeMaChu/article/details/128133760