At Tencent, NetEase, and other major game companies, analysts pay close attention to how many users are online and how many consecutive days they log in. So, given a login table in a database, how can we quickly find the users who logged in on N consecutive days?
Words always look pale next to reality, so let's illustrate the problem with a case:
Lao Zha (老渣, "Old Scumbag") and Xiao Zha (小渣, "Little Scumbag") are both masters of romance and time management. Since their last date, the rivalry between them has only grown, and now both of them go on dates with Xiaomei every day. Xiaomei has a habit: after every date she logs the date and the name of the person she went out with. In fact, Xiaomei is a player too; the best hunters often appear as prey. Now she wants to know which of them, Xiao Zha or Lao Zha, has dated her on 3 consecutive days. Can you help Xiaomei solve this problem?
First, look at the chart and think about how we would work this out by hand:
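As a baseline, here is the manual check written as a small Python sketch. It is an illustration added here, not the article's SQL. For each person it takes the distinct dates and asks whether some date d also has d+1 and d+2 logged. The data mirrors the login table created below, and the helper name `has_streak` is hypothetical.

```python
from datetime import date, timedelta

# Xiaomei's log, mirroring the login table created below: (date, name)
logins = [
    (date(2023, 8, 19), "老渣"), (date(2023, 8, 20), "老渣"), (date(2023, 8, 21), "老渣"),
    (date(2023, 8, 18), "小渣"), (date(2023, 8, 19), "小渣"),
]

def has_streak(days: set, n: int) -> bool:
    """Hypothetical helper: True if some d in days has d, d+1, ..., d+n-1 all present."""
    return any(all(d + timedelta(days=k) in days for k in range(n)) for d in days)

# Collect each person's distinct dates, then check each person for a 3-day streak.
by_name = {}
for d, name in logins:
    by_name.setdefault(name, set()).add(d)

winners = sorted(name for name, days in by_name.items() if has_streak(days, 3))
print(winners)  # ['老渣']
```

This works for a handful of rows. It does not translate naturally into a single SQL query, which is where the window-function approach comes in.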
Now let's simulate the scenario with code.
First, create the table and insert Xiaomei's log:
CREATE TABLE login(
DATE DATE,
NAME VARCHAR(20)
);
INSERT INTO login VALUES('2023-08-19','老渣'),('2023-08-20','老渣'),('2023-08-21','老渣'),
('2023-08-18','小渣'),('2023-08-19','小渣');
-- When you are finished experimenting, clean up with:
-- DROP TABLE login;
Step 1: remove duplicate rows, so that each user appears at most once per day:
WITH t1 AS(
SELECT DISTINCT NAME,DATE d FROM login
)
SELECT * FROM t1
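Why deduplicate first? The minimal Python sketch below uses hypothetical data, not data from the article. It looks ahead to steps 2 and 3: it numbers one person's sorted dates and subtracts each row number from its date.

Suppose Lao Zha's date on 2023-08-20 were logged twice. The duplicate would take an extra row number and split his real 3-day streak into two groups of 2. The helper `streak_lengths` is a hypothetical name.

```python
from datetime import date, timedelta

def streak_lengths(days):
    """Sort the dates, number them from 1, and count how many share each d - rn value."""
    counts = {}
    for rn, d in enumerate(sorted(days), start=1):
        temp = d - timedelta(days=rn)
        counts[temp] = counts.get(temp, 0) + 1
    return sorted(counts.values(), reverse=True)

# Hypothetical log in which 老渣 was recorded twice on 2023-08-20.
raw = [date(2023, 8, 19), date(2023, 8, 20), date(2023, 8, 20), date(2023, 8, 21)]

print(streak_lengths(raw))       # [2, 2]: the real 3-day streak is split
print(streak_lengths(set(raw)))  # [3]: after deduplication the streak is found
```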
Step 2: partition the rows by user (Lao Zha and Xiao Zha). Then number each user's dates in ascending order with ROW_NUMBER():
WITH t1 AS(
SELECT DISTINCT NAME,DATE d FROM login
),
t2 AS(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY NAME ORDER BY d) AS rn
FROM t1
)
SELECT * FROM t2
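To see what ROW_NUMBER() produces here, this Python sketch emulates PARTITION BY NAME ORDER BY d. It is an illustration only. It sorts the distinct (name, date) pairs, groups them by name with `itertools.groupby`, and numbers each group from 1. It assumes ISO date strings, which sort chronologically.

```python
from itertools import groupby

# Distinct (name, date) pairs from step 1; ISO date strings sort chronologically.
rows = sorted({("老渣", "2023-08-19"), ("老渣", "2023-08-20"), ("老渣", "2023-08-21"),
               ("小渣", "2023-08-18"), ("小渣", "2023-08-19")})

# Emulate ROW_NUMBER() OVER (PARTITION BY NAME ORDER BY d):
# rows are sorted by name, so groupby yields one group per user; number each from 1.
numbered = []
for name, group in groupby(rows, key=lambda r: r[0]):
    for rn, (_, d) in enumerate(group, start=1):
        numbered.append((name, d, rn))

for row in numbered:
    print(row)
```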
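Step 3: subtract rn days from each date to get a helper date, temp. Within a run of consecutive days, both d and rn go up by one on every row, so d - rn stays the same. A gap in the dates moves temp to a different value: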
WITH t1 AS(
SELECT DISTINCT NAME,DATE d FROM login
),
t2 AS(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY NAME ORDER BY d) AS rn
FROM t1
),
t3 AS(
SELECT *,
DATE_SUB(d,INTERVAL rn DAY) AS temp
FROM t2
)
SELECT * FROM t3
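Here is a worked example of step 3, sketched in Python with the rn values from step 2. It is an illustration only, and it uses `datetime.timedelta` in place of DATE_SUB:

```python
from datetime import date, timedelta

# (name, d, rn) rows as produced by step 2 (ROW_NUMBER per user).
t2 = [("小渣", date(2023, 8, 18), 1), ("小渣", date(2023, 8, 19), 2),
      ("老渣", date(2023, 8, 19), 1), ("老渣", date(2023, 8, 20), 2),
      ("老渣", date(2023, 8, 21), 3)]

# Equivalent of DATE_SUB(d, INTERVAL rn DAY): shift each date back by its row number.
t3 = [(name, d, rn, d - timedelta(days=rn)) for name, d, rn in t2]
for name, d, rn, temp in t3:
    print(name, d, rn, temp)
```

All three of Lao Zha's rows map to 2023-08-18, and both of Xiao Zha's rows map to 2023-08-17. Each consecutive run therefore collapses onto a single temp value.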
Step 4: group the rows by user and temp, and count each group. Keep only the groups with at least 3 rows, which means at least 3 consecutive days:
WITH t1 AS(
SELECT DISTINCT NAME,DATE d FROM login
),
t2 AS(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY NAME ORDER BY d) AS rn
FROM t1
),
t3 AS(
SELECT *,
DATE_SUB(d,INTERVAL rn DAY) AS temp
FROM t2
),
t4 AS(
SELECT NAME,temp,
COUNT(1) AS cnt
FROM t3
GROUP BY NAME,temp
HAVING COUNT(1)>=3
)
SELECT * FROM t4
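Here is step 4 in Python terms. This is a sketch, not the SQL itself. `collections.Counter` over the (name, temp) pairs plays the role of GROUP BY NAME, temp with COUNT(1), and the dictionary filter plays the role of HAVING:

```python
from collections import Counter
from datetime import date

# (name, temp) pairs as produced by step 3.
t3 = [("小渣", date(2023, 8, 17)), ("小渣", date(2023, 8, 17)),
      ("老渣", date(2023, 8, 18)), ("老渣", date(2023, 8, 18)), ("老渣", date(2023, 8, 18))]

# GROUP BY NAME, temp with COUNT(1), then HAVING COUNT(1) >= 3.
counts = Counter(t3)
t4 = {key: cnt for key, cnt in counts.items() if cnt >= 3}
print(t4)  # {('老渣', datetime.date(2023, 8, 18)): 3}
```

Only Lao Zha's group reaches 3. Xiao Zha's group has 2 rows, so it is filtered out.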
A user (say, Zhang San) might have two separate streaks of 3 or more days. That name would then appear twice in t4, which does not match the question we are answering and looks untidy. So finish by deduplicating the names, replacing the final SELECT * FROM t4 with:
SELECT DISTINCT NAME FROM t4
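You can check the whole pipeline end to end without a MySQL server. The sketch below runs it with Python's built-in `sqlite3` module against an in-memory copy of the table, and assumes a SQLite build with window-function support (3.25 or later).

It also makes two adaptations of its own:
- The columns are renamed `dt` and `name`.
- SQLite has no DATE_SUB, so `date(d, '-' || rn || ' days')` stands in for DATE_SUB(d, INTERVAL rn DAY).

```python
import sqlite3

# In-memory SQLite copy of the login table (columns renamed to dt/name for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE login(dt TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO login VALUES (?, ?)",
    [("2023-08-19", "老渣"), ("2023-08-20", "老渣"), ("2023-08-21", "老渣"),
     ("2023-08-18", "小渣"), ("2023-08-19", "小渣")],
)

# Same four CTE steps plus the final DISTINCT, in SQLite syntax.
query = """
WITH t1 AS (SELECT DISTINCT name, dt AS d FROM login),
t2 AS (SELECT name, d, ROW_NUMBER() OVER (PARTITION BY name ORDER BY d) AS rn FROM t1),
t3 AS (SELECT name, d, rn, date(d, '-' || rn || ' days') AS temp FROM t2),
t4 AS (SELECT name, temp, COUNT(1) AS cnt FROM t3
       GROUP BY name, temp HAVING COUNT(1) >= 3)
SELECT DISTINCT name FROM t4
"""
result = [row[0] for row in conn.execute(query)]
print(result)  # ['老渣']
```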
This step-by-step version is also the one interviewers prefer to see, because code structured like this is easy to read. You could compress it into a single nested query, but then it becomes hard for anyone other than the author to follow.
Here is the template to remember (N is the number of consecutive days):
distinct
->row_number
->date_sub(dt,rn) as dt2
->group by name,dt2
->having count(1)>=N
->distinct name
->count(name)
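The template can also be written as a plain Python function, one line per template step, which makes it easy to try different values of N. This sketch assumes logins arrive as (name, `datetime.date`) pairs. The function name `consecutive_users` is hypothetical.

```python
from datetime import date, timedelta
from itertools import groupby

def consecutive_users(logins, n):
    """Hypothetical helper: (sorted names with >= n consecutive days, number of such names)."""
    distinct_rows = sorted(set(logins))                    # distinct, ordered by name, date
    hits = set()                                           # distinct name
    for name, group in groupby(distinct_rows, key=lambda r: r[0]):
        counts = {}
        for rn, (_, d) in enumerate(group, start=1):       # row_number per user
            dt2 = d - timedelta(days=rn)                   # date_sub(dt, rn) as dt2
            counts[dt2] = counts.get(dt2, 0) + 1           # group by name, dt2
        if any(c >= n for c in counts.values()):           # having count(1) >= N
            hits.add(name)
    return sorted(hits), len(hits)                         # count(name)

logins = [("老渣", date(2023, 8, 19)), ("老渣", date(2023, 8, 20)), ("老渣", date(2023, 8, 21)),
          ("小渣", date(2023, 8, 18)), ("小渣", date(2023, 8, 19))]
print(consecutive_users(logins, 3))  # (['老渣'], 1)
print(consecutive_users(logins, 2))  # (['小渣', '老渣'], 2)
```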
The second method (to broaden your thinking): the LEAD window function
Template first:
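->distinct
->date_add(dt,N-1) as date2
->lead(dt,N-1) over(partition by userid order by dt) as date3
->where date2=date3
->distinct userid
After deduplication, each user's dates strictly increase. So if the date N-1 rows ahead is exactly N-1 days later, the N dates in between must be consecutive. Applied to our table with N = 3: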
WITH t1 AS (
SELECT DISTINCT NAME, DATE FROM login
),
t2 AS (
SELECT *,
DATE_ADD(DATE, INTERVAL 2 DAY) AS date2,
LEAD(DATE, 2) OVER (PARTITION BY NAME ORDER BY DATE) AS date3
FROM t1
)
SELECT DISTINCT NAME FROM t2 WHERE date2 = date3;
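The same LEAD approach can be tried in SQLite through Python's `sqlite3` module. This sketch makes its own assumptions:
- SQLite 3.25+ is available, since LEAD needs it.
- The date column is renamed `dt`.
- MySQL's DATE_ADD(..., INTERVAL 2 DAY) is swapped for SQLite's `date(dt, '+2 days')`.

N is fixed at 3, as in the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE login(dt TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO login VALUES (?, ?)",
    [("2023-08-19", "老渣"), ("2023-08-20", "老渣"), ("2023-08-21", "老渣"),
     ("2023-08-18", "小渣"), ("2023-08-19", "小渣")],
)

# date2: the date 2 days after dt; date3: the dt value 2 rows later for the same user.
query = """
WITH t1 AS (SELECT DISTINCT name, dt FROM login),
t2 AS (
  SELECT name, dt,
         date(dt, '+2 days') AS date2,
         LEAD(dt, 2) OVER (PARTITION BY name ORDER BY dt) AS date3
  FROM t1
)
SELECT DISTINCT name FROM t2 WHERE date2 = date3
"""
result = [row[0] for row in conn.execute(query)]
print(result)  # ['老渣']
```

For Lao Zha, the 2023-08-19 row has date2 = date3 = 2023-08-21. Xiao Zha has only two rows, so LEAD(dt, 2) is NULL for both and the comparison never succeeds.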
This approach needs less code than the first one and looks simpler, but it is much harder to understand. If you meet this kind of question in an interview, I suggest going straight to the first method; it will not let you down. Memorize the template and make sure you understand why it works.

Also, when learning SQL functions, keep in mind that some parameters differ between database versions and dialects. For example, MySQL writes DATE_SUB(d, INTERVAL rn DAY), while Hive and Spark SQL write date_sub(d, rn). Don't get hung up on these differences. Follow the examples and use the syntax of the SQL dialect you actually work with.