5 must-know SQL interview questions

If you are learning Python for data work, you also need to learn SQL: it is tested for almost every data position. Below are some SQL exercises commonly asked at large tech companies.

(1) Find the users who logged in on 7 consecutive days and on 30 consecutive days (Xiaohongshu written test, China Telecom Cloud interview). This is a variant of the maximum-consecutive-login-days problem. Technique: window functions.

(2) Find the users who clicked three times in a row with no other user's click in between, another variant of the consecutive-days problem (Tencent Weibo interview). Technique: window functions.

(3) Compute each department's average salary excluding the department's highest and lowest salaries (ByteDance interview). Technique: window functions.

(4) Compute retention and cumulative sums (Pinduoduo interview). Techniques: window functions and self-joins.

(5) Given a scoring log for teams A and B, find the players who scored three times in a row and the players whose basket put their team ahead (Pinduoduo interview).

Master these question types and you will no longer dread hand-writing SQL in interviews or written tests. Question (5) is the hardest. Interview SQL mostly comes down to window functions, with CASE WHEN also appearing often.


(1) Find the users who logged in on 7 consecutive days and on 30 consecutive days

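select distinct user_id
from (
    select user_id, dts, count(1) as num
    from
       (select user_id, date_sub(log_in_date, rn) as dts     -- constant within a run of consecutive days
        from  (select user_id, log_in_date,
                      row_number() over(partition by user_id order by log_in_date) as rn
               from (select distinct user_id, log_in_date from user_log) u  -- one row per user per day (log_in_date assumed to be a date)
              ) t
       ) a
    group by user_id, dts
) b
where num >= 7    -- at least 7 days; use 30 for the 30-day version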
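To check the date-minus-row-number idea end to end, here is a minimal sketch using Python's built-in sqlite3 module (window functions need SQLite 3.25 or later). SQLite has no date_sub, so this sketch subtracts the row number from julianday(log_in_date) instead. The function name users_with_streak, the sample rows, and the small thresholds are hypothetical, chosen only to keep the data short.

```python
import sqlite3

def users_with_streak(rows, k):
    """Return user_ids that logged in on at least k consecutive days (hypothetical helper)."""
    con = sqlite3.connect(":memory:")
    con.execute("create table user_log (user_id text, log_in_date text)")
    con.executemany("insert into user_log values (?, ?)", rows)
    sql = """
    select distinct user_id
    from (
        select user_id, grp, count(*) as num
        from (
            -- day number minus row number is constant within a run of consecutive days
            select user_id,
                   julianday(log_in_date)
                   - row_number() over (partition by user_id order by log_in_date) as grp
            from (select distinct user_id, log_in_date from user_log)  -- one row per user per day
        )
        group by user_id, grp
    )
    where num >= ?
    order by user_id
    """
    result = [r[0] for r in con.execute(sql, (k,))]
    con.close()
    return result

# Hypothetical log: u1 logs in on 3 straight days (twice on day 2),
# u2 skips 2022-03-03, so its longest streak is 2 days.
log = [
    ("u1", "2022-03-01"), ("u1", "2022-03-02"), ("u1", "2022-03-02"), ("u1", "2022-03-03"),
    ("u2", "2022-03-01"), ("u2", "2022-03-02"), ("u2", "2022-03-04"),
]
print(users_with_streak(log, 3))  # ['u1']
```

The DISTINCT subquery matters: without it, u1's two logins on 2022-03-02 would get different row numbers and split the streak into two groups.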

(2) Find the users who clicked three times in a row with no other user's click in between

Table a records the click stream: the user ID (usr_id) and the click time (click_time). For example:

usr_id       a    a    b    a    a    a    a
click_time   t1   t2   t3   t4   t5   t6   t7
rank_1       1    2    3    4    5    6    7
rank_2       1    2    1    3    4    5    6
diff         0    0    2    1    1    1    1

where rank_1 = row_number() over(order by click_time), rank_2 = row_number() over(partition by usr_id order by click_time), and diff = rank_1 - rank_2.

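Within one uninterrupted run of clicks by the same user, rank_1 and rank_2 both grow by 1 per click, so diff stays constant; any other user's click in between raises diff. So we group by (usr_id, diff) and keep groups with at least 3 rows: these are the users who clicked three or more times in a row with no one else's click in between. Grouping by diff alone is not enough, because different users can share the same diff value.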
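The rank_1 / rank_2 / diff rows above can be reproduced in plain Python to see the mechanics: rank_1 counts every click so far, rank_2 counts only the current user's clicks, and their difference is the number of other users' clicks seen so far. A minimal sketch; the function name click_diffs is hypothetical.

```python
from collections import Counter

def click_diffs(users):
    """Given usr_id values in click_time order, return (rank_1, rank_2, diff) per click."""
    per_user = Counter()
    out = []
    for rank_1, user in enumerate(users, start=1):  # like row_number() over (order by click_time)
        per_user[user] += 1                         # like row_number() over (partition by usr_id ...)
        rank_2 = per_user[user]
        out.append((rank_1, rank_2, rank_1 - rank_2))
    return out

rows = click_diffs(["a", "a", "b", "a", "a", "a", "a"])
print([d for _, _, d in rows])  # [0, 0, 2, 1, 1, 1, 1]
```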

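select distinct usr_id
from
(
   select *, rank_1 - rank_2 as diff      -- constant within one uninterrupted run of a user's clicks
   from
   (
      select *,
      row_number() over(order by click_time) as rank_1,                     -- position among all clicks
      row_number() over(partition by usr_id order by click_time) as rank_2  -- position among this user's clicks
      from a
   ) b
) c
group by usr_id, diff
having count(*) >= 3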
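A minimal sketch that runs the same query in SQLite through Python's sqlite3 module (SQLite 3.25+ assumed for window functions). The function name consecutive_clickers is hypothetical; the data is the example from the table above, with t1 to t7 stored as text so they sort in click order.

```python
import sqlite3

def consecutive_clickers(clicks, k=3):
    """Users with at least k consecutive clicks and no other user's click in between."""
    con = sqlite3.connect(":memory:")
    con.execute("create table a (usr_id text, click_time text)")
    con.executemany("insert into a values (?, ?)", clicks)
    sql = """
    select distinct usr_id
    from (
        select *, rank_1 - rank_2 as diff
        from (
            select *,
                   row_number() over (order by click_time) as rank_1,
                   row_number() over (partition by usr_id order by click_time) as rank_2
            from a
        )
    )
    group by usr_id, diff
    having count(*) >= ?
    order by usr_id
    """
    result = [r[0] for r in con.execute(sql, (k,))]
    con.close()
    return result

clicks = [("a", "t1"), ("a", "t2"), ("b", "t3"), ("a", "t4"),
          ("a", "t5"), ("a", "t6"), ("a", "t7")]
print(consecutive_clickers(clicks))  # ['a']
```

User a qualifies through the run t4 to t7 (diff = 1, four rows); the run t1 to t2 (diff = 0) has only two rows.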

(3) Compute each department's average salary excluding the department's highest and lowest salaries (ByteDance interview). Technique: window functions.

Table emp has the columns id (employee ID), deptno (department number), and salary.

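The key is to rank salaries within each department twice, in ascending and in descending order. Rows ranked 1 in either order hold the department's lowest or highest salary and are filtered out before averaging. Because rank() gives tied rows the same rank, every employee tied for the highest or lowest salary is excluded.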

select a.deptno, avg(a.salary) as avg_salary
from
 (
 select *, rank() over( partition by deptno order by salary ) as rank_1         -- 1 = lowest
 , rank() over( partition by deptno order by salary desc) as rank_2             -- 1 = highest
 from emp
 )  a
where a.rank_1 > 1 and a.rank_2 > 1    -- WHERE must come before GROUP BY
group by a.deptno
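A minimal sketch of the same query in SQLite via Python's sqlite3 module (SQLite 3.25+ assumed). The function name trimmed_dept_avg and the salary rows are hypothetical; department 20 has two employees tied for the highest salary to show how rank() treats ties.

```python
import sqlite3

def trimmed_dept_avg(rows):
    """Per-department average salary, excluding the highest and lowest salaries."""
    con = sqlite3.connect(":memory:")
    con.execute("create table emp (id integer, deptno integer, salary real)")
    con.executemany("insert into emp values (?, ?, ?)", rows)
    sql = """
    select deptno, avg(salary)
    from (
        select *,
               rank() over (partition by deptno order by salary) as rank_1,       -- 1 = lowest
               rank() over (partition by deptno order by salary desc) as rank_2   -- 1 = highest
        from emp
    )
    where rank_1 > 1 and rank_2 > 1
    group by deptno
    order by deptno
    """
    result = con.execute(sql).fetchall()
    con.close()
    return result

emp = [
    (1, 10, 3000), (2, 10, 5000), (3, 10, 7000), (4, 10, 9000),
    (5, 20, 4000), (6, 20, 6000), (7, 20, 6000),
]
print(trimmed_dept_avg(emp))  # [(10, 6000.0)]
```

Department 20 drops out entirely: 4000 is the lowest, and both 6000 rows share rank_2 = 1, so no rows remain. If only one row per extreme should be removed, row_number() would be the alternative to rank().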

(4) Compute retention and cumulative sums (Pinduoduo interview). Techniques: window functions and self-joins.

The camera is one of the most popular apps on a phone. The question gives a screenshot of sample rows from the user behavior table in a phone manufacturer's database; judging from the query below, the table is 登录信息 (login information) with the columns 用户id (user ID), 应用名称 (app name), and 登陆时间 (login time).


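The manufacturer wants to analyze how actively the camera app is used. For each date, count the users who used the camera on that date and again exactly 1, 3, and 7 days later, and divide each count by that date's number of active camera users. The expected output has one row per date with the columns 次日留存数 / 次日留存率 (next-day retained users / retention rate), 3日留存数 / 3日留存率 (3-day), and 7日留存数 / 7日留存率 (7-day).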

select d.a_t,
       count(distinct case when d.时间间隔 = 1 then d.用户id end) as `次日留存数`,
       count(distinct case when d.时间间隔 = 1 then d.用户id end) / count(distinct d.用户id) as `次日留存率`,
       count(distinct case when d.时间间隔 = 3 then d.用户id end) as `3日留存数`,
       count(distinct case when d.时间间隔 = 3 then d.用户id end) / count(distinct d.用户id) as `3日留存率`,
       count(distinct case when d.时间间隔 = 7 then d.用户id end) as `7日留存数`,
       count(distinct case when d.时间间隔 = 7 then d.用户id end) / count(distinct d.用户id) as `7日留存率`
from
(
    -- gap in calendar days between two camera logins of the same user;
    -- datediff counts calendar days even if 登陆时间 holds a datetime
    select c.*, datediff(c.b_t, c.a_t) as 时间间隔
    from
    (
        -- self-join: pair every camera login (a) with every camera login of the same user (b);
        -- each row also pairs with itself (gap 0), so every active user is in the denominator
        select a.`用户id`, date(a.登陆时间) as a_t, date(b.登陆时间) as b_t
        from 登录信息 as a
        join 登录信息 as b
          on a.`用户id` = b.`用户id`
        where a.应用名称 = '相机' and b.应用名称 = '相机'    -- 相机 = camera
    ) as c
) as d
group by d.a_t;
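A minimal sketch of the same self-join retention logic in SQLite via Python's sqlite3 module (SQLite 3.25+ assumed). SQLite has no DATEDIFF or TIMESTAMPDIFF, so the gap is computed with julianday(). The table logins and its columns user_id, app, login_time, and the app value 'camera', are hypothetical English stand-ins for 登录信息, 用户id, 应用名称, 登陆时间, and '相机'; the function name retention and the sample rows are also hypothetical.

```python
import sqlite3

def retention(rows, gap_days):
    """Per login date: users active again exactly gap_days later, and that count / active users."""
    con = sqlite3.connect(":memory:")
    con.execute("create table logins (user_id text, app text, login_time text)")
    con.executemany("insert into logins values (?, ?, ?)", rows)
    sql = """
    select d.a_t,
           count(distinct case when d.gap = ? then d.user_id end) as retained,
           count(distinct case when d.gap = ? then d.user_id end) * 1.0
             / count(distinct d.user_id) as rate
    from (
        select a.user_id, date(a.login_time) as a_t,
               cast(julianday(date(b.login_time)) - julianday(date(a.login_time)) as integer) as gap
        from logins a
        join logins b on a.user_id = b.user_id        -- self-join on the same user
        where a.app = 'camera' and b.app = 'camera'
    ) d
    group by d.a_t
    order by d.a_t
    """
    result = con.execute(sql, (gap_days, gap_days)).fetchall()
    con.close()
    return result

rows = [
    ("u1", "camera", "2022-03-01 10:00:00"),
    ("u1", "camera", "2022-03-02 09:00:00"),
    ("u2", "camera", "2022-03-01 12:00:00"),
    ("u2", "camera", "2022-03-04 08:00:00"),
    ("u3", "camera", "2022-03-02 11:00:00"),
    ("u3", "album", "2022-03-03 11:00:00"),  # a different app: filtered out
]
print(retention(rows, 1))  # [('2022-03-01', 1, 0.5), ('2022-03-02', 0, 0.0), ('2022-03-04', 0, 0.0)]
```

The "* 1.0" forces real division in SQLite; MySQL's "/" already returns a decimal, so the query above does not need it.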

(5) Given a scoring log for teams A and B, find the players who scored three times in a row and the players whose basket put their team ahead (Pinduoduo interview)

While reviewing the interview afterward, I found a very similar question. It was the hardest question I met in my interviews.

Question: Two basketball teams play a close game in which the lead changes back and forth. After the game you have a scoring log for both teams that records the team (team), player number (number), player name (name), points scored (score), and scoring time (score_time, a datetime). The teams want to reward standout players, so use SQL to find:

1) The players who scored for their team three (or more) times in a row

2) The names of the players whose basket put their team ahead during the game, and the time of each such basket.

Create a table with similar data:

CREATE TABLE basketball_game_score_detail(
   team  VARCHAR(40) NOT NULL ,
   number VARCHAR(100) NOT NULL,
   score_time datetime NOT NULL,
   score int NOT NULL,
   name varchar(100)  NOT NULL
);
insert into  basketball_game_score_detail values('A',1,'2020/8/28 9:01:14',1,'A1');
insert into  basketball_game_score_detail values('A',5,'2020/8/28 9:02:28',1,'A5');
insert into  basketball_game_score_detail values('B',4,'2020/8/28 9:03:42',3,'B4');
insert into  basketball_game_score_detail values('A',4,'2020/8/28 9:04:55',3,'A4');
insert into  basketball_game_score_detail values('B',1,'2020/8/28 9:06:09',3,'B1');
insert into  basketball_game_score_detail values('A',3,'2020/8/28 9:07:23',3,'A3');
insert into  basketball_game_score_detail values('A',4,'2020/8/28 9:08:37',3,'A4');
insert into  basketball_game_score_detail values('B',1,'2020/8/28 9:09:51',2,'B1');
insert into  basketball_game_score_detail values('B',2,'2020/8/28 9:11:05',2,'B2');
insert into  basketball_game_score_detail values('B',4,'2020/8/28 9:12:18',1,'B4');
insert into  basketball_game_score_detail values('A',1,'2020/8/28 9:13:32',2,'A1');
insert into  basketball_game_score_detail values('A',1,'2020/8/28 9:14:46',1,'A1');
insert into  basketball_game_score_detail values('A',4,'2020/8/28 9:16:00',1,'A4');
insert into  basketball_game_score_detail values('B',3,'2020/8/28 9:17:14',3,'B3');
insert into  basketball_game_score_detail values('B',2,'2020/8/28 9:18:28',3,'B2');
insert into  basketball_game_score_detail values('A',2,'2020/8/28 9:19:42',3,'A2');
insert into  basketball_game_score_detail values('A',1,'2020/8/28 9:20:55',1,'A1');
insert into  basketball_game_score_detail values('B',3,'2020/8/28 9:22:09',2,'B3');
insert into  basketball_game_score_detail values('B',3,'2020/8/28 9:23:23',3,'B3');
insert into  basketball_game_score_detail values('A',5,'2020/8/28 9:24:37',2,'A5');
insert into  basketball_game_score_detail values('B',1,'2020/8/28 9:25:51',3,'B1');
insert into  basketball_game_score_detail values('B',2,'2020/8/28 9:27:05',1,'B2');
insert into  basketball_game_score_detail values('A',3,'2020/8/28 9:28:18',1,'A3');
insert into  basketball_game_score_detail values('B',4,'2020/8/28 9:29:32',1,'B4');
insert into  basketball_game_score_detail values('A',1,'2020/8/28 9:30:46',3,'A1');
insert into  basketball_game_score_detail values('B',1,'2020/8/28 9:32:00',1,'B1');
insert into  basketball_game_score_detail values('A',4,'2020/8/28 9:33:14',2,'A4');
insert into  basketball_game_score_detail values('B',1,'2020/8/28 9:34:28',1,'B1');
insert into  basketball_game_score_detail values('B',5,'2020/8/28 9:35:42',2,'B5');
insert into  basketball_game_score_detail values('A',1,'2020/8/28 9:36:55',1,'A1');
insert into  basketball_game_score_detail values('B',1,'2020/8/28 9:38:09',3,'B1');
insert into  basketball_game_score_detail values('A',1,'2020/8/28 9:39:23',3,'A1');
insert into  basketball_game_score_detail values('B',2,'2020/8/28 9:40:37',3,'B2');
insert into  basketball_game_score_detail values('A',3,'2020/8/28 9:41:51',3,'A3');
insert into  basketball_game_score_detail values('A',1,'2020/8/28 9:43:05',2,'A1');
insert into  basketball_game_score_detail values('B',3,'2020/8/28 9:44:18',3,'B3');
insert into  basketball_game_score_detail values('A',5,'2020/8/28 9:45:32',2,'A5');
insert into  basketball_game_score_detail values('B',5,'2020/8/28 9:46:46',3,'B5');


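Here I use LEAD and LAG to compare each score with the neighboring scores of the same team. This differs from the maximum-consecutive-days problem and could be solved in a similar way, but LEAD and LAG are easier to follow. A row qualifies if the same player also made the next two scores, the previous and the next score, or the previous two scores.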

select distinct a.name ,a.team from
(
select *,lead(name,1) over(partition by team order by score_time) as ld1
,lead(name,2) over(partition by team order by score_time) as ld2
,lag(name,1) over(partition by team order by score_time) as lg1
,lag(name,2) over(partition by team order by score_time) as lg2
from basketball_game_score_detail
) a
where (a.name =a.ld1 and a.name =a.ld2)
or (a.name =a.ld1 and a.name =a.lg1)
or (a.name=a.lg1 and a.name=a.lg2)
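A minimal sketch of the LEAD-based check in SQLite via Python's sqlite3 module (SQLite 3.25+ assumed). It keeps only the first WHERE condition: every run of three or more has a first row whose next two team scores are by the same player, so after DISTINCT the player list is the same as with all three conditions. The function name three_in_a_row and the rows are hypothetical; timestamps use zero-padded ISO format because SQLite orders them as text.

```python
import sqlite3

def three_in_a_row(rows):
    """(name, team) of players who made three consecutive scores for their team."""
    con = sqlite3.connect(":memory:")
    con.execute("create table basketball_game_score_detail "
                "(team text, number text, score_time text, score int, name text)")
    con.executemany("insert into basketball_game_score_detail values (?, ?, ?, ?, ?)", rows)
    sql = """
    select distinct a.name, a.team
    from (
        select *,
               lead(name, 1) over (partition by team order by score_time) as ld1,
               lead(name, 2) over (partition by team order by score_time) as ld2
        from basketball_game_score_detail
    ) a
    where a.name = a.ld1 and a.name = a.ld2   -- this score starts a run of 3+
    order by a.name
    """
    result = con.execute(sql).fetchall()
    con.close()
    return result

rows = [
    ("A", "1", "2020-08-28 09:01:00", 2, "A1"),
    ("A", "1", "2020-08-28 09:02:00", 2, "A1"),
    ("B", "4", "2020-08-28 09:03:00", 3, "B4"),  # the opponent's basket does not break A1's run
    ("A", "1", "2020-08-28 09:04:00", 1, "A1"),
    ("B", "2", "2020-08-28 09:05:00", 2, "B2"),
    ("B", "4", "2020-08-28 09:06:00", 2, "B4"),
    ("B", "4", "2020-08-28 09:07:00", 2, "B4"),
]
print(three_in_a_row(rows))  # [('A1', 'A')]
```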

I did not fully solve the second part during the interview; I only described my approach, and looking back, that approach had problems. The question is actually not hard: the key is to build a table of both teams' cumulative scores at every scoring moment. First, split each row's points into an A_score column and a B_score column:

SELECT TEAM,number,name,score_time,score,case when team='A' then score else 0 end as A_score
,case when team='B' then score else 0 end B_score
FROM basketball_game_score_detail
ORDER BY SCORE_time


A running SUM ordered by score_time then gives each team's cumulative score at every moment:

select team,number,name,score_time,A_score,b_score
,sum(A_score)over(order by score_time) as  a_sum_score2
,sum(b_score)over(order by score_time) as b_sum_score2
from 
(
  SELECT TEAM,number,name,score_time,score,case when team='A' then score else 0 end as A_score
  ,case when team='B' then score else 0 end B_score
  FROM basketball_game_score_detail
  ORDER BY SCORE_time
) as x
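The running totals can be checked by hand. Below is a minimal pure-Python sketch over the first six rows of the sample data (team, name, and score only, already in score_time order): the list comprehensions play the role of the CASE WHEN columns, and itertools.accumulate plays the role of SUM(...) OVER (ORDER BY score_time).

```python
from itertools import accumulate

# First six rows of basketball_game_score_detail in score_time order: (team, name, score)
rows = [("A", "A1", 1), ("A", "A5", 1), ("B", "B4", 3), ("A", "A4", 3), ("B", "B1", 3), ("A", "A3", 3)]

# CASE WHEN team = 'A' THEN score ELSE 0 END, and likewise for B
a_score = [s if t == "A" else 0 for t, _, s in rows]
b_score = [s if t == "B" else 0 for t, _, s in rows]

# SUM(...) OVER (ORDER BY score_time) is a running total
a_sum = list(accumulate(a_score))
b_sum = list(accumulate(b_score))

for (team, name, _), a, b in zip(rows, a_sum, b_sum):
    print(team, name, a, b)
```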


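Next, compute the cumulative score difference (A minus B) at each moment and at the previous moment. If the two differences have opposite signs, or the previous difference was 0 (a tie) and the current one is not, the team that just scored has taken the lead. That is why the query below keeps rows where score_gap * last_score_gap <= 0 and the two cumulative scores differ.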

select *,score_gap*last_score_gap
from 
(
 select  *,a_sum_score2-b_sum_score2 as score_gap 
 ,lag(a_sum_score2-b_sum_score2,1)over(order by score_time) as last_score_gap
 from 
 (
  select team,number,name,score_time,A_score,b_score
  ,sum(A_score)over(order by score_time) as  a_sum_score2
  ,sum(b_score)over(order by score_time) as b_sum_score2
  from (
   SELECT TEAM,number,name,score_time,score,case when team='A' then score else 0 end as A_score
   ,case when team='B' then score else 0 end B_score
   FROM basketball_game_score_detail
   ORDER BY SCORE_time
  ) as x
 ) as y
) as z
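where z.score_gap*z.last_score_gap<=0    -- the lead changed sign, or the previous moment was a tie
and z.a_sum_score2<>z.b_sum_score2       -- and the score is not level now
order by z.score_time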
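A minimal end-to-end sketch of the go-ahead query, run with Python's sqlite3 module (SQLite 3.25+ assumed for window functions). It uses the first six rows of the sample data, with timestamps rewritten in zero-padded ISO format because SQLite compares them as text and '2020/8/28 9:...' strings would stop sorting chronologically once the hour reaches two digits. The aliases a_sum and b_sum are shorter, hypothetical names for a_sum_score2 and b_sum_score2.

```python
import sqlite3

rows = [  # first six rows of the sample data, timestamps in ISO format
    ("A", "1", "2020-08-28 09:01:14", 1, "A1"),
    ("A", "5", "2020-08-28 09:02:28", 1, "A5"),
    ("B", "4", "2020-08-28 09:03:42", 3, "B4"),
    ("A", "4", "2020-08-28 09:04:55", 3, "A4"),
    ("B", "1", "2020-08-28 09:06:09", 3, "B1"),
    ("A", "3", "2020-08-28 09:07:23", 3, "A3"),
]

con = sqlite3.connect(":memory:")
con.execute("create table basketball_game_score_detail "
            "(team text, number text, score_time text, score int, name text)")
con.executemany("insert into basketball_game_score_detail values (?, ?, ?, ?, ?)", rows)

sql = """
select name, score_time
from (
    select *,
           a_sum - b_sum as score_gap,
           lag(a_sum - b_sum, 1) over (order by score_time) as last_score_gap
    from (
        select *,
               sum(case when team = 'A' then score else 0 end) over (order by score_time) as a_sum,
               sum(case when team = 'B' then score else 0 end) over (order by score_time) as b_sum
        from basketball_game_score_detail
    )
)
where score_gap * last_score_gap <= 0   -- sign flipped, or the previous moment was a tie
  and score_gap <> 0                    -- and the score is not level now
order by score_time
"""
go_ahead = con.execute(sql).fetchall()
con.close()
for name, t in go_ahead:
    print(name, t)
```

On these rows the score goes 1:0, 2:0, 2:3, 5:3, 5:6, 8:6, so B4, A4, B1, and A3 each put their team ahead. The first row is excluded because LAG returns NULL there, and a NULL product fails the WHERE test.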


Origin blog.csdn.net/weixin_38037405/article/details/123941304