Write a SQL statement to list the players who scored for their team three or more times in a row

caf6deb7cfd9191d19c88daff495533a.jpeg

[Pinduoduo Interview Questions]

Two basketball teams played a fierce game, with the scores rising in turn. After the game, you have a breakdown of both teams' scoring:

d33fd77ae54af20e58d290fae571d65c.png

The table records the team, player number, player name, points scored, and the time of each score. Now the team wants to reward the outstanding players in the game.

Question:

Write a SQL statement to list the players who scored for their team three or more times in a row.

【Problem solving steps】

1. Window function

The requirement, in plain words: in [each team], find the [names] of players who scored for the team three or more times in a row.

Whenever a problem says "each", think of grouping or window functions, as mentioned in "Monkeys Learn SQL from Zero".

This is a "consecutive occurrences" problem: scoring three or more times in a row only has meaning once the scoring records are ordered from earliest to latest. So use a window function that first partitions by team and then orders by scoring time.

For example, in the picture below the rows are partitioned by team and ordered by scoring time, and we can see that player A1 in team A and player B3 in team B each appear three times in a row.

9bddd315ed930f09e6242756e6504a79.jpeg

The corresponding window function query is as follows:

select *,
       rank() over(partition by 球队 
                   order by 得分时间) as 排名
from 分数表;

Query result:

f8881bee4a2d1079f797eadd1f1222ce.png

In the result above we can spot by eye that A1 appears 3 times in a row. But how can a SQL statement find every player name that appears 3 times in a row?

2. Find the value that appears 3 times in a row

If we shift column 1, "Player Name", up by 1 row to form column 2, and up by 2 rows to form column 3, then any 3 consecutive values in column 1 end up in the same row. For example, in the figure below, the three consecutive A1 values in column 1 now sit in one row.

a16f95b9372bdfe9a1300c8089f91def.jpeg

After this change, a where clause that requires the three columns to be equal is enough to filter out the names of players that appear three times in a row.

So how do we produce these two shifted columns in SQL?

You can use the window functions lead or lag:

Upward window function lead: shifts the column up by N rows, so each row receives the value found N rows after it in the window order, returned as a separate column.

Downward window function lag: shifts the column down by N rows, so each row receives the value found N rows before it in the window order, returned as a separate column.

The syntax is as follows:

lag(field name, N, default value) over(partition by ... order by ...)

lead(field name, N, default value) over(partition by ... order by ...)

The default value is returned when the row N rows before or after the current row falls outside the partition. If no default value is specified, null is returned.
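To make the default value concrete, here is a minimal sketch that runs lead and lag through Python's built-in sqlite3 module. It assumes a SQLite build with window function support (3.25 or later). The three rows, the 'none' default, and the aliases next_name and prev_name are made up for illustration; only the table and column names come from the queries in this article.

```python
import sqlite3

# In-memory database with a trimmed-down 分数表 (sample rows are invented).
conn = sqlite3.connect(":memory:")
conn.execute("create table 分数表 (球队 text, 球员姓名 text, 得分时间 text)")
conn.executemany(
    "insert into 分数表 values (?, ?, ?)",
    [
        ("A", "A1", "10:00"),
        ("A", "A2", "10:05"),
        ("A", "A3", "10:09"),
    ],
)

rows = conn.execute(
    """
    select 球员姓名,
           -- value one row later; 'none' when there is no later row
           lead(球员姓名, 1, 'none') over (partition by 球队 order by 得分时间) as next_name,
           -- value one row earlier; no default given, so null at the first row
           lag(球员姓名, 1) over (partition by 球队 order by 得分时间) as prev_name
    from 分数表
    order by 得分时间
    """
).fetchall()

for row in rows:
    print(row)
```

The last row gets the explicit default from lead, while the first row gets null (None in Python) from lag because no default was supplied.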

This is still abstract, so let's look at an example.

The figure below uses the upward window function lead to build column 2, which is the player name column shifted up by 1 row. Where the shifted column runs past the end of the data there is no row to take a value from, so the default value is used there (null if no default value is set).

e90282091faa5d607f5afb2505c6e032.jpeg

The corresponding SQL statement:

select 球员姓名,
       lead(球员姓名,1) over(partition by 球队 
                            order by 得分时间) as 下一项
from 分数表;

The figure below uses the downward window function lag to build column 2, which is the player name column shifted down by 1 row:

403c101bb576ee4720d4f7e969599e7a.jpeg

The corresponding SQL statement is as follows:

select 球员姓名,
       lag(球员姓名,1) over(partition by 球队 
                           order by 得分时间) as 上一行
from 分数表;

According to the previous analysis, we need the player name column shifted up by 1 row and by 2 rows, that is:

lead(球员姓名, 1)

lead(球员姓名, 2)

03bcf356b6235ceb8b5ff8f350e01599.jpeg

The corresponding SQL is as follows:

select 球员姓名,
       lead(球员姓名,1) over(partition by 球队 order by 得分时间) as 姓名1,
       lead(球员姓名,2) over(partition by 球队 order by 得分时间) as 姓名2
from 分数表;

Query result:

c03ed110ebd6dd3d700c7c38c49d96c0.jpeg

3. SQL running order

With the two shifted columns in place, a where clause can filter the rows in which the three values are equal, that is, 球员姓名 = 姓名1 and 球员姓名 = 姓名2.

Note, however, that the where clause cannot simply be appended to the query above. In SQL's running order, the from and where clauses run first and the select clause runs afterwards.

The columns 姓名1 and 姓名2 therefore do not exist until select runs, so we wrap the query in a subquery and filter in the outer query. The same player can match on more than one row, so the resulting names also need to be deduplicated (distinct).

select distinct 球员姓名
from (
    select 球员姓名,
           lead(球员姓名,1) over(partition by 球队 order by 得分时间) as 姓名1,
           lead(球员姓名,2) over(partition by 球队 order by 得分时间) as 姓名2
    from 分数表
) as a
where (a.球员姓名 = a.姓名1 and a.球员姓名 = a.姓名2);

Query result:

d201b809d037e6650a565a2f96de8676.png

The window function lag can also be used here and gives the same result. The principle is similar; try drawing the shifted columns yourself as practice.
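The claim that lead and lag give the same answer can be checked with a small sketch using Python's built-in sqlite3 module (assuming SQLite 3.25 or later for window functions). The rows below are invented sample data, not the data from the article's figures, and the table is trimmed to the three columns the query needs. Player A2 is included on purpose: A2 scores three times in total but never three times in a row, so A2 should not be returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table 分数表 (球队 text, 球员姓名 text, 得分时间 text)")
conn.executemany(
    "insert into 分数表 values (?, ?, ?)",
    [
        # Team A in time order: A1, A1, A1, A2, A1, A2, A2
        ("A", "A1", "10:00"), ("A", "A1", "10:02"), ("A", "A1", "10:04"),
        ("A", "A2", "10:06"), ("A", "A1", "10:08"),
        ("A", "A2", "10:10"), ("A", "A2", "10:12"),
        # Team B in time order: B1, B3, B3, B3, B2
        ("B", "B1", "10:01"), ("B", "B3", "10:03"), ("B", "B3", "10:05"),
        ("B", "B3", "10:07"), ("B", "B2", "10:09"),
    ],
)

# The article's final query, with the window function name left as a slot.
QUERY = """
select distinct 球员姓名
from (
    select 球员姓名,
           {fn}(球员姓名, 1) over (partition by 球队 order by 得分时间) as 姓名1,
           {fn}(球员姓名, 2) over (partition by 球队 order by 得分时间) as 姓名2
    from 分数表
) as a
where a.球员姓名 = a.姓名1 and a.球员姓名 = a.姓名2
"""


def streak_players(fn):
    """Run the query with 'lead' or 'lag' and return the names sorted."""
    rows = conn.execute(QUERY.format(fn=fn)).fetchall()
    return sorted(name for (name,) in rows)


lead_result = streak_players("lead")
lag_result = streak_players("lag")
print(lead_result, lag_result)
```

With lead the match lands on the first row of a streak, and with lag on the last row, but after distinct the set of names is the same.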

[Test points for this question]

1. Examine SQL's running order and subqueries

2. Which problems call for window functions?

"Monkeys Learn SQL from Zero" lists the following business scenarios as ones that call for window functions:

1) Classic topN problem

2) Classic ranking problem

3) Within-group comparison problems

4) Cumulative summation problem

5) Moving average problem

6) "Appears N times in a row" problems

3. Examine the usage of the window functions lag and lead

These two functions are typically used to compute the difference between adjacent rows, for example:

1) Computing time spent. For example, given a record of the time at which each user opened each webpage, shifting the time column by one row and subtracting the two columns gives the time each user actually spent on each page.

2) Computing how much a salary increased compared with the previous one.
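The time-spent case in 1) can be sketched as follows with Python's built-in sqlite3 module (assuming SQLite 3.25 or later). Everything about the data is hypothetical: the table page_views, its columns user_id, page and viewed_at, and the sample rows are invented for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical browsing log: one row per page a user opened.
conn.execute("create table page_views (user_id text, page text, viewed_at text)")
conn.executemany(
    "insert into page_views values (?, ?, ?)",
    [
        ("u1", "home", "2023-04-01 10:00:00"),
        ("u1", "search", "2023-04-01 10:00:40"),
        ("u1", "item", "2023-04-01 10:02:10"),
        ("u2", "home", "2023-04-01 11:00:00"),
        ("u2", "cart", "2023-04-01 11:05:00"),
    ],
)

rows = conn.execute(
    """
    select user_id, page,
           -- seconds between this view and the same user's next view
           cast(strftime('%s', next_viewed_at) as integer)
             - cast(strftime('%s', viewed_at) as integer) as seconds_on_page
    from (
        select user_id, page, viewed_at,
               lead(viewed_at, 1) over (partition by user_id
                                        order by viewed_at) as next_viewed_at
        from page_views
    ) as v
    order by user_id, viewed_at
    """
).fetchall()

for row in rows:
    print(row)
```

Each user's last page has no following view, so lead returns null and the time spent on it stays unknown (None) rather than being guessed.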

【Learn by analogy】

Whenever you meet a problem of this "appears N times in a row" kind, you can solve it with the following general template:

e7a4ab99018e2376f84852668155aa35.png

select distinct 列1
from (
    select 列1,
           lead(列1, 1) over(order by 序号) as 列2,
           lead(列1, 2) over(order by 序号) as 列3,
           ...
           lead(列1, n-1) over(order by 序号) as 列n
    from 表名
) as a
where (a.列1 = a.列2 and ... and a.列1 = a.列n);
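One way to see how the template expands for a given N is to generate it. The sketch below is only an illustration: the helper consecutive_query, the table readings, and its columns seq and val are all hypothetical names, and the identifiers are interpolated directly into the SQL text, so they must come from trusted code rather than user input. It assumes Python's built-in sqlite3 module with SQLite 3.25 or later.

```python
import sqlite3


def consecutive_query(table, value_col, order_col, n):
    """Build the 'appears n times in a row' query from the template above."""
    if n < 2:
        raise ValueError("n must be at least 2")
    # One shifted column per offset 1 .. n-1, named v1, v2, ...
    leads = ",\n".join(
        f"           lead({value_col}, {k}) over (order by {order_col}) as v{k}"
        for k in range(1, n)
    )
    # Every shifted column must equal the original column.
    conds = " and ".join(f"a.{value_col} = a.v{k}" for k in range(1, n))
    return (
        f"select distinct {value_col}\n"
        f"from (\n"
        f"    select {value_col},\n"
        f"{leads}\n"
        f"    from {table}\n"
        f") as a\n"
        f"where {conds}"
    )


conn = sqlite3.connect(":memory:")
conn.execute("create table readings (seq integer, val integer)")
conn.executemany(
    "insert into readings values (?, ?)",
    list(enumerate([80, 90, 90, 90, 70, 70, 85, 70], start=1)),
)


def streak_values(n):
    sql = consecutive_query("readings", "val", "seq", n)
    return sorted(v for (v,) in conn.execute(sql).fetchall())


twice = streak_values(2)
three_times = streak_values(3)
four_times = streak_values(4)
print(twice, three_times, four_times)
```

Rows near the end of the data get null in some shifted columns; a comparison with null is never true, so those rows drop out of the where clause without special handling.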

Example:

The following is the students' grade table (table name 成绩表; columns 学号 for student number and 成绩 for grade). Use SQL to find all grades that appear at least 3 times in a row.

994580c3020361e359da57605906203a.png

For this question we use the lag function:

f2163553eb8da1f4725d85e9116ff376.jpeg

The corresponding implementation of SQL is as follows:

select 成绩,
       lag(成绩,1) over(order by 学号) as 成绩1,
       lag(成绩,2) over(order by 学号) as 成绩2
from 成绩表;

Query result:

333add6bf36f8e20ba3cd5bc8d14a114.png

Final answer:

select distinct 成绩
from (
    select 成绩,
           lag(成绩,1) over(order by 学号) as 成绩1,
           lag(成绩,2) over(order by 学号) as 成绩2
    from 成绩表
) t
where (t.成绩 = t.成绩1 and t.成绩 = t.成绩2);

Query result:

82ee9767c7d6f8cca36a85361d845380.png


Origin blog.csdn.net/zhongyangzhong/article/details/130023340