Learn SQL from Scratch | Window Functions

Hello everyone, my name is Ning Yi.

Today is our Lesson 24: Window Functions.

Window functions, also known as OLAP (Online Analytical Processing) functions, let you analyze and process database data in real time.

Window functions are among the SQL features data analysts use most, and they come up in almost every interview at major companies.

This section is advanced and can be hard to follow, so be sure to type the queries on your own computer and look at the results.

That is the best way to really understand them. An earlier lesson covered how to install the database; you can find it on my homepage~

Basic syntax

<window function> OVER (
[PARTITION BY <column(s) to partition by>] -- optional
[ORDER BY <column(s) to sort by>] -- optional
)

Two kinds of functions can go in the <window function> position above:

(1) Aggregate functions, such as SUM, AVG, COUNT, MAX, and MIN;

(2) Dedicated window functions, such as RANK, DENSE_RANK, and ROW_NUMBER, which are discussed in detail below.

Window functions are usually written in the SELECT clause of a query.

1. Aggregate functions

A window function also groups (and can sort) rows, with an effect similar to an aggregate function combined with GROUP BY. The difference is that a window function does not collapse each group into a single record: every input row is kept, and the computed value is attached to each of them.

Example: In the Scores table, find the students whose Sid is between 7 and 10, and display each student's total score.

SELECT
  Sid,
  SUM(score) AS "总分"
FROM Scores
WHERE Sid BETWEEN 7 AND 10
GROUP BY Sid;
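If you want to try this outside the course database, here is a minimal sketch that runs the query with Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer). The Scores rows are made-up sample data, the Sid/Cid/score columns are assumed from the queries in this lesson, and the alias total_score stands in for "总分". An ORDER BY is added only to make the output order predictable.

```python
import sqlite3

# Hypothetical sample data for the Scores table: (Sid, Cid, score).
ROWS = [
    (7, 1, 80), (7, 2, 90),
    (8, 1, 70), (8, 2, 88),
    (9, 1, 88), (9, 2, 60),
    (10, 1, 95), (10, 2, 77),
    (11, 1, 50),  # outside 7-10, removed by the WHERE clause
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Scores (Sid INTEGER, Cid INTEGER, score INTEGER)")
conn.executemany("INSERT INTO Scores VALUES (?, ?, ?)", ROWS)

# GROUP BY collapses each student's rows into a single row.
totals = conn.execute("""
    SELECT Sid, SUM(score) AS total_score
    FROM Scores
    WHERE Sid BETWEEN 7 AND 10
    GROUP BY Sid
    ORDER BY Sid
""").fetchall()

print(totals)  # [(7, 170), (8, 158), (9, 148), (10, 172)]
```

Each student comes back as a single row, so the individual course scores are no longer visible.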

But here is the problem: besides each student's total score, we also want to see the score for each course, and the statement above cannot do that.

Why?

Because in a GROUP BY query, every column in the SELECT list that is not inside an aggregate function must also appear in the GROUP BY clause.

So to show the score for each course, we would have to add the Cid and score columns to the SELECT list, and therefore to the GROUP BY clause as well. But then each group would be a single course score rather than a whole student, and SUM(score) would no longer give the student's total, which defeats the purpose of the question.

We stressed this point in Lessons 15 and 16; you can find them on my homepage, so we won't go into detail here.

To show each student's total score together with the course number and the score at the same time, we need a window function.

SELECT
  Sid,Cid,score,
  SUM(score) OVER (PARTITION BY Sid) AS "总分"
FROM Scores
WHERE Sid BETWEEN 7 AND 10;
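Here is the same kind of sketch for the window-function version, again using Python's sqlite3 module and hypothetical sample rows for Scores (the alias total_score stands in for "总分"). It shows that every input row is kept and the partition total is attached to it.

```python
import sqlite3

# Hypothetical sample data for the Scores table: (Sid, Cid, score).
ROWS = [
    (7, 1, 80), (7, 2, 90),
    (8, 1, 70), (8, 2, 88),
    (9, 1, 88), (9, 2, 60),
    (10, 1, 95), (10, 2, 77),
    (11, 1, 50),  # outside 7-10, removed by the WHERE clause
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Scores (Sid INTEGER, Cid INTEGER, score INTEGER)")
conn.executemany("INSERT INTO Scores VALUES (?, ?, ?)", ROWS)

# The window function attaches the per-student total to every row
# instead of collapsing the rows.
rows = conn.execute("""
    SELECT Sid, Cid, score,
           SUM(score) OVER (PARTITION BY Sid) AS total_score
    FROM Scores
    WHERE Sid BETWEEN 7 AND 10
    ORDER BY Sid, Cid
""").fetchall()

for row in rows:
    print(row)
# (7, 1, 80, 170)
# (7, 2, 90, 170)
# (8, 1, 70, 158)
# (8, 2, 88, 158)
# (9, 1, 88, 148)
# (9, 2, 60, 148)
# (10, 1, 95, 172)
# (10, 2, 77, 172)
```

Eight score rows go in and eight rows come out; within each Sid partition, total_score repeats that student's total.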

Supplementary note: what PARTITION BY means

PARTITION BY splits the rows into partitions, much as GROUP BY splits them into groups.

If PARTITION BY is omitted, the entire data set is treated as a single partition.

For example, if we remove PARTITION BY from the SQL statement above:

SELECT
  Sid,Cid,score,
  SUM(score) OVER() AS "总分"
FROM Scores
WHERE Sid BETWEEN 7 AND 10;

In the result, every row shows the same "总分" value: the total of all scores for students 7 to 10, rather than each student's own total.
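A minimal sketch of the OVER () case, using hypothetical sample data for Scores and Python's sqlite3 module (total_score stands in for "总分"). Note that the WHERE clause is applied before the window function runs, so the made-up row for Sid 11 is not part of the total.

```python
import sqlite3

# Hypothetical sample data for the Scores table: (Sid, Cid, score).
ROWS = [
    (7, 1, 80), (7, 2, 90),
    (8, 1, 70), (8, 2, 88),
    (9, 1, 88), (9, 2, 60),
    (10, 1, 95), (10, 2, 77),
    (11, 1, 50),  # filtered out by WHERE before the window is computed
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Scores (Sid INTEGER, Cid INTEGER, score INTEGER)")
conn.executemany("INSERT INTO Scores VALUES (?, ?, ?)", ROWS)

# Empty OVER (): all filtered rows form one partition.
rows = conn.execute("""
    SELECT Sid, Cid, score,
           SUM(score) OVER () AS total_score
    FROM Scores
    WHERE Sid BETWEEN 7 AND 10
    ORDER BY Sid, Cid
""").fetchall()

print(rows[0])                 # (7, 1, 80, 648)
print({r[3] for r in rows})    # {648}
```

Every row carries 648, the sum of all eight scores for students 7 to 10.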

2. Dedicated window functions

Commonly used dedicated window functions include:

(1) Ranking functions:

ROW_NUMBER(): ignores ties and numbers the rows consecutively. For example, if the top three scores are 88, 88, and 77, their rankings are 1, 2, and 3 (which of the two 88s gets 1 is not determined).

RANK(): tied rows share the same rank, and the ranks they take up are skipped. For example, if the top three scores are 88, 88, and 77, their rankings are 1, 1, and 3.

DENSE_RANK(): tied rows share the same rank, but no ranks are skipped. For example, if the top three scores are 88, 88, and 77, their rankings are 1, 1, and 2.
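To see the three ranking functions side by side on the 88, 88, 77 example, here is a minimal sketch with Python's sqlite3 module; the one-column table t and the column aliases are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical one-column table holding the top three scores from the text.
conn.execute("CREATE TABLE t (score INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(88,), (88,), (77,)])

rows = conn.execute("""
    SELECT score,
           ROW_NUMBER() OVER (ORDER BY score DESC) AS row_num,
           RANK()       OVER (ORDER BY score DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rnk
    FROM t
    ORDER BY score DESC, row_num
""").fetchall()

for row in rows:
    print(row)
# (88, 1, 1, 1)
# (88, 2, 1, 1)
# (77, 3, 3, 2)
```

Because the two 88 rows are identical, it does not matter which one receives ROW_NUMBER() 1 here; with real data, add a tie-breaking column to ORDER BY if you need a stable order.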

Example: In the Scores table, for the students whose Sid is between 7 and 10, rank the scores from highest to lowest.

SELECT *,
  ROW_NUMBER() OVER(
    ORDER BY score DESC
  ) AS "排名"
FROM Scores
WHERE Sid BETWEEN 7 AND 10;

The query above uses ROW_NUMBER() and adds a "排名" (ranking) column to the result. If ROW_NUMBER() is replaced with RANK(), tied scores receive the same ranking and the following ranking is skipped, as described above.

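(2) First and last values:

FIRST_VALUE(<column name>): returns the value from the first row of the window (after sorting).

LAST_VALUE(<column name>): returns the value from the last row of the window frame. When ORDER BY is used, the default frame ends at the current row (plus any rows tied with it), so to get the last row of the whole partition you need to specify the frame explicitly, for example ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.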

Example: In the Scores table, find the students whose ID is 7-10 and get the highest score of each student.

SELECT *,
  FIRST_VALUE(score) OVER(
    PARTITION BY Sid
    ORDER BY score DESC
  ) AS "最高成绩"
FROM Scores
WHERE Sid BETWEEN 7 AND 10;
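A minimal sketch of FIRST_VALUE() on hypothetical Scores data, run with Python's sqlite3 module. It also adds two LAST_VALUE() columns: one with an explicit full frame, which returns each student's lowest score, and one with the default frame, which with ORDER BY only reaches the current row and therefore just echoes the current score. The aliases are made up for this example.

```python
import sqlite3

# Hypothetical sample data for the Scores table: (Sid, Cid, score).
ROWS = [
    (7, 1, 80), (7, 2, 90),
    (8, 1, 70), (8, 2, 88),
    (9, 1, 88), (9, 2, 60),
    (10, 1, 95), (10, 2, 77),
    (11, 1, 50),  # outside 7-10, removed by the WHERE clause
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Scores (Sid INTEGER, Cid INTEGER, score INTEGER)")
conn.executemany("INSERT INTO Scores VALUES (?, ?, ?)", ROWS)

rows = conn.execute("""
    SELECT Sid, Cid, score,
           -- first row of each student's scores, sorted high to low
           FIRST_VALUE(score) OVER (PARTITION BY Sid ORDER BY score DESC) AS highest,
           -- explicit full frame: last row of the whole partition
           LAST_VALUE(score) OVER (
               PARTITION BY Sid ORDER BY score DESC
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
           ) AS lowest,
           -- default frame: stops at the current row
           LAST_VALUE(score) OVER (PARTITION BY Sid ORDER BY score DESC) AS last_default
    FROM Scores
    WHERE Sid BETWEEN 7 AND 10
    ORDER BY Sid, score DESC
""").fetchall()

for row in rows:
    print(row)
# (7, 2, 90, 90, 80, 90)
# (7, 1, 80, 90, 80, 80)
# (8, 2, 88, 88, 70, 88)
# (8, 1, 70, 88, 70, 70)
# (9, 1, 88, 88, 60, 88)
# (9, 2, 60, 88, 60, 60)
# (10, 1, 95, 95, 77, 95)
# (10, 2, 77, 95, 77, 77)
```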

(3) Offset functions:

LEAD(<column name>, <n>): returns the value from the row n rows after (below) the current row.

LAG(<column name>, <n>): returns the value from the row n rows before (above) the current row.

NTH_VALUE(<column name>, <n>): returns the value from the nth row of the window frame.
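A minimal sketch of the three offset functions on a hypothetical one-column table of five scores, using Python's sqlite3 module. NTH_VALUE() is given an explicit full frame so that every row sees the second-highest score; where no row exists n positions away, LEAD() and LAG() return NULL (None in Python).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical scores with no ties, to keep the offsets easy to follow.
conn.execute("CREATE TABLE t (score INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(95,), (90,), (80,), (70,), (60,)])

rows = conn.execute("""
    SELECT score,
           LEAD(score, 1) OVER (ORDER BY score DESC) AS next_score,
           LAG(score, 1)  OVER (ORDER BY score DESC) AS prev_score,
           NTH_VALUE(score, 2) OVER (
               ORDER BY score DESC
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
           ) AS second_highest
    FROM t
    ORDER BY score DESC
""").fetchall()

for row in rows:
    print(row)
# (95, 90, None, 90)
# (90, 80, 95, 90)
# (80, 70, 90, 90)
# (70, 60, 80, 90)
# (60, None, 70, 90)
```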

(4) Distribution functions:

CUME_DIST(): the cumulative distribution, i.e. (number of rows in the partition that sort before or tie with the current row) / (total number of rows in the partition). With ascending order, this is the share of rows whose value is less than or equal to the current row's.

PERCENT_RANK(): the relative rank of each row, computed as (rank - 1) / (rows - 1), where rank is the row's RANK() value and rows is the number of rows in the partition.

NTILE(<n>): divides the rows of the partition into n groups as evenly as possible and returns the number of the group each row falls into.
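A minimal sketch of the distribution functions on five hypothetical scores, sorted in ascending order so that CUME_DIST() matches the "less than or equal" reading; it uses Python's sqlite3 module, and the table t and aliases are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical scores, including one tie (88, 88).
conn.execute("CREATE TABLE t (score INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(60,), (77,), (88,), (88,), (95,)])

rows = conn.execute("""
    SELECT score,
           CUME_DIST()    OVER (ORDER BY score) AS cum_dist,
           PERCENT_RANK() OVER (ORDER BY score) AS pct_rank,
           NTILE(2)       OVER (ORDER BY score) AS bucket
    FROM t
    ORDER BY score, bucket
""").fetchall()

for row in rows:
    print(row)
# (60, 0.2, 0.0, 1)
# (77, 0.4, 0.25, 1)
# (88, 0.8, 0.5, 1)
# (88, 0.8, 0.5, 2)
# (95, 1.0, 1.0, 2)
```

The two tied 88s share the same CUME_DIST() (4/5) and PERCENT_RANK() ((3 - 1) / (5 - 1)), but NTILE(2) splits by row position (larger group first, 3 rows then 2), so they land in different groups.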

Homework: try the offset and distribution functions on the Scores table and see what they return.

For example, get the score from two rows below the current row:

SELECT *,
  LEAD(score,2) OVER(
    ORDER BY score DESC
  ) AS "获取下面第2行score值"
FROM Scores
WHERE Sid BETWEEN 7 AND 10;

That completes our SQL course from beginner to advanced. Next, I will continue with more advanced SQL lessons covering views, indexes, concurrency, deadlocks, triggers, events, transactions, stored procedures, and more.

Recorded classes and live classes will also follow~

Source: blog.csdn.net/shine_a/article/details/125528974