Advanced SQL Analysis Functions - Window Functions

Foreword

In SQL, aggregate functions play an important role in summarizing business data, for example counting the businesses in each business area, averaging the students' scores in each class, or finding the maximum value in each category. Today, however, I will introduce window functions. Like aggregate functions, they compute over a set of rows, but they return a value for every row instead of collapsing each group into a single row, so they differ in how and when they are used. In this chapter I will focus on two window functions, RANK and DENSE_RANK, and on how they apply to ranking and filtering. These functions let us process data more flexibly and get the results we want. Note that the minimum versions of the mainstream databases that support window functions are as follows:

MySQL (>= 8.0)
PostgreSQL (>= 8.4)
SQL Server (>= 2005)
SQLite (>= 3.25.0)

If your database version is lower than the above requirements, you will not be able to use window functions.
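As a quick check of the requirement above, here is a minimal sketch for SQLite only, using Python's standard sqlite3 module. The helper name supports_window_functions is an assumption made for illustration; the 3.25.0 threshold comes from the list above.

```python
import sqlite3

# Minimum SQLite version with window functions, taken from the list above.
MIN_VERSION = (3, 25, 0)

def supports_window_functions(version=sqlite3.sqlite_version_info):
    """Hypothetical helper: True if this SQLite version tuple is new enough."""
    return tuple(version) >= MIN_VERSION

# sqlite3.sqlite_version is the version of the SQLite library bundled with
# this Python build, which is the version that matters for SQL features.
print(sqlite3.sqlite_version, supports_window_functions())
```

On the other databases the version can be read with a query instead: SELECT VERSION() in MySQL, SELECT version() in PostgreSQL, and SELECT @@VERSION in SQL Server.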

Background and requirements:

To make this easier to follow, I will use student data as the background for the queries. Suppose the students of one grade at a school have just taken an exam, and the results have been entered into the score_data table, which has at least the columns class, name, subject, and score.

Now the grade-level dean wants to see:

1. The top 2 students in each subject of this grade in this exam.

2. The top 2 students in each subject in each class in this exam.

3. The top 2 students by total score in each class in this exam.

Writing these queries in ordinary SQL is tedious and time-consuming, but the RANK and DENSE_RANK functions make them quick to write. The following sections show how to use RANK and DENSE_RANK to query the student data.

Querying student data using the RANK and DENSE_RANK functions

1. Query the top 2 students in each subject of this grade.

To get the top 2 in each subject, we use the RANK() function to rank every student within his or her subject partition. Execute the following SQL statement; the query results are shown in the figure below.

select sd.*, RANK() over(partition by subject order by score desc) as _rank from score_data sd;

As the result shows, the ranking field _rank has been computed from the scores within each subject. Next, we only need to keep the rows whose _rank is at most 2, that is, filter out the rows whose _rank is greater than 2. The query result is shown in the figure below.

select * from (
    select sd.*, RANK() over(partition by subject order by score desc) as _rank from score_data sd
) tmp
where tmp._rank <= 2
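The query above can be tried end to end with the following sketch, which runs it against an in-memory SQLite database from Python. The sample rows are made up for illustration and are not the article's data; only the table name, the column names, and the query itself come from the text.

```python
import sqlite3

# Hypothetical sample rows (class, name, subject, score); NOT the article's data.
ROWS = [
    ("Class 1", "Amy", "math", 90), ("Class 1", "Ben", "math", 77),
    ("Class 2", "Cara", "math", 77), ("Class 2", "Dan", "math", 60),
    ("Class 1", "Amy", "english", 85), ("Class 1", "Ben", "english", 70),
    ("Class 2", "Cara", "english", 92), ("Class 2", "Dan", "english", 88),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE score_data (class TEXT, name TEXT, subject TEXT, score INTEGER)"
)
conn.executemany("INSERT INTO score_data VALUES (?, ?, ?, ?)", ROWS)

# Rank inside each subject, then keep rows ranked 1 or 2.
# The outer ORDER BY only makes the printed order deterministic.
TOP2_PER_SUBJECT = """
SELECT subject, name, score, _rank FROM (
    SELECT sd.*, RANK() OVER (PARTITION BY subject ORDER BY score DESC) AS _rank
    FROM score_data sd
) tmp
WHERE tmp._rank <= 2
ORDER BY subject, _rank, name
"""

top2 = conn.execute(TOP2_PER_SUBJECT).fetchall()
for row in top2:
    print(row)
```

With this sample data, math returns three rows, because two students tie on 77 and both receive rank 2.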

As the figure above shows, ties produce results like those for math: the query returns three rows, because two students share the same math score of 77. The DENSE_RANK function has the same syntax as RANK, and the two differ only in how ranks are numbered after a tie. RANK gives each row 1 plus the number of rows with a higher score than the current row, so the ranks after a tie are skipped; for the scores 90, 77, 77, 60 it yields 1, 2, 2, 4. DENSE_RANK gives each row 1 plus the number of distinct scores higher than the current row, so the ranks have no gaps; for the same scores it yields 1, 2, 2, 3. In other words, DENSE_RANK deduplicates the scores it counts, not the rows it returns: tied rows still share a rank and are all kept by the filter. If exactly two rows per subject are required regardless of ties, ROW_NUMBER(), which numbers the rows uniquely, is the function to use instead.
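To see the two functions side by side, here is a small sketch that computes both over the same rows in an in-memory SQLite database. The table is a cut-down, hypothetical version of score_data with made-up math scores, and the alias _dense_rank is an assumption for illustration.

```python
import sqlite3

# Hypothetical math scores with a tie at 77; NOT the article's data.
ROWS = [("Amy", 90), ("Ben", 77), ("Cara", 77), ("Dan", 60)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE score_data (name TEXT, score INTEGER)")
conn.executemany("INSERT INTO score_data VALUES (?, ?)", ROWS)

# Both functions use the same window; only the numbering after a tie differs.
QUERY = """
SELECT name, score,
       RANK()       OVER (ORDER BY score DESC) AS _rank,
       DENSE_RANK() OVER (ORDER BY score DESC) AS _dense_rank
FROM score_data
ORDER BY score DESC, name
"""

ranked = conn.execute(QUERY).fetchall()
for row in ranked:
    print(row)
# ('Amy', 90, 1, 1)
# ('Ben', 77, 2, 2)
# ('Cara', 77, 2, 2)
# ('Dan', 60, 4, 3)
```

Both tied students keep rank 2 under either function; only the row after the tie changes, from 4 with RANK to 3 with DENSE_RANK.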

2. Query the top 2 of each subject in each class.

To query the top 2 in each subject in each class, you only need to add class to the PARTITION BY clause of the query from step 1 (the top 2 students in each subject of this grade). The result of the query is shown in the following figure:

select * from (
    select sd.*, RANK() over(partition by subject, class order by score desc) as _rank from score_data sd
) tmp
where tmp._rank <= 2
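The effect of the extra partition column can be checked with the sketch below, again on an in-memory SQLite database. The rows are hypothetical illustration data with three students per class, so the filter actually drops someone in every partition.

```python
import sqlite3

# Hypothetical sample rows (class, name, subject, score); NOT the article's data.
ROWS = [
    ("Class 1", "Amy", "math", 90), ("Class 1", "Ben", "math", 77),
    ("Class 1", "Eve", "math", 65), ("Class 2", "Cara", "math", 77),
    ("Class 2", "Dan", "math", 60), ("Class 2", "Finn", "math", 82),
    ("Class 1", "Amy", "english", 85), ("Class 1", "Ben", "english", 70),
    ("Class 1", "Eve", "english", 88), ("Class 2", "Cara", "english", 92),
    ("Class 2", "Dan", "english", 88), ("Class 2", "Finn", "english", 75),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE score_data (class TEXT, name TEXT, subject TEXT, score INTEGER)"
)
conn.executemany("INSERT INTO score_data VALUES (?, ?, ?, ?)", ROWS)

# One ranking per (subject, class) pair instead of one per subject.
TOP2_PER_SUBJECT_AND_CLASS = """
SELECT subject, class, name, score, _rank FROM (
    SELECT sd.*,
           RANK() OVER (PARTITION BY subject, class ORDER BY score DESC) AS _rank
    FROM score_data sd
) tmp
WHERE tmp._rank <= 2
ORDER BY subject, class, _rank, name
"""

top2_by_class = conn.execute(TOP2_PER_SUBJECT_AND_CLASS).fetchall()
for row in top2_by_class:
    print(row)
```

Each (subject, class) pair now restarts its ranking at 1, so this sample yields two rows for each of the four pairs.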

3. Query the top 2 total scores in each class.

Similarly, building on step 2 (the top 2 in each subject in each class in the score_data table), add up each student's scores with SUM(score) and rank the totals within each class to get the top 2 total scores in each class.

select * from (
    select class, name, SUM(score) as total_score,
           RANK() over(partition by class order by SUM(score) desc) as _rank
    from score_data sd group by class, name
) tmp
where tmp._rank <= 2

This calculation has to combine an aggregate function with a ranking function. Each student's total score is spread across several rows, one per subject, so we first aggregate on the joint grouping dimension of class and name, which compresses the data to one total score per student, and then rank those totals within each class.
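The aggregate-then-rank idea can be sketched as follows on an in-memory SQLite database. This version is an equivalent restructuring rather than the exact query above: it computes the per-student totals in an inner subquery (the alias t is an assumption) and ranks them in the next level, which makes the two stages explicit. The rows are hypothetical illustration data, and grouping by class and name assumes names are unique within a class.

```python
import sqlite3

# Hypothetical sample rows (class, name, subject, score); NOT the article's data.
ROWS = [
    ("Class 1", "Amy", "math", 90), ("Class 1", "Amy", "english", 85),
    ("Class 1", "Ben", "math", 77), ("Class 1", "Ben", "english", 70),
    ("Class 1", "Eve", "math", 65), ("Class 1", "Eve", "english", 88),
    ("Class 2", "Cara", "math", 77), ("Class 2", "Cara", "english", 92),
    ("Class 2", "Dan", "math", 60), ("Class 2", "Dan", "english", 88),
    ("Class 2", "Finn", "math", 82), ("Class 2", "Finn", "english", 75),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE score_data (class TEXT, name TEXT, subject TEXT, score INTEGER)"
)
conn.executemany("INSERT INTO score_data VALUES (?, ?, ?, ?)", ROWS)

# Stage 1 (innermost): one row per student with the summed score.
# Stage 2: rank those totals inside each class.
# Stage 3 (outermost): keep ranks 1 and 2.
TOP2_TOTALS = """
SELECT class, name, total_score, _rank FROM (
    SELECT t.*,
           RANK() OVER (PARTITION BY class ORDER BY total_score DESC) AS _rank
    FROM (
        SELECT class, name, SUM(score) AS total_score
        FROM score_data
        GROUP BY class, name
    ) t
) tmp
WHERE tmp._rank <= 2
ORDER BY class, _rank
"""

top2_totals = conn.execute(TOP2_TOTALS).fetchall()
for row in top2_totals:
    print(row)
```

Separating the stages costs one extra subquery level but keeps each step readable when the aggregation grows more complex.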

Summary

Window functions are very powerful tools in SQL, especially for report statistics and similar scenarios. They simplify complex data calculation and analysis, and they make queries more flexible and often more efficient than the equivalent ordinary SQL. Window functions are like a sharp Swiss Army knife for database work, giving us a powerful and precise way to process data.

Origin blog.csdn.net/weixin_45925028/article/details/132336930