How to use window functions to implement ranking calculations

Abstract: This article is original and was first published on CSDN by the Grape City technical team. Please credit the source when reprinting: Grape City official website. Grape City provides developers with professional development tools, solutions, and services to empower developers.

Preface

In SQL, aggregate functions play a central role in summarizing business data: counting the businesses in each business area, averaging the scores of the students in each class, finding the maximum value in each category, and so on. This article introduces window functions. Like aggregate functions, they compute over a set of rows, but they differ in usage and in where they apply: an aggregate function used with GROUP BY collapses each group into a single output row, whereas a window function returns a value for every input row while still looking at the other rows in its window (partition). I will focus on the RANK and DENSE_RANK window functions and on how to use them for ranking and filtering. Note that window functions require at least the following versions of the mainstream databases:

MySQL (>= 8.0)
PostgreSQL (>= 8.4)
SQL Server (>= 2005)
SQLite (>= 3.25.0)

If your database version is lower than the above requirements, you will not be able to use window functions.
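To make the difference between aggregate and window functions concrete, here is a minimal sketch using Python's built-in sqlite3 module (SQLite is an assumption chosen for illustration; the article does not tie itself to one database). It checks that the linked SQLite library is at least 3.25.0, then computes AVG over a tiny made-up table twice: once as an aggregate with GROUP BY, and once as a window function with OVER (PARTITION BY ...).

```python
import sqlite3

# Window functions need SQLite 3.25.0 or later, per the version list above.
if sqlite3.sqlite_version_info < (3, 25, 0):
    raise RuntimeError(f"SQLite {sqlite3.sqlite_version} has no window functions")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE score_data (class INTEGER, name TEXT, subject TEXT, score INTEGER)")
# Made-up illustration rows, not data from the article.
conn.executemany(
    "INSERT INTO score_data VALUES (?, ?, ?, ?)",
    [(1, "Alice", "math", 90), (1, "Bob", "math", 70),
     (2, "Carol", "math", 80), (2, "Dave", "math", 60)],
)

# Aggregate function: GROUP BY collapses each class into one output row.
grouped = conn.execute(
    "SELECT class, AVG(score) FROM score_data GROUP BY class ORDER BY class"
).fetchall()

# Window function: every input row survives, with its class average attached.
windowed = conn.execute(
    "SELECT name, score, AVG(score) OVER (PARTITION BY class) AS class_avg "
    "FROM score_data ORDER BY name"
).fetchall()

print(grouped)   # one row per class
print(windowed)  # one row per student
```

The aggregate query returns two rows, while the window query returns all four students, each carrying the average of their own class.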

Requirement background:

To make this easier to follow, I will use student data as the running example. Assume that the students in one grade at a school have just taken an exam and their results have been entered into a score_data table, with one row per student per subject (the queries below use its class, name, subject, and score columns).
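A minimal sketch of such a table, again assuming SQLite through Python's sqlite3 module. The column names class, name, subject, and score are inferred from the queries in this article; the rows are invented (with a 77-point tie in math, echoing the article's example) and are not the article's actual data. The helper make_score_db is a hypothetical name used only here.

```python
import sqlite3

# Hypothetical rows: (class, name, subject, score), one row per student per subject.
SAMPLE_ROWS = [
    (1, "Alice", "math", 90), (1, "Alice", "english", 85),
    (1, "Bob", "math", 77),   (1, "Bob", "english", 92),
    (1, "Cathy", "math", 77), (1, "Cathy", "english", 70),
    (2, "David", "math", 72), (2, "David", "english", 64),
    (2, "Emma", "math", 60),  (2, "Emma", "english", 95),
    (2, "Frank", "math", 65), (2, "Frank", "english", 81),
]


def make_score_db() -> sqlite3.Connection:
    """Return an in-memory database containing a sample score_data table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE score_data (class INTEGER, name TEXT, subject TEXT, score INTEGER)"
    )
    conn.executemany("INSERT INTO score_data VALUES (?, ?, ?, ?)", SAMPLE_ROWS)
    return conn


conn = make_score_db()
# Number of distinct students in each class.
per_class = conn.execute(
    "SELECT class, COUNT(DISTINCT name) FROM score_data GROUP BY class ORDER BY class"
).fetchall()
print(per_class)
```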

The grade's academic director now wants to see:

1. The top 2 students in each subject across the whole grade.

2. The top 2 students in each subject within each class.

3. The top 2 students in each class by total score.

Doing this with ordinary SQL (for example, correlated subqueries or self-joins that count how many students scored higher) is cumbersome and slow to write, but the RANK and DENSE_RANK functions make these queries short. The following sections show how to use RANK and DENSE_RANK to query the student data.

Querying student data with the RANK and DENSE_RANK functions

1. Query the top 2 students in each subject across the grade.

To get the top two in each subject, we first use the RANK() function to rank every student within the partition of their own subject. Running the following SQL statement produces the result shown below.

select sd.*, RANK() over(partition by subject order by score desc) as _rank from score_data sd;
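To try this ranking query locally, here is a sketch that runs it unchanged against a small made-up score_data table in SQLite via Python's sqlite3 module (the data and the use of SQLite are assumptions for illustration). Only the trailing ORDER BY is added, to make the printout stable.

```python
import sqlite3

# Hypothetical sample data (class, name, subject, score), not the article's figures.
ROWS = [
    (1, "Alice", "math", 90), (1, "Alice", "english", 85),
    (1, "Bob", "math", 77),   (1, "Bob", "english", 92),
    (1, "Cathy", "math", 77), (1, "Cathy", "english", 70),
    (2, "David", "math", 72), (2, "David", "english", 64),
    (2, "Emma", "math", 60),  (2, "Emma", "english", 95),
    (2, "Frank", "math", 65), (2, "Frank", "english", 81),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE score_data (class INTEGER, name TEXT, subject TEXT, score INTEGER)")
conn.executemany("INSERT INTO score_data VALUES (?, ?, ?, ?)", ROWS)

# The article's RANK query; the final ORDER BY only makes the output order stable.
ranked = conn.execute(
    """
    SELECT sd.*, RANK() OVER (PARTITION BY subject ORDER BY score DESC) AS _rank
    FROM score_data sd
    ORDER BY subject, _rank, name
    """
).fetchall()

for row in ranked:
    print(row)  # (class, name, subject, score, _rank)
```

Because Bob and Cathy both scored 77 in math, they share rank 2 and the next math student gets rank 4.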

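As you can see, the result now includes a _rank column holding each student's rank within their subject. Next, we only need to filter out the rows whose _rank is greater than 2. A window function cannot be referenced directly in a WHERE clause (WHERE is evaluated before window functions are computed), so the ranking query is wrapped in a subquery and filtered in the outer query. The result is shown in the figure below.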

select * from (
    select sd.*, RANK() over(partition by subject order by score desc) as _rank from score_data sd
) tmp
where tmp._rank <= 2
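The sketch below, again assuming SQLite through Python's sqlite3 module and the same made-up data, shows both halves of that reasoning: filtering on RANK() directly in WHERE is rejected, while the subquery form from the article works.

```python
import sqlite3

# Hypothetical sample data (class, name, subject, score), not the article's figures.
ROWS = [
    (1, "Alice", "math", 90), (1, "Alice", "english", 85),
    (1, "Bob", "math", 77),   (1, "Bob", "english", 92),
    (1, "Cathy", "math", 77), (1, "Cathy", "english", 70),
    (2, "David", "math", 72), (2, "David", "english", 64),
    (2, "Emma", "math", 60),  (2, "Emma", "english", 95),
    (2, "Frank", "math", 65), (2, "Frank", "english", 81),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE score_data (class INTEGER, name TEXT, subject TEXT, score INTEGER)")
conn.executemany("INSERT INTO score_data VALUES (?, ?, ?, ?)", ROWS)

# A window function is not allowed in WHERE, so SQLite refuses this statement.
try:
    conn.execute(
        "SELECT * FROM score_data "
        "WHERE RANK() OVER (PARTITION BY subject ORDER BY score DESC) <= 2"
    )
    direct_filter_rejected = False
except sqlite3.Error as exc:
    direct_filter_rejected = True
    print("rejected:", exc)

# The article's approach: rank inside a subquery, filter in the outer query.
top2 = conn.execute(
    """
    SELECT * FROM (
        SELECT sd.*, RANK() OVER (PARTITION BY subject ORDER BY score DESC) AS _rank
        FROM score_data sd
    ) tmp
    WHERE tmp._rank <= 2
    ORDER BY subject, _rank, name
    """
).fetchall()

for row in top2:
    print(row)  # (class, name, subject, score, _rank)
```

With this data, English returns two rows but math returns three, because of the tie at 77 points.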

The result above also shows what happens when scores are tied: the math subject returns three rows, because two students tied for second place with 77 points and both receive rank 2. RANK gives each row 1 plus the number of rows with a strictly higher score, so tied rows share a rank and the next rank is skipped (1, 2, 2, 4). DENSE_RANK uses the same syntax as RANK; the only difference is that it counts distinct higher scores rather than rows, so no rank is skipped after a tie (1, 2, 2, 3). Note, however, that DENSE_RANK does not remove duplicate rows: both 77-point students still get rank 2, so filtering on _rank <= 2 still returns three math rows, and when a tie occurs higher up, DENSE_RANK can return even more rows than RANK. If you want to keep only one of the tied records, so that each subject returns exactly two rows, use ROW_NUMBER() instead: it numbers the rows 1, 2, 3, ... with no ties, and adding a tie-breaking column to its ORDER BY makes the choice deterministic.
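A side-by-side sketch of the three numbering functions on the math rows of the same made-up data, assuming SQLite through Python's sqlite3 module. Using name as the ROW_NUMBER tie-breaker is an arbitrary choice for illustration; any column that separates the tied rows would do.

```python
import sqlite3

# Hypothetical sample data (class, name, subject, score), not the article's figures.
ROWS = [
    (1, "Alice", "math", 90), (1, "Alice", "english", 85),
    (1, "Bob", "math", 77),   (1, "Bob", "english", 92),
    (1, "Cathy", "math", 77), (1, "Cathy", "english", 70),
    (2, "David", "math", 72), (2, "David", "english", 64),
    (2, "Emma", "math", 60),  (2, "Emma", "english", 95),
    (2, "Frank", "math", 65), (2, "Frank", "english", 81),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE score_data (class INTEGER, name TEXT, subject TEXT, score INTEGER)")
conn.executemany("INSERT INTO score_data VALUES (?, ?, ?, ?)", ROWS)

# Only math rows are selected, so the windows need no PARTITION BY here.
table = conn.execute(
    """
    SELECT name, score,
           RANK()       OVER (ORDER BY score DESC)       AS rnk,
           DENSE_RANK() OVER (ORDER BY score DESC)       AS dense_rnk,
           ROW_NUMBER() OVER (ORDER BY score DESC, name) AS row_num
    FROM score_data
    WHERE subject = 'math'
    ORDER BY row_num
    """
).fetchall()

for row in table:
    print(row)  # (name, score, rnk, dense_rnk, row_num)

# How many math rows each function keeps under the "<= 2" filter.
kept = {
    func: sum(1 for row in table if row[col] <= 2)
    for func, col in (("RANK", 2), ("DENSE_RANK", 3), ("ROW_NUMBER", 4))
}
print(kept)  # {'RANK': 3, 'DENSE_RANK': 3, 'ROW_NUMBER': 2}
```

RANK and DENSE_RANK both keep the tied pair, so only ROW_NUMBER caps the result at exactly two rows.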

2. Query the top 2 students in each subject within each class.

To get the top 2 students in each subject within each class, just add class to the partition of the query from step 1 (the top 2 students in each subject across the grade). The result is shown in the figure below:

select * from (
    select sd.*, RANK() over(partition by subject, class order by score desc) as _rank from score_data sd
) tmp
where tmp._rank <= 2
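Running the class-and-subject version against the same made-up data in SQLite (an illustrative assumption, via Python's sqlite3 module) shows that ranks now restart for every (subject, class) pair. The outer ORDER BY is added only for readable output.

```python
import sqlite3

# Hypothetical sample data (class, name, subject, score), not the article's figures.
ROWS = [
    (1, "Alice", "math", 90), (1, "Alice", "english", 85),
    (1, "Bob", "math", 77),   (1, "Bob", "english", 92),
    (1, "Cathy", "math", 77), (1, "Cathy", "english", 70),
    (2, "David", "math", 72), (2, "David", "english", 64),
    (2, "Emma", "math", 60),  (2, "Emma", "english", 95),
    (2, "Frank", "math", 65), (2, "Frank", "english", 81),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE score_data (class INTEGER, name TEXT, subject TEXT, score INTEGER)")
conn.executemany("INSERT INTO score_data VALUES (?, ?, ?, ?)", ROWS)

# Partitioning by subject AND class restarts the ranking in every class.
top2_per_class = conn.execute(
    """
    SELECT * FROM (
        SELECT sd.*,
               RANK() OVER (PARTITION BY subject, class ORDER BY score DESC) AS _rank
        FROM score_data sd
    ) tmp
    WHERE tmp._rank <= 2
    ORDER BY class, subject, _rank, name
    """
).fetchall()

for row in top2_per_class:
    print(row)  # (class, name, subject, score, _rank)
```

Class 1 math still returns three rows because Bob and Cathy are tied within that class.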

3. Query the top 2 students by total score in each class.

In the same way, building on step 2, we rank students by their total score, SUM(score), instead of a single subject score, and keep the top two in each class:

select * from (
    select class, name, SUM(score) as total_score,
        RANK() over (partition by class order by SUM(score) desc) as _rank
    from score_data sd
    group by class, name
) tmp
where tmp._rank <= 2


This metric combines an aggregate function with a ranking function. Each student's total score is the sum of several subject rows, so the data must first be grouped by class and name, which collapses it to one total-score row per student; RANK() then ranks those aggregated rows within each class. This works because window functions are evaluated after GROUP BY, which is also why ORDER BY SUM(score) is allowed inside OVER().
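A sketch of the total-score ranking against the same made-up data, assuming SQLite through Python's sqlite3 module. It wraps the grouped ranking query in a subquery, as in steps 1 and 2, so that only ranks 1 and 2 in each class are kept.

```python
import sqlite3

# Hypothetical sample data (class, name, subject, score), not the article's figures.
ROWS = [
    (1, "Alice", "math", 90), (1, "Alice", "english", 85),
    (1, "Bob", "math", 77),   (1, "Bob", "english", 92),
    (1, "Cathy", "math", 77), (1, "Cathy", "english", 70),
    (2, "David", "math", 72), (2, "David", "english", 64),
    (2, "Emma", "math", 60),  (2, "Emma", "english", 95),
    (2, "Frank", "math", 65), (2, "Frank", "english", 81),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE score_data (class INTEGER, name TEXT, subject TEXT, score INTEGER)")
conn.executemany("INSERT INTO score_data VALUES (?, ?, ?, ?)", ROWS)

# GROUP BY runs first (one row per student); RANK() then orders those totals per class.
top_totals = conn.execute(
    """
    SELECT * FROM (
        SELECT class, name, SUM(score) AS total_score,
               RANK() OVER (PARTITION BY class ORDER BY SUM(score) DESC) AS _rank
        FROM score_data sd
        GROUP BY class, name
    ) tmp
    WHERE tmp._rank <= 2
    ORDER BY class, _rank
    """
).fetchall()

for row in top_totals:
    print(row)  # (class, name, total_score, _rank)
```

With this data, the totals are Alice 175, Bob 169, and Cathy 147 in class 1, and Emma 155, Frank 146, and David 136 in class 2, so Cathy and David are filtered out.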

Summarize

Window functions are among the most powerful tools in SQL, especially in scenarios such as report statistics. They make calculations such as per-group rankings far simpler to write, often replacing self-joins or correlated subqueries (which can also make the query more efficient), and they give flexible control over how rows are partitioned and ordered. They are something like a Swiss Army knife for data processing in the database.

Source: blog.csdn.net/powertoolsteam/article/details/132270569