SQL window function

1. What is the use of window functions?
In daily work, we often need to rank rows within each group, for example in business requirements like these:

Ranking problem: rank the employees of each department by performance
Top-N problem: find the top N employees in each department for a reward

Requirements like these call for one of SQL's advanced features: window functions.

2. What is a window function?

Window functions, also called OLAP functions (Online Analytical Processing), can analyze and process database data in real time.

The basic syntax of the window function is as follows:

<window function> over (partition by <column to group by>
                order by <column to sort by>)

So what can go in the <window function> position of this syntax?

Two kinds of functions can be placed there:

1) Dedicated window functions, including rank, dense_rank, row_number and others that are covered later.

2) Aggregate functions, such as sum, avg, count, max, min, etc.

Because window functions operate on the result produced after the where and group by clauses have been processed, in principle they can only be written in the select clause.
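As a rough illustration of this processing order, the sketch below runs a filtered ranking query through Python's built-in sqlite3 module. The sample rows are made up for illustration, the column names follow the article's 班级表 (学号 = student ID, 班级 = class, 成绩 = grade), and it assumes the bundled SQLite is version 3.25 or later, the first release with window functions.

```python
import sqlite3

# Hypothetical sample data, not taken from the article's figures.
rows = [
    ("0001", 1, 86),
    ("0002", 1, 95),
    ("0003", 2, 89),
    ("0004", 2, 98),
]

conn = sqlite3.connect(":memory:")  # assumes SQLite 3.25+ for window functions
conn.execute("create table 班级表 (学号 text, 班级 integer, 成绩 integer)")
conn.executemany("insert into 班级表 values (?, ?, ?)", rows)

# The where clause filters first; rank() then only sees the rows of class 1.
filtered = conn.execute(
    """
    select 学号, 成绩,
       rank() over (order by 成绩 desc) as ranking
    from 班级表
    where 班级 = 1
    order by ranking
    """
).fetchall()

for row in filtered:
    print(row)
```

With this made-up data the grade 95 comes out ranked first, even though the table also holds a 98 in class 2, because rank only sees the rows that survive the where clause.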

3. How to use?

Next, I will introduce the usage of several window functions in combination with examples.

1. Dedicated window function rank

For example, the following figure shows the contents of the class table (班级表):

[Figure: contents of the class table]

Suppose we want to rank the students by grade (成绩) within each class (班级). The result looks like this:

[Figure: the class table with a ranking column, ranked by grade within each class]

Take class "1" as an example. In this class the grade "95" is ranked first and the grade "83" is ranked fourth. The result is indeed ranked within each class, as required.

The SQL statement that produces this result is:

select *,
   rank() over (partition by 班级
                 order by 成绩 desc) as ranking
from 班级表

Let's explain the select clause in this SQL statement. rank is a ranking function. The requirement is "rank by grade within each class", which can be split into two parts:

1) Within each class: group by class

partition by is used to group the rows of the table. In this example we group by class, so we write partition by 班级.

2) Ranked by grade

The order by clause sorts the rows within each group. The default is ascending order (asc). In this example, order by 成绩 desc sorts by the grade column, and the desc keyword makes the order descending.
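To try the two parts together, here is a minimal sketch that runs the same kind of query through Python's built-in sqlite3 module. The rows are hypothetical (the article's figure is not available), chosen so that class 1 has 95 in first place and 83 in fourth place as in the description above, and the sketch assumes SQLite 3.25 or later.

```python
import sqlite3

# Hypothetical sample data: (学号 student ID, 班级 class, 成绩 grade).
rows = [
    ("0001", 1, 86), ("0002", 1, 95), ("0003", 1, 92), ("0004", 1, 83),
    ("0005", 2, 89), ("0006", 2, 74),
]

conn = sqlite3.connect(":memory:")  # assumes SQLite 3.25+ for window functions
conn.execute("create table 班级表 (学号 text, 班级 integer, 成绩 integer)")
conn.executemany("insert into 班级表 values (?, ?, ?)", rows)

# partition by 班级 restarts the ranking for every class;
# order by 成绩 desc ranks the highest grade first inside each class.
ranked = conn.execute(
    """
    select 学号, 班级, 成绩,
       rank() over (partition by 班级
                    order by 成绩 desc) as ranking
    from 班级表
    order by 班级, ranking
    """
).fetchall()

for row in ranked:
    print(row)
```

The final order by 班级, ranking only fixes the order in which the rows are printed; the ranking itself comes from the over clause.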

The following figure shows the roles of partition by (grouping) and order by (sorting within a group).

[Figure: partition by groups the rows by class; order by sorts the rows within each group]

A window function combines the grouping of the group by clause with the sorting of the order by clause, both of which we have learned before. So why use window functions at all?

Because group by collapses the table when it groups and summarizes: the number of rows changes, and each category is left with a single row. partition by and rank do not reduce the number of rows of the original table. For example, the following counts the number of people in each class.

[Figure: counting the number of people in each class]
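The difference in row counts can be checked with a small sketch like the one below, which counts the students per class once with group by and once with count(*) as a window function. The data is hypothetical, and the sketch assumes Python's sqlite3 module is backed by SQLite 3.25 or later.

```python
import sqlite3

# Hypothetical sample data: (学号 student ID, 班级 class, 成绩 grade).
rows = [
    ("0001", 1, 86), ("0002", 1, 95), ("0003", 1, 92),
    ("0004", 2, 89), ("0005", 2, 74),
]

conn = sqlite3.connect(":memory:")  # assumes SQLite 3.25+ for window functions
conn.execute("create table 班级表 (学号 text, 班级 integer, 成绩 integer)")
conn.executemany("insert into 班级表 values (?, ?, ?)", rows)

# group by collapses each class into a single row.
grouped = conn.execute(
    "select 班级, count(*) from 班级表 group by 班级 order by 班级"
).fetchall()

# The window version keeps every original row and attaches the class size to it.
windowed = conn.execute(
    """
    select 学号, 班级,
       count(*) over (partition by 班级) as class_size
    from 班级表
    order by 学号
    """
).fetchall()

print(len(rows), len(grouped), len(windowed))
```

Here the grouped query returns one row per class, while the windowed query returns as many rows as the table has.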

I believe this example has made the use of this window function clear:

select *,
   rank() over (partition by 班级
                 order by 成绩 desc) as ranking
from 班级表

Now back to the question: why is it called a "window" function? Because each group produced by partition by is called a "window". The window here is not the door and window of a house; it means "scope".

Simply put, a window function has the following characteristics:

1) It groups and sorts at the same time

2) It does not reduce the number of rows in the original table

3) The syntax is as follows:

<window function> over (partition by <column to group by>
                order by <column to sort by>)

2. Other dedicated window functions

What is the difference between the dedicated window functions rank, dense_rank and row_number?

An example makes the difference between them clear at a glance:

select *,
   rank() over (order by 成绩 desc) as ranking,
   dense_rank() over (order by 成绩 desc) as dese_rank,
   row_number() over (order by 成绩 desc) as row_num
from 班级表

The result is:

[Figure: query result with the ranking, dese_rank and row_num columns]

As can be seen from the above results:

rank function: in this example the ranks run 5, 5, 5, 8. Rows with a tied rank share the same rank and use up the positions that follow. For example, where the normal ranking would be 1, 2, 3, 4 but the top 3 are tied, the result is 1, 1, 1, 4.

dense_rank function: in this example the ranks run 5, 5, 5, 6. Rows with a tied rank share the same rank but do not use up the positions that follow. For example, where the normal ranking would be 1, 2, 3, 4 but the top 3 are tied, the result is 1, 1, 1, 2.

row_number function: in this example the rows are numbered 5, 6, 7, 8. Ties are not taken into account at all. For example, even if the top 3 are tied, the numbering is still 1, 2, 3, 4.

The differences between these three functions are as follows:

[Figure: comparison of rank, dense_rank and row_number]

Finally, one point that needs to be emphasized is: in the above three dedicated window functions, the parentheses after the function do not require any parameters, just leave () empty.
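The three behaviours can be reproduced with a sketch like the following, which uses hypothetical grades containing a three-way tie in fifth place, matching the 5, 5, 5, 8 pattern described above. It keeps the article's column aliases and assumes Python's sqlite3 module is backed by SQLite 3.25 or later.

```python
import sqlite3

# Hypothetical grades: four distinct values, then three tied at 90, then 85.
grades = [100, 98, 95, 93, 90, 90, 90, 85]
rows = [(f"{i:04d}", 1, g) for i, g in enumerate(grades, start=1)]

conn = sqlite3.connect(":memory:")  # assumes SQLite 3.25+ for window functions
conn.execute("create table 班级表 (学号 text, 班级 integer, 成绩 integer)")
conn.executemany("insert into 班级表 values (?, ?, ?)", rows)

# Only the grade and the three numbering columns are selected, because the
# order of the tied students among themselves is not defined.
result = conn.execute(
    """
    select 成绩,
       rank() over (order by 成绩 desc) as ranking,
       dense_rank() over (order by 成绩 desc) as dese_rank,
       row_number() over (order by 成绩 desc) as row_num
    from 班级表
    order by row_num
    """
).fetchall()

for row in result:
    print(row)
```

For the three rows with grade 90, rank and dense_rank both give 5, while row_number gives 5, 6 and 7; the following row gets 8 from rank and row_number but 6 from dense_rank.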

Now, do you have a basic understanding of window functions?

3. Aggregate function as window function

An aggregate function is used as a window function in exactly the same way as the dedicated window functions above: just write the aggregate function in the <window function> position. The difference is that the parentheses after the function cannot be empty; the column to aggregate must be specified.

Let's see what results we get when an aggregate function is used as the window function:

select *,
   sum(成绩) over (order by 学号) as current_sum,
   avg(成绩) over (order by 学号) as current_avg,
   count(成绩) over (order by 学号) as current_count,
   max(成绩) over (order by 学号) as current_max,
   min(成绩) over (order by 学号) as current_min
from 班级表

The result is:

[Figure: query result with the current_sum, current_avg, current_count, current_max and current_min columns]

Did you notice anything? Take sum as an example:

As the figure above shows, sum used as a window function adds up the current row and all the rows above it. For example, for student No. 0004 the result is the sum of the grades of No. 0001, 0002, 0003 and 0004; for No. 0005 it is the sum of the grades of No. 0001~0005, and so on.

The same holds for the average, count, maximum and minimum: each is calculated over the current row and all the rows above it. Looking at the result again (below) with this in mind, it should be much easier to understand:

[Figure: the same query result]

For example, the values in the row of No. 0005 are the sum, average, count, maximum and minimum of the grades of the five students with student numbers 0001~0005.

If you want the aggregate over everyone's grades, such as the total or the average, just look at the last row.
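The running behaviour can be checked with a small sketch such as the one below, using five hypothetical students and Python's built-in sqlite3 module (assumed to be backed by SQLite 3.25 or later). With order by and no explicit frame, the window runs from the first row up to the current row, including any rows that tie with it on the order by column; since 学号 is unique here, that means exactly the current row and the rows above it.

```python
import sqlite3

# Hypothetical sample data: (学号 student ID, 班级 class, 成绩 grade).
rows = [
    ("0001", 1, 86), ("0002", 1, 95), ("0003", 1, 92),
    ("0004", 2, 83), ("0005", 2, 89),
]

conn = sqlite3.connect(":memory:")  # assumes SQLite 3.25+ for window functions
conn.execute("create table 班级表 (学号 text, 班级 integer, 成绩 integer)")
conn.executemany("insert into 班级表 values (?, ?, ?)", rows)

# Each aggregate is computed over the rows from 0001 up to the current 学号.
running = conn.execute(
    """
    select 学号,
       sum(成绩) over (order by 学号) as current_sum,
       avg(成绩) over (order by 学号) as current_avg,
       count(成绩) over (order by 学号) as current_count,
       max(成绩) over (order by 学号) as current_max,
       min(成绩) over (order by 学号) as current_min
    from 班级表
    order by 学号
    """
).fetchall()

for row in running:
    print(row)
```

The last row (0005) carries the totals for the whole table: sum 445, average 89.0, count 5, maximum 95 and minimum 83 for this made-up data.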

What is this use of window functions good for?

With an aggregate function as a window function, each row shows the statistics (maximum, minimum, etc.) as of that row. You can also see how each row of data affects the overall statistics.

4. Notes

The partition by clause can be omitted. If it is omitted, no grouping is specified and all rows are ranked together, from high to low:

select *,
   rank() over (order by 成绩 desc) as ranking
from 班级表

The result is:

[Figure: all rows ranked by grade from high to low, without grouping]

However, without partition by the grouping function of the window function is lost, so it is generally not used this way.

5. Summary

1. Window function syntax

<window function> over (partition by <column to group by>
                order by <column to sort by>)

Two kinds of functions can go in the <window function> position:

1) Dedicated window functions, such as rank, dense_rank, row_number, etc.

2) Aggregate functions, such as sum, avg, count, max, min, etc.

2. The window function has the following functions:

1) It has the functions of partition by and order by at the same time

2) Does not reduce the number of rows in the original table, so it is often used to rank within each group

3. Notes

In principle, window functions can only be written in the select clause

4. Window function usage scenarios

1) Business requirements that "rank within each group", such as:

Ranking problem: rank the employees of each department by performance
Top-N problem: find the top N employees in each department for a reward
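One common way to approach the top-N problem is to compute the rank in a subquery and filter on it in the outer query, since the window function itself belongs in the select clause. The sketch below is only an illustration under stated assumptions: it uses the article's 班级表 as a stand-in for a department table, the rows, the subquery alias ranked and the variable top_n are hypothetical, and it needs SQLite 3.25 or later behind Python's sqlite3 module.

```python
import sqlite3

# Hypothetical sample data: (学号 student ID, 班级 class, 成绩 grade).
rows = [
    ("0001", 1, 86), ("0002", 1, 95), ("0003", 1, 92), ("0004", 1, 83),
    ("0005", 2, 89), ("0006", 2, 74), ("0007", 2, 91),
]

conn = sqlite3.connect(":memory:")  # assumes SQLite 3.25+ for window functions
conn.execute("create table 班级表 (学号 text, 班级 integer, 成绩 integer)")
conn.executemany("insert into 班级表 values (?, ?, ?)", rows)

top_n = 2  # hypothetical N

# The inner query ranks within each class; the outer query keeps the top N.
top_rows = conn.execute(
    """
    select 学号, 班级, 成绩, ranking
    from (
        select *,
           rank() over (partition by 班级
                        order by 成绩 desc) as ranking
        from 班级表
    ) as ranked
    where ranking <= ?
    order by 班级, ranking
    """,
    (top_n,),
).fetchall()

for row in top_rows:
    print(row)
```

Because rank gives tied rows the same rank, a tie at the cutoff can return more than N rows per group; using row_number instead would cap each group at exactly N rows.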

Other examples: https://www.cnblogs.com/DataArt/p/9961676.html

Origin blog.csdn.net/weixin_44322234/article/details/114001291