3.5. Window Functions

A window function performs a calculation across a set of table rows that are somehow related to the current row. This is comparable to the type of calculation that can be done with an aggregate function. However, window functions do not cause rows to become grouped into a single output row the way non-window aggregate calls would. Instead, the rows retain their separate identities. Behind the scenes, the window function is able to access more than just the current row of the query result.

Here is an example that shows how to compare each employee's salary with the average salary in that employee's department:

SELECT depname, empno, salary, avg(salary) OVER (PARTITION BY depname) FROM empsalary;

  depname  | empno | salary |          avg
-----------+-------+--------+-----------------------
 develop   |    11 |   5200 | 5020.0000000000000000
 develop   |     7 |   4200 | 5020.0000000000000000
 develop   |     9 |   4500 | 5020.0000000000000000
 develop   |     8 |   6000 | 5020.0000000000000000
 develop   |    10 |   5200 | 5020.0000000000000000
 personnel |     5 |   3500 | 3700.0000000000000000
 personnel |     2 |   3900 | 3700.0000000000000000
 sales     |     3 |   4800 | 4866.6666666666666667
 sales     |     1 |   5000 | 4866.6666666666666667
 sales     |     4 |   4800 | 4866.6666666666666667
(10 rows)

The first three output columns come directly from the table empsalary, and there is one output row for each row in the table. The fourth column represents an average taken across all the table rows that have the same depname value as the current row. (This is actually the same function as the non-window avg aggregate, but the OVER clause causes it to be treated as a window function and computed across the window frame.)
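The text queries a table named empsalary without showing its definition. The following is a minimal sketch for recreating it so the examples can be run: the rows are taken from the query output above, the column types are assumptions, and enroll_date (which appears only in a later query) is left NULL because the text never shows values for it.

```sql
-- Hypothetical definition: the column types are assumed, not given in the text.
CREATE TABLE empsalary (
    depname     varchar,
    empno       bigint,
    salary      integer,
    enroll_date date        -- no values appear in the text; left NULL here
);

-- Rows copied from the example output above.
INSERT INTO empsalary (depname, empno, salary) VALUES
    ('develop',   11, 5200),
    ('develop',    7, 4200),
    ('develop',    9, 4500),
    ('develop',    8, 6000),
    ('develop',   10, 5200),
    ('personnel',  5, 3500),
    ('personnel',  2, 3900),
    ('sales',      3, 4800),
    ('sales',      1, 5000),
    ('sales',      4, 4800);

-- For comparison: the non-window aggregate collapses each department
-- into a single row instead of keeping one row per employee.
SELECT depname, avg(salary) FROM empsalary GROUP BY depname;
```

With this data the GROUP BY query returns three rows (one per department), whereas the window version above returns all ten rows with the department average repeated on each.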

A window function call always contains an OVER clause directly following the window function's name and argument(s). This is what syntactically distinguishes it from a normal function or non-window aggregate. The OVER clause determines exactly how the rows of the query are split up for processing by the window function. The PARTITION BY clause within OVER divides the rows into groups, or partitions, that share the same values of the PARTITION BY expression(s) (depname in the example above). For each row, the window function is computed across the rows that fall into the same partition as the current row; in the example, that is the average salary of all rows with the same depname.

You can also control the order in which rows are processed by window functions using ORDER BY within OVER. (The window ORDER BY does not even have to match the order in which the rows are output.) Here is an example:

SELECT depname, empno, salary,
       rank() OVER (PARTITION BY depname ORDER BY salary DESC)
FROM empsalary;

  depname  | empno | salary | rank
-----------+-------+--------+------
 develop   |     8 |   6000 |    1
 develop   |    10 |   5200 |    2
 develop   |    11 |   5200 |    2
 develop   |     9 |   4500 |    4
 develop   |     7 |   4200 |    5
 personnel |     2 |   3900 |    1
 personnel |     5 |   3500 |    2
 sales     |     1 |   5000 |    1
 sales     |     4 |   4800 |    2
 sales     |     3 |   4800 |    2
(10 rows)

As shown here, the rank function produces a numerical rank for each distinct ORDER BY value in the current row's partition, using the order defined by the ORDER BY clause. Rows with equal ORDER BY values receive the same rank, and the ranks that follow are skipped accordingly: the two develop rows with salary 5200 both rank 2, and the next row ranks 4. rank needs no explicit parameter, because its behavior is entirely determined by the OVER clause.

The rows considered by a window function are those of the “virtual table” produced by the query's FROM clause as filtered by its WHERE, GROUP BY, and HAVING clauses, if any. For example, a row removed because it does not meet the WHERE condition is not seen by any window function. A query can contain multiple window functions that slice up the data in different ways using different OVER clauses, but they all act on the same collection of rows defined by this virtual table.
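As an illustration of the point above, the following sketch combines a WHERE filter with two window functions that use different OVER clauses. The threshold of 4500 and the column aliases dept_avg and dept_rank are chosen here for illustration only and do not come from the text.

```sql
-- WHERE is applied first, so both window functions see only the rows
-- with salary >= 4500. Each function then slices that same set of rows
-- according to its own OVER clause.
SELECT depname, empno, salary,
       avg(salary) OVER (PARTITION BY depname)                 AS dept_avg,
       rank() OVER (PARTITION BY depname ORDER BY salary DESC) AS dept_rank
FROM empsalary
WHERE salary >= 4500;
```

With the sample rows shown earlier, both personnel rows (3500 and 3900) are filtered out before any window calculation, and the develop average becomes 5225 instead of 5020 because the 4200 row no longer takes part.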

We already saw that ORDER BY can be omitted if the ordering of rows is not important. It is also possible to omit PARTITION BY, in which case there is a single partition containing all rows.
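A minimal sketch of both omissions, using the same empsalary table; the aliases company_rank and total_rows are illustrative names, not taken from the text.

```sql
SELECT depname, empno, salary,
       -- No PARTITION BY: all rows form one partition, so the rank is
       -- computed across the whole table rather than per department.
       rank() OVER (ORDER BY salary DESC) AS company_rank,
       -- Neither PARTITION BY nor ORDER BY: every row sees all rows.
       count(*) OVER ()                   AS total_rows
FROM empsalary;
```

With the sample rows, the 6000 salary gets company_rank 1, the two 5200 salaries share rank 2, and total_rows is 10 on every row.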

There is another important concept associated with window functions: for each row, there is a set of rows within its partition called its window frame. Some window functions act only on the rows of the window frame, rather than on the whole partition. By default, if ORDER BY is supplied, the frame consists of all rows from the start of the partition up through the current row, plus any following rows that are equal to the current row according to the ORDER BY clause. When ORDER BY is omitted, the default frame consists of all rows in the partition. Here is an example using sum:

SELECT salary, sum(salary) OVER () FROM empsalary;

 salary |  sum
--------+-------
   5200 | 47100
   5000 | 47100
   3500 | 47100
   4800 | 47100
   3900 | 47100
   4200 | 47100
   4500 | 47100
   4800 | 47100
   6000 | 47100
   5200 | 47100
(10 rows)

Above, since there is no ORDER BY in the OVER clause, the window frame is the same as the partition, which for lack of PARTITION BY is the whole table. In other words, each sum is taken over the whole table, so we get the same result for each output row. But if we add an ORDER BY clause, we get very different results:

SELECT salary, sum(salary) OVER (ORDER BY salary) FROM empsalary;

 salary |  sum
--------+-------
   3500 |  3500
   3900 |  7400
   4200 | 11600
   4500 | 16100
   4800 | 25700
   4800 | 25700
   5000 | 30700
   5200 | 41100
   5200 | 41100
   6000 | 47100
(10 rows)

Here the sum is taken from the first (lowest) salary up through the current one, including any duplicates of the current one. Notice the results for the duplicated salaries: both rows with salary 4800 show 25700, because each row's frame includes the other.
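The default frame described above can also be written out explicitly. The sketch below goes slightly beyond what this section covers: it uses the frame clause of the OVER syntax to compare the default behavior with a row-based frame. The aliases default_frame, range_frame, and rows_frame are illustrative names.

```sql
SELECT salary,
       -- Default frame when ORDER BY is present.
       sum(salary) OVER (ORDER BY salary) AS default_frame,
       -- The same frame spelled out: rows that tie with the current row
       -- on the ORDER BY value are included.
       sum(salary) OVER (ORDER BY salary
                         RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS range_frame,
       -- Row-based frame: stops at the current row, so tied rows get
       -- different running totals.
       sum(salary) OVER (ORDER BY salary
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_frame
FROM empsalary;
```

With the sample rows, default_frame and range_frame are identical on every row. In rows_frame the two 4800 rows get 20900 and 25700; which of the two tied rows gets which total is not determined, since ORDER BY salary does not order them relative to each other.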

Window functions are permitted only in the SELECT list and the ORDER BY clause of the query. They are forbidden elsewhere, such as in GROUP BY, HAVING, and WHERE clauses. This is because they logically execute after the processing of those clauses. Also, window functions execute after non-window aggregate functions. This means it is valid to include an aggregate function call in the arguments of a window function, but not vice versa.
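Because aggregates are evaluated before window functions, a window function can operate on the result of GROUP BY. A minimal sketch using the same table; the aliases dept_total and dept_rank are illustrative names.

```sql
-- sum(salary) is computed per department by GROUP BY first; rank() then
-- runs over the three grouped rows, ordering them by that total.
SELECT depname,
       sum(salary)                             AS dept_total,
       rank() OVER (ORDER BY sum(salary) DESC) AS dept_rank
FROM empsalary
GROUP BY depname;

-- The reverse nesting is rejected, because the window function would
-- have to run before the aggregate that contains it:
--   SELECT sum(rank() OVER (ORDER BY salary)) FROM empsalary;
```

With the sample rows, the totals are 25100 for develop, 14600 for sales, and 7400 for personnel, ranked 1, 2, and 3 respectively.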

If there is a need to filter or group rows after the window calculations are performed, you can use a sub-select. For example:

SELECT depname, empno, salary, enroll_date
FROM
  (SELECT depname, empno, salary, enroll_date,
          rank() OVER (PARTITION BY depname ORDER BY salary DESC, empno) AS pos
     FROM empsalary
  ) AS ss
WHERE pos < 3;

The above query shows only the rows from the inner query whose rank (pos) is less than 3.

When a query involves multiple window functions, it is possible to write out each one with a separate OVER clause, but this is duplicative and error-prone if the same windowing behavior is wanted for several functions. Instead, each windowing behavior can be named in a WINDOW clause and then referenced in OVER. For example:

SELECT sum(salary) OVER w, avg(salary) OVER w
  FROM empsalary
  WINDOW w AS (PARTITION BY depname ORDER BY salary DESC);

More details about window functions can be found in Section 4.2.8, Section 9.21, Section 7.2.5, and the SELECT reference page.

Reposted from blog.csdn.net/ghostliming/article/details/103895833