4.2.7. Aggregate Expressions


An aggregate expression represents the application of an aggregate function across the rows selected by a query. An aggregate function reduces multiple inputs to a single output value, such as the sum or average of the inputs. The syntax of an aggregate expression is one of the following:


aggregate_name (expression [ , ... ] [ order_by_clause ] ) [ FILTER ( WHERE filter_clause ) ]

aggregate_name (ALL expression [ , ... ] [ order_by_clause ] ) [ FILTER ( WHERE filter_clause ) ]

aggregate_name (DISTINCT expression [ , ... ] [ order_by_clause ] ) [ FILTER ( WHERE filter_clause ) ]

aggregate_name ( * ) [ FILTER ( WHERE filter_clause ) ]

aggregate_name ( [ expression [ , ... ] ] ) WITHIN GROUP ( order_by_clause ) [ FILTER ( WHERE filter_clause ) ]

where aggregate_name is a previously defined aggregate (possibly qualified with a schema name) and expression is any value expression that does not itself contain an aggregate expression or a window function call. The optional order_by_clause and filter_clause are described below.


The first form of aggregate expression invokes the aggregate once for each input row. The second form is the same as the first, since ALL is the default. The third form invokes the aggregate once for each distinct value of the expression (or distinct set of values, for multiple expressions) found in the input rows. The fourth form invokes the aggregate once for each input row; since no particular input value is specified, it is generally only useful for the count(*) aggregate function. The last form is used with ordered-set aggregate functions, which are described below.


Most aggregate functions ignore null inputs, so that rows in which one or more of the expression(s) yield null are discarded. This can be assumed to be true, unless otherwise specified, for all built-in aggregates.


For example, count(*) yields the total number of input rows; count(f1) yields the number of input rows in which f1 is non-null, since count ignores nulls; and count(distinct f1) yields the number of distinct non-null values of f1.
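
As a minimal sketch of these three counts, the query below evaluates them over an inline VALUES list. The alias t, the column aliases, and the sample values are made up for illustration.

```sql
-- Hypothetical sample data: four rows, one of which is NULL.
SELECT count(*)           AS total_rows,   -- 4: every input row is counted
       count(f1)          AS non_null_f1,  -- 3: the NULL row is ignored
       count(DISTINCT f1) AS distinct_f1   -- 2: distinct non-null values (1 and 2)
FROM (VALUES (1), (1), (2), (NULL)) AS t(f1);
```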


Ordinarily, the input rows are fed to the aggregate function in an unspecified order. In many cases this does not matter; for example, min produces the same result no matter what order it receives the inputs in. However, some aggregate functions (such as array_agg and string_agg) produce results that depend on the ordering of the input rows. When using such an aggregate, the optional order_by_clause can be used to specify the desired ordering. The order_by_clause has the same syntax as for a query-level ORDER BY clause, as described in Section 7.5, except that its expressions are always just expressions and cannot be output-column names or numbers. For example:


SELECT array_agg(a ORDER BY b DESC) FROM table;
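
The statement above uses table as a placeholder name. A self-contained variant, assuming a made-up inline data set t(a, b), shows how the ORDER BY inside the call decides the order of the array elements:

```sql
-- Hypothetical data: a is the value to collect, b is the sort key.
SELECT array_agg(a ORDER BY b DESC) AS by_b_desc,  -- {y,z,x}
       array_agg(a ORDER BY b)      AS by_b_asc    -- {x,z,y}
FROM (VALUES ('x', 1), ('y', 3), ('z', 2)) AS t(a, b);
```

Note that b is used only for sorting here; it is not passed to array_agg as an argument.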

When dealing with multiple-argument aggregate functions, note that the ORDER BY clause goes after all the aggregate arguments. For example, write this:


SELECT string_agg(a, ',' ORDER BY a) FROM table;

not this:


SELECT string_agg(a ORDER BY a, ',') FROM table; -- incorrect

The latter is syntactically valid, but it represents a call of a single-argument aggregate function with two ORDER BY keys (the second one being rather useless since it's a constant).
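
A runnable sketch of the correct placement, using a made-up inline data set t(a) in place of a real table:

```sql
-- The delimiter ',' is the second argument; ORDER BY follows all arguments.
SELECT string_agg(a, ',' ORDER BY a) AS joined  -- a,b,c
FROM (VALUES ('b'), ('a'), ('c')) AS t(a);

-- The misplaced form is parsed as string_agg(a) with the sort keys (a, ','):
--   SELECT string_agg(a ORDER BY a, ',') FROM (VALUES ('b'), ('a'), ('c')) AS t(a);
-- Assuming only the built-in aggregates are installed, this should be rejected
-- at function lookup, since the built-in string_agg takes two arguments.
```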


If DISTINCT is specified in addition to an order_by_clause, then all the ORDER BY expressions must match regular arguments of the aggregate; that is, you cannot sort on an expression that is not included in the DISTINCT list.
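
The rule can be illustrated with a small sketch over a hypothetical inline data set t(a, b); only the names a and b and the sample values are assumptions:

```sql
-- Allowed: the sort expression a is also the aggregate's argument.
SELECT array_agg(DISTINCT a ORDER BY a DESC) AS distinct_desc  -- {3,2,1}
FROM (VALUES (1, 10), (2, 20), (2, 30), (3, 40)) AS t(a, b);

-- Not allowed: b is not an argument of the aggregate, so it cannot be a
-- sort key once DISTINCT is present.
--   SELECT array_agg(DISTINCT a ORDER BY b)
--   FROM (VALUES (1, 10), (2, 20), (2, 30), (3, 40)) AS t(a, b);
```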


Note

The ability to specify both DISTINCT and ORDER BY in an aggregate function is a PostgreSQL extension.


Placing ORDER BY within the aggregate's regular argument list, as described so far, is used when ordering the input rows for general-purpose and statistical aggregates, for which ordering is optional. There is a subclass of aggregate functions called ordered-set aggregates for which an order_by_clause is required, usually because the aggregate's computation is only sensible in terms of a specific ordering of its input rows. Typical examples of ordered-set aggregates include rank and percentile calculations.

For an ordered-set aggregate, the order_by_clause is written inside WITHIN GROUP (...), as shown in the final syntax alternative above. The expressions in the order_by_clause are evaluated once per input row just like regular aggregate arguments, sorted as per the order_by_clause's requirements, and fed to the aggregate function as input arguments. (This is unlike the case for a non-WITHIN GROUP order_by_clause, which is not treated as argument(s) to the aggregate function.)

The argument expressions preceding WITHIN GROUP, if any, are called direct arguments to distinguish them from the aggregated arguments listed in the order_by_clause. Unlike regular aggregate arguments, direct arguments are evaluated only once per aggregate call, not once per input row. This means that they can contain variables only if those variables are grouped by GROUP BY; this restriction is the same as if the direct arguments were not inside an aggregate expression at all. Direct arguments are typically used for things like percentile fractions, which only make sense as a single value per aggregation calculation. The direct argument list can be empty; in this case, write just () not (*). (PostgreSQL will actually accept either spelling, but only the first way conforms to the SQL standard.)


An example of an ordered-set aggregate call is:


SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY income) FROM households;

 percentile_cont
-----------------
           50489

which obtains the 50th percentile, or median, value of the income column from table households. Here, 0.5 is a direct argument; it would make no sense for the percentile fraction to be a value varying across rows.
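
Since the households table is not defined here, the following sketch substitutes a made-up inline data set h(region, income) and adds a GROUP BY, so that the direct argument 0.5 is evaluated once per group while income is aggregated per row:

```sql
-- Hypothetical data: three incomes in 'north', two in 'south'.
SELECT region,
       percentile_cont(0.5) WITHIN GROUP (ORDER BY income) AS median_income
FROM (VALUES ('north', 10), ('north', 20), ('north', 30),
             ('south', 40), ('south', 60)) AS h(region, income)
GROUP BY region
ORDER BY region;
-- north -> 20 (middle value), south -> 50 (interpolated between 40 and 60)
```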


If FILTER is specified, then only the input rows for which the filter_clause evaluates to true are fed to the aggregate function; other rows are discarded. For example:


SELECT
    count(*) AS unfiltered,
    count(*) FILTER (WHERE i < 5) AS filtered
FROM generate_series(1,10) AS s(i);

 unfiltered | filtered
------------+----------
         10 |        4
(1 row)
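
FILTER applies per aggregate, so several differently filtered aggregates can share one pass over the same rows. A small sketch along the same lines (the column aliases are illustrative):

```sql
SELECT count(*) FILTER (WHERE i % 2 = 0) AS evens,     -- 5: the rows 2, 4, 6, 8, 10
       sum(i)   FILTER (WHERE i > 5)     AS sum_gt_5,  -- 40: 6 + 7 + 8 + 9 + 10
       sum(i)                            AS sum_all    -- 55: no filter, all ten rows
FROM generate_series(1, 10) AS s(i);
```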

The predefined aggregate functions are described in Section 9.20. Other aggregate functions can be added by the user.


An aggregate expression can only appear in the result list or HAVING clause of a SELECT command. It is forbidden in other clauses, such as WHERE, because those clauses are logically evaluated before the results of aggregates are formed.
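
In practice, a condition on an aggregate result goes in HAVING rather than WHERE. A minimal sketch, assuming a hypothetical inline data set emp(dept, salary):

```sql
-- Allowed: HAVING is evaluated after the groups and their aggregates are formed.
SELECT dept, avg(salary) AS avg_salary
FROM (VALUES ('a', 40), ('a', 80), ('b', 30)) AS emp(dept, salary)
GROUP BY dept
HAVING avg(salary) > 50;  -- keeps only dept 'a' (average 60)

-- Rejected: WHERE is evaluated before any aggregate result exists.
--   SELECT dept
--   FROM (VALUES ('a', 40), ('a', 80), ('b', 30)) AS emp(dept, salary)
--   WHERE avg(salary) > 50
--   GROUP BY dept;
```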


When an aggregate expression appears in a subquery (see Section 4.2.11 and Section 9.22), the aggregate is normally evaluated over the rows of the subquery. But an exception occurs if the aggregate's arguments (and filter_clause if any) contain only outer-level variables: the aggregate then belongs to the nearest such outer level, and is evaluated over the rows of that query. The aggregate expression as a whole is then an outer reference for the subquery it appears in, and acts as a constant over any one evaluation of that subquery. The restriction about appearing only in the result list or HAVING clause applies with respect to the query level that the aggregate belongs to.
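
The two queries below sketch this distinction. The inline tables t(x) and one(d) are made up for illustration, and the results noted in the comments are what the rule above implies rather than captured output.

```sql
-- sum(t.x) references only the outer table t, so the aggregate belongs to
-- the outer query, which becomes an aggregate query over all rows of t.
-- Expected: a single row containing 6.
SELECT (SELECT sum(t.x) FROM (VALUES (0)) AS one(d)) AS outer_level_sum
FROM (VALUES (1), (2), (3)) AS t(x);

-- sum(t.x + one.d) also references the subquery's own table, so it is
-- evaluated over the rows of the subquery, once for each row of t.
-- Expected: three rows containing 1, 2 and 3.
SELECT (SELECT sum(t.x + one.d) FROM (VALUES (0)) AS one(d)) AS per_row_sum
FROM (VALUES (1), (2), (3)) AS t(x);
```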



Reposted from blog.csdn.net/ghostliming/article/details/104274875