7.8.1. SELECT in WITH

The basic value of SELECT in WITH is to break down complicated queries into simpler parts. An example is:

WITH regional_sales AS (
    SELECT region, SUM(amount) AS total_sales
    FROM orders
    GROUP BY region
), top_regions AS (
    SELECT region
    FROM regional_sales
    WHERE total_sales > (SELECT SUM(total_sales)/10 FROM regional_sales)
)
SELECT region,
       product,
       SUM(quantity) AS product_units,
       SUM(amount) AS product_sales
FROM orders
WHERE region IN (SELECT region FROM top_regions)
GROUP BY region, product;

which displays per-product sales totals in only the top sales regions. The WITH clause defines two auxiliary statements named regional_sales and top_regions, where the output of regional_sales is used in top_regions and the output of top_regions is used in the primary SELECT query. This example could have been written without WITH, but we'd have needed two levels of nested sub-SELECTs. It's a bit easier to follow this way.

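To make the comparison with nested sub-SELECTs concrete, here is a small sketch that runs both forms against an in-memory SQLite database through Python's sqlite3 module. SQLite stands in for PostgreSQL only because the syntax used here is portable; the sample rows in orders and the added ORDER BY are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (region TEXT, product TEXT, quantity INTEGER, amount INTEGER)"
)
# Hypothetical sample data: regional totals are north 900, east 640, south 50, west 10.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        ("north", "widget", 10, 500),
        ("north", "gadget", 5, 400),
        ("south", "widget", 1, 50),
        ("east", "gadget", 8, 640),
        ("west", "widget", 1, 10),
    ],
)

# The WITH form from the text (ORDER BY added so the output is deterministic).
with_query = """
WITH regional_sales AS (
    SELECT region, SUM(amount) AS total_sales
    FROM orders
    GROUP BY region
), top_regions AS (
    SELECT region
    FROM regional_sales
    WHERE total_sales > (SELECT SUM(total_sales)/10 FROM regional_sales)
)
SELECT region, product,
       SUM(quantity) AS product_units,
       SUM(amount) AS product_sales
FROM orders
WHERE region IN (SELECT region FROM top_regions)
GROUP BY region, product
ORDER BY region, product
"""

# The same query without WITH: the regional_sales subquery has to be repeated.
nested_query = """
SELECT region, product,
       SUM(quantity) AS product_units,
       SUM(amount) AS product_sales
FROM orders
WHERE region IN (
    SELECT region
    FROM (SELECT region, SUM(amount) AS total_sales
          FROM orders GROUP BY region) AS rs
    WHERE total_sales > (
        SELECT SUM(total_sales)/10
        FROM (SELECT region, SUM(amount) AS total_sales
              FROM orders GROUP BY region) AS rs2
    )
)
GROUP BY region, product
ORDER BY region, product
"""

with_rows = conn.execute(with_query).fetchall()
nested_rows = conn.execute(nested_query).fetchall()
print(with_rows == nested_rows)
```

Both statements return the same rows; the nested version simply has to spell out the regional_sales aggregation twice.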

The optional RECURSIVE modifier changes WITH from a mere syntactic convenience into a feature that accomplishes things not otherwise possible in standard SQL. Using RECURSIVE, a WITH query can refer to its own output. A very simple example is this query to sum the integers from 1 through 100:

WITH RECURSIVE t(n) AS (
    VALUES (1)
  UNION ALL
    SELECT n+1 FROM t WHERE n < 100
)
SELECT sum(n) FROM t;

The general form of a recursive WITH query is always a non-recursive term, then UNION (or UNION ALL), then a recursive term, where only the recursive term can contain a reference to the query's own output. Such a query is executed as follows:

Recursive Query Evaluation

1. Evaluate the non-recursive term. For UNION (but not UNION ALL), discard duplicate rows. Include all remaining rows in the result of the recursive query, and also place them in a temporary working table.

2. So long as the working table is not empty, repeat these steps:

a. Evaluate the recursive term, substituting the current contents of the working table for the recursive self-reference. For UNION (but not UNION ALL), discard duplicate rows and rows that duplicate any previous result row. Include all remaining rows in the result of the recursive query, and also place them in a temporary intermediate table.

b. Replace the contents of the working table with the contents of the intermediate table, then empty the intermediate table.

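The steps above can be modeled in a few lines of Python. This is only a sketch of the described procedure, not PostgreSQL's implementation: evaluate_recursive and next_n are hypothetical names, rows are plain tuples, and the recursive term is passed in as a function of the working table.

```python
def evaluate_recursive(non_recursive_rows, recursive_term, union_all=True):
    """Model the evaluation steps; returns (result_rows, iterations)."""
    seen = set()  # rows already emitted, consulted only for UNION

    def keep(rows):
        # UNION ALL keeps everything; UNION drops duplicates within the batch
        # and rows that duplicate any previous result row.
        if union_all:
            return list(rows)
        kept = []
        for row in rows:
            if row not in seen:
                seen.add(row)
                kept.append(row)
        return kept

    # Step 1: evaluate the non-recursive term and seed the working table.
    working = keep(non_recursive_rows)
    result = list(working)
    iterations = 0

    # Step 2: repeat while the working table is not empty.
    while working:
        iterations += 1
        intermediate = keep(recursive_term(working))  # step 2a
        result.extend(intermediate)
        working = intermediate                        # step 2b
    return result, iterations


def next_n(working):
    # Stands in for: SELECT n+1 FROM t WHERE n < 100
    return [(n + 1,) for (n,) in working if n < 100]


rows, iterations = evaluate_recursive([(1,)], next_n)
total = sum(n for (n,) in rows)
print(total, iterations)
```

With union_all=False the same function stops on cyclic input as soon as an iteration produces only rows that were already emitted.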

Note

Strictly speaking, this process is iteration, not recursion, but RECURSIVE is the terminology chosen by the SQL standards committee.

In the example above, the working table has just a single row in each step, and it takes on the values from 1 through 100 in successive steps. In the 100th step, there is no output because of the WHERE clause, and so the query terminates.

Recursive queries are typically used to deal with hierarchical or tree-structured data. A useful example is this query to find all the direct and indirect sub-parts of a product, given only a table that shows immediate inclusions:

WITH RECURSIVE included_parts(sub_part, part, quantity) AS (
    SELECT sub_part, part, quantity FROM parts WHERE part = 'our_product'
  UNION ALL
    SELECT p.sub_part, p.part, p.quantity
    FROM included_parts pr, parts p
    WHERE p.part = pr.sub_part
)
SELECT sub_part, SUM(quantity) AS total_quantity
FROM included_parts
GROUP BY sub_part;
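
As a quick way to see what this query returns, the sketch below runs it against a tiny parts table in an in-memory SQLite database via Python's sqlite3 module. The sample rows are invented for illustration, SQLite is used only as a convenient stand-in because the statement is portable, and an ORDER BY is added so the output order is fixed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parts (sub_part TEXT, part TEXT, quantity INTEGER)")
# Hypothetical data: each row says "part directly contains quantity of sub_part".
conn.executemany(
    "INSERT INTO parts VALUES (?, ?, ?)",
    [
        ("frame", "our_product", 1),
        ("wheel", "our_product", 2),
        ("spoke", "wheel", 32),
        ("tire", "wheel", 1),
        ("bolt", "frame", 4),
        ("bolt", "wheel", 6),
        ("seat", "other_product", 1),  # not reachable from our_product
    ],
)

rows = conn.execute(
    """
    WITH RECURSIVE included_parts(sub_part, part, quantity) AS (
        SELECT sub_part, part, quantity FROM parts WHERE part = 'our_product'
      UNION ALL
        SELECT p.sub_part, p.part, p.quantity
        FROM included_parts pr, parts p
        WHERE p.part = pr.sub_part
    )
    SELECT sub_part, SUM(quantity) AS total_quantity
    FROM included_parts
    GROUP BY sub_part
    ORDER BY sub_part
    """
).fetchall()

for sub_part, total_quantity in rows:
    print(sub_part, total_quantity)
```

Note that the query adds up the quantities as listed in each inclusion row; it does not multiply them along the path from our_product.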

When working with recursive queries it is important to be sure that the recursive part of the query will eventually return no tuples, or else the query will loop indefinitely. Sometimes, using UNION instead of UNION ALL can accomplish this by discarding rows that duplicate previous output rows. However, often a cycle does not involve output rows that are completely duplicate: it may be necessary to check just one or a few fields to see if the same point has been reached before. The standard method for handling such situations is to compute an array of the already-visited values. For example, consider the following query that searches a table graph using a link field:

WITH RECURSIVE search_graph(id, link, data, depth) AS (
    SELECT g.id, g.link, g.data, 1
    FROM graph g
  UNION ALL
    SELECT g.id, g.link, g.data, sg.depth + 1
    FROM graph g, search_graph sg
    WHERE g.id = sg.link
)
SELECT * FROM search_graph;

This query will loop if the link relationships contain cycles. Because we require a “depth” output, just changing UNION ALL to UNION would not eliminate the looping. Instead we need to recognize whether we have reached the same row again while following a particular path of links. We add two columns path and cycle to the loop-prone query:

WITH RECURSIVE search_graph(id, link, data, depth, path, cycle) AS (
    SELECT g.id, g.link, g.data, 1,
           ARRAY[g.id],
           false
    FROM graph g
  UNION ALL
    SELECT g.id, g.link, g.data, sg.depth + 1,
           path || g.id,
           g.id = ANY(path)
    FROM graph g, search_graph sg
    WHERE g.id = sg.link AND NOT cycle
)
SELECT * FROM search_graph;

Aside from preventing cycles, the array value is often useful in its own right as representing the “path” taken to reach any particular row.

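To see how the path and cycle columns stop the looping, the following Python sketch mimics the query's row-by-row logic on a small cyclic graph. It is a model under stated assumptions rather than actual query execution: the graph rows are hypothetical, the data column is omitted, and tuples play the role of the path array.

```python
# Hypothetical graph rows (id, link): 1 -> 2 -> 3 -> 1 is a cycle, and 4 -> 1 leads into it.
graph = [(1, 2), (2, 3), (3, 1), (4, 1)]


def search_graph(graph):
    """Return rows (id, link, depth, path, cycle), modeling the query above."""
    # Non-recursive term: every graph row starts its own path.
    working = [(gid, link, 1, (gid,), False) for gid, link in graph]
    result = list(working)
    while working:
        intermediate = []
        for _sg_id, sg_link, depth, path, cycle in working:
            if cycle:
                continue  # WHERE ... AND NOT cycle
            for gid, link in graph:
                if gid == sg_link:  # WHERE g.id = sg.link
                    intermediate.append(
                        (gid, link, depth + 1, path + (gid,), gid in path)
                    )
        result.extend(intermediate)
        working = intermediate
    return result


rows = search_graph(graph)
for row in rows:
    print(row)
```

Each path is followed until it revisits an id; that row is still emitted with cycle set to true, but it is not expanded further, so the working table eventually becomes empty.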

In the general case where more than one field needs to be checked to recognize a cycle, use an array of rows. For example, if we needed to compare fields f1 and f2:

WITH RECURSIVE search_graph(id, link, data, depth, path, cycle) AS (
    SELECT g.id, g.link, g.data, 1,
           ARRAY[ROW(g.f1, g.f2)],
           false
    FROM graph g
  UNION ALL
    SELECT g.id, g.link, g.data, sg.depth + 1,
           path || ROW(g.f1, g.f2),
           ROW(g.f1, g.f2) = ANY(path)
    FROM graph g, search_graph sg
    WHERE g.id = sg.link AND NOT cycle
)
SELECT * FROM search_graph;

Tip

Omit the ROW() syntax in the common case where only one field needs to be checked to recognize a cycle. This allows a simple array rather than a composite-type array to be used, gaining efficiency.

Tip

The recursive query evaluation algorithm produces its output in breadth-first search order. You can display the results in depth-first search order by making the outer query ORDER BY a “path” column constructed in this way.

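The effect of ordering by the path column can be illustrated with a small Python model. The tree below is hypothetical, and the sketch assumes that path values compare element by element with a shorter prefix sorting first, which is how Python tuples behave.

```python
# Hypothetical tree given as parent -> children.
children = {"a": ["b", "c"], "b": ["d", "e"], "c": ["f"]}

# Level-by-level generation, mirroring how the working table advances.
working = [("a", ("a",))]  # rows are (id, path)
breadth_first = list(working)
while working:
    intermediate = [
        (child, path + (child,))
        for node, path in working
        for child in children.get(node, [])
    ]
    breadth_first.extend(intermediate)
    working = intermediate

# ORDER BY path: sorting on the path column yields a depth-first listing.
depth_first = sorted(breadth_first, key=lambda row: row[1])

bfs_ids = [node for node, _ in breadth_first]
dfs_ids = [node for node, _ in depth_first]
print(bfs_ids)
print(dfs_ids)
```

Because every row's path starts with its parent's path, sorting by path places each node directly after its ancestors and before the next sibling subtree.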

A helpful trick for testing queries when you are not certain if they might loop is to place a LIMIT in the parent query. For example, this query would loop forever without the LIMIT:

WITH RECURSIVE t(n) AS (
    SELECT 1
  UNION ALL
    SELECT n+1 FROM t
)
SELECT n FROM t LIMIT 100;

This works because PostgreSQL's implementation evaluates only as many rows of a WITH query as are actually fetched by the parent query. Using this trick in production is not recommended, because other systems might work differently. Also, it usually won't work if you make the outer query sort the recursive query's results or join them to some other table, because in such cases the outer query will usually try to fetch all of the WITH query's output anyway.

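The behavior described here is essentially lazy evaluation, and a Python generator gives a rough analogy. This is not how PostgreSQL is implemented; the generator t below is a hypothetical stand-in for the endless WITH query, and islice plays the role of LIMIT.

```python
from itertools import islice


def t():
    """Endless row source standing in for the non-terminating WITH query."""
    n = 1          # SELECT 1
    while True:
        yield n
        n += 1     # SELECT n+1 FROM t


# SELECT n FROM t LIMIT 100: only 100 rows are ever pulled from the source.
first_100 = list(islice(t(), 100))
print(first_100[0], first_100[-1], len(first_100))

# By contrast, sorted(t()) would never return, because sorting has to
# consume every row before it can produce the first one.
```

The same reasoning explains the caveat about sorting and joins: any consumer that needs the full input before emitting output defeats the early stop.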

A useful property of WITH queries is that they are evaluated only once per execution of the parent query, even if they are referred to more than once by the parent query or sibling WITH queries. Thus, expensive calculations that are needed in multiple places can be placed within a WITH query to avoid redundant work. Another possible application is to prevent unwanted multiple evaluations of functions with side-effects. However, the other side of this coin is that the optimizer is less able to push restrictions from the parent query down into a WITH query than an ordinary subquery. The WITH query will generally be evaluated as written, without suppression of rows that the parent query might discard afterwards. (But, as mentioned above, evaluation might stop early if the reference(s) to the query demand only a limited number of rows.)

The examples above only show WITH being used with SELECT, but it can be attached in the same way to INSERT, UPDATE, or DELETE. In each case it effectively provides temporary table(s) that can be referred to in the main command.

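As an illustration of attaching WITH to a data-modifying command, the sketch below uses a WITH query to choose which rows a DELETE removes. It runs on an in-memory SQLite database through Python's sqlite3 module as a convenient stand-in; the table contents, the small_regions name, and the threshold of 100 are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount INTEGER)")
# Hypothetical data: south and west each total less than 100.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("north", 500), ("north", 400), ("south", 50), ("west", 10)],
)

# The WITH query acts as a temporary table that the DELETE refers to.
conn.execute(
    """
    WITH small_regions AS (
        SELECT region
        FROM orders
        GROUP BY region
        HAVING SUM(amount) < 100
    )
    DELETE FROM orders
    WHERE region IN (SELECT region FROM small_regions)
    """
)

remaining = conn.execute(
    "SELECT region, amount FROM orders ORDER BY amount"
).fetchall()
print(remaining)
```

The same pattern applies to INSERT and UPDATE: the WITH clause comes first and the main command refers to its output by name.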
