Optimization and execution of SQL window functions

Window functions were introduced in the SQL:2003 standard and refined in SQL:2011 and SQL:2016, which added several extensions. A window function differs from the ordinary functions and aggregate functions we are familiar with: for each row, it takes multiple rows (a window) as input and returns a single value. In analytical queries such as reports, window functions express certain requirements elegantly and are hard to replace.

This article first introduces the definition and basic syntax of window functions, and then explains how DBMSs and big data systems compute them efficiently, covering optimization, execution, and parallel execution.

What is a window function?

A window function appears in the expression list of the SELECT clause. Its most notable feature is the OVER keyword. The syntax is defined as follows:

window_function (expression) OVER (
   [ PARTITION BY part_list ]
   [ ORDER BY order_list ]
   [ { ROWS | RANGE } BETWEEN frame_start AND frame_end ] )

It includes the following options:

  • PARTITION BY partitions the data by part_list
  • ORDER BY sorts the data within each partition by order_list
Figure 1. Basic concepts of window functions

The last clause defines the Frame, that is, which rows the window of the current row contains:

  • ROWS selects the frame by row count. For example, ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING covers the 3 rows before the current row through the 3 rows after it, 7 rows in total (fewer than 7 near a partition boundary)
  • RANGE selects the frame by value range. For example, RANGE BETWEEN 3 PRECEDING AND 3 FOLLOWING covers all rows whose value lies in [c-3, c+3], where c is the value of the current row
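The difference between the two frame modes can be made concrete with a minimal Python sketch. It assumes one partition whose ORDER BY column is already sorted ascending; the function names `rows_frame` and `range_frame` and the sample keys are illustrative and do not come from any particular DBMS.

```python
from bisect import bisect_left, bisect_right

def rows_frame(n, i, preceding, following):
    """Half-open index range [lo, hi) for
    ROWS BETWEEN preceding PRECEDING AND following FOLLOWING
    at row i of a partition with n rows."""
    lo = max(0, i - preceding)          # clipped at the partition start
    hi = min(n, i + following + 1)      # clipped at the partition end
    return lo, hi

def range_frame(keys, i, preceding, following):
    """Half-open index range [lo, hi) for
    RANGE BETWEEN preceding PRECEDING AND following FOLLOWING.
    keys is the ORDER BY column of one partition, sorted ascending."""
    c = keys[i]                                  # value of the current row
    lo = bisect_left(keys, c - preceding)        # first row with key >= c - preceding
    hi = bisect_right(keys, c + following)       # one past the last row with key <= c + following
    return lo, hi

keys = [1, 2, 2, 5, 6, 9, 10, 14]   # hypothetical sorted ORDER BY values
print(rows_frame(len(keys), 4, 3, 3))   # (1, 8): seven rows around row 4
print(rows_frame(len(keys), 0, 3, 3))   # (0, 4): fewer rows at the boundary
print(range_frame(keys, 4, 3, 3))       # (3, 6): keys within [3, 9]
```

A ROWS frame depends only on row positions, while a RANGE frame depends on the ORDER BY values, so its size varies from row to row.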
Figure 2. Rows window and Range window

Logically speaking, the calculation "process" of a window function is as follows:

  1. According to window definition, partition and sort all input data (if necessary)
  2. For each row of data, calculate its Frame range
  3. Pass the set of rows in the Frame to the window function, and write the result into the current row

For example:

SELECT dealer_id, emp_name, sales,
       ROW_NUMBER() OVER (PARTITION BY dealer_id ORDER BY sales) AS rank,
       AVG(sales) OVER (PARTITION BY dealer_id) AS avgsales 
FROM sales

In the above query, the rank column is the employee's sales rank within the current dealer, and avgsales is the average sales of all employees of the current dealer. The query results are as follows:

+------------+-----------------+--------+------+---------------+
| dealer_id  | emp_name        | sales  | rank | avgsales      |
+------------+-----------------+--------+------+---------------+
| 1          | Raphael Hull    | 8227   | 1    | 14356         |
| 1          | Jack Salazar    | 9710   | 2    | 14356         |
| 1          | Ferris Brown    | 19745  | 3    | 14356         |
| 1          | Noel Meyer      | 19745  | 4    | 14356         |
| 2          | Haviva Montoya  | 9308   | 1    | 13924         |
| 2          | Beverly Lang    | 16233  | 2    | 13924         |
| 2          | Kameko French   | 16233  | 3    | 13924         |
| 3          | May Stout       | 9308   | 1    | 12368         |
| 3          | Abel Kim        | 12369  | 2    | 12368         |
| 3          | Ursa George     | 15427  | 3    | 12368         |
+------------+-----------------+--------+------+---------------+
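The three logical steps can be traced on this sample data with a minimal Python sketch. It is an illustration of the semantics only, not how any particular engine implements them; the function name `evaluate` is made up for the example. Note that the table above shows the averages truncated to integers, while the sketch keeps the fraction.

```python
from itertools import groupby
from operator import itemgetter

# (dealer_id, emp_name, sales), taken from the result table above
rows = [
    (1, "Raphael Hull", 8227), (1, "Jack Salazar", 9710),
    (1, "Ferris Brown", 19745), (1, "Noel Meyer", 19745),
    (2, "Haviva Montoya", 9308), (2, "Beverly Lang", 16233),
    (2, "Kameko French", 16233),
    (3, "May Stout", 9308), (3, "Abel Kim", 12369), (3, "Ursa George", 15427),
]

def evaluate(rows):
    # Step 1: partition by dealer_id and sort each partition by sales.
    # One sort on (dealer_id, sales) does both at once.
    ordered = sorted(rows, key=itemgetter(0, 2))
    out = []
    for _, part in groupby(ordered, key=itemgetter(0)):
        part = list(part)
        # Step 2: AVG(sales) has no ORDER BY, so its frame is the whole partition.
        avgsales = sum(sales for _, _, sales in part) / len(part)
        # Step 3: append the results to every input row; no row is dropped.
        for row_number, (dealer_id, name, sales) in enumerate(part, start=1):
            out.append((dealer_id, name, sales, row_number, avgsales))
    return out

for row in evaluate(rows):
    print(row)
```

Rows with equal sales (for example the two rows with 19745) receive distinct ROW_NUMBER() values, and which one comes first is not determined by the query. Here the stable sort simply keeps the input order.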

Note: every part of the window definition is optional:

  • If PARTITION BY is not specified, the data is not partitioned; in other words, all rows are treated as one partition
  • If ORDER BY is not specified, the partitions are not sorted; this is typical for window functions that do not depend on order, such as SUM()
  • If the Frame clause is not specified, the following default Frame is used:
    • Without ORDER BY, the Frame is all rows in the partition, that is, RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    • With ORDER BY, the Frame runs from the first row of the partition through the current row and its peers (rows with the same ORDER BY value), that is, RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
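Because the default Frame with ORDER BY is a RANGE frame, rows that tie on the ORDER BY value share the same frame end. The following Python sketch, under the assumption of a single partition already sorted ascending, contrasts a running SUM() under the default frame with the same sum under ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. Both function names are illustrative.

```python
from bisect import bisect_right

def running_sum_default_frame(keys, values):
    """SUM(value) OVER (ORDER BY key) with the default frame
    RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
    keys must be sorted ascending."""
    prefix = [0]
    for v in values:
        prefix.append(prefix[-1] + v)
    # The frame ends at the last peer of the current row (same key value).
    return [prefix[bisect_right(keys, k)] for k in keys]

def running_sum_rows_frame(values):
    """SUM(value) OVER (ORDER BY key
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)."""
    out, total = [], 0
    for v in values:
        total += v
        out.append(total)
    return out

sales = [8227, 9710, 19745, 19745]   # dealer 1 from the example, sorted by sales
print(running_sum_default_frame(sales, sales))  # [8227, 17937, 57427, 57427]
print(running_sum_rows_frame(sales))            # [8227, 17937, 37682, 57427]
```

The two tied rows get the same running sum under the default frame, which often surprises people who expected the ROWS behaviour.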

Finally, window functions can be divided into the following three categories:

  • Aggregate: AVG(), COUNT(), MIN(), MAX(), SUM()...
  • Value: FIRST_VALUE(), LAST_VALUE(), LEAD(), LAG()...
  • Ranking: RANK(), DENSE_RANK(), ROW_NUMBER(), NTILE()...

Due to space limitations, this article does not discuss the meaning of each window function. Interested readers can refer to the documentation listed under References.

Note: the Frame definition does not apply to every window function. Functions such as ROW_NUMBER(), RANK(), and LEAD() always operate on the entire partition, not on the current Frame.

Window functions vs. aggregate functions

As far as aggregation goes, window functions and GROUP BY aggregate functions seem to do the same thing. However, the similarity ends there. The key difference is that a window function only appends its result to the existing rows; it does not change any existing row or column. GROUP BY works completely differently: for each group it keeps only a single row of aggregated results.

Some readers may ask: after adding a window function, the order of the returned rows has clearly changed. Isn't that a modification? No: SQL and relational algebra are defined over multisets, so the result set itself has no order; ORDER BY only determines the order in which the final results are presented.

On the other hand, logically and semantically, the various parts of the SELECT statement can be regarded as "executed" in the following order:

Figure 3. The logical execution sequence of each part of SQL

Note that window functions are evaluated just before ORDER BY and after almost all other parts of the SQL statement. This matches the append-only semantics of window functions: the result set is already determined at that point, and the window functions are computed over it.

Execution of window functions

The classic way to execute a window function has two steps: sorting and function evaluation.

Figure 4. The execution process of a window function is usually divided into two steps: sorting and evaluation

The PARTITION BY and ORDER BY of a window definition are both easily implemented by sorting. For example, for the window PARTITION BY a, b ORDER BY c, d, we can sort the input by (a,b,c,d) or by (b,a,c,d). After sorting, the data is arranged as shown in Figure 1.

Next, consider how to handle the Frame:

  • For a Frame covering the entire partition (for example RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING), the function is computed only once per partition, and there is nothing more to say;
  • For a gradually growing Frame (for example RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), an Aggregator can maintain the accumulated state, which is also easy to implement;
  • For a sliding Frame (for example ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING), things are harder. A classic approach requires the Aggregator to support removing rows as well as adding them (a Removable Aggregator), which can be more complicated than it sounds; consider how to implement MAX(), for instance.
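The sliding-frame case can be sketched in Python as follows. `sliding_sum` is a removable aggregator in the plain sense: each row is added once and subtracted once. MAX() cannot be undone by subtraction, so `sliding_max` swaps in a different, well-known technique, a monotonic deque of candidate rows; this is one possible choice for illustration and is not claimed to be what any specific system does. Both function names and the sample values are assumptions made for the example.

```python
from collections import deque

def sliding_sum(values, preceding, following):
    """SUM over ROWS BETWEEN preceding PRECEDING AND following FOLLOWING.
    Every row enters the running total once and leaves it once."""
    n, out = len(values), []
    total, lo, hi = 0, 0, 0          # current frame is values[lo:hi]
    for i in range(n):
        new_lo = max(0, i - preceding)
        new_hi = min(n, i + following + 1)
        while hi < new_hi:           # add rows entering the frame
            total += values[hi]
            hi += 1
        while lo < new_lo:           # remove rows leaving the frame
            total -= values[lo]
            lo += 1
        out.append(total)
    return out

def sliding_max(values, preceding, following):
    """MAX over the same frame. The deque holds row indices whose values
    are strictly decreasing, so the front is always the frame maximum."""
    n, out = len(values), []
    candidates, hi = deque(), 0
    for i in range(n):
        new_lo = max(0, i - preceding)
        new_hi = min(n, i + following + 1)
        while hi < new_hi:
            # A new row makes every smaller-or-equal earlier candidate useless.
            while candidates and values[candidates[-1]] <= values[hi]:
                candidates.pop()
            candidates.append(hi)
            hi += 1
        while candidates[0] < new_lo:   # drop candidates that left the frame
            candidates.popleft()
        out.append(values[candidates[0]])
    return out

values = [5, 7, 3, 10, 6, 2, 8, 4]    # hypothetical column values
print(sliding_sum(values, 1, 1))      # [12, 15, 20, 19, 18, 16, 14, 12]
print(sliding_max(values, 1, 1))      # [7, 7, 10, 10, 10, 8, 8, 8]
```

Both functions do amortized constant work per row, but the state they keep is inherently sequential: each frame is derived from the previous one.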

Optimization of window functions

The optimizer can do only a limited amount for window functions. A brief explanation is given here for completeness.

Usually, the window function is first extracted from the Project operator into an independent operator called Window.

Figure 5. Optimization process of window function

Sometimes a SELECT statement contains multiple window functions whose window definitions (OVER clauses) may be the same or different. For identical windows there is clearly no need to partition and sort more than once, so we can merge them into a single Window operator.

For different windows, the simplest approach is to put each one into its own Window operator, as shown in the figure above. At execution time, however, each Window operator has to sort its input first, and that cost is not small.

Is it possible to use one sort to calculate multiple window functions? In some cases, this is possible. For example, the two window functions in the example in this article:

... ROW_NUMBER() OVER (PARTITION BY dealer_id ORDER BY sales) AS rank,
    AVG(sales) OVER (PARTITION BY dealer_id) AS avgsales ...

Although these two windows are not exactly the same, AVG(sales) does not care about the order within a partition, so it can fully reuse the window of ROW_NUMBER(). The paper by Cao et al. (VLDB'12) provides a heuristic algorithm that exploits such reuse opportunities as much as possible.
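The reuse condition in this example can be written down as a small check. The Python sketch below is a deliberately simplified rule (same set of partition columns, and the other window's ORDER BY is a prefix of the already sorted one) followed by a greedy grouping. It is an assumption made for illustration and is not the heuristic algorithm from the paper; `WindowSpec`, `can_reuse_sort`, and `group_windows` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WindowSpec:
    partition_by: tuple
    order_by: tuple = ()

def can_reuse_sort(sorted_for, other):
    """True if data sorted for `sorted_for` is also valid input for `other`.
    Simplified rule: same partition columns (in any order), and the ORDER BY
    of `other` is a prefix of the ORDER BY of `sorted_for`."""
    same_partitions = set(sorted_for.partition_by) == set(other.partition_by)
    is_prefix = sorted_for.order_by[:len(other.order_by)] == other.order_by
    return same_partitions and is_prefix

def group_windows(specs):
    """Greedy grouping: visit the most specific windows first and attach each
    window to the first group whose sort it can reuse. One sort per group."""
    groups = []   # list of (leader, members)
    for spec in sorted(specs, key=lambda s: len(s.order_by), reverse=True):
        for leader, members in groups:
            if can_reuse_sort(leader, spec):
                members.append(spec)
                break
        else:
            groups.append((spec, [spec]))
    return groups

w_rank = WindowSpec(("dealer_id",), ("sales",))   # ROW_NUMBER() window
w_avg = WindowSpec(("dealer_id",))                # AVG(sales) window
print(can_reuse_sort(w_rank, w_avg))       # True: AVG can use the sorted data
print(can_reuse_sort(w_avg, w_rank))       # False: unsorted partitions are not enough
print(len(group_windows([w_avg, w_rank]))) # 1: a single sort serves both windows
```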

Parallel execution of window functions

Most modern DBMSs support parallel execution. For window functions, the computations of different partitions are completely independent, so we can easily assign partitions to different nodes (or threads) and achieve parallelism across partitions.
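Parallelism across partitions can be sketched with Python's standard thread pool. This only illustrates the structure (split into partitions, evaluate each independently, concatenate); in CPython the global interpreter lock prevents a real speedup for CPU-bound work like this, and a real engine would use native threads or multiple nodes. The function names and sample rows are assumptions for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby
from operator import itemgetter

def row_number_partition(part):
    """ROW_NUMBER() OVER (ORDER BY sales) for one partition of (dealer_id, sales) rows."""
    part = sorted(part, key=itemgetter(1))
    return [(dealer_id, sales, n) for n, (dealer_id, sales) in enumerate(part, start=1)]

def parallel_row_number(rows, workers=4):
    # Split the input into partitions by dealer_id.
    ordered = sorted(rows, key=itemgetter(0))
    partitions = [list(p) for _, p in groupby(ordered, key=itemgetter(0))]
    # Each partition is independent, so each can go to its own worker.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(row_number_partition, partitions))  # keeps partition order
    return [row for part in results for row in part]

rows = [(2, 16233), (1, 9710), (1, 8227), (2, 9308)]
print(parallel_row_number(rows))
# [(1, 8227, 1), (1, 9710, 2), (2, 9308, 1), (2, 16233, 2)]
```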

But what if the window function has only one global partition (no PARTITION BY clause), or too few partitions to parallelize fully? The Removable Aggregator technique mentioned above is clearly of no use here: it relies on the internal state of a single Aggregator, which is hard to parallelize effectively.

The TUM paper (Leis et al., VLDB'15) proposes using a segment tree to parallelize efficiently within a partition. A segment tree is an N-ary tree in which each node holds the partial aggregate of the leaves beneath it.

The figure below shows a binary segment tree computing SUM(). For example, the 12 in the third row is the aggregate of the leaf nodes 5+7, and the 25 above it is the aggregate of the leaf nodes 5+7+3+10.

Figure 6. Use the line segment tree to calculate the sum of a given range

Suppose the current Frame spans rows 2 through 8, that is, we need the sum 7+3+10+...+4. With the segment tree, we can compute it directly as 7+13+20 (the red numbers in the figure).

Given the sorted input, a segment tree can be built in O(n) time (O(n log n) if the sort is included), and it answers an aggregate query over any interval in O(log n) time. Better still, queries can run concurrently on multiple threads without interfering with each other, and the construction of the tree itself parallelizes well.
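A binary segment tree for SUM() fits in a few lines of Python. The sketch below uses a common array-based, bottom-up layout; it is an illustration and not the implementation from the paper. The text gives the leaves 5, 7, 3, 10 and the last leaf 4; the values 6, 2, 8 for rows 5 to 7 are assumptions, chosen so that rows 5 to 8 sum to 20 as in the example.

```python
class SegmentTree:
    """Array-based binary segment tree for range sums.
    tree[n + i] holds leaf i; tree[k] holds tree[2k] + tree[2k + 1]."""

    def __init__(self, values):
        self.n = len(values)
        self.tree = [0] * self.n + list(values)
        # Build inner nodes bottom-up: O(n) additions.
        for k in range(self.n - 1, 0, -1):
            self.tree[k] = self.tree[2 * k] + self.tree[2 * k + 1]

    def query(self, lo, hi):
        """Sum of values[lo:hi] (half-open, 0-based) in O(log n).
        The tree is only read, so concurrent queries do not interfere."""
        total = 0
        lo += self.n
        hi += self.n
        while lo < hi:
            if lo & 1:               # lo is a right child: take it and step right
                total += self.tree[lo]
                lo += 1
            if hi & 1:               # hi - 1 is a left child: take it
                hi -= 1
                total += self.tree[hi]
            lo //= 2                 # move both bounds up one level
            hi //= 2
        return total

# Leaves: 5, 7, 3, 10 and the final 4 come from the text; 6, 2, 8 are assumed.
tree = SegmentTree([5, 7, 3, 10, 6, 2, 8, 4])
print(tree.tree[4], tree.tree[2])   # 12 25: the inner nodes 5+7 and 5+7+3+10
print(tree.query(1, 8))             # 40: rows 2..8, combined as 7 + 13 + 20
```

For the Frame of rows 2 to 8, the query touches exactly three nodes (7, 13, and 20) instead of seven leaves.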

References

  1. Efficient Processing of Window Functions in Analytical SQL Queries - Leis, Viktor, et al. (VLDB'15)
  2. Optimization of Analytic Window Functions - Cao, Yu, et al. (VLDB'12)
  3. SQL Window Functions Introduction - Apache Drill
  4. PostgreSQL 11 Reestablishes Window Functions Leadership
  5. Window Functions in SQL Server

Origin blog.csdn.net/godlovedaniel/article/details/113845599