[SQL Excavator] - Window functions - Calculating a moving average

Introduction:

When a window function is used with order by, the calculation by default covers all rows from the start of the partition up to the current row. A narrower range of rows can also be specified; this range is called a frame.
A frame can be thought of as a window whose size we set ourselves.

The window functions introduced earlier first group rows with partition by, then sort them with order by, and finally compute the window function (a sum, a ranking, and so on) over the result. In other words, the window for each row runs from the first row of its group to the current row.
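As a minimal sketch of this default behaviour, the snippet below runs a running sum through Python's standard sqlite3 module. It assumes the bundled SQLite is version 3.25 or later (the release that added window functions); the sales table and its values are hypothetical and exist only for illustration.

```python
import sqlite3

# Hypothetical sample data: table name, columns, and values are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, day INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("east", 1, 10), ("east", 2, 20), ("east", 3, 30),
     ("west", 1, 5), ("west", 2, 15)],
)

# No frame clause is given. With ORDER BY present, SUM() accumulates from
# the first row of each partition up to the current row.
running = conn.execute(
    """
    SELECT region, day,
           SUM(amount) OVER (PARTITION BY region ORDER BY day) AS running_total
    FROM sales
    ORDER BY region, day
    """
).fetchall()

for row in running:
    print(row)
```

The total restarts for each region, because partition by gives every group its own window.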

Usage:

Next, let’s introduce how to set up this window:

<window function> over (order by <sort column> rows n preceding)

<window function> over (order by <sort column> rows between n preceding and n following)

where:

  • preceding: the frame covers the current row and the n rows before it;
  • following: the frame covers the current row and the n rows after it. In the between form, the two offsets do not have to be equal.
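The two forms above can be sketched as a trailing and a centered moving average. This example uses Python's standard sqlite3 module and assumes SQLite 3.25 or later; the prices table and its values are made up for illustration.

```python
import sqlite3

# Hypothetical sample data: five days of prices.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (day INTEGER, price INTEGER)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?)",
    [(1, 10), (2, 20), (3, 30), (4, 40), (5, 50)],
)

moving = conn.execute(
    """
    SELECT day,
           -- current row plus up to 2 rows before it
           AVG(price) OVER (ORDER BY day ROWS 2 PRECEDING) AS trailing_avg,
           -- 1 row before, the current row, and 1 row after
           AVG(price) OVER (ORDER BY day
                            ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS centered_avg
    FROM prices
    ORDER BY day
    """
).fetchall()

for row in moving:
    print(row)
```

Near the edges of the table the frame simply contains fewer rows, so the average for day 1 is taken over one row (trailing) or two rows (centered) rather than three.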

Other forms exist as well. The main forms of the rows clause in a window function are:

  • "rows unbounded preceding": the frame starts at the first row of the partition and extends to the current row. In other words, it covers the current row and every row before it.
  • "rows n preceding": the frame starts n rows before the current row and extends to the current row, so it covers the current row plus the n rows before it.
  • "rows between unbounded preceding and current row": the frame starts at the first row of the partition and extends to the current row. This is the explicit spelling of "rows unbounded preceding".
  • "rows between n preceding and current row": the frame starts n rows before the current row and extends to the current row. This is the explicit spelling of "rows n preceding".
  • "rows between current row and unbounded following": the frame starts at the current row and extends to the last row of the partition, so it covers the current row and every row after it.
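As a rough illustration of the unbounded forms, the sketch below computes a running total and a "remaining" total over the same rows. It uses Python's standard sqlite3 module and assumes SQLite 3.25 or later; the payments table and its values are hypothetical.

```python
import sqlite3

# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (id INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO payments VALUES (?, ?)",
    [(1, 10), (2, 20), (3, 30), (4, 40)],
)

totals = conn.execute(
    """
    SELECT id,
           SUM(amount) OVER (ORDER BY id
                             ROWS UNBOUNDED PRECEDING) AS running_short,
           SUM(amount) OVER (ORDER BY id
                             ROWS BETWEEN UNBOUNDED PRECEDING
                                      AND CURRENT ROW) AS running_explicit,
           SUM(amount) OVER (ORDER BY id
                             ROWS BETWEEN CURRENT ROW
                                      AND UNBOUNDED FOLLOWING) AS remaining
    FROM payments
    ORDER BY id
    """
).fetchall()

for row in totals:
    print(row)
```

The second and third columns are identical on every row, and the last column counts down from the grand total as the frame shrinks toward the end of the table.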

Note that these are only the common forms. Start and end bounds can be combined in other ways to define whatever row range a calculation needs.

"following" is normally used as the end bound of a between clause, often together with "preceding".

  • "rows n following": by the same shorthand rule as "rows n preceding", this frame would start n rows after the current row and end at the current row. Such a frame starts after it ends, so databases such as PostgreSQL and SQLite reject it. Write "rows between current row and n following" instead.
  • "rows between current row and n following": the frame starts at the current row and extends n rows past it, so it covers the current row plus the n rows after it.
  • "rows between unbounded preceding and n following": the frame starts at the first row of the partition and extends n rows past the current row, so it covers the current row, every row before it, and the n rows after it.
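The "following" forms can be tried out with a small sketch like the one below. It uses Python's standard sqlite3 module and assumes SQLite 3.25 or later; the payments table is hypothetical. The last part checks how SQLite treats the short form "rows 1 following"; other databases may report the problem differently.

```python
import sqlite3

# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (id INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO payments VALUES (?, ?)",
    [(1, 10), (2, 20), (3, 30), (4, 40)],
)

lookahead = conn.execute(
    """
    SELECT id,
           -- current row plus the next row
           SUM(amount) OVER (ORDER BY id
                             ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) AS next_sum,
           -- everything so far plus the next row
           SUM(amount) OVER (ORDER BY id
                             ROWS BETWEEN UNBOUNDED PRECEDING
                                      AND 1 FOLLOWING) AS lead_total
    FROM payments
    ORDER BY id
    """
).fetchall()

for row in lookahead:
    print(row)

# The short form would mean "from 1 row after the current row to the current
# row", a frame that starts after it ends. SQLite refuses to prepare it.
try:
    conn.execute(
        "SELECT SUM(amount) OVER (ORDER BY id ROWS 1 FOLLOWING) FROM payments"
    )
    short_form_rejected = False
except sqlite3.OperationalError:
    short_form_rejected = True

print("short form rejected:", short_form_rejected)
```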

Note that offsets are always counted relative to the current row: "n preceding" counts backwards from it and "n following" counts forwards. If no frame is specified at all and order by is present, the frame defaults to the "preceding" direction, running from the start of the partition up to the current row.

Further discussion:

Some readers may have questions at this point, and so did I:
What is the difference between "rows between n preceding and current row" and "rows n preceding"?

The two clauses differ only in how they are written: "rows n preceding" is shorthand for "rows between n preceding and current row".

"rows between n preceding and current row" states both bounds explicitly. The frame starts n rows before the current row and ends at the current row, so it covers the current row plus the n rows before it.
For example, given a result set of 10 rows and the clause "rows between 2 preceding and current row", the window function for row 5 is calculated over rows 3 to 5.

"rows n preceding" gives only the start bound. When the end bound is omitted it defaults to the current row, so the frame is again the current row plus the n rows before it.

The difference is therefore one of notation. The between form names both the start and the end of the frame, so the end can be changed to something other than the current row. The short form fixes the end at the current row and lets you choose only the start.

In short, both clauses define the same frame: the current row and the n rows before it.
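To check this on the 10-row example, the sketch below computes the same sum with both spellings and also reports the first and last id inside the frame. It uses Python's standard sqlite3 module and assumes SQLite 3.25 or later; the readings table and its values are hypothetical.

```python
import sqlite3

# Hypothetical sample data: 10 rows with id 1..10 and value = id * 10.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [(i, i * 10) for i in range(1, 11)],
)

frames = conn.execute(
    """
    SELECT id,
           MIN(id) OVER (ORDER BY id
                         ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS first_in_frame,
           MAX(id) OVER (ORDER BY id
                         ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS last_in_frame,
           SUM(value) OVER (ORDER BY id
                            ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS sum_between,
           SUM(value) OVER (ORDER BY id ROWS 2 PRECEDING) AS sum_short
    FROM readings
    ORDER BY id
    """
).fetchall()

# Row 5: the frame spans ids 3 to 5, and both spellings give 30 + 40 + 50.
print(frames[4])

# Both spellings agree on every row.
same_everywhere = all(row[3] == row[4] for row in frames)
print("identical results:", same_everywhere)
```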

What are the pros and cons of each?

Advantages:
"rows between n preceding and current row":
Both bounds are written out, so the extent of the frame is clear at a glance, and the end bound can be changed later (for example to "n following") without restructuring the clause.
It suits queries where the frame may need to extend beyond the current row.
"rows n preceding":
Concise and clear: only the number of rows before the current row has to be given.
It is convenient when the frame is simply the current row plus a fixed number of rows before it.

Disadvantages:
"rows between n preceding and current row":
The syntax is longer, because both the start and the end bound must be spelled out.
"rows n preceding":
The end bound cannot be specified; it is always the current row.
A frame that extends past the current row requires switching to the between form.

In general, the choice depends on the query. If the frame needs an end bound other than the current row, or you want both bounds spelled out, use the between form. If the frame is simply the current row plus a fixed number of preceding rows, "rows n preceding" is enough.

But the results should be the same, right?

Yes. "rows between n preceding and current row" and "rows n preceding" define the same frame, so a window function returns the same result with either one.

The difference lies in explicitness and flexibility. The between form makes both bounds visible, which can make a query easier to read and to modify, and it is the only form that lets you set the end of the frame. "rows n preceding" is an abbreviation whose end bound is always the current row.

Therefore, although the results are the same, choosing the form that matches the intent of the query improves the readability and maintainability of the code.

Reposted from: blog.csdn.net/qq_40249337/article/details/132049109