The window function over() in Oracle

A window function, like an aggregate function, performs a calculation over a set of rows. It defines a window for the rows (the window is the set of rows the operation works on) and operates on a set of values. Unlike an aggregate function, it does not need a GROUP BY clause to group the data, and it can return columns of the underlying row and aggregated columns in the same row. As I understand it, the function does for me what would otherwise take a subquery or some other method: it computes the aggregated values and merges them back into each row.

The book introduces this step by step. Suppose we want to count all personnel. We can execute the following SQL statement:

SELECT COUNT(FName) FROM T_Person

This approach is straightforward, but it returns only the aggregated value, without any column from the underlying rows. Sometimes, however, we need the aggregated value alongside columns that are not inside the aggregate function (that is, the columns of the underlying rows). For example, suppose we want to list the name, city, age and salary of every employee who earns less than 5,000 yuan. We also want every row to show how many employees earn less than 5,000 yuan. We might try the following SQL statement:

SELECT FName, FCITY, FAGE, FSalary, COUNT(FName)
FROM T_Person
WHERE FSALARY<5000

Executing the SQL above produces the following error message:
Column 'T_Person.FCity' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
This is because every column that is not inside an aggregate function must be listed in the GROUP BY clause. The problem can be solved with a subquery:

SELECT FName, FCITY, FAGE, FSalary,
(
SELECT COUNT(FName) FROM T_Person
WHERE FSALARY<5000
)
FROM T_Person
WHERE FSALARY<5000

A subquery does solve the problem, but it is cumbersome: for instance, the WHERE condition has to be written twice. A window function makes the implementation much simpler. The following SQL statement uses a window function to achieve the same effect:

SELECT FName, FCITY, FAGE, FSalary, COUNT(FName) OVER()
FROM T_Person
WHERE FSALARY<5000
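The sketch below checks that the subquery version and the window version really agree by running both against an in-memory database. It uses Python's built-in sqlite3 module and assumes SQLite 3.25 or later, the first release with window functions, as a stand-in for Oracle or SQL Server. The T_Person rows are invented for illustration; only the table and column names come from the book.

```python
import sqlite3

# Hypothetical sample rows (FName, FCity, FAge, FSalary); not the book's data.
ROWS = [
    ("Tom", "Beijing", 25, 4500),
    ("Jim", "Beijing", 30, 5500),
    ("Lily", "Shanghai", 25, 3000),
    ("Lucy", "Shanghai", 28, 4500),
    ("Kate", "Shenzhen", 30, 8000),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T_Person (FName TEXT, FCity TEXT, FAge INTEGER, FSalary INTEGER)")
conn.executemany("INSERT INTO T_Person VALUES (?, ?, ?, ?)", ROWS)

# Subquery version: the WHERE condition has to be written twice.
via_subquery = conn.execute("""
    SELECT FName, FCity, FAge, FSalary,
           (SELECT COUNT(FName) FROM T_Person WHERE FSalary < 5000)
    FROM T_Person
    WHERE FSalary < 5000
    ORDER BY FName
""").fetchall()

# Window version: OVER () aggregates over every row that survived the WHERE clause.
via_window = conn.execute("""
    SELECT FName, FCity, FAge, FSalary, COUNT(FName) OVER ()
    FROM T_Person
    WHERE FSalary < 5000
    ORDER BY FName
""").fetchall()

for row in via_window:
    print(row)
```

With this made-up data, three people earn less than 5,000. Both queries therefore return the same three rows, and the last column of each row is 3.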

As you can see, the only difference from the plain aggregate version is the OVER keyword added after the aggregate function.
The general form of a window function call is: function_name(column) OVER (options)

I am using SQL Server 2008 R2 here. I don't know which version introduced it, but SQL Server also supports ORDER BY clauses in window functions (note: the book says that MS SQL Server does not support ORDER BY clauses in window functions). In any case, I have pulled together related material from the Internet here. It is precisely because window functions support the ORDER BY clause that they fall into two categories.

The first category, aggregate window functions: aggregate_function(column) OVER (options). Here the options can be a PARTITION BY clause but, in SQL Server 2008 R2, not an ORDER BY clause.

The second category, ranking window functions: ranking_function() OVER (options). Here the options can be an ORDER BY clause, or a PARTITION BY clause followed by an ORDER BY clause, but not a PARTITION BY clause on its own.


Aggregate window functions

The OVER keyword tells the database to treat the aggregate function as an aggregate window function rather than as an ordinary aggregate function. The SQL standard allows every aggregate function to be used as an aggregate window function.
In the example above, the window function COUNT(FName) OVER() returns, on every row of the query result, the number of rows that meet the query's conditions. Options are usually placed in the parentheses after OVER to change the range of rows that the aggregation covers. If those parentheses are empty, the window function aggregates over all rows of the result set.

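PARTITION BY clause

Inside the parentheses after a window function's OVER keyword, a PARTITION BY clause can divide the rows into partitions for the aggregate calculation. Unlike GROUP BY, the partitions created by PARTITION BY are independent of the result set. They exist only for the aggregate calculation, and the partitions created by different window functions do not affect one another. The following SQL statement displays each person's information along with the number of people in that person's city: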

SELECT FName, FCITY, FAGE, FSalary,
COUNT(FName) OVER(PARTITION BY FCITY)
FROM T_Person
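The difference between PARTITION BY and GROUP BY is easiest to see side by side. This sketch runs the windowed count by city and an ordinary GROUP BY count on the same table. It again uses Python's sqlite3 module with SQLite 3.25+ as a stand-in, and the rows are made up. The window query keeps one row per person, while the GROUP BY query collapses the table to one row per city.

```python
import sqlite3

# Hypothetical sample rows (FName, FCity, FAge, FSalary); not the book's data.
ROWS = [
    ("Tom", "Beijing", 25, 4500),
    ("Jim", "Beijing", 30, 5500),
    ("Lily", "Shanghai", 25, 3000),
    ("Lucy", "Shanghai", 28, 4500),
    ("Kate", "Shenzhen", 30, 8000),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T_Person (FName TEXT, FCity TEXT, FAge INTEGER, FSalary INTEGER)")
conn.executemany("INSERT INTO T_Person VALUES (?, ?, ?, ?)", ROWS)

# Window version: every person keeps a row and also gets the head count of their city.
per_person = conn.execute("""
    SELECT FName, FCity, COUNT(FName) OVER (PARTITION BY FCity) AS city_count
    FROM T_Person
    ORDER BY FName
""").fetchall()

# GROUP BY version: one row per city; the individual names are gone.
per_city = conn.execute("""
    SELECT FCity, COUNT(FName)
    FROM T_Person
    GROUP BY FCity
    ORDER BY FCity
""").fetchall()

print(per_person)
print(per_city)
```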

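OVER(PARTITION BY FCITY) partitions the result set by FCITY and computes the aggregate for the partition that the current row belongs to. Several window functions can be used in the same SELECT statement without interfering with one another. For example, the following SQL statement displays each person's information, the number of people in the same city, and the number of people of the same age: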

SELECT FName,FCITY, FAGE, FSalary,
COUNT(FName) OVER(PARTITION BY FCITY),
COUNT(FName) OVER(PARTITION BY FAGE)
FROM T_Person
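Here is a sketch of two independent windows in one SELECT. It again uses Python's sqlite3 module (SQLite 3.25+) and invented rows. The partition by city and the partition by age are computed separately, so each row receives both counts.

```python
import sqlite3

# Hypothetical sample rows (FName, FCity, FAge, FSalary); not the book's data.
ROWS = [
    ("Tom", "Beijing", 25, 4500),
    ("Jim", "Beijing", 30, 5500),
    ("Lily", "Shanghai", 25, 3000),
    ("Lucy", "Shanghai", 28, 4500),
    ("Kate", "Shenzhen", 30, 8000),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T_Person (FName TEXT, FCity TEXT, FAge INTEGER, FSalary INTEGER)")
conn.executemany("INSERT INTO T_Person VALUES (?, ?, ?, ?)", ROWS)

# Two window functions, each with its own partitioning, in the same SELECT.
rows = conn.execute("""
    SELECT FName, FCity, FAge,
           COUNT(FName) OVER (PARTITION BY FCity) AS same_city,
           COUNT(FName) OVER (PARTITION BY FAge)  AS same_age
    FROM T_Person
    ORDER BY FName
""").fetchall()

for row in rows:
    print(row)
```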

Ranking window functions

The ranking window functions are ROW_NUMBER (row number), RANK (rank), DENSE_RANK (dense rank) and NTILE (bucket number).

First, look at the following SQL statement:

-- The alias is row_num rather than rownum because ROWNUM is a reserved word in Oracle.
select FName, FSalary, FCity, FAge,
row_number() over(order by FSalary) as row_num,
rank() over(order by FSalary) as rank,
dense_rank() over(order by FSalary) as dense_rank,
ntile(6) over(order by FSalary) as ntile
from T_Person
order by FName

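Run the statement and look at the result. If you want to try it yourself, you'll have to put in a little effort: download the e-book or buy the book, because for reasons of space I have left out most of its content.

Now let's go through what the columns mean. The final result is sorted by FName in ascending order.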

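For row_number() over(order by FSalary), the function sorts the rows by FSalary in ascending order and returns each row's sequence number in that order. Every row gets a distinct number even when salaries are tied, and the order among tied rows is not guaranteed.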

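For rank() over(order by FSalary), the function sorts by FSalary in ascending order and returns each row's rank. Ties are allowed, and the rank that follows a tie is the tied rank plus the number of tied rows. In short, if two people tie for first place, the next person is third: there are two firsts and no second.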

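For dense_rank() over(order by FSalary), the function also sorts by FSalary in ascending order and returns each row's rank. Unlike rank(), the rank that follows a tie is simply the tied rank plus 1. In short, if two people tie for first place, the next person is second: two firsts, then a second.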
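A tiny data set is enough to check the tie rules for row_number(), rank() and dense_rank(). This sketch uses Python's sqlite3 module (SQLite 3.25+) with invented rows in which Tom and Lucy both earn 4500, so they tie for second place. rank() then skips to 4, which is 2 plus the two tied rows. dense_rank() continues with 3, and row_number() hands out 2 and 3 to the tied pair in an unspecified order.

```python
import sqlite3

# Hypothetical rows (FName, FSalary); Tom and Lucy deliberately share a salary.
ROWS = [("Tom", 4500), ("Jim", 5500), ("Lily", 3000), ("Lucy", 4500), ("Kate", 8000)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T_Person (FName TEXT, FSalary INTEGER)")
conn.executemany("INSERT INTO T_Person VALUES (?, ?)", ROWS)

result = conn.execute("""
    SELECT FName, FSalary,
           ROW_NUMBER() OVER (ORDER BY FSalary) AS row_num,
           RANK()       OVER (ORDER BY FSalary) AS rnk,
           DENSE_RANK() OVER (ORDER BY FSalary) AS dense_rnk
    FROM T_Person
    ORDER BY FSalary, FName
""").fetchall()

for row in result:
    print(row)
```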

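For ntile(6) over(order by FSalary), the function sorts by FSalary in ascending order and splits the rows into 6 groups whose sizes are as equal as possible. It then returns the number of the group each row falls into. When the row count is not divisible by 6, the lower-numbered groups each get one extra row.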
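Seven rows show how NTILE spreads rows that don't divide evenly. This sketch uses Python's sqlite3 module (SQLite 3.25+) with seven invented salaries and counts how many rows land in each bucket. With 3 buckets the sizes are 3, 2, 2. With 6 buckets they are 2, 1, 1, 1, 1, 1. Asking for more buckets than there are rows simply leaves the extra buckets empty. The helper name bucket_sizes is hypothetical.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T_Person (FName TEXT, FSalary INTEGER)")
# Seven hypothetical people with salaries 1000, 2000, ..., 7000.
conn.executemany(
    "INSERT INTO T_Person VALUES (?, ?)",
    [(f"P{i}", i * 1000) for i in range(1, 8)],
)

def bucket_sizes(n):
    """Return how many rows NTILE(n) puts into bucket 1, 2, ... in bucket order."""
    sql = f"SELECT NTILE({int(n)}) OVER (ORDER BY FSalary) FROM T_Person"
    counts = Counter(bucket for (bucket,) in conn.execute(sql))
    return [counts[b] for b in sorted(counts)]

print(bucket_sizes(3))   # 7 rows into 3 buckets
print(bucket_sizes(6))   # 7 rows into 6 buckets
print(bucket_sizes(10))  # more buckets than rows
```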

Like the aggregate window functions, the ranking functions also support a PARTITION BY clause inside the OVER clause. For example:

-- The alias is row_num rather than rownum because ROWNUM is a reserved word in Oracle.
select FName, FSalary, FCity, FAge,
row_number() over(partition by FName order by FSalary) as row_num,
rank() over(partition by FName order by FSalary) as rank,
dense_rank() over(partition by FName order by FSalary) as dense_rank,
ntile(6) over(partition by FName order by FSalary) as ntile
from T_Person
order by FName

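For the PARTITION BY clause itself, see the introduction above; I won't repeat it here. Note, however, that in a ranking window function the PARTITION BY clause must come before the ORDER BY clause.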
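A common use of PARTITION BY ... ORDER BY in a ranking function is picking the top row in each group. This sketch uses Python's sqlite3 module (SQLite 3.25+) and invented rows. It numbers people by salary within each city and then keeps the highest earner per city. It partitions by FCity rather than FName, because with unique names every FName partition holds a single row. The filter on pos sits in an outer query because window functions are evaluated after WHERE and cannot appear in it directly.

```python
import sqlite3

# Hypothetical sample rows (FName, FCity, FSalary); not the book's data.
ROWS = [
    ("Tom", "Beijing", 4500),
    ("Jim", "Beijing", 5500),
    ("Lily", "Shanghai", 3000),
    ("Lucy", "Shanghai", 4500),
    ("Kate", "Shenzhen", 8000),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T_Person (FName TEXT, FCity TEXT, FSalary INTEGER)")
conn.executemany("INSERT INTO T_Person VALUES (?, ?, ?)", ROWS)

# Number people by salary (highest first) separately inside each city.
ranked = conn.execute("""
    SELECT FName, FCity, FSalary,
           ROW_NUMBER() OVER (PARTITION BY FCity ORDER BY FSalary DESC) AS pos
    FROM T_Person
    ORDER BY FCity, pos
""").fetchall()

# Keep only the top earner of each city by filtering the window result in an outer query.
top_per_city = conn.execute("""
    SELECT FName, FCity
    FROM (
        SELECT FName, FCity,
               ROW_NUMBER() OVER (PARTITION BY FCity ORDER BY FSalary DESC) AS pos
        FROM T_Person
    )
    WHERE pos = 1
    ORDER BY FCity
""").fetchall()

print(ranked)
print(top_per_city)
```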


Summary:

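over() window functions: an aggregate function collapses many rows into one row,
whereas a window function keeps every row and attaches the aggregated value to each of them;
also, after using an aggregate function, any other column you want to display must be added to GROUP BY,
whereas with a window function you can skip GROUP BY and display all the information directly.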

Window functions are well suited to adding the result of an aggregate function as an extra column on every row.

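Common uses of window functions:
1. Show aggregate information on every row (aggregate_function() over())
2. Show the per-group aggregate result on every row (aggregate_function() over(partition by column) as alias) -- rows are partitioned by the column and the calculation is done within each partition
3. Use together with ranking functions (row_number() over(order by column) as alias)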

Common analytic functions (the most commonly used are probably the ranking functions 1, 2 and 3):
1. row_number() over(partition by ... order by ...)
2. rank() over(partition by ... order by ...)
3. dense_rank() over(partition by ... order by ...)
4. count() over(partition by ... order by ...)
5. max() over(partition by ... order by ...)
6. min() over(partition by ... order by ...)
7. sum() over(partition by ... order by ...)
8. avg() over(partition by ... order by ...)
9. first_value() over(partition by ... order by ...)
10. last_value() over(partition by ... order by ...)
11. lag() over(partition by ... order by ...)
12. lead() over(partition by ... order by ...)
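Items 4 to 10 change meaning once ORDER BY is added. By default the frame then runs from the start of the partition up to the current row, including rows that tie with it, so sum() becomes a running total. The same default frame explains a common surprise with last_value(): it returns the current row unless the frame is widened. The sketch below shows both, assuming Python's sqlite3 module with SQLite 3.25+ and invented rows. Oracle uses the same default frame. SQL Server 2008 R2, as noted above, rejects ORDER BY in aggregate window functions.

```python
import sqlite3

# Hypothetical sample rows (FName, FCity, FSalary); not the book's data.
ROWS = [
    ("Tom", "Beijing", 4500),
    ("Jim", "Beijing", 5500),
    ("Lily", "Shanghai", 3000),
    ("Lucy", "Shanghai", 4500),
    ("Kate", "Shenzhen", 8000),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T_Person (FName TEXT, FCity TEXT, FSalary INTEGER)")
conn.executemany("INSERT INTO T_Person VALUES (?, ?, ?)", ROWS)

rows = conn.execute("""
    SELECT FName, FCity, FSalary,
           -- default frame with ORDER BY: partition start .. current row, i.e. a running total
           SUM(FSalary) OVER (PARTITION BY FCity ORDER BY FSalary) AS running_total,
           -- same default frame, so this is just the current row's name
           LAST_VALUE(FName) OVER (PARTITION BY FCity ORDER BY FSalary) AS last_default,
           -- widened frame covering the whole partition: the city's top earner
           LAST_VALUE(FName) OVER (
               PARTITION BY FCity ORDER BY FSalary
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
           ) AS last_whole
    FROM T_Person
    ORDER BY FCity, FSalary
""").fetchall()

for row in rows:
    print(row)
```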
lag and lead fetch a column from the row that lies a given offset before or after the current row in a specified ordering, without a self-join of the result set;
lag looks backward and lead looks forward;
lag and lead take three parameters: the first is the column (or expression), the second is the offset (1 if omitted), and the third is the default value returned when the offset falls outside the partition (NULL if omitted).
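Here is a sketch of lag and lead with all three parameters, assuming Python's sqlite3 module with SQLite 3.25+ and invented rows. Each row sees the previous and next salary in salary order without a self-join. The first row has no predecessor and the last row has no successor, so they get the default value 0. FName is added as a tiebreaker so that the order of the two 4500 salaries is deterministic.

```python
import sqlite3

# Hypothetical rows (FName, FSalary); not the book's data.
ROWS = [("Tom", 4500), ("Jim", 5500), ("Lily", 3000), ("Lucy", 4500), ("Kate", 8000)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T_Person (FName TEXT, FSalary INTEGER)")
conn.executemany("INSERT INTO T_Person VALUES (?, ?)", ROWS)

# LAG/LEAD(column, offset, default): look 1 row back / ahead, falling back to 0.
rows = conn.execute("""
    SELECT FName, FSalary,
           LAG(FSalary, 1, 0)  OVER (ORDER BY FSalary, FName) AS prev_salary,
           LEAD(FSalary, 1, 0) OVER (ORDER BY FSalary, FName) AS next_salary
    FROM T_Person
    ORDER BY FSalary, FName
""").fetchall()

for row in rows:
    print(row)
```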


Origin http://43.154.161.224:23101/article/api/json?id=325723742&siteId=291194637