SQL window functions

table data

drop table if exists examination_info,exam_record;
CREATE TABLE exam_record (
    id int PRIMARY KEY AUTO_INCREMENT COMMENT 'auto-increment ID',
    uid int NOT NULL COMMENT 'user ID',
    exam_id int NOT NULL COMMENT 'exam paper ID',
    start_time datetime NOT NULL COMMENT 'start time',
    submit_time datetime COMMENT 'submit time',
    score tinyint COMMENT 'score'
)CHARACTER SET utf8 COLLATE utf8_general_ci;
INSERT INTO exam_record(uid,exam_id,start_time,submit_time,score) VALUES
(1001, 9001, '2021-07-02 09:01:01', null, null),
(1002, 9003, '2021-09-01 12:01:01', '2021-09-01 12:21:01', 60),
(1002, 9002, '2021-09-02 12:01:01', '2021-09-02 12:31:01', 70),
(1002, 9001, '2021-09-05 19:01:01', '2021-09-05 19:40:01', 81),
(1002, 9002, '2021-07-06 12:01:01', null, null),
(1003, 9003, '2021-09-07 10:01:01', '2021-09-07 10:31:01', 86),
(1003, 9003, '2021-09-08 12:01:01', '2021-09-08 12:11:01', 40),
(1003, 9001, '2021-09-08 13:01:01', null, null),
(1003, 9002, '2021-09-08 14:01:01', null, null),
(1003, 9003, '2021-09-08 15:01:01', null, null),
(1005, 9001, '2021-09-01 12:01:01', '2021-09-01 12:31:01', 88),
(1005, 9002, '2021-09-01 12:01:01', '2021-09-01 12:31:01', 88),
(1005, 9002, '2021-09-02 12:11:01', '2021-09-02 12:31:01', 89);

The table holds 13 rows in total.

basic grammar

  • Execution order: PARTITION BY first splits the rows into groups, ORDER BY then sorts the rows inside each group, and finally the aggregate is evaluated inside each group over a frame defined by RANGE (rows with equal sort values are merged into the same step) or ROWS (rows are taken one at a time, without merging), moving from top to bottom or from bottom to top as determined by PRECEDING and FOLLOWING.
  • Window functions are evaluated after the WHERE clause and after GROUP BY.
function(expression)
over (
partition by column
order by column ASC/DESC
rows/range [...]
)
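
Because window functions are evaluated after WHERE, a window result cannot be filtered at the same query level. The sketch below is one common workaround, assuming MySQL 8.0 or later and the exam_record table from the table data section: the window column is computed in a derived table (the alias `ranked` is only illustrative) and filtered outside it to keep each user's best score.

```sql
SELECT uid, exam_id, score
FROM (
	SELECT
		uid,
		exam_id,
		score,
		ROW_NUMBER() OVER ( PARTITION BY uid ORDER BY score DESC ) AS rn
	FROM
		exam_record
	WHERE
		score IS NOT NULL   -- applied before the window is computed
) AS ranked
WHERE
	rn = 1                  -- applied after the window is computed
ORDER BY
	uid;
-- uid   exam_id  score
-- 1002  9001     81
-- 1003  9003     86
-- 1005  9002     89
```

uid 1001 does not appear because its only record has a NULL score and is removed by the inner WHERE.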

function overview

SELECT
	*,
-- 	ranking
	ROW_NUMBER() over ( PARTITION BY uid ORDER BY score ) AS rn,
	RANK() over ( PARTITION BY uid ORDER BY score ) AS rk,
	DENSE_RANK() over ( PARTITION BY uid ORDER BY score ) AS drk,
-- 	aggregation
	sum(score) over ( PARTITION BY uid ) AS sum1,
	sum(score) over ( PARTITION BY uid ORDER BY score ) AS sum2,
	sum(score) over ( PARTITION BY uid ORDER BY score RANGE BETWEEN unbounded preceding and current row) AS sum3,
	sum(score) over ( PARTITION BY uid ORDER BY score ROWS BETWEEN unbounded preceding and current row) AS sum4,
	sum(score) over ( PARTITION BY uid ORDER BY score DESC) AS sum5,
-- 	sum(score) over ( PARTITION BY uid ORDER BY score RANGE BETWEEN current row AND unbounded following) AS sum6,
-- 	sum(score) over ( PARTITION BY uid ORDER BY score desc RANGE BETWEEN unbounded preceding and current row) AS sum7,
-- 	value from a preceding / following row
	LAG(score) over (PARTITION by uid ORDER BY score) lag1,
	LAG(score,1,0) over (PARTITION by uid ORDER BY score) lag2,
	lead(score) over (PARTITION by uid ORDER BY score) lead1,
-- 	FIRST_VALUE(expr),LAST_VALUE(expr)
	FIRST_VALUE(score) over (PARTITION by uid ORDER BY score) as first_score,
	LAST_VALUE(score) over (PARTITION by uid ORDER BY score rows BETWEEN unbounded preceding and unbounded following) as last_score,
-- 	analytic functions
	AVG(score) over (PARTITION by uid ORDER BY score) avg1,
	count(uid) over (PARTITION by uid ORDER BY score) ct1,
	count(*) over (PARTITION by uid ORDER BY score) ct2,
	max(score) over (PARTITION by uid ORDER BY score) max1,
	min(score) over (PARTITION by uid ORDER BY score) min1
-- 	MySQL has no MEDIAN() function, so the median() column is left out
FROM
	exam_record;

The query above only lists the available functions. Its windows use different sort orders and the statement has no final ORDER BY, so the order of the returned rows is not guaranteed. When verifying a function, keep only that function in the query, or add an explicit ORDER BY to fix the row order.
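
To make the offset and first/last functions easier to check, here is a small sketch restricted to one user, assuming MySQL 8.0 or later and the exam_record table from the table data section. The column names last_default and last_full are only illustrative. It also shows why LAST_VALUE is usually written with an explicit frame: with the default frame, the last row of the frame is the current row.

```sql
SELECT
	score,
	LAG( score ) OVER ( PARTITION BY uid ORDER BY score ) AS lag1,        -- previous row, NULL if there is none
	LAG( score, 1, 0 ) OVER ( PARTITION BY uid ORDER BY score ) AS lag2,  -- previous row, 0 if there is none
	LEAD( score ) OVER ( PARTITION BY uid ORDER BY score ) AS lead1,      -- next row, NULL if there is none
	LAST_VALUE( score ) OVER ( PARTITION BY uid ORDER BY score ) AS last_default,
	LAST_VALUE( score ) OVER ( PARTITION BY uid ORDER BY score
		ROWS BETWEEN unbounded preceding AND unbounded following ) AS last_full
FROM
	exam_record
WHERE
	uid = 1002 AND score IS NOT NULL
ORDER BY
	score;
-- score  lag1  lag2  lead1  last_default  last_full
-- 60     NULL  0     70     60            81
-- 70     60    60    81     70            81
-- 81     70    70    NULL   81            81
```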

ranking

SELECT
	uid,
	score,
	-- 	ROW_NUMBER() over () AS rn1,  -- by default numbers rows by their position in the original table
	ROW_NUMBER() over ( ORDER BY score ) AS rn2,-- ranks the whole table, no partitioning
	ROW_NUMBER() over ( PARTITION BY uid ORDER BY score ) AS rn3,
	RANK() over ( PARTITION BY uid ORDER BY score ) AS rk,
	DENSE_RANK() over ( PARTITION BY uid ORDER BY score ) AS drk 
FROM
	exam_record;

[Figure: screenshot of the query result]

ROW_NUMBER(): rows with equal values still get different, consecutive numbers.
RANK(): rows with equal values share the same rank, and the ranks after a tie are not consecutive (numbers are skipped).
DENSE_RANK(): rows with equal values share the same rank, and the ranks stay consecutive.
None of the three functions takes an argument inside the parentheses.
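
The three behaviours can be checked on a tie in the sample data. The sketch below assumes MySQL 8.0 or later and the exam_record table from the table data section; uid 1005 has the scores 88, 88 and 89.

```sql
SELECT
	score,
	ROW_NUMBER() OVER ( PARTITION BY uid ORDER BY score ) AS rn,
	RANK() OVER ( PARTITION BY uid ORDER BY score ) AS rk,
	DENSE_RANK() OVER ( PARTITION BY uid ORDER BY score ) AS drk
FROM
	exam_record
WHERE
	uid = 1005
ORDER BY
	score, rn;
-- score  rn  rk  drk
-- 88     1   1   1
-- 88     2   1   1
-- 89     3   3   2     (RANK skips 2, DENSE_RANK does not)
```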

The difference between partition by and group by

SELECT
	uid,
	SUM( score ) 
FROM
	exam_record 
GROUP BY
	uid;
SELECT
	uid,
	SUM( score ) over ( PARTITION BY uid ) 
FROM
	exam_record;

[Figure: screenshot of the two query results]
The left table shows only one row per uid after deduplication (6 rows in the screenshot), while the right table shows all 13 rows (the screenshot does not show the complete result).
As this shows, GROUP BY collapses each group to a single row, so every value of the grouping column appears exactly once, whereas PARTITION BY keeps every input row and attaches the group's result to each of them.
The difference between a window function and an aggregate function: an aggregate function collapses a set of rows into a single result, whereas a window function returns the aggregated result on every row.
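
For the 13 rows inserted in the table data section, the two statements can be compared directly. This is a sketch assuming MySQL 8.0 or later; the alias `total` and the final ORDER BY are additions for readability. SUM ignores NULL scores and returns NULL when a group has no non-NULL score.

```sql
-- GROUP BY: one row per uid
SELECT uid, SUM( score ) AS total
FROM exam_record
GROUP BY uid
ORDER BY uid;
-- 1001  NULL
-- 1002  211
-- 1003  126
-- 1005  265

-- PARTITION BY: every input row is kept, each carrying its group's total
SELECT uid, SUM( score ) OVER ( PARTITION BY uid ) AS total
FROM exam_record
ORDER BY uid;
-- 13 rows: 1 row  (1001, NULL), 4 rows (1002, 211),
--          5 rows (1003, 126),  3 rows (1005, 265)
```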

partition by example

SELECT
	uid,
	SUM( score ) over () 
FROM
	exam_record;

[Figure: screenshot of the two query results]
Both the left table and the right table show all 13 rows (the screenshot does not show the complete result). The difference is that the left table, with PARTITION BY uid, holds the sum within each uid group, while the right table, with an empty OVER (), holds the sum over all rows.
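
Putting both windows in one statement makes the difference visible side by side. The sketch assumes MySQL 8.0 or later and the exam_record table from the table data section; DISTINCT is used only to shorten the output to one line per uid, and the aliases are illustrative.

```sql
SELECT DISTINCT
	uid,
	SUM( score ) OVER ( PARTITION BY uid ) AS sum_per_uid,  -- total inside each uid group
	SUM( score ) OVER () AS sum_all                         -- total over the whole table
FROM
	exam_record
ORDER BY
	uid;
-- uid   sum_per_uid  sum_all
-- 1001  NULL         602
-- 1002  211          602
-- 1003  126          602
-- 1005  265          602
```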

order by

How the aggregate is computed depends on whether rows share the same value in the ORDER BY column. With RANGE, rows with equal values are merged into the calculation together, while rows with different values are accumulated step by step.
Summing up the three comparisons below (the default frame is RANGE, so what follows describes the RANGE behaviour): the execution order is PARTITION BY first, then ORDER BY on the values inside each group, and finally the aggregate is evaluated through the group in that order, accumulating and merging equal values.

max,min

SELECT *,
max(score) over (PARTITION by uid ORDER BY uid DESC) max1,
max(score) over (PARTITION by uid ORDER BY score DESC) max2,
max(score) over (PARTITION by uid ORDER BY score ) max3
from exam_record;

Comparison 1: different ORDER BY columns (uid vs. score)
[Figure: https://img-blog.csdnimg.cn/7487af9990ee4c14a8b8a15e24178da3.png]
part1 and part2 give the same result, but it is computed differently:
in the left table the rows are ordered by uid descending. All uid values in the group are equal, so the whole group (81,82,90,81) is evaluated at once and the max is 90;
in the right table the rows are ordered by score descending: (90) gives max 90, (90,82) gives max 90, the two equal 81 values are merged into one step so (90,82,81,81) gives max 90, and the two NULL values leave the max of (90,82,81,81,NULL,NULL) at 90.
Comparison 2: same ORDER BY column (uid), ascending vs. descending
[Figure: screenshot of the query result]
part1 and part2 give the same result. The left table is ordered by uid ascending and the right table by uid descending, but since the uid values inside a group do not differ, the direction has no effect.
Comparison 3: same ORDER BY column with differing values (score), ascending vs. descending
[Figure: screenshot of the query result]
part1 and part2 give different results. The left table is ordered by score descending and the right table by score ascending. Because the score values differ, the result now depends on the sort direction. Both can be read as a running maximum inside each uid group, but in ascending order, for uid=1002 in part2, the two 81 values are merged so (81,81) gives max 81, (81,81,82) gives max 82, and (81,81,82,90) gives max 90, which is the maximum of the whole group.

Overall comparison
[Figure: screenshot of the query result]
The displayed score order follows the last sort performed (part3), so the result of part3 differs from the other parts.
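
The same three windows can be traced on the data inserted in the table data section, which differs from the values in the screenshots. This is a sketch assuming MySQL 8.0 or later, restricted to the non-NULL scores of uid 1002 (60, 70, 81) to keep the result short.

```sql
SELECT
	score,
	MAX( score ) OVER ( PARTITION BY uid ORDER BY uid DESC ) AS max1,   -- all rows are peers: group max
	MAX( score ) OVER ( PARTITION BY uid ORDER BY score DESC ) AS max2, -- running max from the top score down
	MAX( score ) OVER ( PARTITION BY uid ORDER BY score ) AS max3       -- running max from the lowest score up
FROM
	exam_record
WHERE
	uid = 1002 AND score IS NOT NULL
ORDER BY
	score;
-- score  max1  max2  max3
-- 60     81    81    60
-- 70     81    81    70
-- 81     81    81    81
```

max1 and max2 agree on every row although they are computed differently, while max3 only reaches the group maximum on the last row.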

avg

[Figure: screenshot of the query result]
part1: partition by uid, sort by the ORDER BY column score in ascending order, then compute the running avg of the scores inside each uid group.
part2: partition by uid, sort by score in descending order, then compute the running avg inside each group. In the yellow box the value 81 occurs twice, so the result is not the avg of (90,82,81), which would be 84.333. Both 81 rows are merged into the calculation, and the avg of (90,82,81,81), 83.5, is returned.
part3: partition by uid, then sort by the ORDER BY column uid. All uid values inside a group are equal, so every row gets the same avg over all scores of that uid.
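
The merging of equal values can be reproduced with uid 1005 in the table data section, whose scores are 88, 88 and 89. The sketch assumes MySQL 8.0 or later; the aliases mirror part1 to part3, and the values in the comments are rounded to two decimals.

```sql
SELECT
	score,
	AVG( score ) OVER ( PARTITION BY uid ORDER BY score ) AS avg_part1,      -- ascending
	AVG( score ) OVER ( PARTITION BY uid ORDER BY score DESC ) AS avg_part2, -- descending
	AVG( score ) OVER ( PARTITION BY uid ORDER BY uid ) AS avg_part3         -- all rows are peers
FROM
	exam_record
WHERE
	uid = 1005
ORDER BY
	score;
-- score  avg_part1  avg_part2  avg_part3
-- 88     88.00      88.33      88.33
-- 88     88.00      88.33      88.33
-- 89     88.33      89.00      88.33
```

In avg_part1 both 88 rows already see (88,88), and in avg_part2 both see (89,88,88); neither 88 row is ever averaged alone.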

sum

SELECT
	uid,
	score,
	SUM(score) over ( ORDER BY score ) AS sum1,
	SUM(score) over ( PARTITION BY uid ORDER BY score DESC ) AS sum2,
	SUM(score) over ( PARTITION BY uid ORDER BY score ) AS sum3,
	SUM(score) over ( PARTITION BY uid ORDER BY uid ) AS sum4 
FROM
	exam_record;

[Figure: screenshot of the query result]
The rows are displayed in ascending score order, following the last sort performed.
part1: running sum without partitioning. Rows with the same score are merged into the calculation together, and rows with different scores are accumulated one after another.
part2: running sum per uid group in descending score order. Rows with the same score are merged, and rows with different scores are accumulated.
part3: running sum per uid group in ascending score order. Rows with the same score are merged, and rows with different scores are accumulated.
part4: sum over each group as a whole. All uid values in the group are equal, so every row is merged into one calculation.
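
part1 can be traced by hand on the data from the table data section. This sketch assumes MySQL 8.0 or later and leaves out the NULL scores to keep the result short.

```sql
SELECT
	uid,
	score,
	SUM( score ) OVER ( ORDER BY score ) AS sum1
FROM
	exam_record
WHERE
	score IS NOT NULL
ORDER BY
	score, uid;
-- uid   score  sum1
-- 1003  40     40
-- 1002  60     100
-- 1002  70     170
-- 1002  81     251
-- 1003  86     337
-- 1005  88     513   (both 88 rows are added in the same step: 337 + 88 + 88)
-- 1005  88     513
-- 1005  89     602
```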

rows/range

The default frame, and the difference between the two

SELECT
	uid,
	score,
	sum( score ) over ( PARTITION BY uid ) AS sum1,
	sum( score ) over ( PARTITION BY uid ORDER BY score ) AS sum2,
	sum( score ) over ( PARTITION BY uid ORDER BY score RANGE BETWEEN unbounded preceding AND current ROW ) AS sum3,
	sum( score ) over ( PARTITION BY uid ORDER BY score ROWs BETWEEN unbounded preceding AND current ROW ) AS sum4
FROM
	exam_record;

[Figure: screenshot of the query result]
Green box: sum1 has no ORDER BY, so all rows with the same uid are included and every row gets the total of its group.
Yellow box: the results are identical, which shows that when ORDER BY is present the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
Red box: the difference between RANGE and ROWS. RANGE merges rows with equal sort values into the calculation; ROWS does not merge them and accumulates row by row.
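
The three ordered windows can be compared on the tie in uid 1005 (scores 88, 88, 89) from the table data section. This is a sketch assuming MySQL 8.0 or later.

```sql
SELECT
	score,
	SUM( score ) OVER ( PARTITION BY uid ORDER BY score ) AS sum2,  -- default frame
	SUM( score ) OVER ( PARTITION BY uid ORDER BY score RANGE BETWEEN unbounded preceding AND current ROW ) AS sum3,
	SUM( score ) OVER ( PARTITION BY uid ORDER BY score ROWS BETWEEN unbounded preceding AND current ROW ) AS sum4
FROM
	exam_record
WHERE
	uid = 1005
ORDER BY
	score, sum4;
-- score  sum2  sum3  sum4
-- 88     176   176   88
-- 88     176   176   176
-- 89     265   265   265
```

sum2 and sum3 match on every row, and sum4 differs from them only on the first of the two tied rows.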

meaning of unbounded, preceding, following

syntax:

  • CURRENT ROW: the current row, offset 0
  • n PRECEDING: the n rows before the current row; n is the backward offset relative to the current row
  • n FOLLOWING: the n rows after the current row; n is the forward offset relative to the current row
  • UNBOUNDED: no limit, i.e. the frame reaches the edge of the partition
  • UNBOUNDED PRECEDING: starting from the first row of the partition
  • UNBOUNDED FOLLOWING: ending at the last row of the partition
  • BETWEEN unbounded preceding AND current ROW: from the first row to the current row (eg: sum6, sum8)
  • BETWEEN current ROW AND unbounded following: from the current row to the last row (eg: sum7, sum9)

start to end, end to start

[Figure: screenshot of the query result]

  • The red and yellow boxes show RANGE and ROWS respectively, each computed from top to bottom and from bottom to top with the score sort order unchanged. The left column is computed from top to bottom, the right column from bottom to top.
  • The green box verifies the difference between RANGE and ROWS.
  • Note: the column order in the screenshot differs from the order in the SQL statement, so read the result by column name.
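
Since the statement behind this comparison is not shown, the following is only a sketch of what such a query might look like, assuming MySQL 8.0 or later and the exam_record table from the table data section. The column names are illustrative, and id is added as a tie-breaker in the ROWS windows (an addition here, not part of the original queries) so that the two tied 88 rows are processed in a fixed order.

```sql
SELECT
	score,
	SUM( score ) OVER ( PARTITION BY uid ORDER BY score RANGE BETWEEN unbounded preceding AND current ROW ) AS range_down,
	SUM( score ) OVER ( PARTITION BY uid ORDER BY score RANGE BETWEEN current ROW AND unbounded following ) AS range_up,
	SUM( score ) OVER ( PARTITION BY uid ORDER BY score, id ROWS BETWEEN unbounded preceding AND current ROW ) AS rows_down,
	SUM( score ) OVER ( PARTITION BY uid ORDER BY score, id ROWS BETWEEN current ROW AND unbounded following ) AS rows_up
FROM
	exam_record
WHERE
	uid = 1005
ORDER BY
	score, id;
-- score  range_down  range_up  rows_down  rows_up
-- 88     176         265       88         265
-- 88     176         265       176        177
-- 89     265         89        265        89
```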

specified number of rows

SELECT
	uid,
	score,
	sum( score ) over ( PARTITION BY uid ORDER BY score Rows 1 preceding ) AS sum10,
	-- 	sum( score ) over ( PARTITION BY uid ORDER BY score Rows 1 following ) AS sum11,
	sum( score ) over ( PARTITION BY uid ORDER BY score Rows BETWEEN 1 preceding AND 1 following ) AS sum12 
FROM
	exam_record;

[Figure: screenshot of the query result]

  • sum10 is the sum of the current row and the row before it, ie sum(n-1,n)
  • sum12 is the sum of the current row plus one row before and one row after it, ie sum(n-1,n,n+1)
  • sum11 is not valid syntax: FOLLOWING can be used only in the BETWEEN ... AND ... form. In the short form only the frame start is given and the frame ends at the current row, so the start cannot lie after the current row.
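
The two valid frames can be verified on the non-NULL scores of uid 1002 (60, 70, 81) from the table data section. This is a sketch assuming MySQL 8.0 or later.

```sql
SELECT
	score,
	SUM( score ) OVER ( PARTITION BY uid ORDER BY score ROWS 1 preceding ) AS sum10,
	SUM( score ) OVER ( PARTITION BY uid ORDER BY score ROWS BETWEEN 1 preceding AND 1 following ) AS sum12
FROM
	exam_record
WHERE
	uid = 1002 AND score IS NOT NULL
ORDER BY
	score;
-- score  sum10  sum12
-- 60     60     130    (no previous row; 60 + 70)
-- 70     130    211    (60 + 70; 60 + 70 + 81)
-- 81     151    151    (70 + 81; no following row)
```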

equivalent forms

SELECT
	uid,
	score,
	sum( score ) over ( PARTITION BY uid ORDER BY score ) AS sum2,
	sum( score ) over ( PARTITION BY uid ORDER BY score DESC ) AS sum5,
	sum( score ) over ( PARTITION BY uid ORDER BY score DESC RANGE BETWEEN unbounded preceding AND current ROW ) AS sum6,
	sum( score ) over ( PARTITION BY uid ORDER BY score RANGE BETWEEN current ROW AND unbounded following ) AS sum7 
FROM
	exam_record;

[Figure: screenshot of the query result]
Red box: the results differ because score is sorted in ascending order in one window and in descending order in the other.
Yellow box: verifies that the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
Green box: three different ways of writing the same window; the three are equivalent.
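
The equivalence can be checked by running the same statement on the non-NULL scores of uid 1002 (60, 70, 81) from the table data section. This is a sketch assuming MySQL 8.0 or later.

```sql
SELECT
	score,
	SUM( score ) OVER ( PARTITION BY uid ORDER BY score ) AS sum2,
	SUM( score ) OVER ( PARTITION BY uid ORDER BY score DESC ) AS sum5,
	SUM( score ) OVER ( PARTITION BY uid ORDER BY score DESC RANGE BETWEEN unbounded preceding AND current ROW ) AS sum6,
	SUM( score ) OVER ( PARTITION BY uid ORDER BY score RANGE BETWEEN current ROW AND unbounded following ) AS sum7
FROM
	exam_record
WHERE
	uid = 1002 AND score IS NOT NULL
ORDER BY
	score;
-- score  sum2  sum5  sum6  sum7
-- 60     60    211   211   211
-- 70     130   151   151   151
-- 81     211   81    81    81
```

sum5, sum6 and sum7 agree on every row: sorting in descending order and accumulating down to the current row covers the same rows as sorting in ascending order and accumulating from the current row to the end.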

Origin blog.csdn.net/weixin_44964850/article/details/125010476