Window function OVER(PARTITION BY ...)

Syntax of a window function: analytic_function() OVER (PARTITION BY grouping_column ORDER BY sorting_column ROWS BETWEEN frame_start AND frame_end)

The OVER() clause has three optional parts: PARTITION BY column (grouping), ORDER BY column (sorting), and ROWS BETWEEN frame_start AND frame_end (the window frame).
ROWS BETWEEN ... AND ... is used less often than the other two.

       Aggregate functions such as sum(), count(), max(), min(), and avg() perform a calculation on a set of values and return a single value. They are often used together with the GROUP BY clause. Except for COUNT(*), aggregate functions ignore NULL values.
       Sometimes, however, a single value per group is not enough. For example, we often want the top-ranked rows in each region, each class, or each subject, so each group has to return multiple rows. Window functions make this kind of problem easy. The difference from an aggregate function is that a window function returns a value for every row in each group, while an aggregate function with GROUP BY returns only one row per group.


Create table

DROP TABLE IF EXISTS temp;

CREATE TABLE temp(
    id INT,
    name VARCHAR(10),
    class VARCHAR(10),
    score INT 
);
 
INSERT INTO temp (id, name, class, score) VALUES (1,'公孙衍', '2', 81);
INSERT INTO temp (id, name, class, score) VALUES (2,'廉颇', '3', 55);
INSERT INTO temp (id, name, class, score) VALUES (3,'李牧', '3', 55);
INSERT INTO temp (id, name, class, score) VALUES (4,'王翦', '1', 96);
INSERT INTO temp (id, name, class, score) VALUES (5,'王贲', '1', 92);
INSERT INTO temp (id, name, class, score) VALUES (6,'白起', '1', 96);
INSERT INTO temp (id, name, class, score) VALUES (7,'蔺相如', '3', 90);
INSERT INTO temp (id, name, class, score) VALUES (8,'赵胜', '3', 81);
INSERT INTO temp (id, name, class, score) VALUES (9,'赵雍', '3', 93);
INSERT INTO temp (id, name, class, score) VALUES (10,'魏无忌', '2', 92);
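
The difference between an aggregate and a window function described above can be checked directly on this data. The following is a minimal sketch, assuming Python's built-in sqlite3 module and SQLite 3.25 or later; the table is called scores here (an assumption, because TEMP is a keyword in SQLite) and only the id, class, and score columns are loaded.

```python
import sqlite3

# (id, class, score) for the ten rows of temp; "scores" is a stand-in name.
DATA = [(1, "2", 81), (2, "3", 55), (3, "3", 55), (4, "1", 96), (5, "1", 92),
        (6, "1", 96), (7, "3", 90), (8, "3", 81), (9, "3", 93), (10, "2", 92)]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (id INT, class TEXT, score INT)")
conn.executemany("INSERT INTO scores VALUES (?, ?, ?)", DATA)

# GROUP BY collapses each class to a single row.
per_class = conn.execute(
    "SELECT class, MAX(score) FROM scores GROUP BY class ORDER BY class"
).fetchall()

# The same aggregate used as a window function keeps all ten rows and
# attaches the class maximum to each of them.
per_row = conn.execute(
    "SELECT id, class, score, MAX(score) OVER (PARTITION BY class) AS class_max "
    "FROM scores ORDER BY id"
).fetchall()

print(per_class)
for row in per_row:
    print(row)
```

The grouped query yields three rows (one per class), while the windowed query yields ten.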

OVER(PARTITION BY … ORDER BY … DESC)

Example:

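No partitioning (row numbers across the whole table):
SELECT name, class, score, ROW_NUMBER() OVER(ORDER BY score DESC) mm FROM temp;

Partitioned by class (row numbers within each class):
SELECT name, class, score, ROW_NUMBER() OVER(PARTITION BY class ORDER BY score DESC) mm FROM temp;

-- The unpartitioned ordering can also be produced with a plain ORDER BY; you just do not get the row-number column (mm).
SELECT * FROM temp ORDER BY score DESC;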

Example:

First place in each class:
SELECT name, class, score FROM (SELECT name, class, score, RANK() OVER(PARTITION BY class ORDER BY score DESC) mm FROM temp) a WHERE mm = 1;

Last place in each class:
SELECT name, class, score FROM (SELECT name, class, score, RANK() OVER(PARTITION BY class ORDER BY score) mm FROM temp) a WHERE mm = 1;

Do not use row_number() to find first place: if two students in the same class tie for first, mm = 1 returns only one of them.
Without DESC, ORDER BY sorts in ascending order, so mm = 1 selects the lowest score, that is, last place.
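
The tie problem is easy to reproduce, since class 1 has two students with 96. A sketch under the same assumptions as the rest of this article's data (Python's sqlite3 module, SQLite 3.25+, table renamed to scores because TEMP is a keyword in SQLite; first_place is a hypothetical helper):

```python
import sqlite3

# Same rows as the temp table; "scores" is a stand-in name.
DATA = [
    (1, "公孙衍", "2", 81), (2, "廉颇", "3", 55), (3, "李牧", "3", 55),
    (4, "王翦", "1", 96), (5, "王贲", "1", 92), (6, "白起", "1", 96),
    (7, "蔺相如", "3", 90), (8, "赵胜", "3", 81), (9, "赵雍", "3", 93),
    (10, "魏无忌", "2", 92),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (id INT, name TEXT, class TEXT, score INT)")
conn.executemany("INSERT INTO scores VALUES (?, ?, ?, ?)", DATA)


def first_place(func):
    """Return the rows with mm = 1 per class; func is "RANK" or "ROW_NUMBER"."""
    sql = f"""
        SELECT id, name, class, score FROM (
            SELECT id, name, class, score,
                   {func}() OVER (PARTITION BY class ORDER BY score DESC) AS mm
            FROM scores
        ) a
        WHERE mm = 1
        ORDER BY id
    """
    return conn.execute(sql).fetchall()


with_rank = first_place("RANK")              # both tied students of class 1
with_row_number = first_place("ROW_NUMBER")  # exactly one row per class

print(len(with_rank), len(with_row_number))
```

RANK() returns four rows (ids 4, 6, 9, 10), while ROW_NUMBER() returns three and silently drops one of the two tied students; which one is dropped is not guaranteed.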


Group sorting functions: row_number(), rank(), dense_rank(), ntile()

  • select *,ROW_NUMBER() over(order by name) as 排序 from temp

Numbers the rows consecutively; tied values still receive different numbers, e.g. 1, 2, 3, 4, 5.

  • select *,RANK() over(order by name) as 排序 from temp

Ranks the rows; tied values share a rank and the following ranks are skipped, e.g. 1, 1, 3, 4.

  • select *,DENSE_RANK() over(order by name) as 排序 from temp

Ranks the rows; tied values share a rank and no ranks are skipped, e.g. 1, 1, 2, 2, 3, 4, 5.

  • select *,NTILE(2) over(order by name) as 排序 from temp

Sorts the rows and divides them into 2 buckets. NTILE is typically used to retrieve the top few percent of a table. For example, to take the top 25% of the data, split it into 4 buckets with NTILE(4) and keep the rows whose bucket number equals 1.
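
Because the names in temp are all distinct, ordering by name never produces ties. Ordering by score does, which makes the four functions easier to compare side by side. A minimal sketch, assuming Python's sqlite3 module with SQLite 3.25+ and a table named scores (a stand-in for temp, since TEMP is a keyword in SQLite); the column aliases are illustrative:

```python
import sqlite3

# (id, class, score) for the ten rows of temp; "scores" is a stand-in name.
DATA = [(1, "2", 81), (2, "3", 55), (3, "3", 55), (4, "1", 96), (5, "1", 92),
        (6, "1", 96), (7, "3", 90), (8, "3", 81), (9, "3", 93), (10, "2", 92)]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (id INT, class TEXT, score INT)")
conn.executemany("INSERT INTO scores VALUES (?, ?, ?)", DATA)

rows = conn.execute("""
    SELECT score,
           ROW_NUMBER() OVER (ORDER BY score DESC) AS rn,
           RANK()       OVER (ORDER BY score DESC) AS rk,
           DENSE_RANK() OVER (ORDER BY score DESC) AS dr,
           NTILE(4)     OVER (ORDER BY score DESC) AS quartile
    FROM scores
    ORDER BY rn
""").fetchall()

for row in rows:
    print(row)

# "Top 25%" in the NTILE sense: the rows that fall into bucket 1 of 4.
top_quarter = [score for score, rn, rk, dr, quartile in rows if quartile == 1]
print(top_quarter)
```

With ten rows, NTILE(4) produces buckets of 3, 3, 2, and 2 rows, so the "top 25%" here is the three highest scores.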


Offset analysis window functions lag(), lead()

lag() and lead() return, as an extra column in the same query, the value of a field from the row N rows before the current row (lag) or N rows after it (lead).
In practice, they are especially useful for computing the difference in a field between today and yesterday. The same result can be obtained by self-joining the table, but lag and lead are usually simpler and often more efficient than a self-join (left join, right join).

lag(exp_str, offset, defval) over(partition by … order by …)
lead(exp_str, offset, defval) over(partition by … order by …)
  • exp_str is the field name (or expression)
  • offset is the number of rows to look back (lag) or ahead (lead). If the current row is the 5th row of the partition and the offset is 3, lag() reads the 2nd row (5 - 3 = 2). The default offset is 1.
  • defval is the default value. When the row N positions before (lag) or after (lead) the current row falls outside the partition, the function returns defval; if no default is specified, it returns NULL. In arithmetic, supply a default (or handle NULL explicitly), because any calculation involving NULL yields NULL.

1. lag() example

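-- ORDER BY id makes "previous row" well defined; with an empty OVER() the row order is not guaranteed.
SELECT id, score,
LAG(score,1,0) OVER(ORDER BY id) AS n1,  -- previous score, 0 when there is none
LAG(score,1)   OVER(ORDER BY id) AS n2,  -- previous score, NULL when there is none
LAG(score,2,0) OVER(ORDER BY id) AS n6,  -- score two rows back, 0 when there is none
LAG(score,2)   OVER(ORDER BY id) AS n7   -- score two rows back, NULL when there is none
FROM temp;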

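2. lead() example

-- ORDER BY id makes "next row" well defined; with an empty OVER() the row order is not guaranteed.
SELECT id, score,
LEAD(score,1,0) OVER(ORDER BY id) AS n1,  -- next score, 0 when there is none
LEAD(score,1)   OVER(ORDER BY id) AS n2,  -- next score, NULL when there is none
LEAD(score,2,0) OVER(ORDER BY id) AS n6,  -- score two rows ahead, 0 when there is none
LEAD(score,2)   OVER(ORDER BY id) AS n7   -- score two rows ahead, NULL when there is none
FROM temp;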
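
To make the offset and default arguments concrete, the sketch below computes the previous score, the next score, and the difference from the previous row, ordered by id. It assumes Python's sqlite3 module with SQLite 3.25+ and a table named scores (a stand-in for temp, since TEMP is a keyword in SQLite); the aliases are illustrative.

```python
import sqlite3

# (id, class, score) for the ten rows of temp; "scores" is a stand-in name.
DATA = [(1, "2", 81), (2, "3", 55), (3, "3", 55), (4, "1", 96), (5, "1", 92),
        (6, "1", 96), (7, "3", 90), (8, "3", 81), (9, "3", 93), (10, "2", 92)]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (id INT, class TEXT, score INT)")
conn.executemany("INSERT INTO scores VALUES (?, ?, ?)", DATA)

rows = conn.execute("""
    SELECT id, score,
           LAG(score, 1, 0) OVER (ORDER BY id) AS prev_or_zero,
           LAG(score)       OVER (ORDER BY id) AS prev_score,
           LEAD(score)      OVER (ORDER BY id) AS next_score,
           score - LAG(score) OVER (ORDER BY id) AS diff
    FROM scores
    ORDER BY id
""").fetchall()

for row in rows:
    print(row)
```

The first row has no predecessor, so prev_score and diff come back as NULL (None in Python), while prev_or_zero falls back to the supplied default of 0. The last row likewise has no next_score.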


Other window functions

Name            Description
CUME_DIST()     Cumulative distribution value
DENSE_RANK()    Rank of the current row within its partition, without gaps
FIRST_VALUE()   Value of the argument from the first row of the window frame
LAG()           Value of the argument from the row lagging (preceding) the current row within the partition
LAST_VALUE()    Value of the argument from the last row of the window frame
LEAD()          Value of the argument from the row leading (following) the current row within the partition
NTH_VALUE()     Value of the argument from the N-th row of the window frame
NTILE()         Bucket number of the current row within its partition
PERCENT_RANK()  Percentage rank value
RANK()          Rank of the current row within its partition, with gaps
ROW_NUMBER()    Number of the current row within its partition
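
Several functions in this list are not demonstrated elsewhere in the article. As a rough illustration, the sketch below applies FIRST_VALUE, LAST_VALUE, NTH_VALUE, PERCENT_RANK, and CUME_DIST to class 3 only. It assumes Python's sqlite3 module with SQLite 3.25+ and a table named scores (a stand-in for temp, since TEMP is a keyword in SQLite). The explicit full frame is a deliberate choice: with the default frame, LAST_VALUE only sees rows up to the current row and its ties.

```python
import sqlite3

# (id, class, score) for the ten rows of temp; "scores" is a stand-in name.
DATA = [(1, "2", 81), (2, "3", 55), (3, "3", 55), (4, "1", 96), (5, "1", 92),
        (6, "1", 96), (7, "3", 90), (8, "3", 81), (9, "3", 93), (10, "2", 92)]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (id INT, class TEXT, score INT)")
conn.executemany("INSERT INTO scores VALUES (?, ?, ?)", DATA)

# Frame covering the whole partition, so the value functions see every row.
FULL = "ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING"

rows = conn.execute(f"""
    SELECT score,
           FIRST_VALUE(score)  OVER (ORDER BY score DESC {FULL}) AS highest,
           LAST_VALUE(score)   OVER (ORDER BY score DESC {FULL}) AS lowest,
           NTH_VALUE(score, 2) OVER (ORDER BY score DESC {FULL}) AS runner_up,
           PERCENT_RANK()      OVER (ORDER BY score DESC) AS pct_rank,
           CUME_DIST()         OVER (ORDER BY score DESC) AS cume
    FROM scores
    WHERE class = '3'
    ORDER BY score DESC
""").fetchall()

for row in rows:
    print(row)
```

PERCENT_RANK is (rank - 1) / (rows - 1) and CUME_DIST is the share of rows at or before the current row including ties, so the two students tied at 55 both get 0.75 and 1.0.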

GROUP BY simply groups the rows that remain in the query result and is generally used together with aggregate functions such as max, min, sum, avg, and count. PARTITION BY also groups rows, but it keeps every row and can be combined with ordering and window frames.

Use of sum() over()

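The window queries below select all columns to make the results easier to inspect; when you have a specific goal, select only the columns you need.

-- GROUP BY: one total per class (t.* is not valid here in standard SQL, so only the grouping column is selected)
SELECT t.class, SUM(t.score) s_sum FROM temp t GROUP BY t.class;
-- PARTITION BY class, ordered by score: cumulative total within each class
SELECT t.*, SUM(t.score) OVER(PARTITION BY t.class ORDER BY t.score DESC) s_sum FROM temp t;
-- No partition, ordered by id: running total over the whole table
SELECT t.*, SUM(t.score) OVER(ORDER BY t.id) s_sum FROM temp t;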
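
The three flavours of SUM() OVER() can be compared in one result set. A minimal sketch, assuming Python's sqlite3 module with SQLite 3.25+ and a table named scores (a stand-in for temp, since TEMP is a keyword in SQLite); the aliases are illustrative:

```python
import sqlite3

# (id, class, score) for the ten rows of temp; "scores" is a stand-in name.
DATA = [(1, "2", 81), (2, "3", 55), (3, "3", 55), (4, "1", 96), (5, "1", 92),
        (6, "1", 96), (7, "3", 90), (8, "3", 81), (9, "3", 93), (10, "2", 92)]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (id INT, class TEXT, score INT)")
conn.executemany("INSERT INTO scores VALUES (?, ?, ?)", DATA)

rows = conn.execute("""
    SELECT id, class, score,
           SUM(score) OVER (PARTITION BY class) AS class_total,
           SUM(score) OVER (PARTITION BY class ORDER BY score DESC) AS class_cumulative,
           SUM(score) OVER (ORDER BY id) AS running_total
    FROM scores
    ORDER BY id
""").fetchall()

for row in rows:
    print(row)
```

One detail worth noticing: when ORDER BY is present and no frame is given, tied rows are summed together. The two students in class 1 with 96 both show a class_cumulative of 192 rather than 96 and 192.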

 

Use of avg() over()

SELECT id, score, AVG(score) OVER(ORDER BY id ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg FROM temp;

 

ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
limits the frame over which the moving average is calculated: the current row and the two rows before it.
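
The moving average can be verified by hand against a plain Python slice over the same scores. A sketch, assuming Python's sqlite3 module with SQLite 3.25+ and a table named scores (a stand-in for temp, since TEMP is a keyword in SQLite):

```python
import sqlite3

# (id, class, score) for the ten rows of temp; "scores" is a stand-in name.
DATA = [(1, "2", 81), (2, "3", 55), (3, "3", 55), (4, "1", 96), (5, "1", 92),
        (6, "1", 96), (7, "3", 90), (8, "3", 81), (9, "3", 93), (10, "2", 92)]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (id INT, class TEXT, score INT)")
conn.executemany("INSERT INTO scores VALUES (?, ?, ?)", DATA)

rows = conn.execute("""
    SELECT id, score,
           AVG(score) OVER (ORDER BY id
                            ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg
    FROM scores
    ORDER BY id
""").fetchall()

# Recompute the same three-row window in Python for comparison.
plain = [score for _, _, score in DATA]
expected = [sum(plain[max(0, i - 2): i + 1]) / len(plain[max(0, i - 2): i + 1])
            for i in range(len(plain))]

for (row_id, score, moving_avg), check in zip(rows, expected):
    print(row_id, score, round(moving_avg, 2), round(check, 2))
```

For the first two rows the frame is shorter than three rows, so the average is taken over one and two values respectively.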

Syntax summary:

avg(…A…) over(partition by …B… order by …C… rows between …D1… and …D2…)
sum(…A…) over(partition by …B… order by …C… rows between …D1… and …D2…)

  • A: The name of the field to be processed
  • B: The field to group (partition) by
  • C: The field to sort by
  • D1, D2: The start and end of the row range (window frame) used in the calculation

Window scope description:

  • preceding: rows before the current row
  • following: rows after the current row
  • current row: the current row
  • unbounded: the partition boundary (used in combination with preceding or following)
  • unbounded preceding: the first row of the partition (start point)
  • unbounded following: the last row of the partition (end point)
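
The frame keywords above can be combined freely. The sketch below sums the score over three different frames; it assumes Python's sqlite3 module with SQLite 3.25+ and a table named scores (a stand-in for temp, since TEMP is a keyword in SQLite), and the aliases are illustrative.

```python
import sqlite3

# (id, class, score) for the ten rows of temp; "scores" is a stand-in name.
DATA = [(1, "2", 81), (2, "3", 55), (3, "3", 55), (4, "1", 96), (5, "1", 92),
        (6, "1", 96), (7, "3", 90), (8, "3", 81), (9, "3", 93), (10, "2", 92)]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (id INT, class TEXT, score INT)")
conn.executemany("INSERT INTO scores VALUES (?, ?, ?)", DATA)

rows = conn.execute("""
    SELECT id,
           -- previous row, current row, next row
           SUM(score) OVER (ORDER BY id
                            ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS neighbours,
           -- current row through the last row
           SUM(score) OVER (ORDER BY id
                            ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS remaining,
           -- first row through the last row
           SUM(score) OVER (ORDER BY id
                            ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS total
    FROM scores
    ORDER BY id
""").fetchall()

for row in rows:
    print(row)
```

At the edges of the table the frame simply shrinks: the first row has no preceding neighbour and the last row has no following one.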

Origin blog.csdn.net/wangshiqi666/article/details/131493662