Window function as a new function introduced by mysql8.0
Can solve many ranking-related problems
One, build a table
create table student(
id int ,
st_name varchar(20),
score int,
date_time varchar(20),
primary key (id,date_time)
);
Two, insert data
insert into student value(101,'张飞',62,'2017');
insert into student value(101,'张飞',30,'2007');
insert into student value(101,'张飞',82,'2020');
insert into student value(101,'张飞',62,'2015');
insert into student value(101,'张飞',62,'2019');
insert into student value(102,'刘备',92,'2017');
insert into student value(102,'刘备',72,'2010');
insert into student value(102,'刘备',62,'2011');
insert into student value(102,'刘备',98,'2020');
insert into student value(103,'孙悟空',98,'2010');
insert into student value(103,'孙悟空',98,'2011');
insert into student value(103,'孙悟空',98,'2012');
insert into student value(103,'孙悟空',97,'2013');
insert into student value(103,'孙悟空',98,'2014');
insert into student value(103,'孙悟空',99,'2015');
insert into student value(103,'孙悟空',100,'2018');
insert into student value(104,'哪吒',10,'2010');
insert into student value(104,'哪吒',20,'2011');
insert into student value(104,'哪吒',30,'2012');
insert into student value(104,'哪吒',40,'2013');
insert into student value(104,'哪吒',50,'2014');
insert into student value(104,'哪吒',40,'2015');
insert into student value(104,'哪吒',50,'2016');
Three, window function
Mainly divided into 4 categories:
Usage: window function over (partition by grouping field order by sorting field)
(1) Sequence number function: rank(), dense_rank(), row_number()
- select *,rank() over(partition by id order by score)as ranking from student;
id, st_name, score, date_time, ranking
101, 张飞, 30, 2007, 1
101, 张飞, 62, 2015, 2
101, 张飞, 62, 2017, 2
101, 张飞, 62, 2019, 2
101, 张飞, 82, 2020, 5
102, 刘备, 62, 2011, 1
102, 刘备, 72, 2010, 2
102, 刘备, 92, 2017, 3
102, 刘备, 98, 2020, 4
103, 孙悟空, 97, 2013, 1
103, 孙悟空, 98, 2010, 2
103, 孙悟空, 98, 2011, 2
103, 孙悟空, 98, 2012, 2
103, 孙悟空, 98, 2014, 2
103, 孙悟空, 99, 2015, 6
103, 孙悟空, 100, 2018, 7
104, 哪吒, 10, 2010, 1
104, 哪吒, 20, 2011, 2
104, 哪吒, 30, 2012, 3
104, 哪吒, 40, 2013, 4
104, 哪吒, 40, 2015, 4
104, 哪吒, 50, 2014, 6
104, 哪吒, 50, 2016, 6
- select *,dense_rank() over(partition by id order by score)as ranking from student;
id, st_name, score, date_time, ranking
101, 张飞, 30, 2007, 1
101, 张飞, 62, 2015, 2
101, 张飞, 62, 2017, 2
101, 张飞, 62, 2019, 2
101, 张飞, 82, 2020, 3
102, 刘备, 62, 2011, 1
102, 刘备, 72, 2010, 2
102, 刘备, 92, 2017, 3
102, 刘备, 98, 2020, 4
103, 孙悟空, 97, 2013, 1
103, 孙悟空, 98, 2010, 2
103, 孙悟空, 98, 2011, 2
103, 孙悟空, 98, 2012, 2
103, 孙悟空, 98, 2014, 2
103, 孙悟空, 99, 2015, 3
103, 孙悟空, 100, 2018, 4
104, 哪吒, 10, 2010, 1
104, 哪吒, 20, 2011, 2
104, 哪吒, 30, 2012, 3
104, 哪吒, 40, 2013, 4
104, 哪吒, 40, 2015, 4
104, 哪吒, 50, 2014, 5
104, 哪吒, 50, 2016, 5
- select *,row_number() over(partition by id order by score)as ranking from student;
id, st_name, score, date_time, ranking
101, 张飞, 30, 2007, 1
101, 张飞, 62, 2015, 2
101, 张飞, 62, 2017, 3
101, 张飞, 62, 2019, 4
101, 张飞, 82, 2020, 5
102, 刘备, 62, 2011, 1
102, 刘备, 72, 2010, 2
102, 刘备, 92, 2017, 3
102, 刘备, 98, 2020, 4
103, 孙悟空, 97, 2013, 1
103, 孙悟空, 98, 2010, 2
103, 孙悟空, 98, 2011, 3
103, 孙悟空, 98, 2012, 4
103, 孙悟空, 98, 2014, 5
103, 孙悟空, 99, 2015, 6
103, 孙悟空, 100, 2018, 7
104, 哪吒, 10, 2010, 1
104, 哪吒, 20, 2011, 2
104, 哪吒, 30, 2012, 3
104, 哪吒, 40, 2013, 4
104, 哪吒, 40, 2015, 5
104, 哪吒, 50, 2014, 6
104, 哪吒, 50, 2016, 7
Comparing the above data, it can be seen that according to the ranking performed by score, when the same score appears, the results of these three functions are different.
rank: There is a situation where the rank is tied, the same rank occupies the rank value, until the next different value appears, the value is increasing
dense_rank: There is a situation where the rank is tied, the same rank does not occupy the ranking value, until the next different value appears, the value can be increased by 1
row_number: There is no case where there is a tie, the ranking can increase in order
(2) Distribution function: percent_rank(), cume_dist()
- select *,percent_rank() over(partition by id order by score)as ranking from student;
id, st_name, score, date_time, ranking
101, 张飞, 30, 2007, 0
101, 张飞, 62, 2015, 0.25
101, 张飞, 62, 2017, 0.25
101, 张飞, 62, 2019, 0.25
101, 张飞, 82, 2020, 1
102, 刘备, 62, 2011, 0
102, 刘备, 72, 2010, 0.3333333333333333
102, 刘备, 92, 2017, 0.6666666666666666
102, 刘备, 98, 2020, 1
103, 孙悟空, 97, 2013, 0
103, 孙悟空, 98, 2010, 0.16666666666666666
103, 孙悟空, 98, 2011, 0.16666666666666666
103, 孙悟空, 98, 2012, 0.16666666666666666
103, 孙悟空, 98, 2014, 0.16666666666666666
103, 孙悟空, 99, 2015, 0.8333333333333334
103, 孙悟空, 100, 2018, 1
104, 哪吒, 10, 2010, 0
104, 哪吒, 20, 2011, 0.16666666666666666
104, 哪吒, 30, 2012, 0.3333333333333333
104, 哪吒, 40, 2013, 0.5
104, 哪吒, 40, 2015, 0.5
104, 哪吒, 50, 2014, 0.8333333333333334
104, 哪吒, 50, 2016, 0.8333333333333334
How is this value calculated? The answer is: (rank-1) / (rows-1)
Rank is the rank calculated by the rank() function, and rows is the number of rows
For example, let’s take Zhang Fei as an example
Rank=1 in the first row, so the value is (1-1)/4=0
Rank=2 in the second row, so the value is (2-1)/4=0.25
Rank=2 in the third row, so the value is (2-1)/4=0.25
Rank=2 in the fourth row, so the value is (2-1)/4=0.25
Rank=5 in the fifth row, so the value is (5-1)/4=1
- select *,cume_dist() over(partition by id order by score)as ranking from student;
id, st_name, score, date_time, ranking
101, 张飞, 30, 2007, 0.2
101, 张飞, 62, 2015, 0.8
101, 张飞, 62, 2017, 0.8
101, 张飞, 62, 2019, 0.8
101, 张飞, 82, 2020, 1
102, 刘备, 62, 2011, 0.25
102, 刘备, 72, 2010, 0.5
102, 刘备, 92, 2017, 0.75
102, 刘备, 98, 2020, 1
103, 孙悟空, 97, 2013, 0.14285714285714285
103, 孙悟空, 98, 2010, 0.7142857142857143
103, 孙悟空, 98, 2011, 0.7142857142857143
103, 孙悟空, 98, 2012, 0.7142857142857143
103, 孙悟空, 98, 2014, 0.7142857142857143
103, 孙悟空, 99, 2015, 0.8571428571428571
103, 孙悟空, 100, 2018, 1
104, 哪吒, 10, 2010, 0.14285714285714285
104, 哪吒, 20, 2011, 0.2857142857142857
104, 哪吒, 30, 2012, 0.42857142857142855
104, 哪吒, 40, 2013, 0.7142857142857143
104, 哪吒, 40, 2015, 0.7142857142857143
104, 哪吒, 50, 2014, 1
104, 哪吒, 50, 2016, 1
How is this value calculated? The answer is: rank_rows / rows
rank_rows: The number of rows in the group that are less than or equal to the current rank value / the total number of rows in the group
rows: the total number of rows in the group
Let's take Zhang Fei as an example
In the first row, rank_rows is equal to 1, and rows is equal to 5, so the result is 0.2
In the second row, rank_rows is equal to 4 and rows is equal to 5, so the result is 0.8
In the third row, rank_rows is equal to 4 and rows is equal to 5, so the result is 0.8
In the fourth row, rank_rows is equal to 4 and rows is equal to 5, so the result is 0.8
In the fifth row, rank_rows is equal to 5, and rows is equal to 5, so the result is 1
(3) Before and after functions: lag(), lead()
- select *,lag(score,1) over(partition by id order by score)as score from student;
id, st_name, score, date_time, score
101, 张飞, 30, 2007,
101, 张飞, 62, 2015, 30
101, 张飞, 62, 2017, 62
101, 张飞, 62, 2019, 62
101, 张飞, 82, 2020, 62
102, 刘备, 62, 2011,
102, 刘备, 72, 2010, 62
102, 刘备, 92, 2017, 72
102, 刘备, 98, 2020, 92
103, 孙悟空, 97, 2013,
103, 孙悟空, 98, 2010, 97
103, 孙悟空, 98, 2011, 98
103, 孙悟空, 98, 2012, 98
103, 孙悟空, 98, 2014, 98
103, 孙悟空, 99, 2015, 98
103, 孙悟空, 100, 2018, 99
104, 哪吒, 10, 2010,
104, 哪吒, 20, 2011, 10
104, 哪吒, 30, 2012, 20
104, 哪吒, 40, 2013, 30
104, 哪吒, 40, 2015, 40
104, 哪吒, 50, 2014, 40
104, 哪吒, 50, 2016, 50
- select *,lead(score,1) over(partition by id order by score)as score from student;
id, st_name, score, date_time, score
101, 张飞, 30, 2007, 62
101, 张飞, 62, 2015, 62
101, 张飞, 62, 2017, 62
101, 张飞, 62, 2019, 82
101, 张飞, 82, 2020,
102, 刘备, 62, 2011, 72
102, 刘备, 72, 2010, 92
102, 刘备, 92, 2017, 98
102, 刘备, 98, 2020,
103, 孙悟空, 97, 2013, 98
103, 孙悟空, 98, 2010, 98
103, 孙悟空, 98, 2011, 98
103, 孙悟空, 98, 2012, 98
103, 孙悟空, 98, 2014, 99
103, 孙悟空, 99, 2015, 100
103, 孙悟空, 100, 2018,
104, 哪吒, 10, 2010, 20
104, 哪吒, 20, 2011, 30
104, 哪吒, 30, 2012, 40
104, 哪吒, 40, 2013, 40
104, 哪吒, 40, 2015, 50
104, 哪吒, 50, 2014, 50
104, 哪吒, 50, 2016,
(4) Head and tail functions: first_value(), last_value()
- select *,first_value(score) over(partition by id order by score)as score from student;
id, st_name, score, date_time, score
101, 张飞, 30, 2007, 30
101, 张飞, 62, 2015, 30
101, 张飞, 62, 2017, 30
101, 张飞, 62, 2019, 30
101, 张飞, 82, 2020, 30
102, 刘备, 62, 2011, 62
102, 刘备, 72, 2010, 62
102, 刘备, 92, 2017, 62
102, 刘备, 98, 2020, 62
103, 孙悟空, 97, 2013, 97
103, 孙悟空, 98, 2010, 97
103, 孙悟空, 98, 2011, 97
103, 孙悟空, 98, 2012, 97
103, 孙悟空, 98, 2014, 97
103, 孙悟空, 99, 2015, 97
103, 孙悟空, 100, 2018, 97
104, 哪吒, 10, 2010, 10
104, 哪吒, 20, 2011, 10
104, 哪吒, 30, 2012, 10
104, 哪吒, 40, 2013, 10
104, 哪吒, 40, 2015, 10
104, 哪吒, 50, 2014, 10
104, 哪吒, 50, 2016, 10
- select *,last_value(score) over(partition by id order by score)as score from student;
id, st_name, score, date_time, score
101, 张飞, 30, 2007, 30
101, 张飞, 62, 2015, 62
101, 张飞, 62, 2017, 62
101, 张飞, 62, 2019, 62
101, 张飞, 82, 2020, 82
102, 刘备, 62, 2011, 62
102, 刘备, 72, 2010, 72
102, 刘备, 92, 2017, 92
102, 刘备, 98, 2020, 98
103, 孙悟空, 97, 2013, 97
103, 孙悟空, 98, 2010, 98
103, 孙悟空, 98, 2011, 98
103, 孙悟空, 98, 2012, 98
103, 孙悟空, 98, 2014, 98
103, 孙悟空, 99, 2015, 99
103, 孙悟空, 100, 2018, 100
104, 哪吒, 10, 2010, 10
104, 哪吒, 20, 2011, 20
104, 哪吒, 30, 2012, 30
104, 哪吒, 40, 2013, 40
104, 哪吒, 40, 2015, 40
104, 哪吒, 50, 2014, 50
104, 哪吒, 50, 2016, 50