Window functions in MySQL 8.0 are really great
1. Problem description
Recently I have been working on loading all MySQL slow query logs into a database and displaying them in one central place that is open to the business teams, so that developers in each team can review and optimize the slow SQL in their own services. I also added a scheduled report that counts the slow queries of the last one to two weeks, which gives the business teams a more intuitive comparison and shows whether the number of slow queries has recently gone up or down. That led to the following SQL:
select hostname_max , db_max, sum(ts_cnt) as 1W,
(select ifnull(sum(t1.ts_cnt),0) as ts_cnt from global_query_review_history t1 where
t1.hostname_max=t2.hostname_max and t1.ts_min>= date_sub(now(), interval 14 day) and
t1.ts_max<= date_sub(now(), interval 7 day)) AS 2W
from global_query_review_history t2 where
ts_min>= date_sub(now(), interval 7 day)
group by hostname_max, db_max
order by 1W desc limit 20;
The global_query_review_history table currently has about 25,000 rows. This SQL takes 1.16 seconds, which is clearly too slow. Its execution plan is as follows:
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: t2
partitions: NULL
type: ALL
possible_keys: ts_min
key: NULL
key_len: NULL
ref: NULL
rows: 25198
filtered: 41.09
Extra: Using where; Using temporary; Using filesort
*************************** 2. row ***************************
id: 2
select_type: DEPENDENT SUBQUERY
table: t1
partitions: NULL
type: ref
possible_keys: hostname_max,ts_min
key: hostname_max
key_len: 258
ref: func
rows: 20
filtered: 14.90
Extra: Using where
The plan shows a DEPENDENT SUBQUERY: the subquery has to be executed separately and is not automatically rewritten into a JOIN.
Handler status counters after running the SQL:
+-----------------------+--------+
| Variable_name | Value |
+-----------------------+--------+
| Handler_read_first | 0 |
| Handler_read_key | 17328 |
| Handler_read_last | 0 |
| Handler_read_next | 809121 |
| Handler_read_prev | 0 |
| Handler_read_rnd | 0 |
| Handler_read_rnd_next | 25380 |
+-----------------------+--------+
In addition to the full table scan, there are a large number of row-by-row index reads (Handler_read_next = 809121), caused by the subquery.
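To make the cost of a dependent subquery more concrete, here is a toy counting model in Python. It is only a sketch: the sample rows are hypothetical, it assumes the subquery is re-run once per output group and walks every row of that host each time, and it does not reproduce how MySQL actually increments its Handler_read_* counters.

```python
from collections import Counter

# Hypothetical toy data: (hostname, db) pairs standing in for table rows.
rows = [("host-a", "db1")] * 4 + [("host-a", "db2")] * 3 + [("host-b", "db1")] * 2

# Rows reachable through an index on hostname, per host.
rows_per_host = Counter(host for host, _ in rows)

# One output row per (hostname, db) group.
groups = sorted(set(rows))

# Dependent subquery (assumed model): re-run once per group,
# walking every row of that group's host each time.
subquery_reads = sum(rows_per_host[host] for host, _ in groups)

# Single pass: every row is read exactly once.
single_pass_reads = len(rows)

print("reads caused by the subquery:", subquery_reads)
print("reads in a single pass:      ", single_pass_reads)
```

Even with nine rows, the repeated probes read more rows than the table contains, and the gap grows with the number of groups per host.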
2. SQL optimization
The main bottleneck of this SQL is the nested subquery. With the subquery removed, the query is fast even though it still does a full table scan:
mysql> select ...
...
20 rows in set (0.08 sec)
mysql> show status like 'handler%read%';
+-----------------------+-------+
| Variable_name | Value |
+-----------------------+-------+
| Handler_read_first | 0 |
| Handler_read_key | 16910 |
| Handler_read_last | 0 |
| Handler_read_next | 0 |
| Handler_read_prev | 0 |
| Handler_read_rnd | 0 |
| Handler_read_rnd_next | 25380 |
+-----------------------+-------+
When SQL optimization gets difficult, the first person I think of is Mr. Songhua. After learning that I was using MySQL 8.0, he helped me rewrite the query using a window function:
select hostname_max , db_max,
sum( case when ts_min>= date_sub(now(), interval 7 day) then ts_cnt end ) as 1W,
ifnull(sum(case when ts_min>= date_sub(now(), interval 14 day)
and ts_max<= date_sub(now(), interval 7 day) then ts_cnt end ) over(partition by hostname_max),0) 2W
from global_query_review_history t2
where ts_min>= date_sub(now(), interval 14 day)
group by hostname_max, db_max
order by 1W desc limit 20;
Look at the execution plan again:
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t2
partitions: NULL
type: ALL
possible_keys: ts_min
key: NULL
key_len: NULL
ref: NULL
rows: 25198
filtered: 44.88
Extra: Using where; Using temporary; Using filesort
The new SQL is smarter: it reads the data only once and uses a window function to compute the required statistics directly. Although an index is available, the amount of data to scan is relatively large, so the optimizer still chooses a full table scan. The elapsed time and status counters of the new SQL are as follows:
20 rows in set (0.08 sec)
mysql> show status like 'handler%read%';
+-----------------------+-------+
| Variable_name | Value |
+-----------------------+-------+
| Handler_read_first | 0 |
| Handler_read_key | 24396 |
| Handler_read_last | 0 |
| Handler_read_next | 0 |
| Handler_read_prev | 0 |
| Handler_read_rnd | 886 |
| Handler_read_rnd_next | 26703 |
+-----------------------+-------+
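The gap with the original SQL is large: execution time drops from 1.16 seconds to 0.08 seconds, and Handler_read_next drops from 809121 to 0. The optimization is clearly effective.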
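As a rough way to check that a single-pass rewrite of this kind returns the same numbers as the correlated subquery, the sketch below runs both shapes on a tiny dataset using Python's built-in sqlite3 module. It assumes a bundled SQLite of 3.25 or newer, the first release with window functions. This is not the author's MySQL setup: the table name history, the sample rows, and the fixed cut-off strings standing in for date_sub(now(), ...) are all invented for illustration. The window function is also applied on top of a grouped derived table, rather than written directly in the grouped select list as in the article.

```python
import sqlite3

# Fixed cut-offs standing in for date_sub(now(), interval 7 / 14 day).
D7 = "2024-01-08 00:00:00"
D14 = "2024-01-01 00:00:00"

# Hypothetical rows: hostname_max, db_max, ts_min, ts_max, ts_cnt
ROWS = [
    ("host-a", "db1", "2024-01-02 10:00:00", "2024-01-02 11:00:00", 5),   # previous week
    ("host-a", "db2", "2024-01-03 10:00:00", "2024-01-03 11:00:00", 7),   # previous week
    ("host-a", "db3", "2024-01-04 10:00:00", "2024-01-04 11:00:00", 2),   # previous week only
    ("host-a", "db1", "2024-01-09 10:00:00", "2024-01-09 11:00:00", 3),   # last week
    ("host-a", "db2", "2024-01-10 10:00:00", "2024-01-10 11:00:00", 4),   # last week
    ("host-b", "db1", "2024-01-11 10:00:00", "2024-01-11 11:00:00", 10),  # last week
    ("host-b", "db1", "2023-12-20 10:00:00", "2023-12-20 11:00:00", 99),  # older than 14 days
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table history "
    "(hostname_max text, db_max text, ts_min text, ts_max text, ts_cnt integer)"
)
conn.executemany("insert into history values (?, ?, ?, ?, ?)", ROWS)
params = {"d7": D7, "d14": D14}

# Shape 1: correlated subquery, evaluated against the outer hostname.
CORRELATED = """
select hostname_max, db_max, sum(ts_cnt) as w1,
       (select ifnull(sum(t1.ts_cnt), 0) from history t1
         where t1.hostname_max = t2.hostname_max
           and t1.ts_min >= :d14 and t1.ts_max <= :d7) as w2_host
  from history t2
 where ts_min >= :d7
 group by hostname_max, db_max
 order by w1 desc, hostname_max, db_max
"""

# Shape 2: one pass over 14 days of data, then a window sum per host.
WINDOWED = """
select hostname_max, db_max, w1,
       ifnull(sum(w2) over (partition by hostname_max), 0) as w2_host
  from (select hostname_max, db_max,
               sum(case when ts_min >= :d7 then ts_cnt end) as w1,
               sum(case when ts_min >= :d14 and ts_max <= :d7 then ts_cnt end) as w2
          from history
         where ts_min >= :d14
         group by hostname_max, db_max) g
 order by w1 desc, hostname_max, db_max
"""

correlated = conn.execute(CORRELATED, params).fetchall()
windowed = conn.execute(WINDOWED, params).fetchall()

for row in correlated:
    print("correlated:", row)
for row in windowed:
    print("windowed:  ", row)
```

One difference shows up in the output: the single-pass form also returns (hostname, db) groups that only have data in the earlier week, with a NULL last-week count. With a descending sort on that count, such rows come last.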
Enjoy MySQL 8.0 :)