An example of optimizing SQL with MySQL 8.0 window functions

MySQL 8.0's window functions are really great

1. Problem description

Recently I have been working on writing all MySQL slow query logs into a database and displaying them in one central place that is open to the business departments. Colleagues there can then easily review and optimize the slow SQL in their own services. I also added a periodic report that counts how the number of slow queries changed over the last one to two weeks. It gives business colleagues a more intuitive comparison of whether slow queries have recently increased or decreased. That is where the following SQL came from:

select hostname_max , db_max, sum(ts_cnt) as 1W,
(select ifnull(sum(t1.ts_cnt),0) as ts_cnt from global_query_review_history t1 where 
t1.hostname_max=t2.hostname_max and t1.ts_min>= date_sub(now(), interval 14 day) and 
t1.ts_max<= date_sub(now(), interval 7 day)) AS 2W 
from global_query_review_history t2 where 
ts_min>= date_sub(now(), interval 7 day) 
group by hostname_max, db_max 
order by 1W desc limit 20;
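You can experiment with the query logic without a MySQL server. Below is a minimal sketch that uses Python's built-in sqlite3 module. Several things in it are assumptions and do not come from the article:

- the table holds only the five columns the query uses;
- the rows are hypothetical toy data;
- SQLite's datetime(:now, '-7 days') stands in for MySQL's date_sub(now(), interval 7 day);
- "now" is pinned to a fixed timestamp so the output is reproducible.

The aliases w1/w2 replace 1W/2W because SQLite does not accept unquoted identifiers that start with a digit.

```python
import sqlite3

# Fixed reference time instead of now(), so results are reproducible (assumption).
NOW = "2021-01-15 00:00:00"

# Hypothetical toy rows: (hostname_max, db_max, ts_cnt, ts_min, ts_max).
# ISO-8601 strings compare correctly against SQLite's datetime() output.
ROWS = [
    ("host-a", "shop", 5, "2021-01-12 10:00:00", "2021-01-12 11:00:00"),  # last 7 days
    ("host-a", "shop", 3, "2021-01-04 10:00:00", "2021-01-05 10:00:00"),  # 8-14 days ago
    ("host-a", "crm", 2, "2021-01-10 09:00:00", "2021-01-10 09:30:00"),   # last 7 days
    ("host-b", "blog", 7, "2021-01-14 08:00:00", "2021-01-14 08:10:00"),  # last 7 days
    ("host-b", "blog", 4, "2020-12-20 08:00:00", "2020-12-21 08:00:00"),  # older than 14 days
]

# The correlated-subquery form above, translated to SQLite date syntax.
SUBQUERY_SQL = """
SELECT hostname_max, db_max, SUM(ts_cnt) AS w1,
       (SELECT IFNULL(SUM(t1.ts_cnt), 0)
          FROM global_query_review_history t1
         WHERE t1.hostname_max = t2.hostname_max
           AND t1.ts_min >= datetime(:now, '-14 days')
           AND t1.ts_max <= datetime(:now, '-7 days')) AS w2
  FROM global_query_review_history t2
 WHERE ts_min >= datetime(:now, '-7 days')
 GROUP BY hostname_max, db_max
 ORDER BY w1 DESC
 LIMIT 20
"""


def run_subquery_version(now=NOW):
    """Load the toy rows into an in-memory table and run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE global_query_review_history ("
            "hostname_max TEXT, db_max TEXT, ts_cnt INTEGER, ts_min TEXT, ts_max TEXT)"
        )
        conn.executemany(
            "INSERT INTO global_query_review_history VALUES (?, ?, ?, ?, ?)", ROWS
        )
        return conn.execute(SUBQUERY_SQL, {"now": now}).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in run_subquery_version():
        print(row)
```

On this toy data the query returns ('host-b', 'blog', 7, 0), then ('host-a', 'shop', 5, 3), then ('host-a', 'crm', 2, 3). The 2W value is a per-host total, because the subquery correlates only on hostname_max.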

The current global_query_review_history table has about 25,000 records. This SQL takes 1.16 seconds, which is obviously too slow. The following is the SQL execution plan:

*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: t2
   partitions: NULL
         type: ALL
possible_keys: ts_min
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 25198
     filtered: 41.09
        Extra: Using where; Using temporary; Using filesort
*************************** 2. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: t1
   partitions: NULL
         type: ref
possible_keys: hostname_max,ts_min
          key: hostname_max
      key_len: 258
          ref: func
         rows: 20
     filtered: 14.90
        Extra: Using where

The plan shows a DEPENDENT SUBQUERY. The correlated subquery is re-executed for rows of the outer query, and the optimizer cannot automatically convert it into a JOIN.

Handler status counters after running the SQL:

+-----------------------+--------+
| Variable_name         | Value  |
+-----------------------+--------+
| Handler_read_first    | 0      |
| Handler_read_key      | 17328  |
| Handler_read_last     | 0      |
| Handler_read_next     | 809121 |
| Handler_read_prev     | 0      |
| Handler_read_rnd      | 0      |
| Handler_read_rnd_next | 25380  |
+-----------------------+--------+

Besides the full table scan (Handler_read_rnd_next = 25380), there is a large number of row-by-row index reads (Handler_read_next = 809121). These are caused by the subquery.
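The simplified model below gives a rough idea of why Handler_read_next grows so large. It rests on assumptions and is not MySQL's actual execution logic:

- the dependent subquery runs once per (hostname_max, db_max) group;
- each run looks up the hostname_max index and then reads every index entry for that host.

Under this model the work grows with (number of groups) times (rows per host). A single scan reads each row only once.

```python
from collections import Counter


def modeled_index_reads(rows):
    """Simplified, hypothetical cost model (not MySQL's real execution logic).

    rows: every (hostname_max, db_max) pair in the table.
    Assumes the dependent subquery runs once per distinct (hostname_max, db_max)
    group and, through the hostname_max index, reads all rows of that host.
    Returns (reads_with_dependent_subquery, reads_with_single_scan).
    """
    rows_per_host = Counter(host for host, _db in rows)
    groups = set(rows)  # one subquery execution per distinct group
    correlated = sum(rows_per_host[host] for host, _db in groups)
    single_scan = len(rows)  # one pass touches each row once
    return correlated, single_scan


if __name__ == "__main__":
    # Hypothetical table: one host with 50 schemas, 20 rows per schema.
    rows = [("host-a", f"db{i}") for i in range(50) for _ in range(20)]
    print(modeled_index_reads(rows))  # (50000, 1000)
```

Take one host with 50 schemas and 20 rows per schema. The model then gives 50,000 index reads against 1,000 for a single scan. Real counts depend on how often MySQL actually re-evaluates the subquery.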

2. SQL optimization

The main bottleneck of the SQL above is the nested subquery. With the subquery removed, the query is fast even with a full table scan:

mysql> select ...
...
20 rows in set (0.08 sec)

mysql> show status like 'handler%read%';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Handler_read_first    | 0     |
| Handler_read_key      | 16910 |
| Handler_read_last     | 0     |
| Handler_read_next     | 0     |
| Handler_read_prev     | 0     |
| Handler_read_rnd      | 0     |
| Handler_read_rnd_next | 25380 |
+-----------------------+-------+

For a hard SQL optimization problem, my first thought was naturally Mr. Songhua. When he learned that I was on MySQL 8.0, he helped rewrite the query using window functions:

select hostname_max , db_max,
sum( case when ts_min>= date_sub(now(), interval 7 day)  then ts_cnt end ) as 1W,
ifnull(sum(case when  ts_min>= date_sub(now(), interval 14 day)
   and ts_max<= date_sub(now(), interval 7 day) then ts_cnt end ) over(partition by hostname_max),0) 2W
from global_query_review_history t2
 where ts_min>= date_sub(now(), interval 14 day)
group by hostname_max, db_max
order by 1W desc limit 20;
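Here is a minimal sketch of the window-function rewrite, again run in Python's sqlite3 module (SQLite 3.25 or later supports window functions). It uses the same assumptions as a toy setup: a hypothetical five-column table, made-up rows, datetime(:now, ...) in place of date_sub(now(), ...), and a fixed "now".

One deliberate adaptation: window functions are evaluated after GROUP BY, so this sketch wraps the CASE expression in an inner SUM() before applying SUM(...) OVER (PARTITION BY hostname_max). It is therefore not the query above verbatim.

```python
import sqlite3

# Fixed reference time instead of now(), so results are reproducible (assumption).
NOW = "2021-01-15 00:00:00"

# Hypothetical toy rows: (hostname_max, db_max, ts_cnt, ts_min, ts_max).
ROWS = [
    ("host-a", "shop", 5, "2021-01-12 10:00:00", "2021-01-12 11:00:00"),  # last 7 days
    ("host-a", "shop", 3, "2021-01-04 10:00:00", "2021-01-05 10:00:00"),  # 8-14 days ago
    ("host-a", "crm", 2, "2021-01-10 09:00:00", "2021-01-10 09:30:00"),   # last 7 days
    ("host-b", "blog", 7, "2021-01-14 08:00:00", "2021-01-14 08:10:00"),  # last 7 days
    ("host-b", "blog", 4, "2020-12-20 08:00:00", "2020-12-21 08:00:00"),  # older than 14 days
]

# One scan of the last 14 days: a conditional SUM gives 1W per (host, db);
# the inner SUM gives each group's 2W share, and the windowed SUM adds the
# shares up per hostname_max.
WINDOW_SQL = """
SELECT hostname_max, db_max,
       SUM(CASE WHEN ts_min >= datetime(:now, '-7 days') THEN ts_cnt END) AS w1,
       IFNULL(SUM(SUM(CASE WHEN ts_min >= datetime(:now, '-14 days')
                            AND ts_max <= datetime(:now, '-7 days')
                           THEN ts_cnt END))
              OVER (PARTITION BY hostname_max), 0) AS w2
  FROM global_query_review_history
 WHERE ts_min >= datetime(:now, '-14 days')
 GROUP BY hostname_max, db_max
 ORDER BY w1 DESC
 LIMIT 20
"""


def run_window_version(now=NOW):
    """Load the toy rows into an in-memory table and run the rewrite."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE global_query_review_history ("
            "hostname_max TEXT, db_max TEXT, ts_cnt INTEGER, ts_min TEXT, ts_max TEXT)"
        )
        conn.executemany(
            "INSERT INTO global_query_review_history VALUES (?, ?, ?, ?, ?)", ROWS
        )
        return conn.execute(WINDOW_SQL, {"now": now}).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in run_window_version():
        print(row)
```

On this toy data the result is ('host-b', 'blog', 7, 0), then ('host-a', 'shop', 5, 3), then ('host-a', 'crm', 2, 3). This matches what the correlated-subquery form produces by hand.

There is one behavioral difference from the correlated version. A (hostname_max, db_max) pair that only has rows 8 to 14 days old still appears here, with a NULL w1. A descending ORDER BY places NULLs last, so such a pair only shows up when fewer than 20 groups have a non-NULL w1.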

The execution plan now looks like this:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t2
   partitions: NULL
         type: ALL
possible_keys: ts_min
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 25198
     filtered: 44.88
        Extra: Using where; Using temporary; Using filesort

The new SQL is cleverer: it reads the data only once and uses the window function to compute the required statistics directly. An index on ts_min is available, but the amount of data to be scanned is relatively large, so the optimizer still chooses a full table scan. The new SQL's execution time and status counters are as follows:
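The "read the data only once" idea can also be shown without SQL. The hypothetical Python sketch below makes a single pass over in-memory rows. In that pass it accumulates the 1W total per (hostname_max, db_max) and the 2W total per hostname_max, which is what the conditional SUM and the PARTITION BY hostname_max window compute together.

It illustrates the technique only and is not how MySQL executes the query. It keeps only groups that have rows in the last 7 days, like the original subquery version.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def weekly_report(rows, now, limit=20):
    """Hypothetical single-pass version of the 1W/2W report.

    rows: iterable of (hostname_max, db_max, ts_cnt, ts_min, ts_max) tuples
    with datetime timestamps. Returns (host, db, 1W, 2W) tuples ordered by
    1W descending, for groups that have at least one row in the last 7 days.
    """
    week1_start = now - timedelta(days=7)
    week2_start = now - timedelta(days=14)
    w1 = defaultdict(int)  # (host, db) -> ts_cnt with ts_min in the last 7 days
    w2 = defaultdict(int)  # host -> ts_cnt for rows 8 to 14 days old
    for host, db, cnt, ts_min, ts_max in rows:
        if ts_min < week2_start:
            continue  # same filter as WHERE ts_min >= now - 14 days
        if ts_min >= week1_start:
            w1[(host, db)] += cnt  # conditional SUM per (host, db) group
        if ts_max <= week1_start:
            w2[host] += cnt  # one total per host, like PARTITION BY hostname_max
    ranked = sorted(w1.items(), key=lambda kv: kv[1], reverse=True)[:limit]
    return [(host, db, total, w2.get(host, 0)) for (host, db), total in ranked]


if __name__ == "__main__":
    now = datetime(2021, 1, 15)
    rows = [
        ("host-a", "shop", 5, datetime(2021, 1, 12, 10), datetime(2021, 1, 12, 11)),
        ("host-a", "shop", 3, datetime(2021, 1, 4, 10), datetime(2021, 1, 5, 10)),
        ("host-a", "crm", 2, datetime(2021, 1, 10, 9), datetime(2021, 1, 10, 9, 30)),
        ("host-b", "blog", 7, datetime(2021, 1, 14, 8), datetime(2021, 1, 14, 8, 10)),
        ("host-b", "blog", 4, datetime(2020, 12, 20, 8), datetime(2020, 12, 21, 8)),
    ]
    print(weekly_report(rows, now))
```

Each row is visited exactly once, whatever the number of groups per host. That is the property that removes the repeated index reads of the dependent subquery.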

20 rows in set (0.08 sec)

mysql> show status like 'handler%read%';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Handler_read_first    | 0     |
| Handler_read_key      | 24396 |
| Handler_read_last     | 0     |
| Handler_read_next     | 0     |
| Handler_read_prev     | 0     |
| Handler_read_rnd      | 886   |
| Handler_read_rnd_next | 26703 |
+-----------------------+-------+

Compared with the original SQL (1.16 sec, Handler_read_next = 809121), the rewrite runs in 0.08 sec with Handler_read_next = 0. The optimization works extremely well.

That's all.

Enjoy MySQL 8.0 :)

Origin blog.csdn.net/n88Lpo/article/details/111939617