Is MySQL's LIMIT performance really that poor? Is it unusable?

First, the MySQL version used:

mysql> select version();
+-----------+
| version() |
+-----------+
| 5.7.17    |
+-----------+
1 row in set (0.00 sec)

Table Structure:

mysql> desc test;
+--------+---------------------+------+-----+---------+----------------+
| Field  | Type                | Null | Key | Default | Extra          |
+--------+---------------------+------+-----+---------+----------------+
| id     | bigint(20) unsigned | NO   | PRI | NULL    | auto_increment |
| val    | int(10) unsigned    | NO   | MUL | 0       |                |
| source | int(10) unsigned    | NO   |     | 0       |                |
+--------+---------------------+------+-----+---------+----------------+
3 rows in set (0.00 sec)

id is an auto-increment primary key, and val has a non-unique index.
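
For readers who want to reproduce the test, here is a minimal sketch of a table definition that matches the DESC output above. The storage engine clause is an assumption on my part (the rest of the article relies on InnoDB); the index name val is taken from the INNODB_BUFFER_PAGE queries used later.

```sql
-- Reconstructed from the DESC output; ENGINE=InnoDB is an assumption.
CREATE TABLE test (
  id     BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
  val    INT(10) UNSIGNED    NOT NULL DEFAULT 0,
  source INT(10) UNSIGNED    NOT NULL DEFAULT 0,
  PRIMARY KEY (id),   -- clustered index in InnoDB
  KEY val (val)       -- non-unique secondary index
) ENGINE=InnoDB;
```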

Load a large amount of data, about 5 million rows in total:

mysql> select count(*) from test;
+----------+
| count(*) |
+----------+
|  5242882 |
+----------+
1 row in set (4.25 sec)

We know that when the offset in LIMIT offset, rows is large, the query becomes inefficient:

mysql> select * from test where val=4 limit 300000,5;
+---------+-----+--------+
| id      | val | source |
+---------+-----+--------+
| 3327622 |   4 |      4 |
| 3327632 |   4 |      4 |
| 3327642 |   4 |      4 |
| 3327652 |   4 |      4 |
| 3327662 |   4 |      4 |
+---------+-----+--------+
5 rows in set (15.98 sec)

To get the same rows faster, the statement is usually rewritten as follows:

mysql> select * from test a inner join (select id from test where val=4 limit 300000,5) b on a.id=b.id;
+---------+-----+--------+---------+
| id      | val | source | id      |
+---------+-----+--------+---------+
| 3327622 |   4 |      4 | 3327622 |
| 3327632 |   4 |      4 | 3327632 |
| 3327642 |   4 |      4 | 3327642 |
| 3327652 |   4 |      4 | 3327652 |
| 3327662 |   4 |      4 | 3327662 |
+---------+-----+--------+---------+
5 rows in set (0.38 sec)

The time difference is obvious: 15.98 sec versus 0.38 sec.
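
Note that the rewritten statement returns the id column twice, once from each side of the join. A small variation, sketched here under the same table and index as above, selects only the columns of the outer table so the result has the same shape as the original query. The aliases a and b are the ones used in the article.

```sql
-- Same deferred-join idea, but return only the outer table's columns
-- so that id is not duplicated in the result set.
SELECT a.*
FROM test AS a
INNER JOIN (
    -- Only id is needed here, and the val index already stores it.
    SELECT id FROM test WHERE val = 4 LIMIT 300000, 5
) AS b ON a.id = b.id;
```

As in the original statements there is no ORDER BY, so strictly speaking the row order is whatever the chosen access path produces.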

Why does this happen? Let's look at how select * from test where val=4 limit 300000,5; is executed:

  • Read the leaf nodes of the val index.
  • Using the primary key value stored in each leaf node, look up the clustered index to fetch all the column values the query needs.
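
One hedged way to look at the two access paths is EXPLAIN. The sketch below only shows the statements; the plan you actually get depends on the server version and the table statistics, so I am not reproducing any output here.

```sql
-- Original form: all columns are selected, so each matching entry in the
-- val index needs a further lookup in the clustered index.
EXPLAIN SELECT * FROM test WHERE val = 4 LIMIT 300000, 5;

-- Rewritten form: the inner query needs only id, which the secondary
-- index already contains. In EXPLAIN output, "Using index" in the Extra
-- column is how MySQL marks a query answered from the index alone.
EXPLAIN SELECT *
FROM test a
INNER JOIN (SELECT id FROM test WHERE val = 4 LIMIT 300000, 5) b
        ON a.id = b.id;
```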

Similar to the picture below:

image

As shown above, the index leaf nodes are read 300,005 times and the clustered index is looked up 300,005 times; the first 300,000 rows are then discarded and only the last 5 are returned. MySQL spends a large amount of random I/O on reading clustered index data, and the rows fetched by 300,000 of those random I/Os never appear in the result set.

Someone will certainly ask: since the index is used from the start, why not first walk along the index leaf nodes to the last 5 entries that are needed, and only then look up the actual rows in the clustered index? That would require only 5 random I/Os, similar to the process in the following picture:

image

In fact, I would like to ask the same question.

Verification

Let's run an experiment to confirm the reasoning above.

To confirm that select * from test where val=4 limit 300000,5 scans 300,005 index nodes and 300,005 data nodes on the clustered index, we need to know whether MySQL offers a way to count, within a single SQL statement, how many times data nodes are looked up through index nodes. I first tried the Handler_read_* status variables, but unfortunately none of them provides this count.
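
For reference, this is roughly how the Handler_read_* counters can be inspected for a single statement. It is a sketch of the general procedure, not a record of my exact session, and no output is shown.

```sql
-- Reset the session status counters (requires the RELOAD privilege).
FLUSH STATUS;

SELECT * FROM test WHERE val = 4 LIMIT 300000, 5;

-- The counters report row reads requested from the storage engine,
-- but they do not separate reads of the val index from lookups
-- in the clustered index, which is the number we are after.
SHOW SESSION STATUS LIKE 'Handler_read%';
```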

So I can only confirm it indirectly:

InnoDB has a buffer pool that holds recently accessed pages, both data pages and index pages. So we run the two SQL statements and compare the number of pages in the buffer pool afterwards. The prediction is that after running select * from test a inner join (select id from test where val=4 limit 300000,5) b on a.id=b.id; the number of data pages in the buffer pool is far smaller than after running select * from test where val=4 limit 300000,5; because the former accesses data pages only 5 times, while the latter accesses them 300,005 times.

Before the first query is run, the buffer pool contains no pages of the test table (the INNODB_BUFFER_PAGE query used in the following steps returns an empty set at this point).

mysql> select * from test where val=4 limit 300000,5;
+---------+-----+--------+
| id      | val | source |
+---------+-----+--------+
| 3327622 |   4 |      4 |
| 3327632 |   4 |      4 |
| 3327642 |   4 |      4 |
| 3327652 |   4 |      4 |
| 3327662 |   4 |      4 |
+---------+-----+--------+
5 rows in set (26.19 sec)

mysql> select index_name,count(*) from information_schema.INNODB_BUFFER_PAGE where INDEX_NAME in('val','primary') and TABLE_NAME like '%test%' group by index_name;
+------------+----------+
| index_name | count(*) |
+------------+----------+
| PRIMARY    |     4098 |
| val        |      208 |
+------------+----------+
2 rows in set (0.04 sec)

It can be seen that the buffer pool now holds 4,098 data pages and 208 index pages for the test table.
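
The LIKE '%test%' filter also matches any other table whose name contains "test". A slightly stricter variant is sketched below. The schema name mydb is hypothetical, and the backtick-quoted `schema`.`table` form is how I understand TABLE_NAME to be stored in this view on 5.7, so treat both as assumptions and check them against your own server. Querying INNODB_BUFFER_PAGE scans the whole buffer pool, so it is best kept to a test instance.

```sql
-- Count buffered B-tree pages per index for one specific table.
-- 'mydb' is a placeholder schema name, not taken from the article.
SELECT index_name,
       COUNT(*)            AS pages,
       SUM(number_records) AS records
FROM information_schema.INNODB_BUFFER_PAGE
WHERE table_name = '`mydb`.`test`'
  AND page_type = 'INDEX'   -- B-tree pages only
GROUP BY index_name;
```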

Next comes select * from test a inner join (select id from test where val=4 limit 300000,5) b on a.id=b.id;. To rule out any influence from the previous test, we need to empty the buffer pool, so we restart MySQL:

mysqladmin shutdown
/usr/local/bin/mysqld_safe &
mysql> select index_name,count(*) from information_schema.INNODB_BUFFER_PAGE where INDEX_NAME in('val','primary') and TABLE_NAME like '%test%' group by index_name;
Empty set (0.03 sec)

Run the SQL:

mysql> select * from test a inner join (select id from test where val=4 limit 300000,5) b on a.id=b.id;
+---------+-----+--------+---------+
| id      | val | source | id      |
+---------+-----+--------+---------+
| 3327622 |   4 |      4 | 3327622 |
| 3327632 |   4 |      4 | 3327632 |
| 3327642 |   4 |      4 | 3327642 |
| 3327652 |   4 |      4 | 3327652 |
| 3327662 |   4 |      4 | 3327662 |
+---------+-----+--------+---------+
5 rows in set (0.09 sec)

mysql> select index_name,count(*) from information_schema.INNODB_BUFFER_PAGE where INDEX_NAME in('val','primary') and TABLE_NAME like '%test%' group by index_name;
+------------+----------+
| index_name | count(*) |
+------------+----------+
| PRIMARY    |        5 |
| val        |      390 |
+------------+----------+
2 rows in set (0.03 sec)

We can clearly see the difference between the two: the first SQL loads 4,098 data pages into the buffer pool, while the second loads only 5. This matches our prediction. It also confirms why the first SQL is slow: it reads a large number of useless rows (300,000) and then discards them.

This also causes another problem: loading many data pages that are not actually hot into the buffer pool pollutes it and takes up buffer pool space.

Problems encountered

To make sure the buffer pool is empty after every restart, we need to disable innodb_buffer_pool_dump_at_shutdown and innodb_buffer_pool_load_at_startup. These two options control, respectively, whether the buffer pool contents are dumped when the database shuts down and whether the dump saved on disk is loaded back into the buffer pool when the database starts.
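
A minimal sketch of how these two options can be checked and turned off on 5.7. The dump option can be changed at runtime, while the load option is read at startup only, so it has to go into the option file; the [mysqld] fragment in the comments is the usual place, and its exact file location depends on your installation.

```sql
-- Check the current values.
SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool_dump_at_shutdown';
SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool_load_at_startup';

-- Dynamic: stop dumping the buffer pool at the next shutdown.
SET GLOBAL innodb_buffer_pool_dump_at_shutdown = OFF;

-- Not dynamic: innodb_buffer_pool_load_at_startup must be set in the
-- option file (or on the command line) before the server starts:
--   [mysqld]
--   innodb_buffer_pool_dump_at_shutdown = OFF
--   innodb_buffer_pool_load_at_startup  = OFF
```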

Origin www.cnblogs.com/CQqfjy/p/12717514.html