MySQL: Why a large LIMIT offset hurts performance so much

First, the MySQL version:

mysql> select version();
+-----------+
| version() |
+-----------+
| 5.7.17    |
+-----------+
1 row in set (0.00 sec)

Table Structure:

mysql> desc test;
+--------+---------------------+------+-----+---------+----------------+
| Field  | Type                | Null | Key | Default | Extra          |
+--------+---------------------+------+-----+---------+----------------+
| id     | bigint(20) unsigned | NO   | PRI | NULL    | auto_increment |
| val    | int(10) unsigned    | NO   | MUL | 0       |                |
| source | int(10) unsigned    | NO   |     | 0       |                |
+--------+---------------------+------+-----+---------+----------------+
3 rows in set (0.00 sec)

id is the auto-increment primary key, and val has a non-unique index.

A large amount of data was loaded, a little over 5 million rows in total:
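For readers who want to reproduce the setup, the structure shown by desc corresponds roughly to the definition below. This is a reconstruction, not the author's original DDL: the index name val is inferred from the INDEX_NAME values used in the buffer pool queries later in this post, and the InnoDB engine is inferred from the buffer pool discussion.

```sql
-- Reconstructed from the desc output; index name and engine are assumptions.
CREATE TABLE test (
  id     BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
  val    INT(10) UNSIGNED NOT NULL DEFAULT 0,
  source INT(10) UNSIGNED NOT NULL DEFAULT 0,
  PRIMARY KEY (id),   -- clustered index: leaf pages hold the full rows
  KEY val (val)       -- non-unique secondary index: leaf entries hold (val, id)
) ENGINE = InnoDB;
```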

mysql> select count(*) from test;
+----------+
| count(*) |
+----------+
|  5242882 |
+----------+
1 row in set (4.25 sec)

We know that when the offset in limit offset, rows is large, the query becomes inefficient:

mysql> select * from test where val=4 limit 300000,5;
+---------+-----+--------+
| id      | val | source |
+---------+-----+--------+
| 3327622 |   4 |      4 |
| 3327632 |   4 |      4 |
| 3327642 |   4 |      4 |
| 3327652 |   4 |      4 |
| 3327662 |   4 |      4 |
+---------+-----+--------+
5 rows in set (15.98 sec)

To get the same result, we usually rewrite the statement as follows:

mysql> select * from test a inner join (select id from test where val=4 limit 300000,5) b on a.id=b.id;
+---------+-----+--------+---------+
| id      | val | source | id      |
+---------+-----+--------+---------+
| 3327622 |   4 |      4 | 3327622 |
| 3327632 |   4 |      4 | 3327632 |
| 3327642 |   4 |      4 | 3327642 |
| 3327652 |   4 |      4 | 3327652 |
| 3327662 |   4 |      4 | 3327662 |
+---------+-----+--------+---------+
5 rows in set (0.38 sec)

The time difference is obvious: 15.98 seconds versus 0.38 seconds.
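As a small variation on the rewrite above, the sketch below selects only the columns of the outer table, so that id is not returned twice. The ORDER BY id is an addition that does not appear in the original statements. It is there to make the page boundaries deterministic, and it is assumed to be cheap here because entries in the val index with the same val are stored in primary key order.

```sql
-- Variation on the rewritten query; a sketch, not the statement that was timed above.
SELECT a.*
FROM test a
INNER JOIN (
    SELECT id
    FROM test
    WHERE val = 4
    ORDER BY id          -- added for a stable page; not in the original query
    LIMIT 300000, 5
) b ON a.id = b.id;
-- Note: the order of the five returned rows is still not guaranteed
-- unless an ORDER BY is also added to the outer query.
```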

Why do we get the results above? Let's look at how select * from test where val=4 limit 300000,5; is executed:

  • Look up the matching entries in the leaf nodes of the val index.

  • For each entry, use the primary key value stored in it to look up all the required column values in the clustered index.

The process is roughly as shown in the figure below:

[Figure: lookup through the leaf nodes of the val index, then lookup by primary key in the clustered index]

As shown above, the query visits index entries 300,005 times and looks up rows in the clustered index 300,005 times, then discards the first 300,000 results and returns the last 5. MySQL spends a large number of random I/Os on clustered index lookups, and the rows fetched by 300,000 of those random I/Os never appear in the result set.
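One way to look at the two access patterns is to compare execution plans. The statements below are only a sketch. The exact plan depends on the server version and table statistics, so no output is shown.

```sql
-- Slow form: select * needs the source column, which is not in the val index,
-- so every matching index entry leads to a row lookup in the clustered index.
EXPLAIN SELECT * FROM test WHERE val = 4 LIMIT 300000, 5;

-- Subquery of the rewritten form: only id is needed, and InnoDB secondary
-- index entries already carry the primary key, so the val index alone can
-- answer the query (typically reported as "Using index" in the Extra column).
EXPLAIN SELECT id FROM test WHERE val = 4 LIMIT 300000, 5;
```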

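Someone will surely ask: since the index is used from the start, why not first walk along the index leaf nodes to the last five entries that are needed, and only then go to the clustered index to fetch the actual rows? That would take only 5 random I/Os. The process would look like the figure below:

[Figure: scan the index leaf nodes to the last five entries, then look up only those five rows in the clustered index]

In fact, that is exactly the question I wanted to ask.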

Verification

Let's run some actual experiments to confirm the reasoning above.

To confirm that select * from test where val=4 limit 300000,5 scans 300,005 index entries and 300,005 rows in the clustered index, we would need a way to count, within one SQL statement, how many rows MySQL looks up through the index. I first tried the Handler_read_* status variables, but unfortunately none of them met the requirement.

So I could only confirm it indirectly.

InnoDB has a buffer pool, which holds recently accessed pages, both data pages and index pages. So we only need to run the two SQL statements and compare the number of pages in the buffer pool after each one.

The prediction is that after running select * from test a inner join (select id from test where val=4 limit 300000,5) b on a.id=b.id; the number of data pages in the buffer pool is far smaller than after running select * from test where val=4 limit 300000,5;, because the former accesses data pages only 5 times while the latter accesses them 300,005 times.
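For reference, the Handler_read_* counters mentioned above can be inspected per session roughly as follows. This is a minimal sketch. It only shows how to read the counters and does not claim that any of them isolates the clustered index lookups.

```sql
-- Reset the session status counters (requires the RELOAD privilege).
FLUSH STATUS;

-- Run the statement under investigation.
SELECT * FROM test WHERE val = 4 LIMIT 300000, 5;

-- Show the handler read counters accumulated by this session since the reset.
SHOW SESSION STATUS LIKE 'Handler_read%';
```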

select * from test where val=4 limit 300000,5
mysql> select index_name,count(*) from information_schema.INNODB_BUFFER_PAGE where INDEX_NAME in('val','primary') and TABLE_NAME like '%test%' group by index_name;
Empty set (0.04 sec)

As we can see, the buffer pool currently holds no pages of the test table.

mysql> select * from test where val=4 limit 300000,5;
+---------+-----+--------+
| id      | val | source |
+---------+-----+--------+
| 3327622 |   4 |      4 |
| 3327632 |   4 |      4 |
| 3327642 |   4 |      4 |
| 3327652 |   4 |      4 |
| 3327662 |   4 |      4 |
+---------+-----+--------+
5 rows in set (26.19 sec)

mysql> select index_name,count(*) from information_schema.INNODB_BUFFER_PAGE where INDEX_NAME in('val','primary') and TABLE_NAME like '%test%' group by index_name;
+------------+----------+
| index_name | count(*) |
+------------+----------+
| PRIMARY    |     4098 |
| val        |      208 |
+------------+----------+
2 rows in set (0.04 sec)

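As we can see, the buffer pool now holds 4098 data pages and 208 index pages of the test table.

select * from test a inner join (select id from test where val=4 limit 300000,5) b on a.id=b.id

To rule out any influence from the previous experiment, we need to empty the buffer pool by restarting MySQL.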

mysqladmin shutdown
/usr/local/bin/mysqld_safe &
mysql> select index_name,count(*) from information_schema.INNODB_BUFFER_PAGE where INDEX_NAME in('val','primary') and TABLE_NAME like '%test%' group by index_name;
Empty set (0.03 sec)

Run the SQL:

mysql> select * from test a inner join (select id from test where val=4 limit 300000,5) b on a.id=b.id;
+---------+-----+--------+---------+
| id      | val | source | id      |
+---------+-----+--------+---------+
| 3327622 |   4 |      4 | 3327622 |
| 3327632 |   4 |      4 | 3327632 |
| 3327642 |   4 |      4 | 3327642 |
| 3327652 |   4 |      4 | 3327652 |
| 3327662 |   4 |      4 | 3327662 |
+---------+-----+--------+---------+
5 rows in set (0.09 sec)

mysql> select index_name,count(*) from information_schema.INNODB_BUFFER_PAGE where INDEX_NAME in('val','primary') and TABLE_NAME like '%test%' group by index_name;
+------------+----------+
| index_name | count(*) |
+------------+----------+
| PRIMARY    |        5 |
| val        |      390 |
+------------+----------+
2 rows in set (0.03 sec)

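We can clearly see the difference between the two: the first SQL loaded 4098 data pages into the buffer pool, while the second loaded only 5. This matches our prediction, and it confirms why the first SQL is slow: it reads a large number of useless rows (300,000) and then throws them away.

This also causes another problem: loading many pages that are not particularly hot into the buffer pool pollutes it and takes up its space.

Problems encountered

To make sure the buffer pool is empty after every restart, we need to turn off innodb_buffer_pool_dump_at_shutdown and innodb_buffer_pool_load_at_startup. These two options control whether the buffer pool contents are dumped when the database shuts down and whether the dump saved on disk is loaded back when the database starts.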
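A minimal sketch of checking and changing these settings from the client follows. It assumes a MySQL 5.7 server, where innodb_buffer_pool_dump_at_shutdown can be changed at runtime but innodb_buffer_pool_load_at_startup cannot.

```sql
-- Check the current values of the two options.
SHOW GLOBAL VARIABLES
WHERE Variable_name IN ('innodb_buffer_pool_dump_at_shutdown',
                        'innodb_buffer_pool_load_at_startup');

-- The dump option is dynamic, so it can be switched off on a running server.
SET GLOBAL innodb_buffer_pool_dump_at_shutdown = OFF;

-- innodb_buffer_pool_load_at_startup is not dynamic. It has to be set to OFF
-- in the [mysqld] section of the option file (or on the command line)
-- before the server is restarted.
```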

Origin www.cnblogs.com/ldsweely/p/11987968.html