Optimizing Deep Pagination in MySQL with Millions of Rows

Business scenario

In most projects, a large amount of statistical data is reported and analyzed, and the results are then shown in an admin console where operations and product staff browse them page by page, most commonly filtered by date. This kind of data keeps growing over time, so reaching millions or tens of millions of rows is only a matter of time.

Reproducing the bottleneck

I created a user table, added an index on the create_time field, and inserted 1,000,000 (100w) rows.
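The original table definition is not shown, so the following is only a minimal sketch of what such a setup might look like. Only the table name user and the indexed create_time column come from the text; the other columns, the index name idx_create_time, and the data-fill statement are assumptions for illustration. The fill uses a recursive CTE, which requires MySQL 8.0 or later.

```sql
-- Hypothetical schema: only `user` and the indexed `create_time` column
-- are taken from the article; everything else is illustrative.
CREATE TABLE `user` (
  `id`          BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  `name`        VARCHAR(64)     NOT NULL,
  `email`       VARCHAR(128)    NOT NULL,
  `create_time` DATETIME        NOT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_create_time` (`create_time`)
) ENGINE = InnoDB;

-- Raise the recursion limit for this session (the default is 1000).
SET SESSION cte_max_recursion_depth = 2000000;

-- Generate 1,000,000 rows, one per second going back from now.
INSERT INTO `user` (`name`, `email`, `create_time`)
WITH RECURSIVE seq (n) AS (
  SELECT 1
  UNION ALL
  SELECT n + 1 FROM seq WHERE n < 1000000
)
SELECT CONCAT('user_', n),
       CONCAT('user_', n, '@example.com'),
       NOW() - INTERVAL n SECOND
FROM seq;
```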

Here we use LIMIT pagination and compare the query time for the first 5 rows with that for the last 5 rows.

Querying the first few rows takes almost no time.

When we fetch rows from an offset beyond 500,000 (50w+), the query takes 1 second.

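The SQL_NO_CACHE keyword keeps the query result out of the query cache, so each run measures the real query cost. (The query cache was removed in MySQL 8.0, where the keyword has no effect.)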
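The exact statements used for the measurement are not included in the text. As a rough sketch, the two queries being compared probably looked something like the pair below; the ORDER BY create_time clause and the specific offsets are assumptions.

```sql
-- First page: only a handful of rows have to be read.
SELECT SQL_NO_CACHE * FROM `user` ORDER BY `create_time` LIMIT 0, 5;

-- Deep page: MySQL must produce and discard 500,000 rows
-- before it can return the 5 that were asked for.
SELECT SQL_NO_CACHE * FROM `user` ORDER BY `create_time` LIMIT 500000, 5;
```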

The SQL statement is the same and only the paging offset differs, yet the performance gap is this large. As the data grows, later pages will take even longer to query.

Problem analysis

Back-to-table lookup

We generally create indexes on frequently queried fields, and indexes improve query efficiency. The statement above uses SELECT * FROM user, but not every field is indexed. After the matching entries are found in the index, MySQL still has to fetch the non-indexed fields from the data file. This extra step is called a back-to-table lookup.

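Covering index

If the queried fields are exactly the ones in the index, for example SELECT create_time FROM user, the index already contains everything the query needs. There is no need to read the data file, so no back-to-table lookup happens. This is called a covering index.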
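One way to check whether a query is covered is to run it through EXPLAIN. The sketch below assumes a secondary index on create_time and an InnoDB table with primary key id; the ORDER BY and offsets are illustrative. For a covered query, the Extra column of the EXPLAIN output should contain "Using index".

```sql
-- Only the indexed column is selected, so the secondary index alone can
-- answer the query; EXPLAIN should report "Using index" in Extra.
EXPLAIN SELECT `create_time` FROM `user` ORDER BY `create_time` LIMIT 500000, 5;

-- In InnoDB, every secondary index entry also stores the primary key,
-- so selecting `id` as well is still covered by the same index.
EXPLAIN SELECT `id`, `create_time` FROM `user` ORDER BY `create_time` LIMIT 500000, 5;

-- SELECT * needs columns that are not in the index, so it is not covered.
EXPLAIN SELECT * FROM `user` ORDER BY `create_time` LIMIT 500000, 5;
```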

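I/O

A back-to-table lookup is usually an I/O operation: after the row is located through the index, its primary key (or unique index) is used to find the full row in the clustered index. The clustered index is generally stored in data files on disk, so the lookup has to read from disk, and disk I/O is relatively slow.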

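LIMIT 2000,10?

Have you ever wondered whether LIMIT 2000,10 scans rows 1 through 2000? Perhaps, like me, you assumed the data is fetched starting directly from row 2000, and that the earlier rows are never scanned or never looked up in the table. In fact, the full process still queries those rows: MySQL reads the first 2000 rows and discards them, and if the index does not cover the query, each of them also requires a back-to-table lookup.

Now you know why queries get slower the further back you page!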

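Summary of the problem

We now know that LIMIT performs worse the deeper the page, and that the cause is the back-to-table lookups. Having found the problem, we only need to reduce the number of lookups to improve query performance.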

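Solution

Since a covering index avoids back-to-table lookups, we can first query only the primary key id through the index, treat that result as a temporary table, and JOIN it with the original table. Only the 5 rows in the result then need a back-to-table lookup, which greatly reduces I/O.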
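The optimized statement itself is not shown in the text. A minimal sketch of this approach, often called a deferred join, might look like the following; the ORDER BY create_time clause, the offset, and the aliases u and t are assumptions.

```sql
SELECT u.*
FROM `user` AS u
INNER JOIN (
    -- Covered by the index on create_time (which also stores the primary
    -- key in InnoDB): skipping 500,000 entries walks the index only.
    SELECT `id`
    FROM `user`
    ORDER BY `create_time`
    LIMIT 500000, 5
) AS t ON u.`id` = t.`id`
-- Only the 5 matching ids are looked up in the table.
ORDER BY u.`create_time`;
```

The outer ORDER BY is repeated because the join does not guarantee that rows come back in the subquery's order.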

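Performance before and after optimization

Let's look at the results:

  • Before optimization: 1.4s

  • After optimization: 0.2s

Query time drops dramatically. Even at a very deep page, the query is no longer as slow as with a plain LIMIT query.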

Origin juejin.im/post/7230979300828151865