Why LIMIT queries are slow and how to optimize them [repost]

select * from table where status = xx limit 10 offset 100000;

In a pagination scenario, this kind of limit query can be very slow even when an index exists. With only around 100,000 rows of data, it takes about 2-3 seconds on a single machine.

Problem analysis

Index

We know that MySQL indexes are B+ trees. With a plain binary tree, there is no way to know how the first 100 entries are distributed across the tree, so the binary-search property cannot be used to find them. In a B+ tree, the leaf nodes form an ordered linked list, so the 100th entry can be found in O(n) time. But even O(n) should not be this slow, so is there another reason?

After looking into it, the author found that InnoDB indexes come in two kinds:

  • Clustered index: the primary key index, stored together with the actual row data. Its leaf nodes are the data nodes, so finding the index entry finds the row.
  • Secondary index: can be thought of as a second-level index. Its leaf nodes are still index entries that hold the primary key id, so the row data has to be looked up again through the clustered index.

Because of MySQL's layering, even though the first 100,000 rows will be thrown away, MySQL still takes each primary key id from the secondary index and goes back to the clustered index to fetch the row. That is roughly 100,000 random I/Os, so the query is naturally painfully slow.
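The cost described above can be illustrated with a toy model. This is only a sketch, not InnoDB: the "clustered index" is a dict keyed by id, the "secondary index" is a sorted list of (status, id) pairs, and the names `build_table` and `offset_query` are hypothetical. It counts how many clustered-index lookups a plain offset query performs.

```python
# Toy model (an assumption for illustration, not InnoDB internals).

def build_table(n_rows):
    """Return (clustered, secondary) for n_rows rows that all have status 1."""
    clustered = {
        i: {"id": i, "status": 1, "payload": f"row-{i}"}
        for i in range(1, n_rows + 1)
    }
    # Secondary index entries carry only the indexed column and the primary key.
    secondary = sorted((row["status"], row["id"]) for row in clustered.values())
    return clustered, secondary


def offset_query(clustered, secondary, status, limit, offset):
    """Fetch the full row for every matching index entry, then drop `offset` rows.

    Returns (page_rows, clustered_lookups).
    """
    lookups = 0
    rows = []
    for entry_status, pk in secondary:
        if entry_status != status:
            continue
        rows.append(clustered[pk])  # one clustered-index lookup per entry
        lookups += 1
        if len(rows) == offset + limit:
            break
    # The first `offset` rows were fetched in full only to be thrown away.
    return rows[offset:], lookups


if __name__ == "__main__":
    clustered, secondary = build_table(1000)
    page, lookups = offset_query(clustered, secondary, status=1, limit=10, offset=500)
    print(len(page), lookups)
```

In this model the number of lookups is always offset + limit, even though only limit rows are returned.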

Layering

Before going further, one concept needs explaining: logical operators. Here is a brief look at some of the operators in a logical query plan:

  • DataSource: the data source, i.e. the table in the SQL statement, such as table1 in select name from table1
  • Join: a join, such as select * from table1, table2 where table1.name = table2.name, which joins the two tables. The simplest join condition is an equi-join; there are also inner join, left join, right join, and so on.
  • Selection: filtering, such as the where condition in select name from table1 where id = 1
  • Aggregation: grouping, such as the group by in select sum(score) from table1 group by name. Rows are grouped by certain columns, and aggregate operations such as Max, Min, Sum, Count, and Average can then be applied to each group.
  • Projection: projection, i.e. the columns being selected, such as the column name in select name from table1 where id = 1
  • Sort: sorting, such as the order by in select * from table1 order by id. Unordered data goes into this operator and ordered data comes out.
  • Apply: a subquery, such as (select id,name from table1) as t in select * from (select id,name from table1) as t. This allows nested queries.

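Selection, projection, and join are the most basic operators. Join comes in several forms, including inner join and left and right outer joins.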

select b from t1, t2 where t1.c = t2.c and t1.a > 5

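After conversion to a logical query plan:

  • t1 and t2 each get a DataSource, which is responsible for pulling the data up
  • A Join operator sits on top of them and joins the results of the two tables on t1.c = t2.c
  • A Selection then filters on t1.a > 5
  • Finally, column b is projected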
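The operator pipeline just listed can be sketched with Python generators. This is a minimal illustration, not how MySQL implements its plan: the function names are hypothetical, the join is a naive nested loop, and column b is assumed to live in t2 because the query does not say which table it belongs to.

```python
# Each logical operator is a generator that consumes rows from the operator
# below it. All names are hypothetical and chosen only for this sketch.

def data_source(rows):
    """DataSource: pull rows up from a table (here, a list of dicts)."""
    for row in rows:
        yield dict(row)


def join(left, right, key):
    """Join: naive nested-loop equi-join on `key`."""
    right_rows = list(right)
    for l_row in left:
        for r_row in right_rows:
            if l_row[key] == r_row[key]:
                yield {"t1": l_row, "t2": r_row}


def selection(rows, predicate):
    """Selection: keep only rows that satisfy the predicate."""
    for row in rows:
        if predicate(row):
            yield row


def projection(rows, pick):
    """Projection: keep only the requested column."""
    for row in rows:
        yield pick(row)


def run_plan(t1, t2):
    """select b from t1, t2 where t1.c = t2.c and t1.a > 5 (b assumed in t2)."""
    joined = join(data_source(t1), data_source(t2), "c")
    filtered = selection(joined, lambda r: r["t1"]["a"] > 5)
    return list(projection(filtered, lambda r: r["t2"]["b"]))


T1 = [{"a": 3, "c": 1}, {"a": 7, "c": 2}, {"a": 9, "c": 3}]
T2 = [{"b": "x", "c": 1}, {"b": "y", "c": 2}, {"b": "z", "c": 3}]

if __name__ == "__main__":
    print(run_plan(T1, T2))
```

Each operator sees only the stream of rows handed to it by the operator below, which is why an operator near the top of the plan cannot tell in advance how many qualifying rows the lower ones will produce.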

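The plan above is the unoptimized representation. So it is not that MySQL does not want to pass the limit down to the storage engine layer. Because the plan is split into logical operators, there is no way to know how many qualifying rows a given operator contains.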

PS: SELECT execution order

Step  Order as written in SQL  MySQL execution order
1     select distinct          from
2     from                     on
3     join                     join
4     on                       where
5     where                    group by
6     group by                 having + aggregate functions
7     having                   select distinct
8     union                    union
9     order by                 order by
10    limit                    limit

Solutions

There are two solutions:

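  • Depending on the actual business requirements, see whether paging can be replaced with next page / previous page navigation, especially on mobile. Replace the offset with an id > condition. The id must be returned to the front end so that it can be passed back on the next call. This approach does not fit every business scenario, however.
select * from table where status = xx and id > 100000 limit 10;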
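  • [Recommended] Use a nested subquery that first finds the primary key values. Because the primary key is already stored in the secondary index, there is no need to go back to the clustered index on disk to fetch rows. Then use the 10 primary key ids already selected by the limit to query the clustered index, which costs only ten random I/Os. When the business really does need pagination, this approach can improve performance substantially and usually meets performance requirements. MySQL rejects a limit placed directly inside an in (...) subquery, so the id list is written here as a derived table and joined back.
select a.xxx from table a join (select id from table where status = xx limit 10 offset 100000) b on a.id = b.id;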
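To check that both rewrites return the same page as the original offset query, here is a small sketch using Python's built-in sqlite3 module. SQLite stands in for MySQL only to verify result equivalence; it says nothing about InnoDB performance. The table name `items`, the index name, the helper functions, and the added ORDER BY id (for deterministic paging) are all assumptions made for this example.

```python
import sqlite3


def build_db(n_rows=1000):
    """Create an in-memory table where odd ids have status 1, even ids status 0."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, status INTEGER, payload TEXT)"
    )
    conn.execute("CREATE INDEX idx_items_status ON items (status)")
    conn.executemany(
        "INSERT INTO items (id, status, payload) VALUES (?, ?, ?)",
        [(i, i % 2, f"row-{i}") for i in range(1, n_rows + 1)],
    )
    return conn


def offset_page(conn, status, limit, offset):
    """Original form: limit with a large offset."""
    sql = ("SELECT id, payload FROM items WHERE status = ? "
           "ORDER BY id LIMIT ? OFFSET ?")
    return conn.execute(sql, (status, limit, offset)).fetchall()


def keyset_page(conn, status, limit, last_seen_id):
    """Solution 1: continue after the last id returned to the front end."""
    sql = ("SELECT id, payload FROM items WHERE status = ? AND id > ? "
           "ORDER BY id LIMIT ?")
    return conn.execute(sql, (status, last_seen_id, limit)).fetchall()


def deferred_join_page(conn, status, limit, offset):
    """Solution 2: page over ids only, then join back for the full rows."""
    sql = ("SELECT i.id, i.payload FROM items AS i "
           "JOIN (SELECT id FROM items WHERE status = ? "
           "      ORDER BY id LIMIT ? OFFSET ?) AS page ON i.id = page.id "
           "ORDER BY i.id")
    return conn.execute(sql, (status, limit, offset)).fetchall()


if __name__ == "__main__":
    conn = build_db()
    previous = offset_page(conn, 1, 10, 90)
    last_id = previous[-1][0]  # the id the front end would send back
    print(offset_page(conn, 1, 10, 100) == keyset_page(conn, 1, 10, last_id))
    print(offset_page(conn, 1, 10, 100) == deferred_join_page(conn, 1, 10, 100))
```

The keyset variant needs the last id of the previous page, which is why it suits next page / previous page navigation but not jumping to an arbitrary page number.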

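Original source below. The sections on the author's personal experience and reflections have been removed. If the original author finds this post inappropriate, I will revise it.

Author: 叶不闻
Link: https://juejin.im/post/5c4db295e51d4503834d9c43
Source: Juejin (掘金)
Copyright belongs to the author. For commercial reprints, contact the author for authorization; for non-commercial reprints, credit the source.

The logical operators section quotes:

Author: 叁金
Link: http://www.imooc.com/article/278660
Source: imooc (慕课网)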

Origin www.cnblogs.com/n031/p/12014428.html