MySQL: optimizing pagination over large data sets

First, LIMIT pagination on large tables: query efficiency drops as the page number increases.

1. Paging directly with limit start, count, which is the method I use in my program:

select * from product limit start, count
When the start offset is small, the query has no performance problem. Let's look at the execution times when paging starts at offsets 10, 100, 1000 and 10000 (20 rows per page):

select * from product limit 10, 20   0.016 s
select * from product limit 100, 20   0.016 s
select * from product limit 1000, 20   0.047 s
select * from product limit 10000, 20   0.094 s

As the start offset grows, so does the query time, which shows that the cost of a LIMIT paging statement depends heavily on the starting offset. Now let's change the start offset to 400,000 (roughly half of the records):

select * from product limit 400000, 20   3.229 s

And here is the time for the last page of records:
select * from product limit 866613, 20   37.44 s

No wonder search engines often report timeouts when crawling our pages. For the highest page numbers, like this one, such a response time is clearly intolerable.

From this we can conclude two things:
  1) The query time of a LIMIT statement is proportional to the position of the starting record.
  2) MySQL's LIMIT statement is very convenient, but it is not suitable for direct use on tables with many records.
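
One way to observe the first point is to count how many rows the server actually reads for a paged query. The following is a minimal sketch, assuming the product table from the timings above and an account that is allowed to run FLUSH STATUS (it normally needs the RELOAD privilege). The counter values depend on the server version and storage engine, so none are shown here.

```sql
-- Reset the session status counters so only the next query is measured.
FLUSH STATUS;

-- LIMIT offset, count makes the server read and discard `offset` rows
-- before it returns `count` rows.
SELECT * FROM product LIMIT 866613, 20;

-- Inspect the row-read counters for this session. For a table scan,
-- Handler_read_rnd_next should come out close to offset + count.
SHOW SESSION STATUS LIKE 'Handler_read%';
```

Repeating the same steps with a small offset such as 10 should show a far smaller count, which matches the timings listed earlier.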

2. Performance optimization for LIMIT pagination

Use a covering index on the table to speed up paging queries
We all know that when a query uses an index and selects only columns contained in that index (a covering index), it runs very fast.

This is because index lookups use optimized search algorithms and the data is already in the index, so there is no need to follow an address to the row data, which saves a lot of time. MySQL also caches indexes, and under high concurrency the cache makes the effect even better.

In our example, the id field is the primary key, so it naturally has the default primary key index. Now let's see how a query that uses a covering index performs:

This time we query the last page directly (using a covering index, selecting only the id column):
select id from product limit 866613, 20   0.2 s
Compared with 37.44 seconds for querying all columns, that is more than 100 times faster.

So if we also need to query all columns, there are two ways:

(1) One is the id >= form, the other uses a join. Let's look at the actual results:

  SELECT * FROM product WHERE ID >= (select id from product limit 866613, 1) limit 20
  The query time is 0.2 seconds, a real qualitative leap.

(2) The other form
  SELECT * FROM product a JOIN (select id from product limit 866613, 20) b ON a.ID = b.id
  The query time is also very short.

Both forms rely on the same principle, so their results are about the same.
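
Neither query above has an ORDER BY, so the order of the rows, and therefore which rows land on a given page, is left to the server. The following sketch writes the same two forms with an explicit ORDER BY id. It assumes, as stated earlier, that id is the primary key of product. It is one possible way to write them, not a measured result.

```sql
-- Form 1: locate the id at the offset using only the primary key index,
-- then read 20 full rows starting from that id.
SELECT *
FROM product
WHERE id >= (SELECT id FROM product ORDER BY id LIMIT 866613, 1)
ORDER BY id
LIMIT 20;

-- Form 2: page through the ids in a derived table, then join back
-- to fetch the full rows for just those 20 ids.
SELECT a.*
FROM product AS a
JOIN (SELECT id FROM product ORDER BY id LIMIT 866613, 20) AS b
  ON a.id = b.id
ORDER BY a.id;
```

Selecting a.* instead of * in the second form avoids returning the id column twice, once from each side of the join.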

 

Second, covering indexes

1. Definition:

  (1) If an index contains (or covers) the values of all the fields a query needs, it is called a "covering index". The query then only scans the index and never has to go back to the table.

  (2) Advantages of scanning only the index without going back to the table:
        1) Index entries are usually much smaller than full data rows, so reading only the index greatly reduces the amount of data MySQL accesses.
        2) Because an index is stored in order of its column values, an IO-intensive range lookup needs far less IO than reading each row from disk at random.
        3) Some storage engines, such as MyISAM, cache only indexes in memory and rely on the operating system to cache the data, so accessing the data requires a system call.
        4) Because of InnoDB's clustered index, covering indexes are especially useful for InnoDB tables. (An InnoDB secondary index stores the row's primary key value in its leaf nodes, so if the secondary index can cover the query, the second lookup in the primary key index is avoided.)
    (3) A covering index must store the values of the indexed columns. Hash indexes, spatial indexes and full-text indexes do not store those values, so MySQL can only use B-tree indexes as covering indexes.

      When a query that is covered by an index (also called an index-covered query) is issued, "Using index" appears in the Extra column of EXPLAIN.
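
As a quick check of this, the two paging queries from the first part can be passed to EXPLAIN. This is a sketch that assumes the same product table with id as its primary key. The exact plan depends on the MySQL version and on which indexes exist, so no output is shown.

```sql
-- Only the id column is requested, so an index alone can answer the query;
-- look for "Using index" in the Extra column.
EXPLAIN SELECT id FROM product LIMIT 866613, 20;

-- All columns are requested, so the full rows have to be read
-- and "Using index" is not expected in the Extra column.
EXPLAIN SELECT * FROM product LIMIT 866613, 20;
```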

  

2. Experimental verification

Table structure

With more than 1.5 million rows, a simple statement like this:

showed up many times in the slow query log, taking as long as 1 second. The EXPLAIN result was:

The EXPLAIN result shows that the query already used an index, so why was it still so slow?

Analysis: First, the statement's ORDER BY used Using filesort, which makes the query inefficient. Second, the queried field is not in the index, so no covering index is used and the query has to go through the index and back to the table. Data distribution also plays a part.

Once the cause is known, the problem is easy to solve.

Solution: Since only the uid field needs to be queried, adding a composite index avoids both the table lookup and the filesort. The covering index speeds up the query, and the index order also takes care of the sorting.
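
The table structure and the statement itself are not reproduced in this text, so the following only illustrates the idea with made-up names. It assumes a hypothetical table t_event with columns type, created_at and uid, and a query that filters on type, sorts by created_at and returns only uid. None of these names come from the original.

```sql
-- Hypothetical slow query: filters on `type`, sorts by `created_at`,
-- and returns only `uid`.
SELECT uid FROM t_event WHERE type = 1 ORDER BY created_at LIMIT 20;

-- Composite index: the equality column first, then the sort column,
-- then the selected column so that the index covers the whole query.
ALTER TABLE t_event ADD INDEX idx_type_created_uid (type, created_at, uid);

-- Check the plan again: "Using filesort" should be gone and
-- "Using index" should appear in the Extra column.
EXPLAIN SELECT uid FROM t_event WHERE type = 1 ORDER BY created_at LIMIT 20;
```

The column order matters in this sketch: with type fixed by the equality condition, the index entries are already ordered by created_at, which is what lets the server skip the separate sort.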

Covering index: the SQL can return the data the query needs from the index alone, without first finding the primary key through a secondary index and then fetching the row data.

Running EXPLAIN again:

The Extra column now shows 'Using index', which means a covering index is being used. After the index optimization, the production query basically never takes more than 0.001 seconds.

 

Partly from: https://www.cnblogs.com/lpfuture/p/5772055.html


Origin www.cnblogs.com/ivy-zheng/p/10994215.html