High Performance MySQL (4): Query Performance Optimization

First, how do you optimize queries? (Short answers here; the detailed analysis follows below)

Not every table needs to be optimized. Optimization is usually led by an architect who guides one or two people, with a systematic review held every week.

- Optimizing data access

  • a, check whether the application requests data it does not need: for example, fetching rows it does not need, returning all columns from a multi-table join, always selecting all columns, or querying the same data repeatedly;
  • b, check whether MySQL scans extra rows: for example, if the number of rows scanned and the number of rows returned differ widely, the query can be optimized (how is covered below);
  • c, use indexes.

- Restructuring queries

  • a, split queries: a large query can be divided into several smaller ones, for example when deleting a lot of old data;
  • b, decompose join queries: split a join into one single-table query per table and do the join in application code, which has many benefits (covered below).

- Using middleware

  • a, for example MyCat, for read/write splitting and for sharding databases and tables (see the author's follow-up articles).

Second, slow query basics: optimizing data access

The most basic reason a query performs poorly is that it accesses too much data. Most inefficient queries can be analyzed in the following two steps:

  • Check whether the application requests data it does not need, which means accessing too many rows or too many columns;
  • Check whether MySQL scans many more rows than it needs to.

2.1, whether unneeded data is requested from the database

(1) Querying rows that are not needed

A common mistake is to assume that MySQL returns only the data you need. In fact, MySQL computes and returns the full result set. For example, an application runs a SELECT that returns a large number of rows, takes only the first N, and discards the rest (say, the query returns 1,000 rows but the page shows only 10). The most effective fix is to add LIMIT to the query.

(2) Returning all columns from a multi-table join

(3) Always selecting all columns

With SELECT *, the optimizer cannot use optimizations such as covering index scans, and fetching every column also adds extra performance overhead.

(4) Querying the same data repeatedly

Running the same query repeatedly returns exactly the same data each time. For example, wherever user comments are displayed the application needs the user's avatar URL, and it may query that URL again each time the same user comments. The fix is to cache the data after the first query and read it from the cache when needed.
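The caching idea in (4) can be sketched as follows. This is a minimal illustration, not the author's code: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the `users` table, the `avatar_url` column, and the helper names are all hypothetical.

```python
import sqlite3

# Hypothetical schema: a `users` table with an `avatar_url` column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, avatar_url TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'https://example.com/a/1.png')")

query_count = 0      # how many times the database was actually queried
_avatar_cache = {}   # user_id -> avatar URL


def avatar_url(user_id):
    """Return the avatar URL, querying the database only on a cache miss."""
    global query_count
    if user_id not in _avatar_cache:
        query_count += 1
        row = conn.execute(
            "SELECT avatar_url FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        _avatar_cache[user_id] = row[0] if row else None
    return _avatar_cache[user_id]


# Three comments by the same user need the avatar three times,
# but only the first lookup reaches the database.
urls = [avatar_url(1) for _ in range(3)]
```

A real application would also need to invalidate or expire the cached entry when the user changes their avatar; that part is left out of the sketch.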

2.2, whether MySQL scans extra rows

Once you are sure the query returns only the data you need, the next step is to check whether it examines too much data to produce that result. Query cost is measured by three metrics, all of which are recorded in MySQL's slow query log, so checking the slow log is a good way to find queries that scan too many rows:

  • Response time;
  • Number of rows scanned;
  • Number of rows returned.

(1) Response time

Response time is the sum of two parts: service time and queueing time. Service time is how long the database actually spends processing the query. Queueing time is the time during which the server is not really executing the query because it is waiting for some resource (for example, waiting for an I/O operation to complete or for a row lock). In practice it is generally hard to break the response time down into these two parts.

(2) Number of rows scanned and rows returned

Ideally the number of rows scanned equals the number of rows returned, but in practice this is rarely achievable.

Access types:

MySQL has several ways to find and return a row. From slowest to fastest: full table scan, index scan, range scan, unique index lookup, and constant. If a query does not get a good access type, the best fix is usually to add a suitable index, which lets MySQL find the rows it needs in the most efficient way while scanning the fewest rows.

If you find that a query scans a large amount of data but returns only a few rows, you can try the following optimizations:

a, use a covering index: put all the columns the query needs in the index, so the storage engine can return the result without going back to the table for the row;

b, change the table structure: for example, use a separate summary table;

c, rewrite the complex query so that the MySQL optimizer can execute it in a more optimal way.
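Option a can be checked by looking at the query plan. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL, with a hypothetical `orders` table and index. SQLite reports "COVERING INDEX" in `EXPLAIN QUERY PLAN`; in MySQL the equivalent signal is "Using index" in the Extra column of `EXPLAIN`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and index, for illustration only.
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT, note TEXT)"
)
conn.execute("CREATE INDEX idx_customer_status ON orders (customer_id, status)")


def plan(sql):
    """Return the plan description SQLite reports for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column is the detail text


# Every selected column is in the index, so the table itself is never read.
covered = plan("SELECT customer_id, status FROM orders WHERE customer_id = 42")

# SELECT * also needs `note`, so each index match requires a table lookup.
not_covered = plan("SELECT * FROM orders WHERE customer_id = 42")
```

Comparing `covered` and `not_covered` shows the same index being used in both cases, but only the first query is answered from the index alone.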

 

Third, restructuring queries

3.1, one complex query or several simple queries

3.2, splitting a query (DELETE)

Sometimes a large query can be cut into smaller queries that each do exactly the same thing but handle only a small part of the work and return only a small part of the results at a time.

Example: deleting old data, or deleting rows whose soft-delete flag is 0

If this is done in one big statement, it may lock a lot of rows at once, fill the transaction log, exhaust system resources, and block many small but important queries. A large DELETE that has to run, say, once a month is a good candidate for splitting.

If each statement deletes only a limited batch of rows and the job pauses briefly before deleting the next batch, the load that would have hit the server all at once is spread over a much longer period, and locks are held for a much shorter time.
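The batching idea above can be sketched as follows. This is a minimal sketch under stated assumptions: it runs against Python's built-in sqlite3 module rather than MySQL, and the `messages` table, the `created` column, and `purge_in_batches` are hypothetical names. SQLite needs a subquery to limit a DELETE; MySQL accepts `DELETE ... WHERE ... LIMIT n` directly.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
# Hypothetical table: `created` is just an integer "day number" here.
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, created INTEGER)")
conn.executemany(
    "INSERT INTO messages (created) VALUES (?)", [(day,) for day in range(100)]
)
conn.commit()


def purge_in_batches(conn, cutoff, batch_size, pause_seconds=0.0):
    """Delete rows older than `cutoff` in small batches; return rows deleted."""
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM messages WHERE id IN ("
            "  SELECT id FROM messages WHERE created < ? LIMIT ?)",
            (cutoff, batch_size),
        )
        conn.commit()  # one short transaction per batch
        total += cur.rowcount
        if cur.rowcount < batch_size:
            break  # last (partial or empty) batch: nothing left to delete
        time.sleep(pause_seconds)  # give other queries room between batches
    return total


deleted = purge_in_batches(conn, cutoff=60, batch_size=25)
remaining = conn.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
```

The batch size and pause length are tuning knobs: larger batches finish sooner but hold locks longer, so the right values depend on the workload.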

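3.3, decomposing a join query

Run a single-table query against each table, then join the results in application code. For example, a join across a tag table, a tag-to-post mapping table, and a post table can be replaced by three single-table queries: one that looks up the tag, one that fetches the matching post IDs, and one that fetches those posts with IN().

(This only makes sense when the tables are large and the join as a whole is very slow.)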
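A join decomposition of this kind might look like the sketch below. It uses Python's built-in sqlite3 module as a stand-in for MySQL; the `tag`, `tag_post`, and `post` tables and both function names are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tag (id INTEGER PRIMARY KEY, tag TEXT);
CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE tag_post (tag_id INTEGER, post_id INTEGER);
INSERT INTO tag VALUES (1, 'mysql'), (2, 'linux');
INSERT INTO post VALUES (10, 'Indexes'), (11, 'Kernel'), (12, 'Joins');
INSERT INTO tag_post VALUES (1, 10), (2, 11), (1, 12);
""")


def posts_by_join(tag):
    """Original form: one three-table join executed by the database."""
    return conn.execute(
        "SELECT post.id, post.title FROM tag "
        "JOIN tag_post ON tag_post.tag_id = tag.id "
        "JOIN post ON post.id = tag_post.post_id "
        "WHERE tag.tag = ? ORDER BY post.id",
        (tag,),
    ).fetchall()


def posts_decomposed(tag):
    """Decomposed form: three single-table queries joined in application code."""
    row = conn.execute("SELECT id FROM tag WHERE tag = ?", (tag,)).fetchone()
    if row is None:
        return []
    post_ids = [r[0] for r in conn.execute(
        "SELECT post_id FROM tag_post WHERE tag_id = ?", (row[0],))]
    if not post_ids:
        return []
    placeholders = ", ".join("?" for _ in post_ids)
    return conn.execute(
        f"SELECT id, title FROM post WHERE id IN ({placeholders}) ORDER BY id",
        post_ids,
    ).fetchall()
```

Both functions return the same rows. The decomposed version gives the application a place to cache each step separately, for example the tag lookup.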

Benefits of decomposing a join:

a, better cache efficiency: with the MySQL query cache, a cached result can no longer be used once any table in the join changes; after splitting, if one of the tables rarely changes, queries against that table can keep reusing their cached results (in the example above, if the application has already cached the tag, it can skip the first query);

b, less lock contention: after decomposition each query touches a single table, which reduces lock contention;

c, better scalability: with the join done in the application layer, it is easier to split the database;

d, the queries themselves can be more efficient: the example above uses IN() in place of the join, which lets MySQL look up rows in ID order, and that can be more efficient than the random access of a join;

e, fewer redundant row lookups: joining in the application layer means each row is fetched only once, whereas a join in the database may access the same rows several times.

Fourth, query execution basics

How MySQL executes a query:

(1) Protocol: the client sends a query to the server;

(2) Query cache: the server checks the query cache; on a hit it immediately returns the cached result, otherwise it moves on to the next stage;

(3) Parsing and optimization: the server parses and preprocesses the SQL, and the optimizer then generates the corresponding execution plan;

(4) Execution: following the execution plan generated by the optimizer, MySQL calls the storage engine API to execute the query;

(5) The result is returned to the client.

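4.1, client/server communication protocol

4.2, query states

Waiting, querying, locked, sorting

4.3, query cache

Before parsing a query, if the query cache is enabled, MySQL first checks whether the query hits the cache. This check is implemented as a case-sensitive hash lookup.

4.4, the query optimization process

(1) Parser and preprocessor

MySQL parses the SQL statement by its keywords and builds a corresponding "parse tree".

(2) Query optimizer

The types of optimization MySQL can perform include:

  • Reordering joins;
  • Converting outer joins to inner joins;
  • Applying algebraic equivalence rules;
  • Optimizing COUNT(), MIN(), and MAX();
  • Evaluating and reducing constant expressions;
  • Covering index scans;
  • Subquery optimization;
  • Early termination;
  • Equality propagation;
  • IN() list comparisons.

4.5, query execution engine

4.6, returning results to the client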

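Fifth, limitations of the MySQL query optimizer

5.1, correlated subqueries

Although IN() can often improve efficiency, the worst kind of subquery is one that appears inside IN() in a WHERE condition.

A subquery may improve performance or hurt it.

5.2, UNION limitations

MySQL cannot "push down" conditions from the outside of a UNION to the inside, so conditions that could have limited the partial results cannot be used to optimize the inner queries.

For example, you may want each clause of the UNION to take only part of its result set according to a LIMIT, or to be sorted before the results are merged. Without that, each branch of the UNION writes all of its rows to a temporary table, and the matching rows are then read back from it. Rows read from the temporary table are in no particular order, so the outer query still needs its own ORDER BY and LIMIT.

5.3, index merge optimizations

When the WHERE clause contains several complex conditions, MySQL can access multiple indexes on a single table and locate the rows it needs by merging and intersecting the results.

5.4, equality propagation

5.5, parallel execution

MySQL has no ability to execute a single query in parallel.

5.6, hash joins

5.7, loose index scans

5.8, MIN() and MAX() optimization

5.9, SELECT and UPDATE on the same table

MySQL does not allow you to query a table while updating it in the same statement.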

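Sixth, optimizing specific types of queries

6.1, optimizing COUNT() queries

(1) What the COUNT() aggregate function does

Use 1: it can count the values of a column (NULL values are not counted);

Use 2: it can also count rows.

(2) A simple optimization:

Suppose a COUNT(*) query with a WHERE condition scans 5,000 rows. In MyISAM, COUNT(*) is very fast only when there is no WHERE condition at all. That fact can be used to rewrite the query so that it scans only 5 rows: count the few rows that do not match and subtract them from the unconditional total.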
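The rewrite can be illustrated with the sketch below, which only demonstrates that the two forms return the same number. It runs on Python's built-in sqlite3 module, which has no MyISAM-style instant COUNT(*), so it says nothing about speed; the `city` table and its contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with 5,000 rows, ids 1..5000.
conn.execute("CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO city (id, name) VALUES (?, ?)",
    [(i, f"city-{i}") for i in range(1, 5001)],
)

# Direct form: the condition matches almost every row in the table.
direct = conn.execute("SELECT COUNT(*) FROM city WHERE id > 5").fetchone()[0]

# Inverted form: count the few rows that do NOT match and subtract them
# from the unconditional total (the part MyISAM can answer without scanning).
inverted = conn.execute(
    "SELECT (SELECT COUNT(*) FROM city) - COUNT(*) FROM city WHERE id <= 5"
).fetchone()[0]
```

On a storage engine that does not keep an exact row count, the unconditional COUNT(*) is itself a scan, so the rewrite is only worthwhile where that count is cheap.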

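(3) A more complex optimization:

COUNT() usually has to scan a large number of rows. Beyond the simple optimization above, you can also use a covering index scan.

6.2, optimizing join queries

(1) Make sure the columns in the ON clause are indexed

Consider the join order when creating indexes. When tables A and B are joined on column c and the optimizer's join order is B, A, there is no need to index column c on table B; unused indexes are extra overhead. In general, you only need an index on the corresponding column of the second table in the join order.

(2) Make sure any GROUP BY and ORDER BY expressions refer to columns from only one table, so that MySQL can use an index to optimize the operation.

6.3, optimizing GROUP BY

When it cannot use an index, MySQL has two strategies for GROUP BY: grouping with a temporary table or with a filesort.

Optimizing GROUP BY WITH ROLLUP:

This is a variation on grouping that performs a super-aggregation over the grouped results. It is best avoided; the same functionality can be implemented in application code.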

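6.4, optimizing LIMIT pagination

Pagination can be implemented with LIMIT and an offset (LIMIT offset, row_count) together with a suitable ORDER BY. With an index this is reasonably efficient; without one, MySQL has to do a lot of filesort work.

But when the offset is very large, the query needs optimizing:

(For example, LIMIT 10000, 20 has to generate 10,020 rows but returns only the last 20; the first 10,000 rows are thrown away.)

(1) Use a covering index scan to find the rows instead of querying all columns, then join back to the table to fetch the columns you need (a "deferred join").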
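A deferred join of this kind might look like the sketch below. It uses Python's built-in sqlite3 module as a stand-in for MySQL; the `film` table, its columns, and the index are hypothetical. The inner query touches only the index on `title` to pick the page's primary keys, and the outer query fetches the wide columns for just those rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: `description` stands in for the wide columns.
conn.execute(
    "CREATE TABLE film (film_id INTEGER PRIMARY KEY, title TEXT, description TEXT)"
)
conn.execute("CREATE INDEX idx_title ON film (title)")
conn.executemany(
    "INSERT INTO film (title, description) VALUES (?, ?)",
    [(f"title-{i:05d}", f"long description {i}") for i in range(200)],
)

OFFSET, PAGE = 150, 5

# Plain pagination: full rows are produced for the skipped part as well.
plain = conn.execute(
    "SELECT film_id, description FROM film ORDER BY title LIMIT ? OFFSET ?",
    (PAGE, OFFSET),
).fetchall()

# Deferred join: page through the narrow index first, then fetch full rows
# only for the primary keys that belong to the requested page.
deferred = conn.execute(
    "SELECT film.film_id, film.description FROM film "
    "JOIN (SELECT film_id FROM film ORDER BY title LIMIT ? OFFSET ?) AS page "
    "  ON page.film_id = film.film_id "
    "ORDER BY film.title",
    (PAGE, OFFSET),
).fetchall()
```

Both queries return the same page. The saving comes from skipping the offset rows inside the index instead of in full table rows, so it grows with the offset and with the width of the rows.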

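6.5, optimizing UNION queries

MySQL always executes UNION by creating and filling a temporary table, as mentioned above, so many optimization strategies do not work well with UNION. You often have to manually "push down" WHERE, LIMIT, ORDER BY, and other clauses into each subquery of the UNION so that the optimizer can make better use of them.

How to optimize:

Unless you explicitly need to eliminate duplicate rows, always use UNION ALL. Without the ALL keyword, MySQL adds the DISTINCT option to the temporary table, which forces a uniqueness check over all the data in the temporary table and is expensive.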
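Both points, pushing LIMIT into each branch and preferring UNION ALL, can be sketched as follows. The example uses Python's built-in sqlite3 module as a stand-in for MySQL, with hypothetical `actor` and `customer` tables. SQLite requires each limited branch to be wrapped in a subquery, whereas MySQL lets you parenthesize the branch SELECTs directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE actor (name TEXT);
CREATE TABLE customer (name TEXT);
INSERT INTO actor VALUES ('ann'), ('bob'), ('eve'), ('zed');
INSERT INTO customer VALUES ('amy'), ('bob'), ('cat'), ('dan');
""")

N = 3

# Outer ORDER BY / LIMIT only: every row of both tables is merged first.
outer_only = conn.execute(
    "SELECT name FROM actor UNION ALL SELECT name FROM customer "
    "ORDER BY name LIMIT ?",
    (N,),
).fetchall()

# LIMIT pushed into each branch: each branch contributes at most N rows,
# and the outer ORDER BY / LIMIT is still needed to order the merged rows.
pushed_down = conn.execute(
    "SELECT name FROM (SELECT name FROM actor ORDER BY name LIMIT ?) "
    "UNION ALL "
    "SELECT name FROM (SELECT name FROM customer ORDER BY name LIMIT ?) "
    "ORDER BY name LIMIT ?",
    (N, N, N),
).fetchall()

# UNION without ALL also removes duplicates ('bob' is in both tables).
distinct_rows = conn.execute(
    "SELECT name FROM actor UNION SELECT name FROM customer"
).fetchall()
all_rows = conn.execute(
    "SELECT name FROM actor UNION ALL SELECT name FROM customer"
).fetchall()
```

The pushed-down form returns the same first N rows while merging at most 2 × N rows instead of both full tables.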

 

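Seventh, query optimization case studies

7.1, building a queue table in MySQL

(1) Background: under high traffic and high concurrency, a single table holds records in several states, such as unprocessed, in process, and processed. Multiple threads look for unprocessed records in the table, mark them as in process, and update them to processed once the work is done. Typical uses include sending email, processing batches of commands, and handling comment edits.

(2) Two reasons this table design is a poor fit

a, as the queue grows and the index gets deeper, finding unprocessed records becomes slower. (This can be solved by splitting the queue table in two: archive the processed records or move them to a history table, so that the queue stays small.);

b, processing usually takes two steps: first find the unprocessed records, then lock them. Finding the records adds load on the server, and locking makes the consumer processes contend with one another.

(3) The problem to solve

How to let a consumer mark the records it is processing so that multiple consumers do not process the same record.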
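One common way to approach this is to claim rows with a single UPDATE and then read back only the rows that carry the consumer's own marker. The sketch below illustrates that idea and is not necessarily the solution the author had in mind. It runs on Python's built-in sqlite3 module instead of MySQL, and the `unsent_emails` table, the `owner` column, and the consumer IDs are hypothetical. In MySQL the same claim can be written as `UPDATE ... WHERE ... LIMIT n`, with `CONNECTION_ID()` as the marker.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical queue table: owner = 0 means "not claimed by anyone".
conn.execute(
    "CREATE TABLE unsent_emails ("
    "id INTEGER PRIMARY KEY, "
    "status TEXT NOT NULL, "
    "owner INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO unsent_emails (status) VALUES (?)", [("unsent",)] * 5
)
conn.commit()


def claim(conn, consumer_id, batch_size):
    """Mark up to `batch_size` unclaimed rows as owned by `consumer_id`,
    then return the ids of every row this consumer currently holds."""
    conn.execute(
        "UPDATE unsent_emails SET status = 'claimed', owner = ? "
        "WHERE id IN (SELECT id FROM unsent_emails "
        "             WHERE owner = 0 AND status = 'unsent' "
        "             ORDER BY id LIMIT ?)",
        (consumer_id, batch_size),
    )
    conn.commit()
    return [r[0] for r in conn.execute(
        "SELECT id FROM unsent_emails "
        "WHERE owner = ? AND status = 'claimed' ORDER BY id",
        (consumer_id,),
    )]


first = claim(conn, consumer_id=101, batch_size=3)
second = claim(conn, consumer_id=202, batch_size=3)
```

Because the claim is one statement, no consumer needs a separate "find, then lock" step, and the follow-up SELECT only touches rows that consumer already owns. A production version would also need to release rows held by consumers that died before finishing.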

 

 

 

 

 


Reference: High Performance MySQL, 3rd Edition


Origin blog.csdn.net/RuiKe1400360107/article/details/103963462