Eight kinds often overlooked use SQL error, you have not stepped on a pit?

1, LIMIT statement

Paging query is one of the most frequently used scene, but also usually the most problematic areas. For simple statement such as the following, the general approach is to think of DBA on the type, name, create_time field plus a composite index. Such conditions can be effectively used to sort the index, to elevate performance.

SELECT * 
FROM   operation 
WHERE  type = 'SQLStats' 
       AND name = 'SlowLog' 
ORDER  BY create_time 
LIMIT  1000, 10;

Okay, maybe more than 90% of the DBA to resolve the problem so far. But when the LIMIT clause becomes "LIMIT 1000000,10", programmers will still complain: I only get 10 records Why or slow?

To know the database does not know the article 1 million records from where to start, even if there is also need to calculate the index from scratch again. This performance problems, in most cases is a lazy programmer.

Browse the page data in the front end, or a large batch of data to export other scenes, the maximum value may be as a previous parameter as a query. SQL redesigned as follows:

SELECT   * 
FROM     operation 
WHERE    type = 'SQLStats' 
AND      name = 'SlowLog' 
AND      create_time > '2017-03-16 14:00:00' 
ORDER BY create_time limit 10;

The basic query time fixed in the new design, not with the amount of data grows and changes.

2, implicit conversion

SQL statement query variables and fields defined type mismatch is another common mistake. For example, the following statement:

mysql> explain extended SELECT * 
     > FROM   my_balance b 
     > WHERE  b.bpn = 14000000123 
     >       AND b.isverified IS NULL ;
mysql> show warnings;
| Warning | 1739 | Cannot use ref access on index 'bpn' due to type or collation conversion on field 'bpn'

Wherein bpn defined fields as varchar (20), MySQL, then the strategy comparison string after converting to digital. Function applied to the table fields, indices failure.

The above application framework may be filled automatically parameter rather than the programmer's intent. Now a lot of very complicated application framework, easy to use but also be careful it might give yourself digging.

3, the association update, delete

Although MySQL5.6 the introduction of physical and chemical properties, but it currently only requires special attention for optimizing the query statement. For the need to manually update or delete rewritten JOIN.

For example, the following UPDATE statement, MySQL is the actual implementation cycles / nested subqueries (DEPENDENT SUBQUERY), the execution time can be imagined.

UPDATE operation o 
SET    status = 'applying' 
WHERE  o.id IN (SELECT id 
                FROM   (SELECT o.id, 
                               o.status 
                        FROM   operation o 
                        WHERE  o.group = 123 
                               AND o.status NOT IN ( 'done' ) 
                        ORDER  BY o.parent, 
                                  o.id 
                        LIMIT  1) t);


Implementation plan:

+----+--------------------+-------+-------+---------------+---------+---------+-------+------+-----------------------------------------------------+
| id | select_type        | table | type  | possible_keys | key     | key_len | ref   | rows | Extra                                               |
+----+--------------------+-------+-------+---------------+---------+---------+-------+------+-----------------------------------------------------+
| 1  | PRIMARY            | o     | index |               | PRIMARY | 8       |       | 24   | Using where; Using temporary                        |
| 2  | DEPENDENT SUBQUERY |       |       |               |         |         |       |      | Impossible WHERE noticed after reading const tables |
| 3  | DERIVED            | o     | ref   | idx_2,idx_5   | idx_5   | 8       | const | 1    | Using where; Using filesort                         |
+----+--------------------+-------+-------+---------------+---------+---------+-------+------+-----------------------------------------------------+

After rewriting is JOIN, the selection mode from the subquery becomes DERIVED DEPENDENT SUBQUERY, significantly quicker, decreased from 7 seconds to 2 ms

UPDATE operation o 
       JOIN  (SELECT o.id, 
                            o.status 
                     FROM   operation o 
                     WHERE  o.group = 123 
                            AND o.status NOT IN ( 'done' ) 
                     ORDER  BY o.parent, 
                               o.id 
                     LIMIT  1) t
         ON o.id = t.id 
SET    status = 'applying' 

Implementation plan reduces to:

+----+-------------+-------+------+---------------+-------+---------+-------+------+-----------------------------------------------------+
| id | select_type | table | type | possible_keys | key   | key_len | ref   | rows | Extra                                               |
+----+-------------+-------+------+---------------+-------+---------+-------+------+-----------------------------------------------------+
| 1  | PRIMARY     |       |      |               |       |         |       |      | Impossible WHERE noticed after reading const tables |
| 2  | DERIVED     | o     | ref  | idx_2,idx_5   | idx_5 | 8       | const | 1    | Using where; Using filesort                         |
+----+-------------+-------+------+---------------+-------+---------+-------+------+-----------------------------------------------------+

4, mixed sorting

MySQL can not use sort mix index. However, in some scenarios, there are opportunities to use a special method to enhance performance.

SELECT * 
FROM   my_order o 
       INNER JOIN my_appraise a ON a.orderid = o.id 
ORDER  BY a.is_reply ASC, 
          a.appraise_time DESC 
LIMIT  0, 20 


Show Execution Plan for the full table scan:

+----+-------------+-------+--------+-------------+---------+---------+---------------+---------+-+
| id | select_type | table | type   | possible_keys     | key     | key_len | ref      | rows    | Extra    
+----+-------------+-------+--------+-------------+---------+---------+---------------+---------+-+
|  1 | SIMPLE      | a     | ALL    | idx_orderid | NULL    | NULL    | NULL    | 1967647 | Using filesort |
|  1 | SIMPLE      | o     | eq_ref | PRIMARY     | PRIMARY | 122     | a.orderid |       1 | NULL           |
+----+-------------+-------+--------+---------+---------+---------+-----------------+---------+-+

由于 is_reply 只有 0 和 1 两种状态,我们按照下面的方法重写后,执行时间从 1.58 秒降低到 2 毫秒。

SELECT * 
FROM   ((SELECT *
         FROM   my_order o 
                INNER JOIN my_appraise a 
                        ON a.orderid = o.id 
                           AND is_reply = 0 
         ORDER  BY appraise_time DESC 
         LIMIT  0, 20) 
        UNION ALL 
        (SELECT *
         FROM   my_order o 
                INNER JOIN my_appraise a 
                        ON a.orderid = o.id 
                           AND is_reply = 1 
         ORDER  BY appraise_time DESC 
         LIMIT  0, 20)) t 
ORDER  BY  is_reply ASC, 
          appraisetime DESC 
LIMIT  20;

5、EXISTS语句

MySQL 对待 EXISTS 子句时,仍然采用嵌套子查询的执行方式。如下面的 SQL 语句:

SELECT *
FROM   my_neighbor n 
       LEFT JOIN my_neighbor_apply sra 
              ON n.id = sra.neighbor_id 
                 AND sra.user_id = 'xxx' 
WHERE  n.topic_status < 4 
       AND EXISTS(SELECT 1 
                  FROM   message_info m 
                  WHERE  n.id = m.neighbor_id 
                         AND m.inuser = 'xxx') 
       AND n.topic_type <> 5 

执行计划为:

+----+--------------------+-------+------+-----+------------------------------------------+---------+-------+---------+ -----+
| id | select_type        | table | type | possible_keys     | key   | key_len | ref   | rows    | Extra   |
+----+--------------------+-------+------+ -----+------------------------------------------+---------+-------+---------+ -----+
|  1 | PRIMARY            | n     | ALL  |  | NULL     | NULL    | NULL  | 1086041 | Using where                   |
|  1 | PRIMARY            | sra   | ref  |  | idx_user_id | 123     | const |       1 | Using where          |
|  2 | DEPENDENT SUBQUERY | m     | ref  |  | idx_message_info   | 122     | const |       1 | Using index condition; Using where |
+----+--------------------+-------+------+ -----+------------------------------------------+---------+-------+---------+ -----+ 

去掉 exists 更改为 join,能够避免嵌套子查询,将执行时间从 1.93 秒降低为 1 毫秒。

SELECT *
FROM   my_neighbor n 
       INNER JOIN message_info m 
               ON n.id = m.neighbor_id 
                  AND m.inuser = 'xxx' 
       LEFT JOIN my_neighbor_apply sra 
              ON n.id = sra.neighbor_id 
                 AND sra.user_id = 'xxx' 
WHERE  n.topic_status < 4 
       AND n.topic_type <> 5 

新的执行计划:

+----+-------------+-------+--------+ -----+------------------------------------------+---------+ -----+------+ -----+
| id | select_type | table | type   | possible_keys     | key       | key_len | ref   | rows | Extra                 |
+----+-------------+-------+--------+ -----+------------------------------------------+---------+ -----+------+ -----+
|  1 | SIMPLE      | m     | ref    | | idx_message_info   | 122     | const    |    1 | Using index condition |
|  1 | SIMPLE      | n     | eq_ref | | PRIMARY   | 122     | ighbor_id |    1 | Using where      |
|  1 | SIMPLE      | sra   | ref    | | idx_user_id | 123     | const     |    1 | Using where           |
+----+-------------+-------+--------+ -----+------------------------------------------+---------+ -----+------+ -----+

6、条件下推

外部查询条件不能够下推到复杂的视图或子查询的情况有:

  • 聚合子查询;

  • 含有 LIMIT 的子查询;

  • UNION 或 UNION ALL 子查询;

  • 输出字段中的子查询;

如下面的语句,从执行计划可以看出其条件作用于聚合子查询之后

 SELECT * 
FROM   (SELECT target, 
               Count(*) 
        FROM   operation 
        GROUP  BY target) t 
WHERE  target = 'rm-xxxx' 
+----+-------------+------------+-------+---------------+-------------+---------+-------+------+-------------+
| id | select_type | table      | type  | possible_keys | key         | key_len | ref   | rows | Extra       |
+----+-------------+------------+-------+---------------+-------------+---------+-------+------+-------------+
|  1 | PRIMARY     | <derived2> | ref   | <auto_key0>   | <auto_key0> | 514     | const |    2 | Using where |
|  2 | DERIVED     | operation  | index | idx_4         | idx_4       | 519     | NULL  |   20 | Using index |
+----+-------------+------------+-------+---------------+-------------+---------+-------+------+-------------+

确定从语义上查询条件可以直接下推后,重写如下:

SELECT target, 
       Count(*) 
FROM   operation 
WHERE  target = 'rm-xxxx' 
GROUP  BY target

执行计划变为:

+----+-------------+-----------+------+---------------+-------+---------+-------+------+--------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+------+---------------+-------+---------+-------+------+--------------------+
| 1 | SIMPLE | operation | ref | idx_4 | idx_4 | 514 | const | 1 | Using where; Using index |
+----+-------------+-----------+------+---------------+-------+---------+-------+------+--------------------+


7、提前缩小范围

先上初始 SQL 语句:

SELECT * 
FROM   my_order o 
       LEFT JOIN my_userinfo u 
              ON o.uid = u.uid
       LEFT JOIN my_productinfo p 
              ON o.pid = p.pid 
WHERE  ( o.display = 0 ) 
       AND ( o.ostaus = 1 ) 
ORDER  BY o.selltime DESC 
LIMIT  0, 15 

该 SQL 语句原意是:先做一系列的左连接,然后排序取前 15 条记录。从执行计划也可以看出,最后一步估算排序记录数为 90 万,时间消耗为 12 秒。

+----+-------------+-------+--------+---------------+---------+---------+-----------------+--------+----------------------------------------------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref             | rows   | Extra                                              |
+----+-------------+-------+--------+---------------+---------+---------+-----------------+--------+----------------------------------------------------+
|  1 | SIMPLE      | o     | ALL    | NULL          | NULL    | NULL    | NULL            | 909119 | Using where; Using temporary; Using filesort       |
|  1 | SIMPLE      | u     | eq_ref | PRIMARY       | PRIMARY | 4       | o.uid |      1 | NULL                                               |
|  1 | SIMPLE      | p     | ALL    | PRIMARY       | NULL    | NULL    | NULL            |      6 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+-------+--------+---------------+---------+---------+-----------------+--------+----------------------------------------------------+

由于最后 WHERE 条件以及排序均针对最左主表,因此可以先对 my_order 排序提前缩小数据量再做左连接。SQL 重写后如下,执行时间缩小为 1 毫秒左右。

SELECT * 
FROM (
SELECT * 
FROM   my_order o 
WHERE  ( o.display = 0 ) 
       AND ( o.ostaus = 1 ) 
ORDER  BY o.selltime DESC 
LIMIT  0, 15
) o 
     LEFT JOIN my_userinfo u 
              ON o.uid = u.uid 
     LEFT JOIN my_productinfo p 
              ON o.pid = p.pid 
ORDER BY  o.selltime DESC
limit 0, 15

再检查执行计划:子查询物化后(select_type=DERIVED) 参与 JOIN。虽然估算行扫描仍然为 90 万,但是利用了索引以及 LIMIT 子句后,实际执行时间变得很小。

+----+-------------+------------+--------+---------------+---------+---------+-------+--------+----------------------------------------------------+
| id | select_type | table      | type   | possible_keys | key     | key_len | ref   | rows   | Extra                                              |
+----+-------------+------------+--------+---------------+---------+---------+-------+--------+----------------------------------------------------+
|  1 | PRIMARY     | <derived2> | ALL    | NULL          | NULL    | NULL    | NULL  |     15 | Using temporary; Using filesort                    |
|  1 | PRIMARY     | u          | eq_ref | PRIMARY       | PRIMARY | 4       | o.uid |      1 | NULL                                               |
|  1 | PRIMARY     | p          | ALL    | PRIMARY       | NULL    | NULL    | NULL  |      6 | Using where; Using join buffer (Block Nested Loop) |
|  2 | DERIVED     | o          | index  | NULL          | idx_1   | 5       | NULL  | 909112 | Using where                                        |
+----+-------------+------------+--------+---------------+---------+---------+-------+--------+----------------------------------------------------+

8、中间结果集下推

再来看下面这个已经初步优化过的例子 (左连接中的主表优先作用查询条件):

SELECT    a.*, 
          c.allocated 
FROM      ( 
              SELECT   resourceid 
              FROM     my_distribute d 
                   WHERE    isdelete = 0 
                   AND      cusmanagercode = '1234567' 
                   ORDER BY salecode limit 20) a 
LEFT JOIN 
          ( 
              SELECT   resourcesid, sum(ifnull(allocation, 0) * 12345) allocated 
              FROM     my_resources 
                   GROUP BY resourcesid) c 
ON        a.resourceid = c.resourcesid

那么该语句还存在其它问题吗?不难看出子查询 c 是全表聚合查询,在表数量特别大的情况下会导致整个语句的性能下降。

其实对于子查询 c,左连接最后结果集只关心能和主表 resourceid 能匹配的数据。因此我们可以重写语句如下,执行时间从原来的 2 秒下降到 2 毫秒。

SELECT    a.*, 
          c.allocated 
FROM      ( 
                   SELECT   resourceid 
                   FROM     my_distribute d 
                   WHERE    isdelete = 0 
                   AND      cusmanagercode = '1234567' 
                   ORDER BY salecode limit 20) a 
LEFT JOIN 
          ( 
                   SELECT   resourcesid, sum(ifnull(allocation, 0) * 12345) allocated 
                   FROM     my_resources r, 
                            ( 
                                     SELECT   resourceid 
                                     FROM     my_distribute d 
                                     WHERE    isdelete = 0 
                                     AND      cusmanagercode = '1234567' 
                                     ORDER BY salecode limit 20) a 
                   WHERE    r.resourcesid = a.resourcesid 
                   GROUP BY resourcesid) c 
ON        a.resourceid = c.resourcesid

但是子查询 a 在我们的 SQL 语句中出现了多次。这种写法不仅存在额外的开销,还使得整个语句显的繁杂。使用 WITH 语句再次重写:

WITH a AS 
( 
         SELECT   resourceid 
         FROM     my_distribute d 
         WHERE    isdelete = 0 
         AND      cusmanagercode = '1234567' 
         ORDER BY salecode limit 20)
SELECT    a.*, 
          c.allocated 
FROM      a 
LEFT JOIN 
          ( 
                   SELECT   resourcesid, sum(ifnull(allocation, 0) * 12345) allocated 
                   FROM     my_resources r, 
                            a 
                   WHERE    r.resourcesid = a.resourcesid 
                   GROUP BY resourcesid) c 
ON        a.resourceid = c.resourcesid

数据库编译器产生执行计划,决定着 SQL 的实际执行方式。但是编译器只是尽力服务,所有数据库的编译器都不是尽善尽美的。

上述提到的多数场景,在其它数据库中也存在性能问题。了解数据库编译器的特性,才能避规其短处,写出高性能的 SQL 语句。

程序员在设计数据模型以及编写 SQL 语句时,要把算法的思想或意识带进来。

编写复杂 SQL 语句要养成使用 WITH 语句的习惯。简洁且思路清晰的 SQL 语句也能减小数据库的负担 。


“不积跬步,无以至千里”,希望未来的你能:有梦为马 随处可栖!加油,少年!

关注公众号:「Java 知己」,每天更新Java知识哦,期待你的到来!

  • 发送「Group」,与 10 万程序员一起进步。
  • 发送「面试」,领取BATJ面试资料、面试视频攻略。
  • 发送「玩转算法」,领取《玩转算法》系列视频教程。
  • 千万不要发送「1024」...
    每日福利

Guess you like

Origin www.cnblogs.com/java-friend/p/11881539.html