8 ways of writing SQL that set traps for your colleagues: give them a try

Today I will share some common SQL pitfalls and the techniques for optimizing them.

LIMIT statement

  • Paginated queries are among the most common scenarios, and also where problems most often appear. For the simple statement below, a DBA's usual answer is to add a composite index on the type, name, and create_time columns. The filter and the sort can then both use the index, and performance improves quickly.

SELECT *
FROM operation
WHERE type = 'SQLStats'
AND name = 'SlowLog'
ORDER BY create_time
LIMIT 1000, 10;


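  • Fine, probably more than 90% of DBAs stop here. But when the LIMIT clause becomes "LIMIT 1000000, 10", programmers still complain: I only fetch 10 records, so why is it still slow?

  • The database does not know where the 1,000,000th record starts, so even with an index it has to count from the beginning. When this performance problem appears, in most cases the programmer has been lazy.

  • When paging through data in a front end, or exporting a large data set in batches, the maximum value from the previous page can be passed in as a query condition. The SQL is redesigned as follows: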

SELECT *
FROM operation
WHERE type = 'SQLStats'
AND name = 'SlowLog'
AND create_time > '2017-03-16 14:00:00'
ORDER BY create_time
LIMIT 10;

  • With the new design, the query time stays essentially constant and does not grow with the amount of data.
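
  • The paging idea can be tried out with a small sketch. It uses Python's built-in sqlite3 module only so that it runs anywhere; it checks that the two approaches return the same page, not how MySQL performs. The helper names (make_db, page_by_offset, page_after) and the integer timestamps are assumptions made for illustration.

```python
import sqlite3

def make_db(n=200):
    """Build an in-memory table shaped like the article's `operation` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE operation ("
        "id INTEGER PRIMARY KEY, type TEXT, name TEXT, create_time INTEGER)"
    )
    conn.executemany(
        "INSERT INTO operation (type, name, create_time) "
        "VALUES ('SQLStats', 'SlowLog', ?)",
        [(t,) for t in range(n)],  # unique, increasing timestamps (assumption)
    )
    conn.execute(
        "CREATE INDEX idx_type_name_time ON operation (type, name, create_time)"
    )
    return conn

def page_by_offset(conn, offset, size):
    """Offset paging: the engine still has to walk past `offset` rows."""
    return conn.execute(
        "SELECT id, create_time FROM operation "
        "WHERE type = 'SQLStats' AND name = 'SlowLog' "
        "ORDER BY create_time LIMIT ? OFFSET ?",
        (size, offset),
    ).fetchall()

def page_after(conn, last_create_time, size):
    """Keyset paging: seek past the last value seen on the previous page."""
    return conn.execute(
        "SELECT id, create_time FROM operation "
        "WHERE type = 'SQLStats' AND name = 'SlowLog' AND create_time > ? "
        "ORDER BY create_time LIMIT ?",
        (last_create_time, size),
    ).fetchall()

conn = make_db()
first_page = page_by_offset(conn, 0, 10)
# The last create_time of page 1 becomes the starting point of page 2.
second_page = page_after(conn, first_page[-1][1], 10)
```

  • Note that this only works cleanly when the sort column is unique. If create_time can repeat, a tie-breaker such as the primary key has to be added to both the ORDER BY and the condition, otherwise rows that share a timestamp can be skipped.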

Implicit conversion

  • A type mismatch between a query value and the column definition is another common mistake. Take the following statement:

  • mysql> explain extended SELECT *
         > FROM   my_balance b
         > WHERE b.bpn = 14000000123
         >       AND b.isverified IS NULL ;
    mysql> show warnings;
    | Warning | 1739 | Cannot use ref access on index 'bpn' due to type or collation conversion on field 'bpn'
    
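  • The column bpn is defined as varchar(20), and MySQL's strategy is to convert the string to a number before comparing. That amounts to applying a function to the table column, so the index cannot be used. Writing the value as a string literal (b.bpn = '14000000123') matches the column type and lets the index be used.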

  • Cases like this are often parameters filled in automatically by an application framework rather than something the programmer intended. Many frameworks today are very complex; they are convenient to use, but be careful that they do not dig a hole for you.
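
  • Why the index becomes useless can be illustrated outside the database. MySQL compares a string column with a numeric literal as floating-point numbers, so many different stored strings can equal the same number, and string order (the order of the index) differs from numeric order. The sketch below uses Python's float() as a stand-in for that conversion; MySQL's own string-to-number rules differ in detail, and the sample values and function names are assumptions.

```python
# Four different strings (four different index keys) with the same numeric value.
candidates = ["14000000123", "014000000123", "14000000123.0", "1.4000000123e10"]

def numeric_matches(values, target):
    """Return every string whose numeric value equals `target`."""
    return [v for v in values if float(v) == float(target)]

def same_order(values):
    """True if string order (index order) is the same as numeric order."""
    return sorted(values) == sorted(values, key=float)

# Every candidate matches the number, so a single index lookup on one
# string key could not find them all.
matches = numeric_matches(candidates, 14000000123)

# '10' sorts before '9' as a string but after it as a number.
ordered_alike = same_order(["9", "10"])
```

  • Since a numeric comparison can match keys scattered across the string index, the server has to convert and check each row instead of seeking to one key.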

Correlated update and delete

  • Although MySQL 5.6 introduced subquery materialization, note that the optimization currently applies only to query (SELECT) statements. UPDATE and DELETE statements have to be rewritten as a JOIN by hand.

  • For example, in the UPDATE statement below, MySQL actually executes a nested subquery (DEPENDENT SUBQUERY) that is re-evaluated for each candidate row, so it is easy to imagine how long it takes.

  • UPDATE operation o
    SET   status = 'applying'
    WHERE  o.id IN (SELECT id
                   FROM   (SELECT o.id,
                                   o.status
                           FROM   operation o
                           WHERE  o.group = 123
                                   AND o.status NOT IN ( 'done' )
                           ORDER  BY o.parent,
                                     o.id
                           LIMIT  1) t);
    
  • Execution plan:

  • +----+--------------------+-------+-------+---------------+---------+---------+-------+------+-----------------------------------------------------+
    | id | select_type       | table | type | possible_keys | key     | key_len | ref   | rows | Extra                                               |
    +----+--------------------+-------+-------+---------------+---------+---------+-------+------+-----------------------------------------------------+
    | 1  | PRIMARY           | o     | index |               | PRIMARY | 8       |       | 24   | Using where; Using temporary                       |
    | 2 | DEPENDENT SUBQUERY |       |       |               |         |         |       |     | Impossible WHERE noticed after reading const tables |
    | 3  | DERIVED           | o     | ref   | idx_2,idx_5   | idx_5   | 8       | const | 1   | Using where; Using filesort                         |
    +----+--------------------+-------+-------+---------------+---------+---------+-------+------+-----------------------------------------------------+
    
  • After the rewrite to JOIN, the subquery's select_type changes from DEPENDENT SUBQUERY to DERIVED, and execution time drops from 7 seconds to 2 milliseconds.

  • UPDATE operation o
           JOIN  (SELECT o.id,
                               o.status
                         FROM   operation o
                         WHERE  o.group = 123
                               AND o.status NOT IN ( 'done' )
                         ORDER  BY o.parent,
                                   o.id
                         LIMIT  1) t
             ON o.id = t.id
    SET   o.status = 'applying';
    
  • The execution plan simplifies to:

  • +----+-------------+-------+------+---------------+-------+---------+-------+------+-----------------------------------------------------+
    | id | select_type | table | type | possible_keys | key   | key_len | ref   | rows | Extra                                               |
    +----+-------------+-------+------+---------------+-------+---------+-------+------+-----------------------------------------------------+
    | 1  | PRIMARY     |       |     |               |       |         |       |     | Impossible WHERE noticed after reading const tables |
    | 2 | DERIVED     | o     | ref | idx_2,idx_5   | idx_5 | 8       | const | 1   | Using where; Using filesort                         |
    +----+-------------+-------+------+---------------+-------+---------+-------+------+-----------------------------------------------------+
    

Mixed sort

  • MySQL cannot use an index for a sort that mixes ASC and DESC (descending indexes only arrived in MySQL 8.0). In some scenarios, though, special techniques can still improve performance.

  • SELECT *
    FROM   my_order o
           INNER JOIN my_appraise a ON a.orderid = o.id
    ORDER  BY a.is_reply ASC,
             a.appraise_time DESC
    LIMIT  0, 20
    
  • The execution plan shows a full table scan:

  • +----+-------------+-------+--------+---------------+---------+---------+-----------+---------+----------------+
    | id | select_type | table | type   | possible_keys | key     | key_len | ref       | rows    | Extra          |
    +----+-------------+-------+--------+---------------+---------+---------+-----------+---------+----------------+
    | 1  | SIMPLE      | a     | ALL    | idx_orderid   | NULL    | NULL    | NULL      | 1967647 | Using filesort |
    | 1  | SIMPLE      | o     | eq_ref | PRIMARY       | PRIMARY | 122     | a.orderid | 1       | NULL           |
    +----+-------------+-------+--------+---------------+---------+---------+-----------+---------+----------------+
    
  • Since is_reply has only two states, 0 and 1, rewriting the query as follows reduces the execution time from 1.58 seconds to 2 milliseconds.

  • SELECT *
    FROM   ((SELECT *
             FROM   my_order o
                    INNER JOIN my_appraise a
                            ON a.orderid = o.id
                               AND is_reply = 0
             ORDER  BY appraise_time DESC
             LIMIT  0, 20)
            UNION ALL
            (SELECT *
             FROM   my_order o
                    INNER JOIN my_appraise a
                            ON a.orderid = o.id
                               AND is_reply = 1
             ORDER  BY appraise_time DESC
             LIMIT  0, 20)) t
    ORDER  BY  is_reply ASC,
              appraise_time DESC
    LIMIT  20;
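
  • Whether the split query really returns the same 20 rows can be checked on a toy data set. The sketch below runs on SQLite through Python's sqlite3 module, so the derived tables are written in a form SQLite accepts and explicit columns replace SELECT *. The table contents and the names ORIGINAL, TOP20, REWRITTEN, and run are assumptions for illustration.

```python
import sqlite3

def make_db(n=100):
    """Create toy my_order / my_appraise tables with n matching rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE my_order (id INTEGER PRIMARY KEY);
        CREATE TABLE my_appraise (
            orderid INTEGER, is_reply INTEGER, appraise_time INTEGER);
    """)
    conn.executemany("INSERT INTO my_order (id) VALUES (?)",
                     [(i,) for i in range(n)])
    conn.executemany(
        "INSERT INTO my_appraise VALUES (?, ?, ?)",
        [(i, i % 2, i) for i in range(n)],  # unique times keep the order deterministic
    )
    return conn

ORIGINAL = """
    SELECT o.id AS id, a.is_reply AS is_reply, a.appraise_time AS appraise_time
    FROM my_order o
    JOIN my_appraise a ON a.orderid = o.id
    ORDER BY a.is_reply ASC, a.appraise_time DESC
    LIMIT 20
"""

# One branch per is_reply value; each branch needs only a single sort direction.
TOP20 = """
    SELECT * FROM (
        SELECT o.id AS id, a.is_reply AS is_reply, a.appraise_time AS appraise_time
        FROM my_order o
        JOIN my_appraise a ON a.orderid = o.id AND a.is_reply = {flag}
        ORDER BY a.appraise_time DESC
        LIMIT 20)
"""

REWRITTEN = (
    "SELECT * FROM (" + TOP20.format(flag=0) + " UNION ALL "
    + TOP20.format(flag=1) + ") t "
    "ORDER BY is_reply ASC, appraise_time DESC LIMIT 20"
)

def run(conn, sql):
    """Execute a query and return all rows."""
    return conn.execute(sql).fetchall()
```

  • Each branch keeps its own LIMIT 20 because the is_reply = 0 branch may hold fewer than 20 rows, in which case the remainder has to come from the is_reply = 1 branch.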
    

EXISTS statement

  • MySQL still executes an EXISTS clause as a nested subquery, as in the following SQL statement:

  • SELECT *
    FROM   my_neighbor n
           LEFT JOIN my_neighbor_apply sra
                  ON n.id = sra.neighbor_id
                     AND sra.user_id = 'xxx'
    WHERE  n.topic_status < 4
           AND EXISTS(SELECT 1
                      FROM   message_info m
                      WHERE  n.id = m.neighbor_id
                             AND m.inuser = 'xxx')
           AND n.topic_type <> 5
    
  • The execution plan is:

  • +----+--------------------+-------+------+---------------+------------------+---------+-------+---------+------------------------------------+
    | id | select_type        | table | type | possible_keys | key              | key_len | ref   | rows    | Extra                              |
    +----+--------------------+-------+------+---------------+------------------+---------+-------+---------+------------------------------------+
    | 1  | PRIMARY            | n     | ALL  |               | NULL             | NULL    | NULL  | 1086041 | Using where                        |
    | 1  | PRIMARY            | sra   | ref  |               | idx_user_id      | 123     | const | 1       | Using where                        |
    | 2  | DEPENDENT SUBQUERY | m     | ref  |               | idx_message_info | 122     | const | 1       | Using index condition; Using where |
    +----+--------------------+-------+------+---------------+------------------+---------+-------+---------+------------------------------------+
    
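  • Replacing EXISTS with a JOIN avoids the nested subquery and reduces the execution time from 1.93 seconds to 1 millisecond. The two forms return the same rows only when each my_neighbor row matches at most one message_info row; otherwise the JOIN returns duplicates.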

  • SELECT *
    FROM   my_neighbor n
           INNER JOIN message_info m
                   ON n.id = m.neighbor_id
                      AND m.inuser = 'xxx'
           LEFT JOIN my_neighbor_apply sra
                  ON n.id = sra.neighbor_id
                     AND sra.user_id = 'xxx'
    WHERE  n.topic_status < 4
           AND n.topic_type <> 5
    
  • New execution plan:

  • +----+-------------+-------+--------+---------------+------------------+---------+-----------+------+-----------------------+
    | id | select_type | table | type   | possible_keys | key              | key_len | ref       | rows | Extra                 |
    +----+-------------+-------+--------+---------------+------------------+---------+-----------+------+-----------------------+
    | 1  | SIMPLE      | m     | ref    |               | idx_message_info | 122     | const     | 1    | Using index condition |
    | 1  | SIMPLE      | n     | eq_ref |               | PRIMARY          | 122     | ighbor_id | 1    | Using where           |
    | 1  | SIMPLE      | sra   | ref    |               | idx_user_id      | 123     | const     | 1    | Using where           |
    +----+-------------+-------+--------+---------------+------------------+---------+-----------+------+-----------------------+
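
  • Before swapping EXISTS for a JOIN it is worth checking that the join cannot multiply rows. The sketch below, run on SQLite through Python's sqlite3 module, compares the two forms on made-up data; the LEFT JOIN to my_neighbor_apply is left out to keep it short, and the sample rows and the names WITH_EXISTS, WITH_JOIN, and ids are assumptions.

```python
import sqlite3

def make_db(messages):
    """Three neighbors plus the given (neighbor_id, inuser) message rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE my_neighbor (
            id INTEGER PRIMARY KEY, topic_status INTEGER, topic_type INTEGER);
        CREATE TABLE message_info (neighbor_id INTEGER, inuser TEXT);
        INSERT INTO my_neighbor VALUES (1, 1, 1), (2, 1, 1), (3, 9, 1);
    """)
    conn.executemany("INSERT INTO message_info VALUES (?, ?)", messages)
    return conn

WITH_EXISTS = """
    SELECT n.id FROM my_neighbor n
    WHERE n.topic_status < 4 AND n.topic_type <> 5
      AND EXISTS (SELECT 1 FROM message_info m
                  WHERE m.neighbor_id = n.id AND m.inuser = 'xxx')
    ORDER BY n.id
"""

WITH_JOIN = """
    SELECT n.id FROM my_neighbor n
    JOIN message_info m ON m.neighbor_id = n.id AND m.inuser = 'xxx'
    WHERE n.topic_status < 4 AND n.topic_type <> 5
    ORDER BY n.id
"""

def ids(conn, sql):
    """Return the id column of the result as a list."""
    return [row[0] for row in conn.execute(sql)]

# At most one matching message per neighbor: both forms agree.
unique_db = make_db([(1, "xxx"), (2, "yyy"), (3, "xxx")])
# Two matching messages for neighbor 1: the JOIN returns it twice.
duplicate_db = make_db([(1, "xxx"), (1, "xxx")])
```

  • If duplicates are possible, the JOIN form needs a DISTINCT or a pre-aggregated derived table to stay equivalent, which can eat into the performance gain.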
    

Condition pushdown

  • An outer query condition cannot be pushed down into a complex view or subquery in the following cases:

  • 1. Aggregate subqueries;

  • 2. Subqueries containing LIMIT;

  • 3. UNION or UNION ALL subqueries;

  • 4. Subqueries in the output fields.

  • In the following statement, the execution plan shows that the condition is applied only after the aggregate subquery:

  • SELECT *
    FROM   (SELECT target,
                   Count(*)
            FROM   operation
            GROUP  BY target) t
    WHERE  target = 'rm-xxxx'
    
  • +----+-------------+------------+-------+---------------+-------------+---------+-------+------+-------------+
    | id | select_type | table      | type  | possible_keys | key         | key_len | ref   | rows | Extra |
    +----+-------------+------------+-------+---------------+-------------+---------+-------+------+-------------+
    | 1 | PRIMARY | <derived2> | ref   | <auto_key0> | <auto_key0> | 514     | const | 2 | Using where |
    | 2 | DERIVED | operation | index | idx_4 | idx_4 | 519     | NULL  | 20 | Using index |
    +----+-------------+------------+-------+---------------+-------------+---------+-------+------+-------------+
    
  • After confirming that pushing the condition down does not change the meaning of the query, rewrite it as follows:

  • SELECT target,
           Count(*)
    FROM   operation
    WHERE  target = 'rm-xxxx'
    GROUP  BY target
    
  • The execution plan becomes:

  • +----+-------------+-----------+------+---------------+-------+---------+-------+------+--------------------+
    | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
    +----+-------------+-----------+------+---------------+-------+---------+-------+------+--------------------+
    | 1 | SIMPLE | operation | ref | idx_4 | idx_4 | 514 | const | 1 | Using where; Using index |
    +----+-------------+-----------+------+---------------+-------+---------+-------+------+--------------------+
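
  • The "confirm the semantics first" step can be done mechanically on sample data. The sketch below uses SQLite through Python's sqlite3 module to compare both forms; the sample rows and the names FILTER_OUTSIDE, FILTER_PUSHED_DOWN, and counts are assumptions for illustration.

```python
import sqlite3

def make_db():
    """A tiny `operation` table with two targets."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE operation (id INTEGER PRIMARY KEY, target TEXT)")
    conn.executemany(
        "INSERT INTO operation (target) VALUES (?)",
        [("rm-xxxx",), ("rm-xxxx",), ("rm-yyyy",)],
    )
    return conn

# Filter applied to the result of the aggregate subquery.
FILTER_OUTSIDE = """
    SELECT * FROM (SELECT target, COUNT(*) FROM operation GROUP BY target) t
    WHERE target = ?
"""

# Filter applied before aggregation, on the grouping column itself.
FILTER_PUSHED_DOWN = """
    SELECT target, COUNT(*) FROM operation WHERE target = ? GROUP BY target
"""

def counts(conn, sql, target):
    """Run one of the two queries for the given target."""
    return conn.execute(sql, (target,)).fetchall()
```

  • The pushdown is safe here because the condition is on the GROUP BY column, so it removes whole groups. A condition on the aggregated value (for example on the count) filters after grouping and cannot be moved into the WHERE clause.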
    

Narrow down early

  • Start with the initial SQL statement:

  • SELECT *
    FROM   my_order o
           LEFT JOIN my_userinfo u
                  ON o.uid = u.uid
           LEFT JOIN my_productinfo p
                  ON o.pid = p.pid
    WHERE  ( o.display = 0 )
           AND ( o.ostaus = 1 )
    ORDER  BY o.selltime DESC
    LIMIT  0, 15
    
  • The literal meaning of the statement is: perform the series of left joins first, then sort and take the first 15 records. The execution plan also shows that the final sort step is estimated at 900,000 rows, and the query takes 12 seconds.

  • +----+-------------+-------+--------+---------------+---------+---------+-----------------+--------+----------------------------------------------------+
    | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
    +----+-------------+-------+--------+---------------+---------+---------+-----------------+--------+----------------------------------------------------+
    |  1 | SIMPLE | o | ALL | NULL | NULL | NULL | NULL | 909119 | Using where; Using temporary; Using filesort |
    | 1 | SIMPLE | u | eq_ref | PRIMARY | PRIMARY | 4 | o.uid | 1 | NULL |
    |  1 | SIMPLE | p | ALL | PRIMARY | NULL | NULL | NULL |      6 | Using where; Using join buffer (Block Nested Loop) |
    +----+-------------+-------+--------+---------------+---------+---------+-----------------+--------+----------------------------------------------------+
    
  • Because the WHERE conditions and the sort all apply to the leftmost (main) table, my_order can be sorted and trimmed first, before the left joins are done. After rewriting the SQL as follows, execution time drops to about 1 millisecond.

  • SELECT *
    FROM (
    SELECT *
    FROM   my_order o
    WHERE  ( o.display = 0 )
           AND ( o.ostaus = 1 )
    ORDER  BY o.selltime DESC
    LIMIT  0, 15
    ) o
         LEFT JOIN my_userinfo u
                  ON o.uid = u.uid
         LEFT JOIN my_productinfo p
                  ON o.pid = p.pid
    ORDER BY  o.selltime DESC
    limit 0, 15
    
  • Check the execution plan again: the subquery is materialized (select_type=DERIVED) and then takes part in the JOIN. Although the estimated number of rows scanned is still 900,000, the index and the LIMIT clause make the actual execution time very small.

  • +----+-------------+------------+--------+---------------+---------+---------+-------+--------+----------------------------------------------------+
    | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
    +----+-------------+------------+--------+---------------+---------+---------+-------+--------+----------------------------------------------------+
    |  1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL |     15 | Using temporary; Using filesort |
    | 1 | PRIMARY | u | eq_ref | PRIMARY | PRIMARY | 4 | o.uid | 1 | NULL |
    |  1 | PRIMARY | p | ALL | PRIMARY | NULL | NULL | NULL |      6 | Using where; Using join buffer (Block Nested Loop) |
    | 2 | DERIVED | o | index | NULL | idx_1 | 5 | NULL | 909112 | Using where |
    +----+-------------+------------+--------+---------------+---------+---------+-------+--------+----------------------------------------------------+
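
  • The rewrite relies on the left joins neither adding nor removing rows from my_order. A small check on SQLite through Python's sqlite3 module is sketched below; the extra columns (uname, pname), the sample data, and the names JOIN_THEN_LIMIT, LIMIT_THEN_JOIN, and run are assumptions for illustration.

```python
import sqlite3

def make_db(n=50):
    """Toy versions of my_order, my_userinfo and my_productinfo."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE my_userinfo (uid INTEGER PRIMARY KEY, uname TEXT);
        CREATE TABLE my_productinfo (pid INTEGER PRIMARY KEY, pname TEXT);
        CREATE TABLE my_order (
            id INTEGER PRIMARY KEY, uid INTEGER, pid INTEGER,
            display INTEGER, ostaus INTEGER, selltime INTEGER);
    """)
    conn.executemany("INSERT INTO my_userinfo VALUES (?, ?)",
                     [(u, f"user{u}") for u in range(5)])
    conn.executemany("INSERT INTO my_productinfo VALUES (?, ?)",
                     [(p, f"prod{p}") for p in range(3)])
    conn.executemany(
        "INSERT INTO my_order VALUES (?, ?, ?, 0, 1, ?)",
        [(i, i % 7, i % 3, i) for i in range(n)],  # uid 5 and 6 have no user row
    )
    return conn

JOIN_THEN_LIMIT = """
    SELECT o.id, u.uname, p.pname
    FROM my_order o
    LEFT JOIN my_userinfo u ON o.uid = u.uid
    LEFT JOIN my_productinfo p ON o.pid = p.pid
    WHERE o.display = 0 AND o.ostaus = 1
    ORDER BY o.selltime DESC LIMIT 15
"""

LIMIT_THEN_JOIN = """
    SELECT o.id, u.uname, p.pname
    FROM (SELECT * FROM my_order
          WHERE display = 0 AND ostaus = 1
          ORDER BY selltime DESC LIMIT 15) o
    LEFT JOIN my_userinfo u ON o.uid = u.uid
    LEFT JOIN my_productinfo p ON o.pid = p.pid
    ORDER BY o.selltime DESC LIMIT 15
"""

def run(conn, sql):
    """Execute a query and return all rows."""
    return conn.execute(sql).fetchall()
```

  • The two forms agree here because both joins are LEFT JOINs on a unique key, so every order row survives exactly once. With an INNER JOIN, or a join key that is not unique, limiting first could change the result.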
    

Intermediate result set pushdown

  • Now look at the following example, which has already had a first round of optimization (the query conditions are applied to the main table of the left join first):

  • SELECT    a.*,
              c.allocated
    FROM      (
                  SELECT   resourceid
                  FROM     my_distribute d
                       WHERE    isdelete = 0
                       AND      cusmanagercode = '1234567'
                       ORDER BY salecode limit 20) a
    LEFT JOIN
              (
                  SELECT   resourcesid, sum(ifnull(allocation, 0) * 12345) allocated
                  FROM     my_resources
                       GROUP BY resourcesid) c
    ON        a.resourceid = c.resourcesid
    
  • Are there any other problems with this statement? It is not hard to see that subquery c is an aggregate over the full table, which drags down the performance of the whole statement when the table is particularly large.

  • In fact, for subquery c, the final result of the left join only needs the rows that match a resourceid from the main table. So the statement can be rewritten as follows, and the execution time drops from 2 seconds to 2 milliseconds.

  • SELECT    a.*,
              c.allocated
    FROM      (
                       SELECT   resourceid
                       FROM     my_distribute d
                       WHERE    isdelete = 0
                       AND      cusmanagercode = '1234567'
                       ORDER BY salecode limit 20) a
    LEFT JOIN
              (
                       SELECT   resourcesid, sum(ifnull(allocation, 0) * 12345) allocated
                       FROM     my_resources r,
                                (
                                         SELECT   resourceid
                                         FROM     my_distribute d
                                         WHERE    isdelete = 0
                                         AND      cusmanagercode = '1234567'
                                         ORDER BY salecode limit 20) a
                       WHERE    r.resourcesid = a.resourceid
                       GROUP BY resourcesid) c
    ON        a.resourceid = c.resourcesid
    
  • But subquery a now appears several times in the SQL statement. Writing it this way adds overhead and also makes the whole statement noticeably more complicated. Rewrite it once more with a WITH clause (common table expressions require MySQL 8.0 or later):

  • WITH a AS
    (
             SELECT   resourceid
             FROM     my_distribute d
             WHERE    isdelete = 0
             AND      cusmanagercode = '1234567'
             ORDER BY salecode limit 20)
    SELECT    a.*,
              c.allocated
    FROM      a
    LEFT JOIN
              (
                       SELECT   resourcesid, sum(ifnull(allocation, 0) * 12345) allocated
                       FROM     my_resources r,
                                a
                       WHERE    r.resourcesid = a.resourceid
                       GROUP BY resourcesid) c
    ON        a.resourceid = c.resourcesid
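
  • The WITH form can be compared against the original full-table aggregate on sample data. The sketch below runs on SQLite through Python's sqlite3 module, which also supports WITH; the sample rows and the names FULL_AGGREGATE, WITH_CTE, and run are assumptions for illustration.

```python
import sqlite3

def make_db():
    """Toy my_distribute / my_resources tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE my_distribute (
            resourceid INTEGER, isdelete INTEGER,
            cusmanagercode TEXT, salecode INTEGER);
        CREATE TABLE my_resources (resourcesid INTEGER, allocation INTEGER);
        INSERT INTO my_distribute VALUES
            (1, 0, '1234567', 10), (2, 0, '1234567', 20),
            (3, 1, '1234567', 30), (4, 0, 'other', 40);
        INSERT INTO my_resources VALUES (1, 2), (1, 3), (2, NULL), (4, 7);
    """)
    return conn

# Aggregates every row of my_resources, then joins.
FULL_AGGREGATE = """
    SELECT a.resourceid, c.allocated
    FROM (SELECT resourceid FROM my_distribute
          WHERE isdelete = 0 AND cusmanagercode = '1234567'
          ORDER BY salecode LIMIT 20) a
    LEFT JOIN (SELECT resourcesid,
                      SUM(IFNULL(allocation, 0) * 12345) AS allocated
               FROM my_resources
               GROUP BY resourcesid) c
           ON a.resourceid = c.resourcesid
    ORDER BY a.resourceid
"""

# Aggregates only the rows that match the 20 resource ids in `a`.
WITH_CTE = """
    WITH a AS (SELECT resourceid FROM my_distribute
               WHERE isdelete = 0 AND cusmanagercode = '1234567'
               ORDER BY salecode LIMIT 20)
    SELECT a.resourceid, c.allocated
    FROM a
    LEFT JOIN (SELECT r.resourcesid,
                      SUM(IFNULL(r.allocation, 0) * 12345) AS allocated
               FROM my_resources r
               JOIN a ON r.resourcesid = a.resourceid
               GROUP BY r.resourcesid) c
           ON a.resourceid = c.resourcesid
    ORDER BY a.resourceid
"""

def run(conn, sql):
    """Execute a query and return all rows."""
    return conn.execute(sql).fetchall()
```

  • One thing to watch: if resourceid can repeat within a, the join inside c counts each my_resources row once per repeat and inflates the sum. In that case a should be de-duplicated before it is joined inside the aggregate.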
    

Summary

  • The database's query optimizer generates the execution plan, which determines how the SQL is actually executed. But the optimizer only does its best, and no database's optimizer is perfect.
  • Most of the scenarios above cause performance problems in other databases too. Only by understanding how the optimizer behaves can you work around its weaknesses and write high-performance SQL.
  • When designing data models and writing SQL statements, programmers should bring algorithmic thinking to the task.
  • When writing complex SQL statements, get into the habit of using the WITH clause. Concise, clearly structured SQL also reduces the load on the database.

Origin blog.csdn.net/Andrew_Chenwq/article/details/129974923