SQL Essentials

  • 1. LIMIT statement

  • 2. Implicit conversion

  • 3. Association update and delete

  • 4. Mixed sorting

  • 5. EXISTS statement

  • 6. Conditional push down

  • 7. Narrow down the scope in advance

  • 8. Push down the intermediate result set

  • Summarize

The execution sequence of sql statements:

FROM
<left_table>

ON
<join_condition>

<join_type>
 JOIN
<right_table>

WHERE
<where_condition>

GROUP BY
<group_by_list>

HAVING
<having_condition>

SELECT

DISTINCT
<select_list>

ORDER BY
<order_by_condition>

LIMIT
<limit_number>

1. LIMIT statement

Pagination query is one of the most commonly used scenarios, but it is also usually the place where problems are most likely to occur. For example, for the following simple statement, the general DBA thinks of adding a composite index to the type, name, and create_time fields. In this way, conditional sorting can effectively use the index, and the performance is rapidly improved.

SELECT *
FROM   operation
WHERE  type = 'SQLStats'
       AND name = 'SlowLog'
ORDER  BY create_time
LIMIT  1000, 10;

Well, maybe more than 90% of DBAs solve this problem and stop here. But when the LIMIT clause becomes "LIMIT 1000000,10", the programmer will still complain: Why is it still slow when I only fetch 10 records?

You must know that the database does not know where the 1000000th record starts, even if there is an index, it needs to be calculated from scratch. This kind of performance problem occurs, in most cases, the programmer is lazy.

In scenarios such as front-end data browsing and page turning, or batch export of big data, the maximum value of the previous page can be used as a parameter as a query condition. The SQL redesign is as follows:

SELECT   *
FROM     operation
WHERE    type = 'SQLStats'
AND      name = 'SlowLog'
AND      create_time > '2017-03-16 14:00:00'
ORDER BY create_time limit 10;

Under the new design, the query time is basically fixed and will not change as the amount of data grows.

2. Implicit conversion

Type mismatch between query variable and field definition in SQL statement is another common mistake. For example the following statement:

mysql> explain extended SELECT *
     > FROM   my_balance b
     > WHERE  b.bpn = 14000000123
     >       AND b.isverified IS NULL ;
mysql> show warnings;
| Warning | 1739 | Cannot use ref access on index 'bpn' due to type or collation conversion on field 'bpn'

Among them, the field bpn is defined as varchar(20), and MySQL's strategy is to convert strings into numbers before comparison. The function acts on the table field, and the index is invalid.

The above cases may be auto-filled parameters of the application framework, rather than the original intention of the programmer. Nowadays, there are many application frameworks that are very complicated. While it is convenient to use, be careful that it may dig a hole for yourself.

3. Association update and delete

Although MySQL5.6 introduces the materialization feature, special attention needs to be paid to its current optimization only for query statements. For updating or deleting, it needs to be manually rewritten as JOIN.

For example, in the following UPDATE statement, MySQL actually executes a loop/nested subquery (DEPENDENT SUBQUERY), and its execution time can be imagined.

UPDATE operation o
SET    status = 'applying'
WHERE  o.id IN (SELECT id
                FROM   (SELECT o.id,
                               o.status
                        FROM   operation o
                        WHERE  o.group = 123
                               AND o.status NOT IN ( 'done' )
                        ORDER  BY o.parent,
                                  o.id
                        LIMIT  1) t);

Implementation plan:

+----+--------------------+-------+-------+---------------+---------+---------+-------+------+-----------------------------------------------------+
| id | select_type        | table | type  | possible_keys | key     | key_len | ref   | rows | Extra                                               |
+----+--------------------+-------+-------+---------------+---------+---------+-------+------+-----------------------------------------------------+
| 1  | PRIMARY            | o     | index |               | PRIMARY | 8       |       | 24   | Using where; Using temporary                        |
| 2  | DEPENDENT SUBQUERY |       |       |               |         |         |       |      | Impossible WHERE noticed after reading const tables |
| 3  | DERIVED            | o     | ref   | idx_2,idx_5   | idx_5   | 8       | const | 1    | Using where; Using filesort                         |
+----+--------------------+-------+-------+---------------+---------+---------+-------+------+-----------------------------------------------------+

After rewriting to JOIN, the selection mode of the subquery changes from DEPENDENT SUBQUERY to DERIVED, and the execution speed is greatly accelerated, from 7 seconds to 2 milliseconds.

UPDATE operation o
       JOIN  (SELECT o.id,
                            o.status
                     FROM   operation o
                     WHERE  o.group = 123
                            AND o.status NOT IN ( 'done' )
                     ORDER  BY o.parent,
                               o.id
                     LIMIT  1) t
         ON o.id = t.id
SET    status = 'applying'

The execution plan simplifies to:

+----+-------------+-------+------+---------------+-------+---------+-------+------+-----------------------------------------------------+
| id | select_type | table | type | possible_keys | key   | key_len | ref   | rows | Extra                                               |
+----+-------------+-------+------+---------------+-------+---------+-------+------+-----------------------------------------------------+
| 1  | PRIMARY     |       |      |               |       |         |       |      | Impossible WHERE noticed after reading const tables |
| 2  | DERIVED     | o     | ref  | idx_2,idx_5   | idx_5 | 8       | const | 1    | Using where; Using filesort                         |
+----+-------------+-------+------+---------------+-------+---------+-------+------+-----------------------------------------------------+

4. Mixed sorting

MySQL cannot use indexes for mixed sorting. But in some scenarios, there are still opportunities to use special methods to improve performance.

SELECT *
FROM   my_order o
       INNER JOIN my_appraise a ON a.orderid = o.id
ORDER  BY a.is_reply ASC,
          a.appraise_time DESC
LIMIT  0, 20

The execution plan shows a full table scan:

+----+-------------+-------+--------+-------------+---------+---------+---------------+---------+-+
| id | select_type | table | type   | possible_keys     | key     | key_len | ref      | rows    | Extra
+----+-------------+-------+--------+-------------+---------+---------+---------------+---------+-+
|  1 | SIMPLE      | a     | ALL    | idx_orderid | NULL    | NULL    | NULL    | 1967647 | Using filesort |
|  1 | SIMPLE      | o     | eq_ref | PRIMARY     | PRIMARY | 122     | a.orderid |       1 | NULL           |
+----+-------------+-------+--------+---------+---------+---------+-----------------+---------+-+

Since is_reply only has two states of 0 and 1, after we rewrite it according to the following method, the execution time is reduced from 1.58 seconds to 2 milliseconds.

SELECT *
FROM   ((SELECT *
         FROM   my_order o
                INNER JOIN my_appraise a
                        ON a.orderid = o.id
                           AND is_reply = 0
         ORDER  BY appraise_time DESC
         LIMIT  0, 20)
        UNION ALL
        (SELECT *
         FROM   my_order o
                INNER JOIN my_appraise a
                        ON a.orderid = o.id
                           AND is_reply = 1
         ORDER  BY appraise_time DESC
         LIMIT  0, 20)) t
ORDER  BY  is_reply ASC,
          appraisetime DESC
LIMIT  20;

5. EXISTS statement

When MySQL treats the EXISTS clause, it still uses the execution method of nested subqueries. Such as the following SQL statement:

SELECT *
FROM   my_neighbor n
       LEFT JOIN my_neighbor_apply sra
              ON n.id = sra.neighbor_id
                 AND sra.user_id = 'xxx'
WHERE  n.topic_status < 4
       AND EXISTS(SELECT 1
                  FROM   message_info m
                  WHERE  n.id = m.neighbor_id
                         AND m.inuser = 'xxx')
       AND n.topic_type <> 5

The execution plan is:

+----+--------------------+-------+------+-----+------------------------------------------+---------+-------+---------+ -----+
| id | select_type        | table | type | possible_keys     | key   | key_len | ref   | rows    | Extra   |
+----+--------------------+-------+------+ -----+------------------------------------------+---------+-------+---------+ -----+
|  1 | PRIMARY            | n     | ALL  |  | NULL     | NULL    | NULL  | 1086041 | Using where                   |
|  1 | PRIMARY            | sra   | ref  |  | idx_user_id | 123     | const |       1 | Using where          |
|  2 | DEPENDENT SUBQUERY | m     | ref  |  | idx_message_info   | 122     | const |       1 | Using index condition; Using where |
+----+--------------------+-------+------+ -----+------------------------------------------+---------+-------+---------+ -----+

Remove exists and change to join, which can avoid nested subqueries and reduce the execution time from 1.93 seconds to 1 millisecond.

SELECT *
FROM   my_neighbor n
       INNER JOIN message_info m
               ON n.id = m.neighbor_id
                  AND m.inuser = 'xxx'
       LEFT JOIN my_neighbor_apply sra
              ON n.id = sra.neighbor_id
                 AND sra.user_id = 'xxx'
WHERE  n.topic_status < 4
       AND n.topic_type <> 5

New execution plan:

+----+-------------+-------+--------+ -----+------------------------------------------+---------+ -----+------+ -----+
| id | select_type | table | type   | possible_keys     | key       | key_len | ref   | rows | Extra                 |
+----+-------------+-------+--------+ -----+------------------------------------------+---------+ -----+------+ -----+
|  1 | SIMPLE      | m     | ref    | | idx_message_info   | 122     | const    |    1 | Using index condition |
|  1 | SIMPLE      | n     | eq_ref | | PRIMARY   | 122     | ighbor_id |    1 | Using where      |
|  1 | SIMPLE      | sra   | ref    | | idx_user_id | 123     | const     |    1 | Using where           |
+----+-------------+-------+--------+ -----+------------------------------------------+---------+ -----+------+ -----+

6. Conditional push down

Situations where external query conditions cannot be pushed down to complex views or subqueries include:

1. Aggregation subquery; 2. Subquery containing LIMIT; 3. UNION or UNION ALL subquery; 4. Subquery in the output field;

As in the following statement, it can be seen from the execution plan that its condition is applied after the aggregation subquery:

SELECT *
FROM   (SELECT target,
               Count(*)
        FROM   operation
        GROUP  BY target) t
WHERE  target = 'rm-xxxx'

+----+-------------+------------+-------+---------------+-------------+---------+-------+------+-------------+
| id | select_type | table      | type  | possible_keys | key         | key_len | ref   | rows | Extra       |
+----+-------------+------------+-------+---------------+-------------+---------+-------+------+-------------+
|  1 | PRIMARY     | <derived2> | ref   | <auto_key0>   | <auto_key0> | 514     | const |    2 | Using where |
|  2 | DERIVED     | operation  | index | idx_4         | idx_4       | 519     | NULL  |   20 | Using index |
+----+-------------+------------+-------+---------------+-------------+---------+-------+------+-------------+

After confirming that the semantic query condition can be directly pushed down, rewrite it as follows:

SELECT target,
       Count(*)
FROM   operation
WHERE  target = 'rm-xxxx'
GROUP  BY target

The execution plan becomes:

+----+-------------+-----------+------+---------------+-------+---------+-------+------+--------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+------+---------------+-------+---------+-------+------+--------------------+
| 1 | SIMPLE | operation | ref | idx_4 | idx_4 | 514 | const | 1 | Using where; Using index |
+----+-------------+-----------+------+---------------+-------+---------+-------+------+--------------------+

Please refer to the previous article for a detailed explanation of MySQL's external conditions that cannot be pushed down: MySQL · Performance Optimization · Conditional Pushdown to Materialized Tables http://mysql.taobao.org/monthly/2016/07/08

7. Narrow down the scope in advance

First on the initial SQL statement:

SELECT *
FROM   my_order o
       LEFT JOIN my_userinfo u
              ON o.uid = u.uid
       LEFT JOIN my_productinfo p
              ON o.pid = p.pid
WHERE  ( o.display = 0 )
       AND ( o.ostaus = 1 )
ORDER  BY o.selltime DESC
LIMIT  0, 15

The original meaning of the SQL statement is: first do a series of left joins, and then sort the first 15 records. It can also be seen from the execution plan that the estimated number of sorted records in the last step is 900,000, and the time consumption is 12 seconds.

+----+-------------+-------+--------+---------------+---------+---------+-----------------+--------+----------------------------------------------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref             | rows   | Extra                                              |
+----+-------------+-------+--------+---------------+---------+---------+-----------------+--------+----------------------------------------------------+
|  1 | SIMPLE      | o     | ALL    | NULL          | NULL    | NULL    | NULL            | 909119 | Using where; Using temporary; Using filesort       |
|  1 | SIMPLE      | u     | eq_ref | PRIMARY       | PRIMARY | 4       | o.uid |      1 | NULL                                               |
|  1 | SIMPLE      | p     | ALL    | PRIMARY       | NULL    | NULL    | NULL            |      6 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+-------+--------+---------------+---------+---------+-----------------+--------+----------------------------------------------------+

Since the last WHERE condition and sorting are all for the leftmost main table, you can first sort my_order and reduce the amount of data in advance before doing left join. After the SQL is rewritten as follows, the execution time is reduced to about 1 millisecond.

SELECT *
FROM (
SELECT *
FROM   my_order o
WHERE  ( o.display = 0 )
       AND ( o.ostaus = 1 )
ORDER  BY o.selltime DESC
LIMIT  0, 15
) o
     LEFT JOIN my_userinfo u
              ON o.uid = u.uid
     LEFT JOIN my_productinfo p
              ON o.pid = p.pid
ORDER BY  o.selltime DESC
limit 0, 15

Check the execution plan again: After the subquery is materialized (select_type=DERIVED), participate in JOIN. Although the estimated row scan is still 900,000, the actual execution time becomes very small after using the index and the LIMIT clause.

+----+-------------+------------+--------+---------------+---------+---------+-------+--------+----------------------------------------------------+
| id | select_type | table      | type   | possible_keys | key     | key_len | ref   | rows   | Extra                                              |
+----+-------------+------------+--------+---------------+---------+---------+-------+--------+----------------------------------------------------+
|  1 | PRIMARY     | <derived2> | ALL    | NULL          | NULL    | NULL    | NULL  |     15 | Using temporary; Using filesort                    |
|  1 | PRIMARY     | u          | eq_ref | PRIMARY       | PRIMARY | 4       | o.uid |      1 | NULL                                               |
|  1 | PRIMARY     | p          | ALL    | PRIMARY       | NULL    | NULL    | NULL  |      6 | Using where; Using join buffer (Block Nested Loop) |
|  2 | DERIVED     | o          | index  | NULL          | idx_1   | 5       | NULL  | 909112 | Using where                                        |
+----+-------------+------------+--------+---------------+---------+---------+-------+--------+----------------------------------------------------+

8. Push down the intermediate result set

Let's look at the following example that has been initially optimized (the main table in the left join takes priority as the query condition):

SELECT    a.*,
          c.allocated
FROM      (
              SELECT   resourceid
              FROM     my_distribute d
                   WHERE    isdelete = 0
                   AND      cusmanagercode = '1234567'
                   ORDER BY salecode limit 20) a
LEFT JOIN
          (
              SELECT   resourcesid, sum(ifnull(allocation, 0) * 12345) allocated
              FROM     my_resources
                   GROUP BY resourcesid) c
ON        a.resourceid = c.resourcesid

So are there any other problems with this statement? It is not difficult to see that the subquery c is a full-table aggregation query, which will lead to a decrease in the performance of the entire statement when the number of tables is particularly large.

In fact, for subquery c, the final result set of the left join only cares about the data that can match the resourceid of the main table. So we can rewrite the statement as follows, and the execution time drops from 2 seconds to 2 milliseconds.

SELECT    a.*,
          c.allocated
FROM      (
                   SELECT   resourceid
                   FROM     my_distribute d
                   WHERE    isdelete = 0
                   AND      cusmanagercode = '1234567'
                   ORDER BY salecode limit 20) a
LEFT JOIN
          (
                   SELECT   resourcesid, sum(ifnull(allocation, 0) * 12345) allocated
                   FROM     my_resources r,
                            (
                                     SELECT   resourceid
                                     FROM     my_distribute d
                                     WHERE    isdelete = 0
                                     AND      cusmanagercode = '1234567'
                                     ORDER BY salecode limit 20) a
                   WHERE    r.resourcesid = a.resourcesid
                   GROUP BY resourcesid) c
ON        a.resourceid = c.resourcesid

But the subquery a appears multiple times in our SQL statement. This way of writing not only has additional overhead, but also makes the entire statement significantly complicated. Rewrite again using the WITH statement:

WITH a AS
(
         SELECT   resourceid
         FROM     my_distribute d
         WHERE    isdelete = 0
         AND      cusmanagercode = '1234567'
         ORDER BY salecode limit 20)
SELECT    a.*,
          c.allocated
FROM      a
LEFT JOIN
          (
                   SELECT   resourcesid, sum(ifnull(allocation, 0) * 12345) allocated
                   FROM     my_resources r,
                            a
                   WHERE    r.resourcesid = a.resourcesid
                   GROUP BY resourcesid) c
ON        a.resourceid = c.resourcesid

Summarize

The database compiler generates an execution plan, which determines how SQL is actually executed. But the compiler is just doing its best, and not all database compilers are perfect.

Most of the scenarios mentioned above also have performance problems in other databases. Only by understanding the characteristics of the database compiler can we avoid its shortcomings and write high-performance SQL statements.

When designing data models and writing SQL statements, programmers should bring in the idea or awareness of algorithms.

To write complex SQL statements, you must develop the habit of using the WITH statement. Concise and clear SQL statements can also reduce the burden on the database.

Guess you like

Origin blog.csdn.net/z_ssyy/article/details/128738232