-
1. LIMIT statement
-
2. Implicit conversion
-
3. Association update and delete
-
4. Mixed sorting
-
5. EXISTS statement
-
6. Conditional push down
-
7. Narrow down the scope in advance
-
8. Push down the intermediate result set
-
Summarize
The execution sequence of sql statements:
FROM
<left_table>
ON
<join_condition>
<join_type>
JOIN
<right_table>
WHERE
<where_condition>
GROUP BY
<group_by_list>
HAVING
<having_condition>
SELECT
DISTINCT
<select_list>
ORDER BY
<order_by_condition>
LIMIT
<limit_number>
1. LIMIT statement
Pagination query is one of the most commonly used scenarios, but it is also usually the place where problems are most likely to occur. For example, for the following simple statement, the general DBA thinks of adding a composite index to the type, name, and create_time fields. In this way, conditional sorting can effectively use the index, and the performance is rapidly improved.
SELECT *
FROM operation
WHERE type = 'SQLStats'
AND name = 'SlowLog'
ORDER BY create_time
LIMIT 1000, 10;
Well, maybe more than 90% of DBAs solve this problem and stop here. But when the LIMIT clause becomes "LIMIT 1000000,10", the programmer will still complain: Why is it still slow when I only fetch 10 records?
You must know that the database does not know where the 1000000th record starts, even if there is an index, it needs to be calculated from scratch. This kind of performance problem occurs, in most cases, the programmer is lazy.
In scenarios such as front-end data browsing and page turning, or batch export of big data, the maximum value of the previous page can be used as a parameter as a query condition. The SQL redesign is as follows:
SELECT *
FROM operation
WHERE type = 'SQLStats'
AND name = 'SlowLog'
AND create_time > '2017-03-16 14:00:00'
ORDER BY create_time limit 10;
Under the new design, the query time is basically fixed and will not change as the amount of data grows.
2. Implicit conversion
Type mismatch between query variable and field definition in SQL statement is another common mistake. For example the following statement:
mysql> explain extended SELECT *
> FROM my_balance b
> WHERE b.bpn = 14000000123
> AND b.isverified IS NULL ;
mysql> show warnings;
| Warning | 1739 | Cannot use ref access on index 'bpn' due to type or collation conversion on field 'bpn'
Among them, the field bpn is defined as varchar(20), and MySQL's strategy is to convert strings into numbers before comparison. The function acts on the table field, and the index is invalid.
The above cases may be auto-filled parameters of the application framework, rather than the original intention of the programmer. Nowadays, there are many application frameworks that are very complicated. While it is convenient to use, be careful that it may dig a hole for yourself.
3. Association update and delete
Although MySQL5.6 introduces the materialization feature, special attention needs to be paid to its current optimization only for query statements. For updating or deleting, it needs to be manually rewritten as JOIN.
For example, in the following UPDATE statement, MySQL actually executes a loop/nested subquery (DEPENDENT SUBQUERY), and its execution time can be imagined.
UPDATE operation o
SET status = 'applying'
WHERE o.id IN (SELECT id
FROM (SELECT o.id,
o.status
FROM operation o
WHERE o.group = 123
AND o.status NOT IN ( 'done' )
ORDER BY o.parent,
o.id
LIMIT 1) t);
Implementation plan:
+----+--------------------+-------+-------+---------------+---------+---------+-------+------+-----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+-------+---------------+---------+---------+-------+------+-----------------------------------------------------+
| 1 | PRIMARY | o | index | | PRIMARY | 8 | | 24 | Using where; Using temporary |
| 2 | DEPENDENT SUBQUERY | | | | | | | | Impossible WHERE noticed after reading const tables |
| 3 | DERIVED | o | ref | idx_2,idx_5 | idx_5 | 8 | const | 1 | Using where; Using filesort |
+----+--------------------+-------+-------+---------------+---------+---------+-------+------+-----------------------------------------------------+
After rewriting to JOIN, the selection mode of the subquery changes from DEPENDENT SUBQUERY to DERIVED, and the execution speed is greatly accelerated, from 7 seconds to 2 milliseconds.
UPDATE operation o
JOIN (SELECT o.id,
o.status
FROM operation o
WHERE o.group = 123
AND o.status NOT IN ( 'done' )
ORDER BY o.parent,
o.id
LIMIT 1) t
ON o.id = t.id
SET status = 'applying'
The execution plan simplifies to:
+----+-------------+-------+------+---------------+-------+---------+-------+------+-----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+-------+---------+-------+------+-----------------------------------------------------+
| 1 | PRIMARY | | | | | | | | Impossible WHERE noticed after reading const tables |
| 2 | DERIVED | o | ref | idx_2,idx_5 | idx_5 | 8 | const | 1 | Using where; Using filesort |
+----+-------------+-------+------+---------------+-------+---------+-------+------+-----------------------------------------------------+
4. Mixed sorting
MySQL cannot use indexes for mixed sorting. But in some scenarios, there are still opportunities to use special methods to improve performance.
SELECT *
FROM my_order o
INNER JOIN my_appraise a ON a.orderid = o.id
ORDER BY a.is_reply ASC,
a.appraise_time DESC
LIMIT 0, 20
The execution plan shows a full table scan:
+----+-------------+-------+--------+-------------+---------+---------+---------------+---------+-+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
+----+-------------+-------+--------+-------------+---------+---------+---------------+---------+-+
| 1 | SIMPLE | a | ALL | idx_orderid | NULL | NULL | NULL | 1967647 | Using filesort |
| 1 | SIMPLE | o | eq_ref | PRIMARY | PRIMARY | 122 | a.orderid | 1 | NULL |
+----+-------------+-------+--------+---------+---------+---------+-----------------+---------+-+
Since is_reply only has two states of 0 and 1, after we rewrite it according to the following method, the execution time is reduced from 1.58 seconds to 2 milliseconds.
SELECT *
FROM ((SELECT *
FROM my_order o
INNER JOIN my_appraise a
ON a.orderid = o.id
AND is_reply = 0
ORDER BY appraise_time DESC
LIMIT 0, 20)
UNION ALL
(SELECT *
FROM my_order o
INNER JOIN my_appraise a
ON a.orderid = o.id
AND is_reply = 1
ORDER BY appraise_time DESC
LIMIT 0, 20)) t
ORDER BY is_reply ASC,
appraisetime DESC
LIMIT 20;
5. EXISTS statement
When MySQL treats the EXISTS clause, it still uses the execution method of nested subqueries. Such as the following SQL statement:
SELECT *
FROM my_neighbor n
LEFT JOIN my_neighbor_apply sra
ON n.id = sra.neighbor_id
AND sra.user_id = 'xxx'
WHERE n.topic_status < 4
AND EXISTS(SELECT 1
FROM message_info m
WHERE n.id = m.neighbor_id
AND m.inuser = 'xxx')
AND n.topic_type <> 5
The execution plan is:
+----+--------------------+-------+------+-----+------------------------------------------+---------+-------+---------+ -----+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+------+ -----+------------------------------------------+---------+-------+---------+ -----+
| 1 | PRIMARY | n | ALL | | NULL | NULL | NULL | 1086041 | Using where |
| 1 | PRIMARY | sra | ref | | idx_user_id | 123 | const | 1 | Using where |
| 2 | DEPENDENT SUBQUERY | m | ref | | idx_message_info | 122 | const | 1 | Using index condition; Using where |
+----+--------------------+-------+------+ -----+------------------------------------------+---------+-------+---------+ -----+
Remove exists and change to join, which can avoid nested subqueries and reduce the execution time from 1.93 seconds to 1 millisecond.
SELECT *
FROM my_neighbor n
INNER JOIN message_info m
ON n.id = m.neighbor_id
AND m.inuser = 'xxx'
LEFT JOIN my_neighbor_apply sra
ON n.id = sra.neighbor_id
AND sra.user_id = 'xxx'
WHERE n.topic_status < 4
AND n.topic_type <> 5
New execution plan:
+----+-------------+-------+--------+ -----+------------------------------------------+---------+ -----+------+ -----+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+ -----+------------------------------------------+---------+ -----+------+ -----+
| 1 | SIMPLE | m | ref | | idx_message_info | 122 | const | 1 | Using index condition |
| 1 | SIMPLE | n | eq_ref | | PRIMARY | 122 | ighbor_id | 1 | Using where |
| 1 | SIMPLE | sra | ref | | idx_user_id | 123 | const | 1 | Using where |
+----+-------------+-------+--------+ -----+------------------------------------------+---------+ -----+------+ -----+
6. Conditional push down
Situations where external query conditions cannot be pushed down to complex views or subqueries include:
1. Aggregation subquery; 2. Subquery containing LIMIT; 3. UNION or UNION ALL subquery; 4. Subquery in the output field;
As in the following statement, it can be seen from the execution plan that its condition is applied after the aggregation subquery:
SELECT *
FROM (SELECT target,
Count(*)
FROM operation
GROUP BY target) t
WHERE target = 'rm-xxxx'
+----+-------------+------------+-------+---------------+-------------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+---------------+-------------+---------+-------+------+-------------+
| 1 | PRIMARY | <derived2> | ref | <auto_key0> | <auto_key0> | 514 | const | 2 | Using where |
| 2 | DERIVED | operation | index | idx_4 | idx_4 | 519 | NULL | 20 | Using index |
+----+-------------+------------+-------+---------------+-------------+---------+-------+------+-------------+
After confirming that the semantic query condition can be directly pushed down, rewrite it as follows:
SELECT target,
Count(*)
FROM operation
WHERE target = 'rm-xxxx'
GROUP BY target
The execution plan becomes:
+----+-------------+-----------+------+---------------+-------+---------+-------+------+--------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+------+---------------+-------+---------+-------+------+--------------------+
| 1 | SIMPLE | operation | ref | idx_4 | idx_4 | 514 | const | 1 | Using where; Using index |
+----+-------------+-----------+------+---------------+-------+---------+-------+------+--------------------+
Please refer to the previous article for a detailed explanation of MySQL's external conditions that cannot be pushed down: MySQL · Performance Optimization · Conditional Pushdown to Materialized Tables http://mysql.taobao.org/monthly/2016/07/08
7. Narrow down the scope in advance
First on the initial SQL statement:
SELECT *
FROM my_order o
LEFT JOIN my_userinfo u
ON o.uid = u.uid
LEFT JOIN my_productinfo p
ON o.pid = p.pid
WHERE ( o.display = 0 )
AND ( o.ostaus = 1 )
ORDER BY o.selltime DESC
LIMIT 0, 15
The original meaning of the SQL statement is: first do a series of left joins, and then sort the first 15 records. It can also be seen from the execution plan that the estimated number of sorted records in the last step is 900,000, and the time consumption is 12 seconds.
+----+-------------+-------+--------+---------------+---------+---------+-----------------+--------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------+---------+---------+-----------------+--------+----------------------------------------------------+
| 1 | SIMPLE | o | ALL | NULL | NULL | NULL | NULL | 909119 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | u | eq_ref | PRIMARY | PRIMARY | 4 | o.uid | 1 | NULL |
| 1 | SIMPLE | p | ALL | PRIMARY | NULL | NULL | NULL | 6 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+-------+--------+---------------+---------+---------+-----------------+--------+----------------------------------------------------+
Since the last WHERE condition and sorting are all for the leftmost main table, you can first sort my_order and reduce the amount of data in advance before doing left join. After the SQL is rewritten as follows, the execution time is reduced to about 1 millisecond.
SELECT *
FROM (
SELECT *
FROM my_order o
WHERE ( o.display = 0 )
AND ( o.ostaus = 1 )
ORDER BY o.selltime DESC
LIMIT 0, 15
) o
LEFT JOIN my_userinfo u
ON o.uid = u.uid
LEFT JOIN my_productinfo p
ON o.pid = p.pid
ORDER BY o.selltime DESC
limit 0, 15
Check the execution plan again: After the subquery is materialized (select_type=DERIVED), participate in JOIN. Although the estimated row scan is still 900,000, the actual execution time becomes very small after using the index and the LIMIT clause.
+----+-------------+------------+--------+---------------+---------+---------+-------+--------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+--------+---------------+---------+---------+-------+--------+----------------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 15 | Using temporary; Using filesort |
| 1 | PRIMARY | u | eq_ref | PRIMARY | PRIMARY | 4 | o.uid | 1 | NULL |
| 1 | PRIMARY | p | ALL | PRIMARY | NULL | NULL | NULL | 6 | Using where; Using join buffer (Block Nested Loop) |
| 2 | DERIVED | o | index | NULL | idx_1 | 5 | NULL | 909112 | Using where |
+----+-------------+------------+--------+---------------+---------+---------+-------+--------+----------------------------------------------------+
8. Push down the intermediate result set
Let's look at the following example that has been initially optimized (the main table in the left join takes priority as the query condition):
SELECT a.*,
c.allocated
FROM (
SELECT resourceid
FROM my_distribute d
WHERE isdelete = 0
AND cusmanagercode = '1234567'
ORDER BY salecode limit 20) a
LEFT JOIN
(
SELECT resourcesid, sum(ifnull(allocation, 0) * 12345) allocated
FROM my_resources
GROUP BY resourcesid) c
ON a.resourceid = c.resourcesid
So are there any other problems with this statement? It is not difficult to see that the subquery c is a full-table aggregation query, which will lead to a decrease in the performance of the entire statement when the number of tables is particularly large.
In fact, for subquery c, the final result set of the left join only cares about the data that can match the resourceid of the main table. So we can rewrite the statement as follows, and the execution time drops from 2 seconds to 2 milliseconds.
SELECT a.*,
c.allocated
FROM (
SELECT resourceid
FROM my_distribute d
WHERE isdelete = 0
AND cusmanagercode = '1234567'
ORDER BY salecode limit 20) a
LEFT JOIN
(
SELECT resourcesid, sum(ifnull(allocation, 0) * 12345) allocated
FROM my_resources r,
(
SELECT resourceid
FROM my_distribute d
WHERE isdelete = 0
AND cusmanagercode = '1234567'
ORDER BY salecode limit 20) a
WHERE r.resourcesid = a.resourcesid
GROUP BY resourcesid) c
ON a.resourceid = c.resourcesid
But the subquery a appears multiple times in our SQL statement. This way of writing not only has additional overhead, but also makes the entire statement significantly complicated. Rewrite again using the WITH statement:
WITH a AS
(
SELECT resourceid
FROM my_distribute d
WHERE isdelete = 0
AND cusmanagercode = '1234567'
ORDER BY salecode limit 20)
SELECT a.*,
c.allocated
FROM a
LEFT JOIN
(
SELECT resourcesid, sum(ifnull(allocation, 0) * 12345) allocated
FROM my_resources r,
a
WHERE r.resourcesid = a.resourcesid
GROUP BY resourcesid) c
ON a.resourceid = c.resourcesid
Summarize
The database compiler generates an execution plan, which determines how SQL is actually executed. But the compiler is just doing its best, and not all database compilers are perfect.
Most of the scenarios mentioned above also have performance problems in other databases. Only by understanding the characteristics of the database compiler can we avoid its shortcomings and write high-performance SQL statements.
When designing data models and writing SQL statements, programmers should bring in the idea or awareness of algorithms.
To write complex SQL statements, you must develop the habit of using the WITH statement. Concise and clear SQL statements can also reduce the burden on the database.