MySQL advanced learning summary 14: subquery optimization, sorting optimization, GROUP BY optimization, paging query optimization

1. Subquery optimization

Subqueries are an important MySQL feature: they let a single SQL statement express more complex queries. However, subqueries often execute inefficiently, because:

  1. When executing a subquery, MySQL may need to create a temporary table for the result of the inner query; the outer query then reads its records from that temporary table. After the query completes, the temporary table is dropped. This consumes extra CPU and I/O resources and can produce a large number of slow queries.
  2. The temporary table that holds the subquery's result set typically has no index, so query performance suffers to some extent.
  3. The larger the result set a subquery returns, the greater its impact on query performance.

In MySQL, a join can often be used instead of a subquery. A join does not need to create a temporary table, so it is generally faster than the equivalent subquery, and performance is better still if the join uses indexes.
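As a rough sketch of this rewrite, the two statements below find students who are class monitors. The `class` table and the `stuno` and `monitor` columns are hypothetical names chosen for illustration; only the `student` table name appears elsewhere in this article.

```sql
-- Subquery form: the inner result may be materialized before the outer query runs.
SELECT s.*
FROM student s
WHERE s.stuno IN (SELECT c.monitor FROM class c WHERE c.monitor IS NOT NULL);

-- Join form: an index on class.monitor can drive the lookup directly.
-- DISTINCT keeps the result identical when one student monitors several classes.
SELECT DISTINCT s.*
FROM student s
JOIN class c ON s.stuno = c.monitor;
```

Whether the join form is actually faster depends on the MySQL version and the data, so it is worth comparing both with EXPLAIN before committing to one.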

Try to avoid NOT IN and NOT EXISTS; use LEFT JOIN TABLEX ON FIELDY WHERE FIELDY IS NULL instead.
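A minimal sketch of the anti-join pattern, again using a hypothetical `class` table with a `monitor` column and a hypothetical non-null `student.stuno` column. Both statements return students who are not the monitor of any class.

```sql
-- NOT IN form. The IS NOT NULL filter matters: if the subquery returned a NULL,
-- NOT IN would yield no rows at all.
SELECT s.*
FROM student s
WHERE s.stuno NOT IN (SELECT c.monitor FROM class c WHERE c.monitor IS NOT NULL);

-- LEFT JOIN ... IS NULL form: keep only the students with no matching class row.
SELECT s.*
FROM student s
LEFT JOIN class c ON s.stuno = c.monitor
WHERE c.monitor IS NULL;
```

The two forms are equivalent only under the stated assumption that `stuno` is never NULL.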

2. Sorting optimization

2.1 Sorting optimization

MySQL supports two sorting methods, FileSort and index sorting:

  • FileSort generally sorts in memory, which uses more CPU. If the result set to be sorted is large, temporary files are written to disk for the sort, and efficiency is low.
  • With index sorting, the index already guarantees the order of the data, so no separate sort is needed and efficiency is higher.

Optimization suggestions:

  1. Indexes can be used in both the WHERE clause and the ORDER BY clause: in WHERE to avoid a full table scan, and in ORDER BY to avoid FileSort. That said, in some cases a full table scan or a FileSort is not necessarily slower than using an index.
  2. Try to let an index satisfy the ORDER BY. If WHERE and ORDER BY use the same column, a single-column index is enough; if they use different columns, use a joint (composite) index.
  3. When an index cannot be used, tune the FileSort method instead.

Take the joint index INDEX a_b_c(a,b,c) as an example:
1) ORDER BY can use the leftmost prefix of the index

ORDER BY a
ORDER BY a,b
ORDER BY a,b,c
ORDER BY a DESC, b DESC, c DESC

2) If WHERE fixes the leftmost prefix of the index to constants, ORDER BY can use the index for the remaining columns

WHERE a = const ORDER BY b, c
WHERE a = const AND b = const ORDER BY c
WHERE a = const AND b > const ORDER BY b, c

3) Cases where the index cannot be used for sorting

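ORDER BY a ASC, b DESC     -- inconsistent sort directions
WHERE g = const ORDER BY b, c     -- leftmost prefix column a is missing before b
WHERE a = const ORDER BY c     -- column b is missing before c
WHERE a = const ORDER BY a, d     -- d is not part of the index
WHERE a IN (...) ORDER BY b, c     -- for sorting, multiple equality values also count as a range lookup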

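One way to check these rules is to inspect the Extra column of EXPLAIN, which reports "Using filesort" when MySQL has to sort separately. The table `t` below is a hypothetical illustration; only the index definition a_b_c(a,b,c) comes from the text above, and the plan the optimizer actually picks depends on the data and the MySQL version.

```sql
-- Hypothetical table carrying the joint index from the examples above.
CREATE TABLE t (
  id INT PRIMARY KEY AUTO_INCREMENT,
  a  INT NOT NULL,
  b  INT NOT NULL,
  c  INT NOT NULL,
  d  INT NOT NULL,
  INDEX a_b_c (a, b, c)
);

-- a is fixed to a constant, so the index can supply the order of (b, c).
EXPLAIN SELECT a, b, c FROM t WHERE a = 1 ORDER BY b, c;

-- b is skipped, so the index order cannot be used for c;
-- Extra is expected to include "Using filesort".
EXPLAIN SELECT a, b, c FROM t WHERE a = 1 ORDER BY c;
```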
2.2 FileSort algorithms: two-way sorting and one-way sorting

If the sort field is not on an index column, FileSort uses one of two algorithms: two-way sorting or one-way sorting.
1) Two-way sorting (slow)

  • Before MySQL 4.1, two-way sorting was used. It literally scans the disk twice to get the final result: it first reads the row pointers and the ORDER BY columns and sorts them, then scans the sorted list and re-reads the corresponding rows from the table for output.
  • In short: read the sort fields from disk, sort them in the buffer, then fetch the other fields from disk.

Fetching a batch of data this way requires two disk scans. I/O is very time-consuming, so after MySQL 4.1 an improved second algorithm appeared: one-way sorting.

2) One-way sorting (fast)
One-way sorting reads all the columns the query needs from disk, sorts them in the buffer by the ORDER BY columns, and then scans the sorted list for output.

Optimization strategy:
1) Try increasing sort_buffer_size
Whichever algorithm is used, increasing this parameter improves efficiency. Raise it according to what the system can afford, because the buffer is allocated per connection; values between 1 MB and 8 MB are typical. In the MySQL 5.7 output below, sort_buffer_size is 256 KB (262144 bytes), while the InnoDB-specific innodb_sort_buffer_size is 1 MB.

mysql> show variables like '%sort_buffer_size%';
+-------------------------+---------+
| Variable_name           | Value   |
+-------------------------+---------+
| innodb_sort_buffer_size | 1048576 |
| myisam_sort_buffer_size | 8388608 |
| sort_buffer_size        | 262144  |
+-------------------------+---------+
3 rows in set (0.00 sec)
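A small sketch of adjusting the buffer for the current session only. The 2 MB value is an illustrative assumption, not a recommendation; the right size depends on the workload and the available memory.

```sql
-- Check the current value (in bytes).
SHOW VARIABLES LIKE 'sort_buffer_size';

-- Raise it to 2 MB for this session only; other connections are unaffected.
SET SESSION sort_buffer_size = 2097152;

-- Sort_merge_passes counts merge passes done by the sort algorithm;
-- a steadily growing value suggests the sort buffer is too small.
SHOW STATUS LIKE 'Sort_merge_passes';
```

Changing the value per session rather than globally limits the extra memory to the connections that actually run large sorts.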

2) Avoid SELECT * together with ORDER BY; query only the fields that are actually needed.

3. GROUP BY optimization

  1. GROUP BY uses indexes on almost the same principle as ORDER BY. Even when no filter condition uses the index, GROUP BY can still use the index directly.
  2. GROUP BY sorts first and then groups, following the leftmost prefix rule of the index.
  3. When an index column cannot be used, increase the max_length_for_sort_data and sort_buffer_size parameters.
  4. WHERE is more efficient than HAVING: if a condition can be written in WHERE, do not put it in HAVING.
  5. Reduce the use of ORDER BY. Confirm with the business side whether sorting is really needed and skip it when it is not, or do the sorting in the application. Statements with ORDER BY, GROUP BY, and DISTINCT are CPU-intensive, and the database's CPU resources are precious.
  6. For queries that contain ORDER BY, GROUP BY, or DISTINCT, keep the result set filtered by the WHERE conditions within about 1000 rows; otherwise the SQL will be very slow.
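The WHERE versus HAVING point can be sketched as follows. The `class_id` column on `student` is a hypothetical name used only for this illustration.

```sql
-- Less efficient: every row is grouped first, then unwanted groups are discarded.
SELECT class_id, COUNT(*) AS cnt
FROM student
GROUP BY class_id
HAVING class_id > 10;

-- Preferred: rows are filtered before grouping, so fewer rows are sorted and grouped.
SELECT class_id, COUNT(*) AS cnt
FROM student
WHERE class_id > 10
GROUP BY class_id;
```

Conditions on aggregates, such as COUNT(*) > 5, still have to stay in HAVING because they can only be evaluated after grouping.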

4. Paging query optimization

For paginated queries, performance can usually be improved by using a covering index.
For example, the following query has to read and discard the first 2,000,000 rows before it can return 10:

SELECT * FROM student LIMIT 2000000, 10;

Consider completing the sorting and paging on the index first, then joining back to the original table by primary key to fetch the required columns:

SELECT * FROM student t, (SELECT id FROM student ORDER BY id LIMIT 2000000, 10) a
WHERE t.id = a.id;
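A related alternative, sketched here under explicit assumptions, is to seek by key instead of using an offset. It assumes `id` is an auto-increment primary key and that the application remembers the `id` of the last row on the previous page. The value 2000000 is a hypothetical "last seen id".

```sql
-- Keyset-style paging: jump straight to the position in the primary key index
-- instead of scanning and discarding 2,000,000 rows.
SELECT *
FROM student
WHERE id > 2000000   -- hypothetical id of the last row on the previous page
ORDER BY id
LIMIT 10;
```

This returns the same rows as LIMIT 2000000, 10 only if the ids are contiguous and start at 1; with gaps from deleted rows the pages still advance correctly, but they no longer line up with fixed offsets.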

Origin blog.csdn.net/xueping_wu/article/details/126071017