MySQL Advanced: Index Optimization and Query Optimization (2)

5. Sorting optimization

5.1 Sorting optimization

Question: We already add indexes on the fields used in WHERE conditions, so why add an index on the ORDER BY field as well?


Optimization suggestions:

  • In SQL, indexes can be used in both the WHERE clause and the ORDER BY clause: in WHERE to avoid a full table scan, and in ORDER BY to avoid a FileSort. In some cases a full table scan or a FileSort is not necessarily slower than using an index, but in general we should still try to avoid them to improve query efficiency.

  • Try to let an index satisfy the ORDER BY. If WHERE and ORDER BY use the same column, a single-column index is enough; if they use different columns, use a composite index.

  • When an index cannot be used, tune the FileSort instead.

INDEX a_b_c(a,b,c)
ORDER BY can use the leftmost prefix of the index
- ORDER BY a
- ORDER BY a,b
- ORDER BY a,b,c
- ORDER BY a DESC,b DESC,c DESC
If WHERE fixes the leftmost prefix of the index to constants, ORDER BY can use the index
- WHERE a = const ORDER BY b,c
- WHERE a = const AND b = const ORDER BY c
- WHERE a = const AND b > const ORDER BY b,c
Cases where the index cannot be used for sorting
- ORDER BY a ASC,b DESC,c DESC /* mixed sort directions */
- WHERE g = const ORDER BY b,c /* a is missing */
- WHERE a = const ORDER BY c /* b is missing */
- WHERE a = const ORDER BY a,d /* d is not part of the index */
- WHERE a in (...) ORDER BY b,c /* for sorting, multiple equality values also count as a range */
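The rules above can be checked with EXPLAIN. The following is a minimal sketch, not taken from the original case: the table t and its columns a, b, c, d are hypothetical and exist only to mirror INDEX a_b_c(a,b,c). The queries select only indexed columns so that the optimizer has a good reason to read the index; the actual plan still depends on the data volume and the MySQL version, so treat the comments as expectations to verify rather than guaranteed output.

```sql
-- Hypothetical table for illustration only (names are assumptions).
CREATE TABLE t (
    id INT PRIMARY KEY AUTO_INCREMENT,
    a  INT NOT NULL,
    b  INT NOT NULL,
    c  INT NOT NULL,
    d  INT NOT NULL,
    INDEX a_b_c (a, b, c)
);

-- Expected to read rows in index order: look for the absence of
-- "Using filesort" in the Extra column.
EXPLAIN SELECT a, b, c FROM t ORDER BY a, b, c;
EXPLAIN SELECT a, b, c FROM t WHERE a = 1 ORDER BY b, c;

-- Expected to need a filesort: look for "Using filesort" in Extra.
EXPLAIN SELECT a, b, c FROM t WHERE a = 1 ORDER BY c;          -- b is skipped
EXPLAIN SELECT a, b, c FROM t WHERE a IN (1, 2) ORDER BY b, c; -- a is a range
```

With SELECT * and no LIMIT, the optimizer may still prefer a full scan plus filesort even for ORDER BY a, because reading every row through a secondary index has its own cost.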

5.2 Case practice

In the ORDER BY clause, try to sort through an index and avoid FileSort.
Before running the case, drop the indexes on the student table, leaving only the primary key:

DROP INDEX idx_age ON student;
DROP INDEX idx_age_classid_stuno ON student;
DROP INDEX idx_age_classid_name ON student;
# or
call proc_drop_index('atguigudb2','student');

Scenario: Query students who are 30 years old and whose student number is less than 101000, sorted by name

mysql> EXPLAIN SELECT SQL_NO_CACHE * FROM student WHERE age = 30 AND stuno <101000 ORDER BY
    -> NAME ;
+----+-------------+---------+------------+------+---------------+------+---------+------+--------+----------+-----------------------------+
| id | select_type | table   | partitions | type | possible_keys | key  | key_len | ref  | rows   | filtered | Extra                       |
+----+-------------+---------+------------+------+---------------+------+---------+------+--------+----------+-----------------------------+
|  1 | SIMPLE      | student | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 498917 |     3.33 | Using where; Using filesort |
+----+-------------+---------+------------+------+---------------+------+---------+------+--------+----------+-----------------------------+
1 row in set, 2 warnings (0.00 sec)

The query results are as follows:

mysql> SELECT SQL_NO_CACHE * FROM student WHERE age = 30 AND stuno <101000 ORDER BY
    -> NAME ;
+-----+--------+--------+------+---------+
| id  | stuno  | name   | age  | classId |
+-----+--------+--------+------+---------+
| 695 | 100695 | bXLNEI |   30 |     979 |
| 322 | 100322 | CeOJNY |   30 |      40 |
| 993 | 100993 | DVVPnT |   30 |     340 |
| 983 | 100983 | fmUNei |   30 |     433 |
| 946 | 100946 | iSPxRQ |   30 |     511 |
| 469 | 100469 | LTktoo |   30 |      69 |
|  45 | 100045 | mBZrKC |   30 |     280 |
| 635 | 100635 | nQnUJL |   30 |     732 |
|  16 | 100016 | NzjxKh |   30 |     539 |
| 363 | 100363 | OMuKtM |   30 |     695 |
| 293 | 100293 | qOYywO |   30 |     586 |
| 169 | 100169 | qUElsg |   30 |     526 |
| 798 | 100798 | rhHPdX |   30 |      71 |
| 749 | 100749 | TCgaJe |   30 |     697 |
| 157 | 100157 | TUQtvY |   30 |      22 |
| 580 | 100580 | UHDUOj |   30 |     423 |
| 532 | 100532 | XvmZkc |   30 |     861 |
| 939 | 100939 | yBlCbB |   30 |     320 |
| 710 | 100710 | yhmRvD |   30 |     219 |
| 266 | 100266 | YueogP |   30 |     524 |
+-----+--------+--------+------+---------+
20 rows in set, 1 warning (0.16 sec)

Conclusion: type is ALL, the worst access type (a full table scan), and Extra shows Using filesort, which is also the worst case for sorting. Optimization is a must.

Optimization ideas:

Option 1: To remove the filesort, create an index on (age, name)

# Create a new index
CREATE INDEX idx_age_name ON student(age,NAME);

Option 2: Try to have both the WHERE filter conditions and the sort use an index

Create a composite index on three fields:

DROP INDEX idx_age_name ON student;
CREATE INDEX idx_age_stuno_name ON student (age,stuno,NAME);
mysql> EXPLAIN SELECT SQL_NO_CACHE * FROM student WHERE age = 30 AND stuno <101000 ORDER BY NAME;
+----+-------------+---------+------------+-------+--------------------+--------------------+---------+------+------+----------+---------------------------------------+
| id | select_type | table   | partitions | type  | possible_keys      | key                | key_len | ref  | rows | filtered | Extra                                 |
+----+-------------+---------+------------+-------+--------------------+--------------------+---------+------+------+----------+---------------------------------------+
|  1 | SIMPLE      | student | NULL       | range | idx_age_stuno_name | idx_age_stuno_name | 9       | NULL |   20 |   100.00 | Using index condition; Using filesort |
+----+-------------+---------+------------+-------+--------------------+--------------------+---------+------+------+----------+---------------------------------------+
1 row in set, 2 warnings (0.00 sec)
mysql> SELECT SQL_NO_CACHE * FROM student
    ->  WHERE age = 30 AND stuno <101000 ORDER BY NAME ;
+-----+--------+--------+------+---------+
| id  | stuno  | name   | age  | classId |
+-----+--------+--------+------+---------+
| 695 | 100695 | bXLNEI |   30 |     979 |
| 322 | 100322 | CeOJNY |   30 |      40 |
| 993 | 100993 | DVVPnT |   30 |     340 |
| 983 | 100983 | fmUNei |   30 |     433 |
| 946 | 100946 | iSPxRQ |   30 |     511 |
| 469 | 100469 | LTktoo |   30 |      69 |
|  45 | 100045 | mBZrKC |   30 |     280 |
| 635 | 100635 | nQnUJL |   30 |     732 |
|  16 | 100016 | NzjxKh |   30 |     539 |
| 363 | 100363 | OMuKtM |   30 |     695 |
| 293 | 100293 | qOYywO |   30 |     586 |
| 169 | 100169 | qUElsg |   30 |     526 |
| 798 | 100798 | rhHPdX |   30 |      71 |
| 749 | 100749 | TCgaJe |   30 |     697 |
| 157 | 100157 | TUQtvY |   30 |      22 |
| 580 | 100580 | UHDUOj |   30 |     423 |
| 532 | 100532 | XvmZkc |   30 |     861 |
| 939 | 100939 | yBlCbB |   30 |     320 |
| 710 | 100710 | yhmRvD |   30 |     219 |
| 266 | 100266 | YueogP |   30 |     524 |
+-----+--------+--------+------+---------+
20 rows in set, 1 warning (0.00 sec)

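Surprisingly, the SQL that still uses filesort ran faster than the SQL whose filesort had been optimized away, and by a lot: the results appeared almost instantly. The EXPLAIN output shows why: the range condition on stuno narrows the scan to about 20 rows, so sorting them costs very little.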

Conclusion:

  1. If both indexes exist at the same time, MySQL automatically chooses the better one (in this example it chooses idx_age_stuno_name). However, as the data volume changes, the chosen index may change too.
  2. When the index must favor either a [range condition] field or a [GROUP BY / ORDER BY] field, first look at how many rows the condition field filters out. If the range condition filters out enough rows so that few rows remain to be sorted, put the index on the range field first; otherwise, put it on the sort field.
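To compare the two candidate plans yourself instead of relying on the optimizer's choice, one possible approach is an index hint. This sketch assumes that both idx_age_name and idx_age_stuno_name currently exist on student; FORCE INDEX is standard MySQL syntax, but the plans and timings you see will depend on your own data.

```sql
-- Plan when the optimizer is pushed toward the sort-friendly index.
EXPLAIN SELECT * FROM student FORCE INDEX (idx_age_name)
WHERE age = 30 AND stuno < 101000
ORDER BY name;

-- Plan when the optimizer is pushed toward the filter-friendly index.
EXPLAIN SELECT * FROM student FORCE INDEX (idx_age_stuno_name)
WHERE age = 30 AND stuno < 101000
ORDER BY name;
```

Comparing the rows and Extra columns of the two plans makes the trade-off in point 2 concrete: one plan avoids the sort but examines more rows, the other examines few rows and sorts them.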

5.3 filesort algorithm: two-way sorting and one-way sorting

Two-way sort (slow)

  • Before MySQL 4.1, two-way sorting was used. It literally scans the disk twice to get the data: first it reads the row pointers and the ORDER BY columns and sorts them, then it scans the sorted list and reads the corresponding rows again according to the row pointers in the list, and outputs them.

  • In other words, it reads the sort fields from disk, sorts them in the buffer, and then fetches the other fields from disk.

To get one batch of data, the disk has to be scanned twice. IO is very time-consuming, so after MySQL 4.1 a second, improved algorithm appeared: one-way sorting.

One-way sorting (fast)

Read all the columns required by the query from disk, sort them in the buffer by the ORDER BY columns, and then scan the sorted list for output. This is more efficient because it avoids reading the data a second time and turns random IO into sequential IO, but it uses more space because it keeps every full row in memory.

Conclusions and raised questions

  • Since one-way sorting was introduced later, it is generally better than two-way sorting.
  • But one-way sorting has a problem of its own: as noted above, it uses more space because it keeps every full row in the sort buffer.
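The space used by a filesort is governed by server variables, and their current values can be inspected before tuning. The statements below are a sketch using standard MySQL variables and status counters; the 1 MiB value is purely illustrative, not a recommendation. Note also that max_length_for_sort_data is deprecated in recent MySQL 8.0 releases, so it may have no effect on your version.

```sql
-- Current size of the per-session sort buffer, in bytes.
SHOW VARIABLES LIKE 'sort_buffer_size';

-- Row-length threshold that older versions used to choose the algorithm.
SHOW VARIABLES LIKE 'max_length_for_sort_data';

-- Number of merge passes the sort algorithm has had to do; a value that
-- keeps growing suggests the sort buffer is too small for the workload.
SHOW SESSION STATUS LIKE 'Sort_merge_passes';

-- Raise the buffer for the current session only (illustrative value).
SET SESSION sort_buffer_size = 1048576;
```

Changing the value per session keeps the experiment local; the buffer is allocated per sorting session, so a large global value can cost a lot of memory.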

6. GROUP BY optimization

  • The principles by which GROUP BY uses an index are almost the same as for ORDER BY. GROUP BY can use the index directly even when there is no filter condition that uses the index.
  • GROUP BY sorts first and then groups, and follows the leftmost-prefix rule of the index.
  • When index columns cannot be used, increase the max_length_for_sort_data and sort_buffer_size parameters.
  • WHERE is more efficient than HAVING. If a condition can be written in WHERE, do not write it in HAVING.
  • Reduce the use of ORDER BY: agree with the business side to skip sorting where it is not needed, or do the sorting in the application. Statements such as ORDER BY, GROUP BY, and DISTINCT consume a lot of CPU, and the database's CPU resources are precious.
  • For queries that contain ORDER BY, GROUP BY, or DISTINCT, keep the result set filtered by the WHERE condition within 1,000 rows; otherwise the SQL will be very slow.
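The WHERE-versus-HAVING advice can be illustrated on the student table from the case above. This is a sketch under assumptions: the index idx_classid is hypothetical, and the threshold values are arbitrary.

```sql
-- Hypothetical index so that filtering and grouping can both use classId.
CREATE INDEX idx_classid ON student (classId);

-- Less efficient form: every row is grouped first, then groups are discarded.
SELECT classId, COUNT(*) AS cnt
FROM student
GROUP BY classId
HAVING classId > 500;

-- Preferred form: rows are filtered before grouping.
SELECT classId, COUNT(*) AS cnt
FROM student
WHERE classId > 500
GROUP BY classId;

-- HAVING is still the right place for conditions on aggregate values.
SELECT classId, COUNT(*) AS cnt
FROM student
WHERE classId > 500
GROUP BY classId
HAVING COUNT(*) > 10;
```

The first two queries return the same groups; the difference is how many rows reach the grouping step.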

7. Optimize paging queries

Optimization idea one

Complete the sorting and paging on the index, then join back to the original table by primary key to fetch the other columns the query needs.

mysql> EXPLAIN SELECT * FROM student t,(SELECT id FROM student ORDER BY id LIMIT 2000000,10)
    -> a
    -> WHERE t.id = a.id;
+----+-------------+------------+------------+--------+---------------+---------+---------+------+--------+----------+-------------+
| id | select_type | table      | partitions | type   | possible_keys | key     | key_len | ref  | rows   | filtered | Extra       |
+----+-------------+------------+------------+--------+---------------+---------+---------+------+--------+----------+-------------+
|  1 | PRIMARY     | <derived2> | NULL       | ALL    | NULL          | NULL    | NULL    | NULL | 498917 |   100.00 | NULL        |
|  1 | PRIMARY     | t          | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | a.id |      1 |   100.00 | NULL        |
|  2 | DERIVED     | student    | NULL       | index  | NULL          | PRIMARY | 4       | NULL | 498917 |   100.00 | Using index |
+----+-------------+------------+------------+--------+---------------+---------+---------+------+--------+----------+-------------+
3 rows in set, 1 warning (0.00 sec)
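The same idea applies when paging by a non-primary-key column. The sketch below assumes a hypothetical index idx_age on student(age); the inner query pages over the narrow index only, and the outer query fetches full rows for just the ten ids that survive. Whether the optimizer really uses the index for the inner query should be confirmed with EXPLAIN on your own data.

```sql
-- Hypothetical secondary index used for sorting and paging.
CREATE INDEX idx_age ON student (age);

-- Baseline for comparison: full rows are read for every skipped position.
EXPLAIN SELECT * FROM student ORDER BY age, id LIMIT 2000000, 10;

-- Sort and page on the index first, then join back by primary key.
EXPLAIN SELECT t.*
FROM student t
JOIN (
    SELECT id
    FROM student
    ORDER BY age, id        -- id breaks ties so that pages are stable
    LIMIT 2000000, 10
) a ON t.id = a.id
ORDER BY t.age, t.id;
```

The outer ORDER BY is needed because a join does not guarantee that rows come back in the order of the derived table.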

Optimization idea two

This approach suits tables with an auto-increment primary key: the LIMIT query can be converted into a query that starts from a known position. It returns the same rows as the offset form only when the id values are contiguous.

mysql> EXPLAIN SELECT * FROM student WHERE id > 2000000 LIMIT 10;
+----+-------------+---------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| id | select_type | table   | partitions | type  | possible_keys | key     | key_len | ref  | rows | filtered | Extra       |
+----+-------------+---------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | student | NULL       | range | PRIMARY       | PRIMARY | 4       | NULL |    1 |   100.00 | Using where |
+----+-------------+---------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
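In an application, this is usually done by remembering the last id of the previous page rather than computing an offset. A minimal sketch, where the user variable @last_id is a hypothetical stand-in for a value the application would keep between requests:

```sql
-- First page.
SELECT * FROM student ORDER BY id LIMIT 10;

-- Next page: start right after the largest id seen on the previous page.
SET @last_id = 2000000;  -- illustrative value
SELECT * FROM student
WHERE id > @last_id
ORDER BY id
LIMIT 10;
```

Adding ORDER BY id makes the page boundaries explicit, and this form keeps working when ids have gaps, because it never depends on the id matching a row count.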

8. Prioritize covering indexes

8.1 What is a covering index?

Understanding one: An index is an efficient way to find rows, but a database can generally also use an index to fetch column values directly, so it does not have to read the entire row. After all, the leaf nodes of an index store the data they index; when the desired data can be obtained by reading the index alone, there is no need to read the rows. An index that contains all the data needed to satisfy a query is called a covering index.

Understanding two: A covering index is a form of non-clustered composite index that includes all the columns used in the SELECT, JOIN, and WHERE clauses of the query (that is, the fields the index is built on are exactly the fields involved in the query). Put simply, the index columns plus the primary key contain all the columns listed between SELECT and FROM.
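A covering index can be recognized in EXPLAIN by "Using index" in the Extra column. The sketch below reuses the student table and the idx_age_name definition from the sorting case; skip the CREATE INDEX statement if that index already exists.

```sql
CREATE INDEX idx_age_name ON student (age, name);

-- Covered: age and name are in the index, and id is the primary key that
-- every InnoDB secondary index carries. Expect "Using index" in Extra.
EXPLAIN SELECT id, age, name FROM student WHERE age = 30;

-- Not covered: classId is not in the index, so each matching row must be
-- looked up in the table as well.
EXPLAIN SELECT id, age, name, classId FROM student WHERE age = 30;
```

Trimming the select list to the columns actually needed is often the cheapest way to turn an existing index into a covering one.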

8.2 Pros and cons of covering indexes

Benefits:

  1. Avoids the secondary lookup from an InnoDB secondary index into the clustered index (the back-to-table lookup)

  2. Can turn random IO into sequential IO, which speeds up the query

Disadvantages:
Maintenance of index fields always comes at a cost. Therefore, there are trade-offs to consider when building redundant indexes to support covering indexes. This is the job of the business DBA, or business data architect.

Origin blog.csdn.net/qq_51495235/article/details/133102814