Sorting and grouping optimization

The filter conditions in WHERE and the join conditions in ON are the first things to optimize.
After that, if the query groups or sorts, GROUP BY and ORDER BY should be considered as well.

MySQL supports two ways of sorting, FileSort and Index. Index sorting is the efficient one:
MySQL scans the index itself in order, so no separate sort step is needed. FileSort is less efficient.

ORDER BY uses Index sorting in either of two situations:

  • The ORDER BY clause uses the leftmost prefix of the index
  • The columns of the WHERE clause and the ORDER BY clause together form a leftmost prefix of the index

Note: if the WHERE clause puts a range condition on an index column that comes before the ORDER BY columns (explain shows type range), the index can no longer be used for the sort.
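
The rules above can be sketched on a small hypothetical table. The table t_demo, its columns and the index idx_a_b_c are made up for illustration and are not part of the original example. The comments state what the leftmost-prefix rule would lead one to expect; the actual plan depends on the MySQL version and the table statistics, so it should be checked with explain.

```sql
-- Hypothetical table, used only to illustrate the rules
create table t_demo (
  id int primary key,
  a int,
  b int,
  c int,
  payload varchar(50),          -- not in the index, so select * is not covered
  key idx_a_b_c (a, b, c)
);

-- ORDER BY on the leftmost prefix (a, b), with LIMIT acting as the filter:
-- index order is expected to be usable
explain select * from t_demo order by a, b limit 10;

-- Equality on a plus ORDER BY b, c: together they form the leftmost prefix
explain select * from t_demo where a = 1 order by b, c;

-- Range on a, then ORDER BY b: Using filesort is expected
explain select * from t_demo where a > 1 order by b;
```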

1. No filter, no index

create index idx_age_deptid_name on emp (age,deptid,name);
explain select * from emp where age=40 order by deptid;
explain select * from emp order by age,deptid;
explain select * from emp order by age,deptid limit 10;

Using filesort means MySQL has to sort the rows itself instead of reading them in index order. The reason is that the query has no WHERE clause acting as a filter.
Conclusion: without a filter, the index is not used for sorting. WHERE and LIMIT both act as filter conditions, so with either of them the index can be used.

2. Wrong column order forces a sort

explain select * from emp where age=45 order by deptid,name;


explain select * from emp where age=45 order by deptid,empno;

The empno column is not part of the index, so the index cannot deliver this order and the rows have to be sorted on empno.

explain select * from emp where age=45 order by name,deptid;

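The ORDER BY columns are in a different order from the index (deptid comes before name in the index), so the index cannot be used for the sort.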

explain select * from emp where deptid=45 order by age;

deptid is the filter column here, but it is not the leftmost column of the index (age is skipped), so the index cannot be used for the filter and therefore not for the sort either.

3. Mixed sort directions force a sort

explain select * from emp where age=45 order by deptid desc, name desc ;

If all the index columns used for sorting go in the same direction, either all ascending or all descending, nothing is lost: MySQL simply reads the index in the opposite direction and the result set comes out reversed.

explain select * from emp where age=45 order by deptid asc, name desc ;

If the columns are sorted in different directions (one ascending, one descending), the requested order no longer matches the index order, so MySQL still has to sort the rows itself.
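
One possible workaround, assuming MySQL 8.0 or later (where descending indexes are supported), is to declare the index with the same mix of directions as the ORDER BY. The index name idx_age_deptid_name_mixed is hypothetical. In earlier versions, desc in an index definition is parsed but ignored, so this sketch would not help there.

```sql
-- Assumes MySQL 8.0+ (descending index support); the index name is made up
create index idx_age_deptid_name_mixed on emp (age, deptid asc, name desc);

-- The column directions now match the index, so index order may be usable
explain select * from emp where age=45 order by deptid asc, name desc;

-- Clean up so that the following sections start from the same state
drop index idx_age_deptid_name_mixed on emp;
```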

4. Index selection

① First, drop all indexes on emp and keep only the primary key index.
drop index idx_age_deptid_name on emp;
② Query: employees who are 30 years old and whose employee number is less than 101000, sorted by name.

explain SELECT SQL_NO_CACHE * FROM emp WHERE age =30 AND empno <101000 ORDER BY NAME ;

③ A full table scan is not acceptable here, so the query needs to be optimized.

Idea: first, let the WHERE filter conditions use an index. In this query, age and empno are
the filter columns and name is the sort column, so let's create a composite index on these three columns:

create index idx_age_empno_name on emp(age,empno,name);

Running the query again shows that Using filesort is still there.

Reason: empno is used in a range condition, so the index columns after it can no longer be used, and name cannot be sorted through the index.
A composite index on all three columns is therefore pointless, because only one of empno and name can make use of it.

④ Solution: you cannot have both the fish and the bear's paw, so choose either empno or name.

drop index idx_age_empno_name on emp;
create index idx_age_name on emp(age,name);
create index idx_age_empno on emp(age,empno);

With both indexes in place, which one will MySQL choose? Run the explain from step ② again and check the key column.

explain SELECT SQL_NO_CACHE * FROM emp use index(idx_age_name) WHERE age =30 AND empno <101000 ORDER BY NAME ;

Reason: sorting always happens after filtering. If the condition filters out most of the rows, sorting the few hundred or
few thousand rows that remain is cheap, so even if the index removed the sort, the gain would be limited. By contrast, if
empno < 101000 cannot use an index, tens of thousands of rows have to be scanned, which is very expensive. The range condition
on empno filters much better (empno starts at 100000), which makes the index containing empno the better choice.

Conclusion: when a range condition and a GROUP BY or ORDER BY column compete for the index, first look at how many rows the condition filters out.
If it filters out enough rows and few rows are left to sort, put the index on the range column. Otherwise, put it on the sort column.
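
A minimal sketch of how the row counts could be compared before deciding, using the emp table and the two indexes created above. The counts depend entirely on the data, so no expected numbers are given here.

```sql
-- Rows left after the equality condition alone
-- (roughly what idx_age_name would have to read)
select count(*) from emp where age = 30;

-- Rows left after adding the range condition
-- (roughly what would still need sorting when idx_age_empno is used)
select count(*) from emp where age = 30 and empno < 101000;

-- Compare the estimated rows and the Extra column of both plans
explain select * from emp force index (idx_age_name)
  where age = 30 and empno < 101000 order by name;
explain select * from emp force index (idx_age_empno)
  where age = 30 and empno < 101000 order by name;
```

If the second count is much smaller than the first, the range column is doing most of the filtering and the remaining sort is small.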

5. Using filesort

5.1 MySQL sorting algorithms

① Two-pass sorting
Before MySQL 4.1, two-pass sorting was used, which literally means the disk is scanned twice to get the data. MySQL reads the row pointers and the ORDER BY columns and sorts them in the buffer, then scans the sorted list and re-reads the corresponding rows from disk according to the row pointers. That is: read the sort columns from disk, sort them in the buffer, then read the other columns from disk.
In short, fetching one batch of data takes two disk scans. I/O is very time-consuming, so after MySQL 4.1 a second, improved algorithm appeared: single-pass sorting.

② Single-pass sorting
MySQL reads all the columns the query needs from disk, sorts them in the buffer by the ORDER BY columns, and then scans the sorted list for output. It is faster because it avoids reading the data a second time, and it turns random I/O into sequential I/O, but it uses more space because it keeps every row in memory.
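
To see which of the two algorithms a query ends up with, one option is the optimizer trace (assuming MySQL 5.6 or later). The filesort_summary section of the trace contains a sort_mode entry. Its exact labels differ between versions, so the reading given in the comments is only a rough guide. The query uses deptid as the sort column on the assumption that no index on emp covers it at this point.

```sql
set optimizer_trace = 'enabled=on';

-- Any query whose explain shows Using filesort will do
select * from emp where age = 30 order by deptid;

-- Look for "filesort_summary" and its "sort_mode" in the output:
--   a mode mentioning rowid             -> rows are re-read afterwards (two-pass)
--   a mode mentioning additional_fields -> all needed columns are sorted along (single-pass)
select trace from information_schema.optimizer_trace;

set optimizer_trace = 'enabled=off';
```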

③ The problem with single-pass sorting
Since single-pass sorting came later, it is better than two-pass sorting overall. But there is a problem:
single-pass sorting takes up much more space in sort_buffer than two-pass sorting, because it fetches all the columns. The total
size of the fetched data may therefore exceed the capacity of sort_buffer. In that case only as much data as fits in sort_buffer can be
fetched and sorted at a time (creating a tmp file and doing a multi-way merge), then the next sort_buffer's worth is fetched and sorted, and so on, which results in multiple I/O operations.

Conclusion: the aim was to save one I/O operation, but the result can be a large number of I/O operations.
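
Whether a sort actually spilled over into tmp files can be checked with the Sort_merge_passes status counter. This is only a sketch: the query is an arbitrary one that is assumed to need a filesort on the emp table.

```sql
-- Counter value before the query
show session status like 'Sort_merge_passes';

-- A query that is assumed to need a filesort
select * from emp where age = 30 order by deptid;

-- If the counter has grown, the sort did not fit into sort_buffer
-- and had to merge sorted chunks from tmp files
show session status like 'Sort_merge_passes';
```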

5.2 How to optimize

① Increase the sort_buffer_size parameter
Whichever algorithm is used, increasing this parameter generally improves efficiency. It should be increased within what the system can afford, because the buffer is allocated per connection, not once for the whole server. Adjust it between 1M and 8M.
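
A minimal sketch of how the value could be inspected and changed for the current connection only. The 4M figure is just an example inside the range mentioned above, not a recommendation.

```sql
-- Current value in bytes
show variables like 'sort_buffer_size';

-- 4194304 bytes = 4M, for this session only
set session sort_buffer_size = 4194304;
```

Setting it at session level keeps the change limited to the connection that runs the heavy sort.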

② Increase the max_length_for_sort_data parameter
MySQL only uses single-pass sorting if the total length of the columns the query reads is less than max_length_for_sort_data.
Increasing this parameter raises the probability that the improved algorithm is used. However, if it is set too high, the total data volume is more likely to exceed sort_buffer_size. The obvious symptoms are high disk I/O activity and low processor usage. Adjust it between 1024 and 8192.
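
A sketch of the corresponding statements, with 4096 as an arbitrary value inside the range given above. Note that, according to the MySQL reference manual, this variable is deprecated as of MySQL 8.0.20 and no longer has any effect there, so the sketch applies to older versions.

```sql
-- Current value in bytes
show variables like 'max_length_for_sort_data';

-- Example value, for this session only (relevant before MySQL 8.0.20)
set session max_length_for_sort_data = 4096;
```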

③ Reduce the number of columns after select
When the total size of the columns the query reads is less than max_length_for_sort_data and no sort column is of type TEXT or BLOB, the improved algorithm (single-pass sorting) is used. Otherwise the old algorithm (two-pass sorting) is used.
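
As an illustration on the emp table, the two statements below sort the same rows but carry different amounts of data through the sort. The column list in the second statement is an assumption about what the caller actually needs.

```sql
-- Every column of every matching row goes into the sort
select * from emp where age = 30 order by deptid;

-- Only the columns that are really needed: narrower rows,
-- so more of them fit into sort_buffer
select empno, name, deptid from emp where age = 30 order by deptid;
```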

Both algorithms can exceed the capacity of sort_buffer. When that happens, a tmp file is created for a merge sort, which causes multiple I/O operations. The risk is greater with single-pass sorting, so sort_buffer_size should be increased.

6. Use a covering index

Covering index: the SQL statement gets all the data the query needs from the index alone, without first finding the primary key through the secondary index and then fetching the row. No lookup back to the table is needed.
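
A small sketch, assuming the index idx_age_name from section 4 still exists on emp. The comments describe what is expected in the Extra column; the actual plan should be confirmed with explain.

```sql
-- Only age and name are read and both are in idx_age_name:
-- Extra is expected to show Using index
explain select age, name from emp where age = 30 order by name;

-- select * needs columns that are not in the index,
-- so every matching entry has to go back to the table
explain select * from emp where age = 30 order by name;
```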


7. group by

GROUP BY uses indexes in almost the same way as ORDER BY. The only difference is that GROUP BY can use the index directly even when there is no filter condition.
In essence, GROUP BY sorts first and then groups, and it follows the leftmost-prefix rule of the index.
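
A sketch on the emp table, assuming the indexes idx_age_name and idx_age_empno from section 4 are still in place and no index starts with deptid.

```sql
-- age is the leftmost column of an index: grouping can follow the index,
-- even without a WHERE clause
explain select age, count(*) from emp group by age;

-- deptid is not the leftmost column of any index here:
-- expect Using temporary (and, before MySQL 8.0, also Using filesort)
explain select deptid, count(*) from emp group by deptid;
```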

Origin blog.csdn.net/weixin_43719015/article/details/104967687