Let’s talk about how MySQL handles sorting

This article is shared from the Huawei Cloud Community: "How does MySQL handle sorting⭐️How to optimize queries that need to be sorted?", author: Caicai's back-end private kitchen.

Preface

The keywords order by and group by are often used in MySQL queries.

What they have in common is that both can require MySQL to sort rows by the specified fields. So how is sorting implemented in a query?

When a query needs sorted output, there are two cases:

  1. The records are already read in order, so no sort is needed.
  2. The records are not read in order, so they must be sorted.

Use indexes to ensure ordering

In the first case, the order of the index columns in a secondary index is often what guarantees the order of the result set, so no sort is required.

For example, if table a has a secondary index on a2, the a2 values are stored in order in that secondary index.

CREATE TABLE `a` (
   `a1` int(11) NOT NULL AUTO_INCREMENT,
   `a2` varchar(255) CHARACTER SET utf8mb4 DEFAULT NULL,
   `a3` varchar(255) DEFAULT NULL,
   PRIMARY KEY (`a1`),
   KEY `idx_a2` (`a2`)
 ) ENGINE=InnoDB AUTO_INCREMENT=76 DEFAULT CHARSET=utf8;

select * from a order by a.a2 limit 10

When the optimizer chooses the a2 index, the records are read in a2 order already, so no extra sorting work is needed.

Of course, the optimizer may not use the a2 index. When it estimates that going from a2 back to the clustered index for every row (the "back to table" lookup) is too expensive, it uses a full table scan instead.
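
The trade-off can be sketched with a toy model in Python. This is only an illustration under stated assumptions, not how InnoDB is implemented: the dictionary `rows` stands in for the clustered index, the sorted list `idx_a2` stands in for the secondary index, and both function names are hypothetical.

```python
# Toy model: `rows` plays the clustered index (primary key -> full row),
# `idx_a2` plays the secondary index: (a2, primary key) pairs kept in a2 order.
rows = {pk: (pk, f"name{(pk * 7) % 100:02d}", "payload") for pk in range(1, 101)}
idx_a2 = sorted((row[1], pk) for pk, row in rows.items())


def order_by_a2_via_index(limit):
    """Walk the secondary index in order; one clustered-index lookup per row."""
    lookups = 0
    result = []
    for _, pk in idx_a2[:limit]:
        result.append(rows[pk])  # the "back to table" lookup
        lookups += 1
    return result, lookups


def order_by_a2_via_full_scan(limit):
    """Read every row from the clustered index, then sort all of them by a2."""
    return sorted(rows.values(), key=lambda r: r[1])[:limit]


via_index, lookups = order_by_a2_via_index(10)
via_scan = order_by_a2_via_full_scan(10)
print(lookups, via_index == via_scan)
```

With a small limit the index path touches only a few rows and never sorts. Without a limit it would pay one lookup per row in the table, which is the cost the optimizer weighs against a full scan followed by a sort.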

When the index chosen by the optimizer does not deliver rows in a2 order, the results must be sorted by other means.

filesort

When Using filesort appears in the Extra column of the execution plan, sort_buffer is used to sort the results.

sort_buffer is a piece of memory used for sorting. It may store all the fields required by the query, or only the sort fields and the primary key.

show variables like 'max_length_for_sort_data'

When the length of the fields required by the query is less than max_length_for_sort_data, all fields required by the query are put into sort_buffer, the rows are sorted by the sort columns, and the results are returned.

When the length of the fields required by the query is greater than max_length_for_sort_data, only the sort fields and the primary key value are put into sort_buffer. After sorting, the clustered index is looked up by primary key to fetch the columns the query needs (equivalent to one more back-to-table lookup).
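
The two modes can be sketched in Python as follows. This is a simplified model, not MySQL's actual implementation: the threshold values, the `row_length` measure, the mode names and the choice to treat "equal to the threshold" as the full-row case are all assumptions made for illustration.

```python
# Toy clustered index: primary key -> full row (a1, a2, a3).
clustered = {
    1: (1, "pear", "x" * 10),
    2: (2, "apple", "y" * 10),
    3: (3, "mango", "z" * 10),
}


def row_length(row):
    """Crude stand-in for the byte length of the queried fields."""
    return sum(len(str(col)) for col in row)


def filesort(threshold):
    """Sort by a2, choosing the mode from a hypothetical length threshold."""
    rows = list(clustered.values())
    if max(row_length(r) for r in rows) <= threshold:
        # Full-row mode: the buffer holds every queried column.
        buffer = sorted(rows, key=lambda r: r[1])
        return "full_row", buffer
    # Key + primary key mode: the buffer holds only (sort key, primary key).
    buffer = sorted((r[1], r[0]) for r in rows)
    # After sorting, go back to the clustered index for the other columns.
    return "key_and_pk", [clustered[pk] for _, pk in buffer]


print(filesort(32)[0], filesort(8)[0])
```

Both modes return the same rows in the same order. The second keeps less data per row in the buffer but pays an extra clustered-index lookup for every row after the sort.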

If sort_buffer has enough memory, the sort is done entirely in memory. If it does not, temporary files on disk are used to assist the sort.

Turn on optimizer_trace to see whether temporary files are used to assist sorting.

# Enable optimizer tracing
SET optimizer_trace='enabled=on';
# The SQL statement to trace
select * from student order by student_name limit 10000;
# View the information recorded by the optimizer trace
SELECT * FROM `information_schema`.`OPTIMIZER_TRACE`\G

When temporary files are used, the sort is a merge sort: the data is split into chunks, each chunk is sorted and written to a small file, and the sorted files are then merged.
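
A minimal Python sketch of this kind of external merge sort is shown below. It illustrates the general technique only and does not reproduce MySQL's code: `buffer_size` (counted in items rather than bytes), `external_sort` and `read_run` are hypothetical names, and it assumes integer values.

```python
import heapq
import tempfile


def read_run(f):
    """Yield the integers stored one per line in a sorted temporary file."""
    for line in f:
        yield int(line)


def external_sort(values, buffer_size):
    """Sort `values` while holding at most `buffer_size` items per chunk.

    Returns the sorted list and the number of temporary files used.
    """
    if len(values) <= buffer_size:
        return sorted(values), 0  # everything fits in the buffer

    tmp_files = []
    for start in range(0, len(values), buffer_size):
        chunk = sorted(values[start:start + buffer_size])  # sort one chunk
        f = tempfile.TemporaryFile(mode="w+")
        f.writelines(f"{v}\n" for v in chunk)  # spill the sorted chunk to disk
        f.seek(0)
        tmp_files.append(f)

    # Merge the sorted runs by repeatedly taking the smallest head element.
    merged = list(heapq.merge(*(read_run(f) for f in tmp_files)))
    for f in tmp_files:
        f.close()
    return merged, len(tmp_files)


data = [9, 3, 7, 1, 8, 2, 6, 4, 5, 0]
print(external_sort(data, 4))
print(external_sort(data, 100))
```

In this sketch, a smaller buffer produces more temporary files for the same data, and a buffer large enough for all the data avoids them entirely.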

In the trace output, number_of_tmp_files is the number of temporary files used and sort_buffer_size is the size of sort_buffer.

Therefore, when using order by, group by, or other keywords that require sorting, it is best to create a suitable index.

If the amount of data is small, it can be sorted inside sort_buffer. If it is too large, the sort has to read and write temporary files on disk.

Summary

When a query needs sorted output, there are two cases: no sort is needed, or a sort is needed.

When the index used delivers rows in the required order, no sort is needed; the index guarantees the order.

When the index used does not deliver rows in the required order, sort_buffer is used for sorting. When the length of the queried fields does not exceed max_length_for_sort_data, each record in sort_buffer stores all the columns the query needs.

If the limit is exceeded, sort_buffer stores only the sort columns and the primary key value. After sorting, the primary key values are used to look up the clustered index and fetch the columns the query needs.

When the data is too large to sort in memory, temporary files on disk assist the sort: a merge algorithm sorts the data in chunks, writes them to multiple temporary files, and then merges them.

You can inspect the optimizer_trace output to see the number of temporary files and other information.

Create appropriate indexes for columns that need to be sorted, to avoid sorting with the help of temporary files on disk.

When an index cannot be used, sort_buffer_size or max_length_for_sort_data can be adjusted (with caution).

Origin my.oschina.net/u/4526289/blog/11138574