How does ORDER BY work?
CREATE TABLE `person` (
`id` int(11) NOT NULL,
`city` varchar(16) NOT NULL,
`name` varchar(16) NOT NULL,
`age` int(11) NOT NULL,
`addr` varchar(128) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `city` (`city`)
) ENGINE=InnoDB;
Given the table above, run the following SQL statement:
explain select city, name, age from person where city = '杭州' order by name limit 1000;
In the EXPLAIN output, the Extra column contains Using filesort, which means MySQL has to sort the rows. The sort may be done in memory or on disk.
So how does MySQL sort the records by the name field?
There are two ways of sorting: full field sorting and rowid sorting.
Sort by all fields
Full field sorting proceeds as follows:
- Initialize the sort buffer, then find the primary key id of the first record in the city index that satisfies city = '杭州' (Hangzhou)
- Using that primary key id, go back to the table to fetch the whole record, take the values of the city, name, and age fields, and store them in the sort buffer
- Read the primary key id of the next record from the city index
- Repeat the previous two steps until the city value no longer satisfies the condition
- Sort the data in the sort buffer by the name field, and return the first 1000 rows of the sorted result to the client
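The steps above can be sketched in a few lines of Python. This is only a toy model, not MySQL's actual implementation: the dictionary `table`, the list `city_index`, the sample rows, and the function `full_field_sort` are all hypothetical names made up for illustration.

```python
from bisect import bisect_left

# Hypothetical sample data: primary key id -> full row (stands in for the clustered index).
table = {
    1: {"id": 1, "city": "Hangzhou", "name": "Wang", "age": 30, "addr": "a"},
    2: {"id": 2, "city": "Suzhou", "name": "Li", "age": 25, "addr": "b"},
    3: {"id": 3, "city": "Hangzhou", "name": "Chen", "age": 41, "addr": "c"},
    4: {"id": 4, "city": "Hangzhou", "name": "Zhao", "age": 19, "addr": "d"},
}

# Toy secondary index on city: (city, id) pairs kept in sorted order.
city_index = sorted((row["city"], pk) for pk, row in table.items())


def full_field_sort(city, limit):
    """Toy model of full field sorting: every selected column goes into the sort buffer."""
    sort_buffer = []
    # Locate the first index entry for the city.
    pos = bisect_left(city_index, (city,))
    # Walk the index until the city no longer matches.
    while pos < len(city_index) and city_index[pos][0] == city:
        pk = city_index[pos][1]
        row = table[pk]  # go back to the table by primary key
        sort_buffer.append((row["city"], row["name"], row["age"]))
        pos += 1
    # Sort by name and return the first `limit` rows.
    sort_buffer.sort(key=lambda r: r[1])
    return sort_buffer[:limit]


print(full_field_sort("Hangzhou", 2))
```

Note that each row is read from the table exactly once, and the result is returned straight from the sort buffer.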
The size of the sort buffer is controlled by sort_buffer_size. We can check it by executing the following statement:
show variables like '%sort_buffer%'
This sorting process is called full field sorting.
The sort by name may be done entirely in memory, or it may need an external sort. Which one happens depends on how much memory the sort requires compared with sort_buffer_size (the amount of memory MySQL allocates for sorting, that is, the sort buffer).
If the data to be sorted does not fit in the sort buffer, MySQL has to sort with the help of temporary files on disk.
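To make the in-memory versus external distinction concrete, here is a minimal sketch of the general idea. It is an assumption-laden toy: `buffer_capacity` counts rows rather than bytes, the sorted "runs" are plain Python lists standing in for temporary files, and `sort_names` is a hypothetical helper, not anything MySQL exposes.

```python
import heapq


def sort_names(names, buffer_capacity):
    """Return (sorted_names, number_of_runs).

    If everything fits in the buffer, sort in memory (0 runs).
    Otherwise sort buffer-sized chunks separately and merge them,
    which is the basic shape of an external merge sort.
    """
    if len(names) <= buffer_capacity:
        return sorted(names), 0
    # Each sorted chunk plays the role of one temporary file.
    runs = [
        sorted(names[i:i + buffer_capacity])
        for i in range(0, len(names), buffer_capacity)
    ]
    # Merge the already-sorted runs into one ordered result.
    return list(heapq.merge(*runs)), len(runs)


print(sort_names(["b", "a"], 4))
print(sort_names(["d", "a", "c", "b", "e"], 2))
```

The smaller the buffer relative to the data, the more runs have to be written and merged, which is why a sort that spills to disk is slower.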
Sort by rowid
If the query returns many fields, then each row placed in the sort buffer is large and fewer rows fit in memory at once. The data then has to be split across many temporary files, and the sorting performance will be very poor.
So when a single row is large, full field sorting is not efficient enough. Can we solve this problem by putting less data into the sort buffer for each row?
When the size of a single row exceeds a fixed threshold, MySQL puts only the necessary fields (the sorting field and the primary key id) into the sort buffer. After sorting by name, it goes back to the table by primary key id to fetch the remaining fields of the query and returns them to the client.
This process is called rowid sorting.
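For comparison with full field sorting, rowid sorting can be sketched like this. Again this is a toy model under assumptions: `table`, `city_index`, the sample rows, and `rowid_sort` are hypothetical names for illustration only.

```python
from bisect import bisect_left

# Hypothetical sample data: primary key id -> full row.
table = {
    1: {"id": 1, "city": "Hangzhou", "name": "Wang", "age": 30, "addr": "a"},
    2: {"id": 2, "city": "Suzhou", "name": "Li", "age": 25, "addr": "b"},
    3: {"id": 3, "city": "Hangzhou", "name": "Chen", "age": 41, "addr": "c"},
    4: {"id": 4, "city": "Hangzhou", "name": "Zhao", "age": 19, "addr": "d"},
}

# Toy secondary index on city: (city, id) pairs kept in sorted order.
city_index = sorted((row["city"], pk) for pk, row in table.items())


def rowid_sort(city, limit):
    """Toy model of rowid sorting: only (name, id) goes into the sort buffer."""
    sort_buffer = []
    pos = bisect_left(city_index, (city,))
    while pos < len(city_index) and city_index[pos][0] == city:
        pk = city_index[pos][1]
        # First table lookup: fetch only the sort key for this row.
        sort_buffer.append((table[pk]["name"], pk))
        pos += 1
    sort_buffer.sort(key=lambda entry: entry[0])

    result = []
    for _name, pk in sort_buffer[:limit]:
        # Second table lookup: fetch the columns the query actually returns.
        row = table[pk]
        result.append((row["city"], row["name"], row["age"]))
    return result


print(rowid_sort("Hangzhou", 2))
```

The sort buffer entries are smaller, so more rows fit in memory, but every returned row costs one extra lookup in the table.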
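This threshold is the max_length_for_sort_data variable, which we can set with the following statement. When the size of a single row exceeds it, MySQL switches algorithms. (As of MySQL 8.0.20 this variable is deprecated and no longer has any effect.)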
SET max_length_for_sort_data = 16;
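Full field sorting or rowid sorting: how is the choice made?
When memory is sufficient, full field sorting is preferred, because the result can be returned directly from the sort buffer without going back to the table, which reduces disk access. When memory is not enough, rowid sorting is used so that more rows fit in the sort buffer, at the cost of an extra table lookup for each returned row.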
Of course, not every ORDER BY statement requires a sorting operation. MySQL has to build a temporary table and sort it only because the data it reads is not already in the required order.
Is it possible to read the data with name already in order?
Yes. All we need to do is build a joint index on city and name:
alter table person add index city_user(city, name);
If we run EXPLAIN on the query again, the Extra column of the execution plan no longer contains Using filesort. No sorting is required, because within the same city the index entries are already ordered by name when they are read.
Assuming the person table has a joint index on city and name, does the following statement need a sort?
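Why the joint index removes the sort can be seen with a small model. The sketch below treats the index as a sorted list of (city, name, id) tuples; the sample rows and the names `rows`, `city_name_index`, and `scan_city` are hypothetical, and this is an illustration of the ordering property rather than of InnoDB's B+ tree.

```python
from bisect import bisect_left

# Hypothetical sample rows: (id, city, name).
rows = [
    (1, "Hangzhou", "Wang"),
    (2, "Suzhou", "Li"),
    (3, "Hangzhou", "Chen"),
    (4, "Hangzhou", "Zhao"),
    (5, "Suzhou", "Bai"),
]

# Toy joint index on (city, name): entries ordered by city first, then by name.
city_name_index = sorted((city, name, pk) for pk, city, name in rows)


def scan_city(city, limit):
    """Read names for one city straight from the index, with no sort step."""
    start = bisect_left(city_name_index, (city,))
    names = []
    for indexed_city, name, _pk in city_name_index[start:]:
        # Stop once the city changes or enough rows have been read.
        if indexed_city != city or len(names) == limit:
            break
        names.append(name)
    return names


print(scan_city("Hangzhou", 100))
```

Because the scan can stop as soon as it has read `limit` entries, the limit also bounds how much of the index is touched.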
explain select * from person where city in ('杭州') order by name limit 100
The answer is no. Within a single city the names in the index are already ordered, so no sort is needed.
What about the following statement?
explain select * from person where city in ('杭州', '苏州') order by name limit 100
The answer is yes. Names are ordered only within each city; across multiple cities they are not in overall order, so the result has to be sorted.
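The difference between the one-city and two-city cases can be checked with the same kind of toy index. The entries and the helper `names_in_index_order` below are hypothetical sample data, used only to show the ordering property.

```python
# Toy joint index on (city, name): entries already sorted by city, then name.
# The rows are hypothetical sample data.
city_name_index = [
    ("Hangzhou", "Chen", 3),
    ("Hangzhou", "Wang", 1),
    ("Hangzhou", "Zhao", 4),
    ("Suzhou", "Bai", 5),
    ("Suzhou", "Li", 2),
]


def names_in_index_order(cities):
    """Return the names for the given cities in the order the index yields them."""
    return [name for city, name, _pk in city_name_index if city in cities]


one_city = names_in_index_order({"Hangzhou"})
two_cities = names_in_index_order({"Hangzhou", "Suzhou"})

# One city: index order is already name order.
print(one_city, one_city == sorted(one_city))
# Two cities: each city's names are ordered, but the combined sequence is not.
print(two_cities, two_cities == sorted(two_cities))
```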
When an ORDER BY statement runs slowly, we can optimize it in the following ways:
- Add an index on the sorting field
- Increase the size of the sort buffer
- Don't use * as a query list, just return the required columns