You write order by every day, but do you know how MySQL executes it under the hood?

  • In day-to-day development you will certainly need to sort results by some field before displaying them, but do you really understand how MySQL executes order by under the hood?
  • Suppose you want to find everyone whose city is Suzhou, sort them by name, and return the names and ages of the first 1000 people. How would you write this SQL statement?
  • First, create a user table with the following SQL statement:
CREATE TABLE user (
  id int(11) NOT NULL,
  city varchar(16) NOT NULL,
  name varchar(16) NOT NULL,
  age int(11) NOT NULL,
  PRIMARY KEY (id),
  KEY city (city)
) ENGINE=InnoDB;
  • The SQL query statement for the above requirements is as follows:
select city,name,age from user where city='Suzhou' order by name limit 1000;
  • I believe everyone can write this SQL query, but do you understand how MySQL executes it under the hood? Today, Mr. Chen will talk about how this SQL statement is executed and which parameters affect the execution process.
  • This article is divided into the following parts: full-field sorting, rowid sorting, full-field sorting vs rowid sorting, and how to avoid sorting.

Full-field sorting

  • As discussed earlier, an index avoids a full table scan, so we added an index on the city field. The city field is short, so there is no need to worry about how to index the string (for example, with a prefix index).
  • Now use explain to analyze the execution of this query. The result is as follows:
    [Figure: explain output for the query]
  • Using filesort in the Extra field indicates that sorting is required. MySQL allocates a block of memory to each thread for sorting, called sort_buffer.
  • Since the query uses the index, let's sketch the structure of the city index tree, as shown below:
    [Figure: structure of the city index tree]
  • As the figure shows, the records that satisfy city = 'Suzhou' run from ID3 to IDX.
  • Normally, this SQL statement is executed as follows:
  1. Initialize sort_buffer and determine that the three fields name, city, and age will be put into it.
  2. From the index city, find the first primary key id that satisfies the condition city = 'Suzhou', which is ID3 in the figure.
  3. Go to the primary key id index to retrieve the entire row, take the values of the three fields name, city, and age, and store them in sort_buffer.
  4. Take the primary key id of the next record from the index city.
  5. Repeat steps 3 and 4 until the value of city no longer satisfies the query condition; the last matching primary key id is IDX in the figure.
  6. Quicksort the data in sort_buffer by the field name.
  7. Take the first 1000 rows of the sorted result and return them to the client.
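  • The steps above can be sketched in Python. This is only a toy model, not how MySQL is implemented: the sample rows, the dictionary standing in for the primary key index, the sorted list standing in for the city index, and the function name full_field_sort are all assumptions made for illustration.

```python
import bisect

# Hypothetical sample data: primary key id -> (city, name, age).
primary_index = {
    1: ("Beijing", "Zoe", 30),
    2: ("Hangzhou", "Adam", 25),
    3: ("Suzhou", "Mia", 41),
    4: ("Suzhou", "Bob", 19),
    5: ("Suzhou", "Eve", 33),
    6: ("Wuxi", "Carl", 52),
}
# Stand-in for the secondary index on city: (city, id) pairs in sorted order.
city_index = sorted((row[0], pk) for pk, row in primary_index.items())


def full_field_sort(city, limit):
    sort_buffer = []  # step 1: will hold name, city and age for every match
    # Step 2: seek to the first index entry for this city.
    pos = bisect.bisect_left(city_index, (city,))
    # Steps 3-5: keep reading until the city no longer matches.
    while pos < len(city_index) and city_index[pos][0] == city:
        pk = city_index[pos][1]
        row_city, name, age = primary_index[pk]  # step 3: fetch the full row
        sort_buffer.append((name, row_city, age))
        pos += 1  # step 4: next entry in the city index
    sort_buffer.sort(key=lambda entry: entry[0])  # step 6: sort by name
    # Step 7: return the first `limit` rows as (city, name, age).
    return [(c, n, a) for n, c, a in sort_buffer[:limit]]


print(full_field_sort("Suzhou", 2))
```

  • Note that the primary key index is read only while filling sort_buffer; once the sort finishes, the result comes straight from the buffer.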
  • We call this sorting process full-field sorting. Its execution flow chart is as follows:
    [Figure: flow chart of full-field sorting]
  • The sort-by-name step in the figure may be completed in memory or may require an external sort, depending on the memory needed for sorting and the parameter sort_buffer_size.
  • sort_buffer_size: the size of the memory (sort_buffer) that MySQL allocates for sorting. If the amount of data to be sorted is smaller than sort_buffer_size, sorting is done in memory. If the data is too large to fit, MySQL has to use temporary disk files to assist the sort.
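  • The in-memory versus external distinction can be illustrated with a small Python sketch. It is a simplification under stated assumptions: the hypothetical parameter buffer_rows counts rows instead of bytes, in-memory lists stand in for temporary files, and the function name sort_with_buffer is invented here. It does not claim to mirror MySQL's actual filesort code.

```python
import heapq


def sort_with_buffer(names, buffer_rows):
    """Return (sorted names, number of temporary runs used)."""
    if len(names) <= buffer_rows:
        # Everything fits in the buffer: sort in memory, no temporary runs.
        return sorted(names), 0
    # Too large: sort one buffer-sized chunk at a time. Each sorted chunk
    # stands in for one temporary file on disk.
    runs = [
        sorted(names[start:start + buffer_rows])
        for start in range(0, len(names), buffer_rows)
    ]
    # Merge the sorted runs into a single ordered result.
    return list(heapq.merge(*runs)), len(runs)


print(sort_with_buffer(["d", "a", "c", "b", "e"], 10))
print(sort_with_buffer(["d", "a", "c", "b", "e"], 2))
```

  • The smaller the buffer relative to the data, the more runs have to be written and merged, which is why a too-small sort_buffer_size hurts.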

rowid sorting

  • In the algorithm above, the original table data is read only once, and the remaining operations are performed in sort_buffer and temporary files. However, this algorithm has a problem. If the query returns many fields, many fields have to be placed in sort_buffer, so only a few rows fit in memory at the same time and sorting performance will be poor.
  • So if a single row is large, this method is not efficient enough.
  • We can change the max_length_for_sort_data parameter to make MySQL use another algorithm. max_length_for_sort_data is a MySQL parameter that controls the length of the row data used for sorting. If the length of a single row exceeds this value, MySQL considers the row too large and switches to a different algorithm.
  • The total length of the three fields city, name, and age is 36. Let's set max_length_for_sort_data to 16 and see how the process changes. The SQL statement to set it is as follows:
SET max_length_for_sort_data = 16;
  • The new algorithm puts only the column to be sorted (the name field) and the primary key id into sort_buffer.
  • But now the sorted result cannot be returned directly, because the values of the city and age fields are missing. The entire execution process becomes as follows:
  1. Initialize sort_buffer and determine that two fields, name and id, will be put into it.
  2. From the index city, find the first primary key id that satisfies the condition city = 'Suzhou', which is ID3 in the figure.
  3. Go to the primary key id index to retrieve the entire row, take the two fields name and id, and store them in sort_buffer.
  4. Take the primary key id of the next record from the index city.
  5. Repeat steps 3 and 4 until the value of city no longer satisfies the query condition; the last matching primary key id is IDX in the figure.
  6. Quicksort the data in sort_buffer by the field name.
  7. Iterate through the sorted result, take the first 1000 rows, go back to the original table by id to fetch the three fields city, name, and age, and return them to the client.
  • The schematic diagram of this execution process is as follows. I call it rowid sorting.
    [Figure: flow chart of rowid sorting]
  • Compared with full-field sorting, rowid sorting performs one more back-to-table lookup: after sorting, it queries the primary key index tree again by id to fetch the fields it returns (step 7).
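  • The rowid process can be sketched in Python as follows. This is a toy model under the same kind of assumptions as any small simulation: the sample rows, the dictionary standing in for the primary key index, the sorted list standing in for the city index, and the function name rowid_sort are hypothetical. It also counts primary key lookups to make the extra back-to-table step visible.

```python
import bisect

# Hypothetical sample data: primary key id -> (city, name, age).
primary_index = {
    1: ("Beijing", "Zoe", 30),
    2: ("Hangzhou", "Adam", 25),
    3: ("Suzhou", "Mia", 41),
    4: ("Suzhou", "Bob", 19),
    5: ("Suzhou", "Eve", 33),
    6: ("Wuxi", "Carl", 52),
}
# Stand-in for the secondary index on city: (city, id) pairs in sorted order.
city_index = sorted((row[0], pk) for pk, row in primary_index.items())


def rowid_sort(city, limit):
    """Return (rows, number of primary key lookups)."""
    lookups = 0
    sort_buffer = []  # holds only (name, id)
    pos = bisect.bisect_left(city_index, (city,))  # seek to the first match
    while pos < len(city_index) and city_index[pos][0] == city:
        pk = city_index[pos][1]
        name = primary_index[pk][1]  # read the row, keep only name and id
        lookups += 1
        sort_buffer.append((name, pk))
        pos += 1
    sort_buffer.sort(key=lambda entry: entry[0])  # sort by name
    rows = []
    for _name, pk in sort_buffer[:limit]:
        rows.append(primary_index[pk])  # second trip to the table, by id
        lookups += 1
    return rows, lookups


print(rowid_sort("Suzhou", 2))
```

  • With three matching rows and a limit of 2, this sketch performs 3 lookups while filling the buffer plus 2 more after sorting, whereas a full-field sort over the same data would need only the first 3.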

Full field sorting vs rowid sorting

  • If MySQL judges that the sort memory is too small and sorting efficiency would suffer, it uses the rowid sorting algorithm, so that more rows can be sorted at once, at the cost of going back to the original table to fetch the data.
  • If MySQL judges that memory is large enough, it prefers full-field sorting and puts all required fields in sort_buffer, so that after sorting the query results are returned directly from memory without going back to the original table.
  • This also reflects one of MySQL's design ideas: if there is enough memory, use more of it to minimize disk access.
  • For InnoDB tables, rowid sorting requires extra back-to-table lookups that cause more disk reads, so it is not the preferred choice.
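  • The choice described in this article can be written as a tiny Python sketch. The byte breakdown of 36 (16 for city, 16 for name, 4 for age) is my assumption about how the total is reached, the value 1024 is just an illustrative large setting, and the function name choose_sort_algorithm is hypothetical. The comparison is the rule as stated above, not MySQL source code.

```python
# Assumed per-field lengths in bytes; the article only states the total (36).
field_lengths = {"city": 16, "name": 16, "age": 4}


def choose_sort_algorithm(row_length, max_length_for_sort_data):
    """Pick rowid sorting when a single row exceeds the configured limit."""
    if row_length > max_length_for_sort_data:
        return "rowid"
    return "full-field"


row_length = sum(field_lengths.values())
print(row_length)
print(choose_sort_algorithm(row_length, 16))
print(choose_sort_algorithm(row_length, 1024))
```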

How to avoid sorting

  • In fact, not every order by statement requires a sort operation. From the execution process analyzed above, we can see that MySQL needs to generate temporary tables and sort them only because the original data is unordered.
  • If you can guarantee that the rows taken from the city index are already sorted by name in ascending order, is there any need to sort them again?
  • This is where a joint index comes in. Create a joint index on (city, name) with the following SQL statement:
alter table user add index city_user(city, name);
  • The index tree at this time is as follows:
    [Figure: structure of the (city, name) joint index tree]
  • In this index, we can still use the tree search to locate the first record that satisfies city = 'Suzhou'. In addition, while traversing the "next record" in sequence, as long as the value of city is Suzhou, the values of name are guaranteed to be in order.
  • According to the above figure, the entire query process is as follows:
  1. From the index (city, name), find the first primary key id that satisfies the condition city = 'Suzhou'.
  2. Go to the primary key id index to retrieve the entire row, take the values of the three fields name, city, and age, and return them directly as part of the result set.
  3. Take the primary key id of the next record from the index (city, name).
  4. Repeat steps 2 and 3 until the 1000th record is found, or until the condition city = 'Suzhou' is no longer met.
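  • A minimal Python sketch of this index-ordered scan follows. As before, it is a toy model: the sample rows, the sorted list standing in for the (city, name) joint index, and the function name index_ordered_scan are assumptions for illustration. The point is that no sort call appears anywhere and the loop stops as soon as the limit is reached.

```python
import bisect

# Hypothetical sample data: primary key id -> (city, name, age).
primary_index = {
    1: ("Beijing", "Zoe", 30),
    2: ("Hangzhou", "Adam", 25),
    3: ("Suzhou", "Mia", 41),
    4: ("Suzhou", "Bob", 19),
    5: ("Suzhou", "Eve", 33),
    6: ("Wuxi", "Carl", 52),
}
# Stand-in for the joint index: (city, name, id) entries in sorted order,
# so entries with the same city are already ordered by name.
city_name_index = sorted((row[0], row[1], pk) for pk, row in primary_index.items())


def index_ordered_scan(city, limit):
    result = []
    # Locate the first entry for this city.
    pos = bisect.bisect_left(city_name_index, (city,))
    while (
        pos < len(city_name_index)
        and city_name_index[pos][0] == city  # stop when the city changes
        and len(result) < limit  # or when enough rows were found
    ):
        pk = city_name_index[pos][2]
        result.append(primary_index[pk])  # fetch (city, name, age) by id
        pos += 1  # next entry in the joint index
    return result


print(index_ordered_scan("Suzhou", 2))
```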
  • The corresponding flowchart is as follows:
    [Figure: flow chart of the query using the (city, name) joint index]
  • As you can see, this query process requires neither a temporary table nor sorting. Next, we use the result of explain to confirm this.
    [Figure: explain output with the (city, name) joint index]
  • As you can see from the figure, there is no Using filesort in the Extra field, which means no sorting is required. And because the (city, name) joint index is itself ordered, this query does not have to read all 4000 rows. It can stop as soon as the first 1000 records that meet the condition are found. In other words, in our example only 1000 rows need to be scanned.
  • Is this good enough? Can this query be optimized further?
  • Do you remember covering indexes? The advantage of a covering index is that it avoids going back to the table.
  • If we create a joint index on (city, name, age), the query above can use it as a covering index and the back-to-table lookup is avoided. The SQL statement is as follows:
alter table user add index city_user_age(city, name, age);
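  • The effect of the covering index can be sketched in Python. This is a toy model with hypothetical sample entries and an invented function name, covering_index_scan. Because every index entry already carries city, name, and age, the sketch has no primary key structure at all and therefore no back-to-table lookup.

```python
import bisect

# Stand-in for the (city, name, age) joint index. Each hypothetical entry is
# (city, name, age, id), kept in sorted order.
covering_index = sorted([
    ("Beijing", "Zoe", 30, 1),
    ("Hangzhou", "Adam", 25, 2),
    ("Suzhou", "Mia", 41, 3),
    ("Suzhou", "Bob", 19, 4),
    ("Suzhou", "Eve", 33, 5),
    ("Wuxi", "Carl", 52, 6),
])


def covering_index_scan(city, limit):
    result = []
    # Locate the first entry for this city.
    pos = bisect.bisect_left(covering_index, (city,))
    while (
        pos < len(covering_index)
        and covering_index[pos][0] == city
        and len(result) < limit
    ):
        entry_city, name, age, _pk = covering_index[pos]
        # All requested fields come from the index entry itself.
        result.append((entry_city, name, age))
        pos += 1
    return result


print(covering_index_scan("Suzhou", 2))
```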
  • The execution flow chart at this time is as follows:
    [Figure: flow chart of the query using the (city, name, age) covering index]
  • Of course, covering indexes improve efficiency, but maintaining an index also has a cost, so you need to weigh the trade-off.

Origin www.cnblogs.com/CQqfjy/p/12717509.html