Combining ORDER BY and LIMIT in MySQL: paging results do not match expectations

  In MySQL, we often use ORDER BY for sorting and LIMIT for paging. When we need to sort first and then page, we usually write something like "select * from table_name order by sort_field limit M,N". This pattern hides a trap: if the sort field contains duplicate values, the rows returned for each page can be inconsistent with expectations.

1. The anomaly

  For example, a MySQL 5.6.17 database contains a table named tbl_mgm_tour with the following structure:

mysql> show full columns from tbl_mgm_tour;
+---------+--------------+-----------------+------+-----+---------+-------+---------------------------------+--------------+
| Field   | Type         | Collation       | Null | Key | Default | Extra | Privileges                      | Comment      |
+---------+--------------+-----------------+------+-----+---------+-------+---------------------------------+--------------+
| tour_id | char(15)     | utf8_general_ci | NO   | PRI |         |       | select,insert,update,references | 景区编号     |
| name    | varchar(100) | utf8_general_ci | NO   |     |         |       | select,insert,update,references | 景区名称     |
| grade   | varchar(10)  | utf8_general_ci | NO   |     |         |       | select,insert,update,references | 景区等级     |
+---------+--------------+-----------------+------+-----+---------+-------+---------------------------------+--------------+
3 rows in set (0.03 sec)

  The table data is as follows:

mysql> select * from tbl_mgm_tour;
+---------+----------------------------------------------+-------+
| tour_id | name                                         | grade |
+---------+----------------------------------------------+-------+
| 001     | 东方明珠广播电视塔                           | 5A    |
| 002     | 上海野生动物园                               | 5A    |
| 003     | 上海科技馆                                   | 5A    |
| 005     | 上海博物馆                                   | 4A    |
| 006     | 上海佘山国家森林公园·东佘山园                | 4A    |
| 007     | 上海佘山国家森林公园·西佘山园                | 4A    |
| 008     | 上海豫园                                     | 4A    |
| 009     | 金茂大厦88层观光厅                           | 4A    |
| 056     | 上海南汇桃花村                               | 3A    |
| 057     | 大宁郁金香公园                               | 3A    |
| 058     | 东方假日田园                                 | 3A    |
| 059     | 廊下生态园                                   | 3A    |
| 060     | 中国农民画村                                 | 3A    |
+---------+----------------------------------------------+-------+
13 rows in set (0.00 sec)
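
  To try this out locally, the table can be recreated from the structure and data shown above. This is only a sketch: the InnoDB engine, the explicit utf8 character set, and the empty-string defaults are assumptions inferred from the output, not something the original setup confirms. The anomaly itself may not reproduce, since it depends on the server version and the execution plan.

```sql
-- Sketch of the table from the SHOW FULL COLUMNS output above.
-- Assumptions: ENGINE=InnoDB, CHARSET=utf8 and DEFAULT '' are inferred, not confirmed.
CREATE TABLE tbl_mgm_tour (
  tour_id CHAR(15)     NOT NULL DEFAULT '' COMMENT '景区编号',
  name    VARCHAR(100) NOT NULL DEFAULT '' COMMENT '景区名称',
  grade   VARCHAR(10)  NOT NULL DEFAULT '' COMMENT '景区等级',
  PRIMARY KEY (tour_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_general_ci;

-- The 13 rows listed in the SELECT output above.
INSERT INTO tbl_mgm_tour (tour_id, name, grade) VALUES
  ('001', '东方明珠广播电视塔', '5A'),
  ('002', '上海野生动物园', '5A'),
  ('003', '上海科技馆', '5A'),
  ('005', '上海博物馆', '4A'),
  ('006', '上海佘山国家森林公园·东佘山园', '4A'),
  ('007', '上海佘山国家森林公园·西佘山园', '4A'),
  ('008', '上海豫园', '4A'),
  ('009', '金茂大厦88层观光厅', '4A'),
  ('056', '上海南汇桃花村', '3A'),
  ('057', '大宁郁金香公园', '3A'),
  ('058', '东方假日田园', '3A'),
  ('059', '廊下生态园', '3A'),
  ('060', '中国农民画村', '3A');
```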

  Now suppose we want to query tbl_mgm_tour in descending order of scenic spot grade, paged with 5 rows per page. It is natural to write the SQL statement as:

SELECT * FROM tbl_mgm_tour ORDER BY grade DESC LIMIT 0, 5;

  When this query is executed for the first page, the result is:

mysql> SELECT * FROM tbl_mgm_tour ORDER BY grade DESC LIMIT 0, 5;
+---------+----------------------------------------------+-------+
| tour_id | name                                         | grade |
+---------+----------------------------------------------+-------+
| 001     | 东方明珠广播电视塔                           | 5A    |
| 002     | 上海野生动物园                               | 5A    |
| 003     | 上海科技馆                                   | 5A    |
| 006     | 上海佘山国家森林公园·东佘山园                | 4A    |
| 007     | 上海佘山国家森林公园·西佘山园                | 4A    |
+---------+----------------------------------------------+-------+
5 rows in set (0.00 sec)

  When querying the data on the second page, the result is:

mysql> SELECT * FROM tbl_mgm_tour ORDER BY grade DESC LIMIT 5, 5;
+---------+----------------------------------------------+-------+
| tour_id | name                                         | grade |
+---------+----------------------------------------------+-------+
| 007     | 上海佘山国家森林公园·西佘山园                | 4A    |
| 006     | 上海佘山国家森林公园·东佘山园                | 4A    |
| 005     | 上海博物馆                                   | 4A    |
| 060     | 中国农民画村                                 | 3A    |
| 057     | 大宁郁金香公园                               | 3A    |
+---------+----------------------------------------------+-------+
5 rows in set (0.00 sec)

  The tbl_mgm_tour table has 13 rows, which should make 3 pages, but in the actual query results the same rows (tour_id 006 and 007) appear on both the first page and the second page, while tour_id 008 and 009 appear on neither.

2. Anomaly analysis

  What is going on? Doesn't the paging SQL above sort the table data first and then fetch the rows for the requested page?

  The results above show that the statement is not executed that way. In fact, MySQL optimizes LIMIT queries. For the details, see the official documentation: https://dev.mysql.com/doc/refman/5.7/en/limit-optimization.html (this is the description for version 5.7). The points directly related to this problem are extracted and explained below.

  • If you combine LIMIT row_count with ORDER BY, MySQL stops sorting as soon as it has found the first row_count rows of the sorted result, rather than sorting the entire result. If ordering is done by using an index, this is very fast. If a filesort must be done, all rows that match the query without the LIMIT clause are selected, and most or all of them are sorted, before the first row_count are found. After the initial rows have been found, MySQL does not sort any remainder of the result set.

    One manifestation of this behavior is that an ORDER BY query with and without LIMIT may return rows in different order, as described later in this section.

  In other words, when LIMIT row_count is combined with ORDER BY, MySQL stops sorting as soon as it has found the first row_count rows of the sorted result, instead of sorting the entire result set. If the ordering is done through an index, this is very fast. If a filesort is required, all rows matching the query (without the LIMIT) are selected and most or all of them are sorted until the first row_count rows are found; after that, MySQL does not sort the remaining rows.

  Here we look at the execution plan of the corresponding SQL:

mysql> EXPLAIN SELECT * FROM tbl_mgm_tour ORDER BY grade DESC LIMIT 0, 5;
+----+-------------+--------------+------+---------------+------+---------+------+------+----------------+
| id | select_type | table        | type | possible_keys | key  | key_len | ref  | rows | Extra          |
+----+-------------+--------------+------+---------------+------+---------+------+------+----------------+
|  1 | SIMPLE      | tbl_mgm_tour | ALL  | NULL          | NULL | NULL    | NULL |   13 | Using filesort |
+----+-------------+--------------+------+---------------+------+---------+------+------+----------------+
1 row in set (0.00 sec)

  The plan confirms that a filesort is used and that the table has no additional index to sort by. So when this SQL is executed, MySQL stops sorting once it has found the rows required by LIMIT and returns them, without fully ordering the rest.

  But even if it stops early, why is the pagination inaccurate? The official documentation explains:

If multiple rows have identical values in the ORDER BY columns, the server is free to return those rows in any order, and may do so differently depending on the overall execution plan. In other words, the sort order of those rows is nondeterministic with respect to the nonordered columns.

  That is, if multiple rows have the same value in the ORDER BY columns, MySQL is free to return those rows in any order, and that order can change with the execution plan. The relative order of the tied rows is nondeterministic, because nothing in the ORDER BY clause distinguishes them.

  Based on this, we can see why the paging is inaccurate: the sort field is grade, and several rows share the same grade value, so the order of those rows in the result is not fixed. In the case above, when the first page was queried, the rows named "Shanghai Sheshan National Forest Park·Dongsheshan Park" and "Shanghai Sheshan National Forest Park·Xisheshan Park" (tour_id 006 and 007) happened to be placed first among the 4A rows; when the second page was queried, the same two rows happened to be placed later, so they appeared again on the second page.
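
  A quick way to see whether a sort column is exposed to this problem is to look for values shared by more than one row. The query below is a small sketch for the table in this article; the alias rows_with_same_grade is only an illustrative name.

```sql
-- List every grade value that is shared by more than one row.
-- Any row returned here means ORDER BY grade alone leaves ties unresolved.
SELECT grade, COUNT(*) AS rows_with_same_grade
FROM tbl_mgm_tour
GROUP BY grade
HAVING COUNT(*) > 1;
```

  With the 13 rows shown earlier, all three grades are returned (5A with 3 rows, 4A with 5 rows, 3A with 5 rows), so every page boundary falls inside a group of tied rows.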

3. Solution

  How should this be resolved? The official documentation gives a solution:

If it is important to ensure the same row order with and without LIMIT, include additional columns in the ORDER BY clause to make the order deterministic. For example, if id values are unique, you can make rows for a given category value appear in id order by sorting like this:

  That is, to guarantee the same row order with and without LIMIT, add an extra sort condition that makes the order deterministic. For example, if the id column is unique, append it to the ORDER BY clause so that rows with equal values in the main sort field always come out in the same order.

  So in the case above, add another sort field to the SQL, such as tour_id, the primary key of tbl_mgm_tour; because it is unique, it breaks every tie and the paging problem is solved. The modified SQL is as follows:

mysql> SELECT * FROM tbl_mgm_tour ORDER BY grade DESC, tour_id LIMIT 0, 5;

  Run the test again and the problem is solved!
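
  For completeness, here is a sketch of all three pages using the tie-breaker. The offset follows the usual pattern (page_number - 1) * page_size, with a page size of 5 as in this article.

```sql
-- Page 1: offset 0
SELECT * FROM tbl_mgm_tour ORDER BY grade DESC, tour_id LIMIT 0, 5;

-- Page 2: offset 5
SELECT * FROM tbl_mgm_tour ORDER BY grade DESC, tour_id LIMIT 5, 5;

-- Page 3: offset 10 (only 3 rows remain)
SELECT * FROM tbl_mgm_tour ORDER BY grade DESC, tour_id LIMIT 10, 5;
```

  Because (grade, tour_id) is unique for every row, the full ordering is fixed: with the data above, page 1 holds tour_id 001, 002, 003, 005, 006; page 2 holds 007, 008, 009, 056, 057; and page 3 holds 058, 059, 060. No row is repeated or skipped.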

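4. Supplementary notes

  For the same data, the result may or may not show the anomaly depending on the database version. The version tested above is 5.6.17; when the same test was run on version 5.7.29, the paging results were as expected. This is not a guarantee, though: without a unique tie-breaker in ORDER BY, the order of rows with equal sort values remains nondeterministic in any version.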


Origin blog.csdn.net/piaoranyuji/article/details/113883210