The difference in deduplication efficiency between distinct and group by in MySQL

1. Distinct

The role of distinct

In MySQL, the main function of the distinct keyword is to filter out rows whose values repeat in one or more columns of a table, returning only one copy of each to the user. distinct can only be used in a select statement.

The principle of distinct

The main principle of distinct deduplication is to first group the data to be deduplicated, and then return one row from each group to the client. Depending on the indexes available, this grouping happens in one of two ways:

The columns that distinct depends on are all covered by an index:

In this case, MySQL groups the qualifying rows directly by scanning the index, and then takes one row from each group.

The columns that distinct depends on are not all covered by an index:

In this case, the index cannot support the whole grouping process, so a temporary table is needed. MySQL first writes the qualifying rows into a temporary table, groups the rows in that table, and then takes one row from each group. The rows are not sorted while they are grouped in the temporary table.
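
The temporary-table path can be modelled in a few lines of Python. This is a conceptual sketch, not MySQL's executor: a Python dict stands in for the internal temporary table, and the function name `distinct_via_temp_table` and the sample `orders` rows are hypothetical. It keeps one row per distinct key and, like the temporary-table grouping described above, does no sorting, so rows come back in first-seen order.

```python
from typing import Hashable, Iterable, List, Sequence, Tuple


def distinct_via_temp_table(
    rows: Iterable[Sequence[Hashable]], cols: Sequence[int]
) -> List[Tuple[Hashable, ...]]:
    """Return one tuple per distinct combination of the columns in `cols`.

    Conceptual model only: the dict plays the role of MySQL's internal
    temporary table. Its keys are the grouping values, so a repeated key
    collapses on insert. Nothing is sorted; dicts keep insertion order,
    so the output is in first-seen order.
    """
    temp_table = {}
    for row in rows:
        key = tuple(row[c] for c in cols)  # project the distinct columns
        temp_table.setdefault(key, None)   # a duplicate key is simply ignored
    return list(temp_table)


# Hypothetical sample data: (customer, city)
orders = [("alice", "bj"), ("bob", "sh"), ("alice", "bj"), ("carol", "bj")]
print(distinct_via_temp_table(orders, [1]))     # [('bj',), ('sh',)]
print(distinct_via_temp_table(orders, [0, 1]))  # [('alice', 'bj'), ('bob', 'sh'), ('carol', 'bj')]
```

The index-based path differs mainly in where the grouping comes from: an index already stores its keys in order, so equal values are adjacent and no separate table is needed.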

The syntax of distinct:

select distinct expression[,expression…] from tables [where conditions];

When using distinct, keep the following points in mind:

When deduplicating, distinct must come before all of the selected columns, directly after select.
If several columns follow the distinct keyword, deduplication applies to their combination: rows are treated as duplicates only when the values of all those columns are equal.
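
Both points can be checked quickly with Python's built-in `sqlite3` module. This is a stand-in under stated assumptions: SQLite is used instead of MySQL only because it needs no server, and the table `orders(customer, city)` and its rows are hypothetical. The deduplication semantics shown here are standard SQL; MySQL-specific details such as index use are not reproduced.

```python
import sqlite3

# SQLite stands in for MySQL here; `orders` is a hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", "bj"), ("bob", "sh"), ("alice", "bj"), ("alice", "sh")],
)

# One column after distinct: repeated cities collapse.
cities = conn.execute("SELECT DISTINCT city FROM orders ORDER BY city").fetchall()

# Two columns after distinct: a row is a duplicate only if BOTH values match,
# so the second ('alice', 'bj') is dropped but ('alice', 'sh') is kept.
pairs = conn.execute(
    "SELECT DISTINCT customer, city FROM orders ORDER BY customer, city"
).fetchall()

# distinct must come first, directly after select; elsewhere it is a syntax error.
try:
    conn.execute("SELECT customer, DISTINCT city FROM orders")
    misplaced_rejected = False
except sqlite3.OperationalError:
    misplaced_rejected = True

print(cities)              # [('bj',), ('sh',)]
print(pairs)               # [('alice', 'bj'), ('alice', 'sh'), ('bob', 'sh')]
print(misplaced_rejected)  # True
```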

2. Group by

Before MySQL 8.0, group by performs an implicit sort, which can trigger a filesort and make the SQL run less efficiently. Starting with MySQL 8.0, the implicit sort has been removed.

Implicit sorting

For an explanation of implicit sorting, see the official MySQL documentation:

MySQL :: MySQL 5.7 Reference Manual :: 8.2.1.14 ORDER BY Optimization

GROUP BY implicitly sorts by default (that is, in the absence of ASC
or DESC designators for GROUP BY columns). However, relying on
implicit GROUP BY sorting (that is, sorting in the absence of ASC or
DESC designators) or explicit sorting for GROUP BY (that is, by using
explicit ASC or DESC designators for GROUP BY columns) is deprecated.
To produce a given sort order, provide an ORDER BY clause.

In short:

By default, GROUP BY sorts implicitly (that is, it sorts even when the GROUP BY columns have no ASC or DESC designator). However, relying on either implicit or explicit GROUP BY sorting is deprecated. To produce a given sort order, provide an ORDER BY clause.

Therefore, before MySQL 8.0, group by sorts the result by the grouping columns (the columns after group by) by default. When an index can be used, group by needs no extra sort operation; when the index order cannot be used, the MySQL optimizer has to implement GROUP BY with a temporary table followed by a sort. When the result set grows beyond the in-memory temporary table limit configured on the server (tmp_table_size and max_heap_table_size), MySQL moves the temporary table to disk, and the statement becomes much slower. This is why MySQL chose to deprecate implicit sorting.
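
The cost difference described above can be illustrated with a toy model in Python. The two functions are hypothetical and only mimic the two strategies conceptually: `group_by_with_sort` sorts on the grouping column first (the extra step that corresponds to the implicit sort and its filesort), while `group_by_with_hash` groups in a hash table and skips sorting. Both return the same groups with the same counts; only the order differs, and the sort adds O(n log n) work on top of the single pass.

```python
from itertools import groupby
from operator import itemgetter


def group_by_with_sort(rows, col):
    """Sort on the grouping column, then count runs of equal keys.

    The sorted() call models the extra sort step (the filesort).
    """
    ordered = sorted(rows, key=itemgetter(col))
    return [(key, sum(1 for _ in grp)) for key, grp in groupby(ordered, key=itemgetter(col))]


def group_by_with_hash(rows, col):
    """Count rows per key in a dict: one pass, no sort, first-seen order."""
    counts = {}
    for row in rows:
        counts[row[col]] = counts.get(row[col], 0) + 1
    return list(counts.items())


# Hypothetical (customer, city) rows, grouped by city.
orders = [("alice", "sh"), ("bob", "bj"), ("carol", "sh")]
print(group_by_with_sort(orders, 1))  # [('bj', 1), ('sh', 2)]
print(group_by_with_hash(orders, 1))  # [('sh', 2), ('bj', 1)]
```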

For these reasons, MySQL changed this behavior in 8.0:

MySQL :: MySQL 8.0 Reference Manual :: 8.2.1.16 ORDER BY Optimization

Previously (MySQL 5.7 and lower), GROUP BY sorted implicitly under
certain conditions. In MySQL 8.0, that no longer occurs, so specifying
ORDER BY NULL at the end to suppress implicit sorting (as was done
previously) is no longer necessary. However, query results may differ
from previous MySQL versions. To produce a given sort order, provide
an ORDER BY clause.

In short:

Previously (MySQL 5.7 and earlier), group by sorted implicitly under certain conditions. In MySQL 8.0 this no longer happens, so appending order by null to suppress the implicit sort is no longer necessary. However, query results may differ from earlier MySQL versions. To get results in a given order, name the sort columns in an ORDER BY clause.
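
Following that advice, a query that needs its groups in a particular order should say so with order by rather than rely on group by. Here is a minimal sketch using Python's `sqlite3` module as a stand-in for MySQL. The `orders` table and its rows are hypothetical; the SQL text itself is also valid in MySQL 8.0.

```python
import sqlite3

# SQLite stands in for MySQL; `orders` is a hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", "bj"), ("bob", "sh"), ("carol", "bj"), ("dave", "gz")],
)

# Without ORDER BY, the order of the groups is not guaranteed by the SQL
# standard. The explicit ORDER BY makes the order part of the query.
per_city = conn.execute(
    "SELECT city, COUNT(*) AS n FROM orders GROUP BY city ORDER BY city DESC"
).fetchall()

print(per_city)  # [('sh', 1), ('gz', 1), ('bj', 2)]
```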

3. Comparison between distinct and group by

Same semantics, with an index:

Both group by and distinct can use the index, with the same efficiency, because group by and distinct are almost equivalent: distinct can be regarded as a special case of group by.
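
The equivalence of results can be checked directly. Without aggregates, select distinct a, b and select a, b ... group by a, b return the same set of rows. This minimal sketch uses Python's `sqlite3` as a stand-in for MySQL. The table, the index name `idx_customer_city` and the data are hypothetical, and the check covers only the results, not MySQL's execution plans. Because neither query has an order by, the results are compared as sets.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, city TEXT)")
# Hypothetical composite index covering both deduplicated columns.
conn.execute("CREATE INDEX idx_customer_city ON orders (customer, city)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", "bj"), ("bob", "sh"), ("alice", "bj"), ("alice", "sh")],
)

distinct_rows = conn.execute(
    "SELECT DISTINCT customer, city FROM orders"
).fetchall()
grouped_rows = conn.execute(
    "SELECT customer, city FROM orders GROUP BY customer, city"
).fetchall()

# Neither query has ORDER BY, so compare as sets rather than as lists.
print(set(distinct_rows) == set(grouped_rows))  # True
print(sorted(distinct_rows))  # [('alice', 'bj'), ('alice', 'sh'), ('bob', 'sh')]
```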

Same semantics, without an index:

Before MySQL 8.0, distinct is more efficient than group by. Both perform a grouping operation, but group by additionally performs an implicit sort, which triggers a filesort and lowers SQL execution efficiency. Since MySQL 8.0 removed the implicit sort, distinct and group by run with almost the same efficiency when the semantics are the same and there is no index.

Reasons for recommending group by:

The semantics of group by are clearer, and group by can process data in more complex ways. Because the distinct keyword applies to every selected column, group by is more flexible for composite business logic: based on the groups, it can filter data with having or compute over the data with aggregate functions.

distinct is often said to compare rows one against another and to traverse the entire table.

group by, by contrast, first groups the data by the grouping columns and then runs the query, so it is often claimed that group by is faster than distinct on large data volumes.
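
The extra flexibility of group by shows up as soon as the query needs more than the distinct keys. This minimal sketch again uses Python's `sqlite3` as a stand-in for MySQL, with a hypothetical `orders(customer, city, amount)` table. group by counts and sums each customer's orders, and having keeps only customers with more than one order. select distinct on its own does not do either of these.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, city TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("alice", "bj", 10),
        ("bob", "sh", 5),
        ("alice", "sh", 20),
        ("carol", "bj", 7),
        ("carol", "gz", 3),
    ],
)

# Group per customer, aggregate each group, then filter groups with HAVING.
repeat_customers = conn.execute(
    """
    SELECT customer, COUNT(*) AS order_count, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    HAVING COUNT(*) > 1
    ORDER BY customer
    """
).fetchall()

print(repeat_customers)  # [('alice', 2, 30), ('carol', 2, 10)]
```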


Source: blog.csdn.net/lijie0213/article/details/128789271