Interviewer: Which is more efficient in MySQL, distinct or group by?

Hello, I'm Luren (路人). More articles are available on my personal blog: http://itsoku.com

Let me state the general conclusion first (the complete conclusion is at the end of the article):

  • With the same semantics and an index: both group by and distinct can use the index, and their efficiency is the same.

  • With the same semantics and no index: distinct is more efficient than group by. Both distinct and group by perform a grouping operation, but group by may also sort, which triggers a filesort and slows the SQL down.

Based on this conclusion, you might ask:

  • Why is group by as efficient as distinct when the semantics are the same and there is an index?

  • Under what circumstances does group by perform a sort?

Let's find the answers to these two questions. First, let's look at the basic usage of distinct and group by.

The use of distinct

distinct Usage
SELECT DISTINCT columns FROM table_name WHERE where_conditions;

For example:

mysql> select distinct age from student;
+------+
| age  |
+------+
|   10 |
|   12 |
|   11 |
| NULL |
+------+
4 rows in set (0.01 sec)

The DISTINCT keyword returns only distinct (unique) values. It is placed before the first column in the select list and applies to all columns in that list.

If a column contains NULL values and DISTINCT is applied to that column, MySQL keeps one NULL and drops the others, because DISTINCT treats all NULL values as the same value.

distinct multi-column deduplication

distinct on multiple columns deduplicates on the combination of the specified columns: two rows count as duplicates only if all of the specified columns are equal.

SELECT DISTINCT column1,column2 FROM table_name WHERE where_conditions;

For example:

mysql> select distinct sex,age from student;
+--------+------+
| sex    | age  |
+--------+------+
| male   |   10 |
| female |   12 |
| male   |   11 |
| male   | NULL |
| female |   11 |
+--------+------+
5 rows in set (0.02 sec)

The use of group by

For basic deduplication, group by is used in much the same way as distinct:

Single Column Deduplication

Syntax:

SELECT columns FROM table_name WHERE where_conditions GROUP BY columns;

Example:

mysql> select age from student group by age;
+------+
| age  |
+------+
|   10 |
|   12 |
|   11 |
| NULL |
+------+
4 rows in set (0.02 sec)

Multi-column deduplication

Syntax:

SELECT columns FROM table_name WHERE where_conditions GROUP BY columns;

Example:

mysql> select sex,age from student group by sex,age;
+--------+------+
| sex    | age  |
+--------+------+
| male   |   10 |
| female |   12 |
| male   |   11 |
| male   | NULL |
| female |   11 |
+--------+------+
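5 rows in set (0.03 sec)

Example of the difference

The syntactic difference between the two is that group by can deduplicate on a single column while other columns are still selected. group by first groups the rows (and, before MySQL 8.0, sorts them), then returns one row per group; deduplication is based on the columns listed after group by. Note that a query like the one below, which selects a column that is neither grouped nor aggregated, runs only when the ONLY_FULL_GROUP_BY SQL mode is disabled (it is enabled by default since MySQL 5.7.5), and the age value returned for each group is not guaranteed to come from any particular row.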

For example:

mysql> select sex,age from student group by sex;
+--------+-----+
| sex    | age |
+--------+-----+
| male   |  10 |
| female |  12 |
+--------+-----+
2 rows in set (0.03 sec)
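
If the intent of a query like this is one age per sex, a version that also runs when the ONLY_FULL_GROUP_BY SQL mode is enabled (the default since MySQL 5.7.5) has to say which age is wanted. The sketch below reuses the student table from the examples above; choosing MIN(age) or ANY_VALUE(age) is an assumption about the intent, not something the article specifies.

```sql
-- Deterministic: the smallest non-NULL age in each group.
SELECT sex, MIN(age) AS age
FROM student
GROUP BY sex;

-- Explicitly arbitrary: ANY_VALUE() (MySQL 5.7.5+) accepts the age of any
-- row in the group and satisfies the ONLY_FULL_GROUP_BY check.
SELECT sex, ANY_VALUE(age) AS age
FROM student
GROUP BY sex;
```

The first form gives a reproducible result; the second keeps the "any row will do" behavior of the original query but makes that choice visible in the SQL.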

The principle of distinct and group by

In most cases, DISTINCT can be regarded as a special case of GROUP BY. Both are implemented as grouping operations, and both can be executed through a loose index scan or a tight index scan (index scans will be covered in detail in other articles, so I won't go into them here).

DISTINCT and GROUP BY can both use an index to scan and search. For example, analyzing the two SQL statements below with EXPLAIN (just look at the Extra column at the end of each table) shows that both report Using index for group-by, which indicates a loose index scan.

Therefore, in general, DISTINCT and GROUP BY statements with the same semantics can be optimized with the same index techniques.

mysql> explain select int1_index from test_distinct_groupby group by int1_index;
+----+-------------+-----------------------+------------+-------+---------------+---------+---------+------+------+----------+--------------------------+
| id | select_type | table                 | partitions | type  | possible_keys | key     | key_len | ref  | rows | filtered | Extra                    |
+----+-------------+-----------------------+------------+-------+---------------+---------+---------+------+------+----------+--------------------------+
|  1 | SIMPLE      | test_distinct_groupby | NULL       | range | index_1       | index_1 | 5       | NULL |  955 |   100.00 | Using index for group-by |
+----+-------------+-----------------------+------------+-------+---------------+---------+---------+------+------+----------+--------------------------+
1 row in set (0.05 sec)

mysql> explain select distinct int1_index from test_distinct_groupby;
+----+-------------+-----------------------+------------+-------+---------------+---------+---------+------+------+----------+--------------------------+
| id | select_type | table                 | partitions | type  | possible_keys | key     | key_len | ref  | rows | filtered | Extra                    |
+----+-------------+-----------------------+------------+-------+---------------+---------+---------+------+------+----------+--------------------------+
|  1 | SIMPLE      | test_distinct_groupby | NULL       | range | index_1       | index_1 | 5       | NULL |  955 |   100.00 | Using index for group-by |
+----+-------------+-----------------------+------------+-------+---------------+---------+---------+------+------+----------+--------------------------+
1 row in set (0.05 sec)
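
To try plans like the two above yourself, a table along the following lines should be enough. This is only a sketch: the article does not show the real table definition, so the id column and all column types are assumptions (the key_len of 5 in the output is consistent with a nullable INT). The plan the optimizer actually picks also depends on the data and statistics in the table.

```sql
-- Hypothetical schema; the real definition of test_distinct_groupby is not shown.
CREATE TABLE test_distinct_groupby (
    id                 INT NOT NULL AUTO_INCREMENT PRIMARY KEY, -- assumed surrogate key
    int1_index         INT NULL,  -- indexed column used in the two plans above
    int6_bigger_random INT NULL,  -- unindexed column (assumed type)
    KEY index_1 (int1_index)
);

-- Compare the Extra column of the two plans.
EXPLAIN SELECT DISTINCT int1_index FROM test_distinct_groupby;
EXPLAIN SELECT int1_index FROM test_distinct_groupby GROUP BY int1_index;
```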

However, before MySQL 8.0, GROUP BY by default also sorted its result implicitly on the grouping columns.

As you can see, the following SQL statement uses a temporary table and also performs a filesort.

mysql> explain select int6_bigger_random from test_distinct_groupby GROUP BY int6_bigger_random;
+----+-------------+-----------------------+------------+------+---------------+------+---------+------+-------+----------+---------------------------------+
| id | select_type | table                 | partitions | type | possible_keys | key  | key_len | ref  | rows  | filtered | Extra                           |
+----+-------------+-----------------------+------------+------+---------------+------+---------+------+-------+----------+---------------------------------+
|  1 | SIMPLE      | test_distinct_groupby | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 97402 |   100.00 | Using temporary; Using filesort |
+----+-------------+-----------------------+------------+------+---------------+------+---------+------+-------+----------+---------------------------------+
1 row in set (0.04 sec)

Implicit sorting

For implicit sorting, we can refer to the official MySQL documentation:

https://dev.mysql.com/doc/refman/5.7/en/order-by-optimization.html

GROUP BY implicitly sorts by default (that is, in the absence of ASC or DESC designators for GROUP BY columns). However, relying on implicit GROUP BY sorting (that is, sorting in the absence of ASC or DESC designators) or explicit sorting for GROUP BY (that is, by using explicit ASC or DESC designators for GROUP BY columns) is deprecated. To produce a given sort order, provide an ORDER BY clause.

In other words:

GROUP BY sorts implicitly by default (that is, it sorts even when the GROUP BY columns carry no ASC or DESC designator). However, relying on either implicit or explicit GROUP BY sorting is deprecated. To produce a given sort order, provide an ORDER BY clause.

Therefore, before MySQL 8.0, GROUP BY sorted its results by default on the grouping columns (the columns that follow GROUP BY). When an index can supply that order, GROUP BY needs no extra sort; when it cannot, the MySQL optimizer has to use a temporary table and then sort it to carry out the GROUP BY.

When the result set grows beyond the temporary table size configured for the system, MySQL copies the temporary table to disk and works on it there, and the statement becomes extremely slow. This is why MySQL chose to deprecate this operation (implicit sorting).

For these reasons, MySQL changed this behavior in 8.0:

https://dev.mysql.com/doc/refman/8.0/en/order-by-optimization.html

Previously (MySQL 5.7 and lower), GROUP BY sorted implicitly under certain conditions. In MySQL 8.0, that no longer occurs, so specifying ORDER BY NULL at the end to suppress implicit sorting (as was done previously) is no longer necessary. However, query results may differ from previous MySQL versions. To produce a given sort order, provide an ORDER BY clause.

In other words:

Previously (MySQL 5.7 and earlier), GROUP BY sorted implicitly under certain conditions. In MySQL 8.0 this no longer happens, so it is no longer necessary to append ORDER BY NULL to suppress implicit sorting. However, query results may differ from those of earlier MySQL versions. To get results in a given order, name the columns to sort by in an ORDER BY clause.
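
The two cases described in the documentation can be sketched with the unindexed column from the EXPLAIN example earlier in the article. The statements use only standard MySQL syntax; whether the filesort actually disappears from the plan depends on the server version and the data.

```sql
-- MySQL 5.7 and earlier: suppress the implicit sort when the order
-- of the groups does not matter.
SELECT int6_bigger_random
FROM test_distinct_groupby
GROUP BY int6_bigger_random
ORDER BY NULL;

-- Any version: request the order explicitly when it does matter,
-- instead of relying on GROUP BY to sort.
SELECT int6_bigger_random
FROM test_distinct_groupby
GROUP BY int6_bigger_random
ORDER BY int6_bigger_random;
```

Prefixing either statement with EXPLAIN and checking the Extra column for Using filesort is a quick way to see which form sorts on a given server.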

So we arrive at the complete conclusion:

  • With the same semantics and an index:

group by and distinct can both use the index, with the same efficiency, because group by is almost equivalent to distinct; distinct can be regarded as a special case of group by.

  • With the same semantics and no index:

distinct is more efficient than group by. Both perform a grouping operation, but before MySQL 8.0 group by also sorts implicitly, which triggers a filesort and makes the SQL slow.

However, since MySQL 8.0 the implicit sort has been removed. From that version on, with the same semantics and no index, group by is almost as efficient as distinct.

Reasons for recommending group by

  1. group by has clearer semantics

  2. group by allows more complex processing of the data

Compared with distinct, group by states its intent more clearly. And because the distinct keyword applies to every selected column, group by is the more flexible choice for composite business logic: it can process the data group by group, for example by filtering groups with having or by computing values with aggregate functions.
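
As a small illustration of that flexibility, the sketch below reuses the student table from the earlier examples. The aliases student_count and avg_age and the threshold of one student are arbitrary choices made for the example.

```sql
-- Number of students and average age per sex, keeping only the groups
-- that contain more than one student.
SELECT sex,
       COUNT(*) AS student_count,
       AVG(age) AS avg_age        -- AVG() skips rows where age is NULL
FROM student
GROUP BY sex
HAVING COUNT(*) > 1;
```

A query like this has no distinct equivalent, because distinct can only remove duplicate rows; it cannot compute per-group values or filter on them.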


Origin blog.csdn.net/likun557/article/details/131467238