MySQL GROUP BY optimization

group by

Without a usable index, MySQL processes GROUP BY by scanning the whole table. Each group gets one row in an in-memory temporary table, and as the table is scanned, the row for the matching group is updated. If the in-memory temporary table fills up, it is flushed to disk.

Test table:
CREATE TABLE tb_a(
`id` INT(11) NOT NULL AUTO_INCREMENT,
`user_id` INT(11) NOT NULL,
PRIMARY KEY (`id`));
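To reproduce the tests, the table needs some rows. The statements below are a minimal sketch with hypothetical sample data (the original post does not show its data set). They load a few rows and run the grouping query once without EXPLAIN.

```sql
-- Hypothetical sample rows: three users with 2, 1 and 3 rows respectively.
INSERT INTO tb_a (user_id) VALUES (1), (1), (2), (3), (3), (3);

SELECT user_id, COUNT(1) FROM tb_a
GROUP BY user_id;
-- Groups returned: (1, 2), (2, 1), (3, 3).
-- Row order is not guaranteed unless an ORDER BY is given.
```

With so few rows the optimizer may choose differently than on a large table, so a realistic row count is needed before drawing conclusions from the plans.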

1. Cancel the default sorting of GROUP BY, so that MySQL uses only a temporary table to process the grouping.

With no index on user_id, the query plan is as follows:

sql:
EXPLAIN SELECT user_id,COUNT(1) FROM tb_a
GROUP BY user_id

Execution plan:


The query is a full table scan that uses a temporary table and filesort.


By default, GROUP BY also sorts the results (this implicit sort applies to MySQL 5.7 and earlier; MySQL 8.0 removed it). You can add ORDER BY NULL to skip the sort. The execution plan is as follows:


sql:
EXPLAIN SELECT user_id,COUNT(1) FROM tb_a
GROUP BY user_id
ORDER BY NULL
Execution plan:


Adding ORDER BY NULL to the SQL removes the sort, so MySQL processes the GROUP BY with a temporary table only.
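To compare the two plans in more detail, the same statements can be run with EXPLAIN FORMAT=JSON, which reports temporary table and filesort usage in structured form. This is only a sketch against the test table above, and the exact output depends on the server version.

```sql
-- Plan with the default behaviour of GROUP BY.
EXPLAIN FORMAT=JSON
SELECT user_id, COUNT(1) FROM tb_a
GROUP BY user_id;

-- Plan with the sort explicitly suppressed.
EXPLAIN FORMAT=JSON
SELECT user_id, COUNT(1) FROM tb_a
GROUP BY user_id
ORDER BY NULL;
```

On a server where GROUP BY no longer sorts implicitly, both statements are expected to produce the same plan.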

2. Use an index
Create an index on user_id:
ALTER TABLE tb_a ADD INDEX user_id_index (user_id);

sql:
EXPLAIN SELECT user_id,COUNT(1) FROM tb_a
GROUP BY user_id
#ORDER BY NULL
Execution plan:


In the execution plan, the access type is index (a full index scan), and neither a temporary table nor filesort is used. The index is already sorted by user_id, so the rows of each group are adjacent, and because it is a covering index, all the queried data is read from the index itself with no random I/O.
So the execution plan for this SQL is the same with or without ORDER BY NULL.


Drop the user_id_index index so that the following tests run without it.
ALTER TABLE tb_a DROP INDEX user_id_index;



3. If no index is available for the grouping, or if the query needs columns that are not in the index (so a covering index cannot be used and a large amount of random I/O would be generated), choose between a temporary table and filesort according to the size of the grouped data:

When grouping with a temporary table, each group first gets one row in the temporary table. The full table is then scanned row by row, and the matching row in the temporary table is updated. If the amount of data is large and exceeds the configured temporary table size, the temporary table goes to disk and causes disk I/O. In that case it is more advantageous to use filesort directly.
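The size limit mentioned above can be inspected and changed through server variables. The sketch below shows one way to check it. The 16 MB value is purely illustrative and not a recommendation from the original post.

```sql
-- Size limits for in-memory temporary tables. When the MEMORY engine is used,
-- the effective limit is the smaller of these two values.
SHOW VARIABLES LIKE 'tmp_table_size';
SHOW VARIABLES LIKE 'max_heap_table_size';

-- Counters for temporary tables created, and for those created on disk.
SHOW GLOBAL STATUS LIKE 'Created_tmp_tables';
SHOW GLOBAL STATUS LIKE 'Created_tmp_disk_tables';

-- Illustrative only: raise both limits to 16 MB for the current session.
SET SESSION tmp_table_size = 16777216;
SET SESSION max_heap_table_size = 16777216;
```

If Created_tmp_disk_tables grows quickly compared with Created_tmp_tables, grouping queries are probably overflowing the in-memory limit.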


3.1 If the grouped data is small, use the temporary table

sql:
EXPLAIN SELECT SQL_SMALL_RESULT user_id, COUNT(1) FROM tb_a
GROUP BY user_id

With this hint, filesort should not be used. In my test, however, the execution plan still shows both a temporary table and filesort. Pointers on why are welcome.

The official documentation explains SQL_SMALL_RESULT as follows:
  SQL_SMALL_RESULT can be used with GROUP BY or DISTINCT to tell the
  optimizer that the result set is small. In this case, MySQL uses
  fast temporary tables to store the resulting table instead of using
  sorting. This should not normally be needed.


3.2 If the grouped data is large, use file sorting
sql:
EXPLAIN SELECT SQL_BIG_RESULT user_id,COUNT(1) FROM tb_a
GROUP BY user_id
Execution plan:


The SQL_BIG_RESULT hint tells MySQL to choose filesort, which avoids first building the data in a temporary table and then flushing that temporary table to disk.


4. Use skip-scan for min()/max()
The methods above are suitable for other aggregate functions. For min()/max(), the skip-scan optimization should be used instead. This optimization only applies when each group contains a large amount of data; otherwise MySQL will not perform the skip-scan.
EXPLAIN SELECT user_id, MAX(id) FROM tb_b
GROUP BY user_id
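The skip-scan (called loose index scan in the MySQL manual) needs an index that starts with the grouping column and is followed by the column passed to min()/max(). The sketch below assumes tb_b has the same columns as tb_a, since the original post does not show its definition, and idx_user_id_id is a hypothetical index name.

```sql
-- Assumption: tb_b has the columns (id, user_id), like tb_a.
-- idx_user_id_id is a hypothetical index name.
ALTER TABLE tb_b ADD INDEX idx_user_id_id (user_id, id);

EXPLAIN SELECT user_id, MAX(id) FROM tb_b
GROUP BY user_id;
-- When the optimizer picks the loose index scan, the Extra column
-- contains "Using index for group-by".
```

If Extra shows only "Using index", the optimizer chose an ordinary index scan, which is consistent with the note above that the skip-scan is used only when groups are large.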




