(Repost) MySQL Feature Analysis: Internal Temporary Tables

Two types of temporary tables in MySQL
External temporary tables
Temporary tables created by CREATE TEMPORARY TABLE are called external temporary tables. Such a table is visible only to the session that created it, and it is dropped automatically when that session ends. An external temporary table can have the same name as a non-temporary table; in that case the non-temporary table is hidden from the current session until the temporary table is dropped.
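The name-shadowing behavior can be seen in a short session. This is a minimal sketch; the table name demo_shadow is a hypothetical name chosen for illustration and does not appear elsewhere in this article.

```sql
-- demo_shadow is a hypothetical table name used only for this illustration.
CREATE TABLE demo_shadow (id INT);
INSERT INTO demo_shadow VALUES (1);

-- A temporary table with the same name hides the base table in this session.
CREATE TEMPORARY TABLE demo_shadow (id INT);
SELECT COUNT(*) FROM demo_shadow;   -- counts rows of the (empty) temporary table

-- Dropping the temporary table makes the base table visible again.
DROP TEMPORARY TABLE demo_shadow;
SELECT COUNT(*) FROM demo_shadow;   -- counts rows of the base table

DROP TABLE demo_shadow;
```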

Internal temporary tables
An internal temporary table is a special lightweight temporary table used for performance optimization. MySQL creates such tables automatically to store the intermediate results of certain operations, which may occur in either the optimization phase or the execution phase. Internal temporary tables are invisible to the user, but EXPLAIN or SHOW STATUS shows whether MySQL used one to complete an operation. Internal temporary tables play a very important role in the optimization of SQL statements, and many operations in MySQL rely on them. However, using one incurs the cost of creating the table and of writing and reading the intermediate data, so users should try to avoid internal temporary tables when writing SQL statements.

There are two types of internal temporary tables. The first is the HEAP temporary table: all of its data is stored in memory, so operating on it requires no IO. The second is the OnDisk temporary table, which, as the name suggests, stores its data on disk and is used for operations with large intermediate results. If the data stored in a HEAP temporary table grows beyond the in-memory limit, which is the smaller of TMP_TABLE_SIZE and MAX_HEAP_TABLE_SIZE (for details, please refer to the System Variables section in the MySQL manual), the HEAP temporary table is automatically converted to an OnDisk temporary table. In 5.7, the INTERNAL_TMP_DISK_STORAGE_ENGINE system variable selects whether OnDisk temporary tables use the MyISAM engine or the InnoDB engine.
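The settings involved can be inspected from any session. A minimal sketch, assuming a MySQL 5.7 server (the INTERNAL_TMP_DISK_STORAGE_ENGINE variable is not available in every version); tmp_table_size is included on the assumption, taken from the MySQL manual, that the in-memory limit is the smaller of it and max_heap_table_size:

```sql
-- Size limits for in-memory (HEAP) internal temporary tables.
-- Assumption from the MySQL manual: the smaller of the two values applies.
SHOW VARIABLES LIKE 'max_heap_table_size';
SHOW VARIABLES LIKE 'tmp_table_size';

-- Engine used for OnDisk internal temporary tables (MyISAM or InnoDB in 5.7).
SHOW VARIABLES LIKE 'internal_tmp_disk_storage_engine';

-- Counters: internal temporary tables created in total, and those created on disk.
SHOW GLOBAL STATUS LIKE 'Created_tmp_tables';
SHOW GLOBAL STATUS LIKE 'Created_tmp_disk_tables';
```

A Created_tmp_disk_tables value that grows quickly relative to Created_tmp_tables suggests that many intermediate results exceed the in-memory limit.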

This article focuses on which operations may use internal temporary tables. If users write SQL statements so that query optimization needs internal temporary tables as little as possible, query execution becomes noticeably more efficient.

First, we define a table t1:
CREATE TABLE t1( a int, b int); INSERT INTO t1 VALUES(1,2),(3,4);
All of the examples below are based on table t1.

The SQL statement uses the SQL_BUFFER_RESULT hint
SQL_BUFFER_RESULT is mainly used to make MySQL release the lock on the table as soon as possible. When the result set is large, sending it to the client takes a long time; buffering the result in a temporary table first effectively shortens the time the read lock is held on the table.

For example:
mysql> explain format=json select SQL_BUFFER_RESULT * from t1;
EXPLAIN
{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "2.00"
    },
    "buffer_result": {
      "using_temporary_table": true,
      "table": {
        "table_name": "t1",
        "access_type": "ALL",
        ...

The SQL statement contains a DERIVED_TABLE
In 5.7, because of the new optimization that merges derived tables into the outer query, we need to run set optimizer_switch='derived_merge=off' to prevent the derived table from being merged into the outer query.

For example:
mysql> set optimizer_switch='derived_merge=off';
mysql> explain format=json select * from (select * from t1) as tt;
EXPLAIN
{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "2.40"
    },
    "table": {
      "table_name": "tt",
      "access_type": "ALL",
      ...
      "materialized_from_subquery": {
        "using_temporary_table": true,
        ...


If we query a system table, such as a table in information_schema, its data is stored in an internal temporary table.
EXPLAIN currently cannot show whether reading a system table uses an internal temporary table, but SHOW STATUS can.

For example:
mysql> select * from information_schema.character_sets;
mysql> show status like 'CREATE%';
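Because the status counters are the only evidence here, it helps to read them both before and after the statement under test. A minimal sketch of that workflow follows. According to the MySQL manual, each SHOW STATUS invocation itself uses an internal temporary table, so the two readings should be compared with each other rather than treated as absolute values.

```sql
-- Read the session counters before the statement under test.
SHOW SESSION STATUS LIKE 'Created_tmp%';

-- Statement under test (taken from the example above).
SELECT * FROM information_schema.character_sets;

-- Read the counters again. An increase in Created_tmp_tables or
-- Created_tmp_disk_tables beyond what SHOW STATUS itself contributes
-- indicates that the statement used an internal temporary table.
SHOW SESSION STATUS LIKE 'Created_tmp%';
```

Running the two SHOW statements back to back with nothing in between gives a baseline for how much SHOW STATUS alone contributes on a given server.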


If a DISTINCT clause is not optimized away, that is, it is neither converted into a GROUP BY operation nor eliminated by means of a UNIQUE INDEX, an internal temporary table will be used.
mysql> explain format=json select distinct a from t1;
EXPLAIN
{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "1.60"
    },
    "duplicates_removal": {
      "using_temporary_table": true,
      ...


If the query has an ORDER BY clause that cannot be optimized away, the following situations use an internal temporary table to cache the intermediate data, which is then sorted.
1) If the join uses BNL (Block Nested-Loop)/BKA (Batched Key Access)
For example:
1)) BNL is turned on by default
mysql> explain format=json select * from t1, t1 as t2 order by t1.a;
EXPLAIN
{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "22.00"
    },
    "ordering_operation": {
      "using_temporary_table": true,
      ...

2)) After BNL is turned off, ORDER BY uses filesort directly.
mysql> set optimizer_switch='block_nested_loop=off';
Query OK, 0 rows affected (0.00 sec)
mysql> explain format=json select * from t1, t1 as t2 order by t1.a;
EXPLAIN
{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "25.00"
    },
    "ordering_operation": {
      "using_filesort": true,
      ...
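
Note that the switch stays off for the rest of the session unless it is restored. A minimal sketch that saves and restores the session value; the user variable name @saved_optimizer_switch is a hypothetical name chosen for illustration:

```sql
-- Remember the current session setting (hypothetical variable name).
SET @saved_optimizer_switch = @@optimizer_switch;

-- Turn BNL off only for the experiment.
SET optimizer_switch = 'block_nested_loop=off';
EXPLAIN FORMAT=JSON SELECT * FROM t1, t1 AS t2 ORDER BY t1.a;

-- Put the original setting back so that later statements run with it again.
SET optimizer_switch = @saved_optimizer_switch;
```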


2) If the ORDER BY column does not belong to the first table in the join order of the execution plan.
For example:
mysql> explain format=json select * from t1, t1 as t2 order by t2.a;
EXPLAIN
{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "25.00"
    },
    "ordering_operation": {
      "using_temporary_table": true,
      ...


3) If the ORDER BY expression is a complex expression.

So what kind of ORDER BY expression does MySQL consider complex?

1)) If the sort expression is an SP or a UDF.

For example:
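-- The function header was lost in the source; the lines from "delimiter |"
-- through "declare" are a reconstruction inferred from the function body.
drop function if exists func1;
delimiter |
create function func1(x int)
returns int deterministic
begin
declare z1, z2 int;
set z1 = x;
set z2 = z1+2;
return z2;
end|
delimiter ;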
explain format=json select * from t1 order by func1(a);
{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "2.20"
    },
    "ordering_operation": {
      "using_temporary_table": true,
      ...


2)) If the ORDER BY columns contain aggregate functions.
To simplify the execution plan, we use an INDEX to optimize the GROUP BY clause.

For example:
create index idx1 on t1(a);
explain format=json select a from t1 group by a order by sum(a);
{
  "query_block": {
    "select_id": 1,
    ...
    "ordering_operation": {
      "using_temporary_table": true,
      "using_filesort": true,
      "grouping_operation": {
        "using_filesort": false,
        ...
drop index idx1 on t1;


3)) If the ORDER BY columns contain a SCALAR SUBQUERY, provided that the SCALAR SUBQUERY has not been optimized away.

For example:
explain format=json select (select rand() from t1 limit 1) as a from t1 order by a;
{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "1.20"
    },
    "ordering_operation": {
      "using_temporary_table": true,
      "using_filesort": true,
      ...

4) If the query has both ORDER BY and GROUP BY clauses, but the two clauses use different columns.

Note: In 5.7, sql_mode must be set so that it does not include only_full_group_by; otherwise an error is reported.

Again, to simplify the execution plan, we use an INDEX to optimize the GROUP BY clause.

For example:
set sql_mode='';
create index idx1 on t1(b);
explain format=json select t1.a from t1 group by t1.b order by 1;
{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "1.40"
    },
    "ordering_operation": {
      "using_temporary_table": true,
      "using_filesort": true,
      "grouping_operation": {
        "using_filesort": false,
        ...
drop index idx1 on t1;
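
Setting sql_mode to the empty string also switches off every other mode for the session. A gentler variant, sketched here under the assumption that only ONLY_FULL_GROUP_BY needs to be removed, saves the current value, strips that one mode, and restores the value afterwards; @saved_sql_mode is a hypothetical variable name:

```sql
-- Remember the current session sql_mode (hypothetical variable name).
SET @saved_sql_mode = @@sql_mode;

-- Remove only ONLY_FULL_GROUP_BY and keep the other modes.
SET SESSION sql_mode = (SELECT REPLACE(@@sql_mode, 'ONLY_FULL_GROUP_BY', ''));

-- ... run the GROUP BY / ORDER BY example here ...

-- Restore the original sql_mode once the experiment is done.
SET SESSION sql_mode = @saved_sql_mode;
```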

If the query has a GROUP BY clause that cannot be optimized away, the following situations use an internal temporary table to cache the intermediate data, on which GROUP BY is then performed.
1) If the join uses BNL (Block Nested-Loop)/BKA (Batched Key Access).

For example:
explain format=json select t2.a from t1, t1 as t2 group by t1.a;
{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "8.20"
    },
    "grouping_operation": {
      "using_temporary_table": true,
      "using_filesort": true,
      "cost_info": {
        "sort_cost": "4.00"
        ...

2) If the GROUP BY column does not belong to the first table in the join order of the execution plan.

For example:
explain format=json select t2.a from t1, t1 as t2 group by t2.a;
{
  "query_block": {
    ...
    "grouping_operation": {
      "using_temporary_table": true,
      "using_filesort": true,
      "nested_loop": [
        ...

3) If the columns used by the GROUP BY clause differ from the columns used by the ORDER BY clause.

For example:
set sql_mode='';
explain format=json select t1.a from t1 group by t1.b order by t1.a;
{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "1.40"
    },
    "ordering_operation": {
      "using_filesort": true,
      "grouping_operation": {
        "using_temporary_table": true,
        "using_filesort": false,
        ...

4) If GROUP BY is used WITH ROLLUP and is based on a multi-table outer join.

For example:
explain format=json select sum(t1.a) from t1 left join t1 as t2 on true group by t1.a with rollup;
{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "7.20"
    },
    "grouping_operation": {
      "using_temporary_table": true,
      "using_filesort": true,
      "cost_info": {
        "sort_cost": "4.00"
      },
      ...


5) If the column used by the GROUP BY clause comes from a SCALAR SUBQUERY that has not been optimized away.

For example:
explain format=json select (select avg(a) from t1) as a from t1 group by a;
{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "3.40"
    },
    "grouping_operation": {
      "using_temporary_table": true,
      "using_filesort": true,
      "cost_info": {
        "sort_cost": "2.00"
      },
      ...

An IN expression is converted to a semi-join for optimization
1) If the semi-join execution strategy is Materialization
For example:
set optimizer_switch='firstmatch=off,duplicateweedout=off';
explain format=json select * from t1 where a in (select b from t1);
{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "5.60"
    },
    "nested_loop": [
      {
        "rows_examined_per_scan": 1,
        "materialized_from_subquery": {
          "using_temporary_table": true,
          ...

The query contains UNION
...
  "query_block": {
    "union_result": {
      "using_temporary_table": true,
      "table_name": "<union1,2>",
      ...

If the query statement uses a multi-table update.
Here EXPLAIN cannot show that an internal temporary table is used, so you need to check the status.
For example:
update t1, t1 as t2 set t1.a=3;
show status like 'CREATE%';

If the query uses any of the following aggregate functions, an internal temporary table is also used.
1) count(distinct *)
For example:
explain format=json select count(distinct a) from t1;
2) group_concat
For example:
explain format=json select group_concat(b) from t1;


In short, in the 10 cases listed above, MySQL uses internal temporary tables to cache intermediate results. If the amount of data is relatively large, the internal temporary table stores the data on disk, which clearly hurts performance. To keep the performance loss as small as possible, we should avoid the above situations wherever we can.

Source: Database Kernel Month
Original : http://mysql.taobao.