Index invalidation caused by mismatched MySQL column character sets

Reposted from: MySQL table field character set caused by the index invalidation problem

1. Overview

Yesterday I ran into the following problem on a colleague's MySQL instance. A LEFT JOIN between two tables had an execution plan in which one of the tables was read with a full table scan of nearly 1 million rows. Once this SQL started arriving, the database became almost unusable. The MySQL version is the official 5.7.12.

2. Reproducing the problem

First, the table structures and data are as follows:

mysql> show create table t1\G
*************************** 1. row ***************************
Table: t1
Create Table: CREATE TABLE `t1` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(20) DEFAULT NULL,
`code` varchar(50) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `idx_code` (`code`),
KEY `idx_name` (`name`)
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

mysql> show create table t2\G
*************************** 1. row ***************************
Table: t2
Create Table: CREATE TABLE `t2` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(20) DEFAULT NULL,
`code` varchar(50) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `idx_code` (`code`),
KEY `idx_name` (`name`)
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=utf8mb4
1 row in set (0.00 sec)

mysql> select * from t1;
+----+------+--------+
| id | name | code   |
+----+------+--------+
|  1 | aaaa | ...... |
|  2 | bbbb | ...... |
|  3 | cccc | ...... |
|  4 | dddd | ...... |
|  5 | eeee | ...... |
+----+------+--------+
5 rows in set (0.00 sec)

mysql> select * from t2;
+----+------+--------+
| id | name | code   |
+----+------+--------+
|  1 | aaaa | ...... |
|  2 | bbbb | ...... |
|  3 | cccc | ...... |
|  4 | dddd | ...... |
|  5 | eeee | ...... |
+----+------+--------+
5 rows in set (0.00 sec)

The execution plan of the LEFT JOIN between the two tables is as follows:

mysql> desc select * from t2 left join t1 on t1.code = t2.code where t2.name = 'dddd'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t2
partitions: NULL
type: ref
possible_keys: idx_name
key: idx_name
key_len: 83
ref: const
rows: 1
filtered: 100.00
Extra: NULL
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: t1
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 5
filtered: 100.00
Extra: Using where; Using join buffer (Block Nested Loop)
2 rows in set, 1 warning (0.01 sec)

The plan clearly shows that the filter t2.name = 'dddd' uses an index, but the join condition t1.code = t2.code does not use the index on t1.code (type: ALL, key: NULL). Scott was puzzled at first, but the machine does not lie. He ran show warnings to view the rewritten query:

mysql> show warnings;

| Level | Code | Message |
| Note | 1003 | /* select#1 */ select `testdb`.`t2`.`id` AS `id`,`testdb`.`t2`.`name` AS `name`,`testdb`.`t2`.`code` AS `code`,`testdb`.`t1`.`id` AS `id`,`testdb`.`t1`.`name` AS `name`,`testdb`.`t1`.`code` AS `code` from `testdb`.`t2` left join `testdb`.`t1` on((convert(`testdb`.`t1`.`code` using utf8mb4) = `testdb`.`t2`.`code`)) where (`testdb`.`t2`.`name` = 'dddd') |

1 row in set (0.00 sec)
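The convert(...) in the rewritten query is the giveaway. The following Python sketch picks such conversions out of the SHOW WARNINGS note. The helper name find_implicit_conversions is hypothetical, and the regex assumes the backquoted `db`.`table`.`column` form shown above.

```python
import re

# Matches MySQL's rewritten form: convert(`db`.`table`.`column` using charset)
CONVERT_RE = re.compile(r"convert\((`[^`]+`(?:\.`[^`]+`)*) using (\w+)\)", re.IGNORECASE)


def find_implicit_conversions(note: str) -> list[tuple[str, str]]:
    """Return (column, target_charset) pairs found in a SHOW WARNINGS note."""
    return [(col.replace("`", ""), cs) for col, cs in CONVERT_RE.findall(note)]


# The Message text from the show warnings output above.
note = (
    "/* select#1 */ select `testdb`.`t2`.`id` AS `id`,`testdb`.`t2`.`name` AS `name`,"
    "`testdb`.`t2`.`code` AS `code`,`testdb`.`t1`.`id` AS `id`,`testdb`.`t1`.`name` AS `name`,"
    "`testdb`.`t1`.`code` AS `code` from `testdb`.`t2` left join `testdb`.`t1` "
    "on((convert(`testdb`.`t1`.`code` using utf8mb4) = `testdb`.`t2`.`code`)) "
    "where (`testdb`.`t2`.`name` = 'dddd')"
)

for column, charset in find_implicit_conversions(note):
    # Only a problem when the converted column belongs to the driven table.
    print(f"{column} -> {charset}: if this is the driven table's join column, its index cannot be used")
```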

Seeing convert(testdb.t1.code using utf8mb4), Scott realized that the two tables use different character sets: t1 is utf8 and t2 is utf8mb4. But why does a difference in table character set (more precisely, in column character set) cause a full table scan on t1? Let's analyze it.

(1) Because the query is t2 left join t1, t2 is the driving table. This step is equivalent to executing select * from t2 where t2.name = 'dddd' and taking the value of the code column, here '8a77a32a7e0825f7c8634226105c42e5'.
(2) That code value is then used to look up t1 according to the join condition. This step is equivalent to executing select * from t1 where t1.code = '8a77a32a7e0825f7c8634226105c42e5'.
(3) However, the code value fetched from t2 in step (1) is in the utf8mb4 character set, while t1.code is utf8, so a character set conversion is required. The conversion goes from the smaller character set to the larger one: utf8mb4 is a superset of utf8, so the utf8 side is converted, that is, t1.code becomes convert(t1.code using utf8mb4). The index on t1.code stores the original utf8 values, not the converted ones, so it cannot be searched for the converted value. The optimizer therefore ignores the index, and t1 can only be read with a full table scan. Worse, if the filter on t2 returns more rows, t1 has to be scanned repeatedly (the Extra column shows Block Nested Loop, which scans t1 once per join-buffer load of t2 rows), so the cost grows with both tables.
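The key point of step (3) is that wrapping an indexed column in a function prevents an index lookup. SQLite has no character sets, so the sketch below uses lower() as a stand-in for convert(... using utf8mb4). It is an analogy run with Python's built-in sqlite3 module, not MySQL itself. The exact wording of SQLite's plan text varies between versions.

```python
import sqlite3


def plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Return the detail text of SQLite's EXPLAIN QUERY PLAN, one step per line."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row is the human-readable step description.
    return "\n".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (id INTEGER PRIMARY KEY, name TEXT, code TEXT)")
conn.execute("CREATE INDEX idx_code ON t1 (code)")
conn.executemany(
    "INSERT INTO t1 (name, code) VALUES (?, ?)",
    [(n, "code-" + n) for n in ("aaaa", "bbbb", "cccc", "dddd", "eeee")],
)

# Hypothetical stand-in for the code value fetched from the driving table t2.
probe = ("code-dddd",)

# Step (2) as written: the bare indexed column is compared with a value.
direct = plan(conn, "SELECT * FROM t1 WHERE code = ?", probe)

# Step (3) after rewriting: the indexed column is wrapped in a function.
# lower() plays the role of convert(t1.code using utf8mb4).
wrapped = plan(conn, "SELECT * FROM t1 WHERE lower(code) = ?", probe)

print("direct: ", direct)   # typically a SEARCH step using idx_code
print("wrapped:", wrapped)  # typically a SCAN of t1
```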

3. Problem solving

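Now that the cause is clear, the fix is to make the character sets match: change t1 to match t2, or t2 to match t1. Here we convert t1 to utf8mb4. This is the safe direction, because MySQL's utf8 stores at most 3 bytes per character, and converting t2 down to utf8 would fail on, or mangle, any 4-byte characters (such as emoji) already stored there. How do you change the character set?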
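To judge whether the opposite direction (t2 down to utf8) would be lossy, you can check for characters that need 4 bytes in UTF-8. These are exactly the code points above U+FFFF. The Python sketch below does this. The function names and sample values are hypothetical; in practice you would feed it the values read from t2.code.

```python
def needs_utf8mb4(value: str) -> bool:
    """True if any character encodes to 4 bytes in UTF-8 (code point above U+FFFF)."""
    return any(ord(ch) > 0xFFFF for ch in value)


def lossy_if_downgraded(values: list[str]) -> list[str]:
    """Values that MySQL's 3-byte utf8 character set could not store."""
    return [v for v in values if needs_utf8mb4(v)]


sample = ["8a77a32a7e0825f7c8634226105c42e5", "café", "ok 😀"]
print(lossy_if_downgraded(sample))  # ['ok 😀']
```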

Some people use alter table t1 charset utf8mb4;, but this is wrong. It only changes the table's default character set, which applies to columns added later. Existing columns stay utf8:

mysql> alter table t1 charset utf8mb4;
Query OK, 0 rows affected (0.01 sec)
Records: 0 Duplicates: 0 Warnings: 0

mysql> show create table t1\G
*************************** 1. row ***************************
Table: t1
Create Table: CREATE TABLE `t1` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(20) CHARACTER SET utf8 DEFAULT NULL,
`code` varchar(50) CHARACTER SET utf8 DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `idx_code` (`code`),
KEY `idx_name` (`name`)
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=utf8mb4
1 row in set (0.00 sec)

The correct statement is alter table t1 convert to charset utf8mb4;, which also converts the existing columns.
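To catch the "default charset changed, columns not converted" trap shown above, you can compare each column's character set with its table's default. The query below uses the standard information_schema COLUMNS, TABLES and COLLATIONS views; the schema name 'testdb' is taken from this example. The Python filter, stale_columns, is a hypothetical helper. The sample rows mirror the show create table output after the plain alter table t1 charset utf8mb4.

```python
# Run with any MySQL client; each result row is
# (table, column, column_charset, table_default_charset).
STALE_COLUMNS_SQL = """
SELECT c.TABLE_NAME, c.COLUMN_NAME, c.CHARACTER_SET_NAME, co.CHARACTER_SET_NAME
FROM information_schema.COLUMNS c
JOIN information_schema.TABLES t
  ON t.TABLE_SCHEMA = c.TABLE_SCHEMA AND t.TABLE_NAME = c.TABLE_NAME
JOIN information_schema.COLLATIONS co
  ON co.COLLATION_NAME = t.TABLE_COLLATION
WHERE c.TABLE_SCHEMA = 'testdb' AND c.CHARACTER_SET_NAME IS NOT NULL
"""


def stale_columns(rows):
    """Return (table, column, charset) for columns whose charset differs from the table default."""
    return [(tbl, col, cs) for tbl, col, cs, table_cs in rows if cs != table_cs]


rows_after_wrong_alter = [
    ("t1", "name", "utf8", "utf8mb4"),
    ("t1", "code", "utf8", "utf8mb4"),
    ("t2", "name", "utf8mb4", "utf8mb4"),
    ("t2", "code", "utf8mb4", "utf8mb4"),
]
print(stale_columns(rows_after_wrong_alter))
```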

Note, however, that an alter table that changes the character set blocks writes (specifying lock=none reports an error, as shown below), so do not run it during peak business hours. Even during off-peak hours, for large tables it is recommended to change the character set online with pt-online-schema-change.

mysql> alter table t1 convert to charset utf8mb4, lock=none;
ERROR 1846 (0A000): LOCK=NONE is not supported. Reason: Cannot change column type INPLACE. Try LOCK=SHARED.
mysql> alter table t1 convert to charset utf8mb4, lock=shared;
Query OK, 5 rows affected (0.04 sec)
Records: 5 Duplicates: 0 Warnings: 0

mysql> show create table t1\G
*************************** 1. row ***************************
Table: t1
Create Table: CREATE TABLE `t1` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(20) DEFAULT NULL,
`code` varchar(50) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `idx_code` (`code`),
KEY `idx_name` (`name`)
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=utf8mb4
1 row in set (0.00 sec)
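For the pt-online-schema-change route, the sketch below builds the command line for the same conversion. It relies on pt-online-schema-change's documented --alter, --dry-run and --execute options and on a DSN of the form D=database,t=table. Connection settings (h=, u=, and so on) are omitted and assumed to come from your MySQL option file. The helper name is hypothetical. It defaults to --dry-run; pass execute=True, and hand the list to subprocess.run, only once the dry run looks right.

```python
import shlex


def pt_osc_convert_cmd(database: str, table: str, charset: str = "utf8mb4",
                       execute: bool = False) -> list[str]:
    """Build a pt-online-schema-change command that converts a table's columns."""
    return [
        "pt-online-schema-change",
        # --alter takes the ALTER TABLE clause without the "ALTER TABLE name" prefix.
        f"--alter=CONVERT TO CHARACTER SET {charset}",
        "--execute" if execute else "--dry-run",
        f"D={database},t={table}",
    ]


cmd = pt_osc_convert_cmd("testdb", "t1")
print(shlex.join(cmd))  # shell-quoted form, ready to copy into a terminal
```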

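Check the execution plan again (note that this check uses an inner join rather than the original left join): t1 is now accessed through idx_code with type ref, so the problem is gone.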

mysql> desc select * from t2 join t1 on t1.code = t2.code where t2.name = 'dddd'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t2
partitions: NULL
type: ref
possible_keys: idx_code,idx_name
key: idx_name
key_len: 83
ref: const
rows: 1
filtered: 100.00
Extra: Using where
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: t1
partitions: NULL
type: ref
possible_keys: idx_code
key: idx_code
key_len: 203
ref: testdb.t2.code
rows: 1
filtered: 100.00
Extra: NULL
2 rows in set, 1 warning (0.00 sec)

4. Points to note

When tables use different character sets, join SQL may be unable to use indexes, causing serious performance problems;
Do SQL review before SQL goes online, ideally in an environment identical to production;
ALTER TABLE operations that change the character set block writes, so run them during off-peak hours; pt-online-schema-change is recommended;
Keep the character sets of table structures consistent, and check this during release review;
If you need to change the character sets of many tables, review the affected SQL as well, and convert associated (joined) tables together.
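For the release review suggested above, a simple heuristic is to flag column names that appear with more than one character set across a schema. Same-named columns are often join keys, as code is here. The query reads the standard information_schema.COLUMNS view. The function mismatched_columns is a hypothetical helper, and the sample rows reproduce the original t1/t2 definitions.

```python
from collections import defaultdict

# Each result row is (table, column, charset); run with any MySQL client.
AUDIT_SQL = """
SELECT TABLE_NAME, COLUMN_NAME, CHARACTER_SET_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'testdb' AND CHARACTER_SET_NAME IS NOT NULL
"""


def mismatched_columns(rows):
    """Map column name -> {charset: [tables]} for names used with more than one charset."""
    by_name = defaultdict(lambda: defaultdict(list))
    for table, column, charset in rows:
        by_name[column][charset].append(table)
    return {col: dict(sets) for col, sets in by_name.items() if len(sets) > 1}


rows = [
    ("t1", "name", "utf8"), ("t1", "code", "utf8"),
    ("t2", "name", "utf8mb4"), ("t2", "code", "utf8mb4"),
]
for column, charsets in mismatched_columns(rows).items():
    print(column, charsets)
```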

5. Discussion

Finally, a question: assuming the character sets of t1 and t2 are left unchanged, if the SQL above is rewritten as follows (that is, t2 left join t1 becomes t1 left join t2), will the index still be ignored? Why?

select * from t1 left join t2 on t1.code = t2.code where t1.name = 'dddd'
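One way to reason about the question is the rough model below, not a definitive answer. It assumes only utf8 and utf8mb4 are involved, and that MySQL always wraps the utf8 column in convert(... using utf8mb4), as in the show warnings output above. Under that assumption, what matters is whether the converted column belongs to the driving table or the driven table. The function name is hypothetical; confirm the prediction with desc on your own server.

```python
def driven_index_usable(driving_cs: str, driven_cs: str) -> bool:
    """Predict whether the driven table's index on the join column can be used."""
    if driving_cs == driven_cs:
        return True  # no conversion is needed at all
    if {driving_cs, driven_cs} != {"utf8", "utf8mb4"}:
        raise ValueError("this sketch only models utf8 vs utf8mb4")
    # The utf8 side is the one wrapped in convert(... using utf8mb4).
    # If that is the driving table, the conversion applies to a value already read
    # from it, which can then probe the driven table's index; if it is the driven
    # table, its stored (unconverted) index keys cannot be searched.
    return driving_cs == "utf8"


print(driven_index_usable("utf8mb4", "utf8"))  # original query: t2 drives, t1 is driven
print(driven_index_usable("utf8", "utf8mb4"))  # rewritten query: t1 drives, t2 is driven
```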

aladdin_sun