Function operations on condition fields, implicit type conversion, and implicit character encoding conversion in SELECT queries

Function operations on condition fields

Consider an example:

mysql> CREATE TABLE `tradelog` (
  `id` int(11) NOT NULL,
  `tradeid` varchar(32) DEFAULT NULL,
  `operator` int(11) DEFAULT NULL,
  `t_modified` datetime DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `tradeid` (`tradeid`),
  KEY `t_modified` (`t_modified`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

Suppose the table now holds all the data from the beginning of 2016 to the end of 2018, and the operations department needs the total number of transaction records that occurred in July across all years. The statement might be written like this:

mysql> select count(*) from tradelog where month(t_modified)=7;

Because there is an index on the t_modified field, you might feel safe executing this statement in the production database, only to find that it takes a long time to return the result.

The reason is that applying a function to the field prevents the index from being used for lookup. But why can the index be used when the condition is where t_modified='2018-7-1', and not when it is changed to where month(t_modified)=7?

The B+ tree of the t_modified index is ordered by the t_modified values themselves, and that ordering is what makes fast positioning possible: for t_modified='2018-7-1', the engine can decide at each level of the tree which branch to follow. The index holds no information about the result of month(), so when you pass in 7, there is no way to decide where to go at the first level of the tree. In other words, performing a function operation on an index field may destroy the order of the index values, so the optimizer decides to abandon the tree search. Note that the optimizer does not give up using the index altogether.

In this example, tree search is abandoned, but the optimizer can still choose between traversing the primary key index and traversing the index t_modified. After comparing the index sizes, it finds that the index t_modified is smaller and faster to traverse than the primary key index, so it eventually selects the index t_modified.

In the EXPLAIN result of this statement, key="t_modified" means that the t_modified index is used. I inserted 100,000 rows into the test table, and rows=100335 indicates that the statement scanned every value in the index. "Using index" in the Extra field indicates that a covering index is used.
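
To reproduce this check, the plan of the slow statement can be compared with that of an equality lookup on the same field. This is a minimal sketch, assuming the tradelog table above has been filled with test data; the exact rows values will differ between environments.

```sql
-- Function applied to the indexed field: expect a scan of the whole
-- t_modified index (key shows t_modified, rows is close to the table size).
EXPLAIN SELECT count(*) FROM tradelog WHERE month(t_modified) = 7;

-- Plain comparison on the indexed field: the index can be used for lookup.
EXPLAIN SELECT count(*) FROM tradelog WHERE t_modified = '2018-7-1';
```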

In short, because of the month() function operation, MySQL can no longer use the index for fast positioning and can only do a full index scan. The optimizer behaves this way even for operations that do not change the ordering. For example, for the SQL statement select * from tradelog where id + 1 = 10000, the addition does not change the ordering, but the MySQL optimizer still cannot use the id index to quickly locate the row with id=9999.
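
One way to let the optimizer use tree search again is to leave the indexed field bare and move the computation to the other side of the comparison. The sketch below rewrites the July query as one range per year (assuming, as stated above, that the data only covers 2016 through 2018; other years would need their own ranges) and rewrites the id + 1 condition in the same spirit.

```sql
-- One range per year on the bare t_modified field.
SELECT count(*) FROM tradelog WHERE
  (t_modified >= '2016-7-1' AND t_modified < '2016-8-1') OR
  (t_modified >= '2017-7-1' AND t_modified < '2017-8-1') OR
  (t_modified >= '2018-7-1' AND t_modified < '2018-8-1');

-- The arithmetic is done on the constant, not on the indexed field.
SELECT * FROM tradelog WHERE id = 10000 - 1;
```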

Implicit type conversion

mysql> select * from tradelog where tradeid=110717;

The field type of tradeid is varchar(32), but the input parameter is an integer, so type conversion is required.

The conversion rule in MySQL is that when a string is compared with a number, the string is converted to a number.

So for the optimizer, the query above is equivalent to:

mysql> select * from tradelog where CAST(tradeid AS signed int) = 110717;

This is the same situation as the function operation on a condition field described above: a function is applied to the index field, so the optimizer abandons the tree search.
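
The direction of the conversion can be checked with a simple comparison, and the lookup itself can be fixed by passing the parameter as a string. A minimal sketch:

```sql
-- If the string is converted to a number, this is 10 > 9 and returns 1;
-- if the number were converted to a string, '10' > '9' would return 0.
SELECT '10' > 9;

-- Same type on both sides, so no conversion is applied to the field
-- and the tradeid index can be used for lookup.
SELECT * FROM tradelog WHERE tradeid = '110717';
```

In the opposite case, where an integer field is compared with a string constant (for example where id='83126'), it is the constant that gets converted rather than the field, so the primary key index should still be usable.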

Implicit character encoding conversion

Suppose the system has another table, trade_detail, which records the operation details of each transaction. To make the analysis concrete and reproducible, I inserted some data into both tables, tradelog and trade_detail.

mysql> CREATE TABLE `trade_detail` (
  `id` int(11) NOT NULL,
  `tradeid` varchar(32) DEFAULT NULL,
  `trade_step` int(11) DEFAULT NULL, /*operation step*/
  `step_info` varchar(32) DEFAULT NULL, /*step information*/
  PRIMARY KEY (`id`),
  KEY `tradeid` (`tradeid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
insert into tradelog values(1, 'aaaaaaaa', 1000, now());
insert into tradelog values(2, 'aaaaaaab', 1000, now());
insert into tradelog values(3, 'aaaaaaac', 1000, now());
insert into trade_detail values(1, 'aaaaaaaa', 1, 'add');
insert into trade_detail values(2, 'aaaaaaaa', 2, 'update');
insert into trade_detail values(3, 'aaaaaaaa', 3, 'commit');
insert into trade_detail values(4, 'aaaaaaab', 1, 'add');
insert into trade_detail values(5, 'aaaaaaab', 2, 'update');
insert into trade_detail values(6, 'aaaaaaab', 3, 'update again');
insert into trade_detail values(7, 'aaaaaaab', 4, 'commit');
insert into trade_detail values(8, 'aaaaaaac', 1, 'add');
insert into trade_detail values(9, 'aaaaaaac', 2, 'update');
insert into trade_detail values(10, 'aaaaaaac', 3, 'update again');
insert into trade_detail values(11, 'aaaaaaac', 4, 'commit');

Now query all operation steps of the transaction with id=2 in tradelog:

mysql> select d.* from tradelog l, trade_detail d where d.tradeid=l.tradeid and l.id=2; /*Statement Q1*/

In the EXPLAIN result of this statement:

  • The first line shows that the optimizer first finds the row with id=2 in the transaction record table tradelog. This step uses the primary key index, and rows=1 means that only one row is scanned.
  • The second line, with key=NULL, means that the tradeid index on the transaction detail table trade_detail is not used and a full table scan is performed.

In this execution plan, rows are fetched from tradelog and then matched against trade_detail by the tradeid value. We call tradelog the driving table, trade_detail the driven table, and tradeid the join field. There is an index on the tradeid field of trade_detail, and we expected it to be used to quickly locate the matching rows. But that is not what happens here.

The problem is that the character sets of the two tables differ: one is utf8mb4 and the other is utf8. The character set utf8mb4 is a superset of utf8, so when strings of these two types are compared, MySQL first converts the utf8 string to the utf8mb4 character set and then compares.

This is the usual rule for automatic type conversion in programming languages as well: to avoid data errors caused by truncation, the conversion goes "in the direction of increasing data length".

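In other words, the lookup on the driven table becomes the following, where $L2.tradeid.value stands for the tradeid value of the row with id=2 in tradelog, whose character set is utf8mb4:

select * from trade_detail where CONVERT(tradeid USING utf8mb4)=$L2.tradeid.value;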

This again triggers the principle mentioned above: when a function operation is performed on the index field, the optimizer gives up the tree search. The difference in character sets is only one of the conditions. What directly causes the full table scan of the driven table is that the join requires a function operation on the index field of the driven table.

As a comparison, change the requirement to "find the operator of the operation with id=4 in the trade_detail table", and look at this statement and its execution plan.

mysql> select l.operator from tradelog l, trade_detail d where d.tradeid=l.tradeid and d.id=4;

In this statement, the trade_detail table becomes the driving table, and the second line of the explain result shows that this query uses the index (tradeid) on the driven table tradelog, with 1 row scanned.

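For the driven table, this step is equivalent to the following, where $R4.tradeid.value stands for the tradeid value of the row with id=4 in trade_detail:

select operator from tradelog where tradeid=$R4.tradeid.value;

The character set of $R4.tradeid.value is utf8. According to the character set conversion rule, it needs to be converted to utf8mb4, so the step is rewritten as:

select operator from tradelog where tradeid=CONVERT($R4.tradeid.value USING utf8mb4);

Here the CONVERT function is applied to the input parameter rather than to the index field, so the tradeid index of the driven table can be used.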
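
To see the two cases side by side, the placeholders can be replaced with a literal. This is a sketch, assuming the sample rows inserted above, where both the tradelog row with id=2 and the trade_detail row with id=4 have tradeid 'aaaaaaab'; the plans you get may vary with MySQL version and data volume.

```sql
-- Function on the indexed field of trade_detail: the tradeid index
-- cannot be used for lookup.
EXPLAIN SELECT * FROM trade_detail
 WHERE CONVERT(tradeid USING utf8mb4) = 'aaaaaaab';

-- Function on the input value only: the tradeid index on tradelog
-- remains usable.
EXPLAIN SELECT operator FROM tradelog
 WHERE tradeid = CONVERT('aaaaaaab' USING utf8mb4);
```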

So to optimize statement Q1, there are two approaches:

  • The more common approach is to change the character set of the tradeid field of the trade_detail table to utf8mb4, so that no character set conversion is needed.
mysql> alter table trade_detail modify tradeid varchar(32) CHARACTER SET utf8mb4 default null;
  • Changing the character set of the field is the best option. But if the data volume is large, or the business cannot run this DDL for the time being, the only option is to modify the SQL statement. Here l.tradeid is explicitly converted to utf8, which avoids the conversion on the driven table, so the tradeid index can be used.
mysql> select d.* from tradelog l, trade_detail d where d.tradeid=CONVERT(l.tradeid USING utf8) and l.id=2;
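
Before and after applying either approach, it can help to confirm the character sets of the join fields and to re-check the plan of Q1. The following is a minimal sketch using the standard information_schema.columns view; it assumes both tables live in the currently selected database, and depending on the MySQL version utf8 may be reported as utf8mb3.

```sql
-- Character set and collation of the join field in both tables.
SELECT table_name, column_name, character_set_name, collation_name
  FROM information_schema.columns
 WHERE table_schema = DATABASE()
   AND table_name IN ('tradelog', 'trade_detail')
   AND column_name = 'tradeid';

-- Re-check the plan of Q1: the row for trade_detail should show
-- key=tradeid once the conversion no longer applies to its field.
EXPLAIN SELECT d.* FROM tradelog l, trade_detail d
 WHERE d.tradeid = l.tradeid AND l.id = 2;
```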

Conclusion

The three examples all make the same point: performing a function operation on an index field may destroy the order of the index values, so the optimizer decides to abandon the tree search.

Content source: Lin Xiaobin "45 Lectures on MySQL Actual Combat"

Origin blog.csdn.net/qq_24436765/article/details/112658031