Stepping into the "pit" of MySQL IN subqueries
PlatformDev 360 Cloud Computing
Editor's note
What happened? A simple SQL subquery over only tens of thousands of records took 33 seconds! Read on to see how to avoid this pit.
Preface
MySQL is a common choice of database in projects, and IN queries are used all the time. While debugging a recent project, I ran into a SELECT query that unexpectedly took 33 seconds!
One. Table structure
1. userinfo table
2. article table
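The original table definitions are not reproduced here. For readers who want to reproduce the problem, a minimal hypothetical schema follows. Only the columns referenced by the queries below (id, username, author_id, type) come from the article; the column types, the article.id column, and the index name idx_type_author are assumptions.

```sql
-- Hypothetical schema: column types and index names are assumptions;
-- only id, username, author_id and type appear in the article's queries.
CREATE TABLE userinfo (
  id       INT UNSIGNED NOT NULL AUTO_INCREMENT,
  username VARCHAR(64)  NOT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB;

CREATE TABLE article (
  id        INT UNSIGNED NOT NULL AUTO_INCREMENT,
  author_id INT UNSIGNED NOT NULL,  -- refers to userinfo.id
  type      TINYINT      NOT NULL,
  PRIMARY KEY (id),
  -- composite index so "WHERE type = 1" can return author_id from the index alone
  KEY idx_type_author (type, author_id)
) ENGINE=InnoDB;
```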
Two. The problem SQL
select * from userinfo where id in (select author_id from article where type = 1);
At first glance this looks like a very simple subquery: first find the author_id values, then use them in the IN list.
With the relevant indexes in place, it should be very fast. Broken down, it looks like this:
- select author_id from article where type = 1;
- select * from userinfo where id in (1,2,3);
But here is what actually happened:
mysql> select count(*) from userinfo;
mysql> select count(*) from article;
mysql> select id,username from userinfo where id in (select author_id from article where type = 1);
33 seconds! Why is it so slow?
Three. The cause of the problem
The official documentation explains that the optimizer sometimes rewrites an IN subquery as a correlated EXISTS subquery, which is then evaluated once for every row of the outer table. This behavior exists in version 5.5 and was optimized in 5.6.
Reference:
https://dev.mysql.com/doc/refman/5.5/en/subquery-optimization.html
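One way to check whether a server is affected is to run EXPLAIN on the problem query. On MySQL 5.5, a rewritten IN subquery is typically reported with select_type DEPENDENT SUBQUERY, meaning it is re-executed for each outer row. The second statement is a hand-written sketch of the correlated form the documentation describes, applied to this article's tables; it illustrates the idea and is not the optimizer's literal output.

```sql
-- Inspect the plan; on 5.5 look for select_type = DEPENDENT SUBQUERY
-- on the row for the article table.
EXPLAIN
SELECT id, username FROM userinfo
WHERE id IN (SELECT author_id FROM article WHERE type = 1);

-- Hand-written equivalent of the correlated rewrite:
-- the inner query now references userinfo.id, so it is
-- evaluated once per userinfo row instead of once overall.
SELECT id, username FROM userinfo
WHERE EXISTS (
  SELECT 1 FROM article
  WHERE article.type = 1
    AND article.author_id = userinfo.id
);
```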
Four. Solutions (version 5.5)
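1. Use a temporary table
Wrapping the subquery in a derived table makes MySQL 5.5 materialize it into a temporary table once, before the outer query runs, instead of rescanning article for every userinfo row.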
select id,username from userinfo
where id in (select author_id from
(select author_id from article where type = 1) as tb
);
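The derived table above is materialized implicitly. If you prefer to make the temporary table explicit, for example to give it an index, a sketch along these lines should work. The table name tmp_author_ids and the INT UNSIGNED column type are assumptions, and the final step uses a join rather than IN, so the 5.5 subquery rewrite does not come into play.

```sql
-- Hypothetical explicit variant of the temporary-table approach.
CREATE TEMPORARY TABLE tmp_author_ids (
  author_id INT UNSIGNED NOT NULL PRIMARY KEY  -- primary key gives fast lookups
);

-- Run the inner query once; DISTINCT collapses authors with several type-1 articles.
INSERT INTO tmp_author_ids (author_id)
SELECT DISTINCT author_id FROM article WHERE type = 1;

-- Join against the small, indexed temporary table.
SELECT u.id, u.username
FROM userinfo u
JOIN tmp_author_ids t ON t.author_id = u.id;

-- Temporary tables disappear at session end; dropping early frees resources.
DROP TEMPORARY TABLE tmp_author_ids;
```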
2. Use a join
select a.id,a.username from userinfo a, article b
where a.id = b.author_id and b.type = 1;
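One difference is worth noting: IN returns each user at most once, while the join returns a user once per matching article, so an author with several type = 1 articles appears several times. If the result must match the original IN query, a sketch using DISTINCT, or a derived table of distinct author_id values, looks like this:

```sql
-- Explicit JOIN syntax, equivalent to the comma join above,
-- with DISTINCT so each user appears once, as with IN.
SELECT DISTINCT a.id, a.username
FROM userinfo a
JOIN article b ON b.author_id = a.id
WHERE b.type = 1;

-- Alternative: deduplicate author_id first, then join.
SELECT a.id, a.username
FROM userinfo a
JOIN (SELECT DISTINCT author_id FROM article WHERE type = 1) b
  ON b.author_id = a.id;
```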
Five. Supplement
Version 5.6 optimizes such subqueries with materialization, which works the same way as the temporary-table method in part four above. From the official documentation:
If materialization is not used, the optimizer sometimes rewrites a noncorrelated subquery as a correlated subquery.
For example, the following IN subquery is noncorrelated (where_condition involves only columns from t2 and not t1):
select * from t1
where t1.a in (select t2.b from t2 where where_condition);
The optimizer might rewrite this as an EXISTS correlated subquery:
select * from t1
where exists (select t2.b from t2 where where_condition and t1.a=t2.b);
Subquery materialization using a temporary table avoids such rewrites and makes it possible to execute the subquery only once rather than once per row of the outer query.
https://dev.mysql.com/doc/refman/5.6/en/subquery-materialization.html
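On 5.6 or later, you can confirm that the optimizer is allowed to use materialization and that the plan has changed. The materialization flag is part of the optimizer_switch system variable. After the change, EXPLAIN should no longer show the subquery as DEPENDENT SUBQUERY; depending on version and data, 5.6 may show it as MATERIALIZED or convert it into a semijoin. Treat this as a diagnostic sketch rather than a guaranteed plan.

```sql
-- Look for materialization=on (and semijoin=on) in the output.
SELECT @@optimizer_switch;

-- Enable for the current session only; other flags are left unchanged.
SET SESSION optimizer_switch = 'materialization=on';

-- Re-check the plan for the original query.
EXPLAIN
SELECT id, username FROM userinfo
WHERE id IN (SELECT author_id FROM article WHERE type = 1);
```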