Reprinted from http://sunxiaqw.blog.163.com/blog/static/990654382013430105130443/
exists loops over the outer table row by row and evaluates the subquery inside exists once for each row. If the subquery returns at least one row (how many does not matter), the condition is true and the current outer row is added to the result. If the subquery returns no rows, the current outer row is discarded. In other words, the exists condition behaves like a boolean: it is true when the subquery returns a non-empty result set and false when the result set is empty.
For example:
select * from user where exists (select 1);
The rows of the user table are taken out one by one. Since the subquery select 1 always returns a row, every row of the user table is added to the result set, so the query is equivalent to select * from user;
Another example:
select * from user where exists (select * from user where userId = 0);
While looping over the user table, each iteration evaluates the subquery (select * from user where userId = 0). Assuming no row has userId = 0, the subquery always returns an empty set, the condition is always false, and every row of the user table is discarded.
not exists is the opposite of exists: when the subquery returns at least one row, the current outer row is discarded; otherwise it is added to the result set.
In general, if table A has n records, an exists query takes out these n records one by one and evaluates the exists condition n times.
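The behaviour described above can be checked with a small script. This is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL; the three sample rows and the helper name `ids` are assumptions made for illustration only.

```python
import sqlite3

# SQLite stands in for MySQL here; the table contents are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (userId INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO user VALUES (?, ?)",
                 [(1, "ann"), (2, "bob"), (3, "cid")])

def ids(sql):
    """Run a query and return the sorted userId values of its rows."""
    return sorted(row[0] for row in conn.execute(sql))

all_rows = ids("SELECT userId FROM user")

# The subquery always yields a row, so every outer row is kept.
exists_true = ids("SELECT userId FROM user WHERE EXISTS (SELECT 1)")

# No row has userId = 0, so the subquery is empty and every outer row is discarded.
exists_false = ids(
    "SELECT userId FROM user WHERE EXISTS (SELECT * FROM user WHERE userId = 0)")

# NOT EXISTS inverts the test: the empty subquery now keeps every outer row.
not_exists = ids(
    "SELECT userId FROM user WHERE NOT EXISTS (SELECT * FROM user WHERE userId = 0)")

print(all_rows, exists_true, exists_false, not_exists)
```

With this sample data, the first and last queries keep every row and the middle one keeps none.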
An in query is equivalent to several or conditions chained together, which is easy to understand. Take the following query:
select * from user where userId in (1, 2, 3);
is equivalent to
select * from user where userId = 1 or userId = 2 or userId = 3;
not in is the opposite of in. For example:
select * from user where userId not in (1, 2, 3);
is equivalent to
select * from user where userId != 1 and userId != 2 and userId != 3;
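To see these two equivalences on real rows, here is a short sketch, again using Python's sqlite3 module as a stand-in for MySQL. The six sample userId values are an assumption chosen so that both the matching and the non-matching sets are non-empty.

```python
import sqlite3

# SQLite stands in for MySQL; userId values 1..6 are sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (userId INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO user VALUES (?)", [(i,) for i in range(1, 7)])

def ids(sql):
    """Run a query and return the sorted userId values of its rows."""
    return sorted(row[0] for row in conn.execute(sql))

# IN with a literal list versus the chained OR form.
in_rows = ids("SELECT userId FROM user WHERE userId IN (1, 2, 3)")
or_rows = ids("SELECT userId FROM user WHERE userId = 1 OR userId = 2 OR userId = 3")

# NOT IN with a literal list versus the chained AND of != comparisons.
not_in_rows = ids("SELECT userId FROM user WHERE userId NOT IN (1, 2, 3)")
and_rows = ids(
    "SELECT userId FROM user WHERE userId != 1 AND userId != 2 AND userId != 3")

print(in_rows, or_rows, not_in_rows, and_rows)
```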
In general, an in query first evaluates the subquery to get all of its rows. Call that result set B, with m records in total. The result set is then expanded into m values, and the outer table is queried once for each of them, m queries in all.
It is worth mentioning that when a single column is compared with in, as here, the subquery must return exactly one field, for example
select * from user where userId in (select id from B);
and not
select * from user where userId in (select id, age from B);
exists has no such restriction.
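The one-column restriction can be observed directly. The sketch below uses Python's sqlite3 module as a stand-in for MySQL, so the exact error message will differ from MySQL's; the sample rows and the `age` values are assumptions for illustration.

```python
import sqlite3

# SQLite stands in for MySQL; the rows are sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (userId INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE B (id INTEGER, age INTEGER)")
conn.executemany("INSERT INTO user VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO B VALUES (?, ?)", [(1, 30), (3, 25)])

# A one-column subquery is accepted by IN.
one_col = sorted(r[0] for r in conn.execute(
    "SELECT userId FROM user WHERE userId IN (SELECT id FROM B)"))

# A two-column subquery compared against a single column is rejected.
try:
    conn.execute(
        "SELECT userId FROM user WHERE userId IN (SELECT id, age FROM B)").fetchall()
    two_col_rejected = False
except sqlite3.Error as exc:
    two_col_rejected = True
    print("rejected:", exc)

# EXISTS only checks whether any row comes back, so the column list is irrelevant.
exists_two_col = sorted(r[0] for r in conn.execute(
    "SELECT userId FROM user "
    "WHERE EXISTS (SELECT id, age FROM B WHERE B.id = user.userId)"))

print(one_col, two_col_rejected, exists_two_col)
```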
Now let's consider the performance of exists and in.
Consider the following two SQL statements
1: select * from A where exists (select * from B where B.id = A.id);
2: select * from A where A.id in (select id from B);
Query 1 can be rewritten as the following pseudocode to make it easier to understand:
$result = [];
for ($i = 0; $i < count($A); $i++) {
    $a = get_record($A, $i);       # get records one by one from table A
    if (exists_in_B($a['id'])) {   # sub-condition holds: some row of B has B.id = $a['id'] (pseudocode helper)
        $result[] = $a;
    }
}
return $result;
That is roughly the idea. It shows that query 1 mainly uses the index on table B: each row of A costs one lookup in B, so what table A looks like has little effect on the efficiency of the query.
Assuming the ids in table B are 1, 2 and 3, query 2 can be rewritten as
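The loop can also be written as a small runnable model. This is only a toy sketch in Python, not how a database engine is implemented: the lists `A` and `B`, the set `b_index` (standing in for an index on B.id) and the function name `exists_query` are all assumptions made for illustration.

```python
# Toy in-memory model; the tables and the set-based "index" are illustrative assumptions.
A = [{"id": 1, "name": "a1"}, {"id": 2, "name": "a2"}, {"id": 5, "name": "a5"}]
B = [{"id": 1}, {"id": 2}, {"id": 3}]

b_index = {row["id"] for row in B}  # stands in for an index on B.id

def exists_query(outer, inner_index):
    """Keep each outer row for which the lookup in the inner index finds a match."""
    result, probes = [], 0
    for a in outer:                 # one pass over A, row by row
        probes += 1                 # one exists check per outer row
        if a["id"] in inner_index:  # index lookup instead of scanning all of B
            result.append(a)
    return result, probes

rows, probes = exists_query(A, b_index)
print([r["id"] for r in rows], probes)
```

The probe count equals the number of rows in A, which matches the "n checks for n records" description given earlier.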
select * from A where A.id = 1 or A.id = 2 or A.id = 3;
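This one is easy to understand: here it is mainly the index on A that is used, and what table B looks like has little effect on the query.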
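As a quick check that query 1, query 2 and the expanded or form select the same rows, here is a sketch using Python's sqlite3 module as a stand-in for MySQL. The contents of A and B are assumptions, with the ids of B set to 1, 2, 3 as in the text. It says nothing about which form is faster, only that the results agree.

```python
import sqlite3

# SQLite stands in for MySQL; A and B hold sample rows, with B.id = 1, 2, 3.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE B (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO A VALUES (?, ?)", [(1, "a1"), (2, "a2"), (5, "a5")])
conn.executemany("INSERT INTO B VALUES (?)", [(1,), (2,), (3,)])

def rows(sql):
    """Run a query and return its rows as a sorted list of tuples."""
    return sorted(conn.execute(sql))

q1 = rows("SELECT * FROM A WHERE EXISTS (SELECT * FROM B WHERE B.id = A.id)")
q2 = rows("SELECT * FROM A WHERE A.id IN (SELECT id FROM B)")
q2_expanded = rows("SELECT * FROM A WHERE A.id = 1 OR A.id = 2 OR A.id = 3")

print(q1, q2, q2_expanded)
```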
Next, let's look at not exists and not in
1. select * from A where not exists (select * from B where B.id = A.id);
2. select * from A where A.id not in (select id from B);
Query 1 works the same way as above and uses the index on B.
Query 2, on the other hand, can be rewritten as the following statement
select * from A where A.id != 1 and A.id != 2 and A.id != 3;
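So not in is a range query, and this kind of != range query cannot use any index. In effect, for every record in table A, table B has to be traversed once to check whether that record exists in B.
Therefore not exists is more efficient than not in.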
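The two negated forms can be compared on sample data in the same way. This sketch uses Python's sqlite3 module as a stand-in for MySQL and only checks that the results agree, not how fast each form runs. It assumes B.id contains no NULL values; when the subquery can return NULL, not in and not exists no longer return the same rows.

```python
import sqlite3

# SQLite stands in for MySQL; sample rows, and B.id is assumed to contain no NULLs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE B (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO A VALUES (?, ?)", [(1, "a1"), (2, "a2"), (5, "a5")])
conn.executemany("INSERT INTO B VALUES (?)", [(1,), (2,), (3,)])

def rows(sql):
    """Run a query and return its rows as a sorted list of tuples."""
    return sorted(conn.execute(sql))

not_exists = rows(
    "SELECT * FROM A WHERE NOT EXISTS (SELECT * FROM B WHERE B.id = A.id)")
not_in = rows("SELECT * FROM A WHERE A.id NOT IN (SELECT id FROM B)")
not_in_expanded = rows(
    "SELECT * FROM A WHERE A.id != 1 AND A.id != 2 AND A.id != 3")

print(not_exists, not_in, not_in_expanded)
```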
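In MySQL, an in statement performs a hash join between the outer table and the inner table, while an exists statement loops over the outer table and queries the inner table on each iteration. People have long assumed that exists is more efficient than in, but that claim is not really accurate. It depends on the situation.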