Analysis of the use of exists and in in MySQL

Reprinted from http://sunxiaqw.blog.163.com/blog/static/990654382013430105130443/

 

exists loops over the outer table, taking its records one at a time, and evaluates the exists subquery for each record. If the subquery returns at least one row (no matter how many), the condition is true and the current record is added to the result set. If the subquery returns no rows, the current record is discarded. In other words, the exists condition behaves like a boolean: it is true when the subquery returns a non-empty result set and false when the result set is empty.

as follows:

select * from user where exists (select 1);

The records of the user table are taken out one by one. Because the subquery select 1 always returns a row, every record of the user table is added to the result set, so the query is equivalent to select * from user;

and as follows

select * from user where exists (select * from user where userId = 0);

While looping over the user table, each iteration evaluates the subquery (select * from user where userId = 0). Assuming no record has userId = 0, the subquery always returns an empty set, the condition is always false, and every record in the user table is discarded.

not exists is the opposite of exists: when the subquery returns a non-empty result set, the current record is discarded; otherwise the current record is added to the result set.

In general, if there are n records in table A, the exists query is to take out these n records one by one, and then judge the exists condition n times. 
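The behaviour described above can be checked with a small script. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for MySQL (only the result sets are compared, not the execution plan) and a hypothetical three-row user table that is not part of the original article.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL.
# The user table and its three rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("create table user (userId integer primary key, name text)")
conn.executemany("insert into user values (?, ?)",
                 [(1, "zhang"), (2, "wang"), (3, "li")])

def ids(sql):
    """Run a query and return the userId values it yields, sorted."""
    return sorted(row[0] for row in conn.execute(sql))

# The subquery always returns a row, so every outer record is kept.
all_rows = ids("select userId from user where exists (select 1)")

# No record has userId = 0, so the subquery is empty and every outer record is discarded.
no_rows = ids("select userId from user "
              "where exists (select * from user where userId = 0)")

# not exists inverts the test: an empty subquery keeps every outer record.
inverted = ids("select userId from user "
               "where not exists (select * from user where userId = 0)")

print(all_rows, no_rows, inverted)  # [1, 2, 3] [] [1, 2, 3]
```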

 

 

An in query is equivalent to a chain of or conditions, which is easy to understand. Take the following query

select * from user where userId in (1, 2, 3);

is equivalent to

select * from user where userId = 1 or userId = 2 or userId = 3;

not in is the opposite of in, as follows

select * from user where userId not in (1, 2, 3);

is equivalent to

select * from user where userId != 1 and userId != 2 and userId != 3;
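Both equivalences can be verified by running each pair of statements and comparing the results. This is a minimal sketch, again assuming Python's sqlite3 module as a stand-in for MySQL and a hypothetical user table with userId values 1 to 5.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("create table user (userId integer)")
conn.executemany("insert into user values (?)", [(i,) for i in range(1, 6)])

def ids(sql):
    """Run a query and return the userId values it yields, sorted."""
    return sorted(row[0] for row in conn.execute(sql))

in_list = ids("select userId from user where userId in (1, 2, 3)")
or_chain = ids("select userId from user "
               "where userId = 1 or userId = 2 or userId = 3")

not_in_list = ids("select userId from user where userId not in (1, 2, 3)")
and_chain = ids("select userId from user "
                "where userId != 1 and userId != 2 and userId != 3")

print(in_list == or_chain, not_in_list == and_chain)  # True True
```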

In general, an in query first evaluates the subquery to obtain its result set B, say with m records, then splits that result set into m individual values and performs m lookups against the outer table

 

It is worth mentioning that when a single column is compared with in, the subquery must return exactly one column, for example

select * from user where userId in (select id from B);

rather than

select * from user where userId in (select id, age from B);

And exists does not have this limitation
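The one-column restriction can be observed directly. This is a minimal sketch, assuming Python's sqlite3 module as a stand-in for MySQL; the tables and rows are hypothetical, and the exact error text differs between database engines, so the code only checks that the two-column form is rejected.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; tables and rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("create table user (userId integer)")
conn.execute("create table B (id integer, age integer)")
conn.executemany("insert into user values (?)", [(1,), (2,), (3,)])
conn.executemany("insert into B values (?, ?)", [(1, 20), (3, 30)])

# A one-column subquery is accepted by in.
one_column = sorted(r[0] for r in conn.execute(
    "select userId from user where userId in (select id from B)"))

# A two-column subquery is rejected when compared against a single column.
try:
    conn.execute(
        "select userId from user where userId in (select id, age from B)")
    two_columns_rejected = False
except sqlite3.Error:
    two_columns_rejected = True

# exists only tests whether any row comes back, so the column count is irrelevant.
with_exists = sorted(r[0] for r in conn.execute(
    "select userId from user "
    "where exists (select id, age from B where B.id = user.userId)"))

print(one_column, two_columns_rejected, with_exists)
```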

 

Let's consider the performance of exists and in

Consider the following SQL statement

1: select * from A where exists (select * from B where B.id = A.id);

2: select * from A where A.id in (select id from B);

 

Query 1 can be rewritten as the following pseudocode to make it easier to understand

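$result = [];
for ($i = 0; $i < count($A); $i++) {
  $a = $A[$i]; # get the records of table A one by one
  # sub-condition: does B contain a row with B.id = A.id?
  # exists_in_B() is a hypothetical helper standing for an index lookup on B.id
  if (exists_in_B($a['id']))
    $result[] = $a;
}
return $result;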

That is the rough idea. It shows that query 1 mainly relies on the index on table B, while the state of table A has little effect on the efficiency of the query.

 

Assuming all ids of table B are 1, 2, 3, query 2 can be converted to

select * from A where A.id = 1 or A.id = 2 or A.id = 3;

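This is easy to understand: here it is mainly the index on A that is used, and the state of table B has little effect on the query.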

 

Next, let's look at not exists and not in

1. select * from A where not exists (select * from B where B.id = A.id);

2. select * from A where A.id not in (select id from B);

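Query 1 works the same way as before and uses the index on B.

For query 2, again assuming the ids in table B are 1, 2, 3, the query can be converted to the following statement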

select * from A where A.id != 1 and A.id != 2 and A.id != 3;

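This shows that not in is effectively a range query, and a != range query of this kind usually cannot make effective use of an index. It amounts to traversing table B once for every record in table A to check whether that record exists in B.

Therefore not exists is more efficient than not in.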
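Apart from performance, the two forms are only interchangeable when the subquery column contains no NULL values. The following is a minimal sketch of that standard SQL behaviour, assuming Python's sqlite3 module as a stand-in for MySQL and hypothetical contents for tables A and B.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("create table A (id integer)")
conn.execute("create table B (id integer)")
conn.executemany("insert into A values (?)", [(1,), (2,), (3,), (4,)])
conn.executemany("insert into B values (?)", [(1,), (2,)])

NOT_EXISTS = ("select id from A "
              "where not exists (select * from B where B.id = A.id)")
NOT_IN = "select id from A where A.id not in (select id from B)"

def ids(sql):
    """Run a query and return the id values it yields, sorted."""
    return sorted(row[0] for row in conn.execute(sql))

# Without NULLs in B.id, both queries return the records of A missing from B.
before = (ids(NOT_EXISTS), ids(NOT_IN))

# With a NULL in B.id, "x not in (1, 2, NULL)" is never true,
# so not in returns nothing while not exists is unaffected.
conn.execute("insert into B values (NULL)")
after = (ids(NOT_EXISTS), ids(NOT_IN))

print(before)  # ([3, 4], [3, 4])
print(after)   # ([3, 4], [])
```

If B.id can be NULL, the two queries are therefore not interchangeable, whichever one turns out to be faster.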

 

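In MySQL, the in statement is commonly described as a hash join between the outer table and the inner table, while the exists statement loops over the outer table and queries the inner table on each iteration. It has long been said that exists is more efficient than in, but that is not accurate; it depends on the situation.

If the two tables are of similar size, there is little difference between in and exists.
If one table is small and the other is large, use exists when the subquery table is the large one and in when the subquery table is the small one.
For example: table A (small), table B (large)

1:
select * from A where cc in (select cc from B)
Inefficient: uses the index on column cc of table A.

select * from A where exists(select cc from B where cc=A.cc)
Efficient: uses the index on column cc of table B.

Conversely,

2:
select * from B where cc in (select cc from A)
Efficient: uses the index on column cc of table B.

select * from B where exists(select cc from A where cc=B.cc)
Inefficient: uses the index on column cc of table A.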
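The rule of thumb above follows from the simplified model used throughout this article: exists loops over the outer table and probes the index of the subquery table, while in loops over the subquery result and probes the index of the outer table. The following sketch counts index probes under that model only. It is not how a real optimizer works; Python sets stand in for the indexes on cc, the cc values are assumed unique within each table, and the table contents are hypothetical.

```python
def exists_style(outer, inner_index):
    """Loop over the outer table; probe the inner table's index once per record."""
    probes, result = 0, []
    for value in outer:
        probes += 1
        if value in inner_index:
            result.append(value)
    return sorted(result), probes

def in_style(outer_index, inner):
    """Loop over the subquery values; probe the outer table's index once per value."""
    probes, result = 0, []
    for value in set(inner):
        probes += 1
        if value in outer_index:
            result.append(value)
    return sorted(result), probes

# Hypothetical data: A is small (10 records), B is large (10,000 records).
A = list(range(10))
B = list(range(5, 10_005))

# select * from A where exists (select cc from B where cc = A.cc)
rows_exists, probes_exists = exists_style(A, set(B))

# select * from A where cc in (select cc from B)
rows_in, probes_in = in_style(set(A), B)

print(rows_exists == rows_in)    # True
print(probes_exists, probes_in)  # 10 10000
```

Both strategies return the same records, but the number of probes follows the size of the table being looped over, which is why the model prefers exists when the subquery table is the large one. On a real server the chosen plan depends on the MySQL version and the optimizer, so it is worth confirming with EXPLAIN.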
 
 
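not in and not exists

If a query uses not in, both the inner and outer tables are scanned in full and no index is used, whereas the subquery of not exists can still use the index on its table. So no matter which table is larger, not exists is faster than not in.

The difference between in and =

select name from student where name in ('zhang','wang','li','zhao');
and
select name from student where name='zhang' or name='li' or name='wang' or name='zhao'
return the same result.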
