exists loops over the outer table and processes its records one by one. For each record, the conditional statement (the subquery) inside exists is checked. If the subquery returns at least one row (no matter how many), the condition is true and the current record is added to the result set. If the subquery returns no rows, the current record is discarded. In other words, the exists condition behaves like a boolean: it is true when the subquery returns rows and false when it returns an empty set.
For example:
select * from user where exists (select 1);
The records of the user table are taken out one by one. Since the subquery select 1 always returns a row, every record of the user table is added to the result set, so the query is equivalent to select * from user;
Another example:
select * from user where exists (select * from user where userId = 0);
While looping over the user table, each record is checked against the subquery (select * from user where userId = 0). Assuming no userId is ever 0, the subquery always returns an empty set, the condition is always false, and every record in the user table is discarded.
not exists is the opposite of exists: when the subquery returns rows, the current record is discarded; otherwise the current record is added to the result set.
In general, if table A has n records, an exists query takes out these n records one by one and evaluates the exists condition n times.
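The behavior described above can be checked with a small script. This is only a sketch: it uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for MySQL, and the three sample rows and the name column are made up for illustration.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.execute("create table user (userId integer primary key, name text)")
conn.executemany("insert into user values (?, ?)",
                 [(1, "ann"), (2, "bob"), (3, "cid")])  # made-up sample rows

def ids(sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]

all_ids = ids("select userId from user order by userId")

# The subquery always returns a row, so every outer record is kept.
always_true = ids(
    "select userId from user where exists (select 1) order by userId")

# No userId is 0, so the subquery is always empty and every record is discarded.
always_false = ids(
    "select userId from user where exists "
    "(select * from user where userId = 0)")

# not exists inverts the decision: an empty subquery keeps the record.
negated = ids(
    "select userId from user where not exists "
    "(select * from user where userId = 0) order by userId")

print(all_ids, always_true, always_false, negated)
```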
The in query is equivalent to several conditions joined with or, which is easy to understand. Take the following query
select * from user where userId in (1, 2, 3);
which is equivalent to
select * from user where userId = 1 or userId = 2 or userId = 3;
not in is the opposite of in. The query
select * from user where userId not in (1, 2, 3);
is equivalent to
select * from user where userId != 1 and userId != 2 and userId != 3;
In general, an in query first finds all the records returned by the subquery. Assuming that this result set is B and contains m records, the result set is decomposed into m values, and m lookups are then performed, one for each value.
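The two equivalences (in as a chain of or, not in as a chain of != joined with and) can be verified with a short sketch. It again assumes an in-memory SQLite database as a stand-in for MySQL, and the five sample userId values are invented for the example.

```python
import sqlite3

# Assumption: SQLite in memory stands in for MySQL; userId values 1..5 are sample data.
conn = sqlite3.connect(":memory:")
conn.execute("create table user (userId integer primary key)")
conn.executemany("insert into user values (?)", [(i,) for i in range(1, 6)])

def ids(sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]

in_rows = ids(
    "select userId from user where userId in (1, 2, 3) order by userId")
or_rows = ids(
    "select userId from user "
    "where userId = 1 or userId = 2 or userId = 3 order by userId")

not_in_rows = ids(
    "select userId from user where userId not in (1, 2, 3) order by userId")
and_rows = ids(
    "select userId from user "
    "where userId != 1 and userId != 2 and userId != 3 order by userId")

print(in_rows, or_rows, not_in_rows, and_rows)
```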
It is worth mentioning that when a single column is compared, the subquery of an in query must return exactly one column, for example
select * from user where userId in (select id from B);
rather than
select * from user where userId in (select id, age from B);
exists has no such limitation.
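A quick way to see this restriction is to run both forms and observe that the two-column subquery is rejected. The sketch below assumes SQLite as a stand-in for MySQL (the exact error message differs between the two), and the contents of tables user and B are invented for the example.

```python
import sqlite3

# Assumption: SQLite in memory stands in for MySQL; all rows are sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table user (userId integer primary key);
create table B (id integer, age integer);
insert into user values (1), (2), (3);
insert into B values (1, 30), (3, 40);
""")

# A one-column subquery is accepted by in.
one_col = [r[0] for r in conn.execute(
    "select userId from user where userId in (select id from B) "
    "order by userId")]

# A two-column subquery compared against a single column is rejected.
try:
    conn.execute(
        "select * from user where userId in (select id, age from B)"
    ).fetchall()
    two_col_rejected = False
except sqlite3.Error:
    two_col_rejected = True

# exists only checks whether rows come back, so any number of columns works.
exists_rows = [r[0] for r in conn.execute(
    "select userId from user where exists "
    "(select id, age from B where B.id = user.userId) order by userId")]

print(one_col, two_col_rejected, exists_rows)
```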
Let's consider the performance of exists and in
Consider the following two SQL statements
1: select * from A where exists (select * from B where B.id = A.id);
2: select * from A where A.id in (select id from B);
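Query 1 can be rewritten as the following PHP-style code for easier understanding, where $A and $B are assumed to be arrays holding the rows of tables A and B:
$result = [];
foreach ($A as $a) {                  # take the rows of table A one by one
    foreach ($B as $b) {              # probe table B (MySQL would use B's index on id)
        if ($b['id'] == $a['id']) {   # the sub-condition holds
            $result[] = $a;
            break;                    # one matching row is enough
        }
    }
}
return $result;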
That is roughly the idea. It shows that query 1 mainly relies on the index of table B, and whether table A is indexed has little effect on the efficiency of the query.
Assuming all ids of table B are 1, 2, 3, query 2 can be converted to
select * from A where A.id = 1 or A.id = 2 or A.id = 3;
This is easy to understand. Here the index of A is mainly used, and whether table B is indexed has little effect on the query.
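The two access patterns described above can be sketched in plain Python. This is only a simplified model of the explanation in this article, not what the MySQL optimizer actually does: the tables are hypothetical lists of dicts, and a Python set or dict plays the role of an index.

```python
# Hypothetical in-memory stand-ins for tables A and B (each row is a dict).
A = [{"id": i} for i in (1, 2, 3, 4, 5)]
B = [{"id": i} for i in (1, 2, 3)]

def exists_style(outer, inner):
    """Model of query 1: loop over the outer table, probe the inner index."""
    # The set plays the role of an index on inner.id.
    inner_index = {row["id"] for row in inner}
    # One probe of the inner index per outer row.
    return [row for row in outer if row["id"] in inner_index]

def in_style(outer, inner):
    """Model of query 2: evaluate the subquery, then probe the outer index."""
    # The dict plays the role of an index on outer.id.
    outer_index = {}
    for row in outer:
        outer_index.setdefault(row["id"], []).append(row)
    result = []
    # One lookup in the outer index per distinct value from the subquery.
    for value in sorted({row["id"] for row in inner}):
        result.extend(outer_index.get(value, []))
    return result

print(exists_style(A, B))
print(in_style(A, B))
```

Both functions return the same rows; they differ only in which table is looped over and which table's index is probed.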
Let's look at not exists and not in
1: select * from A where not exists (select * from B where B.id = A.id);
2: select * from A where A.id not in (select id from B);
Query 1 works the same way as above and uses the index of B.
For query 2, it can be transformed into the following statement
select * from A where A.id != 1 and A.id != 2 and A.id != 3;
This shows that not in expands into a chain of != conditions. Such != conditions cannot use any index, which means that for each record in table A, table B must be traversed once to check whether the record exists in table B.
Therefore, under this model, not exists is more efficient than not in.
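Whatever the performance difference, the two queries and the expanded != form select the same rows for the data assumed here. The sketch below checks this with an in-memory SQLite database as a stand-in for MySQL. The sample ids are invented, and it assumes B.id contains no NULL values, which is the only case the comparison covers.

```python
import sqlite3

# Assumption: SQLite in memory stands in for MySQL; ids are sample data,
# and B.id contains no NULL values.
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table A (id integer primary key);
create table B (id integer);
insert into A values (1), (2), (3), (4), (5);
insert into B values (1), (2), (3);
""")

def ids(sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]

not_exists_rows = ids(
    "select id from A where not exists "
    "(select * from B where B.id = A.id) order by id")
not_in_rows = ids(
    "select id from A where A.id not in (select id from B) order by id")
expanded_rows = ids(
    "select id from A "
    "where A.id != 1 and A.id != 2 and A.id != 3 order by id")

print(not_exists_rows, not_in_rows, expanded_rows)
```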
The in statement in MySQL performs a hash join between the outer table and the inner table, whereas the exists statement loops over the outer table and queries the inner table on each iteration. It is widely believed that exists is always more efficient than in. That belief is inaccurate: which one is faster depends on the situation.