Use EXISTS instead of IN in MySQL

exists loops over the outer table row by row. For each row it evaluates the subquery in the exists condition. If the subquery returns at least one row (how many does not matter), the condition is true and the current row is added to the result set. If the subquery returns no rows, the current row is discarded. The exists condition therefore behaves like a boolean: it is true when the subquery returns a result set and false when it does not.

For example:

select * from user where exists (select 1);

The rows of the user table are taken out one by one. Since the subquery select 1 always returns a row, every row of the user table is added to the result set, so the query is equivalent to select * from user;

Another example:

select * from user where exists (select * from user where userId = 0);

While looping over the user table, the subquery (select * from user where userId = 0) is checked for each row. Assuming userId is never 0, the subquery always returns an empty set and the condition is always false, so every row of the user table is discarded.

not exists is the opposite of exists: when the subquery returns a result set, the current row of the loop is discarded; otherwise it is added to the result set.
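The behaviour of exists and not exists described above can be checked with a short script. This is only a minimal sketch: it uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made so the example is self-contained; the exists semantics shown are standard SQL), and the contents of the user table are made up for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("create table user (userId integer primary key, name text)")
conn.executemany("insert into user values (?, ?)",
                 [(1, "zhang"), (2, "wang"), (3, "li")])  # sample rows, no userId = 0

def ids(sql):
    """Run a query and return the userId values it yields, sorted."""
    return sorted(row[0] for row in conn.execute(sql))

all_rows = ids("select userId from user")

# The subquery always returns a row, so every outer row is kept.
always_true = ids("select userId from user where exists (select 1)")

# The subquery never returns a row, so every outer row is discarded.
always_false = ids(
    "select userId from user where exists (select * from user where userId = 0)")

# not exists inverts the test, so every outer row is kept again.
negated = ids(
    "select userId from user where not exists (select * from user where userId = 0)")

print(always_true == all_rows, always_false, negated == all_rows)
```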

In general, if table A has n records, an exists query takes these n records one by one and evaluates the exists condition n times.

 

 

An in query is equivalent to several or conditions combined, which is easy to understand. For example, the query

select * from user where userId in (1, 2, 3);

is equivalent to

select * from user where userId = 1 or userId = 2 or userId = 3;

not in is the opposite of in. The query

select * from user where userId not in (1, 2, 3);

is equivalent to

select * from user where userId != 1 and userId != 2 and userId != 3;
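The two equivalences above can be verified on sample data. The following is a sketch under the assumption that Python's built-in sqlite3 module is an acceptable stand-in for MySQL for this purely logical check; the six sample rows are invented for the example.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("create table user (userId integer primary key)")
conn.executemany("insert into user values (?)", [(i,) for i in range(1, 7)])

def ids(sql):
    """Run a query and return the userId values it yields, sorted."""
    return sorted(row[0] for row in conn.execute(sql))

# in (...) versus a chain of or conditions
in_list = ids("select userId from user where userId in (1, 2, 3)")
or_chain = ids(
    "select userId from user where userId = 1 or userId = 2 or userId = 3")

# not in (...) versus a chain of != conditions joined by and
not_in_list = ids("select userId from user where userId not in (1, 2, 3)")
and_chain = ids(
    "select userId from user where userId != 1 and userId != 2 and userId != 3")

print(in_list == or_chain, not_in_list == and_chain)
```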

In general, an in query first evaluates the subquery. Assuming its result set B has m records, the in condition is then decomposed into m equality comparisons, one for each record.

 

It is worth mentioning that the subquery of an in query must return as many columns as the expression on its left side, so when a single column is compared it must return exactly one field, for example

select * from user where userId in (select id from B);

rather than

select * from user where userId in (select id, age from B);

exists does not have this limitation, because it only tests whether the subquery returns a row, not which columns that row contains.
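The column-count rule can be illustrated as follows. This is a hedged sketch that again assumes Python's built-in sqlite3 module as a stand-in for MySQL: both engines reject the two-column form, although the exact error message differs between them. The contents of user and B are invented for the example.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("create table user (userId integer primary key)")
conn.execute("create table B (id integer, age integer)")
conn.executemany("insert into user values (?)", [(1,), (2,), (3,)])
conn.executemany("insert into B values (?, ?)", [(1, 20), (3, 30)])

# One column on the left, one column returned by the subquery: accepted.
one_column = sorted(r[0] for r in conn.execute(
    "select userId from user where userId in (select id from B)"))

# One column on the left, two columns returned by the subquery: rejected.
try:
    conn.execute(
        "select userId from user where userId in (select id, age from B)"
    ).fetchall()
    two_columns_rejected = False
except sqlite3.Error:
    two_columns_rejected = True

# exists only checks whether a row comes back, so two columns are fine.
exists_two_columns = sorted(r[0] for r in conn.execute(
    "select userId from user "
    "where exists (select id, age from B where B.id = user.userId)"))

print(one_column, two_columns_rejected, exists_two_columns)
```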

 

Let's consider the performance of exists and in

Consider the following SQL statements

1: select * from A where exists (select * from B where B.id = A.id);

2: select * from A where A.id in (select id from B);

 

Query 1 can be rewritten as the following pseudocode for easier understanding

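# exists_in_B is a hypothetical helper: true if some row of B has B.id equal to its argument
$result = array();
for ($i = 0; $i < count(A); $i++) {
  $a = get_record(A, $i);        # get the records of table A one by one
  if (exists_in_B($a['id'])) {   # the sub-condition holds (index lookup on B.id)
    $result[] = $a;              # keep the current record of A
  }
}
return $result;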

This is roughly what happens. As you can see, query 1 mainly uses the index on table B; table A has little effect on the efficiency of the query.
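The loop described by the pseudocode can also be written out and compared with the two SQL statements. This is a sketch only: it assumes Python's built-in sqlite3 module as a stand-in for MySQL, uses made-up contents for A and B, and checks that the results agree. It says nothing about which form MySQL executes faster.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("create table A (id integer primary key, label text)")
conn.execute("create table B (id integer primary key)")
conn.executemany("insert into A values (?, ?)",
                 [(1, "a"), (2, "b"), (3, "c"), (4, "d")])
conn.executemany("insert into B values (?)", [(2,), (4,), (5,)])

# Hand-written version of the pseudocode: one probe of B per record of A.
loop_result = []
for a in conn.execute("select id, label from A order by id").fetchall():
    probe = conn.execute(
        "select 1 from B where B.id = ? limit 1", (a[0],)).fetchone()
    if probe is not None:  # the sub-condition holds
        loop_result.append(a)

# Query 1 (exists) and query 2 (in) from the text.
exists_result = conn.execute(
    "select id, label from A "
    "where exists (select * from B where B.id = A.id) order by id").fetchall()
in_result = conn.execute(
    "select id, label from A "
    "where A.id in (select id from B) order by id").fetchall()

print(loop_result == exists_result == in_result)
```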

 

Assuming the ids in table B are 1, 2 and 3, query 2 can be converted to

select * from A where A.id = 1 or A.id = 2 or A.id = 3;

This is easy to understand. Here the index on A is mainly used; table B has little effect on the query.

 

Let's look at not exists and not in

1. select * from A where not exists (select * from B where B.id = A.id);

2. select * from A where A.id not in (select id from B);

Query 1 works the same way as above and uses the index on B.

For query 2, it can be transformed into the following statement

select * from A where A.id != 1 and A.id != 2 and A.id != 3;

As you can see, not in amounts to a series of != comparisons. A != condition of this kind generally cannot make effective use of an index, which means that for each record in table A, table B must be traversed once to check whether the record exists in table B.

Therefore, not exists is more efficient than not in
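Because not in expands into != comparisons, the two forms also differ when the subquery result contains a null: every comparison with null is unknown, so not in returns no rows at all, while not exists is unaffected. The following sketch shows this, assuming Python's built-in sqlite3 module as a stand-in for MySQL (the null handling shown is standard SQL); the null row in B is an assumption added for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("create table A (id integer)")
conn.execute("create table B (id integer)")
conn.executemany("insert into A values (?)", [(1,), (2,), (3,), (4,)])
conn.executemany("insert into B values (?)", [(1,), (2,), (None,)])  # note the null

def ids(sql):
    """Run a query and return the id values it yields, sorted."""
    return sorted(row[0] for row in conn.execute(sql))

# not exists: rows of A without a matching row in B are kept.
not_exists_ids = ids(
    "select id from A where not exists (select * from B where B.id = A.id)")

# not in: "A.id != null" is never true, so no row of A qualifies.
not_in_ids = ids("select id from A where A.id not in (select id from B)")

print(not_exists_ids, not_in_ids)
```

If the subquery column can contain null, filtering it out (where id is not null) makes the two forms agree again.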

 

The in statement in MySQL is usually described as a hash join between the outer table and the inner table, while the exists statement loops over the outer table and queries the inner table on each iteration. It is widely believed that exists is always more efficient than in. This is not accurate: it depends on the situation.
 

If the two tables are about the same size, there is little difference between in and exists.
If one table is small and the other is large, use exists when the subquery table is the larger one, and in when the subquery table is the smaller one.
For example, with table A (small table) and table B (large table):
 
1:
select * from A where cc in (select cc from B);
is inefficient and uses the index on column cc of table A, while
select * from A where exists (select cc from B where cc = A.cc);
is efficient and uses the index on column cc of table B.

Conversely:

2:
select * from B where cc in (select cc from A);
is efficient and uses the index on column cc of table B, while
select * from B where exists (select cc from A where cc = B.cc);
is inefficient and uses the index on column cc of table A.
 
 
not in and not exists
If the query uses not in, both the inner and the outer table are scanned in full without using an index, whereas the subquery of not exists can still use the index on its table. So regardless of table size, not exists is faster than not in. This describes older MySQL versions; starting with MySQL 8.0.17 the optimizer can transform both forms into an antijoin, so it is worth checking the actual plan with explain.
The difference between in and =
select name from student where name in ('zhang','wang','li','zhao');
and
select name from student where name='zhang' or name='li' or name='wang' or name='zhao';
return the same result.
