Detailed explanation of in and exists in SQL

Conversion of in and exists

1 Conclusion

  1. in() is suitable when the subquery result set is smaller than the outer query result set (the number of records in the subquery result set determines the number of database interactions).
  2. exists() is suitable when the subquery result set is larger than the outer query result set (the number of records in the outer query result set determines the number of database interactions).
  3. When the outer query result set is about as large as the subquery result set, in and exists perform almost the same, and either can be used.
  4. Small tables drive large tables (more accurately, small query result sets drive large query result sets).
  5. in queries can use indexes on both the inner table and the outer table.
  6. exists queries can only use indexes on the inner table.
  7. "Small" and "large" here do not refer to the number of records in the inner and outer tables themselves, but to the number of records in the outer query result set and the subquery result set.
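
Rules 1 to 3 amount to a simple decision on the two result-set sizes. The sketch below only restates that rule of thumb; the function name choose_operator and the tolerance parameter (how close the two sizes must be to count as "about the same") are assumptions made for illustration, not something the rules above define.

```python
def choose_operator(outer_rows: int, subquery_rows: int, tolerance: float = 0.1) -> str:
    """Suggest 'in', 'exists' or 'either' from the two result-set sizes (rules 1 to 3)."""
    if outer_rows < 0 or subquery_rows < 0:
        raise ValueError("row counts must be non-negative")
    larger = max(outer_rows, subquery_rows)
    # Assumed threshold: sizes within `tolerance` of the larger set count as "about the same".
    if larger == 0 or abs(outer_rows - subquery_rows) <= tolerance * larger:
        return "either"
    # The smaller result set should drive the larger one.
    return "in" if subquery_rows < outer_rows else "exists"


print(choose_operator(100_000, 10))   # subquery result is smaller
print(choose_operator(10, 100_000))   # outer result is smaller
print(choose_operator(1_000, 1_000))  # same size
```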

2 The difference between in and exists

2.1 Performance analysis of in

select * from A
where id in(select id from B)

The above SQL will first execute the subquery in the brackets and then execute the main query, so it is equivalent to the following process:

for select id from B
for select * from A where A.id = B.id

The above query uses the in statement. The subquery inside in() is executed only once: it fetches all the id values from table B and caches them in memory. The id of each record in table A is then compared with the cached ids from table B, and if they are equal, the record from table A is added to the result set. This continues until all the records of table A have been traversed.
The query process is similar to the following:

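List resultSet = [];
Array A = (select * from A);
Array B = (select id from B);   // the subquery runs once and its result is cached

for (int i = 0; i < A.length; i++) {
   for (int j = 0; j < B.length; j++) {
      if (A[i].id == B[j].id) {
         resultSet.add(A[i]);
         break;   // stop at the first match so each record of A is added at most once
      }
   }
}
return resultSet;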

Analysis:

  1. In this in subquery, table B drives table A.
  2. MySQL first retrieves the data from table B and stores it in memory in one pass. The number of records in table B determines the number of database interactions.
  3. It then traverses the data from table B and looks each value up in table A (each lookup is a connection interaction, which consumes resources).
  4. If B has 100,000 records and A has 10 records, the database is accessed 100,000 times; if B has 10 records and A has 100,000 records, only 10 interactions occur.

Conclusion:
in() is suitable when table B holds fewer records than table A.
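
The cost model in the analysis above (B drives A, one interaction per value coming from B) can be sketched in Python. This is a toy model under stated assumptions, not what MySQL actually executes: the name in_semijoin is hypothetical, a dictionary stands in for an index on A.id, and duplicate ids from B are dropped so that each value is probed once.

```python
from collections import defaultdict


def in_semijoin(a_rows, b_ids):
    """Return (matching rows of A, number of lookups), with B's ids driving the search."""
    # A dictionary keyed by id plays the role of an index on A.id (assumption).
    a_index = defaultdict(list)
    for row in a_rows:
        a_index[row["id"]].append(row)

    lookups = 0
    result = []
    # The subquery result is read once; duplicates are dropped so each id is probed once.
    for b_id in dict.fromkeys(b_ids):
        lookups += 1  # one "interaction" per id coming from B
        result.extend(a_index.get(b_id, []))
    return result, lookups


a_rows = [{"id": i, "name": f"row{i}"} for i in range(100_000)]
b_ids = list(range(10))
rows, lookups = in_semijoin(a_rows, b_ids)
print(len(rows), lookups)  # 10 10
```

With 10 ids in B and 100,000 records in A, the loop runs 10 times; swapping the sizes makes it run 100,000 times, which is the asymmetry the conclusion relies on.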

2.2 Performance analysis of Exists

select a.* from A a
where exists(select 1 from B b where a.id=b.id)

This is equivalent to a process like the following:

for select * from A
for select 1 from B where B.id = A.id

The query process is similar to the following:

List resultSet = [];
Array A = (select * from A);

for (int i = 0; i < A.length; i++) {
   // run "select 1 from B b where b.id = a.id" and check whether any record is returned
   if (exists(A[i].id)) {
      resultSet.add(A[i]);
   }
}
return resultSet;

Analysis:

  1. In this exists query, table A drives table B.
  2. Unlike in, exists first reads the records of table A into memory and then runs the subquery against table B once for each of them, so the number of records in table A determines the number of database interactions.
  3. If A has 10,000 records and B has 10 records, there are 10,000 database interactions; if A has 10 records and B has 10,000 records, there are 10.
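
The same kind of toy model can be written for exists, where A drives B and the subquery is probed once per record of A. As before, this is only a sketch of the cost model described above: the name exists_semijoin is hypothetical, and a Python set stands in for an index on B.id.

```python
def exists_semijoin(a_rows, b_ids):
    """Return (matching rows of A, number of probes), with A's rows driving the search."""
    b_index = set(b_ids)  # stands in for an index on B.id (assumption)
    probes = 0
    result = []
    for row in a_rows:
        probes += 1  # one "interaction" per record of A
        if row["id"] in b_index:  # models: select 1 from B where B.id = A.id
            result.append(row)
    return result, probes


a_rows = [{"id": i} for i in range(10)]
b_ids = range(10_000)
rows, probes = exists_semijoin(a_rows, b_ids)
print(len(rows), probes)  # 10 10
```

Here the number of probes depends only on the size of A, regardless of how many records B holds.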

2.3 Examples

1. SQL to create the tables

# 1. Student table
# Student(s_id, s_name, s_birth, s_sex): student ID, student name, date of birth, student sex
CREATE TABLE `Student` (
    `s_id` VARCHAR(20),
    s_name VARCHAR(20) NOT NULL DEFAULT '',
    s_birth VARCHAR(20) NOT NULL DEFAULT '',
    s_sex VARCHAR(10) NOT NULL DEFAULT '',
    PRIMARY KEY(s_id)
);

# 2. Score table
# Score(s_id, c_id, s_score): student ID, course ID, score
Create table Score(
    s_id VARCHAR(20),
    c_id VARCHAR(20) not null default '',
    s_score INT(3),
    primary key(`s_id`,`c_id`)
);

# 3. Insert the Student table data
insert into Student values('01' , '赵雷' , '1990-01-01' , '男');
insert into Student values('02' , '钱电' , '1990-12-21' , '男');
insert into Student values('03' , '孙风' , '1990-05-20' , '男');
insert into Student values('04' , '李云' , '1990-08-06' , '男');
insert into Student values('05' , '周梅' , '1991-12-01' , '女');
insert into Student values('06' , '吴兰' , '1992-03-01' , '女');
insert into Student values('07' , '郑竹' , '1989-07-01' , '女');
insert into Student values('08' , '王菊' , '1990-01-20' , '女');

# 4. Score table data
insert into Score values('01' , '01' , 80);
insert into Score values('01' , '02' , 90);
insert into Score values('01' , '03' , 99);
insert into Score values('02' , '01' , 70);
insert into Score values('02' , '02' , 60);
insert into Score values('02' , '03' , 80);
insert into Score values('03' , '01' , 80);
insert into Score values('03' , '02' , 80);
insert into Score values('03' , '03' , 80);
insert into Score values('04' , '01' , 50);
insert into Score values('04' , '02' , 30);
insert into Score values('04' , '03' , 20);
insert into Score values('05' , '01' , 76);
insert into Score values('05' , '02' , 87);
insert into Score values('06' , '01' , 31);
insert into Score values('06' , '03' , 34);
insert into Score values('07' , '02' , 89);
insert into Score values('07' , '03' , 98);

2. in method

SELECT
	a.* 
FROM
	Student a 
WHERE
	a.s_id IN (SELECT b.s_id FROM Score b WHERE b.c_id = '01')

3. exists method

SELECT
	a.* 
FROM
	Student a 
WHERE
	EXISTS(SELECT * FROM Score b WHERE a.s_id = b.s_id AND b.c_id = '01')

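4. Results

Both queries return the same rows: the six students with s_id '01' through '06', each of whom has a score for course '01'.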
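
To check the two queries against the sample data, they can be run with Python's built-in sqlite3 module. This is a minimal sketch, assuming SQLite as a stand-in for MySQL; the Student table is trimmed to the two columns needed here, and an ORDER BY is added so the two results can be compared directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student (s_id VARCHAR(20) PRIMARY KEY, s_name VARCHAR(20) NOT NULL DEFAULT '');
    CREATE TABLE Score (
        s_id VARCHAR(20), c_id VARCHAR(20) NOT NULL DEFAULT '', s_score INT,
        PRIMARY KEY (s_id, c_id)
    );
""")
students = [("01", "赵雷"), ("02", "钱电"), ("03", "孙风"), ("04", "李云"),
            ("05", "周梅"), ("06", "吴兰"), ("07", "郑竹"), ("08", "王菊")]
scores = [("01", "01", 80), ("01", "02", 90), ("01", "03", 99),
          ("02", "01", 70), ("02", "02", 60), ("02", "03", 80),
          ("03", "01", 80), ("03", "02", 80), ("03", "03", 80),
          ("04", "01", 50), ("04", "02", 30), ("04", "03", 20),
          ("05", "01", 76), ("05", "02", 87),
          ("06", "01", 31), ("06", "03", 34),
          ("07", "02", 89), ("07", "03", 98)]
conn.executemany("INSERT INTO Student VALUES (?, ?)", students)
conn.executemany("INSERT INTO Score VALUES (?, ?, ?)", scores)

in_sql = """
    SELECT a.s_id FROM Student a
    WHERE a.s_id IN (SELECT b.s_id FROM Score b WHERE b.c_id = '01')
    ORDER BY a.s_id
"""
exists_sql = """
    SELECT a.s_id FROM Student a
    WHERE EXISTS (SELECT 1 FROM Score b WHERE a.s_id = b.s_id AND b.c_id = '01')
    ORDER BY a.s_id
"""
in_ids = [row[0] for row in conn.execute(in_sql)]
exists_ids = [row[0] for row in conn.execute(exists_sql)]
print(in_ids)                # ['01', '02', '03', '04', '05', '06']
print(in_ids == exists_ids)  # True
```

The sketch only confirms that the two forms select the same students. It says nothing about their relative speed in MySQL, which depends on the optimizer and the available indexes.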

3 not in and not exists

If the query uses not in, both the inner and outer tables are scanned in full and no index is used, whereas the subquery of not exists can still use the index on its table. So no matter which table is larger, not exists is faster than not in.
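
For reference, the negated forms of the two example queries can be run on the sample data the same way. This is a sketch using Python's sqlite3 module as a stand-in for MySQL, with both tables trimmed to their key columns. It shows only that the two queries return the same rows here; it does not measure the performance difference described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student (s_id TEXT PRIMARY KEY);
    CREATE TABLE Score (s_id TEXT NOT NULL, c_id TEXT NOT NULL, PRIMARY KEY (s_id, c_id));
""")
# Student ids '01' to '08', as in the sample data.
conn.executemany("INSERT INTO Student VALUES (?)", [(f"{i:02d}",) for i in range(1, 9)])
score_keys = [("01", "01"), ("01", "02"), ("01", "03"),
              ("02", "01"), ("02", "02"), ("02", "03"),
              ("03", "01"), ("03", "02"), ("03", "03"),
              ("04", "01"), ("04", "02"), ("04", "03"),
              ("05", "01"), ("05", "02"),
              ("06", "01"), ("06", "03"),
              ("07", "02"), ("07", "03")]
conn.executemany("INSERT INTO Score VALUES (?, ?)", score_keys)

not_in_sql = """
    SELECT a.s_id FROM Student a
    WHERE a.s_id NOT IN (SELECT b.s_id FROM Score b WHERE b.c_id = '01')
    ORDER BY a.s_id
"""
not_exists_sql = """
    SELECT a.s_id FROM Student a
    WHERE NOT EXISTS (SELECT 1 FROM Score b WHERE a.s_id = b.s_id AND b.c_id = '01')
    ORDER BY a.s_id
"""
not_in_ids = [row[0] for row in conn.execute(not_in_sql)]
not_exists_ids = [row[0] for row in conn.execute(not_exists_sql)]
print(not_in_ids)      # ['07', '08']
print(not_exists_ids)  # ['07', '08']
```

The two forms are interchangeable here because Score.s_id can never be NULL. If the subquery could return a NULL, not in would return no rows at all, while not exists would be unaffected.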

Origin blog.csdn.net/hansome_hong/article/details/127471694