The difference between IN and EXISTS in SQL statements

IN: the subquery is executed only once

    IN determines whether a given value matches any value in a list or in the result of a subquery. When a query uses IN, the table in the subquery is queried first; its result is then combined with the outer table as a Cartesian product and filtered by the condition. IN is therefore faster when the subquery's table is relatively small.

    Concrete SQL examples:

For the order in which an SQL statement is executed, see: https://blog.csdn.net/wqc19920906/article/details/79411854

1. select * from student s where s.stuid in (select stuid from score ss where ss.stuid = s.stuid)
Execution result: (screenshot not preserved)


2. select * from student s where s.stuid in (select stuid from score ss where ss.stuid < 1005)
Execution result: (screenshot not preserved)

 

Execution flow of the two statements above:

First the student table is read, then the subquery inside IN is executed. The subquery result is combined with the student table as a Cartesian product, and the product is filtered by the condition student.stuid IN score.stuid (the stuid values on both sides are compared, and rows where they differ are discarded). What remains is the matching data.

 

EXISTS: the subquery is executed student.length times

EXISTS specifies a subquery and tests whether it returns any rows. The outer table is traversed in a loop, and for each outer record the inner table is checked for matching data. Matching records are put into the result set.

Concrete example:

select * from student s where EXISTS ( select stuid from score ss where ss.stuid = s.stuid)
The result of this SQL statement is the same as that of the first IN statement above.
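The claim that both forms return the same rows can be checked with a small sketch. It uses Python's built-in sqlite3 module and an in-memory database; the column layout of student and score beyond stuid, and the sample rows, are assumptions made only for this illustration.

```python
import sqlite3

def in_vs_exists():
    """Run the IN form and the EXISTS form of the same query on sample data."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: only stuid is taken from the article.
    conn.executescript("""
        CREATE TABLE student (stuid INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE score (stuid INTEGER, mark INTEGER);
        INSERT INTO student VALUES (1001, 'Ann'), (1002, 'Bob'), (1003, 'Cid');
        INSERT INTO score VALUES (1001, 90), (1001, 75), (1003, 60);
    """)
    in_rows = conn.execute(
        "SELECT * FROM student s WHERE s.stuid IN "
        "(SELECT stuid FROM score ss WHERE ss.stuid = s.stuid) ORDER BY s.stuid"
    ).fetchall()
    exists_rows = conn.execute(
        "SELECT * FROM student s WHERE EXISTS "
        "(SELECT stuid FROM score ss WHERE ss.stuid = s.stuid) ORDER BY s.stuid"
    ).fetchall()
    conn.close()
    return in_rows, exists_rows

in_rows, exists_rows = in_vs_exists()
print(in_rows == exists_rows, in_rows)
```

Student 1002 has no score row, so neither form returns it; a student with several score rows still appears only once.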


However, the two are not the same: their execution processes are completely different.

  When a query uses the EXISTS keyword, the subquery is not evaluated first. The table of the main (outer) query is read first, so the first statement executed is:

select * from student s
Result: (screenshot not preserved)

 

 Then, for each record of that table, the following expression is evaluated to decide whether the WHERE condition is satisfied:

EXISTS (select stuid from score ss where ss.stuid = s.stuid)
 If the subquery finds at least one row, EXISTS returns true; otherwise it returns false. If the result is true the row is kept, and if it is false the row is discarded. The rows that remain are returned as the final result.


Differences and application scenarios
    The difference between IN and EXISTS: if the subquery returns few records and the main query's table is large and indexed, use IN; conversely, if the outer main query has few records and the subquery's table is large and indexed, use EXISTS. The distinction is mainly a change in the driving order, which is the key to the performance difference. With EXISTS, the outer table is the driving table and is accessed first; with IN, the subquery is executed first. The goal is to drive from the table that returns quickly, which means taking into account the relationship between the indexes and the size of the result sets. In addition, IN does not match NULL values.

    Conceptually, IN performs a hash join between the outer and inner tables, whereas EXISTS loops over the outer table and queries the inner table on each iteration. The long-held belief that EXISTS is always more efficient than IN is inaccurate.

NOT IN and NOT EXISTS
    If a query uses NOT IN, both the inner and the outer table are scanned in full and no index is used, whereas the subquery of NOT EXISTS can still use the index on its table. So regardless of which table is larger, NOT EXISTS is generally faster than NOT IN.
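Apart from speed, NOT IN and NOT EXISTS also differ in how they treat NULL, which ties in with the remark above that IN does not match NULL. The following is a minimal sketch using Python's sqlite3 module; the tables a and b and their contents are hypothetical.

```python
import sqlite3

def not_in_vs_not_exists():
    """Compare NOT IN and NOT EXISTS when the subquery table contains a NULL."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE a (id INTEGER);
        CREATE TABLE b (id INTEGER);
        INSERT INTO a VALUES (1), (2), (3);
        INSERT INTO b VALUES (1), (NULL);
    """)
    # "2 NOT IN (1, NULL)" evaluates to NULL, not true, so the row is filtered out.
    not_in = conn.execute(
        "SELECT id FROM a WHERE id NOT IN (SELECT id FROM b) ORDER BY id"
    ).fetchall()
    # The correlated comparison never matches the NULL row, so 2 and 3 survive.
    not_exists = conn.execute(
        "SELECT id FROM a WHERE NOT EXISTS "
        "(SELECT 1 FROM b WHERE b.id = a.id) ORDER BY id"
    ).fetchall()
    conn.close()
    return not_in, not_exists

not_in, not_exists = not_in_vs_not_exists()
print(not_in, not_exists)
```

A single NULL in the subquery result makes NOT IN return no rows at all, so the two forms are interchangeable only when the compared column cannot be NULL.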

 

 

The following is reprinted content.

 

Supplementary explanation of the principle:

 


select * from A
where id in(select id from B)

The query above uses IN. The IN() subquery is executed only once: it fetches all id values of table B and caches them. Each id of table A is then checked against the ids of table B, and if they are equal the record of table A is added to the result set, until all records of table A have been traversed.
The process is similar to the following:

List resultSet=[];
Array A=(select * from A);
Array B=(select id from B);

for(int i=0;i<A.length;i++) {
   for(int j=0;j<B.length;j++) {
      if(A[i].id==B[j].id) {
         resultSet.add(A[i]);
         break;
      }
   }
}
return resultSet;

As can be seen, IN() is not suitable when table B holds a large amount of data, because all the data of table B is traversed.
For example: table A has 10,000 records and table B has 1,000,000 records, so up to 10,000 * 1,000,000 comparisons may be needed, which is very inefficient.
Another example: table A has 10,000 records and table B has 100 records, so at most 10,000 * 100 comparisons are needed. The number of traversals drops sharply and efficiency improves greatly.

Conclusion: IN() is suitable when table B holds less data than table A.
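The nested-loop model above can be made concrete with a short Python sketch that counts comparisons. It is only a model of the article's pseudocode, not a description of what any particular database engine does; the function names and the sample sizes are assumptions. The second function shows the hash-based variant mentioned earlier, where the cached ids are kept in a set.

```python
def in_nested_loop(a_rows, b_ids):
    """Mirror the pseudocode: compare each row of A against the cached ids of B."""
    result, comparisons = [], 0
    for row in a_rows:
        for b_id in b_ids:
            comparisons += 1
            if row["id"] == b_id:
                result.append(row)
                break  # stop at the first match, as in the pseudocode
    return result, comparisons

def in_hashed(a_rows, b_ids):
    """Same result, but the cached ids are stored in a set (hash lookup)."""
    cached = set(b_ids)
    return [row for row in a_rows if row["id"] in cached]

a_rows = [{"id": i} for i in range(1, 101)]  # table A: 100 rows
b_ids = list(range(91, 111))                 # table B: 20 ids, 10 of them in A
matched, comparisons = in_nested_loop(a_rows, b_ids)
print(len(matched), comparisons, len(a_rows) * len(b_ids))
```

The comparison count stays below the len(A) * len(B) bound because the inner loop stops at the first match, but it still grows with the size of B for every row of A that has no match.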

select a.* from A a 
where exists(select 1 from B b where a.id=b.id)

The query above uses EXISTS. EXISTS() is executed A.length times. Its result set is not cached, because the content of the result set does not matter; what matters is whether the result set contains any record. If it does, true is returned, otherwise false.
The process is similar to the following:

List resultSet=[];
Array A=(select * from A);

for(int i=0;i<A.length;i++) {
   if(exists(A[i].id)) {    // run "select 1 from B b where b.id=a.id" and check whether any record is returned
       resultSet.add(A[i]);
   }
}
return resultSet;

EXISTS() is suitable when table B holds more data than table A, because it does not traverse table B; it only needs to execute one query per row of A.
For example: table A has 10,000 records and table B has 1,000,000 records, so EXISTS() is executed 10,000 times to decide whether an id of table A equals an id of table B.
For example: table A has 10,000 records and table B has 100,000,000 records, and EXISTS() is still executed 10,000 times, because it only runs A.length times. The more data table B holds, the more EXISTS() pays off.
Another example: table A has 10,000 records and table B has 100 records. EXISTS() is still executed 10,000 times, which is worse than letting IN() traverse 10,000 * 100 times, because IN() compares values in memory while EXISTS() has to query the database, and a database query costs far more than an in-memory comparison.

Conclusion: EXISTS() is suitable when table B holds more data than table A.
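The per-row model of EXISTS() can be sketched in Python as well. In this sketch a dictionary stands in for an index on B.id, which is an assumption: the article only says that each call "queries the database". The function names and sample sizes are hypothetical.

```python
from collections import defaultdict

def build_index(b_rows):
    """Hypothetical stand-in for an index on B.id: maps id to the rows of B."""
    index = defaultdict(list)
    for row in b_rows:
        index[row["id"]].append(row)
    return index

def exists_filter(a_rows, b_index):
    """Keep each row of A for which at least one matching row of B exists."""
    result, probes = [], 0
    for row in a_rows:
        probes += 1                   # one "select 1 from B where b.id = a.id"
        if b_index.get(row["id"]):    # true as soon as one matching row exists
            result.append(row)
    return result, probes

a_rows = [{"id": i} for i in range(1, 11)]     # table A: 10 rows
b_rows = [{"id": i} for i in range(5, 1005)]   # table B: 1,000 rows
kept, probes = exists_filter(a_rows, build_index(b_rows))
print(len(kept), probes)
```

The number of probes equals the number of rows in A no matter how large B is, which is the point of the conclusion above; how cheap each probe is depends on whether B has a usable index.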

When tables A and B hold about the same amount of data, IN and EXISTS are about equally efficient, and either can be used.

 

 

For example, the Northwind database has a query like this:
SELECT c.CustomerId, CompanyName FROM Customers c
WHERE EXISTS (
SELECT OrderID FROM Orders o WHERE o.CustomerID = c.CustomerID)
How does EXISTS work here? The subquery returns the OrderID field, but the outer query asks for the CustomerID and CompanyName fields, which are certainly not contained in OrderID. So how are they matched?

EXISTS checks whether the subquery returns at least one row. The subquery does not actually return any data; it yields the value True or False.
EXISTS specifies a subquery and tests for the existence of rows.

Syntax: EXISTS subquery
Parameter: subquery is a restricted SELECT statement (the COMPUTE clause and the INTO keyword are not allowed).
Result type: Boolean. If the subquery contains rows, TRUE is returned; otherwise FALSE is returned.

Example table A: TableIn    Example table B: TableEx

(1) Using NULL in the subquery still returns a result set
SELECT * FROM TableIn WHERE EXISTS (SELECT NULL)
is equivalent to: SELECT * FROM TableIn

(2) Comparing queries that use EXISTS and IN. Note that both queries return the same result.
SELECT * FROM TableIn WHERE EXISTS (SELECT BID FROM TableEx WHERE BNAME = TableIn.ANAME)
SELECT * FROM TableIn WHERE ANAME IN (SELECT BNAME FROM TableEx)

(3) Comparing queries that use EXISTS and = ANY. Note that both queries return the same result.
SELECT * FROM TableIn WHERE EXISTS (SELECT BID FROM TableEx WHERE BNAME = TableIn.ANAME)
SELECT * FROM TableIn WHERE ANAME = ANY (SELECT BNAME FROM TableEx)
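The first two comparisons can be tried out with Python's sqlite3 module. This is a sketch under assumptions: the sample rows are invented, and only the columns named in the queries above are created. The = ANY form is left out because, as far as I know, SQLite does not accept that syntax.

```python
import sqlite3

def exists_examples():
    """Run EXISTS (SELECT NULL) and the EXISTS / IN pair on sample data."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical sample data; column names follow the queries in the article.
    conn.executescript("""
        CREATE TABLE TableIn (AID INTEGER PRIMARY KEY, ANAME TEXT, ASEX TEXT);
        CREATE TABLE TableEx (BID INTEGER PRIMARY KEY, BNAME TEXT);
        INSERT INTO TableIn VALUES (1, 'Ann', 'F'), (2, 'Bob', 'M'), (3, 'Cid', 'M');
        INSERT INTO TableEx VALUES (1, 'Bob'), (2, 'Cid'), (3, 'Dan');
    """)
    queries = {
        "all": "SELECT * FROM TableIn ORDER BY AID",
        "exists_null": "SELECT * FROM TableIn WHERE EXISTS (SELECT NULL) ORDER BY AID",
        "exists": "SELECT * FROM TableIn WHERE EXISTS "
                  "(SELECT BID FROM TableEx WHERE BNAME = TableIn.ANAME) ORDER BY AID",
        "in": "SELECT * FROM TableIn WHERE ANAME IN "
              "(SELECT BNAME FROM TableEx) ORDER BY AID",
    }
    results = {name: conn.execute(sql).fetchall() for name, sql in queries.items()}
    conn.close()
    return results

results = exists_examples()
print(results["exists_null"] == results["all"], results["exists"] == results["in"])
```

SELECT NULL produces one row whose value happens to be NULL, and EXISTS only asks whether a row exists, so every row of TableIn passes.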

EXISTS and NOT EXISTS have opposite effects. If the subquery returns no rows, a WHERE clause with NOT EXISTS is satisfied.

Conclusion:
The return value of an EXISTS (or NOT EXISTS) clause is a Boolean. EXISTS contains a subquery (SELECT ... FROM ...), which I call the inner query of EXISTS. The inner query returns a result set, and the EXISTS clause returns a Boolean according to whether that result set is empty or non-empty.

Put plainly: each row of the outer table is substituted into the inner query as a test. If the inner query returns a non-empty result, the EXISTS clause returns TRUE and the row becomes a row of the outer query's result; otherwise it does not.

The parser looks at the first word of the statement. When it finds that the first word is the SELECT keyword, it jumps to the FROM keyword, finds the table name after FROM and loads that table into memory. Next it looks for the WHERE keyword. If there is none, it goes back to SELECT and parses the field list; if WHERE is found, it analyzes the condition and then goes back to SELECT to parse the fields. Finally the virtual table we asked for is formed.
What follows the WHERE keyword is a conditional expression. Evaluating a conditional expression yields a return value, zero or non-zero: non-zero means true and 0 means false. Likewise the condition after WHERE has a return value, true or false, which determines whether the SELECT is carried out for the current record.
The parser first finds the SELECT keyword, then jumps to the FROM keyword, loads the STUDENT table into memory and points a pointer at the first record. It then finds the WHERE keyword and evaluates its conditional expression. If the expression is true, the record is placed into a virtual table and the pointer moves to the next record; if it is false, the pointer moves directly to the next record without any other operation. Once the whole table has been scanned, the virtual table is returned to the user. EXISTS is part of the conditional expression, and it too has a return value (true or false).

Before inserting a record, we need to check whether the record already exists, and perform the insert only when it does not. Using EXISTS in the condition prevents inserting duplicate records.
INSERT INTO TableIn (ANAME, ASEX)
SELECT TOP 1 'John Doe', 'M' FROM TableIn
WHERE NOT EXISTS (SELECT * FROM TableIn WHERE TableIn.AID = 7)
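A comparable guarded insert can be sketched with Python's sqlite3 module. The statement above uses SELECT TOP 1, which is SQL Server syntax; the sketch below is an adaptation, not the author's statement: it uses a SELECT without FROM, which SQLite accepts, and it inserts AID explicitly so that the check and the inserted row refer to the same key. The function name and sample values are hypothetical.

```python
import sqlite3

def insert_if_absent(conn, aid, name, sex):
    """Insert (aid, name, sex) only if no row with this AID exists yet."""
    conn.execute(
        "INSERT INTO TableIn (AID, ANAME, ASEX) "
        "SELECT ?, ?, ? "
        "WHERE NOT EXISTS (SELECT 1 FROM TableIn WHERE AID = ?)",
        (aid, name, sex, aid),
    )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableIn (AID INTEGER, ANAME TEXT, ASEX TEXT)")
insert_if_absent(conn, 7, "John Doe", "M")   # no row with AID 7 yet: inserted
insert_if_absent(conn, 7, "Jane Roe", "F")   # AID 7 already present: skipped
rows = conn.execute("SELECT AID, ANAME, ASEX FROM TableIn").fetchall()
print(rows)
```

AID is deliberately not declared as a primary key here, so the only thing preventing the duplicate is the NOT EXISTS condition. In a real schema a unique constraint is the more robust safeguard, since a check like this one does not by itself protect against two concurrent sessions inserting at the same time.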

On the efficiency of EXISTS versus IN: EXISTS is often said to be more efficient than IN, on the grounds that IN does not use the index, but the right choice depends on the actual situation:

IN is suitable when the outer table is large and the inner table is small; EXISTS is suitable when the outer table is small and the inner table is large.

 


Problems and solutions

Question 1:

 

-- The users table has 1000 records; id is auto-incremented and greater than 0

select * from users where exists (select * from users limit 0); -- how many records are output?

select * from users where exists (select * from users where id < 0); -- how many records are output?

Answer:

1000

0

Reason:

By nature, an EXISTS query returns true as soon as one record is hit, so in the author's test the limit clause had no effect, or rather was never reached. This depends on the database engine: an engine that applies limit 0 before testing for rows sees an empty subquery and returns 0 records for the first query as well.

 

Question 2:

Can EXISTS completely replace IN?

No.

For example:

-- Case with no correlated column: enumerated constants

select * from areas where id in (4, 5, 6);

-- Case with no correlated column: an EXISTS subquery here would be either true for all rows or false for all rows

select * from areas where id in (select city_id from deals where deals.name = 'xxx'); 

 

 

 

Examples of SQL optimizations related to EXISTS:

9. Replace IN with EXISTS (many programmers turn out not to know this usage):
In many queries based on a base table, satisfying a condition often requires joining another table.
In such cases, using EXISTS (or NOT EXISTS) will generally improve query efficiency.
Example:
(inefficient)
SELECT ... FROM table1 t1 WHERE t1.id > 10 AND pno IN (SELECT no FROM table2 WHERE name LIKE '%WWW');
(efficient)
SELECT ... FROM table1 t1 WHERE t1.id > 10 AND EXISTS (SELECT 1 FROM table2 t2 WHERE t1.pno = t2.no AND name LIKE '%WWW');
10. Replace NOT IN with NOT EXISTS:
In a subquery, the NOT IN clause performs an internal sort and merge.
In either case, NOT IN is the least efficient option (because it performs a full table scan on the table in the subquery).
To avoid NOT IN, we can rewrite it as an outer join or as NOT EXISTS.
11. Replace DISTINCT with EXISTS:
When a query includes one-to-many table information, avoid DISTINCT in the SELECT clause; generally consider replacing it with EXISTS.
For example:
(inefficient)
SELECT DISTINCT d.dept_no, d.dept_name FROM t_dept d, t_emp e WHERE d.dept_no = e.dept_no;
(efficient)
SELECT d.dept_no, d.dept_name FROM t_dept d WHERE EXISTS (SELECT 1 FROM t_emp e WHERE d.dept_no = e.dept_no);
EXISTS makes the query faster because the RDBMS core returns the result as soon as the subquery's condition is satisfied.
12. Replace EXISTS with a table join:
In general, a table join is more efficient than EXISTS.
Example:
(inefficient)
SELECT ename FROM emp e WHERE EXISTS (SELECT 1 FROM dept WHERE dept_no = e.dept_no AND dept_cat = 'W');
(efficient)
SELECT ename FROM dept d, emp e WHERE e.dept_no = d.dept_no AND dept_cat = 'W';
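The rewrites in tips 10 and 11 only make sense if they return the same rows, which can be checked with a small sketch. It uses Python's sqlite3 module; the sample data and the emp_no column are assumptions, and the sketch says nothing about which form is faster on a given engine.

```python
import sqlite3

def rewrite_examples():
    """Check that the rewrites from tips 10 and 11 return the same rows."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical data: department 30 has no employees.
    conn.executescript("""
        CREATE TABLE t_dept (dept_no INTEGER PRIMARY KEY, dept_name TEXT);
        CREATE TABLE t_emp (emp_no INTEGER PRIMARY KEY, dept_no INTEGER);
        INSERT INTO t_dept VALUES (10, 'Sales'), (20, 'Research'), (30, 'Ops');
        INSERT INTO t_emp VALUES (1, 10), (2, 10), (3, 20);
    """)
    queries = {
        # Tip 11: DISTINCT over a join versus EXISTS.
        "distinct_join": "SELECT DISTINCT d.dept_no, d.dept_name FROM t_dept d, t_emp e "
                         "WHERE d.dept_no = e.dept_no ORDER BY d.dept_no",
        "exists": "SELECT d.dept_no, d.dept_name FROM t_dept d WHERE EXISTS "
                  "(SELECT 1 FROM t_emp e WHERE d.dept_no = e.dept_no) ORDER BY d.dept_no",
        # Tip 10: NOT EXISTS versus an outer join that keeps unmatched rows.
        "not_exists": "SELECT d.dept_no FROM t_dept d WHERE NOT EXISTS "
                      "(SELECT 1 FROM t_emp e WHERE e.dept_no = d.dept_no) ORDER BY d.dept_no",
        "outer_join": "SELECT d.dept_no FROM t_dept d LEFT JOIN t_emp e "
                      "ON e.dept_no = d.dept_no WHERE e.dept_no IS NULL ORDER BY d.dept_no",
    }
    results = {name: conn.execute(sql).fetchall() for name, sql in queries.items()}
    conn.close()
    return results

results = rewrite_examples()
print(results["distinct_join"] == results["exists"],
      results["not_exists"] == results["outer_join"])
```

Department 10 has two employees, so the plain join would list it twice; DISTINCT removes the duplicate afterwards, while EXISTS never produces it.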
---------------------
Disclaimer: This article is an original article by CSDN blogger "jcpp9527" and follows the CC 4.0 BY-SA copyright agreement. When reproducing it, please attach the original source link and this statement.
Original link: https://blog.csdn.net/wqc19920906/article/details/79800374

Origin www.cnblogs.com/yszr/p/11316285.html