An in-depth explanation of exists in MySQL

I think the exists syntax is a very powerful tool in MySQL; it can express some fairly complex data processing in a simple way.

Let me discuss three topics related to exists.


all and any

First of all, when I see exists, I inevitably think of all and any, which are easier to understand than exists. Both all and any compare a single value with multiple rows of data (the result set of a subquery), which is their main function.

create table T(X int);
insert into T(X) values(1),(2),(3),(4);

# eg.1
select * from T where X > all( select * from T where X < 3 );	# outputs 3,4

# eg.2
select * from T where X > any( select * from T where X > 1 );	# outputs 3,4

Looking first at eg.1, the result of select * from T where X < 3 is obviously 1, 2; all requires X to be greater than every element of the set {1,2}, so the rows returned are 3 and 4.

Similarly, for eg.2, the result of select * from T where X > 1 is 2,3,4; any requires X to be greater than at least one element of the set {2,3,4}, so the rows returned are 3 and 4.
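To make the two comparisons concrete, here is a minimal Python sketch that emulates them with the built-in all() and any(). The helper names greater_than_all and greater_than_any are made up for illustration, and the sketch ignores NULL handling, which the real SQL operators have to deal with.

```python
# Rows of table T from the example above.
T = [1, 2, 3, 4]


def greater_than_all(rows, subquery_rows):
    """Emulate: select * from T where X > all(subquery)."""
    return [x for x in rows if all(x > y for y in subquery_rows)]


def greater_than_any(rows, subquery_rows):
    """Emulate: select * from T where X > any(subquery)."""
    return [x for x in rows if any(x > y for y in subquery_rows)]


sub1 = [x for x in T if x < 3]  # subquery of eg.1: the set {1, 2}
sub2 = [x for x in T if x > 1]  # subquery of eg.2: the set {2, 3, 4}

eg1 = greater_than_all(T, sub1)
eg2 = greater_than_any(T, sub2)
print(eg1, eg2)
```

Both lists come out as the rows 3 and 4, for different reasons: in eg.1 they beat every element of {1,2}, in eg.2 they beat at least the element 2.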


Dividing a table into sub-tables

Before talking about exists, let's look at a special kind of statement that "divides" a table.

eg.1

# fruitTable
Id  Name  Class Count  Date
 1   苹果    水果    10     2011-7-1
 1   桔子    水果    20     2011-7-2
 1   香蕉    水果    15     2011-7-3
 2   白菜    蔬菜    12     2011-7-1
 2   青菜    蔬菜    19     2011-7-2 

Now we need to filter the rows so that each Id appears only once, keeping the row with the most recent Date.

This filter condition hides a requirement to divide the table. Taking fruitTable as an example, it needs to be divided into 2 sub-tables, one holding the rows with Id 1 and the other holding the rows with Id 2, and then the tuple with the latest Date is selected from each sub-table.

First, look at the following wrong solution:

SELECT DISTINCT Id, Name, Class, Count, Date FROM fruitTable t1
	WHERE (Date IN 
           (SELECT MAX(Date) FROM fruitTable t2 GROUP BY Id));
           
# result
 1   桔子    水果    20     2011-7-2
 1   香蕉    水果    15     2011-7-3
 2   青菜    蔬菜    19     2011-7-2

This solution is logically flawed. It mixes the maximum dates of different Ids together without really dividing the table: 桔子 (Id 1, 2011-7-2) is returned only because 2011-7-2 happens to be the maximum Date for Id 2.
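The flaw is easy to reproduce. The sketch below runs the same query through Python's sqlite3 module, with an in-memory SQLite database standing in for MySQL; that substitution, the text column types and the zero-padded ISO date strings are assumptions of the sketch, not part of the original example.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table fruitTable(Id int, Name text, Class text, Count int, Date text)"
)
# Dates are stored as zero-padded ISO strings so that MAX() orders them correctly.
conn.executemany(
    "insert into fruitTable values (?, ?, ?, ?, ?)",
    [
        (1, "苹果", "水果", 10, "2011-07-01"),
        (1, "桔子", "水果", 20, "2011-07-02"),
        (1, "香蕉", "水果", 15, "2011-07-03"),
        (2, "白菜", "蔬菜", 12, "2011-07-01"),
        (2, "青菜", "蔬菜", 19, "2011-07-02"),
    ],
)

# The subquery yields one maximum Date per Id, but IN then forgets which Id
# each date belonged to.
max_dates = conn.execute(
    "SELECT MAX(Date) FROM fruitTable GROUP BY Id ORDER BY 1"
).fetchall()

wrong_rows = conn.execute(
    """
    SELECT DISTINCT Id, Name, Class, Count, Date FROM fruitTable t1
    WHERE Date IN (SELECT MAX(Date) FROM fruitTable t2 GROUP BY Id)
    ORDER BY Id, Date
    """
).fetchall()

print(max_dates)
for row in wrong_rows:
    print(row)
```

Three rows come back instead of two, because a row of Id 1 is allowed to match the maximum date that belongs to Id 2.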

Let's take a look at the correct solution

The idea of dividing the table is correct; the problem is how to divide it. Creating two new tables is obviously too much trouble, so we write it the following way.

SELECT DISTINCT Id, Name, Class, Count, Date FROM fruitTable t1
	WHERE (Date = 
           (SELECT MAX(Date) FROM fruitTable t2 WHERE t2.Id=t1.Id));

Note the WHERE t2.Id=t1.Id clause: table t2 is subtly divided according to the criterion t2.Id = t1.Id. We can trace it by traversing table t1. For the first tuple, 1 苹果 水果 10 2011-7-1, we know that t1.Id = 1; substituting this into the second select gives (SELECT MAX(Date) FROM fruitTable t2 WHERE t2.Id=1). The filter condition WHERE t2.Id = 1 limits this select statement to tuples with Id 1, so the aggregate function MAX(Date) returns the largest Date among all tuples with Id 1 (2011-7-3).

Therefore, for table t1, when t1.Id = 1, only the tuple with Date = 2011-7-3 is selected. When t1.Id = 2, the second select becomes SELECT MAX(Date) FROM fruitTable t2 WHERE t2.Id=2, which returns the maximum Date among all tuples with Id = 2 (2011-7-2).

It can be seen that table t2 is controlled by t1.Id and is divided into different sub-tables according to the value of t1.Id. This is the table division, and no new table needs to be created.
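The walkthrough can be replayed step by step. In this sketch the inner select is first evaluated by hand for each Id, then the full correlated query is run. As before, SQLite through Python's sqlite3 module stands in for MySQL, and the ISO date strings are an assumption of the sketch.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table fruitTable(Id int, Name text, Class text, Count int, Date text)"
)
conn.executemany(
    "insert into fruitTable values (?, ?, ?, ?, ?)",
    [
        (1, "苹果", "水果", 10, "2011-07-01"),
        (1, "桔子", "水果", 20, "2011-07-02"),
        (1, "香蕉", "水果", 15, "2011-07-03"),
        (2, "白菜", "蔬菜", 12, "2011-07-01"),
        (2, "青菜", "蔬菜", 19, "2011-07-02"),
    ],
)

# The inner select, evaluated once per value of t1.Id: each run only sees
# the "sub-table" of rows with that Id.
per_id_max = {
    fruit_id: conn.execute(
        "SELECT MAX(Date) FROM fruitTable t2 WHERE t2.Id = ?", (fruit_id,)
    ).fetchone()[0]
    for fruit_id in (1, 2)
}

# The full correlated query from the text.
latest_rows = conn.execute(
    """
    SELECT DISTINCT Id, Name, Class, Count, Date FROM fruitTable t1
    WHERE Date = (SELECT MAX(Date) FROM fruitTable t2 WHERE t2.Id = t1.Id)
    ORDER BY Id
    """
).fetchall()

print(per_id_max)
for row in latest_rows:
    print(row)
```

Each Id now gets its own maximum, and exactly one row per Id survives.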


exists

First, the basic usage of exists:

create table R(
	X int, Y varchar(5), Z varchar(5)
);

create table S(
	Y varchar(5), Z varchar(5), Q int
);


insert into R(X,Y,Z) values(
	1,'a','A'
),(
	1,'b','B'
),(
	1,'a','B'
),(
	1,'c','C'
),(
	2,'a','B'
),(
	2,'b','B'
),(
	2,'c','A'
),(
	3,'z','Z'
);


insert into S(Y,Z,Q) values(
	'b','B',1
),(
	'a','B',2
);

-----------------------------

select * from R where exists( select * from S where S.Y='b' and R.Y=S.Y );
# result
'1', 'b', 'B'
'2', 'b', 'B'

exists can be understood simply as an if test.
A statement such as select * from R where exists( select * from S where S.Y='b' and R.Y=S.Y ); can be read as: select from table R the tuples for which the condition S.Y = 'b' and R.Y = S.Y holds for at least one tuple of S.

Two properties can be seen from this example:

  • First, the select list inside the parentheses of exists () does not affect what is returned. In the example above, the result always consists of tuples of table R and never includes any column of table S.
  • For an exists () statement, the key is the where clause inside the parentheses. A statement such as exists (select * from S where S.Y = 'b' and R.Y = S.Y) can be regarded directly as if (S.Y == 'b' and R.Y == S.Y). That is not to say the select is irrelevant. Writing exists (select 1 from S where S.Y = 'b' and R.Y = S.Y) behaves exactly like the select * version, but exists (select count(*) from S where S.Y = 'b' and R.Y = S.Y) is a condition that is always true, because an aggregate without group by always returns one row.

With these two points clear, it is easier to see that exists behaves very much like a conditional test.
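Here is a small sketch for checking how much the select list inside exists () actually matters. It loads R and S into an in-memory SQLite database through Python's sqlite3 module (a stand-in for MySQL, which is an assumption of the sketch) and runs the same outer query with three different select lists. The helper rows_where_exists is made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("create table R(X int, Y varchar(5), Z varchar(5))")
conn.execute("create table S(Y varchar(5), Z varchar(5), Q int)")
conn.executemany(
    "insert into R(X,Y,Z) values (?, ?, ?)",
    [
        (1, "a", "A"), (1, "b", "B"), (1, "a", "B"), (1, "c", "C"),
        (2, "a", "B"), (2, "b", "B"), (2, "c", "A"), (3, "z", "Z"),
    ],
)
conn.executemany(
    "insert into S(Y,Z,Q) values (?, ?, ?)",
    [("b", "B", 1), ("a", "B", 2)],
)


def rows_where_exists(select_list):
    """Run the example query with the given select list inside exists()."""
    sql = (
        "select * from R where exists("
        f" select {select_list} from S where S.Y='b' and R.Y=S.Y )"
        " order by X"
    )
    return conn.execute(sql).fetchall()


star = rows_where_exists("*")            # the query from the text
one = rows_where_exists("1")             # same where clause, different select list
counted = rows_where_exists("count(*)")  # aggregate: the subquery always yields a row

print(star)
print(one)
print(len(counted))
```

The first two variants return the same two tuples of R. The count(*) variant returns every row of R, since an aggregate subquery produces a row even when nothing matches.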

The following example is similar:

# students who took a course taught by teacher 张三
select distinct sc.sid from sc 
	where exists (
		select * from course c,teacher t 
			where sc.cid = c.cid and c.tid = t.tid and t.tname = "张三");

But exists alone is not that impressive, because many other statements can achieve the same thing; the really powerful one is not exists.

Finding the maximum value

SELECT DISTINCT Id, Name, Class, Count, Date FROM fruitTable t1
	WHERE (Date = 
           (SELECT MAX(Date) FROM fruitTable t2 WHERE t2.Id=t1.Id));
# using not exists
SELECT DISTINCT Id, Name, Class, Count, Date FROM fruitTable t1
	WHERE NOT EXISTS(
           SELECT * FROM fruitTable t2 WHERE t2.Id=t1.Id and t2.Date > t1.Date );

Here not exists can likewise be read as "if not"; the key is to understand which part of the condition is being negated. Following the reasoning above, the condition here is t2.Id=t1.Id and t2.Date > t1.Date. The part t2.Id = t1.Id is not what gets negated: it can always be satisfied (t1 and t2 have the same content), and it only limits the scope of table t2 (the sub-table division described earlier). The negated part is t2.Date > t1.Date: no tuple in t2 whose Id equals t1.Id has a Date greater than t1.Date, which means t1.Date is the maximum.
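To check that the two formulations really agree, the following sketch runs both against the fruitTable data. SQLite through Python's sqlite3 module stands in for MySQL, and dates are stored as zero-padded ISO strings so that the > comparison works on text; both choices are assumptions of the sketch.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table fruitTable(Id int, Name text, Class text, Count int, Date text)"
)
conn.executemany(
    "insert into fruitTable values (?, ?, ?, ?, ?)",
    [
        (1, "苹果", "水果", 10, "2011-07-01"),
        (1, "桔子", "水果", 20, "2011-07-02"),
        (1, "香蕉", "水果", 15, "2011-07-03"),
        (2, "白菜", "蔬菜", 12, "2011-07-01"),
        (2, "青菜", "蔬菜", 19, "2011-07-02"),
    ],
)

# Version 1: compare against the maximum Date of the sub-table.
with_max = conn.execute(
    """
    SELECT DISTINCT Id, Name, Class, Count, Date FROM fruitTable t1
    WHERE Date = (SELECT MAX(Date) FROM fruitTable t2 WHERE t2.Id = t1.Id)
    ORDER BY Id
    """
).fetchall()

# Version 2: keep a row only if no row of the same Id has a later Date.
with_not_exists = conn.execute(
    """
    SELECT DISTINCT Id, Name, Class, Count, Date FROM fruitTable t1
    WHERE NOT EXISTS (
        SELECT * FROM fruitTable t2 WHERE t2.Id = t1.Id AND t2.Date > t1.Date)
    ORDER BY Id
    """
).fetchall()

print(with_max == with_not_exists)
for row in with_not_exists:
    print(row)
```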

Nested not exists

There is a more complicated case: several nested layers of not exists, for example to implement the division operation of relational algebra.

# tables R and S are defined above
select distinct R1.X from R R1 where not exists ( 
	select * from S where not exists (
		select * from R R2 where R1.X=R2.X and R2.Y=S.Y and R2.Z=S.Z ));

There are 3 selects and 2 not exists.
The innermost not exists negates R2.Y=S.Y and R2.Z=S.Z (R1.X = R2.X is not what gets negated; it is what divides R into sub-tables by X), so it is true for a tuple of S that has no match among the tuples of R with that X. The outermost not exists then says that no such unmatched tuple of S exists. Put together, the statement selects the X values whose sub-table contains every tuple of S, which is exactly the definition of division in relational algebra.
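A quick way to convince yourself is to run the nested query and compare it with a direct set-based check. The sketch below does both, again with SQLite through Python's sqlite3 module standing in for MySQL (an assumption of the sketch). The variable names quotient and check are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("create table R(X int, Y varchar(5), Z varchar(5))")
conn.execute("create table S(Y varchar(5), Z varchar(5), Q int)")
conn.executemany(
    "insert into R(X,Y,Z) values (?, ?, ?)",
    [
        (1, "a", "A"), (1, "b", "B"), (1, "a", "B"), (1, "c", "C"),
        (2, "a", "B"), (2, "b", "B"), (2, "c", "A"), (3, "z", "Z"),
    ],
)
conn.executemany(
    "insert into S(Y,Z,Q) values (?, ?, ?)",
    [("b", "B", 1), ("a", "B", 2)],
)

# Division with two nested not exists, as in the text.
quotient = [
    x
    for (x,) in conn.execute(
        """
        select distinct R1.X from R R1 where not exists (
            select * from S where not exists (
                select * from R R2
                where R1.X = R2.X and R2.Y = S.Y and R2.Z = S.Z))
        order by R1.X
        """
    ).fetchall()
]

# Cross-check in plain Python: keep X if its (Y, Z) pairs include all of S.
s_pairs = set(conn.execute("select Y, Z from S").fetchall())
xs = [x for (x,) in conn.execute("select distinct X from R").fetchall()]
check = sorted(
    x
    for x in xs
    if s_pairs
    <= set(conn.execute("select Y, Z from R where X = ?", (x,)).fetchall())
)

print(quotient, check)
```

X = 1 and X = 2 both contain the pairs (b,B) and (a,B), while X = 3 only has (z,Z), so only the first two survive.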

Origin www.cnblogs.com/friedCoder/p/12678145.html