MySQL: A Guide to Avoiding LEFT JOIN Pitfalls

Symptoms

LEFT JOIN is very common in everyday MySQL queries: how many comments a blog post has, how many reviews a product in a store has, how many likes a comment has, and so on. However, being unfamiliar with keywords such as JOIN, ON, and WHERE can sometimes produce results that differ from what we expect, so today I will summarize the pitfalls so we can avoid them together.

Let me set the scene and pose two questions. If you can answer them, you don't need to read this article.

Suppose you have a class management application with a table classes that stores all classes and a table students that stores all students. The data is as follows (thanks to Liao Xuefeng's online SQL):

SELECT * FROM classes;
[Image: contents of the classes table]
SELECT * FROM students;
[Image: contents of the students table]
So now there are two requirements:

1. Find the name of each class and its number of female students.
2. Find the total number of students in 一班 (Class 1).

For Requirement 1, most people can come up with the following two ways of writing the SQL without hesitation. Which one is correct?

Correct ✔
    SELECT c.name, COUNT(s.name) AS num
    FROM classes c LEFT JOIN students s
    ON s.class_id = c.id
    AND s.gender = 'F'
    GROUP BY c.name

or

Wrong ❌
    SELECT c.name, COUNT(s.name) AS num
    FROM classes c LEFT JOIN students s
    ON s.class_id = c.id
    WHERE s.gender = 'F'
    GROUP BY c.name

For Requirement 2, most people can likewise come up with the following two ways of writing the SQL without thinking. Which one is correct?

Correct ✔
    SELECT c.name, COUNT(s.name) AS num
    FROM classes c LEFT JOIN students s
    ON s.class_id = c.id
    WHERE c.name = '一班'
    GROUP BY c.name

or

Wrong ❌
    SELECT c.name, COUNT(s.name) AS num
    FROM classes c LEFT JOIN students s
    ON s.class_id = c.id
    AND c.name = '一班'
    GROUP BY c.name

Root cause

MySQL processes a LEFT JOIN in a way similar to nested loops. Take the following statement as an example:

SELECT * FROM LT LEFT JOIN RT ON P1(LT,RT) WHERE P2(LT,RT)

Here P1 is the ON filter condition, which is treated as TRUE when omitted, and P2 is the WHERE filter condition, which is likewise treated as TRUE when omitted. The execution logic of the statement can be described as:

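FOR each row lt in LT { // iterate over every row of the left table
  BOOL b = FALSE;
  FOR each row rt in RT such that P1(lt, rt) { // iterate over the right table, finding rows that satisfy the join condition
    IF P2(lt, rt) { // the WHERE filter condition is satisfied
      t := lt||rt; // merge the rows and output the result
    }
    b = TRUE; // lt has a matching row in RT
  }
  IF (!b) { // RT has been traversed and lt has no matching row in it, so try padding with NULL
    IF P2(lt, NULL) { // the NULL-padded row satisfies the WHERE filter condition
      t := lt||NULL; // output lt padded with NULL
    }
  }
}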

Of course, in practice MySQL uses a buffer to optimize this and reduce the number of row comparisons, but that does not change the key flow and is outside the scope of this article.
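The pseudo-code above can be sketched as runnable Python. This is only an illustration of the logic, not MySQL's actual implementation: the function name left_join, the tuple-based rows, and the sample data are all assumptions made for the example, with p1 standing in for the ON condition and p2 for the WHERE condition.

```python
def left_join(lt, rt, p1, p2):
    """Nested-loop LEFT JOIN sketch: p1 plays the role of ON, p2 of WHERE."""
    for l in lt:                      # every row of the left table
        matched = False               # the flag b in the pseudo-code
        for r in rt:                  # every row of the right table
            if not p1(l, r):          # row does not satisfy the ON condition
                continue
            if p2(l, r):              # row satisfies the WHERE condition
                yield (l, r)
            matched = True            # l has a matching row in rt
        if not matched and p2(l, None):
            yield (l, None)           # pad the right side with NULL (None)


# Hypothetical sample data: classes are (id, name), students are (class_id, gender).
classes = [(1, "A"), (2, "B")]
students = [(1, "F"), (1, "M")]

# Right-table filter placed in ON: class B survives as a NULL-padded row.
on_only = list(left_join(
    classes, students,
    p1=lambda c, s: s[0] == c[0] and s[1] == "F",
    p2=lambda c, s: True,
))

# Same filter placed in WHERE: the NULL-padded row for class B fails p2.
where_filter = list(left_join(
    classes, students,
    p1=lambda c, s: s[0] == c[0],
    p2=lambda c, s: s is not None and s[1] == "F",
))
```

With this data, on_only keeps a row for class B with None on the right side, while where_filter contains only the class A row.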

From this pseudo-code, we can see two things:

  • If you want to restrict the right table, the restriction must go in the ON condition. Putting it in WHERE can lose data: left-table rows that have no matching right-table row disappear from the final result, which contradicts what we expect from a left join. For a left-table row with no matching right-table row, b is still FALSE after the right table has been traversed, so MySQL tries to pad the right side with NULL. But P2 now restricts the right-table columns, and if the NULL-padded row does not satisfy P2 (NULL generally fails a restriction, unless the restriction is IS NULL), the row is not added to the final result, so results go missing.

  • If there is no WHERE condition, then no matter how the ON condition restricts the left table, every left-table row produces at least one row in the result. For a given left-table row, if no right-table row matches, b is FALSE after the right table has been traversed and a NULL-padded row is generated, and that row is redundant. So filters on the left table must go in WHERE.
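The remark that NULL generally fails a restriction unless it is IS NULL can be checked directly. The sketch below uses Python's built-in sqlite3 module rather than MySQL, which is an assumption made so the example is self-contained. Comparison with NULL behaves the same way in both for this case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# NULL = 'F' evaluates to NULL (unknown), which a WHERE clause treats as
# "not satisfied"; NULL IS NULL evaluates to true (1 in SQLite).
eq_result, is_null_result = conn.execute(
    "SELECT NULL = 'F', NULL IS NULL"
).fetchone()

# A single NULL-padded "row": it is filtered out by = 'F' but kept by IS NULL.
kept_by_eq = conn.execute(
    "SELECT COUNT(*) FROM (SELECT NULL AS gender) WHERE gender = 'F'"
).fetchone()[0]
kept_by_is_null = conn.execute(
    "SELECT COUNT(*) FROM (SELECT NULL AS gender) WHERE gender IS NULL"
).fetchone()[0]
```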

Below are the results of the two wrong statements and the reasons they are wrong:

Requirement 1
[Image: result of the wrong query for Requirement 1]
Requirement 2
[Image: result of the wrong query for Requirement 2]

Requirement 1 restricts the right table in the WHERE condition, so data goes missing (四班, Class 4, should appear with a result of 0).

Requirement 2 restricts the left table in the ON condition, so redundant data appears (the other classes also show up in the result, which is wrong).
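To reproduce both symptoms, here is a minimal sketch that runs the four queries shown earlier. It makes two assumptions: it uses Python's built-in sqlite3 module instead of MySQL (the ON versus WHERE behaviour of LEFT JOIN is the same here), and the rows are hypothetical sample data, not the original tables, chosen so that 四班 has no students.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE classes (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE students (id INTEGER PRIMARY KEY, class_id INTEGER,
                           name TEXT, gender TEXT);
    -- Hypothetical data: classes 1-3 each have a female student, class 4 is empty.
    INSERT INTO classes VALUES (1, '一班'), (2, '二班'), (3, '三班'), (4, '四班');
    INSERT INTO students VALUES
        (1, 1, 'Ann', 'F'), (2, 1, 'Bob', 'M'),
        (3, 2, 'Cat', 'F'), (4, 3, 'Dee', 'F'), (5, 3, 'Eli', 'M');
""")

BASE = ("SELECT c.name, COUNT(s.name) AS num "
        "FROM classes c LEFT JOIN students s ON s.class_id = c.id ")


def run(tail):
    """Run BASE + tail and return the result as {class name: count}."""
    return dict(conn.execute(BASE + tail).fetchall())


# Requirement 1: right-table filter in ON (correct) vs. in WHERE (wrong).
req1_on = run("AND s.gender = 'F' GROUP BY c.name")
req1_where = run("WHERE s.gender = 'F' GROUP BY c.name")      # 四班 is missing

# Requirement 2: left-table filter in WHERE (correct) vs. in ON (wrong).
req2_where = run("WHERE c.name = '一班' GROUP BY c.name")
req2_on = run("AND c.name = '一班' GROUP BY c.name")            # extra classes with 0
```

With this sample data, req1_where has no entry for 四班, and req2_on contains the other three classes with a count of 0 in addition to 一班.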

Summary

From the symptoms and analysis above, we can conclude: in a LEFT JOIN statement, filters on the left table must go in the WHERE condition and filters on the right table must go in the ON condition. That way the result contains neither too many rows nor too few.

SQL looks simple, but there are many details and principles behind it, and a little carelessness leads to unexpected results. Always paying attention to these details and principles is the key to avoiding mistakes when it matters.

Origin blog.csdn.net/it147/article/details/104525928