Detailed explanation of multi-table join queries in SQL

Multi-table join query

Lecturer: Song Hongkang of Atguigu (Shang Silicon Valley), nicknamed "Master Kong"
Official website: http://www.atguigu.com

A multi-table query, also called a join query, is one in which two or more tables are queried together.
Prerequisite: the tables being queried together must be related (one-to-one or one-to-many) and must share a related column. That column may or may not be declared as a foreign key. For example, the employee table and the department table are related through the department number.

1. A case that motivates multi-table joins

1.1 Case description

Get data from multiple tables:

# Example: query each employee's last name and department name
SELECT last_name, department_name
FROM employees, departments;

Query result:

+-----------+----------------------+
| last_name | department_name      |
+-----------+----------------------+
| King      | Administration       |
| King      | Marketing            |
| King      | Purchasing           |
| King      | Human Resources      |
| King      | Shipping             |
| King      | IT                   |
| King      | Public Relations     |
| King      | Sales                |
| King      | Executive            |
| King      | Finance              |
| King      | Accounting           |
| King      | Treasury             |
...
| Gietz     | IT Support           |
| Gietz     | NOC                  |
| Gietz     | IT Helpdesk          |
| Gietz     | Government Sales     |
| Gietz     | Retail Sales         |
| Gietz     | Recruiting           |
| Gietz     | Payroll              |
+-----------+----------------------+
2889 rows in set (0.01 sec)

Analyzing the error:

SELECT COUNT(employee_id) FROM employees;
# returns 107

SELECT COUNT(department_id) FROM departments;
# returns 27

SELECT 107*27 FROM dual;
# returns 2889, the number of rows in the result above

The problem in the multi-table query above is known as a Cartesian product error.

1.2 Understanding the Cartesian product (or cross join)

The Cartesian product is a mathematical operation. Given two sets X and Y, the Cartesian product of X and Y is the set of all possible pairs whose first element comes from X and whose second element comes from Y. The number of pairs is the product of the sizes of the two sets.

In SQL92, the Cartesian product is also called a cross join (CROSS JOIN). SQL99 also uses CROSS JOIN for cross joins. It can join any tables, even tables that are unrelated. In MySQL, each of the following statements produces a Cartesian product:

# Query each employee's last name and department name
SELECT last_name,department_name FROM employees,departments;
SELECT last_name,department_name FROM employees CROSS JOIN departments;
SELECT last_name,department_name FROM employees INNER JOIN departments;
SELECT last_name,department_name FROM employees JOIN departments;

1.3 Case analysis and problem solving

  • A Cartesian product arises when:
    • The join condition (or association condition) between the tables is omitted
    • The join condition (or association condition) is invalid
    • As a result, every row of each table is joined to every row of the other tables
  • To avoid a Cartesian product, add a valid join condition in the WHERE clause.
  • With the join condition added, the query syntax is:
SELECT	table1.column, table2.column
FROM	table1, table2
WHERE	table1.column1 = table2.column2;  # join condition
  • Write the join condition in the WHERE clause.
  • Correct version:
# Example: query each employee's last name and department name
SELECT last_name, department_name
FROM employees, departments
WHERE employees.department_id = departments.department_id;
  • When the same column name appears in more than one table, prefix the column name with the table name.
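
The effect of the join condition can be checked on a tiny data set. The following is a minimal sketch that uses Python's built-in sqlite3 module with made-up rows (three employees, two departments) rather than the course's MySQL schema; only the table and column names come from the queries above.

```python
import sqlite3

# Toy data (an assumption for illustration): 3 employees, 2 departments.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (employee_id INTEGER, last_name TEXT, department_id INTEGER);
CREATE TABLE departments (department_id INTEGER, department_name TEXT);
INSERT INTO employees VALUES (100, 'King', 90), (101, 'Kochhar', 90), (103, 'Hunold', 60);
INSERT INTO departments VALUES (60, 'IT'), (90, 'Executive');
""")

# No join condition: every employee row is paired with every department row.
cross_rows = conn.execute(
    "SELECT last_name, department_name FROM employees CROSS JOIN departments"
).fetchall()

# With a join condition: each employee is paired only with its own department.
joined_rows = conn.execute(
    "SELECT last_name, department_name FROM employees, departments "
    "WHERE employees.department_id = departments.department_id "
    "ORDER BY employees.employee_id"
).fetchall()

print(len(cross_rows))  # 3 employees * 2 departments
print(joined_rows)
```

On this data the Cartesian product has 3 * 2 = 6 rows, while the query with the join condition returns one row per employee.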

2. Classification of multi-table queries

Category 1: Equijoin vs. non-equijoin

Equijoin


SELECT employees.employee_id, employees.last_name, employees.department_id, departments.department_id, departments.location_id
FROM   employees, departments
WHERE  employees.department_id = departments.department_id;

Extension 1: Multiple join conditions and the AND operator
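
As a hedged illustration of this heading, the sketch below combines the join condition with one additional condition using AND. It runs on Python's built-in sqlite3 module with made-up rows; the IN (20, 60) filter is an assumption chosen for the example, not taken from the course.

```python
import sqlite3

# Toy data (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (employee_id INTEGER, last_name TEXT, department_id INTEGER);
CREATE TABLE departments (department_id INTEGER, department_name TEXT);
INSERT INTO employees VALUES (100, 'King', 90), (103, 'Hunold', 60), (202, 'Fay', 20);
INSERT INTO departments VALUES (20, 'Marketing'), (60, 'IT'), (90, 'Executive');
""")

and_rows = conn.execute(
    "SELECT e.last_name, d.department_name "
    "FROM employees e, departments d "
    "WHERE e.department_id = d.department_id "  # join condition
    "AND d.department_id IN (20, 60) "          # additional condition
    "ORDER BY e.employee_id"
).fetchall()

print(and_rows)
```

King is matched by the join condition but dropped by the second condition, so only Hunold and Fay remain.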
Extension 2: Distinguish duplicate column names

  • When the same column name exists in more than one table, the column name must be prefixed with the table name.
  • The table-name prefix tells columns with the same name in different tables apart.
SELECT employees.last_name, departments.department_name,employees.department_id
FROM employees, departments
WHERE employees.department_id = departments.department_id;

Extension 3: Table alias

  • Table aliases can simplify queries.
  • Prefixing column names with the table name or alias tells the server exactly which table each column comes from, which can improve query efficiency.
SELECT e.employee_id, e.last_name, e.department_id, d.department_id, d.location_id
FROM   employees e , departments d
WHERE  e.department_id = d.department_id;

Note that once a table has been given an alias, only the alias can be used in the select list and filter conditions of that query; using the original table name there raises an error.
Alibaba development specifications:
[Mandatory] For queries and changes to table records, whenever multiple tables are involved, every column name must be qualified with the table alias (or table name).
Note: When querying, updating, or deleting records across multiple tables, if a column is not qualified with a table alias (or table name) and that column exists in more than one table, an exception is thrown.
Positive example: select t1.name from table_first as t1, table_second as t2 where t1.id=t2.id;
Counterexample: In one business system, a multi-table join query did not qualify its columns with a table alias (or table name). After it had run normally for two years, a column with the same name was added to one of the tables. Once the database change was made in the pre-release environment, the online query began failing with exception 1052: Column 'name' in field list is ambiguous.
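
The counterexample can be reproduced in miniature. This sketch uses Python's built-in sqlite3 module, so the error is SQLite's wording rather than MySQL's error 1052; the table names follow the positive example and the rows are made up.

```python
import sqlite3

# Two tables that both have a column called "name" (toy data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_first (id INTEGER, name TEXT);
CREATE TABLE table_second (id INTEGER, name TEXT);
INSERT INTO table_first VALUES (1, 'alpha');
INSERT INTO table_second VALUES (1, 'beta');
""")

# Unqualified column: the database cannot tell which table "name" refers to.
try:
    conn.execute(
        "SELECT name FROM table_first t1, table_second t2 WHERE t1.id = t2.id"
    )
    error_message = None
except sqlite3.OperationalError as exc:
    error_message = str(exc)

# Qualified column: unambiguous, as the specification requires.
qualified_rows = conn.execute(
    "SELECT t1.name FROM table_first AS t1, table_second AS t2 WHERE t1.id = t2.id"
).fetchall()

print(error_message)
print(qualified_rows)
```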
Extension 4: Connect multiple tables
Summary: To join n tables, at least n-1 join conditions are required. For example, joining three tables requires at least two join conditions.
Exercise: Query the last_name, department_name, and city of the company's employees.
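
One possible answer to the exercise, sketched with Python's built-in sqlite3 module. The locations table and its location_id and city columns are assumed to look as they do in the JOIN…ON examples of this article, and the rows are made up.

```python
import sqlite3

# Toy data (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (employee_id INTEGER, last_name TEXT, department_id INTEGER);
CREATE TABLE departments (department_id INTEGER, department_name TEXT, location_id INTEGER);
CREATE TABLE locations (location_id INTEGER, city TEXT);
INSERT INTO employees VALUES (100, 'King', 90), (103, 'Hunold', 60);
INSERT INTO departments VALUES (60, 'IT', 1400), (90, 'Executive', 1700);
INSERT INTO locations VALUES (1400, 'Southlake'), (1700, 'Seattle');
""")

# Three tables, so two join conditions:
# employees-departments and departments-locations.
three_table_rows = conn.execute(
    "SELECT e.last_name, d.department_name, l.city "
    "FROM employees e, departments d, locations l "
    "WHERE e.department_id = d.department_id "
    "AND d.location_id = l.location_id "
    "ORDER BY e.employee_id"
).fetchall()

print(three_table_rows)
```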

Non-equijoin


SELECT e.last_name, e.salary, j.grade_level
FROM   employees e, job_grades j
WHERE  e.salary BETWEEN j.lowest_sal AND j.highest_sal;
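
A non-equijoin matches rows on a range rather than on equality. The sketch below runs the same query on Python's built-in sqlite3 module; the salaries and grade boundaries are made-up values for illustration.

```python
import sqlite3

# Toy data (an assumption): three employees and non-overlapping salary bands.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (last_name TEXT, salary INTEGER);
CREATE TABLE job_grades (grade_level TEXT, lowest_sal INTEGER, highest_sal INTEGER);
INSERT INTO employees VALUES ('King', 24000), ('Hunold', 9000), ('Lorentz', 4200);
INSERT INTO job_grades VALUES
  ('A', 1000, 2999), ('B', 3000, 5999), ('C', 6000, 9999),
  ('D', 10000, 14999), ('E', 15000, 24999), ('F', 25000, 40000);
""")

# Each salary falls into exactly one band, so each employee gets one grade.
grade_rows = conn.execute(
    "SELECT e.last_name, e.salary, j.grade_level "
    "FROM employees e, job_grades j "
    "WHERE e.salary BETWEEN j.lowest_sal AND j.highest_sal "
    "ORDER BY e.salary DESC"
).fetchall()

print(grade_rows)
```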


Category 2: Self-join vs. non-self-join


  • In a self-join, table1 and table2 are actually the same table; aliases make it appear as two tables that play different roles. The two "tables" can then be combined with inner joins, outer joins, and so on.

Exercise: Query the employees table and return "Xxx works for Xxx"

SELECT CONCAT(worker.last_name ,' works for ', manager.last_name)
FROM   employees worker, employees manager
WHERE  worker.manager_id = manager.employee_id ;
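
A runnable sketch of the self-join, using Python's built-in sqlite3 module and made-up rows. It uses the || operator to concatenate strings because older SQLite versions have no CONCAT function; in MySQL the CONCAT call shown above is the usual form.

```python
import sqlite3

# Toy data (an assumption): King has no manager, the others report to King.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (employee_id INTEGER, last_name TEXT, manager_id INTEGER);
INSERT INTO employees VALUES
  (100, 'King', NULL), (101, 'Kochhar', 100), (102, 'De Haan', 100);
""")

# The same table appears twice under two aliases: worker and manager.
works_for = [
    row[0]
    for row in conn.execute(
        "SELECT worker.last_name || ' works for ' || manager.last_name "
        "FROM employees worker, employees manager "
        "WHERE worker.manager_id = manager.employee_id "
        "ORDER BY worker.employee_id"
    )
]

print(works_for)
```

King does not appear in the result: his manager_id is NULL, so the join condition never matches for that row.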


Category 3: Inner join vs. outer join

In addition to the rows that satisfy the join condition, an outer join also returns the rows from one side that have no match.

  • Inner join: combines rows from two or more tables that share a column. The result set does not contain rows from one table that have no match in the other.
  • Outer join: when two tables are joined, the result contains the rows that satisfy the join condition plus the rows of the left (or right) table that have no match. Such a join is called a left (or right) outer join. For the unmatched rows, the columns that would come from the other table are NULL.
  • In a left outer join, the table on the left of the join is called the primary table and the table on the right the secondary table. In a right outer join, the table on the right is the primary table and the table on the left the secondary table.

3. Implement multi-table query using SQL99 syntax

3.1 Basic syntax

  • Syntax for creating joins with the JOIN…ON clause:
SELECT table1.column, table2.column, table3.column
FROM table1
JOIN table2 ON <join condition between table1 and table2>
JOIN table3 ON <join condition between table2 and table3>

Its nested logic resembles nested FOR loops:

for t1 in table1:
    for t2 in table2:
        if condition1:
            for t3 in table3:
                if condition2:
                    output t1 + t2 + t3
  • Syntax description:
    • Join conditions are specified with the ON clause.
    • This keeps the join condition separate from other filter conditions.
    • The ON clause makes the statement more readable.
    • In MySQL, the keywords JOIN, INNER JOIN, and CROSS JOIN are equivalent, and all of them denote an inner join.
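
The nested-loop reading of JOIN…ON can be made concrete in a few lines of Python. This is only a conceptual sketch over made-up in-memory rows; it illustrates the logical result of the join, not how a database actually executes it.

```python
# Toy rows (an assumption for illustration).
employees = [("King", 90), ("Hunold", 60), ("Grant", None)]            # (last_name, department_id)
departments = [(60, "IT", 1400), (90, "Executive", 1700)]              # (department_id, name, location_id)
locations = [(1400, "Southlake"), (1700, "Seattle")]                   # (location_id, city)

loop_join_rows = []
for last_name, emp_dept_id in employees:
    for dept_id, dept_name, dept_loc_id in departments:
        if emp_dept_id == dept_id:              # first ON condition
            for loc_id, city in locations:
                if dept_loc_id == loc_id:       # second ON condition
                    loop_join_rows.append((last_name, dept_name, city))

print(loop_join_rows)
```

Grant has no department, so the first condition never holds for that row and it is absent from the result, just as in an inner join.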

3.2 Implementation of INNER JOIN

  • Syntax:
SELECT column_list
FROM table_A INNER JOIN table_B
ON join_condition
[WHERE and other clauses];

Exercise 1:

SELECT e.employee_id, e.last_name, e.department_id, 
d.department_id, d.location_id
FROM employees e JOIN departments d
ON (e.department_id = d.department_id);

Exercise 2:

SELECT employee_id, city, department_name
FROM employees e 
JOIN departments d
ON d.department_id = e.department_id 
JOIN locations l
ON d.location_id = l.location_id;


3.3 Implementation of OUTER JOIN

3.3.1 LEFT OUTER JOIN
  • Syntax:
# The result contains every row of A
SELECT column_list
FROM table_A LEFT JOIN table_B
ON join_condition
[WHERE and other clauses];
  • Example:
SELECT e.last_name, e.department_id, d.department_name
FROM   employees e
LEFT OUTER JOIN departments d
ON   (e.department_id = d.department_id) ;


3.3.2 RIGHT OUTER JOIN
  • Syntax:
# The result contains every row of B
SELECT column_list
FROM table_A RIGHT JOIN table_B
ON join_condition
[WHERE and other clauses];
  • Example:
SELECT e.last_name, e.department_id, d.department_name
FROM   employees e
RIGHT OUTER JOIN departments d
ON    (e.department_id = d.department_id) ;

Note that in MySQL, outer joins can only be written with the SQL99 keywords LEFT JOIN and RIGHT JOIN. The comma-style SQL92 syntax shown earlier has no outer-join form in MySQL: the (+) marker used for that purpose is Oracle-specific, and MySQL does not support it.

3.3.3 FULL OUTER JOIN
  • The result of a full outer join = the matching rows of the left and right tables + the unmatched rows of the left table + the unmatched rows of the right table.
  • SQL99 supports full outer joins through FULL JOIN or FULL OUTER JOIN.
  • Note that MySQL does not support FULL JOIN; combine a LEFT JOIN and a RIGHT JOIN with UNION instead.
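
The UNION workaround can be sketched on toy data with Python's built-in sqlite3 module. Older SQLite versions have no RIGHT JOIN, so the second branch is written as a LEFT JOIN with the tables swapped, which returns the same rows; in MySQL that branch would read employees e RIGHT JOIN departments d. The rows are made up.

```python
import sqlite3

# Toy data (an assumption): Grant has no department,
# and the Contracting department has no employees.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (employee_id INTEGER, last_name TEXT, department_id INTEGER);
CREATE TABLE departments (department_id INTEGER, department_name TEXT);
INSERT INTO employees VALUES (100, 'King', 90), (178, 'Grant', NULL);
INSERT INTO departments VALUES (90, 'Executive'), (190, 'Contracting');
""")

full_join_rows = set(conn.execute(
    # All employees, with NULL for a missing department.
    "SELECT e.last_name, d.department_name "
    "FROM employees e LEFT JOIN departments d "
    "ON e.department_id = d.department_id "
    "UNION "
    # All departments, with NULL for missing employees
    # (stands in for the RIGHT JOIN branch).
    "SELECT e.last_name, d.department_name "
    "FROM departments d LEFT JOIN employees e "
    "ON e.department_id = d.department_id"
).fetchall())

print(full_join_rows)
```

UNION removes the matched row that both branches return, leaving one matched row plus one unmatched row from each side.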

4. Use of UNION

Merging query results: with the UNION keyword you can write several SELECT statements and combine their results into a single result set. The SELECT statements being merged must return the same number of columns, and the data types of corresponding columns must match. The SELECT statements are separated by the UNION or UNION ALL keyword.
Syntax format:

SELECT column,... FROM table1
UNION [ALL]
SELECT column,... FROM table2

UNION operator
The UNION operator returns the union of the result sets of two queries, removing duplicate records.
UNION ALL operator
The UNION ALL operator returns the union of the result sets of two queries without removing the rows that appear in both.
Note: UNION ALL requires fewer resources than UNION. If you know that the merged result contains no duplicate rows, or the duplicates do not need to be removed, prefer UNION ALL to make the query more efficient.
Example: Query the employees whose department number is greater than 90 or whose email address contains the letter a.
UNION is used here because the two result sets may overlap.

# Approach 1
SELECT * FROM employees WHERE email LIKE '%a%' OR department_id>90;
# Approach 2
SELECT * FROM employees  WHERE email LIKE '%a%'
UNION
SELECT * FROM employees  WHERE department_id>90;
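
The difference between UNION and UNION ALL shows up as soon as one row satisfies both conditions. A minimal sketch with Python's built-in sqlite3 module and made-up rows (emails are lowercase to sidestep case-sensitivity differences between databases):

```python
import sqlite3

# Toy data (an assumption): Baer matches both conditions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (employee_id INTEGER, last_name TEXT, email TEXT, department_id INTEGER);
INSERT INTO employees VALUES
  (1, 'King', 'sking', 90), (2, 'Whalen', 'jwhalen', 10),
  (3, 'Baer', 'hbaer', 100), (4, 'Gietz', 'wgietz', 110);
""")

branches = (
    "SELECT last_name FROM employees WHERE email LIKE '%a%' "
    "{op} "
    "SELECT last_name FROM employees WHERE department_id > 90"
)

# UNION removes the duplicate row; UNION ALL keeps it.
union_names = sorted(r[0] for r in conn.execute(branches.format(op="UNION")))
union_all_names = sorted(r[0] for r in conn.execute(branches.format(op="UNION ALL")))

# The single-query version with OR returns each matching employee once.
or_names = sorted(r[0] for r in conn.execute(
    "SELECT last_name FROM employees WHERE email LIKE '%a%' OR department_id > 90"
))

print(union_names, union_all_names, or_names)
```

Here UNION agrees with the OR query, while UNION ALL lists Baer twice.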

Example: Query the male users in China and the middle-aged male users in the United States.
UNION ALL is used because the two result sets cannot overlap, which makes the query more efficient.

SELECT id,cname FROM t_chinamale WHERE csex='男'
UNION ALL
SELECT id,tname FROM t_usmale WHERE tGender='male';

5. Implementation of the 7 SQL JOINs

[Figure: Venn diagrams of the seven SQL JOINs between tables A and B]

5.1 Code implementation

# Middle picture: inner join, A∩B
SELECT employee_id,last_name,department_name
FROM employees e JOIN departments d
ON e.`department_id` = d.`department_id`;
# Top-left picture: left outer join
SELECT employee_id,last_name,department_name
FROM employees e LEFT JOIN departments d
ON e.`department_id` = d.`department_id`;
# Top-right picture: right outer join
SELECT employee_id,last_name,department_name
FROM employees e RIGHT JOIN departments d
ON e.`department_id` = d.`department_id`;
# Middle-left picture: A - A∩B
SELECT employee_id,last_name,department_name
FROM employees e LEFT JOIN departments d
ON e.`department_id` = d.`department_id`
WHERE d.`department_id` IS NULL;
# Middle-right picture: B - A∩B
SELECT employee_id,last_name,department_name
FROM employees e RIGHT JOIN departments d
ON e.`department_id` = d.`department_id`
WHERE e.`department_id` IS NULL;
# Bottom-left picture: full outer join
# middle-left picture + top-right picture, A∪B
SELECT employee_id,last_name,department_name
FROM employees e LEFT JOIN departments d
ON e.`department_id` = d.`department_id`
WHERE d.`department_id` IS NULL
UNION ALL  # the two branches cannot overlap, so no deduplication is needed and UNION ALL is more efficient
SELECT employee_id,last_name,department_name
FROM employees e RIGHT JOIN departments d
ON e.`department_id` = d.`department_id`;
# Bottom-right picture
# middle-left picture + middle-right picture: A∪B - A∩B, or (A - A∩B) ∪ (B - A∩B)
SELECT employee_id,last_name,department_name
FROM employees e LEFT JOIN departments d
ON e.`department_id` = d.`department_id`
WHERE d.`department_id` IS NULL
UNION ALL
SELECT employee_id,last_name,department_name
FROM employees e RIGHT JOIN departments d
ON e.`department_id` = d.`department_id`
WHERE e.`department_id` IS NULL;
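
The two "difference" pictures (A - A∩B and B - A∩B) are outer joins that keep only the unmatched rows. A minimal sketch with Python's built-in sqlite3 module and made-up rows; because older SQLite versions have no RIGHT JOIN, the B - A∩B case is written as a LEFT JOIN with the tables swapped.

```python
import sqlite3

# Toy data (an assumption): Grant has no department,
# and the Contracting department has no employees.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (employee_id INTEGER, last_name TEXT, department_id INTEGER);
CREATE TABLE departments (department_id INTEGER, department_name TEXT);
INSERT INTO employees VALUES (100, 'King', 90), (178, 'Grant', NULL);
INSERT INTO departments VALUES (90, 'Executive'), (190, 'Contracting');
""")

# A - A∩B: employees with no matching department.
employees_without_department = conn.execute(
    "SELECT e.last_name "
    "FROM employees e LEFT JOIN departments d "
    "ON e.department_id = d.department_id "
    "WHERE d.department_id IS NULL"
).fetchall()

# B - A∩B: departments with no matching employee.
departments_without_employees = conn.execute(
    "SELECT d.department_name "
    "FROM departments d LEFT JOIN employees e "
    "ON e.department_id = d.department_id "
    "WHERE e.department_id IS NULL"
).fetchall()

print(employees_without_department, departments_without_employees)
```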

5.2 Summary of syntax format

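  • Middle-left picture
# Implements A - A∩B
select column_list
from table_A left join table_B
on join_condition
where <join column of the secondary table> is null [and other conditions];
  • Middle-right picture
# Implements B - A∩B
select column_list
from table_A right join table_B
on join_condition
where <join column of the secondary table> is null [and other conditions];
  • Bottom-left picture
# The result is A∪B
# all of A from the left outer join, union all of B from the right outer join
select column_list
from table_A left join table_B
on join_condition
[where and other clauses]

union

select column_list
from table_A right join table_B
on join_condition
[where and other clauses];
  • Bottom-right picture
# Implements A∪B - A∩B, or (A - A∩B) ∪ (B - A∩B)
# (A - A∩B) from the left outer join, union (B - A∩B) from the right outer join
select column_list
from table_A left join table_B
on join_condition
where <join column of the secondary table> is null [and other conditions]

union

select column_list
from table_A right join table_B
on join_condition
where <join column of the secondary table> is null [and other conditions];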

Note:
Keep the number of joined tables under control. A multi-table join is comparable to nested for loops: it consumes a lot of resources and can seriously degrade query performance, so do not join tables unnecessarily. Many DBMSs also limit the maximum number of tables in a join.
[Mandatory] Joins of more than three tables are prohibited. The data types of the columns being joined must be exactly the same, and in multi-table join queries the joined columns must be indexed.
Note: Even when joining only two tables, pay attention to table indexes and SQL performance.
Source: Alibaba "Java Development Manual"



Origin blog.csdn.net/hansome_hong/article/details/127473607