Multi-table join query
Lecturer: Song Hongkang, Atguigu (nicknamed "Master Kong")
Official website: http://www.atguigu.com
A multi-table query, also called a join query (or associated query), is a query in which two or more tables are combined to produce the result.
Prerequisite: the tables being queried together must be related (one-to-one or one-to-many), and they must share a related column. That column may or may not be backed by a foreign key. For example, the employee table and the department table are related through the department number.
1. A case that motivates multi-table joins
1.1 Case description
Get data from multiple tables:
# Case: query each employee's name and department name
SELECT last_name, department_name
FROM employees, departments;
Query result:
+-----------+----------------------+
| last_name | department_name      |
+-----------+----------------------+
| King      | Administration       |
| King      | Marketing            |
| King      | Purchasing           |
| King      | Human Resources      |
| King      | Shipping             |
| King      | IT                   |
| King      | Public Relations     |
| King      | Sales                |
| King      | Executive            |
| King      | Finance              |
| King      | Accounting           |
| King      | Treasury             |
...
| Gietz     | IT Support           |
| Gietz     | NOC                  |
| Gietz     | IT Helpdesk          |
| Gietz     | Government Sales     |
| Gietz     | Retail Sales         |
| Gietz     | Recruiting           |
| Gietz     | Payroll              |
+-----------+----------------------+
2889 rows in set (0.01 sec)
Analyzing the error:
SELECT COUNT(employee_id) FROM employees;
# returns 107
SELECT COUNT(department_id) FROM departments;
# returns 27
SELECT 107*27 FROM dual;
# returns 2889, the row count of the query above
The problem in the multi-table query above is called a Cartesian product error.
1.2 Understanding the Cartesian product (cross join)
The Cartesian product is a mathematical operation. Given two sets X and Y, the Cartesian product of X and Y is the set of all ordered pairs whose first element comes from X and whose second element comes from Y. The number of such pairs is the product of the sizes of the two sets.
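In set notation, the definition and the resulting size can be written as follows (a restatement of the paragraph above, not an additional result):

```latex
X \times Y = \{\, (x, y) \mid x \in X,\ y \in Y \,\}, \qquad |X \times Y| = |X| \cdot |Y|
```

For the case in section 1.1, employees has 107 rows and departments has 27, which gives 107 · 27 = 2889 result rows.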
In SQL92, the Cartesian product is also called a cross join (CROSS JOIN), and SQL99 likewise uses CROSS JOIN to express it. A cross join can combine any tables, even tables that are unrelated. In MySQL, each of the following statements produces a Cartesian product:
# Query employee names and department names
SELECT last_name,department_name FROM employees,departments;
SELECT last_name,department_name FROM employees CROSS JOIN departments;
SELECT last_name,department_name FROM employees INNER JOIN departments;
SELECT last_name,department_name FROM employees JOIN departments;
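The row-count behavior is easier to see on a tiny data set. The sketch below assumes two hypothetical tables, demo_emp and demo_dept, which are not part of the course schema and exist only for this illustration:

```sql
# Hypothetical demo tables, not part of the course schema
DROP TABLE IF EXISTS demo_emp, demo_dept;
CREATE TABLE demo_dept (department_id INT PRIMARY KEY, department_name VARCHAR(30));
CREATE TABLE demo_emp (employee_id INT PRIMARY KEY, last_name VARCHAR(25), department_id INT);
INSERT INTO demo_dept VALUES (10, 'Administration'), (20, 'Marketing');
INSERT INTO demo_emp VALUES (100, 'King', 10), (101, 'Kochhar', 20), (102, 'Gietz', NULL);

# No join condition: every employee is paired with every department
SELECT COUNT(*) FROM demo_emp, demo_dept;            # 3 * 2 = 6
SELECT COUNT(*) FROM demo_emp CROSS JOIN demo_dept;  # also 6
```

Both counts equal the product of the two table sizes, regardless of what the department_id columns contain.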
1.3 Case analysis and problem solving
- A Cartesian product error occurs under the following conditions:
  - The join condition (or association condition) between the tables is omitted
  - The join condition (or association condition) is invalid
  - Every row of each table is joined to every row of the other tables
- To avoid a Cartesian product, add a valid join condition in the WHERE clause.
- With the join condition added, the query syntax is:
SELECT table1.column, table2.column
FROM table1, table2
WHERE table1.column1 = table2.column2; # join condition
- The join condition is written in the WHERE clause.
- The correct query:
# Case: query each employee's name and department name
SELECT last_name, department_name
FROM employees, departments
WHERE employees.department_id = departments.department_id;
- When the same column name appears in more than one table, prefix the column name with the table name.
2. Classification of multi-table queries
Category 1: Equijoin vs. non-equijoin
Equijoin
SELECT employees.employee_id, employees.last_name, employees.department_id, departments.department_id, departments.location_id
FROM employees, departments
WHERE employees.department_id = departments.department_id;
Extension 1: Multiple join conditions and the AND operator
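As a minimal sketch of this extension, the query below combines a join condition with an additional filter using AND. The demo_emp and demo_dept tables are hypothetical and created only for the illustration:

```sql
# Hypothetical demo tables, used only for this illustration
DROP TABLE IF EXISTS demo_emp, demo_dept;
CREATE TABLE demo_dept (department_id INT PRIMARY KEY, department_name VARCHAR(30));
CREATE TABLE demo_emp (employee_id INT PRIMARY KEY, last_name VARCHAR(25), department_id INT);
INSERT INTO demo_dept VALUES (10, 'Administration'), (20, 'Marketing');
INSERT INTO demo_emp VALUES (100, 'King', 10), (201, 'Hartstein', 20), (202, 'Fay', 20);

# Join condition plus an extra filter, combined with AND
SELECT e.last_name, d.department_name
FROM demo_emp e, demo_dept d
WHERE e.department_id = d.department_id
AND d.department_name = 'Marketing';
# returns two rows: Hartstein and Fay, each paired with Marketing
```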
Extension 2: Distinguish duplicate column names
- When the same column name exists in more than one table, the column name must be prefixed with the table name.
- Columns that share a name across tables are told apart by that table-name prefix.
SELECT employees.last_name, departments.department_name, employees.department_id
FROM employees, departments
WHERE employees.department_id = departments.department_id;
Extension 3: Table alias
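- Table aliases make queries shorter and easier to read.
- Prefixing column names with the table name (or alias) can improve query efficiency, because the server does not have to work out which table each column belongs to.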
SELECT e.employee_id, e.last_name, e.department_id, d.department_id, d.location_id
FROM employees e, departments d
WHERE e.department_id = d.department_id;
Note that once a table has been given an alias, only the alias may be used to refer to it in the select list and in filter conditions. Using the original table name raises an error.
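The rule can be checked with a small experiment. The sketch below uses hypothetical demo_emp and demo_dept tables; the failing statement is left commented out so the script runs cleanly:

```sql
# Hypothetical demo tables, used only for this illustration
DROP TABLE IF EXISTS demo_emp, demo_dept;
CREATE TABLE demo_dept (department_id INT PRIMARY KEY, department_name VARCHAR(30));
CREATE TABLE demo_emp (employee_id INT PRIMARY KEY, last_name VARCHAR(25), department_id INT);
INSERT INTO demo_dept VALUES (10, 'Administration');
INSERT INTO demo_emp VALUES (100, 'King', 10);

# Works: the aliases e and d are used everywhere
SELECT e.last_name, d.department_name
FROM demo_emp e, demo_dept d
WHERE e.department_id = d.department_id;

# Fails in MySQL with error 1054 (unknown column), because within this
# statement the table is only visible under its alias e:
# SELECT demo_emp.last_name, d.department_name
# FROM demo_emp e, demo_dept d
# WHERE e.department_id = d.department_id;
```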
From the Alibaba development specification:
[Mandatory] For queries and changes to table records, whenever multiple tables are involved, column names must be qualified with the table alias (or table name).
Note: When querying, updating, or deleting records across multiple tables, an exception is thrown if a column is not qualified with a table alias (or table name) and that column exists in more than one table.
Positive example: select t1.name from table_first as t1, table_second as t2 where t1.id=t2.id;
Counterexample: In one business system, a multi-table join statement did not qualify its columns with table aliases (or table names). After it had run normally for two years, a column with the same name was added to one of the tables. Once the database change was made in the pre-release environment, the online query began failing with error 1052: Column 'name' in field list is ambiguous.
Extension 4: Joining multiple tables
Summary: joining n tables requires at least n-1 join conditions. For example, joining three tables requires at least two join conditions.
Exercise: Query the last_name, department_name, and city of the company's employees.
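One possible shape of a solution is sketched below on hypothetical demo tables (demo_emp, demo_dept, demo_loc) whose column names mirror the ones used in this section. Against the course database, the same query would presumably be written over employees, departments, and locations.

```sql
# Hypothetical demo tables, used only for this illustration
DROP TABLE IF EXISTS demo_emp, demo_dept, demo_loc;
CREATE TABLE demo_loc (location_id INT PRIMARY KEY, city VARCHAR(30));
CREATE TABLE demo_dept (department_id INT PRIMARY KEY, department_name VARCHAR(30), location_id INT);
CREATE TABLE demo_emp (employee_id INT PRIMARY KEY, last_name VARCHAR(25), department_id INT);
INSERT INTO demo_loc VALUES (1700, 'Seattle'), (1800, 'Toronto');
INSERT INTO demo_dept VALUES (10, 'Administration', 1700), (20, 'Marketing', 1800);
INSERT INTO demo_emp VALUES (200, 'Whalen', 10), (201, 'Hartstein', 20);

# Three tables, so two join conditions (n - 1 = 2)
SELECT e.last_name, d.department_name, l.city
FROM demo_emp e, demo_dept d, demo_loc l
WHERE e.department_id = d.department_id
AND d.location_id = l.location_id;
# returns two rows: (Whalen, Administration, Seattle) and (Hartstein, Marketing, Toronto)
```

Dropping either condition would leave one table unconstrained and bring back a partial Cartesian product.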
Non-equijoin
SELECT e.last_name, e.salary, j.grade_level
FROM employees e, job_grades j
WHERE e.salary BETWEEN j.lowest_sal AND j.highest_sal;
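To see how a range condition pairs rows, the following sketch runs the same kind of query on hypothetical tables demo_emp and demo_grades. The grade boundaries and salaries are made-up values chosen so that the ranges do not overlap:

```sql
# Hypothetical demo tables and values, used only for this illustration
DROP TABLE IF EXISTS demo_emp, demo_grades;
CREATE TABLE demo_emp (last_name VARCHAR(25), salary INT);
CREATE TABLE demo_grades (grade_level CHAR(1), lowest_sal INT, highest_sal INT);
INSERT INTO demo_grades VALUES ('A', 1000, 2999), ('B', 3000, 5999), ('C', 6000, 9999);
INSERT INTO demo_emp VALUES ('Olson', 2100), ('Lorentz', 4200), ('Ernst', 6000);

# The join condition is a range test rather than an equality test
SELECT e.last_name, e.salary, g.grade_level
FROM demo_emp e, demo_grades g
WHERE e.salary BETWEEN g.lowest_sal AND g.highest_sal;
# Olson -> A, Lorentz -> B, Ernst -> C (BETWEEN includes both bounds)
```

Because the ranges do not overlap, each employee matches exactly one grade. Overlapping ranges would return an employee once per matching grade.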
Category 2: Self-join vs. non-self-join
- In a self-join, table1 and table2 are actually the same table. It is given two aliases so that it can be treated as two tables with different roles, which can then be combined with inner joins, outer joins, and so on.
Exercise: Query the employees table and return rows of the form "Xxx works for Xxx".
SELECT CONCAT(worker.last_name, ' works for ', manager.last_name)
FROM employees worker, employees manager
WHERE worker.manager_id = manager.employee_id;
Category 3: Inner join vs. outer join
In addition to the rows that satisfy the join condition, an outer join can also return rows from one side that do not satisfy it.
- Inner join: combines rows from two or more tables that match on the join column. The result set excludes rows of one table that have no match in the other.
- Outer join: when two tables are joined, the result contains the rows that satisfy the join condition plus the rows of the left (or right) table that do not. Such a join is called a left (or right) outer join. Where no matching row exists, the columns taken from the other table are NULL in the result.
- In a left outer join, the table on the left of the join is called the main table and the table on the right is called the secondary table. In a right outer join the roles are reversed: the right table is the main table and the left table is the secondary table.
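The difference between the two kinds of join shows up as soon as one row has no partner. The sketch below uses hypothetical demo_emp and demo_dept tables and the JOIN ... ON syntax covered in section 3:

```sql
# Hypothetical demo tables, used only for this illustration
DROP TABLE IF EXISTS demo_emp, demo_dept;
CREATE TABLE demo_dept (department_id INT PRIMARY KEY, department_name VARCHAR(30));
CREATE TABLE demo_emp (employee_id INT PRIMARY KEY, last_name VARCHAR(25), department_id INT);
INSERT INTO demo_dept VALUES (10, 'Administration'), (20, 'Marketing');
INSERT INTO demo_emp VALUES (100, 'King', 10), (178, 'Grant', NULL);

# Inner join: only King, because Grant has no matching department
SELECT e.last_name, d.department_name
FROM demo_emp e JOIN demo_dept d
ON e.department_id = d.department_id;

# Left outer join: King and Grant; Grant's department_name is NULL
SELECT e.last_name, d.department_name
FROM demo_emp e LEFT JOIN demo_dept d
ON e.department_id = d.department_id;
```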
3. Multi-table queries with SQL99 syntax
3.1 Basic syntax
- Syntax structure for creating joins using JOIN…ON clause:
SELECT table1.column, table2.column,table3.column
FROM table1
JOIN table2 ON <join condition between table1 and table2>
JOIN table3 ON <join condition between table2 and table3>;
The nesting logic resembles nested for loops:
for t1 in table1:
    for t2 in table2:
        if condition1:
            for t3 in table3:
                if condition2:
                    output t1 + t2 + t3
- Syntax notes:
  - Join conditions are specified in the ON clause.
  - This keeps the join condition separate from other filter conditions.
  - The ON clause makes the statement more readable.
  - In MySQL, the keywords JOIN, INNER JOIN, and CROSS JOIN are interchangeable; combined with an ON condition, each of them performs an inner join.
3.2 Implementation of INNER JOIN
- Syntax:
SELECT column_list
FROM table_A INNER JOIN table_B
ON join_condition
[WHERE and other clauses];
Question 1:
SELECT e.employee_id, e.last_name, e.department_id,
d.department_id, d.location_id
FROM employees e JOIN departments d
ON (e.department_id = d.department_id);
Question 2:
SELECT employee_id, city, department_name
FROM employees e
JOIN departments d
ON d.department_id = e.department_id
JOIN locations l
ON d.location_id = l.location_id;
3.3 Implementation of OUTER JOIN
3.3.1 LEFT OUTER JOIN
- Syntax:
# The result contains every row of A
SELECT column_list
FROM table_A LEFT JOIN table_B
ON join_condition
[WHERE and other clauses];
- Example:
SELECT e.last_name, e.department_id, d.department_name
FROM employees e
LEFT OUTER JOIN departments d
ON (e.department_id = d.department_id);
3.3.2 RIGHT OUTER JOIN
- Syntax:
# The result contains every row of B
SELECT column_list
FROM table_A RIGHT JOIN table_B
ON join_condition
[WHERE and other clauses];
- Example:
SELECT e.last_name, e.department_id, d.department_name
FROM employees e
RIGHT OUTER JOIN departments d
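ON (e.department_id = d.department_id);
Note that the comma-style join syntax used in sections 1 and 2 cannot express an outer join. Oracle provides a proprietary (+) operator for that purpose, but MySQL does not support it, so outer joins in MySQL must be written with LEFT JOIN or RIGHT JOIN.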
3.3.3 FULL OUTER JOIN
- A full outer join returns the matching rows from both tables, plus the unmatched rows from the left table, plus the unmatched rows from the right table.
- SQL99 supports full outer joins, written as FULL JOIN or FULL OUTER JOIN.
- Note that MySQL does not support FULL JOIN. The same result can be obtained by combining a LEFT JOIN and a RIGHT JOIN with UNION.
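A minimal sketch of that workaround, again on hypothetical demo_emp and demo_dept tables, with one employee that has no department and one department that has no employee:

```sql
# Hypothetical demo tables, used only for this illustration
DROP TABLE IF EXISTS demo_emp, demo_dept;
CREATE TABLE demo_dept (department_id INT PRIMARY KEY, department_name VARCHAR(30));
CREATE TABLE demo_emp (employee_id INT PRIMARY KEY, last_name VARCHAR(25), department_id INT);
INSERT INTO demo_dept VALUES (10, 'Administration'), (20, 'Marketing');
INSERT INTO demo_emp VALUES (100, 'King', 10), (178, 'Grant', NULL);

# LEFT JOIN keeps Grant, RIGHT JOIN keeps Marketing, UNION merges both
SELECT e.last_name, d.department_name
FROM demo_emp e LEFT JOIN demo_dept d
ON e.department_id = d.department_id
UNION
SELECT e.last_name, d.department_name
FROM demo_emp e RIGHT JOIN demo_dept d
ON e.department_id = d.department_id;
# three rows: (King, Administration), (Grant, NULL), (NULL, Marketing)
```

The matched row (King, Administration) is produced by both halves and appears once because UNION removes duplicates. The same deduplication would also collapse rows that are legitimately identical, which is one reason the variant in section 5.1 uses UNION ALL over two non-overlapping halves instead.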
4. Using UNION
Merging query results: with the UNION keyword, several SELECT statements can be combined into a single result set. The result sets being merged must have the same number of columns, and corresponding columns must have matching data types. The SELECT statements are separated by the UNION or UNION ALL keyword.
Syntax format:
SELECT column,... FROM table1
UNION [ALL]
SELECT column,... FROM table2
UNION operator
The UNION operator returns the union of the result sets of two queries, removing duplicate rows.
UNION ALL operator
The UNION ALL operator returns the union of the result sets of two queries. Rows that appear in both result sets are not deduplicated.
Note: UNION ALL needs fewer resources than UNION because it skips the deduplication step. If the merged data is known to contain no duplicates, or duplicates do not need to be removed, prefer UNION ALL for better query efficiency.
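The difference between the two operators can be seen with two small overlapping sets. The tables demo_a and demo_b below are hypothetical and exist only for this illustration:

```sql
# Hypothetical demo tables, used only for this illustration
DROP TABLE IF EXISTS demo_a, demo_b;
CREATE TABLE demo_a (id INT);
CREATE TABLE demo_b (id INT);
INSERT INTO demo_a VALUES (1), (2);
INSERT INTO demo_b VALUES (2), (3);

# UNION removes the duplicate 2
SELECT id FROM demo_a
UNION
SELECT id FROM demo_b;      # 3 rows: 1, 2, 3 (in some order)

# UNION ALL keeps every row from both sides
SELECT id FROM demo_a
UNION ALL
SELECT id FROM demo_b;      # 4 rows: 1, 2, 2, 3 (in some order)
```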
Example: Query the employees whose department number is greater than 90 or whose email address contains the letter a.
UNION is used in method 2 because an employee can satisfy both conditions and would otherwise appear twice.
# Method 1
SELECT * FROM employees WHERE email LIKE '%a%' OR department_id>90;
# Method 2
SELECT * FROM employees WHERE email LIKE '%a%'
UNION
SELECT * FROM employees WHERE department_id>90;
Example: Query the information of male users in China and of middle-aged male users in the United States.
UNION ALL is used here because the two tables hold no duplicate data, so skipping deduplication makes the query more efficient.
SELECT id,cname FROM t_chinamale WHERE csex='男'
UNION ALL
SELECT id,tname FROM t_usmale WHERE tGender='male';
5. Implementing the 7 SQL JOINs
5.1 Code implementation
# Middle picture: inner join, A∩B
SELECT employee_id,last_name,department_name
FROM employees e JOIN departments d
ON e.`department_id` = d.`department_id`;
# Upper-left picture: left outer join
SELECT employee_id,last_name,department_name
FROM employees e LEFT JOIN departments d
ON e.`department_id` = d.`department_id`;
# Upper-right picture: right outer join
SELECT employee_id,last_name,department_name
FROM employees e RIGHT JOIN departments d
ON e.`department_id` = d.`department_id`;
# Middle-left picture: A - A∩B
SELECT employee_id,last_name,department_name
FROM employees e LEFT JOIN departments d
ON e.`department_id` = d.`department_id`
WHERE d.`department_id` IS NULL;
# Middle-right picture: B - A∩B
SELECT employee_id,last_name,department_name
FROM employees e RIGHT JOIN departments d
ON e.`department_id` = d.`department_id`
WHERE e.`department_id` IS NULL;
# Lower-left picture: full outer join
# Middle-left picture + upper-right picture: A∪B
SELECT employee_id,last_name,department_name
FROM employees e LEFT JOIN departments d
ON e.`department_id` = d.`department_id`
WHERE d.`department_id` IS NULL
UNION ALL # no deduplication, so it is more efficient
SELECT employee_id,last_name,department_name
FROM employees e RIGHT JOIN departments d
ON e.`department_id` = d.`department_id`;
# Lower-right picture
# Middle-left picture + middle-right picture: A∪B - A∩B, or (A - A∩B) ∪ (B - A∩B)
SELECT employee_id,last_name,department_name
FROM employees e LEFT JOIN departments d
ON e.`department_id` = d.`department_id`
WHERE d.`department_id` IS NULL
UNION ALL
SELECT employee_id,last_name,department_name
FROM employees e RIGHT JOIN departments d
ON e.`department_id` = d.`department_id`
WHERE e.`department_id` IS NULL;
5.2 Summary of syntax format
- Middle-left picture
# Implements A - A∩B
select column_list
from table_A left join table_B
on join_condition
where <join column of table_B> is null [and other conditions];
- Middle-right picture
# Implements B - A∩B
select column_list
from table_A right join table_B
on join_condition
where <join column of table_A> is null [and other conditions];
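- Lower-left picture
# Implements A∪B
# UNION the left outer join (all of A) with the right outer join (all of B)
select column_list
from table_A left join table_B
on join_condition
[where and other clauses]
union
select column_list
from table_A right join table_B
on join_condition
[where and other clauses];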
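- Lower-right picture
# Implements A∪B - A∩B, or (A - A∩B) ∪ (B - A∩B)
# UNION the left outer join's (A - A∩B) with the right outer join's (B - A∩B)
select column_list
from table_A left join table_B
on join_condition
where <join column of table_B> is null [and other conditions]
union
select column_list
from table_A right join table_B
on join_condition
where <join column of table_A> is null [and other conditions];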
Note:
Keep the number of joined tables under control. A multi-table join behaves like nested for loops, so it is resource-intensive and can seriously degrade query performance; do not join tables that are not needed. Many DBMSs also limit the maximum number of tables in a join.
[Mandatory] Joining more than three tables is prohibited. The columns being joined must have exactly the same data type, and when running multi-table join queries, make sure the join columns are indexed.
Note: Even for a two-table join, pay attention to table indexes and SQL performance.
Source: Alibaba "Java Development Manual"
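The advice on data types and indexes might be applied as in the sketch below. The tables demo_emp and demo_dept and the index name idx_demo_emp_dept are hypothetical, and the plan that EXPLAIN reports depends on the MySQL version and the data, so no output is shown:

```sql
# Hypothetical demo tables and index name, used only for this illustration
DROP TABLE IF EXISTS demo_emp, demo_dept;
CREATE TABLE demo_dept (
  department_id INT PRIMARY KEY,   # indexed through the primary key
  department_name VARCHAR(30)
);
CREATE TABLE demo_emp (
  employee_id INT PRIMARY KEY,
  last_name VARCHAR(25),
  department_id INT                # same type as demo_dept.department_id
);

# Index the join column on the side that does not have one yet
CREATE INDEX idx_demo_emp_dept ON demo_emp (department_id);

# EXPLAIN reports which index, if any, the optimizer chooses per table
EXPLAIN
SELECT e.last_name, d.department_name
FROM demo_emp e JOIN demo_dept d
ON e.department_id = d.department_id;
```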