Chapter 09 Subqueries
A subquery is a query statement nested inside another query statement. MySQL has supported subqueries since version 4.1.
Subqueries greatly extend what a SELECT query can do. Often a query needs to take its data from a result set, or it must first compute a result from a table (possibly the same table) and then compare against that result, which may be a single scalar value or a set of values.
1. Requirements analysis and problem solving
1.1 Practical issues
Problem: Who has a higher salary than Abel?
Existing solutions:
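#Method 1: two separate queries
#Step 1: look up Abel's salary (11000 in this data set)
SELECT salary
FROM employees
WHERE last_name = 'Abel';
#Step 2: hard-code that value into a second query
SELECT last_name,salary
FROM employees
WHERE salary > 11000;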
#Method 2: self-join
SELECT e2.last_name,e2.salary
FROM employees e1,employees e2
WHERE e1.last_name = 'Abel'
AND e1.`salary` < e2.`salary`;
#Method 3: subquery
SELECT last_name,salary
FROM employees
WHERE salary > (
SELECT salary
FROM employees
WHERE last_name = 'Abel'
);
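The self-join and the subquery should return the same rows. A minimal sketch to check this, assuming Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for MySQL; apart from Abel's salary of 11000, the rows are hypothetical:

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (last_name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Abel", 11000), ("King", 24000), ("Kochhar", 17000), ("Lee", 9000)],
)

# Method 2: self-join; e1 is restricted to Abel, e2 ranges over all employees.
SELF_JOIN = """
    SELECT e2.last_name, e2.salary
    FROM employees e1, employees e2
    WHERE e1.last_name = 'Abel'
      AND e1.salary < e2.salary
"""

# Method 3: the inner query yields Abel's salary as a single scalar value.
SUBQUERY = """
    SELECT last_name, salary
    FROM employees
    WHERE salary > (SELECT salary FROM employees WHERE last_name = 'Abel')
"""

def run(sql):
    # Sort the rows so the two result sets can be compared independent of order.
    return sorted(conn.execute(sql).fetchall())

print(run(SELF_JOIN))  # [('King', 24000), ('Kochhar', 17000)]
print(run(SUBQUERY))   # [('King', 24000), ('Kochhar', 17000)]
```

One caveat: the subquery assumes exactly one employee is named Abel. With two Abels, MySQL would reject the subquery because it returns more than one row, while the self-join would return everyone who earns more than either of them.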
1.2 Basic use of subqueries
- The basic syntax of a subquery:
SELECT select_list
FROM table
WHERE expr operator
(SELECT select_list
FROM table);
- The subquery (inner query) is executed once, before the main query (this describes uncorrelated subqueries; correlated subqueries are covered in section 4).
- The result of the subquery is used by the main query (outer query).
- Notes
  - Enclose subqueries in parentheses
  - Put the subquery on the right-hand side of the comparison condition
  - Use single-row operators with single-row subqueries and multi-row operators with multi-row subqueries
1.3 Classification of subqueries
Classification method 1:
By whether the inner query returns one record or several, subqueries are divided into single-row subqueries and multi-row subqueries.
- single-row subquery
- multi-row subquery
Classification method 2:
By whether the inner query is executed once or multiple times, subqueries are divided into correlated subqueries and uncorrelated (non-correlated) subqueries.
If the subquery retrieves its result from the table only once, and that result is then used as a condition of the main query, the subquery is called an uncorrelated subquery.
Conversely, if the subquery has to be executed multiple times in a loop, that is, each row of the outer query passes values into the subquery, the subquery runs, and its result is fed back to the outer query, this nested execution is called a correlated subquery.
2. Single row subquery
2.1 Single row comparison operators
operator | meaning |
---|---|
= | equal to |
> | greater than |
>= | greater than or equal to |
< | less than |
<= | less than or equal to |
<> | not equal to |
2.2 Code example
Topic: Query the employees whose salary is greater than that of employee No. 149
SELECT last_name, salary
FROM employees
WHERE salary >
(SELECT salary
FROM employees
WHERE employee_id = 149);
Topic: Return the last_name, job_id and salary of employees whose job_id is the same as that of employee No. 141 and whose salary is greater than that of employee No. 143
SELECT last_name, job_id, salary
FROM employees
WHERE job_id =
(SELECT job_id
FROM employees
WHERE employee_id = 141)
AND salary >
(SELECT salary
FROM employees
WHERE employee_id = 143);
Topic: Return the last_name, job_id and salary of the employee with the lowest salary in the company
SELECT last_name, job_id, salary
FROM employees
WHERE salary =
(SELECT MIN(salary)
FROM employees);
Topic: Query the employee_id, manager_id, department_id of other employees who have the same manager_id and department_id as employee No. 141 or No. 174
Implementation 1: Non-pairwise comparison
SELECT employee_id, manager_id, department_id
FROM employees
WHERE manager_id IN
(SELECT manager_id
FROM employees
WHERE employee_id IN (174,141))
AND department_id IN
(SELECT department_id
FROM employees
WHERE employee_id IN (174,141))
AND employee_id NOT IN(174,141);
Implementation 2: Pairwise comparison
SELECT employee_id, manager_id, department_id
FROM employees
WHERE (manager_id, department_id) IN
(SELECT manager_id, department_id
FROM employees
WHERE employee_id IN (141,174))
AND employee_id NOT IN (141,174);
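The two implementations are not interchangeable: the non-pairwise version checks manager_id and department_id independently, so it also accepts mixed combinations. A minimal sketch with Python's sqlite3 module and hypothetical rows (employee 202 pairs 141's manager with 174's department) makes the difference visible; the row-value IN in the pairwise query assumes SQLite 3.15 or newer:

```python
import sqlite3

# Hypothetical rows; 141 and 174 are the reference employees from the topic.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (employee_id INTEGER, manager_id INTEGER, department_id INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        (141, 124, 50),
        (174, 149, 80),
        (200, 124, 50),  # same (manager_id, department_id) pair as 141
        (201, 149, 80),  # same pair as 174
        (202, 124, 80),  # 141's manager combined with 174's department
        (203, 100, 50),  # manager matches neither reference employee
    ],
)

# Each column is tested on its own, so (124, 80) slips through.
NON_PAIRWISE = """
    SELECT employee_id FROM employees
    WHERE manager_id IN (SELECT manager_id FROM employees WHERE employee_id IN (174, 141))
      AND department_id IN (SELECT department_id FROM employees WHERE employee_id IN (174, 141))
      AND employee_id NOT IN (174, 141)
    ORDER BY employee_id
"""

# The pair is tested as a whole (row value; needs SQLite 3.15+).
PAIRWISE = """
    SELECT employee_id FROM employees
    WHERE (manager_id, department_id) IN
          (SELECT manager_id, department_id FROM employees WHERE employee_id IN (141, 174))
      AND employee_id NOT IN (141, 174)
    ORDER BY employee_id
"""

def ids(sql):
    return [row[0] for row in conn.execute(sql)]

print(ids(NON_PAIRWISE))  # [200, 201, 202]
print(ids(PAIRWISE))      # [200, 201]
```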
2.3 Subqueries in HAVING
- The subquery is executed first.
- Return results to the HAVING clause in the main query.
Topic: Query the department_id and minimum salary of each department whose minimum salary is greater than the minimum salary of department No. 50
SELECT department_id, MIN(salary)
FROM employees
GROUP BY department_id
HAVING MIN(salary) >
(SELECT MIN(salary)
FROM employees
WHERE department_id = 50);
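A minimal sketch of the HAVING example, assuming Python's sqlite3 module as a stand-in for MySQL and a hypothetical two-column employees table; department 50's minimum here is 2500:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (department_id INTEGER, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [(50, 3000), (50, 2500), (60, 4200), (60, 9000), (80, 2100), (80, 10000), (90, 17000)],
)

# The subquery runs once and yields department 50's minimum salary;
# HAVING then keeps only the groups whose own minimum is above it.
QUERY = """
    SELECT department_id, MIN(salary)
    FROM employees
    GROUP BY department_id
    HAVING MIN(salary) > (SELECT MIN(salary) FROM employees WHERE department_id = 50)
    ORDER BY department_id
"""

print(conn.execute(QUERY).fetchall())  # [(60, 4200), (90, 17000)]
```

Department 50 itself drops out because 2500 is not greater than 2500, and department 80 drops out because its minimum (2100) is lower.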
2.4 Subquery in CASE
Use a single-column subquery in a CASE expression:
Topic: Display employee_id, last_name and location. If the employee's department_id equals the department_id of the department whose location_id is 1800, location is 'Canada'; otherwise it is 'USA'.
SELECT employee_id, last_name,
(CASE department_id
WHEN
(SELECT department_id FROM departments
WHERE location_id = 1800)
THEN 'Canada' ELSE 'USA' END) location
FROM employees;
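A minimal sketch of the CASE example, assuming Python's sqlite3 module and hypothetical rows in which exactly one department (20) is at location 1800. The subquery after WHEN must return a single value; if several departments shared location 1800, MySQL would raise an error, whereas SQLite silently uses the first row, so this sketch does not catch that mistake:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE departments (department_id INTEGER, location_id INTEGER)")
conn.execute("CREATE TABLE employees (employee_id INTEGER, last_name TEXT, department_id INTEGER)")
# Hypothetical rows: only department 20 sits at location 1800.
conn.executemany("INSERT INTO departments VALUES (?, ?)", [(20, 1800), (50, 1500), (60, 1400)])
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(103, "Hunold", 60), (124, "Mourgos", 50), (201, "Hartstein", 20)],
)

# The scalar subquery supplies the value that department_id is compared with.
QUERY = """
    SELECT employee_id, last_name,
           CASE department_id
               WHEN (SELECT department_id FROM departments WHERE location_id = 1800)
               THEN 'Canada' ELSE 'USA'
           END AS location
    FROM employees
    ORDER BY employee_id
"""

print(conn.execute(QUERY).fetchall())
# [(103, 'Hunold', 'USA'), (124, 'Mourgos', 'USA'), (201, 'Hartstein', 'Canada')]
```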
2.5 Null value problem in subquery
SELECT last_name, job_id
FROM employees
WHERE job_id =
(SELECT job_id
FROM employees
WHERE last_name = 'Haas');
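The subquery returns no rows because no employee is named Haas. An empty scalar subquery evaluates to NULL, and job_id = NULL is never true, so the main query returns an empty result set without raising an error.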
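A minimal sketch of this behavior, assuming Python's sqlite3 module and a hypothetical table in which nobody is named Haas:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (last_name TEXT, job_id TEXT)")
# Hypothetical rows; nobody is named Haas.
conn.executemany("INSERT INTO employees VALUES (?, ?)", [("King", "AD_PRES"), ("Abel", "SA_REP")])

INNER = "SELECT job_id FROM employees WHERE last_name = 'Haas'"

# Used as a scalar value, the empty subquery becomes NULL (None in Python).
scalar = conn.execute(f"SELECT ({INNER})").fetchone()

# job_id = NULL is never TRUE, so the outer query matches nothing, with no error.
rows = conn.execute(f"SELECT last_name, job_id FROM employees WHERE job_id = ({INNER})").fetchall()

print(scalar)  # (None,)
print(rows)    # []
```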
2.6 Illegal use of subqueries
SELECT employee_id, last_name
FROM employees
WHERE salary =
(SELECT MIN(salary)
FROM employees
GROUP BY department_id);
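Error: a multi-row subquery is used with a single-row comparison operator. The subquery returns one minimum salary per department, so MySQL rejects the statement with the error "Subquery returns more than 1 row"; a multi-row operator such as IN is needed instead.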
3. Multi-row subqueries
- Also known as a set comparison subquery
- Inner query returns multiple rows
- Uses multi-row comparison operators
3.1 Multi-row comparison operators
operator | meaning |
---|---|
IN | equal to any value in the list |
ANY | must be combined with a single-row comparison operator; true if the comparison holds for at least one value returned by the subquery |
ALL | must be combined with a single-row comparison operator; true if the comparison holds for every value returned by the subquery |
SOME | an alias of ANY with identical behavior; ANY is more commonly used |
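The difference between ANY and ALL: salary < ANY (subquery) is true when salary is lower than at least one value returned by the subquery, that is, lower than the largest one; salary < ALL (subquery) is true only when salary is lower than every returned value, that is, lower than the smallest one.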
3.2 Code example
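Topic: Return the employee_id, last_name, job_id and salary of employees whose job_id is not 'IT_PROG' and whose salary is lower than that of any employee with job_id 'IT_PROG'
SELECT employee_id, last_name, job_id, salary
FROM employees
WHERE job_id <> 'IT_PROG'
AND salary < ANY (
SELECT salary
FROM employees
WHERE job_id = 'IT_PROG'
);
Topic: Return the employee_id, last_name, job_id and salary of employees whose job_id is not 'IT_PROG' and whose salary is lower than that of all employees with job_id 'IT_PROG'
SELECT employee_id, last_name, job_id, salary
FROM employees
WHERE job_id <> 'IT_PROG'
AND salary < ALL (
SELECT salary
FROM employees
WHERE job_id = 'IT_PROG'
);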
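SQLite does not support ANY or ALL, so this sketch (Python's sqlite3 module, hypothetical rows) swaps in the usual MAX/MIN rewrite: < ANY becomes "less than the largest value" and < ALL becomes "less than the smallest value". The rewrite is only equivalent when the subquery returns at least one row and no NULLs; for example, a comparison with ALL over an empty set is true, while a comparison with MIN of an empty set is NULL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (employee_id INTEGER, last_name TEXT, job_id TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        (103, "Hunold", "IT_PROG", 9000),
        (104, "Ernst", "IT_PROG", 6000),
        (107, "Lorentz", "IT_PROG", 4200),
        (124, "Mourgos", "ST_MAN", 5800),
        (141, "Rajs", "ST_CLERK", 3500),
        (149, "Zlotkey", "SA_MAN", 10500),
    ],
)

# salary < ANY (IT_PROG salaries)  ->  below the largest IT_PROG salary
LESS_THAN_ANY = """
    SELECT employee_id FROM employees
    WHERE job_id <> 'IT_PROG'
      AND salary < (SELECT MAX(salary) FROM employees WHERE job_id = 'IT_PROG')
    ORDER BY employee_id
"""

# salary < ALL (IT_PROG salaries)  ->  below the smallest IT_PROG salary
LESS_THAN_ALL = """
    SELECT employee_id FROM employees
    WHERE job_id <> 'IT_PROG'
      AND salary < (SELECT MIN(salary) FROM employees WHERE job_id = 'IT_PROG')
    ORDER BY employee_id
"""

def ids(sql):
    return [row[0] for row in conn.execute(sql)]

print(ids(LESS_THAN_ANY))  # [124, 141]
print(ids(LESS_THAN_ALL))  # [141]
```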
Topic: Query the department id with the lowest average salary
#Method 1: compare each department's average with the smallest per-department average
SELECT department_id
FROM employees
GROUP BY department_id
HAVING AVG(salary) = (
SELECT MIN(avg_sal)
FROM (
SELECT AVG(salary) avg_sal
FROM employees
GROUP BY department_id
) dept_avg_sal
);
#Method 2: an average that is <= every per-department average is the lowest
SELECT department_id
FROM employees
GROUP BY department_id
HAVING AVG(salary) <= ALL (
SELECT AVG(salary) avg_sal
FROM employees
GROUP BY department_id
);
3.3 Null problem
SELECT last_name
FROM employees
WHERE employee_id NOT IN (
SELECT manager_id
FROM employees
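);
This query returns no rows. At least one employee (the company's top manager) has a NULL manager_id, so the subquery's result contains NULL, and employee_id NOT IN (..., NULL) can never evaluate to TRUE. Adding WHERE manager_id IS NOT NULL to the subquery fixes the problem.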
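A minimal sketch of the NOT IN trap and its fix, assuming Python's sqlite3 module (which follows the same NULL rules here) and a hypothetical five-person hierarchy whose top manager has a NULL manager_id:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER, last_name TEXT, manager_id INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        (100, "King", None),  # top manager: manager_id is NULL
        (101, "Kochhar", 100),
        (102, "De Haan", 100),
        (103, "Hunold", 102),
        (104, "Ernst", 103),
    ],
)

# Original query: the NULL in the subquery makes NOT IN UNKNOWN for every non-manager.
BROKEN = """
    SELECT last_name FROM employees
    WHERE employee_id NOT IN (SELECT manager_id FROM employees)
    ORDER BY employee_id
"""

# Fixed query: remove the NULLs inside the subquery.
FIXED = """
    SELECT last_name FROM employees
    WHERE employee_id NOT IN (SELECT manager_id FROM employees WHERE manager_id IS NOT NULL)
    ORDER BY employee_id
"""

def names(sql):
    return [row[0] for row in conn.execute(sql)]

print(names(BROKEN))  # []
print(names(FIXED))   # ['Kochhar', 'Ernst']
```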
4. Correlated subqueries
4.1 Correlated subquery execution process
If the execution of a subquery depends on the outer query, usually because the subquery's condition references a table from the outer query, the subquery must be re-evaluated for every row the outer query processes. Such a subquery is called a correlated subquery.
Correlated subqueries are executed row by row: the subquery runs once for each row of the main query.
In short: the subquery uses columns from the main query.
4.2 Code example
Topic: Query the last_name, salary and department_id of employees whose salary is greater than the average salary of the department
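Method 1: Correlated subquery
SELECT last_name,salary,department_id
FROM employees e1
WHERE salary > (
SELECT AVG(salary)
FROM employees e2
WHERE e2.department_id = e1.`department_id`
);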
Method 2: Use subquery in FROM
SELECT last_name,salary,e1.department_id
FROM employees e1,(SELECT department_id,AVG(salary) dept_avg_sal FROM employees GROUP BY department_id) e2
WHERE e1.`department_id` = e2.department_id
AND e2.dept_avg_sal < e1.`salary`;
Subquery in FROM: the subquery is part of the FROM clause; it must be enclosed in parentheses and given an alias, and it is then used like a temporary virtual table.
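Both ways of finding employees who earn more than their department's average should agree. A minimal sketch, assuming Python's sqlite3 module and hypothetical rows (department 50 averages 4650, department 60 averages 6400, department 90 has a single employee):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (last_name TEXT, salary INTEGER, department_id INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Rajs", 3500, 50), ("Mourgos", 5800, 50),
        ("Hunold", 9000, 60), ("Ernst", 6000, 60), ("Lorentz", 4200, 60),
        ("King", 24000, 90),
    ],
)

# Correlated: the inner AVG is recomputed for the department of each outer row.
CORRELATED = """
    SELECT last_name, salary, department_id
    FROM employees e1
    WHERE salary > (SELECT AVG(salary) FROM employees e2
                    WHERE e2.department_id = e1.department_id)
"""

# Subquery in FROM: per-department averages form a derived table aliased e2.
DERIVED = """
    SELECT e1.last_name, e1.salary, e1.department_id
    FROM employees e1,
         (SELECT department_id, AVG(salary) AS dept_avg_sal
          FROM employees GROUP BY department_id) e2
    WHERE e1.department_id = e2.department_id
      AND e2.dept_avg_sal < e1.salary
"""

def run(sql):
    # Sort so the comparison does not depend on row order.
    return sorted(conn.execute(sql).fetchall())

print(run(CORRELATED))  # [('Hunold', 9000, 60), ('Mourgos', 5800, 50)]
print(run(DERIVED))     # same rows
```

The correlated form re-evaluates the average per outer row, while the derived table computes each average once and joins against it; which one runs faster depends on the optimizer.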
Using a subquery in ORDER BY:
Topic: Query each employee's employee_id and salary, sorted by department_name
SELECT employee_id,salary
FROM employees e
ORDER BY (
SELECT department_name
FROM departments d
WHERE e.`department_id` = d.`department_id`
);
Topic: If an employee's employee_id appears at least twice in the job_history table, output that employee's employee_id, last_name and job_id
SELECT e.employee_id, last_name,e.job_id
FROM employees e
WHERE 2 <= (SELECT COUNT(*)
FROM job_history
WHERE employee_id = e.employee_id);
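A minimal sketch of this correlated COUNT, assuming Python's sqlite3 module and a hypothetical job_history in which employees 101 and 176 appear twice, 102 once and 200 not at all:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER, last_name TEXT, job_id TEXT)")
conn.execute("CREATE TABLE job_history (employee_id INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(101, "Kochhar", "AD_VP"), (102, "De Haan", "AD_VP"),
     (176, "Taylor", "SA_REP"), (200, "Whalen", "AD_ASST")],
)
conn.executemany("INSERT INTO job_history VALUES (?)", [(101,), (101,), (102,), (176,), (176,)])

# Inside the subquery, bare employee_id resolves to job_history (innermost scope);
# e.employee_id ties it to the current outer row.
QUERY = """
    SELECT e.employee_id, e.last_name, e.job_id
    FROM employees e
    WHERE 2 <= (SELECT COUNT(*) FROM job_history
                WHERE employee_id = e.employee_id)
    ORDER BY e.employee_id
"""

print(conn.execute(QUERY).fetchall())
# [(101, 'Kochhar', 'AD_VP'), (176, 'Taylor', 'SA_REP')]
```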
4.3 EXISTS and NOT EXISTS keywords
- Correlated subqueries are often used together with the EXISTS operator to check whether the subquery contains any row that satisfies the condition.
- While no row satisfying the condition has been found:
  - the search continues through the subquery
  - if no such row exists at all, the condition returns FALSE
- As soon as a row satisfying the condition is found:
  - the search in the subquery stops
  - the condition returns TRUE
- NOT EXISTS is the opposite: it returns TRUE if no row satisfies the condition, and FALSE otherwise.
Topic: Query the employee_id, last_name, job_id and department_id of the company's managers
Method 1: EXISTS
SELECT employee_id, last_name, job_id, department_id
FROM employees e1
WHERE EXISTS ( SELECT *
FROM employees e2
WHERE e2.manager_id =
e1.employee_id);
Method 2: Self-join
SELECT DISTINCT e1.employee_id, e1.last_name, e1.job_id, e1.department_id
FROM employees e1 JOIN employees e2
ON e1.employee_id = e2.manager_id;
Method 3: IN
SELECT employee_id,last_name,job_id,department_id
FROM employees
WHERE employee_id IN (
SELECT DISTINCT manager_id
FROM employees
);
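The three methods should find the same managers. A minimal sketch, assuming Python's sqlite3 module and a hypothetical hierarchy (columns trimmed to employee_id, last_name and manager_id):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER, last_name TEXT, manager_id INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(100, "King", None), (101, "Kochhar", 100), (102, "De Haan", 100),
     (103, "Hunold", 102), (104, "Ernst", 103)],
)

QUERIES = {
    # Method 1: EXISTS stops scanning e2 as soon as one subordinate is found.
    "exists": """
        SELECT employee_id, last_name FROM employees e1
        WHERE EXISTS (SELECT * FROM employees e2 WHERE e2.manager_id = e1.employee_id)
    """,
    # Method 2: one joined row per subordinate, so DISTINCT removes duplicates.
    "self_join": """
        SELECT DISTINCT e1.employee_id, e1.last_name
        FROM employees e1 JOIN employees e2 ON e1.employee_id = e2.manager_id
    """,
    # Method 3: the NULL manager_id in the list is harmless for IN (unlike NOT IN).
    "in": """
        SELECT employee_id, last_name FROM employees
        WHERE employee_id IN (SELECT DISTINCT manager_id FROM employees)
    """,
}

results = {name: sorted(conn.execute(sql).fetchall()) for name, sql in QUERIES.items()}
for name, rows in results.items():
    print(name, rows)  # each: [(100, 'King'), (102, 'De Haan'), (103, 'Hunold')]
```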
Topic: Query the department_id and department_name of the departments in the departments table that have no employees in the employees table
SELECT department_id, department_name
FROM departments d
WHERE NOT EXISTS (SELECT 'X'
FROM employees
WHERE department_id = d.department_id);
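A minimal sketch of NOT EXISTS, assuming Python's sqlite3 module and hypothetical rows in which one employee has a NULL department_id. For contrast it also runs a NOT IN version, which returns nothing for the reason described in 3.3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE departments (department_id INTEGER, department_name TEXT)")
conn.execute("CREATE TABLE employees (employee_id INTEGER, department_id INTEGER)")
conn.executemany(
    "INSERT INTO departments VALUES (?, ?)",
    [(10, "Administration"), (20, "Marketing"), (120, "Treasury")],
)
# Hypothetical employee 178 has no department (NULL).
conn.executemany("INSERT INTO employees VALUES (?, ?)", [(200, 10), (201, 20), (202, 20), (178, None)])

# Bare department_id in the subquery refers to employees; d.department_id to the outer row.
NOT_EXISTS = """
    SELECT department_id, department_name FROM departments d
    WHERE NOT EXISTS (SELECT 'X' FROM employees WHERE department_id = d.department_id)
"""

# Contrast: the NULL in the subquery makes NOT IN UNKNOWN for department 120 too.
NOT_IN = """
    SELECT department_id, department_name FROM departments
    WHERE department_id NOT IN (SELECT department_id FROM employees)
"""

print(conn.execute(NOT_EXISTS).fetchall())  # [(120, 'Treasury')]
print(conn.execute(NOT_IN).fetchall())      # []
```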
4.4 Correlated update
UPDATE table1 alias1
SET column = (SELECT expression
FROM table2 alias2
WHERE alias1.column = alias2.column);
Use correlated subqueries to update data in one table based on data in another table.
Topic: Add a department_name column to employees and fill it with the name of each employee's department
# 1)
ALTER TABLE employees
ADD department_name VARCHAR(30);
# 2)
UPDATE employees e
SET department_name = (SELECT department_name
FROM departments d
WHERE e.department_id = d.department_id);
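A minimal sketch of the two steps, assuming Python's sqlite3 module and hypothetical rows. SQLite spells the column type differently and the outer table is referenced by name instead of an alias; employee 178 has no department, so its correlated subquery finds no row and the new column stays NULL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE departments (department_id INTEGER, department_name TEXT)")
conn.execute("CREATE TABLE employees (employee_id INTEGER, department_id INTEGER)")
conn.executemany("INSERT INTO departments VALUES (?, ?)", [(10, "Administration"), (20, "Marketing")])
conn.executemany("INSERT INTO employees VALUES (?, ?)", [(200, 10), (201, 20), (178, None)])

# Step 1: add the new column.
conn.execute("ALTER TABLE employees ADD COLUMN department_name TEXT")

# Step 2: for each employee row, look up the matching department's name.
conn.execute("""
    UPDATE employees
    SET department_name = (SELECT d.department_name FROM departments d
                           WHERE d.department_id = employees.department_id)
""")

rows = conn.execute(
    "SELECT employee_id, department_name FROM employees ORDER BY employee_id"
).fetchall()
print(rows)  # [(178, None), (200, 'Administration'), (201, 'Marketing')]
```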
4.5 Correlated delete
DELETE FROM table1 alias1
WHERE column operator (SELECT expression
FROM table2 alias2
WHERE alias1.column = alias2.column);
Use correlated subqueries to delete data from one table based on data from another table.
Topic: Delete from the employees table the rows whose employee_id also appears in the emp_history table
DELETE FROM employees e
WHERE employee_id IN
(SELECT employee_id
FROM emp_history
WHERE employee_id = e.employee_id);
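A minimal sketch of the correlated DELETE, assuming Python's sqlite3 module and hypothetical rows (history entry 999 has no counterpart in employees). The outer table is referenced by its name rather than an alias:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER, last_name TEXT)")
conn.execute("CREATE TABLE emp_history (employee_id INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [(100, "King"), (101, "Kochhar"), (102, "De Haan"), (103, "Hunold")],
)
conn.executemany("INSERT INTO emp_history VALUES (?)", [(101,), (103,), (999,)])

# Inside the subquery, bare employee_id refers to emp_history; the current
# outer row is referenced through the table name employees.
conn.execute("""
    DELETE FROM employees
    WHERE employee_id IN (SELECT employee_id FROM emp_history
                          WHERE employee_id = employees.employee_id)
""")

remaining = [row[0] for row in conn.execute("SELECT employee_id FROM employees ORDER BY employee_id")]
print(remaining)  # [100, 102]
```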
5. A question to think about
Question: Who has a higher salary than Abel?
Answer:
#Method 1: self-join
SELECT e2.last_name,e2.salary
FROM employees e1,employees e2
WHERE e1.last_name = 'Abel'
AND e1.`salary` < e2.`salary`;
#Method 2: subquery
SELECT last_name,salary
FROM employees
WHERE salary > (
SELECT salary
FROM employees
WHERE last_name = 'Abel'
);
Question: Is there any difference between the two approaches above?
Answer: The self-join is preferable.
Both subqueries and self-joins can solve this problem. In general, self-joins are recommended, because many DBMSs process a self-join faster than the equivalent subquery.
One way to understand this: a subquery evaluates its condition against an unknown table produced by a query, whereas a self-join evaluates the condition against a known table, so most DBMSs optimize self-join processing.