Chapter 09_Subqueries

A subquery is a query statement nested inside another query statement. MySQL has supported subqueries since version 4.1.

Subqueries greatly increase the expressive power of SELECT. Many queries need to draw data from another result set, or first compute a result from the same table and then compare against that result (which may be a single scalar value or a set of values).

1. Requirements analysis and problem solving

1.1 Practical issues

Problem: who earns a higher salary than Abel?

Existing solutions:

# Method 1: two separate queries (the 11000 below is the salary returned by the first query)
SELECT salary
FROM employees
WHERE last_name = 'Abel';

SELECT last_name,salary
FROM employees
WHERE salary > 11000;

# Method 2: self-join
SELECT e2.last_name,e2.salary
FROM employees e1,employees e2
WHERE e1.last_name = 'Abel'
AND e1.`salary` < e2.`salary`;

# Method 3: subquery
SELECT last_name,salary
FROM employees
WHERE salary > (
		SELECT salary
		FROM employees
		WHERE last_name = 'Abel'
		);

1.2 Basic use of subqueries

  • The basic syntactic structure of a subquery:

    [Figure: basic syntax of a subquery]

  • The subquery (inner query) is executed once, before the main query.
  • The result of the subquery is used by the main query (outer query).
  • Notes
    • A subquery must be enclosed in parentheses
    • Place the subquery on the right side of the comparison condition
    • Use single-row operators with single-row subqueries, and multi-row operators with multi-row subqueries

1.3 Classification of subqueries

Classification method 1:

Subqueries are divided into single-row subqueries and multi-row subqueries according to whether the inner query returns one row or multiple rows.

  • Single-row subquery: the inner query returns a single row.

  • Multi-row subquery: the inner query returns multiple rows.

Classification method 2:

Subqueries are divided into correlated subqueries and uncorrelated (non-correlated) subqueries according to whether the inner query is executed multiple times.

If the subquery retrieves its result from the table only once, and that result then serves as a condition of the main query, the subquery is called an uncorrelated subquery.

If instead the subquery must be executed multiple times, in a loop that starts from the outer query, passes each outer row into the subquery, and feeds the result back to the outer query, the subquery is called a correlated subquery.
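
To make the distinction concrete, here is a minimal sketch against the employees table used throughout this chapter (the table and its columns are assumed to exist as in the other examples). The first inner query never refers to the outer row; the second refers to e1.department_id.

```sql
# Uncorrelated: the inner query does not reference the outer query,
# so its result (one average for the whole company) can be computed once.
SELECT last_name, salary
FROM   employees
WHERE  salary > (SELECT AVG(salary)
                 FROM   employees);

# Correlated: the inner query references e1.department_id from the outer row,
# so it is logically evaluated once per outer row.
SELECT last_name, salary
FROM   employees e1
WHERE  salary > (SELECT AVG(salary)
                 FROM   employees e2
                 WHERE  e2.department_id = e1.department_id);
```

Whether the server physically re-runs the inner query for every row depends on the optimizer; the classification describes the logical behavior.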

2. Single row subquery

2.1 Single row comparison operators

operator    meaning
=           equal to
>           greater than
>=          greater than or equal to
<           less than
<=          less than or equal to
<>          not equal to

2.2 Code example

Topic: Query the information of employees whose salary is greater than the salary of employee No. 149
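
One way to write this query, assuming the employees table has the employee_id, last_name and salary columns used elsewhere in this chapter, might look like this:

```sql
# Inner query: the salary of employee No. 149 (a single value).
# Outer query: every employee who earns more than that value.
SELECT employee_id, last_name, salary
FROM   employees
WHERE  salary > (SELECT salary
                 FROM   employees
                 WHERE  employee_id = 149);
```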

Topic: Return the name, job_id and salary of the employee whose job_id is the same as employee No. 141 and whose salary is more than that of employee No. 143

SELECT last_name, job_id, salary
FROM   employees
WHERE  job_id =  
                (SELECT job_id
                 FROM   employees
                 WHERE  employee_id = 141)
AND    salary >
                (SELECT salary
                 FROM   employees
                 WHERE  employee_id = 143);

Topic: Return the last_name, job_id and salary of the employee with the lowest salary in the company

SELECT last_name, job_id, salary
FROM   employees
WHERE  salary = 
                (SELECT MIN(salary)
                 FROM   employees);

Topic: Query the employee_id, manager_id, department_id of other employees who have the same manager_id and department_id as employee No. 141 or No. 174

Implementation 1: Non-pairwise comparison

SELECT  employee_id, manager_id, department_id
FROM    employees
WHERE   manager_id IN
                  (SELECT  manager_id
                   FROM    employees
                   WHERE   employee_id IN (174,141))
AND     department_id IN
                  (SELECT  department_id
                   FROM    employees
                   WHERE   employee_id IN (174,141))
AND     employee_id NOT IN (174,141);

Implementation 2: Pairwise comparison

SELECT	employee_id, manager_id, department_id
FROM	employees
WHERE  (manager_id, department_id) IN
                      (SELECT manager_id, department_id
                       FROM   employees
                       WHERE  employee_id IN (141,174))
AND	employee_id NOT IN (141,174);
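
The two implementations are not strictly equivalent: the non-pairwise form also accepts "mixed" combinations, such as the manager of one reference employee together with the department of the other. The sketch below illustrates this with hypothetical literal values instead of the employees table: the reference pairs are (1, 10) and (2, 20), and the candidate row is (1, 20).

```sql
# r is a hypothetical stand-in for the (manager_id, department_id)
# pairs of the two reference employees: (1, 10) and (2, 20).
SELECT
  (1  IN (SELECT m FROM (SELECT 1 AS m, 10 AS d UNION ALL SELECT 2, 20) r)
   AND
   20 IN (SELECT d FROM (SELECT 1 AS m, 10 AS d UNION ALL SELECT 2, 20) r)
  ) AS non_pairwise,   # 1: each column matches some reference row
  ((1, 20) IN (SELECT m, d FROM (SELECT 1 AS m, 10 AS d UNION ALL SELECT 2, 20) r)
  ) AS pairwise;       # 0: no single reference row is (1, 20)
```

If the requirement is "same manager and same department as one of the two employees", the pairwise form is the one that expresses it.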

2.3 Subqueries in HAVING

  • The subquery is executed first.
  • Its result is returned to the HAVING clause of the main query.

Topic: Query the department_id and minimum salary of each department whose minimum salary is greater than that of department No. 50

SELECT   department_id, MIN(salary)
FROM     employees
GROUP BY department_id
HAVING   MIN(salary) >
                       (SELECT MIN(salary)
                        FROM   employees
                        WHERE  department_id = 50);

2.4 Subquery in CASE

Using a single-column subquery in a CASE expression:

Topic: Display employee_id, last_name and location. If the employee's department_id equals the department_id of the department whose location_id is 1800, location is 'Canada'; otherwise it is 'USA'.

SELECT employee_id, last_name,
       (CASE department_id
        WHEN
             (SELECT department_id FROM departments
	      WHERE location_id = 1800)           
        THEN 'Canada' ELSE 'USA' END) location
FROM   employees;

2.5 Null value problem in subquery

SELECT last_name, job_id
FROM   employees
WHERE  job_id =
                (SELECT job_id
                 FROM   employees
                 WHERE  last_name = 'Haas');

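If no employee is named 'Haas', the subquery returns no rows. The comparison then evaluates to NULL, so the main query returns no rows either, and no error is raised.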
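
This behavior can be reproduced without any table. In the sketch below, the WHERE 1 = 0 condition is only a device to force an empty subquery result:

```sql
# The scalar subquery matches no rows, so it evaluates to NULL.
# 'x' = NULL is NULL (neither TRUE nor FALSE), so a WHERE clause
# built on this comparison would filter every row out.
SELECT 'x' = (SELECT 'x' FROM DUAL WHERE 1 = 0) AS cmp;   # cmp is NULL
```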

2.6 Illegal use of subqueries

SELECT employee_id, last_name
FROM   employees
WHERE  salary =
                (SELECT   MIN(salary)
                 FROM     employees
                 GROUP BY department_id);

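This statement is illegal: a multi-row subquery is used with a single-row comparison operator. As soon as the subquery returns more than one row, MySQL rejects the statement with error 1242, "Subquery returns more than 1 row".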
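
One possible fix, assuming the intent is to find employees whose salary equals the minimum salary of some department, is to switch to a multi-row operator such as IN:

```sql
# IN accepts a multi-row subquery, so one minimum per department is fine.
SELECT employee_id, last_name
FROM   employees
WHERE  salary IN
                (SELECT   MIN(salary)
                 FROM     employees
                 GROUP BY department_id);
```

Note that this matches the minimum of any department, not necessarily the employee's own department; the latter would need a correlated subquery.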

3. Multi-row subqueries

  • Also known as a set comparison subquery
  • Inner query returns multiple rows
  • Uses multi-row comparison operators

3.1 Multi-row comparison operators

operator    meaning
IN          equal to any value in the list
ANY         used with a single-row comparison operator; true if the comparison holds for at least one value returned by the subquery
ALL         used with a single-row comparison operator; true if the comparison holds for every value returned by the subquery
SOME        an alias of ANY with the same behavior; ANY is more commonly used

Pay attention to the difference between ANY and ALL.
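
A minimal sketch of that difference, using a hypothetical value set {3, 7} in place of a real multi-row subquery result:

```sql
# t is a hypothetical two-row result: 3 and 7.
SELECT 5 > ANY (SELECT v FROM (SELECT 3 AS v UNION ALL SELECT 7) t) AS gt_any,  # 1: 5 > 3 holds
       5 > ALL (SELECT v FROM (SELECT 3 AS v UNION ALL SELECT 7) t) AS gt_all;  # 0: 5 > 7 fails
```

For a non-empty set without NULLs, "< ANY" therefore behaves like "less than the maximum" and "< ALL" like "less than the minimum".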

3.2 Code example

Topic: Return the employee_id, last_name, job_id and salary of employees in other job_ids whose salary is lower than that of any employee whose job_id is 'IT_PROG'
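
A sketch of one way to write this query, assuming the employees table used in the rest of this chapter:

```sql
# < ANY: lower than at least one IT_PROG salary,
# i.e. lower than the highest IT_PROG salary.
SELECT employee_id, last_name, job_id, salary
FROM   employees
WHERE  job_id <> 'IT_PROG'
AND    salary < ANY (SELECT salary
                     FROM   employees
                     WHERE  job_id = 'IT_PROG');
```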

Topic: Return the employee_id, last_name, job_id and salary of employees in other job_ids whose salary is lower than that of all employees whose job_id is 'IT_PROG'
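
A sketch of one way to write this query, again assuming the employees table used in the rest of this chapter:

```sql
# < ALL: lower than every IT_PROG salary,
# i.e. lower than the lowest IT_PROG salary.
SELECT employee_id, last_name, job_id, salary
FROM   employees
WHERE  job_id <> 'IT_PROG'
AND    salary < ALL (SELECT salary
                     FROM   employees
                     WHERE  job_id = 'IT_PROG');
```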

Topic: Query the department id with the lowest average salary

# Method 1: compare against the minimum of the per-department averages
SELECT department_id
FROM employees
GROUP BY department_id
HAVING AVG(salary) = (
			SELECT MIN(avg_sal)
			FROM (
				SELECT AVG(salary) avg_sal
				FROM employees
				GROUP BY department_id
				) dept_avg_sal
			);

# Method 2: <= ALL over the per-department averages
SELECT department_id
FROM employees
GROUP BY department_id
HAVING AVG(salary) <= ALL (
				SELECT AVG(salary) avg_sal
				FROM employees
				GROUP BY department_id
);

3.3 Null problem

SELECT last_name
FROM employees
WHERE employee_id NOT IN (
			SELECT manager_id
			FROM employees
			);

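This query returns no rows. The manager_id column contains NULL (for an employee who has no manager), and a NOT IN comparison against a set that contains NULL can never evaluate to TRUE.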
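
The effect can be seen in isolation, followed by one possible fix that filters NULL out of the subquery (assuming the employees table from this chapter):

```sql
# 1 <> 2 is TRUE, but 1 <> NULL is NULL, so the whole NOT IN is NULL.
SELECT 1 NOT IN (2, NULL) AS result;   # result is NULL, not 1

# Possible fix: exclude NULL values inside the subquery.
SELECT last_name
FROM   employees
WHERE  employee_id NOT IN (
			SELECT manager_id
			FROM   employees
			WHERE  manager_id IS NOT NULL
			);
```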

4. Correlated subqueries

4.1 Correlated subquery execution process

If the execution of a subquery depends on the outer query, usually because the subquery references a table of the outer query in its condition, the subquery must be re-evaluated for each row of the outer query. Such a subquery is called a correlated subquery.

Correlated subqueries are executed row by row: the subquery is executed once for each row of the main query.

Note: the subquery uses columns of the main query.

4.2 Code example

Topic: Query the last_name, salary and department_id of employees whose salary is greater than the average salary of their own department

Method 1: Correlated subqueries
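
A sketch of the correlated form, assuming the same employees table as the other examples:

```sql
# For each outer row e1, the inner query computes the average salary
# of e1's own department and compares e1.salary against it.
SELECT last_name, salary, department_id
FROM   employees e1
WHERE  salary > (SELECT AVG(salary)
                 FROM   employees e2
                 WHERE  e2.department_id = e1.department_id);
```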

Method 2: Use subquery in FROM

SELECT e1.last_name, e1.salary, e1.department_id
FROM   employees e1,
       (SELECT department_id, AVG(salary) dept_avg_sal
        FROM   employees
        GROUP BY department_id) e2
WHERE  e1.department_id = e2.department_id
AND    e2.dept_avg_sal < e1.salary;

FROM-type subquery: the subquery is part of the FROM clause. It must be enclosed in parentheses and given an alias, and it is used as a "temporary virtual table".

Use subquery in ORDER BY:

Topic: Query each employee's employee_id and salary, sorted by department_name

SELECT employee_id,salary
FROM employees e
ORDER BY (
	  SELECT department_name
	  FROM departments d
	  WHERE e.`department_id` = d.`department_id`
	);

Topic: If an employee_id from the employees table appears at least 2 times in the job_history table, output the employee_id, last_name and job_id of that employee

SELECT e.employee_id, last_name,e.job_id
FROM   employees e 
WHERE  2 <= (SELECT COUNT(*)
             FROM   job_history 
             WHERE  employee_id = e.employee_id);

4.3 EXISTS and NOT EXISTS keywords

  • Correlated subqueries are often used with the EXISTS operator, which checks whether the subquery returns any row that satisfies the condition.
  • If the subquery has not yet found a row that satisfies the condition:
    • The search in the subquery continues
    • If no such row is found at all, the condition returns FALSE
  • If the subquery finds a row that satisfies the condition:
    • The search in the subquery stops
    • The condition returns TRUE
  • NOT EXISTS is the opposite: it returns TRUE if no row satisfies the condition, and FALSE otherwise.

Topic: Query the employee_id, last_name, job_id and department_id of the managers in the company

Method 1: EXISTS

SELECT employee_id, last_name, job_id, department_id
FROM   employees e1
WHERE  EXISTS ( SELECT *
                 FROM   employees e2
                 WHERE  e2.manager_id = 
                        e1.employee_id);

Method 2: Self-join

SELECT DISTINCT e1.employee_id, e1.last_name, e1.job_id, e1.department_id
FROM   employees e1 JOIN employees e2
ON     e1.employee_id = e2.manager_id;

Method 3: IN subquery

SELECT employee_id,last_name,job_id,department_id
FROM employees
WHERE employee_id IN (
		     SELECT DISTINCT manager_id
		     FROM employees
		     );

Topic: Query the department_id and department_name of the departments in the departments table that do not appear in the employees table

SELECT department_id, department_name
FROM departments d
WHERE NOT EXISTS (SELECT 'X'
                  FROM   employees
                  WHERE  department_id = d.department_id);

4.4 Correlated updates

UPDATE table1 alias1
SET    column = (SELECT expression
                 FROM   table2 alias2
                 WHERE  alias1.column = alias2.column);

Use correlated subqueries to update data in one table based on data in another table.

Topic: Add a department_name field to employees, and the data is the department name corresponding to the employee

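# 1) Add the new column (MySQL uses VARCHAR; VARCHAR2 is Oracle syntax).
#    Widen the length if any department_name is longer than 14 characters.
ALTER TABLE employees
ADD department_name VARCHAR(14);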

# 2)
UPDATE employees e
SET department_name =  (SELECT department_name 
	                       FROM   departments d
	                       WHERE  e.department_id = d.department_id);

4.5 Correlated deletion

 DELETE FROM table1 alias1
 WHERE column operator (SELECT expression
                        FROM   table2 alias2
                        WHERE  alias1.column = alias2.column);

Use correlated subqueries to delete data from one table based on data from another table.

Topic: Delete the rows in the employees table whose employee_id also appears in the emp_history table

# Note: a table alias in a single-table DELETE requires MySQL 8.0.16 or later.
DELETE FROM employees e
WHERE employee_id IN
           (SELECT employee_id
            FROM   emp_history
            WHERE  employee_id = e.employee_id);

5. A question to think about

Question: Who has a higher salary than Abel?

Answer:

# Method 1: self-join
SELECT e2.last_name,e2.salary
FROM employees e1,employees e2
WHERE e1.last_name = 'Abel'
AND e1.`salary` < e2.`salary`;

# Method 2: subquery
SELECT last_name,salary
FROM employees
WHERE salary > (
		SELECT salary
		FROM employees
		WHERE last_name = 'Abel'
		);

Question: Is there any difference between the two approaches above?

Answer: The self-join is the better choice.

Either a subquery or a self-join can solve this problem. The self-join is generally recommended, because many DBMSs process self-joins faster than subqueries.

One way to understand this: a subquery evaluates its condition against an intermediate result that is not known in advance, whereas a self-join evaluates its condition against the known base table, so most DBMSs optimize self-join processing.
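
Since the actual difference depends on the DBMS version and its optimizer, it may be worth checking on your own server. A minimal sketch using MySQL's EXPLAIN on both forms, assuming the employees table from this chapter:

```sql
# Execution plan of the self-join form
EXPLAIN
SELECT e2.last_name, e2.salary
FROM   employees e1, employees e2
WHERE  e1.last_name = 'Abel'
AND    e1.salary < e2.salary;

# Execution plan of the subquery form
EXPLAIN
SELECT last_name, salary
FROM   employees
WHERE  salary > (SELECT salary
                 FROM   employees
                 WHERE  last_name = 'Abel');
```

The two forms also behave differently when more than one employee is named Abel: the subquery form fails because the scalar subquery returns more than one row, while the self-join form returns a row for every matching pair.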

Origin blog.csdn.net/qq_29216579/article/details/130778200