SQL Statements - Aggregate Functions, Grouping, Subqueries and Combined Queries

1. Aggregate functions

SQL provides aggregate functions that compute statistics over a set of rows, such as counts, sums, averages, and maximum or minimum values.

The main aggregate functions are:

COUNT: counts the number of rows
SUM: returns the total of a column
AVG: calculates the average of a column
MAX: returns the maximum value of a column
MIN: returns the minimum value of a column

First, create a student table t_student; the examples below use its columns student_name, student_sex, student_age and student_class.

1. Counting rows (COUNT):

Standard form:

SELECT COUNT(<counting specification>) FROM <table name>

where the counting specification is one of:

* : counts all selected rows, including rows that contain NULL values;
ALL column_name: counts the rows in which the specified column is not NULL; ALL is the default when no keyword is given;
DISTINCT column_name: counts the distinct non-NULL values of the specified column.
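A minimal sketch of the three counting specifications, using Python's built-in sqlite3 module as a stand-in database (the article's own examples appear to target MySQL). The rows, including the student with a NULL class, are hypothetical sample data.

```python
import sqlite3

# In-memory database filled with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_student (student_name TEXT, student_class TEXT)")
conn.executemany(
    "INSERT INTO t_student VALUES (?, ?)",
    [("Zhang San", "Class 1"), ("Li Si", "Class 1"),
     ("Wang Wu", "Class 2"), ("Zhao Liu", None)],  # Zhao Liu has no class (NULL)
)

# COUNT(*) counts every row, including the one whose class is NULL.
count_star = conn.execute("SELECT COUNT(*) FROM t_student").fetchone()[0]
# COUNT(column), the default ALL form, skips rows where the column is NULL.
count_col = conn.execute("SELECT COUNT(student_class) FROM t_student").fetchone()[0]
# COUNT(DISTINCT column) counts each distinct non-NULL value once.
count_distinct = conn.execute(
    "SELECT COUNT(DISTINCT student_class) FROM t_student"
).fetchone()[0]

print(count_star, count_col, count_distinct)  # 4 3 2
```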

For example, to count how many students are in the class:

SELECT COUNT(*) FROM t_student;

A WHERE condition can also be added; for example, to count the female students (student_sex='女', where 女 means "female"):

SELECT COUNT(*) FROM t_student WHERE student_sex='女';

If you want to count the number of classes, you need to use DISTINCT:

SELECT COUNT(DISTINCT student_class) FROM t_student;

DISTINCT removes duplicate values before counting. Without DISTINCT, COUNT(student_class) counts every non-NULL class value, which for this table is simply the number of rows, 5.

2. Summing a column (SUM):

Note: SUM accepts only the ALL and DISTINCT specifications; unlike COUNT, it does not accept *.

Calculate the sum of all students' ages:

SELECT SUM(student_age) FROM t_student;
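To make the NULL and DISTINCT behavior of SUM concrete, here is a small sketch using Python's sqlite3 module as a stand-in for the article's database; the ages are hypothetical, and one of them is deliberately NULL.

```python
import sqlite3

# Hypothetical sample data; Zhao Liu's age is unknown (NULL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_student (student_name TEXT, student_age INTEGER)")
conn.executemany(
    "INSERT INTO t_student VALUES (?, ?)",
    [("Zhang San", 20), ("Li Si", 21), ("Wang Wu", 20), ("Zhao Liu", None)],
)

# SUM (i.e. SUM(ALL ...)) adds the non-NULL values: 20 + 21 + 20 = 61.
total_age = conn.execute("SELECT SUM(student_age) FROM t_student").fetchone()[0]
# SUM(DISTINCT ...) adds each distinct value once: 20 + 21 = 41.
distinct_total = conn.execute(
    "SELECT SUM(DISTINCT student_age) FROM t_student"
).fetchone()[0]

print(total_age, distinct_total)  # 61 41
```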

3. Averaging a column (AVG):

Calculate the average age of students:

SELECT AVG(student_age) FROM t_student;
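AVG also ignores NULLs, so it divides by the number of non-NULL values rather than by the number of rows. A minimal sketch, assuming Python's sqlite3 as the database and hypothetical ages:

```python
import sqlite3

# Hypothetical sample data: three known ages and one NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_student (student_name TEXT, student_age INTEGER)")
conn.executemany(
    "INSERT INTO t_student VALUES (?, ?)",
    [("Zhang San", 20), ("Li Si", 21), ("Wang Wu", 20), ("Zhao Liu", None)],
)

# AVG ignores NULLs: 61 divided by the 3 rows that have a known age.
avg_age = conn.execute("SELECT AVG(student_age) FROM t_student").fetchone()[0]
# Dividing by COUNT(*) instead counts the NULL row in the denominator.
avg_over_all_rows = conn.execute(
    "SELECT SUM(student_age) * 1.0 / COUNT(*) FROM t_student"
).fetchone()[0]

print(round(avg_age, 2), avg_over_all_rows)  # 20.33 15.25
```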

4. Maximum/minimum values (MAX/MIN):

Find the age of the oldest student (MIN works the same way for the minimum):

SELECT MAX(student_age) FROM t_student;

Note: this returns only the maximum age itself. To display all the information about the oldest student, you need a subquery, described below.
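As a preview, the sketch below contrasts MAX/MIN with a subquery that returns the whole row. It uses Python's sqlite3 module and hypothetical data, and compares with = against a single-value subquery, a common alternative to the ALL operator the article uses later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_student (student_name TEXT, student_age INTEGER)")
conn.executemany(
    "INSERT INTO t_student VALUES (?, ?)",
    [("Zhang San", 20), ("Li Si", 22), ("Wang Wu", 19)],  # hypothetical data
)

# MAX and MIN return only the extreme values, not the rows they came from.
max_age, min_age = conn.execute(
    "SELECT MAX(student_age), MIN(student_age) FROM t_student"
).fetchone()

# To get the whole row, compare against the single value returned by a subquery.
oldest = conn.execute(
    "SELECT * FROM t_student "
    "WHERE student_age = (SELECT MAX(student_age) FROM t_student)"
).fetchall()

print(max_age, min_age, oldest)  # 22 19 [('Li Si', 22)]
```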

2. GROUP BY:

GROUP BY groups together the rows that share the same value in the given column(s). This is very useful with aggregate functions, which are then computed separately for each group.
For example, to count the number of students in each class:

SELECT student_class,COUNT(ALL student_name) AS 总人数 FROM t_student GROUP BY (student_class);

AS defines an alias; here the count column is named 总人数 ("total number of people"). Aliases are especially useful in subqueries and join queries, which are discussed later.
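A runnable sketch of the per-class count, using Python's sqlite3 module with hypothetical rows; the English alias total stands in for 总人数, and ORDER BY is added only so the output order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_student (student_name TEXT, student_class TEXT)")
conn.executemany(
    "INSERT INTO t_student VALUES (?, ?)",
    [("Zhang San", "Class 1"), ("Li Si", "Class 1"), ("Wang Wu", "Class 2")],  # hypothetical
)

# One output row per class; COUNT is evaluated separately within each group.
per_class = conn.execute(
    "SELECT student_class, COUNT(student_name) AS total "
    "FROM t_student GROUP BY student_class ORDER BY student_class"
).fetchall()

print(per_class)  # [('Class 1', 2), ('Class 2', 1)]
```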

A WHERE filter can also be combined with grouping, but keep the order of execution in mind: WHERE filters rows first, the remaining rows are then grouped, and finally the aggregate functions are computed for each group.

Count the number of students over the age of 20 in each class:

SELECT student_class,COUNT(student_name) AS 总人数 FROM t_student WHERE student_age >20 GROUP BY (student_class);
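The effect of that ordering can be seen in a small sketch (Python sqlite3, hypothetical data, English alias total in place of 总人数): a class whose students are all filtered out by WHERE disappears from the result entirely, because it never forms a group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t_student (student_name TEXT, student_class TEXT, student_age INTEGER)"
)
conn.executemany(
    "INSERT INTO t_student VALUES (?, ?, ?)",
    [("Zhang San", "Class 1", 21), ("Li Si", "Class 1", 19),
     ("Wang Wu", "Class 2", 22), ("Zhao Liu", "Class 2", 23),
     ("Sun Qi", "Class 3", 18)],  # hypothetical sample data
)

# WHERE removes rows first (Li Si and Sun Qi), then the remaining rows are grouped.
over_20 = conn.execute(
    "SELECT student_class, COUNT(student_name) AS total FROM t_student "
    "WHERE student_age > 20 GROUP BY student_class ORDER BY student_class"
).fetchall()

# Class 3 has no rows left after WHERE, so it does not appear at all.
print(over_20)  # [('Class 1', 1), ('Class 2', 2)]
```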

3. HAVING filter conditions

As mentioned above, WHERE filtering runs before grouping and aggregation. What if we want to filter on the result of an aggregation instead?

For example, suppose we want to find the classes whose average student age is over 20.

Can the following statement be used?

SELECT student_class, AVG(student_age) FROM t_student WHERE AVG(student_age)>20 GROUP BY student_class;

No. Because aggregate functions are evaluated after WHERE, an aggregate function cannot appear in a WHERE condition, and the database rejects this statement with an error.

Instead, use HAVING, which filters groups after aggregation (平均年龄 is the alias "average age"):

SELECT student_class,AVG(student_age) AS 平均年龄 FROM t_student GROUP BY (student_class) HAVING AVG(student_age)>20;
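Both behaviors can be checked in a short sketch, assuming Python's sqlite3 module as the database and hypothetical ages: the WHERE version fails when the statement is prepared, while the HAVING version keeps only the class whose average exceeds 20. The alias avg_age stands in for 平均年龄.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t_student (student_name TEXT, student_class TEXT, student_age INTEGER)"
)
conn.executemany(
    "INSERT INTO t_student VALUES (?, ?, ?)",
    [("Zhang San", "Class 1", 19), ("Li Si", "Class 1", 20),
     ("Wang Wu", "Class 2", 22), ("Zhao Liu", "Class 2", 23)],  # hypothetical data
)

# An aggregate in WHERE is rejected before any row is processed.
try:
    conn.execute(
        "SELECT student_class, AVG(student_age) FROM t_student "
        "WHERE AVG(student_age) > 20 GROUP BY student_class"
    )
    where_rejected = False
except sqlite3.OperationalError:
    where_rejected = True

# HAVING filters the groups after AVG has been computed for each class.
avg_over_20 = conn.execute(
    "SELECT student_class, AVG(student_age) AS avg_age FROM t_student "
    "GROUP BY student_class HAVING AVG(student_age) > 20"
).fetchall()

print(where_rejected, avg_over_20)  # True [('Class 2', 22.5)]
```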

One more note here, on the logical execution order of a SQL query:

Step 1: FROM selects the source table(s)
Step 2: WHERE filters individual rows
Step 3: GROUP BY forms the groups
Step 4: HAVING filters the groups
Step 5: SELECT projects the output columns
Step 6: ORDER BY sorts the result
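The query below, a sketch using Python's sqlite3 module with hypothetical students A to F, uses all of these clauses at once; the SQL comments note what each clause removes. Because ORDER BY runs after SELECT, it can refer to the alias avg_age defined in the SELECT list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t_student (student_name TEXT, student_class TEXT, student_age INTEGER)"
)
conn.executemany(
    "INSERT INTO t_student VALUES (?, ?, ?)",
    [("A", "Class 1", 19), ("B", "Class 1", 21), ("C", "Class 2", 22),
     ("D", "Class 2", 24), ("E", "Class 2", 17), ("F", "Class 3", 20)],  # hypothetical
)

result = conn.execute(
    """
    SELECT student_class, AVG(student_age) AS avg_age  -- output columns and alias
    FROM t_student                                     -- source table
    WHERE student_age >= 18                            -- drops row E before grouping
    GROUP BY student_class                             -- one group per class
    HAVING COUNT(*) >= 2                               -- drops Class 3 (only one row)
    ORDER BY avg_age DESC                              -- sorts using the SELECT alias
    """
).fetchall()

print(result)  # [('Class 2', 23.0), ('Class 1', 20.0)]
```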

4. Subqueries:

Why subqueries?
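Suppose we have a score table t_student that records each student's score in each subject (the queries below use its columns student_name, student_subject and student_score).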

With what we have covered so far, we can find the highest score in each subject, but not the details of the student who achieved it. For that, we need a subquery.

What is a subquery? A subquery is a query nested inside another (main) query.
Subqueries can be nested in many parts of the main query, including SELECT, FROM, WHERE, GROUP BY, HAVING and ORDER BY.
However, not every position is meaningful or practical; the most useful cases are described below.
Consider two tables, a student table t_student and a class table t_class, associated through the class ID column (class_id).

1. Nested in SELECT:

Student information and class names are stored in different tables. To list each student's ID, name and class name in a single result:

SELECT s.student_id,s.student_name,(SELECT class_name FROM t_class c WHERE c.class_id=s.class_id) FROM t_student s GROUP BY s.student_id;

First, note that this statement uses table aliases. Writing FROM t_student s gives t_student the alias s, so that a column of t_student can later be referred to as s.student_id, making it explicit which table the column comes from.

Aliases are very useful in subqueries and join queries: when two tables have columns with the same name, or simply to improve readability, giving each table its own alias makes clear which column belongs to which table.

Aliases also matter when the main query and the subquery both operate on the same table. Giving the table a different alias in each makes it clear which columns belong to the main query and which to the subquery; there is an example of this below.

Now back to the statement above. The subquery (the part in parentheses) is nested in the SELECT list, separated by commas from the student ID and student name, which means it is one of the columns we want to retrieve. For each student, the subquery looks up the class in the class table whose class_id matches the student's class_id. The condition WHERE c.class_id=s.class_id is a good illustration of aliases: it distinguishes two columns with the same name in two different tables.

The final GROUP BY s.student_id can be understood as removing duplicate rows; without it, the result would contain repeated rows. (SELECT DISTINCT is the more conventional way to remove duplicate rows.)
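Here is a runnable sketch of the SELECT-list subquery, assuming Python's sqlite3 module and hypothetical rows with no duplicate students, so no GROUP BY is needed. One student points at a class_id missing from t_class, which shows that a subquery finding no row yields NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_class (class_id INTEGER, class_name TEXT)")
conn.execute(
    "CREATE TABLE t_student (student_id INTEGER, student_name TEXT, class_id INTEGER)"
)
# Hypothetical sample data.
conn.executemany("INSERT INTO t_class VALUES (?, ?)", [(1, "Class 1"), (2, "Class 2")])
conn.executemany(
    "INSERT INTO t_student VALUES (?, ?, ?)",
    [(1, "Zhang San", 1), (2, "Li Si", 2), (3, "Wang Wu", 1),
     (4, "Zhao Liu", 3)],  # class_id 3 has no row in t_class
)

# The subquery in the SELECT list is evaluated for each student row, matching on class_id.
rows = conn.execute(
    """
    SELECT s.student_id, s.student_name,
           (SELECT c.class_name FROM t_class c WHERE c.class_id = s.class_id) AS class_name
    FROM t_student s
    ORDER BY s.student_id
    """
).fetchall()

# A subquery that finds no matching row yields NULL (None in Python) for Zhao Liu.
print(rows)
```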

2. Nested in WHERE:

Now find the students with the highest C language ('C语言') score:

SELECT * FROM t_student WHERE student_subject='C语言' AND student_score>=ALL (SELECT student_score FROM t_student WHERE student_subject='C语言') ;
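SQLite, used here through Python's sqlite3 module as a stand-in, does not support the ALL operator. The sketch therefore swaps in an equivalent formulation: keep the rows whose score equals the subquery's MAX. This matches >= ALL as long as the score column contains no NULLs. The scores are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t_student (student_name TEXT, student_subject TEXT, student_score INTEGER)"
)
conn.executemany(
    "INSERT INTO t_student VALUES (?, ?, ?)",
    [("张三", "C语言", 90), ("李四", "C语言", 85), ("王五", "C语言", 90),
     ("张三", "Java", 80), ("李四", "Java", 95)],  # hypothetical scores
)

# Stand-in for ">= ALL (subquery)": compare with the subquery's MAX instead.
# Both students tied at 90 in C语言 (C language) are returned.
top_c = conn.execute(
    "SELECT * FROM t_student WHERE student_subject = 'C语言' AND student_score = "
    "(SELECT MAX(student_score) FROM t_student WHERE student_subject = 'C语言')"
).fetchall()

print(top_c)
```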

The ALL here is a subquery comparison operator. The subquery operators are:

The ALL operator: the value is compared with every row returned by the subquery, and the expression is true only if the comparison holds for all of them.
The ANY operator: the value is compared with every row returned by the subquery, and the expression is true if the comparison holds for at least one of them.
The EXISTS/NOT EXISTS operators: EXISTS is true if the subquery returns at least one row, and false otherwise. NOT EXISTS is the opposite.

When a query needs the maximum value of a column, ALL is commonly used for the comparison: a value that is greater than or equal to every value in the column is the maximum.
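EXISTS and NOT EXISTS are supported by SQLite, so they can be shown directly. A minimal sketch, assuming Python's sqlite3 module and hypothetical tables in which Class 3 has no students:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_class (class_id INTEGER, class_name TEXT)")
conn.execute("CREATE TABLE t_student (student_name TEXT, class_id INTEGER)")
# Hypothetical sample data: nobody is enrolled in Class 3.
conn.executemany(
    "INSERT INTO t_class VALUES (?, ?)",
    [(1, "Class 1"), (2, "Class 2"), (3, "Class 3")],
)
conn.executemany(
    "INSERT INTO t_student VALUES (?, ?)",
    [("张三", 1), ("李四", 1), ("王五", 2)],
)

# EXISTS: true when the subquery returns at least one row for this class.
with_students = [row[0] for row in conn.execute(
    "SELECT c.class_name FROM t_class c WHERE EXISTS "
    "(SELECT 1 FROM t_student s WHERE s.class_id = c.class_id) ORDER BY c.class_id"
)]
# NOT EXISTS: true when the subquery returns no rows.
without_students = [row[0] for row in conn.execute(
    "SELECT c.class_name FROM t_class c WHERE NOT EXISTS "
    "(SELECT 1 FROM t_student s WHERE s.class_id = c.class_id) ORDER BY c.class_id"
)]

print(with_students, without_students)  # ['Class 1', 'Class 2'] ['Class 3']
```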

To find the students whose C language score is higher than Li Si's ('李四'):

SELECT * FROM t_student WHERE student_subject='C语言' AND student_score >(SELECT student_score FROM t_student WHERE student_name='李四' AND student_subject='C语言');
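A sketch of the same query in Python's sqlite3 module, with hypothetical scores. Li Si's Java row shows why the subquery also filters on the subject: without that filter it would return two rows. MySQL raises an error in that case, whereas SQLite silently uses the first row, so the condition matters in both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t_student (student_name TEXT, student_subject TEXT, student_score INTEGER)"
)
conn.executemany(
    "INSERT INTO t_student VALUES (?, ?, ?)",
    [("张三", "C语言", 90), ("李四", "C语言", 85), ("王五", "C语言", 80),
     ("赵六", "C语言", 88), ("李四", "Java", 95)],  # hypothetical scores
)

# The subquery returns a single value (李四's C score, 85),
# which the outer WHERE compares against with ">".
better_than_li_si = conn.execute(
    "SELECT * FROM t_student WHERE student_subject = 'C语言' AND student_score > "
    "(SELECT student_score FROM t_student "
    " WHERE student_name = '李四' AND student_subject = 'C语言')"
).fetchall()

print(better_than_li_si)
```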

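These two examples show the role of subqueries nested in WHERE: the value(s) returned by the subquery serve as the comparison target, and the WHERE clause compares against them with a suitable operator. With a plain comparison operator such as > or =, the subquery should return a single value (MySQL, for example, raises an error if it returns more than one row); to compare against several rows, use ALL or ANY.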

Now back to the original question: how do we find the students with the highest score in each subject?

SELECT * FROM t_student s1 WHERE s1.student_score >= ALL(SELECT s2.student_score FROM t_student s2 WHERE s1.student_subject=s2.student_subject);

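This is the second use of aliases mentioned above: the main query and the subquery operate on the same table, and the aliases s1 (outer) and s2 (inner) distinguish their identically named columns. For each row s1, the subquery collects all scores in the same subject, and the row is kept only if its score is greater than or equal to all of them.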
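Because SQLite has no ALL operator, this sketch (Python sqlite3, hypothetical scores) swaps in a correlated subquery with MAX, which selects the same rows when scores are never NULL. Ties are kept: both Java students with 95 are returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t_student (student_name TEXT, student_subject TEXT, student_score INTEGER)"
)
conn.executemany(
    "INSERT INTO t_student VALUES (?, ?, ?)",
    [("张三", "C语言", 90), ("李四", "C语言", 85),
     ("张三", "Java", 80), ("李四", "Java", 95), ("王五", "Java", 95)],  # hypothetical
)

# Correlated subquery: for each outer row s1, find the top score in s1's subject.
top_per_subject = conn.execute(
    """
    SELECT * FROM t_student s1
    WHERE s1.student_score = (
        SELECT MAX(s2.student_score) FROM t_student s2
        WHERE s2.student_subject = s1.student_subject
    )
    """
).fetchall()

print(top_per_subject)
```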

3. Types of subqueries:

Correlated subqueries: depend on data from the outer query; conceptually, the subquery is executed once for each row the outer query processes.

Uncorrelated subqueries: independent of the outer query; the subquery is executed only once in total, and its result is then passed to the outer query.

Among the examples above, the first one, which looks up each student's class name, is a correlated subquery: WHERE c.class_id=s.class_id is the correlating condition. The per-subject highest-score query is also correlated, through s1.student_subject=s2.student_subject, even though it uses only one table. The highest C language score and the comparison with Li Si use uncorrelated subqueries.

Note that because a correlated subquery is, conceptually, re-executed for every row of the main query, it can be very time-consuming, especially on large tables.
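One common alternative is to move the work into an uncorrelated subquery in FROM, which computes each subject's maximum once and is then joined back to the table. The sketch below (Python sqlite3, hypothetical scores) only checks that both forms return the same rows; whether the rewrite is actually faster depends on the database's optimizer, which may already transform the correlated form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t_student (student_name TEXT, student_subject TEXT, student_score INTEGER)"
)
conn.executemany(
    "INSERT INTO t_student VALUES (?, ?, ?)",
    [("张三", "C语言", 90), ("李四", "C语言", 85),
     ("张三", "Java", 80), ("李四", "Java", 95), ("王五", "Java", 95)],  # hypothetical
)

# Correlated form: the subquery refers to s1, so it depends on each outer row.
correlated = conn.execute(
    "SELECT s1.student_name, s1.student_subject, s1.student_score FROM t_student s1 "
    "WHERE s1.student_score = (SELECT MAX(s2.student_score) FROM t_student s2 "
    "WHERE s2.student_subject = s1.student_subject)"
).fetchall()

# Uncorrelated form: the subquery in FROM does not refer to the outer query,
# so its per-subject maxima can be computed once and then joined back.
uncorrelated = conn.execute(
    "SELECT s.student_name, s.student_subject, s.student_score FROM t_student s "
    "JOIN (SELECT student_subject, MAX(student_score) AS max_score "
    "      FROM t_student GROUP BY student_subject) m "
    "ON s.student_subject = m.student_subject AND s.student_score = m.max_score"
).fetchall()

print(sorted(correlated) == sorted(uncorrelated))  # True
```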

5. Combined queries:

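The UNION operator combines the results of two queries vertically, appending the rows of the second result to those of the first. Both queries must return the same number of columns, and by default duplicate rows are removed. The basic form is: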

SELECT column1, column2 FROM table1
UNION
SELECT column3, column4 FROM table2;

Use UNION ALL to keep duplicate rows:

SELECT column1, column2 FROM table1
UNION ALL
SELECT column3, column4 FROM table2;

Combined queries are not used very often in practice, so they are only mentioned briefly here.
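For readers who want to see the duplicate handling concretely, here is a minimal sketch using Python's sqlite3 module. The two tables t_club_a and t_club_b and their contents are hypothetical, with one name appearing in both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two hypothetical tables with one name in common (李四).
conn.execute("CREATE TABLE t_club_a (student_name TEXT)")
conn.execute("CREATE TABLE t_club_b (student_name TEXT)")
conn.executemany("INSERT INTO t_club_a VALUES (?)", [("张三",), ("李四",)])
conn.executemany("INSERT INTO t_club_b VALUES (?)", [("李四",), ("王五",)])

# UNION removes the duplicate 李四 row.
union_rows = conn.execute(
    "SELECT student_name FROM t_club_a UNION SELECT student_name FROM t_club_b"
).fetchall()
# UNION ALL keeps every row from both queries.
union_all_rows = conn.execute(
    "SELECT student_name FROM t_club_a UNION ALL SELECT student_name FROM t_club_b"
).fetchall()

print(len(union_rows), len(union_all_rows))  # 3 4
```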