1. Aggregate functions:
SQL provides aggregate functions for summarizing data: counting rows, summing a column, finding its maximum value, and so on.
Classification:
- COUNT: counts the number of rows
- SUM: returns the total of a single column
- AVG: calculates the average of a column
- MAX: returns the maximum value of a column
- MIN: returns the minimum value of a column
First, create the data table as follows:
1. Count rows or column values (COUNT):
Standard format:
SELECT COUNT(<count specification>) FROM <table name>
The count specification can be:
- * : counts all selected rows, including rows that contain NULL values;
- ALL column name: counts every non-null value in the specified column; ALL is the default when nothing is written;
- DISTINCT column name: counts the unique non-null values in the specified column.
For example, to count how many students are in the class:
SELECT COUNT(*) FROM t_student;
Filter conditions can also be added, for example to count the female students ('女' means female):
SELECT COUNT(*) FROM t_student WHERE student_sex='女';
If you want to count the number of classes, you need to use DISTINCT:
SELECT COUNT(DISTINCT student_class) FROM t_student;
DISTINCT means deduplication. Without DISTINCT, the query counts every non-null student_class value, which here equals the number of rows in the table: 5.
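To see how the three count specifications differ, here is a small sketch that runs them from Python's built-in sqlite3 module against an in-memory database. The five rows are invented for illustration (the original table is not reproduced here), and one student deliberately has no class so that the NULL handling is visible.

```python
import sqlite3

# Hypothetical five-row table; all names and values are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t_student (student_name TEXT, student_sex TEXT, student_class TEXT)"
)
conn.executemany(
    "INSERT INTO t_student VALUES (?, ?, ?)",
    [
        ("A", "男", "class 1"),
        ("B", "女", "class 1"),
        ("C", "女", "class 2"),
        ("D", "男", "class 2"),
        ("E", "女", None),  # class not assigned, stored as NULL
    ],
)


def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


count_rows = scalar("SELECT COUNT(*) FROM t_student")                 # every row
count_non_null = scalar("SELECT COUNT(student_class) FROM t_student")  # skips NULL
count_distinct = scalar("SELECT COUNT(DISTINCT student_class) FROM t_student")
count_female = scalar("SELECT COUNT(*) FROM t_student WHERE student_sex='女'")

print(count_rows, count_non_null, count_distinct, count_female)
```

With this data, COUNT(*) sees all five rows, COUNT(student_class) skips the row whose class is NULL, and COUNT(DISTINCT student_class) collapses the remaining values into the two classes.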
2. Return the column total value (SUM):
Note: SUM accepts only two specifications, ALL and DISTINCT; there is no SUM(*).
Calculate the sum of the students' ages:
SELECT SUM(student_age) FROM t_student;
3. Return the column average (AVG):
Calculate the average age of students:
SELECT AVG(student_age) FROM t_student;
4. Return the maximum/minimum value (MAX/MIN):
Find the age of the oldest student (the minimum value works the same way):
SELECT MAX(student_age) FROM t_student;
Note: Only the maximum age itself is returned here. To display all the information of the oldest student, you need a subquery, which is covered in the subquery section later.
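As a quick check of SUM, AVG, MAX and MIN, the sketch below runs them on a made-up four-row table through Python's sqlite3 module. The rows are hypothetical, and one age is left as NULL to show that these functions skip NULL values. The last query is a small preview of the subquery idea from the note above: it uses the maximum age to fetch the matching row.

```python
import sqlite3

# Hypothetical data; one age is NULL on purpose.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_student (student_name TEXT, student_age INTEGER)")
conn.executemany(
    "INSERT INTO t_student VALUES (?, ?)",
    [("A", 18), ("B", 20), ("C", 22), ("D", None)],
)

# All four aggregates in one pass; the NULL age is ignored by each of them.
total_age, avg_age, max_age, min_age = conn.execute(
    "SELECT SUM(student_age), AVG(student_age), MAX(student_age), MIN(student_age) "
    "FROM t_student"
).fetchone()

# MAX alone returns only the number; comparing against it returns the whole row.
oldest = conn.execute(
    "SELECT student_name, student_age FROM t_student "
    "WHERE student_age = (SELECT MAX(student_age) FROM t_student)"
).fetchall()

print(total_age, avg_age, max_age, min_age, oldest)
```

Note that the average is taken over the three non-null ages, not over all four rows.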
2. GROUP BY:
In SQL, rows can be grouped by the values of a column, which is very useful with aggregate functions: the aggregate then returns one result per group.
For example, count the number of people in each class:
SELECT student_class,COUNT(ALL student_name) AS 总人数 FROM t_student GROUP BY (student_class);
AS defines an alias. Aliases are especially helpful in combination and join queries, which will be discussed later.
A WHERE filter can also be combined with grouping, but note the execution order: WHERE filtering → grouping → aggregate functions. Keep this in mind!
Count the number of students over the age of 20 in each class:
SELECT student_class,COUNT(student_name) AS 总人数 FROM t_student WHERE student_age >20 GROUP BY (student_class);
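The effect of adding WHERE before GROUP BY is easiest to see side by side. This is a minimal sketch using Python's sqlite3 module with invented rows; the alias `total` and the added ORDER BY (only there to make the output order predictable) are my own choices, not part of the queries above.

```python
import sqlite3

# Hypothetical rows: three students in class 1, two in class 2.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t_student (student_name TEXT, student_class TEXT, student_age INTEGER)"
)
conn.executemany(
    "INSERT INTO t_student VALUES (?, ?, ?)",
    [
        ("A", "class 1", 19),
        ("B", "class 1", 21),
        ("C", "class 1", 22),
        ("D", "class 2", 20),
        ("E", "class 2", 23),
    ],
)

# No filter: every row takes part in the grouping.
all_counts = conn.execute(
    "SELECT student_class, COUNT(student_name) AS total "
    "FROM t_student GROUP BY student_class ORDER BY student_class"
).fetchall()

# WHERE removes rows first, so only students over 20 are grouped and counted.
over_20_counts = conn.execute(
    "SELECT student_class, COUNT(student_name) AS total "
    "FROM t_student WHERE student_age > 20 "
    "GROUP BY student_class ORDER BY student_class"
).fetchall()

print(all_counts)
print(over_20_counts)
```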
3. HAVING filter conditions:
The execution order of WHERE filtering, grouping, and aggregate functions was mentioned above. What if we want to filter after aggregation?
For example, we want to find the classes whose average age is over 20.
Can the following statement be used?
SELECT student_class, AVG(student_age) FROM t_student WHERE AVG(student_age)>20 GROUP BY student_class;
No, this statement fails. WHERE is evaluated before the aggregate functions, so an aggregate function cannot appear in a WHERE condition.
This can be done with HAVING:
SELECT student_class,AVG(student_age) AS 平均年龄 FROM t_student GROUP BY (student_class) HAVING AVG(student_age)>20;
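Both statements can be tried out with Python's sqlite3 module. This is only a sketch on invented rows, and SQLite stands in for whatever database you use; the exact error message differs between databases, so the code just records that the WHERE version is rejected.

```python
import sqlite3

# Hypothetical rows: class 1 averages exactly 20, class 2 averages 23.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_student (student_class TEXT, student_age INTEGER)")
conn.executemany(
    "INSERT INTO t_student VALUES (?, ?)",
    [("class 1", 19), ("class 1", 21), ("class 2", 22), ("class 2", 24)],
)

# An aggregate inside WHERE is rejected before any row is read.
try:
    conn.execute(
        "SELECT student_class, AVG(student_age) FROM t_student "
        "WHERE AVG(student_age) > 20 GROUP BY student_class"
    )
    where_rejected = False
except sqlite3.Error as exc:
    where_rejected = True
    print("rejected:", exc)

# HAVING filters the groups after AVG has been computed for each of them.
having_rows = conn.execute(
    "SELECT student_class, AVG(student_age) FROM t_student "
    "GROUP BY student_class HAVING AVG(student_age) > 20"
).fetchall()

print(having_rows)
```

Class 1 is dropped because its average is exactly 20, which does not satisfy `> 20`.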
One more note here.
Execution order of SQL:
- Step 1: Execute FROM
- Step 2: Filter by WHERE condition
- Step 3: GROUP BY grouping
- Step 4: HAVING condition filtering
- Step 5: Execute SELECT to project the columns
- Step 6: Perform ORDER BY sorting
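A single query that uses every clause makes the roles easier to tell apart. The sketch below (Python's sqlite3 module, invented rows, hypothetical alias `n`) filters rows with WHERE, groups them, filters the groups with HAVING, and sorts what is left.

```python
import sqlite3

# Hypothetical rows spread over three classes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_student (student_class TEXT, student_age INTEGER)")
conn.executemany(
    "INSERT INTO t_student VALUES (?, ?)",
    [
        ("class 1", 18), ("class 1", 22), ("class 1", 24),
        ("class 2", 19), ("class 2", 21), ("class 2", 23), ("class 2", 26),
        ("class 3", 25),
    ],
)

rows = conn.execute(
    """
    SELECT student_class, COUNT(*) AS n
    FROM t_student
    WHERE student_age > 20      -- drops individual rows before grouping
    GROUP BY student_class      -- one group per class from the remaining rows
    HAVING COUNT(*) >= 2        -- drops whole groups after counting
    ORDER BY n DESC             -- sorts the surviving groups
    """
).fetchall()

print(rows)
```

After WHERE, class 1 keeps two rows, class 2 keeps three, and class 3 keeps one; HAVING then removes class 3, and ORDER BY puts class 2 first.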
4. Subqueries:
Why subqueries?
An existing data table is as follows:
Based on the previous knowledge, we can find the highest score in each subject, but not the information of the student who achieved it. To obtain the complete information, a subquery is needed.
What is a subquery? A subquery is a query nested within the main query.
Subqueries can be nested in most parts of the main query, including SELECT, FROM, WHERE, GROUP BY, HAVING, and ORDER BY.
But not every nesting position is meaningful or practical. Some practical subqueries are described here.
There are two tables, a student table and a class table, associated by class id.
1. Nested in SELECT:
The student information and the class name are in different tables. To return the student number, name, and class name in a single result:
SELECT s.student_id,s.student_name,(SELECT class_name FROM t_class c WHERE c.class_id=s.class_id) FROM t_student s GROUP BY s.student_id;
First of all, this SQL statement uses aliases. A table alias is written after the table name in FROM, as in FROM t_student s. When a column of t_student is referenced later, s.student_id makes it clear that the column comes from the table with that alias.
Aliases work very well in subqueries and join queries. When two tables have columns with the same name, or simply to improve readability, giving the tables different aliases shows which columns belong to which table.
In another case, the main query and the subquery (or both sides of a join) operate on the same table. Giving the table a different alias in each place distinguishes which columns are operated on in the main query and which in the subquery. There are examples below.
Next, go back to the SQL statement above. The subquery is nested in the SELECT position (the part enclosed in parentheses) and is separated from the student ID and student name by commas; that is, it is one of the columns we want to return.
The subquery finds the class whose class id in the class table matches the class id in the student table. Note WHERE c.class_id=s.class_id: this is a good illustration of aliases, distinguishing columns with the same name in two tables.
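Here is a minimal sketch of a subquery in the SELECT position, run through Python's sqlite3 module. Both tables and their rows are invented, and each student appears only once, so the GROUP BY is left out and an ORDER BY is added just to fix the output order. The third student points at a class id that does not exist, to show what the subquery returns when nothing matches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables linked by class_id.
conn.executescript(
    """
    CREATE TABLE t_class (class_id INTEGER, class_name TEXT);
    CREATE TABLE t_student (student_id INTEGER, student_name TEXT, class_id INTEGER);
    INSERT INTO t_class VALUES (1, 'class 1'), (2, 'class 2');
    INSERT INTO t_student VALUES (1, 'A', 1), (2, 'B', 2), (3, 'C', 9);
    """
)

rows = conn.execute(
    """
    SELECT s.student_id,
           s.student_name,
           -- evaluated for each student row, using that row's class_id
           (SELECT class_name FROM t_class c WHERE c.class_id = s.class_id) AS class_name
    FROM t_student s
    ORDER BY s.student_id
    """
).fetchall()

print(rows)
```

A subquery in this position has to return at most one value per outer row; when it finds no matching class, the column is NULL (None in Python).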
Result:
The final GROUP BY can be understood as removing duplicate rows. If it is not added:
2. Nested in WHERE:
Now find the information of the students with the highest C language score:
SELECT * FROM t_student WHERE student_subject='C语言' AND student_score>=ALL (SELECT student_score FROM t_student WHERE student_subject='C语言') ;
result:
ALL here is a subquery operator.
Classification:
- The ALL operator: the value is compared with the results of the subquery one by one, and the expression is true only when every comparison is satisfied.
- The ANY operator: the value is compared with the results of the subquery one by one, and the expression is true if at least one record satisfies the condition.
- The EXISTS/NOT EXISTS operator: EXISTS checks whether the subquery returns any data; if it does, the expression is true, otherwise it is false. NOT EXISTS is the opposite.
When a subquery or join query needs the maximum value of a column, ALL is commonly used for the comparison: a value that is greater than or equal to the value in every row is the maximum.
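The operators can be tried out with Python's sqlite3 module, with one caveat: SQLite does not support ALL and ANY, so this sketch swaps in the equivalent MAX and MIN comparisons (equivalent as long as the subquery returns at least one row and no NULLs). EXISTS and NOT EXISTS are used as they are. All rows are invented for illustration.

```python
import sqlite3

# Hypothetical score table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t_student (student_name TEXT, student_subject TEXT, student_score INTEGER)"
)
conn.executemany(
    "INSERT INTO t_student VALUES (?, ?, ?)",
    [
        ("张三", "C语言", 80),
        ("李四", "C语言", 90),
        ("王五", "C语言", 70),
        ("张三", "Java", 85),
        ("李四", "Java", 60),
    ],
)


def names(sql):
    """Return the first column of every result row as a set."""
    return {row[0] for row in conn.execute(sql)}


# Stand-in for: student_score >= ALL (all C scores)
top_c = names("""
    SELECT student_name FROM t_student
    WHERE student_subject='C语言'
      AND student_score >= (SELECT MAX(student_score) FROM t_student
                            WHERE student_subject='C语言')
""")

# Stand-in for: student_score > ANY (all C scores)
above_some_c = names("""
    SELECT student_name FROM t_student
    WHERE student_subject='C语言'
      AND student_score > (SELECT MIN(student_score) FROM t_student
                           WHERE student_subject='C语言')
""")

# EXISTS: students with at least one score below 65.
has_low_score = names("""
    SELECT s1.student_name FROM t_student s1
    WHERE EXISTS (SELECT 1 FROM t_student s2
                  WHERE s2.student_name = s1.student_name
                    AND s2.student_score < 65)
""")

# NOT EXISTS: students with no score below 65.
no_low_score = names("""
    SELECT s1.student_name FROM t_student s1
    WHERE NOT EXISTS (SELECT 1 FROM t_student s2
                      WHERE s2.student_name = s1.student_name
                        AND s2.student_score < 65)
""")

print(top_c, above_some_c, has_low_score, no_low_score)
```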
To find out information about students with higher C scores than Li Si:
SELECT * FROM t_student WHERE student_subject='C语言' AND student_score >(SELECT student_score FROM t_student WHERE student_name='李四' AND student_subject='C语言');
These two examples show the role of a subquery nested in WHERE: the column values returned by the subquery serve as the comparison target, and the WHERE clause compares against them with whichever comparison operator is needed to obtain the result.
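The Li Si query can be broken into its two halves to see what is being compared. In this sketch (Python's sqlite3 module, invented scores) the inner query is first run on its own, and then the full statement is run with the inner query in place.

```python
import sqlite3

# Hypothetical C language scores; only the name 李四 comes from the query above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t_student (student_name TEXT, student_subject TEXT, student_score INTEGER)"
)
conn.executemany(
    "INSERT INTO t_student VALUES (?, ?, ?)",
    [
        ("张三", "C语言", 80),
        ("李四", "C语言", 75),
        ("王五", "C语言", 70),
        ("赵六", "C语言", 88),
    ],
)

# The inner query by itself returns a single value: Li Si's C score.
li_si_score = conn.execute(
    "SELECT student_score FROM t_student "
    "WHERE student_name='李四' AND student_subject='C语言'"
).fetchone()[0]

# The outer query compares every C score against that value.
higher = {
    row[0]
    for row in conn.execute(
        "SELECT student_name FROM t_student "
        "WHERE student_subject='C语言' AND student_score > "
        "(SELECT student_score FROM t_student "
        " WHERE student_name='李四' AND student_subject='C语言')"
    )
}

print(li_si_score, higher)
```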
Now let's go back to the original question, how to find out the information of the students with the highest grades in each course:
SELECT * FROM t_student s1 WHERE s1.student_score >= ALL(SELECT s2.student_score FROM t_student s2 WHERE s1.`student_subject`=s2.student_subject);
This is the second use of aliases mentioned above: the main query and the subquery operate on the same table, and the aliases distinguish the identically named columns of the inner and outer tables.
result:
3. Classification of subqueries:
- Correlated subqueries: execution relies on data from the outer query. The subquery is executed once for each row the outer query returns.
- Uncorrelated subqueries: a subquery independent of the outer query. The subquery is executed once in total, after which its value is passed to the outer query.
Among the examples above, the first one, which matches students to class names, is a correlated subquery; WHERE c.class_id=s.class_id is the correlating condition. The last one, which finds the top students in each course, is also correlated, because its subquery references s1.student_subject from the outer query. The highest C language score and the comparison with Li Si are uncorrelated subqueries: their subqueries never reference the outer query.
Note that a correlated subquery is executed once for every row of the main query, which can be very time-consuming when there is a lot of data.
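To compare the two kinds, the sketch below finds the top student in each course twice with Python's sqlite3 module: once with a correlated subquery, and once with an uncorrelated grouped subquery joined back to the table. The second form is my own rewrite, not something from the examples above, and MAX is used in place of `>= ALL` because SQLite does not support ALL. The rows are invented. Whether the second form is actually faster depends on the database and its optimizer.

```python
import sqlite3

# Hypothetical score table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t_student (student_name TEXT, student_subject TEXT, student_score INTEGER)"
)
conn.executemany(
    "INSERT INTO t_student VALUES (?, ?, ?)",
    [
        ("张三", "C语言", 80),
        ("李四", "C语言", 90),
        ("王五", "C语言", 70),
        ("张三", "Java", 85),
        ("李四", "Java", 60),
    ],
)

# Correlated: the inner query refers to s1, so it depends on the current outer row.
correlated = set(conn.execute(
    """
    SELECT s1.student_name, s1.student_subject, s1.student_score
    FROM t_student s1
    WHERE s1.student_score = (SELECT MAX(s2.student_score)
                              FROM t_student s2
                              WHERE s2.student_subject = s1.student_subject)
    """
))

# Uncorrelated: the grouped subquery stands alone and is joined to the table afterwards.
uncorrelated = set(conn.execute(
    """
    SELECT s.student_name, s.student_subject, s.student_score
    FROM t_student s
    JOIN (SELECT student_subject, MAX(student_score) AS top_score
          FROM t_student
          GROUP BY student_subject) m
      ON m.student_subject = s.student_subject
     AND m.top_score = s.student_score
    """
))

print(correlated)
print(uncorrelated)
```

Both queries return the same rows here, one top student per course.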
5. Combination query:
The UNION operator stacks the results of two queries vertically and removes duplicate rows. Both queries must return the same number of columns. The basic form is:
SELECT col1, col2 FROM table1
UNION
SELECT col3, col4 FROM table2;
Use UNION ALL to keep duplicate rows:
SELECT col1, col2 FROM table1
UNION ALL
SELECT col3, col4 FROM table2;
Combination queries aren't used very often, so I'll just mention them briefly here without giving an example.