GROUP BY queries and SQL execution order.

GROUP BY groups the rows that a SELECT statement retrieves. Before using it, you need to know a few important rules.

  • The GROUP BY clause can contain any number of columns. This lets you nest groups, which gives more granular control over how data is grouped.
  • If the GROUP BY clause names several columns, data is aggregated at the last specified level: all listed columns are evaluated together, so you get one row per distinct combination rather than a subtotal for each individual column.
  • Each column listed in the GROUP BY clause must be a retrieved column or a valid expression (but not an aggregate function). If an expression is used in the SELECT, the same expression must be specified in the GROUP BY clause. Standard SQL does not allow column aliases here, although some DBMSs, MySQL among them, accept them.
  • Apart from aggregate calculations, every column in the SELECT statement must appear in the GROUP BY clause.
  • If the grouping column contains NULL values, NULL is returned as a group. If several rows have NULL in that column, they are all grouped together.
  • The GROUP BY clause must come after the WHERE clause and before the ORDER BY.
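
The rules about multi-column grouping and NULL groups can be sketched as follows. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the orders table, its columns, and its rows are hypothetical.

```python
import sqlite3

# Hypothetical sample data: table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, product TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("east", "apple", 10),
        ("east", "apple", 5),
        ("east", "pear", 7),
        ("west", "apple", 3),
        (None, "pear", 2),
        (None, "pear", 4),
    ],
)

# Grouping on two columns gives one row per distinct (region, product) pair.
# Every non-aggregated column in SELECT also appears in GROUP BY.
rows_by_pair = conn.execute(
    """
    SELECT region, product, SUM(amount) AS total
    FROM orders
    GROUP BY region, product
    ORDER BY region, product
    """
).fetchall()

for row in rows_by_pair:
    print(row)
```

The two rows whose region is NULL end up in a single group, and there is no separate subtotal for "east" alone, only one row per (region, product) combination.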

Filtering groups

Use the HAVING clause to filter groups. HAVING supports every operator that WHERE does. The difference between HAVING and WHERE is that WHERE filters rows, while HAVING filters groups.

Another way to understand the difference between WHERE and HAVING is that WHERE filters before grouping while HAVING filters after grouping on a per-group basis.
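
A small sketch of the difference, assuming Python's built-in sqlite3 module as a stand-in for MySQL and a hypothetical orders table:

```python
import sqlite3

# Hypothetical sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ann", 50), ("ann", 200), ("bob", 30), ("bob", 40), ("cid", 300)],
)

# WHERE removes individual rows before grouping:
# only orders of at least 40 are counted.
where_rows = conn.execute(
    """
    SELECT customer, COUNT(*) AS n
    FROM orders
    WHERE amount >= 40
    GROUP BY customer
    ORDER BY customer
    """
).fetchall()

# HAVING removes whole groups after grouping:
# only customers with two or more orders remain.
having_rows = conn.execute(
    """
    SELECT customer, COUNT(*) AS n
    FROM orders
    GROUP BY customer
    HAVING COUNT(*) >= 2
    ORDER BY customer
    """
).fetchall()

# Both together: filter rows first, then filter the resulting groups.
both_rows = conn.execute(
    """
    SELECT customer, COUNT(*) AS n
    FROM orders
    WHERE amount >= 40
    GROUP BY customer
    HAVING COUNT(*) >= 2
    ORDER BY customer
    """
).fetchall()

print(where_rows, having_rows, both_rows)
```

In the last query bob drops out because WHERE has already removed one of his two orders before HAVING counts the group.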

Grouping and sorting

In general, whenever you use a GROUP BY clause you should also specify an ORDER BY clause. GROUP BY alone does not guarantee any particular output order, so ORDER BY is the only way to be sure the data is sorted correctly.
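
As a sketch, the grouped result can be sorted explicitly on the aggregate. The example assumes Python's built-in sqlite3 module as a stand-in for MySQL; the orders table and its data are hypothetical.

```python
import sqlite3

# Hypothetical sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ann", 50), ("ann", 200), ("bob", 30), ("bob", 40), ("cid", 300)],
)

# Without ORDER BY the order of the groups is unspecified.
# Sorting on the aggregate alias makes the output order explicit.
sorted_rows = conn.execute(
    """
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    ORDER BY total DESC
    """
).fetchall()

print(sorted_rows)
```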

Execution order of a SQL SELECT statement:

  1. The FROM clause assembles data from the data sources;
  2. The WHERE clause filters rows based on the specified conditions;
  3. The GROUP BY clause divides the data into groups;
  4. Aggregate functions are computed for each group;
  5. The HAVING clause filters the groups;
  6. All expressions in the SELECT list are evaluated;
  7. ORDER BY sorts the result set;
  8. The result set is returned.
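
One visible consequence of this order is that an aggregate cannot appear in WHERE, because WHERE runs before the groups exist. The sketch below assumes Python's built-in sqlite3 module as a stand-in for MySQL, with a hypothetical grades table; the exact error text differs between database systems.

```python
import sqlite3

# Hypothetical table and data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grades (name TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO grades VALUES (?, ?)",
    [("ann", 580), ("ann", 620), ("bob", 590)],
)

# WHERE runs before grouping, so an aggregate is rejected there.
try:
    conn.execute(
        "SELECT name, MAX(score) FROM grades WHERE MAX(score) > 600 GROUP BY name"
    ).fetchall()
    where_rejected = False
except sqlite3.Error as exc:
    where_rejected = True
    print("rejected:", exc)

# HAVING runs after aggregation, so the same condition is valid there.
having_rows = conn.execute(
    "SELECT name, MAX(score) FROM grades GROUP BY name HAVING MAX(score) > 600"
).fetchall()
print(having_rows)
```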

Consider an example.

select 考生姓名, max(总成绩) as max总成绩
from tb_Grade
where 考生姓名 is not null
group by 考生姓名
having max(总成绩) > 600
order by max总成绩

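The statement in the example above executes in the following order:

  1. Execute the FROM clause to read the source data from the tb_Grade table.
  2. Execute the WHERE clause to keep only the rows of tb_Grade whose 考生姓名 is not NULL.
  3. Execute the GROUP BY clause to group the remaining rows by the 考生姓名 column.
  4. Compute the max() aggregate function to find the highest 总成绩 in each group.
  5. Execute the HAVING clause to keep only the groups whose highest 总成绩 is greater than 600.
  6. Execute the ORDER BY clause to sort the final result by max总成绩. The alias is available at this point because the SELECT list has already been evaluated.
Note: if the query uses JOIN with ON, the ON condition is applied first, then the JOIN, and only after that the WHERE clause.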
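
To try the example query, here is a minimal sketch that runs it against a few made-up rows. It assumes Python's built-in sqlite3 module as a stand-in for MySQL; the column types and the sample rows in tb_Grade are hypothetical, only the query itself comes from the text above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column types and rows are assumptions made for this illustration.
conn.execute("CREATE TABLE tb_Grade (考生姓名 TEXT, 总成绩 INTEGER)")
conn.executemany(
    "INSERT INTO tb_Grade VALUES (?, ?)",
    [
        ("张三", 580),
        ("张三", 620),
        ("李四", 590),
        ("王五", 700),
        (None, 650),  # dropped by WHERE ... IS NOT NULL
    ],
)

result = conn.execute(
    """
    select 考生姓名, max(总成绩) as max总成绩
    from tb_Grade
    where 考生姓名 is not null
    group by 考生姓名
    having max(总成绩) > 600
    order by max总成绩
    """
).fetchall()

for name, best in result:
    print(name, best)
```

The NULL row is removed by WHERE, 李四 is removed by HAVING because the highest score in that group is 590, and the two remaining groups are sorted by their maximum.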

 
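Appendix:
Aggregate functions in MySQL:
1. count() returns the number of rows in a column
2. avg() returns the average value of a column
3. max() returns the largest value in a column
4. min() returns the smallest value in a column
5. sum() returns the sum of a column's values
6. distinct removes duplicate values (it is a keyword used inside an aggregate, as in count(distinct column), rather than a function)
Note: avg() ignores rows whose value is NULL; count(*) counts all rows, while count(column) ignores rows where the column is NULL.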
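
The NULL handling described in the note can be checked with a short sketch. It assumes Python's built-in sqlite3 module as a stand-in for MySQL, and the scores table is hypothetical.

```python
import sqlite3

# Hypothetical single-column table with one NULL and one duplicate value.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (score INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?)",
    [(10,), (20,), (20,), (None,), (30,)],
)

stats = conn.execute(
    """
    SELECT COUNT(*),              -- counts every row, including the NULL one
           COUNT(score),          -- skips rows where score is NULL
           COUNT(DISTINCT score), -- skips NULL and counts each value once
           AVG(score),            -- NULL is ignored: 80 / 4
           SUM(score),
           MAX(score),
           MIN(score)
    FROM scores
    """
).fetchone()

print(stats)
```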

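MySQL statement execution order

A MySQL query is processed in 11 steps, as shown in the steps below. FROM is always executed first and LIMIT last. Each step produces a virtual table that serves as the input to the next step. These virtual tables are invisible to the user, and only the last one is returned as the result. If a clause is not specified in the statement, the corresponding step is skipped.

Let us look at each stage of query processing in detail.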

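  1. FROM: Compute the Cartesian product of the left table and the right table in the FROM clause, producing virtual table VT1.
  2. ON: Apply the ON filter to VT1. Only rows that satisfy <join-condition> are recorded in virtual table VT2.
  3. JOIN: If an OUTER JOIN is specified (for example LEFT JOIN or RIGHT JOIN), the unmatched rows of the preserved table are added to VT2 as outer rows, producing virtual table VT3. If the FROM clause contains more than two tables, steps 1 to 3 are repeated on the result VT3 of the previous join and the next table, until all tables have been processed.
  4. WHERE: Apply the WHERE filter to VT3. Only rows that satisfy <where-condition> are inserted into virtual table VT4.
  5. GROUP BY: Group the rows of VT4 by the columns in the GROUP BY clause, producing VT5.
  6. CUBE | ROLLUP: Apply the CUBE or ROLLUP operation to VT5, producing VT6. (MySQL itself supports only WITH ROLLUP.)
  7. HAVING: Apply the HAVING filter to VT6. Only rows that satisfy <having-condition> are inserted into virtual table VT7.
  8. SELECT: Execute the SELECT operation, placing the specified columns into virtual table VT8.
  9. DISTINCT: Remove duplicate rows from VT8, producing virtual table VT9.
  10. ORDER BY: Sort the rows of VT9 according to <order_by_list>, producing virtual table VT10.
  11. LIMIT: Take the specified rows, producing virtual table VT11, and return the result.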
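
Steps 2 to 4 explain why the same condition behaves differently in ON and in WHERE for an outer join. The sketch below assumes Python's built-in sqlite3 module as a stand-in for MySQL; the customers and orders tables are hypothetical.

```python
import sqlite3

# Hypothetical tables and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "ann"), (2, "bob")])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)", [(1, 100), (1, 20), (2, 10)]
)

# Condition in ON: applied in step 2, before outer rows are added in step 3.
# bob has no matching order, so he is preserved with a NULL amount.
on_rows = conn.execute(
    """
    SELECT c.name, o.amount
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id AND o.amount >= 50
    ORDER BY c.name
    """
).fetchall()

# Condition in WHERE: applied in step 4, after the outer join is complete,
# so every joined row that fails the test is filtered out, including bob's.
where_rows = conn.execute(
    """
    SELECT c.name, o.amount
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    WHERE o.amount >= 50
    ORDER BY c.name
    """
).fetchall()

print(on_rows)
print(where_rows)
```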
