分组
我们想要查看每个柜员创建了多少账户。利用group by实现分组;利用聚集函数count()计算每个分组的行数,*表示对分组的所有列计数。
若是想过滤数据,并且过滤条件针对分组数据而不是原始数据。由于group by子句在where子句被评估之后运行,所以无法利用where子句增加过滤条件。所以必须使用having子句。
mysql> SELECT open_emp_id,COUNT(*) how_many FROM account
-> WHERE COUNT(*)>4 GROUP BY open_emp_id; #在where子句时,分组还未被创建
ERROR 1111 (HY000): Invalid use of group function
mysql> SELECT open_emp_id,COUNT(*) how_many FROM account
-> GROUP BY open_emp_id HAVING COUNT(*)>4;
+-------------+----------+
| open_emp_id | how_many |
+-------------+----------+
| 1 | 8 |
| 10 | 7 |
| 16 | 6 |
+-------------+----------+
聚集函数
Max() | 返回集合中最大值 | 忽略遇到的null值 |
Min() | 返回集合中最小值 | |
Avg() | 返回集合中平均值 | |
Sum() | 返回集合中所有值的和 | |
Count() | 返回集合中值的个数 |
mysql> SELECT MAX(avail_balance) max_balance,MIN(avail_balance) min_balance,
-> AVG(avail_balance) avg_balance, SUM(avail_balance) sum_balance,
-> COUNT(*) num_accounts FROM account WHERE product_cd = 'CHK';
+-------------+-------------+-------------+-------------+--------------+
| max_balance | min_balance | avg_balance | sum_balance | num_accounts |
+-------------+-------------+-------------+-------------+--------------+
| 38552.05 | 122.37 | 7300.800985 | 73008.01 | 10 |
+-------------+-------------+-------------+-------------+--------------+
除了使用列作为聚集函数的参数外,还可以创建表达式作为参数。
mysql> SELECT MAX(pending_balance - avail_balance) max_uncleared FROM account;
+---------------+
| max_uncleared |
+---------------+
| 660.00 |
+---------------+
-
隐式显式分组
在上一例子中,用过滤条件product_cd = 'CHK'指定分组上的所有行,属于隐式分组。我们想获取所有product_cd的分组结果,即需要额外的一列product_cd 值。
mysql> SELECT product_cd MAX(avail_balance) max_balance,MIN(avail_balance) min_balance,
-> AVG(avail_balance) avg_balance, SUM(avail_balance) sum_balance,
-> COUNT(*) num_accounts FROM account;
由于上面命令并没有显式地指定如何对数据分组而导致查询失败,所以要为它增加一个group by子句以指定聚集函数所作用行的分组
mysql> SELECT product_cd, MAX(avail_balance) max_balance,MIN(avail_balance) min_balance,
-> AVG(avail_balance) avg_balance, SUM(avail_balance) sum_balance,
-> COUNT(*) num_accounts FROM account GROUP BY product_cd;
+------------+-------------+-------------+--------------+-------------+--------------+
| product_cd | max_balance | min_balance | avg_balance | sum_balance | num_accounts |
+------------+-------------+-------------+--------------+-------------+--------------+
| BUS | 9345.55 | 0.00 | 4672.774902 | 9345.55 | 2 |
| CD | 10000.00 | 1500.00 | 4875.000000 | 19500.00 | 4 |
| CHK | 38552.05 | 122.37 | 7300.800985 | 73008.01 | 10 |
| MM | 9345.55 | 2212.50 | 5681.713216 | 17045.14 | 3 |
| SAV | 767.77 | 200.00 | 463.940002 | 1855.76 | 4 |
| SBL | 50000.00 | 50000.00 | 50000.000000 | 50000.00 | 1 |
+------------+-------------+-------------+--------------+-------------+--------------+
-
count()
当使用count确定每个分组成员数目时,可以选择对分组中的所有成员计数还是利用distinct关键字只计数某个列的不同值
mysql> SELECT account_id,open_emp_id FROM account;
+------------+-------------+
| account_id | open_emp_id |
+------------+-------------+
| 10 | 1 |
| 11 | 1 |
| 12 | 1 |
| 14 | 1 |
| 15 | 1 |
| 21 | 1 |
| 22 | 1 |
| 23 | 1 |
| 1 | 10 |
| 2 | 10 |
| 3 | 10 |
| 4 | 10 |
| 5 | 10 |
| 17 | 10 |
| 27 | 10 |
| 7 | 13 |
| 8 | 13 |
| 29 | 13 |
| 13 | 16 |
| 18 | 16 |
| 19 | 16 |
| 24 | 16 |
| 25 | 16 |
| 28 | 16 |
+------------+-------------+
mysql> SELECT COUNT(open_emp_id) FROM account;
+--------------------+
| COUNT(open_emp_id) |
+--------------------+
| 24 |
+--------------------+
mysql> SELECT COUNT(DISTINCT open_emp_id) FROM account;
+-----------------------------+
| COUNT(DISTINCT open_emp_id) |
+-----------------------------+
| 4 |
+-----------------------------+
分组形式
-
单列分组
mysql> SELECT product_cd, SUM(avail_balance) prod_balance FROM account GROUP BY product_cd;
+------------+--------------+
| product_cd | prod_balance |
+------------+--------------+
| BUS | 9345.55 |
| CD | 19500.00 |
| CHK | 73008.01 |
| MM | 17045.14 |
| SAV | 1855.76 |
| SBL | 50000.00 |
+------------+--------------+
-
多行分组
mysql> SELECT product_cd, open_branch_id, SUM(avail_balance) tot_balance FROM account
-> GROUP BY product_cd, open_branch_id ORDER BY product_cd;
+------------+----------------+-------------+
| product_cd | open_branch_id | tot_balance |
+------------+----------------+-------------+
| BUS | 2 | 9345.55 |
| BUS | 4 | 0.00 |
| CD | 1 | 11500.00 |
| CD | 2 | 8000.00 |
| CHK | 1 | 782.16 |
| CHK | 2 | 3315.77 |
| CHK | 3 | 1057.75 |
| CHK | 4 | 67852.33 |
| MM | 1 | 14832.64 |
| MM | 3 | 2212.50 |
| SAV | 1 | 767.77 |
| SAV | 2 | 700.00 |
| SAV | 4 | 387.99 |
| SBL | 3 | 50000.00 |
+------------+----------------+-------------+
-
表达式分组
mysql> SELECT EXTRACT(YEAR FROM start_date) year, COUNT(*) how_many FROM employee
-> GROUP BY EXTRACT(YEAR FROM start_date);
+------+----------+
| year | how_many |
+------+----------+
| 2005 | 2 |
| 2006 | 8 |
| 2015 | 1 |
| 2007 | 3 |
| 2008 | 2 |
| 2004 | 2 |
+------+----------+
-
对每个分组产生合计数
对结果进行分组后,每一个分组的总和也要显示在每组数据的下面。利用with rollup选项来请求数据库服务器完成。
mysql> SELECT product_cd, open_branch_id, SUM(avail_balance) tot_balance
-> FROM account GROUP BY product_cd, open_branch_id WITH ROLLUP;
+------------+----------------+-------------+
| product_cd | open_branch_id | tot_balance |
+------------+----------------+-------------+
| BUS | 2 | 9345.55 |
| BUS | 4 | 0.00 |
| BUS | NULL | 9345.55 |
| CD | 1 | 11500.00 |
| CD | 2 | 8000.00 |
| CD | NULL | 19500.00 |
| CHK | 1 | 782.16 |
| CHK | 2 | 3315.77 |
| CHK | 3 | 1057.75 |
| CHK | 4 | 67852.33 |
| CHK | NULL | 73008.01 |
| MM | 1 | 14832.64 |
| MM | 3 | 2212.50 |
| MM | NULL | 17045.14 |
| SAV | 1 | 767.77 |
| SAV | 2 | 700.00 |
| SAV | 4 | 387.99 |
| SAV | NULL | 1855.76 |
| SBL | 3 | 50000.00 |
| SBL | NULL | 50000.00 |
| NULL | NULL | 170754.46 |
+------------+----------------+-------------+
-
分组过滤条件
mysql> SELECT product_cd, SUM(avail_balance) prod_balance FROM account WHERE status = 'ACTIVE'
-> GROUP BY product_cd HAVING SUM(avail_balance) >= 10000;
+------------+--------------+
| product_cd | prod_balance |
+------------+--------------+
| CD | 19500.00 |
| CHK | 73008.01 |
| MM | 17045.14 |
| SBL | 50000.00 |
+------------+--------------+
mysql> SELECT product_cd, SUM(avail_balance) prod_balance FROM account WHERE status = 'ACTIVE'
-> GROUP BY product_cd HAVING MAX(avail_balance) <= 10000 AND MIN(avail_balance) >= 1000;
+------------+--------------+
| product_cd | prod_balance |
+------------+--------------+
| CD | 19500.00 |
| MM | 17045.14 |
+------------+--------------+