SQL: handling repeated column values to clarify grouping and partitioning


1. Group statistics and partition rankings

1. Syntax and meaning:

If the query results in this part are confusing, read part 2 (SQL: handling repeated column values to clarify grouping and partitioning). It contains the SQL statements for creating the table and inserting the data.


Group statistics: GROUP BY is used in conjunction with statistical/aggregation functions

-- Example: count the number of male and female students, grouped by sex
select sex,count(distinct id) sex_num from student_score group by sex;
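
The example uses count(distinct id) rather than count(*) because the table holds one row per student per subject. A minimal sketch of the difference, assuming the student_score table and sample rows from part 2 (the row_num alias is added here for illustration):

```sql
-- Assumes the student_score table and sample rows defined in part 2.
select sex,
       count(*)           as row_num,  -- counts score rows (3 per student)
       count(distinct id) as sex_num   -- counts students
from student_score
group by sex;
-- With the sample data: 男 -> row_num 3, sex_num 1; 女 -> row_num 6, sex_num 2
```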

Partition ranking: ROW_NUMBER() OVER(PARTITION BY partition_column ORDER BY sort_column [ASC|DESC])

-- Example: partition by sex (male, female) and rank by score in descending order
select id,name,sex,score,
ROW_NUMBER() OVER(PARTITION BY sex ORDER BY score DESC) rn
from student_score;
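
The sample data contains tied scores, and ROW_NUMBER() still assigns distinct numbers to tied rows (in no guaranteed order among the ties). As a hedged side-by-side sketch, the standard RANK() and DENSE_RANK() window functions can be placed next to it; the rk and drk aliases are illustrative and not part of the original query:

```sql
-- Assumes the student_score table and sample rows from part 2 (MySQL 8.0 or later).
select id, name, sex, subject, score,
       row_number() over (partition by sex order by score desc) as rn,
       rank()       over (partition by sex order by score desc) as rk,
       dense_rank() over (partition by sex order by score desc) as drk
from student_score;
-- For the 男 partition (scores 80, 80, 66): rn = 1, 2, 3; rk = 1, 1, 3; drk = 1, 1, 2
```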


2. Usage notes:

▷ The ranking function ROW_NUMBER() requires MySQL 8.0 or later.

▷ A common error with GROUP BY:

Under ONLY_FULL_GROUP_BY, every non-aggregated column in the SELECT list must appear in the GROUP BY clause (or be functionally dependent on the columns that do).

Error: Expression #1 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'database.table.column' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by

In other words, the first expression in the SELECT list refers to a column that is neither aggregated nor grouped.
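
A minimal sketch of how this error can arise and two possible ways around it, assuming the student_score table from part 2 and a session whose sql_mode includes ONLY_FULL_GROUP_BY. The aliases score_rows and sample_name are illustrative:

```sql
-- Check whether ONLY_FULL_GROUP_BY is active in this session
select @@sql_mode;

-- Rejected: name is neither aggregated nor listed in GROUP BY
-- select sex, name, count(distinct id) as sex_num from student_score group by sex;

-- Option 1: group by every non-aggregated column
select sex, name, count(*) as score_rows
from student_score
group by sex, name;

-- Option 2: keep grouping by sex and wrap the other column in an aggregate
-- (ANY_VALUE() returns the value from an arbitrary row of the group)
select sex, any_value(name) as sample_name, count(distinct id) as sex_num
from student_score
group by sex;
```

Which option fits depends on whether the extra column is really part of the grouping key or only needed for display.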

▷ A common error with ROW_NUMBER:

  • The cause is usually the partition expression, so check it carefully. For example, in Hive, with the partition expression get_json_object(map_col,'$.title'), leaving out one of the ' characters produces:

Error: Failed to breakup Windowing invocations into Groups. At least 1 group must only depend on input columns. Also check for circular dependencies.
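
One way to make a mistake in the partition expression easier to spot is to compute the expression once in a subquery, so that the window clause refers only to a plain column. This is a HiveQL sketch, not the author's query: some_table and event_time are hypothetical names, and only get_json_object(map_col,'$.title') comes from the text above.

```sql
-- HiveQL sketch; some_table and event_time are hypothetical.
select title,
       row_number() over (partition by title order by event_time desc) as rn
from (
    select get_json_object(map_col, '$.title') as title,  -- both quotes present
           event_time
    from some_table
) t;
```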



2. SQL: handling repeated column values to clarify grouping and partitioning

1. SQL statements for creating the table and inserting the data

DROP TABLE IF EXISTS `student_score`;
CREATE TABLE `student_score` (
  `id`  int(6),
  `name` varchar(255),
  `sex` varchar(255),
  `subject` varchar(30),
  `score` float
) ENGINE = InnoDB;


INSERT INTO `student_score` VALUES (1, '小明', '男','语文', 80);
INSERT INTO `student_score` VALUES (2, '小红', '女','语文', 70);
INSERT INTO `student_score` VALUES (3, '小哈', '女','语文', 88);
INSERT INTO `student_score` VALUES (1, '小明', '男','数学', 66);
INSERT INTO `student_score` VALUES (2, '小红', '女','数学', 70);
INSERT INTO `student_score` VALUES (3, '小哈', '女','数学', 89);
INSERT INTO `student_score` VALUES (1, '小明', '男','英语', 80);
INSERT INTO `student_score` VALUES (2, '小红', '女','英语', 70);
INSERT INTO `student_score` VALUES (3, '小哈', '女','英语', 68);

2. Query the scores of all students:

  • select * from student_score;


3. In the result, column values repeat.

Each subject name is stored as a value in the subject column, next to its score, so the same values repeat down the column (语文, 语文, 语文), and each student's id, name, and sex appear once per subject.


4-1. Handle repeated column values, method 1: merge rows to remove the repetition [rows to columns]

A common SQL task is to report each student's score in each subject. There are two ways to approach it: grouped aggregation and partitioned ranking.

Group statistics:

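select id,name,sex,
    max(case when subject='语文' then score else 0 end) as chinese,
    max(case when subject='英语' then score else 0 end) as english,
    max(case when subject='数学' then score else 0 end) as math
from student_score
group by id,name,sex
order by chinese desc;

  • result:

Rows are sorted by the Chinese (语文) score in descending order. The ORDER BY names the chinese alias rather than the bare score column: score is neither aggregated nor listed in GROUP BY, so under ONLY_FULL_GROUP_BY it is rejected, and with that mode disabled MySQL takes it from an arbitrary row of each group (which may happen to be the 语文 row).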
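
If the goal is to sort by overall performance rather than by a single subject, one possible variation of the pivot query adds an aggregated total and orders by it. This is a sketch assuming the sample data above; the total column is an illustrative addition:

```sql
-- Assumes the student_score table and sample rows from this part.
select id, name, sex,
       max(case when subject='语文' then score else 0 end) as chinese,
       max(case when subject='英语' then score else 0 end) as english,
       max(case when subject='数学' then score else 0 end) as math,
       sum(score) as total  -- aggregated, so it is valid in ORDER BY
from student_score
group by id, name, sex
order by total desc;
-- With the sample data: 小哈 (total 245), 小明 (226), 小红 (210)
```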


4-2. Handle repeated column values, method 2: rank within partitions

Partition ranking:

select id,name,subject,score,
       row_number() over(partition by subject order by score desc) rn
from student_score;
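
A typical use of the rn column is to keep only the top row of each partition. Because a window function cannot be referenced in the WHERE clause of the same query level, the ranking goes into a subquery. A minimal sketch, with ranked as an illustrative alias:

```sql
-- Assumes the student_score table and sample rows from this part (MySQL 8.0 or later).
select id, name, subject, score
from (
    select id, name, subject, score,
           row_number() over (partition by subject order by score desc) as rn
    from student_score
) ranked
where rn = 1;
-- With the sample data: 语文 -> 小哈 (88), 数学 -> 小哈 (89), 英语 -> 小明 (80)
```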



3. Summary: the difference between grouping and partitioning

Take subject as the key. Grouping by subject collapses all rows of a subject into a single result row, whereas partitioning by subject keeps every row and only computes the window function within each subject.

Grouping: one row per group

Partitioning: every row is kept
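
The contrast can be sketched with the same aggregate used both ways, assuming the student_score sample data from part 2 (top_score is an illustrative alias):

```sql
-- Grouping: one row per subject (3 rows for the sample data)
select subject, max(score) as top_score
from student_score
group by subject;

-- Partitioning: all 9 rows are kept, each carrying its subject's top score
select id, name, subject, score,
       max(score) over (partition by subject) as top_score
from student_score;
```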




If this article is helpful to you, please remember to give Yile a like, thank you!


Origin blog.csdn.net/weixin_45630258/article/details/129467898