Data Analyst ---- SQL Strengthening (2)

Topic 1: Implementing text processing with SQL

The exam paper information table examination_info has the columns exam_id (paper ID), tag (paper category), difficulty (paper difficulty), and duration (exam duration).
A student who recorded the exam papers made a mistake and entered the category tag, difficulty, and duration of some papers together into the tag field.
Find these wrongly recorded rows, split the tag value, and output each part in its correct column.
For the sample data below, the result is a single row: exam_id 9004, tag 算法, difficulty medium, duration 80.
Create table:

drop table if exists examination_info,exam_record;
CREATE TABLE examination_info (
    id int PRIMARY KEY AUTO_INCREMENT COMMENT 'auto-increment ID',
    exam_id int UNIQUE NOT NULL COMMENT 'exam paper ID',
    tag varchar(32) COMMENT 'category tag',
    difficulty varchar(8) COMMENT 'difficulty',
    duration int NOT NULL COMMENT 'duration',
    release_time datetime COMMENT 'release time'
)CHARACTER SET utf8 COLLATE utf8_general_ci;
INSERT INTO examination_info(exam_id,tag,difficulty,duration,release_time) VALUES
  (9001, '算法', 'hard', 60, '2020-01-01 10:00:00'),
  (9002, '算法', 'hard', 80, '2020-01-01 10:00:00'),
  (9003, 'SQL', 'medium', 70, '2020-01-01 10:00:00'),
  (9004, '算法,medium,80','', 0, '2020-01-01 10:00:00');

Analysis of the question:
The wrongly recorded rows are the ones whose tag holds all three values, leaving difficulty empty. The task is to pick out those rows and split the tag string on commas so that each piece lands in its proper field.

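select exam_id,
       -- first field: everything before the first comma
       substring_index(tag, ',', 1) as tag,
       -- middle field: take the last two fields, then the first of those
       substring_index(substring_index(tag, ',', -2), ',', 1) as difficulty,
       -- last field: everything after the last comma
       substring_index(tag, ',', -1) as duration
from examination_info
-- wrongly recorded rows have an empty difficulty
where difficulty = '';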
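
If the duration is needed as a number rather than a string, the last piece can be cast. The following is only a sketch, assuming MySQL and the examination_info table above; filtering on a comma in tag is an alternative way to spot the merged rows and is not part of the original solution.

```sql
-- Sketch: same split, with duration cast to an integer.
-- Assumption: merged rows always have the form 'tag,difficulty,duration'.
select exam_id,
       substring_index(tag, ',', 1) as tag,
       substring_index(substring_index(tag, ',', -2), ',', 1) as difficulty,
       cast(substring_index(tag, ',', -1) as unsigned) as duration
from examination_info
where tag like '%,%';   -- alternative filter: tag contains a comma
```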

Knowledge points involved:
String splitting: substring_index(str, delim, count)

Parameter   Description
str         The string to split
delim       The delimiter to split on
count       When count is positive, returns everything to the left of the count-th delimiter, counting from the left; when count is negative, returns everything to the right of the |count|-th delimiter, counting from the right.

Calls can be nested, for example to extract a field from the middle of the string.
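
A few literal calls make the count rules concrete. This is an illustrative sketch using a made-up input string 'a,b,c'; each comment gives the value MySQL returns for that call.

```sql
select substring_index('a,b,c', ',', 1);    -- 'a'    (left of the 1st comma)
select substring_index('a,b,c', ',', 2);    -- 'a,b'  (left of the 2nd comma)
select substring_index('a,b,c', ',', -1);   -- 'c'    (right of the last comma)
select substring_index('a,b,c', ',', -2);   -- 'b,c'  (right of the 2nd comma from the end)
-- nested: keep the first two fields, then take the last of them
select substring_index(substring_index('a,b,c', ',', 2), ',', -1);   -- 'b'
```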

You can also use the regexp_substr function to split with regular expressions.
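
As a rough sketch of the regular-expression route, assuming MySQL 8.0 or later (where regexp_substr is available) and the same examination_info table: the pattern '[^,]+' matches one run of non-comma characters, the third argument is the start position, and the fourth selects which match to return.

```sql
select exam_id,
       regexp_substr(tag, '[^,]+', 1, 1) as tag,         -- 1st match
       regexp_substr(tag, '[^,]+', 1, 2) as difficulty,  -- 2nd match
       regexp_substr(tag, '[^,]+', 1, 3) as duration     -- 3rd match
from examination_info
where difficulty = '';
```

Note that this pattern skips empty fields: for a string such as 'a,,c' the first two matches would be 'a' and 'c', whereas substring_index keeps the empty middle field.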

Topic 2: The three most-played songs in each language

Song table: songplay
Language table: language
Create table:

drop table if exists  songplay;
create table `songplay`(
`id` int,
`playcnt` int,
`languageid` int
);
insert into songplay values(1,85001,1);
insert into songplay values(2,80001,2);
insert into songplay values(3,60001,2);
insert into songplay values(4,90001,1);
insert into songplay values(5,69001,1);
insert into songplay values(6,85001,1);
insert into songplay values(7,70001,1);

drop table if exists language;
create table `language`(
`id` int,
`name` varchar(255)
);
insert into language values(1,'中文');
insert into language values(2,'英文');

Analysis of the question:
The task is to find, for each language, the songs with the three highest play counts. Songs with the same play count share the same rank, so first build a ranking with the dense_rank window function and then keep the songs whose rank is in the top 3 of their language.

DENSE_RANK() gives tied rows the same rank and does not skip the next rank afterwards, for example 1, 1, 2.
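
To see how dense_rank() differs from the other ranking functions, the sketch below ranks the songs of one language three ways. It assumes MySQL 8.0+ (window functions) and the songplay sample data above; the result comments were worked out by hand from that data.

```sql
-- play counts for languageid = 1: 90001, 85001, 85001, 70001, 69001
select id, playcnt,
       row_number() over(order by playcnt desc) as row_num,    -- 1,2,3,4,5 (ties broken arbitrarily)
       rank()       over(order by playcnt desc) as rank_num,   -- 1,2,2,4,5 (gap after the tie)
       dense_rank() over(order by playcnt desc) as dense_num   -- 1,2,2,3,4 (no gap)
from songplay
where languageid = 1
order by playcnt desc;
```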

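select language_name, songid, playcnt
from (
	select s.id songid,
	l.id language_id,
	l.name language_name, s.playcnt,
	-- key step: rank the songs within each language by play count
	dense_rank() over(partition by name order by s.playcnt desc) rank_num
	from songplay s join language l
	on s.languageid = l.id
) tmp
where rank_num < 4
-- sort in the outer query: an order by inside a derived table is not guaranteed to be kept
order by language_id;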

Note that the question has a hidden requirement: the languages must be returned in the same order as they appear in the language table, so the result needs to be ordered by l.id.

Key code interpretation:

dense_rank() over(partition by name order by s.playcnt desc) rank_num

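The window function dense_rank() ranks rows without skipping rank numbers; the over() clause sets the grouping and the descending sort:
partition by name: group the rows by name, one partition per language
order by s.playcnt desc: sort each partition by s.playcnt in descending order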
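
The effect of partition by is easiest to see next to a ranking without it. A sketch, assuming MySQL 8.0+ and the sample tables above; the column aliases rank_in_language and rank_overall are made up for this illustration.

```sql
select s.id as songid, l.name as language_name, s.playcnt,
       -- restarts at 1 for every language
       dense_rank() over(partition by l.name order by s.playcnt desc) as rank_in_language,
       -- one ranking across all songs
       dense_rank() over(order by s.playcnt desc) as rank_overall
from songplay s
join language l on s.languageid = l.id
order by l.id, s.playcnt desc;
```

With the sample data, the song with playcnt 80001 is ranked 1 within its language but 3 overall.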

Summary:

Both SQL questions this time are fairly basic. Question 1 mainly tests the substring_index function, and question 2 tests the window function dense_rank() together with over().

Origin blog.csdn.net/qq_52007481/article/details/130170086