Data Analyst ---- SQL Strengthening (2)
Topic 1: Implementing text processing with SQL
The existing exam information table examination_info has the columns exam_id (exam ID), tag (exam category), difficulty (exam difficulty) and duration (exam duration).
A student who recorded the exam information made a mistake and entered the category tag, difficulty and duration of some exams together into the tag field.
Find these wrongly recorded rows, split them, and output each value in the correct column with the correct type.
The result output from the sample data is as follows:
exam_id | tag | difficulty | duration |
---|---|---|---|
9004 | 算法 | medium | 80 |
Create table
drop table if exists examination_info,exam_record;
CREATE TABLE examination_info (
id int PRIMARY KEY AUTO_INCREMENT COMMENT 'auto-increment ID',
exam_id int UNIQUE NOT NULL COMMENT 'exam ID',
tag varchar(32) COMMENT 'category tag',
difficulty varchar(8) COMMENT 'difficulty',
duration int NOT NULL COMMENT 'duration',
release_time datetime COMMENT 'release time'
)CHARACTER SET utf8 COLLATE utf8_general_ci;
INSERT INTO examination_info(exam_id,tag,difficulty,duration,release_time) VALUES
(9001, '算法', 'hard', 60, '2020-01-01 10:00:00'),
(9002, '算法', 'hard', 80, '2020-01-01 10:00:00'),
(9003, 'SQL', 'medium', 70, '2020-01-01 10:00:00'),
(9004, '算法,medium,80','', 0, '2020-01-01 10:00:00');
Analysis of the meaning of the question:
The task is to locate the wrongly recorded rows and split the string stored in tag, so that each piece can be output under the column it belongs to.
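select exam_id,
       # first field: category
       substring_index(tag, ',', 1) as tag,
       # middle field: keep the last two fields, then take the first of them
       substring_index(substring_index(tag, ',', -2), ',', 1) as difficulty,
       # last field, cast to an integer to match the duration column type
       cast(substring_index(tag, ',', -1) as signed) as duration
from examination_info
# the wrongly recorded rows in the sample data have an empty difficulty
where difficulty = '';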
Knowledge points involved:
String splitting: substring_index(str, delim, count)
Parameter | Description |
---|---|
str | The string to split |
delim | The delimiter to split on |
count | When count is positive, everything to the left of the count-th delimiter (counting from the left) is returned; when count is negative, everything to the right of the count-th delimiter (counting from the right) is returned. |
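To make the count parameter concrete, the following sketch applies substring_index to the literal tag value of the bad row in the sample data. It assumes MySQL; the column aliases are illustrative names only.

```sql
# Illustrative only: split the bad tag value of exam 9004 with different counts.
select
    substring_index('算法,medium,80', ',', 1)  as first_field,  # '算法'
    substring_index('算法,medium,80', ',', 2)  as first_two,    # '算法,medium'
    substring_index('算法,medium,80', ',', -1) as last_field,   # '80'
    substring_index('算法,medium,80', ',', -2) as last_two,     # 'medium,80'
    # nested call: keep the last two fields, then take the first of them ('medium')
    substring_index(substring_index('算法,medium,80', ',', -2), ',', 1) as middle_field;
```

If count is larger than the number of delimiters in the string, the whole string is returned unchanged.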
Calls can be nested to extract a field from the middle of the string.
You can also use the regexp_substr function to split the string with a regular expression.
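As a sketch of the regular-expression alternative, assuming MySQL 8.0 or later (where regexp_substr is available), the pattern [^,]+ matches one comma-free field and the fourth argument selects which occurrence to return.

```sql
# Sketch assuming MySQL 8.0+: regexp_substr(expr, pattern, position, occurrence)
select exam_id,
       # 1st comma-free field: category
       regexp_substr(tag, '[^,]+', 1, 1) as tag,
       # 2nd comma-free field: difficulty
       regexp_substr(tag, '[^,]+', 1, 2) as difficulty,
       # 3rd comma-free field, cast to an integer
       cast(regexp_substr(tag, '[^,]+', 1, 3) as signed) as duration
from examination_info
where difficulty = '';
```

Unlike substring_index, regexp_substr returns NULL when the requested occurrence does not exist, which can make malformed rows easier to spot.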
Topic 2: The three most-played songs in each language
Song table: songplay
Language table: language
Create table:
drop table if exists songplay;
create table `songplay`(
`id` int,
`playcnt` int,
`languageid` int
);
insert into songplay values(1,85001,1);
insert into songplay values(2,80001,2);
insert into songplay values(3,60001,2);
insert into songplay values(4,90001,1);
insert into songplay values(5,69001,1);
insert into songplay values(6,85001,1);
insert into songplay values(7,70001,1);
drop table if exists language;
create table `language`(
`id` int,
`name` varchar(255)
);
insert into language values(1,'中文');
insert into language values(2,'英文');
Analysis of the meaning of the question:
The task is to find, for each language, the songs whose play counts rank in the top three. Songs with the same play count share the same rank, so first build a ranking with the dense_rank function, and then keep the songs ranked in the top 3 within each language.
The DENSE_RANK() function gives tied rows the same rank and does not skip the rank numbers that follow a tie, for example 1, 1, 2.
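To see how dense_rank() treats ties, the sketch below ranks the sample songplay rows with three window functions side by side. It assumes MySQL 8.0 or later (window functions and the named window clause); the aliases and the window name w are illustrative.

```sql
# Sketch assuming MySQL 8.0+: compare ranking functions on the sample data.
select id,
       languageid,
       playcnt,
       row_number() over w as row_num,    # always 1, 2, 3, ... (order among ties is arbitrary)
       rank()       over w as rank_gap,   # ties share a rank, the next rank is skipped
       dense_rank() over w as rank_dense  # ties share a rank, no gap afterwards
from songplay
window w as (partition by languageid order by playcnt desc)
order by languageid, playcnt desc;
```

For languageid 1 the play counts are 90001, 85001, 85001, 70001, 69001, so rank() yields 1, 2, 2, 4, 5 while dense_rank() yields 1, 2, 2, 3, 4. A filter that keeps ranks below 4 therefore returns four songs with dense_rank() but only three with rank().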
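select language_name, songid, playcnt
from (
    select s.id songid,
           l.id language_id,
           l.name language_name,
           s.playcnt,
           # key step: rank the songs within each language by play count
           dense_rank() over(partition by name order by s.playcnt desc) rank_num
    from songplay s join language l
    on s.languageid = l.id
) tmp
where rank_num < 4
# sort by language id, then by rank within each language
order by language_id, rank_num;
Note that there is a hidden requirement in this topic: the languages must be returned in the same order as they appear in the language table. An order by inside the derived table is not guaranteed to survive into the final result, so the subquery exposes l.id as language_id and the outer query sorts by it.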
Key code interpretation:
dense_rank() over(partition by name order by s.playcnt desc) rank_num
The window function dense_rank() assigns ranks without skipping numbers after ties. Inside over(), partition by name groups the rows by language name, and order by s.playcnt desc sorts each group by play count in descending order.
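The effect of partition by can be checked by ranking the same rows with and without it. This is a minimal sketch assuming MySQL 8.0 or later; the aliases global_rank and language_rank are hypothetical names.

```sql
# Sketch: the same ordering, with and without partition by.
select s.id as songid,
       l.name as language_name,
       s.playcnt,
       # one ranking over all songs
       dense_rank() over(order by s.playcnt desc) as global_rank,
       # ranking restarts for each language
       dense_rank() over(partition by l.name order by s.playcnt desc) as language_rank
from songplay s
join language l on s.languageid = l.id
order by l.id, s.playcnt desc;
```

On the sample data, song 2 (80001 plays) is third in the global ranking but first within its own language, because the ranking restarts for each partition.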
Summary:
Both SQL questions this time are relatively basic. Question 1 mainly tests the use of the substring_index function, and question 2 tests the use of the window function dense_rank() together with over().