After a few days off to balance work and rest, I am back to reviewing and practicing SQL.
1. Conditional function
1. Topic: The operations team wants to split users into two age groups, under 25 and 25 and above, and count the users in each group (a null age is counted as under 25).
user_profile
Desired result:
Involved knowledge:
You need the case expression, a branching expression that returns one of several possible results depending on a condition. It can be used anywhere an expression is allowed, but it cannot be executed on its own as a standalone statement.
Simple case
Evaluates the test expression, then compares it with the expression of each when clause, in order from top to bottom. The result of the first when clause whose expression equals the test expression is returned. If none is equal, the result of the else clause is returned if one is specified, otherwise NULL.
Searched case
Evaluates the Boolean expression of each when clause in order from top to bottom and returns the result of the first one that evaluates to true. If none evaluates to true, the result of the else clause is returned if one is specified, otherwise NULL.
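A small sketch of the two forms side by side, assuming MySQL syntax. The inline ages (20, 30, NULL) and the labels are made up for illustration and are not taken from user_profile. It also hints at why the exercise needs the searched form for null ages: a simple case compares with equality, and NULL = NULL is not true, so a `WHEN NULL` branch never matches.

```sql
SELECT
    age,
    -- Simple CASE: compares age with each WHEN value using equality
    CASE age
        WHEN 20 THEN 'twenty'
        WHEN NULL THEN 'missing'   -- never matches, NULL = NULL is not true
        ELSE 'other'
    END AS simple_case,
    -- Searched CASE: each WHEN carries its own Boolean expression
    CASE
        WHEN age IS NULL THEN 'missing'
        WHEN age < 25 THEN 'under 25'
        ELSE '25 and above'
    END AS searched_case
FROM (
    -- Hypothetical sample rows, not real data
    SELECT 20 AS age
    UNION ALL SELECT 30
    UNION ALL SELECT NULL
) AS sample_ages;
```

For the NULL row, simple_case falls through to 'other' while searched_case returns 'missing'.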
SELECT
    CASE
        WHEN age < 25 OR age IS NULL THEN '25岁以下'  -- label: under 25 (null age included)
        WHEN age >= 25 THEN '25岁及以上'              -- label: 25 and above
    END AS age_cut,
    COUNT(*) AS number
FROM user_profile
GROUP BY age_cut;
The same result can be obtained with if():
select
if(age >= 25, '25岁及以上', '25岁以下') as age_cut,
count(*) as number
from
user_profile
group by
age_cut;
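The if() version does not test for null explicitly, yet null ages still land in the under-25 group. A minimal check of that behaviour with literal values (the branch labels are placeholders chosen for this sketch): when the condition is NULL it is not true, so if() returns its third argument.

```sql
SELECT
    IF(NULL >= 25, 'true branch', 'else branch') AS null_age,  -- else branch
    IF(30 >= 25,   'true branch', 'else branch') AS age_30,    -- true branch
    IF(20 >= 25,   'true branch', 'else branch') AS age_20;    -- else branch
```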
2. Date function
1. Topic: The operations team wants the number of questions users practiced on each day of August 2021. Please extract the corresponding data.
question_practice_detail
Desired result:
Involved knowledge:
Since dates are involved, you can use the day(), month() and year() functions directly. The count is per day, so the rows have to be grouped by date, and since the year and month are fixed, they can be filtered with where.
select
day(date) day,
count(question_id) question_cnt
from
question_practice_detail
where
month(date) = 8
and year(date) = 2021
group by
date
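As a quick illustration of the three date functions, and of one possible alternative filter, here is a sketch assuming MySQL and assuming the date column holds a DATE or DATETIME value. The half-open range in the second query is a swapped-in technique, not the approach used above; it selects the same month without wrapping the column in a function, which may let an index on date be used.

```sql
-- What the three functions return for a single literal date
SELECT
    YEAR('2021-08-15')  AS y,   -- 2021
    MONTH('2021-08-15') AS m,   -- 8
    DAY('2021-08-15')   AS d;   -- 15

-- Daily count for August 2021, filtered with a date range
-- instead of year() and month()
SELECT
    DAY(date) AS day,
    COUNT(question_id) AS question_cnt
FROM question_practice_detail
WHERE date >= '2021-08-01'
  AND date <  '2021-09-01'
GROUP BY DAY(date);
```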
3. Text function
1. Topic: Count the number of people of each gender
user_submit
Desired result:
Involved knowledge:
You can use substring_index(str, delim, count)
str: the string to process
delim: the delimiter
count: which occurrence of the delimiter to cut at
If count is positive, the function counts delimiters from left to right and returns everything to the left of the count-th delimiter. If count is negative, it counts from right to left and returns everything to the right of the count-th delimiter.
Example: str = 'www.baidu.com'
substring_index(str, '.', 1)
Result: www
substring_index(str, '.', -2)
Result: baidu.com
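The examples above can be run directly with literals. The sketch below also nests two calls to pick out a middle field. The profile string '180cm,75kg,27,male' is a hypothetical value assumed for illustration, since the real contents of user_submit are not shown here.

```sql
SELECT
    SUBSTRING_INDEX('www.baidu.com', '.', 1)  AS first_part,  -- www
    SUBSTRING_INDEX('www.baidu.com', '.', 2)  AS first_two,   -- www.baidu
    SUBSTRING_INDEX('www.baidu.com', '.', -2) AS last_two,    -- baidu.com
    -- Nested call: keep the first 3 comma-separated fields,
    -- then take the last field of that intermediate result
    SUBSTRING_INDEX(
        SUBSTRING_INDEX('180cm,75kg,27,male', ',', 3),
        ',', -1
    ) AS third_field;                                         -- 27
```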
select
substring_index (profile, ',', -1) gender,
count(*) number
from
user_submit
group by
gender
Use substring_index to extract the last field, gender, then group by gender and count the rows in each group.
Involved knowledge:
Alternatively, use like for fuzzy matching, where % matches any sequence of zero or more characters, and combine it with if. If the profile field ends with female, the row is labelled female, otherwise male, and the result is aliased as gender. Because the number of people of each gender is needed, the rows are grouped by gender and counted.
select
if (profile like '%female', 'female', 'male') gender,
count(*) number
from
user_submit
group by
gender
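One detail worth checking when matching on gender: the string "female" itself ends with "male", so the pattern has to test for female (or include the comma) rather than '%male'. A small sketch with hypothetical profile strings, assuming MySQL, where like returns 1 for a match and 0 otherwise:

```sql
SELECT
    '180cm,75kg,27,male'   LIKE '%female' AS male_row_female_pattern,    -- 0
    '165cm,45kg,26,female' LIKE '%female' AS female_row_female_pattern,  -- 1
    -- Pitfall: '%male' also matches rows ending in "female"
    '165cm,45kg,26,female' LIKE '%male'   AS female_row_male_pattern,    -- 1
    -- Including the delimiter makes the male pattern exact
    '165cm,45kg,26,female' LIKE '%,male'  AS female_row_comma_male;      -- 0
```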
4. Window function
1. Topic: The operations team wants to find the student with the lowest gpa in each school for a survey. Please extract the lowest gpa of each school.
Desired result:
First, get the lowest gpa of each school by using the min function together with group by on the school.
Solution 1: Since device_id is also required, the minimum alone is not enough. Query the table again and use where ... in to keep only the rows whose (university, gpa) pair appears in the grouped result.
select
device_id,
university,
gpa
from
user_profile
where
(university, gpa) in (
select
university,
min(gpa)
from
user_profile
group by
university
)
order by
university
Solution 2:
Involved knowledge:
Ranking within groups calls for window functions, an advanced feature of SQL. Window functions are also called OLAP functions.
The basic syntax of window functions:
<window function> over (partition by <column to group by>
order by <column to sort by>)
Two kinds of functions can be used as window functions:
1. Dedicated window functions: rank, dense_rank, row_number
2. Aggregate functions: sum, avg, max, min, etc.
Because window functions operate on the result of the where and group by clauses, in principle they can only be written in the select clause.
partition by groups the rows of the table
order by sorts the rows within each group
group by already provides grouping, so why are window functions needed?
After group by groups and aggregates, the number of rows changes: there is one row per category. partition by does not reduce the number of rows of the original table.
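The row-count difference can be seen on a tiny data set. This is a sketch assuming MySQL 8 or later (for common table expressions and window functions); sample_profile and its three rows are hypothetical and only mimic the shape of user_profile.

```sql
-- group by: 3 input rows collapse into 2 rows, one per university
WITH sample_profile AS (
    SELECT 1 AS device_id, 'A' AS university, 3.2 AS gpa
    UNION ALL SELECT 2, 'A', 3.6
    UNION ALL SELECT 3, 'B', 3.9
)
SELECT university, MIN(gpa) AS min_gpa
FROM sample_profile
GROUP BY university;

-- partition by: all 3 rows are kept, each one annotated
-- with the minimum gpa of its own university
WITH sample_profile AS (
    SELECT 1 AS device_id, 'A' AS university, 3.2 AS gpa
    UNION ALL SELECT 2, 'A', 3.6
    UNION ALL SELECT 3, 'B', 3.9
)
SELECT
    device_id,
    university,
    gpa,
    MIN(gpa) OVER (PARTITION BY university) AS min_gpa
FROM sample_profile;
```

Because the window version keeps device_id on every row, it avoids the second lookup that the group by approach needs.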
Other window functions:
What is the difference between rank, dense_rank and row_number?
-- 成绩 is the score column, 班级表 is the class table
select *,
rank() over (order by 成绩 desc) as ranking,
dense_rank() over (order by 成绩 desc) as dense_ranking,
row_number() over (order by 成绩 desc) as row_num
from 班级表
rank: 5, 5, 5, 8. Tied rows share the same rank and still occupy the following positions, so the next rank jumps ahead.
dense_rank: 5, 5, 5, 6. Tied rows share the same rank, but the following positions are not occupied.
row_number: 5, 6, 7, 8. Ties are not considered and every row gets its own consecutive number.
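The three numbering schemes can be compared on a small inline data set. A sketch assuming MySQL 8 or later; the scores are made up for illustration.

```sql
SELECT
    score,
    RANK()       OVER (ORDER BY score DESC) AS ranking,        -- 1, 2, 2, 2, 5
    DENSE_RANK() OVER (ORDER BY score DESC) AS dense_ranking,  -- 1, 2, 2, 2, 3
    ROW_NUMBER() OVER (ORDER BY score DESC) AS row_num         -- 1, 2, 3, 4, 5
FROM (
    -- Hypothetical scores with a three-way tie at 90
    SELECT 100 AS score
    UNION ALL SELECT 90
    UNION ALL SELECT 90
    UNION ALL SELECT 90
    UNION ALL SELECT 80
) AS sample_scores
ORDER BY score DESC, row_num;
```

The comments list each column's values from the first output row to the last.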
Answer:
First use row_number, partitioned by school and ordered by gpa, to rank the rows within each school, then use where to keep the required rank.
select
*,
row_number() over (
partition by
university
order by
gpa
) as rn
from
user_profile
Since the task requires the final result to be ordered by school, order by is added at the end. We want the lowest gpa and sorting is ascending by default, so the row to keep is the one with rn = 1.
select
device_id,
university,
gpa
from
(
select
*,
row_number() over (
partition by
university
order by
gpa
) as rn
from
user_profile
) as univ_min
where
rn = 1
order by
university;
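One difference between the two solutions: row_number returns exactly one row per school even when several students tie for the lowest gpa, while Solution 1 returns every tied row. Whether ties matter for this exercise is not stated. If all tied rows are wanted, rank could be swapped in for row_number, as in this sketch (assuming MySQL 8 or later; the alias rk is chosen for this example):

```sql
SELECT
    device_id,
    university,
    gpa
FROM (
    SELECT
        device_id,
        university,
        gpa,
        -- rank() gives every row tied for the lowest gpa the value 1
        RANK() OVER (PARTITION BY university ORDER BY gpa) AS rk
    FROM user_profile
) AS univ_min
WHERE rk = 1
ORDER BY university;
```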