23. SQL data analysis practice (10 simple SQL questions)

Topic 1: Arranging the competition match-ups

The table easy_competition_list stores the names of the teams participating in the competition. Its data is as follows:

mysql> select * from easy_competition_list;
-- team_name: name of the participating team
+------------+
| team_name  |
+------------+
| 谁与争锋队 |
| 必胜队     |
| 乘风破浪队 |
| 群英汇队   |
| 梦之队     |
+------------+
5 rows in set (0.00 sec)

[Question 1] Each participating team plays one match against every other participating team. Output every possible pairing of two teams (team A and team B), sorted in ascending order by team name. The output includes: Team A (队伍A) and Team B (队伍B). The sample result is as follows:

+------------+------------+
| 队伍A      | 队伍B      |
+------------+------------+
| 乘风破浪队 | 必胜队     |
| 乘风破浪队 | 梦之队     |
| 乘风破浪队 | 群英汇队   |
| 乘风破浪队 | 谁与争锋队 |
| 必胜队     | 梦之队     |
| 必胜队     | 群英汇队   |
| 必胜队     | 谁与争锋队 |
| 梦之队     | 群英汇队   |
| 梦之队     | 谁与争锋队 |
| 群英汇队   | 谁与争锋队 |
+------------+------------+
10 rows in set (0.00 sec)

[Explanation of Topic 1] Join the table to itself with the join condition a.team_name < b.team_name. The strict < does two things: it ensures that no team is paired with itself, and it makes each pairing appear only once (with <>, every pairing would appear twice, as A vs B and B vs A). Then sort the result in ascending order by team name. The SQL code for this question is as follows:

mysql> SELECT a.team_name AS 队伍A,b.team_name AS 队伍B FROM easy_competition_list
    -> a INNER JOIN easy_competition_list b ON a.team_name < b.team_name ORDER BY 队伍A,队伍B;
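The self-join logic can be checked without a MySQL server. Below is a minimal sketch that uses Python's built-in sqlite3 module on an in-memory copy of the sample data. Using SQLite is an assumption of this sketch, and its dialect differs from MySQL in places. The sketch also counts what a <> join would return, which shows why < is used. SQLite's default BINARY collation orders these names by Unicode code point, and that happens to match the sample output. In MySQL the order depends on the column's collation.

```python
import sqlite3

# In-memory SQLite copy of easy_competition_list (SQLite dialect, not MySQL)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE easy_competition_list (team_name TEXT)")
conn.executemany(
    "INSERT INTO easy_competition_list VALUES (?)",
    [("谁与争锋队",), ("必胜队",), ("乘风破浪队",), ("群英汇队",), ("梦之队",)],
)

# Self-join with <: no team meets itself and each pairing appears exactly once
pairs = conn.execute(
    """
    SELECT a.team_name AS team_a, b.team_name AS team_b
    FROM easy_competition_list a
    INNER JOIN easy_competition_list b ON a.team_name < b.team_name
    ORDER BY team_a, team_b
    """
).fetchall()

# With <> instead of <, every pairing would be returned twice (A-B and B-A)
both_directions = conn.execute(
    "SELECT COUNT(*) FROM easy_competition_list a "
    "INNER JOIN easy_competition_list b ON a.team_name <> b.team_name"
).fetchone()[0]

for team_a, team_b in pairs:
    print(team_a, team_b)
print(len(pairs), both_directions)  # 10 20
```

With n teams, the < join yields n(n-1)/2 rows and the <> join yields n(n-1) rows.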

Topic 2: Top Games Ranking

The game download ranking table easy_game_ranking contains the following data:

mysql> SELECT * FROM easy_game_ranking;
-- ① game: game name, VARCHAR ② category: game category, VARCHAR ③ downloads: number of downloads, INT
+------+----------+-----------+
| game | category | downloads |
+------+----------+-----------+
| A    | puzzle   |     13628 |
| B    | shooting |      2830 |
| C    | shooting |      1920 |
| D    | action   |     23800 |
| E    | puzzle   |       842 |
| F    | shooting |     48201 |
| G    | action   |      4532 |
| H    | puzzle   |      1028 |
| I    | action   |     48910 |
| J    | shooting |       342 |
| K    | puzzle   |     32456 |
| L    | action   |      2801 |
| M    | puzzle   |      1248 |
| N    | action   |      8756 |
+------+----------+-----------+
14 rows in set (0.00 sec)

[Question 2] Query the top two games in each category by number of downloads. The output includes: category (game category) and game (game name), and the sample result is shown in the figure below:
[Question 2 Analysis] This is a typical ranking-within-groups problem, and a window function solves it. DENSE_RANK() ranks each game within its category by downloads, and the outer query then keeps the rows ranked 1 or 2. Note that when downloads tie, DENSE_RANK() can return more than two games for a category. The reference code is as follows:

-- ① Window function + grouped aggregation
-- ORDER BY inside GROUP_CONCAT() fixes the order of the names within each list
mysql> SELECT category, GROUP_CONCAT(game ORDER BY downloads DESC) AS game
    -> FROM (SELECT *, DENSE_RANK() OVER (PARTITION BY category ORDER BY downloads DESC) AS downloads_rank
    ->       FROM easy_game_ranking)
    ->          AS a
    -> WHERE a.downloads_rank < 3
    -> GROUP BY category;

-- ② Grouped aggregation + string functions
-- Group by category and use GROUP_CONCAT() to join each group's games, ordered by downloads
-- descending. Then use SUBSTRING_INDEX() to keep the first two game names.
mysql> SELECT category, SUBSTRING_INDEX(GROUP_CONCAT(game ORDER BY downloads DESC), ',', 2) AS game
    -> FROM easy_game_ranking
    -> GROUP BY category;
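The same top-two-per-category logic can be tried in Python's sqlite3 module. This sketch assumes SQLite 3.25 or later, which is needed for window functions. It builds the per-category lists in Python rather than with GROUP_CONCAT, because the order inside an unordered GROUP_CONCAT is not guaranteed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE easy_game_ranking (game TEXT, category TEXT, downloads INTEGER)")
conn.executemany(
    "INSERT INTO easy_game_ranking VALUES (?, ?, ?)",
    [
        ("A", "puzzle", 13628), ("B", "shooting", 2830), ("C", "shooting", 1920),
        ("D", "action", 23800), ("E", "puzzle", 842), ("F", "shooting", 48201),
        ("G", "action", 4532), ("H", "puzzle", 1028), ("I", "action", 48910),
        ("J", "shooting", 342), ("K", "puzzle", 32456), ("L", "action", 2801),
        ("M", "puzzle", 1248), ("N", "action", 8756),
    ],
)

# Rank games by downloads inside each category, keep ranks 1 and 2
ranked = conn.execute(
    """
    SELECT category, game
    FROM (SELECT category, game,
                 DENSE_RANK() OVER (PARTITION BY category ORDER BY downloads DESC) AS downloads_rank
          FROM easy_game_ranking) AS a
    WHERE a.downloads_rank < 3
    ORDER BY category, downloads_rank
    """
).fetchall()

# Build each category's list in Python so the order within the list is explicit
top_two = {}
for category, game in ranked:
    top_two.setdefault(category, []).append(game)
print(top_two)
```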

Topic 3: Coverage analysis of community fresh-food apps

The table easy_fresh_food records which community fresh-food apps each user has installed. Its data is as follows:

mysql> SELECT * FROM easy_fresh_food;
-- user_id (user ID): VARCHAR  app (comma-separated list of community fresh-food apps installed by the user): VARCHAR
+---------+-------+
| user_id | app   |
+---------+-------+
| u001    | A,B   |
| u002    | C,D,A |
| u003    | E     |
| u004    | A     |
| u005    | F,D   |
| u006    | E,G   |
| u007    | C,B   |
| u008    | H,J   |
| u009    | J     |
| u010    | A,K,E |
+---------+-------+
10 rows in set (0.00 sec)

[Question 3] Query the number of users who have installed app A. The output includes: num (the number of users), and the sample result is shown in the figure below:
[Topic 3 Analysis] There are two ideas.

Idea ①: fuzzy matching. Count every row whose app list contains A, using LIKE, REGEXP, or MySQL's built-in FIND_IN_SET() or INSTR().

Idea ②: split each comma-separated list into one row per app, then group and count.

LIKE, REGEXP and INSTR() match substrings, so they are only correct here because every app name is a single letter. FIND_IN_SET() matches whole list elements. The reference code is as follows:

-- Approach ①: fuzzy matching with LIKE or REGEXP
mysql> SELECT COUNT(*) AS num FROM easy_fresh_food WHERE app LIKE '%A%';
mysql> SELECT COUNT(*) AS num FROM easy_fresh_food WHERE app REGEXP 'A';
-- Approach ②: matching with the MySQL built-in functions FIND_IN_SET() or INSTR()
mysql> SELECT SUM(IF(FIND_IN_SET('A', app), 1, 0)) AS num
    -> FROM easy_fresh_food;

mysql> SELECT SUM(CASE WHEN INSTR(app, 'A') > 0 THEN 1 ELSE 0 END) AS num
    -> FROM easy_fresh_food;

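-- Approach ③: first split each row into multiple rows, then group and count
-- If you lack privileges on mysql.help_topic, create a helper table of your own and join it with the queried table instead.
-- Notes on creating and filling the helper table:
-- 1. It needs a column of consecutive integers. The query below assumes they start at 0, as help_topic_id does.
--    If they start at 1, change "+ 1" to "+ 0" and "<" to "<=".
-- 2. The table name is arbitrary, and a single column is enough.
-- 3. It must have at least as many rows as the largest value of (LENGTH(easy_fresh_food.app)-LENGTH(REPLACE(easy_fresh_food.app, ',', '')) + 1), i.e. the largest number of apps in one list.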
mysql> SELECT *
    -> FROM (SELECT
    -> SUBSTRING_INDEX(SUBSTRING_INDEX(easy_fresh_food.app, ',', b.help_topic_id + 1), ',', - 1) AS app_name,
    -> COUNT(user_id)                                                                            AS num
    ->       FROM easy_fresh_food
    ->                INNER JOIN mysql.help_topic b ON b.help_topic_id < (LENGTH(easy_fresh_food.app) -
    ->                                                                    LENGTH(REPLACE(easy_fresh_food.app, ',', '')) + 1)
    ->       WHERE `app` <> ''
    ->       GROUP BY app_name) a
    -> WHERE a.app_name = 'A';
+----------+-----+
| app_name | num |
+----------+-----+
| A        |   4 |
+----------+-----+
1 row in set (0.00 sec)
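Both ideas can be mirrored in plain Python to see how they differ. In this sketch, find_in_set is a hypothetical helper that imitates FIND_IN_SET(). It returns the 1-based position of an exact list element, or 0 if the element is absent, and it ignores MySQL's collation and NULL handling. The explode step plays the role of the help_topic join. The app name "AB" is also hypothetical; it only shows how substring matching can overcount.

```python
from collections import Counter

# Sample data from easy_fresh_food: user_id -> comma-separated app list
easy_fresh_food = {
    "u001": "A,B", "u002": "C,D,A", "u003": "E", "u004": "A", "u005": "F,D",
    "u006": "E,G", "u007": "C,B", "u008": "H,J", "u009": "J", "u010": "A,K,E",
}

def find_in_set(needle: str, csv: str) -> int:
    """Rough imitation of MySQL FIND_IN_SET: 1-based position of needle, 0 if absent."""
    items = csv.split(",")
    return items.index(needle) + 1 if needle in items else 0

# Idea 1 with exact element matching: count rows where A is one of the list items
num_a = sum(1 for apps in easy_fresh_food.values() if find_in_set("A", apps))

# Idea 2: explode each list into (user_id, app) rows, then group and count per app
exploded = [(user, app) for user, apps in easy_fresh_food.items() for app in apps.split(",") if app]
per_app = Counter(app for _, app in exploded)

# Substring matching (like LIKE '%A%') would also hit a hypothetical app named "AB"
print(find_in_set("A", "AB,C"), "A" in "AB,C")  # 0 True
print(num_a, per_app["A"])  # 4 4
```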

Topic 4: Analysis of Community Group Buying Behavior

The community group-buying order table easy_group_buy records, for each user and date, the channel through which the user logged in to the group-buying page and the number of orders placed. Its data is shown in the following table:

mysql> SELECT * FROM easy_group_buy;
-- user_id (user ID): VARCHAR  login_source (login channel): VARCHAR  login_date (login date): DATE  order_count (number of orders placed): INT
+---------+--------------+------------+-------------+
| user_id | login_source | login_date | order_count |
+---------+--------------+------------+-------------+
| a001    | applet       | 2021-03-20 |           1 |
| a002    | application  | 2021-03-20 |           0 |
| a003    | web          | 2021-03-21 |           0 |
| a002    | application  | 2021-03-21 |           2 |
| a001    | applet       | 2021-03-21 |           4 |
| a003    | application  | 2021-03-22 |           1 |
| a001    | applet       | 2021-03-22 |           1 |
| a004    | application  | 2021-03-23 |           1 |
+---------+--------------+------------+-------------+
8 rows in set (0.00 sec)

[Question 4-1] Query the channel through which each user logged in for the first time. The output includes: user_id (user ID) and login_source (login channel), and the sample result is shown in the figure below:
[Question 4-1 Analysis] Idea ①: use MIN() to find each user's earliest login date. Then INNER JOIN the original table with that result on user ID and date to obtain each user's first login channel. Idea ②: use a window function to rank each user's logins by date and keep rank 1. The reference code is as follows:

-- Approach ①: MIN() + INNER JOIN
mysql> SELECT a1.user_id, a1.login_source
    -> FROM easy_group_buy a1
    ->          INNER JOIN (SELECT user_id, MIN(login_date) AS first_login_date FROM easy_group_buy GROUP BY user_id) a2
    ->                     ON a1.login_date = a2.first_login_date AND a1.user_id = a2.user_id;

-- Approach ②: window function
mysql> SELECT user_id, login_source
    -> FROM (SELECT user_id, login_source, DENSE_RANK() OVER (PARTITION BY user_id ORDER BY login_date ASC ) AS login_date_rank
    ->       FROM easy_group_buy) temp_table
    -> WHERE temp_table.login_date_rank = 1;
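Here is a minimal sketch that runs both approaches in Python's sqlite3 module (assuming SQLite 3.25+ for window functions) and confirms they agree on the sample data. Dates are stored as ISO 'YYYY-MM-DD' text, so MIN() and ORDER BY compare them correctly as strings. An ORDER BY is added so the output order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE easy_group_buy (user_id TEXT, login_source TEXT, login_date TEXT, order_count INTEGER)"
)
conn.executemany(
    "INSERT INTO easy_group_buy VALUES (?, ?, ?, ?)",
    [
        ("a001", "applet", "2021-03-20", 1), ("a002", "application", "2021-03-20", 0),
        ("a003", "web", "2021-03-21", 0), ("a002", "application", "2021-03-21", 2),
        ("a001", "applet", "2021-03-21", 4), ("a003", "application", "2021-03-22", 1),
        ("a001", "applet", "2021-03-22", 1), ("a004", "application", "2021-03-23", 1),
    ],
)

# Approach 1: earliest login date per user, joined back to the original rows
by_join = conn.execute(
    """
    SELECT a1.user_id, a1.login_source
    FROM easy_group_buy a1
    INNER JOIN (SELECT user_id, MIN(login_date) AS first_login_date
                FROM easy_group_buy GROUP BY user_id) a2
      ON a1.login_date = a2.first_login_date AND a1.user_id = a2.user_id
    ORDER BY a1.user_id
    """
).fetchall()

# Approach 2: rank each user's logins by date and keep rank 1
by_window = conn.execute(
    """
    SELECT user_id, login_source
    FROM (SELECT user_id, login_source,
                 DENSE_RANK() OVER (PARTITION BY user_id ORDER BY login_date) AS login_date_rank
          FROM easy_group_buy) AS temp_table
    WHERE temp_table.login_date_rank = 1
    ORDER BY user_id
    """
).fetchall()

print(by_join)
print(by_join == by_window)  # True
```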

[Question 4-2] For each user, query every login date and the cumulative number of orders up to that date. The output includes: user_id (user ID), login_date (login date) and total_order_count (cumulative order count), and the sample result is shown in the figure below:
[Question 4-2 Analysis] Use SUM() as a window function, partitioned by user ID and ordered by login date. This gives a running total of orders at each login date. Knowledge point involved: window functions. The reference code is as follows:

mysql> SELECT user_id,
    ->        login_date,
    ->        SUM(order_count) OVER (PARTITION BY user_id ORDER BY login_date)
    ->            AS total_order_count
    -> FROM easy_group_buy;
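The running total can be reproduced in Python's sqlite3 module (again assuming SQLite 3.25+). When OVER() contains an ORDER BY, the default frame runs from the first row of the partition up to the current row. Rows with the same login_date for the same user count as peers and share one total. This sketch adds an outer ORDER BY so the output order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE easy_group_buy (user_id TEXT, login_source TEXT, login_date TEXT, order_count INTEGER)"
)
conn.executemany(
    "INSERT INTO easy_group_buy VALUES (?, ?, ?, ?)",
    [
        ("a001", "applet", "2021-03-20", 1), ("a002", "application", "2021-03-20", 0),
        ("a003", "web", "2021-03-21", 0), ("a002", "application", "2021-03-21", 2),
        ("a001", "applet", "2021-03-21", 4), ("a003", "application", "2021-03-22", 1),
        ("a001", "applet", "2021-03-22", 1), ("a004", "application", "2021-03-23", 1),
    ],
)

# Cumulative SUM per user, accumulated in login_date order
running = conn.execute(
    """
    SELECT user_id,
           login_date,
           SUM(order_count) OVER (PARTITION BY user_id ORDER BY login_date) AS total_order_count
    FROM easy_group_buy
    ORDER BY user_id, login_date
    """
).fetchall()

for row in running:
    print(row)
```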

Topic 5: Count the occurrences of characters

The original text table easy_original_text contains the following data:

mysql> SELECT * FROM easy_original_text;
-- text_id (text ID): VARCHAR  text_content (text content): VARCHAR
+---------+--------------+
| text_id | text_content |
+---------+--------------+
| t001    | !**@%&       |
| t002    | *            |
| t003    | @@!***&*     |
| t004    | %&*$@        |
| t005    | *******      |
| t006    | 123456       |
+---------+--------------+
6 rows in set (0.00 sec)

[Question 5] Count the number of occurrences of the symbol * in each text. The output includes: text_id (text ID) and num (the number of times * appears), and the sample result is shown in the figure below:
[Analysis of Question 5] Use REPLACE() to replace every * in the text with an empty string, which has length 0. The difference between the text's length before and after the replacement is the number of occurrences of *. Because LENGTH() counts bytes, this works directly for a single-byte character such as *; for a multi-byte character, use CHAR_LENGTH() instead. Knowledge point involved: string functions. The reference code is as follows:

mysql> SELECT text_id, LENGTH(text_content) - LENGTH(REPLACE(text_content, '*', '')) AS num
    -> FROM easy_original_text table1;
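The length-difference trick can be checked in plain Python. There, len() counts characters (like CHAR_LENGTH()), and len(s.encode("utf-8")) counts bytes (like LENGTH() on a UTF-8 string). The byte-based helper and the Chinese sample string are hypothetical. They are added only to show why the byte-based version overcounts multi-byte characters.

```python
# Sample rows from easy_original_text
easy_original_text = {
    "t001": "!**@%&", "t002": "*", "t003": "@@!***&*",
    "t004": "%&*$@", "t005": "*******", "t006": "123456",
}

def count_occurrences(text: str, ch: str = "*") -> int:
    # Same idea as CHAR_LENGTH(text) - CHAR_LENGTH(REPLACE(text, ch, '')) for a one-character ch
    return len(text) - len(text.replace(ch, ""))

def byte_length_difference(text: str, ch: str) -> int:
    # Byte-based variant, like MySQL LENGTH() on a UTF-8 string
    return len(text.encode("utf-8")) - len(text.replace(ch, "").encode("utf-8"))

counts = {text_id: count_occurrences(t) for text_id, t in easy_original_text.items()}
print(counts)

# For a 3-byte character such as 队, the byte difference is 3x the real count
sample = "队*队队"
print(count_occurrences(sample, "队"), byte_length_difference(sample, "队"))  # 3 9
```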

Topic 6: Find the product with the highest sales in each category

The product sales table easy_product_sale records the sales volume of products in different categories. Its data is shown in the following table:

mysql> SELECT * FROM easy_product_sale;
-- product_id: product ID, VARCHAR  product_category: product category, VARCHAR  sale: sales volume, INT
+------------+------------------+-------+
| product_id | product_category | sale  |
+------------+------------------+-------+
| p001       | c001             | 14600 |
| p002       | c001             | 23300 |
| p003       | c001             |  8000 |
| p004       | c002             | 40800 |
| p005       | c002             |  5300 |
| p006       | c003             | 12900 |
+------------+------------------+-------+
6 rows in set (0.00 sec)

[Question 6] Query the best-selling product in each product category. The output includes: product_category (product category), product_id (product ID) and sale (sales volume), and the sample result is shown in the figure below:
[Analysis of Question 6] Use DENSE_RANK() to add a new column, sale_rank, holding each product's sales rank within its category. Wrap this in a subquery, and in the outer query keep the rows with sale_rank = 1 via WHERE. These rows are the best-selling products of each category; if two products tie for first place, both are kept. The reference code is as follows:

mysql> SELECT temp_table.product_category, temp_table.product_id, temp_table.sale
    -> FROM (SELECT *, DENSE_RANK() OVER (PARTITION BY product_category ORDER BY sale DESC ) AS sale_rank
    ->       FROM easy_product_sale) temp_table
    -> WHERE temp_table.sale_rank = 1;
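A minimal sketch running the same query in Python's sqlite3 module (assuming SQLite 3.25+ for DENSE_RANK()). The only change is an outer ORDER BY, added to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE easy_product_sale (product_id TEXT, product_category TEXT, sale INTEGER)")
conn.executemany(
    "INSERT INTO easy_product_sale VALUES (?, ?, ?)",
    [
        ("p001", "c001", 14600), ("p002", "c001", 23300), ("p003", "c001", 8000),
        ("p004", "c002", 40800), ("p005", "c002", 5300), ("p006", "c003", 12900),
    ],
)

# Rank products by sales inside each category; rank 1 is the best seller
best_sellers = conn.execute(
    """
    SELECT temp_table.product_category, temp_table.product_id, temp_table.sale
    FROM (SELECT *,
                 DENSE_RANK() OVER (PARTITION BY product_category ORDER BY sale DESC) AS sale_rank
          FROM easy_product_sale) AS temp_table
    WHERE temp_table.sale_rank = 1
    ORDER BY temp_table.product_category
    """
).fetchall()
print(best_sellers)
```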

Topic 7: Find the second highest paid employee in each department

The company employee table easy_employee contains the following data:

mysql> SELECT * FROM easy_employee;
-- employee_id (employee ID): VARCHAR  employee_name (employee name): VARCHAR  employee_salary (employee salary): INT
-- department (ID of the employee's department): VARCHAR
+-------------+---------------+-----------------+------------+
| employee_id | employee_name | employee_salary | department |
+-------------+---------------+-----------------+------------+
| a001        | Bob           |            7000 | b1         |
| a002        | Jack          |            9000 | b1         |
| a003        | Alice         |            8000 | b2         |
| a004        | Ben           |            5000 | b2         |
| a005        | Candy         |            4000 | b2         |
| a006        | Allen         |            5000 | b2         |
| a007        | Linda         |           10000 | b3         |
+-------------+---------------+-----------------+------------+
7 rows in set (0.00 sec)

The department table easy_department contains the following data:

mysql> SELECT * FROM easy_department;
-- department_id (department ID): VARCHAR  department_name (department name): VARCHAR
+---------------+-----------------+
| department_id | department_name |
+---------------+-----------------+
| b1            | Sales           |
| b2            | IT              |
| b3            | Product         |
+---------------+-----------------+
3 rows in set (0.00 sec)

[Question 7] Query the employee(s) with the second-highest salary in each department. The output includes: employee_id (employee ID), employee_name (employee name), employee_salary (employee salary) and department_name (department name), and the sample result is shown in the figure below:
[Question 7 Analysis] Use a window function to partition by department ID and rank employees by salary in descending order as employee_salary_rank. Filter with WHERE employee_salary_rank = 2 to get the second-highest salary. Then join the result with the department table to look up the department name, and select the required columns. The reference code is as follows:

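-- DENSE_RANK() rather than RANK(): if two employees tie for the top salary, RANK() skips rank 2 and that department returns no row
mysql> SELECT a2.employee_id, a2.employee_name, a2.employee_salary, easy_department.department_name
    -> FROM (SELECT *
    ->       FROM (SELECT *, DENSE_RANK() OVER (PARTITION BY department ORDER BY employee_salary DESC ) AS employee_salary_rank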
    ->             FROM easy_employee) AS a1
    ->       WHERE a1.employee_salary_rank = 2) AS a2
    ->          INNER JOIN easy_department ON a2.department = easy_department.department_id;
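Here is a minimal sketch of the same idea in Python's sqlite3 module (assuming SQLite 3.25+). It flattens the nested subqueries into one and filters after the join, which returns the same rows. It ranks with DENSE_RANK(). A second query on a hypothetical three-row table shows how RANK() and DENSE_RANK() differ when two employees share the top salary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE easy_employee (employee_id TEXT, employee_name TEXT, employee_salary INTEGER, department TEXT)"
)
conn.execute("CREATE TABLE easy_department (department_id TEXT, department_name TEXT)")
conn.executemany(
    "INSERT INTO easy_employee VALUES (?, ?, ?, ?)",
    [
        ("a001", "Bob", 7000, "b1"), ("a002", "Jack", 9000, "b1"), ("a003", "Alice", 8000, "b2"),
        ("a004", "Ben", 5000, "b2"), ("a005", "Candy", 4000, "b2"), ("a006", "Allen", 5000, "b2"),
        ("a007", "Linda", 10000, "b3"),
    ],
)
conn.executemany(
    "INSERT INTO easy_department VALUES (?, ?)",
    [("b1", "Sales"), ("b2", "IT"), ("b3", "Product")],
)

# Rank salaries per department, keep rank 2, then attach the department name
second_highest = conn.execute(
    """
    SELECT a1.employee_id, a1.employee_name, a1.employee_salary, d.department_name
    FROM (SELECT *,
                 DENSE_RANK() OVER (PARTITION BY department ORDER BY employee_salary DESC) AS employee_salary_rank
          FROM easy_employee) AS a1
    INNER JOIN easy_department AS d ON a1.department = d.department_id
    WHERE a1.employee_salary_rank = 2
    ORDER BY a1.employee_id
    """
).fetchall()
print(second_highest)

# Hypothetical tie at the top: RANK() jumps from 1 to 3, DENSE_RANK() still yields a 2
tie_demo = conn.execute(
    """
    WITH t(salary) AS (VALUES (9000), (9000), (7000))
    SELECT salary,
           RANK() OVER (ORDER BY salary DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rnk
    FROM t
    ORDER BY salary DESC
    """
).fetchall()
print(tie_demo)  # [(9000, 1, 1), (9000, 1, 1), (7000, 3, 2)]
```

Department b3 has only one employee, so it has no second-highest salary and returns no row.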

Topic 8: Analysis of game player logins

The game player login table easy_game_login contains the following data:

mysql> SELECT * FROM easy_game_login;
-- user_id (player ID): VARCHAR  login_time (login time): VARCHAR
+---------+---------------------+
| user_id | login_time          |
+---------+---------------------+
| u001    | 2021-03-01 06:01:12 |
| u001    | 2021-03-01 07:14:20 |
| u002    | 2021-03-01 07:20:22 |
| u003    | 2021-03-01 08:22:45 |
| u001    | 2021-03-01 11:10:23 |
| u004    | 2021-03-01 12:00:10 |
| u002    | 2021-03-01 18:03:52 |
| u005    | 2021-03-01 20:10:29 |
| u003    | 2021-03-01 21:11:50 |
+---------+---------------------+
9 rows in set (0.00 sec)

[Question 8-1] Query the players who logged in to the game more than once on the same day, and how many times they logged in. The output includes: user_id (player ID), login_date (login date) and num (number of logins), and the sample result is shown in the figure below:
[Question 8-1 Analysis] Since login_time is stored as a string, use LEFT(login_time, 10) to extract the date part. Then group by player and date, and use HAVING to keep the groups with more than one login. The reference code is as follows:

mysql> SELECT a.user_id, a.login_date, COUNT(a.login_date) AS 'num'
    -> FROM (SELECT user_id, LEFT(login_time, 10) AS 'login_date' FROM easy_game_login)
    ->          AS a
    -> GROUP BY a.user_id, a.login_date
    -> HAVING COUNT(a.login_date) > 1;
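The same grouping can be sketched in Python's sqlite3 module. SQLite has no LEFT() function, so this sketch assumes substr(login_time, 1, 10) as the equivalent way to take the 'YYYY-MM-DD' prefix. It also folds the derived table into a single GROUP BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE easy_game_login (user_id TEXT, login_time TEXT)")
conn.executemany(
    "INSERT INTO easy_game_login VALUES (?, ?)",
    [
        ("u001", "2021-03-01 06:01:12"), ("u001", "2021-03-01 07:14:20"),
        ("u002", "2021-03-01 07:20:22"), ("u003", "2021-03-01 08:22:45"),
        ("u001", "2021-03-01 11:10:23"), ("u004", "2021-03-01 12:00:10"),
        ("u002", "2021-03-01 18:03:52"), ("u005", "2021-03-01 20:10:29"),
        ("u003", "2021-03-01 21:11:50"),
    ],
)

# substr(login_time, 1, 10) plays the role of MySQL's LEFT(login_time, 10)
multi_logins = conn.execute(
    """
    SELECT user_id, substr(login_time, 1, 10) AS login_date, COUNT(*) AS num
    FROM easy_game_login
    GROUP BY user_id, substr(login_time, 1, 10)
    HAVING COUNT(*) > 1
    ORDER BY user_id
    """
).fetchall()
print(multi_logins)
```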

[Question 8-2] For players who logged in to the game more than once on the same day, return only their last login of that day. The output includes: user_id (player ID) and login_time (login time), and the sample result is shown in the figure below:
[Question 8-2 Analysis] Start from the players with multiple logins per day found in the previous question. Use RANK() partitioned by player and date and ordered by login time in descending order, then keep the record ranked 1, i.e. the last login of the day. The reference code is as follows:

mysql> SELECT user_id, login_time
    -> FROM (SELECT e1.user_id,
    ->              e1.login_time,
    ->              RANK() OVER (PARTITION BY e1.user_id,LEFT(login_time, 10) ORDER BY login_time DESC ) AS login_time_rank
    ->       FROM easy_game_login e1
    ->                INNER JOIN (
    ->           SELECT a.user_id, a.login_date
    ->           FROM (SELECT user_id, LEFT(login_time, 10) AS 'login_date' FROM easy_game_login)
    ->                    AS a
    ->           GROUP BY a.user_id, a.login_date
    ->           HAVING COUNT(a.login_date) > 1) e2 ON e1.user_id = e2.user_id AND left(e1.login_time, 10) = e2.login_date) b
    -> WHERE b.login_time_rank = 1;
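This sketch in Python's sqlite3 module (SQLite 3.25+ assumed, with LEFT() rewritten as substr()) runs the join-based query above. It then runs an alternative that is not from the original article. The alternative computes COUNT(*) over the same (user, day) window, which replaces the self-join with a second window function. On the sample data both return the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE easy_game_login (user_id TEXT, login_time TEXT)")
conn.executemany(
    "INSERT INTO easy_game_login VALUES (?, ?)",
    [
        ("u001", "2021-03-01 06:01:12"), ("u001", "2021-03-01 07:14:20"),
        ("u002", "2021-03-01 07:20:22"), ("u003", "2021-03-01 08:22:45"),
        ("u001", "2021-03-01 11:10:23"), ("u004", "2021-03-01 12:00:10"),
        ("u002", "2021-03-01 18:03:52"), ("u005", "2021-03-01 20:10:29"),
        ("u003", "2021-03-01 21:11:50"),
    ],
)

# The join-based query, with LEFT(x, 10) rewritten as substr(x, 1, 10)
join_version = conn.execute(
    """
    SELECT user_id, login_time
    FROM (SELECT e1.user_id AS user_id, e1.login_time AS login_time,
                 RANK() OVER (PARTITION BY e1.user_id, substr(e1.login_time, 1, 10)
                              ORDER BY e1.login_time DESC) AS login_time_rank
          FROM easy_game_login e1
          INNER JOIN (SELECT user_id, substr(login_time, 1, 10) AS login_date
                      FROM easy_game_login
                      GROUP BY user_id, substr(login_time, 1, 10)
                      HAVING COUNT(*) > 1) e2
            ON e1.user_id = e2.user_id AND substr(e1.login_time, 1, 10) = e2.login_date) AS b
    WHERE b.login_time_rank = 1
    ORDER BY user_id
    """
).fetchall()

# Alternative: count logins per (user, day) with a window instead of joining
window_version = conn.execute(
    """
    SELECT user_id, login_time
    FROM (SELECT user_id, login_time,
                 RANK() OVER (PARTITION BY user_id, substr(login_time, 1, 10)
                              ORDER BY login_time DESC) AS login_time_rank,
                 COUNT(*) OVER (PARTITION BY user_id, substr(login_time, 1, 10)) AS logins_that_day
          FROM easy_game_login) AS b
    WHERE b.login_time_rank = 1 AND b.logins_that_day > 1
    ORDER BY user_id
    """
).fetchall()

print(join_version)
print(join_version == window_version)  # True
```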

Topic 9: Amount of the user's first order

The e-commerce order table easy_user_order records users' purchases and related information. Its data is shown in the following table:

mysql> SELECT * FROM easy_user_order;
-- user_id (user ID): VARCHAR  payment (order amount): INT  paytime (order time): DATETIME
+---------+---------+---------------------+
| user_id | payment | paytime             |
+---------+---------+---------------------+
| a001    |     500 | 2021-02-01 13:25:00 |
| a001    |     800 | 2021-02-03 09:10:00 |
| b001    |     150 | 2021-02-03 15:18:00 |
| a002    |      90 | 2021-02-05 08:10:00 |
| a001    |    1050 | 2021-02-06 10:34:00 |
| b001    |     400 | 2021-02-07 18:19:00 |
+---------+---------+---------------------+
6 rows in set (0.00 sec)

[Question 9] A user's first order on the e-commerce website (the order with the earliest order time) can reflect the user's spending power. Query each user's first order. The output includes: user_id (user ID) and payment (order amount), and the sample result is shown in the figure below:
[Analysis of Question 9] Use DENSE_RANK() partitioned by user ID and ordered by order time in ascending order to rank each user's orders. In the outer query, keep each user's order ranked 1, which is the first order. Knowledge points involved: subqueries and window functions. The reference code is as follows:

mysql> SELECT user_id, payment
    -> FROM (SELECT user_id, payment, DENSE_RANK() OVER (PARTITION BY user_id ORDER BY paytime ASC) AS 'paytime_rank'
    ->       FROM easy_user_order)
    ->          AS a
    -> WHERE a.paytime_rank = 1;
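A minimal sketch of the same query in Python's sqlite3 module (assuming SQLite 3.25+). Order times are stored as 'YYYY-MM-DD HH:MM:SS' text, which sorts chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE easy_user_order (user_id TEXT, payment INTEGER, paytime TEXT)")
conn.executemany(
    "INSERT INTO easy_user_order VALUES (?, ?, ?)",
    [
        ("a001", 500, "2021-02-01 13:25:00"), ("a001", 800, "2021-02-03 09:10:00"),
        ("b001", 150, "2021-02-03 15:18:00"), ("a002", 90, "2021-02-05 08:10:00"),
        ("a001", 1050, "2021-02-06 10:34:00"), ("b001", 400, "2021-02-07 18:19:00"),
    ],
)

# Rank each user's orders by time; rank 1 is the first order
first_orders = conn.execute(
    """
    SELECT user_id, payment
    FROM (SELECT user_id, payment,
                 DENSE_RANK() OVER (PARTITION BY user_id ORDER BY paytime ASC) AS paytime_rank
          FROM easy_user_order) AS a
    WHERE a.paytime_rank = 1
    ORDER BY user_id
    """
).fetchall()
print(first_orders)  # [('a001', 500), ('a002', 90), ('b001', 150)]
```

If a user placed two orders at exactly the same earliest time, DENSE_RANK() would return both of them. ROW_NUMBER() would return exactly one, but which one is arbitrary unless a tie-breaking column is added to the ORDER BY.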

Topic 10: Products participating in promotional activities

The product promotion schedule easy_product_promotion contains the following data:

mysql> SELECT * FROM easy_product_promotion;
-- commodity_id (product ID): VARCHAR  start_date (promotion start date): DATE  end_date (promotion end date): DATE
+--------------+------------+------------+
| commodity_id | start_date | end_date   |
+--------------+------------+------------+
| a001         | 2021-01-01 | 2021-01-06 |
| a002         | 2021-01-01 | 2021-01-10 |
| a003         | 2021-01-02 | 2021-01-07 |
| a004         | 2021-01-05 | 2021-01-07 |
| b001         | 2021-01-05 | 2021-01-10 |
| b002         | 2021-01-04 | 2021-01-06 |
| c001         | 2021-01-06 | 2021-01-08 |
| c002         | 2021-01-02 | 2021-01-04 |
| c003         | 2021-01-08 | 2021-01-15 |
+--------------+------------+------------+
9 rows in set (0.00 sec)

[Question 10] Query the products whose promotion runs on at least one day between January 7, 2021 and January 9, 2021. The output includes: commodity_id (product ID), and the sample result is shown in the figure below:
[Analysis 10] This question is easiest to approach graphically: first enumerate every possible arrangement of the dates, then write the SQL. Let a be January 7, 2021, b be January 9, 2021, s the start date of a promotion, and e its end date. The promotion overlaps the period exactly when the four dates fall in one of these orders: "sabe", "saeb", "asbe" or "aseb". Knowledge point involved: compound date conditions. The SQL code for this question is as follows:

mysql> SELECT commodity_id FROM easy_product_promotion
    -> WHERE (start_date <= '2021-01-09' AND start_date >= '2021-01-07')
    -> OR (end_date <= '2021-01-09' AND end_date >='2021-01-07')
    -> OR (start_date >= '2021-01-07' AND end_date <= '2021-01-09')
    -> OR (start_date <= '2021-01-07' AND end_date >= '2021-01-09');
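The four OR-ed conditions can be collapsed into the standard interval-overlap test start_date <= '2021-01-09' AND end_date >= '2021-01-07'. This assumes start_date <= end_date in every row. The Python sketch below checks the equivalence by brute force over every January 2021 date pair and then applies the test to the sample data.

```python
from datetime import date, timedelta

# Sample rows from easy_product_promotion: commodity_id -> (start_date, end_date)
promotions = {
    "a001": (date(2021, 1, 1), date(2021, 1, 6)),
    "a002": (date(2021, 1, 1), date(2021, 1, 10)),
    "a003": (date(2021, 1, 2), date(2021, 1, 7)),
    "a004": (date(2021, 1, 5), date(2021, 1, 7)),
    "b001": (date(2021, 1, 5), date(2021, 1, 10)),
    "b002": (date(2021, 1, 4), date(2021, 1, 6)),
    "c001": (date(2021, 1, 6), date(2021, 1, 8)),
    "c002": (date(2021, 1, 2), date(2021, 1, 4)),
    "c003": (date(2021, 1, 8), date(2021, 1, 15)),
}
a, b = date(2021, 1, 7), date(2021, 1, 9)  # the query window

def four_conditions(s: date, e: date) -> bool:
    # The four OR-ed conditions from the SQL above
    return ((a <= s <= b) or (a <= e <= b)
            or (s >= a and e <= b) or (s <= a and e >= b))

def overlaps(s: date, e: date) -> bool:
    # Overlap test: starts no later than the window ends, ends no earlier than it starts
    return s <= b and e >= a

selected = sorted(cid for cid, (s, e) in promotions.items() if four_conditions(s, e))
print(selected)

# Exhaustive check over every January 2021 promotion with start <= end
days = [date(2021, 1, 1) + timedelta(days=i) for i in range(31)]
equivalent = all(four_conditions(s, e) == overlaps(s, e) for s in days for e in days if s <= e)
print(equivalent)  # True
```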

That concludes today's study. The author writes this article only for learning and exchange, to help readers studying databases avoid detours and save time, and not for any other purpose. If anything here infringes on your rights, contact the blogger and it will be deleted. Thank you for reading this post; I hope it can guide you on your programming journey. Happy reading!



A good book is worth rereading a hundred times; read it thoroughly and think it over, and understanding will come. If I want to stand out, I must keep learning to gain more knowledge, change my destiny through knowledge, record my growth with this blog, and prove my hard work with action.
If my blog is helpful to you and you like its content, please like, comment on, and bookmark it! They say those who like a post never have bad luck and are full of energy every day! If you would rather just read for free, I still wish you happiness every day, and you are welcome to visit my blog often.
Coding is not easy, and everyone's support is what keeps me going. Don't forget to follow me after you like the post!


Origin blog.csdn.net/xw1680/article/details/130570707