Article directory
- Topic 1: App usage frequency analysis
- Topic 2: App download statistics
- Topic 3: Finding Active Learners
- Topic 4: Product classification and sorting
- Topic 5: Merchandise Sales Analysis
- Topic 6: Revenue statistics of online car-hailing drivers
- Topic 7: Website login time interval statistics
- Topic 8: Statistics of Commodity Revenue in Different Regions
- Topic 9: Statistics on overdue credit
Topic 1: App usage frequency analysis
There is an App login log table middle_app_login that records users' App sessions; the data in the middle_app_login table is shown below:
mysql> SELECT * FROM middle_app_login;
-- user_id (user ID): VARCHAR   start_time (App login time): DATETIME   end_time (App logout time): DATETIME
+---------+---------------------+---------------------+
| user_id | start_time | end_time |
+---------+---------------------+---------------------+
| u001 | 2021-04-01 10:12:30 | 2021-04-01 11:13:21 |
| u002 | 2021-04-02 08:40:21 | 2021-04-02 10:13:41 |
| u003 | 2021-04-02 15:31:01 | 2021-04-02 15:54:42 |
| u001 | 2021-04-04 13:25:40 | 2021-04-04 17:52:46 |
| u003 | 2021-04-06 07:10:20 | 2021-04-06 08:03:15 |
| u001 | 2021-04-09 18:20:34 | 2021-04-09 18:23:58 |
| u001 | 2021-04-10 14:25:55 | 2021-04-10 15:01:25 |
+---------+---------------------+---------------------+
7 rows in set (0.00 sec)
[Question 1] Based on the table, calculate each user's average time between exiting the App and the next login. Users who have logged in only once are not counted. Output the average interval in minutes, rounded to one decimal place. The output columns are user_id (user ID) and avg_minute (average interval in minutes); a sample of the result is shown in the figure below:
[Analysis of Question 1] This question uses the LEAD() window function, partitioning by user and ordering by login time, to build a derived table in which each row pairs a logout time with the same user's next login time, which is convenient for later processing. After filtering out rows whose next login is NULL, use TIMESTAMPDIFF() to compute the minute difference between end_time and start_time_lead, average it per user, and round to one decimal place. Knowledge points involved: subqueries, date/time functions, window functions, NULL handling, rounding, grouped aggregation. The reference code is as follows:
mysql> -- ① Following the analysis above
mysql> SELECT user_id
-> , ROUND(AVG(TIMESTAMPDIFF(MINUTE, end_time, start_time_lead)), 1) AS avg_minute
-> FROM (SELECT user_id
-> , start_time
-> , end_time
-> , LEAD(start_time, 1) OVER (PARTITION BY user_id ORDER BY start_time) AS start_time_lead
-> FROM middle_app_login) a
-> WHERE start_time_lead IS NOT NULL
-> GROUP BY user_id;
+---------+------------+
| user_id | avg_minute |
+---------+------------+
| u001 | 4293.3 |
| u003 | 5235.0 |
+---------+------------+
2 rows in set (0.00 sec)
mysql> -- ② An alternative approach
mysql> SELECT user_id, ROUND(AVG(end_time_lag), 1) AS avg_minute
-> FROM (SELECT a1.user_id,
-> TIMESTAMPDIFF(MINUTE, LAG(end_time, 1) OVER (PARTITION BY a1.user_id ORDER BY start_time), a1.start_time
-> ) AS end_time_lag
-> FROM middle_app_login a1
-> INNER JOIN (SELECT user_id FROM middle_app_login GROUP BY user_id HAVING COUNT(*) > 1) a2
-> ON a1.user_id = a2.user_id) a
-> WHERE a.end_time_lag IS NOT NULL
-> GROUP BY user_id;
+---------+------------+
| user_id | avg_minute |
+---------+------------+
| u001 | 4293.3 |
| u003 | 5235.0 |
+---------+------------+
2 rows in set (0.00 sec)
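As a quick cross-check of the interval logic, the same computation can be sketched in plain Python (the table rows are copied inline; `avg_relogin_minutes` is an illustrative helper name, not part of the original solution). Like TIMESTAMPDIFF(MINUTE, ...), the sketch truncates each gap to whole minutes before averaging:

```python
from datetime import datetime

# (user_id, start_time, end_time) rows copied from middle_app_login
rows = [
    ("u001", "2021-04-01 10:12:30", "2021-04-01 11:13:21"),
    ("u002", "2021-04-02 08:40:21", "2021-04-02 10:13:41"),
    ("u003", "2021-04-02 15:31:01", "2021-04-02 15:54:42"),
    ("u001", "2021-04-04 13:25:40", "2021-04-04 17:52:46"),
    ("u003", "2021-04-06 07:10:20", "2021-04-06 08:03:15"),
    ("u001", "2021-04-09 18:20:34", "2021-04-09 18:23:58"),
    ("u001", "2021-04-10 14:25:55", "2021-04-10 15:01:25"),
]

def avg_relogin_minutes(rows):
    fmt = "%Y-%m-%d %H:%M:%S"
    sessions = {}
    for user, start, end in rows:
        sessions.setdefault(user, []).append(
            (datetime.strptime(start, fmt), datetime.strptime(end, fmt)))
    result = {}
    for user, ses in sessions.items():
        ses.sort()  # mirrors ORDER BY start_time inside the window
        # gap = whole minutes from one logout to the next login,
        # truncated like TIMESTAMPDIFF(MINUTE, end_time, start_time_lead)
        gaps = [int((ses[i + 1][0] - ses[i][1]).total_seconds() // 60)
                for i in range(len(ses) - 1)]
        if gaps:  # single-login users are skipped
            result[user] = round(sum(gaps) / len(gaps), 1)
    return result

print(avg_relogin_minutes(rows))  # → {'u001': 4293.3, 'u003': 5235.0}
```

The result matches the query output above: u002 has only one session and is dropped.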
Topic 2: App download statistics
There is an App cumulative download table middle_app_download, which records each App's cumulative download count. The data in the middle_app_download table is as follows:
mysql> SELECT * FROM middle_app_download;
-- app_id (App ID): VARCHAR   app_type (App type): VARCHAR   download (download count): INT
+--------+----------+----------+
| app_id | app_type | download |
+--------+----------+----------+
| a001 | A | 12432 |
| a002 | B | 9853 |
| a003 | A | 1924 |
| a004 | C | 2679 |
| a005 | C | 29104 |
| a006 | A | 10235 |
| a007 | B | 5704 |
| a008 | B | 2850 |
| a009 | B | 8235 |
| a010 | C | 9746 |
+--------+----------+----------+
10 rows in set (0.00 sec)
[Question 2] Query the average download count of each App type, excluding the Apps whose download counts rank in the top 10% and the bottom 10%. The output columns are app_type (App type) and avg_download (average download count); a sample of the result is shown in the figure below:
[Analysis of Question 2] In a subquery, use the RANK() window function to add a download ranking column; in the outer query, use WHERE to keep only the rows inside the required ranking band, then group by type and average the download counts. Knowledge points involved: subqueries, window functions, grouped aggregation. The reference code is as follows:
mysql> SELECT a.app_type, AVG(a.download) as avg_download
-> FROM (SELECT app_id, app_type, download, RANK() OVER (ORDER BY download DESC ) AS download_rank
-> FROM middle_app_download) a
-> WHERE a.download_rank > (SELECT COUNT(*) FROM middle_app_download) * 0.1
-> AND a.download_rank <= (SELECT COUNT(*) FROM middle_app_download) * 0.9
-> GROUP BY a.app_type;
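The trimming can be sanity-checked in plain Python. The sketch below copies the table inline and keeps the apps whose rank lies above the bottom-10% cut and at or below the 90% cut; how to treat a row sitting exactly on a cutoff is a judgment call. There are no tied download counts here, so a plain sort stands in for RANK():

```python
# (app_id, app_type, download) rows copied from middle_app_download
rows = [
    ("a001", "A", 12432), ("a002", "B", 9853), ("a003", "A", 1924),
    ("a004", "C", 2679), ("a005", "C", 29104), ("a006", "A", 10235),
    ("a007", "B", 5704), ("a008", "B", 2850), ("a009", "B", 8235),
    ("a010", "C", 9746),
]

n = len(rows)
ranked = sorted(rows, key=lambda r: r[2], reverse=True)  # rank 1 = most downloads
totals = {}
for rank, (app_id, app_type, download) in enumerate(ranked, start=1):
    if n * 0.1 < rank <= n * 0.9:  # drop the top 10% and the bottom 10%
        s, c = totals.get(app_type, (0, 0))
        totals[app_type] = (s + download, c + 1)

avg_download = {t: s / c for t, (s, c) in totals.items()}
print(avg_download)  # → {'A': 11333.5, 'B': 6660.5, 'C': 6212.5}
```

With 10 rows the band keeps ranks 2 through 9, so only the single highest and single lowest apps are excluded.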
Topic 3: Finding Active Learners
There is a user learning check-in table middle_active_learning, and the data in the middle_active_learning table is as follows:
mysql> SELECT * FROM middle_active_learning;
-- user_id (user ID): VARCHAR   study_date (check-in date): DATE
+---------+------------+
| user_id | study_date |
+---------+------------+
| u001 | 2021-04-01 |
| u002 | 2021-04-01 |
| u003 | 2021-04-03 |
| u001 | 2021-04-06 |
| u003 | 2021-04-07 |
| u001 | 2021-04-12 |
| u001 | 2021-04-13 |
| u002 | 2021-04-14 |
| u001 | 2021-04-23 |
| u002 | 2021-04-24 |
| u001 | 2021-04-26 |
| u003 | 2021-04-27 |
| u002 | 2021-04-30 |
+---------+------------+
13 rows in set (0.00 sec)
[Question 3] Based on the table, find the users who checked in during every week of April 2021. The output column is user_id (user ID); a sample of the result is shown in the figure below:
[Analysis of Question 3] Use the WEEKOFYEAR() function to get the ISO week number, restricting study_date to April 2021. Since a user may check in several times within one week, first deduplicate the (user, week) pairs with DISTINCT. Then group by user with GROUP BY and keep the users whose week count equals 5 (April 2021 spans 5 ISO weeks): these are the users who checked in every week. Knowledge points involved: subqueries, DISTINCT, date/time functions. The reference code is as follows:
mysql> SELECT a.user_id
-> FROM (SELECT DISTINCT user_id
-> , WEEKOFYEAR(study_date) AS study_week
-> FROM middle_active_learning
-> WHERE study_date >= '2021-04-01'
-> AND study_date <= '2021-04-30') a
-> GROUP BY a.user_id
-> HAVING COUNT(a.study_week) = 5;
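The week logic is easy to verify in plain Python: `date.isocalendar()` yields the same ISO week number as MySQL's WEEKOFYEAR(), and April 2021 covers ISO weeks 13 through 17. (The data is copied inline; the helper names are illustrative.)

```python
from datetime import date

# (user_id, study_date) rows copied from middle_active_learning
rows = [
    ("u001", "2021-04-01"), ("u002", "2021-04-01"), ("u003", "2021-04-03"),
    ("u001", "2021-04-06"), ("u003", "2021-04-07"), ("u001", "2021-04-12"),
    ("u001", "2021-04-13"), ("u002", "2021-04-14"), ("u001", "2021-04-23"),
    ("u002", "2021-04-24"), ("u001", "2021-04-26"), ("u003", "2021-04-27"),
    ("u002", "2021-04-30"),
]

# every ISO week touched by April 2021 (weeks 13-17)
april_weeks = {date(2021, 4, d).isocalendar()[1] for d in range(1, 31)}

weeks_by_user = {}
for user, s in rows:
    y, m, d = map(int, s.split("-"))
    if (y, m) == (2021, 4):  # WHERE study_date is within April 2021
        weeks_by_user.setdefault(user, set()).add(date(y, m, d).isocalendar()[1])

# users whose distinct check-in weeks cover all 5 weeks
every_week = [u for u, w in weeks_by_user.items() if w == april_weeks]
print(every_week)  # → ['u001']
```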
Topic 4: Product classification and sorting
There is a commodity classification table middle_commodity_classification; its data is shown in the following table:
mysql> SELECT * FROM middle_commodity_classification;
-- current_category (current category): VARCHAR   parent_category (parent category): VARCHAR
+------------------+-----------------+
| current_category | parent_category |
+------------------+-----------------+
| 刀 | 厨具 |
| 厨具 | 生活用品 |
| 碗 | 餐具 |
| 水果刀 | 刀 |
| 剔骨刀 | 刀 |
| 餐具 | 生活用品 |
| 汤碗 | 碗 |
+------------------+-----------------+
7 rows in set (0.00 sec)
[Question 4] Write a query that produces the sample result below. The output columns are the third-level category, the second-level category, the first-level category, and the root category; a sample of the result is shown in the figure below:
[Analysis of Question 4] This question is about reconstructing the relationships between categories. Since the expected result spans 4 category levels, it requires a self-join across 3 copies of the table. Knowledge points involved: self-joins. The reference code is as follows (the aliases 三级类目, 二级类目, 一级类目, and 根目录 are the third-level, second-level, and first-level categories and the root):
mysql> SELECT m1.current_category AS '三级类目',
-> m1.parent_category AS '二级类目',
-> m2.parent_category AS '一级类目',
-> m3.parent_category AS '根目录'
-> FROM middle_commodity_classification m1,
-> middle_commodity_classification m2,
-> middle_commodity_classification m3
-> WHERE m1.parent_category = m2.current_category
-> AND m2.parent_category = m3.current_category;
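The three-way self-join simply flattens a parent-child hierarchy. The same walk can be sketched in Python by following each leaf category up to the root (data copied inline; the Chinese category names are kept exactly as in the table):

```python
# current_category -> parent_category, copied from middle_commodity_classification
parent = {
    "刀": "厨具", "厨具": "生活用品", "碗": "餐具", "水果刀": "刀",
    "剔骨刀": "刀", "餐具": "生活用品", "汤碗": "碗",
}

# leaves: categories that never appear as anyone's parent
leaves = [c for c in parent if c not in set(parent.values())]

chains = []
for leaf in leaves:
    chain, node = [leaf], leaf
    while node in parent:      # climb until the root, which has no parent
        node = parent[node]
        chain.append(node)
    chains.append(chain)       # [third-level, second-level, first-level, root]

for c in chains:
    print(" -> ".join(c))
```

Each printed chain corresponds to one row of the self-join result.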
Topic 5: Merchandise Sales Analysis
There is a commodity information table middle_commodity_info, which records basic commodity information. The middle_commodity_info data is as follows:
mysql> SELECT * FROM middle_commodity_info;
-- sku_id (commodity SKU): VARCHAR   commodity_category (commodity category): VARCHAR   director (sales director): VARCHAR
+--------+--------------------+----------+
| sku_id | commodity_category | director |
+--------+--------------------+----------+
| u001 | c001 | a001 |
| u003 | c002 | a001 |
| u002 | c003 | a002 |
+--------+--------------------+----------+
3 rows in set (0.00 sec)
There is also a commodity sales table middle_commodity_sale, which records daily sales per commodity. The middle_commodity_sale data is as follows:
mysql> SELECT * FROM middle_commodity_sale;
-- date (date): DATE   sku_id (commodity SKU): VARCHAR   sales (sales amount): INT
+------------+--------+-------+
| date | sku_id | sales |
+------------+--------+-------+
| 2020-12-20 | u001 | 12000 |
| 2020-12-20 | u002 | 8000 |
| 2020-12-20 | u003 | 11000 |
| 2020-12-21 | u001 | 20000 |
| 2020-12-21 | u003 | 16000 |
| 2020-12-22 | u003 | 11000 |
| 2020-12-22 | u001 | 34000 |
| 2020-12-22 | u002 | 11000 |
| 2020-12-23 | u003 | 18000 |
| 2020-12-23 | u001 | 30000 |
+------------+--------+-------+
10 rows in set (0.00 sec)
[Question 5] For sales director a001, query the two days with the highest total sales of each commodity category in 2020. The output columns are commodity_category (commodity category), date (date), and total_sales (total sales); a sample of the result is shown in the figure below:
[Question 5] The reference code is as follows:
mysql> SELECT commodity_category
-> , `date`
-> , total_sales
-> FROM (
-> SELECT commodity_category
-> , `date`
-> , RANK() OVER (PARTITION BY commodity_category ORDER BY total_sales DESC) AS ranking
-> , total_sales
-> FROM (
-> SELECT b.commodity_category
-> , a.`date`
-> , SUM(a.sales) AS total_sales
-> FROM middle_commodity_sale a
-> JOIN middle_commodity_info b
-> ON a.sku_id = b.sku_id
-> WHERE b.director = 'a001'
-> AND YEAR(a.`date`) = 2020
-> GROUP BY b.commodity_category
-> , a.`date`
-> ) c
-> ) d
-> WHERE ranking <= 2;
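To double-check the join and ranking, here is a plain-Python sketch of the same pipeline: join each sale to its category, keep director a001's 2020 rows, total sales per (category, day), then take the two best days per category. With no ties in the daily totals, a sort stands in for RANK():

```python
# sku_id -> (commodity_category, director), from middle_commodity_info
info = {"u001": ("c001", "a001"), "u003": ("c002", "a001"), "u002": ("c003", "a002")}
# (date, sku_id, sales) rows copied from middle_commodity_sale
sales = [
    ("2020-12-20", "u001", 12000), ("2020-12-20", "u002", 8000),
    ("2020-12-20", "u003", 11000), ("2020-12-21", "u001", 20000),
    ("2020-12-21", "u003", 16000), ("2020-12-22", "u003", 11000),
    ("2020-12-22", "u001", 34000), ("2020-12-22", "u002", 11000),
    ("2020-12-23", "u003", 18000), ("2020-12-23", "u001", 30000),
]

# JOIN + WHERE director = 'a001' AND YEAR(date) = 2020, then GROUP BY
daily = {}
for day, sku, amount in sales:
    category, director = info[sku]
    if director == "a001" and day.startswith("2020"):
        daily[(category, day)] = daily.get((category, day), 0) + amount

# top 2 days per category (RANK() ... <= 2; ties would keep extra rows in SQL)
per_category = {}
for (category, day), total in daily.items():
    per_category.setdefault(category, []).append((total, day))
top2 = {c: sorted(v, reverse=True)[:2] for c, v in per_category.items()}
print(top2)
```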
Topic 6: Revenue statistics of online car-hailing drivers
There is an online car-hailing order table middle_car_order, which records the orders of a certain day. The middle_car_order data is shown in the following table:
mysql> SELECT * FROM middle_car_order;
-- order_id (order ID): VARCHAR   driver_id (driver ID): VARCHAR   order_amount (order amount): DOUBLE
+----------+-----------+--------------+
| order_id | driver_id | order_amount |
+----------+-----------+--------------+
| o001 | d001 | 15.6 |
| o002 | d002 | 36.5 |
| o003 | d001 | 30.1 |
| o004 | d002 | 10.6 |
| o005 | d001 | 26.2 |
| o006 | d001 | 14.6 |
| o007 | d003 | 28.9 |
| o008 | d001 | 8.8 |
| o009 | d002 | 13.3 |
| o010 | d001 | 29.4 |
+----------+-----------+--------------+
10 rows in set (0.00 sec)
[Question 6] A driver's income is 80% of the order amount (amounts in the table are in yuan). A driver with at least 5 orders for the day and a total order amount of at least 100 receives an extra subsidy of 10 yuan. Count each driver's income for the day, sort the results in descending order of income, and round to two decimal places. The output columns are driver_id (driver ID), total_order (total order count), and total_income (total income); a sample of the result is shown in the figure below:
[Question 6] The reference code is as follows:
mysql> SELECT a.driver_id,
-> a.total_order,
-> CASE
-> WHEN total_order >= 5 AND total_amount >= 100 THEN ROUND(total_amount * 0.8 + 10, 2)
-> ELSE ROUND(total_amount * 0.8, 2) END AS 'total_income'
-> FROM (SELECT driver_id, COUNT(driver_id) AS 'total_order', SUM(order_amount) AS 'total_amount'
-> FROM middle_car_order
-> GROUP BY driver_id) a ORDER BY total_income DESC;
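The subsidy rule is easy to verify in plain Python (rows copied inline; variable names are illustrative): 80% of the total amount, plus 10 yuan when a driver has at least 5 orders totalling at least 100 yuan:

```python
# (order_id, driver_id, order_amount) rows copied from middle_car_order
orders = [
    ("o001", "d001", 15.6), ("o002", "d002", 36.5), ("o003", "d001", 30.1),
    ("o004", "d002", 10.6), ("o005", "d001", 26.2), ("o006", "d001", 14.6),
    ("o007", "d003", 28.9), ("o008", "d001", 8.8), ("o009", "d002", 13.3),
    ("o010", "d001", 29.4),
]

stats = {}  # driver_id -> [order count, amount total]
for _, driver, amount in orders:
    cell = stats.setdefault(driver, [0, 0.0])
    cell[0] += 1
    cell[1] += amount

income = {}
for driver, (cnt, total) in stats.items():
    bonus = 10 if cnt >= 5 and total >= 100 else 0  # the subsidy condition
    income[driver] = (cnt, round(total * 0.8 + bonus, 2))

# ORDER BY total_income DESC
for driver, (cnt, pay) in sorted(income.items(), key=lambda kv: -kv[1][1]):
    print(driver, cnt, pay)
```

Only d001 (6 orders, 124.7 yuan in total) qualifies for the 10-yuan subsidy.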
Topic 7: Website login time interval statistics
There is a website login table middle_login_info, which records the website login information of all users. The data of the middle_login_info table is as follows:
mysql> SELECT * FROM middle_login_info;
-- user_id (user ID): VARCHAR   login_time (login date): DATE
+---------+------------+
| user_id | login_time |
+---------+------------+
| a001 | 2021-01-01 |
| b001 | 2021-01-01 |
| a001 | 2021-01-03 |
| a001 | 2021-01-06 |
| a001 | 2021-01-07 |
| b001 | 2021-01-07 |
| a001 | 2021-01-08 |
| a001 | 2021-01-09 |
| b001 | 2021-01-09 |
| b001 | 2021-01-10 |
| b001 | 2021-01-15 |
| a001 | 2021-01-16 |
| a001 | 2021-01-18 |
| a001 | 2021-01-19 |
| b001 | 2021-01-20 |
| a001 | 2021-01-23 |
+---------+------------+
16 rows in set (0.00 sec)
[Question 7] For each user, count how many times the interval between consecutive login dates is less than 5 days. The output columns are user_id (user ID) and num (the number of intervals shorter than 5 days); a sample of the result is shown in the figure below:
[Question 7] The reference code is as follows:
mysql> SELECT a.user_id, COUNT(*) AS 'num'
-> FROM (SELECT user_id,
-> login_time,
-> TIMESTAMPDIFF(DAY, LAG(login_time) OVER (PARTITION BY user_id ORDER BY login_time),
-> login_time) AS date_diff
-> FROM middle_login_info) a
-> WHERE a.date_diff < 5
-> GROUP BY a.user_id;
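The LAG-and-filter idea maps directly onto zipping each user's sorted dates with their successors in plain Python (rows copied inline):

```python
from datetime import date

# (user_id, login_time) rows copied from middle_login_info
logins = [
    ("a001", "2021-01-01"), ("b001", "2021-01-01"), ("a001", "2021-01-03"),
    ("a001", "2021-01-06"), ("a001", "2021-01-07"), ("b001", "2021-01-07"),
    ("a001", "2021-01-08"), ("a001", "2021-01-09"), ("b001", "2021-01-09"),
    ("b001", "2021-01-10"), ("b001", "2021-01-15"), ("a001", "2021-01-16"),
    ("a001", "2021-01-18"), ("a001", "2021-01-19"), ("b001", "2021-01-20"),
    ("a001", "2021-01-23"),
]

dates_by_user = {}
for user, s in logins:
    y, m, d = map(int, s.split("-"))
    dates_by_user.setdefault(user, []).append(date(y, m, d))

num = {}
for user, dates in dates_by_user.items():
    dates.sort()  # PARTITION BY user_id ORDER BY login_time
    # count consecutive-date gaps shorter than 5 days (the LAG difference)
    num[user] = sum((nxt - prev).days < 5 for prev, nxt in zip(dates, dates[1:]))
print(num)  # → {'a001': 8, 'b001': 2}
```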
Topic 8: Statistics of Commodity Revenue in Different Regions
There is a commodity income table middle_sale_volume for different cities, which records income by year, region, and city. The middle_sale_volume data is shown in the following table:
mysql> SELECT * FROM middle_sale_volume;
-- year (year): YEAR   region (region): VARCHAR   city (city): VARCHAR   money (income): INT
+------+--------+------+-------+
| year | region | city | money |
+------+--------+------+-------+
| 2018 | 东区 | A 市 | 1125 |
| 2019 | 东区 | A 市 | 1305 |
| 2020 | 东区 | A 市 | 1623 |
| 2018 | 东区 | C 市 | 845 |
| 2019 | 东区 | C 市 | 986 |
| 2020 | 东区 | C 市 | 1134 |
| 2018 | 西区 | M 市 | 638 |
| 2019 | 西区 | M 市 | 1490 |
| 2020 | 西区 | M 市 | 1120 |
| 2018 | 西区 | V 市 | 1402 |
| 2019 | 西区 | V 市 | 1209 |
| 2020 | 西区 | V 市 | 1190 |
+------+--------+------+-------+
12 rows in set (0.00 sec)
[Question 8] Calculate the total income and the average income of each region per year, rounding the results to one decimal place. The output columns are year (year) plus the total and average income of each region; a sample of the result is shown in the figure below:
[Question 8] The reference code is as follows:
-- Approach ①
mysql> SELECT a.`year`
-> , ROUND(SUM(IF(a.region = '东区', a.money, 0)), 1)
-> AS '东区总收入'
-> , ROUND(SUM(IF(a.region = '西区', a.money, 0)), 1)
-> AS '西区总收入'
-> , ROUND(SUM(IF(a.region = '东区', a.money, 0)) / SUM(a.east_area), 1)
-> AS '东区平均收入'
-> , ROUND(SUM(IF(a.region = '西区', a.money, 0)) / SUM(a.west_area), 1)
-> AS '西区平均收入'
-> FROM (
-> SELECT `year`
-> , region
-> , money
-> , IF(region = '东区', 1, 0) AS east_area
-> , IF(region = '西区', 1, 0) AS west_area
-> FROM middle_sale_volume
-> GROUP BY `year`
-> , region
-> , money
-> ) AS a
-> GROUP BY a.`year`;
-- Approach ②
mysql> SELECT a.year,
-> ROUND(a.收入, 1) AS '东区总收入',
-> ROUND(b.收入, 1) AS '西区总收入',
-> ROUND(a.平均收入, 1) AS '东区平均收入',
-> ROUND(b.平均收入, 1) AS '西区平均收入'
-> FROM (SELECT year,
-> region,
-> SUM(money) AS '收入',
-> AVG(money) AS '平均收入'
-> FROM middle_sale_volume
-> GROUP BY year, region) a
-> INNER JOIN (SELECT year,
-> region,
-> SUM(money) AS '收入',
-> AVG(money) '平均收入'
-> FROM middle_sale_volume
-> GROUP BY year, region) b ON a.region < b.region AND a.year = b.year;
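The pivot can be cross-checked in plain Python by accumulating a (total, count) cell per year and region (rows copied inline; 东区 and 西区 are the east and west regions from the table):

```python
# (year, region, money) rows copied from middle_sale_volume (city omitted)
rows = [
    (2018, "东区", 1125), (2019, "东区", 1305), (2020, "东区", 1623),
    (2018, "东区", 845), (2019, "东区", 986), (2020, "东区", 1134),
    (2018, "西区", 638), (2019, "西区", 1490), (2020, "西区", 1120),
    (2018, "西区", 1402), (2019, "西区", 1209), (2020, "西区", 1190),
]

pivot = {}  # year -> region -> [income total, row count]
for year, region, money in rows:
    cell = pivot.setdefault(year, {}).setdefault(region, [0, 0])
    cell[0] += money
    cell[1] += 1

for year in sorted(pivot):
    east, west = pivot[year]["东区"], pivot[year]["西区"]
    print(year, east[0], west[0],                               # regional totals
          round(east[0] / east[1], 1), round(west[0] / west[1], 1))  # averages
```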
Topic 9: Statistics on overdue credit
There is a user loan overdue table middle_credit_overdue; the data in the middle_credit_overdue table is as follows:
mysql> SELECT * FROM middle_credit_overdue;
-- user_id (user ID): VARCHAR   overdue_date (loan overdue date): DATE
+---------+--------------+
| user_id | overdue_date |
+---------+--------------+
| u001 | 2020-10-20 |
| u002 | 2020-11-03 |
| u003 | 2020-10-04 |
| u004 | 2021-01-05 |
| u005 | 2021-01-15 |
| u006 | 2020-09-04 |
| u007 | 2021-01-03 |
| u008 | 2020-12-24 |
| u009 | 2020-12-10 |
+---------+--------------+
9 rows in set (0.00 sec)
[Question 9] With a statistics cut-off date of January 20, 2021, count for each overdue month the number of samples overdue 1-29 days, 30-59 days, and 60 days or more. The output columns are overdue_month (overdue month), overdue 1-29 days, overdue 30-59 days, and overdue 60+ days; a sample of the result is shown in the figure below:
[Question 9] The reference code is as follows:
-- Approach ① reference:
mysql> SELECT LEFT(overdue_date, 7),
-> SUM(CASE
-> WHEN TIMESTAMPDIFF(DAY, overdue_date, '2021-01-20') BETWEEN 1 AND 29 THEN 1
-> ELSE 0 END) AS '逾期1-29天',
-> SUM(CASE
-> WHEN TIMESTAMPDIFF(DAY, overdue_date, '2021-01-20') BETWEEN 30 AND 59 THEN 1
-> ELSE 0 END) AS '逾期30-59天',
-> SUM(CASE
-> WHEN TIMESTAMPDIFF(DAY, overdue_date, '2021-01-20') >= 60 THEN 1
-> ELSE 0 END) AS '逾期60天以上'
-> FROM middle_credit_overdue
-> GROUP BY LEFT(overdue_date, 7)
-> ORDER BY LEFT(overdue_date, 7)
-> DESC;
-- Approach ② reference:
mysql> SELECT overdue_month
-> , COUNT(CASE
-> WHEN overdue_days >= 1 AND overdue_days < 30
-> THEN user_id END)
-> AS '逾期 1-29 天'
-> , COUNT(CASE
-> WHEN overdue_days >= 30 AND overdue_days < 60
-> THEN user_id END)
-> AS '逾期 30-59 天'
-> , COUNT(CASE
-> WHEN overdue_days >= 60
-> THEN user_id END)
-> AS '逾期 60 天以上'
-> FROM (
-> SELECT user_id
-> , DATE_FORMAT(overdue_date, '%Y-%m') AS overdue_month
-> , DATEDIFF('2021-01-20', overdue_date)
-> AS overdue_days
-> FROM middle_credit_overdue
-> ) a
-> GROUP BY overdue_month
-> ORDER BY overdue_month DESC;
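Both approaches bucket each sample by its days overdue as of 2021-01-20. Here is a plain-Python cross-check (rows copied inline; 60 days and over go into the last bucket):

```python
from datetime import date

AS_OF = date(2021, 1, 20)  # the statistics cut-off date
# (user_id, overdue_date) rows copied from middle_credit_overdue
overdue = [
    ("u001", "2020-10-20"), ("u002", "2020-11-03"), ("u003", "2020-10-04"),
    ("u004", "2021-01-05"), ("u005", "2021-01-15"), ("u006", "2020-09-04"),
    ("u007", "2021-01-03"), ("u008", "2020-12-24"), ("u009", "2020-12-10"),
]

buckets = {}  # overdue month -> [1-29 days, 30-59 days, 60+ days]
for user, s in overdue:
    y, m, d = map(int, s.split("-"))
    days = (AS_OF - date(y, m, d)).days  # DATEDIFF('2021-01-20', overdue_date)
    row = buckets.setdefault(f"{y:04d}-{m:02d}", [0, 0, 0])
    if 1 <= days <= 29:
        row[0] += 1
    elif 30 <= days <= 59:
        row[1] += 1
    elif days >= 60:
        row[2] += 1

for month in sorted(buckets, reverse=True):  # ORDER BY overdue_month DESC
    print(month, buckets[month])
```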
That brings today's study to a close. The author writes these articles purely for learning and exchange, and so that more readers studying databases can avoid detours and save time; they serve no other purpose. If anything here infringes on your rights, contact the blogger and it will be removed. Thank you for reading this post; I hope it can be a guide on your programming journey. Happy reading!
A good book never tires of a hundred readings; read a text until you know it well and it explains itself. If I want to be the most dazzling boy in the room, I must keep acquiring knowledge through study, change my destiny with that knowledge, record my growth in blog posts, and prove my effort through action.
If this blog has helped you, or you enjoy its content, please like, comment, and bookmark (the triple combo)! They say those who do will never have bad luck and will be full of energy every day. And if you would rather just read for free, then I wish you happiness every day all the same; you are welcome to drop by often.
Writing is not easy, and everyone's support is what keeps me going. Don't forget to follow me after you like the post!