Three data analysis questions seen on the public account [Data Pipeline], reproduced below:
1. The existing table user_goods_table is as follows:
- user_name: user name
- goods_kind: takeaway category the user ordered
The boss now wants to know each user's preference distribution across takeaway categories, and to extract the category each user buys most often.
Output requirements are as follows:
- user_name: user name
- goods_kind: the takeaway category the user buys most
Approach: count each user's purchases per category, use the row_number window function to rank the categories within each user by that count, then keep the category ranked first, i.e. the one the user buys most.
Reference Solution:
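select b.user_name, b.goods_kind from
(select
    user_name,
    goods_kind,
    -- count(goods_kind) is evaluated per (user_name, goods_kind) group
    -- the alias is rn because rank is a reserved word in some dialects
    row_number() over(partition by user_name
        order by count(goods_kind) desc) as rn
from user_goods_table
group by user_name, goods_kind) b
where b.rn = 1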
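A minimal runnable sketch of this approach, assuming Python's built-in sqlite3 module (SQLite 3.25 or later for window functions) and a handful of made-up rows. Only the table and column names come from the question. The counts are computed in an inner query first, which also yields the preference distribution the boss asked for.

```python
import sqlite3

# Hypothetical sample orders; the users and categories are invented.
orders = [
    ("alice", "noodles"), ("alice", "noodles"), ("alice", "pizza"),
    ("bob", "sushi"), ("bob", "burger"), ("bob", "burger"), ("bob", "burger"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table user_goods_table (user_name text, goods_kind text)")
conn.executemany("insert into user_goods_table values (?, ?)", orders)

# Step 1: preference distribution, i.e. purchase count per user and category.
distribution = conn.execute("""
    select user_name, goods_kind, count(*) as cnt
    from user_goods_table
    group by user_name, goods_kind
    order by user_name, cnt desc
""").fetchall()

# Step 2: rank categories within each user by count and keep rank 1.
top_category = conn.execute("""
    select b.user_name, b.goods_kind from
    (select user_name, goods_kind,
            row_number() over(partition by user_name
                              order by cnt desc) as rn
     from (select user_name, goods_kind, count(*) as cnt
           from user_goods_table
           group by user_name, goods_kind)) b
    where b.rn = 1
    order by b.user_name
""").fetchall()

print(distribution)
print(top_category)
```

Note that row_number keeps exactly one row per user, so when two categories tie for first place one of them is dropped arbitrarily. Using rank instead would keep all tied categories.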
2. A data analysis interview question from a leading payment platform. The existing transaction table user_sales_table is as follows:
- user_name: user name
- pay_amount: amount paid by the user
The boss now wants to know which users are in the top 20% by payment amount.
Output requirements are as follows:
- user_name: user name (users in the top 20%)
Approach: use the ntile window function to split users, ordered by total payment amount from highest to lowest, into 5 groups so that each group holds 1/5 of the users. The users in the first group are the top 20% by payment amount. (Note that the requirement is the top 20% of users, not the top 20 users.)
Reference Solution:
select b.user_name from
(select
user_name,
ntile(5) over(order by sum(pay_amount) desc) as level
from user_sales_table group by user_name ) b
where b.level = 1
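The ntile grouping can be tried out with the following sketch, again assuming Python's built-in sqlite3 module (SQLite 3.25 or later) and invented payment rows. With 10 users, ntile(5) puts 2 users in each group, so group 1 holds the 2 highest payers.

```python
import sqlite3

# Hypothetical payments; some users pay more than once, so totals are summed.
payments = [
    ("u01", 500), ("u01", 700), ("u02", 900), ("u03", 100),
    ("u04", 300), ("u05", 50), ("u06", 650), ("u07", 20),
    ("u08", 400), ("u09", 1000), ("u10", 10), ("u10", 15),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table user_sales_table (user_name text, pay_amount integer)")
conn.executemany("insert into user_sales_table values (?, ?)", payments)

# Sum payments per user, split users into 5 equal groups by total
# (highest first), and keep group 1.
rows = conn.execute("""
    select b.user_name from
    (select user_name,
            ntile(5) over(order by total desc) as level
     from (select user_name, sum(pay_amount) as total
           from user_sales_table
           group by user_name)) b
    where b.level = 1
    order by b.user_name
""").fetchall()

top_users = [name for (name,) in rows]
print(top_users)
```

When the number of users is not divisible by 5, ntile gives the earlier groups one extra row each, so group 1 can hold slightly more than 20% of the users.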
3. A data analysis interview question from a leading short-video platform. The existing user login table user_login_table is as follows:
- user_name: user name
- date: user login time
The boss now wants to identify the important users who logged in to the platform on 7 consecutive days.
Output requirements are as follows:
- user_name: user name (users who logged in on 7 consecutive days)
Approach: for each login date of each user, use the lead window function with an offset to fetch the login date 6 rows further on, then compute the date 6 days away from the current login date. If the two dates match, the user logged in on 7 consecutive days. The offset is 6 rather than 7 because the first and seventh day of a 7-day streak are 6 rows and 6 days apart. This assumes at most one row per user per day.
Reference Solution:
select distinct b.user_name from
(select user_name,
    date,
    -- rows are sorted newest first, so lead() looks 6 rows back in time
    lead(date, 6)
        over(partition by user_name order by date desc) as date_6
from user_login_table) b
where b.date_6 is not null
and date_sub(cast(b.date as date), 6) = cast(b.date_6 as date)
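A runnable sketch of the consecutive-login check, assuming Python's built-in sqlite3 module (SQLite 3.25 or later) and invented login rows. SQLite has no date_sub, so date(x, '-6 days') stands in for it here. The sketch uses an offset of 6, since the first and last day of a 7-day streak are 6 rows and 6 days apart, and it first reduces login times to distinct calendar days so that several logins on one day do not distort the row offset.

```python
import sqlite3
from datetime import date, timedelta


def daily_logins(user, start, n):
    """Hypothetical helper: one 09:00 login per day for n days from start."""
    return [(user, (start + timedelta(days=i)).isoformat() + " 09:00:00")
            for i in range(n)]


logins = (
    daily_logins("alice", date(2024, 1, 1), 7)     # 7-day streak
    + [("alice", "2024-01-03 21:30:00")]           # second login on one day
    + daily_logins("bob", date(2024, 1, 1), 6)     # only 6 days in a row
    + [("bob", "2024-01-08 09:00:00")]             # gap on 2024-01-07
    + daily_logins("carol", date(2024, 1, 29), 7)  # streak across a month end
)

conn = sqlite3.connect(":memory:")
conn.execute("create table user_login_table (user_name text, date text)")
conn.executemany("insert into user_login_table values (?, ?)", logins)

rows = conn.execute("""
    select distinct b.user_name from
    (select user_name, day,
            lead(day, 6) over(partition by user_name
                              order by day desc) as day_6
     from (select distinct user_name, date(t.date) as day
           from user_login_table t)) b
    where b.day_6 is not null
      and date(b.day, '-6 days') = b.day_6
    order by b.user_name
""").fetchall()

streak_users = [name for (name,) in rows]
print(streak_users)
```

The outer distinct matters because a user with a streak longer than 7 days matches on more than one row.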