A look at window functions in HQL through popular data analysis questions from big tech companies

I came across three data analysis questions on an expert's public account [Data Pipeline]; they are reproduced below:

1. The existing table user_goods_table is as follows:

  • user_name user name

  • goods_kind category of takeaway the user ordered

Now the boss wants to know each user's preference distribution across takeaway categories, and to extract the category each user buys most often.

Output requirements are as follows:

  • user_name user name

  • goods_kind the takeaway category the user buys most often

Idea: count purchases per user and category, use the row_number window function to rank each user's categories by purchase count in descending order (this ranking gives the preference distribution), then keep the category ranked first, i.e. the category the user buys most often.

Reference Solution:

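-- count purchases per user and category, then rank categories within each user
-- (an aggregate inside over() needs Hive 2.1.0 or later)
select b.user_name, b.goods_kind
from (
    select
        user_name,
        goods_kind,
        row_number() over (partition by user_name
                           order by count(goods_kind) desc) as rn
    from user_goods_table
    group by user_name, goods_kind  -- needed so count() is per user and category
) b
where b.rn = 1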
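
To make the count-then-rank idea concrete, here is a minimal plain-Python sketch of the same steps on a few made-up rows. The sample data, the function name top_category_per_user and the alphabetical tie-break are assumptions for illustration only; row_number() itself picks an arbitrary row when two categories have the same count.

```python
from collections import Counter, defaultdict

# Made-up sample rows of (user_name, goods_kind); not from the original post.
rows = [
    ("alice", "noodles"), ("alice", "noodles"), ("alice", "pizza"),
    ("bob", "sushi"), ("bob", "burger"), ("bob", "sushi"),
]

def top_category_per_user(rows):
    # Step 1: group by (user_name, goods_kind) and count purchases.
    counts = Counter(rows)
    # Step 2: partition the counted rows by user_name.
    per_user = defaultdict(list)
    for (user, kind), n in counts.items():
        per_user[user].append((kind, n))
    # Step 3: order each partition by count descending and keep row number 1.
    result = {}
    for user, kinds in per_user.items():
        # Ties are broken by category name so the output is deterministic.
        ranked = sorted(kinds, key=lambda kn: (-kn[1], kn[0]))
        result[user] = ranked[0][0]
    return result

print(top_category_per_user(rows))  # {'alice': 'noodles', 'bob': 'sushi'}
```

If ties should all be kept rather than broken arbitrarily, rank() would be the window function to look at instead of row_number().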

2. A data analysis interview question from a leading payment platform. The existing transaction table user_sales_table is as follows:

  • user_name user name

  • pay_amount user payment amount

Now the boss wants to know the top 20% of users by payment amount.

Output requirements are as follows:

  • user_name user name (the top 20% of users)

Idea: use the ntile window function to split users, ordered by total payment amount in descending order, into 5 groups (so each group holds 1/5 of the users), then take the users in the first group, i.e. the top 20% of users by payment amount. (Note that the requirement is the top 20% of users, not the top 20 users.)

Reference Solution:

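-- sum payments per user, then split users into 5 equal-sized groups by total
select b.user_name
from (
    select
        user_name,
        ntile(5) over (order by sum(pay_amount) desc) as level
    from user_sales_table
    group by user_name
) b
where b.level = 1  -- group 1 is the top 20% of users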
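
How ntile assigns rows to groups is easy to get wrong when the row count is not a multiple of the group count, so here is a small plain-Python sketch of the bucketing rule (earlier groups receive the leftover rows, one each). The payment rows and the helper names ntile and top_fraction_users are made up for illustration.

```python
def ntile(n_buckets, n_rows):
    """Return the 1-based bucket number for each of n_rows ordered rows."""
    base, extra = divmod(n_rows, n_buckets)
    buckets = []
    for bucket in range(1, n_buckets + 1):
        # The first `extra` buckets get one additional row, as SQL NTILE does.
        size = base + (1 if bucket <= extra else 0)
        buckets.extend([bucket] * size)
    return buckets

# Made-up payment rows of (user_name, pay_amount); not from the original post.
payments = [
    ("u1", 50), ("u2", 300), ("u3", 20), ("u1", 70), ("u4", 90),
    ("u5", 10), ("u6", 200), ("u7", 5), ("u2", 100), ("u8", 60),
    ("u9", 40), ("u10", 30),
]

def top_fraction_users(payments, n_buckets=5):
    # Equivalent of: sum(pay_amount) ... group by user_name
    totals = {}
    for user, amount in payments:
        totals[user] = totals.get(user, 0) + amount
    # Equivalent of: order by sum(pay_amount) desc (name breaks ties here).
    ordered = sorted(totals, key=lambda u: (-totals[u], u))
    levels = ntile(n_buckets, len(ordered))
    # Equivalent of: where level = 1
    return [user for user, level in zip(ordered, levels) if level == 1]

print(ntile(5, 7))                   # [1, 1, 2, 2, 3, 4, 5]
print(top_fraction_users(payments))  # ['u2', 'u6']
```

With 10 distinct users the first group holds exactly 2 of them; with 7 users it would also hold 2, which is slightly more than 20%.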

3. A data analysis interview question from a leading short-video platform. The existing user login table user_login_table is as follows:

  • user_name user name

  • date user login time

Now the boss wants to know the important users who logged in to the platform 7 days in a row.

Output requirements are as follows:

  • user_name user name (users who logged in 7 days in a row)

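Idea: first reduce the login table to one row per user per day. Then use the lead window function to fetch, for each login date, the login date 6 rows later within the same user (ordered by date descending), and compute the date 6 days before each login date. Seven consecutive days occupy 7 rows, so the offset is 6, not 7: if the date 6 rows later equals the date 6 days earlier, the user logged in on 7 consecutive days. The date column is quoted with backticks because date is a reserved word in recent Hive versions.

Reference Solution:

select distinct b.user_name
from (
    select
        user_name,
        login_date,
        -- the login date 6 rows later, i.e. the 6th earlier distinct login day
        lead(login_date, 6) over (partition by user_name
                                  order by login_date desc) as date_6
    from (select distinct user_name, to_date(`date`) as login_date
          from user_login_table) a  -- one row per user per day
) b
where b.date_6 is not null
  and date_sub(b.login_date, 6) = b.date_6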
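
The offset is the part of this approach that is easiest to get wrong, so here is a small plain-Python sketch of the lead comparison for one user's login dates. A run of n consecutive days spans n rows of distinct dates, so the row to compare against sits n-1 rows away and must be exactly n-1 days apart. The function name has_consecutive_days and the sample dates are made up for illustration.

```python
from datetime import date, timedelta

def has_consecutive_days(login_dates, n_days=7):
    """True if login_dates contains a run of n_days consecutive calendar days."""
    # Deduplicate and sort descending, like ORDER BY date DESC on distinct dates.
    days = sorted(set(login_dates), reverse=True)
    offset = n_days - 1  # n consecutive days span n rows, so look n-1 rows ahead
    for i in range(len(days) - offset):
        lead_value = days[i + offset]  # what lead(date, offset) would return
        # Equivalent of: date_sub(date, offset) = lead(date, offset)
        if days[i] - timedelta(days=offset) == lead_value:
            return True
    return False

start = date(2020, 3, 1)
week = [start + timedelta(days=k) for k in range(7)]                # 7 days in a row
gappy = [start + timedelta(days=k) for k in (0, 1, 2, 3, 4, 5, 7)]  # one day missing

print(has_consecutive_days(week))   # True
print(has_consecutive_days(gappy))  # False
```

Duplicate logins on the same day are removed first; without that step the row offset would no longer correspond to a number of distinct days.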

Origin blog.csdn.net/BeiisBei/article/details/105003843