Popular window function questions in data analysis interviews (compiled)

I. Background

SQL has a class of functions called aggregate functions, such as count, sum, avg, min and max. These functions collapse multiple rows of data into one row according to some rule, so there are generally more rows before aggregation than after it. Sometimes, however, we want the pre-aggregation rows and the aggregated values at the same time, and this is where window functions come in. This post summarizes Hive window functions.
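The difference between the two kinds of function can be seen on a toy table. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for Hive (assuming SQLite 3.25 or later, which added window functions), and the `orders` table and its rows are hypothetical.

```python
import sqlite3

# Hypothetical toy table; SQLite stands in for Hive here.
conn = sqlite3.connect(":memory:")
conn.execute("create table orders (user_name text, pay_amount integer)")
conn.executemany(
    "insert into orders values (?, ?)",
    [("ann", 10), ("ann", 30), ("bob", 5)],
)

# Aggregate function with group by: one output row per user.
aggregated = conn.execute(
    "select user_name, sum(pay_amount) from orders "
    "group by user_name order by user_name"
).fetchall()

# The same sum as a window function: every input row is kept,
# and each row also carries its user's total.
windowed = conn.execute(
    "select user_name, pay_amount, "
    "sum(pay_amount) over (partition by user_name) as user_total "
    "from orders order by user_name, pay_amount"
).fetchall()

print(aggregated)  # [('ann', 40), ('bob', 5)]
print(windowed)    # [('ann', 10, 40), ('ann', 30, 40), ('bob', 5, 5)]
```

Three rows go in; the aggregate query returns two rows, while the window query still returns three.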

The following interview questions from TMD companies show how to use window functions. The topics covered are window functions for ranking, window functions for grouped queries over users, and window functions for offset analysis. Each question is presented together with a background scenario.

II. Main text

1. A data analyst interview question from a leading takeaway platform. The existing transaction table user_goods_table is as follows:

  • user_name user name

  • goods_kind category of takeaway the user ordered

Now the boss wants to know the distribution of each user's takeaway category preferences, and to extract the category that each user buys most often.

Output requirements are as follows:

  • user_name user name

  • goods_kind the takeaway category the user buys most often

Approach: count each user's purchases per category, use the row_number window function to rank each user's categories by that count (the ranking itself gives the preference distribution), and then keep the category ranked first, which is the one the user buys most often.

Reference Solution:

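select b.user_name,b.goods_kind from
(select 
user_name,
goods_kind,
-- rank each user's categories by purchase count, highest first
row_number() over(partition by user_name 
order by count(goods_kind) desc ) as rn 
from user_goods_table
-- the count needs one group per user and category
group by user_name,goods_kind) b where b.rn =1 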
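To check the ranking idea on a small dataset, here is a minimal sketch that runs an equivalent query with Python's sqlite3 module (assuming SQLite 3.25 or later as a stand-in for Hive). The sample rows are made up, and the counting is split into its own subquery so that the order of evaluation (count first, rank second) is explicit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table user_goods_table (user_name text, goods_kind text)")
# Hypothetical sample orders: ann prefers noodles, bob prefers burgers.
conn.executemany(
    "insert into user_goods_table values (?, ?)",
    [
        ("ann", "noodles"), ("ann", "noodles"), ("ann", "pizza"),
        ("bob", "sushi"), ("bob", "burger"), ("bob", "burger"), ("bob", "burger"),
    ],
)

query = """
select b.user_name, b.goods_kind
from (
    select user_name,
           goods_kind,
           -- rank categories inside each user by purchase count
           row_number() over (
               partition by user_name
               order by cnt desc, goods_kind
           ) as rn
    from (
        -- step 1: purchases per user and category
        select user_name, goods_kind, count(*) as cnt
        from user_goods_table
        group by user_name, goods_kind
    ) t
) b
where b.rn = 1
order by b.user_name
"""
top_kind = conn.execute(query).fetchall()
print(top_kind)  # [('ann', 'noodles'), ('bob', 'burger')]
```

The extra `goods_kind` in the window's order by is an added tie-breaker, so that two categories with the same count are ranked deterministically; row_number always returns exactly one first-ranked row per user.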

2. A data analysis interview question from a leading payment platform. The existing transaction table user_sales_table is as follows:

  • user_name user name

  • pay_amount user payment amount

Now the boss wants to know which users are in the top 20% by payment amount.

Output requirements are as follows:

  • user_name user name (users in the top 20% by payment amount)

Approach: use the ntile window function to split the users, ordered by total payment amount from high to low, into 5 groups so that each group holds 1/5 of the users. The users in the first group are the top 20% by payment amount. (Note that the requirement is the top 20% of users, not the top 20 users.)

Reference Solution:

select b.user_name from 
(select 
user_name,
ntile(5) over(order by sum(pay_amount) desc) as level
from user_sales_table group by user_name ) b 
where b.level = 1
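The bucketing can be tried out on toy data. The sketch below again uses Python's sqlite3 module in place of Hive (assuming SQLite 3.25 or later); the ten users and their payments are hypothetical, and the per-user totals are computed in an inner subquery before ntile is applied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table user_sales_table (user_name text, pay_amount integer)")
# Hypothetical payments for ten users; u01 and u06 pay twice.
conn.executemany(
    "insert into user_sales_table values (?, ?)",
    [
        ("u01", 100), ("u01", 50), ("u02", 90), ("u03", 80), ("u04", 70),
        ("u05", 60), ("u06", 200), ("u06", 100), ("u07", 40), ("u08", 30),
        ("u09", 20), ("u10", 10),
    ],
)

query = """
select b.user_name
from (
    select user_name,
           -- 5 buckets over users sorted by total payment, highest first
           ntile(5) over (order by total_paid desc) as level
    from (
        select user_name, sum(pay_amount) as total_paid
        from user_sales_table
        group by user_name
    ) t
) b
where b.level = 1
order by b.user_name
"""
top_users = [row[0] for row in conn.execute(query)]
print(top_users)  # ['u01', 'u06']
```

With ten users, each of the five buckets holds two users, so bucket 1 contains the two highest totals (u06 with 300 and u01 with 150). When the number of users is not divisible by 5, ntile puts the extra rows in the earliest buckets, so the first group can be slightly larger than 20%.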

3. A data analysis interview question from a leading short-video platform. The existing user login table user_login_table is as follows:

  • user_name user name

  • date user login time

Now the boss wants to know the important users who logged in to the platform on 7 consecutive days.

Output requirements are as follows:

  • user_name user name (users who logged in on 7 consecutive days)

Approach: sort each user's login dates from newest to oldest and use the lead offset window function to fetch, for every login date, the login date 6 rows further down. Seven consecutive days span exactly 6 one-day steps, so if the date 6 rows down equals the current date minus 6 days, the user logged in on 7 consecutive days. (An offset of 7 would require 8 consecutive days.) This only works with one row per user per day, so the query first deduplicates the login dates.

Reference Solution:

select distinct b.user_name
from
(select user_name,
login_date,
-- login date 6 rows further down in newest-first order
lead(login_date,6) 
over(partition by user_name order by login_date desc) as date_6
from
(select distinct user_name, cast(`date` as date) as login_date
from user_login_table) a) b 
where b.date_6 is not null
and cast(date_sub(b.login_date,6) as date) = b.date_6
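The consecutive-login check can be verified on a few made-up rows. This is a minimal sketch using Python's sqlite3 module instead of Hive (assuming SQLite 3.25 or later): the column is named `login_date` here for illustration, SQLite's `date(x, '-6 days')` plays the role of Hive's date_sub, and the offset is 6 because seven consecutive days are 6 one-day steps apart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table user_login_table (user_name text, login_date text)")

# Hypothetical logins: ann has 7 consecutive days (one day logged twice),
# bob has a gap on 2020-03-07.
ann_days = ["2020-03-0%d" % d for d in range(1, 8)] + ["2020-03-03"]
bob_days = ["2020-03-0%d" % d for d in (1, 2, 3, 4, 5, 6, 8)]
rows = [("ann", d) for d in ann_days] + [("bob", d) for d in bob_days]
conn.executemany("insert into user_login_table values (?, ?)", rows)

query = """
select distinct b.user_name
from (
    select user_name,
           login_date,
           -- login date 6 rows further down in newest-first order
           lead(login_date, 6) over (
               partition by user_name order by login_date desc
           ) as date_6
    from (
        -- one row per user per day, so repeated logins do not break the offset
        select distinct user_name, login_date from user_login_table
    ) a
) b
where b.date_6 is not null
  and date(b.login_date, '-6 days') = b.date_6
order by b.user_name
"""
streak_users = [row[0] for row in conn.execute(query)]
print(streak_users)  # ['ann']
```

For ann, the row for 2020-03-07 finds 2020-03-01 six rows down, which is exactly 6 days earlier. For bob, the row for 2020-03-08 also finds 2020-03-01 six rows down, but 6 days earlier would be 2020-03-02, so the gap is detected.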

III. Summary

This post used three data analysis interview questions to show practical scenarios for window functions, on the assumption that the reader already knows the syntax. Skill with window functions is indeed a fair measure of a data analyst's SQL ability. Whatever the usage, learn it against a practical scenario and think about why the analysis needs that function.

Origin blog.csdn.net/BeiisBei/article/details/104863581