Popular Hive data analysis interview questions, solved

Author | Data Pipeline

Editor | Xu Veyron

Cover image | CSDN, downloaded from Visual China

SQL has a class of functions called aggregate functions, such as count, sum, avg, min, and max. These functions combine multiple rows into a single row according to some rule, so there are generally more rows before aggregation than after it. Sometimes, however, we want to keep the rows as they were before aggregation and also see the aggregated values next to them. This is where window functions come in.
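The difference between the two kinds of function can be seen on a tiny table. The following is a minimal sketch, not Hive itself: it uses Python's built-in sqlite3 module (SQLite 3.25 or later is assumed for window function support), and the orders table and its rows are made up for illustration.

```python
import sqlite3

# Hypothetical sample data: one row per order.
conn = sqlite3.connect(":memory:")
conn.execute("create table orders (user_name text, amount integer)")
conn.executemany(
    "insert into orders values (?, ?)",
    [("ann", 10), ("ann", 30), ("bob", 5)],
)

# Aggregate function: the three input rows collapse to one row per user.
aggregated = conn.execute(
    "select user_name, sum(amount) from orders "
    "group by user_name order by user_name"
).fetchall()

# Window function: every input row survives and carries the per-user total.
windowed = conn.execute(
    "select user_name, amount, "
    "sum(amount) over (partition by user_name) as user_total "
    "from orders order by user_name, amount"
).fetchall()

print(aggregated)  # [('ann', 40), ('bob', 5)]
print(windowed)    # [('ann', 10, 40), ('ann', 30, 40), ('bob', 5, 5)]
```

The aggregate query returns two rows, while the window query returns all three original rows with the total attached to each.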

The following TMD interview questions show how to use window functions. They cover three topics: window functions for ranking, window functions for splitting users into groups, and window functions for offset analysis. Each topic is explained through an interview question with a business background.

Main text

1. A data analyst interview question from a leading food-delivery platform. The existing transaction table user_goods_table is as follows:

  • user_name user name

  • goods_kind takeaway category the user ordered

Now the boss wants to know how each user's purchases are distributed across takeaway categories, and which category each user buys most often.

Output requirements are as follows:

  • user_name user name

  • goods_kind the takeaway category the user bought most often

Approach: use the row_number window function to rank each user's categories by number of purchases (the ranking also gives the per-user distribution), then take the category ranked first, which is the category that user buys most often.

Reference Solution:

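select b.user_name, b.goods_kind
from (
  select
    user_name,
    goods_kind,
    -- rank each user's categories by purchase count, highest first
    row_number() over (partition by user_name
                       order by count(goods_kind) desc) as rn
  from user_goods_table
  -- the count needs one group per (user, category)
  group by user_name, goods_kind
) b
where b.rn = 1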
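The ranking idea can be tried on a handful of rows. This is a sketch under assumptions: it runs on SQLite through Python's sqlite3 module rather than on Hive, the sample rows are invented, and the purchase counts are computed in an inner query (the alias cnt is hypothetical) before row_number is applied, which is a portable way to write the same logic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table user_goods_table (user_name text, goods_kind text)")
# Hypothetical orders: ann prefers noodles, bob prefers sushi.
conn.executemany(
    "insert into user_goods_table values (?, ?)",
    [
        ("ann", "noodles"), ("ann", "noodles"), ("ann", "pizza"),
        ("bob", "sushi"), ("bob", "burger"), ("bob", "sushi"), ("bob", "sushi"),
    ],
)

top_kind = conn.execute(
    """
    select b.user_name, b.goods_kind
    from (
        -- rank each user's categories by purchase count, highest first
        select c.user_name,
               c.goods_kind,
               row_number() over (
                   partition by c.user_name order by c.cnt desc
               ) as rn
        from (
            -- one row per (user, category) with its purchase count
            select user_name, goods_kind, count(*) as cnt
            from user_goods_table
            group by user_name, goods_kind
        ) c
    ) b
    where b.rn = 1
    order by b.user_name
    """
).fetchall()

print(top_kind)  # [('ann', 'noodles'), ('bob', 'sushi')]
```

Note that row_number breaks ties arbitrarily, so a user with two equally popular categories gets only one of them; rank or dense_rank would return both.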

2. A data analysis interview question from a leading payment platform. The existing transaction table user_sales_table is as follows:

  • user_name user name

  • pay_amount user payment amount

Now the boss wants to know which users are in the top 20% by payment amount.

Output requirements are as follows:

  • user_name user name (the top 20% of users)

Approach: use the ntile window function to split users into 5 groups by total payment amount, so that each group holds 1/5 of the users, then take the users in the first group. These are the top 20% of users by payment amount. (Note that the requirement is the top 20% of users, not the top 20 users.)

Reference Solution:

select b.user_name from 
(select 
user_name,
ntile(5) over(order by sum(pay_amount) desc) as level
from user_sales_table group by user_name ) b 
where b.level = 1
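A small run makes the grouping concrete. The sketch below again assumes SQLite via Python's sqlite3 module as a stand-in for Hive, with ten invented users u1 to u10; the per-user total is computed in an inner query (the alias total is hypothetical) and ntile(5) is applied to it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table user_sales_table (user_name text, pay_amount integer)")
# Hypothetical data: user uN makes two payments of N * 10 each.
rows = [(f"u{n}", n * 10) for n in range(1, 11) for _ in range(2)]
conn.executemany("insert into user_sales_table values (?, ?)", rows)

result = conn.execute(
    """
    select b.user_name
    from (
        -- split users into 5 equal groups, biggest spenders in group 1
        select t.user_name,
               ntile(5) over (order by t.total desc) as level
        from (
            select user_name, sum(pay_amount) as total
            from user_sales_table
            group by user_name
        ) t
    ) b
    where b.level = 1
    """
).fetchall()

top_users = {row[0] for row in result}
print(sorted(top_users))  # ['u10', 'u9']
```

With ten users and five groups, each group holds two users, so the first group is exactly the top 20%. When the user count is not divisible by 5, ntile puts the extra users in the lower-numbered groups.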

3. A data analysis interview question from a leading short-video platform. The existing user login table user_login_table is as follows:

  • user_name user name

  • date user login time

Now the boss wants to know the important users who have logged in to the platform on 7 consecutive days.

Output requirements are as follows:

  • user_name user name (users who logged in 7 days in a row)

Approach: reduce the table to one row per user per login day, sorted by date descending. Use the lead offset window function to fetch, for each login date, the login date 6 rows further on. Then compute the date 6 days before each login date. If the date 6 rows further on equals the date 6 days earlier, the user logged in on every day in between, that is, on 7 consecutive days (the current day plus the 6 days before it). An offset of 7 would test for 8 consecutive days.

Reference Solution:

select distinct b.user_name
from (
  select
    user_name,
    login_date,
    -- login date 6 rows further on (6 login days earlier)
    lead(login_date, 6) over (partition by user_name
                              order by login_date desc) as date_6
  from (select distinct user_name, cast(`date` as date) as login_date
        from user_login_table) a  -- one row per user per day
) b
where b.date_6 is not null
  and date_sub(b.login_date, 6) = b.date_6
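The streak test is easy to get off by one, so it helps to run it. The sketch below is an assumption-laden stand-in for the Hive query: it uses SQLite through Python's sqlite3 module, SQLite's date() function in place of date_sub, invented login rows, and a hypothetical helper users_with_streak. It uses an offset of N - 1 rows and N - 1 days, because a streak of N days spans the current row plus N - 1 more.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('create table user_login_table (user_name text, "date" text)')
# Hypothetical logins: ann logs in on 1-7 March (twice on the 3rd),
# bob logs in on 1-6 March and again on the 8th, so his streak is broken.
logins = [("ann", f"2020-03-0{d}") for d in range(1, 8)]
logins.append(("ann", "2020-03-03 09:15:00"))
logins += [("bob", f"2020-03-0{d}") for d in (1, 2, 3, 4, 5, 6, 8)]
conn.executemany("insert into user_login_table values (?, ?)", logins)


def users_with_streak(conn, days):
    """Return the users who logged in on `days` consecutive calendar days."""
    offset = int(days) - 1  # N days = the current row plus N - 1 more rows
    rows = conn.execute(
        f"""
        select distinct b.user_name
        from (
            select user_name,
                   login_date,
                   lead(login_date, {offset}) over (
                       partition by user_name order by login_date desc
                   ) as earlier_date
            from (
                -- one row per user per calendar day
                select distinct user_name, date("date") as login_date
                from user_login_table
            ) a
        ) b
        where b.earlier_date is not null
          and date(b.login_date, '-{offset} days') = b.earlier_date
        """
    ).fetchall()
    return {row[0] for row in rows}


print(users_with_streak(conn, 7))  # {'ann'}
```

Deduplicating to one row per day matters: without it, ann's second login on 3 March would shift the rows and the offset comparison would no longer line up.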

Summary

This article used three data analysis interview questions to show window functions in practical scenarios, on the assumption that the reader already knows the syntax. How well an analyst uses window functions is indeed one measure of SQL proficiency. Whatever the usage, learn it together with its practical background, and think about why the analysis needs that particular function.


Origin blog.csdn.net/FL63Zv9Zou86950w/article/details/104957832