Hive Interview Question 3: Querying Year-to-Date Sales with HiveSQL

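Given a sales table T with the sample data below, write a SQL query that returns each employee's year-to-date cumulative sales. The columns are 员工姓名 (employee name), 月份 (month, formatted yyyyMM) and 销售额 (sales amount).

Source table T: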

员工姓名	月份	销售额
emi		201801	10000
emi		201802	11000
emi		201803	9000
emi		201901	10000
tommy	201801	12500
tommy	201802	10500
tommy	201803	8900
tommy	201901	9000

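The required result is shown below, where 年累计销售额 is the cumulative sales within the year:

员工姓名	月份	销售额	年累计销售额
emi		201801	10000	10000
emi		201802	11000	21000
emi		201803	9000	30000
emi		201901	10000	10000
tommy	201801	12500	12500
tommy	201802	10500	23000
tommy	201803	8900	31900
tommy	201901	9000	9000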

Create the Hive table

create table t(t_name string,t_date string,t_sale int) 
row format delimited fields terminated by '\t';

Create a local data file with vim /root/temp/data.csv (fields must be separated by tabs to match the table definition)

emi     201801  10000
emi     201802  11000
emi     201803  9000
emi     201901  10000
tommy   201801  12500
tommy   201802  10500
tommy   201803  8900
tommy   201901  9000

Load the data

load data local inpath '/root/temp/data.csv' into table t;
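
If no local file is at hand, the same rows can be inserted directly. This is a minimal sketch that assumes Hive 0.14 or later, where insert ... values is available. It is an alternative to the load above, not an extra step: running both would duplicate every row.

```sql
-- Alternative to LOAD DATA: insert the sample rows directly.
-- Assumes Hive 0.14+ (INSERT ... VALUES support).
insert into table t values
  ('emi',   '201801', 10000),
  ('emi',   '201802', 11000),
  ('emi',   '201803', 9000),
  ('emi',   '201901', 10000),
  ('tommy', '201801', 12500),
  ('tommy', '201802', 10500),
  ('tommy', '201803', 8900),
  ('tommy', '201901', 9000);
```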

Query the table to confirm that the data was loaded.

select * from t;

Output:

+--------+--------+--------+
| t_name | t_date | t_sale |
+--------+--------+--------+
| emi    | 201801 | 10000  |
| emi    | 201802 | 11000  |
| emi    | 201803 | 9000   |
| emi    | 201901 | 10000  |
| tommy  | 201801 | 12500  |
| tommy  | 201802 | 10500  |
| tommy  | 201803 | 8900   |
| tommy  | 201901 | 9000   |
+--------+--------+--------+

Queries for annual sales

Using sum() as a window function

select t_name `员工姓名`
,t_date `月份`
,t_sale `销售额`
,sum(t_sale) over (partition by t_name,substr(t_date,1,4)) as `年累计销售额` from t;

Result:

+----------+--------+--------+--------------+
| 员工姓名 | 月份   | 销售额 | 年累计销售额 |
+----------+--------+--------+--------------+
| emi      | 201801 | 10000  | 30000        |
| emi      | 201802 | 11000  | 30000        |
| emi      | 201803 | 9000   | 30000        |
| emi      | 201901 | 10000  | 10000        |
| tommy    | 201801 | 12500  | 31900        |
| tommy    | 201802 | 10500  | 31900        |
| tommy    | 201803 | 8900   | 31900        |
| tommy    | 201901 | 9000   | 9000         |
+----------+--------+--------+--------------+

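Without order by, every row carries the employee's full-year total, so this form suits computing each month's share of the annual total rather than a running sum.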
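
As a rough illustration of that use, the sketch below divides each month's sales by the partition total. The alias pct_of_year is an illustrative name, not from the original post. For emi's 201801 row this works out to 10000 / 30000, about 33.33.

```sql
-- Each month's share of the employee's annual total, as a percentage.
-- pct_of_year is a hypothetical alias chosen for this example.
select t_name,
       t_date,
       t_sale,
       round(t_sale * 100.0
             / sum(t_sale) over (partition by t_name, substr(t_date, 1, 4)),
             2) as pct_of_year
from t;
```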

Adding order by turns the sum into a running total up to the current month

select t_name `员工姓名`
,t_date `月份`
,t_sale `销售额`
,sum(t_sale) over (partition by t_name,substr(t_date,1,4) order by t_date) as `年累计销售额` from t;

Result, which matches the required output:

+----------+--------+--------+--------------+
| 员工姓名 | 月份   | 销售额 | 年累计销售额 |
+----------+--------+--------+--------------+
| emi      | 201801 | 10000  | 10000        |
| emi      | 201802 | 11000  | 21000        |
| emi      | 201803 | 9000   | 30000        |
| emi      | 201901 | 10000  | 10000        |
| tommy    | 201801 | 12500  | 12500        |
| tommy    | 201802 | 10500  | 23000        |
| tommy    | 201803 | 8900   | 31900        |
| tommy    | 201901 | 9000   | 9000         |
+----------+--------+--------+--------------+
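
The running total comes from the window frame. According to the Hive documentation, a window that has order by but no explicit frame defaults to range between unbounded preceding and current row. The sketch below writes the frame out with rows instead, on the assumption that each employee has at most one row per month. The alias ytd_sale is illustrative.

```sql
-- Same running total with the window frame written explicitly.
-- Assumes one row per (t_name, t_date); with duplicate months, ROWS and the
-- default RANGE frame would give different values on the tied rows.
select t_name,
       t_date,
       t_sale,
       sum(t_sale) over (
           partition by t_name, substr(t_date, 1, 4)
           order by t_date
           rows between unbounded preceding and current row
       ) as ytd_sale
from t;
```

Spelling out the frame makes the intent visible to the reader and avoids surprises if the data later contains several rows for the same month.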

Common mistake: using group by

select t_name,t_date,t_sale,sum(t_sale) over (partition by t_name,substr(t_date,1,4)) from t group by t_name,substr(t_date,1,4);

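This query fails immediately: the month and sales columns appear in the select list but are neither listed in the group by clause nor aggregated.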
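
If group by is preferred over a window function, one workable shape is a self-join that pairs each row with the same employee's rows from the same year up to that month, then sums them. This is only a sketch: it assumes one row per employee per month, and the aliases a, b and ytd_sale are illustrative.

```sql
-- Running total without a window function: self-join plus GROUP BY.
-- Assumes (t_name, t_date) is unique; yyyyMM strings compare correctly
-- as text, so b.t_date <= a.t_date selects the months up to the current one.
select a.t_name,
       a.t_date,
       a.t_sale,
       sum(b.t_sale) as ytd_sale
from t a
join t b
  on a.t_name = b.t_name
 and substr(a.t_date, 1, 4) = substr(b.t_date, 1, 4)
where b.t_date <= a.t_date
group by a.t_name, a.t_date, a.t_sale;
```

The window-function version is shorter and scans the table once, so the self-join is mainly useful for understanding what the window computes.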

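Answer SQL, written against the column names of the original table T (the Chinese identifiers must be quoted with backticks):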

select 
`员工姓名`
,`月份`
,`销售额`
,sum(`销售额`) over (partition by `员工姓名`,substr(`月份`,1,4) order by `月份`) as `年累计销售额` from t;

Reposted from blog.csdn.net/u012955829/article/details/102824624