Hive SQL practice (1): grouping, aggregation, running totals, joins, flagging, and pivoting rows into columns

Given the order table sale_order with the following rows:

id sale_dt user_id sku_id sale_count price amount
1 2019-01-01 1 1001 2 100 200
2 2019-01-02 2 1001 1 100 100
3 2019-02-10 3 1001 2 80 160
4 2019-02-11 2 1002 2 100 200
5 2019-03-01 3 1002 1 100 100
6 2019-03-01 3 1001 1 50 50
7 2019-03-02 3 1003 4 100 400

Question 1: Write SQL (preferably Hive-compatible) that produces the following result from the order table sale_order.

month sales_volume sales_amount
2019-01 3 300
2019-02 4 360
2019-03 6 550

Calculation:

select
    substr(sale_dt, 1, 7) as `月份`,   -- month (yyyy-MM)
    sum(sale_count)       as `销量`,   -- sales volume
    sum(amount)           as `销售额`  -- sales amount
from sale_order
group by substr(sale_dt, 1, 7)
;
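To check this result without a Hive cluster, the same grouping can be run against an in-memory SQLite database from Python. This is only a sketch: SQLite stands in for Hive, the helper names load_sale_order and monthly_totals are made up for illustration, and English aliases replace the Chinese ones.

```python
import sqlite3

# Rows copied from the sale_order table in this post.
SALE_ORDER_ROWS = [
    (1, "2019-01-01", 1, 1001, 2, 100, 200),
    (2, "2019-01-02", 2, 1001, 1, 100, 100),
    (3, "2019-02-10", 3, 1001, 2, 80, 160),
    (4, "2019-02-11", 2, 1002, 2, 100, 200),
    (5, "2019-03-01", 3, 1002, 1, 100, 100),
    (6, "2019-03-01", 3, 1001, 1, 50, 50),
    (7, "2019-03-02", 3, 1003, 4, 100, 400),
]


def load_sale_order():
    """Build an in-memory SQLite copy of sale_order (a stand-in for the Hive table)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table sale_order (id integer, sale_dt text, user_id integer, "
        "sku_id integer, sale_count integer, price integer, amount integer)"
    )
    conn.executemany(
        "insert into sale_order values (?, ?, ?, ?, ?, ?, ?)", SALE_ORDER_ROWS
    )
    return conn


def monthly_totals(conn):
    """Sales volume and sales amount per month, as in Question 1."""
    sql = """
        select substr(sale_dt, 1, 7) as month,
               sum(sale_count)       as sales_volume,
               sum(amount)           as sales_amount
        from sale_order
        group by substr(sale_dt, 1, 7)
        order by month
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    for row in monthly_totals(load_sale_order()):
        print(row)
```

The explicit order by is there because group by alone does not promise any row order.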

Question 2: Write SQL (preferably Hive-compatible) that produces the following result from the order table sale_order.

user_id 2019_01 2019_02 2019_03
1 200 0 0
2 100 200 0
3 0 160 550

Calculation:

with tmp as (
    -- total amount per user per month
    select
        user_id,
        substr(sale_dt, 1, 7) as dt,
        sum(amount) as am
    from sale_order
    group by user_id, substr(sale_dt, 1, 7)
)
select
    user_id,
    -- collapse each user's rows into one; max() picks the month's amount over the 0 fillers
    max(`2019_01`) as `2019_01`,
    max(`2019_02`) as `2019_02`,
    max(`2019_03`) as `2019_03`
from
(
    -- put each month's amount in its own column, 0 everywhere else
    select
        user_id,
        case when dt = '2019-01' then am else 0 end as `2019_01`,
        case when dt = '2019-02' then am else 0 end as `2019_02`,
        case when dt = '2019-03' then am else 0 end as `2019_03`
    from tmp
) t
group by user_id;
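The pivot above takes three steps (aggregate, spread into columns, collapse). A shorter variant, shown here as a sketch and not as the original author's approach, puts the case expression directly inside sum() and does everything in one group by. It is run against SQLite from Python so it can be checked locally. The helper names and the m_2019_01 style aliases are assumptions made for this example.

```python
import sqlite3

# Rows copied from the sale_order table in this post.
SALE_ORDER_ROWS = [
    (1, "2019-01-01", 1, 1001, 2, 100, 200),
    (2, "2019-01-02", 2, 1001, 1, 100, 100),
    (3, "2019-02-10", 3, 1001, 2, 80, 160),
    (4, "2019-02-11", 2, 1002, 2, 100, 200),
    (5, "2019-03-01", 3, 1002, 1, 100, 100),
    (6, "2019-03-01", 3, 1001, 1, 50, 50),
    (7, "2019-03-02", 3, 1003, 4, 100, 400),
]


def load_sale_order():
    """Build an in-memory SQLite copy of sale_order (a stand-in for the Hive table)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table sale_order (id integer, sale_dt text, user_id integer, "
        "sku_id integer, sale_count integer, price integer, amount integer)"
    )
    conn.executemany(
        "insert into sale_order values (?, ?, ?, ?, ?, ?, ?)", SALE_ORDER_ROWS
    )
    return conn


def pivot_amount_by_user(conn):
    """One row per user, one column per month, using conditional aggregation."""
    sql = """
        select
            user_id,
            sum(case when substr(sale_dt, 1, 7) = '2019-01' then amount else 0 end) as m_2019_01,
            sum(case when substr(sale_dt, 1, 7) = '2019-02' then amount else 0 end) as m_2019_02,
            sum(case when substr(sale_dt, 1, 7) = '2019-03' then amount else 0 end) as m_2019_03
        from sale_order
        group by user_id
        order by user_id
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    for row in pivot_amount_by_user(load_sale_order()):
        print(row)
```

Either way the month columns are hard-coded, so a new month means editing the query.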

Question 3: Write SQL (preferably Hive-compatible) that produces the following result from the order table sale_order.

month cumulative_sales_volume cumulative_sales_amount
2019-01 3 300
2019-02 7 660
2019-03 13 1210

Calculation:

with tmp as (
    -- monthly sales volume and sales amount
    select
        substr(sale_dt, 1, 7) as dt,
        sum(sale_count) as sum_cnts,
        sum(amount) as sum_amt
    from sale_order
    group by substr(sale_dt, 1, 7)
)
select
    dt as `月份`,  -- month
    -- running totals from the first month up to the current one
    sum(sum_cnts) over(order by dt rows between unbounded preceding and current row) as `累计销量`,   -- cumulative sales volume
    sum(sum_amt) over(order by dt rows between unbounded preceding and current row) as `累计销售额`  -- cumulative sales amount
from tmp
;
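The window frame "rows between unbounded preceding and current row" is just a running total over the monthly figures from Question 1. The sketch below runs the same query shape in SQLite from Python and compares it with itertools.accumulate. It assumes SQLite 3.25 or newer for window functions, and the helper names are invented for this example.

```python
import sqlite3
from itertools import accumulate

# Rows copied from the sale_order table in this post.
SALE_ORDER_ROWS = [
    (1, "2019-01-01", 1, 1001, 2, 100, 200),
    (2, "2019-01-02", 2, 1001, 1, 100, 100),
    (3, "2019-02-10", 3, 1001, 2, 80, 160),
    (4, "2019-02-11", 2, 1002, 2, 100, 200),
    (5, "2019-03-01", 3, 1002, 1, 100, 100),
    (6, "2019-03-01", 3, 1001, 1, 50, 50),
    (7, "2019-03-02", 3, 1003, 4, 100, 400),
]


def load_sale_order():
    """Build an in-memory SQLite copy of sale_order (a stand-in for the Hive table)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table sale_order (id integer, sale_dt text, user_id integer, "
        "sku_id integer, sale_count integer, price integer, amount integer)"
    )
    conn.executemany(
        "insert into sale_order values (?, ?, ?, ?, ?, ?, ?)", SALE_ORDER_ROWS
    )
    return conn


def cumulative_by_month(conn):
    """Cumulative sales volume and amount per month, as in Question 3."""
    sql = """
        with tmp as (
            select substr(sale_dt, 1, 7) as dt,
                   sum(sale_count) as sum_cnts,
                   sum(amount) as sum_amt
            from sale_order
            group by substr(sale_dt, 1, 7)
        )
        select dt,
               sum(sum_cnts) over (order by dt
                   rows between unbounded preceding and current row) as cum_cnts,
               sum(sum_amt) over (order by dt
                   rows between unbounded preceding and current row) as cum_amt
        from tmp
        order by dt
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    rows = cumulative_by_month(load_sale_order())
    for row in rows:
        print(row)
    # Same numbers as a plain running total of the monthly volumes 3, 4, 6.
    print(list(accumulate([3, 4, 6])))
```

Because dt is unique after the group by, leaving the frame clause out would give the same result here. Spelling it out matters once the order by column can contain duplicates.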

Question 4: A second table, products, is shown below. Give the cumulative sales volume and cumulative sales amount rankings, and write the SQL.

sku_id sku_name
1001 Product 1
1002 Product 2
1003 Product 3

Calculation: (The requirement seems incomplete. My reading is that it asks for each product's monthly ranking by sales volume and by sales amount, where both figures are accumulated month over month.)

with tmp as (
    -- monthly sales volume and sales amount per product
    select
        substr(sale_dt, 1, 7) as dt,
        sku_name,
        sum(sale_count) as sum_cnts,
        sum(amount) as sum_amt
    from sale_order
    join products on sale_order.sku_id = products.sku_id
    group by substr(sale_dt, 1, 7), sku_name
)
select
    dt,
    sku_name,
    `累计销售量`,  -- cumulative sales volume
    `累计销售额`,  -- cumulative sales amount
    -- row_number() breaks ties arbitrarily; rank() would let tied products share a rank
    row_number() over(partition by dt order by `累计销售量` desc) as `累计销售量排名`,  -- rank by cumulative volume
    row_number() over(partition by dt order by `累计销售额` desc) as `累计销售额排名`   -- rank by cumulative amount
from
(
    -- running totals per product, month by month
    select
        dt,
        sku_name,
        sum(sum_cnts) over(partition by sku_name order by dt rows between unbounded preceding and current row) as `累计销售量`,
        sum(sum_amt) over(partition by sku_name order by dt rows between unbounded preceding and current row) as `累计销售额`
    from tmp
) t
;
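One thing to watch with the query above: a product only gets a row for months in which it actually sold, so a product with a quiet month would drop out of that month's ranking even though its cumulative total is unchanged. The sample data happens not to hit this case. The sketch below shows one possible way around it, under the same reading of the question: build a month-by-product grid with a cross join, fill missing months with 0, then accumulate and rank. It is only run against SQLite here (3.25 or newer), the helper names are hypothetical, and rank() is swapped in for row_number() so ties share a rank.

```python
import sqlite3

# Rows copied from the sale_order and products tables in this post.
SALE_ORDER_ROWS = [
    (1, "2019-01-01", 1, 1001, 2, 100, 200),
    (2, "2019-01-02", 2, 1001, 1, 100, 100),
    (3, "2019-02-10", 3, 1001, 2, 80, 160),
    (4, "2019-02-11", 2, 1002, 2, 100, 200),
    (5, "2019-03-01", 3, 1002, 1, 100, 100),
    (6, "2019-03-01", 3, 1001, 1, 50, 50),
    (7, "2019-03-02", 3, 1003, 4, 100, 400),
]
PRODUCT_ROWS = [(1001, "Product 1"), (1002, "Product 2"), (1003, "Product 3")]


def load_tables():
    """Build in-memory SQLite copies of sale_order and products."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table sale_order (id integer, sale_dt text, user_id integer, "
        "sku_id integer, sale_count integer, price integer, amount integer)"
    )
    conn.execute("create table products (sku_id integer, sku_name text)")
    conn.executemany(
        "insert into sale_order values (?, ?, ?, ?, ?, ?, ?)", SALE_ORDER_ROWS
    )
    conn.executemany("insert into products values (?, ?)", PRODUCT_ROWS)
    return conn


def cumulative_ranking(conn):
    """Cumulative volume/amount per product per month, with per-month ranks."""
    sql = """
        with monthly as (
            select substr(o.sale_dt, 1, 7) as dt, p.sku_name,
                   sum(o.sale_count) as cnt, sum(o.amount) as amt
            from sale_order o
            join products p on o.sku_id = p.sku_id
            group by substr(o.sale_dt, 1, 7), p.sku_name
        ),
        grid as (
            -- every month paired with every product, sold or not
            select m.dt, p.sku_name
            from (select distinct dt from monthly) m
            cross join products p
        ),
        running as (
            select g.dt, g.sku_name,
                   sum(coalesce(mo.cnt, 0)) over (partition by g.sku_name order by g.dt
                       rows between unbounded preceding and current row) as cum_cnt,
                   sum(coalesce(mo.amt, 0)) over (partition by g.sku_name order by g.dt
                       rows between unbounded preceding and current row) as cum_amt
            from grid g
            left join monthly mo on mo.dt = g.dt and mo.sku_name = g.sku_name
        )
        select dt, sku_name, cum_cnt, cum_amt,
               rank() over (partition by dt order by cum_cnt desc) as cnt_rank,
               rank() over (partition by dt order by cum_amt desc) as amt_rank
        from running
        order by dt, sku_name
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    for row in cumulative_ranking(load_tables()):
        print(row)
```

With the sample data, Product 2 and Product 3 both show a cumulative total of 0 in 2019-01 and share rank 2, which is the behaviour rank() was chosen for.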

Question 5: Based on the data above, write up some conclusions.

Origin blog.csdn.net/weixin_47699191/article/details/114679607