Hive execution sequence:
**FROM-->WHERE-->GROUP BY-->HAVING-->SELECT-->ORDER BY**
Writing order:
**SELECT DISTINCT
FROM
JOIN
ON
WHERE
GROUP BY
WITH
HAVING
ORDER BY
LIMIT**
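As a rough illustration of the two orderings, the sketch below lays a query out in writing order and marks in comments when each clause runs. It reuses the `user_trade` table and columns from the examples in these notes; the `LIMIT 10` value is an arbitrary choice for the example.

```sql
-- Clauses appear in writing order; the numbers give the execution order.
SELECT user_name, sum(pay_amount) AS total_amount   -- 5. SELECT
FROM user_trade                                      -- 1. FROM
WHERE dt BETWEEN '2019-04-01' AND '2019-04-30'       -- 2. WHERE
GROUP BY user_name                                   -- 3. GROUP BY
HAVING sum(pay_amount) > 50000                       -- 4. HAVING
ORDER BY total_amount DESC                           -- 6. ORDER BY
LIMIT 10;                                            -- applied last
```

Because WHERE runs before SELECT, the alias `total_amount` cannot be used in the WHERE clause, while ORDER BY, which runs after SELECT, can refer to it.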
Hive is a data warehouse built on Hadoop.
Hive SQL compared with traditional SQL:
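Note: When querying a partitioned table, restrict the partition column in the WHERE clause so that Hive scans only the matching partitions. In strict mode, Hive rejects a query on a partitioned table that has no partition filter.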
select user_name,piece,pay_amount from user_trade
where dt='2019-04-09' and goods_category='food';
The role of GROUP BY: it groups rows so that aggregate functions such as sum() return one subtotal per group.
SELECT user_name,sum(pay_amount) AS total_amount
FROM user_trade WHERE dt between '2019-04-01' and '2019-04-30' GROUP BY user_name
HAVING sum(pay_amount)>50000;
HAVING: filters the groups produced by GROUP BY; only groups that satisfy the HAVING condition are returned.
How to convert a timestamp to a date?
select pay_time,
       from_unixtime(pay_time,'yyyy-MM-dd HH:mm:ss')
from user_trade
where dt='2019-04-09';
Format patterns (HH is the 24-hour clock; lowercase hh is the 12-hour clock):
1. yyyy-MM-dd HH:mm:ss
2. yyyy-MM-dd HH
3. yyyy-MM-dd HH:mm
4. yyyyMMdd
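The four patterns can be compared side by side in one query. This is a minimal sketch that assumes `pay_time` holds a Unix timestamp in seconds; the column aliases are made up for the example. It uses `HH` (24-hour clock) for the hour field, since lowercase `hh` produces 12-hour values with no AM/PM marker.

```sql
-- Assumes pay_time is a Unix timestamp in seconds (BIGINT).
-- The aliases to_second, to_hour, to_minute, compact_date are illustrative.
SELECT pay_time,
       from_unixtime(pay_time, 'yyyy-MM-dd HH:mm:ss') AS to_second,
       from_unixtime(pay_time, 'yyyy-MM-dd HH')       AS to_hour,
       from_unixtime(pay_time, 'yyyy-MM-dd HH:mm')    AS to_minute,
       from_unixtime(pay_time, 'yyyyMMdd')            AS compact_date
FROM user_trade
WHERE dt = '2019-04-09';
```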
-- How to retrieve users who are in user_list_1 but not in user_list_2?
select a.user_id,
a.user_name
from user_list_1 a left join user_list_2 b on a.user_id=b.user_id
where b.user_id is null;
-- Note: how the same query is written in MySQL (subquery)
select user_id,
user_name
from user_list_1
where user_id not in(select user_id from user_list_2);
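For comparison, the same anti-join can also be sketched with a correlated NOT EXISTS subquery, assuming Hive 0.13 or later, where subqueries in the WHERE clause are supported. This is an alternative form, not the approach these notes recommend. Unlike NOT IN, it still behaves as expected when `user_list_2.user_id` contains NULL values.

```sql
-- Assumes Hive 0.13+ (subqueries in WHERE).
-- Returns users in user_list_1 that have no matching row in user_list_2.
SELECT a.user_id,
       a.user_name
FROM user_list_1 a
WHERE NOT EXISTS (
    SELECT b.user_id
    FROM user_list_2 b
    WHERE b.user_id = a.user_id
);
```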
-- Users who made a purchase in 2019 but have no refund
select a.user_name
from
(select distinct user_name
from user_trade
where year(dt)=2019)a
left join
(select distinct user_name
from user_refund
where year(dt)=2019)b
on a.user_name=b.user_name
where b.user_name is null;
-- Users who traded in all three years: 2017, 2018, and 2019
-- First approach
select distinct a.user_name
from trade_2017 a
join trade_2018 b on a.user_name=b.user_name
join trade_2019 c on b.user_name=c.user_name;
-- Second approach (recommended in Hive, especially when the tables are large: deduplicate each table before joining)
select a.user_name
from
(select distinct user_name
from trade_2017)a
join
(select distinct user_name
from trade_2018)b on a.user_name=b.user_name
join
(select distinct user_name
from trade_2019)c on b.user_name=c.user_name;
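A third way to express the same question, shown here only as a sketch and not as one of the approaches from these notes, avoids joins altogether: tag each table's rows with its year, stack them with UNION ALL, and keep the users that appear under all three tags. The `yr` column and the `t` alias are names introduced for this example.

```sql
-- yr is a hypothetical literal column that tags each row with its source table.
SELECT user_name
FROM (
    SELECT user_name, 2017 AS yr FROM trade_2017
    UNION ALL
    SELECT user_name, 2018 AS yr FROM trade_2018
    UNION ALL
    SELECT user_name, 2019 AS yr FROM trade_2019
) t
GROUP BY user_name
HAVING count(DISTINCT yr) = 3;   -- the user appears in all three years
```

Extending this form to more years only needs another UNION ALL branch and a change to the count in HAVING, rather than another join.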