Summary of Common Issues When Using Hive (Part 2)

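Dear reader: I am glad my articles are being read, but writing and editing original content takes real effort. If you repost this article, please credit the source and include a hyperlink to this page and to my blog: https://blog.csdn.net/vensmallzeng. If you found it helpful, a like would be much appreciated. Thanks to every reader! To contact me, my email is [email protected]. Thank you for your cooperation!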

1. Type conversion

For example, cast an int column to string and then compare it with the empty string:

cast(reser_no as string) <> ''
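The sketch below shows which rows this kind of filter keeps. It uses Python's built-in sqlite3 as a stand-in for Hive, which is an assumption: SQLite writes `CAST(... AS TEXT)` instead of `AS STRING`, but it handles NULL comparisons the same way in this case. The table name `orders` and its rows are hypothetical. A non-NULL int never becomes an empty string after the cast. Because `NULL <> ''` evaluates to NULL, which WHERE treats as false, the filter in practice also removes NULL values.

```python
import sqlite3


def non_empty_reser_nos(values):
    """Return the reser_no values kept by: WHERE CAST(reser_no AS TEXT) <> ''.

    SQLite stands in for Hive here; the table `orders` is hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (reser_no INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?)", [(v,) for v in values])
    # CAST(NULL AS TEXT) is NULL, and NULL <> '' evaluates to NULL,
    # which WHERE treats as false, so NULL rows are dropped as well.
    rows = conn.execute(
        "SELECT reser_no FROM orders "
        "WHERE CAST(reser_no AS TEXT) <> '' ORDER BY reser_no"
    ).fetchall()
    conn.close()
    return [r[0] for r in rows]


print(non_empty_reser_nos([101, None, 102]))  # [101, 102]
```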

2. Deduplication

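① DISTINCT: deduplicates on the combination of all columns listed after SELECT. It cannot deduplicate on just one of them. With multiple columns, DISTINCT must be placed at the very beginning, before all the columns, and it applies to every column that follows, not only the one right next to it.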

To count unique values, put DISTINCT directly inside COUNT:

SELECT order_id, count(DISTINCT customer_id) as person_number FROM order_map_customer group by order_id
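Here is a small sketch of what `count(DISTINCT customer_id)` does per group. SQLite stands in for Hive (an assumption; this aggregate behaves the same way in both), and the `(order_id, customer_id)` rows are made up. A customer listed twice under the same order is counted only once.

```python
import sqlite3


def person_number_per_order(pairs):
    """Mirror: SELECT order_id, count(DISTINCT customer_id) ... GROUP BY order_id.

    `pairs` holds hypothetical (order_id, customer_id) rows.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE order_map_customer (order_id INTEGER, customer_id INTEGER)")
    conn.executemany("INSERT INTO order_map_customer VALUES (?, ?)", pairs)
    rows = conn.execute(
        "SELECT order_id, count(DISTINCT customer_id) AS person_number "
        "FROM order_map_customer GROUP BY order_id ORDER BY order_id"
    ).fetchall()
    conn.close()
    return dict(rows)


# Customer 7 appears twice in order 1 but is counted once.
print(person_number_per_order([(1, 7), (1, 7), (1, 8), (2, 9)]))  # {1: 2, 2: 1}
```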

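② GROUP BY: groups on the combination of all columns listed after GROUP BY, so it likewise cannot deduplicate on a single column alone. To keep one complete row per group, select the maximum ID of each group: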

SELECT * FROM table_name WHERE ID IN (SELECT MAX(ID) FROM table_name GROUP BY [list of columns to deduplicate on, ....])
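The following is a minimal sketch of the MAX(ID) pattern, run in SQLite as a stand-in for Hive. The table `t` and its columns `id`, `name` and `city` are hypothetical. The sketch assumes that `id` is unique across the whole table. Otherwise, a maximum ID from one group could also match a row in another group, and extra rows would be returned.

```python
import sqlite3


def dedup_keep_max_id(rows):
    """Keep one row per (name, city): the row with the largest id.

    Mirrors: SELECT * FROM t WHERE id IN (SELECT MAX(id) FROM t GROUP BY name, city)
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, name TEXT, city TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT * FROM t WHERE id IN "
        "(SELECT MAX(id) FROM t GROUP BY name, city) ORDER BY id"
    ).fetchall()
    conn.close()
    return result


rows = [(1, "a", "bj"), (2, "a", "bj"), (3, "a", "sh"), (4, "b", "bj")]
print(dedup_keep_max_id(rows))  # [(2, 'a', 'bj'), (3, 'a', 'sh'), (4, 'b', 'bj')]
```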

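③ ROW_NUMBER() OVER(): row_number() over(partition by col1 order by col2) partitions the rows by col1 and sorts each partition by col2. It then returns each row's position in that order, numbered consecutively and uniquely within the partition. Keeping only the rows numbered 1 therefore leaves one row per group, for example: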

WITH order_map_customer AS (
    SELECT b.customer_id, a.order_id, b.reser_no, a.room_type
    FROM (SELECT * FROM dev_hotel.user_portrait_order_base) a
    JOIN (SELECT * FROM base_elong.dshord_reserve_guests WHERE customer_id > 0) b
    ON a.order_id = b.reser_no
    WHERE cast(b.reser_no AS string) <> ''
)
SELECT order_id, room_type, person_number FROM
    (SELECT c.order_id AS order_id, d.room_type AS room_type, c.person_number AS person_number,
            row_number() OVER (PARTITION BY c.order_id ORDER BY c.person_number DESC) rank
     FROM
        (SELECT order_id, count(DISTINCT customer_id) AS person_number
         FROM order_map_customer
         GROUP BY order_id) c
     JOIN
        (SELECT * FROM order_map_customer) d
     ON c.order_id = d.reser_no) e
WHERE e.rank = 1
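The keep-one-row-per-group idea can be tried in SQLite 3.25 or later, which supports window functions, as a stand-in for Hive. The table `orders` and its `version` column are hypothetical. In the query above, `c.person_number` is the same for every row of a given order_id partition, so the ORDER BY does not decide which row is numbered 1, and one arbitrary row per order is kept. The sketch instead orders by a column that varies within the group, so the row it keeps is deterministic.

```python
import sqlite3


def latest_room_per_order(rows):
    """For each order_id keep the row with the highest (hypothetical) version.

    Uses row_number() OVER (PARTITION BY ... ORDER BY ... DESC); needs SQLite >= 3.25.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, room_type TEXT, version INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT order_id, room_type FROM (
            SELECT order_id, room_type,
                   row_number() OVER (PARTITION BY order_id ORDER BY version DESC) AS rn
            FROM orders
        ) e
        WHERE e.rn = 1
        ORDER BY order_id
        """
    ).fetchall()
    conn.close()
    return result


rows = [(1, "twin", 1), (1, "king", 2), (2, "suite", 1)]
print(latest_room_per_order(rows))  # [(1, 'king'), (2, 'suite')]
```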

Recommended reading:

[1] https://www.cnblogs.com/sonia0087/p/9996366.html

[2] https://blog.csdn.net/hd243608836/article/details/80088173

3. Aggregate functions

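Functions such as SUM, COUNT, MAX and AVG are called aggregate functions. Ordinary (scalar) functions produce one output per row. Aggregate functions instead operate on a set of rows and return a single value for that set.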
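As a quick illustration of many rows collapsing into one result row, here is a sketch using SQLite as a stand-in for Hive. The `sales` table and its values are hypothetical.

```python
import sqlite3


def summarize(amounts):
    """Apply SUM, COUNT, MAX and AVG to all rows of a hypothetical `sales` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?)", [(a,) for a in amounts])
    # Several input rows go in, exactly one result row comes out.
    row = conn.execute(
        "SELECT SUM(amount), COUNT(amount), MAX(amount), AVG(amount) FROM sales"
    ).fetchone()
    conn.close()
    return row


print(summarize([10, 20, 30, 40]))  # (100, 4, 40, 25.0)
```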

4. Using COUNT

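If the query only needs to return a single result, apply COUNT directly to the records without GROUP BY. The whole table, or whatever survives the WHERE clause, is then treated as one group.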
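Here is a sketch of a COUNT without GROUP BY, again with SQLite standing in for Hive and a hypothetical table `t`. It also shows a detail worth remembering: `COUNT(*)` counts every row, while `COUNT(col)` skips rows where `col` is NULL. An empty table still yields one result row, with count 0.

```python
import sqlite3


def count_all_vs_column(values):
    """COUNT over a whole (hypothetical) table without GROUP BY: one result row.

    COUNT(*) counts every row; COUNT(col) skips rows where col is NULL.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (col INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?)", [(v,) for v in values])
    row = conn.execute("SELECT COUNT(*), COUNT(col) FROM t").fetchone()
    conn.close()
    return row


print(count_all_vs_column([1, None, 3]))  # (3, 2)
```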

5. Combining COUNT with GROUP BY

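Taken literally, GROUP BY means "group (Group) according to certain rules (by)". It divides a dataset into several smaller subsets according to a rule and then processes each subset separately. Combined with COUNT, it returns one count per group.
Note: in Hive's MapReduce-style execution, rows are shuffled and sorted by the grouping key before they are grouped. This is an implementation detail, and the final output order is not guaranteed, so add ORDER BY when the order matters.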
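A minimal sketch of COUNT with GROUP BY, using SQLite as a stand-in for Hive and a hypothetical `orders` table with a `city` column. The explicit ORDER BY is what fixes the output order, not the grouping itself.

```python
import sqlite3


def orders_per_city(cities):
    """COUNT(*) per group; ORDER BY makes the output order explicit."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (city TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?)", [(c,) for c in cities])
    rows = conn.execute(
        "SELECT city, COUNT(*) AS order_count FROM orders "
        "GROUP BY city ORDER BY order_count DESC, city"
    ).fetchall()
    conn.close()
    return rows


print(orders_per_city(["bj", "sh", "bj", "gz", "bj", "sh"]))
# [('bj', 3), ('sh', 2), ('gz', 1)]
```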

6. Difference between WHERE and HAVING

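① HAVING filters the groups after grouping, and it can use aggregate functions.
② WHERE filters rows before aggregation, so it takes effect before the GROUP BY and HAVING clauses. Aggregate functions cannot be used in WHERE.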
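The contrast can be sketched in SQLite as a stand-in for Hive, using a hypothetical `orders(unionid, amount)` table. WHERE first drops small orders, and HAVING then drops users left with fewer than two orders. An aggregate placed in WHERE is rejected, because the groups do not exist yet at that point. SQLite rejects it, as Hive does, though the error messages differ.

```python
import sqlite3


def where_vs_having(rows):
    """Filter rows with WHERE, then groups with HAVING, on a hypothetical table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (unionid TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    # WHERE removes single rows before grouping; HAVING removes whole groups after it.
    kept = conn.execute(
        "SELECT unionid, COUNT(*) FROM orders WHERE amount > 100 "
        "GROUP BY unionid HAVING COUNT(*) >= 2 ORDER BY unionid"
    ).fetchall()
    # An aggregate inside WHERE is rejected: groups do not exist yet at that point.
    try:
        conn.execute("SELECT unionid FROM orders WHERE COUNT(*) >= 2 GROUP BY unionid")
        aggregate_in_where_allowed = True
    except sqlite3.Error:
        aggregate_in_where_allowed = False
    conn.close()
    return kept, aggregate_in_where_allowed


rows = [("u1", 150), ("u1", 200), ("u1", 50), ("u2", 300), ("u2", 20),
        ("u3", 120), ("u3", 130)]
print(where_vs_having(rows))  # ([('u1', 2), ('u3', 2)], False)
```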

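7. When a query contains a WHERE clause, a GROUP BY clause, a HAVING clause and aggregate functions, they are logically executed in this order:
① The WHERE clause selects the rows that meet its condition.
② The GROUP BY clause groups those rows, and the aggregate functions compute one value for each group.
③ The HAVING clause removes the groups that do not meet its condition.
Both HAVING and WHERE set conditions that the query result must satisfy. The difference is that HAVING restricts groups, not rows, so aggregate functions may appear in HAVING but not in WHERE.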
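To make the logical order concrete, here is a plain-Python trace of the steps above. It illustrates the evaluation order only and is not how Hive physically executes a query; its optimizer may, for instance, push filters down. The function `run_query` and the row fields `unionid` and `amount` are hypothetical.

```python
from collections import defaultdict


def run_query(rows, where, key, having):
    """Evaluate WHERE -> GROUP BY -> aggregates -> HAVING in that order.

    rows are dicts with an "amount" field; where/key/having are callables.
    """
    # 1) WHERE: keep only the individual rows that pass the condition
    filtered = [r for r in rows if where(r)]
    # 2) GROUP BY: bucket the surviving rows by their grouping key
    groups = defaultdict(list)
    for r in filtered:
        groups[key(r)].append(r)
    # 3) Aggregates: one COUNT(*) and SUM(amount) per group
    aggregated = {
        k: {"count": len(v), "total": sum(r["amount"] for r in v)}
        for k, v in groups.items()
    }
    # 4) HAVING: drop whole groups whose aggregates fail the condition
    return {k: agg for k, agg in aggregated.items() if having(agg)}


rows = [
    {"unionid": "u1", "amount": 150},
    {"unionid": "u1", "amount": 200},
    {"unionid": "u1", "amount": 50},
    {"unionid": "u2", "amount": 300},
    {"unionid": "u3", "amount": 120},
]
print(run_query(rows,
                where=lambda r: r["amount"] > 100,
                key=lambda r: r["unionid"],
                having=lambda agg: agg["count"] >= 2))
# {'u1': {'count': 2, 'total': 350}}
```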

8. Counting the distinct values of a column, when the column has no empty values:

select count(distinct col_name) from db.table_name

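If the column may contain empty strings or NULLs, first replace all of them with one new placeholder value, then count the distinct values as usual. The query becomes: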

select count(distinct case when trim(col_name) is null or trim(col_name) = '' then '1' else col_name end) from db.table_name
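The sketch below compares the two queries in SQLite, used as a stand-in for Hive; `trim` and `count(DISTINCT ...)` behave alike here. The table `t` and its values are hypothetical. A plain `count(DISTINCT ...)` ignores NULL and counts `''` and `'  '` as two separate values. The normalized version merges NULL and all blank strings into the single placeholder `'1'`. Choose a placeholder that cannot occur in the real data, or it will merge with a genuine value.

```python
import sqlite3


def distinct_counts(values):
    """Compare a plain COUNT(DISTINCT) with one that maps blanks/NULL to '1'."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (col_name TEXT)")
    conn.executemany("INSERT INTO t VALUES (?)", [(v,) for v in values])
    # NULL is ignored; '' and '  ' count as two separate distinct values.
    plain = conn.execute("SELECT count(DISTINCT col_name) FROM t").fetchone()[0]
    # NULL, '' and whitespace-only strings all become the placeholder '1'.
    normalized = conn.execute(
        "SELECT count(DISTINCT CASE WHEN trim(col_name) IS NULL OR trim(col_name) = '' "
        "THEN '1' ELSE col_name END) FROM t"
    ).fetchone()[0]
    conn.close()
    return plain, normalized


print(distinct_counts(["a", "b", "a", "", "  ", None]))  # (4, 3)
```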

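9. For each user, count the orders in each room-type category

The column aliases are 大床房 (double/king-bed room), 单人房 (single room), 标准房 (standard/twin room), 家庭房 (family room/suite), 高级房 (superior room) and 其他 (other). A room type that matches the patterns of several categories is counted in each of them.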

SELECT unionid, COUNT(case when room_type like '%大床房%' or room_type like '%大床间%' then 1
                   else NULL
                   end) as `大床房`,
        COUNT(case when room_type like '%单人房%' or room_type like '%单人间%' or room_type like '%单间%' then 1 
                   else NULL
                   end) as `单人房`,
        COUNT(case when room_type like '%标准间%' or room_type like '%标准房%' or room_type like '%双人房%'
                   or room_type like '%双床房%' or room_type like '%双床间%' or room_type like '%双人间%' 
                   or room_type like '%标间%' then 1 
                   else NULL
                   end) as `标准房`,
        COUNT(case when room_type like '%家庭房%' or room_type like '%套房%' or room_type like '%亲子房%'
                   or room_type like '%亲子间%' then 1 
                   else NULL
                   end) as `家庭房`,
        COUNT(case when room_type like '%高级房%' or room_type like '%高级间%' then 1 
                   else NULL
                   end) as `高级房`,
        COUNT(case when not (room_type rlike '大床房|大床间|单人房|单人间|单间|标准间|标准房|双人房|双床房|双床间|双人间|标间|家庭房|套房|亲子房|亲子间|高级房|高级间') then 1
                   else NULL
                   end) as `其他`
FROM dev_hotel.user_portrait_order_base
group by unionid
ORDER BY rand()
LIMIT 100
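The trick here is conditional counting. `CASE` yields 1 when the pattern matches and NULL otherwise, and COUNT skips NULLs, so each column counts only its matching rows. Below is a trimmed-down sketch in SQLite, used as a stand-in for Hive, with two hypothetical categories and an "other" column defined as "matches none of the patterns". The table `orders` and its rows are made up.

```python
import sqlite3


def room_type_counts(rows):
    """Per-user conditional counts over hypothetical (unionid, room_type) rows.

    CASE yields 1 when the condition holds and NULL otherwise; COUNT skips NULLs,
    so each column counts only the matching rows.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (unionid TEXT, room_type TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT unionid,
               COUNT(CASE WHEN room_type LIKE '%大床%' THEN 1 END) AS king,
               COUNT(CASE WHEN room_type LIKE '%单人%' THEN 1 END) AS single,
               COUNT(CASE WHEN NOT (room_type LIKE '%大床%' OR room_type LIKE '%单人%')
                          THEN 1 END) AS other
        FROM orders
        GROUP BY unionid
        ORDER BY unionid
        """
    ).fetchall()
    conn.close()
    return result


rows = [("u1", "高级大床房"), ("u1", "单人间"), ("u1", "总统套房"),
        ("u2", "大床房"), ("u2", "大床房")]
print(room_type_counts(rows))  # [('u1', 1, 1, 1), ('u2', 2, 0, 0)]
```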

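10. [Rows to columns] Turn each attribute into its own column, so that the values different unionids have for the same attribute are listed in one column, i.e. go from Figure 1 to Figure 2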

[Figure 1]

[Figure 2]

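The implementation works as follows. For each unionid, first concatenate every label name with its value in the form label_name_en:label_value. Then join all of that unionid's pairs with '&' (collect_list + concat_ws), and use str_to_map to turn the resulting string into a map of the form { label_name_en: label_value }. Finally, look up each label name in the map to get that unionid's value for the label, which produces the layout of Figure 2. The Hive SQL is: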

select g.unionid, g.all_label_name_and_value_map["mem_basic"] as mem_basic, 
g.all_label_name_and_value_map["mem_outgoing"] as mem_outgoing,
g.all_label_name_and_value_map["mem_marriage"] as mem_marriage,
g.all_label_name_and_value_map["mem_education"] as mem_education,
g.all_label_name_and_value_map["mem_certification"] as mem_certification,
g.all_label_name_and_value_map["mem_stayArea"] as mem_stayArea,
g.all_label_name_and_value_map["mem_stayCity"] as mem_stayCity,
g.all_label_name_and_value_map["mem_age"] as mem_age,
g.all_label_name_and_value_map["mem_sex"] as mem_sex,
g.all_label_name_and_value_map["mem_assets"] as mem_assets,
g.all_label_name_and_value_map["mem_constellation"] as mem_constellation,
g.all_label_name_and_value_map["mem_social_stratum"] as mem_social_stratum,
g.all_label_name_and_value_map["mem_area"] as mem_area,
g.all_label_name_and_value_map["mem_city"] as mem_city,
g.all_label_name_and_value_map["mem_career"] as mem_career,
g.all_label_name_and_value_map["mem_birthday"] as mem_birthday,
g.all_label_name_and_value_map["mem_mail"] as mem_mail
 from
(
    select f.unionid, str_to_map(f.all_label_name_and_value, '&', ':') as all_label_name_and_value_map from 
        (select e.unionid, concat_ws('&', collect_list(e.label_name_and_value)) as all_label_name_and_value  from
            (select a.unionid, concat(d.label_name_en, ':', d.label_value) as label_name_and_value from 
                (select * from dev_hotel.user_family_features_no_comment_random_consistent) a
                left join
                (select b.*  from 
                        (select * from dev_hotel.user_portrait_base_info) b
                        join 
                        (select * from dev_hotel.user_label_sys where label_id like '01%') c
                        on b.label_name_en=c.label_name_en
                ) d
                on a.unionid = d.unionid 
            ) e
            group by e.unionid
        ) f
    ) g
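The same pipeline can be traced in plain Python. The `str_to_map` below is a simplified, hypothetical re-implementation and not Hive's code: Hive treats the delimiters as regular expressions, while this version treats them as plain strings. Like Hive, it maps a key with no value to None (NULL). `LABELS` is a hypothetical subset of the label names in the query. One caveat applies to both versions: a label value that itself contains '&' would be split in the wrong place.

```python
from collections import defaultdict

# Hypothetical subset of the label names used in the query above.
LABELS = ["mem_age", "mem_sex", "mem_city"]


def str_to_map(text, pair_delim, kv_delim):
    """Simplified stand-in for Hive's str_to_map(text, delimiter1, delimiter2).

    Delimiters are plain strings here (Hive treats them as regexes);
    a pair without kv_delim maps its key to None, like NULL in Hive.
    """
    result = {}
    for pair in text.split(pair_delim):
        key, sep, value = pair.partition(kv_delim)
        result[key] = value if sep else None
    return result


def pivot(label_rows, labels=LABELS):
    """Turn (unionid, label_name_en, label_value) rows into one row per unionid."""
    # concat(label_name_en, ':', label_value), then collect_list(...) per unionid
    collected = defaultdict(list)
    for unionid, name, value in label_rows:
        collected[unionid].append(f"{name}:{value}")
    table = []
    for unionid, items in collected.items():
        # concat_ws('&', ...) followed by str_to_map(..., '&', ':')
        label_map = str_to_map("&".join(items), "&", ":")
        # map["label"] lookups; absent labels become None (NULL in Hive)
        table.append((unionid, *(label_map.get(name) for name in labels)))
    return sorted(table)


rows = [("u1", "mem_age", "30"), ("u1", "mem_sex", "F"), ("u2", "mem_city", "Beijing")]
print(pivot(rows))
# [('u1', '30', 'F', None), ('u2', None, None, 'Beijing')]
```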


Knowledge builds up day by day; let's keep improving together. A summary by Zengzeng, to be continued.


Reposted from blog.csdn.net/Vensmallzeng/article/details/103486604