Common Hive SQL patterns

##########Using group_concat, concat_ws, collect_list, and collect_set############
Hive has no group_concat; the usual substitute is:
concat_ws('|', collect_set(str))

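select `user`, concat_ws(',', collect_set(concat(order_type, '(', order_number, ')'))) as `order` from `table` group by `user`;
How the SQL works:
`order` is the column alias, and the first argument ',' is the separator. Because order, user, and table are reserved words in recent Hive versions, they are quoted with backticks here.
What collect_set does:

(1) Deduplication: within each group produced by group by user, duplicate values are dropped.
(2) Aggregation: the values that belong to the same user are gathered into one array, and concat_ws then joins the array elements into a single delimited string.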
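
To make the two roles of collect_set concrete, here is a minimal sketch. The table demo_orders, its columns, and its rows are made up for illustration, and the INSERT ... VALUES form assumes Hive 0.14 or later.

```sql
-- Hypothetical demo table; names and rows are illustrative only.
create table if not exists demo_orders (
  user_id      string,
  order_type   string,
  order_number int
);

insert into table demo_orders values
  ('u1', 'book', 2),
  ('u1', 'book', 2),   -- exact duplicate of the row above
  ('u1', 'pen',  5),
  ('u2', 'ink',  1);

-- One output row per user_id. collect_set keeps 'book(2)' only once
-- for u1, and concat_ws joins the remaining elements with ','.
select user_id,
       concat_ws(',',
                 collect_set(concat(order_type, '(', cast(order_number as string), ')'))
       ) as orders
from demo_orders
group by user_id;
```

For u1 the result contains book(2) and pen(5) exactly once each. The order of the elements inside the string is not guaranteed, because collect_set makes no ordering promise.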

Row-to-column conversion in Hive (collapsing several rows into one):

How do you collapse rows in Hive? Use: concat_ws('separator', collect_set(column))

test.txt

a b 1
a b 2
a b 3
c d 4
c d 5
c d 6
How can the data above be turned into:

a b 1,2,3
c d 4,5,6
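Statement:
  select col1, col2, concat_ws(',', collect_set(col3)) from tablename group by col1, col2;
This is the row-to-column conversion, built on concat_ws(). The separator is ',' with no surrounding spaces; ' , ' would produce 1 , 2 , 3 instead. concat_ws expects strings, so col3 must be a string column or be cast to string.
collect_set deduplicates the values of a column and aggregates them into an array. In older Hive versions it accepts only primitive types.
To convert the result back, use the column-to-row construct: lateral view explode().
Statement:
   select col1, col2, col5 from tablename lateral view explode(split(col3, ',')) b as col5;
   Here b is the alias of the virtual table produced by the lateral view; the alias is required.
   explode does the column-to-row step: it takes an array argument and, as the inverse of collect_set, emits one row per array element.
That covers what concat, concat_ws, and collect_set are used for.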
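
The round trip described above can be sketched end to end as follows. The table names demo_rows and demo_collapsed are hypothetical, col3 is stored as a string so that concat_ws accepts it directly, and INSERT ... VALUES assumes Hive 0.14 or later.

```sql
-- Hypothetical table holding the contents of test.txt.
create table if not exists demo_rows (col1 string, col2 string, col3 string);

insert into table demo_rows values
  ('a', 'b', '1'), ('a', 'b', '2'), ('a', 'b', '3'),
  ('c', 'd', '4'), ('c', 'd', '5'), ('c', 'd', '6');

-- Step 1 (rows to one row): one row per (col1, col2),
-- with the col3 values joined into a comma-separated string.
create table demo_collapsed as
select col1, col2, concat_ws(',', collect_set(col3)) as col3
from demo_rows
group by col1, col2;

-- Step 2 (one row back to many rows): split the string into an array,
-- then explode the array so every element becomes its own row.
select col1, col2, col5
from demo_collapsed
lateral view explode(split(col3, ',')) b as col5;
```

Step 2 returns six rows again, one per original value. Within each group the values may come back in a different order than they were inserted, since collect_set does not preserve order.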

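Row-to-column and column-to-row conversion in Hive
1. One row to multiple rows
Note: lateral view is used together with UDTFs such as explode (often fed by split) to break one row into several rows,
and the expanded rows can then be aggregated. lateral view first calls the UDTF for each row of the source table;
the UDTF turns that row into one or more rows, and lateral view then joins the results back to the source row, producing a virtual table that can be given an alias.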
create table specter.userapps as
select mid,device,app from specter.userapp_list
lateral view explode(split(applist,',')) r1 AS app;

select count(*) from xx
lateral view explode(pair) ids_table1 as id1 lateral view explode(pair) ids_table2 as id2
where year='2016' and month='02' and day='23' and id2.type=68 and id1.type=64 and array_contains(dates,'20160223');
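
When two lateral views are chained as in the query above, each source row is expanded into every combination of the two exploded arrays. A minimal sketch of that behavior, using a made-up one-row input built with split() and assuming a Hive version that allows SELECT without a FROM clause:

```sql
-- Hypothetical input: a single row whose pair column is the array ['64', '68'].
select id1, id2
from (select split('64,68', ',') as pair) t
lateral view explode(pair) ids_table1 as id1
lateral view explode(pair) ids_table2 as id2;
```

With two elements in pair, the result has 2 x 2 = 4 rows. Filters in the where clause, such as the conditions on id1 and id2 above, then pick out the combinations of interest.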

2. Multiple rows to one row

Note: collect_set returns an array of the deduplicated elements.

select mid,device,collect_set(app) from
(select a.mid,a.device,a.app from specter.userapps a join specter.userapps b
on a.mid = b.mid and a.device=b.device) tmp
group by mid,device;

Row-to-column conversion in Hive
hive > select product_id, concat_ws('_',collect_set(promotion_id)) as promotion_ids from product_promotion group by product_id;

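select id, concat_ws(',', collect_set(cast(colname as string)))
from tablename
group by id;
concat_ws only works on strings, so the column must be cast to string first. collect_set deduplicates the column's values; if deduplication is not wanted, use the collect_list function instead.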
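
The difference between collect_set and collect_list is easiest to see side by side. This is only a sketch: demo_scores and its rows are invented for the example, and INSERT ... VALUES assumes Hive 0.14 or later.

```sql
-- Hypothetical table with a non-string column that has to be cast.
create table if not exists demo_scores (id string, score int);

insert into table demo_scores values
  ('a', 3), ('a', 3), ('a', 1),
  ('b', 7);

select id,
       -- duplicates removed: id 'a' contributes the values 3 and 1
       concat_ws(',', collect_set(cast(score as string)))  as distinct_scores,
       -- duplicates kept: id 'a' contributes 3, 3 and 1
       concat_ws(',', collect_list(cast(score as string))) as all_scores
from demo_scores
group by id;
```

Neither function guarantees element order. If a stable order matters, wrapping the array in sort_array() before concat_ws is one option, keeping in mind that the values are sorted as strings after the cast.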

Hive parameter tuning

SET mapred.reduce.tasks=50;
SET mapreduce.reduce.memory.mb=6000;
SET mapreduce.reduce.shuffle.memory.limit.percent=0.06;

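Data skew mostly shows up on the reduce side. It can often be mitigated by tuning three things: the number of reduce tasks,
the reducer memory size (in MB), and the fraction of shuffle memory a single map output may occupy before it is written to disk.

These settings are commonly used when data skew occurs.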

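Parameter notes:

1) Automatically use a map join when a table is small:
set hive.auto.convert.join = true; -- default: false before Hive 0.11, true from 0.11 on
When this is true, Hive checks the sizes of the joined tables and, if one is small enough, loads it into memory, i.e. it uses a map join for the small table.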

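2) Size threshold that separates small tables from large ones:
set hive.mapjoin.smalltable.filesize;
hive.mapjoin.smalltable.filesize=25000000
Running set with only the property name prints the current value. The default is 25000000 bytes (about 25 MB).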

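3) When a map join is followed by a group by, the fraction of memory that may be used to hold the data; if the data is too large, it is not kept in memory
set hive.mapjoin.followby.gby.localtask.max.memory.usage;
Default: 0.55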

4) Fraction of memory that the local task may use
set hive.mapjoin.localtask.max.memory.usage;
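
Put together, the four map join settings might be applied in a session as in the sketch below. The tables big_fact and small_dim and their columns are hypothetical. The values are the defaults as far as I know them; the 0.90 for the local task limit in particular is not stated above and should be checked against your Hive version.

```sql
-- Defaults spelled out explicitly; adjust to the cluster at hand.
set hive.auto.convert.join=true;
set hive.mapjoin.smalltable.filesize=25000000;                    -- bytes, about 25 MB
set hive.mapjoin.followby.gby.localtask.max.memory.usage=0.55;
set hive.mapjoin.localtask.max.memory.usage=0.90;                 -- assumed default

-- Hypothetical join: big_fact is large, small_dim is below the threshold,
-- so Hive is expected to convert this into a map join automatically.
select d.name, count(*) as cnt
from big_fact f
join small_dim d on f.dim_id = d.id
group by d.name;
```

Prefixing the query with explain is a simple way to check whether the plan really contains a map join operator.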
