hive sql

case when

CASE usage 1:
CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END
/* When a = b, returns c; when a = d, returns e; else returns f. */
CASE usage 2:
CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END
/* When a = true, returns b; when c = true, returns d; else returns e. */

Two examples:

select (case col1 when 0 then col2 else col1 end) as new_column from table_name;
select sum(case when col1 = 'val1' then col2 else 0 end) from table_name group by col3;
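To make the two CASE forms easier to compare, here is a minimal sketch that runs both against the same rows. The inline rows, the alias t, and the column aliases simple_form, searched_form, and val1_total are made up for illustration, and the FROM-less SELECT assumes Hive 0.13 or later.

```sql
-- Hypothetical inline rows stand in for table_name.
SELECT
  col1,
  col2,
  -- Usage 1: compare col1 against each WHEN value.
  CASE col1 WHEN 0 THEN col2 ELSE col1 END AS simple_form,
  -- Usage 2: evaluate each WHEN as a boolean condition.
  CASE WHEN col1 = 0 THEN col2 ELSE col1 END AS searched_form
FROM (
  SELECT 0 AS col1, 7 AS col2
  UNION ALL
  SELECT 3 AS col1, 9 AS col2
) t;

-- Conditional aggregation: add up col2 only where col1 = 0.
SELECT sum(CASE WHEN col1 = 0 THEN col2 ELSE 0 END) AS val1_total
FROM (
  SELECT 0 AS col1, 7 AS col2
  UNION ALL
  SELECT 3 AS col1, 9 AS col2
) t;
```

The two forms agree here because the condition is a plain equality test. One difference to keep in mind: usage 1 compares with =, so a WHEN NULL branch never matches; testing for NULL needs usage 2 with IS NULL.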

percentile, percentile_approx

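percentile(BIGINT col, p): computes the exact p-th quantile of column col; its most common use is the median (p = 0.5). The values in col must be integers, not floating-point numbers;
p must be between 0 and 1;
for example, with p = 0.1, about 10% of the values in col fall below the returned result;
percentile(BIGINT col, array(p1 [, p2]...)): pass an array to compute several quantiles at once;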

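percentile_approx(DOUBLE col, p [, B]): computes an approximate p-th quantile of column col and, unlike percentile, accepts floating-point values; the optional B trades memory for accuracy;
percentile_approx(DOUBLE col, array(p1 [, p2]...) [, B]): pass an array to compute several quantiles at once;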
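The four signatures above can be tried side by side on a small set of values. This is a sketch only: the inline values 1 to 5, the alias t, and the result aliases are invented for illustration, and the FROM-less SELECT assumes Hive 0.13 or later.

```sql
SELECT
  -- Exact quantiles need an integer column.
  percentile(CAST(v AS BIGINT), 0.5)               AS median_exact,
  percentile(CAST(v AS BIGINT), array(0.25, 0.75)) AS quartiles_exact,
  -- Approximate quantiles accept floating-point input.
  percentile_approx(CAST(v AS DOUBLE), 0.5)        AS median_approx,
  -- The third argument is B; 100 is an arbitrary value chosen for this sketch.
  percentile_approx(CAST(v AS DOUBLE), 0.5, 100)   AS median_approx_b100
FROM (
  SELECT 1 AS v UNION ALL
  SELECT 2 AS v UNION ALL
  SELECT 3 AS v UNION ALL
  SELECT 4 AS v UNION ALL
  SELECT 5 AS v
) t;
```

With so few distinct values, the approximate results should match the exact ones; the two functions only start to diverge on large columns with many distinct values.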

An example:
select percentile(summary, 0.1)
from (
  select imei, sum(download) as summary
  from (
    select imei, (case diffSize when 0 then apkSize else diffSize end) as download
    from table_name
    where date = 20180415
  ) as imeiselect
  group by imei
  order by summary desc
) as percselect;

Subqueries

A subquery is a query nested inside another query; a single query can contain several of them.

as

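AS gives an alias to a table, a column, or a subquery.
For example, in Hive SQL a subquery in the FROM clause must have an alias; without one, the query fails with an error.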
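A minimal sketch of the alias rule, using nested subqueries over made-up inline rows (the aliases src and t and the columns x and total are hypothetical; the FROM-less SELECT assumes Hive 0.13 or later):

```sql
-- Works: both subqueries in FROM carry an alias (src and t).
SELECT t.total
FROM (
  SELECT sum(x) AS total
  FROM (
    SELECT 1 AS x
    UNION ALL
    SELECT 2 AS x
  ) src
) t;

-- Expected to be rejected, as described above: the subquery has no alias.
-- SELECT total FROM (SELECT 1 AS total);
```

The AS keyword itself is optional for table and subquery aliases, so ") t" and ") AS t" mean the same thing.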

explode

explode splits an array, such as a JSON-style array, into one row per element (see reference 1 for a detailed walkthrough).

/* explode() takes in an array (or a map) as an input and outputs the elements of the array (map) as separate rows. */
select explode(col1) as content from tableName;

However, if the SELECT list contains other columns alongside explode, the query fails; the fix is to use LATERAL VIEW.
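The behaviour described above can be sketched with literal arrays and maps, so that no table is needed. The aliases content, k, and v are arbitrary, the commented-out table posts with columns id and tags is hypothetical, and the FROM-less SELECT assumes Hive 0.13 or later.

```sql
-- Works: explode is the only expression in the SELECT list.
-- Each array element becomes its own row.
SELECT explode(array('a', 'b', 'c')) AS content;

-- A map yields two columns per row, so the alias lists two names.
SELECT explode(map('k1', 1, 'k2', 2)) AS (k, v);

-- Expected to be rejected: explode is combined with another column.
-- SELECT id, explode(tags) AS tag FROM posts;
```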

lateral view

lateralView: LATERAL VIEW udtf(expression) tableAlias AS columnAlias

SELECT qId, cId, vId FROM answer
LATERAL VIEW explode(vIds) visitor AS vId
WHERE cId = 2
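For a version of this query that runs without an existing answer table, the sketch below substitutes two made-up inline rows under the alias a; everything else follows the syntax shown above. It assumes Hive 0.13 or later for the FROM-less SELECT.

```sql
SELECT a.qId, a.cId, visitor.vId
FROM (
  -- Hypothetical rows: one question with two visitors, one with a single visitor.
  SELECT 1 AS qId, 2 AS cId, array(10, 11) AS vIds
  UNION ALL
  SELECT 2 AS qId, 3 AS cId, array(12) AS vIds
) a
-- Join each source row to the rows produced by exploding its own vIds.
LATERAL VIEW explode(a.vIds) visitor AS vId
WHERE a.cId = 2;
```

Only the row with cId = 2 passes the filter, so the result should contain one output row per element of its vIds array, each repeating qId and cId. Rows whose array is empty or NULL produce no output at all; LATERAL VIEW OUTER keeps them, with vId set to NULL.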
