Aggregate functions

    max min sum avg count    (queries that use aggregate functions like these run as a MapReduce job)

hive (default)> select count(1) from ruoze_emp where deptno=10; (number of employees in department 10)
hive (default)> select max(sal), min(sal), avg(sal), sum(sal) from ruoze_emp; (highest, lowest, average and total salary)
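To make the semantics of these aggregates concrete, here is a minimal Python sketch that emulates them over a few hypothetical (deptno, sal) rows; the rows and helper names are illustrative assumptions, not part of ruoze_emp. It assumes standard SQL behavior as in Hive: count(1) counts every row, while max, min, avg and sum skip NULL values (None here).

```python
# Hypothetical (deptno, sal) rows; None stands in for a NULL salary.
rows = [
    (20, 800.0),
    (30, 1600.0),
    (30, 1250.0),
    (20, 2975.0),
    (10, 2450.0),
    (10, None),
]

def count_rows(rows, deptno):
    """Like count(1) ... where deptno = X: counts rows, even if sal is NULL."""
    return sum(1 for d, _ in rows if d == deptno)

def salary_stats(rows):
    """Like max/min/avg/sum(sal): NULL salaries are ignored."""
    sals = [s for _, s in rows if s is not None]
    return {
        "max": max(sals),
        "min": min(sals),
        "avg": sum(sals) / len(sals),
        "sum": sum(sals),
    }

print(count_rows(rows, 10))  # 2
print(salary_stats(rows))    # {'max': 2975.0, 'min': 800.0, 'avg': 1815.0, 'sum': 9075.0}
```

Note that avg divides by the number of non-NULL salaries (5), not by the number of rows (6).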

Grouping (group by)

Every column in the select list must either appear in the group by clause or be wrapped in an aggregate function.

hive (default)> select deptno,avg(sal) from ruoze_emp group by deptno; (average salary of each department)
hive (default)> select deptno,job,max(sal) from ruoze_emp group by deptno,job; (highest salary for each combination of department and job)
10 CLERK 1300.0
10 MANAGER 2450.0
10 PRESIDENT 5000.0
20 ANALYST 3000.0
20 CLERK 1100.0
20 MANAGER 2975.0
30 CLERK 950.0
30 MANAGER 2850.0
30 SALESMAN 1600.0
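The grouping rule can be sketched in Python: the (deptno, job) pair is the group key, and sal only reaches the output through max. The rows below are a small hypothetical sample, not the full ruoze_emp table, and the function name is made up for illustration.

```python
from collections import defaultdict

# Hypothetical (deptno, job, sal) rows.
rows = [
    (10, "CLERK", 1300.0),
    (10, "MANAGER", 2450.0),
    (20, "CLERK", 800.0),
    (20, "CLERK", 1100.0),
    (30, "SALESMAN", 1600.0),
    (30, "SALESMAN", 1250.0),
]

def max_sal_by_dept_job(rows):
    """Mimics: select deptno, job, max(sal) ... group by deptno, job."""
    groups = defaultdict(list)
    for deptno, job, sal in rows:
        groups[(deptno, job)].append(sal)  # the group key is the (deptno, job) pair
    # sal is not a key, so it can only be returned through an aggregate (max).
    return {key: max(sals) for key, sals in sorted(groups.items())}

for (deptno, job), top in max_sal_by_dept_job(rows).items():
    print(deptno, job, top)
```

A plain sal column has no single value per group, which is why it must be aggregated.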

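hive (default)> select deptno,avg(sal) from ruoze_emp group by deptno having avg(sal)>2000; (departments whose average salary is above 2000)
(Replacing having with where here raises an error: where filters individual rows before grouping and cannot use aggregate functions, so a condition on an aggregated value must go in having, which is used together with group by.)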
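The difference between where and having comes down to when the filter runs. This Python sketch marks both points over hypothetical (deptno, sal) rows; it is an illustration of the evaluation order, not how Hive executes the query.

```python
from collections import defaultdict

# Hypothetical (deptno, sal) rows.
rows = [
    (10, 2450.0), (10, 5000.0), (10, 1300.0),
    (30, 1600.0), (30, 1250.0), (30, 950.0),
]

def avg_sal_having(rows, threshold):
    """Mimics: select deptno, avg(sal) ... group by deptno having avg(sal) > threshold."""
    groups = defaultdict(list)
    for deptno, sal in rows:
        # A where clause would filter here, row by row; no average exists yet.
        groups[deptno].append(sal)
    averages = {d: sum(s) / len(s) for d, s in groups.items()}
    # having filters here, after each group's aggregate has been computed.
    return {d: avg for d, avg in averages.items() if avg > threshold}

print(avg_sal_having(rows, 2000))
```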

case when ... then ... else ... end (if-else logic)

hive (default)> select ename, sal, case when sal>1 and sal<=1000 then 'LOWER'  when sal>1000 and sal<=2000 then 'MIDDLE' when sal>2000 and sal<=4000 then 'HIGH' ELSE 'HIGHEST' end from ruoze_emp;
SMITH 800.0 LOWER
ALLEN 1600.0 MIDDLE
WARD 1250.0 MIDDLE
JONES 2975.0 HIGH
HIVE 10300.0 HIGHEST
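The case expression above is an if-elif-else chain. This Python sketch mirrors it branch for branch (the function name is illustrative, and None stands for NULL). It also shows an edge case of the query: the else branch catches not only salaries above 4000 but also salaries of 1 or less and NULL, because a comparison with NULL is never true.

```python
def salary_level(sal):
    """Mirrors the case when expression; None stands for a NULL salary."""
    if sal is not None and 1 < sal <= 1000:
        return "LOWER"
    if sal is not None and 1000 < sal <= 2000:
        return "MIDDLE"
    if sal is not None and 2000 < sal <= 4000:
        return "HIGH"
    # else branch: sal > 4000, but also sal <= 1 or NULL.
    return "HIGHEST"

for name, sal in [("SMITH", 800.0), ("ALLEN", 1600.0), ("JONES", 2975.0), ("HIVE", 10300.0)]:
    print(name, sal, salary_level(sal))
```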

Viewing Hive's built-in functions

hive (default)> show functions; (list Hive's built-in functions)
hive (default)> desc function upper; (show how to use a specific function)
upper(str) - Returns str with all characters changed to uppercase (upper takes a string and converts it to uppercase)
hive (default)> desc function extended upper; (show a more detailed description)

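Data skew

hive (default)> select count(1) from ruoze_emp where deptno=10 union all select count(1) from ruoze_emp where deptno=20;
a = a1 union all a2 (suppose table a is skewed: split it into the skewed part a1 and the non-skewed part a2, process each part separately, then combine the two results with union all)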
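Why is a = a1 union all a2 safe for an aggregation? This Python sketch assumes the split is made by key: the hot key goes to a1 and every other key to a2. Under that assumption no group spans both parts, so concatenating the two partial results (union all) gives the same answer as aggregating a in one pass. The hot key and the data are hypothetical, and the sketch only checks correctness, not the performance benefit.

```python
from collections import Counter

def count_by_key(keys):
    """Stands in for: select key, count(1) from <table> group by key."""
    return Counter(keys)

def count_with_skew_split(keys, hot_keys):
    """a = a1 union all a2: aggregate the skewed part and the rest separately."""
    a1 = [k for k in keys if k in hot_keys]      # skewed part
    a2 = [k for k in keys if k not in hot_keys]  # non-skewed part
    r1, r2 = count_by_key(a1), count_by_key(a2)
    # Because a1 and a2 were split by key, their results never share a key,
    # so simply concatenating them (union all) yields the complete answer.
    return list(r1.items()) + list(r2.items())

# Hypothetical key column in which deptno 10 is heavily over-represented.
keys = [10] * 8 + [20, 20, 30]
combined = dict(count_with_skew_split(keys, hot_keys={10}))
print(combined)                               # {10: 8, 20: 2, 30: 1}
print(combined == dict(count_by_key(keys)))   # True
```

If the rows were split arbitrarily instead of by key, the same key could appear in both parts and the partial counts would need a second aggregation.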

Type conversion functions

cast(value as TYPE)
For example:

hive (default)> select empno,ename,sal,comm,cast(comm as int) from ruoze_emp; (convert the values in the comm column to int)
7369 SMITH 800.0 NULL NULL
7499 ALLEN 1600.0 300.0 300
7521 WARD 1250.0 500.0 500
7566 JONES 2975.0 NULL NULL
(Note: a NULL input stays NULL, and if the conversion fails the result is also NULL)
hive (default)> select cast('5' as int); (convert the string '5' to int)
hive (default)> select current_timestamp;
2018-11-08 22:11:19.285
hive (default)> select cast(current_timestamp as date);
2018-11-08
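The "NULL instead of an error" behavior of cast can be approximated in Python as below. This is a sketch covering only the cases shown above (NULL, whole-number doubles, a numeric string, and a timestamp string); Python's int() and Hive's parser may disagree on edge cases such as '5.5' or strings with surrounding spaces, and the function names are hypothetical.

```python
from datetime import datetime

def cast_to_int(value):
    """Approximates cast(value as int): NULL in -> NULL out, failure -> NULL."""
    if value is None:
        return None
    try:
        return int(value)          # 300.0 -> 300, '5' -> 5
    except (TypeError, ValueError):
        return None                # e.g. 'abc' cannot be converted

def cast_timestamp_to_date(ts):
    """Approximates cast(<timestamp> as date) for 'yyyy-MM-dd HH:mm:ss.SSS' strings."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f").date()

print([cast_to_int(v) for v in [None, 300.0, 500.0, "5", "abc"]])  # [None, 300, 500, 5, None]
print(cast_timestamp_to_date("2018-11-08 22:11:19.285"))           # 2018-11-08
```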

String functions:

1. substr:

hive (default)> select substr('abcdefg',2,3); (take 3 characters starting from the 2nd character; positions are 1-based)
bcd
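The 1-based position is the main thing to remember when coming from 0-based languages. This Python sketch reproduces the example for positive positions only; Hive also accepts negative positions, which this hypothetical helper deliberately rejects, and the empty-string result for a non-positive length is an assumption of the sketch.

```python
def substr(s, pos, length):
    """Sketch of substr(str, pos, len) for pos >= 1 (1-based positions)."""
    if pos < 1:
        raise ValueError("this sketch only supports pos >= 1")
    if length <= 0:
        return ""                    # non-positive length: empty result
    start = pos - 1                  # convert the 1-based position to a 0-based index
    return s[start:start + length]   # slicing stops at the end of the string

print(substr("abcdefg", 2, 3))  # bcd
```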

2. concat_ws:

hive (default)> desc function extended concat_ws;
concat_ws(separator, [string | array(string)]+) - returns the concatenation of the strings separated by the separator.

hive (default)> select concat_ws('.','www',array('facebook','com'));
www.facebook.com
hive (default)> select concat_ws('.','192','168','2','65');
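192.168.2.65            (compare the two calls: the first passes part of its input as an array, this one passes only separate strings; concat_ws accepts both)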
hive (default)> select length('192.168.2.65');
12
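The signature concat_ws(separator, [string | array(string)]+) means every argument after the separator is either a string or an array of strings. This Python sketch (a hypothetical helper, with Python lists standing in for Hive arrays) flattens the arrays and then joins everything with the separator; NULL handling is left out.

```python
def concat_ws(sep, *parts):
    """Sketch of concat_ws: strings and lists of strings, joined by sep."""
    pieces = []
    for part in parts:
        if isinstance(part, list):
            pieces.extend(part)      # array('facebook','com') contributes its elements
        else:
            pieces.append(part)
    return sep.join(pieces)

print(concat_ws(".", "www", ["facebook", "com"]))  # www.facebook.com
ip = concat_ws(".", "192", "168", "2", "65")
print(ip, len(ip))                                 # 192.168.2.65 12
```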

3. split:

hive (default)> select split ("192.168.2.65",'.');
["","","","","","","","","","","","",""]
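hive (default)> select split ("192.168.2.65",'\\.'); (escape the dot: the second argument of split is a regular expression, in which a bare . matches any character, which is why the previous query returned only empty strings)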
["192","168","2","65"]
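Python's re.split treats its pattern as a regular expression in the same way, so it reproduces both results above. In the Hive query the backslash is doubled because Hive string literals also process backslash escapes, so '\\.' reaches the regex engine as \.; in Python a raw string r"\." does the same job. The helper name is hypothetical.

```python
import re

def regex_split(s, pattern):
    """Sketch of split(str, regex): the delimiter is a regular expression."""
    return re.split(pattern, s)

print(regex_split("192.168.2.65", "."))    # 13 empty strings: '.' matches every character
print(regex_split("192.168.2.65", r"\."))  # ['192', '168', '2', '65']
```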

4. explode:

hive (default)> desc function extended explode;
explode(a) - separates the elements of array a into multiple rows, or the elements of a map into multiple rows and columns

Example:

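[hadoop@hadoop001 data]$ vi student.txt (create the data file: id, name, and a colon-separated list of subjects; 化学 chemistry, 物理 physics, 数学 math, 语文 Chinese, 生物 biology, 生理 physiology, 卫生 hygiene, 英语 English, 体育 physical education)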
1,doudou,化学:物理:数学:语文
2,dasheng,化学:数学:生物:生理:卫生
3,rachel,化学:语文:英语:体育:生物
hive (default)> create table ruoze_student(id int, name string, subjects array<string>) row format delimited fields terminated by ',' COLLECTION ITEMS TERMINATED BY ':';
hive (default)> load data local inpath '/home/hadoop/data/student.txt' into table ruoze_student;
hive (default)> select explode(subjects) from ruoze_student;
化学
物理
数学
语文
化学
数学
生物
生理
卫生
化学
语文
英语
体育
生物
hive (default)> select distinct s.sub from (select explode(subjects) as sub from ruoze_student) s; (remove duplicates from the subjects above)
体育
化学
卫生
数学
物理
生物
生理
英语
语文
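explode turns one row holding an array into one row per element, and distinct then removes repeats. This Python sketch replays the three rows of student.txt; the helper name is hypothetical. The result is sorted only to make the output deterministic: without order by, Hive does not guarantee the order of the distinct rows.

```python
# Rows of ruoze_student as loaded from student.txt: (id, name, subjects).
students = [
    (1, "doudou", "化学:物理:数学:语文".split(":")),
    (2, "dasheng", "化学:数学:生物:生理:卫生".split(":")),
    (3, "rachel", "化学:语文:英语:体育:生物".split(":")),
]

def explode_subjects(rows):
    """Like explode(subjects): yields one output row per array element."""
    for _id, _name, subjects in rows:
        for sub in subjects:
            yield sub

exploded = list(explode_subjects(students))
distinct_subjects = sorted(set(exploded))  # select distinct; sorted for stable output
print(len(exploded), len(distinct_subjects))  # 14 9
print(distinct_subjects)
```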

Interview question: word count with Hive

hive (default)> create table ruoze_wc(sentence string);
hive (default)> load data local inpath "/home/hadoop/data/wc.txt" into table ruoze_wc;
hive (default)> select * from ruoze_wc;
hello,world,welcome
hello,welcome
Step 1: split each line into an array of words
hive (default)> select split(sentence,',') from ruoze_wc;
["hello","world","welcome"]
["hello","welcome"]
Step 2: explode each array so that every word is on its own row
hive (default)> select explode(split(sentence,',')) from ruoze_wc;
hello
world
welcome
hello
welcome
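Step 3: count the occurrences of each word
hive (default)> select word, count(1) as c from (select explode(split(sentence,",")) as word from ruoze_wc) t group by word order by c desc; (t is an alias for the subquery; it is never referenced, but Hive requires every subquery in from to have an alias, otherwise the query fails. desc sorts in descending order.)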
welcome 2
hello 2
world 1
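The three steps map directly onto a few lines of Python: split each line, flatten the pieces (explode), count per word (group by with count(1)), and sort by count descending. This is a sketch over the two lines of wc.txt shown above; the function name is hypothetical, and Python's str.split takes a literal separator rather than a regex, which makes no difference for a comma.

```python
from collections import Counter

# The two lines of wc.txt shown above.
sentences = ["hello,world,welcome", "hello,welcome"]

def word_count(lines):
    words = (w for line in lines for w in line.split(","))  # steps 1 and 2: split + explode
    counts = Counter(words)                                  # step 3: group by word, count(1)
    # order by c desc; words with equal counts keep their first-seen order here,
    # whereas Hive makes no promise about the order of ties.
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)

for word, c in word_count(sentences):
    print(word, c)
```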

Hive - Functions