Hive - Functions

Aggregate functions

    max, min, sum, avg, count    (queries that use these aggregate functions launch a MapReduce job)
hive (default)> select count(1) from ruoze_emp where deptno=10;(count the employees in department 10)
hive (default)> select max(sal), min(sal), avg(sal), sum(sal) from ruoze_emp;

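Grouping (group by)

Every column in the select list must either appear in the group by clause or be wrapped in an aggregate function.

hive (default)> select deptno,avg(sal) from ruoze_emp group by deptno;(average salary of each department)
hive (default)> select deptno,job,max(sal) from ruoze_emp group by deptno,job;(highest salary for each department and job)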
10 CLERK 1300.0
10 MANAGER 2450.0
10 PRESIDENT 5000.0
20 ANALYST 3000.0
20 CLERK 1100.0
20 MANAGER 2975.0
30 CLERK 950.0
30 MANAGER 2850.0
30 SALESMAN 1600.0

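hive (default)> select deptno,avg(sal) from ruoze_emp group by deptno having avg(sal)>2000;(departments whose average salary is above 2000) (Replacing having with where raises an error: a condition on an aggregate computed over groups must go in having, which is used together with group by.)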
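
The where/having distinction can be tried out without a Hive cluster. The sketch below is only an analogy: it uses Python's built-in sqlite3 module rather than Hive, and the rows in ROWS are hypothetical sample data, not the real contents of ruoze_emp. It runs the same having query and then checks that the where variant is rejected.

```python
import sqlite3

# Hypothetical sample rows (ename, deptno, sal); not the real ruoze_emp data.
ROWS = [
    ("SMITH", 20, 800.0),
    ("JONES", 20, 2975.0),
    ("ALLEN", 30, 1600.0),
    ("WARD", 30, 1250.0),
    ("CLARK", 10, 2450.0),
    ("KING", 10, 5000.0),
]


def open_db():
    """Create an in-memory SQLite table that stands in for ruoze_emp."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table ruoze_emp (ename text, deptno int, sal real)")
    conn.executemany("insert into ruoze_emp values (?, ?, ?)", ROWS)
    return conn


def depts_above(conn, threshold):
    """having filters whole groups after avg(sal) has been computed."""
    sql = ("select deptno, avg(sal) from ruoze_emp "
           "group by deptno having avg(sal) > ? order by deptno")
    return conn.execute(sql, (threshold,)).fetchall()


def where_with_aggregate_fails(conn):
    """where is evaluated per row, before grouping, so avg() is rejected there."""
    try:
        conn.execute("select deptno, avg(sal) from ruoze_emp "
                     "where avg(sal) > 2000 group by deptno").fetchall()
    except sqlite3.OperationalError:
        return True
    return False


conn = open_db()
print(depts_above(conn, 2000))           # only department 10 averages above 2000 here
print(where_with_aggregate_fails(conn))  # True
```

SQLite and Hive word the error differently, but the cause is the same: where sees individual rows, having sees groups.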

case when then (the SQL form of if-else)

hive (default)> select ename, sal, case when sal>1 and sal<=1000 then 'LOWER'  when sal>1000 and sal<=2000 then 'MIDDLE' when sal>2000 and sal<=4000 then 'HIGH' ELSE 'HIGHEST' end from ruoze_emp;
SMITH 800.0 LOWER
ALLEN 1600.0 MIDDLE
WARD 1250.0 MIDDLE
JONES 2975.0 HIGH
HIVE 10300.0 HIGHEST
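
As a rough illustration of how the case expression is evaluated, here is the same branching written as a plain Python function (salary_band is a hypothetical name used only for this sketch). The branches are tested top to bottom and the first match wins. Note one consequence of the conditions as written: a salary of 1 or less, or a NULL salary, matches no when branch and therefore falls through to else.

```python
def salary_band(sal):
    """Mirror the case expression: the first matching branch wins."""
    if sal is None:
        # In SQL a comparison with NULL is never true, so every when branch
        # is skipped and the else branch is taken.
        return "HIGHEST"
    if 1 < sal <= 1000:
        return "LOWER"
    if 1000 < sal <= 2000:
        return "MIDDLE"
    if 2000 < sal <= 4000:
        return "HIGH"
    return "HIGHEST"


for ename, sal in [("SMITH", 800.0), ("ALLEN", 1600.0),
                   ("JONES", 2975.0), ("HIVE", 10300.0)]:
    print(ename, sal, salary_band(sal))
```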

Viewing Hive's built-in functions

hive (default)> show functions;(list Hive's built-in functions)
hive (default)> desc function upper;(show the usage of a specific function)
upper(str) - Returns str with all characters changed to uppercase (upper takes a string and converts it to uppercase)
hive (default)> desc function extended upper;(show more detail)

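Data skew

Results of separate queries can be combined with union all:
hive (default)> select count(1) from ruoze_emp where deptno=10 union all select count(1) from ruoze_emp where deptno=20;
a = a1 union all a2 (suppose table a is skewed: split it into a skewed part a1 and a non-skewed part a2, process each part separately, then combine the results with union all)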
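
The a = a1 union all a2 idea can be sketched in a few lines of Python. This is not Hive code and says nothing about how Hive schedules the work; it only shows that when the hot keys are separated out, per-key results computed on each part can simply be concatenated. The sample rows and the names split_skewed, count_by_key and union_all are assumptions made for the illustration.

```python
from collections import Counter

# Hypothetical (deptno, ename) rows in which deptno 10 is the skewed "hot" key.
ROWS = [(10, "e%d" % i) for i in range(8)] + [(20, "x"), (20, "y"), (30, "z")]


def split_skewed(rows, hot_keys):
    """Split a into a1 (rows with hot keys) and a2 (all other rows)."""
    a1 = [row for row in rows if row[0] in hot_keys]
    a2 = [row for row in rows if row[0] not in hot_keys]
    return a1, a2


def count_by_key(rows):
    """The per-part work: count rows per key, like count(1) ... group by."""
    return sorted(Counter(key for key, _ in rows).items())


def union_all(*parts):
    """Concatenate results without removing duplicates, like union all."""
    combined = []
    for part in parts:
        combined.extend(part)
    return combined


a1, a2 = split_skewed(ROWS, hot_keys={10})
print(union_all(count_by_key(a1), count_by_key(a2)))
```

Because a1 and a2 share no keys, the concatenated result matches what a single pass over a would produce.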

Type conversion functions

cast(value as TYPE)
For example:

hive (default)> select empno,ename,sal,comm,cast(comm as int) from ruoze_emp;(convert the values in comm to int)
7369 SMITH 800.0 NULL NULL
7499 ALLEN 1600.0 300.0 300
7521 WARD 1250.0 500.0 500
7566 JONES 2975.0 NULL NULL
(Note: if a conversion fails, the result is NULL.)
hive (default)> select cast('5' as int);(convert the string '5' to int)
hive (default)> select current_timestamp;
2018-11-08 22:11:19.285
hive (default)> select cast(current_timestamp as date);
2018-11-08
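
The "NULL on failure" rule is the main thing to remember about cast. A minimal Python approximation is shown below; cast_as_int is a hypothetical helper covering only the cases discussed here (NULL input, numeric input, numeric strings, unparseable strings), with None standing in for NULL. It is not a full model of Hive's conversion rules.

```python
def cast_as_int(value):
    """Approximate cast(value as int): return None instead of raising."""
    if value is None:
        return None  # NULL in, NULL out
    try:
        # For a float this drops the fractional part (300.0 -> 300);
        # for a string it parses an integer literal ('5' -> 5).
        return int(value)
    except (ValueError, TypeError, OverflowError):
        return None  # conversion failed, so the result is NULL


print(cast_as_int(300.0))   # 300
print(cast_as_int("5"))     # 5
print(cast_as_int("abc"))   # None
print(cast_as_int(None))    # None
```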

String functions:

1.substr:

hive (default)> select substr('abcdefg',2,3);(take three characters starting at the second character; positions are counted from 1)
bcd
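
Since substr counts positions from 1 while most programming languages count from 0, the off-by-one is easy to get wrong. The sketch below expresses the same call as a Python slice; the helper name substr is reused only for readability, and the sketch assumes start >= 1.

```python
def substr(s, start, length=None):
    """1-based substring: start is a position counted from 1 (assumes start >= 1)."""
    begin = start - 1  # convert the 1-based position to a 0-based index
    end = None if length is None else begin + length
    return s[begin:end]


print(substr("abcdefg", 2, 3))  # bcd
```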

2.concat_ws:

hive (default)> desc function extended concat_ws;
concat_ws(separator, [string | array(string)]+) - returns the concatenation of the strings separated by the separator.

hive (default)> select concat_ws('.','www',array('facebook','com'));
www.facebook.com
hive (default)> select concat_ws('.','192','168','2','65');
192.168.2.65            (note the difference between passing an array and passing plain strings)
hive (default)> select length('192.168.2.65');
12
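
To make the "string or array" signature concrete, here is a small Python approximation of concat_ws. It is only a sketch of the behaviour seen in the two calls above: each argument after the separator may be a single string or a list of strings, and everything is joined in order.

```python
def concat_ws(separator, *args):
    """Join strings and lists of strings with separator, in argument order."""
    parts = []
    for arg in args:
        if isinstance(arg, (list, tuple)):
            parts.extend(arg)  # an array contributes each of its elements
        else:
            parts.append(arg)  # a plain string contributes itself
    return separator.join(parts)


print(concat_ws(".", "www", ["facebook", "com"]))  # www.facebook.com
print(concat_ws(".", "192", "168", "2", "65"))     # 192.168.2.65
```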

3.split:

hive (default)> select split ("192.168.2.65",'.');
["","","","","","","","","","","","",""]
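hive (default)> select split ("192.168.2.65",'\\.');(escape the dot: split treats the separator as a regular expression, in which an unescaped . matches any character, which is why the previous query returned only empty strings)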
["192","168","2","65"]
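
The two results above are ordinary regular-expression behaviour and can be reproduced with Python's re module, used here purely as an analogy. With the pattern ".", each of the 12 characters is treated as a separator, leaving 13 empty pieces; with an escaped dot, only the literal dots separate the fields.

```python
import re

ip = "192.168.2.65"

# "." is a regex that matches any single character, so every character
# of the input is consumed as a separator.
any_char = re.split(".", ip)

# r"\." matches only a literal dot. In the Hive query the pattern is written
# '\\.' because the string literal itself needs the backslash escaped.
literal_dot = re.split(r"\.", ip)

print(any_char)     # 13 empty strings
print(literal_dot)  # ['192', '168', '2', '65']
```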

4.explode:

hive (default)> desc function extended explode;
explode(a) - separates the elements of array a into multiple rows, or the elements of a map into multiple rows and columns

Example:

[hadoop@hadoop001 data]$ vi student.txt(create the data file)
1,doudou,化学:物理:数学:语文
2,dasheng,化学:数学:生物:生理:卫生
3,rachel,化学:语文:英语:体育:生物
hive (default)> create table ruoze_student(id int, name string, subjects array<string>) row format delimited fields terminated by ',' collection items terminated by ':';
hive (default)> load data local inpath '/home/hadoop/data/student.txt' into table ruoze_student;
hive (default)> select explode(subjects) from ruoze_student;
化学
物理
数学
语文
化学
数学
生物
生理
卫生
化学
语文
英语
体育
生物
hive (default)> select distinct s.sub from (select explode(subjects) as sub from ruoze_student) s;(deduplicate the subjects listed above)
体育
化学
卫生
数学
物理
生物
生理
英语
语文
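
What the table definition and explode do to student.txt can be imitated in plain Python, which may help when reasoning about row counts. This is an illustration, not Hive internals: parse_line mimics "fields terminated by ','" plus "collection items terminated by ':'", and explode turns every array element into its own row. Both function names are chosen for this sketch.

```python
LINES = [
    "1,doudou,化学:物理:数学:语文",
    "2,dasheng,化学:数学:生物:生理:卫生",
    "3,rachel,化学:语文:英语:体育:生物",
]


def parse_line(line):
    """Split fields on ',' and the array column on ':'."""
    student_id, name, subjects = line.split(",")
    return int(student_id), name, subjects.split(":")


def explode(rows):
    """Emit one output row per element of each subjects array."""
    return [sub for _, _, subjects in rows for sub in subjects]


rows = [parse_line(line) for line in LINES]
exploded = explode(rows)
distinct = set(exploded)  # same effect as select distinct on the exploded column

print(len(exploded))  # 14 rows: 4 + 5 + 5
print(len(distinct))  # 9 distinct subjects
```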

Interview question: implement wordcount in Hive

hive (default)> create table ruoze_wc(sentence string);
hive (default)> load data local inpath '/home/hadoop/data/wc.txt' into table ruoze_wc;
hive (default)> select * from ruoze_wc;
hello,world,welcome
hello,welcome
Step 1: split each line into an array of words
hive (default)> select split(sentence,',') from ruoze_wc;
["hello","world","welcome"]
["hello","welcome"]
Step 2: expand each array so that every word is on its own row
hive (default)> select explode(split(sentence,',')) from ruoze_wc;
hello
world
welcome
hello
welcome
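Step 3: count the occurrences of each word
hive (default)> select word, count(1) as c from (select explode(split(sentence,",")) as word from ruoze_wc) t group by word order by c desc; (t is an alias for the subquery; it is never referenced, but Hive reports a syntax error if the subquery has no alias. desc sorts in descending order.)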
welcome 2
hello 2
world 1
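
For comparison, the same three steps (split, explode, group and count) can be written in a few lines of Python. This is only a cross-check of the logic on the two sample lines, not a replacement for the Hive query; word_count is a name chosen for this sketch.

```python
from collections import Counter

SENTENCES = ["hello,world,welcome", "hello,welcome"]


def word_count(sentences):
    """Split each line on ',', flatten the words, then count them."""
    words = [word for sentence in sentences for word in sentence.split(",")]
    # most_common() sorts by count in descending order, like "order by c desc"
    return Counter(words).most_common()


for word, count in word_count(SENTENCES):
    print(word, count)
```

As in the Hive output, the relative order of words with the same count (hello and welcome) is not significant.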

Reposted from blog.csdn.net/qq_42694416/article/details/84317668