Summary of Common Hive Statements

Common Hive statements and UDFs
   Basic statements
       -> Querying columns
       -> where, limit, distinct
       -> Find the employees in department 30

select empno,ename,deptno from emp where deptno='30';

-> Show the first 3 records

select * from emp limit 3;

-> List the departments that currently exist

select distinct deptno from emp;

-> between and, >, <, =, is null, is not null, in, not in
-> Find employees whose employee number is greater than 7500

select * from emp where empno > 7500;

-> Find employees whose salary is between 2000 and 3000 (both bounds inclusive)

select * from emp where sal between 2000 and 3000;

-> Find employees whose commission is not null

select * from emp where comm is not null;
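
The operator list above also names in, not in, and is null, which have no example of their own. A minimal sketch follows, assuming the same emp table; the department numbers 10 and 20 are illustrative values, not taken from the original data.

```sql
-- Employees in department 10 or 20 (illustrative values)
select * from emp where deptno in (10, 20);

-- Employees in any department other than 10 or 20
select * from emp where deptno not in (10, 20);

-- Employees with no commission recorded
select * from emp where comm is null;
```

Rows whose deptno is null match neither the in nor the not in query, because a comparison with null does not evaluate to true.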

-> Aggregate functions: max, min, avg, count, sum

select count(1) cnt from emp;
select max(sal) max_sal from emp;
select avg(sal) avg_sal from emp;
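
The statements above cover count, max, and avg. min and sum follow the same pattern; a short sketch, assuming the same emp table and with min_sal and sum_sal as illustrative aliases:

```sql
-- Lowest salary and total salary over the whole table
select min(sal) min_sal, sum(sal) sum_sal from emp;
```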

-> group by, having
-> Compute the average salary of each department

select deptno,avg(sal) from emp group by deptno;

-> Find the departments whose average salary is greater than 2000

select deptno, avg(sal) avg_sal from emp group by deptno having avg(sal) > 2000;

-> join

- Equi-join (inner join): only rows whose join value exists on both sides are joined

select a.empno,a.ename,a.sal,b.deptno,b.dname from emp a inner join dept b on a.deptno = b.deptno;

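- left join: the left table is the base; all of its rows are kept, and columns from the right table are null where there is no match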

select a.empno,a.ename,a.sal,b.deptno,b.dname from emp a left join dept b on a.deptno = b.deptno;

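- right join: the right table is the base; all of its rows are kept, and columns from the left table are null where there is no match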

select a.empno,a.ename,a.sal,b.deptno,b.dname from emp a right join dept b on a.deptno = b.deptno;

- full join: all rows from both tables are kept, with null filled in on whichever side has no match

select a.empno,a.ename,a.sal,b.deptno,b.dname from emp a full join dept b on a.deptno = b.deptno;

4. The four kinds of sorting in Hive
-> order by: sorts the whole result set globally on a column

select empno,ename,deptno,sal from emp order by sal desc;

-> sort by: sorts within each reducer; with only one reducer it is equivalent to order by

 set mapreduce.job.reduces=2;
 insert overwrite local directory '/opt/datas/sort' select empno,ename,deptno,sal from emp sort by sal desc;

-> distribute by: partitions the data by a field and hands each partition to a different reducer; usually used together with sort by, and must be placed before sort by

 insert overwrite local directory '/opt/datas/distribute' select empno,ename,deptno,sal from emp distribute by empno sort by sal desc;

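-> cluster by: when distribute by and sort by use the same field, cluster by can replace them. cluster by always sorts in ascending order; asc or desc cannot be specified.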
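
The cluster by case has no statement of its own, so here is a minimal sketch that follows the pattern of the earlier examples. It assumes the same emp table and two reducers; the choice of empno as the clustering field is illustrative.

```sql
set mapreduce.job.reduces=2;

-- Distribute rows by empno and sort each reducer's rows by empno (ascending)
select empno,ename,deptno,sal from emp cluster by empno;

-- Equivalent long form
select empno,ename,deptno,sal from emp distribute by empno sort by empno;
```

If descending order is needed within each reducer, the long form with distribute by and sort by ... desc is the one to use.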