Summary of common Hive statements

Common Hive statements and UDFs
    Basic statements
        -> Selecting columns
        -> where, limit, distinct
        -> Find the employees whose department number is 30

select empno,ename,deptno from emp where deptno='30';

        -> Show the first 3 records

select * from emp limit 3;

        -> List the departments that currently exist

select distinct deptno from emp;

        -> between and, > < =, is null, is not null, in, not in
        -> Find employees whose employee number is greater than 7500

select * from emp where empno > 7500;

        -> Find employees whose salary is between 2000 and 3000 (between includes both endpoints)

select * from emp where sal between 2000 and 3000;

        -> Find employees whose commission (comm) is not null

select * from emp where comm is not null;
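The operator list above also names in, not in and is null, which have no example yet. A minimal sketch follows, assuming the same emp table; the department numbers 10 and 20 are illustrative values, not taken from the original data.

```sql
-- Employees in department 10 or 20 (illustrative department numbers).
select empno,ename,deptno from emp where deptno in (10, 20);

-- Employees outside those two departments.
select empno,ename,deptno from emp where deptno not in (10, 20);

-- Employees with no commission recorded.
select * from emp where comm is null;
```

Rows whose deptno is null match neither the in nor the not in query, because a comparison with null never evaluates to true.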

        -> Aggregate functions: max, min, avg, count, sum

select count(1) cnt from emp;
select max(sal) max_sal from emp;
select avg(sal) avg_sal from emp;
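The remaining aggregates from the list, min and sum, follow the same pattern. A short sketch on the same emp table; the aliases min_sal, sum_sal and comm_cnt are names chosen here for illustration.

```sql
-- Lowest salary in the table.
select min(sal) min_sal from emp;

-- Total of all salaries.
select sum(sal) sum_sal from emp;

-- count(column) skips nulls, so this counts only employees with a commission.
select count(comm) comm_cnt from emp;
```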

        -> group by, having
            -> Compute the average salary of each department

select deptno,avg(sal) from emp group by deptno;

            -> Find the departments whose average salary is greater than 2000

select deptno, avg(sal) avg_sal from emp group by deptno having avg_sal > 2000;
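The having clause above filters on the select alias. Not every SQL engine accepts an alias there, so a more portable sketch repeats the aggregate expression. The second query shows how where and having combine: where filters rows before grouping, having filters groups afterwards. The threshold 1000 is an illustrative value.

```sql
-- Same result, without relying on alias resolution in having.
select deptno, avg(sal) avg_sal from emp group by deptno having avg(sal) > 2000;

-- Ignore salaries of 1000 or less first, then keep departments averaging above 2000.
select deptno, avg(sal) avg_sal
from emp
where sal > 1000
group by deptno
having avg(sal) > 2000;
```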

        -> join

            - Equi-join (inner join): only rows whose join key appears in both tables are joined

select a.empno,a.ename,a.sal,b.deptno,b.dname from emp a inner join dept b on a.deptno = b.deptno;

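            - left join: the left table is the baseline; every left-table row is kept, and right-table columns are null where nothing matches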

select a.empno,a.ename,a.sal,b.deptno,b.dname from emp a left join dept b on a.deptno = b.deptno;

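            - right join: the right table is the baseline; every right-table row is kept, and left-table columns are null where nothing matches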

select a.empno,a.ename,a.sal,b.deptno,b.dname from emp a right join dept b on a.deptno = b.deptno;

            - full join: all rows from both tables are kept, with null on whichever side has no match

select a.empno,a.ename,a.sal,b.deptno,b.dname from emp a full join dept b on a.deptno = b.deptno;

4. The four kinds of sorting in Hive
    -> order by: global sort on a column

select empno,ename,deptno,sal from emp order by sal desc;

    -> sort by: sorts the rows inside each reducer; with only one reducer it is equivalent to order by

 set mapreduce.job.reduces=2;
 insert overwrite local directory '/opt/datas/sort' select empno,ename,deptno,sal from emp sort by sal desc;

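    -> distribute by: partitions the data by a column so that rows with the same value go to the same reducer; usually used together with sort by, and it must be written before sort by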

 insert overwrite local directory '/opt/datas/distribute' select empno,ename,deptno,sal from emp distribute by empno sort by sal desc;

    -> cluster by: when distribute by and sort by use the same column, cluster by can replace the pair
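
cluster by is the only one of the four without an example, so here is a minimal sketch in the style of the statements above. The output directory '/opt/datas/cluster' is an assumed path modeled on the earlier ones, and empno is picked only for illustration.

```sql
-- Use two reducers so the partitioning is visible in the output files.
set mapreduce.job.reduces=2;

-- Distribute rows by empno and sort each reducer's rows by empno.
insert overwrite local directory '/opt/datas/cluster' select empno,ename,deptno,sal from emp cluster by empno;

-- The equivalent long form.
insert overwrite local directory '/opt/datas/cluster' select empno,ename,deptno,sal from emp distribute by empno sort by empno;
```

As far as I know, cluster by does not accept asc or desc and sorts in ascending order, so a descending sort still needs the explicit distribute by ... sort by ... desc form.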

Reposted from blog.csdn.net/dengwenqi123/article/details/81702784