Hive SQL Operations

1. Partitioned Tables

1) Creating a partitioned table

hive> create table dept_partitions()
    > partitioned by()
    > row format
    > delimited fields
    > terminated by '';

Example:

hive> create table dept_partitions(deptno int, dept string, loc string)
    > partitioned by(day string)
    > row format
    > delimited fields
    > terminated by '\t';

hive> load data local inpath '/root/dept.txt' into table dept_partitions
    > partition(day='0228');

2) Querying
Full query
hive> select * from dept_partitions;
Note: this returns the data of the whole partitioned table.

Single-partition query
hive> select * from dept_partitions where day = '0228';
Note: this returns only the data of the specified partition.

Union query
hive> select * from dept_partitions where day = '0228' union select * from dept_partitions where day = '0302';
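When both branches of the union read the same table, the two partitions can also be selected with a single filter on the partition column. The sketch below assumes that partitions day='0228' and day='0302' both exist. Note that it is not strictly identical to the union above: union removes duplicate rows, while the in filter keeps every row, as union all would.

```sql
-- Read two partitions of dept_partitions with one filter on the partition column.
-- Assumption: partitions day='0228' and day='0302' have been created and loaded.
select *
from dept_partitions
where day in ('0228', '0302');
```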

Adding a single partition
hive> alter table dept_partitions add partition(day = '0303');
Note: to add several partitions at once, separate them with spaces.
hive> alter table dept_partitions add partition(day = '0304') partition(day = '0305');

Listing partitions
hive> show partitions dept_partitions;

Dropping a partition
hive> alter table dept_partitions drop partition(day='0305');
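In HDFS, each partition of a partitioned table is stored as a separate directory.

hive> dfs -mkdir -p /user/hive/warehouse/dept_partitions/day=0305;

hive> dfs -put /root/dept.txt /user/hive/warehouse/dept_partitions/day=0305;

hive> show partitions dept_partitions;
At this point day=0305 is not listed, because the directory was created directly in HDFS and is not yet registered in the metastore. The following step is needed.

Importing the data
This amounts to repairing the table's partition metadata:
hive> msck repair table dept_partitions;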
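As an alternative to repairing the whole table, a single manually created directory can be registered explicitly. The following is a minimal sketch that assumes the directory was created at the warehouse path used in the dfs commands above; if the table lives elsewhere, the location has to be adjusted.

```sql
-- Register the manually created HDFS directory as partition day='0305'.
-- "if not exists" avoids an error when the partition is already registered.
alter table dept_partitions add if not exists partition(day='0305')
location '/user/hive/warehouse/dept_partitions/day=0305';

-- The partition should now appear in the list.
show partitions dept_partitions;
```

msck repair is convenient when many directories were added at once; the explicit form touches only the one partition that is named.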

2. DML Data Operations

1) Loading data
hive> load data [local] inpath '' into table ;

2) Inserting data into a table
hive> insert into table student_partitions partition(age = 20) values(1,'re');
Inserting the result of a SQL query into a table
hive> insert overwrite table student_partitions partition(age = 20) select * from hsiehchou where id<3;

Using create (create table as select):
hive> create table if not exists student_partitions1 as select * from student_partitions where id = 2;

3) Creating a table that loads data directly

hive> create table student_partitions3(id int,name string)
    > row format
    > delimited fields
    > terminated by '\t'
    > location '';

Note: the location path is an HDFS path.
The directory associated with the table must not contain nested subdirectories!
Example:

hive> create table student_partitions4(id int,name string)
    > row format
    > delimited fields
    > terminated by '\t'
    > location '/wc';

4) Exporting query results to the local Linux file system
hive> insert overwrite local directory '/root/data' select * from hsiehchou;
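By default the exported files separate columns with Hive's default delimiter (the non-printing \001 character), which is awkward to read on Linux. A minimal sketch of the same export with an explicit tab delimiter follows; it assumes a Hive version that accepts a row format clause on directory exports (0.11 or later). Keep in mind that overwrite replaces whatever is already in the target directory.

```sql
-- Export the query result to a local directory as tab-separated text.
-- Assumption: /root/data may be overwritten.
insert overwrite local directory '/root/data'
row format delimited fields terminated by '\t'
select * from hsiehchou;
```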

5) Exporting Hive table data to HDFS
hive> export table hsiehchou to '/hsiehchou';

Importing HDFS data into Hive
hive> import table hsiehchou3 from '/hsiehchou/';

6) Truncating a table
hive> truncate table hsiehchou3;

3. Query Operations

Basic queries
hive> select * from table; (full-table query)
hive> select hsiehchou.id,hsiehchou.name from table …; (specific columns)

1) Selecting a specific column
hive> select hsiehchou.name from hsiehchou;

2) Selecting a specific column with an alias
hive> select hsiehchou.name as myname from hsiehchou;

3) Creating the employee table

hive> create table hive_db.emptable(empno int, ename string, job string, mgr int, birthday string, sal double, comm double, deptno int)
    > row format
    > delimited fields
    > terminated by '\t';

hive> load data local inpath '/root/emp.txt' into table hive_db.emptable;

4) Query each employee's name and salary (with a raise of 1000 for every employee)
hive> select emptable.ename,emptable.sal+1000 salmoney from emptable;

5) Count the employees in the company
hive> select count(1) empnumber from emptable;

6) Find the highest salary
hive> select max(sal) numberone from emptable;

7) Find the lowest salary
hive> select min(sal) from emptable;

8) Compute the sum of all salaries
hive> select sum(sal) sal_sum from emptable;

9) Compute the average salary of the company's employees
hive> select avg(sal) sal_avg from emptable;
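The aggregate functions shown so far can also be combined in one statement, so that the table is scanned once instead of five times. This is only a sketch; the aliases sal_max and sal_min are names chosen here for illustration and do not appear in the original queries.

```sql
-- Compute all five aggregates over emptable in a single query.
select count(1) empnumber,
       max(sal) sal_max,   -- illustrative alias
       min(sal) sal_min,   -- illustrative alias
       sum(sal) sal_sum,
       avg(sal) sal_avg
from emptable;
```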

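10) Limiting the number of rows returned
hive> select * from emptable limit 4;

11) The where clause
Purpose: filtering
Usage: the where clause comes directly after from

Employees whose salary is greater than 2600
hive> select * from emptable where sal>2600;

Employees whose salary is in the range 1000 to 2500
hive> select * from emptable where sal>1000 and sal<2500;

Or, with between (which also includes the endpoints 1000 and 2500)
hive> select * from emptable where sal between 1000 and 2500;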
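The two range queries above differ at the boundaries: between includes both endpoints, while the > and < form excludes them. The sketch below spells out the comparison that between corresponds to.

```sql
-- Same rows as "sal between 1000 and 2500": both endpoints are included.
select * from emptable where sal >= 1000 and sal <= 2500;

-- Strict range: employees earning exactly 1000 or 2500 are left out.
select * from emptable where sal > 1000 and sal < 2500;
```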

Employees whose salary is either 2000 or 3000
hive> select ename from emptable where sal in(2000,3000);

12) is null and is not null
Filtering on null and non-null values
Null
hive> select * from emptable where comm is null;

Not null
hive> select * from emptable where comm is not null;

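13) like
Pattern matching
Usage:
The wildcard % matches zero or more characters
_ matches exactly one character

Employees whose salary starts with 1
hive> select * from emptable where sal like '1%';

Employees whose salary has 1 as its second digit
hive> select * from emptable where sal like '_1%';

Employees whose salary contains a 5
hive> select * from emptable where sal like '%5%';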
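Because sal is declared as double, the like queries above rely on Hive converting the number to a string before matching. A sketch that makes the conversion explicit is shown below. It is worth inspecting the converted values first, since the string form of a double may carry a decimal part (for example a trailing .0), which affects patterns that depend on the end of the value.

```sql
-- Inspect how the salaries look once converted to strings.
select sal, cast(sal as string) sal_text from emptable limit 4;

-- Explicit form of "salary starts with 1".
select * from emptable where cast(sal as string) like '1%';
```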

14) And/Not/Or
Employees in department 30 whose salary is greater than 1000
hive> select * from emptable where sal>1000 and deptno=30;

Employees who are in department 30 or whose salary is greater than 1000
hive> select * from emptable where sal>1000 or deptno=30;

Employees whose salary is either 2000 or 3000
hive> select * from emptable where sal in(2000,3000);

Employees whose salary is neither 2000 nor 3000
hive> select * from emptable where sal not in(2000,3000);

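15) Grouping
The group by clause
It is usually used together with aggregate functions.
Average salary of each department
hive> select avg(sal) avg_sal,deptno from emptable group by deptno;
having
An aggregate function cannot be used in a where clause, but it can be used in a having clause.

Departments whose average salary is greater than 2000
hive> select deptno,avg(sal) avg_sal from emptable group by deptno having avg_sal>2000;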
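where and having can appear in the same query, which makes the difference between them visible: where filters individual rows before grouping, and having filters the groups afterwards. The following sketch uses an arbitrary row filter (sal > 1000) purely for illustration.

```sql
-- Step 1 (where): drop rows with sal <= 1000 before grouping.
-- Step 2 (group by): average the remaining salaries per department.
-- Step 3 (having): keep only departments whose average exceeds 2000.
select deptno, avg(sal) avg_sal
from emptable
where sal > 1000
group by deptno
having avg(sal) > 2000;
```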

4. Join Operations


  
  
   
hive> create table dept(deptno int, dname string, loc int)
    > row format
    > delimited fields
    > terminated by '\t';

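The employee table contains only the department number, not the department name.
The department table contains both the department number and the department name.

Equi-join
1) Query the employee number, employee name, and the name of the employee's department
hive> select emptable.empno,emptable.ename,dept.dname from emptable join dept on emptable.deptno=dept.deptno;

2) Query the employee number, employee name, department name, and department location
Inner join: only rows that have matching data in both joined tables are kept.
hive> select e.empno,e.ename,d.dname,d.loc from emptable e join dept d on e.deptno=d.deptno;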

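3) Left outer join (left join)
Query the employee number, employee name, and department name
hive> select e.empno,e.ename,d.dname from emptable e left join dept d on e.deptno=d.deptno;
Characteristics: left join is short for left outer join, so the outer keyword can be omitted.
All rows of the left table are kept; where the right table has no match, its columns are shown as null.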
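The null values produced by a left join can be used to find rows that have no partner in the other table. As a sketch, the query below lists employees whose deptno does not exist in dept; it assumes dept.deptno itself is never null in the data.

```sql
-- Employees without a matching department:
-- after the left join, unmatched rows have null in every dept column.
select e.empno, e.ename, e.deptno
from emptable e
left join dept d on e.deptno = d.deptno
where d.deptno is null;
```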

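4) Right outer join (right join)
hive> select e.empno,e.ename,d.dname from emptable e right join dept d on e.deptno=d.deptno;

Characteristics:
All rows of the right table are kept; where the left table has no match, its columns are shown as null.

5) Full outer join (full join)
hive> select e.empno,e.ename,d.dname from emptable e full join dept d on e.deptno=d.deptno;
Characteristics: all rows from both tables are returned; where one side has no matching row, its columns are filled with null.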
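In a full join the department number may be null on either side, depending on which table lacked a match. One way to show a single, always-filled department number is coalesce, which returns its first non-null argument. This is a sketch for illustration, not part of the original queries.

```sql
-- Take deptno from whichever side of the full join has a value.
select coalesce(e.deptno, d.deptno) deptno, e.empno, e.ename, d.dname
from emptable e
full join dept d on e.deptno = d.deptno;
```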

6) Joining multiple tables

hive> create table location(loc int, loc_name string)
    > row format
    > delimited fields
    > terminated by '\t';

Loading the data
hive> load data local inpath '/root/location.txt' into table location;

Query the employee name, department name, and location name
hive> select e.ename,d.dname,l.loc_name from emptable e join dept d on e.deptno=d.deptno join location l on d.loc=l.loc;
