[Hive] Hive Partitioned Tables

I. What?

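Each partition of a partitioned table corresponds to its own directory on HDFS, and that directory holds all of the partition's data files. In other words, partitioning in Hive means splitting a table into directories: a large dataset is divided into smaller ones according to business needs (for example, by date). At query time, an expression in the WHERE clause selects only the partitions the query needs, so Hive reads just those directories, which can make queries much faster.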
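Here is a minimal Python sketch of the directory layout described above. It assumes the default warehouse location /user/hive/warehouse (set by hive.metastore.warehouse.dir, which may differ on your cluster) and a table in the default database. The helper partition_path is hypothetical. It also ignores Hive's escaping of special characters in partition values.

```python
from pathlib import PurePosixPath

# Assumption: default warehouse location (hive.metastore.warehouse.dir) and the default database.
WAREHOUSE = PurePosixPath("/user/hive/warehouse")


def partition_path(table: str, spec: dict) -> str:
    """Return the directory a partition spec maps to, e.g. .../dept_partition/day=20200401."""
    path = WAREHOUSE / table
    for column, value in spec.items():  # keys are expected in PARTITIONED BY order
        path = path / f"{column}={value}"
    return str(path)


print(partition_path("dept_partition", {"day": "20200401"}))
# /user/hive/warehouse/dept_partition/day=20200401
```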

II. Why?

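Hive tables often hold massive amounts of data. Without partitioning, even a query that needs only a small slice (for example, one day of records) has to scan the whole table, which wastes I/O and runs slowly. Partitioning stores the table's data separately in multiple directories. Queries that filter on the partition column can then read only the relevant directories and avoid a full table scan.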

III. How?

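1. Create a partitioned table
-- the partition column "day" is declared only in PARTITIONED BY, not in the column list
create table dept_partition(deptno int, dname string)
partitioned by (day string)
row format delimited fields terminated by '\t';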
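The partition column day is not stored in the data files. Hive derives its value from the directory name and appends it after the regular columns in query results. Below is a minimal sketch of that idea, assuming tab-delimited files as declared above. The functions parse_partition_spec and read_row are hypothetical helpers, not Hive APIs.

```python
def parse_partition_spec(relative_dir: str) -> dict:
    """Turn 'day=20200401' (or 'day=20200401/hour=12') into {'day': '20200401', ...}."""
    spec = {}
    for segment in relative_dir.strip("/").split("/"):
        column, _, value = segment.partition("=")
        spec[column] = value
    return spec


def read_row(line: str, relative_dir: str) -> list:
    """Split a tab-delimited line, then append the partition value(s) taken from the path."""
    fields = line.rstrip("\n").split("\t")  # FIELDS TERMINATED BY '\t'
    return fields + list(parse_partition_spec(relative_dir).values())


print(read_row("10\tACCOUNTING\n", "day=20200401"))
# ['10', 'ACCOUNTING', '20200401']
```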

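2. Load data
-- LOCAL reads the file from the local file system and copies it into the day=20200401 partition
load data local inpath '/opt/module/hive/datas/dept_20200401.log' into table dept_partition partition(day='20200401');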
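With LOCAL, LOAD DATA copies the file from the local file system into the partition's directory. Without LOCAL, a file that is already on HDFS is moved instead. The sketch below imitates the LOCAL case, using a temporary directory as a stand-in for the warehouse. The function load_local_file is hypothetical and does not update any metastore, and the file contents are made up.

```python
import shutil
import tempfile
from pathlib import Path


def load_local_file(warehouse: Path, table: str, spec: dict, src: Path) -> Path:
    """Copy src into the table's partition directory, creating the directory if needed."""
    target_dir = warehouse / table
    for column, value in spec.items():
        target_dir = target_dir / f"{column}={value}"
    target_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy(src, target_dir))  # a copy, so the local source file stays


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    src = root / "dept_20200401.log"  # hypothetical contents
    src.write_text("10\tACCOUNTING\n20\tRESEARCH\n")
    loaded = load_local_file(root / "warehouse", "dept_partition", {"day": "20200401"}, src)
    print(loaded.relative_to(root).as_posix())
    # warehouse/dept_partition/day=20200401/dept_20200401.log
    print(src.exists())  # True
```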

3. Query data in a partitioned table

Single-partition query

select * from dept_partition where day='20200401';
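Because each partition is a directory, the filter day='20200401' lets Hive skip every other directory (partition pruning). Here is a toy sketch using a hypothetical in-memory table that maps partition values to rows. It returns the matching rows plus the list of directories it had to read.

```python
# Hypothetical table: partition value (directory day=...) -> rows stored there.
dept_partition = {
    "20200401": [(10, "ACCOUNTING"), (20, "RESEARCH")],
    "20200402": [(30, "SALES")],
    "20200403": [(40, "OPERATIONS")],
}


def select_where_day(table: dict, day: str):
    """Evaluate WHERE day = <day> with pruning; return (rows, directories_read)."""
    # Only directory names are compared; rows of non-matching partitions are never touched.
    directories_read = [d for d in table if d == day]
    rows = [row + (d,) for d in directories_read for row in table[d]]
    return rows, directories_read


rows, read = select_where_day(dept_partition, "20200401")
print(rows)  # [(10, 'ACCOUNTING', '20200401'), (20, 'RESEARCH', '20200401')]
print(read)  # ['20200401']
```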

Multi-partition union query

-- UNION (UNION DISTINCT since Hive 1.2.0) also drops duplicate rows; use UNION ALL to keep them
select * from dept_partition where day='20200401'
union
select * from dept_partition where day='20200402'
union
select * from dept_partition where day='20200403';
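In Hive 1.2.0 and later, plain UNION means UNION DISTINCT. The query above therefore also removes duplicate rows inside a partition. UNION ALL would keep them, and so would a single scan with where day in ('20200401','20200402','20200403'). The sketch below contrasts the two semantics on hypothetical data that contains one duplicated row. Its output order is the sketch's own, since Hive does not guarantee row order.

```python
# Hypothetical data: partition 20200401 holds the same row twice.
partitions = {
    "20200401": [(10, "ACCOUNTING"), (10, "ACCOUNTING")],
    "20200402": [(20, "RESEARCH")],
    "20200403": [(30, "SALES")],
}
days = ["20200401", "20200402", "20200403"]


def scan(day: str) -> list:
    """SELECT * FROM one partition: its rows plus the partition column."""
    return [row + (day,) for row in partitions.get(day, [])]


# UNION ALL (or WHERE day IN (...)): keep every row.
union_all_rows = [row for day in days for row in scan(day)]
# UNION (= UNION DISTINCT): same rows with duplicates removed (first occurrence kept).
union_rows = list(dict.fromkeys(union_all_rows))

print(len(union_all_rows), len(union_rows))  # 4 3
```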

4. Add partitions
-- separate multiple partition specs with spaces, not commas
alter table dept_partition add partition(day='20200405') partition(day='20200406');
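ADD PARTITION registers new, empty partitions. Adding one that already exists fails unless the statement uses IF NOT EXISTS. Below is a toy sketch of that bookkeeping. The PartitionedTable class is hypothetical. Its rejection of the whole call when any listed partition already exists is a simplification, not documented Hive behavior.

```python
class PartitionedTable:
    """Toy model of a table's partition list (hypothetical; metadata only, no files)."""

    def __init__(self, name: str):
        self.name = name
        self.partitions = {}  # day value -> data files; a new partition starts empty

    def add_partitions(self, *days: str, if_not_exists: bool = False) -> None:
        """Like ADD [IF NOT EXISTS] PARTITION (day=...) PARTITION (day=...)."""
        existing = [d for d in days if d in self.partitions]
        if existing and not if_not_exists:
            # Simplification: this sketch rejects the whole call.
            raise ValueError(f"{self.name}: partition(s) already exist: {existing}")
        for day in days:
            self.partitions.setdefault(day, [])


t = PartitionedTable("dept_partition")
t.add_partitions("20200405", "20200406")
t.add_partitions("20200406", "20200407", if_not_exists=True)
print(sorted(t.partitions))  # ['20200405', '20200406', '20200407']
```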

5. Create a two-level partitioned table
create table dept_partition2(deptno int, dname string, loc string)
partitioned by (day string, hour string)
row format delimited fields terminated by '\t';
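With two partition columns, each hour directory is nested inside its day directory, for example dept_partition2/day=20200401/hour=12. A filter on day alone prunes down to that day's hour directories, and a filter on both columns narrows it to one. The sketch uses hypothetical partition values, and partition_dir and prune are illustrative helpers.

```python
from itertools import product

TABLE = "dept_partition2"


def partition_dir(day: str, hour: str) -> str:
    """Second-level directories nest inside first-level ones."""
    return f"{TABLE}/day={day}/hour={hour}"


# Hypothetical partitions: two days x two hours.
all_dirs = [partition_dir(d, h) for d, h in product(["20200401", "20200402"], ["12", "13"])]


def prune(dirs: list, day: str, hour=None) -> list:
    """Directories read for WHERE day = ... [AND hour = ...]."""
    def matches(path: str) -> bool:
        _, day_part, hour_part = path.split("/")
        return day_part == f"day={day}" and (hour is None or hour_part == f"hour={hour}")
    return [d for d in dirs if matches(d)]


print(prune(all_dirs, "20200401"))
# ['dept_partition2/day=20200401/hour=12', 'dept_partition2/day=20200401/hour=13']
print(prune(all_dirs, "20200402", "13"))
# ['dept_partition2/day=20200402/hour=13']
```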

6. Dynamic partitioning
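-- nonstrict lets every partition column be dynamic (strict requires at least one static partition)
set hive.exec.dynamic.partition.mode = nonstrict;
-- assumes dept_partition_dy exists and is partitioned by loc; loc's value comes from the last SELECT column
insert into table dept_partition_dy partition(loc) select deptno, dname, loc from dept;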
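With dynamic partitioning, Hive takes each row's partition value from the last column(s) of the SELECT. It then routes the row to the matching partition directory, creating partitions as needed. The sketch below imitates that routing on hypothetical dept rows (deptno, dname, loc). It ignores limits such as hive.exec.max.dynamic.partitions, and dynamic_insert is an illustrative helper, not a Hive API.

```python
from collections import defaultdict


def dynamic_insert(rows: list) -> dict:
    """Route each SELECT row to the partition named by its last column (loc)."""
    partitions = defaultdict(list)
    for *values, loc in rows:  # the dynamic partition column comes last
        partitions[f"loc={loc}"].append(tuple(values))  # loc itself is not stored in the file
    return dict(partitions)


# Hypothetical rows from: select deptno, dname, loc from dept
dept = [(10, "ACCOUNTING", 1700), (20, "RESEARCH", 1800), (30, "SALES", 1900), (40, "OPERATIONS", 1700)]
for directory, stored in dynamic_insert(dept).items():
    print(directory, stored)
# loc=1700 [(10, 'ACCOUNTING'), (40, 'OPERATIONS')]
# loc=1800 [(20, 'RESEARCH')]
# loc=1900 [(30, 'SALES')]
```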