Big Data (20): Hive Partitioned Tables, ALTER TABLE Statements, and Data Import/Export

I. Partitioned Tables

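Each partition of a partitioned table corresponds to a separate directory on HDFS, and that directory holds all of the partition's data files. In other words, a Hive partition is simply a subdirectory: a large dataset is split into smaller ones according to business needs. At query time, the expression in the WHERE clause selects only the partitions that are needed, so Hive scans just those directories instead of the whole table, which can make queries much faster.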
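To make the idea concrete, here is a toy Python model (not Hive code). Each partition is represented by a directory name such as `month=201809`, mapped to the rows stored in it. A filter on the partition column then reads only the matching directory. The sample rows and the `prune` helper are made up for illustration.

```python
# Toy model of partition pruning: each key is a partition directory name,
# each value is the list of rows stored in that directory (sample data).
table = {
    "month=201808": [("10", "ACCOUNTING", "1700")],
    "month=201809": [("20", "RESEARCH", "1800"), ("30", "SALES", "1900")],
    "month=201810": [("40", "OPERATIONS", "1700")],
}


def prune(partitions, column, value):
    """Return only the partition directories whose name matches column=value."""
    wanted = f"{column}={value}"
    return {name: rows for name, rows in partitions.items() if name == wanted}


scanned = prune(table, "month", "201809")
print(list(scanned))  # ['month=201809']; the other directories are never read
```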

1. Create a partitioned table

create table dept_partition(
deptno int,
dname string,
loc string
)
partitioned by (month string)
row format delimited fields terminated by '\t';

2. Load data

load data local inpath '/opt/datas/dept.txt' into table dept_partition partition(month='201809');
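Loading into `partition(month='201809')` places the file under a subdirectory named after the partition. The Python sketch below only computes that destination path. It makes several assumptions:

- The warehouse root is the default `/user/hive/warehouse`, the same root used later in this article.
- The table lives in the `default` database.
- Hive's escaping of special characters in partition values is ignored.

The `partition_dir` and `load_target` helpers are hypothetical.

```python
from pathlib import PurePosixPath

WAREHOUSE = "/user/hive/warehouse"  # assumed default warehouse directory


def partition_dir(table, spec):
    """Build the directory a partition maps to, e.g. .../dept_partition/month=201809.

    `spec` is a list of (column, value) pairs in the order the columns were
    declared in PARTITIONED BY; each pair becomes one nested directory level.
    """
    path = PurePosixPath(WAREHOUSE) / table
    for column, value in spec:
        path = path / f"{column}={value}"
    return str(path)


def load_target(table, spec, source_file):
    """Where LOAD DATA would typically place the file (same name, inside the partition dir)."""
    return str(PurePosixPath(partition_dir(table, spec)) / PurePosixPath(source_file).name)


print(load_target("dept_partition", [("month", "201809")], "/opt/datas/dept.txt"))
# /user/hive/warehouse/dept_partition/month=201809/dept.txt
```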

3. Query a single partition

select * from dept_partition where month='201809';
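The `month` column is not stored inside `dept.txt`. When Hive reads a partition, it takes the value from the directory name and returns it as an extra column after the data columns. This Python sketch imitates that behavior for tab-delimited text; the `read_partition` function and the sample lines are illustrative only.

```python
def read_partition(dir_name, lines):
    """Parse tab-delimited lines and append the partition value(s) from dir_name.

    dir_name looks like 'month=201809' or 'month=201810/day=12'.
    """
    partition_values = [part.split("=", 1)[1] for part in dir_name.split("/")]
    rows = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        rows.append(fields + partition_values)  # partition columns come last
    return rows


sample = ["10\tACCOUNTING\t1700\n", "20\tRESEARCH\t1800\n"]
for row in read_partition("month=201809", sample):
    print(row)
# ['10', 'ACCOUNTING', '1700', '201809']
# ['20', 'RESEARCH', '1800', '201809']
```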

4. Query several partitions together

select * from dept_partition where month='201809'
union
select * from dept_partition where month='201808';

5. Add a partition

alter table dept_partition add partition(month='201810');

6. Add several partitions at once (the partition clauses are separated only by spaces, with no other delimiter)

alter table dept_partition add partition(month='201811') partition(month='201812');

7. Drop a partition

alter table dept_partition drop partition(month='201812');

8. Drop several partitions at once (unlike ADD, the partition clauses are separated by commas)

alter table dept_partition drop partition(month='201811'),partition(month='201810');
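ADD and DROP separate multiple partition clauses differently, so a small generator can keep the two from being mixed up. The Python below only builds HiveQL strings and never connects to Hive. The function names are hypothetical.

```python
def partition_spec(spec):
    """Render {'month': '201811'} as partition(month='201811')."""
    pairs = ",".join(f"{col}='{val}'" for col, val in spec.items())
    return f"partition({pairs})"


def add_partitions(table, specs):
    # ADD: several partition(...) clauses separated only by spaces
    return f"alter table {table} add " + " ".join(partition_spec(s) for s in specs) + ";"


def drop_partitions(table, specs):
    # DROP: several partition(...) clauses separated by commas
    return f"alter table {table} drop " + ",".join(partition_spec(s) for s in specs) + ";"


months = [{"month": "201811"}, {"month": "201812"}]
print(add_partitions("dept_partition", months))
# alter table dept_partition add partition(month='201811') partition(month='201812');
print(drop_partitions("dept_partition", months))
# alter table dept_partition drop partition(month='201811'),partition(month='201812');
```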

9. List a table's partitions

show partitions dept_partition;
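`show partitions` prints one line per partition in `column=value` form, with `/` between levels for multi-level partitions. If a script needs those values, a simple parser is enough. This Python sketch makes two assumptions: the output was captured as plain text, and no partition value contains escaped special characters. The parser function is hypothetical.

```python
def parse_show_partitions(output):
    """Turn 'month=201809\nmonth=201810/day=12' into a list of dicts."""
    partitions = []
    for line in output.strip().splitlines():
        spec = dict(level.split("=", 1) for level in line.strip().split("/"))
        partitions.append(spec)
    return partitions


captured = "month=201808\nmonth=201809\n"
print(parse_show_partitions(captured))
# [{'month': '201808'}, {'month': '201809'}]
```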

10. View the structure of a partitioned table

desc formatted dept_partition;

11. Create a table with two levels of partitions

create table dept_partition2(
deptno int,
dname string,
loc string
)
partitioned by(month string,day string)
row format delimited fields terminated by '\t';

12. Load data into a two-level partition

load data local inpath '/opt/datas/dept.txt' into table dept_partition2 partition(month='201809',day='123');

II. Three Ways to Associate Data with a Partition

1. Load the data normally

load data local inpath '/opt/datas/dept.txt' into table dept_partition2 partition(month='201809',day='123');

2. Upload the data, then repair the table

Create the directory and upload the data (from the Hive CLI)

dfs -mkdir -p /user/hive/warehouse/dept_partition2/month=201810/day=12;
dfs -put /opt/datas/dept.txt /user/hive/warehouse/dept_partition2/month=201810/day=12;

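Run the repair command, which registers partition directories that are not yet recorded in the metastore (until then, queries on the new directory return no rows)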

msck repair table dept_partition2;
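Conceptually, `msck repair table` works in three steps:

1. Collect the `column=value` directories under the table location.
2. Compare them with the partitions registered in the metastore.
3. Add the partitions that are missing.

The Python model below mimics this with a temporary directory, and a plain set stands in for the metastore. It is a simplification, not Hive's implementation: it never removes stale partitions and does not validate values. The helper names are hypothetical.

```python
import os
import tempfile


def find_partition_dirs(table_root, partition_cols):
    """Collect value tuples for directories shaped like col1=v1/col2=v2/... ."""
    found = set()
    for dirpath, _dirs, _files in os.walk(table_root):
        rel = os.path.relpath(dirpath, table_root)
        levels = [] if rel == "." else rel.split(os.sep)
        if len(levels) != len(partition_cols):
            continue
        pairs = [level.split("=", 1) for level in levels]
        # accept only names that follow col=value in the declared column order
        if all(len(p) == 2 and p[0] == col for p, col in zip(pairs, partition_cols)):
            found.add(tuple(p[1] for p in pairs))
    return found


def msck_repair(table_root, partition_cols, metastore):
    """Register partitions that exist on disk but are missing from `metastore`."""
    missing = find_partition_dirs(table_root, partition_cols) - metastore
    metastore |= missing  # mutates the caller's set in place
    return sorted(missing)


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "month=201810", "day=12"))  # like dfs -mkdir -p
    metastore = set()  # nothing registered yet
    print(msck_repair(root, ["month", "day"], metastore))  # [('201810', '12')]
    print(msck_repair(root, ["month", "day"], metastore))  # []: already registered
```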

3. Upload the data, then add the partition

dfs -mkdir -p /user/hive/warehouse/dept_partition2/month=201810/day=11;
dfs -put /opt/datas/dept.txt /user/hive/warehouse/dept_partition2/month=201810/day=11;

Add the partition

alter table dept_partition2 add partition(month='201810',day='11');

III. Altering Tables

1. Rename a table

ALTER TABLE table_name RENAME TO new_table_name

2. Change a column

ALTER TABLE table_name CHANGE [COLUMN] col_old_name col_new_name column_type [COMMENT col_comment] [FIRST|AFTER column_name]

3. Add and replace columns

ALTER TABLE table_name ADD|REPLACE COLUMNS (col_name data_type [COMMENT col_comment], ...)

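Note: ADD appends new columns after all existing columns (but before the partition columns); REPLACE replaces the entire set of the table's non-partition columns.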
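Both statements change only the table's metadata; existing data files are not rewritten. The toy Python model below represents the schema as two lists, data columns and partition columns, to show where each statement leaves the columns. The helper names are hypothetical, and the output of `describe` is a simplification of what `desc` prints.

```python
def add_columns(columns, partition_cols, new_cols):
    """ADD COLUMNS: new columns go after the existing data columns;
    partition columns are still listed last."""
    return columns + new_cols, partition_cols


def replace_columns(columns, partition_cols, new_cols):
    """REPLACE COLUMNS: the whole list of data columns is swapped out;
    partition columns are not affected."""
    return list(new_cols), partition_cols


def describe(columns, partition_cols):
    # Column order as a simplified `desc` listing: data columns, then partition columns
    return [name for name, _ in columns + partition_cols]


cols = [("deptno", "int"), ("dname", "string"), ("loc", "string")]
parts = [("month", "string")]

print(describe(*add_columns(cols, parts, [("deptdesc", "string")])))
# ['deptno', 'dname', 'loc', 'deptdesc', 'month']
print(describe(*replace_columns(cols, parts, [("deptno", "string"), ("dname", "string")])))
# ['deptno', 'dname', 'month']
```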

IV. Importing Data

1. Load data into a table

load data [local] inpath '/opt/datas/student.txt' [overwrite] into table student [partition(partcol1=val1,...)];

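load data: loads data
local: load from the local file system into the Hive table; without it, load from HDFS into the Hive table
inpath: the path of the data to load
overwrite: replace the data already in the table; without it, the new data is appended
into table: the table to load into
student: the name of the target table
partition: the partition to load into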
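The list above does not spell out one detail. With `local`, the file is copied from the local file system. Without `local`, the source file on HDFS is moved into the table directory, so it disappears from its original location. The Python sketch below models both behaviors, plus `overwrite`, using ordinary local directories. The `load_data` helper is hypothetical, and it ignores how Hive renames files when names collide.

```python
import os
import shutil
import tempfile


def load_data(src_file, table_dir, local=False, overwrite=False):
    """Model of LOAD DATA: copy for LOCAL, move otherwise; OVERWRITE empties the target first."""
    os.makedirs(table_dir, exist_ok=True)
    if overwrite:
        for name in os.listdir(table_dir):
            os.remove(os.path.join(table_dir, name))
    dest = os.path.join(table_dir, os.path.basename(src_file))
    if local:
        shutil.copy(src_file, dest)  # local source file stays where it was
    else:
        shutil.move(src_file, dest)  # HDFS source file is moved, not copied
    return sorted(os.listdir(table_dir))


with tempfile.TemporaryDirectory() as root:
    src = os.path.join(root, "student.txt")
    with open(src, "w") as f:
        f.write("1001\twangwu\n")
    table_dir = os.path.join(root, "warehouse", "student")
    print(load_data(src, table_dir, local=True))  # ['student.txt']
    print(os.path.exists(src))                    # True: LOCAL copies
    open(os.path.join(table_dir, "old.txt"), "w").close()
    print(load_data(src, table_dir, overwrite=True))  # ['student.txt']
    print(os.path.exists(src))                        # False: non-LOCAL moves
```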

2. Insert data with a query

Basic insert

insert into table student partition(month='201809') values('1001','wangwu');

Basic mode insert (insert the result of a query on a single table)

insert overwrite table student partition(month='201809') select id,name from student where month='201808';
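When a static partition is named, `insert overwrite` replaces only that partition's data, while `insert into` appends to it. Other partitions are left alone in both cases. The toy Python model below keys the table by partition value. The `insert` helper and the sample rows are made up for illustration.

```python
def insert(table, partition, rows, overwrite=False):
    """Model of INSERT INTO / INSERT OVERWRITE TABLE ... PARTITION(...).

    `table` maps a partition value to its list of rows. Only the target
    partition is touched; other partitions keep their data.
    """
    if overwrite:
        table[partition] = list(rows)
    else:
        table.setdefault(partition, []).extend(rows)
    return table


student = {"201808": [("1001", "wangwu")], "201809": [("1002", "lisi")]}
# insert overwrite ... partition(month='201809') select id,name from student where month='201808'
insert(student, "201809", student["201808"], overwrite=True)
print(student)
# {'201808': [('1001', 'wangwu')], '201809': [('1001', 'wangwu')]}
insert(student, "201809", [("1003", "zhaoliu")])  # insert into appends
print(student["201809"])
# [('1001', 'wangwu'), ('1003', 'zhaoliu')]
```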

3. Import data into a given Hive table with IMPORT (the data must have been produced by EXPORT)

import table student2 partition(month='201809') from '/opt/datas/export/student';

V. Exporting Data

1. Export query results to the local file system

insert overwrite local directory '/opt/export/student' select * from student;

2. Export query results to the local file system in a specified format

insert overwrite local directory '/opt/export/student1' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' COLLECTION ITEMS TERMINATED BY '\n' select * from student;

3. Export query results to HDFS (just drop LOCAL)

insert overwrite directory '/opt/export/student1' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' COLLECTION ITEMS TERMINATED BY '\n' select * from student;
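Without a `ROW FORMAT` clause, `insert overwrite [local] directory` writes text files that use Hive's default field delimiter `\001` (Ctrl-A), which is hard to read. The `FIELDS TERMINATED BY '\t'` clause in the statements above switches the delimiter to tabs. This small Python sketch renders the same row both ways. `format_row` is illustrative, and it assumes NULLs are written as Hive's default `\N`.

```python
HIVE_DEFAULT_FIELD_DELIM = "\x01"  # Ctrl-A, used when no ROW FORMAT is given


def format_row(values, field_delim=HIVE_DEFAULT_FIELD_DELIM):
    """Render one row the way a delimited text export lays it out (NULL -> \\N)."""
    return field_delim.join("\\N" if v is None else str(v) for v in values) + "\n"


row = (1001, "wangwu", None)
print(repr(format_row(row)))        # '1001\x01wangwu\x01\\N\n'
print(repr(format_row(row, "\t")))  # '1001\twangwu\t\\N\n'
```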

4. Export to the local file system with a Hadoop command

dfs -get /user/hive/warehouse/student/month=201809/000000_0 /opt/datas/export/student.txt;

5. Export to HDFS with EXPORT

export table default.student to '/opt/datas/export/student';

VI. Truncating Table Data

truncate table student;

Note: TRUNCATE can only delete data from managed tables; it cannot delete data from external tables.
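The restriction follows from ownership. Hive owns the files of a managed table but not those of an external table, so it refuses to truncate the latter. The toy Python model below captures that rule. The `Table` class and `truncate` helper are hypothetical, and the error text paraphrases Hive's message rather than quoting it.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    name: str
    external: bool = False
    files: list = field(default_factory=list)


def truncate(table):
    """Delete all data files of a managed table; refuse for external tables."""
    if table.external:
        raise ValueError(f"cannot truncate non-managed table {table.name}")
    table.files.clear()  # data removed, table definition kept


managed = Table("student", files=["000000_0"])
truncate(managed)
print(managed.files)  # []

try:
    truncate(Table("student_ext", external=True, files=["000000_0"]))
except ValueError as err:
    print(err)  # cannot truncate non-managed table student_ext
```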
