Ways to import and export data in Hive

I. Importing data (5 methods)

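Loading data into a table with load (a load from HDFS moves the source file into the table's directory, like a cut; a load from the local file system copies it)
hive> load data [local] inpath '/opt/module/datas/student.txt' [overwrite] into table student [partition (partcol1=val1,…)];
(1) load data: load data into a table
(2) local: load from the local file system into the Hive table; without it, the path is read from HDFS
(3) inpath: the path of the data to load
(4) overwrite: replace the data already in the table; without it, the new data is appended
(5) into table: introduces the table to load into
(6) student: the target table
(7) partition: load into the specified partition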
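As a minimal sketch of how the optional clauses combine, the statements below show a local load, an HDFS load with overwrite, and a load into one partition. The HDFS path and the partitioned table student_by_month are hypothetical names introduced only for illustration.

```sql
-- Copy a local file into the table and append to the existing data.
load data local inpath '/opt/module/datas/student.txt' into table student;

-- Move a file that is already on HDFS into the table and replace the existing data.
-- '/user/hive/datas/student.txt' is a hypothetical HDFS path.
load data inpath '/user/hive/datas/student.txt' overwrite into table student;

-- Load into one partition. student_by_month is a hypothetical table
-- assumed to be declared with: partitioned by (month string).
load data local inpath '/opt/module/datas/student.txt'
into table student_by_month partition (month = '201709');
```

After the second statement the source file no longer exists at its original HDFS path, because Hive moves it rather than copying it.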
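Inserting data into a table with a query
create table stu (id int, name string, ……) partitioned by (month string);
insert into table stu select * from stu2; (the columns selected from stu2 must match the columns of stu in number and order)
insert into table stu values (id1, name1, ……), (id2, name2, ……); (values is accepted only with insert into, not with insert overwrite)
Multi-insert mode (the source table is named once in a leading from clause and scanned once; each insert names its own target partition, and the second partition value here is only illustrative):
from stu1
insert overwrite table stu partition(month='201709') select id,name where month='201709'
insert overwrite table stu partition(month='201710') select id,name where month='201710';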
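Because stu is partitioned, it may help to see the two ways a partition value can be supplied on insert. This is a sketch under assumed schemas: stu has exactly the columns id and name plus the partition column month, and stu2 has id, name and month as ordinary columns.

```sql
-- Assumed (hypothetical) schemas:
--   stu (id int, name string) partitioned by (month string)
--   stu2(id int, name string, month string)

-- Static partition: the partition value is fixed in the statement,
-- so the select list contains only the non-partition columns.
insert into table stu partition (month = '201709')
select id, name from stu2 where month = '201709';

-- Dynamic partition: Hive takes the partition value from the last
-- selected column, so month must come last in the select list.
set hive.exec.dynamic.partition = true;
set hive.exec.dynamic.partition.mode = nonstrict;
insert into table stu partition (month)
select id, name, month from stu2;
```

The static form is safer when only one partition is meant to change. The dynamic form writes every month value that the query returns.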
Creating a table and loading it from a query (as select)
create table if not exists stu1 as select * from stu;
Pointing a table at existing HDFS data with location when creating it
create external table stu(id int,name string) location '/student';
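The location approach can be sketched a little more fully as follows. The table name stu_ext and the tab delimiter are assumptions made for illustration. The files under '/student' are assumed to be tab-separated text.

```sql
-- stu_ext is a hypothetical name; the tab delimiter is an assumption
-- about how the files under /student are laid out.
create external table if not exists stu_ext (id int, name string)
row format delimited fields terminated by '\t'
location '/student';

-- Files already under /student are visible immediately, with no load step.
select * from stu_ext limit 10;

-- Dropping an external table removes only the metadata;
-- the files under /student stay on HDFS.
drop table stu_ext;
```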
Importing data into a Hive table with import (the data must first have been exported with export; after the import, the exported data is still on HDFS)
import table stu partition(month='201709') from '/user/hive/warehouse/export/stu';
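A round trip through export and import might look like the sketch below. It assumes the export directory does not exist yet and that no table called stu_copy exists (stu_copy is a hypothetical name).

```sql
-- Write the table's metadata and data files to an HDFS directory.
export table default.stu to '/user/hive/warehouse/export/stu';

-- Recreate the table under a new, hypothetical name from that directory.
import table stu_copy from '/user/hive/warehouse/export/stu';

-- Check what arrived.
show partitions stu_copy;
select * from stu_copy limit 10;
```

When moving between clusters, the exported directory has to be copied to the target cluster's HDFS before the import statement is run there.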

II. Exporting data (5 methods)

Exporting to HDFS with export (mainly used to migrate Hive tables between two clusters; export writes out the table's metadata as well as its data)
export table default.stu to '/user/hive/warehouse/export/stu';
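Exporting with insert (the target path is treated as a directory and its existing contents are replaced, so '/opt/module/datas/stu.txt' ends up as a directory holding files such as 000000_0)
Export query results to the local file system: insert overwrite local directory '/opt/module/datas/stu.txt' select * from stu;
Export formatted query results to the local file system: insert overwrite local directory '/opt/module/datas/stu.txt' row format delimited fields terminated by '\t' select * from stu;
Export formatted query results to HDFS (no local): insert overwrite directory '/user/hive/warehouse/stu' row format delimited fields terminated by '\t' select * from stu;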
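To make the directory behaviour concrete, here is a sketch that exports one month of stu as tab-separated text. The target directory is a hypothetical path, and stu is assumed to have the columns id and name and the partition column month.

```sql
-- '/opt/module/datas/export/stu' is a hypothetical local directory.
-- Anything already in it is deleted before the result files are written.
insert overwrite local directory '/opt/module/datas/export/stu'
row format delimited fields terminated by '\t'
select id, name from stu where month = '201709';
```

Because overwrite clears the directory first, it is safer to point it at a dedicated export directory than at one that holds other files.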
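Downloading to the local file system with a Hadoop command (run from the Hive CLI)
hive> dfs -get /user/hive/warehouse/stu/000000_0 /opt/module/datas/stu.txt; (the first path is the file on HDFS, the second is the local destination)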
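Exporting with the hive shell command
bin/hive -e 'select * from stu;' > /opt/module/datas/stu.txt
Exporting with Sqoop (Sqoop's export tool copies data from HDFS into a relational database table)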

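III. Clearing the data in a table (managed tables only)
truncate table table_name; (truncate removes the data but keeps the table definition; it cannot be used on external tables)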
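As a short sketch, truncate can clear a whole managed table or a single partition. The statements assume stu is a managed table partitioned by month, as in the import section.

```sql
-- Remove the rows of one partition only.
truncate table stu partition (month = '201709');

-- Remove all rows from every partition; the table definition stays.
truncate table stu;

-- The table still exists and now returns no rows.
select count(*) from stu;
```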
