DML Data Operations in Hive

1. Data Import

1.1 Loading Data into a Table (Load)

1. Syntax

hive> load data [local] inpath '/opt/module/datas/student.txt' [overwrite] into table student
[partition (partcol1=val1,…)];

(1) load data: load data into the table
(2) local: load from the local file system; without it, load from HDFS
(3) inpath: the path of the data to load
(4) overwrite: overwrite the data already in the table; without it, append
(5) into table: which table to load into
(6) student: the specific table name
(7) partition: load into the specified partition
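
Combining these options, a minimal hypothetical sketch (stu_par is an assumed partitioned table, not one created in this article) that overwrites a single partition with a local file:

hive> load data local inpath '/opt/module/datas/student.txt'
    > overwrite into table stu_par partition (month='201709');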

2. Hands-on Examples
(1) Create a table

hive (db_hive1)>  create table student(id string, name string) row format delimited fields terminated by '\t'; 
OK
Time taken: 0.215 seconds

(2) Load a local file into Hive

hive (db_hive1)> load data local inpath '/opt/module/datas/student.txt' into table db_hive1.student;
Loading data to table db_hive1.student
Table db_hive1.student stats: [numFiles=1, totalSize=24]
OK
Time taken: 0.31 seconds
hive (db_hive1)> select * from student;
OK
student.id	student.name
1	a
2	b
3	c
4	d
5	e
6	f
Time taken: 0.083 seconds, Fetched: 6 row(s)

(3) Load an HDFS file into Hive
A. Upload the file to HDFS

hive (db_hive1)> dfs -put /opt/module/datas/student.txt /user/hive/student.txt;

B. Load the data from HDFS

hive (db_hive1)> load data inpath '/user/hive/student.txt' into table db_hive1.student; 
Loading data to table db_hive1.student
Table db_hive1.student stats: [numFiles=2, totalSize=48]
OK
Time taken: 0.379 seconds
hive (db_hive1)> select * from student;
OK
student.id	student.name
1	a
2	b
3	c
4	d
5	e
6	f
1	a
2	b
3	c
4	d
5	e
6	f
Time taken: 0.074 seconds, Fetched: 12 row(s)

(4) Load data, overwriting the table's existing data
A. Upload the file to HDFS

hive (db_hive1)> dfs -put /opt/module/datas/student.txt /user/hive/student1.txt;

B. Load the data, overwriting what is already in the table

hive (db_hive1)> load data inpath '/user/hive/student1.txt' overwrite into table db_hive1.student;
Loading data to table db_hive1.student
Moved: 'hdfs://hadoop151:9000/user/hive/warehouse/db_hive1.db/student/student.txt' to trash at: hdfs://hadoop151:9000/user/test/.Trash/Current
Moved: 'hdfs://hadoop151:9000/user/hive/warehouse/db_hive1.db/student/student_copy_1.txt' to trash at: hdfs://hadoop151:9000/user/test/.Trash/Current
Table db_hive1.student stats: [numFiles=1, numRows=0, totalSize=24, rawDataSize=0]
OK
Time taken: 0.36 seconds
hive (db_hive1)> select * from student;
OK
student.id	student.name
1	a
2	b
3	c
4	d
5	e
6	f
Time taken: 0.068 seconds, Fetched: 6 row(s)

1.2 Inserting Data into a Table from a Query (Insert)

1. Create a partitioned table

hive (db_hive1)> create table student1(id int, name string) partitioned by (month string) row format delimited fields terminated by '\t';
OK
Time taken: 0.918 seconds

2. Basic insert of values

hive (db_hive1)> insert into table  student1 partition(month='201709') values(1,'wangwu'); 
Query ID = test_20200217212020_42f330b2-1238-4e8a-bb82-6853bb21feda
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1581943453985_0001, Tracking URL = http://hadoop152:8088/proxy/application_1581943453985_0001/
Kill Command = /opt/module/hadoop-2.7.2/bin/hadoop job  -kill job_1581943453985_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2020-02-17 21:20:40,968 Stage-1 map = 0%,  reduce = 0%
2020-02-17 21:20:56,845 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 9.43 sec
MapReduce Total cumulative CPU time: 9 seconds 430 msec
Ended Job = job_1581943453985_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://hadoop151:9000/user/hive/warehouse/db_hive1.db/student1/month=201709/.hive-staging_hive_2020-02-17_21-20-20_199_2401424390848584264-1/-ext-10000
Loading data to table db_hive1.student1 partition (month=201709)
Partition db_hive1.student1{month=201709} stats: [numFiles=1, numRows=1, totalSize=9, rawDataSize=8]
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 9.43 sec   HDFS Read: 3691 HDFS Write: 95 SUCCESS
Total MapReduce CPU Time Spent: 9 seconds 430 msec
OK
_col0	_col1
Time taken: 39.719 seconds

3. Basic insert mode (from a single-table query result)

hive (db_hive1)> insert overwrite table student1 partition(month="201708") 
               > select id, name from student1 where month = "201709";
Query ID = test_20200217212301_1019c676-0765-4558-b5c5-03655db43698
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1581943453985_0002, Tracking URL = http://hadoop152:8088/proxy/application_1581943453985_0002/
Kill Command = /opt/module/hadoop-2.7.2/bin/hadoop job  -kill job_1581943453985_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2020-02-17 21:23:09,418 Stage-1 map = 0%,  reduce = 0%
2020-02-17 21:23:14,725 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.7 sec
MapReduce Total cumulative CPU time: 1 seconds 700 msec
Ended Job = job_1581943453985_0002
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://hadoop151:9000/user/hive/warehouse/db_hive1.db/student1/month=201708/.hive-staging_hive_2020-02-17_21-23-01_487_919923111020437954-1/-ext-10000
Loading data to table db_hive1.student1 partition (month=201708)
Partition db_hive1.student1{month=201708} stats: [numFiles=1, numRows=1, totalSize=9, rawDataSize=8]
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 1.7 sec   HDFS Read: 3695 HDFS Write: 95 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 700 msec
OK
id	name
Time taken: 14.787 seconds
hive (db_hive1)> select * from student1;
OK
student1.id	student1.name	student1.month
1	wangwu	201708
1	wangwu	201709
Time taken: 0.253 seconds, Fetched: 2 row(s)

4. Multi-insert mode (writing one query's results to multiple destinations)

hive (db_hive1)> from student1
               > insert overwrite table student1 partition(month='201707')
               > select id, name where month='201709'
               > insert overwrite table student1 partition(month='201706')
               > select id, name where month='201709';
Query ID = test_20200217212637_88c895ad-ae2b-4699-820b-b6cbbc69a94a
Total jobs = 5
Launching Job 1 out of 5
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1581943453985_0003, Tracking URL = http://hadoop152:8088/proxy/application_1581943453985_0003/
Kill Command = /opt/module/hadoop-2.7.2/bin/hadoop job  -kill job_1581943453985_0003
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 0
2020-02-17 21:26:45,621 Stage-2 map = 0%,  reduce = 0%
2020-02-17 21:26:51,960 Stage-2 map = 100%,  reduce = 0%, Cumulative CPU 1.99 sec
MapReduce Total cumulative CPU time: 1 seconds 990 msec
Ended Job = job_1581943453985_0003
Stage-5 is selected by condition resolver.
Stage-4 is filtered out by condition resolver.
Stage-6 is filtered out by condition resolver.
Stage-11 is selected by condition resolver.
Stage-10 is filtered out by condition resolver.
Stage-12 is filtered out by condition resolver.
Moving data to: hdfs://hadoop151:9000/user/hive/warehouse/db_hive1.db/student1/month=201707/.hive-staging_hive_2020-02-17_21-26-37_364_8617726182849089105-1/-ext-10000
Moving data to: hdfs://hadoop151:9000/user/hive/warehouse/db_hive1.db/student1/month=201706/.hive-staging_hive_2020-02-17_21-26-37_364_8617726182849089105-1/-ext-10002
Loading data to table db_hive1.student1 partition (month=201707)
Loading data to table db_hive1.student1 partition (month=201706)
Partition db_hive1.student1{month=201707} stats: [numFiles=1, numRows=0, totalSize=9, rawDataSize=0]
Partition db_hive1.student1{month=201706} stats: [numFiles=1, numRows=0, totalSize=9, rawDataSize=0]
MapReduce Jobs Launched: 
Stage-Stage-2: Map: 1   Cumulative CPU: 1.99 sec   HDFS Read: 4630 HDFS Write: 190 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 990 msec
OK
id	name
Time taken: 17.433 seconds
hive (db_hive1)> select * from student1;
OK
student1.id	student1.name	student1.month
1	wangwu	201706
1	wangwu	201707
1	wangwu	201708
1	wangwu	201709
Time taken: 0.132 seconds, Fetched: 4 row(s)

1.3 Creating a Table and Loading Data from a Query (As Select)

create table if not exists student3 as select id, name from student; 
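
As a hedged variant sketch, create-table-as-select can also set the new table's row format at the same time (student4 is a hypothetical table name, not one used elsewhere in this article):

create table if not exists student4
row format delimited fields terminated by '\t'
as select id, name from student;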

1.4 Specifying the Data Load Path with location When Creating a Table

1. Create a table and specify its location on HDFS

hive (db_hive1)> create table if not exists student5(
               > id int, name string
               > )
               > row format delimited fields terminated by '\t'
               > location '/user/hive/warehouse/student5';
OK
Time taken: 0.209 seconds

2. Upload data to HDFS

hive (db_hive1)>  dfs -put /opt/module/datas/student.txt /user/hive/warehouse/student5;

3. Query the data

hive (db_hive1)> select * from student5;
OK
student5.id	student5.name
1	a
2	b
3	c
4	d
5	e
6	f
Time taken: 0.118 seconds, Fetched: 6 row(s)

1.5 Importing Data into a Specified Hive Table (Import)

Note: export the data with export first, then import it.

hive (default)> import table student2 partition(month='201709') from  '/user/hive/warehouse/export/student'; 
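
A sketch of the full round trip, under the assumption that student2 does not exist yet (import then creates it); the partition clause shown above applies only when the exported table is itself partitioned:

hive (default)> export table student to '/user/hive/warehouse/export/student';
hive (default)> import table student2 from '/user/hive/warehouse/export/student';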

2. Data Export

2.1 Export via insert

1. Export query results to the local file system
Export command:

hive (db_hive1)> insert overwrite local directory '/opt/module/datas/export/student'
               > select * from student;
Query ID = test_20200217215823_5077496a-bb6b-4e88-93a6-c66ab9a30fc0
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1581943453985_0005, Tracking URL = http://hadoop152:8088/proxy/application_1581943453985_0005/
Kill Command = /opt/module/hadoop-2.7.2/bin/hadoop job  -kill job_1581943453985_0005
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2020-02-17 21:58:31,158 Stage-1 map = 0%,  reduce = 0%
2020-02-17 21:58:39,575 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.44 sec
MapReduce Total cumulative CPU time: 2 seconds 440 msec
Ended Job = job_1581943453985_0005
Copying data to local directory /opt/module/datas/export/student
Copying data to local directory /opt/module/datas/export/student
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 2.44 sec   HDFS Read: 2952 HDFS Write: 24 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 440 msec
OK
student.id	student.name
Time taken: 17.525 seconds

Check locally:

[test@hadoop151 datas]$ cd export/
[test@hadoop151 export]$ ll
总用量 4
drwxrwxr-x 3 test test 4096 2月  17 21:58 student
[test@hadoop151 export]$ cd student/
[test@hadoop151 student]$ ll
总用量 4
-rw-r--r-- 1 test test 24 2月  17 21:58 000000_0
[test@hadoop151 student]$ cat 000000_0 
1a
2b
3c
4d
5e
6f

2. Export query results to the local file system with formatting (the default field delimiter \001 is non-printing, which is why the columns in the previous output appear run together)
Export command:

hive (db_hive1)> insert overwrite local directory '/opt/module/datas/export/student1'
               > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
               > select * from student;
Query ID = test_20200217220026_ceafc862-72e6-4f36-9f83-edacec2ff848
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1581943453985_0006, Tracking URL = http://hadoop152:8088/proxy/application_1581943453985_0006/
Kill Command = /opt/module/hadoop-2.7.2/bin/hadoop job  -kill job_1581943453985_0006
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2020-02-17 22:00:35,266 Stage-1 map = 0%,  reduce = 0%
2020-02-17 22:00:40,449 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.5 sec
MapReduce Total cumulative CPU time: 1 seconds 500 msec
Ended Job = job_1581943453985_0006
Copying data to local directory /opt/module/datas/export/student1
Copying data to local directory /opt/module/datas/export/student1
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 1.5 sec   HDFS Read: 3043 HDFS Write: 24 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 500 msec
OK
student.id	student.name
Time taken: 15.389 seconds

Check locally:

[test@hadoop151 datas]$ cd export/
[test@hadoop151 export]$ ll
总用量 8
drwxrwxr-x 3 test test 4096 2月  17 21:58 student
drwxrwxr-x 3 test test 4096 2月  17 22:00 student1
[test@hadoop151 export]$ cd student1
[test@hadoop151 student1]$ ll
总用量 4
-rw-r--r-- 1 test test 24 2月  17 22:00 000000_0
[test@hadoop151 student1]$ cat 000000_0 
1	a
2	b
3	c
4	d
5	e
6	f

3. Export query results to HDFS
Export command:

hive (db_hive1)> insert overwrite directory '/user/test/student2'
               > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
               > select * from student;
Query ID = test_20200217220815_aacc063e-eb02-4d28-bd53-362b55dabbe4
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1581943453985_0007, Tracking URL = http://hadoop152:8088/proxy/application_1581943453985_0007/
Kill Command = /opt/module/hadoop-2.7.2/bin/hadoop job  -kill job_1581943453985_0007
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2020-02-17 22:08:54,020 Stage-1 map = 0%,  reduce = 0%
2020-02-17 22:09:26,584 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.66 sec
MapReduce Total cumulative CPU time: 2 seconds 660 msec
Ended Job = job_1581943453985_0007
Stage-3 is selected by condition resolver.
Stage-2 is filtered out by condition resolver.
Stage-4 is filtered out by condition resolver.
Moving data to: hdfs://hadoop151:9000/user/test/student2/.hive-staging_hive_2020-02-17_22-08-15_836_8246443228787537002-1/-ext-10000
Moving data to: /user/test/student2
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 2.66 sec   HDFS Read: 3009 HDFS Write: 24 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 660 msec
OK
student.id	student.name
Time taken: 75.209 seconds

Check on HDFS:

hive (db_hive1)> dfs -cat /user/test/student2/000000_0
               > ;
1	a
2	b
3	c
4	d
5	e
6	f

2.2 Exporting to the Local File System with a Hadoop Command

Export command:

hive (db_hive1)> dfs -get /user/test/student2/000000_0 /opt/module/datas/export/student3.txt;
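
The same copy can also be run from the OS shell with the hadoop client instead of the hive prompt; an equivalent sketch using the same paths:

[test@hadoop151 datas]$ hadoop fs -get /user/test/student2/000000_0 /opt/module/datas/export/student3.txt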

Check the result locally:

[test@hadoop151 datas]$ cd export/
[test@hadoop151 export]$ ll
总用量 12
drwxrwxr-x 3 test test 4096 2月  17 21:58 student
drwxrwxr-x 3 test test 4096 2月  17 22:00 student1
-rw-r--r-- 1 test test   24 2月  17 22:13 student3.txt
[test@hadoop151 export]$ cat student3.txt 
1	a
2	b
3	c
4	d
5	e
6	f

2.3 Export via the hive Shell Command

1. Basic syntax
hive -f/-e <statement or script file> > file
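
A hedged sketch of the -f variant, which runs a statement from a script file rather than the command line (the .sql path here is hypothetical):

[test@hadoop151 export]$ hive -f /opt/module/datas/stu.sql > /opt/module/datas/export/student6.txt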

2. Hands-on example

[test@hadoop151 export]$ hive -e 'select * from db_hive1.student;' >  /opt/module/datas/export/student5.txt;

Logging initialized using configuration in file:/opt/module/hive/conf/hive-log4j.properties
OK
Time taken: 1.822 seconds, Fetched: 6 row(s)
[test@hadoop151 export]$ cat /opt/module/datas/export/student5.txt 
student.id	student.name
1	a
2	b
3	c
4	d
5	e
6	f

2.4 Export to HDFS with the export Command

hive (db_hive1)>  export table student to  '/user/hive/warehouse/export/student'; 
Copying data from file:/tmp/test/f5c14a08-4c33-4088-8279-2b3780c7ec25/hive_2020-02-17_22-23-18_503_4846105517648274309-1/-local-10000/_metadata
Copying file: file:/tmp/test/f5c14a08-4c33-4088-8279-2b3780c7ec25/hive_2020-02-17_22-23-18_503_4846105517648274309-1/-local-10000/_metadata
Copying data from hdfs://hadoop151:9000/user/hive/warehouse/db_hive1.db/student
Copying file: hdfs://hadoop151:9000/user/hive/warehouse/db_hive1.db/student/student1.txt
OK
Time taken: 1.041 seconds
hive (db_hive1)> dfs -ls /user/hive/warehouse/export/student
               > ;
Found 2 items
-rwxrwxr-x   3 test supergroup       1269 2020-02-17 22:23 /user/hive/warehouse/export/student/_metadata
drwxrwxr-x   - test supergroup          0 2020-02-17 22:23 /user/hive/warehouse/export/student/data

2.5 Export with Sqoop

Not covered in detail here.

3. Truncating Table Data (Truncate)

Note: truncate can only remove data from managed (internal) tables; it cannot remove data from external tables.

hive (default)> truncate table student;