1. Data Import
1.1 Loading Data into a Table (Load)
1. Syntax
hive> load data [local] inpath '/opt/module/datas/student.txt' [overwrite] into table student
[partition (partcol1=val1, ...)];
(1) load data: loads data into a table
(2) local: load from the local file system; without it, the data is loaded from HDFS
(3) inpath: the path of the data to load
(4) overwrite: overwrite the data already in the table; otherwise the load appends
(5) into table: the table to load into
(6) student: the specific table name
(7) partition: load into the specified partition
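The examples below load a tab-delimited file. As a minimal sketch, a file matching the (id, name) schema could be prepared like this (the /tmp path is illustrative; the examples below use /opt/module/datas/student.txt):

```shell
# Build a tab-delimited sample file matching the (id, name) schema.
# The path is illustrative; substitute your own data directory.
printf '1\ta\n2\tb\n3\tc\n4\td\n5\te\n6\tf\n' > /tmp/student.txt

# Sanity check: every line must have exactly two tab-separated fields,
# since the table is declared with fields terminated by '\t'.
awk -F'\t' 'NF != 2 { exit 1 }' /tmp/student.txt && echo "format OK"
```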
2. Example
(1) Create a table
hive (db_hive1)> create table student(id string, name string) row format delimited fields terminated by '\t';
OK
Time taken: 0.215 seconds
(2) Load a local file into Hive
hive (db_hive1)> load data local inpath '/opt/module/datas/student.txt' into table db_hive1.student;
Loading data to table db_hive1.student
Table db_hive1.student stats: [numFiles=1, totalSize=24]
OK
Time taken: 0.31 seconds
hive (db_hive1)> select * from student;
OK
student.id student.name
1 a
2 b
3 c
4 d
5 e
6 f
Time taken: 0.083 seconds, Fetched: 6 row(s)
(3) Load an HDFS file into Hive
A. Upload the file to HDFS
hive (db_hive1)> dfs -put /opt/module/datas/student.txt /user/hive/student.txt;
B. Load the data from HDFS
hive (db_hive1)> load data inpath '/user/hive/student.txt' into table db_hive1.student;
Loading data to table db_hive1.student
Table db_hive1.student stats: [numFiles=2, totalSize=48]
OK
Time taken: 0.379 seconds
hive (db_hive1)> select * from student;
OK
student.id student.name
1 a
2 b
3 c
4 d
5 e
6 f
1 a
2 b
3 c
4 d
5 e
6 f
Time taken: 0.074 seconds, Fetched: 12 row(s)
(4) Load data, overwriting the existing data in the table
A. Upload the file to HDFS
hive (db_hive1)> dfs -put /opt/module/datas/student.txt /user/hive/student1.txt;
B. Load the data, overwriting what is already in the table
hive (db_hive1)> load data inpath '/user/hive/student1.txt' overwrite into table db_hive1.student;
Loading data to table db_hive1.student
Moved: 'hdfs://hadoop151:9000/user/hive/warehouse/db_hive1.db/student/student.txt' to trash at: hdfs://hadoop151:9000/user/test/.Trash/Current
Moved: 'hdfs://hadoop151:9000/user/hive/warehouse/db_hive1.db/student/student_copy_1.txt' to trash at: hdfs://hadoop151:9000/user/test/.Trash/Current
Table db_hive1.student stats: [numFiles=1, numRows=0, totalSize=24, rawDataSize=0]
OK
Time taken: 0.36 seconds
hive (db_hive1)> select * from student;
OK
student.id student.name
1 a
2 b
3 c
4 d
5 e
6 f
Time taken: 0.068 seconds, Fetched: 6 row(s)
1.2 Inserting Data into a Table via Queries (Insert)
1. Create a partitioned table
hive (db_hive1)> create table student1(id int, name string) partitioned by (month string) row format delimited fields terminated by '\t';
OK
Time taken: 0.918 seconds
2. Basic insert with values
hive (db_hive1)> insert into table student1 partition(month='201709') values(1,'wangwu');
Query ID = test_20200217212020_42f330b2-1238-4e8a-bb82-6853bb21feda
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1581943453985_0001, Tracking URL = http://hadoop152:8088/proxy/application_1581943453985_0001/
Kill Command = /opt/module/hadoop-2.7.2/bin/hadoop job -kill job_1581943453985_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2020-02-17 21:20:40,968 Stage-1 map = 0%, reduce = 0%
2020-02-17 21:20:56,845 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 9.43 sec
MapReduce Total cumulative CPU time: 9 seconds 430 msec
Ended Job = job_1581943453985_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://hadoop151:9000/user/hive/warehouse/db_hive1.db/student1/month=201709/.hive-staging_hive_2020-02-17_21-20-20_199_2401424390848584264-1/-ext-10000
Loading data to table db_hive1.student1 partition (month=201709)
Partition db_hive1.student1{month=201709} stats: [numFiles=1, numRows=1, totalSize=9, rawDataSize=8]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 9.43 sec HDFS Read: 3691 HDFS Write: 95 SUCCESS
Total MapReduce CPU Time Spent: 9 seconds 430 msec
OK
_col0 _col1
Time taken: 39.719 seconds
3. Insert from a single-table query result
hive (db_hive1)> insert overwrite table student1 partition(month="201708")
> select id, name from student1 where month = "201709";
Query ID = test_20200217212301_1019c676-0765-4558-b5c5-03655db43698
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1581943453985_0002, Tracking URL = http://hadoop152:8088/proxy/application_1581943453985_0002/
Kill Command = /opt/module/hadoop-2.7.2/bin/hadoop job -kill job_1581943453985_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2020-02-17 21:23:09,418 Stage-1 map = 0%, reduce = 0%
2020-02-17 21:23:14,725 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.7 sec
MapReduce Total cumulative CPU time: 1 seconds 700 msec
Ended Job = job_1581943453985_0002
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://hadoop151:9000/user/hive/warehouse/db_hive1.db/student1/month=201708/.hive-staging_hive_2020-02-17_21-23-01_487_919923111020437954-1/-ext-10000
Loading data to table db_hive1.student1 partition (month=201708)
Partition db_hive1.student1{month=201708} stats: [numFiles=1, numRows=1, totalSize=9, rawDataSize=8]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 1.7 sec HDFS Read: 3695 HDFS Write: 95 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 700 msec
OK
id name
Time taken: 14.787 seconds
hive (db_hive1)> select * from student1;
OK
student1.id student1.name student1.month
1 wangwu 201708
1 wangwu 201709
Time taken: 0.253 seconds, Fetched: 2 row(s)
4. Multi-insert mode (multiple inserts from a single table scan)
hive (db_hive1)> from student1
> insert overwrite table student1 partition(month='201707')
> select id, name where month='201709'
> insert overwrite table student1 partition(month='201706')
> select id, name where month='201709';
Query ID = test_20200217212637_88c895ad-ae2b-4699-820b-b6cbbc69a94a
Total jobs = 5
Launching Job 1 out of 5
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1581943453985_0003, Tracking URL = http://hadoop152:8088/proxy/application_1581943453985_0003/
Kill Command = /opt/module/hadoop-2.7.2/bin/hadoop job -kill job_1581943453985_0003
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 0
2020-02-17 21:26:45,621 Stage-2 map = 0%, reduce = 0%
2020-02-17 21:26:51,960 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.99 sec
MapReduce Total cumulative CPU time: 1 seconds 990 msec
Ended Job = job_1581943453985_0003
Stage-5 is selected by condition resolver.
Stage-4 is filtered out by condition resolver.
Stage-6 is filtered out by condition resolver.
Stage-11 is selected by condition resolver.
Stage-10 is filtered out by condition resolver.
Stage-12 is filtered out by condition resolver.
Moving data to: hdfs://hadoop151:9000/user/hive/warehouse/db_hive1.db/student1/month=201707/.hive-staging_hive_2020-02-17_21-26-37_364_8617726182849089105-1/-ext-10000
Moving data to: hdfs://hadoop151:9000/user/hive/warehouse/db_hive1.db/student1/month=201706/.hive-staging_hive_2020-02-17_21-26-37_364_8617726182849089105-1/-ext-10002
Loading data to table db_hive1.student1 partition (month=201707)
Loading data to table db_hive1.student1 partition (month=201706)
Partition db_hive1.student1{month=201707} stats: [numFiles=1, numRows=0, totalSize=9, rawDataSize=0]
Partition db_hive1.student1{month=201706} stats: [numFiles=1, numRows=0, totalSize=9, rawDataSize=0]
MapReduce Jobs Launched:
Stage-Stage-2: Map: 1 Cumulative CPU: 1.99 sec HDFS Read: 4630 HDFS Write: 190 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 990 msec
OK
id name
Time taken: 17.433 seconds
hive (db_hive1)> select * from student1;
OK
student1.id student1.name student1.month
1 wangwu 201706
1 wangwu 201707
1 wangwu 201708
1 wangwu 201709
Time taken: 0.132 seconds, Fetched: 4 row(s)
1.3 Creating a Table from a Query (as select)
create table if not exists student3 as select id, name from student;
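A slightly fuller sketch of the same CTAS pattern, with a verification query (this assumes the student table from 1.1; note that CTAS always produces a managed table):

```sql
-- Create the table and populate it from the query in one statement;
-- the new table's schema is derived from the select list.
create table if not exists student3
as select id, name from student;

-- Verify that the rows were copied.
select * from student3;
```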
1.4 Specifying the Data Path with location at Table Creation
1. Create a table and specify its location on HDFS
hive (db_hive1)> create table if not exists student5( id int, name string ) row format delimited fields terminated by '\t' location '/user/hive/warehouse/student5';
OK
Time taken: 0.209 seconds
2. Upload the data to HDFS
hive (db_hive1)> dfs -put /opt/module/datas/student.txt /user/hive/warehouse/student5;
3. Query the data
hive (db_hive1)> select * from student5;
OK
student5.id student5.name
1 a
2 b
3 c
4 d
5 e
6 f
Time taken: 0.118 seconds, Fetched: 6 row(s)
1.5 Importing Data into a Hive Table (Import)
Note: the data must first be exported with export; only then can it be imported.
hive (default)> import table student2 partition(month='201709') from '/user/hive/warehouse/export/student';
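Putting the note above together with the export command from section 2.4, the round trip looks roughly like this (a sketch; the partition clause only applies when the exported table was partitioned the same way, and import fails if the target table already contains data):

```sql
-- 1. Export the table's data and metadata to an HDFS directory.
export table student to '/user/hive/warehouse/export/student';

-- 2. Import from that directory into another table.
import table student2 partition(month='201709')
from '/user/hive/warehouse/export/student';
```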
2. Data Export
2.1 Export with insert
1. Export query results to the local file system
Export command:
hive (db_hive1)> insert overwrite local directory '/opt/module/datas/export/student' select * from student;
Query ID = test_20200217215823_5077496a-bb6b-4e88-93a6-c66ab9a30fc0
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1581943453985_0005, Tracking URL = http://hadoop152:8088/proxy/application_1581943453985_0005/
Kill Command = /opt/module/hadoop-2.7.2/bin/hadoop job -kill job_1581943453985_0005
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2020-02-17 21:58:31,158 Stage-1 map = 0%, reduce = 0%
2020-02-17 21:58:39,575 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.44 sec
MapReduce Total cumulative CPU time: 2 seconds 440 msec
Ended Job = job_1581943453985_0005
Copying data to local directory /opt/module/datas/export/student
Copying data to local directory /opt/module/datas/export/student
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 2.44 sec HDFS Read: 2952 HDFS Write: 24 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 440 msec
OK
student.id student.name
Time taken: 17.525 seconds
View locally:
[test@hadoop151 datas]$ cd export/
[test@hadoop151 export]$ ll
total 4
drwxrwxr-x 3 test test 4096 Feb 17 21:58 student
[test@hadoop151 export]$ cd student/
[test@hadoop151 student]$ ll
total 4
-rw-r--r-- 1 test test 24 Feb 17 21:58 000000_0
[test@hadoop151 student]$ cat 000000_0
1a
2b
3c
4d
5e
6f
2. Export formatted query results to the local file system
Export command:
hive (db_hive1)> insert overwrite local directory '/opt/module/datas/export/student1' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' select * from student;
Query ID = test_20200217220026_ceafc862-72e6-4f36-9f83-edacec2ff848
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1581943453985_0006, Tracking URL = http://hadoop152:8088/proxy/application_1581943453985_0006/
Kill Command = /opt/module/hadoop-2.7.2/bin/hadoop job -kill job_1581943453985_0006
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2020-02-17 22:00:35,266 Stage-1 map = 0%, reduce = 0%
2020-02-17 22:00:40,449 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.5 sec
MapReduce Total cumulative CPU time: 1 seconds 500 msec
Ended Job = job_1581943453985_0006
Copying data to local directory /opt/module/datas/export/student1
Copying data to local directory /opt/module/datas/export/student1
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 1.5 sec HDFS Read: 3043 HDFS Write: 24 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 500 msec
OK
student.id student.name
Time taken: 15.389 seconds
View locally:
[test@hadoop151 datas]$ cd export/
[test@hadoop151 export]$ ll
total 8
drwxrwxr-x 3 test test 4096 Feb 17 21:58 student
drwxrwxr-x 3 test test 4096 Feb 17 22:00 student1
[test@hadoop151 export]$ cd student1
[test@hadoop151 student1]$ ll
total 4
-rw-r--r-- 1 test test 24 Feb 17 22:00 000000_0
[test@hadoop151 student1]$ cat 000000_0
1 a
2 b
3 c
4 d
5 e
6 f
3. Export query results to HDFS
Export command:
hive (db_hive1)> insert overwrite directory '/user/test/student2' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' select * from student;
Query ID = test_20200217220815_aacc063e-eb02-4d28-bd53-362b55dabbe4
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1581943453985_0007, Tracking URL = http://hadoop152:8088/proxy/application_1581943453985_0007/
Kill Command = /opt/module/hadoop-2.7.2/bin/hadoop job -kill job_1581943453985_0007
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2020-02-17 22:08:54,020 Stage-1 map = 0%, reduce = 0%
2020-02-17 22:09:26,584 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.66 sec
MapReduce Total cumulative CPU time: 2 seconds 660 msec
Ended Job = job_1581943453985_0007
Stage-3 is selected by condition resolver.
Stage-2 is filtered out by condition resolver.
Stage-4 is filtered out by condition resolver.
Moving data to: hdfs://hadoop151:9000/user/test/student2/.hive-staging_hive_2020-02-17_22-08-15_836_8246443228787537002-1/-ext-10000
Moving data to: /user/test/student2
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 2.66 sec HDFS Read: 3009 HDFS Write: 24 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 660 msec
OK
student.id student.name
Time taken: 75.209 seconds
View on HDFS:
hive (db_hive1)> dfs -cat /user/test/student2/000000_0
> ;
1 a
2 b
3 c
4 d
5 e
6 f
2.2 Exporting to the Local File System with Hadoop Commands
Export command:
hive (db_hive1)> dfs -get /user/test/student2/000000_0 /opt/module/datas/export/student3.txt;
View the result locally:
[test@hadoop151 datas]$ cd export/
[test@hadoop151 export]$ ll
total 12
drwxrwxr-x 3 test test 4096 Feb 17 21:58 student
drwxrwxr-x 3 test test 4096 Feb 17 22:00 student1
-rw-r--r-- 1 test test 24 Feb 17 22:13 student3.txt
[test@hadoop151 export]$ cat student3.txt
1 a
2 b
3 c
4 d
5 e
6 f
2.3 Exporting with the hive Shell Command
1. Basic syntax
hive -f/-e <statement or script file> > file
2. Example
[test@hadoop151 export]$ hive -e 'select * from db_hive1.student;' > /opt/module/datas/export/student5.txt;
Logging initialized using configuration in file:/opt/module/hive/conf/hive-log4j.properties
OK
Time taken: 1.822 seconds, Fetched: 6 row(s)
[test@hadoop151 export]$ cat /opt/module/datas/export/student5.txt
student.id student.name
1 a
2 b
3 c
4 d
5 e
6 f
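The -f variant from the syntax above runs a script file instead of an inline statement; a sketch assuming a working Hive installation (the script path and output file name are hypothetical):

```shell
# Write the query into a script file, then execute it with -f and
# redirect the result to a local file, mirroring the -e example above.
echo 'select * from db_hive1.student;' > /tmp/export_student.sql
hive -f /tmp/export_student.sql > /opt/module/datas/export/student6.txt
```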
2.4 Exporting to HDFS with export
hive (db_hive1)> export table student to '/user/hive/warehouse/export/student';
Copying data from file:/tmp/test/f5c14a08-4c33-4088-8279-2b3780c7ec25/hive_2020-02-17_22-23-18_503_4846105517648274309-1/-local-10000/_metadata
Copying file: file:/tmp/test/f5c14a08-4c33-4088-8279-2b3780c7ec25/hive_2020-02-17_22-23-18_503_4846105517648274309-1/-local-10000/_metadata
Copying data from hdfs://hadoop151:9000/user/hive/warehouse/db_hive1.db/student
Copying file: hdfs://hadoop151:9000/user/hive/warehouse/db_hive1.db/student/student1.txt
OK
Time taken: 1.041 seconds
hive (db_hive1)> dfs -ls /user/hive/warehouse/export/student
> ;
Found 2 items
-rwxrwxr-x 3 test supergroup 1269 2020-02-17 22:23 /user/hive/warehouse/export/student/_metadata
drwxrwxr-x - test supergroup 0 2020-02-17 22:23 /user/hive/warehouse/export/student/data
2.5 Exporting with Sqoop
Not covered in detail here.
3. Truncating Table Data (Truncate)
Note: truncate can only be applied to managed (internal) tables; it cannot delete data from external tables.
hive (default)> truncate table student;