Detailed Notes on Creating Partitioned Tables in Hive

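Preface

When a table holds a lot of data, queries are slow because they scan the whole table. If we only need part of the data, we can avoid that cost by partitioning the table.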

Partitions

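A partitioned table is created with the PARTITIONED BY clause. A table can have one or more partitions, and each partition is stored as its own folder under the table's directory.
A partition column appears in the table schema as a field and is listed by the describe (desc) command, but its values are not written into the data files; they are taken from the partition folder name (for example gender=N).
There are two kinds of partitioned tables. A single-level partitioned table has only one level of folders under the table directory. A multi-level partitioned table has nested folders, one level per partition column.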
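
The demo below only covers the single-level case, so here is a minimal sketch of a multi-level partitioned table. The table name stu_by_school and the partition columns school and grade are hypothetical and do not appear in the original session.

```sql
-- Hypothetical table: two partition columns give two levels of nested folders.
create table stu_by_school(
  id int, name string)
partitioned by (school string, grade string)
row format delimited fields terminated by ',';

-- A static insert names a value for every partition column.
-- The row should land under .../stu_by_school/school=s1/grade=g1/
insert into table stu_by_school partition(school='s1', grade='g1')
values (1, 'shangguan');
```

The order of the columns in PARTITIONED BY decides the nesting order of the folders, so the column that queries filter on most often is usually placed first.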

Partitioned table demo

Create a table

hive> create table stu(
    > id int,name string,gender string,math int,english int)
    > row format delimited fields terminated by ',';
OK
Time taken: 0.341 seconds

Inspect the table schema

hive> desc stu;
OK
id                  	int                 	                    
name                	string              	                    
gender              	string              	                    
math                	int                 	                    
english             	int                 	                    
Time taken: 0.27 seconds, Fetched: 5 row(s)

Insert data

hive> insert into stu values(1,"shangguan","N",87,78);
Query ID = root_20200416153003_41bf7694-8f28-4016-8b13-2baf8fc1d89a
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1587021132741_0001, Tracking URL = http://hadoop01:8088/proxy/application_1587021132741_0001/
Kill Command = /opt/app/hadoop/bin/hadoop job  -kill job_1587021132741_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2020-04-16 15:30:13,909 Stage-1 map = 0%,  reduce = 0%
2020-04-16 15:30:21,291 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.5 sec
MapReduce Total cumulative CPU time: 1 seconds 500 msec
Ended Job = job_1587021132741_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://hadoop01:9000/user/hive/warehouse/student.db/stu/.hive-staging_hive_2020-04-16_15-30-03_216_5835972568713550258-1/-ext-10000
Loading data to table student.stu
Table student.stu stats: [numFiles=1, numRows=1, totalSize=20, rawDataSize=19]
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 1.5 sec   HDFS Read: 4206 HDFS Write: 87 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 500 msec
OK
Time taken: 19.593 seconds
hive> insert into stu values(1,"guan","m",87,78);
Query ID = root_20200416153050_93f07263-68ce-4fcd-b5e1-b9dc7b9fdb01
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1587021132741_0002, Tracking URL = http://hadoop01:8088/proxy/application_1587021132741_0002/
Kill Command = /opt/app/hadoop/bin/hadoop job  -kill job_1587021132741_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2020-04-16 15:30:56,954 Stage-1 map = 0%,  reduce = 0%
2020-04-16 15:31:04,368 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.55 sec
MapReduce Total cumulative CPU time: 1 seconds 550 msec
Ended Job = job_1587021132741_0002
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://hadoop01:9000/user/hive/warehouse/student.db/stu/.hive-staging_hive_2020-04-16_15-30-50_562_7187561534696972662-1/-ext-10000
Loading data to table student.stu
Table student.stu stats: [numFiles=2, numRows=2, totalSize=35, rawDataSize=33]
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 1.55 sec   HDFS Read: 4296 HDFS Write: 82 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 550 msec
OK
Time taken: 15.121 seconds
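
As the logs above show, each insert ... values statement launches a MapReduce job and takes 15 to 20 seconds for a single row. For more than a few rows, loading a delimited file is usually faster. The following is only a sketch and was not part of the original session; the path /tmp/stu.csv is a hypothetical local file whose lines use the same comma-separated layout as the table.

```sql
-- /tmp/stu.csv is a hypothetical file; each line looks like: id,name,gender,math,english
-- load data copies the file into the table directory without running a MapReduce job.
load data local inpath '/tmp/stu.csv' into table stu;
```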

Query the table

hive> select * from stu;
OK
1	shangguan	N	87	78
1	guan	m	87	78

Create a partitioned table

hive> create table partition_table(
    > id int,name string)
    > partitioned by(gender string)
    > row format delimited fields terminated by ',';
OK
Time taken: 0.17 seconds

Inspect the partitioned table

hive> desc partition_table
    > ;
OK
id                  	int                 	                    
name                	string              	                    
gender              	string              	                    
	 	 
# Partition Information	 	 
# col_name            	data_type           	comment             
	 	 
gender              	string              	                    
Time taken: 0.068 seconds, Fetched: 8 row(s)

Insert rows into the matching partitions

hive> insert into table partition_table partition(gender='N')select id,name from stu where gender='N';
Query ID = root_20200416154126_0bbfae98-f01a-4147-afba-5ab27362f57a
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1587021132741_0003, Tracking URL = http://hadoop01:8088/proxy/application_1587021132741_0003/
Kill Command = /opt/app/hadoop/bin/hadoop job  -kill job_1587021132741_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2020-04-16 15:41:32,346 Stage-1 map = 0%,  reduce = 0%
2020-04-16 15:41:39,666 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.6 sec
MapReduce Total cumulative CPU time: 1 seconds 600 msec
Ended Job = job_1587021132741_0003
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://hadoop01:9000/user/hive/warehouse/student.db/partition_table/gender=N/.hive-staging_hive_2020-04-16_15-41-26_261_148002425019543036-1/-ext-10000
Loading data to table student.partition_table partition (gender=N)
Partition student.partition_table{gender=N} stats: [numFiles=1, numRows=1, totalSize=7, rawDataSize=6]
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 1.6 sec   HDFS Read: 4110 HDFS Write: 95 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 600 msec
OK
Time taken: 14.945 seconds
hive> insert into table partition_table partition(gender='m')select id,name from stu where gender='m';
Query ID = root_20200416154229_667496ac-4947-44b7-8785-16e2c25c37cc
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1587021132741_0004, Tracking URL = http://hadoop01:8088/proxy/application_1587021132741_0004/
Kill Command = /opt/app/hadoop/bin/hadoop job  -kill job_1587021132741_0004
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2020-04-16 15:42:36,048 Stage-1 map = 0%,  reduce = 0%
2020-04-16 15:42:43,461 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.63 sec
MapReduce Total cumulative CPU time: 1 seconds 630 msec
Ended Job = job_1587021132741_0004
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://hadoop01:9000/user/hive/warehouse/student.db/partition_table/gender=m/.hive-staging_hive_2020-04-16_15-42-29_979_9199116166636330816-1/-ext-10000
Loading data to table student.partition_table partition (gender=m)
Partition student.partition_table{gender=m} stats: [numFiles=1, numRows=1, totalSize=7, rawDataSize=6]
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 1.63 sec   HDFS Read: 4190 HDFS Write: 95 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 630 msec
OK
Time taken: 14.795 seconds
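
The two statements above use static partitioning: one insert per partition value. A possible alternative, sketched here and not run in the original session, is a dynamic-partition insert, where Hive derives the partition from the last column of the select. Running it after the static inserts above would append the same rows a second time.

```sql
-- Allow dynamic partitions; nonstrict mode is needed because no partition value is given statically.
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- The partition column (gender) must come last in the select list.
insert into table partition_table partition(gender)
select id, name, gender from stu;
```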

Inspect HDFS

[root@hadoop01 ~]# hdfs dfs -ls /user/hive/warehouse/student.db/stu
Found 2 items
-rwxrwxr-x   2 root supergroup         20 2020-04-16 15:30 /user/hive/warehouse/student.db/stu/000000_0
-rwxrwxr-x   2 root supergroup         15 2020-04-16 15:31 /user/hive/warehouse/student.db/stu/000000_0_copy_1
[root@hadoop01 ~]# hdfs dfs -text /user/hive/warehouse/student.db/stu/000000_0
1,shangguan,N,87,78
[root@hadoop01 ~]# hdfs dfs -text /user/hive/warehouse/student.db/stu/000000_0_copy_1
1,guan,m,87,78
[root@hadoop01 ~]#
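
The listing above only shows the unpartitioned stu table. To see the partition folders themselves, commands along the following lines can be run from the Hive CLI. This is a sketch with no captured output; the warehouse path is taken from the insert logs earlier in this post.

```sql
-- List the partitions Hive has registered for the table.
show partitions partition_table;

-- The Hive CLI accepts dfs commands; each partition should appear as a gender=... folder.
dfs -ls /user/hive/warehouse/student.db/partition_table;

-- Filtering on the partition column lets Hive read only the gender=N folder.
select id, name from partition_table where gender='N';
```

Filtering on the partition column is what delivers the speed-up described in the preface: partitions that do not match the predicate are never scanned.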

Author: 我是泛滥
