Article Directory
- 1. Concept
- 2. Create a partition table
- 3. Load data to the partition table
- 4. Partition data query
- 5. Add partition
- 6. Delete the partition
- 7. Check how many partitions are in the partition table
- 8. Secondary partition table
- 9. Upload the data directly to the partition directory, three ways to associate the partition table and the data
1. Concept
A partition table in Hive corresponds to an independent folder on the HDFS file system, and all data files of a partition live under that folder. A partition in Hive is essentially a subdirectory: it divides a large data set into smaller data sets according to business needs. At query time, the required partitions are selected through an expression in the WHERE clause, so only those directories are scanned, which greatly improves query efficiency.
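As a sketch, assuming the default warehouse location /user/hive/warehouse, the dept_partition table created below would be laid out on HDFS roughly like this after loading two months of data:

```
/user/hive/warehouse/dept_partition/
├── month=202001/
│   └── srcdata.txt
└── month=202002/
    └── srcdata.txt
```

A query with `where month = '202001'` then only needs to scan the first directory.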
2. Create a partition table
create table dept_partition(
id int, name string
)
partitioned by (month string)
row format delimited fields terminated by '\t'
stored as textfile;
3. Load data to the partition table
There is a srcdata.txt under /home/hive:
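For illustration, srcdata.txt could contain tab-separated rows matching the (id, name) schema; the values below are hypothetical sample content:

```
10	ACCOUNTING
20	RESEARCH
30	SALES
```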
load data:
load data local inpath '/home/hive/srcdata.txt' into table default.dept_partition partition(month='202001');
4. Partition data query
- Single partition query:
select * from dept_partition where month = '202001';
- Multi-partition joint query:
select * from dept_partition where month = '202001'
union
select * from dept_partition where month = '202002';
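Note that union deduplicates rows across the two sides. Since two different partitions cannot contain overlapping rows here, union all (which skips the deduplication step) returns the same result more cheaply, as does a single scan with or in the WHERE clause:

```sql
select * from dept_partition where month = '202001' or month = '202002';
```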
5. Add partition
- Create a single partition:
alter table dept_partition add partition(month='202008');
- Create multiple partitions:
alter table dept_partition add partition(month='202005') partition(month='202006');
6. Delete the partition
- Delete a single partition
alter table dept_partition drop partition (month='202008');
- Delete multiple partitions
alter table dept_partition drop partition(month='202005'),partition(month='202006');
Note: when dropping multiple partitions, the partition clauses must be separated by commas; when adding multiple partitions, they are separated by spaces, with no commas.
7. Check how many partitions are in the partition table
show partitions dept_partition;
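If the statements above were run in order (202008, 202005, and 202006 were all dropped again in section 6), the output would list only the partition loaded in section 3, one partition per line:

```
month=202001
```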
- View the partition table structure:
desc formatted dept_partition;
8. Secondary partition table
- Create a secondary partition table:
create table dept_partition2(
id int, name string
)
partitioned by (month string, day string)
row format delimited fields terminated by '\t'
stored as textfile;
- Load data to the partition table
load data local inpath '/home/hive/srcdata.txt' into table dept_partition2 partition(month='202011', day='01');
- Query partition data
select * from dept_partition2 where month='202011' and day='01';
9. Upload the data directly to the partition directory, three ways to associate the partition table and the data
- Way 1: add the partition first, then upload the data directly
Use the secondary partition table dept_partition2 created in section 8.
Add partition:
alter table dept_partition2 add partition(month='202001',day='01');
upload data:
hadoop fs -put srcdata.txt /user/hive/warehouse/dept_partition2/month=202001/day=01
Direct query:
select * from dept_partition2 where month='202001' and day='01';
- Way 2: add the partition after uploading the data
Create the partition folder at the target location, upload the data, and add the partition at the end:
hadoop fs -mkdir -p /user/hive/warehouse/dept_partition2/month=202002/day=02
hadoop fs -put srcdata.txt /user/hive/warehouse/dept_partition2/month=202002/day=02/
alter table dept_partition2 add partition(month='202002',day='02');
- Way 3: add the partition (which creates the folder), then load the data into it:
alter table dept_partition2 add partition(month='202003',day='03');
load data local inpath '/home/hive/srcdata.txt' into table dept_partition2 partition(month='202003',day='03');
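Whichever of the three ways is used, the result can be verified with the same kind of partition query as before; for example, for Way 3:

```sql
select * from dept_partition2 where month='202003' and day='03';
```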