A partition of a Hive table corresponds to a separate subdirectory under the table's folder on HDFS, and that subdirectory holds all of the partition's data files. Partitioning in Hive is thus directory-based: it divides a large data set into smaller data sets according to business needs (for example, by date). When a query's WHERE clause selects specific partitions, Hive scans only the matching directories, so query efficiency improves considerably.
Partition table basic operations
1. Why partition tables (for example, managing logs by date)
/user/hive/warehouse/log_partition/20170702/20170702.log
/user/hive/warehouse/log_partition/20170703/20170703.log
/user/hive/warehouse/log_partition/20170704/20170704.log
2. Syntax for creating a partition table
hive (default)> create table dept_partition(
                deptno int, dname string, loc string
                )
                partitioned by (month string)
                row format delimited fields terminated by '\t';
Note: the partition column is not part of the data stored in the table's files; it behaves as a pseudo column that can nonetheless be referenced in queries like a regular column.
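Because the partition pseudo column can appear in the select list and the WHERE clause, it can be queried directly even though it is never stored in the data files. A minimal sketch (assuming the partitions loaded in the next step):

hive (default)> select deptno, dname, month from dept_partition where month='201709';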
3. Loading data into a partitioned table
hive (default)> load data local inpath '/opt/module/datas/dept.txt' into table default.dept_partition partition(month='201709');
hive (default)> load data local inpath '/opt/module/datas/dept.txt' into table default.dept_partition partition(month='201708');
hive (default)> load data local inpath '/opt/module/datas/dept.txt' into table default.dept_partition partition(month='201707');
Note: when loading data into a partition table, you must specify the target partition.
4. Query data in the partition table
Single partition query
hive (default)> select * from dept_partition where month='201709';
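To confirm that Hive prunes the scan down to the single partition directory, the query plan can be inspected with EXPLAIN (a sketch; the plan text varies by Hive version, so no output is shown here):

hive (default)> explain select * from dept_partition where month='201709';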
Querying multiple partitions together
hive (default)> select * from dept_partition where month='201709'
                union
                select * from dept_partition where month='201708'
                union
                select * from dept_partition where month='201707';

_u3.deptno      _u3.dname       _u3.loc         _u3.month
10              ACCOUNTING      NEW YORK        201707
10              ACCOUNTING      NEW YORK        201708
10              ACCOUNTING      NEW YORK        201709
20              RESEARCH        DALLAS          201707
20              RESEARCH        DALLAS          201708
20              RESEARCH        DALLAS          201709
30              SALES           CHICAGO         201707
30              SALES           CHICAGO         201708
30              SALES           CHICAGO         201709
40              OPERATIONS      BOSTON          201707
40              OPERATIONS      BOSTON          201708
40              OPERATIONS      BOSTON          201709
5. Add partitions
Create a single partition
hive (default)> alter table dept_partition add partition(month='201706');
Create multiple partitions
hive (default)> alter table dept_partition add partition(month='201705') partition(month='201704');
6. Delete partition
Delete a single partition
hive (default)> alter table dept_partition drop partition (month='201704');
Delete multiple partitions
hive (default)> alter table dept_partition drop partition (month='201705'), partition (month='201706');
7. View how many partitions the partition table has
hive> show partitions dept_partition;
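With the three partitions loaded earlier (and before the add/drop examples above are applied), the output would look roughly like the following; Hive lists one partition spec per line:

hive> show partitions dept_partition;
partition
month=201707
month=201708
month=201709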
8. View the partition table structure
hive> desc formatted dept_partition;

# Partition Information
# col_name              data_type               comment
month                   string
Partition Table Notes
1. Create a secondary (two-level) partition table
hive (default)> create table dept_partition2(
                deptno int, dname string, loc string
                )
                partitioned by (month string, day string)
                row format delimited fields terminated by '\t';
2. Load data normally
(1) Load data into the secondary partition table
hive (default)> load data local inpath '/opt/module/datas/dept.txt' into table default.dept_partition2 partition(month='201709', day='13');
(2) Query the partitioned data
hive (default)> select * from dept_partition2 where month='201709' and day='13';
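Each partition key adds one directory level under the table's HDFS folder, so the rows loaded above land in a nested month/day directory. A sketch of inspecting that layout (the path assumes the default warehouse location used throughout this section):

hive (default)> dfs -ls /user/hive/warehouse/dept_partition2/month=201709;

A subdirectory named day=13 should be listed.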
3. Three ways to associate data uploaded directly into a partition directory with the partition table
(1) Method one: upload the data, then repair the table
Upload the data
hive (default)> dfs -mkdir -p /user/hive/warehouse/dept_partition2/month=201709/day=12;
hive (default)> dfs -put /opt/module/datas/dept.txt /user/hive/warehouse/dept_partition2/month=201709/day=12;
Query the data (the newly uploaded data is not yet visible)
hive (default)> select * from dept_partition2 where month='201709' and day='12';
Run the repair command
hive> msck repair table dept_partition2;
Query the data again (it is now visible)
hive (default)> select * from dept_partition2 where month='201709' and day='12';
(2) Method two: upload the data, then add the partition
Upload the data
hive (default)> dfs -mkdir -p /user/hive/warehouse/dept_partition2/month=201709/day=11;
hive (default)> dfs -put /opt/module/datas/dept.txt /user/hive/warehouse/dept_partition2/month=201709/day=11;
Add the partition
hive (default)> alter table dept_partition2 add partition(month='201709',day='11');
Query data
hive (default)> select * from dept_partition2 where month='201709' and day='11';
(3) Method three: create the directory, then load the data into the partition
Create a directory
hive (default)> dfs -mkdir -p /user/hive/warehouse/dept_partition2/month=201709/day=10;
Load the data
hive (default)> load data local inpath '/opt/module/datas/dept.txt' into table dept_partition2 partition(month='201709',day='10');
Query data
hive (default)> select * from dept_partition2 where month='201709' and day='10';