[Big Data] Hive Series - Hive-Partition Table (Static Partition and Dynamic Partition)

Partition Table

Each partition of a partitioned table corresponds to its own directory on the HDFS file system, and that directory holds all the data files of the partition. In other words, partitions in Hive are sub-directories that divide a large data set into smaller data sets according to business needs. When the WHERE clause of a query filters on the partition field, Hive reads only the matching partition directories instead of scanning the whole table, which can greatly improve query efficiency.

Basic operation of partition table

create partition table syntax

Note: The partition field must not repeat a column that is already defined in the table. Its value is not stored in the data files; it can be regarded as a pseudo-column of the table.

create table user_partition(
	no int,
	name string
)
partitioned by (day string)
row format delimited fields terminated by '\t';
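
To illustrate the note above, the following sketch contrasts a definition that Hive rejects with the normal use of the partition field as a pseudo-column. The table name bad_partition is made up for this illustration and does not appear elsewhere in the article.

```sql
-- Hypothetical table, for illustration only.
-- Rejected: day is declared both as a regular column and as a partition field.
create table bad_partition(
	no int,
	day string
)
partitioned by (day string);

-- With user_partition, day appears only in the partitioned by clause,
-- yet it can still be selected like an ordinary column.
select no, name, day from user_partition limit 10;
```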

Load data into partition table

prepare data

20230312.log
20230313.log
20230314.log

Load the data

Note: When loading data into a partitioned table, you must specify the target partition.

load data local inpath
'/data/20230312.log' into table user_partition partition(day='20230312');

load data local inpath
'/data/20230313.log' into table user_partition partition(day='20230313');

load data local inpath
'/data/20230314.log' into table user_partition partition(day='20230314');
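
Once the three files are loaded, each one sits in its own partition directory, and a filter on day decides which directories are read. A minimal sketch, assuming the three load statements above succeeded:

```sql
-- Reads only the day=20230312 partition directory.
select no, name from user_partition where day = '20230312';

-- Reads two partitions; the day=20230312 directory is skipped.
select day, count(*) as cnt
from user_partition
where day in ('20230313', '20230314')
group by day;
```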

add partition

Create a single partition

alter table user_partition add partition(day='20230311');

Create multiple partitions at the same time

alter table user_partition add partition(day='20230309') partition(day='20230310');
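
If a partition may already exist, the plain add partition statement fails. A hedged variant that uses the if not exists clause with the same partition values as above:

```sql
-- Skips partitions that are already registered instead of raising an error.
alter table user_partition add if not exists
	partition(day='20230309')
	partition(day='20230310');
```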

delete partition

delete a single partition

alter table user_partition drop partition (day='20230309');

Delete multiple partitions at the same time

alter table user_partition drop partition (day='20230311'), partition(day='20230310');
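
Two variations that may be useful in scripts are sketched below. Keep in mind that user_partition is a managed table, so dropping a partition also deletes its data files. The second statement assumes that the Hive version in use accepts comparison operators in the partition spec of drop partition.

```sql
-- No error if the partition is already gone.
alter table user_partition drop if exists partition (day='20230309');

-- Assumption: comparison operators are supported in this Hive version.
-- Removes every partition whose day value sorts before 20230312.
alter table user_partition drop if exists partition (day < '20230312');
```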

Check how many partitions the partition table has

show partitions user_partition; 

View partition table structure

desc formatted user_partition;
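
Both commands also accept a partition spec, which narrows the output to a single partition. A short sketch using a partition loaded earlier:

```sql
-- List only the partitions that match the given spec.
show partitions user_partition partition(day='20230312');

-- Show the metadata of one partition, including its HDFS location.
desc formatted user_partition partition (day='20230312');
```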

Secondary partition

Create a secondary partition table

create table access_log( id int, name string, loc string
) partitioned by (day string, hour string);

load data normally

Load data into the secondary partition table

load data local inpath '/data/access_20230312.log' into table access_log partition(day='202303', hour='12');

Query partition data

select * from access_log where day='202303' and hour='12';
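
With two partition fields, the directories are nested in the order the fields were declared. The sketch below shows the resulting layout as a comment and a query that filters on the first level only; it assumes the load statement above succeeded.

```sql
-- Directory created by the load above (relative to the table's location):
--   access_log/day=202303/hour=12/access_20230312.log

-- Filter on the first level only: every hour sub-directory under day=202303 is read.
select count(*) from access_log where day = '202303';

-- List the second-level partitions that exist under day=202303.
show partitions access_log partition(day='202303');
```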

Three ways to directly upload data to the partition directory and associate the partition table with the data

  • Method 1: Upload the data, then repair the table
hive (default)> dfs -mkdir -p
/hive/warehouse/op_log.db/access_log/day=202303/hour=12;

hive (default)> dfs -put /data/access_20230312.log
/hive/warehouse/op_log.db/access_log/day=202303/hour=12;

Query the data (if the partition has not yet been registered in the metastore, the data just uploaded is not returned)

select * from access_log where day='202303' and hour='12';

Execute the repair command

msck repair table access_log; 

Run the query again (the uploaded data is now returned)
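
The whole sequence for Method 1 can be sketched as follows. The repair command scans the table directory and registers partitions that exist on HDFS but are missing from the metastore. The comments describe the expected behaviour under the assumption that the partition was never registered before the upload.

```sql
-- Before the repair: the uploaded directory is not listed.
show partitions access_log;

-- Register partitions found on HDFS but missing from the metastore.
msck repair table access_log;

-- After the repair: day=202303/hour=12 should be listed, and the query sees the uploaded file.
show partitions access_log;
select * from access_log where day='202303' and hour='12';
```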

  • Method 2: Upload the data, then add the partition
hive (default)> dfs -mkdir -p
/hive/warehouse/op_log.db/access_log/day=202304/hour=14; 
hive (default)> dfs -put /data/access_20230414.log
/hive/warehouse/op_log.db/access_log/day=202304/hour=14;

Execute add partition

hive (default)> alter table access_log add partition(day='202304',hour='14');

execute query
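
When the uploaded directory is not in the default place under the table directory, the partition can be pointed at it explicitly with a location clause. A sketch, in which the path simply repeats the one used above and is otherwise an assumption about the warehouse layout:

```sql
-- Register the partition and state where its files live.
alter table access_log add if not exists
	partition(day='202304', hour='14')
	location '/hive/warehouse/op_log.db/access_log/day=202304/hour=14';

-- The uploaded file is now visible to queries.
select * from access_log where day='202304' and hour='14';
```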

  • Method 3: Create the directory, then load the data into the partition
    Create the directory
hive (default)> dfs -mkdir -p
/hive/warehouse/op_log.db/access_log/day=202303/hour=15; 

Load the data

hive (default)> load data local inpath '/data/access_20230315.log' into table access_log partition(day='202303',hour='15');

execute query

Dynamic Partition Adjustment

In a relational database, when data is inserted into a partitioned table, the database automatically places each row in the corresponding partition according to the value of the partition field. Hive provides a similar mechanism, dynamic partitioning (Dynamic Partition), but it must be configured before it can be used.

Enable dynamic partition parameter settings

Related configuration items

Enable the dynamic partition function (default true, enabled)

hive.exec.dynamic.partition=true

Set the dynamic partition mode to non-strict. The default mode, strict, requires at least one partition field to be specified as a static partition; nonstrict mode allows all partition fields to be dynamic.

hive.exec.dynamic.partition.mode=nonstrict

The maximum total number of dynamic partitions that can be created across all nodes that execute the MR job. Default: 1000.

hive.exec.max.dynamic.partitions=1000

The maximum number of dynamic partitions that can be created on each node that executes MR. This parameter needs to be set according to the actual data. For example, if the source data contains one year of data, that is, the day field has 365 distinct values, this parameter must be set to at least 365; with the default value of 100, an error is reported.

hive.exec.max.dynamic.partitions.pernode=100

The maximum number of HDFS files that can be created in the entire MR job. Default: 100000.

hive.exec.max.created.files=100000

Whether to throw an exception when an empty partition is generated. Generally this does not need to be changed. Default: false.

hive.error.on.empty.partition=false
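
In practice these items are usually set per session with the set command before running the insert. A minimal sketch; the per-node value of 400 is an illustrative assumption for a source that holds one year of daily data, not a recommendation:

```sql
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions=1000;
-- Illustrative value: above 365 so that one year of daily partitions fits on one node.
set hive.exec.max.dynamic.partitions.pernode=400;

-- Print the current value of a setting.
set hive.exec.dynamic.partition.mode;
```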

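Example

Requirement: Insert the data from the user table into the corresponding partitions of the target table user_partition according to the region (loc field). Note that this example reuses the table name user_partition with a different schema, so the table from the earlier section must be dropped first (or another name chosen).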

  • Create target partition table
hive (default)> create table user_partition(id int, name string) partitioned by (loc int) row format delimited fields terminated by '\t';
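  • Set up dynamic partitions and insert the data (user is a reserved word in recent Hive versions, hence the backticks)
hive (default)> set hive.exec.dynamic.partition.mode = nonstrict;
hive (default)> insert into table user_partition partition(loc) select id, name, loc from `user`;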
  • View the partition status of the target partition table
hive (default)> show partitions user_partition; 
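
Static and dynamic partition fields can also be mixed in one insert, which works even in strict mode because at least one field is static. The sketch below targets the access_log table from the secondary partition section; the staging table access_log_staging and its columns are hypothetical and exist only for this illustration.

```sql
-- Hypothetical staging table holding unpartitioned rows.
create table access_log_staging(id int, name string, loc string, hour string);

-- day is static, hour is dynamic.
-- Static fields must come before dynamic ones in the partition clause,
-- and the dynamic partition column must come last in the select list.
insert into table access_log partition(day='202303', hour)
select id, name, loc, hour from access_log_staging;

-- One sub-partition per distinct hour value should now be listed.
show partitions access_log partition(day='202303');
```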

Origin blog.csdn.net/u013412066/article/details/129539412