Bulk-inserting Hive query results into partitions

When building tables in Hive, we often create partitioned tables so that queries run efficiently. For example, take the following table:

create external table dm_fan_photo_icf_basic(user string, item string, hot int) 
PARTITIONED BY (day string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' stored as textfile location '/user/hive/fan/photo/icf/basic/';

This is an external table partitioned by day. In general, when inserting new data you must specify the target partition, e.g.

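-- static partition insert: select only the non-partition columns (assumes table_test has user, item, hot, day)
insert into table dm_fan_photo_icf_basic
PARTITION (day = '20130620')
select user, item, hot from table_test where day = 20130620;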

The statement above takes the rows of table_test whose day field is 20130620 and inserts them into dm_fan_photo_icf_basic. Sometimes, though, the new data covers more than one day, perhaps a whole month. The conventional approach is to write several SQL statements, changing the partition value to the appropriate date in each one. On the one hand that code is not concise; on the other hand it launches more than one job and does not make full use of the cluster. Loading all the data into the different partitions in one pass is more efficient. For example, to insert all table_test data from 20130620 onward into dm_fan_photo_icf_basic, with each row going to the partition for its day, you can use a dynamic-partition insert:

set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.dynamic.partition=true;
insert into table dm_fan_photo_icf_basic PARTITION (day)
select * from table_test where day >= 20130620 distribute by day;

The two set statements are necessary because this is a dynamic-partition insert. By default Hive expects static partitions: in strict mode at least one partition column must be given a static value.
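Besides the two switches above, Hive also caps how many partitions one dynamic insert may create. The fragment below is a minimal sketch that sets the two limits explicitly to what I understand to be their stock defaults; a month of daily partitions fits well inside them, so raising them is only likely to matter for much larger backfills.

```sql
-- Upper bound on dynamic partitions a single statement may create (default: 1000).
set hive.exec.max.dynamic.partitions=1000;
-- Upper bound on dynamic partitions per mapper/reducer node (default: 100).
set hive.exec.max.dynamic.partitions.pernode=100;
```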

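The distribute by day at the end should be kept: it sends all rows with the same day to the same reducer, so each partition is written by a single task instead of being scattered across many small files. The partition each row lands in is taken from the last column of the select, which therefore has to be day.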
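Because the partition value is read from the last select column, it can be safer to name the columns than to rely on select *. The sketch below assumes table_test has exactly the columns user, item, hot and day, which the original post does not state. On Hive versions where user is a reserved word, it may need to be quoted with backticks.

```sql
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- Name the columns explicitly; the dynamic partition column (day) must come last.
-- Assumption: table_test has columns user, item, hot and day.
insert into table dm_fan_photo_icf_basic partition (day)
select user, item, hot, day
from table_test
where day >= 20130620
distribute by day;

-- List the partitions to confirm that one was created per day.
show partitions dm_fan_photo_icf_basic;
```

If the insert worked, show partitions should list one day=... entry for each distinct day present in the selected rows.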

Origin www.cnblogs.com/shujuxiong/p/11161580.html