Two ways to quickly copy dynamic partitions in Hive

 

 

During earlier maintenance of the Hive warehouse, a field was added to a Hive partitioned table for a temporary need. After a few days of thought, it became clear that the business did not actually need this field.

Note that when adding a column to a partitioned table, the ALTER statement must include CASCADE; otherwise, queries against an existing day's partition will not see the new column:

 

 

alter table ods_teach_online_coursewares ADD COLUMNS (ccdl_begtime string COMMENT 'Teaching start time') CASCADE;

 

 

 

The main scenario discussed here: columns were added to a partitioned table and later turn out not to be needed, so they have to be removed. There are two ways to do this:

 

Method 1 (my usual approach): do it with SQL.

E.g., table 1 needs column 1 and column 2 removed. First, create a copy of the table with columns 1 and 2 removed.
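For the courseware example below, that backup DDL might look like the following sketch. The kept columns mirror the select list used later, but every column type is assumed to be string here, since the original table's DDL is not shown; take the real types from `show create table` on the original table.

```sql
-- Sketch only: kept columns copied from the original table, unwanted
-- columns left out. All types are assumed (string) because the original
-- DDL is not shown; use the real types from `show create table`.
create table ods_teach_online_coursewares_bak (
  province_id string, province_name string, city_id string, city_name string,
  county_id string, county_name string, school_id string, school_name string,
  grade string, class_id string, class_name string, subject_id string,
  subject_name string, book_id string, book_name string, unit_id string,
  unit_name string, ccl_coursewares_id string, coursewares_name string,
  is_collect string, pid string, courseware_creator string, creator_name string,
  creator_icon string, courseware_owner string, owner_name string,
  owner_icon string, ccl_id string, ccl_begtime string, ccl_endtime string,
  duration string, ccdl_type string, resource_count string, ccl_type string
)
partitioned by (day string);
```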

Then run the following on the Hive command line:

 

 

set hive.exec.dynamic.partition.mode=nonstrict;  -- must be set for dynamic partitioning

insert overwrite table ods_teach_online_coursewares_bak partition(day)
select  -- when selecting specific column names, the partition column day must be listed explicitly, and last
province_id,
province_name,
city_id,
city_name,
county_id,
county_name,
school_id,
school_name,
grade,
class_id,
class_name,
subject_id,
subject_name,
book_id,
book_name,
unit_id,
unit_name,
ccl_coursewares_id,
coursewares_name,
is_collect,
pid,
courseware_creator,
creator_name,
creator_icon,
courseware_owner,
owner_name,
owner_icon,
ccl_id,
ccl_begtime,
ccl_endtime,
duration,
ccdl_type,
resource_count,
ccl_type,
day
from ods_teach_online_coursewares  distribute by day;
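After the overwrite finishes, it is easy to confirm that dynamic partitioning created all the day partitions in the backup table. `show partitions` is a standard Hive command; the date below is just an example value:

```sql
-- List the partitions created in the backup table
show partitions ods_teach_online_coursewares_bak;

-- Spot-check one partition's row count against the source table
select count(*) from ods_teach_online_coursewares_bak where day = '2016-12-12';
select count(*) from ods_teach_online_coursewares     where day = '2016-12-12';
```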

 

 

While the overwrite runs, Hive logs each day partition as it is loaded (the screenshot of that output is not reproduced here).

If you are copying all of the table's columns rather than a subset, it can be written as:

insert overwrite table tmp_test partition(day) select * from dm_login_class_user_count_distribution_semester distribute by day;

 

 

 

Method 2: use a combination of the hadoop fs -cp command and Hive's MSCK REPAIR command.

1. Create the target table: create table tmp_test1 like dm_login_class_user_count_distribution_semester;

2. Copy the original table's HDFS data into the target table's HDFS directory: hadoop fs -cp hdfs://Galaxy/user/hive/warehouse/dev_treasury.db/dm_login_class_user_count_distribution_semester/* hdfs://Galaxy/user/hive/warehouse/dev_treasury.db/tmp_test1/

3. In the Hive CLI, run MSCK REPAIR TABLE tmp_test1; to register the copied partitions in the metastore.

4. Verify that the data was loaded (querying the new target table):

 > select * from tmp_test1 where day='2016-12-12' limit 1;
OK
2016-12-12      4                                       3301            0                                       EDUCATION_STAFF 769     896     0       2016-12-12
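The four steps of method 2 can be strung together in one shell sketch. The paths and table names are the ones from the example above; the sketch assumes the `hive` and `hadoop` CLIs are on the PATH and pointed at the right cluster.

```shell
#!/usr/bin/env bash
set -e

SRC=dm_login_class_user_count_distribution_semester
DST=tmp_test1
WAREHOUSE=hdfs://Galaxy/user/hive/warehouse/dev_treasury.db

# 1. Create the target table with the same schema as the source
hive -e "create table ${DST} like ${SRC};"

# 2. Copy the source table's partition directories into the target's directory
hadoop fs -cp "${WAREHOUSE}/${SRC}/*" "${WAREHOUSE}/${DST}/"

# 3. Register the copied partitions in the metastore
hive -e "MSCK REPAIR TABLE ${DST};"

# 4. Spot-check one partition
hive -e "select * from ${DST} where day='2016-12-12' limit 1;"
```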

 

 
