Implementing an Oracle-style MERGE in Hive for Sqoop incremental extraction

Data extraction is a core part of building a data warehouse, and it often means pulling incremental data from business databases. Rows in those databases are not immutable, however: their state changes over time, and those changes must be synchronized into Hive. When the warehouse ran on Oracle, the MERGE statement could combine old and new data. Plain (non-transactional) Hive tables have no such statement, so this article shows how to merge the data automatically after extracting it with Sqoop.

Table Design

Each extracted table is split into three Hive tables:

  1. A table with the _arc suffix, which stores the merged daily snapshot, partitioned by the pt field.
  2. A table with the _inc suffix, which stores each day's extracted incremental data, partitioned by the pt field.
  3. A table without a suffix, which points to the latest snapshot and is the final table that downstream ETL tasks read.

Steps

  1. Use Sqoop's Hive import to load the day's data into the _inc table.
  2. The core step: run a merge SQL statement built from full join, coalesce, and if. It merges the current day's partition of the _inc table with the previous day's partition of the _arc table and writes the result to the current day's partition of the _arc table.
  3. Point the final table at the current day's partition of the _arc table with Hive's set location command.
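Step 1 can be sketched as a small Python helper that assembles the Sqoop command line for one day. This is a minimal sketch, not the author's script: the function name, the JDBC URL, the user name, the password file path, and the UPDATE_TIME filter column are all hypothetical, and it assumes that passing ods.mytable_inc to --hive-table is accepted by the Sqoop version in use.

```python
from datetime import date


def build_sqoop_import_args(source_table, hive_table, day, jdbc_url,
                            username, password_file, where):
    """Return the argv for importing one day's changed rows into <hive_table>_inc.

    All connection details are supplied by the caller; nothing here is taken
    from the original article except the pt partition key and the _inc suffix.
    """
    pt = day.strftime("%Y%m%d")  # partition value, e.g. 20200407
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,
        "--table", source_table,
        "--where", where,                      # selects only the day's changes
        "--hive-import",
        "--hive-overwrite",                    # makes a rerun of the same day idempotent
        "--hive-table", f"ods.{hive_table}_inc",
        "--hive-partition-key", "pt",
        "--hive-partition-value", pt,
    ]


# Hypothetical values, for illustration only.
args = build_sqoop_import_args(
    source_table="MYTABLE",
    hive_table="mytable",
    day=date(2020, 4, 7),
    jdbc_url="jdbc:oracle:thin:@//dbhost:1521/ORCL",
    username="etl",
    password_file="/user/etl/.oracle_password",
    where="UPDATE_TIME >= DATE '2020-04-07'",
)
print(" ".join(args))
```

The list form can be handed to subprocess.run without shell quoting issues, which matters because the where clause contains spaces and quotes.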

Key code:

Merge SQL

use ods;
-- Rows from today's increment (a) win; ids missing from it keep yesterday's snapshot row (b).
insert overwrite table mytable_arc partition (pt='20200407')
select coalesce(a.id, b.id), if(a.id is null, b.type, a.type), if(a.id is null, b.amt, a.amt) from (
  select id, type, amt
  from mytable_inc where pt='20200407'
) a full join (
  select id, type, amt
  from mytable_arc where pt='20200406'
) b on a.id = b.id;
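To see what the merge statement above produces, its semantics can be emulated on in-memory rows. The sketch below is only an illustration in Python, with made-up sample rows; it assumes id is unique and non-null on each side, which the SQL also relies on when it tests a.id is null.

```python
def merge_snapshot(inc_rows, prev_arc_rows, key="id"):
    """Emulate the FULL JOIN + coalesce/if merge on lists of dicts.

    For every key present on either side, the row from the increment is
    taken if it exists; otherwise the row from the previous snapshot is kept.
    """
    inc = {row[key]: row for row in inc_rows}
    arc = {row[key]: row for row in prev_arc_rows}
    # Sorting is only for a deterministic illustration; Hive gives no row order.
    return [inc[k] if k in inc else arc[k] for k in sorted(inc.keys() | arc.keys())]


# Made-up sample data: yesterday's snapshot and today's increment.
prev_arc = [
    {"id": 1, "type": "A", "amt": 10},
    {"id": 2, "type": "B", "amt": 20},
]
inc = [
    {"id": 2, "type": "B", "amt": 25},  # changed row
    {"id": 3, "type": "C", "amt": 30},  # new row
]

for row in merge_snapshot(inc, prev_arc):
    print(row)
```

Row 1 is carried over unchanged, row 2 takes the new amount, and row 3 is added. Note that a row deleted in the source system is never removed by this scheme, because it simply stays in the previous snapshot and is carried forward.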

Hive set location

use ods;
alter table mytable set location 'hdfs://hadoop01:9000/user/hive/warehouse/ods.db/mytable_arc/pt=20200407';
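Because both statements above hard-code the table name, the columns, and the dates, a daily job would normally generate them. The following is a hedged sketch of such a generator in Python; the function names and parameters are hypothetical, and it assumes the previous snapshot is always exactly one calendar day older and that the join key is a single column.

```python
from datetime import date, timedelta


def build_merge_sql(table, key, columns, day, database="ods"):
    """Build the merge statement for one day; columns excludes the key column."""
    pt = day.strftime("%Y%m%d")
    prev_pt = (day - timedelta(days=1)).strftime("%Y%m%d")
    col_list = ", ".join([key] + columns)
    # The key comes from whichever side has it; other columns prefer the increment.
    exprs = [f"coalesce(a.{key}, b.{key})"]
    exprs += [f"if(a.{key} is null, b.{c}, a.{c})" for c in columns]
    return (
        f"use {database};\n"
        f"insert overwrite table {table}_arc partition (pt='{pt}')\n"
        f"select {', '.join(exprs)}\n"
        f"from (select {col_list} from {table}_inc where pt='{pt}') a\n"
        f"full join (select {col_list} from {table}_arc where pt='{prev_pt}') b\n"
        f"on a.{key} = b.{key};"
    )


def build_set_location_sql(table, day, warehouse_uri, database="ods"):
    """Build the statement that points the suffix-less table at the day's _arc partition."""
    pt = day.strftime("%Y%m%d")
    location = f"{warehouse_uri}/{database}.db/{table}_arc/pt={pt}"
    return f"use {database};\nalter table {table} set location '{location}';"


day = date(2020, 4, 7)
print(build_merge_sql("mytable", "id", ["type", "amt"], day))
print(build_set_location_sql("mytable", day, "hdfs://hadoop01:9000/user/hive/warehouse"))
```

The generated text could then be passed to hive -e or beeline -e. Plain string formatting is acceptable here only because the table and column names come from trusted job configuration rather than user input.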

Origin www.cnblogs.com/hupingzhi/p/12654898.html