How to implement IF ELSE logic in Hive: reading from different tables depending on a condition, for the case where the data warehouse has to handle an upstream table with 0 records

Background

The upstream business system refreshes this table in batch: it truncates the table and then inserts the new data. When that task fails, the table is left empty, so the ODS layer of the data warehouse extracts no data (0 records). The table is an organization dimension table that a large number of downstream data models depend on, so the failure causes widespread data anomalies and degrades the BI user experience.

Solutions

  • Solution 1: Let the scheduling tool make the decision. If the data extracted from the upstream table on the current day is empty, fall back to yesterday's ODS layer data (this is an organizational-structure dimension table, which does not change frequently). However, the data development platform and scheduling tool we purchased do not support this kind of branching. DolphinScheduler provides similar functionality.
  • Solution 2: Use a shell script plus SQL to check the state of the current day's upstream data, and fall back to yesterday's ODS layer data if it is empty. Calling SQL statements from a shell script is cumbersome, and Kerberos authentication for Hive has to be configured separately. Not elegant enough.
  • Solution 3: Implement the IF ELSE logic directly in Hive SQL. If the upstream table data extracted on the current day is empty, fall back to yesterday's ODS layer data. This solution reuses the platform's scheduling and its integrated Kerberos authentication, which makes it the most attractive option. But how can such logic be implemented?

Implementation

Suppose the staging table that DataX loads from the upstream system is o_hcm.org_unit_record_stg, and the ODS layer table is o_hcm.org_unit_record.

  • Ordinary procedural logic (pseudocode)
if (select count(1) from o_hcm.org_unit_record_stg) == 0
    select * from o_hcm.org_unit_record
else
    select * from o_hcm.org_unit_record_stg
  • SQL implementation
SELECT t2.*  -- only the ODS columns, so both UNION ALL branches return the same columns
  FROM (
       SELECT COUNT(1) AS cnt FROM o_hcm.org_unit_record_stg
       ) t1
  INNER JOIN o_hcm.org_unit_record t2 ON t1.cnt = 0  -- returns rows only when org_unit_record_stg is empty
UNION ALL
SELECT * FROM o_hcm.org_unit_record_stg t1  -- returns rows only when org_unit_record_stg is not empty
;

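The INNER JOIN against the one-row count subquery acts as the IF branch. The join condition t1.cnt = 0 holds for every row of org_unit_record when the staging table is empty and for none of them otherwise, so the first branch returns either the whole ODS table or nothing. The second branch needs no guard, because an empty staging table contributes no rows.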
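A minimal sketch of how this query might be wired into the daily load. It assumes the ODS table is a non-partitioned, non-transactional full snapshot that is overwritten each day, and that both tables have the same columns in the same order; neither assumption is stated in the original. The UNION ALL is wrapped in a subquery because some older Hive releases only accept UNION ALL inside a subquery.

```sql
-- Sketch only. Assumes o_hcm.org_unit_record is a non-partitioned full-snapshot
-- table and that both tables share the same column list and order.
INSERT OVERWRITE TABLE o_hcm.org_unit_record
SELECT u.*
  FROM (
       -- IF branch: yesterday's ODS rows, kept only when staging is empty
       SELECT t2.*
         FROM (
              SELECT COUNT(1) AS cnt FROM o_hcm.org_unit_record_stg
              ) t1
        INNER JOIN o_hcm.org_unit_record t2 ON t1.cnt = 0

       UNION ALL

       -- ELSE branch: today's staging rows; contributes nothing when staging is empty
       SELECT t3.* FROM o_hcm.org_unit_record_stg t3
       ) u;
```

Because the join has no key, Hive runs it as a Cartesian product against a single row. Clusters that enable strict checks on Cartesian products (for example hive.strict.checks.cartesian.product) may reject it, so it is worth testing against your own configuration and Hive version first.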

Extended Thinking

  • Can every IF ELSE pattern be built from UNION ALL plus a Cartesian product/JOIN?
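One way to explore this question: when the ELSE branch does not guard itself (as an empty staging table does), both branches can be joined to the same one-row condition subquery with opposite predicates. The sketch below is an illustration rather than something from the original post. It treats a staging load with fewer than 100 rows as failed, and the threshold of 100 is a hypothetical value. It suggests that any condition that can be computed as a single-row flag fits the pattern, provided all branches return the same columns.

```sql
-- Sketch only. The 100-row threshold is a hypothetical example value.
-- Assumes both tables share the same column list and order.
SELECT u.*
  FROM (
       -- IF branch: staging looks incomplete, fall back to the ODS table
       SELECT o.*
         FROM o_hcm.org_unit_record o
        CROSS JOIN (
              SELECT CASE WHEN COUNT(1) < 100 THEN 1 ELSE 0 END AS use_ods
                FROM o_hcm.org_unit_record_stg
              ) c
        WHERE c.use_ods = 1

       UNION ALL

       -- ELSE branch: staging looks complete, use it
       SELECT s.*
         FROM o_hcm.org_unit_record_stg s
        CROSS JOIN (
              SELECT CASE WHEN COUNT(1) < 100 THEN 1 ELSE 0 END AS use_ods
                FROM o_hcm.org_unit_record_stg
              ) c
        WHERE c.use_ods = 0
       ) u;
```

An aggregate without GROUP BY always returns exactly one row, even on an empty table, so the flag row is always present and exactly one branch produces output. The condition subquery is written twice here to keep the sketch simple; on Hive versions that support common table expressions it could be factored into a WITH clause.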

Origin blog.csdn.net/zdsx1104/article/details/128782570