Hive window functions: taking the latest record at or before a given time based on a condition (conditional values inside a window function)

1. Using a Hive window function to take the latest record at or before a given time, based on a condition (a single window function that effectively covers two windows)

In the medical business, one visit can have multiple prescriptions, and those prescriptions may be settled at different times. The AI assistant may also recommend medications several times during a visit, so there are multiple recommendation logs. The log times do not match the prescription settlement times, and the logs can only be associated with settlements at the visit level. For each prescription settlement, we need the latest recommendation record made before it. Partitioning by visit alone gives only one window per visit, but a visit may have two prescriptions, so we need the recommendation that precedes each of the two. The approach is therefore to add a condition inside Hive's window function, and to include the settlement time in the partition, so that a single window function picks out two records, one per settlement.

select
    t1.*
    -- use_flag = 1 when this log's time equals the latest qualifying log time in its window
    ,case when substring(t1.gmt_created, 1, 19) = substring(t1.gmt_created_max, 1, 19) then 1 else 0 end as use_flag
from (
    select
        t1.*
        -- latest log time (log_type = '2-2') that is not later than the settlement time;
        -- NULL when no log in the window qualifies
        ,max(
            case
                when t1.log_type = '2-2'
                 and substring(t1.gmt_created, 1, 19) <= substring(t2.expense_date, 1, 19)
                then substring(t1.gmt_created, 1, 19)
            end
        ) over (partition by t1.visitCode, t2.expense_date) as gmt_created_max
    from wedw_dw.unfold_chdisease_gpt_opt_log_df t1
    -- one row per distinct settlement time of a visit
    left join (
        select
            visit_no
            ,mi_card_no
            ,expense_date
        from wedw_dw.doris_yyf_styy_txynhis_record_settle_bill_detail_df_tmp
        group by
            visit_no
            ,mi_card_no
            ,expense_date
    ) t2
        on t1.visitCode = t2.visit_no
       and t1.patientIdNo = t2.mi_card_no
) t1
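
The same pattern can be tried on toy data. The sketch below is only an illustration: it uses Python's built-in sqlite3 module (SQLite 3.25 or later, which supports window functions) as a stand-in for Hive, and the tables rec_log and settle, their columns, and all rows are hypothetical. It shows how partitioning by visit and settlement time, with the condition inside max(), yields one "latest recommendation" per settlement.

```python
import sqlite3

# In-memory database standing in for Hive; tables and rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table rec_log (visit_code text, log_type text, gmt_created text);
create table settle  (visit_no text, expense_date text);

-- four recommendation logs for one visit
insert into rec_log values
  ('V1', '2-2', '2023-11-01 09:00:00'),
  ('V1', '2-2', '2023-11-01 09:30:00'),
  ('V1', '2-2', '2023-11-01 10:30:00'),
  ('V1', '2-2', '2023-11-01 11:30:00');

-- two prescription settlements for the same visit
insert into settle values
  ('V1', '2023-11-01 10:00:00'),
  ('V1', '2023-11-01 11:00:00');
""")

rows = conn.execute("""
select expense_date, gmt_created
from (
    select
        l.gmt_created,
        s.expense_date,
        -- conditional max: only logs at or before the settlement time count
        max(case
                when l.log_type = '2-2' and l.gmt_created <= s.expense_date
                then l.gmt_created
            end
        ) over (partition by l.visit_code, s.expense_date) as gmt_created_max
    from rec_log l
    left join settle s on l.visit_code = s.visit_no
) t
where gmt_created = gmt_created_max   -- same role as use_flag = 1
order by expense_date
""").fetchall()

for expense_date, gmt_created in rows:
    print(expense_date, "<-", gmt_created)
```

With this data, the 10:00 settlement is paired with the 09:30 log and the 11:00 settlement with the 10:30 log; the 11:30 log comes after both settlements and is not selected.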

Origin blog.csdn.net/qq_44696532/article/details/134400534