hive 随机抽取 分流

背景:做AB testing, 需要分流出固定百分比70%(但数量不固定)的用户跑二级风控模型

问题:查了几个函数,都不能满足要求,比如ntile, tablesample(n percent) (这个是按size分的,不是行数)

解决:

with temp1 as (
    select ceil(count(*)*0.7) as cnt
    from report.report_mxdai_riskresult_backwid_sh
    where topichour="{{ymdh}}" 
), temp2 as (
    select 
        *
        ,row_number() over(order by rand()) as rn
        ,t1.cnt
    from report.report_mxdai_riskresult_backwid_sh t
    left join temp1 t1 on (1=1 )
    where topichour="{{ymdh}}"
)


select 
    id
    ,wid
    ,case when rn <= cnt then result 
    else 1
    end as result
    ,case when rn <= cnt then model_version 
    else 'v0'
    end as model_version
    ,create_time_mysql
    ,from_unixtime(unix_timestamp(),'yyyy-MM-dd HH:mm:ss') as create_time_hive 
from temp2

猜你喜欢

转载自blog.csdn.net/littlecarton/article/details/80228645