Table schemas
hive> desc gdm.dim_category;
name string category name
org_code string category code
hive> select name, org_code from gdm.dim_category limit 2;
OK
鞋 _8_
鞋/男 _8_21_
hive> desc gdm.dim_product_brand;
brand_id bigint brand ID
ch_name string brand name (Chinese)
hive> select brand_id, ch_name from gdm.dim_product_brand limit 2;
OK
1 nb
2 np
SQL to run
select
    t1.keyword,
    t3.name,
    t4.ch_name
from
(
    select "categoryIds:_8_" as keyword
    union all
    select "categoryIds:_8_21_" as keyword
    union all
    select "brandId:1" as keyword
) t1
left join gdm.dim_category t3
    on split(t1.keyword, ":")[1] = t3.org_code
    and split(t1.keyword, ":")[0] = "categoryIds"
left join gdm.dim_product_brand t4
    on split(t1.keyword, ":")[1] = t4.brand_id
    and split(t1.keyword, ":")[0] = "brandId"
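For reference, split breaks each keyword on the colon: index 0 is the prefix and index 1 is the code/ID part. A minimal check in the Hive shell (a sketch; the literal is one of the keywords above):

```sql
-- split returns an array<string>; [0] is the prefix, [1] is the id/code
select split("brandId:1", ":")[0], split("brandId:1", ":")[1];
-- brandId    1
```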
Result in Hive (wrong)
categoryIds:_8_ NULL NULL
categoryIds:_8_21_ NULL NULL
brandId:1 NULL nb
Result in Spark SQL (correct)
categoryIds:_8_ 鞋 NULL
categoryIds:_8_21_ 鞋/男 NULL
brandId:1 NULL nb
Cause
The brand_id column in gdm.dim_product_brand is of type bigint.
To evaluate the join condition split(t1.keyword, ":")[1] = t4.brand_id, Hive therefore coerces both sides to double, and it applies that conversion to the shared join key split(t1.keyword, ":")[1].
As a result, the comparison split(t1.keyword, ":")[1] = t3.org_code also fails: codes like "_8_" are not valid numbers, cast to NULL as double, and never match, so those rows come back NULL.
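The coercion is easy to reproduce in the Hive shell (a sketch; "_8_" is one of the category codes above):

```sql
-- Hive's common type for comparing STRING with BIGINT is DOUBLE.
-- A non-numeric code casts to NULL, so the category join key can never match:
select cast("_8_" as double);   -- NULL
-- A numeric string survives the cast, which is why brandId:1 still matched:
select cast("1" as double);
```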
Fix
Cast brand_id to string so both sides of every join key are compared as strings:
split(t1.keyword, ":")[1] = t4.brand_id  -->  split(t1.keyword, ":")[1] = cast(t4.brand_id as string)
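Applied to the full query, only the brand join condition changes:

```sql
select
    t1.keyword,
    t3.name,
    t4.ch_name
from
(
    select "categoryIds:_8_" as keyword
    union all
    select "categoryIds:_8_21_" as keyword
    union all
    select "brandId:1" as keyword
) t1
left join gdm.dim_category t3
    on split(t1.keyword, ":")[1] = t3.org_code
    and split(t1.keyword, ":")[0] = "categoryIds"
left join gdm.dim_product_brand t4
    -- cast keeps the key as STRING, so Hive no longer falls back to DOUBLE
    on split(t1.keyword, ":")[1] = cast(t4.brand_id as string)
    and split(t1.keyword, ":")[0] = "brandId"
```

With string-to-string comparisons throughout, Hive and Spark SQL return the same (correct) result.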