Notes on the same SQL producing different results in Hive and Spark SQL

Table schemas

hive> desc gdm.dim_category;                                
name                    string         category name
org_code                string         category code

hive> select name, org_code from gdm.dim_category limit 2;
OK
鞋     _8_
鞋/男  _8_21_
hive> desc gdm.dim_product_brand;
brand_id                bigint                  brand ID
ch_name                 string                  brand name (Chinese)

hive> select brand_id, ch_name from gdm.dim_product_brand limit 2;
OK
1       nb
2       np               

The SQL to run

select
  t1.keyword,
  t3.name,
  t4.ch_name
from
(
  select "categoryIds:_8_" as keyword
  union all
  select "categoryIds:_8_21_" as keyword
  union all
  select "brandId:1" as keyword
) t1
left join gdm.dim_category t3
on split(t1.keyword, ":")[1] = t3.org_code and split(t1.keyword, ":")[0] = "categoryIds"
left join gdm.dim_product_brand t4
on split(t1.keyword, ":")[1] = t4.brand_id and split(t1.keyword, ":")[0] = "brandId"
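
Both join conditions key off the same expression. As a quick sanity check of what split produces here (a throwaway query, runnable in either a Hive or Spark SQL shell), the keyword breaks into a type prefix at index 0 and an id at index 1:

-- index 0 is the type prefix, index 1 is the id
select split("categoryIds:_8_21_", ":")[0],   -- categoryIds
       split("categoryIds:_8_21_", ":")[1];   -- _8_21_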

Result in Hive (wrong)

categoryIds:_8_	NULL	NULL
categoryIds:_8_21_	NULL	NULL
brandId:1	NULL	nb

Result in Spark SQL (correct)

categoryIds:_8_	鞋	NULL
categoryIds:_8_21_	鞋/男	NULL
brandId:1	NULL	nb

Cause

Because the brand_id column of gdm.dim_product_brand is a bigint, Hive converts the keyword join key to double when evaluating the brand join condition --> split(t1.keyword, ":")[1] = t4.brand_id (comparing a string with a bigint in Hive coerces both sides to double).

With the join key converted to double, split(t1.keyword, ":")[1] = t3.org_code no longer matches, so the result is NULL.
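
A minimal way to see why the double coercion breaks the category match (assuming a Hive shell): the category codes are not numeric, so casting them to double yields NULL, and a NULL join key never matches anything.

select cast("_8_21_" as double);   -- NULL: the category code is not a valid number
select cast("1" as double);        -- 1.0: which is why the brandId:1 row still matches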

Fix

split(t1.keyword, ":")[1] = t4.brand_id  -->  split(t1.keyword, ":")[1] = cast(t4.brand_id as string)
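
Applied to the original statement, the only change is on the brand join condition; both sides of that comparison are then strings, so no double coercion is introduced:

select
  t1.keyword,
  t3.name,
  t4.ch_name
from
(
  select "categoryIds:_8_" as keyword
  union all
  select "categoryIds:_8_21_" as keyword
  union all
  select "brandId:1" as keyword
) t1
left join gdm.dim_category t3
on split(t1.keyword, ":")[1] = t3.org_code and split(t1.keyword, ":")[0] = "categoryIds"
left join gdm.dim_product_brand t4
on split(t1.keyword, ":")[1] = cast(t4.brand_id as string) and split(t1.keyword, ":")[0] = "brandId"

With the cast in place, Hive should produce the same result as Spark SQL.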

Reposted from blog.csdn.net/xw514124202/article/details/83305309