Spark concat_ws, collect_set

concat_ws

hive > select product_id, concat_ws('_',collect_set(promotion_id)) as promotion_ids from product_promotion group by product_id;
OK
5112 960024_960025_960026_960027_960028
5113 960043_960044_960045_960046
Time taken: 3.116 seconds
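Together with collect_set, concat_ws merges multiple rows into a single row: collect_set gathers the distinct promotion_id values of each product_id into an array, and concat_ws joins that array into one string using the given separator.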
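To make the mechanics concrete, here is a minimal plain-Python sketch of what the query computes. It is only a model of the semantics, not how Hive or Spark implement it; the sample rows and the helper name `concat_collect_set` are made up for illustration.

```python
from collections import defaultdict

# Hypothetical (product_id, promotion_id) rows; one promotion is repeated on purpose.
rows = [
    (5112, "960024"), (5112, "960025"), (5112, "960024"),
    (5113, "960043"), (5113, "960044"),
]

def concat_collect_set(rows, sep="_"):
    """Group by key, keep distinct values (collect_set), join them (concat_ws)."""
    groups = defaultdict(dict)
    for key, value in rows:
        # Dict keys drop duplicates and keep first-seen order.
        groups[key][value] = None
    return {key: sep.join(values) for key, values in groups.items()}

merged = concat_collect_set(rows)
print(merged)
```

Unlike this sketch, the real collect_set makes no promise about the order of the collected values, so the joined string may come out in a different order from run to run.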

collect_set

from pyspark.sql import functions as F

F.collect_set("di_ware_no")

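Here collect_set gathers the values of di_ware_no into an array and removes duplicates; the order of the collected values is not guaranteed. Note that di_ware_no should be of string type when the result is passed on to concat_ws, because Hive's concat_ws only accepts string or array<string> arguments. If the column has another type, cast it to string first.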

Reposted from blog.csdn.net/kwame211/article/details/80505939