'DataFrame' object has no attribute 'map'

While testing Python Spark SQL code, I found that in Spark 1.6 the following call to a DataFrame's map method


session_pv = sqlContext.sql("""SELECT session_id, COUNT(1) AS cnt FROM tmp_page_views GROUP BY session_id ORDER BY cnt DESC LIMIT 20""")\
             .map(lambda output: output.session_id + "\t" + str(output.cnt))

runs fine, because before Spark 2.0 spark_df.map was simply an alias for spark_df.rdd.map(). In a Spark 2.2 environment, however, the same call raises 'DataFrame' object has no attribute 'map': the conversion must now be made explicit by calling spark_df.rdd.map(). The code has to be written as

session_pv = sqlContext.sql("""SELECT session_id, COUNT(1) AS cnt FROM tmp_page_views GROUP BY session_id ORDER BY cnt DESC LIMIT 20""")\
             .rdd.map(lambda output: output.session_id + "\t" + str(output.cnt))

before it will run correctly.
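For context, here is a minimal, self-contained sketch of the Spark 2.x version. The SparkSession setup and the sample rows for tmp_page_views are illustrative assumptions, not part of the original post; in Spark 2.x, spark.sql() takes over the role of sqlContext.sql().

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("session_pv_demo").getOrCreate()

# Hypothetical sample rows standing in for the real tmp_page_views table.
rows = [("s1", "p1"), ("s1", "p2"), ("s1", "p3"), ("s2", "p1")]
spark.createDataFrame(rows, ["session_id", "page_id"]) \
     .createOrReplaceTempView("tmp_page_views")

# Spark 2.x: drop down to the RDD explicitly before calling map().
session_pv = spark.sql("""SELECT session_id, COUNT(1) AS cnt FROM tmp_page_views GROUP BY session_id ORDER BY cnt DESC LIMIT 20""") \
             .rdd.map(lambda output: output.session_id + "\t" + str(output.cnt))

print(session_pv.collect())  # ['s1\t3', 's2\t1'] with the sample rows above

If the only goal is a tab-separated string column, one way to avoid the RDD round-trip entirely (again a sketch, not from the original post) is to stay in the DataFrame API with concat_ws:

from pyspark.sql import functions as F

session_pv_df = spark.table("tmp_page_views") \
    .groupBy("session_id").agg(F.count("*").alias("cnt")) \
    .orderBy(F.desc("cnt")).limit(20) \
    .select(F.concat_ws("\t", F.col("session_id"), F.col("cnt").cast("string")))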


Reposted from blog.csdn.net/lepton126/article/details/86626311