RDD, Spark SQL, DF: grouping by distinct values

1、RDD

# Show distinct records: distinct values of field x[2] (gender, if x[1] is age as in the SQL and DF versions)
a=userrdd.map(lambda x:x[2]).distinct().collect()
print(a)

# Distinct (age, gender) combinations
a=userrdd.map(lambda x:(x[1],x[2])).distinct().collect()
print(a)
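To see what the two RDD calls above compute, here is a plain-Python sketch that needs no Spark. The sample rows are made up, and the column order (id, age, gender, occupation, zip code) is an assumption that the source does not state. An RDD's `distinct()` guarantees no particular order, so the sketch sorts the result for display.

```python
# Hypothetical sample rows; the column order (id, age, gender, occupation, zip)
# is an assumption, not something the source states.
rows = [
    ["1", "24", "M", "technician", "85711"],
    ["2", "53", "F", "other", "94043"],
    ["3", "24", "M", "writer", "32067"],
    ["4", "24", "F", "technician", "43537"],
]


def distinct_by(rows, project):
    """Mimic rdd.map(project).distinct().collect() on a local list."""
    # A set drops duplicates; sorting only makes the output deterministic.
    return sorted(set(project(r) for r in rows))


print(distinct_by(rows, lambda x: x[2]))          # one field
print(distinct_by(rows, lambda x: (x[1], x[2])))  # a pair of fields
```

Projecting to a tuple before deduplicating is what makes the second call return distinct combinations rather than distinct values per column.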

2、Spark SQL


# Spark SQL: distinct genders, then distinct (age, gender) combinations

sqlContext.sql("select distinct gender from user_table").show()
sqlContext.sql("select distinct age,gender from user_table").show()

3、DF

# DataFrame: distinct genders, then distinct (age, gender) combinations
user_df.select("gender").distinct().show()
user_df.select("age","gender").distinct().show()
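The snippets in all three sections assume that `userrdd`, `user_df`, and the `user_table` view already exist. One possible setup is sketched below, assuming input lines shaped like `id|age|gender|occupation|zip` and a local Spark installation. The helper names `parse_user` and `build_user_table`, the column names, and the app name are hypothetical and not taken from the source.

```python
def parse_user(line):
    """Split one 'id|age|gender|occupation|zip' line into typed fields."""
    userid, age, gender, occupation, zipcode = line.split("|")
    return (int(userid), int(age), gender, occupation, zipcode)


def build_user_table(lines):
    """Return (spark, userrdd, user_df) and register the temp view user_table."""
    # Imported lazily so parse_user can be used without pyspark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[*]")
        .appName("distinct-demo")  # hypothetical app name
        .getOrCreate()
    )
    # RDD of tuples: x[1] is age and x[2] is gender under the assumed layout.
    userrdd = spark.sparkContext.parallelize(lines).map(parse_user)
    # Column names are assumptions chosen to match the queries above.
    user_df = spark.createDataFrame(
        userrdd, ["userid", "age", "gender", "occupation", "zipcode"]
    )
    user_df.createOrReplaceTempView("user_table")
    return spark, userrdd, user_df
```

After `spark, userrdd, user_df = build_user_table(lines)`, the SQL queries in section 2 can be run through `spark.sql(...)`, since `SparkSession` exposes the same `.sql()` method as the older SQL context object.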

Reposted from blog.csdn.net/weixin_40161254/article/details/87920994