Problem: Spark reads MySQL data into a DataFrame, and I want to convert the DataFrame to an RDD,
but df.rdd.map() throws an error.
A SparkSession pitfall:
use SparkContext (with SQLContext) to work around SparkSession's missing-serializer problem.
Sample code:
from pyspark import SparkContext
from pyspark.sql import SQLContext


def map_time(row):
    # df.rdd elements are Row objects; here we assume one column holds a
    # timestamp string like "2019-05-01 09-30-00" -- adjust the index to your schema
    string = str(row[1])
    time = string.split()[1]           # e.g. "09-30-00"
    print(time)
    hour = int(time.split('-')[0])     # cast to int so the numeric comparison works
    if hour < 12:
        period = '上午'                # morning
    elif 12 <= hour < 18:              # original "12 < hour" missed hour == 12
        period = '中午'                # midday
    else:
        period = '晚上'                # evening
    print(period)
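The hour-bucketing logic itself can be checked without Spark at all. A minimal standalone sketch (the `bucket_hour` helper name is my own, and the `YYYY-MM-DD HH-MM-SS` timestamp format is assumed from the split on `'-'` above):

```python
def bucket_hour(timestamp):
    """Return a part-of-day label for a 'YYYY-MM-DD HH-MM-SS' string."""
    time = timestamp.split()[1]        # "09-30-00"
    hour = int(time.split('-')[0])     # 9 -- int, so comparisons are numeric
    if hour < 12:
        return '上午'   # morning
    elif 12 <= hour < 18:
        return '中午'   # midday
    else:
        return '晚上'   # evening

print(bucket_hour('2019-05-01 09-30-00'))  # 上午
print(bucket_hour('2019-05-01 12-00-00'))  # 中午
print(bucket_hour('2019-05-01 21-15-00'))  # 晚上
```

Testing the boundaries this way (especially hour == 12) is what catches the `12 < hour < 18` off-by-one in the original.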
if __name__ == "__main__":
    sc = SparkContext(appName='finaf_g')
    # SQLContext built directly on SparkContext, instead of SparkSession
    ctx = SQLContext(sc)
    jdbcDf = ctx.read.format("jdbc").options(
        url="jdbc:mysql://localhost:3306/test",
        dbtable="(SELECT * FROM phone) tmp",
        user="root",
        password="root").load()
    jdbcDf.show()
    # foreach runs map_time on the executors for its side effects only
    jdbcDf.rdd.foreach(map_time)
    jdbcDf.show()   # the DataFrame itself is unchanged
    sc.stop()
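Note that `foreach` returns nothing, so if you want to keep the computed labels you would use `rdd.map(...)` and collect the results instead. The difference can be sketched in plain Python, no Spark required (the `label` helper and the sample records are my own):

```python
records = ['2019-05-01 09-30-00', '2019-05-01 21-15-00']

def label(ts):
    # same bucketing as map_time, condensed into one expression
    hour = int(ts.split()[1].split('-')[0])
    return '上午' if hour < 12 else ('中午' if hour < 18 else '晚上')

# foreach-style: side effects only, nothing is kept
for r in records:
    print(label(r))

# map-style: builds a new collection, analogous to rdd.map(label).collect()
labels = list(map(label, records))
print(labels)  # ['上午', '晚上']
```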