Loading data with PySpark

1. Loading a local file

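# `sc` is assumed to be an existing SparkContext (the pyspark shell creates one on startup)
lines = sc.textFile('file:/home/pxy/data/GoodBooks.csv')
# take(5) returns the first five lines to the driver as a list of strings
for line in lines.take(5):
    print(line)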

2. Loading data from HDFS

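# `sc` is assumed to be an existing SparkContext (the pyspark shell creates one on startup)
lines = sc.textFile('hdfs://localhost:9000/pxy/film/GoodBooks.csv')
top_five = lines.take(5)
# top_five[1:] skips the first line (the CSV header row)
for line in top_five[1:]:
    print(line)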
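
Both snippets above return each CSV row as one raw string. As a minimal sketch of a possible next step, the helper below splits a line into fields using Python's standard csv module, so commas inside quoted values are not treated as separators. The function name parse_csv_line and the sample rows are hypothetical, chosen only for illustration; the real columns of GoodBooks.csv are not shown in this post.

```python
import csv


def parse_csv_line(line):
    """Split one CSV text line into a list of fields, honouring quoted commas."""
    return next(csv.reader([line]))


# Hypothetical sample rows standing in for the result of lines.take(...)
sample = [
    'title,author',
    '"Dune, Part One",Frank Herbert',
]

# Skip the header row, as in the HDFS example, then parse each remaining line
records = [parse_csv_line(line) for line in sample[1:]]
for record in records:
    print(record)
```

In a Spark session the same function could be applied to the whole RDD with lines.map(parse_csv_line). Note that parsing line by line assumes no field contains an embedded newline.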


Reposted from www.cnblogs.com/giserpan/p/9248399.html