A Quick Start to PySpark SQL

A quick-start guide to developing Spark SQL applications in Python.

1. Writing the PySpark script

Steps (the full script follows this list):
  1. Read a local CSV file and convert it into a DataFrame.
  2. Register the DataFrame as a Spark SQL temporary view.
  3. Query with the spark.sql() function, which returns a DataFrame, or work with the DataFrame API directly.
# encoding: utf-8

from pyspark.sql import SparkSession

# Create the SparkSession object
spark = SparkSession.builder.appName('my_app_name').getOrCreate()

# Read a local CSV file into a DataFrame (local paths start with file:///, HDFS paths with hdfs:///).
# Parameters: sep='\t' splits columns on tabs; header=True uses the first row as column names.
swimmersCSV = spark.read.csv("file:///home/douyonghou/continuity0916.csv", sep='\t', header=True)

# Register the DataFrame as a temporary view named swimmersCSV
swimmersCSV.createOrReplaceTempView("swimmersCSV")

# 1. Inspect the data with the DataFrame API
swimmersCSV.show()

# 2. spark.sql() returns a DataFrame; show() only prints it and returns None,
#    so keep the DataFrame and call show() on it separately
data = spark.sql("select index from swimmersCSV")
data.show()
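
Because spark.sql() returns an ordinary DataFrame, the result can be processed further with the DataFrame API or pulled back to the driver. A minimal sketch, assuming the CSV above really contains an index column and at least one row:

# Pull the query result back to the driver as a list of Row objects
rows = data.collect()
print(rows[0]['index'])  # access a column of the first row by name

# The same query expressed with the DataFrame API instead of SQL
swimmersCSV.select("index").show()

# Stop the session once the script is done
spark.stop()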

2. Submitting the Spark application from the Spark client

# Submit the Python script with spark-submit
spark-submit --conf "spark.pyspark.driver.python=/usr/bin/python3.5" --conf "spark.pyspark.python=/usr/bin/python3.5" pysparkTest.py
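
The same script can also be submitted to a cluster by adding the usual spark-submit options. A sketch, assuming a standalone cluster whose master runs at the hypothetical host master-host:

# Hypothetical: submit to a standalone cluster master instead of running locally
spark-submit --master spark://master-host:7077 \
  --conf "spark.pyspark.driver.python=/usr/bin/python3.5" \
  --conf "spark.pyspark.python=/usr/bin/python3.5" \
  pysparkTest.py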

Reposted from blog.csdn.net/qq_33202508/article/details/108845214