Spark: Configuring a PySpark IDE Development Environment on Mac

Copyright notice: this is an original article by the author; reproduction without permission is prohibited. https://blog.csdn.net/chao2016/article/details/82914754

1. Development Tools

  • Java
  • spark-2.3.0-bin-2.6.0-cdh5.7.0
  • PyCharm

2. Spark Configuration

Edit the following files under $SPARK_HOME/conf (a quick path check follows the list):

  • spark-env.sh
JAVA_HOME=/Users/chao/.jenv/candidates/java/current/
  • slaves
localhost
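
If Spark later complains that it cannot find Java, a short script like the one below helps verify both paths. This is a minimal sketch, not part of the original post; the paths are the ones used in this article.

# Sanity check (illustrative only): confirm the directories referenced
# in spark-env.sh actually exist.
import os

paths = {
    "JAVA_HOME": "/Users/chao/.jenv/candidates/java/current/",
    "SPARK_HOME": "/Users/chao/Documents/app/spark-2.3.0-bin-2.6.0-cdh5.7.0",
}
for name, path in paths.items():
    print("%s -> %s (exists: %s)" % (name, path, os.path.isdir(path)))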

3. PyCharm Configuration

3.1 Set the Launch Environment Variables

  • Create a new Python project and add a .py file.
  • Run -> Edit Configurations -> Configuration -> Environment Variables -> add the following variables (an in-script alternative is shown after the screenshot):
PYTHONPATH=/Users/chao/Documents/app/spark-2.3.0-bin-2.6.0-cdh5.7.0/bin
SPARK_HOME=/Users/chao/Documents/app/spark-2.3.0-bin-2.6.0-cdh5.7.0

(Screenshot: the PyCharm Run Configuration dialog with the two environment variables set.)
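
If you would rather not fill in the dialog for every run configuration, the same two variables can be set at the top of the script itself. This is only a sketch assuming the install path used in this article; adjust the py4j version to match whatever sits in your python/lib directory.

import os
import sys

# Point SPARK_HOME at the Spark install, then put pyspark and py4j on sys.path
os.environ["SPARK_HOME"] = "/Users/chao/Documents/app/spark-2.3.0-bin-2.6.0-cdh5.7.0"
spark_python = os.path.join(os.environ["SPARK_HOME"], "python")
sys.path.insert(0, spark_python)
sys.path.insert(0, os.path.join(spark_python, "lib", "py4j-0.10.6-src.zip"))

from pyspark import SparkConf, SparkContext  # should now import cleanly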

3.2 Import the Spark Packages

PyCharm -> Preferences -> Project -> Project Structure -> Add Content Root
Add the two packages, both from the same directory (a quick import check follows the paths):

/Users/chao/Documents/app/spark-2.3.0-bin-2.6.0-cdh5.7.0/python/lib/py4j-0.10.6-src.zip
/Users/chao/Documents/app/spark-2.3.0-bin-2.6.0-cdh5.7.0/python/lib/pyspark.zip
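
Once both archives are on the project path, a two-line check confirms that PyCharm resolves them; the expected version matches the build used in this article.

import pyspark
print(pyspark.__version__)  # expect 2.3.0 for this build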

4. Test

spark1001.py

from pyspark import SparkConf, SparkContext

# Create the SparkConf: holds the Spark-related settings for this application
conf = SparkConf().setMaster("local[2]").setAppName("spark0301")

# Create the SparkContext
sc = SparkContext(conf=conf)

# Business logic: distribute a local list as an RDD and collect it back to the driver
data = [1, 2, 3, 4, 5]
distData = sc.parallelize(data)
print(distData.collect())

sc.stop()

Click Run; the console prints:

[1, 2, 3, 4, 5]
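
For a slightly stronger smoke test than collect() alone, the variant below (not in the original post) also exercises a transformation and an aggregation on the two local threads:

from pyspark import SparkConf, SparkContext

conf = SparkConf().setMaster("local[2]").setAppName("spark0301")
sc = SparkContext(conf=conf)

# map is a lazy transformation; collect and reduce are actions that trigger it
squares = sc.parallelize([1, 2, 3, 4, 5]).map(lambda x: x * x)
print(squares.collect())                   # [1, 4, 9, 16, 25]
print(squares.reduce(lambda a, b: a + b))  # 55

sc.stop()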

5. Running on a Cluster

  • With spark-submit, simply pass the .py file in place of a JAR (see the note after the command):
spark-submit --master local[2] --name spark0301 /root/script/spark0301.py
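
One caveat: a master hard-coded with setMaster() in the script takes precedence over the --master flag passed to spark-submit, so for cluster submission it is common to drop setMaster() from the code and let the command line decide. A sketch of that variant:

from pyspark import SparkConf, SparkContext

# No setMaster() here: the master comes from spark-submit --master
conf = SparkConf().setAppName("spark0301")
sc = SparkContext(conf=conf)

data = [1, 2, 3, 4, 5]
print(sc.parallelize(data).collect())

sc.stop()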
