Python syntax - PySpark in practice (basics)
Demonstration: obtaining the PySpark execution environment entry-point object, SparkContext
"""
Demonstrates obtaining PySpark's execution environment entry-point object: SparkContext,
and using the SparkContext object to get the current PySpark version
"""
# Import the required classes
from pyspark import SparkConf, SparkContext
# Create a SparkConf object
# (step-by-step equivalent of the chained call below)
# conf = SparkConf()
# conf.setMaster("local[*]")
# conf.setAppName("test_name")
conf = SparkConf().setMaster("local[*]").setAppName("test_spark_app")
# Create the SparkContext object from the SparkConf object
sc = SparkContext(conf=conf)
# Print the PySpark version
print(sc.version)
# Stop the SparkContext object (ends the PySpark program)
sc.stop()
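As a variation on the script above, here is a minimal sketch that wraps the same steps in a function and uses SparkContext as a context manager, so the context is stopped even if an exception is raised in between. The function name get_spark_version and its default arguments are made up for illustration, and the sketch assumes that pyspark and a JDK are installed.

```python
def get_spark_version(master="local[*]", app_name="test_spark_app"):
    """Start a local SparkContext, return its version string, then stop it.

    Hypothetical helper for illustration; not part of the original script.
    """
    # Imported inside the function so that defining it does not require pyspark
    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setMaster(master).setAppName(app_name)
    # Leaving the with-block stops the SparkContext automatically
    with SparkContext(conf=conf) as sc:
        return sc.version
```

Calling print(get_spark_version()) should behave like the original script: it prints the installed PySpark version and leaves no running context behind.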
Problem encountered on the first run: RuntimeError: Java gateway process exited before sending its port number, raised while creating the PySpark entry-point object (SparkContext)
Cause (in this case): the Java JDK was not installed. PySpark launches a JVM through a Java gateway, so it cannot start without Java.
Solution: download a JDK from the official website, install and configure it, then restart PyCharm. After that, the script runs and prints the PySpark version.
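To check for this cause before creating a SparkContext, a small pre-flight check along the following lines may help. It is only a sketch using the Python standard library: the function name find_java is made up for illustration, and looking at JAVA_HOME first and then PATH is a heuristic, not a guarantee of how a given installation locates Java.

```python
import os
import shutil


def find_java():
    """Return the path to a java executable, or None if none is found.

    Hypothetical helper for illustration only.
    """
    # Prefer JAVA_HOME when it is set and contains bin/java (bin/java.exe on Windows)
    java_home = os.environ.get("JAVA_HOME")
    if java_home:
        exe = "java.exe" if os.name == "nt" else "java"
        candidate = os.path.join(java_home, "bin", exe)
        if os.path.isfile(candidate):
            return candidate
    # Otherwise fall back to searching PATH
    return shutil.which("java")


java_path = find_java()
if java_path is None:
    print("No Java found: install a JDK and configure it before starting PySpark")
else:
    print("Java found at", java_path)
```

If the check reports that no Java was found, installing and configuring a JDK as described above is the first thing to try. The same error can have other causes, so a successful check does not rule them out.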
References:
RuntimeError: Java gateway process exited before sending its port number (raised when Python creates the PySpark execution environment entry point)
Dark Horse Programmer - Python Basics