PyCharm PySpark configuration

1. Install PyCharm and download Spark from the official website. I downloaded spark-2.1.1-bin-hadoop2.7.tgz; decompressing it gives the folder spark-2.1.1-bin-hadoop2.7, which I placed under /Applications/spark/. That folder contains a python folder, and inside it are two archives, py4j-some-version.zip and pyspark.zip, which will be needed later.
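
A quick way to confirm the layout described above is a small Python check. This is only a sketch: the base path matches my install location from step 1, so adjust it to yours, and it searches both python/ and python/lib/ because the archive location can differ between Spark distributions.

    import glob
    import os

    # Example install location from step 1; change this to wherever you unpacked Spark.
    SPARK_HOME = "/Applications/spark/spark-2.1.1-bin-hadoop2.7"

    # List the archives; pyspark.zip and a py4j-*.zip should show up here.
    zips = glob.glob(os.path.join(SPARK_HOME, "python", "*.zip")) + \
           glob.glob(os.path.join(SPARK_HOME, "python", "lib", "*.zip"))
    print(zips)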

2. Open any project. To the left of the "run" triangle in the upper-right corner of PyCharm there is a run configuration selector; open it (Edit Configurations).

3. In the configuration, go to Environment > Environment variables and click "...". In the dialog that appears, click + and add two entries, one named SPARK_HOME and the other PYTHONPATH, then set their values. The value of SPARK_HOME is the absolute path of the unpacked folder spark-2.1.1-bin-hadoop2.7, and the value of PYTHONPATH is that same path followed by /python. For example, my SPARK_HOME is /Applications/spark/spark-2.1.1-bin-hadoop2.7, so my PYTHONPATH is /Applications/spark/spark-2.1.1-bin-hadoop2.7/python. Save the settings. (Note that no spaces are allowed anywhere in the path!)
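
If you prefer not to rely on the run configuration, the same two settings can also be made at the top of the script itself. This is a rough sketch assuming the example paths above, not part of the original steps; with the PyCharm settings from step 3 it is not needed.

    import os
    import sys

    # Same values as in the run configuration; adjust the path to your own install.
    SPARK_HOME = "/Applications/spark/spark-2.1.1-bin-hadoop2.7"
    os.environ["SPARK_HOME"] = SPARK_HOME
    sys.path.insert(0, os.path.join(SPARK_HOME, "python"))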

4. The key step. Many tutorials stop at step 3, which is why the imported pyspark package still shows a red underline. In Preferences > Project Structure, click "Add Content Root" on the right and add the paths of py4j-some-version.zip and pyspark.zip (both files are in the python folder of the Spark distribution).
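
Adding the content roots is what lets the interpreter find the pyspark and py4j modules packed inside those archives. A rough script-level equivalent, assuming the example paths from step 3, is to append the zip files to sys.path:

    import glob
    import os
    import sys

    SPARK_HOME = "/Applications/spark/spark-2.1.1-bin-hadoop2.7"

    # Append pyspark.zip and the py4j zip; search python/ and python/lib/ since the
    # exact location can differ between Spark distributions.
    for pattern in ("*.zip", os.path.join("lib", "*.zip")):
        for zip_path in glob.glob(os.path.join(SPARK_HOME, "python", pattern)):
            sys.path.append(zip_path)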

5. Done. The red underline under from pyspark import SparkContext disappears and the script runs normally.
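
As a final check, a minimal local-mode job should run without import errors. The application name below is just an example.

    from pyspark import SparkContext

    # Local mode is enough to confirm that the import and the Spark setup work.
    sc = SparkContext("local[*]", "pycharm-pyspark-test")
    print(sc.parallelize(range(10)).sum())  # prints 45
    sc.stop()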
