Building a Spark development environment with PyCharm

  1. Install the JDK

  Download and install jdk-12.0.1_windows-x64_bin.exe, then configure the environment variables:

  Create a new system variable JAVA_HOME whose value is the Java installation path.

  Create a new system variable CLASSPATH with the value .;%JAVA_HOME%\lib\dt.jar;%JAVA_HOME%\lib\tools.jar; (note the leading dot).

  Edit the system variable PATH and append %JAVA_HOME%\bin;%JAVA_HOME%\jre\bin.

  In CMD, type java or java -version; if the command is not reported as unrecognized, the installation succeeded.
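  As a quick sanity check, a short Python sketch like the one below (nothing Spark-specific is assumed yet) can confirm that JAVA_HOME is set and that the java launcher is reachable on PATH:

  import os
  import subprocess

  # JAVA_HOME should point at the JDK installation directory configured above.
  print("JAVA_HOME =", os.environ.get("JAVA_HOME"))

  # "java -version" writes its version banner to stderr.
  result = subprocess.run(["java", "-version"], capture_output=True, text=True)
  print(result.stderr.strip())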

  2. Install Hadoop and configure environment variables

  Download Hadoop: https://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-2.7.7/hadoop-2.7.7.tar.gz

  Extract hadoop-2.7.7.tar.gz to a specific path, for example: D:\adasoftware\hadoop

  Add the system variable HADOOP_HOME: D:\adasoftware\hadoop

  Add to the system variable PATH: D:\adasoftware\hadoop\bin

  Install the winutils component: download the winutils bin directory that matches your Hadoop version and use it to replace the bin directory in your Hadoop installation directory.
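  A small sketch like the following (the D:\adasoftware\hadoop path is assumed from the step above) can verify that HADOOP_HOME is set and that winutils.exe is actually in place:

  import os

  # Assumed installation path from this step; adjust to your own directory.
  hadoop_home = os.environ.get("HADOOP_HOME", r"D:\adasoftware\hadoop")
  winutils = os.path.join(hadoop_home, "bin", "winutils.exe")
  print("HADOOP_HOME =", hadoop_home)
  print("winutils.exe found:", os.path.exists(winutils))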

  3. Configure the Spark environment variables

  Spark runs on top of Hadoop and calls Hadoop libraries while it runs; if the Hadoop runtime environment is not configured, Spark prints related error messages, although they do not affect operation.

  Download the Spark build that matches your Hadoop version: http://spark.apache.org/downloads.html

  Unzip the file to: D:\adasoftware\spark-2.4.3-bin-hadoop2.7

  Add to PATH: D:\adasoftware\spark-2.4.3-bin-hadoop2.7\bin

  Create a new system variable SPARK_HOME: D:\adasoftware\spark-2.4.3-bin-hadoop2.7
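  To confirm the variable is picked up, a sketch along these lines (path assumed from this step) checks that SPARK_HOME points at a real Spark distribution with its Windows launcher scripts:

  import os

  # Assumed installation path from this step; adjust to your own directory.
  spark_home = os.environ.get("SPARK_HOME", r"D:\adasoftware\spark-2.4.3-bin-hadoop2.7")
  print("SPARK_HOME =", spark_home)
  print("pyspark.cmd found:", os.path.exists(os.path.join(spark_home, "bin", "pyspark.cmd")))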

  4. Download and install Anaconda

  Anaconda bundles the Python interpreter and most common Python libraries, so after installing it you do not need to install packages such as pandas and numpy separately. After installation, add Python to the PATH environment variable.

  5. Run pyspark in CMD; if the shell starts normally and shows the Spark welcome banner, the installation and configuration are correct.

  A warning may appear because JDK 12 is newer than the version Spark expects, but it does not affect operation.
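  Once the shell is up, a one-line job is enough to confirm that the local runtime actually executes work; for example, typed at the >>> prompt (sc is the SparkContext that the pyspark shell creates automatically):

  # Count a tiny local RDD; the expected result is 100.
  sc.parallelize(range(100)).count()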

  6. Configure Spark in PyCharm

  Open PyCharm and create a project, then select "Run" -> "Edit Configurations" and click + to create a new Python configuration.

  Select "Environment variables" increase SPARK_HOME catalog PYTHONPATH directories.

  SPARK_HOME: Spark installation directory

  PYTHONPATH: the python directory under the Spark installation directory

  Select File -> Settings -> your project -> Project Structure

  Click "Add Content Root" in the upper right corner and add the paths to py4j-some-version.zip and pyspark.zip (both files are in the python\lib folder of the Spark directory).

  Save the settings.
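  If you prefer not to depend on the IDE settings, the same effect can be approximated in code by putting the two archives on sys.path; the sketch below assumes the installation path from step 3 and matches py4j by wildcard because its version varies per Spark release:

  import glob
  import os
  import sys

  spark_home = r"D:\adasoftware\spark-2.4.3-bin-hadoop2.7"  # assumed path from step 3
  lib_dir = os.path.join(spark_home, "python", "lib")

  # pyspark.zip plus whichever py4j-*.zip ships with this Spark build
  sys.path.append(os.path.join(lib_dir, "pyspark.zip"))
  sys.path.extend(glob.glob(os.path.join(lib_dir, "py4j-*.zip")))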

  7. Test whether the configuration succeeded: create a Python program with the code below and run it:

  import os
  import sys

  # Path to the Spark installation folder (from step 3)
  os.environ['SPARK_HOME'] = r"D:\adasoftware\spark-2.4.3-bin-hadoop2.7"

  # Append pyspark to the Python path
  sys.path.append(r"D:\adasoftware\spark-2.4.3-bin-hadoop2.7\python")

  try:
      from pyspark import SparkContext
      from pyspark import SparkConf
      print("Successfully imported Spark Modules")
  except ImportError as e:
      print("Can not import Spark Modules", e)
      sys.exit(1)

  If the program prints "Successfully imported Spark Modules", the environment is configured correctly.
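  As a further check that jobs actually run, not just that the imports resolve, a minimal sketch like the following (same assumed paths as above) creates a local SparkContext and runs a trivial count:

  import glob
  import os
  import sys

  spark_home = r"D:\adasoftware\spark-2.4.3-bin-hadoop2.7"  # assumed path from step 3
  os.environ["SPARK_HOME"] = spark_home
  sys.path.append(os.path.join(spark_home, "python"))
  sys.path.extend(glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*.zip")))

  from pyspark import SparkConf, SparkContext

  # Run a tiny local job to confirm the runtime works end to end.
  conf = SparkConf().setMaster("local[*]").setAppName("smoke-test")
  sc = SparkContext(conf=conf)
  print(sc.parallelize(range(1000)).filter(lambda x: x % 2 == 0).count())  # expected: 500
  sc.stop()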

  




Origin: blog.51cto.com/14503791/2434646