Spark Installation Configuration
1. Download the Scala installation package and Spark from their official websites.
2. Extract the archive to install it:
sudo tar -zxvf spark-3.0.0-preview-bin-hadoop3.2.tgz -C /usr/local/
3. Rename the folder:
sudo mv spark-3.0.0-preview-bin-hadoop3.2 spark
4. Configure ~/.bashrc
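The guide does not show the ~/.bashrc contents here; a plausible sketch, assuming the install locations used later in this guide (adjust every path to your own system), is:

```shell
# Assumed paths, matching the spark-env.sh settings below; adjust to your system
export SPARK_HOME=/usr/local/spark
export SCALA_HOME=/home/ysc/Documents/Code_software/scala/scala-2.13.1
export PATH=$SPARK_HOME/bin:$SPARK_HOME/sbin:$SCALA_HOME/bin:$PATH
```

Putting $SPARK_HOME/bin and $SPARK_HOME/sbin on PATH lets you run spark-shell and the start scripts from any directory.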
5. Configure spark-env.sh
Go into spark/conf/:
cp spark-env.sh.template spark-env.sh
vim spark-env.sh
export JAVA_HOME=/home/ysc/Documents/Code_software/JDK-8/jdk1.8.0_231
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
export SCALA_HOME=/home/ysc/Documents/Code_software/scala/scala-2.13.1
export SPARK_HOME=/usr/local/spark
export SPARK_MASTER_IP=127.0.0.1
export SPARK_MASTER_PORT=7077
export SPARK_MASTER_WEBUI_PORT=8099
export SPARK_WORKER_CORES=3
export SPARK_WORKER_INSTANCES=1
export SPARK_WORKER_MEMORY=5G
export SPARK_WORKER_WEBUI_PORT=8081
export SPARK_EXECUTOR_CORES=1
export SPARK_EXECUTOR_MEMORY=1G
export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native
Set the Java, Hadoop, and other paths according to your actual environment.
6. Configure slaves
cp slaves.template slaves
The default is localhost
7. Start Spark (provided that pseudo-distributed Hadoop has already been started):
run start-all.sh under the spark/sbin directory.
Some problems may occur at this point; once they are resolved, Spark starts successfully.
Spark's web interface: http://127.0.0.1:8099/
8. Start spark-shell from the bin directory:
cd $SPARK_HOME/bin
./spark-shell
spark-shell's web interface: http://127.0.0.1:4040
9. Using pyspark from Python
Of course, we do not want to be limited to developing inside this interpreter, so the next thing to do is make Spark's Python libraries loadable from our own scripts.
To let Python find pyspark, edit the ~/.bashrc file again and add at the end:
export PYTHONPATH=/usr/local/spark/python:/usr/bin/python
This adds Spark's Python library directory to Python's search path.
However, because Python needs to call Java libraries, we also need a py4j folder under /usr/local/spark/python. It can be found in the /usr/local/spark/python/lib directory as the archive py4j-0.9-src.zip; extract it into /usr/local/spark/python/:
sudo unzip -d /usr/local/spark/python py4j-0.9-src.zip
Now type python in any directory, then enter import pyspark.
If the import returns without any error message, pyspark can be imported correctly.
You can then write .py files anywhere and use pyspark in them simply by importing it.
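As an alternative to editing ~/.bashrc, the same two paths can be added from inside a Python script before importing pyspark. This is a minimal sketch; the default below assumes the /usr/local/spark install location used in this guide:

```python
import glob
import os
import sys

# Assumed install location from this guide; override via the SPARK_HOME env var
spark_home = os.environ.get("SPARK_HOME", "/usr/local/spark")

# Put Spark's python directory on the import path
sys.path.insert(0, os.path.join(spark_home, "python"))

# py4j ships as a source zip under python/lib; Python can import directly
# from a zip, so add whichever py4j version is present
for zip_path in glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*-src.zip")):
    sys.path.insert(0, zip_path)

# After this, `import pyspark` should work without touching ~/.bashrc
```

This is essentially what tools like findspark automate; it is handy when you cannot (or prefer not to) change shell configuration.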
10. Importing pyspark in PyCharm
Of course, some users prefer to write Python in PyCharm. To use pyspark in PyCharm, first click the drop-down box in the upper right corner and choose Edit Configurations....
In the dialog that appears, click the edit button to the right of Environment variables.
Click the plus sign to add two new entries,
PYTHONPATH and
SPARK_HOME,
whose values are the same as the corresponding ones in ~/.bashrc.
The next step is critical, and many guides stop at the previous one: in Preferences, under Project Structure, click "Add Content Root" on the right and add the paths of py4j-some-version.zip and pyspark.zip (both files are in the python folder under the Spark directory).
The red underline under import pyspark disappears, and it runs normally.
Then test with the following code:
import pyspark
conf = pyspark.SparkConf().setAppName("kick demo").setMaster("local")
sc = pyspark.SparkContext(conf=conf)
If the Spark startup output appears, PyCharm can also load pyspark normally.