Configuring Hadoop and Spark on Ubuntu

Spark Installation and Configuration
1. Download the Scala installation package and Spark from their official websites.
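For example, assuming the archive URLs below are still valid (check the official download pages first, since links and versions change), the two packages can be fetched with wget:

wget https://downloads.lightbend.com/scala/2.13.1/scala-2.13.1.tgz
wget https://archive.apache.org/dist/spark/spark-3.0.0-preview/spark-3.0.0-preview-bin-hadoop3.2.tgz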

2. Extract the archive into /usr/local to install it:
sudo tar -zxvf spark-3.0.0-preview-bin-hadoop3.2.tgz -C /usr/local/

3. Rename the folder (inside /usr/local):
sudo mv spark-3.0.0-preview-bin-hadoop3.2 spark

4. Configure ~/.bashrc
Add the Spark and Scala environment variables to ~/.bashrc and reload it, for example as in the sketch below.
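A minimal sketch, assuming the installation paths used elsewhere in this guide:

export SPARK_HOME=/usr/local/spark
export SCALA_HOME=/home/ysc/Documents/Code_software/scala/scala-2.13.1
export PATH=$PATH:$SPARK_HOME/bin:$SCALA_HOME/bin

source ~/.bashrc   # reload so the variables take effect in the current shell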

5. Configure spark-env.sh
Go into spark/conf/:
cp spark-env.sh.template spark-env.sh
vim spark-env.sh

export JAVA_HOME=/home/ysc/Documents/Code_software/JDK-8/jdk1.8.0_231
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
export SCALA_HOME=/home/ysc/Documents/Code_software/scala/scala-2.13.1
export SPARK_HOME=/usr/local/spark
export SPARK_MASTER_IP=127.0.0.1
export SPARK_MASTER_PORT=7077
export SPARK_MASTER_WEBUI_PORT=8099
export SPARK_WORKER_CORES=3
export SPARK_WORKER_INSTANCES=1
export SPARK_WORKER_MEMORY=5G
export SPARK_WORKER_WEBUI_PORT=8081
export SPARK_EXECUTOR_CORES=1
export SPARK_EXECUTOR_MEMORY=1G
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${HADOOP_HOME}/lib/native
Set the Java, Hadoop, and other paths above according to your own environment.

6. Configure slaves
cp slaves.template slaves

The default entry is localhost, which is what we want here.
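In a real multi-node cluster, each worker host would instead be listed on its own line, for example (hostnames purely illustrative):

slave01
slave02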

7. Start Spark (provided that the Hadoop pseudo-distributed cluster has already been started):
Run start-all.sh under the spark/sbin directory.
Some errors may appear at this point; once they are resolved, Spark starts successfully.
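A quick way to verify, assuming the JDK's jps tool is on the PATH, is to list the running JVM processes; the standalone Master and Worker should appear alongside the Hadoop daemons:

jps
# the output should include, among the Hadoop processes, lines such as:
# Master
# Worker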

Spark's web interface: http://127.0.0.1:8099/

8. Start spark-shell from the bin directory:
cd $SPARK_HOME/bin
./spark-shell

The spark-shell web interface is at http://127.0.0.1:4040

9. Using pyspark from Python
Of course, we do not want to be restricted to developing inside this interpreter, so the next step is to make Spark's Python libraries loadable from ordinary Python programs.

To let Python find pyspark, edit the ~/.bashrc file again and add the following at the end:

export PYTHONPATH=/usr/local/spark/python:/usr/bin/python

This adds Spark's Python library directory to Python's module search path.

However, because pyspark needs to call Java libraries, the py4j package must also be present under /usr/local/spark/python. It ships as an archive in the /usr/local/spark/python/lib directory, named py4j-0.9-src.zip here (the version number may differ); extract it into /usr/local/spark/python/:

sudo unzip -d /usr/local/spark/python py4j-0.9-src.zip

Now start python from any directory:

Then type import pyspark.

If the import produces no error message, pyspark is being found correctly, and from now on you can use it in .py files anywhere simply by importing it.
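As a minimal sketch of such a script (file name and numbers purely illustrative), creating a local SparkContext and running a small job:

from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName("square-demo").setMaster("local")
sc = SparkContext(conf=conf)
nums = sc.parallelize([1, 2, 3, 4, 5])      # build a small RDD
print(nums.map(lambda x: x * x).collect())  # prints [1, 4, 9, 16, 25]
sc.stop()

Save it as, say, square_demo.py and run it with python square_demo.py from any directory.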

10. Importing pyspark in PyCharm
Of course, some users prefer to write Python in PyCharm, so here is how to use pyspark there. First, click the drop-down box in the upper right corner and choose Edit Configurations...

In the dialog that appears, click the Edit button to the right of Environment variables:

Click the plus sign to add two new entries, PYTHONPATH and SPARK_HOME, with the same values as in ~/.bashrc.
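Concretely, with the paths used earlier in this guide, the two entries would be:

PYTHONPATH=/usr/local/spark/python:/usr/bin/python
SPARK_HOME=/usr/local/spark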


The next step is critical, and it is the one many other guides stop short of: in Preferences, under Project Structure, click "Add Content Root" on the right and add the paths of py4j-some-version.zip and pyspark.zip (both files are in the python folder of the Spark installation).

The red underline under import pyspark disappears and the import works normally.

Then test with the following code:

import pyspark
conf = pyspark.SparkConf().setAppName("kick demo").setMaster("local")
sc = pyspark.SparkContext(conf=conf)

If the expected output appears, PyCharm can also load pyspark normally.
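To go one step beyond merely constructing the context, a small follow-up check (values purely illustrative) that the SparkContext above actually runs jobs:

rdd = sc.parallelize(range(10))
print(rdd.filter(lambda x: x % 2 == 0).count())  # expected output: 5
sc.stop()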

Origin blog.csdn.net/pursuingparadise/article/details/103811077