This setup requires multiple servers. Experimental environment: two servers, master and data, with Hadoop already installed; please refer to the previous article!
1. Spark installation
- master installation
(1) Download Scala and Spark (a command sketch follows the environment variables below)
(2) Unzip and configure environment variables
export SCALA_HOME=/usr/local/scala
export PATH=$PATH:$SCALA_HOME/bin
export SPARK_HOME=/home/spark-2.4.5-bin-hadoop2.6
export PATH=$PATH:$SPARK_HOME/bin
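A minimal download-and-unpack sketch for steps (1) and (2), assuming the Spark 2.4.5 / Hadoop 2.6 build and the paths used in the exports above; the URLs and the Scala version are assumptions, so adjust them to whatever release you actually need:
# download (versions/mirrors assumed)
wget https://archive.apache.org/dist/spark/spark-2.4.5/spark-2.4.5-bin-hadoop2.6.tgz
wget https://downloads.lightbend.com/scala/2.11.12/scala-2.11.12.tgz
# unpack to the directories referenced by SCALA_HOME and SPARK_HOME
tar -zxvf scala-2.11.12.tgz -C /usr/local && mv /usr/local/scala-2.11.12 /usr/local/scala
tar -zxvf spark-2.4.5-bin-hadoop2.6.tgz -C /home
# make the exports take effect (assuming they were appended to ~/.bashrc)
source ~/.bashrc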
(3) Configure the spark-env.sh file (replace IP with the master node's actual IP address)
export SPARK_MASTER_IP=IP
export SPARK_MASTER_HOST=IP
export SPARK_WORKER_MEMORY=512m
export SPARK_WORKER_CORES=1
export SPARK_WORKER_INSTANCES=4
export SPARK_MASTER_PORT=7077
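These settings go into $SPARK_HOME/conf/spark-env.sh, which does not exist by default; a short sketch of creating it from the shipped template (paths as above):
cd $SPARK_HOME/conf
cp spark-env.sh.template spark-env.sh
# then append the export lines above to spark-env.sh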
(4) Configure the slaves file, listing the worker hostnames one per line:
data
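Likewise, the slaves file is created from its template; a sketch, assuming the hostname data resolves to the worker node (e.g. via /etc/hosts):
cd $SPARK_HOME/conf
cp slaves.template slaves
echo "data" > slaves   # the data node runs the Worker processes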
- data installation (mirrors the master; alternatively copy the configured directories from master, see the sketch after the spark-env.sh settings below)
(1) Download Scala and Spark
(2) Unzip and configure environment variables
export SCALA_HOME=/usr/local/scala
export PATH=$PATH:$SCALA_HOME/bin
export SPARK_HOME=/home/spark-2.4.5-bin-hadoop2.6
export PATH=$PATH:$SPARK_HOME/bin
(3) Configure the spark-env.sh file (identical to the master; IP is still the master node's IP address)
export SPARK_MASTER_IP=IP
export SPARK_MASTER_HOST=IP
export SPARK_WORKER_MEMORY=512m
export SPARK_WORKER_CORES=1
export SPARK_WORKER_INSTANCES=4
export SPARK_MASTER_PORT=7077
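Since the configuration on data is identical to master, a common shortcut is to copy the already-configured directories instead of repeating the download; a sketch, assuming passwordless SSH between the nodes and the same paths as above:
# run on master
scp -r /usr/local/scala root@data:/usr/local/
scp -r /home/spark-2.4.5-bin-hadoop2.6 root@data:/home/
scp ~/.bashrc root@data:~/.bashrc   # carries the export lines over
# then log in to data and run: source ~/.bashrc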
Start up and test:
On master, enter the $SPARK_HOME/sbin directory and start the cluster with start-all.sh (or start-master.sh followed by start-slaves.sh), then run jps on both nodes:
jps on master shows the Master process; jps on data shows the Worker processes (screenshots omitted).
Then start pyspark:
pyspark
If the local shell comes up successfully, restart it against the cluster master:
pyspark --master spark://master_ip:7077
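To confirm the cluster actually executes jobs, you can submit the Pi example that ships with Spark; a sketch, assuming the SPARK_HOME path used above (replace master_ip with the real address):
$SPARK_HOME/bin/spark-submit --master spark://master_ip:7077 \
  $SPARK_HOME/examples/src/main/python/pi.py 10
# the driver output should end with a line like "Pi is roughly 3.14..."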
2. Configure Anaconda and remotely access Jupyter
(1) Install Anaconda
Installation: download the Anaconda3 installer script and run it (a sketch follows below).
Configure environment variables: add Anaconda's bin directory to PATH; the exact export lines are listed in step (3) below.
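A minimal install sketch, assuming the Anaconda3-2019.10 Linux installer and the /root/anaconda3 prefix used later; the version and download URL are assumptions, so pick whatever release you need:
wget https://repo.anaconda.com/archive/Anaconda3-2019.10-Linux-x86_64.sh
bash Anaconda3-2019.10-Linux-x86_64.sh -b -p /root/anaconda3   # -b: batch mode, -p: install prefix
echo 'export PATH=$PATH:/root/anaconda3/bin' >> ~/.bashrc
source ~/.bashrc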
(2) Remote configuration of Jupyter
Reference: https://blog.csdn.net/MuziZZ/article/details/101703604
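The reference above boils down to generating a config, setting a password, and letting the notebook listen on all interfaces; a sketch, assuming the classic jupyter notebook (not JupyterLab):
jupyter notebook --generate-config
jupyter notebook password          # stores a hashed password under ~/.jupyter/
# then edit ~/.jupyter/jupyter_notebook_config.py and set:
#   c.NotebookApp.ip = '0.0.0.0'
#   c.NotebookApp.open_browser = False
#   c.NotebookApp.port = 8888
#   c.NotebookApp.allow_root = True   # only needed when running as root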
(3) Combining pyspark with Anaconda's Python
export PATH=$PATH:/root/anaconda3/bin
export ANACONDA_PATH=/root/anaconda3
export PYSPARK_DRIVER_PYTHON=$ANACONDA_PATH/bin/jupyter-notebook
# alternative one-off invocation: PYSPARK_DRIVER_PYTHON="jupyter" PYSPARK_DRIVER_PYTHON_OPTS="notebook" pyspark
export PYSPARK_PYTHON=$ANACONDA_PATH/bin/python
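With these exports in place (typically appended to ~/.bashrc and sourced), launching pyspark starts a Jupyter notebook server as the driver instead of the plain shell; a usage sketch:
# append the export lines above to ~/.bashrc, then:
source ~/.bashrc
pyspark --master spark://master_ip:7077   # now serves Jupyter rather than the interactive shell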
Access the interface: open http://master_ip:8888 in a browser (or whatever port is set in the Jupyter config), log in with the password configured earlier, and create a notebook; sc and spark should be available just as in the pyspark shell.