1. Configure the worker nodes through the following steps:
a) Rename the slaves.template file to slaves with the following command:
mv /usr/local/spark/conf/slaves.template /usr/local/spark/conf/slaves
b) Edit the slaves file with the following command:
vim /usr/local/spark/conf/slaves
c) Replace the original localhost entry with the following host names:
master
slave1
slave2
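Every host listed in the slaves file runs a Worker daemon, so including master means the master node also serves as a worker. As a quick optional check, print the file back to confirm its contents:
cat /usr/local/spark/conf/slaves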
2. Configure the runtime parameters of the Spark cluster through the following steps:
a) Rename the spark-env.sh.template configuration file to spark-env.sh with the following command:
mv /usr/local/spark/conf/spark-env.sh.template /usr/local/spark/conf/spark-env.sh
b) Edit the spark-env.sh file and append the following at the end:
vim /usr/local/spark/conf/spark-env.sh
# Set the JDK installation directory
export JAVA_HOME=/usr/local/lib/jdk1.8.0_212
# Set the Spark master port (the web UI port is changed separately in step 3)
export SPARK_MASTER_PORT=7077
# Set the ZooKeeper ensemble addresses to enable master high availability
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=master:2181,slave1:2181,slave2:2181 -Dspark.deploy.zookeeper.dir=/usr/local/spark"
# Set the YARN configuration directory
export YARN_CONF_DIR=/usr/local/hadoop/etc/hadoop
# Set the HDFS/Hadoop configuration directory
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
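With spark.deploy.recoveryMode set to ZOOKEEPER, the masters use the ZooKeeper ensemble for leader election: one master is elected active, the rest stand by, and recovery state stored under spark.deploy.zookeeper.dir lets a standby take over if the active master fails. As an optional sanity check that the settings were appended, show the tail of the file:
tail -n 12 /usr/local/spark/conf/spark-env.sh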
3. Change the master web UI port to 8085 by editing the start-master.sh script:
vim /usr/local/spark/sbin/start-master.sh
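Inside start-master.sh, the web UI port falls back to a default when SPARK_MASTER_WEBUI_PORT is not set. The exact block may differ slightly between Spark versions, but it looks roughly like this; change the default from 8080 to 8085:
if [ "$SPARK_MASTER_WEBUI_PORT" = "" ]; then
  SPARK_MASTER_WEBUI_PORT=8085    # changed from the default 8080
fi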
4. Deploy Spark to slave1 and slave2 through the following steps:
a) Create a spark directory with the following command on both slave1 and slave2:
sudo mkdir /usr/local/spark
b) Change the owner of the spark directory to the hadoop user with the following command on both slave1 and slave2:
sudo chown hadoop /usr/local/spark/
c) Copy the Spark files to slave1 and slave2 with the following commands on the master:
scp -r /usr/local/spark/* hadoop@slave1:/usr/local/spark/
scp -r /usr/local/spark/* hadoop@slave2:/usr/local/spark/
d) On slave1 and slave2, check /usr/local/spark to confirm the copy succeeded (see the verification sketch after this list)
e) To copy the environment variables to slave1 and slave2, use the following commands on the master:
scp /home/hadoop/.bashrc hadoop@slave1:/home/hadoop/
scp /home/hadoop/.bashrc hadoop@slave2:/home/hadoop/
f) To refresh the environment variables, use the following command on slave1 and slave2:
source /home/hadoop/.bashrc
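A quick optional verification on each slave, assuming (as is typical for this layout, though the source does not show the .bashrc contents) that the copied .bashrc exports SPARK_HOME=/usr/local/spark:
# Confirm the Spark files arrived
ls /usr/local/spark
# Confirm the environment took effect (assumes .bashrc exports SPARK_HOME)
echo $SPARK_HOME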
Test the cluster
1. Start ZooKeeper (run this on all three virtual machines)
zkServer.sh start
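Optionally, confirm the ensemble is healthy before starting Spark; one node should report Mode: leader and the other two Mode: follower:
zkServer.sh status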
2. Start Spark on the master
Be sure to change into the Spark directory first:
cd /usr/local/spark/
sbin/start-all.sh
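start-all.sh launches the Master on this machine and a Worker on every host listed in the slaves file. If a daemon fails to come up, check its log under the Spark installation's logs directory:
ls /usr/local/spark/logs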
3. Start the standby master on slave1 with the following command:
start-master.sh
4. Check the running Java processes on each machine:
jps
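Roughly, jps should report something like the following on each machine (exact PIDs vary; QuorumPeerMain is the ZooKeeper process):
# master:  QuorumPeerMain, Master, Worker, Jps
# slave1:  QuorumPeerMain, Master (standby), Worker, Jps
# slave2:  QuorumPeerMain, Worker, Jps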
5. Open the master web UI on port 8085 (for example, http://master:8085 in a browser)
The page should list exactly three worker IDs: one each for master, slave1, and slave2
6. Stop the cluster
# Stop the Spark cluster (on master)
sbin/stop-all.sh
# Stop the standby master (on slave1)
stop-master.sh
# Stop ZooKeeper (run on all three machines)
zkServer.sh stop