Install Spark in standalone mode

1. Configure the worker nodes through the following steps:

a) Rename the slaves.template file to slaves using the following command:

mv /usr/local/spark/conf/slaves.template /usr/local/spark/conf/slaves

b) Edit the slaves file using the following command:

vim /usr/local/spark/conf/slaves

c) Replace the original localhost entry with the following host names:

master
slave1
slave2

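A quick way to confirm the edit (assuming the path above) is to print the file back and check that it lists exactly these three hosts:

cat /usr/local/spark/conf/slaves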

2. Configure the running parameters of the Spark cluster through the following steps:

a) Rename the spark-env.sh.template configuration file to spark-env.sh

mv /usr/local/spark/conf/spark-env.sh.template /usr/local/spark/conf/spark-env.sh

b) Edit the spark-env.sh file and append the following at the end:

vim /usr/local/spark/conf/spark-env.sh
# Set the JDK directory
export JAVA_HOME=/usr/local/lib/jdk1.8.0_212
# Set the Spark master port
export SPARK_MASTER_PORT=7077
# Set the ZooKeeper cluster addresses to enable high availability
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=master:2181,slave1:2181,slave2:2181 -Dspark.deploy.zookeeper.dir=/usr/local/spark"
# Set the YARN configuration file directory
export YARN_CONF_DIR=/usr/local/hadoop/etc/hadoop
# Set the HDFS configuration file directory
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop


3. Modify the master web UI port to 8085

vim /usr/local/spark/sbin/start-master.sh

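In the Spark versions this kind of setup typically targets, start-master.sh falls back to a default SPARK_MASTER_WEBUI_PORT of 8080 when the variable is not set elsewhere; the edit amounts to changing that default to 8085. A rough sketch of the relevant lines (their exact position varies by Spark version):

# inside /usr/local/spark/sbin/start-master.sh, change the default web UI port
if [ "$SPARK_MASTER_WEBUI_PORT" = "" ]; then
  SPARK_MASTER_WEBUI_PORT=8085
fi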

4. Deploy Spark to slave1 and slave2 through the following steps:

a) Create a spark directory using the following command on slave1 and slave2:

sudo mkdir /usr/local/spark

b) Change the owner of the spark directory to the hadoop user using the following command on slave1 and slave2:

sudo chown hadoop /usr/local/spark/

c) Copy Spark to slave1 and slave2 using the following commands on the master:

scp -r /usr/local/spark/* hadoop@slave1:/usr/local/spark/
scp -r /usr/local/spark/* hadoop@slave2:/usr/local/spark/

d) On slave1 and slave2, enter /usr/local/spark to check whether the copy succeeded; a quick check is shown below.

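A minimal check is to list the directory on each slave and confirm the usual Spark layout (bin, sbin, conf, jars, and so on) arrived:

ls /usr/local/spark/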

e) To copy the environment variables to slave1 and slave2, use the following commands on the master:

scp /home/hadoop/.bashrc hadoop@slave1:/home/hadoop/
scp /home/hadoop/.bashrc hadoop@slave2:/home/hadoop/

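The exact contents of .bashrc depend on how Spark was installed on the master earlier; the Spark-related entries being copied here are assumed to look roughly like this:

# assumed Spark entries in /home/hadoop/.bashrc
export SPARK_HOME=/usr/local/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin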

f) To refresh the environment variables, use the following command on slave1 and slave2:

source /home/hadoop/.bashrc

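To confirm the refresh took effect, print one of the variables (assuming SPARK_HOME is among the entries copied above):

echo $SPARK_HOME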

Test the cluster

1. Start ZooKeeper (it must be started on all three virtual machines):

zkServer.sh start

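To confirm the ensemble is healthy, check the status on each node; one node should report leader and the other two follower:

zkServer.sh status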

2. Start Spark on the master

Be sure to change into the Spark directory first:

cd /usr/local/spark/


sbin/start-all.sh


3. Start the standby master by running the following command on slave1:

start-master.sh

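Since the modified start-master.sh was copied to the slaves in step 4, slave1's master should also serve its web UI on port 8085 and report STANDBY while the original master stays ALIVE. A rough check from any node, assuming the UI page contains the status text:

curl -s http://slave1:8085 | grep -i standby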

4. View the running processes

jps

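Assuming everything above started cleanly, jps should show roughly the following on each node (process IDs omitted):

# master: QuorumPeerMain, Master, Worker, Jps
# slave1: QuorumPeerMain, Master (the standby), Worker, Jps
# slave2: QuorumPeerMain, Worker, Jps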

5. View the web UI on port 8085

The page should list three worker IDs (one for each of master, slave1, and slave2).
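Assuming the port change from step 3, the active master's UI is reachable at http://master:8085 (and the standby's at http://slave1:8085). Open it in a browser, or fetch it from a shell:

curl -s http://master:8085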

6. Shut down the cluster

# Stop the Spark cluster (on the master)
sbin/stop-all.sh
# Stop the standby master (on slave1)
stop-master.sh
# Stop ZooKeeper (run on all three nodes)
zkServer.sh stop
