Building an HA cluster in Spark Standalone mode
Foreword
The Spark version used in this article is spark-2.3.0-bin-hadoop2.7.tgz.
The Spark cluster is built from 3 machines: server01, server02, and server03.
server01 and server02 are set up as Masters, while server01, server02, and server03 all run as Workers.
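For clarity, the role of each node is summarized below; the ZooKeeper quorum referenced later in spark-env.sh also runs on all three machines:
server01: Master (active or standby) + Worker
server02: Master (active or standby) + Worker
server03: Worker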
1. Download Spark
Spark download address: https://spark.apache.org/downloads.html
Just select the corresponding version to download; the version used here is spark-2.3.0-bin-hadoop2.7.tgz.
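Alternatively, the archive can be fetched directly from the command line; the URL below assumes the standard Apache release-archive layout for this version:
wget https://archive.apache.org/dist/spark/spark-2.3.0/spark-2.3.0-bin-hadoop2.7.tgz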
2. Upload and decompress
2.1 After downloading it locally, upload the archive to the Linux virtual machine
scp spark-2.3.0-bin-hadoop2.7.tgz hadoop@server01:/hadoop
2.2 Decompress
tar -zxvf spark-2.3.0-bin-hadoop2.7.tgz
2.3 Rename
mv spark-2.3.0-bin-hadoop2.7 spark
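A quick sanity check that everything is in place; the directory names come from the standard Spark binary distribution:
ls /hadoop/spark
# bin  conf  data  examples  jars  python  sbin  ...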
3. Configure the environment
Enter the spark/conf directory
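Assuming the install path from the steps above:
cd /hadoop/spark/conf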
3.1 Copy the configuration file
cp slaves.template slaves
cp spark-env.sh.template spark-env.sh
3.2 Modify the slaves configuration file
This file lists the hosts on which the cluster's Worker processes run:
server01
server02
server03
3.3 Modify the spark-env.sh configuration file
# Java installation path
export JAVA_HOME=/java/jdk1.8.0_161
# Host of the Spark cluster's Master process
export SPARK_MASTER_HOST=server01
# Port of the Spark cluster's Master
export SPARK_MASTER_PORT=7077
# Number of cores each Worker can use
export SPARK_WORKER_CORES=3
# Memory available to each Worker machine
export SPARK_WORKER_MEMORY=1g
# ZooKeeper settings for Master HA recovery
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=server01:2181,server02:2181,server03:2181 -Dspark.deploy.zookeeper.dir=/spark"
# Hadoop configuration directory
export HADOOP_CONF_DIR=/hadoop/hadoop-2.7.5/etc/hadoop
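Once the cluster is running, the HA state written under the configured /spark directory can be inspected with ZooKeeper's standard CLI (a sketch; zkCli.sh ships with ZooKeeper):
zkCli.sh -server server01:2181
# inside the zk shell:
ls /spark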
3.4 Distribute to the server02 and server03 machines
scp -r /hadoop/spark hadoop@server02:/hadoop
scp -r /hadoop/spark hadoop@server03:/hadoop
3.5 Modify the SPARK_MASTER_HOST parameter in spark-env.sh on the server02 machine
# Change the host name to server02
export SPARK_MASTER_HOST=server02
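This edit can also be made in place from the command line; a sketch with sed, assuming the install path used above:
sed -i 's/^export SPARK_MASTER_HOST=server01$/export SPARK_MASTER_HOST=server02/' /hadoop/spark/conf/spark-env.sh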
3.6 Configure environment variables
Add the following Spark environment variables to /etc/profile on the server01, server02, and server03 machines
export SPARK_HOME=/hadoop/spark
export PATH=$PATH:$SPARK_HOME/bin
Make the configuration take effect
source /etc/profile
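To confirm that PATH now resolves the Spark binaries, the version can be printed:
spark-submit --version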
4. Start the Spark cluster
On the server01 machine, go to the spark directory
4.1 Start the Master and Worker processes separately
# Start the Master process
sbin/start-master.sh
# Start the 3 Worker processes
sbin/start-slaves.sh
Use jps to view the processes
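A sketch of the expected jps output on server01 (PIDs are illustrative; QuorumPeerMain appears only if ZooKeeper runs on this node):
jps
# 1463 QuorumPeerMain
# 1875 Master
# 1962 Worker
# 2048 Jps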
4.2 Start everything directly with start-all.sh
sbin/start-all.sh
4.3 Manually start the Master process on the server02 machine
Enter the spark directory
sbin/start-master.sh
We can stop Spark's processes using stop-all.sh
sbin/stop-all.sh
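Note that stop-all.sh run on server01 stops the local Master plus the Workers listed in slaves; the standby Master started manually on server02 has to be stopped on that machine:
# on server02
sbin/stop-master.sh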
Web page display
Type in the browser:
server01:8080
Status: ALIVE indicates that this Master is the currently active Master
server02:8080
Status: STANDBY indicates that this is the standby Master
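To verify failover, a simple test based on standalone HA's documented behavior is to stop the active Master on server01 and watch server02 take over:
# on server01
sbin/stop-master.sh
# after the ZooKeeper session times out, refresh server02:8080;
# its Status should switch from STANDBY to ALIVE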