A Spark Standalone cluster uses a master-slave architecture, and like most master-slave clusters it has a single point of failure: the Master. Spark provides two solutions to this problem:
(1) file-system-based recovery;
(2) ZooKeeper-based recovery, described below.
ZooKeeper-based setup:
1. Install ZooKeeper.
2. Modify the spark-env.sh file. Comment out the fixed master settings, because in a high-availability deployment the master node can change:
# SPARK_MASTER_HOST=hadoop102
# SPARK_MASTER_PORT=7077
Then add the following:
export SPARK_DAEMON_JAVA_OPTS="
-Dspark.deploy.recoveryMode=ZOOKEEPER
-Dspark.deploy.zookeeper.url=hdp01,hdp02,hdp03
-Dspark.deploy.zookeeper.dir=/spark"
(For spark.deploy.zookeeper.url, fill in your ZooKeeper cluster's addresses; a standalone ZooKeeper needs only one entry. Note there must be no spaces around the = signs or after the commas, since the value is part of a quoted string.)
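Every master and worker node must see the same recovery settings, so after editing spark-env.sh copy it to the other nodes. A minimal sketch, assuming Spark is installed under /opt/spark on every host (the path and host names are illustrative, not from the original setup):

```shell
# Distribute the edited config to the remaining nodes (hypothetical hosts/path)
for host in hdp02 hdp03; do
  scp /opt/spark/conf/spark-env.sh "${host}":/opt/spark/conf/
done
```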
Parameter description:
(1) spark.deploy.recoveryMode — the recovery mode (how a restarted master recovers its state).
There are three values: ZOOKEEPER, FILESYSTEM, and NONE.
(2) spark.deploy.zookeeper.url — the address(es) of the ZooKeeper servers.
(3) spark.deploy.zookeeper.dir — the ZooKeeper directory used to store cluster metadata,
including Workers, Drivers, and Applications.
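Once a master has registered, you can inspect that metadata directory with the standard ZooKeeper CLI. A sketch, assuming ZooKeeper listens on its default port 2181 on hdp01 (child znode names may vary by Spark version):

```shell
# Connect to ZooKeeper and look at the directory Spark was configured to use
zkCli.sh -server hdp01:2181
# Inside the zkCli shell:
#   ls /spark
# you should typically see children such as leader_election and master_status,
# which hold the leader lock and the persisted worker/driver/application state
```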
Note:
To start the cluster in normal mode, you only need to execute start-all.sh on the master host.
To start a Spark cluster in high-availability mode, first run start-all.sh on one of the master nodes, then start the master separately on the other master node.
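Concretely, the startup sequence uses Spark's standard sbin scripts (the /opt/spark install path is an assumption):

```shell
# On hdp01: starts a master on this host plus all workers listed in conf/slaves
/opt/spark/sbin/start-all.sh

# On hdp02: start the standby master separately
/opt/spark/sbin/start-master.sh
```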
Verify the Spark cluster: the Worker processes appear, and the node processes on hdp02 have started successfully.
Start the master on hdp02 separately.
Once that master has started successfully, check in the web UI which master is ALIVE (the other should show STANDBY).
High-availability verification: shut down the master on the hdp01 node and check whether the master on hdp02 automatically takes over as the ALIVE master.
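The failover test can be sketched as follows (the web UI port 8080 is Spark's default master UI port; failover typically completes within a minute or two):

```shell
# On hdp01: stop the currently ALIVE master
/opt/spark/sbin/stop-master.sh

# Then poll hdp02's master web UI and look for its status line;
# it should change from STANDBY to ALIVE after the ZooKeeper election
curl -s http://hdp02:8080 | grep -i status
```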
If it takes over, the Spark + ZooKeeper high-availability cluster has been built successfully!