Spark high availability

The Master node is a single point of failure. To resolve this, use ZooKeeper and start at least two Master nodes to achieve high availability. The configuration is relatively simple:

Spark cluster planning:

         Master: hadoop01, hadoop04
         Worker: hadoop02, hadoop03, hadoop04

Install and configure the ZooKeeper cluster, then start it (not covered here).
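As a quick sanity check, assuming a standard ZooKeeper installation with zkServer.sh on the PATH (hostnames follow the planning above):

         # on each of hadoop02, hadoop03, hadoop04
         zkServer.sh start
         zkServer.sh status    # one node should report Mode: leader, the others Mode: follower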

Stop all Spark services. Edit the configuration file spark-env.sh: delete the SPARK_MASTER_IP setting and add the following configuration:

export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=hadoop02,hadoop03,hadoop04 -Dspark.deploy.zookeeper.dir=/spark"
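After the edit, the relevant part of spark-env.sh might look roughly like the sketch below (the SPARK_MASTER_IP line is removed or commented out; other settings in your file stay as they are):

         # export SPARK_MASTER_IP=hadoop01   # removed: the active Master is now elected via ZooKeeper
         export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=hadoop02,hadoop03,hadoop04 -Dspark.deploy.zookeeper.dir=/spark"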

Distribute the modified spark-env.sh to the hadoop02, hadoop03, and hadoop04 nodes.
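For example, with scp (the installation path /opt/spark/conf is an assumption, adjust it to your environment):

         # copy the updated config to the other nodes
         for host in hadoop02 hadoop03 hadoop04; do
             scp /opt/spark/conf/spark-env.sh $host:/opt/spark/conf/
         done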

1. Modify the slaves configuration file on the hadoop01 node to specify the Worker nodes (see the example after this list).
ps: if you modify slaves, distribute the updated file to the other nodes as well.
2. Start the ZooKeeper cluster first.
3. Run sbin/start-all.sh on hadoop01, then run sbin/start-master.sh on hadoop04 to start the second Master.
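A minimal sketch of steps 1 and 3, assuming Spark is installed under /opt/spark on every node (the path is an assumption):

         # /opt/spark/conf/slaves on hadoop01: one Worker hostname per line
         hadoop02
         hadoop03
         hadoop04

         # on hadoop01: starts the Master on hadoop01 and the Workers listed in slaves
         /opt/spark/sbin/start-all.sh

         # on hadoop04: starts the standby Master
         /opt/spark/sbin/start-master.sh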

ps: If you start spark-shell against the HA cluster, you need to pass both Masters:
spark-shell --master spark://master01:port1,master02:port2
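With the cluster planning above and Spark's default standalone Master port 7077 (your port may differ), the command would look like this:

         spark-shell --master spark://hadoop01:7077,hadoop04:7077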
