A Spark cluster can run under three cluster managers: Standalone, Apache Mesos, and Hadoop YARN.
Standalone Deploy Mode is the simplest way to deploy Spark on a private cluster, so it is the natural one to install first.
1. Preparations
Three CentOS machines: spark01, spark02, and spark03
Install the JDK and configure JAVA_HOME on each node
Download the Spark installation package:
https://mirror.tuna.tsinghua.edu.cn/apache/spark/spark-2.1.0/spark-2.1.0-bin-hadoop2.7.tgz
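The JDK variables are usually exported in /etc/profile (or ~/.bashrc) on every node. A minimal sketch, assuming a JDK installed under /usr/java/jdk1.8.0_121 (an example path, substitute your real install directory):

```shell
# Append to /etc/profile on spark01, spark02, and spark03
# (the JDK path below is an example; use your actual install directory)
export JAVA_HOME=/usr/java/jdk1.8.0_121
export PATH=$JAVA_HOME/bin:$PATH
```

After editing, run `source /etc/profile` (or log in again) so the variables take effect.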
2. Unzip
tar -xvf spark-2.1.0-bin-hadoop2.7.tgz
cd spark-2.1.0-bin-hadoop2.7
3. Start
First start the master (spark01):
sbin/start-master.sh
Then start the slaves (spark02 and spark03):
sbin/start-slave.sh spark://spark01:7077
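Instead of starting each slave by hand, the standalone scripts can also bring up the whole cluster in one step: list the worker hostnames in conf/slaves on the master and run sbin/start-all.sh. A sketch, assuming passwordless SSH is set up from spark01 to the workers:

```shell
# On spark01: list the workers, one hostname per line, in conf/slaves
cat > conf/slaves <<'EOF'
spark02
spark03
EOF

# Start the master and all listed workers in one step
# (requires passwordless SSH from spark01 to spark02/spark03)
sbin/start-all.sh
```

Either way, you can confirm the workers registered by opening the master web UI at http://spark01:8080.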
4. Test
There are two ways to submit a test job to the cluster, using the built-in SparkPi example:
# Client mode (the default): the job runs on the cluster, and the result is printed to the console
bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://spark01:7077 --executor-memory 1G --total-executor-cores 2 examples/jars/spark-examples_2.11-2.1.0.jar 1000
# Cluster mode: the result can only be viewed through the web UI; the console shows no output
bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://spark01:7077 --deploy-mode cluster --supervise --executor-memory 1G --total-executor-cores 2 examples/jars/spark-examples_2.11-2.1.0.jar 1000
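As a sanity check on what the example actually computes: SparkPi estimates pi by Monte Carlo sampling (the final argument, 1000, is the number of partitions the sampling work is split into). A minimal single-machine sketch of the same idea in plain Python, without Spark:

```python
import random

def estimate_pi(num_samples, seed=42):
    """Monte Carlo estimate of pi, the same idea as SparkPi:
    sample points uniformly in the unit square and count the
    fraction that falls inside the quarter circle."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / num_samples

if __name__ == "__main__":
    # More samples give a tighter estimate; Spark parallelizes this loop
    print("Pi is roughly", estimate_pi(100_000))
```

In the real SparkPi job this loop is distributed across the executors, which is why `--total-executor-cores` and the partition count affect how fast it finishes.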