Client mode (for understanding)
Spark's Driver runs on the client that submits the task.
- Advantages
  - Because the Driver is on the client, all program output produced in the Driver is visible on the client console
- Disadvantages
  - High communication cost with the cluster
  - If the Driver process dies, it must be restarted manually
Case
Prerequisites:
1. A YARN cluster
2. A history server
3. A client tool for submitting tasks: the spark-submit command
4. The jar (bytecode) of the Spark task/program to submit; the bundled example programs can be used
Requirement: compute an approximation of Pi
Code:
SPARK_HOME=/export/servers/spark
${SPARK_HOME}/bin/spark-submit \
--master yarn \
--deploy-mode client \
--driver-memory 512m \
--driver-cores 1 \
--executor-memory 512m \
--num-executors 2 \
--executor-cores 1 \
--class org.apache.spark.examples.SparkPi \
${SPARK_HOME}/examples/jars/spark-examples_2.11-2.4.7.jar \
10
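The SparkPi example estimates Pi by Monte Carlo sampling: it scatters random points and counts how many land inside the circle, splitting the work across the executors (the final argument, 10, is the number of partitions). A minimal sequential sketch of the same idea, in plain awk with no Spark and an arbitrarily chosen sample size:

```shell
# Estimate Pi by Monte Carlo: the fraction of random points in the unit
# square that land inside the quarter circle approximates Pi/4.
awk 'BEGIN {
  srand(42)                       # fixed seed so the run is repeatable
  n = 100000; inside = 0
  for (i = 0; i < n; i++) {
    x = rand(); y = rand()
    if (x * x + y * y <= 1) inside++
  }
  printf "Pi is roughly %f\n", 4 * inside / n
}'
```

With 100,000 samples the printed value lands close to 3.14; SparkPi does exactly this, but distributes the sampling loop over the cluster.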
Result:
View the web interface:
http://node01:8088/cluster
Cluster mode (used in actual development)
Spark's Driver runs on the YARN cluster.
- Advantages
  1. Because the Driver is managed by YARN, YARN will restart it if it fails
  2. Low communication cost with the cluster
- Disadvantages
  1. Program output produced in the Driver is not visible on the client console; it must be viewed in YARN
Case
Prerequisites:
1. A YARN cluster
2. A history server
3. A client tool for submitting tasks: the spark-submit command
4. The jar (bytecode) of the Spark task/program to submit; the bundled example programs can be used
Requirement: compute an approximation of Pi
Code:
SPARK_HOME=/export/servers/spark
${SPARK_HOME}/bin/spark-submit \
--master yarn \
--deploy-mode cluster \
--driver-memory 512m \
--driver-cores 1 \
--executor-memory 512m \
--num-executors 2 \
--executor-cores 1 \
--class org.apache.spark.examples.SparkPi \
${SPARK_HOME}/examples/jars/spark-examples_2.11-2.4.7.jar \
10
Result:
View the web interface:
http://node01:8088/cluster
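Since the Driver's stdout is not echoed back to the client in cluster mode, the "Pi is roughly ..." line must be pulled from the YARN logs. One way, using YARN's log aggregation (the application id below is a placeholder; take the real one from the spark-submit output or from the web UI above):

```shell
# Aggregate the container logs for the finished application; the driver's
# stdout (including SparkPi's result line) is among them.
# <application-id> is a placeholder, e.g. application_1617170000000_0001
yarn logs -applicationId <application-id>
```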
Supplement: spark-shell and spark-submit
- The difference between the two commands
  - spark-shell: an interactive Spark shell. After startup you can write Spark code and run it immediately; generally used for learning and testing
  - spark-submit: submits the jar of a Spark task/program to the Spark cluster (generally to the YARN cluster)
- What parameters can be carried
  - --master: defaults to local[*]
    - --master local[2] means start two local threads to run the Spark job
    - --master spark://node01:7077 means run the Spark job on the Spark standalone cluster whose master is node01
    - --master yarn means run the Spark job on the YARN cluster
- Other parameters
  - View them with: spark-shell --help
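Putting the above together, typical invocations look like the following (these require a Spark installation on the PATH and, for YARN, a running cluster, so they are shown as a sketch rather than something runnable here):

```shell
# Interactive shell with two local threads (learning/testing)
spark-shell --master local[2]

# Submit a jar to YARN in cluster mode (the usual choice in development)
spark-submit --master yarn --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  ${SPARK_HOME}/examples/jars/spark-examples_2.11-2.4.7.jar 10
```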