Two modes of Spark on YARN

Client mode (for understanding only)

Spark's Driver process runs on the client that submits the task.

  • Advantages
  1. Because the Driver runs on the client, all program output produced in the Driver is visible on the client console
  • Disadvantages
  1. High communication cost between the Driver and the cluster
  2. If the Driver process crashes, it must be restarted manually

Case
Prerequisites:
1. A running Yarn cluster
2. The Spark history server
3. A client tool for submitting tasks - the spark-submit command
4. The jar of the Spark task/program to be submitted - the bundled sample programs can be used

Requirement: estimate the value of Pi
Code:
SPARK_HOME=/export/servers/spark
${SPARK_HOME}/bin/spark-submit \
--master yarn  \
--deploy-mode client \
--driver-memory 512m \
--driver-cores 1 \
--executor-memory 512m \
--num-executors 2 \
--executor-cores 1 \
--class org.apache.spark.examples.SparkPi \
${SPARK_HOME}/examples/jars/spark-examples_2.11-2.4.7.jar \
10
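SparkPi estimates Pi with a Monte Carlo method: it scatters random points in the unit square and counts how many fall inside the quarter circle; the final argument (10) is the number of slices the sampling is parallelized over. The same idea can be sketched locally with awk, no cluster needed:

```shell
# Monte Carlo estimate of Pi - the same idea SparkPi parallelizes:
# sample random points in the unit square and count those that
# land inside the quarter circle (x^2 + y^2 <= 1).
awk 'BEGIN {
  srand(42); n = 100000; hits = 0
  for (i = 0; i < n; i++) {
    x = rand(); y = rand()
    if (x * x + y * y <= 1) hits++
  }
  printf "Pi is roughly %f\n", 4 * hits / n
}'
```

With 100,000 samples the estimate lands close to 3.14; SparkPi simply distributes this sampling loop across the executors.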

Result: the line with the computed value of Pi is printed on the client console
View the web interface:
http://node01:8088/cluster

Cluster mode (used in development)

Spark's Driver process runs on the Yarn cluster.

  • Advantages
  1. Because the Driver is managed by Yarn, Yarn restarts it if it fails
  2. Low communication cost between the Driver and the cluster
  • Disadvantages
  1. The program output produced in the Driver is not visible on the client console; it must be viewed in Yarn
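Because the Driver's stdout lives on the cluster in this mode, the result has to be fetched from Yarn's aggregated logs after the application finishes. A typical way (assuming log aggregation is enabled) is the yarn logs command; the application id below is a placeholder:

```shell
# Fetch the Driver's logs (including the "Pi is roughly ..." line)
# from Yarn after the application finishes.
# application_xxx is a placeholder - use the id printed by spark-submit,
# or look it up at http://node01:8088/cluster
yarn logs -applicationId application_xxx | grep "Pi is roughly"
```

This requires a running Yarn cluster, so it is a command sketch rather than something runnable standalone.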

Case
Prerequisites:
1. A running Yarn cluster
2. The Spark history server
3. A client tool for submitting tasks - the spark-submit command
4. The jar of the Spark task/program to be submitted - the bundled sample programs can be used

Requirement: estimate the value of Pi
Code:
SPARK_HOME=/export/servers/spark
${SPARK_HOME}/bin/spark-submit \
--master yarn  \
--deploy-mode cluster \
--driver-memory 512m \
--driver-cores 1 \
--executor-memory 512m \
--num-executors 2 \
--executor-cores 1 \
--class org.apache.spark.examples.SparkPi \
${SPARK_HOME}/examples/jars/spark-examples_2.11-2.4.7.jar \
10

Result: the computed value of Pi is not printed on the client console; it appears in the Driver's log on Yarn
View the web interface:
http://node01:8088/cluster

Supplement: spark-shell and spark-submit

  • The difference between the two commands
    spark-shell: an interactive Spark application window. After startup you can write Spark code and run it immediately; generally used for learning and testing
    spark-submit: submits the jar of a Spark task/program to the Spark cluster (generally submitted to the Yarn cluster)

  • What parameters can be carried

  • --master: defaults to local[*]
    --master local[2] means start two local threads to run the Spark job
    --master spark://node01:7077 means run the Spark job on the Spark standalone cluster at node01
    --master yarn means run the Spark job on the Yarn cluster
  • Other parameters
    View them with: spark-shell --help
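As a concrete sketch (SPARK_HOME as in the examples above), spark-shell can be started locally and Spark code evaluated immediately:

```shell
# Start an interactive shell with 2 local worker threads (no cluster needed)
SPARK_HOME=/export/servers/spark
${SPARK_HOME}/bin/spark-shell --master local[2]

# Inside the shell, code runs as soon as you type it, e.g.:
#   sc.parallelize(1 to 100).sum()   // res0: Double = 5050.0
```

This requires a Spark installation at the path above, so it is a command sketch rather than something runnable standalone.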

Origin blog.csdn.net/zh2475855601/article/details/114946679