Spark dynamic resource allocation

0x0 Introduction

Using SparkSession to run Spark tasks in Java is very convenient from a coding point of view, but there is a problem: once a SparkSession is closed it cannot be reopened, so our application keeps holding the Spark cluster resources it requested (memory and CPU). To work around this problem, the author adopted two approaches:
First, submit tasks via SparkSubmit, so that resources are released every time a job finishes; the drawback is that the job's execution cannot be monitored as freely (a minimal launcher sketch follows this list);
A tutorial on programmatically submitting Spark tasks:
http://blog.csdn.net/gx304419380/article/details/79361645
Second, use Spark's dynamic resource allocation mechanism!
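
For the first approach, here is a minimal sketch of programmatic submission using Spark's SparkLauncher API; the jar path, main class and master URL are placeholders, not values taken from the linked tutorial:

import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;

public class SubmitJob {
    public static void main(String[] args) throws Exception {
        // Placeholder jar path, main class and master URL -- replace with your own.
        SparkAppHandle handle = new SparkLauncher()
                .setMaster("spark://master:7077")
                .setAppResource("/path/to/your-job.jar")
                .setMainClass("com.example.YourSparkJob")
                .setConf("spark.executor.memory", "2g")
                .startApplication();

        // The handle only exposes coarse application state (CONNECTED, RUNNING, FINISHED, ...),
        // which is why fine-grained monitoring of the job is awkward with this approach.
        while (!handle.getState().isFinal()) {
            Thread.sleep(1000);
        }
    }
}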

0x1 No nonsense, straight to the tutorial

Two necessary conditions for enabling Spark dynamic resource allocation:
1. The following two properties must be set in the code (see the Java sketch after this list):

.config("spark.dynamicAllocation.enabled", "true")
.config("spark.shuffle.service.enabled", "true")

2. The Spark worker must be configured as follows:
Open the worker's $SPARK_HOME/conf/spark-defaults.conf and add:

spark.shuffle.service.enabled true
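
Putting condition 1 together, a minimal Java sketch of building such a SparkSession; the app name and master URL are placeholders:

import org.apache.spark.sql.SparkSession;

SparkSession spark = SparkSession.builder()
        .appName("dynamic-allocation-demo")            // placeholder app name
        .master("spark://master:7077")                  // placeholder standalone master URL
        .config("spark.dynamicAllocation.enabled", "true")
        .config("spark.shuffle.service.enabled", "true")
        .getOrCreate();

// ... run the job; executors that stay idle for about 60s are released
// automatically, even though the SparkSession itself remains open.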

Start the Spark cluster (standalone mode)
and submit the Spark job. After the job finishes, wait about 60s and you will see that the executors requested by the job are released automatically!


PS: some useful parameters:

spark.dynamicAllocation.executorIdleTimeout   default 60s; an executor that has been idle longer than this is released automatically
spark.dynamicAllocation.minExecutors          default 0; the minimum number of executors the job keeps
spark.dynamicAllocation.maxExecutors          default infinity; the maximum number of executors the job may request
spark.dynamicAllocation.initialExecutors      defaults to spark.dynamicAllocation.minExecutors; the number of executors to start with
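
These parameters can also be set on the SparkSession builder alongside the two required properties; the values below are illustrative only:

.config("spark.dynamicAllocation.executorIdleTimeout", "30s")
.config("spark.dynamicAllocation.minExecutors", "1")
.config("spark.dynamicAllocation.maxExecutors", "10")
.config("spark.dynamicAllocation.initialExecutors", "2")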
