Detailed explanation of executor-memory parameters in Spark

We know that when submitting a Spark application, you can set the memory required by each executor with --executor-memory (for example, spark-submit --master yarn --executor-memory 8g ...; the 8g here is just an illustration). But if the value you set is too large, the program reports an error like the one below.
(Screenshot of the error message omitted.)

So what is the maximum value that can be set? This article analyzes that question.
The environment used here is Spark 1.6.1 installed on Hadoop 2.7.

1. The two relevant parameters

1.1 yarn.scheduler.maximum-allocation-mb

This parameter indicates the maximum amount of memory a single container may request, and it is generally configured uniformly across the cluster (in yarn-site.xml). A Spark executor runs inside a container, so the container's maximum memory directly caps the executor's maximum usable memory. When you request a relatively large amount of memory, the error in the log also prints the value of this parameter; in the cluster used here it is 6144 MB, i.e. 6 GB.
(Screenshot of the error log showing the 6144 MB limit omitted.)
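
If you want to confirm the limit without digging through logs, the value can also be read programmatically with the Hadoop client API. This is a minimal sketch (not from the original article), assuming the Hadoop YARN client jars and a yarn-site.xml are on the classpath:

    import org.apache.hadoop.yarn.conf.YarnConfiguration

    object ShowMaxAllocation {
      def main(args: Array[String]): Unit = {
        // Loads yarn-site.xml (and built-in defaults) from the classpath
        val conf = new YarnConfiguration()
        // yarn.scheduler.maximum-allocation-mb; falls back to the YARN default
        val maxAllocMb = conf.getInt(
          YarnConfiguration.RM_SCHEDULER_MAXIMUM_ALLOCATION_MB,
          YarnConfiguration.DEFAULT_RM_SCHEDULER_MAXIMUM_ALLOCATION_MB)
        println(s"yarn.scheduler.maximum-allocation-mb = $maxAllocMb")
      }
    }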

1.2 spark.yarn.executor.memoryOverhead

While an executor is running, its actual memory usage may exceed executor-memory, so an additional slice of memory is reserved for the executor on top of that value. spark.yarn.executor.memoryOverhead represents this extra slice. If the parameter is not set, it is computed automatically by a formula located in ClientArguments.scala:
(Screenshot of the code omitted; a reconstruction appears below.)

Here MEMORY_OVERHEAD_FACTOR defaults to 0.1, executorMemory is the value set via executor-memory, and MEMORY_OVERHEAD_MIN defaults to 384 MB. MEMORY_OVERHEAD_FACTOR and MEMORY_OVERHEAD_MIN are constants hard-coded in the Spark source, so they generally cannot be modified directly.
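
Since the screenshot is unavailable, here is a reconstruction of that calculation based on the description above. It paraphrases the logic in ClientArguments.scala for Spark 1.6.x rather than quoting it verbatim:

    // Defaults hard-coded in the Spark source
    val MEMORY_OVERHEAD_FACTOR = 0.10
    val MEMORY_OVERHEAD_MIN = 384 // MB

    // Used only when spark.yarn.executor.memoryOverhead is not set explicitly;
    // note the product is truncated to an integer by toInt
    val executorMemoryOverhead = sparkConf.getInt("spark.yarn.executor.memoryOverhead",
      math.max((MEMORY_OVERHEAD_FACTOR * executorMemory).toInt, MEMORY_OVERHEAD_MIN))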

2. executor-memory calculation

Calculation formula:

  val executorMem = args.executorMemory + executorMemoryOverhead

Suppose executor-memory is X (an integer, in MB). Then:
1) If spark.yarn.executor.memoryOverhead is not set:

executorMem = X + max(X * 0.1, 384)

2) If spark.yarn.executor.memoryOverhead is set (an integer, in MB):

executorMem = X + spark.yarn.executor.memoryOverhead

The condition that must be satisfied:

executorMem <= yarn.scheduler.maximum-allocation-mb

Note: this calculation and check are in Client.scala. Spark rejects the application only when executorMem strictly exceeds the limit, which is why equality still passes.
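
For reference, the following paraphrases that check (modeled on verifyClusterResources in Client.scala for Spark 1.6.x; variable names follow the source, but this is a sketch rather than a verbatim quote):

    // maxMem is the cluster's maximum container size,
    // i.e. yarn.scheduler.maximum-allocation-mb
    val executorMem = args.executorMemory + executorMemoryOverhead
    if (executorMem > maxMem) {
      throw new IllegalArgumentException(
        s"Required executor memory ($executorMem MB) is above the max threshold " +
        s"($maxMem MB) of this cluster!")
    }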
In this example:

6144 = X + max(X * 0.1, 384)

Since X * 0.1 is larger than 384 in this range, this becomes 6144 = 1.1 * X, giving X ≈ 5585.45.

Because the computed overhead is truncated to an integer (note the toInt in the formula above), setting X = 5586 yields an overhead of (5586 * 0.1).toInt = 558 and a total of exactly 6144 MB, which still passes the check. So the maximum executor-memory that can be set here is 5586 MB.
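
To make the arithmetic concrete, here is a small self-contained Scala sketch (my own illustration, not Spark source) that searches for the largest executor-memory value fitting under a given container limit:

    object MaxExecutorMemory {
      // Defaults matching the Spark 1.6.x constants described above
      val MEMORY_OVERHEAD_FACTOR = 0.10
      val MEMORY_OVERHEAD_MIN = 384 // MB

      // Overhead as computed when spark.yarn.executor.memoryOverhead is not set
      def overhead(executorMemoryMb: Int): Int =
        math.max((MEMORY_OVERHEAD_FACTOR * executorMemoryMb).toInt, MEMORY_OVERHEAD_MIN)

      // Total container memory YARN must grant for one executor
      def totalMb(executorMemoryMb: Int): Int =
        executorMemoryMb + overhead(executorMemoryMb)

      // Largest executor-memory that still fits under the container limit
      // (assumes maxAllocationMb >= MEMORY_OVERHEAD_MIN, otherwise nothing fits)
      def maxExecutorMemoryMb(maxAllocationMb: Int): Int =
        Iterator.from(maxAllocationMb, -1).find(x => totalMb(x) <= maxAllocationMb).get

      def main(args: Array[String]): Unit = {
        println(maxExecutorMemoryMb(6144)) // prints 5586
      }
    }

Running it with the 6144 MB limit from this article prints 5586, matching the hand calculation.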
