When submitting a Spark job, you can use --executor-memory to set the amount of memory each executor requests. But if the value is too large, the program reports an error, as follows:
So what is the maximum value that can be set? This article analyzes it.
The environment used in this article is Spark 1.6.1 running on Hadoop 2.7.
1. Two related parameters
1.1 yarn.scheduler.maximum-allocation-mb
This parameter is the maximum memory that a single container can request, and it is usually configured once for the whole cluster. Each Spark executor process runs inside a container, so the container's maximum memory directly limits the memory available to the executor. If you request too much memory, an error is reported in the log, and the value of this parameter is printed along with it. As shown in the figure below, the value is 6144 MB, i.e. 6 GB.
1.2 spark.yarn.executor.memoryOverhead
When an executor runs, the memory it uses may exceed executor-memory, so an additional amount of memory is reserved for it. spark.yarn.executor.memoryOverhead specifies this extra amount. If this parameter is not set, it is calculated automatically (in ClientArguments.scala) as max(MEMORY_OVERHEAD_FACTOR * executorMemory, MEMORY_OVERHEAD_MIN).
Here MEMORY_OVERHEAD_FACTOR defaults to 0.1, executorMemory is the configured executor-memory, and MEMORY_OVERHEAD_MIN defaults to 384 MB. MEMORY_OVERHEAD_FACTOR and MEMORY_OVERHEAD_MIN are hard-coded in the Spark source, so they generally cannot be changed directly.
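The default calculation described above can be sketched as follows. This is an illustration, not the verbatim Spark source: the object name OverheadSketch and the method defaultOverheadMb are hypothetical, the constants are the values quoted in this article, and truncating the product to an integer is an assumption.

```scala
// Illustrative sketch only; the names below are hypothetical, not Spark's API.
object OverheadSketch {
  val MemoryOverheadFactor = 0.10 // factor quoted in this article
  val MemoryOverheadMinMb = 384   // minimum quoted in this article, in MB

  // Default overhead in MB for a given --executor-memory value in MB.
  // Assumption: the fractional part of factor * memory is truncated.
  def defaultOverheadMb(executorMemoryMb: Int): Int =
    math.max((MemoryOverheadFactor * executorMemoryMb).toInt, MemoryOverheadMinMb)

  def main(args: Array[String]): Unit = {
    // Small executors get the 384 MB floor; larger ones get about 10%.
    Seq(1024, 4096, 5586).foreach { mb =>
      println(s"executor-memory=${mb}M -> overhead=${defaultOverheadMb(mb)}M")
    }
  }
}
```

Under these assumptions the 384 MB floor applies to any executor-memory below 3840 MB; above that, the 10% term takes over.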
2. executor-memory calculation
Calculation formula:
val executorMem = args.executorMemory + executorMemoryOverhead
Suppose executor-memory is X (an integer, in MB). Then:
1) If spark.yarn.executor.memoryOverhead is not set:
executorMem = X + max(X*0.1, 384)
2) If spark.yarn.executor.memoryOverhead is set (an integer, in MB):
executorMem = X + spark.yarn.executor.memoryOverhead
Conditions that need to be met:
executorMem <= yarn.scheduler.maximum-allocation-mb
Note: The above code is in Client.scala.
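The two cases and the limit check can be put together in a small sketch. The object ExecutorMemoryCheck and its methods are hypothetical names for illustration. The sketch assumes that a request is rejected only when the total exceeds the container limit, and that the default overhead is truncated to an integer.

```scala
// Illustrative sketch only; names are hypothetical, not Spark's API.
object ExecutorMemoryCheck {
  // Total container memory (MB) requested for one executor.
  // configuredOverheadMb models spark.yarn.executor.memoryOverhead;
  // None means the parameter is not set and the default formula applies.
  def executorMem(executorMemoryMb: Int, configuredOverheadMb: Option[Int]): Int = {
    val overheadMb = configuredOverheadMb.getOrElse(
      math.max((0.10 * executorMemoryMb).toInt, 384))
    executorMemoryMb + overheadMb
  }

  // True when the request fits within yarn.scheduler.maximum-allocation-mb.
  // Assumption: a total exactly equal to the limit is still accepted.
  def fits(executorMemoryMb: Int,
           configuredOverheadMb: Option[Int],
           maxAllocationMb: Int): Boolean =
    executorMem(executorMemoryMb, configuredOverheadMb) <= maxAllocationMb

  def main(args: Array[String]): Unit = {
    println(fits(4096, None, 6144))       // default overhead
    println(fits(5000, Some(1024), 6144)) // explicit overhead of 1024 MB
  }
}
```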
In this example:
6144=X+max(X*0.1,384)
X=5585.45
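Because the overhead is truncated to an integer (5586 * 0.1 = 558.6 becomes 558), X = 5586 gives 5586 + 558 = 6144, which does not exceed the limit, while X = 5587 gives 6145, which does. So the maximum executor memory that can be set is 5586 MB (--executor-memory 5586M).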
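Instead of solving the equation by hand, the largest valid value can also be found by a simple search. The sketch below is an illustration under stated assumptions: MaxExecutorMemory and its methods are hypothetical names, spark.yarn.executor.memoryOverhead is assumed to be unset, the default overhead is assumed to be truncated to an integer, and a total equal to the limit is assumed to be accepted.

```scala
// Illustrative sketch only; names are hypothetical, not Spark's API.
object MaxExecutorMemory {
  // Total container memory (MB) for executor-memory x (MB),
  // using the default overhead max(0.1 * x, 384) truncated to an integer.
  def total(x: Int): Int = x + math.max((0.10 * x).toInt, 384)

  // Largest executor-memory (MB) whose total fits within maxAllocationMb,
  // or None when no positive value fits. Scans downward from the limit.
  def maxExecutorMemoryMb(maxAllocationMb: Int): Option[Int] =
    (maxAllocationMb to 1 by -1).find(x => total(x) <= maxAllocationMb)

  def main(args: Array[String]): Unit = {
    // 6144 MB is the yarn.scheduler.maximum-allocation-mb value from this article.
    println(maxExecutorMemoryMb(6144))
  }
}
```

For the 6144 MB limit used in this article, the search returns the same 5586 MB obtained above.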