This article follows on from: https://blog.csdn.net/wisgood/article/details/78069753
It explains how the 879.0 MB figure is calculated. The Spark version used is 1.6.
The program is started with the executor memory set as follows:
spark-shell --executor-memory 1536M
Storage Memory
The Storage Memory displayed on this page is actually the sum of the Storage Memory and Execution Memory described above, i.e. the total unified memory pool managed by Spark.
That is:
Storage Memory = (executorMemory - 300 MB) * 0.75
By this formula, Storage Memory = (1536 - 300) * 0.75 = 927 MB,
which is much larger than the displayed 879 MB. Why?
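The naive calculation can be sketched in Java (the 300 MB reservation and the 0.75 memory fraction are the Spark 1.6 defaults mentioned above):

```java
public class NaiveStorageMemory {
    public static void main(String[] args) {
        // Spark 1.6 defaults: 300 MB reserved, spark.memory.fraction = 0.75
        double executorMemoryMb = 1536;
        double reservedMb = 300;
        double memoryFraction = 0.75;

        // Expected unified memory pool if executorMemory were used directly
        double storageMemoryMb = (executorMemoryMb - reservedMb) * memoryFraction;
        System.out.println(storageMemoryMb + " MB"); // prints "927.0 MB"
    }
}
```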
In fact, when the program performs this calculation, the value it uses is obtained from Runtime.getRuntime.maxMemory, which is the maximum memory the program can actually use and is smaller than executorMemory. The reason is that the JVM young generation contains two Survivor spaces, but only one is usable at any time, so in practice Runtime.getRuntime.maxMemory = Eden + one Survivor + Old Gen, which is smaller than the configured heap size.
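A minimal Java sketch of querying this value (runnable in any JVM; the exact number printed depends on the heap and GC configuration):

```java
public class MaxMemoryProbe {
    public static void main(String[] args) {
        // Runtime.getRuntime().maxMemory() returns the maximum heap the JVM
        // will attempt to use, in bytes. Because only one of the two Survivor
        // spaces is usable at a time, this is typically smaller than -Xmx.
        long maxBytes = Runtime.getRuntime().maxMemory();
        System.out.println(maxBytes / (1024 * 1024) + " MB");
    }
}
```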
To determine the exact value of Runtime.getRuntime.maxMemory, we add some extra JVM options when starting Spark:
--conf "spark.executor.extraJavaOptions=-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC"
The startup command then becomes:
spark-shell --executor-memory 1536M --conf "spark.executor.extraJavaOptions=-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC"
These options print GC logs as each executor runs, which show exactly how the executor heap is divided between the young and old generations. As shown below:
From this GC log we can compute Runtime.getRuntime.maxMemory:
Runtime.getRuntime.maxMemory = 393216 K (Eden) + 65536 K (one Survivor) + 1048576 K (Old Gen) = 1472 MB
We can now reproduce the Storage Memory displayed on the Hadoop page:
Storage Memory = (1472 - 300) * 0.75 = 879 MB
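The arithmetic above can be verified with a short Java sketch (the region labels follow the Eden + one Survivor + Old Gen breakdown described earlier; the KB figures are the ones read from the GC log):

```java
public class ActualStorageMemory {
    public static void main(String[] args) {
        // Heap region sizes from the GC log, in KB
        long edenKb = 393216, survivorKb = 65536, oldGenKb = 1048576;

        // maxMemory counts only one of the two Survivor spaces
        double maxMemoryMb = (edenKb + survivorKb + oldGenKb) / 1024.0; // 1472.0

        // Spark 1.6: (maxMemory - 300 MB reserved) * spark.memory.fraction (0.75)
        double storageMemoryMb = (maxMemoryMb - 300) * 0.75;
        System.out.println(storageMemoryMb + " MB"); // prints "879.0 MB"
    }
}
```

This matches the 879 MB shown in the UI, confirming that Spark's formula is applied to Runtime.getRuntime.maxMemory rather than to the raw --executor-memory setting.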
This is the author's original article; if you found it helpful, tips are welcome!