Spark source code analysis: the memory manager (MemoryManager)

MemoryManager: the memory manager

The memory manager is arguably one of the most fundamental modules in the Spark core. Shuffle sorting, RDD caching, unroll memory, broadcast variables, the storage of task results and so on all use memory, and every one of these uses has to request that memory through the memory manager. In my view, the main job of the memory manager is to reduce out-of-memory errors as far as possible while improving memory utilization. Older versions of Spark used static memory management, implemented by StaticMemoryManager. Newer versions (since 1.6, if I remember correctly) switched to a unified memory manager, UnifiedMemoryManager. The biggest difference from the static memory manager is that there is no hard boundary between execution memory and storage memory: the two can borrow from each other, but execution memory has the higher priority. If execution memory runs short, it squeezes storage memory, and some cached RDD blocks are evicted (spilled to disk when their storage level allows it) to free enough space. Execution memory itself, however, is never squeezed out. I think this is understandable: execution memory is what shuffle uses for sorting, which can only be done in memory, whereas the requirements for cached RDDs are not so strict.
Two parameters control the proportion of each part of memory:

  • spark.memory.fraction, default 0.6. This parameter controls the share of memory that is managed by Spark's memory manager, measured against (heap memory - 300 MB); the 300 MB is set aside as reserved system memory for non-storage, non-execution purposes. In other words, execution memory and storage memory together take (heap memory - 300 MB) * 0.6. The remaining 0.4 is left for running user code: for example, your code may load a fairly large file into memory or do some sorting. Memory used by user code is not managed by the memory manager, so a certain proportion has to be reserved for it.
  • spark.memory.storageFraction, default 0.5. As the name implies, it determines the share taken by storage memory. Note that this is a share of the part managed by the memory manager; the rest is used as execution memory. For example, by default storage memory takes 0.6 * 0.5 = 0.3 of the heap (more precisely, of heap memory - 300 MB).
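To make the arithmetic above concrete, here is a small sketch that computes the regions for a given heap size. MemoryRegions, Regions and compute are hypothetical names made up for illustration, not Spark API; only the 300 MB constant and the two default fractions come from the description above.

```scala
// Hypothetical helper (not Spark API): reproduces the arithmetic described above.
object MemoryRegions {
  // The 300 MB that is set aside before the fractions are applied.
  val ReservedBytes: Long = 300L * 1024 * 1024

  final case class Regions(unified: Long, storage: Long, execution: Long, user: Long)

  def compute(
      heapBytes: Long,
      fraction: Double = 0.6,          // spark.memory.fraction
      storageFraction: Double = 0.5    // spark.memory.storageFraction
  ): Regions = {
    val usable = heapBytes - ReservedBytes
    val unified = (usable * fraction).toLong          // managed by the memory manager
    val storage = (unified * storageFraction).toLong  // initial storage share
    Regions(unified, storage, unified - storage, usable - unified)
  }
}
```

For a 4 GB heap, MemoryRegions.compute(4L * 1024 * 1024 * 1024) gives a storage region of roughly 0.3 * (4096 MB - 300 MB), about 1139 MB, with the same amount for execution.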

MemoryManager Overview

Let us first look at the MemoryManager class as a whole.


MemoryManager has relatively few methods, and they follow a pattern. It divides memory by function into three types: StorageMemory, UnrollMemory and ExecutionMemory.
For each of the three types there is one method to acquire memory and one to release it. The three acquire methods are abstract and are implemented by subclasses.
Next, let us look at MemoryManager's member variables:

    protected val onHeapStorageMemoryPool = new StorageMemoryPool(this, MemoryMode.ON_HEAP)
    protected val offHeapStorageMemoryPool = new StorageMemoryPool(this, MemoryMode.OFF_HEAP)
    protected val onHeapExecutionMemoryPool = new ExecutionMemoryPool(this, MemoryMode.ON_HEAP)
    protected val offHeapExecutionMemoryPool = new ExecutionMemoryPool(this, MemoryMode.OFF_HEAP)

These four member variables represent the four memory pools. Note that the MemoryPool constructor takes a parameter of type Object that serves as the synchronization lock; the methods inside MemoryPool synchronize on this object.
Let us look at how the pools are initialized:

    offHeapExecutionMemoryPool.incrementPoolSize(maxOffHeapMemory - offHeapStorageMemory)


releaseExecutionMemory simply delegates to the corresponding method of ExecutionMemoryPool:

    def releaseExecutionMemory(
        numBytes: Long,
        taskAttemptId: Long,
        memoryMode: MemoryMode): Unit = synchronized {
      memoryMode match {
        case MemoryMode.ON_HEAP => onHeapExecutionMemoryPool.releaseMemory(numBytes, taskAttemptId)
        case MemoryMode.OFF_HEAP => offHeapExecutionMemoryPool.releaseMemory(numBytes, taskAttemptId)
      }
    }


ExecutionMemoryPool.releaseMemory is simple enough that it needs little explanation.
From this method we can already see what Spark memory management really means: in the final analysis it only records and manages the amount of memory in use. It does not actually allocate and reclaim memory the way the operating system or the JVM does.

    def releaseMemory(numBytes: Long, taskAttemptId: Long): Unit = lock.synchronized {
      // Look up how much memory this task holds according to the internal bookkeeping
      val curMem = memoryForTask.getOrElse(taskAttemptId, 0L)
      // If more is being released than the task actually holds, log a warning and cap the amount
      var memoryToFree = if (curMem < numBytes) {
        logWarning(
          s"Internal error: release called on $numBytes bytes but task only has $curMem bytes " +
            s"of memory from the $poolName pool")
        curMem
      } else {
        numBytes
      }
      if (memoryForTask.contains(taskAttemptId)) {
        // Update the bookkeeping
        memoryForTask(taskAttemptId) -= memoryToFree
        // If the task's usage drops to 0 or below, remove the task from the bookkeeping
        if (memoryForTask(taskAttemptId) <= 0) {
          memoryForTask.remove(taskAttemptId)
        }
      }
      // Finally notify the waiting threads,
      // because other tasks may be waiting to acquire execution memory
      lock.notifyAll() // Notify waiters in acquireMemory() that memory has been freed
    }


releaseAllExecutionMemoryForTask frees all the execution memory used by a task, both on the heap and in direct (off-heap) memory.
onHeapExecutionMemoryPool and offHeapExecutionMemoryPool are instances of the same class: one records execution memory taken from heap memory, the other records execution memory taken from direct memory.

    private[memory] def releaseAllExecutionMemoryForTask(taskAttemptId: Long): Long = synchronized {
      onHeapExecutionMemoryPool.releaseAllMemoryForTask(taskAttemptId) +
        offHeapExecutionMemoryPool.releaseAllMemoryForTask(taskAttemptId)
    }


The bookkeeping for storage memory is not as fine-grained as for execution memory: it does not record how much memory each RDD uses.

    def releaseStorageMemory(numBytes: Long, memoryMode: MemoryMode): Unit = synchronized {
      memoryMode match {
        case MemoryMode.ON_HEAP => onHeapStorageMemoryPool.releaseMemory(numBytes)
        case MemoryMode.OFF_HEAP => offHeapStorageMemoryPool.releaseMemory(numBytes)
      }
    }


Next, the method that releases unroll memory. We can see that unroll memory is accounted as storage memory. Recall from the BlockManager section that unroll memory is requested when MemoryStore stores data as a block: the data has to be unrolled temporarily in memory first, and that requires unroll memory.

    final def releaseUnrollMemory(numBytes: Long, memoryMode: MemoryMode): Unit = synchronized {
      releaseStorageMemory(numBytes, memoryMode)
    }


From the release methods analyzed above it is easy to see that "freeing memory" merely changes some bookkeeping quantities inside the memory manager. This requires the external caller to make sure it really has released that much memory; otherwise the memory manager's view will deviate badly from actual memory usage. The good news is that the memory manager is an internal Spark module that is not exposed to users, so user code never calls it.
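The point that acquiring and releasing only move counters can be illustrated with a toy pool. This is a minimal sketch, not Spark's ExecutionMemoryPool: BookkeepingPool and its methods are hypothetical names, and the waiting and fairness logic of the real pool is left out.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-in for an execution pool: it only tracks numbers.
final class BookkeepingPool(val poolSize: Long) {
  // Bytes currently attributed to each task id
  private val memoryForTask = mutable.HashMap.empty[Long, Long]

  def memoryUsed: Long = synchronized { memoryForTask.values.sum }
  def memoryFree: Long = synchronized { poolSize - memoryUsed }

  /** Grants at most the free amount and returns the number of bytes granted. */
  def acquire(numBytes: Long, taskId: Long): Long = synchronized {
    val granted = math.min(numBytes, memoryFree)
    if (granted > 0) {
      memoryForTask(taskId) = memoryForTask.getOrElse(taskId, 0L) + granted
    }
    granted
  }

  /** Releases at most what the task holds; no real memory is touched. */
  def release(numBytes: Long, taskId: Long): Unit = synchronized {
    val cur = memoryForTask.getOrElse(taskId, 0L)
    val remaining = cur - math.min(cur, numBytes)
    if (remaining <= 0) memoryForTask.remove(taskId)
    else memoryForTask(taskId) = remaining
  }
}
```

Nothing here allocates or frees a single byte. If a caller releases 60 bytes in the bookkeeping while still holding references to the data, the pool reports 60 bytes free that the JVM cannot actually reuse, which is exactly the deviation described above.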


As mentioned at the beginning, Spark has two memory managers, and newer versions use the unified memory manager, UnifiedMemoryManager, by default. The static memory manager will gradually be phased out, so we focus on unified memory management here.
Earlier we analyzed the release methods in the parent class MemoryManager. The acquire methods are abstract and are implemented in the subclass, here UnifiedMemoryManager.


acquireExecutionMemory requests execution memory. It defines two local methods: maybeGrowExecutionPool squeezes storage memory to enlarge the execution pool;
computeMaxExecutionPoolSize computes the maximum size the execution pool could reach.
Finally, executionPool.acquireMemory is called to actually acquire the execution memory.

    override private[memory] def acquireExecutionMemory(
        numBytes: Long,
        taskAttemptId: Long,
        memoryMode: MemoryMode): Long = synchronized {
      // Check that the requested size is valid
      assert(numBytes >= 0)
      // Pick the pools and sizes depending on whether this is heap memory or direct memory
      val (executionPool, storagePool, storageRegionSize, maxMemory) = memoryMode match {
        case MemoryMode.ON_HEAP => (
          onHeapExecutionMemoryPool,
          onHeapStorageMemoryPool,
          onHeapStorageRegionSize,
          maxHeapMemory)
        case MemoryMode.OFF_HEAP => (
          offHeapExecutionMemoryPool,
          offHeapStorageMemoryPool,
          offHeapStorageMemory,
          maxOffHeapMemory)
      }

      /**
       * Grow the execution pool by evicting cached blocks, thereby shrinking the storage pool.
       * When acquiring memory for a task, the execution pool may need to make multiple
       * attempts. Each attempt must be able to evict storage in case another task jumps in
       * and caches a large block between the attempts. This is called once per attempt.
       */
      // Grow execution memory by squeezing storage memory:
      // cached blocks are pushed out to disk to make room for execution memory
      def maybeGrowExecutionPool(extraMemoryNeeded: Long): Unit = {
        if (extraMemoryNeeded > 0) {
          // There is not enough free memory in the execution pool, so try to reclaim memory from
          // storage. We can reclaim any free memory from the storage pool. If the storage pool
          // has grown to become larger than `storageRegionSize`, we can evict blocks and reclaim
          // the memory that storage has borrowed from execution.
          // All free storage memory can be taken over for execution.
          // In addition, if storage has borrowed from execution, i.e. the storage pool is
          // currently larger than its configured size, everything it borrowed is taken back.
          val memoryReclaimableFromStorage = math.max(
            storagePool.memoryFree,
            storagePool.poolSize - storageRegionSize)
          if (memoryReclaimableFromStorage > 0) {
            // Only reclaim as much space as is necessary and available:
            // this call pushes in-memory blocks out to disk
            val spaceToReclaim = storagePool.freeSpaceToShrinkPool(
              math.min(extraMemoryNeeded, memoryReclaimableFromStorage))
            // Update the bookkeeping: the storage pool shrinks by this amount
            // and the execution pool grows by the same amount
            storagePool.decrementPoolSize(spaceToReclaim)
            executionPool.incrementPoolSize(spaceToReclaim)
          }
        }
      }

      /**
       * The size the execution pool would have after evicting storage memory.
       * The execution memory pool divides this quantity among the active tasks evenly to cap
       * the execution memory allocation for each task. It is important to keep this greater
       * than the execution pool size, which doesn't take into account potential memory that
       * could be freed by evicting storage. Otherwise we may hit SPARK-12155.
       * Additionally, this quantity should be kept below `maxMemory` to arbitrate fairness
       * in execution memory allocation across tasks. Otherwise, a task may occupy more than
       * its fair share of execution memory, mistakenly thinking that other tasks can acquire
       * the portion of storage memory that cannot be evicted.
       */
      def computeMaxExecutionPoolSize(): Long = {
        maxMemory - math.min(storagePool.memoryUsed, storageRegionSize)
      }

      executionPool.acquireMemory(
        numBytes, taskAttemptId, maybeGrowExecutionPool, () => computeMaxExecutionPoolSize)
    }


I will not post the code of executionPool.acquireMemory. It mainly applies fairly complex rules to work out how much memory to grant, and it maintains the internal bookkeeping. In addition, if too little memory is currently available, it waits (on the object lock) until other tasks free some memory.
The most important thing it does is call the maybeGrowExecutionPool method described above, so that is the method we focus on.
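Since the code is not shown, here is a rough model of the grant rule as I recall it from the Spark source: with N active tasks, a task may hold at most 1/N of the maximum pool size, and it waits rather than proceeds if it would be left below 1/(2N) of the current pool size. Treat both bounds as an assumption about the version discussed; TaskGrant and its method names are made up for illustration.

```scala
// Simplified model of the per-task grant rule; all names are illustrative.
object TaskGrant {
  /** (minimum, maximum) bytes one task may hold, given the number of active tasks. */
  def bounds(poolSize: Long, maxPoolSize: Long, numActiveTasks: Int): (Long, Long) = {
    val minPerTask = poolSize / (2L * numActiveTasks)
    val maxPerTask = maxPoolSize / numActiveTasks
    (minPerTask, maxPerTask)
  }

  /** Bytes that can be granted right now without exceeding the per-task cap. */
  def grant(numBytes: Long, curMem: Long, memoryFree: Long, maxPerTask: Long): Long = {
    val maxToGrant = math.min(numBytes, math.max(0L, maxPerTask - curMem))
    math.min(maxToGrant, memoryFree)
  }

  /** True if the task should block: it got less than it asked for and is still below its minimum. */
  def mustWait(numBytes: Long, granted: Long, curMem: Long, minPerTask: Long): Boolean =
    granted < numBytes && curMem + granted < minPerTask
}
```

With a pool of 1000 bytes that could grow to 1200 and 4 active tasks, the bounds are 125 and 300 bytes. A task holding 100 bytes that asks for 400 is granted 200 and continues; a task holding nothing that can only get 50 bytes would wait.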


maybeGrowExecutionPool has already been shown above with detailed comments, so we skip its logic. The key call in it is storagePool.freeSpaceToShrinkPool, which implements the eviction of blocks from memory.
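The two quantities computed in acquireExecutionMemory are easy to check with numbers. The sketch below restates them as pure functions; Reclaim and its parameter names are hypothetical, and only the two formulas are taken from the code shown earlier.

```scala
// Pure-function restatement of the two formulas; names are illustrative.
object Reclaim {
  /** Free storage memory, or whatever storage borrowed beyond its region, whichever is larger. */
  def reclaimableFromStorage(storagePoolSize: Long, storageUsed: Long, storageRegionSize: Long): Long =
    math.max(storagePoolSize - storageUsed, storagePoolSize - storageRegionSize)

  /** How much execution actually asks storage to give up for one request. */
  def spaceToReclaim(extraMemoryNeeded: Long, reclaimable: Long): Long =
    if (extraMemoryNeeded > 0 && reclaimable > 0) math.min(extraMemoryNeeded, reclaimable) else 0L

  /** Size the execution pool could reach after evicting everything it is allowed to evict. */
  def maxExecutionPoolSize(maxMemory: Long, storageUsed: Long, storageRegionSize: Long): Long =
    maxMemory - math.min(storageUsed, storageRegionSize)
}
```

Take maxMemory = 1000 and a storage region of 500, with the storage pool grown to 700 of which 650 is used. Then 200 bytes are reclaimable (the amount borrowed), an execution request needing 300 extra bytes reclaims only 200, and the execution pool can never exceed 500, because the 500 bytes of storage inside its region cannot be evicted.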


We find that freeSpaceToShrinkPool calls the memoryStore.evictBlocksToFreeSpace method:

    def freeSpaceToShrinkPool(spaceToFree: Long): Long = lock.synchronized {
      val spaceFreedByReleasingUnusedMemory = math.min(spaceToFree, memoryFree)
      val remainingSpaceToFree = spaceToFree - spaceFreedByReleasingUnusedMemory
      if (remainingSpaceToFree > 0) {
        // If reclaiming free memory did not adequately shrink the pool, begin evicting blocks:
        val spaceFreedByEviction =
          memoryStore.evictBlocksToFreeSpace(None, remainingSpaceToFree, memoryMode)
        // When a block is released, BlockManager.dropFromMemory() calls releaseMemory(), so we do
        // not need to decrement _memoryUsed here. However, we do need to decrement the pool size.
        spaceFreedByReleasingUnusedMemory + spaceFreedByEviction
      } else {
        spaceFreedByReleasingUnusedMemory
      }
    }


evictBlocksToFreeSpace looks long, but it can be summarized briefly.
Because MemoryStore holds the actual data of every block in memory, it knows the exact size of each block and can therefore work out which blocks have to be evicted. There are of course some details along the way, such as acquiring and releasing the write locks of the blocks.
The actual removal of a block from memory (essentially dropping the reference to the block's MemoryEntry so that the data can be reclaimed by the GC) is implemented in blockEvictionHandler.dropFromMemory, i.e. BlockManager.dropFromMemory.
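The selection step can be modeled in a few lines. This is a simplified sketch under stated assumptions, not MemoryStore itself: Block, lockable and select are hypothetical, the write lock is reduced to a boolean, and the memory mode check is omitted. It keeps the two rules that matter: blocks of the RDD being stored are never evicted to make room for that same RDD, and eviction is all or nothing.

```scala
// Simplified model of how blocks are picked for eviction; all names are illustrative.
object EvictionSketch {
  final case class Block(id: String, rddId: Option[Int], size: Long, lockable: Boolean)

  /**
   * Walks the blocks in order and selects evictable ones until `space` bytes are covered.
   * Returns an empty selection if the target cannot be reached.
   */
  def select(blocks: Seq[Block], space: Long, rddToAdd: Option[Int]): Seq[Block] = {
    var freed = 0L
    val selected = Seq.newBuilder[Block]
    val it = blocks.iterator
    while (freed < space && it.hasNext) {
      val b = it.next()
      // Do not evict blocks that belong to the RDD we are trying to store
      val evictable = rddToAdd.isEmpty || rddToAdd != b.rddId
      // `lockable` stands in for a successful non-blocking write-lock attempt
      if (evictable && b.lockable) {
        selected += b
        freed += b.size
      }
    }
    if (freed >= space) selected.result() else Seq.empty
  }
}
```

Given three 100-byte blocks where the second is currently being read (not lockable), a request for 150 bytes with no RDD constraint selects the first and third. If the first block belongs to the RDD being stored, only 100 bytes could be freed, so nothing is evicted at all.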

    private[spark] def evictBlocksToFreeSpace(
        blockId: Option[BlockId],
        space: Long,
        memoryMode: MemoryMode): Long = {
      assert(space > 0)
      memoryManager.synchronized {
        var freedMemory = 0L
        val rddToAdd = blockId.flatMap(getRddId)
        val selectedBlocks = new ArrayBuffer[BlockId]
        def blockIsEvictable(blockId: BlockId, entry: MemoryEntry[_]): Boolean = {
          entry.memoryMode == memoryMode && (rddToAdd.isEmpty || rddToAdd != getRddId(blockId))
        }
        // This is synchronized to ensure that the set of entries is not changed
        // (because of getValue or getBytes) while traversing the iterator, as that
        // can lead to exceptions.
        entries.synchronized {
          val iterator = entries.entrySet().iterator()
          while (freedMemory < space && iterator.hasNext) {
            val pair = iterator.next()
            val blockId = pair.getKey
            val entry = pair.getValue
            if (blockIsEvictable(blockId, entry)) {
              // We don't want to evict blocks which are currently being read, so we need to obtain
              // an exclusive write lock on blocks which are candidates for eviction. We perform a
              // non-blocking "tryLock" here in order to ignore blocks which are locked for reading:
              // The write lock prevents evicting a block while it is being read or written
              if (blockInfoManager.lockForWriting(blockId, blocking = false).isDefined) {
                selectedBlocks += blockId
                freedMemory += pair.getValue.size
              }
            }
          }
        }

        def dropBlock[T](blockId: BlockId, entry: MemoryEntry[T]): Unit = {
          val data = entry match {
            case DeserializedMemoryEntry(values, _, _) => Left(values)
            case SerializedMemoryEntry(buffer, _, _) => Right(buffer)
          }
          // This call evicts the block from memory; if the storage level allows disk,
          // the block is spilled to disk.
          // Note that the implementation of blockEvictionHandler is BlockManager
          val newEffectiveStorageLevel =
            blockEvictionHandler.dropFromMemory(blockId, () => data)(entry.classTag)
          if (newEffectiveStorageLevel.isValid) {
            // The block is still present in at least one store, so release the lock
            // but don't delete the block info
            // The write locks acquired earlier are still held, so release them here
            blockInfoManager.unlock(blockId)
          } else {
            // The block isn't present in any store, so delete the block info so that the
            // block can be stored again
            // The block left memory without being written to disk,
            // so its info is removed from the internal bookkeeping
            blockInfoManager.removeBlock(blockId)
          }
        }

        // Blocks are only actually dropped if enough memory can be freed,
        // i.e. at least the requested amount
        if (freedMemory >= space) {
          var lastSuccessfulBlock = -1
          try {
            logInfo(s"${selectedBlocks.size} blocks selected for dropping " +
              s"(${Utils.bytesToString(freedMemory)} bytes)")
            (0 until selectedBlocks.size).foreach { idx =>
              val blockId = selectedBlocks(idx)
              val entry = entries.synchronized {
                entries.get(blockId)
              }
              // This should never be null as only one task should be dropping
              // blocks and removing entries. However the check is still here for
              // future safety.
              if (entry != null) {
                dropBlock(blockId, entry)
                // A hook method left for tests
                afterDropAction(blockId)
              }
              lastSuccessfulBlock = idx
            }
            logInfo(s"After dropping ${selectedBlocks.size} blocks, " +
              s"free memory is ${Utils.bytesToString(maxMemory - blocksMemoryUsed)}")
            freedMemory
          } finally {
            // like BlockManager.doPut, we use a finally rather than a catch to avoid having to deal
            // with InterruptedException
            // If not every block was dropped successfully, some write locks are still held,
            // so the write locks of the blocks that were not dropped are released here
            if (lastSuccessfulBlock != selectedBlocks.size - 1) {
              // the blocks we didn't process successfully are still locked, so we have to unlock them
              (lastSuccessfulBlock + 1 until selectedBlocks.size).foreach { idx =>
                val blockId = selectedBlocks(idx)
                blockInfoManager.unlock(blockId)
              }
            }
          }
        } else {
          // If not enough memory can be freed, abort and release all write locks acquired so far
          blockId.foreach { id =>
            logInfo(s"Will not store $id")
          }
          selectedBlocks.foreach { id =>
            blockInfoManager.unlock(id)
          }
          0L
        }
      }
    }


To summarize the main logic of BlockManager.dropFromMemory:

  • If the storage level allows disk, first spill the block to disk
  • Remove the block from the MemoryStore's internal map
  • Report the block update to the BlockManagerMaster on the driver
  • Report the block update to the task metrics system

So after going around in a big circle, evicting a block from memory comes down to setting a reference to null ^_^ Of course it is not really that simple. Throughout the analysis we can see that most of the work in memory management is maintaining bookkeeping quantities, and some of the logic is more involved, for example computing how much memory to grant each task.

    private[storage] override def dropFromMemory[T: ClassTag](
        blockId: BlockId,
        data: () => Either[Array[T], ChunkedByteBuffer]): StorageLevel = {
      logInfo(s"Dropping block $blockId from memory")
      val info = blockInfoManager.assertBlockIsLockedForWriting(blockId)
      var blockIsUpdated = false
      val level = info.level

      // Drop to disk, if storage level requires
      // If the storage level allows disk, spill to disk first
      if (level.useDisk && !diskStore.contains(blockId)) {
        logInfo(s"Writing block $blockId to disk")
        data() match {
          case Left(elements) =>
            diskStore.put(blockId) { channel =>
              val out = Channels.newOutputStream(channel)
              serializerManager.dataSerializeStream(
                blockId,
                out,
                elements.toIterator)(info.classTag.asInstanceOf[ClassTag[T]])
            }
          case Right(bytes) =>
            diskStore.putBytes(blockId, bytes)
        }
        blockIsUpdated = true
      }

      // Actually drop from memory store
      val droppedMemorySize =
        if (memoryStore.contains(blockId)) memoryStore.getSize(blockId) else 0L
      val blockIsRemoved = memoryStore.remove(blockId)
      if (blockIsRemoved) {
        blockIsUpdated = true
      } else {
        logWarning(s"Block $blockId could not be dropped from memory as it does not exist")
      }

      val status = getCurrentBlockStatus(blockId, info)
      if (info.tellMaster) {
        reportBlockStatus(blockId, status, droppedMemorySize)
      }
      // Report the block update to the task metrics system
      if (blockIsUpdated) {
        addUpdatedBlockStatusToTaskMetrics(blockId, status)
      }
      status.storageLevel
    }


Next, let us look at how storage memory is acquired.
The logic by which storage memory borrows from execution memory is relatively simple: only the sizes of the two pools change. The execution pool shrinks by a certain amount and the storage pool grows by the same amount.
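A worked example of the borrowing step, as a minimal sketch. Pools and Borrow are hypothetical names; the only rule taken from the source is that storage borrows the smaller of the execution pool's free memory and its own shortfall.

```scala
// Toy model of the two pools; only sizes and usage counters, no real memory.
final case class Pools(executionSize: Long, executionUsed: Long, storageSize: Long, storageUsed: Long) {
  def executionFree: Long = executionSize - executionUsed
  def storageFree: Long = storageSize - storageUsed
}

object Borrow {
  /** Resizes the pools so that a storage request of `numBytes` fits, as far as free execution memory allows. */
  def borrowForStorage(p: Pools, numBytes: Long): Pools = {
    if (numBytes <= p.storageFree) {
      p // enough free storage memory already, nothing to borrow
    } else {
      val borrowed = math.min(p.executionFree, numBytes - p.storageFree)
      p.copy(
        executionSize = p.executionSize - borrowed,
        storageSize = p.storageSize + borrowed)
    }
  }
}
```

Starting from two 500-byte pools with 100 bytes of execution and 450 bytes of storage in use, a 200-byte storage request is 150 bytes short. The execution pool has 400 bytes free, so 150 are moved: the pools become 350 and 650 bytes, and the total stays at 1000.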

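    override def acquireStorageMemory(
        blockId: BlockId,
        numBytes: Long,
        memoryMode: MemoryMode): Boolean = synchronized {
      assert(numBytes >= 0)
      val (executionPool, storagePool, maxMemory) = memoryMode match {
        case MemoryMode.ON_HEAP => (
          onHeapExecutionMemoryPool,
          onHeapStorageMemoryPool,
          maxOnHeapStorageMemory)
        case MemoryMode.OFF_HEAP => (
          offHeapExecutionMemoryPool,
          offHeapStorageMemoryPool,
          maxOffHeapStorageMemory)
      }
      // Execution memory cannot be evicted, so a request larger than the memory
      // storage could currently obtain cannot be satisfied
      if (numBytes > maxMemory) {
        // Fail fast if the block simply won't fit
        logInfo(s"Will not store $blockId as the required space ($numBytes bytes) exceeds our " +
          s"memory limit ($maxMemory bytes)")
        return false
      }
      // If the request exceeds the free storage memory, borrow from execution memory
      if (numBytes > storagePool.memoryFree) {
        // There is not enough free memory in the storage pool, so try to borrow free memory from
        // the execution pool.
        val memoryBorrowedFromExecution = Math.min(executionPool.memoryFree,
          numBytes - storagePool.memoryFree)
        // Borrowing from execution is simple: only the two pool sizes change.
        // The execution pool shrinks and the storage pool grows by the same amount
        executionPool.decrementPoolSize(memoryBorrowedFromExecution)
        storagePool.incrementPoolSize(memoryBorrowedFromExecution)
      }
      // Acquire the memory through storagePool
      storagePool.acquireMemory(blockId, numBytes)
    }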


    def acquireMemory(
        blockId: BlockId,
        numBytesToAcquire: Long,
        numBytesToFree: Long): Boolean = lock.synchronized {
      assert(numBytesToAcquire >= 0)
      assert(numBytesToFree >= 0)
      assert(memoryUsed <= poolSize)
      // First evict some blocks through MemoryStore to free memory
      if (numBytesToFree > 0) {
        memoryStore.evictBlocksToFreeSpace(Some(blockId), numBytesToFree, memoryMode)
      }
      // NOTE: If the memory store evicts blocks, then those evictions will synchronously call
      // back into this StorageMemoryPool in order to free memory. Therefore, these variables
      // should have been updated.
      // When blocks are evicted, BlockManager updates the bookkeeping through MemoryManager,
      // so memoryFree has grown by this point
      val enoughMemory = numBytesToAcquire <= memoryFree
      if (enoughMemory) {
        _memoryUsed += numBytesToAcquire
      }
      enoughMemory
    }

We can see that memoryStore.evictBlocksToFreeSpace is called here to evict some blocks from memory and make room for the new block.


Then there is the acquisition of unroll memory, which in fact just acquires storage memory.

    override def acquireUnrollMemory(
        blockId: BlockId,
        numBytes: Long,
        memoryMode: MemoryMode): Boolean = synchronized {
      acquireStorageMemory(blockId, numBytes, memoryMode)
    }

Summary

Memory management is essentially bookkeeping: it records and manages in fine detail the memory used by shuffle sorting and by the RDD cache, in order to avoid OOM as far as possible while maximizing memory utilization.
