[Spark] Memory management: MemoryManager analysis

Overview

Spark has two memory management solutions. The old solution is implemented by StaticMemoryManager, the new one by UnifiedMemoryManager.

The old solution is static: the memory owned by storageMemory and executionMemory is exclusive and cannot be borrowed by the other side. Therefore, if one side has plenty of memory while the other runs short but cannot borrow, resources are wasted. The new solution manages memory in a unified way: each side starts with half of the memory, but when one runs short it can borrow from the other, making reasonable and effective use of memory and improving overall resource utilization.

In general, memory is divided into three regions: storageMemory, executionMemory, and the system reservation. storageMemory is used to cache RDDs, unroll partitions, store direct task results and broadcast variables, and store the blocks of each batch in Spark Streaming receiver mode. executionMemory is used for buffering in shuffle, join, sort, and aggregation. Everything outside these two is reserved for the system.

The old solution: StaticMemoryManager

A memoryManager is created in SparkEnv:

val useLegacyMemoryManager = conf.getBoolean("spark.memory.useLegacyMode", false)
val memoryManager: MemoryManager =
  if (useLegacyMemoryManager) {
    new StaticMemoryManager(conf, numUsableCores)
  } else {
    UnifiedMemoryManager(conf, numUsableCores)
  }

The unified scheme, UnifiedMemoryManager, is used by default. Let's first take a brief look at the old scheme, StaticMemoryManager.

The memory that storageMemory can allocate is:

systemMaxMemory * memoryFraction * safetyFraction

where:

  • systemMaxMemory: Runtime.getRuntime.maxMemory, the maximum memory the JVM can obtain.
  • memoryFraction: controlled by spark.storage.memoryFraction, default 0.6.
  • safetyFraction: controlled by spark.storage.safetyFraction, default 0.9; since cached block sizes are only estimates, a safety factor is needed as a buffer.

The memory that executionMemory can allocate is:

systemMaxMemory * memoryFraction * safetyFraction

where:

  • systemMaxMemory: Runtime.getRuntime.maxMemory, the maximum memory the JVM can obtain.
  • memoryFraction: controlled by spark.shuffle.memoryFraction, default 0.2.
  • safetyFraction: controlled by spark.shuffle.safetyFraction, default 0.8.

Memory not covered by the memoryFraction coefficients and safety factors is reserved for the system.
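
To make the static split concrete, here is a minimal sketch (illustrative numbers, not Spark source; the real systemMaxMemory comes from Runtime.getRuntime.maxMemory) of the three regions for a 1GB heap with the default fractions:

val systemMaxMemory = 1024L * 1024 * 1024                    // pretend the JVM heap is 1GB
// storage: spark.storage.memoryFraction (0.6) * spark.storage.safetyFraction (0.9)
val maxStorage = (systemMaxMemory * 0.6 * 0.9).toLong        // ≈ 553MB
// execution: spark.shuffle.memoryFraction (0.2) * spark.shuffle.safetyFraction (0.8)
val maxExecution = (systemMaxMemory * 0.2 * 0.8).toLong      // ≈ 164MB
val reserved = systemMaxMemory - maxStorage - maxExecution   // the rest, ≈ 307MB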

The memory allocated to executionMemory directly affects how often shuffle spills to disk: increasing executionMemory reduces the number of spills, but correspondingly shrinks storageMemory.

Once execution and storage have been allocated, their sizes stay fixed. Each request can only draw on its own region; the two cannot borrow from each other, which wastes resources. In addition, only execution memory supports off-heap mode; storage memory does not.

The new solution: UnifiedMemoryManager

Since storageMemory and executionMemory are managed together in the new solution, let's see how much memory the two can get in total:

private def getMaxMemory(conf: SparkConf): Long = {
    val systemMemory = conf.getLong("spark.testing.memory", Runtime.getRuntime.maxMemory)
    val reservedMemory = conf.getLong("spark.testing.reservedMemory",
      if (conf.contains("spark.testing")) 0 else RESERVED_SYSTEM_MEMORY_BYTES)
    val minSystemMemory = (reservedMemory * 1.5).ceil.toLong
    if (systemMemory < minSystemMemory) {
      throw new IllegalArgumentException(s"System memory $systemMemory must " +
        s"be at least $minSystemMemory. Please increase heap size using the --driver-memory " +
        s"option or spark.driver.memory in Spark configuration.")
    }
    // SPARK-12759 Check executor memory to fail fast if memory is insufficient
    if (conf.contains("spark.executor.memory")) {
      val executorMemory = conf.getSizeAsBytes("spark.executor.memory")
      if (executorMemory < minSystemMemory) {
        throw new IllegalArgumentException(s"Executor memory $executorMemory must be at least " +
          s"$minSystemMemory. Please increase executor memory using the " +
          s"--executor-memory option or spark.executor.memory in Spark configuration.")
      }
    }
    val usableMemory = systemMemory - reservedMemory
    val memoryFraction = conf.getDouble("spark.memory.fraction", 0.6)
    (usableMemory * memoryFraction).toLong
  }

First, 300MB is reserved for the system as reservedMemory. If the maximum memory the JVM can get, or the configured executor memory, is less than 1.5 times reservedMemory (i.e. 450MB), an exception is thrown. Finally, the memory available to storage and execution together is:

(heap space - 300MB) * spark.memory.fraction (default 0.6)

Storage and execution each initially get 50% of this memory (controlled by spark.memory.storageFraction, default 0.5).
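
As a worked example (illustrative numbers, assuming a 4GB executor heap and default settings):

val systemMemory   = 4096L * 1024 * 1024            // Runtime.getRuntime.maxMemory
val reservedMemory = 300L * 1024 * 1024             // RESERVED_SYSTEM_MEMORY_BYTES
val usableMemory   = systemMemory - reservedMemory  // 3796MB
val maxMemory      = (usableMemory * 0.6).toLong    // spark.memory.fraction = 0.6, ≈ 2278MB
// With spark.memory.storageFraction = 0.5, storage and execution start at ≈ 1139MB each.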

Acquiring storage memory

Requesting numBytes of memory for a blockId:

override def acquireStorageMemory(
      blockId: BlockId,
      numBytes: Long,
      memoryMode: MemoryMode): Boolean = synchronized {
    assertInvariants()
    assert(numBytes >= 0)
    val (executionPool, storagePool, maxMemory) = memoryMode match {
      case MemoryMode.ON_HEAP => (
        onHeapExecutionMemoryPool,
        onHeapStorageMemoryPool,
        maxOnHeapStorageMemory)
      case MemoryMode.OFF_HEAP => (
        offHeapExecutionMemoryPool,
        offHeapStorageMemoryPool,
        maxOffHeapMemory)
    }
    // the requested memory exceeds the sum of storage and execution memory
    if (numBytes > maxMemory) {
      // Fail fast if the block simply won't fit
      logInfo(s"Will not store $blockId as the required space ($numBytes bytes) exceeds our " +
        s"memory limit ($maxMemory bytes)")
      return false
    }
    // exceeds storage's free memory
    if (numBytes > storagePool.memoryFree) {
      // There is not enough free memory in the storage pool, so try to borrow free memory from
      // the execution pool.
      val memoryBorrowedFromExecution = Math.min(executionPool.memoryFree, numBytes)
      executionPool.decrementPoolSize(memoryBorrowedFromExecution)
      storagePool.incrementPoolSize(memoryBorrowedFromExecution)
    }
    storagePool.acquireMemory(blockId, numBytes)
  }

  • If the requested numBytes exceeds the combined memory of the two pools, return false immediately: the request fails.
  • If numBytes exceeds storage's free memory, borrow from the executionPool:
    • the amount borrowed is the smaller of execution's free memory and numBytes (in my view it should be the smaller of execution's free memory and numBytes - storage's free memory)
    • decrease execution's poolSize
    • increase storage's poolSize

Even after borrowing from the executionPool, there may still not be enough for numBytes, because memory that execution is actively using cannot be taken. The storagePool's acquireMemory method is then called; when free memory is still insufficient, it evicts cached RDD blocks from the storage pool to raise storagePool.memoryFree:

def acquireMemory(blockId: BlockId, numBytes: Long): Boolean = lock.synchronized {
    val numBytesToFree = math.max(0, numBytes - memoryFree)
    acquireMemory(blockId, numBytes, numBytesToFree)
  }

It computes how much memory must still be freed to satisfy numBytes after borrowing from execution, i.e. numBytesToFree, and then calls the overloaded acquireMemory method:

def acquireMemory(
      blockId: BlockId,
      numBytesToAcquire: Long,
      numBytesToFree: Long): Boolean = lock.synchronized {
    assert(numBytesToAcquire >= 0)
    assert(numBytesToFree >= 0)
    assert(memoryUsed <= poolSize)
    if (numBytesToFree > 0) {
      memoryStore.evictBlocksToFreeSpace(Some(blockId), numBytesToFree, memoryMode)
    }
    // NOTE: If the memory store evicts blocks, then those evictions will synchronously call
    // back into this StorageMemoryPool in order to free memory. Therefore, these variables
    // should have been updated.
    val enoughMemory = numBytesToAcquire <= memoryFree
    if (enoughMemory) {
      _memoryUsed += numBytesToAcquire
    }
    enoughMemory
  }

When numBytesToFree is greater than 0, blocks cached in memory really do need to be evicted; after eviction, it checks whether free memory now satisfies the request, and if so adds numBytesToAcquire to the used counter.

Let's see how blocks are evicted from storage when space must be freed:

private[spark] def evictBlocksToFreeSpace(
      blockId: Option[BlockId],
      space: Long,
      memoryMode: MemoryMode): Long = {
    assert(space > 0)
    memoryManager.synchronized {
      var freedMemory = 0L
      val rddToAdd = blockId.flatMap(getRddId)
      val selectedBlocks = new ArrayBuffer[BlockId]
      def blockIsEvictable(blockId: BlockId, entry: MemoryEntry[_]): Boolean = {
        entry.memoryMode == memoryMode && (rddToAdd.isEmpty || rddToAdd != getRddId(blockId))
      }
      // This is synchronized to ensure that the set of entries is not changed
      // (because of getValue or getBytes) while traversing the iterator, as that
      // can lead to exceptions.
      entries.synchronized {
        val iterator = entries.entrySet().iterator()
        while (freedMemory < space && iterator.hasNext) {
          val pair = iterator.next()
          val blockId = pair.getKey
          val entry = pair.getValue
          if (blockIsEvictable(blockId, entry)) {
            // We don't want to evict blocks which are currently being read, so we need to obtain
            // an exclusive write lock on blocks which are candidates for eviction. We perform a
            // non-blocking "tryLock" here in order to ignore blocks which are locked for reading:
            if (blockInfoManager.lockForWriting(blockId, blocking = false).isDefined) {
              selectedBlocks += blockId
              freedMemory += pair.getValue.size
            }
          }
        }
      }

      def dropBlock[T](blockId: BlockId, entry: MemoryEntry[T]): Unit = {
        val data = entry match {
          case DeserializedMemoryEntry(values, _, _) => Left(values)
          case SerializedMemoryEntry(buffer, _, _) => Right(buffer)
        }
        val newEffectiveStorageLevel =
          blockEvictionHandler.dropFromMemory(blockId, () => data)(entry.classTag)
        if (newEffectiveStorageLevel.isValid) {
          // The block is still present in at least one store, so release the lock
          // but don't delete the block info
          blockInfoManager.unlock(blockId)
        } else {
          // The block isn't present in any store, so delete the block info so that the
          // block can be stored again
          blockInfoManager.removeBlock(blockId)
        }
      }

      if (freedMemory >= space) {
        logInfo(s"${selectedBlocks.size} blocks selected for dropping " +
          s"(${Utils.bytesToString(freedMemory)} bytes)")
        for (blockId <- selectedBlocks) {
          val entry = entries.synchronized { entries.get(blockId) }
          // This should never be null as only one task should be dropping
          // blocks and removing entries. However the check is still here for
          // future safety.
          if (entry != null) {
            dropBlock(blockId, entry)
          }
        }
        logInfo(s"After dropping ${selectedBlocks.size} blocks, " +
          s"free memory is ${Utils.bytesToString(maxMemory - blocksMemoryUsed)}")
        freedMemory
      } else {
        blockId.foreach { id =>
          logInfo(s"Will not store $id")
        }
        selectedBlocks.foreach { id =>
          blockInfoManager.unlock(id)
        }
        0L
      }
    }
  }

In Spark, blocks in memory are all held by the memoryStore, which uses

private val entries = new LinkedHashMap[BlockId, MemoryEntry[_]](32, 0.75f, true)

to maintain the mapping from blockId to MemoryEntry (a wrapper around the block's value). Two helper methods are defined inside evictBlocksToFreeSpace. blockIsEvictable checks whether a traversed block belongs to the same RDD as the block being stored, since it makes no sense to evict one block of an RDD in order to cache another block of the same RDD. dropBlock actually removes the block from memory; if its StorageLevel includes disk, the block is written to a disk file first.
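
The third constructor argument (accessOrder = true) is what gives eviction its LRU flavor: the map iterates from the least recently accessed entry onwards, so the eviction loop above considers the least recently used blocks first. A small illustration (hypothetical blockIds):

import java.util.LinkedHashMap

val entries = new LinkedHashMap[String, Long](32, 0.75f, true)
entries.put("rdd_0_0", 100L)
entries.put("rdd_0_1", 200L)
entries.get("rdd_0_0")   // touching rdd_0_0 moves it to the end of the iteration order
// Iteration order is now rdd_0_1, rdd_0_0, so rdd_0_1 is the first eviction candidate.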

A simple overview of the whole code's logic: traverse the blocks currently held by the memoryStore (skipping those that belong to the same RDD as the block being stored), stopping once the total size of the selected blocks reaches the space to free; even a full traversal may fall short. If the releasable memory covers the requirement, the eviction is actually performed; otherwise nothing is evicted, since a block cannot be stored with only part of its data in memory. Finally the memory actually freed by the removed blocks is returned (0 if nothing was evicted).

Summarizing the process of requesting storage memory (in MemoryMode.ON_HEAP mode; a toy sketch follows the list):

  • If numBytes exceeds the combined storage and execution memory, return false.
  • If numBytes exceeds storage's free memory, borrow min(executionFree, numBytes) from execution and update both poolSizes.
  • If that is still not enough, evict blocks from storage to make up the difference.
    • If the blocks cached in the memoryStore cover the shortfall, the eviction is actually performed (traversing blocks until enough memory is covered); otherwise nothing is evicted.
  • Finally, return true if free memory now satisfies numBytes, otherwise false.
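
The following toy sketch (not Spark source; eviction is omitted) models the borrowing rule from the list above:

case class Pools(var storageSize: Long, var executionSize: Long,
                 var storageUsed: Long, var executionUsed: Long)

def acquireStorage(p: Pools, numBytes: Long): Boolean = {
  if (numBytes > p.storageSize + p.executionSize) return false   // will never fit
  if (numBytes > p.storageSize - p.storageUsed) {
    // borrow free memory from execution, growing the storage pool
    val borrowed = math.min(p.executionSize - p.executionUsed, numBytes)
    p.executionSize -= borrowed
    p.storageSize += borrowed
  }
  // evictBlocksToFreeSpace would run here if still short; omitted in this toy
  val enough = numBytes <= p.storageSize - p.storageUsed
  if (enough) p.storageUsed += numBytes
  enough
}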

Acquiring execution memory

When execution memory is insufficient, it can borrow from storage, and may still come up short of the requested amount. Let's see how a request for execution memory is handled (in MemoryMode.ON_HEAP mode):

override private[memory] def acquireExecutionMemory(
      numBytes: Long,
      taskAttemptId: Long,
      memoryMode: MemoryMode): Long = synchronized {
    assertInvariants()
    assert(numBytes >= 0)
    val (executionPool, storagePool, storageRegionSize, maxMemory) = memoryMode match {
      case MemoryMode.ON_HEAP => (
        onHeapExecutionMemoryPool,
        onHeapStorageMemoryPool,
        onHeapStorageRegionSize,
        maxHeapMemory)
      case MemoryMode.OFF_HEAP => (
        offHeapExecutionMemoryPool,
        offHeapStorageMemoryPool,
        offHeapStorageMemory,
        maxOffHeapMemory)
    }

    /**
     * Grow the execution pool by evicting cached blocks, thereby shrinking the storage pool.
     *
     * When acquiring memory for a task, the execution pool may need to make multiple
     * attempts. Each attempt must be able to evict storage in case another task jumps in
     * and caches a large block between the attempts. This is called once per attempt.
     */
    def maybeGrowExecutionPool(extraMemoryNeeded: Long): Unit = {
      if (extraMemoryNeeded > 0) {
        // There is not enough free memory in the execution pool, so try to reclaim memory from
        // storage. We can reclaim any free memory from the storage pool. If the storage pool
        // has grown to become larger than `storageRegionSize`, we can evict blocks and reclaim
        // the memory that storage has borrowed from execution.
        val memoryReclaimableFromStorage = math.max(
          storagePool.memoryFree,
          storagePool.poolSize - storageRegionSize)
        if (memoryReclaimableFromStorage > 0) {
          // Only reclaim as much space as is necessary and available:
          val spaceToReclaim = storagePool.freeSpaceToShrinkPool(
            math.min(extraMemoryNeeded, memoryReclaimableFromStorage))
          storagePool.decrementPoolSize(spaceToReclaim)
          executionPool.incrementPoolSize(spaceToReclaim)
        }
      }
    }

    /**
     * The size the execution pool would have after evicting storage memory.
     *
     * The execution memory pool divides this quantity among the active tasks evenly to cap
     * the execution memory allocation for each task. It is important to keep this greater
     * than the execution pool size, which doesn't take into account potential memory that
     * could be freed by evicting storage. Otherwise we may hit SPARK-12155.
     *
     * Additionally, this quantity should be kept below `maxMemory` to arbitrate fairness
     * in execution memory allocation across tasks, Otherwise, a task may occupy more than
     * its fair share of execution memory, mistakenly thinking that other tasks can acquire
     * the portion of storage memory that cannot be evicted.
     */
    def computeMaxExecutionPoolSize(): Long = {
      maxMemory - math.min(storagePool.memoryUsed, storageRegionSize)
    }

    executionPool.acquireMemory(
      numBytes, taskAttemptId, maybeGrowExecutionPool, computeMaxExecutionPoolSize)
  }

Look at the two helper methods:

maybeGrowExecutionPool borrows memory from storage. The maximum reclaimable amount, memoryReclaimableFromStorage, is the larger of storage's free memory and the memory storage previously borrowed from execution (storagePool.poolSize - storageRegionSize; that part may already be in use and must be evicted and returned). If memoryReclaimableFromStorage is 0, storage has never borrowed from execution and has no free memory to lend at this moment.
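
Two illustrative cases for memoryReclaimableFromStorage (numbers in MB, made up):

val storageRegionSize = 500L
// Case 1: storage grew past its region by borrowing 200 from execution.
math.max(50L /* memoryFree */, 700L /* poolSize */ - storageRegionSize)   // = 200, may require eviction
// Case 2: storage never borrowed; only its free memory can be handed over.
math.max(120L /* memoryFree */, 500L /* poolSize */ - storageRegionSize)  // = 120, no eviction needed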

The amount actually reclaimed is the smaller of the memory still needed and memoryReclaimableFromStorage (i.e. only as much as is missing). Follow the storagePool.freeSpaceToShrinkPool method to see its implementation:

def freeSpaceToShrinkPool(spaceToFree: Long): Long = lock.synchronized {
    val spaceFreedByReleasingUnusedMemory = math.min(spaceToFree, memoryFree)
    val remainingSpaceToFree = spaceToFree - spaceFreedByReleasingUnusedMemory
    if (remainingSpaceToFree > 0) {
      // If reclaiming free memory did not adequately shrink the pool, begin evicting blocks:
      val spaceFreedByEviction =
        memoryStore.evictBlocksToFreeSpace(None, remainingSpaceToFree, memoryMode)
      // When a block is released, BlockManager.dropFromMemory() calls releaseMemory(), so we do
      // not need to decrement _memoryUsed here. However, we do need to decrement the pool size.
      spaceFreedByReleasingUnusedMemory + spaceFreedByEviction
    } else {
      spaceFreedByReleasingUnusedMemory
    }
  }

If storage's free memory does not cover the requested amount, blocks cached in storage are evicted to make up the difference.

The computeMaxExecutionPoolSize method computes the maximum memory available to execution.

These two functions are then passed as parameters to executionPool.acquireMemory:

private[memory] def acquireMemory(
      numBytes: Long,
      taskAttemptId: Long,
      maybeGrowPool: Long => Unit = (additionalSpaceNeeded: Long) => Unit,
      computeMaxPoolSize: () => Long = () => poolSize): Long = lock.synchronized {
    assert(numBytes > 0, s"invalid number of bytes requested: $numBytes")

    // TODO: clean up this clunky method signature

    // Add this task to the taskMemory map just so we can keep an accurate count of the number
    // of active tasks, to let other tasks ramp down their memory in calls to `acquireMemory`
    if (!memoryForTask.contains(taskAttemptId)) {
      memoryForTask(taskAttemptId) = 0L
      // This will later cause waiting tasks to wake up and check numTasks again
      lock.notifyAll()
    }

    // Keep looping until we're either sure that we don't want to grant this request (because this
    // task would have more than 1 / numActiveTasks of the memory) or we have enough free
    // memory to give it (we always let each task get at least 1 / (2 * numActiveTasks)).
    // TODO: simplify this to limit each task to its own slot
    while (true) {
      val numActiveTasks = memoryForTask.keys.size
      val curMem = memoryForTask(taskAttemptId)

      // In every iteration of this loop, we should first try to reclaim any borrowed execution
      // space from storage. This is necessary because of the potential race condition where new
      // storage blocks may steal the free execution memory that this task was waiting for.
      maybeGrowPool(numBytes - memoryFree)

      // Maximum size the pool would have after potentially growing the pool.
      // This is used to compute the upper bound of how much memory each task can occupy. This
      // must take into account potential free memory as well as the amount this pool currently
      // occupies. Otherwise, we may run into SPARK-12155 where, in unified memory management,
      // we did not take into account space that could have been freed by evicting cached blocks.
      val maxPoolSize = computeMaxPoolSize()
      val maxMemoryPerTask = maxPoolSize / numActiveTasks
      val minMemoryPerTask = poolSize / (2 * numActiveTasks)

      // How much we can grant this task; keep its share within 0 <= X <= 1 / numActiveTasks
      val maxToGrant = math.min(numBytes, math.max(0, maxMemoryPerTask - curMem))
      // Only give it as much memory as is free, which might be none if it reached 1 / numTasks
      val toGrant = math.min(maxToGrant, memoryFree)

      // We want to let each task get at least 1 / (2 * numActiveTasks) before blocking;
      // if we can't give it this much now, wait for other tasks to free up memory
      // (this happens if older tasks allocated lots of memory before N grew)
      if (toGrant < numBytes && curMem + toGrant < minMemoryPerTask) {
        logInfo(s"TID $taskAttemptId waiting for at least 1/2N of $poolName pool to be free")
        lock.wait()
      } else {
        memoryForTask(taskAttemptId) += toGrant
        return toGrant
      }
    }
    0L  // Never reached
  }

It bounds the execution memory a single task can use:

val maxPoolSize = computeMaxPoolSize()
val maxMemoryPerTask = maxPoolSize / numActiveTasks
val minMemoryPerTask = poolSize / (2 * numActiveTasks)

Here maxPoolSize is the maximum size the executionMemoryPool could reach after borrowing from storage. This keeps the memory available to each task between poolSize / (2 * numActiveTasks) and maxPoolSize / numActiveTasks, balancing resource usage across tasks.
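
For example (made-up numbers): an execution pool currently sized 1000MB that could grow to 1200MB by evicting storage, shared by 4 active tasks:

val maxPoolSize = 1200L                                   // computeMaxPoolSize(), after potential eviction
val poolSize = 1000L                                      // current execution pool size
val numActiveTasks = 4
val maxMemoryPerTask = maxPoolSize / numActiveTasks       // 300MB cap per task
val minMemoryPerTask = poolSize / (2 * numActiveTasks)    // 125MB guaranteed before a task blocks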

The flow of the code for requesting execution memory:

  1. First obtain the memory currently allocated to this task, curMem.

  2. When numBytes exceeds execution's free memory, borrow from storage via the maybeGrowPool method.

  3. The maximum that can be granted, maxToGrant, is the smaller of numBytes and (maxMemoryPerTask - curMem).

  4. The amount actually granted in this iteration, toGrant, is the smaller of maxToGrant and memoryFree (the free memory after execution has borrowed from storage).

  5. If the granted amount is less than numBytes and curMem + toGrant is below the per-task minimum minMemoryPerTask, the task blocks until other tasks free up memory or new tasks arrive and lower minMemoryPerTask.
    Otherwise, the memory granted this time is returned directly.

How storage and execution memory are requested, and how the two borrow from each other, has now been covered. Storage and execution memory are used in many places (see the overview): caching an RDD requests storage memory, and running tasks request execution memory. Next, let's look at when those requests are made.

Caching RDDs

final def iterator(split: Partition, context: TaskContext): Iterator[T] = {
    if (storageLevel != StorageLevel.NONE) {
      getOrCompute(split, context)
    } else {
      computeOrReadCheckpoint(split, context)
    }
  }

Each RDD partition's data is obtained through its iterator. If the storage level is not NONE, it first tries to read the block from the storage layer (memory, disk file, etc.); if it is not there, it computes the partition and caches the result for direct access by later computations. Serialized and non-serialized data are cached differently. The code path for the non-serialized cache is:

memoryStore.putIteratorAsValues(blockId, iterator(), classTag)

private[storage] def putIteratorAsValues[T](
      blockId: BlockId,
      values: Iterator[T],
      classTag: ClassTag[T]): Either[PartiallyUnrolledIterator[T], Long] = {

    require(!contains(blockId), s"Block $blockId is already present in the MemoryStore")

    // Number of elements unrolled so far
    var elementsUnrolled = 0
    // Whether there is still enough memory for us to continue unrolling this block
    var keepUnrolling = true
    // Initial per-task memory to request for unrolling blocks (bytes).
    val initialMemoryThreshold = unrollMemoryThreshold
    // How often to check whether we need to request more memory
    val memoryCheckPeriod = 16
    // Memory currently reserved by this task for this particular unrolling operation
    var memoryThreshold = initialMemoryThreshold
    // Memory to request as a multiple of current vector size
    val memoryGrowthFactor = 1.5
    // Keep track of unroll memory used by this particular block / putIterator() operation
    var unrollMemoryUsedByThisBlock = 0L
    // Underlying vector for unrolling the block
    var vector = new SizeTrackingVector[T]()(classTag)

    // Request enough memory to begin unrolling
    keepUnrolling =
      reserveUnrollMemoryForThisTask(blockId, initialMemoryThreshold, MemoryMode.ON_HEAP)

    if (!keepUnrolling) {
      logWarning(s"Failed to reserve initial memory threshold of " +
        s"${Utils.bytesToString(initialMemoryThreshold)} for computing block $blockId in memory.")
    } else {
      unrollMemoryUsedByThisBlock += initialMemoryThreshold
    }

    // Unroll this block safely, checking whether we have exceeded our threshold periodically
    while (values.hasNext && keepUnrolling) {
      vector += values.next()
      if (elementsUnrolled % memoryCheckPeriod == 0) {
        // If our vector's size has exceeded the threshold, request more memory
        val currentSize = vector.estimateSize()
        if (currentSize >= memoryThreshold) {
          val amountToRequest = (currentSize * memoryGrowthFactor - memoryThreshold).toLong
          keepUnrolling =
            reserveUnrollMemoryForThisTask(blockId, amountToRequest, MemoryMode.ON_HEAP)
          if (keepUnrolling) {
            unrollMemoryUsedByThisBlock += amountToRequest
          }
          // New threshold is currentSize * memoryGrowthFactor
          memoryThreshold += amountToRequest
        }
      }
      elementsUnrolled += 1
    }

    if (keepUnrolling) {
      // We successfully unrolled the entirety of this block
      val arrayValues = vector.toArray
      vector = null
      val entry =
        new DeserializedMemoryEntry[T](arrayValues, SizeEstimator.estimate(arrayValues), classTag)
      val size = entry.size
      def transferUnrollToStorage(amount: Long): Unit = {
        // Synchronize so that transfer is atomic
        memoryManager.synchronized {
          releaseUnrollMemoryForThisTask(MemoryMode.ON_HEAP, amount)
          val success = memoryManager.acquireStorageMemory(blockId, amount, MemoryMode.ON_HEAP)
          assert(success, "transferring unroll memory to storage memory failed")
        }
      }
      // Acquire storage memory if necessary to store this block in memory.
      val enoughStorageMemory = {
        if (unrollMemoryUsedByThisBlock <= size) {
          val acquiredExtra =
            memoryManager.acquireStorageMemory(
              blockId, size - unrollMemoryUsedByThisBlock, MemoryMode.ON_HEAP)
          if (acquiredExtra) {
            transferUnrollToStorage(unrollMemoryUsedByThisBlock)
          }
          acquiredExtra
        } else { // unrollMemoryUsedByThisBlock > size
          // If this task attempt already owns more unroll memory than is necessary to store the
          // block, then release the extra memory that will not be used.
          val excessUnrollMemory = unrollMemoryUsedByThisBlock - size
          releaseUnrollMemoryForThisTask(MemoryMode.ON_HEAP, excessUnrollMemory)
          transferUnrollToStorage(size)
          true
        }
      }
      if (enoughStorageMemory) {
        entries.synchronized {
          entries.put(blockId, entry)
        }
        logInfo("Block %s stored as values in memory (estimated size %s, free %s)".format(
          blockId, Utils.bytesToString(size), Utils.bytesToString(maxMemory - blocksMemoryUsed)))
        Right(size)
      } else {
        assert(currentUnrollMemoryForThisTask >= unrollMemoryUsedByThisBlock,
          "released too much unroll memory")
        Left(new PartiallyUnrolledIterator(
          this,
          MemoryMode.ON_HEAP,
          unrollMemoryUsedByThisBlock,
          unrolled = arrayValues.toIterator,
          rest = Iterator.empty))
      }
    } else {
      // We ran out of space while unrolling the values for this block
      logUnrollFailureMessage(blockId, vector.estimateSize())
      Left(new PartiallyUnrolledIterator(
        this,
        MemoryMode.ON_HEAP,
        unrollMemoryUsedByThisBlock,
        unrolled = vector.iterator,
        rest = values))
    }
  }

The code is long; I had to read it slowly myself, so let's take it step by step~

The blockId parameter uniquely identifies a block, in the format "rdd_" + rddId + "_" + splitIndex; values is the iterator over the partition's data.
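
For example, using Spark's RDDBlockId (partition 3 of the RDD with id 42):

import org.apache.spark.storage.RDDBlockId

val id = RDDBlockId(42, 3)
id.name   // "rdd_42_3"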

  1. Use the reserveUnrollMemoryForThisTask method to request initialMemoryThreshold bytes of unroll memory from storage (the initial value is configurable via spark.storage.unrollMemoryThreshold, default 1MB) to start unrolling the iterator:

    def reserveUnrollMemoryForThisTask(
       blockId: BlockId,
       memory: Long,
       memoryMode: MemoryMode): Boolean = {
     memoryManager.synchronized {
       val success = memoryManager.acquireUnrollMemory(blockId, memory, memoryMode)
       if (success) {
         val taskAttemptId = currentTaskAttemptId()
         val unrollMemoryMap = memoryMode match {
           case MemoryMode.ON_HEAP => onHeapUnrollMemoryMap
           case MemoryMode.OFF_HEAP => offHeapUnrollMemoryMap
         }
         unrollMemoryMap(taskAttemptId) = unrollMemoryMap.getOrElse(taskAttemptId, 0L) + memory
       }
       success
      }
    }

    Following acquireUnrollMemory down, you can see that it ultimately calls acquireStorageMemory, i.e. it requests memory from storage. If the request succeeds, the amount is added to the corresponding unrollMemoryMap, which records the memory used for unrolling.

  2. If the request succeeds, update unrollMemoryUsedByThisBlock, the unroll memory used for this block.
  3. Then iterate over the values. Iteration stops under two conditions: the iterator is exhausted, or a memory request fails.
    • Each element is appended to a SizeTrackingVector (backed by an array); every 16 elements, the vector's estimated size is checked against memoryThreshold (the memory reserved so far).
    • If it exceeds memoryThreshold, the additional amount to request is computed as 1.5 times the current vector size minus the memory already reserved (see the sketch after this list).
    • Request that memory from storage. If the request succeeds, update unrollMemoryUsedByThisBlock and continue with the next iteration; otherwise stop.
  4. After the loop, keepUnrolling == true means the values were fully unrolled; false means they were not, i.e. there was not enough memory, so caching the partition in memory failed.
  5. If all values were unrolled successfully, the vector is wrapped in a DeserializedMemoryEntry (which records the data's size), and the unrolled data size is compared with the memory reserved so far:
    • If less was reserved than the data needs, the difference is requested from storage. If that succeeds, the unroll memory is converted to storage memory. The conversion releases all unroll memory this task holds for the block and re-acquires the same amount as storage memory; unroll memory is in fact storage memory, so subtracting and then adding the same value changes nothing in total, but the procedure keeps MemoryStore and MemoryManager decoupled.
    • If more was reserved than the data needs, the excess unroll memory is released first, and then the unroll memory is converted to storage memory.
    • Finally, the blockId and its entry are added to the entries map managed by the memoryStore.
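
As a sketch of the growth arithmetic in step 3 (assuming every request is granted; numbers illustrative):

var memoryThreshold = 1L * 1024 * 1024    // initialMemoryThreshold, 1MB by default
val memoryGrowthFactor = 1.5
def maybeGrow(currentVectorSize: Long): Unit = {
  if (currentVectorSize >= memoryThreshold) {
    val amountToRequest = (currentVectorSize * memoryGrowthFactor - memoryThreshold).toLong
    memoryThreshold += amountToRequest    // new threshold = 1.5 * currentVectorSize
  }
}
maybeGrow(2L * 1024 * 1024)   // vector hit 2MB: request 2MB more, threshold becomes 3MB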

Caching a serialized RDD supports both ON_HEAP and OFF_HEAP, and is similar to the non-serialized case except that the data is written to a ByteBuffer as a stream. With MemoryMode.ON_HEAP the ByteBuffer is a HeapByteBuffer (on-heap memory); with OFF_HEAP it is a DirectByteBuffer (pointing to off-heap memory). Finally, a SerializedMemoryEntry is built from the data and saved in the memoryStore's entries.

The use of execution memory in shuffle

During shuffle write, data is not written straight to disk (see the Shuffle Write analysis for details) but is first written to an in-memory collection. The memory this collection occupies is execution memory, with an initial size of 5MB, configurable via spark.shuffle.spill.initialMemoryThreshold. Every time a record is written, it checks whether the collection needs to spill to disk; before spilling, it tries to request more execution memory to avoid the spill. The code is as follows:

protected def maybeSpill(collection: C, currentMemory: Long): Boolean = {
    var shouldSpill = false
    if (elementsRead % 32 == 0 && currentMemory >= myMemoryThreshold) {
      // Claim up to double our current memory from the shuffle memory pool
      val amountToRequest = 2 * currentMemory - myMemoryThreshold
      val granted = acquireMemory(amountToRequest)
      myMemoryThreshold += granted
      // If we were granted too little memory to grow further (either tryToAcquire returned 0,
      // or we already had more memory than myMemoryThreshold), spill the current collection
      shouldSpill = currentMemory >= myMemoryThreshold
    }
    shouldSpill = shouldSpill || _elementsRead > numElementsForceSpillThreshold
    // Actually spill
    if (shouldSpill) {
      _spillCount += 1
      logSpillage(currentMemory)
      spill(collection)
      _elementsRead = 0
      _memoryBytesSpilled += currentMemory
      releaseMemory()
    }
    shouldSpill
  }

When the number of insert/update operations is a multiple of 32 and the current collection's size is at least the memory already reserved, it tries to request more execution memory to avoid a spill. The amount requested is twice the current collection size minus the memory already reserved. Follow the acquireMemory method:

 public long acquireMemory(long size) {
    long granted = taskMemoryManager.acquireExecutionMemory(size, this);
    used += granted;
    return granted;
  }

This is exactly the path for requesting execution memory that we walked through earlier, so it won't be described again here.
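
As a sketch of the threshold-doubling strategy in maybeSpill (assuming every request is granted in full; numbers illustrative):

var myMemoryThreshold = 5L * 1024 * 1024   // spark.shuffle.spill.initialMemoryThreshold
def onCheck(currentMemory: Long): Unit = {
  if (currentMemory >= myMemoryThreshold) {
    val amountToRequest = 2 * currentMemory - myMemoryThreshold
    myMemoryThreshold += amountToRequest   // threshold becomes 2 * currentMemory
  }
}
onCheck(6L * 1024 * 1024)   // collection hit 6MB: request 7MB, threshold becomes 12MB
// If the grant falls short and currentMemory >= myMemoryThreshold still holds,
// the collection is spilled to disk and the memory released.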

References

http://www.jianshu.com/p/999ef21dffe8
