Spark TaskMemoryManager shuffle memory management

In Spark, the data produced during a shuffle is held in memory managed by TaskMemoryManager. Take ShuffleExternalSorter as an example; the snippet below is from its writeSortedFile() method.

final long recordPointer = sortedRecords.packedRecordPointer.getRecordPointer();
final Object recordPage = taskMemoryManager.getPage(recordPointer);
final long recordOffsetInPage = taskMemoryManager.getOffsetInPage(recordPointer);
int dataRemaining = UnsafeAlignedOffset.getSize(recordPage, recordOffsetInPage);
long recordReadPosition = recordOffsetInPage + uaoSize; // skip over record length
while (dataRemaining > 0) {
  final int toTransfer = Math.min(diskWriteBufferSize, dataRemaining);
  Platform.copyMemory(
    recordPage, recordReadPosition, writeBuffer, Platform.BYTE_ARRAY_OFFSET, toTransfer);
  writer.write(writeBuffer, 0, toTransfer);
  recordReadPosition += toTransfer;
  dataRemaining -= toTransfer;
}
writer.recordWritten();

Shuffle data is stored in memory pages managed by TaskMemoryManager. When it has to be written to disk for persistence, the record pointer is used to look up the corresponding memory page and the offset within that page from TaskMemoryManager; the record bytes are then copied out of the page and written through the BlockManager's disk writer into a file segment, from which the downstream RDD reads when it executes.

 

The memory page mentioned above is the basic unit of memory management in TaskMemoryManager; its concrete class is MemoryBlock.

protected Object obj;
protected long offset;
protected long length;

A MemoryBlock is composed of the three members above. For off-heap memory only offset and length are used: offset is the starting memory address, and length is the size of that off-heap region.

In the on-heap implementation, obj is the long[] array that actually stores the data, offset is the JVM offset of the first element of that array (i.e. where the data begins), and length is the logical size of the MemoryBlock.

Besides the off-heap implementation OffHeapMemoryBlock and the on-heap implementation OnHeapMemoryBlock, there is also ByteArrayMemoryBlock, a special implementation backed by a byte array.
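
As a minimal sketch of how the obj/offset pair is used for addressing (the variable names are illustrative; OnHeapMemoryBlock.fromArray() also appears in the allocator code further down, and reads and writes go through org.apache.spark.unsafe.Platform):

// A 16-word long[] provides 128 bytes of on-heap storage.
long[] backing = new long[16];
MemoryBlock block = OnHeapMemoryBlock.fromArray(backing, 128);

// For an on-heap block the base object is the long[] itself and the base
// offset points at its first element; data access goes through Platform.
Platform.putLong(block.getBaseObject(), block.getBaseOffset(), 42L);
long first = Platform.getLong(block.getBaseObject(), block.getBaseOffset()); // 42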

 

TaskMemoryManager is the component dedicated to managing the MemoryBlocks described above.

private final MemoryBlock[] pageTable = new MemoryBlock[PAGE_TABLE_SIZE];

TaskMemoryManager maintains pageTable, an array of 8192 MemoryBlock slots, which holds the pages it has allocated.

Callers obtain memory pages through the allocatePage() method. Inside it, the appropriate MemoryAllocator is selected according to the memory mode being used (on-heap or off-heap), and that allocator's allocate() method produces the corresponding MemoryBlock.
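
A rough sketch of that selection (a simplification, not the real body of allocatePage(); memoryMode, acquiredBytes and pageNumber are illustrative local names, while MemoryAllocator.UNSAFE and MemoryAllocator.HEAP are the two allocator singletons Spark exposes):

// Simplified: choose the allocator by memory mode, allocate a page, then
// register it in pageTable under a free page number (bookkeeping omitted).
MemoryAllocator allocator = (memoryMode == MemoryMode.OFF_HEAP)
    ? MemoryAllocator.UNSAFE   // backed by UnsafeMemoryAllocator
    : MemoryAllocator.HEAP;    // backed by HeapMemoryAllocator
MemoryBlock page = allocator.allocate(acquiredBytes);
pageTable[pageNumber] = page;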

UnsafeMemoryAllocator's implementation is straightforward: it requests the memory directly through Unsafe and wraps the returned address in an OffHeapMemoryBlock.
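
Its shape is roughly the following (a sketch that omits the debug-fill handling visible in the heap allocator below):

// Sketch: off-heap allocation is a raw malloc-style call plus a wrapper object.
public MemoryBlock allocate(long size) {
  long address = Platform.allocateMemory(size); // raw off-heap memory via Unsafe
  return new OffHeapMemoryBlock(address, size);
}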

The HeapMemoryAllocator implementation is more complex.

public MemoryBlock allocate(long size) throws OutOfMemoryError {
  int numWords = (int) ((size + 7) / 8);
  long alignedSize = numWords * 8L;
  assert (alignedSize >= size);
  if (shouldPool(alignedSize)) {
    synchronized (this) {
      final LinkedList<WeakReference<long[]>> pool = bufferPoolsBySize.get(alignedSize);
      if (pool != null) {
        while (!pool.isEmpty()) {
          final WeakReference<long[]> arrayReference = pool.pop();
          final long[] array = arrayReference.get();
          if (array != null) {
            assert (array.length * 8L >= size);
            MemoryBlock memory = OnHeapMemoryBlock.fromArray(array, size);
            if (MemoryAllocator.MEMORY_DEBUG_FILL_ENABLED) {
              memory.fill(MemoryAllocator.MEMORY_DEBUG_FILL_CLEAN_VALUE);
            }
            return memory;
          }
        }
        bufferPoolsBySize.remove(alignedSize);
      }
    }
  }
  long[] array = new long[numWords];
  MemoryBlock memory = OnHeapMemoryBlock.fromArray(array, size);
  if (MemoryAllocator.MEMORY_DEBUG_FILL_ENABLED) {
    memory.fill(MemoryAllocator.MEMORY_DEBUG_FILL_CLEAN_VALUE);
  }
  return memory;
}

First, the requested size is aligned up to a multiple of 8 bytes; for example, a request for 100 bytes gives numWords = 13 and alignedSize = 104.

Then, if the aligned size is large enough to be worth pooling (shouldPool()), the allocator first tries to reuse a previously freed long[] of the same size that is still weakly referenced in bufferPoolsBySize, which avoids repeatedly allocating large blocks. If no pooled array is available, a fresh long[] of numWords elements is created. Either way, the array is wrapped into an OnHeapMemoryBlock that carries the requested logical size.
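
For example (an illustrative call; the rounding follows the arithmetic in the code above, and the logical size comes from fromArray(array, size)):

// Requesting 100 bytes: numWords = (100 + 7) / 8 = 13, alignedSize = 104.
MemoryBlock block = MemoryAllocator.HEAP.allocate(100);
// The backing long[] has 13 elements (104 bytes), but the block's logical
// size stays at the requested 100 bytes.
assert block.size() == 100;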

 

So, when a piece of this memory needs to be addressed: in the off-heap case it can be reached directly through its absolute address, but the on-heap case is more involved, because an address must identify both the page number (the index into TaskMemoryManager's pageTable) and the offset within the corresponding long[].

To handle this, TaskMemoryManager provides the encodePageNumberAndOffset() method, which packs this information into a single 8-byte address. Off-heap memory is encoded the same way, so both modes share one uniform address format.

public static long encodePageNumberAndOffset(int pageNumber, long offsetInPage) {
  assert (pageNumber >= 0) : "encodePageNumberAndOffset called with invalid page";
  return (((long) pageNumber) << OFFSET_BITS) | (offsetInPage & MASK_LONG_LOWER_51_BITS);
}

The page number is at most 8191, which fits in 13 bits, so the encoding stores the page number in the upper 13 bits and the offset in the lower 51 bits. When such an encoded address is used later, decoding it is enough to locate the exact position in memory.
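
The decoding side mirrors this layout; a sketch of the corresponding helpers (TaskMemoryManager exposes this through the getPage() and getOffsetInPage() calls already seen in the shuffle snippet at the top):

// Sketch: recover the 13-bit page number and the 51-bit offset from the address.
static int decodePageNumber(long pagePlusOffsetAddress) {
  return (int) (pagePlusOffsetAddress >>> OFFSET_BITS);      // upper 13 bits
}

static long decodeOffset(long pagePlusOffsetAddress) {
  return pagePlusOffsetAddress & MASK_LONG_LOWER_51_BITS;    // lower 51 bits
}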

 

When memory is requested through TaskMemoryManager's allocatePage() and there is not enough available, spill() is invoked on the existing memory holders to try to release part of their memory and satisfy the request (the loop below lives in acquireExecutionMemory(), which allocatePage() calls).

if (got < required) {
  // Call spill() on other consumers to release memory
  // Sort the consumers according their memory usage. So we avoid spilling the same consumer
  // which is just spilled in last few times and re-spilling on it will produce many small
  // spill files.
  TreeMap<Long, List<MemoryConsumer>> sortedConsumers = new TreeMap<>();
  for (MemoryConsumer c: consumers) {
    if (c != consumer && c.getUsed() > 0 && c.getMode() == mode) {
      long key = c.getUsed();
      List<MemoryConsumer> list =
          sortedConsumers.computeIfAbsent(key, k -> new ArrayList<>(1));
      list.add(c);
    }
  }
  while (!sortedConsumers.isEmpty()) {
    // Get the consumer using the least memory more than the remaining required memory.
    Map.Entry<Long, List<MemoryConsumer>> currentEntry =
      sortedConsumers.ceilingEntry(required - got);
    // No consumer has used memory more than the remaining required memory.
    // Get the consumer of largest used memory.
    if (currentEntry == null) {
      currentEntry = sortedConsumers.lastEntry();
    }
    List<MemoryConsumer> cList = currentEntry.getValue();
    MemoryConsumer c = cList.get(cList.size() - 1);
    try {
      long released = c.spill(required - got, consumer);
      if (released > 0) {
        logger.debug("Task {} released {} from {} for {}", taskAttemptId,
          Utils.bytesToString(released), c, consumer);
        got += memoryManager.acquireExecutionMemory(required - got, taskAttemptId, mode);
        if (got >= required) {
          break;
        }
      } else {
        cList.remove(cList.size() - 1);
        if (cList.isEmpty()) {
          sortedConsumers.remove(currentEntry.getKey());
        }
      }
    } catch (ClosedByInterruptException e) {
      // This called by user to kill a task (e.g: speculative task).
      logger.error("error while calling spill() on " + c, e);
      throw new RuntimeException(e.getMessage());
    } catch (IOException e) {
      logger.error("error while calling spill() on " + c, e);
      throw new SparkOutOfMemoryError("error while calling spill() on " + c + " : "
        + e.getMessage());
    }
  }
}

Here the current MemoryConsumers are sorted by how much memory they hold. The loop repeatedly picks the consumer with the smallest usage that still covers the remaining shortfall (or, if none does, the one holding the most memory) and calls its spill() method to release memory, and keeps going until either all holders have been tried or enough memory has been obtained.
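
The holders in this loop are MemoryConsumer subclasses. As a hypothetical illustration of the contract the loop relies on (the class name, the grab() helper and the single-page bookkeeping are made up for this sketch; allocatePage(), freePage() and the spill(size, trigger) signature come from Spark's MemoryConsumer):

import java.io.IOException;
import org.apache.spark.memory.MemoryConsumer;
import org.apache.spark.memory.MemoryMode;
import org.apache.spark.memory.TaskMemoryManager;
import org.apache.spark.unsafe.memory.MemoryBlock;

// Hypothetical consumer holding a single page, to show what the spill loop expects:
// spill(size, trigger) releases memory and returns the number of bytes actually freed.
class SketchConsumer extends MemoryConsumer {
  private MemoryBlock page;

  SketchConsumer(TaskMemoryManager tmm) {
    super(tmm, tmm.pageSizeBytes(), MemoryMode.ON_HEAP);
  }

  void grab(long required) {
    // May trigger spill() on other consumers (and, as a last resort, on this one).
    page = allocatePage(required);
  }

  @Override
  public long spill(long size, MemoryConsumer trigger) throws IOException {
    if (page == null) {
      return 0L;
    }
    // A real consumer would first write its in-memory data to disk here.
    long freed = page.size();
    freePage(page);
    page = null;
    return freed;
  }
}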
