Analysis of the Ozone Data Read Process

Foreword


In the previous article, "Ozone data writing process analysis", we walked through how data is written to Ozone. In this article, the author shares the corresponding counterpart: an analysis of the data read process. Generally speaking, Ozone's read and write processes have much in common; both revolve around the concepts of Blocks, Chunks, and buffers. In terms of complexity, the read process is simpler than the write process and easier to understand.

The Ozone read process: reading data by Block and Chunk offsets


If you have read the author's earlier articles on the Ozone write process carefully, you should know that the data of an Ozone key is divided into Blocks, and each Block in turn writes its data in units of Chunks. Each Chunk corresponds to one chunk file. A Block is an internal, virtual concept, but the Container on the Datanode keeps, for each Block, the list of Chunks that belong to it.

Within a key, the data is segmented into multiple Blocks, so the global start offset of each Block's data naturally differs. For example, the offset of the second Block equals the length of the first Block. The Chunks under a Block organize their data the same way.
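
Schematically, for a key split into three Blocks with hypothetical lengths L1, L2 and L3:

  Key (total length = L1 + L2 + L3)
    Block 0: global offset 0,        length L1  -> chunk files chunk_0, chunk_1, ...
    Block 1: global offset L1,       length L2  -> its own chunk files
    Block 2: global offset L1 + L2,  length L3  -> its own chunk files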

Besides depending on offsets, reading data also requires Block and Chunk information held by other services, since the Client does not know this information in advance. This mainly involves the following three operations:

  • The Client sends a key lookup request to the OzoneManager; the returned key info contains all the Block information under the key
  • The Block input stream queries the Container DB on the Datanode for the Block's information, which includes the list of Chunks belonging to that Block
  • The Chunk input stream queries the Datanode for the actual chunk data file and loads it into its internal buffer for external reads

To summarize, the overall flow is shown in the figure below:

[Figure: overall flow of the Ozone data read process]

Analysis of the Ozone read path code


Let's analyze the implementation of some of the key methods in the read path.

First, the Client queries the OM service for the key's information:

  public OzoneInputStream readFile(String volumeName, String bucketName,
      String keyName) throws IOException {
    OmKeyArgs keyArgs = new OmKeyArgs.Builder()
        .setVolumeName(volumeName)
        .setBucketName(bucketName)
        .setKeyName(keyName)
        .setSortDatanodesInPipeline(topologyAwareReadEnabled)
        .build();
    // 1. The client asks the OM for the given key's metadata, which contains
    // the block information under the key; the client then builds the input
    // stream object from the returned key info.
    OmKeyInfo keyInfo = ozoneManagerClient.lookupFile(keyArgs);
    return createInputStream(keyInfo);
  }
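
From the application's point of view, here is a minimal, hedged usage sketch of the path above (volume/bucket/key names are placeholders; imports such as org.apache.hadoop.ozone.client.* and org.apache.hadoop.hdds.conf.OzoneConfiguration are assumed; readKey follows essentially the same read path as readFile):

  public void readKeyExample() throws IOException {
    OzoneConfiguration conf = new OzoneConfiguration();
    // Build an RPC client and locate the bucket holding the key
    try (OzoneClient client = OzoneClientFactory.getRpcClient(conf)) {
      OzoneBucket bucket = client.getObjectStore()
          .getVolume("vol1").getBucket("bucket1");
      // readKey hands back an OzoneInputStream wrapping the KeyInputStream
      try (OzoneInputStream in = bucket.readKey("key1")) {
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
          // consume buf[0..n); each call lands in KeyInputStream.read below
        }
      }
    }
  }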

Then KeyInputStream's initialization method runs, creating the multiple block stream objects:

  private synchronized void initialize(String keyName,
      List<OmKeyLocationInfo> blockInfos,
      XceiverClientManager xceiverClientManager,
      boolean verifyChecksum) {
    this.key = keyName;
    this.blockOffsets = new long[blockInfos.size()];
    long keyLength = 0;
    // 2. KeyInputStream builds the corresponding BlockInputStream objects
    // from the key's block info returned by the lookup
    for (int i = 0; i < blockInfos.size(); i++) {
      OmKeyLocationInfo omKeyLocationInfo = blockInfos.get(i);
      if (LOG.isDebugEnabled()) {
        LOG.debug("Adding stream for accessing {}. The stream will be " +
            "initialized later.", omKeyLocationInfo);
      }
      // 3. Build a BlockInputStream and add it to the block stream list
      addStream(omKeyLocationInfo, xceiverClientManager,
          verifyChecksum);
      // 4. Record the offset of this BlockInputStream within the whole key
      this.blockOffsets[i] = keyLength;
      // 5. Update the running key length; it becomes the start offset of
      // the next BlockInputStream
      keyLength += omKeyLocationInfo.getLength();
    }
    this.length = keyLength;
  }
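
As a standalone sketch of the offset bookkeeping above, assuming hypothetical block lengths of 256 MB, 256 MB and 100 MB:

  long[] blockLengths = {256L << 20, 256L << 20, 100L << 20};
  long[] blockOffsets = new long[blockLengths.length];
  long keyLength = 0;
  for (int i = 0; i < blockLengths.length; i++) {
    blockOffsets[i] = keyLength;   // start offset of block i within the key
    keyLength += blockLengths[i];  // running total = next block's start offset
  }
  // blockOffsets == [0, 256 MB, 512 MB], keyLength == 612 MB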

Data read operations are then performed based on the Block offsets:

  public synchronized int read(byte[] b, int off, int len) throws IOException {
    checkOpen();
    if (b == null) {
      throw new NullPointerException();
    }
    if (off < 0 || len < 0 || len > b.length - off) {
      throw new IndexOutOfBoundsException();
    }
    if (len == 0) {
      return 0;
    }
    int totalReadLen = 0;
    // Keep reading from the blocks while there is still requested data left
    while (len > 0) {
      // If the current block index already points at the last block stream
      // and that stream has no unread data left, the key's data has been
      // fully read and the call returns.
      if (blockStreams.size() == 0 ||
          (blockStreams.size() - 1 <= blockIndex &&
              blockStreams.get(blockIndex)
                  .getRemaining() == 0)) {
        return totalReadLen == 0 ? EOF : totalReadLen;
      }

      // 1. Get the BlockInputStream that is about to be read
      BlockInputStream current = blockStreams.get(blockIndex);
      // 2. Compute the length to read next: the smaller of the remaining
      // requested length and the current BlockInputStream's unread length.
      int numBytesToRead = Math.min(len, (int)current.getRemaining());
      // 3. Read data from the BlockInputStream into the byte array
      int numBytesRead = current.read(b, off, numBytesToRead);
      if (numBytesRead != numBytesToRead) {
        // This implies that there is either data loss or corruption in the
        // chunk entries. Even EOF in the current stream would be covered in
        // this case.
        throw new IOException(String.format("Inconsistent read for blockID=%s "
                        + "length=%d numBytesToRead=%d numBytesRead=%d",
                current.getBlockID(), current.getLength(), numBytesToRead,
                numBytesRead));
      }
      // 4. Update the bookkeeping: the offset and the remaining requested length
      totalReadLen += numBytesRead;
      off += numBytesRead;
      len -= numBytesRead;
      // 5. If the current block has been fully read, move the block index
      // to the next block
      if (current.getRemaining() <= 0 &&
          ((blockIndex + 1) < blockStreams.size())) {
        blockIndex += 1;
      }
    }
    return totalReadLen;
  }
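
To make the loop concrete, here is a hypothetical trace for a key made of two 256-byte blocks and a call read(b, 0, 300) starting at position 0:

  // pass 1: blockIndex=0, numBytesToRead = min(300, 256) = 256
  //         -> off=256, len=44; block 0 exhausted, blockIndex -> 1
  // pass 2: blockIndex=1, numBytesToRead = min(44, 256) = 44
  //         -> off=300, len=0; loop exits
  // returns totalReadLen == 300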

The read method above in turn invokes the block streams' read operation, which actually involves reads on the chunk streams; its logic is essentially the same as the method above.

Another method involved in reading data is the seek method:

  public synchronized void seek(long pos) throws IOException {
    checkOpen();
    if (pos == 0 && length == 0) {
      // It is possible for length and pos to be zero in which case
      // seek should return instead of throwing exception
      return;
    }
    if (pos < 0 || pos > length) {
      throw new EOFException(
          "EOF encountered at pos: " + pos + " for key: " + key);
    }

    // 1. Update the block index
    if (blockIndex >= blockStreams.size()) {
      // If the index exceeds the maximum, binary-search the whole
      // blockOffsets array for it
      blockIndex = Arrays.binarySearch(blockOffsets, pos);
    } else if (pos < blockOffsets[blockIndex]) {
      // If the target position is before the current block's offset, narrow
      // the binary search to the range [0, blockIndex)
      blockIndex =
          Arrays.binarySearch(blockOffsets, 0, blockIndex, pos);
    } else if (pos >= blockOffsets[blockIndex] + blockStreams
        .get(blockIndex).getLength()) {
      // Otherwise search the remaining range [blockIndex + 1, blockStreams.size())
      blockIndex = Arrays
          .binarySearch(blockOffsets, blockIndex + 1,
              blockStreams.size(), pos);
    }
    if (blockIndex < 0) {
      // Binary search returns -insertionPoint - 1  if element is not present
      // in the array. insertionPoint is the point at which element would be
      // inserted in the sorted array. We need to adjust the blockIndex
      // accordingly so that blockIndex = insertionPoint - 1
      blockIndex = -blockIndex - 2;
    }

    // 2. Reset the position of the BlockInputStream seeked to last time
    blockStreams.get(blockIndexOfPrevPosition).resetPosition();

    // 3. Reset the positions of the blocks after the current block index
    for (int index =  blockIndex + 1; index < blockStreams.size(); index++) {
      blockStreams.get(index).seek(0);
    }
    // 4. Seek the current block to: given position - this block's global offset
    blockStreams.get(blockIndex).seek(pos - blockOffsets[blockIndex]);
    blockIndexOfPrevPosition = blockIndex;
  }
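
The blockIndex = -blockIndex - 2 adjustment is easiest to see with a small, hypothetical example:

  // Seeking to pos = 300 with blockOffsets = {0, 256, 512}:
  long[] blockOffsets = {0, 256, 512};
  int blockIndex = Arrays.binarySearch(blockOffsets, 300);
  // 300 is absent, so binarySearch returns -(insertionPoint) - 1 = -2 - 1 = -3
  if (blockIndex < 0) {
    blockIndex = -blockIndex - 2;  // = 1, the block starting at offset 256
  }
  // in-block seek position: pos - blockOffsets[blockIndex] = 300 - 256 = 44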

Because the internal read logic of a block stream is essentially the same as that of the key stream, it is skipped here. Let's look directly at how a chunk stream reads data through its buffers.

The chunk stream's read operation is as follows:

  public synchronized int read(byte[] b, int off, int len) throws IOException {
    // According to the JavaDocs for InputStream, it is recommended that
    // subclasses provide an override of bulk read if possible for performance
    // reasons.  In addition to performance, we need to do it for correctness
    // reasons.  The Ozone REST service uses PipedInputStream and
    // PipedOutputStream to relay HTTP response data between a Jersey thread and
    // a Netty thread.  It turns out that PipedInputStream/PipedOutputStream
    // have a subtle dependency (bug?) on the wrapped stream providing separate
    // implementations of single-byte read and bulk read.  Without this, get key
    // responses might close the connection before writing all of the bytes
    // advertised in the Content-Length.
    if (b == null) {
      throw new NullPointerException();
    }
    if (off < 0 || len < 0 || len > b.length - off) {
      throw new IndexOutOfBoundsException();
    }
    if (len == 0) {
      return 0;
    }
    checkOpen();
    int total = 0;
    while (len > 0) {
      // 1. Prepare up to len bytes of data in the buffers
      int available = prepareRead(len);
      if (available == EOF) {
        // There is no more data in the chunk stream. The buffers should have
        // been released by now
        Preconditions.checkState(buffers == null);
        return total != 0 ? total : EOF;
      }
      // 2. Copy data from the buffer into the output array; the buffer's
      // position advances by 'available'
      buffers.get(bufferIndex).get(b, off + total, available);
      // 3. Update the remaining length and the total read so far
      len -= available;
      total += available;
    }

    // 4. If the end of the chunk has been reached, release the buffer space
    if (chunkStreamEOF()) {
      // smart consumers determine EOF by calling getPos()
      // so we release buffers when serving the final bytes of data
      releaseBuffers();
    }

    return total;
  }

The prepareRead operation loads chunk data from the Datanode into the buffers:

  private synchronized int prepareRead(int len) throws IOException {
    for (;;) {
      if (chunkPosition >= 0) {
        if (buffersHavePosition(chunkPosition)) {
          // The current buffers have the seeked position. Adjust the buffer
          // index and position to point to the chunkPosition.
          adjustBufferPosition(chunkPosition - bufferOffset);
        } else {
          // Read a required chunk data to fill the buffers with seeked
          // position data
          readChunkFromContainer(len);
        }
      }
      // If the chunk was not seeked to a position, take the current buffer
      // and check whether it still holds data
      if (buffersHaveData()) {
        // Data is available from buffers
        ByteBuffer bb = buffers.get(bufferIndex);
        return len > bb.remaining() ? bb.remaining() : len;
      } else  if (dataRemainingInChunk()) {
        // If the current buffer holds no data and the chunk still has
        // unread data to be read, load chunk data into the buffers
        readChunkFromContainer(len);
      } else {
        // All available input from this chunk stream has been consumed.
        return EOF;
      }
    }
  }
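
This also shows why the chunk stream's read() loops: prepareRead only promises as much as the current buffer can serve. A hypothetical trace, assuming 64 KB chunk buffers:

  // read(b, 0, 100000):
  //   prepareRead(100000) -> 65536 (the current buffer's remaining bytes); copy
  //   len = 100000 - 65536 = 34464
  //   prepareRead(34464)  -> buffers drained, readChunkFromContainer refills; copy
  //   total == 100000, loop exits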

After the copy loop, the read method calls chunkStreamEOF to check whether the read position has reached the end of the chunk:

  /**
   * Check whether the end of the chunk has been reached.
   */
  private boolean chunkStreamEOF() {
    if (!allocated) {
      // Chunk data has not been read yet
      return false;
    }

    // Two conditions determine whether the read position has reached
    // the end of the chunk:
    // 1) do the buffers still hold data?
    // 2) has the chunk's full length been read?
    if (buffersHaveData() || dataRemainingInChunk()) {
      return false;
    } else {
      Preconditions.checkState(bufferOffset + bufferLength == length,
          "EOF detected, but not at the last byte of the chunk");
      return true;
    }
  }

The chunk stream thus uses ByteBuffers to reduce frequent IO reads and improve efficiency.
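
The pattern is the classic buffered read: pay one IO round trip to fill a buffer, then serve many small reads from memory. A minimal sketch, where loadChunk() is a hypothetical stand-in for readChunkFromContainer() and java.nio.ByteBuffer is assumed:

  byte[] chunkData = loadChunk();                      // one IO round trip
  ByteBuffer chunkBuffer = ByteBuffer.wrap(chunkData);
  byte[] out = new byte[128];
  while (chunkBuffer.hasRemaining()) {
    int n = Math.min(out.length, chunkBuffer.remaining());
    chunkBuffer.get(out, 0, n);                        // served from memory, no IO
  }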

OK, the above is the analysis of the Ozone data read process; the core point is that data reads are driven by the offsets across Blocks and Chunks.


Origin: blog.csdn.net/Androidlushangderen/article/details/103841373