The DataStreamer is initialized when the file output stream is created. Its main job is to take packets off the data queue and send them to the DataNodes.
In DFSClient's create() method:
{
final DFSOutputStream result = DFSOutputStream.newStreamForCreate(this,
src, masked, flag, createParent, replication, blockSize, progress,
buffersize, dfsClientConf.createChecksum(checksumOpt),
favoredNodeStrs);
beginFileLease(result.getFileId(), result);
return result;
}
In the newStreamForCreate() method, a DFSOutputStream object is constructed:
{
final DFSOutputStream out = new DFSOutputStream(dfsClient, src, stat,
flag, progress, checksum, favoredNodes);
// start the streamer thread
out.start();
}
In the DFSOutputStream constructor, the DataStreamer is created:
/** Construct a new output stream for creating a file. */
private DFSOutputStream(DFSClient dfsClient, String src, HdfsFileStatus stat,
EnumSet<CreateFlag> flag, Progressable progress,
DataChecksum checksum, String[] favoredNodes) throws IOException {
this(dfsClient, src, progress, stat, checksum);
this.shouldSyncBlock = flag.contains(CreateFlag.SYNC_BLOCK);
computePacketChunkSize(dfsClient.getConf().writePacketSize, bytesPerChecksum);
Span traceSpan = null;
if (Trace.isTracing()) {
traceSpan = Trace.startSpan(this.getClass().getSimpleName()).detach();
}
streamer = new DataStreamer(stat, traceSpan);
if (favoredNodes != null && favoredNodes.length != 0) {
streamer.setFavoredNodes(favoredNodes);
}
}
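The constructor calls computePacketChunkSize() to decide how many checksum chunks fit into one packet. Below is a minimal, hypothetical sketch of the core ratio only (the class and method names are invented, and the real method also reserves room for the packet header and the checksum bytes), using the common defaults of a 64 KB write packet and 512-byte checksum chunks:

```java
// Hypothetical sketch of the chunk/packet arithmetic behind
// computePacketChunkSize(); not the actual HDFS implementation.
public class PacketSizeSketch {
    public static int chunksPerPacket(int writePacketSize, int chunkSize) {
        // a packet always carries at least one chunk
        return Math.max(writePacketSize / chunkSize, 1);
    }

    public static void main(String[] args) {
        int writePacketSize = 64 * 1024; // dfs.client-write-packet-size default
        int chunkSize = 512;             // bytes-per-checksum default
        System.out.println(chunksPerPacket(writePacketSize, chunkSize)); // 128
    }
}
```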
The heart of the class is DataStreamer's run() method. It is quite complex, so only the key fragments are excerpted here:
public void run() {
...
// the main loop
while (!streamerClosed && dfsClient.clientRunning)
{
// First handle an error in the ResponseProcessor (the "Responder").
// The ResponseProcessor processes responses from the datanodes; a packet is
// removed from the ackQueue when its response arrives.
// if the Responder encountered an error, shutdown Responder
if (hasError && response != null) {
try {
response.close();
response.join();
response = null;
} catch (InterruptedException e) {
DFSClient.LOG.warn("Caught exception ", e);
}
}
...
// Handle datanode errors.
// processDatanodeError() moves the packets still in ackQueue back into
// dataQueue so they can be resent once the pipeline is rebuilt.
if (hasError && (errorIndex >= 0 || restartingNodeIndex >= 0)) {
doSleep = processDatanodeError();
}
...
// The wait condition is involved; essentially this is dataQueue.wait(),
// waiting either for a packet to arrive or for half the socket timeout
// to elapse, at which point a heartbeat packet is due.
while ((!streamerClosed && !hasError && dfsClient.clientRunning
&& dataQueue.size() == 0 &&
(stage != BlockConstructionStage.DATA_STREAMING ||
stage == BlockConstructionStage.DATA_STREAMING &&
now - lastPacket < dfsClient.getConf().socketTimeout/2)) || doSleep ) {
long timeout = dfsClient.getConf().socketTimeout/2 - (now-lastPacket);
timeout = timeout <= 0 ? 1000 : timeout;
timeout = (stage == BlockConstructionStage.DATA_STREAMING)?
timeout : 1000;
try {
dataQueue.wait(timeout);
} catch (InterruptedException e) {
DFSClient.LOG.warn("Caught exception ", e);
}
doSleep = false;
now = Time.now();
}
...
// nextBlockOutputStream() asks the NameNode to allocate a new block and returns a LocatedBlock,
// which carries the DatanodeInfo of the nodes that form the write pipeline.
// get new block from namenode.
if (stage == BlockConstructionStage.PIPELINE_SETUP_CREATE) {
if(DFSClient.LOG.isDebugEnabled()) {
DFSClient.LOG.debug("Allocating new block");
}
// set up the data pipeline
setPipeline(nextBlockOutputStream());
initDataStreaming();
}
...
// send the packet to the pipeline
try {
one.writeTo(blockStream);
blockStream.flush();
} catch (IOException e) {
...
}
...
}
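The interplay between dataQueue and ackQueue described above can be simulated in isolation. This is a hypothetical, single-threaded sketch (the class and method names are invented, not the HDFS ones): sending moves a packet from dataQueue to ackQueue, an ack drops the oldest entry of ackQueue, and a pipeline error requeues every un-acked packet at the front of dataQueue, preserving order, as processDatanodeError() does:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Single-threaded simulation of the streamer's queue discipline;
// not the HDFS implementation, just the queue movements it performs.
public class StreamerQueueSketch {
    public final Deque<String> dataQueue = new ArrayDeque<>(); // waiting to be sent
    public final Deque<String> ackQueue = new ArrayDeque<>();  // sent, awaiting acks

    // send one packet: dataQueue head -> "wire" -> ackQueue tail
    public void sendOne() {
        String p = dataQueue.pollFirst();
        if (p != null) ackQueue.addLast(p);
    }

    // an ack arrived: drop the oldest un-acked packet (ResponseProcessor's job)
    public void ackOne() {
        ackQueue.pollFirst();
    }

    // pipeline error: requeue un-acked packets ahead of unsent ones,
    // preserving their original order (processDatanodeError's job)
    public void onPipelineError() {
        while (!ackQueue.isEmpty()) {
            dataQueue.addFirst(ackQueue.pollLast());
        }
    }
}
```

Note that requeued packets go to the *front* of dataQueue: they were produced first, and the pipeline must receive block data in order.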
Summary
- Besides the client's main thread, the write path involves two more threads: the DataStreamer thread, which sends packets, and the ResponseProcessor thread, which handles acknowledgements.
- At block granularity, HDFS is strongly consistent: once data has been written, it is visible to readers.
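The "wait for a packet or time out after socketTimeout/2" step in the main loop boils down to a standard timed wait on the dataQueue monitor. Below is a generic, self-contained sketch of that pattern (names are invented; a timeout here simply increments a counter at the point where DataStreamer would send a heartbeat packet):

```java
import java.util.LinkedList;
import java.util.Queue;

// Generic sketch of the "wait for a packet or time out and heartbeat"
// pattern in the streamer loop; not the HDFS code.
public class TimedWaitSketch {
    private final Queue<String> dataQueue = new LinkedList<>();
    public int sent = 0;       // packets actually written
    public int heartbeats = 0; // timeouts where a heartbeat would be sent

    // consumer side: wait up to timeoutMs for a packet, else count a heartbeat
    public void runOnce(long timeoutMs) throws InterruptedException {
        String packet;
        synchronized (dataQueue) {
            long deadline = System.currentTimeMillis() + timeoutMs;
            while (dataQueue.isEmpty()) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) { heartbeats++; return; } // timed out
                dataQueue.wait(remaining); // may wake spuriously; loop re-checks
            }
            packet = dataQueue.poll();
        }
        // "write" the packet outside the lock, as DataStreamer does
        sent++;
    }

    // producer side: enqueue and wake the streamer (what writeChunk() does)
    public void enqueue(String p) {
        synchronized (dataQueue) {
            dataQueue.add(p);
            dataQueue.notifyAll();
        }
    }
}
```

Dequeuing happens under the lock but the actual network write does not, so the application thread can keep queueing packets while the previous one is on the wire.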