HDFS data flow units: block, packet and chunk

Block, packet and chunk

When DFSClient writes to HDFS, three data units need to be understood clearly: block, packet and chunk.

  • block is the largest unit. It is the granularity at which data is ultimately stored on the DataNodes, and its size is set by the dfs.block.size parameter (default 64 MB). Note: this parameter is taken from the client-side configuration;

  • packet is the medium-sized unit. It is the unit in which data flows from DFSClient to a DataNode, with the dfs.write.packet.size parameter as a reference value (default 64 KB). NOTE: this parameter is only a reference; the packet size actually used during transmission is adjusted. The adjustment is needed because a packet has a specific structure: the goal is for the packet to hold exactly a whole number of its structural members (chunks and their checksums), while also ensuring that the data written to the DataNode does not exceed the configured size of the current block;

  • chunk is the smallest unit. It is the granularity at which data sent from DFSClient to the DataNode is checksummed, and its size is set by the io.bytes.per.checksum parameter (default 512 B);

  • Note: each chunk actually also carries a 4 B checksum, so every chunk written into a packet occupies 516 B. The ratio of data to checksum is therefore 128:1, so a 128 MB block has a corresponding 1 MB checksum file;
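The size relationships above can be checked with a little arithmetic. The Python sketch below uses the defaults quoted in this post; the header length HEADER_LEN and the rounding rule in adjust_packet are hypothetical assumptions for illustration, not the exact formula from the Hadoop source.

```python
# Defaults quoted in the text above.
BLOCK_SIZE = 64 * 1024 * 1024   # dfs.block.size: 64 MB
PACKET_SIZE_HINT = 64 * 1024    # dfs.write.packet.size: 64 KB (reference value only)
BYTES_PER_CHECKSUM = 512        # io.bytes.per.checksum: data bytes per chunk
CHECKSUM_SIZE = 4               # each chunk carries one 4-byte checksum

# HYPOTHETICAL: the real packet header length varies between Hadoop versions.
HEADER_LEN = 25


def ceil_div(a, b):
    return -(-a // b)


def chunk_wire_size():
    """Bytes one full chunk occupies inside a packet: data plus checksum."""
    return BYTES_PER_CHECKSUM + CHECKSUM_SIZE


def checksum_bytes_for(data_len):
    """Checksum bytes covering data_len bytes; a partial last chunk still gets one."""
    return ceil_div(data_len, BYTES_PER_CHECKSUM) * CHECKSUM_SIZE


def adjust_packet(packet_size_hint=PACKET_SIZE_HINT, bytes_left_in_block=BLOCK_SIZE):
    """HYPOTHETICAL adjustment rule: fit a whole number of chunks under the hint,
    and do not allocate more chunks than are needed to reach the end of the block."""
    chunk = chunk_wire_size()
    chunks = max((packet_size_hint - HEADER_LEN) // chunk, 1)
    chunks = min(chunks, max(ceil_div(bytes_left_in_block, BYTES_PER_CHECKSUM), 1))
    return chunks, HEADER_LEN + chunks * chunk


if __name__ == "__main__":
    print(chunk_wire_size())                          # 516
    print(BYTES_PER_CHECKSUM // CHECKSUM_SIZE)        # 128 (data:checksum ratio)
    print(checksum_bytes_for(128 * 1024 * 1024))      # 1048576, i.e. 1 MB
    print(adjust_packet())                            # (126, 65041)
    print(adjust_packet(bytes_left_in_block=10_000))  # (20, 10345)
```

Rounding down keeps the adjusted packet at or below the 64 KB reference value; a real client may round differently.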

During the write process, data is buffered at three levels: in a chunk-sized buffer, in a packet, and in the packet queue;

  • First, data written into DFSOutputStream goes into a buffer (buf) the size of one chunk. When buf is full (or a flush is forced), the checksum is computed and the chunk is filled into the current packet;

  • After a chunk has been filled into a packet, the packet is still not sent immediately; only once the packet has been filled is it put into the queue dataqueue;

  • Another thread takes the packets from dataqueue in order and sends them to the DataNode;

  • (Note: this is a producer-consumer model. The producer blocks when the combined number of packets in dataqueue and ackqueue exceeds the packet limit.)
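The three buffering levels and the producer-consumer hand-off can be modeled with a small Python toy. It is a sketch under assumptions, not the real DFSOutputStream: the class SketchOutputStream, CHUNKS_PER_PACKET, MAX_PACKETS and the use of CRC32 from zlib as the 4-byte checksum are hypothetical choices, and the DataNode ack is simulated inside the streamer thread.

```python
import threading
import zlib
from collections import deque

CHUNK_DATA = 512        # io.bytes.per.checksum: data bytes per chunk
CHUNKS_PER_PACKET = 4   # HYPOTHETICAL: kept small so the demo is short
MAX_PACKETS = 8         # HYPOTHETICAL cap on len(dataqueue) + len(ackqueue)


class SketchOutputStream:
    """Toy model of the three buffering levels; not Hadoop's DFSOutputStream."""

    def __init__(self):
        self.buf = bytearray()        # level 1: up to one chunk of data
        self.packet = []              # level 2: (chunk, checksum) pairs
        self.dataqueue = deque()      # level 3: full packets waiting to be sent
        self.ackqueue = deque()       # packets sent but not yet acknowledged
        self.cond = threading.Condition()
        self.closed = False
        self.received = []            # chunks the simulated DataNode received
        self.max_in_flight = 0
        self.streamer = threading.Thread(target=self._stream, daemon=True)
        self.streamer.start()

    def write(self, data):
        pos = 0
        while pos < len(data):
            take = min(CHUNK_DATA - len(self.buf), len(data) - pos)
            self.buf += data[pos:pos + take]
            pos += take
            if len(self.buf) == CHUNK_DATA:
                self._finish_chunk()

    def _finish_chunk(self):
        # buf is full (or a flush forced it): checksum it, move it into the packet.
        chunk = bytes(self.buf)
        self.packet.append((chunk, zlib.crc32(chunk)))  # CRC32 fits in 4 bytes
        self.buf.clear()
        if len(self.packet) == CHUNKS_PER_PACKET:
            self._enqueue_packet()

    def _enqueue_packet(self):
        with self.cond:
            # Producer blocks while one more packet would exceed MAX_PACKETS in flight.
            while len(self.dataqueue) + len(self.ackqueue) >= MAX_PACKETS:
                self.cond.wait()
            self.dataqueue.append(self.packet)
            self.max_in_flight = max(self.max_in_flight,
                                     len(self.dataqueue) + len(self.ackqueue))
            self.cond.notify_all()
        self.packet = []

    def flush(self):
        if self.buf:
            self._finish_chunk()
        if self.packet:
            self._enqueue_packet()

    def _stream(self):
        # Consumer: take packets from dataqueue in order and "send" them.
        while True:
            with self.cond:
                while not self.dataqueue and not self.closed:
                    self.cond.wait()
                if not self.dataqueue:
                    return  # closed and fully drained
                pkt = self.dataqueue.popleft()
                self.ackqueue.append(pkt)
            for chunk, _crc in pkt:
                self.received.append(chunk)
            with self.cond:
                self.ackqueue.popleft()  # simulated ack from the DataNode
                self.cond.notify_all()

    def close(self):
        self.flush()
        with self.cond:
            self.closed = True
            self.cond.notify_all()
        self.streamer.join()
```

For example, writing 5000 bytes and then calling close() produces 10 chunks in this toy (nine full 512-byte chunks plus one 392-byte chunk forced out by the flush), grouped into three packets. A packet stays in ackqueue until it is acknowledged, so unacknowledged packets still count against the producer's limit.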



Origin www.cnblogs.com/xiangyuguan/p/11007497.html