HDFS learning notes (comparison of data units, read/write process)

1. The overall architecture of HDFS


  • Terminology:
  1. Client: any endpoint that accesses HDFS through its API or through HDFS shell commands can be regarded as a client.
  2. Rack: a server rack; the replica placement strategy is rack-aware.
  3. Block size: 128 MB by default from Hadoop 2.7.3 onward; 64 MB by default in earlier versions.

2. The relationship between block, packet, and chunk

  • Block, packet, and chunk are all data units involved in HDFS storage and transfer.
  • Hadoop is configured through XML files such as core-site.xml and hdfs-site.xml; when you are unsure which setting to change, consult core-default.xml, hdfs-default.xml, and the other *-default.xml files for the available options and their defaults.
① block
  • A block is the unit into which HDFS partitions a file. A file smaller than the block size still occupies one block; the block's recorded size is the actual file size, while blocksize is the configured maximum.
  • You can change the default block size through the dfs.blocksize configuration item (formerly dfs.block.size) in hdfs-site.xml.
  • The relationship between block size, disk seek time, and transfer time:
  1. The larger the block, the fewer blocks a file has, so less total time is spent seeking and more of the time goes to actual data transfer.
  2. The smaller the block, the more blocks a file has, so more total time is spent seeking relative to data transfer.
  • If the block size is set too small:
  1. NameNode memory overload: many small blocks mean a large amount of file metadata, all of which is held in NameNode memory and can exhaust it.
  2. Seek time grows: with many small blocks, the disk spends a growing share of its time seeking to the start of the next block instead of transferring data.
  • If the block size is set too large:
  1. Map tasks run too long: a MapReduce map task usually processes one block at a time, so an oversized block makes each map task take too long.
  2. Data transfer takes too long: with an oversized block, the time to transfer each block far exceeds the seek time and dominates overall processing speed.
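To make the small-file point concrete, here is a minimal sketch (plain Java, no Hadoop dependency; the sizes are made-up examples) of how block count, and therefore NameNode metadata load, changes with block size:

```java
public class BlockCount {
    // Number of blocks a file occupies: ceiling of fileSize / blockSize.
    static long blocks(long fileSize, long blockSize) {
        return (fileSize + blockSize - 1) / blockSize;
    }

    public static void main(String[] args) {
        long oneGiB = 1024L * 1024 * 1024;
        // 1 GiB file with 128 MiB blocks -> 8 metadata entries on the NameNode.
        System.out.println(blocks(oneGiB, 128L * 1024 * 1024)); // 8
        // The same file with 1 MiB blocks -> 1024 metadata entries.
        System.out.println(blocks(oneGiB, 1L * 1024 * 1024));   // 1024
        // A 1-byte file still occupies one block (and one metadata slot).
        System.out.println(blocks(1, 128L * 1024 * 1024));      // 1
    }
}
```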
② packet
  • A packet is the second-largest unit. It is the basic unit of data transfer between the DFSClient and a DataNode, and between DataNodes in a pipeline. The default size is 64 KB.
  • You can change the default packet size through the dfs.write.packet.size configuration item in hdfs-site.xml.
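The two sizes mentioned so far can be set together; a hedged hdfs-site.xml fragment (the values are illustrative examples, not recommendations):

```xml
<!-- hdfs-site.xml: example values only -->
<configuration>
  <property>
    <name>dfs.blocksize</name>
    <value>134217728</value> <!-- 128 MB; dfs.block.size is the deprecated name -->
  </property>
  <property>
    <name>dfs.write.packet.size</name>
    <value>65536</value> <!-- 64 KB -->
  </property>
</configuration>
```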
③ chunk
  • A chunk is the smallest unit. It is the basic unit of data checksumming between the DFSClient and a DataNode, and between DataNodes in a pipeline; the default is 512 bytes.
  • You can change the default chunk size through the io.bytes.per.checksum configuration item in core-site.xml (newer releases use dfs.bytes-per-checksum in hdfs-site.xml).
  • As the basic unit of data verification, each chunk carries 4 bytes of checksum. A chunk written into a packet therefore occupies 516 bytes, and the ratio of real data to checksum data is 128:1.
  • Example: a 128 MB file is divided into 262,144 chunks, which together carry 262,144 × 4 B = 1 MB of checksum data.
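The chunk arithmetic above can be checked with a small sketch (plain Java; the constants mirror the defaults quoted in this section):

```java
public class ChunkOverhead {
    static final int CHUNK = 512;   // bytes of data per chunk (default)
    static final int CHECKSUM = 4;  // bytes of checksum carried per chunk

    static long chunks(long fileSize) { return fileSize / CHUNK; }
    static long checksumBytes(long fileSize) { return chunks(fileSize) * CHECKSUM; }

    public static void main(String[] args) {
        long fileSize = 128L * 1024 * 1024;           // the 128 MB example
        System.out.println(chunks(fileSize));         // 262144 chunks
        System.out.println(checksumBytes(fileSize));  // 1048576 bytes = 1 MB
        System.out.println(CHUNK / CHECKSUM);         // 128 : data-to-checksum ratio
        System.out.println(CHUNK + CHECKSUM);         // 516 bytes per chunk in a packet
    }
}
```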
  • Summary of the three:
  1. A chunk is the basic unit of data checksumming between the DFSClient and a DataNode, or between DataNodes in a pipeline; each chunk carries 4 bytes of checksum.
  2. A packet is the basic unit of data transfer between the DFSClient and a DataNode, or between DataNodes in a pipeline; each chunk written into a packet actually occupies 516 bytes.
  3. A block is the unit of file partitioning; many packets make up one block. A small file occupies less than one block of disk space but still takes up a metadata slot, contributing to NameNode memory overload.
④ Three-layer buffer in the writing process
  • The writing process involves three layers of buffering, at the chunk, packet, and DataQueue granularities:
  1. As data flows into DFSOutputStream, it first fills a chunk-sized buffer. When the buffer fills, or a forced flush() occurs, a checksum is calculated for the chunk.
  2. The chunk and its checksum are written into the current packet together. When enough chunks fill the packet, the packet is appended to the DataQueue.
  3. The DataStreamer thread takes packets off the DataQueue and sends them to the DataNodes; each packet not yet confirmed as written is moved to the AckQueue to await acknowledgement.
  4. When the DataNodes' ack arrives (write successful), the ResponseProcessor removes the packet from the AckQueue; otherwise the packet is moved back to the DataQueue to be re-sent.
    [Figure: the three-layer buffer]
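The chunk → packet → DataQueue flow can be modeled with a toy sketch. Class and field names here are illustrative, not the real Hadoop internals, and the packet is simplified to a counter of its chunks:

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.zip.CRC32;

// Toy model of the three-layer write buffer (chunk -> packet -> DataQueue).
public class WriteBuffer {
    static final int CHUNK_SIZE = 512;          // checksum unit
    static final int CHUNKS_PER_PACKET = 126;   // keeps a packet under 64 KB

    final byte[] chunkBuf = new byte[CHUNK_SIZE];
    int chunkPos = 0;
    int chunksInPacket = 0;
    final Queue<String> dataQueue = new ArrayDeque<>(); // packets awaiting send
    int packetSeq = 0;

    void write(byte b) {
        chunkBuf[chunkPos++] = b;
        if (chunkPos == CHUNK_SIZE) flushChunk();
    }

    // When the chunk buffer fills, compute its checksum and account the
    // chunk (data + checksum) into the current packet.
    void flushChunk() {
        if (chunkPos == 0) return;
        CRC32 crc = new CRC32();
        crc.update(chunkBuf, 0, chunkPos);
        chunkPos = 0;
        if (++chunksInPacket == CHUNKS_PER_PACKET) {
            dataQueue.add("packet-" + packetSeq++); // full packet enters the DataQueue
            chunksInPacket = 0;
        }
    }

    public static void main(String[] args) {
        WriteBuffer out = new WriteBuffer();
        // Write 1 MiB: 2048 chunks -> 16 full packets (2048 / 126 = 16 rem 32).
        for (int i = 0; i < 1024 * 1024; i++) out.write((byte) i);
        System.out.println(out.dataQueue.size()); // 16
    }
}
```

In the real client, a DataStreamer thread drains this queue and a ResponseProcessor moves acknowledged packets out of the AckQueue, as described above.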

3. Basic knowledge

NameNode

  • Manages metadata (Metadata); note that it stores only metadata, not file data.
  • The NameNode keeps a copy of the metadata in memory for fast access and queries; the metadata is also persisted to disk through the fsimage and edits files.
  • Hadoop 1.0 uses the SecondaryNameNode to merge the fsimage and edits files, but this mechanism is not a hot standby; the Hadoop 1.0 NameNode is therefore a single point of failure.
  • Metadata is roughly divided into two layers: the namespace management layer, responsible for the tree-shaped directory structure of the file system and the mapping from files to data blocks; and the block management layer, responsible for the BlocksMap mapping from blocks to their actual storage locations.
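The two metadata layers can be pictured as two maps; a toy sketch (names are illustrative, and the real structures such as the INode tree and BlocksMap are far richer):

```java
import java.util.*;

// Toy model of the NameNode's two metadata layers.
public class NameNodeMeta {
    // Namespace layer: file path -> ordered list of block IDs.
    final Map<String, List<Long>> namespace = new HashMap<>();
    // Block management layer: block ID -> DataNodes holding a replica.
    final Map<Long, Set<String>> blocksMap = new HashMap<>();

    void addFile(String path, List<Long> blockIds) {
        namespace.put(path, blockIds);
        for (long id : blockIds) blocksMap.put(id, new HashSet<>());
    }

    void addReplica(long blockId, String dataNode) {
        blocksMap.get(blockId).add(dataNode);
    }

    // Resolve a path to the replica locations of each of its blocks,
    // which is what a reading client asks the NameNode for.
    List<Set<String>> locate(String path) {
        List<Set<String>> out = new ArrayList<>();
        for (long id : namespace.get(path)) out.add(blocksMap.get(id));
        return out;
    }

    public static void main(String[] args) {
        NameNodeMeta nn = new NameNodeMeta();
        nn.addFile("/logs/a.txt", Arrays.asList(1L, 2L));
        nn.addReplica(1L, "dn1"); nn.addReplica(1L, "dn2");
        nn.addReplica(2L, "dn3");
        System.out.println(nn.locate("/logs/a.txt").size()); // 2 blocks located
    }
}
```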

datanode

  • Data node, used to store file blocks.
  • To guard against data loss when a DataNode fails, every file block is replicated; the default replication factor is 3 (the dfs.replication setting).

rack

  • Rack: HDFS uses a rack-aware strategy to place replicas.
  • First replica: if the writer is itself a DataNode, place the replica locally; otherwise pick a random DataNode.
  • Second replica: a DataNode on a different (remote) rack.
  • Third replica: a different DataNode on the same remote rack as the second replica.
  • This placement writes to only two racks in total, reducing inter-rack write traffic and improving write performance.
  • More than 3 replicas: the remaining replicas may be placed anywhere subject to these constraints:
  • a DataNode may hold at most one replica of a given block;
  • the maximum replication factor is the total number of DataNodes in the cluster.
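The three rules above can be sketched as a simple selection procedure. This is a minimal illustration only; node and rack names are invented, and the real HDFS policy also weighs node load and topology distance:

```java
import java.util.*;

// Minimal sketch of the default replica placement rules described above.
public class Placement {
    // nodes: dataNode -> rack. Returns the DataNodes chosen for 3 replicas.
    static List<String> place(Map<String, String> nodes, String writer, Random rnd) {
        List<String> chosen = new ArrayList<>();
        // 1st replica: the writer itself if it is a DataNode, else a random node.
        String first = nodes.containsKey(writer) ? writer
                : new ArrayList<>(nodes.keySet()).get(rnd.nextInt(nodes.size()));
        chosen.add(first);
        String localRack = nodes.get(first);
        // 2nd replica: any node on a different rack.
        for (String n : nodes.keySet())
            if (!nodes.get(n).equals(localRack)) { chosen.add(n); break; }
        // 3rd replica: a different node on the same rack as the 2nd replica.
        String remoteRack = nodes.get(chosen.get(1));
        for (String n : nodes.keySet())
            if (nodes.get(n).equals(remoteRack) && !chosen.contains(n)) {
                chosen.add(n); break;
            }
        return chosen;
    }

    public static void main(String[] args) {
        Map<String, String> nodes = new LinkedHashMap<>();
        nodes.put("dn1", "rackA"); nodes.put("dn2", "rackA");
        nodes.put("dn3", "rackB"); nodes.put("dn4", "rackB");
        // Writer dn1 is a DataNode: local, then dn3 (rackB), then dn4 (rackB).
        System.out.println(place(nodes, "dn1", new Random())); // [dn1, dn3, dn4]
    }
}
```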

Reference link: HDFS Replica Placement Policy

client

  • Client: any endpoint that operates on HDFS through its API or shell commands can be regarded as a client.

blockSize

  • Data blocks have a default size, which can be configured via dfs.blocksize in the hdfs-site.xml file.
  • Hadoop 1.0: 64 MB. Hadoop 2.0: 128 MB.
  • On block size: from a big-data-processing perspective, larger blocks are generally better, because a larger block size reduces the number of disk seeks per file and thus the total seek time; as storage and workloads grow, default block sizes have tended to grow as well.

4. HDFS read and write process

① Reading process of HDFS
  1. The client calls the DistributedFileSystem.open() method to obtain an input stream (FSDataInputStream) for the blocks to be read.
  2. Inside open(), DistributedFileSystem uses RPC to ask the NameNode for the DataNode addresses of all replicas of each block. open() then returns the FSDataInputStream object, which wraps a DFSInputStream.
  3. The client calls FSDataInputStream.read(); DFSInputStream automatically connects to the most suitable DataNode (the nearest one in the network topology) to read the data.
  4. read() is called in a loop, transferring data from the DataNode to the client.
  5. When the current block is finished, the connection to its DataNode is closed and a connection to the DataNode of the next block is established to continue reading.

This process is transparent to the client. From the client's perspective, it seems that only one continuous stream is read.

  6. After the client has read all the blocks, it calls FSDataInputStream.close() to close the input stream, ending the file read.
  • Read errors:
  • If an error occurs while reading, DFSInputStream tries to read the block from another DataNode holding a replica. The failing DataNode is remembered and skipped in subsequent requests.
  • DFSInputStream checks the integrity of every block it reads. If a block is corrupt, the client reports it to the NameNode and continues reading the replica from another DataNode.
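The read loop of steps 3 to 6 follows the standard Java stream contract. A sketch with a plain local InputStream standing in for FSDataInputStream (same read() contract: returns the number of bytes read, or -1 at end of stream; no Hadoop dependency, so it runs anywhere):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ReadLoop {
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        // Loop until read() reports end of stream, exactly as the client
        // loops over FSDataInputStream.read() across block boundaries.
        while ((n = in.read(buf)) != -1) out.write(buf, 0, n);
        in.close(); // step 6: close the stream when done
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello hdfs".getBytes(StandardCharsets.UTF_8);
        byte[] copy = readAll(new ByteArrayInputStream(data));
        System.out.println(new String(copy, StandardCharsets.UTF_8)); // hello hdfs
    }
}
```

The block switching and DataNode failover described above happen inside DFSInputStream, which is why the client sees only one continuous stream.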
② HDFS writing process
  1. The client calls the DistributedFileSystem.create() method to send a file-creation request to the NameNode.
  2. Inside create(), DistributedFileSystem sends an RPC request to the NameNode, which runs its pre-creation checks. If the checks pass, the NameNode first records the write operation in the EditLog and then returns the output stream object FSDataOutputStream (which wraps a DFSOutputStream).
  3. The client calls FSDataOutputStream.write() to write data to the file.
  4. While writing, DFSOutputStream splits the data into packets and appends them to the DataQueue. DataStreamer manages the DataQueue: it asks the NameNode to allocate suitable new blocks to hold the replicas, the chosen DataNodes form a pipeline, and packets are transmitted along it.
  • DataStreamer Stream the packet to DataNode1 through the pipeline
  • DataNode1 transmits the received packet to DataNode2
  • DataNode2 transmits the received packet to DataNode3 to form a triple copy storage of the packet.
  5. To keep the replicas consistent, every DataNode that receives a packet returns an ack to its upstream sender. Once enough acks have been received, the packet is removed from the internal ack queue.
  6. After the file has been written, the client calls FSDataOutputStream.close() to close the output stream.
  7. The DistributedFileSystem.complete() method is called to notify the NameNode that the file was written successfully.
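The pipeline-and-ack behaviour of steps 4 and 5 can be modeled with a toy sketch (names are illustrative, not Hadoop internals; the forwarding and acks, which are asynchronous in reality, are collapsed into one synchronous call):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Toy model of the replication pipeline: each packet is forwarded
// dn1 -> dn2 -> dn3, and acks travel back before the packet leaves
// the ack queue.
public class Pipeline {
    final List<List<String>> dataNodes =
            Arrays.asList(new ArrayList<>(), new ArrayList<>(), new ArrayList<>());
    final Deque<String> ackQueue = new ArrayDeque<>();

    void send(String packet) {
        ackQueue.add(packet);               // held here until acknowledged
        for (List<String> dn : dataNodes)   // forwarded node to node along the pipeline
            dn.add(packet);
        // All three replicas stored -> acks propagate back, packet is released.
        ackQueue.remove(packet);
    }

    public static void main(String[] args) {
        Pipeline p = new Pipeline();
        for (int i = 0; i < 4; i++) p.send("packet-" + i);
        System.out.println(p.dataNodes.get(2).size()); // 4 packets reached the last node
        System.out.println(p.ackQueue.size());         // 0: all acknowledged
    }
}
```

In the real client, a packet that is not acknowledged would be moved back to the DataQueue and re-sent, as described in the three-layer buffer section.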

Origin blog.csdn.net/u014454538/article/details/100938933