The HDFS read and write process, and the question of multi-threaded writes

1. Block, packet, and chunk in HDFS

To understand the HDFS read and write process, we must first understand the concepts of block, packet, and chunk.

  • 1. Block
    A file is split into blocks before it is uploaded; the block is the largest unit, 128MB by default. The size is configurable, although changing it is generally not recommended: if the block is too small, addressing time becomes too high; if the block is too large, there are too few map tasks and job execution slows down.
  • 2. Packet
    A packet is the second-largest unit. It is the basic unit in which the client transfers data to a DataNode, and in which DataNodes pass data along the pipeline. The default size is 64KB.
  • 3. Chunk
    A chunk is the smallest unit. It is the basic unit in which the client and the DataNodes in the pipeline verify (checksum) data. The default size is 512 bytes. Because it is used for verification, each chunk carries a 4-byte checksum, so each chunk actually occupies 516 bytes when written into a packet.
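These three sizes correspond to standard HDFS client configuration keys. Below is a minimal sketch (assuming the Hadoop client library is on the classpath) that reads the values in effect; the defaults shown here are the usual ones, but they can differ between releases:

```java
import org.apache.hadoop.conf.Configuration;

public class HdfsUnitSizes {
    public static void main(String[] args) {
        // Loads core-site.xml / hdfs-site.xml from the classpath, if present.
        Configuration conf = new Configuration();

        // Block size: the largest unit, 128MB by default (dfs.blocksize).
        long blockSize = conf.getLongBytes("dfs.blocksize", 128L * 1024 * 1024);

        // Packet size: the unit of transfer in the write pipeline, 64KB by default.
        int packetSize = conf.getInt("dfs.client-write-packet-size", 64 * 1024);

        // Chunk size: the unit of checksumming, 512 bytes by default.
        int bytesPerChecksum = conf.getInt("dfs.bytes-per-checksum", 512);

        System.out.println("block  = " + blockSize + " bytes");
        System.out.println("packet = " + packetSize + " bytes");
        System.out.println("chunk  = " + bytesPerChecksum + " bytes (+ 4-byte checksum each)");
    }
}
```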

2. HDFS writing process

(Figure: HDFS write process)

  • 1. The client uses the Client library provided by HDFS to initiate an RPC request to the remote NameNode.
  • 2. The NameNode checks whether the file to be created already exists and whether the creator has permission to operate on it. If the checks pass, a record is created for the file; otherwise an exception is thrown back to the client.
  • 3. When the client starts writing the file, it splits the data into packets and manages them internally in a "data queue". It then asks the NameNode to allocate blocks, obtaining a list of DataNodes suitable for storing the replicas; the size of this list depends on the replication setting on the NameNode.
  • 4. The packets are written to all replicas through a pipeline. The client streams each packet to the first DataNode; after that DataNode stores the packet, it forwards it to the next DataNode in the pipeline, and so on until the last DataNode. This is why the write is described as pipelined.
  • 5. After the last DataNode has stored a packet successfully, it returns an ack packet (acknowledgement), which is passed back through the pipeline to the client. The client library maintains an "ack queue"; when the ack packet returned by the DataNodes is received successfully, the corresponding packet is removed from the ack queue.
  • 6. If a DataNode fails during transmission, the current pipeline is closed, the failed DataNode is removed from the pipeline, and the remaining data of the block continues to be transmitted through the remaining DataNodes in pipeline fashion. At the same time, the NameNode allocates a new DataNode so that the number of replicas set by the replication factor is maintained.
  • 7. When one block has been transmitted completely, the client again asks the NameNode which servers the next block should be uploaded to.
  • 8. After the client finishes writing the data, it calls close() on the data stream to close it.
  • 9. A completion signal is sent to the NameNode.
    (Note: when the completion signal is sent depends on whether the cluster favors strong consistency or eventual consistency. Strong consistency requires all DataNodes to finish writing before success is reported to the NameNode; with eventual consistency, success can be reported to the NameNode as soon as any single DataNode has finished writing. HDFS generally emphasizes strong consistency.)

3. HDFS reading process

(Figure: HDFS read process)

  • 1. The client communicates with the NameNode to query metadata (which DataNodes hold each block) and locates the DataNode servers where the file's blocks are stored.
  • 2. It picks a DataNode (nearest first, then at random) and requests to establish an input stream.
  • 3. The DataNode starts sending data (it reads the data from disk and puts it into the stream; verification is done in units of packets).
  • 4. When this data block has been read completely, the connection to this DataNode is closed, and the client connects to the nearest DataNode holding the next block of the file.
  • 5. When the client has finished reading the data, it calls close() on the FSDataInputStream.

During reading, if an error occurs while FSDataInputStream is communicating with a DataNode, it tries the next-closest DataNode holding that block, and it also remembers the DataNode that just failed so that it does not make unnecessary attempts against it later.

DFSInputStream also verifies checksums on the data transferred from the DataNode. If a corrupted block is found, DFSInputStream attempts to read a replica of that block from another DataNode that holds a backup copy.
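The corresponding read path through the public API is symmetric: the checksum verification and DataNode failover described above happen inside the returned input stream. A small sketch, with the file path as a placeholder:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsReadExample {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/tmp/demo/hello.txt");  // hypothetical path

        // open() asks the NameNode for the block locations (step 1); read() then
        // streams from the nearest DataNode holding each block (steps 2-4).
        try (FSDataInputStream in = fs.open(file)) {
            // Copy the stream to stdout; 4096 is just a buffer size.
            IOUtils.copyBytes(in, System.out, 4096, false);
        }  // step 5: close() on the input stream
        fs.close();
    }
}
```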

4. DataNode error during HDFS data writing

If writing data to the DataNode fails, the following operations will be performed:

  • 1. The pipeline is closed, and the packets in the ack queue are added back to the front of the data queue to ensure that no packets are lost.
  • 2. The block saved on the healthy DataNodes is given a new identity (version), so that the partial block data on the failed DataNode is deleted once that node returns to normal; the failed node is also removed from the pipeline.
  • 3. The remaining data is written to the other healthy nodes in the pipeline.
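How aggressively the client replaces a failed DataNode in the pipeline is configurable on the client side. A hedged sketch of the relevant settings (these keys exist in current Hadoop releases, but defaults and behavior can vary by version):

```java
import org.apache.hadoop.conf.Configuration;

public class PipelineRecoverySettings {
    public static Configuration build() {
        Configuration conf = new Configuration();
        // Whether the client may ask the NameNode for a replacement DataNode when
        // one fails during a write, keeping the replica count up.
        conf.setBoolean("dfs.client.block.write.replace-datanode-on-failure.enable", true);
        // DEFAULT / ALWAYS / NEVER: policy deciding when a replacement is requested.
        conf.set("dfs.client.block.write.replace-datanode-on-failure.policy", "DEFAULT");
        return conf;
    }
}
```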

5. Can HDFS be written by multiple threads?

The answer is: yes.
"HDFS cannot be written concurrently" means that for a file with the same path and name there can be only one writer at a time; any additional upload requests for that file fail.
Files at different paths or with different names can be uploaded at the same time, as the sketch below illustrates.
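A small demonstration of this behavior (all paths are placeholders): two threads writing to different paths both succeed, while a second attempt to create the same path while the first writer still holds the lease is rejected by the NameNode with an exception (typically AlreadyBeingCreatedException, though the exact exception can vary by version):

```java
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ConcurrentWriteDemo {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        // Two writers on DIFFERENT paths: both succeed.
        Thread t1 = new Thread(() -> write(fs, "/tmp/demo/a.txt"));
        Thread t2 = new Thread(() -> write(fs, "/tmp/demo/b.txt"));
        t1.start(); t2.start();
        t1.join();  t2.join();

        // Two writers on the SAME path: the second create() fails while the first
        // writer still holds the lease on the file.
        Path same = new Path("/tmp/demo/same.txt");
        try (FSDataOutputStream first = fs.create(same, true)) {
            try (FSDataOutputStream second = fs.create(same, true)) {
                System.out.println("unexpected: second writer succeeded");
            } catch (Exception expected) {
                System.out.println("second writer rejected: " + expected.getMessage());
            }
        }
        fs.close();
    }

    private static void write(FileSystem fs, String name) {
        try (FSDataOutputStream out = fs.create(new Path(name), true)) {
            out.write(name.getBytes(StandardCharsets.UTF_8));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```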

How is data integrity maintained during reading and writing?

Through checksums. Each chunk carries a checksum, chunks make up packets, and packets eventually make up a block, so checksums are available for the whole block.

The HDFS client verifies the contents of HDFS files using checksums. When the client creates a new HDFS file, it computes a checksum for each data block after the file is split into blocks, and stores these checksums as a hidden file in the same HDFS namespace. When the client reads file content from HDFS, it checks whether the checksum computed over the data read from the block matches the stored checksum; if not, the client can choose to obtain a replica of the data block from another DataNode.
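A short sketch of the checksum machinery that is exposed through the public API (the path is a placeholder): checksum verification on read can be toggled per FileSystem instance, and getFileChecksum() returns an aggregate checksum for the whole file derived from the stored block checksums.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileChecksum;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ChecksumDemo {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/tmp/demo/hello.txt");  // hypothetical path

        // Checksum verification is on by default; it can be disabled if needed.
        fs.setVerifyChecksum(true);

        // Reading verifies each chunk's checksum; on a mismatch the client can
        // fall back to another replica, as described above.
        try (FSDataInputStream in = fs.open(file)) {
            byte[] buf = new byte[4096];
            while (in.read(buf) != -1) { /* consume and verify */ }
        }

        // Aggregate checksum for the whole file, computed from the block checksums.
        FileChecksum checksum = fs.getFileChecksum(file);
        System.out.println(checksum);
        fs.close();
    }
}
```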

The file block directory structure on a DataNode looks like this:

${dfs.datanode.data.dir}/
├── current
│ ├── BP-526805057-127.0.0.1-1411980876842
│ │ └── current
│ │ ├── VERSION
│ │ ├── finalized
│ │ │ ├── blk_1073741825
│ │ │ ├── blk_1073741825_1001.meta
│ │ │ ├── blk_1073741826
│ │ │ └── blk_1073741826_1002.meta
│ │ └── rbw
│ └── VERSION
└── in_use.lock

in_use.lock indicates that the DataNode is currently operating on this directory.
rbw stands for "replica being written"; this directory stores data that is currently being written.
A block metadata file (*.meta) consists of a header with version and type information, followed by a series of checksum values for the block; this is where the checksums are kept.

