HDFS read and write process in Hadoop [Upload and download flow and principles]

1 Data writing process (upload)

1 The client sends an upload request for a file to the namenode, along with the file's information (size, physical block size, number of replicas)

2 After receiving the upload request, the namenode performs a series of checks (permissions, storage capacity) and plans the metadata describing how the blocks will be distributed

3 The client receives the OK response from the namenode

4 The client asks the namenode to upload the first block, and the namenode returns the block's metadata (the list of datanodes that will store it)

5 The client establishes a connection channel (pipeline) with the 3 machines listed in the returned metadata

6 The datanodes return OK, confirming the channel is ready

7 The client reads the content of the first block of the file from local disk (io.read, length = 128 MB)

8 While reading, the local stream is wrapped in a distributed output stream (DistributeOutputStream), which packages the bytes into data packets to improve transmission efficiency

9 The distributed output stream sends the packets to linux01; linux01 receives the data, writes it to its local disk, and simultaneously forwards it to linux02, which in turn writes locally and forwards to linux03

10 linux01, linux02, and linux03 each store the data in the configured local directory; the directory structure is identical on every node

11 As soon as one replica of the block is stored successfully in the cluster, the block upload counts as successful; if the replica count falls short, the namenode will automatically restore the number of replicas afterwards

12 OK is returned and the upload succeeds

Each subsequent data block repeats steps 4 through 12 (a minimal write sketch follows below).
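
To make these steps concrete, here is a minimal write sketch using the standard Hadoop FileSystem client API. The namenode address (hdfs://linux01:8020) and the file paths are illustrative assumptions, not values from the article:

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumed namenode address; replace with your cluster's fs.defaultFS.
        conf.set("fs.defaultFS", "hdfs://linux01:8020");
        FileSystem fs = FileSystem.get(conf);

        // Steps 1-6: create() asks the namenode for permission and block
        // placement, then opens a pipeline to the chosen datanodes.
        FSDataOutputStream out = fs.create(new Path("/data/demo.txt"));

        // Steps 7-12: bytes read from the local file are packaged into
        // packets by the output stream and pushed through the pipeline.
        InputStream in = new BufferedInputStream(new FileInputStream("/tmp/demo.txt"));
        IOUtils.copyBytes(in, out, 4096, true); // closes both streams
        fs.close();
    }
}
```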

Note the failure semantics of the upload. While the client is transmitting a block to the first datanode, if that node goes down, or any other condition prevents the data from being written normally, the upload of that block fails: the namenode re-plans the block's metadata and the client uploads the block again. However, once the client has successfully written the block to the first datanode, the block upload is considered successful; if another node in the replication pipeline goes down while receiving its copy, that does not stop the first node from reporting a successful upload to the client. The missing replicas are later re-created on other nodes through the heartbeat mechanism between the datanodes and the namenode.
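
The replica count that the namenode maintains is driven by configuration. A minimal sketch, assuming the standard dfs.replication property and the public FileSystem.setReplication call; the file path is hypothetical:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Default replica count for files created by this client.
        conf.set("dfs.replication", "3");
        FileSystem fs = FileSystem.get(conf);

        // Replication can also be adjusted per file after upload; the
        // namenode then adds or removes copies via datanode heartbeats.
        fs.setReplication(new Path("/data/demo.txt"), (short) 2);
        fs.close();
    }
}
```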
 

2 Data reading process (download)

  1. When the client wants to download data, it first sends a request to the namenode.
  2. When the namenode receives the download request, it returns the file's metadata (block locations) together with a success response.
  3. After receiving the metadata, the client parses it and then requests the blocks from the corresponding datanodes.
  4. When each datanode receives the client's request, it streams the block back and the client downloads the data (see the read sketch below).
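
A minimal read sketch mirroring these four steps, again with an assumed namenode address and hypothetical paths:

```java
import java.io.FileOutputStream;
import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsReadExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://linux01:8020"); // assumed address
        FileSystem fs = FileSystem.get(conf);

        // Steps 1-3: open() fetches the block locations from the namenode
        // and returns a stream that knows which datanodes to contact.
        FSDataInputStream in = fs.open(new Path("/data/demo.txt"));

        // Step 4: the stream pulls each block from a datanode in turn.
        OutputStream out = new FileOutputStream("/tmp/demo-copy.txt");
        IOUtils.copyBytes(in, out, 4096, true); // closes both streams
        fs.close();
    }
}
```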


Origin: blog.csdn.net/qq_37933018/article/details/107225930