HDFS write process

1. The client calls the create method on the DistributedFileSystem object, which creates a file output stream (FSDataOutputStream) object.

2. Through the DistributedFileSystem object, the client makes an RPC call to the NameNode of the Hadoop cluster, which creates a file entry (Entry) in the HDFS Namespace; at this point the entry has no Blocks associated with it.

3. Data is written to the DataNodes through the FSDataOutputStream object: the data first goes into a Buffer inside the FSDataOutputStream and is then divided into Packets.

4. With the Packet as the smallest transfer unit, each Packet is sent over a socket connection to the first node of a group of DataNodes (normally 3, but any number greater than or equal to 1) selected by a placement algorithm, and is then relayed in sequence along the Pipeline formed by this group of DataNodes.

5. The DataNodes in the Pipeline send acks in the opposite direction, and finally the first DataNode in the Pipeline returns the Pipeline ack to the client.

6. After all data has been written, the client calls the close method on the file output stream (FSDataOutputStream) object to close the stream.

7. The client calls the complete method of the DistributedFileSystem object to notify the NameNode that the file was written successfully.
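Steps 3 through 5 can be sketched in plain Java, with no Hadoop dependency: the client buffer is chopped into fixed-size Packets, each Packet is stored by every DataNode in the Pipeline, and the ack travels back in the opposite direction. All class, method, and node names below are illustrative, not Hadoop's actual internals, and the packet size is shrunk for demonstration.

```java
import java.util.*;

// Illustrative simulation of the HDFS packet/pipeline mechanism
// (steps 3-5). Names here are hypothetical, not Hadoop internals.
public class PipelineSketch {
    static final int PACKET_SIZE = 4; // real HDFS packets are far larger (~64 KB)

    // Step 3: divide the buffered data into fixed-size Packets.
    static List<byte[]> toPackets(byte[] buffer) {
        List<byte[]> packets = new ArrayList<>();
        for (int off = 0; off < buffer.length; off += PACKET_SIZE) {
            packets.add(Arrays.copyOfRange(buffer, off,
                    Math.min(off + PACKET_SIZE, buffer.length)));
        }
        return packets;
    }

    // Steps 4-5: each node in the Pipeline stores the Packet and forwards
    // it downstream; acks are gathered in reverse order, so the first
    // DataNode hands the final Pipeline ack back to the client.
    static List<String> sendThroughPipeline(byte[] packet,
            String[] pipeline, Map<String, List<byte[]>> storage) {
        for (String dn : pipeline) {
            // node stores a replica, then forwards to the next node
            storage.computeIfAbsent(dn, k -> new ArrayList<>()).add(packet);
        }
        List<String> ackOrder = new ArrayList<>();
        for (int i = pipeline.length - 1; i >= 0; i--) {
            ackOrder.add(pipeline[i]); // ack flows upstream toward the client
        }
        return ackOrder;
    }

    public static void main(String[] args) {
        String[] pipeline = {"dn1", "dn2", "dn3"}; // default replication = 3
        Map<String, List<byte[]>> storage = new HashMap<>();
        for (byte[] p : toPackets("hello hdfs".getBytes())) {
            System.out.println("ack order: "
                    + sendThroughPipeline(p, pipeline, storage));
        }
        System.out.println("packets stored on dn1: " + storage.get("dn1").size());
    }
}
```

Running this prints the ack order [dn3, dn2, dn1] for each Packet, matching step 5: the last DataNode acks first and the first DataNode delivers the Pipeline ack to the client.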
