Hadoop-HDFS data read and write process (detailed process and diagram)

Network topology-node distance

    Node distance: the sum of the distances from each of the two nodes to their nearest common ancestor.

    If node n1 on rack r1 in data center d1 is written as /d1/r1/n1, the following distances result:

  1. distance(/d1/r1/n1, /d1/r1/n1) = 0 (two applications on the same node)
  2. distance(/d1/r1/n1, /d1/r1/n2) = 2 (two nodes on the same rack)
  3. distance(/d1/r1/n1, /d1/r2/n3) = 4 (two nodes on different racks in the same data center)
  4. distance(/d1/r1/n1, /d2/r3/n4) = 6 (two nodes in different data centers)
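The distance rule above can be sketched directly from the topology path strings. This is illustrative Java only (the class name NodeDistance is made up), not Hadoop's actual NetworkTopology implementation:

```java
// Illustrative sketch (not Hadoop's NetworkTopology class): computes node
// distance as the sum of the hops from each node up to their nearest
// common ancestor, given topology paths such as "/d1/r1/n1".
public class NodeDistance {
    public static int distance(String a, String b) {
        String[] pa = a.substring(1).split("/");
        String[] pb = b.substring(1).split("/");
        // Depth of the nearest common ancestor.
        int common = 0;
        while (common < pa.length && common < pb.length
                && pa[common].equals(pb[common])) {
            common++;
        }
        // Hops from a up to the ancestor plus hops down to b.
        return (pa.length - common) + (pb.length - common);
    }

    public static void main(String[] args) {
        System.out.println(distance("/d1/r1/n1", "/d1/r1/n2")); // prints 2
        System.out.println(distance("/d1/r1/n1", "/d2/r3/n4")); // prints 6
    }
}
```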

Reading process

    Suppose the HDFS client needs to read the file one.txt, which is split into two data blocks, BlockA and BlockB, with a replication factor of 3. BlockA is stored on DataNodes D2, D5, and D7; BlockB on D3, D9, and D11.

HDFS client communicates with NameNode

  • Because the NameNode holds the metadata for the blocks of one.txt, the client first sends the NameNode a request for the list of DataNodes that store those blocks.
  • On receiving the request, the NameNode first checks the client user's permissions. If they are sufficient, it checks whether the requested file exists; if it does, the NameNode returns the list of DataNodes storing one.txt's blocks, sorted in ascending order of distance from the client (closest first). The NameNode also issues the client a security token, which the client must present to each DataNode for authentication.

HDFS client communicates with DataNode

  • When the client receives the DataNode list from the NameNode, it communicates with the DataNodes directly, sending requests through the FSDataInputStream object to the DataNodes closest to it (D2 for BlockA, D3 for BlockB). DFSInputStream manages the communication between the client and the DataNodes.
  • The client presents the security token issued by the NameNode to the DataNode and then starts reading; the data flows from the DataNode to the client as a stream.
  • After reading all the data blocks, the client calls the close() method to close the FSDataInputStream.

The detailed process of HDFS reading data

  1. The client calls the open() method of FileSystem (in HDFS, DistributedFileSystem implements FileSystem).
  2. DistributedFileSystem calls the NameNode via RPC to request the locations of the DataNodes storing the first blocks of one.txt. The NameNode returns the addresses of all DataNodes holding each block, sorted by distance from the client.
  3. DistributedFileSystem then returns an FSDataInputStream to the client. For HDFS the concrete input stream is DFSInputStream, which DistributedFileSystem wraps in the FSDataInputStream.
  4. The client calls the read() method. Using the sorted list, DFSInputStream connects to the DataNode closest to the client and reads the data, which flows from the DataNode to the client as a stream.
  5. When a block has been fully read, DFSInputStream closes the connection to that DataNode, then asks the NameNode for the location of the next block (unless the client cache already holds that block's information).
  6. DFSInputStream finds the best DataNode for the next block and reads its data.
  7. If an error occurs while communicating with a DataNode, DFSInputStream switches to the next-closest DataNode holding the block. It also marks failed DataNodes so they are skipped for later blocks, and it verifies the checksum of the data it reads; if a corrupt block is found, it reports it to the NameNode and reads the block from another DataNode.
  8. When the client has read all the data, it calls close() on the FSDataInputStream to close the input stream.

Code

import java.io.FileOutputStream;
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public static void read() throws IOException, InterruptedException, URISyntaxException {
    Configuration conf = new Configuration();
    // Connect to HDFS at master:9000 as user "root"
    FileSystem fs = FileSystem.get(new URI("hdfs://master:9000"), conf, "root");
    // Open the HDFS file for reading
    FSDataInputStream fis = fs.open(new Path("/picture.jpg"));
    // Local file that receives the data
    FileOutputStream fos = new FileOutputStream("picture.jpg");
    int numBytes;
    byte[] buffer = new byte[1024];
    // read() returns -1 at end of stream (it may legally return 0,
    // so testing "> 0" could end the copy early)
    while ((numBytes = fis.read(buffer)) != -1) {
        fos.write(buffer, 0, numBytes);
    }
    fis.close();
    fos.close();
    fs.close();
}

Writing process

    In a write operation, DFSOutputStream maintains two queues: a data queue and an acknowledgment (ack) queue.
    Suppose the file two.txt is to be written to HDFS (assume the NameNode has already recorded two.txt's metadata and the DataNodes have already stored the data; a single static diagram cannot easily show both processes).
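The interplay between the two queues can be sketched in plain Java. This is illustrative only (the class name QueueSketch is made up); the real DFSOutputStream manages packets, threads, and the pipeline in far more detail:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch of DFSOutputStream's two queues (not Hadoop's code):
// packets wait in the data queue, move to the ack queue once sent down the
// pipeline, and are dropped only after every DataNode has acknowledged them.
public class QueueSketch {
    private final Deque<String> dataQueue = new ArrayDeque<>();
    private final Deque<String> ackQueue = new ArrayDeque<>();

    public void enqueue(String packet) {      // write() split data into a packet
        dataQueue.addLast(packet);
    }

    public String sendNext() {                // DataStreamer sends to first DataNode
        String packet = dataQueue.pollFirst();
        if (packet != null) ackQueue.addLast(packet);
        return packet;
    }

    public void ackReceived() {               // ack flowed back through the pipeline
        ackQueue.pollFirst();                 // packet is now safely replicated
    }

    public int pending() {                    // packets not yet fully acknowledged
        return dataQueue.size() + ackQueue.size();
    }
}
```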

HDFS client communicates with NameNode

  • To write the file two.txt to HDFS, the client sends a request to the NameNode. The NameNode first checks the client user's permissions. If the user has sufficient permissions and no file with the same name exists, the NameNode creates a record (metadata) for the file. If the file already exists, creation fails with an IOException.
  • The NameNode gives the client the addresses of the DataNodes that can store two.txt, along with a security token; the client authenticates with this token before writing to the DataNodes.

HDFS client communicates with DataNode

  • After the client receives the DataNode list and write permission from the NameNode, it writes data directly to the first DataNode in the list. As a DataNode writes, it replicates the block to another DataNode according to the replication factor; with a replication factor of 3, at least 3 copies of each block end up on different DataNodes.
  • Once a copy is created, the DataNode sends a confirmation message back toward the client. The DataNodes form a pipeline.

The detailed process of HDFS writing data

  1. The client calls the create() method of FileSystem (in HDFS, DistributedFileSystem implements FileSystem).
  2. DistributedFileSystem calls the NameNode via RPC to create the new file two.txt in the file system namespace.
  3. The NameNode checks the client user's permissions and whether two.txt already exists. If the user has sufficient permissions and there is no file with the same name, the NameNode creates the file and records its metadata.
  4. After the remote call returns, DistributedFileSystem gives the client an FSDataOutputStream, which wraps a DFSOutputStream. The client writes data to HDFS with write().
  5. As the client writes, DFSOutputStream splits the data into packets and places them in an internal data queue. DataStreamer consumes this queue and asks the NameNode for a set of DataNodes to store the block and its replicas. These DataNodes form a pipeline whose length equals the replication factor. DataStreamer sends each packet to the first DataNode in the pipeline, which stores it and forwards it to the second DataNode; the second stores and forwards it to the third, and so on.
  6. DFSOutputStream also maintains an acknowledgment (ack) queue. Once a DataNode stores its copy, it sends an acknowledgment upstream along the pipeline, through each DataNode in turn, until it reaches the client. When the client has received acknowledgments from every DataNode for a packet, it removes that packet from the ack queue.
  7. When the client calls close() on the output stream, all remaining packets are flushed into the DataNode pipeline and the client waits for acknowledgments. Once the ack queue is empty, the client calls complete() to tell the NameNode the file write is finished.
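Steps 5 and 6 above, packets forwarded DataNode-by-DataNode down the pipeline and acks flowing back upstream, can be sketched as follows (illustrative only; PipelineSketch is a made-up name, not Hadoop code):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative pipeline sketch: each DataNode stores the packet and forwards
// it downstream; acknowledgments then travel back upstream, last node first.
public class PipelineSketch {
    public static List<String> sendPacket(String packet, String[] pipeline) {
        List<String> log = new ArrayList<>();
        // Forward pass: store and pass the packet on.
        for (String node : pipeline) {
            log.add(node + " stored " + packet);
        }
        // Ack pass: confirmations flow upstream toward the client.
        for (int i = pipeline.length - 1; i >= 0; i--) {
            log.add(pipeline[i] + " acked " + packet);
        }
        return log;
    }
}
```

With a pipeline of D1, D2, D3 the log shows three store events in pipeline order followed by three acks in reverse order.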

What happens if the DataNode fails while writing data

    If a DataNode fails while data is being written, the following steps occur; they are transparent to the client.

  1. The pipeline is closed, and the packets in the ack queue are added to the front of the data queue so that packets already sent to DataNodes are not lost.
  2. The current block on the healthy DataNodes is given a new identifier, which is sent to the NameNode so that the partial block on the failed DataNode can be deleted if that node later recovers.
  3. The failed DataNode is removed from the pipeline, a new pipeline is built from the remaining healthy DataNodes, and the rest of the data is written through it.
  4. When the NameNode notices the block is under-replicated, it arranges for an additional replica on another DataNode. Subsequent blocks are written normally.
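Step 1 above, moving unacknowledged packets back to the front of the data queue, can be sketched like this (illustrative only; FailureRecovery is a made-up name, not Hadoop's actual recovery code):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of recovery step 1: packets still awaiting acknowledgment are moved
// from the ack queue back to the FRONT of the data queue, so nothing already
// sent to the failed pipeline is lost and resend order is preserved.
public class FailureRecovery {
    final Deque<String> dataQueue = new ArrayDeque<>();
    final Deque<String> ackQueue = new ArrayDeque<>();

    void onPipelineFailure() {
        // Drain the ack queue from the back so the packets keep their
        // original order at the head of the data queue.
        while (!ackQueue.isEmpty()) {
            dataQueue.addFirst(ackQueue.pollLast());
        }
    }
}
```

If the ack queue held p1 then p2 and the data queue held p3, after recovery the data queue is p1, p2, p3: the oldest unacknowledged packet is resent first.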

Code

import java.io.FileInputStream;
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public static void write() throws IOException, InterruptedException, URISyntaxException {
    Configuration conf = new Configuration();
    // Connect to HDFS at master:9000 as user "root"
    FileSystem fs = FileSystem.get(new URI("hdfs://master:9000"), conf, "root");
    // Local file to upload
    FileInputStream fis = new FileInputStream("picture.jpg");
    // Create the target file on HDFS
    FSDataOutputStream fos = fs.create(new Path("/picture.jpg"));
    int numBytes;
    byte[] buffer = new byte[1024];
    // read() returns -1 at end of stream
    while ((numBytes = fis.read(buffer)) != -1) {
        fos.write(buffer, 0, numBytes);
    }
    fis.close();
    fos.close();
    fs.close();
}

    If you spot any errors, please point them out (ง •̀_•́)ง (*•̀ㅂ•́)و

    


Origin blog.csdn.net/H_X_P_/article/details/105777251