What is HDFS?

Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity, general-purpose hardware. HDFS is highly fault tolerant and suitable for deployment on inexpensive machines. It provides high-throughput data access and is well suited to applications that work with large-scale datasets. To understand the inner workings of HDFS, it helps to first understand what a distributed file system is.

1. Distributed file system

When multiple computers are networked to work together (sometimes called a cluster) and solve a problem as if they were a single system, such a system is called a distributed system.

Distributed file systems are a subset of distributed systems, and the problem they solve is data storage. In other words, they are storage systems that span multiple computers. Data stored on a distributed file system is automatically distributed across different nodes.

Distributed file systems have broad application prospects in the era of big data, and they provide the required scalability for storing and processing very large-scale data from the network and elsewhere.

2. Separation of metadata and data: NameNode and DataNode

Every file stored in the file system has associated metadata. Metadata includes the file name, inode number, data block locations, and so on, while the data is the actual content of the file.

In traditional file systems, metadata and data are stored on the same machine because the file system does not span multiple machines.

To build a distributed file system in which clients are simple to use and do not need to be aware of the activities of other clients, the metadata must be maintained outside of the clients. The design philosophy of HDFS is to set aside one or more machines to hold the metadata and let the remaining machines hold the contents of the files.

NameNode and DataNode are the two main components of HDFS: the metadata is stored on the NameNode, and the data is stored on the cluster of DataNodes. The NameNode not only manages the metadata of the content stored on HDFS, but also records which nodes are part of the cluster, how many copies of each file exist, and so on. It also decides what the system needs to do when a node in the cluster goes down or a copy of the data is lost.

Each piece of data stored on HDFS has multiple copies (replicas) kept on different servers. In essence, the NameNode is the Master (master server) of HDFS, and the DataNodes are the Slaves (slave servers).
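As a small illustration of this separation, the following Java sketch uses the Hadoop FileSystem client API to ask the NameNode for a file's metadata; the NameNode address and file path here are placeholders, not part of the original example. Such a request is answered entirely from the NameNode's metadata, without contacting any DataNode.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MetadataExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // fs.defaultFS points at the NameNode; this address is an assumption
        conf.set("fs.defaultFS", "hdfs://namenode-host:9000");
        FileSystem fs = FileSystem.get(conf);

        // getFileStatus is served from the NameNode's metadata alone:
        // file length, replication factor, and block size all come back
        // without any DataNode being involved.
        FileStatus status = fs.getFileStatus(new Path("/user/zhou/zhou.log"));
        System.out.println("length      = " + status.getLen());
        System.out.println("replication = " + status.getReplication());
        System.out.println("block size  = " + status.getBlockSize());

        fs.close();
    }
}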

3. HDFS writing process

The NameNode is responsible for managing the metadata of all files stored in HDFS. It confirms the client's request and records the name of the file and the set of DataNodes that will store it. It keeps this information in an in-memory file allocation table.

For example, a client sends a request to the NameNode saying that it wants to write the "zhou.log" file to HDFS. Then, its execution flow is shown in Figure 1. Specifically:

Step 1: The client sends a message to the NameNode, saying that it wants to write the "zhou.log" file. (① in Figure 1)

Step 2: NameNode sends a message to the client, asking the client to write to DataNodes A, B, and D, and contact DataNode B directly. (② in Figure 1)

Step 3: The client sends a message to DataNode B, asking it to save a "zhou.log" file and send a copy to DataNode A and DataNode D. (③ in Figure 1)

Step 4: DataNode B sends a message to DataNode A, asking it to save a "zhou.log" file and send a copy to DataNode D. (④ in Figure 1)

Step 5: DataNode A sends a message to DataNode D, asking it to save a "zhou.log" file. (⑤ in Figure 1)

Step 6: DataNode D sends a confirmation message to DataNode A. (⑤ in Figure 1)

Step 7: DataNode A sends a confirmation message to DataNode B. (④ in Figure 1)

Step 8: DataNode B sends a confirmation message to the client, indicating that the writing is complete. (⑥ in Figure 1)

 

Figure 1 Schematic diagram of HDFS writing process

In the design of distributed file systems, one of the challenges is how to ensure data consistency. For HDFS, data is not considered complete until all DataNodes holding it have confirmed that they have a copy of the file. Consistency is therefore enforced during the write phase: a client will get the same data no matter which DataNode it chooses to read from.
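A minimal write sketch using the Hadoop FileSystem client API is shown below; the NameNode address and file path are assumptions made for illustration. The DataNode pipeline and acknowledgements from steps 1 to 8 happen inside the client library and the DataNodes, so the application only sees an output stream.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.nio.charset.StandardCharsets;

public class WriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode-host:9000"); // assumed NameNode address
        FileSystem fs = FileSystem.get(conf);

        // create() contacts the NameNode to choose target DataNodes (steps 1-2);
        // bytes written to the stream are forwarded along the DataNode pipeline
        // (steps 3-5) and acknowledged back to the client (steps 6-8).
        Path file = new Path("/user/zhou/zhou.log");
        try (FSDataOutputStream out = fs.create(file)) {
            out.write("hello hdfs\n".getBytes(StandardCharsets.UTF_8));
        }

        fs.close();
    }
}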

4. HDFS read process

To understand the read process, think of a file as being composed of data blocks stored on DataNodes. The execution flow for a client reading the previously written content is shown in Figure 2. The specific steps are:

Step 1: The client asks the NameNode where it should read the file from. (① in Figure 2)

Step 2: The NameNode sends the data block information to the client. (The data block information includes the IP addresses of the DataNodes that hold copies of the file, and the block IDs the DataNodes need to locate the blocks on their local hard disks.) (② in Figure 2)

Step 3: The client checks the data block information, contacts the relevant DataNode, and requests the data block. (③ in Figure 2)

Step 4: The DataNode returns the file content to the client and then closes the connection, completing the read operation. (④ in Figure 2)

 

Figure 2 Schematic diagram of HDFS read process

The client obtains data blocks of a file from different DataNodes in parallel, and then joins these data blocks to form a complete file.
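From the application's point of view the read side is symmetrical. The sketch below (again with an assumed NameNode address and path) opens the file and copies it to standard output; fetching block locations from the NameNode and pulling blocks from the DataNodes happens behind the input stream.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class ReadExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode-host:9000"); // assumed NameNode address
        FileSystem fs = FileSystem.get(conf);

        // open() asks the NameNode for the block locations (steps 1-2);
        // reading from the stream pulls each block from a DataNode (steps 3-4).
        try (FSDataInputStream in = fs.open(new Path("/user/zhou/zhou.log"))) {
            IOUtils.copyBytes(in, System.out, 4096, false);
        }

        fs.close();
    }
}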

5. Fast recovery of hardware failures through replicas

When everything is running normally, each DataNode periodically sends a heartbeat message to the NameNode (every 3 seconds by default). If the NameNode does not receive a heartbeat within a predetermined time (10 minutes by default), it assumes the DataNode has failed, removes it from the cluster, and starts a process to restore the missing data. A DataNode may leave the cluster for a variety of reasons, such as hardware failure, motherboard failure, power supply aging, or network failure.

For HDFS, losing a DataNode means losing copies of the data blocks stored on its hard disks. As long as more than one replica exists at any time (3 by default), a failure will not result in data loss. When a hard disk fails, HDFS detects that the number of copies of the data blocks stored on that disk has dropped below the required number and actively creates new copies until the full replication factor is restored.
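The replication factor that drives this recovery is configurable. The sketch below, again with an assumed NameNode address and file path, sets the default factor for new files through the client configuration and also changes it for one existing file; the NameNode then schedules extra copies (or removes surplus ones) until the actual count matches the target.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode-host:9000"); // assumed NameNode address
        // dfs.replication controls how many copies of each block are kept (default 3)
        conf.set("dfs.replication", "3");
        FileSystem fs = FileSystem.get(conf);

        // Raise the replication factor of an already written file to 4;
        // the NameNode will create the extra copies in the background.
        fs.setReplication(new Path("/user/zhou/zhou.log"), (short) 4);

        fs.close();
    }
}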

6. Split files across multiple DataNodes

In HDFS, files are divided into data blocks, usually 64 MB to 128 MB per block, and each data block is written to the file system separately. Different data blocks of the same file are not necessarily stored on the same DataNode. The advantage of this is that, when operations are performed on these files, different parts of the file can be read and processed in parallel.

When a client prepares to write a file to HDFS and asks the NameNode where to write the file, the NameNode will tell the client which DataNodes can write the block. After writing a batch of data blocks, the client will return to the NameNode to obtain a new DataNode list, and write the next batch of data blocks to the DataNodes in the new list.
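To see this splitting in practice, the sketch below (assumed NameNode address and path) asks the NameNode for every block of a file and prints the DataNodes that hold its replicas.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocationsExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode-host:9000"); // assumed NameNode address
        FileSystem fs = FileSystem.get(conf);

        // For each block of the file, print its offset, length, and the
        // hosts (DataNodes) that store a replica of it.
        FileStatus status = fs.getFileStatus(new Path("/user/zhou/zhou.log"));
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.println("offset=" + block.getOffset()
                    + " length=" + block.getLength()
                    + " hosts=" + String.join(", ", block.getHosts()));
        }

        fs.close();
    }
}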
