Introduction to HDFS and a Review of Key Knowledge Points

Distributed File System HDFS

Introduction to Distributed File System HDFS
The Hadoop Distributed File System (HDFS) is a distributed file system with high reliability, high fault tolerance, and high throughput. It is used to store and manage large-scale data sets and provides high-performance data reads and writes.
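
To make the read/write path concrete, below is a minimal sketch using Hadoop's Java FileSystem API. The NameNode address hdfs://localhost:9000 and the path /tmp/hdfs-demo.txt are illustrative assumptions, not values taken from this article.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.net.URI;
import java.nio.charset.StandardCharsets;

public class HdfsReadWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumed NameNode address; replace with your cluster's fs.defaultFS.
        FileSystem fs = FileSystem.get(new URI("hdfs://localhost:9000"), conf);

        Path file = new Path("/tmp/hdfs-demo.txt"); // hypothetical path

        // Write a small file; HDFS splits larger files into blocks automatically.
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
        }

        // Read it back as a stream.
        try (FSDataInputStream in = fs.open(file)) {
            byte[] buf = new byte[16];
            int n = in.read(buf);
            System.out.println(new String(buf, 0, n, StandardCharsets.UTF_8));
        }

        fs.close();
    }
}
```

Since fs.create and fs.open return ordinary Java streams, existing I/O code can usually be reused with little change.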

Architecture

HDFS consists of two kinds of nodes: a NameNode and DataNodes. The NameNode is the master node; it manages the namespace and directory structure of the entire file system. DataNodes are the worker nodes responsible for storing the actual data blocks. In HDFS, each file is split into multiple data blocks, and each block is replicated to several DataNodes to improve fault tolerance and availability.
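
The block and replication model can be inspected directly from client code. The sketch below (assuming the same hypothetical cluster address and an existing file path) asks the NameNode for a file's block size, replication factor, and the DataNodes holding each block.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.net.URI;

public class HdfsBlockInfoExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(new URI("hdfs://localhost:9000"), conf);

        Path file = new Path("/tmp/some-large-file.dat"); // assumed existing file
        FileStatus status = fs.getFileStatus(file);

        System.out.println("Block size:  " + status.getBlockSize());
        System.out.println("Replication: " + status.getReplication());

        // Each BlockLocation corresponds to one block of the file and lists
        // the DataNodes that hold a replica of that block.
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.println("offset=" + block.getOffset()
                    + " length=" + block.getLength()
                    + " hosts=" + String.join(",", block.getHosts()));
        }

        fs.close();
    }
}
```

The command line offers a similar view via hdfs fsck <path> -files -blocks -locations.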

Features

HDFS has the following characteristics:

High reliability: HDFS improves data reliability by replicating each data block to multiple DataNodes. Even if a server fails, the data can still be recovered from replicas on other nodes (a small replication example follows this feature list).

High fault tolerance: HDFS was designed with hardware failures and software errors in mind, so it can keep operating when certain failures occur. For example, when the NameNode fails, HDFS can restore service and recover through a standby node.

High throughput: HDFS is optimized for large data sets and provides high-performance data reads and writes. It also supports concurrent access and data-access locality, which further improve overall system throughput.

Suited to applications with large data sets: HDFS is designed to handle massive amounts of data and can manage data sets containing millions or even billions of files.

Data locality: HDFS makes it possible to move computation to the nodes where the data resides, rather than moving data to the computation as in traditional models. This lets large-scale data processing tasks complete faster.

Simplicity: one of HDFS's design goals is to be easy to deploy and use. Users only need to understand simple file system operations, and HDFS is easy to integrate with other systems in practice.
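
As mentioned under "High reliability", replication is the core mechanism behind fault recovery. Below is a minimal sketch of adjusting a file's replication factor through the Java API; the file path and the target factor of 3 are assumptions for illustration.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.net.URI;

public class HdfsReplicationExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(new URI("hdfs://localhost:9000"), conf);

        Path file = new Path("/tmp/hdfs-demo.txt"); // assumed existing file

        // Request that this file be kept with 3 replicas; the NameNode will
        // schedule additional copies on other DataNodes in the background.
        boolean requested = fs.setReplication(file, (short) 3);
        System.out.println("Replication change requested: " + requested);

        // The target replication factor is part of the file's metadata.
        short current = fs.getFileStatus(file).getReplication();
        System.out.println("Target replication factor: " + current);

        fs.close();
    }
}
```

Note that setReplication only requests the change; the extra replicas are created asynchronously by the NameNode.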

Overall, with its fault tolerance, high throughput, efficiency, scalability, and simplicity, HDFS is one of the core components of big data storage and processing.
Well, that's all for this post!

Original post: blog.csdn.net/weixin_48676558/article/details/130694257