When a dataset outgrows the storage capacity of a single physical machine, it becomes necessary to partition it across a number of separate machines.
Filesystems that manage the storage across a network of machines are called distributed filesystems.
Because they are network based, all the complications of network programming kick in, making distributed filesystems more complex than regular disk filesystems.
For example, one of the biggest challenges is making the filesystem tolerate node failure without suffering data loss.
Hadoop comes with a distributed filesystem called HDFS, which stands for Hadoop Distributed Filesystem.
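To make the fault-tolerance requirement concrete: HDFS stores a file as fixed-size blocks and keeps several copies of each block on different machines, so losing one machine loses no data. The toy sketch below models that idea only; the node names, block size, and placement policy are illustrative assumptions, not HDFS's actual (rack-aware) placement algorithm.

```python
import hashlib

def place_blocks(file_bytes, block_size, nodes, replicas=3):
    """Split a byte string into fixed-size blocks and assign each block
    to `replicas` distinct nodes. A toy model of replicated block
    placement, not the real HDFS policy."""
    blocks = [file_bytes[i:i + block_size]
              for i in range(0, len(file_bytes), block_size)]
    placement = {}
    for idx in range(len(blocks)):
        # Deterministically pick `replicas` distinct nodes for this block.
        start = int(hashlib.md5(str(idx).encode()).hexdigest(), 16) % len(nodes)
        placement[idx] = [nodes[(start + r) % len(nodes)] for r in range(replicas)]
    return blocks, placement

data = b"x" * 1000
nodes = ["node1", "node2", "node3", "node4"]  # hypothetical cluster
blocks, placement = place_blocks(data, block_size=256, nodes=nodes)
# Each block lives on 3 distinct nodes, so any single node can fail
# and every block still has 2 surviving copies.
for copies in placement.values():
    assert len(set(copies)) == 3
```

Real HDFS defaults are a much larger block size (128 MB) and a replication factor of 3, with replica placement that also accounts for rack topology.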
1.1 The Design of HDFS
HDFS is a filesystem designed for storing very large files with streaming data access patterns, running on clusters of commodity hardware.