Hadoop Series (1): HDFS Distributed File System

1. Introduction

HDFS (Hadoop Distributed File System) is Hadoop's distributed file system. It offers high fault tolerance and high throughput, and can be deployed on low-cost hardware.

2. HDFS Design Principles


2.1 HDFS Architecture

HDFS follows a master/slave architecture, consisting of a single NameNode (NN) and multiple DataNodes (DN):

  • NameNode: executes file system namespace operations, such as opening, closing, and renaming files and directories. It also stores the cluster's metadata, recording the location of every data block of every file.
  • DataNode: serves read and write requests from file system clients, and performs block creation, deletion, and similar operations (see the sketch after this list).
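
As a minimal illustration of this division of labor, the Java sketch below (the NameNode URI and paths are hypothetical) asks the NameNode for a file's metadata, then streams the file's bytes, which come from DataNodes:

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class NameNodeVsDataNode {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical NameNode address.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);

        Path file = new Path("/user/demo/sample.txt"); // hypothetical path

        // Answered by the NameNode: namespace and block metadata only.
        FileStatus status = fs.getFileStatus(file);
        System.out.println("size=" + status.getLen()
                + ", replication=" + status.getReplication());

        // Served by DataNodes: the file contents themselves.
        try (FSDataInputStream in = fs.open(file)) {
            byte[] buf = new byte[4096];
            int n = in.read(buf);
            System.out.println("read " + n + " bytes");
        }
    }
}
```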

2.2 File System Namespace

HDFS's file system namespace hierarchy is similar to that of most file systems (such as Linux): it supports creating, moving, deleting, and renaming files and directories, and supports configuring users and access permissions, but it does not support hard links or soft links. The NameNode maintains the file system namespace and records any change to the namespace or its properties.
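
These namespace operations map directly onto the public FileSystem API. A minimal sketch (all paths are hypothetical):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class NamespaceOps {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        Path dir = new Path("/user/demo/input");      // hypothetical paths
        Path moved = new Path("/user/demo/archive");

        fs.mkdirs(dir);                               // create a directory
        fs.createNewFile(new Path(dir, "a.txt"));     // create a file
        fs.rename(dir, moved);                        // move / rename
        fs.delete(moved, true);                       // recursive delete

        fs.close();
    }
}
```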

2.3 Data Replication

Since Hadoop is designed to run on low-cost machines, where hardware failure is the norm rather than the exception, HDFS provides a data replication mechanism to ensure fault tolerance. Each HDFS file is stored as a series of blocks, and each block has multiple replicas. Both the block size and the replication factor are configurable (by default, the block size is 128 MB and the replication factor is 3).

(figure: a file stored as blocks, each block replicated across multiple DataNodes)
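
A hedged sketch of how these two knobs appear to clients, via the standard dfs.blocksize and dfs.replication properties (the path is hypothetical):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationConfig {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Cluster-wide defaults normally live in hdfs-site.xml;
        // overridden here per client only for illustration.
        conf.setLong("dfs.blocksize", 128L * 1024 * 1024); // 128 MB blocks
        conf.setInt("dfs.replication", 3);                 // 3 replicas

        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/user/demo/big.dat"); // hypothetical path

        // Replication and block size can also be chosen per file ...
        try (FSDataOutputStream out = fs.create(file, true, 4096,
                (short) 2, 64L * 1024 * 1024)) { // 2 replicas, 64 MB blocks
            out.writeUTF("hello hdfs");
        }
        // ... or changed after the fact.
        fs.setReplication(file, (short) 3);
    }
}
```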

2.4 Data Replication Implementation

A large HDFS instance typically spans multiple racks of servers, and two servers on different racks communicate through switches. In most cases, the network bandwidth between servers in the same rack is greater than that between servers in different racks. HDFS therefore uses a rack-aware replica placement policy. For the common case of a replication factor of 3, HDFS places replicas as follows:

If the writer is running on a DataNode, the first replica of the file is placed on that DataNode; otherwise it is placed on a random DataNode. The second replica is placed on an arbitrary node of a different, remote rack, and the last replica on another node of that same remote rack. This policy reduces inter-rack write traffic and thereby improves write performance.

<div align="center"> <img src="https://raw.githubusercontent.com/heibaiying/BigData-Notes/master/pictures/hdfs-机架.png"/> </div>
If the replication factor is greater than 3, the placement of the 4th and subsequent replicas is determined randomly, subject to keeping the number of replicas per rack below an upper limit, usually (replication factor - 1) / number of racks + 2, computed with integer division (for example, with a replication factor of 5 across 3 racks, the cap is (5 - 1)/3 + 2 = 3). Note that multiple replicas of the same block are never allowed on the same DataNode.
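
A small sketch of that per-rack cap, using integer arithmetic exactly as in the formula above:

```java
public class RackCap {
    // Upper bound on replicas of one block per rack:
    // (replication - 1) / racks + 2, with integer division.
    static int maxReplicasPerRack(int replication, int racks) {
        return (replication - 1) / racks + 2;
    }

    public static void main(String[] args) {
        System.out.println(maxReplicasPerRack(3, 2));  // 3
        System.out.println(maxReplicasPerRack(5, 3));  // 3
        System.out.println(maxReplicasPerRack(10, 4)); // 4
    }
}
```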

2.5 Replica Selection

To minimize bandwidth consumption and read latency, HDFS serves a read request from the replica closest to the reader. If a replica exists on the same rack as the reader node, that replica is preferred; if the HDFS cluster spans multiple data centers, a replica in the local data center is preferred.
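
Clients can observe which hosts hold each block's replicas through the public getFileBlockLocations call; a minimal sketch (the path is hypothetical):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicaLocations {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/user/demo/big.dat"); // hypothetical path

        FileStatus status = fs.getFileStatus(file);
        BlockLocation[] blocks =
                fs.getFileBlockLocations(status, 0, status.getLen());

        for (BlockLocation block : blocks) {
            // Hosts holding replicas of this block; when reading, the
            // client picks the "closest" of these copies.
            System.out.println("offset=" + block.getOffset()
                    + " hosts=" + String.join(",", block.getHosts()));
        }
    }
}
```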

2.6 Architectural Stability

1. Heartbeats and Re-replication

Each DataNode periodically sends a heartbeat message to the NameNode; if no heartbeat arrives within a specified time, the DataNode is marked dead. The NameNode stops forwarding new IO requests to DataNodes marked dead and no longer uses the data stored on them. Because that data becomes unavailable, some blocks may drop below their specified replication factor; the NameNode tracks such blocks and re-replicates them when necessary.
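
The two intervals involved are exposed as the configuration keys dfs.heartbeat.interval and dfs.namenode.heartbeat.recheck-interval. The dead-node timeout commonly cited for the defaults works out as in this small sketch (the 2 * recheck + 10 * heartbeat formula reflects HDFS's documented behavior; the sketch is plain arithmetic, not HDFS code):

```java
public class HeartbeatTimeout {
    public static void main(String[] args) {
        long heartbeatIntervalMs = 3_000;   // dfs.heartbeat.interval (default 3 s)
        long recheckIntervalMs = 300_000;   // dfs.namenode.heartbeat.recheck-interval (default 5 min)

        // A DataNode is considered dead after roughly:
        long deadNodeTimeoutMs = 2 * recheckIntervalMs + 10 * heartbeatIntervalMs;
        System.out.println("marked dead after ~"
                + deadNodeTimeoutMs / 1000 + " s"); // 630 s, i.e. 10.5 min
    }
}
```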

2. Data Integrity

Blocks stored on DataNodes can become corrupted, for example because of storage device failures. To avoid errors caused by reading corrupted data, HDFS provides a checksum mechanism to guarantee data integrity. It works as follows:

When a client creates an HDFS file, it computes a checksum for every block of the file and stores the checksums in a separate hidden file in the same HDFS namespace. When a client retrieves the file's contents, it verifies that the data received from each DataNode matches the checksum stored in the associated checksum file. If the check fails, the data is corrupted, and the client fetches a different available replica of that block from another DataNode.
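
As an illustration only (this is not HDFS's internal code), the sketch below mimics the idea: checksum every fixed-size chunk of a block with CRC32C, the algorithm HDFS uses by default, with a 512-byte chunk size matching the usual io.bytes.per.checksum default:

```java
import java.util.zip.CRC32C; // Java 9+
import java.util.zip.Checksum;

public class ChunkChecksums {
    // Compute one CRC32C checksum per bytesPerChecksum-sized chunk.
    static long[] checksums(byte[] data, int bytesPerChecksum) {
        int chunks = (data.length + bytesPerChecksum - 1) / bytesPerChecksum;
        long[] sums = new long[chunks];
        for (int i = 0; i < chunks; i++) {
            int off = i * bytesPerChecksum;
            int len = Math.min(bytesPerChecksum, data.length - off);
            Checksum crc = new CRC32C();
            crc.update(data, off, len);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    public static void main(String[] args) {
        byte[] block = "example block contents".getBytes();
        long[] sums = checksums(block, 512);
        // On read, recomputing and comparing against the stored value
        // detects corruption; a mismatch triggers a read from another replica.
        System.out.println("chunk 0 checksum = " + Long.toHexString(sums[0]));
    }
}
```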

3. Metadata Disk Failures

The FsImage and EditLog are HDFS's core metadata; losing them unexpectedly could render the entire HDFS service unavailable. To avoid this, the NameNode can be configured to keep multiple synchronized copies of the FsImage and EditLog, so that any change to them triggers a synchronous update of every copy.
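
Concretely, this redundancy is configured by listing several storage directories, often including a remote NFS mount, under dfs.namenode.name.dir. This normally lives in hdfs-site.xml; the Java form and the paths below are illustrative:

```java
import org.apache.hadoop.conf.Configuration;

public class MetadataDirs {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Each comma-separated directory receives a synchronously updated
        // copy of the FsImage and EditLog, so a single disk failure
        // cannot destroy the metadata.
        conf.set("dfs.namenode.name.dir",
                "/data/1/dfs/nn,/data/2/dfs/nn,/mnt/nfs/dfs/nn"); // hypothetical paths
        System.out.println(conf.get("dfs.namenode.name.dir"));
    }
}
```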

4. Snapshot Support

A snapshot stores a copy of the data at a particular point in time; if data is accidentally corrupted, a rollback can restore it to a healthy state.
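
A minimal sketch of the snapshot workflow via the Java API (the directory is hypothetical; allowSnapshot assumes an actual HDFS filesystem and administrator privileges):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class SnapshotDemo {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path dir = new Path("/user/demo/important"); // hypothetical path

        // An administrator must first allow snapshots on the directory
        // (equivalent to: hdfs dfsadmin -allowSnapshot <dir>).
        ((DistributedFileSystem) fs).allowSnapshot(dir);

        // Record the directory's state at this moment.
        Path snapshot = fs.createSnapshot(dir, "before-cleanup");
        System.out.println("created " + snapshot);

        // After accidental damage, files can be copied back out of
        // <dir>/.snapshot/before-cleanup to roll back.
    }
}
```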

3. Features of HDFS

3.1 High Fault Tolerance

Because HDFS keeps multiple replicas of all data, the failure of part of the hardware does not lead to the loss of all data.

3.2 High Throughput

HDFS is designed above all for high-throughput data access, rather than low-latency data access.

3.3 Large File Support

HDFS is suited to storing large files, on the order of gigabytes to terabytes in size.

3.4 Simple Consistency Model

HDFS is best suited to a write-once-read-many access model. It supports appending to the end of a file, but does not support random writes: data cannot be modified at an arbitrary position within a file.
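
A short sketch of this model (the path is hypothetical): a file is created once and can be appended to, but the API offers no call for rewriting bytes in place:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AppendOnly {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path log = new Path("/user/demo/app.log"); // hypothetical path

        // Write once ...
        try (FSDataOutputStream out = fs.create(log, true)) {
            out.writeBytes("first record\n");
        }
        // ... then append (requires an HDFS version with append enabled).
        try (FSDataOutputStream out = fs.append(log)) {
            out.writeBytes("second record\n");
        }
        // There is no API for overwriting bytes in the middle of an
        // existing file; in-place updates are not part of the model.
    }
}
```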

3.5 Cross-Platform Portability

HDFS has good cross-platform portability, which makes it the preferred choice for persistent data storage in other big data computing frameworks.

Appendix: HDFS Storage Principles Illustrated

Note: the images below are taken from the blog post "A Translation of the Classic Comic Explaining HDFS" (see reference 3).

1. The HDFS Write Process

(comic panels illustrating the HDFS write process)

2. The HDFS Read Process

(comic panels illustrating the HDFS read process)

3. HDFS Fault Types and Their Detection

(comic panels: Part 1, fault types and their detection)

Part 2: Read and Write Failures

(comic panels illustrating read and write failure handling)

Part 3: Handling DataNode Failures

(comic panels illustrating DataNode failure handling)

Replica placement strategy:

(comic panels illustrating the replica placement strategy)

References

  1. Apache Hadoop 2.9.2 documentation: HDFS Architecture
  2. Tom White. Hadoop: The Definitive Guide. Tsinghua University Press, 2017.
  3. A Translation of the Classic Comic Explaining HDFS

More articles in this big data series can be found in the GitHub open-source project: Big Data Getting Started.
