The four mechanisms of HDFS

First, the heartbeat mechanism

1, the concept of the heartbeat report

Each datanode periodically sends a heartbeat report to the namenode. Its purpose is to tell the namenode that the datanode is still alive and how much space it has available. The default interval is 3 seconds.
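The interval is controlled by a property in hdfs-site.xml; a minimal sketch (3 seconds is the shipped default, so it only needs to be set when a different value is wanted):

<property>
  <name>dfs.heartbeat.interval</name>
  <value>3</value>  <!-- heartbeat interval, in seconds -->
</property>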

2, the specific functions of the heartbeat report

(1) Report the datanode's own liveness and available space to the namenode.

(2) Send block reports to the namenode, i.e. report the information about the blocks stored on each datanode to the namenode.
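For reference, in the configuration the full block report has its own interval, separate from the 3-second liveness heartbeat; a sketch of the hdfs-site.xml property (21600000 ms, i.e. 6 hours, is the usual default):

<property>
  <name>dfs.blockreport.intervalMsec</name>
  <value>21600000</value>  <!-- full block report interval: 6 hours -->
</property>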

3, where the namenode stores its metadata

(1) On disk

/home/hadoop/data/hadoopdata/name/current
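This path is simply wherever the namenode's metadata directory is configured to live. A sketch of the corresponding hdfs-site.xml entry, assuming the path used in this article (the current/ subdirectory holding the fsimage and edits files is created inside it):

<property>
  <name>dfs.namenode.name.dir</name>
  <value>/home/hadoop/data/hadoopdata/name</value>
</property>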

The metadata includes 3 parts:

A, the abstract directory tree

B, the correspondence between files and their blocks

C, the storage locations of the data blocks

(2) In memory

Metadata read and write operations are performed in real time in memory. At first, the metadata in memory contains only the abstract directory tree and the correspondence between files and blocks (e.g. /hadoop-2.7.6.tar.gz [blk01:[], blk02:[]]); it does not contain the storage locations of the blocks.

The block locations that users need when reading are obtained from the block information that the datanodes send to the namenode in their heartbeat reports (block reports), for example:

hdp01 blk01 blk02 ---> namenode

hdp03 blk01 blk02 ---> namenode

/hadoop-2.7.6.tar.gz [blk01:[hdp01,hdp03],blk02:[hdp01,hdp03]]

4, how the namenode confirms that a datanode is down

(1) The datanode stops sending heartbeat reports

By default, when 10 consecutive heartbeats are missed, i.e. after 10 * 3 = 30s, the namenode starts to suspect the datanode. The 10 misses must be consecutive; if a heartbeat is received in between, the count starts over.

(2) The namenode sends a check

After the namenode has missed 10 consecutive heartbeat reports from a datanode, it concludes that the datanode may be down and actively sends a check to that datanode, starting a background daemon (blocking) process that waits for the check result. The time the namenode allows for one check of a datanode is 5 minutes by default.

 

By default the check is performed twice, 5 minutes each (10 minutes in total); each check sends a probe and waits 5 minutes. Only when neither check gets a response is the datanode confirmed down. The total time the namenode needs to confirm that a datanode is down is therefore: 10 * 3s + 300s * 2 = 630s.
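The 5-minute check window is itself a configurable value. A sketch of the hdfs-site.xml property behind it (300000 ms, i.e. 5 minutes, is the default), together with the timeout formula it feeds into:

<property>
  <name>dfs.namenode.heartbeat.recheck-interval</name>
  <value>300000</value>  <!-- one recheck window, in milliseconds: 5 minutes -->
</property>
<!-- datanode dead-node timeout = 2 * recheck-interval + 10 * dfs.heartbeat.interval
     = 2 * 300s + 10 * 3s = 630s -->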

Second, the rack placement strategy

1, the replica storage strategy

This strategy decides where the multiple replicas of each data block are stored. By default HDFS keeps 3 replicas of every block, and the replicas of a block are stored on different nodes.
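The default of 3 replicas comes from hdfs-site.xml and can be overridden per file; a minimal sketch:

<property>
  <name>dfs.replication</name>
  <value>3</value>  <!-- default number of replicas per block -->
</property>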

2, the server room

"U" is a unit of server size: a so-called "1U PC server" is a server whose form factor meets the EIA specification, with a thickness (height) of 4.445 cm.

3, the default placement

Assume the defaults: 3 replicas, 2 racks, 10 nodes. The placement strategy for the multiple replicas is:

(1) The first replica is placed on the node where the client runs.

Purpose: to prevent the upload of the replica from failing, and to guarantee to the greatest extent that the first replica is uploaded successfully.

If the client is not a node of the cluster, a node is chosen at random.

(2) The second replica is placed on any node in a rack different from the first replica.

Purpose: to ensure data safety, preventing data loss when a whole rack loses power or drops off the network.

(3) The third replica is placed on a different node in the same rack as the second replica.

Purpose: to make transfers convenient and improve the efficiency of data transmission.

4, in actual production

(1) Multiple racks

There may be 3, 4, 5, 6, 7, ... racks.

The replica placement is adjusted accordingly: with 3 replicas, each replica is stored on a different rack.

(2) Multiple machine rooms

The replicas of the data are stored in different machine rooms: different rooms, different racks, different nodes.

The rack placement policy must be configured; without configuration, by default the different replicas are simply stored on different nodes.
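Rack awareness is configured by pointing Hadoop at a topology script that maps each node's IP or hostname to a rack id. A minimal sketch of the core-site.xml entry; the script path here is only an example:

<property>
  <name>net.topology.script.file.name</name>
  <value>/home/hadoop/apps/hadoop/etc/hadoop/rack-topology.sh</value>
  <!-- the script receives IPs/hostnames and prints a rack id such as /rack01 for each -->
</property>

Without this property every node is reported as belonging to the same default rack, which is why an unconfigured cluster only spreads replicas across different nodes.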

Third, load balancing

1, the concept

A hadoop cluster has multiple datanodes. Load balancing means that the data stored on each datanode takes up a comparable proportion of that datanode's capacity; the proportion is relative to each datanode's hardware. For example:

Total data stored in the cluster: 60g

node     capacity   used   usage
hdp01    50g        20g    40%
hdp02    50g        20g    40%
hdp03    50g        20g    40%

2, default (automatic) load balancing

The namenode periodically checks the load of the datanodes in the cluster. If it finds that the load of the datanode nodes is out of proportion, it starts automatic load balancing, moving data from datanodes with a high storage proportion to datanodes with a low storage proportion. For example:

node     capacity   used   usage
hdp01    50g        40g    80%
hdp02    50g        20g    40%
hdp03    50g        0g     0%

Blocks are moved from hdp01 to hdp03 via network transfer at the bottom layer (because the transfer is between different nodes): the data on hdp01 is transmitted over the network to hdp03 and then deleted from hdp01.

By default the bandwidth used for this balancing traffic is small: 1 MB/s.
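That limit comes from an hdfs-site.xml property (1048576 bytes/s, i.e. 1 MB/s, is the default in this Hadoop 2.x era):

<property>
  <name>dfs.datanode.balance.bandwidthPerSec</name>
  <value>1048576</value>  <!-- max bandwidth per datanode used for balancing, in bytes/s -->
</property>

When balancing is run manually, this value is usually raised; it can also be changed at runtime with hdfs dfsadmin -setBalancerBandwidth &lt;bytes per second&gt;.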

3, manual load balancing

If the cluster has few nodes, default load balancing has no problem keeping up (there is little data to transmit). If the cluster has many nodes, default load balancing can hardly meet the demand (there is much more data to transmit), and manual load balancing must be started: start-balancer.sh -t 10%.

(1) This command is not executed immediately; like garbage collection in the JVM, it reminds the cluster to run load balancing as soon as it is idle, to improve efficiency.

(2) -t 10%

This specifies the condition for load balancing to finally stop. There is no absolute load balancing, only relative: what we call balanced is that the difference in usage proportions stays within an acceptable range. The -t parameter specifies the maximum allowed difference between the node with the largest storage proportion and the node with the smallest. For example:

node     capacity   usage
hdp01    50g        42%
hdp02    50g        40%
hdp03    50g        38%

With -t 10%: 42% - 38% = 4% < 10%, so the cluster is considered load balanced.

(3) When this command is run manually, it is usually combined with adjusting the balancing bandwidth: to finish load balancing as quickly as possible during idle time, the bandwidth is generally turned up a little.

Fourth, Safe Mode

1, the concept

Safe mode is a self-protection mode of the cluster. While the cluster is in safe mode, users are not allowed to perform operations that modify the cluster's metadata.

2, entering safe mode

(1) When the cluster starts

The startup sequence of the cluster is:

namenode ---> datanode ---> secondarynamenode. During a cluster restart, the cluster stays in safe mode while the namenode and datanodes are starting.

A, starting the namenode

HDFS metadata is stored in two places. On disk it is stored persistently: the abstract directory tree, the correspondence between files and blocks, and the storage locations of the data blocks. The copy in memory cannot be persisted: it is completely cleared when the cluster shuts down, and when the cluster starts it is reloaded from disk, loading the abstract directory tree and the correspondence between files and blocks.

Before HDFS starts there is no metadata information in memory. When the namenode starts, it loads the metadata from the disk copy into memory; to load quickly, it loads only the abstract directory tree and the correspondence between files and blocks.

B, starting the datanodes

As soon as each datanode finishes starting, it immediately sends a heartbeat with its block report to the namenode. The namenode receives the datanodes' heartbeat reports, aggregates the block reports, and fills in the storage node of every replica of each block, i.e. the block storage locations.

C, starting the secondarynamenode

When the secondarynamenode finishes starting, it sends a heartbeat to the namenode.

During cluster startup, while steps A and B are being executed, the namenode's metadata is still being completed. At this time the cluster cannot provide service to the outside and is in the self-protection state: safe mode.

(2) While the cluster is running

The cluster re-enters safe mode when any of the following occurs:

the block report rate of the cluster falls below 99.9%

the number of reporting datanodes falls below the configured value

the available space of the namenode's metadata storage directory falls below 100 MB

3, leaving safe mode

While the cluster is in safe mode, the following checks are performed on the metadata-related information:

(1) The minimum number of replicas of each data block: as long as one replica is available, the block counts as available.

(2) The report rate of available data blocks: 99.9% of the data blocks in the cluster must be available (each data block only needs 1 reported replica).

(3) The minimum number of available datanodes, 0 by default.

(4) The time to stay in safe mode (the delay required before the system exits safe mode), 30 seconds by default.

(5) The available size of the metadata storage folder, 100 MB by default.

The cluster leaves safe mode only when the following requirements are met:

(1) The block report rate reaches >= 99.9% (each data block has at least 1 reported replica).

(2) The number of datanodes reaches the configured requirement, 0 by default.

(3) Once the two conditions above are both satisfied, the cluster stays in safe mode for another 30 s, to make sure the block reports of the cluster are stable.

(4) Each of the namenode's metadata storage folders has more than 100 MB of reserved (available) space.

Only when all 4 conditions above are satisfied at the same time does the cluster exit safe mode.
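Roughly speaking, these checks map onto the following hdfs-site.xml properties; the values shown are the usual defaults, given here only as a reference sketch:

<property>
  <name>dfs.namenode.replication.min</name>
  <value>1</value>           <!-- minimum replicas for a block to count as reported -->
</property>
<property>
  <name>dfs.namenode.safemode.threshold-pct</name>
  <value>0.999f</value>      <!-- fraction of blocks that must be reported -->
</property>
<property>
  <name>dfs.namenode.safemode.min.datanodes</name>
  <value>0</value>           <!-- minimum number of live datanodes -->
</property>
<property>
  <name>dfs.namenode.safemode.extension</name>
  <value>30000</value>       <!-- stay in safe mode this long (ms) after thresholds are met -->
</property>
<property>
  <name>dfs.namenode.resource.du.reserved</name>
  <value>104857600</value>   <!-- required free space (bytes) in each metadata directory: 100 MB -->
</property>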

4, manually entering safe mode

Generally, when the cluster is upgraded (for example from hadoop2.7 to hadoop2.8), it is put into safe mode manually so that cluster maintenance can be performed.

hdfs dfsadmin -safemode get/leave/enter/wait (before hadoop2.0, hadoop could be used in place of hdfs)

(1) hdfs dfsadmin -safemode get: query the current safe mode status of the cluster

Safe mode is OFF

(2) hdfs dfsadmin -safemode enter: enter safe mode

Safe mode is ON

(3) hdfs dfsadmin -safemode leave: leave safe mode

(4) hdfs dfsadmin -safemode wait: wait until safe mode is turned off (for information only)

5, operations under safe mode

(1) Operations that can be executed

hadoop fs -ls /

hadoop fs -get /hadoop6 /home/hadoop/apps

hadoop fs -cat /hdfs-site.xml

hadoop fs -tail /hdfs-site.xml

These are query-related operations: any operation that does not modify the metadata can be executed (the metadata being: (1) the directory tree, (2) the file-to-block mapping, (3) the block locations).

(2) Operations that cannot be executed

Any operation that modifies the metadata cannot be executed, for example:

hadoop fs -put <local path> <hdfs path>

hadoop fs -mkdir /dd

hadoop fs -touchz /rr

hadoop fs -rm -r -f

....
