41. SecondaryNameNode Directory Structure and NameNode Failure Recovery

Copyright notice: This is an original post by the author. Reposting is welcome without prior permission; please credit the original link. Let's learn and improve together. https://blog.csdn.net/newbie_907486852/article/details/83214191

1. SecondaryNameNode Directory Structure

The SecondaryNameNode is an auxiliary background process that assists the NameNode: at regular intervals it takes a snapshot of the HDFS metadata by merging the NameNode's fsimage with its edit log into a fresh checkpoint.
Inspect the SecondaryNameNode's directory structure under /opt/module/hadoop-2.7.2/data/tmp/dfs/namesecondary/current:

edits_0000000000000000001-0000000000000000002
fsimage_0000000000000000002
fsimage_0000000000000000002.md5
VERSION

The SecondaryNameNode's namesecondary/current directory has the same layout as the primary NameNode's current directory.
Benefit: if the primary NameNode fails (and its data was not backed up in time), the metadata can be recovered from the SecondaryNameNode.
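Where the SecondaryNameNode keeps this checkpoint copy is controlled by `dfs.namenode.checkpoint.dir` in hdfs-site.xml; its default resolves to `${hadoop.tmp.dir}/dfs/namesecondary`, which is why the path above looks the way it does. A sketch of making it explicit (the path shown mirrors this tutorial's layout and is an example, not a required value):

```xml
<!-- hdfs-site.xml: where the SecondaryNameNode stores its checkpoints.
     Example path matching this tutorial's layout; adjust for your deployment. -->
<property>
  <name>dfs.namenode.checkpoint.dir</name>
  <value>/opt/module/hadoop-2.7.2/data/tmp/dfs/namesecondary</value>
</property>
```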

2. NameNode Failure Recovery, Method 1: Copy the SecondaryNameNode's Data into the NameNode's Data Directory

(1) Kill the NameNode process with kill -9:

[admin@hadoop14 current]$ jps
5268 DataNode
6409 Jps
5545 NodeManager
5131 NameNode
[admin@hadoop14 current]$ kill -9 5131
[admin@hadoop14 current]$ jps
5268 DataNode
5545 NodeManager
6442 Jps
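`jps` ships with the JDK and lists Java process IDs by class name, as shown above. The same lookup-then-kill pattern can be sketched with standard tools; this self-contained demo uses a `sleep` process as a stand-in for the NameNode JVM, since it assumes no running cluster:

```shell
# Stand-in for the NameNode JVM: a background process we can look up and kill.
sleep 300 &
target_pid=$!

# Resolve the PID by process table lookup, the way one would grep
# "NameNode" out of jps output and take the first column.
found_pid=$(ps -o pid= -p "$target_pid" | awk '{print $1}')

kill -9 "$found_pid"
wait "$found_pid" 2>/dev/null || true   # reap the killed child

# kill -0 probes for existence; it now fails because the process is gone.
if kill -0 "$target_pid" 2>/dev/null; then
  echo "still running"
else
  echo "killed"
fi
```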

(2) Delete the data stored by the NameNode (/opt/module/hadoop-2.7.2/data/tmp/dfs/name):

[admin@hadoop14 name]$ ll
total 8
drwxrwxr-x. 2 admin admin 4096 Oct 20 16:18 current
-rw-rw-r--. 1 admin admin   13 Oct 20 14:49 in_use.lock
[admin@hadoop14 name]$ rm -rf /opt/module/hadoop-2.7.2/data/tmp/dfs/name/*
[admin@hadoop14 name]$ ll
total 0

(3) Copy the SecondaryNameNode's data to the NameNode's original data directory:

[admin@hadoop16 namesecondary]$ rsync -rvl /opt/module/hadoop-2.7.2/data/tmp/dfs/namesecondary/* admin@hadoop14:/opt/module/hadoop-2.7.2/data/tmp/dfs/name
sending incremental file list
in_use.lock
current/
current/VERSION
current/edits_0000000000000000001-0000000000000000008
current/edits_0000000000000000009-0000000000000000010
current/fsimage_0000000000000000008
current/fsimage_0000000000000000008.md5
current/fsimage_0000000000000000010
current/fsimage_0000000000000000010.md5

sent 2440 bytes  received 168 bytes  1738.67 bytes/sec
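Note the trailing `/*` on the rsync source above: it copies the contents of namesecondary directly into name. Method 2 later in this post instead passes the bare directory, so namesecondary itself lands as a sibling of name. The distinction is easy to demonstrate locally with plain `cp` (a throwaway sketch; rsync treats these source paths the same way):

```shell
# Recreate both layouts in a throwaway directory tree.
tmp=$(mktemp -d)
mkdir -p "$tmp/namesecondary/current" "$tmp/name" "$tmp/dfs"
touch "$tmp/namesecondary/current/VERSION"

# Method 1 style: source ends in /* -> the contents land inside name/.
cp -r "$tmp/namesecondary/"* "$tmp/name/"

# Method 2 style: bare source directory -> namesecondary/ itself lands inside dfs/.
cp -r "$tmp/namesecondary" "$tmp/dfs/"

ls "$tmp/name"    # prints: current
ls "$tmp/dfs"     # prints: namesecondary
```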


(4) Restart the NameNode:

[admin@hadoop14 hadoop-2.7.2]$ sbin/hadoop-daemon.sh start namenode

Or restart HDFS entirely:

[admin@hadoop14 hadoop-2.7.2]$ sbin/stop-dfs.sh 
Stopping namenodes on [hadoop14]
hadoop14: no namenode to stop
hadoop15: stopping datanode
hadoop14: no datanode to stop
hadoop16: stopping datanode
Stopping secondary namenodes [hadoop16]
hadoop16: stopping secondarynamenode
[admin@hadoop14 hadoop-2.7.2]$ sbin/start-dfs.sh 
Starting namenodes on [hadoop14]
hadoop14: starting namenode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-admin-namenode-hadoop14.out
hadoop15: starting datanode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-admin-datanode-hadoop15.out
hadoop14: starting datanode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-admin-datanode-hadoop14.out
hadoop16: starting datanode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-admin-datanode-hadoop16.out
Starting secondary namenodes [hadoop16]
hadoop16: starting secondarynamenode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-admin-secondarynamenode-hadoop16.out

3. NameNode Failure Recovery, Method 2: Start the NameNode Daemon with the -importCheckpoint Option to Pull the SecondaryNameNode's Data into the NameNode Directory

(1) Delete the data stored by the NameNode (/opt/module/hadoop-2.7.2/data/tmp/dfs/name):

[admin@hadoop14 name]$ ll
total 8
drwxrwxr-x. 2 admin admin 4096 Oct 20 16:18 current
-rw-rw-r--. 1 admin admin   13 Oct 20 14:49 in_use.lock
[admin@hadoop14 name]$ rm -rf /opt/module/hadoop-2.7.2/data/tmp/dfs/name/*
[admin@hadoop14 name]$ ll
total 0

(2) Kill the NameNode process:

[admin@hadoop14 hadoop-2.7.2]$ jps 
7447 Jps
5545 NodeManager
7213 DataNode
7070 NameNode
[admin@hadoop14 hadoop-2.7.2]$ kill -9 7070

(3) Edit hdfs-site.xml:

[admin@hadoop14 hadoop-2.7.2]$ sudo vim etc/hadoop/hdfs-site.xml

Add:

<property>
  <name>dfs.namenode.checkpoint.period</name>
  <value>120</value>
</property>

<property>
  <name>dfs.namenode.name.dir</name>
  <value>/opt/module/hadoop-2.7.2/data/tmp/dfs/name</value>
</property>
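The value of `dfs.namenode.checkpoint.period` is in seconds, so the 120 above shortens the default one-hour interval for this experiment. Checkpointing can also be triggered by transaction count via `dfs.namenode.checkpoint.txns` (default 1000000); a hedged fragment with an illustrative value, not a recommendation:

```xml
<!-- Also roll a checkpoint once this many uncheckpointed transactions
     accumulate; 1000 here is only for experimentation. -->
<property>
  <name>dfs.namenode.checkpoint.txns</name>
  <value>1000</value>
</property>
```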

(4) If the SecondaryNameNode is not on the same host as the NameNode, copy the SecondaryNameNode's data directory (namesecondary) to the same level as the NameNode's data directory (name).

[admin@hadoop16 namesecondary]$ rsync -rvl /opt/module/hadoop-2.7.2/data/tmp/dfs/namesecondary admin@hadoop14:/opt/module/hadoop-2.7.2/data/tmp/dfs
sending incremental file list
namesecondary/
namesecondary/in_use.lock
namesecondary/current/
namesecondary/current/VERSION
namesecondary/current/edits_0000000000000000001-0000000000000000008
namesecondary/current/edits_0000000000000000009-0000000000000000010
namesecondary/current/edits_0000000000000000011-0000000000000000012
namesecondary/current/fsimage_0000000000000000010
namesecondary/current/fsimage_0000000000000000010.md5
namesecondary/current/fsimage_0000000000000000012
namesecondary/current/fsimage_0000000000000000012.md5

sent 2589 bytes  received 191 bytes  5560.00 bytes/sec
total size is 1820  speedup is 0.65

Result:

[admin@hadoop14 dfs]$ ll
total 8
drwx------. 3 admin admin 4096 Oct 20 14:49 data
drwxrwxr-x. 2 admin admin 4096 Oct 20 16:45 name
[admin@hadoop14 dfs]$ ll
total 12
drwx------. 3 admin admin 4096 Oct 20 14:49 data
drwxrwxr-x. 2 admin admin 4096 Oct 20 16:45 name
drwxrwxr-x. 3 admin admin 4096 Oct 20 16:55 namesecondary
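Before importing, it is worth confirming that each copied fsimage matches its .md5 sidecar. Hadoop writes the sidecar in a format that `md5sum -c` can read (an assumption worth verifying on your version); the check itself looks like this self-contained sketch, which fabricates a dummy fsimage rather than touching real cluster data:

```shell
# Self-contained sketch: fabricate an fsimage-like file plus an md5 sidecar,
# then verify them the way one would verify a freshly copied checkpoint.
tmp=$(mktemp -d)
cd "$tmp"
printf 'dummy fsimage bytes' > fsimage_0000000000000000012
md5sum fsimage_0000000000000000012 > fsimage_0000000000000000012.md5

# Exit status 0 (and "...: OK") means the image matches its recorded checksum.
md5sum -c fsimage_0000000000000000012.md5
```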

(5) Import the checkpoint data (let the import run for a while, then stop the process with Ctrl+C).

This operation copies the data in the namesecondary directory into the name directory.

[admin@hadoop14 hadoop-2.7.2]$ bin/hdfs namenode -importCheckpoint

(6) Start the NameNode:

[admin@hadoop14 hadoop-2.7.2]$ sbin/hadoop-daemon.sh start namenode
starting namenode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-admin-namenode-hadoop14.out

(7) If it complains that the file is locked, delete in_use.lock:

[admin@hadoop14 hadoop-2.7.2]$ rm -rf /opt/module/hadoop-2.7.2/data/tmp/dfs/namesecondary/in_use.lock
