NameNode times out writing to the JournalNodes, causing the NameNode to go down

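Checking the NameNode status showed that only one of the two NameNodes was still running, so I went to the one that had died and looked at its log: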

2016-08-09 16:33:51,526 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 6001 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [10.80.248.17:8486]
2016-08-09 16:33:52,169 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds
2016-08-09 16:33:52,526 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 7002 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [10.80.248.17:8486]
2016-08-09 16:33:53,527 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 8003 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [10.80.248.17:8486]

...

2016-08-09 16:34:04,541 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 19016 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [10.80.248.17:8486]
2016-08-09 16:34:05,525 FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: flush failed for required journal (JournalAndStream(mgr=QJM to [10.80.248.17:8486, 10.80.248.18:8486, 10.80.248.19:8486], stream=QuorumOutputStream starting at txid 2947))
java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to respond.
    ... org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:139)
    at org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:11214)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
2016-08-09 16:34:05,526 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Aborting QuorumOutputStream starting at txid 2947
2016-08-09 16:34:05,600 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2016-08-09 16:34:05,733 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG: 

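The lines above are the key entries from hadoop-hadooptest-namenode-ut07.log at the moment the NameNode exited. They show that the NameNode timed out while writing edits to the JournalNodes: only one of the three JournalNodes (10.80.248.17:8486) acknowledged sendEdits, which is short of the majority (2 of 3) that a quorum requires. The default timeout is 20 seconds (20000 ms). Once the timeout is hit, the NameNode calls the terminate method of the ExitUtil class, which ends the process through System.exit().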
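To make the quorum arithmetic concrete, here is a minimal Python sketch (not Hadoop code) that parses one of the `Waited ... ms` lines quoted above and checks whether the JournalNodes that responded form a majority. The helper names `parse_wait_line` and `has_quorum` are hypothetical and introduced only for illustration; the three JournalNode addresses are copied from the FATAL line in the log.

```python
import re

# Matches the QuorumJournalManager progress lines quoted in the log above.
WAIT_RE = re.compile(
    r"Waited (\d+) ms \(timeout=(\d+) ms\) for a response for (\w+)\. "
    r"Succeeded so far: \[(.*?)\]"
)

def parse_wait_line(line):
    """Return (waited_ms, timeout_ms, operation, responded_nodes), or None."""
    m = WAIT_RE.search(line)
    if m is None:
        return None
    nodes = [n.strip() for n in m.group(4).split(",") if n.strip()]
    return int(m.group(1)), int(m.group(2)), m.group(3), nodes

def has_quorum(responded, total):
    """A quorum is a strict majority of the JournalNodes."""
    return responded >= total // 2 + 1

# Addresses taken from the "QJM to [...]" part of the FATAL line.
journal_nodes = ["10.80.248.17:8486", "10.80.248.18:8486", "10.80.248.19:8486"]
line = ("2016-08-09 16:34:04,541 WARN org.apache.hadoop.hdfs.qjournal.client."
        "QuorumJournalManager: Waited 19016 ms (timeout=20000 ms) for a response "
        "for sendEdits. Succeeded so far: [10.80.248.17:8486]")

waited, timeout, op, responded = parse_wait_line(line)
missing = [n for n in journal_nodes if n not in responded]
print(op, waited, timeout, has_quorum(len(responded), len(journal_nodes)), missing)
```

The `missing` list points at the JournalNodes worth inspecting first (network, disk, or GC pauses on 10.80.248.18 and 10.80.248.19 in this case), since a single acknowledgement out of three can never satisfy the quorum.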

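To give the JournalNodes more time to respond, raise the write timeout to 60 seconds by adding the following property to hdfs-site.xml under hadoop/etc/hadoop: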

<property>
        <name>dfs.qjournal.write-txns.timeout.ms</name>
        <value>60000</value>
</property>
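As a quick sanity check after editing the file, the following Python sketch reads a property back out of Hadoop-style configuration XML. The helper `get_property` is a hypothetical name, and the sample string simply wraps the property above in a `<configuration>` root element, on the assumption that your hdfs-site.xml has the usual layout.

```python
import xml.etree.ElementTree as ET

def get_property(xml_text, name):
    """Return the value of the named property, or None if it is absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            return prop.findtext("value", default="").strip()
    return None

# Sample content; in practice, read the text of your own hdfs-site.xml instead.
sample = """<configuration>
<property>
        <name>dfs.qjournal.write-txns.timeout.ms</name>
        <value>60000</value>
</property>
</configuration>"""

timeout_ms = get_property(sample, "dfs.qjournal.write-txns.timeout.ms")
print(timeout_ms)
```

A result of `None` would mean the property is missing from the file, in which case the NameNode keeps using the 20000 ms default seen in the log.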

Finally, remember to restart the whole cluster so that the new setting takes effect.

A friendly reminder: if you use Flume, remember to restart the Flume cluster as well.

https://www.cnblogs.com/xyliao/p/5755438.html

Reposted from blog.csdn.net/yangbosos/article/details/88706808