查看Namenode的状态,两台Namenode只剩下一台了,到挂的那台看日志
2016-08-09 16:33:51,526 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 6001 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [10.80.248.17:8486] 2016-08-09 16:33:52,169 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds 2016-08-09 16:33:52,526 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 7002 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [10.80.248.17:8486] 2016-08-09 16:33:53,527 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 8003 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [10.80.248.17:8486]
。。。。。。。。
2016-08-09 16:34:04,541 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 19016 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [10.80.248.17:8486] 2016-08-09 16:34:05,525 FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: flush failed for required journal (JournalAndStream(mgr=QJM to [10.80.248.17:8486, 10.80.248.18:8486, 10.80.248.19:8486], stream=QuorumOutputStream starting at txid 2947)) java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to respond. 。。。org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:139) at org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:11214) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033) 2016-08-09 16:34:05,526 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Aborting QuorumOutputStream starting at txid 2947 2016-08-09 16:34:05,600 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1 2016-08-09 16:34:05,733 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
以上是hadoop-hadooptest-namenode-ut07.log在Namenode退出时候的关键日志,可以从中发现,Namenode在写Journalnode发生了超时,默认的超时时间为20秒,而在超时发生后,Namenode会触发ExitUtil类的terminate 方法,导致进程的System.exit()
在hadoop/etc/hadoop下的hdfs-site.xml中,加入一组配置:
<property> <name>dfs.qjournal.write-txns.timeout.ms</name> <value>60000</value> </property>
最后记得重启整个集群,这样配置才能生效。
友情提示:使用了Flume的同学,记得也要重启Flume集群