【Hadoop】Hadoop HA切换NameNode报错

【问题描述】

今天在配置Hadoop高可用的时候,发生了如下玄学:当我手动杀死Active的NameNode的时候,作为Standby的NameNode并没有自动的转换为Active。这就非常邪乎了,于是我查看了对应的zkfc的日志信息,发现了如下的错误:

java.net.ConnectException: Call From slave01/192.168.0.83 to master:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
	at sun.reflect.GeneratedConstructorAccessor27.newInstance(Unknown Source)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
	at org.apache.hadoop.ipc.Client.call(Client.java:1479)
	at org.apache.hadoop.ipc.Client.call(Client.java:1412)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
	at com.sun.proxy.$Proxy9.transitionToStandby(Unknown Source)
	at org.apache.hadoop.ha.protocolPB.HAServiceProtocolClientSideTranslatorPB.transitionToStandby(HAServiceProtocolClientSideTranslatorPB.java:112)
	at org.apache.hadoop.ha.FailoverController.tryGracefulFence(FailoverController.java:172)
	at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:514)
	at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:505)
	at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:61)
	at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:892)
	at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:910)
	at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:809)
	at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:418)
	at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
Caused by: java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
	at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
	at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
	at org.apache.hadoop.ipc.Client.call(Client.java:1451)
	... 14 more
2020-12-07 07:47:36,966 INFO org.apache.hadoop.ha.NodeFencer: ====== Beginning Service Fencing Process... ======
2020-12-07 07:47:36,966 INFO org.apache.hadoop.ha.NodeFencer: Trying method 1/1: org.apache.hadoop.ha.SshFenceByTcpPort(null)
2020-12-07 07:47:36,967 INFO org.apache.hadoop.ha.SshFenceByTcpPort: Connecting to master...
2020-12-07 07:47:36,967 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Connecting to master port 22
2020-12-07 07:47:36,969 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Connection established
2020-12-07 07:47:36,986 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Remote version string: SSH-2.0-OpenSSH_7.4
2020-12-07 07:47:36,987 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Local version string: SSH-2.0-JSCH-0.1.42
2020-12-07 07:47:36,987 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: CheckCiphers: aes256-ctr,aes192-ctr,aes128-ctr,aes256-cbc,aes192-cbc,aes128-cbc,3des-ctr,arcfour,arcfour128,arcfour256
2020-12-07 07:47:36,989 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes256-ctr is not available.
2020-12-07 07:47:36,989 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes192-ctr is not available.
2020-12-07 07:47:36,989 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes256-cbc is not available.
2020-12-07 07:47:36,989 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes192-cbc is not available.
2020-12-07 07:47:36,989 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: arcfour256 is not available.
2020-12-07 07:47:36,989 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXINIT sent
2020-12-07 07:47:36,989 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXINIT received
2020-12-07 07:47:36,990 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: server->client aes128-ctr hmac-sha1 none
2020-12-07 07:47:36,990 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: client->server aes128-ctr hmac-sha1 none
2020-12-07 07:47:36,994 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXDH_INIT sent
2020-12-07 07:47:36,994 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: expecting SSH_MSG_KEXDH_REPLY
2020-12-07 07:47:37,005 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: ssh_rsa_verify: signature true
2020-12-07 07:47:37,005 WARN org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Permanently added 'master' (RSA) to the list of known hosts.
2020-12-07 07:47:37,005 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_NEWKEYS sent
2020-12-07 07:47:37,005 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_NEWKEYS received
2020-12-07 07:47:37,006 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_SERVICE_REQUEST sent
2020-12-07 07:47:37,007 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_SERVICE_ACCEPT received
2020-12-07 07:47:37,008 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Authentications that can continue: gssapi-with-mic,publickey,keyboard-interactive,password
2020-12-07 07:47:37,008 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Next authentication method: gssapi-with-mic
2020-12-07 07:47:37,013 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Authentications that can continue: publickey,keyboard-interactive,password
2020-12-07 07:47:37,013 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Next authentication method: publickey
2020-12-07 07:47:37,075 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Authentication succeeded (publickey).
2020-12-07 07:47:37,076 INFO org.apache.hadoop.ha.SshFenceByTcpPort: Connected to master
2020-12-07 07:47:37,076 INFO org.apache.hadoop.ha.SshFenceByTcpPort: Looking for process running on port 9000
2020-12-07 07:47:37,198 WARN org.apache.hadoop.ha.SshFenceByTcpPort: PATH=$PATH:/sbin:/usr/sbin fuser -v -k -n tcp 9000 via ssh: bash: fuser: command not found
2020-12-07 07:47:37,199 INFO org.apache.hadoop.ha.SshFenceByTcpPort: rc: 127
2020-12-07 07:47:37,199 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Disconnecting from master port 22
2020-12-07 07:47:37,199 WARN org.apache.hadoop.ha.NodeFencer: Fencing method org.apache.hadoop.ha.SshFenceByTcpPort(null) was unsuccessful.
2020-12-07 07:47:37,199 ERROR org.apache.hadoop.ha.NodeFencer: Unable to fence service by any configured method.
2020-12-07 07:47:37,199 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Caught an exception, leaving main loop due to Socket closed
2020-12-07 07:47:37,200 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
java.lang.RuntimeException: Unable to fence NameNode at master/192.168.0.82:9000
	at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:533)
	at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:505)
	at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:61)
	at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:892)
	at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:910)
	at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:809)
	at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:418)
	at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
2020-12-07 07:47:37,200 INFO org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session
2020-12-07 07:47:37,204 INFO org.apache.zookeeper.ZooKeeper: Session: 0x3763a7135750008 closed
2020-12-07 07:47:38,205 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=master:2181,slave01:2181,slave02:2181 sessionTimeout=5000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@27db1ce7
2020-12-07 07:47:38,207 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server slave02/192.168.0.84:2181. Will not attempt to authenticate using SASL (unknown error)

于是,我又重新检查了hosts文件已经9000端口等等,但是都没有发现问题……

【解决办法】

后来发现是未找到fuster程序,导致无法进行fence,psmisc软件包中包含了fuster程序,需要在包含有NameNode的机器上执行如下命令:

yum -y install psmisc

安装完成后,再次测试,发现问题已经解决~

猜你喜欢

转载自blog.csdn.net/gdkyxy2013/article/details/110739824
今日推荐