[Hadoop] Hadoop HA switch NameNode error

【Problem Description】

When configuring Hadoop high availability today, the following metaphysics happened: When I manually killed the Active NameNode, the Standby NameNode did not automatically convert to Active. This is very wicked, so I checked the corresponding zkfc log information and found the following error:

java.net.ConnectException: Call From slave01/192.168.0.83 to master:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
	at sun.reflect.GeneratedConstructorAccessor27.newInstance(Unknown Source)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
	at org.apache.hadoop.ipc.Client.call(Client.java:1479)
	at org.apache.hadoop.ipc.Client.call(Client.java:1412)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
	at com.sun.proxy.$Proxy9.transitionToStandby(Unknown Source)
	at org.apache.hadoop.ha.protocolPB.HAServiceProtocolClientSideTranslatorPB.transitionToStandby(HAServiceProtocolClientSideTranslatorPB.java:112)
	at org.apache.hadoop.ha.FailoverController.tryGracefulFence(FailoverController.java:172)
	at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:514)
	at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:505)
	at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:61)
	at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:892)
	at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:910)
	at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:809)
	at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:418)
	at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
Caused by: java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
	at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
	at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
	at org.apache.hadoop.ipc.Client.call(Client.java:1451)
	... 14 more
2020-12-07 07:47:36,966 INFO org.apache.hadoop.ha.NodeFencer: ====== Beginning Service Fencing Process... ======
2020-12-07 07:47:36,966 INFO org.apache.hadoop.ha.NodeFencer: Trying method 1/1: org.apache.hadoop.ha.SshFenceByTcpPort(null)
2020-12-07 07:47:36,967 INFO org.apache.hadoop.ha.SshFenceByTcpPort: Connecting to master...
2020-12-07 07:47:36,967 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Connecting to master port 22
2020-12-07 07:47:36,969 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Connection established
2020-12-07 07:47:36,986 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Remote version string: SSH-2.0-OpenSSH_7.4
2020-12-07 07:47:36,987 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Local version string: SSH-2.0-JSCH-0.1.42
2020-12-07 07:47:36,987 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: CheckCiphers: aes256-ctr,aes192-ctr,aes128-ctr,aes256-cbc,aes192-cbc,aes128-cbc,3des-ctr,arcfour,arcfour128,arcfour256
2020-12-07 07:47:36,989 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes256-ctr is not available.
2020-12-07 07:47:36,989 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes192-ctr is not available.
2020-12-07 07:47:36,989 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes256-cbc is not available.
2020-12-07 07:47:36,989 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes192-cbc is not available.
2020-12-07 07:47:36,989 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: arcfour256 is not available.
2020-12-07 07:47:36,989 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXINIT sent
2020-12-07 07:47:36,989 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXINIT received
2020-12-07 07:47:36,990 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: server->client aes128-ctr hmac-sha1 none
2020-12-07 07:47:36,990 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: client->server aes128-ctr hmac-sha1 none
2020-12-07 07:47:36,994 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXDH_INIT sent
2020-12-07 07:47:36,994 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: expecting SSH_MSG_KEXDH_REPLY
2020-12-07 07:47:37,005 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: ssh_rsa_verify: signature true
2020-12-07 07:47:37,005 WARN org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Permanently added 'master' (RSA) to the list of known hosts.
2020-12-07 07:47:37,005 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_NEWKEYS sent
2020-12-07 07:47:37,005 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_NEWKEYS received
2020-12-07 07:47:37,006 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_SERVICE_REQUEST sent
2020-12-07 07:47:37,007 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_SERVICE_ACCEPT received
2020-12-07 07:47:37,008 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Authentications that can continue: gssapi-with-mic,publickey,keyboard-interactive,password
2020-12-07 07:47:37,008 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Next authentication method: gssapi-with-mic
2020-12-07 07:47:37,013 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Authentications that can continue: publickey,keyboard-interactive,password
2020-12-07 07:47:37,013 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Next authentication method: publickey
2020-12-07 07:47:37,075 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Authentication succeeded (publickey).
2020-12-07 07:47:37,076 INFO org.apache.hadoop.ha.SshFenceByTcpPort: Connected to master
2020-12-07 07:47:37,076 INFO org.apache.hadoop.ha.SshFenceByTcpPort: Looking for process running on port 9000
2020-12-07 07:47:37,198 WARN org.apache.hadoop.ha.SshFenceByTcpPort: PATH=$PATH:/sbin:/usr/sbin fuser -v -k -n tcp 9000 via ssh: bash: fuser: command not found
2020-12-07 07:47:37,199 INFO org.apache.hadoop.ha.SshFenceByTcpPort: rc: 127
2020-12-07 07:47:37,199 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Disconnecting from master port 22
2020-12-07 07:47:37,199 WARN org.apache.hadoop.ha.NodeFencer: Fencing method org.apache.hadoop.ha.SshFenceByTcpPort(null) was unsuccessful.
2020-12-07 07:47:37,199 ERROR org.apache.hadoop.ha.NodeFencer: Unable to fence service by any configured method.
2020-12-07 07:47:37,199 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Caught an exception, leaving main loop due to Socket closed
2020-12-07 07:47:37,200 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
java.lang.RuntimeException: Unable to fence NameNode at master/192.168.0.82:9000
	at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:533)
	at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:505)
	at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:61)
	at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:892)
	at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:910)
	at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:809)
	at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:418)
	at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
2020-12-07 07:47:37,200 INFO org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session
2020-12-07 07:47:37,204 INFO org.apache.zookeeper.ZooKeeper: Session: 0x3763a7135750008 closed
2020-12-07 07:47:38,205 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=master:2181,slave01:2181,slave02:2181 sessionTimeout=5000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@27db1ce7
2020-12-07 07:47:38,207 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server slave02/192.168.0.84:2181. Will not attempt to authenticate using SASL (unknown error)

So, I re-checked the hosts file has port 9000 and so on, but no problem was found...

【Solution】

Later, it was discovered that the fuster program was not found, which made it impossible to fence. The fuster program is included in the psmisc software package. The following commands need to be executed on the machine containing the NameNode:

yum -y install psmisc

After the installation is complete, test again and find that the problem has been solved~

Guess you like

Origin blog.csdn.net/gdkyxy2013/article/details/110739824