hadoop HA 集群namenode无法自动切换为active

今天在学习HA自动化配置的时候,按照网上的教程去配置并启动HA即两台namenode后,看到一台为active模式,一台为standby模式,并且文件也可以正常写入。在做测试时,用 kill -9 端口号 杀死active的namenode后,standby的namenode并没有自动启动。
检查配置文件确保无误后,重启集群,发现依然无果。 查看logs->hadoop-root-zkfc-master.log,发现有如下报错信息:
2019-02-05 02:01:02,271 WARN org.apache.hadoop.ha.FailoverController: Unable to gracefully make NameNode at slave1/192.168.43.171:8020 standby (unable to connect)
java.net.ConnectException: Call From master/192.168.43.241 to slave1:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
        at sun.reflect.GeneratedConstructorAccessor28.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
        at org.apache.hadoop.ipc.Client.call(Client.java:1479)
        at org.apache.hadoop.ipc.Client.call(Client.java:1412)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
        at com.sun.proxy.$Proxy9.transitionToStandby(Unknown Source)
        at org.apache.hadoop.ha.protocolPB.HAServiceProtocolClientSideTranslatorPB.transitionToStandby(HAServiceProtocolClientSideTranslatorPB.java:112)
        at org.apache.hadoop.ha.FailoverController.tryGracefulFence(FailoverController.java:172)
        at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:514)
        at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:505)
        at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:61)
        at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:892)
        at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:910)
        at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:809)
        at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:418)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
Caused by: java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
        at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
        at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
        at org.apache.hadoop.ipc.Client.call(Client.java:1451)
        ... 14 more

原来是未找到fuster程序,导致无法进行fence,所以可以通过如下命令来安装,Psmisc软件包中包含了fuster程序:

    yum install psmisc

安装完毕后重启集群,问题解决。

猜你喜欢

转载自blog.csdn.net/qq_41725214/article/details/86765618