nodemanager启动后自动关闭

本文参考:http://www.aiuxian.com/article/p-2804803.html
hadoop/hbase/zk正常启动后,slave节点的nodemanager就会自动关闭。查看日志,
cd /data/yunva/hadoop-2.7.3/logs
tail -100f yarn-root-nodemanager-slave2.log
如下:
2019-03-29 14:57:49,460 INFO org.mortbay.log: Stopped [email protected]:8042
2019-03-29 14:57:49,561 INFO org.apache.hadoop.ipc.Server: Stopping server on 38952
2019-03-29 14:57:49,562 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2019-03-29 14:57:49,562 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 38952
2019-03-29 14:57:49,563 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl is interrupted. Exiting.
2019-03-29 14:57:49,583 INFO org.apache.hadoop.ipc.Server: Stopping server on 8040
2019-03-29 14:57:49,583 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 8040
2019-03-29 14:57:49,583 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2019-03-29 14:57:49,586 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Public cache exiting
2019-03-29 14:57:49,589 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping NodeManager metrics system...
2019-03-29 14:57:49,590 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NodeManager metrics system stopped.
2019-03-29 14:57:49,590 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NodeManager metrics system shutdown complete.
2019-03-29 14:57:49,590 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.ConnectException: Call From slave2/10.0.3.89 to 0.0.0.0:8031 failed on connection exception: java.net.ConnectException: 拒绝连接; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
    at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:203)
    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
    at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStart(NodeManager.java:272)
    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:496)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:543)
Caused by: java.net.ConnectException: Call From slave2/10.0.3.89 to 0.0.0.0:8031 failed on connection exception: java.net.ConnectException: 拒绝连接; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
    at sun.reflect.GeneratedConstructorAccessor20.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
    at org.apache.hadoop.ipc.Client.call(Client.java:1479)
    at org.apache.hadoop.ipc.Client.call(Client.java:1412)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
    at com.sun.proxy.$Proxy72.registerNodeManager(Unknown Source)
    at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:68)
    at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy73.registerNodeManager(Unknown Source)
    at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:270)
    at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:197)
    ... 6 more
Caused by: java.net.ConnectException: 拒绝连接
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
    at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
    at org.apache.hadoop.ipc.Client.call(Client.java:1451)
    ... 18 more
2019-03-29 14:57:49,594 INFO org.apache.hadoop.yarn.server.nodemanager.NodeManager: SHUTDOWN_MSG: 

分析:
通过检查配置文件知道8031端口出现在yarn-site.xml配置文件的yarn.resourcemanager.scheduler.address配置项中。yarn.resourcemanager.scheduler.address 是resourcemanager暴漏给nodemanager的地址和端口。nodermanager利用这个地址通过心跳机制与RM通信。 
正常情况下我的NM应该与RM通信,但是NM却一直与0.0.0.0通信.因此查阅官方默认的yarn-site.xml配置文件,发现其中yarn.resourcemanager.hostname的默认值被设置成:0.0.0.0了(怪不得一直与0.0.0.0通信)。发现原来是我这没有配置
vi /data/yunva/hadoop-2.7.3/etc/hadoop/yarn-site.xml中添加如下:

        <property>
                <name>yarn.resourcemanager.hostname</name>
                <value>master</value>
        </property>
        <property>
                <name>yarn.log-aggregation-enable</name>
                <value>true</value>
        </property>
        <property>
                <name>yarn.log-aggregation.retain-seconds</name>
                <value>604800</value>
        </property>
添加后,重启hadoop和hbase,一切正常。        
        
        
        
        
        
原文:
发现原来是我设置的主机ip在这里没有生效。
在yarn-site.xml中添加一项新的项,将yarn.resourcemanager.hostname的值修改为master机器的ip地址。 
重启hadoop服务,一切正常!!! 
注:nodemanager启动后要通过心跳机制定期与RM通信,否则RM会认为NM死掉,会停止NM的服务。 
同时通过这次失败提醒出现问题要多去尝试(我几乎把网上的方法试了个遍。。。。。。)多参考官方文档。


 

猜你喜欢

转载自blog.csdn.net/w892824196/article/details/88895023
今日推荐