Background
I set up a Hadoop cluster on cloud servers, following "Hadoop集群安装配置教程_Hadoop2.6.0_Ubuntu/CentOS" and "Hadoop: Setting up a Single Node Cluster". A Tencent Cloud server acts as master, running the NameNode and Secondary NameNode; an Alibaba Cloud server acts as slave1, running a DataNode. Both servers run CentOS 7.4 (64-bit).
Problem connecting to server
After configuring the cluster as described in the online guides, running jps on master and on slave1 shows that all the expected daemons are up on both servers. However, the web UI reports 0 live DataNodes. The DataNode startup log on slave1 reads:
2018-04-05 17:14:02,236 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/111.230.135.74:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-04-05 17:14:03,244 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/111.230.135.74:8020. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-04-05 17:14:04,253 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/111.230.135.74:8020. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-04-05 17:14:05,261 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/111.230.135.74:8020. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-04-05 17:14:06,269 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/111.230.135.74:8020. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-04-05 17:14:07,276 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/111.230.135.74:8020. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-04-05 17:14:08,285 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/111.230.135.74:8020. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-04-05 17:14:09,293 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/111.230.135.74:8020. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-04-05 17:14:10,301 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/111.230.135.74:8020. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-04-05 17:14:11,309 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/111.230.135.74:8020. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-04-05 17:14:11,318 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: master/111.230.135.74:8020
The log shows "Problem connecting to server: master/111.230.135.74:8020", where 111.230.135.74 is master's public IP address. In other words, the DataNode cannot reach port 8020 on master.
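For context, the address and port the DataNode dials come from fs.defaultFS in core-site.xml on every node, and the host name there is resolved through /etc/hosts. The snippet below is a typical configuration for this kind of setup, not copied from the original cluster:

```xml
<!-- core-site.xml (typical; the actual file from this cluster is not shown
     in the post). "master" is resolved via /etc/hosts on each node. -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://master:8020</value>
</property>
```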
At this point I checked both the DataNode and the NameNode hosts: ping and ssh work, and the IP addresses are correct. Checking which addresses master is listening on shows that it listens on 127.0.0.1:8020 but not on 111.230.135.74:8020, i.e. port 8020 is reachable only from master itself.
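This distinction is easy to see outside Hadoop: the address a server binds determines who can connect to it. A minimal sketch (not Hadoop code) using throwaway sockets:

```python
import socket

# A socket bound to 127.0.0.1, as the misconfigured NameNode was,
# is reachable only from the local machine.
lo = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
lo.bind(("127.0.0.1", 0))        # port 0 = pick any free port
lo.listen(1)
print(lo.getsockname()[0])       # -> 127.0.0.1

# Binding a non-loopback address (here the wildcard 0.0.0.0) accepts
# connections arriving on any of the host's interfaces.
wild = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
wild.bind(("0.0.0.0", 0))
wild.listen(1)
print(wild.getsockname()[0])     # -> 0.0.0.0

lo.close()
wild.close()
```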
Solution: inspect master's /etc/hosts file:
127.0.0.1 master
39.108.72.52 slave1
Change it to:
111.230.135.74 master
39.108.72.52 slave1
With this change, master should bind 111.230.135.74:8020 instead of 127.0.0.1:8020 when it opens the port.
Problem binding to [master:8020],Cannot assign requested address
After editing /etc/hosts I restarted HDFS, expecting this to fully solve the problem. However, after the restart neither the NameNode nor the Secondary NameNode process came up on master. The log shows:
java.net.BindException: Problem binding to [master:8020] java.net.BindException: Cannot assign requested address; For more details see: http://wiki.apache.org/hadoop/BindException
Searching online turned up "Hadoop启动出错Cannot assign requested address": Tencent Cloud servers simply cannot bind ports on their public IP, which is why this exception is thrown.
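The error is straightforward to reproduce outside Hadoop: asking the kernel to bind an address that is not configured on any local interface fails with EADDRNOTAVAIL ("Cannot assign requested address"). A sketch using 192.0.2.1, a documentation-only (TEST-NET-1) address that no real host owns:

```python
import errno
import socket

# Try to bind an address this host does not own, as the NameNode did when
# /etc/hosts mapped "master" to the public IP that is never configured on
# the instance's own interface.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    s.bind(("192.0.2.1", 8020))
except OSError as e:
    # EADDRNOTAVAIL is the errno behind "Cannot assign requested address"
    print(e.errno == errno.EADDRNOTAVAIL)
finally:
    s.close()
```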
Solution: in master's /etc/hosts, replace the Tencent Cloud server's public IP with its private (internal) IP, then restart HDFS.
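Master's /etc/hosts then looks like the following, where 172.17.0.5 is only a placeholder, since the post does not show the instance's actual private IP:

```
# 172.17.0.5 stands in for the Tencent instance's real private IP
172.17.0.5      master
39.108.72.52    slave1
```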
Note: checking the listening ports now shows the private IP listening on 8020. slave1, however, connects to port 8020 of the public IP, and the connection still succeeds. The most likely explanation is that the public IP is implemented with NAT: it is never configured on the instance's own network interface (which is also why it cannot be bound), and the cloud gateway translates traffic arriving at the public address to the private one, where the listener picks it up.
Summary
- If slave1 cannot reach master, master may be listening on 127.0.0.1:8020, in which case only master itself can access port 8020.
- A Tencent Cloud public IP cannot be bound to a port; configure the private IP instead.