Background
I set up a Hadoop cluster on cloud servers, following "Hadoop集群安装配置教程_Hadoop2.6.0_Ubuntu/CentOS" and "Hadoop: Setting up a Single Node Cluster". A Tencent Cloud server acts as master, running the NameNode and Secondary NameNode; an Alibaba Cloud server acts as slave1, running a DataNode. Both servers run CentOS 7.4 (64-bit).
Problem connecting to server
After configuring the cluster as described in the online guides, running jps on master and on slave1 shows that all the expected daemons are up on both servers. However, the web UI reports 0 live DataNodes. The DataNode startup log on slave1 reads:
2018-04-05 17:14:02,236 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/111.230.135.74:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-04-05 17:14:03,244 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/111.230.135.74:8020. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-04-05 17:14:04,253 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/111.230.135.74:8020. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-04-05 17:14:05,261 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/111.230.135.74:8020. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-04-05 17:14:06,269 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/111.230.135.74:8020. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-04-05 17:14:07,276 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/111.230.135.74:8020. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-04-05 17:14:08,285 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/111.230.135.74:8020. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-04-05 17:14:09,293 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/111.230.135.74:8020. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-04-05 17:14:10,301 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/111.230.135.74:8020. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-04-05 17:14:11,309 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/111.230.135.74:8020. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-04-05 17:14:11,318 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: master/111.230.135.74:8020
The log shows "Problem connecting to server: master/111.230.135.74:8020", where 111.230.135.74 is master's public IP address. In other words, the DataNode cannot reach port 8020 on master.
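For context, the address and port the DataNode dials come from fs.defaultFS in core-site.xml on every node, and the host name there is resolved through /etc/hosts. The snippet below is a typical configuration for this kind of setup, not copied from the original cluster:

```xml
<!-- core-site.xml (typical; the actual file from this cluster is not shown
     in the post). "master" is resolved via /etc/hosts on each node. -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://master:8020</value>
</property>
```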
At this point I checked both the DataNode and the NameNode hosts: ping and ssh work, and the IP addresses are correct. Checking which addresses master is listening on shows that it listens on 127.0.0.1:8020 but not on 111.230.135.74:8020, i.e. port 8020 is reachable only from master itself.
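This distinction is easy to see outside Hadoop: the address a server binds determines who can connect to it. A minimal sketch (not Hadoop code) using throwaway sockets:

```python
import socket

# A socket bound to 127.0.0.1, as the misconfigured NameNode was,
# is reachable only from the local machine.
lo = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
lo.bind(("127.0.0.1", 0))        # port 0 = pick any free port
lo.listen(1)
print(lo.getsockname()[0])       # -> 127.0.0.1

# Binding a non-loopback address (here the wildcard 0.0.0.0) accepts
# connections arriving on any of the host's interfaces.
wild = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
wild.bind(("0.0.0.0", 0))
wild.listen(1)
print(wild.getsockname()[0])     # -> 0.0.0.0

lo.close()
wild.close()
```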
Solution: inspect master's /etc/hosts file:
127.0.0.1 master
39.108.72.52 slave1
Change it to:
111.230.135.74 master
39.108.72.52 slave1
With this change, master should bind 111.230.135.74:8020 instead of 127.0.0.1:8020 when it opens the port.
Problem binding to [master:8020],Cannot assign requested address
After editing /etc/hosts I restarted HDFS, expecting this to fully solve the problem. However, after the restart neither the NameNode nor the Secondary NameNode process came up on master. The log shows:
java.net.BindException: Problem binding to [master:8020] java.net.BindException: Cannot assign requested address; For more details see: http://wiki.apache.org/hadoop/BindException
Searching online turned up "Hadoop启动出错Cannot assign requested address": Tencent Cloud servers simply cannot bind ports on their public IP, which is why this exception is thrown.
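The error is straightforward to reproduce outside Hadoop: asking the kernel to bind an address that is not configured on any local interface fails with EADDRNOTAVAIL ("Cannot assign requested address"). A sketch using 192.0.2.1, a documentation-only (TEST-NET-1) address that no real host owns:

```python
import errno
import socket

# Try to bind an address this host does not own, as the NameNode did when
# /etc/hosts mapped "master" to the public IP that is never configured on
# the instance's own interface.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    s.bind(("192.0.2.1", 8020))
except OSError as e:
    # EADDRNOTAVAIL is the errno behind "Cannot assign requested address"
    print(e.errno == errno.EADDRNOTAVAIL)
finally:
    s.close()
```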
Solution: in master's /etc/hosts, replace the Tencent Cloud server's public IP with its private (internal) IP, then restart HDFS.
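Master's /etc/hosts then looks like the following, where 172.17.0.5 is only a placeholder, since the post does not show the instance's actual private IP:

```
# 172.17.0.5 stands in for the Tencent instance's real private IP
172.17.0.5      master
39.108.72.52    slave1
```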
Note: checking the listening ports now shows the private IP listening on 8020. slave1, however, connects to port 8020 of the public IP, and the connection still succeeds. The most likely explanation is that the public IP is implemented with NAT: it is never configured on the instance's own network interface (which is also why it cannot be bound), and the cloud gateway translates traffic arriving at the public address to the private one, where the listener picks it up.
Summary
- If slave1 cannot reach master, master may be listening on 127.0.0.1:8020, in which case only master itself can access port 8020.
- A Tencent Cloud public IP cannot be bound to a port; configure the private IP instead.