java.net.ConnectException: Connection refused

I hit the following problem; my fix was to shut the cluster down, restart it, and the problem went away...

18/08/31 15:43:41 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/08/31 15:43:42 WARN SparkConf: The configuration key 'spark.kryoserializer.buffer.mb' has been deprecated as of Spark 1.4 and may be removed in the future. Please use spark.kryoserializer.buffer instead. The default value for spark.kryoserializer.buffer.mb was previously specified as '0.064'. Fractional values are no longer accepted. To specify the equivalent now, one may use '64k'.
Parsing data
Fitting model
train
Traceback (most recent call last):
  File "/home/hduser/pythonwork/CVM/bin/mnist.py", line 53, in <module>
    model.train(trainRDD)
  File "/home/hduser/pythonwork/CVM/cvm/svm.py", line 14, in train
    labeledPoints = cascade(labeledPoints, self._reduce, self.nmax)
  File "/home/hduser/pythonwork/CVM/cvm/cascade.py", line 14, in cascade
    n = labeledPointRDD.count()
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 1056, in count
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 1047, in sum
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 921, in fold
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 824, in collect
  File "/usr/local/spark/python/lib/py4j-0.10.6-src.zip/py4j/java_gateway.py", line 1160, in __call__
  File "/usr/local/spark/python/lib/py4j-0.10.6-src.zip/py4j/protocol.py", line 320, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: java.net.ConnectException: Call From data4/10.200.68.224 to master:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
    at org.apache.hadoop.ipc.Client.call(Client.java:1479)
    at org.apache.hadoop.ipc.Client.call(Client.java:1412)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
    at com.sun.proxy.$Proxy22.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy23.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2108)
    at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
    at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
    at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
    at org.apache.hadoop.fs.Globber.glob(Globber.java:252)
    at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1676)
    at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:259)
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:200)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
    at org.apache.spark.api.python.PythonRDD.getPartitions(PythonRDD.scala:53)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
    at org.apache.spark.api.python.PythonRDD.getPartitions(PythonRDD.scala:53)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2092)
    at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:939)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
    at org.apache.spark.rdd.RDD.collect(RDD.scala:938)
    at org.apache.spark.api.python.PythonRDD$.collectAndServe(PythonRDD.scala:153)
    at org.apache.spark.api.python.PythonRDD.collectAndServe(PythonRDD.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
    at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
    at org.apache.hadoop.ipc.Client.call(Client.java:1451)
    ... 61 more

Connection Refused (from the Hadoop wiki page linked in the exception above)

You get a ConnectionRefused Exception when there is a machine at the address specified, but there is no program listening on the specific TCP port the client is using, and there is no firewall in the way silently dropping TCP connection requests. If you do not know what a TCP connection request is, please consult the specification.
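
A refused connection fails immediately, while a firewall that silently drops requests shows up as a timeout instead, so the two cases are easy to tell apart with a direct probe. Below is a minimal sketch, assuming Python 3 and reusing the host and port from the exception above (master:9000):

    import socket

    try:
        # Attempt a TCP connection the same way the HDFS client would.
        socket.create_connection(("master", 9000), timeout=5).close()
        print("connected: something is listening on master:9000")
    except ConnectionRefusedError:
        print("refused: the machine is up, but no program is listening on that port")
    except socket.timeout:
        print("timeout: the host is unreachable, or a firewall is dropping requests")
    except socket.gaierror:
        print("the hostname does not resolve at all; check DNS and /etc/hosts")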

Unless there is a configuration error at either end, a common cause for this is that the Hadoop service isn't running.
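
One quick way to confirm whether the service is up is to look for its JVM process on the server itself. A minimal sketch, assuming Python 3 and the JDK's jps tool on the PATH of the master host:

    import subprocess

    # jps lists the running JVM processes; a healthy HDFS master includes a NameNode entry.
    out = subprocess.run(["jps"], capture_output=True, text=True).stdout
    print("NameNode running:", any("NameNode" in line for line in out.splitlines()))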

This stack trace is very common when the cluster is being shut down, because at that point the Hadoop services are being torn down across the cluster, and that is visible to any services and applications that haven't been shut down themselves. Seeing this error message during cluster shutdown is nothing to worry about.

If the application or cluster is not working, and this message appears in the log, then it is more serious.

The exception text declares both the hostname and the port to which the connection failed. The port can be used to identify the service. For example, port 9000 is the HDFS port. Consult the Ambari port reference, and/or those of the supplier of your Hadoop management tools.
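
To see which hostname and port the client side is actually configured with, you can read fs.defaultFS (fs.default.name on older releases) out of core-site.xml. A minimal sketch, assuming a standard $HADOOP_CONF_DIR layout; the fallback path is only an example:

    import os
    import xml.etree.ElementTree as ET

    conf_dir = os.environ.get("HADOOP_CONF_DIR", "/usr/local/hadoop/etc/hadoop")
    tree = ET.parse(os.path.join(conf_dir, "core-site.xml"))
    for prop in tree.getroot().iter("property"):
        if prop.findtext("name") in ("fs.defaultFS", "fs.default.name"):
            # This is the NameNode address every HDFS client call goes to.
            print("client is configured to use:", prop.findtext("value"))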

  1. Check that the hostname the client is using is correct. If it comes from a Hadoop configuration option, examine it carefully and try pinging it by hand. (A scripted version of checks 1 through 5 is sketched after this list.)
  2. Check that the IP address the client resolves that hostname to is correct.
  3. Make sure the destination address in the exception isn't 0.0.0.0; this means that you haven't actually configured the client with the real address for that service, and instead it is picking up the server-side property telling it to listen on every network interface for connections.
  4. If the error message says the remote service is on "127.0.0.1" or "localhost", the configuration file is telling the client that the service is on the local server. If your client is trying to talk to a remote system, your configuration is broken.
  5. Check that there isn't an entry for your hostname mapped to 127.0.0.1 or 127.0.1.1 in /etc/hosts (Ubuntu is notorious for this).
  6. Check that the port the client is trying to talk to matches the port the server is actually offering the service on. The netstat command is useful there.
  7. On the server, try a telnet localhost <port> to see if the port is open there.
  8. On the client, try a telnet <host> <port> to see if the port is accessible remotely (the connection probe sketched earlier does the same check programmatically).
  9. Try connecting to the server/port from a different machine, to see if it is just the single client misbehaving.
  10. If your client and the server are in different subdomains, it may be that the configuration of the service is only publishing the basic hostname rather than the Fully Qualified Domain Name. A client in a different subdomain can then unintentionally attempt to bind to a host in its own local subdomain, and fail.
  11. If you are using a Hadoop-based product from a third party, please use the support channels provided by the vendor.
  12. Please do not file bug reports related to your problem, as they will be closed as Invalid.
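
The name-resolution checks above (items 1 through 5) are easy to script. A minimal sketch, again using the hostname from the exception in this post ("master") as a stand-in for your own:

    import socket

    host = "master"  # the hostname from the exception text; substitute your own
    try:
        ip = socket.gethostbyname(host)
    except socket.gaierror as e:
        raise SystemExit(f"{host} does not resolve at all: {e}")

    print(f"{host} resolves to {ip}")  # item 2: is this the address you expect?
    if ip == "0.0.0.0":
        print("item 3: wildcard address; the client has no real address for the service")
    if ip.startswith("127."):
        print("items 4-5: loopback address; inspect /etc/hosts for a bad hostname entry")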

See also Server Overflow

None of these are Hadoop problems; they are Hadoop, host, network, and firewall configuration issues. As it is your cluster, only you can find out and track down the problem.

Reposted from blog.csdn.net/wqqGo/article/details/82257573