Accessing the HDFS file system of a cloud host from outside its network

I. Background of the problem
1. The cloud host runs Linux, with Hadoop set up in pseudo-distributed mode:
public IP: 139.198.18.xxx
internal IP: 192.168.137.2
hostname: hadoop001
2. core-site.xml is configured as follows:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop001:9001</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>hdfs://hadoop001:9001/hadoop/tmp</value>
    </property>
</configuration>

3. hdfs-site.xml is configured as follows:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

4. The cloud host's /etc/hosts file:

[hadoop@hadoop001 ~]$ cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

# hostname loopback address
  192.168.137.2   hadoop001

That is, the cloud host maps its LAN IP to the hostname hadoop001.
5. The local machine's hosts file:

139.198.18.XXX     hadoop001

That is, the local machine maps the cloud host's public IP to the hostname hadoop001.
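As a quick sanity check of such a mapping, you can parse the hosts file and confirm which address a hostname resolves to. This is only a sketch: 139.198.18.1 below is a stand-in for the post's masked public IP, and a temporary file stands in for /etc/hosts:

```shell
# Print the address an /etc/hosts-style file maps to hadoop001.
# 139.198.18.1 is a placeholder for the masked public IP 139.198.18.XXX.
hosts_file=$(mktemp)
printf '139.198.18.1   hadoop001\n' > "$hosts_file"
awk '$2 == "hadoop001" { print $1 }' "$hosts_file"   # prints 139.198.18.1
rm -f "$hosts_file"
```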
II. Symptoms of the problem
1. HDFS starts normally on the cloud host: jps shows nothing unusual about the processes, and operating on HDFS files through the shell works fine.
2. The management UI on port 50070 is reachable from a browser without problems.
3. Using the Java API from the local machine to operate on the remote HDFS also works. The URI uses the hostname hadoop001, which the local hosts file resolves to the public IP; the code is as follows:

import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val conf = new Configuration()
val uri = new URI("hdfs://hadoop001:9001")
val fs = FileSystem.get(uri, conf)
val listFiles = fs.listFiles(new Path("/data"), true)
while (listFiles.hasNext) {
  val nextFile = listFiles.next()
  println("get file path:" + nextFile.getPath.toString)
}
------------------------------ output ---------------------------------
get file path:hdfs://hadoop001:9001/data/infos.txt

4. On the local machine, SparkSQL reads the same HDFS file and fails while converting it to a DataFrame:

import org.apache.spark.sql.SparkSession

object SparkSQLApp {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("SparkSQLApp").master("local[2]").getOrCreate()
    val info = spark.sparkContext.textFile("/data/infos.txt")
    import spark.implicits._
    val infoDF = info.map(_.split(",")).map(x => Info(x(0).toInt, x(1), x(2).toInt)).toDF()
    infoDF.show()
    spark.stop()
  }
  case class Info(id: Int, name: String, age: Int)
}

The following error appears:

....
....
....
19/02/23 16:07:00 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
19/02/23 16:07:00 INFO HadoopRDD: Input split: hdfs://hadoop001:9001/data/infos.txt:0+17
19/02/23 16:07:21 WARN BlockReaderFactory: I/O error constructing remote block reader.
java.net.ConnectException: Connection timed out: no further information
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
.....
....
19/02/23 16:07:21 INFO DFSClient: Could not obtain BP-1358284489-192.168.137.2-1550394746448:blk_1073741840_1016 from any node: java.io.IOException: No live nodes contain block BP-1358284489-192.168.137.2-1550394746448:blk_1073741840_1016 after checking nodes = [DatanodeInfoWithStorage[192.168.137.2:50010,DS-fb2e7244-165e-41a5-80fc-4bb90ae2c8cd,DISK]], ignoredNodes = null No live nodes contain current block Block locations: DatanodeInfoWithStorage[192.168.137.2:50010,DS-fb2e7244-165e-41a5-80fc-4bb90ae2c8cd,DISK] Dead nodes:  DatanodeInfoWithStorage[192.168.137.2:50010,DS-fb2e7244-165e-41a5-80fc-4bb90ae2c8cd,DISK]. Will get new block locations from namenode and retry...
19/02/23 16:07:21 WARN DFSClient: DFS chooseDataNode: got # 1 IOException, will wait for 272.617680460432 msec.
19/02/23 16:07:42 WARN BlockReaderFactory: I/O error constructing remote block reader.
java.net.ConnectException: Connection timed out: no further information
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
...
...
19/02/23 16:07:42 WARN DFSClient: Failed to connect to /192.168.137.2:50010 for block, add to deadNodes and continue. java.net.ConnectException: Connection timed out: no further information
java.net.ConnectException: Connection timed out: no further information
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
    at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3499)
...
...
19/02/23 16:08:12 WARN DFSClient: Failed to connect to /192.168.137.2:50010 for block, add to deadNodes and continue. java.net.ConnectException: Connection timed out: no further information
java.net.ConnectException: Connection timed out: no further information
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
...
...
19/02/23 16:08:12 INFO DFSClient: Could not obtain BP-1358284489-192.168.137.2-1550394746448:blk_1073741840_1016 from any node: java.io.IOException: No live nodes contain block BP-1358284489-192.168.137.2-1550394746448:blk_1073741840_1016 after checking nodes = [DatanodeInfoWithStorage[192.168.137.2:50010,DS-fb2e7244-165e-41a5-80fc-4bb90ae2c8cd,DISK]], ignoredNodes = null No live nodes contain current block Block locations: DatanodeInfoWithStorage[192.168.137.2:50010,DS-fb2e7244-165e-41a5-80fc-4bb90ae2c8cd,DISK] Dead nodes:  DatanodeInfoWithStorage[192.168.137.2:50010,DS-fb2e7244-165e-41a5-80fc-4bb90ae2c8cd,DISK]. Will get new block locations from namenode and retry...
19/02/23 16:08:12 WARN DFSClient: DFS chooseDataNode: got # 3 IOException, will wait for 11918.913311370841 msec.
19/02/23 16:08:45 WARN BlockReaderFactory: I/O error constructing remote block reader.
java.net.ConnectException: Connection timed out: no further information
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
...
...
19/02/23 16:08:45 WARN DFSClient: Could not obtain block: BP-1358284489-192.168.137.2-1550394746448:blk_1073741840_1016 file=/data/infos.txt No live nodes contain current block Block locations: DatanodeInfoWithStorage[192.168.137.2:50010,DS-fb2e7244-165e-41a5-80fc-4bb90ae2c8cd,DISK] Dead nodes:  DatanodeInfoWithStorage[192.168.137.2:50010,DS-fb2e7244-165e-41a5-80fc-4bb90ae2c8cd,DISK]. Throwing a BlockMissingException
19/02/23 16:08:45 WARN DFSClient: Could not obtain block: BP-1358284489-192.168.137.2-1550394746448:blk_1073741840_1016 file=/data/infos.txt No live nodes contain current block Block locations: DatanodeInfoWithStorage[192.168.137.2:50010,DS-fb2e7244-165e-41a5-80fc-4bb90ae2c8cd,DISK] Dead nodes:  DatanodeInfoWithStorage[192.168.137.2:50010,DS-fb2e7244-165e-41a5-80fc-4bb90ae2c8cd,DISK]. Throwing a BlockMissingException
19/02/23 16:08:45 WARN DFSClient: DFS Read
org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-1358284489-192.168.137.2-1550394746448:blk_1073741840_1016 file=/data/infos.txt
    at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1001)
...
...
19/02/23 16:08:45 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-1358284489-192.168.137.2-1550394746448:blk_1073741840_1016 file=/data/infos.txt
    at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1001)
    at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:648)
...
...
19/02/23 16:08:45 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; aborting job
19/02/23 16:08:45 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
19/02/23 16:08:45 INFO TaskSchedulerImpl: Cancelling stage 0
19/02/23 16:08:45 INFO DAGScheduler: ResultStage 0 (show at SparkSQLApp.scala:30) failed in 105.618 s due to Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-1358284489-192.168.137.2-1550394746448:blk_1073741840_1016 file=/data/infos.txt
    at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1001)
...
...
Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-1358284489-192.168.137.2-1550394746448:blk_1073741840_1016 file=/data/infos.txt
    at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1001)
...
...

III. Problem analysis
1. Shell operations on the cloud host work normally, which rules out problems with the cluster setup or with processes failing to start.
2. The cloud host has no firewall configured, which rules out firewall-related issues.
3. The cloud server's ports are open by default, including port 50010, which the DataNode's data-transfer service uses.
4. I also set up Hadoop in a local virtual machine on the same LAN as the local machine; operating on that VM's HDFS from the local machine works fine, so the problem is almost certainly caused by going through the external network.
5. Reading up on HDFS: folder and file names (metadata) are stored on the NameNode, and operating on them does not require communicating with a DataNode. Since creating folders and files works, communication between the local machine and the remote NameNode is fine; the problem is most likely communication between the local machine and the remote DataNode.
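That distinction can be made concrete with a small sketch (assuming the same URI and hosts mapping as above; it needs a live cluster, so it is not verified here): metadata calls succeed because they only involve the NameNode, while reading file contents must reach a DataNode and is where the timeout occurs.

```scala
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val fs = FileSystem.get(new URI("hdfs://hadoop001:9001"), new Configuration())

// NameNode-only metadata operations: these work even from outside the LAN
fs.mkdirs(new Path("/data/test"))
fs.listStatus(new Path("/data")).foreach(status => println(status.getPath))

// Reading file contents requires connecting to a DataNode on port 50010;
// this is the step that times out from outside the network
val in = fs.open(new Path("/data/infos.txt"))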
IV. A guess at the cause
The local test machine is not on the cloud host's LAN, while the Hadoop configuration uses internal IPs for communication between machines. The client can reach the NameNode, and the NameNode replies with the IP address of the DataNode holding the data so that the client can contact its data-transfer service. But the NameNode and DataNode communicate over the internal network, so the NameNode returns the DataNode's internal IP, and the client cannot reach the DataNode at that address.
Look at part of the error message again:

19/02/23 16:07:21 WARN BlockReaderFactory: I/O error constructing remote block reader.
java.net.ConnectException: Connection timed out: no further information
...
19/02/23 16:07:42 WARN DFSClient: Failed to connect to /192.168.137.2:50010 for block, add to deadNodes and continue....

As the error shows, the client cannot connect to 192.168.137.2:50010, which is the DataNode's internal address; from outside, the DataNode is only reachable at 139.198.18.XXX:50010.
To let the development machine reach HDFS, we can access it by hostname and have the NameNode return the DataNode's hostname instead of its internal IP.
V. Solving the problem
1. First attempt:
Map the DataNode's public IP to its hostname in the development machine's hosts file (already configured above), and add the following to the code that talks to HDFS:

val conf = new Configuration()
conf.set("dfs.client.use.datanode.hostname", "true")

The error persists.
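One plausible explanation for the failure (my assumption; the original post does not verify it): in the Spark job, the standalone Configuration created above is not the Configuration that Spark hands to its HadoopRDD when reading the file. A sketch that targets the right object would set the flag on the SparkContext's Hadoop configuration before reading:

```scala
// Sketch: set the flag on the Hadoop Configuration Spark actually uses for file I/O.
// Assumes `spark` is the SparkSession from the SparkSQLApp example above.
spark.sparkContext.hadoopConfiguration.set("dfs.client.use.datanode.hostname", "true")
val info = spark.sparkContext.textFile("/data/infos.txt")
```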
2. Second attempt:

val spark = SparkSession
  .builder()
  .appName("SparkSQLApp")
  .master("local[2]")
  .config("dfs.client.use.datanode.hostname", "true")
  .getOrCreate()

The error still persists.
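A likely reason the second attempt has no effect (an assumption based on how Spark handles configuration, not something the original post confirms): `SparkSession.builder().config(...)` sets a plain Spark property, and Spark only copies properties into its Hadoop Configuration when they carry the `spark.hadoop.` prefix. A hedged variant would be:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder()
  .appName("SparkSQLApp")
  .master("local[2]")
  // the spark.hadoop. prefix tells Spark to forward this to its Hadoop Configuration
  .config("spark.hadoop.dfs.client.use.datanode.hostname", "true")
  .getOrCreate()
```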
3. Third attempt:
Add the following configuration to hdfs-site.xml:

<property>
    <name>dfs.client.use.datanode.hostname</name>
    <value>true</value>
</property>

This runs successfully.
Further reading suggests also adding the dfs.datanode.use.datanode.hostname property to hdfs-site.xml, which makes DataNodes communicate with each other by hostname as well:

<property>
    <name>dfs.datanode.use.datanode.hostname</name>
    <value>true</value>
</property>

Replacing internal IPs with hostnames this way is simple and convenient, and it makes data exchange between specific DataNodes easier. There is a side effect, though: if hostname resolution fails, the whole Hadoop cluster stops working, so make sure DNS (or the hosts files) is reliable.
Summary: access by hostname instead of the default access by IP.

Source: https://my.oschina.net/gordonnemo/blog/3017724

Origin blog.csdn.net/liweihope/article/details/90911864