HDFS startup reports a java.io.IOException: Premature EOF from inputStream error

Background: A new CDH cluster had just been built, and I wanted to test its HDFS performance, so I used the TestDFSIO benchmark from the hadoop-test-2.6.0-mr1-cdh5.6.1.jar bundled with the distribution, planning to generate 10 TB of data for testing:

hadoop jar hadoop-test-2.6.0-mr1-cdh5.6.1.jar TestDFSIO -write -nrFiles 10 -fileSize 10000000 -resFile /tmp/TestDFSIO_results.log
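For completeness, the full benchmark cycle also includes a read pass and a cleanup step; a minimal sketch using the same jar and parameters as above (the -read and -clean options are the standard TestDFSIO counterparts, and /benchmarks/TestDFSIO is the benchmark's default output directory):

# Read back the files produced by the -write pass
hadoop jar hadoop-test-2.6.0-mr1-cdh5.6.1.jar TestDFSIO -read -nrFiles 10 -fileSize 10000000 -resFile /tmp/TestDFSIO_results.log

# Remove the generated data under /benchmarks/TestDFSIO when finished
hadoop jar hadoop-test-2.6.0-mr1-cdh5.6.1.jar TestDFSIO -clean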

Due to time constraints, the cluster machines were shut down directly once a little more than 2 TB of data had been written. After powering them back on, two DataNodes failed to start when HDFS was brought up, and the logs reported the following error:

ERROR    DataNode    
laydca10:1004:DataXceiver error processing WRITE_BLOCK operation  src: /192.168.1.150:33090 dst: /192.168.1.151:1004
java.io.IOException: Premature EOF from inputStream
    at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:203)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:501)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:901)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:808)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:169)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:106)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:246)
    at java.lang.Thread.run(Thread.java:748)
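
To see which nodes the NameNode considers live or dead, and to confirm the error on an affected host, the following checks can be used; a sketch only, and the log directory /var/log/hadoop-hdfs is the CDH default, which may differ on other installations:

# Ask the NameNode which DataNodes are live and which are dead
hdfs dfsadmin -report

# On an affected host, search the DataNode logs for the same error
grep -rn "Premature EOF from inputStream" /var/log/hadoop-hdfs/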

Later checks found several hundred GB of data still on disk under each of the two DataNodes that would not start. I deleted the DataNode role on both machines, removed those data directories, and re-added the roles, but the DataNodes on the two machines still failed to start. Checking the logs again showed several hundred MB of data under the YARN directories on 192.168.1.150 and 192.168.1.151; the suspicion was that the underlying data had been deleted while the task registration information in YARN still referenced it, which caused the error. I then deleted the NodeManager roles on 150 and 151, cleared all the YARN data on both hosts, re-added the two NodeManager roles, and restarted YARN and HDFS. After that, all HDFS nodes started normally.
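
For reference, the per-host cleanup amounted to roughly the following; this is a sketch under assumptions rather than the exact commands used: /dfs/dn and /yarn/nm are the CDH default locations for dfs.datanode.data.dir and yarn.nodemanager.local-dirs, so verify those properties in your own configuration before deleting anything, and the role removal and re-adding itself is done in Cloudera Manager:

# On 192.168.1.150 and 192.168.1.151, after removing the DataNode and
# NodeManager roles in Cloudera Manager:
rm -rf /dfs/dn/*      # DataNode block data (dfs.datanode.data.dir, assumed default path)
rm -rf /yarn/nm/*     # NodeManager local data (yarn.nodemanager.local-dirs, assumed default path)

# Re-add the roles, restart YARN and HDFS, then verify that every DataNode
# has registered and the filesystem is healthy:
hdfs dfsadmin -report
hdfs fsck /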
