
Job aborted due to stage failure: Task 0 in stage 5008.0 failed 4 times, most recent failure: Lost task 0.3 in stage 5008.0 
(TID 262140, nmg01-hadoop-m04100.nmg01.baidu.com): java.lang.Exception: Could not compute split, block input-2-1466083022800 not found
	at org.apache.spark.rdd.BlockRDD.compute(BlockRDD.scala:51)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
	at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:87)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
	at org.apache.spark.scheduler.Task.run(Task.scala:70)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:


This problem has not been solved yet; recording it here for now.

You can use the following command to view the data blocks of a file:
[hadoop@hadoop1 ~]$ hadoop fsck /a.dat -files -locations -blocks
FSCK started by hadoop from /192.168.2.50 for path /a.dat at Mon Apr 15 22:35:44 CST 2013
/a.dat 1073741824 bytes, 16 block(s):  OK
0. blk_-1725523339094524579_1002 len=67108864 repl=2 [192.168.2.51:50010, 192.168.2.50:50010]
1. blk_-7520025814312846226_1002 len=67108864 repl=2 [192.168.2.51:50010, 192.168.2.50:50010]
2. blk_-604534400755073710_1002 len=67108864 repl=2 [192.168.2.51:50010, 192.168.2.50:50010]
3. blk_2739579659457502288_1002 len=67108864 repl=2 [192.168.2.51:50010, 192.168.2.50:50010]
4. blk_-8114588819409955724_1002 len=67108864 repl=2 [192.168.2.51:50010, 192.168.2.50:50010]
5. blk_5463597511316739635_1002 len=67108864 repl=2 [192.168.2.51:50010, 192.168.2.50:50010]
6. blk_8144253712227235404_1002 len=67108864 repl=2 [192.168.2.51:50010, 192.168.2.50:50010]
7. blk_-3366961986302415706_1002 len=67108864 repl=2 [192.168.2.50:50010, 192.168.2.51:50010]
8. blk_-901091692018383111_1002 len=67108864 repl=2 [192.168.2.51:50010, 192.168.2.50:50010]
9. blk_6873872632693102982_1002 len=67108864 repl=2 [192.168.2.51:50010, 192.168.2.50:50010]
10. blk_-3870533374548039633_1002 len=67108864 repl=2 [192.168.2.51:50010, 192.168.2.50:50010]
11. blk_9183945251370994851_1002 len=67108864 repl=2 [192.168.2.51:50010, 192.168.2.50:50010]
12. blk_-1665678832931859838_1002 len=67108864 repl=2 [192.168.2.50:50010, 192.168.2.51:50010]
13. blk_5584227929328283812_1002 len=67108864 repl=2 [192.168.2.51:50010, 192.168.2.50:50010]
14. blk_5671360054638411492_1002 len=67108864 repl=2 [192.168.2.51:50010, 192.168.2.50:50010]
15. blk_6333095820481976871_1002 len=67108864 repl=2 [192.168.2.51:50010, 192.168.2.50:50010]
You can see that /a.dat consists of 16 data blocks with a replication factor of 2, and the DataNode IPs holding each replica are listed. Using the block names above, we can go directly to the DataNode's data directory, configured via dfs.data.dir, to find them:
     <property>
         <name>dfs.data.dir</name>
         <value>/data</value>
     </property>
For example, to look for blk_6333095820481976871_1002:
[hadoop@hadoop1 current]$ ls -lh /data/current/ | grep blk_6333095820481976871
-rw-rw-r-- 1 hadoop hadoop  64M 04-15 22:03 blk_6333095820481976871
-rw-rw-r-- 1 hadoop hadoop 513K 04-15 22:03 blk_6333095820481976871_1002.meta
The command above shows the same result on both DataNodes, since the replication factor is 2.
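
If the block file does not show up directly under current/, the DataNode may have placed it in a subdirectory. A quick sketch for searching the whole data directory, assuming the dfs.data.dir value of /data shown above:

# search every DataNode storage subdirectory for the block file and its .meta file
find /data -name 'blk_6333095820481976871*' -exec ls -lh {} \;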

The problem is also described here:

http://blog.csdn.net/aaa1117a8w5s6d/article/details/43150611

https://github.com/dibbhatt/kafka-spark-consumer/issues/17

Workaround

Increasing spark.executor.memory, reducing spark.executor.cores, or lengthening the batch interval should all help mitigate this error: it indicates that a received input block was no longer available in the executors when the task that needed it ran, which typically happens when processing falls behind the rate of incoming data. An illustrative launch command follows.
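
A minimal sketch of where these knobs live, assuming the job is launched with spark-submit; the class and jar names are placeholders and the values are examples, not recommendations:

# hypothetical launch: larger executor heap, fewer cores per executor
spark-submit \
  --class com.example.MyStreamingApp \
  --executor-memory 4g \
  --executor-cores 2 \
  my-streaming-app.jar

The batch interval is not a spark-submit flag; it is the duration passed when the StreamingContext is created in the application code, e.g. Seconds(10) instead of Seconds(5).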


Reposted from wangqiaowqo.iteye.com/blog/2305685