- No space left on device.
stage 89.3 failed 4 times, most recent failure:
Lost task 38.4 in stage 89.3 (TID 30100, node4.test.com): java.io.IOException: No space left on device
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:326)
at org.apache.spark.storage.TimeTrackingOutputStream.write(TimeTrackingOutputStream.java:58)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
Fix: point spark.local.dir at a comma-separated list of directories on disks that still have free space, e.g. spark.local.dir=/data1/tmp,/data2/tmp,... in spark-defaults.conf.
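As a sketch, the spark-defaults.conf entry might look like this (the mount points are examples; use local disks that actually have headroom):

```
# spark-defaults.conf -- spread shuffle/spill files across several disks
# (paths are illustrative; pick mounts with enough free space)
spark.local.dir  /data1/tmp,/data2/tmp,/data3/tmp
```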
- serialized results of 381610 tasks (5.0 GB) is bigger than spark.driver.maxResultSize
spark.driver.maxResultSize caps the total size of serialized results that workers send back to the driver; the default is 1 GB.
Actions such as collect() are what run into this limit.
Avoid collect() where possible: use filter() to cut down the data sent back to the driver, or write the output to HDFS with saveAsParquetFile() / saveAsTextFile() so downstream jobs can consume it there.
spark.driver.maxResultSize vs. spark.driver.memory: collected results still have to fit on the driver heap, so keep spark.driver.maxResultSize comfortably below spark.driver.memory.
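A hedged spark-defaults.conf example raising the cap (the sizes are illustrative, not recommendations):

```
# spark-defaults.conf -- illustrative values
spark.driver.memory         8g
# Cap on total serialized results sent back to the driver (default 1g).
# Keep it well under spark.driver.memory: results land on the driver heap.
spark.driver.maxResultSize  4g
```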
- org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 seconds]
org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [800 seconds]. This timeout is controlled by spark.rpc.askTimeout
at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
at org.apache.spark.storage.BlockManagerMaster.removeBroadcast(BlockManagerMaster.scala:143)
As the message says, this timeout is governed by spark.rpc.askTimeout; raise it (or spark.network.timeout, which supplies its default) when RPCs such as broadcast cleanup legitimately take longer, e.g. on a busy or GC-heavy cluster.
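A sketch of the corresponding spark-defaults.conf change (the 600s value is an example, not a tuned figure):

```
# spark-defaults.conf -- illustrative timeout values
# Umbrella default for most internal timeouts, including RPC asks:
spark.network.timeout  600s
# Or target the RPC ask timeout specifically:
spark.rpc.askTimeout   600s
```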
- TaskSet too large
WARN TaskSetManager: Stage 198 contains a task of very large size (5953 KB). The maximum recommended task size is 100 KB.
A stage's tasks come from Spark's stage division. This warning usually means the transformation chain feeding the stage is too long, so the serialized task the driver ships to each executor becomes very large. The fix is to split the stage, for example by calling cache() on intermediate data mid-pipeline to break up the overly long stage.
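A minimal Scala sketch of the cache-based stage split described above; `rawData`, `parse`, `isValid`, `score`, and the HDFS path are hypothetical placeholders, not names from this document:

```scala
// Hypothetical sketch: materialize an intermediate RDD partway through a
// long transformation chain so that downstream stages start from the
// cached data instead of carrying the whole chain in each shipped task.
val intermediate = rawData
  .map(parse)
  .filter(_.isValid)
  .cache()             // keep this intermediate result in executor memory

intermediate.count()   // an action, forcing the cache to be populated

// Later stages read from the cached RDD rather than recomputing upstream.
intermediate.map(score).saveAsTextFile("hdfs:///tmp/result")
```

Whether cache() alone shrinks the serialized task enough depends on what is actually bloating the closure; for very long lineages, checkpointing the intermediate RDD is the heavier-weight alternative.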