[Apache Spark Error Message]

  • No space left on device.
stage 89.3 failed 4 times, most recent failure: 
Lost task 38.4 in stage 89.3 (TID 30100, node4.test.com): java.io.IOException: No space left on device
        at java.io.FileOutputStream.writeBytes(Native Method)
        at java.io.FileOutputStream.write(FileOutputStream.java:326)
        at org.apache.spark.storage.TimeTrackingOutputStream.write(TimeTrackingOutputStream.java:58)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)

Fix: set `spark.local.dir` in spark-defaults.conf to a comma-separated list of directories on disks that still have free space, e.g. spark.local.dir = /data1/tmp,/data2/tmp,....
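As a minimal sketch of the fix above (the directory paths are placeholders; use whichever disks on your nodes have free space):

```properties
# spark-defaults.conf
# Comma-separated scratch directories; Spark spreads shuffle spill
# and temp files across them, so one full disk no longer kills the job.
spark.local.dir  /data1/tmp,/data2/tmp
```

Each executor writes its shuffle and spill files under these directories, so putting them on separate physical disks also spreads the I/O load.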

  • serialized results of 381610 tasks (5.0 GB) is bigger than spark.driver.maxResultSize

        This caps the total amount of data workers can send back to the driver, e.g. for collect() operations. The default is 1 GB.

     Avoid collect() where possible: use filter() to limit how much data is written back to the driver, or write the results to HDFS with saveAsParquetFile() / saveAsTextFile() for downstream jobs to consume.

spark.driver.maxResultSize Vs spark.driver.memory
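Results that are collected to the driver are held in the driver's heap, so spark.driver.maxResultSize should stay comfortably below spark.driver.memory. A hedged spark-submit sketch (the values and application name are examples, not recommendations):

```shell
# Example only: collected results live in driver heap, so keep
# maxResultSize well under driver.memory to leave room for the
# driver's own bookkeeping.
spark-submit \
  --conf spark.driver.memory=8g \
  --conf spark.driver.maxResultSize=4g \
  my_app.jar
```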

  • org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 seconds]
    • org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [800 seconds]. This timeout is controlled by spark.rpc.askTimeout
      at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
      at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
      at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
      at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
      at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
      at org.apache.spark.storage.BlockManagerMaster.removeBroadcast(BlockManagerMaster.scala:143)
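The error message itself names the knob (spark.rpc.askTimeout). When executors are stalled by long GC pauses or a slow network, raising the timeouts is a common workaround; a sketch for spark-defaults.conf (the 800s values are examples taken from the error above, not recommendations):

```properties
# spark-defaults.conf — raise RPC timeouts for clusters with heavy
# GC pauses or slow networks (defaults are in the 120s range).
spark.rpc.askTimeout    800s
spark.network.timeout   800s
```

Note that a bigger timeout only hides the symptom; if the pauses come from GC, fixing executor memory pressure is the real cure.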
  • TaskSet too large

    • WARN TaskSetManager: Stage 198 contains a task of very large size (5953 KB). The maximum recommended task size is 100 KB.
      This concerns Spark's stage division: how large the tasks inside a single stage are. It usually means your chain of transformations is too long, so the task the driver dispatches to each executor becomes very large. We can solve this by splitting the stage, e.g. by calling cache() during execution to persist some intermediate data and cut off the overly long stage.
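The cache() workaround above can be sketched as follows. This is pseudocode in Scala syntax, not a runnable program: it assumes a live SparkContext `sc`, and parse / keep / enrich / finalStep are hypothetical functions standing in for a long transformation chain.

```scala
// Sketch only — assumes an existing SparkContext `sc`.
val raw = sc.textFile("hdfs:///input")

// A long chain of transformations inflates the serialized task
// the driver ships to every executor.
val chained = raw.map(parse _).filter(keep _).map(enrich _)

// Persist an intermediate RDD and force materialization; downstream
// tasks are then built from the cached data instead of carrying
// the whole chain.
val mid = chained.cache()
mid.count()

val result = mid.map(finalStep _)
```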
    • ​​​​​​​​​​​​​​


Reposted from my.oschina.net/u/204498/blog/833868