pyspark LEAK: ByteBuf.release() was not called before it's garbage-collected. Enable advanced leak

A PySpark job hangs at one stage and reports this error:

LEAK: ByteBuf.release() was not called before it's garbage-collected. Enable advanced leak reporting

Cause:

The distributed dataset is too large; collecting it onto a single machine triggers the error.

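Solution:

In distributed computation, avoid pulling data back to the local machine wherever possible. Operators such as collect and countByKey do exactly that; write the result directly to HDFS files instead.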
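
As a rough illustration of the advice above, the sketch below replaces a `countByKey()` call with a `reduceByKey` aggregation whose result is written out with `saveAsTextFile`, so the per-key counts are never materialized on the driver. The function name `count_by_key_to_files` and the tab-separated output format are assumptions made for this example, not part of the original post.

```python
from operator import add


def count_by_key_to_files(pairs_rdd, output_path):
    """Count records per key on the executors and write the counts to files.

    Intended as a stand-in for pairs_rdd.countByKey(), which returns a dict
    to the driver. `pairs_rdd` is expected to be an RDD of (key, value) pairs.
    """
    # Turn every value into 1, then sum the ones per key across the cluster.
    counts = pairs_rdd.mapValues(lambda _: 1).reduceByKey(add)
    # Format each (key, count) pair as one tab-separated line (assumed format).
    lines = counts.map(lambda kv: "{}\t{}".format(kv[0], kv[1]))
    # Each partition is written by the executor that holds it.
    lines.saveAsTextFile(output_path)
```

With a live `SparkContext` named `sc`, this could be called as `count_by_key_to_files(sc.textFile(input_path).map(parse_line), "hdfs:///some/output/dir")`, where `input_path`, `parse_line`, and the output directory are placeholders. Note that `saveAsTextFile` expects the output directory not to exist yet.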

Reposted from my.oschina.net/u/2000675/blog/1805801