Spark exception notes: OutOfMemoryError: GC overhead limit exceeded

The code being executed is as follows:

# encoding:utf-8
from pyspark import SparkConf, SparkContext
from pyspark.sql import SparkSession


conf = SparkConf().setMaster('yarn')
sc = SparkContext(conf=conf)
spark = SparkSession(sc)

# spark.read.csv returns a DataFrame
rdd = spark.read.csv('/spark/gps/GPS1.csv')
print(rdd.count())
print(rdd.repartition(10000).count())
print(rdd.repartition(10000).collect())  # fails here: OutOfMemoryError: GC overhead limit exceeded

Submit command:

spark-submit --master yarn bigdata.py

Error message

java.lang.OutOfMemoryError: GC overhead limit exceeded

count() runs without any problem, regardless of the parameters used, but collect() always fails with this error.

Cause Analysis

1. collect() returns all of the data to the Driver, causing the Driver to run out of memory.

The solution is to increase Driver memory:

spark-submit --master yarn --executor-cores 4 --driver-memory 3G  bigdata.py
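
If increasing Driver memory is not enough, or the full dataset does not actually need to be on the Driver, it is usually better to avoid collect() altogether. Below is a minimal sketch of common alternatives using standard DataFrame methods; the app name and the output path /spark/gps/output are illustrative assumptions, not from the original job.

from pyspark.sql import SparkSession

spark = SparkSession.builder.master('yarn').appName('bigdata').getOrCreate()
df = spark.read.csv('/spark/gps/GPS1.csv')

# Pull only a small sample back to the Driver instead of every row
print(df.take(10))

# Stream rows to the Driver one partition at a time instead of
# materialising the whole dataset in Driver memory at once
for row in df.toLocalIterator():
    pass  # process each Row here

# If the full result is needed, write it out from the executors
# instead of returning it to the Driver
df.repartition(10000).write.mode('overwrite').csv('/spark/gps/output')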

 

2. Too many executor cores: multiple cores compete with each other for resources and GC time, so most of the time ends up being spent on GC.

The solution is to reduce the number of cores:

spark-submit --master yarn --executor-cores 1  bigdata.py
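
The two fixes are independent and can be combined in a single submit command; the memory values below are illustrative and should be sized to the cluster:

spark-submit --master yarn --driver-memory 3G --executor-memory 4G --executor-cores 1 bigdata.py

Fewer cores per executor means fewer concurrent tasks sharing the same executor heap, which reduces GC pressure; raising --executor-memory works in the same direction and can be used together with fewer cores.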

 

 

 

 

