SparkSQL job fails when reading a Parquet table

Cluster memory: 1024 GB (data volume: 400 GB)
(1) Error message:
Job aborted due to stage failure: Serialized task 2231:2304 was 637417604 bytes, which exceeds max allowed: spark.rpc.message.maxSize (134217728 bytes). Consider increasing spark.rpc.message.maxSize or using broadcast variables for large values.

(2) Cause:
The serialized task data the driver sends to the executors is too large, exceeding Spark's default RPC message size limit (spark.rpc.message.maxSize, 128 MB by default).
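
One common way to hit this limit (a hypothetical reproduction, not the job from this post) is parallelizing a large driver-side collection: each partition of the resulting RDD carries its slice of the data inside the task, so the serialized tasks the driver sends to the executors become very large.

```scala
// Hypothetical reproduction of the error, not the original job.
import org.apache.spark.sql.SparkSession

object BigTaskRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("BigTaskRepro").getOrCreate()
    val sc = spark.sparkContext

    // Roughly 600 MB of data created on the driver.
    val bigLocal: Seq[Array[Byte]] = Seq.fill(600)(new Array[Byte](1 << 20))

    // parallelize() embeds each slice of bigLocal in the corresponding task,
    // so with 2 slices each serialized task is ~300 MB and exceeds the
    // default spark.rpc.message.maxSize of 128 MB.
    val rdd = sc.parallelize(bigLocal, numSlices = 2)
    println(rdd.map(_.length.toLong).reduce(_ + _))

    spark.stop()
  }
}
```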

(3) Solution:
Increase the limit by adding the configuration spark.rpc.message.maxSize=1024 (the value is in MB, so this raises the limit to 1 GB).
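
If the job builds its own SparkSession, the same limit can also be raised in code; a minimal sketch, assuming a job class like the one submitted below (the class name and table path are illustrative, and the setting has to be in place before the SparkContext starts, so it is passed to the builder):

```scala
// Sketch only: class name and table path are illustrative.
import org.apache.spark.sql.SparkSession

object ReadParquetJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ReadParquetJob")
      .config("spark.rpc.message.maxSize", "1024") // value in MB; default is 128
      .getOrCreate()

    val df = spark.read.parquet("/path/to/parquet_table")
    println(df.count())

    spark.stop()
  }
}
```

When submitting with spark2-submit, the same setting is passed with --conf, as in the command below: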

```bash
spark2-submit \
--class com.lhx.test \
--master yarn \
--deploy-mode cluster \
--conf spark.rpc.message.maxSize=1024 \
--driver-memory 30g \
--executor-memory 12g \
--num-executors 12 \
--executor-cores 3 \
--conf spark.yarn.driver.memoryOverhead=4096m \
--conf spark.yarn.executor.memoryOverhead=4096m \
./test.jar
```
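
The error message also suggests broadcast variables for large values. A hedged sketch of that approach, assuming the oversized payload is a driver-side lookup table (the names dimTable, dim_id and the path below are illustrative): broadcasting it once lets executors fetch it in chunks instead of receiving a copy inside every serialized task.

```scala
import org.apache.spark.sql.SparkSession

object BroadcastExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("BroadcastExample").getOrCreate()
    import spark.implicits._

    // Hypothetical large driver-side lookup map.
    val dimTable: Map[Long, String] =
      (1L to 1000000L).map(i => i -> s"dim_$i").toMap

    // Broadcast once; tasks reference the broadcast handle instead of the data.
    val dimBc = spark.sparkContext.broadcast(dimTable)

    val factDf = spark.read.parquet("/path/to/parquet_table")
    val enriched = factDf.map { row =>
      val key = row.getAs[Long]("dim_id")
      (key, dimBc.value.getOrElse(key, "unknown"))
    }
    enriched.show(10)

    spark.stop()
  }
}
```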


Reposted from blog.csdn.net/lhxsir/article/details/85337399