Spark "Lost executor" errors

Problem 1

19/06/17 09:50:52 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Requesting driver to remove executor 2 for reason Container marked as failed: container_1560518528256_0014_01_000003 on host: hadoop-master. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Killed by external signal
19/06/17 09:50:52 ERROR cluster.YarnScheduler: Lost executor 2 on hadoop-master: Container marked as failed: container_1560518528256_0014_01_000003 on host: hadoop-master. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Killed by external signal
19/06/17 09:50:52 WARN scheduler.TaskSetManager: Lost task 22.0 in stage 0.0 (TID 17, hadoop-master, executor 2): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Container marked as failed: container_1560518528256_0014_01_000003 on host: hadoop-master. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Killed by external signal

19/06/17 09:50:52 WARN scheduler.TaskSetManager: Lost task 21.0 in stage 0.0 (TID 16, hadoop-master, executor 2): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Container marked as failed: container_1560518528256_0014_01_000003 on host: hadoop-master. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Killed by external signal

When Spark runs on YARN and an executor asks for more resources than YARN is able to schedule, YARN kills the executor, which surfaces as a lost executor.
In most cases the cause is an executor-memory or executor-cores setting that exceeds the upper limit of resources (memory or CPU cores) YARN can allocate.
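
The ceilings involved live in YARN's configuration (yarn-site.xml). The keys below are the standard ones to check; suitable values depend entirely on the cluster, so none are suggested here:

yarn.nodemanager.resource.memory-mb (total memory YARN may allocate on one node)
yarn.nodemanager.resource.cpu-vcores (total CPU vcores YARN may allocate on one node)
yarn.scheduler.maximum-allocation-mb (largest memory request allowed for a single container)
yarn.scheduler.maximum-allocation-vcores (largest vcore request allowed for a single container)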

For example: 3 servers, each with 32 cores and 64GB of memory

yarn.scheduler.maximum-allocation-mb = 68G

Settings:

num-executors = 30 (10 executors run on each of the 3 nodes)
executor-memory = 6G (6G of memory per executor)
executor-cores = 5 (5 cores per executor)

Memory used by executors on each node: 10 * 6.5G (6G heap plus roughly 512M of off-heap overhead) = 65G, within the limit.
Cores used by executors on each node: 10 * 5 = 50, which exceeds the 32-core limit, hence the error.

Changing executor-cores = 3 (10 * 3 = 30 cores per node, within the limit) solved the problem.

Memory sizing problems are handled the same way.
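
As a minimal sketch, the corrected sizing from this example would be submitted roughly like this (the class name and jar path are placeholders, not from the original job):

# 30 executors over 3 nodes = 10 per node: 10 * 6.5G = 65G of memory and 10 * 3 = 30 cores per node,
# both of which fit within the 32-core / 64GB nodes described above
spark-submit \
--master yarn \
--deploy-mode client \
--num-executors 30 \
--executor-memory 6G \
--executor-cores 3 \
--class com.example.YourApp \
/path/to/your-app.jar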

Problem 2

19/10/25 10:25:14 ERROR cluster.YarnScheduler: Lost executor 9 on cdh-master: Container killed by YARN for exceeding memory limits. 9.5 GB of 9 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead or disabling yarn.nodemanager.vmem-check-enabled because of YARN-4714.
19/10/25 10:25:14 WARN scheduler.TaskSetManager: Lost task 0.1 in stage 7.0 (TID 690, cdh-master, executor 9): ExecutorLostFailure (executor 9 exited caused by one of the running tasks) Reason: Container killed by YARN for exceeding memory limits. 9.5 GB of 9 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead or disabling yarn.nodemanager.vmem-check-enabled because of YARN-4714.

The problem is clear from the message: the container exceeded its physical memory limit.

sudo -uhdfs spark-submit \
--class com.sm.analysis.AnalysisRetained \
--master yarn \
--deploy-mode client \
--driver-memory 3G \
--driver-cores 3 \
--num-executors 3 \
--executor-memory 8g \
--executor-cores 5 \
--jars /usr/java/jdk1.8.0_211/lib/mysql-connector-java-5.1.47.jar \
--conf spark.default.parallelism=30 \
/data4/liujinhe/tmp/original-analysis-1.0-SNAPSHOT.jar

The executor was originally given 8G, plus about 1G of off-heap overhead memory (executor-memory x 0.07 = 8 x 1024m x 0.07 ≈ 573m, rounded up in 512m increments to 1024m), for a 9G container. 9G was not enough, so the executor memory was increased to 10G.
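
Applied to the submit command above, the author's fix is a one-option change; the alternative suggested by the log message, boosting spark.yarn.executor.memoryOverhead, is sketched afterwards with an illustrative value:

sudo -uhdfs spark-submit \
--class com.sm.analysis.AnalysisRetained \
--master yarn \
--deploy-mode client \
--driver-memory 3G \
--driver-cores 3 \
--num-executors 3 \
--executor-memory 10g \
--executor-cores 5 \
--jars /usr/java/jdk1.8.0_211/lib/mysql-connector-java-5.1.47.jar \
--conf spark.default.parallelism=30 \
/data4/liujinhe/tmp/original-analysis-1.0-SNAPSHOT.jar

Alternatively, keep --executor-memory 8g and add, for example, --conf spark.yarn.executor.memoryOverhead=2048 (value in MB; 2048 is only illustrative) to enlarge just the off-heap portion that the container was exceeding.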

Reprinted from blog.csdn.net/qq_32727095/article/details/113740962