Spark "Lost executor" errors on YARN

Problem one

19/06/17 09:50:52 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Requesting driver to remove executor 2 for reason Container marked as failed: container_1560518528256_0014_01_000003 on host: hadoop-master. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Killed by external signal
19/06/17 09:50:52 ERROR cluster.YarnScheduler: Lost executor 2 on hadoop-master: Container marked as failed: container_1560518528256_0014_01_000003 on host: hadoop-master. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Killed by external signal
19/06/17 09:50:52 WARN scheduler.TaskSetManager: Lost task 22.0 in stage 0.0 (TID 17, hadoop-master, executor 2): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Container marked as failed: container_1560518528256_0014_01_000003 on host: hadoop-master. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Killed by external signal

19/06/17 09:50:52 WARN scheduler.TaskSetManager: Lost task 21.0 in stage 0.0 (TID 16, hadoop-master, executor 2): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Container marked as failed: container_1560518528256_0014_01_000003 on host: hadoop-master. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Killed by external signal

When Spark runs on YARN and an executor's container requests more resources than YARN can schedule, YARN kills the container and the executor is lost (exit code 143 is 128 + SIGTERM, i.e. the container was killed by an external signal, which matches the "Killed by external signal" lines above).
In most cases executor-memory or executor-cores is set unreasonably high and exceeds the upper limit of YARN's schedulable resources (memory or CPU cores).
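
To see what a node can actually schedule before tuning, the YARN CLI is handy. A minimal sketch; the node ID below is a placeholder, use whatever yarn node -list prints for your cluster:

# list NodeManagers and their node IDs
yarn node -list
# show one node's total and used memory/vcores (replace with a real node ID)
yarn node -status hadoop-master:45454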

For example: a cluster of 3 servers, each with 32 cores and 64 GB of memory

yarn.scheduler.maximum-allocation-mb = 68G
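
For reference, these are the YARN properties that bound container and node resources, in the same key = value style (the values here are illustrative, chosen to match the 68 GB example; note that yarn.scheduler.maximum-allocation-mb is specified in MB and caps a single container, while the yarn.nodemanager.resource.* properties cap a whole node):

yarn.scheduler.maximum-allocation-mb = 69632 (largest single container, in MB)
yarn.scheduler.maximum-allocation-vcores = 32 (most vcores one container may request)
yarn.nodemanager.resource.memory-mb = 69632 (memory schedulable on one node, in MB)
yarn.nodemanager.resource.cpu-vcores = 32 (vcores schedulable on one node)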

Settings:

num-executors = 30 (10 executors run on each node)
executor-memory = 6G (6 GB of memory per executor)
executor-cores = 5 (5 cores per executor)

Memory used by executors on each node: 10 × 6.5 GB (6 GB heap plus roughly 0.5 GB of off-heap overhead) = 65 GB, which stays within the limit.
Cores used by executors on each node: 10 × 5 = 50, which exceeds the 32-core limit, so YARN kills containers and raises the error.

Reducing executor-cores to 3 (10 × 3 = 30 cores per node, within the 32-core limit) solves the problem.

An over-allocated memory setting is diagnosed the same way: multiply the executors per node by the per-executor container size (heap plus overhead) and compare against the node's limit. A corrected submission is sketched below.
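
Putting the two fixes together, a corrected submission for this cluster might look like the following (the application class and jar are placeholders; the resource figures come from the example above):

spark-submit \
--master yarn \
--deploy-mode client \
--num-executors 30 \
--executor-memory 6G \
--executor-cores 3 \
--class com.example.YourApp \
your-application.jar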

Problem two

19/10/25 10:25:14 ERROR cluster.YarnScheduler: Lost executor 9 on cdh-master: Container killed by YARN for exceeding memory limits. 9.5 GB of 9 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead or disabling yarn.nodemanager.vmem-check-enabled because of YARN-4714.
19/10/25 10:25:14 WARN scheduler.TaskSetManager: Lost task 0.1 in stage 7.0 (TID 690, cdh-master, executor 9): ExecutorLostFailure (executor 9 exited caused by one of the running tasks) Reason: Container killed by YARN for exceeding memory limits. 9.5 GB of 9 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead or disabling yarn.nodemanager.vmem-check-enabled because of YARN-4714.

The message is explicit: the container ran out of physical memory, using 9.5 GB against a 9 GB limit. The job was submitted as follows:

sudo -uhdfs spark-submit \
--class com.sm.analysis.AnalysisRetained \
--master yarn \
--deploy-mode client \
--driver-memory 3G \
--driver-cores 3 \
--num-executors 3 \
--executor-memory 8g \
--executor-cores 5 \
--jars /usr/java/jdk1.8.0_211/lib/mysql-connector-java-5.1.47.jar \
--conf spark.default.parallelism=30 \
/data4/liujinhe/tmp/original-analysis-1.0-SNAPSHOT.jar

Originally each executor was allocated 8 GB of heap, plus off-heap overhead of 8 × 1024 MB × 0.07 ≈ 573 MB; YARN rounds container requests up in 512 MB increments, so the overhead effectively becomes 1024 MB and the container totals 9 GB. Since 9 GB was not enough, raise the allocation to 10 GB.
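
A revised submission reflecting the fix might look like this (the same job as above with executor-memory raised to 10g; alternatively, keep the 8g heap and raise the overhead explicitly with --conf spark.yarn.executor.memoryOverhead=2048, which is what the log message suggests):

sudo -uhdfs spark-submit \
--class com.sm.analysis.AnalysisRetained \
--master yarn \
--deploy-mode client \
--driver-memory 3G \
--driver-cores 3 \
--num-executors 3 \
--executor-memory 10g \
--executor-cores 5 \
--jars /usr/java/jdk1.8.0_211/lib/mysql-connector-java-5.1.47.jar \
--conf spark.default.parallelism=30 \
/data4/liujinhe/tmp/original-analysis-1.0-SNAPSHOT.jar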

Source: blog.csdn.net/qq_32727095/article/details/113740962