Container killed by YARN for exceeding memory limits

After submitting a job to YARN, we often run into an out-of-memory error like this one:
ExecutorLostFailure (executor 7 exited caused by one of the running tasks) Reason: Container killed by YARN for exceeding memory limits. 11.1 GB of 11 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
This error kills the job. It means the cluster (YARN) killed the container because it exceeded its memory limit.

Cause of the problem
1. Data skew or a similar problem puts too much memory load on one container, which then fails.
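A minimal pure-Scala sketch (no Spark dependency) of why skew overloads one container. It assigns records to partitions by key hash, roughly the way Spark's HashPartitioner does (non-negative hashCode modulo the partition count), and counts the records per partition. The data set and the names `SkewDemo`, `partitionOf` and `partitionSizes` are hypothetical.

```scala
object SkewDemo {
  // Non-negative modulo, so negative hash codes still map into [0, numPartitions)
  def partitionOf(key: Any, numPartitions: Int): Int = {
    val raw = key.hashCode % numPartitions
    if (raw < 0) raw + numPartitions else raw
  }

  // Count how many records land in each partition when partitioning by key
  def partitionSizes(keys: Seq[Any], numPartitions: Int): Vector[Int] = {
    val counts = Array.fill(numPartitions)(0)
    keys.foreach(k => counts(partitionOf(k, numPartitions)) += 1)
    counts.toVector
  }

  // Hypothetical skewed data set: one "hot" key holds 90% of the records
  val skewedKeys: Seq[Any] =
    Seq.fill(9000)("hot") ++ (1 to 1000).map(i => s"key-$i")
}
```

With 8 partitions, `SkewDemo.partitionSizes(SkewDemo.skewedKeys, 8).max` is at least 9000: every "hot" record hashes to the same partition, so one task holds about 90% of the data no matter how many partitions there are.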
[Figure: container memory allocation]

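Suggested fixes
1. Increase the number of partitions, or change the partitioning logic itself (in the source code) so the data is no longer skewed. This is done in the job's Scala code.
Reference: https://stackoverflow.com/questions/38799753/how-to-balance-my-data-across-the-partitions
2. For Spark SQL jobs, however, the task layout is usually hard to control. In that case, raise spark.yarn.executor.memoryOverhead as high as the cluster allows; 4096 (MB) is a reasonable value to try. This value is usually chosen as a power of 2.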
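For the first fix, one widely used way to change the partitioning logic for a hot key (not necessarily what the linked answer proposes) is key salting: append a suffix so one hot key becomes several keys spread over several partitions, aggregate per salted key, then merge the partial results per original key. A minimal pure-Scala sketch with hypothetical names (`SaltingDemo`, `salt`, `unsalt`, `countByKey`); it uses a round-robin suffix to stay deterministic, whereas real jobs often use a random one.

```scala
object SaltingDemo {
  // Same non-negative hash partitioning as a hash partitioner would use
  def partitionOf(key: Any, numPartitions: Int): Int = {
    val raw = key.hashCode % numPartitions
    if (raw < 0) raw + numPartitions else raw
  }

  // Append a suffix in [0, saltFactor) so one hot key becomes saltFactor distinct keys
  def salt(keys: Seq[String], saltFactor: Int): Seq[String] =
    keys.zipWithIndex.map { case (k, i) => s"$k#${i % saltFactor}" }

  // Remove the suffix added by salt (the separator is the last '#')
  def unsalt(saltedKey: String): String =
    saltedKey.substring(0, saltedKey.lastIndexOf('#'))

  def partitionSizes(keys: Seq[String], numPartitions: Int): Vector[Int] = {
    val counts = Array.fill(numPartitions)(0)
    keys.foreach(k => counts(partitionOf(k, numPartitions)) += 1)
    counts.toVector
  }

  // Two-stage count: count per salted key, then merge partial counts per original key
  def countByKey(keys: Seq[String], saltFactor: Int): Map[String, Int] = {
    val partial = salt(keys, saltFactor).groupBy(identity).map { case (k, vs) => k -> vs.size }
    partial.toSeq
      .groupBy { case (k, _) => unsalt(k) }
      .map { case (k, kvs) => k -> kvs.map(_._2).sum }
  }
}
```

Salting 9000 "hot" records with a factor of 8 yields the keys hot#0 to hot#7 with 1125 records each; with 8 partitions they land in 8 different partitions, and `countByKey` still reports 9000 for "hot".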

Note: executor-memory in Spark
The memory an executor needs at run time is set with --executor-memory,
for example: spark-shell --executor-memory ***G
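Putting the executor memory and the overhead from fix 2 together, here is a minimal sketch that only builds the argument list for spark-shell or spark-submit. `--master`, `--executor-memory` and `--conf key=value` are standard options of both; the helper name `overheadSubmitArgs` and the values are hypothetical. From Spark 2.3 on, the overhead key is spark.executor.memoryOverhead; the spark.yarn.executor.memoryOverhead name used in this note is deprecated but still accepted, so the key is a parameter here.

```scala
object SubmitArgs {
  // Build spark-submit / spark-shell arguments that set executor memory
  // and an explicit memory overhead (in MB) for each executor on YARN.
  def overheadSubmitArgs(
      executorMemory: String,
      overheadMb: Int,
      overheadKey: String = "spark.yarn.executor.memoryOverhead"): Seq[String] =
    Seq(
      "--master", "yarn",
      "--executor-memory", executorMemory,
      "--conf", s"$overheadKey=$overheadMb"
    )
}
```

For example, `SubmitArgs.overheadSubmitArgs("8g", 4096).mkString(" ")` yields `--master yarn --executor-memory 8g --conf spark.yarn.executor.memoryOverhead=4096`.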

Parameters that affect it:
yarn.scheduler.maximum-allocation-mb
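This is the maximum memory a single container can request, and it is usually configured cluster-wide. Spark executors run inside containers, so the container maximum directly caps the memory an executor can use. If you request more memory than this, the log reports an error that also prints the value of this parameter; in the figure below it is 6144 MB, i.e. 6 GB. On this cluster CM (Cloudera Manager) sets it to 100 GB; on the earlier Ambari setup it was 10 GB. The exact values are still to be looked into (#todo).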

spark.yarn.executor.memoryOverhead
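While it runs, an executor may use more memory than executor-memory, so an extra amount is reserved for it; spark.yarn.executor.memoryOverhead is the size of that reserve. If the parameter is not set, it is computed automatically (in ClientArguments.scala) as: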

executorMemoryOverhead = max(MEMORY_OVERHEAD_FACTOR * executorMemory, MEMORY_OVERHEAD_MIN)

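Here MEMORY_OVERHEAD_FACTOR defaults to 0.1, executorMemory is the configured executor-memory, and MEMORY_OVERHEAD_MIN defaults to 384 MB. MEMORY_OVERHEAD_FACTOR and MEMORY_OVERHEAD_MIN generally cannot be changed through configuration, because they are hard-coded in the Spark source.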
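The default can be written out directly. A minimal Scala sketch using the values given above (0.1 and 384 MB); the object name is hypothetical and all values are in MB. It also shows where the two terms cross: below 3840 MB of executor memory the 384 MB floor applies, above it the 10% term does.

```scala
object DefaultOverhead {
  // Values as described in this note
  val MEMORY_OVERHEAD_FACTOR = 0.10
  val MEMORY_OVERHEAD_MIN = 384 // MB

  // Overhead used when spark.yarn.executor.memoryOverhead is not set (MB in, MB out)
  def overheadMb(executorMemoryMb: Int): Int =
    math.max((MEMORY_OVERHEAD_FACTOR * executorMemoryMb).toInt, MEMORY_OVERHEAD_MIN)
}
```

For example, 2048 MB gives 384, 4096 MB gives 409 (409.6 truncated by `toInt`) and 10240 MB gives 1024.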

Computing the total executor memory (executorMem)

val executorMem = args.executorMemory + executorMemoryOverhead

1) If spark.yarn.executor.memoryOverhead is not set (X = executor-memory in MB):
executorMem = X + max(X * 0.1, 384)
2) If spark.yarn.executor.memoryOverhead is set (an integer, in MB):
executorMem = X + spark.yarn.executor.memoryOverhead
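Both cases fold into one function. A minimal sketch with hypothetical names, all values in MB; `explicitOverheadMb` stands for spark.yarn.executor.memoryOverhead when it is set. As a worked example, executor-memory 10g gives 10240 + max(1024, 384) = 11264 MB, exactly 11 GB, which would be consistent with the 11 GB limit in the error at the top, although the log does not say how that job was configured.

```scala
object ContainerRequest {
  // Default overhead (MB) when spark.yarn.executor.memoryOverhead is unset
  def defaultOverheadMb(executorMemoryMb: Int): Int =
    math.max((0.10 * executorMemoryMb).toInt, 384)

  // executorMem = X + overhead; the explicit setting wins when present (MB)
  def executorMemMb(executorMemoryMb: Int, explicitOverheadMb: Option[Int] = None): Int =
    executorMemoryMb + explicitOverheadMb.getOrElse(defaultOverheadMb(executorMemoryMb))
}
```

Setting the overhead to 4096 for the same executor raises the request to 10240 + 4096 = 14336 MB (14 GB).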

executorMem must satisfy:
executorMem <= yarn.scheduler.maximum-allocation-mb
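A minimal sketch of the condition, turned around to answer a practical question: what is the largest executor-memory that still fits? It treats the limit as inclusive (a request equal to yarn.scheduler.maximum-allocation-mb is accepted) and reuses the formula above; names are hypothetical and all values are in MB. It simply scans downward, which is fine for numbers of this size.

```scala
object MaxExecutorMemory {
  def defaultOverheadMb(executorMemoryMb: Int): Int =
    math.max((0.10 * executorMemoryMb).toInt, 384)

  def executorMemMb(executorMemoryMb: Int, explicitOverheadMb: Option[Int]): Int =
    executorMemoryMb + explicitOverheadMb.getOrElse(defaultOverheadMb(executorMemoryMb))

  // Largest executor-memory (MB) whose total request still fits under
  // yarn.scheduler.maximum-allocation-mb; None if not even 1 MB fits.
  // The request grows with X, so the first fit found scanning down is the largest.
  def largestFitting(maxAllocationMb: Int, explicitOverheadMb: Option[Int] = None): Option[Int] =
    (maxAllocationMb to 1 by -1).find(x => executorMemMb(x, explicitOverheadMb) <= maxAllocationMb)
}
```

With the 6144 MB limit mentioned earlier and the default overhead, the answer is 5586 MB (5586 + 558 = 6144); with an explicit overhead of 4096 MB it drops to 2048 MB.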
