Spark --如何合理地设置executor-memory、executor-cores、num-executors

文章目录

参数介绍
以下4点建议需要牢记
配置参数

方法一：Tiny executors（One Executor per core）
方法二：Fat executors (One Executor per node)
方法三：Balance between Fat (vs) Tiny
方法四：在方法三基础上每个executor不需要这么多内存

参考网址

参数介绍

executor-memory 表示分配给每个executor的内存，默认是1G。
executor-cores 表示分配给每个executor的core数即核数，在spark on yarn模式下默认是1。
num-executors 表示为当前application启动的executor的数目。如果开启了动态分配(spark.dynamicAllocation.enabled默认是false)，则初始启动的executor的数目就是num-executors设置的数目。

以下4点建议需要牢记

Hadoop/Yarn/OS Deamons：当我们在cluster manager如yarn集群上跑spark任务时，后台会有一些进程在执行如NameNode、Secondary NameNode、DataNode、ResourceManager、NodeManager等。所以在确定上面3个参数前，我们需要确保为那些后台进程留下足够的cores(一般每个节点留一个core)。
Yarn ApplicationMaster (AM)：ApplicationMaster负责从ResourceManager申请资源并且和NodeManagers一起执行和监控containers和它们的资源消耗。如果我们是spark on yarn模式，那么我们需要为ApplicationMaster预留一些资源(1G和1个Executor)。
HDFS Throughput：HDFS client会有多线程的并发问题。研究表明每个Executor同时执行5个task时HDFS的写的吞吐量是最好的。所以，建议将executor-cores的值保持在5或5以下。
MemoryOverhead：我们以spark on yarn且deploy-mode=cluster为例。在2.3版本前，
是通过spark.yarn.executor.memoryOverhead来定义executor的memoryOverhead的。官方给的比例是10%，一般是在6%到10%之间，大多数都选择了7%。具体可参考1. 6.3-spark-properties

Full memory requested to yarn per executor =
          spark-executor-memory + spark.yarn.executor.memoryOverhead.
 spark.yarn.executor.memoryOverhead = 
        	Max(384MB, 7% of spark-executor-memory)

在2.3版本后，是用spark.executor.memoryOverhead来定义的。其中memoryOverhead是用于VM overheads, interned strings, other native overheads等。

执行的Executor分配有很大内存时通常会造成很大的GC延迟。
执行的Executor很小(如一个core和运行一个task所需要的足够的内存)，这样在一个JVM里(每个Executor就是一个独立的JVM实例)能同时运行多个task的优势就丢失了。

配置参数

首先假设集群配置如下

10 Nodes
16 cores per Node
64GB RAM per Node

方法一：Tiny executors（One Executor per core）

--num-executors =  16 x 10 = 160
--executor-cores = 1 (one executor per core)
--executor-memory = `mem-per-node/num-executors-per-node`
                     = 64GB/16 = 4GB

在这种One Executor per core的配置下，我们无法利用在一个JVM里同时运行多个task的优势。同时，共享变量如广播变量(broadcast variables)和累加器(accumulators)会在每个节点上的每个core上复制一次即每个节点上会复制16次(因为共享变量会复制到每个executor上)。而且我们也没有预留足够的资源给后台进程如NameNode等，也没有把ApplicationManager的资源算上去。非常不好！！！

方法二：Fat executors (One Executor per node)

--num-executors =  10
--executor-cores = 16 
--executor-memory = `mem-per-node/num-executors-per-node`
                     = 64GB/1 = 64GB

每个Executor获取到节点上的所有core，除开ApplicationManager和后台进程不谈，HDFS的吞吐量也会收到影响且会伴随着验证的GC。同样也是非常不好的！！！

方法三：Balance between Fat (vs) Tiny

第一，根据上面的参数建议，我们给每个Executor分配5个core即executor-cores=5，这样对HDFS的吞吐量会比较友好。
第二，为后台进程留一个core，则每个节点可用的core数是16 - 1 = 15。所以集群总的可用core数是15 x 10 = 150。
第三，每个节点上的Executor数就是 15 / 5 = 3，集群总的可用的Executor数就是 3 * 10 = 30 。为ApplicationManager留一个Executor，则num-executors=29。
第四，每个节点上每个Executor可分配的内存是 (64GB-1GB) / 3 = 21GB(减去的1GB是留给后台程序用)，除去MemoryOverHead=max(384MB, 7% * 21GB)=2GB，所以executor-memory=21GB - 2GB = 19GB。
所以最后的参数配置是

--num-executors =  29
--executor-cores = 5 
--executor-memory = 19GB

此方法既保证了在一个JVM实例里能同时执行task的优势，也保证了hdfs的吞吐量。

方法四：在方法三基础上每个executor不需要这么多内存

按照方法三，每个Executor分配到的内存是19GB，假设10GB内存就够用了。那么此时我们可以将executor-cores降低如降低为3，那么每个节点就可以有15 / 3 = 5个Executor，那么总共可以获得的Executor数就是 (5 * 10) - 1 =49，每个节点上每个Executor可分配的内存是(64GB-1GB) / 5 = 12GB，除去
MemoryOverHead=max(384MB, 7% * 12GB)=1GB，所以executor-memory=12GB - 1GB = 11GB
所以最后的参数配置是

--num-executors =  49
--executor-cores = 3 
--executor-memory = 11GB

参考网址

distribution_of_executors_cores_and_memory_for_spark_application
how-to-tune-spark-executor-number-cores-and-executor-memory
resource-allocation-configuration-spark-yarn

patrick_wang_bigdata

发布了22 篇原创文章 · 获赞 0 · 访问量 906

私信关注