Spark articles: Spark on YARN memory management and tuning summary (Removing executor 5 with no recent heartbeats: 120504 ms exceeds timeout 120000 ms)

This article aims to explain Spark on YARN memory management, so that the reasoning behind Spark tuning becomes clearer.

 

Memory-related parameters

Spark is a memory-based compute engine, and most Spark tuning is memory tuning. Understanding Spark's memory parameters also helps us understand Spark memory management (a short example of setting them follows the list below).

  • spark.driver.memory: default 512M
  • spark.executor.memory: default 512M
  • spark.yarn.am.memory: default 512M
  • spark.yarn.driver.memoryOverhead: driver memory * 0.10, with a minimum of 384M
  • spark.yarn.executor.memoryOverhead: executor memory * 0.10, with a minimum of 384M
  • spark.yarn.am.memoryOverhead: AM memory * 0.10, with a minimum of 384M
  • executor-cores: an executor is roughly a process; cores are the threads within that process
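
For illustration, a minimal Scala sketch of setting some of these parameters in code. The values are placeholders; in practice they are usually passed to spark-submit (e.g. --executor-memory 3g), and spark.driver.memory in particular must be set before the driver JVM starts, so it cannot reliably be set from application code.

import org.apache.spark.sql.SparkSession

// Sketch only, placeholder values
val spark = SparkSession.builder()
  .appName("memory-params-demo")
  .config("spark.executor.memory", "3g")                // executor JVM heap
  .config("spark.executor.cores", "2")                  // threads per executor
  .config("spark.yarn.executor.memoryOverhead", "512")  // off-heap overhead, in MB
  .getOrCreate()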

 

Memory resolve

spark.xxx.memory / --xxx-memory sets the size of the JVM heap; the JVM process itself also needs some memory beyond the heap, and that part is governed by spark.yarn.xxx.memoryOverhead. The container requested from YARN must be large enough to hold both, as sketched below.
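
To make the relationship concrete, here is a minimal Scala sketch of the rule described above (an assumed simplification, not Spark's actual source code): the overhead is 10% of the requested memory, with a floor of 384M, and the container must hold the heap plus that overhead.

// Sketch only, all values in MB
def overheadMb(heapMb: Long): Long = math.max((heapMb * 0.10).toLong, 384L)

// memory the YARN container must hold for one executor / driver / AM
def requestedMb(heapMb: Long): Long = heapMb + overheadMb(heapMb)

// requestedMb(3072) == 3456   (3G heap + 384M overhead)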

 

Memory Allocation

To make better use of memory for Spark, we usually set the following parameters on the YARN cluster [not strictly required]:

<property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>106496</value> <!-- 104G -->
</property>
<property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>2048</value>
</property>
<property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>106496</value>
</property>
<property>
    <name>yarn.app.mapreduce.am.resource.mb</name>
    <value>2048</value>
</property>
  • yarn.app.mapreduce.am.resource.mb: memory requested for the ApplicationMaster
  • yarn.nodemanager.resource.memory-mb: total memory a NodeManager can allocate to containers
  • yarn.scheduler.minimum-allocation-mb: minimum memory a container can request when scheduled
  • yarn.scheduler.maximum-allocation-mb: maximum memory a container can request when scheduled

 

yarn.scheduler.minimum-allocation-mb is the basic allocation unit of container memory, i.e. a container's memory must be an integer multiple of yarn.scheduler.minimum-allocation-mb.

For example, with yarn.scheduler.minimum-allocation-mb set to 2G (2048M):

If the application requests 512M, then 512 + 384 = 896M < 2048M, so 2G is allocated.

If the application requests 3G, then 3072 + 384 = 3456M < 4096M, so 4G is allocated.

If the application requests 6G, then 6144 + 614 = 6758M < 8192M, so 8G is allocated [the overhead is max(6144 * 0.1, 384) = 614].

So when --executor-memory is set to 3G, the container's actual memory is not 3G (it is 4G in the example above).
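
Continuing the sketch above (using the requestedMb helper defined there), the following reproduces the three examples by rounding the request up to a multiple of yarn.scheduler.minimum-allocation-mb; this is an assumed simplification of YARN's behaviour, not its actual scheduler code.

// Round (heap + overhead) up to the next multiple of yarn.scheduler.minimum-allocation-mb
def containerMb(heapMb: Long, minAllocMb: Long = 2048L): Long = {
  val needed = requestedMb(heapMb)                              // heap + overhead, in MB
  math.ceil(needed.toDouble / minAllocMb).toLong * minAllocMb   // integer multiple of minAllocMb
}

// containerMb(512)  == 2048   (512  + 384 = 896  -> 2G)
// containerMb(3072) == 4096   (3072 + 384 = 3456 -> 4G)
// containerMb(6144) == 8192   (6144 + 614 = 6758 -> 8G)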

 

Common problems

The common problems boil down to running out of memory or containers being killed.

 

Conventional approach

1. First, consider increasing the total memory    [this does not solve every problem]

2. Second, consider data skew: with skewed data, some tasks run out of memory while other tasks have memory to spare

  // the simplest fix is repartition [this does not solve every problem either]; see the sketch after this list

3. Consider increasing the memory available to each task

  // reduce the number of executors (each one gets a larger share of memory)

  // reduce executor-cores (fewer tasks share one executor's memory)
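
As a hedged illustration of the repartition approach from point 2, a minimal Scala sketch; the input/output paths and the partition count are placeholders, not recommendations.

import org.apache.spark.sql.SparkSession

// Spread a skewed dataset across more partitions so that no single
// task has to hold a disproportionate share of the data in memory.
val spark = SparkSession.builder().appName("repartition-demo").getOrCreate()
val df = spark.read.parquet("/path/to/input")   // placeholder input path
val evenly = df.repartition(400)                // placeholder partition count
evenly.write.parquet("/path/to/output")         // placeholder output path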

 

Notes on parameter settings

executor-memory

If set too large, GC pauses become long; 64G is a commonly recommended upper limit [depending on your hardware, you can find a suitable cap]

executor-cores

1. If set too large, parallelism is high and network bandwidth is easily saturated, especially when reading data from HDFS or returning/collecting data to the driver

2. If set too large, multiple cores compete for the executor's memory and GC resources, so most of the time ends up being spent on GC

 

 

 

References:

https://www.cnblogs.com/saratearing/p/5813403.html#top

https://blog.csdn.net/pearl8899/article/details/80368018


https://blog.cloudera.com/how-to-tune-your-apache-spark-jobs-part-2/ (English blog)
