Understanding the memory parameters of MapReduce under YARN

This article revisits and clarifies the meaning of the memory parameters used by MapReduce on YARN.

What is a Container?

A container is a YARN Java process. In MapReduce, the ApplicationMaster (AM), MapTasks, and ReduceTasks are all executed as containers within the YARN framework. You can see the status of containers on the RM web page.

Basics

YARN's ResourceManager (referred to as RM below) allocates memory, CPU, and other resources to applications through logical queues. By default, the RM allows an AM to request container resources of at most 8192MB (yarn.scheduler.maximum-allocation-mb), and the minimum allocation is 1024MB (yarn.scheduler.minimum-allocation-mb). The AM can only request resources from the RM in multiples of yarn.scheduler.minimum-allocation-mb, and a request must not exceed yarn.scheduler.maximum-allocation-mb. The AM is responsible for rounding the mapreduce.map.memory.mb and mapreduce.reduce.memory.mb values up to multiples of yarn.scheduler.minimum-allocation-mb; the RM will refuse resource requests whose memory exceeds 8192MB or is not divisible by 1024MB.

Related parameters

YARN

  • yarn.scheduler.minimum-allocation-mb
  • yarn.scheduler.maximum-allocation-mb
  • yarn.nodemanager.vmem-pmem-ratio
  • yarn.nodemanager.resource.memory-mb

MapReduce

Map Memory

  • mapreduce.map.java.opts
  • mapreduce.map.memory.mb

Reduce Memory

  • mapreduce.reduce.java.opts
  • mapreduce.reduce.memory.mb

[Figure: Yarn_mem_params — memory limits of map, reduce, and AM containers]

As can be seen from the figure above, each map, reduce, and AM container runs a JVM. The "JVM" rectangle represents the service process, while the "Max heap" and "Max virtual" rectangles represent the maximum heap and virtual-memory limits that the NodeManager enforces on the JVM process.

Take a map container whose memory allocation (mapreduce.map.memory.mb) is set to 1536 as an example. The AM will request a 2048MB container from the RM for it, because the minimum allocation unit (yarn.scheduler.minimum-allocation-mb) is set to 1024 and requests are rounded up to a multiple of it. This is a logical allocation; the NodeManager uses this value to monitor the task's memory usage. If the map task's physical memory usage exceeds 2048MB, the NM will kill the task. The JVM heap size is set to 1024MB (mapreduce.map.java.opts=-Xmx1024m), which fits within the 2048MB logical allocation. The same applies to the reduce container, whose allocation (mapreduce.reduce.memory.mb) is set to 3072.
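
To make this concrete, here is a minimal sketch (not from the original example) of how these values could be set on a job's Configuration through the standard Hadoop API; the class name is made up for illustration:

import org.apache.hadoop.conf.Configuration;

public class MapMemoryConfig {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Logical container size the AM requests for each map task (MB);
        // the RM rounds 1536 up to 2048 as described above.
        conf.setInt("mapreduce.map.memory.mb", 1536);
        // Maximum JVM heap of the map task; must fit inside the container.
        conf.set("mapreduce.map.java.opts", "-Xmx1024m");
        // Logical container size for each reduce task (MB), as in the example.
        conf.setInt("mapreduce.reduce.memory.mb", 3072);
    }
}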

When a MapReduce job completes, you will see a series of counters printed. The following three counters show how much physical and virtual memory was allocated:

Physical memory (bytes) snapshot=21850116096
Virtual memory (bytes) snapshot=40047247360
Total committed heap usage (bytes)=22630105088

Virtual memory

By default, yarn.nodemanager.vmem-pmem-ratio is set to 2.1, which means that if the virtual memory used by a map or reduce container exceeds 2.1 times mapreduce.map.memory.mb or mapreduce.reduce.memory.mb, the container will be killed by the NM. If mapreduce.map.memory.mb is set to 1536, the total virtual memory limit is 2.1 × 1536 = 3225.6MB.
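
The check the NodeManager performs can be thought of roughly as in the following sketch (illustrative logic only, not the actual NodeManager source; the usage value is hypothetical):

public class VmemCheckSketch {
    public static void main(String[] args) {
        double vmemPmemRatio = 2.1;   // yarn.nodemanager.vmem-pmem-ratio
        int containerPmemMb = 1536;   // e.g. mapreduce.map.memory.mb
        double vmemLimitMb = vmemPmemRatio * containerPmemMb; // 2.1 * 1536 = 3225.6MB

        double vmemUsedMb = 3300.0;   // hypothetical measured usage
        if (vmemUsedMb > vmemLimitMb) {
            // The real NM logs "running beyond virtual memory limits" and kills the container.
            System.out.println("Killing container: " + vmemUsedMb + "MB > " + vmemLimitMb + "MB");
        }
    }
}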

When a container's memory exceeds its limit, the log will print a message like the following:

Current usage: 2.1gb of 2.0gb physical memory used; 1.6gb of 3.15gb virtual memory used. Killing container.

mapreduce.map.java.opts and mapreduce.map.memory.mb

Having roughly covered the parameters above, what is the relationship between mapreduce.map.java.opts and mapreduce.map.memory.mb?

From the analysis above, we know that if a YARN container exceeds its memory limit, the task will fail, and we can increase mapreduce.{map|reduce}.memory.mb accordingly, depending on which type of container failed, to solve the problem. But the side effect is that fewer containers can run in parallel on the cluster, so tuning these memory parameters appropriately is especially important for improving cluster utilization.

Because in YARN's container model the JVM process runs inside the container, mapreduce.{map|reduce}.java.opts can set the JVM's maximum heap via -Xmx. It is generally set to about 0.75 times memory.mb, because some space needs to be reserved for Java code, non-heap JVM memory, and other overhead.
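
As a small illustration of this rule of thumb (the class name is made up):

public class HeapSizing {
    public static void main(String[] args) {
        int mapMemoryMb = 1536;                  // mapreduce.map.memory.mb
        int heapMb = (int) (mapMemoryMb * 0.75); // leave headroom for non-heap usage
        String javaOpts = "-Xmx" + heapMb + "m"; // value for mapreduce.map.java.opts
        System.out.println(javaOpts);            // prints -Xmx1152m
    }
}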

Addendum

For the FairScheduler (I haven't looked at the other schedulers), there is an increment parameter:

  /** Increment request grant-able by the RM scheduler. 
   * These properties are looked up in the yarn-site.xml  */
  public static final String RM_SCHEDULER_INCREMENT_ALLOCATION_MB =
    YarnConfiguration.YARN_PREFIX + "scheduler.increment-allocation-mb";
  public static final int DEFAULT_RM_SCHEDULER_INCREMENT_ALLOCATION_MB = 1024;

With an online minimum allocation of 2560MB, a client memory request of 2048MB, and an increment of 1024MB, the granted value is obtained through the scheduler's calculation algorithm. The demo is as follows:

/**
 * Created by shangwen on 15-9-14.
 */
public class TestCeil {
    public static void main(String[] args) {
        int clientMemoryReq = 2048;   // memory requested by the client (MB)
        int minAllowMemory = 2560;    // scheduler minimum allocation (MB)
        int incrementResource = 1024; // scheduler increment allocation (MB)
        // Take the larger of the request and the minimum, then round up to a multiple of the increment.
        System.out.println(roundUp(Math.max(clientMemoryReq, minAllowMemory), incrementResource));
        // output 3072
    }

    // Integer division of a by b, rounding up.
    public static int divideAndCeil(int a, int b) {
        if (b == 0) {
            return 0;
        }
        return (a + (b - 1)) / b;
    }

    // Round a up to the nearest multiple of b.
    public static int roundUp(int a, int b) {
        System.out.println("divideAndCeil:" + divideAndCeil(a, b));
        return divideAndCeil(a, b) * b;
    }
}

The result is 3072MB. That is, each map will be allocated 3GB of memory even though the client requested 2GB, so you can see logs like the following:

Container [pid=35691,containerID=container_1441194300243_383809_01_000181] is running beyond physical memory limits. Current usage: 3.0 GB of 3 GB physical memory used; 5.4 GB of 9.3 GB virtual memory used.

For a NodeManager with 56GB of memory, if only maps are running, 56/3 means roughly 18 containers can run at once.

If the minimum allocation were changed back to the default of 1024MB, each container would be allocated 2GB, and 56/2 = 28 containers could run on the node.
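
The capacity arithmetic above can be checked with a quick sketch (values taken from the text):

public class ContainersPerNode {
    public static void main(String[] args) {
        int nodeMemoryMb = 56 * 1024;            // 56GB NodeManager
        System.out.println(nodeMemoryMb / 3072); // 3GB containers -> 18 per node
        System.out.println(nodeMemoryMb / 2048); // 2GB containers -> 28 per node
    }
}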

The description above should give a reasonably comprehensive understanding of these parameters.

References

Mapreduce YARN Memory Parameters


Origin blog.csdn.net/ruiyiin/article/details/77324634