Introduction to yarn-site.xml configuration

yarn-site.xml describes the following parameters:

yarn.scheduler.minimum-allocation-mb

yarn.scheduler.maximum-allocation-mb

Description: the minimum and maximum memory a single container may request. A running application's containers cannot exceed the maximum; requests smaller than the minimum are raised to the minimum — in this sense the minimum behaves a bit like a page in an operating system. The minimum has another use: it determines the maximum number of containers a node can run. Note: neither value can be changed dynamically (here "dynamically" means while the application is running).

Default values: 1024 / 8192
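The "page-like" behaviour described above can be sketched as a small piece of arithmetic: a request is rounded up to a multiple of the minimum and capped at the maximum. This is a minimal illustration using the default 1024/8192 values, not Hadoop's actual scheduler code; the method name `normalize` is mine.

```java
public class AllocationNormalizer {
    // Round a memory request (MB) up to a multiple of minMb, capped at maxMb,
    // mimicking how the RM normalizes container requests.
    static int normalize(int requestMb, int minMb, int maxMb) {
        int stepped = ((requestMb + minMb - 1) / minMb) * minMb; // round up to multiple of minMb
        return Math.min(Math.max(stepped, minMb), maxMb);
    }

    public static void main(String[] args) {
        System.out.println(normalize(1500, 1024, 8192)); // 2048: rounded up like a "page"
        System.out.println(normalize(200, 1024, 8192));  // 1024: raised to the minimum
        System.out.println(normalize(9000, 1024, 8192)); // 8192: capped at the maximum
    }
}
```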

yarn.scheduler.minimum-allocation-vcores

yarn.scheduler.maximum-allocation-vcores

Description: the minimum / maximum number of virtual CPUs a single container may request. For example, when set to 1 and 4, each task of a running MapReduce job may request at least one and at most four virtual CPUs.

Default values: 1 / 32

yarn.nodemanager.resource.memory-mb

yarn.nodemanager.vmem-pmem-ratio

Description: the first parameter is the maximum physical memory available to YARN on each node; the two RM values above should not exceed it. It can also be used to compute the maximum number of containers on a node: divide it by the RM's minimum container memory. The second parameter is the virtual-memory rate — the ratio of allowed virtual memory to the physical memory used by a task — and its default is 2.1. Note: the first parameter is immutable; once set it cannot be changed for the life of the process. Its default is 8 GB, and even if the machine has less than 8 GB of memory, YARN will still assume 8 GB.

Default values: 8 GB / 2.1
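The two calculations just described — the per-node container ceiling and the virtual-memory ceiling for a task — are plain arithmetic. A minimal sketch (the method names are mine, not a Hadoop API):

```java
public class NodeCapacity {
    // Maximum containers a node can host: node memory divided by the RM minimum.
    static int maxContainers(int nodeMemoryMb, int minAllocationMb) {
        return nodeMemoryMb / minAllocationMb;
    }

    // Virtual memory allowed for a task: physical memory times the vmem-pmem ratio.
    static double vmemLimitMb(int physicalMb, double vmemPmemRatio) {
        return physicalMb * vmemPmemRatio;
    }

    public static void main(String[] args) {
        System.out.println(maxContainers(8192, 1024)); // 8 containers with the defaults
        System.out.println(vmemLimitMb(1024, 2.1));    // ~2150.4 MB of virtual memory
    }
}
```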

yarn.nodemanager.resource.cpu-vcores

Description: the total number of virtual CPUs available to the NodeManager.

Default value: 8


AM memory configuration parameters, illustrated here with MapReduce (both are AM-specific values and should be configured in mapred-site.xml), are as follows:
mapreduce.map.memory.mb
mapreduce.reduce.memory.mb
Description: these two parameters specify the memory available to the two MapReduce task types (Map Task and Reduce Task); each value should lie between the RM's minimum and maximum container sizes. If not configured, a value is derived from the following simple formula:
max(MIN_CONTAINER_SIZE, (Total Available RAM) / Containers)
In general, reduce should be set to twice map. Note: these two values can be changed via parameters when the application starts;
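The fallback formula above can be sketched directly; with reduce set to twice map, for example (the helper name and the sample numbers are mine, for illustration only):

```java
public class DefaultTaskMemory {
    // Default map memory: max(MIN_CONTAINER_SIZE, total available RAM / container count).
    static int defaultMapMb(int minContainerMb, int totalRamMb, int containers) {
        return Math.max(minContainerMb, totalRamMb / containers);
    }

    public static void main(String[] args) {
        int mapMb = defaultMapMb(1024, 65536, 16); // 4096 MB per map task
        int reduceMb = 2 * mapMb;                  // reduce is commonly twice map
        System.out.println(mapMb + " / " + reduceMb);
    }
}
```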

Other AM memory-related parameters are JVM-related and can be configured with the following options:
mapreduce.map.java.opts
mapreduce.reduce.java.opts
Description: these two parameters exist mainly for programs that run in a JVM (Java, Scala, etc.); they pass options to the JVM, chiefly memory-related ones such as -Xmx and -Xms. Their sizes should fit within the map.mb and reduce.mb values of the AM above.
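For example, a commonly used pattern is to set -Xmx to roughly 75–80% of the container size, leaving headroom for non-heap memory. The values below are illustrative choices, not defaults:

```xml
<property>
    <name>mapreduce.map.memory.mb</name>
    <value>2048</value>
</property>
<property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx1638m</value>
</property>
<property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>4096</value>
</property>
<property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx3276m</value>
</property>
```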

To summarize the above: configuring Yarn memory mainly involves three aspects — the physical memory limit for each Map and Reduce task; the JVM size limit for each task; and the virtual memory limit.

Below, a concrete error is used to illustrate these memory settings. The error is:
Container [pid=41884, containerID=container_1405950053048_0016_01_000284] is running beyond virtual memory limits. Current usage: 314.6 MB of 2.9 GB physical memory used; 8.7 GB of 6.2 GB virtual memory used. Killing container.
The configuration is as follows:

    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>100000</value>
    </property>
    <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>10000</value>
    </property>
    <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>3000</value>
    </property>
    <property>
        <name>mapreduce.reduce.memory.mb</name>
        <value>2000</value>
    </property>



From this configuration we can see that the RM's minimum and maximum container memory are 3000 MB and 10000 MB. reduce is set to 2000 MB, below the minimum, and map is not set at all, so both values become 3000 MB — which matches the "2.9 GB physical memory used" in the log. And because the virtual memory rate uses the default (2.1), the total virtual memory allowed for both the Map Task and the Reduce Task is 3000 × 2.1 ≈ 6.2 GB. The application's virtual memory exceeded this value, hence the error. Solution: adjust the virtual memory rate when starting Yarn, or adjust the memory size requested by the running application.
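The arithmetic in that log line can be checked directly: a request below the RM minimum is raised to it, and the virtual-memory ceiling is that value times the default 2.1 ratio. A quick sketch (the helper name is mine):

```java
public class VmemCheck {
    // Effective task memory: requests below the RM minimum are raised to it.
    static int effectiveTaskMb(int requestedMb, int minAllocationMb) {
        return Math.max(requestedMb, minAllocationMb);
    }

    public static void main(String[] args) {
        int taskMb = effectiveTaskMb(2000, 3000);    // 3000 MB -> the "2.9 GB" in the log
        double vmemGb = taskMb * 2.1 / 1024;         // ~6.2 GB virtual-memory ceiling
        System.out.printf("%d MB physical, %.1f GB virtual limit%n", taskMb, vmemGb);
        // The task actually used 8.7 GB of virtual memory, exceeding ~6.2 GB -> killed.
    }
}
```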


Under Yarn's management framework, whether the AM is requesting resources from the RM or the NM is managing the resources of its own node, everything is carried out through containers. A container is Yarn's abstraction of resources — here, resources include memory, CPU, and so on. Next, the container is described in more detail. To form an intuitive picture of the container, first look at the figure below:



As the figure above shows: the AM packages its resource request to the RM as a ResourceRequest; once the resources are obtained, it packages them into a ContainerLaunchContext object, which the AM sends in its communication with the NM in order to start the task. The ResourceRequest, Container, and ContainerLaunchContext protocols follow below.
The ResourceRequest structure is as follows:

    message ResourceRequestProto {
      optional PriorityProto priority = 1;      // resource priority
      optional string resource_name = 2;        // host on which the desired resource resides
      optional ResourceProto capability = 3;    // resource amount (memory, CPU)
      optional int32 num_containers = 4;        // number of containers satisfying the condition
      optional bool relax_locality = 5 [default = true];
    }


A brief explanation of the fields above:
Field 2: at request time, this names the host from which resources are desired, but the final placement is decided by negotiation between the AM and the RM;
Field 3: currently describes only two resources, namely memory and CPU.
Note: 1. Because fields 2 and 4 place no limit on the amount of resources requested, the AM is unrestricted in what it can request. 2. Yarn uses an overwrite model for resource requests: each new resource request issued by the AM overwrites earlier requests of the same priority on the same node, i.e. only one request of a given priority may exist per node.

The Container structure is as follows:

    message ContainerProto {
      optional ContainerIdProto id = 1;                      // container id
      optional NodeIdProto nodeId = 2;                       // node where the container (resource) is located
      optional string node_http_address = 3;
      optional ResourceProto resource = 4;                   // amount of resources allocated to the container
      optional PriorityProto priority = 5;                   // priority of the container
      optional hadoop.common.TokenProto container_token = 6; // container token, used for security authentication
    }


Note: each container can run one task. When the AM receives multiple containers, it further assigns each of them to a task, as in MapReduce.

The ContainerLaunchContext structure is as follows:

    message ContainerLaunchContextProto {
      repeated StringLocalResourceMapProto localResources = 1; // resources needed by the program running in the container, e.g. jar files
      optional bytes tokens = 2;                               // security tokens, in secure mode
      repeated StringBytesMapProto service_data = 3;
      repeated StringStringMapProto environment = 4;           // environment variables needed to start the container
      repeated string command = 5;                             // command that runs the program in the container, e.g. for a java program: $JAVA_HOME/bin/java org.OurClass
      repeated ApplicationACLMapProto application_ACLs = 6;    // access-control list of the application the container belongs to
    }


Below, a code fragment illustrates the ContainerLaunchContext only (ideally one would write a simple finite state machine here to aid understanding, but time does not permit):

    // Request a new ContainerLaunchContext:
    ContainerLaunchContext ctx = Records.newRecord(ContainerLaunchContext.class);

    // Fill in the necessary information:
    ctx.setEnvironment(...);
    childRsrc.setResource(...);
    ctx.setLocalResources(...);
    ctx.setCommands(...);

    // Start the task:
    startReq.setContainerLaunchContext(ctx);



Finally, the container can be summarized as follows: a container is Yarn's abstraction of resources, encapsulating some of a node's resources, mainly CPU and memory; a container is requested by the AM from the RM, and its task is launched by the AM through the NM on the node where the resource resides, where it finally runs. There are two kinds of container: one runs the AM itself; the other is requested by the AM from the RM to execute tasks.


On each slave node:

number of map tasks <= yarn.nodemanager.resource.memory-mb / mapreduce.map.memory.mb

number of reduce tasks <= yarn.nodemanager.resource.memory-mb / mapreduce.reduce.memory.mb
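Plugging in the numbers from the earlier error example (100000 MB per node, tasks normalized to 3000 MB each), this bound is a single integer division. A quick check (the helper name is mine):

```java
public class SlaveTaskCount {
    // Upper bound on concurrent tasks of one type on a node.
    static int maxTasks(int nodeMemoryMb, int taskMemoryMb) {
        return nodeMemoryMb / taskMemoryMb;
    }

    public static void main(String[] args) {
        System.out.println(maxTasks(100000, 3000)); // at most 33 map (or reduce) tasks
    }
}
```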


Origin www.cnblogs.com/wenBlog/p/12220897.html