Getting Started with Yarn Basics


1. Yarn resource scheduler

Yarn is a resource scheduling platform responsible for providing server computing resources to computing programs. It acts as a distributed operating system platform, while computing programs such as MapReduce are the applications that run on top of it.

1. Architecture

YARN is composed mainly of the ResourceManager, NodeManager, ApplicationMaster, and Container components: the ResourceManager arbitrates resources across the whole cluster, each NodeManager manages the resources of a single node, an ApplicationMaster manages a single application, and a Container is the resource abstraction encapsulating a node's memory, CPU, disk, and network.


2. Yarn working mechanism


  1. The MR program is submitted to the node where the client runs.
  2. YarnRunner applies to the ResourceManager for an Application.
  3. RM returns the application's resource path to YarnRunner.
  4. The program submits the resources it needs to run to HDFS.
  5. Once the resources are submitted, the client applies to run an MRAppMaster.
  6. RM initializes the user's request into a Task.
  7. One of the NodeManagers picks up the Task.
  8. That NodeManager creates a Container and launches the MRAppMaster.
  9. The Container copies the job resources from HDFS to the local node.
  10. MRAppMaster applies to RM for resources to run the MapTasks.
  11. RM assigns the MapTasks to two other NodeManagers, each of which picks up a task and creates a container.
  12. MRAppMaster sends the program startup script to the two NodeManagers that received tasks; each NodeManager starts a MapTask, and the MapTasks partition and sort the data.
  13. MRAppMaster waits for all MapTasks to finish, then applies to RM for a container and runs the ReduceTask.
  14. The ReduceTask fetches the data of its partition from the MapTasks. When the program finishes, MRAppMaster applies to RM to unregister itself.
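
This lifecycle can be watched from the command line. A minimal sketch, assuming a standard Hadoop 3.1.3 layout under $HADOOP_HOME and the bundled examples jar; the application ID below is a placeholder:

# Submit the bundled wordcount example to YARN
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar \
    wordcount /input /output

# While it runs, list applications with their state, queue, and progress
yarn application -list

# After it finishes, fetch the aggregated logs (substitute the real application ID)
yarn logs -applicationId application_1234567890123_0001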

3. HDFS, YARN, MR relationship

HDFS provides distributed storage, YARN provides resource scheduling, and MapReduce runs on YARN as a computing application, reading its input from and writing its output to HDFS.

4. Job submission to HDFS & MapReduce


  • (1) Job submission

    • Step 1: The Client calls the job.waitForCompletion() method to submit a MapReduce job to the cluster.
    • Step 2: The Client applies to RM for a job ID.
    • Step 3: RM returns the job's resource submission path and job ID to the Client.
    • Step 4: The Client submits the jar package, input-split information, and configuration files to the given resource submission path.
    • Step 5: After submitting the resources, the Client applies to RM to run the MRAppMaster.
  • (2) Job initialization

    • Step 6: After RM receives the Client's request, it adds the job to the capacity scheduler.
    • Step 7: An idle NM picks up the job.
    • Step 8: That NM creates a Container and launches the MRAppMaster.
    • Step 9: The Container downloads the resources submitted by the Client to the local node.
  • (3) Task allocation

    • Step 10: MRAppMaster applies to RM for resources to run multiple MapTasks.
    • Step 11: RM assigns the MapTasks to two other NodeManagers, each of which picks up a task and creates a container.
  • (4) Task running

    • Step 12: MRAppMaster sends the program startup script to the two NodeManagers that received tasks; each NodeManager starts a MapTask, and the MapTasks partition and sort the data.
    • Step 13: MRAppMaster waits for all MapTasks to finish, then applies to RM for a container and runs the ReduceTask.
    • Step 14: The ReduceTask fetches the data of its partition from the MapTasks.
    • Step 15: When the program finishes, MRAppMaster applies to RM to unregister itself.
  • (5) Progress and status updates

    • Tasks in YARN report their progress and status (including counters) to the ApplicationMaster. The client polls the ApplicationMaster for progress updates every second (configurable via mapreduce.client.progressmonitor.pollinterval) and displays them to the user.
  • (6) Job completion

    • In addition to polling the ApplicationMaster for progress, the client checks whether the job has completed by calling waitForCompletion() every 5 seconds; the interval is configurable via mapreduce.client.completion.pollinterval. After the job completes, the ApplicationMaster and the Containers clean up their working state, and job information is stored by the JobHistory Server for later inspection by the user. Both polling intervals appear in the snippet below.
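
The two client-side polling intervals can be tuned in mapred-site.xml. A minimal sketch showing both properties with their default values (in milliseconds):

<!-- How often the client polls the ApplicationMaster for progress -->
<property>
    <name>mapreduce.client.progressmonitor.pollinterval</name>
    <value>1000</value>
</property>

<!-- How often waitForCompletion() checks whether the job has finished -->
<property>
    <name>mapreduce.client.completion.pollinterval</name>
    <value>5000</value>
</property>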

2. Yarn scheduler and scheduling algorithm

Hadoop currently provides three main job schedulers: FIFO, the Capacity Scheduler, and the Fair Scheduler.

  • The default resource scheduler in Apache Hadoop 3.1.3 is the Capacity Scheduler.

  • The default scheduler in the CDH distribution is the Fair Scheduler.

The scheduler is declared in yarn-default.xml:

<property>
    <description>The class to use as the resource scheduler.</description>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>

1. First-in-first-out scheduler (FIFO)

FIFO scheduler (First In First Out): a single queue in which jobs are served in the order they were submitted.

  • Advantages: Simple and easy to understand.
  • Disadvantages: Does not support multiple queues and is rarely used in production environments.


2. Capacity Scheduler

The Capacity Scheduler is a multi-user scheduler developed by Yahoo. It divides the cluster into multiple queues, each guaranteed a configurable share of the cluster's resources; within a queue, jobs are scheduled FIFO by default, and idle capacity from one queue can be temporarily lent to busier queues.


3. Fair Scheduler

3.1 Scheduler Principle

The Fair Scheduler is a multi-user scheduler developed by Facebook.


Fair scheduler: resource deficit

  • The design goal of the fair scheduler is for all jobs to receive a fair share of resources over time. The difference between the resources a job is entitled to at a given moment and the resources it has actually received is called its "deficit".
  • When allocating resources, the scheduler gives priority to the jobs with the largest deficit.


3.2 Resource allocation methods

The fair scheduler supports three resource allocation strategies: FIFO, Fair, and DRF.

(1) FIFO strategy

If FIFO is selected as the resource allocation strategy within each queue of the fair scheduler, the fair scheduler behaves like the capacity scheduler described above.

(2) Fair strategy

The Fair strategy (the default) shares resources using the max-min fairness algorithm and is the default method for allocating resources within each queue. If two applications run in a queue at the same time, each gets 1/2 of the queue's resources; if three run at the same time, each gets 1/3.


Example: job resource allocation under the Fair strategy

  • Unweighted (fair share depends only on the number of jobs):
Scenario: a queue has 12 units of resources and 4 jobs with the following demands:
job1 -> 1,  job2 -> 2,  job3 -> 6,  job4 -> 5

First pass: 12 / 4 = 3
    job1: gets 3 --> 2 surplus
    job2: gets 3 --> 1 surplus
    job3: gets 3 --> 3 short
    job4: gets 3 --> 2 short

Second pass: surplus 3 / 2 remaining jobs = 1.5
    job1: gets 1 (its full demand)
    job2: gets 2 (its full demand)
    job3: gets 3 --> 3 short --> +1.5 --> final: 4.5
    job4: gets 3 --> 2 short --> +1.5 --> final: 4.5

Pass n: repeat until no idle resources remain
  • Weighted (fair share depends on each job's weight):
Scenario: a queue has 16 units of resources and 4 jobs.
Demands: job1 -> 4,  job2 -> 2,  job3 -> 10,  job4 -> 4
Weights: job1 -> 5,  job2 -> 8,  job3 -> 1,   job4 -> 2

First pass: 16 / (5+8+1+2) = 1 per unit of weight
    job1: gets 5 --> 1 surplus
    job2: gets 8 --> 6 surplus
    job3: gets 1 --> 9 short
    job4: gets 2 --> 2 short

Second pass: surplus 7 / (1+2) = 7/3
    job1: gets 4 (its full demand)
    job2: gets 2 (its full demand)
    job3: gets 1 --> +7/3 (≈2.33) --> still ≈6.67 short
    job4: gets 2 --> +14/3 (≈4.67) --> ≈2.67 surplus

Third pass: surplus ≈2.67 / 1 = 2.67
    job1: gets 4
    job2: gets 2
    job3: ≈3.33 --> +2.67 --> final: 6
    job4: gets 4

Pass n: repeat until no idle resources remain (a code sketch of this calculation follows)
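
To complement the hand calculations, here is a minimal, self-contained Java sketch of weighted max-min fair allocation (an illustration, not YARN's actual implementation); setting all weights to 1 reproduces the unweighted example:

import java.util.Arrays;

public class WeightedFairShare {

    /** Iteratively hand out free resources in proportion to weight,
     *  capping each job at its demand and returning any surplus to the pool. */
    static double[] allocate(double total, double[] demand, double[] weight) {
        int n = demand.length;
        double[] alloc = new double[n];
        boolean[] satisfied = new boolean[n];
        double free = total;

        while (free > 1e-9) {
            double weightSum = 0;
            for (int i = 0; i < n; i++) if (!satisfied[i]) weightSum += weight[i];
            if (weightSum == 0) break;            // every job satisfied; resources stay idle

            double perWeight = free / weightSum;  // e.g. 16 / (5+8+1+2) = 1 in the first pass
            free = 0;
            for (int i = 0; i < n; i++) {
                if (satisfied[i]) continue;
                double give = perWeight * weight[i];
                if (alloc[i] + give >= demand[i]) {
                    free += alloc[i] + give - demand[i];  // surplus goes back to the pool
                    alloc[i] = demand[i];
                    satisfied[i] = true;
                } else {
                    alloc[i] += give;
                }
            }
        }
        return alloc;
    }

    public static void main(String[] args) {
        // The weighted example above: total 16, demands 4/2/10/4, weights 5/8/1/2
        double[] alloc = allocate(16, new double[]{4, 2, 10, 4}, new double[]{5, 8, 1, 2});
        System.out.println(Arrays.toString(alloc));   // approximately [4.0, 2.0, 6.0, 4.0]
    }
}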

(3) DRF strategy

DRF (Dominant Resource Fairness): the allocation methods above measure resources along a single dimension, for example memory only (which is also YARN's default). In practice there are many resource types, such as memory, CPU, and network bandwidth, which makes it hard to compare how much two applications should each receive.

In such cases YARN can use DRF to decide how to schedule. Suppose the cluster has 100 CPUs and 10 TB of memory in total, application A requires (2 CPU, 300 GB) and application B requires (6 CPU, 100 GB). Then A requires (2% CPU, 3% memory) of the cluster and B requires (6% CPU, 1% memory), which means A is memory-dominated and B is CPU-dominated. The DRF policy lets us limit different resources (CPU and memory) in different proportions for different applications.
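
The dominant-share comparison in this example is easy to express in code. A minimal sketch using the numbers above (an illustration of the DRF idea, not YARN's implementation):

public class DominantShare {

    /** A job's dominant share is its largest demand-to-capacity ratio across resource types. */
    static double dominantShare(double[] demand, double[] capacity) {
        double max = 0;
        for (int i = 0; i < demand.length; i++) {
            max = Math.max(max, demand[i] / capacity[i]);
        }
        return max;
    }

    public static void main(String[] args) {
        double[] cluster = {100, 10240};  // 100 CPUs, 10 TB = 10240 GB of memory
        double[] appA = {2, 300};         // A: (2 CPU, 300 GB) -> dominated by memory
        double[] appB = {6, 100};         // B: (6 CPU, 100 GB) -> dominated by CPU
        System.out.println(dominantShare(appA, cluster));  // ~0.029 (about 3% memory share)
        System.out.println(dominantShare(appB, cluster));  // 0.06   (6% CPU share)
    }
}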

3. Yarn cluster configuration

1. Yarn configuration

Resource planning:

  • Scenario: count the occurrences of each word in 1 GB of data. The cluster has 3 servers, each with 4 GB of memory and a 4-core, 4-thread CPU.

  • 1 GB / 128 MB = 8 MapTasks, plus 1 ReduceTask and 1 MRAppMaster, for 10 tasks in total.

  • On average each node runs 10 / 3 ≈ 3 tasks (distributed as 4, 3, and 3).

Modify the configuration parameters in yarn-site.xml as follows:

<!-- Scheduler selection; the default is the Capacity Scheduler -->
<property>
	<description>The class to use as the resource scheduler.</description>
	<name>yarn.resourcemanager.scheduler.class</name>
	<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>

<!-- Number of threads the ResourceManager uses to handle scheduler requests; default 50. If more than 50 tasks are submitted, this value can be increased, but it cannot exceed 3 nodes * 4 threads = 12 threads (in practice at most 8, leaving threads for other applications) -->
<property>
	<description>Number of threads to handle scheduler interface.</description>
	<name>yarn.resourcemanager.scheduler.client.thread-count</name>
	<value>8</value>
</property>


<!-- Whether to count logical processors (hyperthreads) as CPU cores; default false, i.e. use the physical core count -->
<property>
	<description>Flag to determine if logical processors(such as
	hyperthreads) should be counted as cores. Only applicable on Linux
	when yarn.nodemanager.resource.cpu-vcores is set to -1 and
	yarn.nodemanager.resource.detect-hardware-capabilities is true.
	</description>
	<name>yarn.nodemanager.resource.count-logical-processors-as-cores</name>
	<value>false</value>
</property>

<!-- Whether YARN auto-detects hardware capabilities for its configuration; default false. If the node runs many other applications, manual configuration is recommended; otherwise auto-detection can be used -->
<property>
	<description>Enable auto-detection of node capabilities such as
	memory and CPU.
	</description>
	<name>yarn.nodemanager.resource.detect-hardware-capabilities</name>
	<value>false</value>
</property>


<!-- Multiplier used to convert physical cores to vcores (default 1.0).
A vcore in Hadoop is not a real core; the vcore count is usually set to 1 to 5 times the number of logical CPUs. -->
<property>
	<description>Multiplier to determine how to convert phyiscal cores to vcores. This value is used if 
yarn.nodemanager.resource.cpu-vcores is set to -1(which implies auto-calculate vcores) and
yarn.nodemanager.resource.detect-hardware-capabilities is set to true. The	number of vcores will be calculated as	number of CPUs * multiplier.
	</description>
	<name>yarn.nodemanager.resource.pcores-vcores-multiplier</name>
	<value>1.0</value>
</property>

<!-- Amount of memory the NodeManager may use; default 8G, changed to 4G here -->
<property>
	<description>Amount of physical memory, in MB, that can be allocated 
	for containers. If set to -1 and
	yarn.nodemanager.resource.detect-hardware-capabilities is true, it is
	automatically calculated(in case of Windows and Linux).
	In other cases, the default is 8192MB.
	</description>
	<name>yarn.nodemanager.resource.memory-mb</name>
	<value>4096</value>
</property>

<!-- Number of vcores for the NodeManager; defaults to 8 when not auto-detected from the hardware, changed to 4 here -->
<property>
	<description>Number of vcores that can be allocated
	for containers. This is used by the RM scheduler when allocating
	resources for containers. This is not used to limit the number of
	CPUs used by YARN containers. If it is set to -1 and
	yarn.nodemanager.resource.detect-hardware-capabilities is true, it is
	automatically determined from the hardware in case of Windows and Linux.
	In other cases, number of vcores is 8 by default.</description>
	<name>yarn.nodemanager.resource.cpu-vcores</name>
	<value>4</value>
</property>

<!-- Minimum container memory; default 1G -->
<property>
	<description>The minimum allocation for every container request at the RM	in MBs. Memory requests lower than this will be set to the value of this	property. Additionally, a node manager that is configured to have less memory	than this value will be shut down by the resource manager.
	</description>
	<name>yarn.scheduler.minimum-allocation-mb</name>
	<value>1024</value>
</property>

<!-- Maximum container memory; default 8G, changed to 2G here -->
<property>
	<description>The maximum allocation for every container request at the RM	in MBs. Memory requests higher than this will throw an	InvalidResourceRequestException.
	</description>
	<name>yarn.scheduler.maximum-allocation-mb</name>
	<value>2048</value>
</property>

<!-- Minimum container vcores; default 1 -->
<property>
	<description>The minimum allocation for every container request at the RM	in terms of virtual CPU cores. Requests lower than this will be set to the	value of this property. Additionally, a node manager that is configured to	have fewer virtual cores than this value will be shut down by the resource	manager.
	</description>
	<name>yarn.scheduler.minimum-allocation-vcores</name>
	<value>1</value>
</property>

<!-- Maximum container vcores; default 4, changed to 2 here -->
<property>
	<description>The maximum allocation for every container request at the RM	in terms of virtual CPU cores. Requests higher than this will throw an
	InvalidResourceRequestException.</description>
	<name>yarn.scheduler.maximum-allocation-vcores</name>
	<value>2</value>
</property>

<!-- Virtual memory check; on by default, turned off here -->
<property>
	<description>Whether virtual memory limits will be enforced for
	containers.</description>
	<name>yarn.nodemanager.vmem-check-enabled</name>
	<value>false</value>
</property>

<!-- Ratio of virtual memory to physical memory; default 2.1 -->
<property>
	<description>Ratio between virtual memory to physical memory when	setting memory limits for containers. Container allocations are	expressed in terms of physical memory, and virtual memory usage	is allowed to exceed this allocation by this ratio.
	</description>
	<name>yarn.nodemanager.vmem-pmem-ratio</name>
	<value>2.1</value>
</property>

Restart the Yarn cluster

./sbin/stop-yarn.sh
./sbin/start-yarn.sh

Open the web UI to verify the resource changes: http://hadoop102:8088/cluster


The page also reflects that virtual memory checking has been turned off.


2. Multi-queue submission

Queue plan:

default queue: 40% of total memory, with a maximum resource capacity of 60% of total resources.

hive queue: 60% of total memory, with a maximum resource capacity of 80% of total resources.

Modify the following capacity-scheduler.xml configuration:

<!-- Declare multiple queues: add a hive queue -->
<property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>default,hive</value>
    <description>
      The queues at the this level (root is the root queue).
    </description>
</property>

<!-- Lower the default queue's rated capacity to 40% (default 100%) -->
<property>
    <name>yarn.scheduler.capacity.root.default.capacity</name>
    <value>40</value>
</property>

<!-- Lower the default queue's maximum capacity to 60% (default 100%) -->
<property>
    <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
    <value>60</value>
</property>

Add the following capacity-scheduler.xml configuration:

<!-- Rated capacity of the hive queue -->
<property>
    <name>yarn.scheduler.capacity.root.hive.capacity</name>
    <value>60</value>
</property>

<!-- Maximum fraction of the queue's resources one user may consume; 1 means all -->
<property>
    <name>yarn.scheduler.capacity.root.hive.user-limit-factor</name>
    <value>1</value>
</property>

<!-- Maximum capacity of the hive queue -->
<property>
    <name>yarn.scheduler.capacity.root.hive.maximum-capacity</name>
    <value>80</value>
</property>

<!-- Set the hive queue state to running -->
<property>
    <name>yarn.scheduler.capacity.root.hive.state</name>
    <value>RUNNING</value>
</property>

<!-- Which users may submit applications to the queue -->
<property>
    <name>yarn.scheduler.capacity.root.hive.acl_submit_applications</name>
    <value>*</value>
</property>

<!-- Which users may administer the queue (view/kill applications) -->
<property>
    <name>yarn.scheduler.capacity.root.hive.acl_administer_queue</name>
    <value>*</value>
</property>

<!-- Which users may configure application submission priority -->
<property>
    <name>yarn.scheduler.capacity.root.hive.acl_application_max_priority</name>
    <value>*</value>
</property>

<!-- Application timeout setting: yarn application -appId <appId> -updateLifetime <timeout>
Reference: https://blog.cloudera.com/enforcing-application-lifetime-slas-yarn/ -->

<!-- If an application specifies a timeout, the maximum timeout an application submitted to this queue may specify cannot exceed this value -->
<property>
    <name>yarn.scheduler.capacity.root.hive.maximum-application-lifetime</name>
    <value>-1</value>
</property>

<!-- If an application does not specify a timeout, default-application-lifetime is used as the default -->
<property>
    <name>yarn.scheduler.capacity.root.hive.default-application-lifetime</name>
    <value>-1</value>
</property>

Distribute the modified configuration file to all nodes, or modify the configuration on the node where the ResourceManager runs.

Restart the Yarn cluster, or refresh the queue configuration:

yarn rmadmin -refreshQueues

Open the web UI to verify the queue changes: http://hadoop102:8088/cluster/scheduler


3. Submit tasks to the cluster

package com.example.demo.wordcount;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WCDriver2 {

    public static void main(String[] args) throws Exception {
        // Echo the input/output paths passed on the command line
        System.out.println(args[0]);
        System.out.println(args[1]);
        // 1. Create the Job instance
        Configuration conf = new Configuration(); // parameters can be set on the Configuration
        conf.set("mapreduce.job.queuename", "hive"); // submit to the hive queue instead of default
        Job job = Job.getInstance(conf);

        // 2. Configure the Job
        // 2.1 Associate this program's jar. Not needed for local runs, but required
        //     when running on the cluster (the jar is shipped to the cluster)
        job.setJarByClass(WCDriver2.class);
        // 2.2 Set the Mapper and Reducer classes (the word-count classes from earlier)
        job.setMapperClass(WCMapper.class);
        job.setReducerClass(WCReducer.class);
        // 2.3 Set the key/value types of the Mapper output
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);
        // 2.4 Set the final output key/value types (here, the Reducer's output types)
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        // 2.5 Set the input and output paths
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        // Note: the output directory must not already exist
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // 3. Submit the Job and wait for completion
        boolean b = job.waitForCompletion(true);
        System.out.println("=======" + b);
    }
}
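
Hard-coding mapreduce.job.queuename in the driver ties the program to one queue. As an alternative sketch: drivers that parse arguments through ToolRunner/GenericOptionsParser, such as the bundled examples jar (path assumed for a standard 3.1.3 install), can choose the queue at submit time with -D:

# Choose the target queue on the command line instead of in code
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar \
    wordcount -D mapreduce.job.queuename=hive /input /output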

Packaging configuration in the pom file:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.example</groupId>
    <artifactId>demo</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>demo</name>
    <description>demo</description>
    <properties>
        <java.version>1.8</java.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>3.1.3</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.12</version>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
            <version>1.7.30</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.6.1</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <configuration>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>

Log in to the server and run the job:

  • demo-0.0.1.jar: the jar package to run (full path on the server)
  • com.example.demo.wordcount.WCDriver2: fully qualified name of the class in the jar to run
  • /input: input path of the data (HDFS)
  • /output: output path of the data (HDFS)

hadoop jar demo-0.0.1.jar com.example.demo.wordcount.WCDriver2 /input /output
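
Once the job finishes, the queue it ran in and its output can be checked from the command line; part-r-00000 is MapReduce's default output file name for a single reducer:

# Confirm the finished application ran in the hive queue
yarn application -list -appStates FINISHED

# Inspect the word-count output
hadoop fs -cat /output/part-r-00000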

