24. Hadoop Series: YARN Capacity Scheduler in Practice

In small and medium-sized enterprises, YARN is generally configured to use the Capacity Scheduler.

First, check the local hosts file configuration:

127.0.0.1 datanode
127.0.0.1 namenode
127.0.0.1 resourcemanager
127.0.0.1 nodemanager
127.0.0.1 nodemanager2
127.0.0.1 historyserver
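Since every hostname resolves to 127.0.0.1 and the containers publish their ports, a quick sanity check from the host (assuming the stack is already up) looks like this:

ping -c 1 namenode                                       # should resolve to 127.0.0.1
curl -s http://namenode:9870 | head -n 5                 # NameNode web UI
curl -s http://resourcemanager:8088/ws/v1/cluster/info   # ResourceManager REST API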

1. Configure multiple queues in the Capacity Scheduler

Requirement 1: the default queue gets a rated capacity of 40% of total memory and a maximum capacity of 60% of total resources; the hive queue gets a rated capacity of 60% of total memory and a maximum capacity of 80% of total resources.
Requirement 2: configure task priority.

Multi-queue setup is configured in /opt/hadoop-3.2.1/etc/hadoop/capacity-scheduler.xml, so we first copy the file out of the container:

docker cp fd7a9150237:/opt/hadoop-3.2.1/etc/hadoop/capacity-scheduler.xml .
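The container ID fd7a9150237 is specific to my machine; look up your own with docker ps, or copy by container name instead (any of the Hadoop containers carries the same file), e.g.:

docker ps --format '{{.ID}}  {{.Names}}'   # list container IDs and names
docker cp resourcemanager:/opt/hadoop-3.2.1/etc/hadoop/capacity-scheduler.xml .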

Then mount the file into all containers in docker-compose.yml:

    volumes:
      - ./capacity-scheduler.xml:/opt/hadoop-3.2.1/etc/hadoop/capacity-scheduler.xml

At the same time, we map port 8042 for the nodemanager, map port 8188 for the historyserver, and add a second nodemanager node (when I ran WordCount, a single node's resources were not enough, so the MapReduce job could not run):

  nodemanager:
    ports:
      - 8042:8042
  nodemanager2:
    image: bde2020/hadoop-nodemanager:2.0.0-hadoop3.2.1-java8
    container_name: nodemanager2
    hostname: nodemanager2
    ports:
      - 8043:8042
    environment:
      SERVICE_PRECONDITION: "namenode:9000 namenode:9870 datanode:9864 resourcemanager:8088"
    volumes:
      - ./capacity-scheduler.xml:/opt/hadoop-3.2.1/etc/hadoop/capacity-scheduler.xml
    env_file:
      - ./hadoop.env
  historyserver:
    ports:
      - 8188:8188
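With the second node added, you can confirm that both NodeManagers registered with the ResourceManager (a quick check, assuming the container is named resourcemanager as in the hosts file):

docker exec resourcemanager yarn node -list   # expect two nodes in RUNNING state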

[Modify] Configuration in capacity-scheduler.xml:

<!-- Define multiple queues: add a hive queue -->
<property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>default,hive</value>
</property>
<!-- Lower the default queue's rated capacity to 40% (default 100%) -->
<property>
    <name>yarn.scheduler.capacity.root.default.capacity</name>
    <value>40</value>
</property>
<!-- Lower the default queue's maximum capacity to 60% (default 100%) -->
<property>
    <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
    <value>60</value>
</property>

[New] Configuration in capacity-scheduler.xml

<!-- Rated capacity of the hive queue -->
<property>
    <name>yarn.scheduler.capacity.root.hive.capacity</name>
    <value>60</value>
</property>
<!-- How much of the queue's resources a single user may use at most;
     1 means a single user can use at most the queue's rated capacity -->
<property>
    <name>yarn.scheduler.capacity.root.hive.user-limit-factor</name>
    <value>1</value>
</property>
<!-- Maximum capacity of the hive queue -->
<property>
    <name>yarn.scheduler.capacity.root.hive.maximum-capacity</name>
    <value>80</value>
</property>
<!-- Set the hive queue state to running -->
<property>
    <name>yarn.scheduler.capacity.root.hive.state</name>
    <value>RUNNING</value>
</property>
<!-- Which users may submit applications to the queue -->
<property>
    <name>yarn.scheduler.capacity.root.hive.acl_submit_applications</name>
    <value>*</value>
</property>
<!-- Which users may administer the queue (view/kill applications) -->
<property>
    <name>yarn.scheduler.capacity.root.hive.acl_administer_queue</name>
    <value>*</value>
</property>
<!-- Which users may configure application submission priority -->
<property>
    <name>yarn.scheduler.capacity.root.hive.acl_application_max_priority</name>
    <value>*</value>
</property>
<!-- If an application specifies a timeout, the maximum timeout it may request
     on this queue cannot exceed this value; -1 means no timeout -->
<property>
    <name>yarn.scheduler.capacity.root.hive.maximum-application-lifetime</name>
    <value>-1</value>
</property>
<!-- If an application does not specify a timeout, default-application-lifetime
     is used as the default -->
<property>
    <name>yarn.scheduler.capacity.root.hive.default-application-lifetime</name>
    <value>-1</value>
</property>
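Note that the rated capacities of sibling queues under root must sum to 100% (here 40 + 60). If you would rather not recreate the containers for every tweak, changes to capacity-scheduler.xml can usually be applied at runtime:

docker exec resourcemanager yarn rmadmin -refreshQueues   # reload queue configuration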

[Modify] the hadoop.env file; adjust the values to your own situation:

MAPRED_CONF_mapred_child_java_opts=-Xmx1024m
MAPRED_CONF_mapreduce_map_memory_mb=2048
MAPRED_CONF_mapreduce_reduce_memory_mb=2048
MAPRED_CONF_mapreduce_map_java_opts=-Xmx1024m
MAPRED_CONF_mapreduce_reduce_java_opts=-Xmx2048m
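For context: as far as I can tell from the bde2020 images, these environment variables are rewritten into the Hadoop config files at container startup. The MAPRED_CONF_ prefix targets mapred-site.xml, a triple underscore becomes a dash, and remaining single underscores become dots, so MAPRED_CONF_mapreduce_map_memory_mb ends up as mapreduce.map.memory.mb (and YARN_CONF_yarn_cluster_max___application___priority in section 3 below becomes yarn.cluster.max-application-priority). A quick way to verify after startup:

# Check that the env var landed in mapred-site.xml inside the container
docker exec nodemanager grep -A 1 'mapreduce.map.memory.mb' /opt/hadoop-3.2.1/etc/hadoop/mapred-site.xml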

After configuring, run docker-compose up -d again to recreate the containers.

Visit http://resourcemanager:8088/cluster/scheduler and you will see that the hive queue has appeared.
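You can also inspect the new queue from the command line, e.g.:

docker exec resourcemanager yarn queue -status hive   # shows state, capacities, ACLs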

2. Submit tasks to the hive queue

hadoop jar /opt/hadoop-3.2.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar wordcount -D mapreduce.job.queuename=hive /shenjian/input /shenjian/output

Of course, we can also set the queue in the WordCountDriver code:

conf.set("mapreduce.job.queuename", "hive");

After a while, you can see the completed job in the FINISHED list, as shown in the figure below.

Click the application ID, then click LOGS to view the detailed log output; when a job fails, different log files are generated, and you can click through to view the error log.

3. Task priority setting

Configure the maximum cluster priority in hadoop.env:

# The highest task priority is 5
YARN_CONF_yarn_cluster_max___application___priority=5

Then submit a high-priority task:

hadoop jar /opt/hadoop-3.2.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar pi -D mapreduce.job.priority=5 5 2000000

You can also change the priority of a running application with the following command:

yarn application -appId <ApplicationId> -updatePriority <priority>
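For example (the application ID below is a made-up placeholder; take the real one from yarn application -list):

docker exec resourcemanager yarn application -list
docker exec resourcemanager yarn application -appId application_1234567890123_0001 -updatePriority 5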

You are welcome to follow my official account, 算法小生 (Suanfa Xiaosheng), and get in touch with me.
