In small and medium-sized enterprises, YARN is generally configured with the Capacity Scheduler.
First, check the local hosts file configuration:
127.0.0.1 datanode
127.0.0.1 namenode
127.0.0.1 resourcemanager
127.0.0.1 nodemanager
127.0.0.1 nodemanager2
127.0.0.1 historyserver
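If any of these mappings are missing, they can be appended idempotently. Below is a minimal sketch; the `add_hadoop_hosts` helper and its file argument are hypothetical (on a real machine you would pass /etc/hosts and run with sufficient privileges):

```shell
# Hypothetical helper: append the six container hostname mappings to a hosts
# file, skipping entries that already exist (idempotent).
add_hadoop_hosts() {
  hosts_file=$1
  for h in datanode namenode resourcemanager nodemanager nodemanager2 historyserver; do
    grep -q "^127\.0\.0\.1 $h$" "$hosts_file" 2>/dev/null \
      || echo "127.0.0.1 $h" >> "$hosts_file"
  done
}
# Typical usage (needs root to write /etc/hosts):
#   add_hadoop_hosts /etc/hosts
```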
1. Configuring multiple queues in the Capacity Scheduler
Requirement 1: the default queue is rated at 40% of total memory with a maximum capacity of 60% of total resources; the hive queue is rated at 60% of total memory with a maximum capacity of 80% of total resources.
Requirement 2: Configure queue priority
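To make Requirement 1 concrete, here is a quick worked example assuming a hypothetical cluster with 10240 MB of total memory (rated capacity = total × capacity%, maximum capacity = total × max-capacity%):

```shell
# Worked example for a hypothetical 10240 MB cluster.
total=10240
echo "default rated=$(( total * 40 / 100 ))MB max=$(( total * 60 / 100 ))MB"
echo "hive    rated=$(( total * 60 / 100 ))MB max=$(( total * 80 / 100 ))MB"
# prints:
#   default rated=4096MB max=6144MB
#   hive    rated=6144MB max=8192MB
```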
Multi-queue configuration lives in /opt/hadoop-3.2.1/etc/hadoop/capacity-scheduler.xml, so we first copy the file out of the container:
docker cp fd7a9150237:/opt/hadoop-3.2.1/etc/hadoop/capacity-scheduler.xml .
Mount the file into every container in docker-compose.yml:
volumes:
- ./capacity-scheduler.xml:/opt/hadoop-3.2.1/etc/hadoop/capacity-scheduler.xml
At the same time, we map port 8042 for the nodemanager, map port 8188 for the historyserver, and add a second NodeManager node. [When the author ran WordCount, a single node did not have enough resources and the MapReduce job could not run.]
nodemanager:
ports:
- 8042:8042
nodemanager2:
image: bde2020/hadoop-nodemanager:2.0.0-hadoop3.2.1-java8
container_name: nodemanager2
hostname: nodemanager2
ports:
- 8043:8042
environment:
SERVICE_PRECONDITION: "namenode:9000 namenode:9870 datanode:9864 resourcemanager:8088"
volumes:
- ./capacity-scheduler.xml:/opt/hadoop-3.2.1/etc/hadoop/capacity-scheduler.xml
env_file:
- ./hadoop.env
historyserver:
ports:
- 8188:8188
[Modification] Configuration in capacity-scheduler.xml
<!-- Declare multiple queues: add a hive queue -->
<property>
<name>yarn.scheduler.capacity.root.queues</name>
<value>default,hive</value>
</property>
<!-- Lower the default queue's rated capacity to 40% (default: 100%) -->
<property>
<name>yarn.scheduler.capacity.root.default.capacity</name>
<value>40</value>
</property>
<!-- Lower the default queue's maximum capacity to 60% (default: 100%) -->
<property>
<name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
<value>60</value>
</property>
[New] Configuration in capacity-scheduler.xml
<!-- Set the hive queue's rated capacity -->
<property>
<name>yarn.scheduler.capacity.root.hive.capacity</name>
<value>60</value>
</property>
<!-- How much of the queue's resources a single user may use; 1 means up to the queue's full rated capacity -->
<property>
<name>yarn.scheduler.capacity.root.hive.user-limit-factor</name>
<value>1</value>
</property>
<!-- Set the hive queue's maximum capacity -->
<property>
<name>yarn.scheduler.capacity.root.hive.maximum-capacity</name>
<value>80</value>
</property>
<!-- Enable the hive queue -->
<property>
<name>yarn.scheduler.capacity.root.hive.state</name>
<value>RUNNING</value>
</property>
<!-- Which users may submit applications to the queue -->
<property>
<name>yarn.scheduler.capacity.root.hive.acl_submit_applications</name>
<value>*</value>
</property>
<!-- Which users may administer the queue (view/kill applications) -->
<property>
<name>yarn.scheduler.capacity.root.hive.acl_administer_queue</name>
<value>*</value>
</property>
<!-- Which users may set the priority of submitted applications -->
<property>
<name>yarn.scheduler.capacity.root.hive.acl_application_max_priority</name>
<value>*</value>
</property>
<!-- If an application specifies a timeout, it may not exceed this value; -1 means no limit -->
<property>
<name>yarn.scheduler.capacity.root.hive.maximum-application-lifetime</name>
<value>-1</value>
</property>
<!-- If an application does not specify a timeout, default-application-lifetime is used as the default -->
<property>
<name>yarn.scheduler.capacity.root.hive.default-application-lifetime</name>
<value>-1</value>
</property>
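After editing, it is easy to leave the root queues summing to something other than 100%, which the Capacity Scheduler rejects. Below is a hypothetical sanity check; `get_cap` and the `CS_FILE` override are illustrative, and it assumes each `<name>` line is immediately followed by its `<value>` line, as in the snippet above:

```shell
# Hypothetical check: extract a root queue's rated capacity from
# capacity-scheduler.xml (CS_FILE overrides the path for testing).
get_cap() {
  grep -A1 "yarn.scheduler.capacity.root.${1}.capacity</name>" "${CS_FILE:-capacity-scheduler.xml}" \
    | sed -n 's/.*<value>\([0-9]*\)<\/value>.*/\1/p'
}
# Example: the sibling queues under root must sum to 100.
#   [ $(( $(get_cap default) + $(get_cap hive) )) -eq 100 ] || echo "capacities must sum to 100"
```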
[Modify] the hadoop.env file; adjust the values to your environment
MAPRED_CONF_mapred_child_java_opts=-Xmx1024m
MAPRED_CONF_mapreduce_map_memory_mb=2048
MAPRED_CONF_mapreduce_reduce_memory_mb=2048
MAPRED_CONF_mapreduce_map_java_opts=-Xmx1024m
MAPRED_CONF_mapreduce_reduce_java_opts=-Xmx2048m
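These settings interact: each task JVM's -Xmx heap must fit inside its container (the *_memory_mb value), or the NodeManager may kill the container; a common rule of thumb is a heap of roughly 75-80% of the container size. Below is a hypothetical checker (`check_heap` and the `HADOOP_ENV` override are illustrative):

```shell
# Hypothetical check: warn when a task JVM's -Xmx is not below its
# container memory (HADOOP_ENV overrides the env-file path for testing).
check_heap() {  # usage: check_heap map|reduce
  f=${HADOOP_ENV:-hadoop.env}
  mb=$(sed -n "s/^MAPRED_CONF_mapreduce_${1}_memory_mb=//p" "$f")
  xmx=$(sed -n "s/^MAPRED_CONF_mapreduce_${1}_java_opts=-Xmx\([0-9]*\)m/\1/p" "$f")
  [ "$xmx" -lt "$mb" ] \
    && echo "${1}: heap ${xmx}m fits in ${mb}MB" \
    || echo "${1}: WARN -Xmx${xmx}m >= ${mb}MB container"
}
```

Run against the values above, `check_heap reduce` warns, since -Xmx2048m equals the 2048 MB container size.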
After configuration, run docker-compose up -d again to recreate the containers.
Visit http://resourcemanager:8088/cluster/scheduler and you will see that the hive queue now appears.
2. Submit tasks to the hive queue
hadoop jar /opt/hadoop-3.2.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar wordcount -D mapreduce.job.queuename=hive /shenjian/input /shenjian/output
Alternatively, we can set the queue in WordCountDriver:
conf.set("mapreduce.job.queuename", "hive");
After a while, you can see the completed job under FINISHED, as shown in the figure below.
Click the application ID, then click LOGS to view the detailed log output; if the job fails, different log files are generated, and you can click through to inspect the error log.
3. Task priority setting
Configure in hadoop.env
# The maximum task priority is 5
YARN_CONF_yarn_cluster_max___application___priority=5
Submit high priority tasks
hadoop jar /opt/hadoop-3.2.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar pi -D mapreduce.job.priority=5 5 2000000
You can also change the priority of a running application with the following command:
yarn application -appId <ApplicationId> -updatePriority <priority>
Welcome to follow my official account, Algorithm Niche, and get in touch to discuss.