Balanced Scheduling of Complex Flink Tasks and Optimization Measures

1. Background

Flink jobs are deployed on a Kubernetes-based standalone cluster: the Flink cluster is first deployed in containers, and the Flink job is then submitted. Job submission happens concurrently with the creation and registration of the TaskManagers.

2. The Problem

If the cluster has 35 TaskManagers and 140 slots, and the parallelism of one of the vertices is less than 140, the tasks belonging to that vertex are distributed unevenly across the TaskManagers, resulting in unbalanced node load.

As follows:

  • The Flink topology has 5 vertices; two of them have a parallelism of 140, while the other three are set to 10, 30 and 35 according to the number of Kafka partitions. The maximum parallelism of the job is 140, and the job resources are configured as 35 TaskManager nodes of [4 cores, 8 GB] each

  • The web UI shows that, even with cluster.evenly-spread-out-slots: true configured, the tasks of the other three vertices are still scheduled onto the same TaskManagers (a minimal job sketch follows this list)
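
For reference, here is a minimal sketch of a job with the same shape (generic sources and maps stand in for the real Kafka-based operators; all names and operations here are made up). It assumes a standalone session cluster of 35 TaskManagers with 4 slots each and cluster.evenly-spread-out-slots: true set in flink-conf.yaml:

```java
// Hypothetical job skeleton reproducing the vertex/parallelism layout above.
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.DiscardingSink;

public class UnevenParallelismJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.disableOperatorChaining(); // keep each operator as its own vertex for illustration

        env.fromSequence(0, Long.MAX_VALUE)                          // stand-in for a 10-partition Kafka source
           .setParallelism(10)
           .rebalance()
           .map(v -> v * 2).returns(Types.LONG).setParallelism(140)  // heavy vertex #1
           .map(v -> v + 1).returns(Types.LONG).setParallelism(140)  // heavy vertex #2
           .map(v -> v % 7).returns(Types.LONG).setParallelism(30)   // vertex sized to a 30-partition topic
           .addSink(new DiscardingSink<>()).setParallelism(35);      // vertex sized to a 35-partition topic

        env.execute("uneven-parallelism-demo");
    }
}
```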

3. Optimization method

1. Problem Analysis

The above problem can be simplified as follows:

Suppose a job topology is Vertex A (p=2) -> Vertex B (p=4) -> Vertex C (p=2). Based on slot sharing and a local-data-transfer-first partitioning strategy, it is divided into four ExecutionSlotSharingGroups: {A1, B1, C1}, {A2, B2, C2}, {B3}, {B4}.

If the resource configuration gives each TaskManager 2 slots, the following allocation may occur: for example, TaskManager1 holds {A1, B1, C1} and {A2, B2, C2}, while TaskManager2 holds {B3} and {B4}.

Slots currently split memory evenly and place no limit on CPU, so the allocation above leads to unbalanced node load: if the A and C tasks consume more computing resources, TaskManager1 becomes the computing bottleneck. Ideally, we would like each TaskManager to hold one full group and one B-only group, e.g. TaskManager1 gets {A1, B1, C1} and {B3}, and TaskManager2 gets {A2, B2, C2} and {B4}.
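
The exact grouping in this example can be reproduced with a tiny sketch of the local-data-exchange-first heuristic (an illustration only, not Flink's internal code): subtask i of every vertex whose parallelism exceeds i goes into group i.

```java
// Toy derivation of the ExecutionSlotSharingGroups used in the example above.
import java.util.*;

public class SlotSharingGroupSketch {
    public static void main(String[] args) {
        Map<String, Integer> parallelism = new LinkedHashMap<>();
        parallelism.put("A", 2);
        parallelism.put("B", 4);
        parallelism.put("C", 2);

        int maxParallelism = Collections.max(parallelism.values()); // 4 groups in total
        List<List<String>> groups = new ArrayList<>();
        for (int i = 0; i < maxParallelism; i++) {
            List<String> group = new ArrayList<>();
            for (Map.Entry<String, Integer> e : parallelism.entrySet()) {
                if (i < e.getValue()) {
                    group.add(e.getKey() + (i + 1)); // e.g. "A1", "B3"
                }
            }
            groups.add(group);
        }
        System.out.println(groups); // [[A1, B1, C1], [A2, B2, C2], [B3], [B4]]
    }
}
```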

2. Optimization

Policy changes

1. When applying for slots for the ExecutionSlotSharingGroups, first sort them by the number of tasks they contain, and schedule the groups with more tasks first

2. Delay task scheduling: wait until enough TaskManagers have registered for the ExecutionSlotSharingGroups to be distributed evenly, and only then apply for slots for them (see the sketch after this list)
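
A minimal sketch of these two rules, using a hypothetical Group type and a simple assignment map rather than Flink's actual scheduler classes, might look like this:

```java
// Simplified sketch of the two rules above (not Flink's real scheduler code).
import java.util.*;

public class BalancedGroupScheduler {
    record Group(String name, int taskCount) {}

    /** Returns group -> TaskManager index, or empty if scheduling should still wait. */
    static Optional<Map<Group, Integer>> schedule(List<Group> groups,
                                                  int registeredTaskManagers,
                                                  int slotsPerTaskManager) {
        // Rule 2: delay scheduling until the groups can be spread evenly.
        int neededTaskManagers = (int) Math.ceil((double) groups.size() / slotsPerTaskManager);
        if (registeredTaskManagers < neededTaskManagers) {
            return Optional.empty();
        }
        // Rule 1: request slots for the groups with the most tasks first.
        List<Group> ordered = new ArrayList<>(groups);
        ordered.sort(Comparator.comparingInt(Group::taskCount).reversed());

        // Spread the ordered groups round-robin so heavy groups land on different nodes.
        Map<Group, Integer> assignment = new LinkedHashMap<>();
        for (int i = 0; i < ordered.size(); i++) {
            assignment.put(ordered.get(i), i % registeredTaskManagers);
        }
        return Optional.of(assignment);
    }

    public static void main(String[] args) {
        List<Group> groups = List.of(
                new Group("{A1,B1,C1}", 3), new Group("{A2,B2,C2}", 3),
                new Group("{B3}", 1), new Group("{B4}", 1));
        // With 2 TaskManagers x 2 slots, each node gets one heavy and one light group.
        System.out.println(schedule(groups, 2, 2).orElseThrow());
    }
}
```

With 2 TaskManagers of 2 slots each, the ordered allocation gives every node one heavy and one light group, which is the ideal layout described earlier.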

Effect

Task scheduling after optimization: the tasks of the same vertex are evenly scheduled across different TaskManager nodes.

4. Performance comparison

1. CPU load comparison

  • Before optimization: CPU load varies widely between nodes, and some nodes stay at 100% load for long periods

  • After optimization: CPU load is much more uniform across nodes, and no node stays at 100% load for long

Here is another CPU usage comparison:

As can be seen from the topology diagram, there are two vertices with different parallelisms of 200 and 480. By balancing the tasks across the slot sharing groups, the CPU load of each TaskManager node is balanced, which lets us subsequently reduce the resource quota of each TaskManager.

2. Data Backlog

After optimization, the data backlog is reduced by half compared to before; with the same resources, the job has better processing capability and lower data latency.

  • Before optimization:

  • After optimization:

5. Further Thoughts

1. Task balance

Consider the topology Vertex A (p=3) -> Vertex B (p=4) -> Vertex C (p=1).

Following the same local-first strategy as before, it will be divided into groups such as {A1, B1, C1}, {A2, B2}, {A3, B3}, {B4}.

Vertex B -> Vertex C has four data transmission channels: (B1->C1), (B2->C1), (B3->C1), (B4->C1). For such non-forward connections, no matter which group each subtask is assigned to, at least three of these channels require cross-node communication.

Then, if the tasks are balanced first when grouping, i.e. {A1, B1}, {A3, B3}, {A2, B2}, {B4, C1}, the load stays balanced no matter how the groups are scheduled afterwards (a grouping sketch follows below). However, when task num % slot num != 0, tasks can still pile up on a single TaskManager.
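
Under the same assumptions as the earlier grouping sketch, a toy version of "balance tasks first" could cap each group at ceil(total tasks / number of groups) and spill the overflow into the currently smallest group; it reproduces the grouping above:

```java
// Toy sketch of balanced grouping; not Flink's real algorithm.
import java.util.*;

public class BalancedGrouping {
    public static void main(String[] args) {
        Map<String, Integer> parallelism = new LinkedHashMap<>();
        parallelism.put("A", 3);
        parallelism.put("B", 4);
        parallelism.put("C", 1);

        int numGroups = Collections.max(parallelism.values());                            // 4
        int totalTasks = parallelism.values().stream().mapToInt(Integer::intValue).sum(); // 8
        int cap = (int) Math.ceil((double) totalTasks / numGroups);                       // 2 tasks per group

        List<List<String>> groups = new ArrayList<>();
        for (int i = 0; i < numGroups; i++) groups.add(new ArrayList<>());

        for (Map.Entry<String, Integer> e : parallelism.entrySet()) {
            for (int i = 0; i < e.getValue(); i++) {
                List<String> target = groups.get(i);
                if (target.size() >= cap) {
                    // group i is full: spill into the currently smallest group
                    target = Collections.min(groups, Comparator.comparingInt((List<String> g) -> g.size()));
                }
                target.add(e.getKey() + (i + 1));
            }
        }
        System.out.println(groups); // [[A1, B1], [A2, B2], [A3, B3], [B4, C1]] -- same groups as above
    }
}
```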

2. Improvements to delayed scheduling

While Flink generates the execution plan, the delay strategy could be derived from the topology itself, so that users do not need to configure it or even be aware of it.
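
For example, a hypothetical helper (not an existing Flink API) could derive the wait threshold directly from the vertex parallelisms in the execution plan, assuming default slot sharing:

```java
// Sketch: derive the "wait for N TaskManagers" threshold from the topology.
import java.util.List;

public class DelayThreshold {
    /** Minimum TaskManagers to wait for before requesting slots. */
    static int minTaskManagersToWaitFor(List<Integer> vertexParallelisms, int slotsPerTaskManager) {
        // Under default slot sharing the job needs as many slots as its max vertex parallelism,
        // i.e. the number of ExecutionSlotSharingGroups.
        int requiredSlots = vertexParallelisms.stream().mapToInt(Integer::intValue).max().orElse(0);
        return (int) Math.ceil((double) requiredSlots / slotsPerTaskManager);
    }

    public static void main(String[] args) {
        // The job from section 2: vertices with parallelism 140, 140, 10, 30, 35 and 4 slots per TM.
        System.out.println(minTaskManagersToWaitFor(List.of(140, 140, 10, 30, 35), 4)); // 35
    }
}
```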

This article is from the "Big Data Technology and Architecture" WeChat public account.


Origin blog.csdn.net/xianyu624/article/details/130993119