Flink Scheduling Strategy Optimization: Task Balance

1. Background

Flink jobs are deployed on a Kubernetes-based standalone cluster: the Flink cluster is first brought up in containers, and the job is then submitted to it. Job submission happens while the TaskManagers are still being created and registered.

2. The problem

If the cluster has 35 TaskManagers with 140 slots in total, and the parallelism of a vertex is less than 140, that vertex's tasks end up unevenly distributed across the TaskManagers, leading to unbalanced node load. A concrete case:

  • The Flink topology has 5 vertices. Two of them have a parallelism of 140; the other three are set to 10, 30, and 35 according to the number of Kafka partitions. The maximum parallelism of the job is 140, and the resources are 35 TaskManager nodes of [4 cores, 8 GB] each (4 slots per TaskManager, 140 slots in total).
    [Image: job topology]

  • The Web UI shows that, even with cluster.evenly-spread-out-slots: true configured, the tasks of the other three vertices are still scheduled onto the same TaskManagers.
    [Screenshots: Web UI showing the uneven subtask distribution across TaskManagers]
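
In the k8s standalone setup described above, the option is set in flink-conf.yaml. For completeness, the sketch below shows the same key being set programmatically for a local experiment; the class name is illustrative, and only the documented key string is assumed:

```java
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SpreadOutSlotsDemo {
    public static void main(String[] args) throws Exception {
        // Equivalent to `cluster.evenly-spread-out-slots: true` in flink-conf.yaml.
        Configuration conf = new Configuration();
        conf.setString("cluster.evenly-spread-out-slots", "true");

        // Local environment for testing; the Web UI needs the flink-runtime-web dependency.
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.createLocalEnvironmentWithWebUI(conf);

        // ... build and execute the job as usual ...
    }
}
```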

3. Optimization method

1. Problem analysis

  • The problem above can be simplified as follows:

Suppose the job topology is Vertex A (p=2) -> Vertex B (p=4) -> Vertex C (p=2).
Based on slot sharing and the local-data-transfer-first division strategy, the tasks are divided into four ExecutionSlotSharingGroups: {A1, B1, C1}, {A2, B2, C2}, {B3}, {B4}.
If each TaskManager is configured with 2 slots, the following allocation may occur:

              Slot1        Slot2
TaskManager1  {A1,B1,C1}   {A2,B2,C2}
TaskManager2  {B3}         {B4}
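
The four groups above follow from Flink's default slot sharing behavior, which (roughly speaking) puts the i-th subtask of each vertex into the i-th group, up to the maximum vertex parallelism. A small illustrative model of that rule (not Flink's actual implementation):

```java
import java.util.ArrayList;
import java.util.List;

public class SlotSharingGroupModel {
    public static void main(String[] args) {
        String[] vertices = {"A", "B", "C"};
        int[] parallelism = {2, 4, 2};
        int maxParallelism = 4; // Vertex B

        // One ExecutionSlotSharingGroup per subtask index: group i holds subtask i
        // of every vertex whose parallelism is greater than i.
        List<List<String>> groups = new ArrayList<>();
        for (int i = 0; i < maxParallelism; i++) {
            List<String> group = new ArrayList<>();
            for (int v = 0; v < vertices.length; v++) {
                if (i < parallelism[v]) {
                    group.add(vertices[v] + (i + 1));
                }
            }
            groups.add(group);
        }

        // Prints: [[A1, B1, C1], [A2, B2, C2], [B3], [B4]]
        System.out.println(groups);
    }
}
```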

Slots currently split a TaskManager's memory evenly and place no limit on CPU, so the allocation above leads to unbalanced node load: if the A and C tasks consume more computing resources, TaskManager1 becomes the computational bottleneck. Ideally, we would like the allocation to be:

              Slot1        Slot2
TaskManager1  {A1,B1,C1}   {B3}
TaskManager2  {A2,B2,C2}   {B4}
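
For reference, a minimal job that reproduces this simplified A(p=2) -> B(p=4) -> C(p=2) topology (a sketch assuming a recent Flink version; operator bodies, names and the DiscardingSink are only placeholders):

```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.DiscardingSink;

public class SlotSharingTopologyDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromSequence(0, 10_000_000L).name("Vertex A").setParallelism(2)
                // Different parallelism prevents operator chaining, so B stays a separate vertex.
                .map(new MapFunction<Long, Long>() {
                    @Override
                    public Long map(Long value) {
                        return value * 2;
                    }
                }).name("Vertex B").setParallelism(4)
                .addSink(new DiscardingSink<Long>()).name("Vertex C").setParallelism(2);

        // All vertices stay in the default slot sharing group, so the job needs
        // max(2, 4, 2) = 4 slots, shared as {A1,B1,C1}, {A2,B2,C2}, {B3}, {B4}.
        env.execute("slot-sharing-topology-demo");
    }
}
```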

2. Optimization

Modified strategy (a sketch follows the list):
  1. When requesting slots for ExecutionSlotSharingGroups, first sort the groups by the number of tasks they contain, and allocate slots to the groups with more tasks first.
  2. Delay task scheduling: wait until enough TaskManagers have registered to spread the ExecutionSlotSharingGroups evenly before requesting slots for them.
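
A minimal sketch of point 1 above, using a simplified stand-in for Flink's ExecutionSlotSharingGroup rather than the internal class itself; requestSlotFor is a hypothetical placeholder for the real slot request path:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Set;

public class SortedGroupAllocation {

    /** Simplified stand-in for an ExecutionSlotSharingGroup (illustration only). */
    static final class Group {
        final Set<String> tasks; // the subtasks sharing this slot, e.g. {A1, B1, C1}

        Group(Set<String> tasks) {
            this.tasks = tasks;
        }
    }

    /** Point 1 of the modified strategy: request slots for the largest groups first. */
    static void allocateSorted(List<Group> groups) {
        groups.sort(Comparator.comparingInt((Group g) -> g.tasks.size()).reversed());
        for (Group group : groups) {
            // With cluster.evenly-spread-out-slots enabled, the "full" groups now land on
            // different TaskManagers before the small ones fill the remaining slots.
            requestSlotFor(group);
        }
    }

    private static void requestSlotFor(Group group) {
        // placeholder for the real slot request path
    }
}
```
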
Effect
  • Task scheduling after the optimization: tasks of the same vertex are scheduled evenly across different TaskManager nodes.
    [Screenshots: Web UI showing the balanced subtask distribution]

4. Performance comparison

1. CPU load comparison

  • Before optimization: CPU load differs widely between nodes, and some nodes stay at 100% load for long periods.
    [Screenshot: per-node CPU load before optimization]

  • After optimization: CPU load is much more even across nodes, and no node stays at 100% load for long.
    [Screenshot: per-node CPU load after optimization]

1.2 Another CPU usage comparison

The topology for this job contains two vertices with different parallelisms, 200 and 480. Balancing the slot sharing groups balances the CPU load across the TaskManager nodes, which in turn lets us shrink the TaskManager resource quota later.
[Screenshots: topology and per-node CPU usage for the 200/480 job]

2. Data Backlog

After optimization, the data backlog is reduced by half compared to before, with better processing capability and lower data latency under the same resource conditions.

  • Before optimization:
    [Screenshot: data backlog before optimization]
  • After optimization:
    [Screenshot: data backlog after optimization]

5. Further thoughts

1. Task balance

For the topology Vertex A (p=3) -> Vertex B (p=4) -> Vertex C (p=1), the ExecutionSlotSharingGroups may be distributed as follows:

              Slot1        Slot2
TaskManager1  {A1,B1,C1}   {A3,B3}
TaskManager2  {A2,B2}      {B4}

Vertex B -> Vertex C has four data transfer channels: (B1->C1), (B2->C1), (B3->C1), (B4->C1). Because this is a non-forward connection, no matter which group C1 is assigned to, at least three of these channels require cross-node communication.
So if the tasks are already balanced at grouping time, e.g. {A1, B1}, {A3, B3}, {A2, B2}, {B4, C1}, the load stays balanced no matter how the groups are scheduled afterwards. However, when the number of tasks is not divisible by the number of slots, tasks can still pile up on a single TaskManager.
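
A back-of-the-envelope check of the two claims above, using the A(p=3) -> B(p=4) -> C(p=1) example (illustrative arithmetic only):

```java
public class BalanceCheck {
    public static void main(String[] args) {
        int totalTasks = 3 + 4 + 1;      // A(p=3) + B(p=4) + C(p=1)
        int numGroups = 4;               // = max vertex parallelism = number of slots used

        int tasksPerGroup = totalTasks / numGroups;    // 2 -> {A1,B1}, {A3,B3}, {A2,B2}, {B4,C1}
        int oversizedGroups = totalTasks % numGroups;  // 0 -> a perfectly even grouping exists

        // If totalTasks % numGroups != 0, `oversizedGroups` groups carry one extra task, and
        // whichever TaskManager hosts them stays heavier no matter how groups are scheduled.
        System.out.printf("%d tasks per group, %d group(s) with one extra task%n",
                tasksPerGroup, oversizedGroups);
    }
}
```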

2. Improvements to delayed scheduling

While Flink generates the execution plan, derive the delay strategy automatically from the topology, so that the delay is transparent to the user and requires no manual configuration.
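
One way this could look, as a sketch under the assumptions of the setup in this article (slot sharing enabled for every vertex, a uniform taskmanager.numberOfTaskSlots): the number of TaskManagers to wait for is derived from the topology, and slot requests are held back until that many have registered.

```java
public class DelayConditionSketch {

    /**
     * Derive how many TaskManagers must be registered before scheduling starts,
     * so that the sorted ExecutionSlotSharingGroups can be spread evenly.
     */
    static int requiredTaskManagers(int maxVertexParallelism, int slotsPerTaskManager) {
        // With slot sharing, the number of groups equals the maximum vertex parallelism.
        int requiredSlots = maxVertexParallelism;
        return (requiredSlots + slotsPerTaskManager - 1) / slotsPerTaskManager; // ceiling division
    }

    public static void main(String[] args) {
        // Numbers from the job in this article: max parallelism 140, 4 slots per TaskManager.
        System.out.println(requiredTaskManagers(140, 4)); // 35 -> wait for all 35 TaskManagers
    }
}
```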

Original post: blog.csdn.net/qq_30708747/article/details/120081265