Detailed explanation of the Flink slot concept

Task is a logical concept in Flink. A task is composed of one or more operators (multiple operators must meet certain conditions before they can form a task; interested readers can learn more under Operator Chain). To improve execution efficiency, a parallelism can be configured for the task so that it runs in parallel at runtime. The parallel instances of a task are called subtasks (subTask). As shown in the figure below, each dotted box is a task, and the circles in the box are subtasks.
[Figure: each dotted box is a task; the circles inside it are its subtasks]

To sum up: Task is a logical concept, subTask is an actual running instance, and the number of subTasks in a Task is the degree of parallelism mentioned above. In the figure above, there are 3 Tasks and 6 subTasks. 

Slot
A slot is the basic unit of resource allocation in a Flink cluster. Slots live in TaskManagers. As anyone familiar with the Flink architecture knows, a TaskManager is a JVM process and is where subtasks run.

When a TaskManager starts, it registers its resources with the ResourceManager in the form of slots. After the JobManager requests slots from the ResourceManager, it schedules subtasks to run on those slots. Throughout this process, the subtask is the basic unit of scheduling and the slot is the basic unit of resource allocation.
One point needs explaining: memory is isolated between slots, but CPU is not. That is, each slot has its own memory, while the CPU is shared.
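
The registration and request flow described above can be sketched as a toy model. Everything here is hypothetical (the class name SlotPoolSketch, its methods, and the slot id format are made up for illustration and are not Flink's real classes). It also fails a request immediately when slots are missing, which simplifies what a real cluster does.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy model of slot registration and allocation; not Flink's real classes. */
final class SlotPoolSketch {
    // Free slots, identified as "<taskManagerId>#<slotIndex>" (a made-up format).
    private final Deque<String> freeSlots = new ArrayDeque<>();

    /** A TaskManager registers all of its slots when it starts. */
    void registerTaskManager(String taskManagerId, int numberOfSlots) {
        for (int i = 0; i < numberOfSlots; i++) {
            freeSlots.add(taskManagerId + "#" + i);
        }
    }

    /** The JobManager asks for slots; this sketch rejects the request if too few are free. */
    List<String> requestSlots(int count) {
        if (count > freeSlots.size()) {
            throw new IllegalStateException(
                "requested " + count + " slots, only " + freeSlots.size() + " free");
        }
        List<String> granted = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            granted.add(freeSlots.poll());
        }
        return granted;
    }

    int freeSlotCount() {
        return freeSlots.size();
    }
}
```

Subtasks would then be scheduled onto the granted slots, one scheduling unit per subtask.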

What is the relationship between Task parallelism and the number of slots?
A task's parallelism specifies how many subtasks of that task must execute in parallel. Each of these subtasks has to run in a different slot, so the number of slots cannot be less than the parallelism of the task.

Slot sharing means that several subtasks run in the same slot. Note that subtasks sharing a slot must meet the following conditions:

1. The subtasks must come from the same job, because resources are isolated between different jobs; a slot is only ever occupied by subtasks of one job.

2. The subtasks must come from different tasks. Subtasks of the same task cannot share a slot; otherwise parallelism would lose its meaning.
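
The two conditions can be written down as a small predicate. This is only an illustrative sketch: SlotSharingRule, Subtask, and the field names are invented here, and Flink's scheduler does not expose such a method.

```java
/** Illustrative check of the two sharing conditions; all names are hypothetical. */
final class SlotSharingRule {
    static final class Subtask {
        final String jobId;    // job the subtask belongs to
        final String taskName; // logical task the subtask is an instance of
        final int index;       // position among the task's parallel instances

        Subtask(String jobId, String taskName, int index) {
            this.jobId = jobId;
            this.taskName = taskName;
            this.index = index;
        }
    }

    /** True only if both subtasks are from the same job and from different tasks. */
    static boolean canShareSlot(Subtask a, Subtask b) {
        boolean sameJob = a.jobId.equals(b.jobId);
        boolean differentTasks = !a.taskName.equals(b.taskName);
        return sameJob && differentTasks;
    }
}
```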

What happens if slots are not shared and each subtask runs in a separate slot?

We know that in a job's DAG, different tasks consume different amounts of resources. If every subtask gets its own equally sized slot, some slots will be heavily utilized and others barely used. Letting subtasks of different tasks share a slot puts resource-heavy and resource-light subtasks together as much as possible, so that resources are reused. Otherwise, once the resource-light subtasks finish running, the slots assigned to them sit idle.

In addition, slot sharing lowers the resource threshold for running a job. Without slot sharing, the number of slots a job needs equals the total number of subtasks in the job. With slot sharing, the number of slots needed equals the maximum parallelism among all tasks in the DAG.

In the scenario with slot sharing, the above application only needs 2 slots:

[Figure: with slot sharing, the job's 6 subtasks run in 2 slots]
In the scenario without slot sharing, 6 slots are required:
[Figure: without slot sharing, each of the 6 subtasks occupies its own slot]
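
Both numbers can be reproduced with a short calculation. The sketch below assumes the example job has 3 tasks with parallelism 2 each (which is what 3 tasks, 6 subtasks, and 2 shared slots imply) and that all tasks sit in one slot sharing group; the class name SlotCount is made up.

```java
import java.util.Arrays;

/** Slot demand of a job, given the parallelism of each of its tasks. */
final class SlotCount {
    /** Without slot sharing, every subtask needs its own slot. */
    static int withoutSharing(int[] taskParallelism) {
        return Arrays.stream(taskParallelism).sum();
    }

    /** With slot sharing in a single group, the task with the highest parallelism decides. */
    static int withSharing(int[] taskParallelism) {
        return Arrays.stream(taskParallelism).max().orElse(0);
    }
}
```

For the parallelism array {2, 2, 2}, withoutSharing gives 6 and withSharing gives 2.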

Chaining operators into tasks is a useful optimization:

  • It reduces the overhead of thread switching and buffering, which increases overall throughput while reducing latency.
  • Chaining behavior is configurable; chaining two operators together allows them to execute in the same thread, improving performance.

By default, Flink chains operators together wherever possible (for example, two consecutive map transformations). Flink also provides an API for finer-grained control over chaining:
If you want to disable operator chaining for the entire job, call StreamExecutionEnvironment.disableOperatorChaining(). The methods below provide finer-grained control. Note that they can only be called directly after a DataStream transformation, because they apply to the preceding transformation. For example, someStream.map(...).startNewChain() is valid, but someStream.startNewChain() is not.
A resource group corresponds to a slot in Flink, and you can manually isolate operators into different slots as needed.
    
 

Start new chain
Begins a new chain with the current operator as its starting point. In the example below, the two map operators are chained together, but the filter operator is not chained to the first map operator.
someStream.filter(...).map(...).startNewChain().map(...);

Disable chaining
No operator can be chained with the current operator.
someStream.map(...).disableChaining();

Set slot sharing group
Configures the operator's resource group. Flink places operators of the same resource group in the same slot for execution and assigns operators of different resource groups to different slots, thereby achieving slot isolation. The resource group is inherited from the input operators if all of them are in the same resource group. Flink's default resource group is named "default", and operators can join it explicitly by calling slotSharingGroup("default").
someStream.filter(...).slotSharingGroup("name");
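
To make the effect of startNewChain() and disableChaining() concrete, here is a simplified model in plain Java. It is not Flink's actual chaining algorithm: it assumes a linear pipeline in which every neighbouring pair of operators would otherwise be chainable, and the names ChainSketch, Op, and Hint are made up for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model of chaining hints on a linear pipeline; not Flink's real algorithm. */
final class ChainSketch {
    /** Mirrors the effect of startNewChain() and disableChaining() on one operator. */
    enum Hint { DEFAULT, START_NEW_CHAIN, DISABLE_CHAINING }

    static final class Op {
        final String name;
        final Hint hint;

        Op(String name, Hint hint) {
            this.name = name;
            this.hint = hint;
        }
    }

    /** Groups operator names into chains, in pipeline order. */
    static List<List<String>> buildChains(List<Op> pipeline) {
        List<List<String>> chains = new ArrayList<>();
        boolean previousAcceptsSuccessor = false;
        for (Op op : pipeline) {
            // Both hints stop the operator from joining its predecessor's chain.
            boolean joinsPrevious = previousAcceptsSuccessor && op.hint == Hint.DEFAULT;
            if (!joinsPrevious) {
                chains.add(new ArrayList<>());
            }
            chains.get(chains.size() - 1).add(op.name);
            // Only disableChaining() also stops the next operator from joining.
            previousAcceptsSuccessor = op.hint != Hint.DISABLE_CHAINING;
        }
        return chains;
    }
}
```

Feeding it filter, map (with START_NEW_CHAIN), map yields two chains, one holding the filter and one holding both maps, which matches the "Start new chain" description above.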

Task Slots and resources
Each worker (TaskManager) is a JVM process and can execute one or more subtasks in separate threads. To control how many tasks a TaskManager accepts, it has so-called task slots (at least one). Each task slot represents a fixed subset of the TaskManager's resources.
For example, a TaskManager with 3 slots dedicates 1/3 of its managed memory to each slot. Slotting the resources means that a subtask does not compete with subtasks of other jobs for managed memory, but instead has a certain amount of reserved managed memory. Note that there is no CPU isolation here; currently, slots only separate the managed memory of tasks.
By adjusting the number of task slots, users can define how subtasks are isolated from each other. Having one slot per TaskManager means that each task group runs in a separate JVM (which can, for example, be started in a separate container). Having multiple slots means that more subtasks share the same JVM. Tasks in the same JVM share TCP connections (via multiplexing) and heartbeat messages. They may also share data sets and data structures, reducing per-task overhead.
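
The even split of managed memory is simple arithmetic, sketched below. The class name ManagedMemorySplit and the 3 GiB figure in the usage note are assumptions for illustration; in a real cluster the slot count comes from the taskmanager.numberOfTaskSlots setting.

```java
/** Even split of a TaskManager's managed memory across its slots (illustrative only). */
final class ManagedMemorySplit {
    /** Returns the managed memory reserved for each slot, in bytes. */
    static long perSlotBytes(long managedMemoryBytes, int numberOfSlots) {
        if (numberOfSlots < 1) {
            throw new IllegalArgumentException("a TaskManager has at least one slot");
        }
        // CPU is not divided here: only managed memory is separated per slot.
        return managedMemoryBytes / numberOfSlots;
    }
}
```

With a hypothetical 3 GiB of managed memory and 3 slots, each slot is reserved 1 GiB.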
 

A slot in Flink is the smallest unit of resources requested for task execution. Slots on the same TaskManager are isolated only in memory, not in CPU.
Each TaskManager is a JVM process. If a TaskManager has only one slot, each task group runs in a separate JVM. If it has multiple slots, more subtasks share the same JVM.
In general, the number of subtasks equals the number of parallel threads, and the parallel subtasks of one task must be dispatched to different slots for execution.
By default, Flink chains operators together wherever possible, forming operator chains. The operators of one chain run as a single subtask in one slot, and with slot sharing a slot may have to execute multiple subtasks, that is, multiple threads.
Flink lets you manually isolate operators into different slots as needed.
The total number of slots used by a job is the sum of the slots occupied by all of its resource isolation groups. Within one resource isolation group, slots are allocated according to the maximum parallelism of its operators.
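
The rule in the last point (maximum parallelism within a group, summed over groups) can be sketched as follows. The class name SlotSharingGroups, the parallel-array input format, and the group name "heavy" in the usage note are assumptions made for this example.

```java
import java.util.HashMap;
import java.util.Map;

/** Slot demand of a job whose operators are spread over several slot sharing groups. */
final class SlotSharingGroups {
    /**
     * groupOfOperator[i] is the group name of operator i and
     * parallelismOfOperator[i] is its parallelism.
     */
    static int requiredSlots(String[] groupOfOperator, int[] parallelismOfOperator) {
        if (groupOfOperator.length != parallelismOfOperator.length) {
            throw new IllegalArgumentException("one parallelism value per operator is required");
        }
        // Within a group, the operator with the highest parallelism decides.
        Map<String, Integer> maxPerGroup = new HashMap<>();
        for (int i = 0; i < groupOfOperator.length; i++) {
            maxPerGroup.merge(groupOfOperator[i], parallelismOfOperator[i], Integer::max);
        }
        // Across groups, the demands add up because groups do not share slots.
        int total = 0;
        for (int slots : maxPerGroup.values()) {
            total += slots;
        }
        return total;
    }
}
```

For example, two operators in "default" with parallelism 2 and 4 plus one operator in a hypothetical group "heavy" with parallelism 3 need max(2, 4) + 3 = 7 slots.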

If the parallelism is 4 but only 3 slots are available, the deployment will fail.
 

Origin blog.csdn.net/qq_35240226/article/details/129009358