A Flink akka AskTimeoutException caused by slots and parallelism

First look at the program error message:

caused by: akka.pattern.AskTimeoutException: 
Ask timed out on [Actor[akka://flink/user/taskmanager_0#15608456]] after [10000 ms]. 
Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.LocalRpcInvocation".

While investigating this error, I found a similar report in Flink's issue tracker: https://issues.apache.org/jira/browse/FLINK-9056. Judging from the comments, the cause is almost always an insufficient number of TaskManager slots, which makes the job submission fail. In Flink 1.6.3, this was fixed so that an exception is thrown instead.

In my case the cause was indeed a lack of slots, so I will briefly introduce parallelism and slots here.

1. Parallelism

In Flink, parallelism is the number of parallel subtasks that each operator runs with. Appropriately increasing the parallelism can greatly improve the execution efficiency of a job. For example, if your job consumes Kafka data too slowly, raising the parallelism appropriately may bring consumption back to normal.

How to set parallelism?

(1) Set it in the configuration file flink-conf.yaml

parallelism.default: 1  

(2) Set it on the execution environment in code

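import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// Default parallelism for every operator that does not set its own
env.setParallelism(10);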

Note: parallelism set this way applies to the whole program. Any operator that does not set its own parallelism uses the value set here.

(3) Set it on each operator

source.map(new XxxMapFunction()).setParallelism(5);

Here the parallelism is set directly on the operator. In this case, even if you called env.setParallelism(10) earlier, that value is overridden for this operator.

The priority is: operator-level parallelism > env-level parallelism > default parallelism in the configuration file
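
The precedence above can be sketched in plain Java, without any Flink dependency. The class `ParallelismResolver` and its method `effectiveParallelism` are hypothetical names used only for illustration and are not part of Flink's API. A `null` argument stands for "not set at this level".

```java
// Illustration only: mimics the precedence operator > env > flink-conf.yaml.
// ParallelismResolver is a hypothetical helper, not a Flink class.
final class ParallelismResolver {

    /**
     * @param operatorLevel value passed to setParallelism(...) on an operator, or null if unset
     * @param envLevel      value passed to env.setParallelism(...), or null if unset
     * @param configDefault parallelism.default from flink-conf.yaml
     */
    static int effectiveParallelism(Integer operatorLevel, Integer envLevel, int configDefault) {
        if (operatorLevel != null) {
            return operatorLevel;   // the most specific setting wins
        }
        if (envLevel != null) {
            return envLevel;        // applies to operators without their own setting
        }
        return configDefault;       // fallback from the configuration file
    }

    public static void main(String[] args) {
        System.out.println(effectiveParallelism(5, 10, 1));       // prints 5
        System.out.println(effectiveParallelism(null, 10, 1));    // prints 10
        System.out.println(effectiveParallelism(null, null, 1));  // prints 1
    }
}
```

The first call corresponds to the map operator above: it runs with parallelism 5 even though the environment was set to 10.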

2. Slot

A TaskManager receives the tasks to deploy from the JobManager. The parallelism that tasks can reach is bounded by the slots available on the TaskManagers. Each task slot represents a fixed group of a TaskManager's resources, so a slot can be regarded as a resource group in Flink. Flink divides each task into subtasks and assigns these subtasks to slots so that the program executes in parallel.

For example, if a TaskManager has four slots, it allocates 25% of its managed memory to each slot. One or more threads can run in a slot. Threads in the same slot share the same JVM, and tasks in the same JVM share TCP connections and heartbeat messages. Each slot of a TaskManager therefore represents execution capacity with a fixed share of memory. Note that slots only isolate memory, not CPU. By default, Flink allows subtasks to share a slot, even if they are subtasks of different tasks, as long as they belong to the same job. This sharing leads to better resource utilization.

Next, let's illustrate with an example. In the figure there are two TaskManagers, each with three slots, so an operator can reach a maximum parallelism of 6, and one or more subtasks can run in the same slot.
In the same figure, source/map/keyBy/window/apply can each have a parallelism of up to 6, while the sink uses a parallelism of only 1.
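
The arithmetic of this example, and the slot shortage behind the AskTimeoutException, can be sketched as follows. This is a simplified model that assumes all operators stay in Flink's default slot sharing group, in which case a job needs as many slots as its highest operator parallelism. `SlotCheck` and its methods are hypothetical names, not Flink APIs.

```java
// Simplified model of slot demand vs. supply; SlotCheck is a hypothetical helper.
final class SlotCheck {

    /** Slots offered by the cluster: every TaskManager contributes the same number. */
    static int totalSlots(int taskManagers, int slotsPerTaskManager) {
        return taskManagers * slotsPerTaskManager;
    }

    /** With default slot sharing, the job needs as many slots as its highest parallelism. */
    static int requiredSlots(int... operatorParallelisms) {
        int max = 0;
        for (int p : operatorParallelisms) {
            max = Math.max(max, p);
        }
        return max;
    }

    /** True if the cluster offers at least as many slots as the job requires. */
    static boolean canSchedule(int availableSlots, int... operatorParallelisms) {
        return requiredSlots(operatorParallelisms) <= availableSlots;
    }

    public static void main(String[] args) {
        int available = totalSlots(2, 3);  // two TaskManagers with three slots each
        // source, map, keyBy, window, apply at parallelism 6 and the sink at 1
        System.out.println(canSchedule(available, 6, 6, 6, 6, 6, 1));  // prints true
        // env.setParallelism(10) on the same cluster asks for more slots than exist
        System.out.println(canSchedule(available, 10, 10, 1));         // prints false
    }
}
```

The second call is the situation described at the top of this post: the job asks for more slots than the TaskManagers provide, so submission cannot succeed.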

Each Flink TaskManager provides slots to the cluster. The number of slots is usually proportional to the number of CPU cores available to each TaskManager. Under normal circumstances, the number of slots per TaskManager equals the number of CPU cores of that TaskManager.
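
Following this rule of thumb, a rough sizing calculation might look like the sketch below. It assumes one slot per CPU core and default slot sharing, and `ClusterSizing` is a hypothetical helper rather than anything Flink provides. In a real cluster, the slot count per TaskManager is configured with `taskmanager.numberOfTaskSlots` in flink-conf.yaml.

```java
// Rough sizing sketch under the "one slot per CPU core" rule of thumb.
// ClusterSizing is a hypothetical helper, not a Flink class.
final class ClusterSizing {

    /** Suggested slots for one TaskManager: one per core, but never fewer than one. */
    static int suggestedSlots(int cpuCores) {
        return Math.max(1, cpuCores);
    }

    /** TaskManagers needed to cover the job's highest parallelism (rounded up). */
    static int taskManagersNeeded(int maxParallelism, int slotsPerTaskManager) {
        if (slotsPerTaskManager <= 0) {
            throw new IllegalArgumentException("slotsPerTaskManager must be positive");
        }
        return (maxParallelism + slotsPerTaskManager - 1) / slotsPerTaskManager;
    }

    public static void main(String[] args) {
        // Cores visible to this JVM; on a TaskManager host this would drive the slot count.
        int cores = Runtime.getRuntime().availableProcessors();
        int slots = suggestedSlots(cores);
        System.out.println("suggested slots per TaskManager: " + slots);
        System.out.println("TaskManagers needed for parallelism 10: "
                + taskManagersNeeded(10, slots));
    }
}
```

For instance, with 4 slots per TaskManager a job with a maximum parallelism of 10 needs 3 TaskManagers; with only 2, the job would run into the slot shortage described above.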

Origin blog.csdn.net/qq_44962429/article/details/108054810