Flink 1.17 Tutorial: Standalone Session Mode Runtime Architecture and Parallelism

Runtime architecture - Standalone session mode as an example

[Figure: runtime architecture in Standalone session mode]

Parallelism

Parallelism is the ability to carry out multiple tasks or operations at the same time. In Apache Flink, it means that multiple tasks or operator subtasks of a job execute simultaneously.

Parallelism addresses the following problems:

  1. Faster computation: Splitting a task into multiple subtasks and executing them in parallel can greatly shorten the computation time. Each subtask runs independently on its own computing resources, so the available parallel processing capacity is put to use.
  2. Processing large-scale data: A single task that processes a large data set may run out of memory or take too long. Dividing the work into parallel tasks spreads the load across multiple computing resources, which improves the efficiency and scalability of large-scale processing.
  3. Fault tolerance: When the work is spread over parallel tasks, the failure of one task or operator does not have to bring down all processing at once. In Flink, whether the other parallel tasks keep running depends on the job's failover strategy (tasks connected to the failed one may be restarted with it), and recovery itself relies on checkpointing.

In short, parallelism decomposes a job into smaller units that execute simultaneously. This speeds up computation, keeps large-scale data manageable, and limits the impact of failures, all by making better use of the available computing resources.
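The idea of splitting one task into subtasks that run in parallel can be sketched in plain Java. This is only an illustration built on a JDK thread pool, not how Flink schedules its subtasks; the class name `ParallelSum` and the method `parallelSum` are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative only: plain JDK threads, not Flink's runtime.
class ParallelSum {

    // Splits `data` into `parallelism` contiguous slices, sums each slice in
    // its own subtask, then combines the partial results.
    static long parallelSum(int[] data, int parallelism) {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            // Ceiling division so that every element falls into some slice.
            int chunk = (data.length + parallelism - 1) / parallelism;
            List<Future<Long>> partials = new ArrayList<>();
            for (int i = 0; i < parallelism; i++) {
                final int from = Math.min(i * chunk, data.length);
                final int to = Math.min(from + chunk, data.length);
                Callable<Long> subtask = () -> {
                    long sum = 0;
                    for (int j = from; j < to; j++) {
                        sum += data[j];
                    }
                    return sum;
                };
                partials.add(pool.submit(subtask));
            }
            long total = 0;
            for (Future<Long> partial : partials) {
                total += partial.get(); // waits for that subtask to finish
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("a subtask failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] data = new int[100];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1; // 1..100
        }
        System.out.println(parallelSum(data, 4)); // prints 5050
    }
}
```

Each slice is summed independently, so a larger `parallelism` lets more CPU cores work on the data at once; the final loop is the only step that has to wait for all subtasks.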

[Figure: a data stream with four operators (source, map, window, sink) and their parallel subtasks]

The number of subtasks of a particular operator is called its parallelism. A data stream whose operators run as parallel subtasks is a parallel data stream, and it is split into multiple stream partitions that feed those subtasks. Different operators in the same program may have different degrees of parallelism; the parallelism of the stream program as a whole is generally taken to be the largest parallelism among all its operators.
For example, the data stream in the figure above has four operators: source, map, window, and sink. The sink operator has a parallelism of 1 and the other operators have a parallelism of 2, so the parallelism of this stream processing program is 2.
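The rule that a program's parallelism is the largest parallelism among its operators can be checked with a small plain-Java model of the example above. The class `JobParallelism` and its methods are hypothetical helpers written for this sketch and are not part of the Flink API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the example job; not Flink code.
class JobParallelism {

    // The four operators from the example, with their parallelism values.
    static Map<String, Integer> exampleJob() {
        Map<String, Integer> operators = new LinkedHashMap<>();
        operators.put("source", 2);
        operators.put("map", 2);
        operators.put("window", 2);
        operators.put("sink", 1);
        return operators;
    }

    // Program parallelism = the largest parallelism among all operators.
    static int jobParallelism(Map<String, Integer> operators) {
        int max = 0;
        for (int parallelism : operators.values()) {
            max = Math.max(max, parallelism);
        }
        return max;
    }

    // Each operator contributes as many subtasks as its parallelism.
    static int totalSubtasks(Map<String, Integer> operators) {
        int total = 0;
        for (int parallelism : operators.values()) {
            total += parallelism;
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Integer> job = exampleJob();
        System.out.println(jobParallelism(job)); // prints 2
        System.out.println(totalSubtasks(job));  // prints 7
    }
}
```

In a real Flink DataStream program these values would not be stored in a map. They are set with `setParallelism(...)`, either on the execution environment as a default or on an individual operator.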

Origin blog.csdn.net/a772304419/article/details/132626415