Flink learning 2 - Flink architecture

Flink architecture diagram

[Figure: Flink architecture diagram from the official website]
According to the architecture diagram on the official website, a Flink cluster starts one JobManager and multiple TaskManagers. The user's Flink program is submitted to the JobManager through the client, and the JobManager distributes the programs submitted by different users to different TaskManagers for execution. Each TaskManager manages multiple tasks, and the actual computation takes place inside those tasks. TaskManagers report heartbeats and statistics to the JobManager, and data is transmitted between TaskManagers in the form of streams.
Note that there is no one-to-one correspondence between TaskManagers and jobs: the smallest unit of Flink scheduling is the task, not the TaskManager. In other words, tasks from different jobs may run in different threads of the same TaskManager.
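Below is a minimal sketch of such a user program (a hypothetical word-count job written with the Flink DataStream API; the class and job names are made up). When it is packaged and submitted through the client, for example with `flink run`, the resulting JobGraph is handed to the JobManager, which schedules its tasks onto TaskManager slots.

```java
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class WordCountSketch {
    public static void main(String[] args) throws Exception {
        // The execution environment represents the connection to the cluster;
        // when the job is submitted, the client sends the JobGraph to the JobManager.
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements("flink architecture", "flink task slots")
           .flatMap((String line, Collector<Tuple2<String, Integer>> out) -> {
               for (String word : line.split(" ")) {
                   out.collect(Tuple2.of(word, 1));
               }
           })
           .returns(Types.TUPLE(Types.STRING, Types.INT)) // lambdas need an explicit result type
           .keyBy(t -> t.f0)
           .sum(1)
           .print();

        // execute() triggers the actual submission of the dataflow.
        env.execute("word-count-sketch");
    }
}
```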
JobManager:
It decides when to schedule the next task (or set of tasks), reacts to finished tasks or execution failures, coordinates checkpoints, and coordinates recovery from failures. This process consists of three different components:

  • ResourceManager
    The ResourceManager is responsible for resource provisioning, recycling, and allocation in a Flink cluster - it manages task slots, which are the unit of resource scheduling in a Flink cluster (see TaskManagers). Flink implements a dedicated ResourceManager for each environment and resource provider (such as YARN, Mesos, Kubernetes, and standalone deployments). In a standalone setup, the ResourceManager can only distribute the slots of available TaskManagers; it cannot start a new TaskManager on its own.

  • Dispatcher
    The Dispatcher provides a REST interface for submitting Flink applications for execution and starts a new JobMaster for each submitted job. It also runs the Flink WebUI to provide information about job executions.

  • JobMaster
    The JobMaster is responsible for managing the execution of a single JobGraph. Multiple jobs can run concurrently in a Flink cluster, and each job has its own JobMaster.
    There is always at least one JobManager. A High Availability (HA) setup may have multiple JobManagers, one of which is always the leader while the others are standby.

TaskManager:
Also called a worker, each TaskManager is a JVM process. It executes the tasks of a job's dataflow, and buffers and exchanges data streams. Note that there must always be at least one TaskManager. The smallest unit of resource scheduling in a TaskManager is the task slot.

Task & Task slots
A task slot is the smallest unit of resource allocation in a TaskManager. It represents a fixed-size subset of the TaskManager's resources, and each TaskManager divides the resources it holds equally among its slots.
By adjusting the number of task slots, users can define how tasks are isolated from one another. If each TaskManager has a single slot, every task runs in its own JVM. If a TaskManager has multiple slots, multiple tasks run in the same JVM. Tasks in the same JVM process can share TCP connections (through multiplexing) and heartbeat messages, which reduces network data transfer, and they can share some data structures, reducing the per-task overhead to a certain extent.
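As a rough illustration of how slots and parallelism interact (the slot count and parallelism values below are arbitrary assumptions, not recommendations): the number of slots per TaskManager is set with the `taskmanager.numberOfTaskSlots` option, while the parallelism can be set per job or per operator in the program.

```java
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SlotParallelismSketch {
    public static void main(String[] args) throws Exception {
        // Slots per TaskManager are configured in flink-conf.yaml, e.g.
        //   taskmanager.numberOfTaskSlots: 4
        // Two TaskManagers with 4 slots each would offer 8 slots in total.
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Default parallelism for all operators of this job; each parallel subtask
        // occupies a slot together with subtasks of other operators that belong
        // to the same slot sharing group.
        env.setParallelism(4);

        env.fromElements(1, 2, 3, 4)
           .map(x -> x * 2)
           .returns(Types.INT)
           // Individual operators may deviate from the job-wide default.
           .setParallelism(2)
           .print();

        env.execute("slot-parallelism-sketch");
    }
}
```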
Each slot can hold a single task or a pipeline of multiple consecutive tasks. As shown in the figure below, the FlatMap function occupies one task slot, while the keyed Aggregation function and the Sink function share another task slot:
[Figure: FlatMap tasks in their own slots; keyed Aggregation and Sink tasks sharing slots]
To achieve slot sharing, in addition to chaining operators into a pipeline, we can also use a SlotSharingGroup, as shown in the figure below:
[Figure: slot sharing with a SlotSharingGroup]
This lets us run two operations that cannot be chained together, such as flatMap and keyBy/sink, in one task slot, which brings the following benefits: with slot sharing there is no need to work out the number of slots required by each task separately; the number of slots a job needs is simply the parallelism of the operator with the highest parallelism, which gives higher utilization of computing resources. For example, a lightweight operation such as map and a heavyweight operation such as an aggregate no longer each need a dedicated slot but can run in the same one, and in scenarios where slots are limited we can increase the parallelism of each task.
Note: slot sharing can also be configured directly at the operator level:
[Figure: configuring slot sharing at the operator level]
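For example, here is a minimal sketch of setting slot sharing groups on individual operators with the DataStream API (the group names "source_map" and "aggregate" are made up for illustration; operators without an explicit group inherit the group of their inputs):

```java
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class SlotSharingGroupSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements("flink slot sharing", "flink task slots")
           .flatMap((String line, Collector<Tuple2<String, Integer>> out) -> {
               for (String word : line.split(" ")) {
                   out.collect(Tuple2.of(word, 1));
               }
           })
           .returns(Types.TUPLE(Types.STRING, Types.INT))
           // The lightweight flatMap stays in its own slot sharing group.
           .slotSharingGroup("source_map")
           .keyBy(t -> t.f0)
           .sum(1)
           // The heavier aggregation (and downstream operators that set no group of
           // their own) is placed in a different group, i.e. in different slots.
           .slotSharingGroup("aggregate")
           .print();

        env.execute("slot-sharing-sketch");
    }
}
```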
