Flink Tutorial (19) Flink Architecture: JobManager, TaskManager, Task, SubTask, Slot


I. Introduction

Recently I realized I have been gaming far too much; I was still playing at 2 a.m. I have played more than a thousand games as Arthur and reached 16.0 points for the second time. I feel pretty good about it, but honestly I am still bad at the game. I plan to stop playing Honor of Kings for a while. It is too addictive, and whenever I have free time I play it uncontrollably. I especially like 1v1. A match only takes a few minutes, yet I can keep playing for an hour or two without stopping. I remember my last match against a Sun Shangxiang ranked in the top 100 of a certain region: it took 13 minutes before I finally dove the tower for the kill. After the win, I exited and uninstalled the game. It was 2 a.m.

I have lost count of how many times I have uninstalled Honor of Kings, only to reinstall it within a day or two. Every time, I tell myself I am done playing, delete it, and resolve to study hard. And every time I go back on my word, reinstall it, and end up playing again.

My self-discipline is too poor, so I have to stop thinking about the game. There are plenty of other things to do in life, such as learning the basics of Flink. Every time I find a good tutorial I bookmark it, but I never actually start learning.

II. Flink architecture

[Figure: Flink architecture diagram (Flink Program, Client, JobManager, TaskManagers)]
The figure above is the classic architecture diagram from the official Flink website, and it appears in many blogs.

  • The white box on the left, Flink Program, is the user's Flink program; it is not actually part of the Flink architecture. The Client prepares the program's dataflow, submits it to the JobManager, and communicates with the JobManager from then on.
  • The JobManager controls the execution of a program; think of it as the big boss.
  • The TaskManager is Flink's worker process. A cluster usually has several of them, like a team of hard-working workers.

1. JobManager

The JobManager coordinates and manages the execution of a program.
Its main responsibilities are task scheduling, checkpoint management, and failure recovery.

  1. The Flink client sends the JobGraph (job graph) to the JobManager.
  2. The JobManager generates the ExecutionGraph (execution graph) from it and deploys the resulting tasks to the TaskManagers.
  3. The JobManager returns the TaskManagers' execution results to the Flink client.

Internally, the JobManager consists of three main components: the JobMaster, the ResourceManager, and the Dispatcher.

1.1 JobMaster

Because Flink changed a lot across its early versions and Chinese documentation is relatively scarce, the terminology is messy, and it is easy to confuse JobManager with JobMaster.

The Flink Dispatcher hands the JobGraph to the JobMaster through the JobManagerRunner. The JobMaster then converts the JobGraph into an ExecutionGraph and distributes it to the TaskManagers for execution. Each job has its own JobMaster.
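To make the JobGraph-to-ExecutionGraph step concrete, here is a minimal Python sketch. It is only a model: `JobVertex`, `ExecutionVertex` and `to_execution_graph` are simplified stand-ins named after, but not equivalent to, Flink's Java runtime classes, which hold far more state. The one idea captured is that every logical vertex is expanded into as many parallel instances as its parallelism.

```python
from dataclasses import dataclass
from typing import List


# Simplified stand-ins, named after (but not equivalent to) Flink's Java
# runtime classes. Only the "expand by parallelism" idea is modeled.
@dataclass(frozen=True)
class JobVertex:
    name: str         # one Task: an operator or a chain of operators
    parallelism: int  # number of parallel instances requested


@dataclass(frozen=True)
class ExecutionVertex:
    task_name: str
    subtask_index: int  # 0-based index of this parallel instance (one Subtask)


def to_execution_graph(job_graph: List[JobVertex]) -> List[ExecutionVertex]:
    """Expand every JobVertex into `parallelism` ExecutionVertices."""
    return [
        ExecutionVertex(vertex.name, i)
        for vertex in job_graph
        for i in range(vertex.parallelism)
    ]


if __name__ == "__main__":
    graph = to_execution_graph([JobVertex("Source", 3), JobVertex("Sink", 1)])
    for ev in graph:
        print(f"{ev.task_name} subtask {ev.subtask_index}")
```

With a Source of parallelism 3 and a Sink of parallelism 1, the sketch produces 4 execution vertices: three for the Source and one for the Sink.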

1.2 ResourceManager

The ResourceManager is responsible for resource management, and there is only one in a Flink cluster. The resources it manages are the TaskManagers' slots.

Note: this ResourceManager is not YARN's; it is Flink's own built-in resource manager.
YARN also has a ResourceManager. In YARN mode, Flink's ResourceManager requests containers from YARN to start TaskManagers, so resources (TaskManager slots) are provisioned automatically.

1.3 Dispatcher

The Dispatcher provides a REST interface through which we submit jobs to the JobManager (strictly speaking, to the JobMaster inside the JobManager). It starts a new JobMaster for each submitted job.
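The Dispatcher's bookkeeping can be sketched as a tiny Python model. The classes are hypothetical: the real Dispatcher receives jobs over REST and its JobMasters do the actual scheduling, whereas this sketch only shows the one-JobMaster-per-job relationship. The `job-N` ids are made up for illustration.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List


# Hypothetical model: the real Dispatcher sits behind a REST endpoint and
# its JobMasters schedule work; here we only track one JobMaster per job.
@dataclass
class JobMaster:
    job_id: str
    job_graph: List[str]  # simplified JobGraph: just the Task names
    status: str = "CREATED"


class Dispatcher:
    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.job_masters: Dict[str, JobMaster] = {}

    def submit(self, job_graph: List[str]) -> str:
        """Accept a JobGraph and start a dedicated JobMaster for it."""
        job_id = f"job-{next(self._ids)}"
        self.job_masters[job_id] = JobMaster(job_id, list(job_graph))
        return job_id


if __name__ == "__main__":
    dispatcher = Dispatcher()
    first = dispatcher.submit(["Source", "Sink"])
    second = dispatcher.submit(["Source", "map()", "Sink"])
    print(first, second, len(dispatcher.job_masters))
```

Submitting twice yields two independent JobMasters, which is how several jobs can run in the same cluster at once, each with its own JobMaster.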

2. TaskManager

  • A Flink cluster usually has multiple TaskManagers, and each TaskManager has a fixed number of slots.
  • The number of slots caps how many tasks a TaskManager can run at the same time.
  • When Flink starts, each TaskManager registers its slots with the ResourceManager.
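The slot bookkeeping between the TaskManagers and the ResourceManager can be modeled roughly as below. This is a hypothetical Python sketch, not Flink's API: slots are identified by a made-up `(TaskManager id, index)` pair and handed out first come, first served. In a real deployment the number of slots per TaskManager comes from the `taskmanager.numberOfTaskSlots` setting, so with identical TaskManagers the cluster total is the number of TaskManagers times that value.

```python
from typing import Dict, List, Optional, Tuple

# A slot is identified here by a hypothetical (TaskManager id, slot index) pair.
SlotId = Tuple[str, int]


class ResourceManager:
    """Toy slot registry: TaskManagers register slots, jobs request free ones."""

    def __init__(self) -> None:
        self.free: List[SlotId] = []
        self.allocated: Dict[SlotId, str] = {}  # slot -> id of the job using it

    def register_task_manager(self, tm_id: str, num_slots: int) -> None:
        # On startup a TaskManager reports all of its slots.
        self.free.extend((tm_id, i) for i in range(num_slots))

    def request_slot(self, job_id: str) -> Optional[SlotId]:
        # Hand out the oldest free slot; None means the request must wait.
        if not self.free:
            return None
        slot = self.free.pop(0)
        self.allocated[slot] = job_id
        return slot

    def release_slot(self, slot: SlotId) -> None:
        # Return a slot to the pool once it is no longer used.
        del self.allocated[slot]
        self.free.append(slot)

    def total_slots(self) -> int:
        return len(self.free) + len(self.allocated)


if __name__ == "__main__":
    rm = ResourceManager()
    rm.register_task_manager("tm-1", 3)
    rm.register_task_manager("tm-2", 3)
    print(rm.total_slots())
```

Once every slot is allocated, further requests get `None`, which mirrors the point above: the slot count caps how much work the TaskManagers can take on.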

2.1 Task and SubTask

  • The upper part of the figure is the condensed view, i.e. the logical graph: the JobGraph that the client hands to the JobManager.

  • The lower part of the figure is the parallelized view, i.e. the execution graph, which the JobManager computes from the JobGraph plus the parallelism.

  • Task and Subtask are concepts at different levels. A Subtask is not simply a "sub-task" of a Task.

  • A Task belongs to the logical graph and is a logical concept. Each operator corresponds to one Task (several operators chained together count as a single new operator).

  • At runtime, each Task is split into multiple Subtasks according to its parallelism. The Subtask is the basic unit of execution and scheduling, and each Subtask runs in its own thread.
    [Figure: condensed view (JobGraph) and parallelized view (ExecutionGraph) of a sample job]
    The figure shows 5 subtasks in total. Keep this number in mind; it matters for the slot concept below.
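The chaining rule and the subtask count can be reproduced with a short Python sketch. It assumes the figure is the sample dataflow from the official Flink documentation (Source and map() chained together, then keyBy()/window()/apply(), then a Sink with parallelism 1). The chaining condition used here (a forward edge and equal parallelism) is a simplification; Flink checks further conditions before chaining, and the edge types assigned below are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Operator:
    name: str
    parallelism: int
    # How records leave this operator toward the next one: "forward"
    # (one-to-one) or a redistributing exchange such as "hash" (keyBy)
    # or "rebalance".
    out_edge: str = "forward"


def chain(operators: List[Operator]) -> List[List[Operator]]:
    """Group a linear pipeline into Tasks (operator chains).

    Simplified rule: an operator joins the previous chain only if the edge
    between them is "forward" and both have the same parallelism.
    """
    if not operators:
        return []
    tasks: List[List[Operator]] = [[operators[0]]]
    for prev, op in zip(operators, operators[1:]):
        if prev.out_edge == "forward" and prev.parallelism == op.parallelism:
            tasks[-1].append(op)
        else:
            tasks.append([op])
    return tasks


def count_subtasks(tasks: List[List[Operator]]) -> int:
    # All operators in one chain share its parallelism, so each Task
    # contributes `parallelism` Subtasks, i.e. that many threads.
    return sum(task[0].parallelism for task in tasks)


pipeline = [
    Operator("Source", 2),
    Operator("map()", 2, out_edge="hash"),  # keyBy() repartitions by key
    Operator("keyBy()/window()/apply()", 2, out_edge="rebalance"),
    Operator("Sink", 1),
]
tasks = chain(pipeline)
for task in tasks:
    print(" -> ".join(op.name for op in task), "x", task[0].parallelism)
print("subtasks:", count_subtasks(tasks))
```

Running it groups the four operators into three Tasks and reports 5 subtasks, matching the count in the figure.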

2.2 Slot

A TaskManager is a JVM process that can run one or more Subtasks in parallel.

Each Subtask is a thread, and the TaskManager must allocate resources (memory) to it; it does so through task slots.

By adjusting the number of task slots, users can define how subtasks are isolated from each other.
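A rough way to see what a slot isolates: according to the Flink documentation, a TaskManager splits its managed memory evenly across its slots, and CPU is not isolated between slots. The Python sketch below compares two hypothetical layouts that both provide 6 slots; the memory figures are illustrative numbers, not Flink defaults, and `TaskManagerLayout` is a made-up helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskManagerLayout:
    task_managers: int      # number of TaskManager JVM processes
    slots_per_tm: int       # what taskmanager.numberOfTaskSlots would set
    managed_memory_mb: int  # managed memory of ONE TaskManager (illustrative)

    @property
    def total_slots(self) -> int:
        return self.task_managers * self.slots_per_tm

    @property
    def managed_memory_per_slot_mb(self) -> float:
        # Managed memory is split evenly across slots; CPU is not isolated.
        return self.managed_memory_mb / self.slots_per_tm


# Two hypothetical layouts that both offer 6 slots in total.
one_slot_each = TaskManagerLayout(task_managers=6, slots_per_tm=1, managed_memory_mb=1024)
three_per_tm = TaskManagerLayout(task_managers=2, slots_per_tm=3, managed_memory_mb=3072)

for layout in (one_slot_each, three_per_tm):
    print(layout.total_slots, "slots,",
          layout.managed_memory_per_slot_mb, "MB managed memory per slot,",
          layout.slots_per_tm, "slot(s) sharing each JVM")
```

The trade-off: one slot per TaskManager gives each slot its own JVM and therefore the strongest isolation, while several slots per TaskManager let subtasks share a JVM (including its TCP connections and heartbeats), which lowers per-task overhead.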

In the figure from the previous section, the Subtasks in the 5 green circles are each assigned to a different slot. With a parallelism of 2, 5 slots are occupied and the rightmost slot is idle. By default, Flink lets subtasks share slots, even subtasks of different tasks, as long as they belong to the same job. As a result, a single slot can hold an entire pipeline of the job. If the parallelism is raised to 6, all six slots are in use and slot utilization is high.

[Figure: subtasks with parallelism 2 occupying 5 of 6 slots]

[Figure: slot sharing with parallelism 6, one full pipeline per slot]
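The slot counts described above follow from a simple rule, sketched here in Python under a simplified model where all Tasks of the job are in the default slot sharing group. Without sharing, every Subtask needs its own slot; with sharing, one slot can hold one Subtask of every Task, so the job needs as many slots as its highest parallelism. `slots_needed` is a hypothetical helper, not a Flink API.

```python
from typing import List


def slots_needed(task_parallelisms: List[int], slot_sharing: bool = True) -> int:
    """Minimum slots for one job, in a simplified model.

    Assumes every Task is in the same (default) slot sharing group.
    """
    if not task_parallelisms:
        return 0
    if slot_sharing:
        # One slot can hold one Subtask of every Task, so the widest Task decides.
        return max(task_parallelisms)
    # Without sharing, every Subtask occupies a slot of its own.
    return sum(task_parallelisms)


for parallelisms in ([2, 2, 1], [6, 6, 1]):
    print(parallelisms,
          "without sharing:", slots_needed(parallelisms, slot_sharing=False),
          "with sharing:", slots_needed(parallelisms))
```

For the example job with parallelisms 2, 2 and 1, that is 5 slots without sharing and 2 with it. At parallelism 6 the job needs exactly 6 shared slots, filling the cluster, but it would need 13 slots without sharing.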

  • Within the same application, i.e. the same job, subtasks of different tasks can run in the same slot.
  • Subtasks of the same task cannot run in the same slot; they must be spread across different slots.
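The two rules can be checked with a small Python sketch. The placement strategy (Subtask i of every Task goes into slot i) is an assumption made for illustration; Flink's scheduler may place subtasks differently, but any valid placement has to satisfy the same two rules. The function names are hypothetical.

```python
from typing import Dict, List, Tuple

Subtask = Tuple[str, int]  # (Task name, subtask index)


def assign_with_slot_sharing(tasks: Dict[str, int], num_slots: int) -> List[List[Subtask]]:
    """Place Subtask i of every Task into slot i (an illustrative strategy).

    Subtasks of different Tasks may end up in the same slot, while
    Subtasks of the same Task never do, because each index i is used
    only once per Task.
    """
    needed = max(tasks.values(), default=0)
    if needed > num_slots:
        raise ValueError(f"job needs {needed} slots, only {num_slots} available")
    slots: List[List[Subtask]] = [[] for _ in range(num_slots)]
    for name, parallelism in tasks.items():
        for i in range(parallelism):
            slots[i].append((name, i))
    return slots


def is_valid(slots: List[List[Subtask]]) -> bool:
    # Invalid if any slot holds two Subtasks of the same Task.
    return all(len({name for name, _ in slot}) == len(slot) for slot in slots)


job = {"Source -> map()": 6, "keyBy()/window()/apply()": 6, "Sink": 1}
placement = assign_with_slot_sharing(job, num_slots=6)
for index, slot in enumerate(placement):
    print(f"slot {index}: {slot}")
print("valid:", is_valid(placement))
```

With parallelism 6, slot 0 holds one subtask of each of the three Tasks, and the other five slots each hold a Source -> map() and a window subtask, so every slot carries a full pipeline without ever doubling up on one Task.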


Source: blog.csdn.net/winterking3/article/details/115299428