Flink runtime components and Yarn-based task submission

Runtime components

The Flink runtime architecture consists mainly of the following four components, which work together to run stream processing applications:

  • Dispatcher: Runs across jobs and provides a REST interface for submitting applications. When an application is submitted for execution, the Dispatcher starts and hands the application over to a JobManager. Because it exposes a REST interface, the Dispatcher can act as the cluster's HTTP entry point, so access is not blocked by firewalls. The Dispatcher also starts a Web UI that displays and monitors job execution information. (The Dispatcher is not always required; whether it is present depends on how the application is submitted and run.)
  • JobManager: The master process that controls the execution of an application; each application is controlled by its own JobManager. The JobManager first receives the application to be executed. The application consists of the JobGraph (the logical dataflow graph) and a JAR file that bundles all classes, libraries, and other resources. The JobManager converts the JobGraph into a physical dataflow graph, called the ExecutionGraph, which contains all tasks that can run in parallel. It then requests from the ResourceManager the resources needed to run these tasks, namely slots on TaskManagers. Once it has obtained enough slots, it distributes the tasks of the ExecutionGraph to the TaskManagers that actually run them. While the job is running, the JobManager handles every operation that requires central coordination, such as coordinating checkpoints.
  • ResourceManager: Mainly responsible for managing TaskManager slots. A slot is the unit of processing resources defined in Flink. Flink provides different ResourceManagers for different environments and resource management tools, such as YARN, Mesos, Kubernetes, and standalone deployments. When the JobManager requests slots, the ResourceManager assigns it TaskManagers that have free slots. If the ResourceManager does not have enough slots to satisfy the request, it can ask the resource provider platform for containers in which to start new TaskManager processes. The ResourceManager is also responsible for terminating idle TaskManagers to release computing resources.
  • TaskManager: The worker process in Flink. A Flink cluster usually runs multiple TaskManagers, each with a configured number of slots; the number of slots limits how many tasks a TaskManager can execute.
    After startup, a TaskManager registers its slots with the ResourceManager. When instructed by the ResourceManager, the TaskManager offers one or more slots to the JobManager, which can then assign tasks to those slots for execution. During execution, a TaskManager can exchange data with other TaskManagers running the same application.
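The slot handshake between the components can be modeled in a few lines. The sketch below is a simplified toy, not Flink's API. The class names mirror the component names, but every method, the `slots_per_new_tm` parameter, and the `tm-N` naming are hypothetical. It assumes one task per slot, whereas real Flink can share a slot between tasks of a pipeline. Starting a new TaskManager stands in for the ResourceManager asking a platform such as YARN for a container.

```python
class TaskManager:
    """Toy TaskManager: a worker with a fixed number of slots (one task per slot here)."""

    def __init__(self, name, num_slots):
        self.name = name
        self.slots = [None] * num_slots  # None marks a free slot

    def free_slot_ids(self):
        return [i for i, task in enumerate(self.slots) if task is None]


class ResourceManager:
    """Toy ResourceManager: tracks registered slots and starts TaskManagers on demand."""

    def __init__(self, slots_per_new_tm=2):
        if slots_per_new_tm < 1:
            raise ValueError("slots_per_new_tm must be at least 1")
        self.task_managers = []
        self.slots_per_new_tm = slots_per_new_tm  # hypothetical setting

    def register(self, tm):
        # Called by a TaskManager after startup to register its slots.
        self.task_managers.append(tm)

    def _free_slots(self):
        return [(tm, i) for tm in self.task_managers for i in tm.free_slot_ids()]

    def request_slots(self, n):
        free = self._free_slots()
        while len(free) < n:
            # Not enough slots: simulate requesting a container from the platform
            # (e.g. YARN) and starting a new TaskManager inside it.
            self.register(TaskManager(f"tm-{len(self.task_managers)}", self.slots_per_new_tm))
            free = self._free_slots()
        return free[:n]

    def release_idle(self):
        """Terminate TaskManagers whose slots are all free; return their names."""
        idle = [tm for tm in self.task_managers if all(s is None for s in tm.slots)]
        self.task_managers = [tm for tm in self.task_managers if tm not in idle]
        return [tm.name for tm in idle]


class JobManager:
    """Toy JobManager: requests one slot per task and deploys each task into its slot."""

    def __init__(self, resource_manager):
        self.rm = resource_manager

    def deploy(self, tasks):
        placement = {}
        for task, (tm, slot_id) in zip(tasks, self.rm.request_slots(len(tasks))):
            tm.slots[slot_id] = task
            placement[task] = (tm.name, slot_id)
        return placement


rm = ResourceManager(slots_per_new_tm=2)
rm.register(TaskManager("tm-0", 2))  # one TaskManager is already running
jm = JobManager(rm)
print(jm.deploy(["source", "map", "sink"]))
# {'source': ('tm-0', 0), 'map': ('tm-0', 1), 'sink': ('tm-1', 0)}
```

The third task does not fit into `tm-0`, so the toy ResourceManager starts `tm-1` before handing out slots. This mirrors the "not enough slots" path described for the ResourceManager.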

Overall, Flink uses a standard master-slave architecture. The master side contains three components (the Dispatcher, the ResourceManager, and the JobManager), while the slave side consists of the TaskManagers.

Task submission

When a Flink job is submitted, a Client process is started first; it is responsible for compiling and submitting the job. The Client compiles the user's code into a JobGraph, performing some checks and optimizations along the way, for example determining which operators can be chained into the same task.
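Operator chaining can be illustrated with a simplified rule set. The sketch below is not Flink's actual implementation. It checks only a subset of the conditions Flink considers: the downstream operator has a single input, records are forwarded one-to-one, both operators have the same parallelism and slot sharing group, and chaining is not disabled on either operator. The `Operator` fields and the partitioning strings are hypothetical, and the sketch handles only a linear pipeline.

```python
from dataclasses import dataclass


@dataclass
class Operator:
    name: str
    parallelism: int
    num_inputs: int = 1                # number of input edges into this operator
    slot_sharing_group: str = "default"
    chaining_allowed: bool = True      # False models e.g. disableChaining()


def can_chain(up, down, partitioning):
    """Simplified subset of the conditions checked before chaining two operators."""
    return (
        down.num_inputs == 1                  # downstream reads from a single input
        and partitioning == "forward"         # one-to-one forwarding, no shuffle
        and up.parallelism == down.parallelism
        and up.slot_sharing_group == down.slot_sharing_group
        and up.chaining_allowed
        and down.chaining_allowed
    )


def build_chains(pipeline, partitionings):
    """Split a linear pipeline into chains; partitionings[i] links pipeline[i] -> pipeline[i+1]."""
    chains = [[pipeline[0].name]]
    for up, down, part in zip(pipeline, pipeline[1:], partitionings):
        if can_chain(up, down, part):
            chains[-1].append(down.name)  # fuse into the current chain (same task)
        else:
            chains.append([down.name])    # start a new chain (a new task)
    return chains


pipeline = [
    Operator("source", 2),
    Operator("map", 2),
    Operator("reduce", 2),
    Operator("sink", 1),
]
print(build_chains(pipeline, ["forward", "hash", "rebalance"]))
# [['source', 'map'], ['reduce'], ['sink']]
```

Here `source` and `map` end up in one task. The hash partitioning before `reduce` and the parallelism change before `sink` each start a new chain.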

The Client then submits the generated JobGraph to the cluster for execution. Two cases are possible. In Session mode (as in a Standalone cluster), the AM (ApplicationMaster) has been started in advance, so the Client connects directly to the Dispatcher and submits the job. In Per-Job mode, no AM is started in advance: the Client first requests resources from the resource management system (such as YARN) to start an AM, and then submits the job to the Dispatcher inside that AM.
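The difference between the two modes can be summarized as a small control flow on the Client side. The sketch below is a toy model, not Flink's client API. `ToyResourceSystem`, `submit`, and the log messages are hypothetical, and the resource system merely stands in for something like YARN.

```python
from enum import Enum


class DeployMode(Enum):
    SESSION = "session"
    PER_JOB = "per-job"


class ToyResourceSystem:
    """Hypothetical stand-in for a resource management system such as YARN."""

    def __init__(self):
        self.session_am = False   # is a long-running session AM up?
        self.per_job_ams = 0      # dedicated AMs started for individual jobs
        self.log = []

    def start_session(self):
        self.session_am = True
        self.log.append("start session AM with Dispatcher")

    def start_per_job_am(self, job_graph):
        self.per_job_ams += 1
        self.log.append(f"start AM with Dispatcher for {job_graph}")


def submit(job_graph, mode, cluster):
    """Client-side submission of an already compiled JobGraph (toy model)."""
    if mode is DeployMode.SESSION:
        # Session mode: the AM was started in advance; connect to its Dispatcher.
        if not cluster.session_am:
            raise RuntimeError("no session cluster running; start one first")
    else:
        # Per-Job mode: request resources and start a dedicated AM first.
        cluster.start_per_job_am(job_graph)
    cluster.log.append(f"submit {job_graph} to Dispatcher")


yarn = ToyResourceSystem()
yarn.start_session()
submit("job-a", DeployMode.SESSION, yarn)   # reuses the running session AM
submit("job-b", DeployMode.PER_JOB, yarn)   # gets its own AM
print(yarn.log)
```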

Flink on YARN (Per-Job)

[Figure: Flink on YARN (Per-Job) submission flow]

After the Flink job is submitted, the Client uploads the Flink JAR and configuration to HDFS and then submits the job to the YARN ResourceManager. The ResourceManager allocates Container resources and tells the corresponding NodeManager to start the ApplicationMaster, together with the Dispatcher (omitted from the figure above). Once started, the ApplicationMaster loads the Flink JAR and configuration to build its environment and then starts the JobManager. The ApplicationMaster then requests resources from the ResourceManager to start TaskManagers. After the ResourceManager allocates the Container resources, the ApplicationMaster notifies the NodeManagers on the nodes that hold those resources to start the TaskManagers. Each NodeManager loads the Flink JAR and configuration to build its environment and starts a TaskManager. Once started, each TaskManager sends heartbeats to the JobManager and waits for the JobManager to assign tasks to it.
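The launch sequence above can be written out as an ordered list of steps. The sketch below is a simplified model, not a YARN or Flink API. The function name and step strings are hypothetical. It also assumes that, with default slot sharing, the job needs as many slots as its highest operator parallelism, and that `slots_per_tm` plays the role of the `taskmanager.numberOfTaskSlots` setting when deciding how many TaskManager containers to request.

```python
import math


def per_job_launch(max_parallelism, slots_per_tm):
    """Ordered steps of a Flink-on-YARN Per-Job launch (simplified, hypothetical model)."""
    if max_parallelism < 1 or slots_per_tm < 1:
        raise ValueError("parallelism and slots per TaskManager must be positive")
    # Assumption: with default slot sharing the job needs max_parallelism slots,
    # so ceil(max_parallelism / slots_per_tm) TaskManager containers are requested.
    num_tms = math.ceil(max_parallelism / slots_per_tm)
    steps = [
        "Client: upload Flink JAR and configuration to HDFS",
        "Client: submit application to YARN ResourceManager",
        "NodeManager: start ApplicationMaster (with Dispatcher)",
        "ApplicationMaster: load JAR and configuration, start JobManager",
        f"ApplicationMaster: request {num_tms} TaskManager container(s)",
    ]
    for i in range(num_tms):
        steps.append(f"NodeManager: load JAR and configuration, start TaskManager {i}")
    for i in range(num_tms):
        steps.append(f"TaskManager {i}: heartbeat to JobManager, wait for tasks")
    return steps


for step in per_job_launch(max_parallelism=5, slots_per_tm=2):
    print(step)
```

With a maximum parallelism of 5 and 2 slots per TaskManager, the model requests 3 TaskManager containers. The last container leaves one slot unused.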


Follow the official account 数据工匠记, which regularly shares practical articles on offline and real-time big data technology. Personal website: www.lllpan.top

Origin: blog.csdn.net/lp284558195/article/details/114901122