[Big Data] Detailed Explanation of Flink (7): Source Code II

69. What is the difference between the stream graph, job graph, and execution graph?

Since Flink has unified stream and batch execution, the legacy Batch (DataSet) API is essentially deprecated, so I won't cover it in depth. In the Flink DataStream API, the internal graph conversion process is as follows:

[Figure: graph conversion process in the DataStream API]
Taking WordCount as an example, the conversion among the stream graph, job graph, execution graph, and physical execution graph during task scheduling is as follows:

[Figure: WordCount across the four graphs]
For a Flink streaming application, the user code is first translated by the DataStream API into Transformations, which then go through three layers of conversion: StreamGraph → JobGraph → ExecutionGraph (all of them Flink's internal data structures). Finally, Flink schedules the job for execution and starts the computing tasks in the cluster, forming the physical execution graph.
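To make the starting point concrete, here is a minimal WordCount sketch in the DataStream API. Each call below (flatMap, keyBy, sum) is recorded as a Transformation; only when execute() is invoked does the client build the StreamGraph and JobGraph and submit the job:

```java
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class WordCount {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Each API call is recorded as a Transformation on the environment.
        env.fromElements("to be or not to be")
           .flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
               @Override
               public void flatMap(String line, Collector<Tuple2<String, Integer>> out) {
                   for (String word : line.split(" ")) {
                       out.collect(Tuple2.of(word, 1));
                   }
               }
           })
           .keyBy(t -> t.f0)   // hash-partition the stream by word
           .sum(1)             // aggregate counts per key
           .print();

        // Triggers Transformations -> StreamGraph -> JobGraph on the client;
        // the cluster's JobMaster then builds the ExecutionGraph and schedules Tasks.
        env.execute("WordCount");
    }
}
```

Everything before execute() only builds the graph; no data flows until the job is submitted.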

70. Introduce the stream graph (StreamGraph)?

[Figure: StreamGraph structure]
A StreamGraph has two core objects: StreamNode and StreamEdge.

  • StreamNode is converted from a Transformation. Put simply, a StreamNode represents an operator. StreamNodes can be physical or virtual and can have multiple inputs and outputs. A physical StreamNode eventually becomes a physical operator, while a virtual StreamNode is not materialized and is instead attached to a StreamEdge.

  • StreamEdge is the edge of the StreamGraph, used to connect two StreamNodes. A StreamNode can have multiple outgoing and incoming edges, and a StreamEdge carries connection information such as the partitioner.
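To inspect the StreamGraph that a pipeline produces, you can print the execution plan. A small sketch (getExecutionPlan() returns the plan as JSON, which can be pasted into the Flink plan visualizer at https://flink.apache.org/visualizer/; the pipeline itself is illustrative):

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class StreamGraphDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements(1, 2, 3)
           .map(i -> i * 2)     // a physical StreamNode
           .rebalance()         // a virtual partition node, attached to the edge
           .filter(i -> i > 2)  // another physical StreamNode
           .print();

        // Prints the StreamGraph as JSON: its nodes and edges mirror the
        // StreamNode/StreamEdge objects described above.
        System.out.println(env.getExecutionPlan());
    }
}
```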

71. Tell me about the job graph (JobGraph)?

JobGraph is generated by optimizing the StreamGraph. It merges operators via the OperatorChain mechanism so that, at execution time, chained operators are scheduled on the same Task thread, avoiding data transfer across threads and across the network (see the chaining sketch after the list below).

[Figure: JobGraph structure]
The JobGraph has three core objects:

  • JobVertex (vertex): after chaining optimization, multiple StreamNodes that meet the chaining conditions may be merged to generate one JobVertex; that is, a JobVertex contains one or more operators. The input of a JobVertex is a JobEdge and its output is an IntermediateDataSet.

  • JobEdge (edge): a JobEdge represents a data-flow channel in the JobGraph. Its upstream data source is an IntermediateDataSet and its downstream consumer is a JobVertex. The data distribution mode of a JobEdge directly determines whether the data connection between Tasks at execution time is point-to-point or all-to-all.

  • IntermediateDataSet (intermediate data set): a logical structure representing the output of a JobVertex, i.e., the data set produced by the operators contained in that JobVertex. The result partition type differs across execution modes, and it determines the data-exchange mode at execution time.
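Chaining is what merges multiple StreamNodes into one JobVertex, and the DataStream API exposes a few knobs for controlling it. A minimal sketch (startNewChain()/disableChaining() act per operator; disableOperatorChaining() would switch chaining off job-wide):

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ChainingDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // env.disableOperatorChaining();  // would disable chaining for the whole job

        env.fromElements("a", "bb", "ccc")
           .map(String::toUpperCase)   // chained with its upstream by default
           .startNewChain()            // force this map to begin a new chain (new JobVertex)
           .filter(s -> s.length() > 1)
           .disableChaining()          // keep this filter out of any chain
           .print();

        env.execute("ChainingDemo");
    }
}
```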

72. Tell us about the execution graph (ExecutionGraph)?

ExecutionGraph is the core data structure for scheduling Flink job execution. It contains all the parallel Task information of the job, the relationships between Tasks, and the data-flow relationships between them.

Both the StreamGraph and the JobGraph are generated on the Flink client and then submitted to the Flink cluster. The conversion from JobGraph to ExecutionGraph is performed in the JobMaster, and the important changes during this conversion are:

  • The concept of parallelism is added, making it a truly schedulable graph structure.
  • Six core objects are generated.

insert image description here
The ExecutionGraph has six core objects:

  • ExecutionJobVertex: this object corresponds one-to-one to a JobVertex in the JobGraph. It contains a group of ExecutionVertex objects, whose number equals the parallelism of the StreamNodes contained in the JobVertex; if that parallelism is 5, the ExecutionJobVertex contains 5 ExecutionVertex objects. ExecutionJobVertex wraps a JobVertex and, in turn, creates the ExecutionVertex, Execution, IntermediateResult, and IntermediateResultPartition objects that flesh out the ExecutionGraph.

  • ExecutionVertex: the ExecutionJobVertex parallelizes the job by constructing instances that can execute in parallel; each parallel instance is an ExecutionVertex.

  • IntermediateResult: also called the intermediate result set. It is a logical concept representing the output of an ExecutionJobVertex and corresponds to the IntermediateDataSet in the JobGraph. One ExecutionJobVertex can have multiple intermediate results, depending on how many outgoing edges (JobEdges) the corresponding JobVertex has.

  • IntermediateResultPartition: also called the intermediate result partition. It represents the output of one ExecutionVertex and is associated with ExecutionEdges.

  • ExecutionEdge: represents the input of an ExecutionVertex and connects to an IntermediateResultPartition produced upstream. One ExecutionEdge corresponds to exactly one IntermediateResultPartition and one ExecutionVertex, and one ExecutionVertex can have multiple ExecutionEdges.

  • Execution: an ExecutionVertex is effectively a template for a Task. When actually executed, the information in the ExecutionVertex is packaged into an Execution; one Execution represents one attempt to execute an ExecutionVertex.

Task deployment and Task execution-state updates between the JobManager and TaskManager are identified by the ExecutionAttemptID.

73. Introduce the concept of the Flink scheduler?

The scheduler is the core component of Flink job execution. It manages the entire process of job execution, including the conversion from JobGraph to ExecutionGraph, job lifecycle management (job submission, cancellation, stopping), Task lifecycle management (Task deployment, cancellation, stopping), resource application and release, and job/Task failover.

  • DefaultScheduler: Flink's current default scheduler, part of Flink's new scheduling design; it uses a SchedulingStrategy to implement scheduling.

  • LegacyScheduler: the previous scheduler, which implements the original Execution-based scheduling logic.

74. How many types of Flink scheduling behaviors are there?

The SchedulingStrategy interface defines the scheduling behavior and contains four methods:

[Figure: SchedulingStrategy interface]

  • startScheduling: the scheduling entry point, which triggers the scheduler's scheduling behavior.
  • restartTasks: restarts Tasks that failed to execute, usually because a Task execution raised an exception.
  • onExecutionStateChange: invoked when the state of an Execution changes.
  • onPartitionConsumable: invoked when the data in an IntermediateResultPartition becomes consumable.
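For reference, here is a paraphrased sketch of the interface. The method names match the four behaviors above, but the exact parameter types vary across Flink versions, so treat the signatures as approximate:

```java
import java.util.Set;
import org.apache.flink.runtime.execution.ExecutionState;
import org.apache.flink.runtime.jobgraph.IntermediateResultPartitionID;
import org.apache.flink.runtime.scheduler.strategy.ExecutionVertexID;

// Approximate shape of org.apache.flink.runtime.scheduler.strategy.SchedulingStrategy.
public interface SchedulingStrategy {

    // Entry point: decide which vertices to schedule first (all of them, or just sources).
    void startScheduling();

    // Restart the given failed vertices, e.g. after a Task threw an exception.
    void restartTasks(Set<ExecutionVertexID> verticesToRestart);

    // React to an Execution's state transition (DEPLOYING -> RUNNING -> FINISHED, ...).
    void onExecutionStateChange(ExecutionVertexID executionVertexId, ExecutionState executionState);

    // React when an upstream result partition becomes consumable by downstream tasks.
    void onPartitionConsumable(IntermediateResultPartitionID resultPartitionId);
}
```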

75. How many scheduling modes does Flink include?

There are three scheduling modes: eager mode (EAGER), phased mode (LAZY_FROM_SOURCES), and phased slot-reuse mode (LAZY_FROM_SOURCES_WITH_BATCH_SLOT_REQUEST).

  • Eager scheduling: suitable for stream computing. All required resources are requested up front; if resources are insufficient, the job fails to start.

  • Phased scheduling (LAZY_FROM_SOURCES): suitable for batch processing. Scheduling proceeds stage by stage, starting from the Source Tasks, and all resources needed by the current stage are requested at once. After the upstream tasks finish, the downstream tasks are scheduled to read the upstream data and execute that stage's computation; when a stage completes, the next stage is scheduled, and so on until the job finishes.

  • Phased slot-reuse scheduling (LAZY_FROM_SOURCES_WITH_BATCH_SLOT_REQUEST): suitable for batch processing. It is essentially the same as phased scheduling, except that it uses the batch resource-request mode, which allows the job to run even when resources are insufficient, provided the currently executing stage contains no shuffle.

Currently, the EAGER mode and the LAZY_FROM_SOURCES mode share the same resource-request logic, while LAZY_FROM_SOURCES_WITH_BATCH_SLOT_REQUEST uses a separate resource-request logic.
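These three modes map onto the ScheduleMode enum found in older Flink versions (it was removed once pipelined-region scheduling became the default); a simplified sketch based on the descriptions above:

```java
// Simplified sketch of org.apache.flink.runtime.jobgraph.ScheduleMode
// (present in older Flink versions).
public enum ScheduleMode {
    LAZY_FROM_SOURCES,                          // batch: schedule stage by stage
    LAZY_FROM_SOURCES_WITH_BATCH_SLOT_REQUEST,  // batch: stage by stage, batch slot requests
    EAGER                                       // streaming: schedule everything up front
}
```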

76. How many types of Flink scheduling strategies are there?

[Figure: SchedulingStrategy implementations]
All scheduling strategies implement the SchedulingStrategy interface, and there are three implementations:

  • EagerSchedulingStrategy: suitable for stream computing; it schedules all tasks at the same time.

  • LazyFromSourcesSchedulingStrategy: suitable for batch processing; a vertex is scheduled once its input data is ready (i.e., its upstream has finished).

  • PipelinedRegionSchedulingStrategy: schedules at the granularity of a pipelined region.

PipelinedRegionSchedulingStrategy was added in Flink 1.11. Starting from Flink 1.12, scheduling is performed at pipelined-region granularity.

A pipelined region is a set of tasks connected in a pipelined fashion. For a streaming job with multiple regions, this means Flink no longer waits for all tasks to acquire slots before deploying any of them; instead, a region can be deployed as soon as it has acquired enough slots. For a batch job, slots are no longer assigned to tasks one by one, nor are tasks deployed individually; instead, once a region has acquired enough slots, all tasks in that region are deployed together.

77. What states does the Flink job life cycle contain?

In a Flink cluster, the JobMaster is responsible for job lifecycle management, and the concrete management behavior is implemented in the scheduler and the ExecutionGraph.

The complete life cycle state transition of a job is shown in the following figure:

[Figure: job lifecycle state transitions]

  • A job starts in the created state (created), then switches to the running state (running), and when all work is done it switches to the finished state (finished).

  • In case of failure, the job first switches to the failing state (failing) and cancels all running tasks. If all vertices have reached a final state and the job is not restartable, the state transitions to failed (failed).

  • If the job can be restarted, it enters the restarting state (restarting). Once the restart completes, it changes back to the created state (created).

  • If the user cancels the job, it enters the cancelling state (cancelling), which cancels all currently running tasks. Once all running tasks have reached a final state, the job transitions to the canceled state (canceled).

The finished, canceled, and failed states represent globally terminal states and trigger job cleanup, while the suspended state (suspended) is only locally terminal: it means that the job's execution has been terminated on the current JobManager, but another JobManager in the cluster can recover the job from persistent HA storage and restart it. Therefore, a job in the suspended state is not completely cleaned up.
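These states correspond to Flink's JobStatus enum; a simplified sketch (the real enum also includes a few additional states, such as RECONCILING, and records whether each state is globally or locally terminal):

```java
// Simplified sketch of Flink's JobStatus states.
public enum JobStatus {
    CREATED,     // job accepted, not yet running
    RUNNING,     // tasks are being scheduled and executed
    FAILING,     // a failure occurred; running tasks are being cancelled
    FAILED,      // globally terminal: unrecoverable failure
    CANCELLING,  // user requested cancellation
    CANCELED,    // globally terminal: cancelled by the user
    FINISHED,    // globally terminal: all work completed
    RESTARTING,  // restartable failure; will go back to CREATED
    SUSPENDED    // locally terminal: recoverable by another JobManager via HA storage
}
```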

78. What states does the Task lifecycle contain?

The TaskManager is responsible for Task lifecycle management and notifies the JobMaster of state changes; the state changes are tracked through the Execution objects in the ExecutionGraph, with one Execution per Task.

A Task's lifecycle comprises eight states in total:

[Figure: Task lifecycle state transitions]

During the execution of the ExecutionGraph, each parallel task goes through several stages, from created to finished or failed; the figure above illustrates these states and the possible transitions between them. A task may be executed multiple times (for example, during failover). Each Execution tracks one execution of an ExecutionVertex, and each ExecutionVertex keeps a current Execution (current execution) and a prior Execution (prior execution).
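The eight states correspond to Flink's ExecutionState enum; a simplified sketch (newer Flink versions add further states, such as INITIALIZING):

```java
// Simplified sketch of org.apache.flink.runtime.execution.ExecutionState.
public enum ExecutionState {
    CREATED,    // Execution created, not yet scheduled
    SCHEDULED,  // waiting for a slot
    DEPLOYING,  // task is being shipped to a TaskManager
    RUNNING,    // task is executing user code
    FINISHED,   // terminal: completed successfully
    CANCELING,  // cancellation requested, task still shutting down
    CANCELED,   // terminal: cancelled
    FAILED      // terminal: failed with an error
}
```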

79. Explain the task scheduling process of Flink?

The task scheduling flow chart is as follows:

[Figure: task scheduling flow]
(1) When a Flink application is executed, a DAG data-flow graph, the JobGraph, is automatically generated from the program code.

(2) The ActorSystem creates Actors and sends the data-flow graph to the Actor in the JobManager.

(3) The JobManager continuously receives heartbeat messages from the TaskManagers, so that it knows which TaskManagers are available.

(4) The JobManager schedules Tasks to execute on the TaskManagers through the scheduler (in Flink, the smallest scheduling unit is the Task, which corresponds to a thread).

(5) While the program is running, data may be transmitted between Tasks.

  • Job Client
    • Its main responsibility is to submit the job; after submission, the process can either exit or wait for the result to be returned.
    • The Job Client is not an internal part of Flink program execution, but it is the starting point for task execution.
    • The Job Client accepts the user's program code, creates a data flow from it, and submits the data flow to the JobManager for further execution. After execution completes, the Job Client returns the results to the user.
  • JobManager
    • Its main responsibilities are to schedule jobs and coordinate the checkpointing of Tasks.
    • There must be at least one master in the cluster, and the master is responsible for scheduling tasks, coordinating checkpoints and fault tolerance.
    • There can be multiple masters in a high-availability setup, but exactly one is the leader and the others are standbys.
    • The JobManager contains three important components: the Actor System, the Scheduler, and Checkpoint coordination.
    • After the JobManager receives the task from the client, it first generates an optimized execution plan, and then schedules it to the TaskManager for execution.
  • TaskManager
    • Its main responsibilities are to receive tasks from the JobManager, deploy and start them, and receive and process upstream data.
    • TaskManagers are worker nodes that execute tasks in one or more threads in the JVM.
    • A TaskManager creates its Slots at startup; each Slot can execute one task.

80. What does Flink's Task Slot mean?

[Figure: TaskManagers, slots, and subtasks]
Each TaskManager is a JVM process that can execute one or more subtasks in separate threads. To control how many tasks a worker accepts, the worker has so-called task slots (each worker has at least one). Each task slot represents a fixed-size subset of the TaskManager's resources.

Generally, the number of slots allocated equals the number of CPU cores; for example, with 8 cores we allocate 8 slots. Flink divides the memory of the TaskManager process among the slots. In the figure there are 2 TaskManagers, each with 3 slots, and each slot occupies 1/3 of the TaskManager's memory.

Dividing the memory into separate slots brings the following benefits:

  • The maximum number of tasks a TaskManager can execute concurrently is bounded (3 in the example), because it cannot exceed the number of slots. Note that a task slot only isolates the task's managed memory; there is no CPU isolation.
  • Each slot has its own exclusive memory space, so multiple different jobs can run in the same TaskManager without affecting each other.

Summary: the number of task slots represents the number of tasks that a TaskManager can execute in parallel.
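As a concrete illustration, here is a small sketch that sets the slot count on a local mini-cluster via TaskManagerOptions.NUM_TASK_SLOTS (in a real deployment this is taskmanager.numberOfTaskSlots in flink-conf.yaml; the pipeline is just a placeholder, and the Configuration API used here assumes a reasonably recent Flink version):

```java
import org.apache.flink.configuration.Configuration;
import org.apache.flink.configuration.TaskManagerOptions;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SlotConfigDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set(TaskManagerOptions.NUM_TASK_SLOTS, 3);  // 3 slots per TaskManager

        // Local mini-cluster honoring the slot configuration above.
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.createLocalEnvironment(conf);

        env.fromElements(1, 2, 3).map(i -> i + 1).print();
        env.execute("SlotConfigDemo");
    }
}
```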

81. What does Flink slot sharing mean?

By default, Flink allows subtasks to share slots, even subtasks of different tasks, as long as they belong to the same job. As a result, one slot may hold an entire pipeline of the job. Allowing slot sharing has major benefits:

  • The cluster only needs as many task slots as the highest parallelism (parallelism) used in the job. As long as that is satisfied, every other task can be accommodated as well.

  • Resource utilization is fairer. When slots have spare capacity, more subtasks can be assigned to them. Without slot sharing, the lightly loaded Source/Map subtasks in the figure would tie up as many resources as the heavily loaded window subtasks, which would be starved.

  • With task-slot sharing, the base parallelism (base parallelism) in the example can be raised from 2 to 6, improving the utilization of slot resources while also ensuring that the assignment of subtasks to the slots of each TaskManager is fairer. A slot-sharing sketch follows the figure below.

[Figure: slot sharing across TaskManagers]
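As referenced above, a minimal sketch of the slot-sharing group API (slotSharingGroup() is a real DataStream method; the pipeline and the group name are illustrative). Operators stay in the "default" group unless assigned otherwise, and downstream operators inherit the group of their inputs:

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SlotSharingDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements("a", "bb", "ccc")
           .map(String::toUpperCase)
           .slotSharingGroup("heavy")  // isolate this operator's subtasks (and those of
                                       // downstream operators, which inherit the group)
                                       // in slots of their own
           .print();

        env.execute("SlotSharingDemo");
    }
}
```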

Source: blog.csdn.net/be_racle/article/details/132419026