Flink Task Scheduling Source Code Analysis 4: Physical Execution Graph (Task Scheduling and Execution)

Physical execution graph: after the JobManager schedules the Job according to the ExecutionGraph and the Tasks have been deployed to the TaskManagers, the resulting "graph" is not a concrete data structure. Its main abstractions are:
1. Task: once an Execution is scheduled, the corresponding Task is started on the assigned TaskManager. The Task wraps the operator that carries the user's execution logic.
2. ResultPartition: represents the data produced by a Task and corresponds one-to-one to an IntermediateResultPartition in the ExecutionGraph.
3. ResultSubpartition: a subpartition of a ResultPartition. Each ResultPartition contains multiple ResultSubpartitions; their number is determined by the number of downstream consumer Tasks and by the DistributionPattern.
4. InputGate: represents the input side of a Task and corresponds one-to-one to a JobEdge in the JobGraph. Each InputGate consumes one or more ResultPartitions.
5. InputChannel: each InputGate contains one or more InputChannels. An InputChannel corresponds one-to-one to an ExecutionEdge in the ExecutionGraph and is also wired one-to-one to a ResultSubpartition, i.e. one InputChannel receives the output of exactly one ResultSubpartition.
For example, with an ALL_TO_ALL DistributionPattern, an upstream subtask whose downstream operator runs with parallelism 4 produces a ResultPartition with 4 ResultSubpartitions, and each downstream subtask's InputGate holds one InputChannel per upstream subtask.

1. Task scheduling

JobMaster.startJobExecution -> resetAndStartScheduler
  -> startScheduling -> startScheduling -> SchedulerBase.startScheduling
  -> startSchedulingInternal -> startScheduling -> PipelinedRegionSchedulingStrategy.startScheduling
  -> DefaultScheduler.allocateSlotsAndDeploy -> waitForAllSlotsAndDeploy -> deployAll
  -> deployOrHandleError -> deployTaskSafe -> deploy -> Execution.deploy
	public void deploy() throws JobException {
			...
			// Convert the IntermediateResultPartitions into ResultPartition descriptors,
			// and the ExecutionEdges into InputChannelDeploymentDescriptors (which become InputGates at execution time)
			final TaskDeploymentDescriptor deployment = TaskDeploymentDescriptorFactory
				.fromExecutionVertex(vertex, attemptNumber)
				.createDeploymentDescriptor(
					slot.getAllocationId(),
					slot.getPhysicalSlotNumber(),
					taskRestore,
					producedPartitions.values());

			// We run the submission in the future executor so that the serialization of large TDDs does not block
			// the main thread and sync back to the main thread once submission is completed.
			CompletableFuture.supplyAsync(() -> taskManagerGateway.submitTask(deployment, rpcTimeout), executor)
				.thenCompose(Function.identity())
				.whenCompleteAsync(...)
	}
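Note how the snippet flattens a nested future: taskManagerGateway.submitTask itself already returns a CompletableFuture, so supplyAsync yields a CompletableFuture of a CompletableFuture, and thenCompose(Function.identity()) collapses it. Below is a minimal, self-contained sketch of the same pattern; the submit method and class names are illustrative stand-ins, not Flink code.

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

public class AsyncSubmitSketch {

    // Hypothetical stand-in for taskManagerGateway.submitTask: already returns a future acknowledgement.
    static CompletableFuture<String> submit(String descriptor) {
        return CompletableFuture.completedFuture("ACK for " + descriptor);
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newSingleThreadExecutor();

        // supplyAsync moves the (potentially expensive) serialization + submission off the caller thread,
        // producing CompletableFuture<CompletableFuture<String>>; thenCompose(identity) flattens it.
        CompletableFuture<String> ack =
                CompletableFuture.supplyAsync(() -> submit("large TDD"), executor)
                        .thenCompose(Function.identity());

        // React once the submission round trip has completed.
        ack.whenComplete((result, failure) -> {
            if (failure != null) {
                System.err.println("submission failed: " + failure);
            } else {
                System.out.println(result);
            }
        });

        ack.join();
        executor.shutdown();
    }
}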

2. Task execution

taskManagerGateway.submitTask -> ... -> TaskExecutor.submitTask() {
	Task task = new Task(...);
	task.startTaskThread();
}
-> Task.run() -> doRun() {
	/* Load and instantiate the task's executable code */
	invokable = loadAndInstantiateInvokable(userCodeClassLoader.asClassLoader(), nameOfInvokableClass, env);
	/* Run it (invokable is the instance created via reflection, e.g. a StreamTask subclass) */
	invokable.invoke();
}
-> nameOfInvokableClass is already determined when the StreamGraph is generated; see the StreamGraph.addOperator method:
public <IN, OUT> void addOperator(
		Integer vertexID,
		@Nullable String slotSharingGroup,
		@Nullable String coLocationGroup,
		StreamOperatorFactory<OUT> operatorFactory,
		TypeInformation<IN> inTypeInfo,
		TypeInformation<OUT> outTypeInfo,
		String operatorName) {

	Class<? extends AbstractInvokable> invokableClass =
			operatorFactory.isStreamSource() ? SourceStreamTask.class : OneInputStreamTask.class;
	addOperator(vertexID, slotSharingGroup, coLocationGroup, operatorFactory, inTypeInfo,
			outTypeInfo, operatorName, invokableClass);
}
The OneInputStreamTask.class chosen here becomes the vertexClass of the generated StreamNode. This value is carried along the whole chain: when the StreamGraph is converted into a JobGraph it is passed to the JobVertex's invokableClass, and when the JobGraph is converted into an ExecutionGraph it is stored in ExecutionJobVertex.TaskInformation.invokableClassName, which finally reaches the Task.
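Once the class name reaches the Task, loadAndInstantiateInvokable turns it back into an object via reflection, looking up a constructor that takes the runtime Environment. The following is only a rough sketch of that mechanism using simplified, made-up types (MyEnvironment, MyInvokable); it illustrates the idea rather than reproducing Flink's implementation.

import java.lang.reflect.Constructor;

public class InvokableLoadingSketch {

    // Stand-in for Flink's runtime Environment that is passed to every invokable.
    static class MyEnvironment {}

    // Stand-in for an AbstractInvokable subclass such as OneInputStreamTask.
    public static class MyInvokable {
        public MyInvokable(MyEnvironment env) {}
        public void invoke() {
            System.out.println("running task body");
        }
    }

    // Simplified version of the idea behind loadAndInstantiateInvokable:
    // resolve the class by name with the given class loader and call its (Environment) constructor.
    static MyInvokable loadAndInstantiate(ClassLoader cl, String className, MyEnvironment env) throws Exception {
        Class<? extends MyInvokable> clazz = Class.forName(className, true, cl).asSubclass(MyInvokable.class);
        Constructor<? extends MyInvokable> ctor = clazz.getConstructor(MyEnvironment.class);
        return ctor.newInstance(env);
    }

    public static void main(String[] args) throws Exception {
        MyInvokable invokable = loadAndInstantiate(
                InvokableLoadingSketch.class.getClassLoader(),
                MyInvokable.class.getName(),   // in Flink this string comes from TaskInformation.invokableClassName
                new MyEnvironment());
        invokable.invoke();
    }
}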

-> StreamTask.invoke()
	public final void invoke() throws Exception {
		try {
			// Preparation before running the task
			beforeInvoke();

			// let the task do its work
			/* Key logic: run the task */
			runMailboxLoop();

			// Cleanup after the task has finished
			afterInvoke();
		}
		// (catch / finally blocks omitted)
	}
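runMailboxLoop hands control to the MailboxProcessor: the single task thread repeatedly drains pending "mails" (control actions such as checkpoint triggers) and otherwise runs a default action that processes input. The following is only a rough sketch of that loop pattern with invented class names, not Flink's MailboxProcessor:

import java.util.concurrent.LinkedBlockingQueue;

public class MailboxLoopSketch {

    private final LinkedBlockingQueue<Runnable> mailbox = new LinkedBlockingQueue<>();
    private final Runnable defaultAction;      // e.g. "process one input record"
    private volatile boolean running = true;

    MailboxLoopSketch(Runnable defaultAction) {
        this.defaultAction = defaultAction;
    }

    /** Other threads enqueue control actions here; they are executed on the task thread. */
    void putMail(Runnable mail) {
        mailbox.add(mail);
    }

    void stop() {
        running = false;
    }

    /** Single task thread: drain pending mails first, then run the default action once, repeat. */
    void runMailboxLoop() {
        while (running) {
            Runnable mail;
            while ((mail = mailbox.poll()) != null) {
                mail.run();
            }
            defaultAction.run();
        }
    }

    public static void main(String[] args) {
        MailboxLoopSketch loop = new MailboxLoopSketch(() -> System.out.println("process one record"));
        loop.putMail(() -> System.out.println("control action, e.g. trigger checkpoint"));
        loop.putMail(loop::stop); // stop after the control actions, just for this demo
        loop.runMailboxLoop();
    }
}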
Take the map operator as an example:
--> runMailboxLoop(); -> mailboxProcessor.runMailboxLoop() -> runDefaultAction
  -> StreamTask(this.mailboxProcessor = new MailboxProcessor(this::processInput, mailbox, actionExecutor))
   -> processInput -> StreamOneInputProcessor.processInput -> StreamTaskNetworkInput.emitNext
    -> processElement -> OneInputStreamTask.StreamTaskNetworkOutput.emitRecord -> operator.processElement(record)
     -> StreamMap.processElement()
	public void processElement(StreamRecord<IN> element) throws Exception {
		// userFunction.map() is the map method of the user-defined MapFunction;
		// the record passes through the user's map operator and the collector sends it downstream
		output.collect(element.replace(userFunction.map(element.getValue())));
	}
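Here userFunction is simply the MapFunction the user passed to DataStream.map, which wraps it in the StreamMap operator. A minimal example of such a function (the class name is illustrative):

import org.apache.flink.api.common.functions.MapFunction;

// A user-defined MapFunction; StreamMap.processElement calls its map() for every record.
public class UpperCaseMap implements MapFunction<String, String> {
    @Override
    public String map(String value) {
        return value.toUpperCase();
    }
}

// Usage when building the job (stream is a DataStream<String>):
//   stream.map(new UpperCaseMap())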

3. Data transmission

TaskExecutor.submitTask -> Task task = new Task(...) -> task.startTaskThread();

Entry point on the source side:
SourceStreamTask.LegacySourceFunctionThread.run -> headOperator.run

On the consuming side, inside StreamTask:
this.mailboxProcessor = new MailboxProcessor(this::processInput, mailbox, actionExecutor);
processInput -> inputProcessor.processInput() -> input.emitNext(output)
  -> StreamTaskNetworkInput.emitNext
    -> checkpointedInputGate.pollNext()
    -> inputGate.pollNext() -> SingleInputGate.pollNext -> getNextBufferOrEvent
      -> waitAndGetNextData -> getChannel

StreamTaskNetworkInput.emitNext -> processBufferOrEvent -> setNextBuffer
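On the source side, headOperator.run ends up calling the user's legacy SourceFunction, whose run method keeps emitting records through the SourceContext; those records then travel through the ResultPartition/InputGate path sketched above. A minimal example of such a source (class name and emitted values are illustrative):

import org.apache.flink.streaming.api.functions.source.SourceFunction;

// A simple legacy SourceFunction: its run() is driven by LegacySourceFunctionThread via headOperator.run.
public class CountingSource implements SourceFunction<Long> {

    private volatile boolean running = true;

    @Override
    public void run(SourceContext<Long> ctx) throws Exception {
        long counter = 0;
        while (running) {
            // Emit under the checkpoint lock so emission and checkpointing do not interleave.
            synchronized (ctx.getCheckpointLock()) {
                ctx.collect(counter++);
            }
            Thread.sleep(10);
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}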
