Flink Task Scheduling Source Code Analysis 4 (Physical Execution Graph: Task Scheduling and Execution)

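Physical execution graph: the "graph" formed once the JobManager has scheduled the Job according to the ExecutionGraph and deployed Tasks on the individual TaskManagers. It is not a concrete data structure.
Its main abstractions are:
1. Task: once an Execution has been scheduled, the corresponding Task is started on the assigned TaskManager. A Task wraps an operator that carries the user's execution logic.
2. ResultPartition: represents the data produced by one Task; it corresponds one-to-one with an IntermediateResultPartition in the ExecutionGraph.
3. ResultSubpartition: a sub-partition of a ResultPartition. Each ResultPartition contains multiple ResultSubpartitions; their number is determined by the number of downstream consumer Tasks and the DistributionPattern.
4. InputGate: encapsulates the input of a Task and corresponds one-to-one with a JobEdge in the JobGraph. Each InputGate consumes one or more ResultPartitions.
5. InputChannel: each InputGate contains one or more InputChannels. An InputChannel corresponds one-to-one with an ExecutionEdge in the ExecutionGraph and is connected one-to-one to a ResultSubpartition, i.e. one InputChannel receives the output of one ResultSubpartition.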
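
To make the relationships among these abstractions concrete, here is a toy model rather than Flink code. It assumes an ALL_TO_ALL DistributionPattern, where every producer Task exposes one ResultSubpartition per consumer Task and every consumer's InputGate holds one InputChannel per producer. The names PhysicalGraphSketch, ChannelLink, and wireAllToAll are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class PhysicalGraphSketch {

    /** Toy InputChannel: reads exactly one ResultSubpartition of one producer. */
    static final class ChannelLink {
        final int producerTask;      // Task that owns the ResultPartition
        final int subpartitionIndex; // ResultSubpartition inside that partition
        final int consumerTask;      // Task that owns the InputGate

        ChannelLink(int producerTask, int subpartitionIndex, int consumerTask) {
            this.producerTask = producerTask;
            this.subpartitionIndex = subpartitionIndex;
            this.consumerTask = consumerTask;
        }
    }

    /** Returns one InputGate (modelled as a list of channels) per consumer Task. */
    static List<List<ChannelLink>> wireAllToAll(int producers, int consumers) {
        List<List<ChannelLink>> gates = new ArrayList<>();
        for (int c = 0; c < consumers; c++) {
            List<ChannelLink> gate = new ArrayList<>();
            for (int p = 0; p < producers; p++) {
                // Consumer c reads subpartition c of producer p's ResultPartition.
                gate.add(new ChannelLink(p, c, c));
            }
            gates.add(gate);
        }
        return gates;
    }

    public static void main(String[] args) {
        for (List<ChannelLink> gate : wireAllToAll(2, 3)) {
            for (ChannelLink link : gate) {
                System.out.println("consumer " + link.consumerTask
                    + " <- producer " + link.producerTask
                    + " / subpartition " + link.subpartitionIndex);
            }
        }
    }
}
```

With 2 producers and 3 consumers, each ResultPartition is split into 3 ResultSubpartitions and each InputGate ends up with 2 InputChannels. POINTWISE wiring follows different rules and is not modelled here.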

1. Task scheduling

JobMaster.startJobExecution -> resetAndStartScheduler
  -> startScheduling -> startScheduling -> SchedulerBase.startScheduling
  -> startSchedulingInternal -> startScheduling -> PipelinedRegionSchedulingStrategy.startScheduling
  -> DefaultScheduler.allocateSlotsAndDeploy -> waitForAllSlotsAndDeploy -> deployAll
  -> deployOrHandleError -> deployTaskSafe -> deploy -> Execution.deploy
	public void deploy() throws JobException {
		...
		// Convert IntermediateResultPartition into ResultPartition
		// Convert ExecutionEdge into InputChannelDeploymentDescriptor (which eventually becomes an InputGate at execution time)
		final TaskDeploymentDescriptor deployment = TaskDeploymentDescriptorFactory
			.fromExecutionVertex(vertex, attemptNumber)
			.createDeploymentDescriptor(
				slot.getAllocationId(),
				slot.getPhysicalSlotNumber(),
				taskRestore,
				producedPartitions.values());

		// We run the submission in the future executor so that the serialization of large TDDs does not block
		// the main thread and sync back to the main thread once submission is completed.
		CompletableFuture.supplyAsync(() -> taskManagerGateway.submitTask(deployment, rpcTimeout), executor)
			.thenCompose(Function.identity())
			.whenCompleteAsync(...)
	}
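
The final statement of deploy() follows a general CompletableFuture pattern: a potentially expensive call runs on a separate executor, the nested future it returns is flattened with thenCompose(Function.identity()), and the completion callback hops back to a designated main-thread executor. The sketch below reproduces that pattern with JDK classes only. The submitTask stand-in, the executor names, and the "ack:" payload are assumptions for illustration, not Flink's API.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

final class AsyncSubmitSketch {

    /** Hypothetical stand-in for an RPC call that itself returns a future. */
    static CompletableFuture<String> submitTask(String descriptor) {
        return CompletableFuture.completedFuture("ack:" + descriptor);
    }

    /** Runs submitTask on ioExecutor and handles the outcome on mainThreadExecutor. */
    static CompletableFuture<String> deploy(
            String descriptor, ExecutorService ioExecutor, ExecutorService mainThreadExecutor) {
        return CompletableFuture
            // Yields CompletableFuture<CompletableFuture<String>> because the supplier returns a future.
            .supplyAsync(() -> submitTask(descriptor), ioExecutor)
            // Flattens the nested future into CompletableFuture<String>.
            .thenCompose(Function.identity())
            // Runs the callback on the single "main" thread.
            .whenCompleteAsync((ack, failure) -> {
                if (failure != null) {
                    System.err.println("deployment failed: " + failure);
                }
            }, mainThreadExecutor);
    }

    /** Convenience wrapper: creates the executors, waits for the result, shuts down. */
    static String deployAndWait(String descriptor) {
        ExecutorService io = Executors.newFixedThreadPool(2);
        ExecutorService mainThread = Executors.newSingleThreadExecutor();
        try {
            return deploy(descriptor, io, mainThread).join();
        } finally {
            io.shutdown();
            mainThread.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(deployAndWait("tdd-1"));
    }
}
```

Without the thenCompose step the caller would be holding a future of a future, and the callback would fire as soon as the RPC was sent rather than when it was acknowledged.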

2. Task execution

taskManagerGateway.submitTask -> ... -> TaskExecutor.submitTask() {
	Task task = new Task(...);
	task.startTaskThread();
}
-> Task.run() -> doRun() {
	/* Load and instantiate the task's executable code */
	invokable = loadAndInstantiateInvokable(userCodeClassLoader.asClassLoader(), nameOfInvokableClass, env);
	/* Run it (invokable is the AbstractInvokable instance, e.g. a StreamTask subclass, created via reflection) */
	invokable.invoke();
}
-> nameOfInvokableClass is already fixed when the StreamGraph is generated; see the StreamGraph.addOperator method:
public <IN, OUT> void addOperator(
	Integer vertexID,
	@Nullable String slotSharingGroup,
	@Nullable String coLocationGroup,
	StreamOperatorFactory<OUT> operatorFactory,
	TypeInformation<IN> inTypeInfo,
	TypeInformation<OUT> outTypeInfo,
	String operatorName) {
	Class<? extends AbstractInvokable> invokableClass =
		operatorFactory.isStreamSource() ? SourceStreamTask.class : OneInputStreamTask.class;
	addOperator(vertexID, slotSharingGroup, coLocationGroup, operatorFactory, inTypeInfo,
		outTypeInfo, operatorName, invokableClass);
}
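The OneInputStreamTask.class here becomes the vertexClass of the generated StreamNode. The value is then passed along: when the StreamGraph is converted into a JobGraph, it is stored as the invokableClass of the JobVertex; when the JobGraph is converted into an ExecutionGraph, it is passed into ExecutionJobVertex.TaskInformation.invokableClassName, and from there it reaches the Task.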

-> StreamTask.invoke()
	public final void invoke() throws Exception {
		try {
			// Preparation before the task runs
			beforeInvoke();

			// let the task do its work
			/* Key logic: run the task */
			runMailboxLoop();

			// Cleanup after the task has run
			afterInvoke();
		}
		// ... (remainder of the try statement elided)
	}
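
runMailboxLoop is where the task spends its life: a single thread alternates between draining queued control actions ("mails") and running a default action, which for a StreamTask is processInput. The sketch below is a deliberately simplified, single-threaded model of such a loop. MiniMailbox and its methods are hypothetical, and the real MailboxProcessor does considerably more (for example, accepting mails from other threads).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Queue;

final class MiniMailbox {
    private final Queue<Runnable> mails = new ArrayDeque<>();
    private boolean running = true;

    /** Enqueues a control action to run before the next default action. */
    void submit(Runnable mail) {
        mails.add(mail);
    }

    /** Asks the loop to exit after the current iteration. */
    void stop() {
        running = false;
    }

    /** Drains pending mails first, then runs the default action once, and repeats. */
    void runLoop(Runnable defaultAction) {
        while (running) {
            Runnable mail;
            while ((mail = mails.poll()) != null) {
                mail.run();
            }
            if (running) {
                defaultAction.run();
            }
        }
    }

    /** Processes three input records with one pending mail and returns the event log. */
    static List<String> demo() {
        List<String> log = new ArrayList<>();
        Iterator<String> input = Arrays.asList("a", "b", "c").iterator();
        MiniMailbox mailbox = new MiniMailbox();
        mailbox.submit(() -> log.add("mail:checkpoint"));
        mailbox.runLoop(() -> {
            if (input.hasNext()) {
                log.add("record:" + input.next()); // stands in for processInput
            } else {
                mailbox.stop();                    // end of input ends the loop
            }
        });
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Since mails and record processing run on the same thread, neither needs to lock against the other, which is the main appeal of this design.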
Taking the map operator as an example:
--> runMailboxLoop(); -> mailboxProcessor.runMailboxLoop() -> runDefaultAction
  -> StreamTask(this.mailboxProcessor = new MailboxProcessor(this::processInput, mailbox, actionExecutor))
   -> processInput -> StreamOneInputProcessor.processInput -> StreamTaskNetworkInput.emitNext
    -> processElement -> OneInputStreamTask.StreamTaskNetworkOutput.emitRecord -> operator.processElement(record)
     -> StreamMap.processElement()
	public void processElement(StreamRecord<IN> element) throws Exception {
		// userFunction.map() is the map method of the user-defined MapFunction
		// The record passes through the user-defined map operator and is sent downstream through the collector
		output.collect(element.replace(userFunction.map(element.getValue())));
	}
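
The shape of StreamMap.processElement, apply the user function and hand the result to an output, can be tried out in isolation. The sketch below uses hypothetical stand-ins: Record for StreamRecord, MapOperator for StreamMap, and a java.util.function.Consumer in place of the output collector. One simplification to note: this Record.replace allocates a new object, whereas the original passes the result of element.replace straight on.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

final class MapOperatorSketch {

    /** Toy stand-in for StreamRecord: a value plus its timestamp. */
    static final class Record<T> {
        final T value;
        final long timestamp;

        Record(T value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }

        /** Returns a record carrying the new value and the same timestamp. */
        <R> Record<R> replace(R newValue) {
            return new Record<>(newValue, timestamp);
        }
    }

    /** Toy stand-in for StreamMap: applies the user function, then emits downstream. */
    static final class MapOperator<IN, OUT> {
        private final Function<IN, OUT> userFunction;
        private final Consumer<Record<OUT>> output;

        MapOperator(Function<IN, OUT> userFunction, Consumer<Record<OUT>> output) {
            this.userFunction = userFunction;
            this.output = output;
        }

        void processElement(Record<IN> element) {
            output.accept(element.replace(userFunction.apply(element.value)));
        }
    }

    /** Feeds 1, 2, 3 through a "times ten" map and returns what arrived downstream. */
    static List<Integer> demo() {
        List<Integer> downstream = new ArrayList<>();
        MapOperator<Integer, Integer> map = new MapOperator<Integer, Integer>(
            x -> x * 10,
            record -> downstream.add(record.value));
        for (int i = 1; i <= 3; i++) {
            map.processElement(new Record<>(i, 100L + i));
        }
        return downstream;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [10, 20, 30]
    }
}
```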

Data transfer

TaskExecutor.submitTask
Task task = new Task
task.startTaskThread();

Entry points:
SourceStreamTask.LegacySourceFunctionThread.run
headOperator.run

StreamTask
this.mailboxProcessor = new MailboxProcessor(this::processInput, mailbox, actionExecutor);
processInput -> inputProcessor.processInput() -> input.emitNext(output)
StreamTaskNetworkInput.emitNext
	-> checkpointedInputGate.pollNext()
	-> inputGate.pollNext() -> SingleInputGate.pollNext -> getNextBufferOrEvent
	-> waitAndGetNextData
	-> getChannel

StreamTaskNetworkInput.emitNext -> processBufferOrEvent
  -> setNextBuffer
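
The pollNext / getChannel steps above amount to picking a channel that currently has data and taking one buffer from it. The sketch below models that idea with a queue of "channels with data", under the assumption that a channel is re-queued whenever it still holds buffers after a poll. InputGateSketch, Channel, onData, and the string buffers are all hypothetical; this is a toy model, not SingleInputGate's implementation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Queue;

final class InputGateSketch {

    /** Toy InputChannel: an index plus the buffers received so far. */
    static final class Channel {
        final int index;
        final Queue<String> buffers = new ArrayDeque<>();

        Channel(int index) {
            this.index = index;
        }
    }

    // Channels that currently hold at least one buffer, in notification order.
    private final Queue<Channel> channelsWithData = new ArrayDeque<>();

    /** Called when a buffer arrives; queues the channel if it was empty before. */
    void onData(Channel channel, String buffer) {
        boolean wasEmpty = channel.buffers.isEmpty();
        channel.buffers.add(buffer);
        if (wasEmpty) {
            channelsWithData.add(channel);
        }
    }

    /** Takes one buffer from the next channel with data, or returns empty if none has any. */
    Optional<String> pollNext() {
        Channel channel = channelsWithData.poll();
        if (channel == null) {
            return Optional.empty();
        }
        String buffer = channel.buffers.poll();
        if (!channel.buffers.isEmpty()) {
            channelsWithData.add(channel); // more data left: go to the back of the queue
        }
        return Optional.of("ch" + channel.index + ":" + buffer);
    }

    /** Two channels, three buffers; returns the order in which the gate hands them out. */
    static List<String> demo() {
        InputGateSketch gate = new InputGateSketch();
        Channel ch0 = new Channel(0);
        Channel ch1 = new Channel(1);
        gate.onData(ch0, "a");
        gate.onData(ch0, "b");
        gate.onData(ch1, "x");
        List<String> consumed = new ArrayList<>();
        Optional<String> next;
        while ((next = gate.pollNext()).isPresent()) {
            consumed.add(next.get());
        }
        return consumed;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [ch0:a, ch1:x, ch0:b]
    }
}
```

Re-queuing at the back interleaves the channels, so one busy channel cannot starve the others in this model.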