Physical execution graph: the "graph" that takes shape once the JobManager has scheduled the Job according to the ExecutionGraph
and the Tasks have been deployed on the TaskManagers. It is not a concrete data structure.
The main abstractions it contains are:
1. Task: Once an Execution has been scheduled, the corresponding Task is started on the assigned TaskManager. A Task wraps the operator that carries the user's execution logic.
2. ResultPartition: Represents the data produced by a Task, and corresponds one-to-one to an IntermediateResultPartition in the ExecutionGraph.
3. ResultSubpartition: A subpartition of a ResultPartition. Each ResultPartition contains one or more
ResultSubpartitions; their number is determined by the number of downstream consumer Tasks and the DistributionPattern.
4. InputGate: Encapsulates the input of a Task, and corresponds one-to-one to a JobEdge in the JobGraph. Each InputGate consumes one or more ResultPartitions.
5. InputChannel: Each InputGate contains one or more InputChannels. Each InputChannel
corresponds one-to-one to an ExecutionEdge in the ExecutionGraph and is connected one-to-one to a ResultSubpartition, that is, one InputChannel receives
the output of one ResultSubpartition.
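The relationships in the list above can be sketched with a toy model. This is not Flink code: PhysicalGraphSketch and everything inside it are hypothetical names, and the sketch assumes an all-to-all distribution, where each ResultPartition has one subpartition per downstream consumer Task and consumer i reads subpartition i of every upstream partition.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model for illustration only; these are not Flink's classes.
final class PhysicalGraphSketch {

    // One subpartition of the ResultPartition produced by one upstream Task.
    static final class Subpartition {
        final int producerIndex;
        final int subpartitionIndex;

        Subpartition(int producerIndex, int subpartitionIndex) {
            this.producerIndex = producerIndex;
            this.subpartitionIndex = subpartitionIndex;
        }
    }

    // One InputChannel is bound to exactly one Subpartition.
    static final class InputChannel {
        final Subpartition source;

        InputChannel(Subpartition source) {
            this.source = source;
        }
    }

    // Assumption: all-to-all distribution, so every producer gets one subpartition per consumer.
    static List<List<Subpartition>> buildPartitions(int producers, int consumers) {
        List<List<Subpartition>> partitions = new ArrayList<>();
        for (int p = 0; p < producers; p++) {
            List<Subpartition> subpartitions = new ArrayList<>();
            for (int s = 0; s < consumers; s++) {
                subpartitions.add(new Subpartition(p, s));
            }
            partitions.add(subpartitions);
        }
        return partitions;
    }

    // InputGate of one consumer: one channel per upstream partition,
    // each channel reading the subpartition that belongs to this consumer.
    static List<InputChannel> buildInputGate(List<List<Subpartition>> partitions, int consumerIndex) {
        List<InputChannel> gate = new ArrayList<>();
        for (List<Subpartition> partition : partitions) {
            gate.add(new InputChannel(partition.get(consumerIndex)));
        }
        return gate;
    }

    public static void main(String[] args) {
        List<List<Subpartition>> partitions = buildPartitions(2, 3);
        for (InputChannel channel : buildInputGate(partitions, 1)) {
            System.out.println("channel reads partition " + channel.source.producerIndex
                    + ", subpartition " + channel.source.subpartitionIndex);
        }
    }
}
```

With 2 producers and 3 consumers, each partition holds 3 subpartitions and each consumer's gate holds 2 channels. Other distribution patterns would change these counts.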
1. Task scheduling
JobMaster.startJobExecution -> resetAndStartScheduler
-> startScheduling -> startScheduling -> SchedulerBase.startScheduling
-> startSchedulingInternal -> startScheduling -> PipelinedRegionSchedulingStrategy.startScheduling
-> DefaultScheduler.allocateSlotsAndDeploy -> waitForAllSlotsAndDeploy -> deployAll
-> deployOrHandleError -> deployTaskSafe -> deploy -> Execution.deploy
public void deploy() throws JobException {
    ...
    // Convert IntermediateResultPartition into ResultPartition
    // Convert ExecutionEdge into InputChannelDeploymentDescriptor (which eventually becomes an InputGate at execution time)
    final TaskDeploymentDescriptor deployment = TaskDeploymentDescriptorFactory
        .fromExecutionVertex(vertex, attemptNumber)
        .createDeploymentDescriptor(
            slot.getAllocationId(),
            slot.getPhysicalSlotNumber(),
            taskRestore,
            producedPartitions.values());
    // We run the submission in the future executor so that the serialization of large TDDs does not block
    // the main thread and sync back to the main thread once submission is completed.
    // Key step: submit the task to the TaskManager
    CompletableFuture.supplyAsync(() -> taskManagerGateway.submitTask(deployment, rpcTimeout), executor)
        .thenCompose(Function.identity())
        .whenCompleteAsync(...);
}
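The asynchronous submission at the end of deploy() can be looked at in isolation. In this minimal sketch only the java.util.concurrent API is real. AsyncSubmitSketch, submitTask and the String payload are hypothetical stand-ins for the gateway call. Because the wrapped call itself returns a future, supplyAsync yields a nested future, and thenCompose(Function.identity()) flattens it.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

final class AsyncSubmitSketch {

    // Hypothetical stand-in for an RPC call that returns a future acknowledgement.
    static CompletableFuture<String> submitTask(String descriptor) {
        return CompletableFuture.completedFuture("ack:" + descriptor);
    }

    // Runs the call on the given executor, then flattens the nested future.
    static CompletableFuture<String> deployAsync(String descriptor, ExecutorService executor) {
        return CompletableFuture
                // Type here: CompletableFuture<CompletableFuture<String>>
                .supplyAsync(() -> submitTask(descriptor), executor)
                // Type here: CompletableFuture<String>
                .thenCompose(Function.identity());
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            System.out.println(deployAsync("tdd-1", executor).join());
        } finally {
            executor.shutdown();
        }
    }
}
```

Running the call inside supplyAsync keeps any expensive work done before the future is returned (serialization, in the comment quoted above) off the calling thread.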
2. Task execution
taskManagerGateway.submitTask -> ... -> TaskExecutor.submitTask() {
    Task task = new Task(...); task.startTaskThread(); }
-> Task.run() -> doRun() {
    /* Load and instantiate the task's executable code */
    invokable = loadAndInstantiateInvokable(userCodeClassLoader.asClassLoader(), nameOfInvokableClass, env);
    /* Run the code (invokable is the task instance created via reflection, for example a StreamTask, which wraps the operator) */
    invokable.invoke();
}
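The reflective step in doRun() can be illustrated with a self-contained sketch. It does not reproduce Flink's loadAndInstantiateInvokable. InvokableLoaderSketch, Invokable and EchoTask are hypothetical, and a String stands in for the environment argument. Only the general technique is shown: resolve a class by name with a given class loader, then call its constructor.

```java
import java.lang.reflect.Constructor;

final class InvokableLoaderSketch {

    // Hypothetical stand-in for the task base type.
    interface Invokable {
        String invoke();
    }

    // Hypothetical task with a one-argument constructor, loosely like a task receiving its environment.
    static final class EchoTask implements Invokable {
        private final String env;

        public EchoTask(String env) {
            this.env = env;
        }

        @Override
        public String invoke() {
            return "ran in " + env;
        }
    }

    // Loads the class by name through the given class loader and instantiates it reflectively.
    static Invokable loadAndInstantiate(ClassLoader loader, String className, String env) {
        try {
            Class<? extends Invokable> clazz =
                    Class.forName(className, true, loader).asSubclass(Invokable.class);
            Constructor<? extends Invokable> constructor = clazz.getDeclaredConstructor(String.class);
            constructor.setAccessible(true);
            return constructor.newInstance(env);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        Invokable invokable = loadAndInstantiate(
                InvokableLoaderSketch.class.getClassLoader(), EchoTask.class.getName(), "env-1");
        System.out.println(invokable.invoke());
    }
}
```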
-> nameOfInvokableClass is already fixed when the StreamGraph is generated; see the StreamGraph.addOperator method
public <IN, OUT> void addOperator(
        Integer vertexID,
        @Nullable String slotSharingGroup,
        @Nullable String coLocationGroup,
        StreamOperatorFactory<OUT> operatorFactory,
        TypeInformation<IN> inTypeInfo,
        TypeInformation<OUT> outTypeInfo,
        String operatorName) {
    Class<? extends AbstractInvokable> invokableClass =
        operatorFactory.isStreamSource() ? SourceStreamTask.class : OneInputStreamTask.class;
    addOperator(vertexID, slotSharingGroup, coLocationGroup, operatorFactory, inTypeInfo,
        outTypeInfo, operatorName, invokableClass);
}
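Here OneInputStreamTask.class becomes the vertexClass of the generated StreamNode. This value is
passed along from stage to stage: when the StreamGraph is converted into a JobGraph, it is passed to the
invokableClass of the JobVertex. Then, when the JobGraph is converted into an ExecutionGraph, it is stored in
ExecutionJobVertex.TaskInformation.invokableClassName, and from there it is carried all the way to the Task.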
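The hand-off described above can be traced with a toy model. All types here (InvokableNameSketch, SourceTask, OneInputTask, and the simplified StreamNode, JobVertex and TaskInformation) are hypothetical stand-ins that only borrow the names of the concepts. The assumption made for illustration is that the class is chosen once and only its name travels through the later stages.

```java
final class InvokableNameSketch {

    // Hypothetical marker classes standing in for the two task types chosen in addOperator.
    static final class SourceTask {}
    static final class OneInputTask {}

    static final class StreamNode {
        final Class<?> vertexClass;
        StreamNode(Class<?> vertexClass) { this.vertexClass = vertexClass; }
    }

    static final class JobVertex {
        final String invokableClassName;
        JobVertex(String invokableClassName) { this.invokableClassName = invokableClassName; }
    }

    static final class TaskInformation {
        final String invokableClassName;
        TaskInformation(String invokableClassName) { this.invokableClassName = invokableClassName; }
    }

    // StreamGraph stage: the task class is chosen from the kind of operator.
    static StreamNode addOperator(boolean isStreamSource) {
        return new StreamNode(isStreamSource ? SourceTask.class : OneInputTask.class);
    }

    // JobGraph stage (assumption of this sketch: only the class name is carried on).
    static JobVertex toJobVertex(StreamNode node) {
        return new JobVertex(node.vertexClass.getName());
    }

    // ExecutionGraph stage: the name is copied into the information shipped with the task.
    static TaskInformation toTaskInformation(JobVertex vertex) {
        return new TaskInformation(vertex.invokableClassName);
    }

    public static void main(String[] args) {
        TaskInformation info = toTaskInformation(toJobVertex(addOperator(false)));
        System.out.println(info.invokableClassName);
    }
}
```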
-> StreamTask.invoke()
public final void invoke() throws Exception {
    try {
        // Preparation before the task runs
        beforeInvoke();
        // let the task do its work
        /* Key logic: run the task */
        runMailboxLoop();
        // Cleanup after the task has run
        afterInvoke();
    }
    ...
}
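The idea behind runMailboxLoop() can be sketched as a single-threaded loop that alternates between pending mail and a default action. This is a simplified illustration, not Flink's MailboxProcessor. MailboxLoopSketch and DefaultAction are hypothetical, the mailbox is a plain non-thread-safe queue, and the loop stops as soon as the default action reports that no input is left.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Queue;

final class MailboxLoopSketch {

    // Hypothetical default action: returns false once no input is left.
    interface DefaultAction {
        boolean runOnce();
    }

    private final Queue<Runnable> mailbox = new ArrayDeque<>();
    private final DefaultAction defaultAction;

    MailboxLoopSketch(DefaultAction defaultAction) {
        this.defaultAction = defaultAction;
    }

    void sendMail(Runnable mail) {
        mailbox.add(mail);
    }

    // Drain pending mail first, then run the default action once; repeat while input remains.
    void runMailboxLoop() {
        boolean moreInput = true;
        while (moreInput) {
            Runnable mail;
            while ((mail = mailbox.poll()) != null) {
                mail.run();
            }
            moreInput = defaultAction.runOnce();
        }
    }

    // Processes records a, b, c; a mail enqueued while handling b runs before c is processed.
    static List<String> demo() {
        List<String> log = new ArrayList<>();
        Iterator<String> input = Arrays.asList("a", "b", "c").iterator();
        MailboxLoopSketch[] loop = new MailboxLoopSketch[1];
        loop[0] = new MailboxLoopSketch(() -> {
            String record = input.next();
            log.add("record " + record);
            if (record.equals("b")) {
                loop[0].sendMail(() -> log.add("mail"));
            }
            return input.hasNext();
        });
        loop[0].runMailboxLoop();
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Since mail and record processing share one thread, an action delivered as mail never runs concurrently with the default action.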
Taking the map operator as an example:
--> runMailboxLoop(); -> mailboxProcessor.runMailboxLoop() -> runDefaultAction
-> StreamTask(this.mailboxProcessor = new MailboxProcessor(this::processInput, mailbox, actionExecutor))
-> processInput -> StreamOneInputProcessor.processInput -> StreamTaskNetworkInput.emitNext
-> processElement -> OneInputStreamTask.StreamTaskNetworkOutput.emitRecord -> operator.processElement(record)
-> StreamMap.processElement()
public void processElement(StreamRecord<IN> element) throws Exception {
    // userFunction.map() is the map method of the user-defined MapFunction
    // The record passes through the user-defined map operator and is sent downstream through the collector
    output.collect(element.replace(userFunction.map(element.getValue())));
}
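The one-line body of processElement can be mimicked with plain Java types to show what happens to a record. The names MapOperatorSketch, Record and MapOperator are hypothetical; java.util.function.Function stands in for the user's MapFunction and a Consumer stands in for the output collector. Unlike this sketch, which allocates a new Record in replace, Flink's StreamRecord.replace reuses the record object.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

final class MapOperatorSketch {

    // Hypothetical record wrapper: a value plus its timestamp.
    static final class Record<T> {
        final T value;
        final long timestamp;

        Record(T value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }

        // Swaps the value and keeps the timestamp (this sketch copies instead of reusing the object).
        <R> Record<R> replace(R newValue) {
            return new Record<>(newValue, timestamp);
        }
    }

    // Hypothetical operator: applies the user function and forwards the result downstream.
    static final class MapOperator<IN, OUT> {
        private final Function<IN, OUT> userFunction;
        private final Consumer<Record<OUT>> output;

        MapOperator(Function<IN, OUT> userFunction, Consumer<Record<OUT>> output) {
            this.userFunction = userFunction;
            this.output = output;
        }

        void processElement(Record<IN> element) {
            output.accept(element.replace(userFunction.apply(element.value)));
        }
    }

    // Maps the record ("flink", 42) with String::length and collects what reaches downstream.
    static List<Record<Integer>> demo() {
        List<Record<Integer>> downstream = new ArrayList<>();
        MapOperator<String, Integer> operator =
                new MapOperator<String, Integer>(String::length, downstream::add);
        operator.processElement(new Record<>("flink", 42L));
        return downstream;
    }

    public static void main(String[] args) {
        Record<Integer> result = demo().get(0);
        System.out.println(result.value + " @ " + result.timestamp);
    }
}
```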
3. Data transmission
TaskExecutor.submitTask
Task task = new Task
task.startTaskThread();
Entry point:
SourceStreamTask.LegacySourceFunctionThread.run
headOperator.run
StreamTask
this.mailboxProcessor = new MailboxProcessor(this::processInput, mailbox, actionExecutor);
processInput -> inputProcessor.processInput() -> input.emitNext(output);
StreamTaskNetworkInput.emitNext
checkpointedInputGate.pollNext();
inputGate.pollNext(); -> SingleInputGate.pollNext -> getNextBufferOrEvent
getNextBufferOrEvent -> waitAndGetNextData
getChannel
StreamTaskNetworkInput.emitNext -> processBufferOrEvent
setNextBuffer
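The receiving side of this trace (pollNext, then picking a channel, then reading one buffer from it) can be approximated by a small non-blocking model. This is a sketch under stated assumptions, not SingleInputGate itself. InputGateSketch, Channel, onBuffer and the String buffers are hypothetical, and the policy shown is a simple queue of channels that currently hold data.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Queue;

final class InputGateSketch {

    // Hypothetical channel: an index plus the buffers received so far.
    static final class Channel {
        final int index;
        final Queue<String> buffers = new ArrayDeque<>();

        Channel(int index) {
            this.index = index;
        }
    }

    private final List<Channel> channels = new ArrayList<>();
    private final Queue<Channel> channelsWithData = new ArrayDeque<>();

    InputGateSketch(int numberOfChannels) {
        for (int i = 0; i < numberOfChannels; i++) {
            channels.add(new Channel(i));
        }
    }

    // Called when data arrives on a channel; an empty channel gets queued as "has data".
    void onBuffer(int channelIndex, String buffer) {
        Channel channel = channels.get(channelIndex);
        boolean wasEmpty = channel.buffers.isEmpty();
        channel.buffers.add(buffer);
        if (wasEmpty) {
            channelsWithData.add(channel);
        }
    }

    // Non-blocking poll: take the next channel with data, read one buffer, re-queue it if more remains.
    Optional<String> pollNext() {
        Channel channel = channelsWithData.poll();
        if (channel == null) {
            return Optional.empty();
        }
        String buffer = channel.buffers.poll();
        if (!channel.buffers.isEmpty()) {
            channelsWithData.add(channel);
        }
        return Optional.of("ch" + channel.index + ":" + buffer);
    }

    // Two buffers on channel 0 and one on channel 1, then poll until the gate is empty.
    static List<String> demo() {
        InputGateSketch gate = new InputGateSketch(2);
        gate.onBuffer(0, "a");
        gate.onBuffer(0, "b");
        gate.onBuffer(1, "x");
        List<String> consumed = new ArrayList<>();
        Optional<String> next;
        while ((next = gate.pollNext()).isPresent()) {
            consumed.add(next.get());
        }
        return consumed;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Re-queuing a channel at the tail after each read gives every channel with data a turn, so one busy channel cannot starve the others in this model.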