Flink Task Scheduling Source Code Analysis, Part 3: ExecutionGraph Construction and Submission

ExecutionGraph: the JobManager (JobMaster) generates the ExecutionGraph from the JobGraph.
The ExecutionGraph is the parallelized version of the JobGraph and the core data structure of the scheduling layer.
After the client generates the JobGraph, it submits it to the JobManager via submitJob, and the JobManager
then builds the corresponding ExecutionGraph from that JobGraph.

The main abstractions it contains are:
1. ExecutionJobVertex: corresponds one-to-one with a JobVertex in the JobGraph. Each ExecutionJobVertex contains as many ExecutionVertexes as its parallelism.
2. ExecutionVertex: represents one parallel subtask of an ExecutionJobVertex; its inputs are ExecutionEdges and its outputs are IntermediateResultPartitions.
3. IntermediateResult: corresponds one-to-one with an IntermediateDataSet in the JobGraph. An IntermediateResult contains multiple IntermediateResultPartitions, as many as the parallelism of the producing operator.
4. IntermediateResultPartition: represents one output partition of an ExecutionVertex; its producer is an ExecutionVertex, and its consumers are one or more ExecutionEdges.
5. ExecutionEdge: represents an input of an ExecutionVertex; its source is an IntermediateResultPartition and its target is an ExecutionVertex, and there is exactly one of each.
6. Execution: one attempt to execute an ExecutionVertex. When a failure occurs, or when data needs to be recomputed, an ExecutionVertex may have several execution attempts, so each Execution is uniquely identified by an ExecutionAttemptID. Task deployment and task status updates between the JobManager and TaskManager both use the ExecutionAttemptID to address the right message recipient.
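
These relationships can be summarized in a minimal sketch; the simplified class and field declarations below are illustrative only, not Flink's actual definitions:

	// Hypothetical, simplified sketch of the ExecutionGraph elements above and how they link up.
	class ExecutionJobVertex {
		ExecutionVertex[] taskVertices;        // one per parallel subtask
		IntermediateResult[] producedDataSets; // one per IntermediateDataSet in the JobGraph
	}
	class ExecutionVertex {
		ExecutionEdge[][] inputEdges;  // one ExecutionEdge[] per input
		Execution currentExecution;    // the current execution attempt
	}
	class IntermediateResult {
		IntermediateResultPartition[] partitions; // as many as the producer's parallelism
	}
	class IntermediateResultPartition {
		ExecutionVertex producer; // the subtask that writes this partition
	}
	class ExecutionEdge {
		IntermediateResultPartition source; // exactly one source partition
		ExecutionVertex target;             // exactly one target subtask
	}
	class Execution {
		ExecutionAttemptID attemptId; // uniquely identifies this attempt
	}
	class ExecutionAttemptID { }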

Entry point: during runJob, when the Dispatcher creates the JobManagerRunner, it calls createJobManagerRunner
 -> new JobManagerRunnerImpl -> createJobMasterService -> new JobMaster() {
	this.schedulerNG = createScheduler(executionDeploymentTracker, jobManagerJobMetricGroup);
 } -> schedulerNGFactory.createInstance -> new DefaultScheduler -> super
   -> public SchedulerBase(this.executionGraph = createAndRestoreExecutionGraph)
   -> createExecutionGraph -> ExecutionGraphBuilder.buildGraph
   // Core logic: attach the topologically sorted JobGraph vertices to the executionGraph data structure.
   executionGraph.attachJobGraph(sortedTopology)
	public void attachJobGraph(List<JobVertex> topologiallySorted) throws JobException {

		assertRunningInJobMasterMainThread();

		LOG.debug("Attaching {} topologically sorted vertices to existing job graph with {} " +
				"vertices and {} intermediate results.",
			topologiallySorted.size(),
			tasks.size(),
			intermediateResults.size());

		// ExecutionJobVertex is a node of the execution graph
		final ArrayList<ExecutionJobVertex> newExecJobVertices = new ArrayList<>(topologiallySorted.size());
		final long createTimestamp = System.currentTimeMillis();

		// Iterate over the JobVertexes and parallelize each one into ExecutionVertexes
		for (JobVertex jobVertex : topologiallySorted) {

			if (jobVertex.isInputVertex() && !jobVertex.isStoppable()) {
				this.isStoppable = false;
			}

			// create the execution job vertex and attach it to the graph
			/* Instantiate the execution-graph node: for each JobVertex, create the corresponding ExecutionVertexes */
			ExecutionJobVertex ejv = new ExecutionJobVertex(
					this,
					jobVertex,
					1,
					maxPriorAttemptsHistoryLength,
					rpcTimeout,
					globalModVersion,
					createTimestamp);

			/* Core logic: connect the newly created ExecutionJobVertex to the upstream IntermediateResults */
			ejv.connectToPredecessors(this.intermediateResults);

			ExecutionJobVertex previousTask = this.tasks.putIfAbsent(jobVertex.getID(), ejv);
			if (previousTask != null) {
				throw new JobException(String.format("Encountered two job vertices with ID %s : previous=[%s] / new=[%s]",
					jobVertex.getID(), ejv, previousTask));
			}

			for (IntermediateResult res : ejv.getProducedDataSets()) {
				IntermediateResult previousDataSet = this.intermediateResults.putIfAbsent(res.getId(), res);
				if (previousDataSet != null) {
					throw new JobException(String.format("Encountered two intermediate data set with ID %s : previous=[%s] / new=[%s]",
						res.getId(), res, previousDataSet));
				}
			}
			
			this.verticesInCreationOrder.add(ejv);
			// Add this node's parallelism to the total vertex count, since the execution graph is the parallelized version of the job graph
			this.numVerticesTotal += ejv.getParallelism();
			/* Add the current execution-graph node to the graph */
			newExecJobVertices.add(ejv);
		}

		// the topology assigning should happen before notifying new vertices to failoverStrategy
		executionTopology = DefaultExecutionTopology.fromExecutionGraph(this);

		failoverStrategy.notifyNewVertices(newExecJobVertices);

		partitionReleaseStrategy = partitionReleaseStrategyFactory.createInstance(getSchedulingTopology());
	}
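
For example, a job graph of source (parallelism 2) -> map (parallelism 4) -> sink (parallelism 4) produces three ExecutionJobVertexes in this loop, and after it finishes, numVerticesTotal is 2 + 4 + 4 = 10, one per ExecutionVertex.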
	
-> ejv.connectToPredecessors(this.intermediateResults);
	public void connectToPredecessors(Map<IntermediateDataSetID, IntermediateResult> intermediateDataSets) throws JobException {

		/* Get the list of input JobEdges */
		List<JobEdge> inputs = jobVertex.getInputs();

		if (LOG.isDebugEnabled()) {
			LOG.debug(String.format("Connecting ExecutionJobVertex %s (%s) to %d predecessors.", jobVertex.getID(), jobVertex.getName(), inputs.size()));
		}

		// Iterate over each JobEdge
		for (int num = 0; num < inputs.size(); num++) {
			JobEdge edge = inputs.get(num);

			if (LOG.isDebugEnabled()) {
				if (edge.getSource() == null) {
					LOG.debug(String.format("Connecting input %d of vertex %s (%s) to intermediate result referenced via ID %s.",
							num, jobVertex.getID(), jobVertex.getName(), edge.getSourceId()));
				} else {
					LOG.debug(String.format("Connecting input %d of vertex %s (%s) to intermediate result referenced via predecessor %s (%s).",
							num, jobVertex.getID(), jobVertex.getName(), edge.getSource().getProducer().getID(), edge.getSource().getProducer().getName()));
				}
			}

			// fetch the intermediate result via ID. if it does not exist, then it either has not been created, or the order
			// in which this method is called for the job vertices is not a topological order
			/* Look up the IntermediateResult consumed by this JobEdge via its ID */
			IntermediateResult ires = intermediateDataSets.get(edge.getSourceId());
			if (ires == null) {
				throw new JobException("Cannot connect this job graph to the previous graph. No previous intermediate result found for ID "
						+ edge.getSourceId());
			}

			/* Add the IntermediateResult to this ExecutionJobVertex's inputs */
			this.inputs.add(ires);

			/* Register a consumer on the IntermediateResult, namely the current vertex */
			int consumerIndex = ires.registerConsumer();

			// Wire up one ExecutionVertex per parallel subtask: every subtask must be connected to the upstream intermediate result
			for (int i = 0; i < parallelism; i++) {
				ExecutionVertex ev = taskVertices[i];
				/* Associate the ExecutionVertex with the IntermediateResult */
				ev.connectSource(num, ires, edge, consumerIndex);
			}
		}
	}
 -> connectSource
	public void connectSource(int inputNumber, IntermediateResult source, JobEdge edge, int consumerNumber) {

		// The pattern is POINTWISE only for forward-style (and rescale) connections; everything else is ALL_TO_ALL
		final DistributionPattern pattern = edge.getDistributionPattern();
		final IntermediateResultPartition[] sourcePartitions = source.getPartitions();

		ExecutionEdge[] edges;

		switch (pattern) {
			case POINTWISE:
				edges = connectPointwise(sourcePartitions, inputNumber);
				break;

			case ALL_TO_ALL:
				edges = connectAllToAll(sourcePartitions, inputNumber);
				break;

			default:
				throw new RuntimeException("Unrecognized distribution pattern.");

		}

		inputEdges[inputNumber] = edges;

		// add the consumers to the source
		// for now (until the receiver initiated handshake is in place), we need to register the
		// edges as the execution graph
		/* Add a consumer to each IntermediateResultPartition, i.e. attach it to the ExecutionEdge (a consumer was already registered on the IntermediateResult earlier) */
		for (ExecutionEdge ee : edges) {
			ee.getSource().addConsumer(ee, consumerNumber);
		}
	}
-> connectAllToAll 
	private ExecutionEdge[] connectAllToAll(IntermediateResultPartition[] sourcePartitions, int inputNumber) {
		ExecutionEdge[] edges = new ExecutionEdge[sourcePartitions.length];

		for (int i = 0; i < sourcePartitions.length; i++) {
			IntermediateResultPartition irp = sourcePartitions[i];
			edges[i] = new ExecutionEdge(irp, this, inputNumber);
		}

		return edges;
	}
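
The POINTWISE branch, connectPointwise, is not quoted here. In its simplest case, when the number of source partitions equals the consumer parallelism (a plain forward connection), each consumer subtask connects only to the partition with its own index. The following is a rough sketch of that case only (hypothetical method name, not the verbatim Flink code, which also handles unequal counts):

	// Sketch: POINTWISE wiring when the source partition count equals the consumer parallelism.
	// subTaskIndex is this ExecutionVertex's position among its parallel siblings.
	private ExecutionEdge[] connectPointwiseForwardOnly(IntermediateResultPartition[] sourcePartitions, int inputNumber) {
		IntermediateResultPartition irp = sourcePartitions[subTaskIndex];
		return new ExecutionEdge[] { new ExecutionEdge(irp, this, inputNumber) };
	}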
Before reading this method, note that the inputEdges field of ExecutionVertex is a two-dimensional array: it holds, for each input of the ExecutionVertex, the list of ExecutionEdges for that input. That is, if the ExecutionVertex has two different inputs A and B, where input A has 1 partition and input B has 8 partitions, the two-dimensional array inputEdges looks like this (writing irp for IntermediateResultPartition):
[ ExecutionEdge[ A.irp[0] ] ]
[ ExecutionEdge[ B.irp[0], B.irp[1], ..., B.irp[7] ] ]
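
In code form, the shape of inputEdges for this example would be as follows (hypothetical construction, for illustration only):

	// Illustrative only: the shape of inputEdges for the A/B example above.
	ExecutionEdge[][] inputEdges = new ExecutionEdge[2][];
	inputEdges[0] = new ExecutionEdge[1]; // input A: one edge, A.irp[0] -> this vertex
	inputEdges[1] = new ExecutionEdge[8]; // input B: eight edges, B.irp[0..7] -> this vertex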

At this point, the ExecutionGraph has been fully created.

Origin: blog.csdn.net/m0_46449152/article/details/113790254