Flink job submission: StreamGraph construction source code analysis

Table of contents

Graph Overview

1. Graph conversion in stream computing applications

2. Graph conversion in batch processing applications

3. Graph conversion in the Table & SQL API

StreamGraph

1. StreamNode

2. StreamEdge

3. StreamGraph generation source code analysis


Graph Overview

The correspondence between the different Flink APIs and the different levels of Graph is as follows.

1. Graph conversion in stream computing applications

For a stream computing application, the DataStream API calls are first converted into Transformations, which then go through the three-layer conversion StreamGraph -> JobGraph -> ExecutionGraph (all data structures built into Flink). Finally, Flink schedules and starts the computing tasks in the cluster, and the running tasks form the physical execution graph, i.e. the topology of the physically executing tasks. The physical execution graph is a runtime concept only; Flink has no corresponding Graph data structure for it.
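To make the starting point concrete, here is a minimal stream computing sketch (a hedged example; the class name, job name, and data are illustrative). Each API call below only records a Transformation; nothing is translated until execute() is called, which is where the StreamGraph -> JobGraph -> ExecutionGraph chain begins:

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class StreamGraphDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements("flink", "stream", "graph") // records a source transformation
           .map(String::toUpperCase)                 // records a OneInputTransformation
           .print();                                 // records a sink transformation

        // Only here is the StreamGraph built and the job submitted.
        env.execute("stream-graph-demo");
    }
}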

2. Graph conversion for batch processing applications

For a batch processing application, the DataSet API is first converted into an OptimizedPlan, which is then converted into a JobGraph. Batch processing and stream computing are thus unified at the JobGraph level.
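For comparison, a minimal DataSet sketch (a hedged example of this soon-to-be-removed batch path; the class name is illustrative). On the DataSet side, print() triggers plan optimization and execution:

import org.apache.flink.api.java.ExecutionEnvironment;

public class BatchGraphDemo {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // The DataSet program is optimized into an OptimizedPlan and then
        // converted into a JobGraph when execution is triggered.
        env.fromElements(1, 2, 3)
           .map(x -> x * 2)
           .print();
    }
}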

3. Graph conversion in the Table & SQL API

The Table & SQL API is a high-level API. During development it does not distinguish between batch processing and stream computing, and there is essentially no syntactic difference between the two. (Currently the DataStream API can likewise be used to write both batch and stream processing tasks in a unified way.) The Table & SQL module contains two planners, the new Blink Table Planner and the old Flink Table Planner; the Flink Table Planner will be gradually deprecated and eventually removed from Flink.

In the Blink Table Planner, both batch processing and stream computing are built on the streaming runtime, so the Graph conversion process is the same for batch and stream applications. In the old Flink Table Planner, stream computing relied on the DataStream API, so its Graph conversion followed the stream computing process described above, while batch processing relied on the DataSet API, so its conversion followed the batch processing process.
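As a hedged sketch for the Flink versions in which both planners coexist (roughly 1.9 through 1.13), the planner is chosen when the TableEnvironment is created:

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class PlannerSelectionDemo {
    public static void main(String[] args) {
        // Blink planner: batch and streaming both map onto the streaming runtime.
        EnvironmentSettings settings = EnvironmentSettings.newInstance()
                .useBlinkPlanner()
                .inStreamingMode()
                .build();
        TableEnvironment tEnv = TableEnvironment.create(settings);

        // The old planner would be selected with .useOldPlanner() instead.
    }
}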

Since the DataSet API will also be deprecated and eventually removed from Flink, its Graph conversion process will disappear with it. The following takes the stream computing conversion process as the example.

StreamGraph

A StreamGraph consists of StreamNodes connected by StreamEdges.

1. StreamNode

A StreamNode is a node in the StreamGraph. It is converted from a Transformation, and can roughly be understood as one StreamNode per operator. Logically, a StreamGraph contains both concrete and virtual StreamNodes. A StreamNode can have multiple inputs and multiple outputs.
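The sketch below lists the kind of information a StreamNode carries. It is illustrative only, not the actual Flink class (see org.apache.flink.streaming.api.graph.StreamNode for the real definition):

import java.util.List;

import org.apache.flink.streaming.api.graph.StreamEdge;
import org.apache.flink.streaming.api.operators.StreamOperatorFactory;

// Illustrative sketch only, not the real Flink class.
class StreamNodeSketch {
    int id;                                   // derived from the Transformation id
    int parallelism;
    int maxParallelism;
    StreamOperatorFactory<?> operatorFactory; // the operator this node will run
    List<StreamEdge> inEdges;                 // a node can have multiple inputs
    List<StreamEdge> outEdges;                // and multiple outputs
}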

2. StreamEdge

A StreamEdge is an edge in the StreamGraph that connects two StreamNodes. A StreamNode can have multiple outgoing and incoming edges. A StreamEdge carries information such as side outputs (bypass output), the partitioner, and field-filtering output (the same logic as selecting fields in a SQL SELECT).
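Correspondingly, a hedged sketch of what a StreamEdge records (again illustrative; see org.apache.flink.streaming.api.graph.StreamEdge for the real definition):

import org.apache.flink.streaming.runtime.partitioner.StreamPartitioner;
import org.apache.flink.util.OutputTag;

// Illustrative sketch only, not the real Flink class.
class StreamEdgeSketch {
    int sourceId;                     // id of the upstream StreamNode
    int targetId;                     // id of the downstream StreamNode
    StreamPartitioner<?> partitioner; // how records are distributed downstream
    OutputTag<?> outputTag;           // side-output (bypass) information, if any
}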

3. StreamGraph generation source code analysis

The entry point for generating the StreamGraph is in StreamExecutionEnvironment:

public JobExecutionResult execute(String jobName) throws Exception {
    Preconditions.checkNotNull(jobName, "Streaming Job name should not be null.");

    return execute(getStreamGraph(jobName));
}

Enter the getStreamGraph() method

@Internal
public StreamGraph getStreamGraph(String jobName, boolean clearTransformations) {
    StreamGraph streamGraph = getStreamGraphGenerator().setJobName(jobName).generate();
    if (clearTransformations) {
        this.transformations.clear();
    }
    return streamGraph;
}

The StreamGraph is generated in StreamGraphGenerator, which traces backwards from the SinkTransformations (the outputs) to the SourceTransformations and constructs the StreamGraph as it traverses.

Enter the generate() method

public StreamGraph generate() {
    streamGraph = new StreamGraph(executionConfig, checkpointConfig, savepointRestoreSettings);
    ...
    // the actual generation of the StreamGraph
    for (Transformation<?> transformation : transformations) {
        transform(transformation);
    }

    ...
    final StreamGraph builtStreamGraph = streamGraph;

    ...

    return builtStreamGraph;
}

Inside the transform(transformation) method, a different translate path is taken depending on whether a translator exists for the Transformation:

if (translator != null) {
    transformedIds = translate(translator, transform);
} else {
    transformedIds = legacyTransform(transform);
}

Enter the translate(translator, transform) method, which dispatches to a different method depending on whether the job executes in batch or streaming mode:

 return shouldExecuteInBatchMode
        ? translator.translateForBatch(transform, context)
        : translator.translateForStreaming(transform, context);

Enter translateForStreaming(transform, context); a different translateForStreaming(transform, context) implementation is invoked depending on the concrete Transformation.

Here we analyze OneInputTransformation.java; ultimately translateInternal() in AbstractOneInputTransformationTranslator is called to build the StreamGraph.

• First, add the operator to the StreamGraph:

streamGraph.addOperator(
        transformationId,
        slotSharingGroup,
        transformation.getCoLocationGroupKey(),
        operatorFactory,
        inputType,
        transformation.getOutputType(),
        transformation.getName());

• Set the StateKeySelector:

if (stateKeySelector != null) {
    TypeSerializer<?> keySerializer = stateKeyType.createSerializer(executionConfig);
    streamGraph.setOneInputStateKey(transformationId, stateKeySelector, keySerializer);
}

• Set the parallelism and maximum parallelism:

streamGraph.setParallelism(transformationId, parallelism);
streamGraph.setMaxParallelism(transformationId, transformation.getMaxParallelism());

• Construct the StreamEdges and connect the upstream and downstream StreamNodes:

for (Integer inputId : context.getStreamNodeIds(parentTransformations.get(0))) {
    streamGraph.addEdge(inputId, transformationId, 0);
}
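To put these four steps in context, the hedged sketch below annotates which Transformation each API call produces: the map() call yields the OneInputTransformation translated above, while the keyBy() in between yields a PartitionTransformation, which is handled differently, as shown next.

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

env.fromElements("a", "b", "a")  // source transformation
   .keyBy(s -> s)                // PartitionTransformation: becomes a virtual node, no StreamNode
   .map(String::toUpperCase)     // OneInputTransformation: goes through translateInternal() above
   .print();                     // sink transformation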

Next, look at the translateInternal() method in PartitionTransformationTranslator.java:

private Collection<Integer> translateInternal(
        final PartitionTransformation<OUT> transformation, final Context context) {
    ...

    final StreamGraph streamGraph = context.getStreamGraph();

    ...
    List<Integer> resultIds = new ArrayList<>();

    for (Integer inputId : context.getStreamNodeIds(input)) {
        final int virtualId = Transformation.getNewNodeId();
        // add a virtual partition node; no StreamNode is generated
        streamGraph.addVirtualPartitionNode(
                inputId,
                virtualId,
                transformation.getPartitioner(),
                transformation.getShuffleMode());
        resultIds.add(virtualId);
    }
    return resultIds;
}

As the code above shows, translating a PartitionTransformation does not generate a concrete StreamNode or StreamEdge; instead, a virtual node is added via the streamGraph.addVirtualPartitionNode() method. When the downstream Transformation of the data partitioning adds its StreamEdge (by calling streamGraph.addEdge()), the Partitioner is encapsulated into the StreamEdge, as shown in the following code:

private void addEdgeInternal(
        Integer upStreamVertexID,
        Integer downStreamVertexID,
        int typeNumber,
        StreamPartitioner<?> partitioner,
        List<String> outputNames,
        OutputTag outputTag,
        ShuffleMode shuffleMode) {
    // if the upstream node is a virtual side-output node, recurse and pass the side-output information down
    if (virtualSideOutputNodes.containsKey(upStreamVertexID)) {
        int virtualId = upStreamVertexID;
        upStreamVertexID = virtualSideOutputNodes.get(virtualId).f0;
        if (outputTag == null) {
            outputTag = virtualSideOutputNodes.get(virtualId).f1;
        }
        addEdgeInternal(
                upStreamVertexID,
                downStreamVertexID,
                typeNumber,
                partitioner,
                null,
                outputTag,
                shuffleMode);
    // if the upstream node is a virtual partition node, recurse and pass the partition information down
    } else if (virtualPartitionNodes.containsKey(upStreamVertexID)) {
        int virtualId = upStreamVertexID;
        upStreamVertexID = virtualPartitionNodes.get(virtualId).f0;
        if (partitioner == null) {
            partitioner = virtualPartitionNodes.get(virtualId).f1;
        }
        shuffleMode = virtualPartitionNodes.get(virtualId).f2;
        addEdgeInternal(
                upStreamVertexID,
                downStreamVertexID,
                typeNumber,
                partitioner,
                outputNames,
                outputTag,
                shuffleMode);
    } else {
        // neither of the above: build the actual StreamEdge
        StreamNode upstreamNode = getStreamNode(upStreamVertexID);
        StreamNode downstreamNode = getStreamNode(downStreamVertexID);
        // if no partitioner was specified, choose forward or rebalance partitioning
        if (partitioner == null
                && upstreamNode.getParallelism() == downstreamNode.getParallelism()) {
            partitioner = new ForwardPartitioner<Object>();
        } else if (partitioner == null) {
            partitioner = new RebalancePartitioner<Object>();
        }

        ...

        if (shuffleMode == null) {
            shuffleMode = ShuffleMode.UNDEFINED;
        }

        // create the StreamEdge
        StreamEdge edge =
                new StreamEdge(
                        upstreamNode,
                        downstreamNode,
                        typeNumber,
                        partitioner,
                        outputTag,
                        shuffleMode);
        // add the StreamEdge to the upstream node's out-edges and the downstream node's in-edges
        getStreamNode(edge.getSourceId()).addOutEdge(edge);
        getStreamNode(edge.getTargetId()).addInEdge(edge);
    }
}
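The forward/rebalance fallback can be illustrated with a small hedged sketch (parallelisms are set explicitly so the chosen partitioner is predictable):

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

DataStream<Integer> src = env.fromElements(1, 2, 3); // non-parallel source, parallelism 1

// no partitioner specified and parallelisms match (1 -> 1): ForwardPartitioner
src.map(x -> x + 1).setParallelism(1)
   .print().setParallelism(1);

// no partitioner specified and parallelisms differ (1 -> 4): RebalancePartitioner
src.map(x -> x * 2).setParallelism(4)
   .print();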