Flink split and side output: how stream splitting works

Source code analysis of split and side-output stream splitting

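In day-to-day development we often need to split a data stream so that its parts can be processed separately. Flink offers two ways to do this: split and side output (splitting with filter only fits a narrow set of use cases and is not discussed here). First, note that split and side output cannot both be applied to the same stream; otherwise the following exception is thrown: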

throw new UnsupportedOperationException("getSideOutput() and split() may not be called on the same DataStream. " +
				"As a work-around, please add a no-op map function before the split() call.");
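To make the rule concrete, here is a minimal plain-Java model of the guard. The class OperatorModel is hypothetical: it stands in for SingleOutputStreamOperator and keeps only the two pieces of state the Flink source relies on (a requestedSideOutputs map and a wasSplitApplied flag). It is a sketch, not Flink code; in particular, map() here simply returns a fresh operator, which is the reason the suggested no-op map work-around gets past the check.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for SingleOutputStreamOperator (not Flink code).
// It only tracks the two fields used by the split()/getSideOutput() guard.
public class OperatorModel {
    static final String MSG =
        "getSideOutput() and split() may not be called on the same DataStream. "
        + "As a work-around, please add a no-op map function before the split() call.";

    // tag id -> type name; Flink uses Map<OutputTag<?>, TypeInformation> instead
    private final Map<String, String> requestedSideOutputs = new HashMap<>();
    private boolean wasSplitApplied = false;

    void split() {
        if (!requestedSideOutputs.isEmpty()) {
            throw new UnsupportedOperationException(MSG);
        }
        wasSplitApplied = true; // later getSideOutput() calls are now rejected
    }

    void getSideOutput(String tagId, String typeName) {
        if (wasSplitApplied) {
            throw new UnsupportedOperationException(MSG);
        }
        requestedSideOutputs.put(tagId, typeName); // later split() calls are now rejected
    }

    // A no-op map yields a new downstream operator with its own, fresh guard state.
    OperatorModel map() {
        return new OperatorModel();
    }

    public static void main(String[] args) {
        OperatorModel op = new OperatorModel();
        op.getSideOutput("late-data", "String");
        try {
            op.split();
        } catch (UnsupportedOperationException e) {
            System.out.println("split() on the same stream rejected");
        }
        op.map().split(); // the work-around: split the output of the no-op map instead
        System.out.println("split() after a no-op map accepted");
    }
}
```

The two flags are checked symmetrically: whichever method is called first locks out the other on that operator only, so any operator inserted in between starts from a clean state.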

Source code analysis: split

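	@Override
	public SplitStream<T> split(OutputSelector<T> outputSelector) {
		// split() is only allowed if no side output has been requested on this stream
		if (requestedSideOutputs.isEmpty()) {
			// remember that split was used, so a later getSideOutput() call is rejected
			wasSplitApplied = true;
			return super.split(outputSelector);
		} else {
			// side outputs were already requested: split and side output cannot be mixed
			throw new UnsupportedOperationException("getSideOutput() and split() may not be called on the same DataStream. " +
				"As a work-around, please add a no-op map function before the split() call.");
		}
	}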

When split is called on a SingleOutputStreamOperator, it first checks requestedSideOutputs, which is a Map<OutputTag<?>, TypeInformation>. If no side output has been requested on this stream, it sets wasSplitApplied = true and then delegates to the split method of DataStream, the parent class of SingleOutputStreamOperator:

	public SplitStream<T> split(OutputSelector<T> outputSelector) {
		// wrap this stream and the closure-cleaned selector in a SplitStream
		return new SplitStream<>(this, clean(outputSelector));
	}

This returns a SplitStream. SplitStream extends DataStream, and its selectOutput method (which backs the public select(...) method) turns the SplitStream back into an ordinary DataStream.
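The routing idea behind split/select can be sketched in memory: the OutputSelector maps each element to one or more output names, and select(...) keeps the elements tagged with any of the requested names. The class SplitSelectModel and its Selector interface below are hypothetical; Selector only mirrors the shape of Flink's OutputSelector (element in, Iterable<String> of names out). The real SplitStream is lazy and runs inside a streaming job, so this is an illustration of the semantics, not of the implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Simplified, in-memory model of split/select semantics (not Flink code).
public class SplitSelectModel {

    // Mirrors the shape of Flink's OutputSelector: element -> output names.
    interface Selector<T> extends Function<T, Iterable<String>> {}

    // Returns the elements routed to at least one of the requested names.
    static <T> List<T> select(List<T> input, Selector<T> selector, String... names) {
        List<String> wanted = Arrays.asList(names);
        List<T> out = new ArrayList<>();
        for (T value : input) {
            for (String name : selector.apply(value)) {
                if (wanted.contains(name)) {
                    out.add(value);
                    break; // this model emits each element at most once per select()
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5, 6);
        Selector<Integer> parity = v -> Arrays.asList(v % 2 == 0 ? "even" : "odd");
        System.out.println(select(data, parity, "even"));        // [2, 4, 6]
        System.out.println(select(data, parity, "odd"));         // [1, 3, 5]
        System.out.println(select(data, parity, "even", "odd")); // [1, 2, 3, 4, 5, 6]
    }
}
```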

  • Note: split cannot be chained. A DataStream obtained from a split cannot be split again; doing so throws the following exception:
Exception in thread "main" java.lang.IllegalStateException: Consecutive multiple splits are not supported. Splits are deprecated. Please use side-outputs.
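The restriction can be pictured as a check on the direct input of a split. The sketch below uses a hypothetical SplitValidationModel with a tiny Node chain (these are not Flink's transformation classes): a split whose input is already a split, or a select taken from one, is rejected with the same message. In this model the check runs as soon as split is called; the real check may fire at a different point (for example while the job graph is being built), which the model does not try to reproduce.

```java
// Hypothetical, simplified transformation chain (not Flink's classes) showing
// why a split cannot be applied directly on top of a split/select.
public class SplitValidationModel {

    static final class Node {
        final String kind; // "source", "split" or "select" in this model
        final Node input;

        Node(String kind, Node input) {
            this.kind = kind;
            this.input = input;
        }
    }

    // Rejects a split whose direct input is another split or a select of one.
    static Node split(Node input) {
        if (input.kind.equals("split") || input.kind.equals("select")) {
            throw new IllegalStateException("Consecutive multiple splits are not supported. "
                + "Splits are deprecated. Please use side-outputs.");
        }
        return new Node("split", input);
    }

    static Node select(Node split) {
        return new Node("select", split);
    }

    public static void main(String[] args) {
        Node source = new Node("source", null);
        Node evens = select(split(source)); // first split: accepted
        try {
            split(evens); // second split on a split result: rejected
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```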

Source code analysis: side output (recommended)

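To split a stream more than once (for example, to split one of the resulting streams again), side output must be used. SingleOutputStreamOperator provides the getSideOutput method for this; its source is: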

	public <X> DataStream<X> getSideOutput(OutputTag<X> sideOutputTag) {
		// getSideOutput() is rejected once split() has been applied to this stream
		if (wasSplitApplied) {
			throw new UnsupportedOperationException("getSideOutput() and split() may not be called on the same DataStream. " +
				"As a work-around, please add a no-op map function before the split() call.");
		}

		sideOutputTag = clean(requireNonNull(sideOutputTag));

		// make a defensive copy
		sideOutputTag = new OutputTag<X>(sideOutputTag.getId(), sideOutputTag.getTypeInfo());

		// the same id may be requested again, but only with the same type
		TypeInformation<?> type = requestedSideOutputs.get(sideOutputTag);
		if (type != null && !type.equals(sideOutputTag.getTypeInfo())) {
			throw new UnsupportedOperationException("A side output with a matching id was " +
					"already requested with a different type. This is not allowed, side output " +
					"ids need to be unique.");
		}

		// register the tag; this also blocks any later split() on this stream
		requestedSideOutputs.put(sideOutputTag, sideOutputTag.getTypeInfo());

		// the side output becomes a new DataStream backed by a SideOutputTransformation
		SideOutputTransformation<X> sideOutputTransformation = new SideOutputTransformation<>(this.getTransformation(), sideOutputTag);
		return new DataStream<>(this.getExecutionEnvironment(), sideOutputTransformation);
	}

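The method takes an OutputTag. It first checks whether split has already been applied to the stream, then checks that the tag's id is unique: requesting an id that was already requested with a different type throws an exception, while requesting the same id with the same type is accepted. Finally, it registers the tag and returns a new DataStream.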
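The id/type check can be reproduced with a small registry. The class SideOutputRegistryModel and its Tag type are hypothetical (Tag carries a type name where Flink's OutputTag carries TypeInformation). The model assumes that OutputTag equality is based on the id alone, which is what lets the lookup in the source above find an entry that was registered with a different type; here that is made explicit by keying the map on the id.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Simplified model of the id/type check in getSideOutput() (not Flink code).
public class SideOutputRegistryModel {

    // Hypothetical tag: an id plus a type name.
    static final class Tag {
        final String id;
        final String typeName;

        Tag(String id, String typeName) {
            this.id = Objects.requireNonNull(id);
            this.typeName = Objects.requireNonNull(typeName);
        }
    }

    // tag id -> type name registered for that id (the map is keyed by id only)
    private final Map<String, String> requestedSideOutputs = new HashMap<>();

    void getSideOutput(Tag tag) {
        Objects.requireNonNull(tag);
        String type = requestedSideOutputs.get(tag.id);
        if (type != null && !type.equals(tag.typeName)) {
            throw new UnsupportedOperationException("A side output with a matching id was "
                + "already requested with a different type. This is not allowed, side output "
                + "ids need to be unique.");
        }
        requestedSideOutputs.put(tag.id, tag.typeName);
    }

    public static void main(String[] args) {
        SideOutputRegistryModel op = new SideOutputRegistryModel();
        op.getSideOutput(new Tag("late", "String"));
        op.getSideOutput(new Tag("late", "String"));   // same id, same type: accepted
        op.getSideOutput(new Tag("errors", "String")); // new id: accepted
        try {
            op.getSideOutput(new Tag("late", "Long")); // same id, different type
        } catch (UnsupportedOperationException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```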


Reposted from blog.csdn.net/weixin_41197407/article/details/114022197