Flink: Data Sink

First, the literal meaning of the word "sink":

to sink; to go down; to settle; to fall.

So a Data Sink is, roughly, the place where data lands (is stored).

 

Source (data source) ----> Compute (computation) ----> Sink (landing place)

As shown above, the Source is where the data comes from, and Compute in the middle is the work Flink actually does, which can be any series of operations. Once the data has been computed, the results are sent to a Sink somewhere (for example MySQL, ElasticSearch, Kafka, or Cassandra).

Take the alerting work I am doing at the moment as an example: the result of Compute is sent straight out by the Sink as an alert (a message to a DingTalk group, an e-mail, an SMS, and so on). So a sink does not necessarily mean that the data is stored somewhere.
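The Source, Compute, Sink flow can be sketched in plain Java without Flink. This is only an illustrative model: the class PipelineSketch, its run method, and the use of java.util.function.Consumer as a stand-in for a sink are hypothetical and are not Flink's API. It shows the same kind of pipeline feeding one sink that stores results and another that only sends an alert.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical plain-Java model of Source -> Compute -> Sink; not Flink's API.
class PipelineSketch {

    // Reads every element from the source, applies the computation,
    // and hands each result to the sink.
    static <I, O> void run(List<I> source, Function<I, O> compute, Consumer<O> sink) {
        for (I element : source) {
            sink.accept(compute.apply(element));
        }
    }

    public static void main(String[] args) {
        List<Integer> source = Arrays.asList(10, 95, 40);

        // A sink that stores results (the list stands in for MySQL, Kafka, ...).
        List<String> stored = new ArrayList<>();
        Function<Integer, String> format = cpu -> "cpu=" + cpu;
        Consumer<String> storingSink = stored::add;
        run(source, format, storingSink);
        System.out.println(stored);

        // A sink that stores nothing and only emits an alert for high values.
        Function<Integer, Integer> identity = cpu -> cpu;
        Consumer<Integer> alertingSink = cpu -> {
            if (cpu > 90) {
                System.out.println("ALERT: cpu usage " + cpu + "%");
            }
        };
        run(source, identity, alertingSink);
    }
}
```

Both sinks receive every computed result; what they do with it (store it, forward it, or raise an alert) is entirely up to the sink.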

In fact, the term the official website uses, Connector, describes the destination more aptly. A Connector can be MySQL, ElasticSearch, Kafka, Cassandra, RabbitMQ, and so on.

 

The earlier Data Source article covered which Data Sources Flink supports; here, let's look at which Data Sinks Flink supports:

 

 

 

And what does the source code contain?

You can see Sinks for Kafka, ElasticSearch, Socket, RabbitMQ, JDBC, Cassandra POJO, File, Print, and others.

 

 

As the figure shows, the SinkFunction interface has an invoke method, and it has an abstract implementation class, RichSinkFunction.

The built-in Sinks listed above all extend the RichSinkFunction abstract class and implement its methods, so if we want to define our own Sink, we should follow the same pattern.
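That pattern (extend a rich sink class, implement invoke, and optionally open and close) can be sketched without a Flink dependency. In the minimal sketch below, SimpleSink, SimpleRichSink, CollectingSink, and SinkLifecycleDemo are hypothetical stand-ins that only mirror the shape of SinkFunction and RichSinkFunction. Their signatures are assumptions made for illustration and are simpler than Flink's real ones.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the shape of SinkFunction: invoke is called once per record.
interface SimpleSink<IN> {
    void invoke(IN record);
}

// Hypothetical stand-in for the shape of RichSinkFunction: adds lifecycle hooks.
abstract class SimpleRichSink<IN> implements SimpleSink<IN> {
    public void open() {}   // acquire resources (connections, streams)
    public void close() {}  // release them
}

// A custom sink following the pattern: set up in open, write in invoke, clean up in close.
class CollectingSink extends SimpleRichSink<String> {
    private List<String> buffer;                    // stands in for an external system
    final List<String> events = new ArrayList<>();  // records the call order

    @Override
    public void open() {
        buffer = new ArrayList<>();
        events.add("open");
    }

    @Override
    public void invoke(String record) {
        buffer.add(record);
        events.add("invoke:" + record);
    }

    @Override
    public void close() {
        events.add("close after " + buffer.size() + " records");
        buffer = null;
    }
}

class SinkLifecycleDemo {
    // Drives a sink in the assumed order: open once, invoke per record, close once.
    static <T> void drive(SimpleRichSink<T> sink, List<T> records) {
        sink.open();
        try {
            for (T record : records) {
                sink.invoke(record);
            }
        } finally {
            sink.close();
        }
    }

    public static void main(String[] args) {
        CollectingSink sink = new CollectingSink();
        drive(sink, Arrays.asList("a", "b"));
        System.out.println(sink.events);
    }
}
```

The point of the split is that per-record work stays in invoke, while anything expensive to set up lives in open and is released in close.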

Let's take the source code of the relatively simple PrintSinkFunction as an example:

@PublicEvolving
public class PrintSinkFunction<IN> extends RichSinkFunction<IN> {
	private static final long serialVersionUID = 1L;

	private static final boolean STD_OUT = false;
	private static final boolean STD_ERR = true;

	private boolean target;
	private transient PrintStream stream;
	private transient String prefix;

	/**
	 * Instantiates a print sink function that prints to standard out.
	 */
	public PrintSinkFunction() {}

	/**
	 * Instantiates a print sink function that prints to standard out.
	 *
	 * @param stdErr True, if the format should print to standard error instead of standard out.
	 */
	public PrintSinkFunction(boolean stdErr) {
		target = stdErr;
	}

	public void setTargetToStandardOut() {
		target = STD_OUT;
	}

	public void setTargetToStandardErr() {
		target = STD_ERR;
	}

	@Override
	public void open(Configuration parameters) throws Exception {
		super.open(parameters);
		StreamingRuntimeContext context = (StreamingRuntimeContext) getRuntimeContext();
		// get the target stream
		stream = target == STD_OUT ? System.out : System.err;

		// set the prefix if we have a >1 parallelism
		prefix = (context.getNumberOfParallelSubtasks() > 1) ?
				((context.getIndexOfThisSubtask() + 1) + "> ") : null;
	}

	@Override
	public void invoke(IN record) {
		if (prefix != null) {
			stream.println(prefix + record.toString());
		}
		else {
			stream.println(record.toString());
		}
	}

	@Override
	public void close() {
		this.stream = null;
		this.prefix = null;
	}

	@Override
	public String toString() {
		return "Print to " + (target == STD_OUT ? "System.out" : "System.err");
	}
}

  

As you can see, it extends the RichSinkFunction abstract class and implements the invoke method. Here invoke simply prints the record and does nothing else.

How do you use it?

SingleOutputStreamOperator.addSink(new PrintSinkFunction<>());

  

That is all it takes. For any other Sink Function, substitute the corresponding class.

The effect of using this Function is to print the data coming from the Source, which is the same as calling Source.print() directly.
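To make the printed output concrete: according to the open method in the listing above, each record is prefixed with the 1-based subtask index followed by "> " only when the parallelism is greater than 1. The sketch below reproduces just that formatting rule in plain Java. PrintPrefixSketch and its methods are hypothetical helpers written for illustration, not part of Flink.

```java
// Hypothetical helper that mirrors only the prefix rule of PrintSinkFunction.
class PrintPrefixSketch {

    // Returns "<index + 1>> " when there is more than one parallel subtask, else null.
    static String prefixFor(int subtaskIndex, int parallelism) {
        return parallelism > 1 ? (subtaskIndex + 1) + "> " : null;
    }

    // Formats a record the way the invoke method in the listing would print it.
    static String format(Object record, int subtaskIndex, int parallelism) {
        String prefix = prefixFor(subtaskIndex, parallelism);
        return prefix != null ? prefix + record.toString() : record.toString();
    }

    public static void main(String[] args) {
        System.out.println(format("hello", 0, 1)); // prints "hello"
        System.out.println(format("hello", 2, 4)); // prints "3> hello"
    }
}
```

So with a parallelism of 1 the records appear bare, and with a higher parallelism the prefix tells you which subtask printed each line.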

 

 

 

In the next article we will explain how to define your own Sink Function and walk through a demo, so that you know the pattern and can write the custom Sink Functions your own work requires.

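Finally

This article covered Flink's Data Sink, introduced the common Data Sinks, looked at SinkFunction in the source code, showed how to use one simple Function, and outlined the pattern for defining a custom Sink Function. The next article will walk through writing one.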

 

Original article: http://www.54tianzhisheng.cn/2018/10/29/flink-sink/

 

Origin www.cnblogs.com/Allen-rg/p/11593245.html