About sink
The following diagram comes from the official Flink documentation; the red box marks the sink. Real-time data enters at the Source, passes through the business logic in the Transformation stage, and ends at the Sink. The sink is therefore where the computed results are handled, for example by printing them to the console or saving them to a database:
The "Flink Sink in Practice" series
This article is the first part of the "Flink Sink in Practice" series and aims to build a preliminary understanding of sinks. By examining the basic API and the addSink method, it lays the groundwork for the hands-on coding in later articles.
Start with sample code
- The following is a simple Flink application; the print method in the red box is the sink operation:
- The following figure shows the officially provided sink methods, which belong to the DataStream class and can be called directly to attach a sink. The print method used in the code above is one of them:
- Next, look at the source code behind the API in the figure above, starting with print. In DataStream.java, print actually calls addSink, passing a PrintSinkFunction as the argument:
- Another commonly used API is writeAsText. Its source code, shown below, calls writeUsingOutputFormat:
- Tracing writeUsingOutputFormat shows that it also calls addSink, this time passing an OutputFormatSinkFunction:
- Both print and writeAsText end up in addSink, so what about writeAsCsv, another commonly used method? Opening its source confirms the pattern: like writeAsText, it calls writeUsingOutputFormat, which in turn calls addSink:
- In summary, the key to a sink is the argument passed to addSink, that is, an implementation of the SinkFunction interface. The class diagram shows at a glance how the common sink capabilities are implemented:
- The figure above shows that the abstract class RichSinkFunction underlies many of these sink capabilities, so it deserves a closer look. The class diagram below adds the method signatures:
- As the figure above shows, RichSinkFunction adds no methods of its own; it extends AbstractRichFunction and implements SinkFunction, so it combines the features of RichFunction and SinkFunction.
- The defining feature of RichFunction was covered in the earlier "Flink DataSource Trilogy": it provides the open and close methods for acquiring and releasing resources.
- What about SinkFunction? It is the part that handles the computed results, and the class diagram shows two invoke methods for this. Take a look at the official PrintSinkFunction.java:
- The source code of writer.write(record) is in PrintSinkOutputWriter.java, as shown below:
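The call chain traced above (print, writeAsText and writeAsCsv all ending in addSink) can be modelled in a few lines of plain Java. This is only a sketch under stated assumptions, not Flink's source code: MiniStream, MiniSinkFunction, MiniOutputFormat, MiniPrintSink and MiniOutputFormatSink are hypothetical stand-ins invented for illustration, the toy stream runs eagerly instead of building a job graph, and results are collected in a list rather than written to the console or to files.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for Flink's SinkFunction: one callback per element.
interface MiniSinkFunction<T> {
    void invoke(T value);
}

// Hypothetical stand-in for an output format that writes one element at a time.
interface MiniOutputFormat<T> {
    void writeRecord(T value);
}

// Plays the role of PrintSinkFunction; appends to a list instead of stdout
// so the result can be inspected afterwards.
final class MiniPrintSink<T> implements MiniSinkFunction<T> {
    private final List<String> out;

    MiniPrintSink(List<String> out) {
        this.out = out;
    }

    @Override
    public void invoke(T value) {
        out.add("print> " + value);
    }
}

// Plays the role of OutputFormatSinkFunction: adapts a format to a sink.
final class MiniOutputFormatSink<T> implements MiniSinkFunction<T> {
    private final MiniOutputFormat<T> format;

    MiniOutputFormatSink(MiniOutputFormat<T> format) {
        this.format = format;
    }

    @Override
    public void invoke(T value) {
        format.writeRecord(value);
    }
}

// Toy stream (assumption: it processes its elements immediately).
final class MiniStream<T> {
    private final List<T> elements;
    final List<String> output = new ArrayList<>();
    int addSinkCalls = 0;

    MiniStream(List<T> elements) {
        this.elements = elements;
    }

    // The single entry point that every convenience method funnels into.
    void addSink(MiniSinkFunction<T> sink) {
        addSinkCalls++;
        for (T element : elements) {
            sink.invoke(element);
        }
    }

    void print() {
        addSink(new MiniPrintSink<>(output));
    }

    void writeUsingOutputFormat(MiniOutputFormat<T> format) {
        addSink(new MiniOutputFormatSink<>(format));
    }

    void writeAsText() {
        writeUsingOutputFormat(value -> output.add("text> " + value));
    }

    void writeAsCsv() {
        writeUsingOutputFormat(value -> output.add("csv> " + value));
    }
}

final class SinkDelegationSketch {
    static MiniStream<String> run() {
        MiniStream<String> stream = new MiniStream<>(Arrays.asList("a", "b"));
        stream.print();
        stream.writeAsText();
        stream.writeAsCsv();
        return stream;
    }

    public static void main(String[] args) {
        MiniStream<String> stream = run();
        stream.output.forEach(System.out::println);
        System.out.println("addSink calls: " + stream.addSinkCalls);
    }
}
```

Each of the three convenience methods increments addSinkCalls exactly once, which mirrors the observation that the only thing distinguishing one sink from another is the function object handed to addSink.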
Summary
So far, we have a basic understanding of Flink's sink:
- A sink is responsible for handling real-time calculation results (for example, output or persistence).
- A sink is mainly attached by calling the DataStream.addSink method.
- Sink capabilities are mainly implemented by implementing SinkFunction, the interface that addSink takes as its parameter.
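To make the last point concrete, here is a minimal plain-Java sketch of what "implementing the interface" looks like for a rich sink that combines open/close resource handling with per-record invoke. It rests on several assumptions: SketchSinkFunction, SketchRichFunction, SketchRichSink and CollectingSink are hypothetical, simplified stand-ins rather than Flink classes, their methods take no configuration or context arguments, and the drive method only imitates the order in which a runtime would call the lifecycle methods.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the sink side: handle one result at a time.
interface SketchSinkFunction<T> {
    void invoke(T value);
}

// Hypothetical stand-in for the rich side: resource open and close hooks.
abstract class SketchRichFunction {
    void open() {}

    void close() {}
}

// Adds nothing of its own; it only combines the two roles above.
abstract class SketchRichSink<T> extends SketchRichFunction
        implements SketchSinkFunction<T> {
}

// A custom sink that records every lifecycle event it sees.
final class CollectingSink extends SketchRichSink<String> {
    final List<String> events = new ArrayList<>();
    private boolean opened = false;

    @Override
    void open() {
        opened = true; // a real sink might open a connection here
        events.add("open");
    }

    @Override
    public void invoke(String value) {
        if (!opened) {
            throw new IllegalStateException("invoke called before open");
        }
        events.add("invoke:" + value);
    }

    @Override
    void close() {
        opened = false; // a real sink might release the connection here
        events.add("close");
    }
}

final class RichSinkLifecycleSketch {
    // Imitates the call order: open once, invoke per record, close once.
    static <T> void drive(SketchRichSink<T> sink, List<T> records) {
        sink.open();
        try {
            for (T item : records) {
                sink.invoke(item);
            }
        } finally {
            sink.close();
        }
    }

    static List<String> run() {
        CollectingSink sink = new CollectingSink();
        drive(sink, Arrays.asList("x", "y"));
        return sink.events;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

Keeping open and close separate from invoke means an expensive resource is set up once per sink instance instead of once per record, which is the reason the rich variant is the usual base for sinks that talk to external systems.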
The following articles move on to hands-on sink coding in two directions: trying out the officially provided sink capabilities and implementing custom sinks.