Flink Sink in Action, Part 1: A First Look

About sinks

The diagram below comes from the official Flink documentation; the red box marks the sink. Real-time data enters at the Source, passes through the business logic in the Transformation stage, and ends at the Sink. The sink is therefore where the computation results are handled, for example by printing them to the console or saving them to a database:
Insert picture description here

About the "Flink Sink in Action" series

This article is the first part of the "Flink Sink in Action" series and aims to build a preliminary understanding of sinks. By studying the basic API and the addSink method, it lays the groundwork for the hands-on coding in later parts.

Starting from sample code

  1. The following is a simple Flink application; the print call in the red box is the sink operation:
    Insert picture description here
  2. The figure below shows the sink methods listed in the official documentation. They belong to the DataStream class and can be called directly to sink data; the print call in the code above is one of them:
    Insert picture description here
  3. Next, look at the source code behind the APIs in the figure above, starting with print. In DataStream.java, print actually calls addSink, passing a PrintSinkFunction as the argument:
    Insert picture description here
  4. Another commonly used API is writeAsText. Its source code, shown below, calls writeUsingOutputFormat:
    Insert picture description here
  5. Tracing into writeUsingOutputFormat shows that it also calls addSink, this time passing an OutputFormatSinkFunction:
    Insert picture description here
  6. So addSink sits behind both print and writeAsText. What about writeAsCsv, another commonly used method? Does it call addSink too? The source confirms it: like writeAsText, it calls writeUsingOutputFormat, which in turn calls addSink:
    Insert picture description here
  7. In summary, the key to sinking data is the argument passed to addSink, that is, an implementation of the SinkFunction interface. The class diagram below shows at a glance how the common sink capabilities are implemented:
    Insert picture description here
  8. The diagram above shows that the abstract class RichSinkFunction is closely tied to the various sink capabilities, so it deserves a closer look. The class diagram below adds the method signatures:
    Insert picture description here
  9. As the diagram above shows, RichSinkFunction has no logic of its own: it extends AbstractRichFunction and implements SinkFunction, so it is a combination of RichFunction and SinkFunction.
  10. RichFunction was covered in the earlier "Flink DataSource Trilogy": its role is to open and close resources.
  11. What does SinkFunction contribute? It is the part that processes the computation results. The class diagram shows two invoke methods; take a look at the official PrintSinkFunction.java:
    Insert picture description here
  12. The source code of writer.write(record) is in PrintSinkOutputWriter.java, as shown below:
    Insert picture description here
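
The call chain traced in steps 3 to 6 can be sketched in plain Java. This is a simplified model under stated assumptions, not Flink's source code: MiniStream, the trimmed-down SinkFunction and OutputFormat interfaces, and the in-memory console and file lists are all stand-ins defined here so that the sketch runs without a Flink dependency. Unlike a real DataStream, which only registers the sink and processes records once the job executes, the model pushes its records eagerly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of the delegation chain. Every type below is a simplified
// stand-in defined for illustration; none of them is Flink's real class.
class SinkDelegationSketch {

    /** Stand-in for SinkFunction: called once per record. */
    interface SinkFunction<T> {
        void invoke(T value);
    }

    /** Stand-in for OutputFormat: knows how to write one record. */
    interface OutputFormat<T> {
        void writeRecord(T record);
    }

    /** Stand-in for OutputFormatSinkFunction: adapts an OutputFormat to a SinkFunction. */
    static class OutputFormatSinkFunction<T> implements SinkFunction<T> {
        private final OutputFormat<T> format;

        OutputFormatSinkFunction(OutputFormat<T> format) {
            this.format = format;
        }

        @Override
        public void invoke(T value) {
            format.writeRecord(value);
        }
    }

    /** Stand-in for DataStream. Assumption: records are pushed eagerly. */
    static class MiniStream<T> {
        private final List<T> records;
        final List<String> calls = new ArrayList<>(); // trace of the methods that ran

        MiniStream(List<T> records) {
            this.records = records;
        }

        // Models print(): goes straight to addSink with a printing sink.
        void print(List<String> console) {
            calls.add("print");
            addSink(value -> console.add(String.valueOf(value)));
        }

        // Models writeAsText(): goes through writeUsingOutputFormat first.
        void writeAsText(List<String> file) {
            calls.add("writeAsText");
            writeUsingOutputFormat(record -> file.add(String.valueOf(record)));
        }

        void writeUsingOutputFormat(OutputFormat<T> format) {
            calls.add("writeUsingOutputFormat");
            addSink(new OutputFormatSinkFunction<>(format));
        }

        // The single place where records actually reach a sink.
        void addSink(SinkFunction<T> sink) {
            calls.add("addSink");
            for (T record : records) {
                sink.invoke(record);
            }
        }
    }

    public static void main(String[] args) {
        MiniStream<Integer> stream = new MiniStream<>(Arrays.asList(1, 2, 3));
        List<String> console = new ArrayList<>();
        List<String> file = new ArrayList<>();
        stream.print(console);
        stream.writeAsText(file);
        System.out.println(stream.calls);
        System.out.println(console + " " + file);
    }
}
```

Running main prints the call trace: both print and writeAsText end in addSink, and only the SinkFunction they hand over differs.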

Summary

So far, we have a basic understanding of Flink's sink:

  1. A sink handles the results of real-time computation, for example by outputting or persisting them;
  2. The main way to use a sink is to call the DataStream.addSink method;
  3. The main way to implement a sink capability is to implement the interface that addSink takes as its parameter, namely SinkFunction;
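
To make points 2 and 3 concrete, here is a minimal plain-Java sketch of how a rich sink is driven: open once, invoke once per record, close once. The interfaces are simplified stand-ins rather than Flink's real ones (for instance, this open takes no arguments), and LoggingSink and runSink are hypothetical names introduced only for illustration; the real runtime is considerably more involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of the rich-sink lifecycle. Every type below is a
// simplified stand-in defined for illustration, not Flink's real class.
class RichSinkLifecycleSketch {

    /** Stand-in for SinkFunction: handles one record. */
    interface SinkFunction<T> {
        void invoke(T value);
    }

    /** Stand-in for RichFunction: resource setup and teardown. */
    interface RichFunction {
        void open();
        void close();
    }

    /** Stand-in for RichSinkFunction: no logic of its own, just both contracts combined. */
    abstract static class RichSinkFunction<T> implements RichFunction, SinkFunction<T> {
        @Override
        public void open() { }

        @Override
        public void close() { }
    }

    /** Hypothetical custom sink: "connects" in open, writes in invoke, "disconnects" in close. */
    static class LoggingSink extends RichSinkFunction<String> {
        final List<String> events = new ArrayList<>();
        private boolean connected;

        @Override
        public void open() {
            connected = true;
            events.add("open");
        }

        @Override
        public void invoke(String value) {
            if (!connected) {
                throw new IllegalStateException("invoke called before open");
            }
            events.add("write:" + value);
        }

        @Override
        public void close() {
            connected = false;
            events.add("close");
        }
    }

    /** Hypothetical driver modelling what a runtime does for one sink instance. */
    static <T> void runSink(RichSinkFunction<T> sink, List<T> records) {
        sink.open();
        try {
            for (T record : records) {
                sink.invoke(record);
            }
        } finally {
            sink.close(); // resources are released even if a write fails
        }
    }

    public static void main(String[] args) {
        LoggingSink sink = new LoggingSink();
        runSink(sink, Arrays.asList("a", "b"));
        System.out.println(sink.events);
    }
}
```

In an actual Flink job, the corresponding step would be to extend RichSinkFunction and pass an instance of the subclass to DataStream.addSink.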

The following articles move on to hands-on sink coding in two directions: trying out the sinks that Flink provides officially, and implementing custom sinks.

You are welcome to follow my WeChat official account: programmer Xinchen


Origin blog.csdn.net/boling_cavalry/article/details/105597628