How Flink uses late data to update the result of a previously computed window

Recently a friend asked me the following question. A Flink window computation is configured to allow late data, and an element belonging to an earlier window arrives after that window's result has already been written to MySQL. How can the stored result be corrected, rather than writing an additional record? This article shows how to use Flink's window functions to update a result that was computed from incomplete data and is therefore inaccurate.
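
On the MySQL side, one way to correct the stored row instead of adding a new one, assuming the table has a primary key on the grouping key plus the window end, is to write with an upsert (in MySQL, INSERT ... ON DUPLICATE KEY UPDATE), so that an updated firing of the same window overwrites the earlier row. The minimal Java sketch below only simulates the two sink behaviors with in-memory collections; the class name, the row-key format and the values are hypothetical, and it is not Flink or JDBC code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a result table with PRIMARY KEY (key, window_end).
// It is not Flink or JDBC code; it only contrasts append and upsert semantics.
public class UpsertSinkSketch {
    // Append-only sink: every emitted result becomes a new row.
    final List<String> appendedRows = new ArrayList<>();
    // Upsert sink: one row per (key, window end); a later result overwrites the earlier one,
    // which is what INSERT ... ON DUPLICATE KEY UPDATE does in MySQL.
    final Map<String, Long> upsertedRows = new LinkedHashMap<>();

    void write(String key, long windowEnd, long value) {
        String rowKey = key + "@" + windowEnd;
        appendedRows.add(rowKey + "=" + value);
        upsertedRows.put(rowKey, value);
    }

    public static void main(String[] args) {
        UpsertSinkSketch sink = new UpsertSinkSketch();
        sink.write("user1", 10_000L, 3L); // first firing: the watermark passed the window end
        sink.write("user1", 10_000L, 4L); // re-firing after a late element within allowed lateness
        System.out.println(sink.appendedRows); // [user1@10000=3, user1@10000=4]
        System.out.println(sink.upsertedRows); // {user1@10000=4}
    }
}
```

With the append-only sink the late element produces a second, conflicting row; with the upsert sink the row for that window simply ends up holding the corrected value.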

There are three common ways to handle late data:

1. Discard it directly. This is the default behavior of a window: a late element does not create a new window and is simply dropped.

2. Use a side output: the late elements are emitted to a separate side output stream, where they can be handled by other logic or saved.

3. Update the previously computed window results, which is the approach introduced in this article.

Because of late elements, window results that have already been computed are inaccurate and incomplete. We can use the late elements to update those results.

If an operator is to support recomputing and updating results it has already emitted, it must keep all of the relevant state after the first emission. Obviously, state cannot be kept forever; it has to be cleared at some point. Once the state is cleared, the result can no longer be recomputed or updated, and late elements can only be discarded or sent to a side output.

The window operator API lets us declare explicitly that we expect late elements. For event-time windows, we can specify a period called the allowed lateness (allowedLateness() on the windowed stream). If allowed lateness is set, the window operator does not delete the window and its state when the watermark passes the window end time; instead, the window keeps all of its elements for the allowed lateness period.
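
To make the timing concrete, here is a small Java sketch of when a tumbling event-time window fires and when its state is cleaned up. It assumes Flink's conventions as I understand them: a window [start, end) has a maximum timestamp of end - 1, the event-time trigger fires once the watermark reaches that timestamp, and the cleanup time is that maximum timestamp plus the allowed lateness. The helper names are hypothetical, and the 10-second window and 5-second lateness are example values.

```java
// Hypothetical helpers for the event-time timeline of one tumbling window (times in ms).
public class WindowTimeline {
    // Start of the tumbling window that contains the timestamp (no window offset).
    static long windowStart(long timestamp, long size) {
        return timestamp - Math.floorMod(timestamp, size);
    }

    // The window's last timestamp (end - 1); the event-time trigger fires when the watermark reaches it.
    static long fireAt(long start, long size) {
        return start + size - 1;
    }

    // After the watermark reaches this time, the window state is removed and later elements are dropped.
    static long cleanupAt(long start, long size, long allowedLateness) {
        return fireAt(start, size) + allowedLateness;
    }

    public static void main(String[] args) {
        long size = 10_000L;           // 10 s tumbling window (example value)
        long allowedLateness = 5_000L; // 5 s allowed lateness (example value)
        long start = windowStart(7_500L, size);
        System.out.println(start);                                   // 0
        System.out.println(fireAt(start, size));                     // 9999
        System.out.println(cleanupAt(start, size, allowedLateness)); // 14999
        System.out.println(cleanupAt(start, size, 0L));              // 9999
    }
}
```

With an allowed lateness of 0 the cleanup time equals the firing time, which is why late elements are simply dropped by default.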

When a late element arrives within the allowed lateness, it is added to the window and passed to the trigger immediately. With the default event-time trigger, this makes the window fire again and emit an updated result. Once the watermark passes the window end time plus the allowed lateness, the window is deleted, and all later elements for it are discarded (or sent to a side output, if one is configured).
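
The following Java sketch simulates this behavior for a single key with a counting tumbling window. It is not Flink code: the class and method names are hypothetical, watermarks are advanced by hand, and it only mimics the rules described above (fire when the watermark passes the window end, fire again for each late element within the allowed lateness, drop the element once the window has been cleaned up).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-key simulation of a counting event-time tumbling window with
// allowed lateness. Not Flink code: it only mimics the firing/cleanup rules in the text.
public class AllowedLatenessSimulation {
    private final long size;
    private final long allowedLateness;
    private long watermark = Long.MIN_VALUE;
    // Retained window state: window start -> element count.
    private final TreeMap<Long, Long> counts = new TreeMap<>();
    // Results sent downstream (an updated window shows up more than once).
    final List<String> emitted = new ArrayList<>();
    // Elements whose window was already cleaned up (dropped, or sent to a side output).
    final List<Long> tooLate = new ArrayList<>();

    AllowedLatenessSimulation(long size, long allowedLateness) {
        this.size = size;
        this.allowedLateness = allowedLateness;
    }

    // Last timestamp that belongs to the window, i.e. end - 1.
    private long maxTimestamp(long start) {
        return start + size - 1;
    }

    void onElement(long timestamp) {
        long start = timestamp - Math.floorMod(timestamp, size);
        if (maxTimestamp(start) + allowedLateness <= watermark) {
            tooLate.add(timestamp); // window state is gone: the result can no longer be updated
            return;
        }
        long count = counts.merge(start, 1L, Long::sum);
        if (maxTimestamp(start) <= watermark) {
            emit(start, count); // window already fired: fire again with the updated count
        }
    }

    void onWatermark(long newWatermark) {
        long previous = watermark;
        watermark = newWatermark;
        // Fire every window whose end the watermark has just passed.
        for (Map.Entry<Long, Long> window : counts.entrySet()) {
            long max = maxTimestamp(window.getKey());
            if (max > previous && max <= newWatermark) {
                emit(window.getKey(), window.getValue());
            }
        }
        // Clean up windows whose allowed lateness has expired.
        counts.keySet().removeIf(start -> maxTimestamp(start) + allowedLateness <= newWatermark);
    }

    private void emit(long start, long count) {
        emitted.add("[" + start + "," + (start + size) + ")=" + count);
    }

    public static void main(String[] args) {
        AllowedLatenessSimulation sim = new AllowedLatenessSimulation(10_000L, 5_000L);
        sim.onElement(1_000L);
        sim.onElement(4_000L);
        sim.onElement(8_000L);
        sim.onWatermark(10_000L); // window [0,10000) fires with count 3
        sim.onElement(9_000L);    // late, but within allowed lateness: fires again with count 4
        sim.onWatermark(15_000L); // 9999 + 5000 <= 15000: window state is cleaned up
        sim.onElement(2_000L);    // too late: dropped
        System.out.println(sim.emitted); // [[0,10000)=3, [0,10000)=4]
        System.out.println(sim.tooLate); // [2000]
    }
}
```

Both emitted results for the same window reach the sink; whether the second one corrects the first or shows up as an extra row depends on how the sink writes them.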

First, let's look at the specific code implementation:

Original article: blog.csdn.net/xianpanjia4616/article/details/106005985