Flink experience

Savepoints and Checkpoints

1. A savepoint is triggered manually for a running job. To restore the stored state, the savepoint path is specified manually when the job is submitted again.

2. A checkpoint is configured in the program by setting a time interval, and the state is then saved automatically at that interval.
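
As a rough illustration of the second point, checkpointing might be switched on in the program as in the sketch below. It assumes the flink-streaming-java dependency is on the classpath. The 60 s interval, the 30 s minimum pause, the job name and the tiny fromElements pipeline are illustrative values, not taken from any real job. The CLI commands in the comments show how a savepoint is usually taken and restored.

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointConfigSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Checkpoint: configured in the program, taken automatically at this interval (ms).
        env.enableCheckpointing(60_000L);
        // Optional: leave at least 30 s between the end of one checkpoint and the next.
        env.getCheckpointConfig().setMinPauseBetweenCheckpoints(30_000L);

        // Placeholder pipeline so that the job has something to run (illustrative only).
        env.fromElements(1, 2, 3).print();

        // Savepoint: taken and restored by hand from the command line, for example
        //   flink savepoint <jobId> [targetDirectory]
        //   flink run -s <savepointPath> ...
        env.execute("checkpoint-config-sketch");
    }
}
```

A bounded source such as fromElements usually finishes before the first checkpoint is due, so this only shows where the settings go.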

 

Watermarks are used to handle out-of-order events. Correct handling of out-of-order events is usually achieved by combining the Watermark mechanism with windows.

A Watermark in the data stream indicates that all data with a timestamp smaller than the Watermark has arrived. For this reason, the execution of a window is also triggered by the Watermark.

A Watermark can be understood as a delayed trigger mechanism. We set a delay time t for the Watermark. The system tracks the maximum event time, maxEventTime, among the data that has arrived so far, and assumes that all data with eventTime less than maxEventTime - t has arrived. If the stop time of a window is less than or equal to maxEventTime - t, that window is triggered to execute.

 

Conceptually, each time Flink receives a piece of data it generates a Watermark equal to maxEventTime minus the delay time, taken over all data that has arrived so far. In other words, the Watermark is carried by the data. When the Watermark reaches or passes the stop time of a window, the execution of that window is triggered. Because the Watermark is carried by data, if no new data arrives while the job is running, the Watermark stops advancing and windows that have not yet been triggered stay untriggered.
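
The behaviour described above can be simulated in plain Java without Flink. This is a minimal sketch and not Flink's implementation: the class name WatermarkSketch, the 10 s window size, the 3 s delay and the sample event times are all made up for illustration. After every event it recomputes the watermark as maxEventTime - t, fires each window whose stop time is not later than the watermark, and reports the windows that are still waiting when the data runs out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class WatermarkSketch {
    static final long WINDOW_SIZE = 10_000L; // tumbling window length in ms (illustrative)
    static final long DELAY = 3_000L;        // the delay time t in ms (illustrative)

    /** Feeds event times in arrival order and returns a log of fired and pending windows. */
    static List<String> run(long[] eventTimes) {
        List<String> log = new ArrayList<>();
        TreeMap<Long, Integer> counts = new TreeMap<>(); // window start -> buffered element count
        long maxEventTime = Long.MIN_VALUE;
        long watermark = Long.MIN_VALUE;

        for (long ts : eventTimes) {
            long start = ts - (ts % WINDOW_SIZE); // assumes non-negative timestamps
            if (start + WINDOW_SIZE <= watermark) {
                log.add("late " + ts);            // its window has already fired
                continue;
            }
            counts.merge(start, 1, Integer::sum);
            maxEventTime = Math.max(maxEventTime, ts);
            watermark = maxEventTime - DELAY;
            // Fire every window whose stop time is not later than the watermark.
            while (!counts.isEmpty() && counts.firstKey() + WINDOW_SIZE <= watermark) {
                Map.Entry<Long, Integer> w = counts.pollFirstEntry();
                log.add("fire [" + w.getKey() + ", " + (w.getKey() + WINDOW_SIZE)
                        + ") count=" + w.getValue() + " at watermark " + watermark);
            }
        }
        // No more data: the watermark no longer advances, so these windows do not fire.
        for (Map.Entry<Long, Integer> w : counts.entrySet()) {
            log.add("pending [" + w.getKey() + ", " + (w.getKey() + WINDOW_SIZE)
                    + ") count=" + w.getValue());
        }
        return log;
    }

    public static void main(String[] args) {
        // 7000 arrives after 9000 but is still counted, because the watermark is only 6000.
        long[] events = {1_000, 4_000, 9_000, 7_000, 12_000, 13_000, 15_000};
        run(events).forEach(System.out::println);
    }
}
```

With these sample events, the window [0, 10000) fires only when the event at 13000 pushes the watermark to 10000, and the window [10000, 20000) is left pending because nothing arrives afterwards to move the watermark past 20000.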

Flink's Window and window start time

The start time of the window
Take EventTime and the UTC+8 time zone as an example.
Windows sized in hours, minutes, or seconds generally start at the expected time.
For example, with a one-hour window and eventTime: 2020-2-15 21:57:40
Window start time: 2020-2-15 21:00:00
Window end time: 2020-2-15 22:00:00
However, when the window size is one day, the result may differ from what you expect because of the local time zone. Windows are aligned to the epoch, and 0:00 UTC is 8:00 in UTC+8, so by default the window starts at 8:00 every day.
The start time of the window is calculated by the getWindowStartWithOffset method of the TimeWindow class. All parameters are in ms, and windowSize is the window length.

public static long getWindowStartWithOffset(long timestamp, long offset, long windowSize) {
    return timestamp - (timestamp - offset + windowSize) % windowSize;
}


According to this formula, to make the window tumble once per day from 0:00 to 24:00, use the following call and set the second parameter, offset, to 16 hours. If it is not set, the default window runs from 8:00 to 8:00 the next day.
.window(TumblingEventTimeWindows.of(Time.days(1), Time.hours(16)))
With this setting, each window runs from 0:00 to 0:00 the next day, and you can read the start and end of the window in the ProcessWindowFunction applied afterwards.
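
Reading the window bounds afterwards might look like the sketch below. It assumes the Flink streaming dependency is on the classpath. The class name WindowBoundsFunction and the type parameters (Long elements, String keys, String output) are hypothetical choices for illustration, and the function would be attached with .process(new WindowBoundsFunction()) after the .window(...) call on a keyed stream.

```java
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

/** Emits one line per key and window, containing the window bounds and the element count. */
public class WindowBoundsFunction
        extends ProcessWindowFunction<Long, String, String, TimeWindow> {

    @Override
    public void process(String key, Context context, Iterable<Long> elements,
                        Collector<String> out) {
        long count = 0;
        for (Long ignored : elements) {
            count++;
        }
        // Start (inclusive) and end (exclusive) of the window, in epoch milliseconds.
        TimeWindow window = context.window();
        out.collect(key + " [" + window.getStart() + ", " + window.getEnd()
                + ") count=" + count);
    }
}
```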

Test code:

 

import java.text.SimpleDateFormat;
import java.util.TimeZone;

public class WindowStartTest {

    public static void main(String[] args) {
        // Window size in milliseconds (one day)
        long windowSize = 86400000L;
        // Offset in milliseconds; a tumbling window uses offset = 0L by default
        long offset = 0L;

        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS");
        // Print times in UTC+8 regardless of the JVM default time zone
        format.setTimeZone(TimeZone.getTimeZone("GMT+8"));

        long[] timestamps = {
            1577808000000L, 1577822400000L, 1577836799000L,
            1577836801000L, 1577876400000L, 1577890800000L
        };

        for (long ts : timestamps) {
            long start = getWindowStartWithOffset(ts, offset, windowSize);
            System.out.println(ts + " -> " + format.format(ts)
                    + "\twindow start time: " + start + " -> " + format.format(start));
        }
    }

    private static long getWindowStartWithOffset(long timestamp, long offset, long windowSize) {
        return timestamp - (timestamp - offset + windowSize) % windowSize;
    }
}


Test Results:

1577808000000 -> 2020-01-01 00:00:00.000    window start time: 1577750400000 -> 2019-12-31 08:00:00.000
1577822400000 -> 2020-01-01 04:00:00.000    window start time: 1577750400000 -> 2019-12-31 08:00:00.000
1577836799000 -> 2020-01-01 07:59:59.000    window start time: 1577750400000 -> 2019-12-31 08:00:00.000
1577836801000 -> 2020-01-01 08:00:01.000    window start time: 1577836800000 -> 2020-01-01 08:00:00.000
1577876400000 -> 2020-01-01 19:00:00.000    window start time: 1577836800000 -> 2020-01-01 08:00:00.000
1577890800000 -> 2020-01-01 23:00:00.000    window start time: 1577836800000 -> 2020-01-01 08:00:00.000
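
For comparison, the same formula can be run with the 16-hour offset recommended earlier. The sketch below is plain Java. The class name DailyWindowOffsetSketch and the helper local are made up for illustration, and the fixed UTC+8 offset is an assumption that matches the time zone used throughout this article. It reuses some of the timestamps from the test above.

```java
import java.time.Instant;
import java.time.ZoneOffset;

class DailyWindowOffsetSketch {
    static final long HOUR_MS = 60 * 60 * 1000L;
    static final long DAY_MS = 24 * HOUR_MS;
    static final ZoneOffset UTC_PLUS_8 = ZoneOffset.ofHours(8);

    // Same formula as TimeWindow.getWindowStartWithOffset quoted in this article.
    static long windowStart(long timestamp, long offset, long windowSize) {
        return timestamp - (timestamp - offset + windowSize) % windowSize;
    }

    // Formats epoch milliseconds as local date-time in UTC+8.
    static String local(long epochMs) {
        return Instant.ofEpochMilli(epochMs).atOffset(UTC_PLUS_8).toLocalDateTime().toString();
    }

    public static void main(String[] args) {
        long[] samples = {1577808000000L, 1577836799000L, 1577836801000L, 1577890800000L};
        for (long ts : samples) {
            long start = windowStart(ts, 16 * HOUR_MS, DAY_MS);
            System.out.println(local(ts) + " -> window starts at " + local(start));
        }
    }
}
```

With the 16-hour offset, all four sample timestamps fall into the window that starts at 1577808000000, which is 2020-01-01 00:00 in UTC+8. An offset of -8 hours gives the same window starts, because the two offsets differ by exactly one window length.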

 

 


Origin blog.csdn.net/qq_35240226/article/details/105122766