Flink Window API

A window is a way to cut an unbounded stream into bounded pieces: it distributes the stream's data into buckets of finite size so that each bucket can be analyzed.

Window types

  1. Time Window
    • tumbling time window
    • sliding time window
    • session window

  2. Count Window
    • tumbling count window
    • sliding count window

Tumbling Windows
  • Data is split into windows of a fixed length
  • Features: time-aligned, fixed window length, no overlap, so each element belongs to exactly one window

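Because tumbling windows are aligned and never overlap, the window an element falls into can be computed from its timestamp alone. The following is a minimal sketch in plain Python, not the Flink API; the function name `tumbling_window`, the millisecond units, and the optional `offset_ms` parameter are assumptions made for illustration.

```python
def tumbling_window(timestamp_ms, size_ms, offset_ms=0):
    """Return the [start, end) tumbling window that contains timestamp_ms."""
    # Round the timestamp down to the nearest window boundary.
    start = timestamp_ms - (timestamp_ms - offset_ms) % size_ms
    return start, start + size_ms


# Every timestamp maps to exactly one 5-second window.
for ts in (0, 4_999, 5_000, 12_345):
    print(ts, tumbling_window(ts, 5_000))
```

Timestamps 0 and 4999 land in the same window, while 5000 already belongs to the next one, which is the "no overlap" property in practice.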

Sliding Windows
  • A sliding window generalizes the tumbling window: it is defined by a fixed window length and a slide interval
  • The window length is fixed, and windows can overlap when the slide interval is shorter than the window length, so one element may belong to several windows
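
The overlap can be made concrete by listing every window that contains a given timestamp. This is a rough sketch in plain Python rather than Flink code; the name `sliding_windows` and the millisecond units are illustrative assumptions.

```python
def sliding_windows(timestamp_ms, size_ms, slide_ms):
    """Return every [start, end) sliding window that contains timestamp_ms."""
    # Start of the most recent window that begins at or before the timestamp.
    start = timestamp_ms - timestamp_ms % slide_ms
    windows = []
    # Walk backwards one slide at a time while the window still covers the timestamp.
    while start > timestamp_ms - size_ms:
        windows.append((start, start + size_ms))
        start -= slide_ms
    return windows


# A 10-second window sliding every 5 seconds: each element is in two windows.
print(sliding_windows(7_000, 10_000, 5_000))
```

When the slide equals the window length, the function returns a single window, which is the tumbling case again.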

Session Windows
  • A session window groups a series of events that are separated by less than a timeout gap of a specified length; in other words, the current window closes and a new one starts once no new data has arrived for that period
  • Features: no time alignment, because window boundaries depend on when the data arrives

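One way to picture session windows is to give every event its own window of length "gap" and then merge the windows that touch. The sketch below does this in plain Python under a few assumptions: the name `session_windows` is made up, the events of one key are already available as a list, and an event arriving exactly one gap after the previous one is treated as part of the same session.

```python
def session_windows(timestamps_ms, gap_ms):
    """Group timestamps into [start, end) sessions separated by more than gap_ms."""
    sessions = []
    for ts in sorted(timestamps_ms):
        if sessions and ts <= sessions[-1][1]:
            # The event arrives before the current session times out: extend it.
            start, end = sessions[-1]
            sessions[-1] = (start, max(end, ts + gap_ms))
        else:
            # The gap has elapsed (or this is the first event): open a new session.
            sessions.append((ts, ts + gap_ms))
    return sessions


print(session_windows([0, 1_000, 2_500, 10_000, 10_500], 3_000))
```

The two resulting sessions have different lengths and start wherever the data starts, which is why session windows are not time-aligned.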

API

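  • Window assigner: the .window() method
  • We can use .window() to define a window and then run an aggregation or other processing on it. Note that .window() is only available on a keyed stream, so it must be called after keyBy (a non-keyed stream uses .windowAll() instead)
  • Flink also provides the simpler .timeWindow and .countWindow methods for defining time windows and count windows. Note that .timeWindow has been deprecated since Flink 1.12 in favor of .window() with an explicit window assigner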

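Creating windows

  • Tumbling time window: one size argument, e.g. .timeWindow(Time.seconds(15))
  • Sliding time window: a window size and a slide interval, e.g. .timeWindow(Time.seconds(15), Time.seconds(5))
  • Session window: a gap passed to a session window assigner, e.g. .window(EventTimeSessionWindows.withGap(Time.minutes(10)))
  • Tumbling count window: one count argument, e.g. .countWindow(5)
  • Sliding count window: a window size and a slide, both given as element counts, e.g. .countWindow(10, 2)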
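
Count windows are driven by the number of elements per key rather than by time. The following plain Python sketch (not Flink code) illustrates the idea; the name `count_windows` is hypothetical, and the sketch assumes that a sliding count window fires every `slide` elements over the most recent `size` elements, so the first firings may hold fewer than `size` elements.

```python
from collections import defaultdict, deque


def count_windows(events, size, slide=None):
    """Yield (key, contents) each time a per-key count window fires.

    events: iterable of (key, value) pairs in arrival order.
    slide=None gives a tumbling count window.
    """
    slide = size if slide is None else slide
    buffers = defaultdict(lambda: deque(maxlen=size))  # last `size` values per key
    seen = defaultdict(int)                            # elements seen so far per key
    for key, value in events:
        buffers[key].append(value)
        seen[key] += 1
        if seen[key] % slide == 0:
            yield key, list(buffers[key])


events = [("a", 1), ("a", 2), ("b", 10), ("a", 3), ("a", 4), ("a", 5)]
print(list(count_windows(events, size=2)))
```

Key "b" never fires in this run because it has received only one element, which shows that the counting is done per key after keyBy.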

Window functions
  • Incremental aggregation functions process each element as it arrives and maintain only a small piece of state. Examples: ReduceFunction, AggregateFunction
  • Full window functions first collect all the data of the window and then iterate over all of it when the window is evaluated. Examples: ProcessWindowFunction, WindowFunction
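
The difference between the two styles is mostly about what has to be kept in state. Here is a small plain Python comparison that computes a window average both ways; the class and function names are illustrative, and the accumulator only loosely mirrors the add / get-result shape of an AggregateFunction.

```python
class IncrementalAverage:
    """Incremental style: fold each element into a small (total, count) state."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def add(self, value):
        # Called once per element as it arrives; the element itself is not stored.
        self.total += value
        self.count += 1

    def result(self):
        return self.total / self.count


def full_window_average(values):
    """Full-window style: receives every buffered element when the window fires."""
    values = list(values)
    return sum(values) / len(values)


window = [3.0, 5.0, 10.0]
acc = IncrementalAverage()
for v in window:
    acc.add(v)
print(acc.result(), full_window_average(window))
```

Both give the same answer, but the incremental version holds two numbers regardless of window size, while the full-window version needs all elements buffered. The full-window style pays that cost in exchange for seeing the whole window at once, for example to sort it or compute a median.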

Other optional APIs
  • .trigger(): the trigger. Defines when the window fires, that is, when the computation is triggered and the result is output
  • .evictor(): the evictor. Defines the logic for removing certain data from the window
  • .allowedLateness(): allows late data to be processed
  • .sideOutputLateData(): puts late data into a side output stream
  • .getSideOutput(): gets the side output stream
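
How the late-data options fit together can be sketched as a three-way decision per element. The plain Python below is only an illustration: it assumes event-time tumbling windows and a watermark (the notion of "how far event time has progressed", which this post does not cover), and the name `classify` and its return labels are invented for the example.

```python
def classify(timestamp_ms, watermark_ms, size_ms, allowed_lateness_ms=0):
    """Return 'on_time', 'late_allowed', or 'side_output' for one element."""
    start = timestamp_ms - timestamp_ms % size_ms
    max_ts = start + size_ms - 1  # last timestamp that belongs to the window
    if watermark_ms < max_ts:
        return "on_time"          # the window has not fired yet
    if watermark_ms < max_ts + allowed_lateness_ms:
        return "late_allowed"     # already fired, but still within allowed lateness
    return "side_output"          # too late: candidate for the late-data side output


# An element with timestamp 4000 in a 5-second window, 2 seconds of allowed lateness.
for wm in (3_000, 6_000, 7_000):
    print(wm, classify(4_000, wm, 5_000, allowed_lateness_ms=2_000))
```

In this picture, the allowed lateness widens the middle band, and the side output is where elements in the last band would end up instead of being dropped.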

Origin blog.csdn.net/wolfjson/article/details/118359060