Window Mechanism in Stream Computing

What is a window

In stream computing, data arrives continuously, so it is impossible to wait for all of it to arrive before starting to process it. A window splits the unbounded stream into bounded chunks of limited size, and computations are then applied to the data in each window.

Basic features of typical windows

This article covers tumbling windows, sliding windows, and session windows.

Tumble Window

Tumbling window features:

  • Windows do not overlap; each record belongs to exactly one window.

  • The window length is fixed.

  • When the time is greater than or equal to the window end, the output of the corresponding window is triggered all at once.
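
The assignment rule above can be sketched in a few lines. This is a minimal illustration, not any engine's actual implementation: the function name `tumbling_window`, the integer timestamps, and the half-open `[start, end)` convention are all assumptions made for the example.

```python
from typing import Tuple


def tumbling_window(timestamp: int, size: int) -> Tuple[int, int]:
    """Return the single [start, end) window that contains `timestamp`.

    Assumes windows are aligned to multiples of `size` (hypothetical choice).
    """
    start = timestamp - (timestamp % size)
    return start, start + size


def should_fire(window_end: int, current_time: int) -> bool:
    """A window fires once the time reaches or passes its end."""
    return current_time >= window_end


print(tumbling_window(7, 5))   # (5, 10)
print(tumbling_window(10, 5))  # (10, 15)
```

Because the end bound is exclusive, a record with timestamp 10 falls into the next window, which is what keeps the windows from overlapping.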

Sliding Window

Sliding window features:

  • The window slides forward continuously by a fixed step size, and the window length is fixed.

  • Windows may overlap.

  • When the window length is greater than the step size, a record may belong to multiple windows.

  • When the window length is less than the step size, a record may not belong to any window.

  • When the time is greater than or equal to the window end, the output of the corresponding window is triggered all at once.
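
The relationship between window length and step size is easier to see with concrete numbers. The sketch below is an illustration under assumptions: `sliding_windows` is a hypothetical helper, timestamps are integers, windows are half-open `[start, end)`, and window starts are aligned to multiples of the step.

```python
from typing import List, Tuple


def sliding_windows(timestamp: int, size: int, slide: int) -> List[Tuple[int, int]]:
    """Return every [start, end) window that contains `timestamp`."""
    windows = []
    # Latest window start that is <= timestamp, aligned to multiples of `slide`.
    start = timestamp - (timestamp % slide)
    # Walk backwards while the window still covers the timestamp.
    while start > timestamp - size:
        windows.append((start, start + size))
        start -= slide
    return windows


print(sliding_windows(7, 10, 5))  # [(5, 15), (0, 10)]  length > step: two windows
print(sliding_windows(7, 2, 5))   # []                  length < step: no window
print(sliding_windows(7, 5, 5))   # [(5, 10)]           length == step: tumbling
```

When the length equals the step, the result degenerates to a tumbling window.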

Session Window

  • The session gap is the interval between sessions. Typically a maximum gap is configured for a session, for example 1 minute. When the gap between records exceeds 1 minute, the records are divided into different sessions.

  • The window length varies.

  • When the time is greater than or equal to the window end, the output of the corresponding window is triggered all at once.
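
A minimal sketch of gap-based session splitting follows. It processes a finite, already collected list of timestamps for clarity, whereas a real streaming engine merges sessions incrementally; `session_windows` and the choice that a gap exactly equal to the maximum stays in the same session are assumptions of this example.

```python
from typing import List


def session_windows(timestamps: List[int], gap: int) -> List[List[int]]:
    """Group timestamps into sessions.

    A new session starts when the distance to the previous record
    is greater than `gap`.
    """
    sessions: List[List[int]] = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)   # still within the gap: same session
        else:
            sessions.append([ts])     # gap exceeded: open a new session
    return sessions


# Timestamps in seconds, maximum gap of 60 seconds.
print(session_windows([0, 30, 50, 200, 230], 60))
# [[0, 30, 50], [200, 230]]
```

The two resulting sessions span 50 and 30 seconds respectively, which shows why the window length is not fixed.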

Handling of late data

  • Definition of late data: after the watermark has driven a window to trigger its output, any data that subsequently arrives for that window is considered late.

  • Solutions:

  1. Discard the data directly (default).

  2. Set an allowed lateness. In this case, the window's data is not cleared immediately after the window's normal computation time ends; instead it is retained for an additional "lateness" period. If data arrives within this period, the computation continues.

  3. Route late-arriving data into a separate stream and let the user decide what to do with it (side output stream).
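
The three policies can be compared side by side in a small sketch. Everything here is hypothetical and simplified: the class `LateAwareWindow`, a single sum window, and a watermark passed in by the caller. It is meant to show the decision order, not how a particular engine implements it.

```python
from typing import List


class LateAwareWindow:
    """A single sum window ending at `end`, illustrating late-data policies."""

    def __init__(self, end: int, allowed_lateness: int = 0,
                 use_side_output: bool = False):
        self.end = end
        self.allowed_lateness = allowed_lateness
        self.use_side_output = use_side_output
        self.total = 0
        self.side_output: List[int] = []

    def add(self, value: int, watermark: int) -> str:
        if watermark < self.end:
            self.total += value                  # window has not fired yet
            return "on time"
        if watermark < self.end + self.allowed_lateness:
            self.total += value                  # state retained: update result
            return "late, result updated"
        if self.use_side_output:
            self.side_output.append(value)       # hand over to the user
            return "late, sent to side output"
        return "late, dropped"                   # default policy


w = LateAwareWindow(end=10, allowed_lateness=5, use_side_output=True)
print(w.add(1, watermark=3))    # on time
print(w.add(2, watermark=12))   # late, result updated
print(w.add(4, watermark=20))   # late, sent to side output
```

With the default arguments (no allowed lateness, no side output) every record that arrives after the window has fired is simply dropped.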

Incremental calculation and full calculation

  • Incremental calculation: each record takes part in the computation as soon as it arrives, but the result is not output yet.

  • Full calculation: each record is first placed into a buffer when it arrives, and the buffer is kept in state. Only when the window triggers its output is all the data taken out and computed in one pass.
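
The difference shows up mainly in how much state each approach keeps. The sketch below uses two hypothetical classes: a sum, which can be folded record by record, and a median, which is used here as an example of a result that is computed from the whole buffer.

```python
import statistics
from typing import List


class IncrementalSum:
    """Incremental: state is a single accumulator, updated per record."""

    def __init__(self) -> None:
        self.acc = 0

    def add(self, value: int) -> None:
        self.acc += value            # participates in the computation immediately

    def fire(self) -> int:
        return self.acc              # output only when the window triggers


class FullMedian:
    """Full: state is the buffer of all records, computed at trigger time."""

    def __init__(self) -> None:
        self.buffer: List[int] = []

    def add(self, value: int) -> None:
        self.buffer.append(value)    # only buffered, nothing computed yet

    def fire(self) -> float:
        return statistics.median(self.buffer)


inc, full = IncrementalSum(), FullMedian()
for v in [3, 1, 2]:
    inc.add(v)
    full.add(v)
print(inc.fire())   # 6
print(full.fire())  # 2
```

The incremental version keeps constant-size state regardless of how many records the window holds, while the full version's state grows with the window.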

EMIT trigger

  • Background: a normal window outputs only when the window ends. For example, if the window spans one day, the result is output only at the end of the day, which defeats the purpose of real-time computation.

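  • Purpose: an EMIT trigger is a mechanism for outputting a window's contents early. For example, a window spanning one day can be set to output every 5 s, so that downstream consumers obtain the window's results sooner.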
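
The idea of early output can be sketched with a window that emits partial results at a fixed interval before its final result. The class `EarlyEmitWindow`, its method names, and the caller-driven clock are assumptions of this example and do not correspond to any specific engine's EMIT syntax.

```python
from typing import List, Tuple


class EarlyEmitWindow:
    """A sum window that emits partial results every `emit_interval`."""

    def __init__(self, start: int, end: int, emit_interval: int):
        self.end = end
        self.emit_interval = emit_interval
        self.next_emit = start + emit_interval
        self.total = 0
        self.closed = False
        self.outputs: List[Tuple[str, int]] = []

    def add(self, value: int) -> None:
        self.total += value

    def advance_time(self, now: int) -> None:
        if self.closed:
            return
        if now >= self.end:
            self.outputs.append(("final", self.total))   # normal window output
            self.closed = True
        elif now >= self.next_emit:
            self.outputs.append(("early", self.total))   # partial result so far
            # Schedule the next early emit strictly after `now`.
            while self.next_emit <= now:
                self.next_emit += self.emit_interval


w = EarlyEmitWindow(start=0, end=20, emit_interval=5)
w.add(1); w.advance_time(5)
w.add(2); w.advance_time(7)    # no output: next early emit is at 10
w.advance_time(10)
w.add(4); w.advance_time(20)
print(w.outputs)  # [('early', 1), ('early', 3), ('final', 7)]
```

Downstream consumers see the running totals 1 and 3 before the final value 7, at the cost of receiving several results for the same window.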

Origin blog.csdn.net/m0_51561690/article/details/128546381