Flink (7): Windows

Window Concepts

Windows are at the heart of processing infinite (unbounded) streams. Windows split the stream into “buckets” of finite size, over which we can apply computations. This document focuses on how windowing is performed in Flink and how the programmer can benefit to the maximum from its offered functionality.

Window: cuts an infinite stream into finite streams by distributing the stream's data into finite-size buckets for analysis.

Window types:

  • Time windows
    • Tumbling time windows
    • Sliding time windows
    • Session windows
  • Count windows
    • Tumbling count windows
    • Sliding count windows

Tumbling Windows

A tumbling windows assigner assigns each element to a window of a specified window size. Tumbling windows have a fixed size and do not overlap. For example, if you specify a tumbling window with a size of 5 minutes, the current window will be evaluated and a new window will be started every five minutes as illustrated by the following figure.
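
To make the "fixed size, no overlap" rule concrete, here is a minimal sketch in plain Python (not Flink's API). The function name `tumbling_window` is hypothetical, and the sketch assumes windows are aligned to timestamp 0 with no offset.

```python
from typing import Tuple


def tumbling_window(timestamp_ms: int, size_ms: int) -> Tuple[int, int]:
    """Return the [start, end) bounds of the tumbling window holding timestamp_ms."""
    # Round the timestamp down to the nearest multiple of the window size.
    start = timestamp_ms - (timestamp_ms % size_ms)
    return start, start + size_ms


FIVE_MINUTES = 5 * 60 * 1000

# An event at 00:07:30 lands in the window [00:05:00, 00:10:00).
event_ts = (7 * 60 + 30) * 1000
print(tumbling_window(event_ts, FIVE_MINUTES))
```

Because every timestamp maps to exactly one start value, each element belongs to exactly one window.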

Sliding Windows

The sliding windows assigner assigns elements to windows of fixed length. Similar to a tumbling windows assigner, the size of the windows is configured by the window size parameter. An additional window slide parameter controls how frequently a sliding window is started. Hence, sliding windows can be overlapping if the slide is smaller than the window size. In this case elements are assigned to multiple windows.

For example, you could have windows of size 10 minutes that slide by 5 minutes. With this, every 5 minutes you get a window that contains the events that arrived during the last 10 minutes, as depicted by the following figure.
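
The overlap can be sketched in plain Python (again, not Flink's API). The function name `sliding_windows` is hypothetical, and windows are assumed to be aligned to timestamp 0. With a size of 10 minutes and a slide of 5 minutes, one element is assigned to two windows.

```python
from typing import List, Tuple


def sliding_windows(timestamp_ms: int, size_ms: int, slide_ms: int) -> List[Tuple[int, int]]:
    """Return every [start, end) sliding window that contains timestamp_ms."""
    windows = []
    # Start of the latest window that begins at or before the timestamp.
    start = timestamp_ms - (timestamp_ms % slide_ms)
    # Walk backwards one slide at a time while the window still covers the timestamp.
    while start > timestamp_ms - size_ms:
        windows.append((start, start + size_ms))
        start -= slide_ms
    return windows


MINUTE = 60 * 1000

# An event at minute 7 belongs to the windows [5, 15) and [0, 10).
for start, end in sliding_windows(7 * MINUTE, 10 * MINUTE, 5 * MINUTE):
    print(start // MINUTE, end // MINUTE)
```

When the slide equals the size, the loop runs once and the result matches a tumbling window.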

Session Windows

The session windows assigner groups elements by sessions of activity. Session windows do not overlap and do not have a fixed start and end time, in contrast to tumbling windows and sliding windows. Instead a session window closes when it does not receive elements for a certain period of time, i.e., when a gap of inactivity occurred. A session window assigner can be configured with either a static session gap or with a session gap extractor function which defines how long the period of inactivity is. When this period expires, the current session closes and subsequent elements are assigned to a new session window.
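
A rough sketch of the static-gap case in plain Python (not Flink's API) follows. The function name `session_windows` is hypothetical, and the input is sorted up front for simplicity. How an element arriving exactly one gap after the previous one is handled is a choice made by this sketch, not a statement about Flink's behaviour.

```python
from typing import Iterable, List, Tuple


def session_windows(timestamps_ms: Iterable[int], gap_ms: int) -> List[Tuple[int, int]]:
    """Group timestamps into sessions; each session is (first_ts, last_ts + gap)."""
    sessions: List[Tuple[int, int]] = []
    for ts in sorted(timestamps_ms):
        if sessions and ts < sessions[-1][1]:
            # Still inside the inactivity gap of the current session: extend it.
            start, _ = sessions[-1]
            sessions[-1] = (start, ts + gap_ms)
        else:
            # First element, or the gap expired: open a new session.
            sessions.append((ts, ts + gap_ms))
    return sessions


# Three events close together, then a pause longer than the 3-second gap.
print(session_windows([1000, 2000, 2500, 10000, 10500], gap_ms=3000))
```

The two bursts of activity end up in two sessions whose lengths depend on the data, not on a fixed window size.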

Window Functions

  • Incremental aggregation functions
  • Full-window functions
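
The difference between the two kinds can be sketched in plain Python. The class names `IncrementalAverage` and `FullWindowMedian` are hypothetical and are not Flink classes. The assumption here is that an incremental function keeps only a small running state updated per element, while a full-window function buffers every element and computes once the window closes.

```python
from typing import List


class IncrementalAverage:
    """Incremental aggregation: keeps only a running sum and count per window."""

    def __init__(self) -> None:
        self.total = 0.0
        self.count = 0

    def add(self, value: float) -> None:
        # Update the accumulator as each element arrives; nothing is buffered.
        self.total += value
        self.count += 1

    def result(self) -> float:
        return self.total / self.count


class FullWindowMedian:
    """Full-window function: buffers all elements and computes at window close."""

    def __init__(self) -> None:
        self.buffer: List[float] = []

    def add(self, value: float) -> None:
        self.buffer.append(value)

    def result(self) -> float:
        # The median needs the whole window contents, so it cannot be kept as a small accumulator.
        ordered = sorted(self.buffer)
        mid = len(ordered) // 2
        if len(ordered) % 2 == 1:
            return ordered[mid]
        return (ordered[mid - 1] + ordered[mid]) / 2


avg, median = IncrementalAverage(), FullWindowMedian()
for v in [3.0, 1.0, 2.0, 6.0]:
    avg.add(v)
    median.add(v)
print(avg.result(), median.result())
```

The trade-off is memory versus flexibility: the accumulator stays constant in size, while the buffer grows with the window but allows computations that need every element.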

Time Semantics

Network delays, distributed execution, and similar factors can cause data to arrive out of order.

Out-of-order data makes window computations inaccurate.
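
A small plain-Python illustration of the problem follows. The function name `count_first_window_eager` and the sample timestamps are made up for this sketch. It assumes a naive strategy that treats the window [0, 5) as complete as soon as any event with a timestamp past its end shows up.

```python
from typing import Iterable


def count_first_window_eager(event_times: Iterable[int], window_end: int) -> int:
    """Count events in [0, window_end), closing at the first event past the end."""
    count = 0
    for ts in event_times:
        if ts >= window_end:
            break  # the window is treated as complete; later arrivals are ignored
        count += 1
    return count


arrival_order = [1, 4, 5, 2, 3, 7]  # event times, in the order they arrive

print(count_first_window_eager(arrival_order, 5))          # 2: events 2 and 3 are missed
print(count_first_window_eager(sorted(arrival_order), 5))  # 4: the correct count
```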

Watermarks
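
As a rough sketch of the idea in plain Python (not Flink's API): assume the watermark trails the largest event time seen so far by a fixed tolerated delay, and a window fires only once the watermark reaches its end. The function name `run_with_watermark`, the `max_delay` parameter, and the exact boundary rules are assumptions of this sketch.

```python
from typing import Iterable, List, Optional, Tuple


def run_with_watermark(
    event_times: Iterable[int], window_end: int, max_delay: int
) -> Tuple[int, List[int]]:
    """Count events in [0, window_end), firing once the watermark reaches window_end.

    Returns (events counted before firing, events that arrived after firing).
    """
    count = 0
    late: List[int] = []
    max_seen: Optional[int] = None
    fired = False
    for ts in event_times:
        if ts < window_end:
            if fired:
                late.append(ts)  # arrived after the window already fired
            else:
                count += 1
        max_seen = ts if max_seen is None else max(max_seen, ts)
        # Watermark = largest event time seen so far minus the tolerated delay.
        if not fired and max_seen - max_delay >= window_end:
            fired = True
    return count, late


arrival_order = [1, 4, 5, 2, 3, 7]  # event times, in the order they arrive

print(run_with_watermark(arrival_order, window_end=5, max_delay=2))  # (4, [])
print(run_with_watermark(arrival_order, window_end=5, max_delay=0))  # (2, [2, 3])
```

A larger tolerated delay gives more complete results at the cost of waiting longer before the window fires.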

References

Flink official documentation: Event Time
Flink official documentation: Windows
Flink official documentation: Generating Timestamps / Watermarks

Reposted from www.cnblogs.com/fonxian/p/12391962.html