Introduction and Description of Windows for Flink Streaming Computing

1 Introduction

Stream computing is a data processing model designed for unbounded data sets, that is, data sets that keep growing and are essentially infinite. A window is a means of cutting that unbounded data into finite blocks for processing.
Windows fall into two categories:
 CountWindow: generates a window from a specified number of data items, regardless of time.
 TimeWindow: generates a window according to time.
A TimeWindow can in turn be one of three kinds, according to how the window is implemented: tumbling window, sliding window, and session window.
Tumbling window (Tumbling Windows)
A tumbling window slices the data according to a fixed window length.
Features: time alignment, fixed window length, no overlap.
Sliding window (Sliding Windows)
A sliding window is a more general form of the fixed window. It is defined by a fixed window length and a sliding interval.
Features: time alignment, fixed window length, and windows may overlap (they do whenever the sliding interval is shorter than the window length).
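To make the two time-aligned window types concrete, here is a minimal sketch in plain Scala (no Flink dependency) of how a single timestamp could be mapped to its window or windows. The names `Window`, `tumbling`, and `sliding` are illustrative and are not Flink API; the sketch assumes non-negative millisecond timestamps and no window offset.

```scala
object WindowAssignmentSketch {
  // A window covers the half-open interval [start, end), in milliseconds.
  final case class Window(start: Long, end: Long)

  // Tumbling: every timestamp falls into exactly one window of length `size`.
  def tumbling(ts: Long, size: Long): Window = {
    val start = ts - (ts % size)
    Window(start, start + size)
  }

  // Sliding: a timestamp falls into every window whose start is a multiple
  // of `slide` and lies in the range (ts - size, ts].
  def sliding(ts: Long, size: Long, slide: Long): List[Window] = {
    val lastStart = ts - (ts % slide)
    Iterator.iterate(lastStart)(_ - slide)
      .takeWhile(_ > ts - size)
      .map(start => Window(start, start + size))
      .toList
  }

  def main(args: Array[String]): Unit = {
    // 15-second tumbling window
    println(tumbling(17000L, 15000L))
    // prints Window(15000,30000)

    // 15-second window sliding every 5 seconds
    println(sliding(17000L, 15000L, 5000L))
    // prints List(Window(15000,30000), Window(10000,25000), Window(5000,20000))
  }
}
```

With a 15-second length and a 5-second slide, each timestamp belongs to three overlapping windows. When the slide equals the length, the sliding case reduces to the tumbling one.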
Session window (Session Windows)
A session window is made up of a series of events bounded by a timeout gap of a specified length, similar to a session in a web application: if no new data arrives for that period, the current window closes and the next event starts a new window.
Features: no time alignment.
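The session behaviour can be illustrated with a small plain-Scala sketch (again without Flink). It groups the event timestamps of one key into sessions, opening a new session whenever the pause since the previous event is longer than the gap. The name `sessions` and the exact boundary handling (a pause equal to the gap stays in the same session) are assumptions made for this illustration.

```scala
object SessionWindowSketch {
  // Groups event timestamps (ms) into sessions: a new session starts whenever
  // the pause since the previous event is longer than `gap`.
  def sessions(timestamps: Seq[Long], gap: Long): List[List[Long]] =
    timestamps.sorted.foldLeft(List.empty[List[Long]]) {
      case (current :: done, ts) if ts - current.head <= gap =>
        (ts :: current) :: done // still inside the current session
      case (acc, ts) =>
        List(ts) :: acc         // first event, or pause too long: open a new session
    }.map(_.reverse).reverse

  def main(args: Array[String]): Unit = {
    println(sessions(Seq(1000L, 2000L, 2500L, 9000L, 9500L), gap = 3000L))
    // prints List(List(1000, 2000, 2500), List(9000, 9500))
  }
}
```

Because the session boundaries depend only on when events arrive, the resulting windows have no fixed start or length, which is what "no time alignment" refers to.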

2 Time windows

A TimeWindow groups all the data within a specified time range into one window and computes over all the data in that window at once.

2.1 Tumbling time window

By default, Flink divides time windows based on processing time: records are assigned to windows according to the time at which they enter Flink.

2.2 Demo

import com.chen.flink.part01.SensorReading
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

object TimeWindowdemo {
  def main(args: Array[String]): Unit = {
    val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment

    // Each input line carries three comma-separated fields: id, a Long value, temperature.
    val dataStream: DataStream[String] = env.socketTextStream("192.168.199.101", 7777)
    val sensorStream: DataStream[SensorReading] = dataStream.map(
      data => {
        val strings = data.split(",")
        SensorReading(strings(0), strings(1).toLong, strings(2).toDouble)
      }
    )

    // Tumbling time window: minimum temperature per sensor id every 15 seconds.
    val resultStream: DataStream[(String, Double)] = sensorStream
      .map(x => (x.id, x.temperature))
      .keyBy(_._1) // key by sensor id, the first tuple field
      .timeWindow(Time.seconds(15))
      .reduce((r1, r2) => (r1._1, r1._2.min(r2._2)))

    resultStream.print().setParallelism(1)
    env.execute("timewindow")
  }
}

Within each 15-second window, the records are grouped by id and the minimum temperature for each id is output.


2.3 Sliding window

The sliding window uses the same function name as the tumbling window, but takes two arguments: window_size and sliding_size.

// Sliding window: 15-second windows, evaluated every 5 seconds
val resultStream: DataStream[(String, Double)] = sensorStream.map(x => (x.id, x.temperature)).keyBy(_._1)
  .timeWindow(Time.seconds(15), Time.seconds(5))
  .reduce((r1, r2) => (r1._1, r1._2.min(r2._2)))
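In newer Flink 1.x releases `timeWindow` is deprecated in favor of passing a window assigner to `window(...)`. The following is a minimal sketch of the tumbling and sliding examples written that way, assuming the Scala DataStream API of a Flink 1.x release is on the classpath. The local `SensorReading` case class and the `parse` helper are hypothetical stand-ins for the author's `com.chen.flink.part01.SensorReading` and inline parsing code.

```scala
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.{SlidingProcessingTimeWindows, TumblingProcessingTimeWindows}
import org.apache.flink.streaming.api.windowing.time.Time

object ExplicitAssignerSketch {
  // Hypothetical stand-in for com.chen.flink.part01.SensorReading.
  case class SensorReading(id: String, timestamp: Long, temperature: Double)

  // Hypothetical helper: parses one "id,long,temperature" line.
  def parse(line: String): SensorReading = {
    val fields = line.split(",").map(_.trim)
    SensorReading(fields(0), fields(1).toLong, fields(2).toDouble)
  }

  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val keyed = env.socketTextStream("192.168.199.101", 7777)
      .map(line => parse(line))
      .map(r => (r.id, r.temperature))
      .keyBy(_._1)

    val minTemp = (a: (String, Double), b: (String, Double)) => (a._1, a._2.min(b._2))

    // Same intent as timeWindow(Time.seconds(15)), with processing time made explicit.
    keyed.window(TumblingProcessingTimeWindows.of(Time.seconds(15)))
      .reduce(minTemp)
      .print("tumbling")

    // Same intent as timeWindow(Time.seconds(15), Time.seconds(5)).
    keyed.window(SlidingProcessingTimeWindows.of(Time.seconds(15), Time.seconds(5)))
      .reduce(minTemp)
      .print("sliding")

    env.execute("explicit window assigners")
  }
}
```

Naming the assigner makes the time semantics visible in the code instead of depending on the environment's default time characteristic.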


3 Count windows

A CountWindow triggers execution based on the number of elements with the same key in the window; results are computed only for keys whose element count has reached the window size.
Note: the window_size of a CountWindow refers to the number of elements with the same key, not the total number of input elements.
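The per-key counting in the note above can be sketched in plain Scala, without Flink. The function `tumblingCount` is a hypothetical name for this illustration: it keeps one buffer per key and emits that key's values only when its own buffer reaches the window size.

```scala
import scala.collection.mutable

object CountWindowSketch {
  // Emits (key, values) each time `size` values have arrived for the same key.
  def tumblingCount[K, V](events: Seq[(K, V)], size: Int): List[(K, List[V])] = {
    val buffers = mutable.Map.empty[K, List[V]]
    val fired = List.newBuilder[(K, List[V])]
    for ((key, value) <- events) {
      val buffered = value :: buffers.getOrElse(key, Nil)
      if (buffered.size == size) {
        fired += ((key, buffered.reverse)) // window for this key is full: fire and clear
        buffers.remove(key)
      } else {
        buffers(key) = buffered            // keep waiting for more elements of this key
      }
    }
    fired.result()
  }

  def main(args: Array[String]): Unit = {
    val events = Seq(
      ("sensor_1", 1.0), ("sensor_2", 5.0), ("sensor_1", 2.0),
      ("sensor_2", 6.0), ("sensor_1", 3.0)
    )
    // Five elements arrived in total, but only sensor_1 has reached three.
    println(tumblingCount(events, size = 3))
    // prints List((sensor_1,List(1.0, 2.0, 3.0)))
  }
}
```

In this run sensor_2 has two buffered values and produces no output, even though five elements have been received overall.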

3.1 Tumbling window

By default a CountWindow is a tumbling window: you only specify the window size, and the window fires when the number of elements reaches that size.

import com.chen.flink.part01.SensorReading
import org.apache.flink.streaming.api.scala._

object CountWindowDemo {
  def main(args: Array[String]): Unit = {
    val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment

    // Each input line carries three comma-separated fields: id, a Long value, temperature.
    val dataStream: DataStream[String] = env.socketTextStream("192.168.199.101", 7777)
    val sensorStream: DataStream[SensorReading] = dataStream.map(
      data => {
        val strings = data.split(",")
        SensorReading(strings(0), strings(1).toLong, strings(2).toDouble)
      }
    )

    // Tumbling count window: maximum temperature per sensor id over every 5 records with the same key.
    val resultStream: DataStream[(String, Double)] = sensorStream
      .map(x => (x.id, x.temperature))
      .keyBy(_._1) // key by sensor id, the first tuple field
      .countWindow(5)
      .reduce((r1, r2) => (r1._1, r1._2.max(r2._2)))

    resultStream.print("countwindow").setParallelism(1)
    env.execute("countwindow")
  }
}

A result is output once the number of input records for a key reaches 5.
Then, after 3 sensor_6 records are entered, the corresponding output for that key appears.

3.2 Sliding window

The sliding window uses the same function name as the tumbling window, but takes two arguments: window_size and sliding_size.
With sliding_size set to 2 and window_size set to 10, a result is computed each time two records with the same key are received, and each computation covers a window of up to 10 elements.
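A minimal sketch of this configuration, modeled on the tumbling count window demo in section 3.1 and assuming the Scala DataStream API of a Flink 1.x release. The local `SensorReading` case class and the `parse` helper are hypothetical stand-ins for the author's class and parsing code, and the max-temperature reduce is carried over from section 3.1 as an assumption.

```scala
import org.apache.flink.streaming.api.scala._

object SlidingCountWindowSketch {
  // Hypothetical stand-in for com.chen.flink.part01.SensorReading.
  case class SensorReading(id: String, timestamp: Long, temperature: Double)

  // Hypothetical helper: parses one "id,long,temperature" line.
  def parse(line: String): SensorReading = {
    val fields = line.split(",").map(_.trim)
    SensorReading(fields(0), fields(1).toLong, fields(2).toDouble)
  }

  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val sensorStream: DataStream[SensorReading] =
      env.socketTextStream("192.168.199.101", 7777).map(line => parse(line))

    // Sliding count window: window_size = 10, sliding_size = 2.
    val resultStream: DataStream[(String, Double)] = sensorStream
      .map(x => (x.id, x.temperature))
      .keyBy(_._1)
      .countWindow(10, 2)
      .reduce((r1, r2) => (r1._1, r1._2.max(r2._2)))

    resultStream.print("countwindow").setParallelism(1)
    env.execute("sliding countwindow")
  }
}
```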
When two sensor_1 records are input, output is produced.
When two more sensor_1 records are input, output is produced again; this time the result is computed over all the records received so far (that is, all 4 sensor_1 inputs).
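The firing pattern described above can be reproduced with a short plain-Scala sketch for a single key. The function `slidingMax` is a hypothetical name, and the use of a maximum as the aggregate is an assumption carried over from section 3.1: it produces a result after every `slide` elements, computed over the most recent `size` elements, or over fewer if not that many have arrived yet.

```scala
object SlidingCountSketch {
  // For a single key: fire after every `slide` elements, aggregating the
  // most recent `size` elements (fewer if not enough have arrived yet).
  def slidingMax(values: Seq[Double], size: Int, slide: Int): List[Double] =
    (slide to values.length by slide)
      .map(n => values.slice(math.max(0, n - size), n).max)
      .toList

  def main(args: Array[String]): Unit = {
    // Four sensor_1 temperatures, window_size = 10, sliding_size = 2.
    println(slidingMax(Seq(35.0, 36.5, 34.2, 38.1), size = 10, slide = 2))
    // prints List(36.5, 38.1)
  }
}
```

The first result covers the first two values, and the second covers all four, because fewer than 10 elements have arrived. Older elements only drop out of the computation once more than `size` elements have been received.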

4 Window functions

A window function defines the computation to perform on the data collected in a window. Window functions fall into two main categories:
 Incremental aggregation functions
Perform the computation as each record arrives and maintain only a simple state. Typical incremental aggregation functions are ReduceFunction and AggregateFunction.
 Full window functions
First collect all the data in the window, then iterate over all of it when the computation runs. ProcessWindowFunction is a full window function.
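The two categories can be contrasted with a minimal sketch, assuming the Scala DataStream API of a Flink 1.x release. The class names `AvgTemperature` and `MedianTemperature` and the local `SensorReading` case class are hypothetical and chosen for this illustration. The average needs only a running (sum, count) pair, so it fits an AggregateFunction; the median needs every element of the window, so it is written as a ProcessWindowFunction.

```scala
import org.apache.flink.api.common.functions.AggregateFunction
import org.apache.flink.streaming.api.scala.function.ProcessWindowFunction
import org.apache.flink.streaming.api.windowing.windows.TimeWindow
import org.apache.flink.util.Collector

object WindowFunctionSketch {
  // Hypothetical stand-in for com.chen.flink.part01.SensorReading.
  case class SensorReading(id: String, timestamp: Long, temperature: Double)

  // Incremental aggregation: the state is just (sum, count), updated per record.
  class AvgTemperature extends AggregateFunction[SensorReading, (Double, Long), Double] {
    override def createAccumulator(): (Double, Long) = (0.0, 0L)
    override def add(value: SensorReading, acc: (Double, Long)): (Double, Long) =
      (acc._1 + value.temperature, acc._2 + 1)
    override def getResult(acc: (Double, Long)): Double = acc._1 / acc._2
    override def merge(a: (Double, Long), b: (Double, Long)): (Double, Long) =
      (a._1 + b._1, a._2 + b._2)
  }

  // Full window function: all records of the window are buffered and passed in at once.
  class MedianTemperature
      extends ProcessWindowFunction[SensorReading, (String, Double), String, TimeWindow] {
    override def process(key: String,
                         context: Context,
                         elements: Iterable[SensorReading],
                         out: Collector[(String, Double)]): Unit = {
      val sorted = elements.map(_.temperature).toSeq.sorted
      out.collect((key, sorted(sorted.size / 2))) // upper median of the window
    }
  }
}
```

On a keyed, windowed stream of `SensorReading` these would be applied as `.aggregate(new AvgTemperature)` or `.process(new MedianTemperature)` respectively.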


Origin blog.csdn.net/Keyuchen_01/article/details/118498683