Flink from Getting Started to Loving It (12: Time Windows, One of Flink's Key Tools)

Flink supports many kinds of windows, including time windows, session windows, and count windows, which covers most windowing needs you are likely to have.

Time Windows
The simplest and most commonly used windows are time-based. Flink supports three kinds of time windows:

The first: tumbling time window

A tumbling time window has a fixed size. For example, with a 1-minute window, each window covers only the data that falls within its own minute and ignores data from the previous or the next minute.
Windows are aligned in time and do not overlap, so each element belongs to exactly one window.
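
The alignment described above can be sketched in plain Scala. This is a minimal illustration, not Flink's own code: the object and function names are hypothetical, timestamps are assumed to be non-negative, and windows are assumed to start at multiples of the window size.

```scala
object TumblingSketch {
  // Start of the tumbling window that contains `ts`.
  // Assumes ts >= 0 and size > 0, both in the same time unit.
  def windowStart(ts: Long, size: Long): Long = ts - (ts % size)

  // Half-open window [start, end) that `ts` belongs to.
  def windowFor(ts: Long, size: Long): (Long, Long) = {
    val start = windowStart(ts, size)
    (start, start + size)
  }

  def main(args: Array[String]): Unit = {
    val size = 60L // a 1-minute window, expressed in seconds
    Seq(0L, 59L, 60L, 125L).foreach { ts =>
      println(s"ts=$ts -> window ${windowFor(ts, size)}")
    }
  }
}
```

Because every timestamp maps to exactly one start value, no element can land in two windows.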

The second: sliding time window

As the name implies, a sliding window moves forward over time. Two parameters define it:

Window: the size (length) of the window
Slide: how far the window advances each time. The slide is normally no larger than the window size; a larger slide leaves gaps between windows, so some data falls into no window at all.
A sliding window is a generalization of the tumbling window: it is defined by a fixed window length plus a slide interval.
The window length is fixed, and consecutive windows can overlap, so one element may belong to several windows.
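
To make the overlap concrete, here is a small plain-Scala sketch that lists every window containing a given timestamp. It is an illustration under assumptions, not Flink's implementation: the names are hypothetical, and windows are assumed to start at multiples of the slide.

```scala
object SlidingSketch {
  // All half-open windows [start, start + size) that contain `ts`.
  // Assumes ts >= 0, size > 0 and slide > 0, all in the same time unit.
  def windowsFor(ts: Long, size: Long, slide: Long): List[(Long, Long)] = {
    val lastStart = ts - (ts % slide) // latest window start that is <= ts
    Iterator.iterate(lastStart)(_ - slide)
      .takeWhile(start => start > ts - size) // window must still cover ts
      .map(start => (start, start + size))
      .toList
  }

  def main(args: Array[String]): Unit = {
    // 15-second windows sliding every 5 seconds: ts = 17 lands in three windows
    println(windowsFor(17L, 15L, 5L))
    // slide larger than size: ts = 17 lands in no window
    println(windowsFor(17L, 5L, 10L))
  }
}
```

With a window of 15 and a slide of 5, each element belongs to 15 / 5 = 3 windows; with a slide equal to the size, the result is a single window, which is the tumbling case.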

The third: session window

A session window groups a series of events that arrive close together and is bounded by a timeout gap of a specified length: if no new data arrives within the gap, the current window closes, and the next element starts a new window.
Main feature: windows are not aligned in time; their start and end depend on when the data arrives.
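
The gap rule can be sketched without Flink as a fold over sorted timestamps. This is a simplified model with hypothetical names; it treats a pause of exactly `gap` as still belonging to the same session, which is an assumption of this sketch.

```scala
object SessionSketch {
  // Split timestamps into sessions: a new session starts whenever the pause
  // since the previous event is longer than `gap`.
  def sessions(timestamps: Seq[Long], gap: Long): List[List[Long]] =
    timestamps.sorted
      .foldLeft(List.empty[List[Long]]) {
        // `current.head` is the most recent event of the open session
        case (current :: done, ts) if ts - current.head <= gap =>
          (ts :: current) :: done
        case (acc, ts) =>
          List(ts) :: acc
      }
      .map(_.reverse) // restore arrival order inside each session
      .reverse        // restore session order

  def main(args: Array[String]): Unit =
    println(sessions(Seq(1L, 2L, 3L, 20L, 21L, 50L), gap = 10L))
}
```

The session boundaries come entirely from the data, which is why these windows are not aligned to the clock.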


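The window() method takes a WindowAssigner as its input parameter.

The WindowAssigner is responsible for routing each incoming element to the correct window or windows.

Flink provides general-purpose WindowAssigners for:
Tumbling windows
Sliding windows
Session windows
Global windows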

Creating different types of windows

Tumbling time window
.timeWindow(Time.seconds(15))
Sliding time window
.timeWindow(Time.seconds(15), Time.seconds(5))

Session window
.window(EventTimeSessionWindows.withGap(Time.minutes(10)))

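Window functions
A window function defines the computation applied to the data collected in a window. There are two categories:
Incremental aggregation functions
Each element is folded into the result as soon as it arrives, so only a small piece of state is kept
ReduceFunction, AggregateFunction
Full window functions
All elements of the window are buffered first, and the function iterates over them when the window is evaluated
ProcessWindowFunction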
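
The difference between the two categories can be illustrated with a plain-Scala model. The classes below are hypothetical stand-ins, not Flink types: the first keeps a single running value the way an incremental aggregation would, and the second buffers everything so that a result needing the whole window (here a median) can be computed at evaluation time.

```scala
import scala.collection.mutable.ArrayBuffer

object WindowFunctionSketch {
  // Incremental style: one small piece of state, updated per element.
  final class RunningMin {
    private var state: Option[Double] = None
    def add(value: Double): Unit =
      state = Some(state.fold(value)(math.min(_, value)))
    def result: Option[Double] = state
  }

  // Full-window style: every element is kept until the window is evaluated.
  final class BufferedWindow {
    private val buffer = ArrayBuffer.empty[Double]
    def add(value: Double): Unit = buffer += value
    def bufferedCount: Int = buffer.size
    // Upper median; needs all elements at once, so it cannot be a simple running value.
    def median: Option[Double] =
      if (buffer.isEmpty) None
      else Some(buffer.sorted.apply(buffer.size / 2))
  }

  def main(args: Array[String]): Unit = {
    val temps = Seq(40.1, 20.0, 40.2)
    val running = new RunningMin
    val buffered = new BufferedWindow
    temps.foreach { t => running.add(t); buffered.add(t) }
    println(s"running min = ${running.result}, buffered = ${buffered.bufferedCount}, median = ${buffered.median}")
  }
}
```

The trade-off is state size against flexibility: the running value stays constant in size, while the buffer grows with the window but can answer questions about the window as a whole.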

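Other commonly used APIs
.trigger() ----- trigger
Defines when the window fires, that is, when the computation runs and the result is emitted
.evictor() ----- evictor
Defines the logic for removing certain elements from the window
.allowedLateness() ----- allows late data to be processed
.sideOutputLateData() ----- routes late data to a side output stream
.getSideOutput() ----- retrieves the side output stream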
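
How the lateness settings interact can be sketched as a small decision function. This is a simplified plain-Scala model of event-time behavior as I understand it, with hypothetical names: a window is assumed to fire once the watermark reaches its last timestamp (end minus 1), and to be discarded once the watermark passes that point by the allowed lateness.

```scala
object LatenessSketch {
  sealed trait Decision
  case object OnTime extends Decision          // window has not fired yet
  case object LateButAccepted extends Decision // window fired, but is kept for late updates
  case object TooLate extends Decision         // window is gone; candidate for the side output

  // Classify an element for the window ending at `windowEnd` (exclusive),
  // given the current watermark. All values share one time unit.
  def classify(windowEnd: Long, allowedLateness: Long, watermark: Long): Decision = {
    val maxTimestamp = windowEnd - 1
    if (watermark < maxTimestamp) OnTime
    else if (watermark < maxTimestamp + allowedLateness) LateButAccepted
    else TooLate
  }

  def main(args: Array[String]): Unit = {
    // A 15 s window [0, 15000) with one minute of allowed lateness, in milliseconds
    Seq(10000L, 20000L, 80000L).foreach { wm =>
      println(s"watermark=$wm -> ${classify(15000L, 60000L, wm)}")
    }
  }
}
```

In this model, elements classified as TooLate are the ones that .sideOutputLateData() would capture and .getSideOutput() would expose.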

Theory only goes so far, so here is an example.

Suppose a batch of data is read from a file and aggregated every 15 seconds, producing for each sensor the minimum temperature seen in the window together with the latest timestamp.

Create a new Scala object, WindowTest.scala:

package com.mafei.apitest

import com.mafei.sinktest.SensorReadingTest5
import org.apache.flink.api.common.functions.ReduceFunction
import org.apache.flink.streaming.api.scala.{StreamExecutionEnvironment, createTypeInformation}
import org.apache.flink.streaming.api.windowing.time.Time

object WindowTest {
  def main(args: Array[String]): Unit = {
    // Create the execution environment
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)      // window on event time
    // env.setStreamTimeCharacteristic(TimeCharacteristic.IngestionTime)  // window on the time the data enters Flink
    // env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime) // window on the time Flink actually processes the data
    // If nothing is printed, the input is probably so small that the job finishes before the 15 s window fires; test with a socket or Kafka source instead
    val inputStream = env.readTextFile("/opt/java2020_study/maven/flink1/src/main/resources/sensor.txt")

    env.setParallelism(1)
    inputStream.print()

    // Convert each line into the case class first
    val dataStream = inputStream
      .map(data => {
        val arr = data.split(",") // split the line on commas
        SensorReadingTest5(arr(0), arr(1).toLong, arr(2).toDouble) // build a sensor reading; split() yields strings, hence toLong and toDouble
      })

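    // Every 15 seconds, compute each sensor's minimum temperature in the window, together with the latest timestamp
    val resultStream = dataStream
      .map(data => (data.id, data.temperature, data.timestamp))
      .keyBy(_._1) // group by the first element of the tuple (id)
      // .window(TumblingEventTimeWindows.of(Time.seconds(15))) // tumbling time window
      // .window(SlidingProcessingTimeWindows.of(Time.seconds(15), Time.seconds(3))) // sliding time window: 15 s long, advancing 3 s at a time
      // .window(EventTimeSessionWindows.withGap(Time.seconds(15))) // session window: a pause of more than 15 s starts the next session
      // .countWindow(15) // tumbling count window
      .timeWindow(Time.seconds(15)) // tumbling time window, evaluated every 15 seconds (deprecated since Flink 1.12 in favor of .window(...))
      // .minBy(1) // minimum by the second element; enough on its own if only the minimum temperature is needed
      .reduce((curRes, newData) => (curRes._1, curRes._2.min(newData._2), newData._3))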

    resultStream.print()
    env.execute()

  }
}

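// A custom ReduceFunction with the same logic as the reduce above. It works on SensorReadingTest5,
// so apply it to a stream keyed directly on the case class (without the map to a tuple).
class MyReducer extends ReduceFunction[SensorReadingTest5] {
  override def reduce(t: SensorReadingTest5, t1: SensorReadingTest5): SensorReadingTest5 =
    SensorReadingTest5(t.id, t1.timestamp, t.temperature.min(t1.temperature))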
}

Prepare a sensor.txt file (one reading per line: id, timestamp, temperature) and place it at the path passed to readTextFile:

sensor1,1603766281,1
sensor2,1603766282,42
sensor3,1603766283,43
sensor4,1603766240,40.1
sensor4,1603766284,20
sensor4,1603766249,40.2

[Figure: structure of the final code and its run output]

Origin blog.51cto.com/mapengfei/2554577