The Four Major WindowAssigners in Flink

Windows are at the core of stream computing. A window splits an unbounded stream into finite-size "buckets" over which we can apply aggregate computations (ProcessWindowFunction, ReduceFunction, AggregateFunction, FoldFunction, and the like). The basic structure of a windowed computation in Flink is as follows:

Keyed Windows

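stream
    .keyBy(...)
    .window(...)               <-  required: the window type (assigner)
    [.trigger(...)]            <-  optional: "trigger" (every assigner has a default trigger), decides when the window fires
    [.evictor(...)]            <-  optional: "evictor" (default: none), removes elements from the window
    [.allowedLateness(...)]    <-  optional: "lateness" (default 0, meaning late data is not allowed)
    [.sideOutputLateData(...)] <-  optional: "output tag", sends late data to the specified side output
    .reduce/aggregate/fold/apply()  <-  required: "function", performs the aggregation over the window data
    [.getSideOutput(...)]      <-  optional: "output tag", retrieves the side-output data, typically to handle late data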

Non-Keyed Windows

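stream
    .windowAll(...)            <-  required: the window type (assigner)
    [.trigger(...)]            <-  optional: "trigger" (every assigner has a default trigger), decides when the window fires
    [.evictor(...)]            <-  optional: "evictor" (default: none), removes elements from the window
    [.allowedLateness(...)]    <-  optional: "lateness" (default 0, meaning late data is not allowed)
    [.sideOutputLateData(...)] <-  optional: "output tag", sends late data to the specified side output
    .reduce/aggregate/fold/apply()  <-  required: "function", performs the aggregation over the window data
    [.getSideOutput(...)]      <-  optional: "output tag", retrieves the side-output data, typically to handle late data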

A window is created as soon as the first element that belongs to it arrives, and it is completely removed once the time (event time or processing time) passes its end timestamp plus any allowed lateness. Flink guarantees removal only for time-based windows, not for other window types such as global windows.
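The lifecycle rule above can be sketched in plain Scala. This is a simplified model, not Flink's actual operator code: `TimeWindowModel`, `cleanupTime` and `isExpired` are hypothetical names introduced here, and the sketch assumes an event-time style window whose last valid timestamp is `end - 1`.

```scala
object WindowLifecycleSketch {
  // Hypothetical stand-in for a time window covering [start, end)
  final case class TimeWindowModel(start: Long, end: Long) {
    // Largest timestamp that still belongs to the window
    def maxTimestamp: Long = end - 1
  }

  // Assumed rule: the window may be dropped once time reaches
  // its max timestamp plus the allowed lateness
  def cleanupTime(w: TimeWindowModel, allowedLatenessMs: Long): Long =
    w.maxTimestamp + allowedLatenessMs

  def isExpired(w: TimeWindowModel, allowedLatenessMs: Long, currentTime: Long): Boolean =
    cleanupTime(w, allowedLatenessMs) <= currentTime
}
```

With a window of [0, 5000) and an allowed lateness of 1000 ms, this model keeps the window until the current time reaches 5999.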

In addition, each window has a Trigger and a function (ProcessWindowFunction, ReduceFunction, AggregateFunction or FoldFunction) attached to it. The function contains the computation to apply to the window contents, and the Trigger specifies the conditions under which the window is considered ready for the function to be applied.

You can also specify an Evictor, which can remove elements from the window after the trigger fires, either before or after the function is applied (or both).

This article mainly introduces Flink's Window Assigners.

Window Assigners

The window assigner defines how elements are assigned to windows. You choose it by passing a WindowAssigner to the window(...) call (for keyed streams) or the windowAll(...) call (for non-keyed streams).

A WindowAssigner is responsible for assigning each incoming element to one or more windows. Flink ships with predefined window assigners for the most common use cases: tumbling windows, sliding windows, session windows and global windows. Below I share an example for each window type. The examples in this article are based on processing time; later posts will cover event time.

(1) Tumbling Windows

A tumbling window has a fixed length, and its slide interval equals the window length, so windows never overlap and each element belongs to exactly one window.

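import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time

val env = StreamExecutionEnvironment.getExecutionEnvironment
env.socketTextStream("centos", 9999)
    .flatMap(_.split("\\s+"))                   // split each line into words
    .map((_, 1))
    .keyBy(_._1)                                // key by the word
    .window(TumblingProcessingTimeWindows.of(Time.seconds(5)))
    .reduce((v1, v2) => (v1._1, v1._2 + v2._2)) // sum the counts per word in each 5-second window
    .print()
env.execute("window")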
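How a timestamp maps to a tumbling window can be sketched without Flink. The following is a minimal model, assuming windows aligned to the epoch with an optional offset; `TumblingSketch`, `windowStart` and `assign` are names made up for this illustration, and the arithmetic follows the usual "round down to a multiple of the window size" approach.

```scala
object TumblingSketch {
  // Start of the window containing `timestamp`, for windows of length `size`
  // shifted by `offset` (all values in milliseconds)
  def windowStart(timestamp: Long, size: Long, offset: Long = 0L): Long =
    timestamp - (timestamp - offset + size) % size

  // The single window [start, end) that a timestamp falls into
  def assign(timestamp: Long, size: Long): (Long, Long) = {
    val start = windowStart(timestamp, size)
    (start, start + size)
  }
}
```

For example, with 5-second windows a timestamp of 12345 ms lands in the window [10000, 15000), and a timestamp of exactly 15000 ms starts the next window.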

(2) Sliding Windows

A sliding window has a fixed length and a slide interval. When the window length is greater than the slide interval, as is typical, consecutive windows overlap and an element can belong to several windows.

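import java.text.SimpleDateFormat
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.scala.function.ProcessWindowFunction
import org.apache.flink.streaming.api.windowing.assigners.SlidingProcessingTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.streaming.api.windowing.windows.TimeWindow
import org.apache.flink.util.Collector

val env = StreamExecutionEnvironment.getExecutionEnvironment
env.socketTextStream("centos", 9999)
    .flatMap(_.split("\\s+"))
    .map((_, 1))
    .keyBy(_._1)
    // window length 4 s, slide interval 2 s
    .window(SlidingProcessingTimeWindows.of(Time.seconds(4), Time.seconds(2)))
    .process(new ProcessWindowFunction[(String, Int), String, String, TimeWindow] {
        override def process(key: String, context: Context,
                             elements: Iterable[(String, Int)],
                             out: Collector[String]): Unit = {
            val sdf = new SimpleDateFormat("HH:mm:ss")
            val window = context.window
            // print the window bounds, then every element in the window
            println(sdf.format(window.getStart) + "\t" + sdf.format(window.getEnd))
            for (e <- elements) {
                print(e + "\t")
            }
            println()
        }
    })
env.execute("window")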
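To make the overlap concrete, here is a small Flink-free sketch of which sliding windows a single timestamp belongs to. It is an illustration under assumptions (epoch-aligned windows, no offset), and `SlidingSketch` and `assign` are hypothetical names, not Flink API.

```scala
object SlidingSketch {
  // All windows [start, end) of length `size`, started every `slide` ms,
  // that contain `timestamp`; the latest window comes first
  def assign(timestamp: Long, size: Long, slide: Long): List[(Long, Long)] = {
    // Start of the most recent window that began at or before the timestamp
    val lastStart = timestamp - (timestamp + slide) % slide
    Iterator
      .iterate(lastStart)(_ - slide)      // walk back one slide at a time
      .takeWhile(_ > timestamp - size)    // stop once the window no longer covers the timestamp
      .map(start => (start, start + size))
      .toList
  }
}
```

With a 4-second window and a 2-second slide, as in the job above, every timestamp falls into two windows: size divided by slide.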

(3) Session Windows (Merging Windows)

Session windows group elements by the time gap between them. If the gap between consecutive elements is smaller than the session gap, the elements are merged into the same window; if it is larger, the current window closes and subsequent elements belong to a new window. Unlike tumbling and sliding windows, a session window has no fixed size, and under the hood it is implemented by merging windows.

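import java.text.SimpleDateFormat
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.scala.function.WindowFunction
import org.apache.flink.streaming.api.windowing.assigners.ProcessingTimeSessionWindows
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.streaming.api.windowing.windows.TimeWindow
import org.apache.flink.util.Collector

val env = StreamExecutionEnvironment.getExecutionEnvironment
env.socketTextStream("centos", 9999)
    .flatMap(_.split("\\s+"))
    .map((_, 1))
    .keyBy(_._1)
    // a session ends after 5 s without new elements for the key
    .window(ProcessingTimeSessionWindows.withGap(Time.seconds(5)))
    .apply(new WindowFunction[(String, Int), String, String, TimeWindow] {
        override def apply(key: String, window: TimeWindow,
                           input: Iterable[(String, Int)],
                           out: Collector[String]): Unit = {
            val sdf = new SimpleDateFormat("HH:mm:ss")
            // print the bounds of the merged session window, then its elements
            println(sdf.format(window.getStart) + "\t" + sdf.format(window.getEnd))
            for (e <- input) {
                print(e + "\t")
            }
            println()
        }
    })
env.execute("window")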
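The merging idea can be sketched in plain Scala: each element first gets its own window [t, t + gap), and windows that overlap or touch are merged. This is a simplified batch model under that assumption, with hypothetical names (`SessionSketch`, `sessions`); a real streaming engine merges incrementally as elements arrive.

```scala
object SessionSketch {
  // Merge per-element windows [t, t + gap) into sessions, in time order
  def sessions(timestamps: Seq[Long], gap: Long): List[(Long, Long)] =
    timestamps.sorted.foldLeft(List.empty[(Long, Long)]) {
      // The element falls inside (or touches) the current session: extend it
      case ((start, end) :: rest, t) if t <= end =>
        (start, math.max(end, t + gap)) :: rest
      // Otherwise the gap was exceeded: open a new session
      case (acc, t) =>
        (t, t + gap) :: acc
    }.reverse
}
```

For elements at 0, 2000, 4000 and 12000 ms with a 5-second gap, the first three merge into one session [0, 9000) and the last one opens a new session [12000, 17000).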

(4) Global Windows

The global window puts all elements with the same key into a single window. By default this window never fires, because its default trigger (NeverTrigger) never triggers, so users need to specify their own Trigger.

Note: The global window differs from the previous three windows. Those are all time-based, whereas the global window has no time bounds; in the example here it is driven by element count through a CountTrigger.

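import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.scala.function.WindowFunction
import org.apache.flink.streaming.api.windowing.assigners.GlobalWindows
import org.apache.flink.streaming.api.windowing.triggers.CountTrigger
import org.apache.flink.streaming.api.windowing.windows.GlobalWindow
import org.apache.flink.util.Collector

val env = StreamExecutionEnvironment.getExecutionEnvironment
env.socketTextStream("centos", 9999)
    .flatMap(_.split("\\s+"))
    .map((_, 1))
    .keyBy(_._1)
    .window(GlobalWindows.create())
    // fire every 3 elements per key; CountTrigger fires without purging,
    // so each firing sees all elements accumulated so far for that key
    .trigger(CountTrigger.of[GlobalWindow](3))
    .apply(new WindowFunction[(String, Int), String, String, GlobalWindow] {
        override def apply(key: String, window: GlobalWindow,
                           input: Iterable[(String, Int)],
                           out: Collector[String]): Unit = {
            println("=======window========")
            for (e <- input) {
                print(e + "\t")
            }
            println()
        }
    })
env.execute("window")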
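The interplay of a global window and a count-based trigger can be modelled in a few lines of plain Scala. This sketch assumes the trigger fires without purging, so the per-key buffer keeps growing between firings; `GlobalCountSketch` and `run` are hypothetical names for this illustration only.

```scala
import scala.collection.mutable

object GlobalCountSketch {
  // Feed words one by one; return (key, window contents) for every firing
  def run(words: Seq[String], maxCount: Int): List[(String, List[String])] = {
    val buffers = mutable.Map.empty[String, Vector[String]] // one never-ending window per key
    val counts  = mutable.Map.empty[String, Int].withDefaultValue(0)
    val firings = mutable.ListBuffer.empty[(String, List[String])]
    for (w <- words) {
      buffers(w) = buffers.getOrElse(w, Vector.empty) :+ w
      counts(w) += 1
      if (counts(w) >= maxCount) {
        counts(w) = 0                       // reset the counter, keep the buffered elements
        firings += (w -> buffers(w).toList) // the function sees everything buffered so far
      }
    }
    firings.toList
  }
}
```

In this model, a key that receives six elements with a count of 3 fires twice: first with three elements, then with all six. If each firing should see only the newest elements, the buffer would have to be cleared on firing as well.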

That covers Flink's WindowAssigners, which are relatively easy to understand; this post gave only a brief introduction. In the next post I will share Flink's four major window functions.


Origin blog.csdn.net/qq_44962429/article/details/112912432