Time and Windows in Flink

One, Time

Flink streaming involves several different notions of time:

Event Time: the time at which an event is created. It is usually described by a timestamp inside the event itself; for example, each record in collected log data carries its own generation time, and Flink accesses the event timestamp through a timestamp assigner.

Ingestion Time: the time at which the data enters Flink.

Processing Time: the local system time of the machine on which each time-based operator executes; it is machine-dependent. Processing Time is the default time characteristic.

For example, a log entry enters Flink at 2017-11-12 10:00:00.123, and the system time when it reaches the window operator is 2017-11-12 10:00:01.234. The log reads:

2017-11-02 18:37:15.624 INFO Fail over to rm2

If the business requirement is to count the number of failure logs within 1 minute, which of these times matters most? EventTime, because we want to compute the statistics based on the time at which the logs were generated.

  

If we tried to aggregate the stream directly, there would be no way to do it, because the data stream is unbounded; this is where windows come in.

 

Two, Window

1, Streaming computation is a data processing engine designed to process unbounded data sets. An unbounded data set is a continuously growing, essentially infinite data set, and a window is a means of cutting unbounded data into finite blocks for processing.

Windows are the core of processing unbounded streams: a window splits an infinite stream into finite-size "buckets" on which we can perform computations.

There are two window types and, counting their variants, five kinds of windows.

2, Window types

2.1, CountWindow: generates windows according to a specified number of data records, independent of time.

2.2, TimeWindow: generates windows according to time. (By default, windows are divided by Processing Time.)

For both TimeWindow and CountWindow, windows can be divided into three types according to how they are implemented: tumbling windows (Tumbling Window), sliding windows (Sliding Window), and session windows (Session Window).

(1) Tumbling Windows

Slices the data into windows of a fixed length.

Features: time-aligned, fixed window length, no overlap.

The tumbling window assigner assigns each element to a window of the specified window size; tumbling windows have a fixed size and do not overlap.
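As a rough illustration, here is a minimal sketch of a 1-minute processing-time tumbling count per key (the socket source, port 11111 and the 1-minute size are chosen just for this example):

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time

object TumblingWindowSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // One count per distinct line, accumulated in non-overlapping 1-minute windows.
    env.socketTextStream("localhost", 11111)
      .map(item => (item, 1L))
      .keyBy(0)
      .window(TumblingProcessingTimeWindows.of(Time.minutes(1)))
      .reduce((a, b) => (a._1, a._2 + b._2))
      .print()
    env.execute("TumblingWindowSketch")
  }
}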

(2) Sliding Windows

A sliding window is a more generalized form of a fixed window: it is defined by a fixed window length plus a sliding interval.

Features: time-aligned, fixed window length, overlapping windows.

The sliding window assigner assigns elements to windows of a fixed length. Similar to tumbling windows, the window size is configured by the window size parameter; an additional slide parameter controls how frequently a new sliding window starts.

Therefore, if the slide parameter is smaller than the window size, the windows overlap, and in that case each element is assigned to multiple windows.

Usage scenario: statistics over the most recent period of time (for example, compute an interface's failure rate over the last 5 minutes to decide whether to raise an alarm), as sketched below.
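A minimal sketch of that scenario, counting log lines that contain "fail" over the last 5 minutes, re-evaluated every 10 seconds (processing time; the socket source, the "fail" filter and the 10-second slide are assumptions made for illustration):

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.SlidingProcessingTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time

object SlidingWindowSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // Windows of 5 minutes that slide every 10 seconds overlap, so each element
    // is counted in several windows.
    env.socketTextStream("localhost", 11111)
      .filter(line => line.toLowerCase.contains("fail"))
      .map(_ => ("fail", 1L))
      .keyBy(0)
      .window(SlidingProcessingTimeWindows.of(Time.minutes(5), Time.seconds(10)))
      .reduce((a, b) => (a._1, a._2 + b._2))
      .print()
    env.execute("SlidingWindowSketch")
  }
}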

(3) Session Windows

Composed of a series of events plus a timeout gap of a specified length, similar to a session in a web application: a new window is generated when no new data has been received for a period of time.

Features: not time-aligned.

The session window assigner groups elements by session activity. Compared with tumbling and sliding windows, session windows do not overlap and do not have a fixed start and end time. Instead, a session window closes when it has not received elements for a certain period of time, i.e., when a gap of inactivity occurs. A session window assigner is configured with a session gap that defines the length of this inactive period; when the period of inactivity occurs, the current session closes and subsequent elements are assigned to a new session window.
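A minimal processing-time sketch with a 5-second gap (an event-time session window appears in the full example in section four; the source and gap length here are illustrative):

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.ProcessingTimeSessionWindows
import org.apache.flink.streaming.api.windowing.time.Time

object SessionWindowSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // A key's session window closes once no new element for that key has arrived
    // for 5 seconds; later elements start a new session window.
    env.socketTextStream("localhost", 11111)
      .map(item => (item, 1L))
      .keyBy(0)
      .window(ProcessingTimeSessionWindows.withGap(Time.seconds(5)))
      .reduce((a, b) => (a._1, a._2 + b._2))
      .print()
    env.execute("SessionWindowSketch")
  }
}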

 

Three, Window API

3.1、CountWindow (tumbling window)

CountWindow triggers execution based on the number of elements with the same key in the window; when it executes, it only computes results for keys whose element count has reached the window size.

Note: the window_size of CountWindow refers to the number of elements with the same key, not the total number of all input elements.

import org.apache.flink.api.java.tuple.Tuple
import org.apache.flink.streaming.api.scala._


/**
  * CountWindow tumbling window (Tumbling Window)
  * Slices the data into windows of a fixed number of elements.
  */
object TimeAndWindow {
  def main(args: Array[String]): Unit = {

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val stream: DataStream[String] = env.socketTextStream("localhost", 11111)
    val streamKeyBy: KeyedStream[(String, Long), Tuple] = stream.map(item => (item, 1L)).keyBy(0)
    // Note: the CountWindow window_size is the number of elements with the same key,
    // not the total number of all input elements.
    val streamWindow: DataStream[(String, Long)] = streamKeyBy.countWindow(5)
      .reduce((item1, item2) => (item1._1, item1._2 + item2._2))

    streamWindow.print()
    env.execute("TimeAndWindow")

  }
}
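With countWindow(5), nothing is emitted until five elements with the same key have arrived; for example, sending the same word five times through the socket (e.g. one started with nc -lk 11111) should print (word,5).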

3.2、CountWindow (sliding window)

import org.apache.flink.api.java.tuple.Tuple
import org.apache.flink.streaming.api.scala._


/**
  * CountWindow sliding window (Sliding Window)
  * Fires every `slide` elements and evaluates the last `window_size` elements.
  */
object TimeAndWindow {
  def main(args: Array[String]): Unit = {

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val stream: DataStream[String] = env.socketTextStream("localhost", 11111)
    val streamKeyBy: KeyedStream[(String, Long), Tuple] = stream.map(item => (item, 1L)).keyBy(0)
    // Note: window_size counts elements with the same key, not all input elements.
    // The window fires every `slide` (second parameter) elements and covers at most
    // the last `window_size` (first parameter) elements.
    val streamWindow: DataStream[(String, Long)] = streamKeyBy.countWindow(5, 2)
      .reduce((item1, item2) => (item1._1, item1._2 + item2._2))

    streamWindow.print()
    env.execute("TimeAndWindow")

  }
}
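With countWindow(5, 2), the window fires every 2 elements of a given key and the reduce covers at most the last 5 of them, so for one repeated word the output should progress roughly as (word,2), (word,4), (word,5), (word,5), ...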

Four, EventTime and Window

1, Introduction to EventTime

In Flink streaming, the vast majority of business use cases use eventTime; only when eventTime cannot be used do we fall back to ProcessingTime or IngestionTime.

To use EventTime, we first need to introduce the EventTime time characteristic; it is enabled as follows.
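For example, at the start of the program (this is the same setting used in the full example at the end of this post):

import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.scala._

val env = StreamExecutionEnvironment.getExecutionEnvironment
// From the moment of this call, every stream created from env uses EventTime.
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)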

2、Watermark

  Concept: We know that stream processing starts from the moment an event is generated; the event flows through the source and then through the operators, and this takes both processing and time. Although in most cases the data arriving at an operator is ordered by the events' timestamps, we cannot rule out that, because of the network, backpressure and other reasons, data arrives out of order. Out-of-order means that the sequence of events received by Flink is not strictly ordered by their EventTime.

  Watermark is a mechanism for measuring the progress of Event Time; it is a hidden attribute of the data, and the data itself carries the corresponding Watermark.

  Watermark is used to handle out-of-order events, and correctly handling out-of-order events is usually achieved by combining the Watermark mechanism with windows.

  A Watermark in the data stream indicates that all data with eventTime smaller than the Watermark has already arrived; consequently, the execution of a window is also triggered by the Watermark.

  Watermark can be understood as a delayed trigger mechanism. We can configure the Watermark delay t: each time data arrives, the system checks the maximum eventTime seen so far (maxEventTime) and then considers that all data with eventTime smaller than maxEventTime − t has arrived. If some window's end time equals maxEventTime − t, that window is triggered.
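For example, suppose t = 3 s and a tumbling event-time window covers [10:00:00, 10:00:05). When an event with eventTime 10:00:08 arrives, maxEventTime becomes 10:00:08, so the Watermark advances to 10:00:08 − 3 s = 10:00:05. The Watermark has now reached the window's end time, so the window [10:00:00, 10:00:05) is triggered; elements of that window that arrived up to 3 s late were still included.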

Tumbling windows / sliding windows / session windows

 
 
import org.apache.flink.api.java.tuple.Tuple
import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.{EventTimeSessionWindows, SlidingEventTimeWindows, TumblingEventTimeWindows}
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.streaming.api.windowing.windows.TimeWindow

/**
 * Event-time TimeWindow: tumbling / sliding / session windows.
 */
object EventTimeAndWindow {
  def main(args: Array[String]): Unit = {

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // Enable event time: from this point on, every stream created from env uses
    // EventTime as its time characteristic.
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)

    val stream: KeyedStream[(String, Long), Tuple] = env.socketTextStream("192.168.218.130", 1111)
      .assignTimestampsAndWatermarks(
        // Allow events to arrive up to 3000 ms out of order.
        new BoundedOutOfOrdernessTimestampExtractor[String](Time.milliseconds(3000)) {
          override def extractTimestamp(element: String): Long = {
            // The event's generation time is its eventTime; we parse it from the first
            // field of the log line.
            val eventTime = element.split(" ")(0).toLong
            println(eventTime)
            eventTime
          }
        }
      ).map(item => (item.split(" ")(1), 1L)).keyBy(0)

    // Tumbling event-time window of 5 s, via the window API:
    // val streamWindow: WindowedStream[(String, Long), Tuple, TimeWindow] = stream.window(TumblingEventTimeWindows.of(Time.seconds(5)))
    // Sliding window:
    // val streamWindow: WindowedStream[(String, Long), Tuple, TimeWindow] = stream.window(SlidingEventTimeWindows.of(Time.seconds(10), Time.seconds(5)))
    // Session window:
    val streamWindow: WindowedStream[(String, Long), Tuple, TimeWindow] = stream.window(EventTimeSessionWindows.withGap(Time.seconds(5)))
    val streamReduce = streamWindow.reduce((item1, item2) => (item1._1, item1._2 + item2._2))
    streamReduce.print()

    env.execute("EventTimeAndWindow")
  }
}

 


Origin www.cnblogs.com/ssqq5200936/p/11014296.html