01 - Flink time semantics, and the basic concepts and implementation principles of Window

Comparing Flink's notions of time

Flink supports several different notions of time in streaming applications: Processing Time, Event Time and Ingestion Time. Let's look at each of the three.

Processing Time

Processing Time refers to the system time of the machine that is executing the operation on the event.

If a Flink job's time characteristic is set to Processing Time, all time-based operations (such as time windows) use the system clock of the machine that runs the respective operator. An hourly Processing Time window therefore includes all events that reach a particular operator between the times when its system clock shows the start and the end of that hour.
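To make the hourly window concrete, here is a minimal plain-Java sketch (not Flink's API) of how a tumbling window can be derived from the operator's local clock. The class `ProcessingTimeWindowSketch` and its injectable `LongSupplier` clock are hypothetical; `windowStart` uses the usual tumbling-window arithmetic `t - (t % size)` for non-negative timestamps.

```java
import java.util.function.LongSupplier;

/** Hypothetical sketch: assigns each arriving record to an hourly window
 *  using the processing machine's clock, ignoring anything inside the record. */
final class ProcessingTimeWindowSketch {
    static final long HOUR_MS = 60L * 60L * 1000L;

    // Stands in for System::currentTimeMillis so the behaviour can be tested.
    private final LongSupplier clock;

    ProcessingTimeWindowSketch(LongSupplier clock) {
        this.clock = clock;
    }

    /** Inclusive start of the window of length size that contains t (t >= 0). */
    static long windowStart(long t, long size) {
        return t - (t % size);
    }

    /** Reads the local clock at the moment of processing and returns the window start. */
    long assignOnArrival() {
        return windowStart(clock.getAsLong(), HOUR_MS);
    }
}
```

In a real program the clock would be `System::currentTimeMillis`, which is exactly why the assigned window depends on when the record happens to be processed.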

Processing Time is the simplest notion of time and requires no coordination between streams and machines; it provides the best performance and the lowest latency. However, in distributed and asynchronous environments Processing Time does not provide determinism, because it is susceptible to the speed at which events arrive in the system (for example from a message queue), the speed at which events flow between operators inside the system, and outages.
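The loss of determinism can be shown with a small plain-Java simulation (hypothetical, not Flink code): the same three events, created at the same moments, are delivered once with uniform short delays and once with uneven delays, and each is counted into the processing-time window in which it arrives. The class name and the millisecond values are made up for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical simulation: identical events, delivered with different
 *  transport delays, are counted into different processing-time windows. */
final class ProcessingTimeNondeterminism {
    static final long HOUR_MS = 3_600_000L;

    /** createdAt[i] is when event i happened; delayMs[i] is how long it took to reach the operator. */
    static Map<Long, Integer> countPerWindow(long[] createdAt, long[] delayMs) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (int i = 0; i < createdAt.length; i++) {
            long arrival = createdAt[i] + delayMs[i];   // operator's clock when it sees the event
            long start = arrival - (arrival % HOUR_MS); // processing-time window start
            counts.merge(start, 1, Integer::sum);
        }
        return counts;
    }
}
```

With creation times 3 500 000, 3 550 000 and 3 590 000 ms, delays of 1 s each put all three events into window 0, while delays of 1 s, 60 s and 20 s leave one event in window 0 and move two into the window starting at 3 600 000 ms.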

 

Event Time

Event Time is the time at which an event actually occurred, and it is usually carried by the data itself. This time is typically determined before the event reaches Flink, and the event timestamp can be extracted from each event. With Event Time, the progress of time depends only on the data, not on any machine clock. An Event Time program must specify how to generate Event Time watermarks, the mechanism that signals progress in event time.
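As a minimal sketch of "the timestamp travels with the data", the hypothetical `LogEvent` class below parses an assumed `epochMillis,type` line format and computes its hourly window purely from the embedded timestamp. This is plain Java for illustration; in Flink the equivalent step is the timestamp assignment you configure for an Event Time job.

```java
/** Hypothetical event type: the timestamp is part of the record itself. */
final class LogEvent {
    static final long HOUR_MS = 3_600_000L;

    final long eventTimeMs; // when the event happened on the producing device
    final String type;

    LogEvent(long eventTimeMs, String type) {
        this.eventTimeMs = eventTimeMs;
        this.type = type;
    }

    /** Parses an assumed "epochMillis,type" line, e.g. "1580000000000,click". */
    static LogEvent parse(String line) {
        String[] parts = line.split(",", 2);
        return new LogEvent(Long.parseLong(parts[0].trim()), parts[1].trim());
    }

    /** The event-time window depends only on the data, never on a machine clock. */
    long hourlyWindowStart() {
        return eventTimeMs - (eventTimeMs % HOUR_MS);
    }
}
```

Parsing the same line on any machine, at any time, always yields the same window.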

In a perfect world, Event Time processing would produce exactly the same, deterministic results no matter when events arrive or in what order. However, unless events are known to arrive in order (by the time they were generated), Event Time processing incurs some latency while it waits for out-of-order events. Since it can only wait for a finite period of time, this limits how deterministic Event Time results can be.
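The "wait for a limited period of time" idea is commonly implemented with a bounded-out-of-orderness watermark. The plain-Java sketch below is a simplification under stated assumptions: the class name is hypothetical, the watermark advances after every event (Flink emits watermarks periodically instead), and the watermark value `maxTimestamp - bound - 1` follows the convention of Flink's built-in bounded-out-of-orderness generator. An event whose timestamp is at or below the current watermark counts as late.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified sketch of a bounded-out-of-orderness watermark. */
final class BoundedOutOfOrdernessSketch {
    private final long boundMs;          // how long we are willing to wait for stragglers
    private long maxTs = Long.MIN_VALUE; // highest event timestamp seen so far

    BoundedOutOfOrdernessSketch(long boundMs) {
        this.boundMs = boundMs;
    }

    /** Promise: no event with timestamp <= this value is still expected. */
    long currentWatermark() {
        return maxTs == Long.MIN_VALUE ? Long.MIN_VALUE : maxTs - boundMs - 1;
    }

    /** Returns true if the event is late, i.e. the watermark has already passed it. */
    boolean onEvent(long ts) {
        boolean late = ts <= currentWatermark();
        maxTs = Math.max(maxTs, ts);
        return late;
    }

    /** Replays timestamps in arrival order and collects the late ones. */
    static List<Long> lateEvents(long boundMs, long... arrivalOrder) {
        BoundedOutOfOrdernessSketch wm = new BoundedOutOfOrdernessSketch(boundMs);
        List<Long> late = new ArrayList<>();
        for (long ts : arrivalOrder) {
            if (wm.onEvent(ts)) late.add(ts);
        }
        return late;
    }
}
```

With a 5 s bound, replaying 10 000, 12 000, 8 000, 20 000, 14 000, 15 000 flags only 14 000 as late: 8 000 is out of order but still within the bound, whereas 14 000 arrives after 20 000 has pushed the watermark to 14 999. A larger bound tolerates more disorder at the cost of more latency.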

Assuming all of the data has arrived, Event Time operations behave as expected and produce correct and consistent results, even when dealing with out-of-order events, late events, or reprocessing of historical data. For example, an hourly Event Time window contains all records whose event timestamps fall within that hour, regardless of the order in which they arrive or when they are processed.
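The order-independence claim can be checked with a tiny plain-Java sketch (hypothetical class, not Flink's window operator): events are counted per hourly Event Time window, and feeding the same timestamps in a shuffled order produces the same counts. Late-data handling is deliberately left out, matching the "all of the data has arrived" assumption.

```java
import java.util.Map;
import java.util.TreeMap;

/** Sketch: counts events per hourly event-time window. Because each window is
 *  computed from the event's own timestamp, arrival order cannot change the
 *  result once every event has arrived. */
final class EventTimeWindowCounts {
    static final long HOUR_MS = 3_600_000L;

    static Map<Long, Integer> countPerHour(long... eventTimestamps) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (long ts : eventTimestamps) {
            counts.merge(ts - (ts % HOUR_MS), 1, Integer::sum);
        }
        return counts;
    }
}
```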

 

Ingestion Time

Ingestion Time is the time at which an event enters Flink. At the source operator, each record is stamped with the source's current time when it enters Flink, and time-based operations (such as time windows) use that timestamp.

Conceptually, Ingestion Time sits between Event Time and Processing Time. Compared with Processing Time it is slightly more expensive, but it gives more predictable results. Because Ingestion Time uses stable timestamps (assigned only once, at the source), different window operations over the same events refer to the same timestamp, whereas with Processing Time each window operator may assign an event to a different window (depending on its local system clock and any transport delay).
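The difference between "stamp once at the source" and "read the local clock in every operator" can be sketched in plain Java. Everything here is hypothetical and illustrative: `IngestionTimeSketch`, its nested `Stamped` wrapper, and the injected clocks are not Flink classes.

```java
import java.util.function.LongSupplier;

/** Hypothetical sketch of ingestion time: the source stamps each record exactly
 *  once, and every downstream operator reuses that stored stamp. */
final class IngestionTimeSketch {
    static final long HOUR_MS = 3_600_000L;

    /** A record plus the timestamp assigned when it entered the (simulated) job. */
    static final class Stamped<T> {
        final T value;
        final long ingestionTimeMs;

        Stamped(T value, long ingestionTimeMs) {
            this.value = value;
            this.ingestionTimeMs = ingestionTimeMs;
        }
    }

    /** Source side: read the source machine's clock once per record. */
    static <T> Stamped<T> ingest(T value, LongSupplier sourceClock) {
        return new Stamped<>(value, sourceClock.getAsLong());
    }

    /** Any downstream window operator: uses the stored stamp, not its own clock. */
    static long windowStart(Stamped<?> record) {
        long t = record.ingestionTimeMs;
        return t - (t % HOUR_MS);
    }

    /** Processing-time contrast: the window depends on the operator's local clock. */
    static long processingTimeWindowStart(LongSupplier operatorClock) {
        long t = operatorClock.getAsLong();
        return t - (t % HOUR_MS);
    }
}
```

A record ingested at 3 599 000 ms lands in window 0 for every operator that uses its stamp, even one that processes it two seconds later; under Processing Time that later operator would read 3 601 000 ms and pick the next window.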

Compared with Event Time, Ingestion Time cannot handle out-of-order events or late data, but the program does not have to specify how to generate watermarks.

Internally, Flink treats Ingestion Time much like Event Time; the only difference is that with Ingestion Time, timestamps are assigned and watermarks are generated automatically.

 

 

Comparing the three notions of time

A diagram summarizing the three notions of time:

 

 

 

  • Processing Time: the system time of the machine at the moment the event is processed
  • Event Time: the time at which the event itself occurred
  • Ingestion Time: the time at which the event enters Flink

The same three notions of time, illustrated with a picture:

 

Use case analysis

With the two figures above, you should now have a good grasp of Flink's three notions of time. So which one should you usually choose in an actual production environment?

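Generally speaking, production systems mostly weigh Event Time against Processing Time. These are the two strategies we use most often, while Ingestion Time is used relatively rarely.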

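Processing Time is mostly used when the user does not care about event time and only needs data to arrive in the current time window: as soon as data arrives, a series of computations can be run on the data in the window, and the results are then sent downstream.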

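Event Time, by contrast, is generally used when the business requirements depend on the time field itself. For example, in online shopping an order event must come before its payment event; risk control for lending events relies on time to make decisions; alerts triggered by machine anomaly detection need to show when the specific anomaly happened; and timely, accurate product ad recommendations depend on information such as when, how often, and for how long a user browses products. In these cases data can only be processed by event time, and the event's time has to be extracted from the event itself.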
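The shopping example can be sketched in plain Java to show why the decision must use the time carried in the events rather than arrival order. The `OrderPaymentCheck` class, its `Event` type, and the `"ORDER"`/`"PAY"` encoding are all hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical business check: did the payment follow its order? Decided by the
 *  timestamps carried in the events, not by the order in which they arrived. */
final class OrderPaymentCheck {
    static final class Event {
        final String orderId;
        final String kind;      // "ORDER" or "PAY" (assumed encoding)
        final long eventTimeMs; // extracted from the event itself

        Event(String orderId, String kind, long eventTimeMs) {
            this.orderId = orderId;
            this.kind = kind;
            this.eventTimeMs = eventTimeMs;
        }
    }

    /** Returns true if, for this order, the ORDER event happened before the PAY event. */
    static boolean paidAfterOrder(List<Event> arrivals, String orderId) {
        Map<String, Long> times = new HashMap<>();
        for (Event e : arrivals) {
            if (e.orderId.equals(orderId)) times.put(e.kind, e.eventTimeMs);
        }
        Long ordered = times.get("ORDER");
        Long paid = times.get("PAY");
        return ordered != null && paid != null && ordered < paid;
    }
}
```

Even if the payment event reaches the job first (for example because of network jitter), comparing event timestamps still recognizes that the order was placed before it was paid.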

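With event time, however, you may run into the following situation: when data collected at the source is sent to a message queue, network jitter, service availability problems, or a data backlog in message-queue partitions can delay its arrival, so data may arrive out of order or several minutes late. Fortunately, Flink can handle such delayed data through its Watermark mechanism, which I will explain in a later article.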

 

How do you set the time characteristic?

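Once the stream execution environment has been created, set the time characteristic with env.setStreamTimeCharacteristic (note that this method is deprecated since Flink 1.12, where Event Time became the default):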

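import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
// The other two options:
// env.setStreamTimeCharacteristic(TimeCharacteristic.IngestionTime);
// env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime);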

 

 

Source: www.cnblogs.com/whaleup/p/12243221.html