Flink SQL Project Record

One, Flink SQL level

Flink SQL is the highest-level API in Flink. Because it is easy to use, it is widely applied, for example in ETL, statistical analysis, real-time reporting, and real-time risk control.

Position of Flink SQL in the Flink API hierarchy:

[Figure: Flink API hierarchy, with Flink SQL as the top layer]

Two, Flink aggregation:

1、Window Aggregate

Three commonly used windows are built in:

TUMBLE (time, INTERVAL '5' SECOND); // tumbling window, like the tumbling window in the DataStream API (the middle API layer)

HOP (time, INTERVAL '10' SECOND, INTERVAL '5' SECOND); // sliding window, like the sliding window in the DataStream API: every 10 seconds, compute statistics over the last 5 seconds of data

SESSION (time, INTERVAL '5' SECOND); // session window with a 5-second gap

The time attribute comes in two forms: proctime, which is the system (processing) time, and rowtime, which is the event time carried by the data.
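The way a timestamp maps onto tumbling and hopping windows can be sketched in plain Java. This is only an illustration of the window arithmetic, not Flink's own code; the class name WindowSketch and its methods are hypothetical, and windows are assumed to be aligned to the epoch with no offset.

```java
import java.util.ArrayList;
import java.util.List;

public class WindowSketch {
    // Start of the tumbling window of length sizeMs that contains timestamp ts.
    public static long tumbleStart(long ts, long sizeMs) {
        return ts - Math.floorMod(ts, sizeMs);
    }

    // Start times of every hopping window [start, start + sizeMs) that contains ts,
    // assuming a new window starts every slideMs.
    public static List<Long> hopStarts(long ts, long slideMs, long sizeMs) {
        List<Long> starts = new ArrayList<>();
        long lastStart = ts - Math.floorMod(ts, slideMs);
        for (long start = lastStart; start > ts - sizeMs; start -= slideMs) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        // TUMBLE with a 5-second size.
        System.out.println(tumbleStart(12_000L, 5_000L));
        // HOP with a 10-second slide and a 5-second size, as in the example above.
        System.out.println(hopStarts(12_000L, 10_000L, 5_000L));
        System.out.println(hopStarts(17_000L, 10_000L, 5_000L));
        // HOP with a 5-second slide and a 10-second size: windows overlap.
        System.out.println(hopStarts(12_000L, 5_000L, 10_000L));
    }
}
```

Under these assumptions, a slide that is longer than the size leaves gaps between windows, so some rows (such as the one at 17 seconds) fall into no window at all, while a slide shorter than the size puts each row into several overlapping windows.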

 

 

2、Group Aggregate

As more data is added and input continues, the result is updated continuously.

Differences between Window Aggregate and Group Aggregate:

1), Window Aggregate and Group Aggregate differ in some obvious ways. The main difference is that Window Aggregate emits output only when a window ends. The emitted result is final and is never modified, so its output stream is an Append stream.

Group Aggregate, by contrast, emits the latest result every time it processes a row, so the result is constantly updated, much like a row being updated in a database. Its output stream is therefore an Update stream.
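The contrast between the two output modes can be illustrated with a small plain-Java simulation of a per-key count. It is a sketch only and does not use Flink; the class AggregateOutputSketch and its methods are hypothetical names, and the whole input list is assumed to fall into a single window.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AggregateOutputSketch {
    // Group Aggregate style: emit the latest count for a key after every input row.
    public static List<String> groupAggregate(List<String> keys) {
        Map<String, Long> counts = new LinkedHashMap<>();
        List<String> out = new ArrayList<>();
        for (String key : keys) {
            long updated = counts.merge(key, 1L, Long::sum);
            out.add(key + "=" + updated); // update: supersedes the earlier result for this key
        }
        return out;
    }

    // Window Aggregate style: accumulate silently, then emit one final count
    // per key when the window ends.
    public static List<String> windowAggregate(List<String> keysInWindow) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String key : keysInWindow) {
            counts.merge(key, 1L, Long::sum);
        }
        List<String> out = new ArrayList<>();
        counts.forEach((key, count) -> out.add(key + "=" + count)); // append: never revised
        return out;
    }

    public static void main(String[] args) {
        List<String> input = List.of("a", "b", "a", "a");
        System.out.println(groupAggregate(input));
        System.out.println(windowAggregate(input));
    }
}
```

For the same four rows, the group-style method produces four outputs, three of which revise the count for key "a", while the window-style method produces one final output per key.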

 

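2), Another difference is that Window Aggregate has a watermark, so it knows exactly which windows have expired. It can clean up expired state promptly and keep the state at a stable size.

Group Aggregate does not know which data has expired, so its state grows without bound. This is not very stable for production jobs, so it is recommended to configure State TTL for Group Aggregate jobs.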
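The idea behind a state TTL can be sketched in plain Java as a keyed count that forgets keys left idle for too long. This is a conceptual illustration only, not how Flink implements state cleanup; the class IdleStateSketch, its methods, and the explicit cleanUp call are all hypothetical.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class IdleStateSketch {
    private final long retentionMs;
    private final Map<String, Long> counts = new HashMap<>();
    private final Map<String, Long> lastAccess = new HashMap<>();

    public IdleStateSketch(long retentionMs) {
        this.retentionMs = retentionMs;
    }

    // Increment the count for key at time nowMs and record when the key was last touched.
    public long add(String key, long nowMs) {
        lastAccess.put(key, nowMs);
        return counts.merge(key, 1L, Long::sum);
    }

    // Drop every key that has been idle for longer than retentionMs.
    public void cleanUp(long nowMs) {
        Iterator<Map.Entry<String, Long>> it = lastAccess.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMs - entry.getValue() > retentionMs) {
                counts.remove(entry.getKey());
                it.remove();
            }
        }
    }

    // Number of keys currently held in state.
    public int size() {
        return counts.size();
    }

    public static void main(String[] args) {
        IdleStateSketch state = new IdleStateSketch(60_000L);
        state.add("a", 0L);
        state.add("b", 50_000L);
        state.cleanUp(70_000L); // "a" has been idle for 70 s, "b" for 20 s
        System.out.println(state.size());
        System.out.println(state.add("a", 80_000L)); // "a" starts counting from scratch
    }
}
```

The trade-off is visible in the last call: once a key's state has been cleared, a later row for that key is treated as new, so a retention time that is too short can make results inaccurate.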

 

[Figure: comparison of Window Aggregate and Group Aggregate]

 

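Setting in the project code:

// Keep idle state for at least 1 minute and clear it after at most 10 minutes of inactivity.
tEnv.getConfig().setIdleStateRetentionTime(org.apache.flink.api.common.time.Time.minutes(1), org.apache.flink.api.common.time.Time.minutes(10));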


Origin www.cnblogs.com/gxyandwmm/p/12076729.html