The difference between Spark and Flink

1, Spark is stateless, Flink is stateful

Spark itself is stateless: we can think of it as an operator that takes one RDD and processes it, so the computation can be viewed as a segmented, batch-by-batch process.

Flink, by contrast, is built for event-driven applications, which are a class of applications that carry state. We treat the input as individual event records to be processed one at a time. When the stream reaches a window, processing blocks and waits for the window; the aggregation performed on the window is stateless. Once past the window, an aggregation operator on the resulting DataStream is a stateful operation. So in Flink, an aggregation should be placed at the window, before its result becomes a plain DataStream, if it is meant to be stateless. Spark is stateless throughout, so the aggregation can be done anywhere.
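
The contrast can be sketched in plain Python, without either framework. The names `process_batch` and `RunningSum` are hypothetical and only illustrate the idea: a stateless operator derives its result from the current batch alone, while a stateful operator keeps a value between events.

```python
from typing import Iterable, List


def process_batch(batch: Iterable[int]) -> int:
    """Stateless: the result depends only on the batch passed in."""
    return sum(batch)


class RunningSum:
    """Stateful: each event updates a total that survives between calls."""

    def __init__(self) -> None:
        self.total = 0  # state carried from one event to the next

    def on_event(self, value: int) -> int:
        self.total += value
        return self.total


batches: List[List[int]] = [[1, 2, 3], [4, 5]]

# Batch style: every batch is handled independently of the others.
batch_results = [process_batch(b) for b in batches]  # [6, 9]

# Event style: events arrive one at a time and update the kept state.
op = RunningSum()
event_results = [op.on_event(v) for b in batches for v in b]  # [1, 3, 6, 10, 15]
```

The batch results never see each other, whereas the running total after the last event reflects every event that came before it.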

2, The concept of a window

A window splits an infinite stream into "buckets" of finite size, and we can run computations on these buckets. The operator is executed when the window contains data; when the window has no data, the operator is not executed.
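
As a rough illustration, fixed-size (tumbling) buckets can be simulated in plain Python. This is only a sketch and not Flink's API: `WINDOW_SIZE`, `window_start` and `sum_per_window` are made-up names, and the timestamps are arbitrary integers.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

WINDOW_SIZE = 10  # assumed window length, in the same unit as the timestamps


def window_start(event_time: int, size: int = WINDOW_SIZE) -> int:
    """Start of the fixed-size window that contains event_time."""
    return event_time - (event_time % size)


def sum_per_window(events: List[Tuple[int, int]]) -> Dict[int, int]:
    """Group (event_time, value) pairs into buckets and sum each bucket."""
    buckets: Dict[int, List[int]] = defaultdict(list)
    for event_time, value in events:
        buckets[window_start(event_time)].append(value)
    # Only buckets that received data exist here, so empty windows
    # never reach the aggregation step.
    return {start: sum(values) for start, values in sorted(buckets.items())}


events = [(1, 5), (4, 7), (12, 1), (35, 2)]
result = sum_per_window(events)  # {0: 12, 10: 1, 30: 2}
```

The windows [0, 10), [10, 20) and [30, 40) each produce a result, while [20, 30) received no data and so nothing is computed for it.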

3, The concept of a watermark

Every event carries an eventTime. The watermark equals the largest eventTime among all data received so far, minus the allowed delay: Watermark = maxEventTime - delay. Once the watermark carried with the data is later than the end time of a window that has not yet been triggered, execution of that window is triggered.
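
The trigger rule can be traced with a small plain-Python simulation. It is a sketch under stated assumptions, not Flink code: `WINDOW_SIZE`, `MAX_DELAY` and `run` are hypothetical names, a window fires as soon as the watermark reaches its end time, and events that arrive for an already fired window are simply dropped.

```python
from typing import Dict, List, Tuple

WINDOW_SIZE = 10  # assumed window length
MAX_DELAY = 3     # assumed allowed delay ("delay" in the formula)


def run(events: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Feed (event_time, value) pairs in arrival order and return
    (window_start, sum) for every window fired by the watermark."""
    open_windows: Dict[int, int] = {}
    fired: List[Tuple[int, int]] = []
    max_event_time = float("-inf")
    watermark = float("-inf")

    for event_time, value in events:
        start = event_time - (event_time % WINDOW_SIZE)
        if start + WINDOW_SIZE <= watermark:
            continue  # window already fired; this sketch drops the late event
        open_windows[start] = open_windows.get(start, 0) + value

        # Watermark = maxEventTime - delay
        max_event_time = max(max_event_time, event_time)
        watermark = max_event_time - MAX_DELAY

        # Fire every open window whose end time the watermark has reached.
        for s in sorted(open_windows):
            if s + WINDOW_SIZE <= watermark:
                fired.append((s, open_windows.pop(s)))
    return fired


# Arrival order is not event-time order: 9 arrives after 11, and 7 after 13.
events = [(1, 5), (8, 2), (11, 1), (9, 4), (13, 3), (7, 9), (25, 6)]
fired = run(events)  # [(0, 11), (10, 4)]
```

In this trace the event at time 9 still lands in window [0, 10) because the watermark is only 8 when it arrives. The event at time 13 moves the watermark to 10 and fires that window, so the event at time 7 that follows is too late. Window [20, 30) stays open because the watermark has only reached 22.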

Origin blog.csdn.net/xuehuagongzi000/article/details/103480849