1, Spark is stateless, Flink is stateful
Spark itself is stateless: an operator takes one RDD and processes it, so a job can be viewed as processing the data segment by segment.
Flink, by contrast, is event-driven. An event-driven application is a class of stateful application, and Flink handles the stream one event record at a time. When records reach a window, processing blocks until the window is complete, and the aggregation over that window is stateless because it depends only on the records inside the window. An aggregation operator applied to a DataStream outside a window is a stateful operation. So in Flink, an aggregation that should stay stateless has to be carried out within the window rather than after it. Spark is stateless throughout, so the aggregation can be placed anywhere.
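The difference between the two kinds of aggregation can be sketched in plain Python. This is only a toy model and uses neither the Spark nor the Flink API; the names `sum_per_batch` and `RunningSum` are made up for illustration. The first function aggregates each segment on its own, while the second keeps a total that survives from one event to the next.

```python
def sum_per_batch(batches):
    """Stateless: each batch is aggregated on its own, nothing is carried over."""
    return [sum(batch) for batch in batches]


class RunningSum:
    """Stateful: the operator keeps a total between events."""

    def __init__(self):
        self.total = 0  # state that outlives any single event

    def on_event(self, value):
        self.total += value
        return self.total


batches = [[1, 2, 3], [4, 5], [6]]

# Segment-by-segment view: one result per batch.
per_batch = sum_per_batch(batches)

# Event-by-event view: one result per record, each depending on all earlier records.
op = RunningSum()
running = [op.on_event(v) for batch in batches for v in batch]

print(per_batch)  # [6, 9, 6]
print(running)    # [1, 3, 6, 10, 15, 21]
```

The per-batch result can be recomputed from the batch alone, whereas the running result is only reproducible if the stored total is preserved, which is what makes the second operator stateful.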
2, The concept of a window
A window splits an infinite stream into finite-size "buckets", and we can run computations on these buckets. The window operator is executed only when the window contains data; when a window has no data, the operator is not executed.
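As a rough illustration of the bucket idea, the sketch below assigns events to fixed-size (tumbling) windows in plain Python. It is not Flink code: the function name `tumbling_windows`, the `(event_time, value)` tuple layout, and the window size of 10 are all assumptions chosen for the example.

```python
from collections import defaultdict


def tumbling_windows(events, size):
    """Group (event_time, value) pairs into fixed-size windows.

    Returns a dict mapping (window_start, window_end) to the list of values.
    A window that receives no events never appears in the result.
    """
    buckets = defaultdict(list)
    for event_time, value in events:
        start = event_time - (event_time % size)  # align to the window boundary
        buckets[(start, start + size)].append(value)
    return dict(buckets)


events = [(1, 10), (3, 20), (27, 7)]
windows = tumbling_windows(events, size=10)

# The "operator" (here: sum) runs once per window that actually holds data.
sums = {window: sum(values) for window, values in windows.items()}

print(sums)  # {(0, 10): 30, (20, 30): 7}
```

No event falls into the window from 10 to 20, so no bucket is created for it and the sum is never computed there, which mirrors the rule that an empty window does not execute its operator.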
3, The concept of a watermark
Each event carries an eventTime. The watermark equals the largest eventTime among all data received so far minus the allowed delay, that is, watermark = maxEventTime - delay. As soon as the watermark carried with the data passes the end time of a window that has not yet been triggered, it triggers the execution of that window.
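The rule watermark = maxEventTime - delay can be traced with a small simulation in plain Python. This is a simplified sketch, not Flink's implementation: the function `run_with_watermark` is hypothetical, windows are assumed to be tumbling, a window is assumed to fire once the watermark reaches its end time, and events that arrive for an already fired window are simply dropped.

```python
def run_with_watermark(events, window_size, delay):
    """Process (event_time, value) pairs in arrival order.

    Returns (fired, dropped): fired is a list of (start, end, values) in the
    order the windows were triggered; dropped holds events that arrived after
    their window had already fired.
    """
    max_event_time = float("-inf")
    open_windows = {}  # window_start -> values collected so far
    fired, dropped = [], []

    for event_time, value in events:
        max_event_time = max(max_event_time, event_time)
        watermark = max_event_time - delay

        start = event_time - event_time % window_size
        end = start + window_size
        if end <= watermark:
            dropped.append((event_time, value))  # too late, window is gone
        else:
            open_windows.setdefault(start, []).append(value)

        # Trigger every open window whose end time the watermark has reached.
        for s in sorted(open_windows):
            if s + window_size <= watermark:
                fired.append((s, s + window_size, open_windows.pop(s)))

    return fired, dropped


events = [(1, "a"), (8, "b"), (11, "c"), (7, "d"), (13, "e"), (5, "f"), (24, "g")]
fired, dropped = run_with_watermark(events, window_size=10, delay=3)

print(fired)    # [(0, 10, ['a', 'b', 'd']), (10, 20, ['c', 'e'])]
print(dropped)  # [(5, 'f')]
```

The event with eventTime 7 arrives out of order but is still counted, because the watermark is only 8 at that point and the window from 0 to 10 is still open. The event with eventTime 13 moves the watermark to 10 and triggers that window, so the later event with eventTime 5 is too late.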