2. Flink Stream Computing - Out-of-Order Time

1. The importance of time: why does order matter?

When the trajectory data uploaded by an IoT device arrives out of order, the business processing built on it goes wrong; a speeding check, for example, computes inaccurate speeds. In live-stream selling, sales are totaled every hour and the host is paid by the hour, so if the timestamps are scrambled and sales from 19:00 are counted into 20:00, that is a real problem.

Given out-of-order data in stream computing, how should it be sorted?

In Flink, the data in a streaming job consists of events, and each event carries its own generation time (for example, each GPS point has its own timestamp). Sorting in stream computing is therefore done along the time dimension, and Flink likewise orders events by time.

2. Flink's time types


1) Event Time: the time when the event was generated.

2) Ingestion Time: the time when the event enters Flink.

3) Processing Time: the time when the event is processed.

    Processing Time provides the simplest notion of time and excellent performance, but it is the least deterministic in a distributed environment.

Of the three time types above, Ingestion Time and Processing Time are assigned after the data enters Flink, so they are not affected by disorder. Event Time is generated externally; when the order in which events enter Flink differs from the order in which they were generated, we have an out-of-order problem.
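The usual way to work with Event Time in Flink is to declare where the timestamp lives and how much disorder to tolerate via a watermark strategy. Below is a minimal sketch, assuming Flink 1.12+ with a hypothetical GpsPoint event and an illustrative 5-second out-of-orderness bound (neither is from the original post):

    import java.time.Duration;
    import org.apache.flink.api.common.eventtime.WatermarkStrategy;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class EventTimeSketch {
        // Illustrative event: a GPS point carrying its own generation time.
        public static class GpsPoint {
            public String deviceId;
            public long timestampMillis; // Event Time: when the point was produced
        }

        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.getExecutionEnvironment();

            GpsPoint point = new GpsPoint();
            point.deviceId = "device-1";
            point.timestampMillis = System.currentTimeMillis();
            DataStream<GpsPoint> points = env.fromElements(point);

            // Declare Event Time: extract each event's own timestamp and
            // tolerate events arriving up to 5 seconds out of order.
            points.assignTimestampsAndWatermarks(
                    WatermarkStrategy
                            .<GpsPoint>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                            .withTimestampAssigner((p, ts) -> p.timestampMillis))
                  .print();

            env.execute("event-time-sketch");
        }
    }

With this in place, event-time windows assign records by their generation time rather than by arrival order.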

3. Other sources of out-of-order data

1) Out-of-order data produced by the business:

   For example, with Kafka, data from the same device can be spread across different partitions, and Flink consumes the partitions at different speeds, which produces disorder.

   The business's partitioning rule can also be inconsistent with the grouping rule used for statistics inside Flink (the business partitions by date, while Flink groups and counts by type), as the sketch below illustrates.
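A hedged sketch of that mismatch, using the flink-connector-kafka KafkaSource builder; the broker address, the "orders" topic, and the "type,date,..." CSV layout are all illustrative assumptions:

    import org.apache.flink.api.common.eventtime.WatermarkStrategy;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.api.common.typeinfo.Types;
    import org.apache.flink.connector.kafka.source.KafkaSource;
    import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class PartitionMismatchSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.getExecutionEnvironment();

            // Suppose the producer partitions the "orders" topic by date.
            KafkaSource<String> source = KafkaSource.<String>builder()
                    .setBootstrapServers("broker:9092")
                    .setTopics("orders")
                    .setGroupId("order-stats")
                    .setStartingOffsets(OffsetsInitializer.earliest())
                    .setValueOnlyDeserializer(new SimpleStringSchema())
                    .build();

            // Flink groups by type instead: each key now interleaves records
            // consumed at different speeds from several date partitions, so
            // records within one type arrive out of order.
            env.fromSource(source, WatermarkStrategy.noWatermarks(), "orders")
               .keyBy(line -> line.split(",")[0], Types.STRING) // "type,date,..."
               .print();

            env.execute("partition-mismatch-sketch");
        }
    }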

2) Out-of-order data caused by program misuse, as the sketch after this list shows:

  1. The data source itself is ordered.

  2. rebalance (round-robin) redistribution scatters the records across different map subtasks.

  3. When the maps sink directly to the DB, the business data lands in the database out of order.
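A runnable sketch of this misuse, assuming a local Flink 1.12+ environment (class and job names are illustrative): an ordered sequence loses its order as soon as rebalance() fans it out to racing parallel subtasks.

    import org.apache.flink.api.common.typeinfo.Types;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class RebalanceDisorderSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.getExecutionEnvironment();

            env.fromSequence(1, 20)   // 1. the source is ordered: 1..20
               .rebalance()           // 2. round-robin across map subtasks
               .map(n -> n)           // three parallel subtasks race each other
               .returns(Types.LONG)   // lambda output type hint for Flink
               .setParallelism(3)
               .print();              // 3. printed order != source order

            env.execute("rebalance-disorder-sketch");
        }
    }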

How can the out-of-order behavior of these operators be fixed?


    ds.setParallelism(3).keyBy(...).map(...).setParallelism(3)

Grouping by a business attribute with the keyBy operator achieves order: all records with the same key are routed through the same channel, so their relative order from the source is preserved.
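The same contrast as a runnable sketch, under the same local-environment assumption (the modulo key stands in for a real business attribute): replace rebalance with keyBy and records within each key keep their source order.

    import org.apache.flink.api.common.typeinfo.Types;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class KeyByOrderSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.getExecutionEnvironment();

            env.fromSequence(1, 20)
               .keyBy(n -> n % 3, Types.LONG) // hash-partition by a business key
               .map(n -> n)                   // same-key records share one channel
               .returns(Types.LONG)
               .setParallelism(3)
               .print();                      // within each key, order is preserved

            env.execute("keyby-order-sketch");
        }
    }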

Summary: in Flink, whenever the shuffle between operators is a rebalance, out-of-order problems can appear. Whether the disorder actually affects the business has to be judged scenario by scenario.


Origin: blog.csdn.net/lzzyok/article/details/120685308