Big Data Flink (96): DML: Deduplication


DML: Deduplication

Deduplication definition (supports Batch/Streaming): Deduplication removes duplicate rows. It is the row_number = 1 case of the TopN query introduced earlier, with one difference: the sort field must be a time attribute column and cannot be an ordinary (non-time-attribute) column. When row_number = 1 and the sort field is an ordinary column, the planner translates the query into the TopN operator. When the sort field is a time attribute column, the planner translates it into the Deduplication operator. The two end up as different execution operators: compared with the TopN operator, the Deduplication operator is built specifically for this case.
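To make the definition concrete, here is a minimal sketch. `DEDUP_SQL` holds the usual Flink SQL deduplication pattern (ROW_NUMBER() over a partition, ordered by a time attribute, filtered to row_num = 1). The table name `orders` and the columns `order_id`, `user_id`, and `event_time` are assumptions made for illustration, not names from the original article. The helper `keep_first_row` is a hypothetical pure-Python stand-in that only mimics the "keep the first row per key" result. It is not how the Flink operator is implemented.

```python
from typing import Dict, Iterable, List, Tuple

# Flink SQL deduplication pattern. Table and column names are hypothetical;
# event_time is assumed to be declared as a time attribute on the table.
DEDUP_SQL = """
SELECT order_id, user_id, event_time
FROM (
  SELECT *,
         ROW_NUMBER() OVER (
           PARTITION BY order_id      -- deduplication key
           ORDER BY event_time ASC    -- time attribute; ASC keeps the first row
         ) AS row_num
  FROM orders
)
WHERE row_num = 1
"""

# (order_id, user_id, event_time as an integer timestamp)
Row = Tuple[str, str, int]


def keep_first_row(rows: Iterable[Row]) -> List[Row]:
    """Mimic the query result: keep the earliest row per order_id."""
    first: Dict[str, Row] = {}
    for row in rows:
        key, ts = row[0], row[2]
        # Replace the stored row only when a strictly earlier one arrives.
        if key not in first or ts < first[key][2]:
            first[key] = row
    # Sort by timestamp so the output order is deterministic.
    return sorted(first.values(), key=lambda r: r[2])


if __name__ == "__main__":
    sample = [("o1", "u1", 3), ("o2", "u2", 1), ("o1", "u1", 2), ("o2", "u2", 5)]
    print(keep_first_row(sample))
```

Changing `ORDER BY event_time ASC` to `DESC` would keep the last row per key instead of the first. Per the rule described above, ordering by an ordinary column instead of a time attribute would make the planner choose the TopN operator.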


Origin blog.csdn.net/xiaoweite1/article/details/133443292