Apache Beam windowing: consider late data but emit only one pane

Joe Stoker:

I would like to emit a single pane when the watermark reaches x minutes past the end of the window. This lets me handle some late data while still emitting only one pane. I am currently working in Java.

At the moment I can't find a proper solution to this problem. I could emit a single pane when the watermark reaches the end of the window, but then any late data is dropped. I could emit the pane at the end of the window and then again whenever late data arrives, but then I am no longer emitting a single pane.

I currently have code similar to this:

.triggering(
        // This is going to emit the pane, but I don't want to emit the pane yet!
        AfterWatermark.pastEndOfWindow())
    // This is going to emit a pane each time I receive late data; however,
    // I would like to emit only one pane, at the end of the allowedLateness.
    .withAllowedLateness(allowedLateness)
    .accumulatingFiredPanes())

In case there is still any confusion: I would like to emit only a single pane, when the watermark passes the end of the window plus the allowedLateness.
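As a worked example of the timing being asked for, here is a plain-Java sketch (not Beam code). The class `ClosingTime`, its methods, and the window and lateness values are all hypothetical. With a window ending at 12:05 and an allowedLateness of 10 minutes, the single pane should fire once the watermark reaches roughly 12:15. Elements that arrive before then should still be included in it. The sketch ignores the detail that Beam measures lateness from the window's maxTimestamp, one millisecond before its end.

```java
import java.time.Duration;
import java.time.Instant;

// Toy illustration (not Beam code): when the single pane should fire,
// and which elements of the window it can still include.
public class ClosingTime {
    // Hypothetical helper: the pane should fire once the watermark reaches this point.
    static Instant closingTime(Instant windowEnd, Duration allowedLateness) {
        return windowEnd.plus(allowedLateness);
    }

    // Hypothetical classifier for an element belonging to the window,
    // based on the watermark at the moment the element arrives.
    static String classify(Instant windowEnd, Duration allowedLateness, Instant watermark) {
        if (watermark.isBefore(windowEnd)) {
            return "on time";
        } else if (watermark.isBefore(closingTime(windowEnd, allowedLateness))) {
            return "late, but still included";
        } else {
            return "dropped";
        }
    }

    public static void main(String[] args) {
        Instant windowEnd = Instant.parse("2024-01-01T12:05:00Z");
        Duration allowedLateness = Duration.ofMinutes(10);
        System.out.println(closingTime(windowEnd, allowedLateness)); // 2024-01-01T12:15:00Z
        System.out.println(classify(windowEnd, allowedLateness,
                Instant.parse("2024-01-01T12:03:00Z"))); // on time
        System.out.println(classify(windowEnd, allowedLateness,
                Instant.parse("2024-01-01T12:09:00Z"))); // late, but still included
        System.out.println(classify(windowEnd, allowedLateness,
                Instant.parse("2024-01-01T12:20:00Z"))); // dropped
    }
}
```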

Joe Stoker:

Thanks, Guillem. In the end I used your answer to find this very useful link with lots of Apache Beam examples. From it I came up with the following solution:

// First, use a trigger that never fires, so no panes are emitted while the window is open
.triggering(Never.ever())

// Then always fire when the window closes. This emits a single
// final pane once the watermark passes the end of the window plus allowedLateness
.withAllowedLateness(allowedLateness, Window.ClosingBehavior.FIRE_ALWAYS)
.discardingFiredPanes())
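To make the behaviour of this solution concrete, here is a small plain-Java toy model. It is only a sketch under simplifying assumptions: it does not use the Beam SDK, it models a single key and a single window, and the class `SinglePaneModel` and its methods are hypothetical stand-ins for what the runner does. It mimics `Never.ever()` (nothing fires while the window is open), `Window.ClosingBehavior.FIRE_ALWAYS` (one final pane when the watermark reaches the end of the window plus allowedLateness) and `discardingFiredPanes()` (the buffer is cleared after firing).

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Toy model (not Beam): one window, trigger Never.ever(), ClosingBehavior.FIRE_ALWAYS,
// discarding mode. Nothing is emitted until the watermark reaches windowEnd + allowedLateness;
// then exactly one pane containing everything buffered so far is emitted.
public class SinglePaneModel {
    final Instant closeTime;
    final List<Integer> buffer = new ArrayList<>();
    final List<List<Integer>> emittedPanes = new ArrayList<>();
    boolean closed = false;
    int dropped = 0;

    SinglePaneModel(Instant windowEnd, Duration allowedLateness) {
        this.closeTime = windowEnd.plus(allowedLateness);
    }

    // An element for this window arrives; it is kept only while the window is still open.
    void onElement(int value) {
        if (closed) {
            dropped++; // past allowedLateness: the model drops it as too late
        } else {
            buffer.add(value);
        }
    }

    // The watermark advances. The trigger never fires, so the only pane is the closing one.
    void advanceWatermark(Instant watermark) {
        if (!closed && !watermark.isBefore(closeTime)) {
            emittedPanes.add(new ArrayList<>(buffer)); // the single final pane (FIRE_ALWAYS)
            buffer.clear();                            // discarding mode
            closed = true;
        }
    }

    public static void main(String[] args) {
        SinglePaneModel w = new SinglePaneModel(
                Instant.parse("2024-01-01T12:05:00Z"), Duration.ofMinutes(10));
        w.onElement(1);                                            // on time
        w.advanceWatermark(Instant.parse("2024-01-01T12:06:00Z")); // past end of window: no pane yet
        w.onElement(2);                                            // late, within allowedLateness
        w.advanceWatermark(Instant.parse("2024-01-01T12:15:00Z")); // window closes: one pane
        w.onElement(3);                                            // too late: dropped
        System.out.println(w.emittedPanes); // [[1, 2]]
        System.out.println(w.dropped);      // 1
    }
}
```

Because only one pane ever fires in this setup, accumulating and discarding modes would produce the same pane contents here; discarding simply makes it explicit that nothing is carried over.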
