Flink - Introduction to Triggers and their usage in Scala/Java

I. Introduction

In Flink, calling windowAll on a DataStream produces an AllWindowedStream, and a Trigger then decides when the window logic is executed. This article gives a basic overview of Trigger.

II. Introduction to Trigger

A Trigger determines when a window (as formed by the window assigner) is ready to be processed by the window function; in other words, it fires the window computation once certain conditions are met. If the window uses a built-in operator, that operator runs when the trigger fires; if it uses a custom ProcessAllWindowFunction, the custom logic runs. Every WindowAssigner has a default trigger. If the default trigger does not suit your needs, you can specify a custom one using trigger(...). Triggers are most often used to emit intermediate results from large windows in near real time. For example, a 100s window can be fired every 10s so that the window logic runs on the data collected so far.

1. Trigger methods

· onElement

public abstract TriggerResult onElement(T element, long timestamp, W window, Trigger.TriggerContext ctx) throws Exception;

The onElement() method is called for each element added to the window. Taking the basic CountTrigger as an example, each arriving element increments a counter kept by the trigger. Once the counter reaches the configured count, the trigger fires and the window logic is executed once.

· onEventTime

public abstract TriggerResult onEventTime(long time, W window, Trigger.TriggerContext ctx) throws Exception;

The onEventTime() method is called when a registered event-time timer fires. A trigger that fires repeatedly typically uses this callback to register the event-time timer for its next execution.

· onProcessingTime

public abstract TriggerResult onProcessingTime(long time, W window, Trigger.TriggerContext ctx) throws Exception;

The onProcessingTime() method is called when a registered processing-time timer fires. It is handled in the same way as onEventTime(), except that the timers are based on processing time.

· onMerge

public void onMerge(W window, Trigger.OnMergeContext ctx) throws Exception {
    throw new UnsupportedOperationException("This trigger does not support merging.");
}

The onMerge() method is relevant for stateful triggers: when the corresponding windows are merged, for example when using session windows, the state of the two triggers is merged as well. The merge can be regarded as a reduce function that combines the state variables of TimeWindow1 and TimeWindow2 into one. The default implementation shown above throws an exception, because a trigger does not support merging unless it overrides this method.
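To make the "reduce" view of merging concrete, here is a minimal plain-Java sketch. It does not use Flink's classes: MergeStateSketch, the string window labels, and the per-window counter are all hypothetical stand-ins for trigger state, chosen only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java model of merging per-window trigger state; NOT Flink's API.
final class MergeStateSketch {
    // Hypothetical trigger state: one element counter per window label.
    private final Map<String, Long> countByWindow = new HashMap<>();

    // Adds n to the counter of the given window.
    void add(String window, long n) {
        countByWindow.merge(window, n, Long::sum);
    }

    // Models onMerge: the counters of the source windows are summed
    // (a reduce) and stored under the merged target window.
    void onMerge(String target, String... sources) {
        long total = 0;
        for (String source : sources) {
            Long c = countByWindow.remove(source);
            if (c != null) {
                total += c;
            }
        }
        countByWindow.merge(target, total, Long::sum);
    }

    // Returns the counter of a window, or 0 if it has no state.
    long count(String window) {
        return countByWindow.getOrDefault(window, 0L);
    }

    public static void main(String[] args) {
        MergeStateSketch state = new MergeStateSketch();
        state.add("TimeWindow1", 2);
        state.add("TimeWindow2", 3);
        state.onMerge("Merged", "TimeWindow1", "TimeWindow2");
        System.out.println(state.count("Merged")); // prints 5
    }
}
```

After the merge, the source windows no longer hold any state of their own; only the merged window carries the combined counter.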

· clear

public abstract void clear(W window, Trigger.TriggerContext ctx) throws Exception;

Finally, the clear() method performs whatever cleanup is needed when the corresponding window is removed. Taking the basic CountTrigger as an example, clear() clears the counter state, so the count starts again from 0.

· canMerge

    public boolean canMerge() {
        return false;
    }

canMerge() returns true if the trigger supports merging the state of two triggers through onMerge(). The default is false.

2. Trigger results

The three methods onElement, onProcessingTime, and onEventTime each return a TriggerResult. This is an enumeration that tells the window operator which action to take after the method has run.

· TriggerResult.CONTINUE - skip, do nothing

· TriggerResult.FIRE - trigger the window computation and keep the window's elements

· TriggerResult.PURGE - clear the window's elements without triggering the computation

· TriggerResult.FIRE_AND_PURGE - trigger the window computation, then clear the window's elements

Taking CountTrigger as an example: each time the configured count of elements has accumulated, it returns TriggerResult.FIRE to execute the window logic; while there are not yet enough elements, it returns TriggerResult.CONTINUE.
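The counting behaviour described above can be modelled in a few lines of plain Java. This is a sketch only, not Flink's CountTrigger: the class CountTriggerSketch, its in-memory counter, and the simulate helper are assumptions made for illustration, and the nested enum merely mirrors the four TriggerResult values.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of a count-based trigger; it does NOT use Flink's classes.
final class CountTriggerSketch {
    // Mirrors the four values of Flink's TriggerResult enumeration.
    enum TriggerResult { CONTINUE, FIRE, PURGE, FIRE_AND_PURGE }

    private final long maxCount; // fire once this many elements have arrived
    private long count = 0;      // stands in for the trigger's per-window state

    CountTriggerSketch(long maxCount) {
        this.maxCount = maxCount;
    }

    // Called once for every element added to the window.
    TriggerResult onElement() {
        count++;
        if (count >= maxCount) {
            count = 0;                  // start counting the next batch
            return TriggerResult.FIRE;  // run the window logic, keep the elements
        }
        return TriggerResult.CONTINUE;  // not enough elements yet
    }

    // Called when the window is removed: drop the counter state.
    void clear() {
        count = 0;
    }

    // Feeds the given number of arrivals through a trigger and records each result.
    static List<TriggerResult> simulate(long maxCount, int elements) {
        CountTriggerSketch trigger = new CountTriggerSketch(maxCount);
        List<TriggerResult> results = new ArrayList<>();
        for (int i = 0; i < elements; i++) {
            results.add(trigger.onElement());
        }
        return results;
    }

    public static void main(String[] args) {
        // prints [CONTINUE, CONTINUE, FIRE, CONTINUE, CONTINUE, FIRE, CONTINUE]
        System.out.println(simulate(3, 7));
    }
}
```

With a count of 3, every third arrival returns FIRE and all other arrivals return CONTINUE.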

3. Triggers that ship with Flink

The Flink package org.apache.flink.streaming.api.windowing.triggers comes with the following window triggers. If you need a custom trigger, you only need to extend the Trigger class and implement its methods. For example, you can combine the logic of CountTrigger and ProcessingTimeTrigger into a CountAndProcessingTime trigger that fires on either a count or a processing-time condition.

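ContinuousEventTimeTrigger - fires repeatedly at a fixed interval, based on event time
ContinuousProcessingTimeTrigger - fires repeatedly at a fixed interval, based on processing time
CountTrigger - fires once the number of elements reaches a given count
DeltaTrigger - fires when the delta between elements exceeds a given threshold
EventTimeTrigger - fires once the watermark passes the end of the window
ProcessingTimeoutTrigger - wraps another trigger and fires when a processing-time timeout expires
ProcessingTimeTrigger - fires once the processing time passes the end of the window
PurgingTrigger - wraps another trigger and turns its FIRE into FIRE_AND_PURGE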
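The combined count and processing-time trigger mentioned above can be sketched as follows. This is a plain-Java model under stated assumptions, not a Flink Trigger implementation: the class name CountOrTimeoutTriggerSketch, the explicit nowMs and timeMs arguments, and the single nextFireTime field (standing in for a registered processing-time timer) are all hypothetical. A real implementation would extend Trigger and keep the count and the timer in Flink state through the TriggerContext.

```java
// Plain-Java model of a trigger that fires on a count OR a timeout; NOT Flink's API.
final class CountOrTimeoutTriggerSketch {
    enum TriggerResult { CONTINUE, FIRE }

    private final long maxCount;     // fire after this many elements
    private final long intervalMs;   // or this long after the first element of a batch
    private long count = 0;          // elements seen since the last firing
    private long nextFireTime = -1;  // -1 means no timer is registered

    CountOrTimeoutTriggerSketch(long maxCount, long intervalMs) {
        this.maxCount = maxCount;
        this.intervalMs = intervalMs;
    }

    // Called for every element; nowMs is the current processing time.
    TriggerResult onElement(long nowMs) {
        if (nextFireTime < 0) {
            nextFireTime = nowMs + intervalMs; // "register" a timer for this batch
        }
        count++;
        if (count >= maxCount) {
            reset();
            return TriggerResult.FIRE;         // count condition met
        }
        return TriggerResult.CONTINUE;
    }

    // Called as processing time advances to timeMs.
    TriggerResult onProcessingTime(long timeMs) {
        if (nextFireTime >= 0 && timeMs >= nextFireTime) {
            reset();
            return TriggerResult.FIRE;         // timeout condition met
        }
        return TriggerResult.CONTINUE;
    }

    // Clears both the counter and the pending timer after a firing.
    private void reset() {
        count = 0;
        nextFireTime = -1;
    }

    public static void main(String[] args) {
        CountOrTimeoutTriggerSketch t = new CountOrTimeoutTriggerSketch(3, 1000);
        System.out.println(t.onElement(0));           // CONTINUE
        System.out.println(t.onProcessingTime(1000)); // FIRE (timeout)
    }
}
```

Resetting both the counter and the timer on every firing keeps the two conditions from firing twice for the same batch of elements.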

III. API examples

1. Scala example

The following example aggregates the original DataStream in a 10s tumbling window. The trigger is set to CountTrigger, so the window logic fires every 30 elements.

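    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.streaming.api.scala.function.ProcessAllWindowFunction
    import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows
    import org.apache.flink.streaming.api.windowing.time.Time
    import org.apache.flink.streaming.api.windowing.triggers.CountTrigger
    import org.apache.flink.streaming.api.windowing.windows.TimeWindow
    import org.apache.flink.util.Collector

    // dataStream is assumed to be an existing DataStream[String]
    val allwindowedStream = dataStream
      .windowAll(TumblingProcessingTimeWindows.of(Time.seconds(10)))
      .trigger(CountTrigger.of[TimeWindow](30L))
      .process(new ProcessAllWindowFunction[String, String, TimeWindow] {
        override def process(context: Context, elements: Iterable[String], out: Collector[String]): Unit = {
          val info = elements.toArray.mkString(",")
          out.collect(info)
        }
      }).setParallelism(1)
    allwindowedStream.print()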

Tips:

In Scala, the type parameter passed to of[...] must be the window type, here [TimeWindow], not the element type of the stream. If you pass the element type instead, for example String, compilation fails with a type mismatch:

Required: Trigger[_ >: String,_ >: TimeWindow]
Found: ContinuousProcessingTimeTrigger[String]

2. Java example

The following example applies a 10s tumbling window to the original DataStream and uses ContinuousProcessingTimeTrigger to fire the window's processing logic every 5s of processing time. ProcessFunction here is the name of the user-defined window function class.

       dataStream
           // setParallelism is only available if dataStream is a SingleOutputStreamOperator
           .setParallelism(processParallel)
           .windowAll(TumblingProcessingTimeWindows.of(Time.seconds(10)))
           // fire the window every 5s of processing time
           .trigger(ContinuousProcessingTimeTrigger.of(Time.seconds(5)))
           .process(new ProcessFunction())
           .addSink(outputSink)
           .setParallelism(processParallel);

IV. Summary and points to note

1. Default trigger

Event-time windows use EventTimeTrigger by default, and processing-time windows use ProcessingTimeTrigger by default.

2. GlobalWindow

The default trigger for GlobalWindow is NeverTrigger, which never fires. So when using GlobalWindow you always have to define a custom trigger.

3. Window trigger logic

Once the trigger determines that the window is ready for processing, it fires, that is, it returns FIRE or FIRE_AND_PURGE. This is the signal for the window operator to emit the result of the current window. For a window with a ProcessWindowFunction, all elements are passed to the ProcessWindowFunction. Windows with a ReduceFunction or AggregateFunction only emit their aggregated result.

4. FIRE and PURGE

FIRE triggers the window computation without clearing the window's elements, while FIRE_AND_PURGE triggers the computation and then clears them. PURGE on its own clears the elements without triggering anything. When writing a custom trigger, choose the result carefully so that a batch of data is not lost after the window fires: once elements are purged, later firings no longer see them. Also note that PURGE only clears the window's elements; the window's metadata and the trigger's own state are not cleared.
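The difference between the results is easiest to see on a small buffer. The sketch below is a plain-Java model, not Flink's window operator: WindowBufferSketch, its element list, and the emitted list are hypothetical names used only to show which results evaluate the window and which clear it.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of how a window buffer reacts to trigger results; NOT Flink's API.
final class WindowBufferSketch {
    enum TriggerResult { CONTINUE, FIRE, PURGE, FIRE_AND_PURGE }

    private final List<String> elements = new ArrayList<>();      // current window contents
    private final List<List<String>> emitted = new ArrayList<>(); // one entry per firing

    // Adds an element to the window.
    void add(String element) {
        elements.add(element);
    }

    // Applies a trigger result: firing emits a snapshot, purging clears the buffer.
    void apply(TriggerResult result) {
        if (result == TriggerResult.FIRE || result == TriggerResult.FIRE_AND_PURGE) {
            emitted.add(new ArrayList<>(elements));
        }
        if (result == TriggerResult.PURGE || result == TriggerResult.FIRE_AND_PURGE) {
            elements.clear();
        }
    }

    // Returns everything emitted so far, one list per firing.
    List<List<String>> emitted() {
        return emitted;
    }

    public static void main(String[] args) {
        WindowBufferSketch window = new WindowBufferSketch();
        window.add("a");
        window.add("b");
        window.apply(TriggerResult.FIRE);           // emits [a, b], keeps both
        window.add("c");
        window.apply(TriggerResult.FIRE_AND_PURGE); // emits [a, b, c], then clears
        window.add("d");
        window.apply(TriggerResult.FIRE);           // emits [d] only
        System.out.println(window.emitted());       // prints [[a, b], [a, b, c], [d]]
    }
}
```

After FIRE the earlier elements appear again in the next firing, whereas after FIRE_AND_PURGE the next firing only sees elements that arrived afterwards.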

Origin blog.csdn.net/BIT_666/article/details/123653502