Flink - CountAndProcessingTimeTrigger: triggering a window based on count and time

1. Introduction

The previous articles covered CountTrigger and ProcessingTimeTrigger. CountTrigger fires once the number of elements in the window reaches a configured count. ProcessingTimeTrigger registers the window's expiration time with Flink's timer service and fires when that time is reached. This article builds a custom Trigger that combines the two: the window fires as soon as either the count condition or the processing-time condition is met.

2. Detailed code explanation

1.CountAndProcessingTimeTrigger

The complete code is shown below. The main logic lives in onElement and onProcessingTime. onElement handles firing by count, which is what CountTrigger does, while onProcessingTime handles firing by processing time, which is what ProcessingTimeTrigger does. Two ReducingState values are declared up front, one to record the count and one to record the next fire time. The previous article covers ReducingState in detail; the main methods are analyzed after the code.

import java.lang.{Long => JLong}

import org.apache.flink.api.common.state.ReducingStateDescriptor
import org.apache.flink.streaming.api.windowing.triggers.{Trigger, TriggerResult}
import org.apache.flink.streaming.api.windowing.windows.TimeWindow

// ReduceSum, ReduceMin and formatString come from the previous article and are not shown here.
// Assumption: ReduceSum and ReduceMin implement ReduceFunction[JLong]. The state has to hold the
// boxed java.lang.Long, because an unset scala.Long state reads as 0 and the null checks below
// would never match.
class CountAndProcessingTimeTrigger(maxCount: Long, interval: Long) extends Trigger[String, TimeWindow] {

  // Element counter
  val countStateDesc = new ReducingStateDescriptor[JLong]("count", new ReduceSum(), classOf[JLong])

  // Time state: stores the timestamp of the next interval-based firing
  val timeStateDesc = new ReducingStateDescriptor[JLong]("interval", new ReduceMin(), classOf[JLong])

  // Called for every element that is added to the window
  override def onElement(t: String, time: Long, window: TimeWindow, triggerContext: Trigger.TriggerContext): TriggerResult = {

    // Fetch the count state and the next-fire-time state
    val count = triggerContext.getPartitionedState(countStateDesc)
    val fireTimestamp = triggerContext.getPartitionedState(timeStateDesc)

    // Count the element and check whether the threshold is reached
    count.add(1L)
    if (count.get() >= maxCount) {
      val log = s"CountTrigger Triggered Count: ${count.get()}"
      println(formatString(log))
      count.clear()
      // Delete the pending interval timer, unless none is registered or it
      // coincides with the window's own end-of-window timer
      val pending = fireTimestamp.get()
      if (pending != null && pending.longValue() != window.maxTimestamp()) {
        triggerContext.deleteProcessingTimeTimer(pending)
      }
      fireTimestamp.clear()
      return TriggerResult.FIRE
    }

    // Register the next interval-based firing if none is pending
    val currentTimeStamp = triggerContext.getCurrentProcessingTime
    if (fireTimestamp.get() == null) {
      val nextFireTimeStamp = currentTimeStamp + interval
      triggerContext.registerProcessingTimeTimer(nextFireTimeStamp)
      fireTimestamp.add(nextFireTimeStamp)
    }

    TriggerResult.CONTINUE
  }

  override def onProcessingTime(time: Long, window: TimeWindow, triggerContext: Trigger.TriggerContext): TriggerResult = {
    // Fetch the count state
    val count = triggerContext.getPartitionedState(countStateDesc)
    // Fetch the next-fire-time state
    val fireTimestamp = triggerContext.getPartitionedState(timeStateDesc)

    // Default firing at the end of the window
    if (time == window.maxTimestamp()) {
      val log = s"Window Trigger By maxTimeStamp: $time FireTimestamp: ${fireTimestamp.get()}"
      println(formatString(log))
      count.clear()
      // Only delete the interval timer if one is actually pending
      val pending = fireTimestamp.get()
      if (pending != null) {
        triggerContext.deleteProcessingTimeTimer(pending)
      }
      fireTimestamp.clear()
      fireTimestamp.add(triggerContext.getCurrentProcessingTime + interval)
      triggerContext.registerProcessingTimeTimer(fireTimestamp.get())
      return TriggerResult.FIRE
    } else if (fireTimestamp.get() != null && fireTimestamp.get().equals(time)) {
      // Firing caused by the custom interval timer
      val log = s"TimeTrigger Triggered At: ${fireTimestamp.get()}"
      println(formatString(log))
      count.clear()
      fireTimestamp.clear()
      fireTimestamp.add(triggerContext.getCurrentProcessingTime + interval)
      triggerContext.registerProcessingTimeTimer(fireTimestamp.get())
      return TriggerResult.FIRE
    }
    TriggerResult.CONTINUE
  }

  override def onEventTime(l: Long, w: TimeWindow, triggerContext: Trigger.TriggerContext): TriggerResult = {
    TriggerResult.CONTINUE
  }

  override def clear(w: TimeWindow, triggerContext: Trigger.TriggerContext): Unit = {
    // Fetch the count state
    val count = triggerContext.getPartitionedState(countStateDesc)
    // Fetch the next-fire-time state
    val fireTimestamp = triggerContext.getPartitionedState(timeStateDesc)

    // Delete the interval timer that is still pending so it does not outlive the window
    val pending = fireTimestamp.get()
    if (pending != null) {
      triggerContext.deleteProcessingTimeTimer(pending)
    }
    count.clear()
    fireTimestamp.clear()
  }

}
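To make the role of the two state values more concrete, here is a minimal plain-Scala sketch of how a reducing state behaves. It does not use Flink; ReducingModel is a hypothetical stand-in for ReducingState, and the sum and min functions are assumed to match what ReduceSum and ReduceMin from the previous article do.

```scala
object ReducingStateSketch {

  // Hypothetical model of a reducing state: it holds at most one value
  // and merges every added value into it with the given function.
  final class ReducingModel[T](reduce: (T, T) => T) {
    private var current: Option[T] = None

    def add(value: T): Unit = {
      current = Some(current.fold(value)(existing => reduce(existing, value)))
    }

    def get: Option[T] = current

    def clear(): Unit = {
      current = None
    }
  }

  def main(args: Array[String]): Unit = {
    // Count state: every element adds 1, the reducer sums the values up.
    val count = new ReducingModel[Long](_ + _)
    (1 to 3).foreach(_ => count.add(1L))
    println(count.get) // Some(3)

    // Time state: the reducer keeps the smallest (earliest) timestamp.
    val fireTimestamp = new ReducingModel[Long]((a, b) => math.min(a, b))
    fireTimestamp.add(16000L)
    fireTimestamp.add(12000L)
    println(fireTimestamp.get) // Some(12000)
  }
}
```

After clear() the model is empty again, which is why the trigger always calls clear() before it stores a new fire time.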

2.onElement

Each arriving element is counted with count.add, and the window fires once the count reaches the configured maxCount:

----- maxCount reached

A.log - print a log line to show that this firing comes from the count condition

B.count.clear - reset the counter so that counting starts over for the next firing

C.deleteProcessingTimeTimer - delete the pending timer from the timer service, because both the count and the processing-time interval start over after a firing

D.TriggerResult.FIRE - fire the window

----- maxCount not reached

A.currentTime - get the current processing time from the trigger context

B.registerProcessingTimeTimer - check whether a fire time is already stored; if not, compute the next fire time as current processing time + interval, register it and store it

----- Neither condition met

A.TriggerResult.CONTINUE - do not fire; wait for the registered timer to expire

  override def onElement(t: String, time: Long, window: TimeWindow, triggerContext: Trigger.TriggerContext): TriggerResult = {

    // Fetch the count state and the next-fire-time state
    val count = triggerContext.getPartitionedState(countStateDesc)
    val fireTimestamp = triggerContext.getPartitionedState(timeStateDesc)

    // Count the element and check whether the threshold is reached
    count.add(1L)
    if (count.get() >= maxCount) {
      val log = s"CountTrigger Triggered Count: ${count.get()}"
      println(formatString(log))
      count.clear()
      // Delete the pending interval timer, unless none is registered or it
      // coincides with the window's own end-of-window timer
      val pending = fireTimestamp.get()
      if (pending != null && pending.longValue() != window.maxTimestamp()) {
        triggerContext.deleteProcessingTimeTimer(pending)
      }
      fireTimestamp.clear()
      return TriggerResult.FIRE
    }

    // Register the next interval-based firing if none is pending
    val currentTimeStamp = triggerContext.getCurrentProcessingTime
    if (fireTimestamp.get() == null) {
      val nextFireTimeStamp = currentTimeStamp + interval
      triggerContext.registerProcessingTimeTimer(nextFireTimeStamp)
      fireTimestamp.add(nextFireTimeStamp)
    }

    TriggerResult.CONTINUE
  }
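The branch logic of onElement can also be followed without a Flink cluster. The following is a simplified plain-Scala model of it, not the trigger itself: State, Fire and Continue are hypothetical stand-ins for the two state values and for TriggerResult, and timer registration is reduced to remembering the timestamp.

```scala
object OnElementSketch {

  sealed trait Result
  case object Fire extends Result
  case object Continue extends Result

  // Hypothetical stand-in for the two state values kept per window.
  final class State {
    var count: Long = 0L
    var fireTimestamp: Option[Long] = None // pending interval timer, if any
  }

  def onElement(state: State, now: Long, maxCount: Long, interval: Long): Result = {
    state.count += 1
    if (state.count >= maxCount) {
      // Count reached: reset the counter and drop the pending timer.
      state.count = 0L
      state.fireTimestamp = None
      Fire
    } else {
      // Count not reached: make sure exactly one interval timer is pending.
      if (state.fireTimestamp.isEmpty) state.fireTimestamp = Some(now + interval)
      Continue
    }
  }

  def main(args: Array[String]): Unit = {
    val state = new State
    // Three elements arriving one second apart, with maxCount = 3 and interval = 5s.
    val results = (1 to 3).map(i => onElement(state, 1000L * i, 3L, 5000L))
    println(results.mkString(", ")) // Continue, Continue, Fire
  }
}
```

The first element stores a fire time of 6000 (1000 + 5000); the third element reaches the count, so the stored fire time is dropped again before the timer ever expires.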

3.onProcessingTime

onProcessingTime runs when a registered processing-time timer expires. As mentioned earlier, the window calls onProcessingTime at two points in time. The first is when the ProcessingTimeTimer registered by the trigger itself expires; the window fires, and the data emitted at that point is only part of the window's data. The second is when window.maxTimestamp is reached, that is, window.getEnd - 1L; the data emitted at that point is all the data in the time range defined by windowAll. With Time.seconds(10), for example, the former emits the data collected so far and the latter emits the full 10s window.
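As a worked example of window.getEnd - 1L, the sketch below computes the bounds of a tumbling window in plain Scala. It assumes a window offset of 0 and non-negative timestamps in milliseconds; the function names are illustrative and not part of the Flink API.

```scala
object WindowBoundsSketch {

  // Start of the tumbling window (offset 0) that contains `timestamp`.
  def windowStart(timestamp: Long, size: Long): Long = timestamp - (timestamp % size)

  // Exclusive end of that window.
  def windowEnd(timestamp: Long, size: Long): Long = windowStart(timestamp, size) + size

  // Last millisecond that still belongs to the window: end - 1.
  def maxTimestamp(timestamp: Long, size: Long): Long = windowEnd(timestamp, size) - 1L

  def main(args: Array[String]): Unit = {
    val size = 10000L // 10s window, as in Time.seconds(10)
    val t = 23456L    // an element arriving 23.456s after the epoch
    println(windowStart(t, size))  // 20000
    println(windowEnd(t, size))    // 30000
    println(maxTimestamp(t, size)) // 29999
  }
}
```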

----- The window's default trigger time is reached

A.window.maxTimestamp - the timer time equals the window's default end time; print the corresponding log line

B.count.clear - clear the count state

C.deleteProcessingTimeTimer - delete the pending interval timer, because counting and timing start over after the window fires

D.registerProcessingTimeTimer - register the next fire time as current processing time + interval

E.TriggerResult.FIRE - fire with the full window data

----- The custom interval is reached

A.log - print the TimeTrigger log line to show that this firing comes from the custom processing-time interval

B.clear - clear the count state and the stored fire time after the window fires

C.registerProcessingTimeTimer - register the next fire time as current processing time + interval

D.TriggerResult.FIRE - fire with the window data collected so far

----- Neither condition met

A.TriggerResult.CONTINUE - do nothing

  override def onProcessingTime(time: Long, window: TimeWindow, triggerContext: Trigger.TriggerContext): TriggerResult = {
    // Fetch the count state
    val count = triggerContext.getPartitionedState(countStateDesc)
    // Fetch the next-fire-time state
    val fireTimestamp = triggerContext.getPartitionedState(timeStateDesc)

    // Default firing at the end of the window
    if (time == window.maxTimestamp()) {
      val log = s"Window Trigger By maxTimeStamp: $time FireTimestamp: ${fireTimestamp.get()}"
      println(formatString(log))
      count.clear()
      // Only delete the interval timer if one is actually pending
      val pending = fireTimestamp.get()
      if (pending != null) {
        triggerContext.deleteProcessingTimeTimer(pending)
      }
      fireTimestamp.clear()
      fireTimestamp.add(triggerContext.getCurrentProcessingTime + interval)
      triggerContext.registerProcessingTimeTimer(fireTimestamp.get())
      return TriggerResult.FIRE
    } else if (fireTimestamp.get() != null && fireTimestamp.get().equals(time)) {
      // Firing caused by the custom interval timer
      val log = s"TimeTrigger Triggered At: ${fireTimestamp.get()}"
      println(formatString(log))
      count.clear()
      fireTimestamp.clear()
      fireTimestamp.add(triggerContext.getCurrentProcessingTime + interval)
      triggerContext.registerProcessingTimeTimer(fireTimestamp.get())
      return TriggerResult.FIRE
    }
    TriggerResult.CONTINUE
  }
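The three outcomes of onProcessingTime boil down to a comparison of the timer time with two values. The sketch below isolates that decision in plain Scala; the Cause type and its three cases are hypothetical names used only for illustration.

```scala
object OnTimerSketch {

  sealed trait Cause
  case object WindowEnd extends Cause     // default firing, full window data
  case object IntervalTimer extends Cause // custom interval firing, partial data
  case object StaleTimer extends Cause    // no match, corresponds to CONTINUE

  // Mirrors the order of the checks in onProcessingTime: the window end wins
  // if both values happen to be equal.
  def classify(time: Long, windowMaxTimestamp: Long, fireTimestamp: Option[Long]): Cause =
    if (time == windowMaxTimestamp) WindowEnd
    else if (fireTimestamp.contains(time)) IntervalTimer
    else StaleTimer

  def main(args: Array[String]): Unit = {
    println(classify(29999L, 29999L, Some(25000L))) // WindowEnd
    println(classify(25000L, 29999L, Some(25000L))) // IntervalTimer
    println(classify(25000L, 29999L, None))         // StaleTimer
  }
}
```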

4.onEventTime

Because this trigger is based only on count and processing time, onEventTime simply returns TriggerResult.CONTINUE.

5.clear

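Clears the ReducingState values for count and fireTimestamp. An interval timer that is still pending should be deleted here as well, so that it does not outlive the window.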

3. Code Practice

1. Main function

The sources used for CountTrigger and ProcessingTimeTrigger in the previous articles were fixed data sources that sent 30 records per second. To verify CountAndProcessingTimeTrigger, a socket source is used here so that data can be sent by hand; open the port locally with nc -lk 9999. The ProcessAllWindowFunction computes the min and max of the data in the window and outputs them together with the processing time.

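import java.text.SimpleDateFormat
import java.util.Calendar

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.scala.function.ProcessAllWindowFunction
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.streaming.api.windowing.windows.TimeWindow
import org.apache.flink.util.Collector

object CountAndProcessTimeTriggerDemo {

  val dateFormat = new SimpleDateFormat("yyyy-MM-dd:HH-mm-ss")

  def main(args: Array[String]): Unit = {

    val env = StreamExecutionEnvironment.getExecutionEnvironment

    env
      // Flink connects as a client, so the port must already be open (nc -lk 9999)
      .socketTextStream("localhost", 9999)
      .setParallelism(1)
      // 10s tumbling window; fire on 10 elements or every 5000 ms, whichever comes first
      .windowAll(TumblingProcessingTimeWindows.of(Time.seconds(10)))
      .trigger(new CountAndProcessingTimeTrigger(10, 5000))
      .process(new ProcessAllWindowFunction[String, String, TimeWindow] {
        override def process(context: Context, elements: Iterable[String], out: Collector[String]): Unit = {
          val cla = Calendar.getInstance()
          cla.setTimeInMillis(System.currentTimeMillis())
          val date = dateFormat.format(cla.getTime)
          // Every input line is expected to be an integer
          val info = elements.toArray.map(_.trim.toInt)
          val min = info.min
          val max = info.max
          val output = s"==========[$date] Window Elem Num: ${elements.size} Min: $min -> Max $max=========="
          out.collect(output)
        }
      }).print()

    env.execute()

  }
}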
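Instead of typing into nc by hand, the test data can also be sent by a small program. The following is a minimal sketch using only the JDK socket classes; NumberSender, the range 1 to 30 and the 500 ms pause are illustrative choices, and port 9999 is taken from the demo above. Start it before the Flink job, because socketTextStream connects to the port as a client.

```scala
import java.io.PrintWriter
import java.net.ServerSocket

object NumberSender {

  // Writes the integers from `from` to `to`, one per line, pausing between lines.
  def send(out: PrintWriter, from: Int, to: Int, pauseMillis: Long): Unit =
    (from to to).foreach { n =>
      out.println(n)
      out.flush()
      if (pauseMillis > 0) Thread.sleep(pauseMillis)
    }

  def main(args: Array[String]): Unit = {
    val server = new ServerSocket(9999)
    try {
      // Blocks until the Flink job connects to the port.
      val client = server.accept()
      val out = new PrintWriter(client.getOutputStream)
      try send(out, 1, 30, 500L)
      finally out.close()
    } finally server.close()
  }
}
```

Keeping send independent of the socket makes it easy to try out against any Writer before running it next to the job.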

2. Data Validation

The CountAndProcessingTimeTrigger above is configured with count = 10 and interval = 5s.

Color  | Trigger method        | Input | Process
-------|-----------------------|-------|--------------------------------------------------------------
blue   | CountTrigger          | 1-10  | count = 10 is reached, so the count condition fires
red    | DefaultTrigger        | -     | Default firing at window end; the full window data is 1-10
green  | ProcessingTimeTrigger | 11,12 | Enter 11 and 12; the interval timer then fires
yellow | DefaultTrigger        | 13    | Enter 13 before the window ends; the full window data is 11-13
gray   | ProcessingTimeTrigger | 14    | Enter 14 and wait for the interval timer to fire
white  | DefaultTrigger        | -     | Default firing at window end; the full window data is 14

A note on why the interval is 5s while the ProcessingTimeTrigger log lines show seconds 16 and 27: typing into the socket by hand introduces a delay, and the interval timer is set to the processing time of the first element plus the interval. If a program sent the data right at the start of each window, the ProcessingTimeTrigger would fire at seconds 15 and 25 instead.
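The arithmetic behind those timestamps can be checked with a small plain-Scala sketch. It ignores the count condition and assumes a window offset of 0; firstFire and its string labels are hypothetical names used only for this example.

```scala
object FirstFireSketch {

  // Which timer expires first for a window whose first element arrives at
  // `firstArrival` (all values in milliseconds)?
  def firstFire(firstArrival: Long, interval: Long, windowSize: Long): (String, Long) = {
    val windowMax = firstArrival - (firstArrival % windowSize) + windowSize - 1L
    val intervalTimer = firstArrival + interval
    // On a tie the trigger checks window.maxTimestamp first, so the window end wins.
    if (intervalTimer < windowMax) ("interval", intervalTimer) else ("window-end", windowMax)
  }

  def main(args: Array[String]): Unit = {
    println(firstFire(11000L, 5000L, 10000L)) // (interval,16000)
    println(firstFire(22000L, 5000L, 10000L)) // (interval,27000)
    println(firstFire(26000L, 5000L, 10000L)) // (window-end,29999)
  }
}
```

A first element at second 11 gives an interval firing at second 16, and one at second 22 gives second 27, which matches the log. If the first element arrives later than 5s into the window, the window end is reached before the interval timer.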

4. More

The CountTrigger logic is relatively simple. The processing-time logic shown here is only one possible definition: the fire time is reset when the window ends. It could also be defined to span windows, or to not align with whole intervals; customize it if you are interested. For CountTrigger, ProcessingTimeTrigger and the CountAndProcessingTimeTrigger written in this article, a firing only ever returns FIRE and never FIRE_AND_PURGE to clear the data. If the window data is not cleared, will it keep accumulating until storage blows up? No. WindowOperator's own onProcessingTime method checks whether the window's cleanup time has been reached and, if so, calls clearAllState to clear the data.

WindowOperator is the entry point for all window processing logic. If the Trigger returns TriggerResult.FIRE, the window contents are kept after the firing and are removed only when the window reaches its cleanup time, at which point clearAllState clears the window contents and calls the Trigger's clear method. If it returns TriggerResult.FIRE_AND_PURGE, WindowOperator additionally clears the window contents right after the firing, while the trigger state stays intact until cleanup. In clear, ProcessingTimeTrigger deletes the window's timer, and the trigger in this example clears the two ReducingState values for count and fireTimestamp.
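The difference between the two results can be illustrated with a toy model in plain Scala. This is not WindowOperator code: WindowModel is a hypothetical buffer that only mimics how window contents are emitted, purged and finally cleaned up.

```scala
import scala.collection.mutable.ArrayBuffer

object PurgeSketch {

  sealed trait Result { def purge: Boolean }
  case object FIRE extends Result { val purge = false }
  case object FIRE_AND_PURGE extends Result { val purge = true }

  // Hypothetical model of the contents of one window.
  final class WindowModel {
    private val buffer = ArrayBuffer.empty[Int]

    def add(x: Int): Unit = buffer += x

    // Emits the current contents; only a purging result empties the buffer.
    def handle(result: Result): List[Int] = {
      val emitted = buffer.toList
      if (result.purge) buffer.clear()
      emitted
    }

    // Models the cleanup at the end of the window's lifetime.
    def cleanup(): Unit = buffer.clear()

    def size: Int = buffer.size
  }

  def main(args: Array[String]): Unit = {
    val window = new WindowModel
    window.add(1); window.add(2)
    println(window.handle(FIRE))           // List(1, 2)
    println(window.size)                   // 2
    window.add(3)
    println(window.handle(FIRE_AND_PURGE)) // List(1, 2, 3)
    println(window.size)                   // 0
  }
}
```

This is also why the default firing in the validation table emits the full window data even after an earlier count-based firing: FIRE leaves the contents in place until cleanup.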

Origin blog.csdn.net/BIT_666/article/details/123740650