Flink | ProcessFunction API (low-level API)

 

Accessing timestamps and watermarks

The transformation operators covered so far cannot access an event's timestamp or the current watermark, yet in some scenarios this information is essential. For example, a map operator such as MapFunction cannot access the timestamp of the current event or the current event time. For this reason, the DataStream API provides a family of low-level transformation operators that can access timestamps and watermarks and can register timers. They can also emit special events, such as timeout events. Process functions are used to build event-driven applications and to implement custom business logic that window functions and the earlier transformation operators cannot express. For example, Flink SQL is implemented with process functions.

Flink provides eight process functions:

  •  ProcessFunction
  •  KeyedProcessFunction
  •  CoProcessFunction
  •  ProcessJoinFunction
  •  BroadcastProcessFunction
  •  KeyedBroadcastProcessFunction
  •  ProcessWindowFunction
  •  ProcessAllWindowFunction

KeyedProcessFunction
KeyedProcessFunction operates on a KeyedStream. It is called for every element of the stream and emits zero, one, or more elements. All process functions inherit from the RichFunction interface, so they all have the open(), close(), and getRuntimeContext() methods. In addition, KeyedProcessFunction[KEY, IN, OUT] provides two methods:
   processElement(v: IN, ctx: Context, out: Collector[OUT]): called for every element of the stream; results are emitted through the Collector.
     The Context gives access to the element's timestamp, the element's key, and the TimerService. The Context can also emit results to other streams (side outputs).
   onTimer(timestamp: Long, ctx: OnTimerContext, out: Collector[OUT]): a callback that is called when a previously registered timer fires.
     The timestamp parameter is the time the timer was set to fire at, and the Collector receives the output. Like the Context parameter of processElement, OnTimerContext provides contextual information,
     such as whether the timer that fired is an event-time or a processing-time timer.
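
The two methods can be sketched as a bare skeleton. This is only an illustration, assuming the same Flink Scala DataStream dependencies as the rest of this post are on the classpath; the class name EchoWithTimeout, the String element type, and the five-second delay are made up for the example.

```scala
import org.apache.flink.streaming.api.functions.KeyedProcessFunction
import org.apache.flink.util.Collector

// Hypothetical example: echoes every element and registers a processing-time timer for its key.
class EchoWithTimeout extends KeyedProcessFunction[String, String, String] {

  // Called once per element of the keyed stream.
  override def processElement(
      value: String,
      ctx: KeyedProcessFunction[String, String, String]#Context,
      out: Collector[String]): Unit = {
    out.collect(s"key=${ctx.getCurrentKey} value=$value")
    // Ask for a callback five seconds from now in processing time (illustrative delay).
    val fireAt = ctx.timerService().currentProcessingTime() + 5000L
    ctx.timerService().registerProcessingTimeTimer(fireAt)
  }

  // Called when a timer registered above fires; timestamp is the time it was registered for.
  override def onTimer(
      timestamp: Long,
      ctx: KeyedProcessFunction[String, String, String]#OnTimerContext,
      out: Collector[String]): Unit = {
    out.collect(s"timer for key=${ctx.getCurrentKey} fired at $timestamp")
  }
}
```

It would be applied to a keyed stream with something like stream.keyBy(...).process(new EchoWithTimeout).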

 

TimerService and timers
The TimerService object held by Context and OnTimerContext has the following methods:
   currentProcessingTime(): Long returns the current processing time.
   currentWatermark(): Long returns the timestamp of the current watermark.
   registerProcessingTimeTimer(timestamp: Long): Unit registers a processing-time timer for the current key. The timer fires when processing time reaches the timer's timestamp.
   registerEventTimeTimer(timestamp: Long): Unit registers an event-time timer for the current key. The timer's callback is invoked when the watermark is greater than or equal to the timer's timestamp.
   deleteProcessingTimeTimer(timestamp: Long): Unit deletes a previously registered processing-time timer. If no timer with this timestamp exists, the call has no effect.
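
The event-time rules above can be illustrated with a toy model in plain Scala. This is not Flink's implementation: TimerModel and advanceWatermark are hypothetical names, and the model only assumes what Flink documents, namely that timers are kept per key, that a repeated registration for the same key and timestamp is kept once, and that an event-time timer fires once the watermark reaches its timestamp. The deleteEventTimeTimer method mirrors the event-time counterpart of deleteProcessingTimeTimer in Flink's TimerService.

```scala
import scala.collection.mutable

// Toy model of event-time timers; NOT Flink's implementation.
// TimerModel and advanceWatermark are hypothetical names used for illustration.
final class TimerModel {
  // One sorted set of timer timestamps per key; a set keeps duplicates only once.
  private val timers = mutable.Map.empty[String, mutable.SortedSet[Long]]
  private var watermark = Long.MinValue

  def currentWatermark: Long = watermark

  def registerEventTimeTimer(key: String, ts: Long): Unit =
    timers.getOrElseUpdate(key, mutable.SortedSet.empty[Long]) += ts

  // Deleting a timer that was never registered has no effect.
  def deleteEventTimeTimer(key: String, ts: Long): Unit =
    timers.get(key).foreach(_ -= ts)

  // Advance the watermark (it never moves backwards) and return the
  // (key, timestamp) pairs that fire: every timer with timestamp <= watermark.
  def advanceWatermark(newWatermark: Long): List[(String, Long)] = {
    watermark = math.max(watermark, newWatermark)
    val fired = for {
      (key, set) <- timers.toList
      ts <- set.toList if ts <= watermark
    } yield (key, ts)
    // A fired timer is removed, so it fires only once.
    fired.foreach { case (key, ts) => timers(key) -= ts }
    fired.sortBy { case (key, ts) => (ts, key) }
  }
}
```

For example, after registering a timer at 1000 for key sensor_1, advanceWatermark(999L) returns an empty list, and advanceWatermark(1000L) returns that one timer.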

 

Using KeyedProcessFunction on a KeyedStream
Requirement: monitor the temperature readings of a sensor and raise an alert if the temperature rises continuously for one second (processing time).

package com.xxx.fink.api.windowapi

import com.xxx.fink.api.sourceapi.SensorReading
import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
import org.apache.flink.streaming.api.functions.KeyedProcessFunction
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor
import org.apache.flink.streaming.api.scala.{DataStream, StreamExecutionEnvironment}
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.util.Collector
import org.apache.flink.api.scala._
import org.apache.flink.streaming.api.TimeCharacteristic


/**
  * Raise an alert when the temperature rises continuously within 1 s
  */
object ProcessFunctionTest {
  def main(args: Array[String]): Unit = {
    val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
    env.setParallelism(1)
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
    val stream: DataStream[String] = env.socketTextStream("hadoop101", 7777)

    val dataStream: DataStream[SensorReading] = stream.map(data => {
      val dataArray: Array[String] = data.split(",")
      SensorReading(dataArray(0).trim, dataArray(1).trim.toLong, dataArray(2).trim.toDouble)
    })
      .assignTimestampsAndWatermarks(new BoundedOutOfOrdernessTimestampExtractor[SensorReading](Time.seconds(1)) {
        override def extractTimestamp(element: SensorReading): Long = element.timestamp * 1000
      })


    val processedStream: DataStream[String] = dataStream.keyBy(_.id)
      .process(new TempIncreAlert())

    dataStream.print("Input data:")
    processedStream.print("process data:")

    env.execute("Window test")

  }


} 

class TempIncreAlert() extends KeyedProcessFunction[String, SensorReading, String] {

  // Detecting a continuous rise requires comparing with the previous reading,
  // so the previous temperature is kept in state
  lazy val lastTemp: ValueState[Double] = getRuntimeContext.getState(new ValueStateDescriptor[Double]("lastTemp", classOf[Double]))
  // State holding the timestamp of the currently registered timer
  lazy val currentTimer: ValueState[Long] = getRuntimeContext.getState(new ValueStateDescriptor[Long]("currentTimer", classOf[Long]))

  override def processElement(value: SensorReading, ctx: KeyedProcessFunction[String, SensorReading, String]#Context, out: Collector[String]): Unit = {
    // Read the previous temperature
    val preTemp = lastTemp.value()
    // Store the current temperature for the next comparison
    lastTemp.update(value.temperature)

    val curTimerTs = currentTimer.value()

    if (value.temperature > preTemp && curTimerTs == 0) {
      // The temperature rose and no timer is set yet: register a timer
      // that fires 10 s from now in processing time (use 1000L for a strict one-second window)
      val timerTs = ctx.timerService().currentProcessingTime() + 10000L
      ctx.timerService().registerProcessingTimeTimer(timerTs)
      currentTimer.update(timerTs)
    } else if (preTemp > value.temperature || preTemp == 0.0) {
      // The temperature dropped, or no previous reading is stored: delete the timer and clear the state
      ctx.timerService().deleteProcessingTimeTimer(curTimerTs)
      currentTimer.clear()
    }
  }

  override def onTimer(timestamp: Long, ctx: KeyedProcessFunction[String, SensorReading, String]#OnTimerContext, out: Collector[String]): Unit = {
    // Emit the alert
    out.collect(ctx.getCurrentKey + " temperature is rising continuously")
    currentTimer.clear()
  }

}

 

Input data:> SensorReading(sensor_1,1547718199,35.0)
Input data:> SensorReading(sensor_1,1547718199,36.0)
process data:> sensor_1 temperature is rising continuously

 

Side outputs (SideOutput)

Most operators in the DataStream API produce a single output, that is, one stream of a single data type.
The split operator is an exception: it can divide one stream into several, but those streams still all have the same data type.
The side-output feature of process functions can produce multiple streams, and the data types of these streams may differ.
A side output is defined as an OutputTag[X] object, where X is the data type of the output stream. A process function can emit an event to one or more side outputs through the Context object.

 

 

 

 

import com.xxx.fink.api.sourceapi.SensorReading
import org.apache.flink.streaming.api.functions.ProcessFunction
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor
import org.apache.flink.streaming.api.scala.{DataStream, OutputTag, StreamExecutionEnvironment}
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.util.Collector
import org.apache.flink.api.scala._

/**
  * Side outputs as a replacement for split
  */
object SideOutputTest {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setParallelism(1)
    val stream = env.socketTextStream("hadoop101", 7777)
    val dataStream = stream.map(data => {
      val dataArray = data.split(",")
      SensorReading(dataArray(0).trim, dataArray(1).trim.toLong, dataArray(2).trim.toDouble)
    }).assignTimestampsAndWatermarks(new BoundedOutOfOrdernessTimestampExtractor[SensorReading](Time.seconds(1)) {
      override def extractTimestamp(element: SensorReading): Long = element.timestamp * 1000
    })
    val processedStream: DataStream[SensorReading] = dataStream.process(new FreezingAlert())
    processedStream.print("processed data:")
    // The tag id must match the one used inside FreezingAlert
    processedStream.getSideOutput(new OutputTag[String]("freezing alert")).print("alert data")

    env.execute("Window test")
  }
}

// Freezing alert: if the temperature is below 32F, an alert message is emitted to the side output
class FreezingAlert() extends ProcessFunction[SensorReading, SensorReading] {

  lazy val alertOutPut: OutputTag[String] = new OutputTag[String]("freezing alert")

  override def processElement(value: SensorReading, ctx: ProcessFunction[SensorReading, SensorReading]#Context, out: Collector[SensorReading]): Unit = {
    if (value.temperature < 32.0) {
      ctx.output(alertOutPut, "freezing alert for" + value.id)
    } else {
      out.collect(value)
    }

  }
}

test:

processed data: > SensorReading(sensor_1,1547718199,35.8)
alert data> freezing alert forsensor_6
alert data> freezing alert forsensor_7
processed data: > SensorReading(sensor_10,1547718205,38.1)

 


Origin www.cnblogs.com/shengyang17/p/12543524.html